August 4, 2015

Why AV is Dead, and what to do about it.

Scott McDonald, Herjavec Group

In the proverbial cat-and-mouse game of cybersecurity, neither the attacker nor the defender can maintain an advantage for very long. The lifecycle of new technologies in IT is very short, but in cybersecurity that time is condensed even further, allowing new lethal threats to overtake yesterday’s sophisticated cyber defenses.

Let’s take a look ‘under the hood’ of the mechanics of Anti-Virus software and why it has reached the end of its replacement cycle. Which is to say, we will strive to understand “Why AV is Dead…”[1]

AV engines work by comparing a file (binary, executable, text file, document, data file, DLL, archive, etc.) with a database of known bad files. If they find a match then the file is convicted as malicious and terminated/quarantined/cleaned.[2]

AV manufacturers build their databases by cataloguing one of the following:[3]

  1. known bad search strings (cleartext) in the malicious file,
  2. a hash of part of the known bad file,
  3. a hash of the full known bad file.
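
As a rough illustration, the three cataloguing approaches listed above can be sketched in Python. The sample bytes, the marker string, the 16-byte prefix used for the partial hash and every function name are assumptions made for this example; they do not represent any vendor's actual signature format.

```python
import hashlib

# Hypothetical "known bad" sample, invented purely for illustration.
KNOWN_BAD_SAMPLE = b"MZ-fake-header-00" + b"evil_payload_marker" + b"-rest-of-file"

PREFIX_LEN = 16  # assumed size of the portion covered by the partial hash


def build_signatures(sample: bytes) -> dict:
    """Derive one signature of each kind from a known bad sample."""
    return {
        "string": b"evil_payload_marker",                             # cleartext search string
        "partial_hash": hashlib.md5(sample[:PREFIX_LEN]).hexdigest(),  # hash of part of the file
        "full_hash": hashlib.md5(sample).hexdigest(),                  # hash of the whole file
    }


def matches(data: bytes, sigs: dict) -> list:
    """Return the names of the signature types that match the scanned data."""
    hits = []
    if sigs["string"] in data:
        hits.append("string")
    if hashlib.md5(data[:PREFIX_LEN]).hexdigest() == sigs["partial_hash"]:
        hits.append("partial_hash")
    if hashlib.md5(data).hexdigest() == sigs["full_hash"]:
        hits.append("full_hash")
    return hits


sigs = build_signatures(KNOWN_BAD_SAMPLE)
print(matches(KNOWN_BAD_SAMPLE, sigs))         # exact copy of the known bad file
print(matches(KNOWN_BAD_SAMPLE + b"!", sigs))  # same file with one byte appended
print(matches(b"hello world", sigs))           # unrelated file
```

In this sketch, the appended byte is enough to stop the full-file hash from matching, while the string and partial-hash signatures still fire.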

To understand techniques (2) and (3), consider how hashing algorithms work. A hashing algorithm is a one-way function (computation). It takes the contents of a file and performs a calculation over the entire file to produce a fixed-length hash value, such as this 32-digit hexadecimal number:  0ab2fe2fb74e02b94a129ef497e34e4c.

That specific hash value (a.k.a. checksum) is actually the hash of all the text in this article.  It was computed with the MD5 hash function, which is widely used for cataloguing malware.[4]  For example, the US Department of Homeland Security uses MD5 hashes in its US-CERT security bulletins, as does most of the industry.[5]   Since these hashing algorithms only compute values in one direction, you cannot reverse the hash to recover the full file. Hence, they are said to be “one-way functions.”  Two different files are also extremely unlikely to hash to the same value by accident, which is why hashes are described as “collision-resistant.”  (MD5 itself no longer meets that standard against a deliberate attacker, since practical methods for constructing colliding files have been published, but it remains in common use as a file identifier.)
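
For readers who want to experiment, here is a minimal sketch using Python's standard hashlib module. The helper names md5_hex and md5_of_file are made up for this example. It shows that the digest is always 32 hexadecimal digits regardless of input size, and that the same input always yields the same digest.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as 32 hexadecimal digits."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


tiny = md5_hex(b"a")
large = md5_hex(b"a" * 1_000_000)

print(len(tiny), len(large))    # digest length does not depend on input size
print(tiny == md5_hex(b"a"))    # hashing the same input again gives the same digest
print(tiny == large)            # different inputs give different digests
```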

By metaphor, no two fingerprints look exactly the same. Likewise, you might be able to match a fingerprint to someone being investigated if you already have that individual’s fingerprints stored in your database. But if you do not have the fingerprint (hash) already in the database of known offenders then the suspect (or the malicious file) may pass through undetected. 

The mathematics of hashing algorithms has not changed since the 1990s and still works very well today for “known” malware.  So long as the incoming malware matches a full-file hash in the database, the AV triggers a positive match, warns the user/admin that malware was detected and sends the malicious file into quarantine.   Everything works as intended and the user remains secure.

In order to defeat this system of static file hashes all the malware needs to do is change one character in the original file and it will hash to a different value.  That value will not be in the AV’s library of known bad MD5 hashes. 

When malware rewrites or changes its own code every time it executes, it is said to be “polymorphic.”[6]  The defence against polymorphic malware is to scan using fuzzy logic.

Fuzzy definitions of malware might work when matching short character strings (for example, “SQL injection” and “SQL-injection” and “S.Q.L.Injection” and “sqlinjection” can all be treated as functionally equivalent) but do not work when matching full-file hashes, because a one-way hash function is designed so that even a tiny change in the input produces a completely different output.
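
The contrast can be sketched in a few lines of Python. The normalize function below is a deliberately simple assumption (lower-case the text and drop everything except letters and digits); it is not how any particular AV engine implements fuzzy matching.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lower-case the text and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


variants = ["SQL injection", "SQL-injection", "S.Q.L.Injection", "sqlinjection"]

# Fuzzy string matching: every variant collapses to the same normalized form.
normalized_forms = {normalize(v) for v in variants}

# Hash matching: every variant produces its own, unrelated digest.
digests = {hashlib.md5(v.encode("utf-8")).hexdigest() for v in variants}

print(normalized_forms)
print(len(digests), "distinct MD5 digests for", len(variants), "variants")
```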

Fuzzy definition #1 and fuzzy definition #2 will compute to entirely different values even though they are very similar in source code. Thus, computing the hashes for all the fuzzy variants of every known piece of malware in the wild would create a computational complexity problem that could not be solved by software running on a desktop computer.
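
A back-of-the-envelope calculation gives a feel for the scale. The sketch below counts only the variants produced by overwriting a few bytes of a single file with different values; the 100,000-byte file size and the function name variant_count are assumptions chosen for illustration, and real polymorphic engines can also insert, delete and reorder code.

```python
from math import comb


def variant_count(file_size: int, changed_bytes: int) -> int:
    """Number of distinct files obtained by overwriting exactly `changed_bytes`
    positions, each with one of the 255 other possible byte values."""
    return comb(file_size, changed_bytes) * 255 ** changed_bytes


FILE_SIZE = 100_000  # assumed size of one malware sample, in bytes

for k in (1, 2, 3):
    print(f"{k} changed byte(s): {variant_count(FILE_SIZE, k):,} variants to hash")
```

Each of those variants would need its own entry in a full-file hash database, and that is for one sample only.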

To emphasize this sensitivity consider the following two sentences:
“This sentence represents a known malicious file trying to own your computer.”

MD5 hash: 455b5d9e8edecf28d36c8516a7d3f55f [7]

“This sentence represents a known malicious file trying to pwn your computer.”

MD5 hash: 680e1773b086ccc63c2f813a14785a28

The malware author simply changed one letter, turning the word “own” into “pwn,” and the hash value is completely unrecognizable. Therein lies the death of AV.[8]
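
The experiment is easy to repeat in Python. This is only a sketch: the digests it prints depend on the exact bytes hashed (quotation marks, trailing newline, character encoding), so they are not guaranteed to match the values quoted above. The helper name md5_hex is an assumption for this example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 digest of the UTF-8 encoding of a string, as hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


original = "This sentence represents a known malicious file trying to own your computer."
modified = original.replace(" own ", " pwn ")  # change a single letter

original_digest = md5_hex(original)
modified_digest = md5_hex(modified)

# Count how many of the 128 output bits differ between the two digests.
differing_bits = bin(int(original_digest, 16) ^ int(modified_digest, 16)).count("1")

print(original_digest)
print(modified_digest)
print(f"{differing_bits} of 128 bits differ")
```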

To complicate matters even further, a new and malevolent kind of online service has emerged to aid the attackers.

Crypting services can start from $10 USD / month and will take your malware, obfuscate the source code, insert noisy stubs of nonsense (salting, chaffing) and test the new & enhanced malware against all the known AV engines in the market. [9]

So to summarize the problem with Anti-Virus consider this diagram:

[Diagram: the problem with Anti-Virus]

But fear not.  The arms race between Cat (defender) and Mouse (attacker) does not end with the death of AV.   A new armory of next-generation endpoint defences has hit the market, offering a wide range of solutions to address this problem. They include:

Hardware Isolation:

Hardware isolation works by micro-virtualizing the processes running on the endpoint (web browser tabs, Outlook emails, Excel documents, etc.), each in its own isolated micro-VM. These containers run within their own micro-OS, not touching the working memory of the parent operating system.  This prevents malicious scripts from infecting the host (isolating browser cross-site scripting attacks, application exploits, macros, break-out malware, etc.), and the entire micro-VM is terminated once the process is finished.

[Diagram: hardware isolation]

Detonation/Sandboxing:

Malware detonation works by executing the suspect file within a safe environment (“sandbox”) to observe its behaviour and identify it as malicious or safe.  This detonation can occur within an appliance inside your IT infrastructure (inline or tapped) or it can occur as part of a threat-cloud emulation external to your environment and maintained by the manufacturer.  In both cases, this approach can convict a file that has never been seen before, providing protection against “unknown” threats.
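
The sketch below illustrates only the final verdict step of such a system. It assumes the sandbox has already executed the file and produced a list of observed behaviours; the behaviour names, weights and threshold are all invented for this example and do not reflect any real product.

```python
from dataclasses import dataclass

# Hypothetical behaviours a sandbox might report, with made-up weights.
SUSPICIOUS_WEIGHTS = {
    "modifies_autorun_registry": 40,
    "encrypts_user_documents": 60,
    "contacts_unknown_host": 25,
    "injects_into_process": 50,
}

CONVICTION_THRESHOLD = 60  # assumed score at which a file is convicted


@dataclass
class Verdict:
    score: int
    malicious: bool
    reasons: list


def judge(observed_events: list) -> Verdict:
    """Score the behaviours seen during detonation; no file hash is consulted."""
    reasons = sorted(set(observed_events) & set(SUSPICIOUS_WEIGHTS))
    score = sum(SUSPICIOUS_WEIGHTS[name] for name in reasons)
    return Verdict(score, score >= CONVICTION_THRESHOLD, reasons)


print(judge(["opens_window", "reads_own_config"]))
print(judge(["contacts_unknown_host", "modifies_autorun_registry"]))
```

Because the verdict depends on what the file does rather than on what it hashes to, changing a byte of the file does not change the outcome in this sketch.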

[Diagram: detonation/sandboxing]

Artificial Intelligence:

Machine intelligence leverages supercomputer-scale deep learning algorithms to model file comparisons across millions of attributes (far more than a single human brain can process).  During the education phase, the machine is given known good and known bad files and uses its own AI logic to learn what malicious files (malware) look like.   Then, in the deployment phase, a lightweight agent uses these models to judge never-before-seen incoming files and makes a probabilistic judgment (e.g., 87.9% probably malicious).  The customer then designs their policies around their risk threshold (e.g., critical servers terminate all processes over 20% probably malicious).
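
The policy step can be sketched as a simple threshold lookup. The 20% threshold for critical servers comes from the example above; the workstation threshold, the asset class names and the function name decide are assumptions added for illustration, and the probability is taken as a given input rather than produced by a real model.

```python
# Risk thresholds per asset class. Only the critical server value comes from
# the article's example; the workstation value is an assumed illustration.
POLICY_THRESHOLDS = {
    "critical_server": 0.20,
    "workstation": 0.80,
}


def decide(asset_class: str, probability_malicious: float) -> str:
    """Terminate the process when the model's score exceeds the asset's threshold."""
    threshold = POLICY_THRESHOLDS[asset_class]
    return "terminate" if probability_malicious > threshold else "allow"


print(decide("critical_server", 0.879))
print(decide("critical_server", 0.15))
print(decide("workstation", 0.50))
```

The same score can therefore lead to different actions on different machines, depending on how much risk the customer is willing to accept for each.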

[Diagram: artificial intelligence learning]

Memory Exploit Control:

Rather than trying to match the exact hammer, this methodology simply listens for the sound the hammer makes.  That is, every attack must execute one of a limited number of exploit techniques (e.g., memory corruption techniques such as heap sprays and buffer overflows, or the spawning of child processes).  The Exploit Control agent sits in memory listening for these techniques and, once one is detected, terminates the culprit process and reports back to the user/admin that an attempted attack occurred.  Since most advanced intrusions involve a multi-stage, multi-vector attack anatomy, this approach has the advantage that it only needs to break one stage of the kill chain to defeat the threat.

[Diagram: memory exploit control]

Application Whitelisting:

Rather than going in circles trying to determine if an unknown application is good or bad, why not simply block everything unknown?  Application whitelisting does just that.  It determines what software you trust to execute in your environment and stops everything else from running.  This principle of least privilege applied to endpoints is very powerful, as it restricts the attack surface so that only mission-critical or approved applications are allowed to run.
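
A minimal default-deny sketch follows. It identifies approved software by SHA-256 digest, which is an assumption of this example (real products may also use publisher certificates, file paths or other attributes), and the byte strings stand in for real application binaries.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of a binary, used here as its identity."""
    return hashlib.sha256(data).hexdigest()


# Placeholder "binaries" standing in for the organization's approved software.
PAYROLL_APP = b"approved payroll application"
WEB_BROWSER = b"approved web browser"

APPROVED_DIGESTS = {sha256_hex(PAYROLL_APP), sha256_hex(WEB_BROWSER)}


def may_execute(binary: bytes) -> bool:
    """Default deny: only binaries whose digest is on the approved list may run."""
    return sha256_hex(binary) in APPROVED_DIGESTS


print(may_execute(PAYROLL_APP))                       # approved application
print(may_execute(b"never-before-seen malware"))      # unknown file
print(may_execute(PAYROLL_APP + b" [tampered]"))      # approved app, modified
```

Here the sensitivity of hashing works in the defender's favour: a modified or unknown file does not match any approved digest and is blocked without anyone having to decide whether it is malicious.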

[Diagram: application whitelisting]

 This list represents just five leading next-gen approaches, of which there are many more. This new market in cybersecurity can be challenging to navigate. Allow Herjavec Group to share our expertise in endpoint protection with your team. Book a session with one of our security specialists and technical architects to walk through our Next Generation Endpoint Toolkit. This session will equip your security team to speak the language of endpoint protection and feel more comfortable identifying the best solutions to fit your organization’s needs.

For more information contact a Herjavec Group Security Specialist.


 

Notes

[1] In 1987, Fred Cohen prophetically wrote that, “There is no algorithm that can perfectly detect all possible computer viruses.” This utterance was made in the same year that John McAfee launched VirusScan, arguably the first AV product ever sold. Fred Cohen’s observation remains prophetic today even though it took nearly 30 years for Anti-Virus software to play out the full lifecycle of its existence. If you are interested in the seminal mathematics research of Fred Cohen’s proof see the article “An Undetectable Computer Virus,” by David M. Chess and Steve R. White,

https://web.archive.org/web/20110604155118/http://www.research.ibm.com/antivirus/SciPapers/VB2000DC.htm

[2] Each Anti-Virus manufacturer has a slightly different approach to what files get scanned, when they get scanned (opening file, saving file, etc.) and how they are scanned.

[3] Note that the 3-tiered classification offered here is a generalization of AV products (each differs slightly in how they generate signatures) and a simplification of the options for the sake of clarity. For example, Autosig developed in 2006 generates signatures by performing statistical analysis of byte frequency in the invariant code shared by parent families of viruses. http://www.gecode.org/~schulte/teaching/theses/ICT-ECS-2006-122.pdf

[4] MD5 was designed by Ron Rivest at MIT in 1991 and published in April 1992 as RFC 1321, “The MD5 Message-Digest Algorithm.” https://tools.ietf.org/html/rfc1321

[5] US-CERT – Department of Homeland Security: https://www.us-cert.gov/

[6] Raghunathan, Srinivasan (2007). “Protecting anti-virus software under viral attacks.” M.Sc. Thesis, Arizona State University

[7] Try it yourself. Use this online MD5 generator: http://www.danstools.com/md5-hash-generator/

[8] The “Death of AV” phrase made headlines in the mainstream media as recently as 2014, when Symantec’s Brian Dye declared to the Wall Street Journal that antivirus “is dead.” http://securitywatch.pcmag.com/security/323419-symantec-says-antivirus-is-dead-world-rolls-eyes

[9] Famous cyber security journalist Brian Krebs has a good description of darkweb crypting services in his blog article entitled, “Antivirus is dead. Long live antivirus.” http://krebsonsecurity.com/2014/05/antivirus-is-dead-long-live-antivirus/#more-25861



