YARA is a useful member of the toolset of researchers, threat hunters, incident responders, and many other defenders. At its core are two essential capabilities. The first is to match static qualities of a file or region of memory. The second is to provide a way to express logic over combinations of these static qualities. There are many major and minor use cases for YARA. For a researcher, one of the most powerful is hunting for files inside of, and on the way into, a malware repository. Including hunting, five general use cases for YARA are outlined below. The examples given here are distilled and simplified for illustrative purposes.
Filtration
This first use case focuses on conserving scarce resources in an organization. Even though fully-automated analysis is not the most difficult process according to Lenny Zeltser's four stages [1], the tools that carry out this task consume limited resources. To use these resources more efficiently, a filter should remove files that do not require further analysis before they are processed. This prevents time and money from being wasted on unnecessary processing. This is especially true when using a sandbox-type automated malware analysis (AMA) system. These systems can require full execution or emulation of the samples and often have timeouts measured in minutes rather than seconds. By implementing a YARA-based filter to exclude files, more of this scarce resource is left to be applied to important or unknown files.
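As a rough sketch of such a pre-filter, the Python below drops files that match any exclusion check before they reach the sandbox queue. The checks are plain byte tests standing in for compiled YARA rules, and the exclusion policy itself (empty files, PNG images) is purely hypothetical. Which files do not require further analysis is a decision each organization has to make for itself.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical exclusion checks standing in for YARA rules.
# Each entry maps a label to a predicate over the file's raw bytes.
EXCLUSIONS: Dict[str, Callable[[bytes], bool]] = {
    "empty_file": lambda data: len(data) == 0,
    "png_image": lambda data: data.startswith(b"\x89PNG\r\n\x1a\n"),
}


def filter_queue(files: Dict[str, bytes]) -> Tuple[List[str], Dict[str, str]]:
    """Split files into those to submit and those excluded (with the reason)."""
    submit: List[str] = []
    excluded: Dict[str, str] = {}
    for name, data in files.items():
        # First exclusion label whose check matches, or None if nothing matches.
        reason = next(
            (label for label, check in EXCLUSIONS.items() if check(data)), None
        )
        if reason is None:
            submit.append(name)
        else:
            excluded[name] = reason
    return submit, excluded


# Illustrative queue; the file names and contents are invented.
queue = {
    "logo.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 16,
    "blank.bin": b"",
    "invoice.exe": b"MZ\x90\x00" + b"\x00" * 16,
}
to_sandbox, skipped = filter_queue(queue)
print(to_sandbox)
print(skipped)
```

Recording the reason for each exclusion keeps the filter auditable, which matters if an excluded file later turns out to have deserved analysis.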
Email Analysis
Using YARA to triage and analyze malicious emails poses a few fundamental challenges that differ from those of the other use cases. Raw email data must be encoded as text and, due to limitations in this legacy protocol, the lines of this text have a limited length. The two main encodings of this type used by the Multipurpose Internet Mail Extensions (MIME) [2] are quoted-printable [3] and Base64 [4]. Because of this, an indicator such as an email address, URL, hostname, or IP address can end up split across two lines at an arbitrary location in the indicator. This can be seen in Figure 1, where a malicious URL is split across two lines of quoted-printable text in an email body.
Figure 1: URL Indicator Split Across Two Lines
This is a highly simplified example to best illustrate the challenge. In Figure 2, the encoding is removed using a CyberChef [5] recipe which shows the full URL indicator.
Figure 2: Decode Quoted Printable and Extract URL Indicator [6]
URL analysis shows that this link led to an Office 365 / SharePoint Online phishing page [7]. One open source tool for processing emails in a way that exposes the data beneath the encoding to a set of YARA rules is PM_Shredder [8], introduced in 2014 [9]. This solves the problem of applying a YARA rule directly to the raw email and potentially missing indicators because they are split across two lines. Figure 3 shows the results of submitting this same email to the Titanium Platform. The attachments are automatically decoded, and any installed YARA rules are then applied not only to the email itself but also to the decoded attachments. If a matching YARA rule has the tag "malicious", those files are marked with a threat detection, as seen in the top two results.
Figure 3: Malicious Email Triage Results
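The line-splitting problem can be reproduced with Python's standard library alone. The sketch below parses a hypothetical quoted-printable message (the addresses and the `login.example.com` URL are invented for illustration) and compares a search over the raw bytes with a search over the decoded body. It is not how PM_Shredder or any product is implemented; it only shows why decoding has to happen before matching.

```python
import email
from email import policy
from typing import Iterator

# Hypothetical raw message. The URL is split by a quoted-printable soft
# line break ("=" at the end of a line), and "=" itself is encoded as "=3D".
RAW_EMAIL = (
    b"From: sender@example.com\r\n"
    b"To: recipient@example.com\r\n"
    b"Subject: Invoice\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Please review the document at https://login.exam=\r\n"
    b"ple.com/portal?id=3D42 today.\r\n"
)

# The indicator as an analyst would write it in a rule.
INDICATOR = b"https://login.example.com/portal?id=42"


def decoded_parts(raw: bytes) -> Iterator[bytes]:
    """Yield the transfer-decoded bytes of every non-multipart MIME part."""
    message = email.message_from_bytes(raw, policy=policy.default)
    for part in message.walk():
        if part.is_multipart():
            continue
        # decode=True undoes quoted-printable or Base64 transfer encoding.
        payload = part.get_payload(decode=True)
        if payload:
            yield payload


found_in_raw = INDICATOR in RAW_EMAIL
found_in_decoded = any(INDICATOR in body for body in decoded_parts(RAW_EMAIL))
print("raw match:", found_in_raw)
print("decoded match:", found_in_decoded)
```

Because `walk()` visits every part of the message, the same loop would also hand Base64-encoded attachments to the matching step in decoded form.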
Memory Analysis
During malware research, and especially during incident response, one must often analyze the memory of an infected computer. One goal of this type of analysis is to identify processes that are suspicious or malicious. Memory analysis requires two basic tools: one to collect the memory and another to analyze it. A recommended format for the collection step is the Advanced Forensics File Format (AFF4) [10]. This format is supported by the memory collection tool WinPmem [11]. Figure 4 shows the collection process on a computer that is also running Ryuk ransomware.
Figure 4: WinPmem Collecting Memory in AFF4 Format
Once the memory has been collected, the physical memory must then be extracted from the AFF4 file. This advanced forensic format is structured data and contains more than just the memory itself. Figure 5 shows the command to extract the physical memory from the AFF4 file.
Figure 5: Command to Extract Physical Memory from AFF4
One excellent tool for analyzing collected memory is Volatility [12]. Even though the Python 3 version of Volatility has been released in beta, the version used here is 2.6.1 on Python 2. Figure 6 shows the results of the analysis plugin "yarascan". This plugin scans either the whole memory image or one process using a YARA ruleset of the user's choice.
Figure 6: YARA Match on Ryuk String in Process
The particular rule used in this example is simplified for illustrative purposes. This rule is seen in Figure 7.
Figure 7: YARA Rule Matching Ryuk String
As we can see, the combination of YARA signatures applied to a memory image using Volatility provides a quick method for identifying malicious processes. This first step can highlight locations that should be investigated further and more closely.
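To make the idea of scanning a whole memory image concrete, here is a simplified stand-in written in plain Python. It searches a raw image for one byte string, reading in chunks and keeping a small overlap so that a match straddling a chunk boundary is not missed. This is an assumption-laden sketch: it does not reconstruct process address spaces as Volatility does, it matches a single literal instead of a YARA ruleset, and `EXAMPLE_MARKER` is a placeholder, not the string from Figure 7.

```python
import io
from typing import BinaryIO, List


def find_pattern_offsets(
    image: BinaryIO, pattern: bytes, chunk_size: int = 1 << 20
) -> List[int]:
    """Return the absolute offset of every occurrence of pattern in a raw image."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # bytes carried over between chunks
    offsets: List[int] = []
    tail = b""
    base = 0  # absolute offset of the first byte of the current window
    while True:
        chunk = image.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        start = 0
        while True:
            hit = window.find(pattern, start)
            if hit == -1:
                break
            offsets.append(base + hit)
            start = hit + 1
        # The tail is shorter than the pattern, so it can never hold a full
        # match on its own and no offset is reported twice.
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
    return offsets


MARKER = b"EXAMPLE_MARKER"  # placeholder pattern, 14 bytes long
fake_image = io.BytesIO(
    b"\x00" * 10 + MARKER + b"\x00" * 20 + MARKER + b"\x00" * 5
)
# A tiny chunk size forces both matches to straddle chunk boundaries.
offsets = find_pattern_offsets(fake_image, MARKER, chunk_size=16)
print(offsets)
```

Offsets in a physical image are only a starting point. Tying a hit back to a specific process is the part that a framework such as Volatility adds.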
Retrohunting
A major part of the research process is building rules in YARA that match assorted features of a malware sample. As the reverse engineering process progresses, the researcher is always looking for idiosyncratic or unique code, strings, or combinations of the two. This is often an iterative process that starts with a single sample. From that sample, one approach is to identify static features that can be translated into a YARA rule and then deployed in a retrohunting system [13]. The goal of this process is to identify other malicious samples that share the identified static feature with the sample being analyzed. By identifying related files, one can learn about the sample from what is already known about those files. This is an especially powerful process when applied to incident response. When working on an incident, the responder needs to learn as much as possible about a file in as short a time as possible. By deploying YARA rules in a retrohunt, the responder may be able to link the file from the incident to files associated with other incidents. Those other incidents may provide insight into an effective course of action that the responder can reuse in the current incident. Even though attribution is hard, clusters of files that share a feature can be one piece of the puzzle from which attribution may emerge.
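The linking step can be sketched in a few lines of Python. In this minimal example, a single byte sequence stands in for a YARA rule, and the repository, the sample identifiers, the incident labels, and the feature itself are all hypothetical. A real retrohunting system would run compiled rules over a far larger corpus.

```python
from typing import Dict, List


def retrohunt(
    repository: Dict[str, bytes],
    incident_of: Dict[str, str],
    feature: bytes,
) -> Dict[str, List[str]]:
    """Group the IDs of stored samples that contain feature by incident label."""
    clusters: Dict[str, List[str]] = {}
    for sample_id in sorted(repository):
        if feature in repository[sample_id]:
            label = incident_of.get(sample_id, "unknown")
            clusters.setdefault(label, []).append(sample_id)
    return clusters


# Hypothetical static feature extracted from the sample under analysis.
FEATURE = b"\xde\xad\xbe\xef_cfg_v2"

# Hypothetical repository contents and incident labels.
repository = {
    "sample-001": b"header" + FEATURE + b"payload",
    "sample-002": b"unrelated content",
    "sample-003": FEATURE + b"tail",
    "sample-004": b"head" + FEATURE,
}
incident_of = {
    "sample-001": "IR-2019-07",
    "sample-002": "IR-2019-11",
    "sample-003": "IR-2020-02",
    "sample-004": "IR-2019-07",
}

related = retrohunt(repository, incident_of, FEATURE)
print(related)
```

Grouping matches by incident, as opposed to returning a flat list, puts the earlier cases in front of the responder right away, so they know which ones to consult.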
Malware Hunting
This type of hunting is based on the same YARA rules that one uses in a retrohunt. However, the direction of the hunt is different. A retrohunt looks backwards in time to match samples that have already been collected and are kept in a malware repository. Hunting can additionally be forward-looking, where the YARA rules are applied to files that are flowing in from any number of sources. Files collected from email attachments, carved from network traffic, or executed on an endpoint can all feed a flow of data into a malware repository. At the entrance of the repository is the YARA matching system. Alerts based on these YARA matches can fuel multiple processes for the defender. One may be trying to identify novel variants of known malware or simply looking for features that correlate with maliciousness. The goal of the hunt dictates the style of the rule the defender deploys. Some rules cast a very wide net; they are written to match a specific malware feature, but are not so narrow that they identify a single malware family. An example of this would be a rule that detects the specific XOR key used in a type of PowerShell [14] stager that is known to deliver shellcode or Cobalt Strike reflective loaders.
rule PowerShell_Stager
{
    strings:
        $a = "-bxor 35"
    condition:
        $a
}
This rule targets a category of file used during the delivery stage of an intrusion. Many different adversaries may be using this same script as a template to deliver radically different types of shellcode. Once this broader rule catches the category, further analysis is needed on the embedded shellcode. This rule is also prone to false positives because of how wide a net it casts. But this is not a permanent problem. Hunting is an iterative process. If a wide-net rule like this produces many false positives (FPs), one must analyze those FPs with the goal of identifying a feature among them that can be used to filter them out. This harks back to the first use case, filtration.
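One iteration of that refinement loop might look like the following Python sketch. The wide-net literal is the same one used in the PowerShell_Stager rule, while the exclusion marker is a hypothetical feature that an analyst might have found while reviewing false positives. The file names and contents are invented, and plain substring tests stand in for a YARA engine.

```python
from typing import Dict, List

# Same literal as the string in the PowerShell_Stager rule.
WIDE_NET = b"-bxor 35"

# Hypothetical feature shared by reviewed false positives.
FP_EXCLUSIONS: List[bytes] = [b"# Training lab exercise"]


def triage(sample: bytes) -> str:
    """Classify an incoming file as 'ignore', 'excluded', or 'alert'."""
    if WIDE_NET not in sample:
        return "ignore"  # the wide-net feature is absent
    if any(marker in sample for marker in FP_EXCLUSIONS):
        return "excluded"  # matched, but carries a known FP feature
    return "alert"  # matched and needs analysis of the embedded shellcode


# Illustrative incoming files.
incoming: Dict[str, bytes] = {
    "stager.ps1": b"for ($i = 0; $i -lt $b.Count; $i++) { $b[$i] = $b[$i] -bxor 35 }",
    "lab.ps1": b"# Training lab exercise\n$x = $y -bxor 35",
    "notes.txt": b"meeting notes",
}
results = {name: triage(data) for name, data in incoming.items()}
print(results)
```

In YARA itself, the same refinement would typically be written as an additional string plus a `not` term in the rule's condition, so the exclusion travels with the rule.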
The other goal of hunting is to identify and differentiate among malware families as files are collected and analyzed. One of the REVERSING 2020 sessions, presented by Tomislav Peričin, covered this style of YARA rule and presented a ruleset for identifying specific malware families.
[1] https://zeltser.com/mastering-4-stages-of-malware-analysis/
[2] https://en.wikipedia.org/wiki/MIME
[3] https://en.wikipedia.org/wiki/Quoted-printable
[4] https://en.wikipedia.org/wiki/Base64
[5] https://github.com/gchq/CyberChef
[6] hxxps[://]instinctive-cake-waxflower[.]glitch[.]me/
[7] https://urlscan.io/result/90f2e06d-d9ad-4953-bc1b-f79f3588c93c/
[8] https://github.com/x41x41x90/pm_shredder
[9] http://doczz.net/doc/6575584/all-your-metadatas-are-belong-to-me--reverse-engineering
[10] http://www2.aff4.org/index.html
[11] https://github.com/Velocidex/c-aff4/releases
[12] https://github.com/volatilityfoundation/volatility
[13] https://www.youtube.com/watch?v=T8utVmUbxlk
[14] https://attack.mitre.org/techniques/T1086/