The purpose of YARA rules is to improve our methods of malware detection. New malware families appear and evolve every day, so it is important to provide our clients with tools to protect themselves. This is why ReversingLabs' threat research team continually writes YARA rules, to deliver an open-source, working tool that detects the latest malware families.
The rules must also be as precise and specific as possible to prevent false positives. High-quality YARA rules allow our clients to keep their defenses up to date, giving them the best chance of preventing security incidents.
Since the threat landscape is constantly changing, the research team at ReversingLabs continuously updates the company's public YARA rules repository on GitHub with rules for new and current threats. This blog post describes how we write our high-quality YARA rules, using the YARA rule for the GwisinLocker ransomware as an example.
Choose the target
Writing high-quality YARA rules is a time-consuming process, which means that our team must choose its battles. There are many criteria for choosing a malware family, but the samples chosen for analysis are usually those known to have a big impact and to be highly prevalent in the threat landscape, such as:
- New ransomware which targets big companies and businesses
- Destructive wipers used as a means of cyber warfare
- Spyware and backdoors used by various APT groups
Most malware in the modern threat landscape is packed with custom or off-the-shelf packers to make analysis and signature matching harder. This is why our team checks whether samples are packed before writing a YARA rule. YARA rules should match the malicious code, not the packing layer, so we write them against the second, unpacked layer. This also makes them suitable for deployment in dynamic analysis solutions, for runtime inspection. ReversingLabs can automatically unpack more than 400 executable packer formats.
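One common first-pass heuristic for spotting packed samples (not necessarily the check ReversingLabs uses) is byte entropy: compressed or encrypted data tends toward the maximum of 8 bits per byte. The minimal Python sketch below computes Shannon entropy; the 7.2 threshold is an illustrative assumption, and in practice entropy is usually measured per section rather than over a whole file.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Illustrative threshold (an assumption): packed or encrypted data is usually close to 8.0
PACKED_ENTROPY_THRESHOLD = 7.2

def looks_packed(data: bytes, threshold: float = PACKED_ENTROPY_THRESHOLD) -> bool:
    """Flag data whose entropy suggests compression or encryption."""
    return shannon_entropy(data) >= threshold
```

A high score only suggests packing; plain compressed resources can score just as high, so the result is a hint for manual review rather than a verdict.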
When malware is packed with an unidentified custom packer, it must be unpacked manually. This typically involves using a debugger to analyze the packer layer, identifying where execution is passed to the second layer, and extracting the payload. One common technique that packers use to execute the packed code is process injection, which comes in several variants, including self-injection, PE injection, and process hollowing. All of these variants can be recognized by the characteristic sequence of API calls made during unpacking.
In a nutshell, the process into which the packer injects the payload first needs to be created or opened (using the CreateProcess or OpenProcess API). Additional memory might then be allocated in that process with VirtualAllocEx and populated with the payload using WriteProcessMemory. Other APIs that might be invoked include:
- VirtualProtectEx
- ReadProcessMemory
- CreateRemoteThread
- ResumeThread
- NtResumeThread
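As a rough illustration, the API pattern described above can be checked against a recorded API call trace by looking for its steps in order, allowing unrelated calls in between. In this Python sketch the step list is a simplified assumption: real traces also contain native variants such as NtWriteVirtualMemory, and VirtualAllocEx, which the text describes as optional, is treated here as a required step.

```python
from typing import Iterable, Sequence

# Each step is a set of alternative APIs; steps must occur in this order.
# This list is a simplified assumption based on the pattern described in the text.
INJECTION_STEPS: Sequence[frozenset] = (
    frozenset({"CreateProcessA", "CreateProcessW", "OpenProcess"}),
    frozenset({"VirtualAllocEx"}),
    frozenset({"WriteProcessMemory"}),
    frozenset({"CreateRemoteThread", "ResumeThread", "NtResumeThread"}),
)

def matches_injection_pattern(api_calls: Iterable[str],
                              steps: Sequence[frozenset] = INJECTION_STEPS) -> bool:
    """Return True if every step appears in order (other calls may appear in between)."""
    step = 0
    for name in api_calls:
        if step < len(steps) and name in steps[step]:
            step += 1
    return step == len(steps)
```

Matching an ordered subsequence, rather than an exact sequence, keeps the check tolerant of the extra calls (for example SetThreadContext in process hollowing) that different injection variants make.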
Execution then resumes in the malicious payload instead of the original process’s code. The payload can be dumped from memory in several ways, and it’s important to save it in an executable format for later analysis.
To avoid duplicating effort, every unpacked sample is scanned against our entire YARA rule collection to see which rules, if any, match. This makes it easy to track novel malware as well as new versions of known malware.
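When a dumped memory region is searched for the unpacked payload, one simple approach is to look for an "MZ" header whose e_lfanew field (at offset 0x3C of the DOS header) points to a "PE\0\0" signature. The Python sketch below is a hypothetical helper, not a ReversingLabs tool; an image carved this way is still in its in-memory layout and may need its sections realigned before it can be analyzed as a regular executable.

```python
import struct
from typing import List

def find_pe_images(dump: bytes) -> List[int]:
    """Return offsets in a raw memory dump where a plausible PE image starts.

    A candidate must begin with "MZ", and its e_lfanew field (offset 0x3C)
    must point to a "PE\\0\\0" signature that lies inside the dump.
    """
    offsets = []
    start = 0
    while True:
        i = dump.find(b"MZ", start)
        # Stop when no candidate is left or the DOS header would be truncated
        if i == -1 or i + 0x40 > len(dump):
            break
        (e_lfanew,) = struct.unpack_from("<I", dump, i + 0x3C)
        pe = i + e_lfanew
        if pe + 4 <= len(dump) and dump[pe:pe + 4] == b"PE\x00\x00":
            offsets.append(i)
        start = i + 1
    return offsets
```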
Do detailed, in-depth analysis
Every malware family has its own characteristics and set of behaviors, and the way these are implemented in code differs from one family to another. However, each malware type (such as ransomware or backdoors) can usually be described by a set of common actions that all families of that type share. For ransomware like GwisinLocker, the behaviors we are interested in are:
- Finding the files
- Encrypting files
- Dropping the ransom note
- Establishing a remote connection with the C2 server
- Decrypting the malware configuration
One of the more interesting behaviors we found in GwisinLocker is that it shuts down VMware ESXi virtual machines before encryption. The part of the code that implements this behavior can be seen in the picture below. The constants visible in the picture are fragments of strings built on the stack as a method of obfuscation. Together, they represent the following command:
esxcli vm process kill --type=force --world-id="[ESXi] Shutting down - %s"
Stack strings are an obfuscation method in which a string is built on the stack one or a few characters at a time, so it never appears in the binary as a contiguous string. The purpose of this technique is to confuse the reverse engineer and slow down analysis. We will use this part of the code to create a behavior-focused pattern. The hardcoded stack strings are a good choice for a byte pattern because they make the pattern more unique and specific, which in turn reduces the probability of false positives once the YARA rule is deployed. The created pattern can be seen in the picture below.
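To make the idea concrete, here is a minimal Python sketch that rebuilds a stack string from raw x86 instruction bytes. It assumes only two encodings, mov byte ptr [ebp+disp8], imm8 (opcode C6 45) and mov dword ptr [ebp+disp8], imm32 (opcode C7 45); real samples, especially 64-bit ones, often use other forms such as RSP-relative or 64-bit immediate moves, which this sketch ignores.

```python
import struct

def recover_stack_string(code: bytes) -> str:
    """Rebuild a stack string from mov [ebp+disp8], imm instructions.

    Only two encodings are handled (a simplifying assumption):
      C6 45 dd ii           mov byte  ptr [ebp+dd], ii
      C7 45 dd ii ii ii ii  mov dword ptr [ebp+dd], iiiiiiii
    """
    writes = {}  # stack offset -> byte written there
    i = 0
    while i < len(code) - 3:
        if code[i] == 0xC6 and code[i + 1] == 0x45:
            disp = struct.unpack("b", code[i + 2:i + 3])[0]  # signed disp8
            writes[disp] = code[i + 3]
            i += 4
        elif code[i] == 0xC7 and code[i + 1] == 0x45 and i + 7 <= len(code):
            disp = struct.unpack("b", code[i + 2:i + 3])[0]
            # imm32 is stored little-endian, so its bytes land in memory in order
            for k, b in enumerate(code[i + 3:i + 7]):
                writes[disp + k] = b
            i += 7
        else:
            i += 1
    # Read the bytes in ascending stack address order, stopping at the NUL terminator
    chars = []
    for off in sorted(writes):
        if writes[off] == 0:
            break
        chars.append(chr(writes[off]))
    return "".join(chars)

# mov dword [ebp-0x10], "esxc"; mov byte [ebp-0x0C], 'l'; mov byte [ebp-0x0B], 'i'; mov byte [ebp-0x0A], 0
sample = bytes.fromhex("C745F065737863" "C645F46C" "C645F569" "C645F600")
print(recover_stack_string(sample))  # esxcli
```

The same property that makes stack strings annoying for analysts (the characters sit between fixed opcode bytes) is what makes the surrounding instruction bytes a distinctive byte pattern.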
This small rule, which contains only the behavior-focused pattern, is evaluated against samples in our cloud to identify other potentially interesting samples with similar behavior that might have been missed during the initial sample collection. The results should be analyzed to see how similar (or different) the matched samples are. The possible conclusions from this step are:
- The samples are very similar. This means that we are on the right track and that they probably belong to the same malware family.
- The samples are notably different. This means that the code pattern is not unique to the malware being analyzed, or it might be a part of a common library which is reused among different malware. Either way, the pattern needs to be expanded with more specific data, or supplemented with other parts of the code which are more unique to this malware family.
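One simple way to put a number on "how similar" matched samples are is the Jaccard similarity of their byte n-gram sets. The Python sketch below is an illustrative assumption rather than the team's actual comparison method, and the 4-byte n-gram size is arbitrary: scores near 1.0 suggest near-identical code, while low scores suggest the pattern is shared by unrelated code.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Set of all overlapping n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def code_similarity(x: bytes, y: bytes, n: int = 4) -> float:
    """Jaccard similarity of the byte n-grams of two code buffers."""
    return jaccard(byte_ngrams(x, n), byte_ngrams(y, n))
```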
YARA rule structure matters
Every rule consists of the "meta" section, the "strings" section, and the "condition" section. They are described in detail below.
The meat of the 'meta' section
Every rule needs to have a "meta" section, which is divided into two parts:
CCCS YARA metadata
We've decided to conform to the publicly available CCCS YARA validator. The specification requires several fields to be present, among which the most important are “sharing” and “malware.” The "sharing" field describes the sharing limitations of the YARA rule; the value "TLP:WHITE" means that the rule can be freely distributed. The "malware" field contains the name of the malware family that the rule detects. Our YARA rules aim to detect samples that belong to the "MALWARE" category, so this field always holds their family name.
| Field | Value |
|---|---|
| author | Always set to "ReversingLabs" |
| source | Always set to "ReversingLabs" |
| status | Always set to "RELEASED" |
| sharing | Always set to "TLP:WHITE" |
| category | Always set to "MALWARE" |
| malware | Malware family name in uppercase, in the form MALWAREFAMILYNAME |
| description | Always begins with "YARA rule that detects..."; only the malware family name and type change |
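The fixed values in the table above are easy to check mechanically. This Python sketch is a hypothetical helper, and no substitute for the CCCS validator itself, which checks additional fields; it only reports fields that deviate from the conventions described here.

```python
from typing import Dict, List

# Fixed values taken from the table above
FIXED_FIELDS = {
    "author": "ReversingLabs",
    "source": "ReversingLabs",
    "status": "RELEASED",
    "sharing": "TLP:WHITE",
    "category": "MALWARE",
}
DESCRIPTION_PREFIX = "YARA rule that detects"

def check_cccs_meta(meta: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the fields follow the convention."""
    problems = []
    for field, expected in FIXED_FIELDS.items():
        if meta.get(field) != expected:
            problems.append(f"{field}: expected {expected!r}, got {meta.get(field)!r}")
    # Family name must be a single uppercase word, e.g. MALWAREFAMILYNAME
    malware = meta.get("malware", "")
    if not malware or malware != malware.upper() or not malware.isalnum():
        problems.append(f"malware: expected an uppercase family name, got {malware!r}")
    if not meta.get("description", "").startswith(DESCRIPTION_PREFIX):
        problems.append(f"description: must begin with {DESCRIPTION_PREFIX!r}")
    return problems
```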
For more detailed explanations of the fields, check out the CCCS YARA standard configuration page and see how they’re used in our public YARA rules.
ReversingLabs-specific YARA metadata
ReversingLabs’ YARA rules are one of multiple classification methods, and they supplement more complex classifiers for added protection. In order for ReversingLabs’ core engine to correctly classify files using YARA rules, additional metadata must be present. The required metadata has the following structure:
| Field | Value |
|---|---|
| tc_detection_type | MalwareFamilyType from the rule name |
| tc_detection_name | MalwareFamilyName from the rule name |
| tc_detection_factor | Usually set to 5, but can depend on the threat type |
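Because the tc_ fields come straight from the rule name, they can be derived automatically. This Python sketch assumes rule names of the form Platform_MalwareFamilyType_MalwareFamilyName (for example, the hypothetical Linux_Ransomware_GwisinLocker) and uses the usual factor of 5 from the table as its default.

```python
from typing import Dict

def tc_metadata(rule_name: str, factor: int = 5) -> Dict[str, object]:
    """Build the ReversingLabs-specific fields from a rule name.

    Assumes (hypothetically) names shaped like Platform_MalwareFamilyType_MalwareFamilyName.
    """
    parts = rule_name.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected rule name format: {rule_name!r}")
    _platform, family_type, family_name = parts
    return {
        "tc_detection_type": family_type,
        "tc_detection_name": family_name,
        "tc_detection_factor": factor,
    }
```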
An example of the "meta" section for the GwisinLocker ransomware can be seen in the following image:
Another example can be seen in the YARA rule for the HermeticWiper malware which was covered in one of our previous From the Labs blog posts.
The "strings" section
As analysts, our team members commonly need to update each other’s rules, and we must be mindful of how fast rules are evaluated, given the millions of files ReversingLabs processes daily. The following good practices increase the readability and evaluation speed of YARA rules:
- Standardize the indentation and be consistent with it. For example, if you use one tab for the indentation, make sure it applies in all your rules.
- Break longer patterns into multiple, sequentially named subpatterns (e.g. $encrypt_files_p1, $encrypt_files_p2, ...)
- Patterns shouldn't start or end with optional, masked bytes (question marks).
- Use patterns with longer sequences of exact (non-optional) bytes, as they serve as anchors for YARA's initial fast search
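The splitting and masking guidelines above can be partly automated. The Python sketch below is hypothetical tooling: split_pattern cuts a hex string into sequentially named subpatterns, and lint_hex_pattern flags masked bytes at either end and a missing run of exact bytes. It assumes space-separated tokens; jumps and alternations simply reset the exact-byte run, and the minimum run of 4 is an assumption loosely based on YARA extracting atoms of up to 4 bytes.

```python
import re
from typing import List

def split_pattern(name: str, hex_pattern: str, tokens_per_part: int = 16) -> List[str]:
    """Split a hex pattern into subpatterns named $<name>_p1, $<name>_p2, ..."""
    tokens = hex_pattern.split()
    parts = []
    for n, offset in enumerate(range(0, len(tokens), tokens_per_part), start=1):
        chunk = " ".join(tokens[offset:offset + tokens_per_part])
        parts.append(f"${name}_p{n} = {{ {chunk} }}")
    return parts

EXACT_BYTE = re.compile(r"[0-9A-Fa-f]{2}")

def lint_hex_pattern(hex_pattern: str, min_anchor: int = 4) -> List[str]:
    """Return style problems for a space-separated YARA hex pattern."""
    tokens = hex_pattern.split()
    if not tokens:
        return ["pattern is empty"]
    problems = []
    if "?" in tokens[0] or "?" in tokens[-1]:
        problems.append("pattern starts or ends with a masked byte")
    longest = run = 0
    for tok in tokens:
        # Any token that is not a plain byte (??, 4?, [2-4], parentheses) breaks the run
        run = run + 1 if EXACT_BYTE.fullmatch(tok) else 0
        longest = max(longest, run)
    if longest < min_anchor:
        problems.append(f"longest exact-byte run is {longest}, want at least {min_anchor}")
    return problems
```

A naive split can produce parts that begin or end with ??, so each part should be linted and its boundaries adjusted by hand.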
An example of correctly written and split patterns is the kill_processes pattern from the GwisinLocker YARA rule, which can be seen in the following image:
The "condition" section
The rules are evaluated on PE and ELF files, so the "magic" bytes at the beginning of each file need to be checked:
- uint16(0) == 0x5A4D - The "MZ" header for PE files
- uint32(0) == 0x464C457F - The "\x7FELF" header for ELF files
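YARA's uint16() and uint32() functions read integers in little-endian byte order, which is why the two-byte "MZ" signature appears as 0x5A4D. The Python sketch below mirrors both checks using the standard struct module; the function name is illustrative.

```python
import struct

MZ_MAGIC = 0x5A4D       # b"MZ" read as a little-endian uint16
ELF_MAGIC = 0x464C457F  # b"\x7fELF" read as a little-endian uint32

def file_kind(header: bytes) -> str:
    """Mirror the YARA checks uint16(0) == 0x5A4D and uint32(0) == 0x464C457F."""
    if len(header) >= 2 and struct.unpack_from("<H", header, 0)[0] == MZ_MAGIC:
        return "PE"
    if len(header) >= 4 and struct.unpack_from("<I", header, 0)[0] == ELF_MAGIC:
        return "ELF"
    return "other"
```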
When writing conditions, the team uses a whitespace-heavy style, to keep the rules consistent and readable. Additionally, we split the blocks by logical operators, to make it visually easy to see how the patterns are grouped. This organization makes it easy to troubleshoot and fix signatures as new versions appear, without compromising the logical validity of the condition.
An example of the GwisinLocker condition can be seen in the picture below. The first group of conditions covers the 32-bit version of the ransomware, while the second group covers the 64-bit version.
YARA rules: A continuous process
Threat actors keep developing the malware in their arsenals, and the ReversingLabs malware research team continuously monitors the threat landscape for new versions that our existing YARA rules do not cover. When a new version is discovered, the process outlined in this post is repeated, and the YARA rule is updated to keep pace with new threats in the never-ending cat-and-mouse game of malware analysis.