Malicious actors are constantly on the lookout for new attack vectors and techniques, using them to infiltrate even the most secure of organizations. Many organizations have adopted security best practices, such as defense-in-depth and diversity of defense, and continue to improve their security posture to impede attackers.
Attacking an organization head-on will likely yield fewer results and will typically be detected earlier than taking a covert approach, such as using spear phishing to target individuals. One major covert attack vector is the software supply chain, in which the attacker doesn’t target the organization itself. Instead, supply chain attacks target a trusted vendor or multiple vendors who provide the organization with software or services. Additionally, as software delivery continues to evolve, a new set of providers often referred to as package manager repositories have emerged to serve software development companies.
Popular package manager repositories for software developers include PyPI, RubyGems, NuGet, and npm, to name a few. Such repositories serve thousands, if not millions, of software developers around the world. Developers can accelerate their own projects by pulling in openly available software components that meet their needs. It’s too cumbersome (and potentially insecure) to reinvent the wheel whenever a developer needs common functionality that someone else has already implemented as a library or a module, and that has been proven to work correctly. Do you need to make an HTTP request in Python? You can use the requests PyPI package. Would you like to manage RabbitMQ through Ruby? You can do it with the rabbitmq_manager Ruby gem.
The Python Package Index, commonly known as PyPI or the “Cheese Shop”, has been the target of misuse on several occasions. The most common attack approach is typosquatting, in which an attacker deliberately gives a malicious package a name that is a misspelling of a popular one (such as djanga instead of django) in the hope that an unsuspecting user will accidentally mistype the name and install the malicious package. PyPI has removed the affected packages, but the question remains whether they were the only ones.
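To make the typosquatting idea concrete, here is a minimal sketch that flags a requested package name when it sits within one edit of a well-known name. The `POPULAR` list, the function names, and the distance threshold are illustrative assumptions, not part of PyPI or pip; a practical check would need a far larger list of names and would produce false positives for legitimately similar names.

```python
# Hypothetical short list of popular names; a real check would use many more.
POPULAR = ["django", "requests", "numpy", "urllib3"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def possible_typosquats(name, known=POPULAR, max_distance=1):
    """Return the known names that `name` closely resembles without matching."""
    name = name.lower()
    return [k for k in known
            if k != name and edit_distance(name, k) <= max_distance]


if __name__ == "__main__":
    for candidate in ("djanga", "django"):
        print(candidate, possible_typosquats(candidate))
```

A name that exactly matches a known package yields an empty list, so only near misses are reported.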
To satisfy our curiosity, we processed the entire PyPI repository with our Titanium Platform static analysis solution running on a single server with two 32-core AMD EPYC processors, 256 GB of RAM, and two 3.5 TB NVMe SSD disks. The data set contained 1,584,049 files, including all packages and their historical versions, with a total size of 2.61 TB. The packages were processed in a little less than a day (more specifically, in 23 hours and 44 minutes), and the Titanium Platform managed to unpack 289.4 million files from the input data set, of which 55.8 million were unique. To get a better glimpse into the unpacked files, a filetype distribution snapshot was taken and can be seen in Figure 1. Most of the Binary column consists of two archive formats, TAR and GZIP, since the majority of files in the repository were .tar.gz archives, while the Text column mostly covers Python scripts and various setup and configuration files.
However, a lot of packages contain executable files for various operating systems (such as PE, ELF, and MachO files). One example is a package that can be used to compare files and see the differences between them, and as a testing sample set, it includes a variety of executable and non-executable file formats. The issue here is that when users pull the package from the repository, they might get more than what they bargained for.
Figure 1 - Filetype distribution
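As a rough illustration of how executables bundled inside a package archive could be spotted, the sketch below walks a .tar.gz archive and compares the first bytes of each member against a few well-known magic numbers. This is not how the Titanium Platform works; the function names and the `MAGIC` table are assumptions made for this example, and magic bytes alone are only a hint (for instance, 0xCAFEBABE is shared by fat Mach-O binaries and Java class files, and "MZ" only indicates a DOS/PE-style header).

```python
import tarfile

# Leading bytes of common executable formats (a hint, not a full parser).
MAGIC = {
    b"MZ": "PE",
    b"\x7fELF": "ELF",
    b"\xfe\xed\xfa\xce": "Mach-O",
    b"\xfe\xed\xfa\xcf": "Mach-O",
    b"\xce\xfa\xed\xfe": "Mach-O",
    b"\xcf\xfa\xed\xfe": "Mach-O",
    b"\xca\xfe\xba\xbe": "Mach-O (fat) or Java class",
}


def classify(header: bytes):
    """Return a format label if the header starts with a known magic value."""
    for magic, label in MAGIC.items():
        if header.startswith(magic):
            return label
    return None


def executables_in_sdist(fileobj):
    """List (member name, label) pairs for executable-looking archive members."""
    found = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            header = tar.extractfile(member).read(4)
            label = classify(header)
            if label:
                found.append((member.name, label))
    return found
```

It could be called as `executables_in_sdist(open(path, "rb"))` on a downloaded source distribution. Nothing is extracted to disk, so the archive contents are only read, never executed.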
Our processing effort included an additional set of custom YARA rules modelled after the malicious data set exposed by other researchers. We were able to find a package that the initial analysis had missed, and which was still available in the PyPI repository at the time of writing. The affected package name is libpeshnx, developed by user ruri12, and released on November 23, 2017, as illustrated in the screenshot below. The name libpeshnx looks like a variant of another (previously reported) package by the same author, called libpeshka. Two additional packages have been identified, called libpesh and libari, but they only contain references to the malicious function without any code.
Figure 2 - Available packages
The backdoor logic is extremely simple, and was succinctly described when libpeshka was first found. In a nutshell, if the package is installed on a Linux system, it will try to download a file from the C2 domain, save it as a hidden file named .drv in the user’s home directory, and persist itself inside .bashrc to be run as a background process whenever an interactive non-login shell is created (i.e. any time a shell is opened after the initial login). The complete source code can be seen in Figure 3.
Figure 3 - Backdoor downloader code
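Based on the behavior described above, a defender could look for the two traces the downloader leaves behind: a hidden .drv file in the home directory and a .bashrc line that refers to it. The following is a minimal sketch under that assumption. The function name is hypothetical, and because the exact line written to .bashrc is not reproduced here, the check simply matches the substring ".drv", which may also flag unrelated entries.

```python
from pathlib import Path


def find_persistence_indicators(home):
    """Report traces matching the described backdoor under a home directory."""
    home = Path(home)
    findings = []

    # Indicator 1: the hidden payload file dropped in the home directory.
    dropped = home / ".drv"
    if dropped.exists():
        findings.append(f"dropped file present: {dropped}")

    # Indicator 2: any .bashrc line mentioning the payload file.
    bashrc = home / ".bashrc"
    if bashrc.is_file():
        text = bashrc.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if ".drv" in line:
                findings.append(f"{bashrc}:{number}: {line.strip()}")
    return findings


if __name__ == "__main__":
    for finding in find_persistence_indicators(Path.home()):
        print(finding)
```

An empty result does not prove a system is clean; it only means these two specific traces were not found.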
Luckily, the package looks like a development version of the malware. When the package is installed with pip, it doesn’t automatically run the malicious function, but it does install the function as a library. It also creates an eggsecutable script, which can then be used to execute the malicious payload out-of-the-box. Alternatively, the malicious library can be imported, but the precise module and function names have to be known in order to call it. Additionally, the C2 server seems to have been offline for quite some time (it was already offline seven months ago, during the initial libpeshka disclosure).
Nonetheless, this is troubling because libpeshnx has been installed 82 times per month on average (the exact installation breakdown per month can be seen in Figure 4), and ruri12’s other packages have been installed even more frequently. PyPI’s security team has been contacted, and the packages have been removed from the repository. We’d like to thank them for their prompt response.
Figure 4 - Monthly number of libpeshnx installations
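Since these packages were actually being installed, it may be worth checking whether any of them are present in a given Python environment. The sketch below uses the standard library's `importlib.metadata` to list installed distributions and compares their names against the packages named in this post. The `AFFECTED` set and the function name are assumptions for this example, and the check only covers the interpreter environment it runs in.

```python
from importlib import metadata

# Package names mentioned in this post.
AFFECTED = {"libpeshnx", "libpeshka", "libpesh", "libari"}


def installed_affected(names=AFFECTED):
    """Return (name, version) pairs for installed distributions in `names`."""
    hits = []
    for dist in metadata.distributions():
        # Name can be missing for broken installs, so fall back to "".
        name = (dist.metadata["Name"] or "").lower()
        if name in names:
            hits.append((name, dist.version))
    return sorted(hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    hits = installed_affected()
    if hits:
        for name, version in hits:
            print(f"found {name} {version}")
    else:
        print("none of the listed packages are installed here")
```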
Given the lack of scrutiny involved during the package submission, review, and approval process, and the attack surface size that such platforms provide, public package repositories might slowly become a malware repository platform, unless their security posture changes. To greatly reduce the possibility of hosting malware, such repositories would all benefit from continuous processing and a better review process.
Until then, be careful what you type, because you’re just the attacker’s type.
Affected packages and SHA256:
libari-0.1, 1f45d5e3948533c2c7f389968e006a7e33b6b79348d4375f3de60ea47a75d2cc
libari-0.2, 669f4ab40636f59470496ae0da9d852294b2d5918a7242d0bd8f5ba489abae5b
libari-0.3, 5639a4c6aa9ec39f37644a543f9b5a04e7fa5aa63843602c94db91034461d8f1
libpesh-0.1, 0eaa213c631966e2f08d858c9b4766ecc5a6f49dd2a75f91c74781f447af6b4e
libpeshnx-0.1, b828582a6dd07ba10ff71ecbb4300866f690e46efb23a969ffd29c8990f7880e
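The hashes above can be used to check local files, for example cached package downloads. The following sketch assumes the hashes refer to the package archives as distributed, which is not stated explicitly here; the names `KNOWN_BAD_SHA256`, `sha256_of`, and `match_known_bad` are made up for this example.

```python
import hashlib

# SHA256 values copied from the list above, mapped to package versions.
KNOWN_BAD_SHA256 = {
    "1f45d5e3948533c2c7f389968e006a7e33b6b79348d4375f3de60ea47a75d2cc": "libari-0.1",
    "669f4ab40636f59470496ae0da9d852294b2d5918a7242d0bd8f5ba489abae5b": "libari-0.2",
    "5639a4c6aa9ec39f37644a543f9b5a04e7fa5aa63843602c94db91034461d8f1": "libari-0.3",
    "0eaa213c631966e2f08d858c9b4766ecc5a6f49dd2a75f91c74781f447af6b4e": "libpesh-0.1",
    "b828582a6dd07ba10ff71ecbb4300866f690e46efb23a969ffd29c8990f7880e": "libpeshnx-0.1",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_known_bad(path):
    """Return the affected package label for a file, or None if no match."""
    return KNOWN_BAD_SHA256.get(sha256_of(path))
```

For example, `match_known_bad("downloads/some-package.tar.gz")` (a hypothetical path) returns a label such as "libpeshnx-0.1" on a match and `None` otherwise.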
Disclosure timeline:
07/09/2019 - Contacted PyPI security team, packages have been removed promptly