Python Package Index (PyPI) attackers used compiled code to evade detection. It’s possibly the first attack to take advantage of .PYC file direct execution — but likely not the last.
ReversingLabs’ reverse engineering team, led by Karlo Zanki (pictured), spotted the tactic. In this week’s Secure Software Blogwatch, we round up reax right.
Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: Bees Packed in a Suitcase.
AST/SCA FAIL — RL FTW
What’s the craic? Steven J. Vaughan-Nichols reports — “Compiled Python Code Used in a New PyPI Attack”:
“Up to mischief”
PyPI can’t catch a break. The popular Python programming language code repository has been subject to numerous attacks and has had to restrict new members for a while. Now [there’s] a novel attack … to dodge software code security scanners. … Great. Just great.
…
It employs a previously unexplored approach, exploiting the capability of … Python byte code (PYC) files … to be directly executed. Thus, it avoids security tools that scan Python source code (PY) files for trouble. The ReversingLabs crew found the suspect package when its ReversingLabs Software Supply Chain Security platform discovered suspicious behaviors from a … compiled binary.
…
[It] triggered a previously unseen loading technique inside the main.py file that avoids using the usual import directive … to avoid detection by security tools. … In short, they were up to mischief. … Once active, the loaded library would then execute a host of malicious functions, such as collecting usernames, hostnames, and directory listings, and fetching commands for execution using scheduled tasks or cronjob.
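For the curious, here is a minimal, harmless sketch of what "directly executing" a PYC file without an import statement can look like. It uses only the Python standard library (`py_compile` and `importlib`). The quoted reports do not spell out the exact calls the attacker used, so treat this as one possible approach, not a reconstruction of fshec2. The module name `demo_payload` and the `greet` function are made up for illustration.

```python
import importlib.machinery
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def build_demo_pyc(workdir: str) -> str:
    """Compile a harmless one-function module, then delete its source file."""
    src = Path(workdir) / "demo_payload.py"  # hypothetical name
    src.write_text("def greet():\n    return 'hello from bytecode'\n")
    pyc = str(Path(workdir) / "demo_payload.pyc")
    py_compile.compile(str(src), cfile=pyc, doraise=True)
    src.unlink()  # only the compiled file remains on disk
    return pyc


def load_pyc(pyc_path: str, module_name: str = "demo_payload"):
    """Load a .pyc file as a module without writing an `import` statement."""
    loader = importlib.machinery.SourcelessFileLoader(module_name, pyc_path)
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # executes the bytecode; no .py is consulted
    return module


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        module = load_pyc(build_demo_pyc(tmp))
        print(module.greet())
```

The point for defenders: a scanner that only greps `.py` files for `import` lines and suspicious strings never sees the logic that lives in the `.pyc`.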
Filling in the blanks, it’s Nate Nelson — “Novel PyPI Malware Uses Compiled Python Bytecode to Evade Detection”:
“Attackers are evolving”
Malicious packages aren't new — or particularly rare — in PyPI, but unlike the lot of them, fshec2 contained all of its malicious functionality inside of its compiled code, making it hard to spot. … The genius of fshec2 is in how it dispenses with basic conventions of good hacker hygiene: [It] front loaded its malicious functionalities, and didn't rely on obfuscation tools at all.
…
Bytecode is a representation of Python, compiled as a set of instructions for the Python Virtual Machine. … It exists somewhere between source code and being a machine binary.
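To make "instructions for the Python Virtual Machine" concrete, here is a small sketch using the standard library's `dis` module to list the bytecode instructions for a one-line program. The variable names are arbitrary, and exact opcode names vary between Python versions, so no output is shown.

```python
import dis

# A trivial one-line program; the names are arbitrary examples.
SOURCE = "total = price * quantity\n"

# compile() turns source text into a code object, the same kind of object
# that gets serialized into a .pyc file.
code = compile(SOURCE, "<example>", "exec")


def opnames(code_obj):
    """Return the instruction names the interpreter would step through."""
    return [ins.opname for ins in dis.get_instructions(code_obj)]


if __name__ == "__main__":
    dis.dis(code)  # human-readable listing of the bytecode
```

Note that the variable names survive compilation (see `code.co_names`), which is part of why bytecode is far easier to analyze than a native machine binary, provided a tool actually looks at it.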
…
"There's been a huge increase in … malicious Python libraries [being] leveraged to serve malware," … says Ashlee Benge, director of threat intelligence advocacy at ReversingLabs. … "This behavior is a bit more sophisticated, and it shows that the attackers are evolving and paying attention to the better detections that are being rolled out. … We're probably going to continue to see this kind of attack increase in the future."
And your boss would understand Lucian Constantin — “Most vulnerability scanning tools don't read compiled open-source software”:
“Modern software supply chain threats”
The vast majority of the packages found on public repositories such as npm for JavaScript, PyPI for Python, and RubyGems for Ruby consist of open-source code files that are packaged into archives. They are easy to unpack and read, and as a result security scanners for these repositories have been built to handle this type of packaging.
…
To deal with these modern software supply chain threats, organizations need more than static code analysis solutions. … Attackers are in a constant battle with security companies to evade detection.
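As a rough illustration of one check a source-only scanner would miss, the sketch below lists `.pyc` files in a zip-format package (such as a wheel) that ship without matching `.py` source. This is a simple heuristic written for this roundup, not a description of how any vendor's product works, and the file names in the sample archive are hypothetical. A hit is a reason to look closer, not proof of malice.

```python
import io
import zipfile
from pathlib import PurePosixPath


def orphan_pyc_files(archive) -> list[str]:
    """Return .pyc entries in a zip archive that have no matching .py file."""
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    orphans = []
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix != ".pyc":
            continue
        if path.parent.name == "__pycache__":
            # pkg/__pycache__/mod.cpython-311.pyc maps back to pkg/mod.py
            source = path.parent.parent / (path.name.split(".")[0] + ".py")
        else:
            source = path.with_suffix(".py")
        if str(source) not in names:
            orphans.append(name)
    return orphans


def build_sample_wheel() -> io.BytesIO:
    """Build an in-memory archive with hypothetical file names for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pkg/__init__.py", "")
        zf.writestr("pkg/main.py", "print('hi')\n")
        zf.writestr("pkg/__pycache__/main.cpython-311.pyc", b"")  # has source
        zf.writestr("pkg/hidden.pyc", b"")  # no source alongside it
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(orphan_pyc_files(build_sample_wheel()))
```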
Horse’s mouth? ReversingLabs’ Karlo Zanki — “When byte code bites”:
“Configuration mistakes”
It may be the first supply chain attack to take advantage of the fact that … PYC files can be directly executed. … We reported the discovered package, named fshec2, to the PyPI security team on April 17, 2023, and it was removed from the PyPI repository the same day. [They] acknowledged that it had not been previously seen.
…
ReversingLabs regularly scans open-source registers such as PyPI, npm, RubyGems, and GitHub looking for suspicious files. … These often jump out at us from the millions of legitimate … files hosted on these platforms because they exhibit strange qualities or behaviors. … That was the case with fshec2: … A scan using the ReversingLabs Software Supply Chain Security platform … extracted a suspicious combination of behaviors from an fshec2 compiled binary.
…
Given the malware’s reliance on remote C2 infrastructure, it made sense to scout out the web host used in the attack. … Like regular developers, malware authors often make configuration mistakes when setting up infrastructure. … The sheer number of these mistakes might lead us to the conclusion that this attack was not the work of a state-sponsored actor and not an advanced persistent threat (APT).
ELI5? Howard Solomon explains like we’re five — “New way of compromising PyPI repository”:
Researchers at ReversingLabs found a package on PyPI that used compiled Python code to evade detection by security software. … This discovery is another reason why developers have to be careful of every piece of open-source code they download.
But why would PyPI permit .PYC files in the first place? thames flows fast:
Some people use this as a very weak form of copy protection so they can distribute Python programs to customers without giving them source code. That isn't what it was originally intended for, it's just a side effect.
…
However, this does mean that there is a use case for having ".pyc" files rather than source in a package. This in turn means that having the standard installation tools exclude ".pyc" files would break at least some existing software.
What else is PyPI doing to tighten security? Ee Durbin brings plans forward — “Enforcement of 2FA”:
Beginning today, all uploads from user accounts with 2FA enabled will be required to use an API Token or Trusted Publisher configuration in place of their password. … In February of 2022 we began notifying users on upload that this change was coming.
…
However, some valid concerns were raised regarding the use of user-scoped API tokens for new project creation. … Given this, and our commitment to further rolling out 2FA across PyPI, we are now enforcing this policy.
“Trusted Publisher”? William Woodruff explains — “A new benchmark for packaging security”:
“An attacker cannot spoof an OIDC token”
For the past year, we’ve worked with the Python Package Index to add a new, more secure authentication method called “trusted publishing.” [It] eliminates the need for long-lived API tokens and passwords, reducing the risk of supply chain attacks and credential leaks while also streamlining release workflows.
…
Trusted publishing is built on top of OpenID Connect (OIDC), an open identity attestation and verification standard built on top of OAuth2. OIDC enables identity providers (IdPs) to produce publicly verifiable credentials that attest to a particular identity. [They’re] JWTs under the hood, meaning that an identity under OIDC is the set of relevant claims in the JWT.
…
Because OIDC tokens are cryptographically tied to a particular OIDC IdP’s public key, an attacker cannot spoof an OIDC token, even if they know the claims within it. … It’s our opinion that … this kind of trusted publishing scheme will become instrumental to the security model of open-source packaging.
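"JWTs under the hood" is easier to picture with a token in hand. The sketch below builds a sample JWT-shaped string and decodes its claims using only the standard library. Every value in it (issuer URL, subject, audience, the placeholder signature) is invented for illustration, and the code deliberately does not verify anything. In a real deployment the verifier checks the signature against the IdP's public key, which is the step that stops an attacker who merely knows the claims.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode a JWT segment."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def unverified_claims(token: str) -> dict:
    """Decode the claims segment of a JWT WITHOUT checking its signature."""
    _header_b64, claims_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(claims_b64))


def make_sample_token() -> str:
    """Assemble a JWT-shaped string; all values here are hypothetical."""
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": "https://idp.example",
        "sub": "repo:example-org/example-project",
        "aud": "example-index",
    }
    parts = [b64url_encode(json.dumps(p).encode()) for p in (header, claims)]
    # A real token carries the IdP's signature over "header.claims" here.
    return ".".join(parts + ["placeholder-signature"])


if __name__ == "__main__":
    print(unverified_claims(make_sample_token()))
```

Anyone can read or rewrite the claims, as this shows. What they cannot do without the IdP's private key is produce a signature that still matches after the rewrite.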
Sounds great. But u/nsomnac masters the unintentional pun:
2FA is more of a token gesture pretending to solve a problem. If PyPI wanted to really make a difference they’d do deeper forensics on what gets uploaded for distribution and maybe flag projects that endure more than some threshold percentage of their codebase. This just feels like a cop-out to define a more robust publication workflow.
What’s this constant drip drip drip of supply-chain hacks doing? Plest ponders:
The more things change the more they stay the same. … At this rate we're all going to have to go back to "rolling our own" in house packages again like we did years ago and stop depending on external repos. That'll … kill the lightning pace most projects are forced to run at these days.
Meanwhile, @BrentonPoke peeks under the hood: [You’re fired—Ed.]
PyPI is great when it works.
And Finally:
ASMR beekeeper vs. suitcase hive
Don’t try this at home, kids.
Hat tip: Dick Wonder
You have been reading Secure Software Blogwatch by Richi Jennings. Richi curates the best bloggy bits, finest forums, and weirdest websites … so you don’t have to. Hate mail may be directed to @RiCHi or ssbw@richi.uk. Ask your doctor before reading. Your mileage may vary. Past performance is no guarantee of future results. Do not stare into laser with remaining eye. E&OE. 30.
Image sauce: Karlo Zanki