Introduction
Developing software solutions is a complex task requiring a lot of time and resources. In order to accelerate time to market and reduce the cost, software developers create smaller pieces of functional code which can be reused across many projects. The concept of code reuse is one of the cornerstones of modern software engineering and it is universally accepted that everybody should strive towards it. However, in addition to the positives, organizations need to be aware of the security risks introduced by such third-party components.
A growing number of cyber incidents target the software supply chain, with attackers focusing on compromising high-value targets. Following the latest surge and the public uproar it caused, US President Biden issued the Executive Order on Improving the Nation’s Cybersecurity to create an institutional framework for addressing these kinds of security risks.
This blog will describe the hidden risks behind off-the-shelf software supply chain components. We’ll address the importance of validating third-party software components as a way to manage the risks that they can introduce. We’ll also explain why some of these security risks can only be recognized by analyzing the final software product delivered to the customers.
Package repositories
Reusable software modules can be distributed in different forms. Low-level components are often distributed as libraries, while more complex modules are distributed in the form of a package. To make developers' lives easier and to provide version tracking, packages are usually distributed through public package repositories like npm, RubyGems, PyPI, NuGet, etc. These package repositories contain similar types of packages, usually based on the platform that they are targeting, or runtime that they require for execution.
For the purpose of this blog, we analyzed packages hosted on the NuGet package repository. NuGet is the package manager for the .NET framework, and it contains almost 260,000 unique packages that have collectively been downloaded more than 100 billion times. We used static analysis to process more than 4 million package versions from the NuGet repository to find out whether any of them, or the components they include, contain known software vulnerabilities.
Figure 1: File type distribution of unique files extracted from NuGet packages
Figure 1 lists the file types found within the analyzed NuGet packages. If we ignore the expected textual JavaScript files, the dominant files are Windows executable (PE) files, followed by a smaller number of other executable file types such as Linux (ELF) or Mac (Mach-O). Looking at the subtype distribution of PE and PE+ files in Figure 2, we can see that the vast majority of them are DLL files: libraries containing reusable code. Because these libraries ship precompiled, they can carry hidden security risks, which makes them the focus of this research.
Figure 2: File subtype distribution of unique PE/PE+ files extracted from NuGet packages
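To illustrate the PE versus PE+ and DLL versus EXE distinction used in Figure 2, here is a minimal Python sketch that reads only the header fields needed for that split. It is not how Titanium Platform classifies files; the function name classify_pe is ours, and the sketch assumes that a two-way DLL/EXE split is enough, so other subtypes such as drivers are not recognized.

```python
import struct

IMAGE_FILE_DLL = 0x2000   # COFF Characteristics flag set for DLL images
PE32_MAGIC = 0x10B        # optional header magic for 32-bit PE
PE32_PLUS_MAGIC = 0x20B   # optional header magic for 64-bit PE+


def classify_pe(data: bytes):
    """Return (format, subtype) for a PE image, or None if it is not one."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    if len(data) < pe_offset + 26:
        return None
    # Characteristics is the last field of the 20-byte COFF header;
    # the optional header (starting with its magic) follows directly.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 22)
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    fmt = {PE32_MAGIC: "PE", PE32_PLUS_MAGIC: "PE+"}.get(magic)
    if fmt is None:
        return None
    subtype = "DLL" if characteristics & IMAGE_FILE_DLL else "EXE"
    return fmt, subtype
```

Calling classify_pe on the bytes of each file extracted from a package and counting the results would give a rough version of the distribution shown in Figure 2.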
We used Titanium Platform, which can automatically detect different security-related issues during static file analysis. One of its new features is marking issues as policy violations. Think of policies as a set of common guidelines that lead to a more secure code base. For example, executable files shouldn’t contain malicious functionality, they need to be signed with valid certificates, and they should consistently implement security mitigations. Titanium Platform recognizes and reports such policy violations, and it can grade any piece of software based on the detected issues. Each policy violation is reported with a human-readable description, an associated severity level, and an estimate of the effort required to resolve it.
Figure 3: Policy violation examples
We tracked hundreds of defined policies and their violations for the analyzed packages. Going through all of them would be time consuming, so, for the purpose of this research, we will be focusing on those related to known software vulnerabilities.
Known vulnerabilities in public packages
By “known vulnerabilities”, we mean software vulnerabilities that are identified in the National Vulnerability Database (NVD) by a CVE number and have an assigned CVSS score. If Titanium Platform identifies that some of the components found in the analyzed software have an associated vulnerability, the related information will be visible in the generated software quality report.
Figure 4: Vulnerability information in SDLC report
Processing the NuGet repository revealed 51 unique software components vulnerable to actively exploited, high-severity vulnerabilities. Several software components vulnerable to medium and low-severity vulnerabilities were also detected.
Figure 5: Vulnerability related policy violations
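To make the high, medium and low severity labels concrete, here is a minimal sketch of how a CVSS base score maps to a qualitative rating. The function name is ours, and the sketch assumes CVSS v3.x scoring; CVSS v2 uses different bands and has no Critical rating, so older NVD entries may be labeled differently.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```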
All identified precompiled software components in our research were different versions of 7Zip, WinSCP and PuTTYgen, programs that provide complex compression and network functionality. They are continuously updated to improve their functionality and to address known security vulnerabilities. However, as the previous results show, a software package can itself be updated while still bundling dependencies that are several years old and contain known vulnerabilities.
One such example is the WinSCPHelper package. As stated on its homepage, this package is meant to be used for “Basic downloading and uploading of files(SFTP) via WinSCP”. The latest version of this package, v1.0.13, has been downloaded more than 33,000 times and is still being actively installed. The problem here is that this package uses an old and vulnerable WinSCP version v5.11.2, while the latest WinSCP version is v5.17.10. The latest version fixes a critical CVE-2021-3331 vulnerability which affects all older versions and “allows remote attackers to execute arbitrary programs when the URL handler encounters a crafted URL that loads session settings”. Using this package in your software product exposes it to the same type of vulnerability.
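Checking whether a bundled version predates a fixed one sounds trivial, but comparing version strings as text gives wrong answers once a component reaches two digits. The following sketch compares versions numerically. The helper names are ours, and it assumes plain dotted numeric versions with an optional leading "v", like the WinSCP versions quoted above.

```python
def parse_version(text: str) -> tuple:
    """Turn 'v5.17.10' or '5.17.10' into the tuple (5, 17, 10)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_older(bundled: str, fixed_in: str) -> bool:
    """True when the bundled version predates the version carrying the fix."""
    return parse_version(bundled) < parse_version(fixed_in)


# The bundled WinSCP from the example above predates the fixed release.
print(is_older("v5.11.2", "v5.17.10"))
# A plain string comparison would wrongly rank 5.9.5 above 5.17.10.
print(is_older("5.9.5", "5.17.10"), "5.9.5" < "5.17.10")
```

Real version schemes can include pre-release tags or build metadata, which this sketch does not handle.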
While in this case the analyzed package clearly states that it uses WinSCP, it doesn’t disclose the version in the list of dependencies, and you can’t easily find out which vulnerabilities affect its underlying dependency. It is manual work, still doable, but it requires some effort. However, there are situations involving “silent vulnerabilities” where this is not easily done.
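The gap between what a package declares and what it actually ships can be seen by opening the package itself. A .nupkg file is a ZIP archive with a .nuspec manifest at its root, so the sketch below lists the declared dependencies next to the bundled binaries. The function name summarize_package and the choice of file suffixes are ours; this is an illustration rather than a complete manifest parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

BINARY_SUFFIXES = (".dll", ".exe")


def summarize_package(nupkg_bytes: bytes) -> dict:
    """List declared dependencies and bundled binaries of a .nupkg archive."""
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as archive:
        names = archive.namelist()
        # The manifest is the .nuspec file at the archive root.
        nuspec = next(
            (n for n in names if "/" not in n and n.lower().endswith(".nuspec")),
            None,
        )
        declared = []
        if nuspec is not None:
            root = ET.fromstring(archive.read(nuspec))
            for element in root.iter():
                # Drop the XML namespace so any nuspec schema version matches.
                if element.tag.rsplit("}", 1)[-1] == "dependency":
                    declared.append((element.get("id"), element.get("version")))
    bundled = sorted(n for n in names if n.lower().endswith(BINARY_SUFFIXES))
    return {"declared": declared, "bundled": bundled}
```

A binary that appears under "bundled" with no matching entry under "declared" is exactly the kind of dependency that has to be identified and versioned by hand.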
Silent vulnerabilities in public packages
Silent vulnerabilities are known vulnerabilities that can’t be found by inspecting the dependency list. Such vulnerabilities are introduced by statically linking package dependencies, which “hides” vital information from the dependency list. Very often, instead of including third-party functionality as a precompiled library, developers compile its source code into their own binaries. To better explain the concept of a silent vulnerability, we’ll use a very popular compression library: zlib.
When looking at zlib’s official pages, you won’t find any downloadable binaries. Developers are given the source code and are encouraged to compile it into their own projects. When zlib source code is statically compiled, it leaves some artifacts that can be found in the compiled binary. Two of them stand out immediately and can be recognized easily because they mention the authors of zlib.
inflate 1.2.1 Copyright 1995-2003 Mark Adler
deflate 1.2.1 Copyright 1995-2003 Jean-loup Gailly
Figure 6: Copyright string from zlib
The great thing about these copyright strings is that they contain version numbers which can tell us which zlib version is being used in the library, and help us determine which vulnerabilities it is impacted by. The list of known zlib vulnerabilities can be found by searching the CVE Details vulnerability repository. As we can see, there are 9 known vulnerabilities, with a few of them being rated as high-severity vulnerabilities.
Figure 7: Known zlib vulnerabilities
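As an illustration of how the copyright strings reveal the embedded version, here is a minimal Python sketch that pulls zlib version numbers out of raw binary data. The regular expression is our approximation of the strings shown in Figure 6, not the detection logic used in this research, and the function names are ours.

```python
import re

# Matches banners such as "inflate 1.2.1 Copyright 1995-2003 Mark Adler".
ZLIB_BANNER = re.compile(
    rb"(?:inflate|deflate) (\d+(?:\.\d+){1,3}) Copyright 1995-\d{4}"
)


def zlib_versions(data: bytes) -> set:
    """Return every zlib version named by a copyright banner in the data."""
    return {m.group(1).decode("ascii") for m in ZLIB_BANNER.finditer(data)}


def has_listed_version(data: bytes, vulnerable: set) -> bool:
    """True when the data embeds a zlib version from the caller's list."""
    return bool(zlib_versions(data) & vulnerable)
```

The list of vulnerable versions is left to the caller, for example {"1.2.8"}, so that it can be filled in from the CVE records rather than hard-coded.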
For this research, we created a few YARA rules and used them to scan the contents of the NuGet package repository. The objective was to find software packages containing components that statically link vulnerable zlib versions. The results show that more than 50,000 software components extracted from NuGet packages statically link the vulnerable 1.2.8 version of zlib. Some of the discovered packages were updated in later releases to use a newer zlib version, but a large number of them still use a vulnerable zlib version, even in the latest release.
Figure 8: Number of software components exposed to known zlib vulnerabilities
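The scan itself was done with YARA rules, which are not reproduced here. As a rough stand-in, the following Python sketch walks the files inside a single .nupkg archive (a ZIP file) and counts how many components embed each zlib version. The pattern and the function name are our assumptions, and a repository-wide scan would need to handle nested archives and far larger volumes than this does.

```python
import io
import re
import zipfile
from collections import Counter

# Loose match for the zlib copyright banners left in compiled binaries.
BANNER = re.compile(rb"(?:inflate|deflate) (\d+(?:\.\d+)+) Copyright")


def tally_zlib_versions(nupkg_bytes: bytes) -> Counter:
    """Count the files in a .nupkg archive that embed each zlib version."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            content = archive.read(info)
            # A file with both inflate and deflate banners counts once.
            versions = {m.group(1).decode("ascii") for m in BANNER.finditer(content)}
            counts.update(versions)
    return counts
```

Summing these counters across packages would give a per-version tally similar in spirit to Figure 8.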
One example package that contains a silent zlib vulnerability is the DicomObjects package. As the package description says, this is “a simple to use .NET based DICOM library. The API is user friendly and is aimed to provide easy ways to build DICOM solutions”. DICOM is a building block for many healthcare applications, providing a standard for the communication and management of medical imaging information and related data. NuGet statistics show that the package has been downloaded more than 50,000 times, and that it was last updated 4 months ago. It is published by a company named Medical Connections, and their web page suggests that many of their customers develop medical software and hardware infrastructure.
Figure 9: Medical Connections customer logos
Titanium Platform analysis found that the latest version of DicomObjects package available in NuGet repository, v8.40.1101, included a file named ceTe.DynamicPDF.Viewer.40.x86.dll. This file contained a statically linked zlib version 1.2.8 affected by CVE-2016-9840, CVE-2016-9841, CVE-2016-9842 and CVE-2016-9843 vulnerabilities.
DynamicPDF Viewer is proprietary software developed by a different company named DynamicPDF. Their web page shows that the latest version of DynamicPDF Viewer is v3.06, while Titanium Platform detected that the version of DynamicPDF Viewer extracted from the DicomObjects package is v1.0.4. This is one of the most common software maintenance problems. Developers create a software package and decide to use third-party software, but during subsequent updates the dependencies get overlooked. In this case, things are even worse because it is not explicitly mentioned anywhere that the DicomObjects package depends on DynamicPDF.Viewer, nor is there any way to tell that DynamicPDF.Viewer depends on the vulnerable zlib library. Stacking hidden dependencies in such a way leads to multiple levels of silent vulnerabilities and makes software maintenance and auditing significantly more difficult. While such development practices shorten time to market, they also make it harder to detect vulnerable third-party software later on.
Another example of a package containing a silent zlib vulnerability is the librdkafka.redist package. This is a C library implementation of the Apache Kafka protocol, providing Producer, Consumer and Admin clients. This package has a total of 16,742,564 downloads, and the latest version, updated less than a month ago, has already been downloaded almost 20,000 times. The following image shows that this package is a dependency for several other quite popular packages.
Figure 10: librdkafka.redist dependent packages and GitHub repositories
Many of these packages are maintained by Confluent, a company working with large enterprise clients. A vulnerability introduced in the librdkafka.redist package would propagate to all the dependent and frequently downloaded packages listed in Figure 10. Neither the NuGet page for the librdkafka.redist package nor the related GitHub page mentions that this software has any dependencies. The installation instructions say that, to use it on Windows, it is enough to reference this NuGet package in your Visual Studio project. But our static analysis revealed that the latest version of this package, v1.7.0, contains several native libraries needed at runtime, including the aforementioned vulnerable 1.2.8 version of zlib.dll. Even though in this case the zlib source code is not statically linked into the proprietary code, this dependency relation is not explicitly mentioned, and it introduces another silent vulnerability into all of the dependency chains that include the librdkafka.redist package.
Software quality assessment report
ReversingLabs develops tools that help companies maintain better insight into the software solutions they produce. Using advanced static analysis on binary code, ReversingLabs provides a way to perform a detailed analysis of the final product distributed to customers. An approach like this enables detection of very sophisticated attacks such as SUNBURST, which have been tremendously successful by targeting later phases of the software development life cycle.
In this way, users can analyze various types of software packages, even without access to the source code, and check their dependencies for the presence of known malware and vulnerabilities. This approach also allows users to compare different versions of software packages and to get better insight into the most significant differences between them, such as differences in functional capabilities or in dependency lists.
Figure 11: SDLC report summary
Software quality assessment produces a comprehensive report containing a list of software components (SBOM - Software Bill of Materials), metadata extracted from these components, and also a list of issues related to the security and quality of these components. Each detected issue comes with a human-readable description, and metrics such as the severity and effort required to fix it. Based on these metrics, a final grade is given for the analyzed software solution. This grade can warn you if the analyzed software shouldn’t be released in its current state.
Figure 12: Bill of Materials from the software quality assessment report
The report can be used by all stakeholders involved in various stages of software development and maintenance to serve as “the” document for understanding, managing, verifying and tracking the necessary software quality improvements.
Conclusion
Secure software development is a complex problem, as it involves many participants across multiple stages of development. Regardless of what type of software your company produces, sooner rather than later there will be a need to include third-party dependencies in your solution. This will introduce a need to manage security and code quality risks. Software supply chain attacks are a growing threat to the cyber community. They are the DDoS analog to traditional breaches. As such, they have started to attract the attention of governments, with new directives that enforce secure software development best practices as a way to minimize the attack surface.
Companies developing software solutions need to become more aware of such risks, and need to become more involved in their handling. Both the inputs and final outputs of the software development process need to be checked for tampering and code quality issues. Transparent software development is one of the keystones needed to enable early detection and prevention of software supply chain attacks, and we will continue to provide and improve the tools needed to fulfill such tasks. Please contact us to get a security assessment report for your software solutions. We’d like to offer our help and work together to improve your solutions’ security posture.