The exposure risk of development secrets is becoming a problem of epidemic proportions, driven by the growing complexity of the software supply chain. Over the past four years, the incidence of exposed secrets has quadrupled, GitGuardian's 2024 State of Secrets Sprawl report has found.
In 2023 alone, GitHub saw 12.8 million occurrences of exposed secrets, a 28% increase over 2022, the report noted. The report also highlighted the creation of 50 million new repositories on GitHub in 2023, a 22% increase over 2022, which amplifies the risk of both accidental exposures and deliberate malicious acts.
GitGuardian's Thomas Segura said that low-code and generative AI tools such as GitHub Copilot are lowering the barrier to entry for software development, but they also present a new challenge.
"Many people are joining this community of coders, and they're not necessarily educated sufficiently on security best practices, so they hard-code secrets. Mistakes happen, so we're finding more and more secrets on the platform."
—Thomas Segura
This lack of security knowledge among the new kids on the coding block isn't the only factor contributing to secrets exposure in the GitHub developer community, Segura said. GitHub itself also presents some problems.
"There are some complexities in how Git and GitHub work. Git is not the most beginner-friendly tool. Even for experienced developers, it's easy to commit to the wrong repo."
—Thomas Segura
The problem of secrets security, highlighted by last year's high-profile breach at CircleCI, is growing. GitHub is enhancing its secrets tools, but the problem requires a holistic solution. GitGuardian's 2024 deep dive into the secrets exposure landscape contains key takeaways for software security teams. Here are seven you need to act on.
1. After an exposed secret is discovered, mitigation must be swift
The GitGuardian report found that 90% of exposed secrets remain active at least five days after the developer responsible is notified of the leak. The report noted:
"This finding emphasizes a crucial lesson in code security: while detecting vulnerabilities is critical, the real challenge lies in remediation. Security, we believe, must be a shared responsibility across all stages of the Software Development Life Cycle (SDLC), not just the domain of specialized teams. Raising awareness about these seemingly minor lapses is essential for mitigating supply chain risks."
Erich Kron, a security awareness advocate for KnowBe4, said that anytime valid secrets are leaked, organizations should be concerned until the issue is resolved. Kron stressed that he would not count on any developer resolving the issue within five days. "It is more likely it will take quite a bit longer to modify the code and test."
"Until a fix is issued, organizations should ensure they have mitigations in place to counter the threat or at the very least monitor for abnormal behaviors."
—Erich Kron
GitGuardian's Segura said that you have to consider the secret compromised at the moment of the leak.
"It comes down to the velocity of compromising secrets. Easily identifiable secrets, such as API keys, can be compromised in a matter of seconds because GitHub is monitored by malicious bots. So five days is way too long for a secret to be exposed."
—Thomas Segura
2. Alerting developers is insufficient; they need guidance
As the data on how long exposed secrets remain active shows, alerting developers about leaks falls short of mitigating the problem. While most security initiatives focus on detecting leaks, the real bottleneck is remediation, the report noted. What's truly essential is giving developers the guidance and support they need to fix their mistakes effectively, it said:
"Alerting developers is not enough. If you stop at the alerting step, you're not fixing the issue. What many developers need is guidance."
3. Some file types are more likely to leak secrets than others
By calculating a risk score from how common each file type is on GitHub and how often files of that type were involved in secrets leaks, GitGuardian's researchers were able to assign each file type a probability of leaking data. "This metric gives us a clearer picture of which file types warrant closer scrutiny for potential security risks," the researchers wrote.
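The report does not publish its exact formula, so the following is only a minimal sketch of one plausible way to turn scan results into a per-file-type probability: for each file type, divide the number of scanned files that contained at least one secret by the total number of scanned files of that type. The function name and input shape are hypothetical, and no GitGuardian data is used.

```python
from collections import Counter
from pathlib import PurePath


def leak_probability_by_type(scanned: list[tuple[str, bool]]) -> dict[str, float]:
    """Estimate P(file contains a secret) for each file type.

    `scanned` holds (path, contains_secret) pairs from some hypothetical
    scanner. The file type is the suffix (".py"), or the whole file name for
    dotfiles such as ".env", which pathlib reports as having no suffix.
    """
    totals: Counter[str] = Counter()
    leaking: Counter[str] = Counter()
    for path, contains_secret in scanned:
        p = PurePath(path)
        file_type = p.suffix or p.name
        totals[file_type] += 1
        if contains_secret:
            leaking[file_type] += 1
    # Files with secrets divided by all scanned files of the same type.
    return {file_type: leaking[file_type] / totals[file_type] for file_type in totals}
```

A type that is both common on GitHub and frequently leaky stands out under this kind of ratio, which is the intuition behind flagging some file types for closer scrutiny.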
The file type most likely to be the source of a secrets leak is the .env file: a scan has a 54% probability of revealing a secret in one. ".env files are commonly used to store environment variables," Segura explained. "When you're coding, they're the best place to put your secrets, but you should never commit them to your repository."
"It's a common mistake to commit the file. Then the file will be published, and it holds all the sensitive information for your project."
—Thomas Segura
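One low-cost safeguard against committing a .env file is a local Git pre-commit hook. The following is a minimal sketch, not a hardened tool: it asks Git for the file names staged as added, copied, modified, or renamed, and flags anything named `.env` or `.env.<suffix>`. Treating `.env.example` as a safe, committed template is an assumption about project conventions.

```python
import subprocess
import sys
from pathlib import PurePosixPath

# Assumption: ".env.example" is a committed template that holds no real values.
ALLOWED_TEMPLATES = {".env.example"}


def find_env_files(staged_paths: list[str]) -> list[str]:
    """Return the staged paths that look like dotenv files (.env, .env.local, ...)."""
    flagged = []
    for path in staged_paths:
        name = PurePosixPath(path).name
        is_dotenv = name == ".env" or name.startswith(".env.")
        if is_dotenv and name not in ALLOWED_TEMPLATES:
            flagged.append(path)
    return flagged


def main() -> int:
    """Exit status for a pre-commit hook: 1 blocks the commit, 0 allows it."""
    # ACMR skips deletions, so un-tracking a committed .env is still allowed.
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    flagged = find_env_files(staged)
    if flagged:
        print("Refusing to commit dotenv files:", file=sys.stderr)
        for path in flagged:
            print(f"  {path}", file=sys.stderr)
        print("Unstage them with 'git rm --cached <file>' and add them to .gitignore.",
              file=sys.stderr)
        return 1
    return 0
```

To try it as a hook, save the file as `.git/hooks/pre-commit`, add a `#!/usr/bin/env python3` line at the top, make it executable, and end the file with `sys.exit(main())`; Git aborts the commit when a pre-commit hook exits with a non-zero status. Listing `.env` in `.gitignore` remains the first line of defense.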
4. Automated detection is necessary, but not enough on its own
GitHub runs a secret scanning partner program for detecting and reporting potentially exposed secrets. The program allows service providers to contribute the formats of their secrets for scanning, enabling GitHub to detect potential secrets in public repositories and public npm packages. GitGuardian's researchers studied four participants in the program: WeChat, Stripe, Datadog, and OpenAI. Despite that participation, the researchers found, those organizations still had a high rate of unresolved leaks.
This situation highlights that automated detection is a necessary but insufficient layer of protection, the report noted. If valid secrets stay exposed for a long time, the report continued, threat actors can compromise resources and data and move laterally across the supply chain. Fixing vulnerabilities should be the primary focus of a dedicated secrets security program, the GitGuardian report added.
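Format-based detection of the kind the partner program relies on boils down to matching text against patterns each provider registers for its own credentials, then routing every hit to that provider so it can verify and revoke the key. The sketch below is a simplified illustration, not GitHub's implementation: the two patterns approximate publicly known prefixes (GitHub classic personal access tokens start with `ghp_`, Stripe live secret keys with `sk_live_`), and the lengths used here are assumptions.

```python
import re
from collections import defaultdict

# Simplified, illustrative patterns; real partner formats are stricter and
# may include checksums. The lengths below are assumptions.
PROVIDER_PATTERNS = {
    "github": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
}


def scan_for_partner_secrets(text: str) -> dict[str, list[str]]:
    """Group candidate secrets found in `text` by the provider that issued them."""
    hits: dict[str, list[str]] = defaultdict(list)
    for provider, pattern in PROVIDER_PATTERNS.items():
        for match in pattern.finditer(text):
            hits[provider].append(match.group(0))
    return dict(hits)
```

Detection ends at the returned dictionary; as the report's numbers show, a hit only reduces risk once someone acts on it and revokes the key.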
Protecting secrets requires more than automation alone, Segura said.
"It's being able to monitor all these assets, having a way to prioritize efficiently, having a way to educate people and remind them of the danger of sharing a secret through Slack or putting a secret in a ticket. It's way more complex than just implementing a new vault or a top-notch secrets manager. It requires a real strategy."
—Thomas Segura
5. DMCA notices can be used to stop leaks
The Digital Millennium Copyright Act, signed into law in 1998, established a process known as "notice and takedown." It allows copyright holders to request the removal of infringing content from online platforms. If copyright holders believe their work is being infringed, they can send a takedown notice to the service provider, which is then obligated to remove the infringing material.
If an organization believes a repository contains leaked secrets, it can file a notice with GitHub to take the repository offline, the report noted:
"Data points to an increasing use of DMCA notices as a last-ditch effort to remove repositories that inadvertently expose secrets. About 12% of takedown repositories in 2023 were exposing secrets. That tends to indicate that the takedown was motivated by the exposure of sensitive information."
However, Segura said taking down repositories isn't enough, since those secrets may have been stolen before the repositories were taken offline. "That's why we're trying to educate people about revoking secrets," he explained.
Kron elaborated on the limitations of DMCA takedowns. "DMCA takedown notices are only going to impact legitimate hosts and will do nothing for the sharing of the information among cybercriminals and on the dark web," he said.
And there are risks with DMCA notices, the report noted:
"It’s worth noting that the public disclosure of such requests can inadvertently highlight problematic repositories to malicious actors. This visibility makes it a double-edged sword, suggesting it should be used as a last-resort solution due to the risk of drawing unwanted attention to sensitive content."
6. Don't count on AI tools to block leaks
To test the ability of the ChatGPT large language model (LLM) to identify valid secrets, GitGuardian's researchers prompted it to scan 1,000 known valid secrets. It failed to identify 15.2% of them, the report found. "This finding is particularly concerning given that the test focused on specific secrets. The recall rate is likely to be even lower for generic secrets."
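The recall figure can be made concrete with the report's own numbers. Recall is the share of real secrets a detector finds; precision, which the researchers examined next, is the share of its alerts that are real secrets. Missing 15.2% of 1,000 known valid secrets means 152 misses and 848 detections. The helper functions below are a generic sketch of the standard definitions, not code from the report.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real secrets that the detector flagged."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of the detector's alerts that were real secrets."""
    return true_positives / (true_positives + false_positives)


# Figures from the report: 1,000 known valid secrets, 15.2% of them missed.
total_secrets = 1_000
missed = round(total_secrets * 0.152)  # 152 false negatives
found = total_secrets - missed         # 848 true positives
print(f"recall = {recall(found, missed):.1%}")  # prints "recall = 84.8%"
```

Flagging placeholders or IP addresses adds false positives, which lowers precision without improving recall.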
The researchers also tested ChatGPT's precision in identifying hard-coded secrets. They found that the LLM flagged an excessive number of files.
"A manual review of the documents confirms this, revealing that alerts for secrets were triggered in simple cases, such as common placeholders or even IP addresses. This suggests a high propensity for generating false positives."
7. Always revoke compromised secrets
Repository owners often react to leaks by either deleting the repository or making it private, cutting off public access to the leaked information, the report explained. However, this approach can lead to one of the riskiest scenarios for an organization: a “zombie leak.”
A zombie leak occurs when a secret is exposed but not revoked, so it remains a potential attack vector, the report continued. The commit author may believe that deleting the commit or the repository is sufficient and overlook the crucial revocation step:
"It’s important to remember that numerous threat actors continuously monitor and mirror public GitHub activity in real-time. Any sensitive information exposed, even briefly, should be considered compromised. For secrets, this means that merely hiding the leak is ineffective and can create a false sense of security."
Segura said that a zombie leak may have been collected by an adversary, "so as long as it's not invalidated, it can still be exploited in an attack."
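The difference between hiding a leak and fixing it can be shown with a toy model. The `CredentialStore` class below is a hypothetical stand-in for a provider's key-management API (real providers each expose their own revocation mechanism); the point is that deleting the repository changes nothing about whether the leaked key still authenticates, while revoking it does.

```python
import secrets


class CredentialStore:
    """Hypothetical stand-in for a provider that issues and revokes API keys."""

    def __init__(self) -> None:
        self._active: set[str] = set()

    def issue(self) -> str:
        key = secrets.token_urlsafe(32)
        self._active.add(key)
        return key

    def revoke(self, key: str) -> None:
        self._active.discard(key)

    def is_valid(self, key: str) -> bool:
        return key in self._active


def remediate_leak(store: CredentialStore, leaked_key: str) -> str:
    """Revoke the leaked key, then issue a replacement for legitimate users."""
    store.revoke(leaked_key)
    return store.issue()


store = CredentialStore()
leaked = store.issue()
repo = {"config.py": f"API_KEY = {leaked!r}"}

# Hiding the leak: the file is gone, but a copy taken by an attacker still works.
repo.clear()
print(store.is_valid(leaked))  # True: a zombie leak

new_key = remediate_leak(store, leaked)
print(store.is_valid(leaked), store.is_valid(new_key))  # False True
```

In practice, revoking a key before its replacement reaches every service that depends on it can cause an outage, so teams may deploy the new key first and revoke immediately afterward; either way, the old key must end up invalid.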
An ounce of prevention (of secrets leaks) is worth a pound of cure
Git is a distributed version-control system that allows developers to track changes in their code and collaborate with others. GitHub hosts Git repositories and also offers additional services, such as issue tracking, pull requests, and code review.
Segura said that GitHub has been rolling out push protection. "It alerts developers at the push phase of a commit when a secret has leaked. That's really good for the security of the platform, but it's not a silver bullet. There will still be leaks."
Darren Guccione, CEO of Keeper Security, said the problem organizations face with secrets management is that credentials can wind up distributed and stored all over the place.
"As organizations continue to expand their IT environments, the secrets needed for their systems and apps to function also increase exponentially. They may be hard-coded directly into software, stored in plaintext config files, or sitting on a developer’s workstation."
—Darren Guccione
Hybrid and multicloud environments can spread out these secrets and lead to duplicated or outdated credentials, Guccione said. Before long, the data environment contains an immeasurable number of secrets — SSH keys alone can easily number in the thousands — all stored haphazardly and with no centralized solution to organize, manage, or secure them.
"The lack of central management not only expands the potential attack surface; it also puts the network in a position where an updated credential can take down the entire production system."
—Darren Guccione
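To make the point about sprawl concrete, here is a minimal sketch of what even a rudimentary central inventory can reveal: the same secret stored in several places, and credentials older than a rotation window. Everything here is hypothetical, including the `CredentialRecord` shape, the 16-character fingerprint, and whatever maximum age a team chooses.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CredentialRecord:
    location: str         # where the credential lives, e.g. "host:path"
    secret: str           # raw value; only its hash reaches the audit output
    created_at: datetime  # when the credential was issued


def fingerprint(secret: str) -> str:
    """Short SHA-256 fingerprint so the audit output never holds raw secrets."""
    return hashlib.sha256(secret.encode()).hexdigest()[:16]


def audit(records: list[CredentialRecord], max_age: timedelta,
          now: datetime) -> tuple[dict[str, list[str]], list[str]]:
    """Return (duplicates, stale).

    duplicates maps a fingerprint to every location holding that same secret;
    stale lists locations whose credential is older than max_age.
    """
    locations: dict[str, set[str]] = defaultdict(set)
    stale = []
    for rec in records:
        locations[fingerprint(rec.secret)].add(rec.location)
        if now - rec.created_at > max_age:
            stale.append(rec.location)
    duplicates = {fp: sorted(locs) for fp, locs in locations.items() if len(locs) > 1}
    return duplicates, stale
```

Duplicates matter because rotating one copy leaves the others live or broken; stale entries are candidates for rotation before they turn into long-lived zombie credentials.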
Because secrets unlock access to highly sensitive systems and data, the damage potential of a stolen secret increases exponentially, Guccione said. "Breaches of highly confidential data, breaches of highly sensitive apps and systems, and decreased productivity can result in compromised credentials, a damaged reputation, and millions of dollars in mitigation costs, legal fees, and lost revenue."
The GitGuardian report recommended a multilayered detection strategy for preventing secrets leakage. That includes real-time monitoring of repositories to strengthen version-control systems against exposed secrets and adopting a "shift-left" approach to development to address security concerns earlier in the SDLC.
A secure secrets manager can also be a valuable weapon in the battle to protect secrets. "It is critical that organizations implement a secure secrets manager to lock down secrets and prevent potentially damaging sprawl," Guccione said.
Secrets managers can eliminate hard-coded credentials from source code, config files, and CI/CD systems, and centralize secrets in a secure and user-friendly vault, Guccione said.
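In application code, eliminating a hard-coded credential usually means resolving it at runtime instead. Many secrets managers can deliver secrets as environment variables or through their own client libraries; the sketch below sticks to the Python standard library and assumes the environment-variable route, so it does not reflect any particular product's API. The variable name `PAYMENTS_API_KEY` is hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided at runtime."""


def get_secret(name: str) -> str:
    """Read a secret that is injected into the process environment at runtime.

    Assumption: a secrets manager or the CI/CD system sets the variable, so
    the value never appears in source code or committed config files.
    """
    value = os.environ.get(name)
    if not value:
        # Fail fast rather than fall back to a hard-coded default.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Before (hard-coded, and preserved in Git history even after later edits):
#     API_KEY = "<literal key pasted into the source>"
# After (resolved at runtime; PAYMENTS_API_KEY is a hypothetical name):
#     api_key = get_secret("PAYMENTS_API_KEY")
```

Failing loudly when the variable is missing avoids the tempting shortcut of a hard-coded fallback, which would reintroduce the very secret the change was meant to remove.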
"By providing a comprehensive set of features designed to securely manage and control access to secrets, a secrets manager eliminates the uncontrolled distribution and storage of confidential information, such as passwords, API keys, certificates, and other sensitive data across an organization."
Joe Coletta, software supply chain security evangelist at ReversingLabs, said that to be effective, protecting secrets must start from square one: developers should scan application code to identify and track any secrets, and then determine whether each one is safe to include in the build of a software artifact. But not all tools are created equal, he stressed.
"There are a plethora of free and paid scanning tools available to developers, but they can be noisy: warning about credentials that are actually benign and have no value to threat actors. This is why dev teams should invest in tooling that filters out this noise and pinpoints the exposures that actually pose a threat."
—Joe Coletta
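One common way to cut that noise is to discard pattern matches that are obvious placeholders, or that look too regular to be a real key as measured by Shannon entropy. The sketch below illustrates that general heuristic; it is not ReversingLabs' or any other vendor's method. The placeholder list, the 16-character minimum, and the 3.5-bits-per-character threshold are assumptions that would need tuning, and an entropy cutoff can also hide genuinely weak, human-chosen passwords.

```python
import math
import re
from collections import Counter

# Hypothetical list of values that commonly trigger false positives.
PLACEHOLDERS = {"changeme", "password", "your_api_key_here", "example", "xxxx", "todo"}
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def shannon_entropy(value: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(value)
    n = len(value)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def is_probably_real(candidate: str, min_length: int = 16,
                     min_entropy: float = 3.5) -> bool:
    """Heuristic filter for a candidate that a detection pattern already matched."""
    if candidate.lower() in PLACEHOLDERS or IPV4.match(candidate):
        return False
    if len(candidate) < min_length:
        return False
    return shannon_entropy(candidate) >= min_entropy
```

A randomly generated 20-character key with no repeated characters scores log2(20), roughly 4.3 bits per character, while a run of identical characters scores zero, so the threshold separates the two cleanly; real-world strings fall in between and need tuning.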