Meta’s been fined $276 million for leaking people’s PII. The leak wasn’t directly via a vulnerability, but rather due to data scraping. Helen Dixon (pictured), the head of Ireland’s GDPR regulator, ruled that Meta should have prevented the scrape.
What can you do to prevent it in your shop? Red-team how legitimate features could be misused. Software supply chain attacks (such as dependency confusion and typosquatting) might also open the door to scrapers.
So monitor real-time usage for unusual patterns. In this week’s Secure Software Blogwatch, we suggest how — and what to do if you find them.
Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: Mariah Carey has defrosted.
Finebook
Irish Aunty’s Brian O'Donovan reports — “Meta fined €265m”:
“Regulating on behalf of all EU users”
Facebook parent company Meta has been fined … by the Irish Data Protection Commission (DPC) following a data breach which saw the personal details of hundreds of millions of Facebook users published … on an online hacking forum. Facebook said at the time that the information … was "scraped" … by malicious actors through a vulnerability in its … Contact Importer tools.
…
Meta was found to be in breach of Article 25 of the GDPR. … As well as the fine, Meta has been issued with a reprimand and an order requiring it to bring its processing into compliance. … Helen Dixon … the Data Protection Commissioner said the large fine imposed on Meta is intended to have a deterrent effect. [She] said when products and services are being designed, [they] must be designed to adequately protect a person's data.
…
Ms. Dixon added that the Commission is regulating on behalf of all EU users. … No objections to the drafts were raised [by] other EU data protection authorities.
How much is that in real money? Sam Schechner says — “Irish Regulator Fines Meta on User Privacy”:
“Several dozen more ongoing cases”
265 million euros [is] equivalent to about $276 million. [It] is the latest indication of how authorities in the [EU] are becoming more aggressive in applying the bloc’s privacy law to large technology companies. [This] is the third time Ireland has fined Meta … in a privacy case over the past 15 months, bringing the combined financial penalties to the equivalent of more than $900 million.
…
[The] fine stems from disclosures in the spring of 2021 that [the] information of more than 530 million Facebook users [leaked] from mass “scraping” of public profiles. … Ireland’s Data Protection Commission … said the company hadn’t taken sufficient technical and organizational steps to prevent such a leak.
…
GDPR has been enforced for nearly five years but is only now generating a series of decisions with big fines. [The] regulator says it has several dozen more ongoing cases involving multiple big tech companies [including] Meta.
How has Meta’s DevSecOps changed as a result? Mike Clark penned this 18 months ago — “How We Combat Scraping”:
“Identifying and deterring scraping”
We’d like to explain … what we’re doing to prevent scraping to protect people’s information. … Using automation to get data from Facebook without our permission is a violation of our terms. … Scrapers may not access or collect data from our products using automated means.
…
[But] it can be difficult to detect them. We do, however, have a number of methods to distinguish unauthorized, automated activity. … The first way we aim to make scraping more difficult is through the use of rate limits and data limits.
…
[But] we know that scrapers are determined to find new ways to get data. That’s why we’ve also focused on developing other methods of identifying and deterring scraping. We won’t go into all of them because we don’t want to give a roadmap to scrapers.
Oh! That quickly turned into a whole lot of nothing. Still, at least they’re doing something — even if they won’t say what. But this Anonymous Coward isn’t impressed:
While a watchdog can fine and scold, nobody knows how to "un-leak" data. Since most of us only have one … real identity, one leak is all it takes.
Exactly! What can you do? One of the best resources is JonasCz’s. Here’s a tiny flavor:
“Unfortunately this is hard”
Monitor your logs & traffic patterns. Limit access if you see … unusual activity, such as many similar requests from a specific IP address, someone looking at an excessive number of pages or performing an unusual number of searches.
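To make the "many similar requests from a specific IP address" idea concrete, here is a minimal sketch of a sliding-window counter in Python. The class name, the thresholds and the idea of feeding it timestamps from your access log are all assumptions for illustration, not anything JonasCz or Meta describes.

```python
from collections import defaultdict, deque


class SlidingWindowMonitor:
    """Flag clients that make more than max_requests within window_seconds.

    Hypothetical helper: the limits are illustrative, not recommendations.
    """

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, timestamp):
        """Record one request and return True if this client is over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Discard timestamps that have aged out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

Feed `record()` the client address and request time for each log entry. What you do when it returns `True` (throttle, captcha, block) is a separate policy decision.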
…
Limit access to your website (or show a captcha) for requests originating from the IP addresses used by … services such as Amazon Web Services or Google App Engine … proxy or VPN providers. … Don't just do it on a per-IP address basis; you can use other indicators and methods: … How fast users fill out forms, and where on a button they click; … gather a lot of information with JavaScript, such as screen size / resolution, timezone, installed fonts, etc; … HTTP headers and their order, especially User-Agent. … Use and require cookies. … If it doesn't request assets (CSS, images), it's not a real browser.
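Several of those signals can be combined into a rough score rather than used as hard rules. The sketch below is one way that might look in Python. The network list uses reserved documentation ranges as placeholders (you would substitute ranges your hosting and VPN providers of concern actually use), and the function names and weights are invented for illustration.

```python
import ipaddress

# Placeholder networks taken from the reserved documentation ranges.
# These are NOT real cloud or VPN provider ranges.
DATACENTER_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_datacenter_ip(ip):
    """Return True if the address falls inside a listed hosting network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_NETWORKS)


def suspicion_score(ip, sent_cookie, asset_requests, page_requests):
    """Combine a few weak signals into one score (weights are arbitrary)."""
    score = 0
    if is_datacenter_ip(ip):
        score += 2  # traffic from hosting providers is rarely a human
    if not sent_cookie:
        score += 1  # real browsers normally return the cookies you set
    if page_requests >= 5 and asset_requests == 0:
        score += 2  # many pages but no CSS or images fetched
    return score
```

A score like this is best used to decide who sees a captcha, not who gets banned outright, since every one of these signals has legitimate exceptions.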
…
Obfuscate your endpoints and make them hard for others to use. … Don't expose any APIs … unintentionally. … Don't forget your mobile site and apps. … If feasible, don't provide a way … to get all of your dataset.
…
Slow down scrapers and make them ineffective. You could also show a captcha if actions are completed too fast or faster than a real user would. … Screw with the scraper: Insert fake, invisible honeypot data. … Unfortunately this is hard, and you will need to make trade-offs between preventing scraping and degrading the accessibility for real users and search engines. … Show a friendly error message that doesn't tell the scraper what caused it. Something like:
…
Sorry, something went wrong. You can contact support via helpdesk@example.com.
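The honeypot and the deliberately vague error message fit together naturally. Here is a minimal Python sketch, assuming a hypothetical honeypot URL that is linked invisibly in your pages and that no human would ever visit. The path, the status code and the in-memory set are illustrative choices only.

```python
GENERIC_ERROR = (
    "Sorry, something went wrong. "
    "You can contact support via helpdesk@example.com."
)

# Hypothetical honeypot URL: linked invisibly, never shown to real users.
HONEYPOT_PATHS = {"/profiles/all-members-export"}

# In-memory for the sketch; a real service would persist and expire these.
flagged_clients = set()


def handle_request(client_id, path):
    """Return (status, body); flag any client that touches a honeypot."""
    if path in HONEYPOT_PATHS:
        flagged_clients.add(client_id)
    if client_id in flagged_clients:
        # Same message whatever the reason, so the scraper learns nothing.
        return 503, GENERIC_ERROR
    return 200, f"content for {path}"
```

If you try this, it is probably worth excluding the honeypot path in robots.txt so that well-behaved search crawlers don't trip it, which is one of the trade-offs mentioned above.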
Other suggestions include these, by SyneRyder:
If you haven't already, try adding some "trap streets" to your data. Map makers occasionally include streets that don't exist, so if a competitor's map includes it too, it's clear that the competitor copied it.
…
I did that with an online marketing dictionary I wrote years ago, some of the definitions included strange usage examples that contained the names of several of my friends. When a competitor scraped us, instead of shutting them down, the boss negotiated a data licensing arrangement with the scraper instead, so we ended up getting a revenue stream and backlinks out of the incident.
…
The DMCA is often effective. I've made DMCA requests against websites that distributed cracks of my software and they often disappeared in a couple of days.
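A "trap street" for a dataset can be as simple as a few fake records whose identifiers only you can regenerate. The Python sketch below is one hypothetical way to do that, not a description of what SyneRyder did. The `zz-` prefix, the secret and the function names are all made up for the example.

```python
import hashlib


def make_canary(dataset_name, secret, index):
    """Derive a reproducible fake identifier from a secret only you know."""
    material = f"{secret}:{dataset_name}:{index}".encode()
    digest = hashlib.sha256(material).hexdigest()[:12]
    return f"zz-{digest}"


def find_canaries(suspect_terms, canaries):
    """Return the canary identifiers that show up in someone else's data."""
    return sorted(set(suspect_terms) & set(canaries))
```

Seed a handful of `make_canary()` values into your published data, then run `find_canaries()` over any suspicious copy. A match is hard to explain as coincidence.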
But nicolaiplum is amazed we’re even talking about this:
The amazing thing about this is that the Irish Data Protection Commission did anything at all. The second-most attractive thing about Ireland as a place to put your EU subsidiary of a US corporation is its incredibly ineffective and supine regulator (the most attractive thing is the low corporate tax rate).
There have been a lot of rumours that data protection regulators in parts of the EU that are more effective, like Germany and Netherlands, told the Irish DPC that if the Irish did not act, the Germans and Dutch would start their own enforcement actions, and that this finally prodded the Irish DPC into doing something.
How bad is that fine, really? At the time, it was said the leak covered “more than 533 million users.” Nitmare64 does the math:
52 cents per person affected. LOL, no wonder these companies keep doing this.
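For anyone checking that figure, the arithmetic is short enough to sketch in Python. It uses only the numbers quoted above (a $276 million fine and "more than 533 million users"), so the result is an upper bound per person.

```python
fine_usd = 276_000_000        # fine, converted to US dollars
fine_eur = 265_000_000        # fine as issued, in euros
users_affected = 533_000_000  # "more than 533 million users"

cents_per_person = fine_usd / users_affected * 100
euro_cents_per_person = fine_eur / users_affected * 100

print(f"{cents_per_person:.1f} US cents per person")      # about 52
print(f"{euro_cents_per_person:.1f} euro cents per person")  # about 50
```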
Meanwhile, Dudezila has an inquiring mind:
Serious question: Who gets the money and what do they do with it?
And Finally:
I promise not to subject you to Mariah this year
You have been reading Secure Software Blogwatch by Richi Jennings. Richi curates the best bloggy bits, finest forums, and weirdest websites … so you don’t have to. Hate mail may be directed to @RiCHi or ssbw@richi.uk. Ask your doctor before reading. Your mileage may vary. Past performance is no guarantee of future results. Do not stare into laser with remaining eye. E&OE. 30.
Image sauce: Stuart Isett (cc:by-nc-nd; leveled and cropped)