It’s bad that Backblaze did not do their due diligence while integrating with Facebook pixel, but the bigger problem is the tendency of third party integrations having default settings that are overly aggressive when it comes to access of user/site data.
This should be a warning to every developer: when you integrate with any third party, don’t just copy the default snippet of code they recommend. Read the docs thoroughly and then test/monitor what is being sent back to the third party. And if you are writing a third party widget, be considerate and make the defaults the least aggressive possible when it comes to accessing user/site/server data.
At this point, every developer should (and needs to) be aware of how invasive any tracking code they blindly copy/paste into a website can be. I think it's required even of people who operate normal websites, blogs, etc., but missing it is somewhat forgivable, especially for people just starting out.
But the level of irresponsibility with customers' most private data from a company whose MAIN JOB is to protect it is absolutely shocking. Yes, it's the freaking pixel that does the tracking, BUT IT'S THEIR RESPONSIBILITY TO KNOW WHAT THE HECK IS HAPPENING ON THEIR WEBSITE. Don't they have any sort of vulnerability assessment or security code review? Their reply tweet is almost as infuriating as how something like that could even happen to begin with. Like yeah, sorry, ain't our fault! It's astounding.
I considered using backblaze a couple of times and now I'm very happy that I eventually didn't.
> Don't they have any sort of vulnerability assessment or security code review?
I haven’t seen any evidence that they do. The last time I brought up their history of bad security practices on HN, one of their co-founders decided that the correct course of action was to come on here, accuse me of being a bad actor, and repeatedly make up quotes I didn’t say.[0] All because I tried to warn others in the community that something just like this was likely to happen again. And now it has. So, you know.
Wow. After reading about the FB tracking I was wary about Backblaze but teetering on the edge of willing to give it a pass if they fixed it. But after reading brianwski's comments in that thread, the arrogance and unprofessionalism (especially in his last comment) just completely turned me off. That attitude goes beyond a technical fuckup or bad marketing move. I'm moving my backups out of Backblaze today and won't be looking back.
OK, except I’m not a Backblaze user and I never have been (except for the 14-day free trial). I haven’t had any private correspondence with them since reporting the vulnerability I discovered in 2019. This exchange on HN was the first time I’ve ever knowingly[0] interacted with this guy. If he has actually been ‘dealing with [me] for years’ in the way you imply, it has been a very one-sided relationship.
As far as having an axe to grind goes… if wanting to protect others is grinding an axe, I guess I’m guilty of that. I don’t feel like a handful of topical messages warning people of a legitimate and clearly ongoing problem is some abusive behaviour on my part, but maybe I’m wrong. I’m happy to learn from others’ perspectives, since I’m sure I could be a more effective communicator.
[0] I suspect he was the one who replied to my vulnerability report since the same attitude was on display in those messages too, but I don’t know since that account was just named “bbqa”.
Astonishing arrogance. Their replies on the current incident are also dismissive and arrogant. I can’t even imagine what kind of culture they have internally.
> Don't they have any sort of vulnerability assessment or security code review?
I bet people just don't realize that the frontend code could end up being a source of major data confidentiality vulnerabilities. The threat modeling, auditing, etc. usually just concentrate on attack scenarios involving the backend, to save money and keep frontend development a bit lighter on the security review process side.
That doesn't make it excusable, of course; it just means their threat modeling was inadequate. But it probably explains how this was able to slip into production.
I dunno, you don’t need to be particularly concerned with security to understand that you do not need a facebook tracking pixel on your ‘paying customer’ UI.
Facebook will be pitching to your Marketing folks that that's exactly where you need it.
Facebook want data on what actions users took before signing up, which users actually signed up and started paying, and how that relates to revenue. This UI is exactly where they can determine these types of actions.
Whether this actually makes Facebook better at marketing or not is a good question.
That's why it's a killer mistake to let "marketing folk" dictate _anything_ that has to do with the app. They can suggest, but not dictate. If this points to anything, it's the dysfunctional development process at Backblaze.
> I considered using backblaze a couple of times and now I'm very happy that I eventually didn't.
Same. I ended up going with spideroak because it was too hard to figure out what other providers were even promising for privacy, and it’s been fine. I’m thinking maybe of trying rsync.net with duplicity one day
> Because otherwise how do you know when they suddenly change their script to do something entirely different?
Even though it's inconvenient maybe we should treat it as just another 3rd party dependency that needs to be downloaded, screened, and then used from the internal store. Pretty dangerous to dynamically load a script from a site like facebook.com.
As much as I loathe and despise Facebook, this one is on Backblaze: they integrated with a well-known evil (seriously this surprises absolutely nobody) and should have been more careful with their settings. Or, you know, not have done it at all.
> but the bigger problem is the tendency of third party integrations having default settings
It's certainly a problem, but the biggest problem is simply that Backblaze doesn't need to integrate the Facebook pixel into their web interface at all, especially when users are logged in.
There is absolutely zero benefit to this for their users, but I also fail to see what it could bring to them as a company??
They have now created a major PR problem for themselves, and what did they get in exchange?
(Ok, as the saying goes, there's no such thing as bad publicity. But still.)
The FB pixel is how FB gets to know about conversions. If your company runs FB ads, FB encourages you to let them know which users converted to paying customers so it can improve the targeting of your ad campaign. And if you offer a free trial and the upgrade to paying customer happens inside the user dashboard, presto-bingo: Marketing now needs you to install the FB pixel there.
It's disappointing that this is what the internet has become, and I'm happy to see issues like this brought forward. I can see value in feeding back conversion events to an ad network, but the "just let us run code on your page" style of integration needs to stop. Give developers some API to explicitly send such an event, if they really need to.
>And if you are writing a third party widget, be considerate and make the defaults the least aggressive possible when it comes to accessing user/site/server data.
Unless your business model is literally collecting all personal and private information, just don't do it at all.
>>It’s bad that Backblaze did not do their due diligence while integrating with Facebook pixel
How much "due diligence" should be required to know that adding anything from facebook INSIDE your backup / file storage product is a bad idea... hell, adding anything from facebook to any product should be considered a bad idea...
> and then test/monitor what is being sent back to the third party.
Trust, but verify. Especially FB.
I see a lot of comments about a paid service using tracking pixels. I understand the backlash but this is also how companies that are taking your hard earned money improve and iterate on their products. I know people don’t like tracking but usage analytics are invaluable to improving the product. There’s only so much customer engagement you can do, and often that leaves really unexpected outcomes in the dark (if you asked most customers, they’d ask for a faster horse).
The issue here is that Backblaze injected untrusted code into their core product which by its very nature is extremely sensitive. I’m not really sure what they were thinking. Rolling your own tracking is a pain, but some security contexts demand it.
Pet peeve of mine: "Trust but verify" is an oxymoron. If you "trust" someone/something you don't need to verify. I understand the sentiment, but it's based on a faulty assumption that trust is a binary state.
Its most exemplary case is how security teams in most IT shops use it: "Yes, we did put endpoint protection on your laptop, it's not that we don't trust you -- but you know what they say...". I wish they would simply say: "Of course we don't trust you, not on this. You are going to click on that shady link or download and run random executables from the interwebs. But we do trust that you're good at your job".
The historical usage of the saying "trust but verify" is that the verification happens after they had the chance to do the thing (possibly badly), as opposed to mistrust where you'd verify everything that they'd do before allowing the thing to be done.
It's essentially "audits instead of pre-approval".
I totally trust Facebook. To fuck me over at every possible opportunity, even if they’ve pinky promised “we’d never do that!”. Because they have a track record of doing exactly that.
‘Trust’ is defined as wilful forbearance toward those agencies that we can conceive of as being sources of harm to us.
If you “trust Facebook to fuck [you] over” then you’re not trusting Facebook. You’re expecting Facebook to fuck you over and minimising your exposure to that process.
If you ask a computing security (or indeed any security) professional, "trust" has a different definition. If A trusts B, then B has the capability of doing something bad to A. This is regardless of whether A has granted B that capability. So, when you drive down the road, you trust the oncoming car not to swerve into your path and crash into you. This is just another way of saying that the oncoming car has the capability to swerve into your path and crash into you. There's no wilful forbearance involved.
I disagree insofar as the scenarios you describe (implicit trust in computer security, trust that another driver won’t swerve into your vehicle’s path) are perfectly good examples of the overall definition I posited above.
Indeed, if one didn’t have forbearance of (say) a software vendor or perhaps a dependency then one would be entirely free to not use that vendor’s product or that particular package. Similarly, if you don’t swerve right to prudentially make space for an approaching vehicle to swerve into your path, then you’re showing them forbearance and trusting them (or swerve left, depending on which side of the road is legally mandated in one’s location on the Earth).
So, yep: those are perfect examples of Trust as embodied in the definition I presented above. And of course it lies at the root of initiatives such as the Trusted Computing Initiative, et cetera.
Unfortunately Facebook has lots of information about you whether you have an account or not. You’re a big black hole with known characteristics inferred from those acquaintances of yours who interact with you in a manner Facebook can track. You might not have a name and an account, but there’s a big you-shaped golem in their data and there’s absolutely nothing you can do about it, unfortunately, and they’ll use it as they see fit with or without your consent.
Makes me think of that Junji Ito comic with the person-shaped holes in the rock after an earthquake. Only now, it's an allegory for social media giants and their corrupting influence on the people they lure into their system.
You can if you live in a jurisdiction which gives you the right to demand all information a company has on you, and the right to be forgotten (GDPR in the EU, maybe California as well?).
My guess would be that developers installed a single Google Tag Manager script and left the tracking to the marketing/analytics team. From that point they manage what third party scripts are added to the site, not engineers.
This setup causes tunnel vision, which can unfortunately lead to situations like this.
Facebook doesn't make a usage analytics service. And I really, really doubt 99.9% of the people using this abundance of tracking services ever discover any of the worthwhile corner cases you extol their virtues for. "Analytics" is just the 2000s word for "Reporting": clueless middle managers demand shiny graphics.
> this is also how companies that are taking your hard earned money improve and iterate on their products.
What analytics do you think they can do using Facebook that they cannot do from their own httpd access logs, a technology that's 30 years old at this point?
Or, if that is not sufficient, one of the increasing number of ethical and/or self-hosted services, like Plausible? There is no need whatsoever to ship this kind of data to an outside party.
Tracking is not ethical. Corporations only get away with it because most users don't realise it's even happening.
If you went into a retail store and an employee followed you around the whole time with a notebook and stopwatch writing down everywhere you walked and every product you looked at, you would rightly be creeped the fuck out and tell him to stop.
This is exactly what online tracking is, but done virtually.
> If you went into a retail store and an employee followed you around the whole time with a notebook and stopwatch writing down everywhere you walked and every product you looked at
I hate to break it to you, but retail companies are doing this today using security camera footage, to figure out which parts of the store customers start at, or spend the most time in.
This is one of key applications of GDPR in Europe - the fact that you can collect data for one purpose (e.g. security cameras) does not necessarily imply that you're permitted to use the same data for any other purpose (e.g. marketing analysis of customer movements).
For the former purpose, it would generally be sufficient to inform visitors with a sign at the entrance citing a legitimate interest clause. For the latter example, IMHO the only practical compliant solution would require anonymization of the data: you could make and store density data iff you have no way to tie it back to customer identities, including the purchases they made. That is a key difference from the Facebook example, which (as far as I understand) uses unique IDs to link the conversions to specific FB accounts.
The difference is that with a human nearby I feel like I am being judged, and when they are not there I do not feel that kind of pressure. The grocery store I've gone to my whole life has always had security cameras tracking people, for reasons I assume have to do with theft. It does not affect me at all, and the store has useful data that it can use. It benefits all parties.
>The difference is that with a human near by I feel like I am being judged and when they are not I do not feel that kind of pressure
I will offer you another perspective to consider.
Technology is extremely subversive in that it bypasses all of our brain's instinctual responses. Someone or something monitoring and tracking you should be setting off warning sirens in your brain. At best they are trying to study you, at worst they are trying to exploit or harm you.
Through hundreds of thousands of years of evolution our brains have built up warning systems to make us feel fear and unease when we realise we are being tracked. But since humans have spent 99.99% of evolution entirely in the physical world these systems have no concept of the digital.
The reason you feel extremely uneasy when being monitored by a person, but not when being monitored by a computer system that is collecting the exact same information (or more), is that your subconscious brain doesn't understand computers.
>our brains have built up warning systems to make us feel fear and unease when we realise we are being tracked
>The reason you feel extremely uneasy when being monitored by a person
I don't though. If someone walked up to me and asked me what my favorite color was and they wrote it down I don't feel any negative feeling.
Even if that were true, just because we have a warning system doesn't mean that something is actually bad. Tracking just gives people more information to allow better decisions to be made, and it can make things more efficient.
"but the bigger problem is the tendency of third party integrations having default settings that are overly aggressive when it comes to access of user/site data.
"
Our permission schemes are way too broad in general. I have done some tests with Zapier and IFTTT. For every integration you consent to them seeing ALL your data, being able to modify it, and so on. You also don't get a log of how often the permission was used.
We're looking to invert this equation at Transcend. All client side network emissions can be regulated in accordance with tracking consent following a block/quarantine-by-default model.
This means that any network emissions you add to your site would have to be categorized by URL or domain instead of relying on shaky special-cased integrations that can fail as soon as an API changes.
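Unrelated to any particular vendor, but for anyone wondering what a block-by-default, allow-by-domain model can look like using purely browser-native tools: a strict Content-Security-Policy refuses to load any script, image, or fetch destination that isn't explicitly allowlisted. A rough Express-style sketch with placeholder values:

```ts
import express from 'express';

const app = express();

// Illustration only. With a policy this strict, any third-party script, tracking pixel,
// or XHR/fetch to a domain not listed below is simply refused by the browser.
app.use((_req, res, next) => {
  res.setHeader(
    'Content-Security-Policy',
    [
      "default-src 'self'",
      "script-src 'self'",   // no third-party JS unless a domain is added here
      "img-src 'self'",      // blocks tracking pixels pointed at other origins
      "connect-src 'self'",  // fetch/XHR confined to your own origin
    ].join('; ')
  );
  next();
});

app.listen(3000);
```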
PSA for any devs out there implementing FB Pixels:
– Facebook's pixel will, by default, attach click listeners to the page and send back associated metadata. This simplifies implementation needs, but can create unintentional information leaks in privileged contexts. To disable this behavior, there's a flag[1] you can use. After which, you can manually trigger FB Pixel hits and control both when they're fired and what information is included in them.
– There's a feature for the FB Pixel called Advanced Matching[2] that allows you to send hashed PII as parameters with your FB events. "Automatic Advanced Matching" can be enabled at any time via a toggle in the FB interface. I believe that setting autoConfig to false as mentioned above will similarly prevent Automatic Advanced Matching from working (since it disables the auto-creation of all those listeners to begin with). When manually triggering pixel calls like above, you can use this functionality via "Manual Advanced Matching"[3].
As a general rule, I'd strongly encourage anyone implementing a Facebook pixel to also include the autoConfig = false flag. This makes it work like most other pixels, where the base tag just instantiates an object. After that, hits only occur when explicitly defined in the site code, and include only the details you specify. That way you're fully aware of the scope of data disclosure happening, and any need from marketing to include sensitive (or potentially sensitive) information in these calls has to be explicitly requested (and theoretically vetted) as part of the standard dev process.
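For anyone who wants to see it concretely, a rough sketch of the above (the pixel ID is a placeholder, and the flag/event names reflect my reading of the pixel docs, so double-check them against Facebook's current documentation before relying on them):

```ts
// Assumes the standard fbevents.js base snippet has already run and defined fbq().
declare function fbq(...args: unknown[]): void;

const PIXEL_ID = '000000000000000'; // placeholder

// Disable automatic configuration (auto click listeners, metadata scraping) BEFORE init.
fbq('set', 'autoConfig', false, PIXEL_ID);

// Init; Manual Advanced Matching fields could be passed here if you deliberately choose to share them.
fbq('init', PIXEL_ID);

// From here on, nothing is sent unless you explicitly fire it, with only the fields you pass:
fbq('track', 'PageView');
fbq('track', 'StartTrial', { currency: 'USD', value: 0 });
```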
I don't necessarily disagree with you, and whether a pixel should be there at all is definitely a discussion in itself.
But for those who are implementing FB Pixels, I wanted to put out some potentially useful information that can help protect against unintended data disclosure, after mentioning the auto-listener behavior in a reply to another comment and being met with surprise[1].
Given their (paying) customer base, which skews more towards content producers, I suspect it's more likely intended to ease setup for less-technically-savvy users.
ie they don't really want your truly-security-critical customer data. But if they can boost their conversion rate with sites like dogfoodreviews.com by 5%, and the price is sending backblaze.com's fantastically-sensitive paid customer data into an unsecured data path, they will absolutely do it.
Comparable to the absolute havoc that Zoom wreaked on browser security to save one click on starting a call.
> I suspect it's more likely intended to ease setup for less-technically-savvy users.
> ie they don't really want your truly-security-critical customer data.
It's both. It eases implementation with a one-and-done snippet, and then slaps a user-friendly GUI on the other side for marketers to sort through the firehose and use what they want.
While making it also trivially easy for marketers to toggle a button that OKs the turbo-boost mode that siphons up (hashed) sensitive customer information, which can then be used to claim credit for additional conversions by cross-referencing the (hashed) PII siphoned up against what Facebook has for those exposed to your ads.
> I guess they can’t afford to cover costs given current prices so they sell customer metadata to Facebook
This sentence, your thesis, is absurd. Where does it say they make money from selling data?
Furthermore, "I guess they can’t afford to cover costs given current prices" is a really strange foundation to leap from. Do you have any facts? Or are you just speculating on BackBlaze and making wild assumptions?
> now that other like-minded crazies can find each other faster than ever
You are right, there is nowhere else online where "crazies" gather - not 4chan, not reddit, not voat, not twitter, just Facebook?
The last point I’ll admit exposes my bias against Facebook.
But the previous two points make no difference on why tracking is baked into an internal portal so I speculated as to the reasoning as anyone would do — it’s not a stretch to think that data owners could sell customer data to an aggregator like Facebook for an additional line item of revenue.
While this FB pixel debacle is obviously a very big screw up, it's pretty much a "screw up" and unintentional from what I understand so far. And they have fixed it already which is a positive step towards redemption.
From my speculation - the screw up seems to have happened from including the googletagmanager. They probably only wanted it to stay on the home page of B2 (for ad conversion tracking if I were to guess), not on the dashboard itself after login. The screw up caused it to be on the dashboard too.
Running and measuring ads is one of many things that delivers value to a business, yes. The privacy issue in this case is clearly an implementation mistake and seems to have been resolved.
Ignoring the situation and context to make a comical statement doesn't really add anything to the discussion.
The overwhelming mentality on HN is that all ads are bad.
And all targeted advertising is bad, because by their definition all ads are tracking ads.
This mentality also fits the current Internet and Twitter narrative. Especially when it involves Facebook, which happens to be pure evil on HN, in the Twittersphere, and in the mainstream media.
Ads are micropayments that work. I don’t like them per se and run an ad blocker and PiHole, but the fact that others don’t allows me to micropay for a lot of content with my time.
When I read a 5 minute Medium article, I'm paying with 5 minutes of my time. If I decide to bail out 1 minute into the article, I have still lost 1 minute of my time.
The creator isn't getting any benefit from it, but I'm still paying.
> Because by their definition, All ads are tracking ads.
John Gruber has an ad at daringfireball.net, currently for a company called Simris. IIRC the ad is pure text, not loaded by a script, and does not track you. Other blogs (usually security professionals ime) have text-based ads that are probably part of the theme in a static site generator.
I assume they get business value by retargeting site visitors. To do this, just run the FB pixel (properly configured!) on the marketing pages of the website, not in the logged-in part!
Why? Using ads to increase business is completely valid. This issue is data leakage due to an implementation error and has nothing to do with using advertising services from Facebook or other companies.
By this reasoning, using guns to shoot people is completely valid; the issue is stray shots due to inadequate aim and has nothing to do with being a criminal engaged in a drug war.
I'll resist the temptation to draw a parallel between "advertising services from Facebook or other companies" and a crime syndicate.
No, that's not the same reasoning at all. It's an irrelevant and outrageous strawman where you compared the use of advertising to "using guns ... as a criminal engaged in a drug war". Ridiculous at best and I'm not sure what temptation you resisted.
If you have a real rebuttal against advertising then reply with that instead and we can discuss how technical implementations can be fraught with security mistakes and errors, regardless of industry or product.
The point is that "technical implementations", such as how to shoot properly, shouldn't be discussed "regardless of industry or product", such as being a gangster: sharing PII with Facebook is something most web sites should avoid, not something they should do properly.
Why not? Technical implementations can always be discussed separately from the context they're used in, and even your extreme example of guns has perfectly valid uses in the police and military. Yet you're making the strange comparison to being "a gangster". Why? What's the point of this convoluted analogy?
> "sharing PII with Facebook is something most web sites should avoid, not something they should do properly"
Again, why? You seem to claim a lot without any basis. Data has valid uses, and being used properly is foundational to providing privacy.
It shouldn't be controversial that not sharing sensitive data with Facebook is "foundational to providing privacy" and therefore using a Facebook tracker to fuck users needs a solid, extraordinary justification like "valid" gun use by the police and military.
You seem to believe that this particular breach is accidental, but reckless incompetence on Backblaze's part isn't much better than deliberate disregard for user privacy: any online service from Facebook should raise a red flag.
If you agree that the sensitive part was in error, then you're just against sharing data with Facebook for ads? That's certainly not some unanimous global perspective as I'm sure you know, since that's their actual business used by millions of other companies.
There are ways to use ads without violating privacy or breaking the law (remember that this practice is illegal under the GDPR).
Either way, if you must do ad tracking, do so on your homepage. Once the user is logged in and has paid you money for a service there shouldn’t be any ads nor tracking.
> Finding new users that are similar to your existing customers is a completely valid strategy.
What on earth does “valid” mean here? It’s certainly not acceptable (to me as a customer) if it involves exposing your existing customers to these risks. Those ends can not justify those means.
Valid as in it's a common, reliable and efficient way to gain new customers.
Customers weren't intentionally exposed to that risk nor was it part of a trade-off, it was an implementation mistake for many reasons, something I've repeated 3 times now. What is so complicated to understand here?
Customers were intentionally exposed to the risk, because they intentionally added this third-party code. If they’re not thinking in terms of risk management when they add third-party trackers to their site they do not have an adequate security process. There is a trade-off to security whenever you allow code like that in your product. They can’t just wave it off as a mistake, because it’s a mistake that is very telling about their priorities.
It’s very simple: if you include un-vettable third-party code in your system, and system also handles sensitive data, you are dealing with a huge risk. You need to make sure that the code is unable to touch the sensitive data. As it turns out, it’s a lot harder than not having untrusted code and sensitive data in the same system in the first place. The direct mistake was probably that the wrong code was included on the wrong page, but if the risks involved had been taken seriously, such a small mistake would not have been able to have such a catastrophic effect.
> Finding new users that are similar to your existing customers is a completely valid strategy.
But this can be achieved with tracking in the homepage without embedding trackers in the actual product right next to sensitive data?
> Most people in this thread are making wild statements from the typical emotional/outrage driven pile-on when anything happens.
This doesn't make these statements any less valid though? Most people are indeed outraged that a paid professional product is ratting them out to Facebook which makes total sense as nobody would've expected that.
The thing that concerns me about the FB Pixel (and GTM) is that the host is completely free to do anything and everything to the page. Even if they don't do anything "evil" today, tomorrow is a different story completely. This scares the pants off of me and makes me want to rip out any "tracking" that I've ever installed on any site anywhere. Actually, that's probably not a bad idea.
Are there no browser level protections for this type of thing? I thought CORS was supposed to prevent these activities from happening.
Virtually all tracking boils down to 1x1-sized images getting embedded on the page, with various metadata attached to that image call. The javascript libraries may include other functionality (like additional fingerprinting and such), but are primarily just convenient abstractions that generate and embed the tracking images for you. Most provide the details needed[1] to build your own generator function, which would allow you to integrate the tracking you want while reducing your security exposure to third party code.
As for GTM – a deployed container is self-contained. If you don't want to expose your site to third party code, but want to use GTM as a convenient control plane for configuration of tags and tagging rules, you can do that. Instead of using the standard snippet that loads the container from Google, you can just grab the generated javascript file for the container after a new deploy and self-host it. It gives you the convenience of GTM (central control plane for tagging-related stuff, versioning and commenting, etc) but without the security exposure of embedding externally hosted scripts.
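To make that concrete, a minimal sketch of the loader pointed at your own origin (the container path is hypothetical, and this only mimics what the standard GTM snippet does; it is not Google's official code):

```ts
// Sketch only: serve a reviewed copy of the generated container JS from your own origin
// instead of loading it live from googletagmanager.com. Re-download and redeploy the file
// whenever you publish a new container version.
(function loadSelfHostedGtm(containerPath: string) {
  const w = window as any;
  w.dataLayer = w.dataLayer || [];
  w.dataLayer.push({ 'gtm.start': Date.now(), event: 'gtm.js' });
  const s = document.createElement('script');
  s.async = true;
  s.src = containerPath; // your origin, not googletagmanager.com
  document.head.appendChild(s);
})('/static/gtm-container-v42.js'); // hypothetical self-hosted path
```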
The actual 1x1 pixel is a leftover from the previous generation of tracking tools, and even the page you linked to recommends _against_ using that method because it can’t spy on users enough.
Here we are talking about a tracking _script_ embedded in the page and sending to Facebook everything the user does (“standard or custom events triggered by UI interactions”).
Using only a pixel to track how users move around the app wouldn’t have landed Backblaze in as much hot water. Instead, it looks like the Facebook _tracking script_ (automatically) exfiltrated sensitive data like file names, and that crosses a limit.
It's not a leftover – the core premise of how these scripts work uses the exact same principle. Even when using the JS tracking library, if you look at the network calls to Facebook after the initial script download, they're all hits to https://www.facebook.com/tr/ with the metadata for the call in query parameters, and they return an image content type (image/gif).
As I mentioned in my original comment, the tracking scripts are more than just generator functions for the image pixels. They also do stuff like browser fingerprinting and cookie management[1], and ensure these things get tacked onto generated pixel calls. This improves the fidelity of the data sent back to Facebook, but ultimately it all boils down to image calls with tracking data tacked on as query parameters to the call.
The reason Facebook (and others) don't recommend doing this is because
– As you mentioned, they have way more freedom to do what they want on the page when you load their actual script. So of course that's going to be their preference.
– Advertisers use these pixels for attribution purposes, but ad networks also use the opportunity to further fingerprint and profile users for targeting within their platform.
– The tracking script abstracts away the actual tracking protocol being used (i.e. the query parameters and their associated values). Which helps ensure calls are made correctly, as well as provides flexibility to make changes in the underlying protocol while retaining a stable interface via the JS SDK.
– Takes care of things like generating a unique user id, looking for and saving Facebook Click IDs when seen on incoming traffic, and tacking those values onto pixel calls when they occur.
Any user ID can actually be used, so long as it's unique (and Facebook's methodology is documented and easily replicated in [1], if you want to be consistent with the SDK). And persisting a query parameter into a cookie is actually more robust if done by a first-party script, since ITP has made the lifespan for cookies written by third-party scripts so short.
As long as your custom image generator accounts for those two components (generates a client id if none exists and persists + includes a fbclid if seen on incoming traffic), you will get close to parity with the JS tracking library as far as attribution in Facebook Ads without any need to load third party scripts from Facebook (or other advertisers). Which, as an advertiser, is the only part that you care about. What isn't at parity is all of the secondary fingerprinting that ad networks do, but that's the ad network's problem and preventing that shady shit from happening on your site is the precise reason you'd want to roll your own tracking calls to begin with.
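For illustration, a rough sketch of such a first-party generator. The cookie names (_fbp/_fbc), ID formats, endpoint, and query parameter names reflect my understanding of the pixel's network traffic and the parameter docs referenced above; treat them as assumptions to verify, and the pixel ID as a placeholder:

```ts
const PIXEL_ID = '000000000000000'; // placeholder

function getCookie(name: string): string | undefined {
  const match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
  return match ? decodeURIComponent(match[1]) : undefined;
}

function setCookie(name: string, value: string, days: number): void {
  const expires = new Date(Date.now() + days * 864e5).toUTCString();
  document.cookie = `${name}=${encodeURIComponent(value)}; expires=${expires}; path=/; SameSite=Lax`;
}

// 1. Client id: reuse an existing _fbp cookie if present, otherwise mint one in the same format.
let fbp = getCookie('_fbp');
if (!fbp) {
  fbp = `fb.1.${Date.now()}.${Math.floor(Math.random() * 1e10)}`;
  setCookie('_fbp', fbp, 90);
}
const clientId: string = fbp;

// 2. Click id: persist an incoming fbclid into a first-party _fbc cookie.
const fbclid = new URLSearchParams(location.search).get('fbclid');
if (fbclid) {
  setCookie('_fbc', `fb.1.${Date.now()}.${fbclid}`, 90);
}

// 3. Fire an event by requesting the tracking image; only the fields listed here are ever sent.
function track(eventName: string, customData: Record<string, string> = {}): void {
  const params = new URLSearchParams({
    id: PIXEL_ID,
    ev: eventName,
    dl: location.href, // page URL
    fbp: clientId,
    ...customData,
  });
  const fbc = getCookie('_fbc');
  if (fbc) params.set('fbc', fbc);
  new Image().src = `https://www.facebook.com/tr/?${params.toString()}`;
}

track('Purchase', { 'cd[currency]': 'USD', 'cd[value]': '6.00' });
```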
As a first-party site owner, subresource integrity checks[0] (that someone else already linked elsewhere in this thread) lets you at least determine, at the browser request level, if a third-party script has changed since you installed (and hopefully audited) it.
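For reference, a minimal sketch of what that looks like (placeholder URL and hash); the static HTML form shown in the comment is the more common way to write it:

```ts
// Subresource Integrity: the browser refuses to execute the script if its contents no
// longer match the hash of the version you audited. Equivalent static form:
//   <script src="https://vendor.example.com/tracker.js"
//           integrity="sha384-...hash of the reviewed version..."
//           crossorigin="anonymous"></script>
const s = document.createElement('script');
s.src = 'https://vendor.example.com/tracker.js'; // placeholder third-party script
s.integrity = 'sha384-AAAA...';                  // placeholder hash
s.crossOrigin = 'anonymous';                     // required for SRI on cross-origin loads
document.head.appendChild(s);
```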
For various reasons including this, advertising tracking is moving server-side, where the company can much more tightly control what gets sent to the vendors, and where third party JavaScript no longer has access to the DOM, network requests, or cookies.
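To illustrate the server-side pattern (a sketch, not a description of how any particular company does it): Facebook's Conversions API, for example, accepts events as plain HTTP requests from your backend, so your server decides exactly which fields ever leave your infrastructure. Field names are from memory and the ID/token are placeholders, so verify against the current API docs:

```ts
import { createHash } from 'node:crypto';

const PIXEL_ID = '000000000000000';                    // placeholder
const ACCESS_TOKEN = process.env.FB_CAPI_TOKEN ?? '';  // placeholder

// Report a conversion from the backend; nothing runs in the customer's browser,
// and only the fields explicitly listed here are transmitted.
async function reportPurchase(email: string, valueUsd: number): Promise<void> {
  const event = {
    event_name: 'Purchase',
    event_time: Math.floor(Date.now() / 1000),
    action_source: 'website',
    user_data: {
      // Hashed identifier only; nothing else about the account is sent.
      em: [createHash('sha256').update(email.trim().toLowerCase()).digest('hex')],
    },
    custom_data: { currency: 'USD', value: valueUsd },
  };

  await fetch(
    `https://graph.facebook.com/v18.0/${PIXEL_ID}/events?access_token=${ACCESS_TOKEN}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ data: [event] }),
    },
  );
}
```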
The upside of third-party trackers is that you can completely block all of them by just blocking third-party javascript. What are we going to do once all of this tracking code starts getting served from the first party domain instead? Or even served inside the same source files as site code?
I imagine we will start seeing a new class of privacy extensions that behave more like anti-virus. Checking for known hashes of tracking scripts, monitoring for certain patterns of behaviour during execution.
The future is entirely server-side tracking, with no JavaScript executed in the client unless for UX tracking like Hotjar or A/B testing like Target or Optimize.
Personally, I haven't seen a desire in companies to skirt GDPR. Rather companies just want to be compliant and not have to worry about data breaches or reputational damage from their marketing tools. This example with Backblaze is exactly what companies are trying to avoid.
As a protection for the users, addons like Facebook Container for Firefox [0] can isolate all Facebook tracking and prevent the scripts from running on pages that are not facebook.com.
And if that doesn't tick your creepy boxes, let's try the financials: if a user hits your tracking pixel, they (and those like them) will be more likely to see ads similar to yours, meaning potential customers will now be more expensive to obtain.
One easy way to avoid this kind of mistake in your own product: make a clear distinction between your publicly-facing web site ("corpweb") and your web app for logged-in users. Preferably, they should be served from separate infrastructure.
Corpweb should be as static as possible, except for whatever third-party JS the marketing professionals think is necessary. It's their job, they know what's best.
Your app should have zero third-party JS except for technical analytics (New Relic, Datadog, whatever).
(This distinction can be fuzzier for free services, and for consumer stuff with non-sensitive data; Backblaze is neither of those.)
That made me LOL. You'd like to think they know best. Most of the time, they do things because other people do things, but don't truly understand the true ramifications of their requests.
Marketing: "What's the big deal? All you have to do is add the 2 or 3 lines of JS to the site."
Devs: "Do you know exactly what that will do?"
Marketing: "It'll give us all sorts of useful metrics for free"
Devs: "Do you know if it is secure or will cause our site to become less secure due to vulns in the included JS? Will it cause the site's performance to become sluggish where we will get blamed? Do you know exactly what data is being collected, and will it affect any of our other obligations of maintaining this site?"
Marketing: "Um..., that's your job. We just want the data"
I'm speaking from the perspective of a site, like Backblaze, where web app and the site fulfill two separate functions. There are lots of cool metrics that marketing wants; any code they put into www.backblaze.com is pretty low cost, and usually done by a separate team than product.
The product site itself (usually app.example.com, but Backblaze seems to use secure.backblaze.com) actually contains customer data in the browser context, is under much higher base resource loads from its core functionality, and is used repeatedly in workflows where poor performance is painful to the user.
No one gives a shit if your pricing page takes 500ms to load instead of 100ms, or if a dozen social media companies who already know where you work learn what kinds of professional products you're looking for.
They do care if a file listing takes longer, a recipe opens slower, if word frequencies in confidential data are leaked to the world.
In most paid products, the non-paid part of a site has radically different performance and security requirements from the paid part, and forcing one to be built to the requirements of the other (in either direction) is wasteful, or dangerous, or both.
This is essentially true for every single human, and it's the reason discussion forums have discussions: we keep being unable to see the other person's perspective and think we need to spread "the truth" to them.
If you are doing business in the EU, then you have to be careful about using 3rd-party stuff (Google fonts, embedded Youtube videos, embedded maps, chat widgets, ads, pixels, analytics etc. etc.) on the public-facing website as well.
> It's their job, they know what's best.
That's like sending an alcoholic down the spirits aisle.
As long as they don't have the car keys, they can go knock themselves out.
Re: the more substantive legal points - there are off-the-shelf solutions (CMPs, for example) and easy checklists for complying in the setting of a static public-facing website. The web designers and brand managers I've worked with are more than capable of meeting clear industry standards.
It's inside a web app, where customer data is on the page and in JS scopes, where the product team is essential in safeguarding customer data.
> there are off-the-shelf solutions (CMPs, for example) and easy checklists for complying in the setting of a static public-facing website.
In my experience, the people implementing these often don't understand enough about the technology to be allowed to implement this. I wonder if "easy to use" tag managers are to blame, by allowing non-experts to add JS and other includes to webpages without process or scrutiny.
Check a few big brand-name websites, and look at whether they place (third party) cookies before the CMP has even been interacted with.
I can think of some major high street labels where the consent prompt is mere theatre.
> there are off-the-shelf solutions (CMPs, for example) and easy checklists for complying in the setting of a static public-facing website.
Consent Management Platforms, things like Cookiebot?
In my experience, blocking 3rd-party HTTP requests, cookies, LocalStorage access etc. before consent is given – is easy in simple cases, but can quickly get technical and tricky.
I agree with this approach but there's an issue with "it's their job, they know best". They will inevitably want to put the same tracking crap in the logged-in site as well, so they can see how visitors converted into users, how they used the site once they were users, how valuable that made them as customers, and so on. As long as FB/other analytics firm is saying "we can help you market better with additional data", marketers are going to advocate sharing it.
Which is why it has to be a two way street - when it comes to the product, specifically its security and performance, engineering needs to absolutely own the thing.
> Corpweb should be as static as possible, except for whatever third-party JS the marketing professionals think is necessary. It's their job, they know what's best.
I strongly disagree. Marketing professionals often lack technical understanding and are superficial about the consequences. This mentality is how you end up with engineers working for companies making morally questionable choices, because they just want to be a cog in the system instead of being concerned about the direction and how the company does business. Separation between services is a shared illusion; if something is against your values, please tell your fellow human beings.
In a static website, where there are standard tools and checklists and web designers to walk them through it, marketing's lack of technical understanding is less of an issue. And in a B2B web app like Backblaze's and my own, the data exposed to the public web site is just not all that sensitive.
And I'm not talking as a prospective employee who "just wants to be a cog in the machine"; I'm talking as a founder and CTO who sets company goals. I'm worried about data leaks caused by poor implementation and short-sightedness, not those caused by company policy that I disagree with. If I disagree with company policy, I change it.
I'd be interested to see a site where marketing professionals with limited technical understanding knocked themselves out, but they used standard tools and checklists, and it came out OK. Do you have any examples?
I agree with this delineation, and if necessary, it lets the sales/marketing people go absolutely wild with the tracking, analytics and CRM-system integration on the public facing marketing website, should the C-level people decide to allow them to do so.
I’m baffled by this mentality but I guess it explains a lot. At which point are you concerned about what other services or even the company is doing? That’s a real slippery slope for me.
* Dealing with sensitive data is their core business, not a side concern. i.e. their customer base is security-conscious enough that they know incompetence on that front will kill their business
* Both of those specific services omit lots of specific identifiers in their data collection, and require you to go out of your way to send truly sensitive stuff to their servers
* By necessity, a technical analytics service gives you lots more control over where exactly it hooks in to your code.
The goal of marketing is to get product in front of the right audience. The proposed solution you mention would include plenty of folks who don’t convert. That would likely create less effective results for marketing.
Also, what would you use to monitor user behavior to improve your product if third party tracking is frowned upon?
> Also, what would you use to monitor user behavior to improve your product if third party tracking is frowned upon?
* user testing, user interviews
* (there's probably a fancy word for this) generate usage metrics from production data. If the product is e.g. a To Do app, you could measure "engagement" by counting how many To Do items each user has created
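A toy sketch of that second point, with an invented schema: the "metric" is just a query over data the product already stores, so nothing leaves your own infrastructure:

```ts
import { Pool } from 'pg';

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// "Engagement" derived from production data: how many To Do items each user created
// in the last week. Feeds an internal dashboard instead of a third-party tracker.
async function weeklyEngagement(): Promise<Array<{ user_id: string; todos_created: number }>> {
  const { rows } = await db.query(`
    SELECT user_id, COUNT(*)::int AS todos_created
    FROM todo_items
    WHERE created_at > now() - interval '7 days'
    GROUP BY user_id
    ORDER BY todos_created DESC
  `);
  return rows;
}
```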
99% of the time, 1st-party tracking just means hosting the tracking script on their own domain, the data still gets sent to 3rd-party analytics services.
Well, here is where my hot take and general principle needs some nuance :-P
> The proposed solution you mention would include plenty of folks who don’t convert.
The one bit of data I currently have going from the app to corpweb analytics is precisely this - associating a conversion with a website user. That conversion info is sent with hand-coded triggers, the relevant third-party libraries are self-hosted (a good thing generally), and code doesn't call out to it when the user is outside of the billing/subscription flow.
> Also, what would you use to monitor user behavior to improve your product if third party tracking is frowned upon?
I'm open to A/B-testing stuff - the important part in my mind is less specifically about third-party tracking, and more about ensuring that the product/engineering team makes decisions about the product. Pulling third-party code of any kind, but especially tracking code, is a process full of footguns that should be under the control of people who know what they're doing and are empowered to say "no".
I would still be very cautious about giving analytics code full access to all user activity.
Google Analytics custom events and Datadog are in fact the only third-party analytics code I run in my app. Tools like GA that gather less-identifiable user behavior info are borderline in my book between technical and marketing; and I honestly just trust Google as an organization not to mix my own GA data into their marketing data.
Yup! I happen to not do that; part of why I trust Google on this (in the site owner role) is that they're good at making privacy-compromising options clearly-delineated and optional.
And this is a small subset of filenames that could provide not only PII but also potentially embarrassing or private information that isn't identifying on its own but would be accompanied by files that are personally identifying.
Just from looking at my own documents folder, "Stephen Tordoff - ESA Appeal SSCS1.pdf". That would reveal that I have a disability, and that I am (or have) claiming benefits for it. Not everyone would be happy with that being public knowledge, and I'd be less that thrilled if things like it were shared with Facebook for no reason.
I’ve worked for multiple FAANGs and can assure you that all have considered file names to be PII. Pretty much anything that contains user input should be treated as PII.
Law firms often utilize automatic content-derived filenames, up to the maximum OS-supported length. As a result you'll find all sorts of private information in the filenames within their backups.
I should also mention, those automagic naming schemes use the beginning text in the document. So it's basically the personal details of plaintiff vs. defendant.
I've also seen similar schemes used by doctor offices/hospitals, where you'll see the patient name and ailment. I once had to troubleshoot a backup problem for what turned out to obviously be an OB-GYN. Imagine my horror as I saw long filenames containing patient names and common STDs scroll by in the thousands.
No, that’s neither true nor “technically correct”. Services that the user provides health information to for their own purposes aren’t considered covered entities or their business associates, and thus HIPAA rules don’t apply. This is why Dropbox, and GMail don’t have to be HIPAA compliant.
If you say “Read the law strictly enough” please do...
You might be right. I believe I saw some definition that simply stated that a "system receiving or storing PHI" would be required to be HIPAA compliant, regardless of how the data got there.
I'm still wondering, because Backblaze will sign a BAA (their website says so), making them a business associate. I'm not talking about some private person uploading their own documents. My concern is that given that Backblaze will sign a BAA, then some companies must be using Backblaze and potentially storing PHI data there. Yes?
Backblaze then need to follow: "§ 164.312 Technical safeguards. (e)(1) Standard: Transmission security. Implement technical security measures to guard against unauthorized access to electronic protected health information that is being transmitted over an electronic communications network."
Facebook isn't authorized to access this data, but that might be more of a problem for Backblaze, even if Facebook could be required to delete the data.
Yeah, that seems like it would be on Backblaze, but this might not apply here, as 164.312 only applies to PHI, and it's quite likely that Backblaze will only sign a BAA for their B2 service, and not their standard backup service.
Pay careful attention to the response from Backblaze:
> "Hi Brett! The pixels we use are primarily for audience building when we advertise on other platforms like Facebook for example." [...]
The carefully calculated cutesy "Hi Brett!" with the exclamation point is the same reason big tech companies use infantile graphics [0]: by seeming playful, they create the illusion they are a Safe Friend you can Trust.
Using a salutation, and addressing someone by name is not a conspiracy to make people trust you. The things that you should care about are the banality of evil, and that no one believes that they've done anything wrong. I live in the Midwest and my job is to make low-impact CRUD applications for a small car insurance company. I would use the same salutation, because I have been taught that that is what I'm supposed to do. I wasn't coached in some session on how to trick people into thinking I'm their buddy - it just becomes part of the shared social vocabulary.
Modern business vocabulary has shifted from "Dear Mr. LastName," to "Hi FirstName!". This shift happened first in more "trendy" places, although most everyone has already moved on to using informal language in customer relations.
I do agree with your point about banality of evil.
Ha - you’re right. I suppose I object to "Hi $FirstName!" becoming the standard in professional communication. Often there’s nothing exciting that follows that exclamation mark, and the comma has been the standard so far.
Backblaze is role-playing 'trusted friend' on twitter the same way McDonalds and Wendys get into 'fights' on twitter. It's just corporate playbook stuff; I wouldn't say it's a conspiracy. Here's the latest posts on backblaze's twitter: https://i.imgur.com/mMkylym.png
I believe that your analysis has flaws. It would be quite awkward on social media to use a more formal way of addressing people. Twitter and other platforms have a "style" of conversation, and trying to fit the square peg of formal writing into the round hole of internet conversation sounds stilted. I do not understand why you think corporations would decide to do that, no matter what their intentions are.
I humbly await your response.
Yours Truly,
Mr. rPlayer
P.S. I hope your Grandmother is doing well. Please send my regards.
P.P.S. Please invest in my new cloud computing blockchain biotech startup where we sell NFTs.
Please update your signature to conform with the current standards, as outlined in the last month's circular.
Best Regards, TeMPOraL
--
TeMPOraL, Internet Compliance Officer (ICO)
ACME LLC - Synergizing Creative Accounting
ACME LLC, NaN NaN, Null Islands.
The content of this message is confidential and intended for the recipient specified in message only. It is strictly forbidden to share any part of this message with any third party, without a written consent of the sender. If you received this message by mistake, please reply to this message and follow with its deletion, so that we can ensure such a mistake does not occur in the future.
Please do not print this message unless it is necessary. Every unprinted message helps the environment. Think of the trees!
I don't use twitter so I don't know what the etiquette is over there, but outside of emails I wouldn't normally expect any salutation in an internet message.
So for me it's not so much that the salutation isn't formal enough, it's more that it's odd that it exists at all.
But then again maybe usages differ in twitterworld.
(For the curious: I couldn't be bothered to install https://projectnaptha.com/ (probably would've worked), I just resized a terminal to the same column width and blindly retyped the text into a shell printf statement.)
And also, when you upload something to Backblaze, you are already handing your files to a third party. There shouldn't be anything sensitive that you didn't encrypt client side.
Except... are the filenames encrypted? For many, if not most, filenames are plaintext. Backblaze's default "SSE-B2" encryption leaves the filenames in plaintext. GPG encryption typically leaves the filenames in plaintext. There's still information people consider private and didn't expect to be shared with Facebook.
That's server side encryption. My point is if you store sensitive files on a remote server not under your complete control, you should always use client side encryption. Good client side encryption will not leak filenames/directories either. A lot of them do, unfortunately.
There are programs that automatically encrypt files before storing them. A particularly enticing one is Cryptomator, which acts as a ‘disk’ keeping files on various providers and thus syncs between devices—however its Android app isn't open-source so I never actually tried it.
It's still revealing two levels of filename extensions: the type of encryption and the file format. The full filename should simply be a random alphanumeric ASCII string without any extensions, although this requires managing and storing a map of keys and filenames.
Random alphanumeric filenames (as generated by, for example, `head -c 30 /dev/urandom | base64 | tr -d '+/='`) are extremely suspicious-looking. To me that screams "obsessively paranoid". I reckon that if I were in law enforcement, the fact that there is absolutely nothing I can infer from these filenames, combined with the obvious complexity associated with correctly maintaining something like this, would actually make me that much more interested in decrypting this information, simply to take a look at it and rule it out.
Which is exactly why this would be a scenario in which I _would_ want to "reuse someone else's password", if you will, and I'd theoretically go digging for common archival file patterns, and use the most common I came across.
On-disk filenames should be hashes, much like .git contains lots of hash-named files. My Restic backup folders don’t show much beyond the fact that they are Restic backup folders (if even that?).
>I reckon that if I were in law enforcement, the fact that there is absolutely nothing I can infer from these filenames, combined with the obvious complexity associated with correctly maintaining something like this, would actually make me that much more interested in decrypting this information simply to take a look at it and rule it out.
If I ever did something shady, I'd absolutely make a ton of honeypots like this.
rsync.net was posted here on HN a few days ago.
Also, tarsnap is another popular service.
Neither has the special additions that make Backblaze so popular, but they could be popular alternatives.
From a quick look at pricing pages it looks like rsync.net is 5x as expensive as backblaze b2, and has a minimum of 400gb per month (it also looks like you might have to preallocate vs pay on demand). And tarsnap is 10x as expensive as rsync.net.
My guess is that the bulk of that price difference is due to economies of scale.
One big difference is that with B2 you can't have append-only backups. (or maybe it's very non-obvious from the docs?) That's pretty much why I use rsync.net instead - I can configure a separate key for uploads. Otherwise what's to stop anyone lifting the B2 access key from the host with automatic backups configured and just deleting all old data?
I don't have huge backup needs (around 500GB, doesn't grow much over time), so I just use duplicity and upload to AWS S3 with the "infrequent access" setting, and have the bucket auto-replicate to another region in another country. Costs me around $12/mo just for the data storage. The AWS calculator tells me if I had to retrieve all that data it would cost under $10 to do so.
Given it's all personal data that I can stand to be without for days if needed, I could probably use Glacier (or even Glacier Deep Archive) and pay less than a third of the cost (or less than a twelfth!), but the absolute dollar amount isn't high enough for me to go through the trouble of changing up my backup scripts.
Sure, if you're running a business and are constantly generating lots of new data that needs to be backed up, that gets more expensive, but I also would expect a business with that much data could easily afford to spend several orders of magnitude more than I do on backups.
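For what it's worth, if you script uploads directly instead of going through duplicity, the storage class is just a per-object parameter; a rough sketch with the AWS SDK for JavaScript v3 (bucket and key names are made up):

```
import { readFile } from "node:fs/promises";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-west-1" });

// Upload one archive file, choosing the cheaper storage tier at write time.
async function uploadArchive(path: string): Promise<void> {
  await s3.send(new PutObjectCommand({
    Bucket: "my-backup-bucket",                  // hypothetical bucket
    Key: `archives/${path.split("/").pop()}`,
    Body: await readFile(path),
    // STANDARD_IA = infrequent access; DEEP_ARCHIVE is cheaper still, but
    // restores take hours, so only use it for data you can wait for.
    StorageClass: "STANDARD_IA",
  }));
}
```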
I use Linode Object Storage (an S3-like service) with rclone to manage my backups. (My backups are only a few dozen GB, so I don't need something like duplicity at the moment.)
So far I'm happy with it; pricing is $5/month per 250GB plus 1TB of egress. And, most importantly, none of the AWS complexity overhead. I have enough of that at work.
So, looking at some potential things to switch to...
OVH has two interesting products here:
OVH Cloud Archive works with rsync, costs roughly half as much for storage as Backblaze, roughly the same for egress, but charges for ingress (at the same rate as egress).
OVH object storage is S3-compatible (like Backblaze), charges roughly the same for bandwidth, and 2x for storage.
DigitalOcean has a blob store with the same pricing on bandwidth, 4x the pricing on storage, a minimum spend of $5/month on storage, but the first TB ($10 worth) of bandwidth free.
Before you experience a major problem, you tend to guess what is good enough to mitigate it. Once you've experienced a major problem, you tend to ensure that exact failure mode never happens again.
I would be surprised if OVH had another fire in the future.
The only reason I use Backblaze instead of tarsnap is it is 62 times cheaper for the same amount of storage. tarsnap is dramatically overpriced if you ask me.
I get the feeling the reason for the Tarsnap price is simply a lack of scale and its bootstrapped nature (i.e. there isn't a VC feeding money into hosting costs to drive prices down and grow the userbase). The cost seems fair, but noncompetitive against other vendors on most easy-to-find quantitative measures (cost, features, SLAs, locations, etc.).
However I do think there is value in Tarsnap if security is really that important. Colin is really switched on, and I trust both the service, and if anything happened, he would deal with it quickly and professionally. If I had the kind of profile that meant I needed to protect myself against determined attackers, then tarsnap would be a no-brainer for me.
Backblaze is also bootstrapped, no VC investment except a small initial round. They’re likely investing in the business using the profits thrown off by their unlimited desktop storage offering.
And tarsnap possibly uses the most expensive tier of S3:
> the original version, which can survive the loss of 2 datacenters, not the "reduced redundancy" version which can only survive the loss of a single datacenter
These days, there's not just reduced redundancy, but also infrequent access, which seems better for backups…
Keep in mind that their storage boxes only use RAID and do not provide any other kind of redundancy by default. If the machine goes up in flames, your data is gone.
First, we only offer SSH / TCP22 so the transit is encrypted.
Second, we have installed, and maintain on the server side, tools like 'borg' and 'rclone'. So while you might just 'rsync' or 'sftp' your data to us (in which case it would not be encrypted on our end), you can also use sophisticated, encrypted backup tools (borg, duplicity, git-annex, rclone, restic, etc.).
If you choose a tool like that, rsync.net does not hold the encryption keys. The data appears to be random from our viewpoint.
Can someone here translate the PR/marketing speak here for us mere mortals? How does having Facebook tracking on the web front-end of existing and paying users help with lead generation?
Within Facebook, you can use the event stream collected by your FB Pixel to both define conversion criteria as well as create audiences and define inclusion/exclusion criteria for that audience. When it comes to tracking on pages behind auth, primarily it's for audience building which can be used for
– Cross-sell/Up-sell campaigns. Build an audience based on usage patterns, and create a campaign for a complementary service or higher tier (say, for example, when someone clicks the button for a gated feature they don't have access to).
– Suppression lists. If you don't want your campaigns to target existing users, you can build an audience from pixel data on your authenticated pages and suppress against that.
– Lookalike audiences. After you create an audience in Facebook, you can create a "lookalike audience" from that. So even if you aren't actively doing either of the above, you'd derive value from tracking your "best" customers and using it as a seed list for a lookalike audience.
You're also not limited to using the FB Pixel for any of the above. In addition to a browser-side pixel, FB allows you to upload hashed customer information and use those for conversion tracking and audience building. Which used to be completely transparent to end users, but now you're able to see a list of companies that have uploaded your info to FB in this manner (I can't recall where it's buried in the user settings, off the top of my head).
All of that said, it's entirely likely that Backblaze wasn't intentionally sending any of this data to FB to begin with. An insidious aspect of FB's Pixel is that it automatically attaches listeners to a bunch of stuff on the page such as buttons and sends back interactions and associated metadata[1]. The flag to disable this isn't mentioned in the implementation instructions that are generated upfront, and it's actually a fairly uncommon trait for ad pixels. So a typical implementation tends to leave it on out of ignorance rather than make a deliberate determination on whether to use or disable that functionality.
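For anyone who wants to check their own implementation: if I remember Facebook's docs correctly, the auto-attached listeners are its "automatic configuration" feature, and it can be switched off before init, roughly like the sketch below (placeholder pixel ID; verify against the current docs rather than taking this from memory).

```
// fbq is the global function installed by Facebook's base pixel snippet.
declare const fbq: (...args: unknown[]) => void;

// Turn off "automatic configuration" (auto button listeners + page metadata)
// BEFORE init, so only the events you explicitly send will leave the page.
fbq("set", "autoConfig", false, "000000000000000"); // placeholder pixel ID
fbq("init", "000000000000000");
fbq("track", "PageView");
```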
"An insidious aspect of FB's Pixel is that it automatically attaches listeners to a bunch of stuff on the page such as buttons and sends back interactions and associated metadata[1]. "
Um. Holy crap. Is this common knowledge? I don't get shocked easily these days but... wow
I'm not really sure if it's common knowledge, but I'd say it's less common than it should be. I went ahead and made a top level comment[1] with some info about controlling data disclosure to FB's Pixel in case it's helpful to others.
In the analytics space, you traditionally had to:
– Initialize a tracking object on page load
– Explicitly call methods on that tracking object when you wanted to actually send a hit
This is how it works for Adobe Analytics and Google Analytics historically. Many of the newer analytics providers instantiate auto-listeners, which gave them an edge on the out-of-the-box analytics features. And Google Analytics 4 (the newest release), also does this.
So it's not unheard of for site analytics. And a quick glance at a particular provider's website can usually make it obvious if this is occurring, based on the advertised features.
Ad pixels tend to be different though. You create a conversion event within the ad platform, and you're given a snippet of code to fire when that specific event occurs, which both instantiates the tracking object and calls the tracking method with the conversion event's configuration details.
Facebook's pixel works far more like a modern analytics library than an ad pixel. It vacuums up the hit data from the site and the marketer is able to sort it out after the fact in Facebook's interface and use what they want from it. Marketers working within Facebook can see this is happening because they set up the conversions and audiences against the hit data, but that's "just the way things work" in Facebook so they think nothing of it. Marketers coming from other channels will notice how different it is, but won't realize what's actually happening nor the implications behind it. Devs would realize pretty quickly what's happening after a few minutes exposure to the FB Pixel interface and it'd trigger a red flag for them, but that's marketing's territory and all devs see are the snippets provided for implementation. So the only time most people become aware of it is if marketing has someone technical working directly within FB's interface or if the person tasked with implementation has a reason to dig into Facebook's dev documentation rather than just plop the snippet on the page like they were told.
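To make the contrast concrete, the "classic ad pixel" pattern described above looks roughly like this: nothing is reported until the page explicitly decides a conversion happened and chooses what to attach. (The event name is one of FB's standard events; the handler is hypothetical.)

```
declare const fbq: (...args: unknown[]) => void;

// Old-school pattern: the page decides when a conversion happened and
// exactly which fields to report, instead of auto-vacuuming interactions.
function onSignupComplete(planName: string): void {
  fbq("track", "CompleteRegistration", { content_name: planName });
}
```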
Thanks for the link, I didn't know that was a thing.
Having said that, this sounds like "Hey, guess what? We are gonna snoop on you, profile the hell out of you, and leak your sensitive data all over the place (filenames can be sensitive), all because you are a paying customer."
That's about the worst way to disrespect a paying customer. Is there a way to easily identify companies that do this, so I can avoid them?
No, they don't need to snoop on you. Just fire a single pixel saying this is a user that they would like to find more of. FB does the rest based on data in FB, not on Backblaze. The mistake was them just adding it to the online dashboard because that's the easy way to automatically include paying users.
> "easily identify companies that does this"
No. And as explained above, this can happen for a lot of reasons, which makes it even harder to check for.
That PR/marketing tweet comes across as being from someone who doesn't understand how big a deal this actually is and why customers won't be comfortable with a FB pixel on their dashboards alongside data like filenames and sizes.
Some might find this tweet useful for the time being:
> What happens if you resolve http://facebook.com to 127.0.0.1 via hosts-file?
(Or put it into your Pi-Hole DNS Ad-Blocker, or the like.) Does the Backblaze UI still work?
> Answer: Seems to work well. You need to add the main domain and sub domains (www. in this case), btw.
Including the filenames seems to have been unintentional; it looks like they were logging analytics events to Facebook, probably an event (form submission), but it uploaded the form HTML with its contents.
But why they need to submit that to Facebook for paying users I don't understand. The only thing I can think of is excluding active users from advertising... But is that worth the privacy intrusion?
They answered in the Twitter thread: they send data about paying users to Facebook so they can build lookalike audience targeting for new user acquisition. Other major ad platforms (Google, LinkedIn) have similar features.
This doesn’t look like the right data to be sending FB for that, though.
Why not just use client side encryption and be done with it? Why would anyone browse their personal files remotely on a web ui? If you care about privacy, take the proper precautions.
There really isn't any competitor to B2 on price, or convenience (e.g. them sending you a NAS device for recovery).
Signing up for Wasabi resulted in a deluge of unwanted marketing spam, all written to be "cheerful and your friend"
Also, huge warning for Wasabi newbies so you are not surprised. This might wreck your wallet.
Your egress is capped at your total amount stored. You cannot store 5TB and have 6TB of downloads against that account. The front page is covered in 'No Charges For Egress', 'no additional charges for egress or API requests', etc., but the cap is not written anywhere on the main marketing pages, only behind a small link.
Another important point is file deletion. You are required to pay for multiple billing cycles (months) of storage per file. Be very cautious about this. Using it as a temporary bucket will incur significant costs: if you upload a file, YOU WILL PAY FOR 90 DAYS OF THAT FILE. It is almost certainly not cost effective to store files in Wasabi for anything less than the long term. Deleting a file that you just uploaded will incur a deletion fee equivalent to 3 months of storing that file.
One thing people should bear in mind when looking at Wasabi is that they have a 90 day minimum retention period for their PAYG plans. So if you upload a large file and overwrite it tomorrow, you'll be charged the storage fee for the day plus the storage fee for the remaining 89 days, and then the storage fee for the new file, and so on.
The same goes for AWS IA and Glacier classes and Google's Nearline and Coldline. Unless you're storing files for a long time, always factor in the minimum retention period before estimating costs. It'll prevent any nasty billing surprises -- speaking from experience unfortunately.
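A rough worked example of that gotcha, assuming Wasabi's list price of roughly $6/TB-month (check current pricing): upload 1 TB today and delete it tomorrow, and the 90-day minimum means you're billed for about three months of storage, roughly 3 × $6 ≈ $18, instead of the ~$0.20 a single day of storage would otherwise cost.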
They could maybe salvage goodwill with genuine corporate soul-searching that ends up asserting/reasserting values -- and leads them to focus on providing trustworthy service to their users, and conspicuously away from some "tech" industry norms of selling out one's users.
As a provider of a paid service, it seems like they're in a better position to take the high road than a lot of tech companies are, but they have to decide that's who they are, and be clear they mean it.
Yev from Backblaze here -> we’ve looked into and verified the issue and have pushed out a fix. We will continue to investigate and will provide updates as we have them.
The exact phrasing you have used here is repeated multiple times in the Twitter thread, and I can only conclude that "pushed out a fix" is what your marketing department has decided to call what you have done.
What you have right here and right now is a public relations disaster. Trust in your brand has been damaged. It cannot be repaired by you providing minimal information. Your standardised message is akin to "Don't worry your little heads over the details - trust us, everything is fine now", and to be honest I find it a bit insulting. As far as we know, "pushed out a fix" could mean that you have hidden the tracking, so it is harder to find. Your short message is making the public relations disaster worse, not better.
These are the steps that you need to take:
1. Provide an explanation of why tracking was being performed in the first place, including an analysis of how much of that was a mistake.
2. Make an apology for breaching your customers' trust. This is a really important step, and it should be repeated in each of your press releases.
3. Provide details on the steps you have taken to fix the problem, and what that means for tracking data.
4. Make a promise that strictly limits the level of tracking that you will be allowing yourself to make in the future. Ideally we would all want that to be zero, and if you intend to do business with certain jurisdictions then you are limited to what is legal, but you must in any case be clear about what tracking you will ever do.
Honesty and transparency are the keys at this point to restoring your brand. I do not think the community will accept anything less.
Yev, if you are not in the upper management chain of Backblaze, please show mnw21cam's message to someone who is.
The problem is not that there was a little bug which caused the Facebook tracker to get a few little pieces of information it shouldn't have.
The problem is that Backblaze failed to understand how to distinguish appropriate and inappropriate uses of third-party trackers for signed-in users on a security-critical application. The Facebook pixel should never have been there at all. It shouldn't even have been considered. It should've been an absolute no-brainer that Facebook has no business being on secure pages on a critical infrastructure service for paying customers.
The fact that the pixel even showed up at all on a logged in page represents a breach of trust for customers and casts doubt on Backblaze's competence in handling security issues. This warrants a serious reply from the CEO, not a copy-pasted meaningless reassurance.
"What you have right here and right now is a public relations disaster. Trust in your brand has been damaged. It cannot be repaired by you providing minimal information."
I'm not sure that's true.
There are a number of barely-technical subreddits, typically centered around Plex or some flavor of bittorrent, that consist of an all-day-every-day request for "as much cloud storage as possible for the lowest possible cost and please say it's free".
These are not HN readers. It's as if they attained consciousness two minutes ago and one minute ago they decided they needed cloud storage.
This is the audience these kind of analytical tools are geared toward and I don't think this changes their "engagement" or their "convertibility" or their "lifetime value".
The real information here is that in 2021, and in conjunction with a much more sophisticated product offering (B2), Backblaze is very aggressively pursuing flat-rate, loss-leaders who can be influenced and targeted by facebook.
I understand that is a lower price than your own product, but it doesn't necessarily follow that they're losing money.
You seem to be implying that they're covering the losses on storage by selling customer data. Is that correct? In which case, would you like to be a bit more specific and explicit?
No - I am drawing a distinction between B2 and the "unlimited" plans that I assume to be loss-leaders.
Aren't those the kind of users you can monetize through social media "engagement" ? I don't think it's the B2 users... which class of users had data leaks ?
The floor on keeping HDDs spinning is about $4/spindle-month (assuming $0.60/kWh total cost in a data center), and 150 drives probably eat more power than the rest of the enclosure and a rack switch, so call it $6/spindle. Backblaze aims at $30/TB upfront cost and seems to have only >=4TB spindles, so there's plenty of room for unlimited-plan users to keep making them money at $6/month. I, as a fairly nerdy person, only need to back up a hair under 2TB at this point, and I assume I'm past the 90th percentile of BB's unlimited customer distribution, and they would still make money off of me if I kept 2TB there >15 months on the unlimited plan ($30 / (($5 − $6 per spindle-month × 2TB / 4TB per spindle) per month) = 15 months to break even).
I think their best advertising for B2 is the quarterly hard drive stats; it's a shame they made this mistake with the FB tracker. I can't store data in my garage as cheaply as they do: ~100W continuous for a box with drives at $0.30/kWh (CA PG&E) is $20/month.
I'm jumping through the hoop of s3backer to turn B2 into a vdev for a zpool with encrypted datasets; I'll have to try something similar on rsync.net with sshfs.
But just to be clear: is your claim that the unlimited storage consumer backup plan is a loss leader, and they cover their costs by selling the backed up data (or metadata) to the likes of Facebook?
Then I genuinely have no idea what your goal here is. What is your claim?
It seems like you're trying to waft a very serious claim in the direction of competitor, but you're being very careful to avoid saying anything explicit or specific.
My long held contention - which I have repeated in this forum many times and wrote into a formalized blog posting 12 years ago[1] - is that flat rate data service offerings pit the provider against the consumer in an antagonistic relationship.
Which is to say: the provider has a vested interest in minimizing your usage and incurred costs which runs directly counter to the consumers desire to use as many resources as possible. This antagonistic relationship leads to all manner of dysfunctions and bad patterns.
When I see serious businesses "enhancing engagement" with facebook pixels, I think that perhaps that is one more side effect of that antagonistic provider/customer relationship.
HOWEVER, it turns out that the tracking code was on the B2 side of things - the people-paying-money side of things - and not on the who-will-let-me-upload-movies-forever side of things.
So my sense was wrong.
I was suggesting that this might not be as brand damaging - and trust eroding - as my parent suggested. After all, both sides of that unlimited flat rate storage relationship are pretty dysfunctional. If this was on the B2 side of things then I take it back - it's probably quite damaging.
Regardless: I stand by my disdain - and continue to warn against - flat-rate service offerings. You want your provider to happily enable you to use more of their product.
While on a personal level I agree with you, there is a saying in Dutch: "je moet niet wrijven in een vlek", which translates to "don't rub in a stain".
Sort of like the Streisand effect: if BB starts posting press releases about "breaching trust", filled with apologies, the audience that becomes aware of this issue will be a lot bigger than it currently is.
So from a PR perspective, a one-line, low-key "we pushed a fix" reply makes total sense.
Being open and making clear that you understood you goofed would be total opposite of the Streisand effect - where you'd try to _silence_ discourse with the unintended effect of amplifying it.
Right now, the issue is being downplayed, much to the chagrin of an increasingly knowledgeable set of computer users.
Yev here -> re: fixing the issue - we removed the offending code from the logged-in web pages. Rest assured that we are taking this very seriously internally and we will continue to investigate and provide updates as we have them.
Yev here -> Absolutely hear you on that. We wanted to make sure we knew what we were talking about before we made any declarative statements. We just finished our root cause analysis and updated our blog post with updated information: https://www.backblaze.com/blog/privacy-update-third-party-tr....
Yev here -> we wanted to make sure we were able to investigate the root cause so we knew what we were talking about. We've completed that root cause analysis and have updated our blog post with additional information -> https://www.backblaze.com/blog/privacy-update-third-party-tr...
Thanks for participating in this thread. As a longtime paying customer, I consider this a monumental security breach and I will be leaving the service. It’s clear that Backblaze have prioritized growth hacking or whatever over the security and privacy of me as a paying customer, and that your security processes are woefully inadequate.
I might very well start using backblaze next year or maybe even next month[1], but that is depending on the outcome of this event.
For a comparison: a good friend once called me to apologize because he had been laughing behind my back with some friends.
Guess who I definitely trust today? The one who admitted his mistake. He always was a nice bloke and I guess he will never do anything like that again.
[1]: I won't start using it this week or the next however.
What kind of scenario do you have in mind? I think it’s possible to turn an incident like this around, PR-wise, but I can’t see how they will explain how they can sell something as secure and trusted, when their security process was unable to discover that they had deployed spyware in production. If it was there for a few hours before it was discovered and removed, well maybe.
Just hope they announce the changes to security and implementation processes rather than just if they fix this issue or not. This really shouldn't have occurred in the first place, so you want to know they've fixed the root cause, bad process, not just the symptom.
As another data point, Backblaze has pretty much until this weekend to provide an update that includes "we have removed Facebook and are getting a 3rd party review for other security holes in our product". After this weekend, I won't care because I'll be on a different product.
This is embarrassingly bad, as in, I'm now embarrassed for recommending using Backblaze at my company.
Crap, I just realized I got Backblaze installed at two previous companies. Thanks!
Thanks. However, it is just not excusable and a breach of trust. The Backblaze Twitter communication makes it pretty clear that they don't see a problem with tracking paying customers. We are moving somewhere else.
If a company writes childish messages in a place where childish messages are expected, I wouldn't use that to judge the rest of the organization.
That might be and I am not judging you for this perspective. From a company handling my data I however expect them to be professional wherever they articulate themselves about a serious problem.
Does Wasabi do something similar? Have you checked? There’s probably a very limited window to do that, I’d imagine everyone in this space is checking their trackers now.
I just signed up for a free trial with Wasabi and they include trackers from Google analytics and LogRocket. I'm not sure if they send filename data across the wire.
I've been considering using Backblaze for both personal and company needs -- and we're talking 50+ TB here -- but this incident made me reconsider.
I'd still use Backblaze, but that's VERY dependent on how you handle this. Just saying "we fixed it" doesn't answer the much more fundamental question of "what is the FB tracking pixel doing on a privacy-critical page in the first place?".
Please, do a thorough post-mortem analysis and publish it. Looking at the comments here, this could mean you get or lose the business of many.
Fixing the problem is only half of it, you need to make a commitment to a comprehensive and transparent review of the engineering practices that allowed this to go to production. And fully disclose how long it has been going on.
Hi Yev. It's great to know that you fixed it. Could you share since when you had this issue? I'm a long-time paying customer and feel somehow betrayed here.
According to Backblaze's own policies, they will be emailing you about this data breach "without undue delay" — at least if you've logged in while the breach was present:
> In the unlikely event of a data breach, as defined in the GDPR, Backblaze will without undue delay send its affected customers a notification email, and provide at its discretion, updates through other communications channels. This notification will describe the nature of the data breach, including where possible, the categories and approximate number of data subjects concerned, the categories and approximate number of personal data records concerned, the contact point where more information can be obtained, the likely consequences of the personal data breach, and the measures taken or proposed to be taken by Backblaze to address the data breach, including, where appropriate, measures to mitigate its possible adverse effects.
Will you be pushing out a fix to address the complete lack of any process that should have flagged and prevented this from happening in the first place? The quick fix I am pushing out for my clients who use Backblaze is moving them to another backup provider.
Yev here with a brief update on the fix that was pushed out - we removed the offending code from the logged in web pages. We will continue to investigate and provide updates as we have them.
I would think the type of customer that uses Backblaze is orthogonal to a typical Facebook user and is actively hostile to Facebook's practices. Even considering using their spyware in any shape or form is an egregious breach of trust, and the response from your company needs to be significantly better to restore confidence.
I'm in engineering at a financial services company. When we built our front-end UI for eKYC, our marketing team requested Google Tag Manager, the Facebook pixel, and various other tracking features to be built in.
I had to fight hard as an engineer to make sure that it did not happen. We had meeting after meeting, and it took a lot of effort for me to explain the risk of data leakage. I was questioned on my "insecurity" for not "trusting" people. It was not a nice experience. I had to inform them that tracking needs to be dealt with properly, not just lazily installing Google Tag Manager because it gives marketing 'flexibility'.
The only thing I knew about 'tag manager' before this was that it was always blocked by NoScript. Your comment made me go look up what it does and now I know that I will never unblock it.
Apparently it lets people drop in random code from a bunch of different analytics platforms, so it's pretty much guaranteed to consist entirely of the sort of stuff I have NoScript enabled to block in the first place.
I've seen GTM take down production multiple times because of marketing shipping random JS with no approvals.
Some random guy in his basement assured someone in marketing they could handle our volume? Chuck their tag in and watch their website get DDoS'ed with millions of requests per minute, which takes out our website because marketing made it fail loudly.
I'm surprised that GTM doesn't handle that. They would have a good idea of how long requests are taking to different domains and limit requests to the slower ones.
I mean, yes, true, but it kind of misses the point: marketing doesn't ask you, they go over your head. And we're just the BOFH pinheads who make everything so harrrrrd with our stupid "concerns."
IT can often be "we make someone else's bad idea happen," and that's because we simply lack veto power.
That's par for the course for technically orientated roles. I was labelled 'defensive' and denied a pay increase because a business development executive wanted to make our documentation dynamic based on user access, and I pointed out it was difficult to find a solution when users can have over 400 access permissions, which varied by country, and our documentation was 900 HTML pages, some of which were equivalent to 200 A4 pages.
If you are the most knowledgeable person then you get blamed for their bullshit fantasies being impossible or unwise (or illegal)
I was questioned on my "insecurity" for not "trusting" people.
"I trust people just fine. I trust people working for outside companies dependent on information gathering to gather information. Google has no fiduciary duty to our clients. Same with Facebook. We DO have a fiduciary duty to our clients and it includes not doing things that may send their confidential information to third parties because SOME of the information used may be useful to our marketing department."
The reason for that is I'm the CTO, responsible for building out the tech / engineering team.
Hiring people is difficult as there is a lack of supply of talented people. We hire InfoSec on a contract basis, not full-time, and they don't join such meetings due to the nature of the contract. So all the responsibility fell on me to defend our technical decisions at that point in time.
I'm working on building out the engineering culture / awareness within management now, to ensure these things do not happen, and I don't have to be questioned as to why we cannot install "google tag manager" in our front-end.
It all comes down to creating awareness, and making people understand. Fortunately for me our CEO gets it, he ended up siding with me.
Honest question: how can a financial services company not have an in-house infoSec team?
To me, this is an even more concerning issue. But then, I have no idea how the finance services world works, so maybe this is more common than I think?
FinTech doesn't always mean global mega bank. There's lots of small scale start-ups that fit into the financial services category that wouldn't/couldn't afford full time InfoSec roles.
Outsourced CISO/InfoSec is a valid and reasonable thing for some companies.
I feel like a small scale startup needs internal infosec and audit teams even more. Unlike the incumbents, who are "too big to fail" and therefore are able to get away with blatant insecurity, a startup's in a much more vulnerable position, and any security breach is significantly riskier in terms of corporate longevity.
If I was running a financial services startup, those groups would be near the front of my list in terms of internal hiring.
There are huge repercussions. We are governed by PDPA (our GDPR equivalent) law where penalties are extreme. Thailand takes Data Privacy as seriously as EU.
But again, since it's not always easy to find the right people I end up having to fill in for everything we don't have a team member to execute on.
Worth sharing this thread around your company, maybe?
Could help with convincing people that there are negative marketing consequences to careless and seemingly harmless decisions.
No, it has nothing to do with tax. Usually contractors have a very fixed scope; they focus on doing what is in the contract. Things that come up ad hoc, like marketing requesting the installation of Google Tag Manager, are outside the scope of the contract. It would require a lot of giving them context, amending the contract, etc. It's not necessarily convenient to have to ask them to come in every time there is a problem.
Usually I try to reason with management first. If it can be resolved internally we would not include outside consultants, however if it gets serious beyond something we can handle internally we would ask outside consultant to come in.
LOL, I worked at a place where we uploaded all of our inventory for a Facebook integration. We charge customers millions for this data, but "we need to show up on Facebook" takes the cake.
Sure, in the end I asked our head of compliance to join the meeting. I made it very clear to everyone what was at stake, and that I’ve done my duty in raising awareness. If they would like to proceed I hold 0 responsibility. Usually when you do it like that no one wants to put their neck on the line. Our head of compliance take this stuff really seriously as he has to report to central bank, so him and our CEO ended up agreeing to not use GTM.
You really need to present well and be careful with arguments like "millions of websites use GTM". I did days of research and presented that while using GTM on WordPress sites that hold no sensitive data might be fine, we are a financial services company and we collect customers' private data. So getting everyone on the same page and presenting alternative ways of solving the problem was critical.
It's scary to think that a company that seems to have a decent policy on privacy / data collection practices at one moment is just one step away from some marketing manager or MBA changing that. It's really hard to regain customer trust once you lose it, and in Backblaze's case it seems to be for marginal, if any, monetary benefit.
I think part of the reason is that most of these companies don't value customer trust.
It's a lot more than just the person in marketing not realizing. You can't expect marketing to understand these things.
It means there's either nobody reviewing the privacy implications of marketing decisions, or that somebody who knows better is reviewing these decisions and decided leaking data like this is acceptable.
Both those possibilities make backblaze a non-starter for me now.
> You can't expect marketing to understand these things.
I work in marketing ops and I know how difficult it is to get marketing folks to understand or care about how the tracking they use works. There’s a small but growing number of us trying to change behaviour and awareness from the inside out but if marketers fuck up on privacy an example should be made of them.
Exactly this. If this was possible, and either nobody said it was a bad, bad idea or somebody did and got overruled, it makes me think this product is no longer safe. I'll give them some time to publish a post mortem, and in the meantime I will look at where to move my data.
It's been a problem almost everywhere I've worked. Marketing / business folks just want their analytics, and they don't want to spend any money on it (e.g. they don't want to pay for some privacy-respecting 3rd party service, and they don't want to pay devops to stand up an internal OSS alternative).
Unless you have really strong dev leadership, the trackers will end up in your product.
I'd love to hear from folks who have successfully blocked their business team on this. What tactics did you use?
* Announces this breach to the relevant data protection authorities. They have 72 hours from learning about it to give an initial report to the UK's ICO.
* Makes a blog post apologizing, explaining how it happened, and what they've done to prevent it happening again.
I doubt it’s intentional, but companies should seriously consider if the benefits of integrating things like Facebook Analytics tools outweighs the negatives. It seems like considering their audience they would not use Facebook of all things.
I operate quite a few apps and recently launched a website too, which handles a lot of sensitive user data. I decided to make not having any analytics, trackers, or ads a selling point of my apps and sites. I get a lot of positive emails from customers thanking me for that. I was recently even wondering whether Google penalizes sites in its search results for not putting its analytics on them.
I legit don't understand why a paid storage service would put a FB pixel on their dashboard which handles user files. It's a completely foreign concept to me. This seems like a screw up but also erodes a lot of trust which is unfortunate as I had been looking at them for past 2 months actually.
I even made a post just yesterday, and another a couple of weeks ago, on how Backblaze's inability to set a specific file name, file size limit, and expiry date on pre-signed URLs is preventing some of us from switching from S3 to Backblaze for our app data storage needs. And surprisingly, I wasn't the only one, as a few people responded with the same concern.
> A limitation I ran across when using B2 was that their pre-signed URL generation doesn't allow you to set file-size limits, nor does it allow you to set the file name in the pre-signed URL. It simply gives you a pod URL to upload to. So if you are using B2 as storage for, let's say, image uploads from the browser, a malicious user has the ability to modify the network request with whatever file name or file size they want. Next thing you know, you have 5GB-sized image uploads happening... This pretty much prevents me from using B2 for now.
> I ran into the same limitation! IIRC, there also wasn't a way to expire a signed upload URL sooner than whatever the default was, which was hours or maybe a day. I had the exact use case you mentioned, too - image uploads bypassing my backend server. I didn't want the generation of a signed url to, say, upload a profile photo, give carte blanche to create a hidden image host when combined with the limitation that you highlighted. All sorts of bad things could come of that. I ended up just going back to S3 - costs more, but still worth it.
Since this is for a site/app which lets users upload data, I am really trying to avoid S3 due to crazy costs. I might look into DigitalOcean's offerings. Anyone have any other recommendations?
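For comparison with the B2 limitation quoted above, this is roughly what the S3 side looks like: a presigned POST can pin the object key, cap the size, and expire quickly. A sketch with the AWS SDK v3 (bucket name and expiry are made-up values):

```
import { S3Client } from "@aws-sdk/client-s3";
import { createPresignedPost } from "@aws-sdk/s3-presigned-post";

const s3 = new S3Client({ region: "us-east-1" });

// Short-lived upload grant locked to one object key and a maximum size,
// so the browser can't pick its own filename or upload a 5GB "image".
async function signedAvatarUpload(userId: string) {
  return createPresignedPost(s3, {
    Bucket: "my-app-uploads",                                   // hypothetical
    Key: `avatars/${userId}.jpg`,
    Conditions: [["content-length-range", 0, 5 * 1024 * 1024]], // max 5 MB
    Expires: 60,                                                // seconds
  });
}
```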
> Since this is for a site/app which lets users upload data, I am really trying to avoid S3 due to crazy costs. I might look into DigitalOcean's offerings. Anyone have any other recommendations?
Dumb suggestion: Run it yourself? Minio is easy to use, even in multi-server mode.
That's one of the things I was looking into too actually. Though if I want to go with Minio, I would also want to do that on my own physical servers instead of cloud to completely cut out third parties. Do you have experience in that? I guess I would need a proper dedicated high speed internet for it too for any decent traffic?
Sorry, I've never needed to use it like that (and, full disclosure, I've never run minio in prod either); I can recommend Hetzner dedicated servers because they don't bill for egress but that's it
I get why they might be doing it (audience building), but they should have restricted it to their homepage or some info pages. (Check azernik's response on how to do this correctly!)
It's easy to attack Backblaze.
Before attacking them for this, please make sure the company that you are working for or building doesn't do the same thing. (I know for a fact that a lot of startups make heavy use of audience building.)
> Before attacking them for this, please make sure the company that you are working for or building doesn't do the same thing.
I don't get why these two are related at all. One can do both the second and the first. By their own admission in other HN threads, Backblaze earns several million dollars a year and is proud not to have VC backing. So it doesn't seem like anybody is attacking an underdog who's struggling to change the status quo and needs to be held to lower standards.
Have been a Backblaze customer for many years, mostly because of their hard drive stats posts here on HN. Lost all confidence in Backblaze.
Alternatives? I had been using rsync.net at startups for many years, but it was more expensive (I'm now using Backblaze for storing GBs of raw images from DSLRs).
Did you read anything more than the incredibly poorly worded headline?
The data was harvested by the Facebook pixel as part of their audience-building tech for acquiring new customers. So Facebook is the one doing the selling; Backblaze just happens to have been quite careless here, but they are not "selling your data".
The breach of trust is BB letting third party code on their platform and especially from a particularly untrustworthy third party. That's it, it's egregious and should be taken seriously, that also means discussing it seriously.
This kind of hyperbole is counter-productive, it only makes it easier to ignore your concerns as "crazy overreacting".
Wow they pretty much flushed all user trust and good will down the toilet with this. I had considered using them in the past but I sure won't be considering them now. Even if this were an accident how are there not procedures in place that would have required approval and vetting?
Does anyone have any idea how long this has been going on?
And: it only happens when rendering the filenames to a browser window right? So only when I browse folder y in bucket x are my filenames for that folder shared with FB?
Backblaze, I really enjoyed being a paying customer. Until now. Bunch of dorks.
Edit: goes to show you have to encrypt EVERYTHING at rest, even file names...
The ad hominem stuff isn't classy at all. I agree.
It was a heat of the moment thing. Mostly I'm being angry at invasive tracking being the norm when going down the 'growth hacking' path. I must lack perspective but it saddens me that contextual advertising and focus groups apparently aren't enough.
edit: to make up by adding something actually useful to the discussion... I checked and I don't see any DNS requests being made for any facebook domain when browsing my B2 buckets. Maybe by now they got rid of the tracking pixel?
Great. I finally shook off my lethargy and did some research and thought they were perfect feature and price-wise. Setup a backup script using rclone and did a backup yesterday - worked great too - now this.
Had a look at rsync.net - too pricey for my puny sub-500GB data.
I'm curious to know what emailing you would help with. Is it a lower than published pricing tier plus no 400GB minimum order? Or some special plan based on geographical location? Something else?
FWIW, I agree with you on your previous posts/thoughts and I'm all for pay for use and against flat rate/unlimited plans, as long as it fits in a budget.
This should be trivially and universally fixable everywhere:
Never include third-party marketing scripts on any page where a user is authenticated (a minimal sketch of such a gate is below).
But of course that would deprive many companies of mountains of valuable data so it ain't ever happening, right?
---
Also, am I missing something obvious here? If Backblaze -- or anyone else really -- wants analytics, what do they need the Facebook pixel for? There are so many good analytics services out there.
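The gate mentioned above doesn't have to be fancy; a minimal sketch (the auth check and tag URL are placeholders): don't even inject the third-party script when a session exists.

```
// Hypothetical helper: however your app knows that a user is signed in.
declare function isAuthenticated(): boolean;

// Only inject third-party marketing tags on public, logged-out pages.
function loadMarketingTags(): void {
  if (isAuthenticated()) return; // authenticated pages get no trackers at all

  const s = document.createElement("script");
  s.async = true;
  s.src = "https://example-tag-vendor.invalid/tag.js"; // placeholder URL
  document.head.appendChild(s);
}
```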
I'm also a long term paying customer and I'm leaving for sure.
If they can have FB pixels in the admin area, they basically have no security processes working. If marketing drives their tech decisions, this is not a company to trust your data to.
Not all of the various silos within an evilCorp structure know what the others are doing, if the groups within the same silo even know what is going on.
It's inexcusable, honestly. These are the kind of decisions that not only need department head approval but also CTO approval. If the CTO didn't see it coming, they failed.
HIPAA isn't some random cert you have to satisfy a single big customer. It needs to be a priority and all business decisions have to be made around it. GDPR is another one.
This only happens if you use their web interface, right? If you only interact with your B2 storage via their API, such as through a backup program like Arq, there would be no Facebook involvement?
And they're back. As far as I can tell, they've removed tracking. ublock is no longer blocking anything on that page, and I do not see the cd[buttonFeatures] property in headers.
Imagine if S3 did this. I don't like Amazon, but at least they are security professionals.
Ad tracking pixels in your object store dashboard is just clownshoes from a security engineering standpoint, over and above the fact that it's a slimy, dickhead move for a paid service.
It's possible to manage the chrome browser centrally via g suite, and push out DoH settings. I use this in conjunction with NextDNS to put all browsers organization-wide onto a single blocklist, and block the common tracker hosts (GA, GTM, Facebook's domains, et c).
I know that many computer systems have to be certified or compliant with a spec to be used (HIPAA). Is there a possibility that such data being sent across the wire to a 3rd party would break such compliance?
The thing is, even if it was a dumb mistake, it's one of those mistakes a company like Backblaze can't afford to make.
If they don't pay attention to stuff like this, then why should I trust them with anything at all? This isn't some minor oopsie, this is failing to deliver on their core product[1]:
> Top Backblaze B2 Use Case Solutions
> Backup & Archive
> Store securely to the cloud incl. safeguarding data on VMs, servers, NAS, and computers
Backblaze is not led by dumb people. This was a conscious decision that they made, and they got caught. I used to recommend them to people and I never will again.
A dumb mistake that leaks private information to a third party seems like a good reason not to trust or use their service. For me, the ideologue, the fact that they're sending any tracking data from the dash to Facebook is reason enough.
I know this is a common experience on many sites, but I'm a paying customer on flickr.com (have been since ~2004) and the other day I visited in a new browser or in private mode and got the "we use cookies..." dialog; it lists over 90 advertising companies and probably over 500 tracking options.
Want to see it, go to flickr.com in a private window, it should pop up something about cookies. Pick "Manage Settings". It's insane.
Just an FYI in case this isn't clear to people. Even though this is a serious problem, it's super unlikely that FB is using the filenames in any way, or aware they were being sent. That part is probably just a (really awful) mistake.
This is the same company that has published iOS apps that have silently escaped the sandbox. I think pathological is probably an appropriate word for them and wouldn't trust anything remotely related to them with a single bit of personal data.
Possibly, but company culture transcends the immediate team. And if your business model is based on data siphoned off every possible channel, the concern is very justified.
super unlikely that FB is using the filenames in any way
First, we don't know that. Second, it doesn't matter even if they are not using it in any way. Backblaze shouldn't be sending this data to FB. And lastly, even if FB isn't using it in any way today, how do we know they won't use in future?
Nobody should be having access to any data that they don't absolutely need. It doesn't matter how mundane the data is, which company the data is going to, etc etc.
More so with all the machine learning code being deployed to gather better information about people and thrust more ads on them, it's possible that even those who work at Facebook may not know the extent to which all this data is funneled into various systems and attempts are made to extract monetary value from it.
Like I said, it's definitely a problem regardless.
But this is like saying it that if you misplace your cell phone, it doesn't matter whether you left it in your friend's car or a taxi. Of course it matters whether or not data is being (mis)used.
Facebook, like other GDPR compliant companies, deletes most data after 90 days by default. My guess is that this data was not intentionally stored, and any place where it was will be deleted manually within a few days as part of remediation for this issue. Even if that doesn't happen, it would automatically disappear after 90 days.
> Facebook, like other GDPR compliant companies, ...
Isn't there a lawsuit or similar going through the EU courts atm about Facebook not really being GDPR compliant? eg They claim they are, but the court case is about them not being.
Surely Backblaze needs to do more than just fix the pixel and "investigate". At a minimum:
- provide customers with an exact timeline of when the pixel was introduced so they can work out for themselves what data has been leaked
- report themselves to relevant regulators in each jurisdiction that they operate in for leaking customers' sensitive data – under the EU GDPR there is a time limit for this.
- get on the phone to Facebook and beg them to demonstrably delete the data
Filenames are not the worst possible thing to leak, but they can be sensitive and it's not good enough to just go "oh, oops, we implemented it wrong, we've fixed it now".
I appreciate that their Twitter account is looking into it, but feel like this should be a pretty quick fix. I'm hesitant to even log into my Backblaze account again until it's sorted.
Considering this, does Backblaze regularly undergo audits to make sure there aren't any security and privacy holes that go undetected? I know not all services do this, but Mullvad, at least, have built a nice VPN reputation on account of it.
Might be worth looking into, if nothing else then for public perception reasons.
I'm a long term B2 customer and left FB a long time ago and block trackers.
The major point to me is: If marketing at Backblaze drives decisions that influence security and tech has either no say in this or is not competent enough, it's not a company for me to trust my data with.
I think Backblaze will think two or three times before integrating spyware into their product in the future, so I agree that jumping ship is the wrong answer. Now, if it happens again, then yes. I think jumping ship is the right move.
In a sense, that makes life easier for me. One less alternative to consider. The name "Backblaze" is burned forever. You will never win me as a customer. I write this here because people in similar positions at similar companies might read this. And I do not think I am alone.
Backblaze signs BAAs with companies storing personally identifiable medical information. I wouldn't believe anyone who told me that this Facebook data leak was turned off for those customers; they should immediately be investigated and fined if any such breaches did indeed happen.
Full agreement, any interest and goodwill towards the company is now completely gone.
Unfortunately HIPAA is actually specific in this scenario. I highly doubt file sizes have anything to do with PHI. However, if you can show that they are sending the actual data in the backup, that would be a reportable offense.
Disclaimer: I am not a lawyer, this is not legal advice, YMMV.
Been a huge fan of Backblaze for years and b2 was my plan for server backup; always seemed like a great underdog company in my eyes- they just lost me. I already canceled my Spotify for a similar —Facebook-related— reason.
Probably because Spotify is using the Facebook SDK in all their apps. As a user you cannot disable it and a few years back the Facebook SDK caused an outage where you couldn't start Spotify (and many other major apps).
One possible, slight, softening factor - if it's an accident/bug, I might consider them, if they can explain exactly how it happened and what extreme measures they'll take to ensure that it never happens again. But... their initial answer on that thread is
> Believe that's the Facebook pixel we use for tracking, we've forwarded to our web team for review in case that is not intended behavior.
Just... "in case" it's unintended is not promising.
It's disappointing that a company dealing with sensitive data resorts to third party tracking solutions (and then Facebook of all things!)
Especially a tech shop like backblaze who have engineers building amazing tech in general. But then you cheap out on implementing some basic metrics for the web UI. Do you even need all the bells and whistles Facebook offers?
It is just lazy engineering. They want lookalikes. Fine. Collect, clean, and mask the data in house and send it to FB. Dumping data to a 3rd party directly from the page is irresponsible to say the least, as such "oopsies" are bound to happen.
they can’t do that. they don’t have the original data (it is owned by fb) to create the lookalike audience. methinks you don’t understand how these tracking doohickeys work
FB allows a custom audience to be built from identifiers like email, phone, etc. They claim it is hashed before use so they never see the raw data (not sure if they mean hashed client side or server side though).
This will rely on a user's FB account having the same email as used for BB, which could be unlikely in the case a company is paying for it. But it should work well enough for retail targeting.
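For reference, the hashing in question is (as far as FB's customer-list docs describe it) plain SHA-256 over a normalized identifier, so matching works without shipping the raw address. A small sketch, assuming email and Node's crypto module:

```
import { createHash } from "node:crypto";

// Normalize the way customer-list matching expects (trim + lowercase),
// then SHA-256; two spellings of the same address produce the same hash.
function hashEmailForUpload(email: string): string {
  const normalized = email.trim().toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}

// hashEmailForUpload("  Alice@Example.com ") === hashEmailForUpload("alice@example.com")
```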
All they say is that they don't learn anything new about your customers. I would take that statement with a lot of caution. For one, they now know that I am a BB customer. Having a graph of all the companies and products I use is hugely valuable personal information.
> They claim it is hashed before use so they never see the raw data
If they can match hashed data with real data, they can know more than they did before. Depending on what algorithm they use for hashing (there is no mention of it), they could be using a similarity hash, which would tolerate minor differences in the dataset.
Let's say I find a profile by comparing the hash of an email to emails in Facebook's database. I can then compare additional information to see if a customer has provided incorrect information to Facebook as a user. Facebook could check if my address is similar to the one on online shopping sites I use and, if not, flag the account.
Yeah there is a lot of room to hide in the gaps of what they said.
Still, main point is you only need an identifier and none of the other data Facebook has. Pixels are not required for this as noted in the original comment, they probably have enough in the account details already.
> they don’t have the original data (it is owned by fb) to create the lookalike audience.
They have the original data on their own audience. So they can send it explicitly to FB (instead of FB sucking it from the page's pixel, so to speak) and FB builds the lookalike audience from there, using the wider FB-owned data.
The data they hold on their audience is irrelevant here; e.g. BB is not collecting age and gender, and they don't have the web history of the user (which the FB pixel and thumbs icon enable). The lookalike audience is based on the FB profile.
I have a question about creating lookalike audiences by sending data to FB (either separately or through pixel tracking). Is that data not by definition PII, and so are they likely violating the GDPR by doing this?
The data sent by a pixel is not PII; it enables lookup of existing PII. The company putting the pixel on their page isn't collecting PII, and this is outside of the GDPR. The accidental transmission of file names doesn't seem to be PII to me.
I'm guessing Backblaze will come back with a more detailed report on this, but I wonder if the data already given to Facebook can be deleted properly (so that Facebook doesn't have it to use for other purposes elsewhere). Is that even possible to get done for a relatively large company like Backblaze?
Also, isn't this a violation of GDPR in EU, or is there a "you opted in by default when you logged in and so it's not our fault" argument in play here?
Not even sending an advertising id (or anything else) to a Facebook url when I visit a public non-Facebook page is ok. This is unbelievably far from ok.
Yes, but if marketing drives their security decisions, where else do they leak? What data do they sell? If they play loose with your privacy, in what areas do they also play loose with your security? With your login email? Payment data? Do they sell your credit history? How do they vet their employees?
I was just thinking the other day about whether it was worth using rclone to encrypt the file names in the B2 bucket I use for backup, given that it limits you to the rclone tool for accessing files during recovery.
Given Backblaze's good track record, I am assuming this was default FB pixel tracking behaviour. A company like BB should however know better than to use FB tracking on their paid products.
Seems like a poor decision / dumb mistake. Fix it and move on. I can understand the impetus to need FB ads & lookalike audiences to grow the company -- it serves no one if BackBlaze goes bust. It's obviously wrong to send filenames to FB as well.
If they fix this, goodwill will be burned, but it's not a dealbreaker for using them. If they don't fix it, then yes by all means leave the service.
Tag manager is used for managing all of the various tracking and analytics snippets for a site in a central location, so if your ad blocker stops tag manager itself, you often preempt many other things that would run. In this case I’d assume the facebook pixel was installed using tag manager.
They also don't allow you to encrypt data on first setup, so you need to first upload a few files unencrypted before you can set your key. That's why I'm not using them. The basic encryption feature is not well thought out...
Is this just getting press because it’s Facebook? Anyone that works at a company that runs a proxy server to access the Internet should just check the referrer fields leaving the company for requests to sites like Gravatar.
It's getting press because it is siphoning off the page contents which include the names of files you have stored. It is quite a different scenario than just loading an image through an img-tag (which can be controlled through things like Referrer-Policy these days)