Hacker News

28 minutes ago by azernik

One easy way to avoid this kind of mistake in your own product: make a clear distinction between your publicly-facing web site ("corpweb") and your web app for logged-in users. Preferably, they should be served from separate infrastructure.

Corpweb should be as static as possible, except for whatever third-party JS the marketing professionals think is necessary. It's their job; they know what's best.

Your app should have zero third-party JS except for technical analytics (New Relic, Datadog, whatever).

(This distinction can be fuzzier for free services, and for consumer stuff with non-sensitive data; Backblaze is neither of those.)
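The "zero third-party JS" rule for the app can be enforced by the browser rather than by code review alone, using a Content-Security-Policy response header on the app's own origin. A minimal sketch, assuming the only approved outside vendor is a hypothetical monitoring origin `https://monitoring.example.com`; the directive names are standard CSP, but any real app's policy will need tuning.

```python
def app_csp(approved_origins: list[str]) -> str:
    """Build a Content-Security-Policy value for the logged-in web app.

    Scripts and outbound requests are limited to our own origin plus an
    explicit list of approved monitoring vendors. A tag manager or ad pixel
    added later is refused by the browser unless this list changes.
    """
    sources = " ".join(["'self'", *approved_origins])
    directives = {
        "default-src": "'self'",
        "script-src": sources,         # where <script> code may load from
        "connect-src": sources,        # where fetch/XHR/beacons may send data
        "img-src": "'self' data:",     # blocks 1x1 tracking images on other hosts
        "form-action": "'self'",       # forms may only submit back to us
        "frame-ancestors": "'none'",   # the app cannot be embedded elsewhere
    }
    return "; ".join(f"{name} {value}" for name, value in directives.items())


# Hypothetical monitoring vendor origin, for illustration only.
header = app_csp(["https://monitoring.example.com"])
print("Content-Security-Policy:", header)
```

Because `script-src` omits `'unsafe-inline'`, the inline bootstrap snippet that ad pixels and tag managers usually rely on would be refused as well. Corpweb, served separately, can carry its own looser policy.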

6 minutes ago by walrus01

I agree with this delineation, and if necessary, it lets the sales/marketing people go absolutely wild with the tracking, analytics and CRM-system integration on the public facing marketing website, should the C-level people decide to allow them to do so.

2 hours ago by allendoerfer

In a sense, that makes life easier for me. One less alternative to consider. The name "Blackblaze" is burned forever. You will never win me as a customer. I'm writing this here because people in similar positions at similar companies might read it. And I do not think I am alone.

44 minutes ago by colechristensen

https://help.backblaze.com/hc/en-us/articles/217667238-Backb...

Backblaze signs BAAs (business associate agreements) with companies storing personally identifiable medical information. I wouldn't believe anyone who told me that this Facebook data leak was turned off for those customers; they should be investigated immediately and fined if any such breaches did happen.

Full agreement, any interest and goodwill towards the company is now completely gone.

2 hours ago by ajdude

Been a huge fan of Backblaze for years, and B2 was my plan for server backup; they always seemed like a great underdog company in my eyes. They just lost me. I already canceled my Spotify for a similar (Facebook-related) reason.

2 hours ago by magicalhippo

Yeah, off my list as well. I was looking into them since my backup needs had changed somewhat and they'd finally gotten a datacenter in the EU. Shame.

2 hours ago by smnrchrds

> The name "Blackblaze" is burned forever.

So they should be good continuing to use "Backblaze"?

2 hours ago by Zhenya

This is actually pretty crazy. You pay for this service, and then they share what many consider to be PII directly with Facebook.

I'm guessing that if you're logged into Facebook, FB can now correlate:

1) you use a backup service

2) all the metadata from the file names

Whoops.

This is why my network runs Pi-hole / Diversion with all tracking blocked network-wide.
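Network-wide blocking like this happens at the DNS layer: the resolver refuses to hand out real addresses for listed tracker domains, so browsers on the network never load the script or send the beacon. A simplified sketch of that decision follows. The blocklist entries beyond the pixel's script host are placeholders, and the suffix-matching rule (a listed domain also blocks its subdomains) is an assumption for illustration; Pi-hole's and Diversion's actual matching rules and blocking responses are configurable and differ in detail.

```python
# Hypothetical blocklist; a real deployment loads large community-maintained lists.
BLOCKLIST = {
    "connect.facebook.net",  # host that serves the Facebook pixel script
    "tracker.example",       # placeholder entry for illustration
}


def is_blocked(hostname: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """Return True if the hostname, or any domain it sits under, is listed."""
    labels = hostname.lower().rstrip(".").split(".")
    # For "a.tracker.example", check "a.tracker.example", "tracker.example", "example".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def answer_query(hostname: str) -> str:
    """Mimic a sinkholing resolver: blocked names resolve to 0.0.0.0."""
    if is_blocked(hostname):
        return "0.0.0.0"
    return "FORWARD"  # stand-in for asking the real upstream DNS server


for name in ["connect.facebook.net", "www.backblaze.com", "a.tracker.example"]:
    print(name, "->", answer_query(name))
```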

11 minutes ago by tuwtuwtuwtuw

Who are the "many" that consider file name PII?

4 minutes ago by ipnon

URLs, of which file names could be considered a subset, are definitely considered PII.[0]

[0] https://cphs.berkeley.edu/hipaa/hipaa18.html

a minute ago by Zhenya

John_smith_college_graduation.jpg

6 minutes ago by ncallaway

I would generally consider a filename to be the kind of field that could easily contain PII.

Names or identifiers are the kinds of things that very often end up in file names.
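If file names must show up in logs or analytics at all, one common mitigation is to pseudonymize them before they leave your own systems, using a keyed hash (HMAC) instead of the raw name. You can still group events by file, but a third party cannot read or brute-force "John_smith_college_graduation.jpg" without the key. A minimal sketch with Python's standard `hmac` module; the hard-coded key and the `file_` token format are placeholder assumptions.

```python
import hashlib
import hmac

# Assumption: in a real system this key lives in a secrets manager, never in code.
LOG_PSEUDONYM_KEY = b"replace-with-a-secret-key"


def pseudonymize_filename(name: str, key: bytes = LOG_PSEUDONYM_KEY) -> str:
    """Return a stable, non-reversible token for a file name.

    The same name always maps to the same token (so events can be grouped),
    but without the key nobody can test guesses against the token.
    """
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "file_" + digest[:16]


token = pseudonymize_filename("John_smith_college_graduation.jpg")
print(token)  # "file_" followed by 16 hex characters
```

A plain, unkeyed hash would not be enough here: anyone holding a list of likely file names could hash them and compare.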

2 hours ago by CA0DA

Wow. That's a big deal and the person in marketing/sales who requested the Facebook Ad pixel did not realize that's a big deal.

2 hours ago by artellectual

I work in engineering at a financial services company. When we built our front-end UI for eKYC, our marketing team requested Google Tag Manager, the Facebook pixel, and various other tracking features.

I had to fight hard as an engineer to make sure that didn't happen. We had meeting after meeting, and it took a lot of effort for me to explain the risk of data leakage. I was questioned about my "insecurity" for not "trusting" people. It was not a nice experience. I had to explain that tracking needs to be handled properly, not by lazily installing Google Tag Manager because it gives marketing 'flexibility'.

an hour ago by 0xy

Never ever give marketing access to deploy arbitrary JS onto your website under any circumstances.

Google Tag Manager is an absolute cancer on web development.

Once it's in, you'll never get rid of it.

33 minutes ago by Natsu

The only thing I knew about 'tag manager' before this was that it was always blocked by NoScript. Your comment made me go look up what it does and now I know that I will never unblock it.

Apparently it lets people drop in random code from a bunch of different analytics platforms, so it's pretty much guaranteed to consist entirely of the sort of stuff I have NoScript enabled to block in the first place.

an hour ago by artellectual

Agreed!

an hour ago by rodgerd

I'm surprised and disappointed that you didn't have your InfoSec and risk people in your corner.

an hour ago by artellectual

The reason for that is I'm the CTO, responsible for building out the tech / engineering team.

Hiring is difficult because there's a shortage of talented people. We hire InfoSec on a contract basis, not full-time, and due to the nature of the contract they don't join meetings like these. So all the responsibility fell on me to defend our technical decisions at that point.

I'm now working on building out the engineering culture and awareness within management, so that these things don't happen and I don't have to be questioned about why we can't install "Google Tag Manager" in our front end.

It all comes down to creating awareness and making people understand. Fortunately for me, our CEO gets it; he ended up siding with me.

28 minutes ago by WrtCdEvrydy

LOL, I worked at a place where we uploaded our entire inventory for a Facebook integration. We charged customers millions for this data, but "we need to show up on Facebook" won out.

an hour ago by CuriousNinja

It's scary to think that a company that seems to have decent privacy and data collection practices one moment is just one step away from some marketing manager or MBA changing that. It's really hard to regain customer trust once you lose it, and in Backblaze's case it seems to be for marginal monetary benefit, if any. I think part of the reason is that most of these companies don't value customer trust.

an hour ago by notatoad

It's a lot more than just the person in marketing not realizing. You can't expect marketing to understand these things.

It means there's either nobody reviewing the privacy implications of marketing decisions, or that somebody who knows better is reviewing these decisions and decided leaking data like this is acceptable.

Both those possibilities make backblaze a non-starter for me now.

3 hours ago by guywhocodes

Yikes, that's a lot of good will burned. Doesn't matter if it is unintentional or not.

3 hours ago by celsoazevedo

The inclusion of Facebook tracking was intentional:

https://twitter.com/backblaze/status/1373751015594356739

Not sure if they intended to send file names and sizes to FB, but in any case this doesn't look good. I'm currently looking for alternatives.

3 hours ago by chunkles

rsync.net was posted here on HN a few days ago. tarsnap is another popular service. Neither has the special features that make Backblaze so popular, but both could be good alternatives.

2 hours ago by thayne

From a quick look at the pricing pages, rsync.net looks to be 5x as expensive as Backblaze B2, with a minimum of 400 GB per month (it also looks like you might have to preallocate rather than pay on demand). And tarsnap is 10x as expensive as rsync.net.

My guess is that the bulk of that price difference is due to economies of scale.

an hour ago by tbodt

The only reason I use Backblaze instead of tarsnap is it is 62 times cheaper for the same amount of storage. tarsnap is dramatically overpriced if you ask me.

an hour ago by gpm

So, looking at some potential things to switch to...

OVH has two interesting products here:

OVH Cloud Archive works with rsync and costs roughly half as much as Backblaze for storage and roughly the same for egress, but it charges for ingress (at the same rate as egress).

OVH Object Storage is S3-compatible (like Backblaze), charges roughly the same for bandwidth, and charges 2x for storage.

https://www.ovhcloud.com/en/public-cloud/prices/#439

DigitalOcean has a blob store with the same bandwidth pricing, 4x the storage pricing, and a minimum spend of $5/month on storage, but the first TB ($10) of bandwidth is free.

https://www.digitalocean.com/pricing#spaces-object-storage
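The comparisons above mix several pricing dimensions: per-GB storage, per-GB egress, sometimes per-GB ingress, a free bandwidth allowance, and a minimum monthly spend. A small calculator makes it easier to compare providers against your own usage. A sketch only: every rate below is a hypothetical placeholder to be filled in from the providers' current pricing pages, not a quoted price.

```python
from dataclasses import dataclass


@dataclass
class StoragePlan:
    """Pricing knobs mentioned in this thread; fill in real rates yourself."""
    storage_per_gb: float          # $ per GB stored per month
    egress_per_gb: float           # $ per GB downloaded
    ingress_per_gb: float = 0.0    # $ per GB uploaded (some archive tiers charge this)
    free_egress_gb: float = 0.0    # bandwidth included before egress is billed
    minimum_monthly: float = 0.0   # minimum spend per month

    def monthly_cost(self, stored_gb: float, egress_gb: float,
                     ingress_gb: float = 0.0) -> float:
        usage = (
            stored_gb * self.storage_per_gb
            + max(0.0, egress_gb - self.free_egress_gb) * self.egress_per_gb
            + ingress_gb * self.ingress_per_gb
        )
        # The bill is whichever is larger: actual usage or the minimum spend.
        return max(usage, self.minimum_monthly)


# Hypothetical rates, NOT any provider's real prices.
cheap_storage = StoragePlan(storage_per_gb=0.01, egress_per_gb=0.01)
with_minimum = StoragePlan(storage_per_gb=0.02, egress_per_gb=0.01,
                           free_egress_gb=1000, minimum_monthly=5.0)

print(cheap_storage.monthly_cost(stored_gb=500, egress_gb=100))
print(with_minimum.monthly_cost(stored_gb=100, egress_gb=200))
```

A minimum billable quantity such as the 400 GB mentioned for rsync.net can be modeled as `minimum_monthly = 400 * storage_per_gb`; preallocation versus pay-on-demand is not captured by this sketch.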

2 hours ago by Skunkleton

Rsync.net has a self-supported borg option that is a great deal.

2 hours ago by BeefySwain

Can someone translate the PR/marketing speak here for us mere mortals? How does having Facebook tracking on the web front end for existing, paying users help with lead generation?

an hour ago by cosmie

Within Facebook, you can use the event stream collected by your FB Pixel both to define conversion criteria and to create audiences and define inclusion/exclusion criteria for those audiences. When it comes to tracking on pages behind auth, it's primarily for audience building, which can be used for:

– Cross-sell/up-sell campaigns. Build an audience based on usage patterns, and create a campaign for a complementary service or higher tier (for example, when someone clicks the button for a gated feature they don't have access to).

– Suppression lists. If you don't want your campaigns to target existing users, you can build an audience from pixel data on your authenticated pages and suppress against that.

– Lookalike audiences. After you create an audience in Facebook, you can create a "lookalike audience" from it. So even if you aren't actively doing either of the above, you'd derive value from tracking your "best" customers and using them as a seed list for a lookalike audience.

You're also not limited to using the FB Pixel for any of the above. In addition to the browser-side pixel, FB allows you to upload hashed customer information and use it for conversion tracking and audience building. That used to be completely invisible to end users, but now you can see a list of companies that have uploaded your info to FB this way (I can't recall offhand where it's buried in the user settings).
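The hashed-upload path works by normalizing each identifier and hashing it with SHA-256 before upload, so the ad platform can match your hashes against hashes of identifiers it already holds. A minimal sketch of that preparation step for email addresses; trimming and lowercasing is, as far as I know, what Facebook's customer-list documentation asks for, but check the current docs for each identifier type. Note that this hashing is not anonymization: the platform can compute the same hash for every email it knows, which is exactly how the matching works.

```python
import hashlib


def hash_email_for_matching(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest.

    Normalization (strip whitespace, lowercase) matters: without it,
    " Jane@Example.com" and "jane@example.com" would hash differently
    and fail to match on the ad platform's side.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Both spellings yield the same 64-character hex digest.
a = hash_email_for_matching("  Jane.Doe@Example.com ")
b = hash_email_for_matching("jane.doe@example.com")
print(a == b)  # True
```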

All of that said, it's entirely likely that Backblaze wasn't intentionally sending any of this data to FB to begin with. An insidious aspect of FB's Pixel is that it automatically attaches listeners to a bunch of stuff on the page such as buttons and sends back interactions and associated metadata[1]. The flag to disable this isn't mentioned in the implementation instructions that are generated upfront, and it's actually a fairly uncommon trait for ad pixels. So a typical implementation tends to leave it on out of ignorance rather than make a deliberate determination on whether to use or disable that functionality.

[1] https://developers.facebook.com/docs/facebook-pixel/advanced...
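One defensive pattern against this kind of automatic collection is to never hand raw page data to a third-party tag: build each outgoing event yourself from an explicit allowlist of fields, so anything else on the page (button labels, form contents, file names) cannot ride along. A sketch under that assumption; the field names are hypothetical, and this complements, rather than replaces, turning off the pixel's automatic collection with the flag mentioned above.

```python
from typing import Any

# Hypothetical allowlist: the only fields marketing has agreed to share.
ALLOWED_EVENT_FIELDS = {"event_name", "plan_tier", "currency", "value"}


def build_outgoing_event(raw: dict[str, Any]) -> dict[str, Any]:
    """Copy only allowlisted keys from a raw event into the outgoing payload.

    Unknown keys are dropped rather than redacted, so a field someone adds
    later (say, the text of a clicked button) is excluded by default.
    """
    return {k: v for k, v in raw.items() if k in ALLOWED_EVENT_FIELDS}


raw_event = {
    "event_name": "Subscribe",
    "plan_tier": "personal",
    "button_text": "Restore John_smith_college_graduation.jpg",  # must not leak
    "file_size": 2_481_113,
}
print(build_outgoing_event(raw_event))
# {'event_name': 'Subscribe', 'plan_tier': 'personal'}
```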

2 hours ago by yellow_postit

Assuming it's to build lookalike audiences [1] (find me people that act like the paying customers).

[1] https://www.facebook.com/business/help/164749007013531?id=40...

2 hours ago by busymom0

That PR/marketing tweet comes across as written by someone who doesn't understand how big a deal this actually is, and why customers won't be comfortable with a FB pixel on their dashboards alongside their file names and sizes.

an hour ago by whenlambo

Take a look at https://wasabi.com/

2 hours ago by teruakohatu

Including the file names seems to have been unintentional. It looks like they were logging analytics events to Facebook, probably a form submission event, but uploaded the form HTML along with its contents.

But why they need to submit that to Facebook for paying users I don't understand. The only thing I can think of is excluding active users from advertising... But is that worth the privacy intrusion?

2 hours ago by robbiemitchell

They answered in the Twitter thread: they send data about paying users to Facebook so they can build lookalike audience targeting for new user acquisition. Other major ad platforms (Google, LinkedIn) have similar features.

This doesn't look like the right data to be sending FB for that, though.

2 hours ago by neilv

They could maybe salvage goodwill with genuine corporate soul-searching that ends up asserting/reasserting values -- and leads them to focus on providing trustworthy service to their users, and conspicuously away from some "tech" industry norms of selling out one's users.

As a provider of a paid service, it seems like they're in a better position to take the high road than a lot of tech companies are, but they have to decide that's who they are, and be clear they mean it.

2 hours ago by kardos

Yep that pretty much burns the service :/

3 hours ago by alphabettsy

I doubt it's intentional, but companies should seriously consider whether the benefits of integrating things like Facebook's analytics tools outweigh the negatives. Considering their audience, you'd think they would avoid Facebook of all things.

2 hours ago by busymom0

I operate quite a few apps and recently launched a website too, which handles a lot of sensitive user data. I decided to make not having any analytics, trackers, or ads a selling point of my apps and sites, and I get a lot of positive emails from customers thanking me for that. I was recently even wondering whether Google penalizes sites and apps in search results for not including its analytics.

I legit don't understand why a paid storage service would put a FB pixel on a dashboard that handles user files. It's a completely foreign concept to me. This seems like a screw-up, but it also erodes a lot of trust, which is unfortunate, as I had actually been looking at them for the past 2 months.

I even made a post just yesterday, and another a couple of weeks ago, on how Backblaze's inability to set a specific file name, file size limit, and expiry date on pre-signed URLs is preventing some of us from switching from S3 to Backblaze for our app data storage needs. Surprisingly, I wasn't the only one; a few people responded with the same concern.

https://news.ycombinator.com/item?id=26430959

Basically:

> A limitation I ran across when using B2 was that their pre-defined URL generation doesn't allow you to set file-size limits, nor does it allow you to set the file name in the pre-defined URL. It simply gives you a pod URL to upload to. So if you are using B2 as storage for, let's say, image uploads from the browser, a malicious user can modify the network request with whatever file name or file size they want. Next thing you know, you have 5 GB image uploads happening.... This pretty much prevents me from using B2 for now.

> I ran into the same limitation! IIRC, there also wasn't a way to expire a signed upload URL sooner than whatever the default was, which was hours or maybe a day. I had the exact use case you mentioned, too - image uploads bypassing my backend server. I didn't want the generation of a signed url to, say, upload a profile photo, give carte blanche to create a hidden image host when combined with the limitation that you highlighted. All sorts of bad things could come of that. I ended up just going back to S3 - costs more, but still worth it.

Since this is for a site/app which lets users upload data, I am really trying to avoid S3 due to crazy costs. I might look into DigitalOcean's offerings. Anyone have any other recommendations?
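For comparison, this is roughly what the S3 side offers for browser uploads: a presigned POST whose policy document pins the object key, caps the upload size with a `content-length-range` condition, and sets an expiration. The sketch below only builds and base64-encodes such a policy with the standard library; the bucket, key, and limits are hypothetical, and the SigV4 signing step is omitted. In practice boto3's `generate_presigned_post` handles both steps and accepts extra conditions through its `Conditions` parameter.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket: str, key: str, max_bytes: int,
                      ttl: timedelta, now: datetime) -> str:
    """Return an unsigned, base64-encoded S3-style POST policy document.

    The conditions pin the exact object key and cap the upload size, the two
    limits the quoted comments wanted; "expiration" bounds how long the
    signed form remains usable.
    """
    expires = (now + ttl).astimezone(timezone.utc)
    policy = {
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            ["eq", "$key", key],                      # client cannot pick another name
            ["content-length-range", 1, max_bytes],  # rejects oversized uploads
        ],
    }
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


# Hypothetical bucket and key; a 5 MiB cap and a 5-minute lifetime.
encoded = build_post_policy(
    bucket="user-uploads",
    key="avatars/user-123.jpg",
    max_bytes=5 * 1024 * 1024,
    ttl=timedelta(minutes=5),
    now=datetime(2021, 3, 22, 12, 0, tzinfo=timezone.utc),
)
decoded = json.loads(base64.b64decode(encoded))
print(decoded["expiration"])  # 2021-03-22T12:05:00.000Z
```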

2 hours ago by mgraczyk

Just an FYI in case this isn't clear to people. Even though this is a serious problem, it's super unlikely that FB is using the filenames in any way, or aware they were being sent. That part is probably just a (really awful) mistake.

31 minutes ago by TheDudeMan

> it's super unlikely that FB is using the filenames in any way

You mean other than storing them forever?

13 minutes ago by mgraczyk

Facebook, like other GDPR compliant companies, deletes most data after 90 days by default. My guess is that this data was not intentionally stored, and any place where it was will be deleted manually within a few days as part of remediation for this issue. Even if that doesn't happen, it would automatically disappear after 90 days.

an hour ago by swiley

This is the same company that has published iOS apps that have silently escaped the sandbox. I think pathological is probably an appropriate word for them and wouldn't trust anything remotely related to them with a single bit of personal data.

12 minutes ago by mgraczyk

Sounds unrelated

24 minutes ago by RobLach

I'm unsure if there's any evidence to support this claim.

an hour ago by walrus01

But you have absolutely no way of knowing for certain.

11 minutes ago by mgraczyk

That's not actually true, it's possible to wait and see if FB publishes a statement. If they do, you'll know.

3 hours ago by AdmiralAsshat

Just remember the old adage:

Even if you're paying for the product, you're probably still the product.

2 hours ago by risyachka

Pure speculation. In some cases you are, in some you are not.

In this particular one you are not.

2 hours ago by withjive

You're being down voted because in this case customers are obviously the product. Being sold to Facebook.
