writing for the Wired has an Interesting Story on Google’s Safe Browsing.
The story is here.
In the beginning there was phone phreaking and worms. Then came spam and pop ups. And none of it was good. But in the nascent decades of the internet, digital networks were detached and isolated enough that the average user could mostly avoid the nastiest stuff. By the early 2000s, though, those walls started coming down, and digital crime boomed.
Google, which will turn 20 in September, grew up during this transition. And as its search platform spawned interconnected products like ad distribution and email hosting, the company realized its users and everyone on the web faced an escalation of online scams and abuse. So in 2005, a small team within Google started a project aimed at flagging possible social engineering attacks—warning users when a webpage might be trying to trick them into doing something detrimental.
A year later, the group expanded its scope, working to flag links and sites that might be distributing malware. Google began incorporating these anti-abuse tools into its own products, but also made them available to outside developers. By 2007, the service had a name: Safe Browsing. And what began as a shot in the dark would go on to fundamentally change security on the internet.
You’ve been protected by Safe Browsing even if you haven’t realized it. When you load a page in most popular browsers or choose an app from the Google Play Store, Safe Browsing is working behind the scenes to check for malicious behavior and notify you of anything that might be amiss. But setting up such a massive vetting system at the scale of the web isn’t easy. And Safe Browsing has always grappled with a core security challenge—how to flag and block bad things without mislabeling legitimate activity or letting anything malicious slip through. While that problem isn’t completely solved, Safe Browsing has become a stalwart of the web. It underlies user security in all of Google’s major platforms—including Chrome, Android, AdSense, and Gmail—and runs on more than 3 billion devices worldwide.
In the words of nine Google engineers who have worked on Safe Browsing, from original team members to recent additions, here’s the story of how the product was built, and how it became such a ubiquitous protective force online.
Niels Provos, a distinguished engineer at Google and one of the founding members of Safe Browsing: I first started working on denial of service defense for Google in 2003, and then late in 2005 there was this other engineer at Google called Fritz Schneider who was actually one of the very first people on the security team. He was saying, ‘Hey Niels, this phishing is really becoming a problem, we should be doing something about it.’ He had started to get one or two engineers interested part time, and we figured out that the first problem that we should be solving was not actually trying to figure out what is a phishing page, but rather how do we present this to the user in a way that makes sense to them? So that started the very early phishing team.
One of the trends that we had observed was the bad guys figured out that just compromising other web servers actually doesn’t really give you all that much. What they were getting was essentially bandwidth, but not a lot of interesting data. So then they turned to their compromised web servers that got lots and lots of visitors, and it was like, ‘How about we compromise those people with downloads?’ So there was a change in malicious behavior.
We were already working on phishing, and I thought, you know, the malware thing may be even a larger problem. And we’re sort of uniquely positioned, because with the Google search crawler we have all this visibility into the web. So then we started with phishing and malware, and Safe Browsing came together that way.
Panos Mavrommatis, Engineering Director of Safe Browsing: Safe Browsing started as an anti-phishing plugin for Mozilla Firefox, since this was 2005 and Google didn’t have its own browser then. When I joined in 2006, the team lead at the time was Niels, and he wanted us to expand and protect users not just from phishing but also from malware. So that was my initial project—which I haven’t finished yet.
‘But we did not really conceive that 10 years later we would be on 3 billion devices. That’s actually a little bit scary.’
Niels Provos, Google
The goal was to crawl the web and protect users of Google’s main product, which was Search, from links that could point them to sites that could harm their computer. So that was the second product of Safe Browsing after the anti-phishing plugin, and the user would see labels on malicious search results. Then if you did click on it you would get an additional warning from the search experience that would tell you that this site might harm your computer.
One interesting thing that happened was related to how we communicated with web masters who were affected by Safe Browsing alerts. Because very quickly when we started looking into the problem of how users might be exposed to malware on the web, we realized that a lot of it came from websites that were actually benign, but were compromised and started delivering malware via exploits. The site owners or administrators typically did not realize that this was happening.
In our first interactions with web masters they would often be surprised. So we started building tools dedicated to web masters, now called Search Console. The basic feature was that we would try to guide the web master to the reason that their website was infected, or if we didn’t know the exact reason we would at least tell them which pages on their server were distributing malware, or we would show them a snippet of code that was injected into their site.
Provos: We got a lot of skepticism, like ‘Niels, you can’t tell me that you’re just doing this for the benefit of web users, right? There must be an angle for Google as well.’ Then we articulated this narrative that if the web is safer for our users, then that will benefit Google, because people will use our products more often.
But we did not really conceive that 10 years later we would be on 3 billion devices. That’s actually a little bit scary. There’s a sense of huge responsibility that billions of people rely on the service we provide, and if we don’t do a good job at detection then they get exposed to malicious content.
Mavrommatis: Around 2008 we started building an engine that ran every page Google already fetched, to evaluate how the page behaved. This was only possible because of Google’s internal cloud infrastructure. That was part of why Google was able to do a lot of innovation at the time, we had this extremely open infrastructure internally where you could use any unused resources, and do things like run a malicious detection engine on the full web.
Moheeb Abu Rajab, Principal Engineer at Safe Browsing: Coming from graduate school, I had been trying to build this type of system on a couple of machines, so I was spending lots of time trying to set that up. And it’s just minimum effort at Google to run on a huge scale.
Mavrommatis: The other thing we developed at the same time was a slower but deeper scanner that loaded web pages in a real browser, which is more resource-intensive than the other work we had been doing that just tested each component of a site. And having those two systems allowed us to build our first machine learning classifier. The deeper crawling service would provide training data for the lightweight engine, so it could learn to identify which sites are the most likely to be malicious and need a deep scan. Because even at Google-scale we could not crawl the whole search index with a real browser.
Noé Lutz, Google AI engineer, formerly Safe Browsing: Around the same time, in 2009, we worked on machine learning for phishing as well. And this was a pretty scary moment for the team because up until then we used machine learning as a filtering function, to figure out where to focus this heavy weight computing resource, but this was the first time we actually decided something was phishing or malicious or harmful or not harmful in a fully automated way.
I remember the day we flipped the switch it was like, now the machine is responsible. That was a big day. And nothing bad happened. But what I do remember is it took extremely long for us to turn that tool on. I think we all expected that it would take a couple of weeks, but it took actually several months to make sure that we were very confident in what we were doing. We were very conscious from the get go how disruptive it can be if we make a mistake.
Provos: The moments that stand out do tend to be the more traumatic ones. There was a large production issue we had in 2009, it was a Saturday morning. We had a number of bugs that came together and we ended up doing a bad configuration push. We labeled every single Google search result as malicious.
Even in 2009 Google was already a prevalent search engine, so this had a fairly major impact on the world. Fortunately our site reliability engineering teams are super on top of these things and the problem got resolved within 15 minutes. But that caused a lot of soul searching and a lot of extra guards and defenses to be put in place, so nothing like that would happen again. But luckily by then we were already at a point where people within Google had realized that Safe Browsing was actually a really important service, which is why we had integrated it into Search in the first place.
Nav Jagpal, Google Software Engineer: In 2008 we integrated Safe Browsing into Chrome, and Chrome represented a big shift, because before with browsers like Internet Explorer, you could easily be on an old version. And there were drive-by downloads exploiting that, where you could go to a website, not click on anything, and walk away with an infection on your computer. But then over time everyone got better at building software. The weakest link was the browser; now it’s the user. Now to get code running on people’s machines, you just ask them. So that’s why Safe Browsing is so crucial.
Mavrommatis: Around 2011 and 2012 we started building even deeper integrations for Google’s platforms, particularly Android and Chrome Extensions and Google Play. And we created unique, distinct teams to go focus on each product integration and work together with the main teams that provided the platforms.
Allison Miller, former Safe Browsing product manager, now at Bank of America (interviewed by WIRED in 2017): Safe Browsing is really behind the scenes. We build infrastructure. We take that information and we push it out to all the products across Google that have any place where there is the potential for the user to stumble across something malicious. People don’t necessarily see that that goes on. We’re a little too quiet about it sometimes.
Fabrice Jaubert, software development manager of Safe Browsing: There were challenges in branching out outside of the web, but there were advantages, too, because we had a little bit more control over the ecosystem, so we could guide it toward safer practices. You can’t dictate what people do with their web pages, but we could say what we thought was acceptable or not in Chrome extensions or in Android apps.
Lutz: There were also some non-technical challenges. Google is a big company, and it can be challenging to collaborate effectively across teams. It’s sometimes hard to realize from the outside, but Chrome is written in a language that is different from a lot of other parts of Google, and they have release processes that are very different. And the same is true for Android, they have a different process of releasing software. So getting everybody aligned and understanding each other, I perceived it as a big hurdle to overcome.
Stephan Somogyi, Google AI product manager, formerly Safe Browsing: This is a very hackneyed cliché so please don’t use it against me, but the whole ‘rising tide lifts all boats’ thing actually really holds true for Safe Browsing. There wasn’t ever any debate that we wanted to expand its reach onto mobile, but we had a profound dilemma, because the amount of data that Safe Browsing used for desktop was an intractable amount for mobile. And we knew that everything that we push down to the mobile device costs the user money, because they’re paying for their data plans. So we wanted to use compression to take the data we already had and make it smaller. And we didn’t want the users to get hosed by five apps each having their own Safe Browsing implementation and all downloading the same data five times. So we said let’s bake it into Android and take the heavy lifting onto ourselves all in one place. It’s been a system service since fall of 2015.
So we built a dead simple API so developers can just say, ‘Hey Android Local System Service, is this URL good or bad?’ We also wanted to write this thing so it wouldn’t unnecessarily spin up the cell modem and eat battery life, because that’s just not nice. So if the network isn’t up anyway, don’t call it up. We just spent an awful lot of effort on implementation for Android. It turned out to be a lot more subtle and nuanced than we first anticipated.
Mavrommatis: The other big effort that our team was involved in around 2013 and 2014 was what we call “unwanted software.” It’s primarily for desktop users, and it’s sort of an adaptation from actors who may have in the past been using just malware techniques, but now they would find that it’s possible to hide malware within software that seems focused on a legitimate function. It was unclear how antivirus companies should label this, and how big companies and browsers should deal with this. But what we focused on was what is the impact on the user?
Around 2014, our data showed that over 40 percent of the complaints that Chrome users reported were related to some sort of software that was running on their device that would impact their browsing experience. It might inject more ads or come bundled with other software they didn’t need, but it was a potentially unwanted program. These practices were causing a lot of problems and we would see a lot of Chrome users downloading these kinds of apps. So we refined our download protection service and also found ways to start warning users about potentially unwanted downloads.
Jagpal: It’s a large responsibility, but it also feels very abstract. You get a warning or alert and you think, ‘Wait a minute, am I protecting myself here?’ But it’s so abstract that if we write code for something concrete, like turning on a light switch at home, it’s like, ‘Whoa, that is so cool. I can see that.’
Jaubert: My 14-year-old definitely takes Safe Browsing for granted. He got a phishing message as an SMS text, so it didn’t go through our systems, and he was shocked. He asked me, ‘Why aren’t you protecting me? I thought this couldn’t happen!’ So I think people are starting to take it for granted in a good way.
Emily Schechter, Chrome Security product manager (former Safe Browsing program manager): You can tell people that they’re secure when they’re on a secure site, but what really matters is that you tell them when they’re not secure, when they’re on a site that is actively doing something wrong.
People should expect that the web is safe and easy to use by default. You shouldn’t have to be a security expert to browse the web, you shouldn’t have to know what phishing is, you shouldn’t have to know what malware is. You should just expect that software is going to tell you when something has gone wrong. That’s what Safe Browsing is trying to do.