Search engines in general, and Google in particular, know a lot about everyone. Moreover, Google can learn about you without your ever having used its services. It knows what it knows because people choose to trust it. In fact, however, Google is quite draconian in its policies and approaches to identifying, profiling and tracking individuals and organizations, along with their interests, behaviors, and relationships.
Google offers many services, the vast majority of them free, such as Search, Safe Browsing, Gmail, Apps, Docs, Maps, Wallet, Voice, Android OS, DNS, etc. Offering these services free is worthwhile for Google because they let it continue to identify, profile and track users.
[Memory lane: See the predictions in CSO's 2006 article "5 ways Google is shaking the security world"]
Google offers users a relationship of convenience, but people and organizations should understand that Google has made it clear it intends to own your data, regardless of legality or your desire for privacy. Google's actions clearly show that it operates with impunity. Its track record demonstrates that intent: reading your emails and voicemails, collecting data from personal wireless networks, publishing books online without permission, and using third-party applications.
Moreover, most organizations have absolutely no idea what data is leaking to Google. Because Google offers no delete button and retains data for a minimum of 18 months with no maximum, organizations have no sense of how much of their data is sitting on Google's servers, and no mechanism even to track it. Yet everything Google collects is effectively public, by virtue of its content or the criteria used to find it: all of your data is accessible to anyone who searches by the right content and/or criteria. In fact, an entire industry is devoted to this: Search Engine Optimization (SEO). But what happens when an SEO practitioner's priority shifts from page rankings to uncovering an organization's vulnerabilities or competitive business plans? Such practitioners are Blackhat SEOs.
How can organizations understand the extent of this threat and mitigate it? By leveraging both technologies and methodologies.
Search Engine Data Leakage Prevention technology available today can identify which specific Google applications and services are being used within an organization. Once identified, these applications, services and even file content can be blocked or logged. This makes it possible, for example, to allow Google Search while blocking Google Safe Browsing. The same holds true for the rest of Google's services.
Additional technology exists to account for SSL and other encrypted traffic, i.e. traffic that circumvents organizational security. Simply by using HTTPS, any user or site can bypass whatever perimeter security controls an organization has in place. This technology can enforce global security policies on all traffic, including SSL- and IPSec-encrypted traffic, and provide visibility into all of it, including SSL-encrypted Google traffic.
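As a rough illustration of the identification and policy step, the following sketch classifies a destination hostname (as seen, for example, in an HTTP Host header or the TLS SNI field) into a Google service and applies a per-service action. The hostname map, the policy, and the names `identify_service` and `decide` are hypothetical and deliberately incomplete; commercial products rely on their own, much richer signatures.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    LOG = "log"
    BLOCK = "block"


# Hypothetical, deliberately incomplete map of hostnames to Google services.
SERVICE_HOSTS = {
    "www.google.com": "Search",
    "mail.google.com": "Gmail",
    "docs.google.com": "Docs",
    "maps.google.com": "Maps",
    "safebrowsing.googleapis.com": "Safe Browsing",
}

# Parent domains used to catch Google hosts not listed above (an assumption).
GOOGLE_DOMAINS = ("google.com", "googleapis.com")

# Hypothetical organizational policy: allow Search, block Safe Browsing,
# and log any other Google service.
POLICY = {
    "Search": Action.ALLOW,
    "Safe Browsing": Action.BLOCK,
}
DEFAULT_GOOGLE_ACTION = Action.LOG


def identify_service(host: str) -> Optional[str]:
    """Map a hostname (e.g. from the Host header or TLS SNI) to a Google service."""
    host = host.lower().rstrip(".")
    if host in SERVICE_HOSTS:
        return SERVICE_HOSTS[host]
    # Leading dot prevents lookalikes such as "notgoogle.com" from matching.
    if any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS):
        return "Other Google"
    return None  # not recognized as Google traffic


def decide(host: str) -> Action:
    """Return the policy action for a connection to the given host."""
    service = identify_service(host)
    if service is None:
        return Action.ALLOW  # non-Google traffic is outside this policy
    return POLICY.get(service, DEFAULT_GOOGLE_ACTION)


if __name__ == "__main__":
    for h in ["www.google.com", "safebrowsing.googleapis.com",
              "drive.google.com", "example.org"]:
        print(h, decide(h).value)
```

With this policy, `www.google.com` is allowed, `safebrowsing.googleapis.com` is blocked, and `drive.google.com` (absent from the map) falls through to logging. Note that when traffic is encrypted, only the hostname is visible to a classifier like this; inspecting content requires the kind of SSL visibility described next.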
Processes for managing the Google threat can incorporate one or more of these elements:
Google Service Identification - Identifies all individual Google services and applications being used by the organization's users. It can also offer insight into the content, including documents, leaked to Google intentionally or inadvertently.
Google Service Control and Blocking - Once Google services and the extent of their use have been identified, the organization can decide to implement controls for services deemed intrusive or unnecessary, such as blocking Google Safe Browsing. Less intrusive, and perhaps more effective, alternatives can replace these services. To ease the transition, an organization may also automatically redirect users to an alternative service whenever they select an undesirable Google service.
Anonymization and Obfuscation - Every organization has certain services it deems necessary, such as Google Search. In these cases, user queries are anonymized by routing them through anonymizers. Obfuscation circumvents "the Search Bubble" (Google tailoring search results to your profile) by eliminating the Google User ID (GUID) that Google assigns to every user via cookies. Finally, user traffic to Google is further obfuscated by generating random traffic that renders Google's behavioral data invalid.
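A minimal sketch of two of these measures, under stated assumptions: stripping the Cookie header from requests bound for Google hosts, so that no stored user identifier accompanies the query, and drawing random decoy queries to mix with real ones. The domain list, the decoy vocabulary, and the function names are hypothetical. The sketch handles only the outgoing request side (it does not touch `Set-Cookie` responses), does not itself route traffic through an anonymizer, and a handful of random terms would not by itself make behavioral data invalid.

```python
import random
from typing import Dict, List

# Hypothetical list of parent domains treated as Google for this sketch.
GOOGLE_DOMAINS = ("google.com", "googleapis.com")

# Hypothetical decoy vocabulary; convincing noise would need a far larger,
# more realistic pool and realistic timing.
DECOY_TERMS = ["weather", "recipes", "football scores", "train times", "history of rome"]


def is_google_host(host: str) -> bool:
    """True if host is one of GOOGLE_DOMAINS or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)


def strip_tracking_cookies(host: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the request headers, minus any Cookie header for Google hosts.

    HTTP header names are case-insensitive, so the comparison is lowercased.
    The input dict is never modified.
    """
    if not is_google_host(host):
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() != "cookie"}


def decoy_queries(n: int, rng: random.Random) -> List[str]:
    """Pick n random decoy search terms to interleave with real queries."""
    return [rng.choice(DECOY_TERMS) for _ in range(n)]
```

For example, `strip_tracking_cookies("www.google.com", {"Host": "www.google.com", "Cookie": "a=1; b=2"})` returns `{"Host": "www.google.com"}`, while the same headers sent to a non-Google host pass through unchanged. Passing an explicit `random.Random` instance keeps the decoy generator testable and independent of global random state.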
A combination of technologies and processes offers the visibility, control, mitigation and anonymization needed to prevent Google from gaining insight that Blackhat SEOs can leverage to identify vulnerabilities or confidential business direction. By understanding the extent to which Google touches your organization, eliminating unwanted access to and insight into your environment, and obfuscating permitted functions, your organization can retain full functionality while gaining the control it needs.
Babak Pasdar is President and CEO of Bat Blue Networks.