The Massive Data Collection by Facebook – Visualized

Research. Facebook’s data collection draws on a wide variety of sources. Several studies have mapped out and analyzed the social media giant’s practices. Unfortunately, research outcomes are often difficult to visualize because of the complexity of the phenomenon being analyzed. ShareLab offers a comprehensive study of this topic, with its findings visualized in a high-quality format.

ShareLab is a research and data investigation lab that is part of Share Foundation. Its researchers look at the intersection of technology and society in an effort to better understand new, emerging forms of privacy-related risks, network neutrality issues and security threats.

The Facebook Algorithmic Factory investigation is a trilogy that aims to map and visualize a complex, invisible exploitation process hidden inside the black box of the world’s largest social network. To provide context for this research, ShareLab first describes the Algorithmic Society, in which labour is performed by algorithms. The objects of labour are digital content, digital footprints and metadata. The tools of labour are social networks, digital platforms and devices. The resulting products are profiles, patterns, anomalies and predictions. Within this framework, ShareLab gives an example of immaterial labour and states that:

Every one of over 1 billion Facebook users, digital workers, work averagely 20+ minutes per day on liking, commenting, and scrolling through status updates.  That is more than 300.000.000 working hours of free digital labour per day.
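
As a quick sanity check of the arithmetic in the quote, here is a minimal sketch that uses its rounded figures (1 billion users, 20 minutes per day). The variable names are illustrative and the result is a lower bound, since the quote says "over 1 billion" and "20+".

```python
# Rounded figures taken from the ShareLab quote; the actual values are
# "over 1 billion" users and "20+" minutes, so this is a lower bound.
users = 1_000_000_000
minutes_per_user_per_day = 20

# Convert total minutes per day into hours per day.
total_hours_per_day = users * minutes_per_user_per_day / 60
print(f"{total_hours_per_day:,.0f} hours per day")
```

With these inputs the total comes to roughly 333 million hours per day, which is consistent with the quoted "more than 300.000.000".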

ShareLab split its research findings into three articles, each looking at a different aspect of the Facebook phenomenon. The team of researchers starts with Immaterial Labour and Data Harvesting Collection, an article focused on data collection. Human Data Banks and Algorithmic Labour is the second part, which looks at storage and algorithmic processing. Finally, ShareLab presents its research on user targeting in Quantified Lives. ShareLab offers the following illustration to help the reader understand the path of the trilogy’s research.

ShareLab – Facebook Algorithmic Factory.

The following sections in this article are concerned with the first part of the research trilogy, focused on data collection.

Data Collection

To map out how data collection takes place on Facebook, the ShareLab team looked at Facebook’s Data Policy; input fields on Facebook; cookie and pixel technology on 3rd party websites; the policies of Facebook-owned companies; Facebook’s vendors, service providers and other partners; and the Facebook Ireland Ltd Report of Audit (2011). The researchers further distinguish between data collection within and outside the Facebook platform.

A. Within Facebook

  1. Actions and Behaviour – interactions (likes, comments, any scrolling and/or clicking), content uploaded or created on the platform, visited pages.

  2. Profile Info – About me section and the like

The main difference between A1 and A2 is that Profile information is static, rarely changed and could be fake, whereas Actions and Behaviour data is dynamic, continuous and updated in real time. Below you can see how data is collected within the Facebook platform in these two categories.

ShareLab – Actions and Behaviour.


ShareLab – Account and Profile Info.

B. Digital Footprint

  1. Mobile Devices – device ID, location, contacts, SMS content etc.

  2. Laptop or Desktop Computers – IP address, operating system, browser type etc.

This section is concerned with data that Facebook collects outside its platform. The four main channels are cookies, mobile phone permissions, other Facebook companies and Facebook partners.


Cookies are an important means by which Facebook gathers data outside its platform. According to Facebook’s Cookie Policy:

Cookies are small pieces of text used to store information on web browsers. Cookies are used to store and receive identifiers and other information on computers, phones, and other devices. […] We use cookies if you have a Facebook account, use the Facebook Services, including our website and apps (whether or not you are registered or logged in), or visit other websites and apps that use the Facebook Services (including the Like button or our advertising tools).

Looking at the 50 most used websites in Serbia, the ShareLab team identified 7 different 3rd party cookies embedded in every examined website. In total, 174 different types of cookies were detected 365 times, belonging to 87 different companies. Four big US companies dominate that list: Google (90%), Facebook (46%), Twitter (24%) and Amazon (10%). The illustration below shows these results.

ShareLab – Facebook Online Trackers.
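
A tally of this kind can be sketched in a few lines. The example below is only an illustration: the observation rows, site names and cookie names are made up, and it assumes that the percentages quoted above mean the share of examined websites on which a company's cookies appear, which the summary here does not state explicitly.

```python
from collections import defaultdict

# Hypothetical observations: (website, cookie name, company that set it).
# These rows are invented for illustration; they are NOT ShareLab's data.
observations = [
    ("news.example", "cookie_g1", "Google"),
    ("news.example", "cookie_f1", "Facebook"),
    ("shop.example", "cookie_g1", "Google"),
    ("shop.example", "cookie_a1", "Amazon"),
    ("blog.example", "cookie_g1", "Google"),
    ("blog.example", "cookie_f1", "Facebook"),
    ("forum.example", "cookie_t1", "Twitter"),
]

def company_reach(rows):
    """Percentage of distinct websites carrying at least one cookie per company."""
    sites = {site for site, _cookie, _company in rows}
    sites_by_company = defaultdict(set)
    for site, _cookie, company in rows:
        sites_by_company[company].add(site)
    return {c: 100 * len(s) / len(sites) for c, s in sites_by_company.items()}

reach = company_reach(observations)
detections = len(observations)  # every row is one detection
cookie_types = {(cookie, company) for _site, cookie, company in observations}
companies = {company for _site, _cookie, company in observations}

# Print companies from widest to narrowest reach.
for company, share in sorted(reach.items(), key=lambda kv: -kv[1]):
    print(f"{company}: {share:.0f}% of sites")
```

Using sets for the per-company site lists means a company is counted once per website, no matter how many of its cookies that site embeds.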


Mobile Phone Permissions

ShareLab draws on some of its previous research to point out the different types of information we allow Facebook to collect when installing Facebook, Facebook Messenger, WhatsApp and Instagram on a mobile device. These permissions cover the device identifier, the precise location of the device, the identity of your contacts, the content of your SMS messages and your call log, as well as the ability to record audio, get information about your WiFi connection, download files without notification, and more.

ShareLab – Facebook Mobile Apps Permissions.


Other Facebook Companies

At the time of the research, Facebook owned and operated 10 more companies: Facebook Payments Inc., Atlas, Instagram LLC, Onavo, Parse, Moves, Oculus, LiveRail, WhatsApp Inc. and Masquerade. Information collected from the Facebook platform is combined with that collected by the rest of the Facebook “family members”. Some of these are data giants themselves, such as WhatsApp (1 billion users, February 2016) and Instagram (500 million, June 2016). They collect information similar to that collected by Facebook, as described above. Some, such as Oculus Rift and Facebook Payments, extend the range of collected data to users’ physical movements and dimensions, as well as financial transaction information, credit card numbers etc.

ShareLab – Facebook Mergers and Acquisitions.


Facebook Partners

Facebook developed the Facebook Partners program in order to extend its data collection to 3rd party sources. As a result, data from data brokers such as Acxiom, Datalogix and Epsilon was integrated into all categories of Facebook advertising. Such data brokers collect information through store loyalty cards, mailing lists, public records information (including home or car ownership), browser cookies, and more. To give a sense of the scale, it is worth mentioning that Oracle-owned Datalogix has over 650 customers (among them digital media publishers and top US advertisers) that get insight into over $2 trillion in consumer spending, so that brands can personalize and measure every customer interaction. On top of exchanging data with these partners, Facebook also collaborates with hundreds of other data dealers, ad technology developers, data and marketing analysis companies, vendors, service providers and other partners that provide technical infrastructure services. An overview of Facebook’s partners is available below.

ShareLab – Facebook Partners.


The complexity of the topic makes it difficult at times to map all the data users provide, willingly or unwillingly, to the social media giant. Despite such challenges, ShareLab’s research manages to offer a comprehensive visual study of Facebook’s data collection practices. In the following articles, I will sum up the findings of the trilogy’s last 2 parts: storage and algorithmic processing, and user targeting.

Get the research here