Who is reading over your shoulder when you read the news online?
Have you ever read a newspaper and noticed a stranger reading it over your shoulder? Reading the news online is like having Google, Facebook, or Twitter doing the same thing. Known as "third party trackers", these companies collect data about who you are, what you’re reading and what you’re interested in, usually without you ever knowing it. Online tracking is an integral component of the internet's business model and it plays a vital role in a larger industry which profits out of our data.
Why are we being tracked? Online tracking is part of a larger industry which makes a profit out of our data. The data industry makes billions of dollars from collecting data about who we are and what we are interested in by tracking the websites we access every day.
Who is tracking you when you read the news online?
Trackography is an open source project of Tactical Tech that aims to increase transparency about the online data industry by illustrating who tracks us online and where our data travels to when we access websites. In particular, Trackography shows:
The data collected through Trackography is open and can serve as a resource for researchers, lawyers, activists, advocates, campaigners and digital security trainers who are interested in raising critical questions about third party trackers or who want to show what happens to our data online.
Tactical Tech aims to empower groups and individuals with practical ways to defend their right to privacy. To this end it is important to understand the broader data ecology, especially about who collects data, how and what is done with it.
Trackography was developed to:
By detecting the specific companies which track our online activity and the geographical location of servers that our data travels to when we access websites, we hope to contribute to the discussion on unseen and unconsented data collection and on the politics of data.
Online tracking means that our online behaviour is put under the microscope by parties to whom we have not given explicit consent.
Advertising is the default business model of the internet. Almost every single website we access is being tracked by someone, somewhere. This is enabled through the use of tracking technologies, such as cookies, by companies which make millions out of tracking, collecting, analysing, processing, aggregating and selling our data - often at the cost of our civil liberties (such as our right to privacy).
In many cases, such companies create profiles about groups and individuals which aid the advertising industry. These profiles tell a story about us – which may or may not be true - and can include our political beliefs, gender, sexual orientation, economic status, habits, interests, affiliations and much more.
And while this might all sound harmless, we have very little control over how and when our data is collected, how our profiles are created, whether they are accurate, who they are subsequently shared with, who has access to them, what they are used for, where they are stored and for how long.
This is all part of a large industry which profits out of tracking, collecting and aggregating data with the purpose of creating individual and group profiles. Such profiles are subsequently sold to various third parties, which range from advertisers, publishers, insurance companies, pharmaceutical companies, banks and communications service providers to departments of government.
Individual profiling can raise various types of concerns. Imagine not being able to get a bank loan because your bank has bought data about you which shows that you are an "unreliable customer". Or imagine your insurance company classifying you as someone with "risky behaviour" due to the fact that your browsing activities show that you have an interest in extreme sports. Or even worse, imagine law enforcement agencies knocking on your door because you "read too much" anarchist material online.
Group profiling can be equally problematic and can raise concerns for societies at large. Sociologist David Lyon argues that profiling is a powerful means of creating and reinforcing long-term social differences. Research has shown that clustering data about groups can lead to social stratification and discrimination, which is reinforced by an entire data brokerage industry that operates behind the scenes. Data brokers - companies that collect, analyse and sell consumer information - enable discriminatory targeting of groups based on sensitive information like financial situation or health indicators. By selling marketing lists like "Rural and Barely Making it" or "Tough Start: Young Single Parents", data brokers are putting people into categories ("data segments") which can lead to discriminatory behaviour towards them by those who acquire such lists.
Many countries around the world, though, do not have data protection laws. In the countries that do have privacy laws, these laws are not always properly enforced and/or do not adequately safeguard data. The European Union is considered to have the strongest privacy framework globally, but even its Data Protection Directive is unable to keep up with the fast-paced developments on the internet.
The architecture and business model of the internet enable multiple third parties to constantly collect, process, aggregate, share, sell and store data in various countries around the world. This means that while our data might initially be collected within the EU, it might end up travelling to various other countries before it is ultimately stored in a final, non-EU country - only to then be shared again with parties located in other countries. In other words, it is in practice very difficult to pinpoint the precise location of our data at any given moment, which makes its regulation and protection even harder.
Trackography highlights this problem by illustrating the specific countries that our data travels to when we access websites.
Online tracking: Media websites
Tactical Tech started Trackography by exploring online tracking through media websites across 38 countries around the world.
The premise was that, unlike other types of websites, online news is read by most of us every day, regardless of our background. Third party trackers can potentially identify a lot of information about individuals based on the type of news they regularly read - such as their political beliefs, economic status, and much more - and create profiles about them.
Beyond the media: Online tracking across websites
Trackography then expanded to examine online tracking through various other types of websites. Such websites fall under the following sectors:
Each of the above sectors includes further sub-categories. Websites under the financial sector, for example, cover business, jobs, e-commerce, banks and consultancy. Under the health sector, we also included the websites of insurance companies. A wide range of different types of websites are included under "society" which cover human rights, activism, LGBT rights, dating, entertainment, culture and travel. We subsequently created new lists of websites and ran our software locally in the following three Asian countries:
Students from the University of Amsterdam subsequently contributed to the project by compiling lists of websites for 17 countries in the European Union which cover the following:
Through the use of VPNs, the students ran our software for each of the 17 European countries and collected results which show which third party companies track Europeans' access to these websites and where around the world their data travels to.
Trackography provides a snapshot of the third party tracking in over 2,500 media websites across 38 countries at specific moments in time. Some key findings based on the data that we collected include the following:
1. The United States of America (U.S.) is the main country globally which tracks what we read online.
2. In some cases, reading the news online results in individuals' data landing in the servers of adversary states. For example:
3. While your country might have privacy legislation, reading the news online might result in your data travelling to countries which have no privacy law. For example:
4. Some media organisations which advocate and promote human rights enable multiple companies to track individuals who access their websites. For example:
5. The Wall Street Journal, the Philippine Daily Inquirer and Kashmir Times enable the most companies globally to track the visitors of their websites - according to some of our tests.
6. Unlike countries in the Global North, most countries in the Global South do not host the servers of their media websites. Instead, these servers are usually hosted in countries of the Global North, which means that their citizens' data is subsequently handled and regulated under different laws and jurisdictions.
7. When we access media websites globally, the main countries our data travels to include the following:
Even though all the above countries have privacy laws, there are limits. Such laws don't necessarily protect the data of foreign citizens nor does all data fall under these laws. Additionally, it is currently unclear who these countries share collected data with and where such data is eventually stored.
8. Country-specific highlights:
Trackography provides a snapshot of third party tracking through various different types of websites (e.g. governmental and financial) in India, Thailand, the Philippines and 17 European countries. Some key findings based on the data that we collected include the following:
Google tracks Indians' online behaviour more than any other company. In particular, it tracks 68.7% of users' access to health, financial, social and political websites in India.
Even though the server of onlymyhealth.com is located in India, when users access that website they also connect to the servers of 18 tracking companies which are located in the United States, Australia, Japan, Vietnam, Germany, the Netherlands, Sweden, Ireland and the United Kingdom. Such companies include data brokers like PubMatic and AppNexus, which have data retention periods of 270 and 730 days respectively.
onlymyhealth.com includes information on various health conditions, ranging from pregnancy and STDs to diabetes and cancer
Google tracks all (100%) access to Thailand's most popular websites.
Google tracks the online behaviour of Thai online users more than any other company. In particular, it tracks 66.8% of users' access to financial, social, health and governmental websites in Thailand.
Even though the server of ohozaa.com (one of the most popular websites in Thailand) is located in Thailand, when users access that website they also connect to the servers of 11 tracking companies which are located in the United States, Japan, Malaysia and Ireland.
Google tracks the online behaviour of Philippine users more than any other company. In particular, it tracks 73.2% of users' access to financial, social, health and governmental websites.
Google tracks 80% of access to the most popular websites in the Philippines.
When users in the Philippines access couragephilippines.blogspot.com they connect to the servers of 16 tracking companies which are located in Canada, Australia, France and the United Kingdom. Such companies include data brokers like PubMatic and Lotame, which have data retention periods of 270 days.
couragephilippines.blogspot.com is a website run by the Roman Catholic Church which provides "spiritual support for men and women with same-sex attractions"
The following results show that, in each of the 17 European countries examined, Google tracks users' access to governmental websites more than any other company:
|Spain||Google, 43.1%||Facebook, 5.2%||AddThis, 3.4%|
|Switzerland||Google, 29.7%||Facebook, 1.9%||AddThis, 1.3%|
|Belgium||Google, 49.2%||Facebook, 3.1%||Neustar, 2.7%|
|Hungary||Google, 47.9%||Facebook, 13.7%||Yandex, 1.6%|
|Latvia||Google, 46.5%||Gemius, 1.6%||Facebook, 3.6%|
|Cyprus||Google, 48.2%||AddThis, 3.6%||Facebook, 5.8%|
|Estonia||Google, 58%||Twitter, 11.6%||Brightcove, 2.3%|
|England||Google, 68.2%||Twitter, 9.1%||AddThis, 1.2%|
|Facebook, 2.3%||Neustar, 2.9%|
|Ireland||Google, 50%||Facebook, 13.2%||Krux, 31.9%|
|Austria||Google, 80.4%||comScore, 34.8%||Zopim, 1.1%|
|Romania||Google, 52.2%||Twitter, 10.9%||LongTail Video, 2.9%|
|Germany||Google, 22.9%||etracker, 11.4%||Specific Media, 7.7%|
|Malta||Google, 43.1%||Twitter, 13.8%||WPP, 10%|
|France||Google, 40%||Twitter, 16.7%||AddThis, 8.3%|
|Italy||Google, 49.2%||Facebook, 4.6%||Yahoo!, 1.5%|
|Sweden||Google, 41.7%||LongTail Video, 2.1%||Facebook, 2.1%|
View more results on online tracking across websites in the European Union, Asia, Thailand and the Philippines here.
For our case study on media websites we ran our distributed data collection software in 38 countries around the world and identified hundreds of companies which track individuals through media websites. According to our results, some of these companies track individuals in almost all of the countries and media websites that we examined. We call them the "globally prevailing tracking companies". We analysed their privacy policies to gain insight into how they claim to handle our data.
In particular, we collected data on the following fields from their privacy policies:
Out of 25 globally prevailing tracking companies, 19 of them state in their privacy policies that they collect personally identifiable information (PII) and disclose data to third parties, without explicitly prohibiting them from using such data for unspecified purposes.
Only 11 out of 25 globally prevailing tracking companies disclose how long they retain data for in their privacy policies.
22 out of 25 globally prevailing tracking companies are based in the United States of America.
Only 3 out of 25 globally prevailing tracking companies support Do Not Track (DNT).
While all 25 globally prevailing tracking companies state in their privacy policies that users can "opt-out" from online tracking, in some cases this option is conditional, for reasons such as the following:
users can only opt-out if their browser is not configured to block third party cookies
in some cases, users can only opt-out through the Digital Advertising Alliance website
While most globally prevailing tracking companies comply with the U.S.-EU Safe Harbor Framework, this does not prevent them from collecting users' data or from sharing it with third parties.
Through Trackography we examined which companies track us and where our data travels to when we access websites. Our methodology includes the following:
1. Creation of datasets
Tactical Tech started Trackography by exploring online tracking through media websites. We created datasets which contain the URLs of global, national and local media websites and blogs covering the news for 38 countries around the world. These datasets were reviewed by global contributors to the project.
2. Running Trackography's data collection software
Our software is designed to emulate a browser and to connect to the websites included in the datasets. It allows us not only to view the traceroute to the server of a specific website every time a user accesses it, but also to collect all the third party URLs which are included in the websites.
Details about how to run our software can be viewed through our repository on github.
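To give a rough idea of what "collecting the third party URLs included in a website" involves, here is a minimal Python sketch. It is not Trackography's actual code: the function and class names are hypothetical, it only reads static HTML (so it misses trackers loaded by scripts, which a browser emulator would catch), and it assumes that two hosts belong to the same party when their last two domain labels match, a simplification that a real tool would replace with the Public Suffix List.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def base_domain(host):
    """Naive registrable domain: the last two labels of the host name.

    Assumption for this sketch only; hosts under suffixes such as
    .co.uk would need the Public Suffix List to be handled correctly.
    """
    return ".".join(host.lower().split(".")[-2:])


class ResourceCollector(HTMLParser):
    """Collects the URLs of resources a page embeds (scripts, images, frames)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "img", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative and protocol-relative URLs.
                    self.urls.append(urljoin(self.page_url, value))


def third_party_hosts(page_url, html):
    """Return the sorted host names of embedded resources not served by the page's own domain."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    first_party = base_domain(urlparse(page_url).hostname)
    hosts = set()
    for url in collector.urls:
        host = urlparse(url).hostname
        if host and base_domain(host) != first_party:
            hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    # Made-up page using reserved example domains.
    sample = """
    <html><body>
      <script src="/static/app.js"></script>
      <script src="//tracker.example.net/t.js"></script>
      <img src="https://cdn.news.example.org/logo.png">
      <iframe src="https://ads.example.com/frame"></iframe>
    </body></html>
    """
    print(third_party_hosts("https://news.example.org/story", sample))
```

In this made-up page, the script and image served from the news site's own domain are treated as first party, while the hosts under example.net and example.com are reported as third parties.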
3. Analysis of results
Some of the results collected from our software illustrate which specific companies track us and where our data travels to when we access websites included in our datasets.
In our case studies we examined the results we collected based on the following:
How and why online tracking differs in various countries around the world
The legal privacy framework of some of the countries that host the servers of websites and tracking companies
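One plausible way to summarise such results is to compute, for each company, the share of examined websites on which it was detected. The Python sketch below illustrates that calculation. The function name, the input format and the sample data are all hypothetical, and we do not claim that this is exactly how the percentages reported in our findings were derived.

```python
from collections import Counter


def tracker_coverage(companies_by_site):
    """Return {company: percentage of sites on which it was observed}.

    companies_by_site maps a website to the tracking companies detected
    on it. Both the input shape and the metric are assumptions made for
    this sketch.
    """
    total_sites = len(companies_by_site)
    if total_sites == 0:
        return {}
    counts = Counter()
    for companies in companies_by_site.values():
        # Count each company once per site, even if it serves several URLs.
        counts.update(set(companies))
    return {
        company: round(100.0 * seen / total_sites, 1)
        for company, seen in counts.most_common()
    }


if __name__ == "__main__":
    # Hypothetical observations, not Trackography results.
    sample = {
        "news-a.example": ["Google", "Facebook", "Google"],
        "news-b.example": ["Google"],
        "news-c.example": ["Google", "Twitter"],
        "news-d.example": [],
    }
    for company, share in tracker_coverage(sample).items():
        print(f"{company}: {share}% of sites")
```

Counting a company once per website, rather than once per URL, keeps a site that embeds many resources from the same company from inflating that company's share.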
4. Visualisation of results
We created a map which visualises some of our research results.
We encourage you to play with the map and to view the following:
which companies track you when you read the news online
which countries your data passes through when you read the news online
which countries are hosting the servers of the companies tracking you
which countries are hosting the servers of the media websites you access
how tracking companies handle your data
Which types of websites has Trackography looked at?
We started off the project by examining online tracking through media websites. Afterwards we expanded the project and are currently examining online tracking through various other types of websites.
Who is tracking us when we access websites? Who are the 'trackers'?
For more information about the specific companies tracking you when you access specific websites in various countries, please view our map.
Why are these trackers interested in tracking us?
Companies track users' access to websites because they engage in (one or more of) the following:
Many of these companies argue that they track individuals' access to websites so that they can improve the services that they provide. Companies in the advertising business aim to understand their audience as much as possible so that they can provide targeted advertisements.
Do trackers change across time?
Yes. Our results provide a snapshot and show which companies track us when we access a website at a specific moment in time. The third party trackers will change depending on the browser, the location and the time at which the website is accessed.
What data is being tracked when we access websites?
Moreover, every third party tracker collects your IP address and other identifiable data and stores browser cookies, local shared objects and other tracking technologies on your browser. This permits them to keep track of your online habits and behaviour and to create profiles about you.
Why are you focusing on media websites?
We chose media websites for our first examination because they are commonly accessed by the majority of citizens around the world who have Internet access - regardless of their background, gender, ethnicity, occupation, affiliations and other characteristics. We are interested in exploring how regular daily browsing habits, such as reading the news online, can result in our tracking.
Furthermore, third party trackers can potentially identify a lot of information about individuals based on the type of news they regularly read - such as their political beliefs, economic status, and much more - and create profiles about them.
Why is information about my country missing from the map?
If information about your country is missing from our map, that's likely because we haven't found someone yet to assist us with the review of the list of media websites and/or to run our software from your country. Please help us add information by connecting us with a media expert and/or someone who runs Linux from your country.
Why is my media organisation missing from the map?
We collected lists of media websites with the assistance of local partners. If your media organisation is missing from our map and you would like it to be included in the tests, please contact us at email@example.com.
Where can I find my country's list of media websites?
You can find your country's list of media websites through our github repository. If we have already reviewed your country's media list, you can find it in the verified section and if not, you can find it in the unverified section.
How can I review my country's media list?
If you found your country's media list in the unverified section and you're a media expert, a journalist or generally have good knowledge of your country's media, you can review the media list through the following steps:
Add missing websites which cover the news, are of public interest and which are regularly accessed by most individuals on a national or regional level in your country
Delete websites which are not regularly updated, do not necessarily cover the news and are not regularly accessed by most individuals on a national or regional level in your country
Separate the following in the list:
National media websites
Why is my country's list of media websites in the unverified section?
If you found your country's list of media websites in the unverified section, that's probably because it has not been reviewed by a media expert yet.
Why doesn't my country have a list of media websites on github?
If you didn't find your country's list of media websites in the verified or unverified sections on github, that's probably because we have not compiled a list for your country yet. Contact us at firstname.lastname@example.org, ask us to add a list of media websites for your country or help us create it.
How can my country's list of media websites be transferred to the verified section on github?
Should I add Facebook pages in the lists of media websites?
No, because Facebook is one of the third party trackers often included in media websites that we are interested in detecting. We are interested in media websites which include the domains and servers of third party trackers, but not in webpages hosted by third party trackers themselves, such as Facebook.
Should the media lists be restricted to citizens accessing them in my country or can they also be expanded to media websites accessed by my country's diaspora?
Preferably, we would like to restrict media websites to ones accessed by individuals residing in your country. However, websites accessed by your country's diaspora can also be included - but that is not our priority.
Should media websites accessed via mobile phones also be included in the lists?
Currently, we are not including websites accessed via mobile phones. However, we hope to expand the project to include those in the future.
What do you mean by "network infrastructure" in the map?
Companies in the "purple countries" of the Trackography map host the network infrastructure required to reach the servers of the media websites you have selected, as well as the servers of the companies which track users through the selected websites. By network infrastructure we mean the satellites, fibre optic cables, switches, routers and international or national Internet carriers.
How did Trackography expand beyond the media?
More recent tests have examined online tracking through websites that fall under the following categories:
Government and Politics
What other types of websites are included in Trackography's tests?
Various non-media websites have been included in Trackography's latest tests, all of which can be viewed through our repository on github. Such websites cover banks, consultancy, health insurance, government services, human rights, activism, LGBT rights, dating, culture and travel, to name a few.
Why did Trackography expand beyond the media?
Trackography expanded to the examination of online tracking across a broad spectrum of different types of websites to:
increase transparency about which specific third parties are in a position to aggregate tracked data and to potentially create profiles about groups and individuals
illustrate the countries under which our data is potentially regulated following our access to different types of websites
foster a debate about profiling which can occur through aggregated online tracking
If the various types of websites that we regularly access are matched together, one can potentially reach inferences about us. For example, if someone knows that you regularly access LGBT websites, as well as the websites of the European Union and other job-seeking websites, it's not that hard for someone to correlate such data and to reach the inference that you are an unemployed LGBT person based in the EU, right? That may or may not be accurate, but that's not the point. That is an inference that algorithms are likely to reach when the above has been correlated and aggregated.
We expanded Trackography to the examination of online tracking across various types of websites to identify which main companies are in a position to aggregate tracked data and to potentially play a key role in the profiling business.
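The aggregation described above can be sketched in a few lines of Python. Everything in this example is hypothetical (the function name, the company names, the websites and their categories); it only illustrates why a company that is present on many different types of websites is in a position to combine what it sees.

```python
from collections import defaultdict


def categories_seen_by_company(visited_sites, companies_by_site, category_by_site):
    """Return {company: sorted list of website categories it observed}.

    visited_sites: the websites one person accessed.
    companies_by_site: website -> tracking companies present on it.
    category_by_site: website -> category label (e.g. "jobs").
    All three inputs are assumptions made for this sketch.
    """
    seen = defaultdict(set)
    for site in visited_sites:
        category = category_by_site.get(site, "unknown")
        for company in companies_by_site.get(site, ()):
            seen[company].add(category)
    return {company: sorted(categories) for company, categories in seen.items()}


if __name__ == "__main__":
    # Hypothetical data, not Trackography results.
    companies_by_site = {
        "jobs.example": ["Company A", "Company B"],
        "lgbt-rights.example": ["Company A"],
        "eu-portal.example": ["Company A"],
    }
    category_by_site = {
        "jobs.example": "jobs",
        "lgbt-rights.example": "LGBT rights",
        "eu-portal.example": "government",
    }
    visits = ["jobs.example", "lgbt-rights.example", "eu-portal.example"]
    for company, categories in categories_seen_by_company(
        visits, companies_by_site, category_by_site
    ).items():
        print(company, "->", ", ".join(categories))
```

In this made-up case, "Company A" is present on all three websites and can therefore link the three categories to the same visitor, whereas "Company B" only learns about the visit to the jobs website.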
Which countries have been included in such tests?
We have collected results on online tracking across various types of websites - as mentioned above - in India, Thailand and the Philippines, as well as in 17 European countries. More details can be viewed through our repository on github.
Why was India selected for these tests?
India was selected as a case study because we are interested in exploring online tracking in the world's largest (in terms of population) democracy. Furthermore, we are interested in exploring the potential role that the data brokerage industry plays in the global south.
Why was Thailand selected for these tests?
Thailand was selected as a case study because we are interested in exploring online tracking in a non-democratic regime of the global south, which can then potentially be compared with online tracking in democratic regimes in both the global south and north.
Why were the Philippines selected for these tests?
The Philippines was selected as a case study because we are interested in exploring online tracking in the global south and we happened to be in the region for RightsCon 2015.
Why were tests run on governmental websites in the European Union?
Given that governments in the European Union are committed to protecting their citizens' data, we are interested in exploring whether their websites enable third parties to collect data through online tracking.
How can I contribute to the creation of datasets which include various types of websites from my country?
Our software is designed to run on a list of websites to detect the third party trackers and the traceroutes that are performed when we access the websites in the list. In order for our software to be able to run on a list of websites, the list needs to be included in our repository on github.
You can create your own list of websites that you are interested in examining and add it to our github repository. Alternatively, if you are not a github user you can drop us an email at email@example.com and we can add it for you.
What does Trackography's software do?
Our software is designed to:
Who can run the software?
Any Linux user can potentially run our software. It's quite easy, and details about how to run it can be viewed through our repository on github.
I want to run the software, but I am not a Linux user. Can I?
Unfortunately not (yet).
How can I run the software for all non-media websites?
When running the software on a list of websites under our "special media" category on github (which includes all other, diverse lists of websites, excluding media websites), please run the following:
./perform_analysis.py -S name-of-file
python2 perform_analysis.py -S name-of-file
More details can be found on github.
How long does it take to run the software?
The software usually requires about 30 minutes to run and sends 8-15 megabytes of data to our server.
The software hasn't finished running and I need to relocate to another location under a different ISP. What should I do?
Freeze the software by pressing control + C. When you return to the same ISP, you can restart the software and resume from where you left it.
Should I run the software over Tor?
No, because our software performs traceroutes which cannot run over Tor. If the software runs over Tor, the web connection would appear from a different network point than the traceroutes and would lead to inaccurate results.
Can I run the software over a VPN?
If you would like to run the software over a VPN, please specify the country of your endpoint in the required field right before you start running the software. For example, if you are based in the United States but your VPN ends in Sweden, please specify the country with "-c sweden". It is also recommended that you add the option "-i".
Once the software has run, should I send the results to you?
No need to. Once you've run the software, the results will automatically be transmitted to our server. If you would like to prevent your collected data from automatically being transmitted to our server, please add the option "-d".
Tool lets you see exactly who's tracking what you're reading - Open Society Foundation
Reputation, Polizei und Nutzerverfolgung - Elektrischer reporter
Partner project Tactical Tech's new tool helps you watch the watchers - Global Voices Online
Trackography bringt licht ins dickicht der stillen datenlauscher - Politik Digital
Trackography you never read alone - Netzpolitik
Tracker ranking der 50 top news webseiten - Netzpolitik
If you have any questions or concerns which we have not addressed in the above FAQ section, please feel free to contact us at **firstname.lastname@example.org**
pub 3200R/0x94E7EF47 2014-08-05 [expires: 2015-08-30]
Key fingerprint = ABC2 7639 5EE3 3245 A0A1 3973 40E2 6C25 94E7 EF47
uid TrackMap project email@example.com
sub 3200R/0x504DEBDF 2014-08-05 [expires: 2015-08-30]
Tactical Tech's Trackography team primarily worked on this project:
Claudio Agosti - Project lead and software developer
Maria Xynou - Researcher
Niko Para - Web developer
Fieke Jansen - Politics of Data lead
Roberto Pizzato - Research assistant
Additionally, many individuals from around the world have helped make Trackography possible. Special thanks to the following individuals and organisations which contributed to this project:
Alisa Ruban (Centre UA), Andrew Hilts, Anne Roth, Article 19 staff, Ayaz Ahmed Khan, Bella Shakhmirza, Dalia Othman, Dr. Olumide Abimbola (Nigerians Talk), Front Line Defenders, George Kargiotakis (Greek Research and Technology Network), Gustavo Gus, Hanna Kreitem, Katie Kleemola, Marta G. Franco (Catorce), Melanie Pinlac, Minimal Mycelium, Niels ten Oever (Article 19), Nick Hargreaves, Nighat Dad (Digital Rights Foundation), Pavlo Myronov, Rahma Muhammad Mian (Knight International Journalism Fellow), Rona Even Merrill, Sasha Kinney, Snehashish Gosh, Syaldi Sahude, szaszak, Tareef Alateeq, University of Amsterdam students, Vannak Lach, Ximin Luo
Many thanks to all the anonymous activists around the world who contributed and who continue to contribute to this project.