In recent years, technology companies have collected a large amount of data about people, and if it falls into the wrong hands it can do serious harm. In response to this danger, Europe adopted the General Data Protection Regulation (GDPR), which gives users broader “access rights”. Under the regulation, any company must provide users, on request, with the personal data it collects and holds about them. In addition, companies must provide the data promptly and in a form users can read, along with enough background information to help users understand how the company collects and uses it.
The original aim of the GDPR is that once users understand what data a company holds, they can make informed decisions, such as whether to keep providing that data, and can make companies pay the price for collecting data without their consent.
The problem, however, is that companies are often reluctant to provide this data. After all, if their services essentially rely on “forced consent” (Google was recently fined 50 million euros over this), they may not want to make it easier for users to see how much personal data they collect. Technology reporter John Porter decided to test the “access rights” offered by the four largest technology companies operating in the European Union: Apple, Amazon, Facebook and Google. His results show that although users may receive the raw data, it is difficult to understand, and difficult to base informed decisions on.
According to the UK's data protection regulator, the Information Commissioner's Office (ICO), companies must provide all personal data, that is, any data relating to an identified or identifiable individual, at the request of the user. The information should be provided in a “commonly used electronic format” and in a “concise, transparent, intelligible and easily accessible form, using clear and plain language”. It sounds simple, but how do the four technology giants actually do it?
At first, Porter found it easy to download his own data. Both Google's and Apple's data download services let you choose which data to download. Facebook's does not, but all three companies make personal data easy to find on their sites. Getting data from Amazon, by contrast, is more cumbersome and requires clicking through the site’s “Contacts” page to find an option hidden at the end of a list. Porter waited 30 days before he received a link to download his data.
However, when Porter looked at the data he received, everything became confusing. Some files had ambiguous labels, and others were stored in formats that are a headache to work with. Simply working out what data he was looking at was not as easy as you might think.
Google’s location tracking data is particularly difficult to understand. The company has repeatedly been criticized for tracking Android users even after they turned off the main location tracking feature in the operating system, and consumer groups in seven European countries have filed complaints with their data protection regulators. The right to download personal data granted by the GDPR should be a way to check whether these services are using such methods to collect more data. It should be a tool for holding companies like Google to account.
But when you actually look at the data, it is hard to read and understand. All of Porter's location history from Google is contained in a 61-megabyte JSON file. Opened in Chrome, it displays a tangled array of fields labeled “timestampMs”, “latitudeE7” and “longitudeE7”, along with estimates of whether he was sitting still or travelling in a particular kind of vehicle.
Porter said he had no doubt that this was all the location history Google had linked to his account, but without context the data makes no sense. He would have to work hard just to begin to understand the numbers, and import them into other software for proper analysis. If the goal of the GDPR is to give people more control over, and understanding of, the data companies collect, then this Google export is practically useless. If you want to feed data into another system, JSON is great. But if you want to gauge how much data Google holds and make informed privacy decisions, it is not much help.
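To illustrate the kind of work a reader has to do before these fields mean anything, here is a minimal Python sketch. It rests on assumptions not stated in the article: that “timestampMs” holds milliseconds since the Unix epoch, that the “E7” fields hold degrees multiplied by ten million, and that the records sit under a top-level “locations” key. The sample record is invented for illustration and is not taken from Porter's file.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample record. The top-level "locations" key is an assumption
# about the file layout, not something the article confirms.
SAMPLE = '''
{"locations": [
  {"timestampMs": "1546300800000", "latitudeE7": 515074000, "longitudeE7": -1278000}
]}
'''

def decode_record(record):
    """Turn one raw record into a readable UTC time and decimal degrees."""
    # Assumption: timestampMs is milliseconds since the Unix epoch.
    when = datetime.fromtimestamp(int(record["timestampMs"]) / 1000, tz=timezone.utc)
    return {
        "time": when.isoformat(),
        # Assumption: the E7 suffix means the value is degrees * 10^7.
        "latitude": record["latitudeE7"] / 1e7,
        "longitude": record["longitudeE7"] / 1e7,
    }

def decode_locations(text):
    """Parse the JSON text and decode every record under "locations"."""
    data = json.loads(text)
    return [decode_record(r) for r in data.get("locations", [])]

decoded = decode_locations(SAMPLE)
for row in decoded:
    print(row)
```

Even this small step requires knowing, or guessing, what the field names and units mean, which is exactly the context the download does not supply.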
With other files, Porter could not even tell what data he was looking at. A 4 GB HTML file called “My Activity” in the ADS folder may contain a large amount of ad tracking data collected by Google, but there are no comments or metadata to explain it.
These files are by far the most confusing, and the most important, of all the data downloads. They contain a great deal of the personal information that advertisers want, and Google should put more effort into explaining what it means. The company already provides an index HTML file that summarizes the user's data, so why not include a description of what each file contains?
Despite its problems, Apple’s data export is better than Google’s. Most of the data Apple provides comes in formats that are easy to open and read, such as CSV, TXT and JPG, along with several JSON files. However, once you open these files, there is still a lot of information that is difficult to understand.
For example, the file named “Apple ID Account Information” appears to contain 11 almost identical entries for Porter's Apple account, all created on the same date in 2014, but Apple does not explain what they are. Another ambiguous CSV file, “Application and Services Analytics”, appears to contain a complete list of every search Porter made in the App Store, but it has so many empty cells that at first glance it is hard to tell that the 6.7 MB file contains any data at all.
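One way a reader might check whether a mostly empty spreadsheet holds anything at all is to count the filled cells in each column. The Python sketch below does this for a small invented sample; the column names and rows are hypothetical and do not reflect the actual layout of Apple's file.

```python
import csv
import io

# Hypothetical sample: the column names and values are invented for illustration.
SAMPLE = """date,search_term,device,notes
2018-05-01,,,
2018-05-02,podcast app,,
2018-05-03,,,
"""

def filled_cells_per_column(text):
    """Count the non-empty cells in each column of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            # Treat missing values and whitespace-only cells as empty.
            if (row.get(name) or "").strip():
                counts[name] += 1
    return counts

counts = filled_cells_per_column(SAMPLE)
for name, filled in counts.items():
    print(f"{name}: {filled} filled cell(s)")
```

Columns that report zero filled cells can be ignored, which makes the few populated ones easier to spot.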
Although it is unnerving to be able to listen to every Alexa query, Amazon does a much better job of presenting its data, though this may be because it holds relatively little data on an individual. In most cases, the files and folders Amazon provides are clearly labeled, although the company still has work to do on labeling its spreadsheets.
The irony is that Facebook's data is the easiest to understand of the four services. Every file Facebook provides is an HTML file, and each one is sorted into a clearly labeled folder, which gives the user an overview of what each document contains. The files themselves have a clear layout and format, and viewing them is like viewing a page on Facebook, albeit one stored entirely on the user's computer.
Facebook's download also includes long index files that show users where to find all their information.
It is alarming to see how much personal data Facebook stores about its users, but at least you know exactly what the information is, rather than having to guess at the contents of each file.
At the end of the experiment, Porter had nearly 138 GB of data from the four services he contacted. This includes 1.1 GB from Facebook, 392 MB from Amazon and 254 MB from Apple. Although Google's download for Porter came to 72.5 GB, most of that is backups of his Google Drive and Google Photos, at 44.3 GB and 25.7 GB respectively. The rest of Porter's Google data amounts to only 2.5 GB.
After trying to make sense of it all, it becomes clear that if these companies want us to really control our data, they still have a long way to go in how they implement the GDPR's rules. Being able to download data is one thing, but making it useful means working harder to ensure that the downloaded content is easy for ordinary people to understand. At the very least, these companies must provide a better index that tells users what data is in which file, but it also means organizing the contents of the files themselves more clearly.