By: Henry Fleming
In 2020, if you do not want any data collected about you, you have to give up social media, texting, and email. But you also have to give up less obvious things like banks, credit and debit cards, grocery stores, and even the outdoors, along with countless other services. So unless you want to become a hermit who grows their own food deep in an unknown cave, avoiding data collection is impossible.
The world is becoming a place with more information than we know what to do with, and most people do not have access to it. It is estimated that in 2020 humanity will create 35 zettabytes of data. That is like writing this article in ASCII about 4,795,177,400,000,000,000 times. If you wrote this entire article once every day, it would take you about 13,137,472,000,000,000 years to write it that many times.
Today, more and more problems are arising from the growing amount of data and the growing demand for it, and it is becoming apparent that something needs to change.
As more people gain access to the internet, both the demand for data and the amount of it increase. As people browse data, more data is collected about them, feeding a self-reinforcing loop. According to a 2013 study reported by ScienceDaily, 90 percent of the world's data had been created in the previous two years. Because data grows roughly exponentially, this is probably still true in 2020.
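The comparison above can be checked with a few lines of arithmetic. This sketch assumes that 1 zettabyte is 10^21 bytes, that each ASCII character takes one byte, and that a year has 365 days. The length of the article is not stated in the text, so it is backed out from the two figures above and should be treated as an assumption.

```python
ZETTABYTE = 10**21  # bytes, assuming the decimal (SI) definition

total_bytes = 35 * ZETTABYTE
copies = 4_795_177_400_000_000_000  # number of copies quoted in the article

# Implied article length in bytes (one byte per ASCII character).
article_bytes = total_bytes / copies

# Years needed to write that many copies at one copy per day.
years = copies / 365

print(f"implied article size: {article_bytes:,.0f} bytes")
print(f"years at one copy per day: {years:,.0f}")
```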
Why should we care about having too much data? The problem really started when online companies began collecting data about their users. Suddenly there was a huge treasure trove of data just sitting there for the company alone to use. But it is too much data for even a big company to process, so companies started selling it to other companies, and this is a very profitable industry. The catch is that all of this data has to be sent over the internet, tying up data lines. Today, with more appliances connected to the internet, those lines are being tied up even more. We are starting to reach the physical limits of how much data we can transmit.
You may have noticed this yourself: if you have ever used wifi with a lot of other devices connected to it, you might have noticed that the wifi is really slow. This is because of the way wifi works. If two signals are transmitted at the same time, both transmitters wait a different, random amount of time and then try again. If the transmission still fails, the transmitters wait about twice as long as before. More devices means more traffic, which means a transmitter is more likely to be transmitting at the same time as another transmitter. This in turn creates longer wait times.
Why not just give each person a different frequency band for wifi? This would be wasteful because devices are not transmitting all the time, so each band would sit empty much of the time while other radio services could not use it. To allocate that much spectrum to wifi, you would need to take allocations away from other services, but most of the radio spectrum is already allocated, and some parts of it have properties that would not allow wifi to work. Lower radio frequencies are also not feasible for wifi because they require longer antennas.
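The wait-and-retry behavior described above is commonly called binary exponential backoff. Here is a minimal Python sketch of the idea, not an implementation of the actual wifi standard. It assumes time is split into equal slots and that every device always has something to send. The function name `simulate` and the parameters `slots`, `max_exponent`, and `seed` are made up for illustration.

```python
import random

def simulate(num_devices, slots=20_000, max_exponent=10, seed=0):
    """Return the fraction of time slots that end in a collision.

    Simplified model (assumption): every device always has data to send,
    and a slot is a collision when two or more devices transmit in it.
    """
    rng = random.Random(seed)
    wait = [0] * num_devices      # slots each device still has to wait
    failures = [0] * num_devices  # consecutive collisions per device
    collisions = 0

    for _ in range(slots):
        # Devices whose wait has run out transmit in this slot.
        senders = [d for d in range(num_devices) if wait[d] == 0]
        # Devices that are still waiting count down by one slot.
        for d in range(num_devices):
            if wait[d] > 0:
                wait[d] -= 1

        if len(senders) == 1:
            # Success: that device resets its backoff.
            failures[senders[0]] = 0
        elif len(senders) > 1:
            collisions += 1
            for d in senders:
                # Each collision doubles the range of possible waits.
                failures[d] = min(failures[d] + 1, max_exponent)
                wait[d] = rng.randrange(2 ** failures[d])
    return collisions / slots

for n in (1, 2, 20):
    print(n, "devices:", simulate(n))
```

In this toy model a single device never collides, and a crowded network should lose a larger share of its slots to collisions than a nearly empty one, which matches the slowdown described above.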
Another problem with data is just the sheer size of it. Steven Fleming, who works as an analytics manager at Experian, a consumer reporting agency, answered questions about how his company uses its vast amount of data. How much of the data is used? “Pretty much all of it… An example would be when someone applies for a loan, they can look back and see how they applied for other loans, how many times, what types of loans. They can look at, did you get other loans, how did you do on paying back.” But how is Experian getting this data? “Most of [the data] is collected by other companies, that are offering some sort of credit product or loan” and “when you apply for those you give some of your personal information, [like] your social security number, name, address.” Then he says, “that lender will then say well, I do not know [this person] directly, I am going to query Experian and Experian [is] going to say here is the information that we have about [the person]. Experian will store that data the lender sent and [Experian will] have a new piece of data.” Each month the lender reports on how the person has performed on payments that month, and Experian stores that as well. Does Experian ever delete data? “Usually it is more of a soft delete, so that the data [is] still there but it's coded so that it will not be used [in credit score calculations].” Not deleting data has led to incredibly large data sets: “[Experian] is currently moving its data from data centers around the country to the faster Amazon web server because consumers had to wait too long to make a decision based on their credit score.”
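A soft delete like the one described in the interview can be illustrated with a small Python sketch using the standard `sqlite3` module. This is not Experian's system: the table `loan_records`, the column `is_deleted`, and the sample rows are all hypothetical and exist only to show the idea of flagging a row instead of erasing it.

```python
import sqlite3

# In-memory database with a hypothetical table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan_records ("
    " id INTEGER PRIMARY KEY,"
    " person TEXT,"
    " note TEXT,"
    " is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO loan_records (person, note) VALUES (?, ?)",
    [("A", "auto loan, paid on time"), ("A", "record entered in error")],
)

def soft_delete(record_id):
    # Flag the row instead of removing it; the data itself is kept.
    conn.execute(
        "UPDATE loan_records SET is_deleted = 1 WHERE id = ?", (record_id,)
    )

def active_records(person):
    # Anything that computes a score would read only the unflagged rows.
    rows = conn.execute(
        "SELECT note FROM loan_records"
        " WHERE person = ? AND is_deleted = 0 ORDER BY id",
        (person,),
    )
    return [note for (note,) in rows]

soft_delete(2)
total_rows = conn.execute("SELECT COUNT(*) FROM loan_records").fetchone()[0]
print(active_records("A"), total_rows)
```

After the soft delete, the flagged record no longer shows up in `active_records`, but the table still holds both rows, which is why data sets that are only soft-deleted keep growing.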
The speed of accessing data is a problem that Experian is not alone in dealing with. If you tried to download 35 zettabytes of data, assuming a download speed of 1 gigabit per second and eight bits per byte, it would take over 8,878,741 years. By that point humanity will have created at least 310,755,935 zettabytes of new data (if things continue at or above the current rate), which would take about another 78,832,000,000,000 years to download at the same speed. How fast would you have to download to finish the original 35 zettabytes in a single year? About 8,878,742 gigabits per second, or roughly 1,109,800 gigabytes per second.
But this only affects the internet, right? Nope. Data’s problems are bleeding over into countless other fields.
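The download-time figures can be reproduced with a short calculation. This sketch assumes that 1 zettabyte is 10^21 bytes, that a byte is 8 bits, that a year has 365 days, and that humanity keeps creating a flat 35 zettabytes per year. It keeps every rate in gigabits per second, so a calculation that mixes in gigabytes per second will differ from these results by a factor of eight.

```python
ZETTABYTE = 10**21                     # bytes, decimal (SI) definition assumed
BITS_PER_BYTE = 8
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
GIGABIT = 10**9                        # bits

def download_years(zettabytes, gbps=1):
    """Years needed to download `zettabytes` at `gbps` gigabits per second."""
    bits = zettabytes * ZETTABYTE * BITS_PER_BYTE
    return bits / (gbps * GIGABIT) / SECONDS_PER_YEAR

first_pass = download_years(35)

# Data created in the meantime, assuming a flat 35 zettabytes per year.
created_meanwhile = 35 * int(first_pass)
second_pass = download_years(created_meanwhile)

# Speed needed to download 35 zettabytes in exactly one year.
gbps_for_one_year = 35 * ZETTABYTE * BITS_PER_BYTE / SECONDS_PER_YEAR / GIGABIT

print(f"first pass: {first_pass:,.0f} years")
print(f"created meanwhile: {created_meanwhile:,} zettabytes")
print(f"second pass: {second_pass:,.0f} years")
print(f"speed for one year: {gbps_for_one_year:,.0f} gigabits per second")
```

Because the speed is fixed at 1 gigabit per second, the number of years for the first pass and the number of gigabits per second needed to finish in one year come out to the same figure.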
In early February 2020, OneWeb, a satellite internet service that aims to provide “high-speed broadband internet to every corner of the earth,” launched 34 satellites into space. In this case, internet expansion has literally bled over into astronomy. These satellites transmit and receive on frequencies close to those used by radio telescopes. While the law requires them not to transmit on the radio telescopes’ frequencies, it is sometimes hard to filter out the noise. Some radio telescopes are starting to see interference in the form of streaks of light across their images. The satellite interference also limits the times of night that radio telescopes can operate. Since the Earth rotates, this also limits the area of the sky that astronomers can study, which could stop humanity from making important astronomical discoveries. And this is with just 34 satellites. OneWeb plans to launch a total of 650. But OneWeb is not the source of the problem: it is responding to the growing market for internet expansion, which exists because more people want faster access to large data sets and websites. It is not just OneWeb, either. SpaceX and Amazon have also expressed interest in satellite internet, which could lead to even more problems and even an astronomical blackout.
On Feb. 26, 2020, Clearview AI, a company that had risen to fame only a month earlier, was hacked (this actually happened while I was writing this article). Luckily, the hacker(s) did not steal Clearview AI’s massive database of over 3 billion controversially obtained pictures of human faces. But the data that the hacker(s) did steal included all of Clearview AI’s client information, including the number of searches each client had run and the number of accounts each had set up. This raises the questions of unwilling data collection and data security.
One thing about data is that once it has been public at any point in time, it is really hard to delete fully, even if you press the delete button. Why? Because other people or companies might have downloaded or scraped that data, and the company itself might stop making the data publicly available but keep it for its own gain.
So why was Clearview AI’s database so controversial? Clearview AI was developing an AI system to identify people from photographs, and it worked mainly with law enforcement agencies. It was scraping images for its database from social media platforms like Facebook and Twitter without the knowledge of those platforms’ users. And if you wanted to have a picture removed from Clearview AI’s database, you would have to go through a difficult process in which you gave them a picture of yourself holding a government ID and Clearview AI reviewed your request with law enforcement. Just for one picture. But the hack might just be a warning sign. The hacker(s) may have only been showing that they can get into the database and may come back to steal all 3 billion photos. Since we do not know whether the hacker(s) are malicious or just mad about their photos getting collected, we do not know what would happen if the photos were stolen. Having billions of photos, without doing the web scraping normally required to get them, makes it much easier to pull off identity theft and other identity-related crimes.
So what can we do? Humans are really good at solving problems; it is amazing that we have computers in the first place. As more people become aware of the problems with data, things might improve: people might decide to connect fewer devices to their wifi, companies might collect less data, satellite internet technology might advance, and companies might give people more options about data collection so that people can have very little data collected about them. Eventually, other physical limits might be reached. If we can foresee these problems, we can eliminate them before they exist.
“The Flood of Big Data.” International Business Machines <https://www.ibmbigdatahub.com/infographic/flood-big-data>
“Big Data, for better or worse: 90% of world's data generated over last two years.” ScienceDaily <https://www.sciencedaily.com/releases/2013/05/130522085217.htm>
“Ethernet: The Definitive Guide.” Charles E. Spurgeon and Joann Zimmerman.
“OneWeb Launches 34 Satellites as Astronomers Fear Radio Chatter.” Shannon Hall <https://www.nytimes.com/2020/02/06/science/oneweb-launch.html>
“Clearview AI, The Company Whose Database Has Amassed 3 Billion Photos, Hacked.” Kate O'Flaherty <https://www.forbes.com/sites/kateoflahertyuk/2020/02/26/clearview-ai-the-company-whose-database-has-amassed-3-billion-photos-hacked/#22a7fa547606>
“Clearview AI’s Database Has Amassed 3 Billion Photos. This Is How If You Want Yours Deleted, You Have To Opt Out.” Kate O'Flaherty <https://www.forbes.com/sites/kateoflahertyuk/2020/01/26/clearview-ais-database-has-amassed-3-billion-photos-this-is-how-if-you-want-yours-deleted-you-have-to-opt-out/#4f3940f360aa>