This fact sheet on big data summarizes the facts our Fellows judged most significant in the course of their work on the Presans Platform.
1. What is big data?
Big data is part of the broader field of artificial intelligence. The term refers to digital assets generally characterized by three, four or five attributes beginning with "V": volume, velocity, variety, veracity and value. Around 2014, industrializing the processing of these assets in order to combine data in new ways rose up the corporate agenda. The premise was that every company, not just digital platforms, had to recruit data scientists and start creating value from its data.
2. Is big data already outdated?
Applying big data is proving more difficult than expected in certain sectors, particularly in industry, where the killer app has yet to be found. The discontinuation of the Google Flu Trends project is often cited to put other failures into perspective.
However, it seems premature to announce the end of big data. On the contrary, the close affinity between big data and deep learning should continue to generate massive performance gains in ecosystems with appropriate governance. Deep learning is highly sensitive to the volume and quality of the available data: the wider the variety of situations the data cover, the more likely the learning is to correctly automate the desired behavior.
Big data seems less relevant where the key players have neither access to a large centralized database nor the ability to coordinate with one another to build one together.
3. Applications
Google and Facebook base their advertising business models on the targeting that big data makes possible. Amazon is one of the pioneers of dynamic pricing, one of the major applications of big data in the economic world. Servitization and functionalization rely on big data generated by sensors placed on industrial and other assets. The same goes for smart energy grids and smart cities.
But big data also has applications in politics, in sport, and in any area where the goal is to prevent risks or detect patterns. Our prediction: big data is only just beginning.
4. Actors
The big data movement is intertwined with digital transformation and originated with the Internet giants. In the mid-2000s, these digital giants began building massive data storage and processing capabilities, and then made those capabilities available to other players via the cloud. The concentration of data within the platforms, supplemented by data brokers, quickly caused the volume of available data to explode.
In a longer perspective, big data marks a stage in the spread to non-state actors of the ability to produce statistics on populations. This development was driven not by a desire for fiscal extraction but by the dream of individualizing advertising targeting as far as possible.
5. Epistemological limit and ethical consequences
The fundamental epistemological limit of big data is inherent in statistics in general: correlation does not imply causation. This limit must not be lost sight of, because growth in the volume and variety of data also increases the number of correlations that have no causal link.
An unscrupulous data scientist may choose to ignore this limit, all the more so when heavy economic interests are at stake. More generally, the statistical integrity of scientific studies is not backed by favorable incentives: hardly any resources are allocated to replicating studies.
To this first ethical problem is added a second: the protection of privacy. Big data can allow companies to understand an individual's preferences better than the people close to them do. Having emerged from Big Tech, big data is now firmly established in Big Business and Big Government. We should recognize that adopting a posture of ethical superiority will not be enough to foil a Big Brother scenario imposed by the great powers of data.
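The claim that more variables mean more correlations without a causal link can be illustrated with a small simulation. The sketch below is purely illustrative and makes its own assumptions: every variable is independent random noise (so no pair is causally related), each has 30 observations, and any pair with an absolute Pearson correlation above 0.5 is counted as a "spurious" finding. The function names `pearson` and `spurious_pairs` and all parameter values are hypothetical choices for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spurious_pairs(num_vars, num_obs=30, threshold=0.5, seed=0):
    """Count variable pairs whose |correlation| exceeds `threshold`,
    even though every variable is independent Gaussian noise.

    Assumption: threshold and sample size are arbitrary illustrative values.
    The same seed is reused, so the first k variables are identical across
    calls and a larger `num_vars` only adds variables (and pairs).
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(num_obs)] for _ in range(num_vars)]
    count = 0
    for i in range(num_vars):
        for j in range(i + 1, num_vars):
            if abs(pearson(data[i], data[j])) > threshold:
                count += 1
    return count


# The number of candidate pairs grows roughly with the square of the
# number of variables, so "significant-looking" correlations multiply.
for k in (10, 50, 200):
    print(k, "variables ->", spurious_pairs(k), "spurious pairs")
```

Because the data contain no causal relationships at all, every pair counted is a chance correlation; the count rises as variables are added simply because there are more pairs to test.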