Big data has been a hot topic in recent years. Its advantages, and the new ways of thinking it brings, have set off a wave of research. From random sampling to using all the data, from demanding precision to accepting messiness, from pursuing causality to finding correlation, the era of big data is changing both our information environment and the way we think about processing information. However, not everyone enters the big data era at the same time. As with every innovation and diffusion of media technology, alert enterprises and organizations are the pioneers and practitioners of big data and its earliest beneficiaries. Ordinary individuals respond to big data in different ways: some are slow to catch up, some lack the ability to analyze data, some do not know how to find open data, and some are at a loss in the face of data noise. The digital divide of the traditional Internet era has not been fully bridged, and in the era of big data a new digital divide is forming, one that continually affects and reshapes people's political and economic status.
To discuss the digital divide in the era of big data, it is necessary to distinguish clearly between "digital difference" and "digital divide". Etymologically the two terms are close in meaning, and both are Chinese renderings of the English "digital divide". In terms of communicative effect and emotional color, however, "digital divide" provokes more vigilance than "digital difference". In the era of big data, people create data and are surrounded by it. Because attention and energy are limited, people inevitably differ in how they face data and make choices. For example, the personalized search engines and other personalized services offered on the Internet lead to personalized information browsing, so digital differences in the era of big data are unavoidable. The digital divide, by contrast, emphasizes differences in understanding and opportunity. A digital difference means knowing that an opportunity exists and choosing not to take it; a digital divide means wanting to take it but lacking the ability or the opportunity. In the context of big data, the digital divide may exist at three levels: owning data, analyzing data, and thinking about data.
Three different analytical dimensions
(A) The digital divide in owning data
In the era of big data, words such as "brand-new", "revolution" and "subversive" appear frequently, yet the problems grouped under the label "big data" have a long history. With the rapid development of the Internet, people have had to cope with exponential data growth, information overload and the difficulty of processing data. Technologies for data mining, storage, processing and application have advanced rapidly in the big data era, but the current discussion has not given a satisfactory answer to the most basic question, who owns the data, and this unresolved question of ownership has given rise to a digital divide.
1. Open data
For enterprises and governments, big data is a valuable asset. "Mastering big data can be transformed into a source of economic value", and it allows them to understand and manage society from a more precise perspective. Enterprises and governments therefore collect data from the general public. Data thus flows from the bottom up, and the "digital pioneers" in enterprises and governments are the first to own and control big data. Bridging the digital divide, however, requires data to flow in the opposite direction as well: open data, that is, letting the public share the data held by enterprises and governments, which is a top-down process. In practice, this top-down flow meets resistance everywhere. On the one hand, enterprises regard data as a core competitive advantage or a trade secret; having invested heavily in manpower, materials and money to analyze it, they are reluctant to share it. On the other hand, governments are still slow to release data, and the public still finds it difficult to obtain valuable information.
The digital divide caused by data ownership needs to be addressed through open data. Which data can be opened to the public, in what form, who will carry out the opening, and who will pay for "free-riding" in the process are all questions that must be considered. Big data not only generates commercial value but also has a public character, so data closely related to the public interest should be opened. As early as 2007, China adopted the Regulations of the People's Republic of China on Open Government Information, which set out the principles, scope, methods and procedures of information disclosure as well as the system for supervising and guaranteeing it. In the era of big data, the government should open data further and at the same time educate the public in the literacy needed to obtain data, so that all citizens can own and share data. As a public resource, data, like wealth, will shape the social structure through the fairness of its distribution. Governments and enterprises can draw on advances in data storage and analysis technology to offer a "data banking" service, giving every citizen the opportunity to deposit and withdraw the data he or she wants. In his book Big Data, the Chinese scholar Tu Zipei considers open data from the perspective of data democracy and points out that the open data movement will promote "a series of movements and slogans such as open politics, open government, open media and open cities". This offers a feasible way to close the digital divide created by data ownership and to build a new world in which data is distributed fairly.
2. Data collection
The foundation of the big data era is massive data. How big is big data? A report by the McKinsey Global Institute defines it as follows: "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." Moreover, the threshold for big data keeps shifting as data grows exponentially; today, big data is often measured in petabytes (PB). Massive data provides more detailed information, but it also carries a hidden problem: its value density is low, so collecting data and finding valuable information within it is very costly. In an exclusive interview with Xie Wei, a reporter from China Economic Weekly, Viktor Mayer-Schönberger said: "In many ways, we still live in an era of 'small data', and collecting data in this era is very time-consuming, expensive and difficult." Data collection in the big data era is a huge undertaking, and big data remains far beyond the means of ordinary people.
The digital divide in data collection does not seem to shrink in the era of big data; instead it widens as big data processing technology develops. Even for media and enterprises, collecting and processing data is not easy. The Harvard Business Review surveyed the use of big data among the world's Fortune 1000 companies and found that "most enterprises are still at an early stage of big data, still very immature, and do not have the ability to truly mine big data". In addition, only a small share of respondents considered their company's data accessibility good enough or world-class, and "only 21% of the respondents think that their company's analytical ability is good enough or reaches the world level." Clearly, for the general public, collecting and mining data is even harder. In an information flow dominated by search engines, the public is already divided by the search tools it uses: there is a difference between using ordinary search engines and using more specialized search engines and databases. In the era of big data, the public needs not only to know how to use specialized search engines but also to find the most valuable information quickly within a vast amount of it. Because people's abilities differ, a digital divide at the collection stage is hard to avoid. Moreover, data on the Internet is constantly updated, so timeliness is crucial. In research on the "knowledge gap", the Western scholars J. S. Ettema and F. G. Kline described a "ceiling effect": the knowledge gap gradually narrows over time. In the Internet age, however, the value of information is closely tied to its timeliness. Even if the public's gap in data collection narrows over time, the data held by latecomers will have lost much of its value.
Levinson, a representative of the media ecology school, offers an idea that may help reduce differences in data collection in the era of big data. He argues that establishing rules for classifying information can solve the problem of information overload. For example, information overload in a library can be solved by setting up rules for classifying books and shelving them according to those rules. This idea is broadly instructive for the information overload that has long troubled humanity.
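As a loose illustration of the classification-rule idea, the sketch below sorts a stream of headlines into "shelves" by simple keyword rules, much as a library shelves books by subject. It is only a minimal sketch under stated assumptions: the category names, keywords, and the first-match rule are hypothetical, not anything Levinson specified.

```python
from collections import defaultdict

# Hypothetical classification rules: category -> keywords that place an item in it.
RULES = {
    "economy": ("price", "market", "trade"),
    "health": ("vaccine", "hospital", "disease"),
    "technology": ("data", "software", "network"),
}


def classify(item: str) -> str:
    """Return the first category whose keywords appear in the item, else 'other'."""
    text = item.lower()
    for category, keywords in RULES.items():
        if any(word in text for word in keywords):
            return category
    return "other"


def organize(items: list[str]) -> dict[str, list[str]]:
    """Sort a flood of items into labelled shelves, like books in a library."""
    shelves: dict[str, list[str]] = defaultdict(list)
    for item in items:
        shelves[classify(item)].append(item)
    return dict(shelves)


if __name__ == "__main__":
    feed = [
        "Market prices rise again",
        "New hospital opens downtown",
        "Open data portal launches",
        "Local football results",
    ]
    for category, entries in organize(feed).items():
        print(category, entries)
```

Once items are shelved, a reader can browse one category at a time instead of the whole flood; real classification schemes are far richer than keyword matching.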
(B) The digital divide in analyzing data
Just as people differ in the data they own, they also differ in their ability to use the same data. Big data includes both structured data based on quantitative relationships and unstructured data based on qualitative description, and unstructured data often makes up the larger share. Therefore, in the era of big data, having the same data does not mean being able to use it equally well. The divide in the ability to analyze data and extract its value still deserves our vigilance.
1. Data deletion
The era of big data is an era of highly fragmented information. Repetition, noise, redundancy and human manipulation (such as the "Internet water army" of paid posters) all affect how people analyze and use data. At this point, deleting data is as important as collecting it. Besides Big Data: A Revolution That Will Transform How We Live, Work, and Think, Mayer-Schönberger has another influential work, Delete: The Virtue of Forgetting in the Digital Age. In it he reminds readers that in the era of big data "memory has become the norm and forgetting the exception", so we should pay attention to how information is selected. In this "world without forgetting", forgetting has become a valuable way of processing information, and who has the right to delete data has become a question about human values. The problem grows more pressing as the generation of digital natives grows up.
Everyone has a naive, embarrassing or even trivial past. Before the Internet, people would try to forget such unhappy memories, but the Internet's memory leaves everyone exposed and faced with the reality that they may pay for mistakes made ten years ago.
Deletion is also a technical problem. In the Internet age, old data gradually becomes "data garbage" that not only occupies large amounts of storage but also interferes with the analysis of current data. Evaluating and deleting data has become an indispensable part of data processing in the big data era. For individuals, however, there is a problem: people cannot evaluate and process information the way machines do, and can only rely on past experience. Another foreign scholar, Tichenor, analyzing the causes of the "knowledge gap", noted that people's existing stock of information also produces a knowledge gap: "formal education and information obtained from the mass media provide people with more education with a background for understanding new knowledge." The era of big data has not changed people's habits of receiving information, so those with more education are the first to learn how to receive and delete information. Deletion also has a philosophical dimension. In the era of big data, to choose is to delete. People's reception of data is zero-sum: attending to one set of data means giving up other data, which is another kind of deletion. Dealing with low-quality, outdated data is a precondition for discovering the meaning of big data. Matthew E. May, in Simplification: The Winning Rule of Business in the Era of Big Data, also discusses deleting and simplifying information in the big data era. Enterprises that can quickly obtain the most valuable data first will prosper, while those that do not understand big data or become fixated on it will gradually fall behind.
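To make "evaluating and deleting" data concrete, the sketch below drops records that are too old and records that merely repeat a more recent one. It is a minimal illustration under loud assumptions: "outdated" is approximated by a fixed age limit (`max_age`), and "redundant" by identical text after lowercasing and collapsing whitespace; the `Record` type and `prune` function are hypothetical names, and real systems judge value in far more sophisticated ways.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    text: str
    created: datetime


def prune(records: list[Record], now: datetime, max_age: timedelta) -> list[Record]:
    """Keep only records that are recent enough and not repeats of a kept record.

    "Outdated" is approximated by age > max_age and "redundant" by identical
    text after lowercasing and collapsing whitespace; both are assumptions.
    """
    seen: set[str] = set()
    kept: list[Record] = []
    # Visit newest first so that, among duplicates, the most recent copy is kept.
    for record in sorted(records, key=lambda r: r.created, reverse=True):
        key = " ".join(record.text.lower().split())
        if now - record.created > max_age or key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    records = [
        Record("Flood warning issued", datetime(2024, 1, 30)),
        Record("flood  warning issued", datetime(2024, 1, 29)),  # near-duplicate
        Record("Old rumor", datetime(2022, 5, 1)),                # outdated
    ]
    for record in prune(records, now, max_age=timedelta(days=365)):
        print(record.text)  # prints only "Flood warning issued"
```

Keeping the newest copy of a duplicate reflects the point above that timeliness and value go together; a different policy (for example, keeping the earliest source) would be equally easy to express.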
2. Useful data
The era of big data provides a diverse, detailed and complex data environment in which all of reality can be quantified as data. But to create value with big data, one must find the valuable data within the massive whole and map it back to reality, because a dataset, whatever its size, brings no value by itself. The ultimate value of big data still lies in its "availability", and this is also where the digital divide appears. Big data is like a delicious nut that is hard to crack without tools, and the "cloud storage and cloud computing" that big data relies on are not easy for the general public to master. A small number of people have the ability to analyze and apply data, while many others are at a loss in the face of massive data and end up anxious about information overload.
Bridging the digital divide in data availability requires making data directly visible, which remains a public concern. Mapping data back to reality requires not only artificial intelligence techniques for data analysis but also keen human analysis and judgment. More importantly, the real-world context the data points to must be conveyed faithfully to the public, and the government and the media still have much to do. First, data processing technology should be popularized, and interpreting big data on public affairs should be treated as a public utility. As early as the 1960s, John McCarthy, known as the "father of artificial intelligence", predicted that "one day, computing may become a public facility". Second, the media should act as a "ferryman" between data and reality. For example, when reporting on tornadoes, American journalists overlaid data on damaged houses onto maps to produce big data maps. In this way, the audience could understand not only the general area struck by a tornado but also the specific losses in a particular area.
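The tornado-map example comes down to attaching loss records to places and summarizing them per place, so that each area can be shaded on a map. The sketch below shows only that aggregation step, under stated assumptions: the `DamageReport` fields, the loss unit, and the severity thresholds are hypothetical, and drawing the map itself is left out.

```python
from collections import defaultdict
from typing import NamedTuple


class DamageReport(NamedTuple):
    area: str            # hypothetical area name, e.g. a county or map grid cell
    houses_damaged: int
    loss: int            # estimated loss in an assumed unit (e.g. thousands of dollars)


def summarize_by_area(reports: list[DamageReport]) -> dict[str, dict[str, int]]:
    """Add up damage per area: the figures a map layer would be shaded by."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"houses_damaged": 0, "loss": 0})
    for report in reports:
        totals[report.area]["houses_damaged"] += report.houses_damaged
        totals[report.area]["loss"] += report.loss
    return dict(totals)


def severity(houses_damaged: int) -> str:
    """Map a count to a colour band; the thresholds are made up for illustration."""
    if houses_damaged >= 100:
        return "severe"
    if houses_damaged >= 10:
        return "moderate"
    return "light"


if __name__ == "__main__":
    reports = [
        DamageReport("Area A", 60, 900),
        DamageReport("Area A", 50, 700),
        DamageReport("Area B", 12, 150),
        DamageReport("Area C", 3, 20),
    ]
    for area, figures in summarize_by_area(reports).items():
        print(area, figures, severity(figures["houses_damaged"]))
```

The per-area totals give readers the "general area" view, while the individual reports behind each total preserve the specific losses, which is the two-level reading the tornado maps offered.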
(C) The digital divide in thinking about data
The most important change brought by the big data boom is a change in thinking about data. There is much discussion of big data, but the concept of "big data" does not by itself change our information environment. Rather, big data thinking, the shift from "digital existence" to "data-based existence", gives people an additional perspective for understanding today's world as the Internet moves toward massive data. The digital divide that lies beyond big data technology comes from the level of people's thinking, that is, from the different ways people think about data.
1. Beyond Big Data
One aspect of thinking in the era of big data is to transcend the "data myth" and treat data as a tool rather than granting it hegemony. In Big Data, Mayer-Schönberger identifies three changes brought by big data: not random samples, but all the data; not precision, but messiness; not causality, but correlation. These changes have had a great impact on traditional quantitative research methods, but better quantitative methods cannot replace qualitative research; we must look beyond the data to find the meaning and value behind it. Big data thinking therefore has three levels. At the first level, one recognizes massive data and its potential value but cannot make good use of it. At the second level, one can make good use of data but often falls into data worship and cannot address questions of meaning. At the third level, one can use data while also looking beyond it to find value. These three levels are not only a diachronic process in the development of big data but also a synchronic process of cognition. The rise and spread of the concept of big data will take time, so the three-level "digital divide" in data thinking will persist for a long time.
2. Big data literacy
Narrowing the digital divide requires effort on both hardware and software, and this remains true in the era of big data. According to China's statistical reports on Internet development in recent years, the hardware divide is gradually narrowing while the software divide is still widening. Bridging the digital divide requires governments, enterprises and other institutions to open public data and provide ways to use it. It is also necessary to raise the big data literacy of all citizens so that everyone can own and share big data. Data literacy, also called data information literacy, mainly refers to people's abilities in collecting, organizing and managing, processing and analyzing, sharing, and collaboratively innovating with scientific data, together with the ethics and norms of conduct involved in producing, managing and publishing data. Only by comprehensively improving the data literacy of the whole population can we meet the era of big data with confidence and use big data to create new benefits for humanity.