How can we escape the limitations of our thinking?
This article is reproduced from the Man-machine and Cognitive Laboratory of Foshan Institute of Science and Technology; authors: Diao and Yao Zhiying.

While attaching great importance to big data thinking, we should also treat its limitations rationally: the misunderstanding of the all-data model, the anxiety of quantitative thinking, and the excessive worship of correlation. It is necessary to consider the whole while attending to the parts, to integrate quantitative with qualitative research, and to treat causality and correlation as complementary, thereby transcending the limitations of big data thinking.

With the rapid development of a new generation of information technology, especially the widespread adoption of the mobile Internet, big data, cloud computing, and smart wearables, data has grown explosively and human society has entered an era characterized by data: the era of big data. The arrival of a datafied age in which everything is recorded and everything is analyzed is unstoppable [1]. In this big data environment, data has become a "new energy" driving economic and social development and creating greater economic and social benefits. In scientific research, Jim Gray, a Turing Award winner in computer science, proposed a "fourth paradigm" of research: the paradigm based on data-intensive computing. Against this background, "quantify everything" and "let the data speak" have become slogans of the times. People attach more importance to the holistic thinking of "all data instead of samples", pursue the quantitative thinking of "quantification instead of the qualitative", and emphasize the relational thinking of "correlation instead of causality". This undoubtedly has a great impact on traditional thinking, which grasps the relationships between things by pursuing regularity and causality and by sampling. However, everything is a unity of opposites. Amid the current enthusiasm for big data thinking, it is necessary to remain rational, treat the changes it brings dialectically, take its limitations seriously, explore complementary approaches, and better adapt, at the level of thinking, to survival and development in the era of big data.

1 Limitations of big data thinking

1. Misunderstanding of the all-data model

With the popularity of various sensors and intelligent devices, things can be monitored in real time and their data collected and transmitted, so the data obtained about things is no longer sample data but all of the data. This is called the "all-data model". On the basis of the all-data model, we can analyze and grasp the characteristics and attributes of things more comprehensively, which is also conducive to more objective and scientific decision-making. However, some scholars have pointed out that for the all-data model, "N = all" is often an assumption about the data rather than a reality. Therefore, while pursuing sufficient data, the model itself deserves scrutiny.

First, we are increasingly caught in the contradiction between explosive data growth and lagging technology. In the big data environment, data changes rapidly rather than standing still. According to IBM's estimate, about 2.5 × 10¹⁸ bytes of new data are generated every day. If each byte were a cubic meter of water, a single day's data would exceed the Earth's total water storage of about 1.42 × 10¹⁸ m³; the rate of data growth is astonishing. Even though data technology has improved rapidly, it still lags behind the growth of data. "Even if we do collect all the data and analyze it with current technology, we can only grasp relationships between points, or local correlations; this does not mean that the general laws and trends of things can be derived." This shows that the relative lag of technology hinders the realization of the all-data model.
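The following minimal Python sketch only restates the arithmetic of the analogy quoted above (the figures are as cited in the text, not independently verified), to make the scale comparison concrete:

```python
# Sanity check of the cited analogy: bytes of new data per day vs. Earth's water volume.

daily_bytes = 2.5e18          # cited IBM estimate: new bytes generated per day
earth_water_m3 = 1.42e18      # cited total water storage of the Earth, in cubic meters

# If every byte were one cubic meter of water, one day's data would already
# exceed the planet's total water volume.
ratio = daily_bytes / earth_water_m3
print(f"One day's data is {ratio:.2f}x the Earth's water volume under this analogy.")

# How long, at this constant rate, until cumulative data equals ten such volumes.
days = 10 * earth_water_m3 / daily_bytes
print(f"About {days:.1f} days of data would amount to ten 'Earths of water'.")
```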

Second, the objective existence of "data islands" limits the realization of the all-data model. An important premise of the all-data model is open data and data sharing. As the value of data has become well known to enterprises and governments, data opening and sharing have achieved certain results, but so far the circulation channels of data resources have not been fully opened, and the problem of "data islands" still exists to some extent. This is manifested mainly in three ways. First, there is no real cross-industry data flow. Having recognized the potential value of data, enterprises and governments can quickly move data resources between or within departments to facilitate the operation of their organizations; however, driven by the interests of the various data owners, data between and within departments has not really flowed freely, which has become another urgent problem of the "data island". Second, the rise of the data trading market has, to some extent, intensified the formation of "data islands". Driven by profit, emerging enterprises whose business model is selling data will inevitably tighten the confidentiality of the data they collect, and this mentality and behavior makes the "data island" problem more prominent. Third, the slow pace at which enterprises integrate data, compared with the fast pace at which data is updated, also makes the "data island" problem prominent. Because the development of technology cannot keep up with the growth of data and integration is slow, the coexistence of old and new data will "blind" people's vision and lead to "data islands" at a new level. Therefore, the so-called "all-data model" may remain an ideal state that we hope for, a new "utopia" constructed by the development of data technology, a projection of the information society: the shadows in Plato's cave.

Finally, the key value of big data lies not in being "big" or "complete" but in being "useful". Pursuing the all-data model creates the illusion that as long as all the data can be obtained, more data value can be mined. At present, most of the data whose value can be mined is structured data that computers can recognize, but in the data world as a whole, much valuable data is unstructured, document-based data that is not recognized. In 2014, unstructured data accounted for more than 80% of all data, and in 2015 for more than 85%; at the same time, unstructured data is growing more than twice as fast as structured data. As a result, some unstructured data, because it cannot be recognized, is treated as "data garbage" and finally discarded. In this way, realizing the so-called "all-data model" becomes even more difficult.

2. The anxiety of quantitative thinking

In the era of big data, all phenomena and behavioral changes in nature and human society are digitized, and it becomes possible to "quantify everything". At the same time, we should pay attention to several problems with quantitative thinking.

(1) Defects of ontology and method

In today's era of big data, all human activities leave data traces, the whole world is gradually evolving into a digital world, and a data worldview is increasingly prominent. Under the guidance of this data worldview, "quantifying everything" has become the methodology of the era of big data. Philosophers have also begun to reflect on the relationship between data and the world, and some have even concluded that the origin of the world is data. But has data really become the ontology of the world? We believe this idea stems mainly from a deviation in understanding the nature of data, and it requires careful consideration.

First, the data sources of big data are mainly people's conscious or unconscious behavior in social life. In other words, big data is a quantitative reflection of the objective existence of perceptual activities in people's social life, and "quantifying everything" is an idealized way of understanding things in the era of big data. In essence, therefore, the source of data is the objective material world; without the material world, data would be "water without a source, a tree without roots".

Second, the main purpose of "quantifying everything" is to collect, transmit, store and analyze the data generated by people's past perceptual and objective activities, so as to intervene in and guide people's behavior. Its main function is to improve the objectivity and scientific basis of prediction and to give full play to people's subjective initiative and creativity. But this idealized method of "quantifying everything" only recognizes that "data is a static record of human social life" while ignoring the objective fact that "human social life is dynamic data". It treats the whole of human social life as a static, lifeless data set, ignoring the fact that many phenomena in nature and human society are rapidly changing and complicated.

(2) Personal behavior is "chosen"

Quantitative prediction makes individual behavior "selected". Quantitative analysis of people's behavior, attitudes and personality based on big data technology can predict and help people find a supposedly suitable partner for marriage or love, but we may also ask: is the partner found by the system really the most suitable for the individual? If we make this choice according to the quantitative analysis of data, should we abandon our personal intuition and feelings? Do we give up the right to choose, or follow the system and let ourselves be "chosen"? From another point of view, this concerns the relationship between sensibility and rationality: perceptual factors such as feeling and inspiration are all there is at the beginning of human life, and they constitute human beings' most instinctive intuition about nature and society. Rationality develops gradually on the basis of sensibility. People attach more importance to rationality mainly because its clear and strict logic makes it easy to master, while sensibility is easily ignored because of its uncertainty. But for precisely this reason rationality is limited, while sensibility, because of the uncertainty of its expression, extends without limit and can respond intuitively and instinctively to a constantly changing and developing world. We have doubts about finding a supposedly suitable love or marriage partner through big data analysis because, just as the human brain cannot be replaced by a computer, sensibility cannot be replaced by reason.

The partner identified by big data analysis and prediction may be a good choice, but it is not necessarily the most appropriate or the best one. Such prediction has in fact already had a certain impact on individuals' freedom of choice.

(3) The intensification of data dictatorship

Quantitative forecasting intensifies the "data dictatorship". The core of data thinking is quantification, or "speaking with data". Successful predictions made by quantitative analysis further deepen people's dependence on data assets; the oft-cited Walmart "beer and diapers" story is one such piece of evidence. Nowadays enterprises and governments attach ever more importance to data, especially in decision-making, as if an argument lacking data carries much less persuasive force. But if the government bases every decision on data alone, the consequences can be the opposite of what is intended. For example, GDP growth is 6% this year and was 6.3% last year, 0.3 percentage points lower. Can we conclude that this year's economy must be worse than last year's? Obviously, an assessment based only on this figure is not objective. Yevgeny Morozov, an Internet philosopher, has sharply criticized the ideology behind many "big data" applications and warned that a "data tyranny" is approaching. "Words have no meaning in themselves; meaning comes from context", so data analysis and prediction must be related to the corresponding scenario, otherwise "ambiguity" will appear.

(4) Prying into privacy and moral questioning

"Quantifying everything" further exposes personal privacy, and quantitative prediction is sometimes against morality and ethics. First, personal privacy is exposed to the sun. The application of smart devices such as wearable tools and smart chips can monitor everyone's behavior in real time. Under the supervision of the "third eye", we are naked and become "invisible people". For example, various medical sensors can monitor the physiological changes of individuals in real time. Secondly, the leakage of digital privacy has deepened social discrimination. With the digitalization of personal behavior, under the guidance of data interests, it is easy to leak privacy, which will also deepen the degree of social discrimination. For example, if a hospital leaks personal medical data and shows that someone has HIV, people will look at this person with colored glasses, resulting in psychological imbalance, blocked life, difficult employment and so on. In addition to the violation of individual human rights, the degree of social discrimination has further deepened. Finally, big data prediction sometimes goes against human morality. As we all know, Target has a project analysis, which is based on the data analysis of personal browsing and purchasing pregnant women's products. It can predict the time when a girl is pregnant in advance, and give the relevant pregnancy product coupons to the girl, but the girl's father didn't know it, and scolded the manager after learning it. There are two questions behind this matter worth pondering: First, how did the company learn that the girl was pregnant? How is personal privacy leaked? On the other hand, our privacy is voyeurism, which is obtained without personal knowledge and consent, which not only makes individuals panic, but also violates the law. Second, the father, as the closest person to the girl, hasn't learned about it yet, but the company has learned about it and pushed the coupon first. Is this disrespectful to others? Is it against morality and ethics? The related ethical issues deserve reflection.

3. Excessive worship of correlation

The core of big data thinking is correlational thinking, but correlational thinking also gives rise to excessive worship in practice. There are several reasons why people idolize it. First, the sheer volume of data makes it impossible to dig truly valuable things directly out of the messy mass of data, so people can only obtain the relatedness between things through statistical correlation analysis and then mine the real "knowledge" behind it. Second, in a highly complex and uncertain era it is harder to explore the causal relationships between things. Complexity science tells us that complexity is universal, which requires us to view the world with complexity thinking and to study human society as a whole; correlational thinking grasps the relatedness of things from a macro perspective, which intensifies people's worship of it. Finally, in a rapidly changing environment, correlation analysis better fits the logic of business management: attend to the form, do not seek the reason. Actual business activity pursues the maximum profit at the lowest cost in the shortest time, which further intensifies enterprises' excessive worship of correlational thinking. "The essence of big data is statistical correlation. Phenomenologically it is consistent with the statistical laws of classical science, which is where the two are the same, or are confused." [2]

However, when using correlation analysis we should pay attention to two problems. First, the key to correlation analysis is finding the "related objects". As data volume grows, the breadth and depth of data also expand, and meaningless redundancy and junk data increase, bringing more noise in which truly valuable data is submerged. How to find "correlations" amid this noise is an important problem for big data analysis. Second, the objective existence of spurious correlation is a difficulty for big data analysis. Statistical correlations come in many kinds: positive and negative, strong and weak, genuine and spurious. Spurious correlations lead to wrong analytical results and can bring serious consequences; the repeated prediction errors of the Google Flu Trends system confirmed this. How to identify spurious and other misleading correlations is a difficulty that big data analysis has yet to overcome.

Seeking the causal relationships of things is a long-standing mindset and habit of human beings, and a necessary way to grasp their inherent nature. Reichenbach, the famous philosopher of science, said: "Without causality, there is no correlation." We must prevent blind worship of correlational thinking, break through the limitations of big data thinking, and use complementary thinking to transcend those limitations.
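To illustrate why spurious correlation is a genuine difficulty, and not merely a philosophical worry, the following small Python sketch (an illustration added here, not from the article) computes the Pearson correlation of two independent random walks; such series often show a large correlation even though neither causes the other:

```python
# Spurious correlation demo: two independent random walks.
import random

def random_walk(n, seed):
    """Generate a simple Gaussian random walk of length n."""
    random.seed(seed)
    x, path = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        path.append(x)
    return path

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

a = random_walk(1000, seed=1)
b = random_walk(1000, seed=2)
print(f"Correlation of two unrelated random walks: {pearson(a, b):.2f}")
# The value is frequently far from zero, so a high correlation alone cannot
# establish any causal, or even stable statistical, relationship.
```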

2 Transcending big data thinking through complementarity

1. Considering the whole while attending to the parts

As philosophical categories that mark the divisibility and unity of objective things, whole and part have important epistemological significance. From the perspective of methodology, the "all-data model" focuses on grasping things holistically rather than through reductive methods. Therefore, to overcome the limitations of the "all-data model", we must focus on the whole and grasp things systematically, while also attending to the parts to deepen understanding, realizing the unity of the holistic method and the reductive method.

First, focus on the whole and grasp things systematically. Classical systems theory holds that we should regard a thing as an organic whole and grasp its overall characteristics and functions. Complexity science further holds that the world is complex and changeable, which requires a global vision and grasping complex objects as a whole. In the era of big data, what we have to do is to take all the data as a whole, use machines and modeling to find the correlations within the data and identify the "related objects", grasp the overall attributes of things reflected behind the data, then further analyze the structure and relationships among the internal elements of things, dig into the causal relationships between those elements, and thus understand things concretely and comprehensively.
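One possible minimal sketch of the workflow just described, written in Python on synthetic data (the variable names and the 0.5 threshold are hypothetical, chosen only for illustration): screen all variable pairs for strong correlation to find "related objects", then treat the shortlist as candidates for deeper causal analysis rather than as conclusions.

```python
# Correlation screening to shortlist "related objects" for later causal analysis.
import numpy as np

rng = np.random.default_rng(0)
n = 500
temperature = rng.normal(25, 5, n)
ice_cream_sales = 10 * temperature + rng.normal(0, 20, n)   # driven by temperature
sunscreen_sales = 8 * temperature + rng.normal(0, 25, n)    # also driven by temperature
lottery_tickets = rng.normal(100, 15, n)                    # unrelated noise

data = {
    "temperature": temperature,
    "ice_cream_sales": ice_cream_sales,
    "sunscreen_sales": sunscreen_sales,
    "lottery_tickets": lottery_tickets,
}

names = list(data)
candidates = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = np.corrcoef(data[names[i]], data[names[j]])[0, 1]
        if abs(r) > 0.5:                  # arbitrary screening threshold
            candidates.append((names[i], names[j], round(r, 2)))

# Screening will also pair ice_cream_sales with sunscreen_sales, although neither
# causes the other (they share a driver); the causal step still has to follow.
print(candidates)
```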

Second, attend to the parts to deepen understanding. Traditional reductionism holds that things can be divided into parts and that an understanding of the whole can be achieved by understanding and integrating each part. Although traditional reductionism has the defect of ignoring the interconnections and interactions among the parts of a thing, this does not mean that reductionism is useless; its reductive method has not eliminated people's overall understanding of things. As a research strategy, the idea of reductionism is mainly embodied in layer-by-layer analysis. Therefore, in an era of complexity, the key to using the reductive method is to recognize the level to which a thing should be reduced.

In the era of big data, the sheer volume and complex structure of data make it difficult to find causal relationships directly in the data. We therefore take the data as a whole and grasp its correlations; but what is the overall essence that the data materializes? To answer this we must further analyze the causal logic among the internal factors, which in essence uses the reductive method. In this sense, the exploration of causal logic is a concrete embodiment of the reductive method, though one that differs from the traditional form. Therefore, "the complex relationship between the reductive method and the holistic method should, in the final analysis, be complementary." The development of modern science also shows that "reductionism alone will not do, and reductionism by itself is not enough; without holism it will not do, and holism by itself is not enough... the scientific attitude is to combine reductionism with holism." Only by fully understanding the dialectical relationship between whole and part, and the complex relationship between the holistic and reductive methods, can we use this tool to understand and transform the world.

2. Integrating the quantitative and the qualitative

The purpose of quantitative research is to answer questions about the quantitative attributes of things and their movement, while the purpose of qualitative research is to study in depth the specific characteristics or behavior of an object and to further explore its causes. In terms of content, qualitative and quantitative research should be unified and complementary: qualitative research lays the foundation for quantitative research and is its basis; quantitative research is the concretization of qualitative research, making it more scientific and accurate and thus supporting broader and deeper conclusions. Each has its own advantage in analyzing problems from a different angle, and precisely because of this they can together achieve a more comprehensive understanding of things. Therefore, in scientific research we should combine the two, learn from each other's strengths, and realize the greatest effect.

First, grasping the whole quantitatively is the basis of qualitative research. In the big data environment, the important role of "quantifying everything" rests mainly on three grounds. First, massive data makes "quantifying everything" possible: with the application of various intelligent devices, both people's physical world and their virtual world can be quantified. Through data analysis of the perceived objects, we can discover the correlations between data from the degree of relatedness shown by correlation coefficients, grasp the relatedness between data, and determine the quantitative connections that the data materializes. Second, "quantifying everything" helps us grasp things in their overall quantity. Through quantitative analysis we gain a general understanding of the wholeness of things in quantitative terms; this is not the abstract, general understanding of things in the sense of qualitative research but a concrete understanding of specific things, that is, the construction of a brand-new overall picture. Third, the essence of big data itself is a collection of quantitative relationships, which has practical guiding significance. Albert-László Barabási has pointed out: "93% of human behavior is predictable, but in the past we did not have the relevant data and there was no way to explore human behavior." Quantitative research therefore plays an important role in grasping the trends of correlation between things.

Second, qualitative causal research creates new connections and meets new demands. Although the quantitative analysis of big data enables us to grasp the relatedness of things as a whole, it cannot clarify the causal relationships between them. Causality is the relationship between the interactions among elements and their effects. Therefore, on the basis of grasping related things in the quantitative dimension, we should study in depth the structure and combination of internal factors, explore their causal relationships, alter the interactions between factors, and, in light of the needs of human development, create results that meet human needs. Conversely, new causal relationships arising from the causal logic among internal factors can be further examined or tested in quantitative research. In this way, quantitative research provides qualitative research with the overall quantitative attributes and general structure of the perceived object. On this basis, qualitative research examines the interactions among elements in depth and draws representative conclusions, which are then returned to quantitative research for empirical testing on the whole body of data, realizing the complementarity of quantitative and qualitative research.

3. Treating causality and correlation as complementary

In the context of the era of big data, Schoenberg proposed that it is enough to "know what" and there is no need to "know why". Since then, people have paid more attention to correlation than to causality. However, while the whole of human society actively attends to correlation, it is necessary to reflect on and re-evaluate the importance and influence of causality. We cannot help asking: first, does causality exist in the world ontologically? Second, what is the relationship between correlation and causality? Third, how can the two complement each other in scientific research?

Regarding the ontology of causality, we believe that causality exists objectively. Causal thinking is a long-standing thinking habit of human beings and the logical premise of our understanding of the nature of the world. In modern times, the research results of the natural sciences and of the humanities and social sciences rest on strict mathematical and logical reasoning about causality, and the central task of natural science is to reveal the causal relationships between things.

Regarding the relationship between causality and correlation, some scholars believe it reflects the relationship between science and technology in the context of the big data era. Science is knowledge that explores causality, that is, causal laws, while technology consists of the methods and skills for solving problems; the two have different emphases but are not antagonistic. Just as technology solves "how to do it" and science answers "why", correlation can guide us in practice on "how to do it" and causality can answer "why to do it". Even if the era of big data pays more attention to correlation, it can never be separated from the pursuit of causality; this is determined by the nature of thinking. Attention to correlation analysis does not deny causal analysis, nor does it mean that causality is unimportant; rather, it is conducive to deeper causal analysis, because the two are not mutually exclusive but coexist, and they can complement each other in scientific research.

First, correlation is the basis of causal research. In the era of big data, correlation analysis based on massive data allows us to find the correlations of things quickly, conveniently and accurately, and then to explore the causal relationships behind those correlations and grasp the essence of things. As Schoenberg said: "By finding things that may be related to one another, we can carry out further causal analysis; if there is a causal relationship, we can then find the reasons." In the process of finding correlations there is, in fact, already an analysis of causality.

Second, causality is the inherent regulation and goal of correlation. In scientific research, what we seek is not only to know the "what" of correlation but, more importantly, to find the causal relationships between things, so that the scientific theories built on them can stand the test of practice. In this sense, causality is the inherent and essential regulation of correlations in the era of big data, and also the goal pursued behind them, playing a decisive role. What we need to do is to take causal thinking as the foundation of research and correlational thinking as its orientation; the two complement each other in tapping the value contained in big data and realizing the transcendence of big data thinking.
