This paper includes the following four parts:
1. Why is data analysis important?
2. What are the common analysis methods?
3. Some data-driven methods
4. Advanced skills of data analysts
| 0x00 Why is data analysis important?
At the first level, statistics remains the core method of data analysis.
Let's first look at the definition of data analysis: "the process of examining and summarizing data in detail in order to extract useful information and form conclusions". Built on statistical methods, data analysis provides rigorous methods and tools for social science problems. Although the emergence of big data technology has greatly expanded the boundaries of statistical research, it has not changed the basic statistical idea of inferring population characteristics from random samples. The most fundamental statistical methods, such as causal inference, the sufficiency principle and data summarization, have if anything been strengthened by the spread of big data. With its support, many important socio-economic and psychological variables can be constructed, such as residents' happiness and investor sentiment, and the development of real-time technology even makes real-time prediction possible.
At the second level, data analysis guides business development.
To quote the management master Peter Drucker: "If you can't measure it, you can't manage it." Only by finding the key metric of business development, the "North Star Metric", can we optimize the business in a targeted way. As Avinash Kaushik, Google's analytics evangelist, famously put it: "All data in aggregate is crap. Segment or die." Aggregated totals hide many problems; only in-depth analysis of the data reveals the real reason behind a trend and shows how to improve the North Star Metric. As the Internet's demographic dividend gradually disappears, deep understanding and analysis of business data is what sustains high-quality growth.
To sum up, data analysis is still very important. If you want to know what value your work can produce, the knowledge of data analysis is the essential "data sense" for data practitioners.
| 0x01 What are the common analysis methods?
The job of a data analyst requires analyzing and solving problems in an orderly, systematic way, so we need to learn some commonly used analysis methods in order to locate the root of a problem quickly.
Analysis methods fall into two parts: macro strategic analysis and micro data analysis.
Macro-strategic analysis mainly includes:
PEST analysis examines politics, economy, society and technology to assess the macro environment an enterprise faces;
SWOT analysis studies strengths, weaknesses, opportunities and threats to dynamically analyze the internal and external competitive position of an enterprise;
The Five Forces model analyzes the rivalry among existing competitors, the entry ability of potential competitors, the substitution ability of substitutes, the bargaining power of suppliers and the bargaining power of buyers, to shape an enterprise's competitive strategy.
Although macro analysis may seem too grand for daily work, it is genuinely helpful in certain industries, such as insurance, health care, online education, mutual funds and logistics, where policies, regulations and risks must be taken into account.
Next, let's turn to the common micro-level data analysis methods, each introduced with a small case.
The first is hypothesis testing.
Hypothesis testing, also known as statistical hypothesis testing, is a method of statistical inference used to judge whether differences between samples, or between a sample and the population, are caused by sampling error or by essential differences. It has three main steps: 1. propose a hypothesis; 2. collect evidence; 3. draw a conclusion.
Hypothesis testing analyzes the causes of a problem mainly through logical reasoning, so it is often used in attribution analysis.
For example, suppose our North Star Metric has dropped and we need to find the reason. There are three initial possibilities: a user problem, a product problem, or a competitor problem.
From these three aspects, we can put forward three hypotheses:
If it is a user problem, we can analyze it from the business flow diagram or break it down through multi-dimensional analysis;
If it is a product problem, we can study whether recently launched features meet users' needs;
If it is a competitor problem, we can use external market information to investigate whether competing products are running large-scale subsidies and promotions.
After a preliminary conclusion is reached, the analysis usually continues: keep asking why, and keep verifying each candidate reason with data, until the root of the problem is found.
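As a minimal sketch of steps 2 and 3 (collecting evidence and drawing a conclusion), suppose we pursue the "user problem" hypothesis by comparing retention between users exposed to a recent product change and those who were not. The counts below are invented for illustration; a chi-square test on the contingency table tells us whether the observed gap is plausibly just sampling error.

```python
from scipy.stats import chi2_contingency

# Hypothetical evidence: retained vs. churned users in two segments
# (rows: segment, columns: [retained, churned]); numbers are made up.
observed = [
    [4_320, 680],   # segment exposed to the new product change
    [4_610, 390],   # segment not exposed
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")

# Step 3: draw a conclusion at a pre-chosen significance level.
if p_value < 0.05:
    print("Reject H0: the gap is unlikely to be sampling error.")
else:
    print("Cannot reject H0: the gap may just be noise.")
```

A small p-value supports the hypothesis and tells us where to ask the next "why".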
The second is logical tree analysis.
Logical tree analysis is easy to understand: break a complex problem down into several simple sub-problems, then expand them step by step like branches from a trunk. By solving each individual sub-problem and aggregating the answers, we answer the original question.
For example, to analyze the reasons for slow profit growth, a logical tree can split the problem into three dimensions: revenue, cost and gross profit, and then examine the issues in each dimension in turn.
Revenue involves the number of customers, customer quality, payment rate and willingness to pay; cost involves advertising cost, labor cost and promotion strategy; gross profit involves issues such as warehouse allocation and channel quality. Finally, aggregating the sub-questions yields the real reason.
The logical tree has three basic principles:
Elementization: break the problem down into its constituent elements;
Framing: organize all the elements into a framework that is mutually exclusive and collectively exhaustive (MECE);
Relevance: the elements in the framework keep the necessary interrelations, simple but not isolated.
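The profit decomposition above can be written down directly as a small metric tree. A minimal sketch with invented figures, breaking profit into the revenue and cost leaves listed earlier and perturbing each leaf to see which one profit is most sensitive to:

```python
# A flattened logical tree: each leaf is a metric; the formulas below
# are the branch nodes. All figures are hypothetical.
leaves = {
    "customers": 120_000, "pay_rate": 0.045, "arpu": 310.0,            # revenue side
    "ad_cost": 450_000, "labor_cost": 820_000, "promo_cost": 260_000,  # cost side
}

revenue = leaves["customers"] * leaves["pay_rate"] * leaves["arpu"]
cost = leaves["ad_cost"] + leaves["labor_cost"] + leaves["promo_cost"]
profit = revenue - cost
print(f"revenue = {revenue:,.0f}, cost = {cost:,.0f}, profit = {profit:,.0f}")

# To locate the bottleneck, re-run the tree while perturbing one leaf
# at a time and record how much profit moves (crude sensitivity check).
for name, value in leaves.items():
    bumped = dict(leaves, **{name: value * 1.10})  # +10% on this leaf
    r = bumped["customers"] * bumped["pay_rate"] * bumped["arpu"]
    c = bumped["ad_cost"] + bumped["labor_cost"] + bumped["promo_cost"]
    print(f"+10% {name:<11} -> profit moves by {r - c - profit:+,.0f}")
```

The leaf whose perturbation moves profit the most is the branch worth expanding further.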
The third is cohort (group) analysis.
Cohort analysis divides data into different groups according to some characteristic, such as time or interest, and surfaces problems by comparing the differences between groups.
Cohort analysis is very helpful for analyzing different stages of the product life cycle, for example the effect of a newly released version: divide users into cohorts by time, then compare retention rates across cohorts to analyze why users stay or leave.
For example, users of a video platform must pay to become VIPs in order to watch the platform's exclusive series, but they can cancel the subscription at any time; users who unsubscribe in this way are churned users. To analyze the reasons for churn, we can use cohort analysis.
Plot each cohort's data as a line, with time on the horizontal axis and retention rate on the vertical axis, then compare the lines across cohorts. It is usually easy to see that retention differs greatly across periods, for reasons such as the following:
The product recently introduced some new features, but these features do not suit new users;
Marketing recently ran promotional campaigns that brought in new users, but the product offers little value to them, so they churn.
Combined with the hypothesis testing described above, we can analyze the root of the problem further, which gives a fairly fixed analysis recipe: 1. cohort analysis, to find the groups with low retention; 2. hypothesis testing, to ask and verify why their retention is so low.
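A minimal pandas sketch of step 1, under assumed data: group users into weekly signup cohorts, compute each cohort's retention by weeks since signup, and flag the cohorts whose retention is abnormally low, which then feed into the hypothesis-testing step.

```python
import pandas as pd

# Hypothetical event log: one row per (user, signup cohort, active week).
events = pd.DataFrame({
    "user_id":     [1, 1, 2, 2, 3, 4, 4, 5],
    "signup_week": ["W1", "W1", "W1", "W1", "W2", "W2", "W2", "W2"],
    "active_week": [0, 1, 0, 2, 0, 0, 1, 0],  # weeks since signup
})

# Cohort table: rows = signup cohort, columns = weeks since signup,
# values = share of the cohort still active in that week.
cohort_size = events.groupby("signup_week")["user_id"].nunique()
active = (events.groupby(["signup_week", "active_week"])["user_id"]
                .nunique()
                .unstack(fill_value=0))
retention = active.div(cohort_size, axis=0)
print(retention.round(2))

# Flag cohorts whose week-1 retention falls below a chosen threshold;
# these are the groups to investigate with hypothesis testing.
low = retention[1][retention[1] < 0.4]
print("cohorts with low week-1 retention:", list(low.index))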
The combination of different strategies forms our own analysis method.
Of course, there are many methods of data analysis, which need to be summarized and improved bit by bit through daily study and practice.
| 0x02 Some data-driven methods
Data-driven work, in short, means using the data of digital services such as e-commerce and video to analyze the causes of problems and propose optimized solutions, so as to drive business growth or product iteration. It is the key to growth in the Internet industry, a working method data practitioners need to master, and an important measure of a person's ability.
Data-driven work usually consists of the following steps:
Qualitative analysis of the data to find the problem;
Quantitative analysis of the data to determine the scope of impact;
Investigating the common practices of the company, competitors and the industry;
Estimating the effect of solving the problem;
Designing the corresponding experimental mechanism;
Running an AB test to draw an experimental conclusion;
Tracking the metric changes online after the strategy launches.
Of these, the knowledge that data analysis needs to cover is qualitative analysis, quantitative analysis and AB testing; the other steps are usually implemented by the engineering team.
Qualitative analysis studies the "nature" of the research object and its internal laws; quantitative analysis studies its quantities, describing interactions and development trends.
For example, suppose the data reveals a problem of "low order-to-payment conversion" in an e-commerce scenario, and we need to analyze it. Using grouping plus funnel analysis, we find the problem concentrated in certain commodities, and sampling their data suggests the cause is probably misleading prices: that is qualitative analysis. After locating the cause, we sample the commodities and estimate the overall scope of impact by manually assessing the proportion of misleading prices, which is quantitative analysis.
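A minimal sketch of the "grouping + funnel" step, with invented category names and counts: compute the step-to-step conversion for each commodity category so the weak link stands out.

```python
import pandas as pd

# Hypothetical funnel counts per commodity category.
funnel = pd.DataFrame(
    {"view": [50_000, 42_000], "order": [2_400, 2_100], "pay": [1_900, 740]},
    index=["books", "electronics"],
)

# Step-to-step conversion rates: order/view and pay/order.
conv = pd.DataFrame({
    "order_rate": funnel["order"] / funnel["view"],
    "pay_rate":   funnel["pay"] / funnel["order"],
})
print(conv.round(3))
# A pay_rate far below the other categories (here 'electronics') is the
# cue to sample those orders and look for causes such as misleading prices.
```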
Then we design some strategies, and we need to verify their effect on the "low order-to-payment conversion" problem, so we run a controlled experiment.
An AB experiment takes two or more candidate solutions to the same problem and randomly splits the same population into groups; over the same time window, the experiment group and the control group run in parallel, and a small set of identical metrics measures which solution performs better. The premise is a sufficient sample size, which is usually not difficult for digital Internet services.
By comparing the data of the different strategies after the AB experiment, we can see whether our strategy brings the expected positive effect; if it does, it can be launched. After launch, quantitative analysis measures how far the problem has been solved.
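As a minimal sketch of the comparison, assuming the invented counts below: a two-proportion z-test on the pay conversion of control group A and experiment group B. This presumes the sample size is adequate, as noted above.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical results: (payers, users) for control (A) and experiment (B).
pay_a, n_a = 1_850, 24_000
pay_b, n_b = 2_040, 24_100

p_a, p_b = pay_a / n_a, pay_b / n_b
p_pool = (pay_a + pay_b) / (n_a + n_b)           # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                    # two-sided p-value

print(f"A: {p_a:.3%}  B: {p_b:.3%}  z = {z:.2f}  p = {p_value:.4f}")
if p_value < 0.05 and p_b > p_a:
    print("B beats A significantly: candidate for full rollout.")
else:
    print("No significant lift: keep iterating before launch.")
```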
These are some conventional data-driven methods.
| 0xFF Advanced skills of data analysts
Data analysts also need to understand algorithms.
Much like developers, analysts are often divided into "forward" and "backward" roles. The "forward" role follows the business, spotting problems in the business and finding the corresponding optimization points; the "backward" role is more about building capabilities, optimizing algorithms or experimental methods, more like the back end, only smarter.
Although statistics provides us with very good analytical methods, not all problems in the world can be summarized by statistics, and analysts in many directions still need to master algorithms to meet the needs of their work.
Take the most typical example, the "supply-demand matching" problem, where quantitative change brings qualitative change.
In the Internet's development so far, whether in B2C, C2C, B2B or B2B2C, companies have built accurate profiling systems, aimed not only at users but also at suppliers, enabling personalized operations ("a thousand faces for a thousand users") and better supply-demand matching. This mechanism later extended to other areas, such as personalized video recommendation and ride-hailing dispatch, all of which are forms of supply-demand matching.
But how do we recall matches among tens of millions or even billions of commodities, match leads within massive data, determine who our target population is, recommend the information feed to the most suitable people, and measure all these effects? Many schemes must be weighed together; forming rules through statistical data analysis and mining features through algorithms are both paths to the goal.
Because large companies are rich in resources, they usually run both in parallel, which to some extent draws a strict boundary of responsibility between data analysis and data algorithms; in small and medium-sized companies, limited resources may mean the analyst is also the algorithm engineer.
Similarly, in fields such as risk control and knowledge graphs, machine intervention is needed beyond human coverage to optimize the results.
In fact, the growth of a data analyst is more like a marathon: you need to absorb a great deal of knowledge, allocate your time and energy sensibly, and keep reminding yourself what your core goals are, in order to do things well and not fall behind over the long run. Analysis is just a skill; as a lifelong career, we need to stay close to real scenarios and to the company's development, and make sound strategies accordingly.
"To conscience" is the law.
Wang Yangming studied Taoism all his life, and after realizing his conscience in his later