Science Network - Statistical Thinking as I Understand It - Wang Wei's Blog

In his 1907 autobiography, the famous American novelist Mark Twain quoted a remark he attributed to former British Prime Minister Benjamin Disraeli:

There are three kinds of lies: lies, damned lies, and statistics.

Thanks to Mark Twain's great popularity, the line circulated widely after he published it.

We all spend many years studying mathematics. One reason, of course, is that we use some mathematics in life and work; in that sense, mathematics can be regarded as a tool. People who are proficient in mathematics tend to reason rigorously and calculate accurately. What about statistics?

On the one hand, statistics is becoming more and more important. When making decisions, people insist on having statistical data and treat it like a talisman. On the other hand, there are people like Mark Twain who scoff at statistics. Even in academia, many people regard statistics as merely a branch of mathematics, while many statisticians repeatedly stress that statistics and mathematics are completely different.

It is perhaps easier to sense what it means to have a head for business, a flair for literature, or a cultivated ear for music. But what is a statistical mind? A flair for statistics? Statistical literacy? These are not easy to pin down. This essay attempts to address these questions by examining the statistical way of thinking.

1. Understanding the importance of statistical thinking

Let's start with an example. In November 1985, the American scholar Gary Taylor found a poem in the library of Oxford University in England (let's call it the "Taylor poem"), which set off a war of words among British and American Shakespeare scholars. The focus of the debate was whether Shakespeare wrote the poem.

Many experts believed that the Taylor poem differed from Shakespeare's other works in wording, sentence structure, and style. Two months into the dispute, on January 24, 1986, Science published an article entitled "Shakespeare's New Poem: An Ode to Statistics", which described how two statisticians, Efron and Thisted, used statistical methods to judge whether the Taylor poem was written by Shakespeare.

Efron and Thisted's method works as follows. Everyone has his own habits of word usage, and authors differ most in how they use rare words. Shakespeare's known works contain 884,647 words in total, of which 31,534 are distinct. Of these distinct words, 14,376 appear only once in the entire canon, 4,343 appear exactly twice, and so on for higher counts. The words that appear least often across the canon are Shakespeare's rare words. From these data, and assuming that the 429-word Taylor poem was written by Shakespeare, they estimated how many of its words would never have appeared in the canon (new words), how many would have appeared exactly once, twice, and so on up to 99 times, and they gave an estimate for every one of these counts. The actual counts agreed closely with the estimates.
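The core arithmetic can be sketched in Python. This is a simplified illustration, assuming the Poisson word-frequency model behind Thisted and Efron's analysis: if each word's count in the canon and its count in a new text are independent Poisson variables, the expected number of occurrences in the new text of words used exactly x times in the canon is t * (x + 1) * n[x + 1], where t = 429 / 884,647 is the poem's length relative to the canon and n[k] is the number of distinct canon words used exactly k times. The function name `expected_poem_counts` is hypothetical, and only n[1] and n[2] are taken from the text above; the full analysis used the complete frequency table.

```python
def expected_poem_counts(canon_words, poem_words, freq_of_freq):
    """Expected occurrences in a new text of words used x times in the canon.

    Simplified Poisson-model sketch: the estimate for class x is
    t * (x + 1) * n_{x+1}, where t = poem_words / canon_words and
    n_k = freq_of_freq[k] is the number of distinct canon words used k times.
    """
    t = poem_words / canon_words  # relative size of the new text
    # Each known n_k yields the estimate for canon-frequency class x = k - 1.
    return {k - 1: t * k * n_k for k, n_k in sorted(freq_of_freq.items())}


# Figures quoted above: 884,647 words in the canon, a 429-word poem,
# 14,376 words used once and 4,343 words used twice.
estimates = expected_poem_counts(884_647, 429, {1: 14_376, 2: 4_343})
for x, value in estimates.items():
    print(f"words used {x} times in the canon: about {value:.2f} expected in the poem")
```

With these figures the model predicts about 6.97 occurrences of words never used in the canon and about 4.21 occurrences of words Shakespeare used only once; comparing predictions like these with the poem's actual counts is the kind of check described above.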

That was not enough, though. Might all poets of the same era share similar vocabulary habits? So the two statisticians took one poem each from three poets who were Shakespeare's contemporaries and applied the same analysis to those poems, to four poems known to be Shakespeare's, and to the Taylor poem. Under three statistical tests, if the three contemporaries' poems were assumed to be Shakespeare's, the observed and expected frequencies of rare words did not match. The four Shakespeare poems, although occasionally inconsistent, were on the whole acceptable. Efron and Thisted said that their analysis could not conclusively prove that Shakespeare wrote the Taylor poem, but that it was truly remarkable how consistent its use of uncommon words was with the whole of Shakespeare's works.

Once the statisticians weighed in, the literary controversy quickly subsided. No wonder the article paid tribute to statistics. Making decisions with statistical methods reflects an objective and reasonable way of thinking: an objective statistical method is a better judge than a debate driven by subjective impressions. But how objective is objective enough? Efron and Thisted not only tested the Taylor poem but also compared it with several of Shakespeare's contemporaries. Why was that extra step needed? If the poets of Shakespeare's time had all followed a fashion of using the same uncommon words, the test alone would have been worthless as evidence.

Statistics, like our thinking, must first of all be objective; otherwise it is self-deception. Conversely, if our thinking is statistical, it will be highly objective.

In 2013, William J. Sutherland, a professor at the University of Cambridge in the United Kingdom, published an article in Nature entitled "Twenty Tips for Interpreting Scientific Claims". On reading it, I found that every tip it lists is related to statistical thinking.

Statistics is one of the most important tools of modern scientific research. The famous British biologist Francis Galton once said: "Statistics has an extraordinary ability to deal with complex problems. When scientific explorers struggle to make progress, only statistics can help them cut a path." When scientific conclusions are used to inform real-world decisions, we need sound statistical thinking to keep a clear view of those conclusions and to interpret the scientific truth behind them more accurately.

In the era of big data, we have moved from information scarcity to information overload, and the crisis of scarcity has given way to the difficulty of screening information. In this context, the scientific method has become a compulsory course for everyone. As we depend more and more on data, only by establishing correct statistical thinking can we process and analyze data effectively. As the world enters an era of big data and information explosion, and statistics becomes ever more important, the prediction of the British science fiction writer H. G. Wells is being borne out: "Statistical thinking will one day become an essential ability for an efficient citizen, just like reading and writing."

Statistics is widely used across disciplines, from the natural sciences to the humanities and social sciences, and even in decision-making in industry, commerce, and government. As a tool for understanding nature and society, statistics studies the quantitative relationships of objective phenomena and helps decision makers understand the role of research evidence in decision-making. As Fisher, the founder of modern statistics, said: "What uniquely drove human progress in the 20th century was statistics. The ubiquity of statistics and its use in exploring new fields of knowledge far exceed any technological or scientific invention of the 20th century."

Ma Yinchu once said: "Scholars cannot do research without statistics, industrialists cannot run their businesses without statistics, and politicians cannot govern without statistics." Statistical thinking is the way of thinking involved in obtaining data, extracting information from data, and demonstrating the reliability of conclusions, and it plays a major role in improving human understanding. It is irreplaceable in scientific investigations that solve the mysteries of nature, in identifying the authors of anonymous early literary works, in dating archaeological relics, in resolving court disputes, and in making optimal decisions.

Statistics is knowledge that leads from experience to reason; it is the science of discovering regularities in chance events. It is not only a method or a technique but also contains elements of a worldview: a way of looking at all things in the world. This is what people usually mean when they speak of seeing things "from a statistical point of view". Cultivating statistical thinking requires not only learning specific knowledge but also linking that knowledge, from a developmental perspective, into an organic and clear picture, and gaining a sense of history. As the German scholar August Ludwig von Schlözer once put it, "History is statistics in motion, and statistics is history at rest."

From a statistical point of view, the knowledge people gain from experience or experiment contains uncertainty, and statistics focuses on measuring that uncertainty. Once uncertainty can be measured, knowledge expands and our understanding of the world takes a leap forward. This process repeats itself throughout the accumulation of human knowledge. No wonder someone concluded:

In the final analysis, all knowledge is history: what we have now is a summary and deduction of what we found in the past;

In the abstract sense, all science is mathematics: all knowledge can be reduced to mathematical reasoning and computation;

In terms of its rational basis, every judgment is statistics: every judgment summarizes past regularities, that is, it infers future trends from a probability model of past data.

2. What is statistical thinking, and what are its common modes?

First, let's look at what statistics does.

Discovering regularity in randomness is the basic idea of statistics, and the source of its charm.

In short, statistics is built on two core concepts: probability and error.

Most of what we learn in middle school concerns certainty. One is one, and there is no room for error. Once a proposition is proved, it holds forever, without exception, unless you can find a flaw in the proof. In statistics, randomness is everywhere. Statistics allows for mistakes; if there were none at all, people would suspect the data were faked. Statistics also gives guarantees, but only in probabilistic form: the guaranteed probability is less than 100%, so some error remains. Statistics is full of uncertainty. For example, claiming that 95% of containers of a beverage hold between 425 ml and 431 ml is a typical statistical guarantee. Statistics represents a way of looking at the world.
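To make the beverage guarantee concrete, here is a small Python sketch. It assumes, purely for illustration, that fill volumes are normally distributed and centered on the middle of the stated range (428 ml); neither assumption comes from the text. Under that model, a 95% range of 425 ml to 431 ml implies a standard deviation of about 3 / 1.96, roughly 1.53 ml, and the sketch checks the guarantee both exactly and by simulation.

```python
import random
from statistics import NormalDist

LOW, HIGH, COVERAGE = 425.0, 431.0, 0.95

# Assumption: volumes ~ Normal(mu, sigma), centred on the middle of the range.
mu = (LOW + HIGH) / 2
z = NormalDist().inv_cdf(0.5 + COVERAGE / 2)  # about 1.96 for 95%
sigma = (HIGH - mu) / z
model = NormalDist(mu, sigma)

exact = model.cdf(HIGH) - model.cdf(LOW)  # 0.95 by construction

# Simulate a production run: roughly 5% of containers fall outside the range,
# which is exactly the error the statistical guarantee allows for.
rng = random.Random(2024)
volumes = [rng.gauss(mu, sigma) for _ in range(100_000)]
inside = sum(LOW <= v <= HIGH for v in volumes) / len(volumes)

print(f"sigma = {sigma:.2f} ml, exact coverage = {exact:.3f}, simulated = {inside:.3f}")
```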

In a random world, the truth is often hard to know. Everything is a hypothesis, and it comes down to which one you are willing to accept. Acceptance here is like the bride nodding and saying "I do" at her wedding: it does not mean the groom is truly the best man for her, only that she is willing to accept him for now. Likewise, in statistics, accepting a hypothesis does not mean it is true, and rejecting it does not mean it is false. Statisticians always state their judgments together with an error; this is statistical inference under an allowable error.

Probability and error are the two pillars of statistical thinking, and together they capture almost all of its key points.

Statistical methods correspond in certain ways to how people think. Below are the common modes of thinking in statistics.

(1) Be good at using data.

"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay." So says Sherlock Holmes in one of Conan Doyle's famous stories.

Without compass and square you cannot draw circles and squares, without clay you cannot build a brick wall, and without data you cannot make decisions.

Holmes could infer from clues at a crime scene that a suspect was probably left-handed or had walked through an orchard. Fortune tellers also rely on information: having collected many different faces and birth charts (the "eight characters") along with how those people's lives turned out, and having read a great many people, they naturally find it easy to forecast someone's future from his face. Don't people who see through human nature also read widely? Decisions require data, and every piece of data may be useful information. A statistician who wants to hone his skills must make good use of information. To a statistician, data is like rice to a mouse.

(2) Be good at capturing uncertainty.

The universe runs on an interplay of necessity and randomness. For example, we know that Halley's Comet approaches the Earth roughly every 76 years (necessity). Yet even though we can foresee that event decades ahead, we cannot be sure whether it will rain tomorrow (randomness). Or release a coin from your hand: as we learned in middle school physics, if air resistance is ignored, the time the coin takes to land from a fixed height is fixed. But which side will land face up? That is unpredictable. This is uncertainty.
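The contrast in the coin example fits in a few lines of Python. This is an illustrative sketch assuming the idealized free-fall formula t = sqrt(2h / g) with g = 9.81 m/s^2 and no air resistance: the landing time is computed exactly, while which side lands up is modelled as a fair random choice. The helper names `fall_time` and `landing_side` are hypothetical.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2 (standard approximation)


def fall_time(height_m: float) -> float:
    """Deterministic part: seconds to fall height_m metres, ignoring air resistance."""
    return math.sqrt(2 * height_m / G)


def landing_side(rng: random.Random) -> str:
    """Random part: modelled here as a fair coin, each side with probability 1/2."""
    return rng.choice(["heads", "tails"])


rng = random.Random()
for _ in range(3):
    # The fall time is identical on every drop; the side that lands up is not.
    print(f"{fall_time(1.0):.3f} s, {landing_side(rng)}")
```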

People generally have some idea of what will happen in the future and how, but can never grasp it fully. In our world, necessity makes people willing to prepare in advance, while uncertainty fills them with hope or fear about the future. A purely deterministic world, without change, would offer nothing to hope for and would sap people's motivation to work hard; a world ruled by luck alone would erode their resolve to act positively and seriously. As the saying goes, life is three parts fate, five parts effort, and two parts luck. This is the creator's grand design.

Because uncertainty exists, what we can do is understand it and, where possible, reduce it. Our predecessors therefore distilled some laws of the random world for dealing with uncertainty, such as the law of large numbers and, another important one, the central limit theorem.
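Both laws can be seen in a short simulation. This Python sketch is an illustration, not taken from the original: by the law of large numbers, the average of many fair-die rolls settles near 3.5; by the central limit theorem, the averages of repeated batches of rolls are approximately normally distributed, so roughly 95% of them fall within 1.96 standard errors of 3.5. The seed, batch size, and number of batches are arbitrary choices.

```python
import math
import random

rng = random.Random(42)


def roll() -> int:
    """One roll of a fair six-sided die."""
    return rng.randint(1, 6)


# Law of large numbers: the average of many rolls settles near the expected value 3.5.
N_ROLLS = 100_000
long_run_average = sum(roll() for _ in range(N_ROLLS)) / N_ROLLS

# Central limit theorem: averages of batches of 50 rolls are roughly normal with
# mean 3.5 and standard error sqrt(35/12) / sqrt(50), since one roll has variance 35/12.
BATCH, BATCHES = 50, 5_000
se = math.sqrt(35 / 12) / math.sqrt(BATCH)
batch_means = [sum(roll() for _ in range(BATCH)) / BATCH for _ in range(BATCHES)]
share_within = sum(abs(m - 3.5) <= 1.96 * se for m in batch_means) / BATCHES

print(f"long-run average of {N_ROLLS:,} rolls: {long_run_average:.3f}")
print(f"share of batch means within 1.96 SE of 3.5: {share_within:.3f}")
```

Because a batch total can only take whole-number values, the share lands slightly below 0.95 rather than exactly on it; the normal curve is an approximation, not an identity.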

Prediction and estimation in statistics are essentially generalization: inferring much from little is the statistician's craft.

(3) Think in terms of probability, and trust it.

The mathematician Pierre-Simon Laplace once said, "The most important questions of life are, for the most part, really only problems of probability." In a random world, everyone talks about probability, yet few people really understand what it means.

What does probability mean? When we roll dice or draw lots, we usually explain probability as "equal likelihood": a die has six faces, and each face is taken to have probability 1/6. This interpretation works well in daily life: when there is no other information, we usually assume that every possible outcome is equally likely.

The second way is to interpret probability as relative frequency. For example, if a professional basketball player has made 52.7% of his shots in the past, we take the probability that his next shot goes in to be about 0.527. This common interpretation is objective; its theoretical basis is the law of large numbers, and it applies to phenomena that can be observed repeatedly.
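The first two interpretations can be compared directly. In this Python sketch (an illustration, not from the source), the classical probability of rolling a six is the exact fraction 1/6, and the relative frequency of sixes in simulated rolls drifts toward it as the number of rolls grows, which is the law of large numbers at work. The seed and roll counts are arbitrary choices.

```python
import random
from fractions import Fraction

# Classical ("equally likely") probability of rolling a six.
classical = Fraction(1, 6)

rng = random.Random(7)
for n in (60, 600, 6_000, 60_000, 600_000):
    sixes = sum(rng.randint(1, 6) == 6 for _ in range(n))
    relative_frequency = sixes / n  # frequency interpretation
    print(f"{n:>7} rolls: relative frequency {relative_frequency:.4f} "
          f"vs classical {float(classical):.4f}")
```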

The last way is subjective probability. The probability that Brazil wins the World Cup, or that you succeed in courting a girl, is a subjective probability: such events cannot be observed repeatedly and happen only once.

These three interpretations of probability are sometimes used interchangeably, or used to check one another.

Then there are small-probability events. Something you at first thought impossible will happen if you observe enough times; some call this the law of truly large numbers. When a small probability meets a large sample, the outcome should not be too surprising. In a random world, trust probability rather than defy it.
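The arithmetic behind this is short. Assuming independent trials, each with the same small probability p, the chance that the event happens at least once in n trials is 1 - (1 - p)^n. The Python sketch below uses a one-in-a-million event as a hypothetical example; the function name `prob_at_least_once` is illustrative.

```python
import math


def prob_at_least_once(p: float, n: int) -> float:
    """Probability that an event with per-trial probability p occurs at least
    once in n independent trials, i.e. 1 - (1 - p)**n."""
    # log1p and expm1 keep the result accurate when p is tiny.
    return -math.expm1(n * math.log1p(-p))


p = 1e-6  # hypothetical "one in a million" event
for n in (1_000, 1_000_000, 5_000_000):
    print(f"n = {n:>9,}: P(at least once) = {prob_at_least_once(p, n):.4f}")
```

For a one-in-a-million event, a million trials already give roughly a 63% chance (about 1 - 1/e) of seeing it at least once.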

(4) Estimate reasonably.

Once upon a time, there was a boy who sold fried dough sticks. He always put the money from his sales in his basket of dough sticks. One day, in a hurry, he set the basket on a big stone and went to the toilet. When he came back, to his shock, all the money in the basket was gone. In tears, he ran to tell the county magistrate. After hearing the story, the magistrate had the stone brought in for interrogation. Despite repeated threats, the stone said nothing. Furious, the magistrate ordered his men to beat it with sticks, but even when the sticks broke, the stone of course stayed silent. The onlookers burst out laughing. Even angrier, the magistrate fined each onlooker two copper coins and had them throw the coins into a basin full of water. Suddenly he pointed at one man and said, "You are the one who stole the money." The man protested his innocence, and everyone was puzzled. The magistrate explained, "The boy sells fried dough sticks, so his money is stained with oil. When the others threw their coins into the water, no oil floated up. Only when this man threw his coins in did oil rise to the surface, so he is the one who stole the money." The man bowed his head and confessed, and everyone was convinced.

The magistrate's reasoning resembles a teacher who, when something goes wrong, questions the naughtiest student first: when choosing among several possibilities, favor the most likely one. Can this go wrong? Of course. Does oil on the coins in a man's pocket prove he stole the boy's money? If someone had received change from the fried-dough-stick seller, wouldn't his coins be oily too?

Even so, this method, which people use all the time when making choices, works. In statistical terms it is the famous method of maximum likelihood: choose as the estimate the value under which the observed data are most probable. The method has many good properties and often yields good estimators.

In the NBA, teams beat one another in turn, and it is hard to say which is strongest. In the regular season each team plays 82 games, and the eight teams with the best winning percentages in each conference advance to the playoffs. Winning percentage is games won divided by games played. To keep games competitive, the NBA uses a draft system so that team strength does not differ too much; sometimes even the team with the best record wins less than 60% of its games. Using the winning percentage over a season's many games to decide who is stronger that year and who reaches the playoffs is standard practice in professional sports. Under the simple assumption that every game is an independent trial with the same probability of winning, this winning percentage is precisely the maximum likelihood estimate of that probability. The same idea is often used to estimate, for example, the success rate of a surgical operation or the probability of giving birth to triplets.
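The link between winning percentage and maximum likelihood can be checked numerically. This Python sketch assumes the simplest model, in which each of a team's games is an independent trial with the same winning probability p; the record of 50 wins in 82 games is hypothetical. Scanning candidate values of p and keeping the one that makes the observed record most probable recovers the winning percentage 50/82.

```python
import math


def log_likelihood(p: float, wins: int, games: int) -> float:
    """Binomial log-likelihood of `wins` out of `games` (constant term omitted)."""
    return wins * math.log(p) + (games - wins) * math.log(1 - p)


def mle_by_grid(wins: int, games: int, steps: int = 10_000) -> float:
    """Return the p on a fine grid inside (0, 1) that maximises the likelihood."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=lambda p: log_likelihood(p, wins, games))


wins, games = 50, 82  # hypothetical regular-season record
p_hat = mle_by_grid(wins, games)
print(f"grid MLE = {p_hat:.4f}, winning percentage = {wins / games:.4f}")
```

Setting the derivative of the log-likelihood, wins / p - (games - wins) / (1 - p), to zero gives the same answer in closed form: p_hat = wins / games.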

As statistics developed, a hundred schools of thought contended over estimation methods. Each reasonable method has its own strengths and suits certain situations; no method is always the best. For example, we sometimes feel that an interval describes an estimate more clearly; this is the famous method of confidence interval estimation.
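A confidence interval for a winning percentage can be sketched as follows. This Python example uses the simple normal-approximation (Wald) interval, p_hat ± z * sqrt(p_hat * (1 - p_hat) / n), which is only one of several interval methods and can behave poorly for small samples or proportions near 0 or 1. The record of 50 wins in 82 games is hypothetical, and `wald_interval` is an illustrative name.

```python
import math
from statistics import NormalDist


def wald_interval(successes: int, trials: int, level: float = 0.95):
    """Normal-approximation confidence interval for a proportion, clipped to [0, 1]."""
    p_hat = successes / trials
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = wald_interval(50, 82)
print(f"95% interval for the winning probability: {low:.3f} to {high:.3f}")
```

For this record the interval runs from about 0.50 to 0.72.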

(5) Think in terms of hypothesis testing: when in doubt, presume innocence.

People often seek fairness or justice. Take a simple cake shared by two people. If neither wants to end up with less, is there a good way to divide it so that neither feels cheated? Even deciding who gets to cut is best done by drawing lots. Then one person cuts and the other chooses: the chooser believes he got at least half, and the cutter believes he got exactly half, so both are satisfied.

"You cut, I choose" is a fairer method for both parties. Between prosecution and defense in a courtroom, a similar principle applies: the presumption of innocence.

In 1933, Jerzy Neyman of Poland and Egon Pearson of Britain published the famous Neyman-Pearson lemma, establishing the statistical counterpart of the presumption of innocence: the hypothesis test.

The English word "hypothesis" comes from the ancient Greek hypotithenai, and the term "scientific hypothesis" uses the same word. In mathematics, we usually prove whether a proposition is true or false. But in a random world, many phenomena can only be treated as hypotheses, and it comes down to which one we are more willing to accept. Accepting a hypothesis does not mean fully believing it is true, and rejecting it does not mean it is false. Whichever hypothesis is accepted after testing, it does not become a law; a hypothesis always remains a hypothesis.
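The logic of "presumed innocent until proven otherwise" can be sketched with an exact binomial test in Python. The example is hypothetical: the null hypothesis is that a coin is fair, the data are 60 heads in 100 tosses, and the significance level is the conventional 5%. The p-value is the probability, assuming the coin is fair, of a result no more likely than the one observed; `binomial_two_sided_p` is an illustrative name.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials under H0: p = 1/2."""
    # Probability of every possible outcome if the null hypothesis (a fair coin) holds.
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Add up all outcomes that are no more likely than the observed one.
    return sum(q for q in probs if q <= observed * (1 + 1e-12))


ALPHA = 0.05  # allowed probability of wrongly rejecting a true null hypothesis
p_value = binomial_two_sided_p(60, 100)
verdict = "reject" if p_value < ALPHA else "do not reject"
print(f"p-value = {p_value:.4f}: {verdict} the hypothesis that the coin is fair")
```

Here the p-value is about 0.057, just above 5%, so the fair-coin hypothesis is not rejected. That does not prove the coin is fair; it only means the evidence is not strong enough to overturn the presumption.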

3. Concluding remarks

In the preface to A Brief History of Mathematical Statistics, Mr. Chen Xiru wrote: "Statistics is not only a method or a technique; it also contains elements of a worldview: it is a way of looking at everything in the world. This is what we often mean by 'the statistical point of view'. But statistical thought has also gone through a process of development. Therefore, cultivating statistical ideas (or viewpoints) requires not only learning specific knowledge, but also linking that knowledge into an organic and clear whole from a developmental perspective, and gaining a sense of history."

Statistical thinking is not built overnight. If there is any trick, it is to study and practice, then study and practice again, and then keep on studying and practicing.
