
Reflections on Big Data

Two 2000-word reviews of "Big Data"

We now live in an era of big data, and the data often gives us compelling evidence. Below are two roughly 2000-word reviews of "Big Data". Enjoy!

Review of "Big Data", Part 1

In the past two years, the ideas of big data and cloud computing have spread everywhere, as pervasive as the song "Little Apple". Every company, internet firm and traditional enterprise alike, flaunts its own big data.

1. The physical Internet of Things and the virtual Internet of Things

Some years ago, the concept of the Internet of Things became popular. A vast Internet of Things would allow huge numbers of objects in the world to be sensed and networked: people, cars, houses, and anything else that can be connected. These objects can be perceived in some way and their information recorded for later use. A few years ago this seemed out of reach, since putting an RFID tag on every object was unrealistic. Today, with the spread of mobile phones, human beings themselves have joined the Internet of Things. But why network things at all? What is it for? To see what the Internet of Things gains us, we only need to compare an object before and after it joins the network. Evidently, what we gain is the ability to obtain the object's information in some way. That stored information is what we call data.

The Internet of Things generates information about physical objects, but today the largest volume of data on the Internet comes from virtual objects, that is, objects native to the network. Because a network object lives directly on the network and is therefore easy to reach, it holds a great advantage while information about physical objects remains hard to collect. In the future, however, the amount of data generated by the physical Internet of Things will certainly grow, and may even exceed the data generated by purely online things.

The widespread use of the Internet makes generating and spreading information easy. Everyone who goes online exists in some role and is a creator of information on the network. In fact, each of us plays multiple roles at once: to the network service provider, we are a subscriber; to a portal, a user; on social networking sites, a virtual or real persona; to a browser, a sequence of page views and mouse actions. Which role matters depends on what information the other party needs from our actions. If all these roles on the network are regarded as virtual objects, then the virtual Internet of Things composed of them produces an enormous amount of data. I have lived through days when information was always scarce. Now that acquiring information has become so easy, we are bound to see an explosion of it: the era of big data.

2. A change in thinking

As technology changes, our way of thinking changes with it. In the past era of small data, obtaining, storing, and organizing information was time-consuming and laborious, so we had to work out carefully how to collect information as accurately as possible at the least cost and greatest speed. Sampling statistics exists precisely because technical limits made it impossible to obtain every sample, or to process them all within a reasonable time even if obtained. Because information was expensive to acquire, everything had to be thought through before processing could begin. This is like programming on punched paper tape in the early days of computing: the cost of a mistake was so high that people checked their code countless times before feeding it in. Modern computers have made programming vastly more efficient, which lets people build more powerful software; programmers no longer need to agonize over every line before starting, because the machine helps catch some of the problems. So those who worry that people will become lazy or thoughtless because data is too easy to obtain and too cheap to process and analyze are worrying needlessly. Historically, technological progress has raised human productivity without making people lazy, because our desires have grown at the same time. Humanity will only become greater.

So in the era of big data, with more comprehensive data, we can enter fields that were previously closed to us for lack of data, such as prediction. This is an exciting field, yet in fact it has already arrived, and everyone is a beneficiary. The intelligent-association feature in the input methods we use every day predicts the words we may type next from the words we typed before, saving us typing time. There is no artificial intelligence in this algorithm, only large-scale statistics on people's typing habits: prediction by counting over masses of data, rather than by hand-crafted rules or logic. This points to an important mode of information processing in the big data era. From statistics we can find correlations between different things without needing to know their causal relationship, and we benefit from the correlation alone. This approach may look opportunistic, but it can give us an edge at a critical moment. We are used to first learning the causal logic of a thing and then deducing the result, yet there will always be phenomena that no tidy logic can explain. Wouldn't it be delightful to skip the logic and enjoy the results directly through big data analysis (as in the Wal-Mart beer-and-diapers case)? Of course, rigorous logic always deserves respect.
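The purely statistical prediction described above can be sketched as a simple bigram counter: for each word, count which word most often follows it, with no grammar rules at all. This is a minimal illustration under assumed toy data, not any real input method's algorithm.

```python
from collections import Counter, defaultdict

# A toy corpus of past input; a real input method would use millions
# of logged phrases (this sample data is entirely made up).
history = [
    "big data era",
    "big data analysis",
    "big data era",
    "cloud computing era",
]

# Build a bigram table: for each word, count the words that follow it.
following = defaultdict(Counter)
for phrase in history:
    words = phrase.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

def suggest(word):
    """Predict the most frequent next word; pure counting, no linguistics."""
    counts = following.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(suggest("big"))   # "data" always follows "big" in this corpus
print(suggest("data"))  # "era" (seen twice) beats "analysis" (seen once)
```

The prediction is nothing but a frequency table, which is exactly the point: correlation in the data, with no model of why one word follows another.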

3. The stickiness of the Internet

In the era of attracting users with new tricks in breadth, technological progress means that whatever an entrepreneur builds in a new field is easily copied by others. That is when depth matters. For shopping sites, low-margin sites, portals, and other information-heavy sites in particular, the more you know about a user, the greater your advantage. In an era when technology alone is no longer the decisive factor, increasing user stickiness and loyalty becomes the first priority. From a user's past behavior we can infer their preferences and recommend matching information or items; when you know a user better than anyone else does, that user cannot leave you. The news application "Today's Headlines" with its intelligent ranking, and the recommendation algorithms of the various shopping sites (though those aim purely at increasing consumption rather than stickiness), all make recommendations based on what the user has browsed and preferred before. The foundation of all this is a record of the user's behavior; without it, none of this is possible.
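A minimal sketch of this kind of behavior-based recommendation uses nothing but co-occurrence counts over browsing logs: "users who viewed X also viewed Y." The session data below is hypothetical, and no real site's algorithm is implied; production recommenders are far more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical logs: each entry is the set of items one user viewed.
sessions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "diapers"},
    {"beer", "chips"},
]

# Count how often each pair of items appears in the same session.
pair_counts = Counter()
for items in sessions:
    for a, b in combinations(sorted(items), 2):
        pair_counts[(a, b)] += 1

def recommend(item):
    """Recommend the item most often co-viewed with `item` (correlation only)."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return scores.most_common(1)[0][0] if scores else None

print(recommend("beer"))  # "diapers" co-occurs with "beer" most often
```

Note that the code never asks why beer and diapers go together; like the Wal-Mart case in the text, it exploits the correlation directly.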

All walks of life are frantically seizing the opportunity to obtain data, and with enough data, everything becomes possible.

Review of "Big Data", Part 2

"What is past is prologue" is the big data industry's favorite quotation. Big data is the trend of the moment, and the book "Big Data" is regarded as a primer for understanding it. I recently read it twice in a row and wrote this review after the second reading. On the whole it is worth reading, though the details deserve debate.

Wikipedia's definition of big data: big data (also called huge, massive, or mega data) refers to data sets so large that they cannot be captured, managed, processed, and organized within a reasonable time.

Some say this is the age of reading pictures: apart from novels and chicken-soup-for-the-soul titles, most bestsellers now come with illustrations. This book is an exception.

First, let me try to analyze the author's three viewpoints, the three sentences the big data industry loves to quote:

1. Not random samples, but all the data.

I think everyone can see that analyzing all the data beats analyzing a random sample, but in reality we often cannot get all the data. First, every data-collection method has its scope of application; none covers everything. Second, consider what the data itself can contain: in the wartime fighter example, analysts could only count the bullet holes on the planes that returned, while the planes that crashed could not be examined, and it was by reasoning about those missing planes that Wald found the weak points most likely to cause a crash. Third, processing capacity may not keep up, just as early weather forecasts missed the mark because the data could not be computed in time. "Sampling analysis is a product of the era of information scarcity, the analog era when information circulation was limited" — the author clearly attends to only part of the reasons.
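The gap between a random sample and "all the data" can be shown with a quick simulation on synthetic numbers (the population, its distribution, and the sample size here are arbitrary assumptions for illustration):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# A synthetic "population" of 100,000 values (assumed data).
population = [random.gauss(50, 15) for _ in range(100_000)]

# Using all the data gives the true mean with no sampling error...
true_mean = sum(population) / len(population)

# ...while a random sample of 100 carries sampling error.
sample = random.sample(population, 100)
sample_mean = sum(sample) / len(sample)

print(f"full-data mean:  {true_mean:.2f}")
print(f"100-sample mean: {sample_mean:.2f}")
```

The simulation also quietly supports the Wald point: "all the data" removes sampling error, but if the crashed planes never enter the data set, no amount of volume fixes the bias.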

From the standpoint of language, what does "all the data" mean: "all the data we need," or "all the data we can collect"? In many of the book's business cases, what is actually handled is only "all the data we could collect" or "all the data we thought of." Human understanding of nature is always limited, and existentialism holds that the world has no ultimate purpose. For example, "Farecast made its forecasts with a full year of airline price data": is "a full year" a sample, or "all the data we need"?

From a historical standpoint, Ptolemy built the Library of Alexandria with the sole aim of "collecting the books of the whole world," to realize the dream of "gathering all human knowledge." In China, Qianlong had the Siku Quanshu compiled, with subjective selection at every step. Both believed at the time that they could collect all the books; in the end, neither dream of "all the books" was realized.

2. Not precision, but messiness.

When we sampled in the past, the result came with a confidence level and a clearly stated tolerance or margin of error; humanity has always known that it works to limited accuracy. At the same time, the author himself admits that "error is not an inherent feature of big data, but a real problem that urgently needs handling and may persist for a long time." So is the characteristic of big data precision, or messiness?

This raises the question of how to control the quality of big data. First, even if precision is not required, the degree of imprecision must still be defined, or everything descends into chaos. On the other hand, once a tolerance is defined, everything within it counts as accurate (or am I still stuck in small-data thinking? I have not straightened out the logic here). It is like the zero-defect theory of the quality-management master Crosby, which I have always considered a false proposition: defects always exist; the question is how you define them. Second, the processing of huge volumes of unstructured data, such as quantifying news or sentiment analysis, still leaves enormous room for improvement in NoSQL applications.

"Problems do not arise in an instant; things go wrong slowly over time." We can predict the future by finding a correlation and monitoring it. I agree with this, but it does not mean we can abandon precision; rather, we need to redefine precision. In the project-management industry, when a project runs into serious trouble, we believe many factors and process problems must have accumulated, and many chances of recovery were missed along the way. If we blindly tolerate messiness, the outcome is clearly unacceptable.

3. Not causation, but correlation.

This is the book's greatest contribution to big data theory, and also its most controversial point. It is also where the translation became almost unreadable to me.

I have long been familiar with this kind of relationship: the fortune telling I encountered back in primary school is a typical case of "not causation, but correlation." Fortune telling is really a summary of tendencies: given certain conditions, it tells you what to stay away from and what to draw near to, but it will not tell you why.

We often talk about science, yet no one can say exactly what science is. My own understanding is: first, it has a clearly defined scope; second, within that scope it establishes axioms stipulated to be correct; third, it has an explicit process of deduction; fourth, its results are reproducible. The hegemony of science lies in dismissing whatever fails these four conditions as pseudoscience or feudal superstition, and in using the first two conditions to brush aside your every objection. By this definition, big data does not qualify as science.

The butterfly effect in chaos theory is mainly about correlation. It refers to sensitive dependence on initial conditions: a tiny difference at the input is rapidly amplified at the output, but no one knows what the output will be.
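That sensitivity can be demonstrated with the logistic map, a standard toy model from chaos theory; the starting values and iteration count below are arbitrary choices for illustration.

```python
def logistic(x, r=4.0, steps=30):
    """Iterate the logistic map x -> r*x*(1-x), chaotic at r=4."""
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

# Two starting points that differ by only one part in a billion...
a = logistic(0.400000000)
b = logistic(0.400000001)

# ...end up far apart after 30 iterations: the tiny input difference
# is amplified, and the final positions look unrelated.
print(abs(a - b))
```

This is exactly the asymmetry the text describes: we know the inputs are correlated with the outputs, yet we cannot say in advance what the output will be.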

Once human beings give up the pursuit of causality, they give up their finest quality: willpower. Many people refuse to believe in fortune telling because they fear that once they know their fate, they can no longer fight it. Even though I believe in fortune telling, I keep searching for the causal factors behind the correlations. One reason I gave up my first job was that I grew tired of such a predictable tomorrow: when a task went out, I could roughly foresee which link would go wrong, and as long as I did not follow up, nine times out of ten it did.

Having analyzed these three viewpoints, here are some doubts about big data theory. Big data is an important part of the currently fashionable feedback economy; it is widely applied in finance and on the Internet, and is regarded as a high-paying field. I often wonder whether the trends produced by the so-called invisible hand are really invisible. For example, a few companies push a concept, declare it a trend, and it soon becomes one. Living examples around us are Tmall's Double Eleven and JD.com's 618: a giant opens the way, countless others follow, and a shopping festival is born. Whether it is reasonable hardly matters to investigate, because many things cannot be compared. This is quite different from swarm intelligence, which has no coercive central control.

After finishing the book, I kept feeling that the author states things too absolutely; perhaps my understanding is too shallow. So, yielding to temptation, I conclude with this:

Affection should not be spent to the last; spend it to the last, and misfortunes crowd in.

Blessings should not be enjoyed to the full; enjoy them to the full, and loneliness follows.

Words should not be spoken to the end; speak them to the end, and they grow cheap.

Rules should not be pushed to the limit; push them to the limit, and affairs grow tangled.
