Fortune Telling Collection - Fortune-telling birth date - What are the essential similarities and differences between the algorithm of machine learning and the algorithm in Introduction to Algorithms?

What are the essential similarities and differences between the algorithm of machine learning and the algorithm in Introduction to Algorithms?

Author: Dong Keren

Link:/question/24976006/answer/29682806

Source: Zhihu.

The copyright belongs to the author. Please contact the author for authorization for commercial reprinting, and please indicate the source for non-commercial reprinting.

The algorithm in the introduction of algorithm is essentially a problem with accurate solution, how to get this solution more efficiently. This efficiency can be shorter calculation time or less space needed in the calculation process.

A simple example is how to quickly rearrange an unordered array from small to large, or find its median. There is a definite and unique answer to these questions, and there is generally a stupid method (exhaustive or traversal) that can be solved step by step. The so-called algorithm is how to simplify the steps and find this solution faster and easier. The data processed by these algorithms are also simple and clean types, such as arrays, binary trees and graphs. For these algorithms, the data scale affects the time and space required for calculation, and will not affect the logic and calculation results of the algorithm itself because of the change of scale.

Generally, the problem to be solved by machine learning has no exact solution, nor can it be found by exhaustive or traversal. What needs to be emphasized is the attribute of "learning", that is, it is hoped that the algorithm itself can dynamically discover new laws and even change the logic and behavior of the algorithm program according to the changes of given data or computing environment.

For example, a thousand documents can be divided into different categories. The simplest can be given several categories, such as news, novels, poems and so on. The algorithm can automatically classify articles into corresponding categories according to their contents. It can be seen here that even if people do this problem, there are many vague and uncertain places. For example, should a crime documentary in the Legal Evening News be classified as news or fiction? Or should a long poem, such as Homer's epic, belong to a novel or a poem? The machine learning algorithm is to automatically give the division according to the rules found in the content of the article. However, different algorithms can give different solutions, and these solutions can all be "correct", so it is generally necessary to artificially design a judgment standard to decide which is better or worse.

You can also let the algorithm discover the rules in articles and group articles with high similarity without giving a category in advance. In this way, different algorithms may give different classification numbers, which may be three, four or five, all of which may be "correct" classification. Even what is "similarity", different algorithms can give different explanations, which can be the frequency and proportion of nouns, verbs and adjectives, or the grammatical structure of sentences.

Further, you may also hope that this algorithm can be used to judge the category of a new document. The more new documents you enter, the larger the initial data set. When the scale becomes larger, the laws that are not obvious in the original data may become obvious. For example, there is only one argumentative paper in the original 1000 document, and most algorithms may not be able to classify it separately. But when you continuously input 100 argumentative papers, the proportion of argumentative papers in the data becomes10/100%, which is almost 10%. At this time, the algorithm should separate the argumentative papers. In this sense, the data itself has a great influence on the algorithm, which is also the essential difference between the introduction to algorithm and the algorithm.

Technically, the algorithm in Introduction to Algorithms focuses on data structure and computational complexity, which belongs to a branch of discrete mathematics and does not involve advanced mathematical concepts such as calculus. The algorithm of machine learning itself is based on theories and technologies such as probability theory, statistics and optimization, which makes people feel more "mathematical" from this perspective.

In the specific implementation details, machine learning algorithm will apply a lot of techniques in the introduction of the algorithm to improve the calculation efficiency. However, it should be emphasized that this is only for the bottom-level implementation, and there is not much connection between the two in the logic of the algorithm itself. In other words, the technology in Introduction to Algorithms can help you write faster programs to run machine learning algorithms, but it is not helpful to the problems that machine learning tries to solve. Skillful use of binary tree hash table and accurate estimation of the complexity of a graph algorithm may not help you guess what is the best gift for your girlfriend's birthday (Taobao Jun who has used machine learning algorithm probably knows! )。 So don't think of them as building blocks and components.

Finally, if the above explanation still confuses you, there is a more popular explanation: the introduction of algorithms is to teach you how to count, and machine learning is basically equivalent to astrology. One is mechanical, the other is cheating. That's about it.

See the link: /question/24976006 for detailed analysis.