If “data science” has a manifesto, the article published back in 2008 may well be it. Its main idea is that the classical scientific approach, which follows the scheme of hypothesis, model, and experiment, faces serious problems today.
One of them is the impossibility of testing a theory experimentally. For example, M-theory in theoretical physics can currently be neither confirmed nor refuted, because humanity does not yet have the resources to run the necessary experiments.
Data science tools, on the other hand, offer a way to approach such problems. Instead of focusing on cause and effect, data scientists propose studying the correlations between objects. According to the article, having large volumes of data, and the means to analyze them, makes any theory or model unnecessary!
This can be illustrated by the way Google ranks search results. To decide which page is more relevant, the computer does not need to carry out an in-depth semantic analysis; it is enough to look at statistics about a particular resource, such as how often it is linked to or visited. The algorithm assumes that the reasons motivating people to act one way or another are not important to us; what matters is tracking and classifying their final behavior. With enough data, the numbers speak for themselves.
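The idea of ranking by behavior alone can be sketched in a few lines of Python. This is only a toy illustration and not Google's actual algorithm: the click log, the page names, and the function `rank_pages` are all hypothetical. The point is that the code never looks at what a page says, only at what people did.

```python
from collections import Counter


def rank_pages(click_log, query):
    """Rank pages for a query purely by how often people chose them.

    click_log is a list of (query, page) pairs (a hypothetical format).
    No page content is analyzed: only observed behavior is counted.
    """
    counts = Counter(page for q, page in click_log if q == query)
    # most_common() returns pages ordered from most to least clicked
    return [page for page, _ in counts.most_common()]


# Hypothetical log of which result people clicked for each query
click_log = [
    ("python tutorial", "site-a"),
    ("python tutorial", "site-b"),
    ("python tutorial", "site-a"),
    ("python tutorial", "site-c"),
    ("python tutorial", "site-a"),
    ("python tutorial", "site-b"),
    ("cooking", "site-d"),
]

if __name__ == "__main__":
    print(rank_pages(click_log, "python tutorial"))
```

The ranking says nothing about why "site-a" is preferred. It only records that it is.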
Search engines and e-marketing are not the only areas where big data analysis is applied. Today data science is being systematically integrated everywhere, starting with the scientific community and spreading into all spheres of human activity.
In fact, it’s not about “the end of theoretical study.” Of course, we will still need theory in the future. Theory guides us when we have limited data and gives us solid ground for discussion. Moreover, data science can complement the classical scientific way of exploring the world around us. For example, the results of machine learning can show scientists where to focus their efforts and which models are not worth pursuing.
The theoretical approach is not only a way of exploring; it is also a way to structure and preserve knowledge.
When we are at the starting point of a new field where we don’t yet have enough data, ML tools will be as ineffective as statistical analysis. In this case, we need to develop our knowledge by building a clear model and refining it; then, once we have enough data to analyze, we can use data science. Let’s look at ballistics. With a pure data science approach, we would need to run a large number of experiments to predict the flight of a shell, and we would spend a lot of resources every time we wanted to calculate a trajectory. Or we can simply use a one-line formula that covers every situation. It’s all about a balance between the approaches. Data science helps theorists focus on useful data, just as theory can help improve data science methods.
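The ballistics comparison can be made concrete with a small sketch. It assumes the simplest textbook case (no air resistance, flat ground), where the “one-line formula” is the range R = v² · sin(2θ) / g. The “experiments” are simulated here rather than measured, and all function names are made up for illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def range_formula(speed, angle_deg):
    """Theory: the one-line range formula (no drag, flat ground)."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / G


def simulate_shot(speed, angle_deg, dt=1e-3):
    """Stand-in for one costly experiment: step the shell until it lands."""
    theta = math.radians(angle_deg)
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    x, y = 0.0, 0.0
    while True:
        y_next = y + vy * dt - 0.5 * G * dt * dt
        if y_next < 0:
            # Interpolate inside the last step to find the landing point
            frac = y / (y - y_next)
            return x + frac * vx * dt
        x += vx * dt
        y = y_next
        vy -= G * dt


def run_experiments(speed, angles=range(5, 90, 5)):
    """Data: fire one shot per angle and record where it landed."""
    return {float(a): simulate_shot(speed, a) for a in angles}


def predict_from_data(table, angle_deg):
    """Data-driven prediction: interpolate between the two nearest shots."""
    angles = sorted(table)
    if not angles[0] <= angle_deg <= angles[-1]:
        raise ValueError("angle outside the range covered by the experiments")
    for lo, hi in zip(angles, angles[1:]):
        if lo <= angle_deg <= hi:
            w = (angle_deg - lo) / (hi - lo)
            return table[lo] * (1 - w) + table[hi] * w


if __name__ == "__main__":
    table = run_experiments(100.0)  # many shots, all at a single speed
    print("from data:   ", predict_from_data(table, 37.5))
    print("from formula:", range_formula(100.0, 37.5))
```

In this sketch the data-driven prediction is only approximate between measured angles, and the table is valid for a single launch speed: any other speed needs a new series of shots, while the formula covers every speed and angle at once.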
The bottom line is that we can’t fully avoid theory. It must be a part of the research process. The point is that theory is just the first stage of data mining. We have now discovered a method that can build on and improve the theoretical approach: data science.
Judging by how quickly the field of data analysis is developing, we can assume that soon (if not already) data science methods will carry significant weight in decision-making in the scientific and business communities.
If anyone is interested, I will gladly listen to criticism and possible shortcomings of this approach.
You will find the reference to the article below.