In addition to the big data problem mentioned above, I want to share one interesting article with you. In short, a company had been trying for a while to collect data for its research and couldn’t do it because of the technical complexity. Then something happened: the database was hacked and published for everyone. The published dump contained a mix of private and public data. So here is the dilemma: “Can this database dump now be used?”, “Has it become ‘public’?”, and “Should this dataset be used to produce socially useful research?”
The bottom line is that the company didn’t use the hacked data. They provided a list of arguments, which I share. Here are some:
- Researchers have a limited capability to distinguish between public and private information within the hacked data.
- Researchers may see private data while cleaning the dataset.
- Using the data may legitimize criminal activity.
- Violating users’ expectation of privacy.
- Using people’s data without consent.
- We want this data, but we don’t need it. Other data can be ethically collected and used.
The only argument for using the illegal information is “faith in the power of research to do good for mankind.” But, honestly, that’s bullshit. The primary goal of most research is to increase the company’s revenue. Playing dirty like this can break fair competition in the data-provider business. As a result, fewer companies will care about data security, which can badly affect the end user.
Hack attack on large credit company Equifax
In this case, a “black market of data” can emerge. If a company needs some “sensitive” data for its research, it can simply commission a hack of that data, followed by its publication. The company will wash its hands of the affair, shifting the blame onto “a bad hacker.” This kind of practice will eventually erase the boundaries of privacy.
Specialists from the University of Michigan comment:
“When using the hacked data, you reward criminal activity, and in this way, criminals will be motivated to find more ways to hack data. It is like buying a stolen bike from a criminal. Besides, the private data can come in (more) wrong hands so the private data will be spread more and more among more and more people. And because researchers have a limited capability to distinguish between public and private information within the hacked data, they may use private data or spread private data by accident. All the above will lead to a higher possibility of abuse of private data.”
Because data science is still very young compared to journalism and has no foundations such as a code of conduct, we need to be more careful in deciding what is acceptable and what isn’t. This can be very complicated, because we can’t correctly evaluate the impact in either case. Let’s say we collect data for cancer research. For better results, we need more information to mine. The results of our study would certainly have a significant impact on people. BUT we can’t even roughly calculate the risk of concentrating a massive amount of private, sensitive data in one place. If this kind of data were used in bad faith, the story could change unpredictably, and we would face much worse questions.