When government-funded research scientists create a study involving human subjects, an ethics committee typically reviews the study and determines whether the subjects could be harmed. No such oversight applies when social science researchers use big data. When Stanford University researchers announced that they had created a machine-learning algorithm that could classify images of men as either gay or straight, advocacy groups quickly condemned the research. Wired reported the response from the LGBT community and raised an interesting point: do ethics come into play when researchers use a database instead of data from real human subjects?
Gathering and interpreting information from public databases may be in its infancy, but big data research gives social scientists free, rapid access to large amounts of data, which is hard to resist for cash-strapped researchers. Dating websites and social media provide enormous amounts of information, including age, sexual orientation, geographic location and education level. Researchers could argue that people give this information freely, knowing that it will be posted online; nevertheless, individuals never explicitly gave permission for researchers to analyze their data with artificial intelligence. Many people understand that Facebook aggregates their data and publishes their demographics, in addition to allowing advertisers to target specific groups, but few people realize that social science researchers are analyzing their names and images.
The intangible potential harms that result from AI research range from a violation of information privacy to an incorrect classification of an individual. The harm becomes concrete when researchers release free apps that guess ethnicity from a person's name, which has already happened. While such an app can help determine diversity at a company, an ethics review could have caught the fact that employers, landlords and others could also use the app to discriminate.
Some data scientists argue that ethics reviews may create bureaucratic burdens when researchers have no direct interaction with their subjects. It will be interesting to see how oversight of large-scale data studies will change, if it changes at all.