Everyone who uses Facebook should read this new study published in the Proceedings of the National Academy of Sciences (open access article). Applying data mining algorithms to the Facebook Likes of more than 58,000 volunteers, researchers from Cambridge University in the UK were able to predict a wide range of sensitive attributes with uncomfortable precision. These attributes include sexual orientation, ethnicity, political views, religion, personality traits, intelligence, substance use, social network size, and demographic characteristics.
Here is the abstract:
We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait “Openness,” prediction accuracy is close to the test–retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.
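To make the two-step recipe in the abstract concrete (dimensionality reduction, then regression), here is a minimal sketch in Python with NumPy. It is not the authors' code. The data are synthetic, and the function names, the number of components `k`, the learning rate, and the train/test split are all my own assumptions for illustration. The score is the area under the ROC curve: the share of pairs, one person with the attribute and one without, that the model ranks correctly.

```python
import numpy as np


def make_synthetic_likes(n_users=400, n_likes=60, seed=0):
    """Hypothetical data: a binary user-by-Like matrix and a binary attribute."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_users)       # made-up attribute, 0 or 1
    shift = np.zeros(n_likes)
    shift[:10] = 0.25                          # first 10 Likes are more common when y == 1
    probs = 0.10 + np.outer(y, shift)          # per-user, per-Like probability
    X = (rng.random((n_users, n_likes)) < probs).astype(float)
    return X, y


def svd_reduce(X_train, X_test, k):
    """Project both sets onto the top-k right singular vectors of the training data."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    components = vt[:k].T                      # shape: (n_likes, k)
    return (X_train - mean) @ components, (X_test - mean) @ components


def add_intercept(Z):
    return np.hstack([Z, np.ones((Z.shape[0], 1))])


def fit_logistic(Z, y, lr=0.1, steps=2000):
    """Plain gradient descent on the mean logistic loss (assumed settings)."""
    Z1 = add_intercept(Z)
    w = np.zeros(Z1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z1 @ w)))
        w -= lr * Z1.T @ (p - y) / len(y)
    return w


def predict_proba(Z, w):
    return 1.0 / (1.0 + np.exp(-(add_intercept(Z) @ w)))


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def run_demo(k=10, n_train=300):
    X, y = make_synthetic_likes()
    Z_train, Z_test = svd_reduce(X[:n_train], X[n_train:], k)
    w = fit_logistic(Z_train, y[:n_train])
    return auc(predict_proba(Z_test, w), y[n_train:])


if __name__ == "__main__":
    print(f"held-out AUC on synthetic data: {run_demo():.2f}")
```

Nothing in the pipeline needs to know what any individual Like means. A model like this only needs a matrix of who liked what and a sample of people whose attribute is already known.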
Given that Facebook likes are not protected by privacy guidelines, you have no ability to control the release of the information. Perhaps I am a suspicious sort, but I can envision too many opportunities for discrimination and misuse.
Perhaps it is just another sign of the times. Our government has given itself unprecedented access to our digital communications and data, dispensing with warrants and judicial oversight. Retailers have been collecting more and more data regarding our buying habits, presumably to refine marketing strategies to their customers. No doubt Google could paint a pretty accurate picture of my interests and attributes. Electronic invasions of privacy are the norm rather than the exception.
Still, the ability to accurately guess many things I would not normally disclose on a public forum from a seemingly innocuous data stream is disconcerting. It did not even take that many likes to start making accurate predictions. The average number of Facebook likes in the study cohort was 68. For many attributes, the gap in predictive power between 20 likes and 100 likes was not that large.
For the researchers, the findings are a cautionary tale about the ease of using a limited digital behavioral snapshot to paint a very detailed picture of an individual. That is not to say Facebook is making data available for such individual profiling to occur outside the bounds of its proprietary control. No one would be surprised if Facebook pushed ads at you based on your likes. However, most would be less sanguine about prospective employers, telemarketers, or law enforcement officers being able to profile you for other purposes. The authors spell out the risk:
On the other hand, the predictability of individual attributes from digital records of behavior may have considerable negative implications, because it can easily be applied to large numbers of people without obtaining their individual consent and without them noticing. Commercial companies, governmental institutions, or even one’s Facebook friends could use software to infer attributes such as intelligence, sexual orientation, or political views that an individual may not have intended to share. One can imagine situations in which such predictions, even if incorrect, could pose a threat to an individual’s well-being, freedom, or even life. Importantly, given the ever-increasing amount of digital traces people leave behind, it becomes difficult for individuals to control which of their attributes are being revealed. For example, merely avoiding explicitly homosexual content may be insufficient to prevent others from discovering one’s sexual orientation.
The concluding paragraph struck me as naive.
It is our hope, however, that the trust and goodwill among parties interacting in the digital environment can be maintained by providing users with transparency and control over their information, leading to an individually controlled balance between the promises and perils of the Digital Age.
If someone wants to use data in an underhanded manner, respect for boundaries and quaint notions of ethics are unlikely to deter them.