Big Data Ethics: Privacy

The following is a guest post by Jim Harris, Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ) Blog.

This blog series will examine the four main considerations of big data ethics as defined by the Council for Big Data, Ethics, and Society, specifically privacy, security, equality and access. Each post will address one of these elements, summarizing the issue at hand as well as providing takeaways for how practitioners can handle these concerns.


Privacy is generally defined as a fundamental human right to certain freedoms, including freedom from observation, freedom from public attention and freedom from interference in our personal lives. These freedoms give us the right to keep our private lives private, to keep our secrets to ourselves and to share our lives only with those we trust.

In the era of big data these freedoms face new challenges. Examples include the challenge to freedom from observation revealed by the government surveillance scandal in 2013, the challenge to freedom from public attention revealed by the celebrity photo hack of Apple’s iCloud in 2014 and the challenge to freedom from interference revealed by the Facebook emotional manipulation experiment in 2012.

In some ways, however, our privacy’s greatest challenge is us. While we would be outraged by the discovery of surveillance devices hidden in our homes to record whatever we say and do, most of us own a device that’s not only capable of doing so but that we also use willingly – a smartphone. We record what we say and do in a variety of data formats such as voicemails, text messages, emails, photos, videos and social networking status updates.

Furthermore, the geo-location tags, date-time stamps and other information associated with this data are the bits and bytes of digital breadcrumbs we scatter along our daily paths. Moreover, since we create most of this data using services we didn’t pay for (e.g., Gmail, Facebook, Twitter, Instagram, YouTube), we essentially tender our privacy as currency. This self-surveillance provides companies and governments alike with a lot of data about us.

How this data is being used is the issue and it’s forcing us to rethink how we define privacy and how we defend against violations of privacy. Neil Richards and Jonathan King of the Washington University School of Law argue that it’s not sufficient to assign privacy solely to the domain of individual responsibility. They see privacy as more than the keeping of our own secrets.

Richards and King advise “privacy should not be thought of merely as how much is secret, but rather about what rules are in place (legal, social, or otherwise) to govern the use of information as well as its disclosure. Our ability to reveal patterns and new knowledge from previously unexamined troves of data is moving faster than our current legal and ethical guidelines can manage.

“Many of the most revealing personal data sets such as call history, location history, social network connections, search history, purchase history, and facial recognition are already in the hands of governments and corporations. Further, the collection of these and other data sets is only accelerating.”

Not only is big data collection accelerating, but big data applications are also multiplying. “Data analysts are using big data,” Andrej Zwitter of the University of Groningen cautions, “to find out our shopping preferences, health status, sleep cycles, moving patterns, online consumption, friendships, etc.” In most cases, this is done with de-individualization, which attempts to minimize the identification of individuals within big data sets, the value of which is largely their size and, hence, their use in aggregate analytics. Examples include using voter registration data to identify voting trends without identifying individual voters and using medical records to determine the effectiveness of certain treatments without identifying individual patients.

However, as Zwitter explains, “de-individualization is just one aspect of anonymization. Location, gender, age, and other information relevant for the belongingness to a group and thus valuable for statistical analysis relate to the issue of group privacy. Anonymization of data is, thus, a matter of degree of how many and which group attributes remain in the data set. To strip data from all elements pertaining to any sort of group belongingness would mean to strip it from its content. In consequence, despite the data being anonymous in the sense of being de-individualized, groups are always becoming more transparent.”
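For practitioners, Zwitter’s point can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the records, the field names and the choice of which fields count as direct identifiers are all invented here and do not come from Zwitter or from any real data set. It removes the fields that point to one specific person, then counts how many records share each combination of the group attributes that remain.

```python
from collections import Counter

# Hypothetical records: every field name and value is invented for illustration.
records = [
    {"name": "A. Jones", "email": "aj@example.com", "zip": "50301", "age_band": "30-39", "gender": "F"},
    {"name": "B. Smith", "email": "bs@example.com", "zip": "50301", "age_band": "30-39", "gender": "F"},
    {"name": "C. Brown", "email": "cb@example.com", "zip": "50301", "age_band": "40-49", "gender": "M"},
    {"name": "D. White", "email": "dw@example.com", "zip": "50302", "age_band": "30-39", "gender": "F"},
    {"name": "E. Green", "email": "eg@example.com", "zip": "50302", "age_band": "30-39", "gender": "M"},
]

# Assumption: these are the only fields that identify one specific person.
DIRECT_IDENTIFIERS = {"name", "email"}


def de_individualize(record):
    """Drop direct identifiers and keep the remaining group attributes."""
    return {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}


def group_sizes(rows, attributes):
    """Count how many rows share each combination of the given attributes."""
    return Counter(tuple(row[attr] for attr in attributes) for row in rows)


anonymous = [de_individualize(record) for record in records]

# With many group attributes kept, some groups shrink to a single record.
fine_grained = group_sizes(anonymous, ["zip", "age_band", "gender"])
# With fewer group attributes kept, the groups are larger but less informative.
coarse_grained = group_sizes(anonymous, ["gender"])

print("Smallest fine-grained group:", min(fine_grained.values()))
print("Smallest coarse-grained group:", min(coarse_grained.values()))
```

In this toy data, keeping all three group attributes leaves some combinations that describe only one record, while keeping gender alone leaves larger groups that say far less. That is the trade-off Zwitter describes: the more group attributes remain, the more useful the data is for analysis and the more transparent the groups become.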

Anonymity, like many aspects of privacy, cuts both ways. To Zwitter’s point, group privacy advocates argue for anonymity and against its challenges, such as Facebook’s real name policy, whereas the down – and dark – side of anonymity is sadly exemplified by cyberbullying.

It’s also important to note that while we want to protect our privacy, we also want to benefit from innovations made possible by surrendering some of our privacy. “We welcome GPS, cell tower, and even Wi-Fi location tracking of our cell phones,” remark Richards and King, “so that we can make calls more easily and use location services in applications to check-in, navigate, or find our friends.

“We willingly share information to feed big data algorithms so dating sites can find us compatible mates, career sites can help us more quickly find jobs, online bookstores can recommend books for us to read, and social networking sites can connect us with new friends.”


Even as we struggle to come to terms with the implications and unintended consequences of having nearly every aspect of our lives captured as data, privacy must remain a fundamental human right. Therefore, it’s understandable that many are calling for ethics standards and a code of ethical practices regarding big data.

However, there are no easy answers or quick solutions. Individuals, businesses and governments must educate themselves on all aspects of this issue. As Richards and King conclude, “we need to enable government officials to use big data to act in our defense. We want to share information with companies to let them serve us better with big data. Yet we need to think more broadly about big data so we can develop privacy ethics, norms, and legal protections to prevent important societal values like privacy, confidentiality, transparency, and identity from becoming subordinate to the new capabilities of big data.”

Last updated January 2016