A smorgasbord of issues to think about when your organisation starts deploying big data based systems, from Kate Carruthers

“We ask ethical questions whenever we think about how we should act. Being ethical is a part of what defines us as human beings.”
— The Ethics Centre, Sydney

Humans have been thinking about the moral principles that govern our behaviour or the way in which we conduct ourselves for aeons. We are moving at lightspeed towards a new and exciting future that is built on algorithms, data, and digital technologies. Ethics is an area of increasing importance since we are barrelling forward with the proliferation of data through digital and IoT, and there seems to be little opportunity to slow things down.

I’ve been thinking about digital and data ethics since I joined Steve Wilson, David Bray, John Taschek, and R “Ray” Wang on a Digital Ethics for the Future panel in 2016.

5 propositions about data

  • Data is not neutral. All data is subject to bias.
  • There is no such thing as raw data. Even the simple act of selecting data means you’ve exercised judgment as to which data to include or exclude.
  • The signal to noise ratio has changed. We now have so much data that there is more noise than signal, and it gets difficult to ascertain which of it is the signal.
  • Data is not inherently smart. Our interpretation of data is what adds value.
  • The more data we have, the less anonymity. It becomes increasingly difficult to avoid identification.
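
The last proposition can be made concrete with a small sketch. The Python below is an illustration only: the records are invented, and the field names (`postcode`, `birth_year`, `gender`) and the helper `unique_fraction` are hypothetical. It counts how many records become unique as more attributes are combined, which is a rough proxy for how easily someone could be singled out.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)

# Invented records, for illustration only.
records = [
    {"postcode": "2000", "birth_year": 1980, "gender": "F"},
    {"postcode": "2000", "birth_year": 1980, "gender": "M"},
    {"postcode": "2000", "birth_year": 1975, "gender": "F"},
    {"postcode": "2010", "birth_year": 1980, "gender": "F"},
    {"postcode": "2010", "birth_year": 1980, "gender": "F"},
    {"postcode": "2000", "birth_year": 1980, "gender": "F"},
]

# Each extra attribute makes more people stand out.
for fields in (["postcode"],
               ["postcode", "birth_year"],
               ["postcode", "birth_year", "gender"]):
    print(fields, round(unique_fraction(records, fields), 2))
```

In this toy dataset nobody is unique by postcode alone, but a third of the records are unique once birth year and gender are added. Real datasets have far more attributes, so the effect is usually much stronger.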

Why this is important

There have been numerous examples of data breaches. The Australian Red Cross and the nation of Sweden are just two recent examples.

To understand the scale and scope of data breaches globally, it’s worth spending a few minutes in the visualisation tool World’s Biggest Data Breaches.

Every data breach is the result of some defect in the design, development, or deployment of the technology. These breaches could be prevented by incorporating ethical frameworks into the design, build, and deployment phases.

It is interesting to recall the ease with which Microsoft’s artificial intelligence (AI) based Twitter chatbot Tay was trained to become very nasty very quickly. Twitter users taught Tay to be a racist asshole in less than a day.

Anyone deploying an artificial intelligence system needs to understand the training data they’re using, and ponder the potential consequences of design and deployment decisions.
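
One modest first step towards understanding training data is simply to measure who or what is represented in it. The sketch below is an assumption-laden illustration, not an established method: the group labels are placeholders, the helpers `group_shares` and `underrepresented` are hypothetical, and the 20% threshold is arbitrary. A check like this cannot prove a dataset is fair, but it can flag obvious gaps before a system is deployed.

```python
from collections import Counter

def group_shares(labels):
    """Return each group's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def underrepresented(labels, threshold=0.2):
    """List groups whose share falls below `threshold` (an arbitrary cut-off)."""
    return sorted(g for g, share in group_shares(labels).items()
                  if share < threshold)

# Placeholder labels: one group label per training example.
training_groups = ["group_a"] * 9 + ["group_b"] * 1

print(group_shares(training_groups))
print(underrepresented(training_groups))
```

Here `group_b` makes up only a tenth of the examples and is flagged. What counts as a meaningful group, and what share is acceptable, are judgment calls that the team has to make and document.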

Then there was the recent example of bathroom soap dispensers designed to recognise white hands but not darker-skinned ones.

This is obvious bias from the design and development team, and an example of why diversity in teams is critical. The average software developer is a white male, which means that it’s likely that every design has default settings reflecting white male biases.

The issues of bias — both unconscious and conscious — are enormous.

The ocean of data is increasing at a vast rate, as shown by this chart from the IDC Data Age 2025 study. This means that we need to develop ethical frameworks to support the acquisition, management and analysis of large datasets.


Some existing approaches

Universities have a long history of managing ethics, but even they are struggling with the implications of the complex data sets and algorithms they’re dealing with.

Over the years, the ICT industry has developed a number of codes of ethics, and codes for professional practice, yet many developers and data scientists are unaware of them.

Some examples of these codes of practice include:

But realistically, if developers haven’t even heard of these codes, how can they possibly influence the design of solutions that avoid bias and other ethical issues?

Some newer approaches

“Privacy is an inherent human right, and a requirement for maintaining the human condition with dignity and respect.”
Bruce Schneier

New approaches are starting to emerge, such as Accenture’s 12 guidelines for developing data ethics codes. Recent initiatives such as the OWASP Security by Design Principles and Privacy by Design might well provide a good starting point for thinking about how we can embed good practice into the design and building of data sets and algorithms.

There is some good discussion of these issues in Luciano Floridi and Mariarosaria Taddeo’s What is Data Ethics? (2016). As they note, we need to examine ethics in terms of these three categories:

  • Data, including how we generate, record, and share data, as well as issues of consent and unintended uses of the data;
  • Algorithms, or how we interpret data via artificial intelligence, machine learning and robots; and
  • Practices, devising responsible innovation and professional codes to guide this emerging science.

There have been developments in community-based approaches to improving digital and data ethics, chiefly in the area of machine learning and AI.

Some of the groups working in this area:

Some new ways to think about digital and data ethics

“Complexity is a defining feature of the digital era, and we are not adjusting our governance structures to manage it.”
Kent Aitken, Prime Minister’s Fellow, Public Policy Forum Canada, 2017

We need to be clear that technology has no ethics. It is people who demonstrate ethics. And technology inherits the biases of its makers. We need to develop ethical frameworks and governance practices that enable us to develop solutions that are better from an ethical perspective.

I believe that if we start from the principles of Privacy by Design and Security by Design, then we have a reasonably firm practical basis for the future.

One thing is certain: at an institutional level, information security, privacy, and data governance will need more work to form a solid foundation to enable better data ethics.