Developing a Practical Model for Ethical AI in the Business World: Stage 2 – Technical Development

In this blog post series, Amalgam Insights is providing a practical model for businesses to plan the ethical governance of their AI projects.

To read the introduction, click here.

To read about Stage 1: Executive Design, click here.

This blog focuses on Technical Development, the second of the Three Keys to Ethical AI described in the introduction.

Figure 1: The Three Keys to Ethical AI

Stage 2: Technical Development

Technical Development is the area of AI that gets the most attention as machine learning and data science start to mature. Understandably, the current focus in this Early Adopter era (which is just starting to move into Early Majority status in 2020) is simply on how to conduct machine learning, data science, and potentially deep learning projects in a rapid, accurate, and repeatable manner. However, as companies conduct their initial proofs of concept and build out AI services and portfolios, the following four questions are important to take into account.

  • Where does the data come from?
  • Who is conducting the analysis?
  • What aspects of bias are being taken into account?
  • What algorithms and toolkits are being used to analyze and optimize?

Figure 2: Technical Development

Where does the data come from?

Garbage In, Garbage Out has been a truism for IT and data projects for many decades. However, the irony is that much of the data that is used for AI projects used to literally be considered “garbage” and archival exhaust up until the practical emergence of the “Big Data” era at the beginning of this decade. As companies use these massive new data sources as a starting point for AI, they must check on the quality, availability, timeliness, and context of the data. It is no longer good enough to just pour all data into a “data lake” and hope that this creates a quality training data sample.

The quality of the data is determined by the completeness, accuracy, and consistency of the data. If the data have a lot of gaps, errors, or significant formatting issues, the AI will need to account for these issues in a way that maintains trust. For instance, a long-standing historical database may be full of null values because the data source has been augmented over time and data collection practices have improved. If those null values are handled incorrectly, the AI can end up defining a “best practice” or recommendation that the data does not support, or ignoring one that it does.
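As a minimal sketch of the completeness check described above, the following standard-library Python snippet reports the share of missing values per field. The records, the field names, and the 20 percent review threshold are all hypothetical and chosen only for illustration; a real project would set its own threshold with business input.

```python
from typing import Any


def null_rates(records: list[dict[str, Any]]) -> dict[str, float]:
    """Return the fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {}
    fields = {key for record in records for key in record}
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in sorted(fields)
    }


# Hypothetical records: the older rows predate collection of "channel".
records = [
    {"year": 2005, "revenue": 120.0, "channel": None},
    {"year": 2010, "revenue": None, "channel": None},
    {"year": 2018, "revenue": 310.0, "channel": "web"},
    {"year": 2019, "revenue": 295.0, "channel": "store"},
]

rates = null_rates(records)
THRESHOLD = 0.20  # assumed review threshold, not an industry standard
flagged = [field for field, rate in rates.items() if rate > THRESHOLD]

print(rates)
print("Fields needing review:", flagged)
```

A report like this does not decide how nulls should be treated. It only makes the gaps visible so that the team can choose deliberately between imputing, excluding, or modeling them.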

From a practical perspective, consider as an example how Western culture has recently started to formalize non-binary gender or transgender identity. Just because data may not show these identities prior to this decade does not mean that these identities didn’t exist. Amalgam Insights would consider a gap like this to be a systemic data gap that needs to be taken into account to avoid unexpected bias, perhaps through the use of adversarial de-biasing that actively takes the bias into account.

The Availability and Timeliness of the data refer to the accessibility, uptime, and update frequency of the data source. Data sources that are transient or migratory make it risky for an AI project to rely on consistent assumptions. If an AI project depends on a data source that is hand-curated, bespoke in nature, or inconsistently hosted and updated, this variability needs to be taken into account in determining the relative accuracy of the AI project and its ability to consistently meet ethical and compliance standards.
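One way to make timeliness checkable is to record when each source was last refreshed and compare that against the cadence the project expects. The sketch below assumes hypothetical source names, timestamps, and maximum ages; none of these come from a real system.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True when the source has not been refreshed within max_age."""
    return now - last_updated > max_age


# Hypothetical sources: (time of last refresh, expected maximum age).
now = datetime(2020, 6, 1, tzinfo=timezone.utc)
sources = {
    "crm_export": (datetime(2020, 5, 31, tzinfo=timezone.utc), timedelta(days=1)),
    "hand_curated_sheet": (datetime(2020, 3, 15, tzinfo=timezone.utc), timedelta(days=30)),
}

stale = sorted(
    name
    for name, (updated, max_age) in sources.items()
    if is_stale(updated, max_age, now)
)
print("Stale sources:", stale)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check repeatable when it is audited later.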

Data context refers to the relevance of the data both for solving the problem and for providing guidance to downstream users. Correlation is not causation, as the hilarious website “Spurious Correlations” run by Tyler Vigen shows us. One of my favorite examples shows how famed actor Nicolas Cage’s movies are “obviously” tied to the number of people who drown in swimming pools.

Figure 3: Drownings as a Function of Nicolas Cage Movies


(Thanks to Spurious Correlations! Buy the book!)
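To see how easily two unrelated series can appear linked, here is a small sketch that computes a Pearson correlation coefficient by hand. The two series are invented for illustration and are NOT the figures from Spurious Correlations; the variable names are hypothetical as well.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented yearly counts with no causal link between them.
films_per_year = [2, 2, 3, 1, 4, 3]
unrelated_incidents = [98, 102, 109, 95, 120, 107]

r = pearson(films_per_year, unrelated_incidents)
print(f"r = {r:.2f}")
```

A coefficient close to 1 here says nothing about relevance. With only six points and no causal story, a strong correlation is exactly the kind of signal that deserves review before it reaches a model.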

But beyond the humor is a serious issue: what happens if AI assumptions are built on faulty and irrelevant data? And who is checking the hyperparameter settings and the contributors to parameter definitions? Data assumptions need to go through some level of line of business review. This isn’t to say that every business manager is going to suddenly have a Ph.D. level of data science understanding, but business managers will be able to either confirm that data is relevant or provide feedback on why a data source may or may not be relevant.

Who is conducting the analysis?

On this related question, the deification of the unicorn data scientist has been well-documented over the last few years. But just as business intelligence and analytics evolved from the realm of the database master and report builder to a combination of IT management and self-service conducted by data-savvy analysts, data science and AI must also be conducted by a team of roles that includes the data analyst, data scientist, business analyst, and business manager. In small companies, an individual may end up holding multiple roles on this team.

But if AI is being developed by a single “unicorn” focused on the technical and mathematical aspects of AI development, companies need to make sure that the data scientist or AI developer is taking sufficient business context into account and fully considering the fundamental biases and assumptions that were made during the Executive Design phase.

What aspects of bias are being taken into account?

Any data scientist with basic statistical training will be familiar with Type I (false positive) and Type II (false negative) errors as a starting point for identifying bias. However, this statistical bias should not be considered the be-all and end-all of defining AI bias.
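For reference, Type I and Type II error rates can be computed directly from predictions, and splitting them by group is one simple way to start looking beyond the aggregate numbers. The rows and the group labels "A" and "B" below are hypothetical, and the function is only a sketch for binary 0/1 labels.

```python
def error_rates(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) for binary 0/1 labels."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Hypothetical rows: (group, actual label, predicted label).
rows = [
    ("A", 1, 1), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
    ("B", 1, 0), ("B", 0, 1), ("B", 0, 0), ("B", 1, 1),
]

overall = error_rates([t for _, t, _ in rows], [p for _, _, p in rows])

by_group = {}
for group in sorted({g for g, _, _ in rows}):
    actual = [t for g, t, _ in rows if g == group]
    predicted = [p for g, _, p in rows if g == group]
    by_group[group] = error_rates(actual, predicted)

print("Overall (FPR, FNR):", overall)
print("By group:", by_group)
```

In this made-up data the overall rates look moderate, while every error falls on group B. That is the kind of pattern an aggregate Type I and Type II summary can hide.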

As parameters and outputs become defined, data scientists must also consider organizational bias, cultural bias, and contextual bias. Simply stating that “the data will speak for itself” does not mean that the AI lacks bias; this only means that the AI project is actively ignoring any bias that may be in place. As I said before, the most honest approach to AI is to acknowledge and document bias rather than to simply try to “eliminate” bias. Bias documentation is a sign of understanding both the problem and the methods, not a weakness.

An extreme example is Microsoft’s “Tay” chatbot released in 2016. This bot was released “without bias” to support conversational understanding. The practical aspect of this lack of bias was that the bot lacked the context to filter racist messages and to differentiate between strongly emotional terms and culturally appropriate conversation. In this case, the lack of bias led to the AI’s inability to be practically useful. In a vacuum, the most prevalent signals and inputs will take precedence over the most relevant or appropriate signals.

Unless the goal of the AI is to reflect the data that is most commonly entered, an “unbiased” AI approach is generally going to reflect the “GIGO” aspect of programming that has been understood for decades. This challenge reflects the foundational need to understand the training and distribution of the data used to build AI.
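A first step toward understanding the distribution of training data can be as simple as counting how often each kind of input appears. The labels and counts below are hypothetical and serve only to illustrate how a prevalent signal can dominate.

```python
from collections import Counter


def label_shares(labels: list[str]) -> dict[str, float]:
    """Return each label's share of the data, most common label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


# Hypothetical training inputs, heavily skewed toward one kind of message.
training_labels = ["inflammatory"] * 70 + ["neutral"] * 25 + ["constructive"] * 5

shares = label_shares(training_labels)
dominant = max(shares, key=shares.get)

print(shares)
print("Most prevalent signal:", dominant)
```

If a model simply mirrors its inputs, the dominant label in a summary like this is a reasonable guess at what the model will echo back.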

What algorithms and toolkits are being used to analyze and optimize?

The good news about AI is that it is easier to access than ever before. Python resources and a plethora of machine learning libraries including PyTorch, scikit-learn, Keras, and, of course, TensorFlow, make machine learning relatively easy to access for developers and quantitatively trained analysts.

The bad news is that it becomes easy for someone to implement an algorithm without fully understanding the consequences. For instance, a current darling in the data science world is XGBoost (Extreme Gradient Boosting), which has been a winning algorithmic approach for recent data science contests because it tends to reach a good minimum of its loss function more quickly than standard gradient boosting. But it also requires expertise in starting with appropriate features, stopping the model training before the algorithm overfits, and appropriately fine-tuning the model for production.
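The idea of stopping training before a model overfits can be illustrated without any particular library. The sketch below applies a simple patience rule to a list of validation losses. The loss values and the patience of 3 are hypothetical. XGBoost offers a comparable control through its `early_stopping_rounds` setting, but this function is only an illustration of the concept, not XGBoost's implementation.

```python
def best_iteration(val_losses: list[float], patience: int) -> int:
    """Return the index of the best round seen before early stopping triggers.

    Training is treated as stopped once `patience` consecutive rounds
    pass without a new lowest validation loss.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_idx = 0
    best_loss = val_losses[0]
    for i, loss in enumerate(val_losses[1:], start=1):
        if loss < best_loss:
            best_loss = loss
            best_idx = i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` rounds: stop here
    return best_idx


# Hypothetical validation losses, one per boosting round.
val_losses = [0.60, 0.48, 0.41, 0.39, 0.40, 0.42, 0.45, 0.38]
stop_at = best_iteration(val_losses, patience=3)
print("Keep the model from round:", stop_at)
```

Note that the final value of 0.38 is never reached because the rule stops at round 6. Patience is therefore a judgment call: too small and a useful later improvement is missed, too large and the model trains long enough to overfit.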

So, it is not enough to simply use the right tools or the most “efficient” algorithms. Teams must also effectively fit, stop, and tune models based on the tools being used, both to create models that are appropriate for the real world and to keep AI bias from propagating and gaining outsized influence.

In our next blog, we will explore Operational Deployment with a focus on the line of business concerns that business analysts and managers consider as they actually use the AI application or service and the challenges that occur as the AI logic becomes obsolete or flawed over time.