
Conferencing Solutions: Work in the Time of Corona

This blog is a continuation of our research on remote work in the Time of Corona. Over the next several weeks, Amalgam Insights will provide guidance on platforms, necessary functionality, and top vendors across a variety of remote work technologies needed to maintain employee productivity.

This four-part blog series on Conferencing Solutions includes the following topics:
Part I: Introduction to Conferencing Solutions
Part II: Defining and Purchasing Conferencing Solutions
Part III: Evaluating Conferencing Solutions
Part IV: Recommendations for Effective Conferencing

Key Stakeholders: Chief Executive Officer, Chief Information Officer, Network Services Directors and Managers, Telecom Directors and Managers, IT Directors and Managers

Why It Matters: In the Time of Corona, Amalgam Insights estimates that the United States workforce working from home has increased from 5% at the end of 2019 to over 30% as of the end of March 2020. In light of this fundamental shift, companies must choose, deploy, and administer conferencing solutions effectively to support remote workers and maintain collaboration-based productivity.

Top Takeaway: Conferencing is a core capability to support teamwork, collaboration, and face-to-face interaction in remote work settings. By understanding the features, pricing, and best practices associated with conferencing at scale, companies can evaluate and implement solutions that support business continuity and remote teams in the Time of Corona.

The Value of Conferencing Solutions Has Increased in the Time of Corona

The COVID-19 outbreak is spurring businesses across the globe to implement, support, and even promote remote work. Companies that had previously held out against telecommuting now have little choice but to take the leap as jurisdictions force shutdowns of non-essential businesses and restrict travel and gatherings of people.

There is no guarantee that COVID-19 will subside in the next few weeks, and global trends indicate that anti-COVID-19 social dispersion tactics require two or more months to be fully effective. As a result, every enterprise that does not require a large on-site contingent must maintain employee productivity as people work from home to keep the global economy afloat. In all honesty, companies also need contingency plans for remote work given the ongoing threat of the next virus, natural disaster, or terrorist attack. This novel coronavirus is both a test of current policies and an opportunity to improve remote work responses before the next regional or global emergency occurs. Enterprises are quickly learning the consequences of being late adopters in developing a remote work policy.

Communications technology has evolved quickly over the past decade as features such as text chat, video conferencing, recording, and well-governed centralized administration have become standard capabilities. Organizations undergoing due diligence for products that will optimize work environments will want to start by evaluating and administering these solutions. This first report in a series dedicated to remote work tools explores conferencing, a vital capability that brings colleagues together virtually.

Key Vendors for Enterprise Conferencing

Key Vendors that Amalgam Insights recommends in the Conferencing market include BlueJeans, Cisco WebEx, Google Hangouts, GoToMeeting, Microsoft Teams, RingCentral, Zoho Meeting, and Zoom. Each of these vendors meets Amalgam Insights’ minimum expectations for business-grade administration, governance, breadth of capabilities, and business support for organizations that deploy at scale.

To support work in the time of Corona, each vendor recommended by Amalgam Insights for enterprise conferencing usage has provided resources or enterprise-grade functions to their users as a free or reduced-price offering. These offerings include:

BlueJeans: Free Access for First Responders and NGOs

Cisco WebEx: Upgraded Free Accounts to support 100 users with toll calls and no time restrictions and 90-day business trials

Google Hangouts: Premium GSuite features until July 1, including conferencing to 250 users and recording meetings to Google Drive

GoToMeeting: Emergency Remote Work Kits with 3 months access to GoToMeeting, GoToWebinar, remote access, & remote support products

Microsoft Teams: 6-month trials for the premium version of Microsoft Teams & a free version for all educational institutions

RingCentral: 90-day trial for educators, health-care providers, and non-profits

Zoho Meeting: Zoho is providing its entire Remotely Suite free until July 1, which includes Meeting, documents, webinar, project management, and remote support tools

Zoom: Lifted the 40-minute limit on free accounts in multiple countries and lifted time limits for schools using free accounts

We hope that these offerings from the top vendors that Amalgam Insights recommends in the conferencing space will be helpful as companies seek solutions to support their remote employees. In the next blog in this series, Part II: Defining and Purchasing Conferencing Solutions, Amalgam Insights will discuss functional capabilities that companies should consider as they start to commit to a conferencing solution and consolidate vendors.


Work in the Time of Corona: An Alert for the Amalgam Community

Novel coronavirus (COVID-19) has wreaked havoc on the tech workplace in the early part of 2020. And, like the great Gabriel Garcia Marquez novel “Love in the Time of Cholera,” we all face legitimate challenges in deciding how rational and detached or how personal and connected to be at a specific time in history.



RIP to Business Legends Leila Janah and Clayton Christensen

On January 23, 2020, the business world lost two of its biggest stars: Leila Janah and Clayton Christensen. Both were personal inspirations to me, both for their ability to execute on big ideas and for making a real difference in the world by living up to the cliche of “doing well by doing good.”

Leila Janah passed away on January 23rd, 2020 at the age of 37. She was an unstoppable force in fighting global poverty through her organizations Samasource, LXMI, and Samaschool. From Amalgam Insights’ perspective, the work Samasource did in preparing AI training data for many of the world’s biggest enterprises made it an important company to watch.

But even more important was how Samasource conducted this AI data training. Samasource has employed people across India, Kenya, Uganda, Haiti, Pakistan, Ghana, and South Africa through what is now called impact sourcing, in which workers are trained for a job and paid a living wage with the goal of rising above poverty. The company has hired and trained thousands of people since its founding in 2008 and has a global staff of 2,900.

In running Samasource, Janah both lifted up thousands of people and built an organization that was seen as a legitimate growth business. In November 2019, Samasource raised a $14.8 million Series A round to fuel growth at a time when trusted AI data is more important than ever. Janah was a pioneer who simultaneously advanced the practice of managing AI data, built a massively successful growth company, and created change in emerging markets.

RIP Leila Janah. Thank you for making a difference.

Clayton Christensen passed away on January 23rd, 2020 at the age of 67. He needs no introduction, as he has been a guiding light in the business world for decades. His 1997 book The Innovator’s Dilemma is the most important business book of recent times, and Christensen belongs in the pantheon of great business authors along with the likes of Benjamin Graham, Dale Carnegie, W. Edwards Deming, Peter Drucker, and Michael Porter.

Christensen coined the term “Disruptive Innovation,” which described how products and services that could be seen as inferior, tangential, and more accessible than their dominant status quo market equivalents could eventually usurp leading market positions over time. Although Christensen could have rested on his laurels and simply reused his Innovator’s Dilemma work for the next 20+ years, he was always seeking to improve and refine his work on “disruptive innovation” and to push back against the breathless hype of the phrase he was associated with.

Christensen translated his theories and work into tangible action that helped transform the likes of Apple, Intel, Netflix, and practically every company that has successfully evolved or fended off new competitors. His work made every company more aware of its need to serve customer needs, disrupt as needed to match customer preferences, and identify “jobs to be done.”

RIP Clayton Christensen. Thank you for being a brilliant thinker and an even better person.


5 MegaThemes for the 2020s That Will Transform IT

As we get ready for 2020, Amalgam Insights is here to prepare companies for the future. In the past few weeks, we’ve been posting insights on what to look for in 2020, including our four-part series on Ethical AI.
Over this decade, we have learned how to work with technology at massive scale and with unprecedented power as the following technology trends surfaced in the 2010s:
  • The birth and death of Big Data in supporting massive scale as the terabyte shifted from an intimidating amount of data to a standard unit of measurement
  • The evolution of cloud computing from niche tool to a rapidly growing market that is roughly $150 billion a year now and will likely be well over a trillion dollars a year by the end of the 2020s
  • The Internet of Things, which will enable a future of distributed and specialized computing based on billions of processors and build on this decade’s massive progress in creating mobile and wireless smart devices.
  • The democratization of artificial intelligence tools including machine learning, deep learning, and data science services and platforms that have opened up the world of AI to developers and data analysts
  • The use of CRISPR Cas9 to programmatically edit genes, which has changed the biological world just as AI has changed the world of technology
  • Brain biofeedback and Brain-Computer Interfaces, which provide direct neural interfaces to control and affect a physical environment.
  • Extended Reality, through the development of augmented and virtual reality which are starting to provide realistic sensory simulations available on demand
2010s Tech Drivers
These bullet points describe where we already are today as of the end of 2019. So, how will all of these technologies affect the way we work in the 2020s? From our perspective, these trends fit into 5 MegaThemes of Personalization, Ubiquity, Computational Augmentation, Biologically Influenced Computing, and Renewability.
We believe the following five themes have both significantly evolved during the 2010s and will create the opportunity for ongoing transformative change that will fundamentally affect enterprise technology. Each of these MegaThemes has three key trends that will affect the ways that businesses use technology in the 2020s. This piece provides an introduction to these trends that will be contextualized from an IT, data, and finance perspective in future work, including blogs, webinars, vendor landscapes, and other analyst insights.
2020s Tech MegaTrends
Over the rest of the year, we’ll explore each of these five MegaThemes in greater detail, as these primary themes will end up driving innovation, change, and transformation within our tactical coverage areas including AI, analytics, Business Planning, DevOps, Finance and Accounting, Technology Expense Management, and Extended Reality.

Looking at Microservices, Containers, and Kubernetes with 2020 Vision

Some years are easier to predict than others. Stability in a market makes tracing the trend line much easier. 2020 looks to be that kind of year for the migration to microservices: stable, with steady progression toward mainstream acceptance.

There is little doubt that IT organizations are moving toward microservices architectures. Microservices, which deconstruct applications into many small parts, remove much of the friction that is common in n-Tier applications when it comes to development velocity. The added resiliency and scalability of microservices in a distributed system are also highly desirable. These attributes promote better business agility, allowing IT to respond to business needs more quickly and with less disruption while helping to ensure that customers have the best experience possible.

Little in the upcoming year seems disruptive or radical; the big changes have already occurred. Instead, this is a year for building out and consolidating, for moving past the “what” and “why” and into the “how” and “do”.

Kubernetes will be top of mind to IT in the coming year. From its roots as a humble container orchestrator – one of many in the market – Kubernetes has evolved into a platform for deploying microservices into container clusters. There is more work to do with Kubernetes, especially to help autoscale clusters, but it is now a solid base on which to build modern applications.

No one should delude themselves into thinking that microservices, containers, and Kubernetes are mainstream yet. The vast majority of applications are still based on n-Tier designs deployed to VMs. That’s fine for a lot of applications, but businesses know it’s not enough going forward. We’ve already seen more traditional companies begin to adopt microservices for at least some portion of their applications. This trend will accelerate in the upcoming year. At some point, microservices and containers will become the default architecture for enterprise applications. That’s a few years from now, but we’re already on the path.

From a vendor perspective, all the biggest companies are now in the Kubernetes market with at least a plain vanilla Kubernetes offering. This includes HPE and Cisco in addition to the companies that have been selling Kubernetes all along, especially IBM/Red Hat, Canonical, Google, AWS, VMWare/Pivotal, and Microsoft. The trick for these companies will be to add enough unique value that their offerings don’t appear generic. Leveraging traditional strengths, such as storage for HPE, networking for Cisco, and Java for Red Hat and VMWare/Pivotal, is the key to standing out in the market.

The entry of the giants into the Kubernetes space will pose challenges for smaller vendors such as Mirantis and Rancher. With more than 30 Kubernetes vendors in the market, consolidation and loss are inevitable. There’s plenty of value in the smaller firms, but it will be too easy for them to get trampled underfoot.

Expect M&A activity in the Kubernetes space as bigger companies acqui-hire talent or round out their portfolios. Kubernetes is now a big-vendor market, and the market dynamics favor the large players.

If there is a big danger sign on the horizon, it’s those traditional n-Tier applications that are still in production. At some point, IT will get around to thinking beyond the shiny new greenfield applications and want to migrate the older ones. Since these apps are based on radically different architectures, that won’t be easy. There just aren’t the tools to do this migration well. In short, it’s going to be a lot of work. It’s a hard sell to say that the only choices are either expensive migration projects (on top of all that digital transformation money that’s already been spent) or continuing to support and update applications that no longer meet business needs. Replatforming, or deploying the old parts to the new container platform, will provide less ROI and less value overall. The industry will need another solution.

This may be an opportunity to use all that fancy AI technology that vendors have been investing in to create software to break down an old app into a container cluster. In any event, the migration issue will be a drag on the market in 2020 as IT waits for solutions to a nearly intractable problem.

2020 is the year of the microservice architecture.

Even if that seems too dramatic, it’s not unreasonable to expect that there will be significant growth and acceleration in the deployment of Kubernetes-based microservices applications. The market has already begun the process of maturation as it adapts to the needs of larger, mainstream, corporations with more stringent requirements. The smart move is to follow that trend line.


Why Technology Business Management and Technology Expense Management Are Misaligned

(Note: This presentation is also available in a presentation format on Slideshare)

One of the most frequent questions Amalgam Insights receives is how Technology Business Management and Technology Expense Management are related to each other. And what do these topics have to do with the new phrase “FinOps” that is starting to appear? Amalgam’s perspective is decidedly different.

Why FinOps is a Misnomer


Developing a Practical Model for Ethical AI in the Business World: Stage 3 – Operational Deployment

In this blog post series, Amalgam Insights is providing a practical model for businesses to plan the ethical governance of their AI projects.

To read the introduction, click here.

To read about Stage 1: Executive Design, click here.

To read about Stage 2: Technical Development, click here.

This blog focuses on Operational Deployment, the third of the Three Keys to Ethical AI described in the introduction.

Figure 1: The Three Keys to Ethical AI

Stage 3: Operational Deployment

Once an AI model is developed, organizations have to translate this model into actual value, whether by providing the direct outputs to relevant users or by embedding those outputs into relevant applications and process automation. But this stage of AI also requires its own set of considerations for companies to truly maintain an ethical perspective.

  • Who has access to the outputs?
  • How can users trace the lineage of the data and analysis?
  • How will the outputs be used to support decisions and actions?

Figure 2: Deployment Strategy

Who has access to the outputs?

Just as with data and analytics, the value of AI scales as it reaches additional relevant users. The power of Amazon, Apple, Facebook, Google, and Microsoft in today’s global economy shows the power of opening up AI to billions of users. But as organizations open up AI to additional users, they have to provide appropriate context. Otherwise, these new users are effectively consuming AI blindly rather than as informed consumers. At this point, AI ethics expands beyond a technical problem into an operational business problem that touches every end user affected by AI.

Understanding the context and impact of AI at scale is especially important for AI initiatives focused on continuous improvement and increasing user value. Amalgam Insights recommends directly engaging user feedback on experience and preference rather than simply depending on A/B testing. It takes a combination of quantitative and qualitative experience to optimize AI at a time when we are still far from truly understanding how the brain works and how people interact with relevant data and algorithms. Human feedback is a vital input both for AI training and for understanding the perception and impact of AI.

How can users trace the lineage of the data and analysis?

Users accessing AI in an ethical manner should have basic access to the data and assumptions used to support the AI. This means both providing quantitative logic and qualitative assumptions that can communicate the sources, assumptions, and intended results of the AI to relevant users. This context is important in supporting an ethical AI project as AI is fundamentally based not just on a basic transformation of data, but on a set of logical assumptions that may not be inherently obvious to the user.

From a practical perspective, most users will not fully understand the mathematical logic associated with AI, but they will understand the data and the basic conceptual assumptions being made to provide AI-based outputs. Although Amalgam Insights believes that the rise of AI will lead to a broader grasp of statistics, modeling, and transformations over time, it is more important that both executive and technical stakeholders are able to explain how AI technologies in production are productive, relevant, and ethical on both business and technical grounds.
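As one hedged illustration of what lineage transparency could look like in practice, the sketch below (a hypothetical structure, not a description of any specific product) bundles each AI output with the data sources, qualitative assumptions, and model version that produced it, so downstream users can trace where a recommendation came from.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class LineageRecord:
    """Hypothetical lineage metadata attached to each AI output."""
    data_sources: List[str]   # where the training/scoring data came from
    assumptions: List[str]    # qualitative assumptions made by the team
    model_version: str        # which model produced the output
    generated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

@dataclass
class ScoredOutput:
    """An AI-based output bundled with the context users need to trace it."""
    prediction: float
    lineage: LineageRecord

# Example: an illustrative credit-limit style score shipped with its lineage
output = ScoredOutput(
    prediction=0.82,
    lineage=LineageRecord(
        data_sources=["crm_accounts_2019", "transactions_2018_2019"],
        assumptions=["household income is shared for joint filers"],
        model_version="credit-limit-model v1.3",
    ),
)
print(output.lineage.data_sources, output.lineage.assumptions)
```

The specific fields matter less than the habit: every output a user sees carries enough context to answer "where did this come from and what was assumed?"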

How will the outputs be used to support decisions and actions?

Although this topic should already have been explored at the executive level, operational users will have deeper knowledge of how the technology will be used on a day-to-day basis and should revisit this topic based on their understanding of processes, internal operations, and customer-facing outcomes.

There are a variety of ways that AI can be used to support the decisions we make. In some cases, such as with search engines and basic prioritization exercises, AI is typically used as the primary source of output. For a more complex scenario, such as sales and marketing use cases or complex business or organizational decisions, AI may be a secondary source to provide an additional perspective or an exploratory and experimental perspective simply to provide context for how an AI perspective would differ from a human-oriented perspective.

But it is important for ethical AI outputs to be matched up with appropriate decisions and outcomes. An example currently creating headlines is the launch of the Apple credit card and the decisions made about disparate credit limits for a married man and woman based on “the algorithm.” In this example, the man was initially given a much larger credit limit than the woman despite the fact that the couple filed taxes jointly and effectively shared joint income.

In this case, the challenge of giving “the algorithm” an automated and primary (and, likely, exclusive) role in determining a credit limit has created issues that are now in the public eye. Although this is a current and prominent example, it is less of a statement about Apple in particular and more of a statement regarding the increasing dependence that financial services has on non-transparent algorithms to accelerate decisions and provide an initial experience to new customers.

A more ethical and human approach would have been to figure out if there were inherent biases in the algorithm. If the algorithm had not been sufficiently tested, it should have been a secondary source for a credit limit decision that would ultimately be made by a human.

So, based on these explorations, we create a starting point for practical business AI ethics.

Figure 3: A Practical Framework

Recommendations

Maintain a set of basic ethical precepts for each AI project across design, development, and deployment. As mentioned in Part 1, these ethical statements should be focused on a few key goals that should be consistently explored from executive to technical to operational deployment. These should be short enough to fit onto every major project update memo and key documentation associated with the project. By providing a consistent starting point of what is considered ethical and must be governed, AI can be managed more consistently.

Conduct due diligence across bias, funding, champions, development, and users to improve ethical AI usage. The due diligence on AI currently focuses too heavily on the construction of models, rather than the full business context of AI. Companies continue to hurt their brands and reputation by putting out models and AI logic that would not pass a basic business or operational review.

Align AI to responsibilities that reflect the maturity, transparency, and fit of models. For instance, experimental models should not be used to run core business processes. For AI to take over significant operational responsibilities from an automation, analytical, or prescriptive perspective, the algorithms and production of AI need to be enterprise-ready just as traditional IT is. Just because AI is new does not mean that it should bypass key business and technical deployment rules.

Review and update AI on a regular basis. Once an AI project has been successfully released into the wild and is providing results, it must be managed and reviewed on a regular basis. Over time, the models will need to be tweaked to reflect real-life changes in business processes, customer preferences, macroeconomic conditions, or strategic goals. AI that is abandoned or ignored will become technical debt just as any outdated technology does. If there is no dedicated review and update process for AI, the models and algorithms used will eventually become outdated and potentially less ethical and accurate from a business perspective.
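As a minimal sketch of what a recurring review could include, the example below uses a two-sample Kolmogorov-Smirnov test to flag when a live feature distribution has drifted away from the data the model was trained on. The feature, threshold, and simulated shift are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy.stats import ks_2samp

def flag_drift(train_values, live_values, p_threshold=0.01):
    """Return True if the live distribution differs significantly
    from the training distribution (a signal to review the model)."""
    stat, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold, stat, p_value

# Illustrative data: a training-era feature vs. a shifted live feature
rng = np.random.default_rng(7)
train_income = rng.normal(60_000, 15_000, 5_000)
live_income = rng.normal(68_000, 15_000, 5_000)   # simulated macro shift

drifted, stat, p = flag_drift(train_income, live_income)
print(f"drift detected: {drifted} (KS statistic={stat:.3f}, p={p:.4f})")
```

Any comparable test, or a simple population stability index, would serve the same purpose; the point is that the review is scheduled and automated rather than ad hoc.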

We hope this guide and framework are helpful in supporting more ethical and practical AI projects. If you are seeking additional information on ethical AI, the ROI of AI, or guidance across data management, analytics, machine learning, and application development, please feel free to contact us at research@amalgaminsights.com and send us your questions. We would love to work with you.


AI on AI – 8 Predictions for the Data Savvy Pro

When we started Amalgam Insights, we oh-so-cleverly chose the AI initials with the understanding that artificial intelligence (the other AI…), data science, machine learning, programmatic automation, augmented analytics, and neural inputs would lead to the greatest advances in technology. At the same time, we sought to provide practical guidance for companies seeking to bridge the gaps between their current data and analytics environments and the future of AI. With that in mind, here are 8 predictions we’re providing for 2020 for Analytics Centers of Excellence and Chief Data Officers to keep in mind to stay ahead while remaining practical.

1. In 2020, AI becomes a $50 billion market, creating a digital divide between the haves that are prepared to algorithmically assess their work in real time and the have-nots that are not. Retail, Financial Services, and Manufacturing will make up over half of this market.

2. The data warehouse becomes less important as a single source of truth. Today’s approach replaces data aggregation and duplication with data linkages and late binding of data sources to assemble the single source of truth in real time. This doesn’t mean that data warehouses aren’t still useful; it just means that the single source of truth can change on a real-time basis and corporate data structures need to support that reality. It also becomes increasingly important to conduct analytics on data wherever the data may be, rather than depend on replicating and transferring data back to a single warehouse.

3. Asking “What were our 2020 revenues?” will be an available option in every major BI solution by the end of 2020, with the biggest challenge then being how companies will need to upgrade and configure their solutions to support these searches. We have maxed out our ability to spread analytics through IT. To get beyond 25% analytics adoption in 2020, businesses will need to take advantage of natural language queries and searches, which are becoming a general capability for analytics, either natively or through partners.

4. 2020 will see an increased focus on integrating analytics with automation, process mapping, and direct collaboration. Robotic Process Automation is a sexy technology, but what makes the robots intelligent? Prioritized data, good business rules, and algorithmic feedback for constant improvement. When we talk about “augmented analytics” at Amalgam Insights, we think this means augmenting business processes with analytic and algorithmic logic, not just augmenting data management and analytic tasks.

5. By 2025, analytic model testing and Python will become standard data analyst and business analyst capabilities for handling models rather than specific data. Get started now on learning Python, statistics, an Auto Machine Learning (AutoML) method, and model testing. IT needs to level up from admins to architects. All aspects of IT are becoming more abstracted through cloud computing, process automation, and machine learning. Data and analytics are no exception. Specifically, data analysts will start conducting the majority of “data science” tasks in the enterprise, either as standalone or machine-guided tasks. If a business is dependent on a “unicorn” or a singular talent to conduct a business process, that process is not scalable and repeatable. As data science and machine learning projects start becoming part of the general IT portfolio, businesses will push down more data management, cleansing, and even modeling and testing tasks to the most dependable talent of the data ecosystem, the data analyst.
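For analysts getting started on model testing, a minimal Python sketch using scikit-learn is shown below: hold out a test set, cross-validate on the training data, and only then check held-out performance. The dataset and model are stand-ins chosen purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Illustrative dataset; substitute your own features and labels
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that the model never sees during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=5000)

# Cross-validation on the training data estimates generalization
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"cross-validated accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Final check against the held-out test set
model.fit(X_train, y_train)
print(f"held-out test accuracy: {model.score(X_test, y_test):.3f}")
```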

6. Amalgam Insights predicts that the biggest difference between high-ROI and low-ROI analytics in 2020 will come from data polishing, not data hoarding. The days of data hoarding for value creation are over. True data champions will focus on cleansing, defining, prioritizing, and separating the 1% of data that truly matters from the 99% more suited to mandatory and compliance-based storage.
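A minimal sketch of what data polishing can look like in pandas follows: profile completeness, drop exact duplicates, normalize types, and separate analysis-ready records from those that only warrant compliance retention. The column names and thresholds are hypothetical.

```python
import pandas as pd

def polish(df: pd.DataFrame, required_cols, completeness_threshold=0.95):
    """Split a raw extract into analysis-ready rows and archive-only rows."""
    df = df.drop_duplicates()                       # remove exact duplicates
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

    # Per-row completeness across the columns that actually matter
    completeness = df[required_cols].notna().mean(axis=1)
    analysis_ready = df[completeness >= completeness_threshold]
    archive_only = df[completeness < completeness_threshold]
    return analysis_ready, archive_only

raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "signup_date": ["2019-05-01", "2019-05-01", "not a date", "2019-07-22"],
    "annual_spend": [1200.0, 1200.0, None, 830.0],
})
clean, archive = polish(raw, required_cols=["customer_id", "signup_date", "annual_spend"])
print(len(clean), "analysis-ready rows;", len(archive), "archive-only rows")
```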

7. On a related note, Amalgam Insights believes the practice of data deletion will be greatly formalized by Chief Data Protection Officers in 2020. With the emergence of CCPA along with the continuance of GDPR, data ownership is now potentially risky for organizations holding the wrong data.

8. The accounting world will make progress on defining data as a tangible asset. My expectations: changes to the timeframes of depreciation and guidance on how to value contextually specific data such as customer lists and business transactions. Currently, data cannot be formally capitalized as an asset. Now that companies are generally starting to realize that data may be their greatest asset outside of their talent, accountants will bring more concerns to FASB Statements 141 and 142.


Developing a Practical Model for Ethical AI in the Business World: Stage 2 – Technical Development

In this blog post series, Amalgam Insights is providing a practical model for businesses to plan the ethical governance of their AI projects.

To read the introduction, click here.

To read about Stage 1: Executive Design, click here.

This blog focuses on Technical Development, the second of the Three Keys to Ethical AI described in the introduction.

Figure 1: The Three Keys to Ethical AI

Stage 2: Technical Development

Technical Development is the area of AI that gets the most attention as machine learning and data science start to mature. Understandably, the current focus in this Early Adopter era (which is just starting to move into Early Majority status in 2020) is simply on how to conduct machine learning, data science efforts, and potentially deep learning projects in a rapid, accurate, and potentially repeatable manner. However, as companies conduct their initial proofs of concept and build out AI services and portfolios, the following four questions are important to take into account.

  • Where does the data come from?
  • Who is conducting the analysis?
  • What aspects of bias are being taken into account?
  • What algorithms and toolkits are being used to analyze and optimize?

Figure 2: Technical Development

Where does the data come from?

Garbage In, Garbage Out has been a truism for IT and data projects for many decades. However, the irony is that much of the data that is used for AI projects used to literally be considered “garbage” and archival exhaust up until the practical emergence of the “Big Data” era at the beginning of this decade. As companies use these massive new data sources as a starting point for AI, they must check on the quality, availability, timeliness, and context of the data. It is no longer good enough to just pour all data into a “data lake” and hope that this creates a quality training data sample.

The quality of the data is determined by the completeness, accuracy, and consistency of the data. If the data have a lot of gaps, errors, or significant formatting issues, the AI will need to account for these issues in a way that maintains trust. For instance, a long-standing historical database may be full of null values as the data source has been augmented over time and data collection practices have improved. If those null values are incorrectly accounted for, AI can end up defining or ignoring a “best practice” or recommendation.
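As a small illustration of how null handling changes what a model can learn, the sketch below audits missingness per column and contrasts two common treatments: silently dropping incomplete rows versus keeping them and flagging the gap explicitly. The columns and values are invented for the example.

```python
import pandas as pd

history = pd.DataFrame({
    "year": [1995, 2000, 2005, 2010, 2015, 2019],
    "channel": ["store", "store", None, "web", "web", "web"],
    "satisfaction": [None, None, 3.9, 4.1, 4.3, 4.4],  # not collected early on
})

# Audit: how much of each column is missing?
print(history.isna().mean())

# Treatment 1: dropping incomplete rows silently discards the early years
dropped = history.dropna()

# Treatment 2: keep the rows and make the gap explicit for the model
flagged = history.assign(
    satisfaction_missing=history["satisfaction"].isna(),
    satisfaction=history["satisfaction"].fillna(history["satisfaction"].median()),
)
print(len(dropped), "rows after dropping vs.", len(flagged), "rows with explicit flags")
```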

From a practical perspective, consider as an example how Western culture has only recently started to formalize non-binary gender and transgender identity. Just because data may not show these identities prior to this decade does not mean that these identities didn’t exist. Amalgam Insights would consider a gap like this a systemic data gap that needs to be taken into account to avoid unexpected bias, perhaps through the use of adversarial de-biasing that actively accounts for the bias.

The availability and timeliness of the data refer to the accessibility, uptime, and update frequency of the data source. Data sources that are transient or migratory pose a risk to making consistent assumptions from an AI perspective. If an AI project depends on a data source that is hand-curated, bespoke in nature, or inconsistently hosted and updated, this variability needs to be taken into account in determining the relative accuracy of the AI project and its ability to consistently meet ethical and compliance standards.

Data context refers to the relevance of the data both for solving the problem and for providing guidance to downstream users. Correlation is not causation, as the hilarious website “Spurious Correlations” run by Tyler Vigen shows us. One of my favorite examples shows how famed actor Nicolas Cage’s movies are “obviously” tied to the number of people who drown in swimming pools.

Figure 3: Drownings as a Function of Nicolas Cage Movies


(Thanks to Spurious Correlations! Buy the book!)

But beyond the humor is a serious issue: what happens if AI assumptions are built on faulty and irrelevant data? And who is checking the hyperparameter settings and the contributors to parameter definitions? Data assumptions need to go through some level of line-of-business review. This isn’t to say that every business manager is going to suddenly have a Ph.D. level of data science understanding, but business managers will be able to either confirm that data is relevant or provide relevant feedback on why a data source may or may not be relevant.
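The correlation-versus-causation point is easy to demonstrate numerically. The sketch below, with entirely synthetic numbers, correlates two unrelated upward-trending series and still produces a high Pearson coefficient, which is exactly the kind of spurious signal a line-of-business review should catch before it becomes a model input.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Two series that both drift upward for unrelated reasons (synthetic data)
cage_movies = np.array([2, 2, 3, 3, 4, 4, 4, 5, 5, 6]) + rng.normal(0, 0.3, 10)
pool_drownings = np.array([98, 100, 102, 103, 105, 107, 108, 110, 111, 113]) + rng.normal(0, 1.0, 10)

r, p_value = pearsonr(cage_movies, pool_drownings)
print(f"Pearson r = {r:.2f}, p = {p_value:.4f}")  # high r, yet no causal link
```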

Who is conducting the analysis?

On this related question, the deification of the unicorn data scientist has been well documented over the last few years. But just as business intelligence and analytics evolved from the realm of the database master and report builder to a combination of IT management and self-service conducted by data-savvy analysts, data science and AI must also be conducted by a team of roles that includes the data analyst, data scientist, business analyst, and business manager. In small companies, an individual may end up holding multiple roles on this team.

But if AI is being developed by a single “unicorn” focused on the technical and mathematical aspects of AI development, companies need to make sure that the data scientist or AI developer is taking sufficient business context into account and fully considering the fundamental biases and assumptions that were made during the Executive Design phase.

What aspects of bias are being taken into account?

Any data scientist with basic statistical training will be familiar with Type I (false positive) and Type II (false negative) errors as a starting point for identifying bias. However, this statistical bias should not be considered the end-all and be-all of defining AI bias.
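For reference, the sketch below shows how Type I (false positive) and Type II (false negative) rates fall out of a confusion matrix, and how computing them per group is a simple first check for disparate error rates. The labels and groups are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
group  = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

def error_rates(y_t, y_p):
    """Compute false positive (Type I) and false negative (Type II) rates."""
    tn, fp, fn, tp = confusion_matrix(y_t, y_p, labels=[0, 1]).ravel()
    type_i = fp / (fp + tn) if (fp + tn) else 0.0
    type_ii = fn / (fn + tp) if (fn + tp) else 0.0
    return type_i, type_ii

print("overall:", error_rates(y_true, y_pred))
for g in ("a", "b"):
    mask = group == g
    print(f"group {g}:", error_rates(y_true[mask], y_pred[mask]))
```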

As parameters and outputs become defined, data scientists must also consider organizational bias, cultural bias, and contextual bias. Simply stating that “the data will speak for itself” does not mean that the AI lacks bias; this only means that the AI project is actively ignoring any bias that may be in place. As I said before, the most honest approach to AI is to acknowledge and document bias rather than to simply try to “eliminate” bias. Bias documentation is a sign of understanding both the problem and the methods, not a weakness.

An extreme example is Microsoft’s “Tay” chatbot released in 2016. The bot was launched “without bias” to support conversational understanding. The practical result of this lack of bias was that the bot lacked the context to filter racist messages and to differentiate between strongly emotional terms and culturally appropriate conversation. In this case, the lack of bias led to the AI’s inability to be practically useful. In a vacuum, the most prevalent signals and inputs will take precedence over the most relevant or appropriate ones.

Unless the goal of the AI is to reflect the data that is most commonly entered, an “unbiased” AI approach is generally going to reflect the “GIGO” aspect of programming that has been understood for decades. This challenge reflects the foundational need to understand the training and distribution of data associated with building of AI.

What algorithms and toolkits are being used to analyze and optimize?

The good news about AI is that it is easier to access than ever before. Python resources and a plethora of machine learning libraries, including PyTorch, scikit-learn, Keras, and, of course, TensorFlow, make machine learning relatively easy to access for developers and quantitatively trained analysts.

The bad news is that it becomes easy for someone to implement an algorithm without fully understanding the consequences. For instance, a current darling in the data science world is XGBoost (Extreme Gradient Boosting), which has been a winning approach in recent data science contests because it converges to an effective minimum more quickly than standard gradient boosting. But it also requires expertise in starting with appropriate features, stopping the model training before the algorithm overfits, and appropriately fine-tuning the model for production.
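A hedged sketch of that discipline is below, using XGBoost’s native training API with a validation set and early stopping so boosting halts once validation loss stops improving. The dataset, parameters, and stopping window are illustrative assumptions rather than recommended settings.

```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Illustrative data; substitute carefully engineered features in practice
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

params = {
    "objective": "binary:logistic",
    "eval_metric": "logloss",
    "max_depth": 4,   # keep trees shallow to limit overfitting
    "eta": 0.1,       # learning rate
}

# Stop adding trees once validation loss fails to improve for 20 rounds
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dval, "validation")],
    early_stopping_rounds=20,
    verbose_eval=False,
)
print("best iteration:", booster.best_iteration)
```

The key point is that the validation set, not the training loss, decides when boosting stops.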

So, it is not enough to simply use the right tools or the most “efficient” algorithms; teams must also effectively fit, stop, and tune models based on the tools being used so that the resulting models are appropriate for the real world and AI bias is kept from propagating and gaining outsized influence.

In our next blog, we will explore Operational Deployment with a focus on the line of business concerns that business analysts and managers consider as they actually use the AI application or service and the challenges that occur as the AI logic becomes obsolete or flawed over time.