Artificial Intelligence (AI) has a great capacity for good. I believe human-driven AI will probably be one of the greatest tools humanity has ever developed. But fulfilling that potential requires us to do the hard work - now. This begins with ensuring the data our systems ingest are comprehensive and free of bias. The good news is that technology can and should help.
Data Bias - A Real-World Example
The typical enterprise won’t gain much benefit from AI trained on data scraped randomly off the internet. Business value comes with AI trained on an organization’s own data, which is also where bias can creep in. Flawed data sets produce flawed AI decisions, and these can have drastic consequences:
A woman in the United States took sleeping tablets, following her doctor's advice based on the manufacturer's own guidelines. The next morning, she rose and drove to work, but got pulled over, and later arrested. The issue? The prior night's medication, still in her system, meant she was driving under the influence. She fought the charges in court, where it emerged that the dosing guidelines her physician had relied on, based on the manufacturer's advice, were developed using data solely from male test subjects. Because men metabolize certain medicines faster, the drugs leave their systems far sooner than they do women's. In this case, biased medical data led to bad medicine and a frightening legal entanglement.
How to Avoid Biased Datasets
To avoid biased data, or at the very least mitigate its prevalence, companies should follow two important steps. First, ingest the widest possible array of data. This includes vast amounts of their own proprietary raw data, structured and unstructured, drawn from every possible company source: documents, Excel files, research, financials, regulatory data, historical data and benchmarks. Second, apply controls, enabled by meta-tagging the data with contextual information.
To accelerate this process, companies need a tool that enables the data to be ingested with the necessary context applied. This has historically been the role of subject matter experts. However, processing data at scale requires a rules-based engine to classify data with the proper taxonomies and ontologies, thus providing the context behind the data, which can so often expose the bias.
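To make the idea of a rules-based classification engine concrete, here is a minimal sketch in Python. The keyword rules and taxonomy names are illustrative assumptions, not any particular product's taxonomy; a production engine would use far richer rules (patterns, ontology lookups) than plain substring matching.

```python
# A toy rules-based tagging engine: attach taxonomy context to raw records.
# The RULES mapping below is hypothetical and purely for illustration.
from dataclasses import dataclass, field

RULES = {
    "revenue": "finance/financials",
    "dosage": "medical/guidelines",
    "male": "demographics/sex",
}

@dataclass
class Record:
    text: str
    tags: set = field(default_factory=set)

def tag_record(record: Record, rules: dict = RULES) -> Record:
    """Add a taxonomy tag for every rule keyword found in the text."""
    lowered = record.text.lower()
    for keyword, tag in rules.items():
        if keyword in lowered:  # naive substring match; real rules are richer
            record.tags.add(tag)
    return record

doc = tag_record(Record("Recommended dosage based on male test subjects."))
print(sorted(doc.tags))  # ['demographics/sex', 'medical/guidelines']
```

Once every record carries tags like these, the context behind the data is queryable, and a skew such as "guidelines derived only from male subjects" becomes visible rather than buried.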
This process enables businesses to consider not only the validity of the algorithm but also the source data used to train it. Oversight is where humans can help keep AI decisioning on track. For example, we wouldn't teach an algorithm that 2+2=5. But that's exactly what we're doing if we don't ensure the data we use for AI is clean, sensible and carries the proper metadata context.
Infusing AI with internal data already shows great promise. BloombergGPT™'s training corpus is reported to be 52% proprietary or cleaned financial data. The accompanying study found that "the BloombergGPT model outperforms existing open models of a similar size on financial tasks by large margins, while still performing on par or better on general natural language processing benchmarks." This is just one example, but it shows how powerful integrating internally sourced data sets can be.
AI Still Needs Humans
Regardless of where the data comes from, AI lacks the moral compass and ethical context that human decisions organically include.
To compensate for this gap, we must ask the right questions and include those rationales in our data sets. AI algorithms also need to be trained across cultures, ages and genders, as well as a host of other parameters, to account for bias. The cleaner the data points used, the sounder the decision.
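One practical way to act on this is a simple representation audit before training. The sketch below assumes each record carries demographic metadata; the field name and the 10% threshold are illustrative choices, not a standard.

```python
# A minimal representation audit: flag demographic groups that are
# underrepresented in the training set. Field names are hypothetical.
from collections import Counter

def audit_balance(records, field, threshold=0.10):
    """Return the share of each group whose share falls below `threshold`."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()
            if n / total < threshold}

# 95 male records vs. 5 female records: the skew is flagged immediately.
records = [{"sex": "male"}] * 95 + [{"sex": "female"}] * 5
print(audit_balance(records, "sex"))  # {'female': 0.05}
```

Run across cultures, ages, genders and whatever other parameters matter for the use case, a check like this turns "is our data biased?" from a hunch into a measurable question.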
The "wisdom of the crowd" theory holds, in brief, that the more data points you combine about a particular question, the closer your resulting answer gets to being right. This holds even when crowd-sourced decisions are compared with those of experts. Stripped to its core, AI takes a reasonable guess based on the data it has. Accuracy, therefore, comes from aggregating the data points and weighing the wrong against the right to discern the most probable answer. But AI can't govern itself. It takes diverse and critical thinking, weighing many factors, to ensure the decisions we get via AI's advanced decision-making serve the good of the whole rather than being biased toward the few.
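The aggregation effect above is easy to demonstrate numerically. This toy simulation, with made-up numbers, shows that averaging many independent noisy guesses lands far closer to the truth than a typical individual guess does:

```python
# Wisdom-of-the-crowd toy simulation: 1,000 independent noisy guesses
# about a true value. The numbers are illustrative, not real data.
import random

random.seed(42)
truth = 100.0
guesses = [random.gauss(truth, 20) for _ in range(1000)]  # sd of 20 per guess

crowd_estimate = sum(guesses) / len(guesses)
avg_individual_error = sum(abs(g - truth) for g in guesses) / len(guesses)

# The averaged estimate beats the average individual by a wide margin.
print(abs(crowd_estimate - truth) < avg_individual_error)  # True
```

The caveat is the one the paragraph above makes: aggregation only cancels out *independent* noise. If every guess shares the same systematic bias, as with the male-only drug trials, averaging more of them just reinforces the error.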
A Transparent Way Forward
As the world of data grows, businesses need scalable solutions to process and manage it all. There is a limit to how much information a human brain can process, and repeatedly retaining subject matter experts is impractical. Achieving unbiased data requires an agile, transparent, rules-based data platform where data can be ingested, harmonised and curated for the AI tool. If businesses and their AI teams are to move forward responsibly, they need a replicable, scalable way to ensure AI algorithms are trained with clean, quality data. Preferably their own proprietary data.
In my next blog, I am going to look at another feature that any data platform should have to help remove data bias and add further transparency to the data: bi-temporality. That piece will look at how it can be leveraged to provide data provenance and lineage throughout the life cycle of the data.
Philip Miller is a Customer Success Manager for Progress | MarkLogic.