Consider the biggest, oldest, and most tangled challenge in IT: managing and integrating data across the enterprise. As the volume, variety, variability, and distribution of data across on-premises and cloud platforms grow exponentially, this challenge urgently needs help from AI/ML technologies. “You need technology that addresses the integration of data management into the enterprise,” said Stewart Bond, vice president of data integration and intelligence software at IDC.

Can AI/ML bring order to the chaos of data? It is possible under certain conditions, but the industry consensus is that we are only scratching the surface of what AI/ML may one day achieve. Established integration software vendors such as Informatica, IBM, and SnapLogic have added AI/ML capabilities to automate various tasks, while startups such as Tamr, Cinchy, and Monte Carlo put AI/ML at the heart of their products. No vendor offers an AI/ML solution that automates data management and integration processes end to end, and there is no sign of one on the horizon.
The reason, in short, is that it is impossible. No product or service can reconcile every data anomaly without human intervention, much less reform a chaotic enterprise data architecture. What the new AI/ML-based solutions can do today is significantly reduce manual labor across a variety of data management and integration tasks, from cataloging data to building data pipelines and improving data quality.
That reduction alone can be a significant achievement. However, real and lasting impact requires a strategic approach led at the Chief Data Officer (CDO) level, rather than the impulsive adoption of integration tools for one-off projects. To prioritize which AI/ML solutions to apply where, companies must first develop a consistent and holistic view of their data assets (customer data, product data, transaction data, event data, and so on) and fully understand the metadata that defines those data types.
How far does the business data problem go?
Most companies today have a vast number of data stores, each associated with its own applications and use cases. Cloud computing has made this sprawl worse, because business units can quickly build cloud applications with their own data silos. Some data stores are used for transactions or other operational activities, while others (mainly data warehouses) are used for analytics or business intelligence.
“Almost every company in the world uses more than 20 data management tools,” said Noel Yuhanna, vice president at Forrester Research. Compounding the problem, these tools do not communicate with each other. They cover everything from data cataloging to master data management (MDM), data governance, and data observability. Some vendors have built AI/ML capabilities into their products, while others have not yet done so.
At its core, data integration maps the schemas of different data sources so that different systems can share, synchronize, or enrich data. The last of these is essential to developing a 360-degree view of the customer, for example. However, even seemingly simple tasks, such as checking whether two customers or companies with the same name are in fact the same entity, or deciding which record holds the correct information, require human intervention. Domain experts are often brought in to set rules for handling the various exceptions.
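The mapping and matching steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the two source systems, the sample records, and the 0.85 similarity threshold are all hypothetical, and the name comparison uses the standard library's difflib rather than any vendor's matching method.

```python
from difflib import SequenceMatcher

# Hypothetical field mappings from two source systems to one shared schema.
CRM_TO_COMMON = {"cust_name": "name", "email_addr": "email", "town": "city"}
BILLING_TO_COMMON = {"customer": "name", "contact_email": "email", "city": "city"}


def to_common(record, mapping):
    """Rename a source record's fields to the shared schema, dropping unmapped ones."""
    return {common: record[source] for source, common in mapping.items() if source in record}


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())


def match_customers(a, b, threshold=0.85):
    """Classify two common-schema records as 'match', 'review', or 'no_match'."""
    if a.get("email") and b.get("email"):
        if normalize(a["email"]) == normalize(b["email"]):
            return "match"
    similarity = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    if similarity >= threshold:
        # Same or similar name but no confirming email: a person has to decide.
        return "review"
    return "no_match"


crm = to_common(
    {"cust_name": "Acme Corp", "email_addr": "ops@acme.example", "town": "Austin"},
    CRM_TO_COMMON,
)
billing = to_common(
    {"customer": "ACME  Corp", "contact_email": "billing@acme.example", "city": "Boston"},
    BILLING_TO_COMMON,
)
print(match_customers(crm, billing))  # prints "review"
```

The two records share a name but differ in email and city, so the sketch cannot tell whether they describe one company or two and hands the pair to a human reviewer. Each special case of this kind tends to become another hand-written rule.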
These rules are typically stored in a rules engine built into the integration software. Michael Stonebraker, a pioneer of relational databases, is a founder of Tamr, which has developed an ML-based MDM system. To illustrate the limitations of a rules-based system, Stonebraker cites a real-life example of a large media company that built its own MDM system and accumulated rules over 12 years.
“This media company has 300,000 rules,” Stonebraker said. Ask someone how many rules a person can reasonably manage, and the typical answer is 500. Pressed, they might reluctantly say 1,000, and pressed harder still, 2,000. But 50,000 or 100,000 rules is completely unmanageable. The reason there are so many rules is that there are so many special cases.


