Data Engineering: The Guts, Gears and Grease Behind AI

14 Jan 25

The penetration of AI is deepening. We’re starting to put AI-powered advances in the hands of everyone from nurses to civil engineers. These advances are built on deep learning mechanisms grounded in pattern recognition, massively complex algorithmic logic and custom-tailored specialized intelligence derived from techniques such as retrieval-augmented generation (RAG).

When a new part rolls down a production line today, manufacturers use imaging equipment to scan the inside of components, then run these data files through AI models for analysis. Better products and services are emerging every day as a result of the data analytics we apply across industries and the paths we identify for applying AI to make parts, processes, people and products work better.

These systems detect defects that would otherwise be impossible to find, in real time, improving product quality and reducing costs. The benefits are clear. However, enterprise deployment of AI is rarely so straightforward.
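To make that inspection step concrete, here is a minimal sketch of the kind of real-time check involved. The model is a placeholder heuristic, not a real trained classifier; `defect_probability`, the threshold and the simulated input are all illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in for a trained deep-learning defect classifier;
# a real deployment would load a trained model from disk instead.
def defect_probability(scan: np.ndarray) -> float:
    # Placeholder heuristic: treat unusually bright regions as suspect.
    return float((scan > 0.9).mean())

def inspect_part(scan: np.ndarray, threshold: float = 0.05) -> bool:
    """Return True if the scanned component should be flagged for review."""
    return defect_probability(scan) > threshold

# Simulated scan; real input would come from the imaging equipment.
scan = np.random.rand(256, 256)
print("flag for review:", inspect_part(scan))
```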

Why AI is not magic

There is an expectation, given the incredible capabilities of consumer-facing AI tools, that all we need to do as a company is add a little corporate data to an AI solution and the result will be magical. In reality, deploying enterprise AI is a very complex data engineering challenge, points out Jim Liddle, chief innovation officer for data intelligence and AI at cloud file service platform company Nasuni. So what is the big challenge here?

“One of the biggest and least-understood barriers to successful AI deployment is the ‘file data’ associated with the work applications within any given enterprise IT estate. Structured data is familiar territory for most organizations, but unstructured [file] data, such as documents, images, videos and other files, makes up 90% of all data generated,” Liddle explained. “This is the raw material that organizations want AI tools to work on. So how is it done?”

How should organizations go about working with data at this scale if it is to be useful for consumption by an AI engine?

Synthesize & Curate… Then Do

First, we must remember that data engineers have to source this unstructured data and find out where it lives… whether it resides on local storage devices within an office environment, in the cloud, or distributed across different software platforms. The same data engineering team must then “synthesize and curate” these files.
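As a rough illustration of that discovery step, the sketch below builds a simple inventory of file data on local storage; the root path and file-type list are assumptions, and cloud or SaaS sources would need their own connectors feeding the same record format.

```python
from pathlib import Path

# Illustrative set of unstructured file types worth inventorying.
FILE_TYPES = {".docx", ".pdf", ".pptx", ".png", ".jpg", ".mp4"}

def inventory(root: str) -> list[dict]:
    """Walk one storage location and record where file data lives."""
    root_path = Path(root)
    records = []
    if not root_path.exists():  # placeholder path may not exist locally
        return records
    for path in root_path.rglob("*"):
        if path.is_file() and path.suffix.lower() in FILE_TYPES:
            stat = path.stat()
            records.append({
                "path": str(path),
                "size_bytes": stat.st_size,
                "modified": stat.st_mtime,
                "source": "local",  # cloud connectors would tag their own source
            })
    return records

print(len(inventory("/data/office-share")), "files discovered")
```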

“This is where the work of deep data engineering becomes critical for exploring, cleaning, normalizing and organizing data in a scalable and repeatable way,” explained Liddle.
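A hedged sketch of what “synthesize and curate” can mean in practice follows: normalizing inventory records (as produced by the discovery sketch above) into one schema and collapsing duplicates by content hash. The schema and helper names are illustrative, not Nasuni’s.

```python
import hashlib
from pathlib import Path

def content_hash(path: str) -> str:
    """Fingerprint a file so copies scattered across locations collapse to one record."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def curate(records: list[dict]) -> list[dict]:
    """Deduplicate and normalize raw inventory records into a repeatable schema."""
    seen, curated = set(), []
    for rec in records:
        fingerprint = content_hash(rec["path"])
        if fingerprint in seen:
            continue  # exact duplicate already curated from another location
        seen.add(fingerprint)
        curated.append({**rec,
                        "sha256": fingerprint,
                        "ext": Path(rec["path"]).suffix.lower()})
    return curated
```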

“Once a business has a grasp of the data engineering work, the next big question is which tool or type of AI model it should deploy. That depends on the particular needs of the organization, and there are associated concerns that the data engineering team will typically need to prioritize.”

From this point, the data team should look for any “latent bias” in the data being used (or built into the underlying model of the chosen AI provider) and put controls in place accordingly. Data privacy and security must also be evaluated in the context of how the business plans to use an AI model with its data.
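A first-pass check for latent bias might be as simple as measuring how evenly an attribute is represented in the curated dataset, as in the hypothetical sketch below; real bias audits need domain-specific tests, and the attribute, threshold and data here are invented for illustration.

```python
from collections import Counter

def under_represented(records: list[dict], attribute: str, floor: float = 0.1) -> dict:
    """Flag attribute values whose share of the dataset falls below a floor."""
    counts = Counter(rec.get(attribute, "unknown") for rec in records)
    total = sum(counts.values())
    return {value: count / total
            for value, count in counts.items()
            if count / total < floor}

# Invented example: does the corpus skew heavily toward one region's documents?
docs = [{"region": "EMEA"}] * 80 + [{"region": "APAC"}] * 5
print(under_represented(docs, "region"))  # flags APAC at roughly 6%
```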

Questions to ask include:

  • Will the AI model run strictly internally?
  • Will the business put AI-driven tools inside a product?

The second point is important because a firm will need to consider the regulatory landscape and adhere to the rules governing the territories in which it operates. A business could easily run afoul of new regulations that require companies to disclose to their customers whether they use AI in their product suite.

Value Validation vs. AI for AI’s Sake

“Last but not least, an organization will likely want to detail (or at least try to project) the business value associated with its data engineering efforts. These projects will not be AI for AI’s sake. The board or executive leadership team will want to see how each data engineering plan and AI implementation will drive revenue, reduce costs, or both. With something like the production use case described above, the value is clear. An AI-assisted inspection tool helps companies minimize product defects and improve quality, thereby reducing costs and increasing customer satisfaction,” detailed Liddle, in an effort to explain how these projects should play out in real-world scenarios.

He gives another example, that of a global media and marketing company with studios around the world.

If each of this company’s offices stores its files independently, then a creative team developing a new project cannot draw on the institutional knowledge of the larger firm. But if this global marketing giant’s files are consolidated, curated and made securely available to AI-enhanced search and indexing tools, then creatives in one office can quickly access previous work on projects in similar industries or locations, or even previous work for the same client. This rapid access to institutional knowledge helps them produce more informed work in less time.
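As a hedged sketch of how consolidated files can feed such search and indexing, the example below ranks prior project briefs against a new query using TF-IDF similarity; a production system would more likely use embedding-based semantic search, and the briefs are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented project briefs, standing in for consolidated studio files.
briefs = [
    "Retail rebrand campaign for APAC market launch",
    "Automotive client social video series",
    "Retail loyalty app launch creative concepts",
]

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(briefs)  # one-off indexing pass

def similar_projects(query: str, top_k: int = 2) -> list[str]:
    """Return the prior work most similar to a new project description."""
    scores = cosine_similarity(vectorizer.transform([query]), index)[0]
    return [briefs[i] for i in scores.argsort()[::-1][:top_k]]

print(similar_projects("retail product launch campaign"))
```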

“The key step here is not exactly which AI tool you choose or develop,” Nasuni’s Liddle said. “That is certainly important. But first, a firm must do the data engineering work that allows these tools to be successful. This starts with a strong data management framework for files and unstructured data. This framework should provide visibility into the dataset, a rich understanding of that dataset and global access to those files. Finally, the data management framework must be able to ingest new data and make it available to AI tools without forcing the data engineering team to jump through technical hoops. The fresher and more relevant the data the AI can draw on, the better the results.”
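That last ingestion point can be sketched as a simple incremental pass that hands only unseen files to the indexing step, keeping the AI’s view of the data fresh; the ledger file and function names below are assumptions for illustration, not part of any particular framework.

```python
import hashlib
import json
from pathlib import Path

STATE = Path("ingested.json")  # hypothetical ledger of already-indexed content

def ingest_new(root: str) -> list[str]:
    """Return only files the index has not seen, ready to hand to AI tooling."""
    seen = set(json.loads(STATE.read_text())) if STATE.exists() else set()
    fresh = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        fingerprint = hashlib.sha256(path.read_bytes()).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            fresh.append(str(path))  # next stop: the indexing / embedding step
    STATE.write_text(json.dumps(sorted(seen)))
    return fresh
```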

IT is not magic

Data engineering has never been an easy job, and in many ways, it’s harder than ever. Alas, laments Liddle, there is no magic, but there are strong data management frameworks that can help firms get the most out of AI and generate real, measurable returns.

In reality, and as essential as data management frameworks undoubtedly are, a holistic approach to total data engineering is also needed, along with a comprehensive approach to controlling where data originates and how it is subsequently discovered, prepared and disseminated. Just remember: IT looks like magic, but the guts, gears and grease show that it really isn’t.
