For those of us who remember the data mining processes of the eighties and nineties, there’s a certain comfort in knowing that machine learning techniques predate even that stage of the technology industry’s evolution. Legacy is, of course, no bad thing; it brings experience with it. It’s a message that Ulf Persson is happy to reiterate in his position as CEO of the Document AI platform company ABBYY (hereafter written as Abbyy), which has now been around for more than 30 years.
The company held its Abbyy Ascend 2026 partner and practitioner convention in Nashville this month to explain where its platform goes next, while also tabling some significant news.
OCR, RPA… IDP
Abbyy traces its roots back through the pre-millennial iterations of Optical Character Recognition (OCR) and on through Robotic Process Automation (RPA). It now sits in what the company claims is the new vanguard of Intelligent Document Processing (IDP), and it has seen industry standards come and go along the way.
Perhaps it is no surprise, then, that the organization has worked with a group of big names to lay down a new industry standard of its own. At Abbyy Ascend, the company and its partners IBM and Red Hat announced the formation of the DocLang working group under the Linux Foundation’s LF AI & Data Foundation.
The DocLang AI-native standard has been laid down with the intention of “revolutionizing enterprise document processing”, by providing a unified, AI-readable format to represent documents for language model and agentic AI consumption.
“DocLang is specifically engineered to address industry challenges with a minimal, standardised, and AI-native method for representing document structure, meaning, layout, and governance,” commented Maxime Vermeir, vice president, AI strategy at Abbyy. “Being designed for efficient machine processing provides a predictable structure optimised for modern AI tokenisation and modelling techniques. Organisations will see a significant difference with more reliable interpretation, reduced hallucinations, and lower computational costs.”
The View From The CEO
Asked about his wider vision for the company, Persson conveys a particular kind of institutional self-confidence that only comes from having been in the right place in the market for a very long time. Abbyy was founded in 1989, and Persson doesn’t need to oversell the pivot to AI because, as he is at pains to point out, Abbyy never really left it.
His own route to the top job was unconventional by tech-industry standards. He has a background in economics and investment and spent years as a board member and then chairman before stepping into the chief executive role during a period of significant strategic repositioning.
“I wasn’t a technologist per se,” he acknowledges, “but I came pretty well prepared. I knew the customers, I knew the people inside the business.”
From OCR to Intelligent Automation
To understand where the organization is going, it helps to understand where it came from. OCR was the foundation of the company, in a marketplace that also included document capture vendors such as Nuance and Kofax. Since then, Abbyy has divested its non-essential activities.
Then came robotic process automation, which the CEO says brought a shift in enterprise consciousness. “It really focused companies’ attention on the necessity of understanding data and extracting and activating data that is embedded in business documents,” he says. Before RPA, the conversation was about back-end batch processing, archiving and large-scale scanning workflows. RPA dragged document intelligence into the transactional, customer-facing front end of business operations, where decisions happen in real time rather than overnight.
The Generative AI Inflection Point
The arrival of large language models did not catch Abbyy flat-footed. The company had been working with NLP, machine learning and computer vision for decades. But, says Persson, generative AI is not a continuation of the previous curve; it is a different curve entirely.
“What we could do ten years ago, five years ago, two years ago – and now – it’s completely different. The art of the possible is so different.” The practical consequence for Abbyy over the past two and a half years has been a thoroughgoing change in how it builds products, what it promises customers, and how it thinks about the boundaries of the intelligent document processing category it helped create.
That category, he suggests, may not survive much longer in its current form. IDP is increasingly bleeding into process understanding, orchestration and agentic AI. Abbyy’s technical approach is built around vision language models (VLMs) rather than the generative, dialogue-driven models that dominate public discourse. The distinction matters. Where a generative model produces conversational, open-ended responses, a VLM trained for document understanding does something more structured and, for enterprise purposes, more auditable: it identifies the fields, labels, names and value structures embedded in the physical layout of a document, and returns them as clean, queryable data.
This is not merely a technical preference; it is a compliance and risk management position. “If there is an error in what comes back from a number of documents, we know what that error is,” says the CEO. “We can build structures and processes and human-in-the-loop around that. But if you use a model that you don’t know what’s gone into it, you can have different errors every single time. For many customers, that is just unacceptable.”
DocLang Defined, Definitively
Back to the DocLang announcement. The standard is being positioned as a universal AI-native document format, but de facto formats are tough to establish, and many flaky attempts have fallen by the wayside. What kind of critical mass do Persson and his team anticipate they will need to make this one stick?
“As you’re reading this, there’s an AI agent trying to read through a document and make sense of it, and a frustrated developer wondering why there’s yet another document format that’s breaking his pipeline. DocLang is defined to solve exactly that frustration. By leveraging the developer ecosystem around DocLang and Abbyy FineReader Engine, we’ll quickly get a snowball effect of adoption. And let’s not forget about the built-in governance capabilities; this is something organizations are in dire need of, further expanding the adoption opportunity,” said Persson.
All well and good then, but what does IBM and Red Hat’s involvement in DocLang bring beyond a bit of big brand credibility?
“For me, the group that makes up the DocLang workforce demonstrates the true need for a universal AI-native document standard. Each organisation continues to realize their business runs on documents and that future success of leveraging AI advancements hinges on having that cornerstone of their data fully under control,” said the CEO.
A Document Format for the AI Era
DocLang proposes a document format built from the ground up for the AI era — one that preserves structure, context and inter-field relationships in a way that language models can consume efficiently. The ambition is to establish it as a de facto standard, in the same way that the Linux Foundation has underpinned open infrastructure standards across the industry.
But is that realistic, or just aspirational marketing?
The CEO is measured. “Could someone else do another similar standard? In principle, yes. But we’d be the first. And why would you?” The argument is less about technical lock-in than about the enormous latent value sitting in enterprise document archives: billions of PDFs at insurance companies, banks and government agencies that remain opaque to the AI systems now being built to interrogate them. DocLang is, in this framing, infrastructure for unlocking that value.
Abbyy is clearly deliberate in communicating that three decades of experience with document data is not legacy baggage. Where newer AI-native competitors are building their understanding of document problems from scratch, Abbyy says it arrives with customer relationships, language and vertical-domain training data, and institutional knowledge about failure modes that only comes from having processed more documents in more languages more times than almost anyone else.
“Just because a new tool or technology – however powerful – comes around, doesn’t mean that whatever you knew before you have to throw out,” the CEO says.
Memory & Experience is a Feature, Not a Bug
The market for document intelligence is no longer about finding five keywords to save an invoice to an archive. It is about identifying embedded risks in lease agreements, navigating changing regulations, and extracting value from the “billions of records” locked in corporate PDF silos.
As the CEO puts it, Abbyy’s “one-plus-one” advantage is clear: It is an AI company built on thirty years of document experience, not a startup starting from zero. In the world of enterprise tech, that kind of institutional memory is a feature, not a bug.