In this Digital CxO Leadership Insights video, Mike Vizard talks with Veryfi co-founders Ernest Semerda and Dmitry Birulia about how the JSON data format and AI will transform robotic process automation.
Mike Vizard: Hello and welcome to the latest edition of the Digital CxO Leadership Insights videocast. I’m your host Mike Vizard. Today we are with Ernest Semerda and Dmitry Birulia, which is tough for me to say, but they are co-founders of Veryfi. We’re going to be talking about all things related to OCR, AI, robotic process automation, and digital transformation. Gentlemen, welcome to the show.
Ernest Semerda: Thank you very much, Mike.
Dmitry Birulia: Thank you Michael.
Mike Vizard: So, what exactly is going on these days with OCR? The technology has been around forever. We’ve been talking about robotic process automation for a while and it seems like some people do it, some people do it poorly, and it’s kind of uneven. Then, everybody has been talking about AI. But we’ve been talking about all this stuff forever and I feel like it’s this primordial soup of technologies that never quite finds the catalyst that really drives mainstream adoption everywhere. Usually, it’s maybe some Global 2000 company that throws an army of people at it to make it work. But where are we on this journey and what needs to change?
Dmitry Birulia: Yeah, that’s a great question, Mike. I like what you said about the army of people. Usually, companies try to throw small armies at a problem like extraction of data from unstructured documents. Then, they end up using human-in-the-loop solutions, so usually offshore labor to do that work. We always talked about flying cars; right? The flying cars are coming, we’re all going to be flying around instead of driving around like Back to the Future, but that never happens; right?
So, the reality is someplace in the middle. What we’ve seen is a lot of RPA companies whose focus is on automation: bringing multiple pieces together, automating certain tedious tasks. But when it comes to data extraction, it’s commonly referred to as OCR, which is taking an image and converting it into text so it can be used for something. That’s very specialized. What you get out of that technology is the image converted to text, but that text is still unstructured; right? It’s very hard to use.
The best thing you can use it for is manual copy and paste. So, you copy that section out, you paste it into your ERP or some other product. But the next step from that is using specialized intelligence. That’s where you’ve got that text, and the system has learned enough to understand that “invoice number” and the value next to it is the actual value for the invoice number; right? Then it’s able to structure that in some sort of common standard format like JSON, for example, which is a very common format being used today.
Previously, it was XML. Then you can rely on that structured data to feed it into maybe a product you’ve built to actually automate something, or feed it into an ERP system, or feed it into a BI system to get more insight. So, that’s the complexity that hasn’t really matured: being able to understand the conversion from a picture or PDF to text, and from text to something that’s standardized, something that you can rely on. Because when you think about documents, they’re not all English; right? They come in multiple different languages.
So, how do you understand across languages? How do you understand across different structures? That’s where the AI is really required. That’s where deep learning, for example, plays such a pivotal role in being able to create value out of that OCR conversion. I know that was a long answer but hopefully that hit the nail on the head.
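The kind of standardized JSON output described here can be sketched as follows; the field names and values are purely illustrative, not Veryfi’s actual schema:

```python
import json

# Hypothetical structured output an intelligent OCR pipeline might emit
# for an invoice (illustrative field names, not a real vendor schema).
raw = """
{
  "document_type": "invoice",
  "invoice_number": "INV-2042",
  "vendor": {"name": "Acme Supplies", "address": "123 Main St"},
  "date": "2022-03-14",
  "currency": "USD",
  "total": 199.5,
  "line_items": [
    {"description": "Widgets", "quantity": 10, "total": 150.0},
    {"description": "Shipping", "quantity": 1, "total": 49.5}
  ]
}
"""

invoice = json.loads(raw)

# Unlike a flat OCR text blob, every field is directly addressable,
# so downstream systems (ERP, BI) can consume it without parsing text.
print(invoice["invoice_number"])                              # labeled value
print(sum(item["total"] for item in invoice["line_items"]))   # line-item math
```

This is what makes the “feed it into an ERP or BI system” step possible: the consumer addresses named keys instead of scraping free text.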
Mike Vizard: As I understand it though, if I convert it to a JSON format, I can also use the components of the thing I converted; right? Today, if I convert something with OCR, it’s still basically one big static blob, but it sounds to me like if I have it in a JSON format, I can go in there and start using the pieces. The third and the fourth paragraph might be the only germane part of the thing, and then I can start integrating that with my other processes. Is that about right?
Ernest Semerda: Yeah, exactly. So, look at line items, for example. Line items are quite interesting. They’re available on a receipt, on an invoice, on a bill. You can use line items, for example, in an expense management scenario where you can enforce policies based on what people spend when they travel. So, that’s an expense management use case.
Or you could stretch it all the way to the loyalty and CPG industry, where brands really want to understand what consumers are spending their money on and when they’re spending it. Things like fast-moving consumer goods, things that you buy on a regular basis. So, that’s where line items can add a lot of value and provide intelligence to the brands to then incentivize the consumer with the right coupons, with the right cashback schemes, any way to build that brand equity and get the consumer to spend more money by going into the shop using that coupon.
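The expense-policy use case can be sketched against extracted line items; the policy rules, thresholds, and field names below are illustrative assumptions, not a real product’s configuration:

```python
# Sketch: enforcing a travel-expense policy against line items extracted
# from a receipt. The policy and field names are hypothetical.
POLICY = {"max_item_total": 75.0, "blocked_keywords": {"alcohol", "cigarettes"}}

def flag_violations(line_items):
    """Return (item, reason) pairs for line items that violate the policy."""
    flagged = []
    for item in line_items:
        desc = item["description"].lower()
        if item["total"] > POLICY["max_item_total"]:
            flagged.append((item, "over per-item limit"))
        elif any(word in desc for word in POLICY["blocked_keywords"]):
            flagged.append((item, "blocked category"))
    return flagged

receipt_items = [
    {"description": "Hotel breakfast", "total": 18.0},
    {"description": "Bottle of alcohol", "total": 30.0},
    {"description": "Conference dinner", "total": 120.0},
]

for item, reason in flag_violations(receipt_items):
    print(item["description"], "->", reason)
```

Checks like this are only possible once the receipt has been decomposed into structured line items; against a raw OCR blob, none of these rules can be applied reliably.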
Dmitry Birulia: I just want to add, too, because there are different approaches to intelligent document processing. One approach is the templatized approach, where there’s no AI. Okay, I know what the document layout looks like, so I’m just going to templatize it: I know exactly where the value I’m interested in resides on the page. That might work okay for very standard documents like driver’s licenses, passports, W-9s, W-2s. Those are very standard documents and templates may work for those types of problems, unless the layout changes; then you have to change your templates; right?
Then there is another approach, a generic approach, that companies like the big companies are taking. They call it a “form recognizer”: they recognize any type of form and break that form into key-value pairs, or they find the tables in the document and basically recreate the table, where again it’s key-value pairs. The complexity here is that a user of that solution has to deal with all the different headers of that table and every single different key possible in order to use the key-value pairs. Then, there’s the AI solution, which works the way we humans understand documents.
I know Ernest was referring to invoices, for example. If we take an invoice, and I know he gave the example of an invoice number, if I hide every single word on the invoice and show you just one number, would a human be able to tell us what that number refers to? Probably not, until you start building a little bit more context around that number. Right?
It’s like, there’s “invoice number” somewhere above or below. If there are lines across that number, then it’s probably a table. The same way we humans consume information from, say, invoices is the same way AI consumes or understands that information. Then, what we at Veryfi do is provide a very standardized JSON where, for the invoice number, the key will always be “invoice number” even though on the invoice it could say factura, which is Spanish for invoice. Or, it might say IND, or it might be in a completely different language. So, this is an AI approach to understanding documents.
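The normalization step described here, where “Factura” and “Invoice No.” both map to one canonical key, can be illustrated with a simple alias table. The table itself is an assumption for the sketch; a real AI system would learn this mapping from context rather than use a fixed lookup:

```python
# Sketch: mapping raw document labels to canonical JSON keys, regardless
# of language or layout. The alias table is illustrative only.
ALIASES = {
    "invoice no.": "invoice_number",
    "invoice #": "invoice_number",
    "factura": "invoice_number",        # Spanish
    "rechnungsnummer": "invoice_number", # German
    "total": "total",
    "importe total": "total",            # Spanish
}

def normalize(extracted_pairs):
    """Map raw (label, value) pairs from OCR to canonical JSON keys."""
    out = {}
    for label, value in extracted_pairs:
        key = ALIASES.get(label.strip().lower())
        if key:
            out[key] = value
    return out

# A Spanish invoice and an English invoice yield the same keys downstream.
print(normalize([("Factura", "F-991"), ("Importe Total", "45.00")]))
print(normalize([("Invoice No.", "INV-7"), ("Total", "12.00")]))
```

The point of the contrast with the generic “form recognizer” approach: the consumer never has to handle factura or Rechnungsnummer as keys, only `invoice_number`.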
Mike Vizard: Where are we on this adventure? Because I can remember 40 years ago interviewing a fellow and he told me that there would be paperless bathrooms before there were paperless offices, and so far, he seems to be okay with that prediction. So, my question is: Where are we going from here? Because it seems like every time I go to visit an office, there’s still a lot of paper floating around. Maybe not as much; there are forms on electronic devices here and there. We don’t seem to have made this great leap forward after 40 years, so what’s holding us up and what’s next?
Ernest Semerda: Let’s not even worry about paper. Let’s think about this: we have so many unstructured documents, right? Whether it’s paper or a digital form like a PDF, we have so many systems generating data on a daily basis, and we have no standard way of creating, for example, an invoice. So, this is probably one of the things that’s holding us back. Everyone’s trying to revolutionize and change the world, but they’re creating different systems with different protocols on top. This is where we are today. Even if we remove paper from circulation, we still have the issue of unstructured or semi-structured documents; right?
For example, the PDF. So, that’s where we’re at. Even if we tell everyone to standardize and use one standard, whatever that is, it’s such a big change to try to get everyone in the world to unify and follow one standard. I know, for example, open banking in the Commonwealth countries has been pushing for that one standard. It’s just like pushing a boulder up a hill because everyone’s got different ideas on what that standard should look like. So, that’s where we’re at today because of all these contradictory approaches to solving some of these challenges.
Dmitry Birulia: It’s definitely going to take a lot of time. Millions and millions of checks are still written every year in the United States even though we have ACH transactions and we have credit cards, and there are a lot of cash transactions happening even though everybody thought that credit cards would solve this.
Mike Vizard: As we go forward, how smart can AI get? Will it be able to determine that certain documents conflict with each other or maybe this document doesn’t have the same kind of data it should have? Or, maybe is this thing outright fraudulent?
Ernest Semerda: Yeah, do you want to start or should I?
Dmitry Birulia: You start.
Ernest Semerda: Okay. So, specialized intelligence, I think, is what we’re talking about. General intelligence, which everyone sort of talks about in the media, that’s a long way off. But when it comes to specialized intelligence, it’s just a matter of training the system with enough data. You know, we’ve got a very large machine learning model because we collect data from multiple sources, and we’ve been building out this model so it’s pretty specialized. It’s very intelligent in terms of being able to understand what the value of an invoice number is across all the different languages.
So, that is already here. But then the next step is, once you get that data, what do you do with it? I think you touched on this point: you can detect, for example, duplication, because we’ve got all that historical data as well. The system can then run a live query that says, “Hey, this is a duplicate of a previous submission.” That’s very important in expense management and very important in the loyalty space; they call it fraud when someone tries to submit a CPG receipt twice.
So, you can do all of these fancy calculations, but getting the data in a structured form is really pivotal here. Our system can classify the document (I think that was one of your questions as well). We can identify if it’s a bill, a receipt, or a W-2 that was submitted to the API. So, all of that classification is already in place. We can identify logos. Your vendor information, like Walgreens, for example, that’s a custom font, and you need to use machine vision in order to understand what that is.
But we can understand that today, so all of that functionality is available today. I think what isn’t available is the stuff we still don’t know: what do you do with that data to push it to the next level? To change bookkeeping from a monthly process to a real-time process where the machines are doing all of that work, so now you’re getting insights in real time versus having 12 data points on an annual basis; right?
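The duplicate-submission detection mentioned here can be sketched by fingerprinting the identifying fields of each structured document. A production system would query its full history store; the in-memory set below is a simplification, and the field names are illustrative:

```python
import hashlib

def fingerprint(doc):
    """Hash the fields that identify a unique transaction (illustrative choice)."""
    key = "|".join([doc["vendor"], doc["date"], f'{doc["total"]:.2f}'])
    return hashlib.sha256(key.encode()).hexdigest()

seen = set()  # stand-in for a historical-submissions database

def is_duplicate(doc):
    """True if an identical transaction was already submitted."""
    fp = fingerprint(doc)
    if fp in seen:
        return True
    seen.add(fp)
    return False

first = {"vendor": "Walgreens", "date": "2022-03-14", "total": 23.99}
print(is_duplicate(first))        # first submission: not a duplicate
print(is_duplicate(dict(first)))  # same receipt resubmitted: duplicate
```

None of this works on raw OCR text, where trivial differences in whitespace or scan quality would defeat a naive hash; it relies on the extraction step having already produced clean, structured fields.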
So, there are all these great opportunities that come out of it, and here’s the one that really excites me the most: we’ve seen industrial revolutions. If it wasn’t for electricity, we wouldn’t have computers, for example. So, we see these leaps and bounds that happen every so often and change society as a whole. We know that the next evolution is going to be around AI. You already see that with Tesla’s manufacturing, right? How much automation is coming in. But all of these machines are only as smart as the data we’re giving to them. So, if we can’t get the data (the foundations) right in a standard way, these AI systems will be stupid. They’re not going to be smart.
So, getting these foundations laid, just like what Veryfi is providing, is pivotal for us, because that really enables the next evolution of society: to truly move to a system where AI is making all these smarter business decisions for us because of that foundational data layer.
Mike Vizard: Who understands the need for that these days? We hear about chief data officers, but it’s not quite clear what their mission is. But to your point, who’s in charge of aggregating the data in a way that makes it more consumable for AI models, and who are you encountering that’s leading the charge?
Ernest Semerda: Oh gosh, that’s a good question. So, I mean, for the chief information officer, I think part of their job is to make sure there’s trust and security around the data because there’s so much data moving back and forth. Take, for example, hospitals; right? They still use faxes; I think 70 percent of health institutions in the U.S. still use faxes. You’ve got so much PII data moving back and forth between these old school devices.
The real question is, what do you do with that data after it’s been transferred from one party to the other? Now, in a hospital setting, in the healthcare space for example, you can use data for multiple things. Being able to pull up the history of a patient’s medical records, for example. Being able to understand what medicines they’ve taken; right? Whether they’re taking too much or whether there’s a better alternative for that medicine.
Just one example: I’m Australian, so I was in Australia and I purchased an asthma spray (one of these puffers) from two different pharmacies on the same day. If it wasn’t for Veryfi, I would never have picked up that there was a major price discrepancy between the morning purchase and the afternoon purchase. In Australia, medicine prices should be standardized, and it was like a 40 percent difference. So, that was really insightful; right? For me, for example, and for any consumer, that there’s something wrong with the pricing. So, I went back to the second pharmacy and got back the difference they overcharged me.
But when you think about what’s happening in America, the number one cause of bankruptcy in the U.S. is actually medical billing, because of human error in the process. So, if we can eliminate all of that, we can reduce a lot of the pain that happens due to human errors, humans getting tired from doing so much data entry. What I’m trying to say is we’re collecting data from multiple sources. Right? Whether it’s on a consumer level or whether it’s internal, we just need to make sure that that data is protected. For us, for example, security and privacy are pivotal to what we do because we process so much data.
Data privacy is going to be paramount for all of us as we move forward because of the wealth of data, the wealth of PII, that’s out there. The last thing you want is for that to be leaked to rogue hackers. So, you’ve got data coming in from multiple locations, and then the next step is, what do we do with that data? We generally don’t want to trash it because there are so many insights and so many opportunities that can be gained from it.
Mike Vizard: How much are people going back in time to capture data? I mean, are they going back years to digitize this stuff? Are they going back months? Or, do they draw a line in the sand and say, “Let’s just start from here”?
Ernest Semerda: Yeah, I think it depends on the application. The IRS wants you to keep records going up to seven years back of what you spend your money on (so, all of your receipts and any bills you may have had) just in case you get audited. So, for those purposes, you’re keeping the records for a very long time. It’s a statutory obligation, I guess, is probably a better way to explain it. But at the same time, when I think about all that data, imagine being able to glimpse your spending habits or the certain brands you spend more money on. There’s just so much data.
And I think you could sell that data as an individual. I mean, that’s just a new idea that could come out of it. You know, we all own that data to a degree. But look at other industries. In healthcare, you might want to keep the data for a few years as well just so you’ve got a medical record of what sort of drugs you’ve consumed or services you’ve had, especially if you’re moving between countries. My medical records from Australia don’t exist in the U.S. because the healthcare systems just don’t talk to each other. That’s internationally. But even locally, when you go between healthcare institutions, the electronic health records don’t communicate either. So, what I’m trying to say is, it all depends.
Dmitry Birulia: Like I said, different applications. If, for example, someone just needs to validate that a driver’s license picture matches the face of the person who claims to be on that license, that’s just instant document data extraction and validation, and then that data is no longer required. However, smart businesses try to keep and process all their data, because that data provides insights they might not be looking for today but might start looking for tomorrow.
Ernest Semerda: Yeah, I mean, there is so much data in a company from which the executive team could make better, faster, smarter decisions. Actually, one good example that came to mind was construction. They call it “job costing” in construction: keeping a record of every single transaction. So, if this was a Home Depot receipt, every single line item here would basically have to have a job code associated with it. That’s important because as general contractors go and get loans or start new projects, they need to be able to demonstrate that they can forecast how much a project will cost.
Being able to understand the costs of a project, so as not to overblow the budget, and even getting loans, like mezzanine loans from the banks, is very important, as is demonstrating historical data to show that they can be trusted. It’s just part of that liability check for the banks, too. So, in job costing, getting data from the field, from the work site, needs to be as close to real-time as possible in order to catch projects that are running over. Construction has been doing that for a long time. The second largest industry in the world, right, requires that sort of data to be as close to real-time as possible.
Dmitry Birulia: And also security and fraud prevention, for example. A lot of big enterprises have been fraudulently invoiced for millions of dollars. AI can capture that fraud and can actually flag those invoices where, for example, the vendor historically had one set of banking remittance information and all of a sudden a new invoice comes in and the bank routing and account numbers have changed. So, the AI can definitely identify those changes and point them out.
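The remittance-change check described here can be sketched as a comparison of an incoming invoice against the vendor’s historical banking details. The vendor names, routing and account numbers, and field names below are all illustrative:

```python
# Sketch: flagging an invoice whose bank remittance details differ from
# the vendor's historical record (all data here is hypothetical).
history = {
    "Acme Supplies": {"routing": "021000021", "account": "123456789"},
}

def remittance_changed(invoice):
    """True if the invoice's banking details differ from the vendor's history."""
    known = history.get(invoice["vendor"])
    if known is None:
        return False  # new vendor: no baseline to compare against
    return (invoice["routing"], invoice["account"]) != (known["routing"], known["account"])

suspect = {"vendor": "Acme Supplies", "routing": "021000021", "account": "999999999"}
print(remittance_changed(suspect))  # account number changed: flag for review
```

As with duplicate detection, the check is trivial once the invoice has been reduced to structured fields; the hard part is the upstream extraction that produces them reliably.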
Mike Vizard: All right. Hey, guys, thanks for being on the show. It sounds like if we don’t start with the fundamentals, we’re just going to have all kinds of headaches as we go along. We’ve never been very good at data management as it stands, but maybe JSON is a place to start.
Ernest Semerda: Yeah. Cool, thanks Mike.
Mike Vizard: All right guys, stay safe. I want to thank you all for watching this latest episode of the Digital CxO Leadership Insights videocast. I’m your host, Mike Vizard. You can find this and other episodes on the Digital CxO website. We invite you to check them out and once again, thanks for spending time with us.
Ernest Semerda: Cheers, bye.
Dmitry Birulia: Cheers.