Chief Content Officer,
Techstrong Group


In this Digital CxO Leadership Insights Series video, Mike Vizard talks to ScaleOut Software CEO, Dr. William Bain, about how digital twins will enable deeper levels of analytics in real time.



Mike Vizard: Hello, and welcome to the latest edition of the Digital CxO Leadership Insights series. I’m your host Mike Vizard. Today we’re with Dr. William Bain, who’s CEO of ScaleOut Software. And we’re talking about digital twins, at scale, in production environments. William, welcome to the show.

Dr. William Bain: Thank you very much – pleasure to be here.

Mike Vizard: I think a lot of organizations have been experimenting with digital twins and kind of trying to look at them as a way to mirror all kinds of real world processes. But once they started doing that in a production environment, some interesting challenges started to emerge. So from your perspective, what are you seeing folks trying to do to kind of implement digital twins at scale?

Dr. William Bain: Well, you don’t see many attempts to implement digital twins at scale, because digital twins have traditionally been used in the field of product lifecycle management, which means helping to design very large, complex devices like a jet engine, or an entire airplane, with a digital twin model. And people have thought about this term digital twins, at a high level, as encompassing all aspects of a design in software that matches a physical entity – usually a single, large, complex entity. The idea of applying digital twins to a large collection of devices or data sources in the real world is relatively new. For example, tracking a fleet of trucks – imagine a large fleet that’s nationwide, where you have telematics systems that are trying to get telemetry from these trucks and track where all the trucks are, what the drivers are doing, whether the cargo is safe, whether there are mechanical issues. All of those things require tracking a large number of relatively simple things instead of a single, very large, complex thing like a jet engine. That’s something we’ve been looking at, and developing software for, for about the last three to four years, and we’re starting to see other companies get into this space. And I think the real benefit is that it allows you, as an operational manager, to have a very powerful tool for decision making, because now you can model and track thousands, or even millions, of entities that are under your control. For example, if you’re a logistics manager tracking pallets all over the United States, or you’re in a disaster recovery situation – there are countless examples; we can go through a few of those if we have time. All of these require the ability to have lots of digital twins and be able to run them in software at the same time.
And the challenge that creates is that you need a software platform on which you can run thousands or even millions of software objects called digital twins, run them in parallel, and be able to receive and process telemetry if you’re attached to a live system, or be able to model in simulation – and be able to extract results very quickly, so that operational managers can actually make good decisions quickly.

Mike Vizard: What are some of the technology hurdles that we need to overcome? I mean, you mentioned the collection of data. If I’ve got a truck moving down a highway at 60 miles an hour, how do I collect that data and put it into a digital twin in a way that is occurring in near real time?

Dr. William Bain: Well, let’s think about why we want to do that in the first place – what is the goal here? What systems do today – telematics systems, for example, that track fleets of trucks – they’ve recognized for more than a decade the importance of getting periodic telemetry. Trucks used to send out telemetry once a minute; now it’s down to once a second, or once every few seconds. And the reason why you want to do that is so you can have a good idea of where all of the assets are. You want to know that your drivers are not getting too tired from driving too long. A lot of times, you want to make sure that, as I’ve mentioned, mechanical issues that might be emerging are handled quickly. And if you have lost drivers, blocked highways, weather delays – all of these things can impact the flow of a system. What people typically do with this kind of telemetry is dump it into a large database or a scalable file system – a big data file system like HDFS – and then process that data offline to figure out what went on. They look at this data more or less forensically, because it’s too hard to look at it in real time. What digital twins do is allow you to keep contextual information about every truck and every driver, so at your digital fingertips you have information about that vehicle. And when telemetry comes in, you can match it up with your knowledge about the vehicle and make a real-time decision about whether something needs to happen. So for example, if you’re seeing lateral accelerations from a truck, you want to explore them: is there a problem with the vehicle or a problem with the driver? Well, if you know that driver has a bad driving history, and has been driving for 18 hours or some period of time that’s too long, then you’re going to think maybe those lateral accelerations are not due to a mechanical issue; maybe they’re due to the driver.
And you can call that driver and, you know, tell them to check whether or not it’s time for them to pull over. That’s just one very small example. But the ability to apply context to the telemetry you’re seeing come in is a very powerful tool for making good decisions about what to do next in an operational system. And the challenge you have is that, if you look at the way people manage this kind of data today, they silo it in databases. So when you get a telemetry message, it’s very hard to go get the context: how are you going to get the driver’s record? How are you going to get the mechanical record for that engine? How are you going to know what kind of cargo that is, and whether or not a temperature change is important to that cargo? You’d have to have all that information right at hand in order to make a good decision, which means you need to have a digital twin for thousands of entities out in the real world. There are many other examples; another quick one is fraud detection. You have credit card users going, you know, to gas stations and making credit card purchases at points of sale, and you want to be able to make a really good decision about whether to accept each transaction. Now, the big banks that are looking at how to improve their real-time fraud detection still typically hold that data in column-oriented databases, so they have to go do a database access to get the information they can in real time – and they have to make a decision in about 15 milliseconds, which is about the amount of time they have. If instead you have all the context you need about that particular user immediately at hand, accessible within a millisecond, you can make a much better decision about whether to accept that transaction.
So again, digital twins can provide value in credit card transactions, and we can talk about several other use cases where it’s compelling to be able to apply context to telemetry in real time. And then we need to talk also about how you can use the same infrastructure for simulation. But to answer your question, the challenge is: say you have digital twins and are able to implement them – how do you then host them on commodity servers or in the cloud, and do that at scale, so that you can distribute incoming messages to them, process them, and get results within a few milliseconds? That’s a challenge, and it requires an in-memory computing architecture that is designed for that purpose. It’s very hard, from an application perspective, to implement all of the orchestration that’s needed to deploy and run digital twins at scale. And that’s where having an in-memory computing platform comes in – and that’s the technology we’ve worked on for the last 18 years.
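The "context at your digital fingertips" idea Dr. Bain describes can be sketched in a few lines of Python. This is an illustrative sketch only, not ScaleOut's API; the field names (`driver_risk`, `hours_driven`), thresholds, and decision strings are all invented for the example. The point is that per-entity state lives in memory, so each telemetry message is combined with its context without a database round trip:

```python
from dataclasses import dataclass, field

@dataclass
class TruckContext:
    driver_risk: float                 # 0.0 (safe) .. 1.0 (risky), from driving history
    hours_driven: float                # hours since the driver's last rest
    recent_lateral_g: list = field(default_factory=list)

# All context lives in memory, keyed by truck id, so lookups take microseconds
# instead of a multi-millisecond database access.
contexts: dict = {}

def handle_telemetry(truck_id: str, lateral_g: float) -> str:
    ctx = contexts.setdefault(truck_id, TruckContext(0.0, 0.0))
    ctx.recent_lateral_g.append(lateral_g)
    # High lateral acceleration plus a tired, risky driver points to the
    # driver rather than a mechanical fault.
    if lateral_g > 0.4 and ctx.hours_driven > 11 and ctx.driver_risk > 0.5:
        return "alert-driver"
    if lateral_g > 0.4:
        return "inspect-vehicle"
    return "ok"

# The driver with a bad history who has been driving 18 hours:
contexts["truck-17"] = TruckContext(driver_risk=0.8, hours_driven=18.0)
print(handle_telemetry("truck-17", 0.55))   # -> alert-driver
```

In a real deployment the platform would shard these context objects across many servers and route each message to the server holding its twin; the dictionary here stands in for that layer.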

Mike Vizard: It seems to me that a lot of these applications are what I would call event-driven, and the history of IT is so dependent upon batch-oriented processing that a lot of things are always out of sync with each other. We’ve been talking about event-driven for a long time, but we don’t have a lot of people with those kinds of skills. So, you know, before we run, do we need to kind of address a skills issue here around how to build these digital twin type applications?

Dr. William Bain: Well, that’s a great question. In fact, one of the key reasons to use digital twins is that they simplify the application developer’s view: the application developer does not worry about event-driven programming, or actor programming, or some of the other techniques that are required to implement the full system. By using a digital twin, you can factor out the orchestration of all of this execution – the processing of events, for example – from the application-specific code. So all the application developer has to worry about is understanding the physical device or data source they’re tracking or modeling in simulation. What are its key parameters, its key properties? What kinds of messages will come from that live device, or from a simulated version of it? And how do I process those messages? So the twin captures all of the application-specific or domain-specific aspects of the application, but the developer does not have to worry about the orchestration. For example, if you’re tracking a truck, you need to know things like the parameters of the engine – the oil temperature, oil pressure, brake temperature, whatever it is – the parameters of the driver – when they started, how long they’ve been operating on a route – and also the itinerary for that truck, and the type of cargo and its parameters. All of these things are domain specific. But all of the aspects of event-driven programming are completely factored out and pushed into the underlying platform that processes these digital twins – our platform in particular, or others like ours. So that is a great simplification. And it’s one that people understand, because digital twins have been in use for PLM (Product Lifecycle Management) for more than 20 years, so people have an understanding of what a digital twin is.
If you said it’s an actor model, people would go, “Okay, I’m not sure I understand what an actor model is.” But if you say it’s a digital twin, people are not as intimidated. It really boils down to a simple object-oriented model, with properties and code that knows how to update those properties when a new message comes in for that particular instance. So it actually simplifies the job of the programmer.
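That "object with properties plus a message handler" shape can be made concrete with a minimal sketch. The class, method name, and threshold below are hypothetical, not any vendor's API; in a real platform the hosting layer, not your code, would construct the twin and call the handler for each incoming message:

```python
class TruckTwin:
    """One instance per physical truck: domain-specific properties plus
    one method that updates them when a telemetry message arrives."""

    def __init__(self, truck_id: str):
        self.truck_id = truck_id
        self.oil_temp = 0.0
        self.alerts = []

    def process_message(self, msg: dict) -> None:
        # The hosting platform (not shown) routes each message here.
        self.oil_temp = msg.get("oil_temp", self.oil_temp)
        if self.oil_temp > 120.0:            # threshold is illustrative
            self.alerts.append(f"{self.truck_id}: oil temp {self.oil_temp}")

twin = TruckTwin("truck-42")
twin.process_message({"oil_temp": 131.5})
print(twin.alerts)   # one alert recorded
```

Everything event-driven – dispatch, parallelism, persistence – stays in the platform; the developer writes only the domain logic inside `process_message`.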

Mike Vizard: So ultimately, I now have a higher level of abstraction for building these types of applications in this unique use case. We also talk a lot these days about edge computing, and we’ve been talking about cloud computing for the better part of a decade or more. Are these two things going to converge, ultimately, as we use applications like these? Because it seems like a lot of what is happening will be data collected at the edge, but processed somewhere else, and then sent back for some sort of action required at the edge. So are we kind of splitting hairs between the cloud and the edge when it’s all one unified model?

Dr. William Bain: Well, in fact, it is one unified model – you’re right. And as the intelligence at the edge grows, more of this processing can be done at the edge. One of the nice things about a digital twin model is that it doesn’t really care where it runs. Going back to the telematics example, the hardware that actually runs on trucks today to produce telemetry is fairly rudimentary – it would probably have a difficult time hosting a digital twin. But over time, you know, we see the hardware improving rapidly. And the nice thing is, if you have a digital twin model that you’re running in the cloud, you can migrate it to the edge seamlessly, without any change to the application code, and just host it on the edge. And what do you get for that? You get much lower latency in processing the telemetry coming in. Now, for simulation, the edge and the cloud are really not a concern, because a simulation is typically run in one place, on one large system with many servers running together to implement the simulation. But for real-time analytics, yes, you do need to have an edge strategy and a cloud strategy, and where you benefit from the cloud strategy is when you need to aggregate information across many, many instances. So even though you’re processing telemetry about a particular truck and making some decision about it – or a health care device, or whatever it might be – I’ll give you another good example that’s been in the news: a railroad tracking axle temperatures. If you think about it, they have those boxes on the tracks, and they look at the axles as they go by, and they can figure out which axle of which train it is, and what its current temperature is. And then they store that information in that box.
And if they see a temperature exceed a certain limit, they radio the engineer on the train, and hopefully the train stops before a derailment occurs. But what they don’t have is any processing of context. What they really need is to know what that axle has been doing for the last two hours, or maybe two days, and to understand what mechanical issues it’s had in the past. As you know, they discovered there were defects in some of these axles. So if you know about those defects, and you know about the progression of temperatures, then you can have a much earlier notification that a problem is occurring, and stop the train before you reach the temperature at which there’s going to be a breakdown and the train might derail. Some of this computing can be done at the edge. But you can imagine that, in this particular implementation, the edge is just one box on a track – it’s not traveling with the axle. So you really need to pull that data out to the cloud. By just adding cell-based, you know, broadcasting of the messages coming off this box, you can push that data to the cloud. And now you can have a digital twin in the cloud that’s tracking every axle on every train in the United States, watching how it’s behaving and whether a problem is going to occur – and accidents like we’ve seen in the last month or so can be avoided that way.
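The axle example contrasts a fixed trackside threshold with trend detection over history. A hedged sketch of the second approach might look like this; the hard limit, window size, and class names are invented for illustration, not taken from any railroad system:

```python
from collections import deque

HARD_LIMIT = 170.0     # degrees at which a trackside box would alarm today
TREND_WINDOW = 5       # number of recent readings the twin examines

class AxleTwin:
    """Keeps one axle's recent temperature history so a steady climb can
    be flagged well before the hard limit is reached."""

    def __init__(self):
        self.history = deque(maxlen=TREND_WINDOW)

    def record(self, temp: float) -> str:
        self.history.append(temp)
        if temp >= HARD_LIMIT:
            return "stop-train"
        # Strictly rising across the whole window: early warning, even
        # though every individual reading is still below the limit.
        readings = list(self.history)
        if len(readings) == TREND_WINDOW and all(
            b > a for a, b in zip(readings, readings[1:])
        ):
            return "early-warning"
        return "ok"

axle = AxleTwin()
for t in [110.0, 118.0, 127.0, 139.0, 152.0]:
    status = axle.record(t)
print(status)   # -> early-warning, 18 degrees before the hard limit
```

A trackside box sees each axle only as it passes; a cloud-hosted twin can accumulate this history across every box the axle rolls past.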

Mike Vizard: Of course, you cannot walk down the street these days without somebody jumping out and talking to you about their new AI model, generative or otherwise. The question is, where would AI converge with digital twins, and how might those two combine – a two plus two equals 100 kind of thing?

Dr. William Bain: Yeah, well, that’s an interesting question. Digital twins can host machine learning algorithms like spike detection and anomaly detection. In fact, with our product, we do that with Microsoft’s machine learning library. And consider the challenge a developer has when implementing a digital twin – we talked about the fact that, okay, now the developer just has to worry about application-specific code. But sometimes that code is very hard to write. We actually had a company, a large cheese company, come to us a few years ago, saying, “We want you to implement streaming telemetry to track cheese boilers, and let us know if our cheese boilers are operating optimally or, you know, if there’s a problem.” And we said, that’s great, except we don’t have anybody on staff in our little software company who understands cheese boilers. So we weren’t able to actually implement that code. Well, it turns out that with machine learning, you can look at datasets of telemetry from, you know, these cheese boilers, whatever they are, and then look for anomalies. If you have a history of anomalies that have occurred, you can train a machine learning algorithm to detect them in the future. And the nice thing about digital twins is you can run this algorithm independently for every particular entity – every boiler, or every truck engine, or whatever it might be. So machine learning can be running in parallel and generating alerts without the need to hand-write code – which can be very difficult, if not impossible – to find those particular anomalies or spikes in the telemetry. I think we ought to talk a little bit about simulation. So if I could, let me give you an example of how we can use digital twins in simulation to do some very interesting things. When we talked about live systems just now, all the examples I gave you were live systems. And there are many more examples.
In health care, for example, tracking health care devices, and so forth. But the thing you really need to do is test these real-time analytics – whether they’re implemented in digital twins or in some other way. If you don’t use digital twins, you might implement streaming analytics with another technique, open source, for example. But in any case, you need a way to generate a workload as if it were coming from live devices, so you can test your analytics and verify them before you deploy them live. Say we’re going to test those train axles to make sure we can detect failing axles before they fail – we need to test that with a workload generator. Well, digital twins can play that role, too. They can implement a simulation of the thousands or millions of devices that are generating the telemetry, so you can use them in a dual role: not only to analyze telemetry, but to create telemetry. And that’s a more conventional way of thinking about digital twins; digital twins have usually been thought of as modeling the behavior of a device. In this case, as a workload generator, they’re modeling the behavior of a train axle, maybe periodically going to a high temperature and letting that be detected. And then your streaming analytics can say, “Well, I detected that, and we stopped that train.” So using digital twins for simulation provides this missing piece of testing real-time analytics. And then you can take it one step beyond that, to modeling a whole system – not just as a workload generator, but modeling a complex system. I’ll give you an example of an airline. Airlines have hundreds of thousands of passengers and thousands of airplanes. I don’t know how many gates they worry about, but they have lots of gates at every airport, and that whole system has to mesh and work seamlessly.
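The workload-generator role can be sketched in a few lines too. This is an assumed, minimal model of a failing axle – the temperature ramp, noise range, and failure step are all made up – but it shows the dual role: a simulated twin emits telemetry so the analytics under test can be exercised before going live:

```python
import random

def simulate_axle(steps, fail_at, rng):
    """Yield simulated axle temperatures; ramp upward once the axle
    starts failing at step `fail_at` (None means it stays healthy)."""
    temp = 100.0
    for step in range(steps):
        if fail_at is not None and step >= fail_at:
            temp += rng.uniform(8, 12)    # failing axle heats steadily
        else:
            temp += rng.uniform(-1, 1)    # healthy jitter
        yield temp

rng = random.Random(7)                    # seeded so the run is repeatable
readings = list(simulate_axle(steps=10, fail_at=5, rng=rng))

# Feed the generated telemetry into the analytics under test and verify
# it raises an alert (stand-in check: any reading above a test threshold).
alerted = any(t > 130.0 for t in readings)
print(alerted)   # -> True
```

The same twin definitions used live for analysis can, in simulation mode, produce exactly this kind of synthetic stream at the scale of thousands or millions of devices.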
And, you know, weather delays occur. I was on a flight over the weekend, and we had weather delays on both ends, and somebody was making a decision about holding flights when a weather delay occurs. That decision making today is usually based on intuition. But with simulation, you can simulate the entire system of passengers and airplanes, inject weather delays, run it – like a weather simulation – faster than real time, and then make predictions: if I hold this flight, what’s going to be the impact on my system in three hours? So simulation is a tool that people can use to make better decisions, both within live systems – that’s an example of making predictions for a live system – and for building a system, like a smart cities development, where you’re looking at traffic control and building management and you want to make sure you’ve got it all working well before you go and build everything. Simulations with thousands or millions of entities are something that digital twins can help with.

Mike Vizard: Are we suffering from a lack of imagination when it comes to digital twins? Because there are things we could be doing that are feasible, but we assume they’re more complex than they need to be, and we think we’re going to have to hire a bunch of programmers to write low-level C code when we could use a higher level of abstraction.

Dr. William Bain: Yeah, actually, the problem is worse than that. And by the way, there’s nothing wrong with C code; I’ve written a lot of it. The problem is that most systems become legacy systems, and the innovator’s dilemma bears on this: people think the problem is too hard, so they don’t even try to think about whether we could solve it today, because the tool of having a digital twin to help solve it isn’t usually something that comes to mind. I’ll give you an example of a security system. Think about a large company where everybody has a badge, and every time you go in and out of a building, that’s recorded in a database. But what you really want is to understand the pattern of usage of a badge. So for example, say you have an employee who has given notice of termination – you know, they’re going to leave, they have two weeks – and now, all of a sudden, they start appearing in a building, maybe a sensitive building, where they’ve never been before. You’d like to flag that and just sort of understand: why are you there? You’ve never been in this building. I’m thinking of a large company, like some of the ones I’ve worked in in the past – big, high-tech companies with hundreds of buildings. And you wonder: well, you’ve always been in these buildings; why are you suddenly in this building when you’re about to leave the company? Another case is safety, right? You’ve been allowed to do certain things in a factory setting, but other things are beyond your job description, and if you were to wander into the wrong part of the factory, it might be dangerous for you. You would want to flag that. Well, digital twins can help solve that problem.
They can address both the security problem and the safety problem by tracking a particular person’s badge usage in real time, and making an instant decision to flag a problem before it becomes a bigger problem. If you think about it, the systems that implement badge tracking today don’t do that – not as far as I can tell; I haven’t seen it coming up in a new product. And the reason, I think, is that people don’t think about how we can employ context, with a model like a digital twin model, in real time, to solve problems that are hard to solve. Let me see if we have time to give you one other quick one that we just saw a few weeks ago, which is cloned license plate detection for crime prevention. There’s a way that people are stealing cars by cloning the license plate – I’d never heard of it. What they do is they know about a real car that’s sitting in a driveway somewhere. They steal a car that’s identical, put that license plate on it, and drive it away. And everybody thinks, well, if the police see that car, they’ll say that’s the car that belongs to a legitimate owner somewhere else. What they typically do is steal groups of cars and create a convoy – this is in the UK, where we’ve heard about this, but I’ve been told it’s true in the United States as well – and then they drive those to a port, put them on a ship, and send them off for sale in another country. So what the police need to do is detect the cloned license plates, and then detect the convoys of cars with cloned license plates going off to the port. That’s a hard problem, and it’s an ideal problem for digital twins, because with a digital twin you can bring context to bear on it. When a CCTV camera – I realize there are privacy issues, but that aside – gets a license plate number, it gets a location for that vehicle.
And it can match that with what it knows about that vehicle and its last known locations, and see if it makes sense. If a car has been operating in, say, the London area, and all of a sudden you find it in another part of the UK within minutes, then you know that’s probably a cloned license plate. And then you can start tracking it, find out whether it’s part of a convoy, and, you know, take down a whole crime in progress. So that’s another example of digital twins that doesn’t come to mind, you know, off the top of your head.
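That "does this sighting make sense?" check is just a plausibility test against the twin's remembered state. A hedged sketch, with an assumed maximum road speed and made-up coordinates (a real system would use road distances and richer history):

```python
import math

MAX_KMH = 200.0   # assumed ceiling for any real car's road speed

class PlateTwin:
    """One twin per license plate, holding its last known sighting."""

    def __init__(self):
        self.last = None   # (time in hours, x in km, y in km)

    def sighting(self, t_hours: float, x_km: float, y_km: float) -> bool:
        """Return True if this sighting suggests a cloned plate."""
        suspicious = False
        if self.last is not None:
            t0, x0, y0 = self.last
            dist = math.hypot(x_km - x0, y_km - y0)
            dt = t_hours - t0
            # Implied speed faster than any real car: two vehicles are
            # sharing one plate.
            if dt > 0 and dist / dt > MAX_KMH:
                suspicious = True
        self.last = (t_hours, x_km, y_km)
        return suspicious

plate = PlateTwin()
plate.sighting(0.0, 0.0, 0.0)            # seen near London
flag = plate.sighting(0.5, 0.0, 300.0)   # 300 km away half an hour later
print(flag)   # -> True: implied speed is 600 km/h
```

Once a plate is flagged, the same in-memory twins make it cheap to correlate nearby flagged plates and spot the convoy.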

Mike Vizard: Alright folks, well, you heard it here. I wager there are probably half a dozen problems in your organization that you never thought would be fixed. Maybe you could just take a minute and think about how a digital twin could be applied; a little imagination and the world could be a different place. William, thanks for being on the show.

Dr. William Bain: I appreciate the opportunity. Thank you.

Mike Vizard: Thank you all for watching the latest episode of the Digital CxO Leadership Insights series. You can find this episode and others on the website. We invite you to check them all out, and once again, thanks for watching.