
Synopsis

In this Digital CxO Leadership Insights video, Mike Vizard talks with HEAVY.AI CTO Todd Mostak about why AI and analytics will ultimately converge.


Transcript

Mike Vizard: Hello, and welcome to the latest edition of the Digital CxO Leadership Insights series. I’m your host, Mike Vizard. Today we’re with Todd Mostak, who’s CTO for HEAVY.AI. We are going to be talking about analytics and data science and how it all comes together. Todd, welcome to the show.

Todd Mostak: Hey, thanks for having me, Mike. Good to be here.

Mike Vizard: All right, we’ve had analytics forever and a day; people have been running around with their business intelligence apps. Sometimes they use them, sometimes they don’t, and they sit with their spreadsheets. And then we had all these data science folks come along and start building all kinds of interesting models. How do the models and the analytics come together to do something useful for a business? Is all this going to converge? And how will that occur?

Todd Mostak: Well, I think we’re seeing that play out right now, Mike. You have worlds colliding, where on one side you have more traditional approaches, and by traditional I don’t mean out of date. People have been using SQL for almost 40 or 50 years, and they’ll probably use it for the next 40 or 50. People need to see their data, so BI as a modality for understanding your data is here to stay, and I think it’s critical. In fact, we have to be careful about the pendulum swinging too far in the other direction and just having the machine spit out answers; people need to trust, they need to understand, and you still need a human in the loop. On the other hand, there are increasingly powerful techniques, and I shouldn’t say new, because statistical learning has been around for ages. Some of the advances are in the algorithms, but a lot of it is just the compute power we can bring to bear in the form of data science and machine learning. Obviously you hear a lot about deep nets, but there’s a whole variety of techniques that people are using. So you have folks who may be trained on the traditional approaches but not as versed in data science and machine learning methods, and you have folks coming in who are trained data scientists. Sometimes they look at each other with suspicion, but really, they’re often both trying to get answers to the same questions, and in fact I think the approaches complement each other.

Mike Vizard: A lot of times there is this debate about whether I could just use traditional statistical analysis to solve a problem versus having to go out and build an AI model, which, in some people’s minds, may be a little more complex. Are there any guidelines for which route to take when? Or do I just pursue both and compare the answers, because it’s a good way to check one against the other?

Todd Mostak: Well, first off, I would say that unless you know your problem domain and have a very mature machine learning pipeline to handle it, jumping in and running models without actually understanding your data, seeing it and doing basic exploratory analysis, is a bit crazy, right? And I think most people know that. People talk a lot about AutoML, and that has a place, but if you’re not doing the basic analytics, it’s like trying to build a rocket before you fly a plane. On the other hand, the human visual cortex is uniquely capable of picking out patterns in data, and that’s critically important, but sometimes people pick out the wrong patterns, or there’s simply so much data to sift through that there aren’t enough people or subject matter experts to find all the anomalies and correlations. That’s where statistical learning, machine learning, data science, whatever you want to call it, comes in. In many cases it can help automate the process, surface anomalies and correlations for the user, and in some cases pick out things that humans just can’t; there are some domains where machine learning is better at picking out the needle in the haystack than a human would be. But you really have to know the domain, and people need to build trust: they need to know their data, and they need to build trust in the models. That’s why I think the two things are not separate domains but inherently intertwined, or at least they need to be.
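
To make that concrete, here is a minimal Python sketch of the kind of exploratory pass Mostak describes doing before any modeling. The file name and column names are hypothetical placeholders, not anything specific to HEAVY.AI.

```python
# A minimal "look at your data first" pass, sketched with pandas.
# "sensor_readings.csv" and its columns are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sensor_readings.csv")

# Shape, types, and missing-value profile before any modeling
print(df.shape)
print(df.dtypes)
print(df.isna().mean().sort_values(ascending=False).head(10))

# Summary statistics and simple correlations to spot obvious anomalies
print(df.describe())
print(df.corr(numeric_only=True))

# A quick visual check -- the step the human visual cortex is good at
df.hist(figsize=(10, 6))
plt.tight_layout()
plt.show()
```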

Mike Vizard: You mentioned AutoML, and there’s a lot of talk about the democratization of AI. So is that really happening? Or is it just high-end power users that know how to play with that?

Todd Mostak: I think it is happening in fits and starts, right? Part of this is that the BI players, almost across the board, have made a push to bring at least basic ML capabilities into their products, like clustering, or, in Tableau’s case, I think an explain feature. But until there’s more education on what’s actually happening behind the scenes and when and why you’d want to use an ML technique, there’s still going to be distrust. So part of it is just people catching up, and I think in five or 10 years even basic analyst positions will require you to know what a regression or a random forest is, or at least a classification model. So I think there will be a democratization. The other thing is that there’s a gap between the BI platforms, which put the model in a complete black box where you hit a magic button and it spits out a result, which people might say is cool, but how do I know it’s giving me good stuff? And then you have the whole notebook-driven workflows of data scientists, which are really wonky and certainly not no-code. So I think there’s something in between. As a company, HEAVY.AI is striving to keep the human in the loop, to provide powerful visualization capabilities at scale, but also to let novice and intermediate data practitioners do some basic modeling through the UX, and to show what’s actually happening behind the scenes if they want to interrogate the model.
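
As an illustration of that “basic ML behind the UX” idea, here is a small Python sketch of clustering that also surfaces what the model actually did rather than hiding it behind a single magic button. The dataset and feature names are hypothetical, and this is a generic scikit-learn example, not HEAVY.AI’s implementation.

```python
# K-means segmentation with the fitted parameters surfaced for inspection.
# "customers.csv" and its columns are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customers.csv")
features = df[["annual_spend", "visits_per_month"]]

scaled = StandardScaler().fit_transform(features)
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
df["segment"] = model.labels_

# Show what happened behind the scenes instead of a black box
print("Cluster centers (scaled units):")
print(model.cluster_centers_)
print("Within-cluster sum of squares:", model.inertia_)
print(df.groupby("segment")[["annual_spend", "visits_per_month"]].mean())
```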

Mike Vizard: There’s also some confusion about how much data you need to build an AI model. Initially, everybody said we’re going to need massive amounts of data, but lately you hear folks talking about the ability to build models with maybe not as much data, which makes it more accessible. So are we starting to make some progress here? Can we build an accurate model without necessarily collecting terabytes upon terabytes of data, to the point where we have petabytes to manage just to support AI and ML?

Todd Mostak: Well, I think that’s almost the pivotal question, right? Certainly the hyperscalers, the Googles and Amazons, have both enough compute and enough data to throw at the problem, so this is likely less pressing for them. It’s interesting that with some of these transformational transformer models, the language models, you can basically get the models online, but whether you can get the immense amount of training data, or the vast computational resources needed to train them, is a very different matter. I do think people are starting to focus on this more, and also on the computational efficiency, the data efficiency and the energy efficiency of these models. There are promising things; Facebook, for example, has a system called Scuba around being able to bootstrap models from limited expert knowledge. I think that’s going to become important if this is going to be truly democratized, because no matter the training, you and I could both have our postdocs in data science, but if we don’t have the data, or the GPUs, for example, we won’t be able to get far. That said, I don’t want to overhype that, because we read all about these giant models and giant problems, but even with hundreds of rows of data and something simple to learn, you can often get very good results. Maybe they’re a few percent off the state of the art, but just using a random forest or logistic regression gets you very far, and those models are often much more data efficient.
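
A quick sketch of that last point, using scikit-learn on a synthetic data set of a few hundred rows: a plain logistic regression and a random forest, cross-validated. It is only an illustration of the data-efficiency argument, not a benchmark.

```python
# Simple models on small data: a few hundred rows, cross-validated.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a small tabular problem
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```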

Mike Vizard: One of the other issues we hear about all the time is that there’s a bit of a divide between the data scientists and the application developers. A lot of the developers use DevOps to update applications continuously, while the models come along once every six months and somebody has to insert them into that flow. Is that manageable? Are there more efficient ways to think about this? How do we actually get our models inserted into the applications and get the value proposition out of all this stuff?

Todd Mostak: Yeah, it’s kind of like the old tree falling in the forest, right? There’s a huge disparity between the number of models, even promising models, that get created and the number that actually get deployed, because of the operational pipeline. There are a number of players here: MLflow from Databricks, and most of the cloud providers have capabilities in this regard. I think that’s going to be a key question, and I don’t think it can be solved with technology alone; it’s a people and process issue as well, because these data scientists are currently, except in very forward-thinking organizations, often shoehorned into an innovation or data science group, not necessarily embedded with the day-to-day business units. So it can be seen as, those guys in the corner over there are doing some fancy stuff we don’t understand, and we have this model that works, why should we change it? And people are right to be nervous about changing models; just because something proves out on a small test data set doesn’t mean it should go straight into production making business decisions, so there needs to be some friction there. But I would argue that the closer you bring the data scientists, the DevOps or MLOps engineers, and the business units and subject matter experts, the more this becomes organic.
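
Since MLflow comes up here, a minimal sketch of what that handoff can look like: training a model and logging it, with its parameters and metrics, as a tracked artifact. The experiment name and data are hypothetical, and registering the model for deployment would additionally assume a registry-backed tracking server.

```python
# Logging a trained model with MLflow so it is a tracked, versioned artifact
# rather than a notebook handed over the wall. Names and data are hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("churn-model")  # hypothetical experiment name
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))

    # With a registry-backed tracking server you could also pass
    # registered_model_name=... here to give MLOps a versioned model to promote.
    mlflow.sklearn.log_model(model, "model")
```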

Mike Vizard: Speaking of changing models, the models themselves are subject to drift; new data comes in, or some of the assumptions that we made are no longer valid. So is there a way of monitoring these things? Do I need to continuously update them? What is the best practice?

Todd Mostak: Yeah, there are folks out there who are deeper experts in this than I am, but obviously COVID was the quintessential black swan, if you will. And there are going to be more things like that; they may not be pandemics, maybe it’s stagflation, maybe it’s anything where, very quickly, something that was back-tested on past data becomes irrelevant. So first off, you need the ability, like a circuit breaker, to say, “Hey, we’re going to continuously test the performance of our model, and we’re going to sound all the alarms or stop using it if we see a big jump in inaccuracy.” Then I think the Holy Grail is continuous training, so that you don’t stop the show but adapt, when possible, to new data. A lot of the algorithms aren’t very good at that; they’re designed for batch training. People are focusing more, even with the transformer models, on being able to tweak or do a kind of final last-mile feature engineering, and that also applies as new data arrives and you want to continuously train the model over time; I think that’s going to be important. But with so few organizations having a large number of models deployed, and often without a long track record of actually having these models in production, I think people haven’t fully seen the extent to which this is going to be a problem. There are regime shifts; things change. And that’s never been more the case than right now; it seems like it’s slapping us in the face.
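
Here is a rough Python sketch of the “circuit breaker” idea: score the deployed model on recently labeled data and raise an alarm if accuracy falls well below what was measured at validation time. The baseline and threshold values are illustrative assumptions, not recommendations.

```python
# Circuit-breaker check for model drift. Thresholds are illustrative only.
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.91  # accuracy measured at validation time (assumed)
MAX_DROP = 0.05           # tolerated degradation before sounding the alarm

def check_model_health(model, X_recent, y_recent):
    """Return True if the model still looks healthy on recently labeled data."""
    recent_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    if recent_accuracy < BASELINE_ACCURACY - MAX_DROP:
        # In production this would page someone and/or fall back to a prior model
        print(f"ALERT: accuracy dropped to {recent_accuracy:.3f}; possible regime shift")
        return False
    print(f"OK: recent accuracy {recent_accuracy:.3f}")
    return True
```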

Mike Vizard: To that point, are we putting too much faith in these AI models? As one guy once pointed out to me, “It’s one thing to be wrong; it’s another thing to be wrong at scale.” So how do we put some checks and balances in here so that we don’t wind up relying on an AI model that takes us down a path with some sort of negative result?

Todd Mostak: Yeah, first off, and this isn’t even about temporal change: even with a fresh model that’s fully trained, where you do all the nice things, it’s very hard as models get more complex to understand what’s happening under the hood; it’s the perennial black-box problem. And there can often be legal implications of that as well; your model could be discriminatory, for example. So there’s a burgeoning field where people are trying to actually understand what’s happening and figure out the blind spots, or even potential security weaknesses, of these models, and how you could exploit or trick them. Part of that is visualization; there are some very promising visualization capabilities, and some of our customers have used them to see how the inputs and outputs of the model correlate in a way that human beings can understand. The other part of this is, again, a process thing, where rather than just chasing the shiny new object, models are judged on their merits, both at launch time and continuously over time, in case performance degrades due to the regime shifts we were just talking about. I think that’s going to become a big thing. But again, I feel like we’re in the very early days. People have been talking about ML and data science and hyping it for a long time, but in my experience only a limited set of organizations have gotten to the point where they’ve had any number of models in production for many years making key business decisions. So I think they’re just starting to get stung by this stuff.
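
One generic way to relate a model’s inputs to its outputs, in the spirit of the visualization work described here, is permutation importance: shuffle each feature and see how much the score degrades. The sketch below uses scikit-learn on synthetic data; it is not HEAVY.AI’s tooling.

```python
# Permutation importance: a simple, inspectable view of input/output influence.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and measure the score drop
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: importance {importance:.3f}")
```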

Mike Vizard: What’s your best advice to folks? You’ve been at this longer than most, and a lot of people are just getting started, so what do you know now that you kind of wish you knew maybe two years ago?

Todd Mostak: Well, I think it’s pivotal, even if the dots aren’t explicitly connected, just to increase data literacy in an organization and give people the tools to ask their own questions. And I know there are some cautionary points about that, right? People will jump to the wrong conclusions, so there needs to be some sort of data governance. But you want to get away from a situation where the business sees it as those people in the corner doing these models, and at least get to the point where people are having a conversation around the same data, understand what’s happening, and have a sense of the data. That means there are more people who can actually sound the alarm if the model is trying to drive the figurative car of the business off a cliff; they can say, no, no, this can’t be right. We still haven’t found a way to replicate human intuition with AI, especially the intuition of a subject matter expert. So arming the organization with data literacy is key. And then, it’s funny, you see the gamut: some organizations are very risk averse when it comes to this stuff, probably too conservative, while others are very cavalier, have fully drunk the Kool-Aid of AI as our salvation, and have gone in the opposite direction. So I think the happy medium is: let’s try this stuff out, build data literacy, be able to see the data and do analytics well, with data governance and a catalog, and then let’s use models in situations where it’s not life or death, figuratively speaking. Let’s get our feet wet making decisions that are reversible or not going to sink the business, until we get our sea legs, if you will, and can move to bigger problems, because there’s definitely a lot of risk in going full steam ahead and ending up making the wrong decision.

Mike Vizard: So, gotta strike a balance. As always, Todd, thanks for being on the show.

Todd Mostak: Yeah, thanks so much Mike. My pleasure.

Mike Vizard: All right. Thank you all for watching this latest episode. You can find this one and all the other ones that we’ve done on digitalcxo.com. We invite you to check them all out. And once again, thanks for watching the show.