Synopsis

In this Digital CxO Leadership Insights interview, Mike Vizard talks to Zachary Hanif, VP and Head of Machine Learning Model and Platforms at Capital One, about why data management best practices are core to artificial intelligence (AI) success.

Transcript

Mike Vizard: Hello, and welcome to the latest edition of the Digital CxO Leadership Insights series video. I’m your host, Mike Vizard. Today we’re with Zachary Hanif, who is Head of Machine Learning Model and Platforms at Capital One. We’re going to be talking about data management, data meshes, the creation of AI models and how all this comes together. Zachary, welcome to the show.

Zachary Hanif: Thank you for having me, Michael.

Mike Vizard: It seems like one of the upsides, and one of the negatives, of AI is that it has kind of forced us to get better at data management. I would argue a lot of enterprises were not exactly going to get a Good Housekeeping seal of approval for the way data was historically managed, and that seems to have improved considerably, because AI models need quality data to really drive the outcomes we’re looking for. So have you seen people gain a greater appreciation for data management these days?

Zachary Hanif: I think that there’s been a greater appreciation for all sorts of things inside of the ML lifecycle, data management absolutely being one of them, 100%. But we’re also seeing that same level of appreciation showing up across all of the areas of what you could consider to be the data lifecycle, or the MLOps lifecycle as well – that entire body of events that starts when you begin collecting data and ends once your model is running in production, being monitored, and being continuously refit and retrained to ensure that it remains fit for use, and everything in between. I think we’re seeing an increased level of interest, infrastructure and investment across the entire industry, and we’re observing it become a more prominent part of data scientists’ day-to-day working lifecycle.
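
That monitoring and retraining loop can be made concrete. Below is a minimal sketch, assuming numpy, of one common "fit for use" check: the population stability index (PSI) between a feature's distribution at training time and its live distribution. The bucket count and the 0.2 alert threshold are widely used conventions assumed here for illustration, not anything Hanif prescribes.

```python
# A minimal drift-monitoring sketch; thresholds and data are illustrative.
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a training-time distribution and a live distribution."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range live values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid division by zero and log of zero in sparse buckets.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)  # distribution at fit time
live_scores = rng.normal(0.4, 1.2, 10_000)   # drifted production data
psi = population_stability_index(train_scores, live_scores)
if psi > 0.2:  # common rule-of-thumb alert level, assumed for illustration
    print(f"PSI {psi:.2f}: drift detected, consider refitting the model")
```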

Mike Vizard: And how are those teams coming together? It seems like we’re seeing the rise of data engineers to address data ops; we had data scientists who were building the models, and we have DevOps teams that insert the models into the application. So how are we bringing all those squads together?

Zachary Hanif: So I think that one of the things that’s really great about this particular field is that many of those roles certainly existed in the past, but as more investment has come in, more and more people have begun to specialize inside of those areas. And you start seeing this proliferation of specialties and titles and different aspects of the role as people identify the things that they’re passionate about and the areas where their enterprises need them to fill gaps. And as the entire ecosystem continues to evolve, we identify new and exciting ways to participate inside of it and de-risk parts of the process.

I think the benefit here is that because many of those roles have existed in the past – they may have gone by different names, and they may have had different people in them – the actual activities that we’re discussing here, even in their infancy, have existed continuously. One of the things that we get to leverage very, very heavily is all the historical learnings that both data science and software engineering have had in the area of building cross-functional teams that directly support each other. When you have a team of specialists – you know, really talented, thought-oriented individuals – the benefit that you get from having people who go really, really deep in certain areas, and who have the breadth to communicate with each other, is incredibly powerful.

The other thing that we’re beginning to see more of here is that we’re not just relying on making sure that every single specialty, every single aspect, has a human being represented inside of the team; we’re beginning to invest more and more inside of repeatable infrastructure, things that create a common operating environment where there are certain assumptions that individual team members can make. Otherwise, sooner or later, you end up with teams that have to be 150 people in size to make any kind of forward progress. By investing very heavily inside of that infrastructure – investing heavily inside of, you know, process and procedure to make sure that we’ve got a common operating environment and framework for our data transformation and machine learning or modeling needs – we can leverage, very quickly, what’s present for us and create a world where we continuously focus on the things that bring the greatest value to those teams, as opposed to all the things that are necessary simply to operate inside of the space.

Mike Vizard: What is the relationship between those teams and the rest of the business? It’s great that we have these cross-functional teams across data management, application development and data science, but how do we get those guys aligned with what the rest of the business is trying to do? Because, you know, we’ve been dealing with this divide between IT and the business for decades now.

Zachary Hanif: I think that’s a great question. At the end of the day, as businesses continue down the journey of becoming, you know, technically reliant and offering their services over digital means, tech is becoming more and more core to the warp and woof of the business as a whole. It’s a fundamental component; it’s the thing that the business, in many cases, is built on. Now, while individual business-driven key results may not specifically reference core pieces of technology, as technology becomes more and more of a channel for customer enablement, for access to customers’ individual needs, and the primary platform by which we offer more of these services, I think we’re observing that line – which historically may have existed across many, many companies – becoming grayer and grayer, beginning to break down. The other thing I’ll say here is, again, these are challenges that all of technology has been working on over the course of, say, the last 30 years or so, right? We have a firm understanding of how to attack these problems. And while the specific implementations of those solutions may look different from company to company and from industry to industry, one of the things that I think we are all very happy about is that, as time goes on and we continue to focus on this body of work, we’re seeing new and more areas where technology is enabling the driving of business, enabling new capabilities, and further ensuring customer success.

Mike Vizard: Do you think that these investments in AI are going to be roughly the equivalent of, you know, the effort to go to the moon, which yielded all these interesting technologies that we commercialized later? As I look up and down these AI models, I see new things like data meshes and other new technologies being used more broadly. So is this going to be like the gift that keeps on giving?

Zachary Hanif: I think one of the fastest ways to create a situation where I look back on a conversation in five years and think badly of it is to try to predict the future too strongly; the future is simply broad, it’s constantly moving, and it’s hard to pin down. Obviously, being in the field that I am, I completely believe in the promise of large-scale statistical modeling and the ability to learn and adapt from data and ongoing events. So yeah, in many ways I’m a believer. I would say that if we look at the number of man-hours that went into, say, the Apollo Project, as you mentioned, and the number of collective man-hours going into the large data space and the modeling arenas that we have today, then simply due to the law of large numbers and the number of people who are able to join, collaborate and participate inside of the ecosystem, we’re seeing far more man-hours going into the latter than the former. As we think about what machine learning is enabling for us, we keep seeing these individual inflection points – inside of machine learning, I’d say it’s probably every six years or so – where new things get released, moving from academic capabilities to industrial adoption and so on, and it starts to change the way we think about these things. I’m not sure that I’m capable of predicting whether or not the newest wave of NLP-based models – ChatGPT and all the other things that are coming up from there – is going to be the thing that brings about the next major paradigm shift inside of this space. But the fact that we’re seeing an increasing rate of innovation in these areas, and we’re seeing capabilities and technologies that, heck, at the very beginning of my career were almost unthinkable – and we’ve seen that three or four times now over the last, say, 10 to 15 years – certainly indicates that we’re going to be rethinking the way business is done in many ways, across multiple industries. And we’re going to begin to discover new products and services that we couldn’t have thought about before, that are now enabled by this suite of capabilities.

Mike Vizard: There are a lot of folks who launch these projects and don’t necessarily get the results they expected. Is that just kind of the cost of doing business in AI, kind of the cost of learning? And do we need to figure out how to accept that?

Zachary Hanif: One of the things that I like to say is that machine learning is very much an experimental science, right? We have a body of data; we, as humans, have some intuition from the beginning about what the data is able to teach us or communicate, and how we can use it for all those fun things. But we don’t actually have a full and complete knowledge of it. And so, as in any experimental science, we try things, and sometimes they even work, right? We take a look at the data that we have, we think about ways that we can model and represent it so that it’s interpretable by a machine and models or represents the world that we are attempting to better understand. And through a series of experiments, we attempt to get closer and closer to a situation that allows us to achieve our business goals, in the most abstract sense. I would say that’s generally the way these things work.

One of the things that’s really exciting about machine learning in particular is that the cost of experimentation – hey, I had a clever idea, I want to try something out, I want to keep an eye on how this is going – is very low, especially when you reflect on the cost of experimentation in some of the more physically oriented sciences right now. There’s a very low barrier to entry in many cases, and it’s relatively cheap to run all sorts of experiments relatively quickly. That’s something that I think we should embrace and encourage. Of course, you’re not going to get it right every single time, and sometimes you’re in a situation where you don’t have the data that you need, or the signal required to answer the business question you’re attempting to address just isn’t there. But I would say that, in the largest and most generic sense, yes, absolutely: teams will go through multiple iterations on models. And the goal of a lot of the infrastructure that is being built across the open source and industry communities, as well as by, you know, teams that I’m working with very closely right now, is to continuously drive down the cost of that experimentation, drive up the relative signal-to-noise ratio that we get per experiment – so you hopefully can run fewer experiments to get to a desirable effect – and, most importantly, to let you have an intimate and deep understanding of that process as a whole and, in the end, drive a deeper understanding of what your model is doing.
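
To make that low cost of experimentation concrete, here is a minimal sketch of a single candidate-model experiment, assuming scikit-learn and one of its bundled public datasets. The dataset, models and metric are illustrative choices, not a description of Capital One's tooling.

```python
# One cheap "experiment": compare two candidate ideas on a public dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Cross-validation raises the signal-to-noise ratio per experiment
# compared with a single train/test split.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC {scores.mean():.3f} (+/- {scores.std():.3f})")
```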

Mike Vizard: There’s a lot of conversation these days about bias and ethics. And I guess my question is, is that really something that starts with the AI model? Or is that more something that is addressed in the data management itself? It’s the old phrase, garbage in, garbage out; maybe we need to spend more time thinking about what data we’re going to feed to the model.

Zachary Hanif: I actually think that it’s both. If you attempt to pin bias and ethics down to a single place where you need to think about them inside of the overall modeling lifecycle, you’re probably going to wake up one day and realize that you’ve missed something meaningful, right? Bias mitigation is something that you want to be doing at every stage inside of that overall lifecycle. It’s something that you want to be thinking about on a consistent basis, all the way from the very beginning – as you said, garbage in, garbage out – but what you do with your inputs is also relevant. So you think about it all the way through; you consider all the changes and modifications that you’re making and the implications they have. And finally, of course, you monitor the effects once your model goes into production.
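
As one hedged illustration of that last stage – monitoring effects in production – the sketch below computes a demographic parity gap on scored records. The column names, toy data and alert threshold are hypothetical placeholders, not a metric Hanif prescribes.

```python
# A simple production bias check; columns and threshold are hypothetical.
import pandas as pd

def demographic_parity_gap(df: pd.DataFrame, group_col: str,
                           pred_col: str) -> float:
    """Largest gap in positive-prediction rate across groups."""
    rates = df.groupby(group_col)[pred_col].mean()
    return float(rates.max() - rates.min())

# Toy scored data; in practice this would run on a sample of live
# model outputs, alongside checks at every other lifecycle stage.
scored = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b", "a"],
    "approved": [1, 0, 1, 1, 1, 0],
})
gap = demographic_parity_gap(scored, "group", "approved")
if gap > 0.2:  # alerting threshold is an assumption for illustration
    print(f"Warning: approval-rate gap of {gap:.2f} across groups")
```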

Mike Vizard: One of the issues that’s come up recently is that people are wondering, for lack of a better way of describing it, if this winds up being a zero-sum game in this sense: if I have an AI model that’s optimized for me as a supplier, and somebody else has an AI model that’s optimized for them as a buyer, do the two models just meet somewhere in the internet universe and cancel each other out? Or will we kind of have a situation where, you know, my model beats up your model?

Zachary Hanif: So I think that when we consider questions like that in the abstract, they become so difficult to talk about and rationalize around that it’s almost impossible to have an accurate or even balanced view. If you’re in a situation where you’re thinking about problems like this, you should really consider the specifics of the environment that you’re in, the specifics of your industry, your field, and all sorts of other factors like that, and then go from there. Unfortunately – or fortunately, in many cases – inside of machine learning it is very important to understand the context around the model itself, and to recognize that models do not operate inside of silos, that they are not exclusive in any way, shape or form to one particular body of activities, and that in many cases they have transferable properties. But the overall swath of the problem space that you’re addressing via that question is so, so broad that I would be reluctant to point at any one thing and make any kind of declarative statement about it.

Mike Vizard: What’s your best advice to folks about getting started? A lot of the folks that I’ve talked to in the past like to at least focus on something that’s a closed-loop process versus some of these more open-ended processes that are a little more difficult to automate. Because, you know, the algorithms ultimately have to learn, and they will drift if they have too much data thrown at them over time.

Zachary Hanif: I think that’s very true. While closed-loop processes are very, very helpful, if you’re an industrial firm or a proprietary group beginning your overall corporate journey into this space, one of the things that you probably want to do very early on is make sure you really understand the problem you’re attempting to address, and that you’re able to very clearly and objectively articulate how you will know whether or not your approach has been successful. Right? Going back to the statement that machine learning and modeling is an experimental practice: at the end of the day, it’s science, and the core of science is that ideas are tested by experiment and can be measured in an objective manner. Once you have really internalized that, I think it starts to matter a little bit less whether it’s closed loop, open loop, or any of the other ways that you want to rationalize around the problem. While some of them may be inherently a little bit easier, the most important thing that you can do is to be able to say: Do I understand the domain in which I’m operating, or in which my model will be operating? Do I have a good understanding of the assumptions that are implicit in my data and my statistical modeling approach? And then, finally, do I have a really great objective function that I can use – probably more than one, to be perfectly honest between us, but at least one really strong objective function – to be able to turn around and say, “This is how I know if the model is having the effect that I intended it to have”?
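
A minimal sketch of what "more than one strong objective function, agreed on up front" could look like in code, assuming scikit-learn's metrics; the metric names and target values are illustrative assumptions, not recommendations for any real model.

```python
# Pin down objective success criteria before any training starts.
from dataclasses import dataclass
from sklearn.metrics import brier_score_loss, roc_auc_score

@dataclass
class SuccessCriterion:
    name: str
    target: float
    higher_is_better: bool

# More than one objective function, defined and agreed on in advance.
criteria = [
    SuccessCriterion("roc_auc", target=0.85, higher_is_better=True),
    SuccessCriterion("brier", target=0.10, higher_is_better=False),
]

def evaluate(y_true, y_prob) -> dict:
    """Score a model and report whether each agreed criterion is met."""
    scores = {"roc_auc": roc_auc_score(y_true, y_prob),
              "brier": brier_score_loss(y_true, y_prob)}
    for c in criteria:
        met = (scores[c.name] >= c.target if c.higher_is_better
               else scores[c.name] <= c.target)
        print(f"{c.name}: {scores[c.name]:.3f} (target met: {met})")
    return scores

# Usage, given a fitted classifier:
#   evaluate(y_test, model.predict_proba(X_test)[:, 1])
```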

Mike Vizard: Do you think, at least in my experience, that maybe business leaders are overestimating the impact AI will have and the technology folks are underestimating the impact AI will have?

Zachary Hanif: That’s another one of those fun, very broad questions. I think that whenever anyone – in business, technology, any field that you’re in – is on a frontier of something, you don’t know what’s over the horizon; that’s kind of the definition of a frontier at the end of the day. As you journey further, and as you understand more and more about how that particular area of the world works, what is possible and what is available to you, you’re going to open up into a world where you can make more accurate predictions. I don’t think it’s so much an issue of, hey, business people don’t understand what’s going on and they’re overambitious, or technical people are overly pessimistic. I think it’s more along the lines of everyone having a slightly different body of experience and a different body of expectations and goals across new technical spaces and frontiers. One of the things that’s really exciting about times like this is that everyone’s looking at it in a slightly different way, and wherever reality happens to be is usually somewhere in the net effect of all of those wants and desires. So I wouldn’t characterize it the way you did; I would characterize it more as: new things are hard, they often have unexpected effects, and the unexpected parts of them are tricky, or very exciting. As we all explore what the possibilities are in this continuously emerging arena, I would say, as I did earlier in this conversation, that it’s hard to predict with any real accuracy in the long term, certainly. And that’s why we experiment; that’s why we try new things. The goal is to make sure that the cost of experimenting is low, that you can iterate very quickly, and that you have a really firm understanding about whether or not the thing you did has the effect you intended.

Mike Vizard: All right. And for those newbies who are watching this, what’s the one thing that you know now that you wish you knew when you first started down this path?

Zachary Hanif: Oh, man, there’s so much of it. One of the things that I now have a deeper appreciation for than when I first started down this path is really that experimentation model. For at least the first couple of models that I started to build, train and work with, if I had spent probably, I don’t know, double the time on really defining those objective measures of success and cleaving to them very, very closely, I’m sure those models would have been far more fantastic. That being said, you know, one of the great things about an arena like this is, again, that the cost of experimentation is low. When you’re just getting started – especially if you’re an individual just getting started – play with it, experiment with it. It’s cheap to train a model; it’s cheap to get access to large bodies of well-understood, public, very safe information that you can begin to work with. Play with it. There’s no faster way to learn, as someone who’s just starting out in something, than making mistakes, staring at the results and trying to figure out how you can do better, right? That’s something that’s really exciting. And there are tons of resources out there that allow you to get direct, hands-on experience in this particular arena for free.
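
As a hedged illustration of just how cheap that first experiment can be, the sketch below trains a model on one of scikit-learn's small, public, well-understood datasets in seconds on a laptop; the dataset and classifier are arbitrary choices for demonstration.

```python
# A free first experiment on a small, safe, public dataset.
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")

# Make mistakes on purpose, stare at the results, and try to do better:
# e.g., vary n_neighbors and watch how your objective measure moves.
```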

Mike Vizard: All right, folks, you heard it here. You cannot afford not to experiment because that’s how we learn. Zach, thanks for being on the show.

Zachary Hanif: Absolutely. Thanks for having me, Michael.

Mike Vizard: And thank you all for watching the latest episode of the Digital CxO Leadership Insights series. I’m your host Mike Vizard. You can find this episode and others on the digitalcxo.com website. We invite you to check them all out. And once again, thanks for watching.