In this Digital CxO Leadership Insights series video, Mike Vizard speaks with Chad Simpson, senior vice president of technology at City Furniture, and his colleague Ryan Fattini, director of data engineering, about data virtualization.
Mike Vizard: Hey, guys, welcome to the latest edition of the Digital CxO video podcast. I’m your host, Mike Vizard. Today we’re here with Chad Simpson, who is the senior vice president of technology at City Furniture, and his colleague, Ryan Fattini, who is director of data engineering. Gentlemen, welcome to the show.
Ryan Fattini: Thank you.
Chad Simpson: Thanks, Mike. Good to be here.
Mike Vizard: You guys had some big projects involving data virtualization as part of your whole digitization strategy. Can you guys maybe walk us through a little bit of how you got into this project in the first place? There’s a lot of things you could prioritize. So how did this make it to the top of the list?
Chad Simpson: I’ll kick it off. Then, Ryan, I’ll let you take it from there. So we started our data – really, our big data digitization project about three years ago. We turned 50 this year. This was our 50th anniversary, so you can imagine we have no shortage of data.
But it was very hard for people to access. It was in our system of record, a mainframe system. Traditionally, the way that you accessed it was to put a ticket in with IT and have a programmer pull out the data for you, which is, as you can imagine, not very customer-friendly and not the way that we want to serve our customers from a data perspective.
So about three years ago we started an initiative to start getting our data out of the mainframe system and into a logical data warehouse using real-time journaling, CDC. Then we quickly realized that we needed to build out a data team. So we created a data engineering team, a data analytics team, and now we also have a data science team as part of that – over 25 people as part of that collective team.
We quickly realized that to be more nimble and adaptive in the future, we needed to augment the data warehouse with something that would allow us to be flexible. So if somebody wanted Power BI or they wanted to use XYZ product, we didn’t want to create all these data models inside the warehouse, and the solution to that was really data virtualization.
We’re a big Gartner partner. We’ve been partnering with Gartner for five years. We did a lot of due diligence, a lot of research, and Denodo kept coming up on the top of the list for best-in-class data virtualization, which kind of checked all the boxes for agility, adaptability, performance, governance, when you talk about providing data from different sources, and then providing it out to other sources. We felt like Denodo really gave us that opportunity to be nimble for the future.
So that’s sort of a high-level view. Then, Ryan, if you want to – you know what was really intriguing for you for Denodo and the main drivers of why you guys felt like you needed it?
Ryan Fattini: It’s exactly like you said, the warehouse, the streaming engine based on our mainframe system. So we built this out with IBM technology, full stack IBM. So a streaming engine maybe streams close to 200 tables real-time. However, as the company demand for data increased, more and more data sources that weren’t part of the mainframe system were needed to basically hybridize and then deliver reports.
So this goes back to pre-COVID. So if we rewind back to where we’re talking, we’re talking now the end of 2019/early 2020. So the bottlenecking of the warehouse, the streaming engine, and the need for all these other data sources was putting a stress on our software engineering. In other words, we have to do a data engineering, software engineering solution to scrape together all this data and make copies into the warehouse.
We had a small team at the time. Data engineering was small. So it was a few of us kind of hacking away, trying to put together all this data. The alternative would be to then rely on the core software teams, and it seemed like we’re going backwards. In other words, we’re starting a data team, we’re starting a data engineering team, but now we’ve got to pull in nuclear teams that are managing our core transactional systems to move data around.
So Q1 2020 is when we started to think exactly like Chad was saying. One, we have this data model, non-agnostic data models. So we have data models rolled up in Cognos. If you don’t know, with Cognos you can’t really access some of these complex data models like for sales and supply chain.
The other problem with all these other data sources was the high demand that other departments were seeing, especially when they see this real-time streaming data. They’re like, “Hey, I want real-time KPIs, but my data isn’t in this mainframe system.”
So the company made the decision to do a proof of concept with Denodo, exactly what Chad said. The partner said that virtualization might be the approach for you guys. It might be able to connect instead of having to rely on engineering resources to kind of pull it all together.
So now COVID is popping up, and as a company we’re in this paradigm shift, this data paradigm shift at the same time that the world is starting to freefall. So going into February and April when everything starts to shut down, now we’re shutting down. We’re flattening the curve. We’re reduced now because of furloughs. Everybody got furloughed. It was a tough decision to make.
We were reduced to skeleton teams during COVID, with this virtualization kind of potential solution, but everybody is bleeding. Nobody knows what the future is going to be. It’s all uncertain. The company made the decision not to play defense, not to say, “Let’s just kind of bunker down and ride it out, till we’re all dead maybe.”
The company made the decision, “Okay, let’s go on offense. Let’s take advantage of the fact that we have very little transactions. We have skeleton teams of our top skilled people left. Let’s build out an aggressive data architecture. Now is the time to do it, so we go on offense.”
So the company made the decision. At the time, Chad made the decision to say, “Let’s invest.” Where everybody is trying to cut costs in every potential area they could, Chad said, “Let’s pull the trigger on this thing _____.”
During the COVID lockdown, which took six weeks or something, we’re in Florida so we started to open up a little bit quicker. We built out our virtualization layer and connected all the data sources that all these other departments needed, and we started building KPIs that the company has never seen before, data models that the company has never been able to see because you’re talking about combining one, two, and three sources together into a nice cached data artifact that you can then attach any BI tool to and show the company data and gain insights that never before were possible.
So we come out of COVID, as we start to reopen, strong. We’re like the bull charging right out of COVID. The culture and philosophy of this kind of go on offense is not just data. It’s every department.
So we start to pick up market share, and you give a little bit of market share back as other companies finally come out of their shells, but we didn’t give it all back. So because of this kind of data surge and the company leadership taking the risk, to go on offense, we were able to build out a very strong data architecture and pick up market share.
We’ve been doing well, and we’ve got good reception from the data community with different speaking events. It really, really paradigm shifted us. We’re a different company now.
Mike Vizard: Chad, have you seen a difference in the way people are consuming that data? We’re talking about real-time now. Are you seeing your end users making better, faster decisions? Or what’s been the impact in the business?
Chad Simpson: Absolutely, Michael. I think we have a long way to go, because I think a lot of our consumers, they were so used to getting data batch and queue, so overnight, weekly, whatever it was. A lot of them aren’t used to the real-time data yet. They haven’t adapted to being able to leverage it real-time.
I like the term insights in the morning; action in the afternoon. You get the data in the morning and you’re reacting to it in the afternoon. But a lot of folks haven’t adapted to that, frankly. They’re so used to waking up and getting the reports from overnight, then drinking their coffee and figuring out what’s going on. But the reality is that we have the capability now of giving real-time data, real-time insights.
I would say it depends on each department. Our sales team, they’re consuming it real-time and they’re leveraging it real-time. We know that because whenever we have problems, they make us very aware that their data is not available real-time in the way that they’re leveraging it.
So I would say it depends on the department. It’s a cultural shift, but we’re on the right journey and we’re getting there and it’s adding a lot of value having it real-time.
Mike Vizard: Ryan, one of the challenges that a lot of people have is a lot of the end users don’t always trust the data, because they’re not sure if it’s quality, if it’s inconsistent. So how did you approach all that, to make sure that the data itself has the integrity required to drive real-time decision making?
Ryan Fattini: It’s brutal. That’s precisely correct. This was a brutal process that we dealt with going back a little bit before 2020. So by the time we got to the data virtualization layer, we pretty much had that in order, but it was brutal at first.
In other words, I would say the first six to eight months were when we were introducing conformed data. So it doesn’t look like it does in the source and we’re providing different types of data models, and a lot of different validation methods and also business rules. So in other words, it’s not just the data.
People might be calculating business written differently in their mind. It might be _____ taxes and the other one is including taxes. So not just getting the data validated, but also getting the business rules standardized and what it means to say what is _____ _____? What does it mean to say this is – [inaudible]? What does it mean to say – [inaudible – audio fades]?
So exactly correct. This was brutal for the first six to eight months and it required a ton of validation, like Excel sheets. Like here’s the source. Here’s the conformed layer, because the column made the difference. In other words, we had to prove out the data lineage over and over and over and over again for the first six to eight months.
Then eventually, as you start to win more and more and more of the arguments and you start to prove more and more and more of your concepts, the trust builds. But it’s exactly like you’re saying. It’s a brutal process and it does take time, and you really have to be prepared for that if you’re going to do any of these paradigm shifting data architectures or maneuvers.
You’ve got to be prepared for kind of a brutal data validation phase with your stakeholders, exactly like Chad said, the people who have been consuming the same data the same way for 15 years and now you’re doing something completely different. There’s going to be a lot of pushback, so you have to prove your case, and we did.
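The validation Ryan describes, checking the conformed layer against the source under one agreed business rule, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the order IDs, the amounts, and the `sales_amount` and `reconcile` helpers are hypothetical and are not City Furniture’s actual process or tooling.

```python
from decimal import Decimal

# Hypothetical source rows: (order_id, amount_before_tax, tax)
source_rows = [
    ("A100", Decimal("899.00"), Decimal("62.93")),
    ("A101", Decimal("1249.50"), Decimal("87.47")),
    ("A102", Decimal("310.00"), Decimal("21.70")),
]

# Hypothetical conformed layer: order_id -> reported sales amount
conformed_rows = {
    "A100": Decimal("899.00"),
    "A101": Decimal("1336.97"),  # tax was included here, breaking the rule
    "A102": Decimal("310.00"),
}

def sales_amount(amount, tax, include_tax=False):
    """The single agreed business rule: sales exclude tax unless stated."""
    return amount + tax if include_tax else amount

def reconcile(source, conformed, include_tax=False):
    """Return (order_id, expected, actual) for every row that disagrees."""
    mismatches = []
    for order_id, amount, tax in source:
        expected = sales_amount(amount, tax, include_tax)
        actual = conformed.get(order_id)
        if actual != expected:
            mismatches.append((order_id, expected, actual))
    return mismatches

mismatches = reconcile(source_rows, conformed_rows)
for order_id, expected, actual in mismatches:
    print(f"{order_id}: expected {expected}, conformed layer has {actual}")
```

In this sketch the only flagged row is the one where two people applied different tax rules, which is the kind of disagreement that has to be settled as a business rule rather than as a data fix.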
Mike Vizard: Chad, do you think that this is part of some larger trend, where we’re maybe shifting away or at least de-emphasizing batch-oriented applications and we’re shifting the enterprise to more of this real-time footing? Do people really understand the implications of doing that?
Chad Simpson: I do, Michael. I think it’s part of a broader shift in terms of technology in general. I hear the term event-driven, event-based. That’s where we see the world going, both with data and with technology in general, getting away from the batch and queue.
We’re big believers in Toyota and the lean philosophy. One of those principles is just in time, one piece flow. If you apply that to data and really tech, it’s the complete opposite of batch processing. That’s where we see data going, and we also see just processing in general. It’s the most efficient way to do that, which is event-driven, queue-based versus this batch and flow, which is the way we’ve done it for 20 years.
So we’re looking at all parts of our tech organization and determining where those bottlenecks are, batch processing. For instance, we have a nightly process. It’s a big batch process that generates all these reports at night.
I’m dissecting that right now and trying to figure out how can we just make that happen throughout the day ideally, so that it’s not this one-hour, two-hour batch at night, and it just happens all throughout the day, because that solves multiple problems from speed, efficiency, visibility. It’s really decoupling those things so that they’re happening more real-time. There’s lots of different value you get from that, but absolutely, more event-driven and less batch and queue across the board.
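The contrast Chad draws between a nightly batch and work that happens throughout the day can be shown with a small Python sketch. The store names, amounts, and the `nightly_batch` and `RunningTotals` names are hypothetical, chosen only to illustrate the idea; nothing here describes the company’s actual reporting jobs.

```python
from collections import defaultdict

# Hypothetical sales events arriving during the day: (store, amount)
events = [("S1", 1200), ("S2", 450), ("S1", 300), ("S3", 980)]

def nightly_batch(all_events):
    """Batch style: wait for the full day, then compute totals in one pass."""
    totals = defaultdict(int)
    for store, amount in all_events:
        totals[store] += amount
    return dict(totals)

class RunningTotals:
    """Event-driven style: update the report as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, store, amount):
        # The report is current after every single event.
        self.totals[store] += amount
        return self.totals[store]

live = RunningTotals()
for store, amount in events:
    live.on_event(store, amount)

batch = nightly_batch(events)
print(dict(live.totals) == batch)
```

Both approaches arrive at the same totals. The difference is that the event-driven version has an up-to-date answer at any point in the day, and no single large job has to run at night.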
Mike Vizard: Ryan, back in the day we all had storage admins and DBAs, and now we have data engineers. Is that a different mindset for the IT organization? And how do you get the rest of the company to appreciate the need for actual engineers versus mere mortal administrators?
Ryan Fattini: First, you need leaders that understand the vision like Chad. We had a great CFO, Steve Wilder. We had an enterprise architect that had the vision. So your leaders need to share the vision.
If the leaders share the vision and you can prove your case, everybody kind of falls in line, but there is this kind of paradigm shifting curve where there is resistance. There is resistance like, “Why do I need this?” and “Can you justify the expense?” If it’s something that’s completely unproven, that’s just common in an enterprise.
But when the company sees the leaders are behind it and the leaders can make a strong case for it, piece by piece it kind of all coalesces together. That’s what happened in our case.
So like you’re saying, we have software engineers. A data engineer is basically a software engineer that’s kind of applied to data. That’s kind of a generalized way to say it, but most of our data engineering comes from the software engineering, nuclear teams _____ differential, which means they’re bringing with them the same coding standards and protocols and everything and then applying that to data.
The extension of that is the data scientist. The way the industry is going, back in the day they had the backend developers and frontend developers, and then it kind of migrated to the full stack. The same thing is happening with data science. You have your data engineers and you have your data scientists. One of the problems that companies had with the bottlenecking is the data scientists couldn’t get the models into production, because they weren’t software engineers and they didn’t know DevOps.
So you’re getting this hybrid machine learning engineer role that you’re hearing, data science engineer role that you’re hearing more and more about, and we did this. So Chad approved the data science engineering role about six months ago, really forward-looking. In other words, we’re getting way ahead of this problem, which basically takes our software engineers and either trains them to understand data science to the level to be able to build a model, or we have a couple Ph.D. candidates who are good data scientists, who are skilling up to be software engineers.
So we’re hybridizing the two together, and that’s the extension of exactly what you’re saying. Data engineering is a different thing, but it extends from software engineering, and then the data science engineering hybrid is extending from the data engineering. So we’re getting further and further and further away from the older kind of demarcated roles, and it’s going more towards this hybridization, where I believe in the future your general software engineer is going to know machine learning.
In other words, it’s going to be part of what a software – the expectation of a software engineer is they’re going to know some machine learning. That’s where I believe we’re going to be headed. We’re going to be even more hybridized. But we’ll see.
Mike Vizard: Chad, what is the company’s expectation for the investment in data science? People talk about AI, but at the end of the day, what do you think the business executives are going to get out of that investment? Are they going to be able to see or guess or know who’s in the market for furniture when? How smart can smart get?
Chad Simpson: I think that’s a good question. We talk about descriptive and prescriptive analytics. We’re still heavily in the descriptive analytics phase, but we want to move more towards that prescriptive. Right now, we’re dealing a lot with known knowns, but we’ll start with more data science, more machine learning, more data.
We’re going to get more into those unknown unknowns, which is really telling the business things that they don’t even think about, just because the team can see correlations in data by mixing in other environmental variables. My hope is that they will start giving the business insights that they’re not even thinking about, they’re not even asking about, which will lead to other questions and other benefits.
We have a long learning curve just to serve each department head and things that they want to see, but I really think to take it to the next level it’s us using the data to tell stories that we’re not even thinking about yet. As we get more data, as we get more efficient, as we build up the team and the team understands our business more, which I think is very important for us and for any business, I think we’ll start getting more insights that we’re not even thinking about today, questions we’re not even asking.
Mike Vizard: Ryan, do you think that we forgot about data virtualization as a concept that’s been around for a while? Back when I was younger, people told me the root of all evil was moving data. So maybe the time has come again and we should rethink our whole approach to how we want to access and analyze data.
Ryan Fattini: I think so. The biggest pushback to data virtualization is performance. You have data source A, data source B, data source C, and you’re kind of at the mercy of the slowest data source. Sometimes, with some of these complex joins against multiple data sources on a server, it can be a little bit clunky. Sometimes queries can hang.
So the biggest industry pushback to virtualization is performance, but, like everything, performance improves. Performance increases. It gets better and better and better. So I think when performance is at a level where everybody is absolutely comfortable plugging 50 data sources into a virtualization layer, that is probably the future, probably the future that we’re simply connecting everything into one virtualization layer, and everything is in place and it can reduce a lot of cost and insights can be immediately available.
The performance needs to get there, but I believe that’s the direction we’re headed. Some years down the road, we’re going to be almost entirely virtualization. The data lakes, the data warehouses, the data marts, all this schlepping of data like you just mentioned is probably going to go away. We need to get the performance and the tech up to speed, but the philosophy and the concept I think is the direction.
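Ryan’s point about being at the mercy of the slowest data source can be illustrated with a short Python sketch. It is a simulation under stated assumptions: the source names and latencies are made up, `fetch` and `federated_query` are hypothetical helpers, and `time.sleep` stands in for a real query. It does not show how Denodo or any other product executes queries.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-source response times in seconds.
SOURCE_LATENCY = {"mainframe_replica": 0.05, "crm": 0.10, "web_analytics": 0.30}

def fetch(source):
    """Stand-in for a query pushed down to one underlying source."""
    time.sleep(SOURCE_LATENCY[source])
    return source, f"rows from {source}"

def federated_query(sources):
    """Query every source in parallel and wait until all have answered."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        results = dict(pool.map(fetch, sources))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = federated_query(list(SOURCE_LATENCY))
print(f"{len(results)} sources answered in {elapsed:.2f}s")
```

Even with all three sources queried in parallel, the combined result cannot come back sooner than the slowest one responds, which is why caching or speeding up that one source matters more than tuning the fast ones.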
Mike Vizard: Chad, what do you know now that you wish you knew three years ago, when you first started on this whole adventure?
Chad Simpson: That’s a good question. I would say we did some trial and error to get to where we are. It wasn’t a perfect, smooth journey. I would say really having a good plan to curate the data before you start trying to move it and use it, spending a lot of time on that upfront.
I think we had to go back and take multiple steps with that to get to a point where we had good, clean, curated data that we felt good about. I think if we would have had more of a planning phase for that upfront, it would have prevented some of the rework downstream.
Some of that is very hard to not have to go through a rigorous process, just because of the nature of that data and the company. It’s typically not in a state that’s really consumable, so it does take work. But I would say more time planning upfront on curation and the cleansing of the data before moving it around or exposing it would be my number one.
Mike Vizard: All right. Ryan, you are King for the Day of all things related to data and storage. What’s the one thing you wish the industry would kind of fix to make your life a little easier?
Ryan Fattini: Honestly, I don’t even think it’s the tech. I think if there was a solution to culture. What happens is the tech moves fast, and it moves faster a lot of times than the culture. So I don’t have a wish tech-wise, but I wish we had a better way to play the soft game of how to catch the culture up sometimes, the speed, because our tech moves faster. We move very fast. Things change quickly.
Then when you’re talking about paradigm shifting types of cultural changes, it sometimes slows everything down. I believe our data team and our tech team are sometimes far ahead of the way the actual business culture is. I think we’re ahead of them a little bit, and that can be frustrating at times because without those communication skills, without the ability to communicate what you’re doing so they can understand, it can kind of stall a little bit.
So I don’t think tech is a problem. I think any company can deal with this. I think understanding how to massage the culture in a way to understand these types of paradigm shifting moves, especially as it relates to data, if I were King for a Day, that would be my answer.
Mike Vizard: All right. You heard it here, folks. For the first time after three or four decades, IT is moving faster than the business can absorb. We all heard about the business complaining about IT for three decades, and we’ve reached some seminal moment.
Gentlemen, thank you for being on the show.
Chad Simpson: Thanks, Mike. I really appreciate it. I had a good time.