In this Digital CxO Leadership Insights series video, Mark Van de Wiel, field CTO of Fivetran, explains why data is the key to resolving supply chain issues.
Mike Vizard: Hey folks, welcome to my latest edition of the Digital CxO videocast. I’m your host, Mike Vizard, and today we’re with Mark Van de Wiel, who is the field CTO for Fivetran. They do a lot of work in the data integration and transformation space. We’re going to be talking about the heavy lifting involved with the data that goes into digital transformation. Hey, Mark, welcome to the show.
Mark Van de Wiel: Yeah, happy to be here. Thank you, Mike.
Mike Vizard: I guess my first question to you is: Do you think folks underestimate how much work is required with data when they start launching these digital transformation initiatives? It seems to me that’s where all the heavy lifting is these days.
Mark Van de Wiel: Yeah, absolutely, that’s a good point. I think organizations really underestimate what it takes to take data from the multitude of data sources they deal with and consolidate it into a system in a way that lets them make real progress toward their digital transformation. It’s one thing to get some data into, call it, a “central data warehouse,” but it’s another thing to make sure that the data stays in sync, that it’s actually accurate, and that the data from multiple systems continues to flow.
That’s where, especially with software as a service systems, it becomes relatively complicated over time to keep the data flowing: the API changes, the interface for retrieving the data changes, and then the connector breaks. Maybe there are problems with data quality, and we didn’t actually retrieve the right data. How do you recognize this? How do you manage it?
Because at the end of the day, as a data-driven organization, you’re going to make decisions based on the data that you’ve pulled together, and what if that data wasn’t accurate? Then your decisions are only going to be as accurate as the data that arrived there in the first place. What organizations underestimate, to your point, is what is really involved in getting that system set up and getting those ongoing feeds into that centralized environment.
Mike Vizard: Part of that issue is, if I look back in time, most of the applications out there are what we would call “batch oriented,” and they’re being updated once every 24 hours, hopefully. But as we move into the brave new world of digital business transformation, where everything is operating in near-real time, how hard is it to keep all that data in sync so you don’t wind up with a scenario where, for example, someone goes online, thinks an item is going to be available in a certain store, gets there, and it’s not there?
Mark Van de Wiel: Yeah, that’s indeed the kind of problem space we operate in at Fivetran. We look at replication technologies. At a high level, I describe it as two challenges that we’re trying to solve.
On the one hand, there are systems that might still be running in an on-prem environment, or maybe they’re cloud hosted, but they’re basically database applications, and you’re looking at the transactions that take place in the database, right? An item gets sold, the database gets updated, and there’s one less item in the inventory. You want to keep that data visible to your customers, because as they approach the store they want to know if their item is in stock.
So it’s those kinds of applications, and for those applications there’s a technology, or a concept, called “log-based change data capture,” which is a very nonintrusive approach to retrieving the changes and propagating them into another environment when you want to make that data available there. So that’s one kind of problem: the database applications.
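Log-based change data capture is easier to picture with a toy model. The Python sketch below is not Fivetran’s implementation; it assumes a hypothetical, already-decoded change log in which each event carries a log sequence number (LSN), an operation, a primary key, and the full new row image. The replica replays events in LSN order and keeps a checkpoint, so redelivering a batch after a restart does not double-apply changes, and the source tables are never queried directly.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                               # position in the hypothetical transaction log
    op: str                                # "insert", "update" or "delete"
    key: Any                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # full new row image; None for deletes


class Replica:
    """Target table kept in sync by replaying change events in LSN order."""

    def __init__(self) -> None:
        self.rows: Dict[Any, Dict[str, Any]] = {}
        self.last_lsn = 0  # checkpoint: every event up to this LSN has been applied

    def apply(self, events: List[ChangeEvent]) -> None:
        for ev in sorted(events, key=lambda e: e.lsn):
            if ev.lsn <= self.last_lsn:
                continue  # already applied, e.g. redelivered after a restart
            if ev.op in ("insert", "update"):
                self.rows[ev.key] = dict(ev.row or {})
            elif ev.op == "delete":
                self.rows.pop(ev.key, None)
            else:
                raise ValueError(f"unknown operation: {ev.op}")
            self.last_lsn = ev.lsn


# Usage: an item is stocked, then one unit is sold.
log = [
    ChangeEvent(1, "insert", "sku-42", {"store": "Boston", "qty": 3}),
    ChangeEvent(2, "update", "sku-42", {"store": "Boston", "qty": 2}),
]
replica = Replica()
replica.apply(log)
replica.apply(log)  # replaying the same batch changes nothing
print(replica.rows["sku-42"]["qty"])  # prints 2
```

Carrying the whole row image in each event keeps the replay logic trivial; a log that carried only changed columns would need a merge step instead.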
Then the other kind of problem, or challenge, that we address at Fivetran is really the software as a service applications. Say you’re a vendor and you’re using Salesforce, right? The grandfather of software as a service, if you like. Then maybe you’re doing ticket management in ServiceNow, you’ve got a support system out on Zendesk, and you’re doing all kinds of marketing activities through Google Analytics or HubSpot or Marketo. You want to combine all that data, and, to the point about inventory, the items you have in stock, then provide a holistic overview, go through the motions of the digital transformation toward the data-driven enterprise, and build solutions around all of that.
So now, as it relates to the software as a service applications, that’s where some of the challenges come in. It’s like, “Hey, the vendor changed the API.” Or, “Hey, we’re pulling information about the Facebook ads that we placed, Facebook decided to change the API, and now our pipeline is broken. How do we fix it?”
This goes to the earlier point about the challenges. This is where some of the technology comes in: we provide a managed service where the customer doesn’t need to worry about the vendor changing the API, because we adjust the pipeline, or the connector, when there are changes so that the data can continue to flow. As an organization, instead of focusing on what it takes to pull the data together, you can focus on solving your data problems. You can find out the best way to act: “What marketing campaign should we run to promote the items that we have in excess inventory?” as an example.
Mike Vizard: You guys also do some work in the area of what you’re calling “data lineage graphs,” I think. Is data essentially starting to have an identity? Is that part of the whole process of keeping track of what data is where? What do the graphs do, and how are you approaching that whole issue?
Mark Van de Wiel: Yeah, so from a data lineage perspective, you want to know where the data came from, so that you understand whether the data is accurate and what the impact is when the data changes. As an organization, you might also need to know this from a customer and privacy perspective; there’s GDPR, et cetera.
When individuals approach the company and ask that their data be removed, you’re going to want to know where that data comes from, and that’s where data lineage comes in.
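A lineage graph can be modeled as a directed graph from each dataset to the datasets derived from it. The Python sketch below is a generic illustration, not how Fivetran or any particular product stores lineage; the graph and every dataset name in it are hypothetical. It walks the graph breadth-first in both directions: upstream to see where a dataset came from, and downstream to list every place a record from a given source may have landed, which is the list you would need to check when handling a GDPR erasure request.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each dataset maps to the datasets derived from it.
LINEAGE: Dict[str, List[str]] = {
    "salesforce.contact": ["warehouse.contacts"],
    "zendesk.user": ["warehouse.support_users"],
    "warehouse.contacts": ["warehouse.customer_360", "marketing.audience"],
    "warehouse.support_users": ["warehouse.customer_360"],
    "warehouse.customer_360": ["bi.churn_dashboard"],
}


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Every dataset reachable from `start`, found breadth-first."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Every dataset `start` was derived from: reverse the edges, then walk."""
    reverse: Dict[str, List[str]] = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(reverse, start)


# Where could a Salesforce contact's personal data have ended up?
print(sorted(downstream(LINEAGE, "salesforce.contact")))
# Which sources feed the churn dashboard?
print(sorted(upstream(LINEAGE, "bi.churn_dashboard")))
```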
Mike Vizard: We also hear a lot about data gravity these days. Does data tend to have gravity, and how much? Are there copies of data that exist in both, say, cloud and on-premises environments that are pulling at each other? What is this concept of data gravity, and how should I think about it?
Mark Van de Wiel: Yeah, so with data gravity, think about data and the different systems, right? I alluded to the consolidation of data, and that’s arguably where gravity comes in: you want to pull together some of the data. If you think about the different systems, we often talk about data silos, right? “Yes, we have an ERP system, and we manage our finance and our supply chain through it.” “But then we have a CRM system, and that becomes the place where we manage our customer information.” “Then we have a campaign management system, and that becomes its own silo.”
Again, it’s in the effort to pull together data from these different sources that data gravity comes in. You want to pull it into a central environment where you can do cross-system analysis, if you like. Then you want the necessary analytical power to execute the kinds of routines or analyses you want to run, and maybe some of the most forward-thinking organizations are looking into machine learning algorithms and artificial intelligence, so where does that fit in?
We might have social media feeds, we might have internet of things data sources that feed _____ of generated data into our system. How can we combine that with our manufacturing information to identify which parts to produce next, because we’re going to need them for preventive maintenance? Those kinds of scenarios are where, from a data gravity perspective, you want to pull together the data from the right sources and get it into a central environment.
Mike Vizard: When I was much younger, moving data was considered a last resort. People said only bad things can happen when you move data. There will be a security issue. There will be a cost issue. Now it seems like we’re routinely shifting data, but do people really appreciate some of those finer points around security?
Mark Van de Wiel: Absolutely. Security is very important; it’s highly critical to customers, and from a company perspective we make sure we have the security certifications in place so that companies can feel comfortable that a third-party auditor has assessed how we handle data, how we deal with customer data, and how we manage it. So yes, security is indeed a very important aspect. A lot of thoughts came to mind when you made that comment about the history of copying data. You think about how infrastructure has changed, how the internet becoming really popular provided new opportunities, and then how the cloud vendors came along.
If you think back a couple of decades, people would have to make a very significant investment in an environment that was going to run on-premises. It needed to be scoped quite carefully, because it was a very significant investment and it had to last for at least five years. Ideally it would be scalable, but arguably it would have to be sized for the peak load expected over the course of those five years. Only then were you going to deal with your data challenges.
One thing the cloud provides is an arguably infinitely scalable environment, not only from a technology perspective but also through a pay-as-you-go model. So the risk you take in deciding to try this out, before you start copying your data, is very low. It’s not like you have to put down a seven-figure investment that you know is going to be written off over five years. No, you can start today, and if you don’t like it, next month you turn it off and stop paying for it. So that’s one aspect.
Then the other thing about copying data: what we’ve seen in organizations is growing data volumes and growing complexity, and everybody is online 24/7 and serving their customers 24/7. As a result, systems no longer have the kind of downtime that they once might have had. So there is less opportunity for, let’s say, batch operations, and the systems also become more critical, right?
Any disruption to the system at any point during the day is going to hurt your organization, because you are generating revenue 24/7. Because of that, if you didn’t copy your data, you’d be hitting your production systems with, let’s say, the analysts’ queries. It’s Monday morning on the East Coast, most analysts start their working day, they start hitting the system, and suddenly you see a flood of activity on the same systems that process the transactions. Imagine your systems start slowing down as a result. As much as it’s prime time for the analysts to run their reports, it might also be prime time for your customers to make their purchases and drive your business. You don’t want that kind of impact.
So I think copying data has certainly evolved into a more acceptable concept. Also, with the security certifications and the proven practices of the public cloud vendors, there is certainly a shift that has happened, and continues to happen, across industries, from the less conservative to the more conservative ones, all buying into the concept of copying the data and likewise adopting cloud technologies.
Mike Vizard: Back in the day we had these ETL tools, and a specialist would come along and move the data for you, and it wasn’t considered this hip function. But today data engineer is about the hottest title going in IT, and you’re hard pressed to find these folks.
Are we moving down a path where this whole process will get democratized and your average end user will just start moving data around as they see fit, or will we always need a specialist?
Mark Van de Wiel: We’re hoping and anticipating that we’re getting closer and closer to the end user, but at the same time I would recognize that organizations have gradually shifted their focus, right? There used to be a large IT department, and the IT department was going to solve all of the IT problems across the organization; whether it was desktops, database applications, or server infrastructure, everything was done by the IT department.
Then at some point the CIO or the CEO decided, “Well, there’s a cost center over there called ‘IT,’ and we want to save costs, so we’re going to cut down the IT department, shrink it, and minimize it.” But at the same time there are needs, absolute needs, from an analytics perspective, right? Even as we’re driving down the cost of the IT department, we’re realizing, “Okay, but we want to digitize the organization. We want to run these analytics to become smarter about our customers. We want to outsmart and outperform the competition.” All of these things. So there is this need for analytics.
So we’ve seen a shift from, let’s say, the tech-savvy people in the IT department to more and more tech-savvy individuals in the different business units, right, whether it’s marketing, sales operations, or the manufacturing side. We’ve seen more and more tech-savvy individuals contributing to these organizations. Those users are the ones who can move the data. They have access to a system over here. They have access, again, on a pay-as-you-go, low-risk basis, right? If they know what technology they need for the analytics they want to run, they’re going to want to move the data.
As a data integration vendor, what we look at is making it easier and easier for those kinds of users to meet their integration requirements. To your point exactly, it becomes more of a democratization of the data, or of access to the data, where the users who move the data need to be less and less tech savvy as time goes on. Again, we’re really focused on enabling those kinds of users, because that’s where we see the need going forward.
Mike Vizard: I don’t think it’s a secret that the way data is managed in most organizations would not get a Good Housekeeping seal of approval; it’s a bit of a mess. Is that going to get better? I think it’s a critical issue, because a lot of business leaders don’t always trust the data: they’re not sure of its quality or provenance, or whether it conflicts with something in another system. So are we getting to the point where we can rely on the data?
Mark Van de Wiel: Yes, that’s a great question. I think that is a very important aspect of the kind of business we’re in, right? At the end of the day, as an organization, or as an executive in an organization, you need to be able to trust your data. If that trust is lost, then arguably all of the cost invested in getting the data to the point where you’re looking at it is wasted, and the entire solution is wasted with it. There is no value if the trust is not there.
What we provide in our technology suite is a data validation capability, so that users can go in at any moment and actually validate whether the data in the source is in fact in sync with the data in the destination. You’re making an incredibly important point here, because if the trust is not there, the solution is completely wasted, and in the end that is going to make or break your digital environment, the foundation for your digital transformation. That, I think, is a very key attribute that you as an organization want to watch out for: make sure you have the governance, the routines, in place to validate that your data is trustworthy.
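One common way to check that a source and a destination are in sync is to compare per-row fingerprints keyed by primary key. The Python sketch below is a generic illustration under that assumption, not a description of how Fivetran’s validation feature works; the `fingerprint` and `compare` functions and the sample rows are hypothetical. It reports rows missing from the destination, rows that exist only in the destination, and rows whose contents differ.

```python
import hashlib
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def fingerprint(row: Row) -> str:
    """Stable SHA-256 of a row; sort_keys makes column order irrelevant."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare(source: List[Row], destination: List[Row], key: str) -> Dict[str, List[Any]]:
    """Diff two row sets by primary key and row fingerprint."""
    src = {r[key]: fingerprint(r) for r in source}
    dst = {r[key]: fingerprint(r) for r in destination}
    return {
        "missing_in_destination": sorted(src.keys() - dst.keys()),
        "unexpected_in_destination": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


source = [{"id": 1, "qty": 2}, {"id": 2, "qty": 5}, {"id": 3, "qty": 0}]
destination = [{"id": 1, "qty": 2}, {"id": 2, "qty": 4}, {"id": 4, "qty": 1}]
report = compare(source, destination, key="id")
in_sync = not any(report.values())  # True only if all three lists are empty
print(report, in_sync)
```

Comparing hashes rather than raw rows means only short digests need to cross system boundaries; at larger volumes, one refinement is to hash whole key ranges first and drill into individual rows only where a range disagrees.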
Mike Vizard: All right folks you heard it here. The goal is to make better decisions faster, rather than just making more bad decisions faster.
Mark, thanks for being on the show.
Mark Van de Wiel: Excellent, thank you. This was a lot of fun, thank you.
Mike Vizard: All right, guys, thank you all for tuning in to this episode, and there’s plenty more on digitalcxo.com. Thank you.