
The year 2024 will be remembered as a year of data consolidation, increased data volumes, stronger governance and a focus on artificial intelligence (AI)-driven data quality and data democratization.
Although foreseeing what will happen in the year ahead has become something of a cliché, it’s still an enlightening exercise.
In the spirit of tradition, here are a few predictions for the data landscape in the year ahead from David Jayatillake, VP of AI at Cube:
1. Priority #1 will be accurate, performance-optimized, reliable data.
Is there an AI winter coming? Maybe. Businesses have invested heavily in AI, and many are not getting the expected benefits. AI disillusionment is primarily due to one crucial factor: AI projects depend heavily on quality data.
In 2025, organizations will stop pointing AI directly at their messy data warehouses. Instead, they will put an abstraction layer, a universal semantic layer, between cloud data platforms and data consumers such as AI, so they can enrich their data with context and meaning and avoid hallucinations.
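To make the idea concrete, below is a minimal, hypothetical sketch (not any vendor’s actual implementation; the metric, table and column names are invented) of how a semantic layer sits between an AI consumer and the warehouse: the consumer asks for a named, governed metric, and only definitions that exist in the layer can be resolved into SQL, which leaves no room for hallucinated tables or columns.

```python
# Hypothetical sketch of a semantic layer mediating between an AI consumer
# and a cloud data warehouse. Metric, table and column names are invented.

SEMANTIC_MODEL = {
    "monthly_active_users": {
        "description": "Distinct users with at least one session in the month",
        "sql": (
            "SELECT date_trunc('month', session_ts) AS month, "
            "COUNT(DISTINCT user_id) AS monthly_active_users "
            "FROM analytics.sessions GROUP BY 1"
        ),
    },
    "net_revenue": {
        "description": "Gross revenue minus refunds, in USD",
        "sql": (
            "SELECT date_trunc('month', order_ts) AS month, "
            "SUM(amount_usd - refund_usd) AS net_revenue "
            "FROM analytics.orders GROUP BY 1"
        ),
    },
}


def resolve_metric(requested_name: str) -> str:
    """Return governed SQL for a known metric, or fail loudly.

    An AI assistant that can only request metrics defined here cannot
    hallucinate a table or column that does not exist.
    """
    metric = SEMANTIC_MODEL.get(requested_name)
    if metric is None:
        raise KeyError(f"Unknown metric: {requested_name!r}")
    return metric["sql"]


# The AI consumer asks for a business concept, not a raw table:
print(resolve_metric("monthly_active_users"))
```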
2. We’ll see more open, cloud-based storage.
In past years, storage has been coupled to the data warehouse engine, whether Databricks with Iceberg or Snowflake with its own scalable cloud blob storage. With the advent of open storage layers, companies will be able to use whatever compute they want: they are no longer locked in, and they can use more tools to access the same data easily and efficiently.
Open, cloud-based data storage also typically requires less technical configuration by the user. Instead, modern data stack companies provide security, maintenance and updates as part of their service. This makes modern data stacks more flexible and efficient than their legacy counterparts.
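As a small illustration of that decoupling (a hedged sketch; the bucket path and table layout are placeholders, and credentials for cloud access are assumed to be configured), two different engines can read the same open files in cloud blob storage without either one owning the data:

```python
# Hypothetical example: the same Parquet files in open cloud storage, read by
# two different compute engines. The bucket and path are placeholders, and
# cloud credentials are assumed to be configured for both libraries.
import duckdb
import pyarrow.dataset as ds

ORDERS_PATH = "s3://example-bucket/warehouse/orders/"  # placeholder location

# Engine 1: DuckDB scans the files directly with SQL.
recent_orders = duckdb.sql(
    f"SELECT order_id, amount_usd FROM read_parquet('{ORDERS_PATH}*.parquet') "
    "WHERE order_ts >= DATE '2024-01-01'"
).df()

# Engine 2: PyArrow reads the very same files for a Python or ML workflow.
dataset = ds.dataset(ORDERS_PATH, format="parquet")
orders_table = dataset.to_table(columns=["order_id", "amount_usd"])

# Neither engine required copying the data into a proprietary store first.
```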
3. No-code visual data modeling will become the norm.
In 2025, visual interfaces coexisting alongside code-first data modeling platforms will become standard practice, as large enterprises increasingly adopt data modeling tools and business users collaborate with data engineers.
In the past, contributing to data models required a solid understanding of code, a barrier that limited participation to those with a technical background. Visual interfaces will allow data analysts and business users to contribute directly to data models.
4. Data products and embedded analytics will become table stakes.
Data products and embedded analytics are further along than people realize and will become table stakes in 2025. Many businesses already sell data, data products and analytics suites integrated into enterprise software packages.
As people grow accustomed to data features in the products they already use, vendors will build even more data-product features into those existing applications.
5. Universal semantic layers will be a boon for data security.
Teams frequently store data at rest in various places, making sensitive data vulnerable. The recent Snowflake incident wasn’t an actual breach of the platform at all; it stemmed from customers’ exposed credentials, which bad actors then used to log in. With a universal semantic layer that aggregates, limits and controls access to data for most use cases, businesses can avoid the risks that come with everyone accessing data platforms directly without proper access controls.
Semantic layers have many features that make access easier, and, in a double-sided way, they also make data more secure: because the layer serves correctly aggregated results, people get good answers to their queries while their access to the underlying data stays limited. Semantic layers also define the data, its meaning and the entities it relates to, so how to secure the data becomes much more apparent. For example, rather than data residing in a column, table or file with a difficult-to-decipher name that makes it impossible to know whether it contains personally identifiable information (PII), the semantic layer classifies the data, flagging an email address as PII, for instance.
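A minimal, hypothetical sketch of that idea follows (the column names, labels and masking policy are invented, not any vendor’s syntax): because the semantic model carries the PII classification, the enforcement point no longer has to guess what a cryptically named column contains.

```python
# Hypothetical sketch: a semantic model that classifies fields as PII and
# masks them for users without clearance. Names and policy are invented.

CUSTOMER_MODEL = {
    "dimensions": {
        "cust_em_1": {"label": "email_address", "pii": True},  # cryptic column, clear meaning
        "cust_rgn": {"label": "region", "pii": False},
        "ltv_usd": {"label": "lifetime_value", "pii": False},
    }
}


def visible_fields(user_can_see_pii: bool) -> dict:
    """Map business-friendly labels to columns, masking PII if not cleared."""
    allowed = {}
    for column, meta in CUSTOMER_MODEL["dimensions"].items():
        if meta["pii"] and not user_can_see_pii:
            allowed[meta["label"]] = "REDACTED"  # served masked, never raw
        else:
            allowed[meta["label"]] = column      # resolved to the real column
    return allowed


print(visible_fields(user_can_see_pii=False))
# {'email_address': 'REDACTED', 'region': 'cust_rgn', 'lifetime_value': 'ltv_usd'}
```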
6. Organizations will stop trying to build their own universal semantic layers.
Any company that consumes data, whether through a BI tool or a customer-facing application, often starts by writing bespoke SQL every time it needs to access data, then builds its own data governance, definitions, APIs and caching to protect the underlying data platform. These businesses soon realize the effort is incredibly costly, wastes time and consumes too many engineering resources. Instead of building all of that themselves, they prefer to buy a pre-built system that a dedicated team of engineers has developed and hardened over five years.
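To illustrate the build-versus-buy trade-off, here is a hedged sketch of what consuming a shared semantic layer looks like; the endpoint URL, query shape and token handling are hypothetical rather than any specific product’s API. Every consumer sends a small declarative query to one service, which applies the definitions, access controls and caching in a single place.

```python
# Hypothetical sketch: querying a shared semantic-layer HTTP API instead of
# hand-writing bespoke SQL in every consumer. The endpoint and query shape
# are illustrative only, not a specific vendor's API.
import json
import urllib.request

SEMANTIC_LAYER_URL = "https://semantic-layer.example.com/v1/query"  # placeholder


def query_metric(measures, dimensions, api_token):
    """Ask the semantic layer for governed measures grouped by dimensions."""
    payload = json.dumps({"measures": measures, "dimensions": dimensions}).encode()
    request = urllib.request.Request(
        SEMANTIC_LAYER_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Every consumer (BI tool, customer-facing app, AI agent) asks the same way,
# so governance, definitions and caching live in one shared service.
rows = query_metric(["orders.net_revenue"], ["orders.region"], api_token="...")
```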
A Final Word
It’s human nature to want to know what’s coming next, so as a data geek, I can’t resist analyzing the ever-changing landscape and hypothesizing about where it’s headed.
As 2024 comes to a close, I’m glad to see so many organizations harnessing data to generate more informed insights, whether to advance their business goals, tackle the climate dilemma, further medical research or empower customers to better their lives.