The prevalence of big data received a shot in the arm from an unlikely source in early 2020. The widespread impact of Covid-19 saw companies turbo-charge their digital transformation plans, and a wider array of organizations began identifying opportunities by analyzing deep data lakes for insights. This enabled them to discover trends and patterns that were never before discernible.
But the tsunami of data generated by ubiquitous global digitization has strained storage and analytic capacity, sometimes limiting the data’s potential. With the rapid adoption of cloud technologies across virtually every industry, many of these obstacles have begun to fade. In their place, we now see exciting new potential for more practical, instant and accessible business intelligence. With this in mind, here are our top five predictions for how this will play out this year.
Data lake adoption will continue to grow
With digital adoption supercharged by the outbreak of Covid-19, data lakes have become a highly economical option for companies storing large volumes of data. The rise of remote and hybrid working has increased the need for data lakes that allow faster and more efficient data manipulation. With Microsoft, Google, Amazon and other tech giants actively encouraging the move to the cloud, adopting a data lake is becoming easier and cheaper.
As organizations migrate to the cloud and focus on cloud data lakes, they will also move to converge the data warehouse with the data lake. Data warehouses were optimized for SQL analytics, but the need for an open, straightforward and secure platform that can keep pace with rapidly emerging analytic requirements and machine learning will ultimately make the data lake the primary store for data. Adoption will continue in 2022 and beyond, with the data lake market expected to grow from $3.74 billion in 2020 to $17.60 billion by 2026, a CAGR of 29.9% over the 2021–2026 forecast period.
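As a rough check on these figures, the compound annual growth rate (CAGR) relates a starting value V_0 to an ending value V_n after n years. The calculation below assumes the growth compounds over the six years from the 2020 base to 2026; that is our reading of the figures, not a detail stated in the forecast itself.

```latex
\text{CAGR} = \left(\frac{V_n}{V_0}\right)^{1/n} - 1
            = \left(\frac{17.60}{3.74}\right)^{1/6} - 1
            \approx 4.706^{1/6} - 1
            \approx 0.295
```

That works out to roughly 29.5% a year, in line with the quoted 29.9%; the small gap may come from rounding or from a slightly different base figure in the underlying report.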
Streaming data and data at rest will unify
Big data analytics today draws on two primary sources: streaming data, and data residing in a database or data lake. Over the next year, we expect these sources to continue converging, with streaming and operational systems providing more unified analytics. The result will be better data-driven insights for operational decision-making, through lightweight analytics and improved predictive capabilities.
With a data lake, or even a simple database, queries can be fairly complex because they do not have to contend with dynamic data flows that require extensive resources to process. Streaming data is fluid, and its resource demands and constant additions require that its queries remain superficial. As a result, today’s predictions for financial markets, supply chains, customer profiling and maintenance, repair and overhaul (MRO) are limited, often based on lightweight, “shallow” data.
This year, we’ll see a steady increase in cloud-based storage and applications, providing the elasticity needed to eliminate these resource limitations and replace the familiar centralized architecture. Performing analytics on distributed clusters, and aggregating the results of both streaming and operational data sources on other clusters into a single pane of glass, will become the norm. The result will be truly comprehensive predictive models that take the best from a data lake’s deep data and a streaming source’s live data flows.
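One way to picture the “single pane of glass” is a minimal Python sketch, assuming the metric of interest can be reduced to mergeable summary statistics. Each source (a batch scan of data at rest, or a stream processed one event at a time) keeps a count, a mean and a running sum of squared deviations, and summaries from different clusters are combined with the well-known pairwise formula of Chan et al. The `Summary` class, the `summarize` helper and the sample values are hypothetical illustrations, not any product’s API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Summary:
    """Hypothetical mergeable summary: count, mean and M2 (sum of squared deviations)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's one-pass update: constant work and memory per stream event.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        # Pairwise combination (Chan et al.) of summaries built on separate clusters.
        if self.count == 0:
            return Summary(other.count, other.mean, other.m2)
        if other.count == 0:
            return Summary(self.count, self.mean, self.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two observations.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


def summarize(values: Iterable[float]) -> Summary:
    # Batch path: e.g. a scan over historical records in a data lake.
    s = Summary()
    for v in values:
        s.add(v)
    return s


if __name__ == "__main__":
    # Hypothetical order values already stored at rest ...
    at_rest = summarize([120.0, 95.5, 130.25, 110.0])
    # ... and a live stream updated one event at a time.
    live = Summary()
    for event in [140.0, 150.5]:
        live.add(event)
    # One combined view over both sources, without moving the raw data.
    combined = at_rest.merge(live)
    print(combined.count, combined.mean, combined.variance)
```

The design choice here is that the streaming side only ever performs cheap, constant-work updates, which respects the “superficial query” constraint described above, while the merge step lets results computed on separate clusters be combined exactly rather than approximately.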
Data sharing will become pervasive
Beyond the technical advantages of cloud migration (hardware support, storage and bandwidth limits, backup, and security), perhaps the most obvious benefit is the ability to share data that is no longer stored physically within a company’s internal network. Providing valuable data to third parties, whether it is of strategic or financial use or needed for compliance, becomes simpler and more streamlined for both the provider and the consumer. One significant benefit: the data lake and streaming data analysis discussed above gains a new consumer base. Whether a provider starts with a commercialized, public-facing marketplace like AWS’s, or with an internal sharing platform like Snowflake’s (for internal departments and some verticals), the same paradigm applies, and it offers fundamental improvements over the complex, multi-step systems and policies in place today. Cloud providers will offer both kinds of data exchange in order to capture the market for “intranet and internet” data providers and their consumers.
Query engines will become smarter, seamlessly adapting to process unprepared data
Database optimization is being sped up and improved by baking machine learning (ML) right into the database. It’s a prime use case, as the ML has access to its most valuable resource for building effective models: massive amounts of anonymized data, within a well-defined structure and context. We have already seen progress here, with query engines creating or dropping indexes as they sense the need, but this is just the beginning, and momentum will snowball. The trend is also driving a growing separation between data storage and data consumption. Next-generation engines will embrace this separation by applying dynamic acceleration strategies, such as caching and indexing, based on the patterns and behavior of the analytical workload. The philosophy behind this revolution is to ‘let the engine work for you’: rather than expecting the data to be prepared, the engine adjusts itself to the data it encounters. This capability will become a must-have rather than a nice-to-have as customers discover both cost savings and improved performance.
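To make “creating or dropping indexes as the engine senses the need” concrete, here is a deliberately simple Python sketch using the standard-library `sqlite3` module. It is a hypothetical, rule-based stand-in for the ML-driven behavior described above: the `AdaptiveIndexer` class, its thresholds (`create_after`, `window`) and the `orders` table are assumptions for illustration, and real engines rely on far richer cost models.

```python
import sqlite3
from collections import Counter


class AdaptiveIndexer:
    """Hypothetical workload-driven index advisor (not any real engine's logic).

    Counts how often each column is used in an equality filter, creates an
    index once a column reaches `create_after` uses, and drops indexes whose
    column has not appeared in the last `window` queries.
    """

    def __init__(self, conn: sqlite3.Connection, table: str,
                 create_after: int = 3, window: int = 10) -> None:
        self.conn = conn
        self.table = table
        self.create_after = create_after
        self.window = window
        self.hits = Counter()   # column name -> number of filter uses
        self.recent = []        # columns used by the most recent queries
        self.indexed = set()    # columns that currently have an index

    def query(self, column: str, value):
        # Identifiers cannot be bound as parameters, so this sketch assumes
        # trusted table/column names; the filter value itself is bound safely.
        rows = self.conn.execute(
            f"SELECT * FROM {self.table} WHERE {column} = ?", (value,)
        ).fetchall()
        self._observe(column)
        return rows

    def _observe(self, column: str) -> None:
        self.hits[column] += 1
        self.recent = (self.recent + [column])[-self.window:]
        # Create an index once the column is used often enough.
        if column not in self.indexed and self.hits[column] >= self.create_after:
            self.conn.execute(
                f"CREATE INDEX IF NOT EXISTS idx_{self.table}_{column} "
                f"ON {self.table}({column})"
            )
            self.indexed.add(column)
        # Drop indexes that the recent workload no longer uses.
        for col in list(self.indexed):
            if col not in self.recent:
                self.conn.execute(f"DROP INDEX IF EXISTS idx_{self.table}_{col}")
                self.indexed.discard(col)
                self.hits[col] = 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, "EU" if i % 2 else "US", "open") for i in range(100)])
    advisor = AdaptiveIndexer(conn, "orders")
    for _ in range(3):
        advisor.query("region", "EU")
    print(sorted(advisor.indexed))  # ['region']
```

If the workload later shifts to filtering on `status`, that column earns its own index after three queries, and the `region` index is dropped once it falls out of the ten-query window: the data is never “prepared” up front, and the acceleration structures follow the workload.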
Predictive analytics will drive the next generation of digital applications
With analytics drawn from dynamic data feeds and data lakes merging, as discussed earlier in this blog, access to these insights will need to be reimagined. The classic dashboards used for “data storytelling” are based today on historical data that is carefully collected, queried and gathered into a report for periodic review. It’s good stuff, but it’s outdated by the time it’s compiled and presented.
As we move through 2022, the dashboard will remain in use, but its content will be live and dynamic, reflecting events as they happen and drawn from processes built right into application code. Significantly, access to this information will also be democratized across all relevant internal departments, available directly to tactical teams like sales, marketing, QA and others, rather than having to be parsed, interpreted and distributed by a data department. With live trend analysis, these departments can adapt and improve much more quickly than with today’s longer-term cycles. Because business value often depends on how people react and behave, rather than simply on following the money, this game-changing drive toward prediction is an exciting “perfect storm” of new advances in cloud, database and analytics.
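As an illustration of a live trend signal computed inside application code rather than in a periodic report, here is a minimal Python sketch assuming one simple rule: compare the mean of the latest window of events with the mean of the window just before it. The `LiveTrend` class, the window size and the sign-up numbers are hypothetical; a production system would more likely use a stream processor and more robust statistics.

```python
from collections import deque
from typing import Optional


class LiveTrend:
    """Hypothetical in-app trend tracker over a sliding window of recent values."""

    def __init__(self, window: int = 5) -> None:
        self.window = window
        # Keeps at most two windows' worth of values; older ones fall off automatically.
        self.values = deque(maxlen=2 * window)

    def update(self, value: float) -> None:
        self.values.append(value)

    def trend(self) -> Optional[float]:
        # Change between the current and previous window means,
        # or None until both windows have been filled.
        if len(self.values) < 2 * self.window:
            return None
        data = list(self.values)
        previous = sum(data[: self.window]) / self.window
        current = sum(data[self.window:]) / self.window
        return current - previous


if __name__ == "__main__":
    # Hypothetical sign-ups per minute, fed in as they arrive.
    tracker = LiveTrend(window=5)
    for signups in [10, 12, 11, 9, 13, 15, 18, 17, 20, 22]:
        tracker.update(signups)
    print(round(tracker.trend(), 1))  # 7.4
```

Because the tracker holds only a fixed number of recent values, it can run inside the application itself and feed a dashboard continuously, which is the shift from periodic reporting described above.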
As should be clear, 2022 appears to be the year of confluence. The merging of several technological paradigms that have matured steadily over the past few years is set to create an analytics ecosystem that is less compartmentalized, less reliant on historical data and less resource-constrained.
Companies with the most to gain are those who value their ability to quickly adjust their processes and services based on what customers tell them they prefer – explicitly, but more and more, passively, through their actions.