Big Data Newsletter

How Small Data Can Drive Big Insights

data science

There seems to be a widespread tendency to focus on “Big Data” and advanced analytics before addressing the basics of managing data at smaller scales within an organization. Clearly, Big Data is becoming a more important part of the landscape, but in most cases, it is not the most pressing data issue within an enterprise.

A Gartner report predicts that by 2025, 70% of organizations will shift their focus from big to small and wide data, providing more context for analytics and making artificial intelligence (AI) less data-hungry.

“Disruptions such as the COVID-19 pandemic have caused historical data that reflects past conditions to quickly become obsolete, which is breaking many production AI and machine learning (ML) models,” said Jim Hare, distinguished research vice president at Gartner.

“In addition, decision making by humans and AI has become more complex and demanding, and overly reliant on data-hungry deep learning approaches,” said Hare.

In such a scenario, CXOs should turn to new analytics techniques known as “small data” and “wide data”, said Gartner.

Small data is an approach that requires less data but still offers useful insights. It includes certain time-series analysis techniques, as well as few-shot learning, synthetic data generation, and self-supervised learning.
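To make the synthetic-data idea concrete, here is a minimal sketch of one of the simplest approaches: fitting a multivariate normal distribution to a small sample and drawing new rows from it. The data and function names are illustrative, not from any real tool; production synthetic-data systems use far richer generative models, but the principle of amplifying a small dataset while preserving its statistical structure is the same.

```python
import numpy as np

def synthesize(samples: np.ndarray, n_new: int, seed: int = 0) -> np.ndarray:
    """Draw synthetic rows from a multivariate normal fitted to a small sample.

    A crude stand-in for real synthetic-data tooling: it preserves only the
    column means and covariance of the original data, nothing more.
    """
    rng = np.random.default_rng(seed)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_new)

# A "small data" set: 30 rows of two correlated measurements.
rng = np.random.default_rng(42)
base = rng.multivariate_normal([10.0, 5.0], [[4.0, 1.5], [1.5, 2.0]], size=30)

# Amplify it tenfold for model training or analysis.
augmented = synthesize(base, n_new=300)
print(augmented.shape)  # (300, 2)
```

The synthetic rows follow the same means and correlations as the original 30, so a downstream model sees more training examples without any new data collection.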

Wide data enables the analysis and synergy of a variety of small and large, unstructured, and structured data sources. It applies X analytics, with X standing for finding links between data sources, as well as for a diversity of data formats. These formats include tabular, text, image, video, audio, voice, temperature, or even smell and vibration.
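In practice, "wide data" analysis starts by combining structured fields and unstructured sources into a single representation. The sketch below is a deliberately simplified illustration, with hypothetical customer records and a toy vocabulary: it concatenates tabular attributes with a bag-of-words encoding of free text, so that both feed one analysis together.

```python
import numpy as np

# Hypothetical "wide" records: structured fields plus free text.
# Names and values here are illustrative, not from any real dataset.
customers = [
    {"tenure_months": 24, "monthly_spend": 80.0, "note": "great service fast support"},
    {"tenure_months": 3,  "monthly_spend": 20.0, "note": "slow support poor service"},
]

# A toy vocabulary; real systems would use learned text embeddings instead.
vocab = ["great", "fast", "slow", "poor"]

def widen(record: dict) -> np.ndarray:
    """Concatenate structured fields with a bag-of-words encoding of the text."""
    structured = [record["tenure_months"], record["monthly_spend"]]
    words = record["note"].split()
    text_counts = [words.count(w) for w in vocab]
    return np.array(structured + text_counts, dtype=float)

# One matrix spanning both data formats, ready for joint analysis.
X = np.stack([widen(c) for c in customers])
print(X.shape)  # (2, 6): 2 structured + 4 text features per customer
```

The point is the synergy the article describes: once diverse formats land in one feature space, links between them (here, spend level and sentiment-laden wording) become analyzable together.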

“Taken together, they are capable of using available data more effectively, either by reducing the required volume or by extracting more value from unstructured, diverse data sources,” said Hare, adding that both approaches facilitate more robust analytics and AI, reducing an organization’s dependency on big data and enabling a richer, more complete situational awareness or 360-degree view.

Why big data is not for everyone

Even businesses that generate billions of dollars in revenue often don’t have anything that can truly be called Big Data. Millions, and in some cases billions, of records do not necessarily amount to Big Data.

Placing your data into the wrong category can be costly, leading to more complex technology, more challenging user experiences, and less stability. Experts who compare Hadoop with PostgreSQL argue that Big Data technology is inherently more temperamental than Small Data technology, and that skilled technologists for it are scarcer.

In that sense, think of Small Data as structured or unstructured information that is in the sub-terabyte range in scale. In most businesses, this can include all core sales information, operational performance data, or purchasing data over the course of several years. You should be asking yourself whether this core data is integrated, accessible, and useful before adding other, larger, less-structured data sets to the mix.

Small Data Can Drive Big Insights

Small Data often provides the clarity and intuition about your business that complex analytical magic can’t necessarily provide. Big data and predictive analytics often help you do those things that you are already doing faster, more efficiently, or in a more targeted way. Small data can often tell you whether you are doing the right things in the first place, explained Maryam Farboodi, Assistant Professor of Finance at the MIT Sloan School of Management.

“Big Data is all about finding correlations, but Small Data is all about finding the causation, the reason why,” Farboodi said.

On one hand, you have large companies like the Amazons and eBays of the world, which are thriving on Big Data. A lot of small businesses have been made to believe that they have to follow that trend. But a small firm doesn’t have a lot of data, and the resulting uncertainty makes investors reluctant to finance these firms. Statistics suggest that small firms are going through a tough time. In the last three decades, the annual rate of new startups has slipped from 13% to less than 8%, according to previous research that the authors cite in their paper.

Farboodi and her coauthors, Juliane Begenau of Stanford University’s Graduate School of Business and Laura Veldkamp of Columbia University Business School, write: “Big companies, after all, have more economic activity and longer company histories, so they have more data to process and analyze. As Big Data technology improves, large firms will continue to attract a more than proportional share of data processing, and investor support and interest.” For smaller companies, the more data they produce, the better chance they stand of lowering their cost of capital. The researchers conclude that if investors can better understand these firms’ returns, they will face lower risk and be more willing to finance them. That would decrease the cost of capital faced by these firms and help them explore growth opportunities.

According to Gartner, potential areas where small and wide data can be used are demand forecasting in retail, real-time behavioral and emotional intelligence in customer service applied to hyper-personalization, and customer experience improvement.

Other areas include physical security or fraud detection, and adaptive autonomous systems such as robots, which constantly learn by analyzing correlations in time and space across events in different sensory channels.

Overcoming the small data challenges

Nonetheless, Small Data presents its own set of challenges, which are further complicated when Big Data is added to the mix. The hard part tends to be getting the business meaning of the data right, linking it to reference data, and handling the exceptions. The scale of the data doesn’t have as much impact on how hard it is to integrate as one might think.
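The reference-data linking and exception handling described above can be sketched in a few lines. This is a hypothetical example with made-up transaction and reference records: the key design choice is routing unmatched keys to an exceptions list for review, rather than silently dropping or mislabeling them.

```python
# Hypothetical reference data: country codes mapped to canonical names.
reference = {"US": "United States", "DE": "Germany"}

# Hypothetical transaction records to be enriched.
transactions = [
    {"id": 1, "country": "US", "amount": 120.0},
    {"id": 2, "country": "XX", "amount": 45.0},   # no reference entry
    {"id": 3, "country": "DE", "amount": 80.0},
]

matched, exceptions = [], []
for tx in transactions:
    name = reference.get(tx["country"])
    if name is None:
        exceptions.append(tx)  # surfaced for manual review, never dropped
    else:
        matched.append({**tx, "country_name": name})

print(len(matched), len(exceptions))  # 2 1
```

Even at small scale, the exception path is where most of the integration effort goes; the join itself is trivial.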

It can be difficult for business users to explore data because of technical challenges – IT organizations may not have the right specialized expertise to wring every drop of performance out of their data-oriented systems. This means that handling performance and usability problems is often deferred.

There are a few key things that can help simplify getting more value out of Small Data and address some of the challenges identified above. One obvious way is by creating a vision and pursuing an “Enterprise Data Architecture”. However, the idea is to start small.

“Do not try to build complete enterprise data architecture and attendant processes before going to work with real data, solving real business problems,” Martin Lindstrom said in his book ‘Small Data’.

At the same time, transitioning away from a legacy environment can be daunting, but the tools and technology available have progressed rapidly. Building a cloud-based infrastructure could be surprisingly low cost.

From a more technical perspective, there are a number of established technologies that can help support making better use of small data, for instance, the public cloud. The economics have become very attractive at all but the very largest scales, and unless you have highly specialized regulatory or security requirements, there are few functional barriers.

Most organizations will not be able to attract the specialized skills that they need to fully realize the value of their data. To fill the gaps, look to external providers – favoring specialists that operate on a high-leverage model over broad-based integrators and consultants.

Nonetheless, for many problems and questions, small data in itself is enough. The data on my household energy use, the times of local buses, government spending – these are all small data. As Rufus Pollock, social entrepreneur and head of the Open Knowledge Foundation, says, the hype around big data is misplaced – small, linked data is where the real value lies.

The real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”, believes Pollock. Hence one can safely say that the next decade belongs to distributed models, not centralized ones; to collaboration, not control; and to small data, not big data.

