Notable new trends are emerging in innovative, data-mature organizations, signalling that we are moving into a new era for data science.
Mature organizations are starting to change how they organize their data people and how they build their data science platforms. Their aim is to extract more value from data than their peers, and to do it more consistently, more rapidly, and more cost-effectively.
So, what are these innovative changes they are making?
The first new trend we see is the merger of data science teams and general data analytics teams into larger organizations focused on Data Enablement, with the explicit goal of delivering AI-powered products and services in production.
This move brings data engineering teams and data science teams together under the same management structure, reversing a decade of ring-fencing data science teams away from their generalist data analytics peers.
This is a big change in direction. Previously, data science teams had been broadly organized around an activity I call value discovery. Value discovery was a “getting started” approach used early on to help organizations find out whether data science and AI technologies could be valuable to them. During this early phase of enterprise data science adoption, the focus was on building nascent AI teams, building their relationships with the business, and defining and testing hypotheses to discover how valuable the data science use cases might be, and ultimately whether AI could be of value to the business at all.
The legacy of this value discovery phase can be seen in the organization chart. Five years ago, almost all the data science teams I knew of were ring-fenced away from generalist data analytics teams. Today, mature organizations are starting to remove that ring-fencing. Part of the motivation is that leaders now demand that AI investments result in products running in production, rather than limited proof-of-value applications running in local laboratories, or under the desk. Leaders are pushing teams towards a new mode of working that I call value delivery. They want AI-led growth that shareholders can bank on.
The second new trend is a move towards investing in technologies that incorporate data engineering best practices, helping data scientists meet a rising tide of AI industrialization expectations.
Data science leaders are now inheriting requirements that general analytics teams (doing data engineering) have had for decades -- such as provenance, auditability, security, resiliency, and scalability -- and they are responding by investing in new types of tools that help them raise their game. In these forward-looking enterprises, business leaders have been very clear: “AI and data science isn’t an experiment anymore, it’s business as usual, please act accordingly.” The directive to productize AI is driving ideas from data engineering into the data science stack, further supporting the case for folding these teams together.
The rise of the Feature Store is one of the clearest examples of this change in focus. Feature Stores, if you've not yet heard of them, are a specialist type of production-grade data warehouse, re-imagined to accelerate data scientists in building AI services. Feature Stores help data scientists share trusted training data, select and reuse trusted feature engineering code, and deliver data security. They are also scalable and resilient, both when training and when deploying AI models. In short, they help you do data science engineering in production rather than in the lab, they accelerate best practices like MLOps, and they allow finished models to be reliably integrated with other production-grade services to deliver enterprise- and web-scale applications.
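To make the core pattern concrete, here is a deliberately toy sketch in plain Python (all names are hypothetical; real feature stores such as Feast or Tecton offer far more, including versioning, point-in-time joins, and online serving). The key idea it illustrates is that feature engineering code is registered once, so training and serving reuse exactly the same logic:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ToyFeatureStore:
    """Illustrative only: a registry of feature-engineering functions,
    so offline training and online serving share one code path."""
    _features: Dict[str, Callable[[dict], float]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        # Register a named feature transformation once, for reuse everywhere.
        self._features[name] = fn

    def get_features(self, names: List[str], entity: dict) -> dict:
        # The same lookup serves both training rows and live requests.
        return {n: self._features[n](entity) for n in names}

store = ToyFeatureStore()
store.register("spend_per_visit",
               lambda e: e["total_spend"] / max(e["visits"], 1))

customer = {"total_spend": 120.0, "visits": 4}
features = store.get_features(["spend_per_visit"], customer)
print(features)  # → {'spend_per_visit': 30.0}
```

Real products add the production-grade concerns discussed above (security, resiliency, scalability) on top of this sharing-and-reuse pattern.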
The second example is the rise of Apache Spark as a first-class data science tool. Apache Spark is a highly scalable parallel computation environment for building production-grade applications that process data. Having written a book on the subject in 2017, I have a great deal of first-hand experience of what I call Spark hesitancy in the data science community, whose members often prefer Python tools that do not scale natively. Year on year, that hesitancy is turning into advocacy. The change is partly a response to the monumental efforts behind cloud-ready products like Databricks and AWS Glue, and to community work getting Spark to run on Kubernetes, all of which make it easier to run cloud-native AI applications that organizations can trust to run lights out, which brings me to the next emerging trend.
The third new trend is that data science teams are migrating to the cloud, and to move this along, they are engaging third-party suppliers, often for the first time.
Migrating to the cloud is challenging, and engineering skills not typically found in data science teams are needed to make it happen. As established teams struggle to scale up these new skills, they are starting to reach out to firms like 6point6 to help them make the transition.
To summarize, I see a large structural shift in the enterprise data science space. The new challenge being set by leaders is firmly aimed at value delivery: the industrialization of established use cases to drive shareholder value. The emerging technical and organizational tactics we are observing all corroborate this change in direction, and I expect it to spread across all sectors as a wider group of organizations mature past the discovery phase. The trend is so clear that we have created a dedicated Feature Engineering (FE) team, specifically positioned to offer these organizations the help they need to make the data science transition to the cloud. Feature Engineering is where we are placing our bets.
Andrew Morgan is the Director of Data at 6point6. A 25-year practitioner in data engineering and data science, he is the author of the book ‘Mastering Spark for Data Science.’