April 18, 2022
Opinion: CEO of Synthesis AI says synthetic data can save on AI development cost and time.
There’s no shortage of predictions about how AI could fundamentally change the way we live and work. By enabling machines to exhibit human-like cognition, AI can drive our cars and recognize intricate patterns, and it has become a transformational technology in our digital age.
For today’s data and technology leaders, the pressure is mounting to create a modern data architecture that fully fuels their company’s digital and AI transformations. Simultaneously, concerns about AI adoption and scalability are creating a ‘moving target’ problem for leaders.
Furthermore, the performance of AI systems is hard to predict and requires a significant upfront investment to acquire and prepare the necessary training data. The difficulties associated with enterprise adoption could pose significant barriers to realizing the benefits of AI.
When organizations seek to harness the power of AI, one of the first questions AI programs may need to answer is around analytical adequacy: Is there data, and is it of sufficient quality to address the specific business need? Data is the foundation for any AI project, but the trouble is that there is not a clear-cut answer for how much data you need to ensure a target performance.
Because of this, the data collection process remains a significant stumbling block for today’s enterprises. In an Alegion report, 51% of respondents said they did not have enough data, which adds to the complexity of completing an AI project.
Computer vision and visual data
Computer vision is one application of AI that holds great potential to transform industries that generate massive amounts of visual data.
Computer vision is what enables computers to look at the world and automatically make decisions. Its applications range from autonomous vehicles to facial verification to unlock our phones, with a myriad of use cases in between.
Yet the bottleneck holding back the development of computer vision is, ironically, data. For a computer vision system to become useful, thousands of photos, videos, and other images first need to be compiled.
Up to now, AI computer vision has relied heavily on supervised learning, where humans label key attributes in an image and then teach computers to do the same. Supervised learning requires humans in the loop, making the process expensive and hard to scale.
These challenges stem not from technical complexity but from process complexity. Once the data is collected, teams need to correctly label and categorize it to feed and train an algorithm. To create reliable and accurate training data, data scientists collect and label tens of thousands and sometimes millions of data assets.
This work can cost from $0.50 to $1 per image or video frame, and it can take a single person more than 30 minutes to label all the associated pixels.
Once labeling is finished, the model is trained and tuned, and only when that process is complete will you know what the model is capable of.
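To make the scale of these figures concrete, here is a rough back-of-the-envelope sketch. It takes the per-frame cost range and the 30-minute labeling time quoted above at face value and applies them to the dataset sizes mentioned earlier (tens of thousands to millions of assets). The function names and default values are illustrative assumptions, not part of any real tool, and the two figures may not describe the same labeling task.

```python
def labeling_cost_range(num_images, low_per_image=0.50, high_per_image=1.00):
    """Return the (low, high) total labeling cost in dollars.

    The per-image defaults are the $0.50 to $1 range quoted in the article.
    """
    return num_images * low_per_image, num_images * high_per_image


def labeling_hours(num_images, minutes_per_image=30):
    """Return person-hours of labeling, assuming 30 minutes per image.

    The article says "more than 30 minutes", so treat this as a lower bound.
    """
    return num_images * minutes_per_image / 60


if __name__ == "__main__":
    # Dataset sizes are illustrative: "tens of thousands" and "millions".
    for n in (10_000, 1_000_000):
        low, high = labeling_cost_range(n)
        hours = labeling_hours(n)
        print(f"{n:,} images: ${low:,.0f} to ${high:,.0f}, "
              f"at least {hours:,.0f} person-hours")
```

Even under these simple assumptions, cost and labor grow linearly with dataset size, which is why labeling is hard to scale.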
As we look at the world of autonomous vehicles, augmented reality, and virtual reality, it becomes clear that we’re fundamentally limited by the traditional approaches of AI computer vision.
In applications requiring human data, privacy and regulatory concerns present additional challenges.
Enterprises that want to use AI to tackle significant challenges need a more ambitious approach. Enter synthetic data: computer-generated data that mimics real-world phenomena.
The ability to create vast amounts of perfectly labeled photorealistic images on demand has the potential to change the current AI development paradigm.
Synthetic data disrupts traditional data-to-insight pipelines, allowing organizations of all sizes to test, tune, and optimize revolutionary AI models to create dramatically better business value.
Synthetic data is an important approach to solving the data problem. It takes the form of digitally created images, video, and 3D environments that can be used to train deep learning models. Because perfectly labeled, realistic datasets and simulated environments can be generated at scale, data scientists can use synthetic data to overcome a massive barrier to entry.
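Why do synthetic labels come out "perfect"? A toy sketch can show the idea: when the generator itself places an object in a scene, it already knows which pixels belong to that object, so the label is a by-product of generation rather than a separate manual step. The example below is a deliberately simplified stand-in (a bright rectangle on a dark grayscale background), not a photorealistic renderer and not a description of any particular product. The function name and parameters are hypothetical.

```python
import random


def make_synthetic_sample(width=32, height=32, rng=None):
    """Generate one toy grayscale image plus its pixel-level label.

    Returns (image, mask, box), where image and mask are lists of rows
    and box is (x0, y0, w, h) for the rectangle that was drawn.
    """
    rng = rng or random.Random()

    # Randomize the scene: object size, position, and brightness levels.
    w = rng.randint(4, width // 2)
    h = rng.randint(4, height // 2)
    x0 = rng.randint(0, width - w)
    y0 = rng.randint(0, height - h)
    background = rng.randint(0, 100)
    foreground = rng.randint(155, 255)

    image = [[background] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]

    # Draw the object and record its label in the same pass, so the
    # mask matches the image exactly with no human annotation.
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            image[y][x] = foreground
            mask[y][x] = 1

    return image, mask, (x0, y0, w, h)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    dataset = [make_synthetic_sample(rng=rng) for _ in range(1000)]
    print(f"Generated {len(dataset)} labeled samples")
```

A production pipeline would replace the rectangle with rendered 3D scenes, but the principle is the same: the segmentation mask and bounding box come from the scene description, not from a person tracing pixels.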
Rather than the traditional approach of building something, breaking it, analyzing the failures, and returning to the drawing board to redesign, synthetic data promises to break that protracted cycle and encourage more widespread design exploration at the beginning stages.
It addresses the pressure of accelerated time-to-market schedules by giving engineers early insights that reduce costs and risks, improve delivery schedules, and bolster competitive advantage with more innovative products.
The ability to test drive a greater number of possible design iterations at the process’s onset allows organizations to work out any complications early on when changes are far less costly. Synthetic data also directly addresses potential privacy and regulatory concerns.
Synthetic data’s simulation-driven design has the power to flip the development process on its head. While it is not a one-size-fits-all solution, synthetic data has the potential to dramatically improve the economics and chances for success in AI transformation initiatives.
Using simulated data as a source of training samples can significantly save both cost and time while also addressing real-world data scarcity.