How synthetic data is unlocking new opportunities for intelligent video

 

By Barry Norton, VP of Human-Centred AI at Milestone Systems.

 

Video technology has come a long way over the past few decades, not least because of advances in video analytics and the AI that makes them possible. Yet while the AI market is projected to reach an eye-watering 1.3 trillion USD by 2030, according to a MarketsandMarkets forecast, one potential drag on this massive growth is the availability of large datasets on which to train AI models. So-called synthetic data could be the answer.

Pioneering work by the so-called “Godfathers of AI,” 2018 Turing Award winners Yoshua Bengio, Geoffrey Hinton, and Yann LeCun, along with Fei-Fei Li’s creation of ImageNet, helped lay the groundwork for modern AI, particularly in computer vision (CV). That is especially relevant for sensors that create image data, such as video cameras, and it has unlocked a host of new opportunities to improve the safety of our cities, transport, retail stores and more.

Because of AI, organisations are now able to gain deeper insights to inform their strategies and make better decisions on where to build a new road, which products to place on a particular store shelf, and how to plan maintenance or cleaning schedules. It truly is a brave new world, transformed by the combination of video and AI.

Accurate AI requires large training datasets

However, making these AI models as accurate as possible requires training on huge datasets. These datasets need to be representative, diverse enough to ensure accuracy and fairness, and legally sourced to respect data owners’ IP rights. As AI evolves, the need for these large, (partly) annotated datasets becomes more pressing, and obtaining this data isn’t always simple, especially when dealing with sensors such as cameras that can collect a lot of personal or confidential information. Safety, privacy, and practical limitations can restrict the amount and quality of data that an AI can be trained with.

This is where synthetic data steps in to open up new opportunities.

The solution offered by synthetic data

Synthetic data refers to artificially generated or augmented datasets that simulate real-world conditions. By using this data, AI developers can train models on vast amounts of diverse and representative information, while mitigating the ethical and legal concerns surrounding privacy and consent. Moreover, synthetic data can preserve key real-world characteristics, ensuring that models learn from realistic environments without needing to expose individuals to risk. It is also a ready-to-use source, which can shorten algorithm development time.

What’s more, synthetic data can help reduce bias in AI models. Traditional datasets are often shaped by the biases present in the original data collection process, which can skew the outcomes of AI decision-making. By designing synthetic data generation processes thoughtfully, developers can minimise the biases that arise from relying on historical datasets.

Lastly, synthetic data is scalable and cost-effective. It enables AI developers to create vast, diverse datasets quickly and affordably, which is particularly useful for tasks that require specific, high-quality data that is not readily available.

In action: protecting Danish harbours

The potential role of synthetic data in improving safety and saving lives can be seen in a research project in Denmark, where AI models used to detect someone falling into a harbour have been trained on different datasets including synthetic data.

Unfortunately, Danish harbours have witnessed numerous drowning incidents over the years, with 1,647 lives lost between 2001 and 2015 in Danish waters, and a quarter of these tragedies occurring in harbours themselves.

In one of Denmark’s busiest ports, Aalborg Harbour, researchers created the largest outdoor thermal dataset for video analytics to enable AI-equipped video cameras to detect different types of objects in a thermal setup. To cover fall incidents, the researchers would have needed volunteers to fall into the water. It was, however, too dangerous to ask human volunteers to do this. Moreover, jumping into a harbour looks different from someone accidentally losing their footing and falling in. The researchers also needed a representative dataset to cover wheelchair users, cyclists, and skateboarders.

Warmed-up dummies were used to mimic human bodies, but these, again, couldn’t capture the full complexity of a human falling into the harbour. Therefore, the best solution was synthetic data that could model more intricate behaviours and diverse falling scenarios.

Using synthetic data, the project expanded its training dataset without compromising safety or ethics. The AI model developed through this process shows promising results in alerting rescue teams when a person falls into the harbour, increasing the chances of survival by minimising response times and reducing cold water exposure.

The broader applications for synthetic data

Video analytics is ubiquitous across multiple industries and the same will apply to the synthetic data it is trained with. Further use cases include manufacturing, where AI models trained on synthetic data can help ensure automated production lines function correctly by detecting anomalies in production or potential equipment failure. Collecting large amounts of real production line footage to train such models can be risky, given the confidential information on manufacturing techniques and components that it may reveal.

Synthetic data may also be helpful in healthcare settings, where patient privacy is paramount and collecting training data for scenarios such as falls might be too challenging. It can help to train models to detect when a dementia patient is lost and wandering the halls of a hospital or, for example, to alert staff when a care home patient has fallen out of bed.

A growing opportunity

As we witness more uses of AI in video and other applications, so too can we expect a rise in the use of synthetic data. Providing a safe, ethical and scalable data source, this data can be the best option in some situations. Everyone working with data and video, therefore, should be aware of the opportunities that synthetic data brings to their AI’s accuracy, representation, and overall effectiveness.