Framework enhances synthetic data for AI training


By Steinur Bell, University of Pittsburgh Swanson School of Engineering
Thursday, 25 September, 2025

To train artificial intelligence (AI) models, researchers need good data and lots of it. However, most real-world data has already been used, leading scientists to generate synthetic data. While generated data solves the problem of quantity, its quality is not guaranteed, and the task of assessing that quality has been overlooked.

Wei Gao, associate professor of electrical and computer engineering at the University of Pittsburgh Swanson School of Engineering, has collaborated with researchers from Peking University to develop analytical metrics to evaluate the quality of synthetic wireless data. The researchers have created a novel framework that significantly improves the task-driven training of AI models using synthetic wireless data.

“Synthetic data is vital for training AI models, but for modalities such as images, video or sound, and especially wireless signals, generating good data can be difficult,” Gao said.

Gao has developed metrics to quantify affinity and diversity, two qualities that synthetic data needs if it is to train AI models effectively.

“Generated data shouldn’t be random. Take human faces. If you’re training an AI model to identify human faces, you need to ensure that the images of faces represent actual faces. They can’t have three eyes or two noses. They must have affinity,” Gao said.

The images also need diversity. Training an AI model on a million images of an identical face won’t achieve much. While the faces must have affinity, they must also be different, as human faces are. As Gao noted, “AI models learn from variation.”

Different tasks have different requirements for judging affinity and diversity. Recognising a specific human face is different from distinguishing it from that of a dog or a cat, and each task has its own data requirements. Therefore, in systematically assessing the quality of synthetic data, the team applied a task-specific approach.

“We applied our method to downstream tasks and evaluated the existing work of synthesising data. We found that most synthetic data achieved good diversity, but some had problems satisfying affinity, especially wireless signals,” Gao said.

Today, wireless signals are used in a range of technologies. Mobile and Wi-Fi signals, as radio waves, hit objects and bounce back towards their source. These signals can be interpreted to indicate everything from sleep patterns to the shape of a person sitting on a couch. To advance this technology, researchers need more wireless data to train models to recognise human behaviours in the signal patterns. However, as a waveform, the signals are difficult for humans to evaluate.

Wireless signals are not like human faces, which can be clearly defined. “Our research found that current synthetic wireless data is limited in its affinity. This leads to mislabelled data and degraded task performance,” Gao said.

To improve affinity in wireless signals, the researchers took a semi-supervised learning approach. “We used a small amount of labelled synthetic data, which was verified as legitimate,” Gao said. “We used this data to teach the model what is and isn’t legitimate.”

Gao and his collaborators developed SynCheck, a framework that filters out synthetic wireless samples with low affinity and labels the remaining samples during iterative training of a model.
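The article does not describe SynCheck's internals, so the following is only a toy sketch of the general idea of filtering and labelling synthetic samples over several rounds, starting from a small verified set. It is not the researchers' implementation. The function names, the nearest-centroid rule and the `max_dist` threshold (a crude stand-in for an affinity check) are all assumptions made for illustration.

```python
import math


def centroid(points):
    """Mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def filter_and_label(verified, pool, max_dist=1.0, rounds=3):
    """Toy iterative filter (hypothetical, not SynCheck itself).

    verified: dict mapping label -> list of verified feature tuples
    pool:     list of unlabelled synthetic feature tuples
    Returns (labelled, rejected), where rejected holds the samples that
    never came within max_dist of any class centroid.
    """
    labelled = {lab: list(pts) for lab, pts in verified.items()}
    remaining = list(pool)
    for _ in range(rounds):
        # Re-estimate each class centre from everything accepted so far.
        centres = {lab: centroid(pts) for lab, pts in labelled.items()}
        still_unassigned = []
        accepted_any = False
        for x in remaining:
            # Find the nearest class centre and its distance.
            lab, d = min(
                ((l, math.dist(x, c)) for l, c in centres.items()),
                key=lambda pair: pair[1],
            )
            if d <= max_dist:
                labelled[lab].append(x)     # keep the sample and label it
                accepted_any = True
            else:
                still_unassigned.append(x)  # treated as low affinity for now
        remaining = still_unassigned
        if not accepted_any:
            break  # nothing changed, so further rounds would not help
    return labelled, remaining


# Made-up 2-D features standing in for wireless signal embeddings.
verified = {"walk": [(0.0, 0.0), (0.2, 0.1)], "sit": [(5.0, 5.0), (5.1, 4.9)]}
pool = [(0.3, 0.2), (4.8, 5.2), (2.5, 2.5), (9.0, -3.0)]
labelled, rejected = filter_and_label(verified, pool)
print(labelled)
print(rejected)
```

In this toy run, the two pool samples that sit close to a verified class are kept and labelled, while the two that resemble neither class are set aside rather than being used with a guessed label.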

“We found that our system improves performance by 4.3%, whereas a nonselective use of synthetic wireless data degrades performance by 13.4%,” Gao said.

This research marks an important first step towards ensuring not just an endless stream of data, but a stream of quality data that scientists can use to train more sophisticated AI models.



  • All content Copyright © 2025 Westwick-Farrow Pty Ltd