Faster imaging thanks to GPGPU-based computer modules

Congatec Australia Pty Ltd

By Zeljko Loncaric, Marketing Engineer
Tuesday, 31 May, 2016



GPGPU-based embedded computer modules provide more powerful graphics units with each new generation. Medical imaging technologies can use them not just to display but also to process compute-intensive raw data, thereby delivering higher quality results faster.

Before signal data from medical imaging applications such as ultrasound, X-ray, MRI, CT or digital video endoscopy can be displayed on a monitor, a large number of often highly complex calculations must be performed. Demands are rising constantly: image quality and resolution are expected to keep increasing while processing times must get shorter. This applies in every performance class, from mobile ultrasound devices to high-performance MRI systems operating in real time.

To stay competitive, manufacturers are looking for ever more powerful hardware that can support faster and better imaging. For some time now, they have been able to use programmable graphics units, known as general-purpose graphics processing units (GPGPUs), which are available on computer-on-modules as application-ready computing cores that are easy to integrate.

Division of labour

The strength of GPGPU-based calculations is their parallel processing capability. GPGPUs integrate several hundred to several thousand processing units in single instruction, multiple data (SIMD) engines. As the name implies, these engines apply a single operation to many data points in parallel.
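The SIMD idea can be illustrated with a minimal sketch in plain Python. It is a conceptual model only: the loop runs sequentially, and the names `simd_apply` and `apply_gain` as well as the gain operation are hypothetical, not taken from any GPU API.

```python
def simd_apply(op, data, lanes):
    """Apply one operation to all data points, `lanes` elements per step.

    Returns the results and the number of lockstep steps that were needed.
    """
    results = []
    steps = 0
    for start in range(0, len(data), lanes):
        batch = data[start:start + lanes]
        # Conceptually, every lane executes the same instruction at once.
        results.extend(op(x) for x in batch)
        steps += 1
    return results, steps


def apply_gain(sample):
    """Hypothetical per-pixel operation: double a sample, clamped to 16 bits."""
    return min(sample * 2, 0xFFFF)


line = list(range(4096))          # one 4096-pixel image line
out, steps = simd_apply(apply_gain, line, lanes=500)
print(steps)                      # lockstep steps needed for the line
```

With 500 lanes, the 4096-pixel line needs 9 whole steps because the last step is only partly filled. Averaged over many lines, the cost approaches 4096 / 500, or about 8.2 steps per line.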

In imaging, this covers all the calculations required to compute an image line, a complete frame or a video recording. If an image line consists of 4096 pixels (4K) with 16-bit colour information and the GPU has 500 processing units, one operation takes approximately 8.2 clock cycles for one image line and about 17,700 clock cycles for a complete 4096 x 2160 frame. With a GPU clock speed of 800 MHz and 400 GFlops of computing power, one such operation on a complete 4K image takes about 22 microseconds.

For ultrasound devices, which currently have significantly lower resolutions, this is more than adequate. However, it is not enough to display MRI data in real time using complex algorithms, because an algorithm often consists of thousands of such operations. The most powerful graphics cards today offer several teraflops of computing power, so they can render even MRI data in real time. For many developers of imaging technologies, GPGPUs are therefore a welcome alternative to DSPs or FPGAs.
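The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses the article's values. The 2160-line frame height and the rate of one floating-point operation per processing unit per clock cycle are assumptions, chosen because they are consistent with the quoted 17,700 cycles and 400 GFlops.

```python
def cycles_per_operation(pixels, processing_units):
    """Average clock cycles for one operation over `pixels` data points."""
    return pixels / processing_units


WIDTH, HEIGHT = 4096, 2160        # 4K frame (height is an assumption here)
UNITS = 500                       # processing units in the GPU
CLOCK_HZ = 800e6                  # 800 MHz GPU clock
FLOPS_PER_UNIT_PER_CYCLE = 1      # assumption that yields 400 GFlops

line_cycles = cycles_per_operation(WIDTH, UNITS)
frame_cycles = cycles_per_operation(WIDTH * HEIGHT, UNITS)
frame_time_us = frame_cycles / CLOCK_HZ * 1e6
peak_gflops = UNITS * CLOCK_HZ * FLOPS_PER_UNIT_PER_CYCLE / 1e9

print(f"cycles per line:  {line_cycles:.1f}")
print(f"cycles per frame: {frame_cycles:.0f}")
print(f"time per frame:   {frame_time_us:.1f} microseconds")
print(f"peak throughput:  {peak_gflops:.0f} GFlops")
```

These are theoretical best-case values. They ignore memory transfers and assume that every processing unit is busy in every cycle.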
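A rough budget shows why thousands of operations per frame quickly exhaust a 400 GFlops GPU. The sketch below takes the 22 microseconds per full-frame operation from above. The operation counts are hypothetical and serve only to show the trend.

```python
FRAME_OP_TIME_S = 22e-6   # one operation on a full 4K frame at 400 GFlops


def achievable_fps(operations_per_frame, op_time_s=FRAME_OP_TIME_S):
    """Frames per second if each frame needs this many full-frame operations."""
    return 1.0 / (operations_per_frame * op_time_s)


# Operation counts below are hypothetical, chosen only to show the trend.
for ops in (100, 1_000, 5_000):
    print(f"{ops:>5} operations per frame -> {achievable_fps(ops):6.1f} frames/s")
```

Under these assumptions, an algorithm with 1,000 operations per frame still reaches roughly 45 frames per second, while 5,000 operations per frame drops the rate to about 9 frames per second.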

An ultrasound-guided needle navigation system based on a congatec module with the AMD Embedded G-Series SoC.

A substitute for DSPs and FPGAs

Several reasons favour GPGPUs over DSPs or FPGAs. Firstly, they are relatively standardised and therefore easier to program. Secondly, they are already part of the architecture that is used for visualisation. Thirdly, and importantly, performance increases faster in GPGPUs than in DSPs or FPGAs, because the consumer market generates huge demand for ever higher resolutions and better graphics. As a result, a very large number of developers are working on ever more efficient solutions that reach the market even faster. This demand also ensures that GPGPUs can be offered at attractive prices.

OpenCL has established itself as an open standard for programming GPGPUs. It is supported by all major processor and GPU manufacturers, including Intel, AMD, ARM and NVIDIA. This comprehensive support by chipmakers provides a solid basis for long-term development and compatibility.

Hardware independence

Since OpenCL abstracts the underlying hardware, code can be ported from one platform to another. Individual workloads are automatically distributed across the compute units available in the defined (if necessary virtual) system, with the aim of achieving the best possible performance at minimum power consumption. Thanks to this high level of abstraction, OEMs become independent of the hardware used and can develop hardware and software separately. As a result, companies benefit from shorter time to market and reduced R&D expenses. This is particularly true in comparison with FPGAs and DSPs, where each new generation often requires new programming effort.

Even with independence from specific hardware, developers still need attractive platforms with promising GPU or GPGPU roadmaps. Alongside high-end systems with multiple parallel graphics cards, SoC-based solutions are well suited to compact medical devices. SoCs integrate CPU and GPU together with the I/O controller hub in a single chip, thereby providing a highly integrated solution. There are two distinct performance classes: one for movable or stationary systems and one for portable and mobile devices. For x86, both Intel and AMD processors are suitable.
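The principle of writing a kernel once and letting the runtime spread it over whatever compute units exist can be sketched with a toy model. This is not the OpenCL API and not its real scheduler. The function `run_kernel`, the round-robin assignment, the work-group size of 64 and the `invert` kernel are all assumptions made for illustration.

```python
def run_kernel(kernel, data, compute_units, work_group_size=64):
    """Toy model: split the work into groups and hand them to compute units.

    This is NOT OpenCL's real scheduler, only an illustration of the idea
    that the same kernel runs unchanged on devices of different sizes.
    """
    out = [None] * len(data)
    groups_per_unit = [0] * compute_units
    starts = range(0, len(data), work_group_size)
    for group_id, start in enumerate(starts):
        unit = group_id % compute_units          # round-robin assignment
        groups_per_unit[unit] += 1
        for i in range(start, min(start + work_group_size, len(data))):
            out[i] = kernel(data[i])             # one work-item per element
    return out, groups_per_unit


def invert(sample):
    """Hypothetical kernel: invert a 16-bit sample."""
    return 0xFFFF - sample


samples = list(range(1024))
small, load_small = run_kernel(invert, samples, compute_units=2)
large, load_large = run_kernel(invert, samples, compute_units=8)
print(small == large)        # same results on both "devices"
print(load_small, load_large)
```

The kernel code is identical in both calls. Only the number of work-groups per compute unit changes, which is the property that lets application code outlive a particular hardware generation.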

congatec computer modules.

AMD Embedded R-Series SoC processors

The AMD Embedded R-Series SoC processors are the latest GPGPU SoCs. They stand out for their high-performance AMD Radeon HD graphics and provide a maximum theoretical compute power of 819 GFlops. In theory, up to 92,570 full-frame operations per second can therefore be applied to a 4K image with 4096 x 2160 pixels. In addition to these pure performance figures, it is worth noting that the SoCs offer a widely scalable TDP of 12 to 35 W. This means they can power fanless, completely sealed and therefore particularly hygienic and robust designs.
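The figure of 92,570 operations per second is what results from dividing the peak throughput by the number of pixels in a frame. The sketch below makes that rule explicit. It assumes one floating-point operation per pixel and perfect utilisation, and the function name is hypothetical.

```python
def frame_operations_per_second(peak_gflops, width=4096, height=2160):
    """Theoretical number of full-frame operations per second.

    Assumes one floating-point operation per pixel and perfect utilisation.
    """
    return peak_gflops * 1e9 / (width * height)


print(round(frame_operations_per_second(819)))
```

The same function can be applied to any other peak GFlops figure to compare platforms on the same theoretical basis.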

Heterogeneous System Architecture 1.0

The AMD R-Series also brings support for the Heterogeneous System Architecture (HSA) 1.0 specification to x86 for the first time. Released by the HSA Foundation in March 2015, HSA 1.0 is designed to optimise the use of OpenCL. To this end, HSA defines a standardised platform design that unifies the programming of all the SoC’s processing units. This requires memory that can be shared by all processing units. The AMD Embedded R-Series supports DDR4 RAM with ECC, which provides more bandwidth than DDR3. Together, this enables developers to use GPGPU performance more effectively than with OpenCL alone.

Intel Core processors

Intel also recently introduced processors with powerful integrated GPGPUs: the 5th and 6th generations of the Intel Core architecture. The offering scales from 15 W Intel Core processors for mobile ultrasound devices to server-on-modules with Intel Xeon processors in the 45 W class and Intel Iris Pro Graphics 6300 for stationary systems. In the most powerful version, they provide vision systems with GPGPU performance of up to 883.2 GFlops at 1150 MHz. This corresponds to a theoretical capacity of 99,830 operations per second on a 4K image.

Intel Pentium and Celeron SoCs

The latest Intel Pentium and Celeron SoC processors (formerly codenamed Braswell) are available with a TDP of 4 to 6 W. While they use Intel Graphics Gen8, like 5th generation Intel Core processors, the number of execution units in these low-power processors is limited to a maximum of 16. Performance therefore peaks at 358.4 GFlops. This computing power allows some 40,500 operations per second on a 4K image.

AMD Embedded G-Series SoCs

The AMD Embedded G-Series SoCs are now available in their second generation with a TDP of 6 to 25 W. They integrate the AMD Radeon R5E/R3E next-generation graphics core. The most powerful version with AMD Radeon HD Graphics 8500E provides a GPGPU performance of up to 153.6 GFlops. For the pioneer of SoC designs in the x86 segment, the same calculation as for the other platforms gives a respectable figure of roughly 17,360 operations per second on a 4K image.



  • All content Copyright © 2024 Westwick-Farrow Pty Ltd