
Chip Design Magazine



Posts Tagged ‘multicore’

Blog Review – Monday, Nov. 17 2014

Monday, November 17th, 2014

Harking back to analog; What to wear in wearables week; Multicore catch-up; Trusting biometrics
By Caroline Hayes, Senior Editor.

Adding a touch of nostalgia, Richard Goering, Cadence, reviews the keynote that Boris Murmann gave at the Mixed-Signal Summit, held at Cadence HQ. His ideas for reinvigorating the role of analog make interesting reading.

As if there wasn’t enough stress about what to wear, ARM adds to it with its Wearables Week. David Blaza, however, finds that Shane Walker, IHS, is pretty relaxed, offering a positive view of the wearables and medical market.

Practice makes perfect, believes Colin Walls, Mentor, who uses his blog to highlight common misconceptions about C++, multicore and MCAPI for communication and synchronisation between cores.

Biometrics are popular and ubiquitous but Thomas Suwald, NXP looks at what needs to be done for secure integration and the future of authentication.

Blog Review – Mar 24 2014 Horse-play; games to play; multi-core puzzles; Moore pays

Monday, March 24th, 2014

Cadence’s Virtuoso migration path explained; Dassault reaches giddy-up heights in showjumping; an enthusiastic review of the Game Developers Conference 2014; Mentor offers hope for embedded developers coping with complexity; and MonolithIC 3D believes the end is nigh for Moore’s Law without cost penalties.
By Caroline Hayes, Senior Editor.

Advocating migrating designs, Tom Volden, Cadence, presents an informative blog explaining the company’s Virtuoso design migration flow.

Last week, Paris, France hosted the Saut Hermès international showjumping event and Aurelien Dassault, reports on a 3D Experience for TV viewers to learn more about the artistry.

Creating a whole new game plan, Ellie Stone, ARM, reviews some of the highlights from GDC 14 (Game Developers Conference), with news of partners’ projects, the Sports Car Challenge and the Artist Competition wall.

The joy for a consumer is the bane of a developer’s working day: high complexity means developing multi-threaded applications with multiple operating systems in mind, laments Anil Khanna, Mentor. He does offer hope though, with this blog and a link to more information.

Do not mourn the demise of Moore’s Law without counting the cost, warns Zvi Or-Bach, MonolithIC 3D. His blog has some interesting illustrations for the end of smaller transistors without price increases.

Deeper Dive: Nervous drivers can relax in safety-March 20 2014

Thursday, March 20th, 2014

Nervous drivers can relax in safety

Caroline Hayes, Senior Editor, looks at some of the methods being deployed to improve design and communications, and to tackle vehicle safety on roads today.

Software and chip companies are working hard to make our streets safer – this is not a war on crime, but a concerted effort to make an automobile as safe as possible, protecting the driver and passengers from harm inside and outside the vehicle.

Ralf Klein, Product Manager for German timing analysis and verification tool company Symtavision, believes that the European automotive market (led by Germany) considers worst-case scenarios, looking at buffering and response times. There are, however, communications problems in the high levels of integration within the vehicle, which are increasing as bandwidth grows. Last year, the company extended its analysis capability to Ethernet, introducing both a standard Ethernet version and an Ethernet AVB (Audio Video Bridging) version, which prioritizes communications. Ethernet AVB is emerging as the in-car infotainment distribution standard, although it is now being used increasingly in driver assistance and control functionality systems.

Most recently, the company has collaborated with development tools company Lauterbach on a joint workflow for developing automotive ECUs (Engine Control Units). The workflow is made up of Lauterbach’s TRACE32 modular microprocessor development tools and Symtavision’s TraceAnalyzer tool, which visualizes and analyzes timing data. It also uses Symtavision’s SymTA/S system-level tool suite for planning, optimizing and verifying real-time systems.
ECU code is imported into TRACE32 from any third-party ECU configuration tool for target debugging, emulation and software validation. Trace data from ECU measurements or hardware-independent simulations is then passed to TraceAnalyzer to visualize and analyze timing traces and validate ECU scheduling. Timing models can be processed in SymTA/S for analysis, and it is here that scheduling can be changed. When optimized, the configuration is returned to TRACE32 and uploaded to the target.

The workflow has floating license access, says Klein, which speeds up exchange between the server and tools. The flow allows engineers to focus on the entire communications chain, or on separate parts of the communications path, as required, reducing the number of iterations in ECU development.

Speculating on the future, Klein says that most needs are within the automotive sector, although the company is looking at the gateway area before deciding on the best strategy to communicate data from the bus to the Ethernet. “Ethernet and multi-core are areas to explore,” he says. “As Ethernet and multi-core are used more, [our] tools can be used for analysis, making inter-core communications a possibility for the future.”

Multi-core microcontrollers from the UK’s fabless semiconductor company, XMOS, have recently been qualified to AEC-Q100 standards, making them suitable for use in automotive projects. The company’s xCORE multi-core microcontrollers can be used in infotainment, driver assistance and powertrain control. “It has been tricky to gain recognition in machine bearing and ADAS (Advanced Driver Assistance Systems), but now we are being designed into the new, next-generation of systems,” said Andy Gothard, Director of Corporate Marketing, XMOS. He revels in the prospect of “accelerating these tasks”, remarking that there is considerable growth. “Car makers and audio equipment makers want networked, real-time, data. They want to ditch wire in a car and audio,” he says. There is also an evolution towards intelligent, autonomous cars, in which data from multiple sensors, as well as condition monitoring, will improve safety and contribute to energy efficiency and cost savings.

Initially, the 16-core xCORE XS1-L16A-128 microcontroller is available, supporting the Ethernet AVB standard over a twisted pair connection using BroadR-Reach. The low-latency architecture can be used in ECU, powertrain, chassis and active safety systems in vehicles, as well as for audio interfacing and DSP processing for active noise cancellation.
The company plans to release further 6-, 8-, 12- and other 16-core versions in the second half of this year.

Deeper Dive-Imagination promises a game-changer for multi-core design

Thursday, September 11th, 2014

In a recent announcement, Imagination Technologies introduced the first IP cores to combine a 64bit architecture and hardware virtualization with scalable performance through multi-threading, multi-core and multi-cluster coherent processing.
By Caroline Hayes, Senior Editor.

Mark Throndson, Director Business Development, Imagination Technologies, told EE Catalog that the MIPS I-class I6400 CPU addresses automotive, embedded, DTV/set top box, mobile and enterprise applications.

They are the first IP cores to combine a 64bit architecture and hardware virtualization with scalable performance through multi-threading, multi-core and multi-cluster coherent processing. This means, explains Throndson, that engineers can implement a smaller core at the same performance, or a faster core in the same area to meet the demands of mobile computing or performance-demanding data center servers.

This is the third announcement for the MIPS Warrior family from the company, but it is not an evolution, says Throndson. “This is the first 64bit MIPS core, and it has new features and new capabilities,” he says, leaving us in no doubt.

This is the sixth release of the architecture, and the resulting core supports simultaneous multi-threading, with up to four hardware threads per core, to increase performance, as well as MIPS32 code. MIPS r6’s new instructions are claimed to enhance performance for JIT compilation, JavaScript, browsers and PIC (position independent code) for Android. MIPS32 code runs without requiring separate ISAs, datapaths or mode switching, says Throndson, eliminating wasted silicon area and power.

Jim McGregor, founder and principal analyst, Tirias Research: “To address the ongoing evolution in applications from IoT to mobile to networking and storage, companies need to select scalable platforms that can future-proof their designs. With 64bit, multi-threading, and multi-core/multi-cluster support, the I6400 is designed to be a flexible, low-power processor architecture capable of scaling across a wide range of applications. Imagination now has MIPS IP cores for everything from microcontrollers to 64bit servers, delivering choice across the range and changing the competitive dynamic in the industry.”

Early access releases are available now, and customers are expected to realize more than 50% higher CoreMark performance and 30% higher DMIPS compared to leading competitors in its class (based on preliminary benchmark results).

The simultaneous multi-threading enables execution of multiple instructions from multiple threads every clock cycle. Preliminary benchmarking shows that adding a second thread leads to performance increases of 40 to 50% on popular industry benchmarks including SPECint and EEMBC’s CoreMark, with less than a 10% cluster area increase.

Like existing MIPS Warrior cores, the hardware virtualization technology adds security throughout the system and across the SoC. There is support for up to 15 secure/non-secure guests as well as the ability to support multiple independent security contexts and multiple independent execution domains. It can be scaled to support secure content delivery, secure payments and identity protection across multiple applications and multiple content sources.
Targeting mobile devices, the cores use advanced power management with PowerGearing for MIPS. A dedicated clock and voltage level can be provided to each core in a heterogeneous cluster, while maintaining coherency across multiple CPUs, to ensure that sleeping cores wake only when needed, without expending energy on others.

Another feature is the floating point unit which supports both single and double precision capabilities for general computing or control systems processing.

There is also 128bit SIMD support, to boost performance and throughput in data-parallel applications. It supports 8-, 16-, 32- and 64bit integer and 32- and 64bit floating point data types for audio, video, vision and other compute-intensive tasks across the spectrum of application sectors.
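
As a rough illustration of what a 128bit register width means for the data types listed, the number of elements handled by one SIMD instruction is the register width divided by the element width. The sketch below is plain arithmetic in C, not Imagination code; the names `simd_lanes` and `VECTOR_BITS` are hypothetical.

```c
#include <assert.h>

#define VECTOR_BITS 128   /* SIMD register width quoted in the text */

/* Number of elements of the given width that fit in one 128-bit vector. */
unsigned simd_lanes(unsigned element_bits)
{
    /* Only widths that divide the register evenly make sense here. */
    assert(element_bits != 0 && VECTOR_BITS % element_bits == 0);
    return VECTOR_BITS / element_bits;
}
```

On this arithmetic, an 8bit integer workload fits sixteen elements per vector, while 64bit integer or double precision data fits two.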

The core is built on the MIPS SIMD architecture, with instructions supported within C or OpenCL, which also makes it possible to leverage existing code where appropriate.

MIPS Coherency Manager fabric (Figure 1) is based on a multi-core coherent interconnect architecture and is able to support multi-core configurations of up to six cores per cluster. This is particularly useful because cores in one cluster can have different synthesis targets and operate at different clock frequencies and voltages. For optimum performance, the fabric implements hardware pre-fetching, combined with wider buses and lower latencies than previous generations.

The ecosystem includes software, tools and applications, and the new prpl open source foundation. This is an open source software foundation for MIPS I-class and Warrior cores, founded by Broadcom, Cavium, Ikanos, Ineda Systems, Ingenic Semiconductor, Lantiq, PMC and Qualcomm, among others. The software is focused on IoT and data center applications.
One of the first projects completed through the prpl open source foundation is support for the MIPS64 r6 architecture in the QEMU open source emulator, currently available at

Early access releases are available now, with production RTL and general availability scheduled for December.


Software Programmers Face Multicore Challenges

Thursday, February 19th, 2009

By John Blyler
As multicore technology moves into embedded systems, software developers face tool shortcomings, legacy code preservation and scalability challenges.

Max Domeika, an embedded tools consultant in the Software and Services Group at Intel, explained that one of the biggest challenges facing embedded system software developers is the growing number and types of multicore processors. “There are so many different cores with varying capabilities even within a given architectural family of processors, not to mention virtual enhancements like hyperthreading techniques, which enable a single core processor to look like two cores to the operating system.”

Hyperthreading, Intel’s form of simultaneous multithreading, takes advantage of a processor’s “out of order” scheduling: instructions from two threads are issued to the same core so that otherwise idle execution units can work in parallel. From a programmer’s viewpoint, it looks like you have two different processors, even though you are sharing many of the same resources, such as execution units and memory caches. It gives the programmer more options, but it is not the same as having two distinct cores as in a multicore environment.

Why are embedded designers upgrading to multicore technology? Many have found that the use of multiple processor cores reduces system latency or delay. “If you have latency sensitive applications – even something as simple as a user interface – you can spawn threads with reduced latency,” notes Domeika. Threads, or collections of related code segments, are at the heart of the multithreading paradigm, which allows programmers to design software applications whose threads can be executed concurrently.

Scalability is one of the biggest obstacles faced by embedded developers who want to or must use multicore systems. “You may ship with a two-core architecture, but the next revision of the product may have four cores. This means that you want to create your software in such a manner that it scales with as little effort as possible,” Domeika says. Many developers have found out the hard way that a system hard-coded for a two-core architecture must be completely rewritten to run on a four-core system. This translates to additional product cost and a longer time to market.

To accommodate the next product revision, a programmer has to look at all of their routines and figure out how to re-partition the executions among an increased number of processors.
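
The scalability point can be illustrated with a minimal sketch in C using POSIX threads. Rather than hard-coding two workers, the work is divided among however many threads the caller asks for, so the same code runs on two, four or more cores. Everything here is an assumption for illustration, not code from Domeika or Intel: the names `parallel_sum`, `worker_count`, `slice_t` and `MAX_WORKERS` are hypothetical, and `sysconf(_SC_NPROCESSORS_ONLN)` is a widely available extension (Linux, macOS) rather than part of the POSIX base standard.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WORKERS 64   /* hypothetical upper bound for this sketch */

/* One contiguous slice of the input, owned by a single worker thread. */
typedef struct {
    const int *data;
    size_t begin, end;   /* half-open range [begin, end) */
    long partial;        /* this worker's partial sum */
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    s->partial = 0;
    for (size_t i = s->begin; i < s->end; i++)
        s->partial += s->data[i];
    return NULL;
}

/* Online core count, clamped to [1, MAX_WORKERS]. */
size_t worker_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1) n = 1;
    if (n > MAX_WORKERS) n = MAX_WORKERS;
    return (size_t)n;
}

/* Sum `len` integers using `workers` threads; nothing assumes two cores. */
long parallel_sum(const int *data, size_t len, size_t workers)
{
    assert(workers >= 1 && workers <= MAX_WORKERS);
    pthread_t tid[MAX_WORKERS];
    slice_t slice[MAX_WORKERS];
    int started[MAX_WORKERS];
    long total = 0;

    for (size_t w = 0; w < workers; w++) {
        /* Even partition: slice w covers [len*w/workers, len*(w+1)/workers). */
        slice[w].data = data;
        slice[w].begin = len * w / workers;
        slice[w].end = len * (w + 1) / workers;
        started[w] = pthread_create(&tid[w], NULL, sum_slice, &slice[w]) == 0;
        if (!started[w])
            sum_slice(&slice[w]);   /* thread creation failed: run inline */
    }
    for (size_t w = 0; w < workers; w++) {
        if (started[w])
            pthread_join(tid[w], NULL);
        total += slice[w].partial;
    }
    return total;
}
```

A caller would typically write `parallel_sum(data, len, worker_count())`, so a move from two to four cores needs no source change. On some toolchains the program has to be built with `-pthread`.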

At the heart of the scalability question is the balance between different types of available parallelism. Besides multithreading, another means of providing for parallel execution is through the use of Single Instruction, Multiple Data (SIMD) processing. Intel’s implementation of SIMD instructions is embodied by its Intel SSE instructions. Another example of SIMD instruction set processing is AltiVec, used by the PowerPC community – namely IBM and Freescale.
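
To make the idea concrete, here is a minimal sketch in C of a single instruction operating on multiple data elements, using Intel's SSE intrinsics from `<xmmintrin.h>`. The function name `add_floats` is hypothetical, and the scalar fallback is an assumption added so that the code also builds on processors without SSE.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* Intel SSE intrinsics */
#endif

/* Element-wise out[i] = a[i] + b[i] for n floats. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
#if defined(__SSE__)
    /* One SSE add instruction processes four packed 32-bit floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE is unavailable. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The same loop written for AltiVec would use a different set of intrinsics, which is one reason hand-written SIMD code is hard to carry between processor families.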

“In many ways, it is still left up to the programmer to decide how best to balance the amount and type of parallelism to be used. Programmers must think about what happens at the thread level for a shared memory machine, the process level, and also the SIMD vector level. This is a very hard problem,” notes Domeika.
Scalability and parallel implementation issues help to reinforce the need for better development tools and environments. One way to address these needs is through standards and automation. In the multicore space, several companies – including Intel – have worked on a standard called OpenMP which enables concurrency on shared memory machines. One challenge with using OpenMP in the past was balancing parallelization operating on multiple cores and parallelization available through use of SIMD instruction sets. To address this challenge, the Intel Compiler recently added support for automatic vectorization inside of OpenMP pragmas, which are used for compiler control. Automatic vectorization is a technique that transforms a series of sequential operations into operations performed in parallel. Hence, automatic vectorization analyzes the code to take advantage of SIMD instructions, while working within the OpenMP environment to balance different forms of parallelism.
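
A minimal sketch of combining both forms of parallelism in one loop is shown below. Note the substitution: instead of relying on a compiler's automatic vectorization, it uses the explicit `parallel for simd` combined construct, which was standardized in OpenMP 4.0, after this article was written. The function name `dot` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product: iterations are split across threads, and each thread's
 * chunk may be vectorized. Without OpenMP enabled the pragma is ignored
 * and the loop runs sequentially; results agree up to floating-point
 * rounding, since the order of additions may differ. */
double dot(const double *x, const double *y, size_t n)
{
    assert(x && y);
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang the pragma takes effect when the file is compiled with `-fopenmp`. The `reduction(+:sum)` clause gives each thread a private partial sum, so the threads do not contend for a shared accumulator.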

All of these advances are great. However, for many embedded software developers, the main issue is how to preserve their legacy embedded C code while still upgrading to multicore technology. “Most customers that I talk with are interested in new technologies like multicore but are more interested in preserving their business, which means their 10 million lines of legacy C code,” explains Domeika. “That is why I’m one of the co-chairs, along with David Stewart, CEO at CriticalBlue, of the Multicore Programming Practices (MPP) Group. The charter of the group is to collect the best known methods of programming practices using today’s technology. For embedded developers, that means C/C++ and libraries, mainly POSIX threads.”

Such cooperation among embedded software developers will be critical for the continued growth of multicore technology. As in the desktop world, embedded software is the last – but certainly not the least – component necessary for a successful system.

View the video interview with Max.