Part of the Chip Design Magazine Network

Can We “Tock”? Intel’s 4th Generation Core Haswell Targets Embedded, Big Time

Intel’s 4th generation Core (codename Haswell) was introduced in desktop, mobile and embedded versions. The architecture’s feature set is a boon to embedded designers.

On June 3, 2013 Intel announced the 4th Generation Core processor family, code-named “Haswell.” As this announcement follows on the heels of the previous “Sandy Bridge” (2nd) and “Ivy Bridge” (3rd) generations, the general public could be forgiven for seeing this as another “ho-hum” march forward of annual technology. Most consumers, if they even notice the name at all, will find Haswell processors in the latest Ultrabooks, laptops and convertibles available starting in summer 2013.

Intel’s 4th Generation processor “Haswell” family includes Core i3, i5, and i7 variants.

But for Intel and the embedded market, Haswell represents “significant upgrades in graphics capabilities and improvements in compute performance and power consumption,” said the official press release. The up-to-quad-core family, in Core i3, i5, and i7 versions, has more permutations and flavors than a Baskin-Robbins ice cream shop when you include the Southbridge platform controller hub (PCH) and Xeon E3 server versions.

Remember, Intel has historically made the most hay in desktops, laptops and servers and until the rollout of Haswell, the product line going back to x86s treated embedded as an opportunistic afterthought. But there are now eight Haswell CPU SKUs and four chipset SKUs focused primarily on embedded (Figure 1), or what Intel calls the Intelligent Systems market. Of course, all SKUs are on the embedded roadmap and have extended lifecycle support.

Figure 1: Haswell has eight CPU and four chipset (Southbridge) SKUs just for the embedded market. Notice the absence of a Core i3 version for now. Perhaps the new Intel Silvermont Atom family moves up to cover the low end. (Courtesy: Intel)

Like Ivy Bridge before it, Haswell is fabbed on Intel’s 22nm tri-gate technology. But Haswell is a “Tock” product whereas Ivy Bridge was a “Tick” (Figure 2). In Intel’s development model, a Tick moves the existing microarchitecture to a new process node, while a Tock introduces a new microarchitecture on that node. So Ivy Bridge brought the shrink to 22nm, while Haswell brings the new architecture and rolls out major product variants. As the desktop market wanes, the portable market (especially tablets, POS terminals, convertibles like the Microsoft Surface, and other as-yet-undefined devices) looms large. Portable means all-day low power, and it also means rich HD graphics on multiple screens. So Haswell boasts improvements in graphics and power consumption, features that benefit Intel’s traditional segments but really matter on embedded platforms like COM Express, ITX, MicroTCA and myriad proprietary form factors.

Figure 2: Haswell is Intel’s 4th generation Core processor, representing a “Tock” product with substantial new features and variants. (Courtesy: Intel)

Haswell has two other significant improvement categories: 1) signal processing and computing with new Intel Advanced Vector Extensions (AVX 2.0); and 2) security and manageability. In the latter, Intel extends vPro, TXT and Active Management Technology plus a whole realm of security features such as AES-NI crypto to embedded SKUs. The company now openly discusses how Haswell’s hardware integrates with software from Intel’s McAfee subsidiary.

On-Board GPU Graphics
Intel HD graphics used to be what you’d get with a low-end system that couldn’t justify a separate GPU from ATI (now AMD) or NVIDIA. In embedded, a separate GPU didn’t make sense, so Intel-based systems were “stuck” with only fair graphics performance and features. Haswell brings a 50-60 percent improvement over Ivy Bridge in what Intel now calls “best in class.” 3D rendering is built-in, along with HD playback to three separate screens. We had thought AMD would beat Intel to the punch on this latter feature, as its APUs with ATI GPU Eyefinity have supported multiple screens for a while, but apparently not in embedded versions. Intel’s graphics engine offers multi-CODEC support and can decode and transcode video streams simultaneously using Intel Clear Video HD and Intel Quick Sync Video 2.0 features. Haswell supports 4K x 2K screens today. There are also output options for 3840×2160 @ 60Hz on DisplayPort 1.2, and 4096×2304 @ 24Hz on HDMI 2.
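As a rough sanity check on those output modes, the sketch below computes the uncompressed active-pixel data rate for each one. It assumes 24 bits per pixel and ignores blanking intervals (both are our simplifications, not Intel figures), and it compares the result against the payload rate of a four-lane DisplayPort 1.2 HBR2 link.

```python
# Back-of-the-envelope check: uncompressed active-pixel bandwidth of a video mode.
# ASSUMPTIONS: 24 bits per pixel, blanking intervals ignored, so a real link
# needs somewhat more than the numbers computed here.

def active_pixel_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Return the active-pixel data rate in Gbit/s (1 Gbit = 1e9 bits)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

# DisplayPort 1.2 (HBR2): 4 lanes x 5.4 Gbit/s; 8b/10b coding leaves 80% for payload.
DP12_PAYLOAD_GBPS = 4 * 5.4 * 8 / 10

uhd_60 = active_pixel_gbps(3840, 2160, 60)
wide_24 = active_pixel_gbps(4096, 2304, 24)

print(f"3840x2160 @ 60 Hz: {uhd_60:.2f} Gbit/s")
print(f"4096x2304 @ 24 Hz: {wide_24:.2f} Gbit/s")
print(f"DisplayPort 1.2 payload: {DP12_PAYLOAD_GBPS:.2f} Gbit/s")
print("4K @ 60 Hz fits on DisplayPort 1.2:", uhd_60 < DP12_PAYLOAD_GBPS)
```

Under these assumptions the 60Hz mode needs roughly 12 Gbit/s of pixel data, which is why it is tied to DisplayPort 1.2, while the 24Hz mode needs less than half of that.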

The Intel HD 4600, HD 5000, Intel Iris and Intel Iris Pro graphics subsystems in Haswell chips support APIs for Microsoft DirectX 11.1, OpenGL 4.0 and OpenCL 1.2. The SKU will determine which exact graphics subsystem is available (number of execution units, EU) in Intel’s “building block” arrangement. This allows Haswell graphics to only get better in the future, though they’re impressive today. A simplified logical diagram of the graphics architecture is shown in Figure 3.

Figure 3: Intel’s Chief Media Architect Dr. Hong Jiang showed this simplified diagram of the graphics architecture at IDF2012. (Courtesy: Intel)

CODEC improvements over the Gen 3 Sandy Bridge graphics include: native MVC short format, MPEG2 decode, hardware decode acceleration of scalable video coding (SVC), AVC and VC1. By placing these decode options in hardware EUs, the Haswell ICs perform faster, consume less power and, in embedded applications, bring actual GPU functions to a single-chip design. In effect, COM Express boards can now run better than Xbox 360 graphics. Intel might argue that Haswell is substantially better. As well, for video encoding (perhaps for surveillance systems or remote M2M high-res sensors) Haswell has a hybrid hardware/GPU approach that “provides balance between performance, power and flexibility.” With a separate discrete GPU, the embedded designer would be constrained in all these areas…plus board real estate. Haswell does it all in only one IC.

Power and Performance
We admit to being a bit confused on the power consumption specs for Haswell. As shown in Figure 1 above, the embedded versions of Haswell chips consume 35-65W TDP, with Xeon server versions typically higher. Yet Intel is claiming all-day battery life in Ultrabooks and embedded intelligent systems devices. Apple’s new 2013 MacBook Air, launched as we went to press, uses a Haswell ULT, and battery life on the 13-inch model running “wireless web” increases from 7 hours to 12 hours. How is Apple doing this with a 35W Haswell?

Moreover, in thermal-sensitive designs, Intel is talking about 15W SDP (scenario design power: the power consumed when the system is doing typical work, not worst-case TDP). Even more confusing is that this 15W is the processor plus the PCH Southbridge such as the QM87 Express mobile chipset used on embedded boards from companies like Congatec, Concurrent, Mercury Computer and others. Then there are the Ultra Low Power (U-Series) versions of Haswell that we were told will be launched “later this year,” said Ken Caviasca, GM of Intel’s Platform Enabling and Development Team in the Intelligent Systems Group (ISG). The U-Series takes embedded integration one step further by combining the PCH monolithically onto the die, creating the “first time the Intel Core processor family includes the CPU+PCH in one package at 15W,” says an Intel press document given to us. Perhaps these are the processors found in the MacBook Air.
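One way to reconcile these numbers is to remember that TDP is a thermal ceiling, not an average draw. The sketch below works backward from battery life to average platform power. The 54 Wh battery capacity is an assumed round figure for a 13-inch ultraportable, used for illustration only; it does not come from Intel or from the press materials quoted here.

```python
# Average platform power implied by a battery capacity and a quoted runtime.
# ASSUMPTION: 54 Wh is an illustrative battery capacity, not a quoted spec.

ASSUMED_BATTERY_WH = 54.0

def average_power_w(battery_wh, runtime_hours):
    """Average power (W) that drains battery_wh in runtime_hours."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    return battery_wh / runtime_hours

def hours_at_constant_power(battery_wh, power_w):
    """Runtime (h) if the whole platform drew a constant power_w watts."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return battery_wh / power_w

for hours in (7, 12):
    watts = average_power_w(ASSUMED_BATTERY_WH, hours)
    print(f"{hours} h runtime -> {watts:.1f} W average platform power")

for watts in (15, 35):
    hours = hours_at_constant_power(ASSUMED_BATTERY_WH, watts)
    print(f"constant {watts} W draw -> {hours:.1f} h runtime")
```

With that assumed capacity, a 12-hour runtime implies about 4.5W averaged across the whole platform, display included. That is far below either a 35W TDP or a 15W SDP, which suggests the processor spends most of a “wireless web” workload in its low-power states.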

Regardless of the SKU power permutations which may soon be revealed, Haswell’s building-block architecture is designed for substantially lower power with “all day use” and “>10 days of connected standby” as stated by Intel’s Paul Otellini during IDF2011. Intel’s marketing terms are:

  • Intelligent Power Technology: automated energy efficiency to reduce power
  • Automated low-power states: adjusts system power based on real-time processor loads
  • Rapid Start Technology: improves OS boot time and wakes from deep sleep more quickly
  • Fully integrated voltage regulator: integrates legacy power delivery onto processor package/die. (This Intel wording leads me to believe that future versions will be monolithic and some current SKUs may be MCMs.)

Starting with a new extreme low-power idle state, Intel’s secret for power reduction is a combination of: 1) silicon architecture enhancements at the logic and process level; 2) IP block modularity and variable cache and graphics subsystems (Figure 4); and 3) holistic system-level power management which increasingly includes software. (Refer to “Careful System Hygiene is the Next Step in Power Reduction.”) In short: the company looked at everything possible to reduce power, lessons they picked up from the smartphone and tablet markets.

Figure 4: The Haswell architecture is modular and scalable, allowing Intel to create SKUs which target power/performance points for specific markets and applications. We recommend choosing devices by major market (Desktop, Mobile, Embedded, Server), then visiting www.ark.intel.com for details. (Courtesy: Intel)

Active power was reduced by improvements in Turbo Boost, which is now variable (“Dynamic”) instead of at fixed frequencies, allowing the cores to over-clock as needed to get more work done in a shorter time. Finer grain voltage/frequency islands were incorporated and clocks were decoupled from logic when possible. Subsystems that were not performance-driven were turned off. And the communications link between the CPU and the PCH was optimized to reduce power.

In idle mode, a finer-grained power gating was used (similar to Active mode). Intel added new C-States and improved the entry and exit latencies so power wasn’t needlessly burned doing no work. For example, in mode C7 all the clocks are stopped, the voltage is removed from the majority of the CPU, and this state is now engaged even when the system’s display is active. And of course, process improvements and transistor design—perhaps Intel’s biggest secret differentiating weapon—were realized to reduce active and leakage current consumption. State transition times were improved by about 25 percent, which allows the system more time to do work and then shut down.

Additionally, major subsystems like graphics also received a going-over to reduce power. For example, Haswell has higher graphics throughput than Ivy Bridge, partly due to greater parallelism, which yields higher application performance per watt. This leads to a lower overall duty cycle, which in turn reduces platform power. Graphics “slices” (GT3 over the former GT1/GT2) now include power gating: turning off more clocks and logic blocks when not in use. There’s better and more granular drop-of-point clock gating to power down EUs, and an improved memory controller that saves power. Finally, there’s a radically improved software stack to accompany the new graphics and EU subsystems, which includes the ability to move processor graphics (PG) to sleep much faster.

In short, everything was examined with a philosophy of either improve it or “turn it off if it’s not needed.” Intel reports a “20x reduction in idle power” due to these design and software tweaks.
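The “finish the work fast, then turn it off” logic can be sketched as a duty-cycle calculation. All wattages and duty cycles below are hypothetical round numbers chosen for illustration; only the 20x idle reduction is taken from Intel’s claim.

```python
# Average power for a workload that alternates between active and idle states.
# ASSUMPTION: every wattage and duty cycle here is a made-up illustrative value.

def average_power_w(active_w, idle_w, duty_cycle):
    """Time-weighted average power; duty_cycle is the fraction of time active."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return active_w * duty_cycle + idle_w * (1.0 - duty_cycle)

# Hypothetical baseline: active 20% of the time, 1 W when idle.
before = average_power_w(active_w=10.0, idle_w=1.0, duty_cycle=0.20)

# Hypothetical improved part: work finishes sooner (15% duty cycle)
# and idle power drops by the 20x factor Intel reports.
after = average_power_w(active_w=10.0, idle_w=1.0 / 20, duty_cycle=0.15)

print(f"before: {before:.2f} W average")
print(f"after:  {after:.2f} W average")
print(f"saving: {100 * (1 - after / before):.0f}%")
```

The point of the sketch is that both levers matter: a shorter active period shrinks the first term, and a lower idle floor shrinks the second, which dominates whenever the system is mostly idle.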

The power enhancements go hand-in-hand with Haswell’s performance increases. One major improvement is Advanced Vector Extensions 2.0, which accelerates integer/floating point matrix operations for signal and image processing applications like medical imaging, sensor fusion or facial recognition. Haswell performs 32 single-precision FLOPs/cycle compared to 16 in Sandy Bridge (Gen 2) and 8 in Nehalem (Gen 1) generations. Similarly, for double-precision, the numbers are 16 FLOPs/cycle vs 8 and 4, respectively. In embedded applications, this means that a single Haswell device can perform most algorithmic or DSP functions on-chip, alleviating the need for co-processors like standalone DSPs or FPGAs, which add size, weight and power (SWaP).
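The FLOPs/cycle figures above can be reproduced from vector width and execution-unit counts, as in the sketch below. The unit breakdown (two floating-point units per core in each generation, with fused multiply-add on Haswell counting as two operations) is our reading of the microarchitectures and treats the figures as per-core numbers. The four-core, 2.4GHz part used for the peak estimate is hypothetical.

```python
# Reproduce peak FLOPs/cycle from vector width and FP unit counts, then scale
# to a theoretical peak. ASSUMPTIONS: per-core figures; the unit breakdown and
# the 4-core / 2.4 GHz configuration are illustrative, not Intel specifications.

def flops_per_cycle(vector_bits, element_bits, fp_units, ops_per_instruction):
    """Peak floating-point operations one core can retire per clock cycle."""
    lanes = vector_bits // element_bits
    return lanes * fp_units * ops_per_instruction

def peak_gflops(per_cycle, cores, clock_ghz):
    """Theoretical peak GFLOPS; sustained throughput is normally lower."""
    return per_cycle * cores * clock_ghz

# name: (vector width in bits, FP units per core, FLOPs per lane per instruction)
MICROARCHITECTURES = {
    "Nehalem": (128, 2, 1),       # SSE: one add unit plus one multiply unit
    "Sandy Bridge": (256, 2, 1),  # AVX: one add unit plus one multiply unit
    "Haswell": (256, 2, 2),       # AVX2/FMA: two fused multiply-add units
}

CORES, CLOCK_GHZ = 4, 2.4  # hypothetical part

for name, (bits, units, ops) in MICROARCHITECTURES.items():
    sp = flops_per_cycle(bits, 32, units, ops)
    dp = flops_per_cycle(bits, 64, units, ops)
    print(f"{name}: {sp} SP / {dp} DP FLOPs per cycle, "
          f"{peak_gflops(sp, CORES, CLOCK_GHZ):.1f} SP GFLOPS peak")
```

The doubling from Sandy Bridge to Haswell comes entirely from fused multiply-add, so code only sees it when its inner loops can be expressed as multiply-accumulate operations, which is common in filters, FFTs and matrix math.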

Simultaneous multi-threading via Intel Hyper-Threading Technology boosts performance in parallel, multi-threaded applications and it’s granular to two threads for each of the four cores. There’s also reduced latency in the pipeline, leading to more work per clock cycle and overall lower power over absolute time.

Security and Manageability
There are also new Endian conversion instructions for interfacing to non-x86 systems such as accelerators or peripherals, and integer instructions for security: indexing, hashing and cryptography. Intel’s AES New Instructions (AES-NI), talked about for a while now, are implemented on-chip in Haswell processors. This means that encryption and decryption are done in hardware and not via a software algorithm, which burns cycles and power. For vulnerable embedded M2M nodes on the Internet of Things, which are increasingly targeted by cyber crooks, collected, stored and transmitted data can now be encrypted and decrypted on the fly. Figure 5 shows Haswell’s cryptography performance improvements for select algorithms relative to previous Intel microarchitectures.

Figure 5: With on-board hardware for AES-NI, Haswell SKUs bring cryptography to embedded M2M designs which are increasingly targeted on the Internet of Things. (Courtesy: Intel)

When the CPU is paired with a PCH such as the QM87 Express (mobile version), Intel vPro brings in Active Management Technology, Virtualization Technology, and Trusted Execution Technology. This is a comprehensive set of hardware, firmware, software and operating system tools that provides what Intel calls “behind the glass” security. For example, there is down-the-wire security even when the system is off, non-responsive or if higher level software agents are disabled by malware or an attack. The processor can be shut down remotely or queried, and code can be remotely managed and even updated.

VT-d, VT-x and TXT—the abbreviations for a number of virtualization and management initiatives Intel has been unveiling across its processors since Nehalem—have been rolled under vPro and greatly expanded. Haswell adds unique hardware features and instructions designed to make embedded security a reality. Virtualization, for instance, is an ideal way to protect the environment from rogue code by partitioned isolation. VT on Haswell substantially improves guest/host transition times and adds “Accessed” and “Dirty” bits for extended page tables to eliminate the major cause of virtual machine exits. Additional instructions and tweaks make running code in a VM faster and better, increasing the likelihood that embedded designers will use these kinds of security features to protect their systems.
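Because feature availability varies by SKU, embedded software often checks for these capabilities before relying on them. The sketch below is one minimal way to do that on Linux by parsing the text of /proc/cpuinfo. The flag names (aes, avx2, fma, movbe, vmx) are the ones the Linux kernel reports; the sample text and the helper names are made up for illustration.

```python
# Check which CPU features are advertised in /proc/cpuinfo-style text.
# ASSUMPTIONS: Linux flag naming; SAMPLE_CPUINFO is a fabricated excerpt and
# parse_cpu_flags / feature_report are hypothetical helper names.

FEATURES = {
    "aes": "AES-NI",
    "avx2": "AVX 2.0",
    "fma": "fused multiply-add",
    "movbe": "endian conversion (MOVBE)",
    "vmx": "VT-x",
}

def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def feature_report(flags):
    """Map each feature label to True if its flag is present."""
    return {label: flag in flags for flag, label in FEATURES.items()}

SAMPLE_CPUINFO = """\
processor   : 0
model name  : Hypothetical 4th Generation Core CPU
flags       : fpu sse2 aes avx avx2 fma movbe vmx
"""

for label, present in feature_report(parse_cpu_flags(SAMPLE_CPUINFO)).items():
    print(f"{label}: {'yes' if present else 'no'}")
```

On a real Linux target, the same functions can be fed the contents of /proc/cpuinfo instead of the sample string. Note that a BIOS setting can hide VT-x from the operating system even when the silicon supports it.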

As mentioned earlier, Intel is now actively promoting the relationship between processor security features like these and higher-order software such as McAfee’s Deep Defender (Figure 6). We believe this strategy is based upon a paradigm shift in combating malware, attacks and cyber crime. As articulated by the US DoD and NSA, despite the best preparation, attacks are going to happen. The proper response is to rapidly become aware of them, mitigate the damage and heal from them, and change tactics to stay ahead of the criminals. The teaming of Intel and McAfee exemplifies this methodology.

Figure 6: Hardware features with processor instructions, coupled with software from McAfee, are designed to protect future embedded systems while mitigating damage if security breaches occur. (Courtesy: Intel)

 


Chris A. Ciufo is editor-in-chief for embedded content at Extension Media, which includes the EECatalog print and digital publications and website, Embedded Intel® Solutions, and other related blogs and embedded channels. He has 29 years of embedded technology experience, and has degrees in electrical engineering, and in materials science, emphasizing solid state physics. He can be reached at cciufo@extensionmedia.com.
