The Power Treadmill

April 25th, 2013

By Frank Ferro
The recent purchase of an LTE smart phone has me back on my power management soapbox. I upgraded my phone about a month ago to the newest version (staying with the same manufacturer as my previous device) and to my dismay, although it wasn’t completely unexpected, the battery life was actually shorter. I did not do a ‘scientific’ comparison, but following the same daily use pattern I noticed the battery life percentage indicator was much lower at the end of the day when compared to my two-year-old 3G model.

This was not completely unexpected: in my February 2013 SLD blog (Power Management: Throwing Down the Gauntlet) I cited a recent survey showing that users of 4G phones were less satisfied with battery life than users of 3G phones. This was because the radio needed to wake up more often to look for a 4G base station. I also suspect that the larger screen is another key contributor.

Instead of speculating (and complaining) about battery life, let’s take a look at the power profile of a smart phone to determine where we can get the most ‘bang for the buck’ when looking for places to save power. The table below is from an article in the November 2012 Microwave Journal showing the power profile of major components in a smart phone. As the table shows, the overall power consumption over the last few years has nearly doubled. According to the same article, battery capacity has been increasing by about 10% per year for the last few years, so the battery technology has not been able to keep pace with the smart phone power requirements.
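To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The three-year window and the exact 2x power factor are assumptions chosen for illustration; the article itself only says "the last few years" and "nearly doubled."

```python
# Back-of-the-envelope estimate of how run time changes when power draw
# grows faster than battery capacity.
# Assumptions (not from the article): a three-year window, and
# "nearly doubled" treated as exactly 2x.
YEARS = 3
CAPACITY_GROWTH_PER_YEAR = 0.10  # about 10% per year, as cited above
POWER_GROWTH_FACTOR = 2.0

# Capacity compounds yearly; run time scales as capacity divided by power.
capacity_factor = (1 + CAPACITY_GROWTH_PER_YEAR) ** YEARS
runtime_factor = capacity_factor / POWER_GROWTH_FACTOR

print(f"capacity x{capacity_factor:.2f}, runtime x{runtime_factor:.2f}")
```

Under these assumptions the battery grows by about a third while run time falls by about a third, which is in line with the treadmill described here.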

Note that battery life is a function of how the phone is used, with such activities as voice calls, video calls (e.g. Skype or FaceTime), using a Bluetooth headset, Wi-Fi, watching videos, listening to audio, etc. Each of these use cases puts different loads on the CPU, GPU, display and the various radios, so the table provides a general idea of the overall power profile for a given use case.

As suspected, the radio takes a reasonably large percentage of the power (23%), but the rate at which RF power has been increasing with each new technology node is relatively low at 11%. The largest rate increase has been in the display at 300%, which should not surprise anyone given the size and the resolution of the smart phone displays. And consider that the data in this chart does not take into account some of the most recent smartphone models having even larger displays with better resolution.

Looking next at the processor and peripherals we see that together they account for more than half of the power consumption, so clearly targeting these components for better power management will have a significant benefit to the overall battery life. The problem, however, is that new smartphone processors keep increasing in speed and adding more processor cores and GPUs, so the power treadmill is not slowing down.

Is help on the way?
One obvious solution is better battery technology. Recent research claims that lithium-ion micro-batteries will provide a 10x power improvement or be 10 times smaller than today’s batteries—take your pick. Given that these batteries are still in the research stage, don’t expect to see commercial products anytime soon. Consequently, we need to look at the silicon for some immediate relief.

Silicon providers traditionally have relied on process technology progression to reduce power (usually with lower operating voltage), but at 40nm process nodes and smaller there is limited help because leakage power has become difficult to control. If not properly managed, leakage power can exceed dynamic power. New process techniques such as silicon on insulator (SOI) have helped, and the new FinFET technology offers improved leakage, but we have to wait a bit longer for full production of FinFETs.

So where can we expect to get the most immediate and largest improvements in silicon power consumption? Looking at the above graph, the largest power savings can be achieved with architectures that comprehend power management (left side of the graph). SoC designers must incorporate power management techniques early in the design phase as a fundamental part of the architecture, and not look for power optimization later during the silicon implementation phase (right side of graph). Techniques such as power shutoff, adaptive voltage scaling (AVS), dynamic voltage and frequency scaling (DVFS) and clock gating are in fact being used in various combinations in the latest smartphone SoCs. These techniques are good, but are they enough to keep up with the treadmill?
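To see why DVFS is attractive, recall the first-order model for CMOS switching power, P = C x V^2 x f. The sketch below applies that model to two operating points. The capacitance, voltages, and frequencies are made-up values for illustration only, and leakage is ignored.

```python
# Illustrative only: first-order CMOS dynamic power model, P = C * V^2 * f.
# All numeric values below are hypothetical, not taken from any real SoC.

def dynamic_power(c_eff_farads, voltage, freq_hz):
    """Return switching (dynamic) power in watts; leakage is ignored."""
    return c_eff_farads * voltage ** 2 * freq_hz

C_EFF = 1.0e-9               # assumed effective switched capacitance (farads)
FULL_SPEED = (1.1, 1.5e9)    # hypothetical (volts, hertz) operating point
LOW_POWER = (0.9, 0.75e9)    # hypothetical scaled-down operating point

p_full = dynamic_power(C_EFF, *FULL_SPEED)
p_low = dynamic_power(C_EFF, *LOW_POWER)
savings = 1.0 - p_low / p_full

print(f"full: {p_full:.3f} W, low: {p_low:.3f} W, savings: {savings:.0%}")
```

Because voltage enters as a square, halving the frequency while also lowering the voltage saves considerably more than half of the dynamic power in this model, which is why scaling the two together pays off.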

Other than CPF and UPF, which only specify power intent, there is not a standard methodology for implementing a power management architecture. For example, AVS and DVFS lower power consumption, but at the cost of increased system complexity. Therefore, without a standard methodology AVS and DVFS are used sparingly in the system to trade off design complexity with power savings. In addition to the hardware, software complexity also increases as more aggressive power management is applied to the system. To take full advantage of these and other power saving techniques, design tools and IP are needed to allow SoC designers to deploy better power management without the design risk. Applying a standard methodology will simplify development, especially for design teams that are not familiar with power management, and increase their ability to verify functionality and performance of the power management network.

So maybe (just maybe), two years from now when my phone contract expires, I will be able to purchase a smart phone that actually will have longer battery life. This only will be possible, however, with a combination of improved battery technology, process technology and better SoC power architectures. Given the SoC design cycle time, better SoC power architecture work needs to start right now in order for these SoCs to be in smart phones by March 2015 (my current phone contract expiration date), or I will have to wait another two years…

—Frank Ferro is director of product marketing at Sonics.

The Business Of Things

March 28th, 2013

By Frank Ferro
The Internet of things (IOT) will create $14 trillion in business opportunities according to Cisco. Unless you are a government accumulating debt, most of us think that’s a big number, and a big opportunity. The much quoted “50 billion connected devices to the Internet by 2020” forecast is the impetus driving companies in all parts of the ecosystem including infrastructure, applications, services, systems, and semiconductors to position themselves for a share of this market.

Although much of the high-tech growth in recent years has been centered around connected consumer devices, with 1 billion units shipped in 2012 and an estimated 4.5 billion ‘connected screens’ to the Internet in 2016, these markets are maturing and consolidating. The HDTV market has matured, smart phones are next, and tablets will not be far behind. As a result, both the winners and losers in these markets are looking at the IOT as a way to leverage their technology investments.

The IOT market is, in fact, becoming a reality as new products and applications expand beyond vertical markets, making their way to the consumer. Our familiarity (affection may be a better word) with smart phones and tablets, along with the cloud infrastructure that makes these devices so useful, is an enabler for the IOT. These devices provide an easy and intuitive interface to a wide range of technology products, which up to this point have only been envisioned. I am sure that your cable or Internet service provider has tried to get you to add home security to your system. These systems will allow you to monitor and control your home from any mobile device. Even my pool service company wants to sell me a controller with Wi-Fi so I can control and monitor the pool from my smart phone. I can fire up the spa on my way home from work, but of course then I would need a smart blender to prepare the margaritas!

These two simple examples begin to give us a sense of just how big this market can be. Basically any device that can connect to the Internet is fair game. This is a multifaceted challenge and is difficult to get your arms around. As briefly mentioned, there are many vertical market segments such as health care, industrial, transportation, energy, consumer and home, retail, IT and networks, to name only a few. There are also many layers of technology to deal with such as sensors, microcontrollers, power management, energy harvesting, systems, applications and infrastructure. The requirements and challenges for each of these market segments will vary, including cost, power consumption and performance.

The real question then is how can SoC companies create a successful business model around the IOT? Having a pool controller or security system that is connected to the Internet is nice, but how many of these products are sold per year? Last year for example, there were about 1 million cars sold with Wi-Fi connectivity and the number is projected to be 7.2 million in 2017. This is healthy growth, but when compared to the >500 million smart phones with Wi-Fi, this is a relatively small market. I am using connectivity (Wi-Fi in this case) as a proxy for these segments, but the same volumes generally apply to the underlying controllers, as well. Plus, the turnover rate in many segments of the IOT is much slower, with consumers owning products for seven years or longer and only one product per household versus many connected devices per person.

Because these markets are so segmented, SoC development cost no longer can be $100 million per generation if you expect to run a successful business. Chip development cost will need to be significantly lower (~one tenth) and be based on an architecture and design methodologies that are flexible enough to support a range of market requirements. Microcontroller companies have had to deal with this challenge for years, and more recently some Wi-Fi companies have adapted to these challenges. As the IOT evolves, however, more complexity is being pushed closer to the end device, so the requirements are no longer a simple sensor and controller.

To create SoCs that support this increasing level of complexity (e.g. low power for one application, high bandwidth for another) at a low design cost, a strategy needs to be developed that includes architecture, IP and design methodology. For example, several companies already have adopted on-chip network IP as a design methodology that provides standard interfaces with universal connectivity for IP cores from multiple vendors. Using this design approach allows IP cores to be quickly and reliably added or removed from the SoC without any significant design work because each core is isolated from the rest of the SoC. With this IP, SoCs can be quickly adapted with very little design cost to support multiple market segments along with changing design requirements.

Another good example is power management. Today this is done in an ad hoc fashion with no uniform design methodology. Some companies look to process technology and clock gating for low-power designs, others look to better architectures, and still others use design techniques such as DVFS, and some use all of the above. IP and EDA tools that can provide a unified methodology with standard power interfaces (beyond CPF/UPF) will save cost and development time and allow for chips with much lower power consumption.

It is good that semiconductor companies are talking about the IOT market as the ‘next big thing,’ but they need to take a serious look at the business model and the chip design methodologies required to support these wide ranging market segments if they want a piece of the $14 trillion pie.

—Frank Ferro is director of product marketing at Sonics.

Power Management: Throwing Down The Gauntlet

February 28th, 2013

By Frank Ferro
The recent burst of articles challenging smart phone battery life has me asking the question, “Are we ready to turn the corner on power consumption?” About two years ago I was bemoaning the fact that we are willing to live with a smart phone that gets only one day of battery life (Powering Forward or Moon Walking). As of today, nothing has changed. We still need to charge the phone every day. Recent processor announcements continue to be about adding more CPU cores (i.e. more performance). Not to pick on any one company, but did the announcement of an 8-core processor significantly change the smart phone? Is this product creating anticipation in the market for a new processor with 16 cores? Not really.

For most of us, all we want is a smart phone that has a reliable voice connection with a fast Internet browser and decent battery life. Okay, I watch short video clips on my phone, use the maps, along with a few cool apps, but do we need HD quality on the small screen? Even as Mobile World Congress is kicking off in Barcelona this week, I saw that ST-Ericsson announced their new NovaThor L8580, the first smart phone processor to hit the 3GHz speed mark. Putting aside the debate about if and when a 3GHz processor is needed in a smart phone, speed still is getting attention.

There is hope, however, for those of us that don’t want to always carry around a power cord. Google and Motorola are making noise about upcoming products that will focus on battery life. Google CEO Larry Page said, “Battery life is a huge issue. You shouldn’t have to worry about constantly recharging your phone.” Consumers also are weighing in (finally), expressing their dissatisfaction with battery life. In a recent J.D. Power survey of smart phone users, they say that battery performance is becoming a critical factor in overall product satisfaction. The report states that “satisfaction with battery performance is by far the least satisfying aspect of smartphones.”

Another interesting aspect of the report is that users of 4G phones gave battery performance lower rankings than 3G users. 4G phones apparently need to ping the base station more often looking for a 4G connection, and there are fewer of them than 3G base stations. Although this may be a temporary situation as 4G proliferates, early testing of voice over LTE (VoLTE) shows a significant reduction in battery life when compared to CDMA, so we are still in an uphill battle.

On the semiconductor side, companies will continue to compete with high-end SoCs that are loaded with features. However, recent consolidation of the application processor market is the first sign that these SoCs are reaching initial levels of product maturity. As with most product cycles, the goal for first- or second-generation products is to grab market share by getting to market quickly. In these early generation products, there is not too much care taken (typically) to be gate- and power-efficient. At the product level there are also signs that the smart phone market is starting to mature with the release of the first midrange and value smart phones. This clearly will open up opportunities for the major SoC players to do cost and power reductions. It also will open up new opportunities for other SoC vendors that missed the initial market cycle to compete.

Product shrinks and removing features certainly will help power consumption as gate counts go down (or at least are not going up). In addition, current power management techniques—such as power switching, including dynamic voltage and frequency scaling—provide power savings, but is this enough? As SoCs are redesigned to meet the requirements of a segmenting smart phone market, this is a great opportunity for chipmakers to adopt much more aggressive power management techniques. For example, these complex SoCs include a collection of subsystems with multiple power and clock requirements that are grouped by ‘domains.’ These domains can be turned on or off based on the expected use cases (e.g., when I am listening to music I want video and all radios asleep), thereby consuming as little power as possible. Due to software complexity and interdependencies between domains, however, the number of domains that can be controlled is limited. Less domain control means that more parts of the chip are on. In addition, the switching speed at which these domains can be turned on or off needs improvement. The current ‘top down’ software-controlled view can be relatively slow, again leaving domains on much longer than necessary.
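The use-case view of power domains described above can be sketched in a few lines of Python. The domain names and the use-case table are hypothetical, invented purely for illustration; a real SoC would also have to handle dependencies between domains and wake-up latency, which this sketch ignores.

```python
# Hypothetical sketch of use-case-driven power domain control.
# Domain names and use cases are invented for illustration only.

ALL_DOMAINS = {
    "cpu", "gpu", "video", "audio", "display",
    "cellular", "wifi", "bluetooth",
}

# Each use case lists the domains that must stay powered while it runs.
USE_CASES = {
    "music_playback": {"cpu", "audio"},
    "voice_call": {"cpu", "audio", "cellular"},
    "video_call": {"cpu", "gpu", "video", "audio", "display", "wifi"},
}

def domains_to_power_off(active_use_cases):
    """Return a sorted list of domains that no active use case needs."""
    needed = set()
    for name in active_use_cases:
        needed |= USE_CASES[name]
    return sorted(ALL_DOMAINS - needed)

# Listening to music: video, display, GPU and all radios can sleep.
print(domains_to_power_off(["music_playback"]))
```

The finer the domain list, the more of the chip can be switched off for a given use case, which is exactly where the software complexity mentioned above starts to bite.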

The good news is that the market will force the SoC manufacturers to get much more aggressive about power management. The J.D. Power report also indicated that smart phone owners who are highly satisfied with their device’s battery life are more likely to repurchase the same brand of smart phone, so better power management is now a real competitive issue. Current SoC leaders must make it a priority to innovate around power management, implementing much more aggressive power saving techniques, or they run the risk of leaving the door wide open for competitors. The power gauntlet has been thrown down.

—Frank Ferro is director of product marketing at Sonics.

The CES Effect

January 31st, 2013

By Frank Ferro
CES draws a lot of attention. Everyone wants to be first to see the latest and greatest consumer products. If you don’t mind squeezing through the crowd, you can glimpse the startling picture quality of an OLED TV. Never mind viewing the quality of a 4K Ultra HDTV, at CES you can skip a generation and see what an 85” 8K UHDTV looks like. Talk about resolution! You also can explore a working smart home connected by a host of products enabling the “Internet-of-Things,” see products that can sling video from your phone to other screens, and then see robots clean windows. You can even use your brain waves to control toy helicopters and kitty ears. And the list goes on.

This is all fun, but CES is also a place where you can collect valuable data points on markets, products and companies. Careful observation will help get answers to the following questions:

  1. Is a product going to be real in the market, and when?
  2. What’s the strategy of leading consumer and semiconductor companies?
  3. What’s coming next?

All the hype, discussion and speculation around these questions I like to call the ‘CES Effect’.

What is real? One of the hits of this year’s show was the 4K UHDTVs. There is no question that these TVs are going to find their way into consumers’ homes. The only question is when. I remember when HDTVs first appeared at CES in the late ’90s at a cost of about $10K. I knew that it would be a long time before one would show up in my home. Ten years later, in 2009, I purchased my first HD set for about $600. Cost was not the only factor that limited widespread HD adoption; it also was limited by the available content and lack of infrastructure.

A very similar discussion is now taking place with regard to UHDTV: Where is the content? Can the infrastructure handle higher resolution? Higher frame rates are needed to view sporting events; you need HDMI 2.0, and so on. Given this, and the price tag, it will be a few more years before UHDTVs are adopted by consumers. Technologies like H.265 will certainly help deployment by providing similar or better quality with about a 50% reduction in media file size. I am sure that when my current HD set is on its last legs (hopefully five to six years from now), I probably will have no choice but to purchase a 4K set because these will eventually overtake existing HD technology.

What’s not real, on the other hand, are 3D TVs. Yes, they have been at CES for a few years now, and maybe it is me, but the user experience seems to be getting worse and not better. Not to ‘toot my own horn,’ but about a year ago I predicted that we are not ready for 3D because there is not a practical consumer use case. Even for movies, my wife and I will not pay extra to see the 3D version, preferring the 2D instead. 3D will remain a novelty for games or special applications, but will not see the widespread adoption that was expected. Actually, if you want a real ‘3D’ experience, go and view the 8K resolution UHDTVs. The depth and clarity of this picture gave the impression of three dimensions. Unfortunately, I will have to wait even longer to get one of these. Gesture recognition is another technology that was hyped a year ago but was basically absent for similar reasons as 3D: lack of a scalable use model for the consumer (also discussed in Dec 2011 SLD blog).

Just when CES was starting to feel like a “mobile show,” this year the clock was turned back to a more traditional mix of consumer electronics with only a handful of smart phone announcements. Perhaps companies are holding their announcements for Mobile World Congress in February. Even so, it is clearly a sign that the smart phone market is maturing and there is less jockeying for position.

Providing an interesting dichotomy to the show were a number of processor announcements from Intel, Nvidia, Qualcomm, Samsung and ST-Ericsson (a dichotomy because you can see iPhone cases next to semiconductor booths). At a consumer show, do buyers from big box stores care about 8 CPU processor cores or 72 GPUs? Maybe the PC market has trained the consumer to just know that a dual-core processor is better than a single-core and a quad-core is better than dual-core.

In any case, semiconductor companies are ‘leaning forward’ with very aggressive designs to cover a range of markets. The Tegra 4 from Nvidia, for example, with four ARM Cortex-A15 CPU cores and 72 GPUs, is targeting the gaming and tablet markets with enough power to support 4K (UHDTV) output. Similarly, the Snapdragon 800 from Qualcomm will support higher-end gaming, augmented reality and 4K content. The Samsung Exynos 5 Octa uses ARM’s big.LITTLE architecture with 4 Cortex-A15s (big) and 4 Cortex-A7s (LITTLE) in order to save significant power over the previous quad-core version. Intel on the other hand is targeting value smart phones with its Lexington platform and is giving the ‘heads-up’ on Clover Trail+ along with a new 22nm Atom-based design.

If I can boil all this data down, the ‘CES effect’ on the SoC world is the need for more performance, higher complexity and longer usage per charge (lower power). This should not be a big surprise to anyone tracking the SoC market. The consumer’s demand for all these high-tech gadgets is unrelenting and the pace of SoC development is not letting up anytime soon. I also could add to the list lower SoC cost (both development and product cost) and better execution (TTM). To keep up this pace, contributions are needed from all parts of the semiconductor ecosystem including better IP, improved system architecture and analysis tools.

And P.S.: If I see another Dick Tracy watch at CES (which I did) I will scream. Give up already!

—Frank Ferro is director of product marketing at Sonics.

The Network Is The SoC…

December 19th, 2012

By Frank Ferro
SoC design continues to challenge semiconductor and system companies in their pursuit to create a better user experience for a wide range of products. Given this, I was pleasantly surprised to see that two of the “Ten technologies that will change the world in 2013,” according to EETimes (December 2012 issue) were SoC-related.

One is virtual SoC prototypes and the other is IP subsystems. These technologies are right up there in the top 10 list with heterogeneous networks, gesture recognition and 3D printing (which by the way I struggle to ‘wrap my head around’ because this is a real Star Trek replicator!). Both virtual SoC prototypes and IP subsystems are making such lists because they are now necessary pieces in the SoC design puzzle. The complexity of SoCs designed in 28nm process technology and below is becoming too unwieldy for design teams to manage as more and more functionality is being crammed onto the die. Note that 3D FinFET transistors also made the top 10 list (14nm and below).

Having the ability to create virtual prototypes addresses not only SoC complexity, but also the time-to-market pressure, by pipelining software development in advance of silicon. Virtual prototypes can be a cost effective alternative to FPGA emulators for hardware and software development. However, they also can be used in conjunction with FPGAs for hardware testing and third-party IP integration. Clearly defining the architecture based on a more detailed understanding of the system’s performance behavior, in advance of the SoC implementation, will save time and cost during the implementation phase, ensuring the SoC meets design specifications.

Along with virtual prototypes, IP subsystems are clawing their way out of an esoteric world as they emerge as a key component in a complex SoC design strategy. IP subsystems are a way to ‘divide and conquer,’ where advanced functions such as graphics, audio or video are addressed by the subsystem. The advantage of this approach is that these functions can be tested and verified at the unit level, then integrated with the top-level SoC functions. Another advantage is that subsystems are available as commercial IP blocks from multiple vendors, making for good competition. Plus, the expertise for these functions does not need to exclusively reside ‘in house.’ Semico Research predicts 25% of the SoCs that ship next year will include subsystems, with this number increasing to more than 65% in 2015.

SoC Design is Fabric Design: As collections of subsystems begin to make up a larger percentage of the SoC, integrating these subsystems along with other IP components is the real challenge. A customer recently noted that the speed and success of an SoC program is tightly coupled to their ability to do the fabric design (or the on-chip communications network). Being a supplier of on-chip networks, it is certainly encouraging to hear customers elevate the importance of this IP in their SoC methodology, equating it with the success or failure of a program. Fortunately (or unfortunately), this is true because the network touches every aspect of the SoC design from early architecture exploration all the way through to back-end layout. So the on-chip network is not only a critical IP block connecting all the cores in the system, it is also a tool for architecture exploration and performance analysis. And finally, it is a platform methodology to allow the rapid and repeatable assembly of the SoC, enabling design teams to meet the rapidly changing market requirements.

PPA: Understanding tradeoffs around performance, power and area (PPA) is essential to ensure that architectural intent can be realized in silicon. Connecting so many cores and subsystems together creates natural contention points in the network which, if not managed, will mean poor performance for the various usage scenarios or failure of the SoC completely. To answer these PPA questions, RTL or SystemC models of the on-chip network allow the SoC architect and designer to model and analyze critical data paths in order to optimize the system (e.g. optimize buffer sizes and minimize wires). Architectural features in the network, such as virtual channels, QoS, and true non-blocking flow control (not simply request and response pipelining), provide the concurrency necessary to keep the performance up and the gate-count down. Features such as virtual channels also help with the back-end layout implementation because the logical network design is separated from the physical layout, thus avoiding performance problems late in the design as components are shifted on the die.
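As a toy illustration of the buffer-sizing question, the Python sketch below models one contention point: two initiators sharing a target that drains one request per cycle. It is a deliberately simplified stand-in for the RTL or SystemC models mentioned above, not a substitute for them, and the traffic patterns are hypothetical.

```python
# Toy model of a single contention point in an on-chip network.
# Traffic patterns are hypothetical; real analysis would use RTL or
# SystemC models of the actual network.

def peak_occupancy(arrivals_per_cycle, drain_per_cycle=1):
    """Return the peak queue depth seen at a shared target.

    Each cycle, new requests arrive first, the peak is sampled, and then
    the target drains up to drain_per_cycle requests.
    """
    occupancy = 0
    peak = 0
    for arriving in arrivals_per_cycle:
        occupancy += arriving
        peak = max(peak, occupancy)
        occupancy = max(0, occupancy - drain_per_cycle)
    return peak

# Requests issued per cycle by two hypothetical initiators.
cpu_traffic = [1, 0, 1, 0, 1, 0, 1, 0]     # steady, low-rate accesses
video_traffic = [2, 0, 0, 0, 2, 0, 0, 0]   # short periodic bursts

combined = [a + b for a, b in zip(cpu_traffic, video_traffic)]

print("cpu alone:  ", peak_occupancy(cpu_traffic))
print("video alone:", peak_occupancy(video_traffic))
print("combined:   ", peak_occupancy(combined))
```

The peak depth for the combined traffic is a first estimate of how much buffering this contention point needs to avoid stalling either initiator; too little hurts performance, too much wastes gates.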

Mainstream: SoCs are now the critical component for leading-edge products in all the major market segments (consumer, communication, networking, enterprise, automotive). Successful SoC execution therefore is key to the success of both system and semiconductor companies, and hence the visibility. A better SoC methodology built around the on-chip network fabric is necessary to improve IP integration, help meet performance goals, and avoid back-end layout problems (timing closure). Being on a top ten list is nice as long as your SoC is the top seller.

—Frank Ferro is director of product marketing at Sonics.

Open IP Development Tools

November 29th, 2012

By Pascal Chauvet
How much time have you wasted trying to understand software tools by deciphering the logic of their creator? I always find it very frustrating to be limited by features and tool capabilities that do not do exactly what I want, or which do not work at all with my other applications. We are engineers! We can learn and adapt, but we often want to be able to extend and improve the tools we are using. Why is that not always possible?

Adding or replacing an EDA tool from different vendors in your design flow does not have to be a headache. It should never force you to make major modifications in your methodology and overall environment. So how is it achieved? Enforcing the support of standards for tool interoperability is an obvious first step.

In the world of SoC architecture exploration and platform assembly, the IP-XACT standard, despite its flaws, has been widely adopted. IP-XACT also is used to ease IP integration. Similarly, IP model interoperability has benefited from SystemC TLM 2.0. For performance analysis and system debugging, UVM transaction recording and SCV transaction recording have made it easier to share instrumented models or RTL monitors to analyze simulation results.

Modularization of functionalities, in order to be shared across a common software platform such as Eclipse, opens up new opportunities for tool interoperability and integration.

Scripting capability built around the base commands of any tool transforms it into a very powerful application in the hands of its user. The most successful EDA tools have such a customization layer.

The de-facto language for user-level scripting in the EDA industry is TCL. Many CAD departments have managed to build complete infrastructure around their tool flow with TCL. I believe it is safer to stay away from any wholly proprietary language, or even any more exotic language, as these defeat the purpose of language unification.

The support for industry standards, along with the scripting capability of tool environments, defines what is called the “openness” of these environments. The more “open” the tools, the easier it will be to use them together and to adapt them to your needs.

EDA vendors are not the only companies building CAD tools for their users. Tools built by IP providers are often underestimated and should also be subjected to close scrutiny about openness. The more configurable an IP, the more sophisticated will be the tools associated with it. Memory subsystems and on-chip communication networks (interconnect or network on chip) are perfect examples of highly configurable IP. Ironically, even if these complex IP products are architected and designed to be easily interfaced with all other IP cores in a system, their tools may not be built with the same objective in mind.

Architecting and assembling a large SoC implies intimate knowledge of all the IP components that compose the system. That is why it can be extremely challenging for an EDA vendor to build such design environments. Until recently, the big 3 (Cadence, Synopsys and Mentor Graphics) had not shown much interest in tools for architects, or even tools for platform assembly. Perhaps the numbers for the ESL market were considered too small to be taken seriously.

EDA vendors tend to build new tools starting with very broad objectives. They want to determine if the tool creates any interest, but unfortunately these vendors usually barely scratch the surface. It is not until they work with a lead customer and address customer-specific requests that they refine the implementation. Openness, and more precisely scripting, is a must, so the user can add their own “know-how” to the tool.

IP vendors, on the other hand, have full knowledge of their IP, but they will often sacrifice having an open environment in order to limit dependency on external elements out of their control. This approach is indeed a safer, easier, and faster way to get a tool out that addresses your IP needs. But does it really help a customer to achieve their goals?

Forcing architects to systematically translate the requirements and constraints of the large system they are building into IP specific ones is an inefficient task. At the end of the day the entire SoC has to perform as expected so it can implement all the supported applications.

Buyer Beware. Any company evaluating a new IP should pay close attention to these tooling aspects. It is necessary to look beyond the mere “eye candy” UI. Ask yourself these questions: How will this tool play with the rest of your environment? And will you be able to extend it and mold it to your needs? Always be wary of vendors that assume they know more than you do.

—Pascal Chauvet is an application architect at Sonics

Coherently Incoherent: Dealing With Complexity

October 25th, 2012

By Frank Ferro
I was a bit frustrated this weekend after installing a digital light timer (yes, a light timer). As an engineer this should be no big deal, and for the most part, I installed it without shocking myself or other major problems. This timer had all the bells and whistles. It knows about time zones and adjusts daily for dawn and dusk. It even adjusts for daylight saving time. The problem came when I tried to program this device. It took me two days to get it right (I actually had to read the instructions)! How did a very simple function like a switch become so complicated?

I had a similar thought last week as discussions have intensified around the need for embedded consumer SoCs to support hardware cache coherency. How did connecting one core to another (a switch and router) become so complicated? Much of the discussion has been sparked by the recent introduction of the new ARM CCN-504 cache coherent interconnect. Although this IP is for high-end computing platforms, it is clear that these types of coherent networks will be needed for lower-performance applications also. It’s not that cache coherency is new in embedded SoCs. It has been used in computing clusters for some time now. Keeping memory coherent within the computing cluster, however, has been the problem of the CPU vendor alone, because it did not affect other memory transactions in the system. What is new is that other processors in the system (GPUs and DSPs) also will need coherent access to memory.

Why? I don’t need to reiterate the increased computing demands and bandwidth challenges in today’s mobile SoCs. The advanced features in today’s on-chip networks like QoS and virtual channels have been introduced to maximize system bandwidth and concurrency, thus reducing any negative performance impact due to multiple processors competing for system memory. Even so, anything that can be done to minimize the need to access off-chip system memory, with its long latencies, has real performance advantages. Given this, other processors like the GPU and DSP can have their own local caches to maximize performance, but these local caches may need to have a consistent view of memory with the other heterogeneous processors in the system. Reducing the number of external memory accesses also has power savings benefits, which is critical for mobile SoCs.

Another layer of complexity. Supporting cache coherency means that the on-chip network has the added task of determining if a data transaction is coherent or non-coherent. If the transaction is coherent, it will have to be directed to a coherency network to manage the progress of these shared transactions; if not, then the data can pass directly to memory. Supporting coherency brings many new and challenging architectural decisions to the SoC design team including: how many computing clusters will be supported, will other cores be fully coherent or I/O coherent, how to mix coherent and non-coherent masters, what type of coherency scheme (snooping, directory), how well will the system scale? These are all critical questions with answers that vary widely today depending on the customer, the application and the processor used.
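To make that first decision concrete, here is a minimal Python sketch of the routing choice described above. It is only an illustration: the `Transaction` fields, the `coherent` flag and the path names are all hypothetical, and a real on-chip network would derive this information from protocol signals rather than a simple boolean.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    master: str      # hypothetical name of the initiating core
    address: int
    coherent: bool   # hypothetical flag marking a shared (coherent) access


def route(txn: Transaction) -> str:
    """Pick the path a transaction takes through the on-chip network."""
    if txn.coherent:
        # Shared data must go through the coherency network first.
        return "coherency_network"
    # Non-coherent data can pass directly to memory.
    return "memory"


txns = [
    Transaction("cpu0", 0x1000, coherent=True),
    Transaction("video", 0x8000, coherent=False),
]
paths = [route(t) for t in txns]
```

The harder architectural questions listed above (fully coherent versus I/O coherent cores, snooping versus directory) all sit behind that first branch.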

Given ARM’s large share of the mobile processor market, the ACE specification is the closest thing we have to a standard. Implementations today include a mix of coherent and non-coherent networks connected via ACE and ACE-lite (for I/O coherent data) ports. In addition, OCP 3.0 has provided a protocol specification for coherent data transfers, and some companies have developed their own proprietary coherency networks. As the networks evolve I would expect to see better integration of coherent and non-coherent networks. I also would expect to see networks that offer more scalability (moving easily from one or two computing clusters to systems that support many more).

For cache coherency to be adopted on a wider scale, customers need to see the clear benefit of adding this level of complexity to the SoC. Having the ability to simulate with test cases showing performance and power benefits will be very important. In addition, SoC designers are still not certain of the system requirements (listed above) or of the right time to introduce hardware coherency. One thing is certain, however: supporting hardware cache coherency in embedded SoCs offers designers both potential benefits and challenges that are not for the faint of heart. This is definitely not a simple switch.

—Frank Ferro is director of marketing at Sonics.

You Get What You Want

September 27th, 2012

By Frank Ferro
Now that the iPhone 5 hype is quieting down, the discussion has turned to the A6 chip that is powering this must-have device. There is much speculation on what is inside the A6 processor. Is it a dual-core A15 or a custom architecture? Is it a ‘big.LITTLE’ architecture? At what speed are the cores running, 1.2GHz? Others argue that the graphics processor is of equal importance to the CPU for the overall user experience. In any case, Apple is boasting a 2x performance improvement over the previous generation iPhone.

This discussion, as expected, has expanded to rival CPUs like Qualcomm’s custom Krait core, used in Snapdragon, or Intel’s Atom processor. With all the talk of processor performance (CPU and GPU), I found it interesting that there was only one brief reference to the memory architecture of the A6. At MemCon last week, Martin Lund, senior vice president at Cadence, mentioned in his keynote presentation the importance of ‘compute, interconnect and storage.’ He then went on to discuss the time and energy engineering teams spend optimizing the memory interface to minimize latency. Of course, at a memory conference we expect the focus to be on the memory, but the point is well taken. The CPU is only one part of the equation.

For the best system performance, proper consideration also needs to be given to the interconnect and memory architecture, or the CPU’s speed and internal architectural efficiency will not be realized in the system. And getting the right SoC efficiency between CPU/GPU, interconnect and memory system makes a big difference in battery life as well as user-observed performance.

From my biased view of the world, I would agree that the on-chip interconnect, and to a lesser degree the memory subsystem, do not get the attention they deserve when considering the performance of these SoCs. It is certainly easier and more intuitive to talk about how the raw megahertz and the number of CPU cores translate into a better product than how the performance of the interconnect or the memory affects the system. For the SoC architects and design engineers, however, it is another story. They are by necessity “getting it.” All the CPU/graphics performance in the world does not translate into better system performance in a heterogeneous compute environment if you have a poor interconnect and memory subsystem design.

Multi-threading and QoS techniques have been around for a long time as a way to improve system concurrency and memory efficiency. Even so, these architectural enhancements have not been widely used in embedded consumer SoCs. As I was scanning articles it was interesting to see that the new RAZR i phone uses a 2GHz single-core Medfield Atom processor with Hyper-Threading Technology. This technology allows a single core to do “two things at once,” and by taking advantage of concurrent processing they claim much lower power consumption compared to “ramping up dual cores.” Having two ‘virtual cores’ is a good illustration of how efficiency is gained by fully utilizing the existing hardware rather than throwing more cores at the problem.

The same concept of multi-threading can also be used—and given applications processor performance requirements really must be used—to get maximum utilization of the on-chip communications network. (I intentionally changed the terminology here, because we are now talking about more than a simple interconnect.) The use of multiple threads or virtual channels optimizes the utilization of connections in the network by combining physical connections that are under-utilized. Now one physical link appears as multiple logical links. This of course saves wires, but there are other advantages (keep reading).

Blocking and tackling. To help manage traffic from multiple processors (CPU, GPU, etc.) competing for DRAM resources, quality-of-service (QoS) or a priority setting is applied to the data. This can work fine, but trouble often arises when each core asserts that it has the highest-priority traffic, effectively giving no one priority. This is known as a panic failure. Other failures can occur, like ‘head-of-line blocking,’ when higher-priority traffic gets stuck behind lower-priority traffic. Having virtual channels at their disposal helps system designers avoid these blocking scenarios because the virtual channel enables a ‘passing lane’ for more important traffic. When one logical connection cannot make progress, other data that is not blocked is free to flow over the same physical link. The advantage here is much better system concurrency because processors spend much less time waiting (i.e., being blocked) for resources.
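The ‘passing lane’ idea can be illustrated with a toy Python simulation. This is a sketch under simplifying assumptions, not a model of any real network: packets are hypothetical `(name, target)` pairs, the link moves at most one packet per cycle, and a blocked target never becomes free during the run.

```python
from collections import deque


def deliver(queues, blocked, cycles):
    """Simulate one physical link shared by one or more logical queues.

    Each cycle the link forwards the head packet of the first queue
    whose head is not waiting on a blocked target. Returns the names
    of the delivered packets in order.
    """
    queues = [deque(q) for q in queues]
    delivered = []
    for _ in range(cycles):
        for q in queues:
            if q and q[0][1] not in blocked:
                delivered.append(q.popleft()[0])
                break
    return delivered


low = ("low_prio", "slow_target")   # stuck: its target cannot accept data
high = ("high_prio", "dram")        # free to make progress

# One shared FIFO: the stalled packet at the head blocks everything behind it.
single = deliver([[low, high]], blocked={"slow_target"}, cycles=4)

# Two virtual channels on the same link: the second channel passes the first.
virtual = deliver([[low], [high]], blocked={"slow_target"}, cycles=4)
```

With the single queue nothing is delivered while the head packet is stalled. With two virtual channels the high-priority packet gets through over the same physical link.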

Get what you want and not what you need. At a recent sales refresher course I was reminded that customers will not buy a product, no matter how good it is, if they don’t think they need it. This is a perfect summary for understanding why many of the non-blocking concurrency techniques that have been around for a long time are just starting to be implemented on a broader scale in embedded SoCs. Throwing more processor cores at the problem can become impractical for power- and cost-sensitive applications, and DRAM bandwidth efficiency is not moving forward fast enough from a mobile perspective either. So SoC designers are now being forced to get more efficiency out of the system in order to take full advantage of all the CPU power available to them. Now they want and need solutions like non-blocking virtual channels to deliver the system concurrency that today’s smart devices demand.

—Frank Ferro is director of marketing at Sonics.

Verify This

August 23rd, 2012

By Frank Ferro
Verify this? No, the New Jersey in me is not coming out. This is not a pejorative; it is simply a request and a question. It is a request by SoC designers to the verification team. It is also the verification team’s response when they realize the enormity of the task: “You want me to verify this?”

As I continue the discussion on the use of System IP for SoC design, one of the less glamorous and often forgotten tasks of SoC design is verification. There has been much discussion on how the demands from both the consumer and cloud infrastructure markets are driving SoC complexity, so I will not elaborate here. It is enough to say that as the number of individual IP cores and subsystems increases, with a mix of IP from third parties along with internally developed IP, the task of verifying the functionality of individual IP blocks, subsystems and the end-to-end system is presenting new challenges for verification teams.

There are several aspects of verification, including functional verification and performance verification. Functional verification includes connectivity testing and IP integration. Performance verification includes many aspects of overall system testing such as speed, power and bandwidth for various use cases. Both types of verification are important and necessary in today’s complex SoC, but for now I will focus on the functional verification and save performance verification for another blog.

Over the last few years as device complexity grew, SoC designers discovered that a large percentage of chip bugs often were found in their internally developed system interconnect. Individual IP blocks from trusted sources were tested and verified (which is good), but when connecting these blocks to the system, errors occurred due to protocol mismatches or corner cases that were difficult to anticipate. This problem (among others) facilitated the use of on-chip network system IP as a solution.

Having the interconnect as a well-defined IP block (IP generator to be precise) allowed a ‘correct-by-construction’ methodology by defining a standard protocol interface or ‘socket’ for each core in the system. These sockets then can be verified based on the interface protocol, either standard or proprietary, with protocol checkers. It is expected today that the on-chip network IP ships with protocol checkers and test benches for standard protocols such as AMBA (AXI3/4) and OCP. It is also expected that the test environment be open and flexible enough to integrate custom protocol checkers developed by the customer.
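For readers who have not worked with one, a protocol checker simply watches the signals at a socket and flags any cycle that breaks a rule of the interface. The Python sketch below checks a single, simplified rule in the style of the AXI handshake: once VALID is asserted, it must stay asserted until READY completes the transfer. The trace format and function name are hypothetical, and a production checker covers many more rules and runs inside the simulation environment rather than on a list of tuples.

```python
def check_valid_stable(trace):
    """Return the cycle numbers where VALID dropped before a handshake.

    trace is a list of (valid, ready) booleans, one pair per clock cycle.
    """
    violations = []
    pending = False  # True when VALID was high last cycle without READY
    for cycle, (valid, ready) in enumerate(trace):
        if pending and not valid:
            violations.append(cycle)
        pending = valid and not ready
    return violations


# VALID waits for READY, then the transfer completes: no violations.
good = [(False, False), (True, False), (True, True), (False, False)]

# VALID is withdrawn in cycle 1 before READY was ever seen.
bad = [(True, False), (False, False), (True, True)]

good_result = check_valid_stable(good)
bad_result = check_valid_stable(bad)
```

A checker like this is passive: it only observes the socket, which is why the same checker can be reused wherever that protocol appears in the design.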

Is this enough? The use of on-chip network system IP with a socket interface for the most part has solved or reduced many of the connectivity errors, thereby giving designers confidence with unit level verification. The challenge facing the verification team now is how to test connectivity and functionality at the system level when they have multiple levels of networks. A subsystem, or group of IP blocks for a specific function, has its own on-chip network, which then is connected to higher levels of the on-chip network at the system level (networks embedded in networks). Clearly a standardized methodology is needed to keep the task of verifying all this IP from spinning out of control.

UVM (Universal Verification Methodology) is an industry standard methodology that helps designers meet this challenge by allowing them to leverage and re-use verification infrastructure that is already in place from IP vendors. UVM provides the necessary hooks to connect IP-specific ports or interfaces, thus creating the verification infrastructure used to perform functional verification of IP that can be re-used at the subsystem and SoC levels. Passive monitors, scoreboard components and functional coverage modules can be re-used for SoC/subsystem verification. With UVM infrastructure, building a complete verification environment for an SoC containing multiple IPs is a task that now becomes cost-effective in terms of time and effort.

With the complexity of SoCs growing exponentially and time to market shrinking, any reuse of verification infrastructure from IP vendors can help to speed up SoC delivery, which is critical for SoC product execution. Having system IP that includes a UVM verification infrastructure spares the verification team the “shock and awe” of having to create their own infrastructure. Now when asked to “verify this” they can simply reply, “no problem.”

—Frank Ferro is director of marketing at Sonics.

The Power Of Dark Silicon

July 26th, 2012

By Frank Ferro
Even though the cloud is permeating everything we do today, I was recently reminded that it is omnipresent even far outside the walls of tech. With all the TV ads, as well as our most prominent airports and U.S. highways peppered with cloud-based billboards, even our parents know how to properly use “cloud” in a sentence today. But to hear about the cloud from the pulpit at church on Sunday, that caught me a bit off guard. (And no, the punch line was not that heaven was in the clouds!)

A visiting homilist explained that he could travel light, especially to a high-tech place like Silicon Valley, now that all his information was in the cloud and easily accessible from anywhere, anytime. It was true, he had no problem accessing his homily, until he tried to print and the printer ran out of ink. Unfortunately, the content was stuck in the cloud so he had to “wing it.” Needless to say, this quickly brings us back down to earth with a firm reminder that as powerful as the cloud is, at some point we are limited by the performance (or lack thereof) of our local devices.

Clearly our local devices are not much use without cloud content, but the opposite also holds true: Accessing information is only as good as our local device. When browsing Web sites from a smartphone, for example, how often are we frustrated with waiting for the page to render? With technology’s near limitless trajectory and upgrades, we are no longer forgiving consumers and our patience has hit an all-time low with most of our “smart” devices.

Web browsing speed is now a major metric when comparing smartphones. We want the information and we demand it now. Of course, there are many factors that affect Web page speed, but if we normalize the connection to the cloud, a fair comparison can be made of device performance. The first thing everyone looks at is the processor/GPU speed and number of cores. Speed is important, of course, but it’s not the entire picture.

To ensure the best overall device performance, the entire system design needs to be considered. This includes the performance of all the heterogeneous processors, the overall dataflow efficiency and power consumption of the underlying SoC architecture. All of these topics are worthy of further exploration (and likely future blogs), but I am going to focus on the one you might consider the least likely to impact performance—power consumption.

Everyone understands that processor performance and power must be balanced in mobile devices to achieve acceptable battery life. For years, this has been a key differentiator for processor companies (both IP and hardware) competing in the mobile market. The problem is compounded as more and more functionality is packed onto a single silicon substrate (e.g., at 28nm), with multiple processor cores operating at different frequencies while trying to perform multiple tasks concurrently. The problem now is not only power consumption, but also power density.

Consider again the case of Web browsing on a mobile device. If you want to open a Web page while running multiple applications concurrently (and today this is an automatic must) then the power of the processor can easily ‘spike’ above set limits permitted by the wireless specification. This forces the processor to ‘throttle back’ because it is getting too hot to support the task, and ultimately significantly slows down the speed at which the page can be viewed. This is not as bad perhaps as running out of ink when needing a hard copy at the last minute, but it is another factor that can limit the end user experience, making a device much less desirable and competitive in a highly competitive market.

So at the SoC level, how can system architects deal with these increasing power challenges? Using power management IP, which is an important subset of system IP, allows the SoC architect to take a different approach when considering the system power design. System IP for power management allows the architect to look at the chip from several power perspectives, including the lowest hardware levels that define power domains, control of individual cores or subsystems, and even power management from the application perspective.

The Plumbing: A complex SoC, especially for the mobile market, easily can have 10 or more power domains. And within these domains there are many more frequency and voltage domains. System IP tools allow the SoC architect to efficiently define these domain hierarchies and automatically insert the correct type of logic to support domain boundaries. Many gates (read chip area, cost and power) can be spent crossing domain boundaries. Therefore, efficient implementation is critical for the best performance/power ratio. Additionally, having support for CPF (Common Power Format) and UPF (Unified Power Format) is also critical for the design flow.

Keeping Silicon Dark: The next step is to ensure that all silicon blocks are kept dark, or off, for as long as possible. Unfortunately, today most power control is done via software only. This can be a relatively slow and unreliable method, because the CPU has to wait many cycles from the time a power-down command is issued to the actual power-down of a core. It needs to be certain that all transactions in the system are complete.

Using system IP (where the power control resides inside the on-chip network) allows for much finer control of the power-up and power-down of system cores than software alone. For example, the on-chip network has visibility into all the transactions in the system, so by interacting directly with a power management unit, the network can halt any additional transactions to the subsystem or core being put to sleep as soon as a power-down command is issued. Once all transactions are complete, the network can then let the power manager know that it is safe to shut the core down. This is much faster and more reliable than using software alone, delivering significant power savings of 2x or perhaps up to 3x. The same is true for wake-up because the on-chip network can ‘see’ a transaction coming to a core that is asleep and can quickly notify the power manager to wake it up.
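The power-down sequence described above can be sketched as a small state model in Python. This is an illustration only: the class and method names are hypothetical, and in silicon this logic is a hardware handshake between the network and the power manager, not software.

```python
class NetworkPowerAgent:
    """Hypothetical model of the network logic guarding one core."""

    def __init__(self):
        self.outstanding = 0   # transactions issued but not yet complete
        self.halting = False   # set once a power-down has been requested

    def issue(self):
        """Try to start a transaction; refused once power-down is requested."""
        if self.halting:
            return False
        self.outstanding += 1
        return True

    def complete(self):
        """Mark one in-flight transaction as finished."""
        if self.outstanding > 0:
            self.outstanding -= 1

    def request_power_down(self):
        """Power manager asks the network to stop traffic to the core."""
        self.halting = True

    def safe_to_power_down(self):
        """True only when traffic is halted and nothing is in flight."""
        return self.halting and self.outstanding == 0


agent = NetworkPowerAgent()
agent.issue()
agent.issue()
agent.request_power_down()
refused = not agent.issue()          # new traffic is halted immediately
before = agent.safe_to_power_down()  # two transactions still in flight
agent.complete()
agent.complete()
after = agent.safe_to_power_down()   # drained, so the core can go dark
```

The point of the sketch is that the network already counts in-flight transactions, so it can answer “is it safe?” directly instead of leaving software to wait out a worst-case delay.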

Moving forward: By using system IP for the ‘power hierarchy and system control’, it would be possible to extend this control even up to the level of the application. With a dedicated power processor that interfaces to the software, applications can become acutely ‘power aware’ because they have knowledge of the hardware and of the other applications with which they are interacting. Although this vision may be a few years away, it would significantly improve the overall user experience and reduce the total power consumption of any mobile device. But for now, system architects who take advantage of system IP can develop smarter power architectures that will not only reduce power, but actually improve performance, since more of the silicon can be ‘dark’ and there is less chance of hitting those power limits.

So the next time you need to access that information from the cloud but have to wait, think of how handy that system IP would come in now…

—Frank Ferro is director of marketing at Sonics.
