Posts Tagged ‘Microsoft’

Power Bits: Jan. 28

Friday, January 28th, 2011

By Ed Sperling
Microsoft is looking for 16-core servers for future data centers using Intel’s Atom and AMD’s upcoming Bobcat processor lines in order to lower power consumption in data centers. The announcement, made at the Linley Data Center Conference in San Jose this week, poses an interesting dilemma for Intel and AMD, as well as challenges for ARM and even Microsoft.

On the Intel and AMD front, the big question is how such a shift would impact revenues, given the fact that both companies have used the power-saving lite versions of their processors in much lower-cost devices such as netbooks. While some data centers are experimenting with Atom-based arrays for servers, the real savings have come from virtualization to improve server utilization, and cloud-based strategies to quickly ramp up and ramp down compute capacity as needed.

Virtualization has been extremely successful. Most large companies have adopted it to some extent because it costs money to power and to cool a server, whether it’s utilized at an optimal 60% to 85% or whether it’s running at 5% to 15% utilization, which was the industry average prior to virtualization. The problem for Microsoft was that the virtualization was done using VMware and Citrix software, not Microsoft software.
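
The utilization figures above imply a simple consolidation estimate. The following is a rough sketch only: the 100-server fleet is a hypothetical number, and the function assumes identical hosts while ignoring hypervisor overhead and peak loads.

```python
import math

def hosts_after_consolidation(n_servers, avg_util_before, target_util):
    """Estimate hosts needed to carry the same total work at target_util.

    Assumes identical hosts; ignores hypervisor overhead and load peaks.
    """
    # Total work expressed in units of "one fully busy server".
    total_work = n_servers * avg_util_before
    return math.ceil(total_work / target_util)

# Hypothetical fleet: 100 servers averaging 10% utilization,
# consolidated onto hosts running at 70% utilization.
hosts = hosts_after_consolidation(100, avg_util_before=0.10, target_util=0.70)
print(hosts)
```

Under these assumptions the same work fits on far fewer powered and cooled machines, which is where the savings come from.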

Without virtualization, it’s not entirely certain whether many applications will be able to natively take advantage of 16 cores, considering most currently don’t use more than one or two. In fact, the main applications that can be effectively parallelized are databases, graphically-oriented applications such as Adobe Photoshop, and highly computational scientific applications, where the biggest threat to Microsoft and Intel is Nvidia.

Microsoft’s approach will likely be more effective management of virtualized applications across those cores so that cores can be turned on and off as needed, but Microsoft is certainly not the only company that sees that opportunity. Virtualization currently uses all the cores indiscriminately, but much more intelligence is being added into the virtualization and middleware layer to cut energy consumption.

For ARM, that means even greater engagement with both Intel and AMD, where it will have to push lower power while defending its performance and the competitiveness of Linux. Given ARM’s grassroots type of ecosystem marketing, it remains to be seen whether it can rise above the din of the marketing machines of its new competitors. Lower power consumption is a good story, but in the enterprise so are performance and deep relationships.

The old adage that no one ever got fired for buying IBM can now be applied to Microsoft, Intel, AMD and to a lesser extent VIA. Most IT departments have no history with ARM, except in handheld devices, and IT is one of the most conservative purchasing groups on the planet because the stakes of making a bad decision can be monumental. Breaking into the mobile market takes months. Breaking into the IT world can take years, and sometimes even decades. This isn’t a battle fought on technological merits. It’s like a medieval siege. And while ARM may meet the technology challenge, it remains to be seen whether it can meet the long-term marketing challenge.

In this part of the market, the adage about IBM is still true. IBM’s mainframe sales are up, in part because mainframes are still the most secure and effective virtualization environment. IBM invented virtualization in the 1960s, incidentally. And on its newest machines it’s offering water cooling once again, which can further cut power consumption because it costs less to cool.

Power Bits: Jan. 7

Friday, January 7th, 2011

By Ed Sperling
Microsoft will develop its next version of Windows for Intel, AMD and ARM SoCs. The emphasis is on SoCs, and the focus of SoCs has been on two things: power and the reusability of existing and commercially developed IP.

This is an interesting challenge for Microsoft, as well as for Intel, AMD, and ARM’s slew of partners. A general-purpose OS takes a lot more code to create—and it takes a lot more power to use—than a real-time operating system or an embedded version. The result is greatly reduced battery life and more time with a plug in the wall. Even open-source Linux has the same problem, which is why companies such as Mentor Graphics offer a slimmed down embedded version.

The big question for architects of these SoCs will be one of priorities. What takes precedence? Is it processing power? Is it performance? Or is it segregation of more efficient code for individual cores?

Microsoft’s announcement doesn’t address these kinds of issues. Intel has said next to nothing other than a canned statement from Douglas Davis, VP and GM of the tablet group: “…what is so exciting is how our two companies will be able to match a tailored, low-powered operating system with future generations of our popular Intel Atom processors…”

And comments from ARM, and ARM customers Nvidia, Qualcomm and TI, have been no more enlightening. This isn’t a simple problem to solve while maintaining backward compatibility with bloated applications developed when power efficiency was far less critical than ease of use and connectivity. And it’s not one that anyone is likely to be talking about for at least a year or more. But when they finally do start talking, it will be very interesting to hear how these companies will position Windows and its very large code base.

One On One With South Korea’s CTO

Thursday, August 12th, 2010

By Ed Sperling
Chang-Gyu Hwang, national chief technology officer for South Korea, sat down with Low-Power Engineering to talk about the future trends in technology, global business and power. Prior to his current role, which was created by the Korean government in April, he ran the semiconductor business at Samsung, where he spent the last 20 years in top management positions. He also is the former CTO at Samsung. What follows are excerpts of that interview.

LPE: What do you see happening next in technology?
Hwang: Over the past 20 years we’ve gone from PCs to mobile and the Internet. That will go to the mobile Internet, which will dominate over the next couple years. Then we’re expecting some fusion type of industry. There is a lot of room to improve, technology-wise and business-wise, but to satisfy and raise customer demand there will have to be some fusion industry. Inside IT many technologies will be embedded. Biotechnology will be a future technology. So will nanotechnology.

LPE: How will these technologies be used?
Hwang: In green transportation systems, for example, every country has automobiles, express trains, battery technology and battery charger systems, a smart grid and even nuclear energy. All of these are related. We’re expecting these industries will converge. Korea is relatively late in the industrial revolution, but it is strong in IT, shipbuilding, nuclear power and smart grids. We are looking for synergy in different technologies. I’m spending a lot of time in the United States. It’s a big market, and there are interesting technologies. I’m looking for partnerships from an open innovation point of view.

LPE: There’s a huge emphasis on saving power these days. What will change and why, both from a technology and a business standpoint?
Hwang: Smart grids are about using energy more intelligently and optimizing it, both from the producer and the consumer point of view. Korea is doing both. Korea has a relatively good start. We’re educating engineers. From an industry point of view, we’re well balanced in terms of nuclear and other types.

LPE: You mentioned embedding technology and convergence. Can you elaborate?
Hwang: It started with semiconductors. Many functions are now embedded into a single chip. They’re more cost-effective and use less power. At Samsung we introduced fusion technology where we embedded SRAM logic into memory chips in the OneNAND chip. There was intelligence in the software. We improved speed, reduced the chip size, used less power and boosted overall performance. That chip is dominating all markets because of its many advantages and lower cost. That kind of trend will be more prevalent. But the whole concept of an SoC was introduced because other chips were too expensive. The future will be in the fusion of emerging technology into one chip in the mobile and multimedia area. That will happen in other industries such as energy savings for automobile applications and across other industries.

LPE: Will the boundaries change between who produces what?
Hwang: Korea is relatively strong in hardware. It has a good total solution for making components, with good engineering skills. It knows how to reduce cost while maintaining relatively high performance. In the United States, companies like Google, Microsoft and Apple have different strengths. They’re strong in business and technology. Google is good in search engines, but they need server, low-power and low-latency technologies. Korea is a major supplier of components for Apple. If you apply this to other industries, there are many opportunities for collaboration. The United States has a lot of creative ideas and is very good in software, but in Korea there is more application in new fields. The two working together will drive those industries.

LPE: How about China?
Hwang: China is another variable. They are not a first mover, but they are a very effective follower—especially with low labor costs and low-cost solutions. The United States and Korean collaboration is at a higher level than what China will offer. Korea spends a lot of money on R&D, and it’s strong in core technologies in several industries. At that level we can collaborate with the United States.

LPE: What is your big challenge?
Hwang: My goal is to become the No. 5 technology country by 2020. We also need to be a first mover in some technologies. I have to figure out which industries to target, from the early design and planning stage. These should be as unique as possible, whether it was started in Korea or whether it can be developed using open innovation from other sources.

LPE: Where does the government play vs. companies?
Hwang: When you look at major Korean companies like Samsung, LG or Hyundai, they have their own plan. They are investing in R&D for the next era. But from a national R&D point of view, there’s more risk taking involved. We try to find areas that are not easy to get into, consolidate whatever research is out there, and then create new projects. Eventually companies will get involved, but initially this will be a government effort.

LPE: So this is pure research?
Hwang: Yes. More long-term research and more fundamental. I call it R&BD—research and business development. Otherwise we will not consider it seriously.

LPE: Can companies afford this kind of research on advanced SoCs?
Hwang: Normally a company cannot afford to initiate a project by itself. But at this moment we need to collect all the ideas, not only within a company. It also has to come from research laboratories, institutes and university research. There needs to be consolidation from the initial stage to the planning stage. That’s the reason we are initiating this from a national level. That’s our role.

LPE: How much of this is done through universities?
Hwang: We don’t involve them much, but we are inviting leading institutes and universities to collaborate at the initial stage.

LPE: Is there a lot of opportunity in fixing the Internet?
Hwang: The Internet still needs a lot of new technology. It’s handling massive storage. It needs low power technology. But most consumers are still asking for more speed. Low power and high speed are a contradiction, so we need to bridge these two extremes. There is an opportunity for technology. There’s also an opportunity from a business standpoint for using new technologies.

LPE: Where is the United States lagging?
Hwang: The United States is quite late for biotechnology. But there are opportunities for merging biotechnology with information technology with things like protein chips or chips that can check the status of your heart. There are many applications to find disease in the early stage at a very low cost. Korea is one of the leaders in this kind of hardware, as well as mobile phones and TVs—which are all linked to each other.

Power Optimization Drives Embedded And Multicore Software

Thursday, December 10th, 2009

By John Blyler

Max Domeika, senior software engineer in the Developer Products Division at Intel, sat down with LPE Consulting Editor John Blyler to talk about the growing importance – and intersection – of both the multicore and embedded markets. What follows are excerpts of that conversation.

LPE: Intel’s software focus seems to be following its hardware processor drive into both multicore and embedded markets. What challenges does that bring for traditional software developers?

Max Domeika: Coming from a background in the desktop software application space at Intel I’m now spending more time working in the embedded multicore arena. This year I’ve been particularly focused on power issues, primarily on the Atom processor. My task is to see what sorts of tools are needed to help developers move to both embedded and multicore applications. Already I see a long term need for both power optimization and power measurement tools. The key is to monitor the power-related impacts of your application on the specific and overall system performance.

In the past, desktop clients and server users haven’t had to pay much attention to power. Desktop systems plug into the wall and their software applications use as much power as the processor wants to give them. Over the past several years these processors have incorporated features that help control the amount of power usage in both C (idle) states and P (operational) states.

LPE: How do these processor states affect the development of software applications?

In the past, processors either ran at full speed or idle. Several years ago hardware designers added features to the chips to control how deep a sleep the processors are in. As you know, these features allow different portions of the chip and caches to be turned off. One of the challenges is that while deep sleep saves more power, it often takes more power to wake up.

For the software developer, this means that you don’t want the application to enter the deepest C-state if you will have to wake up immediately. There needs to be some smarts as to how deep a sleep you go into, which is really an operating system issue. P states, or operational states, utilize varying frequency and voltage to balance the amount of the execution that the OS determines is needed. These states directly affect the performance of the system.
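
The trade-off described above, where a deeper sleep saves more power but costs more to wake from, can be sketched as a break-even calculation. This is a minimal illustration and not how any particular operating system chooses idle states. The state names follow the usual C-state labels, but every power, energy and latency figure is a hypothetical placeholder, as are the function names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IdleState:
    name: str
    power_mw: float         # power drawn while resident in the state
    wake_energy_uj: float   # one-off energy cost of entering and leaving
    wake_latency_us: float  # time needed to resume execution

# Hypothetical figures, ordered from shallowest to deepest sleep.
STATES = [
    IdleState("C1", power_mw=500.0, wake_energy_uj=1.0, wake_latency_us=2.0),
    IdleState("C3", power_mw=200.0, wake_energy_uj=60.0, wake_latency_us=50.0),
    IdleState("C6", power_mw=50.0, wake_energy_uj=400.0, wake_latency_us=200.0),
]

def idle_energy_uj(state, idle_us):
    """Total energy for one idle period: residency cost plus wake cost."""
    # mW * us gives nanojoules; divide by 1000 to get microjoules.
    return state.power_mw * idle_us / 1000.0 + state.wake_energy_uj

def pick_state(expected_idle_us, max_latency_us):
    """Pick the cheapest state that can wake within the latency limit."""
    candidates = [
        s for s in STATES
        if s.wake_latency_us <= max_latency_us
        and s.wake_latency_us <= expected_idle_us
    ]
    if not candidates:
        return STATES[0]  # fall back to the shallowest state
    return min(candidates, key=lambda s: idle_energy_uj(s, expected_idle_us))

for idle_us in (100, 1000, 10000):
    print(idle_us, pick_state(idle_us, max_latency_us=1000).name)
```

With these made-up numbers, a short idle period stays in the shallow state because the wake cost of a deeper state would exceed what it saves, which is the point made in the answer above.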

These states can have a big impact on the application, restricting how developers write the code. If your application causes the processor to perform poorly, that will have a negative effect on power utilization. Developers need more mature tools to help figure out which application processes result in effective use of the C-states.

I see a need for continued maturity of the power optimization and power measurement tools and methodologies. These power tools must also be tied to traditional performance analysis and optimization tools, because many of the techniques for mitigating power are the same techniques that you use for traditional performance optimization. The entire system must run as quickly as possible while using as little power as possible.

At the other end of the spectrum are the same issues of power optimization and performance analysis, but applied to a multicore environment. Tools and methodologies need to mature to include multicore development, too.

LPE: Are the two worlds of embedded and multicore coming together? After all, Intel’s Atom isn’t yet a multicore architecture, is it?

Well, some instances of the processor already support hyper-threading, a technology that dates back to the Pentium 4 processor. The key here is hyper-threading, which makes the environment look like two (or more) processors from the point of view of the operating system. That’s why the software techniques that developers use on multicore are starting to have an impact on embedded applications targeting the Atom processor.

LPE: Isn’t the low power push also affecting the high-end embedded and multicore processors like the Xeon?

Power is important across the board. We’ve seen power optimization become important in servers – especially with the “greening” of data centers.

LPE: Let’s talk about the actual tool environment for both power issues and multicore design. What’s happening there?

Today, I’d summarize the tool environment as consisting of a collection of separate tools and techniques. For example, if you want to do power optimization then you might use PowerTOP – an open source tool for doing power measurement. Conversely, if you want to do performance analysis – to count cache misses, branch mispredictions, memory fetches and the like – using performance monitoring counters, you might use another open source tool called OProfile. Intel also has a tool called the VTune Performance Analyzer.

These tools show what performance issues are occurring on the chip, which in turn helps the developers to optimize their code. For example, if you see examples of high cache miss rates, you can investigate to see what portions of the code are causing this problem. This might mean that the data structure of the application needs to be changed to get better cache performance. Performance and power tools give the developer a means of getting valuable feedback from the hardware.
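
To illustrate why an access pattern or data layout can drive up cache misses, here is a small simulation sketch. It does not model any real Intel cache or use any profiler's output. The direct-mapped cache, its 64-byte lines and 64 slots, and the 256 x 256 array of 8-byte elements are all assumptions chosen to make the effect visible.

```python
def count_misses(addresses, line_bytes=64, num_lines=64):
    """Simulate a direct-mapped cache and return the number of misses."""
    tags = [None] * num_lines  # which memory line each cache slot holds
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        slot = line % num_lines
        if tags[slot] != line:
            tags[slot] = line
            misses += 1
    return misses

ROWS, COLS, ELEM_BYTES = 256, 256, 8  # array stored row by row in memory

def row_major_walk():
    """Visit elements in the order they are laid out in memory."""
    return (ELEM_BYTES * (r * COLS + c)
            for r in range(ROWS) for c in range(COLS))

def column_major_walk():
    """Visit the same elements column by column, striding across rows."""
    return (ELEM_BYTES * (r * COLS + c)
            for c in range(COLS) for r in range(ROWS))

row_misses = count_misses(row_major_walk())
col_misses = count_misses(column_major_walk())
print(row_misses, col_misses)
```

Both walks touch exactly the same data, yet in this toy model the strided walk misses on every access while the sequential walk misses once per cache line. A profiler reporting a high miss rate points the developer toward this kind of restructuring.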

LPE: Most desktop application developers are well versed in Microsoft’s Visual Studio IDE. What tools are available for these developers as they move toward multicore applications?

Intel has the Intel Parallel Studio, which integrates well with Microsoft’s Visual Studio for multicore (parallel) code development. Parallel Studio is not targeted at embedded folks, but rather at the desktop client environment. Intel has tools that also help with the programming interface, compiler, libraries and more. With regard to debugging, we have a set of enhancements that integrate into the MS Visual Studio to help with parallel debug.

Debugging is a key issue. While developers could someday spawn 64 threads on a multicore chip – because they have 64 cores – that is not the best way to begin. In multithread implementations it’s best to start with one thread and make sure the program works, then we’ll move up to two and four, etc. Good debugging tools provide easy mechanisms to start debugging one thread then scale to more cores, i.e., they are serially consistent.

Another challenge in multicore development is in the area of configuration control. You may have multiple threads running on multiple cores, but you don’t want multiple versions of the same code. Instead, you want one version of the code with a parameter that you can adjust as the number of processor cores changes. Again, good debugging tools have those configuration control features.
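
The practice described above (one version of the code, a single parameter for the degree of parallelism, and a start from one thread before scaling up) might look like the following sketch. The function name and workload are hypothetical. Python threads are used only to show the structure; they do not speed up CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(values, workers=1):
    """Sum v*v over values, split across `workers` threads.

    The same code path runs for any worker count, so it can be checked
    with workers=1 first and then scaled up.
    """
    # Split the input into at most `workers` contiguous chunks.
    chunk = max(1, (len(values) + workers - 1) // workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: sum(v * v for v in part), parts)
        return sum(partials)

data = list(range(1000))
expected = sum(v * v for v in data)  # plain serial reference result

# Start with one thread, confirm the answer, then move to two and four.
for workers in (1, 2, 4):
    assert sum_of_squares(data, workers) == expected
```

Because the single-thread run is just the `workers=1` case of the same function, a mismatch at two or four workers points at the parallel decomposition instead of at a separate code version.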

LPE: Tools work best when they are following a set methodology. You’re co-chair with David Stewart from CriticalBlue on the Multicore Programming Practices (MPP) Working Group – part of the Multicore Association for which Markus Levy is CEO. Please give the readers a quick update of the MPP.

As you know, our focus is on documenting best-known methods for multicore software development techniques. This year we are in the middle of documenting the best practices. Internal review of this document should begin shortly. We hope to have it ready for external review by the first half of next year.

This document will fulfill a need that is oftentimes overlooked, namely, what are the best practices using the technology that is available today. We have customers that are becoming more and more aware of the challenges of multicore software development. But we are still building awareness and educating the larger group of mainstream programmers. Even when mainstream developers identify the need for a multicore program, they are often stuck with their existing code. Not everyone has the resources, time or need to completely rewrite their legacy code. The MPP document will provide mainstream programmers with a workable set of best practices for multicore development throughout the typical life cycle development process: analysis, design-implementation, debug and the performance tune-up phase.

That’s the big vision. We just have to keep executing. This is all volunteer work, so it’s not something where I can say, ‘Hey, let’s meet this schedule in two weeks. You have to do it.’ Instead, we just have to keep the momentum going. I am pretty pleased with the progress. The feedback from our internal surveys is positive.

LPE: The goal of your best practices working group is to use existing languages like C/C++ to develop multicore applications. Do you see the need to create new programming languages?

The Multicore Association has other working groups that are developing standards for new software approaches, such as those focused on multicore communications and runtime APIs. The overall plan is to incorporate best-known techniques using those APIs as we move forward. But it’s hard to predetermine the best-known methods before the APIs are available. We won’t know until we get there, but we can’t wait for one to proceed before starting the other.

Intel is working on several technologies, both language extensions and new APIs, so yes, there is a need for technology; I’m not so sure about new programming languages. In embedded, C and C++ are going to be with us for some time, so I’d say there’s probably less need in embedded.

Intel’s Challenges In China

Wednesday, April 15th, 2009

By Gongyu Su

Beijing: Intel has never abandoned its attempt to become a dominant player in the embedded systems market, but it faces enormous odds in China.

Intel’s strength is in the CPU world, dating all the way back to the 386 processor. In the Chinese embedded market, the XScale PXA255/PXA270 is quite familiar to many engineers, as well. The CPU giant adopted the ARM core with a view toward trying out the promising handheld terminal market (PDAs and smart phones). The product proved to be quite popular from 2003 to 2006. In fact, XScale was used by the 2006 Intel Cup Undergraduate Electronic Design Contest-Embedded System Design Invitational Contest, which was attended by more than 160 teams.

But market awareness and market penetration are two different things. A heavyweight in one market always has a tough job breaking into other markets. Only a few companies, such as Broadcom and Apple, have figured out a way to do that. Intel has made concerted efforts to break out of its core markets before with cable modems for broadband access, XScale in the handheld market and WiMax for network access anytime and anywhere. So far, its success is still confined to the CPU market.

The reason: Industry heavyweights are almost always their own worst enemy. Despite their name recognition, they still have to compete in unfamiliar markets with new players who are struggling for survival in unfamiliar fields. And they still have to maintain their current business. The so-called Wintel alliance cannot help Intel in other fields, and the enormous bureaucracy within Intel makes it difficult to expand and quickly adapt.

That adaptation is critical to deal with market changes, as well as to deal with local heroes that sprout up during times of change. The world is connected by the Internet, the network is mobilized by 3G, information is secured through Google’s cloud computing, friends are linked by Facebook and LinkedIn, and multimedia applications become part of the iTunes environment. In this new world, the power of the Wintel alliance has been marginalized. Every person will have multiple ARM processors, while one in several families may have a single x86 processor.

In China, the importance of Google, QQ (China’s popular instant messaging platform) and Gmail has far exceeded Office’s penetration into the average person’s life. The Internet, SMS, 3G and LBS (Location-Based Service) are where technology is focused nowadays. This presents a huge challenge for Intel.

The grassroots kingdom of embedded systems

For many years, the desktop office system has been ruled by Intel and Microsoft. By constantly improving CPU speed, Intel entices consumers to spend more money to support the inefficient applications that continue to fill up the Windows system. While driving industry growth, the alliance also leads to increasing system costs. Ten years ago, few people thought a 1GB hard drive necessary, but today any Microsoft application can use up hundreds of megabytes of disk space, whether it’s for an Intel Core Duo or an Atom chip.

In a world dominated by these two giants, handhelds, terminals and network equipment that require high cost efficiency—as well as energy efficiency—have been forced to look outside the Intel world. Grassroots vendors teamed up to form a complete industry alliance ranging from IP (ARM, MIPS, PowerPC, etc.) to microcontrollers (from 4 bit to 64 bit) and operating systems (such as Linux and uC/OS) to compiling and debugging environments to support low-cost, simplified code that requires minimum power. These embedded systems have gained a strong foothold in China.

An embedded system is a special-purpose computer system designed to perform one or several dedicated functions. It ranges from portable terminals to large network equipment. Different from general-purpose computer systems, embedded systems usually execute predefined tasks with special requirements, often with real-time system constraints. As the tasks are specific and designated, designers can optimize the embedded system to reduce size and increase reliability and performance. Embedded systems are usually mass-produced and cost-sensitive; a few cents difference in cost may affect the future of the product, so engineers always choose hardware configurations that exactly meet the required functionality.

At the core of an embedded system are one or more pre-programmed microprocessors or microcontrollers for executing a few tasks. Unlike general-purpose computers that can run software selected by users, embedded systems usually run programs on limited hardware resources. For example, most systems use flash instead of hard drives for storage, so the code is optimized and simplified to the greatest extent, whether there’s an OS or not. The software is usually stored as firmware on one or more ROM or flash chips and remains largely unchanged.

Due to features in hardware design, PDAs and handheld equipment are considered embedded systems, despite better software scalability than other systems. However, the definition is becoming vague, as some systems have adopted general-purpose software platforms such as Windows XP and interfaces such as USB, which traditionally belong to personal computers.

Gongyu Su is the general manager of EEFocus, the Chinese affiliate of Low-Power Design and System-Level Design.

Writing Application Software Directly To The Metal

Friday, March 13th, 2009

By Ed Sperling

How necessary is an operating system?

That question would have been considered superfluous a decade ago, possibly even blasphemous and career-limiting. But it now is beginning to surface in low-power discussions, particularly in compute-intensive applications where performance and power are both critical. General-purpose operating systems constantly call on the processor for updates, while software written straight into the metal using Verilog or SystemC can be written for specific cores.

Highly parallelized applications such as search, particularly in bioinformatics, already are exploring writing applications directly into FPGAs. And heterogeneous cores may give application developers more reason to write to the chip rather than an operating system application programming interface (API).

For application developers, power is as much a balancing act with performance as it is for hardware developers. While classical scaling before 90nm provided both power and performance benefits at each process node, the decision has moved largely to one or the other. For every gain in performance, there has to be a corresponding drop in power somewhere on the chip. Otherwise the clock speed cannot be improved without burning up the chip.

That has prompted software developers to look for different solutions. Even Intel, whose success was built almost entirely on tight integration with operating systems (Windows, Mac OS X and Linux), is looking at utilizing some of the cores in its future chips differently.

“There is broad agreement that we need to be able to represent the ability to do parallelism at the application level and not force everything through the operating system,” said Pat Gelsinger, senior vice president in charge of Intel’s Enterprise Group. “Any time you have a call through the operating system to get a resource—whether it’s a thread or an I/O—your application has gone away for thousands of clock cycles. You want to do that when you need something that only the operating system can give you.”

Typically the operating system acts like a layer of middleware. It makes the connections through its APIs that allow applications like Office to work together so that portions of one application can be dragged and dropped into another. But in highly parallel applications, the interactions are largely within the application rather than with other applications.

“There is an active effort to move some of this parallelism to the application level so the application programmer, given the right tools and libraries, can take advantage of that,” Gelsinger said. “Microsoft has taken steps like that recently with networking and the NPI (network programming interface) layer, moving it into the user space. Use the operating system for what you need it for, but allow parallelism to be more lightweight. Those steps are under way, and they will have great benefit. It started out as the HPC (high-performance computing) community, where they were using tens of thousands of threads.”

IBM is likewise experimenting with a thinner operating system layer for its Power architecture. Brad McCredie, chief architect of the new Power 6 chip and an IBM Fellow, said one of the first examples is hardware accelerators, which are being used to speed up applications.

“We’ve already created an architected layer in the Cell processor,” said McCredie. “It’s not exactly writing software into the metal. We gave the software programmers an architected interface, so we hid some of the messiness of the 100 gigaflop accelerator with a new generalized interface, which is OpenCL. We expect to put in multiple types of accelerators in the future.”

At some point, though, even this approach will run out of steam. McCredie said the debate inside IBM right now is when exactly that point will occur. He believes it will happen at 22nm.

“Eventually we’re going to run out of power on a chip,” he said. “The next way will be to design devices to do fewer and fewer things. That trend will happen. The question is whether we will be able to invent a more specific device that can do 80% of the workloads at less power? If it only does 10%, then no one will write a line of code for it. But if it covers 80%, then it will have much better power/performance.”