4-PEAKS Research

4-PEAKS Research Results and Ongoing Work

Execution time predictability

Hardware adaptation to save energy

Hardware adaptation for temperature control

Real-time scheduling for simultaneous multithreaded (SMT) processors

Energy consumption of SMT vs. CMP

Analysis of media applications on current general-purpose processors

Reconfigurable caches

Architectures for network workloads

Cross-layer adaptation

Execution time predictability

One commonly cited shortcoming of general-purpose processors is that their complex features (e.g., out-of-order issue) result in unpredictable execution times, making them unsuitable for real-time multimedia applications. Our ISCA'01 paper tests this conjecture by examining execution time variability at the frame granularity for several multimedia applications. We find that while there is often some variability, it is mostly caused by the application algorithm and input. In contrast to conventional wisdom, aggressive architectural features induce little additional variability and unpredictability.
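One way to make "variability at the frame granularity" concrete is the coefficient of variation (standard deviation divided by mean) of per-frame execution times. The sketch below is illustrative only: the choice of metric, the function names, and the idea of comparing a simple and an aggressive processor on the same frames are assumptions for the example, not the methodology of the ISCA'01 paper.

```python
from statistics import mean, pstdev


def frame_time_variability(frame_times):
    """Coefficient of variation (stddev / mean) of per-frame execution times."""
    if not frame_times:
        raise ValueError("need at least one frame time")
    avg = mean(frame_times)
    if avg <= 0:
        raise ValueError("frame times must have a positive mean")
    return pstdev(frame_times) / avg


def added_variability(simple_times, aggressive_times):
    """Extra variability seen on the aggressive processor for the same frames.

    A value near zero suggests the variability comes from the application
    and its input rather than from the architectural features.
    """
    return frame_time_variability(aggressive_times) - frame_time_variability(simple_times)
```

Under these assumptions, feeding the same frame sequence through two simulated processors and calling `added_variability` gives a single number to compare against the baseline variability.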

Hardware adaptation to save energy

Two sources of energy inefficiency in modern processors are: (1) computational slack, where the system runs faster than necessary for the application's real-time constraint, and (2) variable resource utility, where resources often stay active and consume energy while contributing little to performance. Recently, researchers have proposed two forms of hardware adaptation to improve energy efficiency: architecture adaptation and dynamic voltage/frequency scaling (DVS). A key to the effective use of these adaptations is the control algorithm that determines when and what to adapt. We have proposed the first (to our knowledge) adaptation control algorithms for multimedia applications that integrate both architecture adaptation and DVS and address both sources of energy inefficiency. Our results show that the proposed algorithms reduce energy consumption in a variety of scenarios, that architecture adaptation is effective with and without DVS, and that addressing both sources of energy inefficiency gives significant gains. Overall, an integrated design works better than using any technique alone.

This work appeared in MICRO'01 (an adaptation control algorithm that addresses computational slack with integrated DVS and architecture adaptation) and in ASPLOS'02 (an algorithm that integrates the MICRO'01 algorithm with algorithms that address energy inefficiency due to variable resource utility).
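To illustrate how computational slack can be traded for energy when architecture adaptation and DVS are considered together, the sketch below searches every combination of architecture configuration and voltage/frequency point and keeps the lowest-energy one that still meets the frame deadline. It is a minimal sketch under stated assumptions, not the MICRO'01 or ASPLOS'02 algorithm: the names `ArchConfig` and `pick_setting`, the per-configuration IPC, and the dynamic-power model P = C_eff · V² · f (leakage ignored) are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ArchConfig:
    name: str
    ipc: float    # assumed instructions per cycle for the upcoming frame
    c_eff: float  # assumed effective switched capacitance (farads)


def pick_setting(instructions, deadline_s, configs, operating_points):
    """Return (energy_j, config, freq_hz, volts) with the lowest energy
    among settings that finish the frame by the deadline, or None."""
    best = None
    for cfg, (freq_hz, volts) in product(configs, operating_points):
        time_s = instructions / (cfg.ipc * freq_hz)
        if time_s > deadline_s:
            continue  # this setting would miss the real-time deadline
        # Dynamic energy only: power (C_eff * V^2 * f) times execution time.
        energy_j = cfg.c_eff * volts ** 2 * freq_hz * time_s
        if best is None or energy_j < best[0]:
            best = (energy_j, cfg, freq_hz, volts)
    return best
```

With made-up numbers, a wide configuration at a low voltage/frequency point can beat both the wide configuration at full speed and a narrow configuration at full speed, which is the kind of interaction an integrated controller can exploit.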

Hardware adaptation for temperature control

Power management to control chip temperature is becoming an increasingly important area of research for general-purpose processors. Dynamic Thermal Management (DTM) techniques have been proposed to save on chip packaging costs. However, these techniques degrade processor performance when invoked. Exploiting the features of multimedia and networking applications, we are developing techniques for temperature control that have a much lower impact on performance than current techniques. More results coming soon…

Real-time scheduling for simultaneous multithreaded (SMT) processors

Scheduling soft real-time applications, such as multimedia applications, on SMT processors introduces new challenges that traditional real-time scheduling algorithms cannot meet. In particular, to run these workloads on SMT processors, we must determine (1) which threads to run simultaneously (the co-schedule) and (2) how to share processor resources among them, given the real-time constraints of the applications. Our RTSS'02 paper was the first to explore soft real-time scheduling on an SMT processor. We examined previous multiprocessor co-scheduling algorithms, including partitioning and global scheduling. We proposed new variations that consider resource sharing and try to utilize SMT more effectively by exploiting application symbiosis. We found that the best algorithm uses global scheduling, exploits symbiosis, prioritizes high-utilization tasks, and uses dynamic resource sharing. This algorithm, however, imposes significant profiling overhead and does not provide admission control. We proposed alternatives that overcome these limitations, but at the cost of schedulability.
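The co-schedule decision described above can be sketched for a two-context SMT processor as follows. This is an illustrative simplification, not the RTSS'02 algorithm: the function name `pick_coschedule`, the task names, and the idea of a profiled pairwise symbiosis score (higher means the two threads run better together) are assumptions for the example, and dynamic resource sharing is not modeled.

```python
def pick_coschedule(ready, utilization, symbiosis):
    """Choose up to two ready tasks to run together on a 2-context SMT.

    ready:       list of task names drawn from a single global ready queue
    utilization: dict mapping task name to its utilization (0..1)
    symbiosis:   dict mapping frozenset({a, b}) to a profiled pairing score
    """
    if not ready:
        return ()
    # Prioritize the highest-utilization task.
    first = max(ready, key=lambda task: utilization[task])
    others = [task for task in ready if task != first]
    if not others:
        return (first,)
    # Pair it with the task it is most symbiotic with (0.0 if unprofiled).
    partner = max(
        others,
        key=lambda task: symbiosis.get(frozenset((first, task)), 0.0),
    )
    return (first, partner)
```

The symbiosis table is where the profiling overhead mentioned above would appear in practice, since every candidate pair needs a measured score.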

Energy consumption of SMT vs. CMP

Results coming soon…

Analysis of media applications on current general-purpose processors

Our ISCA'99 paper was our first step in understanding the performance of multimedia applications on general-purpose processors. The paper finds that several conventional processor techniques that enhance instruction-level parallelism (ILP) and the recent media ISA extensions are generally effective for our media benchmarks. The memory behavior of the benchmarks makes large caches generally ineffective, but software prefetching can often be used to substantially improve memory performance. With software prefetching, our benchmarks become primarily compute-bound rather than memory-bound, motivating a focus on improving computation speed.

Reconfigurable caches

While large caches are generally not effective for media applications, a large number of on-chip transistors will continue to be devoted to caches for other general-purpose applications. Our ISCA'00 paper proposes a new reconfigurable cache organization that allows the cache SRAM arrays to be dynamically divided into partitions that can be used for other processor activities. For media applications, we illustrate the use of such reconfigurable caches for instruction memoization to improve computation speed.
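As a rough software analogy for instruction memoization, the sketch below models a small direct-mapped table, such as one that could live in a repurposed cache partition, which stores the results of an expensive operation keyed by its operands. The class name `MemoTable`, the direct-mapped organization, and the hashing scheme are assumptions made for illustration; the source does not describe the actual hardware design in this detail.

```python
class MemoTable:
    """Direct-mapped memoization table for one two-operand operation."""

    def __init__(self, entries):
        if entries <= 0:
            raise ValueError("table needs at least one entry")
        self.entries = entries
        self.slots = [None] * entries  # each slot holds ((a, b), result)
        self.hits = 0
        self.misses = 0

    def lookup_or_compute(self, op, a, b):
        """Return op(a, b), reusing a stored result when the operands match."""
        index = hash((a, b)) % self.entries
        slot = self.slots[index]
        if slot is not None and slot[0] == (a, b):
            self.hits += 1
            return slot[1]
        self.misses += 1
        result = op(a, b)
        self.slots[index] = ((a, b), result)  # overwrite any conflicting entry
        return result
```

Because the table is keyed on operands only, this sketch assumes one table per operation type. Media kernels that repeatedly apply the same long-latency operation to recurring operand values are the cases where such a table would see hits.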

Architectures for network workloads

Network applications are becoming a major part of computing systems and everyday life. As today's networks get faster, these applications need more processing power. Currently, there are many specialized network processors that have dedicated instruction sets and functional units for packet processing. These architectures are highly diverse, and it is not yet clear which design is best for network applications. We are exploring the behavior of network applications on general-purpose processors and are researching the special needs of these applications in terms of performance, predictability, and power consumption.

Cross-layer adaptation

Our results on cross-layer adaptation are reported on the GRACE web site.