The objective of this project is to develop general-purpose architectures that can meet the performance demands of future complex media applications in an energy-efficient way, while also continuing to work well on other common workloads for desktop, laptop, and handheld systems. This project makes two broad contributions:
(1) Analysis of complete media applications. We analyze several complex media applications and make the case that they require efficient support for multiple types of parallelism, including instruction-level and thread-level parallelism (ILP and TLP) and multiple forms of data-level parallelism (DLP) such as sub-word SIMD, vectors, streams, and vectors/streams of SIMD.
(2) ALP: Exploiting ILP, TLP, and DLP with an evolutionary programming model and hardware. Our second broad contribution is a complete architecture, called ALP, that supports all of the types of parallelism described above in an energy-efficient way, using an evolutionary programming model and hardware.
The most novel part of ALP is a new technique called SIMD vectors and streams, which supports larger amounts of DLP than is possible with sub-word SIMD (SIMD for short). The programming model for SIMD vectors/streams lies between SIMD and conventional vectors. SIMD vectors exploit the regular data access patterns that are the hallmark of DLP by providing support for conventional vector memory instructions. They differ from a conventional vector implementation in that computation on SIMD vector data is performed by conventional SIMD instructions.
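The split between vector memory access and SIMD computation can be illustrated with a small software emulation. This is only a conceptual sketch, not ALP's instruction set: the function names (`vector_load`, `simd_add`, `simd_vector_add`), the 64-bit packed word with eight 8-bit lanes, and the strided addressing are all assumptions chosen for illustration.

```python
# Conceptual emulation only; every name and size here is hypothetical,
# not taken from ALP's actual ISA.
LANES = 8            # assumed: eight sub-words per packed word
LANE_BITS = 8        # assumed: 8-bit sub-words (64-bit packed word)
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack LANES small integers into one word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a packed word back into its LANES sub-words."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add(a, b):
    """Sub-word SIMD add: each lane wraps modulo 2**LANE_BITS independently."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


def vector_load(memory, base, stride, length):
    """Vector memory access: one request fetches `length` packed words."""
    return [memory[base + i * stride] for i in range(length)]


def simd_vector_add(va, vb):
    """Compute on vector data with ordinary SIMD adds, one packed word at a time."""
    return [simd_add(a, b) for a, b in zip(va, vb)]


# Memory holds 16 packed words; word i has every lane equal to i.
memory = [pack([i] * LANES) for i in range(16)]
va = vector_load(memory, base=0, stride=2, length=4)  # words 0, 2, 4, 6
vb = vector_load(memory, base=1, stride=2, length=4)  # words 1, 3, 5, 7
result = simd_vector_add(va, vb)
```

In this sketch the regular access pattern is captured once by `vector_load`, while the arithmetic reuses the same `simd_add` that plain sub-word SIMD code would use, which mirrors the "between SIMD and conventional vectors" position described above.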
Our evaluations show that our design decisions in ALP are effective. Relative to a single-thread superscalar without SIMD, for our application suite, ALP achieves aggregate speedups from 5X to 56X, energy reductions from 1.7X to 17.2X, and energy-delay product (EDP) reductions from 8.4X to 970X. These results include the combined benefits of a 4-way chip multiprocessor (CMP), 2-way simultaneous multithreading (SMT), SIMD, and SIMD vectors/streams.
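Because EDP is the product of energy and delay, an EDP reduction factor is the product of the speedup and the energy reduction factor. The short check below applies that relation to the endpoints of the reported ranges. It assumes that the low endpoints belong to one application and the high endpoints to another, which the summary does not state, so the comparison is indicative only; the helper name `edp_reduction` is hypothetical.

```python
def edp_reduction(speedup, energy_reduction):
    """EDP = energy * delay, so the two reduction factors multiply."""
    return speedup * energy_reduction


# Endpoints of the reported ranges (assumed to pair up as low/low and high/high).
low = edp_reduction(5, 1.7)      # roughly 8.5, against a reported 8.4X
high = edp_reduction(56, 17.2)   # roughly 963, against a reported 970X
```

The small gaps between these products and the reported EDP figures are consistent with the speedup and energy factors having been rounded for presentation, though other explanations are possible.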
This work is in collaboration with Yen-Kuang Chen and Eric Debes at Intel.