Experiences Implementing Tinuso in gem5

  • Maxwell Walter, Pascal Schleuniger, Andreas Erik Hindborg, Carl Christian Kjaergaard, Nicklas Bo Jensen, and Sven Karlsson. Experiences Implementing Tinuso in gem5. Second gem5 User Workshop, 2015.

    In recent years, high-performance computing systems have started to make use of FPGA-based hardware accelerators to improve performance and power properties. While FPGAs are becoming more competitive in terms of speed, power efficiency, and logic capacity, the effort required to program these heterogeneous systems has limited their impact. To address these issues, our lab has developed Tinuso, a lightweight processor architecture designed and optimized for implementation on FPGAs. Tinuso is designed to be multi-core and configurable for specific applications. To identify an ideal multi-core configuration for an application, we need a powerful simulation environment that can efficiently explore the design space. We use gem5 as our simulation platform, and have added support for our processor architecture. For Tinuso, we follow a hardware/software co-design approach in order to keep the hardware resource usage low. This results in processor cores that deliver significantly higher performance while requiring fewer hardware resources than commercial processor implementations. However, Tinuso is in many ways different from other processor architectures. To maintain high operating frequencies, Tinuso includes a large number of delay slots: four for standard branch instructions. Tinuso also includes delay slots on other instructions, where the result is not available for a number of cycles after the instruction has been executed. To complicate matters, certain instruction patterns have a different number of delay slots. For example, the compare instruction normally has two delay slots, but only one when followed by a branch. We have modified the gem5 in-order CPU model to support a dynamic commit delay. This allows us to fetch instructions in the proper order, which maintains cache behavior, but still simulate the behavior of Tinuso's delay slots. Along the way, we have found gem5 to be a very flexible platform that we have been able to use for our design space exploration and compiler verification.
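
    The delay-slot rules in the abstract can be illustrated with a small toy model. The sketch below is not the authors' gem5 modification; it is a minimal Python illustration, assuming that "followed by a branch" means the very next instruction is a branch, and that instructions other than branch and compare have no delay slots (the abstract does not give their counts). The names `Instr`, `commit_delay`, and `commit_schedule` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Delay-slot counts stated in the abstract.
BRANCH_DELAY_SLOTS = 4
COMPARE_DELAY_SLOTS = 2
COMPARE_BEFORE_BRANCH_DELAY_SLOTS = 1


@dataclass
class Instr:
    # Hypothetical opcode class: "branch", "compare", or "other".
    op: str


def commit_delay(instr: Instr, next_instr: Optional[Instr]) -> int:
    """Return the number of delay slots for instr, given its successor."""
    if instr.op == "branch":
        return BRANCH_DELAY_SLOTS
    if instr.op == "compare":
        # Assumption: "followed by a branch" means the immediately
        # following instruction is a branch.
        if next_instr is not None and next_instr.op == "branch":
            return COMPARE_BEFORE_BRANCH_DELAY_SLOTS
        return COMPARE_DELAY_SLOTS
    # Assumption: all other instructions are modeled with no delay slots.
    return 0


def commit_schedule(program: List[Instr]) -> List[int]:
    """For each instruction, return the index of the first later
    instruction that would observe its result."""
    schedule = []
    for i, instr in enumerate(program):
        nxt = program[i + 1] if i + 1 < len(program) else None
        # Instructions i+1 .. i+delay sit in the delay slots.
        schedule.append(i + 1 + commit_delay(instr, nxt))
    return schedule
```

    Instructions are still visited in program order, as in the fetch order described above; only the point at which each result becomes visible moves. For the sequence compare, branch, other, compare, other, `commit_schedule` returns `[2, 6, 3, 6, 5]`.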

    @Misc{2015-06-MAXWELL,
    title = {{Experiences Implementing Tinuso in gem5}},
    author = {Maxwell Walter and Pascal Schleuniger and Andreas Erik Hindborg and Carl Christian Kjaergaard and Nicklas Bo Jensen and Sven Karlsson},
    howpublished = {Second gem5 User Workshop},
    address = {Portland, Oregon, USA},
    date = {2015-06-14},
    year = {2015},
    handouts = {http://www.m5sim.org/wiki/images/f/f5/2015_ws_16_gem5-workshop_mwalter.pptx},
    abstract = {In recent years, high-performance computing systems have started to make use of FPGA-based hardware accelerators to improve performance and power properties. While FPGAs are becoming more competitive in terms of speed, power efficiency, and logic capacity, the effort required to program these heterogeneous systems has limited their impact. To address these issues, our lab has developed Tinuso, a lightweight processor architecture designed and optimized for implementation on FPGAs. Tinuso is designed to be multi-core and configurable for specific applications. To identify an ideal multi-core configuration for an application, we need a powerful simulation environment that can efficiently explore the design space. We use gem5 as our simulation platform, and have added support for our processor architecture.
    For Tinuso, we follow a hardware/software co-design approach in order to keep the hardware resource usage low. This results in processor cores that deliver significantly higher performance while requiring fewer hardware resources than commercial processor implementations. However, Tinuso is in many ways different from other processor architectures. To maintain high operating frequencies, Tinuso includes a large number of delay slots: four for standard branch instructions.
    Tinuso also includes delay slots on other instructions, where the result is not available for a number of cycles after the instruction has been executed. To complicate matters, certain instruction patterns have a different number of delay slots. For example, the compare instruction normally has two delay slots, but only one when followed by a branch. We have modified the gem5 in-order CPU model to support a dynamic commit delay. This allows us to fetch instructions in the proper order, which maintains cache behavior, but still simulate the behavior of Tinuso's delay slots. Along the way, we have found gem5 to be a very flexible platform that we have been able to use for our design space exploration and compiler verification.}
    }
