Project Milestone for CS740: Computer Architecture

Fast Block Operation in DRAM

Group Members: Ningning Hu (hnn@cs.cmu.edu)
Jichuan Chang (cjc@cs.cmu.edu)
Project URL: http://www.cs.cmu.edu/~hnn/cs740-project.html

Major Changes: The main change concerns our benchmarks. We had expected to use popular benchmarks such as Spec95 to evaluate our work, but most of the benchmarks we could obtain proved unsuitable (see the "Surprise" section for the reasons). We now have to write our own benchmarks and base our analysis on them.

What We Have Accomplished So Far:
1. We have designed the DRAM hardware structure needed to support the block operation. We consider the following three modes of block copy:
    (i) Aligned row copy: copy one row to another row, assuming that the source and destination addresses have the same row offset;
    (ii) Unaligned row copy: this mode relaxes the strict requirement of (i) and lets the system perform a block copy as long as the block size is big enough;
    (iii) Subrow copy: when the row size is too large, we find that only a few block operations qualify, so this mode reduces the unit of operation so that block copy can be used more often. Supporting this mode requires more complicated hardware, which makes us doubt whether it will improve performance greatly; our evaluation will test this.
2. We have defined and implemented the instructions in SimpleScalar to support the above three modes of block operation.
3. For each of the above three modes, we have modified the library functions whose performance could be improved, mainly memcpy() and bcopy(), and updated the SimpleScalar library accordingly.
4. Building benchmarks. Although there are few existing benchmarks we can use to analyze our work, we did find two (they are not very useful, but better than nothing) and successfully built them on SimpleScalar.
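
To make the three copy modes in item 1 concrete, here is a minimal sketch in C of how a copy request might be sorted into them. It is an illustration only, not the project's implementation: the row and subrow sizes, the names classify_copy and full_units, and the exact thresholds (in particular what counts as "big enough" for mode (ii), and the alignment requirement for mode (iii)) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; the report does not give the real ones. */
#define ROW_SIZE    2048u  /* assumed DRAM row size in bytes */
#define SUBROW_SIZE 256u   /* assumed subrow size in bytes */

typedef enum {
    COPY_SOFTWARE,       /* no block operation applies; use an ordinary copy */
    COPY_ALIGNED_ROW,    /* mode (i) */
    COPY_UNALIGNED_ROW,  /* mode (ii) */
    COPY_SUBROW          /* mode (iii) */
} copy_mode;

/* Number of whole, unit-aligned chunks of `unit` bytes in [addr, addr + len). */
static size_t full_units(uintptr_t addr, size_t len, size_t unit)
{
    assert(unit != 0);
    /* Bytes between addr and the next unit boundary (0 if already aligned). */
    size_t head = (unit - (size_t)(addr % unit)) % unit;
    if (len < head)
        return 0;
    return (len - head) / unit;
}

/* Assumed policy: prefer row copies, then subrow copies, then software. */
copy_mode classify_copy(uintptr_t dst, uintptr_t src, size_t len)
{
    if (full_units(src, len, ROW_SIZE) > 0) {
        /* The source range covers at least one whole row. */
        if (src % ROW_SIZE == dst % ROW_SIZE)
            return COPY_ALIGNED_ROW;   /* same row offset on both sides */
        return COPY_UNALIGNED_ROW;     /* large block, different offsets */
    }
    /* Assumption: subrow copy needs matching subrow offsets. */
    if (full_units(src, len, SUBROW_SIZE) > 0 &&
        src % SUBROW_SIZE == dst % SUBROW_SIZE)
        return COPY_SUBROW;
    return COPY_SOFTWARE;
}
```

Under these assumptions, a modified memcpy() would call classify_copy() first and fall back to the ordinary byte copy for the head and tail bytes that lie outside whole rows or subrows.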

Meeting Our Milestone: Our milestone was that, by now, we should have finished implementing the new memory instructions on the simulator and be in the process of evaluation. Since we have finished modifying SimpleScalar and are currently working on our benchmarks, we have met our milestone.

Surprise:
1. Because documentation on SimpleScalar is limited, we mistakenly spent too much time hacking the source code of the SimpleScalar Glib. A different approach later succeeded and showed that this work was entirely unnecessary.
2. Most of the popular benchmarks proved unsuitable for our testing. The reasons include:
(i) The benchmarks contain too few block copy operations, and where some exist, the block sizes are generally too small for our block operation instructions, especially for the two row copy modes. Most of the benchmarks we could get from Spec95, as well as the networking benchmarks, belong to this category.
(ii) The benchmarks depend on system calls or libraries, for example math libraries and the X Window library, that SimpleScalar does not support. SPLASH and some of Spec95 belong to this category.
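
One way to check reason (i) for a candidate benchmark is to count how many of its copies are large enough to use a row copy. The following C sketch shows the idea; it is hypothetical and not the instrumentation we used. The wrapper name counted_memcpy, the copy_stats structure, and the 2048-byte row size are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define ROW_SIZE 2048u  /* assumed DRAM row size in bytes (hypothetical) */

/* Running totals over all copies made through counted_memcpy(). */
typedef struct {
    unsigned long calls;            /* total number of copies */
    unsigned long row_sized_calls;  /* copies of at least ROW_SIZE bytes */
    unsigned long long bytes;       /* total bytes copied */
} copy_stats;

static copy_stats g_stats;

/* Drop-in replacement for memcpy() that records the size of each copy. */
void *counted_memcpy(void *dst, const void *src, size_t len)
{
    g_stats.calls++;
    g_stats.bytes += len;
    if (len >= ROW_SIZE)
        g_stats.row_sized_calls++;
    return memcpy(dst, src, len);
}

/* Fraction of copies large enough to be row-copy candidates (0 if none made). */
double row_sized_fraction(void)
{
    if (g_stats.calls == 0)
        return 0.0;
    return (double)g_stats.row_sized_calls / (double)g_stats.calls;
}
```

A benchmark for which row_sized_fraction() stays close to zero would fall into category (i) under these assumptions.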

Revised Schedule: We should be able to achieve our modified goal by following the schedule listed in our project proposal.

Resource Needed: What we currently need are good benchmarks that SimpleScalar can support. Since time is limited and all our efforts to find them have failed, we are for now using our own benchmarks for the analysis.

Last Modified: Nov. 19, 2000