JDEP 284H Foundations of Computer Systems

# **The Memory Hierarchy**

Dr. Steve Goddard goddard@cse.unl.edu

http://cse.unl.edu/~goddard/Courses/JDEP284

# Giving credit where credit is due

- Most of the slides for this lecture are based on slides created by Drs. Bryant and O'Hallaron, Carnegie Mellon University.
- I have modified them and added new slides.

**Topics** 

- Storage technologies and trends
- Locality of reference
- Caching in the memory hierarchy

# Random-Access Memory (RAM)

## Key features

- RAM is packaged as a chip.
- Basic storage unit is a cell (one bit per cell).
- Multiple RAM chips form a memory.

## Static RAM (SRAM)

- Each cell stores bit with a six-transistor circuit.
- Retains value indefinitely, as long as it is kept powered.
- Relatively insensitive to disturbances such as electrical noise.
- Faster and more expensive than DRAM.

## Dynamic RAM (DRAM)

- Each cell stores bit with a capacitor and transistor.
- Value must be refreshed every 10-100 ms.
- Sensitive to disturbances.
- Slower and cheaper than SRAM.

## SRAM vs DRAM Summary

|      | Transistors per bit | Access time | Persist? | Sensitive? | Cost | Applications                 |
|------|---------------------|-------------|----------|------------|------|------------------------------|
| SRAM | 6                   | 1X          | Yes      | No         | 100X | Cache memories               |
| DRAM | 1                   | 10X         | No       | Yes        | 1X   | Main memories, frame buffers |











All enhanced DRAMs are built around the conventional DRAM core.

- Fast page mode DRAM (FPM DRAM)
  - Access contents of row with [RAS, CAS, CAS, CAS, CAS] instead of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)].
- Extended data out DRAM (EDO DRAM)
  - Enhanced FPM DRAM with more closely spaced CAS signals.
- Synchronous DRAM (SDRAM)
  - Driven with rising clock edge instead of asynchronous control signals.
- Double data-rate synchronous DRAM (DDR SDRAM)
  - Enhancement of SDRAM that uses both clock edges as control signals.
- Video RAM (VRAM)
  - Like FPM DRAM, but output is produced by shifting row buffer.
  - Dual ported (allows concurrent reads and writes).



















## **Disk Geometry**

Disks consist of platters, each with two surfaces. Each surface consists of concentric rings called tracks. Each track consists of sectors separated by gaps.





## **Disk Capacity**

Capacity: maximum number of bits that can be stored.

- Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.

Capacity is determined by these technology factors:

- Recording density (bits/in): number of bits that can be squeezed into a 1 inch segment of a track.
- Track density (tracks/in): number of tracks that can be squeezed into a 1 inch radial segment.

- Areal density (bits/in²): product of recording and track density.

Modern disks partition tracks into disjoint subsets called recording zones.

- Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
- Each zone has a different number of sectors/track.

Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)

Example:

- 512 bytes/sector
- 300 sectors/track (on average)
- 20,000 tracks/surface
- 2 surfaces/platter
- 5 platters/disk

Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
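The capacity product can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the function name `disk_capacity_bytes` and its parameter names are made up for illustration.

```python
def disk_capacity_bytes(bytes_per_sector, avg_sectors_per_track,
                        tracks_per_surface, surfaces_per_platter,
                        platters_per_disk):
    """Capacity in bytes as the product of the five geometry factors."""
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters_per_disk)


# Values from the example: 512 x 300 x 20,000 x 2 x 5
capacity = disk_capacity_bytes(512, 300, 20_000, 2, 5)

# Vendor gigabytes: 1 GB = 10^9 bytes (not 2^30)
capacity_gb = capacity / 10**9

print(f"{capacity:,} bytes = {capacity_gb:.2f} GB")
# prints: 30,720,000,000 bytes = 30.72 GB
```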





## **Disk Access Time**

Average time to access some target sector is approximated by:

- Taccess = Tavg seek + Tavg rotation + Tavg transfer

#### Seek time (Tavg seek)

- Time to position heads over cylinder containing target sector.
- Typical Tavg seek = 9 ms

#### Rotational latency (Tavg rotation)

- Time waiting for first bit of target sector to pass under r/w head.
- Tavg rotation = 1/2 x 1/RPM x 60 secs/1 min

#### Transfer time (Tavg transfer)

- Time to read the bits in the target sector.
- Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 secs/1 min.

## **Disk Access Time Example**

#### Given:

- Rotational rate = 7,200 RPM.
- Average seek time = 9 ms.
- Avg # sectors/track = 400.

#### Derived:

- Tavg rotation = 1/2 x (60 secs/7,200 RPM) x 1000 ms/sec ≈ 4 ms.
- Tavg transfer = (60 secs/7,200 RPM) x 1/(400 sectors/track) x 1000 ms/sec ≈ 0.02 ms.
- Taccess = 9 ms + 4 ms + 0.02 ms = 13.02 ms.
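The derivation can be reproduced without the intermediate rounding. The sketch below assumes only the three formulas given for seek, rotation, and transfer time; the function names are illustrative, not from the slides.

```python
def avg_rotation_ms(rpm):
    """Average rotational latency: half of one full revolution, in ms."""
    return 0.5 * (60.0 / rpm) * 1000.0


def avg_transfer_ms(rpm, avg_sectors_per_track):
    """Time for one sector to pass under the head, in ms."""
    return (60.0 / rpm) * (1.0 / avg_sectors_per_track) * 1000.0


def access_time_ms(seek_ms, rpm, avg_sectors_per_track):
    """Taccess = Tavg seek + Tavg rotation + Tavg transfer."""
    return (seek_ms
            + avg_rotation_ms(rpm)
            + avg_transfer_ms(rpm, avg_sectors_per_track))


t_rot = avg_rotation_ms(7200)            # about 4.17 ms
t_xfer = avg_transfer_ms(7200, 400)      # about 0.02 ms
t_access = access_time_ms(9.0, 7200, 400)

print(f"rotation {t_rot:.2f} ms, transfer {t_xfer:.2f} ms, "
      f"access {t_access:.2f} ms")
```

Without rounding, the total comes to roughly 13.19 ms rather than 13.02 ms; either way, seek time and rotational latency account for nearly all of it.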

### Important points:

- Access time dominated by seek time and rotational latency.
- First bit in a sector is the most expensive; the rest are free.
- SRAM access time is about 4 ns/doubleword, DRAM about 60 ns/doubleword.
  - Reading a 512-byte sector from disk is about 40,000 times slower than reading the same data from SRAM,
  - and about 2,500 times slower than from DRAM.

## **Logical Disk Blocks**

Modern disks present a simpler abstract view of the complex sector geometry:

- The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...).

Mapping between logical blocks and actual (physical) sectors:

- Maintained by a hardware/firmware device called the disk controller.
- Converts requests for logical blocks into (surface, track, sector) triples.

Allows controller to set aside spare cylinders for each zone.

Accounts for the difference in "formatted capacity" and "maximum capacity".
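To make the logical-block idea concrete, here is a deliberately simplified sketch of the translation a controller performs. It assumes a hypothetical disk with a single zone (every track has the same number of sectors), no spare cylinders, and blocks numbered cylinder by cylinder. Real controllers handle multiple zones and spares, and their actual mapping is firmware-specific; the name `block_to_triple` is invented for this example.

```python
def block_to_triple(block, num_surfaces, sectors_per_track):
    """Map a logical block number to a (surface, track, sector) triple.

    Assumption: one zone, no spares, blocks laid out cylinder by
    cylinder (all surfaces of track 0, then all surfaces of track 1, ...).
    """
    blocks_per_cylinder = num_surfaces * sectors_per_track
    track, rest = divmod(block, blocks_per_cylinder)
    surface, sector = divmod(rest, sectors_per_track)
    return (surface, track, sector)


# Hypothetical geometry: 5 platters x 2 surfaces, 300 sectors per track.
for b in (0, 299, 300, 3000, 3301):
    print(b, "->", block_to_triple(b, num_surfaces=10, sectors_per_track=300))
```

With this layout, consecutive logical blocks stay on the same cylinder as long as possible, so a sequential read needs few seeks.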









| Device | Metric            | 1980   | 1985  | 1990 | 1995  | 2000  | 2000:1980 |
|--------|-------------------|--------|-------|------|-------|-------|-----------|
| SRAM   | \$/MB             | 19,200 | 2,900 | 320  | 256   | 100   | 190       |
| SRAM   | access (ns)       | 300    | 150   | 35   | 15    | 2     | 100       |
| DRAM   | \$/MB             | 8,000  | 880   | 100  | 30    | 1     | 8,000     |
| DRAM   | access (ns)       | 375    | 200   | 100  | 70    | 60    | 6         |
| DRAM   | typical size (MB) | 0.064  | 0.256 | 4    | 16    | 64    | 1,000     |
| Disk   | \$/MB             | 500    | 100   | 8    | 0.30  | 0.05  | 10,000    |
| Disk   | access (ms)       | 87     | 75    | 28   | 10    | 8     | 11        |
| Disk   | typical size (MB) | 1      | 10    | 160  | 1,000 | 9,000 | 9,000     |
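The last column is the improvement factor between 1980 and 2000. As a small sketch, the DRAM and disk factors can be recomputed from the 1980 and 2000 entries; the dictionary layout and the helper name `improvement` are illustrative choices only.

```python
# (1980 value, 2000 value) pairs taken from the DRAM and disk rows.
trends = {
    "DRAM": {"cost_per_mb": (8000, 1),
             "access_time": (375, 60),      # ns
             "size_mb": (0.064, 64)},
    "Disk": {"cost_per_mb": (500, 0.05),
             "access_time": (87, 8),        # ms
             "size_mb": (1, 9000)},
}


def improvement(metric, v1980, v2000):
    """Size improves by growing; cost and access time improve by shrinking."""
    return v2000 / v1980 if metric.startswith("size") else v1980 / v2000


ratios = {
    device: {m: improvement(m, *pair) for m, pair in metrics.items()}
    for device, metrics in trends.items()
}

for device, metrics in ratios.items():
    for metric, r in metrics.items():
        print(f"{device:4s} {metric:12s} improved about {r:,.0f}x")
```

Cost and capacity improved by factors in the thousands, while access time improved by roughly 6x for DRAM and 11x for disk.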













## **Memory Hierarchies**

Some fundamental and enduring properties of hardware and software:

- Fast storage technologies cost more per byte and have less capacity.
- The gap between CPU and main memory speed is widening.
- Well-written programs tend to exhibit good locality.

These fundamental properties complement each other beautifully.

They suggest an approach for organizing memory and storage systems known as a memory hierarchy.



## Caches

Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy:

- For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work?

- Programs tend to access the data at level k more often than they access the data at level k+1.
- Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
- Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
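The level k / level k+1 relationship can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real hardware: the class name `BlockCache`, the block size, the number of blocks, and the least-recently-used eviction policy are all assumptions chosen for the example.

```python
from collections import OrderedDict


class BlockCache:
    """Toy level-k cache holding a few fixed-size blocks of a level k+1 store."""

    def __init__(self, backing, block_size, num_blocks):
        self.backing = backing          # the larger, slower level k+1
        self.block_size = block_size
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()     # block number -> data, in LRU order
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        block_no, offset = divmod(addr, self.block_size)
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)     # mark as most recently used
        else:
            self.misses += 1
            if len(self.blocks) >= self.num_blocks:
                self.blocks.popitem(last=False)   # evict least recently used
            start = block_no * self.block_size
            # Fetch the whole block from level k+1 (assumed eviction: LRU).
            self.blocks[block_no] = self.backing[start:start + self.block_size]
        return self.blocks[block_no][offset]


memory = list(range(1024))              # stand-in for level k+1

# Sequential scan: one miss per block, then hits for the rest of the block.
sequential = BlockCache(memory, block_size=8, num_blocks=4)
total = sum(sequential.read(a) for a in range(1024))

# Stride equal to the block size: every access touches a new block.
strided = BlockCache(memory, block_size=8, num_blocks=4)
for a in range(0, 1024, 8):
    strided.read(a)

print("sequential:", sequential.hits, "hits,", sequential.misses, "misses")
print("strided:   ", strided.hits, "hits,", strided.misses, "misses")
```

The sequential scan hits in the cache on 7 of every 8 reads, while the strided scan misses every time. The cache only delivers the speed of the fast level when the access pattern keeps reusing data that is already there.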







| Cache Type           | What Cached          | Where Cached        | Latency (cycles) | Managed By       |
|----------------------|----------------------|---------------------|------------------|------------------|
| Registers            | 4-byte word          | CPU registers       | 0                | Compiler         |
| TLB                  | Address translations | On-Chip TLB         | 0                | Hardware         |
| L1 cache             | 32-byte block        | On-Chip L1          | 1                | Hardware         |
| L2 cache             | 32-byte block        | Off-Chip L2         | 10               | Hardware         |
| Virtual Memory       | 4-KB page            | Main memory         | 100              | Hardware+OS      |
| Buffer cache         | Parts of files       | Main memory         | 100              | OS               |
| Network buffer cache | Parts of files       | Local disk          | 10,000,000       | AFS/NFS client   |
| Browser cache        | Web pages            | Local disk          | 10,000,000       | Web browser      |
| Web cache            | Web pages            | Remote server disks | 1,000,000,000    | Web proxy server |