Topics Covered

Chapters Covered:
  • 5, 7, 8
  • Performance, Memory, Caches, Memory Management
    • All other knowledge implied: Machine Design, Digital/Sequential design too
Topics:

Processor design
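        1 bus vs 2 bus vs 3 bus designs
        - 1-bus: all register and ALU transfers share one internal bus, so only one value moves per step
        - 2-bus: typically one bus carries values out of the registers and the other carries results back in
        - 3-bus: two buses carry the source operands and a third carries the result, so an ALU operation can finish in one step. Fast, but complex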
        Tri-state devices, what they are and why they are necessary
        - A tri-state output can be 0, 1, or high impedance (effectively disconnected from the bus), so a register's held value stays off the bus until its output is enabled
        - Why? Prevents conflicts when multiple devices share the same bus.
        What the control signals do
        - Tell components what to do. PCin, PCout, Read/Write, Load, etc.
        Why control signals need to be asserted carefully
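        - Incorrect or badly timed assertions lead to incorrect execution, e.g. asserting PCout and another register's out signal in the same step puts two drivers on the bus at once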
        Why you cannot connect outputs together
        - The same reason you need tri-state: two outputs on one wire would fight over its value.
        - If two devices drive different values, the result is a direct path from power to ground (a short circuit) and an undefined logic level.
        - Solution? Tri-state buffers.
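The bus-sharing rule above can be sketched in a few lines of Python. This is only an illustrative model, not real hardware behavior: the `TriStateDriver` class, the `read_bus` helper, and the register values are all made up for the example, and `None` stands in for the high-impedance state.

```python
from typing import List, Optional


class TriStateDriver:
    """A register output gated by a tri-state buffer (hypothetical model)."""

    def __init__(self, name: str, value: int):
        self.name = name
        self.value = value
        self.enabled = False  # False = high impedance (disconnected)

    def output(self) -> Optional[int]:
        # None stands in for the high-impedance state.
        return self.value if self.enabled else None


def read_bus(drivers: List[TriStateDriver]) -> Optional[int]:
    """Return the bus value, or raise if more than one driver is enabled."""
    active = [d for d in drivers if d.output() is not None]
    if len(active) > 1:
        names = ", ".join(d.name for d in active)
        raise RuntimeError(f"bus conflict: {names} driving at once")
    return active[0].output() if active else None


pc = TriStateDriver("PC", 0x100)
r1 = TriStateDriver("R1", 0x2A)

pc.enabled = True               # like asserting PCout only
bus_value = read_bus([pc, r1])  # only PC drives the bus
```

Enabling `r1` as well and calling `read_bus` again raises the conflict error, which is the software analogue of two outputs being wired together.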
        How to evaluate a diagram of a machine 
        - Look for how buses connect to ALU, memory, registers
        - Look at control signals, and how they affect data flow
        How to make a machine perform functions like Fetch, Execute, etc.
        - How to actually do this?
        - Cycle:
        - Fetch
        - Decode
        - Execute
        - Store
        - Repeat
        How CPU control logic can be represented as a State Transition Diagram
        - Each step of the FDE cycle becomes a state. Every instruction passes through the same fetch and decode states, branches to an execute state chosen by the opcode, then returns to the fetch state to repeat.
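A state transition diagram like this can be written down as a lookup table. The sketch below is a simplified illustration: the state names, the three opcodes, and the `run_control` helper are assumptions for the example, and a real control unit would have several steps per state.

```python
# Each state maps to a function that picks the next state.
# State names and opcodes are illustrative, not from any specific machine.
TRANSITIONS = {
    "FETCH": lambda opcode: "DECODE",
    "DECODE": lambda opcode: f"EXEC_{opcode}",  # branch out on the opcode
    "EXEC_ADD": lambda opcode: "FETCH",
    "EXEC_LOAD": lambda opcode: "FETCH",
    "EXEC_BRANCH": lambda opcode: "FETCH",
}


def run_control(opcodes):
    """Walk the state machine once per instruction and record the states."""
    trace = []
    state = "FETCH"
    for opcode in opcodes:
        while True:
            trace.append(state)
            state = TRANSITIONS[state](opcode)
            if state == "FETCH":  # instruction finished, back to fetch
                break
    return trace


trace = run_control(["ADD", "LOAD"])
# trace == ["FETCH", "DECODE", "EXEC_ADD", "FETCH", "DECODE", "EXEC_LOAD"]
```

Fetch and decode are shared by every instruction; only the execute state differs, which matches the shape of the diagram.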

Processor performance
        Defining performance
                1. Throughput - Total instructions executed per second
                2. Latency - Time taken for one instruction
        Measuring performance
                What a benchmark program is, what it is used for, etc.
                - A program (or suite of programs) that runs a representative workload so CPU performance can be measured and compared
        Reporting performance
                MIPS vs FLOPS vs Elapsed time
                - MIPS (Millions of Instructions Per Second): Can be misleading, because different instructions do different amounts of work and instruction sets differ between machines.
                - FLOPS (Floating Point Operations Per Second): More representative for floating-point-heavy work. Especially valuable for scientific computing.
                - Elapsed Time: Real-world performance, the actual wall-clock time a program takes
        Improving performance
                CPU time equation
                - CPU_time = CPI x Instructions x Cycle Time
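A quick worked example of the CPU time equation. The instruction count, CPI values, and 2 GHz clock below are made-up numbers chosen to keep the arithmetic easy.

```python
def cpu_time(instructions: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x cycles per instruction x cycle time."""
    return instructions * cpi * cycle_time_s


# Assumed workload: 1 billion instructions on a 2 GHz clock (0.5 ns cycle).
base = cpu_time(1_000_000_000, cpi=2.0, cycle_time_s=0.5e-9)    # about 1.0 s
faster = cpu_time(1_000_000_000, cpi=1.5, cycle_time_s=0.5e-9)  # about 0.75 s
speedup = base / faster
```

Lowering any one of the three factors lowers CPU time, but the factors interact: a design change that reduces CPI may lengthen the cycle time, for example.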
        Basics of Amdahl’s law
        - “the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used”
        - Optimize the things that are used a lot
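Amdahl's law is usually written as speedup = 1 / ((1 - f) + f / s), where f is the fraction of time spent in the improved part and s is how much faster that part becomes. A small sketch with illustrative numbers (the 40% fraction is an assumption for the example):

```python
def amdahl_speedup(fraction: float, part_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `part_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


# Assume the optimized part accounts for 40% of run time.
doubled = amdahl_speedup(0.4, 2.0)    # 1 / (0.6 + 0.2) = 1.25
limit = amdahl_speedup(0.4, 1e9)      # approaches 1 / 0.6, about 1.67
```

Even with an essentially infinite speedup of that part, the overall gain is capped near 1.67x, which is why it pays to optimize what is used most.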
        The heat problem in modern processors
        - Modern CPUs generate heat because power use rises with clock frequency (and with supply voltage), so heat limits how far clock speeds can be pushed.
        
Basics of Memory
        Static memory cells (what they are, how they work, etc.)
        - SRAM: Uses flip-flop-style latches (cross-coupled inverters) to store bits
        - Faster and needs no refresh, but takes more transistors per bit, so it is more expensive
        Dynamic memory cells (what they are, how they work, etc.)
        - DRAM: Uses capacitors to store bits (one transistor and one capacitor per cell)
        - Slower, but denser and cheaper
        - Needs a periodic refresh because the capacitor charge leaks away

Basics of Caching
        What the memory hierarchy is
        - Registers → L1 → L2 → L3 → RAM → Disk.
        - Closer = faster, but smaller.
        What locality is, the kinds of locality, and why it matters
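        - Temporal Locality: Recently accessed data is likely to be accessed again soon
        - Spatial Locality: Locations near a recently accessed address are likely to be accessed soon
        - Why it matters: caches (and virtual memory) only pay off because programs behave this way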
        The goal of caching
        - Store frequently used data closer to the CPU to speed up access
        How a cache works in detail
                FA, SA, DM, replacement strategies, write policies, line sizes, etc.
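                - FA: Fully associative. A block can go in any cache line
                - SA: Set associative. A block maps to one set and can go in any line of that set
                - DM: Direct mapped. Each memory block maps to exactly one cache line
                - Replacement: LRU (least recently used), FIFO, etc. Decides which line to evict when a set is full
                - Write Policies: Write-back (update memory only when a dirty line is evicted) vs. write-through (update memory on every write)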
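To make the mapping concrete, here is a minimal direct-mapped cache model. It tracks only tags (no data) and ignores write policies; the class name, the 4-line by 16-byte geometry, and the address sequence are assumptions chosen for illustration.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (illustrative sketch)."""

    def __init__(self, num_lines: int, line_size: int):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per line, None = empty
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (and fill the line)."""
        block = address // self.line_size   # which memory block
        index = block % self.num_lines      # the one line it may occupy
        tag = block // self.num_lines       # identifies the block in that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # evict whatever was there
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4, line_size=16)
results = [cache.access(a) for a in (0, 4, 64, 0)]
# results == [False, True, False, False]
```

Address 4 hits because it shares a line with address 0 (spatial locality). Address 64 maps to the same index with a different tag, so it evicts that line and the final access to 0 misses again. A set-associative cache would let both blocks stay resident.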
        Average Memory Access Time equation
        - AMAT = Hit Time + (Miss Rate × Miss Penalty)
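A worked AMAT example, using made-up numbers (1-cycle hit time, 5% miss rate, 100-cycle miss penalty):

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in the same units as hit_time."""
    return hit_time + miss_rate * miss_penalty


baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)  # 1 + 5 = 6 cycles
improved = amat(hit_time=1, miss_rate=0.02, miss_penalty=100)  # 1 + 2 = 3 cycles
```

With these numbers, cutting the miss rate from 5% to 2% halves the average access time, because the miss penalty dominates.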

Basics of Virtual Memory
        What virtual memory is
        - Provides illusion of large memory using disk storage.
        Why it is necessary
        - Allows programs more memory than physically available
        - Memory protection between processes
        What program characteristic VM exploits in order to work
        - Locality: Only small portions of a program used at a time
        How VM works in detail
                Frames, pages, Page Tables, etc.
                - Virtual memory is divided into fixed-size pages (logical units)
                - Pages are mapped to frames (same-size blocks of physical memory)
                - Page Table: Maps virtual page numbers to physical frame numbers; the offset within the page is unchanged
        What a TLB is, why it is necessary
        - Translation Lookaside Buffer caches recent address translations.
        - Speeds up access, avoids page table lookups
        What a page table walk is, why it is necessary
        - Happens when the TLB misses
        - The hardware (or OS) looks up the page table in memory to find the correct frame, then caches the translation in the TLB
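The translation path (TLB first, page table on a miss, page fault if the page is not resident) can be sketched as follows. This is a toy model under stated assumptions: 4 KiB pages, a single-level page table held in a dict, an unbounded TLB, and hypothetical names `MMU` and `PageFault`.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageFault(Exception):
    """Raised when a virtual page has no frame in physical memory."""


class MMU:
    """Toy address translator: TLB lookup, then page table (illustrative)."""

    def __init__(self, page_table: dict):
        self.page_table = page_table  # virtual page number -> frame number
        self.tlb = {}                 # cache of recent translations
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.tlb_hits += 1
            frame = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            # The "page table walk": consult the table in memory.
            if vpn not in self.page_table:
                raise PageFault(f"page {vpn} is not resident")
            frame = self.page_table[vpn]
            self.tlb[vpn] = frame     # remember for next time
        return frame * PAGE_SIZE + offset  # offset passes through unchanged


mmu = MMU({0: 5, 1: 2})
first = mmu.translate(0x1004)   # TLB miss, walk finds page 1 -> frame 2
second = mmu.translate(0x1008)  # same page, so this is a TLB hit
```

Real page tables are multi-level and real TLBs are small with their own replacement policy; the dict here only shows the order of the lookups.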

Virtual Memory

But, what is Virtual Memory?

  • Has some good visualizations!

Virtual Addresses
Physical Addresses

Memory: RAM, around 4 GB in the past. Addresses could also refer to other forms of memory, such as other devices.
Problems with physical addresses:

  1. Not enough memory -> crash if the program needs more than the available RAM
  2. Memory Fragmentation -> free memory is split into pieces too small to use, so we run out of space
  3. Security -> one program can corrupt another program’s data
  These can be solved by giving each program its own memory space (virtual memory)
  • However, it has to be mapped back to real memory in RAM (physical memory)

Swap Memory: space on disk that holds pages which do not fit in RAM
Page Fault: When the data is not in RAM and has to be fetched from disk

Implementation:

CPU has to find the mapping from a virtual address to a physical address, and read that data.

If the page table had one entry for every word of physical memory:

  • a word is 32 bits
  • 2^32 byte addresses = 2^30 words, so 2^30 entries
  Instead of 1 entry per word, map larger blocks / pages so far fewer entries are needed
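The arithmetic behind that choice, as a short sketch. The 32-bit address space and 4-byte words come from the notes; the 4-byte table entry and 4 KiB page size are assumptions for the example.

```python
ADDRESS_BITS = 32
WORD_BYTES = 4      # a word is 32 bits
ENTRY_BYTES = 4     # assumed size of one page table entry
PAGE_BYTES = 4096   # assumed 4 KiB pages

address_space = 2 ** ADDRESS_BITS                  # 2^32 byte addresses

# One entry per word: 2^30 entries.
per_word_entries = address_space // WORD_BYTES
per_word_table_bytes = per_word_entries * ENTRY_BYTES   # 4 GiB of table

# One entry per page: 2^20 entries.
per_page_entries = address_space // PAGE_BYTES
per_page_table_bytes = per_page_entries * ENTRY_BYTES   # 4 MiB of table
```

Under these assumptions a per-word table would be as large as the whole address space, while a per-page table is about a thousand times smaller.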

Use a TLB to MASSIVELY speed this up. On a hit, translation can take 1 cycle and skips most of the table lookups in memory.

MMU - Memory Management Unit, usually on CPU and programmed by OS