clock rate: the speed of the CPU clock, measured in hertz (Hz).
clock cycle: Essentially all computers are constructed using a clock running at a constant rate. These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles. Computer designers refer to the time of a clock period by its duration (e.g., 1 ns) or by its rate (e.g., 1 GHz).
(Think of the pipeline as a blood vessel: a higher clock rate makes the blood flow faster, but a rate that is too high bursts the vessel. The clock cycle time can be adjusted.) (In most cases, one instruction is issued and executes for one cycle.)
Execution time (seconds)
= (CPU clock cycles for a program) * (clock cycle time) (seconds)
= (CPU clock cycles for a program) / (clock rate) (Hz)
CPU time: (instructions/program) * (clock cycles/instruction) * (seconds/clock cycle)
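The two formulas above can be sketched in Python. The function names and the sample workload (2 billion instructions, CPI of 1.5, 1 GHz clock) are made up for illustration and are not from the course material.

```python
def cpu_time_from_cycles(clock_cycles, clock_rate_hz):
    """Execution time (seconds) = clock cycles / clock rate (Hz)."""
    return clock_cycles / clock_rate_hz


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = (instructions/program) * (cycles/instruction) * (seconds/cycle)."""
    cycle_time = 1.0 / clock_rate_hz  # seconds per clock cycle
    return instruction_count * cpi * cycle_time


# Hypothetical workload: 2e9 instructions at CPI 1.5 is 3e9 cycles.
print(cpu_time(2e9, 1.5, 1e9))         # roughly 3 seconds
print(cpu_time_from_cycles(3e9, 1e9))  # the same program expressed in cycles
```

Both calls describe the same program, once through instruction count and CPI and once through the total cycle count, so they should agree up to floating-point rounding.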
==================================
Example:
Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many cycles as machine A for the same program. What clock rate should we tell the designer to target?
For the program: execution time / number of cycles / clock rate
computer A ==> 10 seconds / X / 400 MHz
computer B ==> 6 seconds / 1.2X / ? MHz
Execution time (seconds) = (number of cycles) * (clock cycle time) (seconds)
= (number of cycles) / (clock rate) (Hz)
computer A number of cycles (X) = execution time * clock rate
= 10 s * 400 MHz
= 10 * 400 * 10^6 cycles
= 4000 * 10^6 cycles
computer B clock rate = (number of cycles) / (execution time)
= 4000 * 10^6 * 1.2 / 6 Hz
= 800 * 10^6 Hz
= 800 MHz
Target clock rate for computer B = 800 MHz
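The arithmetic of this example can be checked with a short Python sketch. The helper names are chosen here for illustration; the numbers come from the problem statement.

```python
MHZ = 10**6  # 1 MHz = 10^6 Hz


def cycles_executed(exec_time_s, clock_rate_hz):
    """Number of cycles = execution time * clock rate."""
    return exec_time_s * clock_rate_hz


def required_clock_rate(cycles, target_time_s):
    """Clock rate (Hz) = number of cycles / execution time."""
    return cycles / target_time_s


cycles_a = cycles_executed(10, 400 * MHZ)  # computer A: 4000 * 10^6 cycles
cycles_b = 1.2 * cycles_a                  # computer B needs 1.2x as many cycles
rate_b = required_clock_rate(cycles_b, 6)  # B must finish in 6 seconds

print(rate_b / MHZ, "MHz")                 # about 800 MHz
```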
===================================
1 Hz (hertz) = 1 cycle per second
kHz = 10^3 Hz
MHz = 10^6 Hz
GHz = 10^9 Hz
THz = 10^12 Hz
===================================
3 + 6 = 9
operands: "3" and "6"
operator: "+"
===================================
Instruction Set Architecture (ISA)
An instruction set is a list of all the instructions, and all their variations, that a processor can execute.
Instructions include:
Arithmetic such as add and subtract
Logic instructions such as and, or, and not
Data instructions such as move, input, output, load, and store
Control flow instructions such as goto, if ... goto, call, and return.
The ISA serves as the boundary between the software and the hardware.
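To make the four instruction categories concrete, here is a minimal sketch of an interpreter for a toy instruction set. The opcode names, the tuple encoding, and the register names are all invented for this illustration and do not correspond to any real ISA.

```python
def run(program, memory):
    """Execute a list of (opcode, *operands) tuples on a toy register machine."""
    regs = {}
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "load":                        # data: register <- memory
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "store":                     # data: memory <- register
            reg, addr = args
            memory[addr] = regs[reg]
        elif op in ("add", "sub", "and", "or"):  # arithmetic and logic
            dst, a, b = args
            x, y = regs[a], regs[b]
            regs[dst] = {"add": x + y, "sub": x - y,
                         "and": x & y, "or": x | y}[op]
        elif op == "goto":                      # control flow: unconditional
            (pc,) = args
        elif op == "if_zero_goto":              # control flow: conditional
            reg, target = args
            if regs[reg] == 0:
                pc = target
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# The 3 + 6 = 9 example: two operands loaded, one operator applied.
program = [
    ("load", "r1", "a"),
    ("load", "r2", "b"),
    ("add", "r3", "r1", "r2"),
    ("store", "r3", "c"),
]
print(run(program, {"a": 3, "b": 6})["c"])
```

Software only needs to know these opcodes and their meaning; how `run` carries them out is the hardware side of the boundary.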
===================================
Pipelining Hazards
Hazards prevent the next instruction from executing during its designated clock cycle.
Structural hazards: an attempt to use the same hardware to do two different things at once.
Data hazards: an instruction depends on the result of a prior instruction that is still in the pipeline.
Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
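As a rough illustration of a data hazard, the sketch below scans a list of instructions for read-after-write dependences between instructions that are close together. The `Instr` type, the register names, and the `window` parameter (how many following instructions are assumed to overlap in the pipeline) are assumptions made for this example, not a model of any particular pipeline.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read


def raw_hazards(instrs, window=2):
    """Return (producer index, consumer index, register) for each
    read-after-write dependence within `window` instructions.

    Simplification: a later overwrite of the same register is ignored.
    """
    hazards = []
    for i, producer in enumerate(instrs):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(instrs))):
            if producer.dest in instrs[j].srcs:
                hazards.append((i, j, producer.dest))
    return hazards


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),  # reads r1 right after it is written
    Instr("and", "r6", ("r7", "r8")),
    Instr("or", "r9", ("r1", "r6")),   # reads r6 right after it is written
]
print(raw_hazards(program))
```

Whether each reported pair actually stalls depends on the pipeline depth and on forwarding, which this sketch does not model.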
===================================
Not yet organized
CPI, instruction count, cycle time
MIPS
locality
trade-off
CISC
RISC
Classifying Instruction Set Architecture:
(a) Stack
(b) Accumulator
(c) Register-Memory
(d) Register-Register/load-store
Quantitative Principles of Design
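1. Take Advantage of Parallelism
2. Principle of Locality
Place data in a suitable location so that it is convenient to access.
This includes (1) temporal locality and (2) spatial locality.
(1) Small pieces of information that are used often, such as loops and reused data, can be kept in the cache.
(2) Contiguous memory, such as straight-line code and array accesses, can be brought into the cache together so that it is ready for the next access.
3. Focus on the Common Case
4. Amdahl’s Law
5. The Processor Performance Equation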
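The effect of both kinds of locality can be seen with a very small cache model. The sketch below assumes a direct-mapped cache with 8 lines of 16 bytes each; these sizes and the three access patterns are chosen only for illustration.

```python
def hit_rate(addresses, num_lines=8, line_size=16):
    """Hit rate of a direct-mapped cache over a sequence of byte addresses."""
    lines = [None] * num_lines  # tag held by each line; None means empty
    hits = 0
    for addr in addresses:
        block = addr // line_size  # which memory block the byte belongs to
        index = block % num_lines  # which cache line that block maps to
        tag = block // num_lines   # distinguishes blocks sharing a line
        if lines[index] == tag:
            hits += 1
        else:
            lines[index] = tag     # miss: bring the whole block in
    return hits / len(addresses)


# Spatial locality: walking an array of 4-byte words in order.
sequential = list(range(0, 1024, 4))
# Temporal locality: re-reading the same small array ten times.
repeated = list(range(0, 64, 4)) * 10
# Poor locality: every access maps to the same line with a new tag.
strided = list(range(0, 128 * 64, 128))

print(hit_rate(sequential), hit_rate(repeated), hit_rate(strided))
```

In the sequential case each miss brings in a 16-byte block, so the next three word accesses hit. In the strided case every access evicts the previous block, so nothing is ever reused.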
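Amdahl's Law says that the overall speedup is limited by the fraction of execution time that the enhancement does not touch. A minimal sketch, with a made-up example of a 10x enhancement applied to 40% of the execution time:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of
    the original execution time that is enhanced and s is the speedup of
    that fraction."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the time is sped up by 10x.
print(amdahl_speedup(0.4, 10))    # about 1.56 overall
# Even an enormous speedup of that 40% cannot reach 1 / 0.6 overall.
print(amdahl_speedup(0.4, 1e12))
```

This is also the reason behind "Focus on the Common Case": the larger the enhanced fraction, the more the enhancement pays off.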
Tuesday, September 30, 2008
Monday, September 29, 2008
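[Computer Architecture] Homework 2
9/23
Course Website: http://w2cn.cis.nctu.edu.tw/
Notes:
1. There is a reading assignment this week. The professor said half a page is enough and one page is the maximum. The due date is 9/29 (Mon).
Reading assignment: Chapter 1 & Appendix B (Lec 1 & 2, & Appendix B handout)
2. Seats will be assigned starting in next week's class, and everyone must sit according to the seating chart. Attendance and asking questions in class count toward the grade (15%).
3. There is a make-up class next Thursday (10/2) evening. The professor said dinner will be provided??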
[Computer Architecture] Homework 1
9/16
Problem 1: Memory Hierarchy
Problem 1a: Assume that we have a 32-bit processor (with 32-bit words) and that this processor is byte-addressed (i.e. addresses specify bytes). Suppose that it has a 512-byte cache that is two-way set-associative, has 4-word cache lines, and uses LRU replacement. Split the 32-bit address into “tag”, “index”, and “cache-line offset” pieces. Which address bits comprise each piece?
tag:
index:
offset: bits 3-0 (we’ll give you this one).
Problem 1b: How many sets does this cache have? Explain.
Problem 1c: Draw a block diagram for this cache. Show a 32-bit address coming into the diagram and a 32-bit data result and “Hit” signal coming out. Include all of the comparators in the system and any muxes as well. Include the data storage memories (indexed by the “Index”), the tag matching logic, and any muxes. You can indicate RAM with a simple block, but make sure to label address widths and data widths. Make sure to label the function of various blocks and the width of any buses.
Problem 1d: Below is a series of memory read references sent to the cache from part (a). Assume that the cache is initially empty and classify each memory reference as a hit or a miss. Identify each miss as either compulsory, conflict, or capacity. One example is shown. Hint: start by splitting the address into components. Show your work.
Problem 1e: Calculate the miss rate and hit rate.
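One way to check an answer to Problems 1a and 1b is to compute the cache geometry from its parameters. The sketch below assumes 4-byte words (32-bit words, as stated), so a 4-word line is 16 bytes, and it assumes all sizes are powers of two. The function names and the sample address are hypothetical.

```python
def cache_geometry(cache_bytes, ways, line_bytes, addr_bits=32):
    """Return (number of sets, tag bits, index bits, offset bits)."""
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 for a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return num_sets, tag_bits, index_bits, offset_bits


def split_address(addr, index_bits, offset_bits):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 512-byte cache, two-way set-associative, 4 words * 4 bytes per line.
sets, tag_bits, index_bits, offset_bits = cache_geometry(512, 2, 4 * 4)
print(sets, tag_bits, index_bits, offset_bits)

# Arbitrary sample address, not taken from the homework's reference list.
tag, index, offset = split_address(0x12345678, index_bits, offset_bits)
print(hex(tag), hex(index), hex(offset))
```

Under these assumptions the offset occupies the low 4 bits, which agrees with the "bits 3-0" hint given in the problem.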
Problem 2: Basic Pipelining
Problem 2a: How many branch delay slots does this pipeline have? Explain.
Problem 2b: Suppose that we include complex branch conditions, e.g.:
bisqrt $r1, $r2, somewhere ; branch if $r1 square root of $r2
Is this likely to change the number of branch delay slots? If not, explain. If so, how many will there be now?
Problem 2c: What is a load delay slot? Is it a feature of the instruction set or of a particular implementation?
Problem 2d: Suppose that the data cache takes 1 cycle to access on a hit and 100 cycles for a miss. How many load delay slots will there be? Explain.
Problem 2e: Suppose that cache hits (both instructions and data) take 4 cycles but are pipelined. What does this affect?
Problem 2f: Modify the following datapath to handle forwarding. Be careful (don’t forget forwarding for branches!). Pick an economical solution and make sure to include control signals. You can draw something below if required.
Problem 2g: Why might the following instruction sequence be important? Is there any way that we can handle it without stalling?
Problem 3: Open problem
Problem 3a:
Problem 3b: RISC vs. CISC
Problem 3c: