# Power Management

## Outline

- Power Management Mechanisms
- Server Power Management

## Power and Energy

Power (Watts) = Energy (Joules) / Time (sec)

- Power is limited by infrastructure (e.g., power supply)
- Energy is what utilities charge for and what a battery can store
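
As a small worked example of the relation above, the sketch below converts a constant power draw into energy. The 200 W server and the 24-hour window are hypothetical values chosen only for illustration.

```python
def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy (J) = Power (W) * Time (s), for a constant power draw."""
    return power_watts * seconds


def joules_to_kwh(joules: float) -> float:
    """Convert joules to kilowatt-hours (1 kWh = 3.6e6 J)."""
    return joules / 3.6e6


# Hypothetical server drawing a constant 200 W for 24 hours.
day_energy_j = energy_joules(200.0, 24 * 3600)
day_energy_kwh = joules_to_kwh(day_energy_j)
print(f"{day_energy_j:.0f} J = {day_energy_kwh:.2f} kWh")
```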

### **CMOS Power Consumption**

$$P_{total} = P_{dyn} + P_{stat} = P_{tran} + P_{sc} + P_{lkg}$$

#### **Dynamic Power**

- Signal transitions
  - Logic activity
  - Glitches
- Short-circuit

#### **Static Power**

- Leakage

#### **Dynamic Power Consumption**

- Dynamic power:  $P_{dyn} = \alpha \cdot C_L \cdot f_{clk} \cdot V_{DD}^2$ 
  - α activity factor (i.e., the probability the given node will change its state from 1 to 0 or vice versa at a given clock tick)
  - C<sub>L</sub> total load capacitance
  - $f_{clk}$  clock frequency
  - V<sub>DD</sub> supply voltage
- Circuit techniques to reduce dynamic power
  - State/bus encoding ( $\downarrow \alpha$ )
  - Reduce device size ( $\downarrow C_L$ )
  - Pipelining & parallelism ( $\downarrow f_{clk} \downarrow V_{DD}$ )
- Run-time techniques to reduce dynamic power
  - Clock-gating  $(\downarrow \alpha)$
  - Dynamic Voltage & Frequency Scaling DVFS ( $\downarrow f_{clk}, \downarrow V_{DD}$ )
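
The dynamic power formula can be explored numerically. The sketch below is illustrative only: the activity factor, load capacitance, frequency, and voltage values are hypothetical, and the 20% DVFS step is an assumed operating point, not a value from the slides.

```python
def dynamic_power(alpha: float, c_load: float, f_clk: float, v_dd: float) -> float:
    """P_dyn = alpha * C_L * f_clk * V_DD^2 (watts)."""
    return alpha * c_load * f_clk * v_dd ** 2


# Hypothetical baseline: alpha = 0.2, C_L = 1 nF, f_clk = 3 GHz, V_DD = 1.0 V.
base = dynamic_power(alpha=0.2, c_load=1e-9, f_clk=3e9, v_dd=1.0)

# DVFS: scale frequency and voltage down together by an assumed 20%.
# Power scales with f * V^2, so the ratio is 0.8 * 0.8^2 = 0.8^3.
dvfs = dynamic_power(alpha=0.2, c_load=1e-9, f_clk=0.8 * 3e9, v_dd=0.8 * 1.0)

# Clock-gating an idle unit: with the clock stopped, no node toggles (alpha = 0).
gated = dynamic_power(alpha=0.0, c_load=1e-9, f_clk=3e9, v_dd=1.0)

print(f"baseline {base:.3f} W, DVFS ratio {dvfs / base:.3f}, gated {gated:.3f} W")
```

Because voltage enters quadratically, lowering V<sub>DD</sub> together with f<sub>clk</sub> saves more power than lowering frequency alone.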

### Leakage Power Consumption

- Static power:  $P_{\text{stat}} = I_{\text{stat}} \cdot V_{\text{DD}} = (I_{\text{sub}} + I_{\text{D}} + I_{\text{GIDL}} + I_{\text{PT}} + I_{\text{G}}) \cdot V_{\text{DD}}$ 
  - I<sub>sub</sub> Subthreshold leakage
  - I<sub>D</sub> Junction reverse-bias current
  - I<sub>GIDL</sub> Gate-induced drain leakage
  - I<sub>PT</sub> Punch-through current
  - I<sub>G</sub> Gate tunneling current
  - V<sub>DD</sub> Supply voltage
- Circuit techniques to reduce leakage power
  - Increase V<sub>t</sub>: use Multiple-threshold ( $V_t$ ) devices ( $\downarrow$  I<sub>stat</sub>)
    - Use low  $V_t$  devices (have high leakage) only in critical circuits
- Run-time techniques to reduce leakage power
  - Reduce idle circuit's voltage to retention (  $\downarrow V_{\text{DD}}$  )
  - Power-gate idle circuit ( $\downarrow V_{DD}$ )
  - Dynamic Voltage & Frequency Scaling DVFS (  $\downarrow V_{DD}$ )
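
A minimal sketch of the static power formula follows. The leakage current values are hypothetical, and the model holds them constant across voltages as a simplification; in real devices the leakage currents themselves also depend on the supply voltage.

```python
def static_power(leakage_amps: dict, v_dd: float) -> float:
    """P_stat = (sum of leakage currents) * V_DD (watts)."""
    return sum(leakage_amps.values()) * v_dd


# Hypothetical per-component leakage currents in amperes.
LEAKAGE_AMPS = {
    "sub": 0.30,   # subthreshold leakage
    "D": 0.02,     # junction reverse-bias current
    "GIDL": 0.03,  # gate-induced drain leakage
    "PT": 0.01,    # punch-through current
    "G": 0.04,     # gate tunneling current
}

nominal = static_power(LEAKAGE_AMPS, v_dd=1.0)    # circuit at nominal voltage
retention = static_power(LEAKAGE_AMPS, v_dd=0.5)  # assumed retention voltage
gated = static_power(LEAKAGE_AMPS, v_dd=0.0)      # power-gated: supply cut off

print(f"nominal {nominal:.2f} W, retention {retention:.2f} W, gated {gated:.2f} W")
```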



#### **PM Architecture Overview**



## Clock-gating



Clock-gating stops the clock to an idle core/unit to reduce dynamic power

### Local Power-gating



#### **Global Power-gating**



#### **DVFS**



The PMU (power management unit) controls the VR (voltage regulator) using an off-chip serial voltage identification (SVID) interface.

#### Advanced PM Architecture (I)



Per-core PLLs enable per-core Dynamic Frequency Scaling (DFS)

#### Advanced PM Architecture (II)



## More Advanced PM Features

- There are more advanced PM features:
  - Power budget management
  - Computational sprinting (e.g., Turbo)
  - Maximum current limit protection
  - Maximum voltage limit protection
  - Voltage emergency prevention & avoidance
  - Adaptive voltage scaling
  - Reliability degradation mitigation
  - System level idle power-states
  - System level DVFS
  - Race to halt
  - Hardware duty cycling
  - ...

### Server Power Management

- Core Idle States
- Package Idle States
- DRAM Idle States
- IO Link States

### Core Idle Power State – Core C-states

- Core C-states are power-saving states that enable the core to reduce its power consumption during idle periods
- Intel's Skylake architecture offers four main Core C-states:

| Core State | Sleep Level | Power per core | Transition Latency |
|------------|-------------|----------------|--------------------|
| C0         | Active      | 4W             |                    |
| C1         | Shallow     | 1.4W           | 2µs                |
| C1E        | Medium      | 0.9W           | 10µs               |
| C6         | Deep        | 0.1W           | 133µs              |

Transition Latency: Time to switch from an active to an idle state (and back)
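
The table implies a trade-off: deeper states draw less power but cost more time to enter and exit. The sketch below estimates idle-period energy per state using the table's numbers. The energy model is an assumption made for illustration (the core is charged active-level power for the whole transition latency), and `best_state` is a hypothetical selection policy, not the one used by real hardware or operating systems.

```python
# (power in W, transition latency in µs), taken from the table above.
C_STATES = {
    "C0": (4.0, 0.0),
    "C1": (1.4, 2.0),
    "C1E": (0.9, 10.0),
    "C6": (0.1, 133.0),
}
ACTIVE_POWER_W = 4.0  # C0 power, assumed to be drawn during transitions


def idle_energy_uj(state: str, idle_us: float):
    """Energy in µJ spent over an idle period, or None if the state does not fit."""
    power_w, latency_us = C_STATES[state]
    if latency_us > idle_us:
        return None  # the transition alone would outlast the idle period
    return ACTIVE_POWER_W * latency_us + power_w * (idle_us - latency_us)


def best_state(idle_us: float) -> str:
    """Pick the state with the lowest modeled energy for this idle period."""
    energies = {}
    for state in C_STATES:
        energy = idle_energy_uj(state, idle_us)
        if energy is not None:
            energies[state] = energy
    return min(energies, key=energies.get)


for idle in (1, 5, 100, 1000):
    print(f"idle {idle} µs -> {best_state(idle)}")
```

Under this model a deep state only pays off once the idle period is long enough to amortize its transition cost, which is why short idle periods favor shallow states.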

#### C0 (Active) Core C-state



#### C1 (Shallow) Core C-state



| C-State | Clocks       | ADPLL | L1/L2 Cache | Voltage | Context    |
|---------|--------------|-------|-------------|---------|------------|
| C1      | Most Stopped | On    | Coherent    | Nominal | Maintained |

### C1E (Medium) Core C-state



| C-State | Clocks       | ADPLL | L1/L2 Cache | Voltage | Context    |
|---------|--------------|-------|-------------|---------|------------|
| C1E     | Most Stopped | On    | Coherent    | Min V/F | Maintained |

### C6 (Deep) Core C-state

Flush L1/L2 Caches

| C-State | Clocks  | ADPLL | L1/L2 Cache | Voltage | Context    |
|---------|---------|-------|-------------|---------|------------|
| C6      | Running | On    | Flushed     | Nominal | Maintained |

Save Core's Context to S/R SRAM

| C-State | Clocks  | ADPLL | L1/L2 Cache | Voltage | Context  |
|---------|---------|-------|-------------|---------|----------|
| C6      | Running | On    | Flushed     | Nominal | S/R SRAM |

Turn off the clocks and PLL

| C-State | Clocks  | ADPLL | L1/L2 Cache | Voltage | Context  |
|---------|---------|-------|-------------|---------|----------|
| C6      | Stopped | Off   | Flushed     | Nominal | S/R SRAM |

Shut off the voltage

| C-State | Clocks  | ADPLL | L1/L2 Cache | Voltage  | Context  |
|---------|---------|-------|-------------|----------|----------|
| C6      | Stopped | Off   | Flushed     | Shut-off | S/R SRAM |
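
The C6 tables above describe a sequence in which one or two columns change at each step. The sketch below replays that sequence as a list of state updates. The `CoreState` type, the step names, and the starting state (an active core with running clocks and coherent caches) are assumptions made for illustration; only the resulting column values come from the tables.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreState:
    """One row of the C-state tables (hypothetical type for illustration)."""
    clocks: str = "Running"
    adpll: str = "On"
    caches: str = "Coherent"
    voltage: str = "Nominal"
    context: str = "Maintained"


# Each step names the action and the table columns it changes.
C6_ENTRY_STEPS = [
    ("flush L1/L2 caches", {"caches": "Flushed"}),
    ("save context to S/R SRAM", {"context": "S/R SRAM"}),
    ("turn off clocks and PLL", {"clocks": "Stopped", "adpll": "Off"}),
    ("shut off voltage", {"voltage": "Shut-off"}),
]


def enter_c6(state: CoreState) -> list:
    """Apply the entry steps in order and return (step name, state) pairs."""
    trace = []
    for name, changes in C6_ENTRY_STEPS:
        state = replace(state, **changes)
        trace.append((name, state))
    return trace


for step, snapshot in enter_c6(CoreState()):
    print(f"{step}: {snapshot}")
```

The ordering matters: the caches are flushed and the context is saved while clocks and voltage are still available, since both are lost once the voltage is shut off.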

## Package C-states

- Package C-states are power saving states that enable the uncore and DRAM to reduce their power consumption during idle periods
- For a system to enter Package C-states, the cores and IO links should be idle
- Intel's Skylake architecture offers three Package C-states:
  - PC0 Active
  - PC2 Intermediate (non-architectural)
  - PC6 Deep

#### PC0 (Active) Package C-state



#### All cores in CC6



#### **IOs in L1 state**




#### **DRAM in Self Refresh**






Reduce CLM voltage to retention



| PC-state | CC-state | Clock   | PLL | Voltage   | LLC      | IO | DRAM         |
|----------|----------|---------|-----|-----------|----------|----|--------------|
| PC6      | C6       | Stopped | Off | Retention | Coherent | L1 | Self-refresh |
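
The PC6 entry conditions can be sketched as a simple check. This is one reading of the slide sequence, not a description of the actual PMU firmware: the function name, the state strings, and the treatment of DRAM self-refresh and CLM voltage reduction as follow-up actions are all assumptions.

```python
# Resulting package state, mirroring the PC6 row of the table above.
PC6_ROW = {
    "clock": "Stopped",
    "pll": "Off",
    "voltage": "Retention",
    "llc": "Coherent",
    "io": "L1",
    "dram": "Self-refresh",
}


def try_enter_pc6(core_states: list, io_link_states: list):
    """Return the PC6 row if all cores are in CC6 and all IO links are in L1."""
    if not all(core == "CC6" for core in core_states):
        return None  # at least one core is still active or in a shallow state
    if not all(link == "L1" for link in io_link_states):
        return None  # at least one IO link is still up
    # Remaining steps in the sequence: DRAM enters self-refresh,
    # then the CLM voltage is reduced to retention.
    return dict(PC6_ROW)


print(try_enter_pc6(["CC6", "CC6"], ["L1", "L1"]))
print(try_enter_pc6(["CC6", "CC1"], ["L1", "L1"]))
```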


## **IO Link States**

- Link Power States: Power saving states that enable the IO to reduce its power consumption during idle periods.
  - L0: Active
  - L0p: Partial Active (<10ns, 25%)
  - L0s: Standby (<64ns, 50%)
  - L1: Link Down (µs)



## **DRAM Idle States**

- CKE (Clock Enable)-OFF mode:
  - Normally, the MC (Memory Controller) sends a clock signal to the DRAM
  - When the MC turns off the CKE signal, the DRAM can enter either the Active Power-Down mode or the Precharged Power-Down mode (10-30ns, >50%).
- Self-refresh:
  - Normally, the MC sends refresh commands to the DRAM
  - When the DRAM enters self-refresh, the DRAM itself is responsible for issuing the refresh commands. As a result, the interface between the SoC and the DRAM can be turned off (several µs).



