# Inproving Output TimingStability for WaveGen (Part 2)

٧l

Overview of Waveform Generator Design Evolution: What and Why



E2. Timer advances data from buffer to DAC. Low/Empty ISR writes next batch of data to buffer

## Detailed Overview of What and Why in Design Evolution

Output timing bad: Very unstable, vulnerable to other software (processes and handlers), timing errors accumulate. Greedy, doesn't share CPU. And add OS with ticks from HW timer interrupts

A2. OS-triggered periodic task software writes to DAC

A. Task software writes to DAC

Add HW timer (tracks time accurately)

And access HW timer directly

B. Task software poll/blocks on timer, then writes to DAC

And add DMA with ISR. software buffer

F.Timer triggers DMA data transfer, DMA ISR writes data to buffer

Output timing better: Tolerates more interference, vulnerable to processes and handlers, errors don't accumulate. Shares CPU.

Tight deadline:T<sub>Samble</sub>

Put code in HW timer's ISR

interference, vulnerable to processes and handlers, errors don't accumulate. Greedy, doesn't share CPU. Tight deadline: T<sub>Samble</sub>

Output timing better: Tolerates more

.Tight Deadline: ISR must write first new sample to buffer within T<sub>c</sub>

2. Long DMA ISK is delays other processing too much

Output timing: Even better. Vulnerable to other ISRs and interrupt locking f<sub>samble</sub> times per second Tight deadline: T<sub>Samble</sub>

Use 1-deep DAC input buffer

D. Timer advances data from buffer to DAC. Timer ISR writes next data to buffer

C. Timer ISR writes next data to DAC

Deadline better:  $2T_{Samble}$ Interrupt overhead for each sample wastes CPU time.

Use N-deep DAC input buffer with low/empty ISR

E. Timer advances data from buffer to DAC. Timer ISR writes next data to buffer

Interrupt overhead for each sample wastes CPU time

Add N-deep DAC input buffer with low/empty ISR

E2. Timer advances data from buffer to DAC. Low/Empty ISR writes next batch of data to buffer Split into double-by fer to ease first sample's deadline and cuts ISR duration in half.

G. Timer triggers DMA with double-buffering, DMA ISR switches buffers and refills with data

H. Timer triggers DMA, DMA ISR writes urgent data to buffer and triggers task to write rest of data

Move non-urgent

work to task

## Software and Hardware Components in Design Evolution

General Trend: Move operations which need synchronization (update output,



## **Example Code from ESF**

```
void Play_Tone(unsigned int period, unsigned int
num cycles, unsigned wave_type) {
 unsigned step, out data;
 while (num_cycles>0) {
  num_cycles--;
  for (step = 0; step < NUM_STEPS; step++) {</pre>
   switch (wave type) {
    case SQUARE:
     if (step < NUM STEPS/2)</pre>
      out data = 0;
     else
      out data = MAX DAC CODE;
     break;
    case RAMP:
     out_data = (step*MAX_DAC_CODE)/NUM_STEPS;
     break;
  5
     v1
```

```
case SINE:
    out_data = SineTable[step];
    break;
default:
    break;
}
DACO->DAT[0].DATH = DAC_DATH_DATA1(out_data >> 8);
DACO->DAT[0].DATL = DAC_DATL_DATA0(out_data);
Delay_us(period/NUM_STEPS);
}
}
```

## Vulnerability to Timing Interference

- A. Task busy-waits for constant time (blind), then writes to DAC
- A2. OS runs task periodically, task writes to DAC
- **B.** Task software poll/blocks on timer, then writes to DAC
- C. Timer ISR writes data to DAC
- D. Timer advances buffer data to DAC, Timer ISR writes next data to buffer
- E. Timer advances buffer data to DAC. Timer ISR writes data to buffer
- E2. Timer advances buffer data to DAC. Low/Empty ISR writes next batch of data to buffer
- F. Timer triggers DMA data transfer, DMA ISR writes next batch of data to buffer
- G. Timer triggers DMA transfer, DMA ISR switches buffers and writes next batch of data to previous buffer
- H. Timer triggers DMA transfer, DMA ISR writes urgent data to buffer, triggers task. Task writes rest of data batch

**DAC Overview** 



- Load DACDAT with 12-bit data N
- MUX selects a node from resistor divider network to create

$$V_o = (N+1)*V_{in}/2^{12}$$

- V<sub>o</sub> is buffered by output amplifier to create V<sub>out</sub>
  - V<sub>o</sub> = V<sub>out</sub> but V<sub>o</sub> is high impedance can't drive much of a load, so need to buffer it

## **DAC** Registers

#### **DAC** memory map

| Absolute address (hex) | Register name                       | Width<br>(in bits) | Access | Reset value | Section/<br>page |
|------------------------|-------------------------------------|--------------------|--------|-------------|------------------|
| 4003_F000              | DAC Data Low Register (DAC0_DAT0L)  | 8                  | R/W    | 00h         | 30.4.1/531       |
| 4003_F001              | DAC Data High Register (DAC0_DAT0H) | 8                  | R/W    | 00h         | 30.4.2/532       |
| 4003_F002              | DAC Data Low Register (DAC0_DAT1L)  | 8                  | R/W    | 00h         | 30.4.1/531       |
| 4003_F003              | DAC Data High Register (DAC0_DAT1H) | 8                  | R/W    | 00h         | 30.4.2/532       |
| 4003_F020              | DAC Status Register (DAC0_SR)       | 8                  | R      | 02h         | 30.4.3/532       |
| 4003_F021              | DAC Control Register (DAC0_C0)      | 8                  | R/W    | 00h         | 30.4.4/533       |
| 4003_F022              | DAC Control Register 1 (DAC0_C1)    | 8                  | R/W    | 00h         | 30.4.5/534       |
| 4003_F023              | DAC Control Register 2 (DAC0_C2)    | 8                  | R/W    | 0Fh         | 30.4.6/534       |

- This peripheral's registers are only eight bits long (legacy peripheral).
- DATA[11:0] stored in two registers
  - DATA0: Low byte [7:0] in DACx\_DATnL
  - DATA1: High nibble [11:8] in DACx\_DATnH

## **WAVEGEN STARTING POINT: DESIGN A**

## A. Simple Starter Code



while (1) {
 compute data
 // Synchronize: Wait until time for next sample
 for (t = T; t>0; t--); // busy wait loop creates delay
 // Position of following code implicitly schedules it
 write data to DAC
}



- Timing is unstable. Make T what value?
  - What if computing data takes variable time?
  - Vulnerable to interference by other handlers, processes on CPU
- Using busy-waiting to create time delay is greedy because it doesn't share CPU
- Synchronization and scheduling done completely in software

## Unbuffered DAC with Busy-Wait Code





## Sequence Diagram





## **IMPROVING OUTPUT TIMING STABILITY**

## Detailed Overview of What and Why



Add HW timer, DMA with ISR, software buffer

F.Timer triggers DMA data transfer, DMA ISR writes data to buffer

## Software and Hardware Components













## A2. Scheduler Releases WaveGen Periodically



- DAC update is on time if ...
  - Task\_WaveGen starts on time (tasks/ISRs finish early enough for scheduler to run it), and
  - Task WaveGen not preempted by other processing
- DAC update is delayed if other processing (tasks, ISRs) cause timing interference:
  - Task\_WaveGen starts late if tasks/ISRs finish too late (delaying scheduler), so DAC is updated late
  - Task\_WaveGen updates DAC late if preempted by higher-priority software processes (e.g. ISRs)

#### B. Software Polls Hardware Timer



#### B. Software Polls Hardware Timer



# Sequence Diagram





## Unbuffered DAC with Timer Polling



## C. Hardware Timer Periodically Triggers Interrupt for DAC Write



- Some timing jitter possible if time to compute data varies
- If so, may be able to flip order: write data computed in previous ISR execution, then pre-compute next data item



## Sequence Diagram



#### C. Timer ISR



## Example Source Code from ESF: Listings 7.7, 7.8

```
void Init TPM (void)
   // Turn on clock to TPM
   SIM->SCGC6 |= SIM SCGC6 TPM0 MASK;
   // Set clock source for tpm
   SIM->SOPT2 |= (SIM SOPT2 TPMSRC(1) | SIM SOPT2 PLLFLLSEL MASK);
   // Load the counter and mod, given prescaler of 32
   TPMO->MOD = (F TPM CLOCK/(F TPM OVFLW*32))-1;
   // Set TPM to divide by 32 prescaler, enable counting (CMOD) and interrupts
   TPMO->SC = TPM SC CMOD(1) | TPM SC PS(5) | TPM SC TOIE MASK;
   // Enable interrupts in NVIC
                                              void TPMO IRQHandler() {
  NVIC SetPriority (TPM0 IRQn, 3);
                                                 static int change=STEP SIZE;
  NVIC ClearPendingIRQ(TPM0 IRQn);
                                                 static uint16 t out data=0;
  NVIC EnableIRQ(TPM0 IRQn);
```

 ESF code generates each waveform sample in ISR (IRQ Handler)

C. Timing Analysis







## D. Add I-element Input Data Buffer for DAC



• Write timing window is now up to  $T_{\text{sample}}$  (50  $\mu$ s) before ideal output update time

## Sequence Diagram



- ISR can start slightly late and still update buffer in time
  - Slack time:  $T_{\text{sample}} T_{\text{compute}} T_{\text{to be determined}}$

#### D. Add Single-Entry Data Buffer before DAC



## **DAC Operating Modes**



Status flags in DACx SR

## Detailed Overview of What and Why



Add HW timer, DMA with ISR, software buffer

F.Timer triggers DMA data transfer, DMA ISR writes data to buffer

## F. Use Timer-Triggered DMA to Transfer Data



DMA Can Read and Write Memory and Peripheral Registers

**PWM** 

**Low-Power Timer** 

Periodic Interrupt Timers

Real-Time Clock



- System bus is a crossbar switch: three three-way switches
  - Each of three masters (CPU core, DMA, USB) can access a different device (flash memory, SRAM, peripheral bus) simultaneously
  - If multiple masters try to access same device, crossbar arbitrates (decides order for accesses)
- KL25Z Reference Manual
  - Ch. 20: Crossbar Switch Lite, Ch. 3 Section 4.6: Crossbar Configuration
  - Ch. 21: Peripheral Bridge

**Analog Comparator** 

12-bit DAC

16-bit DAC

Internal Voltage

Reference

## Basic Concepts

 Hardware which reads data from a source with and writes it to a destination

- Can copy data quickly
- Can eliminate ISRs which just copy data (e.g. move ADC results into buffer)
- Example: Data copy
  - Source: source memory buffer
  - Destination: destination memory buffer
  - Trigger: software command
- Various configurable options
  - Number of data items to copy
  - Source and destination addresses can be fixed or change (e.g. increment, decrement)
  - Size of data item (1, 2, 4 bytes)
  - When transfer starts



- Initialization: Configure controller
- Transfer: Data is copied
- Termination: Channel indicates transfer has completed

Copy Fill Boffer W/Input Data Fill Boffer W/Input Data Send our buffer Data

#### DMA Controller Details

- 4 independent channels
  - Channel 0 has highest priority
- 8-, 16- or 32-bit transfers, data size can differ between source and destination
- Modulo addressability
  - Pointer wraps around at end of buffer
  - Enables ring buffers with size of 2<sup>N</sup>
- Can trigger with hardware signal or software
- How many data items are transferred per trigger?
  - One: "Cycle stealing"
  - All items: grabs bus
- Hardware acknowledge/done signal



## Sequence Diagram







## Example Source Code from ESF: Listings 9.4, 9.5

ESF code initializes buffer (TriangleTable), then replays buffer contents repeatedly

```
void Play Tone with DMA(void) {
      Init RGB LEDs();
      Control RGB LEDs (0,0,0);
       Init DAC();
      Init TriangleTable();
      Init DMA For Playback(TriangleTable, NUM STEPS);
       Init TPM(10);
      Start TPM();
      Start DMA Playback();
       while (1)
                                                                                               TriangleTable
           #define MAX DAC CODE (4095)
           #define NUM STEPS (512)
           uint16 t TriangleTable[NUM STEPS];
           void Init TriangleTable(void) {
                 unsigned n, sample;
                 for (n=0; n<NUM STEPS/2; n++) {
                         sample = (n*(MAX DAC CODE+1)/(NUM STEPS/2));
                        TriangleTable[n] = sample; // Fill in from front
                         TriangleTable[NUM STEPS-1-n] = sample; // Fill in from back
```

## Example Source Code from ESF: Listings 9.6

```
uint16_t * Reload_DMA_Source=0;
uint32_t Reload_DMA_Byte_Count 1024
```

TriangleTable

```
void Init DMA For Playback(uint16 t * source, uint32 t count) {
      // Save reload information
      Reload DMA Source = source;
      Reload DMA Byte Count = count*2;
      // Gate clocks to DMA and DMAMUX
      SIM->SCGC7 |= SIM SCGC7 DMA MASK;
      SIM->SCGC6 |= SIM SCGC6 DMAMUX MASK;
      // Disable DMA channel to allow configuration
      DMAMUX0 \rightarrow CHCFG[0] = 0;
      // Generate DMA interrupt when done
      // Increment source, transfer words (16 bits)
      // Enable peripheral request
      DMAO->DMA[0].DCR = DMA DCR EINT MASK | DMA DCR SINC MASK |
      DMA DCR SSIZE(2) | DMA DCR DSIZE(2) | DMA DCR ERQ MASK | DMA DCR CS MASK;
      // Configure NVIC for DMA ISR
      NVIC SetPriority (DMA0 IRQn, 2);
      NVIC ClearPendingIRQ(DMA0 IRQn);
      NVIC EnableIRQ(DMA0 IRQn);
      // Set DMA MUX channel to use TPMO overflow as trigger
      DMAMUX0->CHCFG[0] = DMAMUX CHCFG SOURCE (54);
```

MCU Memory Address Space

TriangleTable

# Example Source Code from ESF: Listings 9.6, 9.7, 9.8

```
void Start_DMA_Playback() {
    // initialize source and destination pointers
    DMA0->DMA[0].SAR = DMA_SAR_SAR((uint32_t) Reload_DMA_Source);
    DMA0->DMA[0].DAR = DMA_DAR_DAR((uint32_t) (&(DACO->DAT[0])));
    // byte count
    DMA0->DMA[0].DSR_BCR = DMA_DSR_BCR_BCR(Reload_DMA_Byte_Count);
    // clear done flag
    DMA0->DMA[0].DSR_BCR &= ~DMA_DSR_BCR_DONE_MASK;
    // set enable flag
    DMAMUX0->CHCFG[0] |= DMAMUX_CHCFG_ENBL_MASK;
}
```

```
void DMA0_IRQHandler(void) {
    // Turn off blue LED in DMA IRQ handler
    Control_RGB_LEDs(0,0,0);
    // Clear done flag
    DMA0->DMA[0].DSR_BCR |= DMA_DSR_BCR_DONE_MASK;
    // Start the next DMA playback cycle
    Start_DMA_Playback();
    // Turn on blue LED
    Control_RGB_LEDs(0,0,1);
}
```

```
DMA Channel 0

SAR

DAC 0

DAT[0]
```

uint32\_t Reload\_DMA\_Byte\_Count 1024

uint16\_t \* Reload DMA Source=0;

# F. Timing Analysis



## **LOOSENING DEADLINES**

Share CPU core better, reducing timing interference to other SW processing

Overview of Waveform Generator Design Evolution: What and Why



#### Detailed Overview of What and Why

Output timing bad: Very unstable, vulnerable to other software (processes and handlers), timing errors accumulate. Greedy, doesn't share CPU.

And add OS with ticks from HW timer interrupts

> A2. OS-triggered periodic task software writes to DAC

> > Put code in HW timer's ISR

Add HW timer (tracks time accurately) And access HW

A. Task software

writes to DAC

timer directly

B. Task software poll/blocks on timer, then writes to DAC

And add DMA with ISR. software buffer

F.Timer triggers DMA data transfer, DMA ISR writes data to buffer

Output timing better: Tolerates more interference, vulnerable to processes and handlers, errors don't accumulate. Shares CPU.

Tight deadline: T<sub>Samble</sub>

handlers, errors don't accumulate. Greedy, doesn't share CPU. Tight deadline: T<sub>Samble</sub> C. Timer ISR writes next data to DAC

Output timing better: Tolerates more

Tight Deadline: ISR must write first new sample to interference, vulnerable to processes and buffer within T<sub>s</sub>

2. Long DMA ISR is delays other processing too much

Output timing: Even better. Vulnerable to other ISRs and interrupt locking f<sub>samble</sub> times per second Tight deadline: T<sub>Samble</sub>

Use 1-deep DAC input buffer

D. Timer advances data from buffer to DAC. Timer ISR writes next data to buffer

Deadline better:  $2T_{Samble}$ Interrupt overhead for each sample wastes CPU time.

Use N-deep DAC input buffer with low/empty ISR

E. Timer advances data from buffer to DAC. Timer ISR writes next data to buffer

Interrupt overhead for each sample wastes CPU time

Add N-deep DAC input buffer with low/empty ISR

E2. Timer advances data from buffer to DAC. Low/Empty ISR writes next batch of data to buffer Split into double-by fer to ease first sample's deadline and cuts ISR duration in half.

G. Timer triggers DMA with double-buffering, DMA ISR switches buffers and refills with data

H. Timer triggers DMA, DMA ISR writes urgent data to buffer and triggers task to write rest of data

Move non-urgent

work to task



- Buffer is queue: first-in, first-out (FIFO)
- Buffering delays data, so will not work for all applications



## Tight Timing Requirements for Buffer



- Despite fewer interrupts (due to multiple buffer entries), still need to save next sample to buffer before DMA reads it
- Examine and understand timing requirements for buffer
- Producer adds data (light blue-green) to fill in buffer
  - In example, first four items have already been added

- Consumer reads data item from red buffer entry
  - Data item in buffer is not needed (old, stale) after being read by consumer
- Producer (Thread\_Refill\_Sound\_Buffer) must stay ahead of consumer (DMA controller)

## G. DMA with Double Buffering



Loosening Timing Requirements with Double-Buffering



- Use two buffers, each half-size (N = 4 entries)
- Initialization
  - Start filling buffer 0
  - Can start playing buffer 0 after it has ≥1 sample
- After buffer 0 is filled, start filling buffer 1,
- Operation: After playing last sample from buffer 0,
  - Switch to playing buffer I
- <sup>45</sup> Start refilling buffer 0

- Generalization
  - After playing last sample from buffer x, switch to playing buffer y, start filling buffer x
- Deadlines
  - Now have two deadlines, one per buffer ⊗
  - Much looser deadlines: extended to from  $T_{Sample}$  to  $(N+1)*T_{Sample}$

Sequence Diagram



#### G. DMA with Double Buffer

