# **Event-Driven Packet Processing**

Stephen Ibanez, Gordon Brebner, Gianni Antichi, Nick McKeown

May 1<sup>st</sup> 2019

# **P4 Programming Model**



#### Synchronous packet-by-packet processing

# **Limitations of P4 Programming Model**

#### Performing periodic tasks

- HULA [1] periodic packet probes
- Count-Min-Sketch periodic state reset
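The periodic state reset can be pictured with a small behavioral model. The sketch below is written in Python rather than P4, and every name in it (the `CountMinSketch` class, the CRC32-based hashing, the `"timer"` event label) is an assumption made for illustration, not something the slides or any P4 target define. It shows packet events updating a Count-Min Sketch and a timer event clearing it.

```python
import zlib


class CountMinSketch:
    """Toy Count-Min Sketch: `depth` rows of `width` counters each."""

    def __init__(self, depth=3, width=64):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Assumed hash: CRC32 over the key prefixed with the row number.
        return zlib.crc32(f"{row}:{key}".encode()) % self.width

    def update(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over all rows is the least-overcounted estimate.
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))

    def reset(self):
        self.rows = [[0] * self.width for _ in range(self.depth)]


def process(events, sketch):
    """Dispatch a stream of ("packet", flow_id) and ("timer", None) events."""
    for kind, flow_id in events:
        if kind == "packet":
            sketch.update(flow_id)
        elif kind == "timer":
            sketch.reset()  # periodic reset, driven by a timer event
```

With only packet arrival and departure events available, the reset would have to piggyback on some packet or be delegated elsewhere; in this model it is just another handler.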

#### Updating state multiple times / using state in a different stage

- Using congestion signals in ingress pipeline (AQM, NDP [2])

| Common Congestion Signals                                                      | Other Congestion Signals                                                                                                                                                                       |
|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul><li>Queue size</li><li>Queue service rate</li><li>Queueing delay</li></ul> | <ul><li>Packet loss volume</li><li>Rate of change of queue size</li><li>Timestamp of buffer overflow/underflow events</li><li>Per-active-flow buffer occupancy</li><li>Etc.</li></ul> |

#### Solution:

Generalize: packet arrival/departure events $\rightarrow$ data-plane events


[1] Katta, Naga, et al. "Hula: Scalable load balancing using programmable data planes." SOSR, 2016.

[2] Handley, Mark, et al. "Re-architecting datacenter networks and stacks for low latency and high performance." SIGCOMM, 2017.

#### **Data-Plane Events**

Event categories: Packet & Metadata Events; Metadata Events

| Event Type              | Description                                           |
|-------------------------|-------------------------------------------------------|
| Ingress Packet          | Packet arrival                                        |
| Egress Packet           | Packet departure                                      |
| Recirculated Packet     | Packet sent back to ingress                           |
| Buffer Enqueue          | Packet enqueued in buffer                             |
| Buffer Dequeue          | Packet dequeued from buffer                           |
| Buffer Overflow         | Packet dropped at buffer                              |
| Buffer Underflow        | Buffer becomes empty                                  |
| Timer Event             | Configurable timer expires                            |
| Control-plane triggered | Control-plane triggers processing logic in data-plane |
| Link Status Change      | Link goes down / comes up                             |
| Packet Transmission     | Packet finished transmission                          |
| State Condition Met     | User defined condition                                |
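To make the table concrete, here is a minimal Python model of an event-driven switch in which each event type can have its own handler, all sharing one state dictionary. The `EventSwitch` class, the `on`/`fire` methods, and the link-failover example are hypothetical and chosen only for illustration; they are not part of any P4 architecture.

```python
from enum import Enum, auto


class Event(Enum):
    """Event types from the table above."""
    INGRESS_PACKET = auto()
    EGRESS_PACKET = auto()
    RECIRCULATED_PACKET = auto()
    BUFFER_ENQUEUE = auto()
    BUFFER_DEQUEUE = auto()
    BUFFER_OVERFLOW = auto()
    BUFFER_UNDERFLOW = auto()
    TIMER = auto()
    CONTROL_PLANE_TRIGGERED = auto()
    LINK_STATUS_CHANGE = auto()
    PACKET_TRANSMISSION = auto()
    STATE_CONDITION_MET = auto()


class EventSwitch:
    """Hypothetical dispatcher: one handler per event type, shared state."""

    def __init__(self):
        self.handlers = {}
        self.state = {}

    def on(self, event):
        def register(handler):
            self.handlers[event] = handler
            return handler
        return register

    def fire(self, event, meta=None):
        meta = {} if meta is None else meta
        handler = self.handlers.get(event)
        if handler is not None:
            handler(self.state, meta)
        return meta


switch = EventSwitch()
switch.state["link_up"] = {1: True, 2: True}


@switch.on(Event.LINK_STATUS_CHANGE)
def on_link_change(state, meta):
    # Metadata-only event: record the new link status.
    state["link_up"][meta["port"]] = meta["up"]


@switch.on(Event.INGRESS_PACKET)
def on_ingress(state, meta):
    # Packet event: forward on the lowest-numbered port that is up.
    up_ports = [port for port, up in sorted(state["link_up"].items()) if up]
    meta["egress_port"] = up_ports[0] if up_ports else None
```

Firing `Event.LINK_STATUS_CHANGE` for port 1 changes what the next `Event.INGRESS_PACKET` handler decides, without any packet having to carry that information.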

### **Event-Driven Programming Model**



Does not sacrifice line-rate packet processing

#### **Event-Driven Programming Model**

#### E.g., compute total buffer occupancy:

```
// arch.p4
extern shared_register<T> {
    shared_register();
    void read(out T result);
    void write(in T value);
}
```


```
// my_prog.p4
shared_register<bit<32>>() bufSize_reg;

// Ingress Packet Event Logic
// (control name and parameter type are assumptions)
control Ingress(inout ingress_data_t meta) {
  bit<32> bufSize;
  apply {
    bufSize_reg.read(bufSize);
    // use bufSize to make forwarding decisions
  }
}

// Enqueue Event Logic
control Enqueue(inout enq_data_t meta) {
  bit<32> bufSize;
  apply {
    bufSize_reg.read(bufSize);
    bufSize = bufSize + meta.pkt_len;  // packet entered the buffer
    bufSize_reg.write(bufSize);
  }
}

// Dequeue Event Logic
control Dequeue(inout deq_data_t meta) {
  bit<32> bufSize;
  apply {
    bufSize_reg.read(bufSize);
    bufSize = bufSize - meta.pkt_len;  // packet left the buffer
    bufSize_reg.write(bufSize);
  }
}
```
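The behavior of the handlers above can be checked with a small Python model. This is only a sketch: `SharedRegister` mimics the `read`/`write` interface of the `shared_register` extern, the 32-bit mask mirrors `bit<32>` arithmetic, and the `ingress` function with its `drop_threshold` parameter is a hypothetical stand-in for "use bufSize to make forwarding decisions".

```python
MASK32 = 0xFFFFFFFF  # mirror bit<32> wraparound


class SharedRegister:
    """Single value shared by all event handlers."""

    def __init__(self, value=0):
        self._value = value & MASK32

    def read(self):
        return self._value

    def write(self, value):
        self._value = value & MASK32


buf_size_reg = SharedRegister()


def enqueue(meta):
    # Enqueue event: the packet's bytes now occupy the buffer.
    buf_size_reg.write(buf_size_reg.read() + meta["pkt_len"])


def dequeue(meta):
    # Dequeue event: the packet's bytes have left the buffer.
    buf_size_reg.write(buf_size_reg.read() - meta["pkt_len"])


def ingress(meta, drop_threshold=3000):
    # Hypothetical decision: drop if the packet would exceed the threshold.
    meta["drop"] = buf_size_reg.read() + meta["pkt_len"] > drop_threshold
    return meta
```

The ingress handler never writes the register; it only observes the occupancy that the enqueue and dequeue events maintain.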

#### **Lower Line Rate Event Processing**

- Multi-ported memory is more practical
- One port per event type that accesses state array



## **Higher Line Rate Event Processing**

- Multi-ported memory is impractical






#### **NetFPGA SUME Event Switch Demo**

- Simple Fair-RED (FRED) AQM implementation
- Isolate TCP flow from non-adaptive UDP flow
- Computes per-active-flow queue occupancy
  - Enqueue & Dequeue Events
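As a rough illustration of how enqueue and dequeue events can maintain per-active-flow occupancy, consider the Python sketch below. It is a deliberately simplified fair-share check, not the full FRED algorithm and not necessarily the demo's logic; the class name, method names, and the rule "drop when a flow would exceed an equal share of the buffer" are all assumptions.

```python
class PerFlowOccupancy:
    """Tracks bytes queued per active flow, driven by buffer events."""

    def __init__(self, buffer_bytes):
        self.buffer_bytes = buffer_bytes
        self.flow_bytes = {}  # only flows with queued bytes appear here

    def on_enqueue(self, flow, pkt_len):
        self.flow_bytes[flow] = self.flow_bytes.get(flow, 0) + pkt_len

    def on_dequeue(self, flow, pkt_len):
        remaining = self.flow_bytes.get(flow, 0) - pkt_len
        if remaining > 0:
            self.flow_bytes[flow] = remaining
        else:
            self.flow_bytes.pop(flow, None)  # flow is no longer active

    def should_drop(self, flow, pkt_len):
        # Count the arriving flow as active even if nothing is queued yet.
        active = len(self.flow_bytes) + (0 if flow in self.flow_bytes else 1)
        fair_share = self.buffer_bytes / active
        return self.flow_bytes.get(flow, 0) + pkt_len > fair_share
```

In this model a non-adaptive flow that already holds more than its share sees its packets dropped, while a flow below its share is still admitted.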

#### Queue occupancy tracing:







- Network algorithms are event-driven, and so should our data-plane architectures be
- Potential to offload much more functionality to our data planes

# **Questions?**

### **Line Rate Event Processing**



#### Idle clock cycles:

1. Workload contains large packets
2. Pipeline runs faster than line rate
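Both sources of idle cycles can be quantified with a back-of-the-envelope calculation. The Python helper below is a sketch under two stated assumptions: the pipeline accepts at most one packet per clock cycle, and each packet costs 20 extra bytes on the wire (Ethernet preamble plus inter-frame gap). The function and parameter names are made up for this example.

```python
def idle_cycle_fraction(line_rate_bps, clock_hz, pkt_bytes, wire_overhead_bytes=20):
    """Fraction of pipeline clock cycles with no packet to process.

    Assumes one packet per clock cycle at most, and wire_overhead_bytes of
    preamble and inter-frame gap per packet (20 bytes on Ethernet).
    """
    packets_per_second = line_rate_bps / ((pkt_bytes + wire_overhead_bytes) * 8)
    return max(0.0, 1.0 - packets_per_second / clock_hz)
```

Under these assumptions, a pipeline clocked exactly at the minimum-size packet rate has no idle cycles on a 64-byte workload, while larger packets or a faster clock both raise the idle fraction.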

#### Bounded staleness of the main register
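The slide does not spell out the mechanism. One plausible reading, assumed here, is that each event type accumulates its updates in its own pending value, and those values are folded into the main register during idle clock cycles. The Python model below illustrates that reading only; `AggregatedRegister` and its methods are hypothetical names.

```python
class AggregatedRegister:
    """Main register plus one pending delta per event type (assumed design)."""

    def __init__(self, event_types=("enqueue", "dequeue")):
        self.main = 0
        self.pending = {name: 0 for name in event_types}

    def update(self, event_type, delta):
        # Event handlers touch only their own pending value.
        self.pending[event_type] += delta

    def read(self):
        # Readers see the main register, which may lag behind.
        return self.main

    def idle_cycle(self):
        # Fold at most one pending delta into the main register per idle cycle.
        for name, delta in self.pending.items():
            if delta != 0:
                self.main += delta
                self.pending[name] = 0
                return True
        return False

    def staleness(self):
        # Net amount by which the main register currently lags.
        return sum(self.pending.values())
```

In this model the main register is stale only by the updates accumulated since the last idle cycles, so its staleness stays bounded as long as idle cycles keep occurring.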

### **SUME Event Switch on NetFPGA**

