# Extending the range of P4 programmability

Gordon Brebner, Xilinx Labs, San Jose, USA

P4EU Keynote, Cambridge, UK, 24 September 2018



### What this talk is about

- > P4 history and status
- > Portable NIC Architecture (PNA)
- > Programmable Target Architecture (PTA)
- > Programmable Traffic Manager (PTM)
- > Towards open reference platforms





### P4 history and status



#### Programming Protocol-independent Packet Processors



#### > Language first appeared in a paper published in July 2014

- >> Original version and early evolution now known as P4<sub>14</sub>
- >> Revised version known as P4<sub>16</sub> released in May 2017

#### > Three goals:

- >> Reconfigurability: in-the-field reprogramming of networking equipment
- >> Protocol independence: not tied to any specific networking protocols
- >> Target independence: not tied to any specific networking hardware

#### > P4 Language Consortium (P4.org) set up in 2015

- >> Xilinx was a founding member of P4.org
- >> Now has more than 100 members



### P4 language features ... in one slide






### **Original perspective (P4<sub>14</sub>)**



- Language Design WG



### **Diverse targets** (P4<sub>16</sub>)



- Language Design WG
- Architecture WG



### **Complex control planes**



- Language Design WG
- Architecture WG
- API WG



### **Rich applications**



- Language Design WG
- Architecture WG
- API WG
- Applications WG



### **Education**



- Language Design WG
- Architecture WG
- API WG
- Applications WG
- Education WG



### P4 ecosystem






### Xilinx (P4) SDNet product (www.xilinx.com/sdnet)



**SDNet-supported research community today:** 60 institutions in 22 countries



### **Status of P4**

#### > Industry Momentum

- >> Diverse collection of P4-enabled targets
- >> Growing number of P4-based products
- >> Real-world deployments

#### > Academic Interest

- >> Research papers at top conferences
- >> New courses at leading universities

#### > Open Source Community

- >> Vibrant technical working groups
- >> Powerful set of P4 tools
- >> P4.org joined the Linux Foundation this year (2018)

"Our whole networking industry stands to benefit from a language like P4 that unambiguously specifies forwarding behavior, with dividends paid in software developer productivity, hardware interoperability, and furthering of open systems and customer choice."

— Tom Edsall, Cisco





### **Portable NIC Architecture**

P4 community desire

New P4.org Architecture sub-group





### Switch vs. NIC: Superficially similar ...

#### > Switch-style architecture



#### > NIC-style architecture




### Xilinx Labs Smart NIC prototype (evolved 2015-2018)



Xilinx NICs and Barefoot switch: In-band Network Telemetry (INT) interoperability, demonstrated at MWC 2018 and OFC 2018




### Use Case 1/3: Basic NIC ingress and egress

#### > Example:

- >> 40Gb/s IP packet forwarding
- >> 1 CPU core needed instead of 6 CPU cores
- >> Full line rate with 64-byte packets
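To put the 64-byte line-rate figure in perspective, the following sketch works out the packet rate that 40 Gb/s implies for minimum-size Ethernet frames. It assumes standard Ethernet framing overhead (an 8-byte preamble and start delimiter plus a 12-byte inter-frame gap per frame), which the slide does not state. The function and constant names are illustrative only.

```python
# Worst-case packet rate for minimum-size Ethernet frames.
# Assumption (not from the slide): each frame carries 20 bytes of
# on-the-wire overhead (8-byte preamble/SFD + 12-byte inter-frame gap).

PREAMBLE_AND_SFD_BYTES = 8
INTER_FRAME_GAP_BYTES = 12


def packets_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Return the maximum frame rate for a given line rate and frame size."""
    wire_bytes = frame_bytes + PREAMBLE_AND_SFD_BYTES + INTER_FRAME_GAP_BYTES
    return line_rate_bps / (wire_bytes * 8)


rate = packets_per_second(40e9, 64)
budget_ns = 1e9 / rate  # time available to process each packet
print(f"{rate / 1e6:.2f} Mpps, {budget_ns:.1f} ns per packet")
# prints "59.52 Mpps, 16.8 ns per packet"
```

Under these assumptions, sustaining line rate means handling roughly 60 million packets per second, which suggests why offloading forwarding from CPU cores matters at this speed.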



### Use Case 2/3: Direct egress to ingress bridging

#### > Example:

- >> NFV Service Function Chaining (SFC)
  - Offload of NSH protocol used for SFC
- >> 5x reduction in VM-to-VM latency
- >> Throughput matches the PCIe bandwidth



### **Use Case 3/3: Bump-in-wire acceleration**

#### > CPU out of main processing loop

- >> Just used for configuration and exceptions

#### > Example:

- >> Video transcoding appliance
- >> Accelerate video coding
- >> 25x better frames/second per Watt



### Some Portable NIC Architecture (PNA) discussions

#### > Expect there to be separate ingress and egress pipelines

- >> What are the standard components of each pipeline? Are there pipeline variants?
- >> Which components are P4-programmable?
- >> Is direct interaction allowed in both directions: ingress to egress, and egress to ingress?

#### > How is the host CPU interface modelled?

- >> Differentiate data plane CPU roles from control plane CPU roles
- >> Impact on P4Runtime

#### > Beyond packet forwarding (future steps – of general P4 interest)

- >> Is protocol (e.g., TCP) termination covered?
- >> Is a 'Type 3' NIC covered, i.e., payload processing as well?



### **Programmable Target Architecture**

Stanford, Xilinx Labs

Now in discussion with Barefoot, Cornell, VMware Research



### Examples of the many possible target architectures



**Portable Switch Architecture (PSA)** 



**Custom in-line processing** 



### **Programmable Target Architecture (PTA)**

#### > Motivations

- >> Extend P4 ("P4+") to allow description of target architectures: components and connectivity
- >> End-to-end P4 program verification relative to particular architectures
- >> Explore performance tradeoffs of various architectures

#### > Three actors



#### (1) Target architecture designer

Implements:

- Externs in target architecture
  - In-line (packet processing)
  - Look-aside (header processing)
- P4Runtime+ API for externs

Provides:

- P4+ architecture description

#### (2) P4 programmer

Implements:

- P4-programmable "holes" in the target architecture

#### (3) Runtime programmer

Implements:

- Runtime controller for P4-populated target architecture

### **Example: Custom target architecture**

#### Logical P4 pipeline view:



#### Internal design view:



### **Custom architecture description using experimental P4+**

P4+ code: Written by target architecture designer

```
#define NUM_PORTS 2
struct std_meta_t {...}

// Define (header processing) externs ...

// Define Architectural Elements
parser Parser<H>(packet_in p_in,
                 out H *headers, // * distinguishes between headers and metadata
                 inout std_meta_t std_meta,
                 packet_out .p_out); // . indicates that port is hidden (i.e. invisible at this pipeline stage)

control Pipe<H>(inout H *headers,
                inout std_meta_t std_meta,
                packet_inout .p);

control Deparser<H>(packet_out p_out,
                    in H *headers,
                    inout std_meta_t std_meta,
                    packet_in .p_in);

extern TM(packet_in p_in,
          in std_meta_t std_meta_in,
          (
           packet_out p_out,
           out std_meta_t std_meta
          )*NUM_PORTS); // * operator indicates replicated ports

package Example<H1, H2> (Parser<H1> p1,
                         Pipe<H1> map1,
                         Deparser<H1> d1,
                         TM tm,
                         Parser<H2> p2,
                         Pipe<H2> map2,
                         Deparser<H2> d2)
    // * operator indicates forked replication
    arch = {p1, map1, d1, tm, (p2, map2, d2) *NUM_PORTS}
```
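One possible reading of the `arch` expression can be sketched as a connectivity graph. The sketch assumes that forked replication means each of the traffic manager's `NUM_PORTS` outputs feeds its own copy of the `(p2, map2, d2)` chain. That interpretation, the `expand_arch` helper, and the `name[port]` and `tm.outN` naming are illustrative assumptions, not defined P4+ semantics.

```python
NUM_PORTS = 2  # mirrors the #define in the P4+ description


def expand_arch(chain, replicated, copies):
    """Return directed edges for a linear chain whose last element
    fans out to `copies` instances of the `replicated` sub-chain."""
    # Edges inside the shared front chain, e.g. p1 -> map1 -> d1 -> tm.
    edges = list(zip(chain, chain[1:]))
    for port in range(copies):
        # One private copy of the sub-chain per replicated output port.
        instances = [f"{name}[{port}]" for name in replicated]
        edges.append((f"{chain[-1]}.out{port}", instances[0]))
        edges.extend(zip(instances, instances[1:]))
    return edges


edges = expand_arch(["p1", "map1", "d1", "tm"], ["p2", "map2", "d2"], NUM_PORTS)
for src, dst in edges:
    print(f"{src} -> {dst}")
```

With `NUM_PORTS` set to 2 this yields nine edges: three in the shared ingress chain and three per egress copy.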

### **Custom architecture interface (auto-generated)**

struct std\_meta\_t {...}

// Define (header processing) externs ...

 Standard P4 code: Imported by P4 programmer

Prototype P4+ workflow being demonstrated at P4EU today



### Programmable Traffic Manager

MIT, NYU, Stanford, Xilinx Labs

New P4.org Architecture sub-group



### What is Traffic Management?

- > Policing: compliance with agreed rate
- > Drop policy: how to avoid/deal with congestion
- > Replication: cloning and multicasting packets
- > Packet buffering: temporary storage of packets
- > Packet scheduling: determining order of transmission
- > Traffic shaping: forcing rate and pace

> Associated with *Classification* – mapping packet flows to egress ports and queues
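As a concrete instance of policing, the sketch below models a single-rate token bucket that checks each packet against an agreed rate. It is a minimal behavioural model under stated assumptions, not a description of any particular device: the class name, the byte-based token accounting, and the example rate and burst values are all illustrative.

```python
class TokenBucketPolicer:
    """Single-rate token bucket: tokens are bytes, refilled at rate_bps / 8."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last_time = 0.0

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Return True if the packet is within the agreed rate, else False."""
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # a policer would drop or mark this packet


# Illustrative numbers: 8 kb/s (1000 bytes/s) with a 1500-byte burst.
policer = TokenBucketPolicer(rate_bps=8_000, burst_bytes=1_500)
decisions = [policer.conforms(t, 1_000) for t in (0.0, 0.1, 1.0, 1.2)]
print(decisions)  # prints [True, False, True, False]
```

A shaper differs from this policer in that it would delay a non-conforming packet until enough tokens accumulate, rather than rejecting it.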



### Why should we care about Traffic Management?

#### > Lots of different types of traffic with different characteristics and requirements

- >> Characteristics: burstiness, packet sizes, flow sizes, flow rates
- >> Requirements: throughput, latency, loss, jitter, reordering, flow completion time, pacing

#### > Network operators have a wide range of objectives

- >> Meet all Service Level Agreements
- >> Maximize network utilization
- >> Achieve fairness, while prioritizing certain traffic

#### > Network devices are acquiring more TM functionality

- >> About 50% of a modern programmable switch chip is dedicated to traffic management and buffering – but this part is currently not programmable

#### > Particular programmability benefits, alongside general P4 benefits

- >> Network operators can fine-tune for performance
- >> Small menu of standard algorithms to choose from today
- >> ... Many possible algorithms that can be expressed

### **Programmable Traffic Manager (PTM) architecture**




### The Push-In-First-Out (PIFO) model [SIGCOMM 2016]



#### > Why is the PIFO a good model for scheduling and shaping?

- >> Ordering decision made at time of enqueue → helps relax timing pressure at output ports
- >> Clear separation between programmable part and fixed part
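The two points above can be illustrated with a small software model of a PIFO. A packet is pushed into the position given by its rank, and dequeue always takes the head. The class below is a behavioural sketch only: the sorted-list implementation, the FIFO tie-break for equal ranks, and the strict-priority example are assumptions chosen for clarity, not the hardware design.

```python
import bisect
import itertools


class PIFO:
    """Push-In-First-Out queue: insert anywhere by rank, dequeue only at the head."""

    def __init__(self) -> None:
        self._entries = []                 # kept sorted by (rank, arrival order)
        self._arrival = itertools.count()  # tie-breaker: equal ranks stay FIFO

    def push(self, rank: int, packet: str) -> None:
        # The ordering decision is made here, at enqueue time.
        bisect.insort(self._entries, (rank, next(self._arrival), packet))

    def pop(self) -> str:
        # Dequeue is trivial: always take the head (smallest rank).
        _rank, _seq, packet = self._entries.pop(0)
        return packet

    def __len__(self) -> int:
        return len(self._entries)


# Strict priority as a rank function: rank = priority class (0 is highest).
pifo = PIFO()
for priority, name in [(2, "bulk-1"), (0, "voice-1"), (1, "web-1"), (0, "voice-2")]:
    pifo.push(priority, name)

order = [pifo.pop() for _ in range(len(pifo))]
print(order)  # prints ['voice-1', 'voice-2', 'web-1', 'bulk-1']
```

The rank computation (here, simply the priority class) is the programmable part, while the push and pop machinery stays fixed regardless of the scheduling algorithm.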

#### > Can implement existing algorithms, for example:

- >> Start Time Fair Queueing (STFQ), Least Slack-Time First (LSTF), Stop-and-Go Queueing, minimum rate guarantees, fine-grained priority scheduling, Service-Curve Earliest Deadline First (SC-EDF), Rate-Controlled Service Disciplines (RCSD)
- >> Token bucket rate limiting

#### > Can implement new algorithms using programmable rank computation
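As an example of a rank computation, the sketch below models Start Time Fair Queueing, one of the existing algorithms listed above. Each packet's rank is its virtual start time, and the queue always dequeues the smallest rank. This is a behavioural sketch under assumptions: a binary heap stands in for the hardware PIFO, virtual time is advanced to the start time of each dequeued packet, and the class name, flow names, and weights are illustrative.

```python
import heapq
import itertools


class StfqScheduler:
    """Start Time Fair Queueing expressed as rank computation plus a PIFO."""

    def __init__(self, weights: dict) -> None:
        self.weights = weights            # flow -> relative share
        self.virtual_time = 0.0
        self.last_finish = {}             # flow -> virtual finish of last packet
        self._heap = []                   # stand-in for the PIFO
        self._seq = itertools.count()     # FIFO tie-break for equal ranks

    def enqueue(self, flow: str, length: int) -> None:
        # Programmable part: compute the rank (virtual start time).
        start = max(self.virtual_time, self.last_finish.get(flow, 0.0))
        self.last_finish[flow] = start + length / self.weights[flow]
        # Fixed part: push into the queue ordered by rank.
        heapq.heappush(self._heap, (start, next(self._seq), flow, length))

    def dequeue(self):
        start, _seq, flow, length = heapq.heappop(self._heap)
        self.virtual_time = start         # advance virtual time on dequeue
        return flow, length


# Two backlogged flows with 100-byte packets; flow A has twice B's weight.
sched = StfqScheduler({"A": 2.0, "B": 1.0})
for flow in "AAAABBBB":
    sched.enqueue(flow, 100)

order = "".join(sched.dequeue()[0] for _ in range(8))
print(order)  # prints ABAABABB
```

While both flows are backlogged, flow A is served about twice as often as flow B, matching the 2:1 weights. Changing only the body of `enqueue` would yield a different scheduling algorithm on the same queue.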



Programmable scheduling and shaping

#### **Prototype implemented on FPGA for 4x10G line rate**

NYU + Stanford + Xilinx Labs demonstration at P4 Workshop, June 2018










### **Example: Possible P4 extension for scheduler/shaper**



### **Towards open reference platforms**



### Software platform: P4 toolchain for BMv2 simulation



### Hardware platform: NetFPGA (= Networked FPGA)

- > Line-rate, flexible, open networking hardware for teaching and research
- > Begun in 2007 by Stanford and Xilinx Labs, now anchored at Cambridge
- > NetFPGA systems deployed at over 150 institutions in over 40 countries

Four elements:

- > Community: NetFPGA.org
- > Low-cost board family
- > Tools and reference designs
- > Contributed projects



NetFPGA-1G-CML: 4x1G ports

NetFPGA-SUME: 4x10G ports


### Hardware platform: P4→NetFPGA workflow

https://github.com/NetFPGA/P4-NetFPGA-public/wiki



### **Possible future P4 open reference platform collection**

#### Two architecture types



Switch style (PSA)

with

#### **Two implementation types**





### Conclusion





### **Research directions**

#### > Language: Extend coverage of P4

- >> Programmable Traffic Management (MIT + NYU + Stanford + Xilinx Labs + P4.org)
- >> Programmable Target Architectures (Cornell + Stanford + VMware Research + Xilinx Labs)

#### > Infrastructure: Open source hardware reference platform for P4

- >> Complement existing software reference platform
- >> Cover NIC-style architectures as well as switch-style architectures

#### > Applications

- >> Congestion control; In-band network telemetry
- >> In-network computing
- >> Programmable networking novelty
- >> ... your ideas here



### **Call to action**

#### > Become a member of P4.org

- >> No fee, and simple membership agreement
- >> Code and data under Apache 2.0 license

#### > Participate in working groups, and their *ad hoc* sub-groups (e.g., PNA, PTM)

- >> Activities are open to all members
- >> Anyone with a good idea can help shape the future of P4

#### > Contribute to evolving open source provision

- >> Compiler (p4c): common front end and mid ends, and target-specific back ends
- >> Software reference switch (bmv2) and future open platforms
- >> Control plane API (P4Runtime)
- >> Tutorials
- >> Documentation
- >> Standard applications
- >> New applications

The End



