Having written FreeBSD's kernel TLS, I can assure you that there is no copy. Data is brought into the kernel via DMA from storage into a page in the VM page cache. When the I/O is done, it is encrypted into a connection-private page. That page is then sent and DMA'ed to the network adapter. So in the kernel TLS case we have:
- DMA from storage into kernel memory
- memory read from kernel memory to fetch the plaintext for crypto
- memory write to another chunk of kernel memory to store the encrypted data
- DMA from kernel memory to the NIC
In the case where the NIC supports inline TLS offload, the middle 2 steps are skipped, and it devolves to essentially the unencrypted case.
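One way to make the accounting concrete is to treat every step as one pass of each payload byte over the memory bus. The sketch below is a hypothetical model under that assumption, not FreeBSD code. The names `KTLS_SOFTWARE`, `KTLS_NIC_OFFLOAD` and `passes_per_byte` are invented for illustration. It shows software kernel TLS costing 4 passes and inline NIC offload dropping to 2, the same as plain unencrypted sendfile.

```python
# Hypothetical model: each step in the data path is one pass of the
# payload over the memory bus (a DMA, a CPU read, or a CPU write).
KTLS_SOFTWARE = [
    ("DMA: storage -> kernel page cache", "dma_write"),
    ("CPU read: plaintext for crypto", "cpu_read"),
    ("CPU write: ciphertext to connection-private page", "cpu_write"),
    ("DMA: kernel page -> NIC", "dma_read"),
]

# With inline TLS offload the NIC encrypts, so the two CPU passes disappear.
KTLS_NIC_OFFLOAD = [step for step in KTLS_SOFTWARE if not step[1].startswith("cpu")]


def passes_per_byte(path):
    """Number of times each payload byte crosses the memory bus in this model."""
    return len(path)


print(passes_per_byte(KTLS_SOFTWARE))     # 4
print(passes_per_byte(KTLS_NIC_OFFLOAD))  # 2
```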
For QUIC, where the encryption happens in userspace, you have:
- DMA from storage into kernel memory
- memory read from kernel memory via mmap
- memory write to userspace memory to store the encrypted data
- memory read from userspace memory to copy it into the kernel
- memory write to kernel memory
- DMA from kernel memory to the NIC
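So you go from 3 "copies" to 4 "copies". A CPU copy is a read plus a write, while a DMA touches memory only once, so that is 6 passes over the memory bus instead of 4. This increases memory bandwidth demands by 50%.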
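The difference between counting "copies" and counting memory passes can be checked with a small model. This is a hypothetical sketch that assumes each DMA touches memory once and each CPU copy is one read followed by one write. The names `KTLS_PATH`, `QUIC_PATH` and `copies` are illustrative only.

```python
# Hypothetical model of both data paths as sequences of memory-bus passes.
KTLS_PATH = ["dma_write", "cpu_read", "cpu_write", "dma_read"]
QUIC_PATH = [
    "dma_write",  # storage -> kernel page cache
    "cpu_read",   # userspace reads the page cache via mmap
    "cpu_write",  # userspace writes ciphertext to its own buffer
    "cpu_read",   # kernel reads the userspace buffer (send-side copy)
    "cpu_write",  # kernel writes the data into kernel memory
    "dma_read",   # kernel memory -> NIC
]


def copies(path):
    """A DMA counts as one copy; a CPU read+write pair counts as one copy."""
    dmas = sum(1 for p in path if p.startswith("dma"))
    cpu_writes = sum(1 for p in path if p == "cpu_write")
    return dmas + cpu_writes


for name, path in [("kernel TLS", KTLS_PATH), ("QUIC", QUIC_PATH)]:
    print(f"{name}: {copies(path)} copies, {len(path)} memory passes")
# kernel TLS: 3 copies, 4 memory passes
# QUIC: 4 copies, 6 memory passes

print(len(QUIC_PATH) / len(KTLS_PATH))  # 1.5
```

Copies go up by a third, but memory traffic goes up by half, because the extra copy adds two passes while a DMA adds only one.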
Right now, we can just barely serve 100 Gb/s from a Xeon-D because Intel limits its memory to DDR4-2400. At an effective bandwidth limit of 60 GB/s, that is on the edge of what the kernel TLS data path needs (100 Gb/s is 12.5 GB/s, and 4 memory passes makes that 50 GB/s). So even if everything else about QUIC were free, this extra memory copy from userspace would cut achievable bandwidth by a third.
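The ceiling arithmetic can be written out as a sketch. It assumes the 60 GB/s effective figure quoted above and a 4-pass kernel TLS path versus a 6-pass userspace QUIC path. It also assumes that nothing else uses the memory bus, which is optimistic. The function names are hypothetical.

```python
def required_bandwidth_gbytes_per_s(line_rate_gbps, passes_per_byte):
    """Memory bandwidth (GB/s) needed to push line_rate_gbps of payload."""
    return line_rate_gbps / 8 * passes_per_byte


def max_throughput_gbps(mem_bandwidth_gbytes_per_s, passes_per_byte):
    """Line rate (Gb/s) the memory bus can sustain if it carries only payload."""
    return mem_bandwidth_gbytes_per_s / passes_per_byte * 8


EFFECTIVE_BW = 60.0  # GB/s, the effective figure quoted for a DDR4-2400 Xeon-D

print(required_bandwidth_gbytes_per_s(100, 4))  # 50.0  (kernel TLS)
print(required_bandwidth_gbytes_per_s(100, 6))  # 75.0  (userspace QUIC)
print(max_throughput_gbps(EFFECTIVE_BW, 4))     # 120.0
print(max_throughput_gbps(EFFECTIVE_BW, 6))     # 80.0
```

Going from 120 Gb/s to 80 Gb/s is the one-third cut: the QUIC path needs more memory bandwidth than the box has just to reach 100 Gb/s.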
Except that FreeBSD doesn't have in-kernel TLS ;)