
USB doesn’t provide any DMA (until USB 4) and requires more host CPU resources to reach the same bandwidth. It also has less consistent performance by virtue of the USB protocol itself.





At least for Gigabit speeds, the CPU usage is negligible if the device and the driver communicate over the CDC-NCM protocol, but yeah, it's a significant hit if you're using CDC-ECM.

I am confused by this, I worked on a Linux USB driver that used DMA in 2003.

I meant DMA from the device directly to the host, rather than from the host USB controller to host memory.

When I worked on it, the USB controller was just a PCI bus device that, once set up, streamed the incoming data from a USB ADC in blocks directly to memory. Maybe they took all that back out.

They didn't remove anything. Did the USB controller's DMA master support DMA chaining or command lists?

An Ethernet controller being a DMA master means it can continually plop packets where it wants without CPU intervention. Infamously, the Realtek RTL8139 10/100M chip was the first Realtek with DMA mastering support, but it was a brain-dead implementation. From the FreeBSD driver source, https://people.freebsd.org/~wpaul/RealTek/3.0/if_rl.c:

>"The RealTek 8139 PCI NIC redefines the meaning of 'low end.' This is probably the worst PCI ethernet controller ever made, with the possible exception of the FEAST chip made by SMC. The 8139 supports bus-master DMA, but it has a terrible interface that nullifies any performance gains that bus-master DMA usually offers.

For transmission, the chip offers a series of four TX descriptor registers. Each transmit frame must be in a contiguous buffer, aligned on a longword (32-bit) boundary. This means we almost always have to do mbuf copies in order to transmit a frame, except in the unlikely case where a) the packet fits into a single mbuf, and b) the packet is 32-bit aligned within the mbuf's data area. The presence of only four descriptor registers means that we can never have more than four packets queued for transmission at any one time.

Reception is not much better. The driver has to allocate a single large buffer area (up to 64K in size) into which the chip will DMA received frames. Because we don't know where within this region received packets will begin or end, we have no choice but to copy data from the buffer area into mbufs in order to pass the packets up to the higher protocol levels.

It's impossible given this rotten design to really achieve decent performance at 100Mbps, unless you happen to have a 400Mhz PII or some equally overmuscled CPU to drive it."
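To make the constraints in that driver comment concrete, here is a rough C sketch. It is not taken from if_rl.c: `struct frag`, `struct tx_queue`, and every function name are hypothetical stand-ins, and treating the receive area as a ring that wraps around is my assumption, not something the quote states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_SLOTS 4 /* from the quote: four TX descriptor registers */

/* Hypothetical packet fragment, loosely modelled on a BSD mbuf chain. */
struct frag {
    const uint8_t *data;
    size_t len;
    const struct frag *next;
};

/* Zero-copy transmit is only possible when the frame sits in one
 * contiguous fragment that starts on a 32-bit boundary. */
static bool tx_needs_copy(const struct frag *f)
{
    if (f->next != NULL)
        return true;                       /* frame spans several buffers */
    return ((uintptr_t)f->data & 3u) != 0; /* not longword aligned */
}

/* Hypothetical bookkeeping for the four transmit slots. */
struct tx_queue {
    unsigned in_flight;
};

/* Returns false when all four slots are busy; the caller has to wait. */
static bool tx_try_enqueue(struct tx_queue *q)
{
    if (q->in_flight >= TX_SLOTS)
        return false;
    q->in_flight++;
    return true;
}

/* Receive side: the chip writes frames back to back into one big area,
 * so each frame is copied out before it goes up the stack.
 * Assumption: the area is a ring, so a frame may wrap past the end. */
static void rx_copy_out(const uint8_t *ring, size_t ring_size,
                        size_t offset, uint8_t *dst, size_t len)
{
    assert(offset < ring_size && len <= ring_size);
    size_t first = ring_size - offset; /* bytes before the wrap point */
    if (first > len)
        first = len;
    memcpy(dst, ring + offset, first);
    memcpy(dst + first, ring, len - first); /* wrapped remainder, may be 0 */
}
```

The point is that both paths end in a CPU copy: `tx_needs_copy` is true for nearly every real packet, and `rx_copy_out` runs for every received frame, which is why bus mastering buys so little here.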

AFAIK, 10 years later the 1Gbit RTL8111B required alignment on 256-byte boundaries, so not much better.



