Abstract: With the rollout of China's national "East Data, West Computing" strategy and the rapid growth of intelligent computing and supercomputing services, demand for wide-area transmission of massive data volumes keeps increasing. This paper proposes ultra remote direct memory access (URDMA), a loss-resistant, high-throughput transport scheme for wide-area networks. URDMA fully offloads the transmission control protocol/Internet protocol (TCP/IP) stack, removing the central processing unit (CPU) as a limit on network throughput. It also enhances the standard RoCEv2 protocol with congestion control, packet loss recovery, and packet loss retransmission so that high throughput is sustained over lossy wide-area networks. Test results show that at a round-trip time (RTT) of 20 ms and a packet loss rate of 0.1%, TCP throughput is only 0.02 Gbit/s and standard RoCEv2 throughput is close to 0, whereas URDMA reaches 88.26 Gbit/s. When the RTT increases to 80 ms, TCP and RoCEv2 throughput both fall to nearly 0, while URDMA still delivers 83.12 Gbit/s.
Keywords: wide-area loss-resistant high throughput; data express; remote direct memory access; RoCEv2
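As a rough sanity check on the TCP figure quoted in the abstract, the sketch below evaluates the well-known Mathis approximation for loss-limited TCP throughput, rate ≈ C · MSS / (RTT · √p). This model is not taken from the paper. The 1460-byte segment size, the constant C = √(3/2), and the function name `mathis_throughput_bps` are assumptions made here purely for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate steady-state, loss-limited TCP throughput in bit/s.

    Uses the Mathis approximation: rate ~= c * MSS / (RTT * sqrt(p)).
    The default constant c = sqrt(3/2) is an assumed modelling choice.
    """
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


# Loss rate of 0.1% as in the abstract; the MSS of 1460 bytes is assumed.
for rtt_ms in (20, 80):
    gbps = mathis_throughput_bps(1460, rtt_ms / 1000, 0.001) / 1e9
    print(f"RTT {rtt_ms} ms: ~{gbps:.3f} Gbit/s")
```

Under these assumptions the model gives roughly 0.02 Gbit/s at 20 ms, the same order as the reported TCP result, and a quarter of that at 80 ms, since the estimate scales inversely with RTT.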