The main problem is that it packetizes the data and waits for responses, effectively re-implementing the TCP window inside a TCP stream. You can only have so many packets outstanding in the standard SFTP implementation (64 is the default) and the buffers are quite small (32k by default) which gives a total outstanding data of 2MB. The highest transfer rate you can make depends on the latency of the link. If you have 100 ms of latency then you can send at most 20 MB/s which is about 200 Mbit/s - nowhere near filling a fast wide pipe.
You can tweak the buffer size (up to 256k I think) and the number of outstanding requests, but you hit limits in the popular servers quite quickly.
To mitigate this rclone lets you do multipart concurrent uploads and downloads to sftp which means you can have multiple streams operating at 200 Mbit/s which helps.
The fastest protocols are the TLS/HTTP based ones which stream data. They open up the TCP window properly and the kernel and networking stack is well optimized for this use. Webdav is a good example.
There are some minor usability bugs and I think both endpoints need to have it installed to take advantage. I remember asking ages ago why it wasn’t upstreamed, there were reasons…
[1]: https://mosh.org/
I've been using mosh now for over a decade and it is amazing. Add on rsync for file transfers and I've felt pretty set. If you haven't checked out mosh, you should definitely do so!
The only change I see here that is probably harmless and a speed boost is using AES-NI for AES-CTR. This should probably be an upstream patch. The rest is more iffy.