I've run some further tests.
Explorer's buffered copy seems to depend on the amount of free/available RAM. After filling memory with virtual machines, I had only 4-5GB free (out of 16GB). I then started a buffered copy of 9GB, with Performance Monitor running alongside.
Both the Read and Write curves were almost perfect trapezoids with horizontal plateaus at ~500MB/s, with the Write curve slightly offset (of course). This is what I'd expect under normal conditions: full speed for both read and write, running almost in parallel.
Unbuffered copy with XY again topped out at 330MB/s, with the same curves as before.
I stopped one VM, freeing another 4GB of RAM, and the buffered copy took a hit in the write curve: part of the writing was delayed until after the Read had finished. After I stopped another VM and freed more RAM, I saw the same curve as in the previous screenshot on the right again.
So there's no perfect solution. The ideal seems to be some buffer, but not a very big one like Windows uses when it sees large amounts of free RAM (when it also completely trashes the "other" cache in RAM). After so many decades, Windows still hasn't learned how to copy files efficiently!
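For anyone who wants to reproduce throughput numbers outside Explorer, here is a rough Python sketch of a copy loop with a fixed-size application buffer that reports MB/s. Note the assumptions: the function name `timed_copy` and the 8 MiB default chunk are my own picks, and this still goes through the Windows file cache (it does not open the files unbuffered), so it is only a measuring harness, not an equivalent of what XY or Explorer do internally.

```python
import os
import time


def timed_copy(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy src to dst in fixed-size chunks; return (bytes_copied, MB_per_s).

    Hypothetical helper for illustration only. The OS file cache is still
    involved; only the application-side buffer is bounded by chunk_size.
    """
    copied = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        # Ask the OS to push cached writes to the drive, so delayed
        # writes are included in the measured time.
        os.fsync(fout.fileno())
    elapsed = time.perf_counter() - start
    rate = copied / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return copied, rate
```

Calling `timed_copy(source_path, target_path)` with different `chunk_size` values gives a crude way to see whether the application buffer size matters on a given drive pair. The result will not necessarily match the Performance Monitor curves.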
Edit: Just tried robocopy with the /J option (unbuffered). Funny, it maxes out at only 260MB/s, so XY is doing better. I wonder why unbuffered copying can't reach the drive's real speed.
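To put those rates in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes the same 9GB file for all three tools, a constant rate equal to the observed peak (so it ignores the ramp-up and ramp-down of the trapezoid), and 1GB = 1024MB. The helper name `copy_seconds` is made up for this sketch.

```python
def copy_seconds(size_gb, rate_mb_s):
    """Seconds needed to move size_gb at a constant rate_mb_s (1 GB = 1024 MB)."""
    return size_gb * 1024 / rate_mb_s


# Peak rates taken from the tests described above.
for label, rate in [
    ("Explorer buffered", 500),
    ("XY unbuffered", 330),
    ("robocopy /J", 260),
]:
    print(f"{label}: {copy_seconds(9, rate):.1f} s")

# Explorer buffered: 18.4 s
# XY unbuffered: 27.9 s
# robocopy /J: 35.4 s
```

So under these assumptions the unbuffered copies would take roughly 1.5 to 2 times as long as the buffered one for the same file.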