
    Effects of CPU topology on SQL Server guests with AHV


    VM CPU Topology

    The topology (layout) in which AHV presents virtual sockets/CPUs to the guest operating system will usually differ from the physical topology. This is expected, because we typically present a subset of all cores to the guest VMs.

    Usually it is the total number of vCPUs given to the VM that matters, not the specific topology, but in the case of SQL Server running an analytical workload (a TPC-H-like workload from HammerDB) the topology passed to the VM does make a difference: between 10% and 20% when measured by total runtime.

    [I think that the reason we see a difference here is that (a) the analytical workloads use hardly any storage bandwidth (I sized the database to fit in memory) and (b) there is probably a lot of cross-talk between the cores/memory as the DB engine issues parallel queries.]

    At any rate, we see that presenting 20 cores as “20 sockets of 1 core” beats “1 socket with 20 cores” by a wide margin. The physical topology is two sockets with 20 cores each. Thankfully the better-performing option is the default.

    CPU topology may make a difference for SQL Server running analytical workloads.

    Available constructs in AHV (KVM/QEMU)

    Example: 16 CPUs allocated by AHV

    AHV uses two constructs when presenting CPUs and sockets to the guest: the number of vCPUs (virtual sockets) and the number of cores per vCPU.

    Using these two constructs, AHV could present the 16 CPUs above as:

    16 sockets x 1 core per socket
    8 sockets x 2 cores per socket
    4 sockets x 4 cores per socket
    2 sockets x 8 cores per socket
    1 socket x 16 cores per socket
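    A minimal Python sketch (an illustration only, not AHV code) that enumerates the valid socket/cores-per-socket layouts for a given vCPU count:

        def topologies(total_vcpus: int):
            """Enumerate (sockets, cores_per_socket) pairs whose
            product equals the requested vCPU count."""
            return [(s, total_vcpus // s)
                    for s in range(1, total_vcpus + 1)
                    if total_vcpus % s == 0]

        print(topologies(16))
        # -> [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]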

    How do these appear to the guest (Windows 2016)?

    16 sockets 1 core per socket

    Taking the first config as an example: 16 sockets with one CPU per socket.
    16 Sockets, one core per socket

    1 socket, 16 cores on single socket

    The opposite config is a single socket with 16 cores:
    1 Socket, 16 cores on single socket
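    To confirm the presented topology from inside a Windows guest, a small Python sketch (assuming Python and PowerShell are available in the guest) can query WMI, which reports one Win32_Processor instance per socket:

        import subprocess

        # WMI exposes one Win32_Processor instance per socket;
        # NumberOfCores is the core count of that socket.
        cmd = ["powershell", "-NoProfile", "-Command",
               "Get-CimInstance Win32_Processor | "
               "Select-Object NumberOfCores,NumberOfLogicalProcessors | "
               "Format-List"]
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)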

    Effect on guest performance

    The hypervisor sees each guest vCPU as a thread – and from that perspective, KVM/QEMU/Linux does not really care about how the cores are presented to the guest. For the time being we will leave NUMA aside and assume that the number of cores presented to the guest is <= the number of physical cores on a single physical socket.
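    To make the “each vCPU is just a thread” point concrete, here is a tiny Python sketch one could run on the AHV host (the PID is hypothetical); the task directory of the QEMU process lists the vCPU threads alongside QEMU’s own worker threads:

        import os

        # Hypothetical QEMU process ID for the guest on the AHV host.
        qemu_pid = 12345

        # Each guest vCPU is one thread of this process (the count also
        # includes QEMU's I/O and worker threads).
        threads = os.listdir(f"/proc/{qemu_pid}/task")
        print(f"qemu process {qemu_pid} has {len(threads)} threads")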

    However, it seems that the guest does care about how the CPUs are presented; put another way, the guest will make scheduling decisions based on how it thinks the cores are laid out. Here is an example with the Windows OS and SQL Server servicing a HammerDB TPC-H workload.

    HammerDB driver setup / HammerDB vUser config
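    It can also be useful to confirm what SQL Server itself thinks the CPU layout is. Here is a minimal sketch using pyodbc (the connection string is a placeholder; adjust driver and authentication for your environment) that reads the relevant DMVs:

        import pyodbc

        # Placeholder connection string.
        conn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=master;Trusted_Connection=yes;"
        )
        cur = conn.cursor()

        # Logical CPUs SQL Server sees, and its derived hyperthread ratio.
        cur.execute("SELECT cpu_count, hyperthread_ratio FROM sys.dm_os_sys_info;")
        print(cur.fetchone())

        # SQLOS creates one scheduler per visible online CPU.
        cur.execute("SELECT scheduler_id, cpu_id, status "
                    "FROM sys.dm_os_schedulers WHERE status = 'VISIBLE ONLINE';")
        for row in cur.fetchall():
            print(row)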

    In this experiment SQL Server is configured with 20 vCPUs (there are 20 real cores on each physical socket) and the DOP (Degree of Parallelism) is also set to 20, so we can expect some coordination between threads as they carve up each query across cores.
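    For reference, the instance-wide DOP cap can be set with sp_configure. A minimal pyodbc sketch, again with a placeholder connection string:

        import pyodbc

        # autocommit, because RECONFIGURE cannot run inside a user transaction.
        conn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=master;Trusted_Connection=yes;",
            autocommit=True,
        )
        cur = conn.cursor()

        # 'max degree of parallelism' is an advanced option; expose it first.
        cur.execute("EXEC sp_configure 'show advanced options', 1; RECONFIGURE;")

        # Match MAXDOP to the 20 vCPUs presented to the VM, as in this experiment.
        cur.execute("EXEC sp_configure 'max degree of parallelism', 20; RECONFIGURE;")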
