Do You Really Get Double Performance from Hyper-threads?

No. Not even close.

Hyper-Threading Technology (HTT), introduced by Intel in 2002, was designed to increase the performance of CPU cores. According to Intel, HTT uses processor resources more efficiently and enables multiple threads to run on each core, resulting in increased processor throughput and improved overall performance for threaded software. With HTT enabled, each physical CPU core is presented to the operating system as two “logical” cores. When you view CPU usage in the Windows Task Manager or the Linux htop utility, each core shown is a logical core.
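To see the gap between what monitoring tools show and what the hardware actually has, you can compare the number of logical cores with the number of physical cores. The following is a minimal Python sketch for Linux. It assumes the usual x86 /proc/cpuinfo layout, where each "processor" entry is one logical core and each distinct ("physical id", "core id") pair is one physical core; the helper name count_cores is our own. On other platforms, os.cpu_count() reports only the logical count.

```python
from pathlib import Path


def count_cores(cpuinfo_text: str) -> tuple[int, int]:
    """Return (logical_cores, physical_cores) parsed from /proc/cpuinfo text.

    Each "processor" entry is one logical core; physical cores are the
    distinct ("physical id", "core id") pairs.
    """
    logical = 0
    physical = set()
    current = {}
    # Entries are separated by blank lines; append one so the last entry is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if "processor" in current:
                logical += 1
                # Some VMs and architectures omit these fields; fall back to
                # treating each logical core as its own physical core.
                key = (current.get("physical id", "0"),
                       current.get("core id", current["processor"]))
                physical.add(key)
            current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip()] = value.strip()
    return logical, len(physical)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        logical, physical = count_cores(path.read_text())
        print(f"{logical} logical cores on {physical} physical cores")
```

On a quad-core CPU with HTT enabled, the first number should be twice the second.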

The problem with this visualization is that not all logical cores are equal. On a quad-core CPU without HTT, each of the four cores is a stand-alone physical core, and each core’s execution resources are its own, regardless of the tasks running on the other cores. On a dual-core CPU with HTT, however, there are only two physical cores, presented as four logical cores, with each pair sharing one physical core. The throughput of each logical core at any given time depends heavily on the operations executing on its paired logical core.
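Which logical cores form a pair is not obvious from their numbering: depending on the system, logical CPUs 0 and 1 may share a core, or 0 and 4 may. On Linux, each logical CPU’s topology is exposed under /sys/devices/system/cpu/cpuN/topology/, and logical CPUs reporting the same physical_package_id and core_id share a physical core. A sketch under that assumption (the function names are illustrative, not from any library):

```python
from collections import defaultdict
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_topology() -> dict[int, tuple[int, int]]:
    """Map each logical CPU to its (physical_package_id, core_id), Linux only."""
    topology = {}
    for cpu_dir in SYSFS_CPU.glob("cpu[0-9]*"):
        topo = cpu_dir / "topology"
        if not topo.is_dir():  # skip CPUs that expose no topology (e.g. some offline CPUs)
            continue
        cpu = int(cpu_dir.name[3:])  # "cpu12" -> 12
        package = int((topo / "physical_package_id").read_text())
        core = int((topo / "core_id").read_text())
        topology[cpu] = (package, core)
    return topology


def group_siblings(topology: dict[int, tuple[int, int]]) -> list[list[int]]:
    """Group logical CPUs that share a physical core (same package and core id)."""
    groups = defaultdict(list)
    for cpu in sorted(topology):
        groups[topology[cpu]].append(cpu)
    return sorted(groups.values())


if __name__ == "__main__":
    for siblings in group_siblings(read_topology()):
        print("physical core shared by logical CPUs", siblings)
```

Grouping by the (package, core) pair rather than core_id alone matters on multi-socket machines, where core ids repeat on each package.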

In software development, threads are “sequences of execution”. A single-threaded application has only one sequence of execution, while a multi-threaded application has several, which typically run in parallel. On Windows, OS X, and Linux systems, users almost always run many processes concurrently, and each of them may be single- or multi-threaded. It is the operating system’s job to schedule execution time slots for the (software) threads of every process onto the physical threads (logical cores) of the underlying hardware.

When multiple threads run on the same physical or logical core, each one is given a time slot in which to execute (much like a timeshare vacation rental). Switching between the threads running on a core is managed by the OS and is called context switching. Context switching introduces overhead, because the CPU spends cycles switching threads instead of executing them.
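Context switches can be counted. As a rough illustration, the sketch below uses Python’s resource module (Unix only) to read the voluntary (ru_nvcsw) and involuntary (ru_nivcsw) context-switch counters of the current process before and after a workload. The workloads sleepy and busy are hypothetical stand-ins: a thread that blocks is switched out voluntarily, while a CPU-bound thread is preempted when its time slice runs out. Absolute numbers depend heavily on the OS and on system load.

```python
import resource
import time
from typing import Callable


def context_switches() -> tuple[int, int]:
    """Return (voluntary, involuntary) context switches so far for this process (Unix only)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def count_switches(workload: Callable[[], None]) -> tuple[int, int]:
    """Run workload and return how many context switches happened meanwhile."""
    vol_before, invol_before = context_switches()
    workload()
    vol_after, invol_after = context_switches()
    return vol_after - vol_before, invol_after - invol_before


def sleepy() -> None:
    # Each short sleep blocks the thread, so the OS switches it out (voluntary switch).
    for _ in range(50):
        time.sleep(0.001)


def busy() -> None:
    # Pure computation: any switches here are mostly involuntary preemptions.
    total = 0
    for i in range(2_000_000):
        total += i


if __name__ == "__main__":
    print("sleepy (voluntary, involuntary):", count_switches(sleepy))
    print("busy   (voluntary, involuntary):", count_switches(busy))
```

For a system-wide view, the cs column of vmstat on Linux reports context switches per second.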

HTT allows each core to run two threads without OS context switches between them, because the core holds the state of both threads at once. When one thread’s execution stalls, e.g., while it waits for data from memory, the HT-enabled core keeps its execution units busy with instructions from the other thread. This is more efficient than either OS context switching, which introduces overhead, or stalling, which wastes CPU cycles entirely. A popular analogy describes HTT as two hands feeding a single mouth, so the mouth is never left empty while one of the hands is busy (i.e., blocked waiting on memory).

Note, however, that running two threads on a single core this way is very different from running them in parallel on two physical cores. At best, this sharing yields about a 30% performance improvement over a single core with OS-managed context switching, while two physical cores can ideally provide twice the performance of a single core.
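The difference can be made concrete with a back-of-the-envelope model. The sketch below is an idealization, not a measurement: it assumes CPU-bound threads, two hardware threads per core, and a fixed gain (the hypothetical parameter ht_gain) when two threads share one core, with 0.3 standing in for the best case mentioned above.

```python
def aggregate_throughput(busy_threads: int, physical_cores: int, ht_gain: float) -> float:
    """Idealized throughput, in units of one fully used physical core.

    Assumptions: CPU-bound threads, at most two hardware threads per core,
    and pairing two threads on one core raises that core's output by ht_gain
    (e.g. 0.3 for a ~30% best case). Threads beyond 2 * physical_cores add
    nothing, since the OS only time-slices them.
    """
    if busy_threads <= physical_cores:
        # Each thread gets a physical core to itself.
        return float(busy_threads)
    # Extra threads double up on cores, each pairing adding only ht_gain.
    paired = min(busy_threads - physical_cores, physical_cores)
    return physical_cores + paired * ht_gain


if __name__ == "__main__":
    dual_ht = aggregate_throughput(4, physical_cores=2, ht_gain=0.3)
    quad = aggregate_throughput(4, physical_cores=4, ht_gain=0.3)
    print(f"dual-core + HTT: {dual_ht:.2f} total, {dual_ht / 4:.2f} per thread")
    print(f"quad-core:       {quad:.2f} total, {quad / 4:.2f} per thread")
```

With ht_gain=0.3, four busy threads on a dual-core CPU with HTT get 2.6 cores’ worth of work in this model, about 0.65 of a core each, while a quad-core CPU without HTT delivers 4.0, a full core per thread. This is why a logical core should not be counted as a full core.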

The performance gains are often smaller, particularly when the threads are CPU-bound and rarely stall or wait on I/O. Moreover, HTT might even degrade performance, depending on the specific application. This can happen because the two logical cores share the physical core’s caches, so their working sets compete for space, which may result in lower cache hit rates and lower performance.

You may notice that Intel’s documentation is quite confusing in its thread terminology: the number of threads it lists corresponds to the number of logical cores, which is typically twice the number of physical cores. To add to the confusion, these threads are sometimes referred to as Physical Threads…

Since the behavior varies greatly from one application to another, benchmarking should be performed with the exact application and use case. In the case of Beamr Video, which uses multiple processes, some of them multi-threaded, enabling HTT with the default Linux kernel scheduler provided about a 10% performance boost. This was tested on AWS, with hyperthreads disabled using code from this post.
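For running such a comparison yourself, one way to disable hyperthreads on Linux is to take all but one logical CPU of each physical core offline through sysfs. The sketch below is not the code the post refers to; it is an independent illustration assuming the standard /sys/devices/system/cpu/cpuN/topology/thread_siblings_list and cpuN/online files. Writing to them requires root, so the hypothetical disable_hyperthreads helper defaults to a dry run. On kernels that provide it, writing off to /sys/devices/system/cpu/smt/control achieves the same in one step.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> list[int]:
    """Parse a kernel CPU list such as "0,4" or "0-1,8-9" into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def cpus_to_offline(siblings: dict[int, str]) -> list[int]:
    """Given cpu -> thread_siblings_list text, keep the lowest-numbered
    sibling of each physical core and return all other logical CPUs."""
    keep = {min(parse_cpu_list(text)) for text in siblings.values()}
    return sorted(cpu for cpu in siblings if cpu not in keep)


def read_siblings() -> dict[int, str]:
    """Read thread_siblings_list for every logical CPU exposing one (Linux only)."""
    siblings = {}
    for path in SYSFS_CPU.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # ".../cpu3/topology/..." -> 3
        siblings[cpu] = path.read_text()
    return siblings


def disable_hyperthreads(dry_run: bool = True) -> list[int]:
    """Take sibling logical CPUs offline; needs root unless dry_run is True."""
    targets = cpus_to_offline(read_siblings())
    for cpu in targets:
        online = SYSFS_CPU / f"cpu{cpu}" / "online"
        if dry_run:
            print(f"would write 0 to {online}")
        else:
            online.write_text("0")
    return targets


if __name__ == "__main__":
    disable_hyperthreads(dry_run=True)
```

Writing 1 to the same online files brings the CPUs back, so both configurations can be benchmarked on the same machine.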

To summarize, HTT is usually beneficial, increasing performance by up to 30% (often less). However, some issues should be kept in mind:

Do not be fooled by CPU monitoring tools that don’t distinguish between physical and logical cores, or that treat all cores as equal.
Overall performance may worsen with HTT enabled, so application-specific benchmarking is advised.

It remains to be tested whether better (non-default) scheduling can extract more from this technology for a given application.

Dan Julius
