Precise thread sleep needed. Max 1ms error

I was looking for a lightweight, cross-platform sleep function that is suitable for real-time applications (i.e. high resolution and high precision, with reliability). Here are my findings:

Scheduling Fundamentals

Giving up the CPU and then getting it back is expensive. According to this article, scheduler latency can be anywhere between 10 and 30 ms on Linux. So if you need to sleep for less than 10 ms with high precision, you need to use OS-specific APIs. The usual C++11 std::this_thread::sleep_for is not a high-resolution sleep. For example, on my machine, quick tests show that it often sleeps for at least 3 ms when I ask it to sleep for just 1 ms.
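
To reproduce that kind of quick test on your own machine, a minimal sketch could look like the following. The helper name measure_sleep is hypothetical, and std::chrono::steady_clock is my choice of reference clock; the numbers you get will depend on your OS, scheduler, and system load.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: returns how long std::this_thread::sleep_for
// actually blocked when asked to sleep for `requested`.
std::chrono::microseconds measure_sleep(std::chrono::microseconds requested) {
    using clock_type = std::chrono::steady_clock;  // monotonic reference clock
    const auto start = clock_type::now();
    std::this_thread::sleep_for(requested);
    return std::chrono::duration_cast<std::chrono::microseconds>(
        clock_type::now() - start);
}
```

Calling measure_sleep(std::chrono::microseconds(1000)) a few times and printing the results shows how far the actual sleep overshoots the requested 1 ms.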

Linux

The most popular solution seems to be the nanosleep() API. However, if you want sleeps shorter than 2 ms with high resolution, then you also need to call sched_setscheduler to put the thread/process under real-time scheduling. If you don’t, then nanosleep() acts just like the obsolete usleep, which had a resolution of ~10 ms. Another possibility is to use alarms.
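
As a rough sketch of that combination, the code below first tries to switch the calling process to the SCHED_FIFO real-time policy and then sleeps with nanosleep(). The helper names try_enable_realtime and sleep_us are hypothetical, SCHED_FIFO is only one of the real-time policies you could pick, and the priority value is up to you. Note that sched_setscheduler usually fails with EPERM unless the process has the necessary privileges, so the sketch reports the failure and carries on with normal scheduling.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sched.h>
#include <time.h>

// Hypothetical helper: try to put the calling process under SCHED_FIFO.
// Returns true on success, false if the kernel refused the request
// (commonly EPERM when the process lacks real-time privileges).
bool try_enable_realtime(int priority) {
    struct sched_param param;
    std::memset(&param, 0, sizeof(param));
    param.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        std::fprintf(stderr, "sched_setscheduler failed: %s\n",
                     std::strerror(errno));
        return false;
    }
    return true;
}

// Hypothetical helper: sleep for `us` microseconds (assumed non-negative),
// resuming the sleep if a signal interrupts it.
void sleep_us(long us) {
    struct timespec ts;
    ts.tv_sec = us / 1000000;
    ts.tv_nsec = (us % 1000000) * 1000;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {
        // nanosleep stored the remaining time in ts; loop to finish it.
    }
}
```

A typical call sequence would be try_enable_realtime(sched_get_priority_min(SCHED_FIFO)) once at startup, followed by sleep_us(500) wherever a short sleep is needed.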

Windows

The solution here is to use multimedia timers, as others have suggested. If you want to emulate Linux’s nanosleep() on Windows, below is how (original ref). Note that the argument is in 100 ns units rather than nanoseconds, and that you don’t need to call CreateWaitableTimer() over and over if you are calling the sleep function in a loop.

#include <windows.h>    /* WinAPI */

/* Windows sleep in 100ns units */
BOOLEAN nanosleep(LONGLONG ns){
    /* Declarations */
    HANDLE timer;   /* Timer handle */
    LARGE_INTEGER li;   /* Time definition */
    /* Create timer */
    if(!(timer = CreateWaitableTimer(NULL, TRUE, NULL)))
        return FALSE;
    /* Set timer properties */
    li.QuadPart = -ns;
    if(!SetWaitableTimer(timer, &li, 0, NULL, NULL, FALSE)){
        CloseHandle(timer);
        return FALSE;
    }
    /* Start & wait for timer */
    WaitForSingleObject(timer, INFINITE);
    /* Clean resources */
    CloseHandle(timer);
    /* Slept without problems */
    return TRUE;
}

Cross Platform Code

Here’s the time_util.cc, which implements sleep for Linux, Windows and Apple’s platforms. Notice, however, that it doesn’t set real-time mode using sched_setscheduler as I mentioned above, so if you want to use it for sleeps shorter than 2 ms, that’s something you need to do additionally. One other improvement you can make is to avoid calling CreateWaitableTimer over and over in the Windows version if you are calling sleep in a loop. For how to do this, see example here.

#include "time_util.h"

#ifdef _WIN32
#  define WIN32_LEAN_AND_MEAN
#  include <windows.h>

#else
#  include <time.h>
#  include <errno.h>

#  ifdef __APPLE__
#    include <mach/clock.h>
#    include <mach/mach.h>
#  endif
#endif // _WIN32

/**********************************=> unix ************************************/
#ifndef _WIN32
void SleepInMs(uint32 ms) {
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = ms % 1000 * 1000000;

    while (nanosleep(&ts, &ts) == -1 && errno == EINTR);
}

void SleepInUs(uint32 us) {
    struct timespec ts;
    ts.tv_sec = us / 1000000;
    ts.tv_nsec = us % 1000000 * 1000;

    while (nanosleep(&ts, &ts) == -1 && errno == EINTR);
}

#ifndef __APPLE__
uint64 NowInUs() {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return static_cast<uint64>(now.tv_sec) * 1000000 + now.tv_nsec / 1000;
}

#else // mac
uint64 NowInUs() {
    clock_serv_t cs;
    mach_timespec_t ts;

    host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &cs);
    clock_get_time(cs, &ts);
    mach_port_deallocate(mach_task_self(), cs);

    return static_cast<uint64>(ts.tv_sec) * 1000000 + ts.tv_nsec / 1000;
}
#endif // __APPLE__
#endif // _WIN32
/************************************ unix <=**********************************/

/**********************************=> win *************************************/
#ifdef _WIN32
void SleepInMs(uint32 ms) {
    ::Sleep(ms);
}

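void SleepInUs(uint32 us) {
    ::LARGE_INTEGER ft;
    // Widen before multiplying so large values don't overflow uint32.
    ft.QuadPart = -(static_cast<int64>(us) * 10);  // '-' using relative time, 100 ns units

    ::HANDLE timer = ::CreateWaitableTimer(NULL, TRUE, NULL);
    if (timer == NULL) return;  // timer creation failed; nothing to wait on
    if (::SetWaitableTimer(timer, &ft, 0, NULL, NULL, 0)) {
        ::WaitForSingleObject(timer, INFINITE);
    }
    ::CloseHandle(timer);
}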

static inline uint64 GetPerfFrequency() {
    ::LARGE_INTEGER freq;
    ::QueryPerformanceFrequency(&freq);
    return freq.QuadPart;
}

static inline uint64 PerfFrequency() {
    static uint64 xFreq = GetPerfFrequency();
    return xFreq;
}

static inline uint64 PerfCounter() {
    ::LARGE_INTEGER counter;
    ::QueryPerformanceCounter(&counter);
    return counter.QuadPart;
}

uint64 NowInUs() {
    return static_cast<uint64>(
        static_cast<double>(PerfCounter()) * 1000000 / PerfFrequency());
}
#endif // _WIN32

Another, more complete cross-platform implementation can be found here.

Another Quick Solution

As you might have noticed, the above code is no longer very lightweight. It needs to include the Windows header, among other things, which might not be desirable if you are developing header-only libraries. If you need to sleep for less than 2 ms and you are not keen on using OS-specific code, then you can use the following simple solution, which is cross-platform and works very well in my tests. Just remember that you are now not using heavily optimized OS code, which might be much better at saving power and managing CPU resources.

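#include <chrono>
#include <thread>

// Named hr_clock rather than clock to avoid clashing with ::clock() from <ctime>.
typedef std::chrono::high_resolution_clock hr_clock;
template <typename T>
using duration = std::chrono::duration<T>;

// Waits for dt seconds by polling the clock, yielding between checks.
static void sleep_for(double dt)
{
    static constexpr duration<double> MinSleepDuration(0);
    hr_clock::time_point start = hr_clock::now();
    while (duration<double>(hr_clock::now() - start).count() < dt) {
        std::this_thread::sleep_for(MinSleepDuration);
    }
}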
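
If you want to check how close this kind of polling wait gets to the requested time on your own machine, a small measurement sketch might look like this. The names spin_sleep_for and overshoot_us are hypothetical, and I have swapped in std::chrono::steady_clock here (an assumption on my part, chosen because it is guaranteed to be monotonic) in place of high_resolution_clock.

```cpp
#include <chrono>
#include <thread>

// Hypothetical name; same idea as the snippet above: poll the clock and
// yield the CPU between checks via a zero-length sleep.
void spin_sleep_for(double seconds) {
    using clock_type = std::chrono::steady_clock;
    const auto start = clock_type::now();
    while (std::chrono::duration<double>(clock_type::now() - start).count() < seconds) {
        std::this_thread::sleep_for(std::chrono::duration<double>(0));
    }
}

// Hypothetical helper: returns by how many microseconds one call to
// spin_sleep_for overshot the requested duration.
double overshoot_us(double seconds) {
    const auto start = std::chrono::steady_clock::now();
    spin_sleep_for(seconds);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return (elapsed.count() - seconds) * 1e6;
}
```

Calling overshoot_us(0.001) in a loop and looking at the largest value gives a feel for the worst-case error under your current load. Keep in mind that the thread stays runnable for the whole wait, so it keeps a core busy.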
