-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Description
It seems that the sleep
function behavior when used in tasks running on different threads is heavily impacted by work taking place on the main thread.
Given the documentation of sleep
function you would expect that @time sleep(1)
will result in approximately 1-second
being reported by the @time
evaluator. In the same way, you would expect that the following will take about 10-seconds in total:
@time for _ in 1:10
sleep(1)
end
# Usually + MWE control (task1): 10 seconds +/- small value
# MWE context (current issue): 16+ seconds (can be altered by the period main thread is busy)
However, the above expectation is violated (in a manner not explainable by the expected/usual variability in the sleep
and/or @time
behavior) when sleep
is called from tasks running on different threads and any of the following two conditions are met:
- there is some work performed by the main thread, or
Libc.systemsleep(n)
is called on the main thread (showcased in my MWE) - the main thread is free, but many (~1000) spawned tasks call
sleep
concurrently (increasing the number of available threads results in worse results).
To give a clear example:
function sleeper(s, c, id)
@info "sleeper $id executing on thread $(threadid()) "
counter = 0
@time "sleeper $id" while counter < c
sleep(s)
counter += 1
end
end
@spawn sleeper(1, 10, 1)
Output like this would look normal:
sleeper 1: 10.021883 seconds (53.10 k allocations: 3.561 MiB)
However, the function sleeper(1, 10, 1)
running in tasks spawned on different threads than main, can output values like the one below (when main thread is busy):
sleeper 1: 16.963197 seconds (13.87 k allocations: 957.833 KiB, 0.13% compilation time)
Or if you make the main really busy for a longer period of time:
sleeper 1: 29.944492 seconds (13.87 k allocations: 957.802 KiB, 0.06% compilation time)
And this is not about some weird print to stdout
competition between tasks (we also have a control
function, realworker
, not using sleep
to confirm my statement).
All this might make more sense if you run the MWE provided below.
Important note: the current issue is to be interpreted in the context of Julia being started with multiple threads. For consistency, let's assume julia -t 6,3 script.jl
.
Main MWE follows:
using Base.Threads, Dates
@info "started with $(nthreads()) threads"
@info "interactive threadpool: $(nthreads(:interactive))"
@info "default threadpool: $(nthreads(:default))"
println(repeat("*", 40))
function mainspin(s, c; withyield=false)
@info "mainspin executing on thread $(threadid()) "
ms = Millisecond(s * 1000)
r = 0.0
counter = 0
while counter < c
t = now()
while (now() - t) < ms
r += rand()
end
counter += 1
if withyield
yield()
end
end
end
function sleeper(s, c, id)
@info "sleeper $id executing on thread $(threadid()) "
counter = 0
@time "sleeper $id" while counter < c
sleep(s)
counter += 1
end
end
function realworker(s, c, id)
@info "worker $id executing on thread $(threadid()) "
ms = Millisecond(s * 1000)
r = 0.0
counter = 0
@time "worker $id" while counter < c
t = now()
while (now() - t) < ms
r += rand()
end
counter += 1
yield()
end
end
begin
# phase 1
@info "phase 1: control run - no main thread work"
# yields 10 times (total time ~10 seconds)
task1 = @spawn realworker(1.02, 10, 1) # 1.02 because weird @time printing overlap
# yields 10 times (total time ~10 seconds)
task11 = @spawn sleeper(1, 10, 1)
wait(task1)
wait(task11)
# we waited for task1 and task11 to finish
# phase 2
println(repeat("*", 40))
@info "phase 2: main thread work without yield"
# yields 10 times (total time ~10 seconds)
task2 = @spawn realworker(1, 10, 2)
# yields 10 times (total time ~10 seconds)
task22 = @spawn sleeper(1, 10, 2)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning
# while task2 and task22 are already running
mainspin(1, 7)
# main thread work does not impact the real worker
# total time ~10 seconds
wait(task2)
# main thread work heavily impacting the sleeper
# total time >15 seconds
wait(task22)
# phase 3
println(repeat("*", 40))
@info "phase 3: main thread work with yield"
# yields 10 times (total time ~10 seconds)
task3 = @spawn realworker(1, 10, 3)
# yields 10 times (total time ~10 seconds)
task33 = @spawn sleeper(1, 10, 3)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning with yield
# while task3 and task33 are running
mainspin(1, 7, withyield=true)
# main thread work does not impact
# the real worker - total time ~10 seconds
wait(task3)
# main thread work (with yield) still impacting the sleeper
# but we consistently get better times (total time ~13 seconds)
wait(task33)
end
Output with julia -t 6,3 script.jl
:
[ Info: started with 6 threads
[ Info: interactive threadpool: 3
[ Info: default threadpool: 6
****************************************
[ Info: phase 1: control run - no main thread work
[ Info: worker 1 executing on thread 4
[ Info: sleeper 1 executing on thread 8
sleeper 1: 10.020176 seconds (51 allocations: 1.438 KiB)
worker 1: 10.199500 seconds (53.62 k allocations: 3.704 MiB, 0.36% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: worker 2 executing on thread 7
[ Info: sleeper 2 executing on thread 5
[ Info: mainspin executing on thread 1
worker 2: 9.999656 seconds (47.22 k allocations: 3.243 MiB, 0.24% compilation time)
sleeper 2: 16.974156 seconds (13.87 k allocations: 957.802 KiB, 0.14% compilation time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: worker 3 executing on thread 9
[ Info: sleeper 3 executing on thread 5
[ Info: mainspin executing on thread 1
worker 3: 9.999733 seconds (37.05 k allocations: 2.618 MiB, 0.23% compilation time)
sleeper 3: 13.979589 seconds (2.67 k allocations: 249.790 KiB, 0.16% compilation time)
It would be speculative on my part to derive hard conclusions about the root cause of this issue. There is a discourse topic that I open and some users are proposing some hypotheses - feel free to check it out.
The underlying cause of the above issue seems to extend to various packages (for example, non-blocking listen from HTTP.jl
responsivity is highly dependent on the main threat not doing work - and that is not caused by the direct sleep
usage). Essential to mention that HTTP.serve!/HTTP.listen!
is spawned to the :interactive
threadpool.
Libc.systemsleep MWE version
The `Libc.systemsleep` version of the above MWE
using Base.Threads, Dates
@info "started with $(nthreads()) threads"
@info "interactive threadpool: $(nthreads(:interactive))"
@info "default threadpool: $(nthreads(:default))"
println(repeat("*", 40))
function mainspin(s, c; withyield=false)
@info "mainspin executing on thread $(threadid()) "
r = 0.0
counter = 0
while counter < c
Libc.systemsleep(s)
counter += 1
if withyield
yield()
end
end
end
function sleeper(s, c, id)
@info "sleeper $id executing on thread $(threadid()) "
counter = 0
@time "sleeper $id" while counter < c
sleep(s)
counter += 1
end
end
function libcworker(s, c, id)
@info "worker $id executing on thread $(threadid()) "
r = 0.0
counter = 0
@time "worker $id" while counter < c
Libc.systemsleep(s)
counter += 1
yield()
end
end
begin
# phase 1
@info "phase 1: control run - no main thread work"
# yields 10 times (total time ~10 seconds)
task1 = @spawn libcworker(1.02, 10, 1) # 1.02 because weird @time printing overlap
# yields 10 times (total time ~10 seconds)
task11 = @spawn sleeper(1, 10, 1)
wait(task1)
wait(task11)
# we waited for task1 and task11 to finish
# phase 2
println(repeat("*", 40))
@info "phase 2: main thread work without yield"
# yields 10 times (total time ~10 seconds)
task2 = @spawn libcworker(1, 10, 2)
# yields 10 times (total time ~10 seconds)
task22 = @spawn sleeper(1, 10, 2)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning
# while task2 and task22 are already running
mainspin(1, 7)
# main thread work does not impact the real worker
# total time ~10 seconds
wait(task2)
# main thread work heavily impacting the sleeper
# total time >15 seconds
wait(task22)
# phase 3
println(repeat("*", 40))
@info "phase 3: main thread work with yield"
# yields 10 times (total time ~10 seconds)
task3 = @spawn libcworker(1, 10, 3)
# yields 10 times (total time ~10 seconds)
task33 = @spawn sleeper(1, 10, 3)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning with yield
# while task3 and task33 are running
mainspin(1, 7, withyield=true)
# main thread work does not impact
# the real worker - total time ~10 seconds
wait(task3)
# main thread work (with yield) still impacting the sleeper
# but we consistently get better times (total time ~13 seconds)
wait(task33)
end
Output:
[ Info: started with 6 threads
[ Info: interactive threadpool: 3
[ Info: default threadpool: 6
****************************************
[ Info: phase 1: control run - no main thread work
[ Info: worker 1 executing on thread 4
[ Info: sleeper 1 executing on thread 5
sleeper 1: 10.021097 seconds (50 allocations: 1.406 KiB)
worker 1: 10.202331 seconds (7.16 k allocations: 527.908 KiB, 0.89% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: sleeper 2 executing on thread 8
[ Info: worker 2 executing on thread 6
[ Info: mainspin executing on thread 1
worker 2: 10.001909 seconds (12.27 k allocations: 864.121 KiB, 0.36% compilation time)
sleeper 2: 17.009272 seconds (28.83 k allocations: 1.998 MiB, 0.21% compilation time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: sleeper 3 executing on thread 5
[ Info: worker 3 executing on thread 7
[ Info: mainspin executing on thread 1
worker 3: 10.001954 seconds (2.62 k allocations: 246.360 KiB, 0.28% compilation time)
sleeper 3: 13.026204 seconds (30.55 k allocations: 2.179 MiB, 0.22% compilation time)