ftrace stands for “function tracer”, and can be used for debugging or for analyzing latencies and performance issues. It also includes an infrastructure that allows for other types of tracers, e.g., measuring how long interrupts are disabled, or how long it takes a high-priority task to run after it has been woken up.
See Documentation/trace/.
ftrace is controlled from user space through the debugfs interface, typically mounted at /sys/kernel/debug.
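For example, a tracer can be selected and read entirely from the shell. The following is a minimal sketch using the standard ftrace control files, assuming debugfs is mounted at /sys/kernel/debug; which tracers appear in available_tracers depends on the kernel configuration:

root:/> mount -t debugfs nodev /sys/kernel/debug    # if not already mounted
root:/> cd /sys/kernel/debug/tracing
root:/> cat available_tracers            # tracers compiled into this kernel
root:/> echo function > current_tracer   # select the plain function tracer
root:/> echo 1 > tracing_on              # start tracing
root:/> echo 0 > tracing_on              # stop tracing
root:/> cat trace                        # read the captured trace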
Here we use “cyclictest” (from uclinux-dist/user/rt-tests) as an example.
We start cyclictest with a single thread of priority 80, which repeatedly calls clock_nanosleep():
cyclictest -p 80 -t 1 -n -l 10000 -i 20000
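The options used here mean the following (a summary of the rt-tests cyclictest options of this era):

# -p 80     run the measurement thread at SCHED_FIFO priority 80
# -t 1      start a single measurement thread
# -n        use clock_nanosleep() instead of the nanosleep() syscall
# -l 10000  run 10000 measurement loops
# -i 20000  wake up every 20000 us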
And we start 5 “ls -lR” loops as the workload:
root:/> cat fork_workload.sh
#!/bin/sh
cnt=0
max_cnt=5
while [ "$cnt" -lt "$max_cnt" ]
do
    echo -n "$cnt "
    cnt=`expr $cnt + 1`
    ./ls_workload.sh &
done

root:/> cat ls_workload.sh
#!/bin/sh
while [ 1 ]
do
    ls -lR > /dev/null 2>&1
done
The test result shows that the maximum wakeup latency is 11137 us:
policy: fifo: loadavg: 4.97 3.19 1.39 6/30 336

T: 0 (  236) P:80 I:20000 C:   9972 Min:     37 Act:    68 Avg:   137 Max:  11137
We use the “wakeup_rt” tracer to find out where this latency comes from:
root:/> cyclictest -p 80 -t 1 -n -l 10000 -i 20000 -b 8000 -W -f
could not set latency_hist/wakeup_latency/reset to 1
policy: fifo: loadavg: 5.00 4.97 4.17 6/30 1122

T: 0 ( 1121) P:80 I:20000 C:    163 Min:    157 Act:    258 Avg:    282 Max:    618
# Thread Ids: 01121
# Break thread: 1121
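Here -b 8000 makes cyclictest stop and report as soon as a latency above 8000 us is detected, -W selects wakeup tracing of real-time tasks (the wakeup_rt tracer), and -f additionally enables function tracing (per the cyclictest help of this vintage). The same tracer can also be driven by hand through debugfs; a sketch using the standard ftrace control files:

root:/> cd /sys/kernel/debug/tracing
root:/> echo wakeup_rt > current_tracer  # trace the worst-case RT wakeup latency
root:/> echo 0 > tracing_max_latency     # reset the recorded maximum
root:/> echo 1 > tracing_on
... run cyclictest and the workload ...
root:/> cat tracing_max_latency          # worst-case latency seen so far, in us
root:/> cat trace                        # trace of the worst-case wakeup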
And we get the following trace log:
/sys/kernel/debug/tracing/trace:
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 2.6.32.6-ADI-2010R1-pre-svn8240
# --------------------------------------------------------------------
# latency: 13646 us, #3824/3824, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: -465 (uid:0 nice:0 policy:1 rt_prio:80)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /_--=> lock-depth
#                |||||/     delay
#  cmd     pid   |||||| time  |   caller
#     \   /      ||||||   \   |   /
      ls-466     0dNh..     6us+:    466:120:R   + [000]   465: 19:S cyclictest
      ls-466     0dNh..    20us+: _wake_up_process <-_hrtimer_wakeup
      ls-466     0dNh..    25us+: _check_preempt_wakeup <-_try_to_wake_up
      ls-466     0dNh..    28us+: _update_curr <-_check_preempt_wakeup
      ls-466     0dNh..    32us+: _update_min_vruntime <-_update_curr
      ls-466     0dNh..    41us+: _resched_task <-_check_preempt_wakeup
      ls-466     0dNh..    45us+: _tick_program_event <-_hrtimer_interrupt
      ls-466     0dNh..    49us+: _tick_dev_program_event <-_tick_program_event
      ls-466     0dNh..    52us+: _ktime_get <-_tick_dev_program_event
      ls-466     0dNh..    56us+: _ktime_add_ns <-_ktime_get
      ls-466     0dNh..    60us+: _clockevents_program_event <-_tick_dev_program_
      ls-466     0dNh..    64us+: _bfin_coretmr_set_next_event <-_clockevents_pro
      ls-466     0dNh..    69us+: _note_interrupt <-_handle_simple_irq
      ls-466     0dNh..    73us+: _irq_exit <-_asm_do_IRQ
      ls-466     0.N...    77us+: _calc_reloc <-_load_flat_binary
      ls-466     0.N...    80us+: _bfin_put_addr_at_rp <-_load_flat_binary
      ls-466     0.N...    84us+: _calc_reloc <-_load_flat_binary
      ls-466     0.N...    87us+: _bfin_get_addr_from_rp <-_load_flat_binary
      ...
      ls-466     0.N... 13403us+: _set_binfmt <-_load_flat_binary
      ls-466     0.N... 13419us+: _start_thread <-_load_flat_binary
      ls-466     0.N... 13426us+: _module_put <-_search_binary_handler
      ls-466     0.N... 13431us+: _fput <-_search_binary_handler
      ls-466     0.N... 13437us+: _free_bprm <-_do_execve
      ls-466     0.N... 13440us+: _free_arg_page <-_free_bprm
      ...
      ls-466     0.N... 13541us+: _free_arg_page <-_free_bprm
      ls-466     0.N... 13544us+: ___free_pages <-_free_arg_page
      ls-466     0.N... 13549us+: _free_hot_page <-___free_pages
      ls-466     0.N... 13552us+: _free_hot_cold_page <-_free_hot_page
      ls-466     0.N... 13556us+: _get_pageblock_flags_group <-_free_hot_cold_pa
      ls-466     0dN... 13563us+: _free_pcppages_bulk <-_free_hot_cold_page
      ls-466     0.N... 13572us+: _kfree <-_free_bprm
      ls-466     0.N... 13579us+: _kmem_cache_free <-_sys_execve
      ls-466     0.N... 13584us+: _schedule <-_system_call
      ls-466     0.N... 13588us+: _rcu_sched_qs <-_schedule
      ls-466     0d.... 13594us+: _update_avg <-_schedule
      ls-466     0d.... 13597us+: _update_avg <-_schedule
      ls-466     0d.... 13601us+: _put_prev_task_fair <-_schedule
      ls-466     0d.... 13604us+: _update_curr <-_put_prev_task_fair
      ls-466     0d.... 13608us+: _update_min_vruntime <-_update_curr
      ls-466     0d.... 13614us+: ___enqueue_entity <-_put_prev_task_fair
      ls-466     0d.... 13623us+: _pick_next_task_rt <-_schedule
      ls-466     0d.... 13634us+: _schedule <-_system_call
      ls-466     0d.... 13637us : 466:120:R ==> [000] 465: 19:R cyclictest
By analyzing the log and reading the code, we know that the kernel wakes up the real-time cyclictest thread in a timer IRQ handler, and requests a reschedule by setting the TIF_NEED_RESCHED flag of the current thread (ls-466):
ls-466 0dNh.. 20us+: _wake_up_process <-_hrtimer_wakeup
However, the kernel then spends a long time in do_execve()→load_flat_binary() without any chance to call schedule(), until the do_execve() system call finally finishes (i.e., returns to user space). The “N” in the flags column shows that need-resched stays set during this whole interval. So the wake-up latency happens because the kernel cannot be preempted.
ls-466 0d.... 13634us+: _schedule <-_system_call
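On a non-preemptible kernel, TIF_NEED_RESCHED is only honored on the return to user space or when kernel code calls schedule() explicitly, which is why the long execve() path delays the RT wakeup. The preemption model of the running kernel can be checked as follows (a sketch, assuming CONFIG_IKCONFIG_PROC is enabled so that /proc/config.gz exists):

root:/> zcat /proc/config.gz | grep PREEMPT
# Possible results:
#   CONFIG_PREEMPT_NONE=y       no kernel preemption: long syscalls run to completion
#   CONFIG_PREEMPT_VOLUNTARY=y  reschedule only at explicit might_sleep() points
#   CONFIG_PREEMPT=y            most kernel code is preemptible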