
Using ftrace

ftrace can be used for debugging or for analyzing latencies and performance issues. The name stands for "function tracer", but ftrace also includes an infrastructure for other types of tracing, e.g. how long interrupts are disabled (the irqsoff tracer), or how long it takes for a high-priority task to start running after it has been woken up (the wakeup tracers).

Documentation and references

See Documentation/trace/ in the Linux kernel source tree, in particular ftrace.txt.

Examples

ftrace can be controlled from user space through the debugfs interface, normally mounted at /sys/kernel/debug.
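
For example, a minimal session with the basic function tracer looks like this, assuming debugfs is mounted at /sys/kernel/debug (mount it first if it is not); all file names below are standard ftrace control files:

# mount debugfs if it is not already mounted
mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing

# list the tracers this kernel was built with
cat available_tracers

# select the plain function tracer and start tracing
echo function > current_tracer
echo 1 > tracing_on

# ... run the workload to be traced ...

# stop tracing and read back the log
echo 0 > tracing_on
cat trace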

Wake-up latency

Here we use “cyclictest” (from uclinux-dist/user/rt-tests) as an example.

We start cyclictest with a single real-time thread of priority 80 (-p 80 -t 1), which wakes up every 20000 us via clock_nanosleep() (-n -i 20000), for 10000 loops (-l 10000):

cyclictest -p 80 -t 1 -n -l 10000 -i 20000

And we start 5 “ls -lR” loops as the workload:

root:/> cat fork_workload.sh 
#!/bin/sh

cnt=0
max_cnt=5
while [ "$cnt" -lt "$max_cnt" ]
do
	echo -n "$cnt "
	cnt=`expr $cnt + 1`
	./ls_workload.sh &
done
root:/> cat ls_workload.sh 
#!/bin/sh

while [ 1 ]
do
	ls -lR > /dev/null 2>&1
done

The test result shows that the maximum wake-up latency is 11137 us:

policy: fifo: loadavg: 4.97 3.19 1.39 6/30 336          

T: 0 (  236) P:80 I:20000 C:   9972 Min:     37 Act:   68 Avg:  137 Max:   11137

We use the “wakeup_rt” tracer to find out where this latency comes from. Here cyclictest drives the tracer itself: -W selects the wakeup_rt tracer, -f enables function tracing, and -b 8000 stops the run and saves the trace as soon as a latency larger than 8000 us is observed:

root:/> cyclictest -p 80 -t 1 -n -l 10000 -i 20000 -b 8000 -W -f
could not set latency_hist/wakeup_latency/reset to 1
policy: fifo: loadavg: 5.00 4.97 4.17 6/30 1122          

T: 0 ( 1121) P:80 I:20000 C:    163 Min:    157 Act:  258 Avg:  282 Max:     618
# Thread Ids: 01121
# Break thread: 1121
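
As an aside, the same measurement can also be set up by hand through the debugfs files, without cyclictest's break-trace options; a minimal sketch, again assuming debugfs is mounted at /sys/kernel/debug:

cd /sys/kernel/debug/tracing

# select the wakeup_rt tracer and reset the recorded maximum
echo wakeup_rt > current_tracer
echo 0 > tracing_max_latency
echo 1 > tracing_on

# ... run cyclictest and the workload ...

echo 0 > tracing_on
cat tracing_max_latency   # worst-case RT wake-up latency seen, in us
cat trace                 # trace leading up to that worst case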

The cyclictest run above captured the following trace log:

/sys/kernel/debug/tracing/trace:

# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 2.6.32.6-ADI-2010R1-pre-svn8240
# --------------------------------------------------------------------
# latency: 13646 us, #3824/3824, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: -465 (uid:0 nice:0 policy:1 rt_prio:80)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /_--=> lock-depth
#                |||||/     delay
#  cmd     pid   |||||| time  |   caller
#     \   /      ||||||   \   |   /
      ls-466     0dNh..    6us+:    466:120:R   + [000]   465: 19:S cyclictest
      ls-466     0dNh..   20us+: _wake_up_process <-_hrtimer_wakeup
      ls-466     0dNh..   25us+: _check_preempt_wakeup <-_try_to_wake_up
      ls-466     0dNh..   28us+: _update_curr <-_check_preempt_wakeup
      ls-466     0dNh..   32us+: _update_min_vruntime <-_update_curr
      ls-466     0dNh..   41us+: _resched_task <-_check_preempt_wakeup  
      ls-466     0dNh..   45us+: _tick_program_event <-_hrtimer_interrupt  
      ls-466     0dNh..   49us+: _tick_dev_program_event <-_tick_program_event
      ls-466     0dNh..   52us+: _ktime_get <-_tick_dev_program_event         
      ls-466     0dNh..   56us+: _ktime_add_ns <-_ktime_get                   
      ls-466     0dNh..   60us+: _clockevents_program_event <-_tick_dev_program_
      ls-466     0dNh..   64us+: _bfin_coretmr_set_next_event <-_clockevents_pro
      ls-466     0dNh..   69us+: _note_interrupt <-_handle_simple_irq           
      ls-466     0dNh..   73us+: _irq_exit <-_asm_do_IRQ                        
      ls-466     0.N...   77us+: _calc_reloc <-_load_flat_binary                
      ls-466     0.N...   80us+: _bfin_put_addr_at_rp <-_load_flat_binary       
      ls-466     0.N...   84us+: _calc_reloc <-_load_flat_binary              
      ls-466     0.N...   87us+: _bfin_get_addr_from_rp <-_load_flat_binary 
      ...
      ls-466     0.N... 13403us+: _set_binfmt <-_load_flat_binary             
      ls-466     0.N... 13419us+: _start_thread <-_load_flat_binary           
      ls-466     0.N... 13426us+: _module_put <-_search_binary_handler        
      ls-466     0.N... 13431us+: _fput <-_search_binary_handler                
      ls-466     0.N... 13437us+: _free_bprm <-_do_execve                       
      ls-466     0.N... 13440us+: _free_arg_page <-_free_bprm 
      ...
      ls-466     0.N... 13541us+: _free_arg_page <-_free_bprm                   
      ls-466     0.N... 13544us+: ___free_pages <-_free_arg_page                
      ls-466     0.N... 13549us+: _free_hot_page <-___free_pages              
      ls-466     0.N... 13552us+: _free_hot_cold_page <-_free_hot_page        
      ls-466     0.N... 13556us+: _get_pageblock_flags_group <-_free_hot_cold_pa
      ls-466     0dN... 13563us+: _free_pcppages_bulk <-_free_hot_cold_page     
      ls-466     0.N... 13572us+: _kfree <-_free_bprm                           
      ls-466     0.N... 13579us+: _kmem_cache_free <-_sys_execve           
      ls-466     0.N... 13584us+: _schedule <-_system_call                    
      ls-466     0.N... 13588us+: _rcu_sched_qs <-_schedule                   
      ls-466     0d.... 13594us+: _update_avg <-_schedule                     
      ls-466     0d.... 13597us+: _update_avg <-_schedule                       
      ls-466     0d.... 13601us+: _put_prev_task_fair <-_schedule               
      ls-466     0d.... 13604us+: _update_curr <-_put_prev_task_fair            
      ls-466     0d.... 13608us+: _update_min_vruntime <-_update_curr           
      ls-466     0d.... 13614us+: ___enqueue_entity <-_put_prev_task_fair       
      ls-466     0d.... 13623us+: _pick_next_task_rt <-_schedule                
      ls-466     0d.... 13634us+: _schedule <-_system_call                    
      ls-466     0d.... 13637us :    466:120:R ==> [000]   465: 19:R cyclictest

By analyzing the log and reading the code, we can see that the kernel wakes up the real-time cyclictest thread in a timer IRQ handler and requests a reschedule by setting the TIF_NEED_RESCHED flag of the current thread (ls-466):

ls-466     0dNh..   20us+: _wake_up_process <-_hrtimer_wakeup
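
As an aside, this wakeup can also be observed on its own with the sched_wakeup trace event, with no function tracing involved, assuming the kernel was built with event tracing support:

cd /sys/kernel/debug/tracing
echo nop > current_tracer                   # events do not need a tracer
echo 1 > events/sched/sched_wakeup/enable
echo 1 > tracing_on

# ... reproduce the wakeup ...

echo 0 > tracing_on
cat trace                                   # one line per task wakeup
echo 0 > events/sched/sched_wakeup/enable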

However, after this wakeup the kernel spends a long time in do_execve()→load_flat_binary() without any chance to call schedule(), until the do_execve() system call finally finishes (i.e., returns to user space). So the wake-up latency occurs because the kernel cannot be preempted: without CONFIG_PREEMPT, the TIF_NEED_RESCHED flag is only acted upon when returning to user space, which is exactly where the trace finally shows schedule() being called:

ls-466     0d.... 13634us+: _schedule <-_system_call
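
To see exactly where the time goes inside do_execve(), the function_graph tracer can record per-function call durations. A sketch, assuming the kernel supports function_graph on this architecture; note that on Blackfin the symbol carries a leading underscore, as in the log above:

cd /sys/kernel/debug/tracing
echo function_graph > current_tracer
echo _do_execve > set_graph_function   # trace only this call tree
echo 1 > tracing_on

# ... run the fork/exec workload ...

echo 0 > tracing_on
cat trace   # per-function durations inside do_execve()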