Statistical Profiling

VisualDSP++ provides profiling methods that measure program performance by sampling the target's Program Counter (PC) register via JTAG. The data collected this way shows where the application is spending its time, but it cannot be used to generate a call graph. In other words, you know where the PC is, but not how it got there.

A statistical profile measures the performance of the whole system (userspace and kernel, including all interrupts) by sampling the target's PC register at random intervals while the target is running. Most of the program's execution time is spent in the areas where the sampled PC values are concentrated. JTAG sampling is completely non-intrusive, so the process does not incur any additional run-time overhead.

Since this is a sampling system, it works best when the application or algorithm can be sampled for a long time: the more samples, the more accurate the results will be.
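To give a feel for how accuracy improves with the number of samples, here is a minimal sketch. It assumes the samples are independent and taken at random times, in which case the estimated share of time spent in a function behaves like a binomial proportion. The function name `sampling_standard_error` is made up for this illustration and is not part of VisualDSP++.

```python
import math

def sampling_standard_error(fraction, num_samples):
    """Standard error of an estimated time fraction.

    Assumes independent, randomly timed samples (binomial model).
    """
    return math.sqrt(fraction * (1.0 - fraction) / num_samples)

# A function that really takes 10% of the run time, at different sample counts.
for n in (100, 10_000, 1_000_000):
    err = sampling_standard_error(0.10, n)
    print(f"{n:>9} samples: 10% +/- {err * 100:.2f} percentage points")
```

Under this model the error shrinks with the square root of the sample count, so 100 times more samples gives roughly 10 times better precision.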

In order to profile a Linux application, we need to map program symbols (kernel symbols, application symbols, and library symbols) to the memory addresses (PC values) recorded by the VDSP profiling tools. The steps look roughly like this:

  • Start the application. While it is running, gather profiling data using VDSP, along with the runtime system memory map (where the different system components are located in memory while the program is running).
  • Collect symbol information for each component you are interested in, including the kernel, libraries, and the application. This can be done offline.
  • Post-process the profiling data and symbol information, then analyze.
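The post-processing step can be pictured with the following sketch. It is not the implementation of bfin-uclinux-vdsp_profiler.pl; it only illustrates the idea of combining a runtime memory map with per-component symbol offsets. All names, addresses, and data layouts here (`resolve`, `build_profile`, `app`, `libdemo.so`) are hypothetical.

```python
import bisect
from collections import Counter

def resolve(pc, memory_map, symbols):
    """Map a sampled PC value to "component:function", or None if unmapped.

    memory_map: list of (start, end, component), end exclusive.
    symbols: component -> list of (offset, name) sorted by offset,
             with offsets relative to the component's start address.
    """
    for start, end, component in memory_map:
        if start <= pc < end:
            table = symbols.get(component, [])
            offsets = [offset for offset, _name in table]
            # Last symbol whose offset is <= the PC's offset in the component.
            index = bisect.bisect_right(offsets, pc - start) - 1
            if index < 0:
                return f"{component}:?"
            return f"{component}:{table[index][1]}"
    return None

def build_profile(samples, memory_map, symbols):
    """Sum sample counts per resolved function."""
    profile = Counter()
    for pc, count in samples.items():
        profile[resolve(pc, memory_map, symbols) or "unmapped"] += count
    return profile

# Made-up runtime memory map, symbol tables, and PC samples.
memory_map = [(0x00280000, 0x00290000, "app"),
              (0x00520000, 0x00530000, "libdemo.so")]
symbols = {"app": [(0x0000, "_start"), (0x0400, "_main"), (0x0900, "_work")],
           "libdemo.so": [(0x0100, "_helper")]}
samples = {0x00280410: 7, 0x00280950: 12, 0x00520180: 3, 0x00700000: 1}

for name, count in build_profile(samples, memory_map, symbols).most_common():
    print(f"{count:6d} {name}")
```

Each sample is attributed to the nearest symbol at or below the sampled address, which is why both LOCAL and GLOBAL symbols matter: a missing symbol would cause its samples to be credited to whichever function precedes it.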

Any of the files referenced (like bfin-uclinux-vdsp_profiler.pl) can be found in the uClinux-dist SVN under the uClinux-dist/testsuites/adi_ice_helpers/ directory.

Example

Below is a small example showing how to use the VDSP++ JTAG emulator to do statistical profiling of a Linux application in 9 easy steps.

The applications and libraries used for this must NOT be stripped; otherwise you will get a “no symbol” error. Both LOCAL and GLOBAL symbols are necessary for correct profiling.

  1. Compile a kernel, and download the application to be debugged
  2. With the VDSP++ JTAG unit attached, and VDSP++ running, reboot the target, and load the uClinux distribution onto the system
  3. Run the application in whatever way you want to profile it
  4. Copy the various memory maps from the board to your host over Ethernet:
    rgetz@imhotep:~/vlc> ./grab_maps_from_board.sh 10.64.204.76
  5. Turn on profiling in VDSP++ and capture some information (see Tools → Statistical Profiling)
  6. Turn off profiling in VDSP++ and save the results in text (.txt) format
  7. Copy the profile over to a Linux host, and run it through a small script to convert and sort it properly:
    rgetz@imhotep:~> ./bfin-uclinux-vdsp_convert.sh nbench.txt

    This example will output a file called nbench.txt.prof.

  8. Obtain the map files for the applications and libraries used at runtime.
    1. The first ones needed are those used by the toolchain:
      rgetz@imhotep:~> ./grab_maps_from_toolchain.sh
    2. The next ones needed are the ones from the uClinux-dist:
      rgetz@imhotep:~> ./grab_maps_from_dist.sh ~/blackfin/trunk/uClinux-dist
    3. If the application is not in the dist, you can generate the map by hand with nm:
      $ bfin-uclinux-nm -n dhrystone | grep -e " [tT] " > dhrystone.map

      This should produce a file which includes all the functions in the application and their offsets.

  9. To translate the raw addresses into kernel and application function names:
    rgetz@imhotep:~> ./bfin-uclinux-vdsp_profiler.pl nbench.txt.prof System.map user.list > nbench.out

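    This will make an output file in which each line looks like <num samples> <address> <application>:<function>. By default the output is sorted by address. To get an idea of which functions consume the most MIPS, sort it by sample count instead (the counts are zero-padded, so a plain reverse sort works):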

    rgetz@imhotep:~> sort -r nbench.out | less

This will provide something like:

03759647 002889d4 nbench:___unpack_d
03540342 00288ddc nbench:___muldf3
02921212 00288ac8 nbench:__fpadd_parts
02847978 002883d8 nbench:___muldi3
02013039 0028974c nbench:___pack_d
01651564 00284e68 nbench:_DoAssignIteration
01524563 0052681c libgcc_s.so.1:__fpadd_parts
01516052 00283c7c nbench:_NumSift
01513159 0015d53c libuClibc-0.9.29.so:___GI_memmove
01216947 00283ce0 nbench:_ToggleBitRun
01197119 00526728 libgcc_s.so.1:___unpack_d
01116812 00283d40 nbench:_mul
01052819 00526b30 libgcc_s.so.1:___muldf3
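A listing like this can also be summarized per component, to see how the time splits between the application and its libraries. The sketch below assumes each line has the three whitespace-separated fields described above and that the sample count is a zero-padded decimal number. The helper name `samples_per_component` is made up, and only four lines of the listing are used as input.

```python
from collections import Counter

def samples_per_component(lines):
    """Sum sample counts per application or library name."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Skip anything that is not "<count> <address> <component>:<function>".
        if len(fields) != 3 or ":" not in fields[2]:
            continue
        count, _address, target = fields
        component, _function = target.split(":", 1)
        totals[component] += int(count)  # assumed to be decimal
    return totals

report = """\
03759647 002889d4 nbench:___unpack_d
01524563 0052681c libgcc_s.so.1:__fpadd_parts
01513159 0015d53c libuClibc-0.9.29.so:___GI_memmove
01197119 00526728 libgcc_s.so.1:___unpack_d
""".splitlines()

for component, total in samples_per_component(report).most_common():
    print(f"{total:>10} {component}")
```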
