Details:
Faster JPEG encoding - files attached
Posted By: Frank Van Hooft
Join Date: 2007-12-11
Location: CANADA
Message ID: 62061
I'd been wanting to speed up JPEG encoding on the Blackfin. Initially I'd been using FFmpeg, and it's fast thanks to the optimisations coded by the ADI folks. But FFmpeg caused me trouble with its constant malloc'ing and free'ing of memory; over time I was getting Out Of Memory failures due to memory fragmentation. So I turned to the libjpeg library, which is much simpler, but slower because it isn't optimised.
The bulk of the time in JPEG encoding is spent in the DCT algorithm. libjpeg ships with 3 different DCT implementations: float, accurate integer and fast integer, all coded in C. I recoded the fast integer routine, jfdctfst.c, in Blackfin assembler. The new file, attached, is jfdctfst_bfin.S, and the new function name is jfdctfst_bfin.
Cycle counts (measured by me on a BF537) are:
Original jfdctfst.c function: 2507 cycles
New jfdctfst_bfin.S function: 1058 cycles
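For reference, those two counts work out to roughly 2.4 times faster per call (one call per 8x8 block), or about 58% fewer cycles. A small sketch of the arithmetic; the helper names are illustrative only and are not part of libjpeg or the attached files:

```c
#include <stdio.h>

/* Cycle counts quoted above, measured on a BF537. */
#define CYCLES_C_VERSION   2507.0
#define CYCLES_ASM_VERSION 1058.0

/* Hypothetical helper: how many times faster the new routine is. */
double speedup(double old_cycles, double new_cycles)
{
    return old_cycles / new_cycles;
}

/* Hypothetical helper: percentage of cycles saved per call. */
double percent_saved(double old_cycles, double new_cycles)
{
    return 100.0 * (old_cycles - new_cycles) / old_cycles;
}

/* Prints both figures for the counts above. */
void print_dct_speedup(void)
{
    printf("speedup: %.2fx, cycles saved: %.1f%%\n",
           speedup(CYCLES_C_VERSION, CYCLES_ASM_VERSION),
           percent_saved(CYCLES_C_VERSION, CYCLES_ASM_VERSION));
}
```

These figures cover the DCT routine alone; the gain for a whole image encode will be smaller, since the rest of the encoder is unchanged.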
Also attached is a slightly edited Makefile containing a rule for building the .S file.
To try this new function, do the following:
1) Copy the two attached files into your libjpeg directory. The Makefile will overwrite the existing Makefile; you may want to make a copy of your original first.
2) Call the new function. I did it by editing the jfdctfst.c file so it calls the jfdctfst_bfin function instead. This might not be the most elegant approach, but it works. Add the following function prototype to the jfdctfst.c file:
void jfdctfst_bfin (DCTELEM *buf);
Then call the new assembler function at the beginning of the jpeg_fdct_ifast function:
jfdctfst_bfin (data);
return;
So now, when your libjpeg application calls the fast integer DCT routine, it's actually using the assembler version.
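Put together, the edit in step 2 looks roughly like the sketch below. To let it compile on a PC, it makes two assumptions that do not apply on the target: DCTELEM is typedef'd locally (in libjpeg it comes from jdct.h, where it is an int for 8-bit samples), and jfdctfst_bfin is given a stand-in C body that only counts calls. On the Blackfin, the stand-in is dropped and the symbol comes from jfdctfst_bfin.S. The function signature is also simplified; libjpeg declares it through its GLOBAL(void) macro.

```c
/* Assumption: stand-in for libjpeg's DCTELEM from jdct.h. */
typedef int DCTELEM;

/* Prototype added to jfdctfst.c, as in step 2. */
void jfdctfst_bfin(DCTELEM *buf);

/* Hypothetical counter, used only by the off-target stand-in below. */
static int bfin_calls = 0;

/* Off-target stand-in so this sketch links on a PC. On the Blackfin,
   delete this body; the real routine lives in jfdctfst_bfin.S. */
void jfdctfst_bfin(DCTELEM *buf)
{
    (void)buf;
    bfin_calls++;
}

/* The fast integer DCT entry point, with the two new lines at the top. */
void jpeg_fdct_ifast(DCTELEM *data)
{
    jfdctfst_bfin(data);   /* hand the 8x8 block to the assembler routine */
    return;                /* skip the original C implementation */

    /* ... original body of jpeg_fdct_ifast stays here, now unreachable ... */
}
```

Leaving the original C body in place below the return makes it easy to switch back: remove the two new lines and the C version runs again.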
Some final notes. I'm not making any guarantees on this code. All I can say is, it works for me. I've tested it as best I can by feeding sample vectors through it, comparing the results to the original .c source, and it appears correct. JPEG images produced also appear correct. I'm sure it's not optimally fast - no doubt a smarter brain than mine can improve on this. But it's a big improvement over the C version - good enough for me. Enjoy.
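For anyone wanting to repeat the sample-vector comparison, here is a minimal sketch of one way to do it, not the exact test used above. It assumes both routines transform a 64-element block in place, fills blocks with pseudo-random values in the range -128..127 (libjpeg level-shifts 8-bit samples before the DCT), and counts coefficients that differ. The function names, the local DCTELEM typedef and the two demo transforms are hypothetical placeholders; on the target you would pass the original C routine and jfdctfst_bfin instead.

```c
#include <stdlib.h>
#include <string.h>

#define BLOCK_ELEMS 64          /* one 8x8 block */

typedef int DCTELEM;            /* assumption: matches libjpeg for 8-bit samples */
typedef void (*fdct_fn)(DCTELEM *block);

/* Runs both routines on `trials` pseudo-random blocks and returns the
   number of output coefficients that differ between them. */
long count_fdct_mismatches(fdct_fn reference, fdct_fn candidate,
                           int trials, unsigned seed)
{
    DCTELEM a[BLOCK_ELEMS], b[BLOCK_ELEMS];
    long mismatches = 0;

    srand(seed);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < BLOCK_ELEMS; i++)
            a[i] = (rand() % 256) - 128;    /* level-shifted sample range */
        memcpy(b, a, sizeof a);             /* identical input for both */

        reference(a);
        candidate(b);

        for (int i = 0; i < BLOCK_ELEMS; i++)
            if (a[i] != b[i])
                mismatches++;
    }
    return mismatches;
}

/* Placeholder transforms, only to exercise the harness off-target. */
void demo_times_two(DCTELEM *b)
{
    for (int i = 0; i < BLOCK_ELEMS; i++)
        b[i] *= 2;
}

void demo_add_self(DCTELEM *b)
{
    for (int i = 0; i < BLOCK_ELEMS; i++)
        b[i] += b[i];
}

/* Deliberately wrong: same as demo_times_two except for element 0. */
void demo_off_by_one(DCTELEM *b)
{
    demo_times_two(b);
    b[0] += 1;
}
```

A return value of zero means the two routines agreed on every coefficient of every block tested; anything else is worth investigating before trusting the output images.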