Code optimization tricks

Discussion and questions about programming with Ultibo.
Brutus
Posts: 22
Joined: Sun Jan 20, 2019 1:24 pm

Postby Brutus » Tue Feb 05, 2019 2:56 am

Hi all,

If you are interested, it might be useful to gather information about code optimisation in a single thread.

From what I've seen, FPC is actually not very clever at discarding useless code, so it makes sense to follow a few rules to avoid generating useless assembly instructions.

For example, when starting out with Ultibo/FPC, I figured out that calling SetLength repeatedly on a dynamic array was very time-consuming (no surprise), so I did intermediate calculations in order to call SetLength as few times as possible.

So I would do things like:

Code:

var
  myLength:Integer;
  myArray: array of byte;
begin
..
myLength += Length(myString);
myLength += 3 + 2 + 1; //Thinking the compiler would optimise this.
..
SetLength(myArray, myLength);


From checking the generated assembly code, I found out that "myLength += 3 + 2 + 1;" generates more code than "myLength += 6;", which suggests that FPC is not very aggressive about performance optimisation...
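
For anyone who wants to repeat the comparison on their own setup, a minimal sketch could look like the program below. It is not the original test code: the program name and the variables a and b are made up for illustration. Compiling it with the -al option keeps the generated assembler file with the source lines as comments, so the code emitted for the two statements can be compared side by side at different optimisation levels.

```pascal
program FoldTest;
{$mode objfpc}
{$COPERATORS ON}   // enables the C-style += operator

var
  a, b: Integer;
begin
  a := 10;
  b := 10;
  a += 3 + 2 + 1;  // constant expression written out
  b += 6;          // same value, folded by hand
  WriteLn(a, ' ', b);
end.
```

Both variables end up with the same value, so any difference is only in the instructions generated for the two additions.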

Fortunately, this is easy to live with: it is usually just a matter of having an idea of what assembly code the compiler will generate behind the scenes.

I will post my future findings in this thread.
pik33
Posts: 829
Joined: Fri Sep 30, 2016 6:30 pm
Location: Poland

Re: Code optimization tricks

Postby pik33 » Tue Feb 05, 2019 5:14 pm

Try setting the optimization level to 3 or 4 instead of the default 2 and check whether anything changes.

This version of FPC for ARM is far from optimal, but you can always insert some asm code in time-critical sections.

Postby Brutus » Wed Feb 06, 2019 3:51 am

pik33 wrote: Try setting the optimization level to 3 or 4 instead of the default 2 and check whether anything changes.

This version of FPC for ARM is far from optimal, but you can always insert some asm code in time-critical sections.


I always set the optimization level to 4 before checking the generated assembly code.

I usually get better performance from the compiler by reducing the number of lines of Pascal code, which generally results in fewer memory accesses and better register use by the compiler.
The only drawback is that it comes at the expense of code readability, which in turn requires more comments.

I agree with you that assembly is the way to go to maximise the use of the registers and reduce memory accesses.
This is where assembly gives a very noticeable increase in performance compared to the code generated by FPC.

Quick question BTW: Do you know if the FPC compiler checks the registers you use during the inline assembler portion and saves/restores them automatically before and after the inline assembly code?

Postby pik33 » Wed Feb 06, 2019 8:18 am

As far as I know, I have to tell the compiler which registers I use on the 'end' line of the asm code. Before I discovered this, I used push/pop.

R11 cannot be used if you want to access Pascal variables in asm code.

Postby Brutus » Wed Feb 06, 2019 10:19 am

pik33 wrote: As far as I know, I have to tell the compiler which registers I use on the 'end' line of the asm code. Before I discovered this, I used push/pop.

R11 cannot be used if you want to access Pascal variables in asm code.


Thanks, this will save me from crashing the system multiple times before finding this out! :D

Postby Brutus » Fri Feb 08, 2019 3:22 am

pik33 wrote: As far as I know, I have to tell the compiler which registers I use on the 'end' line of the asm code. Before I discovered this, I used push/pop.

R11 cannot be used if you want to access Pascal variables in asm code.


I am struggling to find an example of the register declaration on the end line.

Would you mind providing an example?

EDIT: Found this https://ultibo.org/forum/viewtopic.php?f=10&t=306&start=50#p3321 and checked your code in https://github.com/pik33/SimpleAudio/blob/master/simpleaudio.pas, but could only find the push/pop method.
Anyway, this would be good enough for me, considering this is apparently not even required for registers R0 to R3:

Ultibo wrote:ARM Register Conventions

There is a well established set of rules about ARM register usage that all modern compilers obey which makes it easy to know what you can use and what you need to save, here is a simple table that shows the registers and their special purpose aliases:

Code:

 R0 to R10 - General Purpose

 R11 - Frame Pointer (FP)
 R12 - Intra Procedure (IP)
 R13 - Stack Pointer (SP)
 R14 - Link Register (LR)
 R15 - Program Counter (PC)

And here is the summarized set of rules about their usage:

Code:

 R0-R3 are caller-saved
 R4-R11 are callee-saved
 R12 (alias IP) is caller-saved
 R13 (alias SP) is callee-saved
 R14 (alias LR) is caller-saved
 R15 (alias PC) is the program counter, return from the procedure by setting it to the value passed in LR (eg BX LR)

Within a procedure you can safely use any register that is caller-saved without needing to take any special action, if you use a register that is callee-saved then you must save the original value and restore it before returning from the procedure. Beware of one small trap, R14 (the Link Register) is caller-saved but you will also need the value of it to return from the procedure so you must preserve the original value.


Also found this on another forum:

Caller-saved registers (AKA volatile registers) are used to hold temporary quantities that need not be preserved across calls.

For that reason, it is the caller's responsibility to push these registers onto the stack if it wants to restore this value after a procedure call.


Callee-saved registers (AKA non-volatile registers) are used to hold long-lived values that should be preserved across calls.

When the caller makes a procedure call, it can expect that those registers will hold the same value after the callee returns, making it the responsibility of the callee to save them and restore them before returning to the caller.
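
To make the caller-saved/callee-saved distinction concrete, here is a minimal sketch that is not taken from either quoted source. The program name, the function SumBytes and its parameters are made up for illustration. It assumes a 32-bit ARM target in ARM state, where the first two arguments arrive in r0 and r1 and the result is returned in r0. r2 is used freely because it is caller-saved, while r4 is saved on entry and restored on exit together with the return address.

```pascal
program CalleeSavedDemo;
{$mode objfpc}

// Hypothetical example: sums 'count' bytes starting at 'p'.
// Assumption: p arrives in r0, count in r1, result is returned in r0.
function SumBytes(p: PByte; count: cardinal): cardinal; assembler; nostackframe;
asm
  stmfd sp!, {r4, lr}     // r4 is callee-saved: save it (and LR) first
  mov   r4, #0            // r4 = running sum
  cmp   r1, #0
  beq   .LDone            // nothing to add when count = 0
.LLoop:
  ldrb  r2, [r0], #1      // r2 is caller-saved: free to overwrite
  add   r4, r4, r2
  subs  r1, r1, #1
  bne   .LLoop
.LDone:
  mov   r0, r4            // the return value goes in r0
  ldmfd sp!, {r4, pc}     // restore r4 and return via the saved LR
end;

var
  data: array[0..3] of byte = (1, 2, 3, 4);
begin
  // Plain WriteLn is used for brevity; on Ultibo the output would
  // normally go to a console window instead.
  WriteLn(SumBytes(@data[0], Length(data)));
end.
```

If r4 were replaced by r3 throughout, the save and restore of r4 would no longer be needed, which is the practical meaning of "caller-saved" in the rules above.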

Postby pik33 » Fri Feb 08, 2019 8:20 pm

An example from my project (https://github.com/pik33/ultibo_retro_gui):
It seems to be unnecessary here, as r0-r3 can be used without this.
I found this end syntax somewhere in the FPC tutorials this year (2019); the SimpleAudio code is older. I still haven't tested it in most of my procedures, as they work with push/pop and there is no need to change this as long as a procedure does its job fast enough.


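Code:

function avg(b1,count:cardinal):cardinal;

label p101;

// rev 20190202

begin

                 asm
                 ldr r0,b1            // r0 = address of the first byte
                 mov r1,#0            // r1 = running sum
                 ldr r2,count         // r2 = number of bytes left

p101:            ldrb r3,[r0],#1      // load one byte, post-increment the address
                 add r1,r3            // add it to the sum
                 subs r2,#1           // decrement the counter and set the flags
                 bgt p101             // loop while the counter is still above zero

                 str r1,result        // store the sum in the function result
                 end ['r0','r1','r2','r3'];

result:=result div count;             // sum -> average
end;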
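
For comparison, a pure-Pascal counterpart of the same routine might look like the sketch below. It is not part of the project: the names AvgDemo and AvgPascal and the test data are made up for illustration, and it assumes a 32-bit target so that an address fits in a cardinal. It can serve as a baseline when checking whether the asm version is actually faster than what the compiler produces.

```pascal
program AvgDemo;
{$mode objfpc}

// Hypothetical pure-Pascal counterpart of avg: b1 is the address of the
// first byte, count is the number of bytes to average.
function AvgPascal(b1, count: cardinal): cardinal;
var
  p: PByte;
  sum, i: cardinal;
begin
  p := PByte(PtrUInt(b1));   // treat the cardinal as an address (32-bit target assumed)
  sum := 0;
  for i := 1 to count do
  begin
    sum := sum + p^;         // add the current byte
    Inc(p);                  // move to the next byte
  end;
  if count = 0 then
    Result := 0              // guard added in this sketch; the asm version has no such check
  else
    Result := sum div count;
end;

var
  data: array[0..3] of byte = (10, 20, 30, 40);
begin
  WriteLn(AvgPascal(cardinal(PtrUInt(@data[0])), Length(data)));
end.
```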

Postby Brutus » Sat Feb 09, 2019 4:49 am

Thank you very much!

As expected, the register list at the end adds the registers to the push/pop instructions generated by the compiler.

But the interesting bit is (using your code and just modifying the end line):

Code:

end ['r0','r1','r2','r4','r5','r6','r7','r8','r9','r10','r11'];


gives

Code:

stmfd   r13!,{r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r14,r15}


So, as expected, the compiler seems to consider that r0, r1 and r2 don't need to be saved on the stack, but it always saves r3.

I'll try your GitHub project release tonight. It looks very interesting BTW.
