Delays and thermal management

Discussion and questions about programming with Ultibo.
Brutus
Posts: 32
Joined: Sun Jan 20, 2019 1:24 pm

Delays and thermal management

Postby Brutus » Mon Jan 28, 2019 2:24 pm

Hello everyone,

I'm running a thread on a dedicated CPU for datalogging purposes.
This thread waits for an external signal in a while loop that exits when a boolean flips from False to ... True :lol:.

I'm doing this because the processing part of that thread needs to start running with the smallest possible delay, for timing purposes.

The process needs to be executed every 100µs, so I obviously cannot use a "Sleep" inside the waiting loop, since Sleep has a 1ms resolution...

This keeps that CPU running at 100% all the time, which increases the BCM temperature. I'd like to avoid that, both so I don't need a cooling fan in my system's enclosure and so the processor doesn't throttle back its frequency due to over-temperature.

As far as I understand, using "MicrosecondDelay" won't solve that issue, but I was wondering whether some inline assembler (NOPs?) could keep the CPU load from sitting at 100% all the time?

Thanks in advance if you have any idea! :D
Ultibo
Site Admin
Posts: 2255
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Re: Delays and thermal management

Postby Ultibo » Mon Jan 28, 2019 10:58 pm

Brutus wrote:This keeps that CPU running at 100% all the time, which increases the BCM temperature. I'd like to avoid that, both so I don't need a cooling fan in my system's enclosure and so the processor doesn't throttle back its frequency due to over-temperature.

As far as I understand, using "MicrosecondDelay" won't solve that issue, but I was wondering whether some inline assembler (NOPs?) could keep the CPU load from sitting at 100% all the time?

Hello,

You have run into one of the key differences between microcontrollers and microprocessors. In general, the thermal management of a microprocessor is not designed around the processor running continuous loops polling for an external input, and the designers go to great lengths to provide a range of options that can be used to avoid doing exactly that.

There are two immediate possibilities I can think of which might help with this case.

The first is to place the CPU into a low power mode while waiting, using either the WFI or WFE instruction. The scheduler in Ultibo uses WFE by default to put the CPU into a low power state whenever there is nothing else to do. You could call the WaitForEvent() function directly, or you could use MicrosecondDelayEx() and pass True for the Wait parameter, which performs a wait for event internally during each loop.

In either case you will need to be sure that an event occurs often enough to wake your dedicated CPU within the 100us timeframe. An event is triggered whenever an interrupt occurs and also whenever any CPU executes the SEV (SetEvent) instruction. Ultibo executes an SEV every time a spin lock is released, but to be certain you might need to set up a hardware timer interrupt that simply executes SetEvent or SEV at or above the required frequency.
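
As a rough sketch (the Signalled flag below is just a stand-in for however your loop detects the external signal, and the 10us value mirrors your timing requirement), the waiting loop might become:

Code: Select all

 
 {Sketch only, Signalled is a placeholder for your actual condition}
 while not Signalled do
  begin
   {Delay 10us in low power, a WFE is executed internally on each pass}
   {Note: may return later than 10us if no event occurs in time}
   MicrosecondDelayEx(10,True);
  end;
 {Timing critical processing starts here}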

The second option is to change to an interrupt driven model. You don't say what the external signal is, but if it is a GPIO input then you could use an interrupt on the GPIO to trigger your action instead. Alternatively, a hardware timer with a trigger every 100us (or less) could be another way to avoid the continuous loop.
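
For the GPIO case, a minimal sketch might look like the following (the pin number and trigger here are assumptions to adapt, and since the simple GPIOInputEvent() call registers a one shot event, the callback re-registers itself to catch the next trigger):

Code: Select all

 
 {Hypothetical sketch, assumes the signal arrives on GPIO pin 18}
 procedure SignalEvent(Data:Pointer;Pin,Trigger:LongWord);
 begin
  {Do the time critical work here, or wake the dedicated thread}
  
  {Register again to catch the next trigger}
  GPIOInputEvent(GPIO_PIN_18,GPIO_TRIGGER_HIGH,INFINITE,@SignalEvent,nil);
 end;
 
 {During startup, register the first event}
 GPIOInputEvent(GPIO_PIN_18,GPIO_TRIGGER_HIGH,INFINITE,@SignalEvent,nil);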

Let us know if we can offer any more input on how to implement these ideas, or if we have missed the point completely about what you need to achieve.
Ultibo.org | Make something amazing
https://ultibo.org
Brutus
Posts: 32
Joined: Sun Jan 20, 2019 1:24 pm

Re: Delays and thermal management

Postby Brutus » Tue Jan 29, 2019 2:32 am

Ultibo wrote:You have run into one of the key differences between microcontrollers and microprocessors. In general, the thermal management of a microprocessor is not designed around the processor running continuous loops polling for an external input, and the designers go to great lengths to provide a range of options that can be used to avoid doing exactly that.


Thanks for your answers; I will go to bed a bit more knowledgeable tonight... :)

Ultibo wrote:The first is to place the CPU into a low power mode while waiting, using either the WFI or WFE instruction. The scheduler in Ultibo uses WFE by default to put the CPU into a low power state whenever there is nothing else to do. You could call the WaitForEvent() function directly, or you could use MicrosecondDelayEx() and pass True for the Wait parameter, which performs a wait for event internally during each loop.

In either case you will need to be sure that an event occurs often enough to wake your dedicated CPU within the 100us timeframe. An event is triggered whenever an interrupt occurs and also whenever any CPU executes the SEV (SetEvent) instruction. Ultibo executes an SEV every time a spin lock is released, but to be certain you might need to set up a hardware timer interrupt that simply executes SetEvent or SEV at or above the required frequency.


I understand; this means that MicrosecondDelayEx(10, True), for example, will not exit after 10µs if no SEV instruction has occurred. Is that right?

Ultibo wrote:The second option is to change to an interrupt driven model. You don't say what the external signal is, but if it is a GPIO input then you could use an interrupt on the GPIO to trigger your action instead. Alternatively, a hardware timer with a trigger every 100us (or less) could be another way to avoid the continuous loop.


I was using a timed interrupt before, but I changed to a waiting loop for the following reasons:

1/ I wanted my datalogging to be independent from the rest of the software. Now the datalogging part runs on CPUs 2&3, and CPUs 0&1 can be used as a "playground" (OpenVG display, polling sensors...).

2/ I'd like to keep the interrupts for other purposes. I might be wrong, but I seem to remember reading that interrupts should all occur on the same processor.

3/ When using a 1ms interrupt on CPU0, I had nasty glitches during OpenVG path drawing when executing "drawPath" commands (I think it was on the same CPU0).


To give a bit more insight about this datalogger:

-On program startup, I "register" variable pointers in the datalogger queue, then start the datalogging threads (I can set the datalogging interval to whatever frequency I like).

-On CPU2, a thread (the one I was talking about at the beginning of this thread) collects the data from the registered pointers every millisecond (a boolean is set to True when ClockGetTotal reaches the threshold value, roughly as in the sketch after this list) and stores it in memory.

-When a set number of values has been stored (100,000 values per signal after 10 seconds @ 10kHz, for example), the thread on CPU2 sets a boolean that starts the file write process on the thread located on CPU3.
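
In simplified form (the names here are invented for illustration, not my actual code), the pacing check amounts to:

Code: Select all

 
 {Simplified illustration only, names are made up for the example}
 if ClockGetTotal >= NextSample then
  begin
   DataReady:=True; {The boolean that releases the waiting loop}
   Inc(NextSample,SampleInterval); {e.g. 1000 ticks = 1ms at the 1MHz clock}
  end;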


This works extremely well thanks to the wonderful job you've done with Ultibo: there's absolutely no slowdown whatsoever in the CPU2 data collection during a file write on CPU3, and I can do whatever I want on CPU0 and CPU1 while ignoring that everything is being datalogged in the background.

Actually, if it weren't working so well, I would have other worries than the BCM temperature! :D
Ultibo
Site Admin
Posts: 2255
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Re: Delays and thermal management

Postby Ultibo » Tue Jan 29, 2019 10:13 am

Brutus wrote:I understand; this means that MicrosecondDelayEx(10, True), for example, will not exit after 10µs if no SEV instruction has occurred. Is that right?

Yes, that's correct. The reason both WFI and WFE exist is to support power management, and taken to their extreme you can put the entire SoC into a low power mode waiting to be woken by some external trigger. Ultibo doesn't support power management at that level yet (unless you are willing to do some extra work), but you can make use of the wait instructions to help reduce the amount of heat generated by the chip running at full speed.

The simplest way to look at it is that a WFE will only return when either an interrupt occurs or SEV is executed on any CPU. Because we know that the scheduler interrupt on every CPU in Ultibo fires every 500us, we can be certain that the maximum time between events will be 500us. To ensure it is lower, you can set up an interrupt using another hardware timer to trigger at the required rate.
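
As a rough sketch of the idea (Deadline would be an Int64 local, and the 100 tick figure is just an example):

Code: Select all

 
 {Sketch only, wait in low power up to a deadline taken from ClockGetTotal}
 Deadline:=ClockGetTotal + 100; {100 ticks is 100us, the clock runs at 1MHz}
 while ClockGetTotal < Deadline do
  begin
   WaitForEvent; {WFE, wakes on the next interrupt or SEV from any CPU}
  end;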

Brutus wrote:1/ I wanted my datalogging to be independent from the rest of the software. Now the datalogging part runs on CPUs 2&3, and CPUs 0&1 can be used as a "playground" (OpenVG display, polling sensors...).

I can see the logic in that. The dedicated CPU functionality is something many would like to have available in Linux, but it is not easy to achieve.

Brutus wrote:2/ I'd like to keep the interrupts for other purposes. I might be wrong, but I seem to remember reading that interrupts should all occur on the same processor.

The BCM283X SoC routes all normal interrupts (USB, DMA, SPI, I2C, SD etc) to one of the four CPUs (normally CPU0, but it is actually configurable). The other three CPUs can receive certain local interrupts, including a number of mailboxes and a couple of timers. Since Ultibo only uses one timer on each CPU, plus one extra on CPU0 (for the timer queue), there is at least one hardware timer available on each to use for other purposes.

Brutus wrote:3/ When using a 1ms interrupt on CPU0, I had nasty glitches during OpenVG path drawing when executing "drawPath" commands (I think it was on the same CPU0).

That seems a little odd. The scheduler interrupt on each CPU runs at 500us and the timer queue interrupt on CPU0 runs at 1ms without any noticeable impact, and depending on what USB devices are connected it is possible to have in excess of 8,000 interrupts per second in normal operation without any obvious performance degradation. In fact, in testing we have run up to 250,000 interrupts per second, and while that does create a performance impact it doesn't stop the entire system.

If you want to explore timers further to experiment with using WFE for power management, let us know; we probably have some example code to start with.
Ultibo.org | Make something amazing
https://ultibo.org
Brutus
Posts: 32
Joined: Sun Jan 20, 2019 1:24 pm

Re: Delays and thermal management

Postby Brutus » Thu Jan 31, 2019 5:14 am

Ultibo wrote:The simplest way to look at it is that a WFE will only return when either an interrupt occurs or SEV is executed on any CPU. Because we know that the scheduler interrupt on every CPU in Ultibo fires every 500us, we can be certain that the maximum time between events will be 500us. To ensure it is lower, you can set up an interrupt using another hardware timer to trigger at the required rate.


Understood, I'll give it a go when time permits. :)

Ultibo wrote:I can see the logic in that. The dedicated CPU functionality is something many would like to have available in Linux, but it is not easy to achieve.


That's just fantastic to be able to use that in Ultibo.
If this functionality wasn't there, I would never have been able to make my datalogger work as well as it does.

Ultibo wrote:The BCM283X SoC routes all normal interrupts (USB, DMA, SPI, I2C, SD etc) to one of the four CPUs (normally CPU0, but it is actually configurable). The other three CPUs can receive certain local interrupts, including a number of mailboxes and a couple of timers. Since Ultibo only uses one timer on each CPU, plus one extra on CPU0 (for the timer queue), there is at least one hardware timer available on each to use for other purposes.


Thank you very much for the info!

Ultibo wrote:
Brutus wrote:3/ When using a 1ms interrupt on CPU0, I had nasty glitches during OpenVG path drawing when executing "drawPath" commands (I think it was on the same CPU0).

That seems a little odd. The scheduler interrupt on each CPU runs at 500us and the timer queue interrupt on CPU0 runs at 1ms without any noticeable impact, and depending on what USB devices are connected it is possible to have in excess of 8,000 interrupts per second in normal operation without any obvious performance degradation. In fact, in testing we have run up to 250,000 interrupts per second, and while that does create a performance impact it doesn't stop the entire system.


From what I can see on the screen (objects at wrong angles, a crazy resize for one frame every so often), I would tend to think that when the interrupt occurs during pipeline transmission to the GPU, the GPU gets some garbage. But that's just a guess...

I have no idea whether the pipeline transmission to the GPU has to be realtime and shouldn't be interrupted, but one thing is for sure: it happens no matter how long the code in the interrupt takes to execute (I tried with just one line of code, with the same result), and the problem disappears immediately when I stop the interrupt.

Ultibo wrote:If you want to explore timers further to experiment with using WFE for power management let us know, we probably have some example code to start with.


Thank you.
At the moment, I need to find out how I should start a timer interrupt on a specific CPU.
Do I just need to ensure the thread that starts the timer is running on the CPU I want the timer to run on?
Ultibo
Site Admin
Posts: 2255
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Re: Delays and thermal management

Postby Ultibo » Thu Jan 31, 2019 11:00 am

Brutus wrote:From what I can see on the screen (objects at wrong angles, a crazy resize for one frame every so often), I would tend to think that when the interrupt occurs during pipeline transmission to the GPU, the GPU gets some garbage. But that's just a guess...

I have no idea whether the pipeline transmission to the GPU has to be realtime and shouldn't be interrupted, but one thing is for sure: it happens no matter how long the code in the interrupt takes to execute (I tried with just one line of code, with the same result), and the problem disappears immediately when I stop the interrupt.

Could I ask which timer API you were using for this? For example, did you use TimerDeviceEvent() from the Devices unit, and did you pass TIMER_EVENT_FLAG_INTERRUPT when registering the event?

Could you post a small snippet that shows what you tried for the timer setup?

Brutus wrote:At the moment, I need to find out how I should start a timer interrupt on a specific CPU.
Do I just need to ensure the thread that starts the timer is running on the CPU I want the timer to run on?

The main hardware timer devices on the Pi all route their interrupt to the primary CPU (normally CPU0), but each CPU also has a set of generic timers that are local to that CPU and trigger an interrupt only on that CPU.

We don't have a generic driver for these hardware timers at present, but the scheduler on each CPU uses one of them, and the other is free except on CPU0 where it is used for the software timer queue. We can provide a code snippet that shows how to enable this spare timer on CPUs 1 to 3 and register an interrupt handler for it if you like.
Ultibo.org | Make something amazing
https://ultibo.org
Brutus
Posts: 32
Joined: Sun Jan 20, 2019 1:24 pm

Re: Delays and thermal management

Postby Brutus » Fri Feb 01, 2019 2:41 am

Ultibo wrote:Could I ask which timer API you were using for this? For example, did you use TimerDeviceEvent() from the Devices unit, and did you pass TIMER_EVENT_FLAG_INTERRUPT when registering the event?


I did what you mention in your example, Sir! :mrgreen:

Ultibo wrote:Could you post a small snippet that shows what you tried for the timer setup?


Here it is ( :mrgreen: again)

Code: Select all

 
  TimerDeviceSetRate(TimerDeviceGetDefault,1000000);
  TimerDeviceSetInterval(TimerDeviceGetDefault,DataLogInterval); //DataLogInterval was 1000 at the time of the test
  BCM2710ARM_TIMER_FIQ_ENABLED:=True;
  //BCM2710ARM_TIMER_FIQ_ENABLED:=False;
  TimerDeviceStart(TimerDeviceGetDefault);
  TimerDeviceEvent(TimerDeviceGetDefault,TIMER_EVENT_FLAG_REPEAT or TIMER_EVENT_FLAG_INTERRUPT,@TimerDataLogWrite,nil);


Ultibo wrote:The main hardware timer devices on the Pi all route their interrupt to the primary CPU (normally CPU0), but each CPU also has a set of generic timers that are local to that CPU and trigger an interrupt only on that CPU.

We don't have a generic driver for these hardware timers at present, but the scheduler on each CPU uses one of them, and the other is free except on CPU0 where it is used for the software timer queue. We can provide a code snippet that shows how to enable this spare timer on CPUs 1 to 3 and register an interrupt handler for it if you like.


That would really be nice of you! :D
Ultibo
Site Admin
Posts: 2255
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Re: Delays and thermal management

Postby Ultibo » Sat Feb 02, 2019 2:00 am

Here are some simple modifications to the DedicatedCPU example to set up a generic timer interrupt on CPU3.

This is based on the example as published, which targets a Pi2 (it also works on a Pi3 etc), but you can easily make it Pi3 specific by changing BCM2836 to BCM2837 and ARMv7 to ARMv8.

The following changes are applied to the ThreadUnit.

Add the PlatformARMv7 unit to the uses clause

Code: Select all

uses
 ...
 PlatformARMv7;


Add a variable to count the number of times the interrupt is triggered (so we can check if it is working).

Code: Select all

var
 Interrupts:LongWord;


Add an interrupt handler for the generic timer interrupt; this sets up the next timer interrupt and then simply increments a counter to prove it is working. You can add whatever extra work you need to the handler.

Code: Select all

procedure GenericTimerInterrupt(Parameter:Pointer);
var
 Next:LongWord;
 Current:LongInt;
begin
 //Get Timer Value
 Current:=ARMv7GetTimerValue(ARMV7_CP15_C14_CNTV);
 
 //Setup Next
 if Current < 0 then
  begin
   Next:=Current + 1000;
   if LongInt(Next) < 0 then Next:=1000;
  end
 else
  begin
   Next:=1000;
  end; 
 
 //Set Timer Value
 ARMv7SetTimerValue(ARMV7_CP15_C14_CNTV,Next);
 
 //Update Counter
 Inc(Interrupts);
end;


Initialize the generic timer and register the interrupt. This is done at the start of the DedicatedThreadExecute function, but it must happen AFTER the thread has migrated to CPU3; generic timers are only accessible from a thread running directly on that CPU, so a thread on CPU0 cannot configure a generic timer on CPU3.

The generic timers on the Pi run at 1MHz (based on the local peripherals clock), so setting an interval of 1000 gives a 1ms interrupt. I also tried an interval of 100 (a 100us interrupt) and that worked as well.

Code: Select all

function DedicatedThreadExecute(Parameter:Pointer):PtrInt;
var
 State:LongWord;
 ...
begin
 Result:=0;
 
 {Do a loop while we are not on our dedicated CPU}
 ...

 //Create a Generic Timer Interrupt, this must be AFTER switching to CPU3
 //Request the Timer IRQ
 RequestIRQ(CPUGetCurrent,BCM2836_IRQ_LOCAL_ARM_CNTVIRQ,@GenericTimerInterrupt,nil);
 
 //Setup the Generic Timer
 State:=ARMv7GetTimerState(ARMV7_CP15_C14_CNTV);
 State:=State and not(ARMV7_CP15_C14_CNT_CTL_IMASK); {Clear the mask bit}
 State:=State or ARMV7_CP15_C14_CNT_CTL_ENABLE;      {Set the enable bit}
 ARMv7SetTimerState(ARMV7_CP15_C14_CNTV,State);

 //Set Timer Value
 ARMv7SetTimerValue(ARMV7_CP15_C14_CNTV,1000);
 
 ...
 {Continue with the rest of the example}
 

That should provide a useful starting point for experimentation; let us know how you go.
Ultibo.org | Make something amazing
https://ultibo.org
Brutus
Posts: 32
Joined: Sun Jan 20, 2019 1:24 pm

Re: Delays and thermal management

Postby Brutus » Sat Feb 02, 2019 3:19 am

Now THAT is a clear and precise explanation with all the means to implement it! :)

Thank you again for taking the time.

I'll post the results here.

I wish you a nice weekend.
pik33
Posts: 878
Joined: Fri Sep 30, 2016 6:30 pm
Location: Poland
Contact:

Re: Delays and thermal management

Postby pik33 » Sat Feb 02, 2019 7:17 am

A very useful modification.
