FramebufferDeviceWrite with FRAMEBUFFER_TRANSFER_DMA strange results

Poi
Posts: 36
Joined: Mon Jan 07, 2019 11:57 pm

Postby Poi » Sun Feb 24, 2019 7:46 pm

The example program uses several calls to FramebufferDeviceWrite to draw a couple of squares.

The square drawn with no additional flags ends up with the correct color every time.

The square drawn using the FRAMEBUFFER_TRANSFER_DMA flag produces strange results. In the example the drawn square is sometimes the expected red, but most of the time some of the pixels, if not all, end up with what looks like random values.

Code: Select all

program test;

{$mode delphi}{$H+}

uses
   RaspberryPi,
   GlobalConfig,
   GlobalConst,
   GlobalTypes,
   Platform,
   Threads,
   SysUtils,
   Classes,
   Ultibo,
   Devices,
   Framebuffer,
   Console,

   PiTFT35;

var
   FramebufferDevice:PFramebufferDevice;
   FramebufferProperties:TFramebufferProperties;

   buffer: Array of Word;
   colorRed: Word;
   loopIndex: Integer;

begin
   FramebufferDevice:=FramebufferDeviceFindByDescription('Adafruit PiTFT 3.5" LCD');
   FramebufferDeviceGetProperties(FramebufferDevice, @FramebufferProperties);
   FramebufferDeviceRelease(FramebufferDevice);

   Sleep(1000);

   FramebufferProperties.Depth := FRAMEBUFFER_DEPTH_16;

   FramebufferDeviceAllocate(FramebufferDevice, @FramebufferProperties);

   Sleep(1000);

   setlength(buffer, 10);
   colorRed := SwapEndian(Word((31 shl 11) or (0 shl 5) or (0 shl 0)));

   for loopIndex := 0 to 9 do
   begin
      buffer[loopIndex] := colorRed;
   end;

   // DMA transfer
   for loopIndex := 0 to 10 do
   begin
      FramebufferDeviceWrite(FramebufferDevice, 10, 2 + loopIndex, buffer, 10, FRAMEBUFFER_TRANSFER_DMA);
   end;

   // Memory move
   for loopIndex := 0 to 10 do
   begin
      FramebufferDeviceWrite(FramebufferDevice, 30, 2 + loopIndex, buffer, 10, 0);
   end;

   {Halt the thread}
   ThreadHalt(0);
end.
pik33
Posts: 887
Joined: Fri Sep 30, 2016 6:30 pm
Location: Poland

Postby pik33 » Mon Feb 25, 2019 5:51 am

I don't know your exact environment, but when I experimented with drawing on the framebuffer using DMA, the problem was the cache.

The CPU doesn't know that the DMA has drawn something on the framebuffer. If the cache is cleaned later and it still holds framebuffer-related data, that data will overwrite what you have already drawn with DMA, and this looks like garbage on the screen.

There are 2 solutions:

(1) don't touch the framebuffer memory with the CPU at all
(2) clean the CPU cache before the DMA draw (so it will not output garbage later) and invalidate the CPU cache after the DMA draw (so the CPU can read the new values)
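
The overwrite effect described above can be shown with a toy model of a single write-back cache line. This is a sketch in plain Free Pascal, not Ultibo code: Memory, CacheValue, CacheDirty and the three procedures are made up for illustration and only mimic the ordering problem.

```pascal
program CacheModel;

{$mode delphi}

// Toy model of one write-back cache line; all names are hypothetical.
var
   Memory: Word;        // a single framebuffer pixel in RAM
   CacheValue: Word;    // the CPU's cached copy of that pixel
   CacheDirty: Boolean; // True when the cached copy is newer than RAM

procedure CpuWrite(Value: Word);
begin
   CacheValue := Value;
   CacheDirty := True;      // write-back: RAM is not updated yet
end;

procedure DmaWrite(Value: Word);
begin
   Memory := Value;         // DMA goes straight to RAM, bypassing the cache
end;

procedure CleanCache;
begin
   if CacheDirty then
   begin
      Memory := CacheValue; // the dirty line is written back to RAM
      CacheDirty := False;
   end;
end;

begin
   // Case 1: no clean before the DMA draw
   Memory := 0;
   CacheDirty := False;
   CpuWrite($1234);         // the CPU touched the pixel earlier
   DmaWrite($F800);         // DMA draws red
   CleanCache;              // a later write-back lands on top of the DMA result
   WriteLn('no clean first: ', Memory = $F800); // prints FALSE

   // Case 2: clean before the DMA draw
   Memory := 0;
   CacheDirty := False;
   CpuWrite($1234);
   CleanCache;              // stale data goes out before the DMA runs
   DmaWrite($F800);
   CleanCache;              // nothing dirty is left, RAM keeps the DMA result
   WriteLn('clean first: ', Memory = $F800);    // prints TRUE
end.
```

The model leaves out the invalidate step, which only matters when the CPU reads the pixel back afterwards.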
Ultibo
Site Admin
Posts: 2280
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Postby Ultibo » Mon Feb 25, 2019 9:36 am

Poi wrote:The square drawn using the FRAMEBUFFER_TRANSFER_DMA flag produces strange results. In the example the drawn square sometimes is the expected red, but most of the time some of the pixels, if not all will end up with what looks like random values.

We can reproduce the problem using your example, but only just. For us it works correctly pretty much every time and only gives a few random pixels in about 1 out of 20 tries on a Pi B+ (it never seems to fail on a Pi 2 or 3).

Since your result seems to be more consistent, it might help if you can do a couple of tests to narrow down the cause. Remember that the Adafruit PiTFT 3.5" LCD is an SPI device, so there are actually two DMA transfers happening: one to copy the data to the framebuffer and another to send the dirty framebuffer region to the SPI.

The framebuffer API seems to have everything in place to correctly deal with cleaning and invalidating the cache in the appropriate places but since we can reproduce the issue (just) there must be something that is not exactly correct.

Can you try the following two changes to your example and report the results.

First one is to simply clean the cache before calling FramebufferDeviceWrite, something like this should work:

Code: Select all

   ...
 
  for loopIndex := 0 to 9 do
  begin
     buffer[loopIndex] := colorRed;
  end;

  // Add this line
  CleanDataCacheRange(LongWord(buffer), Length(buffer) * SizeOf(Word));
 
  // DMA transfer
  for loopIndex := 0 to 10 do
  begin
     FramebufferDeviceWrite(FramebufferDevice, 10, 2 + loopIndex, buffer, 10, FRAMEBUFFER_TRANSFER_DMA);
  end;
 
  ...
 


The second test is to use a buffer specifically allocated for DMA even though the original one should be perfectly ok.

You need to make a couple of modifications to the example (and remove the change above from the first test)

Code: Select all

   ...

// Define a buffer type
type
   TTestBuffer = array[0..9] of Word;
   PTestBuffer = ^TTestBuffer;

var
   FramebufferDevice:PFramebufferDevice;
   FramebufferProperties:TFramebufferProperties;

   // And declare the buffer as the new type
   buffer: PTestBuffer;
   //buffer: Array of Word;
   ...

begin

   ...

   // Allocate the buffer using DMAAllocateBuffer instead of SetLength
   //setlength(buffer, 10);
   buffer := DMAAllocateBuffer(SizeOf(TTestBuffer));
   colorRed := SwapEndian(Word((31 shl 11) or (0 shl 5) or (0 shl 0)));

   ...



One or both of those tests should give an indication as to what is happening, or at least what to test next.
Ultibo.org | Make something amazing
https://ultibo.org
Poi
Posts: 36
Joined: Mon Jan 07, 2019 11:57 pm

Postby Poi » Mon Feb 25, 2019 10:40 am

Cool thanks, I'll try those things as soon as possible.

I forgot to mention that I am using a Raspberry Pi Zero W, dunno if that would make a difference.
Ultibo
Site Admin
Posts: 2280
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Postby Ultibo » Mon Feb 25, 2019 10:21 pm

Poi wrote:I forgot to mention that I am using a Raspberry Pi Zero W, dunno if that would make a difference.

From a software perspective the Zero W and B+ will look (and behave) almost identical so it shouldn't make any difference to the results.
Poi
Posts: 36
Joined: Mon Jan 07, 2019 11:57 pm

Postby Poi » Mon Feb 25, 2019 11:25 pm

I tried both tests, and it looks like each one solves the problem in different ways.

Now I am a bit confused about CleanDataCacheRange though. I tried using it before posting, because I saw it was used in the Bouncing Boxes example, but I must have been doing something wrong. What is it that needs to be cleaned from the cache? Is it the source data? Or the destination data?

I also tried printing the value of DMA_CACHE_COHERENT and it prints True, like this

Code: Select all

ConsoleWindowWriteLn(TFTHandle, BoolToStr(DMA_CACHE_COHERENT, True));


So I assumed I shouldn't need to call CleanDataCacheRange in the first place.
Ultibo
Site Admin
Posts: 2280
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Postby Ultibo » Tue Feb 26, 2019 10:50 pm

Poi wrote:I tried both tests, and it looks like each one solves the problem in different ways.

Thanks, if your test works consistently with either then that eliminates a lot of things and helps a lot with tracking this down.

Poi wrote:Now I am a bit confused about CleanDataCacheRange though. I tried using it before posting, because I saw it was used in the Bouncing Boxes example, but I must have been doing something wrong. What is that needs to be cleaned from the cache? Is it the source data? Or the destination data?

There are two basic cache operations you can perform (clean and invalidate) and the rules around them are quite simple.

1. For a DMA write operation the source data needs to be cleaned before the transfer.

2. For a DMA read operation the destination data needs to be cleaned before the transfer (see next line) and invalidated after the transfer.

In the second case the clean operation is often forgotten and can create very subtle and hard to track down bugs but the reason for it is simple.

When a block of memory is allocated for use with a DMA read operation, it is possible that the same block was used very recently for other data and that some of that data still resides in cache. The clean before the DMA read forces any modified (dirty) data still in cache to be written out to memory first, so that it cannot be written back later over the data the DMA read delivered.

We also see that many people get confused about the invalidate operation and use it when they should use clean instead. Again, the rules are quite simple.

Clean: Forces modified data in cache that has not been written back to memory (dirty cache lines) to be written out but does not evict the data from the cache.

Invalidate: Evicts data from the cache so that it is no longer present and the next read of that memory will perform a fetch from memory instead of cache. It does not clean the data before eviction so any dirty data will be lost if not cleaned first.
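
The two rules above can be sketched as a small unit. This is only an illustration, assuming the Ultibo Platform unit exports CleanDataCacheRange (used earlier in this thread) and a matching InvalidateDataCacheRange; the unit name, TDmaTransfer, DmaWriteFrom and DmaReadInto are hypothetical names, and the callback stands in for whatever starts the transfer and waits for it to complete.

```pascal
unit DmaCacheRules;

{$mode delphi}

interface

uses
   Platform; // Ultibo unit assumed to export the cache range routines

type
   // Hypothetical callback: starts a DMA transfer and returns when it is done
   TDmaTransfer = procedure(Buffer: Pointer; Size: LongWord);

procedure DmaWriteFrom(Source: Pointer; Size: LongWord; Transfer: TDmaTransfer);
procedure DmaReadInto(Dest: Pointer; Size: LongWord; Transfer: TDmaTransfer);

implementation

procedure DmaWriteFrom(Source: Pointer; Size: LongWord; Transfer: TDmaTransfer);
begin
   // Rule 1: clean the source so memory holds the CPU's latest writes
   CleanDataCacheRange(PtrUInt(Source), Size);
   Transfer(Source, Size);
end;

procedure DmaReadInto(Dest: Pointer; Size: LongWord; Transfer: TDmaTransfer);
begin
   // Rule 2, first half: clean the destination so no dirty line can be
   // written back over the incoming data
   CleanDataCacheRange(PtrUInt(Dest), Size);
   Transfer(Dest, Size);
   // Rule 2, second half: invalidate so the next CPU read fetches from
   // memory instead of a stale cache line
   InvalidateDataCacheRange(PtrUInt(Dest), Size);
end;

end.
```

Note that the invalidate comes after the transfer and is never used in place of the clean, since invalidating dirty lines would discard them.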

Poi wrote:I also tried printing the value of DMA_CACHE_COHERENT and it prints True, like this

Code: Select all

ConsoleWindowWriteLn(TFTHandle, BoolToStr(DMA_CACHE_COHERENT, True));


So I assumed I shouldn't need to call CleanDataCacheRange in the first place.


Your assumption is correct, you shouldn't need CleanDataCacheRange for a DMA transfer on a Pi 1 (A/B/Zero). Now we just need to work out why that isn't the case in reality.
pik33
Posts: 887
Joined: Fri Sep 30, 2016 6:30 pm
Location: Poland

Postby pik33 » Thu Feb 28, 2019 10:46 am

RPi 0/1 has no CPU L2 cache
Ultibo
Site Admin
Posts: 2280
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Postby Ultibo » Thu Feb 28, 2019 11:25 pm

pik33 wrote:RPi 0/1 has no CPU L2 cache

That's only partly true. While the ARM1176JZF CPU used in the BCM2835 doesn't have an inbuilt L2 cache, there is a 128KB L2 cache that is shared by the CPU, GPU and all other bus mastering peripherals in the system.

Ultibo wrote:Your assumption is correct, you shouldn't need CleanDataCacheRange for a DMA transfer on a Pi 1 (A/B/Zero). Now we just need to work out why that isn't the case in reality.


So we've reread the documents from both ARM and Broadcom and come to the conclusion that setting DMA_CACHE_COHERENT to True on the Pi A/B/Zero is not correct in all cases.

As mentioned above there is a shared L2 cache in the BCM2835 that is cache coherent between the CPU, GPU and peripherals (if the correct bus address is used). In addition, the architecture of the SoC is such that memory regions marked with the Shareable attribute are also cache coherent between the CPU and peripherals.

What we haven't properly taken into account is that the L1 data cache in the CPU (16KB) is not coherent with the GPU or other peripherals, except for memory regions marked shareable. So while DMA_CACHE_COHERENT is True for regions of memory with the flag PAGE_TABLE_FLAG_SHARED (which is what DMAAllocateBuffer allocates), it is False for normal regions of memory such as those allocated by GetMem or for variables allocated on the stack. Of course, memory that is marked as non-cached is by its nature also cache coherent, but it comes with a significant performance penalty.
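
In practice that distinction could look something like the sketch below. It assumes the Ultibo Platform unit provides DMAAllocateBuffer and CleanDataCacheRange as used earlier in this thread, and a DMAReleaseBuffer to free the buffer again; the unit name, PrepareForDmaWrite, the IsShared flag and Demo are hypothetical and only there to show which kind of memory needs the explicit clean on a Pi A/B/Zero.

```pascal
unit DmaBufferChoice;

{$mode delphi}

interface

uses
   Platform; // Ultibo unit assumed to export the DMA buffer and cache routines

procedure PrepareForDmaWrite(Buffer: Pointer; Size: LongWord; IsShared: Boolean);
procedure Demo;

implementation

// Hypothetical helper: call after filling Buffer and before starting the DMA
procedure PrepareForDmaWrite(Buffer: Pointer; Size: LongWord; IsShared: Boolean);
begin
   // Shareable memory is already coherent with the peripherals
   if IsShared then Exit;

   // Normal memory (GetMem, SetLength, stack) sits behind the L1 data
   // cache, so dirty lines must be written out by hand
   CleanDataCacheRange(PtrUInt(Buffer), Size);
end;

procedure Demo;
var
   Shared: Pointer;
   Normal: array[0..9] of Word;
begin
   // Shareable buffer: no explicit clean needed
   Shared := DMAAllocateBuffer(SizeOf(Normal));
   if Shared <> nil then
   begin
      FillChar(Shared^, SizeOf(Normal), 0);
      PrepareForDmaWrite(Shared, SizeOf(Normal), True);
      // ... start the DMA transfer from Shared here ...
      DMAReleaseBuffer(Shared); // assumed counterpart of DMAAllocateBuffer
   end;

   // Stack buffer: cleaned before the transfer
   FillChar(Normal, SizeOf(Normal), 0);
   PrepareForDmaWrite(@Normal, SizeOf(Normal), False);
   // ... start the DMA transfer from Normal here ...
end;

end.
```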

Based on this we have decided to adjust the system configuration for the Pi A/B/Zero platform to set DMA_CACHE_COHERENT to False which will cause many of the drivers and APIs to perform cache cleaning and/or invalidation when necessary, we hope that we have designed the system well enough that this will be the only change required. The Pi 2 and 3 platforms already have this setting set to False due to the architectural differences in the quad core design and since most of the code is common to all platforms we think everything should be covered.

We'll undertake some intensive testing of this change to attempt to confirm the functionality and then push an update to GitHub.
Ultibo
Site Admin
Posts: 2280
Joined: Sat Dec 19, 2015 3:49 am
Location: Australia

Postby Ultibo » Tue Mar 26, 2019 10:47 pm

Ultibo wrote:Based on this we have decided to adjust the system configuration for the Pi A/B/Zero platform to set DMA_CACHE_COHERENT to False which will cause many of the drivers and APIs to perform cache cleaning and/or invalidation when necessary, we hope that we have designed the system well enough that this will be the only change required. The Pi 2 and 3 platforms already have this setting set to False due to the architectural differences in the quad core design and since most of the code is common to all platforms we think everything should be covered.

We'll undertake some intensive testing of this change to attempt to confirm the functionality and then push an update to GitHub.


Just to update, after testing across all models we have committed a change that resolves this inconsistency in the reporting of cache coherence in the Pi A/B/Zero. The original example in this post now behaves correctly on all models.