1. 19 Dec, 2018 2 commits
    • fb: fix fast-path blt detection · 034228d7
      Ulrich Sibiller authored
      Backport of this commit:
      
        commit a2880699e8f1f576e1a48ebf25e8982463323f84
        Author: Keith Packard <keithp@keithp.com>
        Date:   Tue Mar 25 08:21:16 2014 -0700
      
          fb: fix fast-path blt detection
      
          The width parameter is used to disable the blit fast-path (memcpy) when
          source and destination rows overlap in memory. This check was added in [0].
      
          Unfortunately, the calculation to determine if source and destination
          lines overlapped was incorrect:
            (1) it converts width from pixels to bytes, but width is actually in
                bits, not pixels.
            (2) it adds this byte offset to dst/srcLine, which implicitly converts
                the offset from bytes to sizeof(FbBits).
      
          Fix both of these by converting addresses to byte pointers and width
          to bytes and doing comparisons on the resulting byte address.
      
          For example:
          A 32-bpp 1366 pixel-wide row will have
            width = 1366 * 32 = 43712 bits
            bpp = 32
            (bpp >> 3) = 4
            width * (bpp >> 3) = 174848 FbBits
            (FbBits *)width => 699392 bytes
      
          So, "careful" was true if the destination line was within 699392 bytes,
          instead of just within its 1366 * 4 = 5464 byte row.
      
          This bug causes us to take the slow path for large non-overlapping rows
          that are "close" in memory.  As a data point, XGetImage(1366x768) on my
          ARM chromebook was taking ~140 ms, but with this fixed, it now takes
          about 60 ms.
            XGetImage() -> exaGetImage() -> fbGetImage() -> fbBlt()
      
          [0] commit e32cc0b4c85c78cd8743a6e1680dcc79054b57ce
          Author: Adam Jackson <ajax@redhat.com>
          Date:   Thu Apr 21 16:37:11 2011 -0400
      
              fb: Fix memcpy abuse
      
              The memcpy fast path implicitly assumes that the copy walks
              left-to-right.  That's not something memcpy guarantees, and newer glibc
              on some processors will indeed break that assumption.  Since we walk a
              line at a time, check the source and destination against the width of
              the blit to determine whether we can be sloppy enough to allow memcpy.
              (Having done this, we can remove the check for !reverse as well.)
      
          v3: Convert to byte units
      
          This first checks to make sure the blt is byte aligned, converts all
          of the data to byte units and then compares for byte address range
          overlap between source and dest.
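
      The byte-unit overlap check described above can be sketched as follows.
      This is an illustrative stand-in, not the actual fbBlt() code: the
      function name rows_may_overlap and the FbBits typedef width are
      assumptions for the example. The key point from the commit is that
      width arrives in bits, so it must be shifted down to bytes and all
      comparisons done on byte addresses.

      ```c
      #include <stddef.h>
      #include <stdbool.h>

      typedef unsigned int FbBits;  /* assumed 32-bit word type, as in fb */

      /* Hypothetical helper, not the actual fbBlt() code: decide whether
       * the memcpy fast path must be disabled because one row of the
       * source and one row of the destination overlap in memory.
       * Precondition (per the commit's v3 note): the blt is byte aligned. */
      static bool
      rows_may_overlap(FbBits *srcLine, FbBits *dstLine, int width_bits)
      {
          unsigned char *src = (unsigned char *) srcLine;
          unsigned char *dst = (unsigned char *) dstLine;
          size_t width_bytes = (size_t) width_bits >> 3;  /* bits -> bytes */

          /* "careful" is true only when the two rows are within one
           * row-width of each other, in either direction. The original
           * bug scaled width by (bpp >> 3) and then let pointer
           * arithmetic on FbBits* scale it again, inflating the range. */
          return (dst >= src && dst < src + width_bytes) ||
                 (src >= dst && src < dst + width_bytes);
      }
      ```

      With the buggy double scaling, a 32-bpp 1366-pixel row produced a
      699392-byte "danger zone" instead of the correct 5464 bytes, which is
      why nearby but non-overlapping rows fell off the fast path.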
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
    • fb: Fix memcpy abuse · 020ef045
      Ulrich Sibiller authored
      Fixes ArcticaProject/nx-libs#750
      
      Backport of this commit:
      
      commit e32cc0b4c85c78cd8743a6e1680dcc79054b57ce
      Author: Adam Jackson <ajax@redhat.com>
      Date:   Thu Apr 21 16:37:11 2011 -0400
      
          fb: Fix memcpy abuse
      
          The memcpy fast path implicitly assumes that the copy walks
          left-to-right.  That's not something memcpy guarantees, and newer glibc
          on some processors will indeed break that assumption.  Since we walk a
          line at a time, check the source and destination against the width of
          the blit to determine whether we can be sloppy enough to allow memcpy.
          (Having done this, we can remove the check for !reverse as well.)
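
      The per-line decision described above can be sketched as below. This is
      an illustrative example, not the actual fb code: blt_line is a
      hypothetical name, and memmove stands in for fb's careful
      word-at-a-time slow path. The point is that memcpy's copy order is
      unspecified, so it is only safe when the source and destination byte
      ranges of a line are disjoint.

      ```c
      #include <string.h>
      #include <stdint.h>
      #include <stdbool.h>

      /* Hypothetical per-line copy: take the memcpy fast path only when
       * this line's source and destination ranges cannot overlap. */
      static void
      blt_line(uint8_t *dst, const uint8_t *src, size_t nbytes)
      {
          /* Two half-open ranges [p, p + nbytes) overlap iff each starts
           * before the other ends. */
          bool overlap = (dst < src + nbytes) && (src < dst + nbytes);

          if (!overlap)
              memcpy(dst, src, nbytes);   /* fast path: ranges disjoint */
          else
              memmove(dst, src, nbytes);  /* stand-in for the careful path */
      }
      ```

      Because the overlap test is direction-agnostic, the separate !reverse
      check becomes unnecessary, as the commit notes.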
      
          On an Intel Core i7-2630QM with an NVIDIA GeForce GTX 460M running in
          NoAccel, the broken code and various fixes for -copywinwin{10,100,500}
          gives (edited to fit in 80 columns):
      
          1: Disable the fastpath entirely
          2: Replace memcpy with memmove
          3: This fix
          4: The code before this fix
      
            1        2                 3                 4                 Operation
            ------   ---------------   ---------------   ---------------   ------------
            258000   269000 (  1.04)   544000 (  2.11)   552000 (  2.14)   Copy 10x10
             21300    23000 (  1.08)    43700 (  2.05)    47100 (  2.21)   Copy 100x100
               960      962 (  1.00)     1990 (  2.09)     1990 (  2.07)   Copy 500x500
      
          So it's a modest performance hit, but correctness demands it, and it's
          probably worth keeping the 2x speedup from having the fast path in the
          first place.
      Signed-off-by: Adam Jackson <ajax@redhat.com>
      Signed-off-by: Keith Packard <keithp@keithp.com>
  2. 05 Feb, 2018 1 commit
  3. 05 Jul, 2016 1 commit
  4. 10 Oct, 2011 1 commit