Fix invalid memory access and optimise Blit_3or4_to_3or4__*
Fix an invalid write at the last pixel of the surface.
It happens when the surface has no padding (pitch == w * bpp) and bpp is 3,
on a plain blit (no colorkey) with NO_ALPHA and the same or inverse RGB triplet order.
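The conditions above can be illustrated with a little arithmetic. This is only a sketch, assuming the invalid write came from a store wider than the 3-byte pixel (the message does not state the width); `store_overrun` is a hypothetical helper, not an SDL function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of SDL): returns how many bytes a store of
 * `store_size` bytes at pixel (x, y) runs past the end of a surface buffer
 * of h * pitch bytes, or 0 when the store fits inside the buffer. */
static size_t store_overrun(size_t w, size_t h, size_t pitch, size_t bpp,
                            size_t x, size_t y, size_t store_size)
{
    assert(x < w && y < h && pitch >= w * bpp);

    size_t buffer_size = h * pitch;
    size_t store_end = y * pitch + x * bpp + store_size;

    return store_end > buffer_size ? store_end - buffer_size : 0;
}
```

With w = 4, h = 2, bpp = 3 and pitch = 12 (no padding), a 4-byte store at the last pixel (x = 3, y = 1) ends one byte past the buffer. The same store is in bounds for every other pixel, and for the last pixel too once the pitch is padded to 16.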
Optimise by using int32 access (speed-up factor, then measurement before -> after):
BGR24 -> ARGB8888 : faster x1.897875 (362405 -> 190953)
RGB24 -> ABGR8888 : faster x1.660416 (363304 -> 218803)
ABGR8888 -> RGB24 : faster x1.686319 (334962 -> 198635)
ARGB8888 -> BGR24 : faster x1.691868 (324524 -> 191814)
BGR24 -> RGB888 : faster x1.678459 (326811 -> 194709)
BGR888 -> RGB24 : faster x1.731772 (327724 -> 189242)
RGB24 -> BGR888 : faster x1.690989 (328916 -> 194511)
RGB888 -> BGR24 : faster x1.698333 (326175 -> 192056)
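A minimal sketch of what "int32 access" can look like for two of the conversions listed above. It is not the code from this change. It assumes BGR24 means bytes B, G, R in memory order and ARGB8888 means a native-endian 32-bit value 0xAARRGGBB, and the function names are hypothetical. The 4-byte side is touched with one 32-bit load or store per pixel, while the 3-byte side is only ever touched byte by byte, so nothing is written past the last pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BGR24 -> ARGB8888: build the pixel in a register, then do one 32-bit store. */
static void bgr24_to_argb8888(const uint8_t *src, uint8_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t b = src[0], g = src[1], r = src[2];
        uint32_t pixel = 0xFF000000u | (r << 16) | (g << 8) | b; /* opaque alpha */
        memcpy(dst, &pixel, sizeof pixel); /* single 32-bit store */
        src += 3;
        dst += 4;
    }
}

/* ARGB8888 -> BGR24: one 32-bit load, then exactly three byte stores. */
static void argb8888_to_bgr24(const uint8_t *src, uint8_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t pixel;
        memcpy(&pixel, src, sizeof pixel); /* single 32-bit load */
        dst[0] = (uint8_t)(pixel & 0xFF);         /* B */
        dst[1] = (uint8_t)((pixel >> 8) & 0xFF);  /* G */
        dst[2] = (uint8_t)((pixel >> 16) & 0xFF); /* R */
        src += 4;
        dst += 3;
    }
}
```

`memcpy` is used for the 32-bit accesses so the sketch stays valid for unaligned pointers; compilers typically lower it to a plain load or store.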
1 file changed