riscv: Improve averaging performance in `png_read_filter_row_avg_rvv`

Replace the two-instruction sequence (vwaddu and vnsrl) with a single
vaaddu instruction for computing the average. With vxrm=2 (rdn, round-down,
which for unsigned operands truncates toward zero), vaaddu produces
results identical to the widening add followed by the narrowing shift,
but in a single operation.

Reviewed-by: Cosmin Truta <ctruta@gmail.com>
Signed-off-by: Cosmin Truta <ctruta@gmail.com>

diff --git a/riscv/filter_rvv_intrinsics.c b/riscv/filter_rvv_intrinsics.c
index a71e561..73fddce 100644
--- a/riscv/filter_rvv_intrinsics.c
+++ b/riscv/filter_rvv_intrinsics.c
@@ -4,6 +4,7 @@
  * Written by Manfred SCHLAEGL, 2022
  * Dragoș Tiselice <dtiselice@google.com>, May 2023.
  * Filip Wasil <f.wasil@samsung.com>, March 2025.
+ * Liang Junzhao <junzhao.liang@spacemit.com>, Nov 2025.
  *
  * This code is released under the libpng license.
  * For conditions of distribution and use, see the disclaimer
@@ -140,11 +141,8 @@
       /* x = *row */
       x = __riscv_vle8_v_u8m1(row, vl);
 
-      /* tmp = a + b */
-      vuint16m2_t tmp = __riscv_vwaddu_vv_u16m2(a, b, vl);
-
-      /* a = tmp/2 */
-      a = __riscv_vnsrl_wx_u8m1(tmp, 1, vl);
+      /* a = (a + b) / 2, round to zero with vxrm = 2 */
+      a = __riscv_vaaddu_vv_u8m1(a, b, 2, vl);
 
       /* a += x */
       a = __riscv_vadd_vv_u8m1(a, x, vl);
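
The equivalence claimed in the commit message can be checked with a small scalar model. The sketch below is not part of the patch. It assumes that one vaaddu lane computes (a + b) >> 1 plus a rounding increment selected by vxrm, as described for averaging adds in the RISC-V vector specification. The helper names avg_widen_narrow, avg_vaaddu_model, and count_mismatches are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence, one lane: widen to 16 bits (vwaddu), then shift right
 * by one and narrow back to 8 bits (vnsrl). */
static uint8_t avg_widen_narrow(uint8_t a, uint8_t b)
{
   uint16_t tmp = (uint16_t)(a + b);
   return (uint8_t)(tmp >> 1);
}

/* Assumed scalar model of one vaaddu lane: the sum is formed without
 * overflow, shifted right by one, and a rounding increment r that
 * depends on vxrm is added. */
static uint8_t avg_vaaddu_model(uint8_t a, uint8_t b, unsigned vxrm)
{
   unsigned sum = (unsigned)a + b;      /* at most 510, fits in 9 bits */
   unsigned bit0 = sum & 1u;            /* the bit shifted out */
   unsigned bit1 = (sum >> 1) & 1u;     /* lowest bit of the result */
   unsigned r;

   switch (vxrm & 3u)
   {
      case 0:  r = bit0; break;                /* rnu: round to nearest, ties up */
      case 1:  r = bit0 & bit1; break;         /* rne: round to nearest, ties to even */
      case 2:  r = 0; break;                   /* rdn: round down (truncate) */
      default: r = bit0 & (bit1 ^ 1u); break;  /* rod: round to odd */
   }
   return (uint8_t)((sum >> 1) + r);
}

/* Count the (a, b) pairs for which the model disagrees with the old
 * widen-and-narrow sequence under the given rounding mode. */
static unsigned count_mismatches(unsigned vxrm)
{
   unsigned a, b, mismatches = 0;

   for (a = 0; a < 256; a++)
      for (b = 0; b < 256; b++)
         if (avg_vaaddu_model((uint8_t)a, (uint8_t)b, vxrm) !=
             avg_widen_narrow((uint8_t)a, (uint8_t)b))
            mismatches++;
   return mismatches;
}
```

Under this model, count_mismatches(2) returns 0, so the rdn mode matches the old sequence for every input pair. The other modes round some odd sums upward and therefore differ: for example, avg_vaaddu_model(1, 2, 0) yields 2 where the old sequence yields 1. This is why the rounding mode has to be given explicitly in the intrinsic call.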