riscv: Improve averaging performance in `png_read_filter_row_avg_rvv`

Replace the widening add and narrowing shift (vwaddu.vv followed by
vnsrl.wx) with a single vaaddu.vv instruction when computing the
average in the Avg filter loop.

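With vxrm = 2 (RDN, round-down, i.e. truncate), vaaddu computes
(a + b) >> 1 without overflowing the 8-bit element width, so it yields
floor((a + b) / 2), exactly what the widening add followed by the
narrowing shift produced. It also avoids the 16-bit (LMUL=2)
intermediate that the old sequence required.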
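As a rough check of that equivalence, the scalar C sketch below models
both sequences for every pair of 8-bit inputs. The helper names are
hypothetical, and the vaaddu model is an assumption based on the RVV
spec's roundoff definition, covering only RNU (vxrm = 0) and RDN
(vxrm = 2). It does not use RVV intrinsics and needs no vector hardware.

```c
#include <stdint.h>

/* Hypothetical scalar model of the removed sequence: widen to 16 bits,
 * add, then narrow with a logical right shift by 1
 * (vwaddu.vv + vnsrl.wx). */
static uint8_t avg_widen_shift(uint8_t a, uint8_t b)
{
   uint16_t tmp = (uint16_t)((uint16_t)a + (uint16_t)b);
   return (uint8_t)(tmp >> 1);
}

/* Hypothetical scalar model of vaaddu: the sum is formed without
 * overflow, shifted right by 1, and the shifted-out bit is rounded
 * according to vxrm. Only RNU (0) and RDN (2) are modelled (assumption). */
static uint8_t avg_vaaddu(uint8_t a, uint8_t b, unsigned vxrm)
{
   unsigned sum = (unsigned)a + (unsigned)b;
   unsigned r = 0;               /* RDN (2): drop the shifted-out bit */

   if (vxrm == 0)
      r = sum & 1u;              /* RNU (0): round half up */

   return (uint8_t)((sum >> 1) + r);
}

/* Count the 8-bit input pairs for which the two models disagree. */
static unsigned count_mismatches(unsigned vxrm)
{
   unsigned mismatches = 0;

   for (unsigned a = 0; a < 256; a++)
      for (unsigned b = 0; b < 256; b++)
         if (avg_widen_shift((uint8_t)a, (uint8_t)b) !=
             avg_vaaddu((uint8_t)a, (uint8_t)b, vxrm))
            mismatches++;

   return mismatches;
}
```

Under these assumptions, count_mismatches(2) returns 0, while
count_mismatches(0) returns 32768 (every pair whose sum is odd), which
is why the rounding-mode argument matters.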

Reviewed-by: Cosmin Truta <ctruta@gmail.com>
Signed-off-by: Cosmin Truta <ctruta@gmail.com>
diff --git a/riscv/filter_rvv_intrinsics.c b/riscv/filter_rvv_intrinsics.c
index a71e561..73fddce 100644
--- a/riscv/filter_rvv_intrinsics.c
+++ b/riscv/filter_rvv_intrinsics.c
@@ -4,6 +4,7 @@
  * Written by Manfred SCHLAEGL, 2022
  *            Dragoș Tiselice <dtiselice@google.com>, May 2023.
  *            Filip Wasil     <f.wasil@samsung.com>, March 2025.
+ *            Liang Junzhao   <junzhao.liang@spacemit.com>, Nov 2025.
  *
  * This code is released under the libpng license.
  * For conditions of distribution and use, see the disclaimer
@@ -140,11 +141,8 @@
       /* x = *row */
       x = __riscv_vle8_v_u8m1(row, vl);
 
-      /* tmp = a + b */
-      vuint16m2_t tmp = __riscv_vwaddu_vv_u16m2(a, b, vl);
-
-      /* a = tmp/2 */
-      a = __riscv_vnsrl_wx_u8m1(tmp, 1, vl);
+      /* a = (a + b) >> 1, rounding down (vxrm = 2, RDN) */
+      a = __riscv_vaaddu_vv_u8m1(a, b, 2, vl);
 
       /* a += x */
       a = __riscv_vadd_vv_u8m1(a, x, vl);