Revert workaround for miscompilation

We have an ugly workaround for miscompilation of a workgroup uniform load (see #199). The miscompilation no longer reproduces, and it's not clear the workaround is still needed. Possibly something changed in naga, or possibly Apple drivers have improved.

It would be possible to go a little further and trim the allocation of the Paths array so it's not rounded up to workgroup size, but the practical benefit of that is marginal.
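For reference, this is a minimal sketch of the `workgroupUniformLoad` broadcast pattern the diff switches to; everything here other than the built-in itself (workgroup size, variable names, the placeholder value) is illustrative, not taken from the real shader:

```wgsl
// One invocation computes a value; workgroupUniformLoad broadcasts it to the
// whole workgroup with the required barrier semantics, replacing the pair of
// storageBarrier() calls used as a workaround.
var<workgroup> sh_tile_offset: u32;

@compute @workgroup_size(256)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    if local_id.x == 0u {
        // In the real shader this is the atomicAdd bump-allocation result.
        sh_tile_offset = 42u;
    }
    // Implies a workgroup barrier; every invocation observes the same value,
    // and the load is required to be uniform across the workgroup.
    let tile_offset = workgroupUniformLoad(&sh_tile_offset);
}
```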

Sending this as a PR to see whether any problems remain.
diff --git a/shader/tile_alloc.wgsl b/shader/tile_alloc.wgsl
index 7b05bdc..1bea320 100644
--- a/shader/tile_alloc.wgsl
+++ b/shader/tile_alloc.wgsl
@@ -93,15 +93,12 @@
         var offset = atomicAdd(&bump.tile, count);
         if offset + count > config.tiles_size {
             offset = 0u;
+            // TODO: should suppress writing, it's a race condition
             atomicOr(&bump.failed, STAGE_TILE_ALLOC);
         }
-        paths[drawobj_ix].tiles = offset;
-    }    
-    // Using storage barriers is a workaround for what appears to be a miscompilation
-    // when a normal workgroup-shared variable is used to broadcast the value.
-    storageBarrier();
-    let tile_offset = paths[drawobj_ix | (WG_SIZE - 1u)].tiles;
-    storageBarrier();
+        sh_tile_offset = offset;
+    }
+    let tile_offset = workgroupUniformLoad(&sh_tile_offset);
     if drawobj_ix < config.n_drawobj {
         let tile_subix = select(0u, sh_tile_count[local_id.x - 1u], local_id.x > 0u);
         let bbox = vec4(ux0, uy0, ux1, uy1);