Advanced

rashdf provides convenience methods for generating Zarr metadata for HEC-RAS HDF5 files. This is particularly useful for working with stochastic ensemble simulations, where many HEC-RAS HDF5 files are generated for different model realizations, forcing scenarios, or other sources of uncertainty.

To illustrate this, consider a set of HEC-RAS HDF5 files stored in an S3 bucket, where each file represents a different simulation of a river model. We can generate Zarr metadata for each simulation and then combine the metadata into a single Kerchunk metadata file that includes a new “sim” dimension. This combined metadata file can then be used to open a single Zarr dataset that includes all simulations.

The cell timeseries output for a single simulation might look something like this:

>>> from rashdf import RasPlanHdf
>>> plan_hdf = RasPlanHdf.open_uri("s3://bucket/simulations/1/BigRiver.p01.hdf")
>>> plan_hdf.mesh_cells_timeseries_output("BigRiverMesh1")
<xarray.Dataset> Size: 66MB
Dimensions:                              (time: 577, cell_id: 14188)
Coordinates:
* time                                 (time) datetime64[ns] 5kB 1996-01-14...
* cell_id                              (cell_id) int64 114kB 0 1 ... 14187
Data variables:
    Water Surface                        (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
    Cell Cumulative Precipitation Depth  (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
Attributes:
    mesh_name:  BigRiverMesh1

Note that the example below requires installation of the optional libraries kerchunk, zarr, fsspec, and s3fs:

from rashdf import RasPlanHdf
from kerchunk.combine import MultiZarrToZarr
import json

# Example S3 URL pattern for HEC-RAS plan HDF5 files
s3_url_pattern = "s3://bucket/simulations/{sim}/BigRiver.p01.hdf"

zmeta_files = []
sims = list(range(1, 11))

# Generate Zarr metadata for each simulation
for sim in sims:
    s3_url = s3_url_pattern.format(sim=sim)
    plan_hdf = RasPlanHdf.open_uri(s3_url)
    zmeta = plan_hdf.zmeta_mesh_cells_timeseries_output("BigRiverMesh1")
    json_file = f"BigRiver.{sim}.p01.hdf.json"
    with open(json_file, "w") as f:
        json.dump(zmeta, f)
    zmeta_files.append(json_file)

# Combine Zarr metadata files into a single Kerchunk metadata file
# with a new "sim" dimension
mzz = MultiZarrToZarr(zmeta_files, concat_dims=["sim"], coo_map={"sim": sims})
mzz_dict = mss.translate()

with open("BigRiver.combined.p01.json", "w") as f:
    json.dump(mzz_dict, f)

Now, we can open the combined dataset with xarray:

import xarray as xr

ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {"fo": "BigRiver.combined.p01.json"},
    },
    chunks="auto",
)

The resulting combined dataset includes a new sim dimension:

<xarray.Dataset> Size: 674MB
Dimensions:                              (sim: 10, time: 577, cell_id: 14606)
Coordinates:
* cell_id                              (cell_id) int64 117kB 0 1 ... 14605
* sim                                  (sim) int64 80B 1 2 3 4 5 6 7 8 9 10
* time                                 (time) datetime64[ns] 5kB 1996-01-14...
Data variables:
    Cell Cumulative Precipitation Depth  (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
    Water Surface                        (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
Attributes:
    mesh_name:  BigRiverMesh1