
Bug: readUCI() renames columns before setting defaults, which creates duplicate columns #201

@rburghol

Description

  • Status: This bug is currently active and is waiting on completion of @timcera's iomanager branch to be fully functional (see code here).
  • Location: See Add update_uci mode in CLI and test UCI files #200
  • Problem: readUCI() renames columns before setting defaults but does not flush to HDF5, which creates duplicate columns.
  • Outcome:
    • If readUCI() is called with its third parameter overwrite=False, duplicate columns get written to the HDF5 file.
    • This appears to cause an index error ("cannot reindex on an axis with duplicate labels") when the UCI is re-read more than once with overwrite=False.
  • Use case: this matters because the import_uci step can be very time-consuming with large WDM inputs. I am working on a command-line function that re-imports only the UCI parameters, leaving the WDM/timeseries data in the original h5 file untouched.

Testing:

Get the branch:

First Import and Run

cd tests/testcbp/HSP2Results
hsp2 import_uci PL3_5250_0001eq.uci PL3_5250_0001eq.h5
hsp2 run PL3_5250_0001eq.h5

2026-01-02 20:35:10.69   Simulation Start: 2001-01-01 00:00:00, Stop: 2002-01-01 00:00:00
2026-01-02 20:35:10.69      RCHRES R001 DELT(minutes): 60
2026-01-02 20:35:18.66         HYDR
2026-01-02 20:35:58.46         ADCALC
2026-01-02 20:35:59.41   Done; Run time is about 01:37.8 (mm:ss)

First Update (longer simulation) and Run

2026-01-02 20:39:48.09   Simulation Start: 1984-01-01 00:00:00, Stop: 2020-01-01 00:00:00
2026-01-02 20:39:48.09      RCHRES R001 DELT(minutes): 60
2026-01-02 20:39:55.57         HYDR
2026-01-02 20:42:14.50         ADCALC
2026-01-02 20:42:17.27   Done; Run time is about 03:24.5 (mm:ss)

2nd Update (back to shorter duration) and Run

hsp2 update_uci PL3_5250_0001eq.uci PL3_5250_0001eq.h5

Traceback (most recent call last):
  File "/usr/local/share/venv/hsp2dev_py10/bin/hsp2", line 7, in <module>
    sys.exit(main())
<snip>
  File "/usr/local/share/venv/hsp2dev_py10/lib/python3.10/site-packages/hsp2/hsp2tools/readUCI.py", line 318, in readUCI
    df.to_hdf(store, key=path, data_columns=True)
<snip>
ValueError: cannot reindex on an axis with duplicate labels
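After a failed update like this, it can help to check every table in the .h5 file in one pass instead of probing keys one at a time. The sketch below is a hypothetical helper (`find_unreadable_keys` is not part of hsp2); it assumes only the standard `pandas.HDFStore` methods `keys()` and `get()`, and it catches any exception so one broken table does not stop the scan.

```python
def find_unreadable_keys(store) -> dict:
    """Try to read every key in an open store and collect the ones that fail.

    `store` is expected to behave like pandas.HDFStore: it needs .keys() and .get(key).
    Returns {key: "ErrorType: message"} for every key whose read raised an exception.
    """
    failures = {}
    for key in store.keys():
        try:
            store.get(key)
        except Exception as exc:  # any read failure counts, such as a KeyError: None
            failures[key] = f"{type(exc).__name__}: {exc}"
    return failures


# Example usage against the test file from this issue (requires PyTables):
#     import pandas as pd
#     with pd.HDFStore("./tests/testcbp/HSP2results/PL3_5250_0001eq.h5", mode="r") as store:
#         for key, err in find_unreadable_keys(store).items():
#             print(key, "->", err)
```

Opening the store with mode="r" keeps the diagnostic from modifying the file.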

Analysis

  • Line 318 is part of the column renaming in the PERLND/SNOW space (it writes the renamed table back to the store).
  • I suspect the cause is the renaming of the PERLND SNOW parameters {"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}, which leaves duplicate columns.
import h5py
import pandas as pd

store = pd.HDFStore(str("./tests/testcbp/HSP2results/PL3_5250_0001eq.h5"), mode='r')
pd.read_hdf(store, "/PERLND/SNOW/STATES")
Traceback (most recent call last):
  File "/usr/local/share/venv/hsp2dev_py10/lib/python3.10/site-packages/pandas/io/pytables.py", line 1819, in _create_storer
    cls = _TABLE_MAP[tt]
KeyError: None
  • But the h5 file itself is valid, as evidenced by other tables being read without problems:
pd.read_hdf(store, "/RCHRES/ACIDPH/STATES")

     OPNID  ACCONC1  ACCONC2  ACCONC3  ACCONC4  ACCONC5  ACCONC6  ACCONC7
R001   NaN      0.0      0.0      0.0      0.0      0.0      0.0      0.0
  • Lines 312-318 in readUCI.py rename some columns in the /PERLND/SNOW/STATES table
  • At line 466, each existing HSP path in the HDF5 file is loaded into the variable df.
  • At line 477, the defaults for the matching table are loaded into dct_params.
    • But for at least some tables, like /PERLND/SNOW/STATES, the default columns still have their old names.
  • In lines 470-483, any missing default columns are added to an updated df for that table, which is then written back to the HDF5 file.
  • For some tables, like /PERLND/SNOW/STATES, this adds the deprecated column names back into the df, alongside the recently renamed versions of those same columns.
  • Then, the next time readUCI() is run with overwrite=False, the renaming step adds duplicate copies of the renamed columns: renaming PKSNOW to PACKF when a PACKF column already exists produces two columns labeled PACKF (likewise for PACKI and PACKW).
  • I think this is the issue, or at least one issue.
  • Below is some slightly edited code from readUCI() that lets one step through this. It is not as efficient as a command-line debugger, but my VSCode has been partially broken since a recent update, and for the time being I can't step into code on Windows.
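The mechanism described in the bullets above can be reproduced with plain pandas, without an HDF5 file. This is a simplified model, assuming (as described above) that the stored /PERLND/SNOW/STATES table is what gets renamed and then defaulted on the next overwrite=False run. `one_pass` is a hypothetical stand-in for those two steps of readUCI(), and the default values are placeholders, not the real ParseTable.csv entries.

```python
import pandas as pd

# Rename map used by readUCI() for the SNOW STATES tables.
SNOW_RENAME = {"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
# Defaults still keyed by the OLD names (placeholder values, for illustration only).
OLD_NAME_DEFAULTS = {"PKSNOW": 0.0, "PKICE": 0.0, "PKWATR": 0.0}


def one_pass(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical model of one readUCI() run: rename, then append missing defaults."""
    df = df.rename(columns=SNOW_RENAME)  # pandas allows rename to create duplicate labels
    missing = {name: val for name, val in OLD_NAME_DEFAULTS.items() if name not in df.columns}
    return pd.concat([df, pd.DataFrame(missing, index=df.index)], axis=1)


uci_table = pd.DataFrame({"PKSNOW": [1.0], "PKICE": [0.5], "PKWATR": [0.1]}, index=["P001"])

first = one_pass(uci_table)  # old names are re-added next to the renamed columns
second = one_pass(first)     # renaming now collides with the existing PACKF/PACKI/PACKW

print(list(first.columns))
print(list(second.columns))
print("duplicates:", second.columns.duplicated().any())

# Any reindex along the duplicated axis now fails, matching the error class in the traceback.
try:
    second.reindex(columns=["PACKF"])
except ValueError as exc:
    print("ValueError:", exc)
```

Each further pass appends another set of old-name defaults, so the table keeps growing as long as the defaults are keyed by the pre-rename names.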

The entire code as a command-line runnable test

"""
Read data from a UCI file and create an HDF file with the data.

Parameters
----------
uciname : str
    The name of the UCI file to read.
hdfname : str
    The name of the HDF file to store the data.
overwrite : bool, optional
    Whether to overwrite existing data in the HDF file. Defaults to True.

Returns
-------
None
"""
import os
from collections import defaultdict

import pandas as pd
import hsp2.hsp2tools as hsp2tools
from hsp2.hsp2tools.readUCI import *
from hsp2.hsp2io import *


# Hard code inputs for example testing
uciname = "./tests/testcbp/HSP2results/PL3_5250_0001eq.uci"
hdfname = "./tests/testcbp/HSP2results/PL3_5250_0001eq.h5"
overwrite = False

# Load needed functions
convert = {"C": str, "I": int, "R": float}


if overwrite is True and os.path.exists(hdfname):
    os.remove(hdfname)

# create lookup dictionaries from 'ParseTable.csv' and 'rename.csv'
parse = defaultdict(list)
defaults = {}
cat = {}
path = {}
hsp_paths = {}
datapath = os.path.join(hsp2tools.__path__[0], "data", "ParseTable.csv")
for row in pd.read_csv(datapath).itertuples():
    parse[row.OP, row.TABLE].append(
        (row.NAME, row.TYPE, row.START, row.STOP, row.DEFAULT)
    )
    defaults[row.OP, row.SAVE, row.NAME] = convert[row.TYPE](row.DEFAULT)
    cat[row.OP, row.TABLE] = row.CAT
    path[row.OP, row.TABLE] = row.SAVE
    # store paths for checking defaults:
    hsp_path = f"/{row.OP}/{row.SAVE}/{row.CAT}"
    if not hsp_path in hsp_paths:
        hsp_paths[hsp_path] = {}
    hsp_paths[hsp_path][row.NAME] = defaults[row.OP, row.SAVE, row.NAME]

rename = {}
extendlen = {}
datapath = os.path.join(hsp2tools.__path__[0], "data", "rename.csv")
for row in pd.read_csv(datapath).itertuples():
    if row.LENGTH != 1:
        extendlen[row.OPERATION, row.TABLE] = row.LENGTH
    rename[row.OPERATION, row.TABLE] = row.RENAME

net = None
sc = None
store = pd.HDFStore(hdfname, mode="a")
info = (store, parse, path, defaults, cat, rename, extendlen)
f = reader(uciname)
for line in f:
    if line.startswith("GLOBAL"):
        global_(info, getlines(f))
    elif line.startswith("OPN"):
        opn(info, getlines(f))
    elif line.startswith("NETWORK"):
        net = network(info, getlines(f))
    elif line.startswith("SCHEMATIC"):
        sc = schematic(info, getlines(f))
    elif line.startswith("MASS-LINK"):
        masslink(info, getlines(f))
    elif line.startswith("FTABLES"):
        ftables(info, getlines(f))
    elif line.startswith("EXT"):
        ext(info, getlines(f))
    elif line.startswith("GENER"):
        gener(info, getlines(f))
    elif line.startswith("PERLND"):
        operation(info, getlines(f), "PERLND")
    elif line.startswith("IMPLND"):
        operation(info, getlines(f), "IMPLND")
    elif line.startswith("RCHRES"):
        operation(info, getlines(f), "RCHRES")
    elif line.startswith("MONTH-DATA"):
        monthdata(info, getlines(f))
    elif line.startswith("SPEC-ACTIONS"):
        specactions(info, getlines(f))

colnames = (
    "AFACTR",
    "MFACTOR",
    "MLNO",
    "SGRPN",
    "SMEMN",
    "SMEMSB",
    "SVOL",
    "SVOLNO",
    "TGRPN",
    "TMEMN",
    "TMEMSB",
    "TRAN",
    "TVOL",
    "TVOLNO",
    "COMMENTS",
)
if not ((net is None) and (sc is None)):
    linkage = pd.concat((net, sc), ignore_index=True, sort=True)
    for cname in colnames:
        if cname not in linkage.columns:
            linkage[cname] = ""
    linkage = linkage.sort_values(by=["TVOLNO"]).replace("na", "")
    linkage.to_hdf(store, key="/CONTROL/LINKS", data_columns=True)

Lapse.to_hdf(store, key="TIMESERIES/LAPSE_Table")
Seasons.to_hdf(store, key="TIMESERIES/SEASONS_Table")
Svp.to_hdf(store, key="TIMESERIES/Saturated_Vapor_Pressure_Table")
keys = set(store.keys())
# rename needed for restart. NOTE issue with line 157 in PERLND SNOW HSPF
# where PKSNOW = PKSNOW + PKICE at start - ONLY
path = "/PERLND/SNOW/STATES"
if path in keys:
    df = pd.read_hdf(store, path)
    df = df.rename(
        columns={"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
    )
    df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/SNOW/STATES"
if path in keys:
    df = pd.read_hdf(store, path)
    df = df.rename(
        columns={"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
    )
    df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/SNOW/FLAGS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SNOPFG" not in df.columns:  # didn't read SNOW-FLAGS table
        df["SNOPFG"] = 0
        df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/SNOW/FLAGS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SNOPFG" not in df.columns:  # didn't read SNOW-FLAGS table
        df["SNOPFG"] = 0
        df.to_hdf(store, key=path, data_columns=True)

# Need to fixup missing data
path = "/IMPLND/IWATER/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "PETMIN" not in df.columns:  # didn't read IWAT-PARM2 table
        df["PETMIN"] = 0.35
        df["PETMAX"] = 40.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/IWTGAS/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/IMPLND/IQUAL/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/PWTGAS/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df["ILIFAC"] = 0.0
        df["ALIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)
    if "SOTMP" not in df.columns:  # didn't read PWT-TEMPS table
        df["SOTMP"] = 60.0
        df["IOTMP"] = 60.0
        df["AOTMP"] = 60.0
        df.to_hdf(store, key=path, data_columns=True)
    if "SODOX" not in df.columns:  # didn't read PWT-GASES table
        df["SODOX"] = 0.0
        df["SOCO2"] = 0.0
        df["IODOX"] = 0.0
        df["IOCO2"] = 0.0
        df["AODOX"] = 0.0
        df["AOCO2"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/PWATER/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "FZG" not in df.columns:  # didn't read PWAT-PARM5 table
        df["FZG"] = 1.0
        df["FZGL"] = 0.1
        df.to_hdf(store, key=path, data_columns=True)

path = "/PERLND/PQUAL/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "SDLFAC" not in df.columns:  # didn't read LAT-FACTOR table
        df["SDLFAC"] = 0.0
        df["SLIFAC"] = 0.0
        df["ILIFAC"] = 0.0
        df["ALIFAC"] = 0.0
        df.to_hdf(store, key=path, data_columns=True)

path = "/RCHRES/GENERAL/INFO"
if path in keys:
    dfinfo = pd.read_hdf(store, path)
    path = "/RCHRES/HYDR/PARAMETERS"
    if path in keys:
        df = pd.read_hdf(store, path)
        df["NEXITS"] = dfinfo["NEXITS"]
        df["LKFG"] = dfinfo["LKFG"]
        if "IREXIT" not in df.columns:  # didn't read HYDR-IRRIG table
            df["IREXIT"] = 0
            df["IRMINV"] = 0.0
        df["FTBUCI"] = df["FTBUCI"].map(lambda x: f"FT{int(x):03d}")
        df.to_hdf(store, key=path, data_columns=True)
    del dfinfo["NEXITS"]
    del dfinfo["LKFG"]
    dfinfo.to_hdf(store, key="RCHRES/GENERAL/INFO", data_columns=True)

path = "/RCHRES/HTRCH/FLAGS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "BEDFLG" not in df.columns:  # didn't read HT-BED-FLAGS table
        df["BEDFLG"] = 0
        df["TGFLG"] = 2
        df["TSTOP"] = 55
        df.to_hdf(store, key=path, data_columns=True)

path = "/RCHRES/HTRCH/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "ELEV" not in df.columns:  # didn't read HEAT-PARM table
        df["ELEV"] = 0.0
        df["ELDAT"] = 0.0
        df["CFSAEX"] = 1.0
        df["KATRAD"] = 9.37
        df["KCOND"] = 6.12
        df["KEVAP"] = 2.24
        df.to_hdf(store, key=path, data_columns=True)

path = "/RCHRES/HTRCH/PARAMETERS"
if path in keys:
    df = pd.read_hdf(store, path)
    if "MUDDEP" not in df.columns:  # didn't read HT-BED-PARM table
        df["MUDDEP"] = 0.33
        df["TGRND"] = 59.0
        df["KMUD"] = 50.0
        df["KGRND"] = 1.4
        df.to_hdf(store, key=path, data_columns=True)

path = "/RCHRES/HTRCH/STATES"
if path in keys:
    df = pd.read_hdf(store, path)
    # if 'TW' not in df.columns:  # didn't read HEAT-INIT table
    #    df['TW']    = 60.0
    #    df['AIRTMP']= 60.0

# apply defaults:
# JUST FOR TESTING WE OVERWRITE hsp_paths to ONLY CONTAIN the SNOW PERLND
path = '/PERLND/SNOW/STATES'
hsp_paths = {path:hsp_paths[path]}

for path in hsp_paths:
    if path in keys:
        df = pd.read_hdf(store, path)
        dct_params = hsp_paths[path]
        
        new_columns = {}
        for par_name in dct_params:
            if par_name == "CFOREA":
                ichk = 0
            
            if par_name not in df.columns:  # missing value in HDF5 path
                def_val = dct_params[par_name]
                if def_val != "None":
                    # df[par_name] = def_val
                    new_columns[par_name] = def_val
        
        new_columns = pd.DataFrame(new_columns, index=df.index)
        df1 = pd.concat([df, new_columns], axis=1)
        
        df1.to_hdf(store, key=path, data_columns=True)
    else:
        if path[-6:] == "STATES":
            # need to add states if it doesn't already exist to save initial state variables
            # such as the case where entire IWAT-STATE1 table is being defaulted
            if not "df" in locals():
                x = 1  # sometimes when debugging keys gets creamed, seems like an IDE bug
            for column in df.columns:  # clear out existing data frame columns
                df = df.drop([column], axis=1)
            dct_params = hsp_paths[path]
            for par_name in dct_params:
                def_val = dct_params[par_name]
                if def_val != "None":
                    df[par_name] = def_val
            df.to_hdf(store, key=path, data_columns=True)

# now see what the hdf has in it (print, since a bare expression shows nothing in a script)
print(pd.read_hdf(store, path))
store.close()
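One possible direction for a fix, offered as a sketch rather than a tested patch: translate the default parameter names through the same rename map before applying defaults, so the old SNOW names are never re-added, and refuse to write a table whose columns are not unique. `SNOW_PATHS`, `defaults_for`, `apply_defaults` and `check_unique_columns` are hypothetical names. The sketch deliberately does not decide which value should win when a table already holds both an old and a new name (for example, an .h5 file written before the fix).

```python
import pandas as pd

SNOW_RENAME = {"PKSNOW": "PACKF", "PKICE": "PACKI", "PKWATR": "PACKW"}
SNOW_PATHS = {"/PERLND/SNOW/STATES", "/IMPLND/SNOW/STATES"}


def defaults_for(path: str, dct_params: dict) -> dict:
    """Return the defaults for an HDF5 path, using post-rename names for SNOW STATES."""
    if path in SNOW_PATHS:
        return {SNOW_RENAME.get(name, name): val for name, val in dct_params.items()}
    return dict(dct_params)


def apply_defaults(df: pd.DataFrame, dct_params: dict) -> pd.DataFrame:
    """Append missing default columns, skipping "None" defaults as readUCI() does."""
    missing = {
        name: val
        for name, val in dct_params.items()
        if name not in df.columns and val != "None"
    }
    return pd.concat([df, pd.DataFrame(missing, index=df.index)], axis=1)


def check_unique_columns(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Raise a descriptive error before to_hdf() if any column label repeats."""
    dupes = df.columns[df.columns.duplicated()].unique().tolist()
    if dupes:
        raise ValueError(f"{path}: duplicate columns {dupes}")
    return df


# Inside the "apply defaults" loop this would replace the existing lines, roughly:
#     dct_params = defaults_for(path, hsp_paths[path])
#     df1 = check_unique_columns(apply_defaults(df, dct_params), path)
#     df1.to_hdf(store, key=path, data_columns=True)
```

With the defaults keyed by the new names, repeated rename-then-default passes leave the SNOW STATES columns unchanged, and the uniqueness check turns any remaining collision into an error that names the offending path and columns instead of the generic reindex message.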
