This issue is closely tied to this discussion, so please read the linked content before continuing.
Examining the data from this query:
SELECT * FROM public.gamit_subnets where "DOY"='180' and "Year"='2022'
Shows interesting behavior:
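import pandas as pd
# Load the exported query result, then inspect the station list of the second row
df = pd.read_csv('/Users/espg/Downloads/gamit_subnets_180_2022.csv')
df.iloc[1].stations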
This outputs the following (note the color highlighting):
$${\color{blue}igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,}$$
$${\color{blue}igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,igs.nklg,igs.pohn,igs.pol2,igs.reun,}$$
$${\color{red}igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,igs.mal2,igs.mobs,igs.nklg,igs.pol2,igs.reun}$$
All of the red entries above are duplicates of stations already listed in the blue highlighting.
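The duplication can also be confirmed programmatically rather than by eye. The following is a minimal sketch, assuming the `stations` field is a plain comma-separated string as displayed above; the helper name `find_duplicates` is hypothetical and not part of the existing codebase.

```python
from collections import Counter

# Station codes copied from the row shown above: the blue block followed by the red block.
blue = ("igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,"
        "igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,"
        "igs.nklg,igs.pohn,igs.pol2,igs.reun")
red = ("igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,"
       "igs.mal2,igs.mobs,igs.nklg,igs.pol2,igs.reun")
stations_field = blue + "," + red


def find_duplicates(field):
    """Return station codes that occur more than once, in first-seen order."""
    codes = [code.strip() for code in field.split(",") if code.strip()]
    counts = Counter(codes)
    # dict.fromkeys keeps one copy of each code while preserving order
    return [code for code in dict.fromkeys(codes) if counts[code] > 1]


duplicates = find_duplicates(stations_field)
print(len(duplicates), duplicates)
```

For this row, the duplicated codes are exactly the eleven entries in the red block.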
For public.gamit_subnets on DOY 180 of 2022, there are 17 listed clusters in the data table, with the first cluster (labeled subnet 0) being the backbone network. That leaves 16 clusters, which correspond to the 16 clusters that make_clusters produces. Since index zero in the postgres data table corresponds to the backbone, the indexing is off by 1; i.e., df.iloc[1].stations corresponds to a[0] and b[0] from a, b = make_clusters(points.T, stations), with "a" and "b" being the clusters dictionary and cluster_ties list respectively.
This is the zeroth entry for cluster stations from the clusters dictionary; note that it is identical to the blue-highlighted text from the public.gamit_subnets table for DOY 180 in 2022:
>>> a['stations'][0]
[array(['igs', 'badg'], dtype='<U4'),
array(['igs', 'cas1'], dtype='<U4'),
array(['igs', 'coco'], dtype='<U4'),
array(['igs', 'daej'], dtype='<U4'),
array(['igs', 'darw'], dtype='<U4'),
array(['igs', 'dumg'], dtype='<U4'),
array(['igs', 'guam'], dtype='<U4'),
array(['igs', 'hob2'], dtype='<U4'),
array(['igs', 'hrao'], dtype='<U4'),
array(['igs', 'iisc'], dtype='<U4'),
array(['igs', 'kiru'], dtype='<U4'),
array(['igs', 'mal2'], dtype='<U4'),
array(['igs', 'mcil'], dtype='<U4'),
array(['igs', 'mobs'], dtype='<U4'),
array(['igs', 'nklg'], dtype='<U4'),
array(['igs', 'pohn'], dtype='<U4'),
array(['igs', 'pol2'], dtype='<U4'),
array(['igs', 'reun'], dtype='<U4')]
Now, this is the output from the cluster_ties list, which is identical to the red-highlighted text from the public.gamit_subnets table for DOY 180 in 2022:
>>> b[0]
[array(['igs', 'cas1'], dtype='<U4'),
array(['igs', 'darw'], dtype='<U4'),
array(['igs', 'dumg'], dtype='<U4'),
array(['igs', 'hob2'], dtype='<U4'),
array(['igs', 'hrao'], dtype='<U4'),
array(['igs', 'kiru'], dtype='<U4'),
array(['igs', 'mal2'], dtype='<U4'),
array(['igs', 'mobs'], dtype='<U4'),
array(['igs', 'nklg'], dtype='<U4'),
array(['igs', 'pol2'], dtype='<U4'),
array(['igs', 'reun'], dtype='<U4')]
Looking at two additional entries from public.gamit_subnets and the clusters dictionary & cluster_ties list confirms the pattern.
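The comparison described above (a table row equals the cluster stations followed by the tie stations) could be automated along these lines. This is only a sketch under assumptions: the helper names `to_codes` and `row_matches_cluster_plus_ties` are hypothetical, the row is assumed to be a comma-separated string, and plain lists stand in for the two-element arrays shown in the output above.

```python
def to_codes(entries):
    """Format [network, station] pairs as 'network.station' strings."""
    return [".".join(str(part) for part in entry) for entry in entries]


def row_matches_cluster_plus_ties(row_field, cluster_entries, tie_entries):
    """Check whether a table row is the cluster stations followed by the tie stations."""
    expected = to_codes(cluster_entries) + to_codes(tie_entries)
    observed = [code.strip() for code in row_field.split(",") if code.strip()]
    return observed == expected


# Shortened illustrative data (not the full row from the table).
cluster = [["igs", "badg"], ["igs", "cas1"], ["igs", "coco"]]
ties = [["igs", "cas1"]]
row = "igs.badg,igs.cas1,igs.coco,igs.cas1"

print(row_matches_cluster_plus_ties(row, cluster, ties))
```

With the real objects, and given the off-by-one indexing noted earlier, the call for cluster `i` would presumably be `row_matches_cluster_plus_ties(df.iloc[i + 1].stations, a['stations'][i], b[i])`.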
Questions
- Was this the case with earlier runs that @eckendrick was doing, such as public.gamit_soln 2022 days 001-008?
  - If not, this might be a bug with these lines that check for repeated tie points on load from the database.
  - If the tie and stations are getting added together inside of GamitSession, we can fix the issue with the code from the previous bullet or similar.
- What is the default and preferred behavior for handling stations, and should subnetwork stations include the tie stations?
  - Reading this comment, it looks like GamitSession currently wants these two data objects (tie points and station clusters) not to overlap.
  - Regardless of what the current default behavior is, we should deliberately decide what behavior makes sense, and whether we want to change it.
  - Having the clusters include the tie stations (or not) will impact other downstream code, such as how subnetwork plots are currently handled.
  - Having the clusters include the tie stations (or not) will also impact the 'check' that's run when determining how large the subnetworks are (should it be the 'base' size of the clusters, or the 'expanded' size that includes the tie points?).
  - @demiangomez my intuition is that it will make more sense to change the behavior in GamitSession than what is set up in pyNetwork.
- Regardless of what the default behavior is or where the tie stations and subnetworks are being double merged, we should be testing for repeats:
  - With unit tests that tell us (and fail submitted PRs) if the control logic needlessly duplicates entries
  - With runtime checks that can detect and remove duplicate stations before the time-consuming numerics
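As a starting point for the last bullet, here is a minimal sketch of a runtime check together with a pytest-style unit test. All names (`drop_overlapping_ties`, `merge_without_repeats`, `test_merge_has_no_repeats`) are hypothetical and not part of GamitSession or pyNetwork, and station codes are assumed to be 'network.station' strings. The sketch does not settle whether clusters should include tie stations; it only guarantees that a merged list has no repeats.

```python
def drop_overlapping_ties(cluster_codes, tie_codes):
    """Return tie codes that are not already in the cluster, without repeats."""
    seen = set(cluster_codes)
    unique_ties = []
    for code in tie_codes:
        if code not in seen:
            seen.add(code)
            unique_ties.append(code)
    return unique_ties


def merge_without_repeats(cluster_codes, tie_codes):
    """Merge cluster and tie stations so each station appears exactly once."""
    # dict.fromkeys removes repeats inside the cluster while preserving order
    base = list(dict.fromkeys(cluster_codes))
    return base + drop_overlapping_ties(base, tie_codes)


def test_merge_has_no_repeats():
    """Unit test (pytest style): fails if merging reintroduces duplicates."""
    cluster = ["igs.badg", "igs.cas1", "igs.coco"]
    ties = ["igs.cas1", "igs.guam"]
    merged = merge_without_repeats(cluster, ties)
    assert len(merged) == len(set(merged))
    assert merged == ["igs.badg", "igs.cas1", "igs.coco", "igs.guam"]


test_merge_has_no_repeats()
```

In this framing, the 'base' subnetwork size is `len(cluster)` and the 'expanded' size is `len(merge_without_repeats(cluster, ties))`, so whichever size the check should use can be computed without counting a station twice.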