The query to get the last update of a dataset can be quite slow, especially once the number of time series in the dataset grow above 90'000.
I wrote an alternative query that seems to perform slightly better, but there probably needs to be an additional index somewhere to make it really go fast.
The existing query is:
-- Get the last time the dataset was updated
--
-- Returns the created_at timestamp of the series in the given dataset
-- that was most recently updated.
CREATE OR REPLACE FUNCTION timeseries.dataset_get_last_update(p_dataset TEXT)
RETURNS TABLE(name TEXT, updated TIMESTAMPTZ)
AS $$
BEGIN
CREATE TEMPORARY TABLE tmp_ts_read_keys
ON COMMIT DROP
AS (
SELECT ts_key
FROM timeseries.catalog AS cat
WHERE set_id = p_dataset
);
RETURN QUERY
SELECT p_dataset AS name, max(ud.updated) AS updated
FROM timeseries.ts_get_last_update() AS ud;
END;
$$ LANGUAGE PLPGSQL
SECURITY DEFINER
SET search_path = timeseries, pg_temp;
-- Get the last time time series were updated
--
-- Returns the created_at of the given time series
CREATE OR REPLACE FUNCTION timeseries.ts_get_last_update()
RETURNS TABLE(ts_key TEXT, updated TIMESTAMPTZ)
AS $$
BEGIN
RETURN QUERY
SELECT
DISTINCT ON(mn.ts_key)
mn.ts_key, mn.created_at AS updated
FROM timeseries.timeseries_main AS mn
JOIN tmp_ts_read_keys AS rd
USING(ts_key)
ORDER BY mn.ts_key, mn.created_at DESC;
END;
$$ LANGUAGE PLPGSQL
SECURITY DEFINER
SET search_path = timeseries, pg_temp;
The query I wrote is the following:
SELECT MAX(updated) updated
FROM (
SELECT
ts_key,
(
SELECT MAX(created_at) updated
FROM timeseries.timeseries_main tm
WHERE tm.ts_key = catalog.ts_key
)
FROM timeseries.catalog
WHERE set_id = '<set_id>'
) AS s;
The timings with the ch.fso.cah.inv dataset, which contains 69'366 time series, is as follows:

The difference in time between the first query and the following ones is probably due to Postgres caching the table in memory after the first read, but the time when used in actual code would probably be closer to the first run.
The timings with the updated query are as follows:

Some data
| Set |
N. series |
Time old |
Time new |
| ch.kof.ivua.ng08 |
120'742 |
First Run: ~270s , Cached: ~270s |
FR: ~49s, C: ~6s |
| ch.kof.ivus.ng08 |
97'595 |
FR: ~245-250s, C: ~245-250s |
FR: ~13s, C: ~4s |
| ch.fso.cah.inv |
69'366 |
FR: ~4.5s, C: ~2.9s |
FR: ~2s, C: ~2s |
| ch.fso.hesta |
4'212 |
FR: ~1.6s, C: ~1.6s |
FR: ~1.2s, C: ~1.2s |
| ch.fso.unemp |
84 |
FR: ~1.2s, C: ~1.1s |
FR: ~1.2s, C: ~1.1s |
The query to get the last update of a dataset can be quite slow, especially once the number of time series in the dataset grow above 90'000.
I wrote an alternative query that seems to perform slightly better, but there probably needs to be an additional index somewhere to make it really go fast.
The existing query is:
The query I wrote is the following:
The timings with the
ch.fso.cah.invdataset, which contains 69'366 time series, is as follows:The difference in time between the first query and the following ones is probably due to Postgres caching the table in memory after the first read, but the time when used in actual code would probably be closer to the first run.
The timings with the updated query are as follows:
Some data