Merged
2 changes: 1 addition & 1 deletion Makefile
@@ -27,7 +27,7 @@ SOURCES = $(DOCNAME).tex

# List of image files to be included in submitted package (anything that
# can be rendered directly by common web browsers)
FIGURES =fig-ext-ids.pdf
FIGURES =fig-ext-ids.pdf voprov_example.png

# List of PDF figures (figures that must be converted to pixel images to
# work in web browsers).
6 changes: 3 additions & 3 deletions data-origin.bib
@@ -52,13 +52,13 @@ @misc{std:registry
Year = {2014}
}

@misc{std:2019ivoa.spec.1021O,
@misc{std:2025ivoa.spec.0116O,
Author= {Francois Ochsenbein et al.},
Organization = {IVOA},
Title = {VOTable Format Definition},
Version = {1.4},
Version = {1.5},
Url = {https://www.ivoa.net/documents/VOTable/},
Year={2019}
Year={2025}
}

@misc{std:2008ivoa.specQ0222P,
72 changes: 32 additions & 40 deletions data-origin.tex
@@ -3,20 +3,19 @@
\lstset{flexiblecolumns=true}
\usepackage{todonotes}
\usepackage{array}
\usepackage{float}
\marginparwidth=4cm

\title{Data Origin in the VO}

% see ivoatexDoc for what group names to use here
\ivoagroup{DCP}

%\author[????URL????]{G.Landais}

\author{G.Landais}
\author{A.Muench}
\author{M.Demleitner}
\author{R.Savalle}
%\author{looking for contributors}
%\author{????Fred Offline????}

\editor{G.Landais}

@@ -28,7 +27,7 @@

\begin{document}
\begin{abstract}
Data Origin in the VO specifies a set of metadata items that define basic
Data Origin in the VO identifies a set of metadata items that define basic
provenance information, as well as their representation in documents produced
by Virtual Observatory (VO) services. This will improve traceability for VO
users, help them to understand result sets and facilitate data reuse and citation.
@@ -49,7 +48,7 @@ \section{Introduction}
The Virtual Observatory (VO) provides an advanced framework to search for, query, and consume astronomical data. The specification of Data Origin proposed here for VOTable output includes both metadata originating at the data producer (e.g., author, space agency, observatory) and at the data centre (publisher) hosting the resource.

At this point, depending on the implementation, users can find the information conveyed in Data Origin in the data centre web pages (landing pages) or in the VO Registry. For citation, the ADS (NASA Astrophysics Data System) offers comprehensive bibliographic capabilities, including the production of BibTeX records for publications known to ADS. However, there are no VO standards to communicate this type of information yet.
%However, there are standards for how to locate these types of information, and often it is not available machine-readably.


A list of basic data origin metadata, reliably findable in a convenient location (i.e.,
the VOTable produced by a query) will help users to properly cite or
@@ -129,9 +128,9 @@ \subsection{Workflow bibliography}

\section{State of the Art}

Neither VOTable \citep{2019ivoa.spec.1021O} nor IVOA data access protocols at this point provide standard facilities for conveying Data Origin information. While protocols such as TAP \citep{2019ivoa.spec.0927D} have standard interfaces to retrieve table metadata (e.g., unit, type and description of columns) or metadata on service endpoints (``capabilities'') by virtue of providing VOSI \citep{2017ivoa.spec.0524G} endpoints, for basic metadata like authors or publication dates, clients have to consult the VO Registry. Even that may be difficult, because you cannot in general obtain its IVOA identifier from a service itself.
Neither VOTable \citep{2025ivoa.spec.0116O} nor IVOA data access protocols at this point provide standard facilities for conveying Data Origin information. While protocols such as TAP \citep{2019ivoa.spec.0927D} have standard interfaces to retrieve table metadata (e.g., unit, type and description of columns) or metadata on service endpoints (``capabilities'') by virtue of providing VOSI \citep{2017ivoa.spec.0524G} endpoints, for basic metadata like authors or publication dates, clients have to consult the VO Registry. Even that may be difficult, because you cannot in general obtain its IVOA identifier from a service itself.

HiPS \citep{2017ivoa.spec.0519F} is a more recent protocol which includes for each dataset a list of standardized metadata. HiPS metadata includes authors, publication year, data centre identifier or licenses.
HiPS \citep{2017ivoa.spec.0519F} is a protocol which includes for each dataset a list of standardized metadata. HiPS metadata includes authors, publication year, data centre identifier or licenses.

\begin{figure}
\centering
@@ -161,7 +160,7 @@ \subsection{Data Origin in IVOA Registry}

The IVOA Registry uses a unique identifier, the IVOID
\citep{2016ivoa.spec.0523D}, as the primary key for its resource
collection. By the above considerations, this IVOID is not suitable as a means of citation.%, because it is a technical identifier with no provisions for persistence today. remove 2025-11-03
collection. By the above considerations, this IVOID is not suitable as a means of citation.

Both the Registry's metadata schema and the DataCite
\citep{std:DataCite40} metadata schema have been
@@ -185,7 +184,7 @@ \subsection{Data Origin and Provenance}
get the main entity from a resource that is the data of origin, typically in one step
(an entity generated by an activity that used another entity as origin).

This mapping is illustrate in a VO Provenance Model \citep{2020ivoa.spec.0411S} in appendix
This mapping is illustrated using the VO Provenance Model \citep{2020ivoa.spec.0411S} in appendix~\ref{sec:appendixD}.

%The Provenance Data Model \citep{2020ivoa.spec.0411S} is based on Entities, Agents and Activities as defined in the W3C Provenance model. The model's main focus is the detailed documentation of workflows.

@@ -259,10 +258,10 @@ \subsection{Query information}
particularly important.


\begin{table}
\begin{table}[h]
\hbox to\textwidth{\hss
\begin{tabular}{|l|>{\raggedright}p{7cm}|l|} \hline
\textbf{\vrule width0pt height 12pt depth 7pt Key} & \textbf{Description} & \textbf{Dublin Core}\\ \hline
\textbf{\vrule width0pt height 12pt depth 7pt Item} & \textbf{Description} & \textbf{Dublin Core}\\ \hline
% removed ivoid & IVOID of underlying data collection & R & \\ \hline
publisher & Data centre that produced the VOTable & publisher\\ \hline
%rename 23-nov-2023 version & Software version (*) & & \\ \hline
@@ -285,14 +284,12 @@ \subsection{Query information}
query part of the URL here.
More complex scenarios like UWS are not covered by this document.}
\end{tabular}\hss}
\caption{\xmlel{INFO} names available for specifying the query that
generated a VOTable}
%\caption{\xmlel{INFO} names available for specifying the query that generated a VOTable}
\caption{\xmlel{INFO} names for specifying the query that generated a VOTable}
\label{tab:query-names}
\end{table}




\subsection{Dataset Origin}
\label{sec:dataset-origin}
Dataset origin complements the query-related information to improve the
@@ -302,14 +299,13 @@ \subsection{Dataset Origin}
must be taken that the in-response metadata reflects the metadata
available there at the time the response is produced.


Table~\ref{tab:origin-names} lists the origin-related metadata items
defined here.

\begin{table}
\begin{table}[!h]
\hbox to\textwidth{\hss
\begin{tabular}{|l|>{\raggedright}p{7cm}|l|} \hline
\textbf{\vrule width0pt height 12pt depth 7pt Key} & \textbf{Description} & \textbf{Dublin Core}\\ \hline
\textbf{\vrule width0pt height 12pt depth 7pt Item} & \textbf{Description} & \textbf{Dublin Core}\\ \hline
% removed 23-nov-2023 publication\_id & Dataset identifier that can be used for citation& M & identifier\\ \hline
data\_ivoid & IVOID of underlying data collection & \\ \hline
ivoid & (deprecated) use data\_ivoid & \\ \hline
@@ -323,8 +319,6 @@ \subsection{Dataset Origin}
resource\_version & Dataset version & \\ \hline
%rename 23-nov-2023 rights & (*) Licence URI & R & rights\\ \hline
rights\_uri & Licence URI (*) & rights\\ \hline
% removed 23-nov-2023 rights\_type & (*) Licence type (eg: CC-by, CC-0, private, public) & & \\ \hline
%rename 23-nov-2023 copyrights & Copyright text & & \\ \hline
rights & Licence or Copyright text & rights\\ \hline
creator & \raggedright The person(s) mainly involved in the
creation of the resource; generally, the author(s)
@@ -342,9 +336,8 @@ \subsection{Dataset Origin}
cites & An Identifier (ivoid, DOI, bibcode) of a resource
being in a ``cites'' (**) relationship to the
originating resource & relation\\ \hline
is\_derived\_from & An Identifier (ivoid, DOI, bibcode) of a resource
being in an ``is\_derived\_from'' (**) relationship
to the originating resourcd & relation\\ \hline
is\_derived\_from & An Identifier (ivoid, DOI, bibcode) of a referenced resource
that was used to produce the current resource (**) & relation\\ \hline
% remove 23-nov-2023
%publication\_date & Date of publication (DALI timestamp) & R & \\ \hline
%resource\_date & Date of the original publication (DALI timestamp) & R & date\\ \hline
@@ -372,22 +365,28 @@ \subsection{VOTable serialization}
providers to describe individual tables.
This is particularly suitable for protocols like Simple Cone Search.

The basic serialization uses INFO tags to populate Data Origin (see the example of a ConeSearch result in appendix \ref{sec:appendixA}).
%The basic serialization uses INFO tags to populate Data Origin (see the example of a ConeSearch result in appendix \ref{sec:appendixA}).
\paragraph{The basic serialization} uses INFO tags to populate Data Origin, using the \xmlel{name} attribute with the items listed in Table~\ref{tab:origin-names} or Table~\ref{tab:query-names} (see the example of a ConeSearch result in appendix \ref{sec:appendixA}).
%uses INFO tags to populate Data Origin items using the attribute 'name' (see the example of a ConeSearch result in appendix \ref{sec:appendixA}).
INFO tags are allowed in VOTable under \xmlel{VOTABLE} or in \xmlel{RESOURCE} elements.
It is expressly allowed to supply data origin in individual
\xmlel{TABLE} or \xmlel{RESOURCE} elements in more complex VOTables.

As a best practice, the global items listed in Table \ref{tab:query-names} should be placed directly at the root of the VOTable document. If a VOTable document contains several resources or tables, the items listed in Table \ref{tab:origin-names} can be placed in their respective resources or tables.
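
The placement recommended above can be sketched on the producer side as follows. This is only an illustration in Python with the standard library, not part of the specification: the helper names `add_info` and `build_votable_skeleton` and all values are hypothetical, and the XML namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def add_info(parent, name, value):
    """Append one INFO element carrying a Data Origin item to `parent`."""
    return ET.SubElement(parent, "INFO", {"name": name, "value": value})


def build_votable_skeleton(query_items, origin_items):
    """Put query-related items at the VOTABLE root and
    dataset origin items inside the RESOURCE they describe.

    Items are (name, value) pairs so that a name may occur more than once.
    """
    root = ET.Element("VOTABLE", {"version": "1.5"})
    for name, value in query_items:
        add_info(root, name, value)
    resource = ET.SubElement(root, "RESOURCE")
    for name, value in origin_items:
        add_info(resource, name, value)
    return root


# Hypothetical values, for illustration only.
root = build_votable_skeleton(
    query_items=[
        ("publisher", "Example Data Centre"),
        ("request_date", "2022-10-30T12:08:00"),
    ],
    origin_items=[
        ("data_ivoid", "ivo://example/catalogue"),
        ("creator", "Doe J."),
    ],
)
xml_text = ET.tostring(root, encoding="unicode")
```

With several resources, the same `add_info` call would be repeated on each \xmlel{RESOURCE} (or \xmlel{TABLE}) element with its own origin items.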

As a service to human readers, it is recommended to put descriptions, possibly derived from definitions provided in this document, into the bodies of the INFO elements.

This specification does not at this point constrain the multiplicities of individual INFO items, and clients should not fail hard if any given INFO item occurs multiple times.
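
On the client side, a tolerant reading of these INFO elements might look like the sketch below. It is a minimal illustration in Python using only the standard library, not a reference implementation; the function names and the sample document are hypothetical. Every value is kept in a list, so an item that occurs several times does not cause a failure.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; values are for illustration only.
SAMPLE = """<VOTABLE version="1.5" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <INFO name="publisher" value="Example Data Centre"/>
  <INFO name="request_date" value="2022-10-30T12:08:00"/>
  <RESOURCE>
    <INFO name="creator" value="Doe J."/>
    <INFO name="creator" value="Roe R."/>
    <INFO name="rights_uri" value="https://spdx.org/licenses/CC0-1.0"/>
  </RESOURCE>
</VOTABLE>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def infos_of(element):
    """Collect the INFO children of `element` as {name: [values, ...]}."""
    found = defaultdict(list)
    for child in element:
        if local_name(child.tag) == "INFO" and child.get("name"):
            found[child.get("name")].append(child.get("value", ""))
    return dict(found)


def collect_data_origin(votable_xml):
    """Return (items at the VOTABLE root, list of items per RESOURCE)."""
    root = ET.fromstring(votable_xml)
    global_items = infos_of(root)
    per_resource = [
        infos_of(el) for el in root if local_name(el.tag) == "RESOURCE"
    ]
    return global_items, per_resource


global_items, per_resource = collect_data_origin(SAMPLE)
```

Keeping lists rather than single values is a design choice that follows from the unconstrained multiplicities; a client that needs one value can decide for itself how to reduce the list.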

Complex queries (for instance, resulting from ADQL JOIN-s) need an advanced output serialization to gather the full metadata of all contributing resources.


\paragraph{Complex queries} (for instance, resulting from ADQL JOIN-s) need an advanced output serialization to gather the full metadata of all contributing resources.
Mechanisms to manage this requirement are being developed in the IVOA
(MIVOT).
The mechanisms defined here are generally still applicable in these
cases, but the authors acknowledge that they are certainly stretched to
their limits in such cases.

As a service to human readers, it is recommended to put descriptions, possibly derived from definitions provided in this document, into the bodies of the INFO elements.


%\section{Data Origin in Registry} REMOVE 2025-11-03
%The VO registry schema, which contains most of the Data Origin information, is completed by metadata described in VOResource \citep{2018ivoa.spec.0625P}.
@@ -401,8 +400,8 @@ \subsection{VOTable serialization}
\section{Appendix, Cone search serialization}\label{sec:appendixA}
Simple Cone Search with its VOTable serialization. Data Origin items are specified using INFO elements.
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.1" xsi:schemaLocation=...>
<VOTABLE version="1.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.3" xsi:schemaLocation=...>

<INFO name="server_protocol" value="ivo://ivoa.net/std/ConeSearch"/>
<INFO name="request_date" value="2022-10-30T12:08:00"
@@ -495,7 +494,7 @@ \section{Appendix, VOResource and Data origin}\label{sec:appendixB}
\paragraph{}
\begin{tabular}{|p{2.8cm}|p{3cm}|p{3cm}|p{4cm}|} \hline
\multicolumn{4}{|l|}{\textbf{content} } \\ \hline
source & article & RelatedIdentifier (*) & The Reference article (bibcode or DOI)\\ \hline
source & article & RelatedIdentifier (*) & A scholarly publication to cite when the data is used (bibcode or DOI)\\ \hline
referenceURL & reference\_url & & The landing page URL\\ \hline
%type & & ResourceType & The Resource type (catalog, etc)\\ \hline
%description & & Description & Usually, it is the abstract\\ \hline
@@ -512,13 +511,8 @@ \section{Appendix, VOResource and Data origin}\label{sec:appendixB}
The rights element accepts free text. However, a machine-readable license (*) is preferable
\\ \hline
URI & rights\_uri& rightsURI & The License URL\\ \hline
% & & rightsIdentifier & The Standard license name .ex CC-by.
% Copyright is accepted by FAIR principle. But copyright is only a link to the data producer. It gives the contact point to any users who would like to use data. Copyright is more simple to implement for data-centre that provides a copy of original resource, but its use is not well integrated in an interoperable workflow.
% \\ \hline
\multicolumn{4}{p{\textwidth}}{\small \footnotesize(*) See SPDX list \url{https://spdx.org/licenses/} or Creative Commons licenses \url{https://creativecommons.org}}
\end{tabular}\\
%\caption{Expected metadata (VOResource) with their equivalent in Datacite schema (version 4.4) to provide Data Origin in the registry.}
%\end{table}


%%\textbf{Examples}
@@ -581,7 +575,8 @@ \section{Appendix, Citation Template} \label{sec:appendixC}

\section{Appendix, DataOrigin and ProvDM}\label{sec:appendixD}
This is an example of Provenance extracted from a SimpleConeSearch result containing DataOrigin.
The figure is a graphical representation of provenance created using the VOPROV Python package.
The figure is a graphical representation of provenance created using the VOPROV Python package\footnote{\url{https://github.com/sanguillon/voprov}}.

\begin{figure}[htbp]
\includegraphics[width=1.2\textwidth]{voprov_example.png}
@@ -591,10 +586,7 @@ \section{Appendix, DataOrigin and ProvDM}\label{sec:appendixD}

\section{Appendix, Changes from Previous Versions}

%No previous versions yet.
% these would be subsections "Changes from v. WD-..."
% Use itemize environments.
%\subsection{Data Origin in the VO Version 1.0}

\subsection{Difference between versions 1.1 and 1.2}
\begin{itemize}
\item New item: \textit{service\_ivoid}
Binary file modified voprov_example.png