diff --git a/README.md b/README.md
index 5be3d4c..7af5b93 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-
Welcome to gitlab2prov! ๐
+ gitlab2prov, github2prov: (๐ฆ|๐โโฌ) โ ๐
@@ -30,12 +30,12 @@
-> `gitlab2prov` is a Python library and command line tool that extracts provenance information from GitLab projects.
+> `gitlab2prov` is a Python library and command line tool that extracts provenance information from GitLab projects. GitHub support is provided by the `github2prov` command line tool contained in this package.
---
-The `gitlab2prov` data model has been designed according to [W3C PROV](https://www.w3.org/TR/prov-overview/) specification.
-The model documentation can be found [here](https://github.com/DLR-SC/gitlab2prov/tree/master/docs).
+The data model underlying `gitlab2prov` & `github2prov` has been designed according to the [W3C PROV](https://www.w3.org/TR/prov-overview/) specification.
+The model documentation can be found [here](/docs/README.md).
## Installation
@@ -57,14 +57,27 @@ pip install .[dev] # clone repo, install with extras
pip install gitlab2prov[dev] # PyPi, install with extras
```
+That's it! You can now use `gitlab2prov` and `github2prov` from the command line.
+
+```bash
+gitlab2prov --version # show version
+github2prov --version # show version
+```
+
+
## ⚡ Getting started
-`gitlab2prov` needs a [personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) to clone git repositories and to authenticate with the GitLab API.
-Follow [this guide](./docs/guides/tokens.md) to create an access token with the required [scopes](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#personal-access-token-scopes).
+`gitlab2prov` & `github2prov` require a [personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) to clone git repositories and to authenticate with the GitLab/GitHub API.
+
+Use the following guides to obtain a token with the required [scopes](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#personal-access-token-scopes):
+- [Create a personal access token (GitLab)](./docs/guides/gitlab-token.md)
+- [Create a personal access token (GitHub)](./docs/guides/github-token.md)
## Usage
+The usage of `gitlab2prov` and `github2prov` is identical. The only difference is that `github2prov` supports only GitHub projects, whereas `gitlab2prov` supports only GitLab projects. We will use `gitlab2prov` in the following examples.
+
`gitlab2prov` can be configured using the command line interface or by providing a configuration file in `.yaml` format.
### Command Line Usage
@@ -83,66 +96,85 @@ Options:
--help Show this message and exit.
Commands:
- combine Combine multiple graphs into one.
- extract Extract provenance information for one or more...
- load Load provenance files.
- merge-duplicated-agents Merge duplicated agents based on a name to...
- pseudonymize Pseudonymize a provenance graph.
- save Save provenance information to a file.
- stats Print statistics such as node counts and...
+ combine Combine one or more provenance documents.
+ extract Extract provenance information for one or more gitlab projects.
+ read Read provenance information from file[s].
+ stats Print statistics for one or more provenance documents.
+ transform Apply a set of transformations to provenance documents.
+ write Write provenance information to file[s].
```
### Configuration Files
`gitlab2prov` supports configuration files in `.yaml` format that are functionally equivalent to command line invocations.
-To read configuration details from a file instead of specifying on the command line, use the `--config` option:
+To invoke a run using a config file, use the `--config` option:
```ini
-# initiate a run using a config file
+# run gitlab2prov using the config file 'config/example.yaml'
gitlab2prov --config config/example.yaml
```
-You can validate your config file using the provided JSON-Schema `gitlab2prov/config/schema.json` that comes packaged with every installation:
+You can validate your config file using the provided [JSON Schema file](gitlab2prov/config/schema.json) that comes packaged with every installation:
```ini
-# check config file for syntactical errors
+# validate config file 'config/example.yaml' against the JSON Schema
gitlab2prov --validate config/example.yaml
```
-Config file example:
+Here is an example config file that extracts provenance information from five GitLab projects across three `extract` steps, reads a serialized provenance document from a file, combines the resulting provenance documents, transforms the combined document, and writes it to files in different formats. Finally, statistics about the generated output are printed to the console:
```yaml
- extract:
- url: ["https://gitlab.com/example/foo"]
- token: tokenA
+ url:
+ - "https://gitlab.com/aristotle/nicomachean-ethics"
+ - "https://gitlab.com/aristotle/poetics"
+ token: golden_mean_and_drama_token
- extract:
- url: ["https://gitlab.com/example/bar"]
- token: tokenB
-- load:
- input: [example.rdf]
-- pseudonymize:
+ url:
+ - "https://gitlab.com/plato/the-republic"
+ - "https://gitlab.com/plato/phaedrus"
+ token: ideal_forms_and_speech_token
+- extract:
+ url: ["https://gitlab.com/socrates/apology"]
+ token: know_thyself_token
+- read:
+ input: [aristotelian_logic.rdf]
- combine:
-- save:
- output: combined
- format: [json, rdf, xml, dot]
+- transform:
+ use_pseudonyms: true
+ remove_duplicates: true
+- write:
+ output: philosopher_outputs
+ format: [json, rdf, xml, dot]
- stats:
- fine: true
- explain: true
- formatter: table
+ fine: true
+ explain: true
+ format: table
```
The config file example is functionally equivalent to this command line invocation:
```
-gitlab2prov extract -u https://gitlab.com/example/foo -t tokenFoo \
- extract -u https://gitlab.com/example/bar -t tokenBar \
- load -i example.rdf \
- pseudonymize \
- combine \
- save -o combined -f json -f rdf -f xml -f dot \
- stats --fine --explain --formatter table
+gitlab2prov \
+ extract \
+ --url https://gitlab.com/aristotle/nicomachean-ethics \
+ --url https://gitlab.com/aristotle/poetics \
+ --token golden_mean_and_drama_token \
+ extract \
+ --url https://gitlab.com/plato/the-republic \
+ --url https://gitlab.com/plato/phaedrus \
+ --token ideal_forms_and_speech_token \
+ extract \
+ --url https://gitlab.com/socrates/apology --token know_thyself_token \
+ read --input aristotelian_logic.rdf \
+ combine \
+ transform --use_pseudonyms --remove_duplicates \
+ write --output philosopher_outputs \
+ --format json --format rdf --format xml --format dot \
+ stats --fine --explain --format table
+
```
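The correspondence between the two forms can be sketched in a few lines of Python. This is an illustrative sketch only, not how `gitlab2prov` parses its configuration: the function name `config_to_args` is hypothetical, and the mapping rules (list values repeat the option, `true` becomes a bare flag, an empty step has no options) are inferred from the example above.

```python
from typing import Any, Optional


def config_to_args(steps: list[dict[str, Optional[dict[str, Any]]]]) -> list[str]:
    """Flatten a list of config steps into a command-line argument list.

    Hypothetical helper: the rules below mirror the README example only.
    """
    args = ["gitlab2prov"]
    for step in steps:
        for command, options in step.items():
            args.append(command)
            # A step such as `combine:` has no options (parsed as None).
            for key, value in (options or {}).items():
                flag = f"--{key}"
                if isinstance(value, bool):
                    # Booleans become bare flags; false values are omitted.
                    if value:
                        args.append(flag)
                elif isinstance(value, list):
                    # List values repeat the option once per item.
                    for item in value:
                        args += [flag, str(item)]
                else:
                    args += [flag, str(value)]
    return args


# The config is written here as plain Python data instead of YAML.
steps = [
    {"extract": {"url": ["https://gitlab.com/socrates/apology"], "token": "know_thyself_token"}},
    {"read": {"input": ["aristotelian_logic.rdf"]}},
    {"combine": None},
    {"transform": {"use_pseudonyms": True, "remove_duplicates": True}},
    {"write": {"output": "philosopher_outputs", "format": ["json", "rdf"]}},
    {"stats": {"fine": True, "explain": True, "format": "table"}},
]
print(" ".join(config_to_args(steps)))
```

Keeping the steps as an ordered list matters in this sketch because the commands are chained in the order in which they appear.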
### 🎨 Provenance Output Formats
-`gitlab2prov` supports output formats that the [`prov`](https://github.com/trungdong/prov) library provides:
+`gitlab2prov` & `github2prov` support all output formats that the [`prov`](https://github.com/trungdong/prov) library provides:
* [PROV-N](http://www.w3.org/TR/prov-n/)
* [PROV-O](http://www.w3.org/TR/prov-o/) (RDF)
* [PROV-XML](http://www.w3.org/TR/prov-xml/)
@@ -201,7 +233,7 @@ You can also cite specific releases published on Zenodo: [ | [](https://opensource.org/licenses/BSD-3-Clause) |
| [click](https://github.com/pallets/click) | [](https://opensource.org/licenses/BSD-3-Clause) |
| [python-gitlab](https://github.com/python-gitlab/python-gitlab) | [](https://www.gnu.org/licenses/lgpl-3.0) |
diff --git a/config/example.yaml b/config/example.yaml
index 6e78e6a..d3f8eaf 100644
--- a/config/example.yaml
+++ b/config/example.yaml
@@ -1,18 +1,27 @@
# yaml-language-server: $schema=../gitlab2prov/config/schema.json
- extract:
- url: ["https://gitlab.com/example/foo"]
- token: tokenFoo
+ url:
+ - "https://gitlab.com/aristotle/nicomachean-ethics"
+ - "https://gitlab.com/aristotle/poetics"
+ token: golden_mean_and_drama_token
- extract:
- url: ["https://gitlab.com/example/bar"]
- token: tokenBar
-- load:
- input: [example.rdf]
-- pseudonymize:
+ url:
+ - "https://gitlab.com/plato/the-republic"
+ - "https://gitlab.com/plato/phaedrus"
+ token: ideal_forms_and_speech_token
+- extract:
+ url: ["https://gitlab.com/socrates/apology"]
+ token: know_thyself_token
+- read:
+ input: [aristotelian_logic.rdf]
- combine:
-- save:
- output: combined
- format: [json, rdf, xml, dot]
+- transform:
+ use_pseudonyms: true
+ remove_duplicates: true
+- write:
+ output: philosopher_outputs
+ format: [json, rdf, xml, dot]
- stats:
- fine: true
- explain: true
- formatter: table
\ No newline at end of file
+ fine: true
+ explain: true
+ format: table
\ No newline at end of file
diff --git a/docs/README.md b/docs/README.md
index 52cb91b..9c32bba 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -10,8 +10,7 @@ pip install -r requirements.txt
```
## The GitLab2PROV Provenance Model
-The provenance model for GitLab2PROV provenance graphs consists of multiple submodels, that are concerned with various types of interactions that users can have with a GitLab project aswell as with the `git` repository contained within the project.
-A few models have been compiled without prior examples others are derivations of related projects such as `git2prov` or `github2prov`.
+The GitLab2PROV provenance model comprises several submodels that address different user interactions with a GitLab project and its corresponding git repository. Some submodels have been developed without existing examples, while others are adaptations of related projects like [`git2prov`](https://github.com/IDLabResearch/Git2PROV) or [`github2prov`](https://www.usenix.org/system/files/tapp2019-paper-packer.pdf).
In total, the GitLab2PROV provenance model includes the following submodels:
@@ -23,12 +22,11 @@ In total, the GitLab2PROV provenance model includes the following submodels:
6. **GitLab: Merge Request Web Resource**
7. **GitLab: Release & Tag Resource**
-This document contains a brief explanation for each model.
-This includes, but is not limited to, a reference table for each PROV element of a model that defines which attributes are attached to the element.
-Reference tables for qualified relations, i.e. relations that with attached attributes, are also provided.
+This document provides a concise explanation for each model, including a reference table for each PROV element that defines the attributes attached to the element.
+The reference tables for qualified relations, which are relations with attached attributes, are also included.
-This document uses the Cypher query language notation to denote relationships/relations.
-The following ASCII art based notation represents a directed relation `r` of type `R` between the vertices `S` and `T`.
+This document uses the notation of the Cypher query language to denote relations.
+In this ASCII art-based notation, the following expression represents a directed relation `r` of type `R` between the vertices `S` and `T`.
`(S)-[r:R]->(T)`
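
To make the parts of the notation explicit, the expression can be split into its four components with a regular expression. This is a minimal sketch for illustration, not part of the package: the names `Relation` and `parse_relation` are hypothetical, and the pattern covers only the simple directed form shown above, not the full Cypher grammar.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    source: str  # vertex S, where the relation starts
    name: str    # relation variable r (may be empty)
    kind: str    # relation type R
    target: str  # vertex T, where the relation points

# Matches only the simple form "(S)-[r:R]->(T)".
PATTERN = re.compile(
    r"\((?P<source>[^()]+)\)-\[(?P<name>\w*):(?P<kind>\w+)\]->\((?P<target>[^()]+)\)"
)


def parse_relation(text: str) -> Relation:
    """Split a directed relation into source, name, type and target."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a directed relation: {text!r}")
    return Relation(**match.groupdict())


print(parse_relation("(S)-[r:R]->(T)"))
```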
@@ -36,52 +34,65 @@ The following ASCII art based notation represents a directed relation `r` of typ

-This model captures the addition of a new file to the git repository of a GitLab project by a git commit.
+This model captures the addition of a new file to a GitLab project's git repository by a git commit.
-The model includes all human actors involved in the process.
-In this case these actors are the author and the committer of the git commit represented as agents in the model.
-The author represents the user that originally wrote the code contained in the commit.
-The committer represents the user that committed the code on behalf of the author.
-Committer and author can be the same person but do not have to be.
+The model includes all human actors involved in the process.
+These actors, the author and the committer of the git commit, are represented as agents in the model.
+The author represents the user that originally wrote the code contained in the commit.
+The committer represents the user that committed the code on behalf of the author.
+The committer and author can be the same person but do not have to be.
-The commit aswell as all of its parents are captured as activities.
-Each commit is said to be informed by its parent commit, as each commit builds upon the git repository that the parent commits left behind.
-The commit is associated to both author and committer as these are the actors responsible for its existance.
+The commit and all of its parents are captured as activities.
+Each commit is informed by its parent commit.
+The commit is associated with both the author and committer as these are the actors responsible for its existence.
-Two entities are created for the file that was added in the commit.
-One, the File entity, represents the origin of the added file aswell as the concept of its originality.
-The second entity, called FileRevision, represents the added file at the time of its addition.
-The revision are marked as a specialization of the file origin.
-Both entities are generated by the commit activity.
-Both entities are attributed to the author of the commit, the actor responsible for their content and creation.
+Two entities are created for the added file.
+The File entity represents the origin of the added file as well as the concept of its originality.
+The second entity, called FileRevision, represents the added file at the time of its addition.
+The revision is marked as a specialization of the file origin.
+Both entities are generated by the commit activity and are attributed to the author of the commit, the actor responsible for their content and creation.
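
The relations described in the paragraphs above can be listed as plain triples and printed in the `(S)-[:R]->(T)` notation. This is a minimal sketch and not code from the package: the node label `ParentCommit` is a hypothetical name for a parent commit activity, and the relation names are the standard PROV relation names that match the wording of the text.

```python
# Relations of the "Addition of a File" model as (source, relation, target).
# "ParentCommit" is a hypothetical label for the parent commit activity.
ADDITION_MODEL = [
    ("Commit", "wasInformedBy", "ParentCommit"),
    ("Commit", "wasAssociatedWith", "Author"),
    ("Commit", "wasAssociatedWith", "Committer"),
    ("File", "wasGeneratedBy", "Commit"),
    ("FileRevision", "wasGeneratedBy", "Commit"),
    ("FileRevision", "specializationOf", "File"),
    ("File", "wasAttributedTo", "Author"),
    ("FileRevision", "wasAttributedTo", "Author"),
]


def to_cypher(source: str, relation: str, target: str) -> str:
    """Render one triple as an unnamed directed relation."""
    return f"({source})-[:{relation}]->({target})"


for triple in ADDITION_MODEL:
    print(to_cypher(*triple))
```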
**`Author`**
-| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------- |
-| name | - | `git config user.name` Set in the author's git config. |
-| email | - | `git config user.email` Set in the author's git config. |
-| prov:role | Author | Function of the agent in context of the commit activity. |
-| prov:type | User | Agent type. |
-| prov:label | - | Human readable representation of the agent. |
+| Attribute | Fixed Value | Description |
+|-----------------|-------------|----------------------------------------------------------|
+| name | - | `git config user.name` Set in the author's git config. |
+| email | - | `git config user.email` Set in the author's git config. |
+| gitlab_username | -           | GitLab user account username.                            |
+| github_username | -           | GitHub user account username.                            |
+| gitlab_id       | -           | GitLab user id.                                          |
+| github_id       | -           | GitHub user id.                                          |
+| prov:role | Author | Function of the agent in context of the commit activity. |
+| prov:type | User | Agent type. |
+| prov:label | - | Human readable representation of the agent. |
**`Committer`**
-| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------- |
-| name | - | `git config user.name` Set in the author's git config. |
-| email | - | `git config user.email` Set in the author's git config. |
-| prov:role | Committer | Function of the agent in context of the commit activity. |
-| prov:type | User | Agent type. |
-| prov:label | - | Human readable representation of the agent. |
+| Attribute       | Fixed Value | Description                                                |
+|-----------------|-------------|------------------------------------------------------------|
+| name            | -           | `git config user.name` Set in the committer's git config.  |
+| email           | -           | `git config user.email` Set in the committer's git config. |
+| gitlab_username | -           | GitLab user account username.                              |
+| github_username | -           | GitHub user account username.                              |
+| gitlab_id       | -           | GitLab user id.                                            |
+| github_id       | -           | GitHub user id.                                            |
+| prov:role       | Committer   | Function of the agent in context of the commit activity.   |
+| prov:type       | User        | Agent type.                                                |
+| prov:label      | -           | Human readable representation of the agent.                |
**`Commit`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------------------- | ------------------------------------------- |
-| hexsha | - | Commit SHA1 |
-| message | - | Commit message. |
+|----------------|-------------------------|---------------------------------------------|
+| sha | - | Commit SHA1 |
| title | - | First 50 characters of the commit message. |
+| message | - | Commit message. |
+| deletions | - | Number of lines deleted. |
+| insertions | - | Number of lines inserted. |
+| lines | - | Number of lines changed. |
+| files | - | Number of files changed. |
+| authored_at | - | Time at which the commit was authored. |
+| committed_at | - | Time at which the commit was committed. |
| prov:startTime | `COMMIT_AUTHOR_DATE` | Time at which the commit activity started. |
| prov:endTime | `COMMIT_COMMITTER_DATE` | Time at which the commit activity ended. |
| prov:type | GitCommit | Activity type. |
@@ -89,22 +100,28 @@ Both entities are attributed to the author of the commit, the actor responsible
**`File`**
-| Attribute | Fixed Value | Description |
-| ------------ | ----------- | ------------------------------------------------------------------ |
-| path | - | Original file path. The path at which this file was first created. |
-| committed_in | - | SHA1 of the commit that added this file to the repository. |
-| prov:type | File | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|-------------|--------------------------------------------------------------------|
+| name | - | Original file name. |
+| path | - | Original file path. The path at which this file was first created. |
+| commit | - | SHA1 of the commit that added this file to the repository. |
+| prov:type | File | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`File Revision`**
-| Attribute | Fixed Value | Description |
-| ------------ | ------------ | ---------------------------------------------------------------------------- |
-| path | - | Current file path of this revision. |
-| committed_in | - | SHA1 of the commit that added this revision to the repository. |
-| change_type | - | [`git diff`](https://git-scm.com/docs/git-diff) change type / change status. |
-| prov:type | FileRevision | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|------------------------------------|-----------------------------------------------------------------------------------------------------|
+| name | - | Current file name. |
+| path | - | Current file path of this revision. |
+| commit | - | SHA1 of the commit that added this revision to the repository. |
+| status | `added` or `modified` or `deleted` | Change status of the file revision. |
+| insertions | - | Number of lines inserted. |
+| deletions | - | Number of lines deleted. |
+| lines | - | Number of lines changed. |
+| score | - | Percentage of similarity compared to previous revision ([Docs](https://git-scm.com/docs/git-diff)). |
+| prov:type | FileRevision | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
Some PROV relations in this model are "qualified" relations.
@@ -114,27 +131,31 @@ The following tables define the attributes attached to these relations.
**`File - [wasGeneratedBy] -> Commit`**
| Attribute | Fixed Value | Description |
-| --------- | -------------------- | -------------------------------------------------------------- |
+|-----------|----------------------|----------------------------------------------------------------|
| prov:role | File | Function of the File entity in context of the Commit activity. |
| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the File entity was generated. |
**`File Revision - [wasGeneratedBy] -> Commit`**
-| Attribute | Fixed Value | Description |
-| --------- | ----------------------------- | ---------------------------------------------------------------------- |
-| prov:role | FileRevisionAtPointOfAddition | Function of the FileRevision entity in context of the Commit activity. |
-| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the FileRevision entity was generated. |
+| Attribute | Fixed Value | Description |
+|------------|-------------------------------|-----------------------------------------------------------------------------------------------------|
+| insertions | - | Number of lines inserted. |
+| deletions | - | Number of lines deleted. |
+| lines | - | Number of lines changed. |
+| score | - | Percentage of similarity compared to previous revision ([Docs](https://git-scm.com/docs/git-diff)). |
+| prov:role | FileRevisionAtPointOfAddition | Function of the FileRevision entity in context of the Commit activity. |
+| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the FileRevision entity was generated. |
**`Commit - [wasAssociatedWith] -> Author`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | --------------------------------------------------------------- |
+|-----------|-------------|-----------------------------------------------------------------|
| prov:role | Author | Function of the Author agent in context of the Commit activity. |
**`Commit - [wasAssociatedWith] -> Committer`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------ |
+|-----------|-------------|--------------------------------------------------------------------|
| prov:role | Committer | Function of the Committer agent in context of the Commit activity. |
@@ -142,32 +163,31 @@ The following tables define the attributes attached to these relations.

-This model captures the modification of a file from the git repository of a GitLab project by a git commit.
+This model captures the modification of a file in a GitLab project's git repository by a git commit.
-The model includes all human actors involved in the process.
-In this case these actors are the author and the committer of the git commit represented as agents in the model.
-The author represents the user that originally wrote the code contained in the commit.
-The committer represents the user that committed the code on behalf of the author.
-Committer and author can be the same person but do not have to be.
+The model includes all human actors involved in the process.
+These actors, the author and the committer of the git commit, are represented as agents in the model.
+The author represents the user that originally wrote the code contained in the commit.
+The committer represents the user that committed the code on behalf of the author.
+The committer and author can be the same person but do not have to be.
-The commit aswell as all of its parents are captured as activities.
-Each commit is said to be informed by its parent commit, as each commit builds upon the git repository that the parent commits left behind.
-The commit is associated to both author and committer as these are the actors responsible for its existance.
+The commit and all of its parents are captured as activities.
+Each commit is informed by its parent commit.
+The commit is associated with both the author and committer as these are the actors responsible for its existence.
The commit uses the PreviousFileRevision to generate a new revision that accounts for the modifications included in the commit.
-Three entities are created for the modified file.
-One, the File entity, represents the origin of the file aswell as the concept of its originality.
-The File entity will already exist, due to it being created in the 'Git: Addition of a File' model.
-The second entity, called PreviousFileRevision, represents the latest file revision of the modified file before modification.
-The third entity, called FileRevision, represents the revision of the file after the modification has been accounted for.
-The FileRevision is said to be derived from the previous revision of the modified file.
-The FileRevision is generated by the commit activity.
+Three entities are created for the modified file.
+The File entity represents the origin of the file as well as the concept of its originality.
+The File entity already exists due to it being created in the 'Git: Addition of a File' model.
+The second entity, called PreviousFileRevision, represents the latest file revision of the modified file before modification.
+The third entity, called FileRevision, represents the revision of the file after the modification has been accounted for.
+The FileRevision is derived from the previous revision of the modified file and is generated by the commit activity.
All revisions are marked as specializations of the File entity.
**`Author`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------- |
+|------------|-------------|----------------------------------------------------------|
| name | - | `git config user.name` Set in the author's git config. |
| email | - | `git config user.email` Set in the author's git config. |
| prov:role | Author | Function of the agent in context of the commit activity. |
@@ -177,7 +197,7 @@ All revisions are marked as specializations of the File entity.
**`Committer`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------- |
+|------------|-------------|----------------------------------------------------------|
| name | - | `git config user.name` Set in the author's git config. |
| email | - | `git config user.email` Set in the author's git config. |
| prov:role | Committer | Function of the agent in context of the commit activity. |
@@ -187,10 +207,14 @@ All revisions are marked as specializations of the File entity.
**`Commit`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------------------- | ---------------------------------------------- |
-| hexsha | - | Commit SHA1 |
-| message | - | Commit message. |
+|----------------|-------------------------|------------------------------------------------|
+| sha | - | Commit SHA1 |
| title | - | First 50 characters of the commit message. |
+| message | - | Commit message. |
+| deletions | - | Number of lines deleted. |
+| insertions | - | Number of lines inserted. |
+| lines | - | Number of lines changed. |
+| files | - | Number of files changed. |
| prov:startTime | `COMMIT_AUTHOR_DATE` | Time at which the commit activity started. |
| prov:endTime | `COMMIT_COMMITTER_DATE` | Time at which the commit activity ended. |
| prov:type | GitCommit | Activity type. |
@@ -198,32 +222,43 @@ All revisions are marked as specializations of the File entity.
**`File`**
-| Attribute | Fixed Value | Description |
-| ------------ | ----------- | ------------------------------------------------------------------ |
-| path | - | Original file path. The path at which this file was first created. |
-| committed_in | - | SHA1 of the commit that added this file to the repository. |
-| prov:type | File | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|-------------|--------------------------------------------------------------------|
+| name | - | Original file name. |
+| path | - | Original file path. The path at which this file was first created. |
+| commit | - | SHA1 of the commit that added this file to the repository. |
+| prov:type | File | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`File Revision`**
-| Attribute | Fixed Value | Description |
-| ------------ | ------------ | ---------------------------------------------------------------------------- |
-| path | - | Current file path of this revision. |
-| committed_in | - | SHA1 of the commit that added this revision to the repository. |
-| change_type | - | [`git diff`](https://git-scm.com/docs/git-diff) change type / change status. |
-| prov:type | FileRevision | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|------------------------------------|-----------------------------------------------------------------------------------------------------|
+| name       | -                                  | Current file name.                                                                                  |
+| path       | -                                  | Current file path of this revision.                                                                 |
+| commit     | -                                  | SHA1 of the commit that added this revision to the repository.                                      |
+| status | `added` or `modified` or `deleted` | Change status of the file revision. |
+| insertions | - | Number of lines inserted. |
+| deletions | - | Number of lines deleted. |
+| lines | - | Number of lines changed. |
+| score | - | Percentage of similarity compared to previous revision ([Docs](https://git-scm.com/docs/git-diff)). |
+| prov:type | FileRevision | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`Previous File Revision`**
-| Attribute | Fixed Value | Description |
-| ------------ | ------------ | ---------------------------------------------------------------------------- |
-| path | - | Current file path of this revision. |
-| committed_in | - | SHA1 of the commit that added this revision to the repository. |
-| change_type | - | [`git diff`](https://git-scm.com/docs/git-diff) change type / change status. |
-| prov:type | FileRevision | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|------------------------------------|-----------------------------------------------------------------------------------------------------|
+| name       | -                                  | Current file name.                                                                                  |
+| path       | -                                  | Current file path of this revision.                                                                 |
+| commit     | -                                  | SHA1 of the commit that added this revision to the repository.                                      |
+| status | `added` or `modified` or `deleted` | Change status of the file revision. |
+| insertions | - | Number of lines inserted. |
+| deletions | - | Number of lines deleted. |
+| lines | - | Number of lines changed. |
+| score | - | Percentage of similarity compared to previous revision ([Docs](https://git-scm.com/docs/git-diff)). |
+| prov:type | FileRevision | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
Some PROV relations in this model are "qualified" relations.
In simple terms: Some PROV relations have attributes attached to them.
@@ -232,27 +267,31 @@ The following tables define the attributes attached to these relations.
**`Commit - [used] -> Previous File Revision`**
| Attribute | Fixed Value | Description |
-| --------- | ------------------------------ | -------------------------------------------------------------- |
+|-----------|--------------------------------|----------------------------------------------------------------|
| prov:role | FileRevisionBeforeModification | Function of the File entity in context of the Commit activity. |
| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the File entity was used. |
**`File Revision - [wasGeneratedBy] -> Commit`**
-| Attribute | Fixed Value | Description |
-| --------- | ----------------------------- | -------------------------------------------------------------- |
-| prov:role | FileRevisionAfterModification | Function of the File entity in context of the Commit activity. |
-| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the File entity was generated. |
+| Attribute | Fixed Value | Description |
+|------------|-------------------------------|-----------------------------------------------------------------------------------------------------|
+| insertions | - | Number of lines inserted. |
+| deletions | - | Number of lines deleted. |
+| lines | - | Number of lines changed. |
+| score | - | Percentage of similarity compared to previous revision ([Docs](https://git-scm.com/docs/git-diff)). |
+| prov:role | FileRevisionAfterModification | Function of the File entity in context of the Commit activity. |
+| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the File entity was generated. |
**`Commit - [wasAssociatedWith] -> Author`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | --------------------------------------------------------------- |
+|-----------|-------------|-----------------------------------------------------------------|
| prov:role | Author | Function of the Author agent in context of the Commit activity. |
**`Commit - [wasAssociatedWith] -> Committer`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------ |
+|-----------|-------------|--------------------------------------------------------------------|
| prov:role | Committer | Function of the Committer agent in context of the Commit activity. |
@@ -260,28 +299,26 @@ The following tables define the attributes attached to these relations.

-This model captures the deletion of a file from the git repository of a GitLab project by a git commit.
+This model documents the removal of a file from the git repository of a GitLab project through a git commit.
-The model includes all human actors involved in the process.
-In this case these actors are the author and the committer of the git commit represented as agents in the model.
-The author represents the user that originally wrote the code contained in the commit.
-The committer represents the user that committed the code on behalf of the author.
-Committer and author can be the same person but do not have to be.
+The model accounts for all human actors involved in the process.
+These actors are represented as agents in the model and include the author and the committer of the git commit.
+The author represents the user who originally wrote the code contained in the commit, while the committer represents the user who committed the code on behalf of the author.
+It is possible for the committer and author to be the same person.
-The commit aswell as all of its parents are captured as activities.
-Each commit is said to be informed by its parent commit, as each commit builds upon the git repository that the parent commits left behind.
-The commit is associated to both author and committer, as these are the actors responsible for its existance.
+The commit and all of its parents are captured as activities.
+Each commit builds upon the git repository that the parent commits left behind and is informed by its parent commit.
+The commit is associated with both author and committer, as these are the actors responsible for its existence.
-Two entities are created for the deleted file.
-One, the File entity, represents the origin of the file aswell as the concept of its originality.
-The second, the FileRevision entity, represents the revision of the file at the point of its deletion.
-The revision is invalidated by the commit that deletes / removes it from the repository.
-The deleted revision is marked as a specialization of the original File entity.
+The model generates two entities for the deleted file.
+The File entity represents the origin of the file and the concept of its originality.
+The second entity, FileRevision, represents the revision of the file at the time of its deletion.
+The deleted revision is invalidated by the commit that removes it from the repository and is marked as a specialization of the original File entity.
**`Author`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------- |
+|------------|-------------|----------------------------------------------------------|
| name | - | `git config user.name` Set in the author's git config. |
| email | - | `git config user.email` Set in the author's git config. |
| prov:role | Author | Function of the agent in context of the commit activity. |
@@ -291,7 +328,7 @@ The deleted revision is marked as a specialization of the original File entity.
**`Committer`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------- |
+|------------|-------------|----------------------------------------------------------|
| name | - | `git config user.name` Set in the author's git config. |
| email | - | `git config user.email` Set in the author's git config. |
| prov:role | Committer | Function of the agent in context of the commit activity. |
@@ -301,10 +338,14 @@ The deleted revision is marked as a specialization of the original File entity.
**`Commit`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------------------- | ------------------------------------------- |
-| hexsha | - | Commit SHA1 |
-| message | - | Commit message. |
+|----------------|-------------------------|---------------------------------------------|
+| sha | - | Commit SHA1. |
| title | - | First 50 characters of the commit message. |
+| message | - | Commit message. |
+| deletions | - | Number of lines deleted. |
+| insertions | - | Number of lines inserted. |
+| lines | - | Number of lines changed. |
+| files | - | Number of files changed. |
| prov:startTime | `COMMIT_AUTHOR_DATE` | Time at which the commit activity started. |
| prov:endTime | `COMMIT_COMMITTER_DATE` | Time at which the commit activity ended. |
| prov:type | GitCommit | Activity type. |
@@ -312,22 +353,28 @@ The deleted revision is marked as a specialization of the original File entity.
**`File`**
-| Attribute | Fixed Value | Description |
-| ------------ | ----------- | ------------------------------------------------------------------ |
-| path | - | Original file path. The path at which this file was first created. |
-| committed_in | - | SHA1 of the commit that added this file to the repository. |
-| prov:type | File | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|-------------|--------------------------------------------------------------------|
+| name | - | Original file name. |
+| path | - | Original file path. The path at which this file was first created. |
+| commit | - | SHA1 of the commit that added this file to the repository. |
+| prov:type | File | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`File Revision`**
-| Attribute | Fixed Value | Description |
-| ------------ | ------------ | ---------------------------------------------------------------------------- |
-| path | - | Current file path of this revision. |
-| committed_in | - | SHA1 of the commit that added this revision to the repository. |
-| change_type | - | [`git diff`](https://git-scm.com/docs/git-diff) change type / change status. |
-| prov:type | FileRevision | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|------------------------------------|-----------------------------------------------------------------------------------------------------|
+| name | - | File name of this revision. |
+| path | - | Current file path of this revision. |
+| commit | - | SHA1 of the commit that added this revision to the repository. |
+| status | `added` or `modified` or `deleted` | Change status of the file revision. |
+| insertions | - | Number of lines inserted. |
+| deletions | - | Number of lines deleted. |
+| lines | - | Number of lines changed. |
+| score | - | Percentage of similarity compared to previous revision ([Docs](https://git-scm.com/docs/git-diff)). |
+| prov:type | FileRevision | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
Some PROV relations in this model are "qualified" relations.
@@ -337,63 +384,65 @@ The following tables define the attributes attached to these relations.
**`Commit - [wasAssociatedWith] -> Author`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | --------------------------------------------------------------- |
+|-----------|-------------|-----------------------------------------------------------------|
| prov:role | Author | Function of the Author agent in context of the Commit activity. |
**`Commit - [wasAssociatedWith] -> Committer`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------ |
+|-----------|-------------|--------------------------------------------------------------------|
| prov:role | Committer | Function of the Committer agent in context of the Commit activity. |
**`File Revision - [wasInvalidatedBy] -> Commit`**
-| Attribute | Fixed Value | Description |
-| --------- | ----------------------------- | ---------------------------------------------------------------------- |
-| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the FileRevision entity was invalidated. |
-| prov:role | FileRevisionAtPointOfDeletion | Function of the FileRevision entity in context of the Commit activity. |
+| Attribute | Fixed Value | Description |
+|------------|-------------------------------|------------------------------------------------------------------------|
+| insertions | - | Number of lines inserted. |
+| deletions | - | Number of lines deleted. |
+| lines | - | Number of lines changed. |
+| prov:time | `COMMIT_AUTHOR_DATE` | Time at which the FileRevision entity was invalidated. |
+| prov:role | FileRevisionAtPointOfDeletion | Function of the FileRevision entity in context of the Commit activity. |
## GitLab: Commit Web Resource

-This model captures the creation and annotation of a GitLab commit web resource, i.e. the webpage of a git commit as displayed by GitLab.
-
-GitLab creates a webpage for a commit as soon as the commit is pushed to the GitLab remote.
-Users can interact with the webpage by, among other interactions, leaving a comment in the comment section.
-GitLab captures some of these interactions and stores them in internal data structures.
-Comments written by users are therefore retrievable through the GitLab API.
-Retrievable interactions such as comments are considered to "annotate" the web resource.
-
-The model includes all human actors involved in the process of the creation or annotation of a GitLab commit web resource.
-In this case these actors are the author of the GitLab commit web resource aswell as all users responsible for annotations.
-The author represents the user that pushed the commit to the GitLab remote and consequently triggered the creation of the web resource.
-An annotator is a user that is responsible for the existance of an annotation such as a comment.
-In case of the annotation being a comment, the responsible annotator would be the author of the comment.
-
-The creation of the web resource is captured as an activity.
-The creation activity is informed by the corresponding git commit that triggered the creation of the commit web resource.
-The creation activity is associated with the user that pushed the commit to the GitLab remote.
-In the context of the creation activity, this user is called "Gitlab Commit Author".
-Each annotation is captured as an activity that uses the latest version of the web resource to generate a new one.
-Each annotation is associated with the user that is responsible for creating it.
-Each annotation is informed by either the annotation that precedes it or - if no annotations have been recorded so far - the creation activity.
-The annotations form a chain of events, that corresponds to the chain of interactions between users and the GitLab commit web resource.
-
-The commit web resource is captured by multiple entities.
-One for the original web resource and its concept of originality called "GitLab Commit".
-A second one for the version of the GitLab commit web resource at the time of its creation, called "Commit Version".
-One entity per annotation capturing the commit web resource right after the annotation happened, called "Annotated Commit Version".
-The original web resource and the resource version at the point of creation is generated by the creation activity.
-The original web resource and the first resource version are attributed to the gitlab commit author.
-Each annotated commit version is generated by the corresponding annotation activity.
-Each annotated commit version is attributed to its annotator.
+This model focuses on capturing the creation and annotation of a GitLab commit web resource, which refers to the webpage of a Git commit as displayed by GitLab.
+
+Upon pushing a commit to the GitLab remote, GitLab automatically generates a webpage for the commit.
+Users can interact with this webpage by leaving comments in the comment section and engaging in other interactions.
+GitLab captures and stores some of these interactions in internal data structures.
+As a result, comments written by users and other stored interactions can be retrieved through the GitLab API.
+In this model, retrievable interactions such as comments are treated as annotations of the web resource.
+
+The model encompasses all human actors involved in the process of creating or annotating a GitLab commit web resource.
+This includes the author of the GitLab commit web resource, who is the user that pushed the commit to the GitLab remote, triggering the creation of the web resource.
+Additionally, it includes all users who are responsible for annotations, such as comments.
+
+An annotator is a user who is responsible for the existence of an annotation, such as a comment, on the GitLab commit web resource.
+For instance, in the case of a comment being an annotation, the responsible annotator would be the author of that comment.
+The model accounts for all these actors, providing a comprehensive representation of the human involvement in the creation and annotation of the GitLab commit web resource.
+
+The creation of the GitLab commit web resource is captured as an activity, which is informed by the corresponding git commit that triggered the creation. The user who pushed the commit to the GitLab remote is associated with the creation activity and referred to as the "GitLab Commit Author".
+
+Each annotation is represented as an activity that generates a new version of the web resource by using the latest version.
+The user responsible for creating the annotation is associated with the annotation activity.
+Annotations are informed by either the preceding annotation, if any, or the creation activity, if no annotations have been recorded yet.
+This forms a chain of events, reflecting the chain of interactions between users and the GitLab commit web resource.
+
+The GitLab commit web resource is represented by multiple entities. Firstly, an entity for the original web resource and its concept of originality called "GitLab Commit". Secondly, an entity for the version of the GitLab commit web resource at the time of its creation, referred to as "Commit Version". For each annotation, a separate entity called "Annotated Commit Version" captures the state of the commit web resource after the annotation occurred.
+
+The creation activity generates the original web resource entity and the Commit Version entity, representing the web resource at the point of creation. Both of these entities are attributed to the GitLab Commit Author, who pushed the commit to the GitLab remote.
+
+Each annotated commit version is generated by the corresponding annotation activity, capturing the web resource state after the annotation.
+The Annotated Commit Version entity is attributed to its annotator, who is responsible for creating the annotation.
+This way, the model captures the lineage of the GitLab commit web resource and associates it with the relevant users and activities throughout the process.
**`Gitlab Commit Author`**
| Attribute | Fixed Value | Description |
-| ---------- | ------------------ | -------------------------------------------------------- |
+|------------|--------------------|----------------------------------------------------------|
| name | - | `git config user.name` Set in the author's git config. |
| email | - | `git config user.email` Set in the author's git config. |
| prov:role | GitlabCommitAuthor | Function of the agent in context of the commit activity. |
@@ -402,22 +451,28 @@ Each annotated commit version is attributed to its annotator.
**`Annotator`**
-| Attribute | Fixed Value | Description |
-| --------------- | ----------- | -------------------------------------------------------------- |
-| name | - | Annotator given name. As set in the annotators GitLab profile. |
-| gitlab_username | - | GitLab username. As set in the annotators GitLab profile. |
-| gitlab_id | - | Gitlab internal user id. |
-| prov:role | Annotator | Function of the agent in context of the commit activity. |
-| prov:type | User | Agent type. |
-| prov:label | - | Human readable representation of the agent. |
+| Attribute | Fixed Value | Description |
+|-----------------|-------------|-----------------------------------------------------------|
+| name | - | Annotator given name. |
+| gitlab_username | - | GitLab username. As set in the annotator's GitLab profile. |
+| github_username | - | GitHub username. As set in the annotator's GitHub profile. |
+| gitlab_id | - | GitLab user id. |
+| github_id | - | GitHub user id. |
+| prov:role | Annotator | Function of the agent in context of the commit activity. |
+| prov:type | User | Agent type. |
+| prov:label | - | Human readable representation of the agent. |
**`Git Commit`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------------------- | ---------------------------------------------- |
-| hexsha | - | Commit SHA1 |
-| message | - | Commit message. |
+|----------------|-------------------------|------------------------------------------------|
+| sha | - | Commit SHA1. |
| title | - | First 50 characters of the commit message. |
+| message | - | Commit message. |
+| deletions | - | Number of lines deleted. |
+| insertions | - | Number of lines inserted. |
+| lines | - | Number of lines changed. |
+| files | - | Number of files changed. |
| prov:startTime | `COMMIT_AUTHOR_DATE` | Time at which the commit activity started. |
| prov:endTime | `COMMIT_COMMITTER_DATE` | Time at which the commit activity ended. |
| prov:type | GitCommit | Activity type. |
@@ -426,8 +481,8 @@ Each annotated commit version is attributed to its annotator.
**`Creation`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------------------- | ----------------------------------------------- |
-| creation_id | - | SHA1 of the commit that triggered the creation. |
+|----------------|-------------------------|-------------------------------------------------|
+| id | - | SHA1 of the commit that triggered the creation. |
| prov:startTime | `COMMIT_COMMITTER_DATE` | Time at which the web resource was created. |
| prov:endTime | `COMMIT_COMMITTER_DATE` | Time at which the web resource was created. |
| prov:type | GitlabCommitCreation | Activity type. |
@@ -436,9 +491,9 @@ Each annotated commit version is attributed to its annotator.
**`Annotation`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------- | ----------------------------------------------------------------------------- |
+|----------------|-------------|-------------------------------------------------------------------------------|
| id | - | Internal GitLab ID of the datastructure from which the annotation was parsed. |
-| type | - | Annotation type. Parsed from the annotation body. |
+| name | - | Annotation name/class. Parsed from the annotation body. |
| body | - | Annotation string. The string from which the type is parsed. |
| prov:startTime | - | Time at which the annotation was created. |
| prov:endTime | - | Time at which the annotation was created. |
@@ -453,29 +508,30 @@ All recognized annotation types are listed in the "Annotations" section of this
**`Commit`**
-| Attribute | Fixed Value | Description |
-| ---------- | ----------- | ----------------------------------------------------- |
-| hexsha | - | Commit SHA1. |
-| url | - | URL to the webpage of the gitlab commit web resource. |
-| prov:type | Resource | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|----------------------|-------------------------------------------------------|
+| sha | - | Commit SHA1. |
+| url | - | URL to the webpage of the GitLab commit web resource. |
+| platform | `gitlab` or `github` | Platform identifier string. |
+| prov:type | Resource | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`Commit Version`**
| Attribute | Fixed Value | Description |
-| ---------- | ------------------------- | -------------------------------------------- |
-| version_id | - | Commit SHA1. |
+|------------|---------------------------|----------------------------------------------|
+| id | - | Commit SHA1. |
| prov:type | ResourceAtPointOfAddition | Entity type. |
| prov:label | - | Human readable representation of the entity. |
**`Annotated Commit Version`**
-| Attribute | Fixed Value | Description |
-| ------------- | ------------------------ | -------------------------------------------- |
-| version_id | - | Commit SHA1. |
-| annotation_id | - | Gitlab annotation id. |
-| prov:type | AnnotatedResourceVersion | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|--------------------------|----------------------------------------------|
+| id | - | Commit SHA1. |
+| annotation | - | GitLab annotation id. |
+| prov:type | AnnotatedResourceVersion | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
Some PROV relations in this model are "qualified" relations.
@@ -485,40 +541,40 @@ The following tables define the attributes attached to these relations.
**`Creation - [wasAssociatedWith] -> Gitlab Commit Author`**
| Attribute | Fixed Value | Description |
-| --------- | ------------------ | ----------------------------------------------------------------------------- |
+|-----------|--------------------|-------------------------------------------------------------------------------|
| prov:role | GitlabCommitAuthor | Function of the Gitlab Commit Author agent in context of the Commit activity. |
**` Annotation - [wasAssociatedWith] -> Annotator`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------ |
+|-----------|-------------|--------------------------------------------------------------------|
| prov:role | Annotator | Function of the Annotator agent in context of the Commit activity. |
**`Commit - [wasGeneratedBy] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ---------------------- | ---------------------------------------------------------------- |
+|-----------|------------------------|------------------------------------------------------------------|
| prov:role | GitlabCommitCreation | Function of the Commit entity in context of the Commit activity. |
| prov:time | `COMMIT_COMMITER_DATE` | Time at which the Commit entity was generated. |
**`Commit Version - [wasGeneratedBy] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ---------------------- | ------------------------------------------------------------------------ |
+|-----------|------------------------|--------------------------------------------------------------------------|
| prov:role | GitlabCommitVersion | Function of the Commit Version entity in context of the Commit activity. |
| prov:time | `COMMIT_COMMITER_DATE` | Time at which the Commit Version entity was generated. |
**`Annotated Commit Version - [wasGeneratedBy] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ---------------------------- | ---------------------------------------------------------------- |
+|-----------|------------------------------|------------------------------------------------------------------|
| prov:role | AnnotatedGitlabCommitVersion | Function of the commit entity in context of the commit activity. |
| prov:time | - | Time at which the annotated commit version entity was generated. |
**`Annotated Commit Version - [used] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ------------------------------------------------- | ---------------------------------------------------------------- |
+|-----------|---------------------------------------------------|------------------------------------------------------------------|
| prov:role | AnnotatedGitlabCommitVersion, GitlabCommitVersion | Function of the commit entity in context of the commit activity. |
| prov:time | - | Time at which the annotated commit version entity was generated. |
@@ -527,42 +583,46 @@ The following tables define the attributes attached to these relations.

-This model captures the creation and annotation of a GitLab issue web resource, i.e. the webpage of an issue as displayed by GitLab.
+The GitLab: Issue Web Resource model represents the creation and annotation of a GitLab issue web resource, specifically the webpage of an issue as displayed in GitLab.
+
+The structure of the GitLab: Issue Web Resource model is similar to that of the GitLab: Commit Web Resource model.
+GitLab's issue tracker can be accessed via the GitLab API just like the commit web resources.
+Both models share similar concepts and ideas behind their design.
+
+The model encompasses all human actors involved in the creation or annotation process of a GitLab issue web resource.
+These actors include the author of the GitLab issue web resource, as well as users responsible for annotations.
-GitLab provides an issue tracker which is accessable through the GitLab API.
-The GitLab: Issue Web Resource model is structurally similar to the GitLab: Commit Web Resource model.
-The idea behind it is very similar aswell.
+The issue author refers to the user who originally opened/created the issue.
-The model includes all human actors involved in the process of the creation or annotation of a gitlab issue web resource.
-In this case the actors are the author of the gitlab issue web resource aswell as all users responsible for annotations.
-The issue author represents the user that opened/created the issue in the first place.
-An annotator is a user that is responsible for the existance of an annotation such as a comment, label, etc.
-For example: In case of the annotation being a comment, the responsible annotator would be the author of the comment.
+An annotator is a user who is responsible for creating annotations, such as comments, labels, etc.
+For instance, in the case of a comment annotation, the author of the comment is considered the responsible annotator.
-The creation of the web resource is captured as an activity.
-The creation activity is associated with the user that opened/created the issue.
-In the context of the creation activity, this user is called "Issue Author".
-Each annotation is captured as an activity that uses the latest version of the web resource to generate a new one.
-Each annotation is associated with the user that is responsible for creating it.
-Each annotation is informed by either the annotation that precedes it or - if no annotations have been recorded so far - the creation activity.
-The annotations form a chain of events, that corresponds to the chain of interactions between users and the gitlab issue web resource.
+The creation of the web resource is captured as an activity, which is associated with the user who opened/created the issue, referred to as the "Issue Author" within the context of the creation activity.
-The issue web resource is captured by multiple entities.
-One for the original web resource and its concept of originality called "Issue".
-A second one for the version of the gitlab issue web resource at the time of its creation, called "Issue Version".
-One entity per annotation capturing the issue web resource right after the annotation happened, called "Annotated Issue Version".
-The original web resource and the resource version at the point of creation is generated by the creation activity.
-The original web resource and the first resource version are attributed to the gitlab issue author.
-Each annotated issue version is generated by the corresponding annotation activity.
-Each annotated issue version is attributed to its annotator.
+Each annotation is captured as an activity that uses the latest version of the web resource to generate a new version.
+Each annotation is associated with the user responsible for creating it.
+
+Annotations are informed by either the preceding annotation, if any, or the creation activity if no annotations have been recorded yet.
+This creates a chain of events that corresponds to the interactions between users and the GitLab issue web resource.
+
+The issue web resource is represented by multiple entities. One entity, called "Issue", represents the original web resource and its concept of originality.
+Another entity is created for the version of the GitLab issue web resource at the time of its creation, called "Issue Version".
+Additionally, one entity is created per annotation to capture the issue web resource right after the annotation has occurred, known as "Annotated Issue Version".
+
+Both the original web resource and the resource version at the point of creation are generated by the creation activity, and are attributed to the GitLab issue author.
+
+Each annotated issue version is generated by the corresponding annotation activity, and is attributed to its annotator.
+This allows for capturing the changes in the issue web resource after each annotation activity has taken place.
**`Issue Author`**
| Attribute | Fixed Value | Description |
-| --------------- | ----------- | -------------------------------------------------------- |
-| name | - | Author name. As set in the authors gitlab profile. |
+|-----------------|-------------|----------------------------------------------------------|
+| name | - | Author name. |
| gitlab_username | - | GitLab username. As set in the authors gitlab profile. |
-| gitlab_id | - | Gitlab internal user id. |
+| github_username | - | GitHub username. As set in the author's GitHub profile. |
+| gitlab_id | - | GitLab user id. |
+| github_id | - | GitHub user id. |
| prov:role | IssueAuthor | Function of the agent in context of the commit activity. |
| prov:type | User | Agent type. |
| prov:label | - | Human readable representation of the agent. |
@@ -570,10 +630,12 @@ Each annotated issue version is attributed to its annotator.
**`Annotator`**
| Attribute | Fixed Value | Description |
-| --------------- | ----------- | -------------------------------------------------------------- |
+|-----------------|-------------|----------------------------------------------------------------|
| name | - | Annotator given name. As set in the annotators gitlab profile. |
-| gitlab_username | - | GitLab username. As set in the annotators gitlab profile. |
-| gitlab_id | - | Gitlab internal user id. |
+| gitlab_username | - | GitLab username. As set in the annotator's GitLab profile. |
+| github_username | - | GitHub username. As set in the annotator's GitHub profile. |
+| gitlab_id | - | GitLab user id. |
+| github_id | - | GitHub user id. |
| prov:role | Annotator | Function of the agent in context of the commit activity. |
| prov:type | User | Agent type. |
| prov:label | - | Human readable representation of the agent. |
@@ -581,8 +643,8 @@ Each annotated issue version is attributed to its annotator.
**`Creation`**
| Attribute | Fixed Value | Description |
-| -------------- | ------------- | ---------------------------------------------- |
-| creation_id | - | Gitlab issue id. |
+|----------------|---------------|------------------------------------------------|
+| id | - | GitLab issue id. |
| prov:startTime | - | Time at which the web resource was created. |
| prov:endTime | - | Time at which the web resource was created. |
| prov:type | IssueCreation | Activity type. |
@@ -591,9 +653,9 @@ Each annotated issue version is attributed to its annotator.
**`Annotation`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------- | ----------------------------------------------------------------------------- |
+|----------------|-------------|-------------------------------------------------------------------------------|
| id | - | Internal gitlab id of the datastructure from which the annotation was parsed. |
-| type | - | Annotation type. Parsed from the annotation body. |
+| name | - | Annotation name/class. Parsed from the annotation body. |
| body | - | Annotation string. The string from which the type is parsed. |
| prov:startTime | - | Time at which the annotation was created. |
| prov:endTime | - | Time at which the annotation was created. |
@@ -608,34 +670,35 @@ All recognized annotation types are listed in the "Annotations" section of this
**`Issue`**
-| Attribute | Fixed Value | Description |
-| ----------- | ----------- | -------------------------------------------- |
-| id | - | Gitlab issue ID. |
-| iid | - | Internal Gitlab issue ID. |
-| title | - | Issue title. |
-| description | - | Issue description. |
-| url | - | URL to the gitlab issue. |
-| created_at | - | Time at which the issue was created at. |
-| closed_at | - | Time at which the issue was closed at. |
-| prov:type | Issue | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|----------------------|----------------------------------------------|
+| id | - | Issue ID. |
+| iid | - | Internal issue ID. |
+| title | - | Issue title. |
+| body | - | Issue body. |
+| platform | `gitlab` or `github` | Platform identifier string. |
+| url | - | Issue webpage url. |
+| created_at | - | Time at which the issue was created. |
+| closed_at | - | Time at which the issue was closed. |
+| prov:type | Issue | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`Issue Version`**
| Attribute | Fixed Value | Description |
-| ---------- | ------------ | -------------------------------------------- |
-| version_id | - | Gitlab id of the issue. |
+|------------|--------------|----------------------------------------------|
+| id | - | GitLab/GitHub id of the issue. |
| prov:type | IssueVersion | Entity type. |
| prov:label | - | Human readable representation of the entity. |
**`Annotated Issue Version`**
-| Attribute | Fixed Value | Description |
-| ------------- | --------------------- | ----------------------------------------------------------------- |
-| version_id | - | Gitlab id of the issue. |
-| annotation_id | - | Gitlab id of the annotation that generated the annotated version. |
-| prov:type | AnnotatedIssueVersion | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|-----------------------|--------------------------------------------------------------------------|
+| id | - | GitLab/GitHub id of the issue. |
+| annotation | - | GitLab/GitHub id of the annotation that generated the annotated version. |
+| prov:type | AnnotatedIssueVersion | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
Some PROV relations in this model are "qualified" relations.
@@ -645,40 +708,40 @@ The following tables define the attributes attached to these relations.
**`Creation - [wasAssociatedWith] -> Issue Author`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ----------------------------------------------------------------------- |
+|-----------|-------------|-------------------------------------------------------------------------|
| prov:role | IssueAuthor | Function of the issue author agent in context of the creation activity. |
**`Annotation - [wasAssociatedWith] -> Annotator`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ---------------------------------------------------------------------- |
+|-----------|-------------|------------------------------------------------------------------------|
| prov:role | Annotator | Function of the annotator agent in context of the annotation activity. |
**`Issue - [wasGeneratedBy] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ----------------------------------------------------------------- |
+|-----------|-------------|-------------------------------------------------------------------|
| prov:role | Resource | Function of the issue entity in context of the creation activity. |
| prov:time | - | Time at which the issue entity was generated. |
**`Issue Version - [wasGeneratedBy] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | -------------------------------- | ----------------------------------------------------------------- |
+|-----------|----------------------------------|-------------------------------------------------------------------|
| prov:role | ResourceVersionAtPointOfCreation | Function of the issue entity in context of the creation activity. |
| prov:time | - | Time at which the issue version entity was generated. |
**`Annotated Issue Version - [wasGeneratedBy] -> Annotation`**
| Attribute | Fixed Value | Description |
-| --------- | ------------------------------ | ----------------------------------------------------------------- |
+|-----------|--------------------------------|-------------------------------------------------------------------|
| prov:role | ResourceVersionAfterAnnotation | Function of the issue entity in context of the creation activity. |
| prov:time | - | Time at which the annotated issue version entity was generated. |
**`Annotation - [used] -> Issue Version`**
| Attribute | Fixed Value | Description |
-| --------- | ---------------------------- | ----------------------------------------------------------------- |
+|-----------|------------------------------|-------------------------------------------------------------------|
| prov:role | ResourceVersionToBeAnnotated | Function of the issue entity in context of the creation activity. |
| prov:time | - | Time at which the issue version entity was generated. |
@@ -687,41 +750,49 @@ The following tables define the attributes attached to these relations.

+The GitLab: Merge Request Web Resource model represents the creation and annotation of a GitLab merge request web resource, which refers to the webpage of a merge request as displayed in GitLab.
+
+The structure of the GitLab: Merge Request Web Resource model is similar to that of the GitLab: Commit Web Resource model.
+
+Both models share similar concepts and ideas behind their design.
+
This model captures the creation and annotation of a GitLab merge request web resource, i.e. the webpage of a merge request as displayed by GitLab.
-The GitLab: Merge Request Web Resource model is structurally similar to the GitLab: Commit Web Resource model.
-The idea behind it is very similar aswell.
-
-The model includes all human actors involved in the process of the creation or annotation of a gitlab merge request web resource.
-In this case the actors are the author of the gitlab merge request web resource aswell as all users responsible for annotations.
-The issue author represents the user that opened/created the merge request in the first place.
-An annotator is a user that is responsible for the existance of an annotation such as a comment, label, etc.
-For example: In case of the annotation being a comment, the responsible annotator would be the author of the comment.
-
-The creation of the web resource is captured as an activity.
-The creation activity is associated with the user that opened/created the merge request.
-In the context of the creation activity, this user is called "Merge Request Author".
-Each annotation is captured as an activity that uses the latest version of the web resource to generate a new one.
-Each annotation is associated with the user that is responsible for creating it.
-Each annotation is informed by either the annotation that precedes it or - if no annotations have been recorded so far - the creation activity.
-The annotations form a chain of events, that corresponds to the chain of interactions between users and the gitlab merge request web resource.
-
-The merge request web resource is captured by multiple entities.
-One for the original web resource and its concept of originality called "Merge Request".
-A second one for the version of the gitlab merge request web resource at the time of its creation, called "Merge Request Version".
-One entity per annotation capturing the merge request web resource right after the annotation happened, called "Annotated Merge Request Version".
-The original web resource and the resource version at the point of creation is generated by the creation activity.
-The original web resource and the first resource version are attributed to the gitlab merge request author.
-Each annotated merge request version is generated by the corresponding annotation activity.
-Each annotated merge request version is attributed to its annotator.
+The model encompasses all human actors involved in the creation or annotation process of a GitLab merge request web resource.
+These actors include the author of the GitLab merge request web resource, as well as users responsible for annotations.
+
+The merge request author refers to the user who originally opened/created the merge request.
+An annotator is a user who is responsible for creating annotations, such as comments, labels, etc.
+For instance, in the case of a comment annotation, the author of the comment is considered the responsible annotator.
+
+The creation of the merge request web resource is represented as an activity in the model.
+The creation activity is associated with the user who opened/created the merge request, referred to as "Merge Request Author" in the context of the creation activity.
+
+Each annotation is captured as an activity that utilizes the latest version of the web resource to generate a new version.
+Each annotation is associated with the user who is responsible for creating it.
+
+Annotations are informed by either the preceding annotation or, if no annotations have been recorded yet, the creation activity itself.
+These annotations collectively form a chain of events that corresponds to the interactions between users and the GitLab merge request web resource.
+
+The GitLab merge request web resource is represented by multiple entities in the model.
+One entity represents the original web resource and its concept of originality, referred to as "Merge Request".
+A second entity represents the version of the GitLab merge request web resource at the time of its creation, called "Merge Request Version".
+
+For each annotation, a separate entity is created to capture the merge request web resource right after the annotation occurred, called "Annotated Merge Request Version".
+
+The original web resource and the resource version at the point of creation are generated by the creation activity, and are attributed to the GitLab merge request author.
+
+Each annotated merge request version is generated by the corresponding annotation activity, and is attributed to its annotator.
**`Merge Request Author`**
| Attribute | Fixed Value | Description |
-| --------------- | ------------------ | --------------------------------------------------------------- |
-| name | - | Author name. As set in the authors GitLab profile. |
-| gitlab_username | - | GitLab username. As set in the authors GitLab profile. |
-| gitlab_id | - | Gitlab user id. |
+|-----------------|--------------------|-----------------------------------------------------------------|
+| name | - | Author name. |
+| gitlab_username | - | GitLab username. As set in the author's GitLab profile. |
+| github_username | - | GitHub username. As set in the author's GitHub profile. |
+| gitlab_id | - | GitLab user id. |
+| github_id | - | GitHub user id. |
| prov:role | MergeRequestAuthor | Function of the agent in context of the merge request activity. |
| prov:type | User | Agent type. |
| prov:label | - | Human readable representation of the agent. |
@@ -729,10 +800,12 @@ Each annotated merge request version is attributed to its annotator.
**`Annotator`**
| Attribute | Fixed Value | Description |
-| --------------- | ----------- | --------------------------------------------------------------- |
-| name | - | Annotator given name. As set in the annotators GitLab profile. |
-| gitlab_username | - | GitLab username. As set in the annotators GitLab profile. |
-| gitlab_id | - | Gitlab user id. |
+|-----------------|-------------|-----------------------------------------------------------------|
+| name | - | Annotator name. |
+| gitlab_username | - | GitLab username. As set in the annotator's GitLab profile. |
+| github_username | - | GitHub username. As set in the annotator's GitHub profile. |
+| gitlab_id | - | GitLab user id. |
+| github_id | - | GitHub user id. |
| prov:role | Annotator | Function of the agent in context of the merge request activity. |
| prov:type | User | Agent type. |
| prov:label | - | Human readable representation of the agent. |
@@ -740,8 +813,8 @@ Each annotated merge request version is attributed to its annotator.
**`Creation`**
| Attribute | Fixed Value | Description |
-| -------------- | -------------------- | ---------------------------------------------- |
-| creation_id | - | Gitlab merge request id. |
+|----------------|----------------------|------------------------------------------------|
+| id | - | GitLab/GitHub merge request id. |
| prov:startTime | - | Time at which the web resource was created. |
| prov:endTime | - | Time at which the web resource was created. |
| prov:type | MergeRequestCreation | Activity type. |
@@ -749,15 +822,15 @@ Each annotated merge request version is attributed to its annotator.
**`Annotation`**
-| Attribute | Fixed Value | Description |
-| -------------- | ----------- | ----------------------------------------------------------------------------- |
-| id | - | Internal gitLab id of the datastructure from which the annotation was parsed. |
-| type | - | Annotation type. Parsed from the annotation body. |
-| body | - | Annotation string. The string from which the type is parsed. |
-| prov:startTime | - | Time at which the annotation was created. |
-| prov:endTime | - | Time at which the annotation was created. |
-| prov:type | Annotation | Activity type. |
-| prov:label | - | Human readable representation of the activity. |
+| Attribute | Fixed Value | Description |
+|----------------|-------------|------------------------------------------------------------------------|
+| id | - | Internal id of the data structure from which the annotation was parsed. |
+| name | - | Annotation name/class. Parsed from the annotation body. |
+| body | - | Annotation string. The string from which the name is parsed. |
+| prov:startTime | - | Time at which the annotation was created. |
+| prov:endTime | - | Time at which the annotation was created. |
+| prov:type | Annotation | Activity type. |
+| prov:label | - | Human readable representation of the activity. |
The set of attributes for annotations can change according to the annotation type.
@@ -767,38 +840,39 @@ All recognized annotation types are listed in the "Annotations" section of this
**`Merge Request`**
-| Attribute | Fixed Value | Description |
-| ------------------------------- | ------------ | ----------------------------------------------------------------- |
-| id | - | Gitlab merge request id. |
-| iid | - | Internal gitlab merge request id. |
-| title | - | Issue title. |
-| description | - | Issue description. |
-| url | - | URL to the gitlab issue. |
-| source_branch | - | Merge request source branch name. |
-| target_branch | - | Merge request target branch name. |
-| created_at | - | Time at which the merge request was created at. |
-| closed_at | - | Time at which the merge request was closed at. |
-| merged_at | - | Time at which the merge request was merged at. |
-| first_deployed_to_production_at | - | Time at which the merge request was first deployed to production. |
-| prov:type | MergeRequest | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|---------------------------------|----------------------|-------------------------------------------------------------------|
+| id | - | GitLab/GitHub merge request id. |
+| iid | - | Internal GitLab/GitHub merge request id. |
+| title | - | Merge request title. |
+| body | - | Merge request body. |
+| url | - | URL to the GitLab/GitHub merge request. |
+| platform | `gitlab` or `github` | Platform identifier string. |
+| source_branch | - | Merge request source branch name. |
+| target_branch | - | Merge request target branch name. |
+| created_at | - | Time at which the merge request was created. |
+| closed_at | - | Time at which the merge request was closed. |
+| merged_at | - | Time at which the merge request was merged. |
+| first_deployed_to_production_at | - | Time at which the merge request was first deployed to production. |
+| prov:type | MergeRequest | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`Merge Request Version`**
| Attribute | Fixed Value | Description |
-| ---------- | ------------------------- | -------------------------------------------- |
-| version_id | - | Gitlab id of the merge request. |
+|------------|---------------------------|----------------------------------------------|
+| id | - | GitLab/GitHub id of the merge request. |
| prov:type | GitlabMergeRequestVersion | Entity type. |
| prov:label | - | Human readable representation of the entity. |
**`Annotated Merge Request Version`**
-| Attribute | Fixed Value | Description |
-| ------------- | ---------------------------- | ----------------------------------------------------------------- |
-| version_id | - | Gitlab id of the merge request. |
-| annotation_id | - | Gitlab id of the annotation that generated the annotated version. |
-| prov:type | AnnotatedMergeRequestVersion | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|------------|------------------------------|--------------------------------------------------------------------------|
+| id | - | GitLab/GitHub id of the merge request. |
+| annotation | - | GitLab/GitHub id of the annotation that generated the annotated version. |
+| prov:type | AnnotatedMergeRequestVersion | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
Some PROV relations in this model are "qualified" relations.
@@ -808,40 +882,40 @@ The following tables define the attributes attached to these relations.
**`Creation - [wasAssociatedWith] -> Merge Request Author`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------------------- |
+|-----------|-------------|---------------------------------------------------------------------------------|
| prov:role | IssueAuthor | Function of the merge request author agent in context of the creation activity. |
**`Annotation - [wasAssociatedWith] -> Annotator`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ---------------------------------------------------------------------- |
+|-----------|-------------|------------------------------------------------------------------------|
| prov:role | Annotator | Function of the annotator agent in context of the annotation activity. |
**`Merge Request - [wasGeneratedBy] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------------- |
+|-----------|-------------|---------------------------------------------------------------------------|
| prov:role | Resource | Function of the merge request entity in context of the creation activity. |
| prov:time | - | Time at which the merge request entity was generated. |
**`Merge Request Version - [wasGeneratedBy] -> Creation`**
| Attribute | Fixed Value | Description |
-| --------- | -------------------------------- | ------------------------------------------------------------------------- |
+|-----------|----------------------------------|---------------------------------------------------------------------------|
| prov:role | ResourceVersionAtPointOfCreation | Function of the merge request entity in context of the creation activity. |
| prov:time | - | Time at which the merge request version entity was generated. |
**`Annotated Merge Request Version - [wasGeneratedBy] -> Annotation`**
| Attribute | Fixed Value | Description |
-| --------- | ------------------------------ | ------------------------------------------------------------------------- |
+|-----------|--------------------------------|---------------------------------------------------------------------------|
| prov:role | ResourceVersionAfterAnnotation | Function of the merge request entity in context of the creation activity. |
| prov:time | - | Time at which the annotated merge request version entity was generated. |
**`Annotation - [used] -> Merge Request Version`**
| Attribute | Fixed Value | Description |
-| --------- | ---------------------------- | ------------------------------------------------------------------------- |
+|-----------|------------------------------|---------------------------------------------------------------------------|
| prov:role | ResourceVersionToBeAnnotated | Function of the merge request entity in context of the creation activity. |
| prov:time | - | Time at which the merge request version entity was generated. |
@@ -873,7 +947,7 @@ The commit is generated by the commit creation activity.
**`Asset`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------- |
+|------------|-------------|----------------------------------------------|
| url | - | Asset URL. |
| format | - | Asset format. |
| prov:type | Asset | Entity type. |
@@ -882,8 +956,8 @@ The commit is generated by the commit creation activity.
**`Evidence`**
| Attribute | Fixed Value | Description |
-| ------------ | ----------- | -------------------------------------------- |
-| hexsha | - | Evidence SHA. |
+|--------------|-------------|----------------------------------------------|
+| sha | - | Evidence SHA. |
| url | - | Evidence URL. |
| collected_at | - | Time at which the evidence was generated. |
| prov:type | Asset | Entity type. |
@@ -892,19 +966,23 @@ The commit is generated by the commit creation activity.
**`Commit`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------- |
-| hexsha | - | Commit SHA1 |
-| message | - | Commit message. |
+|------------|-------------|----------------------------------------------|
+| sha | - | Commit SHA1. |
| title | - | First 50 characters of the commit message. |
+| message | - | Commit message. |
+| deletions | - | Number of lines deleted. |
+| insertions | - | Number of lines inserted. |
+| lines | - | Number of lines changed. |
+| files | - | Number of files changed. |
| prov:type | GitCommit | Entity type. |
| prov:label | - | Human readable representation of the entity. |
**`Tag`**
| Attribute | Fixed Value | Description |
-| ---------- | --------------- | ------------------------------------------------- |
+|------------|-----------------|---------------------------------------------------|
| name | - | Tag name. |
-| hexsha | - | Commit SHA1 of the commit that pushed the tag. |
+| sha | - | Commit SHA1 of the commit that pushed the tag. |
| message | - | Commit message of the commit that pushed the tag. |
| created_at | - | Time at which the tag was created. |
| prov:type | Tag | Entity type. |
@@ -913,21 +991,22 @@ The commit is generated by the commit creation activity.
**`Release`**
-| Attribute | Fixed Value | Description |
-| ----------- | --------------- | -------------------------------------------- |
-| name | - | Release name. |
-| description | - | Release description. |
-| tag_name | - | Release tag name. |
-| created_at | - | Time at which the release was created. |
-| released_at | - | Time at which the release was released. |
-| prov:type | Tag | Entity type. |
-| prov:type | prov:Collection | Entity type. |
-| prov:label | - | Human readable representation of the entity. |
+| Attribute | Fixed Value | Description |
+|-------------|----------------------|----------------------------------------------|
+| name | - | Release name. |
+| body | - | Release body. |
+| tag_name | - | Release tag name. |
+| platform | `gitlab` or `github` | Platform identifier string. |
+| created_at | - | Time at which the release was created. |
+| released_at | - | Time at which the release was released. |
+| prov:type | Tag | Entity type. |
+| prov:type | prov:Collection | Entity type. |
+| prov:label | - | Human readable representation of the entity. |
**`Commit Author`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------- |
+|------------|-------------|----------------------------------------------------------|
| name | - | `git config user.name` Set in the author's git config. |
| email | - | `git config user.email` Set in the author's git config. |
| prov:role | Author | Function of the agent in context of the commit activity. |
@@ -937,7 +1016,7 @@ The commit is generated by the commit creation activity.
**`Tag Author`**
| Attribute | Fixed Value | Description |
-| ---------- | ----------- | -------------------------------------------------------------- |
+|------------|-------------|----------------------------------------------------------------|
| name | - | `git config user.name` Set in the author's git config. |
| email | - | `git config user.email` Set in the author's git config. |
| prov:role | Author | Function of the agent in context of the tag creation activity. |
@@ -947,11 +1026,13 @@ The commit is generated by the commit creation activity.
**`Release Author`**
| Attribute | Fixed Value | Description |
-| --------------- | ------------- | ------------------------------------------------------------------------------------------------ |
+|-----------------|---------------|--------------------------------------------------------------------------------------------------|
| name | - | Author name. As set in the authors GitLab profile. Only available if the token has admin rights. |
| email | - | Author email. Set in the author's git config. Only available if the token has admin rights. |
-| gitlab_username | - | GitLab username. As set in the authors GitLab profile. |
+| gitlab_username | - | GitLab username. As set in the author's GitLab profile. |
+| github_username | - | GitHub username. As set in the author's GitHub profile. |
| gitlab_id | - | Gitlab user id. |
+| github_id | - | GitHub user id. |
| prov:role | ReleaseAuthor | Function of the agent in context of the release creation activity. |
| prov:type | User | Agent type. |
| prov:label | - | Human readable representation of the agent. |
@@ -959,8 +1040,8 @@ The commit is generated by the commit creation activity.
**`Commit Creation`**
| Attribute | Fixed Value | Description |
-| -------------- | -------------- | ---------------------------------------------- |
-| creation_id | - | Commit SHA1. |
+|----------------|----------------|------------------------------------------------|
+| id | - | Commit SHA1. |
| prov:startTime | - | Time at which the commit was created. |
| prov:endTime | - | Time at which the commit was created. |
| prov:type | CommitCreation | Activity type. |
@@ -969,8 +1050,8 @@ The commit is generated by the commit creation activity.
**`Tag Creation`**
| Attribute | Fixed Value | Description |
-| -------------- | ----------- | ---------------------------------------------- |
-| creation_id | - | Tag name. |
+|----------------|-------------|------------------------------------------------|
+| id | - | Tag name. |
| prov:startTime | - | Time at which the tag was created. |
| prov:endTime | - | Time at which the tag was created. |
| prov:type | TagCreation | Activity type. |
@@ -979,8 +1060,8 @@ The commit is generated by the commit creation activity.
**`Release Creation`**
| Attribute | Fixed Value | Description |
-| -------------- | --------------- | ---------------------------------------------- |
-| creation_id | - | Tag name. |
+|----------------|-----------------|------------------------------------------------|
+| id | - | Tag name. |
| prov:startTime | - | Time at which the release was created. |
| prov:endTime | - | Time at which the release was realeased. |
| prov:type | ReleaseCreation | Activity type. |
@@ -994,76 +1075,73 @@ The following tables define the attributes attached to these relations.
**`Release Creation - [wasAssociatedWith] -> Release Author`**
| Attribute | Fixed Value | Description |
-| --------- | ------------- | ------------------------------------------------------------------------------- |
+|-----------|---------------|---------------------------------------------------------------------------------|
| prov:role | ReleaseAuthor | Function of the merge request author agent in context of the creation activity. |
**`Tag Creation - [wasAssociatedWith] -> Tag Author`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------------------- |
+|-----------|-------------|---------------------------------------------------------------------------------|
| prov:role | TagAuthor | Function of the merge request author agent in context of the creation activity. |
**`Commit Creation - [wasAssociatedWith] -> Commit Author`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------------------- |
+|-----------|-------------|---------------------------------------------------------------------------------|
| prov:role | Author | Function of the merge request author agent in context of the creation activity. |
**`Release - [wasGeneratedBy] -> Release Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | --------------------------------------------------------------------------- |
+|-----------|-------------|-----------------------------------------------------------------------------|
| prov:role | Release | Function of the release entity in context of the release creation activity. |
| prov:time | - | Time at which the release entity was generated. |
**`Tag - [wasGeneratedBy] -> Tag Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------- |
+|-----------|-------------|---------------------------------------------------------------------|
| prov:role | Tag | Function of the tag entity in context of the tag creation activity. |
| prov:time | - | Time at which the tag entity was generated. |
**`Commit - [wasGeneratedBy] -> Commit Creation`**
| Attribute | Fixed Value | Description |
-| --------- | ----------- | ------------------------------------------------------------------------- |
+|-----------|-------------|---------------------------------------------------------------------------|
| prov:role | Tag | Function of the commit entity in context of the commit creation activity. |
| prov:time | - | Time at which the commit entity was generated. |
## Annotations
-GitLab displays annotations that occur on resources on the webpages of the respective resources.
-For example, if a resource was mentioned in the comment thread of another resource, this mention is displayed in the comment section of the mentioned target.
+GitLab allows annotations or comments to be displayed on the webpages of respective resources.
+For example, if a resource (such as an issue or merge request) is mentioned in the comment thread of another resource, GitLab displays that mention in the comment section of the mentioned target.
+This keeps discussions and cross-references visible in the context of the resources they concern, which makes them easier to follow.

-These annotations can be parsed from multiple sources that are provided by the official GitLab API.
-Sadly there is no dedicated endpoint for all annotations that are of interest.
-Especially annotations that connect resources are difficult to get.
-Here a quick summary of what data needs to be retrieved, how to parse it and the workarounds that we deployed to achieve annotation parsing.
+Annotations can be parsed from various sources provided by the official GitLab API.
+However, there is no dedicated endpoint for retrieving all annotations of interest, particularly those that connect resources, which can be challenging to obtain.
+Here is a brief summary of the data that needs to be retrieved, how to parse it, and the workarounds that have been deployed to achieve annotation parsing.
-For label events we use the official API endpoint from which we parse the appropriate annotations ("add_label", "remove_label").
-Emoji awards can be retrieved from the appropriate API endpoint.
-We parse everything else - such as mentions, time tracking stats, due dates, TODO's, etc. - from system notes that GitLab uses to display annotations in their web-interface.
+To retrieve label events, we use the official API endpoint and parse the relevant annotations such as "add_label" and "remove_label".
+Emoji awards can be retrieved from the appropriate API endpoint.
+All other annotations, such as mentions, time tracking stats, due dates, and TODOs, are parsed from the system notes that GitLab uses to display annotations in its web interface.
-System notes include a string that describe the annotation that they represent.
-We classify the annotation that the string denotes using regular expressions.
-If necessary we include named groups in the regular expressions to extract relevant information from the annotation strings.
-These are later added to PROV element attributes.
+System notes contain a string that describes the annotation they represent.
+We classify the annotation based on the string using regular expressions, and use named groups in the regular expressions to extract relevant information from the annotation strings.
+The extracted information is later added to PROV element attributes.
-Noted, this is not optimal as older GitLab versions employ different string notations for the same annotation.
-Sometimes only differing by a few characters and other times having a completly different string for the same annotation.
-In addition there is a problem when parsing imported projects.
-For example, while parsing a project that was imported from SVN, relevant annotations wheren't recorded as system notes but rather as normal notes.
-This is not accounted for and is - as of right now - not covered by the current note parsing approach.
+Note that this approach is not optimal: older GitLab versions use different string notations for the same annotation, sometimes differing by only a few characters and sometimes using completely different strings.
+Parsing imported projects poses an additional problem, because relevant annotations may be recorded as normal notes rather than as system notes.
+The current note parsing approach does not account for this.
-Here a list of annotations that we are currently able to parse with a short description of what the annotation is and the API resource from which we parse that annotation.
+Here is a list of annotations that we are currently able to parse, along with a short description of what the annotation is and the API resource from which we parse that annotation.
### List of Annotations
| Annotation Type | Description | Parsed API Resource |
-| ----------------------------------------------- | ------------------------------------------------------------------------------------ | ------------------- |
+|-------------------------------------------------|--------------------------------------------------------------------------------------|---------------------|
| `remove_label` | Removed label from a resource. | Label Event |
| `change_target_branch` | Change merge request target branch. | System Note |
| `status_changed_to_merged` | Change status of merge request to merged. | System Note |
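The Annotations text in this README diff describes classifying system notes with regular expressions and extracting details through named groups. A minimal sketch of that idea follows. The patterns, the `mention_in_issue` name, and the `classify` helper are illustrative assumptions and not the classifiers shipped with `gitlab2prov`; only the annotation names `change_target_branch` and `status_changed_to_merged` and the `default_annotation` fallback are taken from the source.

```python
import re
from typing import Any

# Hypothetical patterns for illustration only; real GitLab note strings
# vary between versions, as the README text points out.
CLASSIFIERS: dict[str, re.Pattern[str]] = {
    "change_target_branch": re.compile(
        r"^changed target branch from `(?P<old>.+)` to `(?P<new>.+)`$"
    ),
    "status_changed_to_merged": re.compile(r"^merged$"),
    "mention_in_issue": re.compile(r"^mentioned in issue #(?P<issue_iid>\d+)$"),
}

DEFAULT = "default_annotation"


def classify(note_body: str) -> tuple[str, dict[str, Any]]:
    """Return the annotation name and the named groups of the best match."""
    text = note_body.strip().lower()
    best_name, best_match = DEFAULT, None
    for name, pattern in CLASSIFIERS.items():
        match = pattern.search(text)
        if match is None:
            continue
        # Prefer the pattern whose match covers the most characters.
        if best_match is None or len(match.group()) > len(best_match.group()):
            best_name, best_match = name, match
    groups = best_match.groupdict() if best_match is not None else {}
    return best_name, groups


print(classify("Changed target branch from `main` to `develop`"))
```

Notes that match no pattern fall back to the default annotation type with an empty attribute dictionary, so unknown strings are still recorded.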
diff --git a/docs/guides/github-token.md b/docs/guides/github-token.md
new file mode 100644
index 0000000..b713db2
--- /dev/null
+++ b/docs/guides/github-token.md
@@ -0,0 +1,50 @@
+# Create a personal access token (GitHub)
+
+### 1. Go to GitHub
+
+
+### 2. Click on View profile and more
+
+
+
+### 3. Click on Settings
+
+
+
+### 4. Click on Developer settings
+
+
+
+### 5. Click on Personal access tokens
+
+
+
+### 6. Click on Tokens (classic)
+
+
+
+### 7. Click on Generate new token
+
+
+
+### 8. Click on Generate new token (classic)…
+
+
+
+### 9. Assign a name to your token to remember its purpose
+
+
+
+### 10. Check repo …
+
+
+
+### 11. Click on Generate token
+
+
+
+### 12. Click on Copy token
+
+
+
+### 13. Done!
diff --git a/docs/guides/tokens.md b/docs/guides/gitlab-token.md
similarity index 100%
rename from docs/guides/tokens.md
rename to docs/guides/gitlab-token.md
diff --git a/gitlab2prov/adapters/fetch/__init__.py b/gitlab2prov/adapters/fetch/__init__.py
deleted file mode 100644
index 9daabd6..0000000
--- a/gitlab2prov/adapters/fetch/__init__.py
+++ /dev/null
@@ -1,2 +0,0 @@
-from gitlab2prov.adapters.fetch.git import GitFetcher
-from gitlab2prov.adapters.fetch.gitlab import GitlabFetcher
diff --git a/gitlab2prov/adapters/fetch/annotations/__init__.py b/gitlab2prov/adapters/fetch/annotations/__init__.py
deleted file mode 100644
index db17990..0000000
--- a/gitlab2prov/adapters/fetch/annotations/__init__.py
+++ /dev/null
@@ -1,4 +0,0 @@
-from gitlab2prov.adapters.fetch.annotations.classifiers import CLASSIFIERS
-from gitlab2prov.adapters.fetch.annotations.classifiers import IMPORT_STATEMENT
-from gitlab2prov.adapters.fetch.annotations.classifiers import AnnotationClassifier
-from gitlab2prov.adapters.fetch.annotations.parse import parse_annotations
diff --git a/gitlab2prov/adapters/fetch/annotations/parse.py b/gitlab2prov/adapters/fetch/annotations/parse.py
deleted file mode 100644
index c423222..0000000
--- a/gitlab2prov/adapters/fetch/annotations/parse.py
+++ /dev/null
@@ -1,185 +0,0 @@
-import logging
-import operator
-import uuid
-from typing import Any
-from typing import Callable
-from typing import Sequence
-from typing import TypeAlias
-
-from gitlab.v4.objects import ProjectCommitComment
-from gitlab.v4.objects import ProjectIssueAwardEmoji
-from gitlab.v4.objects import ProjectIssueNote
-from gitlab.v4.objects import ProjectIssueNoteAwardEmoji
-from gitlab.v4.objects import ProjectIssueResourceLabelEvent
-from gitlab.v4.objects import ProjectMergeRequestAwardEmoji
-from gitlab.v4.objects import ProjectMergeRequestNote
-from gitlab.v4.objects import ProjectMergeRequestNoteAwardEmoji
-from gitlab.v4.objects import ProjectMergeRequestResourceLabelEvent
-
-from gitlab2prov.adapters.fetch.annotations import AnnotationClassifier
-from gitlab2prov.adapters.fetch.annotations import CLASSIFIERS
-from gitlab2prov.adapters.fetch.annotations import IMPORT_STATEMENT
-from gitlab2prov.domain.constants import ProvRole
-from gitlab2prov.domain.objects import Annotation
-from gitlab2prov.domain.objects import User
-
-
-log = logging.getLogger(__name__)
-
-
-DEFAULT = "default_annotation"
-
-
-Comment: TypeAlias = ProjectCommitComment
-Note: TypeAlias = ProjectIssueNote | ProjectMergeRequestNote
-Label: TypeAlias = ProjectIssueResourceLabelEvent | ProjectMergeRequestResourceLabelEvent
-AwardEmoji: TypeAlias = (
- ProjectIssueAwardEmoji
- | ProjectIssueNoteAwardEmoji
- | ProjectMergeRequestAwardEmoji
- | ProjectMergeRequestNoteAwardEmoji
-)
-
-
-def normalize(string: str) -> str:
- return string.strip().lower()
-
-
-def longest_matching_classifier(string: str) -> AnnotationClassifier | None:
- matching = (cls for cls in CLASSIFIERS if cls.matches(string))
- return max(matching, key=len, default=None)
-
-
-def classify_system_note(string: str) -> tuple[str, dict[str, Any]]:
- string = normalize(string)
- kwargs = {}
- # remove import statement, if present
- if IMPORT_STATEMENT.matches(string):
- string = IMPORT_STATEMENT.replace(string)
- kwargs = IMPORT_STATEMENT.groupdict()
- # find classifier by choosing the one with the longest match
- if matching_classifier := longest_matching_classifier(string):
- kwargs.update(matching_classifier.groupdict())
- return matching_classifier.name, kwargs
- return DEFAULT, kwargs
-
-
-def parse_system_note(note: Note) -> Annotation:
- annotator = User(
- name=note.author.get("name"),
- email=note.author.get("email"),
- gitlab_username=note.author.get("username"),
- gitlab_id=note.author.get("id"),
- prov_role=ProvRole.ANNOTATOR,
- )
- annotation_type, kwargs = classify_system_note(note.body)
- return Annotation(
- id=note.id,
- type=annotation_type,
- body=note.body,
- kwargs=kwargs,
- annotator=annotator,
- prov_start=note.created_at,
- prov_end=note.created_at,
- )
-
-
-def parse_comment(comment: Comment) -> Annotation:
- annotator = User(
- name=comment.author.get("name"),
- email=comment.author.get("email"),
- gitlab_username=comment.author.get("username"),
- gitlab_id=comment.author.get("id"),
- prov_role=ProvRole.ANNOTATOR,
- )
- return Annotation(
- id=f"{uuid.uuid4()}{annotator.gitlab_id}{abs(hash(comment.note))}",
- type="add_comment",
- body=comment.note,
- annotator=annotator,
- prov_start=comment.created_at,
- prov_end=comment.created_at,
- )
-
-
-def parse_note(note: Note) -> Annotation:
- annotator = User(
- name=note.author.get("name"),
- email=note.author.get("email"),
- gitlab_username=note.author.get("username"),
- gitlab_id=note.author.get("id"),
- prov_role=ProvRole.ANNOTATOR,
- )
- return Annotation(
- id=note.id,
- type="add_note",
- body=note.body,
- annotator=annotator,
- prov_start=note.created_at,
- prov_end=note.created_at,
- )
-
-
-def parse_award(award: AwardEmoji) -> Annotation:
- annotator = User(
- name=award.user.get("name"),
- email=award.user.get("email"),
- gitlab_username=award.user.get("username"),
- gitlab_id=award.user.get("id"),
- prov_role=ProvRole.ANNOTATOR,
- )
- return Annotation(
- id=award.id,
- type="award_emoji",
- body=award.name,
- annotator=annotator,
- prov_start=award.created_at,
- prov_end=award.created_at,
- )
-
-
-def parse_label(label: Label) -> Annotation:
- annotator = User(
- name=label.user.get("name"),
- email=label.user.get("email"),
- gitlab_username=label.user.get("username"),
- gitlab_id=label.user.get("id"),
- prov_role=ProvRole.ANNOTATOR,
- )
- return Annotation(
- id=label.id,
- type=f"{label.action}_label",
- body=label.action,
- annotator=annotator,
- prov_start=label.created_at,
- prov_end=label.created_at,
- )
-
-
-def choose_parser(
- parseable: Note | Comment | AwardEmoji | Label,
-) -> Callable[[Note | Comment | AwardEmoji | Label], Annotation] | None:
- match parseable:
- case ProjectIssueNote(system=True) | ProjectMergeRequestNote(system=True):
- return parse_system_note
- case ProjectIssueNote() | ProjectMergeRequestNote():
- return parse_note
- case ProjectCommitComment():
- return parse_comment
- case ProjectIssueResourceLabelEvent() | ProjectMergeRequestResourceLabelEvent():
- return parse_label
- case ProjectIssueAwardEmoji() | ProjectIssueNoteAwardEmoji() | ProjectMergeRequestAwardEmoji() | ProjectMergeRequestNoteAwardEmoji():
- return parse_award
- case _:
- log.warning(f"no parser found for {parseable=}")
- return
-
-
-def parse_annotations(
- parseables: Sequence[Note | Comment | AwardEmoji | Label],
-) -> Sequence[Annotation]:
- annotations = []
- for parseable in parseables:
- if parser := choose_parser(parseable):
- annotations.append(parser(parseable))
- return sorted(annotations, key=operator.attrgetter("prov_start"))
diff --git a/gitlab2prov/adapters/fetch/gitlab.py b/gitlab2prov/adapters/fetch/gitlab.py
deleted file mode 100644
index d153f30..0000000
--- a/gitlab2prov/adapters/fetch/gitlab.py
+++ /dev/null
@@ -1,216 +0,0 @@
-import logging
-from collections.abc import Iterator
-from dataclasses import dataclass
-from dataclasses import field
-
-from gitlab import Gitlab
-from gitlab.exceptions import GitlabListError
-from gitlab.v4.objects import Project
-from gitlab.v4.objects import ProjectCommit
-from gitlab.v4.objects import ProjectIssue
-from gitlab.v4.objects import ProjectMergeRequest
-from gitlab.v4.objects import ProjectRelease
-from gitlab.v4.objects import ProjectTag
-
-from gitlab2prov.adapters.fetch.annotations import parse_annotations
-from gitlab2prov.adapters.fetch.utils import gitlab_url
-from gitlab2prov.adapters.fetch.utils import project_slug
-from gitlab2prov.domain.constants import ProvRole
-from gitlab2prov.domain.objects import Asset
-from gitlab2prov.domain.objects import Evidence
-from gitlab2prov.domain.objects import GitlabCommit
-from gitlab2prov.domain.objects import Issue
-from gitlab2prov.domain.objects import MergeRequest
-from gitlab2prov.domain.objects import Release
-from gitlab2prov.domain.objects import Tag
-from gitlab2prov.domain.objects import User
-
-
-log = logging.getLogger(__name__)
-
-
-@dataclass
-class GitlabFetcher:
- url: str
- token: str
- _project: Project | None = field(init=False, default=None)
-
- def do_login(self) -> None:
- gl = Gitlab(url=gitlab_url(self.url), private_token=self.token)
- self._project = gl.projects.get(project_slug(self.url))
-
- def fetch_gitlab(
- self,
- ) -> Iterator[GitlabCommit | Issue | MergeRequest | Release | Tag]:
- yield from extract_commits(self._project)
- yield from extract_issues(self._project)
- yield from extract_mergerequests(self._project)
- yield from extract_releases(self._project)
- yield from extract_tags(self._project)
-
-
-def on_gitlab_list_error(func):
- def wrapped(*args, **kwargs):
- try:
- return func(*args, **kwargs)
- except GitlabListError as e:
- msg = f"{func.__module__}.{func.__name__}: {type(e)} due to {e.response_code} HTTP Error."
- log.info(msg)
-
- return wrapped
-
-
-def get_commit_author(commit: ProjectCommit) -> User:
- return User(
- name=commit.committer_name,
- email=commit.committer_email,
- gitlab_username=None,
- gitlab_id=None,
- prov_role=ProvRole.AUTHOR_GITLAB_COMMIT,
- )
-
-
-def get_tag_author(tag: ProjectTag) -> User:
- return User(
- name=tag.commit.get("author_name"),
- email=tag.commit.get("author_email"),
- gitlab_username=None,
- gitlab_id=None,
- prov_role=ProvRole.AUTHOR_TAG,
- )
-
-
-def get_resource_author(
- resource: ProjectIssue | ProjectMergeRequest | ProjectRelease, role: ProvRole
-) -> User | None:
- if not hasattr(resource, "author"):
- return None
- return User(
- name=resource.author.get("name"),
- email=resource.author.get("email"),
- gitlab_username=resource.author.get("username"),
- gitlab_id=resource.author.get("id"),
- prov_role=role,
- )
-
-
-def get_assets(release: ProjectRelease) -> list[Asset]:
- return [
- Asset(url=asset.get("url"), format=asset.get("format"))
- for asset in release.assets.get("sources", [])
- ]
-
-
-def get_evidences(release: ProjectRelease) -> list[Evidence]:
- return [
- Evidence(
- hexsha=evidence.get("sha"),
- url=evidence.get("filepath"),
- collected_at=evidence.get("collected_at"),
- )
- for evidence in release.evidences
- ]
-
-
-@on_gitlab_list_error
-def extract_commits(project: Project) -> Iterator[GitlabCommit]:
- for commit in project.commits.list(all=True):
- parseables = {
- *commit.comments.list(all=True, system=False),
- *commit.comments.list(all=True, system=True),
- }
- yield GitlabCommit(
- hexsha=commit.id,
- url=commit.web_url,
- author=get_commit_author(commit),
- annotations=parse_annotations(parseables),
- authored_at=commit.authored_date,
- committed_at=commit.committed_date,
- )
-
-
-@on_gitlab_list_error
-def extract_issues(project: Project) -> Iterator[Issue]:
- for issue in project.issues.list(all=True):
- parseables = {
- *issue.notes.list(all=True, system=False),
- *issue.notes.list(all=True, system=True),
- *issue.awardemojis.list(all=True),
- *issue.resourcelabelevents.list(all=True),
- *(
- award
- for note in issue.notes.list(all=True)
- for award in note.awardemojis.list(all=True)
- ),
- }
- yield Issue(
- id=issue.id,
- iid=issue.iid,
- title=issue.title,
- description=issue.description,
- url=issue.web_url,
- author=get_resource_author(issue, ProvRole.AUTHOR_ISSUE),
- annotations=parse_annotations(parseables),
- created_at=issue.created_at,
- closed_at=issue.closed_at,
- )
-
-
-@on_gitlab_list_error
-def extract_mergerequests(project: Project) -> Iterator[MergeRequest]:
- for mergerequest in project.mergerequests.list(all=True):
- parseables = {
- *mergerequest.notes.list(all=True, system=False),
- *mergerequest.notes.list(all=True, system=True),
- *mergerequest.awardemojis.list(all=True),
- *mergerequest.resourcelabelevents.list(all=True),
- *(
- award
- for note in mergerequest.notes.list(all=True)
- for award in note.awardemojis.list(all=True)
- ),
- }
- yield MergeRequest(
- id=mergerequest.id,
- iid=mergerequest.iid,
- title=mergerequest.title,
- description=mergerequest.description,
- url=mergerequest.web_url,
- source_branch=mergerequest.source_branch,
- target_branch=mergerequest.target_branch,
- author=get_resource_author(mergerequest, ProvRole.AUTHOR_MERGE_REQUEST),
- annotations=parse_annotations(parseables),
- created_at=mergerequest.created_at,
- closed_at=mergerequest.closed_at,
- merged_at=mergerequest.merged_at,
- first_deployed_to_production_at=getattr(
- mergerequest, "first_deployed_to_production_at", None
- ),
- )
-
-
-@on_gitlab_list_error
-def extract_releases(project: Project) -> Iterator[Release]:
- for release in project.releases.list(all=True):
- yield Release(
- name=release.name,
- description=release.description,
- tag_name=release.tag_name,
- author=get_resource_author(release, ProvRole.AUTHOR_RELEASE),
- assets=get_assets(release),
- evidences=get_evidences(release),
- created_at=release.created_at,
- released_at=release.released_at,
- )
-
-
-@on_gitlab_list_error
-def extract_tags(project: Project) -> Iterator[Tag]:
- for tag in project.tags.list(all=True):
- yield Tag(
- name=tag.name,
- hexsha=tag.target,
- message=tag.message,
- author=get_tag_author(tag),
- created_at=tag.commit.get("created_at"),
- )
diff --git a/gitlab2prov/adapters/fetch/utils.py b/gitlab2prov/adapters/fetch/utils.py
deleted file mode 100644
index 4b83042..0000000
--- a/gitlab2prov/adapters/fetch/utils.py
+++ /dev/null
@@ -1,18 +0,0 @@
-from urllib.parse import urlsplit
-
-
-def project_slug(url: str) -> str:
- path = urlsplit(url).path
- if path is None:
- return None
- return path.strip("/")
-
-
-def gitlab_url(url: str) -> str:
- split = urlsplit(url)
- return f"{split.scheme}://{split.netloc}"
-
-
-def clone_over_https_url(url: str, token: str) -> str:
- split = urlsplit(url)
- return f"https://gitlab.com:{token}@{split.netloc}/{project_slug(url)}"
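The helpers removed in this file split a project URL into the instance URL and the project slug, and built a token-authenticated clone URL from the parts. A small sketch of that URL handling is shown below, using only `urllib.parse.urlsplit`. The function names and the explicit `username` parameter are assumptions made for illustration; they are not the `ProjectUrl` API that replaces these helpers.

```python
from urllib.parse import urlsplit


def split_project_url(url: str) -> tuple[str, str]:
    """Split a project URL into (instance URL, project slug)."""
    parts = urlsplit(url)
    instance = f"{parts.scheme}://{parts.netloc}"
    slug = parts.path.strip("/")
    return instance, slug


def clone_url(url: str, token: str, username: str) -> str:
    """Build an HTTPS clone URL that embeds the credentials (hypothetical helper)."""
    parts = urlsplit(url)
    slug = parts.path.strip("/")
    return f"https://{username}:{token}@{parts.netloc}/{slug}"


print(split_project_url("https://gitlab.com/group/project/"))
```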
diff --git a/gitlab2prov/adapters/git/__init__.py b/gitlab2prov/adapters/git/__init__.py
new file mode 100644
index 0000000..69cdbf5
--- /dev/null
+++ b/gitlab2prov/adapters/git/__init__.py
@@ -0,0 +1 @@
+from gitlab2prov.adapters.git.fetcher import GitFetcher
\ No newline at end of file
diff --git a/gitlab2prov/adapters/fetch/git.py b/gitlab2prov/adapters/git/fetcher.py
similarity index 67%
rename from gitlab2prov/adapters/fetch/git.py
rename to gitlab2prov/adapters/git/fetcher.py
index e86f90d..eccb7e5 100644
--- a/gitlab2prov/adapters/fetch/git.py
+++ b/gitlab2prov/adapters/git/fetcher.py
@@ -2,11 +2,12 @@
from dataclasses import dataclass
from itertools import zip_longest
from tempfile import TemporaryDirectory
+from pathlib import Path
from git import Commit
from git import Repo
-from gitlab2prov.adapters.fetch.utils import clone_over_https_url
+from gitlab2prov.adapters.project_url import ProjectUrl
from gitlab2prov.domain.constants import ChangeType
from gitlab2prov.domain.constants import ProvRole
from gitlab2prov.domain.objects import File
@@ -20,33 +21,28 @@
@dataclass
class GitFetcher:
- url: str
- token: str
-
- _repo: Repo | None = None
- _tmpdir: TemporaryDirectory | None = None
+ project_url: type[ProjectUrl]
+ repo: Repo | None = None
+ tmpdir: TemporaryDirectory | None = None
def __enter__(self):
- self._tmpdir = TemporaryDirectory(ignore_cleanup_errors=True)
+ self.tmpdir = TemporaryDirectory(ignore_cleanup_errors=True)
return self
def __exit__(self, exc_type, exc_val, exc_tb):
- if self._repo:
- self._repo.close()
- if self._tmpdir:
- self._tmpdir.cleanup()
-
- def do_clone(self) -> None:
- url = clone_over_https_url(self.url, self.token)
- self._repo = Repo.clone_from(
- url=url,
- to_path=self._tmpdir.name,
- )
+ if self.repo:
+ self.repo.close()
+ if self.tmpdir:
+ self.tmpdir.cleanup()
+
+ def do_clone(self, url: str, token: str) -> None:
+ clone_url = self.project_url(url).clone_url(token)
+ self.repo = Repo.clone_from(clone_url, self.tmpdir.name)
- def fetch_git(self) -> Iterator[GitCommit | File | FileRevision]:
- yield from extract_commits(self._repo)
- yield from extract_files(self._repo)
- yield from extract_revisions(self._repo)
+ def fetch_all(self) -> Iterator[GitCommit | File | FileRevision]:
+ yield from extract_commits(self.repo)
+ yield from extract_files(self.repo)
+ yield from extract_revisions(self.repo)
def get_author(commit: Commit) -> User:
@@ -96,14 +92,18 @@ def parse_log(log: str):
def extract_commits(repo: Repo) -> Iterator[GitCommit]:
for commit in repo.iter_commits("--all"):
yield GitCommit(
- hexsha=commit.hexsha,
- message=commit.message,
+ sha=commit.hexsha,
title=commit.summary,
+ message=commit.message,
author=get_author(commit),
committer=get_committer(commit),
+ deletions=commit.stats.total["deletions"],
+ insertions=commit.stats.total["insertions"],
+ lines=commit.stats.total["lines"],
+ files_changed=commit.stats.total["files"],
parents=[parent.hexsha for parent in commit.parents],
- prov_start=commit.authored_datetime,
- prov_end=commit.committed_datetime,
+ authored_at=commit.authored_datetime,
+ committed_at=commit.committed_datetime,
)
@@ -118,7 +118,9 @@ def extract_files(repo: Repo) -> Iterator[File]:
# disregard modifications and deletions
for diff_item in diff.iter_change_type(ChangeType.ADDED):
# path for new files is stored in diff b_path
- yield File(path=diff_item.b_path, committed_in=commit.hexsha)
+ yield File(
+ name=Path(diff_item.b_path).name, path=diff_item.b_path, commit=commit.hexsha
+ )
def extract_revisions(repo: Repo) -> Iterator[FileRevision]:
@@ -135,10 +137,21 @@ def extract_revisions(repo: Repo) -> Iterator[FileRevision]:
file.path,
)
):
+ status = {"A": "added", "M": "modified", "D": "deleted"}.get(status, "modified")
revs.append(
- FileRevision(path=path, committed_in=hexsha, change_type=status, original=file)
+ FileRevision(
+ name=Path(path).name,
+ path=path,
+ commit=hexsha,
+ status=status,
+ insertions=0,
+ deletions=0,
+ lines=0,
+ score=0,
+ file=file,
+ )
)
- # revisions remeber their predecessor (previous revision)
+ # revisions remember their predecessor (previous revision)
for rev, prev in zip_longest(revs, revs[1:]):
rev.previous = prev
yield rev
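The revision extraction above maps single-letter git status codes to readable names and then lets every revision remember its predecessor via `zip_longest`. The sketch below isolates those two steps with a stand-in `Revision` class. The class, the `link_revisions` helper, and the input format are assumptions for illustration, not the project's `FileRevision` type.

```python
from dataclasses import dataclass
from itertools import zip_longest

STATUS_NAMES = {"A": "added", "M": "modified", "D": "deleted"}


@dataclass
class Revision:
    path: str
    status: str
    previous: "Revision | None" = None


def link_revisions(log_entries: list[tuple[str, str]]) -> list[Revision]:
    """Build revisions from (status letter, path) pairs, newest first."""
    revs = [
        # Unknown status letters (for example renames) fall back to "modified".
        Revision(path=path, status=STATUS_NAMES.get(letter, "modified"))
        for letter, path in log_entries
    ]
    # Pair each revision with the next list entry, which is its predecessor;
    # the oldest revision is paired with None.
    for rev, prev in zip_longest(revs, revs[1:]):
        rev.previous = prev
    return revs


revs = link_revisions([("M", "b.py"), ("R100", "b.py"), ("A", "a.py")])
print([r.status for r in revs])
```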
diff --git a/gitlab2prov/adapters/hub/__init__.py b/gitlab2prov/adapters/hub/__init__.py
new file mode 100644
index 0000000..f777384
--- /dev/null
+++ b/gitlab2prov/adapters/hub/__init__.py
@@ -0,0 +1 @@
+from gitlab2prov.adapters.hub.fetcher import GithubFetcher
\ No newline at end of file
diff --git a/gitlab2prov/adapters/hub/fetcher.py b/gitlab2prov/adapters/hub/fetcher.py
new file mode 100644
index 0000000..8e5f955
--- /dev/null
+++ b/gitlab2prov/adapters/hub/fetcher.py
@@ -0,0 +1,159 @@
+import logging
+import itertools
+from typing import Iterator
+from dataclasses import dataclass, field, InitVar
+
+from github import Github
+from github.Repository import Repository
+
+from gitlab2prov.adapters.project_url import GithubProjectUrl
+from gitlab2prov.adapters.hub.parser import GithubAnnotationParser
+from gitlab2prov.domain.constants import ProvRole
+from gitlab2prov.domain.objects import (
+    Asset,
+    User,
+    Commit,
+    Issue,
+    MergeRequest,
+    GitTag,
+    Release,
+)
+
+
+log = logging.getLogger(__name__)
+
+
+@dataclass
+class GithubFetcher:
+    token: InitVar[str]
+    url: InitVar[str]
+
+    # build the parser per instance instead of sharing one default object
+    parser: GithubAnnotationParser = field(default_factory=GithubAnnotationParser)
+    client: Github = field(init=False)
+    repository: Repository = field(init=False)
+
+    def __post_init__(self, token, url) -> None:
+        self.client = Github(login_or_token=token, per_page=100)
+        self.repository = self.client.get_repo(full_name_or_id=GithubProjectUrl(url).slug)
+        log.warning(f"Remaining requests: {self.client.rate_limiting[0]}")
+
+    def fetch_all(self) -> Iterator[Commit | Issue | MergeRequest | Release | GitTag]:
+        yield from itertools.chain(
+            self.fetch_commits(),
+            self.fetch_issues(),
+            self.fetch_mergerequests(),
+            self.fetch_releases(),
+            self.fetch_tags(),
+        )
+
+    def fetch_commits(self) -> Iterator[Commit]:
+        for commit in self.repository.get_commits():
+            raw_annotations = [
+                *commit.get_statuses(),
+                *commit.get_comments(),
+                # flatten: one entry per reaction, not one paginated list per comment
+                *(
+                    reaction
+                    for comment in commit.get_comments()
+                    for reaction in comment.get_reactions()
+                ),
+            ]
+            yield Commit(
+                sha=commit.sha,
+                url=commit.url,
+                author=User(
+                    commit.commit.author.name,
+                    commit.commit.author.email,
+                    prov_role=ProvRole.COMMIT_AUTHOR,
+                ),
+                platform="github",
+                annotations=self.parser.parse(raw_annotations),
+                authored_at=commit.commit.author.date,
+                committed_at=commit.commit.committer.date,
+            )
+
+    def fetch_issues(self) -> Iterator[Issue]:
+        for issue in self.repository.get_issues(state="all"):
+            raw_annotations = [
+                *issue.get_comments(),
+                *issue.get_reactions(),
+                # flatten: one entry per reaction, not one paginated list per comment
+                *(
+                    reaction
+                    for comment in issue.get_comments()
+                    for reaction in comment.get_reactions()
+                ),
+                *issue.get_events(),
+                *issue.get_timeline(),
+            ]
+            yield Issue(
+                id=issue.number,
+                iid=issue.id,
+                platform="github",
+                title=issue.title,
+                body=issue.body,
+                url=issue.url,
+                author=User(issue.user.name, issue.user.email, prov_role=ProvRole.ISSUE_AUTHOR),
+                annotations=self.parser.parse(raw_annotations),
+                created_at=issue.created_at,
+                closed_at=issue.closed_at,
+            )
+
+    def fetch_mergerequests(self) -> Iterator[MergeRequest]:
+        for pull in self.repository.get_pulls(state="all"):
+            raw_annotations = []
+            # conversation comments; pull.get_comments() only returns review comments
+            raw_annotations.extend(pull.get_issue_comments())
+            raw_annotations.extend(
+                reaction
+                for comment in pull.get_issue_comments()
+                for reaction in comment.get_reactions()
+            )
+            raw_annotations.extend(pull.get_review_comments())
+            raw_annotations.extend(
+                reaction
+                for comment in pull.get_review_comments()
+                for reaction in comment.get_reactions()
+            )
+            raw_annotations.extend(pull.get_reviews())
+            raw_annotations.extend(pull.as_issue().get_reactions())
+            raw_annotations.extend(pull.as_issue().get_events())
+            raw_annotations.extend(pull.as_issue().get_timeline())
+
+            yield MergeRequest(
+                id=pull.number,
+                iid=pull.id,
+                title=pull.title,
+                body=pull.body,
+                url=pull.url,
+                platform="github",
+                # head is the branch that carries the changes, base is the merge target
+                source_branch=pull.head.ref,
+                target_branch=pull.base.ref,
+                author=User(
+                    name=pull.user.name,
+                    email=pull.user.email,
+                    prov_role=ProvRole.MERGE_REQUEST_AUTHOR,
+                ),
+                annotations=self.parser.parse(raw_annotations),
+                created_at=pull.created_at,
+                closed_at=pull.closed_at,
+                merged_at=pull.merged_at,
+            )
+
+    def fetch_releases(self) -> Iterator[Release]:
+        for release in self.repository.get_releases():
+            yield Release(
+                name=release.title,
+                body=release.body,
+                tag_name=release.tag_name,
+                platform="github",
+                author=User(
+                    name=release.author.name,
+                    email=release.author.email,
+                    prov_role=ProvRole.RELEASE_AUTHOR,
+                ),
+                assets=[
+                    Asset(url=asset.url, format=asset.content_type)
+                    for asset in release.get_assets()
+                ],
+                evidences=[],
+                created_at=release.created_at,
+                released_at=release.published_at,
+            )
+
+    def fetch_tags(self) -> Iterator[GitTag]:
+        for tag in self.repository.get_tags():
+            yield GitTag(
+                name=tag.name,
+                sha=tag.commit.sha,
+                message=tag.commit.commit.message,
+                # use the git author, as in fetch_commits; tag.commit.author
+                # is the linked GitHub user and may be missing
+                author=User(
+                    name=tag.commit.commit.author.name,
+                    email=tag.commit.commit.author.email,
+                    prov_role=ProvRole.TAG_AUTHOR,
+                ),
+                created_at=tag.commit.commit.author.date,
+            )
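Each fetch method in this file gathers comments together with the reactions attached to every comment before handing everything to the annotation parser. Because each comment yields its own collection of reactions, the nested results need to be flattened so that the parser receives individual items. The sketch below shows one way to do that with `itertools.chain.from_iterable`. `FakeComment` and `collect_raw_annotations` are stand-ins invented for illustration and are not PyGithub or `gitlab2prov` types.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class FakeComment:
    """Stand-in for an API comment object that exposes get_reactions()."""

    body: str
    reactions: list[str] = field(default_factory=list)

    def get_reactions(self) -> list[str]:
        return self.reactions


def collect_raw_annotations(comments: list[FakeComment]) -> list[object]:
    """Return the comments followed by every reaction as a flat list."""
    reactions = chain.from_iterable(c.get_reactions() for c in comments)
    return [*comments, *reactions]


comments = [FakeComment("looks good", ["+1", "heart"]), FakeComment("needs work")]
print(collect_raw_annotations(comments))
```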
diff --git a/gitlab2prov/adapters/hub/parser.py b/gitlab2prov/adapters/hub/parser.py
new file mode 100644
index 0000000..eb3d2fb
--- /dev/null
+++ b/gitlab2prov/adapters/hub/parser.py
@@ -0,0 +1,199 @@
+import logging
+from dataclasses import dataclass
+from typing import TypeVar, Callable
+
+from github.CommitComment import CommitComment
+from github.CommitStatus import CommitStatus
+from github.Reaction import Reaction
+from github.IssueComment import IssueComment
+from github.IssueEvent import IssueEvent
+from github.TimelineEvent import TimelineEvent
+from github.PullRequestComment import PullRequestComment
+from github.PullRequestReview import PullRequestReview
+
+from gitlab2prov.domain.objects import Annotation, User
+from gitlab2prov.domain.constants import ProvRole
+
+A = TypeVar("A")
+
+log = logging.getLogger(__name__)
+
+
+@dataclass
+class GithubAnnotationParser:
+    @staticmethod
+    def sort_by_date(annotations: list[Annotation]) -> list[Annotation]:
+        return sorted(annotations, key=lambda a: a.start)
+
+    def choose_parser(self, raw_annotation: A) -> Callable[[A], Annotation] | None:
+        match raw_annotation:
+            case CommitComment():
+                return self.parse_commit_comment
+            case CommitStatus():
+                return self.parse_commit_status
+            case Reaction():
+                return self.parse_reaction
+            case IssueComment():
+                return self.parse_issue_comment
+            case IssueEvent():
+                return self.parse_issue_event
+            case TimelineEvent():
+                return self.parse_timeline_event
+            case PullRequestReview():
+                return self.parse_pull_request_review
+            case PullRequestComment():
+                return self.parse_pull_request_comment
+            case _:
+                log.warning(f"no parser found for {raw_annotation=}")
+                return None
+
+    @staticmethod
+    def filter_valid(annotations: list[Annotation]) -> list[Annotation]:
+        return [
+            annot
+            for annot in annotations
+            if annot.annotator is not None and annot.start is not None
+        ]
+
+    def parse(self, annotations: list[A]) -> list[Annotation]:
+        parsed_annotations = []
+        for annotation in annotations:
+            if parser := self.choose_parser(annotation):
+                parsed_annotations.append(parser(annotation))
+        # filter first: annotations without a start time cannot be sorted by date
+        return self.sort_by_date(self.filter_valid(parsed_annotations))
+
+    def parse_commit_comment(self, comment: CommitComment) -> Annotation:
+        annotator = User(
+            name=comment.user.name,
+            email=comment.user.email,
+            github_username=comment.user.login,
+            github_id=comment.user.id,
+            prov_role=ProvRole.ANNOTATOR,
+        )
+        return Annotation(
+            id=comment.id,
+            name="add_comment",
+            body=comment.body,
+            start=comment.created_at,
+            end=comment.created_at,
+            annotator=annotator,
+        )
+
+    def parse_commit_status(self, status: CommitStatus) -> Annotation:
+        annotator = User(
+            name=status.creator.name,
+            email=status.creator.email,
+            github_username=status.creator.login,
+            github_id=status.creator.id,
+            prov_role=ProvRole.ANNOTATOR,
+        )
+        return Annotation(
+            id=status.id,
+            name="add_commit_status",
+            body=status.description,
+            start=status.created_at,
+            end=status.created_at,
+            annotator=annotator,
+        )
+
+    def parse_reaction(self, reaction: Reaction) -> Annotation:
+        annotator = User(
+            name=reaction.user.name,
+            email=reaction.user.email,
+            github_username=reaction.user.login,
+            github_id=reaction.user.id,
+            prov_role=ProvRole.ANNOTATOR,
+        )
+        return Annotation(
+            id=reaction.id,
+            name="add_award",
+            body=reaction.content,
+            start=reaction.created_at,
+            end=reaction.created_at,
+            annotator=annotator,
+        )
+
+    def parse_issue_comment(self, comment: IssueComment) -> Annotation:
+        annotator = User(
+            name=comment.user.name,
+            email=comment.user.email,
+            github_username=comment.user.login,
+            github_id=comment.user.id,
+            prov_role=ProvRole.ANNOTATOR,
+        )
+        return Annotation(
+            id=comment.id,
+            name="add_comment",
+            body=comment.body,
+            start=comment.created_at,
+            end=comment.created_at,
+            annotator=annotator,
+        )
+
+ def parse_issue_event(self, event: IssueEvent) -> Annotation:
+ annotator = User(
+ name=event.actor.name,
+ email=event.actor.email,
+ github_username=event.actor.login,
+ github_id=event.actor.id,
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ return Annotation(
+ id=event.id,
+ name=event.event,
+ body=event.event,
+ start=event.created_at,
+ end=event.created_at,
+ annotator=annotator,
+ )
+
+ def parse_timeline_event(self, event: TimelineEvent) -> Annotation:
+ return Annotation(
+ id=event.id,
+ name=event.event,
+ body=event.event,
+ start=event.created_at,
+ end=event.created_at,
+ annotator=User(
+ name=event.actor.name,
+ email=event.actor.email,
+ github_username=event.actor.login,
+ github_id=event.actor.id,
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ if event.actor
+ else None,
+ )
+
+ def parse_pull_request_review(self, review: PullRequestReview) -> Annotation:
+ annotator = User(
+ name=review.user.name,
+ email=review.user.email,
+ github_username=review.user.login,
+ github_id=review.user.id,
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ return Annotation(
+ id=review.id,
+ name="add_review",
+ body=review.body,
+ start=review.submitted_at,
+ end=review.submitted_at,
+ annotator=annotator,
+ )
+
+ def parse_pull_request_comment(self, comment: PullRequestComment) -> Annotation:
+ annotator = User(
+ name=comment.user.name,
+ email=comment.user.email,
+ github_username=comment.user.login,
+ github_id=comment.user.id,
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ return Annotation(
+ id=comment.id,
+ name="add_comment",
+ body=comment.body,
+ start=comment.created_at,
+ end=comment.created_at,
+ annotator=annotator,
+ )
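Review note (illustration only, not part of the patch): `filter_valid` drops annotations that lack an annotator or a start time, so its position relative to the date sort matters. Sorting a list that still contains `None` start values alongside string timestamps raises a `TypeError` in Python. The sketch below shows the filter-then-sort order; `Note` is a hypothetical stand-in for `Annotation`, and the timestamps are made-up ISO-8601 strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    # Hypothetical stand-in for Annotation; only the fields used below.
    start: Optional[str]
    annotator: Optional[str]


def filter_valid(notes: list[Note]) -> list[Note]:
    # Keep only notes that carry both an annotator and a start time.
    return [n for n in notes if n.annotator is not None and n.start is not None]


def sort_by_date(notes: list[Note]) -> list[Note]:
    # ISO-8601 strings in the same timezone order chronologically as plain strings.
    return sorted(notes, key=lambda n: n.start)


notes = [
    Note("2023-02-01T10:00:00Z", "alice"),
    Note(None, "bob"),  # e.g. an event without a timestamp
    Note("2023-01-15T08:30:00Z", "carol"),
]

# Filtering first guarantees the sort key is never None.
ordered = sort_by_date(filter_valid(notes))
```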
diff --git a/gitlab2prov/adapters/lab/__init__.py b/gitlab2prov/adapters/lab/__init__.py
new file mode 100644
index 0000000..719780d
--- /dev/null
+++ b/gitlab2prov/adapters/lab/__init__.py
@@ -0,0 +1 @@
+from gitlab2prov.adapters.lab.fetcher import GitlabFetcher
\ No newline at end of file
diff --git a/gitlab2prov/adapters/fetch/annotations/classifiers.py b/gitlab2prov/adapters/lab/classifiers.py
similarity index 91%
rename from gitlab2prov/adapters/fetch/annotations/classifiers.py
rename to gitlab2prov/adapters/lab/classifiers.py
index dd48c15..a41d14e 100644
--- a/gitlab2prov/adapters/fetch/annotations/classifiers.py
+++ b/gitlab2prov/adapters/lab/classifiers.py
@@ -9,12 +9,6 @@
log = logging.getLogger(__name__)
-def match_length(match: re.Match) -> int:
- if match is None:
- raise TypeError(f"Expected argument of type re.Match, got {type(match)}.")
- return match.end() - match.start()
-
-
@dataclass(kw_only=True)
class Classifier:
patterns: InitVar[list[str]]
@@ -24,9 +18,15 @@ class Classifier:
def __post_init__(self, regexps: list[str]):
self.compiled = [re.compile(regex, re.IGNORECASE) for regex in regexps]
+ @staticmethod
+ def match_length(match: re.Match) -> int:
+ if match is None:
+ raise TypeError(f"Expected argument of type re.Match, got {type(match)}.")
+ return match.end() - match.start()
+
def matches(self, string: str) -> bool:
matches = [match for pt in self.compiled if (match := re.search(pt, string))]
- self.match = max(matches, key=match_length, default=None)
+ self.match = max(matches, key=self.match_length, default=None)
return self.match is not None
def groupdict(self) -> dict[str, Any]:
@@ -37,7 +37,7 @@ def groupdict(self) -> dict[str, Any]:
def __len__(self) -> int:
if not self.match:
return 0
- return match_length(self.match)
+ return self.match_length(self.match)
@dataclass(kw_only=True)
@@ -443,3 +443,30 @@ class AnnotationClassifier(Classifier):
r"\*by (?P.+) on \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\sUTC \(imported from gitlab project\)\*",
],
)
+
+
+@dataclass
+class SystemNoteClassifier:
+ @staticmethod
+ def normalize(note: str) -> str:
+ return note.strip().lower()
+
+ def longest_matching_classifier(self, note: str) -> AnnotationClassifier | None:
+ matching = (classifier for classifier in CLASSIFIERS if classifier.matches(note))
+ return max(matching, key=len, default=None)
+
+ def classify(self, note: str) -> tuple[str, dict[str, str]]:
+ # 1. normalize the note
+ key_value_pairs = {}
+ normalized_note = self.normalize(note)
+ # 2. remove import statements, if any and extract the key-value pairs
+ if IMPORT_STATEMENT.matches(normalized_note):
+ normalized_note = IMPORT_STATEMENT.replace(normalized_note)
+ key_value_pairs.update(IMPORT_STATEMENT.groupdict())
+ # 3. find the longest matching classifier
+ if classifier := self.longest_matching_classifier(normalized_note):
+ key_value_pairs.update(classifier.groupdict())
+ # 4. return the classifier name and the matched groups
+ return classifier.name, key_value_pairs
+ # 5. if no classifier matches, return "unknown" and an empty dict
+ return "unknown", key_value_pairs
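Review note (illustration only, not part of the patch): `SystemNoteClassifier.classify` normalizes the note, then picks the classifier whose match is longest. A minimal stand-alone sketch of that longest-match rule follows. The `PATTERNS` table and its group names are hypothetical; the real `CLASSIFIERS` list lives elsewhere in `classifiers.py` and is not reproduced here.

```python
import re

# Hypothetical patterns, chosen so that both match the same note.
PATTERNS = {
    "mention": r"mentioned in",
    "mention_in_issue": r"mentioned in issue #(?P<issue_iid>\d+)",
}


def classify(note: str) -> tuple[str, dict[str, str]]:
    normalized = note.strip().lower()
    best_name, best_match = "unknown", None
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, normalized, re.IGNORECASE)
        if match is None:
            continue
        # Prefer the pattern that covers the most characters of the note.
        if best_match is None or len(match.group(0)) > len(best_match.group(0)):
            best_name, best_match = name, match
    return best_name, (best_match.groupdict() if best_match else {})


result = classify("  Mentioned in issue #42 ")
fallback = classify("completely unrelated text")
```

Preferring the longest match keeps a generic pattern from shadowing a more specific one, regardless of the order in which the patterns are listed.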
diff --git a/gitlab2prov/adapters/lab/fetcher.py b/gitlab2prov/adapters/lab/fetcher.py
new file mode 100644
index 0000000..c03c944
--- /dev/null
+++ b/gitlab2prov/adapters/lab/fetcher.py
@@ -0,0 +1,202 @@
+import logging
+import itertools
+from typing import Iterator
+from dataclasses import dataclass, field, InitVar
+
+from gitlab import Gitlab
+from gitlab.exceptions import GitlabListError
+from gitlab.v4.objects import Project
+
+from gitlab2prov.adapters.lab.parser import GitlabAnnotationParser
+from gitlab2prov.adapters.project_url import GitlabProjectUrl
+from gitlab2prov.domain.constants import ProvRole
+from gitlab2prov.domain.objects import (
+ Asset,
+ Evidence,
+ Commit,
+ Issue,
+ MergeRequest,
+ Release,
+ User,
+ GitTag,
+)
+
+
+log = logging.getLogger(__name__)
+
+
+@dataclass
+class GitlabFetcher:
+ token: InitVar[str]
+ url: InitVar[str]
+
+ client: Gitlab = field(init=False)
+ project: Project = field(init=False)
+ parser: GitlabAnnotationParser = field(default_factory=GitlabAnnotationParser)
+
+ def __post_init__(self, token, url) -> None:
+ url = GitlabProjectUrl(url)
+ self.client = Gitlab(url.instance, private_token=token)
+ self.project = self.client.projects.get(url.slug)
+
+ def log_list_err(self, log: logging.Logger, err: GitlabListError, cls: str) -> None:
+ log.error(f"failed to fetch {cls} from {self.project.web_url}")
+ log.error(f"error: {err}")
+
+ def fetch_all(self) -> Iterator[Commit | Issue | MergeRequest | Release | GitTag]:
+ yield from itertools.chain(
+ self.fetch_commits(),
+ self.fetch_issues(),
+ self.fetch_mergerequests(),
+ self.fetch_releases(),
+ self.fetch_tags(),
+ )
+
+ def fetch_commits(self) -> Iterator[Commit]:
+ try:
+ for commit in self.project.commits.list(all=True, per_page=100):
+ yield Commit(
+ sha=commit.id,
+ url=commit.web_url,
+ platform="gitlab",
+ author=User(
+ commit.author_name, commit.author_email, prov_role=ProvRole.COMMIT_AUTHOR
+ ),
+ annotations=self.parser.parse(
+ [
+ *commit.comments.list(all=True, system=False),
+ *commit.comments.list(all=True, system=True),
+ ]
+ ),
+ authored_at=commit.authored_date,
+ committed_at=commit.committed_date,
+ )
+ except GitlabListError as err:
+ self.log_list_err(log, err, "commits")
+
+ def fetch_issues(self, state="all") -> Iterator[Issue]:
+ try:
+ for issue in self.project.issues.list(all=True, state=state, per_page=100):
+ yield Issue(
+ id=issue.id,
+ iid=issue.iid,
+ platform="gitlab",
+ title=issue.title,
+ body=issue.description,
+ url=issue.web_url,
+ author=User(
+ issue.author.get("name"),
+ issue.author.get("email"),
+ gitlab_username=issue.author.get("username"),
+ gitlab_id=issue.author.get("id"),
+ prov_role=ProvRole.ISSUE_AUTHOR,
+ ),
+ annotations=self.parser.parse(
+ [
+ *issue.notes.list(all=True, system=False),
+ *issue.notes.list(all=True, system=True),
+ *issue.awardemojis.list(all=True),
+ *issue.resourcelabelevents.list(all=True),
+ *(
+ award
+ for note in issue.notes.list(all=True)
+ for award in note.awardemojis.list(all=True)
+ ),
+ ]
+ ),
+ created_at=issue.created_at,
+ closed_at=issue.closed_at,
+ )
+ except GitlabListError as err:
+ self.log_list_err(log, err, "issues")
+
+ def fetch_mergerequests(self, state="all") -> Iterator[MergeRequest]:
+ try:
+ for merge in self.project.mergerequests.list(all=True, state=state, per_page=100):
+ yield MergeRequest(
+ id=merge.id,
+ iid=merge.iid,
+ title=merge.title,
+ body=merge.description,
+ url=merge.web_url,
+ platform="gitlab",
+ source_branch=merge.source_branch,
+ target_branch=merge.target_branch,
+ author=User(
+ merge.author.get("name"),
+ merge.author.get("email"),
+ gitlab_username=merge.author.get("username"),
+ gitlab_id=merge.author.get("id"),
+ prov_role=ProvRole.MERGE_REQUEST_AUTHOR,
+ ),
+ annotations=self.parser.parse(
+ (
+ *merge.notes.list(all=True, system=False),
+ *merge.notes.list(all=True, system=True),
+ *merge.awardemojis.list(all=True),
+ *merge.resourcelabelevents.list(all=True),
+ *(
+ award
+ for note in merge.notes.list(all=True)
+ for award in note.awardemojis.list(all=True)
+ ),
+ )
+ ),
+ created_at=merge.created_at,
+ closed_at=merge.closed_at,
+ merged_at=merge.merged_at,
+ first_deployed_to_production_at=getattr(
+ merge, "first_deployed_to_production_at", None
+ ),
+ )
+ except GitlabListError as err:
+ self.log_list_err(log, err, "merge requests")
+
+ def fetch_releases(self) -> Iterator[Release]:
+ try:
+ for release in self.project.releases.list(all=True, per_page=100):
+ yield Release(
+ name=release.name,
+ body=release.description,
+ tag_name=release.tag_name,
+ author=User(
+ name=release.author.get("name"),
+ email=release.author.get("email"),
+ gitlab_username=release.author.get("username"),
+ gitlab_id=release.author.get("id"),
+ prov_role=ProvRole.RELEASE_AUTHOR,
+ ),
+ assets=[
+ Asset(url=asset.get("url"), format=asset.get("format"))
+ for asset in release.assets.get("sources", [])
+ ],
+ evidences=[
+ Evidence(
+ sha=evidence.get("sha"),
+ url=evidence.get("filepath"),
+ collected_at=evidence.get("collected_at"),
+ )
+ for evidence in release.evidences
+ ],
+ created_at=release.created_at,
+ released_at=release.released_at,
+ )
+ except GitlabListError as err:
+ self.log_list_err(log, err, "releases")
+
+ def fetch_tags(self) -> Iterator[GitTag]:
+ try:
+ for tag in self.project.tags.list(all=True, per_page=100):
+ yield GitTag(
+ name=tag.name,
+ sha=tag.target,
+ message=tag.message,
+ author=User(
+ name=tag.commit.get("author_name"),
+ email=tag.commit.get("author_email"),
+ prov_role=ProvRole.TAG_AUTHOR,
+ ),
+ created_at=tag.commit.get("created_at"),
+ )
+ except GitlabListError as err:
+ self.log_list_err(log, err, "tags")
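Review note (illustration only, not part of the patch): each `fetch_*` method is a generator that logs a list error instead of raising it, and `fetch_all` chains them. The sketch below mimics that behavior without a GitLab connection. `ListError`, `guarded`, and the three toy sources are hypothetical names introduced for this example.

```python
import itertools
import logging
from typing import Callable, Iterable, Iterator

log = logging.getLogger(__name__)


class ListError(Exception):
    """Hypothetical stand-in for gitlab.exceptions.GitlabListError."""


def guarded(label: str, produce: Callable[[], Iterable[str]]) -> Iterator[str]:
    # Yield items until the source fails, then log the failure instead of raising.
    try:
        yield from produce()
    except ListError as err:
        log.error("failed to fetch %s: %s", label, err)


def commits() -> Iterator[str]:
    yield "c1"
    yield "c2"


def issues() -> Iterator[str]:
    yield "i1"
    raise ListError("403 Forbidden")  # simulated API failure mid-listing


def tags() -> Iterator[str]:
    yield "t1"


def fetch_all() -> Iterator[str]:
    yield from itertools.chain(
        guarded("commits", commits),
        guarded("issues", issues),
        guarded("tags", tags),
    )


result = list(fetch_all())
```

Items yielded before the failure are kept, and later sources still run. The caller cannot tell a partial listing from a complete one except through the log.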
diff --git a/gitlab2prov/adapters/lab/parser.py b/gitlab2prov/adapters/lab/parser.py
new file mode 100644
index 0000000..2a4f959
--- /dev/null
+++ b/gitlab2prov/adapters/lab/parser.py
@@ -0,0 +1,153 @@
+import logging
+import uuid
+from dataclasses import dataclass, field
+from typing import TypeVar, Callable
+
+from gitlab.v4.objects import (
+ ProjectIssueNote,
+ ProjectMergeRequestNote,
+ ProjectCommitComment,
+ ProjectIssueResourceLabelEvent,
+ ProjectMergeRequestResourceLabelEvent,
+ ProjectIssueAwardEmoji,
+ ProjectIssueNoteAwardEmoji,
+ ProjectMergeRequestAwardEmoji,
+ ProjectMergeRequestNoteAwardEmoji,
+)
+
+from gitlab2prov.adapters.lab.classifiers import SystemNoteClassifier
+from gitlab2prov.domain.objects import Annotation, User
+from gitlab2prov.domain.constants import ProvRole
+
+
+A = TypeVar("A")
+
+log = logging.getLogger(__name__)
+
+
+@dataclass
+class GitlabAnnotationParser:
+
+ classifier: SystemNoteClassifier = field(default_factory=SystemNoteClassifier)
+
+ @staticmethod
+ def sort_by_date(annotations: list[Annotation]) -> list[Annotation]:
+ return sorted(annotations, key=lambda a: a.start)
+
+ def choose_parser(self, raw_annotation: A) -> Callable[[A], Annotation]:
+ match raw_annotation:
+ case ProjectIssueNote(system=True) | ProjectMergeRequestNote(system=True):
+ return self.parse_system_note
+ case ProjectIssueNote() | ProjectMergeRequestNote():
+ return self.parse_note
+ case ProjectCommitComment():
+ return self.parse_comment
+ case ProjectIssueResourceLabelEvent() | ProjectMergeRequestResourceLabelEvent():
+ return self.parse_label
+ case ProjectIssueAwardEmoji() | ProjectIssueNoteAwardEmoji() | ProjectMergeRequestAwardEmoji() | ProjectMergeRequestNoteAwardEmoji():
+ return self.parse_award
+ case _:
+ log.warning(f"no parser found for {raw_annotation=}")
+ return
+
+ def parse(self, annotations: list[A]) -> list[Annotation]:
+ parsed_annotations = []
+ for annotation in annotations:
+ if parser := self.choose_parser(annotation):
+ parsed_annotations.append(parser(annotation))
+ return self.sort_by_date(parsed_annotations)
+
+ def parse_system_note(self, note: ProjectIssueNote | ProjectMergeRequestNote) -> Annotation:
+ annotator = User(
+ name=note.author.get("name"),
+ email=note.author.get("email"),
+ gitlab_username=note.author.get("username"),
+ gitlab_id=note.author.get("id"),
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ annotation_name, key_value_pairs = self.classifier.classify(note.body)
+ return Annotation(
+ id=note.id,
+ name=annotation_name,
+ body=note.body,
+ start=note.created_at,
+ end=note.created_at,
+ captured_kwargs=key_value_pairs,
+ annotator=annotator,
+ )
+
+ def parse_comment(self, comment: ProjectCommitComment) -> Annotation:
+ annotator = User(
+ name=comment.author.get("name"),
+ email=comment.author.get("email"),
+ gitlab_username=comment.author.get("username"),
+ gitlab_id=comment.author.get("id"),
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ return Annotation(
+ id=f"{uuid.uuid4()}{annotator.gitlab_id}{abs(hash(comment.note))}",
+ name="add_comment",
+ body=comment.note,
+ start=comment.created_at,
+ end=comment.created_at,
+ annotator=annotator,
+ )
+
+ def parse_note(self, note: ProjectIssueNote | ProjectMergeRequestNote) -> Annotation:
+ annotator = User(
+ name=note.author.get("name"),
+ email=note.author.get("email"),
+ gitlab_username=note.author.get("username"),
+ gitlab_id=note.author.get("id"),
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ return Annotation(
+ id=note.id,
+ name="add_note",
+ body=note.body,
+ annotator=annotator,
+ start=note.created_at,
+ end=note.created_at,
+ )
+
+ def parse_award(
+ self,
+ award: ProjectIssueAwardEmoji
+ | ProjectIssueNoteAwardEmoji
+ | ProjectMergeRequestAwardEmoji
+ | ProjectMergeRequestNoteAwardEmoji,
+ ) -> Annotation:
+ annotator = User(
+ name=award.user.get("name"),
+ email=award.user.get("email"),
+ gitlab_username=award.user.get("username"),
+ gitlab_id=award.user.get("id"),
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ return Annotation(
+ id=award.id,
+ name="add_award",
+ body=award.name,
+ annotator=annotator,
+ start=award.created_at,
+ end=award.created_at,
+ )
+
+ def parse_label(
+ self, label: ProjectIssueResourceLabelEvent | ProjectMergeRequestResourceLabelEvent
+ ) -> Annotation:
+ annotator = User(
+ name=label.user.get("name"),
+ email=label.user.get("email"),
+ gitlab_username=label.user.get("username"),
+ gitlab_id=label.user.get("id"),
+ prov_role=ProvRole.ANNOTATOR,
+ )
+ return Annotation(
+ id=label.id,
+ name=f"{label.action}_label",
+ body=label.action,
+ annotator=annotator,
+ start=label.created_at,
+ end=label.created_at,
+ )
diff --git a/gitlab2prov/adapters/project_url.py b/gitlab2prov/adapters/project_url.py
new file mode 100644
index 0000000..b7eed3e
--- /dev/null
+++ b/gitlab2prov/adapters/project_url.py
@@ -0,0 +1,44 @@
+from urllib.parse import urlsplit
+from dataclasses import dataclass
+
+
+@dataclass
+class ProjectUrl:
+ url: str
+ scheme: str = "https"
+
+ def __post_init__(self):
+ parsed_url = urlsplit(self.url)
+ self.url_path = parsed_url.path
+ self.netloc = parsed_url.netloc
+
+ @property
+ def slug(self) -> str:
+ if self.url_path:
+ *owner, project = self.url_path.split("/")
+ owner = "/".join(owner)[1:]
+ return f"{owner}/{project}"
+ return ""
+
+ @property
+ def instance(self) -> str:
+ return f"{self.scheme}://{self.netloc}"
+
+ def clone_url(self, platform: str, token: str = "") -> str:
+ platform_urls = {
+ "gitlab": f"{self.instance}:{token}@{self.netloc}/{self.slug}",
+ "github": f"{self.scheme}://{token}@{self.netloc}/{self.slug}.git",
+ }
+ return platform_urls.get(platform, "")
+
+
+@dataclass
+class GitlabProjectUrl(ProjectUrl):
+ def clone_url(self, token: str = ""):
+ return super().clone_url("gitlab", token)
+
+
+@dataclass
+class GithubProjectUrl(ProjectUrl):
+ def clone_url(self, token: str = ""):
+ return super().clone_url("github", token)
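Review note (illustration only, not part of the patch): `ProjectUrl` derives the instance and the project slug from `urlsplit`. The sketch below reproduces that split as a plain function. `split_project_url` is a hypothetical helper, and unlike the `slug` property it also tolerates a trailing slash.

```python
from urllib.parse import urlsplit


def split_project_url(url: str, scheme: str = "https") -> tuple[str, str]:
    parts = urlsplit(url)
    # The instance is the scheme plus host, e.g. "https://gitlab.com".
    instance = f"{scheme}://{parts.netloc}"
    # The slug is the path without surrounding slashes; nested groups are kept.
    slug = parts.path.strip("/")
    return instance, slug


instance, slug = split_project_url("https://gitlab.com/group/subgroup/project")
```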
diff --git a/gitlab2prov/adapters/repository.py b/gitlab2prov/adapters/repository.py
index 9d9a642..93dbf25 100644
--- a/gitlab2prov/adapters/repository.py
+++ b/gitlab2prov/adapters/repository.py
@@ -6,7 +6,7 @@
R = TypeVar("R")
-class AbstractRepository(abc.ABC):
+class Repository(abc.ABC):
def add(self, resource: R) -> None:
self._add(resource)
@@ -31,10 +31,8 @@ def _list_all(self, resource_type: Type[R], **filters: Any) -> list[R]:
raise NotImplementedError
-class InMemoryRepository(AbstractRepository):
- # not super efficient
- # should be fast enough for 1.0
- # snychronous get requests are the main culprit in slowing runtime
+class InMemoryRepository(Repository):
+ # TODO: speed up retrieval
def __init__(self):
super().__init__()
self.repo = defaultdict(list)
diff --git a/gitlab2prov/bootstrap.py b/gitlab2prov/bootstrap.py
index 7d4adf2..c7bea98 100644
--- a/gitlab2prov/bootstrap.py
+++ b/gitlab2prov/bootstrap.py
@@ -1,23 +1,30 @@
import inspect
import logging
-from typing import Type
from gitlab2prov.service_layer import handlers, messagebus, unit_of_work
-from gitlab2prov.adapters.fetch import GitFetcher, GitlabFetcher
+
+from gitlab2prov.adapters.git import GitFetcher
+from gitlab2prov.adapters.lab import GitlabFetcher
+from gitlab2prov.adapters.hub import GithubFetcher
+from gitlab2prov.adapters.project_url import GithubProjectUrl, GitlabProjectUrl
log = logging.getLogger(__name__)
def bootstrap(
- uow: unit_of_work.AbstractUnitOfWork = unit_of_work.InMemoryUnitOfWork(),
- git_fetcher: Type[GitFetcher] = GitFetcher,
- gitlab_fetcher: Type[GitlabFetcher] = GitlabFetcher,
+ platform: str,
+ uow: unit_of_work.UnitOfWork = unit_of_work.InMemoryUnitOfWork(),
+ git_fetcher: type[GitFetcher] = GitFetcher,
+ gitlab_fetcher: type[GitlabFetcher] = GitlabFetcher,
+ github_fetcher: type[GithubFetcher] = GithubFetcher,
+ github_url: type[GithubProjectUrl] = GithubProjectUrl,
+ gitlab_url: type[GitlabProjectUrl] = GitlabProjectUrl,
):
dependencies = {
"uow": uow,
- "git_fetcher": git_fetcher,
- "gitlab_fetcher": gitlab_fetcher,
+ "git_fetcher": git_fetcher(gitlab_url if platform == "gitlab" else github_url),
+ "githosted_fetcher": gitlab_fetcher if platform == "gitlab" else github_fetcher,
}
injected_handlers = {
command_type: [inject_dependencies(handler, dependencies) for handler in handlers]
diff --git a/gitlab2prov/config/__init__.py b/gitlab2prov/config/__init__.py
index d2f9919..87f0140 100644
--- a/gitlab2prov/config/__init__.py
+++ b/gitlab2prov/config/__init__.py
@@ -1 +1 @@
-from gitlab2prov.config.parser import ConfigParser
\ No newline at end of file
+from gitlab2prov.config.config import Config
\ No newline at end of file
diff --git a/gitlab2prov/config/config.py b/gitlab2prov/config/config.py
new file mode 100644
index 0000000..4504806
--- /dev/null
+++ b/gitlab2prov/config/config.py
@@ -0,0 +1,73 @@
+import json
+from typing import Any
+from dataclasses import dataclass, field
+
+import jsonschema
+import jsonschema.exceptions
+from ruamel.yaml import YAML
+import ruamel.yaml.constructor as constructor
+
+from gitlab2prov.root import get_package_root
+
+
+@dataclass
+class Config:
+ """A config file."""
+
+ content: str = ""
+ schema: dict[str, Any] = field(init=False)
+
+ def __post_init__(self):
+ self.schema = self.get_schema()
+
+ @classmethod
+ def read(cls, filepath: str):
+ """Read the config file from the given path."""
+ with open(filepath, "rt") as f:
+ yaml = YAML(typ="safe")
+ return cls(content=yaml.load(f.read()))
+
+ @staticmethod
+ def get_schema() -> dict[str, Any]:
+ """Get the schema from the config package."""
+ path = get_package_root() / "config" / "schema.json"
+ with open(path, "rt", encoding="utf-8") as f:
+ return json.loads(f.read())
+
+ def validate(self) -> tuple[bool, str]:
+ """Validate the config file against the schema."""
+ try:
+ jsonschema.validate(self.content, self.schema)
+ except jsonschema.exceptions.ValidationError as err:
+ return False, err.message
+ except jsonschema.exceptions.SchemaError as err:
+ return False, err.message
+ except constructor.DuplicateKeyError as err:
+ return False, err.problem
+ return True, "Everything is fine!"
+
+ def parse(self) -> list[str]:
+ """Parse the config file into a list of strings."""
+ args = []
+
+ for obj in self.content:
+ command = list(obj.keys())[0]
+ args.append(command)
+
+ options = obj.get(command)
+ if not options:
+ continue
+
+ for name, literal in options.items():
+ if isinstance(literal, bool):
+ args.extend([f"--{name}"] if literal else []) # emit the flag only when it is true
+ elif isinstance(literal, str):
+ args.append(f"--{name}")
+ args.append(literal)
+ elif isinstance(literal, list):
+ for lit in literal:
+ args.append(f"--{name}")
+ args.append(lit)
+ else:
+ raise ValueError(f"Unknown literal type: {type(literal)}")
+ return args
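Review note (illustration only, not part of the patch): `Config.parse` flattens the YAML document into an argv-style list, one command per entry followed by its options. A stand-alone sketch of that mapping follows. `to_argv` and the sample values are made up for this example, and treating a `false` boolean as "omit the flag" is an assumption of the sketch.

```python
from typing import Any


def to_argv(config: list[dict[str, Any]]) -> list[str]:
    args: list[str] = []
    for entry in config:
        # Each entry is a single-key mapping: {command: options-or-None}.
        command, options = next(iter(entry.items()))
        args.append(command)
        for name, value in (options or {}).items():
            if isinstance(value, bool):
                if value:  # assumption: false means the flag is left out
                    args.append(f"--{name}")
            elif isinstance(value, str):
                args += [f"--{name}", value]
            elif isinstance(value, list):
                # Repeat the option once per list item.
                for item in value:
                    args += [f"--{name}", item]
            else:
                raise ValueError(f"Unknown literal type: {type(value)}")
    return args


sample = [
    {"extract": {"url": "https://gitlab.com/group/project", "token": "tkn"}},
    {"combine": None},
    {"stats": {"coarse": True, "format": "csv"}},
    {"write": {"output": "out", "format": ["json", "rdf"]}},
]
argv = to_argv(sample)
```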
diff --git a/gitlab2prov/config/parser.py b/gitlab2prov/config/parser.py
deleted file mode 100644
index 073de78..0000000
--- a/gitlab2prov/config/parser.py
+++ /dev/null
@@ -1,58 +0,0 @@
-import json
-from typing import Any
-
-import jsonschema
-from ruamel.yaml import YAML
-
-from gitlab2prov.root import get_package_root
-
-
-def read_file(filepath: str) -> Any:
- with open(filepath, "rt") as f:
- yaml = YAML(typ="safe")
- return yaml.load(f.read())
-
-
-def get_schema() -> dict[str, Any]:
- path = get_package_root() / "config" / "schema.json"
- with open(path, "rt", encoding="utf-8") as f:
- return json.loads(f.read())
-
-
-class ConfigParser:
- @staticmethod
- def validate(filepath: str) -> None:
- jsonschema.validate(read_file(filepath), get_schema())
-
- def parse(self, filepath: str) -> list[str]:
- content = read_file(filepath)
- return list(self.parse_array(content))
-
- def parse_array(self, arr: list[Any]):
- for obj in arr:
- yield from self.parse_object(obj)
-
- def parse_object(self, obj: dict[str, Any]):
- cmd = list(obj.keys())[0]
- yield cmd
- yield from self.parse_options(obj[cmd])
-
- def parse_options(self, options: dict[str, bool | str | list[str]] | None):
- if not options:
- return
- for name, value in options.items():
- yield from self.parse_option(name, value)
-
- def parse_option(self, name: str, literal: bool | str | list[str]):
- match literal:
- case bool():
- yield f"--{name}"
- case str():
- yield f"--{name}"
- yield literal
- case list() as litlist:
- for lit in litlist:
- yield f"--{name}"
- yield lit
- case _:
- raise ValueError(f"Unknown literal type!")
diff --git a/gitlab2prov/config/schema.json b/gitlab2prov/config/schema.json
index ff77cb9..d208473 100644
--- a/gitlab2prov/config/schema.json
+++ b/gitlab2prov/config/schema.json
@@ -1,8 +1,30 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"type": "array",
- "items": [
- {
+ "items": {
+ "oneOf": [
+ {
+ "$ref": "#/definitions/extract"
+ },
+ {
+ "$ref": "#/definitions/read"
+ },
+ {
+ "$ref": "#/definitions/combine"
+ },
+ {
+ "$ref": "#/definitions/write"
+ },
+ {
+ "$ref": "#/definitions/stats"
+ },
+ {
+ "$ref": "#/definitions/transform"
+ }
+ ]
+ },
+ "definitions": {
+ "extract": {
"type": "object",
"properties": {
"extract": {
@@ -19,17 +41,22 @@
"type": "string"
}
},
+ "additionalProperties": false,
"required": [
"url",
"token"
]
}
- }
+ },
+ "additionalProperties": false,
+ "required": [
+ "extract"
+ ]
},
- {
+ "read": {
"type": "object",
"properties": {
- "open": {
+ "read": {
"type": "object",
"properties": {
"input": {
@@ -39,24 +66,33 @@
}
}
},
+ "additionalProperties": false,
"required": [
"input"
]
}
- }
+ },
+ "additionalProperties": false,
+ "required": [
+ "read"
+ ]
},
- {
+ "combine": {
"type": "object",
"properties": {
"combine": {
"type": "null"
}
- }
+ },
+ "additionalProperties": false,
+ "required": [
+ "combine"
+ ]
},
- {
+ "write": {
"type": "object",
"properties": {
- "save": {
+ "write": {
"type": "object",
"properties": {
"output": {
@@ -65,26 +101,30 @@
"format": {
"type": "array",
"items": {
- "type": "string"
+ "type": "string",
+ "enum": [
+ "json",
+ "rdf",
+ "provn",
+ "dot",
+ "xml"
+ ]
}
}
},
+ "additionalProperties": false,
"required": [
"output",
"format"
]
}
- }
- },
- {
- "type": "object",
- "properties": {
- "pseudonymize": {
- "type": "null"
- }
- }
+ },
+ "additionalProperties": false,
+ "required": [
+ "write"
+ ]
},
- {
+ "stats": {
"type": "object",
"properties": {
"stats": {
@@ -99,16 +139,45 @@
"coarse": {
"type": "boolean"
},
- "formatter": {
+ "format": {
"type": "string",
"enum": [
"table",
"csv"
]
}
- }
+ },
+ "additionalProperties": false
+ }
+ },
+ "additionalProperties": false,
+ "required": [
+ "stats"
+ ]
+ },
+ "transform": {
+ "type": "object",
+ "properties": {
+ "transform": {
+ "type": "object",
+ "properties": {
+ "use_pseudonyms": {
+ "type": "boolean"
+ },
+ "remove_duplicates": {
+ "type": "boolean"
+ },
+ "merge_aliased_agents": {
+ "type": "boolean"
+ }
+ },
+ "additionalProperties": false
}
- }
+ },
+ "additionalProperties": false,
+ "required": [
+ "transform"
+ ]
}
- ]
-}
+ }
+}
\ No newline at end of file
diff --git a/gitlab2prov/domain/commands.py b/gitlab2prov/domain/commands.py
index 830eadd..34594f6 100644
--- a/gitlab2prov/domain/commands.py
+++ b/gitlab2prov/domain/commands.py
@@ -1,29 +1,68 @@
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
+from prov.model import ProvDocument
@dataclass
class Command:
+ """Base class for all commands."""
pass
@dataclass
class Fetch(Command):
+ """Fetch data from cloned repository and remote projects."""
url: str
token: str
@dataclass
class Update(Fetch):
+ """Incremental update of data from cloned repository and remote projects."""
last_updated_at: datetime
@dataclass
-class Reset(Command):
- pass
+class Transform(Command):
+ """Apply transformations to the provenance document."""
+ document: ProvDocument
+ use_pseudonyms: bool = False
+ remove_duplicates: bool = False
+ merge_aliased_agents: str = ""
+
+
+@dataclass
+class Combine(Command):
+ """Combine multiple provenance documents into one."""
+ documents: list[ProvDocument]
+
+
+@dataclass
+class Statistics(Command):
+ """Calculate statistics for the provenance document."""
+ document: ProvDocument
+ resolution: str
+ format: str
@dataclass
class Serialize(Command):
- pass
+ """Retrieve/Serialize provenance document from internal data store."""
+ url: Optional[str] = None
+
+
+@dataclass
+class Write(Command):
+ """Write provenance document to file."""
+ document: ProvDocument
+ filename: Optional[str] = None
+ format: Optional[str] = None
+
+
+@dataclass
+class Read(Command):
+ """Read provenance document from file."""
+ filename: Optional[str] = None
+ content: Optional[str] = None
+ format: Optional[str] = None
\ No newline at end of file
diff --git a/gitlab2prov/domain/constants.py b/gitlab2prov/domain/constants.py
index 45779df..13cf12d 100644
--- a/gitlab2prov/domain/constants.py
+++ b/gitlab2prov/domain/constants.py
@@ -29,13 +29,14 @@ class ChangeType:
class ProvRole:
GIT_COMMIT = "GitCommit"
+ COMMIT = "Commit"
COMMITTER = "Committer"
AUTHOR = "Author"
- AUTHOR_GITLAB_COMMIT = "GitlabCommitAuthor"
- AUTHOR_ISSUE = "IssueAuthor"
- AUTHOR_MERGE_REQUEST = "MergeRequestAuthor"
- AUTHOR_RELEASE = "ReleaseAuthor"
- AUTHOR_TAG = "TagAuthor"
+ COMMIT_AUTHOR = "CommitAuthor"
+ ISSUE_AUTHOR = "IssueAuthor"
+ MERGE_REQUEST_AUTHOR = "MergeRequestAuthor"
+ RELEASE_AUTHOR = "ReleaseAuthor"
+ TAG_AUTHOR = "TagAuthor"
ANNOTATOR = "Annotator"
FILE = "File"
FILE_REVISION_TO_BE_MODIFIED = "FileRevisionToBeModified"
@@ -43,12 +44,20 @@ class ProvRole:
FILE_REVISION_AT_POINT_OF_ADDITION = "FileRevisionAtPointOfAddition"
FILE_REVISION_AT_POINT_OF_DELETION = "FileRevisionAtPointOfDeletion"
RESOURCE = "Resource"
+ FIRST_RESOURCE_VERSION = "FirstResourceVersion"
RESOURCE_VERSION_AT_POINT_OF_CREATION = "ResourceVersionAtPointOfCreation"
RESOURCE_VERSION_TO_BE_ANNOTATED = "ResourceVersionToBeAnnotated"
RESOURCE_VERSION_AFTER_ANNOTATION = "ResourceVersionAfterAnnotation"
+ PRE_ANNOTATION_VERSION = "PreAnnotationVersion"
+ POST_ANNOTATION_VERSION = "PostAnnotationVersion"
RELEASE = "Release"
TAG = "Tag"
- GitCommit = "GitCommit"
+ GITCOMMIT = "GitCommit"
+ ADDED_REVISION = "AddedRevision"
+ DELETED_REVISION = "DeletedRevision"
+ MODIFIED_REVISION = "ModifiedRevision"
+ PREVIOUS_REVISION = "PreviousRevision"
+
class ProvType:
@@ -56,10 +65,14 @@ class ProvType:
GIT_COMMIT = "GitCommit"
FILE = "File"
FILE_REVISION = "FileRevision"
- GITLAB_COMMIT = "GitlabCommit"
+ COMMIT = "Commit"
GITLAB_COMMIT_VERSION = "GitlabCommitVersion"
GITLAB_COMMIT_VERSION_ANNOTATED = "AnnotatedGitlabCommitVersion"
GITLAB_COMMIT_CREATION = "GitlabCommitCreation"
+ GITHUB_COMMIT = "GithubCommit"
+ GITHUB_COMMIT_VERSION = "GithubCommitVersion"
+ GITHUB_COMMIT_VERSION_ANNOTATED = "AnnotatedGithubCommitVersion"
+ GITHUB_COMMIT_CREATION = "GithubCommitCreation"
ISSUE = "Issue"
ISSUE_VERSION = "IssueVersion"
ISSUE_VERSION_ANNOTATED = "AnnotatedIssueVersion"
@@ -68,6 +81,11 @@ class ProvType:
MERGE_REQUEST_VERSION = "MergeRequestVersion"
MERGE_REQUEST_VERSION_ANNOTATED = "AnnotatedMergeRequestVersion"
MERGE_REQUEST_CREATION = "MergeRequestCreation"
+ PULL_REQUEST = "PullRequest"
+ PULL_REQUEST_VERSION = "PullRequestVersion"
+ PULL_REQUEST_VERSION_ANNOTAED = "AnnotatedPullRequestVersion"
+ PULL_REQUEST_CREATION = "PullRequestCreation"
+ CREATION = "Creation"
ANNOTATION = "Annotation"
TAG = "Tag"
TAG_CREATION = "TagCreation"
diff --git a/gitlab2prov/domain/objects.py b/gitlab2prov/domain/objects.py
index 657f4b5..418e884 100644
--- a/gitlab2prov/domain/objects.py
+++ b/gitlab2prov/domain/objects.py
@@ -1,351 +1,539 @@
from __future__ import annotations
from dataclasses import dataclass
-from dataclasses import Field
from dataclasses import field
-from dataclasses import fields
from datetime import datetime
-from itertools import cycle
from typing import Any
-from urllib.parse import urlencode
+from prov.model import (
+ PROV_LABEL,
+ PROV_ROLE,
+ ProvDocument,
+ ProvAgent,
+ ProvActivity,
+ ProvEntity,
+ PROV_TYPE,
+ PROV_ATTR_STARTTIME,
+ PROV_ATTR_ENDTIME,
+)
from prov.identifier import QualifiedName
-from prov.model import PROV_LABEL
-from gitlab2prov.domain.constants import PROV_FIELD_MAP
-from gitlab2prov.domain.constants import ProvRole
from gitlab2prov.domain.constants import ProvType
from gitlab2prov.prov.operations import qualified_name
-# metadata for dataclass attributes that relate objects with one another
-# such attributes will not be included in the list of prov attributes of a dataclass
-IS_RELATION = {"IS_RELATION": True}
+PLACEHOLDER = ProvDocument()
+PLACEHOLDER.set_default_namespace("http://github.com/dlr-sc/gitlab2prov/")
-def is_relation(field: Field):
- return field.metadata == IS_RELATION
+@dataclass
+class User:
+ # TODO: github_email, gitlab_email
+ name: str
+ email: str
+ gitlab_username: str | None = None
+ github_username: str | None = None
+ gitlab_id: str | None = None
+ github_id: str | None = None
+ prov_role: str | None = None
+ def __post_init__(self):
+ self.email = self.email.lower() if self.email else None
-class ProvMixin:
@property
- def prov_identifier(self) -> QualifiedName:
- attrs = urlencode(dict(self._traverse_repr_fields()))
- label = f"{self._prov_type()}?{attrs}"
- return qualified_name(label)
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"User?{self.name=}&{self.email=}")
+
+ def to_prov_element(self) -> ProvAgent:
+ attributes = [
+ ("name", self.name),
+ ("email", self.email),
+ (PROV_ROLE, self.prov_role),
+ (PROV_TYPE, ProvType.USER),
+ ]
+ if self.gitlab_username:
+ attributes.append(("gitlab_username", self.gitlab_username))
+ if self.github_username:
+ attributes.append(("github_username", self.github_username))
+ if self.gitlab_id:
+ attributes.append(("gitlab_id", self.gitlab_id))
+ if self.github_id:
+ attributes.append(("github_id", self.github_id))
+ return ProvAgent(PLACEHOLDER, self.identifier, attributes)
+
+
+@dataclass
+class File:
+ name: str
+ path: str
+ commit: str
@property
- def prov_label(self) -> QualifiedName:
- attrs = urlencode(dict(self._traverse_repr_fields()))
- label = f"{self._prov_type()}?{attrs}"
- return qualified_name(label)
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"File?{self.name=}&{self.path=}&{self.commit=}")
+
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [("name", self.name), ("path", self.path), (PROV_TYPE, ProvType.FILE)]
+ return ProvEntity(
+ PLACEHOLDER,
+ self.identifier,
+ attributes,
+ )
+
+
+@dataclass
+class FileRevision(File):
+ status: str
+ insertions: int
+ deletions: int
+ lines: int
+ score: float
+ file: File | None = None
+ previous: FileRevision | None = None
@property
- def prov_attributes(self) -> list[tuple[str, str | int | datetime | None]]:
- return list(self._traverse_attributes())
-
- def _prov_type(self) -> str:
- match self.prov_type:
- case list():
- return self.prov_type[0]
- case _:
- return self.prov_type
-
- def _traverse_repr_fields(self):
- for f in fields(self):
- if f.repr:
- yield f.name, getattr(self, f.name)
-
- def _traverse_attributes(self):
- for f in fields(self):
- if not is_relation(f):
- yield from self._expand_attribute(f.name, getattr(self, f.name))
- yield (PROV_LABEL, self.prov_label)
-
- def _expand_attribute(self, key, val):
- key = PROV_FIELD_MAP.get(key, key)
- match val:
- case list():
- yield from zip(cycle([key]), val)
- case dict():
- yield from val.items()
- case _:
- yield key, val
+ def identifier(self) -> QualifiedName:
+ return qualified_name(
+ f"FileRevision?{self.name=}&{self.path=}&{self.commit=}&{self.status=}"
+ )
+
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [
+ ("name", self.name),
+ ("path", self.path),
+ ("status", self.status),
+ ("insertions", self.insertions),
+ ("deletions", self.deletions),
+ ("lines", self.lines),
+ ("score", self.score),
+ (PROV_TYPE, ProvType.FILE_REVISION),
+ ]
+ return ProvEntity(
+ PLACEHOLDER,
+ self.identifier,
+ attributes,
+ )
@dataclass
-class AgentMixin:
- def __iter__(self):
- yield self.prov_identifier
- yield self.prov_attributes
+class Annotation:
+ id: str
+ name: str
+ body: str
+ start: datetime
+ end: datetime
+ annotator: User
+ captured_kwargs: dict[str, Any] = field(default_factory=dict)
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Annotation?{self.id=}&{self.name=}")
+
+ def to_prov_element(self) -> ProvActivity:
+ attributes = [
+ ("id", self.id),
+ ("name", self.name),
+ ("body", self.body),
+ (PROV_ATTR_STARTTIME, self.start),
+ (PROV_ATTR_ENDTIME, self.end),
+ (PROV_TYPE, ProvType.ANNOTATION),
+ *(("captured_" + k, v) for k, v in self.captured_kwargs.items()),
+ ]
+ return ProvActivity(PLACEHOLDER, self.identifier, attributes)
@dataclass
-class EntityMixin:
- def __iter__(self):
- yield self.prov_identifier
- yield self.prov_attributes
+class Version:
+ id: str
+ resource: str # ProvType
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"{self.resource}Version?{self.id=}")
+ @classmethod
+ def from_commit(cls, commit: Commit):
+ return cls(id=commit.sha, resource=ProvType.COMMIT)
-@dataclass(kw_only=True)
-class ActivityMixin:
- def __iter__(self):
- yield self.prov_identifier
- yield self.prov_start
- yield self.prov_end
- yield self.prov_attributes
+ @classmethod
+ def from_issue(cls, issue: Issue):
+ return cls(id=issue.id, resource=ProvType.ISSUE)
+ @classmethod
+ def from_merge_request(cls, merge_request: MergeRequest):
+ return cls(id=merge_request.id, resource=ProvType.MERGE_REQUEST)
-@dataclass(unsafe_hash=True, kw_only=True)
-class User(ProvMixin, AgentMixin):
- name: str
- email: str | None = field(default=None)
- gitlab_username: str | None = field(repr=False, default=None)
- gitlab_id: str | None = field(repr=False, default=None)
- prov_role: ProvRole = field(repr=False, default=None)
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.USER)
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [("id", self.id), (PROV_TYPE, f"{self.resource}Version")]
+ return ProvEntity(PLACEHOLDER, self.identifier, attributes)
- def __post_init__(self):
- self.email = self.email.lower() if self.email else None
+@dataclass
+class AnnotatedVersion:
+ id: str
+ annotation: str # Annotation.id
+ resource: str # ProvType
+ start: datetime
-@dataclass(unsafe_hash=True, kw_only=True)
-class File(ProvMixin, EntityMixin):
- path: str
- committed_in: str
- prov_type: str = field(init=False, repr=False, default=ProvType.FILE)
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Annotated{self.resource}Version?{self.id=}&{self.annotation=}")
+
+ @classmethod
+ def from_commit(cls, commit: Commit, annotation: Annotation):
+ return cls(
+ id=commit.sha,
+ annotation=annotation.id,
+ resource=ProvType.COMMIT,
+ start=annotation.start,
+ )
+ @classmethod
+ def from_issue(cls, issue: Issue, annotation: Annotation):
+ return cls(
+ id=issue.id, annotation=annotation.id, resource=ProvType.ISSUE, start=annotation.start
+ )
-@dataclass(unsafe_hash=True, kw_only=True)
-class FileRevision(ProvMixin, EntityMixin):
- path: str
- committed_in: str
- change_type: str
- original: File = field(repr=False, metadata=IS_RELATION)
- previous: FileRevision | None = field(repr=False, default=None, metadata=IS_RELATION)
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.FILE_REVISION)
+ @classmethod
+ def from_merge_request(cls, merge_request: MergeRequest, annotation: Annotation):
+ return cls(
+ id=merge_request.id,
+ annotation=annotation.id,
+ resource=ProvType.MERGE_REQUEST,
+ start=annotation.start,
+ )
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [("id", self.id), (PROV_TYPE, f"Annotated{self.resource}Version")]
+ return ProvEntity(
+ PLACEHOLDER,
+ self.identifier,
+ attributes,
+ )
-@dataclass(unsafe_hash=True, kw_only=True)
-class Annotation(ProvMixin, ActivityMixin):
+
+@dataclass
+class Creation:
id: str
- type: str
- body: str = field(repr=False)
- kwargs: dict[str, Any] = field(repr=False, default_factory=dict)
- annotator: User = field(repr=False, metadata=IS_RELATION)
- prov_start: datetime = field(repr=False)
- prov_end: datetime = field(repr=False)
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.ANNOTATION)
-
-
-@dataclass(unsafe_hash=True, kw_only=True)
-class Version(ProvMixin, EntityMixin):
- version_id: str
- prov_type: ProvType = field(repr=False)
-
-
-@dataclass(unsafe_hash=True, kw_only=True)
-class AnnotatedVersion(ProvMixin, EntityMixin):
- version_id: str
- annotation_id: str
- prov_type: ProvType = field(repr=False)
-
-
-@dataclass(unsafe_hash=True, kw_only=True)
-class Creation(ProvMixin, ActivityMixin):
- creation_id: str
- prov_start: datetime = field(repr=False)
- prov_end: datetime = field(repr=False)
- prov_type: ProvType = field(repr=False)
-
-
-@dataclass(unsafe_hash=True, kw_only=True)
-class GitCommit(ProvMixin, ActivityMixin):
- hexsha: str
- message: str = field(repr=False)
- title: str = field(repr=False)
- author: User = field(repr=False, metadata=IS_RELATION)
- committer: User = field(repr=False, metadata=IS_RELATION)
- parents: list[str] = field(repr=False, metadata=IS_RELATION)
- prov_start: datetime = field(repr=False)
- prov_end: datetime = field(repr=False)
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.GIT_COMMIT)
-
-
-@dataclass(unsafe_hash=True, kw_only=True)
-class Issue(ProvMixin, EntityMixin):
+ resource: str
+ start: datetime
+ end: datetime
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Creation?{self.id=}&{self.resource=}")
+
+ @classmethod
+ def from_tag(cls, tag: GitTag):
+ return cls(id=tag.name, resource=ProvType.TAG, start=tag.created_at, end=tag.created_at)
+
+ @classmethod
+ def from_commit(cls, commit: Commit):
+ return cls(
+ id=commit.sha,
+ resource=ProvType.COMMIT,
+ start=commit.authored_at,
+ end=commit.committed_at,
+ )
+
+ @classmethod
+ def from_issue(cls, issue: Issue):
+ return cls(
+ id=issue.id, resource=ProvType.ISSUE, start=issue.created_at, end=issue.closed_at
+ )
+
+ @classmethod
+ def from_merge_request(cls, merge_request: MergeRequest):
+ return cls(
+ id=merge_request.id,
+ resource=ProvType.MERGE_REQUEST,
+ start=merge_request.created_at,
+ end=merge_request.closed_at,
+ )
+
+ def to_prov_element(self) -> ProvActivity:
+ attributes = [
+ ("id", self.id),
+ (PROV_ATTR_STARTTIME, self.start),
+ (PROV_ATTR_ENDTIME, self.end),
+ (PROV_TYPE, ProvType.CREATION),
+ ]
+ return ProvActivity(PLACEHOLDER, self.identifier, attributes)
+
+
+@dataclass
+class GitCommit:
+ sha: str # commit sha
+ title: str # commit title
+ message: str # commit message
+ author: User # author: User
+ committer: User # committer: User
+ deletions: int # number of lines deleted
+ insertions: int # number of lines inserted
+ lines: int # number of lines changed
+ files_changed: int # number of files changed
+ parents: list[str] # list of parent commit shas
+ authored_at: datetime # authored date
+ committed_at: datetime # committed date
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"GitCommit?{self.sha=}")
+
+ def to_prov_element(self) -> ProvActivity:
+ attributes = [
+ ("sha", self.sha),
+ ("title", self.title),
+ ("message", self.message),
+ ("deletions", self.deletions),
+ ("insertions", self.insertions),
+ ("lines", self.lines),
+ ("files_changed", self.files_changed),
+ ("authored_at", self.authored_at),
+ ("committed_at", self.committed_at),
+ (PROV_ATTR_STARTTIME, self.authored_at),
+ (PROV_ATTR_ENDTIME, self.committed_at),
+ (PROV_TYPE, ProvType.GIT_COMMIT),
+ ]
+ return ProvActivity(PLACEHOLDER, self.identifier, attributes)
+
+
+@dataclass
+class Issue:
id: str
iid: str
+ platform: str
title: str
- description: str = field(repr=False)
- url: str = field(repr=False)
- author: User = field(repr=False, metadata=IS_RELATION)
- annotations: list[Annotation] = field(repr=False, metadata=IS_RELATION)
+ body: str
+ url: str
+ author: User
+ annotations: list[Annotation]
created_at: datetime = field(repr=False)
closed_at: datetime | None = field(repr=False, default=None)
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.ISSUE)
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Issue?{self.id=}")
@property
def creation(self) -> Creation:
- return Creation(
- creation_id=self.id,
- prov_start=self.created_at,
- prov_end=self.closed_at,
- prov_type=ProvType.ISSUE_CREATION,
- )
+ return Creation.from_issue(self)
@property
def first_version(self) -> Version:
- return Version(version_id=self.id, prov_type=ProvType.ISSUE_VERSION)
+ return Version.from_issue(self)
@property
def annotated_versions(self) -> list[AnnotatedVersion]:
- return [
- AnnotatedVersion(
- version_id=self.id,
- annotation_id=annotation.id,
- prov_type=ProvType.ISSUE_VERSION_ANNOTATED,
- )
- for annotation in self.annotations
+ return [AnnotatedVersion.from_issue(self, annotation) for annotation in self.annotations]
+
+ def to_prov_element(self) -> ProvActivity:
+ attributes = [
+ ("id", self.id),
+ ("iid", self.iid),
+ ("title", self.title),
+ ("body", self.body),
+ ("platform", self.platform),
+ ("url", self.url),
+ (PROV_ATTR_STARTTIME, self.created_at),
+ (PROV_ATTR_ENDTIME, self.closed_at),
+ (PROV_TYPE, ProvType.ISSUE),
]
+ return ProvActivity(PLACEHOLDER, self.identifier, attributes)
+
+@dataclass
+class Commit:
+ sha: str
+ url: str
+ author: User
+ platform: str
+ annotations: list[Annotation]
+ authored_at: datetime
+ committed_at: datetime
-@dataclass(unsafe_hash=True, kw_only=True)
-class GitlabCommit(ProvMixin, EntityMixin):
- hexsha: str
- url: str = field(repr=False)
- author: User = field(repr=False, metadata=IS_RELATION)
- annotations: list[Annotation] = field(repr=False, metadata=IS_RELATION)
- authored_at: datetime = field(repr=False)
- committed_at: datetime = field(repr=False)
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.GITLAB_COMMIT)
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Commit?{self.sha=}")
@property
def creation(self) -> Creation:
- return Creation(
- creation_id=self.hexsha,
- prov_start=self.authored_at,
- prov_end=self.committed_at,
- prov_type=ProvType.GITLAB_COMMIT_CREATION,
- )
+ return Creation.from_commit(self)
@property
def first_version(self) -> Version:
- return Version(version_id=self.hexsha, prov_type=ProvType.GITLAB_COMMIT_VERSION)
+ return Version.from_commit(self)
@property
def annotated_versions(self) -> list[AnnotatedVersion]:
- return [
- AnnotatedVersion(
- version_id=self.hexsha,
- annotation_id=annotation.id,
- prov_type=ProvType.GITLAB_COMMIT_VERSION_ANNOTATED,
- )
- for annotation in self.annotations
+ return [AnnotatedVersion.from_commit(self, annotation) for annotation in self.annotations]
+
+ def to_prov_element(self) -> ProvActivity:
+ attributes = [
+ ("sha", self.sha),
+ ("url", self.url),
+ ("platform", self.platform),
+ (PROV_ATTR_STARTTIME, self.authored_at),
+ (PROV_ATTR_ENDTIME, self.committed_at),
+ (PROV_TYPE, ProvType.COMMIT),
]
+ return ProvActivity(PLACEHOLDER, self.identifier, attributes)
-@dataclass(unsafe_hash=True, kw_only=True)
-class MergeRequest(ProvMixin, EntityMixin):
+@dataclass
+class MergeRequest:
id: str
iid: str
title: str
- description: str = field(repr=False)
- url: str = field(repr=False)
- source_branch: str = field(repr=False)
- target_branch: str = field(repr=False)
- author: User = field(repr=False, metadata=IS_RELATION)
- annotations: list[Annotation] = field(repr=False, metadata=IS_RELATION)
- created_at: datetime = field(repr=False)
- closed_at: datetime | None = field(repr=False, default=None)
- merged_at: datetime | None = field(repr=False, default=None)
- first_deployed_to_production_at: datetime | None = field(repr=False, default=None)
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.MERGE_REQUEST)
+ body: str
+ url: str
+ platform: str
+ source_branch: str # base for github
+ target_branch: str # head for github
+ author: User
+ annotations: list[Annotation]
+ created_at: datetime
+ closed_at: datetime | None = None
+ merged_at: datetime | None = None
+ first_deployed_to_production_at: datetime | None = None
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"MergeRequest?{self.id=}")
@property
def creation(self) -> Creation:
- return Creation(
- creation_id=self.id,
- prov_start=self.created_at,
- prov_end=self.closed_at,
- prov_type=ProvType.MERGE_REQUEST_CREATION,
- )
+ return Creation.from_merge_request(self)
@property
def first_version(self) -> Version:
- return Version(version_id=self.id, prov_type=ProvType.MERGE_REQUEST_VERSION)
+ return Version.from_merge_request(self)
@property
def annotated_versions(self) -> list[AnnotatedVersion]:
return [
- AnnotatedVersion(
- version_id=self.id,
- annotation_id=annotation.id,
- prov_type=ProvType.MERGE_REQUEST_VERSION_ANNOTATED,
- )
+ AnnotatedVersion.from_merge_request(self, annotation)
for annotation in self.annotations
]
+ def to_prov_element(self) -> ProvActivity:
+ attributes = [
+ ("id", self.id),
+ ("iid", self.iid),
+ ("title", self.title),
+ ("body", self.body),
+ ("url", self.url),
+ ("platform", self.platform),
+ ("source_branch", self.source_branch),
+ ("target_branch", self.target_branch),
+ ("created_at", self.created_at),
+ ("closed_at", self.closed_at),
+ ("merged_at", self.merged_at),
+ ("first_deployed_to_production_at", self.first_deployed_to_production_at),
+ (PROV_ATTR_STARTTIME, self.created_at),
+ (PROV_ATTR_ENDTIME, self.closed_at),
+ (PROV_TYPE, ProvType.MERGE_REQUEST),
+ ]
+ return ProvActivity(PLACEHOLDER, self.identifier, attributes)
-@dataclass(unsafe_hash=True, kw_only=True)
-class Tag(ProvMixin, EntityMixin):
+
+@dataclass
+class GitTag:
name: str
- hexsha: str
- message: str | None = field(repr=False)
- author: User = field(repr=False, metadata=IS_RELATION)
- created_at: datetime = field(repr=False)
- prov_type: list[ProvType] = field(
- init=False,
- repr=False,
- default_factory=lambda: [ProvType.TAG, ProvType.COLLECTION],
- )
+ sha: str
+ message: str | None
+ author: User
+ created_at: datetime
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"GitTag?{self.name=}")
@property
def creation(self) -> Creation:
- return Creation(
- creation_id=self.name,
- prov_start=self.created_at,
- prov_end=self.created_at,
- prov_type=ProvType.TAG_CREATION,
- )
+ return Creation.from_tag(self)
+
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [
+ ("name", self.name),
+ ("sha", self.sha),
+ ("message", self.message),
+ ("created_at", self.created_at),
+ (PROV_ATTR_STARTTIME, self.created_at),
+ (PROV_ATTR_ENDTIME, self.created_at),
+ (PROV_TYPE, ProvType.TAG),
+ (PROV_TYPE, ProvType.COLLECTION),
+ ]
+ return ProvEntity(PLACEHOLDER, self.identifier, attributes)
-@dataclass(unsafe_hash=True, kw_only=True)
-class Asset(ProvMixin, EntityMixin):
+@dataclass
+class Asset:
url: str
format: str
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.ASSET)
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Asset?{self.url=}")
+
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [
+ ("url", self.url),
+ ("format", self.format),
+ (PROV_TYPE, ProvType.ASSET),
+ ]
+ return ProvEntity(PLACEHOLDER, self.identifier, attributes)
-@dataclass(unsafe_hash=True, kw_only=True)
-class Evidence(ProvMixin, EntityMixin):
- hexsha: str
+@dataclass
+class Evidence:
+ sha: str
url: str
collected_at: datetime
- prov_type: ProvType = field(init=False, repr=False, default=ProvType.EVIDENCE)
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Evidence?{self.sha=}")
+
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [
+ ("sha", self.sha),
+ ("url", self.url),
+ ("collected_at", self.collected_at),
+ (PROV_TYPE, ProvType.EVIDENCE),
+ ]
+ return ProvEntity(PLACEHOLDER, self.identifier, attributes)
-@dataclass(unsafe_hash=True, kw_only=True)
-class Release(ProvMixin, EntityMixin):
+@dataclass
+class Release:
name: str
- description: str = field(repr=False)
- tag_name: str = field(repr=False)
- author: User | None = field(repr=False, metadata=IS_RELATION)
- assets: list[Asset] = field(repr=False, metadata=IS_RELATION)
- evidences: list[Evidence] = field(repr=False, metadata=IS_RELATION)
- created_at: datetime = field(repr=False)
- released_at: datetime = field(repr=False)
- prov_type: list[ProvType] = field(
- init=False,
- repr=False,
- default_factory=lambda: [ProvType.RELEASE, ProvType.COLLECTION],
- )
+ body: str
+ tag_name: str
+ platform: str
+ author: User | None
+ assets: list[Asset]
+ evidences: list[Evidence]
+ created_at: datetime
+ released_at: datetime
+
+ @property
+ def identifier(self) -> QualifiedName:
+ return qualified_name(f"Release?{self.name=}")
@property
def creation(self) -> Creation:
- return Creation(
- creation_id=self.name,
- prov_start=self.created_at,
- prov_end=self.released_at,
- prov_type=ProvType.RELEASE_CREATION,
- )
+ return Creation(id=self.name, resource=ProvType.RELEASE, start=self.created_at, end=self.released_at)
+
+ def to_prov_element(self) -> ProvEntity:
+ attributes = [
+ ("name", self.name),
+ ("body", self.body),
+ ("tag_name", self.tag_name),
+ ("platform", self.platform),
+ ("created_at", self.created_at),
+ ("released_at", self.released_at),
+ (PROV_TYPE, ProvType.RELEASE),
+ (PROV_TYPE, ProvType.COLLECTION),
+ ]
+ return ProvEntity(PLACEHOLDER, self.identifier, attributes)
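Note on the identifiers introduced in `objects.py`: they are built with f-strings that use the `=` specifier, so the expression text itself (for example `self.name=`) becomes part of the identifier. The following is a minimal, stdlib-only sketch of that behaviour, not the project's implementation. It assumes a hypothetical stand-in for `qualified_name` that returns a plain string (the real helper in `gitlab2prov.prov.operations` is not shown in this diff), and `attributes()` is an illustrative method name that mirrors only the list-building part of `to_prov_element`.

```python
from __future__ import annotations

from dataclasses import dataclass


def qualified_name(localpart: str) -> str:
    """Hypothetical stand-in: the real helper returns a prov QualifiedName."""
    return localpart


@dataclass
class User:
    name: str
    email: str | None
    gitlab_username: str | None = None
    github_username: str | None = None

    def __post_init__(self):
        # Same normalisation as in the diff: lower-case the email if present.
        self.email = self.email.lower() if self.email else None

    @property
    def identifier(self) -> str:
        # The "=" specifier (Python 3.8+) expands to "<expression>=<repr(value)>".
        return qualified_name(f"User?{self.name=}&{self.email=}")

    def attributes(self) -> list[tuple[str, str | None]]:
        # Illustrative helper (not in the diff): builds the attribute list only.
        attributes = [("name", self.name), ("email", self.email)]
        # Platform-specific attributes are appended only when they are set.
        if self.gitlab_username:
            attributes.append(("gitlab_username", self.gitlab_username))
        if self.github_username:
            attributes.append(("github_username", self.github_username))
        return attributes


user = User(name="Ada", email="Ada@Example.org", github_username="ada")
print(user.identifier)    # User?self.name='Ada'&self.email='ada@example.org'
print(user.attributes())
```

Because the identifier is derived from `name` and the lower-cased `email` only, two `User` objects that differ just in their platform usernames would share one identifier under this scheme.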
diff --git a/gitlab2prov/entrypoints/cli.py b/gitlab2prov/entrypoints/cli.py
index 050a31b..93dd87e 100644
--- a/gitlab2prov/entrypoints/cli.py
+++ b/gitlab2prov/entrypoints/cli.py
@@ -1,13 +1,15 @@
from functools import partial
from functools import update_wrapper
from functools import wraps
+from typing import Iterator
+from prov.model import ProvDocument
import click
import git
from gitlab2prov import __version__
from gitlab2prov import bootstrap
-from gitlab2prov.config import ConfigParser
+from gitlab2prov.config import Config
from gitlab2prov.domain import commands
from gitlab2prov.log import create_logger
from gitlab2prov.prov import operations
@@ -22,36 +24,60 @@ def is_git_available():
return False
-def enable_logging(ctx: click.Context, _, enable: bool):
+def is_git_available():
+ """Check whether git is installed using the GitPython package."""
+ try:
+ git.Git().execute(["git", "--version"])
+ return True
+ except git.exc.GitCommandNotFound:
+ return False
+
+
+def enable_logging(ctx: click.Context, param: str, enable: bool):
"""Callback that optionally enables logging."""
if enable:
create_logger()
-def invoke_from_config(ctx: click.Context, _, filepath: str):
+def load_and_validate_config(ctx: click.Context, filepath: str) -> Config:
+ """Load configuration from file and validate it. Returns the config if successful, otherwise fails the context."""
+ if not filepath:
+ return None
+ config = Config.read(filepath)
+ is_valid, error_message = config.validate()
+ if not is_valid:
+ ctx.fail(f"Validation failed: {error_message}")
+ return config
+
+
+def execute_command_from_config(ctx: click.Context, param: str, filepath: str):
"""Callback that executes a gitlab2prov run from a config file."""
- if filepath:
- args = ConfigParser().parse(filepath)
- context = cli.make_context(f"{cli}", args=args, parent=ctx)
- cli.invoke(context)
- ctx.exit()
+ config = load_and_validate_config(ctx, filepath)
+ if not config:
+ return
+
+ context = ctx.command.make_context(ctx.command.name, args=config.parse(), parent=ctx)
+ ctx.command.invoke(context)
+ ctx.exit()
-def validate_config(ctx: click.Context, _, filepath: str):
+def validate_config(ctx: click.Context, param: str, filepath: str):
"""Callback that validates config file using gitlab2prov/config/schema.json."""
- if filepath:
- try:
- ConfigParser().validate(filepath)
- print(ConfigParser().parse(filepath))
- except Exception as err:
- ctx.fail(f"validation failed: {err}")
- click.echo(f"-- OK --")
- ctx.exit()
+ config = load_and_validate_config(ctx, filepath)
+ if not config:
+ return
+
+ click.echo("Validation successful, the following command would be executed:\n")
+ click.echo(f"gitlab2prov {' '.join(config.parse())}")
+ ctx.exit()
def processor(func, wrapped=None):
- """Helper decorator to rewrite a function so that it returns another
- function from it.
+ """Decorator that turns a function into a processor.
+
+ A processor is a function that takes a stream of values, applies an operation
+ to each value and returns a new stream of values.
+ A processor therefore transforms a stream of values into a new stream of values.
"""
@wraps(wrapped or func)
@@ -65,8 +91,12 @@ def processor(stream):
def generator(func):
- """Similar to the :func:`processor` but passes through old values
- unchanged and does not pass through the values as parameter."""
+ """Decorator that turns a function into a generator.
+
+ A generator is a special case of a processor.
+ A generator is a processor that doesn't apply any operation
+ to the values but adds new values to the stream.
+ """
@partial(processor, wrapped=func)
def new_func(stream, *args, **kwargs):
@@ -81,7 +111,6 @@ def new_func(stream, *args, **kwargs):
@click.option(
"--verbose",
is_flag=True,
- is_eager=True,
default=False,
expose_value=False,
callback=enable_logging,
@@ -91,33 +120,64 @@ def new_func(stream, *args, **kwargs):
"--config",
type=click.Path(exists=True, dir_okay=False),
expose_value=False,
- callback=invoke_from_config,
+ callback=execute_command_from_config,
help="Read config from file.",
)
@click.option(
"--validate",
- is_eager=True,
type=click.Path(exists=True, dir_okay=False),
expose_value=False,
callback=validate_config,
help="Validate config file and exit.",
)
@click.pass_context
-def cli(ctx):
+def gitlab2prov(ctx):
"""
Extract provenance information from GitLab projects.
"""
if not is_git_available():
ctx.fail("Could not find git. Please install git.")
- ctx.obj = bootstrap.bootstrap()
+ ctx.obj = bootstrap.bootstrap("gitlab")
-@cli.result_callback()
+@click.group(chain=True, invoke_without_command=False)
+@click.version_option(version=__version__, prog_name="github2prov")
+@click.option(
+ "--verbose",
+ is_flag=True,
+ default=False,
+ expose_value=False,
+ callback=enable_logging,
+ help="Enable logging to 'github2prov.log'.",
+)
+@click.option(
+ "--config",
+ type=click.Path(exists=True, dir_okay=False),
+ expose_value=False,
+ callback=execute_command_from_config,
+ help="Read config from file.",
+)
+@click.option(
+ "--validate",
+ type=click.Path(exists=True, dir_okay=False),
+ expose_value=False,
+ callback=validate_config,
+ help="Validate config file and exit.",
+)
+@click.pass_context
+def github2prov(ctx):
+ ctx.obj = bootstrap.bootstrap("github")
+
+
+@github2prov.result_callback()
+@gitlab2prov.result_callback()
def process_commands(processors, **kwargs):
- """This result callback is invoked with an iterable of all the chained
- subcommands. As each subcommand returns a function
- we can chain them together to feed one into the other, similar to how
- a pipe on unix works.
+ """Execute the chain of commands.
+
+ This function is called after all subcommands have been chained together.
+ It executes the chain of commands by piping the output of one command
+ into the input of the next command. Subcommands can be processors that transform
+ the stream of values or generators that add new values to the stream.
"""
# Start with an empty iterable.
stream = ()
@@ -131,61 +191,65 @@ def process_commands(processors, **kwargs):
pass
-@cli.command("extract")
+@click.command()
@click.option(
"-u", "--url", "urls", multiple=True, type=str, required=True, help="Project url[s]."
)
@click.option("-t", "--token", required=True, type=str, help="Gitlab API token.")
@click.pass_obj
@generator
-def do_extract(bus, urls: list[str], token: str):
+def extract(bus, urls: list[str], token: str):
"""Extract provenance information for one or more gitlab projects.
This command extracts provenance information from one or multiple gitlab projects.
The extracted provenance is returned as a combined provenance graph.
"""
- for url in urls:
- bus.handle(commands.Fetch(url, token))
+ document = None
- graph = bus.handle(commands.Serialize())
- graph.description = f"graph extracted from '{', '.join(urls)}'"
- yield graph
+ for url in urls:
+ doc = bus.handle(commands.Fetch(url, token))
+ doc = bus.handle(commands.Serialize(url))
+ doc = bus.handle(commands.Transform(doc))
+ if not document:
+ document = doc
+ document.update(doc)
- bus.handle(commands.Reset())
+ document.description = f"extracted from '{', '.join(urls)}'"
+ yield document
-@cli.command("load", short_help="Load provenance files.")
+@click.command()
@click.option(
"-i",
"--input",
+ "filenames",
+ default=["-"],
multiple=True,
- type=click.Path(exists=True, dir_okay=False),
+ type=click.Path(dir_okay=False),
help="Provenance file path (specify '-' to read from <stdin>).",
)
+@click.pass_obj
@generator
-def load(input):
- """Load provenance information from a file.
+def read(bus, filenames: list[str]):
+ """Read provenance information from file[s].
- This command reads one provenance graph from a file or multiple graphs from multiple files.
+ This command reads one provenance graph from a file/stdin or
+ multiple graphs from multiple files.
"""
- for filepath in input:
+ for filename in filenames:
try:
- if filepath == "-":
- graph = operations.deserialize_graph()
- graph.description = f"''"
- yield graph
- else:
- graph = operations.deserialize_graph(filepath)
- graph.description = f"'{filepath}'"
- yield graph
+ document = bus.handle(commands.Read(filename=filename))
+ document.description = "'<stdin>'" if filename == "-" else f"'{filename}'"
+ yield document
except Exception as e:
- click.echo(f"Could not open '{filepath}': {e}", err=True)
+ click.echo(f"Could not open '{filename}': {e}", err=True)
-@cli.command("save")
+@click.command()
@click.option(
"-f",
"--format",
+ "formats",
multiple=True,
default=["json"],
type=click.Choice(operations.SERIALIZATION_FORMATS),
@@ -194,125 +258,138 @@ def load(input):
@click.option(
"-o",
"--output",
- default="gitlab2prov-graph-{:04}",
+ "destination",
+ default="-",
help="Output file path.",
)
@processor
-def save(graphs, format, output):
- """Save provenance information to a file.
+@click.pass_obj
+def write(bus, documents, formats, destination):
+ """Write provenance information to file[s].
+
+ This command saves one or multiple provenance documents to a file.
- This command writes each provenance graph that is piped to it to a file.
+ The output file path can be specified using the '-o' option.
+ The serialization format can be specified using the '-f' option.
"""
- for idx, graph in enumerate(graphs, start=1):
- for fmt in format:
+ documents = list(documents)
+
+ for i, document in enumerate(documents, start=1):
+ for fmt in formats:
+ filename = f"{destination}{'-' + str(i) if len(documents) > 1 else ''}.{fmt}"
try:
- serialized = operations.serialize_graph(graph, fmt)
- if output == "-":
- click.echo(serialized)
- else:
- with open(f"{output.format(idx)}.{fmt}", "w") as out:
- click.echo(serialized, file=out)
- except Exception as e:
- click.echo(f"Could not save {graph.description}: {e}", err=True)
- yield graph
-
-
-@cli.command("pseudonymize")
-@processor
-def pseudonymize(graphs):
- """Pseudonymize a provenance graph.
+ bus.handle(commands.Write(document, filename, fmt))
+ except Exception as exc:
+ click.echo(f"Could not save {document.description}: {exc}", err=True)
- This command pseudonymizes each provenance graph that is piped to it.
+ yield document
+
+
+@click.command()
+@click.option("--use-pseudonyms", is_flag=True, help="Use pseudonyms.")
+@click.option("--remove-duplicates", is_flag=True, help="Remove duplicate statements.")
+@click.option(
+ "--merge-aliased-agents",
+ type=click.Path(exists=True),
+ default="",
+ help="Merge aliased agents.",
+)
+@processor
+@click.pass_obj
+def transform(
+ bus,
+ documents: Iterator[ProvDocument],
+ use_pseudonyms: bool = False,
+ remove_duplicates: bool = False,
+ merge_aliased_agents: str = "",
+):
+ """Apply a set of transformations to provenance documents.
+
+ This command applies a set of transformations to one or multiple provenance documents.
"""
- for graph in graphs:
- try:
- pseud = operations.pseudonymize(graph)
- pseud.description = f"pseudonymized {graph.description}"
- yield pseud
- except Exception as e:
- click.echo(f"Could not pseudonymize {graph.description}: {e}", err=True)
+ for document in documents:
+ transformed = bus.handle(
+ commands.Transform(document, use_pseudonyms, remove_duplicates, merge_aliased_agents)
+ )
+ transformed.description = f"normalized {document.description}"
+ yield transformed
-@cli.command("combine")
+@click.command()
@processor
-def combine(graphs):
- """Combine multiple graphs into one.
+@click.pass_obj
+def combine(bus, documents: Iterator[ProvDocument]):
+ """Combine one or more provenance documents.
- This command combines all graphs that are piped to it into one.
+ This command combines one or multiple provenance documents into a single document.
"""
- graphs = list(graphs)
+ documents = list(documents)
+ descriptions = [doc.description for doc in documents]
+
try:
- combined = operations.combine(iter(graphs))
- descriptions = ", ".join(graph.description for graph in graphs)
- combined.description = f"combination of {descriptions}"
- yield combined
- except Exception as e:
- descriptions = "with ".join(graph.description for graph in graphs)
- click.echo(f"Could not combine {descriptions}: {e}", err=True)
+ document = bus.handle(commands.Combine(documents))
+ document = bus.handle(commands.Transform(document))
+ document.description = f"combination of {', '.join(descriptions)}"
+ yield document
+
+ except Exception as exc:
+ click.echo(f"Could not combine {', '.join(descriptions)}: {exc}", err=True)
-@cli.command("stats")
+@click.command("stats")
@click.option(
"--coarse",
"resolution",
flag_value="coarse",
default=True,
- help="Print the number of PROV elements aswell as the overall number of relations.",
+ help="Print the number of PROV elements for each element type.",
)
@click.option(
"--fine",
"resolution",
flag_value="fine",
- help="Print the number of PROV elements aswell as the number of PROV relations for each relation type.",
+ help="Print the number of PROV elements for each element type and each relation type.",
)
+@click.option("--format", type=click.Choice(["csv", "table"]), default="table")
@click.option(
"--explain",
- "show_description",
is_flag=True,
help="Print a textual summary of all operations applied to the graphs.",
)
-@click.option("--formatter", type=click.Choice(["csv", "table"]), default="table")
@processor
-def stats(graphs, resolution, show_description, formatter):
- """Print statistics such as node counts and relation counts.
+@click.pass_obj
+def statistics(
+ bus, documents: Iterator[ProvDocument], resolution: str, format: str, explain: bool
+):
+ """Print statistics for one or more provenance documents.
This command prints statistics for each processed provenance graph.
     Statistics include the number of elements for each element type as well as the number of relations for each relation type.
Optionally, a short textual summary of all operations applied to the processed graphs can be printed to stdout.
"""
- for graph in graphs:
+ for document in documents:
try:
- if show_description:
- click.echo(f"\nDescription: {graph.description.capitalize()}\n")
- click.echo(
- operations.stats(
- graph,
- resolution,
- formatter=operations.format_stats_as_ascii_table
- if formatter == "table"
- else operations.format_stats_as_csv,
- )
- )
- yield graph
- except Exception as e:
- click.echo(f"Could not display stats for {graph.description}: {e}", err=True)
-
-
-@cli.command()
-@click.option(
- "--mapping",
- type=click.Path(exists=True, dir_okay=False),
- help="File path to duplicate agent mapping.",
-)
-@processor
-def merge_duplicated_agents(graphs, mapping):
- """Merge duplicated agents based on a name to aliases mapping.
-
- This command solves the problem of duplicated agents that can occur when the same physical user
- uses different user names and emails for his git and gitlab account.
- Based on a mapping of names to aliases the duplicated agents can be merged.
- """
- for graph in graphs:
- graph = operations.merge_duplicated_agents(graph, mapping)
- graph.description += f"merged double agents {graph.description}"
- yield graph
+ statistics = bus.handle(commands.Statistics(document, resolution, format))
+ if explain:
+ statistics = f"{document.description}\n\n{statistics}"
+ click.echo(statistics)
+ except Exception:
+            click.echo(f"Could not compute statistics for {document.description}.", err=True)
+ yield document
+
+
+# CLI group for gitlab commands
+gitlab2prov.add_command(extract)
+gitlab2prov.add_command(read)
+gitlab2prov.add_command(write)
+gitlab2prov.add_command(combine)
+gitlab2prov.add_command(transform)
+gitlab2prov.add_command(statistics)
+
+# CLI group for github commands
+github2prov.add_command(extract)
+github2prov.add_command(read)
+github2prov.add_command(write)
+github2prov.add_command(combine)
+github2prov.add_command(transform)
+github2prov.add_command(statistics)
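
The commands registered above share one streaming pattern: each takes an iterator of documents, does its work per document, reports failures on stderr, and yields the document onward so the next command in the chain still receives it. The following is a minimal, stdlib-only sketch of that pattern, not the project's implementation. `Doc`, `count_words`, and `stats_stage` are hypothetical names introduced for illustration, and `print(..., file=sys.stderr)` stands in for `click.echo(..., err=True)`.

```python
import sys
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Doc:
    """Hypothetical stand-in for a provenance document."""

    description: str
    body: str


def count_words(doc: Doc) -> int:
    """Hypothetical statistic; raises on an empty body to simulate a failure."""
    if not doc.body:
        raise ValueError("empty document")
    return len(doc.body.split())


def stats_stage(
    docs: Iterable[Doc], compute: Callable[[Doc], int], out: list
) -> Iterator[Doc]:
    """Compute a statistic per document, but always pass the document on."""
    for doc in docs:
        try:
            out.append((doc.description, compute(doc)))
        except Exception as exc:
            # Report the failure and keep the stream going.
            print(f"Could not compute statistics for {doc.description}: {exc}", file=sys.stderr)
        yield doc


results: list = []
docs = [Doc("first", "a b c"), Doc("broken", ""), Doc("last", "x y")]
passed_on = list(stats_stage(docs, count_words, results))
```

In this sketch the failing document is skipped in `results` but still appears in `passed_on`, which is the property that lets later stages keep working after one stage reports an error.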
diff --git a/gitlab2prov/prov/model.py b/gitlab2prov/prov/model.py
index f936ed3..97c5f12 100644
--- a/gitlab2prov/prov/model.py
+++ b/gitlab2prov/prov/model.py
@@ -1,386 +1,603 @@
-from typing import Optional, Union
-
-from prov.model import ProvDocument, PROV_ROLE
+from typing import Optional, Union, Type, Iterable, Callable, Any
+from dataclasses import dataclass, field
+from operator import attrgetter
+
+from prov.model import (
+ ProvDocument,
+ ProvDerivation,
+ PROV_ROLE,
+ PROV_ATTR_STARTTIME,
+ ProvInvalidation,
+ ProvMembership,
+ ProvElement,
+ ProvUsage,
+ ProvAssociation,
+ ProvAttribution,
+ ProvGeneration,
+ ProvSpecialization,
+ ProvCommunication,
+ ProvRelation,
+ ProvRecord,
+)
+from prov.identifier import QualifiedName, Namespace
+from functools import partial
-from gitlab2prov.prov.operations import graph_factory
-from gitlab2prov.adapters.repository import AbstractRepository
-from gitlab2prov.domain.constants import ChangeType, ProvRole
+from gitlab2prov.adapters.repository import Repository
+from gitlab2prov.domain.constants import ProvRole
from gitlab2prov.domain.objects import (
FileRevision,
GitCommit,
- GitlabCommit,
+ Commit,
Issue,
MergeRequest,
Release,
- Tag,
+ GitTag,
+ Annotation,
+ Creation,
+ AnnotatedVersion,
)
-Resource = Union[GitlabCommit, Issue, MergeRequest]
-
-
-def git_commit_model(resources: AbstractRepository, graph: ProvDocument = None):
- """Commit model implementation."""
- if graph is None:
- graph = graph_factory()
- for commit in resources.list_all(GitCommit):
- file_revisions = resources.list_all(FileRevision, committed_in=commit.hexsha)
- parents = [resources.get(GitCommit, hexsha=hexsha) for hexsha in commit.parents]
- parents = [parent for parent in parents if parent is not None]
- for rev in file_revisions:
- model = choose_rev_model(rev)
- if model is None:
- continue
- graph.update(model(commit, parents, rev))
- return graph
-
-
-def choose_rev_model(rev: FileRevision):
- """Add the file change models based on the change type of each file version."""
- if rev.change_type == ChangeType.ADDED:
- return addition
- if (
- rev.change_type == ChangeType.MODIFIED
- or rev.change_type == ChangeType.RENAMED
- or rev.change_type == ChangeType.COPIED
- or rev.change_type == ChangeType.CHANGED
+AUTHOR_ROLE_MAP = {
+ Commit: ProvRole.COMMIT_AUTHOR,
+ Issue: ProvRole.ISSUE_AUTHOR,
+ MergeRequest: ProvRole.MERGE_REQUEST_AUTHOR,
+}
+
+
+HostedResource = Commit | Issue | MergeRequest
+Query = Callable[[Repository], Iterable[HostedResource]]
+DEFAULT_NAMESPACE = Namespace("ex", "example.org")
+
+
+def file_status_query(repository: Repository, status: str):
+ for revision in repository.list_all(FileRevision, status=status):
+ commit = repository.get(GitCommit, sha=revision.commit)
+ for parent in [repository.get(GitCommit, sha=sha) for sha in commit.parents]:
+ if status == "modified":
+ yield commit, parent, revision, revision.previous
+ else:
+ yield commit, parent, revision
+
+
+def hosted_resource_query(repository: Repository, resource_type: Type[HostedResource]):
+ for resource in repository.list_all(resource_type):
+        if resource_type == Commit:
+            yield (resource, repository.get(GitCommit, sha=resource.sha))
+        else:
+            yield (resource, None)
+
+
+FileAdditionQuery = partial(file_status_query, status="added")
+FileDeletionQuery = partial(file_status_query, status="deleted")
+FileModificationQuery = partial(file_status_query, status="modified")
+HostedCommitQuery = partial(hosted_resource_query, resource_type=Commit)
+HostedIssueQuery = partial(hosted_resource_query, resource_type=Issue)
+HostedMergeQuery = partial(hosted_resource_query, resource_type=MergeRequest)
+
+
+@dataclass
+class ProvenanceContext:
+ document: ProvDocument
+ namespace: Optional[str] = None
+
+ def add_element(self, dataclass_instance) -> ProvRecord:
+ # Convert the dataclass instance to a ProvElement
+ element = self.convert_to_prov_element(dataclass_instance)
+ # Add the namespace to the element if it is provided
+ if self.namespace:
+ element.add_namespace(self.namespace)
+ # Return the newly added element
+ return self.document.add_record(element)
+
+ def convert_to_prov_element(self, dataclass_instance) -> ProvElement:
+ # Convert the dataclass instance to a ProvElement
+ element = dataclass_instance.to_prov_element()
+ # Add the element to the ProvDocument and return it
+ return self.document.new_record(element._prov_type, element.identifier, element.attributes)
+
+ def add_relation(
+ self,
+ source_dataclass_instance,
+ target_dataclass_instance,
+ relationship_type: Type[ProvRelation],
+        attributes: Optional[dict[str, Any]] = None,
+ ) -> None:
+ # Initialize attributes if they are not provided
+ if not attributes:
+ attributes = dict()
+ # Make sure that both source and target are part of the document
+ source = self.add_element(source_dataclass_instance)
+ target = self.add_element(target_dataclass_instance)
+ # Create a relationship between the source and target
+ relationship = self.document.new_record(
+ relationship_type._prov_type,
+ QualifiedName(DEFAULT_NAMESPACE, f"relation:{source.identifier}:{target.identifier}"),
+ {
+ relationship_type.FORMAL_ATTRIBUTES[0]: source,
+ relationship_type.FORMAL_ATTRIBUTES[1]: target,
+ },
+ )
+ # Add the remaining attributes to the relationship
+ relationship.add_attributes(attributes)
+ # Add the relationship to the ProvDocument
+ self.document.add_record(relationship)
+
+
+@dataclass
+class FileAdditionModel:
+ commit: GitCommit
+ parent: GitCommit
+ revision: FileRevision
+ ctx: ProvenanceContext = field(init=False)
+
+ def __post_init__(self):
+ self.ctx = ProvenanceContext(ProvDocument())
+
+ def build_provenance_model(self) -> ProvDocument:
+ # Add the elements to the context
+ self.ctx.add_element(self.commit)
+ self.ctx.add_element(self.commit.author)
+ self.ctx.add_element(self.commit.committer)
+ self.ctx.add_element(self.revision)
+ self.ctx.add_element(self.revision.file)
+ # Check if parent exists
+ if self.parent:
+ # Add the parent to the context
+ self.ctx.add_element(self.parent)
+ # Add the communication relation (wasInformedBy) between the parent and the commit
+ self.ctx.add_relation(self.commit, self.parent, ProvCommunication, {})
+ # Add the relations to the context
+ self.ctx.add_relation(
+ self.commit,
+ self.commit.author,
+ ProvAssociation,
+ {PROV_ROLE: ProvRole.AUTHOR},
+ )
+ self.ctx.add_relation(
+ self.commit,
+ self.commit.committer,
+ ProvAssociation,
+ {PROV_ROLE: ProvRole.COMMITTER},
+ )
+ self.ctx.add_relation(
+ self.revision,
+ self.commit,
+ ProvGeneration,
+ {
+ PROV_ATTR_STARTTIME: self.commit.authored_at,
+ PROV_ROLE: ProvRole.FILE,
+ "insertions": self.revision.insertions,
+ "deletions": self.revision.deletions,
+ "lines": self.revision.lines,
+ "score": self.revision.score,
+ },
+ )
+ self.ctx.add_relation(
+ self.revision.file,
+ self.commit,
+ ProvGeneration,
+ {
+ PROV_ATTR_STARTTIME: self.commit.authored_at,
+ PROV_ROLE: ProvRole.ADDED_REVISION,
+ },
+ )
+ self.ctx.add_relation(self.revision.file, self.commit.author, ProvAttribution)
+ self.ctx.add_relation(self.revision, self.revision.file, ProvSpecialization)
+ # Return the document
+ return self.ctx.document
+
+
+
+@dataclass
+class FileDeletionModel:
+ commit: GitCommit
+ parent: GitCommit
+ revision: FileRevision
+ ctx: ProvenanceContext = field(init=False)
+
+ def __post_init__(self):
+ # Initialize the context
+ self.ctx = ProvenanceContext(ProvDocument())
+
+ def build_provenance_model(self) -> ProvDocument:
+ # Add the elements to the context
+ self.ctx.add_element(self.commit)
+ self.ctx.add_element(self.revision)
+ self.ctx.add_element(self.revision.file)
+ self.ctx.add_element(self.commit.author)
+ self.ctx.add_element(self.commit.committer)
+ # Check if parent exists
+ if self.parent:
+ # Add the parent to the context
+ self.ctx.add_element(self.parent)
+ # Add the communication relation (wasInformedBy) between the parent and the commit
+ self.ctx.add_relation(self.commit, self.parent, ProvCommunication)
+ # Add the relations to the context
+ self.ctx.add_relation(
+ self.commit, self.commit.committer, ProvAssociation, {PROV_ROLE: ProvRole.COMMITTER}
+ )
+ self.ctx.add_relation(
+ self.commit, self.commit.author, ProvAssociation, {PROV_ROLE: ProvRole.AUTHOR}
+ )
+ self.ctx.add_relation(self.revision, self.revision.file, ProvSpecialization)
+ self.ctx.add_relation(
+ self.revision,
+ self.commit,
+ ProvInvalidation,
+ {PROV_ATTR_STARTTIME: self.commit.authored_at, PROV_ROLE: ProvRole.DELETED_REVISION},
+ )
+ # Return the document
+ return self.ctx.document
+
+
+@dataclass
+class FileModificationModel:
+ commit: GitCommit
+ parent: GitCommit
+ revision: FileRevision
+ previous: FileRevision
+ ctx: ProvenanceContext = field(init=False)
+
+ def __post_init__(self):
+ # Initialize the context
+ self.ctx = ProvenanceContext(ProvDocument())
+
+ def build_provenance_model(self) -> ProvDocument:
+ # Add the elements to the context
+ self.ctx.add_element(self.commit)
+ self.ctx.add_element(self.revision)
+ self.ctx.add_element(self.revision.file)
+ self.ctx.add_element(self.previous)
+ self.ctx.add_element(self.commit.author)
+ self.ctx.add_element(self.commit.committer)
+ # Check if parent exists
+ if self.parent:
+ # Add the parent to the context
+ self.ctx.add_element(self.parent)
+ # Add the communication relation (wasInformedBy) between the parent and the commit
+ self.ctx.add_relation(self.commit, self.parent, ProvCommunication)
+ # Add the relations to the context
+ self.ctx.add_relation(
+ self.commit, self.commit.author, ProvAssociation, {PROV_ROLE: ProvRole.AUTHOR}
+ )
+ self.ctx.add_relation(
+ self.commit, self.commit.committer, ProvAssociation, {PROV_ROLE: ProvRole.COMMITTER}
+ )
+ self.ctx.add_relation(self.revision, self.revision.file, ProvSpecialization)
+ self.ctx.add_relation(
+ self.revision,
+ self.commit,
+ ProvGeneration,
+ {PROV_ATTR_STARTTIME: self.commit.authored_at, PROV_ROLE: ProvRole.MODIFIED_REVISION},
+ )
+ self.ctx.add_relation(self.revision, self.commit.author, ProvAttribution)
+ self.ctx.add_relation(
+ self.revision, self.previous, ProvDerivation
+        )  # TODO: has to be a wasRevisionOf record; add asserted type 'Revision'
+ self.ctx.add_relation(
+ self.commit,
+ self.previous,
+ ProvUsage,
+ {PROV_ATTR_STARTTIME: self.commit.authored_at, PROV_ROLE: ProvRole.PREVIOUS_REVISION},
+ )
+ # Return the document
+ return self.ctx.document
+
+
+@dataclass
+class HostedResourceModel:
+ """Model for a hosted resource (e.g., commit, issue, merge request)."""
+
+ resource: Union[Commit, Issue, MergeRequest]
+ commit: Optional[GitCommit] = None
+ ctx: ProvenanceContext = field(init=False)
+
+ def __post_init__(self):
+ # Initialize the context
+ self.ctx = ProvenanceContext(ProvDocument())
+
+ def build_provenance_model(self):
+ # Choose the creation part based on the type of resource
+ if isinstance(self.resource, Commit) and self.commit:
+ self._add_creation_part_for_hosted_commits()
+ else:
+ self._add_creation_part()
+ # Set the previous annotation and version to the creation / original version
+ previous_annotation = self.resource.creation
+ previous_version = self.resource.first_version
+ # For each annotation and version, add the annotation part, sort by time ascending
+ for current_annotation, current_version in zip(
+ sorted(self.resource.annotations, key=attrgetter("start")),
+ sorted(self.resource.annotated_versions, key=attrgetter("start")),
+ ):
+ # Add the annotation chain link
+ self._add_annotation_part(
+ current_annotation,
+ previous_annotation,
+ current_version,
+ previous_version,
+ )
+ # Update the previous annotation and version
+ previous_annotation = current_annotation
+ previous_version = current_version
+
+ return self.ctx.document
+
+ def _add_creation_part_for_hosted_commits(self):
+ # Add the elements to the context
+ self.ctx.add_element(self.resource)
+ self.ctx.add_element(self.resource.creation)
+ self.ctx.add_element(self.resource.first_version)
+ self.ctx.add_element(self.resource.author)
+ self.ctx.add_element(self.commit)
+ self.ctx.add_element(self.commit.committer)
+ # Add the relations to the context
+ self.ctx.add_relation(
+ self.resource.creation,
+ self.resource.author,
+ ProvAssociation,
+ {PROV_ROLE: ProvRole.COMMIT_AUTHOR},
+ )
+ self.ctx.add_relation(self.resource, self.resource.author, ProvAttribution)
+ self.ctx.add_relation(self.resource.first_version, self.resource, ProvSpecialization)
+ self.ctx.add_relation(self.resource.first_version, self.resource.author, ProvAttribution)
+ self.ctx.add_relation(
+ self.resource,
+ self.resource.creation,
+ ProvGeneration,
+ {PROV_ATTR_STARTTIME: self.resource.creation.start, PROV_ROLE: ProvRole.RESOURCE},
+ )
+ self.ctx.add_relation(
+ self.resource.first_version,
+ self.resource.creation,
+ ProvGeneration,
+ {
+ PROV_ATTR_STARTTIME: self.resource.creation.start,
+ PROV_ROLE: ProvRole.FIRST_RESOURCE_VERSION,
+ },
+ )
+ self.ctx.add_relation(self.resource.creation, self.commit, ProvCommunication)
+ self.ctx.add_relation(
+ self.commit,
+ self.commit.committer,
+ ProvAssociation,
+            {PROV_ROLE: ProvRole.COMMITTER},
+ )
+
+ def _add_creation_part(self):
+ self.ctx.add_element(self.resource)
+ self.ctx.add_element(self.resource.creation)
+ self.ctx.add_element(self.resource.first_version)
+ self.ctx.add_element(self.resource.author)
+
+ self.ctx.add_relation(self.resource, self.resource.author, ProvAttribution)
+ self.ctx.add_relation(self.resource.first_version, self.resource, ProvSpecialization)
+ self.ctx.add_relation(self.resource.first_version, self.resource.author, ProvAttribution)
+ self.ctx.add_relation(
+ self.resource.creation,
+ self.resource.author,
+ ProvAssociation,
+ {PROV_ROLE: AUTHOR_ROLE_MAP[type(self.resource)]},
+ )
+ self.ctx.add_relation(
+ self.resource,
+ self.resource.creation,
+ ProvGeneration,
+ {PROV_ATTR_STARTTIME: self.resource.creation.start, PROV_ROLE: ProvRole.RESOURCE},
+ )
+ self.ctx.add_relation(
+ self.resource.first_version,
+ self.resource.creation,
+ ProvGeneration,
+ {
+ PROV_ATTR_STARTTIME: self.resource.creation.start,
+ PROV_ROLE: ProvRole.FIRST_RESOURCE_VERSION,
+ },
+ )
+
+ def _add_annotation_part(
+ self,
+ current_annotation: Annotation,
+ previous_annotation: Union[Annotation, Creation],
+ current_version: AnnotatedVersion,
+ previous_version: AnnotatedVersion,
):
- return modification
- if rev.change_type == ChangeType.DELETED:
- return deletion
- return None
-
-
-def addition(
- commit: GitCommit,
- parents: list[GitCommit],
- rev: FileRevision,
- graph: ProvDocument = None,
-):
- """Add model for the addition of a new file in a commit."""
- if graph is None:
- graph = graph_factory()
- c = graph.activity(*commit)
- at = graph.agent(*commit.author)
- ct = graph.agent(*commit.committer)
-
- c.wasAssociatedWith(
- at, plan=None, attributes=[(PROV_ROLE, list(at.get_attribute(PROV_ROLE))[0])]
- )
- c.wasAssociatedWith(
- ct, plan=None, attributes=[(PROV_ROLE, list(ct.get_attribute(PROV_ROLE))[0])]
- )
-
- for parent in parents:
- graph.activity(*commit).wasInformedBy(graph.activity(*parent))
-
- f = graph.entity(*rev.original)
- f.wasAttributedTo(at)
- f.wasGeneratedBy(c, time=c.get_startTime(), attributes=[(PROV_ROLE, ProvRole.FILE)])
-
- rev = graph.entity(*rev)
- rev.wasAttributedTo(at)
- rev.specializationOf(f)
- rev.wasGeneratedBy(
- c,
- time=c.get_startTime(),
- attributes=[(PROV_ROLE, ProvRole.FILE_REVISION_AT_POINT_OF_ADDITION)],
- )
- return graph
-
-
-def modification(
- commit: GitCommit,
- parents: list[GitCommit],
- fv: FileRevision,
- graph: ProvDocument = None,
-):
- if graph is None:
- graph = graph_factory()
- c = graph.activity(*commit)
- at = graph.agent(*commit.author)
- ct = graph.agent(*commit.committer)
-
- c.wasAssociatedWith(
- at, plan=None, attributes=[(PROV_ROLE, list(at.get_attribute(PROV_ROLE))[0])]
- )
- c.wasAssociatedWith(
- ct, plan=None, attributes=[(PROV_ROLE, list(ct.get_attribute(PROV_ROLE))[0])]
- )
-
- for parent in parents:
- graph.activity(*commit).wasInformedBy(graph.activity(*parent))
-
- f = graph.entity(*fv.original)
- rev = graph.entity(*fv)
- rev.wasAttributedTo(at)
- rev.specializationOf(f)
- rev.wasGeneratedBy(
- c,
- time=c.get_startTime(),
- attributes=[(PROV_ROLE, ProvRole.FILE_REVISION_AFTER_MODIFICATION)],
- )
-
- # skip previous revisions if none exist
- if fv.previous is None:
- return graph
-
- prev = graph.entity(*fv.previous)
- prev.specializationOf(f)
- graph.wasRevisionOf(rev, prev) # NOTE: rev.wasRevisionOf(prev) is not impl in prov pkg
- c.used(
- prev,
- c.get_startTime(),
- [(PROV_ROLE, ProvRole.FILE_REVISION_TO_BE_MODIFIED)],
- )
- return graph
-
-
-def deletion(
- commit: GitCommit,
- parents: list[GitCommit],
- fv: FileRevision,
- graph: ProvDocument = None,
-):
- if graph is None:
- graph = graph_factory()
- c = graph.activity(*commit)
- at = graph.agent(*commit.author)
- ct = graph.agent(*commit.committer)
-
- c.wasAssociatedWith(
- at, plan=None, attributes=[(PROV_ROLE, list(at.get_attribute(PROV_ROLE))[0])]
- )
- c.wasAssociatedWith(
- ct, plan=None, attributes=[(PROV_ROLE, list(ct.get_attribute(PROV_ROLE))[0])]
- )
-
- for parent in parents:
- graph.activity(*commit).wasInformedBy(graph.activity(*parent))
-
- f = graph.entity(*fv.original)
- rev = graph.entity(*fv)
- rev.specializationOf(f)
- rev.wasInvalidatedBy(
- c,
- c.get_startTime(),
- [(PROV_ROLE, ProvRole.FILE_REVISION_AT_POINT_OF_DELETION)],
- )
- return graph
-
-
-def gitlab_commit_model(resources, graph: ProvDocument = None):
- if graph is None:
- graph = graph_factory()
- for gitlab_commit in resources.list_all(GitlabCommit):
- git_commit = resources.get(GitCommit, hexsha=gitlab_commit.hexsha)
- graph.update(commit_creation(gitlab_commit, git_commit))
- graph.update(annotation_chain(gitlab_commit))
- return graph
- return graph
-
-
-def gitlab_issue_model(resources, graph: ProvDocument = None):
- if graph is None:
- graph = graph_factory()
- for issue in resources.list_all(Issue):
- graph.update(resource_creation(issue))
- graph.update(annotation_chain(issue))
- return graph
-
-
-def gitlab_merge_request_model(resources, graph: ProvDocument = None):
- if graph is None:
- graph = graph_factory()
- for merge_request in resources.list_all(MergeRequest):
- graph.update(resource_creation(merge_request))
- graph.update(annotation_chain(merge_request))
- return graph
-
-
-def commit_creation(
- gitlab_commit: GitlabCommit,
- git_commit: Optional[GitCommit],
- graph: ProvDocument = None,
-):
- if graph is None:
- graph = graph_factory()
- resource = graph.entity(*gitlab_commit)
- creation = graph.activity(*gitlab_commit.creation)
- first_version = graph.entity(*gitlab_commit.first_version)
- author = graph.agent(*gitlab_commit.author)
-
- resource.wasAttributedTo(author)
- creation.wasAssociatedWith(
- author, plan=None, attributes=[(PROV_ROLE, ProvRole.AUTHOR_GITLAB_COMMIT)]
- )
- resource.wasGeneratedBy(
- creation,
- time=creation.get_startTime(),
- attributes=[(PROV_ROLE, ProvRole.RESOURCE)],
- )
- first_version.wasGeneratedBy(
- creation,
- time=creation.get_startTime(),
- attributes=[(PROV_ROLE, ProvRole.RESOURCE_VERSION_AT_POINT_OF_CREATION)],
- )
- first_version.specializationOf(resource)
- first_version.wasAttributedTo(author)
-
- if git_commit is None:
- return graph
-
- commit = graph.activity(*git_commit)
- committer = graph.agent(*git_commit.committer)
- commit.wasAssociatedWith(committer, plan=None, attributes=[(PROV_ROLE, ProvRole.COMMITTER)])
- creation.wasInformedBy(commit)
-
- return graph
-
-
-def resource_creation(resource: Resource, graph: ProvDocument = None):
- if graph is None:
- graph = graph_factory()
- r = graph.entity(*resource)
- c = graph.activity(*resource.creation)
- rv = graph.entity(*resource.first_version)
- at = graph.agent(*resource.author)
-
- c.wasAssociatedWith(
- at,
- plan=None,
- attributes=[(PROV_ROLE, list(at.get_attribute(PROV_ROLE))[0])],
- )
-
- r.wasAttributedTo(at)
- rv.wasAttributedTo(at)
- rv.specializationOf(r)
- r.wasGeneratedBy(
- c,
- time=c.get_startTime(),
- attributes=[(PROV_ROLE, ProvRole.RESOURCE)],
- )
- rv.wasGeneratedBy(
- c,
- time=c.get_startTime(),
- attributes=[(PROV_ROLE, ProvRole.RESOURCE_VERSION_AT_POINT_OF_CREATION)],
- )
- return graph
-
-
-def annotation_chain(resource, graph=None):
- if graph is None:
- graph = graph_factory()
- r = graph.entity(*resource)
- c = graph.activity(*resource.creation)
- fv = graph.entity(*resource.first_version)
-
- prev_annot = c
- prev_annot_ver = fv
-
- for annotation, annotated_version in zip(resource.annotations, resource.annotated_versions):
- annot = graph.activity(*annotation)
- annot_ver = graph.entity(*annotated_version)
- annotator = graph.agent(*annotation.annotator)
-
- annot.wasInformedBy(prev_annot)
- annot_ver.wasDerivedFrom(prev_annot_ver)
- annot_ver.wasAttributedTo(annotator)
- annot_ver.specializationOf(r)
-
- annot.wasAssociatedWith(
- annotator,
- plan=None,
- attributes=[(PROV_ROLE, list(annotator.get_attribute(PROV_ROLE))[0])],
+ # Add the elements to the context
+ self.ctx.add_element(self.resource)
+ self.ctx.add_element(self.resource.creation)
+ self.ctx.add_element(current_annotation)
+ self.ctx.add_element(current_annotation.annotator)
+ self.ctx.add_element(current_version)
+ self.ctx.add_element(previous_annotation)
+ self.ctx.add_element(previous_version)
+ # Add the relations to the context
+ self.ctx.add_relation(current_annotation, previous_annotation, ProvCommunication)
+ self.ctx.add_relation(current_version, previous_version, ProvDerivation)
+ self.ctx.add_relation(current_version, current_annotation.annotator, ProvAttribution)
+ self.ctx.add_relation(
+ current_annotation,
+ current_annotation.annotator,
+ ProvAssociation,
+ {PROV_ROLE: ProvRole.ANNOTATOR},
+ )
+ self.ctx.add_relation(
+ current_annotation,
+ previous_version,
+ ProvUsage,
+ {
+ PROV_ATTR_STARTTIME: current_annotation.start,
+ PROV_ROLE: ProvRole.PRE_ANNOTATION_VERSION,
+ },
)
+ self.ctx.add_relation(
+ current_version,
+ current_annotation,
+ ProvGeneration,
+ {
+ PROV_ATTR_STARTTIME: current_annotation.start,
+ PROV_ROLE: ProvRole.POST_ANNOTATION_VERSION,
+ },
+ )
+
+
+@dataclass
+class ReleaseModel:
+ release: Release
+ tag: GitTag
+ ctx: ProvenanceContext = field(init=False)
+
+ def __post_init__(self):
+ self.ctx = ProvenanceContext(ProvDocument())
+
+ @staticmethod
+ def query(repository: Repository) -> Iterable[tuple[Release, GitTag]]:
+ for release in repository.list_all(Release):
+ tag = repository.get(GitTag, sha=release.tag_sha)
+ yield release, tag
+
+ def build_provenance_model(self) -> ProvDocument:
+ # Add the release
+ self.ctx.add_element(self.release)
+ self.ctx.add_element(self.release.author)
+ self.ctx.add_element(self.release.creation)
+ # Add all evidence files
+ for evidence in self.release.evidences:
+ self.ctx.add_element(evidence)
+ # Add all assets
+ for asset in self.release.assets:
+ self.ctx.add_element(asset)
+ # Add the tag
+ self.ctx.add_element(self.tag)
+ self.ctx.add_element(self.tag.creation)
+ self.ctx.add_element(self.tag.author)
+ # Add the release relationships
+ self.ctx.add_relation(self.release, self.release.author, ProvAttribution)
+ self.ctx.add_relation(
+ self.release,
+ self.release.creation,
+ ProvGeneration,
+ {PROV_ATTR_STARTTIME: self.release.creation.start, PROV_ROLE: ProvRole.RELEASE},
+ )
+ self.ctx.add_relation(
+ self.release.creation,
+ self.release.author,
+ ProvAssociation,
+ {PROV_ROLE: ProvRole.RELEASE_AUTHOR},
+ )
+ # Add the evidence and asset relationships
+ for evidence in self.release.evidences:
+ self.ctx.add_relation(evidence, self.release, ProvMembership)
+ self.ctx.add_relation(evidence, self.release.creation, ProvGeneration)
+ for asset in self.release.assets:
+ self.ctx.add_relation(asset, self.release, ProvMembership)
+ self.ctx.add_relation(asset, self.release.creation, ProvGeneration)
+ # Add tag relationships
+ self.ctx.add_relation(self.tag, self.release, ProvMembership)
+ self.ctx.add_relation(self.tag, self.tag.author, ProvAttribution)
+ self.ctx.add_relation(
+ self.tag,
+ self.tag.creation,
+ ProvGeneration,
+ {PROV_ATTR_STARTTIME: self.tag.creation.start, PROV_ROLE: ProvRole.TAG},
+ )
+ self.ctx.add_relation(
+ self.tag.creation, self.tag.author, ProvAssociation, {PROV_ROLE: ProvRole.TAG_AUTHOR}
+        )
+        # Return the document
+        return self.ctx.document
+
- annot.used(
- prev_annot_ver,
- annot.get_startTime(),
- [(PROV_ROLE, list(annotator.get_attribute(PROV_ROLE))[0])],
+@dataclass
+class GitTagModel:
+ """Model for a Git tag."""
+
+ tag: GitTag
+ commit: Commit | None = None
+ ctx: ProvenanceContext = field(init=False)
+
+ def __post_init__(self):
+ self.ctx = ProvenanceContext(ProvDocument())
+
+ @staticmethod
+ def query(repository: Repository) -> Iterable[tuple[GitTag, Commit]]:
+ for tag in repository.list_all(GitTag):
+ commit = repository.get(Commit, sha=tag.sha)
+ yield tag, commit
+
+ def build_provenance_model(self) -> ProvDocument:
+ # Add the tag
+ self.ctx.add_element(self.tag)
+ self.ctx.add_element(self.tag.creation)
+ self.ctx.add_element(self.tag.author)
+ # Add the commit
+ if self.commit:
+ self.ctx.add_element(self.commit)
+ self.ctx.add_element(self.commit.creation)
+ self.ctx.add_element(self.commit.author)
+ # Add tag relationships
+ self.ctx.add_relation(
+ self.tag,
+ self.tag.creation,
+ ProvGeneration,
+ {PROV_ATTR_STARTTIME: self.tag.creation.start, PROV_ROLE: ProvRole.TAG},
)
- annot_ver.wasGeneratedBy(
- annot,
- time=annot.get_startTime(),
- attributes=[(PROV_ROLE, ProvRole.RESOURCE_VERSION_AFTER_ANNOTATION)],
+ self.ctx.add_relation(self.tag, self.tag.author, ProvAttribution)
+ self.ctx.add_relation(
+ self.tag.creation, self.tag.author, ProvAssociation, {PROV_ROLE: ProvRole.TAG_AUTHOR}
)
- prev_annot = annot
- prev_annot_ver = annot_ver
- return graph
-
-
-def gitlab_release_tag_model(resources, graph: ProvDocument = None):
- if graph is None:
- graph = graph_factory()
- for tag in resources.list_all(Tag):
- release = resources.get(Release, tag_name=tag.name)
- commit = resources.get(GitlabCommit, hexsha=tag.hexsha)
- graph.update(release_and_tag(release, tag))
- graph.update(tag_and_commit(tag, commit))
- return graph
-
-
-def release_and_tag(release: Optional[Release], tag: Tag, graph: ProvDocument = None):
- if graph is None:
- graph = graph_factory()
- t = graph.collection(*tag)
-
- if release is None:
- return graph
-
- r = graph.collection(*release)
- c = graph.activity(*release.creation)
- t.hadMember(r)
- r.wasGeneratedBy(c, time=c.get_startTime(), attributes=[(PROV_ROLE, ProvRole.RELEASE)])
- for asset in release.assets:
- graph.entity(*asset).hadMember(graph.entity(*release))
- for evidence in release.evidences:
- graph.entity(*evidence).hadMember(graph.entity(*release))
-
- if release.author is None:
- return graph
-
- at = graph.agent(*release.author)
- r.wasAttributedTo(at)
- c.wasAssociatedWith(
- at, plan=None, attributes=[(PROV_ROLE, list(at.get_attribute(PROV_ROLE))[0])]
- )
-
- return graph
-
-
-def tag_and_commit(tag: Tag, commit: Optional[GitlabCommit], graph: ProvDocument = None):
- if graph is None:
- graph = graph_factory()
- t = graph.collection(*tag)
- tc = graph.activity(*tag.creation)
- at = graph.agent(*tag.author)
- t.wasAttributedTo(at)
- t.wasGeneratedBy(tc, time=tc.get_startTime(), attributes=[(PROV_ROLE, ProvRole.TAG)])
- tc.wasAssociatedWith(
- at, plan=None, attributes=[(PROV_ROLE, list(at.get_attribute(PROV_ROLE))[0])]
- )
-
- if commit is None:
- return graph
-
- cmt = graph.entity(*commit)
- cc = graph.activity(*commit.creation)
- at = graph.agent(*commit.author)
- cmt.hadMember(t)
- cmt.wasAttributedTo(at)
- cmt.wasGeneratedBy(cc, time=cc.get_startTime(), attributes=[(PROV_ROLE, ProvRole.GIT_COMMIT)])
- cc.wasAssociatedWith(
- at, plan=None, attributes=[(PROV_ROLE, list(at.get_attribute(PROV_ROLE))[0])]
- )
-
- return graph
+ # Add commit relationships
+ if self.commit:
+ self.ctx.add_relation(self.commit, self.tag, ProvMembership)
+ self.ctx.add_relation(
+ self.commit,
+ self.commit.creation,
+ ProvGeneration,
+ {PROV_ATTR_STARTTIME: self.commit.creation.start, PROV_ROLE: ProvRole.COMMIT},
+ )
+ self.ctx.add_relation(self.commit, self.commit.author, ProvAttribution)
+ self.ctx.add_relation(
+ self.commit.creation,
+ self.commit.author,
+ ProvAssociation,
+ {PROV_ROLE: ProvRole.COMMIT_AUTHOR},
+ )
+ return self.ctx.document
+
+
+@dataclass
+class CallableModel:
+ """A model that can be called to build a provenance document."""
+
+ model: Type[
+ FileAdditionModel
+ | FileModificationModel
+ | FileDeletionModel
+ | HostedResourceModel
+ | GitTagModel
+ | ReleaseModel
+ ]
+ query: Query
+ document: ProvDocument = field(init=False)
+
+ def __post_init__(self):
+ # Initialize the document
+ self.document = ProvDocument()
+
+ def __call__(self, repository: Repository):
+ # Pass the repository to the query
+ for args in self.query(repository):
+ # Initialize the model
+ m = self.model(*args)
+ # Update the document with the model
+ self.document.update(m.build_provenance_model())
+ return self.document
MODELS = [
- git_commit_model,
- gitlab_commit_model,
- gitlab_issue_model,
- gitlab_merge_request_model,
- gitlab_release_tag_model,
+ CallableModel(FileAdditionModel, FileAdditionQuery),
+ CallableModel(FileDeletionModel, FileDeletionQuery),
+ CallableModel(FileModificationModel, FileModificationQuery),
+ CallableModel(HostedResourceModel, HostedIssueQuery),
+ CallableModel(HostedResourceModel, HostedCommitQuery),
+ CallableModel(HostedResourceModel, HostedMergeQuery),
+ CallableModel(ReleaseModel, ReleaseModel.query),
+ CallableModel(GitTagModel, GitTagModel.query),
]
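
Review note: the accumulate-on-call pattern used by `CallableModel` can be sketched without the `prov` dependency. This is a minimal stand-in, not the project's code. `PairModel`, `CallableSketch`, `all_items`, and the list-based accumulator are hypothetical names chosen for illustration. Only the shape is taken from the diff: a model class, a query callable, and an accumulator created once in `__post_init__`.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class PairModel:
    """Hypothetical model: turns one (key, value) query result into records."""

    key: str
    value: int

    def build(self) -> list:
        return [f"{self.key}={self.value}"]


@dataclass
class CallableSketch:
    """Hypothetical counterpart of CallableModel with a list as accumulator."""

    model: type
    query: Callable[[dict], Iterable[tuple]]
    records: list = field(init=False)

    def __post_init__(self):
        # The accumulator is created once per instance, not once per call.
        self.records = []

    def __call__(self, repository: dict) -> list:
        # Each query result is unpacked into the model's constructor.
        for args in self.query(repository):
            self.records.extend(self.model(*args).build())
        return self.records


def all_items(repository: dict):
    """Hypothetical query: yield every (key, value) pair of the repository."""
    return repository.items()


sketch = CallableSketch(PairModel, all_items)
first = sketch({"a": 1})
second = sketch({"b": 2})
```

Because the accumulator is created once per instance, the second call still returns the records of the first. The diff does not say whether that carry-over is intended for `CallableModel.document` when one instance is called for several repositories.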
diff --git a/gitlab2prov/prov/operations.py b/gitlab2prov/prov/operations.py
index e70d37d..8cf07d6 100644
--- a/gitlab2prov/prov/operations.py
+++ b/gitlab2prov/prov/operations.py
@@ -1,7 +1,8 @@
import json
+import sys
import logging
import hashlib
-from typing import Iterable, NamedTuple, Type
+from typing import NamedTuple, Type
from collections import defaultdict, Counter
from pathlib import Path
@@ -34,40 +35,72 @@
DESERIALIZATION_FORMATS = ["rdf", "xml", "json"]
-def serialize_graph(
- graph: ProvDocument, format: str = "json", destination=None, encoding="utf-8"
-) -> str | None:
- if format not in SERIALIZATION_FORMATS:
- raise ValueError("Unsupported serialization format.")
- if format == "dot":
- return prov_to_dot(graph).to_string().encode(encoding)
- return graph.serialize(format=format, destination=destination)
+def read_provenance_file(filename: str) -> ProvDocument:
+ """Read a ProvDocument from a file or sys.stdin."""
+ try:
+ if filename == "-":
+ content = sys.stdin.read()
+ else:
+ with open(filename, "r") as f:
+ content = f.read()
+ except FileNotFoundError:
+ raise FileNotFoundError(f"File {filename} does not exist.")
+ return deserialize_string(content=content)
+
+
+def deserialize_string(content: str, format: str | None = None) -> ProvDocument:
+ """Deserialize a ProvDocument from a string."""
+ formats = [format] if format else DESERIALIZATION_FORMATS
+ for fmt in formats:
+ try:
+ return ProvDocument.deserialize(content=content, format=fmt)
+ except Exception:
+ pass
+ raise ValueError(f"Deserialization failed for content: {content} and format: {format}")
-def deserialize_graph(source: str = None, content: str = None):
- for format in DESERIALIZATION_FORMATS:
- try:
- return ProvDocument.deserialize(source=source, content=content, format=format)
- except:
- continue
- raise Exception
+def write_provenance_file(
+ document: ProvDocument, filename: str, format: str = "json", overwrite: bool = True
+) -> None:
+ """Write a ProvDocument to a file."""
+ mode = "w" if overwrite else "x"
+ try:
+ with open(filename, mode) as f:
+ f.write(serialize_string(document, format=format))
+ except FileExistsError:
+ raise FileExistsError(f"File {filename} already exists.")
+
+
+def serialize_string(document: ProvDocument, format: str = "json") -> str:
+ """Serialize a ProvDocument to a string."""
+ if format not in SERIALIZATION_FORMATS:
+ raise ValueError(f"Unsupported serialization format: {format}")
+ if format == "dot":
+ return prov_to_dot(document).to_string()
+ return document.serialize(format=format)
def format_stats_as_ascii_table(stats: dict[str, int]) -> str:
- table = f"|{'Record Type':20}|{'Count':20}|\n+{'-'*20}+{'-'*20}+\n"
- for record_type, count in stats.items():
- table += f"|{record_type:20}|{count:20}|\n"
- return table
+ """Format a dictionary as an ASCII table."""
+ header = f"|{'Record Type':20}|{'Count':20}|\n"
+ line = f"+{'-' * 20}+{'-' * 20}+\n"
+ rows = [f"|{record_type:20}|{count:20}|\n" for record_type, count in stats.items()]
+ return f"{header}{line}{''.join(rows)}"
def format_stats_as_csv(stats: dict[str, int]) -> str:
- csv = f"Record Type, Count\n"
- for record_type, count in stats.items():
- csv += f"{record_type}, {count}\n"
- return csv
+ """Format a dictionary as a CSV string."""
+ header = "Record Type, Count\n"
+ rows = [f"{record_type}, {count}\n" for record_type, count in stats.items()]
+ return f"{header}{''.join(rows)}"
-def stats(graph: ProvDocument, resolution: str, formatter=format_stats_as_ascii_table) -> str:
+def stats(graph: ProvDocument, resolution: str, format: str = "table") -> str:
+ formatters = {"csv": format_stats_as_csv, "table": format_stats_as_ascii_table}
+ if format not in formatters:
+ raise ValueError(f"Unsupported statistics format: {format}")
+ formatter = formatters[format]
+
elements = Counter(e.get_type().localpart for e in graph.get_records(ProvElement))
relations = Counter(r.get_type().localpart for r in graph.get_records(ProvRelation))
@@ -92,14 +125,11 @@ def graph_factory(records: Optional[Sequence[ProvRecord]] = None) -> ProvDocumen
return graph
-def combine(graphs: Iterable[ProvDocument]) -> ProvDocument:
- log.info(f"combine graphs {graphs}")
- try:
- acc = next(graphs)
- except StopIteration:
- return graph_factory()
- for graph in graphs:
- acc.update(graph)
+def combine(*documents: ProvDocument) -> ProvDocument:
+ log.info(f"combine {documents=}")
+ acc = documents[0] if documents else graph_factory()  # no input yields an empty document
+ for document in documents[1:]:
+ acc.update(document)
return dedupe(acc)
@@ -142,7 +172,7 @@ def read(fp: Path) -> dict[str, list[str]]:
data = f.read()
d = json.loads(data)
if not d:
- log.info(f"empty agent mapping")
+ log.info("empty agent mapping")
return dict()
return d
@@ -209,21 +239,6 @@ def get_attribute(record: ProvRecord, attribute: str, first: bool = True) -> str
return choices[0] if first else choices
-def pseudonymize_agent(
- agent: ProvAgent,
- identifier: QualifiedName,
- keep: list[QualifiedName],
- replace: dict[str, Any],
-) -> ProvAgent:
- kept = [(key, val) for key, val in agent.extra_attributes if key in keep]
- replaced = [
- (key, replace.get(key.localpart, val))
- for key, val in agent.extra_attributes
- if key.localpart in replace
- ]
- return ProvAgent(agent.bundle, identifier, kept + replaced)
-
-
def pseudonymize(graph: ProvDocument) -> ProvDocument:
log.info(f"pseudonymize agents in {graph=}")
@@ -266,3 +281,62 @@ def pseudonymize(graph: ProvDocument) -> ProvDocument:
records.append(r_type(relation.bundle, relation.identifier, formal + extra))
return graph_factory(records)
+
+
+def generate_pseudonym(name: str, email: str | None = None) -> QualifiedName:
+ """Generate pseudonym using hashed name and email."""
+ name_hash = hashlib.sha256(bytes(name, "utf-8")).hexdigest()
+ email_hash = hashlib.sha256(bytes(email, "utf-8")).hexdigest() if email else None
+ return qualified_name(f"User?name={name_hash}&email={email_hash}")
+
+
+def pseudonymize_agent(agent: ProvAgent) -> tuple[ProvAgent, QualifiedName, QualifiedName]:
+ """Return the pseudonymized agent, its original identifier, and the pseudonym."""
+ name = get_attribute(agent, USERNAME)
+ mail = get_attribute(agent, USEREMAIL)
+
+ if name is None:
+ raise ValueError("ProvAgent representing a user has to have a name!")
+
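+ pseudonym = generate_pseudonym(name, mail)
+ name_hash = hashlib.sha256(bytes(name, "utf-8")).hexdigest()
+ mail_hash = hashlib.sha256(bytes(mail, "utf-8")).hexdigest() if mail else None
+ replace = {USERNAME: name_hash, USEREMAIL: mail_hash}  # hashes, not clear text
+
+ kept = [(key, val) for key, val in agent.extra_attributes if key in (PROV_ROLE, PROV_TYPE)]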
+ replaced = [
+ (key, replace.get(key.localpart, val))
+ for key, val in agent.extra_attributes
+ if key.localpart in replace
+ ]
+
+ pseudonymized_agent = ProvAgent(agent.bundle, pseudonym, kept + replaced)
+
+ return pseudonymized_agent, agent.identifier, pseudonym
+
+
+def pseudonymize_relation(relation: ProvRelation, pseudonyms: dict) -> ProvRelation:
+ """Replace relation identifiers with pseudonyms."""
+ formal = [(key, pseudonyms.get(val, val)) for key, val in relation.formal_attributes]
+ extra = [(key, pseudonyms.get(val, val)) for key, val in relation.extra_attributes]
+ r_type = PROV_REC_CLS.get(relation.get_type())
+ return r_type(relation.bundle, relation.identifier, formal + extra)
+
+
+def pseudonymize(graph: ProvDocument) -> ProvDocument:
+ """Pseudonymize agents in a ProvDocument."""
+ log.info(f"Pseudonymize agents in {graph=}")
+
+ records = list(graph.get_records((ProvActivity, ProvEntity)))
+ pseudonyms = dict()
+
+ for agent in graph.get_records(ProvAgent):
+ pseudonymized_agent, original_id, pseudonym = pseudonymize_agent(agent)
+ pseudonyms[original_id] = pseudonym
+ records.append(pseudonymized_agent)
+
+ for relation in graph.get_records(ProvRelation):
+ pseudonymized_relation = pseudonymize_relation(relation, pseudonyms)
+ records.append(pseudonymized_relation)
+
+ return graph_factory(records)
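
Review note: the pseudonym scheme added above hashes the name and the optional email with SHA-256 and then remaps identifiers through a lookup table. It can be tried in isolation with the standard library. The helper names `sha256_hex`, `pseudonym_localpart`, and `remap` below are hypothetical, and plain strings stand in for `QualifiedName` objects. Treat it as a sketch of the idea under those assumptions, not as the project's implementation.

```python
import hashlib
from typing import Optional


def sha256_hex(value: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def pseudonym_localpart(name: str, email: Optional[str] = None) -> str:
    """Build a 'User?name=...&email=...' local part from hashed values."""
    # A missing email is rendered as the literal None, as in the diff.
    email_hash = sha256_hex(email) if email else None
    return f"User?name={sha256_hex(name)}&email={email_hash}"


def remap(identifiers: list, pseudonyms: dict) -> list:
    """Replace identifiers that have a pseudonym and keep all others."""
    return [pseudonyms.get(identifier, identifier) for identifier in identifiers]


original = "User?name=alice&email=alice@example.com"
pseudonyms = {original: pseudonym_localpart("alice", "alice@example.com")}
remapped = remap([original, "Commit?sha=123"], pseudonyms)
```

Hashing is deterministic, so the same name and email always map to the same pseudonym. This is what allows relations to be rewritten consistently after the agents have been replaced.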
diff --git a/gitlab2prov/service_layer/handlers.py b/gitlab2prov/service_layer/handlers.py
index 9e6535e..b17fb0b 100644
--- a/gitlab2prov/service_layer/handlers.py
+++ b/gitlab2prov/service_layer/handlers.py
@@ -9,42 +9,77 @@
def fetch_git(cmd: commands.Fetch, uow, git_fetcher) -> None:
- with git_fetcher(cmd.url, cmd.token) as fetcher:
- fetcher.do_clone()
+ log.info(f"fetch {cmd=}")
+ with git_fetcher as fetcher:
+ fetcher.do_clone(cmd.url, cmd.token)
with uow:
- for resource in fetcher.fetch_git():
+ for resource in fetcher.fetch_all():
log.info(f"add {resource=}")
- uow.resources.add(resource)
+ uow.resources[cmd.url].add(resource)
uow.commit()
-def fetch_gitlab(cmd: commands.Fetch, uow, gitlab_fetcher) -> None:
- fetcher = gitlab_fetcher(cmd.url, cmd.token)
- fetcher.do_login()
+def fetch_githosted(cmd: commands.Fetch, uow, githosted_fetcher) -> None:
+ log.info(f"fetch {cmd=}")
+ fetcher = githosted_fetcher(cmd.token, cmd.url)
with uow:
- for resource in fetcher.fetch_gitlab():
+ for resource in fetcher.fetch_all():
log.info(f"add {resource=}")
- uow.resources.add(resource)
+ uow.resources[cmd.url].add(resource)
uow.commit()
-def reset(cmd: commands.Reset, uow):
- log.info(f"reset repository {uow.resources=}")
- uow.reset()
-
-
def serialize(cmd: commands.Serialize, uow) -> ProvDocument:
log.info(f"serialize graph consisting of {model.MODELS=}")
- graph = operations.combine(prov_model(uow.resources) for prov_model in model.MODELS)
- graph = operations.dedupe(graph)
- return graph
+ document = ProvDocument()
+ for prov_model in model.MODELS:
+ log.info(f"populate {prov_model=}")
+ provenance = prov_model(uow.resources[cmd.url])
+ document = operations.combine(document, provenance)
+ document = operations.dedupe(document)
+ return document
+
+
+def transform(cmd: commands.Transform):
+ log.info(f"transform {cmd=}")
+ if cmd.remove_duplicates:
+ cmd.document = operations.dedupe(cmd.document)
+ if cmd.use_pseudonyms:
+ cmd.document = operations.pseudonymize(cmd.document)
+ if cmd.merge_aliased_agents:
+ cmd.document = operations.merge_duplicated_agents(cmd.document, cmd.merge_aliased_agents)
+ return cmd.document
+
+
+def combine(cmd: commands.Combine):
+ log.info(f"combine {cmd=}")
+ return operations.combine(*cmd.documents)
+
+
+def write_file(cmd: commands.Write):
+ log.info(f"write {cmd=}")
+ return operations.write_provenance_file(cmd.document, cmd.filename, cmd.format)
+
+
+def read_file(cmd: commands.Read):
+ log.info(f"read {cmd=}")
+ return operations.read_provenance_file(cmd.filename)
+
+
+def statistics(cmd: commands.Statistics):
+ log.info(f"statistics {cmd=}")
+ return operations.stats(cmd.document, cmd.resolution, cmd.format)
HANDLERS = {
commands.Fetch: [
fetch_git,
- fetch_gitlab,
+ fetch_githosted,
],
- commands.Reset: [reset],
commands.Serialize: [serialize],
+ commands.Read: [read_file],
+ commands.Write: [write_file],
+ commands.Combine: [combine],
+ commands.Transform: [transform],
+ commands.Statistics: [statistics],
}
diff --git a/gitlab2prov/service_layer/messagebus.py b/gitlab2prov/service_layer/messagebus.py
index 3afc567..077a722 100644
--- a/gitlab2prov/service_layer/messagebus.py
+++ b/gitlab2prov/service_layer/messagebus.py
@@ -5,7 +5,7 @@
from prov.model import ProvDocument
from gitlab2prov.domain.commands import Command
-from gitlab2prov.service_layer.unit_of_work import AbstractUnitOfWork
+from gitlab2prov.service_layer.unit_of_work import UnitOfWork
logger = logging.getLogger(__name__)
@@ -13,11 +13,10 @@
@dataclass
class MessageBus:
- uow: AbstractUnitOfWork
+ uow: UnitOfWork
handlers: dict[type[Command], list[Callable]]
def handle(self, command: Command) -> ProvDocument | None:
- # TODO: Return more than the last result...
for handler in self.handlers[type(command)]:
try:
logger.debug(f"Handling command {command}.")
diff --git a/gitlab2prov/service_layer/unit_of_work.py b/gitlab2prov/service_layer/unit_of_work.py
index 46c208d..01f8ab0 100644
--- a/gitlab2prov/service_layer/unit_of_work.py
+++ b/gitlab2prov/service_layer/unit_of_work.py
@@ -1,12 +1,13 @@
from __future__ import annotations
import abc
+from collections import defaultdict
from gitlab2prov.adapters import repository
-class AbstractUnitOfWork(abc.ABC):
- def __enter__(self) -> AbstractUnitOfWork:
+class UnitOfWork(abc.ABC):
+ def __enter__(self) -> UnitOfWork:
return self
def __exit__(self, *args):
@@ -31,9 +32,10 @@ def rollback(self):
raise NotImplementedError
-class InMemoryUnitOfWork(AbstractUnitOfWork):
+class InMemoryUnitOfWork(UnitOfWork):
def __init__(self):
- self.resources = repository.InMemoryRepository()
+ # One repository per project URL, created lazily on first access.
+ self.resources = defaultdict(repository.InMemoryRepository)
def __enter__(self):
return super().__enter__()
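
Review note: switching `resources` to a `defaultdict` gives every project URL its own repository, created on first access. The sketch below shows that behaviour with a hypothetical stand-in class. `ListRepository` and the example URLs are assumptions for illustration and are not taken from the package.

```python
from collections import defaultdict


class ListRepository:
    """Hypothetical stand-in for InMemoryRepository: stores resources in a list."""

    def __init__(self):
        self.items = []

    def add(self, resource):
        self.items.append(resource)


# The factory is called with no arguments whenever a missing key is accessed.
resources = defaultdict(ListRepository)

resources["https://gitlab.com/group/project"].add("commit-1")
resources["https://gitlab.com/group/project"].add("issue-7")
resources["https://github.com/owner/repo"].add("commit-9")
```

Note that merely reading a missing key, for example in a handler that serializes an unknown URL, also creates an empty repository for it rather than raising `KeyError`.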
diff --git a/pyproject.toml b/pyproject.toml
index b18f3c1..2dc42cd 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -22,6 +22,7 @@ dependencies = [
"jsonschema", # MIT License
"ruamel.yaml", # MIt License
"pydot>=1.2.0", # MIT License
+ "PyGithub", # LGPL-3.0 License
"click", # BSD 3-Clause License
]
keywords = [
@@ -56,7 +57,8 @@ dev = [
]
[project.scripts]
-gitlab2prov = "gitlab2prov.entrypoints.cli:cli"
+gitlab2prov = "gitlab2prov.entrypoints.cli:gitlab2prov"
+github2prov = "gitlab2prov.entrypoints.cli:github2prov"
[project.urls]
Twitter = "https://twitter.com/dlr_software"
@@ -69,6 +71,9 @@ version = { attr = "gitlab2prov.__version__" }
[tool.setuptools.packages.find]
exclude = ["tests*", "docs*"]
+[tool.setuptools.package-data]
+"gitlab2prov.config" = ["schema.json"]
+
[tool.isort]
profile = "black"
py_version = 310
diff --git a/tests/conftest.py b/tests/conftest.py
new file mode 100644
index 0000000..c130ca3
--- /dev/null
+++ b/tests/conftest.py
@@ -0,0 +1,36 @@
+import random
+import string
+import pytest
+
+from gitlab2prov.domain.objects import User
+
+
+def generate_random_user():
+ name = "".join(random.choice(string.ascii_letters) for _ in range(6))
+ email = f"{name}@example.com"
+ gitlab_username = "".join(random.choice(string.ascii_lowercase) for _ in range(6))
+ github_username = "".join(random.choice(string.ascii_lowercase) for _ in range(6))
+ gitlab_id = str(random.randint(1000, 9999))
+ github_id = str(random.randint(1000, 9999))
+ prov_role = random.choice(["admin", "user", "guest", None])
+ return User(
+ name=name,
+ email=email,
+ gitlab_username=gitlab_username,
+ github_username=github_username,
+ gitlab_id=gitlab_id,
+ github_id=github_id,
+ prov_role=prov_role,
+ )
+
+
+@pytest.fixture
+def random_user() -> User:
+ return generate_random_user()
+
+
+@pytest.fixture
+def n_random_users(request) -> list[User]:
+ marker = request.node.get_closest_marker("fixt_data")
+ n = 10 if marker is None else marker.args[0]
+ return [generate_random_user() for _ in range(n)]
diff --git a/tests/integration/test_repository.py b/tests/integration/test_repository.py
deleted file mode 100644
index c82e221..0000000
--- a/tests/integration/test_repository.py
+++ /dev/null
@@ -1,38 +0,0 @@
-from datetime import datetime, timedelta
-
-from gitlab2prov.adapters import repository
-from gitlab2prov.domain import objects
-
-
-today = datetime.now()
-tomorrow = today + timedelta(days=1)
-yesterday = today - timedelta(days=1)
-
-
-class TestInMemoryRepository:
- def test_get(self):
- repo = repository.InMemoryRepository()
- u1 = objects.User(name="u1", email="e1", prov_role="r1")
- u2 = objects.User(name="u2", email="e2", prov_role="r2")
- repo.add(u1)
- repo.add(u2)
- assert repo.get(objects.User, name="u1") == u1
- assert repo.get(objects.User, name="u2") == u2
-
- def test_get_returns_none_if_repository_is_empty(self):
- repo = repository.InMemoryRepository()
- assert repo.get(objects.User, name="name") == None
-
- def test_list_all(self):
- repo = repository.InMemoryRepository()
- u1 = objects.User(name="u1", email="e1", prov_role="r1")
- u2 = objects.User(name="u2", email="e2", prov_role="r1")
- repo.add(u1)
- repo.add(u2)
- assert repo.list_all(objects.User, name="u1") == [u1]
- assert repo.list_all(objects.User, name="u2") == [u2]
- assert repo.list_all(objects.User, prov_role="r1") == [u1, u2]
-
- def test_list_all_returns_empty_list_if_repository_is_empty(self):
- repo = repository.InMemoryRepository()
- assert repo.list_all(objects.User, name="name") == []
diff --git a/tests/random_refs.py b/tests/random_refs.py
deleted file mode 100644
index 173759f..0000000
--- a/tests/random_refs.py
+++ /dev/null
@@ -1,17 +0,0 @@
-import uuid
-from gitlab2prov.domain import objects
-from gitlab2prov.domain.constants import ProvRole
-
-
-def random_suffix():
- return uuid.uuid4().hex[:6]
-
-
-def random_user():
- return objects.User(
- name=f"user-name-{random_suffix()}",
- email=f"user-email-{random_suffix()}",
- gitlab_username=f"gitlab-user-name-{random_suffix()}",
- gitlab_id=f"gitlab-user-id-{random_suffix()}",
- prov_role=ProvRole.AUTHOR,
- )
diff --git a/tests/unit/objects/test_file.py b/tests/unit/objects/test_file.py
new file mode 100644
index 0000000..ef3cca3
--- /dev/null
+++ b/tests/unit/objects/test_file.py
@@ -0,0 +1,26 @@
+from gitlab2prov.domain.objects import File
+
+
+class TestFile:
+ def test_file_creation(self):
+ # Test File object creation
+ file_obj = File(name="test_file.txt", path="/path/to/file", commit="12345")
+ assert file_obj.name == "test_file.txt"
+ assert file_obj.path == "/path/to/file"
+ assert file_obj.commit == "12345"
+
+ def test_identifier_property(self):
+ # Test identifier property
+ file_obj = File(name="test_file.txt", path="/path/to/file", commit="12345")
+ assert (
+ file_obj.identifier.localpart
+ == "File?name=test_file.txt&path=/path/to/file&commit=12345"
+ )
+
+ def test_to_prov_element_method(self):
+ # Test to_prov_element() method
+ file_obj = File(name="test_file.txt", path="/path/to/file", commit="12345")
+ prov_entity = file_obj.to_prov_element()
+ assert prov_entity.get_attribute("name") == "test_file.txt"
+ assert prov_entity.get_attribute("path") == "/path/to/file"
+ assert prov_entity.get_attribute("commit") == "12345"
diff --git a/tests/unit/objects/test_user.py b/tests/unit/objects/test_user.py
new file mode 100644
index 0000000..0d835b0
--- /dev/null
+++ b/tests/unit/objects/test_user.py
@@ -0,0 +1,83 @@
+from gitlab2prov.domain.objects import User
+from gitlab2prov.domain.constants import ProvType
+
+
+class TestUser:
+ # Test cases for User class
+ def test_user_creation(self):
+ # Test User creation with valid inputs
+ user = User(name="John Doe", email="johndoe@example.com")
+ assert user.name == "John Doe"
+ assert user.email == "johndoe@example.com"
+ assert user.gitlab_username is None
+ assert user.github_username is None
+ assert user.gitlab_id is None
+ assert user.github_id is None
+ assert user.prov_role is None
+
+ # Test User creation with optional parameters
+ user = User(
+ name="Jane Smith",
+ email="janesmith@example.com",
+ gitlab_username="janesmith",
+ github_username="janesmith",
+ gitlab_id="123",
+ github_id="456",
+ prov_role="developer",
+ )
+ assert user.name == "Jane Smith"
+ assert user.email == "janesmith@example.com"
+ assert user.gitlab_username == "janesmith"
+ assert user.github_username == "janesmith"
+ assert user.gitlab_id == "123"
+ assert user.github_id == "456"
+ assert user.prov_role == "developer"
+
+ def test_user_post_init(self):
+ # Test __post_init__() method with lowercase email
+ user = User(name="John Doe", email="JohnDoe@example.com")
+ assert user.email == "johndoe@example.com"
+
+ # Test __post_init__() method with None email
+ user = User(name="Jane Smith", email=None)
+ assert user.email is None
+
+ def test_user_identifier(self):
+ # Test identifier property
+ user = User(name="John Doe", email="johndoe@example.com")
+ assert user.identifier.localpart == "User?name=John Doe&email=johndoe@example.com"
+
+ def test_user_to_prov_element(self):
+ # Test to_prov_element() method with minimum attributes
+ user = User(name="John Doe", email="johndoe@example.com")
+ prov_element = user.to_prov_element()
+ assert prov_element.identifier == "User?name=John Doe&email=johndoe@example.com"
+ assert prov_element.attributes == [
+ ("name", "John Doe"),
+ ("email", "johndoe@example.com"),
+ ("prov_role", None),
+ ("prov_type", ProvType.USER),
+ ]
+
+ # Test to_prov_element() method with all attributes
+ user = User(
+ name="Jane Smith",
+ email="janesmith@example.com",
+ gitlab_username="janesmith",
+ github_username="janesmith",
+ gitlab_id="123",
+ github_id="456",
+ prov_role="developer",
+ )
+ prov_element = user.to_prov_element()
+ assert prov_element.identifier == "User?name=Jane Smith&email=janesmith@example.com"
+ assert prov_element.attributes == [
+ ("name", "Jane Smith"),
+ ("email", "janesmith@example.com"),
+ ("gitlab_username", "janesmith"),
+ ("github_username", "janesmith"),
+ ("gitlab_id", "123"),
+ ("github_id", "456"),
+ ("prov_role", "developer"),
+ ("prov_type", ProvType.USER),
+ ]
diff --git a/tests/unit/test_annotation_parsing.py b/tests/unit/test_annotation_parsing.py
deleted file mode 100644
index 3e8a841..0000000
--- a/tests/unit/test_annotation_parsing.py
+++ /dev/null
@@ -1,66 +0,0 @@
-from gitlab2prov.adapters.fetch.annotations import CLASSIFIERS
-from gitlab2prov.adapters.fetch.annotations.parse import classify_system_note
-from gitlab2prov.adapters.fetch.annotations.parse import longest_matching_classifier
-from gitlab2prov.adapters.fetch.annotations.parse import normalize
-
-
-class TestNormalize:
- def test_removes_trailing_whitespace(self):
- string = " test "
- assert not normalize(string).startswith(" ")
- assert not normalize(string).endswith(" ")
-
- def test_lowercase(self):
- string = "TEST"
- assert normalize(string).islower()
-
-
-class TestLongestMatchingClassifier:
- def test_returns_classifier_with_the_longest_match(self):
- string = "changed epic to slug&123"
- assert longest_matching_classifier(string) is CLASSIFIERS[1]
- assert longest_matching_classifier(string).name == "change_epic"
- string = "close via merge request slug!123"
- assert longest_matching_classifier(string) is CLASSIFIERS[7]
- assert longest_matching_classifier(string).name == "close_by_external_merge_request"
- string = "enabled automatic add to merge train when the pipeline for 12345abcde succeeds"
- assert longest_matching_classifier(string) is CLASSIFIERS[-1]
- assert longest_matching_classifier(string).name == "enable_automatic_add_to_merge_train"
-
- def test_returns_none_if_no_match_was_found(self):
- string = "NOT_MATCHABLE"
- assert longest_matching_classifier(string) is None
-
-
-class TestClassifySystemNote:
- def test_returns_import_statement_capture_groups(self):
- expected_captures = {"pre_import_author": "original-author"}
- string = "*by original-author on 1970-01-01T00:00:00 (imported from gitlab project)*"
- assert classify_system_note(string)[1] == expected_captures
- string = "*by original-author on 1970-01-01 00:00:00 UTC (imported from gitlab project)*"
- assert classify_system_note(string)[1] == expected_captures
-
- def test_returns_annotation_classifier_capture_groups(self):
- string = "assigned to @developer"
- expected_captures = {"user_name": "developer"}
- assert classify_system_note(string)[1] == expected_captures
-
- def test_returns_combined_capture_groups_of_the_import_statement_and_the_classifier(
- self,
- ):
- string = "assigned to @developer *by original-author on 1970-01-01T00:00:00 (imported from gitlab project)*"
- expected_captures = {
- "user_name": "developer",
- "pre_import_author": "original-author",
- }
- assert classify_system_note(string)[1] == expected_captures
-
- def test_returns_classifier_name_for_known_string(self):
- string = "assigned to @developer"
- expected_name = "assign_user"
- assert classify_system_note(string)[0] == expected_name
-
- def test_returns_default_annotation_for_unknown_string(self):
- string = "UNKNOWN"
- expected_name = "default_annotation"
- assert classify_system_note(string)[0] == expected_name
diff --git a/tests/unit/test_classifiers.py b/tests/unit/test_classifiers.py
deleted file mode 100644
index cdf7dff..0000000
--- a/tests/unit/test_classifiers.py
+++ /dev/null
@@ -1,87 +0,0 @@
-import random
-import re
-import string
-
-import pytest
-
-from gitlab2prov.adapters.fetch.annotations.classifiers import Classifier
-from gitlab2prov.adapters.fetch.annotations.classifiers import ImportStatement
-from gitlab2prov.adapters.fetch.annotations.classifiers import match_length
-
-
-class TestMatchLength:
- def test_raises_value(self):
- with pytest.raises(TypeError):
- match_length(None)
-
- def test_match_length_with_n_length_matches(self):
- for idx in range(1, 1000):
- pattern = r"\d{%d}" % idx
- s = "".join(random.choices(string.digits, k=idx))
- match = re.search(pattern, s)
- assert match_length(match) == idx
-
-
-class TestClassifier:
- def test_longest_matching_classifier_wins_selection(self):
- classifiers = [
- Classifier(patterns=[r"\d{1}"]),
- Classifier(patterns=[r"\d{2}"]),
- Classifier(patterns=[r"\d{3}"]),
- ]
- for classifier in classifiers:
- classifier.matches(string.digits)
- assert max(classifiers, key=len) == classifiers[-1]
-
- def test_matches_should_return_true_if_any_pattern_matches(self):
- classifier = Classifier(patterns=[r"\d", r"\s"])
- assert classifier.matches(string.digits) == True
-
- def test_matches_should_return_false_if_no_pattern_matches(self):
- c = Classifier(patterns=[r"\d", r"\s"])
- assert c.matches(string.ascii_letters) == False
-
- def test_matches_should_store_the_longest_match_in_the_class_attributes(self):
- regexes = [r"\d{1}", r"\d{2}", r"\d{3}"]
- classifier = Classifier(patterns=regexes)
- classifier.matches(string.digits)
- assert classifier.match.re.pattern == regexes[-1]
-
- def test_groupdict_should_return_empty_dict_if_no_pattern_matches(self):
- classifier = Classifier(patterns=[r"\d"])
- classifier.matches(string.ascii_letters)
- assert classifier.groupdict() == dict()
-
- def test_groupdict_should_return_captured_groups_if_a_pattern_matches(self):
- classifier = Classifier(patterns=[r"(?P<number>\d)"])
- classifier.matches(string.digits)
- assert classifier.groupdict() == {"number": string.digits[0]}
-
- def test_length_should_be_0_if_no_match_was_found(self):
- classifier = Classifier(patterns=[r"\d"])
- classifier.matches(string.ascii_letters)
- assert len(classifier) == 0
-
- def test_length_should_be_the_span_of_the_found_match(self):
- classifier = Classifier(patterns=[r"\d"])
- classifier.matches(string.digits)
- assert len(classifier) == 1
-
-
-class TestImportStatement:
- def test_replace_returns_unchanged_string_if_no_match_was_found(self):
- imp = ImportStatement(patterns=[r"\d{3}"])
- imp.matches(string.ascii_letters)
- assert imp.replace(string.ascii_letters) == string.ascii_letters
-
- def test_import_statement_removes_only_the_leftmost_occurence(self):
- imp = ImportStatement(patterns=[r"\d{3}"])
- imp.matches(string.digits)
- assert imp.replace(string.digits) == string.digits[3:]
-
- def test_removes_trailing_whitespace_after_import_pattern_replacement(self):
- imp = ImportStatement(patterns=[r"\d{3}"])
- s = f"{string.whitespace}{string.digits}{string.whitespace}"
- imp.matches(s)
- assert not imp.replace(s).endswith(" ")
- assert not imp.replace(s).startswith(" ")
diff --git a/tests/unit/test_fetch_utils.py b/tests/unit/test_fetch_utils.py
deleted file mode 100644
index 5cdb6f7..0000000
--- a/tests/unit/test_fetch_utils.py
+++ /dev/null
@@ -1,17 +0,0 @@
-from gitlab2prov.adapters.fetch import utils
-
-
-class TestHelpers:
- def test_project_slug(self):
- expected_slug = "group/project"
- assert expected_slug == utils.project_slug("https://gitlab.com/group/project")
-
- def test_gitlab_url(self):
- expected_url = "https://gitlab.com"
- assert expected_url == utils.gitlab_url("https://gitlab.com/group/project")
-
- def test_clone_over_https_url(self):
- expected_url = "https://gitlab.com:TOKEN@gitlab.com/group/project"
- assert expected_url == utils.clone_over_https_url(
- "https://gitlab.com/group/project", "TOKEN"
- )
diff --git a/tests/unit/test_handlers.py b/tests/unit/test_handlers.py
deleted file mode 100644
index 14224f5..0000000
--- a/tests/unit/test_handlers.py
+++ /dev/null
@@ -1,94 +0,0 @@
-from typing import TypeVar, Type, Optional
-
-from gitlab2prov import bootstrap
-from gitlab2prov.adapters import repository
-from gitlab2prov.service_layer import unit_of_work
-
-
-R = TypeVar("R")
-
-
-class FakeRepository(repository.AbstractRepository):
- def __init__(self, resources: R):
- self._resources = set(resources)
-
- def _add(self, resource: R):
- self._resources.add(resource)
-
- def _get(self, resource_type: Type[R], **filters) -> Optional[R]:
- return next(
- (
- r
- for r in self._resources
- if all(getattr(r, key) == val for key, val in filters.items())
- )
- )
-
- def _list_all(self, resource_type: Type[R], **filters) -> list[R]:
- return [
- r
- for r in self._resources
- if all(getattr(r, key) == val for key, val in filters.items())
- ]
-
-
-class FakeUnitOfWork(unit_of_work.AbstractUnitOfWork):
- def __init__(self):
- self.resources = FakeRepository([])
- self.committed = False
-
- def _commit(self):
- self.committed = True
-
- def rollback(self):
- pass
-
-
-def FakeGitFetcher(resources):
- class FakeGitRepositoryMiner:
- def __init__(self, url, token):
- self.resources = resources
-
- def __enter__(self):
- return self
-
- def __exit__(self, exc_type, exc_val, exc_tb):
- pass
-
- def do_clone(self):
- pass
-
- def fetch_git(self):
- return iter(self.resources)
-
- return FakeGitRepositoryMiner
-
-
-def FakeGitlabFetcher(resources):
- class FakeGitlabFetcher:
- def __init__(self, url, token):
- self.resources = resources
-
- def do_login(self):
- pass
-
- def fetch_gitlab(self):
- return iter(self.resources)
-
- return FakeGitlabFetcher
-
-
-def bootstrap_test_app(git_resources=None, gitlab_resources=None):
- if git_resources is None:
- git_resources = []
- if gitlab_resources is None:
- gitlab_resources = []
- return bootstrap.bootstrap(
- uow=FakeUnitOfWork(),
- git_fetcher=FakeGitFetcher(git_resources),
- gitlab_fetcher=FakeGitlabFetcher(gitlab_resources),
- )
-
-
-class TestHandlers:
- pass
diff --git a/tests/unit/test_objects.py b/tests/unit/test_objects.py
deleted file mode 100644
index 204dcbf..0000000
--- a/tests/unit/test_objects.py
+++ /dev/null
@@ -1,869 +0,0 @@
-from datetime import datetime, timedelta
-from urllib.parse import urlencode
-
-from prov.model import (
- PROV_TYPE,
- PROV_ROLE,
- PROV_ATTR_STARTTIME,
- PROV_ATTR_ENDTIME,
- PROV_LABEL,
-)
-
-from gitlab2prov.domain import objects
-from gitlab2prov.domain.constants import ProvType, ProvRole
-from gitlab2prov.prov.operations import qualified_name
-
-from tests.random_refs import random_suffix
-
-
-today = datetime.now()
-yesterday = today - timedelta(days=1)
-next_week = today + timedelta(days=7)
-tomorrow = today + timedelta(days=1)
-
-
-class TestUser:
- def test_identifier(self):
- name = f"user-name-{random_suffix()}"
- email = f"user-email-{random_suffix()}"
- username = f"user-username-{random_suffix()}"
- id = f"user-id-{random_suffix()}"
- role = ProvRole.AUTHOR
- user = objects.User(
- name=name,
- email=email,
- gitlab_username=username,
- gitlab_id=id,
- prov_role=role,
- )
- expected_identifier = qualified_name(
- f"User?{urlencode([('name', name), ('email', email)])}"
- )
- assert user.prov_identifier == expected_identifier
-
- def test_attributes(self):
- name = f"user-name-{random_suffix()}"
- email = f"user-email-{random_suffix()}"
- username = f"user-username-{random_suffix()}"
- id = f"user-id-{random_suffix()}"
- role = f"user-prov-role-{random_suffix()}"
- role = ProvRole.AUTHOR
- user = objects.User(
- name=name,
- email=email,
- gitlab_username=username,
- gitlab_id=id,
- prov_role=role,
- )
- expected_attributes = [
- ("name", name),
- ("email", email),
- ("gitlab_username", username),
- ("gitlab_id", id),
- (PROV_ROLE, role),
- (PROV_TYPE, ProvType.USER),
- (PROV_LABEL, user.prov_label),
- ]
- assert user.prov_attributes == expected_attributes
-
- def test_email_normalization(self):
- name = f"user-name-{random_suffix()}"
- role = f"user-prov-role-{random_suffix()}"
- uppercase = f"user-email-{random_suffix()}".upper()
- user = objects.User(name=name, email=uppercase, prov_role=role)
- assert user.email.islower()
-
-
-class TestFile:
- def test_identifier(self):
- path = f"file-path-{random_suffix()}"
- hexsha = f"commit-hash-{random_suffix()}"
- f = objects.File(path=path, committed_in=hexsha)
- expected_identifier = qualified_name(
- f"File?{urlencode([('path', path), ('committed_in', hexsha)])}"
- )
- assert f.prov_identifier == expected_identifier
-
- def test_attributes(self):
- path = f"file-path-{random_suffix()}"
- hexsha = f"commit-hash-{random_suffix()}"
- f = objects.File(path=path, committed_in=hexsha)
- expected_attributes = [
- ("path", path),
- ("committed_in", hexsha),
- (PROV_TYPE, ProvType.FILE),
- (PROV_LABEL, f.prov_label),
- ]
- assert f.prov_attributes == expected_attributes
-
-
-class TestFileRevision:
- def test_identifier(self):
- path = f"file-path-{random_suffix()}"
- hexsha = f"commit-hash-{random_suffix()}"
- change_type = f"change-type-{random_suffix()}"
- file_revision = objects.FileRevision(
- path=path,
- committed_in=hexsha,
- change_type=change_type,
- original=None,
- previous=None,
- )
- expected_identifier = qualified_name(
- f"FileRevision?{urlencode([('path', path), ('committed_in', hexsha), ('change_type', change_type)])}"
- )
- assert file_revision.prov_identifier == expected_identifier
-
- def test_attributes(self):
- path = f"file-path-{random_suffix()}"
- hexsha = f"commit-hash-{random_suffix()}"
- change_type = f"change-type-{random_suffix()}"
- file_revision = objects.FileRevision(
- path=path,
- committed_in=hexsha,
- change_type=change_type,
- original=None,
- previous=None,
- )
- expected_attributes = [
- ("path", path),
- ("committed_in", hexsha),
- ("change_type", change_type),
- (PROV_TYPE, ProvType.FILE_REVISION),
- (PROV_LABEL, file_revision.prov_label),
- ]
- assert file_revision.prov_attributes == expected_attributes
-
-
-class TestGitCommit:
- def test_identifier(self):
- hexsha = f"commit-hash-{random_suffix()}"
- msg = f"commit-message-{random_suffix()}"
- title = f"commit-title-{random_suffix()}"
- commit = objects.GitCommit(
- hexsha=hexsha,
- message=msg,
- title=title,
- author=None,
- committer=None,
- parents=[],
- prov_start=today,
- prov_end=tomorrow,
- )
- expected_identifier = qualified_name(f"GitCommit?{urlencode([('hexsha', hexsha)])}")
- assert commit.prov_identifier == expected_identifier
-
- def test_attributes(self):
- hexsha = f"commit-hash-{random_suffix()}"
- msg = f"commit-message-{random_suffix()}"
- title = f"commit-title-{random_suffix()}"
- commit = objects.GitCommit(
- hexsha=hexsha,
- message=msg,
- title=title,
- author=None,
- committer=None,
- parents=[],
- prov_start=today,
- prov_end=tomorrow,
- )
- expected_attributes = [
- ("hexsha", hexsha),
- ("message", msg),
- ("title", title),
- (PROV_ATTR_STARTTIME, today),
- (PROV_ATTR_ENDTIME, tomorrow),
- (PROV_TYPE, ProvType.GIT_COMMIT),
- (PROV_LABEL, commit.prov_label),
- ]
- assert commit.prov_attributes == expected_attributes
-
-
-class TestAsset:
- def test_identifier(self):
- url = f"asset-url-{random_suffix()}"
- fmt = f"asset-format-{random_suffix()}"
- asset = objects.Asset(url=url, format=fmt)
- expected_identifier = qualified_name(f"Asset?{urlencode([('url', url), ('format', fmt)])}")
- assert asset.prov_identifier == expected_identifier
-
- def test_attributes(self):
- url = f"asset-url-{random_suffix()}"
- fmt = f"asset-format-{random_suffix()}"
- asset = objects.Asset(url=url, format=fmt)
- expected_attributes = [
- ("url", url),
- ("format", fmt),
- (PROV_TYPE, ProvType.ASSET),
- (PROV_LABEL, asset.prov_label),
- ]
- assert asset.prov_attributes == expected_attributes
-
-
-class TestEvidence:
- def test_identifier(self):
- sha = f"evidence-sha-{random_suffix()}"
- url = f"evidence-url-{random_suffix()}"
- evidence = objects.Evidence(hexsha=sha, url=url, collected_at=today)
- expected_identifier = qualified_name(
- f"Evidence?{urlencode([('hexsha', sha), ('url', url), ('collected_at', today)])}"
- )
- assert evidence.prov_identifier == expected_identifier
-
- def test_attributes(self):
- sha = f"evidence-sha-{random_suffix()}"
- url = f"evidence-url-{random_suffix()}"
- evidence = objects.Evidence(hexsha=sha, url=url, collected_at=today)
- expected_attributes = [
- ("hexsha", sha),
- ("url", url),
- ("collected_at", today),
- (PROV_TYPE, ProvType.EVIDENCE),
- (PROV_LABEL, evidence.prov_label),
- ]
- assert evidence.prov_attributes == expected_attributes
-
-
-class TestAnnotatedVersion:
- def test_identifier(self):
- vid = f"version-id-{random_suffix()}"
- aid = f"annotation-id-{random_suffix()}"
- annotated_version = objects.AnnotatedVersion(
- version_id=vid,
- annotation_id=aid,
- prov_type=ProvType.GITLAB_COMMIT_VERSION_ANNOTATED,
- )
- expected_identifier = qualified_name(
- f"{ProvType.GITLAB_COMMIT_VERSION_ANNOTATED}?{urlencode([('version_id', vid), ('annotation_id', aid)])}"
- )
- assert annotated_version.prov_identifier == expected_identifier
-
- def test_attributes(self):
- vid = f"version-id-{random_suffix()}"
- aid = f"annotation-id-{random_suffix()}"
- annotated_version = objects.AnnotatedVersion(
- version_id=vid, annotation_id=aid, prov_type="TestAnnotatedVersion"
- )
- expected_attributes = [
- ("version_id", vid),
- ("annotation_id", aid),
- (PROV_TYPE, "TestAnnotatedVersion"),
- (PROV_LABEL, annotated_version.prov_label),
- ]
- assert annotated_version.prov_attributes == expected_attributes
-
-
-class TestCreation:
- def test_identifier(self):
- id = f"creation-id-{random_suffix()}"
- creation = objects.Creation(
- creation_id=id,
- prov_start=today,
- prov_end=tomorrow,
- prov_type=ProvType.TAG_CREATION,
- )
- expected_identifier = qualified_name(
- f"{ProvType.TAG_CREATION}?{urlencode([('creation_id', id)])}"
- )
- assert creation.prov_identifier == expected_identifier
-
- def test_attributes(self):
- id = f"creation-id-{random_suffix()}"
- creation = objects.Creation(
- creation_id=id,
- prov_start=today,
- prov_end=tomorrow,
- prov_type=ProvType.TAG_CREATION,
- )
- expected_attributes = [
- ("creation_id", id),
- (PROV_ATTR_STARTTIME, today),
- (PROV_ATTR_ENDTIME, tomorrow),
- (PROV_TYPE, "TagCreation"),
- (PROV_LABEL, creation.prov_label),
- ]
- assert creation.prov_attributes == expected_attributes
-
-
-class TestAnnotation:
- def test_identifier(self):
- id = f"annotation-id-{random_suffix()}"
- type = f"annotation-type-{random_suffix()}"
- body = f"annotation-body-{random_suffix()}"
- annotation = objects.Annotation(
- id=id,
- type=type,
- body=body,
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- expected_identifier = qualified_name(
- f"Annotation?{urlencode([('id', id), ('type', type)])}"
- )
- assert annotation.prov_identifier == expected_identifier
-
- def test_attributes(self):
- id = f"annotation-id-{random_suffix()}"
- type = f"annotation-type-{random_suffix()}"
- body = f"annotation-body-{random_suffix()}"
- annotation = objects.Annotation(
- id=id,
- type=type,
- body=body,
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- expected_attributes = [
- ("id", id),
- ("type", type),
- ("body", body),
- (PROV_ATTR_STARTTIME, today),
- (PROV_ATTR_ENDTIME, tomorrow),
- (PROV_TYPE, ProvType.ANNOTATION),
- (PROV_LABEL, annotation.prov_label),
- ]
- assert annotation.prov_attributes == expected_attributes
-
- def test_kwargs(self):
- id = f"annotation-id-{random_suffix()}"
- type = f"annotation-type-{random_suffix()}"
- body = f"annotation-body-{random_suffix()}"
- kwargs = {"kwarg1": "value1", "kwarg2": "value2"}
- annotation = objects.Annotation(
- id=id,
- type=type,
- body=body,
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- kwargs=kwargs,
- )
- expected_attributes = [
- ("id", id),
- ("type", type),
- ("body", body),
- ("kwarg1", "value1"),
- ("kwarg2", "value2"),
- (PROV_ATTR_STARTTIME, today),
- (PROV_ATTR_ENDTIME, tomorrow),
- (PROV_TYPE, ProvType.ANNOTATION),
- (PROV_LABEL, annotation.prov_label),
- ]
- assert annotation.prov_attributes == expected_attributes
-
-
-class TestIssue:
- def test_identifier(self):
- id = f"issue-id-{random_suffix()}"
- iid = f"issue-iid-{random_suffix()}"
- title = f"issue-title-{random_suffix()}"
- desc = f"issue-description-{random_suffix()}"
- url = f"issue-url-{random_suffix()}"
- issue = objects.Issue(
- id=id,
- iid=iid,
- title=title,
- description=desc,
- url=url,
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- )
- expected_identifier = qualified_name(
- f"Issue?{urlencode([('id', id), ('iid', iid), ('title', title)])}"
- )
- assert issue.prov_identifier == expected_identifier
-
- def test_attributes(self):
- id = f"issue-id-{random_suffix()}"
- iid = f"issue-iid-{random_suffix()}"
- title = f"issue-title-{random_suffix()}"
- desc = f"issue-description-{random_suffix()}"
- url = f"issue-url-{random_suffix()}"
- issue = objects.Issue(
- id=id,
- iid=iid,
- title=title,
- description=desc,
- url=url,
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- )
- expected_attributes = [
- ("id", id),
- ("iid", iid),
- ("title", title),
- ("description", desc),
- ("url", url),
- ("created_at", today),
- ("closed_at", tomorrow),
- (PROV_TYPE, ProvType.ISSUE),
- (PROV_LABEL, issue.prov_label),
- ]
- assert issue.prov_attributes == expected_attributes
-
- def test_creation(self):
- id = f"issue-id-{random_suffix()}"
- issue = objects.Issue(
- id=id,
- iid="",
- title="",
- description="",
- url="",
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- )
- expected_creation = objects.Creation(
- creation_id=id,
- prov_start=today,
- prov_end=tomorrow,
- prov_type=ProvType.ISSUE_CREATION,
- )
- assert issue.creation == expected_creation
-
- def test_first_version(self):
- id = f"issue-id-{random_suffix()}"
- issue = objects.Issue(
- id=id,
- iid="",
- title="",
- description="",
- url="",
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- )
- expected_first_version = objects.Version(version_id=id, prov_type=ProvType.ISSUE_VERSION)
- assert issue.first_version == expected_first_version
-
- def test_annotated_versions(self):
- hexsha = f"commit-sha-{random_suffix()}"
- aid1 = f"annotation-id-{random_suffix()}"
- aid2 = f"annotation-id-{random_suffix()}"
- annot1 = objects.Annotation(
- id=aid1,
- type="",
- body="",
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- annot2 = objects.Annotation(
- id=aid2,
- type="",
- body="",
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- annots = [annot1, annot2]
- commit = objects.GitlabCommit(
- hexsha=hexsha,
- url="",
- author=None,
- annotations=annots,
- authored_at=today,
- committed_at=tomorrow,
- )
- ver1 = objects.AnnotatedVersion(
- version_id=hexsha,
- annotation_id=annot1.id,
- prov_type=ProvType.GITLAB_COMMIT_VERSION_ANNOTATED,
- )
- ver2 = objects.AnnotatedVersion(
- version_id=hexsha,
- annotation_id=annot2.id,
- prov_type=ProvType.GITLAB_COMMIT_VERSION_ANNOTATED,
- )
- expected_versions = [ver1, ver2]
- assert commit.annotated_versions == expected_versions
-
-
-class TestGitlabCommit:
- def test_identifier(self):
- hexsha = f"commit-hash-{random_suffix()}"
- url = f"commit-url-{random_suffix()}"
- commit = objects.GitlabCommit(
- hexsha=hexsha,
- url=url,
- author=None,
- annotations=[],
- authored_at=today,
- committed_at=tomorrow,
- )
- expected_identifier = qualified_name(f"GitlabCommit?{urlencode([('hexsha', hexsha)])}")
- assert commit.prov_identifier == expected_identifier
-
- def test_attributes(self):
- hexsha = f"commit-hash-{random_suffix()}"
- url = f"commit-url-{random_suffix()}"
- commit = objects.GitlabCommit(
- hexsha=hexsha,
- url=url,
- author=None,
- annotations=[],
- authored_at=today,
- committed_at=tomorrow,
- )
- expected_attributes = [
- ("hexsha", hexsha),
- ("url", url),
- ("authored_at", today),
- ("committed_at", tomorrow),
- (PROV_TYPE, ProvType.GITLAB_COMMIT),
- (PROV_LABEL, commit.prov_label),
- ]
- assert commit.prov_attributes == expected_attributes
-
- def test_creation(self):
- hexsha = f"commit-sha-{random_suffix()}"
- commit = objects.GitlabCommit(
- hexsha=hexsha,
- url="",
- author=None,
- annotations=[],
- authored_at=today,
- committed_at=tomorrow,
- )
- expected_creation = objects.Creation(
- creation_id=hexsha,
- prov_start=today,
- prov_end=tomorrow,
- prov_type=ProvType.GITLAB_COMMIT_CREATION,
- )
- assert commit.creation == expected_creation
-
- def test_first_version(self):
- hexsha = f"commit-sha-{random_suffix()}"
- commit = objects.GitlabCommit(
- hexsha=hexsha,
- url="",
- author=None,
- annotations=[],
- authored_at=today,
- committed_at=tomorrow,
- )
- expected_first_version = objects.Version(
- version_id=hexsha, prov_type=ProvType.GITLAB_COMMIT_VERSION
- )
- assert commit.first_version == expected_first_version
-
- def test_annotated_versions(self):
- hexsha = f"commit-sha-{random_suffix()}"
- aid1 = f"annotation-id-{random_suffix()}"
- aid2 = f"annotation-id-{random_suffix()}"
- annot1 = objects.Annotation(
- id=aid1,
- type="",
- body="",
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- annot2 = objects.Annotation(
- id=aid2,
- type="",
- body="",
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- annots = [annot1, annot2]
- commit = objects.GitlabCommit(
- hexsha=hexsha,
- url="",
- author=None,
- annotations=annots,
- authored_at=today,
- committed_at=tomorrow,
- )
- ver1 = objects.AnnotatedVersion(
- version_id=hexsha,
- annotation_id=annot1.id,
- prov_type=ProvType.GITLAB_COMMIT_VERSION_ANNOTATED,
- )
- ver2 = objects.AnnotatedVersion(
- version_id=hexsha,
- annotation_id=annot2.id,
- prov_type=ProvType.GITLAB_COMMIT_VERSION_ANNOTATED,
- )
- expected_versions = [ver1, ver2]
- assert commit.annotated_versions == expected_versions
-
-
-class TestMergeRequest:
- def test_identifier(self):
- id = f"merge-request-id-{random_suffix()}"
- iid = f"merge-request-iid-{random_suffix()}"
- title = f"merge-request-title-{random_suffix()}"
- desc = f"merge-request-description-{random_suffix()}"
- url = f"merge-request-url-{random_suffix()}"
- source_branch = f"merge-request-source-branch-{random_suffix()}"
- target_branch = f"merge-request-target-branch-{random_suffix()}"
- merge_request = objects.MergeRequest(
- id=id,
- iid=iid,
- title=title,
- description=desc,
- url=url,
- source_branch=source_branch,
- target_branch=target_branch,
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- merged_at=next_week,
- first_deployed_to_production_at=yesterday,
- )
- expected_identifier = qualified_name(
- f"MergeRequest?{urlencode([('id', id), ('iid', iid), ('title', title)])}"
- )
- assert merge_request.prov_identifier == expected_identifier
-
- def test_attributes(self):
- id = f"merge-request-id-{random_suffix()}"
- iid = f"merge-request-iid-{random_suffix()}"
- title = f"merge-request-title-{random_suffix()}"
- desc = f"merge-request-description-{random_suffix()}"
- url = f"merge-request-url-{random_suffix()}"
- source_branch = f"merge-request-source-branch-{random_suffix()}"
- target_branch = f"merge-request-target-branch-{random_suffix()}"
- merge_request = objects.MergeRequest(
- id=id,
- iid=iid,
- title=title,
- description=desc,
- url=url,
- source_branch=source_branch,
- target_branch=target_branch,
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- merged_at=next_week,
- first_deployed_to_production_at=yesterday,
- )
- expected_attributes = [
- ("id", id),
- ("iid", iid),
- ("title", title),
- ("description", desc),
- ("url", url),
- ("source_branch", source_branch),
- ("target_branch", target_branch),
- ("created_at", today),
- ("closed_at", tomorrow),
- ("merged_at", next_week),
- ("first_deployed_to_production_at", yesterday),
- (PROV_TYPE, ProvType.MERGE_REQUEST),
- (PROV_LABEL, merge_request.prov_label),
- ]
- assert merge_request.prov_attributes == expected_attributes
-
- def test_creation(self):
- id = f"merge-request-id-{random_suffix()}"
- merge_request = objects.MergeRequest(
- id=id,
- iid="",
- title="",
- description="",
- url="",
- source_branch="",
- target_branch="",
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- merged_at=yesterday,
- first_deployed_to_production_at=next_week,
- )
- expected_creation = objects.Creation(
- creation_id=id,
- prov_start=today,
- prov_end=tomorrow,
- prov_type=ProvType.MERGE_REQUEST_CREATION,
- )
- assert merge_request.creation == expected_creation
-
- def test_first_version(self):
- id = f"merge-request-id-{random_suffix()}"
- merge_request = objects.MergeRequest(
- id=id,
- iid="",
- title="",
- description="",
- url="",
- source_branch="",
- target_branch="",
- author=None,
- annotations=[],
- created_at=today,
- closed_at=tomorrow,
- merged_at=yesterday,
- first_deployed_to_production_at=next_week,
- )
- expected_version = objects.Version(version_id=id, prov_type=ProvType.MERGE_REQUEST_VERSION)
- assert merge_request.first_version == expected_version
-
- def test_annotated_versions(self):
- id = f"merge-request-id-{random_suffix()}"
- aid1 = f"annotation-id-{random_suffix()}"
- aid2 = f"annotation-id-{random_suffix()}"
- annot1 = objects.Annotation(
- id=aid1,
- type="",
- body="",
- kwargs=None,
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- annot2 = objects.Annotation(
- id=aid2,
- type="",
- body="",
- kwargs=None,
- annotator=None,
- prov_start=today,
- prov_end=tomorrow,
- )
- annots = [annot1, annot2]
- merge_request = objects.MergeRequest(
- id=id,
- iid="",
- title="",
- description="",
- url="",
- source_branch="",
- target_branch="",
- author=None,
- annotations=annots,
- created_at=today,
- closed_at=tomorrow,
- merged_at=yesterday,
- first_deployed_to_production_at=next_week,
- )
- ver1 = objects.AnnotatedVersion(
- version_id=id,
- annotation_id=annot1.id,
- prov_type=ProvType.MERGE_REQUEST_VERSION_ANNOTATED,
- )
- ver2 = objects.AnnotatedVersion(
- version_id=id,
- annotation_id=annot2.id,
- prov_type=ProvType.MERGE_REQUEST_VERSION_ANNOTATED,
- )
- expected_versions = [ver1, ver2]
- assert merge_request.annotated_versions == expected_versions
-
-
-class TestTag:
- def test_identifier(self):
- name = f"tag-name-{random_suffix()}"
- hexsha = f"commit-sha-{random_suffix()}"
- msg = f"tag-message-{random_suffix()}"
- tag = objects.Tag(name=name, hexsha=hexsha, message=msg, author=None, created_at=today)
- expected_identifier = qualified_name(
- f"Tag?{urlencode([('name', name), ('hexsha', hexsha)])}"
- )
- assert tag.prov_identifier == expected_identifier
-
- def test_attributes(self):
- name = f"tag-name-{random_suffix()}"
- hexsha = f"commit-sha-{random_suffix()}"
- msg = f"tag-message-{random_suffix()}"
- tag = objects.Tag(name=name, hexsha=hexsha, message=msg, author=None, created_at=today)
- expected_attributes = [
- ("name", name),
- ("hexsha", hexsha),
- ("message", msg),
- ("created_at", today),
- (PROV_TYPE, ProvType.TAG),
- (PROV_TYPE, ProvType.COLLECTION),
- (PROV_LABEL, tag.prov_label),
- ]
- assert tag.prov_attributes == expected_attributes
-
- def test_creation(self):
- name = f"tag-name-{random_suffix()}"
- tag = objects.Tag(name=name, hexsha="", message="", author=None, created_at=today)
- expected_creation = objects.Creation(
- creation_id=name,
- prov_start=today,
- prov_end=today,
- prov_type=ProvType.TAG_CREATION,
- )
- assert tag.creation == expected_creation
-
-
-class TestRelease:
- def test_identifier(self):
- name = f"release-name-{random_suffix()}"
- desc = f"release-description-{random_suffix()}"
- tag_name = f"tag-name-{random_suffix()}"
- release = objects.Release(
- name=name,
- description=desc,
- tag_name=tag_name,
- author=None,
- assets=[],
- evidences=[],
- created_at=today,
- released_at=tomorrow,
- )
- expected_identifier = qualified_name(f"Release?{urlencode([('name', name)])}")
- assert release.prov_identifier == expected_identifier
-
- def test_attributes(self):
- name = f"release-name-{random_suffix()}"
- desc = f"release-description-{random_suffix()}"
- tag_name = f"tag-name-{random_suffix()}"
- release = objects.Release(
- name=name,
- description=desc,
- tag_name=tag_name,
- author=None,
- assets=[],
- evidences=[],
- created_at=today,
- released_at=tomorrow,
- )
- expected_attributes = [
- ("name", name),
- ("description", desc),
- ("tag_name", tag_name),
- ("created_at", today),
- ("released_at", tomorrow),
- (PROV_TYPE, ProvType.RELEASE),
- (PROV_TYPE, ProvType.COLLECTION),
- (PROV_LABEL, release.prov_label),
- ]
- assert release.prov_attributes == expected_attributes
-
- def test_creation(self):
- name = f"release-name-{random_suffix()}"
- release = objects.Release(
- name=name,
- description="",
- tag_name="",
- author=None,
- assets=[],
- evidences=[],
- created_at=today,
- released_at=tomorrow,
- )
- expected_creation = objects.Creation(
- creation_id=name,
- prov_start=today,
- prov_end=tomorrow,
- prov_type=ProvType.RELEASE_CREATION,
- )
- assert release.creation == expected_creation
diff --git a/tests/unit/test_operations.py b/tests/unit/test_operations.py
deleted file mode 100644
index d8bb6c2..0000000
--- a/tests/unit/test_operations.py
+++ /dev/null
@@ -1,225 +0,0 @@
-import hashlib
-
-from prov.model import ProvAgent, ProvDocument, ProvRelation, PROV_ROLE, PROV_TYPE
-
-from gitlab2prov.prov import operations
-from gitlab2prov.prov.operations import qualified_name
-
-from tests.random_refs import random_suffix
-
-
-class TestStats:
- def test_format_as_ascii_table(self):
- d = {"A": 1, "B": 2, "C": 3}
- expected_header = [
- f"|{'Record Type':20}|{'Count':20}|",
- f"+{'-'*20}+{'-'*20}+",
- ]
- expected_body = [
- f"|{'A':20}|{1:20}|",
- f"|{'B':20}|{2:20}|",
- f"|{'C':20}|{3:20}|",
- ]
- table = operations.format_stats_as_ascii_table(d)
- lines = [l.strip() for l in table.split("\n") if l]
- assert lines[:2] == expected_header
- assert lines[2:] == expected_body
-
- def test_format_stats_as_csv(self):
- d = {"A": 1, "B": 2, "C": 3}
- expected_header = ["Record Type, Count"]
- expected_body = [
- "A, 1",
- "B, 2",
- "C, 3",
- ]
- csv = operations.format_stats_as_csv(d)
- lines = [l.strip() for l in csv.split("\n") if l]
- assert lines[:1] == expected_header
- assert lines[1:] == expected_body
-
-
-class TestGraphFactory:
- def test_namespace_uri_is_gitlab2prov(self):
- graph = operations.graph_factory()
- expected_uri = "http://github.com/dlr-sc/gitlab2prov/"
- assert graph.get_default_namespace().uri == expected_uri
-
- def test_init_wo_list_of_records(self):
- uri = "http://github.com/dlr-sc/gitlab2prov/"
- expected_graph = ProvDocument()
- expected_graph.set_default_namespace(uri)
- assert operations.graph_factory() == expected_graph
-
- def test_init_with_list_of_records(self):
- records = [
- ProvAgent(None, qualified_name(f"agent-id-{random_suffix()}")),
- ProvAgent(None, qualified_name(f"agent-id-{random_suffix()}")),
- ]
- expected_graph = ProvDocument(records)
- assert operations.graph_factory(records) == expected_graph
-
-
-class TestCombine:
- def test_returns_empty_graph_when_run_wo_subgraphs(self):
- assert operations.combine(iter([])) == operations.graph_factory()
-
- def test_carries_over_all_records(self):
- agent1 = ProvAgent(None, qualified_name(f"agent-id-{random_suffix()}"))
- agent2 = ProvAgent(None, qualified_name(f"agent-id-{random_suffix()}"))
- graph1 = ProvDocument([agent1])
- graph2 = ProvDocument([agent2])
- subgraphs = [graph1, graph2]
- expected_graph = ProvDocument([agent1, agent2])
- assert operations.combine(iter(subgraphs)) == expected_graph
-
-
-class TestDedupe:
- def test_removes_duplicate_elements(self):
- agent = ProvAgent(None, qualified_name(f"agent-id-{random_suffix()}"))
- graph = ProvDocument([agent, agent])
- expected_graph = ProvDocument([agent])
- assert list(graph.get_records(ProvAgent)) == [agent, agent]
- assert list(operations.dedupe(graph).get_records(ProvAgent)) == [agent]
- assert operations.dedupe(graph) == expected_graph
-
- def test_merges_attributes_of_duplicate_elements(self):
- id = qualified_name(f"agent-id-{random_suffix()}")
- graph = ProvDocument()
- graph.agent(id, {"attribute1": 1})
- graph.agent(id, {"attribute2": 2})
- expected_attributes = [
- (qualified_name("attribute1"), 1),
- (qualified_name("attribute2"), 2),
- ]
- agents = list(operations.dedupe(graph).get_records(ProvAgent))
- assert len(agents) == 1
- assert agents[0].attributes == expected_attributes
-
- def test_remove_duplicate_relations(self):
- graph = ProvDocument()
- agent = graph.agent(qualified_name(f"agent-id-{random_suffix()}"))
- entity = graph.entity(qualified_name(f"entity-id-{random_suffix()}"))
- r1 = graph.wasAttributedTo(entity, agent)
- r2 = graph.wasAttributedTo(entity, agent)
- assert list(graph.get_records(ProvRelation)) == [r1, r2]
- assert list(operations.dedupe(graph).get_records(ProvRelation)) == [r1]
-
- def test_merges_attributes_of_duplicate_relations(self):
- graph = ProvDocument()
- agent = graph.agent(qualified_name(f"agent-id-{random_suffix()}"))
- entity = graph.entity(qualified_name(f"entity-id-{random_suffix()}"))
- r1_attrs = [(qualified_name("attr"), "val1")]
- r2_attrs = [(qualified_name("attr"), "val2")]
- graph.wasAttributedTo(entity, agent, other_attributes=r1_attrs)
- graph.wasAttributedTo(entity, agent, other_attributes=r2_attrs)
-
- graph = operations.dedupe(graph)
-
- relations = list(graph.get_records(ProvRelation))
- assert len(relations) == 1
- expected_extra_attributes = set(
- [
- (qualified_name("attr"), "val1"),
- (qualified_name("attr"), "val2"),
- ]
- )
- assert set(relations[0].extra_attributes) == expected_extra_attributes
-
-
-class TestUncoverDoubleAgents:
- def test_build_inverse_index(self):
- mapping = {"name": ["alias1", "alias2"]}
- expected_dict = {"alias1": "name", "alias2": "name"}
- assert operations.build_inverse_index(mapping) == expected_dict
-
- def test_uncover_name(self):
- names = {"alias": "name"}
- graph = operations.graph_factory()
- agent = graph.agent("agent-id", other_attributes={qualified_name("name"): "alias"})
- expected_name = (qualified_name("name"), "name")
- assert operations.uncover_name(agent, names) == expected_name
-
- def test_uncover_duplicated_agents_resolves_agent_alias(self, mocker):
- d = {"alias1": "name", "alias2": "name"}
- mocker.patch("gitlab2prov.prov.operations.read_duplicated_agent_mapping")
- mocker.patch("gitlab2prov.prov.operations.build_inverse_index", return_value=d)
-
- graph = operations.graph_factory()
- graph.agent("agent1", {"name": "alias2"})
- graph.agent("agent2", {"name": "alias1"})
-
- graph = operations.merge_duplicated_agents(graph, "")
-
- agents = list(graph.get_records(ProvAgent))
- assert len(agents) == 1
- expected_name = "name"
- [(_, name)] = [(k, v) for k, v in agents[0].attributes if k.localpart == "name"]
- assert name == expected_name
-
- def test_uncover_duplicated_agents_reroutes_relations(self, mocker):
- d = {"alias1": "name", "alias2": "name"}
- mocker.patch("gitlab2prov.prov.operations.read_duplicated_agent_mapping")
- mocker.patch("gitlab2prov.prov.operations.build_inverse_index", return_value=d)
-
- graph = operations.graph_factory()
- a1 = graph.agent("agent1", {"name": "alias2"})
- a2 = graph.agent("agent2", {"name": "alias1"})
- e1 = graph.entity("entity1")
- e2 = graph.entity("entity2")
- e1.wasAttributedTo(a1)
- e2.wasAttributedTo(a2)
-
- graph = operations.merge_duplicated_agents(graph, "")
-
- relations = list(graph.get_records(ProvRelation))
- assert len(relations) == 2
- expected_identifier = "User?name=name"
- assert all(
- relation.formal_attributes[1][1].localpart == expected_identifier
- for relation in relations
- )
-
-
-class TestPseudonymize:
- def test_pseudonymize_changes_agent_name_and_identifier(self):
- graph = operations.graph_factory()
- name = f"agent-name-{random_suffix()}"
- email = f"agent-email-{random_suffix()}"
- graph.agent("agent1", {"name": name, "email": email})
-
- graph = operations.pseudonymize(graph)
-
- expected_name = hashlib.sha256(bytes(name, "utf-8")).hexdigest()
- expected_email = hashlib.sha256(bytes(email, "utf-8")).hexdigest()
- expected_identifier = qualified_name(f"User?name={expected_name}&email={expected_email}")
-
- agent = next(graph.get_records(ProvAgent))
- assert agent.identifier == expected_identifier
- assert list(agent.get_attribute("name"))[0] == expected_name
- assert list(agent.get_attribute("email"))[0] == expected_email
-
- def test_pseudonymize_deletes_non_name_attributes_apart_from_role_and_type(self):
- graph = operations.graph_factory()
- graph.agent(
- "agent1",
- {
- "name": f"agent-name-{random_suffix()}",
- "email": f"email-{random_suffix()}",
- "gitlab_username": f"gitlab-username-{random_suffix()}",
- "gitlab_id": f"gitlab-id-{random_suffix()}",
- PROV_ROLE: f"prov-role-{random_suffix()}",
- PROV_TYPE: f"prov-type-{random_suffix()}",
- },
- )
-
- graph = operations.pseudonymize(graph)
-
- agent = next(graph.get_records(ProvAgent))
- expected_attributes = [
- PROV_ROLE,
- PROV_TYPE,
- qualified_name("name"),
- qualified_name("email"),
- ]
- assert all([(attr in expected_attributes) for (attr, _) in agent.extra_attributes])
diff --git a/tests/unit/test_repository.py b/tests/unit/test_repository.py
new file mode 100644
index 0000000..912ef62
--- /dev/null
+++ b/tests/unit/test_repository.py
@@ -0,0 +1,99 @@
+import pytest
+from gitlab2prov.adapters.repository import InMemoryRepository
+
+
+class TestInMemoryRepository:
+
+
+ def test_add_resource(self, random_user):
+ repo = InMemoryRepository()
+ resource = random_user
+ repo.add(resource)
+ assert len(repo.repo[type(resource)]) == 1
+ assert repo.repo[type(resource)][0] == resource
+
+ def test_get_resource_existing(self, random_user):
+ repo = InMemoryRepository()
+ resource = random_user
+ repo.add(resource)
+ retrieved_resource = repo.get(type(resource))
+ assert retrieved_resource == resource
+
+ def test_get_resource_non_existing(self, random_user):
+ repo = InMemoryRepository()
+ retrieved_resource = repo.get(type(random_user))
+ assert retrieved_resource is None
+
+ @pytest.mark.fixt_data(2)
+ def test_get_resource_with_filters_existing(self, n_random_users):
+ repo = InMemoryRepository()
+ resource1 = n_random_users[0]
+ resource2 = n_random_users[1]
+ repo.add(resource1)
+ repo.add(resource2)
+ retrieved_resource = repo.get(type(resource1), email=resource1.email, name=resource1.name)
+ assert retrieved_resource == resource1
+
+ def test_get_resource_with_filters_non_existing(self, random_user):
+ repo = InMemoryRepository()
+ resource = random_user
+ repo.add(resource)
+ retrieved_resource = repo.get(type(resource), name="...", email="...")
+ assert retrieved_resource is None
+
+ def test_get_resource_throws_attribute_error_for_non_existing_attributes(self, random_user):
+ repo = InMemoryRepository()
+ resource = random_user
+ repo.add(resource)
+ try:
+ repo.get(type(resource), non_existing_attribute="...")
+ except AttributeError:
+ assert True
+ else:
+ assert False
+
+ @pytest.mark.fixt_data(2)
+ def test_list_all_resources(self, n_random_users):
+ repo = InMemoryRepository()
+ resource1 = n_random_users[0]
+ resource2 = n_random_users[1]
+ repo.add(resource1)
+ repo.add(resource2)
+ retrieved_resources = repo.list_all(type(resource1))
+ assert len(retrieved_resources) == 2
+ assert resource1 in retrieved_resources
+ assert resource2 in retrieved_resources
+
+ @pytest.mark.fixt_data(2)
+ def test_list_all_resources_with_filters_existing(self, n_random_users):
+ repo = InMemoryRepository()
+ resource1 = n_random_users[0]
+ resource2 = n_random_users[1]
+ repo.add(resource1)
+ repo.add(resource2)
+ retrieved_resources = repo.list_all(
+ type(resource1), name=resource1.name, email=resource1.email
+ )
+ assert len(retrieved_resources) == 1
+ assert resource1 in retrieved_resources
+
+ @pytest.mark.fixt_data(2)
+ def test_list_all_resources_with_filters_non_existing(self, n_random_users):
+ repo = InMemoryRepository()
+ resource1 = n_random_users[0]
+ resource2 = n_random_users[1]
+ repo.add(resource1)
+ repo.add(resource2)
+ retrieved_resources = repo.list_all(type(resource1), name="...", email="...")
+ assert len(retrieved_resources) == 0
+
+ def test_list_all_resources_throws_attribute_error_for_non_existing_attributes(self, random_user):
+ repo = InMemoryRepository()
+ resource1 = random_user
+ repo.add(resource1)
+ try:
+ repo.list_all(type(resource1), non_existing_attribute="...")
+ except AttributeError:
+ assert True
+ else:
+ assert False
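
Review note: the tests in `tests/unit/test_repository.py` pin down a small contract for the repository: `add` stores a resource under its type, `get` returns the first match or `None`, `list_all` returns every match, and a filter on an attribute the resource lacks raises `AttributeError`. The following is a minimal sketch of a store that satisfies that contract, not the actual `InMemoryRepository` from `gitlab2prov.adapters.repository`. The names `SketchRepository` and `User` are hypothetical stand-ins (the real tests obtain users from the `random_user` and `n_random_users` fixtures); only the `repo` attribute name and the `getattr`-based filtering are taken from the diff.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for the resource supplied by the random_user fixture."""

    name: str
    email: str


class SketchRepository:
    """Illustrative in-memory store keyed by resource type (an assumption, not the real class)."""

    def __init__(self):
        # Maps a resource type to the list of stored instances of that type.
        self.repo = defaultdict(list)

    def add(self, resource):
        self.repo[type(resource)].append(resource)

    def list_all(self, resource_type, **filters):
        # getattr raises AttributeError when a filter names an attribute
        # that a stored resource does not have.
        return [
            r
            for r in self.repo.get(resource_type, [])
            if all(getattr(r, key) == val for key, val in filters.items())
        ]

    def get(self, resource_type, **filters):
        # First match, or None when nothing matches.
        return next(iter(self.list_all(resource_type, **filters)), None)


repo = SketchRepository()
alice = User(name="alice", email="alice@example.org")
bob = User(name="bob", email="bob@example.org")
repo.add(alice)
repo.add(bob)
```

In this sketch the `AttributeError` only surfaces once at least one resource of the requested type is stored, which is why the tests add a resource before filtering on `non_existing_attribute`. The `try`/`except`/`else` blocks in those tests could equivalently be written with `pytest.raises(AttributeError)`.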