- Learn why you need Git and GitHub
- Learn how to use Git
- Try it out!
- Discuss usage in our lab
- Learn about other cool things you can do on GitHub
http://www.cs.toronto.edu/~kenpu/articles/cs/git-intro.html
In summary:
- Be nice to yourself and your teammates
- Let Git be organized and remember your workflow instead of you
- Don't lose your code
- Be able to collaborate with other coders
- Be a good and transparent scientist
https://github.com/HartmannLab
We have a GitHub organization for our group. This is a place to collect all our repositories and scripts. You can git push your repository there. Then, you and others in the group can access it anywhere, anytime. They may modify it without interfering with your local copy until you git pull the modified version from GitHub, and merge it with your own.
Note that repositories in the HartmannLab can be private (only accessible by other members) or public (accessible by anyone on the internet). You can start developing your scripts in a private repository and make it public once you're satisfied with it (which should be when a manuscript is submitted at the latest).
Scenario: A team of immunologists is studying the activation of cytotoxic lymphocytes and needs to perform statistical analyses on large imaging datasets.
Use Case: The team uses Git to manage changes to their data analysis scripts, which are written in Python and R. Each team member clones the Git repository to their local machine, making changes and committing updates as they refine their models. They use GitHub to push these changes, allowing others to pull the latest versions. This approach ensures that everyone is working with the most up-to-date scripts and can track how models evolve over time.
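The clone, commit, push and pull cycle described above can be tried out without touching GitHub. The following is only a sketch under stated assumptions: a local bare repository stands in for the GitHub remote, and the two team members (Alice and Bob), the file name analysis.py and the commit messages are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox. A local bare repository plays the role of GitHub (assumption).
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# First team member (hypothetical): clone, add a script, commit, push.
git clone -q "$sandbox/remote.git" "$sandbox/alice"
cd "$sandbox/alice"
git config user.name "Alice Example"        # identity local to this demo clone
git config user.email "alice@example.com"
echo 'print("segment cells")' > analysis.py
git add analysis.py
git commit -q -m "Add first version of the analysis script"
git branch -M main                          # make sure the branch is called main
git push -q -u origin main

# Second team member (hypothetical): clone, refine the script, push.
git clone -q "$sandbox/remote.git" "$sandbox/bob"
cd "$sandbox/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"
echo 'print("run statistics")' >> analysis.py
git commit -q -am "Add statistics step"
git push -q origin main

# Alice's local copy is unchanged until she pulls the new version.
cd "$sandbox/alice"
git pull -q --ff-only origin main
git log --oneline                           # history now shows both commits
```

With a real project, the only difference is the address of the remote: you would clone from GitHub instead of from a local folder.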
Scenario: A lab member develops a new tool for identifying metabolic niches.
Use Case: The lab uses GitHub to host the project, making it publicly accessible for broader use and collaboration. They use tags to release versions of the software, which helps users refer to or cite specific versions of the tool in their research. GitHub's release feature is used to distribute executables and source code, along with release notes explaining the changes and enhancements made. The lab also uses GitHub Actions to automate testing, ensuring that the software works across different operating systems and Python versions.
Scenario: A senior lab member welcoming new interns wants to share course materials and assignments.
Use Case: The mentor uses a GitHub repository to distribute lecture notes, assignments, and project templates. Interns can fork the repository to obtain their copies, work on assignments, and use pull requests to submit their work. The mentor can then review the submissions directly on GitHub, provide feedback, and merge the pull requests once the assignments are corrected.
https://www.freecodecamp.org/news/introduction-to-git-and-github/
https://learngitbranching.js.org/
Some topics are quite advanced and you will hopefully never need them, but tutorials in Main > Introduction Sequence and in Remote > Push & Pull -- Git Remotes! are quite useful.
Don't memorize everything, just know what can be done and where to find the information. Here are a few life-saving links:
Git guide: a handy cheatsheet.
Oh shit git: when things go awry
A lot of operations can be done on GitHub in your browser. Suggestion: test yourself by doing the following:
- Find the repository including this tutorial, in the HartmannLab on GitHub
- Create a new branch with your name and switch to it
- Add and commit a text file with your favorite joke, or a link to your favorite GIF (think git add)
- Create a pull request to add your changes to the main branch.
The most flexible way to use Git is in the terminal (or command prompt for Windows users). If you need a refresher, here is a command line cheat sheet. Feel free to also use a graphical user interface, but this is less standard and won't be covered here.
Prerequisites (required only the first time you want to connect to GitHub):
- You need a GitHub account and to be part of the HartmannLab organization. If you're seeing this page, congrats, you already fulfil these requirements.
- You will need to be able to run Git on your machine. On macOS and most Linux distributions, Git is usually already installed. On Windows, you might need to download and install Git first. To check whether Git is installed, open a terminal (or command prompt for Windows users) and type git --version.
- Set up your user name and email:
git config --global user.name "Your Name"
git config --global user.email "your_email@whatever.com"
- To secure your connection with GitHub and avoid typing your password all the time you can generate an SSH key. Note that you should clone GitHub repos using their SSH address starting with git@, for instance git@github.com:HartmannLab/GitWorkshop.git for this repo.
- You might be required to use 2-factor authentication. You can either validate your connection using your mobile phone or do it directly from your computer using Authy desktop.
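Generating an SSH key, as mentioned in the prerequisites, could look roughly like the sketch below. It assumes OpenSSH's ssh-keygen is available. To keep the demo harmless, it writes to a temporary folder and uses an empty passphrase; both are assumptions made for this sketch only. For real use you would normally accept the default location in ~/.ssh and choose a passphrase.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo location, so that existing keys in ~/.ssh are never touched (assumption).
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519_github_demo"   # hypothetical file name

# -t: key type, -C: comment (usually your email), -f: output file,
# -N: passphrase (empty here only so that the demo runs unattended).
ssh-keygen -q -t ed25519 -C "your_email@whatever.com" -f "$key_file" -N ""

# The public half (.pub) is what you paste into GitHub
# (Settings > SSH and GPG keys). Never share the private half.
cat "$key_file.pub"

# Once the key is registered on GitHub, the connection can be checked with:
#   ssh -T git@github.com
```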
Suggestion: you can directly use Git for your projects, or test yourself by doing the following:
- Find the repository including this tutorial, in the HartmannLab on GitHub (think git clone git@github.com:HartmannLab/GitWorkshop.git)
- Clone the repository on your computer
- Create a new branch with your name and switch to it
- Add a text file with your favorite joke, or a link to your favorite GIF (think git add)
- Save your changes (think git commit)
- Upload it on GitHub (think git push)
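If you want to rehearse these steps before doing them on the real repository, here is one possible dry run. It is a sketch, not the official solution: a local bare repository stands in for GitHub, and the branch name your-name, the file joke.txt and the demo identity are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo identity via environment variables; in real use, the
# git config --global settings from the prerequisites apply instead.
export GIT_AUTHOR_NAME="Workshop Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Workshop Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"

# Stand-in for the GitHub repository: a local bare repo with one commit.
git init -q "$work/seed"
cd "$work/seed"
echo "# GitWorkshop" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main
git clone -q --bare "$work/seed" "$work/GitWorkshop.git"

# 1. Clone (on GitHub: git clone git@github.com:HartmannLab/GitWorkshop.git)
git clone -q "$work/GitWorkshop.git" "$work/GitWorkshop"
cd "$work/GitWorkshop"

# 2. Create a new branch and switch to it
branch="your-name"                      # placeholder branch name
git checkout -q -b "$branch"

# 3. Add a text file
echo "There are 10 kinds of people: those who know binary and those who don't." > joke.txt
git add joke.txt

# 4. Save your changes
git commit -q -m "Add my favorite joke"

# 5. Upload the branch; -u links the local branch to the remote one
git push -q -u origin "$branch"
git branch -r                           # list the branches known on the remote
```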
The way our lab works with GitHub should not be set in stone. Here are a couple of suggestions:
- Maintain one private repository with all example notebooks relevant to the lab (e.g. ark, scyan, misty). People looking for a starting point to analyse their images should find everything in this folder.
- For each project, store your analyses in a dedicated repository. This should document how you went from the images to the results and plots in the final publication.
Note 1: The repo for your analyses will need to be public once you submit your manuscript but can be kept private before that. Usually there's not much risk in having it public unless there's a clear and novel scientific finding that may be scooped, or if you include the raw data in the repository by mistake.
Note 2: Keeping track of what you did so you can come back to it after a while is essential, explaining and documenting your workflow to others is great, and making your whole analysis reproducible is ideal.
- Common approaches to make your analyses reproducible include using conda, nextflow, snakemake or docker.
- How to structure your project (i.e. where to put data, notebooks, scripts, plots and so on) depends on your goal and your preferences, but cookiecutter is a good starting point. Be careful: GitHub does not like large files, so your data should typically not be included in your repository but distributed separately, for instance with figshare or dryad.
Here are a few open questions:
- Do we want to use GitHub as a knowledge base (e.g. list of tools for spatial analyses) instead of spreadsheets?
- Do we want to make use of Kanban boards on GitHub, for instance for project ideas or to distribute tasks?
- Do we want a logo or something?
When using Git, try to adhere to best practices to get the most out of it. Plan your workflow by deciding on a branching strategy that suits your project’s needs, such as feature branching. Make frequent, small commits that capture changes logically and include clear, concise commit messages that explain the why behind each change, not just the what. Use branches to isolate development work without affecting the main (or 'master') branch. If possible, have your code reviewed by other lab members. Regularly pull changes from the remote repository to keep your local copy up to date and reduce merge conflicts. Avoid including large files in your repository; favor links and setup instructions for the data whenever possible, so that nobody is forced to transfer large data volumes when pushing or pulling. Lastly, ensure sensitive data such as passwords, API keys or clinical data are never committed to your repository; instead, use environment variables or configuration files excluded from version control by .gitignore. By following these practices, you can maximize the effectiveness of Git as a tool for collaborative development and maintain a clean, functional codebase.
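To illustrate the .gitignore advice, here is a small sketch in a throwaway repository. The folder layout (data/, scripts/) and the secrets file name .env are assumptions chosen for the example, not lab conventions.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical project layout: raw data, a local secrets file, and a script.
mkdir -p data scripts
echo "large image placeholder" > data/raw_image.tiff
echo "API_KEY=not-a-real-key" > .env
echo 'echo "running analysis"' > scripts/run.sh

# Tell Git what should never be tracked.
cat > .gitignore <<'EOF'
# Large or raw data: distribute separately
data/
# Local secrets and credentials
.env
EOF

# "git add ." now skips everything matched by .gitignore.
git add .
git status --short

# Ask Git which rule causes a given path to be ignored.
git check-ignore -v .env data/raw_image.tiff
```

Note that .gitignore only prevents new files from being added. A file that was already committed stays in the history even if it is added to .gitignore later.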
You can edit files directly online with a web-based version of Visual Studio Code, which offers many convenient features to work on code. For that, simply change github.com to github.dev in your address bar. For instance you could edit this repository directly.
GitHub also offers Codespaces, which is a virtual machine in the cloud in which you can run your code directly from your browser and interact using Visual Studio Code. It also supports things like Jupyter Notebooks, and can be used for collaborating on the same code in real time.
GitHub offers Copilot, a smart code completion tool integrated within its editor and Codespaces (something like ChatGPT for code). These options come at a cost but can be free for students and researchers.
GitHub can automate a lot of tasks using so-called GitHub Actions. For instance, a script can be triggered each time something is committed to the main branch. This can be used to test your code and make sure it works in different environments. This enables continuous integration, a software engineering practice that checks that your code stays functional and can be run by users on different machines and operating systems. Simpler uses are also common, such as automatic updates or code linting (making sure your code follows the style guidelines).
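As a rough idea of what such an automation looks like, the sketch below writes a minimal workflow file into a demo folder. GitHub reads workflows from .github/workflows/ in your repository. Everything specific here is an assumption made for illustration: the file name tests.yml, the choice of operating systems and Python versions, and the use of pytest with an existing test suite.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo folder standing in for the root of your repository (assumption).
repo="$(mktemp -d)"
mkdir -p "$repo/.github/workflows"

# The quoted 'EOF' keeps the shell from expanding the ${{ ... }} expressions.
cat > "$repo/.github/workflows/tests.yml" <<'EOF'
name: tests
on:
  push:
    branches: [main]          # run on every push to the main branch
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:                 # one job per OS / Python combination
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.10", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: python -m pip install pytest
      - run: python -m pytest   # assumes the repository contains pytest tests
EOF

cat "$repo/.github/workflows/tests.yml"
```

Once a file like this is committed and pushed, the results of each run appear in the Actions tab of the repository on GitHub.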
You can specify exactly which version of the code was used in your paper by associating a given release of your repository to a unique DOI using Zenodo.
The file named README.md is always the one displayed first when you visit a repo on GitHub. This is why it's used as a starting point to explain what your repository is for, and how to run its code.
Ideally, commit each time you're done adding a new feature, function or plot, and each time you correct a bug.
If someone clones the repository, they should be able to compile/run the code without debugging. If you need to commit a buggy version, do it in another branch and keep the latest working version of your code in the main branch.
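Keeping unfinished work away from the main branch could look like the sketch below. The branch name wip-new-normalization, the file run.sh and the commit messages are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo identity for this throwaway repository only.
export GIT_AUTHOR_NAME="Workshop Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Workshop Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q

# A working version lives on main.
echo 'echo "stable analysis"' > run.sh
git add run.sh
git commit -q -m "Add working analysis script"
git branch -M main

# Unfinished (possibly broken) work goes on its own branch.
git checkout -q -b wip-new-normalization
echo 'echo "TODO: normalization not working yet"' >> run.sh
git commit -q -am "WIP: start new normalization (not working yet)"

# main is untouched: anyone cloning it still gets the working version.
git checkout -q main
cat run.sh

# Once the work is finished and tested, bring it back into main.
git checkout -q wip-new-normalization
echo 'echo "normalization done"' >> run.sh
git commit -q -am "Finish new normalization"
git checkout -q main
git merge -q wip-new-normalization
git log --oneline
```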
You can tag the latest commit in your active branch to give it a name and find it easily later on (for instance when reaching a stable release or before publishing).
git tag -a versionName -m "Description of the version"
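Here is a small sketch of tagging in a throwaway repository, with v1.0.0 as a hypothetical version name. It also shows how to list tags and inspect one afterwards.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo identity for this throwaway repository only.
export GIT_AUTHOR_NAME="Workshop Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Workshop Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "results" > analysis.txt
git add analysis.txt
git commit -q -m "Add analysis used for the manuscript"

# Annotated tag on the latest commit of the active branch.
git tag -a v1.0.0 -m "Version used for the submitted manuscript"

git tag -n                      # list tags with the first line of their message
git show --no-patch v1.0.0      # tagger, date, message and the tagged commit

# Tags are not uploaded by a plain "git push". With a remote configured:
#   git push origin v1.0.0
```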
If you create a new file and want it in your repository, use git add. If you remove a file in your git folder with rm, it will still be part of your repository unless you use git rm.
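The difference between rm and git rm can be seen in a throwaway repository. This is a sketch with hypothetical file names (keep.txt, old.txt).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo identity for this throwaway repository only.
export GIT_AUTHOR_NAME="Workshop Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Workshop Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q

# New files only become part of the repository after git add (and a commit).
echo "keep me" > keep.txt
echo "obsolete" > old.txt
git add keep.txt old.txt
git commit -q -m "Add two files"

# Plain rm deletes the file from disk, but Git still tracks it.
rm old.txt
git status --short      # reports the deletion as a change not yet staged
git ls-files            # old.txt is still listed as tracked

# git rm records the removal; the commit makes it part of the history.
git rm -q old.txt
git commit -q -m "Remove old.txt"
git ls-files            # only keep.txt remains tracked
```

The removed file is still available in earlier commits, so nothing is lost if you need it again.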
If it's on GitHub, it's really safe and people (including Future-you) can have a look at your code if they need it.
Is there anything you found unclear in this document? Did you find a particularly useful resource that is not listed here yet? Go ahead and make a pull request to improve this repository!