This repository is designed as a starting point for data science projects, providing a structured project for data analysis and modeling.
A data science project typically involves several key steps, so the following resources were created:
- Notebook: Data scientists typically develop and test ideas in Jupyter Notebooks. Notebooks should be placed in the `notebook` directory. A starter file named `__main__.ipynb` is provided in that folder to help users get started.
- Data: All datasets should be stored in the `data` directory. A sample dataset, `meteorite_landings.csv`, is provided in this folder to get you started. It is loaded with Python in the Dataset section of the `__main__.ipynb` notebook.
- Libraries: Any custom libraries or modules should be placed in the `notebook/lib` directory. A sample library named `data_profiling.py` is provided to demonstrate how to create reusable code for notebooks.
- Results: The results of your analysis should be stored in the `res` directory. This can include reports, visualizations, or any other output generated by your notebooks.
- Requirements: Any additional Python packages required for your project should be listed in the `requirements.txt` file. This allows for easy installation of dependencies using `pip`.
- Virtual Environment: A virtual environment keeps the project's packages from affecting your global Python environment. This template uses `venv`.
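The layout above can be exercised end to end with a short script. The following is a minimal sketch that uses only the standard library and assumes it is run from the repository root; `PROJECT_ROOT`, `load_rows`, `count_missing`, `save_report`, and the output name `missing_values.csv` are hypothetical names chosen for illustration, not the contents of `data_profiling.py` or of the starter notebook.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumption: the script runs from the repository root, so data/ and res/
# resolve relative to the current working directory.
PROJECT_ROOT = Path.cwd()


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_missing(rows):
    """Count empty cells per column (a tiny stand-in for a profiling helper)."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header are collected under None; skip them.
                continue
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing)


def save_report(missing, out_path):
    """Write per-column counts to a CSV file, creating the folder if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["column", "missing"])
        for column, count in sorted(missing.items()):
            writer.writerow([column, count])


if __name__ == "__main__":
    data_file = PROJECT_ROOT / "data" / "meteorite_landings.csv"
    if data_file.exists():
        rows = load_rows(data_file)
        # Hypothetical output file name inside the res directory.
        save_report(count_missing(rows), PROJECT_ROOT / "res" / "missing_values.csv")
```

Keeping the reading, profiling, and writing steps in separate functions mirrors the split between `data`, `notebook/lib`, and `res`, so each function could later move into a module under `notebook/lib` and be imported from a notebook.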
Important: the following must be installed before using this template:
- Python
- pip
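Python and pip are the two items listed above. A quick way to confirm that both are usable from the interpreter you plan to run is sketched below; `check_prerequisites` is a hypothetical helper, and the `min_version` default is an assumption, since this README does not state a minimum Python version.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 8)):
    """Report whether the interpreter and pip look usable.

    min_version is an assumed threshold, not one stated by the project.
    """
    return {
        # sys.version_info compares element-wise against a plain tuple.
        "python": sys.version_info >= min_version,
        # find_spec returns None when the pip package cannot be imported.
        "pip": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```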
Usage:

    bash cmd.sh {setup|clear}

If you haven't built the project yet, you can do so by running:

    bash cmd.sh setup

If you want to remove all generated files and start from scratch, you can run:

    bash cmd.sh clear

This project is distributed under the GNU General Public License version 3. You can find the complete text of the license in the project repository.