mirror of
https://github.com/wassname/cookiecutter-data-science.git
synced 2026-06-27 17:48:10 +08:00
Cookiecutter Data Science
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Project homepage
Requirements to use the cookiecutter template:
- Python 2.7 or 3.5+
- Cookiecutter Python package >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter
or
$ conda config --add channels conda-forge
$ conda install cookiecutter
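To confirm that the installed package meets the >= 1.4.0 requirement, you could run a small check along these lines. This is a sketch and is not part of the template. It assumes Python 3.8+ (for importlib.metadata), and the helper names parse_release, meets_minimum and check_cookiecutter are hypothetical. The comparison also ignores pre-release suffixes such as rc1.

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

MINIMUM = (1, 4, 0)  # cookiecutter >= 1.4.0, per the requirements above


def parse_release(text):
    """Return the numeric release parts of a version string, e.g. '2.1.1' -> (2, 1, 1).

    Hypothetical helper: the leading digits of each dot-separated piece are kept,
    so '1.4.0rc1' reduces to (1, 4, 0); parsing stops at the first piece that
    does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """True if the installed release is at least `minimum` ('1.4' counts as 1.4.0)."""
    release = parse_release(installed)
    width = max(len(release), len(minimum))
    # Pad both tuples with zeros so they compare component by component
    release += (0,) * (width - len(release))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return release >= minimum


def check_cookiecutter():
    """Return (ok, message) describing the installed cookiecutter, if any."""
    try:
        installed = version("cookiecutter")
    except PackageNotFoundError:
        return False, "cookiecutter is not installed; install it with pip or conda"
    if not meets_minimum(installed):
        return False, f"cookiecutter {installed} is older than 1.4.0"
    return True, f"cookiecutter {installed} is OK"
```

Usage: `ok, message = check_cookiecutter()` followed by `print(message)`.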
To start a new project, run:
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
New version of Cookiecutter Data Science
Cookiecutter Data Science is moving to v2 soon, which will entail using
the command ccds ... rather than cookiecutter .... The cookiecutter command
will continue to work, and this version of the template will still be available.
To use the legacy template, you will need to select it explicitly with -c v1.
Please update any scripts or automation to add the -c v1 option (as above),
which is available now.
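If your scripts generate projects from the legacy template, one option is to build the command in a single place so the -c v1 option cannot be forgotten. The Python sketch below does this. It assumes the cookiecutter executable is on PATH, and the function names are hypothetical. -c is short for cookiecutter's --checkout option, and --no-input tells cookiecutter to accept the template defaults instead of prompting.

```python
import subprocess

TEMPLATE_URL = "https://github.com/drivendata/cookiecutter-data-science"


def legacy_command(template=TEMPLATE_URL, no_input=False):
    """Build the argv for generating a project from the legacy (v1) template."""
    argv = ["cookiecutter", "-c", "v1"]  # -c (--checkout) selects the v1 version
    if no_input:
        argv.append("--no-input")  # use defaults from cookiecutter.json, no prompts
    argv.append(template)
    return argv


def generate_legacy_project(template=TEMPLATE_URL, no_input=False):
    """Run cookiecutter; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(legacy_command(template, no_input), check=True)
```

Because the argv is built in one function, a later change to the command only has to be made in one place.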
The resulting directory structure
The directory structure of your new project looks like this:
├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- A default Sphinx project; see sphinx-doc.org for details
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.py <- Makes the project pip installable (pip install -e .) so {{ cookiecutter.module_name }} can be imported
│
├── {{ cookiecutter.module_name }} <- Source code for use in this project.
│ │
│ ├── __init__.py <- Makes {{ cookiecutter.module_name }} a Python package
│ │
│ ├── data <- Scripts to download or generate data
│ │ └── make_dataset.py
│ │
│ ├── features <- Scripts to turn raw data into features for modeling
│ │ └── build_features.py
│ │
│ ├── models <- Scripts to train models and then use trained models to make
│ │ │ predictions
│ │ ├── predict_model.py
│ │ └── train_model.py
│ │
│ └── visualization <- Scripts to create exploratory and results oriented visualizations
│ └── visualize.py
│
└── tox.ini <- tox file with settings for running tox; see tox.readthedocs.io
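The notebook naming convention is easy to check mechanically. It calls for a number for ordering, the creator's initials, and a short dash-delimited description, as in 1.0-jqp-initial-data-exploration. The sketch below is one possible interpretation and is not a rule the template enforces. It assumes the number is dot-separated digits, initials are 2 to 4 lowercase letters, description words are lowercase alphanumerics, and files end in .ipynb. NOTEBOOK_NAME and misnamed_notebooks are hypothetical names.

```python
import re
from pathlib import Path

# Hypothetical pattern: <number>-<initials>-<dash-delimited-description>.ipynb
NOTEBOOK_NAME = re.compile(
    r"^(?P<number>\d+(?:\.\d+)*)"                   # ordering number, e.g. 1.0
    r"-(?P<initials>[a-z]{2,4})"                    # creator's initials, e.g. jqp
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"   # short dash-delimited description
    r"\.ipynb$"
)


def misnamed_notebooks(notebook_dir):
    """Return the names of .ipynb files in notebook_dir that break the convention."""
    return sorted(
        path.name
        for path in Path(notebook_dir).glob("*.ipynb")
        if not NOTEBOOK_NAME.match(path.name)
    )
```

For example, misnamed_notebooks("notebooks") would flag a leftover Untitled.ipynb but accept 1.0-jqp-initial-data-exploration.ipynb.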
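The following hypothetical sketch shows one way {{ cookiecutter.module_name }}/data/make_dataset.py might use the data/ folders. It reads a CSV from data/raw, drops incomplete rows, and writes the result to data/processed. The raw file is never modified, because raw data is treated as immutable. The function name, the file names, and the cleaning step are all assumptions for illustration, and only the standard library is used.

```python
import csv
from pathlib import Path


def make_dataset(project_dir, raw_name="dataset.csv", processed_name="dataset.csv"):
    """Clean data/raw/<raw_name> and write the result to data/processed/<processed_name>.

    Hypothetical sketch: rows with any blank column are dropped, and the raw
    file is only ever opened for reading, so the original dump stays immutable.
    """
    project_dir = Path(project_dir)
    raw_path = project_dir / "data" / "raw" / raw_name
    processed_path = project_dir / "data" / "processed" / processed_name

    with raw_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        if fieldnames is None:
            raise ValueError(f"{raw_path} has no header row")
        # Keep rows where every named column has a non-blank value
        # (short rows yield None for missing columns, which counts as blank).
        rows = [
            row for row in reader
            if all((row.get(name) or "").strip() for name in fieldnames)
        ]

    processed_path.parent.mkdir(parents=True, exist_ok=True)
    with processed_path.open("w", newline="") as f:
        # extrasaction="ignore" drops surplus values from over-long rows
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return processed_path
```

Writing to data/processed instead of overwriting the input means the pipeline can always be rerun from the original dump.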
Contributing
We welcome contributions! See the docs for guidelines.
Installing development requirements
pip install -r requirements.txt
Running the tests
py.test tests