A New Data Science Student’s Guide to Git

Melody Peterson
Nov 30, 2020

If one concept has caused me the most frustration in the first months of my data science journey, it is revision tracking. As a former mainframe COBOL programmer and software implementation consultant, I already had a basic understanding of using a “test realm” and then “going live” or moving “into production.” I’m not even sure if anyone still uses that terminology these days.

I do not intend to explain in depth how revision tracking works. Instead, I will walk through the basic steps and commands that keep your local repository up to date with the main repository you sync with online.

Step 1: I am assuming that you have already cloned your repository to your local computer and are starting work on your local files for the day. If not, look into git clone to set up your local environment. The first thing you should always do is fetch or pull any changes from your origin repository. You do not want to make lots of changes to your Jupyter Notebook only to realize that you never pulled down the most recent copy. git fetch downloads any new commits from the remote repository without touching your working files. You then need git merge to actually merge those changes into your files. Alternatively, git pull performs the fetch and the merge in a single command.
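To see the difference between fetching and merging for yourself, here is a minimal sketch you can run in a Bash terminal. It builds a throwaway sandbox rather than touching a real project: remote.git stands in for the online repository, and teammate and mine are two clones of it. All of those names, the file names, and the demo identity are made up for illustration.

```shell
set -e
# Throwaway sandbox. "remote.git" stands in for the online repository;
# "teammate" and "mine" are two clones of it. All names here are made up.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init --quiet --bare "$sandbox/remote.git"

# A teammate publishes the first version of the project.
git clone --quiet "$sandbox/remote.git" "$sandbox/teammate" 2>/dev/null
cd "$sandbox/teammate"
echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin HEAD

# You clone the project; afterwards the teammate pushes one more file.
git clone --quiet "$sandbox/remote.git" "$sandbox/mine"
echo "drop empty rows" > cleaning.txt
git add cleaning.txt
git commit --quiet -m "Add cleaning steps"
git push --quiet origin HEAD

# Start of your day: fetch first, then merge.
cd "$sandbox/mine"
git fetch --quiet   # downloads the new commit, leaves your files alone
if [ -e cleaning.txt ]; then after_fetch=present; else after_fetch=missing; fi
git merge --quiet   # no arguments: merges the branch you track on origin
if [ -e cleaning.txt ]; then after_merge=present; else after_merge=missing; fi
echo "cleaning.txt after fetch: $after_fetch, after merge: $after_merge"
```

The teammate's file only shows up in your folder after the merge. Replacing the fetch and merge lines with a single git pull should leave the folder in the same state.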

Step 2: Run your Jupyter Notebook and make your changes as necessary. When you are finished, save and close the notebook, then return to the Bash terminal and press Ctrl+C to shut down the Jupyter server.

Step 3: Add your changes to the staging area, where they wait to be committed, by running git add . (the trailing dot stands for the current directory). This stages every change in the current directory and its subfolders. Alternatively, you can stage only certain files with git add <file_name>, but so far in my experience it seems easier to add all of your changes at once.
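Here is a small sketch of the two ways to stage, run in a throwaway repository so nothing real is affected. The file names analysis.py and scratch.txt are made up for the demo.

```shell
set -e
# Throwaway repository; the file names are made up for this demo.
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet
echo "import pandas" > analysis.py
echo "temporary" > scratch.txt

# Stage one file by name; the other one stays untracked.
git add analysis.py
staged_one=$(git diff --cached --name-only)
git status --short

# Stage everything in the current directory and below.
git add .
staged_all=$(git diff --cached --name-only)
git status --short
```

In the first git status --short listing, scratch.txt should carry the ?? marker for an untracked file. In the second listing both files should be marked A for added.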

Step 4: Commit your staged changes with git commit -m '<commit message>'. If you are trying to be more efficient, you can add -a to the commit, and it will stage every modified or deleted file that Git already tracks as part of the commit. The combined command looks like git commit -am '<commit message>'. Note that -a does not pick up brand-new files, so those still need git add first.
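One caveat with the -am shortcut is worth checking for yourself: it only stages files Git is already tracking, so a brand-new file is left out of the commit. The sketch below shows this in a throwaway repository. The file names and the demo identity are made up.

```shell
set -e
# Throwaway repository with a made-up identity and made-up file names.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init --quiet
echo "step 1" > analysis.txt
git add analysis.txt
git commit --quiet -m "Add analysis"

# Edit the tracked file and create a brand-new one.
echo "step 2" >> analysis.txt
echo "fresh idea" > new_notes.txt

# -a stages the edit to analysis.txt, but not the new file.
git commit --quiet -am "Update analysis"
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
left_behind=$(git ls-files --others --exclude-standard)
echo "committed: $committed"
echo "still untracked: $left_behind"
```

If you create new notebooks or data files during the day, running git add . before the commit is the safer habit.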

Step 5: Now you are ready to push the commits you made locally up to your online repository with git push. Your terminal will scroll through some output as Git enumerates, compresses, and writes the objects it is sending.
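Steps 3 through 5 can be rehearsed end to end without a GitHub account. In this sketch a local folder, remote.git, plays the role of the online repository. That folder, the seed and mine clones, the file name, and the demo identity are all assumptions made for illustration.

```shell
set -e
# Throwaway sandbox; "remote.git" plays the role of the online repository.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init --quiet --bare "$sandbox/remote.git"

# Seed the remote with one commit so there is something to clone.
git clone --quiet "$sandbox/remote.git" "$sandbox/seed" 2>/dev/null
cd "$sandbox/seed"
echo "v1" > analysis.txt
git add analysis.txt
git commit --quiet -m "Add analysis"
git push --quiet origin HEAD

# Your working copy: clone, change, stage, commit, push.
git clone --quiet "$sandbox/remote.git" "$sandbox/mine"
cd "$sandbox/mine"
echo "v2" >> analysis.txt
git add .
git commit --quiet -m "Extend analysis"
git push --quiet

# After the push, the remote should point at the same commit as you do.
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$sandbox/remote.git" rev-parse HEAD)
echo "local:  $local_head"
echo "remote: $remote_head"
```

The two commit IDs printed at the end should match, which is a quick way to confirm that your work really reached the remote.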

Step 6: Now you are done for the day. First thing in the morning, you will want to start all over again at the beginning.

Some side notes: To keep it simple, I haven’t discussed branching. But if you are sharing your repo with other collaborators, it is probably something you want to do. It doesn’t have to complicate things much: you work on your own branch, and when you are finished with it you log in to GitHub and create a pull request to merge your branch into the main branch.
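For the curious, the branch workflow adds only two commands to the daily routine. This is a rough sketch in a throwaway sandbox, where a local remote.git folder stands in for GitHub. The branch name explore-outliers, the file names, and the demo identity are made up, and the pull request itself happens on the GitHub website, so it is not shown.

```shell
set -e
# Throwaway sandbox; "remote.git" stands in for the repository on GitHub.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init --quiet --bare "$sandbox/remote.git"

# Seed the remote with one commit so there is something to clone.
git clone --quiet "$sandbox/remote.git" "$sandbox/seed" 2>/dev/null
cd "$sandbox/seed"
echo "v1" > analysis.txt
git add analysis.txt
git commit --quiet -m "Add analysis"
git push --quiet origin HEAD

# Your working copy: create a branch, commit on it, and push the branch.
git clone --quiet "$sandbox/remote.git" "$sandbox/mine"
cd "$sandbox/mine"
git checkout --quiet -b explore-outliers   # create the branch and switch to it
echo "outlier notes" > outliers.txt
git add outliers.txt
git commit --quiet -m "Add outlier notes"
git push --quiet -u origin explore-outliers   # -u links the local and remote branch

# Confirm the branch now exists on the remote.
remote_branch=$(git ls-remote --heads origin explore-outliers | cut -f2)
echo "remote branch: $remote_branch"
```

Once the branch is on GitHub, the site should offer to open a pull request from it into the main branch.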

These are the basics. I suggest referencing the official documentation for more details.

Resources

Git Reference Docs

Melody Peterson

Data Science Student, Stay-at-Home Mom, Former Management Consultant