In this article I want to talk about Git and how to get started with it. For those new to software development practices it can seem a little daunting, and definitely confusing. I hope to break it down into some simple steps that will help you get started.
Amongst the many themes that run through software development, DevOps and the other connected disciplines are those of working in teams and having a shared understanding of the codebase, ideally with a single source and a history of changes made over time. Git is one of a number of tools that offer this functionality, and is one of the most popular and widely used.
Others, such as Mercurial and Subversion, are also used, but Git has become the most widely adopted. In this article we’ll only be discussing Git, but the principles broadly apply to other tools too.
Finally, through most of this article I’ll be using commands for Linux/macOS. If you’re on Windows you can either use the (in my opinion, awesome) Windows Subsystem for Linux (WSL), or use PowerShell, where most of the commands will work with the exception of touch, which you can replace with New-Item -ItemType File -Name <file_name>.
What is Git?
Right up front I want to separate Git from tools such as GitHub and GitLab. Sure, they have “git” in the name and under the hood they do use Git, but Git is a tool, while GitHub and GitLab are services that use Git. You do not need to sign up for GitHub, GitLab, or any other similar service to use Git. We’ll come back to these services later, but for now let’s focus on Git.
So, what is Git? The answer many give is “Git is a version control system”. Let’s unpack that for a moment; what is a version control system? A version control system is a tool that allows you to track changes to a set of files over time. It allows you to see who made changes, when they were made, and what those changes were. It also allows you to revert to previous versions of the files if you need to.
If you’re like me and started out writing a few simple system administration scripts, you may have simply saved your files in a tool such as Google Drive or OneDrive and relied on the built-in feature of those products to maintain a version history of your files. You might also have developed a naming convention such as my_script_v3.sh and so on. These are both examples of version control, but very simplistic ones. The file naming approach relies on manual work: you have to remember to increment the version number, decide when a change warrants a new version, and so on. The cloud sync approach is a little better with modern tools, but they often limit how long, or how many, previous versions are kept, and they struggle when different people make different changes to the same file.
So what do we want, or need from our version control?
- We want to be able to track changes to files over time
- It would be nice to be able to see who made changes and when
- It would be nice to see the exact changes made
- It could be useful to reverse changes if needed
- Perhaps it would be handy to be able to work on the same files as others
- What about differing changes to the same file, for example a production fix and a new feature being developed at the same time?
Hopefully you can see that a simple file sync tool is not going to cut it. We need something more powerful. This is where Git comes in. Git gives us all these features and more!
Installing Git is pretty simple. You can download the latest version from the Git website and install it on your machine.
You can also use a package manager to install Git. For example, on a Debian based system you can use apt to install Git:
$ sudo apt install git
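Once it’s installed, a quick sanity check is to ask Git for its version. This is a minimal sketch assuming a POSIX shell; the exact version string will vary with your platform and package source.

```shell
# Confirm the git binary is on the PATH (prints its location)
command -v git
# Print the installed version, e.g. "git version 2.x.y"
git --version
```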
Before getting into the weeds I want to point you to my Git Knowledge Article, which gives practical tips and tricks for using Git to achieve specific goals. I’ll be updating that article over time with new hints and techniques.
One last thing before we get started: this tutorial, like so many others you’ll find across the internet, covers a few basic commands and examples. Don’t mistake it for a comprehensive guide to everything you can do. Pop a brave pill and experiment as you go through it; you’ll learn a lot more that way. Worst case, something doesn’t work quite as intended, you delete the repository and start again. You’ll be fine!
Now, let’s get started with Git! The first thing we need to do is create a repository. A repository is a collection of files that Git will track. We can create a repository in a number of ways, but the simplest is to create a new directory and then initialise a new repository in that directory.
mkdir my_repo # Create a new directory for our repository
cd my_repo # Change into the new directory
git init # Initialise a new repository
# returned output
Initialized empty Git repository in /home/gwatts/Temp/my_repo/.git/
Before running git init all we have is a pretty normal, empty directory. After running git init a hidden .git directory is created. This is where Git stores all the information about the repository. You should never need to edit anything in this directory, but this is where all the magic happens and your data is stored. If for some crazy reason you decide to stop using Git, you can simply delete this directory: all the history will be gone, and just the files currently in the directory will remain.
One other thing that we should do while getting started is to tell Git who we are by setting our name and email address. This can be done using the git config command.
git config --global user.name "Graham Watts" # Set the user name
git config --global user.email "email@example.com" # Set the user email
Setting these values globally means that they will be used for all repositories on the machine. If you want to set them for a specific repository, run the command inside that repository and omit the --global flag. We’ll take a look at gitconfig in more detail at another time. If you forget to set these values, Git will either guess them from your system and warn you when you commit, or refuse to commit until you set them.
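To see the per-repository variant in action, here’s a minimal sketch that works in a throwaway sandbox created with mktemp, so it won’t touch your real repositories or your global settings. The name and email are hypothetical placeholders.

```shell
# Throwaway sandbox repository in a temporary directory
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Without --global, these are written only to this repository's .git/config
git config user.name "Example User"        # hypothetical placeholder
git config user.email "user@example.com"   # hypothetical placeholder

# Read the effective values back
git config user.name
git config user.email

# Show which config file the email value came from
git config --show-origin user.email
```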
Understanding the deep mechanisms of Git is a story for another time, but for now we can think of the .git directory as a database that stores all the information about the repository.
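If you’re curious, you can peek inside that database read-only, without changing anything. A minimal sketch, again in a throwaway mktemp sandbox; the exact set of entries under .git varies a little between Git versions, and the branch name recorded in HEAD depends on your configuration.

```shell
# Throwaway sandbox repository
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Ask Git where the repository data lives; at the top of a repo this prints ".git"
git rev-parse --git-dir

# Top-level contents: HEAD, config, objects/ and refs/ among others
ls .git

# HEAD records which branch we are on, e.g. "ref: refs/heads/master"
cat .git/HEAD
```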
Now that we have a repository we can start adding files to it. Let’s create a simple file and add it to the repository.
touch my_file.txt # Create a new file
echo "Hello World" > my_file.txt # Add some content to the file
git add my_file.txt # Add the file to the repository file tracking
git commit -m "Added my_file.txt" # Commit the changes to the file to the repository
Now, let’s unpack what just happened. We created a new file, added some content to it, and then added it to the repository. We then committed the changes to the repository. Let’s look at each of these steps in turn.
- We created a new file using the touch command. This created an empty file called my_file.txt.
- We added some content to the file using the echo command. The > symbol tells the shell to redirect the output of the echo command to the file my_file.txt. This overwrites the contents of the file with the new content.
- By using the git add command we added the file to the list of files that the repository is tracking. This means that Git will now track changes to the file. If we make changes to the file Git will be able to see those changes and track them.
- Finally, we committed the changes to the repository with git commit. This means that Git has taken a snapshot of the repository at this point in time. This snapshot is called a commit. We can think of a commit as a version of the repository. We can go back to this commit at any time and revert the repository to this point in time. We can also create new commits from this point in time and continue to develop the repository.
Note: When making a commit Git expects us to provide a commit message. This is a short description of the changes made in the commit. It is good practice to provide a meaningful commit message. This is especially important when working on a team, as it allows other developers to understand what changes were made in the commit. You can provide a commit message using the -m flag as we did above.
Note: If we don’t pass a commit message using -m then Git will prompt us for one, typically by opening our default text editor. This can be confusing for a new engineer as they may not know what to do. It is best to get into the habit of providing a commit message using the -m flag. That said, be brave: this is our little test sandbox, so try it out and see what happens if you forget to provide a commit message. Maybe add another new file and commit it without a message.
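Here’s the first commit from above as a self-contained sketch you can run in a throwaway sandbox (the identity is a hypothetical placeholder). It also shows one trick worth knowing: passing -m more than once gives the commit a subject line plus a separate body paragraph, without needing an editor.

```shell
# Throwaway sandbox repository with a local, placeholder identity
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello World" > my_file.txt
git add my_file.txt
# Each -m becomes its own paragraph: the first is the subject, the second the body
git commit --quiet -m "Added my_file.txt" -m "First file in the sandbox repository."

# Print the subject and body of the latest commit, then count commits on this branch
git log -1 --format=%s
git log -1 --format=%b
git rev-list --count HEAD
```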
Updating a File
Now that we have a file in our repository let’s make some changes to it. We can do this by simply editing the file. Let’s add another line to the file.
echo "This is a new line" >> my_file.txt # Add another line to the file
git add my_file.txt # Add the file changes to the repository file tracking
git commit -m "Added another line to my_file.txt" # Commit the changes to the file to the repository
We have now made some changes to the file and committed them to the repository. We can see the changes we made by using the git diff command.
git diff HEAD~1 HEAD # Show the changes between the last two commits
# returned output
diff --git a/my_file.txt b/my_file.txt
index 557db03..d87f1d6 100644
--- a/my_file.txt
+++ b/my_file.txt
@@ -1 +1,2 @@
 Hello World
+This is a new line
Don’t get too bogged down in the git diff command itself right now. What we have done here is compare the most recent commit (called HEAD) with the commit before it (HEAD~1). The + symbol indicates a line that has been added, and the - symbol would indicate a line that has been removed, if there were any in our case. We can see that we have added a new line to the file.
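Here’s a minimal, self-contained sketch of the same steps in a throwaway sandbox (placeholder identity). It also shows that git diff with no arguments lists changes you haven’t yet passed to git add, and that --stat gives a compact one-line-per-file summary of a comparison.

```shell
# Throwaway sandbox repository with a local, placeholder identity
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello World" > my_file.txt
git add my_file.txt
git commit --quiet -m "Added my_file.txt"

echo "This is a new line" >> my_file.txt
# No arguments: show changes in the working area that have not been added yet
git diff

git add my_file.txt
git commit --quiet -m "Added another line to my_file.txt"

# The comparison from the article, then a per-file summary of the same change
git diff HEAD~1 HEAD
git diff --stat HEAD~1 HEAD
```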
Checking the Status of the Repository
Often as we’re working, adding, removing, and changing files, we want to know what the current status of the repository is. We can do this by using the git status command.
git status # Show the current status of the repository
Let’s give it a quick test. Add a new file to the repository and then run git status. You should see something like this:
touch my_new_file.txt # Create a new file
echo "This is my new file" > my_new_file.txt # Add some content to the file
git status # Show the current status of the repository
# returned output
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        my_new_file.txt
nothing added to commit but untracked files present (use "git add" to track)
What we can see here is that we have a new file in the repository that is not being tracked by Git. We can add this file to the repository by using the git add command and then re-run git status to see the changes.
git add my_new_file.txt # Add the file to the repository file tracking
git status # Show the current status of the repository
# returned output
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   my_new_file.txt
We can see that the file is now being tracked by Git and is ready to be committed. We do see something new with the references to unstage; we’ll cover this below in Working Area, Staged Changes, and Commits.
For now, let’s commit the changes to the repository.
git commit -m "Added my_new_file.txt" # Commit the changes to the file to the repository
Our file and its changes are now safely stored in the repository. Carry on yourself and try out making changes, adding files, and checking the status. Maybe see what happens if you delete a file!
Top tip: When learning like this I like to use the git status command a lot as I step through each change I make. It’s a great way to see what is going on and to check that I’m doing what I think I’m doing.
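If the full git status output feels noisy, the --short flag gives a compact, two-column version. A minimal sketch in a throwaway sandbox: ?? marks an untracked file, and A marks a file newly added to the staging area (covered in the next section).

```shell
# Throwaway sandbox repository (no commits needed for this one)
repo=$(mktemp -d)
cd "$repo"
git init --quiet

echo "This is my new file" > my_new_file.txt
# Prints "?? my_new_file.txt": the file exists but is untracked
git status --short

git add my_new_file.txt
# Prints "A  my_new_file.txt": the file is newly added and ready to commit
git status --short
```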
Working Area, Staged Changes, and Commits
Now, let’s unpack what we saw earlier with the references to unstage.
Git essentially has three logical areas in which our files and their changes can exist. These are:
- Working Area
- Staged Changes
- Commits
Let’s step through each of these areas and see how they relate to each other.
The working area is where we make changes to our files: adding, removing, and changing their contents. This is the area we are most familiar with, as it is pretty much our normal experience of working with files, even outside of using Git.
gwatts@my-computer:~/Temp/my_repo$ touch some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        some_other_file.txt
nothing added to commit but untracked files present (use "git add" to track)
The staged changes area is where we gather the changes we want to include in our next commit. We add files to the staged changes area using the git add command, and we can remove them from it again using the git restore --staged command.
The staged area acts as an intermediate step and allows us to have files in our working area that we don’t want to add to the repository yet. Maybe it’s a new file we’re working on and we’re not ready to commit it yet. Or maybe it’s a file that we never want to commit as it’s something local to us. Hint: we can also use something called .gitignore for this, but we’ll get to that later. For now, just know that files in the staged area are ready to be committed to the repository but have not yet been committed, while files in the working area are not yet ready to be added. I hope that makes sense? Whether it does or not, have a play around with adding (git add) and removing (git restore --staged <filename>) files from the staged area and see what happens.
gwatts@my-computer:~/Temp/my_repo$ git add some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git restore --staged some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        some_other_file.txt
nothing added to commit but untracked files present (use "git add" to track)
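To list only what’s currently staged, git diff --cached --name-only is handy. A minimal sketch in a throwaway sandbox with a placeholder identity; the empty first commit (--allow-empty) stands in for the commits we made earlier, so the sandbox matches the situation above.

```shell
# Throwaway sandbox repository with a local, placeholder identity
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
git commit --quiet --allow-empty -m "Initial commit"

touch some_other_file.txt staged_file.txt
git add staged_file.txt

# List only the paths in the staging area; prints "staged_file.txt"
git diff --cached --name-only

# Move the file back out of the staging area; the file itself stays in the working area
git restore --staged staged_file.txt
# Nothing is staged any more, so this prints nothing
git diff --cached --name-only
```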
As we just said, but it bears repeating for clarity: changes in the staged changes area are not yet committed to the repository. We can see which files are in the staged changes area by using the git status command. If we somehow lost these files and their changes we couldn’t get them back from the repository, as they were never committed.
Top tip: Don’t leave files in the staged changes area for too long. If you do, you may forget what changes you made and why you made them. It’s best to commit your changes as soon as you can.
The commits area is where our changes are finally stored in the repository. Once files, and their changes, are added here we can inspect their history over time, revert back to previous versions, and generally feel safe that, as long as the repository itself is safe, changes committed here are safe too.
gwatts@Graham-T14:~/Temp/my_repo$ git add some_other_file.txt
gwatts@Graham-T14:~/Temp/my_repo$ git commit -m "Adding some_other_file.txt"
[master b0a110c] Adding some_other_file.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 some_other_file.txt
While I’m here, I’ll note that different individuals and teams have different views when it comes to commits. Some advocate for fewer, larger commits, as this keeps the commit history cleaner and there are fewer commits to look back through for changes. Others prefer smaller commits made more often, as this ensures changes are captured, reduces the risk of losing changes if something goes wrong, and reduces the risk of merge conflicts when working with others.
There are arguments either way on this one. My advice is not to get too caught up on it until you start working with a team, and to follow their collective preference once you do. I personally fall on the side of smaller commits, more often, but I’m not going to get into a fight over it.
Viewing the Commit History
Once we have some changes to our repository we can start to see the history of our changes. We can do this by using the git log command.
git log # Show the commit history of the repository
For our changes so far the output should look something like this:
commit b5259ef09423f32a0b84d5cc1a6713b0346733ab (HEAD -> master)
Author: Graham Watts <user_email>
Date:   Wed Jan 18 10:40:42 2023 +0000

    Added another line to my_file.txt

commit 99b4f82b3360a17f64c3554304ac2020e7e160a9
Author: Graham Watts <user_email>
Date:   Wed Jan 18 10:39:28 2023 +0000

    Added my_file.txt
With this tool we can see the history of our changes: the commit message, the author, the date, and the commit hash. The commit hash is a unique identifier for the commit, and we can use it to reference a specific commit in the future. We’ll put this to use in future posts.
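Two handy ways to put commit hashes to use, shown as a minimal sketch in a throwaway sandbox (placeholder identity): git log --oneline prints an abbreviated hash and the subject for each commit, and git show <commit>:<path> prints a file exactly as it was in that commit.

```shell
# Throwaway sandbox repository with a local, placeholder identity
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello World" > my_file.txt
git add my_file.txt && git commit --quiet -m "Added my_file.txt"
echo "This is a new line" >> my_file.txt
git add my_file.txt && git commit --quiet -m "Added another line to my_file.txt"

# One line per commit, newest first: abbreviated hash plus subject
git log --oneline

# Full hash of the first (root) commit, then the file as it was in that commit
first=$(git rev-list --max-parents=0 HEAD)
git show "$first:my_file.txt"
```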
Getting Started with Branches
The last subject I want to cover in this getting started guide is branches. Branches are a really powerful tool in Git and are a great way to work on new features or changes without affecting the main codebase. We can create a branch using the git branch command.
git branch my_new_branch # Create a new branch called my_new_branch
What this command does is create a new branch called my_new_branch, starting from the current state of the repository. Under the hood Git doesn’t copy any files; a branch is a lightweight pointer to a commit. This new branch is also part of our repository. Note that git branch creates the branch but does not switch to it. Once we are on the new branch, we can make changes to files in our repository without affecting the versions on the original branch.
We can see our new branch, and any others, using the git branch command again with the -a flag.
git branch -a # Show all branches in the repository
# returned output
* master
  my_new_branch
The * indicates the current branch we are on. We can switch between branches using the git checkout command.
git checkout my_new_branch # Switch to the my_new_branch branch
# returned output
Switched to branch 'my_new_branch'
Running the git branch -a command again will show us that we are now on the my_new_branch branch.
git branch -a # Show all branches in the repository
# returned output
  master
* my_new_branch
We can also see our branch in the output of the git status command.
git status
# returned output
On branch my_new_branch
nothing to commit, working tree clean
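Creating a branch and switching to it is so common that git checkout has a shortcut for it, the -b flag. A minimal sketch in a throwaway sandbox (placeholder identity, with an empty first commit so the branch has something to start from). On Git 2.23 or newer, git switch -c my_new_branch does the same thing.

```shell
# Throwaway sandbox repository with a local, placeholder identity
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
git commit --quiet --allow-empty -m "Initial commit"

# Create my_new_branch and switch to it in one step (git branch + git checkout)
git checkout --quiet -b my_new_branch

# Print the name of the branch we are currently on
git rev-parse --abbrev-ref HEAD
```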
Now that we have a new branch let’s add one more new file to our repository but on this new branch.
# First make sure we're on the my_new_branch branch
git checkout my_new_branch
# Should return the following output if we're on the correct branch already:
# Already on 'my_new_branch'
# Now let's add a new file to the repository
touch another_new_file.txt
# Let's add some text for good measure
echo "This is another new file" >> another_new_file.txt
# Now let's add the file to the staged changes area
git add another_new_file.txt
# And commit the changes
git commit -m "Added another_new_file.txt"
If we inspect our repository now using ls we should see the new file.
ls -la
# returned output - something like this
total 24
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:46 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:46 .git
-rw-r--r-- 1 gwatts gwatts   25 Jan 18 11:46 another_new_file.txt
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt
Now, if we switch back to the master branch with the git checkout master command we should see that the new file is not there.
git checkout master
# returned output
Switched to branch 'master'
# Now let's check the files in the repository
ls -la
# returned output - something like this
total 20
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:48 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:48 .git
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt
It’s not that our file has been lost or has magically disappeared. It’s just that it’s not on the master branch. If we switch back to the my_new_branch branch we can see that the file is still there.
git checkout my_new_branch
# returned output
Switched to branch 'my_new_branch'
# Now let's check the files in the repository
ls -la
# returned output - something like this
total 24
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:46 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:46 .git
-rw-r--r-- 1 gwatts gwatts   25 Jan 18 11:46 another_new_file.txt
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt
If we’re happy with the changes we’ve made in our new branch we can merge them back into the master branch. We can do this using the git merge command.
git checkout master # Switch to the master branch
git merge my_new_branch # Merge the my_new_branch branch into the master branch
# returned output
Updating df25842..3583577
Fast-forward
 another_new_file.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 another_new_file.txt
If we inspect our repository now we should see that the new file is now in the master branch.
ls -la
# returned output - something like this
total 24
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:52 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:52 .git
-rw-r--r-- 1 gwatts gwatts   25 Jan 18 11:52 another_new_file.txt
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt
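The whole branch-and-merge round trip can be sketched in one throwaway sandbox (placeholder identity). Two assumptions worth calling out: the default branch may be called master or main depending on your Git configuration, so the script captures its name rather than hard-coding it; and it uses --ff-only, which makes git merge refuse anything other than a fast-forward like the one shown above. git ls-tree --name-only lists the files on a branch without switching to it.

```shell
# Throwaway sandbox repository with a local, placeholder identity
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello World" > my_file.txt
git add my_file.txt && git commit --quiet -m "Added my_file.txt"
# Remember the starting branch name (master or main, depending on config)
base=$(git rev-parse --abbrev-ref HEAD)

git checkout --quiet -b my_new_branch
echo "This is another new file" > another_new_file.txt
git add another_new_file.txt && git commit --quiet -m "Added another_new_file.txt"

# Files tracked on each branch, listed without switching branches
git ls-tree --name-only "$base"
git ls-tree --name-only my_new_branch

git checkout --quiet "$base"
# Only succeeds if the base branch can simply move forward to my_new_branch's tip
git merge --ff-only my_new_branch
```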
As with everything, there are many schools of thought and approaches to branches and merging. Typically you’ll want to keep your branches as small as possible and merge them back into the main branch as soon as possible. This will help to avoid merge conflicts and keep your codebase clean. But, again, when you start working with a team they may employ another approach. It’s important to understand the basics of branches and merging so that you can work with your team and understand what they’re doing.
In this post we’ve covered the basics of working with Git: creating a repository, adding files, committing changes, checking the status of the repository, and viewing its history. We’ve also covered the basics of branching and merging. In a future post I’ll cover some of the more advanced features of Git, along with services such as GitHub, including working with remotes, pull requests, and more.
Remember, check out my Git Knowledge Article for tips and tricks when working with Git.
Scott Hanselman has done some great videos on the basics of working in IT and software development including a really nice Git 101 video. I highly recommend checking it out.
If this article helped inspire you please consider sharing this article with your friends and colleagues, or let me know via LinkedIn or Twitter. If you have any ideas for further content you might like to see please let me know too.