We will explore the .git folder of TensorFlow along with the .github folder and the .gitignore file. The first step is to get the TensorFlow source code from GitHub. We will clone it using the following command in the command line:
git clone https://github.com/tensorflow/tensorflow
TensorFlow will be cloned and the output of the above command will look like:
Cloning into 'tensorflow'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 684345 (delta 13), reused 21 (delta 5), pack-reused 684298
Receiving objects: 100% (684345/684345), 389.45 MiB | 3.09 MiB/s, done.
Resolving deltas: 100% (555319/555319), done.
Checking out files: 100% (20219/20219), done.
Move into the tensorflow directory:
cd tensorflow
We need to check the contents of the tensorflow directory to determine which folders we want to explore to understand Git. ls is the command to list out the contents:
ls
The output will look like:
ACKNOWLEDGMENTS     CODEOWNERS       ISSUE_TEMPLATE.md  tensorflow
ADOPTERS.md         configure        LICENSE            third_party
arm_compiler.BUILD  configure.cmd    models.BUILD       tools
AUTHORS             configure.py     README.md          WORKSPACE
BUILD               CONTRIBUTING.md  RELEASE.md
CODE_OF_CONDUCT.md  ISSUES.md        SECURITY.md
The problem is that the ls command does not list hidden files and folders. To list all contents, including hidden ones, we need to use:
ls -a
The output will be:
.                   BUILD               .git               README.md
..                  CODE_OF_CONDUCT.md  .github            RELEASE.md
ACKNOWLEDGMENTS     CODEOWNERS          .gitignore         SECURITY.md
ADOPTERS.md         configure           ISSUES.md          tensorflow
arm_compiler.BUILD  configure.cmd       ISSUE_TEMPLATE.md  third_party
AUTHORS             configure.py        LICENSE            tools
.bazelrc            CONTRIBUTING.md     models.BUILD       WORKSPACE
We can see several items of interest:
- .git (a folder)
- .gitignore (a file)
- .github (a folder used by GitHub; we will explore it later)
Before we begin the adventure, let us check out a branch and a tag.
Let us get the r1.14 branch and then go back to the master branch as follows:
git checkout r1.14
git checkout master
Let us get a tag from TensorFlow as:
git checkout v1.14.0
The output of the command will be:
Checking out files: 100% (9746/9746), done.
Note: checking out 'v1.14.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 87989f6959 Add Sergii Khomenko to contributor list
With this, we have created some log entries and checked out a branch and a tag. Let us get started now.
Let us go into the .git folder:
cd .git
We will list out all contents using ls -a:
.   branches  description  hooks  info  objects      refs
..  config    HEAD         index  logs  packed-refs
Folders: branches, hooks, info, objects, refs, logs
Files: description, config, HEAD, index, packed-refs
All of these folders and files are of interest, and we will go through them one by one.
To begin with, if we go into the branches folder and list out its contents, we will see that it is empty. This is because the branches folder is deprecated. Originally (before 2009), it was used for specifying URLs for various operations like git fetch, but Git switched to a different approach.
For this reason, some repositories may not have this folder. More recently (in 2017), Git brought back the branches folder, but it is not used for any feature.
Hence, this folder will remain empty.
The hooks folder in Git contains scripts which are executed before or after specific Git operations. This is especially useful when the project is large and involves many components and checks. For example, before every git push, one can check whether the build of TensorFlow is passing and take action accordingly.
The contents of the hooks folder are as follows:
.                          post-update.sample         pre-rebase.sample
..                         pre-applypatch.sample      pre-receive.sample
applypatch-msg.sample      pre-commit.sample          update.sample
commit-msg.sample          prepare-commit-msg.sample
fsmonitor-watchman.sample  pre-push.sample
As we see, there are 11 sample hooks. They are installed by Git itself when the repository is created and stay inactive as long as their names end in .sample; removing the suffix enables a hook. For example, pre-push.sample, once renamed to pre-push, would run before every git push command. Let us check this file by opening it:
#!/bin/sh

remote="$1"
url="$2"

z40=0000000000000000000000000000000000000000

while read local_ref local_sha remote_ref remote_sha
do
    if [ "$local_sha" = $z40 ]
    then
        # Handle delete
        :
    else
        if [ "$remote_sha" = $z40 ]
        then
            # New branch, examine all commits
            range="$local_sha"
        else
            # Update to existing branch, examine new commits
            range="$remote_sha..$local_sha"
        fi

        # Check for WIP commit
        commit=`git rev-list -n 1 --grep '^WIP' "$range"`
        if [ -n "$commit" ]
        then
            echo >&2 "Found WIP commit in $local_ref, not pushing"
            exit 1
        fi
    fi
done

exit 0
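As we see, it prevents pushing commits whose log message starts with "WIP" (the pattern is '^WIP'), which denotes that the commit is work in progress and shall be pushed later.
These sample hooks are provided by Git by default. Hooks live only in the local .git folder and are not transferred by git clone, so what we see here is Git's stock pre-push sample rather than a script written by TensorFlow. We can add new hooks or modify the existing ones to suit our workflow.
The other files are super interesting as well. Do check them out and enjoy with a drink.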
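To make the branching in the pre-push script easier to follow, here is a minimal Python sketch of the same decisions. The helper names push_range and is_wip are hypothetical and chosen only for illustration; the real hook delegates the message search to git rev-list.

```python
# Sketch of the range-selection logic in the sample pre-push hook.
# push_range and is_wip are hypothetical helper names, not part of Git.

ZERO_SHA = "0" * 40  # Git uses an all-zero hash to mean "no commit"


def push_range(local_sha, remote_sha):
    """Return the revision range the hook would scan, or None for a delete."""
    if local_sha == ZERO_SHA:
        # The ref is being deleted, so there is nothing to scan.
        return None
    if remote_sha == ZERO_SHA:
        # The branch is new on the remote: all reachable commits are examined.
        return local_sha
    # The branch already exists remotely: only the new commits are examined.
    return f"{remote_sha}..{local_sha}"


def is_wip(message_line):
    """Mirror the hook's '^WIP' pattern on one line of a commit message."""
    return message_line.startswith("WIP")
```

Each line the hook reads from standard input carries the four values local_ref, local_sha, remote_ref and remote_sha; only the two hashes matter for choosing the range.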
The info folder has a file named exclude which contains patterns of files to be ignored. In TensorFlow, the contents of this file are as follows:
# git ls-files --others --exclude-from=.git/info/exclude
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
# *.[oa]
# *~
As we see, every pattern is commented out, so no exclude patterns are active. This file is Git's default template; it is local to our clone and is never shared, which makes it the place for personal ignore rules. Rules meant for everyone go into the .gitignore file, where they can be reviewed like any other change.
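To get a feel for what the two commented-out patterns would match if they were enabled, here is a small sketch using Python's fnmatch module. This is only an approximation: fnmatch agrees with Git for simple patterns like these, but it does not implement the full ignore rules. The function name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase

# The two commented-out patterns from the default .git/info/exclude file.
PATTERNS = ["*.[oa]", "*~"]


def is_excluded(filename):
    """Return True if a bare file name matches any of the sample patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


for name in ["main.o", "libfoo.a", "notes.txt~", "main.c"]:
    print(name, is_excluded(name))
```

The first pattern covers object files and static libraries, and the second covers editor backup files, which is why the template suggests them for a project mostly in C.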
The objects folder has two folders within it, namely:
- info
- pack
The info folder is empty. The pack folder has two files as follows:
.   pack-1759b2236450cbd53a5a2aa4ef109e12b48aaade.idx
..  pack-1759b2236450cbd53a5a2aa4ef109e12b48aaade.pack
The objects folder is Git's object database: it stores commits, trees and file contents (blobs), each identified by its hash. Right after a clone, all objects live inside the single pack file above, so there are no loose object files. Once you make some commits to fix a bug or add a feature, new loose objects will be created in subfolders named after the first two characters of their hash, which you can explore.
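As an illustration of how an object gets its name, the sketch below computes the SHA-1 identifier Git assigns to a file's contents (a blob) and the path a loose object with that identifier would have. The function names are hypothetical, and the sketch assumes the default SHA-1 object format.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash the content the way Git does for blobs: header, NUL byte, data."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Loose objects live in a folder named after the first two hex digits."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty = blob_id(b"")
print(empty)
print(loose_object_path(empty))
```

Because the identifier depends only on the content, two files with identical contents share a single object.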
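The refs folder has three folders within it:
- heads
- remotes
- tags
The heads folder has a file for each local branch that we have used in our clone. As we have checked out a separate branch (r1.14) as well, we have two files, namely:
- master
- r1.14
Each file holds the hash of the latest commit on that branch, which is where Git points HEAD when the branch is checked out. These are the same hashes that appear in the logs shown later. Content of the master file is:
0fa9f305bb24b2222ddff8ed0300c2e77c9cb96e
Content of the r1.14 file is:
00fad90125b18b80fe054de1055770cfb8fe4ba3
We have one remote repository, so the remotes folder has only one folder, namely:
- origin
The origin folder has a file named HEAD with the following content:
ref: refs/remotes/origin/master
If we add a new remote and fetch from it, a new folder for that remote will appear.
In the tags folder, we have no files or folders. We have not created any tags locally, and the tags fetched during the clone are stored in the packed-refs file instead.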
In the logs folder, we have one file HEAD and one folder refs. This refs folder mirrors the refs folder described above and tracks the log of each activity per reference. The HEAD file has stored the logs as:
0000000000000000000000000000000000000000 0fa9f305bb24b2222ddff8ed0300c2e77c9cb96e opengenus <email@example.com> 1568766544 -0400 clone: from https://github.com/tensorflow/tensorflow
0fa9f305bb24b2222ddff8ed0300c2e77c9cb96e 00fad90125b18b80fe054de1055770cfb8fe4ba3 opengenus <firstname.lastname@example.org> 1568767275 -0400 checkout: moving from master to r1.14
00fad90125b18b80fe054de1055770cfb8fe4ba3 0fa9f305bb24b2222ddff8ed0300c2e77c9cb96e opengenus <email@example.com> 1568767395 -0400 checkout: moving from r1.14 to master
0fa9f305bb24b2222ddff8ed0300c2e77c9cb96e 87989f69597d6b2d60de8f112e1e3cea23be7298 opengenus <firstname.lastname@example.org> 1568774984 -0400 checkout: moving from master to v1.14.0
87989f69597d6b2d60de8f112e1e3cea23be7298 0fa9f305bb24b2222ddff8ed0300c2e77c9cb96e opengenus <email@example.com> 1568775029 -0400 checkout: moving from 87989f69597d6b2d60de8f112e1e3cea23be7298 to master
As we can see, it has captured the activity of the git commands used. From the logs, we can infer the following activity:
- cloned tensorflow
- moved from master to r1.14
- moved from r1.14 to master
- moved from master to v1.14.0
- moved from v1.14.0 to master
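Each log line follows a fixed layout: old hash, new hash, user name, email, Unix timestamp, timezone offset and a message. The following Python sketch splits one line into those fields. The ReflogEntry type and parse_reflog_line function are hypothetical names for this illustration, and the regular expression is an assumption based on the lines shown above rather than an official grammar.

```python
import re
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str
    new: str
    name: str
    email: str
    timestamp: int
    tz: str
    message: str


# Two 40-character hashes, "name <email>", a Unix time, an offset, a message.
LINE_RE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<timestamp>\d+) (?P<tz>[+-]\d{4})\s+(?P<message>.*)$"
)


def parse_reflog_line(line):
    """Split one line of .git/logs/HEAD into its fields."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a reflog line: {line!r}")
    fields = match.groupdict()
    fields["timestamp"] = int(fields["timestamp"])
    return ReflogEntry(**fields)
```

Reading the message field of every entry in order gives exactly the list of activities inferred above.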
Let us check the files one by one. We had the following files: description, config, HEAD, index, packed-refs
The content of the config file is as follows:
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/tensorflow/tensorflow
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
[branch "r1.14"]
    remote = origin
    merge = refs/heads/r1.14
It has the URL of each remote and the remote branch tracked by each local branch, along with a few other parameters like logallrefupdates, which, if set to true, records every reference update in the logs folder.
We have only one remote and two branches.
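Since the config file is INI-like, a simple one such as this can be read with Python's configparser. This is a sketch under that assumption only: Git's own parser supports features (for example includes and repeated keys) that configparser does not. The load_git_config name is hypothetical.

```python
import configparser

# A shortened copy of the .git/config content from this article.
SAMPLE = """\
[core]
    repositoryformatversion = 0
    bare = false
[remote "origin"]
    url = https://github.com/tensorflow/tensorflow
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "r1.14"]
    remote = origin
    merge = refs/heads/r1.14
"""


def load_git_config(text):
    """Parse simple Git config text; only '=' separates keys from values."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    return parser


config = load_git_config(SAMPLE)
# Follow the branch section to its remote, then look up that remote's URL.
upstream = config['branch "r1.14"']["remote"]
print(config[f'remote "{upstream}"']["url"])
```

The lookup mirrors what Git does for a pull on r1.14: the branch section names the remote, and the remote section supplies the URL.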
The contents of the description file are as follows:
Unnamed repository; edit this file 'description' to name the repository.
This file contains the name of the repository, which is usually read by tools such as GitWeb or by hook scripts that need to display the repository name.
Contents of the HEAD file (the logs show that our last checkout moved back to master):
ref: refs/heads/master
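The HEAD file holds either a symbolic reference to a branch or, in the 'detached HEAD' state we saw when checking out the tag, a bare commit hash. A minimal sketch of telling the two apart follows; describe_head is a hypothetical helper name.

```python
def describe_head(content):
    """Classify the text of .git/HEAD as a branch reference or a raw hash."""
    content = content.strip()
    if content.startswith("ref: "):
        # Symbolic reference: HEAD follows whatever the named branch points to.
        return ("branch", content[len("ref: "):])
    # Otherwise HEAD stores a commit hash directly (detached HEAD).
    return ("detached", content)


print(describe_head("ref: refs/heads/master\n"))
print(describe_head("87989f69597d6b2d60de8f112e1e3cea23be7298\n"))
```

The second call uses the commit that the v1.14.0 checkout landed on, as recorded in the logs.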
The index file is a binary file which holds the list of tracked files along with their permissions and the SHA-1 hash of the associated objects. We can see its content using the git ls-files command:
git ls-files
A small section of the output:
tensorflow/core/kernels/gather_nd_op.h
tensorflow/core/kernels/gather_nd_op_cpu_impl.h
tensorflow/core/kernels/gather_nd_op_cpu_impl_0.cc
tensorflow/core/kernels/gather_nd_op_cpu_impl_1.cc
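Although the index is binary, its first 12 bytes are easy to decode: a 4-byte signature "DIRC", a 4-byte version number and a 4-byte entry count, all big-endian. The sketch below parses such a header. It builds a fake header in memory rather than reading a real .git/index, and the version and entry count used are made-up illustration values.

```python
import struct


def parse_index_header(data: bytes):
    """Decode the signature, version and entry count at the start of an index."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Fake header for illustration: version 2 with 20219 entries (assumed values).
fake_header = struct.pack(">4sII", b"DIRC", 2, 20219)
print(parse_index_header(fake_header))
```

To try it on a real repository, one could pass the bytes read from .git/index in binary mode instead of fake_header.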
The first few lines of the packed-refs file are as follows:
# pack-refs with: peeled fully-peeled sorted
4be56f381cd000e91f79209aaf150636db6fb840 refs/remotes/origin/0.6.0
807f95063c1e1072fe5b936abf529e133010ec46 refs/remotes/origin/1.8.0
ca2f3de8daa10b18fe2314b2494e94317885b928 refs/remotes/origin/backend_api_cherrypick
711d4fe8132c3cdd70c3230997189d1b87c695de refs/remotes/origin/bananabowl-patch-1
1e57145558a50a972963217c468118f5c3569364 refs/remotes/origin/cherrypick_batch_dot
It maps commit hashes to reference names such as remote branches and tags. Git reads this file when a reference has no file of its own under the refs folder, for example when checking out a branch, as the hash identifies the commit it needs to go to.
With this, we have explored the .git folder and are left with the .github folder and the .gitignore file. Let us dive into them.
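The mapping described above is simple enough to load with a few lines of Python. The sketch below turns the text of a packed-refs file into a dictionary from reference name to hash; parse_packed_refs is a hypothetical name, and lines starting with "^" (peeled tag targets) are skipped for simplicity.

```python
def parse_packed_refs(text):
    """Return a dict mapping each reference name to its commit hash."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the "# pack-refs with:" header and "^" peeled-tag lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


sample = """\
# pack-refs with: peeled fully-peeled sorted
4be56f381cd000e91f79209aaf150636db6fb840 refs/remotes/origin/0.6.0
807f95063c1e1072fe5b936abf529e133010ec46 refs/remotes/origin/1.8.0
"""
print(parse_packed_refs(sample)["refs/remotes/origin/1.8.0"])
```

A lookup in this dictionary corresponds to what happens when a reference has no loose file under the refs folder.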
The .github folder has only one folder:
. .. ISSUE_TEMPLATE
ISSUE_TEMPLATE has the following files within it:
.   00-bug-performance-issue.md     20-documentation-issue.md  40-tflite-op-request.md
..  10-build-installation-issue.md  30-feature-request.md      50-other-issues.md
These files contain the templates offered when filing an issue on GitHub. The files are used as follows:
- 00-bug-performance-issue.md: template for reporting a bug or a performance issue
- 10-build-installation-issue.md: template for build/installation issues
- 20-documentation-issue.md: template for reporting documentation issues
- 30-feature-request.md: template for opening a feature request
- 40-tflite-op-request.md: template for reporting TensorFlow Lite ops being used or missing
- 50-other-issues.md: template for any other non-support related issues
The .gitignore file contains the list of files and folders which Git will not track; that is, untracked files matching these patterns are left out of git status and git add.
The contents of the .gitignore file of TensorFlow are as follows:
.DS_Store
.ipynb_checkpoints
node_modules
/.bazelrc.user
/.tf_configure.bazelrc
/bazel-*
/bazel_pip
/tools/python_bin_path.sh
/tensorflow/tools/git/gen
/pip_test
/_python_build
*.pyc
__pycache__
*.swp
.vscode/
cmake_build/
tensorflow/contrib/cmake/_build/
.idea/**
/build/
[Bb]uild/
/tensorflow/core/util/version_info.cc
/tensorflow/python/framework/fast_tensor_util.cpp
/tensorflow/lite/gen/**
/tensorflow/lite/tools/make/downloads/**
/api_init_files_list.txt
/estimator_api_init_files_list.txt
*.whl

# Android
.gradle
.idea
*.iml
local.properties
gradleBuild

# iOS
*.pbxproj
*.xcworkspace
/*.podspec
/tensorflow/lite/**/[ios|objc|swift]*/BUILD
/tensorflow/lite/examples/ios/simple/data/*.tflite
/tensorflow/lite/examples/ios/simple/data/*.txt
Podfile.lock
Pods
xcuserdata
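The patterns above come in a few flavours: comment lines, patterns containing a slash at the start or in the middle (which are matched relative to the folder holding the .gitignore file), and patterns ending in a slash (which match folders only). The sketch below sorts lines into those categories. It only classifies patterns and does not match files against them; classify is a hypothetical helper name.

```python
def classify(line):
    """Describe one .gitignore line without matching any files."""
    text = line.strip()
    if not text:
        return None  # blank lines only separate groups of patterns
    if text.startswith("#"):
        return {"kind": "comment", "text": text}
    return {
        "kind": "pattern",
        "text": text,
        # A slash at the start or in the middle ties the pattern to the
        # folder that contains the .gitignore file.
        "anchored": "/" in text.rstrip("/"),
        # A trailing slash restricts the pattern to folders.
        "directory_only": text.endswith("/"),
    }


for entry in ["# Android", "/build/", "[Bb]uild/", "*.pyc"]:
    print(classify(entry))
```

For example, /build/ only ignores a build folder at the top of the repository, while [Bb]uild/ ignores a folder named build or Build at any depth.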
With this, we have explored the entire Git setup of TensorFlow, and you must have learnt a lot in the process: how hooks such as the sample pre-push hook can block commits marked "WIP", how TensorFlow uses custom GitHub issue templates, and much more.