Last Update: 06.07.2018. By Jens in Developers Life | Learning | Newsletter
We touched on this topic yesterday, but I thought we could talk about it a bit further.
The question is: Should you go with a single big repo, a few repos, or one repo per module?
This question is as emotionally loaded as the age-old fight of spaces vs. tabs. Yet it is more important and has a bigger effect on day-to-day work.
But what are the pros and cons of each option?
Single Big repo
Working with a single repo makes context switching much easier. You don’t have to think about “damn, which repo was this functionality in?”. It is also easier to point a new dev to just one entry point: there is one thing, and nobody can miss it. You could also version the whole repo as a single unit, which might remove some overhead.
The downsides are manifold though. You need a strict structure or it will become a mess. Also, devs need to be a bit more careful to stay in scope and not accidentally commit other stuff, like the dot files of IDEs… Basically, anybody can access anything and thus break anything.
The repo also becomes huge over time and its history gets bloated. It might be harder to find relevant code changes, and it takes more time and bandwidth to clone or sync.
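The "stay in scope" discipline mentioned above can be partly automated. Here is a minimal sketch, assuming you already have the list of changed paths for a commit and that each module lives under its own top-level folder. The function name `out_of_scope`, the `IDE_NAMES` set and the example paths are all made up for illustration, not taken from any particular tool.

```python
from pathlib import PurePosixPath

# Hypothetical set of IDE/editor files and folders that should not be committed.
IDE_NAMES = {".idea", ".vscode", ".project", ".classpath", ".settings"}


def out_of_scope(changed_paths, allowed_prefix):
    """Return the changed paths that lie outside allowed_prefix or belong to an IDE."""
    allowed = PurePosixPath(allowed_prefix)
    flagged = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Any path component matching a known IDE name counts as IDE residue.
        is_ide_file = any(part in IDE_NAMES for part in path.parts)
        # The path is in scope when the allowed folder is one of its ancestors.
        in_scope = allowed in path.parents
        if is_ide_file or not in_scope:
            flagged.append(raw)
    return flagged


# Example: a dev working on the (hypothetical) "billing" module.
changed = [
    "billing/src/Invoice.java",
    "billing/.idea/workspace.xml",
    "shipping/src/Label.java",
]
print(out_of_scope(changed, "billing"))
```

A check like this could run before a commit is accepted. It does not replace a strict structure, it only makes slips visible earlier.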
The opposite is the “every module has its own repo” approach.
On the pro side, you get separate code bases, so devs can only break a single module, you can easily see the changes to the code at hand, and it’s pretty lightweight to clone and sync.
The biggest con is that you end up with dozens or maybe hundreds of repos. If you need to change something deep in the dependency tree, you must hunt down the repo, in the right version, and make your changes. Chances are you need to touch some more repos because of dependencies (think Maven pom updates). Now, instead of just committing your fix to one repo, you might commit to multiple repos. As they are not connected, committing becomes overhead, and searching the commit history gets a bit harder too. Yep, you have to look in more than one repo.
You need to keep track of all repos. It can quickly become complex, and not every dev can handle that.
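To get a feeling for how far a change deep in the dependency tree can ripple, here is a small sketch. It assumes you maintain some map of which repo depends on which. The repo names and the `DEPENDS_ON` map are invented for the example, and in a real setup you would have to derive that map from your build files.

```python
from collections import deque

# Hypothetical map: repo name -> repos it depends on directly.
DEPENDS_ON = {
    "webshop": ["billing", "common-utils"],
    "billing": ["common-utils"],
    "reporting": ["billing"],
    "common-utils": [],
}


def affected_repos(changed, depends_on):
    """Return every repo that depends, directly or transitively, on `changed`."""
    # Invert the map so we can walk from a repo to the repos that use it.
    dependents = {name: [] for name in depends_on}
    for repo, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(repo)

    # Breadth-first walk over the inverted map.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for repo in dependents.get(current, []):
            if repo not in seen:
                seen.add(repo)
                queue.append(repo)
    return sorted(seen)


# A fix in "common-utils" may mean touching three more repos.
print(affected_repos("common-utils", DEPENDS_ON))
```

With one repo per module, every name in that result is a separate checkout, a separate commit and possibly a separate version bump.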
The middle ground between both approaches is to settle for a few repos. So, for example, each major application or highly independent lib resides in its own repo. The complexity shrinks and you still get the benefit that a single commit cannot break everything. Which direction you lean in depends on the project and company at hand.
I’ve worked with all three versions so far and have come to the conclusion that a good middle ground gives the least hassle. Complaints and confusion go down, and devs still don’t break anything. It’s also a good compromise for handling fix branches and versions.
If in doubt, I’d also start with fewer big repos instead of the other way around. It’s easier to split them up if needed than it is to combine smaller ones into bigger repos.