Microsoft discusses progress in Git migration

Just last month, Microsoft announced plans to close its CodePlex code hosting service in favor of GitHub, which it had been using more and more frequently anyway. This week, the company reported on its progress in moving Windows development to Git and the issues that cropped up along the way.

Microsoft staffer Brian Harry said in a blog post that the Windows repository is the largest Git repository in the world. Weighing in at 300GB and 3.5 million files, the repository handles 8,421 pushes and 1,760 official builds a day. Overall, Microsoft has nearly 4,000 engineers working on Windows.

“All three of the dimensions (file count, repo size and activity) independently provide daunting scaling challenges, and taken together they make it unbelievably challenging to create a great experience. Before the move to Git, in Source Depot, it was spread across 40-plus depots and we had a tool to manage operations that spanned them,” he wrote.

However, it wasn’t a flawless migration. Harry noted that 28 percent of the 251 staff members who responded to an internal survey aren’t happy with the move. Their reasons include tools that don’t support Git, having to learn a new process, and performance that falls short of demand.

“I’m not going to jump up and down and celebrate those numbers, but for a team that had just had their whole life changed, had to learn a new way of working and were living through a transition that was very much a work in progress, I felt reasonably good about it,” he wrote.

Microsoft develops Git Virtual File System

Harry noted that Microsoft has developers working on Windows all over the world, and in worst-case scenarios, simple requests would take hours. Git wasn’t built for a project of this size, so Microsoft developed the Git Virtual File System (GVFS) to provide the benefits of Git and greater speed.

GVFS virtualizes the file system beneath your repo and makes it appear as though all the files in your repo are present locally. However, it doesn’t download a file’s contents until you actually open that file. It also actively manages how much of the repo Git has to consider in operations such as checkout and status. Files that aren’t needed are ignored.
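
The on-demand behavior described above can be illustrated with a toy model. This is a minimal sketch of the idea only, not how GVFS is implemented: the `LazyRepo` class, its `fetch` callback, and the `status_scope` method are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List


class LazyRepo:
    """Toy model of on-demand file download (illustrative, not GVFS itself)."""

    def __init__(self, manifest: List[str], fetch: Callable[[str], bytes]) -> None:
        self._manifest = set(manifest)       # every path the repo contains
        self._fetch = fetch                  # hypothetical remote download function
        self._local: Dict[str, bytes] = {}   # contents downloaded so far

    def listdir(self) -> List[str]:
        # Every file appears to be present, even if it was never downloaded.
        return sorted(self._manifest)

    def open(self, path: str) -> bytes:
        if path not in self._manifest:
            raise FileNotFoundError(path)
        if path not in self._local:
            # First access: download the contents and keep them locally.
            self._local[path] = self._fetch(path)
        return self._local[path]

    def status_scope(self) -> List[str]:
        # Only files that were actually opened need to be examined;
        # everything else can be skipped.
        return sorted(self._local)


if __name__ == "__main__":
    repo = LazyRepo(["kernel/a.c", "shell/b.c"], lambda p: f"// {p}".encode())
    print(repo.listdir())       # both paths are listed
    repo.open("kernel/a.c")     # triggers the only download
    print(repo.status_scope())  # only the opened file is in scope
```

In this sketch, listing files costs nothing beyond the manifest, and the set of files considered for later operations grows only as files are opened.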

There is also a proxy server to cache Git data. In one experiment, the North Carolina office went from needing 25 minutes to clone the repository without the proxy to just 70 seconds with the proxy.
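
The general idea behind such a proxy is a read-through cache: the first request for a piece of data goes to the distant origin server, and later requests are served from the nearby copy. The sketch below shows that pattern under stated assumptions; `CachingProxy` and its `origin` callback are hypothetical names, and nothing here reflects the actual design of Microsoft's proxy.

```python
from typing import Callable, Dict


class CachingProxy:
    """Read-through cache in front of a slow origin (illustrative only)."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self._origin = origin               # hypothetical slow, far-away fetch
        self._cache: Dict[str, bytes] = {}  # object id -> cached bytes
        self.hits = 0
        self.misses = 0

    def get(self, object_id: str) -> bytes:
        if object_id in self._cache:
            # Served locally without contacting the origin.
            self.hits += 1
            return self._cache[object_id]
        # Not cached yet: fetch once from the origin and remember the result.
        self.misses += 1
        data = self._origin(object_id)
        self._cache[object_id] = data
        return data
```

Because Git identifies objects by a hash of their contents, an object stored under a given ID never changes, which makes this kind of caching safe without an invalidation step.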

“Overall Git with GVFS is completely usable at crazy large scale and the results are proving that our engineers are effective. At the same time, we have a lot of work to do to get the performance to the point that our engineers are ‘happy’ with it. The O(modified) work rolling out next week will be a big step, but we have months of additional performance work still on the backlog before we can say we’re done,” Harry wrote.

The code for the Git Virtual File System is available now under the MIT license on GitHub and is open for community contributions. You need a Visual Studio Team Services account to use it.