How does git merge determine conflicts?
When resolving git merge conflicts, I sometimes can't help complaining that git is not very smart. I had clearly only inserted a few lines into the code, yet the merge failed anyway, and I had to confirm every change by hand. I really had no idea how git decides that a merge conflicts.

After resolving a merge conflict involving dozens of files (it took me a whole evening and a whole morning!), I finally made up my mind to look at how git actually implements conflict detection when merging code. As the saying goes, every grievance has its culprit. At least the next time I run into the same problem, I will know what tripped me up. Hence this article, which covers the conflict determination mechanism in git merge.

Recursive three-way merging and ancestors

I pulled up the git source code and first searched for the keyword merge to see which code is involved.

After searching for a while, I found the entry point that git merge uses to compare and merge files: ll_merge. There is also a document that points out that ll_merge is the entry point of the merge implementation.

Judging from the function signature, mmfile_t represents a file to be merged. Interestingly, there are not two files to be merged here, but three.

    int ll_merge(mmbuffer_t *result_buf,
                 const char *path,
                 mmfile_t *ancestor, const char *ancestor_label,
                 mmfile_t *ours, const char *our_label,
                 mmfile_t *theirs, const char *their_label,
                 struct ll_merge_options *opts)

Readers who have read git help merge will know that ours refers to the current branch and theirs refers to the branch being merged in. As you can see, this function merges the versions of one file from different branches. So which branch does ancestor live on? Reading the callers in turn, the general flow is as follows: git merge finds three commits, then calls ll_merge for each file to be merged to produce the final merged result. According to the comments, ancestor is the common ancestor of the other two commits (ours and theirs). The document mentioned above also states that git uses recursive three-way merge when merging.

Wikipedia has an introduction to recursive three-way merge (#Recursive_three-way_merge). During a merge, our file, their file and the ancestor's file are compared to obtain the diff between ours and the ancestor and the diff between theirs and the ancestor, which shows what changes each of the two branches made. After all, git has to determine later which content conflicts, and without the information in the original version, simply comparing the two files cannot do that.
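To make the role of the ancestor concrete, here is a minimal sketch in C of the decision for a single region of a file. The names classify_region and enum region_result are made up for this illustration and are not taken from git, and the sketch compares a whole region as one string, which is far simpler than what git does.

```c
#include <assert.h>
#include <string.h>

/* Possible outcomes for one region (all names are hypothetical). */
enum region_result {
    TAKE_ANCESTOR,   /* nobody touched the region          */
    TAKE_OURS,       /* only our branch changed it         */
    TAKE_THEIRS,     /* only their branch changed it       */
    TAKE_BOTH_SAME,  /* both changed it in the same way    */
    CONFLICT         /* both changed it in different ways  */
};

/* Decide what to do with one region, given its three versions. */
enum region_result classify_region(const char *ancestor,
                                   const char *ours,
                                   const char *theirs)
{
    assert(ancestor && ours && theirs);

    /* The two "diffs" against the ancestor, reduced to yes/no. */
    int ours_changed = strcmp(ancestor, ours) != 0;
    int theirs_changed = strcmp(ancestor, theirs) != 0;

    if (!ours_changed && !theirs_changed)
        return TAKE_ANCESTOR;
    if (ours_changed && !theirs_changed)
        return TAKE_OURS;
    if (!ours_changed && theirs_changed)
        return TAKE_THEIRS;

    /* Both sides changed: only different contents are a conflict. */
    return strcmp(ours, theirs) == 0 ? TAKE_BOTH_SAME : CONFLICT;
}
```

With only ours and theirs in hand, the TAKE_OURS and TAKE_THEIRS cases would look identical: the two versions differ, and nothing says which side made the change.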

Because my goal is to explore how git determines conflicts, I did not look into how git finds the ancestor. However, you can spot the ancestor commit with the naked eye in a graphical interface (for example, in GitLab's network view, trace the commit lines of the two branches back until they fork).

One thing to note is that a revert does not change the ancestor. A so-called revert merely adds a new undo commit on top of the current commits; it does not change where the fork is. Don't take it for granted that after a revert, the ancestor becomes the commit before the reverted one. This fact is especially easy to forget when reverting a merge commit. If you revert a merge commit, then when merging again, the ancestor git uses will not be the ancestor from before the merge, but the ancestor as it stands after the revert. And so you fall into the pit. I recommend that all readers read git's official document on the potential consequences of reverting a merge: /git/git/blob/master/Documentation/howto/revert-a-faulty-merge.txt.

The conclusion is that if the errors introduced by a merge commit are easy to fix, do not revert the merge commit lightly.
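The point that a revert only adds a commit and leaves the fork where it was can be sketched with a toy commit graph. Everything here (struct toy_commit, find_fork, parents referenced by array index) is hypothetical and assumes a tiny graph with valid parent indices. It is not how git computes the ancestor, which I did not read.

```c
#include <assert.h>

#define MAX_COMMITS 16

/* A toy commit: up to two parents, given as indices; -1 means none. */
struct toy_commit {
    int parent[2];
};

/* Mark every commit reachable from start, including start itself. */
static void mark_ancestors(const struct toy_commit *g, int start, char *seen)
{
    if (start < 0 || seen[start])
        return;
    seen[start] = 1;
    mark_ancestors(g, g[start].parent[0], seen);
    mark_ancestors(g, g[start].parent[1], seen);
}

/*
 * Walk back from b breadth-first and return the first commit that is
 * also reachable from a, or -1 if the two share no history.
 */
int find_fork(const struct toy_commit *g, int n, int a, int b)
{
    char from_a[MAX_COMMITS] = {0};
    char queued[MAX_COMMITS] = {0};
    int queue[MAX_COMMITS];
    int head = 0, tail = 0;

    assert(n <= MAX_COMMITS && a >= 0 && a < n && b >= 0 && b < n);
    mark_ancestors(g, a, from_a);

    queue[tail++] = b;
    queued[b] = 1;
    while (head < tail) {
        int c = queue[head++];
        if (from_a[c])
            return c;
        for (int i = 0; i < 2; i++) {
            int p = g[c].parent[i];
            if (p >= 0 && !queued[p]) {
                queued[p] = 1;
                queue[tail++] = p;
            }
        }
    }
    return -1;
}
```

Take a history where commit 1 follows commit 0, our branch adds commit 2 on top of 1, and their branch adds commits 3 and 4 on top of 1. The fork between 2 and 4 is commit 1. Adding a commit 5 on top of 2 (a revert, as far as the graph is concerned) and asking again for 5 and 4 still gives commit 1, because nothing about the existing parent links has changed.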

Analyze xdiff

Following the calls down from ll_merge, you can see a side path: ll_binary_merge. This function handles the merging of binary files. Its implementation is simple and crude: unless you specify a merge strategy option (theirs or ours), it directly reports that the binary files cannot be merged. Apparently, in git's view, a binary file is not worth diffing.
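The behavior described for binary files can be sketched roughly as follows. This is not the body of ll_binary_merge: struct blob, enum favor and binary_merge_sketch are names invented for the illustration, and what the result holds when the merge fails (here, our version) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Which side to prefer, if any (hypothetical names). */
enum favor { FAVOR_NONE, FAVOR_OURS, FAVOR_THEIRS };

/* An opaque chunk of bytes standing in for a binary file. */
struct blob {
    const char *data;
    size_t size;
};

/*
 * Returns 0 when a side was picked cleanly and 1 when the merge has to
 * be reported as failed. No diffing is attempted in either case.
 */
int binary_merge_sketch(const struct blob *ours, const struct blob *theirs,
                        enum favor favor, const struct blob **result)
{
    assert(ours && theirs && result);

    switch (favor) {
    case FAVOR_OURS:
        *result = ours;
        return 0;
    case FAVOR_THEIRS:
        *result = theirs;
        return 0;
    default:
        /* Assumption: leave our version in place and flag the failure. */
        *result = ours;
        return 1;
    }
}
```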

The main path goes from ll_xdl_merge to xdl_merge and enters a library named xdiff. At last, I had found the concrete implementation of git's merging.

In all fairness, xdiff's code style is poor. Not only are there too few comments, but the struct members have names like i1 and i2, which left me confused and frustrated.

Complaints aside, let's first go through the flow of xdl_merge. xdl_merge does the following four things:

1. The two diffs (ours against ancestor, theirs against ancestor) are computed by xdl_do_diff, and the resulting change records are stored in xdfenv_t.

2. xdl_change_compact compacts adjacent change records, and then xdl_build_script builds an xdchange_t linked list recording each side's changes. xdchange_t mainly holds the starting line number and the extent of a change.

3. At this point there are three cases. In two of them only one side made changes (only ours or only theirs has a linked list), and the function returns directly. In the last case both sides have change records, and these need to be merged. Because the change records are ordered by line number, the two linked lists are merged directly. Where change records do not overlap, they are marked in order as changed by ours or changed by theirs. Where they overlap, a conflict has occurred. After that, the merged records are checked once more: for the parts marked as conflicts, the contents are compared, and if they are equal, the part is marked as changed by both sides.

4. The merge result is output by xdl_fill_merge_buffer. If there is a conflict, fill_conflict_hunk is called to output it. If there is no conflict (marked as changed by ours, changed by theirs, or changed by both sides), the ancestor's original content is combined with the change records, and the changed content is taken according to the mark and output.
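The merging of the two ordered lists in the third step can be sketched like this. It is a simplified stand-in rather than xdiff's code: it uses arrays instead of linked lists, struct change, struct merged and merge_changes are invented names, and treating changes that merely touch as overlapping is an assumption of the sketch. The later pass that compares the contents of conflicting parts is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One change: ancestor lines [start, start + len) were modified. */
struct change { long start; long len; };

enum side { SIDE_OURS = 1, SIDE_THEIRS = 2, SIDE_CONFLICT = 3 };

/* One record of the merged result, tagged with who changed it. */
struct merged { long start; long len; enum side side; };

/*
 * Merge two change lists that are each sorted by line number.
 * Writes at most cap records to out and returns how many were written.
 */
size_t merge_changes(const struct change *ours, size_t n_ours,
                     const struct change *theirs, size_t n_theirs,
                     struct merged *out, size_t cap)
{
    size_t i = 0, j = 0, n = 0;

    while (i < n_ours || j < n_theirs) {
        assert(n < cap);
        if (j == n_theirs ||
            (i < n_ours && ours[i].start + ours[i].len < theirs[j].start)) {
            /* Our change ends before their next one begins. */
            out[n++] = (struct merged){ ours[i].start, ours[i].len, SIDE_OURS };
            i++;
        } else if (i == n_ours ||
                   theirs[j].start + theirs[j].len < ours[i].start) {
            /* Their change ends before our next one begins. */
            out[n++] = (struct merged){ theirs[j].start, theirs[j].len, SIDE_THEIRS };
            j++;
        } else {
            /* Overlap: the conflict covers the union of both ranges. */
            long start = ours[i].start < theirs[j].start ? ours[i].start : theirs[j].start;
            long end_o = ours[i].start + ours[i].len;
            long end_t = theirs[j].start + theirs[j].len;
            long end = end_o > end_t ? end_o : end_t;
            i++;
            j++;
            /* Absorb any later change from either side that still touches it. */
            for (;;) {
                if (i < n_ours && ours[i].start <= end) {
                    end_o = ours[i].start + ours[i].len;
                    if (end_o > end) end = end_o;
                    i++;
                } else if (j < n_theirs && theirs[j].start <= end) {
                    end_t = theirs[j].start + theirs[j].len;
                    if (end_t > end) end = end_t;
                    j++;
                } else {
                    break;
                }
            }
            out[n++] = (struct merged){ start, end - start, SIDE_CONFLICT };
        }
    }
    return n;
}
```

Because both inputs are already sorted, a single pass is enough, which matches the remark above that the two lists can be merged directly.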

The code that outputs a conflict lives in fill_conflict_hunk. Its implementation is simple. After all, by this point we already have the content changed by both sides, and all that remains is to output both conflicting versions for the user to choose from. This is the source of the conflicts that took me an evening and a morning to resolve. So you were the culprit all along, huh.

I'm afraid everyone is familiar with the output format. This function prints output like this:

    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch1
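Producing that format takes very little code, which fits the remark that the implementation is simple. The following is a minimal sketch, not the real fill_conflict_hunk: format_conflict is a made-up name, and the sketch assumes each side's text already ends with a newline.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one conflict into buf using the familiar marker layout.
 * Returns what snprintf returns: the length the full text needs,
 * so a value >= cap means the output was truncated.
 */
int format_conflict(char *buf, size_t cap,
                    const char *our_label, const char *ours,
                    const char *their_label, const char *theirs)
{
    assert(buf && cap > 0);
    return snprintf(buf, cap,
                    "<<<<<<< %s\n"   /* start marker plus our label   */
                    "%s"             /* our version of the lines      */
                    "=======\n"      /* separator between both sides  */
                    "%s"             /* their version of the lines    */
                    ">>>>>>> %s\n",  /* end marker plus their label   */
                    our_label, ours, theirs, their_label);
}
```

Calling it with the labels HEAD and branch1 and the contents "3\n" and "2\n" reproduces the example above.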

Summary

git merge's conflict determination mechanism is as follows: first, find the common ancestor of the two commits; then, for each file, compute the differences between ours and the ancestor and between theirs and the ancestor; finally, merge the two sets of differences. If both sides changed the same place and the changed contents differ, it is judged to be a merge conflict, and the contents changed by both sides are output one after the other.