[Webkit-unassigned] [Bug 111066] svn-apply cannot apply patches which is generated by git to files that contain space characters in their path

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri May 24 12:35:04 PDT 2013


https://bugs.webkit.org/show_bug.cgi?id=111066





--- Comment #13 from Daniel Bates <dbates at webkit.org>  2013-05-24 12:33:33 PST ---
(From update of attachment 202766)
View in context: https://bugs.webkit.org/attachment.cgi?id=202766&action=review

Thank you Yuki for updating the patch.

The idea presented in this patch is good. I suggest that we encapsulate the parsing machinery to determine the path to the target file into a separate function, say parseGitDiffHeaderForIndexPath, that takes a reference to a file handle F and the last line read from F as input and returns the path to the target file as output (or the empty string on error). parseGitDiffHeaderForIndexPath() will guarantee that F isn't advanced when it returns. That is, if F is at byte b_0 when parseGitDiffHeaderForIndexPath() is called then F will be at byte b_0 when the call returns. This function would have the form:

# Parse the Git diff header for the index path from the given file handle.
#
# This subroutine doesn't advance the handle.
#
# Args:
#   $fileHandle: a file handle; this should be at a line beginning
#                with "diff --git". The handle will not be advanced.
#   $line: the line last read from $fileHandle
#
# Returns $indexPath: the path of the target file.
sub parseGitDiffHeaderForIndexPath($$)
{
    my ($fileHandle, $line) = @_;
    $_ = $line;
    if (/$gitDiffStartWithPrefixRegEx/ || /$gitDiffStartWithoutPrefixNoSpaceRegEx/) {
        return $2;
    }
    # Assume the diff was generated with --no-prefix (e.g. git diff --no-prefix).
    my $sourcePath = "";
    my $destinationPath = "";
    my $savedPosition = tell($fileHandle);
    while (1) {
        # Temporarily strip off any end-of-line characters to simplify
        # regex matching below.
        s/([\n\r]+)$//;

        my $foundHeaderEnding;
        if (/^(copy|rename) to ([^\t\r\n]+)/) {
            $destinationPath = $2;
        } elsif (/^--- ([^\t\r\n]+)/) {
            $sourcePath = $1;
        } elsif (/^\+\+\+ ([^\t\r\n]+)/) {
            $destinationPath = $1;
            $foundHeaderEnding = 1;
        } elsif (/^GIT binary patch$/) {
            $foundHeaderEnding = 1;
        }

        $_ = <$fileHandle>; # Not defined if end-of-file reached.
        last if (!defined($_) || /$gitDiffStartRegEx/ || $foundHeaderEnding); 
    }
    seek($fileHandle, $savedPosition, 0);
    my $indexPath = "";
    if ($destinationPath eq "/dev/null") {
        # Deletion
        $indexPath = $sourcePath;
    } else {
        # Addition, modification, copy, or rename.
        $indexPath = $destinationPath;
    }
    return $indexPath;
}

Then in parseGitDiffHeader() we can write $indexPath = adjustPathForRecentRenamings(parseGitDiffHeaderForIndexPath($fileHandle, $line)) when $_ matches $gitDiffStartRegEx. When $indexPath is a non-empty string we construct the "Index:" line taking care to append the correct line ending (we'll need to save $POSTMATCH before calling parseGitDiffHeaderForIndexPath() as it will change on function return). Otherwise, we die() with an error message, say "Could not determine target file path from diff."

> Tools/Scripts/VCSUtils.pm:110
>  my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#;

We seem to only use this regular expression to match the prefix "diff --git" and to extract the end of line character. Can we simplify this expression?

> Tools/Scripts/VCSUtils.pm:111
> +my $gitDiffStartWithPrefixRegEx = qr#^diff --git a/(.+) b/([^\r\n]+)#;

This is OK as-is. Notice that the source- (a/) and destination- prefix (b/) can be modified by specifying to the git diff command the optional command line arguments --src-prefix and --dst-prefix per <https://www.kernel.org/pub/software/scm/git/docs/v1.7.3/git-diff.html>. I assume this influenced the choice of the special escape \w in the regular expression $gitDiffStartRegEx as opposed to hardcoding the prefixes. The use of the special escape \w is insufficient to match an arbitrary source and destination prefix. Obviously the use of the special escape \w matches more variants of the diff --git line than a diff --git line with hardcoded source and destination prefixes.

> Tools/Scripts/VCSUtils.pm:701
> +            if (/^\+\+\+ (.+)/) {
> +                $indexPath = adjustPathForRecentRenamings($1);

Consider the following non-prefixed diff that deletes the file named "TWO WORDS.txt":

diff --git TWO WORDS.txt TWO WORDS.txt
deleted file mode 100644
index 03a05ce..0000000
--- TWO WORDS.txt       
+++ /dev/null
@@ -1 +0,0 @@
-Sample

Then $indexPath will be /dev/null. But it should be "TWO WORDS.txt".

> Tools/Scripts/VCSUtils.pm:704
> +                $indexPath = $1;

Don't we need to adjust the index path here (i.e. call adjustPathForRecentRenamings())?

> Tools/Scripts/VCSUtils.pm:707
> +                $indexPath = $1;

Ditto.

> Tools/Scripts/VCSUtils.pm:748
> +        } elsif (/^copy from ([^\r\n]+)/) {

I suggest we strengthen this regular expression and exclude tab characters from being considered in a path.

> Tools/Scripts/VCSUtils.pm:750
> +        } elsif (/^rename from ([^\r\n]+)/) {

Ditto.

> Tools/Scripts/VCSUtils.pm:762
> +            # The patch(1) command thinks a file path is characters before the parenthesis.

Is this correct? I thought patch(1) reads all characters up to the first tab character.

> Tools/Scripts/VCSUtils.pm:768
> +            # Please watch the comments in the "elsif (/^--- \S+/)" line for detail.

I think it would be clearer to duplicate the last two lines of the comment for the "elsif (/^--- \S+/)" branch than to refer to it. If you don't like such duplication then I suggest updating this sentence to read (*): See the comment for the /^--- \S+/ case (above) for more details.

(*) It's unusual to say "watch the comments" as the word "watch" implies that a person should keep the comment in view as if it were to change at any moment.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


More information about the webkit-unassigned mailing list