[Webkit-unassigned] [Bug 111066] svn-apply cannot apply patches which is generated by git to files that contain space characters in their path

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon May 27 00:18:09 PDT 2013


https://bugs.webkit.org/show_bug.cgi?id=111066





--- Comment #15 from Yuki Sekiguchi <yuki.sekiguchi at access-company.com>  2013-05-27 00:16:37 PST ---
Hi Daniel,

Thank you for reviewing.

(In reply to comment #13)
> (From update of attachment 202766 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=202766&action=review
> 
> Thank you Yuki for updating the patch.
> 
> The idea presented in this patch is good. I suggest that we encapsulate the parsing machinery to determine the path to the target file into a separate function, say parseGitDiffHeaderForIndexPath, that takes a reference to a file handle F and the last line read from F as input and returns the path to the target file as output (or the empty string on error). parseGitDiffHeaderForIndexPath() will guarantee that F isn't advanced when it returns. That is, if F is at byte b_0 when parseGitDiffHeaderForIndexPath() is called then F will be at byte b_0 when the call returns. This function would have the form:

Refactoring is good, but why should we use file handle as input of parseGitDiffHeaderForIndexPath()?
We have to read file to create @diffLines.
I think creating @diffLines and using it from parseGitDiffHeaderForIndexPath() and parseGitDiffHeader() is better than reading file twice.
My patch use @diffLines for the input of parseGitDiffHeaderForIndexPath().

> Then in parseGitDiffHeader() we can write $indexPath = adjustPathForRecentRenamings(parseGitDiffHeaderForIndexPath($fileHandle, $line)) when $_ matches $gitDiffStartRegEx. When $indexPath is a non-empty string we construct the "Index:" line taking care to append the correct line ending (we'll need to save $POSTMATCH before calling parseGitDiffHeaderForIndexPath() as it will change on function return). Otherwise, we die() with an error message, say "Could not determine target file path from diff."

Fixed.

> 
> > Tools/Scripts/VCSUtils.pm:110
> >  my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#;
> 
> We seem to only use this regular expression to match the prefix "diff --git" and to extract the end of line character. Can we simplify this expression?

Yes. Simplified.

> > Tools/Scripts/VCSUtils.pm:111
> > +my $gitDiffStartWithPrefixRegEx = qr#^diff --git a/(.+) b/([^\r\n]+)#;
> 
> This is OK as-is. Notice that the source- (a/) and destination- prefix (b/) can be modified by specifying to the git diff command the optional command line arguments --src-prefix and --dst-prefix per <https://www.kernel.org/pub/software/scm/git/docs/v1.7.3/git-diff.html>. I assume this influenced the choice of the special escape \w in the regular expression $gitDiffStartRegEx as opposed to hardcoding the prefixes. The use of the special escape \w is insufficient to match an arbitrary source and destination prefix. Obviously the use of the special escape \w matches more variants of the diff --git line than a diff --git line with hardcoded source and destination prefixes.

Used \w+, and added comment the limitation.

> > Tools/Scripts/VCSUtils.pm:701
> > +            if (/^\+\+\+ (.+)/) {
> > +                $indexPath = adjustPathForRecentRenamings($1);
> 
> Consider the following non-prefixed diff that deletes the file named "TWO WORDS.txt":
> 
> diff --git TWO WORDS.txt TWO WORDS.txt
> deleted file mode 100644
> index 03a05ce..0000000
> --- TWO WORDS.txt       
> +++ /dev/null
> @@ -1 +0,0 @@
> -Sample
> 
> Then $indexPath will be /dev/null. But it should be "TWO WORDS.txt".

I added a test case and confirmed that your suggested function fixed this.
Thanks!

> > Tools/Scripts/VCSUtils.pm:704
> > +                $indexPath = $1;
> 
> Don't we need to adjust the index path here (i.e. call adjustPathForRecentRenamings())?
> 
> > Tools/Scripts/VCSUtils.pm:707
> > +                $indexPath = $1;
> 
> Ditto.

Yes, we need.

> > Tools/Scripts/VCSUtils.pm:748
> > +        } elsif (/^copy from ([^\r\n]+)/) {
> 
> I suggest we strengthen this regular expression and exclude tab characters from being considered in a path.
> 
> > Tools/Scripts/VCSUtils.pm:750
> > +        } elsif (/^rename from ([^\r\n]+)/) {
> 
> Ditto.

Added \t.

> > Tools/Scripts/VCSUtils.pm:762
> > +            # The patch(1) command thinks a file path is characters before the parenthesis.
> 
> Is this correct? I thought patch(1) reads all characters up to the first tab character.

This is not correct. You are right.
In GNU patch code, fetchname() @ util.c do that.

> > Tools/Scripts/VCSUtils.pm:768
> > +            # Please watch the comments in the "elsif (/^--- \S+/)" line for detail.
> 
> I think it would be clearer to duplicate the last two lines of the comment for the "elsif (/^--- \S+/)" branch than to refer to it. If you don't like such duplication then I suggest updating this sentence to read (*): See the comment for the /^--- \S+/ case (above) for more details.

Just copied commets.

> (*) It's unusual to say "watch the comments" as the word "watch" implies that a person should keep the comment in view as if it were to change at any moment.

English is very difficult for me like perl :(

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


More information about the webkit-unassigned mailing list