[webkit-reviews] review denied: [Bug 72351] Implement edit-distance based reviewer recognition algorithm : [Attachment 115198] Added _assert_fuzz_match

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Nov 15 12:03:25 PST 2011


Eric Seidel <eric at webkit.org> has denied Ryosuke Niwa <rniwa at webkit.org>'s
request for review:
Bug 72351: Implement edit-distance based reviewer recognition algorithm
https://bugs.webkit.org/show_bug.cgi?id=72351

Attachment 115198: Added _assert_fuzz_match
https://bugs.webkit.org/attachment.cgi?id=115198&action=review

------- Additional Comments from Eric Seidel <eric at webkit.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=115198&action=review


I think this needs another round.  Mostly for aesthetics.

> Tools/Scripts/webkitpy/common/edit_distance.py:32
> +def edit_distance(str1, str2):

Did you write this yourself?  It doesn't qutie feel like your style.

> Tools/Scripts/webkitpy/common/edit_distance.py:33
> +    distances = [array('h', (0,) * (len(str2) + 1)) for i in range(0,
len(str1) + 1)]

I'm confused what the "h" is for?

> Tools/Scripts/webkitpy/common/edit_distance.py:38
> +    for i in range(1, len(str1) + 1):
> +	   distances[i][0] = i
> +
> +    for j in range(1, len(str2) + 1):
> +	   distances[0][j] = j

Why is this all 1-based?

> Tools/Scripts/webkitpy/common/edit_distance.py:44
> +	       distances[i][j] = min(distances[i][j - 1] + 1, distances[i -
1][j] + 1, distances[i - 1][j - 1] + diff)

It seems like we have more -1 than we do plain usage of i/j.  I assuem that's
because the above array's are one-based?

> Tools/Scripts/webkitpy/common/edit_distance_unittest.py:36
> +	   self.assertEqual(edit_distance('', 'aa'), 2)

I even write things like self._assert_edit_distance('', 'aa', 2) here.	But
this is also OK.

> Tools/Scripts/webkitpy/common/checkout/changelog.py:163
> +	   if self._reviewers_text_list:
> +	       list_of_reviewers =
[self._committer_list.contributors_by_fuzzy_match(reviewer)[0] for reviewer in
self._reviewers_text_list]
> +	       self._reviewers = [reviewers[0] for reviewers in
list_of_reviewers if len(reviewers) == 1]
> +	   else:
> +	       self._reviewers = []

I think this if empty-string check makes more sense inside
contributors_by_fuzzy_match?

I'm also not sure why you're stripping the list container off of _reviewers in
the case where there is only one reviewer?

> Tools/Scripts/webkitpy/common/checkout/changelog_unittest.py:420
> +    def _contributors(self, names):
> +	   return [CommitterList().contributor_by_name(name) for name in names]


I'm not sure we want to make this test depend on our ContributorList directly. 
If we don't have to test that part using real names, we should avoid it.  But
it's also OK, since 1.	the contributor list is very static data, and 2 lots of
other tests aready depend on it, sadly.

> Tools/Scripts/webkitpy/common/checkout/changelog_unittest.py:429
> +	   self.assertEquals(ChangeLogEntry._parse_reviewer_text('Reviewed by
Mac build fix')[1], ['Mac build fix'])

Likewise you could use an _assert_reviewer_text("Reviewed by Mac build fix",
"Mac build fix") here, although obviously that wouldn't work for all the cases
you're testing, but a few such helpers could make this wall of text more
readable.

> Tools/Scripts/webkitpy/common/checkout/changelog_unittest.py:454
> +	   self.assertEqual(self._entry_with_reviewer("Reviewed by Eric
Seidel.").has_valid_reviewer(), True)

Wow, this would be much clearer with a helper.

self._assert_has_valid_reviewer("Reviewed by Eric Seidel", True)

I'm not sure why you're opposed to helper functions in test cases? :p

> Tools/Scripts/webkitpy/common/config/committers.py:32
> +from webkitpy.common.edit_distance import edit_distance

Historically we've avoided having _ in module names to match python style
(outputcapture.py, filesystem.py, etc.)

> Tools/Scripts/webkitpy/common/config/committers.py:536
> +	   # First path, optimitically match for fullname, email and
irc_nicknames
> +	   for contributor in self.contributors():
> +	       if string in [email.lower() for email in contributor.emails] or
string == contributor.full_name.lower():

I would consider breaking these blocks into helper functions.  1.  to make the
management of hte return tuple-easier.	(The first helpr doesn't have to care
about the tuple-ness).	2. to make it easier to test (or pref-test) halves of
it.

> Tools/Scripts/webkitpy/common/config/committers.py:553
> +	       if len(tokens):

You can use "if tokens" to accomplish the same with 5 fewer chars if you like.


More information about the webkit-reviews mailing list