[webkit-reviews] review denied: [Bug 26283] WebKit needs scripts to auto-commit patches from the commit queue : [Attachment 31575] Updated patch, split into modules, now supports --force-clean

Mon Jun 22 17:17:01 PDT 2009

David Levin <levin at chromium.org> has denied Eric Seidel <eric at webkit.org>'s
request for review:
Bug 26283: WebKit needs scripts to auto-commit patches from the commit queue
https://bugs.webkit.org/show_bug.cgi?id=26283

Attachment 31575: Updated patch, split into modules, now supports --force-clean
https://bugs.webkit.org/attachment.cgi?id=31575&action=review

------- Additional Comments from David Levin <levin at chromium.org>
Most of these are pretty minor but there are enough comments that it would be
worth seeing after it is fixed up so r- for now.

I wish the tool would print out its --help content if I just type its name w/o
any args but maybe I'm atypical?

> 91d483a5e9f4b33db4e931e6f99f31bb86066844
> diff --git a/WebKitTools/ChangeLog b/WebKitTools/ChangeLog
> index 14937a6..5885bc0 100644
> --- a/WebKitTools/ChangeLog
> +++ b/WebKitTools/ChangeLog
> @@ -1,3 +1,58 @@
> +2009-06-18  Eric Seidel  <eric at webkit.org>
> +
> +	   Reviewed by NOBODY (OOPS!).
> +
> +	   WebKit needs a script to interact with bugzilla and automate
> +	   parts of the patch posting and commit processes.
> +	   https://bugs.webkit.org/show_bug.cgi?id=26283
> +
> +	   This is really a first-draft tool.
> +	   It's to the point where it's useful to more people than just me now
though.
> +	   Git support works.  SVN support is written, but untested.
> +
> +	   This tool requires BeautifulSoup and mechanize python modules to
run:
> +	   sudo easy_install BeautifulSoup
> +	   sudo easy_install mechanize

I wish that this was part of the help in the script.  Also not everyone has
easy_install installed either.

> +
> +	   More important than the tool itself are the Bugzilla, Git and SVN
class abstractions
> +	   which I hope will allow easy writing of future tools.
> +
> +	   I intend to break Git, SVN and SCM out into an scmtools.py module
after landing.

It looks like you did this already.

> +	   Likewise, Bugzilla into a bugzilla.py module.  I felt it would be
easier to review in one file.

Already done as well!

> diff --git a/WebKitTools/Scripts/bugzilla-tool
b/WebKitTools/Scripts/bugzilla-tool
> new file mode 100755
> index 0000000..f28fa0a
> --- /dev/null
> +++ b/WebKitTools/Scripts/bugzilla-tool
> @@ -0,0 +1,446 @@
> +#!/usr/bin/python
> +
> +# Copyright (C) 2009 Google, Inc.  All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +# 1. Redistributions of source code must retain the above copyright
> +#	notice, this list of conditions and the following disclaimer.
> +# 2. Redistributions in binary form must reproduce the above copyright
> +#	notice, this list of conditions and the following disclaimer in the
> +#	documentation and/or other materials provided with the distribution.
> +#

Add clause #3
 *     * Neither the name of Google Inc. nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.

> +# THIS SOFTWARE IS PROVIDED BY APPLE COMPUTER, INC. ``AS IS'' AND ANY
Consider changing to:
 # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY 

> +#
> +# A tool for automating dealing with bugzilla, posting patches, commiting
patches, etc.

s/commiting/committing/

> +
> +import sys
> +import re
> +import os
> +import subprocess

Nice to alphabetize these.

> +
> +from optparse import OptionParser, IndentedHelpFormatter, SUPPRESS_USAGE,
make_option
> +
> +sys.path.append("modules") # Import WebKit-specific modules

Modified the sys.path is usually something done as a last resort. "from
modules.bugzilla ..." would be better.	You'll need to add an empty __init__.py
file in WebKitTools/Scripts to do this.

> +from bugzilla import Bugzilla
> +from scm import detect_scm_system, ScriptError
> +
> +def chdir_webkit():
> +    # We could instead ask the SCM system for its root directory (since we
need that for making patches as well)
End with "."

> +    script_directory = os.path.abspath(sys.path[0])
> +    webkit_directory = os.path.abspath(os.path.join(script_directory,
"../.."))
> +    print webkit_directory
Is this some debug info that was left behind?  If not, should it say something
before doing the print (e.g. "Working in %s")

> +    os.chdir(webkit_directory)
> +
> +
> +def log(string):
> +    print >> sys.stderr, string
> +
> +def error(string):
> +    log(string)
> +    exit(1)
> +
> +
Extra blank line?

> +# These could be put in some sort of changelogs.py
> +def latest_changelog_entry(changelog_path):
> +    entry_lines = []
> +    changelog = open(changelog_path)
> +    try:
> +	   log("Parsing ChangeLog: " + changelog_path)
> +	   # The first line should be a date line
Add period at end of comment (throughout).

Why not validate that the first line is a date line by using the regex?

> +	   entry_lines.append(changelog.readline())
> +	   
> +	   # e.g. 2009-06-03  Eric Seidel  <eric at webkit.org>
> +	   changelog_date_line_regexp = '^([\d\-]+)  (.+)  <(.+)>$'

How about something more strict about the date format and slightly looser about
the spacing:
	changelog_date_line_regexp = ('^(\d{1,4}\-){2}\d{1-4}' # Consume the
date.
				      + '\S+(.+)\S+' # Consume the name.
				      + '<([^<>]+)>$') # And finally the email
address.

> +	   for line in changelog:
> +	       # if we've hit the next entry, return
> +	       if re.match(changelog_date_line_regexp, line):

Consider compiling the regex
http://docs.python.org/library/re.html#module-contents

> +	finally:
> +		changelog.close()

changelog.close() is indented too far.

> +
> +def modified_changelogs(scm):
> +    changelog_paths = []
> +    paths = scm.changed_files()
> +    for path in paths:
> +	   if re.search('ChangeLog', path):

What about:
   path.endswith("changeLog')
?

> +	       changelog_paths.append(path)
> +    return changelog_paths

In WebKitTools/Scripts/commit-log-editor, it has a specific ordering for the
changelogs in the commit message.  It isn't followed here.

> +
> +def commit_message_for_this_commit(scm):
> +    changelog_paths = modified_changelogs(scm)
> +    if len(changelog_paths) == 0:

Consider
   if not len(changelog_paths):

> +	   error("Found no modified ChangeLogs, can't create a commit
message.")
> +
> +    changelog_messages = []
> +    for path in changelog_paths:
> +	   changelog_entry = latest_changelog_entry(path)
> +	   if not changelog_entry:
> +	       error("Failed to parse ChangeLog: " + os.path.abspath(path))
> +	   changelog_messages.append(changelog_entry)

This code does do the titles "WebCore;", "LayoutTests:", etc. that are done in
WebKitTools/Scripts/commit-log-editor.

> +    return ''.join(changelog_messages)
> +
> +
> +class Command:
> +    def __init__(self, help_text, argument_names="", options = []):
No spaces around = in this case.

> +	   self.help_text = help_text
> +	   self.argument_names = argument_names
> +	   self.options = options
> +	   self.option_parser = OptionParser(usage=SUPPRESS_USAGE,
add_help_option=False, option_list=self.options)
> +    
> +    def name_with_arguments(self, command_name):
> +	   usage_string = command_name
> +	   if (len(self.options) > 0):
No need for ().

> +	       usage_string += " [options]"
> +	   if (self.argument_names != ""):
Prefer
   if self.argument_names:

> +	       usage_string += " " + self.argument_names
> +	   return usage_string
> +
> +    def parse_args(self, args):
> +	   return self.option_parser.parse_args(args)
> +
> +    def execute(self, options, args, tool):
> +	   raise NotImplementedError, "subclasses must implement"
> +
> +
> +class BugsInCommitQueue (Command):
No space before ( even in inheritance (throughout).

> +    def __init__(self):
> +	   Command.__init__(self, 'Bugs in the commit queue')
> +
> +    def execute(self, options, args, tool):
> +	   bug_ids = tool.bugs.fetch_bug_ids_from_commit_queue()
> +	   for bug_id in bug_ids:
> +	       print tool.bugs.bug_url_for_bug_id(bug_id)
In general it is safer to print like this:

 print "%s" % tool.bugs.bug_url_for_bug_id(bug_id)

which will avoid treating anything in the string as an escape sequence.

> +
> +
> +class PatchesInCommitQueue (Command):
> +    def __init__(self):
> +	   Command.__init__(self, 'Patches attached to bugs in the commit
queue')
> +
> +    def execute(self, options, args, tool):
> +	   patches = tool.bugs.fetch_patches_from_commit_queue()
> +	   log("Patches in commit queue:")
> +	   for patch in patches:
> +	       print patch['url']

print "%s" % patch['url']
(throughout)

> +
> +
> +class ReviewedPatchesOnBug (Command):
> +    def __init__(self):
> +	   Command.__init__(self, 'r+\'d patches on a bug', 'BUGID')
> +
> +    def execute(self, options, args, tool):
> +	   bug_id = args[0]
> +	   patches_to_land = tool.bugs.fetch_reviewed_patches_from_bug(bug_id)
> +	   for patch in patches_to_land:
> +	       print patch['url']
> +
> +
> +class ApplyPatchesFromBug (Command):
> +    def __init__(self):
> +	   options = [
> +	       make_option("--no-update", action="store_false", dest="update",
default=True, help="Don't update the working directory before applying
patches"),
> +	       make_option("--force-clean", action="store_true",
dest="force_clean", default=False, help="Clean working directory before
applying patches (removes local changes and commits)"),
> +	       make_option("--no-clean", action="store_false", dest="clean",
default=True, help="Don't check if the working directory is clean before
applying patches"),
> +	   ]
> +	   Command.__init__(self, 'Applies all patches on a bug to the local
working directory without committing.', 'BUGID', options=options)
> +
> +    def execute(self, options, args, tool):
> +	   bug_id = args[0]
> +	   patches = tool.bugs.fetch_reviewed_patches_from_bug(bug_id)
> +	   chdir_webkit()
> +	   if options.clean:
> +	       tool.scm().ensure_clean_working_directory(options.force_clean,
allow_local_commits = True)
No space around = in this case.  (Personally, I'd rather always do spaces
around = but alas such is not python style.)

> +	   if options.update:
> +	       tool.scm().update_webkit()
> +	   
> +	   # Should we error out if tool.scm does not support local commits?
> +	   for patch in patches:
> +	       tool.scm().apply_patch(patch)

Do all the patches get applied as one bunch?  I typically apply each one as a
separate "git commit".
No sure how do-able this though.  (I guess I'd like to encourage one patch per
bug and then there is no issue :).)

> +    def run_and_throw_if_fail(self, script_name):
> +	   build_webkit_process = subprocess.Popen(script_name, shell=True)
> +	   return_code = build_webkit_process.wait()
> +	   if return_code != 0:

Use
	if return_code:

> +	       raise ScriptError(script_name + " failed with code " +
return_code)

> diff --git a/WebKitTools/Scripts/modules/bugzilla.py
b/WebKitTools/Scripts/modules/bugzilla.py

> +def read_config(key):
> +    # Need a way to read from svn too
> +    config_process = subprocess.Popen("git config --get bugzilla." + key,
stdout=subprocess.PIPE, shell=True)
> +    value = config_process.communicate()[0]
> +    return_code = config_process.wait()
> +
> +    if return_code != 0:

Use
    if return_code:

> +	   return None
> +
> +class Bugzilla:
> +    def __init__(self, dryrun = False):
No space around =.

> +	   self.br = Browser()
s/self.br/self.browser/

> +
> +    # This could eventually be a text file
> +    reviewer_usernames_to_full_names = {
> +	   "abarth" : "Adam Barth",
> +	   "adele" : "Adele Peterson",
> +	   "ariya.hidayat" : "Ariya Hidayat",
> +	   "darin" : "Darin Adler",
> +	   "dglazkov" : "Dimitri Glazkov",
> +	   "eric" : "Eric Seidel",
> +	   "ddkilzer" : "David Kilzer",
> +	   "fishd" : "Darin Fisher",
> +	   "gns" : "Gustavo Noronha",
> +	   "hyatt" : "David Hyatt",
> +	   "jmalonzo" : "Jan Alonzo",
> +	   "levin" : "David Levin",
> +	   "mitz" : "Dan Bernstein",
> +	   "mjs" : "Maciej Stachoviak",
> +	   "mrowe" : "Mark Rowe",
> +	   "oliver" : "Oliver Hunt",
> +	   "staikos" : "George Staikos",
> +	   "treat" : "Adam Treat",
> +	   "timothy" : "Timothy Hatcher",
> +	   "xan.lopez" : "Xan Lopez",
> +	   "zecke" : "Holger Freyther",
> +    }
This is almost sorted but ddkilzer and treat are out of order.
Also, it looks like these folks are missing aroben, smfr, bradee-oh? (probably
more...)

> +    def attachment_url_for_id(self, attachment_id, action = "view"):
No spaces around =.

> +	   attachment_base_url = self.bug_server + "attachment.cgi?id="
> +	   return "%s%s&action=%s" % (attachment_base_url, attachment_id,
action)

If attachment_id is just an id, then doesn't the url need id= in it?
(Or perhaps the var name should be attachment_param?)

> +    def fetch_attachments_from_bug(self, bug_id):
...
> +	       attachment['obsolete'] = (attachment_row.has_key('class') and
attachment_row['class'] == "bz_obsolete")

No need for () here.

> +
> +	       if (str(review_status).find("review+") != -1):
No need for ().

> +
> +    def fetch_bug_ids_from_commit_queue(self):
> +	   # unassigned_only =
"&emailassigned_to1=1&emailtype1=substring&email1=unassigned"
Did you mean to keep this line?

> +    def fetch_patches_from_commit_queue(self):
> +	   patches_to_land = []
> +	   for bug_id in self.fetch_bug_ids_from_commit_queue():
> +	       patches = self.fetch_reviewed_patches_from_bug(bug_id)
> +	       patches_to_land.extend(patches)
This also works
   patches_to_land += patches

> +	   return patches_to_land
> +
> +    def authenticate(self, username = None, password = None):
No spaces around ='s.

> +	   if (self.authenticated):
> +	       return
> +	   
> +	   if not username:
> +	       username = read_config("username")
> +	       if not username:
> +		   username = raw_input("Bugzilla login: ")
> +	   if not password:
> +	       password = read_config("password")
> +	       if not password:
> +		   password = getpass.getpass("Bugzilla password for %s: " %
(username,))
> +
> +	   log("Logging in as %s..." % (username,))
> +	   if self.dryrun:
Should it set 
       self.authenticated = True
here?

> +	       return

> +
> +    def add_patch_to_bug(self, bug_id, patch_file_object, description,
comment_text = None, mark_for_review = False):
No spaces around ='s.

> +	   
> +	   log("Adding patch \"%s\" to bug %s" % (description, bug_id))

You could do

	log('Adding patch "%s" to bug %s' % (description, bug_id))

to avoid the escpaing.

> +    def obsolete_attachment(self, attachment_id, comment_text = None):
No space around =.

> +
> +	   log("Obsoleting attachment: %s" % (attachment_id,))

Remove the tuple creation. ("...",) -> "..."

> +	   # Also clear any review flag (to remove it from review/commit
queues)
> +	   self.br.find_control(type='select', nr=0).value = ("X",)

It seems odd that this needs a tuple.  I'm being too lazy to look it up but
just wanted to point this out for your double check.

> +    def post_comment_to_bug(self, bug_id, comment_text):
> +	   self.authenticate()
> +
> +	   log("Adding comment to bug %s" % (bug_id,))

Use
	log("Adding comment to bug %s" % bug_id)

> +    def close_bug_as_fixed(self, bug_id, comment_text = None):
No space around =.

> +	   self.authenticate()
> +
> +	   log("Closing bug %s as fixed" % (bug_id,))

Use:
	log("Closing bug %s as fixed" % bug_id)

> diff --git a/WebKitTools/Scripts/modules/scm.py
b/WebKitTools/Scripts/modules/scm.py
> new file mode 100644
> index 0000000..63afd45
> --- /dev/null
> +++ b/WebKitTools/Scripts/modules/scm.py
>

> +class SCM:
> +    def __init__(self, dryrun = False):
No space around =.

> +
> +    def ensure_clean_working_directory(self, force, allow_local_commits =
False):

No space around =.

> +	   if not force and not self.working_directory_is_clean():
> +	       log("Working directory has modifications, pass --force-clean or
--no-clean to continue.")
> +	       os.system(self.status_command())
> +	       exit(1)
> +	   
> +	   log("Cleaning the working directory")
> +	   self.clean_working_directory(discard_local_commits = not
allow_local_commits)

No space around =.

> +    def ensure_no_local_commits(self, force):
...
> +	   commits = self.local_commits()
> +	   if len(commits) == 0:

if len(commits) :

> +
> +    def apply_patch(self, patch):
> +	   # It's possible that the patch was not made from the root directory
Nit: directory"."

> +	   # we should detect and handle that case.
Nit: "W"e

> +	   return_code = os.system("curl " + patch['url'] + " | svn-apply
--reviewer \"" + patch['reviewer'] + "\"")

How about?
	return_code = os.system('curl %s | svn-apply --reviewer "%s"' %
(patch['url'], patch['reviewer']))

> +	   if (return_code != 0):

Use
	if return_code:

> +	       raise ScriptError("Patch " + patch['url'] + " failed to download
and apply.")
> +

> +    # It's slightly hacky to share this code, since there are implicit
assumptions about the regexp format
Less so with named groups, so many the comment can go away if you do that.
> +    def run_status_and_extract_filenames(self, status_command,
status_regexp):
> +	   file_names = []
> +	   for line in os.popen(status_command).readlines():
> +	       match = re.search(status_regexp, line)
> +	       if not match:
> +		   continue
> +	       # status = match.group(1)
> +	       file_name = match.group(2)

If you go with the named group suggestion, this becomes
	    file_name = match.group('filename')

> +	       file_names.append(file_name)
> +	   return file_names

> +class SVN (SCM):
> +    def __init__(self, dryrun = False):
No spaces around =.

> +
> +    def working_directory_is_clean(self):
> +	   diff_process = subprocess.Popen("svn diff", stdout=subprocess.PIPE,
shell=True)
> +	   diff_output = diff_process.communicate()[0]
> +	   if dif_process.wait() != 0:
Two issues (a typo dif vs diff) and no need for "!= 0"), so consider

	if diff_process.wait():

> +
> +    def changed_files(self):
> +	   status_regexp = "^([ACDMR]).{6} (.+)$" if self.svn_version() > "1.6"
else "^([ACDMR]).{5} (.+)$"

There is something about all of the "" in that line that makes it hard for me
to parse.  Personally, I'd
prefer the standard c++ look in this case.

if self.svn_version() > "1.6":
    status_regexp = "^([ACDMR]).{6} (.+)$"
else
    status_regexp = "^([ACDMR]).{5} (.+)$"

Also consider used named groups (see my comment in the git class).

> +    def commit_with_message(self, message):
> +	   commit_process = subprocess.Popen('svn commit -F -',
stdin=subprocess.PIPE, shell=True)
> +	   commit_process.communicate(message)
Should this do
	   commit_process.wait()
?

> +
> +# All git-specific logic should go here.
> +class Git (SCM):
> +    def __init__(self, dryrun = False):
No space around =.

> +    
> +    def working_directory_is_clean(self):
> +	   diff_process = subprocess.Popen("git diff-index head",
stdout=subprocess.PIPE, shell=True)
> +	   diff_output = diff_process.communicate()[0]
> +	   if dif_process.wait() != 0:

Two issues (a typo dif vs diff) and no need for "!= 0"), so consider

	if diff_process.wait():

> +
> +    def changed_files(self):
> +	   status_command = 'git diff -r --name-status -C -C -M'
Extra -C

> +	   status_regexp = '^([ADM])\t(.+)$'
Consider using named groups, for example:

  '^(?P<status>[ADM])\t(?P<filename>.+)$'

See http://docs.python.org/library/re.html#regular-expression-syntax

> +	   return self.run_status_and_extract_filenames(status_command,
status_regexp)
> +    

> +    # Git-specific methods:
> +    
> +    def commit_locally_with_message(self, message):
> +	   commit_process = subprocess.Popen('git commit -a -F -',
stdin=subprocess.PIPE, shell=True)
In many other places you used "" for strings.  Python also allows single
quotes, but it is nice to be consistent (unless you want to switch to put a "
in the middle.)

> +	   commit_process.communicate(message)
It seems like it should do commit_process.wait() here.

> +    def push_local_commits_to_server(self):
...
> +	   return out
> +    
> +
extra blank line.

> +    def commit_ids_from_range_arguments(self, args, cherry_pick = False):
No spaces around =.

> +	   
> +	   # If not cherry-picking and only passed one revision, assume
"^revision head" aka "revision..head"

I can read "If we're not cherry-picking..." more easily.

> +	   if len(revisions) < 2:
> +	       revisions[0] = "^" + revisions[0]
> +	       revisions.append("head")
> +	   
> +	   rev_list_args = ['git', 'rev-list']
> +	   rev_list_args.extend(revisions)

fwiw, 

   rev_list_arg += revisions

will work too.	Your choice.