[webkit-reviews] review denied: [Bug 31737] Chromium WebKit API WebPageSerializer : [Attachment 43602] Patch

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Nov 24 13:21:57 PST 2009


Darin Fisher (:fishd, Google) <fishd at chromium.org> has denied Yaar Schnitman
<yaar at chromium.org>'s request for review:
Bug 31737: Chromium WebKit API WebPageSerializer
https://bugs.webkit.org/show_bug.cgi?id=31737

Attachment 43602: Patch
https://bugs.webkit.org/attachment.cgi?id=43602&action=review

------- Additional Comments from Darin Fisher (:fishd, Google)
<fishd at chromium.org>
The StringBuilder additions seem quite handy.


> +++ b/WebKit/chromium/public/WebPageSerializer.h
...
> +// Get html data by serializing all frames of current page with lists
> +// which contain all resource links that have local copy.
> +// contain all saved auxiliary files included all sub frames and resources.

did you mean to delete this second sentence?


> +// This function will find out all frames and serialize them to HTML data.
> +// We have a data buffer to temporary saving generated html data. We will
> +// sequentially call WebViewDelegate::SendSerializedHtmlData once the data
> +// buffer is full. See comments of WebViewDelegate::SendSerializedHtmlData

nit: please fix these references to WebViewDelegate since that interface no
longer exists.


> +class WebPageSerializer {
> +public:
> +

nit: ^^^ kill this new line

> +    // Do serialization action. Return false means no available frame has
been
> +    // serialized, otherwise return true.
> +    // The parameter specifies which frame need to be serialized.
> +    // The parameter recursive specifies whether we need to
> +    // serialize all sub frames of the specified frame or not.
> +    // The parameter client specifies the pointer of interface

nit: ^^^ add a trailing period


> +    // WebPageSerializerClient provide sink interface which can receive the
> +    // individual chunks of data to be saved.
> +    // The parameter links contain original URLs of all saved links.
> +    // The parameter localPaths contain corresponding local file paths of
all
> +    // saved links, which matched with vector:links one by one.
> +    // The parameter localDirectoryName is relative path of directory which
> +    // contain all saved auxiliary files included all sub frames and
resources.
> +    WEBKIT_API static bool serialize(WebFrame* frame,
> +					bool recursive,
> +					WebPageSerializerClient* client,
> +					const WebVector<WebURL>& links,
> +					const WebVector<WebString>& localPaths,

> +					const WebString& localDirectoryName);
> +
> +    // FIXME: The following are here for unit testing purposes. Consider
> +    // fixing the unit tests instead.

nit: s/fixing/changing/


> +++ b/WebKit/chromium/public/WebPageSerializerClient.h
...
> +class WebPageSerializerClient {
> +public:
> +    // This enum indicates  This sink interface can receive the individual
chunks
> +    // of serialized data to be saved, so we use values of following enum
> +    // definition to indicate the serialization status of serializing all
html
> +    // content. If current frame is not complete serialized, call
> +    // DidSerializeDataForFrame with URL of current frame, data, data length
and

nit: DidSerializeDataForFrame -> didSerializeDataForFrame 


> +    // flag CurrentFrameIsNotFinished.
> +    // If current frame is complete serialized, call
DidSerializeDataForFrame

ditto


> +    // with URL of current frame, data, data length and flag
> +    // CurrentFrameIsFinished.
> +    // If all frames of page are complete serialized, call
> +    // DidSerializeDataForFrame with empty URL, empty data, 0 and flag

ditto


> +    enum PageSerializationStatus {
> +	   // Current frame is not finished saving.
> +	   CurrentFrameIsNotFinished = 0,
> +	   // Current frame is finished saving.
> +	   CurrentFrameIsFinished,
> +	   // All frame are finished saving.
> +	   AllFramesAreFinished,
> +    };

nit: ^^^ those comments are just restating the name of the enum with the
exception
of the "saving" word.  can we delete the comments?


> +    // Receive the individual chunks of serialized data to be saved.
> +    // The parameter frameURL specifies what frame the data belongs. The
> +    // parameter data contains the available data for saving. The parameter
> +    // status indicates the status of data serialization.
> +    virtual void didSerializeDataForFrame(const WebURL& frameURL,
> +					     const WebString& data,
> +					     PageSerializationStatus status) =
0;
> +
> +    WebPageSerializerClient() { }
> +    virtual ~WebPageSerializerClient() { }

^^^ can this destructor be made protected instead?  i think we can do this
to make it clear that WebPageSerializer does not delete the given client.


> +#endif  // WEBKIT_GLUE_DOM_SERIALIZER_DELEGATE_H__

^^^ please fix that comment


> +++ b/WebKit/chromium/src/DOMUtilitiesPrivate.cpp
...
> +    if (attrName == HTMLNames::srcAttr) {
> +	   // Check src attribute.
> +	   if (element->hasTagName(HTMLNames::imgTag) ||
> +	       element->hasTagName(HTMLNames::scriptTag) ||
> +	       element->hasTagName(HTMLNames::iframeTag) ||

nit: in webkit style, the "||" should go at the start of the next line


> +++ b/WebKit/chromium/src/WebEntities.cpp
...
> +#include "config.h"
> +#include "WebEntities.h"
> +
> +#include <string.h>
> +// Note that this file is also included by HTMLTokenizer.cpp so we are
getting

nit: ^^^ please add a new line after #include <string.h>


> +// two copies of the data in memory.  We can fix this by changing the script

> +// that generated the array to create a static const that is its length, but

> +// this is low priority since the data is less than 4K.
> +#include "HTMLEntityNames.c"
> +#include "PlatformString.h"
> +#include "StringBuilder.h"
> +#include <wtf/HashMap.h>
> +
> +#undef LOG

nit: ^^^ i think can delete this #undef now.


> +
> +#include "WebString.h"
> +
> +using namespace WebCore;
> +
> +namespace WebKit {
> +
> +void populateMap(EntitiesMapType& map,
> +		    const Entity* entities,
> +		    int entitiesCount,

nit: the type of entitiesCount should probably be size_t since you
pass the result of sizeof to it.


> +		    bool standardHTML)
> +{
> +    ASSERT(map.isEmpty());
> +    const Entity* entity = &entities[0];
> +    for (int i = 0; i < entitiesCount; i++, entity++) {

^^^ loop index, i, should then also be size_t.


> +	   int code = entity->code;
> +	   String name = entity->name;
> +	   // For consistency, use case insensitive compare for entity codes
that have both.
> +	   if (map.contains(code)
> +	       && equalIgnoringCase(map.get(code), name))

nit: ^^^ given the long comment, it seems odd to line break here.


> +static const Entity xmlBuiltInEntityCodes[] = {
> +    {"lt", 0x003c},
> +    {"gt", 0x003e},
> +    {"amp", 0x0026},
> +    {"apos", 0x0027},
> +    {"quot", 0x0022}

nit: ^^^ please insert a space after "{" and before "}"


> +String WebEntities::convertEntitiesInString(const String& value) const
...
> +	   } else {
> +	       curPos++;
> +	   }

nit: ^^^ no brackets


> +++ b/WebKit/chromium/src/WebEntities.h

> +#include "WebCommon.h"

nit: WebCommon.h doesn't seem to be needed here.


> +namespace WebKit {
> +typedef HashMap<int, WebCore::String> EntitiesMapType;

can you move this typedef into the WebEntities class?  maybe as a MapType
typedef?


> +class WebEntities {
> +public:
...
> +    WebEntities(bool xmlEntities);

nit: ^^^ can this ctor be made explicit?


> +
> +    // Check whether specified unicode has corresponding html or xml
built-in
> +    // entity name. If yes, return the entity notation. If not, returns an
> +    // empty string. Parameter isHTML indicates check the code in html
entity
> +    // map or in xml entity map.
> +    WebCore::String getEntityNameByCode(int code) const;

nit: I think it would be more conventional to name this method
"entityNameByCode"


> +++ b/WebKit/chromium/src/WebFrameImpl.cpp

> +WebFrameImpl* WebFrameImpl::fromFrameOwnerElement(Element* element)
> +{
> +    if (!element
> +	   || !element->isFrameOwnerElement()
> +	   || (!element->hasTagName(WebCore::HTMLNames::iframeTag) 
> +	       && !element->hasTagName(WebCore::HTMLNames::frameTag)))
> +	   return 0;
> +
> +    WebCore::HTMLFrameOwnerElement* frameElement =
> +	   static_cast<WebCore::HTMLFrameOwnerElement*>(element);

nit: I think you can drop the WebCore:: prefixes above.


> +++ b/WebKit/chromium/src/WebPageSerializer.cpp
...
> +// FIXME: verify unicode conversions below with Darin.

please remove this comment once settled here.


> +WebString WebPageSerializer::generateMetaCharsetDeclaration(const WebString&
charset)
> +{
> +    return String::format("<META http-equiv=\"Content-Type\"
content=\"text/html; charset=%ls\">",
> +			     charset.utf8().data());

I think %ls is wrong since you are passing in a UTF-8 string.


> +WebString WebPageSerializer::generateMarkOfTheWebDeclaration(const WebURL&
url)
> +{
> +    KURL kurl = url;
> +    return String::format("\n<!-- saved from url=(%04d)%s -->\n",
> +			     kurl.string().length(),
> +			     kurl.string().utf8().data());

do you really mean to report the number of UTF-16 code points here?  also,
since
you are printing the UTF-8 string, it seems like you would be better off just
getting
the string value of the WebURL, which is a WebCString, which is already UTF-8.


> +WebString WebPageSerializer::generateBaseTagDeclaration(const WebString&
baseTarget)
> +{
> +    String targetDeclaration;
> +    if (!baseTarget.isEmpty())
> +	   targetDeclaration = String::format(" target=\"%ls\"",
baseTarget.utf8().data());
> +    return String::format("<BASE href=\".\"%ls>",
targetDeclaration.utf8().data());

same issue with %ls.  i think you want %s.


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.cpp
...

In the ChangeLog, I think you should attribute Johnny Ding as the original
author
of this code.

> +namespace WebKit {
...
> +static const WebEntities htmlEntities(false);
> +static const WebEntities xmlEntities(true);

please avoid file static instances of non-POD types.  these should be changed
to either be local variables or function statics that get lazily initialized.


> +// SerializeDomParam Constructor.

this comment isn't useful, so we should just delete it.


> +String WebPageSerializerImpl::preActionBeforeSerializeEndTag(
...
> +    if (param->skipMetaElement == element) {
> +	   *needSkip = true;
> +    } else if (element->hasTagName(HTMLNames::scriptTag) ||
> +		  element->hasTagName(HTMLNames::styleTag)) {

nits: ^^^ no brackets around single line statements and "||"
should go at the start of the next line.


> +void WebPageSerializerImpl::openTagToString(const Element* element,
...
> +		       if (attrValue.startsWith("javascript:", false)) {
> +			   result += attrValue;
> +		       } else {

nit: ^^^ no brackets around single line statments



> +			  // Get the absolute link
> +			   String completeURL =
param->doc->completeURL(attrValue);

nit: fix indentation of comment


> +void WebPageSerializerImpl::buildContentForNode(const Node* node,
> +						   SerializeDomParam* param)
> +{
> +    switch (node->nodeType()) {
> +	   case Node::ELEMENT_NODE: {

nit: case should not be indented inside a switch


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.h

> +#include "config.h"

do not include config.h in header files.


More information about the webkit-reviews mailing list