[Webkit-unassigned] [Bug 31737] Chromium WebKit API WebPageSerializer

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Nov 24 13:21:58 PST 2009


https://bugs.webkit.org/show_bug.cgi?id=31737


Darin Fisher (:fishd, Google) <fishd at chromium.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #43602|review?                     |review-
               Flag|                            |




--- Comment #4 from Darin Fisher (:fishd, Google) <fishd at chromium.org>  2009-11-24 13:21:57 PST ---
(From update of attachment 43602)
The StringBuilder additions seem quite handy.


> +++ b/WebKit/chromium/public/WebPageSerializer.h
...
> +// Get html data by serializing all frames of current page with lists
> +// which contain all resource links that have local copy.
> +// contain all saved auxiliary files included all sub frames and resources.

did you mean to delete this second sentence?


> +// This function will find out all frames and serialize them to HTML data.
> +// We have a data buffer to temporary saving generated html data. We will
> +// sequentially call WebViewDelegate::SendSerializedHtmlData once the data
> +// buffer is full. See comments of WebViewDelegate::SendSerializedHtmlData

nit: please fix these references to WebViewDelegate since that interface no
longer exists.


> +class WebPageSerializer {
> +public:
> +

nit: ^^^ kill this new line

> +    // Do serialization action. Return false means no available frame has been
> +    // serialized, otherwise return true.
> +    // The parameter specifies which frame need to be serialized.
> +    // The parameter recursive specifies whether we need to
> +    // serialize all sub frames of the specified frame or not.
> +    // The parameter client specifies the pointer of interface

nit: ^^^ add a trailing period


> +    // WebPageSerializerClient provide sink interface which can receive the
> +    // individual chunks of data to be saved.
> +    // The parameter links contain original URLs of all saved links.
> +    // The parameter localPaths contain corresponding local file paths of all
> +    // saved links, which matched with vector:links one by one.
> +    // The parameter localDirectoryName is relative path of directory which
> +    // contain all saved auxiliary files included all sub frames and resources.
> +    WEBKIT_API static bool serialize(WebFrame* frame,
> +                                     bool recursive,
> +                                     WebPageSerializerClient* client,
> +                                     const WebVector<WebURL>& links,
> +                                     const WebVector<WebString>& localPaths,
> +                                     const WebString& localDirectoryName);
> +
> +    // FIXME: The following are here for unit testing purposes. Consider
> +    // fixing the unit tests instead.

nit: s/fixing/changing/


> +++ b/WebKit/chromium/public/WebPageSerializerClient.h
...
> +class WebPageSerializerClient {
> +public:
> +    // This enum indicates  This sink interface can receive the individual chunks
> +    // of serialized data to be saved, so we use values of following enum
> +    // definition to indicate the serialization status of serializing all html
> +    // content. If current frame is not complete serialized, call
> +    // DidSerializeDataForFrame with URL of current frame, data, data length and

nit: DidSerializeDataForFrame -> didSerializeDataForFrame 


> +    // flag CurrentFrameIsNotFinished.
> +    // If current frame is complete serialized, call DidSerializeDataForFrame

ditto


> +    // with URL of current frame, data, data length and flag
> +    // CurrentFrameIsFinished.
> +    // If all frames of page are complete serialized, call
> +    // DidSerializeDataForFrame with empty URL, empty data, 0 and flag

ditto


> +    enum PageSerializationStatus {
> +        // Current frame is not finished saving.
> +        CurrentFrameIsNotFinished = 0,
> +        // Current frame is finished saving.
> +        CurrentFrameIsFinished,
> +        // All frame are finished saving.
> +        AllFramesAreFinished,
> +    };

nit: ^^^ those comments are just restating the name of the enum with the
exception
of the "saving" word.  can we delete the comments?


> +    // Receive the individual chunks of serialized data to be saved.
> +    // The parameter frameURL specifies what frame the data belongs. The
> +    // parameter data contains the available data for saving. The parameter
> +    // status indicates the status of data serialization.
> +    virtual void didSerializeDataForFrame(const WebURL& frameURL,
> +                                          const WebString& data,
> +                                          PageSerializationStatus status) = 0;
> +
> +    WebPageSerializerClient() { }
> +    virtual ~WebPageSerializerClient() { }

^^^ can this destructor be made protected instead?  i think we can do this
to make it clear that WebPageSerializer does not delete the given client.


> +#endif  // WEBKIT_GLUE_DOM_SERIALIZER_DELEGATE_H__

^^^ please fix that comment


> +++ b/WebKit/chromium/src/DOMUtilitiesPrivate.cpp
...
> +    if (attrName == HTMLNames::srcAttr) {
> +        // Check src attribute.
> +        if (element->hasTagName(HTMLNames::imgTag) ||
> +            element->hasTagName(HTMLNames::scriptTag) ||
> +            element->hasTagName(HTMLNames::iframeTag) ||

nit: in webkit style, the "||" should go at the start of the next line


> +++ b/WebKit/chromium/src/WebEntities.cpp
...
> +#include "config.h"
> +#include "WebEntities.h"
> +
> +#include <string.h>
> +// Note that this file is also included by HTMLTokenizer.cpp so we are getting

nit: ^^^ please add a new line after #include <string.h>


> +// two copies of the data in memory.  We can fix this by changing the script
> +// that generated the array to create a static const that is its length, but
> +// this is low priority since the data is less than 4K.
> +#include "HTMLEntityNames.c"
> +#include "PlatformString.h"
> +#include "StringBuilder.h"
> +#include <wtf/HashMap.h>
> +
> +#undef LOG

nit: ^^^ i think can delete this #undef now.


> +
> +#include "WebString.h"
> +
> +using namespace WebCore;
> +
> +namespace WebKit {
> +
> +void populateMap(EntitiesMapType& map,
> +                 const Entity* entities,
> +                 int entitiesCount,

nit: the type of entitiesCount should probably be size_t since you
pass the result of sizeof to it.


> +                 bool standardHTML)
> +{
> +    ASSERT(map.isEmpty());
> +    const Entity* entity = &entities[0];
> +    for (int i = 0; i < entitiesCount; i++, entity++) {

^^^ loop index, i, should then also be size_t.


> +        int code = entity->code;
> +        String name = entity->name;
> +        // For consistency, use case insensitive compare for entity codes that have both.
> +        if (map.contains(code)
> +            && equalIgnoringCase(map.get(code), name))

nit: ^^^ given the long comment, it seems odd to line break here.


> +static const Entity xmlBuiltInEntityCodes[] = {
> +    {"lt", 0x003c},
> +    {"gt", 0x003e},
> +    {"amp", 0x0026},
> +    {"apos", 0x0027},
> +    {"quot", 0x0022}

nit: ^^^ please insert a space after "{" and before "}"


> +String WebEntities::convertEntitiesInString(const String& value) const
...
> +        } else {
> +            curPos++;
> +        }

nit: ^^^ no brackets


> +++ b/WebKit/chromium/src/WebEntities.h

> +#include "WebCommon.h"

nit: WebCommon.h doesn't seem to be needed here.


> +namespace WebKit {
> +typedef HashMap<int, WebCore::String> EntitiesMapType;

can you move this typedef into the WebEntities class?  maybe as a MapType
typedef?


> +class WebEntities {
> +public:
...
> +    WebEntities(bool xmlEntities);

nit: ^^^ can this ctor be made explicit?


> +
> +    // Check whether specified unicode has corresponding html or xml built-in
> +    // entity name. If yes, return the entity notation. If not, returns an
> +    // empty string. Parameter isHTML indicates check the code in html entity
> +    // map or in xml entity map.
> +    WebCore::String getEntityNameByCode(int code) const;

nit: I think it would be more conventional to name this method
"entityNameByCode"


> +++ b/WebKit/chromium/src/WebFrameImpl.cpp

> +WebFrameImpl* WebFrameImpl::fromFrameOwnerElement(Element* element)
> +{
> +    if (!element
> +        || !element->isFrameOwnerElement()
> +        || (!element->hasTagName(WebCore::HTMLNames::iframeTag) 
> +            && !element->hasTagName(WebCore::HTMLNames::frameTag)))
> +        return 0;
> +
> +    WebCore::HTMLFrameOwnerElement* frameElement =
> +        static_cast<WebCore::HTMLFrameOwnerElement*>(element);

nit: I think you can drop the WebCore:: prefixes above.


> +++ b/WebKit/chromium/src/WebPageSerializer.cpp
...
> +// FIXME: verify unicode conversions below with Darin.

please remove this comment once settled here.


> +WebString WebPageSerializer::generateMetaCharsetDeclaration(const WebString& charset)
> +{
> +    return String::format("<META http-equiv=\"Content-Type\" content=\"text/html; charset=%ls\">",
> +                          charset.utf8().data());

I think %ls is wrong since you are passing in a UTF-8 string.


> +WebString WebPageSerializer::generateMarkOfTheWebDeclaration(const WebURL& url)
> +{
> +    KURL kurl = url;
> +    return String::format("\n<!-- saved from url=(%04d)%s -->\n",
> +                          kurl.string().length(),
> +                          kurl.string().utf8().data());

do you really mean to report the number of UTF-16 code points here?  also,
since
you are printing the UTF-8 string, it seems like you would be better off just
getting
the string value of the WebURL, which is a WebCString, which is already UTF-8.


> +WebString WebPageSerializer::generateBaseTagDeclaration(const WebString& baseTarget)
> +{
> +    String targetDeclaration;
> +    if (!baseTarget.isEmpty())
> +        targetDeclaration = String::format(" target=\"%ls\"", baseTarget.utf8().data());
> +    return String::format("<BASE href=\".\"%ls>", targetDeclaration.utf8().data());

same issue with %ls.  i think you want %s.


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.cpp
...

In the ChangeLog, I think you should attribute Johnny Ding as the original
author
of this code.

> +namespace WebKit {
...
> +static const WebEntities htmlEntities(false);
> +static const WebEntities xmlEntities(true);

please avoid file static instances of non-POD types.  these should be changed
to either be local variables or function statics that get lazily initialized.


> +// SerializeDomParam Constructor.

this comment isn't useful, so we should just delete it.


> +String WebPageSerializerImpl::preActionBeforeSerializeEndTag(
...
> +    if (param->skipMetaElement == element) {
> +        *needSkip = true;
> +    } else if (element->hasTagName(HTMLNames::scriptTag) ||
> +               element->hasTagName(HTMLNames::styleTag)) {

nits: ^^^ no brackets around single line statements and "||"
should go at the start of the next line.


> +void WebPageSerializerImpl::openTagToString(const Element* element,
...
> +                    if (attrValue.startsWith("javascript:", false)) {
> +                        result += attrValue;
> +                    } else {

nit: ^^^ no brackets around single line statments



> +                       // Get the absolute link
> +                        String completeURL = param->doc->completeURL(attrValue);

nit: fix indentation of comment


> +void WebPageSerializerImpl::buildContentForNode(const Node* node,
> +                                                SerializeDomParam* param)
> +{
> +    switch (node->nodeType()) {
> +        case Node::ELEMENT_NODE: {

nit: case should not be indented inside a switch


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.h

> +#include "config.h"

do not include config.h in header files.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.



More information about the webkit-unassigned mailing list