[webkit-dev] trac.webkit.org links via Google.com

Yaar Schnitman yaar at chromium.org
Mon Nov 30 23:38:45 PST 2009


A sitemap.xml file is a more modern way of telling Google how to crawl a
site, and the crawl rate can be throttled in Google's Webmaster Tools (
http://www.google.com/webmasters/tools/).

Creating a daily script that generates sitemap.xml for WebKit's SVN repo
should be trivial. There are probably Trac plugins that do that already. If
done right, Google's crawler shouldn't produce much more load than an average
developer doing a daily svn sync.
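
A minimal sketch of such a generator (the repository and Trac URLs below are
illustrative assumptions, not the actual server configuration, and it assumes
a recursive "svn ls" is an acceptable cost):

    #!/usr/bin/env python3
    # Hypothetical sketch: build a sitemap.xml of Trac /browser pages from
    # a recursive SVN listing.  The URLs below are illustrative assumptions.
    import subprocess
    from urllib.parse import quote

    SVN_ROOT = "http://svn.webkit.org/repository/webkit/trunk"   # assumed
    TRAC_BROWSER = "http://trac.webkit.org/browser/trunk"        # assumed

    def svn_files():
        # "svn ls -R" prints one path per line; directories end with "/".
        listing = subprocess.check_output(["svn", "ls", "-R", SVN_ROOT])
        for line in listing.decode("utf-8").splitlines():
            if line and not line.endswith("/"):
                yield line

    def write_sitemap(paths, filename="sitemap.xml"):
        with open(filename, "w") as out:
            out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for path in paths:
                out.write("  <url><loc>%s/%s</loc><changefreq>weekly</changefreq></url>\n"
                          % (TRAC_BROWSER, quote(path)))
            out.write("</urlset>\n")

    if __name__ == "__main__":
        write_sitemap(svn_files())

Note the protocol caps a single sitemap at 50,000 URLs, so a tree the size of
WebKit's would probably need a sitemap index file splitting the listing across
several sitemaps, with robots.txt pointing at the index.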

On Mon, Nov 30, 2009 at 11:00 PM, Eric Seidel <eric at webkit.org> wrote:

>
>
> On Tue, Dec 1, 2009 at 1:52 AM, Mark Rowe <mrowe at apple.com> wrote:
>>
>> rel=nofollow doesn't do what you think it does
>> (http://en.wikipedia.org/wiki/Nofollow#What_nofollow_is_not_for).
>>  It prevents a link from implying influence.  It doesn't prevent the link
>> from being followed and the destination content from being indexed.
>>
>
> Good to know.
>
>  "git grep" is hard to beat.
>>
>
> I totally agree!  I just often want Trac URLs for sharing with others, and
> assembling them from file paths is annoying sometimes. :)
>
>
> I looked briefly at google.com/codesearch but it doesn't seem to have
> found svn.webkit.org yet.  It claims we should ideally have a "codesearch
> sitemap" (http://www.google.com/support/webmasters/bin/topic.py?topic=12640)
> but I don't really know much about sitemaps or if that would even be a good
> idea.
>
> I don't see a sitemap listed in robots.txt (
> http://www.sitemaps.org/protocol.php#submit_robots), though maybe there is
> one tucked away somewhere; I'm pretty clueless on the whole "hosting a
> website" thing. :)
>
> Thanks again.
>
> -eric
>
>
>>  On Tue, Dec 1, 2009 at 1:41 AM, Mark Rowe <mrowe at apple.com> wrote:
>>
>>>
>>> On 2009-11-30, at 22:36, Eric Seidel wrote:
>>>
>>> It's bothered me for a while that I can't just type "trac webkit
>>> Document.cpp" into Google and have it give me a trac link to our
>>> Document.cpp page.
>>> http://trac.webkit.org/browser/trunk/WebCore/dom/Document.cpp
>>>
>>> I checked http://trac.macosforge.org/robots.txt tonight and lo and
>>> behold we disallow "browser/" (which is where all these links live).
>>>  Curious if this is intentional, and if we should change this setting?
>>>
>>>
>>> Web crawler indexing of Trac is seriously painful for the servers
>>> involved.  The entire SVN history of the repository is accessible.  File
>>> content.  Changes.  Annotations.  Everything.  That's not cheap to compute
>>> and serve up.
>>>
>>> - Mark
>>>
>>>
>>
>>
>
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>
>