[webkit-dev] Spam and indexing
Lucas Forschler
lforschler at apple.com
Thu Mar 28 15:34:47 PDT 2019
> On Mar 28, 2019, at 2:10 PM, Konstantin Tokarev <annulen at yandex.ru> wrote:
>
>
>
> 28.03.2019, 23:58, "Alexey Proskuryakov" <ap at webkit.org>:
>> Hello,
>>
>> The robots.txt file that we have on bugs.webkit.org currently allows search engines access to individual bug pages, but not to any bug lists. As a result, search engines and the Internet Archive only index bugs that were filed before the robots.txt changes a few years ago, and bugs that are directly linked from webpages elsewhere. These bugs are where most spam content naturally ends up.
>>
>> This is quite wrong, as indexing just a subset of bugs is not beneficial to anyone other than spammers. So we can go in either direction:
>>
>> 1. Allow indexers to enumerate bugs, thus indexing all of them.
>>
>> Seems reasonable that people should be able to find bugs using search engines.
>
> Yes, and it may give even better results than searching Bugzilla directly
>
>> On the other hand, we'll need to do something to ensure that indexers don't destroy Bugzilla performance,
>
> This can be solved by caching
>
>> and of course spammers will love having more flexibility.
>
> rel="nofollow" on all links in comments should be enough to make spamming useless
Theoretically yes… but a couple of Google searches suggest it doesn’t make a difference. Here is one of many results:
https://www.seroundtable.com/google-nofollow-link-attribute-failed-comments-26959.html
I expect that spammers don’t really care whether they get a nofollow or not; they are mostly unmanned scripts anyway.
I’m not opposed to adding this, I just don’t expect it will solve the problem. We could measure and see.
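For reference, adding the attribute at render time could look roughly like the sketch below. This is not Bugzilla's actual linkification code: `linkify_nofollow` and `URL_RE` are hypothetical names, and the sketch assumes comments are stored as plain text containing bare http(s) URLs.

```python
import html
import re

# Hypothetical pattern: a bare http(s) URL running up to whitespace,
# an angle bracket, or a double quote.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def linkify_nofollow(comment: str) -> str:
    """Escape a plain-text comment and wrap bare URLs in rel="nofollow" links."""
    parts = []
    last = 0
    for match in URL_RE.finditer(comment):
        # Escape the text between the previous URL and this one.
        parts.append(html.escape(comment[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    # Escape whatever follows the final URL.
    parts.append(html.escape(comment[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(linkify_nofollow("Cheap watches at http://example.com/spam today"))
```

Whether this changes spammer behavior is exactly the part that would need measuring.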
Lucas
>
>>
>> 2. Block indexing completely.
>>
>> Seems like no one was bothered by lack of indexing on new bugs so far.
>
> That's survivorship bias - if nobody can find relevant bugs, nobody will ever complain
>
>>
>> Thoughts?
>>
>> For reference, here is the current robots.txt content:
>>
>> $ curl https://bugs.webkit.org/robots.txt
>> User-agent: *
>> Allow: /index.cgi
>> Allow: /show_bug.cgi
>> Disallow: /
>> Crawl-delay: 20
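A quick way to see what these rules permit is to feed them to Python's standard `urllib.robotparser`. The sketch below is only an illustration: the bug ID, the buglist query, and the `ExampleBot` agent name are made up, and `make_parser` and `allowed` are hypothetical helpers. Note that `urllib.robotparser` applies the first matching rule, while some crawlers prefer the most specific rule; for this file both readings should agree.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
Disallow: /
Crawl-delay: 20
"""


def make_parser(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) agent may fetch the given path."""
    return make_parser().can_fetch(agent, "https://bugs.webkit.org" + path)


if __name__ == "__main__":
    # Example paths; the bug ID and the query string are invented.
    for path in ("/show_bug.cgi?id=12345", "/buglist.cgi?product=WebKit"):
        print(path, allowed(path))
    print("crawl delay:", make_parser().crawl_delay("ExampleBot"))
```

Individual bug pages come out as fetchable while bug lists do not, which is why a crawler has no way to enumerate bugs it has not already seen linked somewhere.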
>>
>> - Alexey
>>
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org
>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>
> --
> Regards,
> Konstantin