Making WTF::StringImpl and WTF::AtomString thread-safe
Hi all, As part of the work for making FontCache thread-safe, it's necessary for there to be a thread-safe AtomString. After discussion, it seems that a thread-safe StringImpl is generally desirable and GPUProcess also has a need of it. I've filed a bug to track this work: https://bugs.webkit.org/show_bug.cgi?id=219285 Google have already done this for Blink and there's a nice plan and lots of discussion to read. Their plan document is linked in the bug. I think we'd be well-served by taking broadly the same approach, which is to make ref-counting on StringImpl atomic and to guard AtomStringTable access with a lock. Making String thread-safe has implications, of course, and I'd like to open discussion on this - Making ref-counting atomic on StringImpl has a significant, negative impact on the expense of ref and deref operations. I'm interested in discussing how we should approach this in terms of tracking the work in Bugzilla and how to go about landing it. Perhaps people also have alternative ideas? On the bug is a first-run at implementing the above approach, currently minus the follow-up of everywhere taking into consideration that String/AtomString are now thread-safe. The impact on StringImpl ref/deref performance has it running on my Xeon desktop machine at about 30-50% of non-atomic ref/deref performance. Speedometer 2.0 takes a 1-8% hit considering error margins, but I'm fairly certain it's mostly on the higher end of that and I've not run enough iterations just yet. Jetstream 1.1 seems practically unaffected, I can't run 2.0 with or without the patch, it appears to hang the browser on the bomb-workers test (at least if it completes, it's not in a reasonable time-frame). I would guess that results may vary wildly depending on platform and available atomic access primitives. As one might expect, the impact is far less on a debug build. I think the initial patch of making it thread-safe vs. the follow-up of altering various areas to take it into account could/should be split, but I assume we'd want to land them at the same time. This is cumbersome with how WebKit Bugzilla currently works and I'd like to hear what people think and how similar changes have been made in the past. Thoughts? Regards, Chris
A few thoughts here: 1. It’s not just ref counting. To make String thread-safe, you also need to address all other data members. That means all state in m_hashAndFlags, including the 8bit/16bit state. It appears that your testing strategy did not reveal this point so far; so, you probably need to expand your plan for unit testing concurrent access to a string, with a focus on writing tests that fail with the current implementation. 2. Performance is a challenge. I’ve tested making String thread-safe pretty extensively. On modern Apple Silicon, it’s free. But on Intel chips, it’s very expensive. Because it’s so expensive, and because we have a no regression policy for performance, I don’t think there’s a way to land this change in pieces. It has to be a mega-patch, so we can test its performance as a whole. 3. I’m surprised by the premise that thread-safe String is a requirement for FontCache and/or the GPU Process. It’s certainly a false premise that there’s consensus on this premise, since I do not agree. Can you share some problem statements regarding FontCache and/or the GPU Process that explain the problem we’re trying to solve? Thanks, Geoff
On Dec 1, 2020, at 9:09 AM, Chris Lord via webkit-dev <webkit-dev@lists.webkit.org> wrote:
Hi all,
As part of the work for making FontCache thread-safe, it's necessary for there to be a thread-safe AtomString. After discussion, it seems that a thread-safe StringImpl is generally desirable and GPUProcess also has a need of it. I've filed a bug to track this work: https://bugs.webkit.org/show_bug.cgi?id=219285
Google have already done this for Blink and there's a nice plan and lots of discussion to read. Their plan document is linked in the bug. I think we'd be well-served by taking broadly the same approach, which is to make ref-counting on StringImpl atomic and to guard AtomStringTable access with a lock.
Making String thread-safe has implications, of course, and I'd like to open discussion on this - Making ref-counting atomic on StringImpl has a significant, negative impact on the expense of ref and deref operations. I'm interested in discussing how we should approach this in terms of tracking the work in Bugzilla and how to go about landing it. Perhaps people also have alternative ideas?
On the bug is a first-run at implementing the above approach, currently minus the follow-up of everywhere taking into consideration that String/AtomString are now thread-safe. The impact on StringImpl ref/deref performance has it running on my Xeon desktop machine at about 30-50% of non-atomic ref/deref performance. Speedometer 2.0 takes a 1-8% hit considering error margins, but I'm fairly certain it's mostly on the higher end of that and I've not run enough iterations just yet. Jetstream 1.1 seems practically unaffected, I can't run 2.0 with or without the patch, it appears to hang the browser on the bomb-workers test (at least if it completes, it's not in a reasonable time-frame). I would guess that results may vary wildly depending on platform and available atomic access primitives. As one might expect, the impact is far less on a debug build.
I think the initial patch of making it thread-safe vs. the follow-up of altering various areas to take it into account could/should be split, but I assume we'd want to land them at the same time. This is cumbersome with how WebKit Bugzilla currently works and I'd like to hear what people think and how similar changes have been made in the past.
Thoughts?
Regards,
Chris _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev
On 2020-12-01 18:21, Geoff Garen wrote:
A few thoughts here:
1. It’s not just ref counting.
To make String thread-safe, you also need to address all other data members. That means all state in m_hashAndFlags, including the 8bit/16bit state.
It appears that your testing strategy did not reveal this point so far; so, you probably need to expand your plan for unit testing concurrent access to a string, with a focus on writing tests that fail with the current implementation.
I did consider this, I've also made m_hashAndFlags atomic in the attached patch. Indeed, tests for concurrent string usage and expected behaviour would be desirable, I would take that as a given (but it's worth mentioning).
2. Performance is a challenge.
I’ve tested making String thread-safe pretty extensively. On modern Apple Silicon, it’s free. But on Intel chips, it’s very expensive.
Because it’s so expensive, and because we have a no regression policy for performance, I don’t think there’s a way to land this change in pieces. It has to be a mega-patch, so we can test its performance as a whole.
This is a bit of a shame, but understandable. It's a shame that bugs don't track unresolved dependent bugs so that if we split this into multiple bugs, at least we wouldn't have to submit rolled-up patches for EWS testing. I'll just keep them separate in my branch I suppose, and hopefully I can convince people to look there instead of Bugzilla sometimes... Yes, performance is definitely a challenge. It's helpful that Google have already trodden this path, but I'm sure there will be novel work required.
3. I’m surprised by the premise that thread-safe String is a requirement for FontCache and/or the GPU Process.
It’s certainly a false premise that there’s consensus on this premise, since I do not agree.
Can you share some problem statements regarding FontCache and/or the GPU Process that explain the problem we’re trying to solve?
I can't talk about GPU Process, this was something that was mentioned to me when I was talking about methods of making FontCache thread-safe. FontCache makes extensive use of AtomString for look-ups and comparisons. I had a few alternative ideas for making FontCache safe to use in a Worker, but after discussing them on Slack, it seemed like making FontCache safe for concurrent access and making String thread-safe were both desirable for future work. I really hope other people will chime in here, my personal preference would be to do something less invasive. Other ideas I had for making FontCache Worker-safe; - Add a FontCacheProxy object that calls onto the main-thread FontCache and blocks (not ideal for performance) - Make FontCache not rely on static data and have a completely separate FontCache per-Worker, created on-demand (not ideal for memory usage) - Make FontCache thread-safe, but do it via introducing a completely separate thread-safe AtomString type and leave the current one as it is (I don't have a good grasp of how difficult this would actually be) Cheers, Chris
Thanks, Geoff
On Dec 1, 2020, at 9:09 AM, Chris Lord via webkit-dev <webkit-dev@lists.webkit.org> wrote:
Hi all,
As part of the work for making FontCache thread-safe, it's necessary for there to be a thread-safe AtomString. After discussion, it seems that a thread-safe StringImpl is generally desirable and GPUProcess also has a need of it. I've filed a bug to track this work: https://bugs.webkit.org/show_bug.cgi?id=219285
Google have already done this for Blink and there's a nice plan and lots of discussion to read. Their plan document is linked in the bug. I think we'd be well-served by taking broadly the same approach, which is to make ref-counting on StringImpl atomic and to guard AtomStringTable access with a lock.
Making String thread-safe has implications, of course, and I'd like to open discussion on this - Making ref-counting atomic on StringImpl has a significant, negative impact on the expense of ref and deref operations. I'm interested in discussing how we should approach this in terms of tracking the work in Bugzilla and how to go about landing it. Perhaps people also have alternative ideas?
On the bug is a first-run at implementing the above approach, currently minus the follow-up of everywhere taking into consideration that String/AtomString are now thread-safe. The impact on StringImpl ref/deref performance has it running on my Xeon desktop machine at about 30-50% of non-atomic ref/deref performance. Speedometer 2.0 takes a 1-8% hit considering error margins, but I'm fairly certain it's mostly on the higher end of that and I've not run enough iterations just yet. Jetstream 1.1 seems practically unaffected, I can't run 2.0 with or without the patch, it appears to hang the browser on the bomb-workers test (at least if it completes, it's not in a reasonable time-frame). I would guess that results may vary wildly depending on platform and available atomic access primitives. As one might expect, the impact is far less on a debug build.
I think the initial patch of making it thread-safe vs. the follow-up of altering various areas to take it into account could/should be split, but I assume we'd want to land them at the same time. This is cumbersome with how WebKit Bugzilla currently works and I'd like to hear what people think and how similar changes have been made in the past.
Thoughts?
Regards,
Chris _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev
1. It’s not just ref counting.
To make String thread-safe, you also need to address all other data members. That means all state in m_hashAndFlags, including the 8bit/16bit state.
It appears that your testing strategy did not reveal this point so far; so, you probably need to expand your plan for unit testing concurrent access to a string, with a focus on writing tests that fail with the current implementation.
I did consider this, I've also made m_hashAndFlags atomic in the attached patch. Indeed, tests for concurrent string usage and expected behaviour would be desirable, I would take that as a given (but it's worth mentioning).
Great!
3. I’m surprised by the premise that thread-safe String is a requirement for FontCache and/or the GPU Process.
It’s certainly a false premise that there’s consensus on this premise, since I do not agree.
Can you share some problem statements regarding FontCache and/or the GPU Process that explain the problem we’re trying to solve?
I can't talk about GPU Process, this was something that was mentioned to me when I was talking about methods of making FontCache thread-safe.
FontCache makes extensive use of AtomString for look-ups and comparisons. I had a few alternative ideas for making FontCache safe to use in a Worker, but after discussing them on Slack, it seemed like making FontCache safe for concurrent access and making String thread-safe were both desirable for future work. I really hope other people will chime in here, my personal preference would be to do something less invasive.
So the goal is to enable use of fonts (and FontCache) in Workers?
Other ideas I had for making FontCache Worker-safe; - Add a FontCacheProxy object that calls onto the main-thread FontCache and blocks (not ideal for performance)
Yeah, that doesn’t sound great.
- Make FontCache not rely on static data and have a completely separate FontCache per-Worker, created on-demand (not ideal for memory usage)
Seems OK. I think the main downside to this proposal is that an app that moves font-related work to a worker as a performance optimization will also experience a performance regression by hitting a cold FontCache.
- Make FontCache thread-safe, but do it via introducing a completely separate thread-safe AtomString type and leave the current one as it is (I don't have a good grasp of how difficult this would actually be)
I had to chuckle at this point because the obvious name for this new thread-safe AtomString class would be AtomicString, the prior name of AtomString. I think it might be worthwhile to prototype either a per-thread FontCache or a FontCache based on a custom string type or both. One reason it might be worthwhile is that it will reveal the non-string work that needs to happen to achieve thread safety (or at least thread isolation) in FontCache. For example, FontDataCache and its associated types look like they might need work. Another reason it might be worthwhile is that I believe that solving this problem just for FontCache will be a smaller project than making Strings generally thread-safe. I think you’re right that making Strings generally thread-safe would be good for the project overall. I’m just worried that it might set back the FontCache project. Thanks, Geoff
Cheers,
Chris
Thanks, Geoff
On Dec 1, 2020, at 9:09 AM, Chris Lord via webkit-dev <webkit-dev@lists.webkit.org> wrote:
Hi all,
As part of the work for making FontCache thread-safe, it's necessary for there to be a thread-safe AtomString. After discussion, it seems that a thread-safe StringImpl is generally desirable and GPUProcess also has a need of it. I've filed a bug to track this work: https://bugs.webkit.org/show_bug.cgi?id=219285
Google have already done this for Blink and there's a nice plan and lots of discussion to read. Their plan document is linked in the bug. I think we'd be well-served by taking broadly the same approach, which is to make ref-counting on StringImpl atomic and to guard AtomStringTable access with a lock.
Making String thread-safe has implications, of course, and I'd like to open discussion on this - Making ref-counting atomic on StringImpl has a significant, negative impact on the expense of ref and deref operations. I'm interested in discussing how we should approach this in terms of tracking the work in Bugzilla and how to go about landing it. Perhaps people also have alternative ideas?
On the bug is a first-run at implementing the above approach, currently minus the follow-up of everywhere taking into consideration that String/AtomString are now thread-safe. The impact on StringImpl ref/deref performance has it running on my Xeon desktop machine at about 30-50% of non-atomic ref/deref performance. Speedometer 2.0 takes a 1-8% hit considering error margins, but I'm fairly certain it's mostly on the higher end of that and I've not run enough iterations just yet. Jetstream 1.1 seems practically unaffected, I can't run 2.0 with or without the patch, it appears to hang the browser on the bomb-workers test (at least if it completes, it's not in a reasonable time-frame). I would guess that results may vary wildly depending on platform and available atomic access primitives. As one might expect, the impact is far less on a debug build.
I think the initial patch of making it thread-safe vs. the follow-up of altering various areas to take it into account could/should be split, but I assume we'd want to land them at the same time. This is cumbersome with how WebKit Bugzilla currently works and I'd like to hear what people think and how similar changes have been made in the past.
Thoughts?
Regards,
Chris _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev
On Dec 9, 2020, at 1:02 PM, Geoff Garen via webkit-dev <webkit-dev@lists.webkit.org> wrote:
- Make FontCache thread-safe, but do it via introducing a completely separate thread-safe AtomString type and leave the current one as it is (I don't have a good grasp of how difficult this would actually be)
I had to chuckle at this point because the obvious name for this new thread-safe AtomString class would be AtomicString, the prior name of AtomString.
If we make a separate thread-safe code path, I’m not sure we need to create a variant of AtomString at all. AtomString optimizes string comparisons (by paying up front every time you construct one with a hash table lookup) and memory use (by sharing the same memory for all equal strings), but there’s no reason we can’t compare two strings without using AtomString. I’m doubting that once we figure out what we’re trying to do that we’ll need AtomString. — Darin
Hey Geoff and Chris,
On Dec 1, 2020, at 9:21 AM, Geoff Garen via webkit-dev <webkit-dev@lists.webkit.org> wrote:
A few thoughts here:
1. It’s not just ref counting.
To make String thread-safe, you also need to address all other data members. That means all state in m_hashAndFlags, including the 8bit/16bit state.
It appears that your testing strategy did not reveal this point so far; so, you probably need to expand your plan for unit testing concurrent access to a string, with a focus on writing tests that fail with the current implementation.
2. Performance is a challenge.
I’ve tested making String thread-safe pretty extensively. On modern Apple Silicon, it’s free. But on Intel chips, it’s very expensive.
First of all, crazy!
Because it’s so expensive, and because we have a no regression policy for performance, I don’t think there’s a way to land this change in pieces. It has to be a mega-patch, so we can test its performance as a whole.
Were you able to quantify anything additional about the performance regression on Intel, for instance, if there were any types of String usage that were particularly hard hit? If not (or you haven’t tried), do you think there are any metrics we should look into gathering that might help pin point where things could maybe be optimized? Ideally we would be able to identify places where we unnecessarily ref-deref StringImpls where either a move or judicious use of StringView would work just as well. The Blink document (linked in the bugzilla bug) suggests they identified some mitigations, but I am not clear they will be representative for us today. Putting on my old bindings hat, one area that might be able to get some wins here is how Strings (and other ref counted things, but let’s stay focused) are passed from the JS bindings layer to the DOM. In most cases Strings should be able to be moved into the DOM without refcount churn, but since many of our functions take `const String&` or `const AtomString&` we lose out. - Sam
3. I’m surprised by the premise that thread-safe String is a requirement for FontCache and/or the GPU Process.
It’s certainly a false premise that there’s consensus on this premise, since I do not agree.
Can you share some problem statements regarding FontCache and/or the GPU Process that explain the problem we’re trying to solve?
Thanks, Geoff
On Dec 1, 2020, at 9:09 AM, Chris Lord via webkit-dev <webkit-dev@lists.webkit.org> wrote:
Hi all,
As part of the work for making FontCache thread-safe, it's necessary for there to be a thread-safe AtomString. After discussion, it seems that a thread-safe StringImpl is generally desirable and GPUProcess also has a need of it. I've filed a bug to track this work: https://bugs.webkit.org/show_bug.cgi?id=219285
Google have already done this for Blink and there's a nice plan and lots of discussion to read. Their plan document is linked in the bug. I think we'd be well-served by taking broadly the same approach, which is to make ref-counting on StringImpl atomic and to guard AtomStringTable access with a lock.
Making String thread-safe has implications, of course, and I'd like to open discussion on this - Making ref-counting atomic on StringImpl has a significant, negative impact on the expense of ref and deref operations. I'm interested in discussing how we should approach this in terms of tracking the work in Bugzilla and how to go about landing it. Perhaps people also have alternative ideas?
On the bug is a first-run at implementing the above approach, currently minus the follow-up of everywhere taking into consideration that String/AtomString are now thread-safe. The impact on StringImpl ref/deref performance has it running on my Xeon desktop machine at about 30-50% of non-atomic ref/deref performance. Speedometer 2.0 takes a 1-8% hit considering error margins, but I'm fairly certain it's mostly on the higher end of that and I've not run enough iterations just yet. Jetstream 1.1 seems practically unaffected, I can't run 2.0 with or without the patch, it appears to hang the browser on the bomb-workers test (at least if it completes, it's not in a reasonable time-frame). I would guess that results may vary wildly depending on platform and available atomic access primitives. As one might expect, the impact is far less on a debug build.
I think the initial patch of making it thread-safe vs. the follow-up of altering various areas to take it into account could/should be split, but I assume we'd want to land them at the same time. This is cumbersome with how WebKit Bugzilla currently works and I'd like to hear what people think and how similar changes have been made in the past.
Thoughts?
Regards,
Chris _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev
_______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev
Because it’s so expensive, and because we have a no regression policy for performance, I don’t think there’s a way to land this change in pieces. It has to be a mega-patch, so we can test its performance as a whole.
Were you able to quantify anything additional about the performance regression on Intel, for instance, if there were any types of String usage that were particularly hard hit? If not (or you haven’t tried), do you think there are any metrics we should look into gathering that might help pin point where things could maybe be optimized? Ideally we would be able to identify places where we unnecessarily ref-deref StringImpls where either a move or judicious use of StringView would work just as well.
I didn’t debug deeply. Maybe we’ll be able to reduce refcount changes. Maybe not.
The Blink document (linked in the bugzilla bug) suggests they identified some mitigations, but I am not clear they will be representative for us today.
It’s not obvious to me that what we’re doing and what Blink are doing are analogous. One difference is that we use the String class in our JavaScript engine. Another difference is that we have a no regression policy on benchmarks. Thanks, Geoff
Putting on my old bindings hat, one area that might be able to get some wins here is how Strings (and other ref counted things, but let’s stay focused) are passed from the JS bindings layer to the DOM. In most cases Strings should be able to be moved into the DOM without refcount churn, but since many of our functions take `const String&` or `const AtomString&` we lose out.
- Sam
3. I’m surprised by the premise that thread-safe String is a requirement for FontCache and/or the GPU Process.
It’s certainly a false premise that there’s consensus on this premise, since I do not agree.
Can you share some problem statements regarding FontCache and/or the GPU Process that explain the problem we’re trying to solve?
Thanks, Geoff
On Dec 1, 2020, at 9:09 AM, Chris Lord via webkit-dev <webkit-dev@lists.webkit.org> wrote:
Hi all,
As part of the work for making FontCache thread-safe, it's necessary for there to be a thread-safe AtomString. After discussion, it seems that a thread-safe StringImpl is generally desirable and GPUProcess also has a need of it. I've filed a bug to track this work: https://bugs.webkit.org/show_bug.cgi?id=219285
Google have already done this for Blink and there's a nice plan and lots of discussion to read. Their plan document is linked in the bug. I think we'd be well-served by taking broadly the same approach, which is to make ref-counting on StringImpl atomic and to guard AtomStringTable access with a lock.
Making String thread-safe has implications, of course, and I'd like to open discussion on this - Making ref-counting atomic on StringImpl has a significant, negative impact on the expense of ref and deref operations. I'm interested in discussing how we should approach this in terms of tracking the work in Bugzilla and how to go about landing it. Perhaps people also have alternative ideas?
On the bug is a first-run at implementing the above approach, currently minus the follow-up of everywhere taking into consideration that String/AtomString are now thread-safe. The impact on StringImpl ref/deref performance has it running on my Xeon desktop machine at about 30-50% of non-atomic ref/deref performance. Speedometer 2.0 takes a 1-8% hit considering error margins, but I'm fairly certain it's mostly on the higher end of that and I've not run enough iterations just yet. Jetstream 1.1 seems practically unaffected, I can't run 2.0 with or without the patch, it appears to hang the browser on the bomb-workers test (at least if it completes, it's not in a reasonable time-frame). I would guess that results may vary wildly depending on platform and available atomic access primitives. As one might expect, the impact is far less on a debug build.
I think the initial patch of making it thread-safe vs. the follow-up of altering various areas to take it into account could/should be split, but I assume we'd want to land them at the same time. This is cumbersome with how WebKit Bugzilla currently works and I'd like to hear what people think and how similar changes have been made in the past.
Thoughts?
Regards,
Chris _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev
_______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org <mailto:webkit-dev@lists.webkit.org> https://lists.webkit.org/mailman/listinfo/webkit-dev <https://lists.webkit.org/mailman/listinfo/webkit-dev>
participants (4)
-
Chris Lord
-
Darin Adler
-
Geoff Garen
-
Sam Weinig