<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I think I'm starting to see what you're saying, so let me restate it my way to be sure:<div class=""><br class=""></div><div class="">Code that deals with various loaded blobs must be resilient against them getting larger than a 32-bit size can handle and this is the subject of ongoing development effort. Such code may or may not use a Vector. It definitely must use size_t for all of its sizes. The trouble with Vector using unsigned in its API then causes the following troubles:</div><div class=""><br class=""></div><div class=""><b class="">If the code uses a Vector for the large data:</b> in addition to not being able to actually hold all of the data, there will be a risk of truncation from size_t to unsigned on the client side of the Vector, which is much harder to prevent than a truncation inside Vector itself. For such code the ideal would be a Vector that measures itself using size_t, both in its internal data structure and in its API.</div><div class=""><br class=""></div><div class=""><b class="">If the code does not use Vector for that large data but uses it nonetheless for other things:</b> things won't obviously go wrong but you end up in sketchy territory: you will have code that is juggling unsigned's for the Vector and size_t's for the large data. Any arithmetic between the two can easily become suspect. Consider a case where you might use a metric of size or offset in the large data in some arithmetic that flows into Vector. If Vector's indexing operations use unsigned, then you will have truncation.</div><div class=""><br class=""></div><div class="">These problems fall into two categories:</div><div class=""><br class=""></div><div class="">1) Actually not being able to store the large data but failing gracefully when it happens.</div><div class=""><br class=""></div><div class="">2) Truncation leading to some kind of undesirable behavior.</div><div class=""><br class=""></div><div class="">Someone mentioned something about warnings that can tell us about (2). Apparently we cannot turn the warning on right now. The hope is that if we convert everything to unsigned then we won't have the warnings in trunk and then we can turn the warning on, and catch such bugs. This seems very aspirational.</div><div class=""><br class=""></div><div class="">Also, it means that everyone hacking on code that mixes "sizes of files" with "sizes of vectors" will have to scratch their heads about which type to use, and then if they do it wrong they will get some warnings. Of course, the warnings can easily be silenced with a cast. That's super dangerous. In my experience, if the easiest way to make the compiler STFU is by doing the wrong thing (static_cast, in this case) then people will do the wrong thing quite frequently.</div><div class=""><br class=""></div><div class="">(A particularly funny example of this is checked exceptions in Java. If you call a method declared to throw an exception then the caller must either catch it or declare that they throw it, too. The easiest way to silence the error is to catch it with an empty catch block that ignores the exception, since that requires the fewest keystrokes. Someone once did a study of large Java codebases and found such empty catch blocks to be a chronic problem - people did it a lot and it was responsible for bad bugs. The lesson: a type error is only a good thing if the path of least resistance to making the compiler STFU involves actually improving the code.)</div><div class=""><br class=""></div><div class=""><div class="">An alternate approach is to agree to use size_t for sizes. This is an easy to explain rule and it should be easy for reviewers to catch it. It might also make that compiler warning more meaningful, in the sense that in a world where we are mixing size_t and unsigned we will *have* to have some casts between the two but in a world where we use size_t uniformly any such cast is an immediate red flag.</div><div class=""><br class=""></div><div class="">So, I think I'm starting to agree with the size_t approach.</div><div class=""><br class=""></div><div class="">I wonder what the downsides are to this approach. Footprint of Vector?</div><div class=""><br class=""></div><div class="">-Filip</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov <<a href="mailto:ap@webkit.org" class="">ap@webkit.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=koi8-r" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><div class="">19 нояб. 2014 г., в 14:58, Alexey Proskuryakov <<a href="mailto:ap@webkit.org" class="">ap@webkit.org</a>> написал(а):</div><br class="Apple-interchange-newline"><blockquote type="cite" class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">These and related uses are all over the place - see also Vectors in FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, plug-in code that loads from network, SharedBuffer etc.</div></blockquote><br class=""></div><div class="">Another way to say this is: we need to deal with large data arrays throughout loading code. We do not really need full size vectors in most other code - it's sort of OK for HTML parser or for image decoder to fail catastrophically when there is too much data fed to it.</div><div class=""><br class=""></div><div class="">This is somewhat questionable design, but if we are going to stick to it, then magnitude checks should be centralized, not sprinkled throughout the code. We should not make this check explicitly when feeding a network resource to the parser, for example.</div><div class=""><br class=""></div><div class="">A 64-bit API for Vector solves this nearly flawlessly. We do not perform the checks manually every time we use a Vector, Vector does them for us.</div><div class=""><br class=""></div><div class="">Other options are:</div><div class=""><br class=""></div><div class="">- uint64_t everywhere. This way, we'll solve practical problems with large resources once and for all. Also, this may prove to be necessary to solve even YouTube/Google Drive uploads, I do not know that yet.</div><div class=""><br class=""></div><div class="">- size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will still be limited. I like this option, because it won't make us pay the memory and performance cost on old crippled 32-bit machines, which are unlikely to be used for manipulating huge volumes of data anyway.</div><div class=""><br class=""></div><div class="">We'll also need to develop some magic for 53-bit JS bindings, which I'm not sure about.</div><br class=""><div apple-content-edited="true" class="">
<div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">- Alexey</div><div class=""><br class=""></div></div></div></div></div></blockquote></div><br class=""></div></div></body></html>