[webkit-dev] Pointers and self-documenting code

Fri Jul 6 14:26:50 PDT 2012

> From: Filip Pizlo [fpizlo at apple.com]
> Sent: Friday, July 06, 2012 4:52 PM
> To: Joe Mason
> Cc: WebKit Development [webkit-dev at lists.webkit.org]
> Subject: Re: [webkit-dev] Pointers and self-documenting code
> 
> It's not at all clear that this is correct.
> 
> Is it correct for the font size to be zero if the frame doesn't exist?  Seems 
> dubious.
> 
> Is it correct for the font size to be zero if the page doesn't exist?  Seems 
> dubious.
> 
> Adding null checking only makes sense if you have a good story for what the 
> code ought to do if the pointer in question is null.

Well, yes, but that's not the point of the example.  I thought about adding a comment, "or some suitable default," but I figured that would be pedantic.

Regardless of whether my example default was a good one, it's clear that dereferencing null and crashing is NOT the correct thing to do.

> This is a questionable policy.  Often object properties have transient 
> nullness.  They may be null at time T1 and non-null at time T2, and the caller
> may know that the property is non-null from context (the value of some other 
> field, the fact that it's performed some action that forces the field to 
> become non-null, etc).

In this case the caller should specifically assert that the property is not null, to let the person reading the code know that it's a possibility that's been considered and it's believed to be impossible.
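To make that concrete, here is a minimal sketch of the kind of assertion I mean. Frame, Settings, attach() and fontSizeAfterAttach() are hypothetical names made up for illustration, not the real WebKit classes, and I'm using the standard assert() where WebKit code would use ASSERT().

```cpp
#include <cassert>

// Hypothetical types for illustration; not the real WebKit classes.
struct Settings {
    int defaultFontSize = 16;
};

class Frame {
public:
    // May return null until attach() has been called (transient nullness).
    Settings* settings() const { return m_settings; }
    void attach(Settings* settings) { m_settings = settings; }

private:
    Settings* m_settings = nullptr;
};

// Assumed contract: the caller has already attached settings to the frame.
// The assert records that assumption at the point of use instead of
// leaving it implicit in the surrounding code.
int fontSizeAfterAttach(const Frame& frame)
{
    Settings* settings = frame.settings();
    assert(settings); // WebKit code would use ASSERT(settings) here.
    return settings->defaultFontSize;
}
```

The assert costs nothing in release builds, and it tells the reader that null was considered and is believed to be impossible at this call site.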

> Thus statically enforcing the non-nullness of fields is likely to just make 
> the code more complicated.  And it doesn't buy anything.

It buys a lot!  It buys clarity. It makes the code less fragile. It's a lower barrier to entry, because less knowledge is needed to understand the code.  In some cases it provides a compile-time correctness guarantee, which is a good thing - not in all cases, but sometimes is better than never.

There are three categories of accessor:

1. Those that can never return null
2. Those that can potentially return null, and in circumstances complex enough that developers can't usefully keep track while writing algorithms
3. Those that can potentially return null, but only in circumstances that are well defined and easy to reason about (the "transient nullness" you mention)

For type 1, I argue that we should return references to indicate this statically.  I don't see how this makes the code more complicated.
For type 2, we need to always do null checks.  Anything else would be unsafe.
For type 3, we're in the exact situation we are now - the function signature will return a pointer, and the only way to know whether it needs to be checked for null or not is to read the code and reason about its usage.
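As a rough sketch of how the signatures would differ (Document, Page, parent() and the two font size helpers are hypothetical names for illustration, not the real WebKit API):

```cpp
#include <cassert>

// Hypothetical types for illustration; not the real WebKit classes.
struct Page {
    int defaultFontSize = 16;
};

class Document {
public:
    explicit Document(Page& page) : m_page(page) { }

    // Type 1: can never be null, so the signature returns a reference.
    Page& page() const { return m_page; }

    // Type 2 or 3: may be null, so the signature returns a pointer.
    Document* parent() const { return m_parent; }
    void setParent(Document* parent) { m_parent = parent; }

private:
    Page& m_page;
    Document* m_parent = nullptr;
};

// Type 1 caller: there is nothing to check, and the type says so.
int fontSize(const Document& document)
{
    return document.page().defaultFontSize;
}

// Type 2 caller: always check, and fall back to some suitable default.
int parentFontSize(const Document& document, int fallback)
{
    Document* parent = document.parent();
    if (!parent)
        return fallback;
    return parent->page().defaultFontSize;
}
```

The type 1 accessor gets no more complicated to call, and the pointer return on parent() now stands out as a signal that null is possible.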

I would argue that we're currently too stingy with null checks for type 3, because it's very hard to tell why it's safe to skip null checks by looking at small parts of the code, and it's better to err on the side of safety.  But that's a more subtle discussion, and I don't want to focus on that part.  I'm more interested in the low-hanging fruit of type 1 vs type 2 right now.

Joe