[webkit-dev] DOMDocument outerHTML and NSStringEncoding

Dan Wood list3679321 at karelia.com
Mon Jan 30 10:01:23 PST 2006


You could scan through the string for charset= as you suggest. If no
charset is declared, you could treat that as an error condition, ask
the user, or insert your own declaration; it might not be safe to
assume ISOLatin1. It would be nice to have some way to figure out
the "ideal" encoding for a given string; you could try
CFStringGetSmallestEncoding or CFStringGetFastestEncoding.
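
Here is a rough sketch of that approach, under some assumptions: the
helper name EncodingForHTMLString is made up for illustration, the scan
is a naive substring search rather than a real HTML parse, and the
fallback to CFStringGetSmallestEncoding is just one of the options
mentioned above. The __bridge casts are only there so it also compiles
under ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of WebKit): guess the encoding to use
// when serializing `html`, based on its charset declaration if any.
static NSStringEncoding EncodingForHTMLString(NSString *html)
{
    NSUInteger length = [html length];
    NSRange marker = [html rangeOfString:@"charset="
                                 options:NSCaseInsensitiveSearch];
    if (marker.location != NSNotFound) {
        NSUInteger start = NSMaxRange(marker);

        // Skip an opening quote, as in <meta charset="utf-8">.
        if (start < length) {
            unichar c = [html characterAtIndex:start];
            if (c == '"' || c == '\'') {
                start++;
            }
        }

        // The charset name runs up to the next quote, space, ';', '/' or '>'.
        NSCharacterSet *terminators =
            [NSCharacterSet characterSetWithCharactersInString:@"\"' ;/>"];
        NSRange end = [html rangeOfCharacterFromSet:terminators
                                            options:0
                                              range:NSMakeRange(start, length - start)];
        NSUInteger stop = (end.location == NSNotFound) ? length : end.location;
        NSString *name = [html substringWithRange:NSMakeRange(start, stop - start)];

        CFStringEncoding cfEncoding =
            CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
        if (cfEncoding != kCFStringEncodingInvalidId) {
            return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
        }
    }

    // No usable declaration: fall back to the smallest encoding that can
    // represent the string (an assumption; asking the user is another option).
    CFStringEncoding smallest =
        CFStringGetSmallestEncoding((__bridge CFStringRef)html);
    return CFStringConvertEncodingToNSStringEncoding(smallest);
}
```

If the fallback path is taken, the saved file will carry no declaration
matching the chosen encoding, so you would probably want to insert one
before writing the data out.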

(What I'd like is a way to analyze the desired encoding, and the  
given string, and see which characters are illegal for that encoding  
that need to be escaped with &# entities ... but that's a different  
story!)
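
For what it's worth, a brute-force version of that idea might look like
the sketch below. It is only an assumption about how one could do it:
the function name EscapeUnencodableCharacters is invented here, and it
simply asks -canBeConvertedToEncoding: about each composed character
sequence and writes a decimal &#N; reference for anything that fails.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a copy of `string` in which every character
// that cannot be represented in `encoding` is replaced by a numeric
// character reference such as &#8364;.
static NSString *EscapeUnencodableCharacters(NSString *string,
                                             NSStringEncoding encoding)
{
    NSUInteger length = [string length];
    NSMutableString *result = [NSMutableString stringWithCapacity:length];
    NSUInteger index = 0;

    while (index < length) {
        // Walk by composed character sequence so surrogate pairs stay together.
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
        NSString *piece = [string substringWithRange:range];

        if ([piece canBeConvertedToEncoding:encoding]) {
            [result appendString:piece];
        } else {
            // Emit one reference per Unicode code point in the piece.
            NSUInteger pieceLength = [piece length];
            for (NSUInteger i = 0; i < pieceLength; i++) {
                unichar high = [piece characterAtIndex:i];
                UTF32Char codePoint = high;
                if (CFStringIsSurrogateHighCharacter(high) && i + 1 < pieceLength) {
                    unichar low = [piece characterAtIndex:i + 1];
                    if (CFStringIsSurrogateLowCharacter(low)) {
                        codePoint = CFStringGetLongCharacterForSurrogatePair(high, low);
                        i++;
                    }
                }
                [result appendFormat:@"&#%u;", (unsigned int)codePoint];
            }
        }
        index = NSMaxRange(range);
    }
    return result;
}
```

This does not touch &, < or >, and an unpaired surrogate in the input
would come out as an invalid reference, so treat it as a starting point
only.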



Here are some simple NSString category methods to convert charset <->
NSStringEncoding.

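- (NSStringEncoding)encodingFromCharset
{
	// The receiver is an IANA charset name such as @"utf-8".
	// CFStringConvertIANACharSetNameToEncoding returns
	// kCFStringEncodingInvalidId for a name it does not recognize.
	CFStringEncoding cfEncoding
		= CFStringConvertIANACharSetNameToEncoding((CFStringRef)self);
	NSStringEncoding encoding
		= CFStringConvertEncodingToNSStringEncoding(cfEncoding);
	return encoding;
}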

+ (NSString *)charsetFromEncoding:(NSStringEncoding)anEncoding
{
	CFStringEncoding encoding
		= CFStringConvertNSStringEncodingToEncoding(anEncoding);
	// The result follows the Get rule (not owned by the caller) and
	// may be NULL if there is no IANA name for the encoding.
	CFStringRef result
		= CFStringConvertEncodingToIANACharSetName(encoding);
	return (NSString *)result;
}
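
To tie this back to the "insert your own declaration" option, here is a
minimal sketch that builds a meta tag for a chosen encoding. The
function name MetaTagForEncoding and the exact tag text are my own
assumptions; it calls the CoreFoundation functions directly so it
stands alone without the category.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a <meta> charset declaration for `encoding`,
// or nil when CoreFoundation has no IANA name for it.
static NSString *MetaTagForEncoding(NSStringEncoding encoding)
{
    CFStringEncoding cfEncoding =
        CFStringConvertNSStringEncodingToEncoding(encoding);
    CFStringRef charset = CFStringConvertEncodingToIANACharSetName(cfEncoding);
    if (charset == NULL) {
        return nil;
    }
    // __bridge keeps this compiling under ARC; ownership does not change.
    return [NSString stringWithFormat:
        @"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=%@\">",
        (__bridge NSString *)charset];
}
```

The returned string could then be spliced into the document's head
before converting the whole thing to NSData with the same encoding.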



Dan


On Jan 29, 2006, at 11:19 PM, Benoit Marchant wrote:

> Hi
>
> I would like to save the outerHTML of a DOMDocument, but to do the
> right thing and get an NSData out of that string, I need to know
> which NSStringEncoding should be used. It would be derived from the
> charset declaration of the document, or ISOLatin1 if no charset is
> declared, I think.
>
> I didn't find an obvious answer in the documentation. Is there
> one? And if not, is there a way to avoid re-creating a mapping
> between charset declarations and NSStringEncodings?
>
>


