DOMDocument outerHTML and NSStringEncoding
Hi,

I would like to save the outerHTML of a DOMDocument. To do the right thing and get an NSData out of that string, I need to know which NSStringEncoding to use. I think it should be derived from the document's charset declaration, with ISOLatin1 as the fallback when no charset is declared.

I didn't find an obvious answer in the documentation. Is there one? And if not, is there a way to avoid re-creating a mapping between charset declarations and NSStringEncodings?

Thanks,
Benoit
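The first step described above, finding the charset declaration in the serialized markup, can be sketched in plain C (which also compiles as Objective-C). This is a minimal sketch under stated assumptions: the outerHTML has already been turned into a NUL-terminated UTF-8 C string (for example with -UTF8String), and find_declared_charset is a hypothetical helper name. It is a naive substring scan, not an HTML parser. It catches both the HTML 4 form (content="text/html; charset=...") and the short form (charset="..."), but it would also match "charset=" inside a script or comment.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies the value of the first "charset=" found in
 * `html` (matched case-insensitively) into `out`. Returns 1 on success,
 * or 0 if there is no declaration, the value is empty, or it does not fit
 * in `out` (in which case `out` is left as the empty string). */
int find_declared_charset(const char *html, char *out, size_t out_size)
{
    static const char key[] = "charset=";
    const size_t key_len = sizeof key - 1;

    assert(html != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (const char *p = html; *p != '\0'; p++) {
        /* Case-insensitive prefix comparison; stops at the NUL terminator
         * because tolower('\0') never equals a character of `key`. */
        size_t i = 0;
        while (i < key_len && tolower((unsigned char)p[i]) == key[i])
            i++;
        if (i != key_len)
            continue;

        const char *value = p + key_len;
        if (*value == '"' || *value == '\'')
            value++; /* charset="utf-8" or charset='utf-8' */

        /* The name ends at a quote, ';', '/', '>' or whitespace. */
        size_t n = strcspn(value, "\"'; \t\r\n\f/>");
        if (n == 0 || n >= out_size)
            return 0;
        memcpy(out, value, n);
        out[n] = '\0';
        return 1;
    }
    return 0;
}
```

The extracted name could then be wrapped in an NSString and handed to a charset-to-encoding conversion such as CFStringConvertIANACharSetNameToEncoding. When the helper returns 0, what to fall back to is left to the caller, since ISO-8859-1 is only a guess.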
You could scan through the string for charset= as you suggest. If no charset is declared, though, that may be an error condition; you could ask the user, or insert your own declaration. It might not be safe to assume ISOLatin1.

It would be nice to have some way to figure out the "ideal" encoding for a given string; you could try CFStringGetSmallestEncoding or CFStringGetFastestEncoding. (What I'd like is a way to take the desired encoding and the given string and see which characters are illegal in that encoding and need to be escaped as entities ... but that's a different story!)

Here are some simple NSString category methods to convert charset <-> NSStringEncoding:

    // Charset name (e.g. @"iso-8859-1") -> NSStringEncoding.
    // Returns 0, which is not a valid NSStringEncoding, for unknown names.
    - (NSStringEncoding)encodingFromCharset
    {
        CFStringEncoding cfEncoding = CFStringConvertIANACharSetNameToEncoding((CFStringRef)self);
        if (cfEncoding == kCFStringEncodingInvalidId)
            return 0;
        return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
    }

    // NSStringEncoding -> IANA charset name. The returned string is not
    // owned by the caller (CF "Get" rule) and is nil if the encoding
    // is not recognized.
    + (NSString *)charsetFromEncoding:(NSStringEncoding)anEncoding
    {
        CFStringEncoding encoding = CFStringConvertNSStringEncodingToEncoding(anEncoding);
        CFStringRef result = CFStringConvertEncodingToIANACharSetName(encoding);
        return (NSString *)result;
    }

Dan

On Jan 29, 2006, at 11:19 PM, Benoit Marchant wrote:
> Hi,
>
> I would like to save the outerHTML of a DOMDocument. To do the right thing and get an NSData out of that string, I need to know which NSStringEncoding to use. I think it should be derived from the document's charset declaration, with ISOLatin1 as the fallback when no charset is declared.
>
> I didn't find an obvious answer in the documentation. Is there one? And if not, is there a way to avoid re-creating a mapping between charset declarations and NSStringEncodings?
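Dan's parenthetical wish, seeing which characters the target encoding cannot represent and escaping them as entities, can be sketched for the simplest case. This C sketch assumes the text is available as an array of Unicode code points and that the target is US-ASCII or ISO-8859-1, whose repertoires are exactly code points 0 to 0x7F and 0 to 0xFF. Multi-byte or non-Latin encodings would need a real per-character conversion test instead of a range check. escape_for_encoding is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `count` code points from `text` into `out`
 * as single bytes, replacing each code point above `max_code_point` with a
 * decimal numeric character reference such as "&#8364;". Use 0x7F for
 * US-ASCII or 0xFF for ISO-8859-1. Returns the number of bytes written
 * (excluding the terminating NUL), or -1 if `out` is too small. */
long escape_for_encoding(const uint32_t *text, size_t count,
                         uint32_t max_code_point,
                         char *out, size_t out_size)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0 && max_code_point <= 0xFF);

    for (size_t i = 0; i < count; i++) {
        if (text[i] <= max_code_point) {
            /* Representable: the code point is also the byte value. */
            if (used + 1 >= out_size)
                return -1;
            out[used++] = (char)text[i];
        } else {
            /* Not representable: emit "&#N;" (at most 13 chars + NUL). */
            char ref[16];
            int len = snprintf(ref, sizeof ref, "&#%lu;", (unsigned long)text[i]);
            if (len < 0 || used + (size_t)len >= out_size)
                return -1;
            memcpy(out + used, ref, (size_t)len);
            used += (size_t)len;
        }
    }
    out[used] = '\0';
    return (long)used;
}
```

For example, "café €" targeted at ISO-8859-1 keeps the é as byte 0xE9 and turns the euro sign into "&#8364;". Foundation's -canBeConvertedToEncoding: answers only the whole-string yes/no question; the per-character escaping is what this sketch adds.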
Participants (2): Benoit Marchant, Dan Wood