Why strange symbols for punctuation?


Gary T
11-06-2015, 09:55 AM
I was reading something online where a string of odd symbols appeared where there clearly should have been an apostrophe. I have a vague understanding that this might happen due to some sort of format mismatch, but I have two questions:

1) Why not just use an apostrophe? If they can come up with those characters, surely an apostrophe is possible, isn't it?

2) How/why did that particular odd collection of symbols get selected to substitute for an apostrophe? It just seems quite bizarre.

Telemark
11-06-2015, 10:04 AM
As you mentioned, it's probably a character set issue. There isn't an apostrophe character, there's just a code that tells the browser to display an apostrophe. Depending on which character set you are using, the code for apostrophe is different. If you're expecting it to be in one character set and it comes in another, the system will display what corresponds to the character in that character set. It could be an issue with Unicode - a universal encoding system that is common but not universal.

To sum up:

1) They did use an apostrophe in the character set they were using
2) That's just what the encoding corresponds to in the character set being displayed.
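To make that concrete, here is a minimal Python sketch of one byte meaning different things in different character sets. The byte 0x92 and the three character sets are just picked for illustration, and `decode_with_each` is a made-up helper name, not anything from the page in question.

```python
def decode_with_each(raw, encodings):
    """Decode the same bytes under several character sets."""
    return {name: raw.decode(name) for name in encodings}

# 0x92 is the curly apostrophe in Windows-1252 (cp1252),
# a control character in Latin-1, and a capital AE ligature in cp437.
raw = b"\x92"
results = decode_with_each(raw, ["cp1252", "latin-1", "cp437"])

for name, text in results.items():
    # Print the code point so invisible characters still show up.
    print(f"{name:8} -> U+{ord(text):04X} {text!r}")
```

The byte never changes; only the table used to look it up does.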

Gary T
11-06-2015, 10:09 AM
So is it correct to see this as a parallel to alt codes?

Is it possible that an apostrophe would appear if I used a different browser to view the item, or does it have to do with the browser used by the originator?

Are there similar codes for every character, including letters? If so, why don't I ever seem to see codes instead of letters? If not, why isn't there just an apostrophe instead of a code for an apostrophe?

Kamino Neko
11-06-2015, 10:28 AM
If not, why isn't there just an apostrophe instead of a code for an apostrophe?

What you're seeing isn't the code for the apostrophe. It's what the code the original text used for its apostrophe stands for in the encoding the page was displayed in. Exactly what you see depends on which encoding mistake caused the problem.

And it most likely happened because it was supposed to be a curly quote, rather than a ' - the latter would generally be the same across encodings (as letters are, which is why they came out correctly), but the curly quotes aren't - and the encoding used for the display either doesn't have curly quotes, or encodes them to a different point.
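A rough Python sketch of that difference, assuming four common encodings chosen for illustration (`encode_or_none` is a hypothetical helper): the straight apostrophe comes out as the same byte everywhere, while the curly one either gets different bytes or can't be encoded at all.

```python
def encode_or_none(ch, encoding):
    """Return the bytes for ch in the given encoding, or None if it has no slot."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None

ENCODINGS = ["ascii", "latin-1", "cp1252", "utf-8"]

# Straight apostrophe (U+0027) versus right curly quote (U+2019).
straight = {enc: encode_or_none("'", enc) for enc in ENCODINGS}
curly = {enc: encode_or_none("\u2019", enc) for enc in ENCODINGS}

print("straight:", straight)  # the same single byte 0x27 in every encoding
print("curly:   ", curly)     # None for ascii and latin-1, different bytes elsewhere
```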

This, BTW, is called Mojibake (https://en.wikipedia.org/wiki/Mojibake), Buchstabensalat, and a bunch of other things. (Article also goes into more detail of how it happens.)

Blaster Master
11-06-2015, 10:30 AM
So is it correct to see this as a parallel to alt codes?

Alt codes are essentially exactly this; you're directly accessing the ASCII table for the character set you're using. Generally, the same symbols are in the same place for different sets, but not always.

Is it possible that an apostrophe would appear if I used a different browser to view the item, or does it have to do with the browser used by the originator?

It could be a browser issue. It could also just be a font issue. Sometimes fonts either don't store common symbols in the same order for some reason, or there's a similar symbol elsewhere in the table that got it screwed up.

As an example, I was recently doing some work directly manipulating character sets in some code I was writing but ran across an error when attempting to parse on a hyphen. As it turned out, some of the hyphens were hyphens and some were actually dashes (often shown as a double hyphen), but in the particular character set I was using, both looked the same, so I was really confused by the parsing until I looked at the actual ASCII code for the character.
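For anyone hitting the same thing, a small Python sketch of that kind of check. This is not the code I was working on; `describe`, `normalize_dashes`, and the list of dash characters are assumptions for illustration.

```python
import unicodedata

def describe(text):
    """List each character's code point and official Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")) for ch in text]

# Look-alike characters mapped to the plain hyphen-minus (an illustrative, incomplete list).
DASHES = {0x2010: "-", 0x2013: "-", 0x2014: "-"}

def normalize_dashes(text):
    """Replace common dash look-alikes with '-' so splitting on '-' works."""
    return text.translate(DASHES)

sample = "2015-11\u201306"  # the second separator is an en dash
for code, name in describe(sample):
    print(code, name)

print(normalize_dashes(sample).split("-"))  # ['2015', '11', '06']
```

Printing the code points is usually enough to spot the odd one out; normalizing before parsing is one possible workaround.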

Are there similar codes for every character, including letters? If so, why don't I ever seem to see codes instead of letters? If not, why isn't there just an apostrophe instead of a code for an apostrophe?

For ASCII, there's a standard for the basic set of characters, so all the digits, lower case, upper case, and common symbols are supposed to be in the same place. The issues like this usually come in the extended portion. One browser or character set might use the standard apostrophe, and another one might prefer a non-standard one for some reason. I've seen these issues particularly when jumping between browsers, word processors, character sets, etc.
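A quick way to see the split between the basic and extended portions, sketched in Python. The four character sets are an arbitrary sample and `disagreeing_bytes` is a hypothetical helper.

```python
def disagreeing_bytes(encodings, byte_range):
    """Return the byte values that do not decode to the same character everywhere."""
    out = []
    for b in byte_range:
        # errors="replace" covers bytes that a character set leaves undefined.
        decoded = {bytes([b]).decode(enc, errors="replace") for enc in encodings}
        if len(decoded) > 1:
            out.append(b)
    return out

ENCODINGS = ["latin-1", "cp1252", "cp437", "mac_roman"]

low = disagreeing_bytes(ENCODINGS, range(0, 128))     # the basic ASCII portion
high = disagreeing_bytes(ENCODINGS, range(128, 256))  # the extended portion

print("disagreements below 128:", len(low))
print("disagreements from 128 up:", len(high))
```

With these four sets, nothing below 128 disagrees, and the mismatches all sit in the upper half.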

That said, typically this happens because the designer didn't test the page in common browsers, because you're using an uncommon browser, or because the page uses some obscure font or formatting that you don't have.

Gary T
11-06-2015, 10:51 AM
Thank you all for the replies, which are helpful.

leahcim
11-06-2015, 11:07 AM
The sequence of characters is the UTF-8 encoding of the "right curly single quotation mark", as seen by a browser expecting basic ASCII. That character is Unicode code point U+2019 (http://fileformat.info/info/unicode/char/2019/index.htm).

In UTF-8, code points larger than 127 are encoded as multiple bytes, which in this case are (in hex) E2 80 99. If viewed by a non-Unicode-supporting browser (or just one that hasn't identified that the page is in UTF-8), this gives "â€™" in some character sets, such as Windows-1252.
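Here is a minimal Python sketch of where those three bytes come from, assuming the reader's browser falls back to Windows-1252 (cp1252). `utf8_three_bytes` is a hypothetical helper that only handles the three-byte form; real code would just call `str.encode("utf-8")`.

```python
def utf8_three_bytes(code_point):
    """Pack a code point into the UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx."""
    # Sketch only: does not reject surrogates or handle other lengths.
    if not 0x800 <= code_point <= 0xFFFF:
        raise ValueError("three-byte form covers U+0800 to U+FFFF only")
    return bytes([
        0xE0 | (code_point >> 12),           # top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),   # middle 6 bits
        0x80 | (code_point & 0x3F),          # low 6 bits
    ])

encoded = utf8_three_bytes(0x2019)
print(encoded.hex())  # e28099

# Reading those three bytes as three separate cp1252 characters.
misread = encoded.decode("cp1252")
print(misread)
```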

The reason this weird encoding is done is so that UTF-8 strings correspond exactly to ASCII strings if code points less than 128 are used. That is why the page renders mostly correctly, even though the encoding is wrong.
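A short Python sketch of both points, again assuming the wrong guess is Windows-1252. `misread_as_cp1252` and `repair` are made-up names, and the repair only works when the garbled text has not been damaged further.

```python
def misread_as_cp1252(text):
    """Simulate a browser reading UTF-8 bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

def repair(garbled):
    """Undo the misreading by reversing the two steps."""
    return garbled.encode("cp1252").decode("utf-8")

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("plain text".encode("utf-8") == "plain text".encode("ascii"))  # True

original = "It\u2019s a test"
garbled = misread_as_cp1252(original)
print(garbled)                      # letters survive, the curly quote does not
print(repair(garbled) == original)  # True
```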

Nava
11-06-2015, 11:27 AM
It could also just be a font issue. Sometimes fonts either don't store common symbols in the same order for some reason, or there's a similar symbol elsewhere in the table that got it screwed up.

And some are missing a lot of symbols: it's relatively common to encounter fonts that don't have any diacritics, the crossed o, double vowels... so if you're writing in a language which includes those, you'll get most of the text in the chosen font but those letters come up in a different one. It's still the right letter, but the whole thing ends up looking more or less like this.

Copyright 2018 STM Reader, LLC.