
Why strange symbols for punctuation?


Gary T
11-06-2015, 10:55 AM
I was reading something online where ’ appeared where there clearly should have been an apostrophe. I have a vague understanding that this can happen because of some sort of format mismatch, but I have two questions:

1) Why not just use an apostrophe? If they can come up with those characters, surely an apostrophe is possible, isn't it?

2) How/why did that particular odd collection of symbols get selected to substitute for an apostrophe? It just seems quite bizarre.

Telemark
11-06-2015, 11:04 AM
As you mentioned, it's probably a character set issue. There isn't an apostrophe character as such; there's just a code (a number) that tells the browser to display an apostrophe. Depending on which character set you are using, the code for the apostrophe is different. If the system expects the text to be in one character set and it arrives in another, it will display whatever that code corresponds to in the character set it assumed. It could be an issue with Unicode, a universal encoding system that is widely used but not used everywhere.

To sum up:

1) They did use an apostrophe in the character set they were using
2) That's just what the encoding corresponds to in the character set being displayed.
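To make that concrete, here is a minimal Python sketch (Python is only our choice for illustration; nobody in the thread mentions it). It takes one stored byte, 0x92, which is where Windows-1252 keeps the curly apostrophe, and reads it back under a few single-byte character sets using the codec names from Python's standard library:

```python
# One stored byte, read back under several character sets.
# In Windows-1252 (Python codec "cp1252") byte 0x92 is the curly apostrophe U+2019;
# other code pages assign a different character, or just a control code, to it.
raw = bytes([0x92])

for charset in ("cp1252", "latin-1", "cp437", "mac_roman"):
    char = raw.decode(charset)
    print(f"{charset:>9}: {char!r} (U+{ord(char):04X})")
```

The byte never changes; only the table used to interpret it does, which is point 2 above in a nutshell.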

Gary T
11-06-2015, 11:09 AM
So is it correct to see this as a parallel to alt codes?

Is it possible that an apostrophe would appear if I used a different browser to view the item, or does it have to do with the browser used by the originator?

Are there similar codes for every character, including letters? If so, why don't I ever seem to see codes instead of letters? If not, why isn't there just an apostrophe instead of a code for an apostrophe?

Kamino Neko
11-06-2015, 11:28 AM
If not, why isn't there just an apostrophe instead of a code for an apostrophe?

What you're seeing isn't the code for the apostrophe. The original text stored the apostrophe as some code, and what you see is whatever that code means in the encoding the text was displayed in. Exactly what shows up depends on which encoding mistake caused the problem.

And it most likely happened because it was supposed to be a curly quote rather than a plain '. The plain apostrophe is generally the same across encodings (as letters are, which is why they came out correctly), but curly quotes aren't, and the encoding used for display either doesn't have curly quotes or puts them at a different code point.
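A quick way to see the difference between the two apostrophes is to ask how each one is stored as bytes. The sketch below (Python; `show_bytes` is a hypothetical helper name, and the four encodings are just common examples) uses the standard `str.encode` and reports `None` where an encoding has no slot for the character at all:

```python
def show_bytes(char, encodings=("ascii", "latin-1", "cp1252", "utf-8")):
    """Return {encoding: hex bytes}, or None where the encoding can't store char."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = char.encode(enc).hex(" ")
        except UnicodeEncodeError:
            result[enc] = None  # the character simply doesn't exist in this encoding
    return result

print(show_bytes("'"))       # straight apostrophe: the same single byte in all four
print(show_bytes("\u2019"))  # curly apostrophe: missing, or stored differently
```

The straight apostrophe comes out as byte 27 everywhere, while the curly one is absent from ASCII and Latin-1, is the single byte 92 in Windows-1252, and takes three bytes in UTF-8.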

This, BTW, is called Mojibake (https://en.wikipedia.org/wiki/Mojibake), Buchstabensalat, and a bunch of other things. (Article also goes into more detail of how it happens.)

Blaster Master
11-06-2015, 11:30 AM
So is it correct to see this as a parallel to alt codes?

Alt codes are essentially exactly this: you're directly entering the numeric code for a character in the code table (code page) you're using. Generally, the same symbols are in the same place in different sets, but not always.

Is it possible that an apostrophe would appear if I used a different browser to view the item, or does it have to do with the browser used by the originator?

It could be a browser issue. It could also just be a font issue. Sometimes fonts don't store common symbols in the same positions for some reason, or a similar-looking symbol elsewhere in the table gets used instead and messes things up.

As an example, I was recently writing some code that directly manipulated character sets, and I ran into an error when trying to parse on a hyphen. As it turned out, some of the hyphens really were hyphens, but others were actually dashes (often typed as a double hyphen). On screen both looked the same, so I was really confused by the parsing until I looked at the actual character code of each one.
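That kind of bug is easy to reproduce. Here is a minimal Python sketch, assuming a made-up input string; `DASH_LIKE`, `describe` and `normalize_dashes` are hypothetical names, and the set of dash-like characters is illustrative, not exhaustive. It uses the standard `unicodedata` module to print the real name of every non-ASCII character, which is the "look at the actual code" step:

```python
import unicodedata

# Characters that look like a hyphen in many fonts (illustrative, not exhaustive).
DASH_LIKE = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
    "\u2212",  # MINUS SIGN
}

def describe(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text if ord(ch) > 127]

def normalize_dashes(text):
    """Replace dash-like characters with the plain ASCII hyphen-minus (U+002D)."""
    return "".join("-" if ch in DASH_LIKE else ch for ch in text)

record = "red-green\u2013blue"               # looks like three hyphen-separated fields
print(record.split("-"))                     # ['red', 'green–blue']
print(describe(record))                      # [('–', 'U+2013', 'EN DASH')]
print(normalize_dashes(record).split("-"))   # ['red', 'green', 'blue']
```

Normalizing before splitting is one option; another is to split on a regular expression that accepts any of the dash characters.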

Are there similar codes for every character, including letters? If so, why don't I ever seem to see codes instead of letters? If not, why isn't there just an apostrophe instead of a code for an apostrophe?

For ASCII, there's a standard for the basic set of characters (codes 0 to 127), so all the digits, lowercase and uppercase letters, and common symbols are supposed to be in the same place. Issues like this usually come up in the extended portion (codes 128 and above). One browser or character set might use the standard apostrophe, while another might prefer a non-standard one for some reason. I've seen these issues particularly when moving text between browsers, word processors, character sets, etc.

That said, this typically happens because the designer didn't test the page in common browsers, you're using an uncommon browser, or the page uses some obscure font or formatting that you don't have.
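The "basic part is shared, extended part isn't" point can be checked directly. A minimal Python sketch, assuming Python's codec names for four single-byte code pages; `decode_byte` is a hypothetical helper, and `errors="replace"` turns bytes a code page leaves undefined into U+FFFD:

```python
# Four single-byte code pages, named as Python's codecs module names them.
CODE_PAGES = ("latin-1", "cp1252", "cp437", "mac_roman")

def decode_byte(value, code_page):
    # Undefined bytes become U+FFFD instead of raising an error.
    return bytes([value]).decode(code_page, errors="replace")

# Codes 0-127 (the ASCII range): identical in every code page listed.
ascii_agrees = all(
    decode_byte(b, cp) == chr(b) for b in range(128) for cp in CODE_PAGES
)

# Codes 128-255 (the extended range): where do Latin-1 and Windows-1252 differ?
differing = [b for b in range(128, 256)
             if decode_byte(b, "latin-1") != decode_byte(b, "cp1252")]

print(ascii_agrees)                                                   # True
print(f"{len(differing)} differ: 0x{differing[0]:02X}-0x{differing[-1]:02X}")
```

Even two closely related code pages agree on every ASCII code and disagree on a block of 32 extended codes (0x80 to 0x9F), which is exactly where Windows-1252 keeps its curly quotes.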

Gary T
11-06-2015, 11:51 AM
Thank you all for the replies, which are helpful.

leahcim
11-06-2015, 12:07 PM
The sequence of characters is the UTF-8 encoding of the "right curly single quotation mark", as seen by a browser expecting a single-byte encoding such as Windows-1252 instead of UTF-8. That character is Unicode code point U+2019 (http://fileformat.info/info/unicode/char/2019/index.htm).

In UTF-8, code points larger than 127 are encoded as multiple bytes, which in this case are (in hex) E2 80 99. If viewed by a browser that doesn't support Unicode (or just one that hasn't identified that the page is in UTF-8), those three bytes show up as three separate characters: "’" in Windows-1252, for example.
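The whole mix-up can be replayed in a few lines of Python, assuming the page was written as UTF-8 and read as Windows-1252 (codec name "cp1252"); the variable names are just for illustration:

```python
# Reproduce the mojibake: UTF-8 bytes read back as Windows-1252.
curly = "\u2019"                         # right single quotation mark
utf8_bytes = curly.encode("utf-8")       # three bytes: E2 80 99
garbled = utf8_bytes.decode("cp1252")    # each byte becomes its own character
print(utf8_bytes.hex(" "), "->", garbled)   # e2 80 99 -> â€™

# Read as Latin-1 instead, bytes 0x80 and 0x99 are invisible control codes.
print(repr(utf8_bytes.decode("latin-1")))   # 'â\x80\x99'

# If nothing was lost along the way, reversing the two steps repairs the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == curly)                    # True
```

The repair only works when you know both encodings involved and every byte survived the wrong decoding.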

The reason for this weird encoding is that UTF-8 strings correspond exactly to ASCII strings when only code points below 128 are used. That is why the page renders mostly correctly, even though the encoding is wrong.
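The bit layout behind this can be written out by hand. Here is a minimal Python sketch following the UTF-8 byte patterns (`utf8_encode` is a hypothetical name; in real code you would just call `str.encode("utf-8")`, which the test below compares against):

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustrative sketch)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:        # 0xxxxxxx: ASCII passes through unchanged
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

print(utf8_encode(0x2019).hex(" "))                                # e2 80 99
print(b"".join(utf8_encode(ord(c)) for c in "It's") == b"It's")    # True
```

Every byte of a multi-byte sequence has its high bit set (0x80 or above), so none of them can be mistaken for an ASCII character; plain letters and the straight apostrophe come through untouched.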

Nava
11-06-2015, 12:27 PM
It could also just be a font issue. Sometimes fonts either don't store common symbols in the same order for some reason, or there's a similar symbol elsewhere in the table that got it screwed up.

And some are missing a lot of symbols: it's relatively common to encounter fonts that don't have any diacritics, the crossed o (ø), double vowels... so if you're writing in a language that includes those, you'll get most of the text in the chosen font, but those letters come up in a different one. It's still the right letter, but the whole thing ends up looking like a mix of two fonts.
