The Museum of HP Calculators

HP Articles Forum


Mapping HP48 Text to Unicode

Posted by Chris Dreher on 16 Jan 2013, 1:15 a.m.

Here is a table for translating HP48 text characters into Unicode characters, which is what modern computers use for text. With it, HP48 developers can create software that transfers, displays, or edits HP48 characters (ex: copying a file from an HP48 calc to a computer) with code that will reliably display a corresponding character. By using this mapping table, we should be able to avoid the garbage data, bugs, and crashes that have been an issue for some PC/Mac/Linux side HP48 software.

Further details and explanations in an easier-to-read format are available at http://www.drehersoft.com/mapping-hp48-text-to-unicode

Mapping HP48 Text to Unicode
Most HP48 characters map directly to the Unicode character with the same value, for the ranges 0x00 to 0x1E, 0x20 to 0x7E, and 0xA0 to 0xFF. For example, the HP48 character 'A' is 0x41 (65 in decimal) and in Unicode is 0041 (65 in decimal). However, the 34 special characters, 0x1F and 0x7F through 0x9F, should be translated using the table below:

  HP48         Unicode    
Dec   Hex    Code    Name
---------    ------------
31     1F    2026    Horizontal Ellipsis
127    7F    2592    Medium Shade
128    80    2220    Angle
129    81    0101    Latin Small Letter a with Macron
130    82    2207    Nabla
131    83    221A    Square Root
132    84    222B    Integral
133    85    03A3    Greek Capital Letter Sigma
134    86    25B6    Black Right-Pointing Triangle
135    87    03C0    Greek Small Letter Pi
136    88    2202    Partial Differential
137    89    2264    Less-Than or Equal To
138    8A    2265    Greater-Than or Equal To
139    8B    2260    Not Equal To
140    8C    03B1    Greek Small Letter Alpha
141    8D    2192    Rightwards Arrow
142    8E    2190    Leftwards Arrow
143    8F    2193    Downwards Arrow
144    90    2191    Upwards Arrow
145    91    03B3    Greek Small Letter Gamma
146    92    03B4    Greek Small Letter Delta
147    93    03B5    Greek Small Letter Epsilon
148    94    03B7    Greek Small Letter Eta
149    95    03B8    Greek Small Letter Theta
150    96    03BB    Greek Small Letter Lamda
151    97    03C1    Greek Small Letter Rho
152    98    03C3    Greek Small Letter Sigma
153    99    03C4    Greek Small Letter Tau
154    9A    03C9    Greek Small Letter Omega
155    9B    0394    Greek Capital Letter Delta
156    9C    03A0    Greek Capital Letter Pi
157    9D    03A9    Greek Capital Letter Omega
158    9E    25A0    Black Square
159    9F    221E    Infinity
If you are using UTF-8, then each Unicode character must be encoded into the appropriate 1-, 2-, or 3-byte sequence.
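
As a rough sketch of how the table and the UTF-8 step fit together, here is one way to do it in Python. The names `HP48_SPECIAL` and `hp48_to_unicode` are made up for this example and do not come from any existing HP48 tool. It relies on the fact that decoding as Latin-1 turns each byte into the Unicode character with the same value, so only the 34 special characters need remapping.

```python
# Unicode code points for HP48 bytes 0x7F..0x9F, in table order.
_SPECIAL_7F_9F = [
    0x2592, 0x2220, 0x0101, 0x2207, 0x221A, 0x222B, 0x03A3, 0x25B6,
    0x03C0, 0x2202, 0x2264, 0x2265, 0x2260, 0x03B1, 0x2192, 0x2190,
    0x2193, 0x2191, 0x03B3, 0x03B4, 0x03B5, 0x03B7, 0x03B8, 0x03BB,
    0x03C1, 0x03C3, 0x03C4, 0x03C9, 0x0394, 0x03A0, 0x03A9, 0x25A0,
    0x221E,
]

# HP48 byte value -> Unicode code point, for the 34 special characters only.
HP48_SPECIAL = {0x1F: 0x2026}
HP48_SPECIAL.update(zip(range(0x7F, 0xA0), _SPECIAL_7F_9F))


def hp48_to_unicode(data: bytes) -> str:
    """Translate raw HP48 text bytes into a Unicode string."""
    # Latin-1 maps every byte to the code point of the same value;
    # translate() then swaps in the 34 special characters.
    return data.decode("latin-1").translate(HP48_SPECIAL)


text = hp48_to_unicode(b"\x87r\x1f")  # pi, 'r', ellipsis
utf8 = text.encode("utf-8")           # 2 + 1 + 3 = 6 bytes
print(text, utf8)
```

Here pi (03C0) takes two bytes in UTF-8, the plain 'r' takes one, and the ellipsis (2026) takes three.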

Rationale
In some cases, the choice of which Unicode character to use was trivial. The choices below were not.

  1. Character 0x80 (angle)
    1. Instead of using 2220 for character 0x80, others have incorrectly used 221F. 221F is the Right Angle character and is not intended for a generic angle. Also, it does not visually match the HP48.
    2. While 2221 is visually an even better match, this character often does not render properly on various computer platforms and software. In short, some users will just see empty boxes.
  2. Character 0x81 (x-bar)
    1. In theory, Unicode allows two characters to be visually combined if the 2nd character is a "combining character". This would allow for the display of an x with a "combining macron" character, which would be 0078 followed by 0304. However, there are two problems with this.
      1. The combination of these two characters often renders poorly or not at all and will leave the user confused.
        For additional examples of how x-bar is inconsistently rendered based on font, see http://www.kreativekorp.com/charset/encoding.php?file=hp-48.kte&char=81.
      2. Using two characters to represent one HP48 character breaks the pattern of having a simple one-to-one mapping. Some HP48 developers will likely have bugs in their code when converting back from Unicode to HP48 characters.
    2. Instead, a-bar (0101) is used. It is a single Unicode character, so it is easy for HP48 developers to deal with, leading to fewer bugs. Also, x-bar is used in statistics as the notation for average. The 'a' in a-bar looks like an 'a' for average.
  3. Character 0x82 (nabla)
    1. The character 2207 was chosen over other triangles since this is the Nabla character, which is used in mathematics. Details can be read at http://en.wikipedia.org/wiki/Nabla_symbol.
  4. Characters 0x8D through 0x90 (arrows)
    1. In Unicode, there are a large number of characters that represent arrows. However, 2190 through 2193 were chosen because these are just simple arrow characters and don’t carry any additional implied meaning. Also, this set of arrow characters supports all four directions, whereas some of the other sets do not. Lastly, some of the alternative arrow characters are not rendered consistently on some computing platforms.
  5. Characters 0x85, 0x8C, 0x9B, 0x9C, 0x9D (various Greek symbols)
    1. These are Greek symbols that could have alternatively been represented by various mathematical or electrical Unicode characters. However, there are several reasons for preferring the Greek symbols:
      1. We can gain insight into the original HP48 developers' intentions by looking at how they translated these characters when using ASCII transfer mode 2 or 3 over a serial link. These characters were translated into \GS, \Ga, \GD, \PI, and \GW respectively. If we assume that "G" stands for Greek, then these translations mean Greek Capital Sigma, Greek lowercase alpha, Greek Capital Delta, Capital Pi, and Greek Capital Omega (a lowercase omega looks like a 'w'). This pattern holds for all the other translated Greek letters as well, except for \pi, which is trivially lowercase pi.
      2. Using all Greek symbols results in a visually clean look. In contrast, when symbols from math, electronics, and Greek symbols are mixed together, they often look sloppy because they don’t line up, have different line weights, and different drawing styles.
  6. Character 0x9E (box)
    1. Instead of using 25A0, which is the Black Square, others have incorrectly used 25AC, which is the Black Rectangle. This visually does not match.
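
To illustrate the one-to-one point made for character 0x81, here is a hedged sketch of the reverse direction in Python. The names `FORWARD`, `REVERSE`, and `unicode_to_hp48` are invented for this example. Rejecting code points with no HP48 counterpart by raising `ValueError` is just one possible policy; a real tool might substitute a placeholder instead.

```python
# HP48 byte -> Unicode code point for the 34 special characters.
FORWARD = {0x1F: 0x2026}
FORWARD.update(zip(range(0x7F, 0xA0), [
    0x2592, 0x2220, 0x0101, 0x2207, 0x221A, 0x222B, 0x03A3, 0x25B6,
    0x03C0, 0x2202, 0x2264, 0x2265, 0x2260, 0x03B1, 0x2192, 0x2190,
    0x2193, 0x2191, 0x03B3, 0x03B4, 0x03B5, 0x03B7, 0x03B8, 0x03BB,
    0x03C1, 0x03C3, 0x03C4, 0x03C9, 0x0394, 0x03A0, 0x03A9, 0x25A0,
    0x221E,
]))

# A plain dict inversion works only because each HP48 character
# maps to exactly one Unicode code point.
REVERSE = {cp: b for b, cp in FORWARD.items()}


def unicode_to_hp48(text: str) -> bytes:
    """Translate a Unicode string back into HP48 text bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in REVERSE:
            out.append(REVERSE[cp])      # one of the 34 special characters
        elif cp < 0x100 and cp not in FORWARD:
            out.append(cp)               # directly mapped character
        else:
            raise ValueError(f"no HP48 equivalent for U+{cp:04X}")
    return bytes(out)


# x + combining macron is two code points; a-bar is one.
print(len("x\u0304"), len("\u0101"))
print(unicode_to_hp48("\u0101\u2260\u03c0"))
```

With the combining form, a converter would have to recognize the two-character sequence 0078 0304 as a unit. With a-bar, the per-character loop above is enough.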

Resources
Unicode Standard: http://unicode.org/
Unicode Character Name Index: http://www.unicode.org/charts/charindex.html
UTF-8 summary: http://en.wikipedia.org/wiki/Utf-8

Edited: 16 Jan 2013, 1:16 a.m.
