GSM 03.38 Encoding for Python 2.4

These days I’ve been working with SMS messages. Due to the fact that I work in a Greek company, the character set of these messages should be Greek (or able to contain greek letters). As far as I know there are 3 options to accomplish this:
- Use unicode (UCS2, it’s like UTF-16). But you’ll get only 70 characters per SMS.
- Use GSM 03.38 encoding. You get 160 characters per SMS, but a reduced set of characters.
- 8 bit encoding. I wasn’t able to figure how this work with the Greek charset (yet).
- All lowercase greek letters are transformed to uppercase.
- Tones and diaeresises are removed.
- For some upper case letters, latin ones are used instead. Example: rho: ρΡ -> P (Even in a web browser is difficult to see the subtle difference).
>>> import gsm0338
>>> msg = u”Αναταράξεις στο πολιτικό σκηνικό προκαλεί η υπόθεση Siemens”
>>> msg.encode(”gsm0338″)
‘ANATAPA\x1aEI\x18 \x18TO \x16O\x14ITIKO \x18KHNIKO \x16POKA\x14EI H Y\x16O\x19E\x18H Siemens’
ANATAPAEI TO OITIKO KHNIKO POKAEI H YOEH Siemens
>>> print msg.encode(”gsm0338″).decode(”gsm0338″)
ΑΝΑΤΑΡΑΞΕΙΣ ΣΤΟ ΠΟΛΙΤΙΚΟ ΣΚΗΝΙΚΟ ΠΡΟΚΑΛΕΙ Η YΠΟΘΕΣΗ Siemens
If what you see is useful for you, here is the gsm0338.py encoding module for python 2.4.
Tags:encoding,gsm 03.38,python,python 2.4 »
Trackback URL: http://fitri.manzanisimo.net/2008/06/20/gsm-0338-encoding-for-python-24/trackback/