GSM 03.38 Encoding for Python 2.4

Vim screenshot of gsm0338 encoding

These days I’ve been working with SMS messages. Since I work at a Greek company, these messages must be in Greek (or at least able to contain Greek letters). As far as I know, there are three ways to accomplish this:

  • Use Unicode (UCS-2, which is similar to UTF-16). But you get only 70 characters per SMS.
  • Use the GSM 03.38 encoding. You get 160 characters per SMS, but a reduced character set.
  • Use an 8-bit encoding. I haven’t been able to figure out how this works with the Greek charset (yet).
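The 160 versus 70 figures follow from the size of a single SMS payload, which is 140 octets. Here is a quick sketch of the arithmetic, written for Python 3; `max_chars` is just an illustrative name:

```python
PAYLOAD_OCTETS = 140  # user data carried by one SMS


def max_chars(bits_per_char):
    """How many characters of the given width fit in one SMS payload."""
    return PAYLOAD_OCTETS * 8 // bits_per_char


print(max_chars(7))   # GSM 03.38 packs 7-bit characters -> 160
print(max_chars(16))  # UCS-2 uses 16 bits per character -> 70
print(max_chars(8))   # a plain 8-bit encoding -> 140
```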
So for our new SMS delivery system we decided to go with the almost universal GSM 03.38 encoding. We also replaced our hacked Java SMPP client with Pythomnic and its SMPP library.
That works great, but the library doesn’t know anything about encodings. Using iso-8859-1 works, but for Greek we need some kind of “automatic conversion”. I did this with a Python encoding module (for Python 2.4).
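For readers who have not written a codec before, this is roughly how a custom encoding can be hooked into Python's codec registry. It is a minimal sketch for Python 3 (not the Python 2.4 module described in this post). The name `toygsm`, its helper functions and its tiny table are made up for illustration, and only strict error handling is supported:

```python
import codecs
import string

# Hypothetical, deliberately tiny table: five Greek capitals at their
# GSM 03.38 positions, plus uppercase ASCII letters and the space.
ENCODE_TABLE = {"Σ": 0x18, "Π": 0x16, "Λ": 0x14, "Θ": 0x19, "Ξ": 0x1A}
ENCODE_TABLE.update((c, ord(c)) for c in string.ascii_uppercase + " ")
DECODE_TABLE = {byte: char for char, byte in ENCODE_TABLE.items()}


def toy_encode(text, errors="strict"):
    """Encode str to bytes; the `errors` argument is ignored in this sketch."""
    out = bytearray()
    for i, ch in enumerate(text):
        if ch not in ENCODE_TABLE:
            raise UnicodeEncodeError("toygsm", text, i, i + 1, "not in table")
        out.append(ENCODE_TABLE[ch])
    return bytes(out), len(text)


def toy_decode(data, errors="strict"):
    """Decode bytes to str; the `errors` argument is ignored in this sketch."""
    data = bytes(data)
    chars = []
    for i, byte in enumerate(data):
        if byte not in DECODE_TABLE:
            raise UnicodeDecodeError("toygsm", data, i, i + 1, "not in table")
        chars.append(DECODE_TABLE[byte])
    return "".join(chars), len(data)


def search(name):
    """Codec search function: answer only for our made-up codec name."""
    if name == "toygsm":
        return codecs.CodecInfo(encode=toy_encode, decode=toy_decode, name="toygsm")
    return None


codecs.register(search)

encoded = "ΠΛΣ SMS".encode("toygsm")   # the first three letters are Greek
decoded = encoded.decode("toygsm")
```

Once the search function is registered, the usual `encode` and `decode` methods pick the codec up by name, which is what lets the SMPP code stay unaware of the conversion.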
Going from Greek to GSM 03.38, you have to understand that GSM supports only a few Greek letters, all uppercase and without tones. Therefore:
  • All lowercase Greek letters are converted to uppercase.
  • Tones and diaereses are removed.
  • For some uppercase letters, Latin look-alikes are used instead. Example: rho: ρΡ -> P (even in a web browser it is difficult to see the subtle difference).
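The three rules above can be sketched as follows. This is a minimal illustration for Python 3, not the code of the gsm0338.py module. The helper names are made up, and anything other than Greek letters, ASCII letters, digits and spaces is rejected to keep the sketch short. The Greek byte values are the GSM 03.38 default alphabet positions:

```python
import string
import unicodedata

# Greek capitals that have their own slot in the GSM 03.38 default alphabet.
GSM_GREEK = {"Δ": 0x10, "Φ": 0x12, "Γ": 0x13, "Λ": 0x14, "Ω": 0x15,
             "Π": 0x16, "Ψ": 0x17, "Σ": 0x18, "Θ": 0x19, "Ξ": 0x1A}
# Greek capitals that are sent as the Latin letter they resemble.
LATIN_LOOKALIKE = dict(zip("ΑΒΕΖΗΙΚΜΝΟΡΤΥΧ", "ABEZHIKMNOPTYX"))
# Characters whose GSM 03.38 value equals their ASCII value (subset only).
ASCII_SAFE = set(string.ascii_letters + string.digits + " ")


def fold_greek(ch):
    """Drop tones/diaereses from one Greek letter and uppercase it."""
    base = unicodedata.normalize("NFD", ch)[0]  # base letter comes first
    return base.upper()                         # also turns final sigma into Σ


def greek_to_gsm(text):
    out = bytearray()
    for ch in unicodedata.normalize("NFC", text):
        if "\u0370" <= ch <= "\u03ff":          # Greek and Coptic block
            ch = fold_greek(ch)
        if ch in GSM_GREEK:
            out.append(GSM_GREEK[ch])
            continue
        ch = LATIN_LOOKALIKE.get(ch, ch)
        if ch not in ASCII_SAFE:
            raise ValueError("no mapping in this sketch for %r" % ch)
        out.append(ord(ch))
    return bytes(out)


print(greek_to_gsm("Αναταράξεις στο"))  # b'ANATAPA\x1aEI\x18 \x18TO'
```

Note that only Greek characters are uppercased, so Latin text such as "Siemens" passes through unchanged.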
Example:

>>> import gsm0338
>>> msg = u"Αναταράξεις στο πολιτικό σκηνικό προκαλεί η υπόθεση Siemens"
>>> msg.encode("gsm0338")
'ANATAPA\x1aEI\x18 \x18TO \x16O\x14ITIKO \x18KHNIKO \x16POKA\x14EI H Y\x16O\x19E\x18H Siemens'
>>> print msg.encode("gsm0338")
ANATAPAEI TO OITIKO KHNIKO POKAEI H YOEH Siemens
>>> print msg.encode("gsm0338").decode("gsm0338")
ΑΝΑΤΑΡΑΞΕΙΣ ΣΤΟ ΠΟΛΙΤΙΚΟ ΣΚΗΝΙΚΟ ΠΡΟΚΑΛΕΙ Η YΠΟΘΕΣΗ Siemens

If this is useful to you, here is the gsm0338.py encoding module for Python 2.4.
