GSM 03.38 Encoding for Python 2.4

By: graffic

Vim screenshot of gsm0338 encoding

These days I’ve been working with SMS messages. Because I work at a Greek company, these messages must be able to contain Greek letters. As far as I know, there are three ways to accomplish this:

  • Use Unicode (UCS-2, which is similar to UTF-16). But you get only 70 characters per SMS.
  • Use the GSM 03.38 encoding. You get 160 characters per SMS, but a reduced set of characters.
  • Use an 8-bit encoding. I haven’t been able to figure out how this works with the Greek charset (yet).
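
The 70 and 160 figures follow from the 140 octets of user data that a single SMS carries. A minimal sketch of that arithmetic, in Python 3 syntax (the names `PAYLOAD_OCTETS` and `max_chars` are made up for this illustration):

```python
# A single (non-concatenated) SMS carries 140 octets of user data.
PAYLOAD_OCTETS = 140


def max_chars(bits_per_char):
    """Number of characters of the given width that fit in one SMS."""
    return (PAYLOAD_OCTETS * 8) // bits_per_char


ucs2_chars = max_chars(16)       # UCS-2 uses 16 bits per character
gsm7_chars = max_chars(7)        # GSM 03.38 default alphabet uses 7-bit septets
eight_bit_chars = max_chars(8)   # 8-bit data uses one octet per character

print(ucs2_chars, gsm7_chars, eight_bit_chars)
```

Characters from the GSM 03.38 extension table take two septets each, so a message that uses them fits fewer than 160 characters.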
So for our new SMS delivery system we decided to go with the almost universal GSM 03.38 encoding. We also replaced our hacked Java SMPP client with Pythomnic and its SMPP library.
That works great, but the library doesn’t know anything about encodings. Using iso-8859-1 works, but for Greek we need some kind of “automatic conversion”. I implemented this as a Python encoding (codec) module, written for Python 2.4.
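
For readers who have not written an encoding module before, here is a rough sketch of how a custom codec can be hooked into Python through `codecs.register`. It is not the gsm0338.py module: it uses Python 3 syntax, the codec name `toygsm` and all function names are invented for the example, and the table covers only the Greek capitals of the GSM 03.38 default alphabet, the space, "@" and the Latin capitals.

```python
import codecs

# Partial table: GSM 03.38 byte value -> Unicode character.
# Only a handful of entries of the default alphabet are listed here.
DECODING = {
    0x00: "@",
    0x10: "\u0394",  # GREEK CAPITAL LETTER DELTA
    0x12: "\u03a6",  # GREEK CAPITAL LETTER PHI
    0x13: "\u0393",  # GREEK CAPITAL LETTER GAMMA
    0x14: "\u039b",  # GREEK CAPITAL LETTER LAMDA
    0x15: "\u03a9",  # GREEK CAPITAL LETTER OMEGA
    0x16: "\u03a0",  # GREEK CAPITAL LETTER PI
    0x17: "\u03a8",  # GREEK CAPITAL LETTER PSI
    0x18: "\u03a3",  # GREEK CAPITAL LETTER SIGMA
    0x19: "\u0398",  # GREEK CAPITAL LETTER THETA
    0x1A: "\u039e",  # GREEK CAPITAL LETTER XI
    0x20: " ",
}
# Latin capitals A-Z sit at the same positions as in ASCII.
for code in range(0x41, 0x5B):
    DECODING[code] = chr(code)

# Encoding direction: Unicode code point -> GSM byte value.
ENCODING = {ord(char): code for code, char in DECODING.items()}


def toy_encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, ENCODING)


def toy_decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, DECODING)


def search(name):
    """Codec search function: answer only for our hypothetical codec name."""
    if name == "toygsm":
        return codecs.CodecInfo(name="toygsm", encode=toy_encode, decode=toy_decode)
    return None


codecs.register(search)

encoded = "\u03a3\u03a0 OK".encode("toygsm")
decoded = encoded.decode("toygsm")
```

Once the search function is registered, `str.encode` and `bytes.decode` accept the new codec name like any built-in one. A plain character map like this cannot express the two-byte escape sequences of the GSM 03.38 extension table.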
To go from Greek to GSM 03.38, you have to understand that GSM supports only a few Greek letters, all of them uppercase and without tones (accents). Therefore:
  • All lowercase Greek letters are converted to uppercase.
  • Tones and diaereses are removed.
  • For some uppercase letters, the Latin lookalikes are used instead. Example: rho: ρΡ -> P (even in a web browser it is difficult to see the subtle difference).
Example:

>>> import gsm0338
>>> msg = u"Αναταράξεις στο πολιτικό σκηνικό προκαλεί η υπόθεση Siemens"
>>> msg.encode("gsm0338")
'ANATAPA\x1aEI\x18 \x18TO \x16O\x14ITIKO \x18KHNIKO \x16POKA\x14EI H Y\x16O\x19E\x18H Siemens'
ANATAPAEI TO OITIKO KHNIKO POKAEI H YOEH Siemens
>>> print msg.encode("gsm0338").decode("gsm0338")
ΑΝΑΤΑΡΑΞΕΙΣ ΣΤΟ ΠΟΛΙΤΙΚΟ ΣΚΗΝΙΚΟ ΠΡΟΚΑΛΕΙ Η YΠΟΘΕΣΗ Siemens
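
The three conversion rules listed before the example (uppercase, strip tones and diaereses, swap in Latin lookalikes) can be sketched independently of the codec. This is a Python 3 illustration under assumptions, not code taken from gsm0338.py: the `LOOKALIKES` table and the function name `simplify_greek` are made up here, and only the basic Greek block (U+0370 to U+03FF) is handled.

```python
import unicodedata

# Greek capitals with no slot of their own in the GSM 03.38 default
# alphabet, mapped to the Latin capital that looks the same (assumed table).
LOOKALIKES = {
    "\u0391": "A",  # ALPHA
    "\u0392": "B",  # BETA
    "\u0395": "E",  # EPSILON
    "\u0396": "Z",  # ZETA
    "\u0397": "H",  # ETA
    "\u0399": "I",  # IOTA
    "\u039a": "K",  # KAPPA
    "\u039c": "M",  # MU
    "\u039d": "N",  # NU
    "\u039f": "O",  # OMICRON
    "\u03a1": "P",  # RHO
    "\u03a4": "T",  # TAU
    "\u03a5": "Y",  # UPSILON
    "\u03a7": "X",  # CHI
}


def simplify_greek(text):
    """Reduce Greek text to the letters GSM 03.38 can carry."""
    out = []
    # Compose first so that each accented Greek letter is a single character.
    for ch in unicodedata.normalize("NFC", text):
        if "\u0370" <= ch <= "\u03ff":
            # Decompose and keep only the base letter (drops tonos and
            # dialytika), then uppercase it.
            base = unicodedata.normalize("NFD", ch)[0]
            ch = base.upper()
        # Latin text and the remaining Greek capitals pass through unchanged.
        out.append(LOOKALIKES.get(ch, ch))
    return "".join(out)


print(simplify_greek("\u03c1\u03a1"))  # lowercase and uppercase rho
```

Letters such as Σ, Π, Λ, Θ and Ξ are left as Greek capitals because the GSM 03.38 default alphabet has dedicated positions for them, which is why they appear as bytes like \x18 and \x16 in the encoded output above.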

If you find this useful, here is the gsm0338.py encoding module for Python 2.4.


3 Responses so far »

  1. 1

    Bookmarks about Smpp said,

    2008-10-29 @ 12.00 pm

    [...] – bookmarked by 2 members originally found by MediocreFilms on 2008-10-08 GSM 03.38 Encoding for Python 2.4 http://fitri.manzanisimo.net/2008/06/20/gsm-0338-encoding-for-python-24/ – bookmarked by 3 members [...]

  2. 2

    LE said,

    2009-06-01 @ 1.41 pm

    Hi.

    I looked at your code for my own version of the GSM 03.38 codec, and it appears you’ve got some fundamental mistakes.

    For example, the extension escape sequences won’t work at all. They’ll just go away. As for encoding, parsing uppercase H won’t work because you remapped it.

    I’m sorry, but GSM 03.38 isn’t a simple character map codec.

    You might also want to note that you unmap a lot of characters when you map your “Greek equivalents”, for example:

    0x4B: 0x004B, # LATIN CAPITAL LETTER K
    0x4B: 0x039A, # GREEK CAPITAL LETTER KAPPA

    Will unmap the Latin capital letter K when encoding, in favor of the Greek capital letter Kappa.

    Other than that, thanks!
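
The duplicate-key point in this comment is easy to check in isolation. The sketch below assumes the module builds its table as a Python dict literal and derives the encoding map by inverting it; the variable names are illustrative only.

```python
# The two entries quoted in the comment, written as one dict literal.
decoding_map = {
    0x4B: 0x004B,  # LATIN CAPITAL LETTER K
    0x4B: 0x039A,  # GREEK CAPITAL LETTER KAPPA (same key, replaces the entry above)
}

# Invert the table to get code point -> GSM byte for encoding.
encoding_map = {codepoint: byte for byte, codepoint in decoding_map.items()}

print(len(decoding_map))       # how many entries survived
print(0x004B in encoding_map)  # is Latin K still encodable?
print(0x039A in encoding_map)  # is Greek Kappa encodable?
```

Python keeps only the last value for a repeated key in a dict literal, so the Latin K entry disappears without any error or warning.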

  3. 3

    graffic said,

    2009-06-01 @ 9.07 pm

    @Le: you’re right.

    I forgot to post some changes we made in the office to get it working, but that was during my last day of work. After that I no longer had my lovely SMS gateway to test it 100%.

    Thank you for the comments. Can I ask for a link to your version? I guess it’s better to point to the right one at the top of the post :)
