Avid Seeker

Arabic Lam Alif problem on Linux

Screenshot of the problem in Gedit

Video of the problem in Gedit

Problem: pressing the La (لا) key produces a single Unicode character, the ligature U+FEFB, instead of the two characters Lam (ل, U+0644) and Alif (ا, U+0627), which are then rendered as one ligature. The same problem affects the La variants: ﻷ، ﻵ، ﻹ.

For non-Arabic speakers, this is like having a key that is supposed to send two characters, f and i (U+0066, U+0069), but instead sends the single ligature ﬁ (U+FB01).
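To see which form a given piece of text actually contains, it helps to look at the raw UTF-8 bytes. A minimal sketch, assuming a POSIX shell with `od` and `tr`; the helper name `to_hex` is made up for this example:

```shell
# Hypothetical helper: print stdin as one continuous string of hex bytes.
to_hex() {
    od -An -v -tx1 | tr -d ' \n'
}

# The single ligature code point U+FEFB is the three UTF-8 bytes ef bb bb.
printf '\357\273\273' | to_hex; echo

# Lam (U+0644) followed by Alif (U+0627) is d9 84 followed by d8 a7.
printf '\331\204\330\247' | to_hex; echo
```

Pasting text typed in an affected application into `to_hex` should show which of the two byte sequences the La key produced there.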

This problem has been fixed in ibus. Since GNOME uses ibus by default, GNOME desktops should have this problem resolved. If you’re using KDE or another desktop environment, the problem may still persist.

Solution using ibus

  1. Install ibus v1.5.28 or later:

For Arch Linux:

# pacman -S ibus

For Ubuntu/Debian-based:

# apt install ibus

For Fedora:

# dnf install ibus

  2. Run ibus-setup.
  3. Run ibus-daemon.
  4. Run im-chooser and choose ibus.
  5. Restart the Xorg session.
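
The first step asks for ibus v1.5.28 or later. One way to check what is installed, sketched under two assumptions: that `sort -V` (a GNU extension) is available, and that `ibus version` prints a line such as `IBus 1.5.28`. `version_at_least` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort), which is a GNU coreutils extension.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumes `ibus version` prints something like "IBus 1.5.28".
if command -v ibus >/dev/null 2>&1; then
    installed=$(ibus version | awk '{print $2}')
    if version_at_least "$installed" 1.5.28; then
        echo "ibus $installed is new enough"
    else
        echo "ibus $installed is older than 1.5.28"
    fi
fi
```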

Some distributions, like Arch, may not have im-chooser. In that case, make sure the following environment variables are set:

GTK_IM_MODULE=ibus
QT_IM_MODULE=ibus
XMODIFIERS=@im=ibus
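
A quick way to confirm that a running session actually has these values, as a sketch. `check_ibus_env` is a hypothetical helper, not part of ibus:

```shell
# Hypothetical helper: report every input-method variable whose value
# differs from the one listed above; returns non-zero if any differ.
check_ibus_env() {
    rc=0
    for pair in GTK_IM_MODULE=ibus QT_IM_MODULE=ibus XMODIFIERS=@im=ibus; do
        name=${pair%%=*}   # text before the first "="
        want=${pair#*=}    # text after the first "="
        eval "have=\${$name-}"
        if [ "$have" != "$want" ]; then
            printf "%s is '%s', expected '%s'\n" "$name" "$have" "$want"
            rc=1
        fi
    done
    return "$rc"
}
```

Run `check_ibus_env` from a terminal inside the desktop session after logging back in; no output means all three variables match.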

Applications affected

Qt mysteriously handles this issue, so Qt-based apps like Kate are not affected by this. On the other hand, GTK-based apps, like Firefox and Chromium, do have this problem.

Setting the environment variable QT_IM_MODULE=ibus doesn’t have a clear effect yet. Qt is suspected to solve this issue by implementing XKB’s original compose mode itself.

Readings while figuring out how Qt handles this issue:

  1. Qt official i18n docs
  2. Qt3 brief ligature docs

Technical details

The problem originates from an XKB limitation: it cannot map a single keypress to multiple keysyms. The current solution works by using the compose-table support of input methods. The idea is to define a key sequence of length 1, consisting only of the ligature (U+FEFB), that produces the two characters U+0644 and U+0627. This is why it is considered “hacky”: the compose function was never designed to be used that way. The entry can be seen in the Compose file that XCompose reads:

grep '<UFEFB>' /usr/share/X11/locale/en_US.UTF-8/Compose
---
<UFEFB>	: "لا" # ARABIC LIGATURE LAM WITH ALEF
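
Each of the La variants mentioned at the top (U+FEF5, U+FEF7, U+FEF9) would need the same kind of entry. A sketch for listing which of the four single-key sequences a Compose file defines; `list_la_entries` is a hypothetical helper and the path is the one used above:

```shell
# Hypothetical helper: print the single-key Compose entries for the four
# isolated Lam-Alif ligatures (U+FEF5, U+FEF7, U+FEF9, U+FEFB) in file $1.
list_la_entries() {
    grep -E '^<UFEF[579B]>[[:space:]]*:' "$1"
}

compose=/usr/share/X11/locale/en_US.UTF-8/Compose
if [ -r "$compose" ]; then
    list_la_entries "$compose" || echo "no single-key La entries in $compose"
fi
```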

So virtually any input method that supports this “single-key compose sequence” hack can solve the problem. Although XIM is an old and discouraged input method, it still solves the problem because it supports the hack. Try setting this environment variable:

GTK_IM_MODULE=xim
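
To try this on one application without changing the whole session, the variable can be set for a single command. A sketch, assuming the installed GTK still provides an xim module; `with_gtk_im` is a hypothetical wrapper, and gedit is just the editor from the screenshot above:

```shell
# Hypothetical wrapper: run a command with GTK_IM_MODULE set to $1,
# leaving the environment of the rest of the session untouched.
with_gtk_im() {
    im=$1
    shift
    env GTK_IM_MODULE="$im" "$@"
}

# Example (not run here): with_gtk_im xim gedit
```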

Single-key compose sequence support

Single-key Compose sequences to make single key presses generate multi-character sequences.

GTK changelog:

On-screen keyboard support:

Issue tracker

Archived discussions:

Credits

Thanks for your contributions and responsiveness to closing these issues.

On an unrelated note, single-key compose sequence might be an easy implementation for Tom Scott’s emoji keyboard.

#Arabic