Arabic Lam Alif problem on Linux
Problem: pressing the La (لا) key produces a single Unicode character (U+FEFB) instead of the two characters Lam (ل, U+0644) and Alif (ا, U+0627), which are then rendered together as one ligature. The same problem affects the La variants ﻷ, ﻵ and ﻹ.
For non-Arabic speakers, this is like having a key that is supposed to send the two characters fi (U+0066 U+0069), but instead sends the single ligature ﬁ (U+FB01).
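To see which of the two cases a given key produces, the typed text can be dumped as raw bytes. The following is a minimal sketch, assuming UTF-8 encoded input; the helper name utf8_hex is made up for illustration, and the sample strings are built from octal escapes so the result does not depend on how the script file itself was saved.

```shell
# Hypothetical helper: print the UTF-8 bytes of a string as lowercase hex.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# The single ligature character U+FEFB (the unwanted result).
ligature=$(printf '\357\273\273')
# Lam U+0644 followed by Alif U+0627 (the wanted result).
lam_alif=$(printf '\331\204\330\247')

utf8_hex "$ligature"   # prints efbbbb (one character, three bytes)
utf8_hex "$lam_alif"   # prints d984d8a7 (two characters, two bytes each)
```

Replacing the sample strings with text pasted from the affected application shows whether the key sent one character or two.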
This problem has been fixed in ibus. Since GNOME uses ibus by default, GNOME desktops should have this problem resolved. If you’re using KDE or another desktop environment, the problem may still persist.
Solution using ibus
- Install ibus v1.5.28 or later:
  For Arch Linux:
  # pacman -S ibus
  For Ubuntu/Debian-based:
  # apt install ibus
  For Fedora:
  # dnf install ibus
- Run ibus-setup.
- Run ibus-daemon.
- Run im-chooser and choose ibus.
- Restart the Xorg session.
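Before going through the steps, it may help to confirm that the installed ibus meets the version requirement. This is a rough sketch: the helper version_at_least is hypothetical, it relies on sort -V being available, and it assumes that `ibus version` prints a line of the form "IBus 1.5.28".

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=1.5.28
if command -v ibus >/dev/null 2>&1; then
    # Assumption: the version number is the second word of the output.
    installed=$(ibus version | awk '{print $2}')
    if version_at_least "$installed" "$required"; then
        echo "ibus $installed is new enough"
    else
        echo "ibus $installed is older than $required"
    fi
else
    echo "ibus is not installed"
fi
```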
Some distributions like Arch may not have im-chooser. In that case make sure the following environment variables are set:
GTK_IM_MODULE=ibus
QT_IM_MODULE=ibus
XMODIFIERS=@im=ibus
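One way to set and then verify these variables is sketched below. Where the export lines belong is an assumption: ~/.xprofile is a common choice, but the file your session actually reads depends on the distribution and display manager. The check_im_env helper is hypothetical and only compares the three values.

```shell
# Assumption: these lines live in a file sourced at login, such as ~/.xprofile.
export GTK_IM_MODULE=ibus
export QT_IM_MODULE=ibus
export XMODIFIERS=@im=ibus

# Hypothetical helper: report every variable that does not point at ibus.
check_im_env() {
    rc=0
    [ "${GTK_IM_MODULE:-}" = "ibus" ] || { echo "GTK_IM_MODULE is not ibus"; rc=1; }
    [ "${QT_IM_MODULE:-}" = "ibus" ] || { echo "QT_IM_MODULE is not ibus"; rc=1; }
    [ "${XMODIFIERS:-}" = "@im=ibus" ] || { echo "XMODIFIERS is not @im=ibus"; rc=1; }
    return "$rc"
}

check_im_env && echo "input method variables are set for ibus"
```

Running check_im_env in a terminal after logging back in shows whether the session picked the variables up.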
Applications affected
Qt mysteriously handles this issue, so Qt-based apps like Kate are not affected by this. On the other hand, GTK-based apps, like Firefox and Chromium, do have this problem.
Setting the environment variable QT_IM_MODULE=ibus doesn’t have a clear effect yet. Qt is suspected to solve this issue by implementing XKB’s original compose mode.
Readings while figuring out how Qt handles this issue:
Technical details
The problem originates from an XKB limitation that prevents mapping a single keypress to multiple keysyms. The current solution relies on the compose-table support of input methods. The idea is to define a compose sequence of length 1, consisting only of the ligature (U+FEFB), that maps to the two characters U+0644 and U+0627. This is why it is considered “hacky”: the compose mechanism was never designed to be used that way. The mapping can be seen in the file that XCompose reads from:
cat /usr/share/X11/locale/en_US.UTF-8/Compose | grep LAM
---
<UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF
So virtually any input method that supports this “single-letter compose sequence” hack can solve the problem. Although XIM is an old input method that is no longer recommended, it still solves the problem because it supports the hack. Try setting this environment variable:
GTK_IM_MODULE=xim
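The grep command above can be extended into a small check for each La ligature. This is only a sketch: has_single_key_entry is a hypothetical helper, the Compose path is the one used above and differs per locale, and it is an assumption that the variants ﻵ (U+FEF5), ﻷ (U+FEF7) and ﻹ (U+FEF9) appear under the same UFExx keysym naming as UFEFB.

```shell
# Hypothetical helper: succeeds if Compose file $1 contains a sequence that
# consists of the single keysym $2 (for example UFEFB).
has_single_key_entry() {
    grep -Eq "^[[:space:]]*<$2>[[:space:]]*:" "$1"
}

# Path taken from the grep command above; other locales use other directories.
compose_file=/usr/share/X11/locale/en_US.UTF-8/Compose

# UFEFB is the plain La ligature; UFEF5, UFEF7 and UFEF9 are its variants.
for keysym in UFEFB UFEF5 UFEF7 UFEF9; do
    if [ -r "$compose_file" ] && has_single_key_entry "$compose_file" "$keysym"; then
        echo "$keysym: single-key entry found"
    else
        echo "$keysym: no single-key entry"
    fi
done
```

Multi-key sequences do not match, because the pattern requires the colon to follow the first keysym directly.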
Single-key compose sequence support
Single-key compose sequences make a single key press generate a multi-character sequence. Support by component:
- Xorg: supported since 2008-06-20.
- ibus-typing-booster: supported since v2.19.0.
- ibus: supported since ibus-1.5.27-9.fc38.
GTK changelog:
- 2022-09-08: bugzilla
- 2022-09-12: Gitlab
- 2022-09-13: Proposed fix for GTK4
- 2022-08-08: Fix regressed
On-screen keyboard support:
- 2023-11-14: GNOME OSK
Issue tracker
Archived discussions:
- Gnome
- KDE
- Ubuntu
- askubuntu
- arch forums
- XDG bugzilla issue
- XDG Gitlab issue
- Khaled Hosny blog post
Credits
Thanks to the following people for their contributions and their responsiveness in closing these issues:
- Fujiwara San
- Matthias Clasen
- Mike Fabian
On an unrelated note, single-key compose sequences might be an easy way to implement Tom Scott’s emoji keyboard.