But if this is encoded as a Unicode character, the door should be open to any word or phrase that is commonly used in some stylized form. (Coca-Cola?)
hwayne | 21 hours ago
Wikipedia has a bunch of documents on the standardization. It was controversial:
What carried it through was L2/02-163, which noted that all official documents in Pakistan have to start with the glyph. The document also has a picture of what the real-world glyph actually looks like. It definitely loses something by being flattened to a single line.
ubernostrum | 20 hours ago
Depending on the font, some of the other ligatures in that block hold up a bit better as long as you bump up the font size enough. For example:
That's U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM.
janus | 19 hours ago
The ligature has an article in the English Wikipedia:
vrolfs | 14 hours ago
Apparently, it means "In the name of God, the Most Gracious, the Most Merciful".
matheusmoreira | 13 hours ago
Yes. It's the Basmalah. It's recited often and can be found at the start of chapters in the Qur'an.
mhd | 11 hours ago
Also makes a short appearance in Bohemian Rhapsody.
regalialong | 9 hours ago
Oh wow! I did not mentally register it, but it's totally there in the last verse.
icefox | 4 hours ago
Also very common in The 1001 Nights.
ethoh | 9 hours ago
Not a fan. I read it as: A religion has claimed ownership of Unicode.
junon | 9 hours ago
Lol, no. Unicode's whole point is to reflect natural language (and then some). It's part of natural language, so it is included. Not that difficult to understand, whatever god you do (or don't) have is just fine.
fly | 8 hours ago
✝️☦️✡️🕉️☸️
🤡🙄
nikola | 9 hours ago
May I present the case of серафими многоꙮчитїи as an argument that it’s not just one religion? :)
(Jokes aside, there’s a procedure to suggest changes to the consortium, and people are using the procedure for intents and purposes they see fit. I’d say that making life easier for people in the fifth most populous country on Earth is very legitimate.)
ethoh | 6 hours ago
Oh dear, Unicode might have lost the plot.
I wonder if they'll actually fix the issues with Japanese vs Chinese kanji. The Han unification was a disaster, and the Japanese were particularly unhappy about it.
icefox | 4 hours ago
The plot was always descriptive, not prescriptive, and fucking up Han unification is a great example of why. You wanna fix it, go ahead and start working on it.
gerikson | 7 hours ago
I like to think that for every person that's mad that Unicode contains ﷽, there's another, opposite person mad you can write 🍆🍑💦 using Unicode.
Also, it's been in Unicode since 2003. The time for complaints has passed.
krtab | 7 hours ago
Were you aware that if you are willing to sacrifice part of the meaning (the patient[^1]) you can use only 4 bytes and one codepoint: 𓂺! Much more efficient and clearer in meaning!
[^1] Modern views would of course argue that assuming 🍑 is the patient argument of this sentence is only true in phallo-dominant-patriarchal cultures, but this is beyond the scope of this humble comment.
x64k | 4 hours ago
I think 🍆🍑💦 is more popular than 𓂺 because 𓂺 is very awkward to use outside progressive tenses (like the present continuous in English), as it clearly implies that the action is in progress. So it's not as flexible. 🍆🍑💦 lends itself far more easily to poetry, for example.
Plus, since it makes the patient explicit, it has an interpersonal quality that is absent from 𓂺, which is why the use of 𓂺 has been largely confined to situations where the ellipsis of the patient argument is entirely deliberate because it's physically absent (although I guess one might argue that in this case it's actually a reflexive so duh).
hungariantoast | 4 hours ago
𓄽𓀐𓂸
Hecate | 9 hours ago
It's a legal requirement in Pakistan that official documents start with this. It's purely for an administrative reason (end users with an Urdu keyboard must be able to easily reach this ligature instead of having to switch to another keyboard layout).
alper | 9 hours ago
That… is a reading.
hsivonen | 10 hours ago
This character does not have a decomposition mapping. U+FDFA, however, has a compatibility decomposition mapping. The decomposition of U+FDFA is the longest decomposition in Unicode. In the ICU4X normalizer, I special-cased U+FDFA in order to be able to allocate fewer bits for the length of every other decomposition.
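To make the difference concrete, here is a minimal sketch using Python's standard `unicodedata` module (not ICU4X, and not the normalizer's actual implementation). The helper name `nfkd_length` is made up for illustration; it simply counts the code points left after NFKD normalization.

```python
import unicodedata

SALLA = "\uFDFA"      # ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
BISMILLAH = "\uFDFD"  # ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM


def nfkd_length(ch: str) -> int:
    """Hypothetical helper: number of code points after NFKD normalization."""
    return len(unicodedata.normalize("NFKD", ch))


# U+FDFA carries an <isolated> compatibility mapping, so it expands.
print(unicodedata.decomposition(SALLA))
print(nfkd_length(SALLA))      # 18 code points, spaces included

# U+FDFD has no mapping at all, so normalization leaves it alone.
print(repr(unicodedata.decomposition(BISMILLAH)))  # ''
print(nfkd_length(BISMILLAH))  # 1
```

A single outlier that long is presumably why special-casing it pays off: every other decomposition length then fits in a narrower field.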
krtab | 20 hours ago
What a strange coincidence, I've spent the better part of the last two days preparing a talk for Rust in Paris on Friday about... Unicode. There are actually many more such ligatures (all in the same block). What is surprising is that some of them do have a decomposition mapping, and hence are compatibility-equivalent (under NFKC/NFKD) to their "spelled out" form (e.g. U+FDFA), but others like U+FDFD don't!
Pages 1 and 3 of this extract of the Unicode standard have a list of them.
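For anyone who wants to check this without the PDF, a rough sketch with Python's standard `unicodedata` module follows. The function name `split_by_decomposition` and the default range U+FDF0..U+FDFD (the word ligatures at the end of the block) are choices made for this example; the exact lists depend on the Unicode version bundled with your Python.

```python
import unicodedata


def split_by_decomposition(first=0xFDF0, last=0xFDFD):
    """Partition assigned code points by whether they have a decomposition mapping."""
    with_mapping, without_mapping = [], []
    for cp in range(first, last + 1):
        ch = chr(cp)
        if unicodedata.name(ch, None) is None:
            continue  # unassigned in this Python's Unicode database
        if unicodedata.decomposition(ch):
            with_mapping.append(ch)
        else:
            without_mapping.append(ch)
    return with_mapping, without_mapping


has_mapping, no_mapping = split_by_decomposition()
for ch in has_mapping:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {unicodedata.decomposition(ch)}")
for ch in no_mapping:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} (no decomposition mapping)")
```

Passing a wider range, such as `split_by_decomposition(0xFB50, 0xFDFF)`, covers the whole block.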
freddyb | 12 hours ago
Love that Unicode supports this level of expressiveness for all scripts and cultures. I wonder why this was submitted and why under the accessibility label though.
einacio | 11 hours ago
The document linked by hwayne includes a justification for the submission: it is in common use in Pakistan, but since it's in a different script, it's bothersome to keep changing keyboard configurations to type it.
altano | 3 hours ago
freddyb meant why was this submitted to lobsters, not the Unicode spec.
enpo | 9 hours ago
But how do they enter it with an Urdu keyboard? Is there a dedicated key for it, or is it tied to a shortcut combination in the operating system? In the latter case, they could just have used the regular Arabic glyphs saved in the shortcut.
Hecate | 9 hours ago
Various ways:
https://www.informationpk.com/write-bismillah-hir-rahman-nir-rahim-and-allah-in-crulp-phonetic-urdu-keyboard/
https://urdu.ca/Phonetic-Keyboard-Layout.pdf
chinmay | 3 minutes ago
Urdu keyboards already have a SAW key