The Spelling Rule

Slender to slender, broad to broad
This rule is normally expressed as a constraint on orthographic words:
 * 1) Every consonant cluster which is immediately preceded by a broad vowel is immediately followed by either a broad vowel or a boundary.
 * 2) Every consonant cluster which is immediately preceded by a slender vowel is immediately followed by either a slender vowel or a boundary.

The relevant definitions are as follows:
 * 1) The following letters are "consonants": b, c, d, f, g, h, l, m, n, p, r, s, t.
 * 2) A "consonant cluster" is a sequence of one or more consonants which is both immediately preceded and immediately followed by either a vowel or a boundary.
 * 3) The following letters are "vowels": a, e, i, o, u.
 * 4) The following vowels are "broad": a, o, u.
 * 5) The following vowels are "slender": e, i.

For example, these words all follow the rule:
 * bodach, ceòlmhor, cumadh.
 * caileag, coinnich, oidhche.

And these words all violate it:
 * glacte, leagte, togte
 * banrigh, choreigin, mocheirigh, rudeigin
 * mosgìoto, soircas, telefòn.
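The orthographic version of the rule is mechanical enough to check automatically. The following is a minimal sketch rather than an established tool: the function names `strip_accents` and `follows_spelling_rule` are hypothetical, and the code assumes that an accented vowel such as ò or ì counts as its unaccented base vowel, which the definitions above do not state explicitly.

```python
import re
import unicodedata

BROAD = set("aou")

# A vowel, then a run of one or more consonants, then a vowel. The lookahead
# lets matches overlap, so the vowel that closes one cluster can also open
# the next one. Clusters next to a word boundary never match, which is
# correct because a boundary always satisfies the rule.
CLUSTER = re.compile(r"(?=([aeiou])[bcdfghlmnprst]+([aeiou]))")


def strip_accents(word):
    """Lower-case the word and drop accents (assumption: ò behaves like o)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def follows_spelling_rule(word):
    """Return True if every consonant cluster is flanked by matching vowels."""
    for match in CLUSTER.finditer(strip_accents(word)):
        before, after = match.group(1), match.group(2)
        if (before in BROAD) != (after in BROAD):
            return False
    return True


print(follows_spelling_rule("caileag"))  # True
print(follows_spelling_rule("telefòn"))  # False
```

Applied to the word lists above, this check accepts every word in the first group and rejects every word in the second.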

However, this interpretation of the rule completely fails to do it justice. Rather, it needs to be understood as a constraint on the mapping between the phonological and orthographic forms of words.

Sounds and letters
1. Every phoneme is either a vowel or a consonant.

1.1. There are five vowel phonemes in Gaelic: A, E, I, O and U.

1.1.1. E and I are the front vowels.

1.1.2. A, O and U are the back vowels.

1.2. There are 48 consonant phonemes in Gaelic, involving four variants of each of twelve basic consonant types.

1.2.1. The twelve basic consonant types are: P, T, C, B, D, G, F, S, M, L, N and R.

1.2.2. The four variants of each basic consonant type are:
 * slender fortis
 * broad fortis
 * slender lenis
 * broad lenis.

1.2.3. The most common phonetic realisation of each of the 48 consonants is given in the following table.
 * ~  ||||~ fortis ||||~ lenis ||
 * ~  ||~ slender ||~ broad ||~ slender ||~ broad ||
 * = P ||= phj ||= ph ||= fj ||= f ||
 * = T ||= th ||= th ||= hj ||= h ||
 * = C ||= khj ||= kh ||=  ||= x ||
 * = B ||= pj ||= p ||= vj ||= v ||
 * = D ||= tj ||= t ||= j ||=  ||
 * = G ||= kj ||= k ||= j ||=  ||
 * = F ||= fj ||= f ||=  ||=   ||
 * = S ||=  ||= s ||= hj ||= h ||
 * = M ||= mj ||= m ||= vj ||= v ||
 * = L ||=  ||=   ||=   ||=   ||
 * = N ||=  ||=   ||=   ||=   ||
 * = R ||=  ||=   ||=   ||=   ||

2. Turning to orthography, every phoneme is assigned exactly one orthographic form, consisting of a sequence of one or two letters of the Latin alphabet.

2.1. The vowel phonemes in Gaelic are the same as the five vowel phonemes in Latin. This makes the assignment of orthographic forms to the Gaelic vowels very straightforward:
 * A => <a>
 * E => <e>
 * I => <i>
 * O => <o>
 * U => <u>

2.2. Gaelic has 48 consonant phonemes. The Latin alphabet has only 12 letters representing consonants. Thus, the assignment of orthographic forms to the Gaelic consonants is a little more complicated.

2.2.1. The two fortis variants of each basic consonant type are assigned to the same, single-letter orthographic form:
 * fortis B => <b>
 * fortis D => <d>
 * fortis G => <g>
 * fortis P => <p>
 * fortis T => <t>
 * fortis C => <c>
 * fortis F => <f>
 * fortis S => <s>
 * fortis M => <m>
 * fortis L => <l>
 * fortis N => <n>
 * fortis R => <r>

2.2.2. The two lenis variants of each basic consonant type are assigned to the same orthographic form. In general, this is the two-letter sequence formed by taking the orthographic form of the corresponding fortis variants and adding an <h> on the end. For example:
 * lenis B => <bh>
 * lenis S => <sh>
 * etc.

2.2.3. However, the orthographic forms of the lenis variants of L, N and R are identical to those of their fortis equivalents:
 * lenis L => <l>
 * lenis N => <n>
 * lenis R => <r>

2.3. Note that the mapping from phonemes to orthographic forms is a total function (i.e. every phoneme is mapped to exactly one sequence of letters), but it is non-injective (i.e. it is not the case that every phoneme maps to a different orthographic form). In other words, the translation from orthographic forms to phonemes is ambiguous, since we cannot tell whether a consonant is slender or broad from its orthographic form - recall that there are 48 distinct consonant phonemes, mapping onto only 21 distinct orthographic forms. However, the ambiguity is phonologically consistent, since all the phonemes which map to the same orthographic form belong to the same basic consonant type.
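The counts in 2.3 can be reproduced by enumerating the phoneme inventory and applying rules 2.2.1 to 2.2.3. This is an illustrative sketch only; the names `consonant_form`, `phonemes` and `distinct_forms` are hypothetical, and phonemes are represented simply as (type, quality, strength) tuples.

```python
from itertools import product

BASE_TYPES = ["P", "T", "C", "B", "D", "G", "F", "S", "M", "L", "N", "R"]
UNMARKED_LENIS = {"L", "N", "R"}  # 2.2.3: lenis is spelled like fortis


def consonant_form(base, fortis):
    """Orthographic form of a consonant; slender/broad plays no part here."""
    letter = base.lower()
    if fortis or base in UNMARKED_LENIS:
        return letter        # 2.2.1 and 2.2.3: a single letter
    return letter + "h"      # 2.2.2: lenis adds <h>


# Twelve basic types, each with two qualities and two strengths.
phonemes = list(product(BASE_TYPES, ("slender", "broad"), ("fortis", "lenis")))
forms = {p: consonant_form(p[0], p[2] == "fortis") for p in phonemes}
distinct_forms = set(forms.values())

print(len(phonemes), len(distinct_forms))  # 48 21
```

Every phoneme receives a form, so the mapping is total, but the 48 phonemes share 21 spellings, so it cannot be injective.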

3. A syllabic onset consists of a consonant phoneme followed by a vowel phoneme. We will henceforth use the following notation for a syllabic onset from consonant X to vowel Y: X<Y.

Here are some examples of syllabic onsets:
 * slender-fortis-B < O
 * broad-lenis-M < A
 * broad-fortis-C < U.

4. Every syllabic onset is assigned exactly one orthographic form, consisting of a sequence of two to four letters of the Latin alphabet.

4.1. By default, the orthographic form assigned to a syllabic onset X<Y is formed by concatenating the orthographic form of consonant X with that of vowel Y.

For example:
 * broad-fortis-B < O => <b> + <o> = <bo>
 * slender-lenis-M < I => <mh> + <i> = <mhi>

4.2. However, if the consonant X is slender and the vowel Y is not a front vowel, then the orthographic form assigned to a syllabic onset X<Y is formed by concatenating the orthographic form of consonant X with that of vowel Y, and interpolating the letter <e> between the two.

For example:
 * slender-fortis-G < A => <g> + <e> + <a> = <gea>
 * slender-lenis-S < O => <sh> + <e> + <o> = <sheo>

4.3. Ignoring L, N and R, the mapping from syllabic onsets to orthographic forms is a total, injective function (i.e. every syllabic onset is mapped to exactly one sequence of letters, distinct from that of every other syllabic onset). Thus the mapping in both directions is perfectly unambiguous.

However, the fact that there is no orthographic marking of the fortis-lenis distinction in L, N and R spoils things a little. For example:
 * slender-fortis-N < O => <neo>
 * slender-lenis-N < O => <neo>
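Rules 4.1 and 4.2 can be written as a short function, which also makes the L, N, R collision easy to see. This is a sketch under stated assumptions: `onset_form` and `consonant_form` are hypothetical names, and the code implements only the two rules as worded above.

```python
FRONT_VOWELS = {"E", "I"}
UNMARKED_LENIS = {"L", "N", "R"}


def consonant_form(base, fortis):
    """Spelling of a consonant: lenis adds <h>, except for L, N and R."""
    letter = base.lower()
    if fortis or base in UNMARKED_LENIS:
        return letter
    return letter + "h"


def onset_form(base, slender, fortis, vowel):
    """Spelling of the onset X<Y for consonant X (base type) and vowel Y."""
    # 4.2: a slender consonant before a non-front vowel takes an extra <e>.
    glide = "e" if slender and vowel not in FRONT_VOWELS else ""
    return consonant_form(base, fortis) + glide + vowel.lower()


print(onset_form("B", slender=False, fortis=True, vowel="O"))   # bo
print(onset_form("M", slender=True, fortis=False, vowel="I"))   # mhi
print(onset_form("G", slender=True, fortis=True, vowel="A"))    # gea
print(onset_form("S", slender=True, fortis=False, vowel="O"))   # sheo
# Fortis and lenis N are spelled the same, so these two onsets collide:
print(onset_form("N", True, True, "O"), onset_form("N", True, False, "O"))  # neo neo
```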

5. A syllabic coda consists of a vowel phoneme followed by a consonant phoneme. We will henceforth use the following notation for a syllabic coda from vowel X to consonant Y: X>Y.

Here are some examples of syllabic codas:
 * O > slender-fortis-N
 * E > broad-lenis-C
 * A > broad-fortis-S.

6. Every syllabic coda is assigned exactly one orthographic form, consisting of a sequence of two to four letters of the Latin alphabet.

6.1. By default, the orthographic form assigned to a syllabic coda X>Y is formed by concatenating the orthographic form of vowel X with that of consonant Y.

For example:
 * A > lenis-broad-D => <a> + <dh> = <adh>
 * I > fortis-slender-D => <i> + <d> = <id>

6.2. However, if the consonant Y is slender and X is not a front vowel, then the orthographic form assigned to the syllabic coda X>Y is formed by concatenating the orthographic form of vowel X with that of consonant Y, and interpolating the letter <i> between the two.

For example:
 * A > lenis-slender-C => <a> + <i> + <ch> = <aich>

6.3. Conversely, if the consonant Y is broad and X is a front vowel, then the orthographic form assigned to the syllabic coda X>Y is formed by concatenating the orthographic form of vowel X with that of consonant Y, and interpolating the letter <o> between the two.

For example:
 * I > fortis-broad-S => <i> + <o> + <s> = <ios>

6.4. There is one additional quirk involving syllabic codas where the consonant is fortis L, N or R. In these cases the orthographic representation of the consonant is reduplicated.

For example:
 * A > fortis-broad-L => <a> + <l> + <l> = <all>
 * I > fortis-slender-N => <i> + <n> + <n> = <inn>
 * O > fortis-slender-L => <o> + <i> + <l> + <l> = <oill>
 * I > fortis-broad-R => <i> + <o> + <r> + <r> = <iorr>
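The coda rules 6.1 to 6.4 combine in the same way as the onset rules. The following sketch is one possible reading of them, not a definitive implementation; `coda_form` and `consonant_form` are hypothetical names introduced for illustration.

```python
FRONT_VOWELS = {"E", "I"}
UNMARKED_LENIS = {"L", "N", "R"}


def consonant_form(base, fortis):
    """Spelling of a consonant: lenis adds <h>, except for L, N and R."""
    letter = base.lower()
    if fortis or base in UNMARKED_LENIS:
        return letter
    return letter + "h"


def coda_form(vowel, base, slender, fortis):
    """Spelling of the coda X>Y for vowel X and consonant Y (base type)."""
    consonant = consonant_form(base, fortis)
    if fortis and base in UNMARKED_LENIS:
        consonant *= 2            # 6.4: fortis L, N, R are doubled
    if slender and vowel not in FRONT_VOWELS:
        glide = "i"               # 6.2: slender consonant after a back vowel
    elif not slender and vowel in FRONT_VOWELS:
        glide = "o"               # 6.3: broad consonant after a front vowel
    else:
        glide = ""                # 6.1: plain concatenation
    return vowel.lower() + glide + consonant


print(coda_form("A", "D", slender=False, fortis=False))  # adh
print(coda_form("A", "C", slender=True, fortis=False))   # aich
print(coda_form("I", "S", slender=False, fortis=True))   # ios
print(coda_form("O", "L", slender=True, fortis=True))    # oill
print(coda_form("I", "R", slender=False, fortis=True))   # iorr
```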