Note: SAM only understands a special phonetic language.
You will have to learn it [see SAM 128 (general)]
if you want "advanced speech effects" (like regional dialects or emphasis of specific words/syllables).
For "simple" American English text, see the page for Using Reciter.
Note: If you installed Reciter instead of SAM, then you need to switch the C128 into "SAM-Mode" with the BASIC-Wedge command ]SAM (if you want to use the phonetic text described on this page), or else see the Reciter page (if you want to use English text instead).

Speaking phonetic text is very easy: just enter the "SAY" command followed by a string expression that contains valid phonetic text. Some examples:

    ]SAY "/HEHLOW WERLD." :REM constant string
    ]SAY A$ :REM scalar variable
    ]SAY T$(2) :REM array variable
    ]SAY LEFT$(X$,4)+"S." :REM general string expression

I hope you agree from the examples that the syntax is simple and flexible. The only difficult thing about using SAM is being sure the string argument evaluates as valid phonetic text. Phonetic text will take some time to master because it uses a large "alphabet" of about 40 phonemes (each consisting of 1 or 2 characters) -- see the Phonemes page, or the Alpha Long page, for a list of phonemes that can be used in the string. Phonetic text also offers advanced features, like custom stress and questions [see Using SAM 128 (general), below].

If the string does not contain valid phonetic text, then SAM will generate a "double-beep" sound instead of speaking. When this happens, you can find out where the (first) error in the string occurs with the BASIC-Wedge command ERROR. For example, you ask SAM to say a phrase ("stop it") like this:

    ]SAY "STOP IT."

This example, instead of rendering speech, will fail with a BEEP-BEEP sound. To find out where the error is in your phonetic text, enter ]ERROR, and you will get the following output (with the first unrecognized character highlighted on screen):

    STOP IT.

In this case, it means neither the character "O" nor the combination "OP" is recognized. This error display only shows the first error found... any following errors will not be highlighted. (As we shall see...)

If you do some research, you should find that the character pair "AA" will correct the (original "O") error. So now try this command:

    ]SAY "STAAP IT."

Again you will hear "BEEP-BEEP" instead of speech. One last time, ask for details by entering ]ERROR, and you will see:

    STAAP IT.

This means the character I, or the pair IT, is invalid. A little research reveals that "IH" is (probably) the character pair wanted for this phoneme. So finally, enter this command:

    ]SAY "STAAP IHT."

SAM will say this string because it can match all text to a series of phonemes. Another variation you can try is:

    ]SAY "STAAP IXT."

What's the difference? Not much! The IH and IX phonemes sound almost the same -- the IX one is slightly shorter in duration. You should try both to decide which sounds best...

Besides using stress marks in the phonetic text, there are several ways you can alter SAM's speech. See the page about Common Routines.
There are two ways to call SAM 128 from a machine language (ML) / assembly language program.

The most common will probably be to call 'SayIt' at $EE09 (60937). Before doing that, you will need to copy the phonetic text you want spoken into SAM's buffer. Your text string should be terminated with a 'shifted ESCape' character: $9B (155). If not, SAM will say 'random' junk after your desired text is spoken (or halt with an error without saying anything). In the current (Epsilon) release, the buffer 'samStr' is located at $D800~$D8FF. Because the buffer may move in later versions, you can find the buffer address in memory locations $EE22/23 (a word).

Another method, which only applies if you are using BASIC variables, is to simply call 'SayItBASIC' at $EE06 (60934). Before calling this routine, you first need to have a text variable named SA$ defined. If you don't, an empty string named SA$ will be created. You don't need a special terminator byte in the string (because BASIC knows how long the string is).

Either way, your string should not have any embedded 'shifted ESC' ($9B) characters. If it does, SAM will ignore whatever follows the first one.

The following examples show the non-wedge BASIC calls to SAM (i.e., SayItBASIC at $EE06). So if you are using ML, just use a simple JSR if running in BANK 1 (unlikely) or else use JSR_FAR. And ML users who don't want to use BASIC strings must call SayIt at $EE09 instead.

To say "Hello world!", for example:

    SA$="/HEHLOW WERLD.":BANK 1:SYS 60934

Note: if you are using BASIC, it saves a lot of typing (and reduces memory used and improves RUN speed) if you use a variable for the value 60934. Also, you only need to use BANK 1 if you have used some other BANK command (or no BANK command) before using SYS. Thus a more practical (say "the easy way") example is:

    SA$="DHIY IY5ZIY WEY5.":SYSAM

This assumes BANK 1 was the last BANK command issued, and variable AM = 60934. The following examples use this assumption.

Another common SYS command you may use (at least while debugging) is the Error Display at 60949 ($EE15). SAM has a very specific set of phonemes he will accept. If your input text fails to match any phoneme in his special "phonetic language" then you will hear "Beep Beep" (instead of speech). When this happens, just call the error display like this:

    SYS 60949

(The example assumes BANK 1 is active.) SAM will 'print' to the current output device (usually the screen) the text you entered, except the character it could not match to a phoneme will be in 'reverse-video'. (The concept of reverse-video really only makes sense on the computer display, or some Commodore-specific printers.)

For example, you ask SAM to say a phrase ("stop it") like this:

    SA$="STOP IT.":SYSAM

(Again BANK 1 is assumed active, and variable AM is assumed to equal 60934.) This example, instead of rendering speech, will fail with a BEEP-BEEP sound. To find out where the error is in your input text, enter SYS 60949, and you will get the following output:

    STOP IT.

In this case, it means the character "O" (and the combination "OP") is not recognized. This routine only shows the first error found... any following errors will not be highlighted. So in this example, the single letter "I" will also generate an error after you correct the "O" error...

If you do some research, you should find that the character pair "AA" will correct the (original "O") error. So now try this command:

    SA$="STAAP IT.":SYSAM

Again you will hear "BEEP-BEEP" instead of speech. One last time, call the Error Display with SYS 60949, and you will see:

    STAAP IT.

This means the letter I, or the pair IT, is invalid. (The real problem is that the single character I is not a valid phoneme.) A little research reveals that "IH" is (probably) the character pair wanted for this phoneme. So finally, enter this command:

    SA$="STAAP IHT.":SYSAM

SAM will say this string because it can match all text to a series of phonemes. Another variation you can try is:

    SA$="STAAP IXT.":SYSAM

What's the difference? Not much! The IH and IX phonemes sound almost the same -- the IX one is slightly shorter in duration. You should try both to decide which sounds best...

If an error occurs (you hear 'Beep Beep' instead of speech), you can call $EE15 ('DspErr'). This will write the contents of 'samStr' to the active output device (usually the screen) with the "undefined phoneme" shown in reverse video. Like the original (C64) version, there is no CPU flag set to indicate an error upon return.... I guess I should fix this for my final release?

Unless you are running code in BANK 1, you will need to use JSR_FAR to call these routines, and IND_STORE to write your string data.

Note: even if you are not using BASIC at all, the MMU pre-configuration registers B (and possibly D) must be set to the same values used by BASIC ROM ($7F and $41, respectively). The 'D' pre-configuration register is only used for finding the BASIC string SA$ and for printing the error text. In both these cases, the BASIC ROM must also be installed in the C128. For the second case, the KERNAL ROM must also be installed.
.1300 LDA #0      ;use BANK 15 (for KERNAL routine $FF77)
.1302 STA $FF00
.1305 LDA #$00    ;point to SAM buffer ($D800)
.1307 LDX #$D8
.1309 STA $FE     ;pointer in $FE,$FF
.130B STX $FF
.130D LDA #$FE    ;where our pointer is
.130F STA $02B9   ;set for IND_STORE
.1312 LDY #$16    ;length of string (followed by $9B terminator)
;this loop copies string to SAM buffer
.1314 LDA $1330,Y ;read from current bank
.1317 LDX #1      ;select BANK 1
.1319 JSR $FF77   ;write to another bank
.131C DEY         ;count chars
.131D BPL $1314   ;loop until all copied
;now call SayIt at $EE09
.131F LDX #1      ;run code in BANK 1
.1321 STX $02
.1323 LDA #$EE    ;address $EE09 (high byte in A, low byte in X)
.1325 LDX #$09
.1327 STA $03     ;set for JSR_FAR (high byte)
.1329 STX $04     ;set for JSR_FAR (low byte)
.132B JSR $02CD   ;call JSR_FAR
;your code continues...
.132E NOP
.132F RTS
;the text "AESEHMBLIY LAENXGWIHJ." + $9B
>01330 41 45 53 45 48 4D 42 4C 49 59 20 4C 41 45 4E 58
>01340 47 57 49 48 4A 2E 9B
;test the example
J 1300
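If you prepare your phonetic text on a modern machine, the data lines for a listing like the one above can be generated rather than typed by hand. The following is only a host-side Python sketch and is not part of SAM 128: the function names are made up for illustration, and the 256-byte limit is an assumption based on the $D800~$D8FF buffer of the Epsilon release.

```python
SAM_TERMINATOR = 0x9B    # 'shifted ESCape' end-of-string marker
SAM_BUFFER_SIZE = 256    # assumed: samStr spans $D800-$D8FF (Epsilon release)


def sam_bytes(phonetic):
    """Encode phonetic text as bytes and append the $9B terminator."""
    data = phonetic.encode("ascii") + bytes([SAM_TERMINATOR])
    if len(data) > SAM_BUFFER_SIZE:
        raise ValueError("text plus terminator does not fit the buffer")
    return data


def monitor_lines(data, start):
    """Format bytes as monitor memory lines, 16 bytes per line."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_bytes = " ".join("{:02X}".format(b) for b in chunk)
        lines.append(">{:05X} {}".format(start + offset, hex_bytes))
    return lines


data = sam_bytes("AESEHMBLIY LAENXGWIHJ.")
for line in monitor_lines(data, 0x1330):
    print(line)
# The copy loop starts Y at the index of the last byte (the terminator).
print("LDY #${:02X}".format(len(data) - 1))
```

For the example text this prints the same two data lines as the listing, plus the matching LDY value. Keep in mind that the copy loop in the listing counts down with BPL, so as written it can only handle strings of up to 128 bytes (including the terminator).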
You may use 'pure' SAM (without Reciter) to gain enhanced pronunciation (for example, speak with a regional dialect), but then you must use SAM's obscure phonetic language (different from English, or any other language!). Many (most?) people prefer to use Reciter, because learning/using SAM's phonetic language is a moderate pain... but you cannot implement dialects with Reciter! Also, Reciter's dictionary misses some common words, which will thus be spoken incorrectly. Sometimes you can use "creative spelling" to force Reciter to say the word correctly, but other times your only choice is to use SAM's phonetic language instead. In summary, Reciter is easy to use, but always speaks with a Western American accent and will mispronounce some words. You must use 'pure' SAM if you want to speak another language or speak English with a different (not Western American) accent, like:
(The details needed to actually implement any of those non-standard dialects are beyond the scope of this web page... but you should find the needed info here or in the linked C64 documentation if you want to try.)

SAM requires its "phoneme-text" to be in upper-case ASCII (same as un-shifted PETSCII). Any lower-case text (or any character with value > 127 or value < 32) will cause SAM to generate a "Beep Beep" error sound and return without saying anything. The only exception is the end-of-string marker (character $9B [155]). SAM ignores this and all following characters.

Most phonemes understood by SAM consist of two characters. (However, most consonants require only one character.) There are several possible ways to list them. First I will give an "alphabetic" list which should be helpful to anyone, but is aimed at the beginner. (If you like this format, there is an extended listing here.)
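The character rules above (upper-case ASCII only, nothing below 32 or above 127, and $9B ending the string) can be checked before the text ever reaches the C128. Here is a rough host-side Python sketch; the function name is hypothetical, and it only checks the character range. It does not know SAM's phoneme table, so text that passes can still produce "Beep Beep".

```python
SAM_TERMINATOR = 0x9B  # end-of-string marker; SAM ignores it and what follows


def first_bad_char(text):
    """Return the index of the first byte outside SAM's accepted range.

    Returns None if every byte up to the terminator (or the end of the
    data) is in range. Phoneme validity is NOT checked here.
    """
    for index, value in enumerate(text):
        if value == SAM_TERMINATOR:
            return None  # everything after the marker is ignored
        is_lower = ord("a") <= value <= ord("z")
        if value < 32 or value > 127 or is_lower:
            return index
    return None


print(first_bad_char(b"/HEHLOW WERLD."))   # no out-of-range characters
print(first_bad_char(b"/HEHLOW Werld."))   # index of the lower-case 'e'
```

The second call reports index 9, the first lower-case letter in the string.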
Optimistic Note: don't be scared by the size of the following tables! For normal use, all you need to worry about is the Input Text (what you type) and the Example Pronunciation (what it will sound like). Everything else is just extra info some people may be curious about (or need for advanced programming).
The above tables list "common" SAM phonemes with the internal SAM code byte(s), in hexadecimal. The internal "SAM code" is mainly for hackers and nerds (like myself)... but it also shows (for everyone) how some phonemes are actually expanded into multiple "phones". (I ain't qualified to explain the difference between phonemes and phones, so see the linked Wikipedia articles if you are curious!)

Note about phoneme "expansion" (for geeks): the affricates (CH and J) and the plosives (B,D,G,P,T,K) will be expanded into (respectively, 2 or 3) "sub-phonemes" (I believe the technical term is "phone", but I ain't no linguistic expert). There is no way for you to reference/generate the added phones manually. On the other hand, the diphthongs (dual-sound vowels) also expand into two phones. The second (added) phone is one of the "rare phonemes" (listed below), so you could manually generate those phonemes (err, phones) if you want.

The final table of "common phonemes" (the 3 "shortcuts") are really just alpha pairs that get re-mapped into a pair of common phonemes -- these are strictly unnecessary! (As a lame "proof", there is no "phoneme data" associated with these three "alpha pairs" in SAM.) I prefer to imagine they were added as a form of compression for Reciter (but the world may never know)...

I hope we all agree: the most important column is the "Example Pronunciation"... it tells you what the output will sound like! The examples are based on the (Western) American dialect! There are a lot of things I could say here... but a major issue (my opinion) is that SAM, like most English speakers (not just Americans), makes (virtually) no distinction between "W" and "WH". In other words, SAM suffers from the "wine/whine merger". As a default, I think this is acceptable, but (so far) I have not found any way to render a "real" WH phoneme (nothing is obviously distinct from the common W phoneme). The combination "/HW" is the best I can do, but it just sounds wrong!
Also, there is no way to generate a "rolling R" common in French and Spanish.
Each phoneme lists two time periods:
You can ignore these unless for some reason you want to calculate a phoneme's duration (or you are just curious). These values determine how long it takes SAM to say a phoneme (see Poker / SPEED and the section below about Stress). Because some phonemes consist of two or three parts, a total (sum) is shown for them (along with the list of parts). Any period value showing * is using the special "pure noise" algorithm, and the normal rules of SPEED and phoneme period do not apply (these always show a period of 2). Any period value showing ** is using the special "75% then noise" algorithm. The normal rules apply to 75% of the listed value, but then a "special noise" algorithm begins (and like the "pure noise", this part ignores the SPEED setting). See Poker / NOISES for more info about these two types of phonemes.

Each phoneme lists two frequencies (these names were used in the original C64 documentation):
Each phoneme also has an undocumented Formant 3 (I don't know if it represents Nasal, Lips, Tongue, or something else!). If you're wondering, "What is a formant?", the short answer is an important (harmonic) waveform of a phoneme/phone. SAM usually generates phonemes with 3 formants plus a fundamental frequency (base wavelength). Formants 1 and 2 have a sinusoidal waveform while Formant 3 has a square waveform. I included the first 2 frequencies in the tables because you can manipulate both frequencies with KNOBS and you need to know the first frequency ("Mouth") in order to calculate the "final" base wavelength (see Poker / PITCH). Interesting note: the 3 formant frequencies are not affected by either PITCH or Stress!

Assuming you use the recommended TIMEBASE settings, the frequency values shown in the tables are multiples of 25.5 Hz. Taking the phoneme "Y" (as in you) as an example, the "mouth" frequency (formant 1) will be 25.5 * 8 = 204 Hz, and the "throat" frequency (formant 2) will be 25.5 * 82 = 2091 Hz (about 2.1 kHz). And in case you are wondering, PITCH and Stress affect the "preliminary" base wavelength -- and thus the "final" base wavelength.

When a phoneme consists of multiple (2 or 3) phones, the first value listed for frequency is the harmonic mean (indicated by ~= which means "approximate average"). Below that are the (exact) frequencies for the individual phones.

Some "sub-phonemes" (i.e., some phones) partially or completely ignore the ("Mouth and Throat") formant frequencies (and also PITCH and SPEED). Values indicated with ** work like normal for the first 75% of the "final" base wavelength. In these cases, the frequency values are still relevant and included in the average frequency listed (although I suspect the calculation is inaccurate in these cases). Values indicated with * completely ignore formant frequencies. In these cases, they are not included in the average (shown for phonemes composed of multiple phones).
A handful of phonemes (the "rushing" consonants) consist only of one phone of this type. For them, an "x0" is shown (think "times zero") by the frequency value to remind you that the listed frequency is completely ignored. (Technical note: the frequencies are completely ignored for rendering of that phone/phoneme, but due to blending, the value may affect adjacent phonemes.)

Sorry if that hurts your head from information overload! But just know that I spared you from the frequency of Formant 3, and the power of all three formants. (The main reason that info is absent is because SAM allows no way to modify them! And none of that info is vital to "nerdy" calculations of pitch and speed!)

The "Classification" lists some of the most important (my opinion) classes of phonemes. In reality, SAM associates each phoneme with 15 different classes! I hope you agree, this would result in "information overload" (and add little conceptual value) if all were shown here. You do not (strictly) need to know to which class(es) a phoneme is assigned, but I think the info is (potentially) helpful. The class called "Vowel_X" includes both "real" vowels and modified approximants (LX, RX, WX, and YX). The class called "Fricative" here (and in my source code) includes real fricatives (including sibilants) and affricates, but not the pseudo-fricative "H" sound (phonemes /H and /X).
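The frequency arithmetic described earlier (table value times 25.5 Hz, assuming the recommended TIMEBASE settings) is easy to check with a few lines of Python. This is only a convenience sketch; the function name is made up for illustration, and the two values are the "Y" phoneme figures quoted in the text.

```python
HZ_PER_STEP = 25.5  # assumes the recommended TIMEBASE settings


def formant_hz(table_value):
    """Convert a frequency value from the phoneme tables to hertz."""
    return HZ_PER_STEP * table_value


# Phoneme "Y" (as in "you"): mouth = 8, throat = 82
mouth = formant_hz(8)
throat = formant_hz(82)
print(mouth, throat)  # 204.0 2091.0
```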
Half of the "rare" phonemes are modified approximants (RX, LX, WX, YX). These are created by SAM when a normal approximant (R,L,W,Y) follows a vowel. Notice they are classified as a vowel by SAM (most of the time) and thus belong to the class I call "Vowel_X". Note SAM also has a class for real/normal vowels (which doesn't include these modified approximants), but that (real vowel) classification is rarely considered by SAM's code.

Two of the "rare" phonemes are modified plosives (GX and KX). These are created when the normal plosives (G and K) are followed by a consonant.

The phoneme DX sounds half-way between T and D... some call it "a quick flap of the tongue". This replaces either T or D when preceded by a vowel_x and also followed by either a stressed vowel_x or a space and any vowel_x. (Where vowel_x is a real vowel or a modified approximant.)

The Q phoneme represents a glottal stop -- a halting of air flow. The example pronunciation ("kitten") highlights the pair of t's. In American English (most dialects), the T sound is not actually made (if you are shocked, so was I). The "t-sound" is actually an abrupt blockage (in the throat?) of the preceding "i-sound". Wiktionary lists "kitten" with both American (glottal stop) and Imperial (actual t) pronunciations and has audio clips of both. Contrary to what you might expect (based on the original documented example of "kitten"), Reciter will never generate any Q's! SAM will insert a Q in two cases: (1) between two stressed vowel_x phonemes which are separated by a space, and (2) if you write a really long string of text without any non-space punctuation. In the second case (described in the C64 documentation as SAM "running out of breath"), SAM will change the last space (before "he runs out of breath") into a glottal stop (Q), and pause before he continues speaking.

The * phoneme is undocumented; I would call it a bug, but some may call it a "feature". It sounds somewhere between SH and CH.
I would call it a short-SH (based mostly on sound, but also the computer code). Surrounding phonemes can affect how it is perceived -- sometimes it sounds like CH. I think it works great in the word "shtick" because if you use the official "SH" phoneme, the SH-sound has too long a duration in my opinion. To hear the ambiguity between SH and CH, have SAM say "potato chip/ship wreck" with code like this:

    SA$="PEHTEYTOW *IHP REHK":SYSAM
Note: for stress digits, the concept of pitch is the same as the PITCH setting (greater values = lower pitch), but the concept of stress rate is the opposite of the SPEED setting (SPEED actually controls duration, the inverse of rate). Sorry if this is confusing (I didn't design SAM), but it is easy to use with just a little practice! Following is a table which will (hopefully) clarify things:
Note that digit 0 is not allowed; stress level 0 is simply what you get when no stress digit follows a phoneme. Digit 9 is not allowed either; however, internal SAM transformations may produce stress level 9. Note that stress-level 0 (no stress digit) and stress-level 6 both use the default PITCH (normal wavelength). The difference (not clearly explained in the C64 documentation) is that any stress digit (even the "neutral" 6) will cause the phoneme to have a longer duration.

So now you may be confused! Well, let me give you 3 realistic examples. To begin, I hope/expect you know that a simple English sentence can have multiple (subtle) meanings depending on how it is spoken (which words are stressed). Below is a table of values you can try with SAM (or just imagine in your mind)...
Hopefully you will notice/appreciate the expressive power of SAM. If you use Reciter (instead of SAM), then the phrase "I AM SAM." will always be spoken with the stress/emphasis on the word "I"... if this is not what you intend, you should consider the more complex SAM phonetic language. SAM only understands a few punctuation marks. These affect the way "text input" is spoken:
Any non-space punctuation mark will also extend the duration of the preceding syllable by 50% (excluding the phonemes S,SH,F and TH). Here "syllable" means the first prior vowel_x and everything after (until the punctuation mark). Note that SAM will always insert a short pause when he sees a dash (-). In contrast, Reciter will only insert a short pause when the dash is surrounded by spaces (or other SAM punctuation marks). Otherwise, Reciter will treat a dash like a space (i.e., ignore it)!
Find out more about SAM+Reciter (C64) on Wikipedia!
SAM (64) © Don't Ask, Inc., 1982
SAM 128 © H2Obsession, 2015
Webpage © H2Obsession, 2015, 2016, 2018