Because diphones run from the middle of one phone to the middle of the next, we need to know exactly where that ``middle'' is. For supported languages, the diphone boundaries are already known from existing diphone databases, so when we synthesize the prompts, the accompanying labels include both the phone boundaries and the diphone boundaries. Although the point midway between the phone boundaries may be the most appropriate join point for vowels, it almost certainly is not for stops, where the closure part of the phone is a far better place to join. The diphone boundaries (marked as ``DB'') are also often the labels that most need correction.
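As a rough illustration of how such labels might be read, here is a minimal Python sketch. It is not Festival or festvox code. It assumes, purely for illustration, that a label file has already been parsed into time-ordered (time, label) pairs, that each phone label is placed at the end of its phone, and that a ``DB'' label sits at the diphone boundary inside the phone whose own label follows it. The names Phone and phones_with_boundaries are hypothetical, and the fallback to the midpoint when no ``DB'' label is present is likewise an assumption, reasonable for vowels but, as noted above, a poor choice for stops.

```python
from typing import List, NamedTuple, Optional, Tuple


class Phone(NamedTuple):
    """One labelled phone (hypothetical structure, times in seconds)."""
    name: str
    start: float
    end: float
    db: float  # diphone boundary inside the phone


def phones_with_boundaries(labels: List[Tuple[float, str]]) -> List[Phone]:
    """Pair each phone with its diphone boundary.

    Assumed layout: `labels` is time-ordered, a phone label marks the end
    of that phone, and a "DB" label marks the diphone boundary of the
    phone whose own label comes next.
    """
    phones: List[Phone] = []
    prev_end = 0.0
    pending_db: Optional[float] = None
    for time, label in labels:
        if label == "DB":
            pending_db = time
            continue
        if pending_db is None:
            # Assumption: with no DB label, fall back to the phone midpoint.
            db = (prev_end + time) / 2.0
        else:
            db = pending_db
        phones.append(Phone(label, prev_end, time, db))
        prev_end = time
        pending_db = None
    return phones


# Example: the stop "t" has an explicit DB label, the others do not.
example_labels = [(0.10, "pau"), (0.15, "DB"), (0.20, "t"), (0.40, "aa")]
example_phones = phones_with_boundaries(example_labels)
```

In this example the ``t'' takes its boundary from the explicit ``DB'' label, while ``pau'' and ``aa'' fall back to their midpoints; a fallback boundary inside a stop would be a natural candidate for correction.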
From the labels, we automatically build a diphone index, which Festival uses to synthesize waveforms. Two basic access methods are offered: so-called ``separate-mode,'' where the diphones are extracted from the individual LPC and residual files on demand, and ``group-mode,'' where just the diphone parts are collected into a single large file. The first is used in the initial debugging stage. The second is used for distributing complete voices, as it is both more compact and quicker to access.
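The idea of a diphone index can be sketched as follows. This is a simplified Python illustration, not the actual index builder or Festival's index file format. It assumes each phone is given as a (name, end, db) tuple, where db is the diphone boundary inside the phone, and it records for each diphone the file it comes from plus three times: the start (the left phone's boundary), the phone boundary between the two phones, and the end (the right phone's boundary). The function name build_diphone_index and the file identifier ``utt_0001'' are hypothetical. As a further simplification, every adjacent pair of phones is indexed, whereas a real diphone database would typically take only the intended diphone from each prompt.

```python
from typing import Dict, List, Tuple

# (name, end time, diphone boundary), times in seconds (assumed layout)
PhoneLabel = Tuple[str, float, float]
# (file id, diphone start, phone boundary, diphone end)
IndexEntry = Tuple[str, float, float, float]


def build_diphone_index(file_id: str,
                        phones: List[PhoneLabel]) -> Dict[str, IndexEntry]:
    """Map each "left-right" diphone name to where it lies in `file_id`."""
    index: Dict[str, IndexEntry] = {}
    for left, right in zip(phones, phones[1:]):
        name = f"{left[0]}-{right[0]}"
        entry = (file_id, left[2], left[1], right[2])
        # Keep the first occurrence if a diphone appears more than once.
        index.setdefault(name, entry)
    return index


# Example with a hypothetical file identifier.
example_phones = [("pau", 0.10, 0.05), ("t", 0.20, 0.15), ("aa", 0.40, 0.30)]
example_index = build_diphone_index("utt_0001", example_phones)
# example_index["t-aa"] is ("utt_0001", 0.15, 0.20, 0.30)
```

In separate-mode terms, a synthesizer would use such an entry to open the named file and cut out the span between the start and end times on demand; in group-mode, those spans would instead be extracted once and stored together.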