First, it is clear that having better data in the database will lead to better synthesis. In the simplest sense this means more speech data from the speaker. It is not coincidental that unit selection has become more prominent as the cost of storage has fallen.  correctly identifies this and recommends very large databases to improve coverage with respect to the data to be synthesized. With more data it is more likely that the database will contain a unit that is closer to the target, and also more likely to contain a better join.
But the problem with simply increasing the size of the database is that one never gets enough data. There will always be holes in the coverage, because of the phenomenon of frequently occurring rare events in language . For example, if we were to collect all triphone contexts, with and without stressed vowels, and with consonants in onset and coda position (as singletons and clusters), and then wished to cover these for even a few phrasal conditions, we would very quickly find that the database requirements become too large even for the cheap storage now available. Although referring to databases for prosodic coverage,  describes exactly this problem in designing databases that cover predefined phenomena. But even if such a database could be designed, the more limiting factor is the difficulty of having a speaker correctly deliver such coverage.
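The combinatorial explosion described above can be made concrete with a back-of-the-envelope calculation. All figures below are illustrative assumptions, not values from the text: roughly 40 phones (hence 40³ triphone contexts), a binary stress distinction, a handful of syllable-position conditions, and a few phrasal conditions.

```python
# Hypothetical coverage estimate for a unit selection database.
# Every count here is an assumption chosen only to illustrate scale.
phones = 40                   # approximate phone inventory for English
triphones = phones ** 3       # distinct triphone contexts
stress = 2                    # stressed vs. unstressed vowel
syllable_positions = 4        # e.g. onset/coda, singleton/cluster
phrasal_conditions = 5        # e.g. phrase-initial, -medial, -final, ...

required_units = triphones * stress * syllable_positions * phrasal_conditions
print(required_units)         # → 2560000 distinct unit types

# Even one token of each, at an assumed 0.2 s per unit, implies:
hours_of_speech = required_units * 0.2 / 3600
print(round(hours_of_speech, 1))  # → 142.2 hours of speech
```

Even with these modest (and deliberately small) condition counts, full coverage would require over a hundred hours of carefully controlled recordings, which makes plain why holes in the data are unavoidable.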