The importance and realization of lexical stress vary between languages, but producing a reasonable pronunciation from a string of letters often requires more than a string of phones: lexical stress markings are also needed. In English, lexical stress may differ depending on syntactic class, and it may even move under some morphological derivations. Therefore, predicting lexical stress for each vowel in the predicted string cannot in general be done from the letter context alone. However, results in  suggest that combining phone and stress prediction in a single model gives better results.
We tested this on the OALD data set. We first built letter-to-phone models in which lexical stress information was removed from the phones, and we trained a separate stress prediction model using the same test set, with features such as syllable position in the word, vowel length, vowel height, number of syllables from the end of the word, and part of speech. On held-out data from the OALD, the per-syllable results are
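As an illustration of the kind of per-syllable features listed above, the following is a minimal sketch and not the implementation used in these experiments. The vowel table, the phone symbols, the feature names and the function `syllable_features` are all hypothetical, and each vowel is assumed to be the nucleus of exactly one syllable.

```python
# Hypothetical vowel table: phone -> (length, height).
# This is NOT the OALD phone set; it only serves the example.
VOWELS = {
    "ii": ("long", "high"),
    "i": ("short", "high"),
    "e": ("short", "mid"),
    "o": ("short", "mid"),
    "@": ("short", "mid"),
    "a": ("short", "low"),
    "aa": ("long", "low"),
}


def syllable_features(phones, pos):
    """Return one feature dict per vowel, treating each vowel as a syllable nucleus."""
    nuclei = [p for p in phones if p in VOWELS]
    total = len(nuclei)
    features = []
    for index, vowel in enumerate(nuclei):
        length, height = VOWELS[vowel]
        features.append({
            "vowel": vowel,
            "syl_from_start": index,            # syllable position in word
            "syl_from_end": total - 1 - index,  # syllables from end of word
            "vowel_length": length,
            "vowel_height": height,
            "pos": pos,                         # part of speech of the word
        })
    return features


# Illustrative phone string for a four-syllable noun (symbols are assumptions).
feats = syllable_features(["f", "@", "t", "o", "g", "r", "@", "f", "ii"], "n")
for f in feats:
    print(f)
```

Each feature dict would then be paired with a stressed/unstressed label and passed to whatever classifier is used for the separate stress model.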
The second model introduced two versions of each vowel phone, stressed and unstressed. The standard LTS model-building technique was then applied, so that the CART trees themselves produced phone and stress information directly (LTPS).
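The doubling of the vowel inventory can be sketched as a simple relabelling step applied to the lexicon before training. The code below is only an illustration under stated assumptions: the `1` suffix for stressed vowels, the small vowel set, and the function names `merge_stress` and `split_stress` are hypothetical and are not taken from the OALD or from the system described here.

```python
# Hypothetical vowel inventory; real phone sets are larger.
VOWELS = {"a", "e", "i", "o", "u", "@", "ii", "aa"}


def merge_stress(phones, stresses):
    """Fold a per-vowel stress sequence (0 or 1) into the phone labels.

    Stressed vowels get a "1" suffix, so the predictor sees a single
    label per phone that carries both identity and stress.
    """
    remaining = iter(stresses)
    merged = []
    for phone in phones:
        if phone in VOWELS:
            stressed = next(remaining)
            merged.append((phone + "1") if stressed else phone)
        else:
            merged.append(phone)
    return merged


def split_stress(labels):
    """Inverse of merge_stress: recover plain phones and per-vowel stress."""
    phones, stresses = [], []
    for label in labels:
        if label.endswith("1"):
            phones.append(label[:-1])
            stresses.append(1)
        else:
            phones.append(label)
            if label in VOWELS:
                stresses.append(0)
    return phones, stresses


plain = ["f", "@", "t", "o", "g", "r", "@", "f", "ii"]
merged = merge_stress(plain, [0, 1, 0, 0])
print(merged)
```

Because the mapping is reversible, the same lexicon can be used to train both the stress-free and the combined models.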
Thus, although higher per-word values are possible when stress is ignored, a separate stress model applied afterwards gives significantly lower results than predicting phones and stress levels with a single model.
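The difference between scoring words with and without stress can be made concrete with a small sketch. It assumes, hypothetically, that stressed vowels are written with a `1` suffix; the function names and the toy data are illustrative only and are not results from these experiments.

```python
def strip_stress(labels):
    """Remove the (assumed) "1" stress suffix from every label."""
    return [label.rstrip("1") for label in labels]


def word_accuracy(predicted, gold, ignore_stress=False):
    """Fraction of words whose whole predicted label string matches the gold one."""
    correct = 0
    for pred, ref in zip(predicted, gold):
        if ignore_stress:
            pred, ref = strip_stress(pred), strip_stress(ref)
        if pred == ref:
            correct += 1
    return correct / len(gold)


# Toy data: the first word has the right phones but stress on the wrong vowel.
gold = [["f", "o1", "t", "@"], ["k", "a1", "t"]]
pred = [["f", "o", "t", "@1"], ["k", "a1", "t"]]

print(word_accuracy(pred, gold))                      # stress must match
print(word_accuracy(pred, gold, ignore_stress=True))  # phones only
```

A word counts as correct only if every label matches, so a single misplaced stress is enough to fail a word whose phones are otherwise all right.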
We also found that including part-of-speech information in the phone prediction models themselves improved accuracy. Without POS information, the combined model gives 95.32% letters correct and 71.28% words correct. Part of speech therefore helps, and it is readily available in a TTS system with a standard POS tagger, even for unknown words.
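One way to expose part of speech to a per-letter predictor is to add it as an extra feature alongside the letter context. The sketch below is an assumption-laden illustration: the window width of three letters on each side, the `#` padding symbol, the feature names and the function `letter_features` are all hypothetical.

```python
def letter_features(word, pos, width=3):
    """Build one feature dict per letter: a letter window plus the word's POS."""
    padded = "#" * width + word + "#" * width  # "#" marks positions outside the word
    rows = []
    for i in range(len(word)):
        centre = i + width
        row = {"pos": pos}
        for offset in range(-width, width + 1):
            # Keys look like "l-3", ..., "l+0", ..., "l+3".
            row[f"l{offset:+d}"] = padded[centre + offset]
        rows.append(row)
    return rows


# "record" is stressed differently as a noun and as a verb, so the same
# letter windows are paired with different POS values.
noun_rows = letter_features("record", "n")
verb_rows = letter_features("record", "v")
print(noun_rows[0])
print(verb_rows[0])
```

With the POS feature present, a tree can split on it when the letter context alone does not separate the noun and verb pronunciations.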
Ultimately, stress cannot be predicted from local context alone, as there are a number of examples in English where local context is insufficient (cf. photograph/photography). Ideally, morphological decomposition is required for such prediction, but we have not yet investigated this area.