As I see it, there are three main ways to build on this work (or its spirit).
First, there's the classical route: finding other scenarios where one believes the labels come from a combination of Markov chains, building better topologies for such problems, and improving on our low accuracy rate.
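To make the "labels from a combination of Markov chains" setting concrete, here is a toy sketch (not code from this work -- the chains, mixing weight, and helper name are all hypothetical) that generates binary labels by first picking one of two chains according to a mixing probability and then reading off the state after a short walk:

```python
import random

def sample_chain_label(transitions, start, steps, rng):
    """Walk a Markov chain given as a dict of transition dicts; return the final state."""
    state = start
    for _ in range(steps):
        next_states, probs = zip(*transitions[state].items())
        state = rng.choices(next_states, weights=probs, k=1)[0]
    return state

# Hypothetical toy mixture of two 2-state chains over states {0, 1}.
chain_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # "sticky" chain
chain_b = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}  # fair-coin chain

rng = random.Random(0)
mix = 0.7  # probability that a given label is generated by chain_a
labels = []
for _ in range(1000):
    chain = chain_a if rng.random() < mix else chain_b
    labels.append(sample_chain_label(chain, start=0, steps=5, rng=rng))
```

The learning problem is then to recover the mixing weight and the chains' transition structure from the observed labels alone.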
The second route arises from a practical perspective: many scenarios in ML aim at learning and predicting users' behavior. I believe that the ML community has much to gain from looking at the existing body of work in behavioral psychology and behavioral economics. This work undermines a prevailing assumption in ranking -- that there exists one true total ranking over all items that induces the ranking over subsets of items. I believe that many other assumptions, equally common, can (and should) be revisited.
Third, from a theoretical perspective, I find the problem of learning with non-numeric features to be very compelling (though admittedly, not very practical). In particular, suppose that every day we're given k graphs and a label corresponding to whether a (known) global property holds or not for some (unknown) combination of the graphs. (E.g., all graphs are weighted and the label indicates whether the graph resulting from a certain linear combination of the input graphs has an s-t cut greater than K.) Can we PAC-learn the weights of such a combination?
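A minimal sketch of the data-generating process in the s-t cut example may help fix ideas. Everything below is an illustrative assumption, not part of the original problem statement: the graphs are dense random capacity matrices, the hidden weight vector is `w_star`, negative combined capacities are clipped to zero so the cut is well defined, and the minimum s-t cut is computed as a max flow via Edmonds-Karp:

```python
import collections
import random

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dense capacity matrix.
    By max-flow/min-cut duality, the returned value is the minimum s-t cut."""
    n = len(cap)
    flow = [[0.0] * n for _ in range(n)]
    total = 0.0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = collections.deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 1e-9:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return total
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

def label(graphs, w, s, t, K):
    """Combine k weighted graphs with weights w; label is 1 iff the min s-t cut exceeds K."""
    n = len(graphs[0])
    combined = [[sum(wi * g[u][v] for wi, g in zip(w, graphs)) for v in range(n)]
                for u in range(n)]
    # Clip negative combined capacities to zero so the cut value is well defined.
    combined = [[max(0.0, c) for c in row] for row in combined]
    return 1 if max_flow(combined, s, t) > K else 0

# Hypothetical instance: k = 2 random weighted graphs on 4 nodes, hidden weights w_star.
rng = random.Random(1)
k, n = 2, 4
graphs = [[[rng.random() for _ in range(n)] for _ in range(n)] for _ in range(k)]
w_star = [0.6, 0.4]
y = label(graphs, w_star, s=0, t=n - 1, K=1.0)
```

A learner sees only the daily graphs and labels `y`; the PAC question is whether `w_star` (or any weight vector consistent with the labels) can be recovered from polynomially many such examples.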