Q1: Where is the "detailed derivation for this subsection" at the end of the paper?
A1: Unfortunately, we did not publish "an extended version of this work". However, you can find the derivation in my thesis (http://www.cs.cmu.edu/~fanguo/dissertation/fanguo-thesis.pdf), starting from Section A.4 on printed page 93, which is page 115 of the PDF.
--
Q2: On the 7th page, Section 5.1, the second line, you mentioned that "Only query sessions with at least one click are kept for performance evaluation". As a consequence, "The value of \alpha_1 equals 1" (8th page, 11th line). However, on the 4th page, in Figure 4, case 4 (i > l), the term (1 - \alpha_1), which will be zero, appears as a divisor in the formulae for computing \beta_m. How, then, can we compute these coefficients? If we set all the coefficients for case 4 to zero, then all the URLs shown after the last click position will contribute nothing to the model.
A2: You raised a very good point, and thanks a lot for bringing it up. We set \alpha_1 = 0.99 rather than 1 in our experiment, so the divisor (1 - \alpha_1) is 0.01 and not zero. The parameter values we learned from the experiment are listed under Appendix B on the attached page. For a more general click log that also contains sessions without clicks, \alpha_1 will be estimated from data.
--
Q3: In my implementation, the log-likelihood of CCM is much worse than that of the Dependent Click Model. What could be wrong?
A3: Try restricting the data to query sessions with at least one click, which should make debugging easier. The other thing to note is that for documents which appear very few times in the training data but do appear in the test data, the document relevance can be replaced by the positional relevance in each model. This should also improve the LL results.
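A minimal sketch of that fallback, under assumptions that are mine and not from the paper: `min_count` is a hypothetical threshold for "very few times", `doc_counts` holds training-set impression counts, and `positional_relevance` is taken to be one relevance value per rank (for example, the average estimated at that position in the training data).

```python
def relevance_for_scoring(doc_id, position, doc_relevance, doc_counts,
                          positional_relevance, min_count=5):
    """Pick the relevance value used when scoring a test impression.

    All names and the default threshold are illustrative assumptions.
    """
    seen = doc_counts.get(doc_id, 0)
    if seen >= min_count and doc_id in doc_relevance:
        # Enough training evidence: trust the per-document estimate.
        return doc_relevance[doc_id]
    # Rare or unseen document: fall back to the per-position value.
    return positional_relevance[position]


# Toy usage with made-up numbers.
doc_relevance = {"url_a": 0.62, "url_b": 0.10}
doc_counts = {"url_a": 40, "url_b": 2}
positional_relevance = [0.45, 0.30, 0.20]

frequent = relevance_for_scoring("url_a", 1, doc_relevance, doc_counts, positional_relevance)
rare = relevance_for_scoring("url_b", 1, doc_relevance, doc_counts, positional_relevance)
unseen = relevance_for_scoring("url_c", 2, doc_relevance, doc_counts, positional_relevance)
```

Applying the same rule to every model being compared keeps the LL comparison fair.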
--
Q4: In UBM, we need an additional value: the distance to the previous click. During prediction, how can we get this value? Perhaps we first assume that no click has been made, and as soon as the user clicks on a URL, the click probabilities of all the documents below are updated? If so, how can we calculate all the click probabilities on a single page without any prior knowledge of the user's clicks? Perhaps we need a 2^10 enumeration?
A4: Short answer: using conditional probabilities.
Long answer: the derivation in the following paper may help: http://research.microsoft.com/pubs/80592/KDD09BBM.pdf
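To illustrate the conditional-probability idea, here is a rough sketch (not code from either paper). It assumes the usual UBM form, where the click probability at a rank, given where the previous click was, is the document's attractiveness times an examination parameter indexed by rank and distance. The names `attract` and `exam` and the list layout are my own choices for illustration. The recursion tracks the distribution of the previous click position, so the marginals cost O(n^2) instead of a 2^n enumeration; a brute-force version is included only as a check.

```python
from itertools import product


def ubm_click_marginals(attract, exam):
    """Marginal click probability at each rank (0-based).

    attract[i]  : attractiveness of the document at rank i.
    exam[i][k]  : examination probability at rank i when the previous
                  click is k + 1 ranks above; k == i means "no previous click".
    """
    n = len(attract)
    # last[0] = P(no click so far); last[j] = P(most recent click at rank j - 1).
    last = [1.0] + [0.0] * n
    marginals = []
    for i in range(n):
        # Click probability at rank i conditioned on each previous-click state.
        cond = [attract[i] * exam[i][i - j] for j in range(i + 1)]
        p_click = sum(last[j] * cond[j] for j in range(i + 1))
        marginals.append(p_click)
        # No click at rank i keeps state j; a click moves the state to i + 1.
        for j in range(i + 1):
            last[j] *= 1.0 - cond[j]
        last[i + 1] = p_click
    return marginals


def brute_force_marginals(attract, exam):
    """Same quantity by enumerating all 2^n click vectors (check only)."""
    n = len(attract)
    marginals = [0.0] * n
    for clicks in product([0, 1], repeat=n):
        prob, prev = 1.0, -1  # prev = rank of the previous click, -1 if none
        for i, c in enumerate(clicks):
            p = attract[i] * exam[i][i - prev - 1]
            prob *= p if c else 1.0 - p
            if c:
                prev = i
        for i, c in enumerate(clicks):
            if c:
                marginals[i] += prob
    return marginals


# Toy parameters, made up for the example.
attract = [0.5, 0.4, 0.3]
exam = [[0.9], [0.8, 0.6], [0.7, 0.5, 0.4]]
fast = ubm_click_marginals(attract, exam)
slow = brute_force_marginals(attract, exam)
```

On these toy numbers the two functions agree, which is the point: conditioning on the previous click position gives the same marginals as the full enumeration.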