Dear Mr. Dave Andersen,

We are sorry to inform you that your paper, Loss-Optimized Routing in Overlay Networks, has not been accepted for presentation at Infocom 2003. The number of submissions this year was the highest yet (around 1078), with only 224 papers being accepted. As a result, several good-quality papers may not have been accepted.

The referee reports on your paper are attached below. They are also accessible at the URL http://edas.cs.columbia.edu/PaperShow.cgi?m=1439. We hope the comments will be useful and allow you to improve the paper in view of a future submission to another conference or journal.

It may be of interest to you to know how the paper selection process was conducted. The papers were distributed to TPC members for review shortly after the submission deadline. The objective was to obtain at least three reports per paper, and this was achieved for more than 98% of the papers. Based on the referees' comments and ratings, the TPC chairs made a preliminary cut, accepting 80 top papers and rejecting 658. The remaining 340, having mixed reports, were resubmitted to the TPC for final decision. Specialist groups of TPC members were constituted, each responsible for the decision on around 30 papers. This final decision was made at the TPC meeting in New York on October 12 following preliminary discussions conducted by email.

In interpreting the reports below you should bear in mind two things. Firstly, low ratings do not necessarily mean that a reviewer considers that the paper is bad since, in view of the low Infocom acceptance ratio, we instructed reviewers to use a scale mainly intended to differentiate between the top quality papers. Secondly, for the papers discussed at the TPC meeting, the members consulted the paper itself as well as the referee reports before reaching their decision. They based this decision on their own expert opinion and in some cases may have chosen not to follow the majority referee recommendation.

Thank you for submitting your work to Infocom.

Best regards,
Jim Roberts and Ness Shroff (TPC chairs)

===== Review =====

*** Familiarity (Please assess your familiarity with the subject matter of the paper):
1: Outside main area of research
2: Familiar with this area of research
3: Working in this area of research
Evaluation=Working in this area of research (3)

*** Recommendation (In determining your overall evaluation, it is useful to bear in mind the relatively low acceptance rate in recent Infocoms (around 20%). The percentages against the ranks should be interpreted with respect to an imagined order of preference among papers submitted to Infocom 2003 based on your personal experience of reviewing papers in the networking area. The ranks help classify submissions but please remember that final acceptance will be determined using the literal responses to the questions below.):
1: Definite reject (not in top 60%, not up to Infocom standard)
2: Likely reject (top 60% but not in top 40%, needs more work)
3: Accept if room (top 40% but not top 20%, borderline for Infocom)
4: Likely accept (top 20% but not top 10%, significant contribution)
5: Definite accept (top 10%, excellent paper)
6: Best paper award (top 1%, groundbreaking!)
Evaluation=Definite reject (not in top 60%, not up to Infocom standard) (1)

*** Importance (What are the major issues addressed in the paper? Do you consider these issues important?):
Assuming there are multiple congestion-disjoint paths between two hosts, whether probing to discover a low-loss path or sending redundant streams on multiple paths provides lower overall loss rates.

*** Contribution (Do the presented results constitute a significant advance (e.g., technical depth, novelty, creative solution, etc.)? (1-3 sentences)):
The results presented reiterate what has often been reported in the literature: that FEC (in this case, a very simple 2-redundant delivery) can cover lost packets at the expense of higher delivery overhead.
The paper does not propose any new idea or methodology.

*** Strengths (What are the most important reasons to accept this paper? (1-3 sentences) (e.g., advances the state of the art in..., explores the new research area of..., provides useful results for...)):
It proposes the use of FEC (in the form of redundant delivery) on an overlay network to reduce loss rate.

*** Weaknesses (What are the most important reasons NOT to accept this paper? (1-3 sentences) (e.g., the paper has serious technical mistakes, isn't novel, does not demonstrate its point by proofs/simulations/experiments, makes very unreasonable assumptions, etc.)):
The paper is very superficial. It does not address the hard part of the problem, which the authors recognized: "most multi-path routing schemes make assumptions about path diversity that may not hold when considering typical Internet paths." The authors claim to provide "empirical evaluations of the independence of a particular set of Internet paths in Section IV." However, I was not able to find such evaluations in Section IV.

All the evaluations presented consider only a scenario with a single overlay. What would the performance be when there are multiple overlays and every host on the Internet sends multiple copies of its data? Wouldn't that simply increase congestion, defeating the original motivation of the paper?

The data set studied is rather small. All of the edu sites in the US are connected to Internet2, and the US sites are mostly concentrated in 3 geographic areas: New England, SF Bay Area, Salt Lake City. Despite the claim that both cable modem and DSL connections are included, there are only two cable-modem-connected hosts and one DSL-connected host, which then proved to give outliers in the collected data. This indicates the need for a richer data set.

*** Summary and comments (Brief summary and detailed comments to the authors):
Summary: The authors consider using probing and multiple delivery to reduce the overall loss rate experienced by applications. They find that probing does not improve the loss rate much, while sending multiple copies of data improves the loss rate by a factor of 2.

Comments: When loss rates are high, applications clearly suffer. However, I'm not convinced that looking for an alternate path or using multiple paths is the right solution. As the authors recognized, both of these approaches assume path diversity, which the authors have not established. Given the low loss rates reported, and the high loss of the DSL connection, I wonder if the congestion/loss bottleneck is not at the last mile, which usually will not have path diversity.

The data set used is also too small. I want to see more diversity in at least the US and Europe, and also more diversity in connectivity technologies; more instances of cable modem and DSL connections, for example.

I also want to see more involved discussions on the probing methodology and results. Why are the parameters used set to the values they are set to? What justifications are there for probing at a fixed rate of once every 15 seconds? Why send only 4 probes 1 sec apart to detect host down? Are 100 probes sufficient to determine the loss rate of a path? Maybe the discouraging results of probing reported in the paper are due to the less than rigorous probing methodology used?

There also needs to be a more thorough evaluation of the overhead of the two schemes. How was Fig. 2 computed? If every host adopts either or both of these schemes, what would be the global effect on the network? Would they simply increase network congestion, defeating the original purpose of the paper?

Presentation-wise, I find the paper to bog down in repeated details. The first part of Section III repeats what is already obvious from Section I.
Section IV.D repeats II.B, citing a rather old study [12] that considered only a single(!) path on the Internet. Also, while the work by Labovitz et al. shows that routers may take tens of minutes to stabilize after a fault, more recent works by Jennifer Rexford et al. have shown that these are not the norm for high-traffic routes.

===== TPC Review =====

*** Familiarity (Please assess your familiarity with the subject matter of the paper):
1: Outside main area of research
2: Familiar with this area of research
3: Working in this area of research
Evaluation=Working in this area of research (3)

*** Recommendation (In determining your overall evaluation, it is useful to bear in mind the relatively low acceptance rate in recent Infocoms (around 20%). The percentages against the ranks should be interpreted with respect to an imagined order of preference among papers s):
1: Definite reject (not in top 60%, not up to Infocom standard)
2: Likely reject (top 60% but not in top 40%, needs more work)
3: Accept if room (top 40% but not top 20%, borderline for Infocom)
4: Likely accept (top 20% but not top 10%, significant contribution)
5: Definite accept (top 10%, excellent paper)
6: Best paper award (top 1%, groundbreaking!)
Evaluation=Likely reject (top 60% but not in top 40%, needs more work) (2)

*** Interest (What are the major issues addressed in the paper? Do you consider these issues important? (1-3 sentences)):
The paper addresses the connectivity loss problem in the Internet (it is mainly concerned with long periods of packet losses). The issue is important.

*** Contribution (Do the presented results constitute a significant advance? (1-3 sentences)):
The paper proposes the exploration of path diversity, provided by overlay networks, to reduce loss rates. Original idea.

*** Strengths (What are the most important reasons to accept this paper? (1-3 sentences)):
The idea of using the multiple paths provided by the overlay networks is original.
The problem statement and related work provide reasonable support for the idea.

*** Weaknesses (What are the most important reasons NOT to accept this paper? (1-3 sentences) (e.g., the paper has serious technical mistakes, isn't novel, does not demonstrate its point by proofs/simulations/experiments, makes very unreasonable assumptions, etc.)):
The design of the two proposed schemes was too simplified. The hypothesis of path independence could prove unrealistic for larger overlays. The measurements are done on a simple network which might not be a good sample.

*** Comments (Detailed comments on the paper (primarily for the authors)):
This paper considers the problem of loss optimization in overlay networks. The idea is to take advantage of the multiple paths already existing in overlay networks to construct routing algorithms that exploit this path diversity. Nodes in the overlay select paths that optimize the loss rates. The authors consider an overlay network of N nodes where each node can use the direct Internet path and one of the (NxN) one-hop "overlay routes" (the alternative route contains only one node in addition to the originating and destination nodes). Two basic schemes (as well as a combination of them) are used to optimize losses: multi-path delivery with redundant encodings and reactive routing based on probes.

The loss-optimized routing the authors propose is mainly intended to avoid long periods of disconnection and to serve applications that cannot support large end-to-end latencies. Thus, the schemes analyzed are restricted to those whose latency overhead is smaller than one RTT.

Probe-based overlay routing consists of probing the NxN paths between the overlay nodes and using an alternative path instead of the direct Internet path when this choice provides better performance. The overhead of the scheme is due to alternative-path probing. The other approach is multi-path with redundant encoding.
It consists of adding packet-level FEC code to the data stream and splitting the resulting traffic into two or more paths. The multiple paths are directly provided by the overlay network.

The originality of the paper is in the use of the multiple paths of overlay networks, which exist anyway, to reduce loss rates. This is its main strength. Classical techniques are then used to increase the successful transmission rate - FEC and reactive routing.

Sections I and II provide a very good presentation of the problem and related work. On the other hand, Section III makes too simplistic an analysis of the proposed mechanisms, especially for multi-path routing -- the hypothesis of path independence largely simplifies the problem but at the same time there is no such guarantee in a generic overlay network.

Section IV presents the results obtained by the proposed mechanisms on a 17-node overlay network. This section is sometimes confusing, especially in the explanation of the "RONwide" dataset (2nd paragraph of Section IV and Section IV.D), and some typos exist.

Briefly, the paper presents original work but the contribution is small. The idea of using multiple overlay paths is fine but the authors did not take into account that these paths may be dependent. It would be interesting to evaluate larger topologies and more complex mechanisms (e.g. using more paths instead of only 2). Maybe in this case path dependence will be important.

We consider that the paper has a promising idea but makes a small contribution on the design and performance evaluation of the proposed schemes. Thus, it needs more work and should not be accepted in its current state.
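
The path-independence concern raised in this review can be made concrete with a short calculation. The sketch below is only an illustration, not the paper's analysis: it assumes a packet is duplicated on two paths and is lost only if both copies are lost. The 1% per-path loss rate and the 0.5 conditional loss probability are hypothetical values, and the function name is invented for this example.

```python
def both_copies_lost(p_first: float, p_second_given_first_lost: float) -> float:
    """Probability that a packet sent once on each of two paths is lost on both."""
    return p_first * p_second_given_first_lost


# Hypothetical per-path loss rate, chosen only for illustration.
p = 0.01

# Independent paths: losing the first copy says nothing about the second.
independent = both_copies_lost(p, p)

# Dependent paths: assume the second copy is lost half the time
# whenever the first one is lost (hypothetical value).
correlated = both_copies_lost(p, 0.5)

print(f"independent paths: {independent:.6f}")
print(f"correlated paths:  {correlated:.6f}")
```

Under these assumptions the residual loss is about 0.0001 for independent paths but 0.005 for dependent ones, so the benefit of redundancy depends heavily on how correlated the two paths are.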

===== Review =====

*** Familiarity (Please assess your familiarity with the subject matter of the paper):
1: Outside main area of research
2: Familiar with this area of research
3: Working in this area of research
Evaluation=Familiar with this area of research (2)

*** Recommendation (In determining your overall evaluation, it is useful to bear in mind the relatively low acceptance rate in recent Infocoms (around 20%). The percentages against the ranks should be interpreted with respect to an imagined order of preference among papers submitted to Infocom 2003 based on your personal experience of reviewing papers in the networking area. The ranks help classify submissions but please remember that final acceptance will be determined using the literal responses to the questions below.):
1: Definite reject (not in top 60%, not up to Infocom standard)
2: Likely reject (top 60% but not in top 40%, needs more work)
3: Accept if room (top 40% but not top 20%, borderline for Infocom)
4: Likely accept (top 20% but not top 10%, significant contribution)
5: Definite accept (top 10%, excellent paper)
6: Best paper award (top 1%, groundbreaking!)
Evaluation=Likely accept (top 20% but not top 10%, significant contribution) (4)

*** Importance (What are the major issues addressed in the paper? Do you consider these issues important?):
The main issue addressed in this paper is how to take advantage of path diversity in order to reduce packet loss. The authors present and evaluate the results of performance measurements done on an overlay test network built on top of the real Internet. Two major techniques for reducing loss, based on the overlay network concept, have been presented. The first technique employs path probing and selection, and the second technique uses multiple paths for information redundancy. I consider the issue of minimizing loss in the Internet to be very important in the context of provisioning real-time sensitive content, such as
telephony and video conferencing.

*** Contribution (Do the presented results constitute a significant advance (e.g., technical depth, novelty, creative solution, etc.)? (1-3 sentences)):
The paper provides technical insights into the inner workings of the employed testbed, as well as into the details of all performed experiments. Straightforward techniques, such as path probing and the use of multiple paths for information redundancy, have been used and combined for the purpose of packet loss minimization.

*** Strengths (What are the most important reasons to accept this paper? (1-3 sentences) (e.g., advances the state of the art in..., explores the new research area of..., provides useful results for...)):
As mentioned above, this paper presents several techniques and their possible combinations for combating packet loss in the Internet. This paper is certainly a valuable contribution in the area of real-time content delivery over the Internet.

*** Weaknesses (What are the most important reasons NOT to accept this paper? (1-3 sentences) (e.g., the paper has serious technical mistakes, isn't novel, does not demonstrate its point by proofs/simulations/experiments, makes very unreasonable assumptions, etc.)):
The statements about packet loss and the improvements achieved could have been even stronger if the behavior of the overlay network had also been simulated in a network simulator, since in simulations the performance of the algorithms could have been checked with many different types of traffic, generated with well-defined levels of traffic burstiness on different timescales.

*** Summary and comments (Brief summary and detailed comments to the authors):
Below you will find detailed remarks about the wording and spelling errors in your paper.
- In the abstract:
  i. You should explain the term "standing losses".
  ii.
There is a spelling error at the end of the abstract - "and that 2-redundant multi-path routing can ELIMINATE 30% of outages while"
- In the introduction:
  i. You should try to further highlight the TWO basic methods for reducing the effective loss rate - packet diversity vs. path diversity - they "get lost" in the text
  ii. Generally, a more systematic introduction of the employed techniques would be great - maybe even a visual aid in the form of a tree-graph
- In section II.A:
  i. You have a spelling error in the last sentence - "and THIS is the approach we consider in this paper."
- In IV. Evaluation:
  i. At the beginning of the second paragraph, you point to the wrong table - the sentence should say "Table II lists the two datasets"
  ii. In the same paragraph, the "three most promising methods" are mentioned before they are introduced, which is done in the next paragraph
  iii. In the same paragraph, the last sentence should say "RON_wide MEASURES more combinations"
  iv. In the next (i.e. the third) paragraph, there is a spelling error when describing the "loss" method - "Loss: Probe-based reactive routing that ATTEMPTS to minimize loss. Requires only probing overhead."
  v. In C. you have a numerical error at the end of the second paragraph. The last sentence should say "Probing reduced this to 1960 path-minutes, and redundant routing reduced it to 1570 path-minutes."
  vi. In the next paragraph, the last sentence should be "This data CONFIRMS that probing can avoid"
  vii. In Tables VII and VIII, a unit should be provided for the latency
  viii. In D., the last sentence starting in the first column of page 9 should be "The conditional loss probability of a packet sent through an intermediate NODE was only 50%."
  ix.
In E., the methods "rand rand" and "direct lat" should be described in detail - there should not be any room left for speculation on what they do

===== Review =====

*** Familiarity (Please assess your familiarity with the subject matter of the paper):
1: Outside main area of research
2: Familiar with this area of research
3: Working in this area of research
Evaluation=Familiar with this area of research (2)

*** Recommendation (In determining your overall evaluation, it is useful to bear in mind the relatively low acceptance rate in recent Infocoms (around 20%). The percentages against the ranks should be interpreted with respect to an imagined order of preference among papers submitted to Infocom 2003 based on your personal experience of reviewing papers in the networking area. The ranks help classify submissions but please remember that final acceptance will be determined using the literal responses to the questions below.):
1: Definite reject (not in top 60%, not up to Infocom standard)
2: Likely reject (top 60% but not in top 40%, needs more work)
3: Accept if room (top 40% but not top 20%, borderline for Infocom)
4: Likely accept (top 20% but not top 10%, significant contribution)
5: Definite accept (top 10%, excellent paper)
6: Best paper award (top 1%, groundbreaking!)
Evaluation=Likely reject (top 60% but not in top 40%, needs more work) (2)

*** Importance (What are the major issues addressed in the paper? Do you consider these issues important?):
Minimise loss in the Internet. The issue is obviously important, but I don't think the approach taken by the authors is an effective one.

*** Contribution (Do the presented results constitute a significant advance (e.g., technical depth, novelty, creative solution, etc.)? (1-3 sentences)):
The paper does not present a significant advance, as stated by the authors themselves (first step). The technical depth is so-so.

*** Strengths (What are the most important reasons to accept this paper?
(1-3 sentences) (e.g., advances the state of the art in..., explores the new research area of..., provides useful results for...)):
I am not advocating its acceptance.

*** Weaknesses (What are the most important reasons NOT to accept this paper? (1-3 sentences) (e.g., the paper has serious technical mistakes, isn't novel, does not demonstrate its point by proofs/simulations/experiments, makes very unreasonable assumptions, etc.)):
I would like to consider the following points as major weaknesses:
1) The paper is placed under the ROUTING paradigm, but the concrete results shown in the paper concern measurement much more than actual routing cases. It is a first step, say the authors, so I suggest that they present a result in better agreement with the chosen title.
2) The underlying issues are:
- the redundancy paradigm: It is my personal belief that it is not effective to try to minimize loss (mostly due to increased traffic) by increasing traffic even more (redundancy)
- dynamic routing: the idea is attractive. The dynamic behavior of the network should however be addressed. Specifically, in this paper measurements are done, but the INTERFERENCE that the redundancy could bring to the network is not really examined, and in particular is neither tested experimentally nor simulated.
3) It would be appreciated if you specified the type of applications/flows to which your method is best suited.

*** Summary and comments (Brief summary and detailed comments to the authors):
I summarized my point of view in the preceding field. Here are some minor comments:
- P3, B. Internet Perf.... It is *obvious* that the loss probability is higher when the gap between packets is smaller (it is equivalent to having higher traffic, cf. any classical queueing book, e.g. Kleinrock)
- P5, paper [11] is cited as the reference for the Reed-Solomon code. I think it is better to cite a book or a tutorial paper, rather than an application paper.
- P5, Table 1, column Probe-based: there is a DIMENSION problem in N**2/BW. N has no DIMENSION, while BW is in bit/s. Please clarify.

===== Review =====

*** Familiarity (Please assess your familiarity with the subject matter of the paper):
1: Outside main area of research
2: Familiar with this area of research
3: Working in this area of research
Evaluation=Familiar with this area of research (2)

*** Recommendation (In determining your overall evaluation, it is useful to bear in mind the relatively low acceptance rate in recent Infocoms (around 20%). The percentages against the ranks should be interpreted with respect to an imagined order of preference among papers submitted to Infocom 2003 based on your personal experience of reviewing papers in the networking area. The ranks help classify submissions but please remember that final acceptance will be determined using the literal responses to the questions below.):
1: Definite reject (not in top 60%, not up to Infocom standard)
2: Likely reject (top 60% but not in top 40%, needs more work)
3: Accept if room (top 40% but not top 20%, borderline for Infocom)
4: Likely accept (top 20% but not top 10%, significant contribution)
5: Definite accept (top 10%, excellent paper)
6: Best paper award (top 1%, groundbreaking!)
Evaluation=Accept if room (top 40% but not top 20%, borderline for Infocom) (3)

*** Importance (What are the major issues addressed in the paper? Do you consider these issues important?):
This paper studies several types of routing that could be used in overlay networks to reduce packet losses. These techniques are compared in terms of loss, latency and bandwidth overhead, using actual measurements from an overlay network. This topic is important since overlay networks are considered in an increasing number of applications (CDN, application-level multicast, peer-to-peer applications, ...)

*** Contribution (Do the presented results constitute a significant advance (e.g., technical depth, novelty, creative solution, etc.)?
(1-3 sentences)):
The suggested routing algorithms are quite simple. The contribution is in the evaluation of these algorithms using a real overlay network.

*** Strengths (What are the most important reasons to accept this paper? (1-3 sentences) (e.g., advances the state of the art in..., explores the new research area of..., provides useful results for...)):
Advances the understanding of loss dynamics in the Internet and overlay networks. Gives quantitative results on loss and latency.

*** Weaknesses (What are the most important reasons NOT to accept this paper? (1-3 sentences) (e.g., the paper has serious technical mistakes, isn't novel, does not demonstrate its point by proofs/simulations/experiments, makes very unreasonable assumptions, etc.)):
This paper seems to be only a preliminary version, with many typos and minor errors. The type of data flows (application?) used in the study is not described.

*** Summary and comments (Brief summary and detailed comments to the authors):
Interesting paper, using actual measurements on an Internet testbed. A few comments:
- the paper contains many typos and small errors (for example, errors in Table numbers in the text) that should be corrected
- the RON network is described, but nothing is said about the data flows used in the experiment (bandwidth, packet size, ...)
- it is said (page 4) that only applications whose maximum latency is less than the round-trip time are considered (to preclude using retransmission). However, in an overlay network, might the one-way latency through an intermediate node be larger than the round-trip latency through the direct path?
- as far as reactive routing is concerned, nothing is said about an important aspect, stability: what if many RONs keep switching between two redundant paths?
- the formula in Table I should include the probe interval
- the definition of the redundancy rate R is not clear: "the fraction of lost packets that can be tolerated", so what does R=1 (page 5) mean, 100% lost packets?
- define more precisely (page 6) the different methods "direct", "Lat"; in particular, it seems that "Lat Loss" should be called "Loss Lat", since the first packet minimizes loss?
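
The comments above about the probe interval missing from the Table I formula, and the earlier remark about its dimensions, can be illustrated with a small calculation. This is a hedged sketch rather than the paper's formula: it assumes each node sends one probe to every other node per interval and counts only the outgoing probes. The 17 nodes and the 15-second interval are the figures mentioned in the reviews; the 64-byte probe size and the function name are hypothetical.

```python
def probe_overhead_bps(n_nodes: int, probe_bytes: int, interval_s: float) -> float:
    """Bits per second one node sends when probing every other node once per interval."""
    if n_nodes < 2 or interval_s <= 0:
        raise ValueError("need at least two nodes and a positive probe interval")
    # (nodes probed) * (bits per probe) / (seconds per round) gives bit/s.
    return (n_nodes - 1) * probe_bytes * 8 / interval_s


N_NODES = 17        # overlay size mentioned in the TPC review
INTERVAL_S = 15.0   # probe interval mentioned in the first review
PROBE_BYTES = 64    # hypothetical probe size, not taken from the paper

per_node = probe_overhead_bps(N_NODES, PROBE_BYTES, INTERVAL_S)
total = N_NODES * per_node  # grows roughly as N**2 across the whole overlay

print(f"per-node probing overhead: {per_node:.1f} bit/s")
print(f"overlay-wide probing overhead: {total:.1f} bit/s")
```

Written this way, the result carries units of bit/s, and the dependence on the probe interval is explicit: halving the interval doubles the overhead.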