\ David Wajc's Homepage

News

Apr. '19: My paper with Anupam Gupta, Guru Guruganesh and Binghui Peng, "Stochastic Online Metric Matching", was accepted to ICALP 2019.
Oct. '18: My paper with Mohsen Ghaffari, “Simplified and Space-Optimal Semi-Streaming for (2+ε)-Approximate Matching”, was accepted to SOSA 2019.
Sep. '18: I am visiting Ola Svensson at EPFL for the Fall Semester.
Apr. '18: My paper with Bernhard Haeupler and Ellis Hershkowitz, “Round- and Message-Optimal Distributed Graph Algorithms”, was accepted to PODC 2018.
Apr. '18: My paper with Moab Arar, Shiri Chechik, Sarel Cohen and Cliff Stein, "Dynamic Matching: Reducing Integral Algorithms to Approximately-Maximal Fractional Algorithms", was accepted to ICALP 2018.
Apr. '18: My paper with Anupam Gupta, Guru Guruganesh and Amit Kumar, "Fully-Dynamic Bin Packing with Limited Repacking", (to be merged with "A Tight Approximation for Fully Dynamic Bin Packing without Bundling", by Björn Feldkord, Matthias Feldotto and Sören Riechers) was accepted to ICALP 2018.
Nov. '17: My paper with Ariel Procaccia and Harnui Zhang, "Approximation-Variance Tradeoffs in Facility Location Games", was accepted to AAAI 2018.
Oct. '17: My paper with Ilan R. Cohen, "Randomized Online Matching in Regular Graphs", was accepted to SODA 2018.



Contact Information

dwajc@cs.cmu.edu
GHC 7001, Carnegie Mellon University

Network Coding Gaps for Completion Time of Multiple Unicasts

Bernhard Haeupler, David Wajc, Goran Zuzic
W32019

Abstract

Arguably the most common network communication problem is multiple-unicasts: Distinct packets at different nodes in a network need to be delivered to a destination specific to each packet as fast as possible.

The famous multiple-unicast conjecture posits that, for this natural problem, there is no performance gap between routing and network coding, at least in terms of throughput.

We study the same network coding gap but in terms of completion time. While throughput corresponds to the completion time for asymptotically-large transmissions, we look at completion times of multiple unicasts for arbitrary amounts of data. We develop nearly-matching upper and lower bounds. In particular, we prove that the network coding gap for the completion time of k unicasts is at most polylogarithmic in k, and there exist instances of k unicasts for which this coding gap is polylogarithmic in k.

Online Matching with General Arrivals

Buddhima Gamlath, Michael Kapralov, Andreas Maggiori, Ola Svensson, David Wajc
W22019

Abstract

The online matching problem was introduced by Karp, Vazirani and Vazirani nearly three decades ago. In that seminal work, they studied this problem in bipartite graphs with vertices arriving only on one side, and presented optimal deterministic and randomized algorithms for this setting. In comparison, more general arrival models, such as edge arrivals and general vertex arrivals, have proven more challenging, and positive results are known only for various relaxations of the problem. In particular, even the basic question of whether randomization allows one to beat the trivially-optimal deterministic competitive ratio of 1/2 for either of these models was open. In this paper, we resolve this question for both these natural arrival models, and show the following.

  1. For edge arrivals, randomization does not help | no randomized algorithm is better than 1/2 competitive.
  2. For general vertex arrivals, randomization helps | there exists a randomized (1/2+ Ω(1))-competitive online matching algorithm.

Tight Bounds for Online Edge Coloring

Ilan Reuven Cohen, Binghui Peng, David Wajc
W12019

Abstract

Vizing’s celebrated theorem asserts that any graph of maximum degree  admits an edge coloring using at most Δ+1 colors. In contrast, Bar-Noy, Motwani and Naor showed over a quarter century ago that the trivial greedy algorithm, which uses 2Δ-1 colors, is optimal among online algorithms. Their lower bound has a caveat, however: it only applies to low-degree graphs, with Δ = O(log n), and they conjectured the existence of online algorithms using (1+o(1))·Δ colors for Δ = ω(log n). Progress towards resolving this conjecture was only made under stochastic arrivals (Aggarwal et al., FOCS’03 and Bahmani et al., SODA’10).

We resolve the above conjecture for adversarial vertex arrivals in bipartite graphs, for which we present a (1+o(1))·Δ-edge-coloring algorithm for Δ = ω(log n) known a priori. Surprisingly, if Δ is not known ahead of time, we show that no online (e/(e-1)-Ω(1))·Δ-edge-coloring algorithm exists. We then provide an optimal, (e/(e-1)+o(1))·Δ-edge-coloring algorithm for unknown Δ = ω(log n). Key to our results, and of possible independent interest, is a novel fractional relaxation for edge coloring, for which we present optimal fractional online algorithms and a near-lossless online rounding scheme, yielding our optimal randomized algorithms.

Stochastic Online Metric Matching

Anupam Gupta, Guru Guruganesh, Binguhi Peng, David Wajc
C11ICALP 2019

Abstract

We study the minimum-cost metric perfect matching problem under online i.i.d arrivals. We are given a xed metric with a server at each of the points, and then requests arrive online, each drawn independently from a known probability distribution over the points. Each request has to be matched to a free server, with cost equal to the distance. The goal is to minimize the expected total cost of the matching.

Such stochastic arrival models have been widely studied for the maximization variants of the online matching problem; however, the only known result for the minimization problem is a tight O(log n)-competitiveness for the random-order arrival model. This is in contrast with the adversarial model, where an optimal competitive ratio of O(log n) has long been conjectured and remains a tantalizing open question.

In this paper, we show improved results in the i.i.d arrival model. We show how the i.i.d model can be used to give substantially better algorithms: our main result is an O((log log log n)2)- competitive algorithm in this model. Along the way we give a 9-competitive algorithm for the line and tree metrics. Both results imply a strict separation between the i.i.d model and the adversarial and random order models, both for general metrics and these much-studied metrics.

Simplified and Space-Optimal Semi-Streaming (2+ε)-Approximate Matching

Mohsen Ghaffari, David Wajc
C10SOSA 2019

Abstract

In a recent breakthrough, Paz and Schwartzman (SODA'17) presented a single-pass (2+ɛ)-approximation algorithm for the maximum weight matching problem in the semi-streaming model. Their algorithm uses O(n log^2 n) bits of space, for any constant ɛ>0.

We present two simplified and more intuitive analyses, for essentially the same algorithm, which also improve the space complexity to the optimal bound of O(n log n) bits --- this is optimal as the output matching requires \Omega(n log n) bits. Our analyses rely on a simple use of the primal dual method and a simple accounting method.

Near-Optimum Online Ad Allocation for Targeted Advertising

Joseph (Seffi) Naor, David Wajc
J2TEAC 2018 Special Issue on EC 15 . Supersedes the EC 15 paper below.

Abstract

Motivated by Internet targeted advertising, we address several ad allocation problems. Prior work has established these problems admit no randomized online algorithm better than (1-1/e)-competitive ([Karp et al. 1990; Mehta et al. 2007]), yet simple heuristics have been observed to perform much better in practice. We explain this phenomenon by studying a generalization of the bounded-degree inputs considered by [Buchbinder et al. 2007), graphs which we call (k,d)-bounded. In such graphs the maximal degree on the online side is at most d and the minimal degree on the offline side is at least k. We prove that for such graphs, these problems' natural greedy algorithms attain competitive ratio 1-(d-1)/(k+d-1), tending to one as d/k tends to zero. We prove this bound is tight for these algorithms.

Next, we develop deterministic primal-dual algorithms for the above problems achieving competitive ratio 1-(1-1/d)k>1-1/e^{k/d}, or exponentially better loss as a function of k/d, and strictly better than 1-1/e whenever k ≥ d. We complement our lower bounds with matching upper bounds for the vertex- weighted problem. Finally, we use our deterministic algorithms to prove by dual-fitting that simple randomized algorithms achieve the same bounds in expectation. Our algorithms and analysis differ from previous ad allocation algorithms, which largely scale bids based on the spent fraction of their bidder's budget, whereas we scale bids according to the number of times the bidder could have spent as much as her current bid. Our algorithms differ from previous online primal-dual algorithms, as they do not maintain dual feasibility, but only primal-to-dual ratio, and only attain dual feasibility upon termination. We believe our techniques could find applications to other well- behaved online packing problems.

Round- and Message-Optimal Distributed Graph Algorithms

Bernhard Haeupler, D. Ellis Hershkowitz, David Wajc
C9PODC 2018

Abstract

Distributed graph algorithms that separately optimize for either the number of rounds used or the total number of messages sent have been studied extensively. However, algorithms simultaneously efficient with respect to both measures have been elusive. For example, only very recently was it shown that for Minimum Spanning Tree (MST), an optimal message and round complexity is achievable (up to polylog terms) by a single algorithm in the CONGEST model of communication.

In this paper we provide algorithms that are simultaneously round- and message-optimal for a number of well-studied distributed optimization problems. Our main result is such a distributed algorithm for the fundamental primitive of computing simple functions over each part of a graph partition. From this algorithm we derive round- and message-optimal algorithms for multiple problems, including MST, Approximate Min-Cut and Approximate Single Source Shortest Paths, among others. On general graphs all of our algorithms achieve worst-case optimal ˜O(D+sqrt{n}) round complexity and ˜O(m) message complexity. Furthermore, our algorithms require an optimal ˜O(D) rounds and ˜O(n) messages on planar, genus-bounded, treewidth-bounded and pathwidth-bounded graphs.

Fully-Dynamic Bin Packing with Limited Repacking

Anupam Gupta, Guru Guruganesh, Amit Kumar, David Wajc
C8ICALP 2018

Abstract

We study the classic Bin Packing problem in a fully-dynamic setting, where new items can arrive and old items may depart. We want algorithms with low asymptotic competitive ratio while repacking items sparingly between updates. Formally, each item i has a movement cost ci≥0, and we want to use α·OPT bins and incur a movement cost γ·ci, either in the worst case, or in an amortized sense, for α, γ as small as possible. We call γ the recourse of the algorithm. This is motivated by cloud storage applications, where fully-dynamic Bin Packing models the problem of data backup to minimize the number of disks used, as well as communication incurred in moving file backups between disks. Since the set of files changes over time, we could recompute a solution periodically from scratch, but this would give a high number of disk rewrites, incurring a high energy cost and possible wear and tear of the disks. In this work, we present optimal tradeoffs between number of bins used and number of items repacked, as well as natural extensions of the latter measure.

Dynamic Matching: Reducing Integral Algorithms to Approximately-Maximal Fractional Algorithms

Moab Arar, Shiri Chechik, Sarel Cohen, Cliff Stein, David Wajc
C7ICALP 2018

Abstract

We present a simple randomized reduction from fully-dynamic integral matching algorithms to fully-dynamic “approximately-maximal” fractional matching algorithms. Applying this reduction to the recent fractional matching algorithm of Bhattacharya, Henzinger, and Nanongkai (SODA 2017), we obtain a novel result for the integral problem. Specifically, our main result is a randomized fully-dynamic (2+ɛ)-approximate integral matching algorithm with small polylog worst-case update time. For the (2+ɛ)-approximation regime only a fractional fullydynamic (2+ɛ)-matching algorithm with worst-case polylog update time was previously known, due to Bhattacharya et al. (SODA 2017). Our algorithm is the first algorithm that maintains approximate matchings with worst-case update time better than polynomial, for any constant approximation ratio. As a consequence, we also obtain the first constant-approximate worst-case polylogarithmic update time maximum weight matching algorithm.

Randomized Online Matching in Regular Graphs

Ilan R. Cohen, David Wajc
C6SODA 2018

Abstract

In this paper we study the classic online matching problem, introduced in the seminal work of Karp, Vazirani and Vazirani (STOC 1990), in regular graphs. For such graphs, an optimal deterministic algorithm as well as efficient algorithms under stochastic input assumptions were known. In this work, we present a novel randomized algorithm with competitive ratio tending to one on this family of graphs, under adversarial arrival order. Our main contribution is a novel algorithm which achieves competitive ratio 1-O(sqrt(log d)/ sqrt(d)) in expectation on d-regular graphs. In contrast, we show that all previously-studied online algorithms have competitive ratio strictly bounded away from one. Moreover, we show the convergence rate of our algorithm’s competitive ratio to one is nearly tight, as no algorithm achieves competitive ratio better than 1-O(1/sqrt(d)). Finally, we show that our algorithm yields a similar competitive ratio with high probability, as well as guaranteeing each vertex a probability of being matched tending to one.

Approximation-Variance Tradeoffs in Facility Location Games

Ariel Procaccia, David Wajc, Hanrui Zhang
C5AAAI 2018

Abstract

We revisit the well-studied problem of constructing strategyproof approximation mechanisms for facility location games, but offer a fundamentally new perspective by considering risk averse designers. Specifically, we are interested in the tradeoff between a randomized strategyproof mechanism’s approximation ratio, and its variance (which has long served as a proxy for risk). When there is just one facility, we observe that the social cost objective is trivial, and derive the optimal tradeoff with respect to the maximum cost objective. When there are multiple facilities, the main challenge is the social cost objective, and we establish a surprising impossibility result: under mild assumptions, no smooth approximation-variance tradeoff exists. We also discuss the implications of our work for computational mechanism design at large.

A Faster Distributed Radio Broadcast Primitive (Extended Abstract)

Bernhard Haeupler, David Wajc
C4PODC 2016

Abstract

We present a faster distributed broadcasting primitive for the classical radio network model.

The most basic distributed radio network broadcasting primitive - called Decay - dates back to PODC'87 result of Bar-Yehuda, Goldreich, and Itai. In any radio network with some informed source nodes, running Decay for O(d log n + log^2 n) rounds informs all nodes at most d hops away from a source with high probability. Since 1987 this primitive has been the most important building block for implementing many other functionalities in radio networks. The only improvements to this decades-old algorithm are slight variations due to [Czumaj, Rytter; FOCS'03] and [Kowalski and Pelc, PODC'03] which achieve the same functionality in O(d log(n/d) + log^2 n) rounds. To obtain a speedup from this, d and thus also the network diameter need to be near linear, i.e., larger than n^(1-eps).

Our new distributed primitive spreads messages for d hops in O(d (log n log log n)/(log d) + log^{O(1)} n) rounds with high probability. This improves over Decay for any super-polylogarithmic d = log^{omega(1)} n and achieves near-optimal O(d log log n) running time for d = n^eps. This also makes progress on an open question of Peleg.

Near-Optimum Online Ad Allocation for Targeted Advertising

Joseph (Seffi) Naor, David Wajc
C3EC 2015.

Abstract

Motivated by Internet targeted advertising, we address several ad allocation problems. Prior work has established these problems admit no randomized online algorithm better than (1-1/e)-competitive ([Karp et al. 1990; Mehta et al. 2007]), yet simple heuristics have been observed to perform much better in practice. We explain this phenomenon by studying a generalization of the bounded-degree inputs considered by [Buchbinder et al. 2007), graphs which we call (k,d)-bounded. In such graphs the maximal degree on the online side is at most d and the minimal degree on the offline side is at least k. We prove that for such graphs, these problems' natural greedy algorithms attain competitive ratio 1-(d-1)/(k+d-1), tending to one as d/k tends to zero. We prove this bound is tight for these algorithms.

Next, we develop deterministic primal-dual algorithms for the above problems achieving competitive ratio 1-(1-1/d)k>1-1/e^{k/d}, or exponentially better loss as a function of k/d, and strictly better than 1-1/e whenever k ≥ d. We complement our lower bounds with matching upper bounds for the vertex- weighted problem. Finally, we use our deterministic algorithms to prove by dual-fitting that simple randomized algorithms achieve the same bounds in expectation. Our algorithms and analysis differ from previous ad allocation algorithms, which largely scale bids based on the spent fraction of their bidder's budget, whereas we scale bids according to the number of times the bidder could have spent as much as her current bid. Our algorithms differ from previous online primal-dual algorithms, as they do not maintain dual feasibility, but only primal-to-dual ratio, and only attain dual feasibility upon termination. We believe our techniques could find applications to other well- behaved online packing problems.

You Will Get Mail! Predicting the Arrival of Future Email

Iftah Gamzu, Zohar Shay Karnin, Yoelle Maarek, David Wajc
C2WWW (Companion Volume) 2015

Abstract

The majority of Web email is known to be generated by machines even when one excludes spam. Many machine-generated email messages such as invoices or travel itineraries are critical to users. Recent research studies establish that causality relations between certain types of machine- generated email messages exist and can be mined. These relations exhibit a link between a given message to a past message that gave rise to its creation. For example, a shipment notification message can often be linked to a past online purchase message. Instead of studying how an incoming message can be linked to the past, we propose here to focus on predicting future email arrival as implied by causality relations. Such a prediction method has several potential applications, ranging from improved ad targeting in up sell scenarios to reducing false positives in spam detection.

We introduce a novel approach for predicting which types of machine-generated email messages, represented by so-called "email templates", a user should receive in future time windows. Our prediction approach relies on (1) statistically inferring causality relations between email templates, (2) building a generative model that explains the inbox of each user using those causality relations, and (3) combining those results to predict which email templates are likely to appear in future time frames. We present preliminary experimental results and some data insights obtained by analyzing several million inboxes of Yahoo Mail users, who voluntarily opted-in for such research.

Best-Response Dynamics Out of Sync: Complexity and Characterization

Roee Engelberg, Alex Fabrikant, Michael Schapira, David Wajc
C1EC 2013

Abstract

In many computational and economic models of multi-agent interaction, each participant repeatedly "best-responds" to the others' actions. Game theory research on the prominent "best-response dynamics" model typically relies on the premise that the interaction between agents is somehow synchronized. However, in many real-life settings, e.g., internet protocols and large-scale markets, the interaction between participants is asynchronous. We tackle the following important questions: (1) When are best-response dynamics guaranteed to converge to an equilibrium even under asynchrony? (2) What is the (computational and communication) complexity of verifying guaranteed convergence? We show that, in general, verifying guaranteed convergence is intractable. In fact, our main negative result establishes that this task is undecidable. We exhibit, in contrast, positive results for several environments of interest, including complete, computationally-tractable, characterizations of convergent systems. We discuss the algorithmic implications of our results, which extend beyond best-response dynamics to applications such as asynchronous Boolean circuits.

On the complexity of vertex-coloring edge-weightings

Andrzej Dudek, David Wajc
J1 Discrete Mathematics & Theoretical Computer Science 2011
 

Abstract

Given a graph G=(V,E) and a weight function w:E →R, a coloring of vertices of G, induced by w, is defined by χw(v) = ∑_{e∋v} w(e) for all v∈V. In this paper, we show that determining whether a particular graph has a weighting of the edges from {1,2} that induces a proper vertex coloring is NP-complete.

Negative Association:

Definition, Properties, and Applications

Abstract

In these notes we present the notion of Negative Association, discuss some of its useful properties, and end with some example applications. The slogan to bear in mind here is “independent, or better”.

I have had the pleasure of teaching various courses, both at the Technion and at CMU.

At Carnegie Mellon

Graduate Algorithms (15-750):
Spring 2019

Probability and Computing (15-359/659):
Spring 2015

At the Technion:

Data Structures 1 (234218):  

Summer 2011Summer 2010

Introduction to Systems Programming (234122): 
Spring 2009

Introduction to CS (234114): 

``How do you pronounce 'Wajc'?", you ask?
Answer: The second syllable of the words e-vites and invites.
Using the International Phonetic Alphabet, Wajc would be spelled [vajts].

Still unsure? Check out this recording of Kira Radinsky:

Here's a subset of a Polish transcription table to explain how those letters are supposed to represent those sounds.

Ww       v     as in "vat"
  Aa     o     as in "hot"
  Jj     y     as in "yes"
  Cc     ts     as in "bits"


And before you ask: no, I don't speak Polish. Here's a relevant meme by Krzysztof Onak: