Flow matching models like Stable Diffusion 3.5 and FLUX generate an image by iteratively denoising random noise over many small steps. When we want to fine-tune these models with reinforcement learning, say to make them follow text prompts more faithfully or produce more aestheti…
Figure 1: Colocated serving versus heterogeneous operator-level disaggregated serving.
Large language model inference demand is rising quickly, and the infrastructure spending required to serve that demand is rising with it. Market analyses and industry reports increasingly p…
Homomorphic encryption (HE) is a powerful cryptographic technique that enables computation on encrypted data.
With HE, we can create privacy-preserving machine learning models that navigate privacy laws and tap into the computational power of an untrusted cloud.
However, practica…
In this blog post, we explain recent progress on an important class of stochastic sequential decision problems called restless bandits, based on our paper "Unichain and aperiodicity are sufficient for asymptotic optimality of average-reward restless bandits".
We will also briefly m…
Suppose I ask a text-to-image model for a peacock eating rice. The model may know what a peacock looks like. It may know what rice looks like. It may even know how a chicken pecks at rice. But if the training data never contained the exact composition peacock + rice, why should t…
Computer architecture determines how programs execute in hardware—how work unfolds over time and across resources.
Most readers are familiar with processors through the von Neumann model, where programs are executed as sequences of instructions.
In this view, execution appears s…
Workload imbalance is one of the major problems in training long-context large language models (LLMs). Long-context capability, the ability of a model to process and reason over hundreds of thousands of tokens at once, is critical for applications such as repository-level …
Consider a scenario in which a hospital wishes to store its highly sensitive data on a third-party database server (such as AWS). Because it fears events such as data breaches or regulatory audits, the hospital will often store the data encrypted using a primitive…
This blog post is about hypothesis testing. In the classical world, this task is often modeled in the following way: we, as scientists, have formed the belief that some (random) system follows a hypothesis distribution \(\mu\). For example, we may believe that some many-sided d…
Boolean satisfiability, or SAT, is a classic computational problem:
given a Boolean formula, determine whether
there exists an assignment to the variables that satisfies the formula.
SAT is NP-complete, meaning all problems in the complexity class NP
can be (efficiently) translat…