The bst Data Structure

par::bst is a persistent balanced binary search tree supporting set and dictionary operations. All update operations return a new tree and do not modify the original.

The tree optionally supports augmentation via an associative combine function and identity value. These parameters allow for quering the reduced value of the tree in constant time.

par::bst is implemented as a randomized BST; all logarithmic cost bounds hold with high probability. In cost bounds, $n$ refers to the current size (number of elements) of the tree.

Template Parameters

The full type of the tree is:

par::bst<
   KeyType,
   ValueType,
   Compare =      /* default */,
   Combine =      /* default */,
   IdentityType = /* default */
 >

The template parameters are:

If Compare is omitted, the tree uses KeyType's operator< by default. If Combine and IdentityType are omitted, the tree behaves as a standard sorted dictionary without augmentation.

Type Alias Members

The bst class defines several public type aliases that refer to its template parameters and derived types.

Advanced: Custom Types

In most use cases, augmented_value_type and identity_type are simply the same as value_type. However, it is also possible to use a different augmented type, provided that:

Since leaf nodes conceptually contribute their stored value to the augmentation via combine, the following expressions must be valid and produce augmented_value_type:

An example of this is provided later where we show how to implement dynamic MCSS. This is optional. You do not have to implement anything this complex for your labs.

Creating Trees

Although bst has a public constructor, it is preferred to create trees using the provided factory functions: empty, singleton, and from_sequence. These functions make the intended usage clearer and avoid mistakes when working with custom comparison or augmentation parameters.

Constructor Work Span
bst(
  Compare less = {},             // default
  Combine combine = {},          // default 
  identity_type identity = {}    // default
)
$O(1)$ $O(1)$

Constructs an empty tree with the given comparison function, combine function, and identity value. All three parameters are optional and will take their default values if not specified.

par::bst<int,int> T;   // empty BST with key type int and value type int
empty Work Span
static auto empty(
  Compare less = {},             // default
  Combine combine = {},          // default
  identity_type identity = {}    // default
) -> bst
$O(1)$ $O(1)$

Constructs an empty tree with the given comparison function, combine function, and identity value. All three parameters are optional and will take their default values if not specified.

auto T = par::bst<int,int>::empty();
singleton Work Span
static auto singleton(
  key_type key,
  value_type value,
  Compare less = {},             // default
  Combine combine = {},          // default
  identity_type identity = {}    // default
) -> bst
$O(1)$ $O(1)$

Returns a tree containing a single key-value pair, using the given comparison function, combine function, and identity value. These three parameters are optional and will take their default values if not specified.

auto T = par::bst<int,int>::singleton(5, 42);
from_sequence Work Span
static auto from_sequence(
  const sequence auto& S,
  Compare less = {},             // default
  Combine combine = {},          // default
  identity_type identity = {}    // default
) -> bst
$O(|S|)$ $O(\log |S|)$

Constructs a balanced tree from a sequence of key-value pairs, using the given comparison function, combine function, and identity value. These three parameters are optional and will take their default values if not specified. The input sequence must already be sorted by key according to less.

par::array_sequence a{
  std::pair{1,10},
  std::pair{2,20},
  std::pair{3,30}
};

auto T = par::bst<int,int>::from_sequence(a);

Queries

These operations do not modify the tree. Since the tree is persistent, all operations are logically read-only.

size Work Span
auto size() -> std::size_t
$O(1)$ $O(1)$

Returns the number of elements in the tree.

std::size_t n = T.size();
find Work Span
auto find(const key_type& key) -> std::optional<value_type>
$O(\log n)$* $O(\log n)$*

Return an optional containing the value associated with key if it exists, or an empty optional otherwise.

if (auto v = T.find(10)) {
  std::cout << "value = " << v.value() << "\n";
}
first Work Span
auto first()
  -> std::optional<std::pair<const key_type&, const value_type&>>
$O(\log n)$* $O(\log n)$*

Return an optional containing the key-value pair with the smallest key in the tree, or an empty optional if the tree is empty.

last Work Span
auto last()
  -> std::optional<std::pair<const key_type&, const value_type&>>
$O(\log n)$* $O(\log n)$*

Return an optional containing the key-value pair with the largest key in the tree, or an empty optional if the tree is empty.

next Work Span
auto next(const key_type& k)
  -> std::optional<std::pair<const key_type&, const value_type&>>
$O(\log n)$* $O(\log n)$*

Return an optional containing the key-value pair with the smallest strictly greater than k, or an empty optional if no such element exists (i.e., the tree is empty or k is the greatest key).

prev Work Span
auto prev(const key_type& k)
  -> std::optional<std::pair<const key_type&, const value_type&>>
$O(\log n)$* $O(\log n)$*

Return an optional containing the key-value pair with the greatest strictly less than k, or an empty optional if no such element exists (i.e., the tree is empty or k is the least key).

reduce_val Work Span
auto reduce_val() -> augmented_value_type
$O(1)$ $O(1)$

Returns the augmented value stored at the root of the tree. If the tree is empty, returns the identity value.

(Persistent) Updates

These operations treat the tree as a sorted dictionary. All operations return a new tree and do not modify the original.

insert Work Span
auto insert(
  key_type key,
  value_type value
) -> bst
$O(\log n)$* $O(\log n)$*

Returns a new tree with (key, value) inserted. If the key already exists, its value is replaced.

auto T = par::bst<int,int>::empty();
T = T.insert(5, 10);
T = T.insert(3, 7);
remove Work Span
auto remove(const key_type& key) const -> bst;
$O(\log n)$* $O(\log n)$*

Returns a new BST with key removed. If the key is not present, the tree is returned unchanged.

Note: Its called remove rather than delete since delete is a keyword in C++.

par::bst<int,int> T;
T = T.insert(1, 10);
T = T.insert(2, 20);

auto T2 = T.remove(1);

// T is unchanged
// T2 contains only key 2
get_range Work Span
auto get_range(
  const key_type& k1,
  const key_type& k2
) -> bst
$O(\log n)$* $O(\log n)$*

Returns a new tree containing all key-value pairs with keys in the range $(k_1, k_2)$, i.e., all keys $k$ such that $k_1 < k < k_2$.

auto R = T.get_range(10, 20);
filter Work Span
auto filter(auto&& predicate) -> bst
$O(n)$ $O(\log n)$*

Returns a tree containing only elements for which predicate(value) returns true.

auto positives = T.filter([](int v) {
  return v > 0;
});
filter_key Work Span
auto filter_key(auto&& predicate) -> bst
$O(n)$ $O(\log n)$*

Returns a tree containing only elements for which predicate(key, value) returns true.

auto even_keys = T.filter_key([](int k, int) {
  return k % 2 == 0;
});
map Work Span
auto map(auto&& f) -> bst
$O(n)$ $O(\log n)$*

Returns a tree with the same keys but with values transformed by f. The values returned by $$f$$ must be the same type as value_type, and the resulting tree will use the same combine function and identity value.

auto doubled = T.map([](int v) {
  return 2 * v;
});

Structural Operations

These operations split and combine trees based on keys.

split Work Span
auto split(const key_type& key)
  -> std::tuple<bst, std::optional<value_type>, bst>
$O(\log n)$* $O(\log n)$*

Splits the tree into three parts:

auto [L, v, R] = T.split(10);

if (v) {
  std::cout << "value = " << v.value() << "\n";
}
join Work Span
friend auto join(
  bst L,
  bst R
) -> bst
$O(\log (|L| + |R|))$* $O(\log (|L| + |R|))$*

Joins two trees L and R into a single tree.

The result copies the comparator, combine function, and identity value from L and R, which must have the same comparator, combine, and identity value or the result will be inconsistent.

Precondition: all keys in L must be strictly less than all keys in R.

auto T = join(L, R);
join_mid Work Span
friend auto join_mid(
  bst L,
  key_type key,
  value_type value,
  bst R
) -> bst
$O(\log (|L| + |R|))$* $O(\log (|L| + |R|))$*

Creates a new balanced tree from the contents of L, the key,value pair (key, value), and R. The resulting tree will be balanced and hence it is not guaranteed that (key, value) is the root node of the result.

The result copies the comparator, combine function, and identity value from L and R, which must have the same comparator, combine, and identity value or the result will be inconsistent.

Precondition: all keys in L must be strictly less than key, and all keys in R must be strictly greater than key.

auto T = join_mid(L, 10, 42, R);

Set-Theoretic Operations

The bst type supports the set theoretic operations of union, intersection, and difference. Since these functions are primarily intended to maintain sets or maps with unique keys, they do not implement tiebreaking for values. That is, if the union or intersection of two bsts contains a key that was present in both, it will arbitrarily have one of the two corresponding values.

In the cost bounds, $m$ is the size of the smaller of A and B, and $n$ is the size of the larger of A and B. This means that if A and B have the same size, the work is linear in their size.

set_union Work Span
friend auto set_union(
  const bst& A,
  const bst& B
) -> bst;
$O(m + m \log(\frac{n}{m}))$* $O(\log n)$*

Returns a new BST containing all keys that appear in either A or B, and their corresponding values. If a key appears in both trees, an arbitrary value from one of the two keys is selected for the result.

// Note: its not par::set_union
auto U = set_union(A, B);
set_intersection Work Span
friend auto set_intersection(
  const bst& A,
  const bst& B
) -> bst;
$O(m + m \log(\frac{n}{m}))$* $O(\log n)$*

Returns a new BST containing only the keys that appear in both A and B. For each key, an arbitrary value from one of the two corresponding keys is selected for the result.

// Note: its not par::set_intersection
auto I = set_intersection(A, B);
set_difference Work Span
friend auto set_difference(
  const bst& A,
  const bst& B
) -> bst;
$O(m + m \log(\frac{n}{m}))$* $O(\log n)$*

Returns a new BST containing all keys that appear in A but not in B, and their corresponding values.

// Note: its not par::set_difference
auto D = set_difference(A, B);

Exposing the Tree

Unlike typical ordered set or dictionary implementations (e.g., std::set), this BST provides operations that expose its recursive structure. These operations allow algorithms to directly access the root key, value, and subtrees.

Since bst is persistent, exposing a tree does not make a deep copy of the underlying data structure. All operations described in this section therefore run in constant time.

expose Work Span
struct exposed_tree {
  const key_type& key;
  const value_type& value;
  bst left;
  bst right;
};

friend auto expose(const bst& tree) -> std::optional<exposed_tree>
$O(1)$ $O(1)$

If tree is empty, returns an empty optional. Otherwise returns an optional containing an exposed_tree, which consists of:

The left and right subtrees share structure with the original tree. No deep copies are performed.

A common pattern when writing recursive tree algorithms is:

auto e = expose(T);
if (!e.has_value()) {
  return std::nullopt;
} else {
  auto& [k, v, L, R] = e.value();
  // process non-empty case
}

As a convenient helper function, expose_then essentially performs this pattern.

expose_then Work Span
friend auto expose_then(const bst& tree, auto&& f)
  -> std::optional<...>  // f must return a std::optional of some type 
$O(1)$ $O(1)$

Applies a user-provided function f, which takes a key, a value, a left tree, and a right tree, to the exposed tree if the tree is non-empty, and returns the result. If the tree is empty, returns an empty optional.

The function f must return a std::optional. expose_then runs in constant time only if f runs in constant time, otherwise it costs the same as invoking f.

Conceptually, expose_then behaves like:

if (auto e = expose(tree)) {
  return f(e->key, e->value, e->left, e->right);
} else {
  return std::nullopt;
}

Augmentation Examples

This section demonstrates common ways to use augmentation. In each case, we provide a combine function and identity value. The combine function must be associative, and the identity must be a proper identity for that operation.

Note that in C++, for functions and templates that have default parameters, like our BST, the user is always required to provide a prefix of them. We can not do Python-style named parameters unfortunately. That means if we want to specify a custom combine function, we must specify the comparison function, even if we just wanted the default one, since the argument order of the defaults is compare, combine, identity.

Subtree Sum

using SumTree = par::bst<int, int, std::less<>, std::plus<>>;

auto T = SumTree::empty()
                 .insert(1, 10)
                 .insert(2, 20)
                 .insert(3, 30);

int total = T.reduce_val();  // 60

Here the augmented value at each node stores the sum of its subtree. Calling reduce_val() returns the total sum.

In this code, we allowed all of the parameters of empty() to default since std::less<> and std::plus<> have no state, and 0 is the correct identity for addition.

Subtree Product

using ProductTree = par::bst<int, int, std::less<>, std::multiplies<>>;

auto T = ProductTree::empty({}, {}, 1)
                     .insert(1, 2)
                     .insert(2, 3)
                     .insert(3, 4);

int product = T.reduce_val();  // 24

The identity must be 1, since 1 is the identity for multiplication. Letting it default to 0 would be incorrect. Since the identity is the final argument, we are required to specify all three parameters. We use empty braces {} to make those arguments take their default values.

Subtree Maximum

using MaxTree = par::bst<int, int, std::less<> ,decltype(std::ranges::max)>;

auto T = MaxTree::empty({}, std::ranges::max, std::numeric_limits<int>::min())
                 .insert(1, -10)
                 .insert(2, 25)
                 .insert(3, 7);

int m = T.reduce_val();  // 25

Here decltype(std::ranges::max) means "the type of the std::ranges::max function object." Since we have negative numbers, the identity must be the smallest possible value, which we can get from std::numeric_limits. If all of the numbers were non-negative, we could use 0 as the identity instead.

Advanced Example: MCSS

This example demonstrates that the augmentation framework is powerful enough to implement nontrivial divide-and-conquer summaries such as the maximum contiguous subsequence sum (MCSS). This example is provided for interest only - you do not need to implement anything remotely this complex for the labs.

As we know from MCSSLab, we need to store four values as our augmented value:

We encode this as a struct that can be constructed from a single integer. This is how leaf/base-case values are converted into the correct type, since this is an example where value_type = int but augmented_value_type = mcss_node. Since the combine function operates on mcss_node, values of type int must be implicitly convertible to mcss_node.

struct mcss_node {
  int total;
  int prefix;
  int suffix;
  int best;

  // Base case
  mcss_node(int x)
    : total(x),
      prefix(std::max(x, 0)),
      suffix(std::max(x, 0)),
      best(std::max(x, 0)) { }

  mcss_node(int t, int p, int s, int b)
      : total(t),
        prefix(p),
        suffix(s),
        best(b) { }
};

Next, we define an associative combine operation that merges two segments. This should also remind you of MCSSLab.

auto mcss_combine = [](const mcss_node& L, const mcss_node& R) {
  return mcss_node{
      L.total + R.total,
      std::max(L.prefix, L.total + R.prefix),
      std::max(R.suffix, R.total + L.suffix),
      std::max({L.best, R.best, L.suffix + R.prefix})
  };
};

We can now instantiate an augmented BST where the value type is int, but the augmented value type is mcss_node. The identity element is the neutral segment corresponding to value 0.

using MCSSTree = par::bst<int, int, std::less<>, decltype(mcss_combine), mcss_node>;

auto T = MCSSTree::empty(std::less<>{}, mcss_combine, mcss_node{0})
                  .insert(0, 3)
                  .insert(1, -5)
                  .insert(2, 4)
                  .insert(3, 2)
                  .insert(4, -1);

int answer = T.reduce_val().best;  // 6

The call to reduce_val() returns the augmented value stored at the root. The best field contains the maximum contiguous subsequence sum of the values in key order.

The power of using an augmented BST rather than just a sequence with reduce is that we can now efficiently compute the MCSS of any range of values in logarithmic time, and even update the values and insert new ones in between such queries.

// Continuing the example above

// Split into trees L = {0:3, 1:-5, 2:4} and R = {4:-1}
auto [L, v, R] = T.split(3);
answer = L.reduce_val().best;  // 4

// Remove 1:-5, so L2 = {0:3, 2:4}
auto L2 = L.remove(1);
answer = L2.reduce_val().best;  // 7