The bst Data Structure

par::bst is a persistent balanced binary search tree supporting set and dictionary operations. All update operations return a new tree and do not modify the original.

The tree optionally supports augmentation via an associative combine function and identity value. These parameters allow for quering the reduced value of the tree in constant time.

par::bst is implemented as a randomized BST; all logarithmic cost bounds hold with high probability. In cost bounds, $n$ refers to the current size (number of elements) of the tree.

Template Parameters

The full type of the tree is:

par::bst<
   KeyType,
   ValueType,
   Compare =      /* default */,
   Combine =      /* default */,
   IdentityType = /* default */
 >

The template parameters are:

KeyType - the key type (must be comparable by Compare)
ValueType - the value type stored at each key
Compare - a comparison function object [Default: std::less<>]
Combine - an associative function used for subtree augmentation [Default: a function that returns no_augmented_val]
IdentityType - the identity type for Combine [Default: the return type of combine]

If Compare is omitted, the tree uses KeyType's operator< by default. If Combine and IdentityType are omitted, the tree behaves as a standard sorted dictionary without augmentation.

Type Alias Members

The bst class defines several public type aliases that refer to its template parameters and derived types.

key_type - the key type (alias of KeyType).
value_type - the value type (alias of ValueType).
augmented_value_type - the type produced by Combine, and returned by reduce_val(). [Default: no_augmented_val].
identity_type - the type of the identity value used for augmentation. [Default: same as augmented_value_type]

Advanced: Custom Types

In most use cases, augmented_value_type and identity_type are simply the same as value_type. However, it is also possible to use a different augmented type, provided that:

Combine is associative on augmented_value_type, and
The identity value, when converted to augmented_value_type, acts as a valid identity.

Since leaf nodes conceptually contribute their stored value to the augmentation via combine, the following expressions must be valid and produce augmented_value_type:

combine(identity_type, value_type) -> augmented_value_type
combine(augmented_value_type, identity_type) -> augmented_value_type

An example of this is provided later where we show how to implement dynamic MCSS. This is optional. You do not have to implement anything this complex for your labs.

Creating Trees

Although bst has a public constructor, it is preferred to create trees using the provided factory functions: empty, singleton, and from_sequence. These functions make the intended usage clearer and avoid mistakes when working with custom comparison or augmentation parameters.

Constructor	Work	Span
`bst( Compare less = {}, // default Combine combine = {}, // default identity_type identity = {} // default )`	$O(1)$	$O(1)$

Constructs an empty tree with the given comparison function, combine function, and identity value. All three parameters are optional and will take their default values if not specified.

par::bst<int,int> T;   // empty BST with key type int and value type int

`empty`	Work	Span
`static auto empty( Compare less = {}, // default Combine combine = {}, // default identity_type identity = {} // default ) -> bst`	$O(1)$	$O(1)$

Constructs an empty tree with the given comparison function, combine function, and identity value. All three parameters are optional and will take their default values if not specified.

auto T = par::bst<int,int>::empty();

`singleton`	Work	Span
`static auto singleton( key_type key, value_type value, Compare less = {}, // default Combine combine = {}, // default identity_type identity = {} // default ) -> bst`	$O(1)$	$O(1)$

Returns a tree containing a single key-value pair, using the given comparison function, combine function, and identity value. These three parameters are optional and will take their default values if not specified.

auto T = par::bst<int,int>::singleton(5, 42);

`from_sequence`	Work	Span
`static auto from_sequence( const sequence auto& S, Compare less = {}, // default Combine combine = {}, // default identity_type identity = {} // default ) -> bst`	$O(\|S\|)$	$O(\log \|S\|)$

Constructs a balanced tree from a sequence of key-value pairs, using the given comparison function, combine function, and identity value. These three parameters are optional and will take their default values if not specified. The input sequence must already be sorted by key according to less.

par::array_sequence a{
  std::pair{1,10},
  std::pair{2,20},
  std::pair{3,30}
};

auto T = par::bst<int,int>::from_sequence(a);

Queries

These operations do not modify the tree. Since the tree is persistent, all operations are logically read-only.

`size`	Work	Span
`auto size() -> std::size_t`	$O(1)$	$O(1)$

Returns the number of elements in the tree.

std::size_t n = T.size();

`find`	Work	Span
`auto find(const key_type& key) -> std::optional<value_type>`	$O(\log n)$^*	$O(\log n)$^*

Return an optional containing the value associated with key if it exists, or an empty optional otherwise.

if (auto v = T.find(10)) {
  std::cout << "value = " << v.value() << "\n";
}

`first`	Work	Span
`auto first() -> std::optional<std::pair<const key_type&, const value_type&>>`	$O(\log n)$^*	$O(\log n)$^*

Return an optional containing the key-value pair with the smallest key in the tree, or an empty optional if the tree is empty.

`last`	Work	Span
`auto last() -> std::optional<std::pair<const key_type&, const value_type&>>`	$O(\log n)$^*	$O(\log n)$^*

Return an optional containing the key-value pair with the largest key in the tree, or an empty optional if the tree is empty.

`next`	Work	Span
`auto next(const key_type& k) -> std::optional<std::pair<const key_type&, const value_type&>>`	$O(\log n)$^*	$O(\log n)$^*

Return an optional containing the key-value pair with the smallest strictly greater than k, or an empty optional if no such element exists (i.e., the tree is empty or k is the greatest key).

`prev`	Work	Span
`auto prev(const key_type& k) -> std::optional<std::pair<const key_type&, const value_type&>>`	$O(\log n)$^*	$O(\log n)$^*

Return an optional containing the key-value pair with the greatest strictly less than k, or an empty optional if no such element exists (i.e., the tree is empty or k is the least key).

`reduce_val`	Work	Span
`auto reduce_val() -> augmented_value_type`	$O(1)$	$O(1)$

Returns the augmented value stored at the root of the tree. If the tree is empty, returns the identity value.

(Persistent) Updates

These operations treat the tree as a sorted dictionary. All operations return a new tree and do not modify the original.

`insert`	Work	Span
`auto insert( key_type key, value_type value ) -> bst`	$O(\log n)$^*	$O(\log n)$^*

Returns a new tree with (key, value) inserted. If the key already exists, its value is replaced.

auto T = par::bst<int,int>::empty();
T = T.insert(5, 10);
T = T.insert(3, 7);

`remove`	Work	Span
`auto remove(const key_type& key) const -> bst;`	$O(\log n)$^*	$O(\log n)$^*

Returns a new BST with key removed. If the key is not present, the tree is returned unchanged.

Note: Its called remove rather than delete since delete is a keyword in C++.

par::bst<int,int> T;
T = T.insert(1, 10);
T = T.insert(2, 20);

auto T2 = T.remove(1);

// T is unchanged
// T2 contains only key 2

`get_range`	Work	Span
`auto get_range( const key_type& k1, const key_type& k2 ) -> bst`	$O(\log n)$^*	$O(\log n)$^*

Returns a new tree containing all key-value pairs with keys in the range $(k_1, k_2)$, i.e., all keys $k$ such that $k_1 < k < k_2$.

auto R = T.get_range(10, 20);

`filter`	Work	Span
`auto filter(auto&& predicate) -> bst`	$O(n)$	$O(\log n)$^*

Returns a tree containing only elements for which predicate(value) returns true.

auto positives = T.filter([](int v) {
  return v > 0;
});

`filter_key`	Work	Span
`auto filter_key(auto&& predicate) -> bst`	$O(n)$	$O(\log n)$^*

Returns a tree containing only elements for which predicate(key, value) returns true.

auto even_keys = T.filter_key([](int k, int) {
  return k % 2 == 0;
});

`map`	Work	Span
`auto map(auto&& f) -> bst`	$O(n)$	$O(\log n)$^*

Returns a tree with the same keys but with values transformed by f. The values returned by $$f$$ must be the same type as value_type, and the resulting tree will use the same combine function and identity value.

auto doubled = T.map([](int v) {
  return 2 * v;
});

Structural Operations

These operations split and combine trees based on keys.

`split`	Work	Span
`auto split(const key_type& key) -> std::tuple<bst, std::optional<value_type>, bst>`	$O(\log n)$^*	$O(\log n)$^*

Splits the tree into three parts:

L: all keys strictly less than key
v: the value associated with key, if present
R: all keys strictly greater than key

auto [L, v, R] = T.split(10);

if (v) {
  std::cout << "value = " << v.value() << "\n";
}

`join`	Work	Span
`friend auto join( bst L, bst R ) -> bst`	$O(\log (\|L\| + \|R\|))$^*	$O(\log (\|L\| + \|R\|))$^*

Joins two trees L and R into a single tree.

The result copies the comparator, combine function, and identity value from L and R, which must have the same comparator, combine, and identity value or the result will be inconsistent.

Precondition: all keys in L must be strictly less than all keys in R.

auto T = join(L, R);

`join_mid`	Work	Span
`friend auto join_mid( bst L, key_type key, value_type value, bst R ) -> bst`	$O(\log (\|L\| + \|R\|))$^*	$O(\log (\|L\| + \|R\|))$^*

Creates a new balanced tree from the contents of L, the key,value pair (key, value), and R. The resulting tree will be balanced and hence it is not guaranteed that (key, value) is the root node of the result.

The result copies the comparator, combine function, and identity value from L and R, which must have the same comparator, combine, and identity value or the result will be inconsistent.

Precondition: all keys in L must be strictly less than key, and all keys in R must be strictly greater than key.

auto T = join_mid(L, 10, 42, R);

Set-Theoretic Operations

The bst type supports the set theoretic operations of union, intersection, and difference. Since these functions are primarily intended to maintain sets or maps with unique keys, they do not implement tiebreaking for values. That is, if the union or intersection of two bsts contains a key that was present in both, it will arbitrarily have one of the two corresponding values.

In the cost bounds, $m$ is the size of the smaller of A and B, and $n$ is the size of the larger of A and B. This means that if A and B have the same size, the work is linear in their size.

`set_union`	Work	Span
`friend auto set_union( const bst& A, const bst& B ) -> bst;`	$O(m + m \log(\frac{n}{m}))$^*	$O(\log n)$^*

Returns a new BST containing all keys that appear in either A or B, and their corresponding values. If a key appears in both trees, an arbitrary value from one of the two keys is selected for the result.

// Note: its not par::set_union
auto U = set_union(A, B);

`set_intersection`	Work	Span
`friend auto set_intersection( const bst& A, const bst& B ) -> bst;`	$O(m + m \log(\frac{n}{m}))$^*	$O(\log n)$^*

Returns a new BST containing only the keys that appear in both A and B. For each key, an arbitrary value from one of the two corresponding keys is selected for the result.

// Note: its not par::set_intersection
auto I = set_intersection(A, B);

`set_difference`	Work	Span
`friend auto set_difference( const bst& A, const bst& B ) -> bst;`	$O(m + m \log(\frac{n}{m}))$^*	$O(\log n)$^*

Returns a new BST containing all keys that appear in A but not in B, and their corresponding values.

// Note: its not par::set_difference
auto D = set_difference(A, B);

Exposing the Tree

Unlike typical ordered set or dictionary implementations (e.g., std::set), this BST provides operations that expose its recursive structure. These operations allow algorithms to directly access the root key, value, and subtrees.

Since bst is persistent, exposing a tree does not make a deep copy of the underlying data structure. All operations described in this section therefore run in constant time.

`expose`	Work	Span
`struct exposed_tree { const key_type& key; const value_type& value; bst left; bst right; }; friend auto expose(const bst& tree) -> std::optional<exposed_tree>`	$O(1)$	$O(1)$

If tree is empty, returns an empty optional. Otherwise returns an optional containing an exposed_tree, which consists of:

The root key
The root value
The left subtree
The right subtree

The left and right subtrees share structure with the original tree. No deep copies are performed.

A common pattern when writing recursive tree algorithms is:

Call expose(tree)
Check whether the result is empty
Handle the non-empty case using the exposed fields

auto e = expose(T);
if (!e.has_value()) {
  return std::nullopt;
} else {
  auto& [k, v, L, R] = e.value();
  // process non-empty case
}

As a convenient helper function, expose_then essentially performs this pattern.

`expose_then`	Work	Span
`friend auto expose_then(const bst& tree, auto&& f) -> std::optional<...> // f must return a std::optional of some type`	$O(1)$	$O(1)$

Applies a user-provided function f, which takes a key, a value, a left tree, and a right tree, to the exposed tree if the tree is non-empty, and returns the result. If the tree is empty, returns an empty optional.

The function f must return a std::optional. expose_then runs in constant time only if f runs in constant time, otherwise it costs the same as invoking f.

Conceptually, expose_then behaves like:

if (auto e = expose(tree)) {
  return f(e->key, e->value, e->left, e->right);
} else {
  return std::nullopt;
}

Augmentation Examples

This section demonstrates common ways to use augmentation. In each case, we provide a combine function and identity value. The combine function must be associative, and the identity must be a proper identity for that operation.

Note that in C++, for functions and templates that have default parameters, like our BST, the user is always required to provide a prefix of them. We can not do Python-style named parameters unfortunately. That means if we want to specify a custom combine function, we must specify the comparison function, even if we just wanted the default one, since the argument order of the defaults is compare, combine, identity.

Subtree Sum

using SumTree = par::bst<int, int, std::less<>, std::plus<>>;

auto T = SumTree::empty()
                 .insert(1, 10)
                 .insert(2, 20)
                 .insert(3, 30);

int total = T.reduce_val();  // 60

Here the augmented value at each node stores the sum of its subtree. Calling reduce_val() returns the total sum.

In this code, we allowed all of the parameters of empty() to default since std::less<> and std::plus<> have no state, and 0 is the correct identity for addition.

Subtree Product

using ProductTree = par::bst<int, int, std::less<>, std::multiplies<>>;

auto T = ProductTree::empty({}, {}, 1)
                     .insert(1, 2)
                     .insert(2, 3)
                     .insert(3, 4);

int product = T.reduce_val();  // 24

The identity must be 1, since 1 is the identity for multiplication. Letting it default to 0 would be incorrect. Since the identity is the final argument, we are required to specify all three parameters. We use empty braces {} to make those arguments take their default values.

Subtree Maximum

using MaxTree = par::bst<int, int, std::less<> ,decltype(std::ranges::max)>;

auto T = MaxTree::empty({}, std::ranges::max, std::numeric_limits<int>::min())
                 .insert(1, -10)
                 .insert(2, 25)
                 .insert(3, 7);

int m = T.reduce_val();  // 25

Here decltype(std::ranges::max) means "the type of the std::ranges::max function object." Since we have negative numbers, the identity must be the smallest possible value, which we can get from std::numeric_limits. If all of the numbers were non-negative, we could use 0 as the identity instead.

Advanced Example: MCSS

This example demonstrates that the augmentation framework is powerful enough to implement nontrivial divide-and-conquer summaries such as the maximum contiguous subsequence sum (MCSS). This example is provided for interest only - you do not need to implement anything remotely this complex for the labs.

As we know from MCSSLab, we need to store four values as our augmented value:

total - sum of all values in the subtree
prefix - maximum prefix sum
suffix - maximum suffix sum
best - maximum contiguous subsequence sum

We encode this as a struct that can be constructed from a single integer. This is how leaf/base-case values are converted into the correct type, since this is an example where value_type = int but augmented_value_type = mcss_node. Since the combine function operates on mcss_node, values of type int must be implicitly convertible to mcss_node.

struct mcss_node {
  int total;
  int prefix;
  int suffix;
  int best;

  // Base case
  mcss_node(int x)
    : total(x),
      prefix(std::max(x, 0)),
      suffix(std::max(x, 0)),
      best(std::max(x, 0)) { }

  mcss_node(int t, int p, int s, int b)
      : total(t),
        prefix(p),
        suffix(s),
        best(b) { }
};

Next, we define an associative combine operation that merges two segments. This should also remind you of MCSSLab.

auto mcss_combine = [](const mcss_node& L, const mcss_node& R) {
  return mcss_node{
      L.total + R.total,
      std::max(L.prefix, L.total + R.prefix),
      std::max(R.suffix, R.total + L.suffix),
      std::max({L.best, R.best, L.suffix + R.prefix})
  };
};

We can now instantiate an augmented BST where the value type is int, but the augmented value type is mcss_node. The identity element is the neutral segment corresponding to value 0.

using MCSSTree = par::bst<int, int, std::less<>, decltype(mcss_combine), mcss_node>;

auto T = MCSSTree::empty(std::less<>{}, mcss_combine, mcss_node{0})
                  .insert(0, 3)
                  .insert(1, -5)
                  .insert(2, 4)
                  .insert(3, 2)
                  .insert(4, -1);

int answer = T.reduce_val().best;  // 6

The call to reduce_val() returns the augmented value stored at the root. The best field contains the maximum contiguous subsequence sum of the values in key order.

The power of using an augmented BST rather than just a sequence with reduce is that we can now efficiently compute the MCSS of any range of values in logarithmic time, and even update the values and insert new ones in between such queries.

// Continuing the example above

// Split into trees L = {0:3, 1:-5, 2:4} and R = {4:-1}
auto [L, v, R] = T.split(3);
answer = L.reduce_val().best;  // 4

// Remove 1:-5, so L2 = {0:3, 2:4}
auto L2 = L.remove(1);
answer = L2.reduce_val().best;  // 7