par::bst is a persistent balanced binary search tree supporting
set and dictionary operations. All update operations return a new tree and
do not modify the original.
The tree optionally supports augmentation via an associative combine function and identity value. These parameters allow for quering the reduced value of the tree in constant time.
par::bst is implemented as a randomized BST; all logarithmic
cost bounds hold with high probability. In cost bounds, $n$ refers
to the current size (number of elements) of the tree.
The full type of the tree is:
par::bst<
KeyType,
ValueType,
Compare = /* default */,
Combine = /* default */,
IdentityType = /* default */
>
The template parameters are:
KeyType - the key type (must be comparable by Compare)ValueType - the value type stored at each keyCompare - a comparison function object [Default: std::less<>]Combine - an associative function used for subtree augmentation [Default: a function that returns no_augmented_val]IdentityType - the identity type for Combine [Default: the return type of combine]
If Compare is omitted, the tree uses KeyType's operator< by default.
If Combine and IdentityType are omitted, the tree behaves
as a standard sorted dictionary without augmentation.
The bst class defines several public type aliases that refer to
its template parameters and derived types.
key_type - the key type (alias of KeyType).
value_type - the value type (alias of ValueType).
augmented_value_type - the type produced by Combine, and returned by reduce_val(). [Default: no_augmented_val].
identity_type - the type of the identity value used for augmentation. [Default: same as augmented_value_type]
In most use cases, augmented_value_type and
identity_type are simply the same as
value_type. However, it is also possible to use a
different augmented type, provided that:
Combine is associative on
augmented_value_type, and
augmented_value_type, acts as a valid identity.
Since leaf nodes conceptually contribute their stored value to the augmentation via combine, the following expressions must be valid and produce augmented_value_type:
combine(identity_type, value_type) -> augmented_value_type
combine(augmented_value_type, identity_type) -> augmented_value_type
An example of this is provided later where we show how to implement dynamic MCSS. This is optional. You do not have to implement anything this complex for your labs.
Although bst has a public constructor, it is preferred to
create trees using the provided factory functions:
empty, singleton, and from_sequence.
These functions make the intended usage clearer and avoid mistakes when
working with custom comparison or augmentation parameters.
| Constructor | Work | Span |
|---|---|---|
|
$O(1)$ | $O(1)$ |
Constructs an empty tree with the given comparison function, combine function, and identity value. All three parameters are optional and will take their default values if not specified.
par::bst<int,int> T; // empty BST with key type int and value type int
empty |
Work | Span |
|---|---|---|
|
$O(1)$ | $O(1)$ |
Constructs an empty tree with the given comparison function, combine function, and identity value. All three parameters are optional and will take their default values if not specified.
auto T = par::bst<int,int>::empty();
singleton |
Work | Span |
|---|---|---|
|
$O(1)$ | $O(1)$ |
Returns a tree containing a single key-value pair, using the given comparison function, combine function, and identity value. These three parameters are optional and will take their default values if not specified.
auto T = par::bst<int,int>::singleton(5, 42);
from_sequence |
Work | Span |
|---|---|---|
|
$O(|S|)$ | $O(\log |S|)$ |
Constructs a balanced tree from a sequence of key-value pairs, using the given comparison function,
combine function, and identity value. These three parameters are optional
and will take their default values if not specified.
The input sequence must already be sorted by key according to less.
par::array_sequence a{
std::pair{1,10},
std::pair{2,20},
std::pair{3,30}
};
auto T = par::bst<int,int>::from_sequence(a);
These operations do not modify the tree. Since the tree is persistent, all operations are logically read-only.
size |
Work | Span |
|---|---|---|
|
$O(1)$ | $O(1)$ |
Returns the number of elements in the tree.
std::size_t n = T.size();
find |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Return an optional containing the value associated with key if it exists,
or an empty optional otherwise.
if (auto v = T.find(10)) {
std::cout << "value = " << v.value() << "\n";
}
first |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Return an optional containing the key-value pair with the smallest key in the tree, or an empty optional if the tree is empty.
last |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Return an optional containing the key-value pair with the largest key in the tree, or an empty optional if the tree is empty.
next |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Return an optional containing the key-value pair with the smallest strictly greater than k,
or an empty optional if no such element exists (i.e., the tree is empty or k is the greatest key).
prev |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Return an optional containing the key-value pair with the greatest strictly less than k,
or an empty optional if no such element exists (i.e., the tree is empty or k is the least key).
reduce_val |
Work | Span |
|---|---|---|
|
$O(1)$ | $O(1)$ |
Returns the augmented value stored at the root of the tree. If the tree is empty, returns the identity value.
These operations treat the tree as a sorted dictionary. All operations return a new tree and do not modify the original.
insert |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Returns a new tree with (key, value) inserted.
If the key already exists, its value is replaced.
auto T = par::bst<int,int>::empty();
T = T.insert(5, 10);
T = T.insert(3, 7);
remove |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Returns a new BST with key removed. If the key is not present,
the tree is returned unchanged.
Note: Its called remove rather than delete since delete is
a keyword in C++.
par::bst<int,int> T;
T = T.insert(1, 10);
T = T.insert(2, 20);
auto T2 = T.remove(1);
// T is unchanged
// T2 contains only key 2
get_range |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Returns a new tree containing all key-value pairs with keys in the range $(k_1, k_2)$, i.e., all keys $k$ such that $k_1 < k < k_2$.
auto R = T.get_range(10, 20);
filter |
Work | Span |
|---|---|---|
|
$O(n)$ | $O(\log n)$* |
Returns a tree containing only elements for which
predicate(value) returns true.
auto positives = T.filter([](int v) {
return v > 0;
});
filter_key |
Work | Span |
|---|---|---|
|
$O(n)$ | $O(\log n)$* |
Returns a tree containing only elements for which
predicate(key, value) returns true.
auto even_keys = T.filter_key([](int k, int) {
return k % 2 == 0;
});
map |
Work | Span |
|---|---|---|
|
$O(n)$ | $O(\log n)$* |
Returns a tree with the same keys but with values transformed by
f. The values returned by $$f$$ must be the same type as value_type,
and the resulting tree will use the same combine function and identity value.
auto doubled = T.map([](int v) {
return 2 * v;
});
These operations split and combine trees based on keys.
split |
Work | Span |
|---|---|---|
|
$O(\log n)$* | $O(\log n)$* |
Splits the tree into three parts:
L: all keys strictly less than keyv: the value associated with key, if presentR: all keys strictly greater than keyauto [L, v, R] = T.split(10);
if (v) {
std::cout << "value = " << v.value() << "\n";
}
join |
Work | Span |
|---|---|---|
|
$O(\log (|L| + |R|))$* | $O(\log (|L| + |R|))$* |
Joins two trees L and R into a single tree.
The result copies the comparator, combine function, and identity value from
L and R, which must have the same comparator, combine, and identity
value or the result will be inconsistent.
Precondition: all keys in L must be strictly less than
all keys in R.
auto T = join(L, R);
join_mid |
Work | Span |
|---|---|---|
|
$O(\log (|L| + |R|))$* | $O(\log (|L| + |R|))$* |
Creates a new balanced tree from the contents of L, the key,value pair (key, value),
and R. The resulting tree will be balanced and hence it is not guaranteed that (key, value)
is the root node of the result.
The result copies the comparator, combine function, and identity value from
L and R, which must have the same comparator, combine, and identity
value or the result will be inconsistent.
Precondition: all keys in L must be strictly less than
key, and all keys in R must be strictly greater
than key.
auto T = join_mid(L, 10, 42, R);
The bst type supports the set theoretic operations of union,
intersection, and difference. Since these functions are primarily intended to
maintain sets or maps with unique keys, they do not implement tiebreaking for values.
That is, if the union or intersection of two bsts contains a key that was
present in both, it will arbitrarily have one of the two corresponding values.
In the cost bounds, $m$ is the size of the smaller of A and B, and $n$ is the size of the larger
of A and B. This means that if A and B have the same size, the work is linear in their size.
set_union |
Work | Span |
|---|---|---|
|
$O(m + m \log(\frac{n}{m}))$* | $O(\log n)$* |
Returns a new BST containing all keys that appear in either A or B,
and their corresponding values. If a key appears in both trees, an arbitrary value from one of the two keys
is selected for the result.
// Note: its not par::set_union
auto U = set_union(A, B);
set_intersection |
Work | Span |
|---|---|---|
|
$O(m + m \log(\frac{n}{m}))$* | $O(\log n)$* |
Returns a new BST containing only the keys that appear in both A and B.
For each key, an arbitrary value from one of the two corresponding keys is selected for the result.
// Note: its not par::set_intersection
auto I = set_intersection(A, B);
set_difference |
Work | Span |
|---|---|---|
|
$O(m + m \log(\frac{n}{m}))$* | $O(\log n)$* |
Returns a new BST containing all keys that appear in A
but not in B, and their corresponding values.
// Note: its not par::set_difference
auto D = set_difference(A, B);
Unlike typical ordered set or dictionary implementations
(e.g., std::set), this BST provides operations that expose
its recursive structure. These operations allow algorithms to directly
access the root key, value, and subtrees.
Since bst is persistent, exposing a tree does not make a deep
copy of the underlying data structure. All operations described in this section
therefore run in constant time.
expose |
Work | Span |
|---|---|---|
|
$O(1)$ | $O(1)$ |
If tree is empty, returns an empty optional.
Otherwise returns an optional containing an exposed_tree,
which consists of:
The left and right subtrees share structure with the original tree. No deep copies are performed.
A common pattern when writing recursive tree algorithms is:
expose(tree)auto e = expose(T);
if (!e.has_value()) {
return std::nullopt;
} else {
auto& [k, v, L, R] = e.value();
// process non-empty case
}
As a convenient helper function, expose_then essentially performs this pattern.
expose_then |
Work | Span |
|---|---|---|
|
$O(1)$ | $O(1)$ |
Applies a user-provided function f, which takes a key, a value, a left tree, and
a right tree, to the exposed tree if the tree is non-empty, and returns the result.
If the tree is empty, returns an empty optional.
The function f must return a std::optional. expose_then
runs in constant time only if f runs in constant time, otherwise it costs the
same as invoking f.
Conceptually, expose_then behaves like:
if (auto e = expose(tree)) {
return f(e->key, e->value, e->left, e->right);
} else {
return std::nullopt;
}
This section demonstrates common ways to use augmentation. In each case, we provide a combine function and identity value. The combine function must be associative, and the identity must be a proper identity for that operation.
Note that in C++, for functions and templates that have default
parameters, like our BST, the user is always required to provide
a prefix of them. We can not do Python-style named parameters unfortunately.
That means if we want to specify a custom
combine function, we must specify the comparison function, even if
we just wanted the default one, since the argument order of the defaults
is compare, combine, identity.
using SumTree = par::bst<int, int, std::less<>, std::plus<>>;
auto T = SumTree::empty()
.insert(1, 10)
.insert(2, 20)
.insert(3, 30);
int total = T.reduce_val(); // 60
Here the augmented value at each node stores the sum of its subtree.
Calling reduce_val() returns the total sum.
In this code, we allowed all of the parameters of empty() to default since std::less<>
and std::plus<> have no state, and 0 is the correct identity for addition.
using ProductTree = par::bst<int, int, std::less<>, std::multiplies<>>;
auto T = ProductTree::empty({}, {}, 1)
.insert(1, 2)
.insert(2, 3)
.insert(3, 4);
int product = T.reduce_val(); // 24
The identity must be 1, since 1 is the identity for multiplication. Letting it default to 0
would be incorrect. Since the identity is the final argument, we are required to specify all three parameters. We use empty
braces {} to make those arguments take their default values.
using MaxTree = par::bst<int, int, std::less<> ,decltype(std::ranges::max)>;
auto T = MaxTree::empty({}, std::ranges::max, std::numeric_limits<int>::min())
.insert(1, -10)
.insert(2, 25)
.insert(3, 7);
int m = T.reduce_val(); // 25
Here decltype(std::ranges::max) means
"the type of the std::ranges::max function object."
Since we have negative numbers, the identity must be the smallest possible value,
which we can get from std::numeric_limits. If all of the numbers were non-negative,
we could use 0 as the identity instead.
This example demonstrates that the augmentation framework is powerful enough to implement nontrivial divide-and-conquer summaries such as the maximum contiguous subsequence sum (MCSS). This example is provided for interest only - you do not need to implement anything remotely this complex for the labs.
As we know from MCSSLab, we need to store four values as our augmented value:
total - sum of all values in the subtreeprefix - maximum prefix sumsuffix - maximum suffix sumbest - maximum contiguous subsequence sum
We encode this as a struct that can be constructed from a single integer.
This is how leaf/base-case values are converted into the correct type, since
this is an example where value_type = int but augmented_value_type = mcss_node.
Since the combine function operates on mcss_node, values of type int must be implicitly convertible to mcss_node.
struct mcss_node {
int total;
int prefix;
int suffix;
int best;
// Base case
mcss_node(int x)
: total(x),
prefix(std::max(x, 0)),
suffix(std::max(x, 0)),
best(std::max(x, 0)) { }
mcss_node(int t, int p, int s, int b)
: total(t),
prefix(p),
suffix(s),
best(b) { }
};
Next, we define an associative combine operation that merges two segments. This should also remind you of MCSSLab.
auto mcss_combine = [](const mcss_node& L, const mcss_node& R) {
return mcss_node{
L.total + R.total,
std::max(L.prefix, L.total + R.prefix),
std::max(R.suffix, R.total + L.suffix),
std::max({L.best, R.best, L.suffix + R.prefix})
};
};
We can now instantiate an augmented BST where the value type is
int, but the augmented value type is mcss_node.
The identity element is the neutral segment corresponding to value 0.
using MCSSTree = par::bst<int, int, std::less<>, decltype(mcss_combine), mcss_node>;
auto T = MCSSTree::empty(std::less<>{}, mcss_combine, mcss_node{0})
.insert(0, 3)
.insert(1, -5)
.insert(2, 4)
.insert(3, 2)
.insert(4, -1);
int answer = T.reduce_val().best; // 6
The call to reduce_val() returns the augmented value stored at the root.
The best field contains the maximum contiguous subsequence sum
of the values in key order.
The power of using an augmented BST rather than just a sequence with reduce is that we can now efficiently compute the MCSS of any range of values in logarithmic time, and even update the values and insert new ones in between such queries.
// Continuing the example above
// Split into trees L = {0:3, 1:-5, 2:4} and R = {4:-1}
auto [L, v, R] = T.split(3);
answer = L.reduce_val().best; // 4
// Remove 1:-5, so L2 = {0:3, 2:4}
auto L2 = L.remove(1);
answer = L2.reduce_val().best; // 7