Introduction to Modern C++ for 15-210

This section gives a brief overview of the C++ features you will need for this course. It is not a complete introduction to C++, and it intentionally omits many advanced language features. Our goal is to give you the right mental model for reading and writing the code used in the labs and in the provided library.

C++ is often regarded as a scary language, because it contains many advanced features. While this is true, you do not need to know most of those features to write most code! In particular, in 15-210, we will not require any of the following C++ features:

You are assumed to already be comfortable with basic C-style syntax (variables, conditionals, loops, functions) from 15-122. This guide is meant as a quick start on extending your knowledge of C from 15-122 with the minimum amount of C++ needed to succeed in 15-210.

Warning: Note that you should actively avoid using things on the list above, even if you've seen them before in C. For example, you should not need to use pointers or manual memory management. Doing so, even if your code ends up working, will make your code much more error prone and hard to debug. There will always be a simpler and cleaner option in C++! If you learned these concepts before, you should temporarily pretend you didn't and the labs will be easier. 15-210 does enforce style grading for labs, so you should always strive to implement your solutions using the appropriate tools and abstractions, for your own sake and well as your grade.

Value Semantics

The most important idea to understand first about C++, because it differs from most other languages you may have learned (Java, Python, JavaScript), is that it uses value semantics. Variables store values, and assigning a variable or passing it to a function makes a copy by default.

int x = 5;
int y = x;
y = 7;
// x is still 5

This also applies to containers, which is even more important, since the copy-behavior of most containers is a deep copy.

std::vector<int> a{1, 2, 3};
std::vector<int> b = a;
b[0] = 10;  // a is still {1, 2, 3}, while b is now {10, 2, 3}

This behavior is very different from languages like Python or Java, where assignment usually creates another reference to the same object. It is also different from SML, where values are immutable. In C++, values are mutable, but copying them produces an independent copy. For objects like containers, this means that the default behavior is expensive! Copying an entire container takes linear work since every element must be copied, while producing a reference is essentially free.

References and const

A reference is an alias for an existing object. Reference types are written by adding & after the name of the type, so int& (usually pronounced "int ref" when spoken) is the type of a reference to an integer.

int x = 5;
int& r = x;  // r is a reference to the same object as x
r = 7;       // x is now 7
Warning: Although both use the & symbol, this is not in any way shape or form related to C's address-of operator, which is applied to an object to obtain its address. When written to the right of a typename, & denotes a reference type.

References have two main purposes:

In this class, we are primarily interested in the second reason. The first reason is useful, but it results in code that is not functional-style since it explicitly permits mutation, which we want to minimize. Therefore, we very often use constant references:

std::vector<int> a{1, 2, 3};
const std::vector<int>& b = a;
b[0] = 10;  // compilation error! b is declared const so it can not be mutated

In this example, the reference is const but the underlying vector is still mutable. The reference can therefore see mutation:

std::vector<int> a{1, 2, 3};
const std::vector<int>& b = a;
a[0] = 10;  // valid, and now b[0] == 10 since its a reference

For additional safety, const can be applied to object types, not just reference types. This makes the object logically immutable.

const std::vector<int> a{1, 2, 3};
const std::vector<int>& b = a;
a[0] = 10;  // compile error!
b[0] = 10;  // compile error!

Parameter Passing

Function parameters also follow the same value category rules as variables. They can be value types, reference types, and can be const or not. Accepting a parameter by value results in a copy. This is okay for cheap types (like integers) but is not ideal for large types where deep copying them could be expensive, like most container types (vectors, sequences, etc).

void f(std::vector<int> v);  // Copies v: takes linear work, bad

Function parameters can also be passed as mutable references. This avoids copies but allows the function to mutate the argument which makes it not functional style and harder to reason about. You also can not pass temporaries/literals as mutable references.

void f(std::vector<int>& v);  // No copy, but mutable, dangerous

For the best of both worlds, a good default way to take parameters is as const references, since this both prevents expensive copies and prevents mutation.

void f(const std::vector<int>& v);  // No copy, no mutation, nice

A function parameter of type const T& means "share this object read-only." In other words, it is close to the semantics of SML and other functional languages, and therefore very useful for functional-style code! Most sequence parameters in this course are passed as constant references inside generic code to avoid accidental mutation or accidental expensive copies.

Optional: There is another value category called an "rvalue reference", which would look like std::vector<int>&&, but this is more advanced than what we need for this class.

auto and Generic Functions

The auto keyword asks the compiler to deduce a type automatically. You may see it used for variable declarations:

auto x = 3;      // int
auto y = 2.5;    // double

This is not however the most important use of auto. In this course, auto is especially important for writing generic functions.

auto f(auto x) {
  // works for any parameter type
}

This says that the function f can take an argument of any type. The type of the parameter s is deduced from the type of the argument. Importantly, generic function parameters follow the same value category rules as ordinary function parameters (with one exception that we will mention below). They can be accepted by value, mutable reference, or const reference.

auto f(auto x);          // Makes a copy of the argument
auto f(auto& x);         // (Potentially) mutable reference to the argument
auto f(const auto& x);   // Immutable reference to the argument

You will see auto frequently in function parameters and return types throughout the library. As a rule of thumb: if a generic function should not modify nor copy its argument, it should take it as a const auto& parameter. Just like non-generic functions, this should be our default way of passing parameters.

Optional: Why did I say that f(auto& x) binds its parameter as a potentially mutable reference? Because that pattern can in fact bind to a const reference as well! If you call f(auto& x) using an argument of type const int, then auto will deduce the underlying type as const int, which materializes the generic parameter into const int&. In other words, auto is a placeholder for a type, and const is not just an annotation, its part of the type.

Even-more-generic parameters!

There is one final way of taking generic parameters, which you will not have to write but you may see frequently in the library signatures: auto&&. The full description and rules of what auto&& does are too complicated, but the summary is that auto&& binds to the best-matching value category given its argument. You should just think of auto&& as binds to literally anything.

Optional: If you want a little bit of an idea of why auto&& exists, consider a generic function f that we want to accept any argument. What value category should we use if we don't know anything about the type(s) in advance? So, to write fully generic code, none of these work. Having f(auto&& x) solves this problem! It binds to references or non-referencable objects (e.g., temporaries/literals) and it binds to const or non-const. It is the most generic parameter possible!

Lambda Functions

Lambda functions are a convenient syntax for defining function objects in C++. In other words, they are the embodiment of 15-150's motto that "functions are values!". They are heavily used in this course to express functional-style algorithms.

Syntactically, lambdas are defined by writing square brackets [] followed by what looks like a usual function definition (a parameter list in brackets followed by a body).

auto square = [](int x) {
  return x * x;
};

Lambdas are often passed directly as arguments to algorithms:

auto squares = par::map(a, [](int x) {
  return x * x;
});

Lambda Captures

By default, lambdas are not allowed to access local variables from their enclosing scope. To allow this, we must explicitly capture variables by listing them inside the square brackets [].

int k = 3;
auto add_k = [k](int x) {  // captures a copy of k
  return x + k;
};

Lambda captures follow the same value/reference rules as ordinary variables. By default, captures are copies. To capture a reference, add &.

int k = 3;
auto add_k = [&k](int x) {  // captures k by reference
  return x + k;
};

When capturing by reference, changes to the variable are visible both inside and outside the lambda.

int k = 3;
auto f = [&k](int x) { return x + k; };

k = 10;
int y = f(5);  // y == 15

You can also use a default capture. Writing [=] captures everything by value, and writing [&] captures everything by reference.

In this course, we will usually use [&] when writing temporary lambdas that are immediately passed to higher-order functions (such as tabuluate, map, or reduce). This avoids unnecessary copies and is safe in these cases.

Important: Capturing by reference gives the same advantages and disadvantages of reference variables/parameters. They avoid unnecessary copies, but also open us up to the possibility of (possibly accidental) mutation. You must also be careful to not capture a reference to an object that might be destroyed before the lambda is called, because then the lambda will try to access an object that no longer exists! If this might be the case, you must capture by value instead.

Standard Library Vocabulary Types

The C++ standard library includes several lightweight "vocabulary types" that are used throughout the code in this course.

std::pair

A pair stores a pair of values, which may be of different types.

std::pair<int, int> p{3, 4};

Pairs are often used to return two values from a function. The elements of a pair can be read either from their member variables p.first and p.second or by using a structured binding:

auto [x, y] = p;  // x = 3 and y = 4

Structured bindings follow similar semantics to variable declarations: they can be declared as non-references (which will cause the right-hand side to be copy initialized), or as references or const references to the elements of the right-hand side.

std::pair<int, int> p{3, 4};

auto [x, y] = p;         // copies
auto& [x, y] = p;        // mutable reference
const auto& [x, y] = p;  // immutable reference

std::tuple

Tuples generalize pairs to any number of elements and are commonly used with functions like zip.

std::tuple<int, double, char> t{1, 2.5, 'a'};
auto [x, y, z] = t;  // x = 1,  y = 2.5,  z = 'a'

std::optional

An std::optional<T> represents either "a value of type T" or "no value." This is very useful as a return type when a function may or may not return a meaningful result.

std::optional<int> opt;
if (opt.has_value()) {
  // value is present, use opt.value() to access it
  some_function(opt.value());
}

We can write opt.has_value() in a conditional statement to test whether they contain a value or not. To access the stored value, you can write opt.value() or the more abbreviated syntax *opt (I prefer the former because the latter makes it look like a pointer, but it's not!)

With these concepts in mind, you should be well equipped to read and write the C++ code used throughout this course.