Stanford CS106B: Programming Abstractions

Created by: Roger Grosse
Intended for: CS106B students, those new to programming

CS106B, "Programming Abstractions," is the second term of Stanford's introductory programming sequence, taught in C++. It follows upon CS106A, "Programming Methodology," which covers the basics of programming itself (functions, control structures, classes, etc.). 106B is about basic data structures and algorithms, and how they can be encapsulated as abstract data types.

This guide is based on the class as it was taught in 2004, the year I took it. Things may have changed since then.

You can find lecture videos for the 2008 version of the course here.

Week 1: C++ language

The first week or so of the course covers aspects of the C++ language which are needed for the course -- mostly by highlighting the differences from Java, the language taught in 106A. The main new concept is pointers, but the language also has different syntax for things like classes and input/output.

If you haven't programmed before, check out our beginning programming roadmap, which covers comparable material to CS106A.

Weeks 2-3: recursion

The next two weeks focus on recursion, a programming technique where a function calls itself. Some computations, such as the factorial function or the solution to the Tower of Hanoi puzzle, have a natural recursive structure. Trees, an important data structure, are implemented most elegantly using recursive functions.

A lot of programming tasks can be formulated as search problems, where you want to find an object (e.g. an arrangement of queens on a chess board) which satisfies certain constraints (e.g. no queen can attack any other queen). (Or, sometimes, we want to find all such objects.) Recursive backtracking is a kind of brute-force search algorithm which tries to build solutions one component at a time, backtracking whenever the partial solution violates one of the constraints. One of the early assignments in the course is to use backtracking to find all of the words in a Boggle board.

Week 4: sorting

The course next introduces a prototypical algorithm design problem, sorting an array (i.e. arranging the elements in order). This unit also serves to introduce the idea of the asymptotic complexity of an algorithm -- how the running time scales with the input size, when you ignore constant factors. It covers insertion sort and selection sort, two simple O(n^2) sorting algorithms. It then introduces merge sort and quicksort, two efficient sorting algorithms based on the divide-and-conquer technique. Both algorithms are most naturally implemented as recursive functions.

Weeks 5-9: data structures

Finally, the heart of the course is about data structures and their implementation. There are two ways of looking at data structures:

  • abstract data types (ADTs), which define the abstract operations the data structure must support, and the semantics of those operations, and
  • the actual implementation of the data structure

Abstract data types are an important example of the principle of abstraction, the idea that a system should hide the details of its implementation, so that the client can think about the problem at a higher level. Abstraction makes it easier to build complex pieces of software, and also makes it easier to modify the underlying implementation of the data structure, should one come up with a more efficient approach.

This part of the course starts by talking about two ADTs: stacks and queues. A stack is a "last-in-first-out" (LIFO) container, where one can "push" an item onto the stack, or "pop" the most recently inserted item, but one does not have access to any of the other elements in the stack. A queue is analogous, except that it is a "first-in-first-out" (FIFO) data structure, where one retrieves the elements in the order in which they were added.

This description leaves out how these ADTs are implemented. Either one can be implemented in terms of an array or a linked list.

Container objects such as arrays, stacks, and queues, are supposed to be able to hold elements of arbitrary data type. This is known as generic programming, and in the context of the course, is done using C++ templates.

Another important ADT is the associative array, which stores and retrieves elements indexed by keys which need not be integers. (Most commonly, they are strings.) CS106B covers two data structures which efficiently implement associative arrays: binary search trees, and hash tables.

The class then covers some algorithms related to graphs, including finding a path from one node to another. This ties in with the units on recursion and ADTs: you can find a path using recursive backtracking. If you maintain the active set using a stack, you get depth-first search; if you use a queue, you get breadth-first search.

Finally, the course covers two more specialized data structures: heaps, which implement priority queues, and tries, an efficient way of storing sets of words which allows searching by prefix.