Introduction

I have a Rust training course that covers the basics of the Rust programming language that I occasionally give for paying customers. I have a very condensed version that I give for free online to developers of free and open source software. I do the free course partly as marketing, but mostly because I enjoy teaching. I've been part of the FOSS development community for decades, and I like Rust, and I know many FOSS developers would like to learn the language. I hope my course gives them a head start.

This web page is a further distillation of my condensed version of the course. It's what I wish I had been able to read in 2018 when I started learning Rust.

Some people learn better from an interactive lecture where they can ask questions. That's what my training course is about. Others like watching a video. Many people like to read. The various forms are suitable for different kinds of people, in different kinds of situations.

I know there are many Rust training resources out there. This is my contribution to that. I hope at least some people find it useful for getting started with Rust.

In the interest of not writing a full book on Rust, I skip many details. I trust the reader to make an educated guess. For a quick overview of the basics, the guess is probably enough. (The benefit of an interactive training session is that you get to ask questions when anything is unclear.)

Target audience

The target audience of this page is people who already know how to program, and especially people who contribute, or want to contribute, to open source software. This is not a tutorial on programming.

It helps if you have experience with a language with a static type system, but that is not a requirement.

Goal

The goal of this page is to give you enough understanding of Rust that you can make an informed decision whether to learn more about the language. A secondary goal is to cover enough of the basics of the language to let you learn more on your own.

Overview of Rust

Rust tooling

To use Rust, you need to install some version of the "Rust toolchain": the compiler, linker, standard library, and the cargo tool. For the purposes of this page, or more generally for learning Rust, any version will do.

The Rust installation page guides you to download the rustup installer, which is probably the most common way of installing the toolchain. It's what I use.

Some Linux distributions also package Rust and you can install it via their package repositories. For various reasons, the distributions will likely have a somewhat older version, but for learning, that is not a problem.

If you want to use newer features of the language, you'll need a newer version of the toolchain, but that isn't what you should start with when learning the language.

To check that your installed toolchain is functional, you should try the following commands:

  • cargo init hello
    • create a new Rust project called hello
  • cd hello
    • go to the project directory
  • cargo check
    • ensure the code is valid Rust
  • cargo build
    • build the program
  • cargo run
    • run the program
  • cargo clippy
    • ensure the code is idiomatic Rust
  • cargo doc --open
    • build documentation for the code, open it in a browser

If all of those work, your toolchain works. If not, you may need to fix something.

Overview of Rust

The cargo init command creates a "hello, world" project. We'll discuss the files it creates below. There are two files: Cargo.toml and src/main.rs.

The Cargo.toml file has metadata about the program:

[package]
name = "hello"
version = "0.1.0"
edition = "2021"

[dependencies]

The Cargo book documents all the fields, but they're pretty obvious. We'll discuss edition below.

The src/main.rs file is the actual source code:

fn main() {
    println!("Hello, world!");
}

The main function is where execution starts, and it prints out a greeting.

Cargo

cargo is a workflow tool. It scans the project source tree, determines what files need to be compiled, invokes the compiler to do that, and the linker to produce a binary, etc. It also runs the built binary, adds dependencies to Cargo.toml, downloads dependencies, and more. The goal of cargo is to make the development workflow for a Rust programmer smooth.

Rust programs do not need a Makefile or other such tooling to compile only changed parts. cargo takes care of that. When you run cargo run to invoke a built program, cargo will always automatically check if the source or dependencies have changed, and rebuild the binary if needed. Thus, you don't need to run cargo build before you run cargo run.

cargo is extensible the way git is. If you invoke it with a subcommand it doesn't have built in, say cargo foo, it will try to run a program called cargo-foo. Once you learn the basics, you can explore a world of cargo extensions to make your development life easier.

cargo uses the target directory at the root of the project for output files, by default. You can override that, see the Cargo book about CARGO_TARGET_DIR. (I'm not linking to that directly. You'll want to get familiar with cargo documentation over time.)

Rust strengths and weaknesses

I like Rust, and I consider the following some of its strengths and weaknesses. It's a personal list, and you'll form your own opinion.

Strengths:

  • memory management and memory safety
    • memory leaks are rare (safe Rust makes them hard to cause by accident, though it does not strictly forbid them)
    • no use of memory that's already been freed
    • no dangling pointers
    • no null pointers
  • a strong type system with inference
    • the compiler knows at compile time the actual type of each variable and value
    • the compiler can infer the type of most variables and expressions, meaning the programmer doesn't need to
  • the Result and Option types
    • no runtime exceptions
    • no magic values used to express lack of a value
  • runtime performance
    • zero cost abstractions
    • iterators
    • fast execution speed
    • good control of how memory is used
  • pretty good tooling
    • friendly compiler
    • strong support for IDEs, editors, via an LSP
    • fearless refactoring
  • evolves carefully
    • a new release every six weeks
    • very rarely breaks working code

Weaknesses:

  • builds can be slow, for large Rust programs with many dependencies
  • by default uses static linking
    • binaries can be large
    • but also means a binary is just one file
  • not very good at rapid prototyping
    • makes you think about more things up front
  • supports fewer target architectures than C
  • still young, keeps changing
    • every six weeks you have new release notes to read

Rust concepts

  • automatic memory management without garbage collection
  • enumerated types where each variant can contain different kinds of data
  • traits, generic parameterized types
  • pattern matching on values, unpacking them
  • a crate as the unit of building - a library or application
  • language editions

A Rust language "edition" is a higher level version number. Every release of the language, and its toolchain, has a version number: 1.0.0, 1.1.0, etc. An edition is a second level of versioning that allows breaking changes in the language. Every crate declares the edition it uses. Every version of the toolchain supports every edition up to the one that is current for that release. Crates can depend on crates that use any other edition.

This means when my library foo declares edition 2015, it can depend on your library bar that uses edition 2024. The toolchain will compile foo using the syntax and semantics of the 2015 edition and bar with those for the 2024 edition. The edition matters at the source level. At the linking stage, it's all object code.

The edition mechanism enables Rust language development to, for example, introduce new keywords. The first release, 1.0.0, edition 2015, did not have support for asynchronous functions. Adding that required new keywords async and await, which were introduced in edition 2018. This means a crate that wants to use such new stuff has to declare the right edition. On the other hand, no code written for an older edition is broken by the changes.

I happen to find this to be an incredibly powerful language feature. I can't wait for the edition that switches Rust to use Lisp syntax.

The Rust ecosystem

Rust is supported by and via the Rust Foundation. It is a way to channel funds into development of the language and its toolchain. For example, it operates sites such as crates.io, docs.rs, and the CI infrastructure for language development.

crates.io is the default location for published open source crates. There are many such crates now. You can use Rust without publishing anything, and you can publish elsewhere. You can even only publish a Git repository. However, the central site makes using crates as dependencies smoother.

crates.io requires a GitHub account to publish. I don't like that, but I live with it. Such an account is not needed for using crates published on the site.

There is a cultural bias against very small crates. This is in contrast to the npm ecosystem that revels in such. Rust does not have a technical limitation on how small a crate can be, but crates that are very small may garner feedback.

The Rust ecosystem heavily emphasizes being welcoming and constructive. There is a code of conduct and it is enforced, including against core team members. This makes Rust possibly the most pleasant open source community I've ever been part of in my decades in open source.

The Rust language and toolchain developers take great care to avoid breaking other people's projects. This has spread to crate authors.

The cargo tool assumes, and to some degree enforces, the use of semantic versioning. I find this to be good.

Important Rust web sites

Enterprise "hello, world"

This chapter is the important part of this page. It covers command line parsing and simple file I/O. These are things most programs need to do, even if they are not command line programs as such.

We've already seen the simplest version of a "hello, world" program above. Next we'll make two new versions:

  • allow user to specify whom to greet on the command line
  • allow user to specify whom to greet in a file

Processing the command line

The most commonly used library for command line argument processing is clap. To use it we need to add it to Cargo.toml:

[dependencies]
clap = { version = "4.0.2", features = ["derive"] }

The syntax is easy to get used to, but there are always details to remember, and so cargo provides a helpful tool for this:

$ cargo add clap --features derive

The above command adds the clap dependency to Cargo.toml. "Features" are a mechanism of Rust crates: a feature enables some optional functionality, or a set of functionality, in a crate when it is used as a dependency.

A crate can declare a set of features it supports, and then use conditional compilation to build the dependency in a suitable way for each feature.
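As a rough sketch of what conditional compilation on a feature can look like inside a crate: the feature name fancy and the function greeting below are made up for illustration, and the crate's Cargo.toml is assumed to declare the feature in a [features] section (fancy = []).

```rust
// Hypothetical crate with an optional feature called "fancy".
// Assumption: Cargo.toml declares it under [features] as: fancy = []

/// Compiled only when the "fancy" feature is enabled.
#[cfg(feature = "fancy")]
pub fn greeting() -> &'static str {
    "*** Hello, world! ***"
}

/// Compiled only when the "fancy" feature is NOT enabled.
#[cfg(not(feature = "fancy"))]
pub fn greeting() -> &'static str {
    "Hello, world!"
}

fn main() {
    println!("{}", greeting());
}
```

Only one of the two greeting functions exists in any given build, so the rest of the code can call greeting() without caring which one it gets.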

For clap the derive feature enables a way to express command line arguments in a declarative way using "derive macros". This is easier for the programmer than writing code to invoke the clap library to build a data structure that defines the expected command line arguments.

The core part of clap for this is the Parser macro. In the example below, Parser is applied to the type declaration that follows to generate code to parse a command line according to the fields in the struct. Once the command line has been parsed, a value of type Args can be accessed in the usual Rust way for struct values. This means you can only access a field that is in the Args type definition, and only in the way that's suitable for the type of the field.

use clap::Parser;

#[derive(Parser)]
struct Args {
    #[clap(default_value = "world")]
    whom: String,
}

fn main() {
    let args = Args::parse();
    println!("hello, {}", args.whom);
}

The example makes the person to be greeted optional, by setting a default value for whom.

At run time, the code in clap, which is used by the code generated by Parser, will examine the command line arguments provided by the user, and interpret them according to the way Args is defined. clap also follows the command line conventions that have grown over the decades for Unix.

The above example will terminate the program if there are any problems found on the command line. There are other ways of invoking the clap code that let the programmer handle the errors. However, as we are building enterprise software here, it's OK to just die and be maximally unhelpful to the user.

Handling errors and the Result type

In the second version of our enterprise hello program we'll allow the user to name a file that we'll read to get the name of whom to greet. This will require us to have a slightly more complicated command line syntax, to read a file, and to handle errors.

In Rust a function that may fail returns a value of type Result. It is defined as:

pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

To unpack that:

  • An enum type in Rust is a tagged union, also known as a sum type, among other names. Each variant is distinct and can contain zero or more values, and different variants can contain values of different types. In the above example, the Ok variant contains one type of value to represent the case of an operation that succeeded. The Err variant is for when the operation failed.
  • The Result type is generic, and has two type parameters. The type T is for the success value, and E for the error value. When Result is actually used, the type parameters are replaced by the actual concrete types, as if the programmer had defined a special result type for a specific actual use of the generic one.
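To make the tagged union idea concrete, here is a small sketch of an enum whose variants carry different kinds of data, unpacked with pattern matching. The Shape type and the area function are invented for this example and are not part of the hello program.

```rust
// A hypothetical enum: each variant carries different data.
enum Shape {
    Circle(f64),                           // one unnamed value: the radius
    Rectangle { width: f64, height: f64 }, // two named fields
    Point,                                 // no data at all
}

// Pattern matching unpacks the data stored in each variant.
// The compiler checks that every variant is handled.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rectangle { width, height } => width * height,
        Shape::Point => 0.0,
    }
}

fn main() {
    let shapes = [
        Shape::Circle(1.0),
        Shape::Rectangle { width: 2.0, height: 3.0 },
        Shape::Point,
    ];
    for shape in &shapes {
        println!("area: {}", area(shape));
    }
}
```

Result works the same way: a match on a Result value has one arm for Ok and one for Err.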

For example, the standard library has the function std::fs::read to read a file into memory. The slightly simplified declaration of the function is:

pub fn read(path: &Path) -> Result<Vec<u8>, std::io::Error>

This tells the compiler that the function takes one argument, which is a reference to a filename, and returns a byte vector on success and an I/O error on failure.

(The actual definition is more complicated. There is more type magic involved. However, I don't want to drown you in too much detail at once.)

Having a special result type for fallible operations is useful. It makes it explicit that something can fail. It especially makes it explicit to the compiler that something can fail. The compiler can then tell the programmer if they don't do something with the result.
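A minimal sketch of what working with Result looks like in practice, using only the standard library. The add_strs helper is made up for this example; the ? operator in it returns early from the function with the error if parsing fails.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse two strings as integers and add them.
// Each `?` either unwraps the Ok value or returns the Err to the caller.
fn add_strs(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x: i32 = a.trim().parse()?;
    let y: i32 = b.trim().parse()?;
    Ok(x + y)
}

fn main() {
    // The caller has to decide what to do in both cases.
    match add_strs("40", "2") {
        Ok(sum) => println!("sum: {}", sum),
        Err(e) => eprintln!("ERROR: {}", e),
    }

    // Calling the function and ignoring the returned value makes the
    // compiler warn about an unused Result:
    // add_strs("1", "2");
}
```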

The type of failures is usually a custom type in Rust. Often an enum. For the hello example we'll define it using the thiserror crate. We add it as a dependency (cargo add thiserror), and then we can use it:

#[derive(Debug, thiserror::Error)]
enum HelloError {
    #[error("failed to read file {0}")]
    Read(PathBuf, #[source] std::io::Error),

    #[error("failed to parse file {0} as UTF-8")]
    Utf8(PathBuf, #[source] std::string::FromUtf8Error),
}

Here we define a new type, HelloError, and use two "derive macros", like we did for clap::Parser earlier. The actual variants represent the two error cases we can have: we can't read the named file, or we can't interpret the contents of the file as UTF-8.

The #[error("...")] decorations for the variants are used by thiserror::Error to generate code to produce an error message. The string is a format string, with {0} being a placeholder for the zeroth (i.e., first) item in the values contained in the variant.

The #[source] is more magic for code generation. The Rust standard library has an "underlying error" concept for error values. The thiserror library supports that with the source decorator.
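To demystify the code generation a little, here is a hand-written sketch of roughly the kind of code such a derive saves you from writing. It is not the actual output of thiserror, and the ReadError struct is a simplified stand-in invented for this example, but it uses the same standard library pieces: the Display trait for the message and the source method of std::error::Error for the underlying error.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::PathBuf;

// Simplified, hypothetical error type with an underlying I/O error.
#[derive(Debug)]
struct ReadError {
    path: PathBuf,
    cause: io::Error,
}

// The error message, similar in spirit to #[error("failed to read file {0}")].
impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to read file {}", self.path.display())
    }
}

// The "underlying error" concept, similar in spirit to #[source].
impl Error for ReadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

// Build a sample error value without touching the file system.
fn sample_error() -> ReadError {
    ReadError {
        path: PathBuf::from("greeting.txt"),
        cause: io::Error::new(io::ErrorKind::NotFound, "no such file"),
    }
}

fn main() {
    let err = sample_error();
    println!("{}", err);
    if let Some(cause) = err.source() {
        println!("caused by: {}", cause);
    }
}
```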

Reading files

With our custom error type, we can have the following main program:

use std::error::Error;

fn main() {
    if let Err(e) = fallible_main() {
        eprintln!("ERROR: {}", e);
        let mut err = e.source();
        while let Some(underlying) = err {
            eprintln!("caused by: {}", underlying);
            err = underlying.source();
        }
    }
}

fn fallible_main() -> Result<(), HelloError> {
    let args = Args::parse();
    println!("hello, {}", args.whom()?);
    Ok(())
}

In Rust, a "trait" is effectively an interface, useful when more than one actual type needs to have the same interface. A trait must be in scope at the point where its methods are used. The std::error::Error trait is the standard library interface for error values, and we need that to use the "underlying" concept. It's not available by default, because of historical reasons.

The main function calls fallible_main and if that fails, prints out a sequence of errors, each one caused by its underlying source error. The loop to do that shows "deconstruction" of values using pattern matching.

The split into main and fallible_main is my personal quirk. It's not necessarily common in other people's Rust programs.

The interesting part of the program is in the modified Args type:

use clap::Parser;
use std::fs::read;
use std::path::{Path, PathBuf};

#[derive(Parser)]
struct Args {
    #[clap(default_value = "world")]
    whom: String,

    #[clap(short, long)]
    filename: Option<PathBuf>,
}

impl Args {
    fn whom(&self) -> Result<String, HelloError> {
        if let Some(filename) = &self.filename {
            let whom = Self::read(filename)?;
            Ok(whom)
        } else {
            Ok(self.whom.clone())
        }
    }

    fn read(filename: &Path) -> Result<String, HelloError> {
        let data = read(filename).map_err(|e| HelloError::Read(filename.into(), e))?;
        let whom = String::from_utf8(data).map_err(|e| HelloError::Utf8(filename.into(), e))?;
        Ok(whom.trim().to_string())
    }
}

We add a new field, and an implementation section for methods for the type. Methods with a &self argument can be called on values of the type: args.whom(). Without any variant of a self argument it's an "associated function" and needs to be called using the type as a namespace: Self::read. (Self is a handy alias for the type we are implementing. It saves us from having to remember the name. This is known as being ergonomic in Rust.)

The whom method uses pattern matching again to see if a filename option has been used on the command line. The Option type represents either having some value or not having a value.
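A small standalone sketch of the same if let pattern on an Option, outside of clap. The greeting function is made up for this example.

```rust
// Hypothetical helper: build a greeting for an optional name.
fn greeting(name: Option<&str>) -> String {
    // `if let` matches only the Some variant and binds its content to n.
    if let Some(n) = name {
        format!("hello, {}", n)
    } else {
        String::from("hello, world")
    }
}

fn main() {
    println!("{}", greeting(Some("Rust")));
    println!("{}", greeting(None));

    // For a simple default value, unwrap_or is a more compact alternative.
    let name: Option<&str> = None;
    println!("hello, {}", name.unwrap_or("world"));
}
```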

The Args::read function calls read, which is std::fs::read due to the use declaration above. The error value of that is "mapped" to the error type we defined for this program. Likewise the attempt to convert the bytes read from a file to a UTF-8 string can fail, and again we map the error.
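The mapping step can be tried in isolation. The sketch below uses only the standard library; the NameError type and the name_from_bytes function are invented for this example, and the error here records the byte offset of the problem instead of a filename.

```rust
// Hypothetical error type for this sketch.
#[derive(Debug)]
enum NameError {
    // Number of bytes at the start of the input that were valid UTF-8.
    NotUtf8(usize),
}

// Convert raw bytes to a trimmed name, mapping the library error
// (FromUtf8Error) into our own error type before `?` returns it.
fn name_from_bytes(data: Vec<u8>) -> Result<String, NameError> {
    let text = String::from_utf8(data)
        .map_err(|e| NameError::NotUtf8(e.utf8_error().valid_up_to()))?;
    Ok(text.trim().to_string())
}

fn main() {
    println!("{:?}", name_from_bytes(b"world\n".to_vec()));
    println!("{:?}", name_from_bytes(vec![b'h', b'i', 0xff]));
}
```

The closure given to map_err only runs in the error case; a successful result passes through unchanged.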

Full source for enterprise hello

The full src/main.rs for the final version of our enterprise version of "hello, world" is below.

use clap::Parser;
use std::error::Error;
use std::fs::read;
use std::path::PathBuf;

fn main() {
    if let Err(err) = fallible_main() {
        eprintln!("ERROR: {}", err);
        // Walk the chain of underlying errors, if any.
        let mut err = err.source();
        while let Some(underlying) = err {
            eprintln!("caused by: {}", underlying);
            err = underlying.source();
        }
        std::process::exit(1);
    }
}

fn fallible_main() -> Result<(), HelloError> {
    let args = Args::parse();
    println!("Hello, {}!", args.whom()?);
    Ok(())
}

#[derive(Debug, thiserror::Error)]
enum HelloError {
    #[error("failed to read file {0}")]
    Read(PathBuf, #[source] std::io::Error),
}

#[derive(Parser)]
struct Args {
    #[clap(short, long, help = "Who should be greeted?")]
    whom: Option<String>,

    #[clap(short, long, help = "Read name to greet from file")]
    filename: Option<PathBuf>,
}

impl Args {
    fn whom(&self) -> Result<String, HelloError> {
        if let Some(filename) = &self.filename {
            let data = read(filename).map_err(|e| HelloError::Read(filename.clone(), e))?;
            // Invalid UTF-8 is replaced rather than reported as an error here.
            let name = String::from_utf8_lossy(&data).to_string();
            let name = name.strip_suffix('\n').unwrap_or(&name);
            Ok(name.to_string())
        } else if let Some(name) = &self.whom {
            Ok(name.to_string())
        } else {
            Ok("world".to_string())
        }
    }
}

Iterators

Iterators are quite common in Rust. Every for loop uses one, and explicit use of iterators is idiomatic. You can implement your own iterators. The standard library provides a trait, Iterator, for this, and doing that is also a good way to demystify traits.

The Iterator trait can be condensed to this:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

This means whoever implements an iterator must specify two things:

  • the type of the item returned for each iteration
  • a method that produces the next item, if there is one

The iteration will end when next returns None. The type of items can be anything. The important distinction is if they're the values themselves (i.e., ownership of the value is passed on to the user of the iterator) or references to values (e.g., what is contained in a collection data structure).
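The difference between iterating over values and over references shows up with the standard Vec type. In this sketch the lengths and shout functions are made up; iter and into_iter are the standard library methods.

```rust
// iter() yields references (&String): the caller keeps ownership.
fn lengths(names: &[String]) -> Vec<usize> {
    names.iter().map(|n| n.len()).collect()
}

// into_iter() yields the values themselves (String): the vector is consumed.
fn shout(names: Vec<String>) -> Vec<String> {
    names.into_iter().map(|n| n.to_uppercase()).collect()
}

fn main() {
    let names = vec![String::from("alice"), String::from("bob")];

    // Borrowing: names is still usable afterwards.
    println!("{:?}", lengths(&names));

    // Moving: names is handed over and cannot be used after this call.
    println!("{:?}", shout(names));
}
```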

We'll implement an iterator over a sequence of integers. Integers are easy to copy so we return the actual values, and not references to them.

struct Seq {
    goal: i32,
    next: i32,
}

impl Iterator for Seq {
    type Item = i32;
    fn next(&mut self) -> Option<Self::Item> {
        if self.next < self.goal {
            let item = Some(self.next);
            self.next += 1;
            item
        } else {
            None
        }
    }
}

We can use this with a for loop to print numbers on a line:

for i in Seq::new(10) {
    print!("{} ", i);
}
println!();
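Implementing Iterator also gives the type all the adaptor methods the trait provides, such as filter, map, and sum. The sketch below repeats a minimal Seq so that it compiles on its own; the sum_of_even_squares function is made up for this example.

```rust
struct Seq {
    goal: i32,
    next: i32,
}

impl Seq {
    // Count from 0 up to, but not including, goal.
    fn new(goal: i32) -> Self {
        Self { goal, next: 0 }
    }
}

impl Iterator for Seq {
    type Item = i32;
    fn next(&mut self) -> Option<Self::Item> {
        if self.next < self.goal {
            let item = Some(self.next);
            self.next += 1;
            item
        } else {
            None
        }
    }
}

// Explicit iterator use: keep the even numbers, square them, add them up.
fn sum_of_even_squares(goal: i32) -> i32 {
    Seq::new(goal).filter(|n| n % 2 == 0).map(|n| n * n).sum()
}

fn main() {
    println!("{}", sum_of_even_squares(10));
}
```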

Full iterator example

fn main() {
    // 0, 1, 2, etc, through to 9, but not including 10
    for i in Seq::new(10) {
        print!("{} ", i);
    }
    println!();

    // -10, -9, etc, through to 9, but not including 10
    for i in Seq::range(-10, 10) {
        print!("{} ", i);
    }
    println!();
}

struct Seq {
    goal: i32,
    next: i32,
}

impl Seq {
    fn new(goal: i32) -> Self {
        Self {
            goal,
            next: 0,
        }
    }

    fn range(start: i32, goal: i32) -> Self {
        Self {
            goal,
            next: start,
        }
    }
}

impl Iterator for Seq {
    type Item = i32;
    fn next(&mut self) -> Option<Self::Item> {
        if self.next < self.goal {
            let item = Some(self.next);
            self.next += 1;
            item
        } else {
            None
        }
    }
}

Memory management

Memory management is the perennial problem in programming, because there is always less memory than programmers want.

Computer          year   RAM       explanation
PDP-7             1965   9.2 KiB   First Unix computer
Commodore 64      1982   64 KiB    Very common early microcomputer
Cray X-MP         1982   128 MiB   First super computer in Finland
Linus' first PC   1991   4 MiB     Linux was made on this
Nokia X10         2021   6 GiB     My phone at the time

The Cray was used for scientific research and weather modeling. My phone is used for watching TV. Of the two, my phone has more CPU, RAM, storage, and network bandwidth.

Memory management approaches

There are roughly three ways to dynamically manage memory in programming:

  • The approach used by C: there is a way to allocate a chunk of memory, and to free it. This is as simple as it gets, and programmers get it wrong all the time. Both Microsoft and Google have reported that memory management problems are the cause of roughly 70% of the security problems in the software they develop in C and C++.

    Motto: "Suffering builds character"

  • The approach used by Lisp, and most popular languages today: garbage collection. The programmer allocates memory and the language run time can free it when there are no references to it anymore. This prevents most memory management problems at the cost of runtime performance. It's a fine approach, except for cases where a garbage collection delay may be catastrophic. For example, in code controlling brakes in a car.

    Motto: "Things will usually... wait for it... work."

  • The approach used by Rust: the compiler knows at compile time when memory is allocated and when it is no longer used. The compiler inserts instructions to allocate and free memory accordingly. The language rules make it impossible to use memory unless the compiler knows its allocation status.

    Motto: "Prove to me you manage memory correctly."

The Rust approach leads to a more complicated language, but also much higher runtime performance.

Ownership and borrowing

In Rust every value has exactly one owner. A simplistic explanation is that the variable (memory location) where the value is stored is the owner. When the owner stops existing, e.g., it is removed from the stack, the value can be freed.

Ownership can be transferred. For example, the value may be assigned (moved) to another variable. The original owner can then no longer be used to access the value.
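A minimal sketch of a move, using a String. The consume function is made up for this example. The commented-out line is the kind of code the compiler rejects.

```rust
// Taking a String by value means the function becomes its owner.
// The string is freed when s goes out of scope at the end of the function.
fn consume(s: String) -> usize {
    s.len()
}

fn main() {
    let greeting = String::from("hello");

    // Ownership moves from greeting to moved.
    let moved = greeting;

    // This would not compile: greeting no longer owns the value.
    // println!("{}", greeting);

    // Ownership moves again, this time into the function.
    let n = consume(moved);
    println!("{}", n);
}
```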

Values can be borrowed by creating references. Holding a reference does not mean you own the value. References can be mutable or immutable.

Rust has two rules for references:

  1. At any given time, you can have either one mutable reference or any number of immutable references.
  2. References must always be valid.

The compiler can check these rules at compile time. It will not compile a program unless it can verify the rules are followed. The part of the compiler that checks these rules is called the borrow checker.
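A short sketch of the two kinds of references. The length and shout functions are invented for this example. The commented-out lines show a combination the borrow checker refuses.

```rust
// An immutable (shared) borrow: read only.
fn length(s: &str) -> usize {
    s.len()
}

// A mutable (exclusive) borrow: may change the value.
fn shout(s: &mut String) {
    s.push('!');
}

fn main() {
    let mut text = String::from("hello");

    // Any number of immutable references at the same time is fine.
    let a = &text;
    let b = &text;
    println!("{} {} {}", a, b, length(a));

    // One mutable reference is fine here, because a and b are
    // not used after this point.
    shout(&mut text);
    println!("{}", text);

    // This would not compile: r is still in use while text is mutated.
    // let r = &text;
    // shout(&mut text);
    // println!("{}", r);
}
```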

The rules together provide the memory safety of Rust.

  • You can't use memory before it is allocated.
  • You can't use memory after it has been freed.
  • You can't have data race conditions, where data is modified without some form of locking. Only one part at a time can mutate, and while it does that, no other part can read the data.
  • There are no NULL pointers.

The rules don't prevent all bugs, but they eliminate memory management and concurrency problems quite effectively. They are the safety belts of programming.
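The same rules apply across threads. As a sketch, several threads below increment a shared counter; the compiler only accepts this because the counter is wrapped in a Mutex from the standard library. The parallel_count function is made up for this example, and it assumes a toolchain new enough to have std::thread::scope (Rust 1.63 or later).

```rust
use std::sync::Mutex;
use std::thread;

// Increment a shared counter from several threads.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Without the Mutex, mutating the counter from several threads
    // would be rejected at compile time.
    let counter = Mutex::new(0usize);

    // Scoped threads may borrow local variables; all of them are
    // joined before scope returns.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });

    counter.into_inner().unwrap()
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```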

Advice for self study

  • Use clone liberally, if the borrow checker gets in your way. When you're learning, you have many things to worry about. You can get friendly with the borrow checker once you've learnt enough to do useful things.
  • Use cargo fmt to format your code in the canonical way.
  • Use cargo clippy to learn language idioms.
  • The anyhow crate makes errors for applications easy. thiserror is better for anything that resembles a library.
  • Learn to use and implement traits. They're deceptively simple.
  • Take small steps. No, much smaller than that.
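As an illustration of the first piece of advice above, here is a sketch where clone sidesteps any thinking about references and lifetimes. The longest function is made up for this example; a version returning a reference would avoid the copy, but it is more to get right while learning.

```rust
// Return the longest word as an owned String.
fn longest(words: &[String]) -> String {
    let mut best = String::new();
    for w in words {
        if w.len() > best.len() {
            // Cloning makes an independent copy, so the result does not
            // borrow from the input slice.
            best = w.clone();
        }
    }
    best
}

fn main() {
    let words = vec![
        String::from("a"),
        String::from("abc"),
        String::from("ab"),
    ];
    println!("{}", longest(&words));
}
```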

Acknowledgements

Several people have pointed out things to improve: Richard Braakman, Matthew Wilcox, Dagfinn Ilmari Mannsåker, MicroPanda123, mewsleah, Marcos Dione, and probably some I failed to add.