Introduction
I have a Rust training course that covers the basics of the Rust programming language that I occasionally give for paying customers. I have a very condensed version that I give for free online to developers of free and open source software. I do the free course partly as marketing, but mostly because I enjoy teaching. I've been part of the FOSS development community for decades, and I like Rust, and I know many FOSS developers would like to learn the language. I hope my course gives them a head start.
This web page is a further distillation of my condensed version of the course. It's what I wish I had been able to read in 2018 when I started learning Rust.
Some people learn better from an interactive lecture where they can ask questions. That's what my training course is about. Others like watching a video. Many people like to read. The various forms are suitable for different kinds of people, in different kinds of situations.
I know there are many Rust training resources out there. This is my contribution to them. I hope at least some people find it useful for getting started with Rust.
In the interest of not writing a full book on Rust, I skip many details. I trust the reader to make an educated guess. For a quick overview of the basics, the guess is probably enough. (The benefit of an interactive training session is that you get to ask questions when anything is unclear.)
Target audience
The target audience of this page is people who already know how to program, and especially people who contribute, or want to contribute, to open source software. This is not a tutorial on programming.
It helps if you have experience with a language with a static type system, but that is not a requirement.
Goal
The goal of this page is to give you enough understanding of Rust that you can make an informed decision whether to learn more about the language. A secondary goal is to cover enough of the basics of the language to let you learn more on your own.
Overview of Rust
Rust tooling
To use Rust, you need to install some version of the "Rust toolchain": the
compiler, linker, standard library, and the cargo tool. For the purposes
of this page, or more generally for learning Rust, any version will do.
The Rust installation page guides you
to download the rustup installer, which is probably the most common way of
installing the toolchain. It's what I use.
Some Linux distributions also package Rust, and you can install it via their package repositories. For various reasons, the distributions will likely have a somewhat older version, but for learning, that is not a problem.
If you want to use newer features of the language, you'll need a newer version of the toolchain, but that isn't what you should start with when learning the language.
To check that your installed toolchain is functional, you should try the following commands:
- cargo init hello - create a new Rust project called hello
- cd hello - go to the project directory
- cargo check - ensure the code is valid Rust
- cargo build - build the program
- cargo run - run the program
- cargo clippy - ensure the code is idiomatic Rust
- cargo doc --open - build documentation for the code, open it in a browser
If all of those work, your toolchain works. If not, you may need to fix something.
Overview of Rust
The cargo init command creates a "hello, world" project. We'll discuss the
files it creates below. There are two files: Cargo.toml and src/main.rs.
The Cargo.toml file has metadata about the program:
[package]
name = "hello"
version = "0.1.0"
edition = "2021"
[dependencies]
The Cargo book
documents all the fields, but they're pretty obvious. We'll discuss edition
below.
The src/main.rs file is the actual source code:
fn main() {
    println!("Hello, world!");
}
The main function is where execution
starts, and it prints out a greeting.
Cargo
cargo is a workflow tool. It scans the project source tree, determines what
files need to be compiled, invokes the compiler to do that, and the linker
to produce a binary, etc. It also runs the built binary, adds dependencies to
Cargo.toml, downloads dependencies, and more. The goal of cargo is to make
the development workflow for a Rust programmer smooth.
Rust programs do not need a Makefile or other such tooling to compile only
changed parts. cargo takes care of that. When you run cargo run to invoke
a built program, cargo will always automatically check if the source or
dependencies have changed, and rebuild the binary if needed. Thus, you don't
need to run cargo build before you run cargo run.
cargo is extensible the way git is. If you invoke it with a command, say foo,
that it doesn't have built in, it will try to run a program called cargo-foo. Once you learn the
basics, you can explore a world of cargo extensions to make your development
life easier.
cargo uses the target directory at the root of the project for
output files, by default. You can override that, see the Cargo book about
CARGO_TARGET_DIR. (I'm not linking to that directly. You'll want to get
familiar with cargo documentation over time.)
Rust strengths and weaknesses
I like Rust, and I consider the following some of its strengths and weaknesses. It's a personal list, and you'll form your own opinion.
Strengths:
- memory management and memory safety
- no memory leaks
- no use of memory that's already been freed
- no dangling pointers
- no null pointers
- a strong type system with inference
- the compiler knows at compile time the actual type of each variable and value
- the compiler can infer the type of most variables and expressions, meaning the programmer doesn't need to
- the Result and Option types
- no runtime exceptions
- no magic values used to express lack of a value
- runtime performance
- zero cost abstractions
- iterators
- fast execution speed
- good control of how memory is used
- pretty good tooling
- friendly compiler
- strong support for IDEs, editors, via an LSP
- fearless refactoring
- evolves carefully
- a new release every six weeks
- very rarely breaks working code
Weaknesses:
- builds can be slow for large Rust programs with many dependencies
- by default uses static linking
- binaries can be large
- but also means a binary is just one file
- not very good at rapid prototyping
- makes you think about more things up front
- supports fewer target architectures than C
- still young, keeps changing
- every six weeks you have new release notes to read
Rust concepts
- automatic memory management without garbage collection
- enumerated types where each variant can contain different kinds of data
- traits, generic parameterized types
- pattern matching on values, unpacking them
- a crate as the unit of building - a library or application
- language editions
A Rust language "edition" is a higher level version number. Every release of the language, and its toolchain, has a version number: 1.0.0, 1.1.0, etc. An edition is a second level of versioning that allows breaking changes in the language. Every crate declares the edition it uses. Every version of the toolchain supports every edition up to the one that is current for that release. Crates can depend on crates that use any other edition.
This means when my library foo declares edition 2015, it can depend on your
library bar that uses edition 2024. The toolchain will compile foo using
the syntax and semantics of the 2015 edition and bar with those for the 2024
edition. The edition matters at the source level. At the linking stage, it's
all object code.
The edition enables the Rust language development to, for example, introduce
new keywords. The first release, 1.0.0, edition 2015, did not have support for
asynchronous functions. Adding that required new keywords async and await,
which were introduced in edition 2018. This means a crate that wants to use
such new features has to declare the right edition. On the other hand, no code
written for an older edition is broken by the change.
I happen to find this to be an incredibly powerful language feature. I can't wait for the edition that switches Rust to use Lisp syntax.
The Rust ecosystem
Rust is supported by and via the Rust Foundation. It is a way to channel
funds into development of the language and its toolchain. For example, it
operates sites such as crates.io, docs.rs, and the CI infrastructure for
language development.
crates.io is the default location for published open source crates. There
are many such crates now. You can use Rust without publishing anything,
and you can publish elsewhere. You can even only publish a Git repository.
However, the central site makes using crates as dependencies smoother.
crates.io requires a GitHub account to publish. I don't like that, but I
live with it. Such an account is not needed for using crates published on
the site.
There is a cultural bias against very small crates. This is in contrast
to the npm ecosystem that revels in such. Rust does not have a technical
limitation on how small a crate can be, but crates that are very small may
garner feedback.
The Rust ecosystem heavily emphasizes being welcoming and constructive. There is a code of conduct and it is enforced, including against core team members. This makes Rust possibly the most pleasant open source community I've ever been part of in my decades in open source.
The Rust language and toolchain developers take great care to avoid breaking other people's projects. This has spread to crate authors.
The cargo tool assumes, and to some degree enforces, the use of
semantic versioning. I find this to be good.
Important Rust web sites
- https://www.rust-lang.org/ - Rust home page
- https://crates.io/ - published open source crates
- https://docs.rs/ - automatically generated documentation for crates
- https://doc.rust-lang.org/std/ - standard library documentation
- https://doc.rust-lang.org/book/ - the book "The Rust Programming Language"
- https://stevedonovan.github.io/rust-gentle-intro/ - a gentler introduction than the book
- https://www.chiark.greenend.org.uk/~ianmdlvl/rust-polyglot/ - learn Rust using other languages you already know
- https://blessed.rs/ - popular, commonly chosen libraries for various purposes
- https://serde.rs/ - a very popular set of libraries to serialize and parse structured file formats
Enterprise "hello, world"
This chapter is the important part of this page. It covers command line parsing and simple file I/O. These are things most programs need to do, even if they are not command line programs as such.
We've already seen the simplest version of a "hello, world" program above. Next we'll make two new versions:
- allow user to specify whom to greet on the command line
- allow user to specify whom to greet in a file
Processing the command line
The most commonly used library for command line argument processing is
clap. To use it we need to add it to
Cargo.toml:
[dependencies]
clap = { version = "4.0.2", features = ["derive"] }
The syntax is easy to get used to, but there are always details to remember,
and so cargo provides a helpful tool for this:
$ cargo add clap --features derive
The above command adds the clap dependency to Cargo.toml. "Features" are a
mechanism in Rust crates for enabling optional functionality, or a set of
functionality, in a crate when it is used as a dependency.
A crate can declare a set of features it supports, and then use conditional compilation to build the dependency in a suitable way for each feature.
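As a rough sketch of what that conditional compilation looks like inside a crate: the feature name "shout" and the greeting function below are made up for illustration, and the feature would have to be declared under [features] in that crate's Cargo.toml. This is not taken from clap or any other real crate.

```rust
// Hypothetical crate code. The feature name "shout" is an assumption for
// this sketch; the crate would declare it in its Cargo.toml like this:
//
//     [features]
//     shout = []

/// Compiled only when the "shout" feature is enabled.
#[cfg(feature = "shout")]
pub fn greeting(whom: &str) -> String {
    format!("HELLO, {}!", whom.to_uppercase())
}

/// Compiled only when the "shout" feature is not enabled.
#[cfg(not(feature = "shout"))]
pub fn greeting(whom: &str) -> String {
    format!("hello, {}", whom)
}

fn main() {
    // Which of the two functions exists is decided at compile time.
    println!("{}", greeting("world"));
}
```

Exactly one of the two functions ends up in the compiled crate, so code that is switched off costs nothing at run time.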
For clap the derive feature enables a way to express command line
arguments in a declarative way using "derive macros". This is easier for the
programmer than writing code to invoke the clap library to build a data
structure that defines the expected command line arguments.
The core part of clap for this is the Parser macro. In the example below,
Parser is applied to the type declaration that follows to generate code to
parse a command line according to the fields in the struct. Once the command
line has been parsed, a value of type Args can be accessed in the usual Rust
way for struct values. This means you can only access a field that is in
the Args type definition, and only in the way that's suitable for the type of
the field.
use clap::Parser;

#[derive(Parser)]
struct Args {
    #[clap(default_value = "world")]
    whom: String,
}

fn main() {
    let args = Args::parse();
    println!("hello, {}", args.whom);
}
The example makes the person to be greeted optional, by setting a default
value for whom.
At run time, the code in clap, which is used by the code generated by Parser,
will examine the command line arguments provided by the user, and interpret
them according to the way Args is defined. clap also follows the command
line conventions that have grown over the decades for Unix.
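To get a feel for what clap saves you from writing, here is a hand-rolled sketch of the same default-value behaviour using only the standard library. The function name whom_from_args is my own invention, and the sketch handles no options, no --help, and no error reporting. It is not how clap works internally.

```rust
use std::env;

/// Pick the name to greet from the arguments that follow the program name.
/// Falls back to "world" when no argument is given, like the default value
/// in the clap example.
fn whom_from_args(mut args: impl Iterator<Item = String>) -> String {
    args.next().unwrap_or_else(|| "world".to_string())
}

fn main() {
    // env::args() yields the program name first, so skip it.
    let whom = whom_from_args(env::args().skip(1));
    println!("hello, {}", whom);
}
```

Everything beyond this trivial case, such as options, help text, and type conversion, is where a library like clap starts to pay off.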
The above example will terminate the program if there are any problems found
on the command line. There are other ways of invoking the clap code that
let the programmer handle the errors. However, as we are building enterprise
software here, it's OK to just die and be maximally unhelpful to the user.
Handling errors and the Result type
In the second version of our enterprise hello program we'll allow the user to name a file that we'll read to get the name of whom to greet. This will require us to have a slightly more complicated command line syntax, to read a file, and to handle errors.
In Rust a function that may fail returns a value of type Result. It is defined
as:
pub enum Result<T, E> {
    Ok(T),
    Err(E),
}
To unpack that:
- An enum type in Rust is a tagged union, also known as a sum type, among other names. Each variant is distinct and can contain zero or more values. Different variants can contain values of different types. In the above example, the Ok variant contains one type of value to represent the case of an operation that succeeded. The Err variant is for when the operation failed.
- The Result type is generic, and has two type parameters. The type T is for the success value, and E for the error value. When Result is actually used, the type parameters are replaced by the actual concrete types, as if the programmer had defined a special result type for a specific actual use of the generic one.
For example, the standard library has the function std::fs::read to read a file
into memory. The slightly simplified declaration of the function is:
pub fn read(path: &Path) -> Result<Vec<u8>, std::io::Error>
This tells the compiler that the function takes one argument, which is a reference to a filename, and returns a byte vector on success and an I/O error on failure.
(The actual definition is more complicated. There is more type magic involved. However, I don't want to drown you in too much detail at once.)
Having a special result type for fallible operations is useful. It makes it explicit that something can fail. It especially makes it explicit to the compiler that something can fail. The compiler can then tell the programmer if they don't do something with the result.
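A small sketch of what this looks like in practice, using only the standard library. The function names parse_number and sum_of are made up for this example. Here T and E are replaced by concrete types, the ? operator passes an error on to the caller, and match handles both variants.

```rust
use std::num::ParseIntError;

/// Parse a number. Here `T` is `i32` and `E` is `ParseIntError`.
fn parse_number(text: &str) -> Result<i32, ParseIntError> {
    text.trim().parse::<i32>()
}

/// The `?` operator returns early with the error if parsing fails.
fn sum_of(a: &str, b: &str) -> Result<i64, ParseIntError> {
    let a = parse_number(a)?;
    let b = parse_number(b)?;
    Ok(i64::from(a) + i64::from(b))
}

fn main() {
    for (a, b) in [("2", "40"), ("2", "forty")] {
        // Both variants must be handled; simply ignoring the returned
        // Result makes the compiler emit an unused-result warning.
        match sum_of(a, b) {
            Ok(sum) => println!("{} + {} = {}", a, b, sum),
            Err(e) => println!("{} + {} failed: {}", a, b, e),
        }
    }
}
```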
The type of failures is usually a custom type in Rust. Often an enum. For
the hello example we'll define it using the thiserror crate. We add it as a
dependency (cargo add thiserror), and then we can use it:
use std::path::PathBuf;

#[derive(Debug, thiserror::Error)]
enum HelloError {
    #[error("failed to read file {0}")]
    Read(PathBuf, #[source] std::io::Error),
    #[error("failed to parse file {0} as UTF-8")]
    Utf8(PathBuf, #[source] std::string::FromUtf8Error),
}
Here we define a new type, HelloError, and use two "derive macros", like we
did for clap::Parser earlier. The actual variants represent the two error
cases we can have: we can't read the named file, or we can't interpret the
contents of the file as UTF-8.
The #[error("...")] decorations for the variants are used by
thiserror::Error to generate code to produce an error message. The string is
a format string, with {0} being a placeholder for the zeroth (i.e., first)
item in the values contained in the variant.
The #[source] is more magic for code generation. The Rust standard library
has an "underlying error" concept for error values. The thiserror library
supports that with the source decorator.
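To demystify the code generation a little, here is roughly the equivalent written by hand with only the standard library, for a single variant. This is my approximation of the effect, not the actual output of thiserror.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::PathBuf;

#[derive(Debug)]
enum HelloError {
    Read(PathBuf, io::Error),
}

// Roughly what #[error("failed to read file {0}")] provides.
impl fmt::Display for HelloError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HelloError::Read(path, _) => {
                write!(f, "failed to read file {}", path.display())
            }
        }
    }
}

// Roughly what #[source] provides: access to the underlying error.
impl Error for HelloError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            HelloError::Read(_, underlying) => Some(underlying),
        }
    }
}

fn main() {
    let underlying = io::Error::new(io::ErrorKind::NotFound, "no such file");
    let err = HelloError::Read(PathBuf::from("name.txt"), underlying);
    println!("{}", err);
    if let Some(source) = err.source() {
        println!("caused by: {}", source);
    }
}
```

With more variants this boilerplate grows quickly, which is the reason to let a derive macro write it.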
Reading files
With our custom error type, we can have the following main program:
use std::error::Error;

fn main() {
    if let Err(e) = fallible_main() {
        eprintln!("ERROR: {}", e);
        let mut err = e.source();
        while let Some(underlying) = err {
            eprintln!("caused by: {}", underlying);
            err = underlying.source();
        }
    }
}

fn fallible_main() -> Result<(), HelloError> {
    let args = Args::parse();
    println!("hello, {}", args.whom()?);
    Ok(())
}
In Rust, a "trait" is effectively an interface, useful when more than one
actual type needs to have the same interface. A trait must be in scope at the
point where its methods are used. The std::error::Error trait is the standard
library interface for error values, and we need it to use the "underlying"
concept. It's not in scope by default, for historical reasons.
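A minimal sketch of both points, with a made-up module shapes and a made-up trait Area: two types share one interface, and the trait has to be brought into scope before its methods can be called.

```rust
mod shapes {
    /// A trait is an interface that any number of types can implement.
    pub trait Area {
        fn area(&self) -> f64;
    }

    pub struct Square(pub f64);
    pub struct Rect(pub f64, pub f64);

    impl Area for Square {
        fn area(&self) -> f64 {
            self.0 * self.0
        }
    }

    impl Area for Rect {
        fn area(&self) -> f64 {
            self.0 * self.1
        }
    }
}

// Without this `use`, the calls to `.area()` in main would not compile,
// even though both types implement the trait.
use shapes::Area;
use shapes::{Rect, Square};

fn main() {
    let square = Square(2.0);
    let rect = Rect(2.0, 3.0);
    println!("{} {}", square.area(), rect.area());
}
```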
The main function calls fallible_main and if that fails, prints out a
sequence of errors, each one caused by its underlying source error. The loop
to do that shows "deconstruction" of values using pattern matching.
The split into main and fallible_main is my personal quirk. It's not
necessarily common in other people's Rust programs.
The interesting part of the program is in the modified Args type:
use clap::Parser;
use std::fs::read;
use std::path::{Path, PathBuf};

#[derive(Parser)]
struct Args {
    #[clap(default_value = "world")]
    whom: String,
    #[clap(short, long)]
    filename: Option<PathBuf>,
}

impl Args {
    fn whom(&self) -> Result<String, HelloError> {
        if let Some(filename) = &self.filename {
            let whom = Self::read(filename)?;
            Ok(whom)
        } else {
            Ok(self.whom.clone())
        }
    }

    fn read(filename: &Path) -> Result<String, HelloError> {
        let data = read(filename).map_err(|e| HelloError::Read(filename.into(), e))?;
        let whom = String::from_utf8(data).map_err(|e| HelloError::Utf8(filename.into(), e))?;
        Ok(whom.trim().to_string())
    }
}
We add a new field, and an implementation section for methods for the
type. Methods with a &self argument can be called on values of the type:
args.whom(). A function without any variant of a self argument is an "associated
function" and needs to be called via the type's namespace: Self::read. (Self is
a handy alias for the type we are implementing. It saves us from having to
remember the name. This is known as being ergonomic in Rust.)
The whom method uses pattern matching again to see if a filename option has
been used on the command line. The Option type represents either having some
value or not having a value.
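The same ingredients in a smaller, self-contained sketch. The Greeter type and its functions are invented for this illustration and are not part of the hello program.

```rust
struct Greeter {
    // None means the user did not supply a name.
    name: Option<String>,
}

impl Greeter {
    /// Associated function: no self argument, called as Greeter::new(...).
    fn new(name: Option<&str>) -> Self {
        Self {
            name: name.map(String::from),
        }
    }

    /// Method: takes &self, called as greeter.greeting().
    fn greeting(&self) -> String {
        // Pattern matching unpacks the Option if it holds a value.
        if let Some(name) = &self.name {
            format!("hello, {}", name)
        } else {
            Self::default_greeting()
        }
    }

    /// Another associated function, called via the Self alias above.
    fn default_greeting() -> String {
        "hello, world".to_string()
    }
}

fn main() {
    println!("{}", Greeter::new(Some("Rust")).greeting());
    println!("{}", Greeter::new(None).greeting());
}
```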
The Args::read function calls read, which is std::fs::read due to the
use for that above. The error value of that is "mapped" to the error type we
defined for this program. Likewise the attempt to convert the bytes read from
a file to a UTF-8 string can fail, and again we map the error.
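The mapping step can be tried in isolation, without any file I/O. In this sketch the decode function is my own helper, and the error type is trimmed down to one variant and written without thiserror so that it needs only the standard library.

```rust
use std::path::{Path, PathBuf};
use std::string::FromUtf8Error;

#[derive(Debug)]
enum HelloError {
    Utf8(PathBuf, FromUtf8Error),
}

/// Convert bytes that were read from `filename` into a trimmed string.
fn decode(filename: &Path, data: Vec<u8>) -> Result<String, HelloError> {
    // from_utf8 fails with FromUtf8Error. map_err wraps that in our own
    // error type, adding the filename, so that `?` can return it.
    let text = String::from_utf8(data).map_err(|e| HelloError::Utf8(filename.into(), e))?;
    Ok(text.trim().to_string())
}

fn main() {
    let path = Path::new("name.txt");
    println!("{:?}", decode(path, b"Rust\n".to_vec()));
    println!("{:?}", decode(path, vec![0xff, 0xfe]));
}
```

The closure passed to map_err runs only in the error case, so the filename is copied only when something has actually gone wrong.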
Full source for enterprise hello
The full src/main.rs for the final version of our enterprise version of
"hello, world" is below.
use clap::Parser;
use std::error::Error;
use std::fs::read;
use std::path::PathBuf;

fn main() {
    if let Err(err) = fallible_main() {
        eprintln!("ERROR: {}", err);
        // Walk the chain of underlying errors, if any.
        let mut err = err.source();
        while let Some(underlying) = err {
            eprintln!("caused by: {}", underlying);
            err = underlying.source();
        }
        std::process::exit(1);
    }
}

fn fallible_main() -> Result<(), HelloError> {
    let args = Args::parse();
    println!("Hello, {}!", args.whom()?);
    Ok(())
}

// Only reading can fail in this version: invalid UTF-8 is converted
// lossily below instead of being reported as an error.
#[derive(Debug, thiserror::Error)]
enum HelloError {
    #[error("failed to read file {0}")]
    Read(PathBuf, #[source] std::io::Error),
}

#[derive(Parser)]
struct Args {
    #[clap(short, long, help = "Who should be greeted?")]
    whom: Option<String>,
    #[clap(short, long, help = "Read name to greet from file")]
    filename: Option<PathBuf>,
}

impl Args {
    fn whom(&self) -> Result<String, HelloError> {
        if let Some(filename) = &self.filename {
            let data = read(filename).map_err(|e| HelloError::Read(filename.clone(), e))?;
            let name = String::from_utf8_lossy(&data).to_string();
            // Drop one trailing newline, if there is one.
            let name = name.strip_suffix('\n').unwrap_or(&name);
            Ok(name.to_string())
        } else if let Some(name) = &self.whom {
            Ok(name.to_string())
        } else {
            Ok("world".to_string())
        }
    }
}
Iterators
Iterators are quite common in Rust. Every for loop uses one, and explicit
use of iterators is idiomatic. You can implement your own iterators. The
standard library provides a trait, Iterator, for this, and implementing it is
also a good way to demystify traits.
The Iterator trait can be condensed to this:
trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}
This means whoever implements an iterator must specify two things:
- the type of the item returned for each iteration
- a method that produces the next item, if there is one
The iteration will end when next returns None. The type of items can be
anything. The important distinction is whether they're the values themselves (i.e.,
ownership of the value is passed on to the user of the iterator) or references to
values (e.g., to what is contained in a collection data structure).
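The standard library's Vec offers both kinds, which makes the distinction easy to see. The two helper functions below are made up for this sketch; iter and into_iter are the standard methods.

```rust
/// iter() yields references (&String); the caller keeps the strings.
fn lengths(names: &[String]) -> Vec<usize> {
    names.iter().map(|name| name.len()).collect()
}

/// into_iter() yields the values themselves (String) and consumes the vector.
fn shout(names: Vec<String>) -> Vec<String> {
    names.into_iter().map(|name| name.to_uppercase()).collect()
}

fn main() {
    let names = vec![String::from("alice"), String::from("bob")];
    println!("{:?}", lengths(&names)); // names is only borrowed here
    println!("{:?}", shout(names)); // names is moved here and gone afterwards
}
```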
We'll implement an iterator over a sequence of integers. Integers are easy to copy so we return the actual values, and not references to them.
struct Seq {
    goal: i32,
    next: i32,
}

impl Iterator for Seq {
    type Item = i32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.next < self.goal {
            let item = Some(self.next);
            self.next += 1;
            item
        } else {
            None
        }
    }
}
We can use this with a for loop to print numbers on a line:
for i in Seq::new(10) {
    print!("{} ", i);
}
println!();
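A for loop is not the only way to use the iterator. Because Seq implements the Iterator trait, the adaptor methods that the trait provides, such as filter, map, sum, and collect, work on it as well. The sketch below repeats the Seq type with a small constructor so that it is self-contained; the function sum_of_even_squares is made up for the example.

```rust
struct Seq {
    goal: i32,
    next: i32,
}

impl Seq {
    fn new(goal: i32) -> Self {
        Self { goal, next: 0 }
    }
}

impl Iterator for Seq {
    type Item = i32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.next < self.goal {
            let item = Some(self.next);
            self.next += 1;
            item
        } else {
            None
        }
    }
}

/// Sum the squares of the even numbers in 0..goal.
fn sum_of_even_squares(goal: i32) -> i32 {
    // filter, map, and sum are provided by the Iterator trait itself;
    // only next had to be written by hand.
    Seq::new(goal).filter(|n| n % 2 == 0).map(|n| n * n).sum()
}

fn main() {
    println!("{}", sum_of_even_squares(10));
}
```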
Full iterator example
fn main() {
    // 0, 1, 2, etc, through to 9, but not including 10
    for i in Seq::new(10) {
        print!("{} ", i);
    }
    println!();

    // -10, -9, etc, through to 9, but not including 10
    for i in Seq::range(-10, 10) {
        print!("{} ", i);
    }
    println!();
}

struct Seq {
    goal: i32,
    next: i32,
}

impl Seq {
    fn new(goal: i32) -> Self {
        Self {
            goal,
            next: 0,
        }
    }

    fn range(start: i32, goal: i32) -> Self {
        Self {
            goal,
            next: start,
        }
    }
}

impl Iterator for Seq {
    type Item = i32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.next < self.goal {
            let item = Some(self.next);
            self.next += 1;
            item
        } else {
            None
        }
    }
}
Memory management
Memory management is the perennial problem in programming, because there is always less memory than programmers want.
| Computer | Year | RAM | Explanation |
|---|---|---|---|
| PDP-7 | 1965 | 9.2 KiB | First Unix computer |
| Commodore 64 | 1982 | 64 KiB | Very common early microcomputer |
| Cray X-MP | 1982 | 128 MiB | First super computer in Finland |
| Linus' first PC | 1991 | 4 MiB | Linux was made on this |
| Nokia X10 | 2021 | 6 GiB | My phone at the time |
The Cray was used for scientific research and weather modeling. My phone is used for watching TV. Of the two, my phone has more CPU, RAM, storage, and network bandwidth.
Memory management approaches
There are roughly three ways to dynamically manage memory in programming:
The approach used by C: there is a way to allocate a chunk of memory, and to free it. This is as simple as it gets, and programmers get it wrong all the time. Both Microsoft and Google have reported that memory safety problems are the cause of roughly 70% of the serious security bugs in the software they develop in C and C++.
Motto: "Suffering builds character"
The approach used by Lisp, and most popular languages today: garbage collection. The programmer allocates memory and the language runtime frees it when there are no references to it anymore. This prevents most memory management problems at the cost of runtime performance. It's a fine approach, except for cases where a garbage collection delay may be catastrophic, for example in code controlling the brakes in a car.
Motto: "Things will usually... wait for it... work."
The approach used by Rust: the compiler knows at compile time when memory is allocated and when it is no longer used. The compiler inserts instructions to allocate and free memory accordingly. The language rules make it impossible to use memory unless the compiler knows its allocation status.
Motto: "Prove to me you manage memory correctly."
The Rust approach leads to a more complicated language, but also much higher runtime performance.
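You can observe where the compiler inserts the code to free a value by implementing the standard Drop trait, whose drop method runs at exactly that point. The Noisy type and the demo function in this sketch are made up for the illustration.

```rust
use std::cell::RefCell;

/// Records a message when it is dropped, i.e. when its memory is released.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the sequence of events, showing when each value was dropped.
fn demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
            log.borrow_mut().push("end of inner scope".to_string());
        } // _inner is dropped here
        log.borrow_mut().push("end of outer scope".to_string());
    } // _outer is dropped here
    log.into_inner()
}

fn main() {
    for event in demo() {
        println!("{}", event);
    }
}
```

There is no call to free anything in demo. The drops happen at the closing braces because that is where the compiler determined the values are no longer used.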
Ownership and borrowing
In Rust every value has exactly one owner. A simplistic explanation is that the variable (memory location) where the value is stored is the owner. When the owner stops existing, e.g., it is removed from the stack, the value can be freed.
Ownership can be transferred. For example, the value may be assigned (moved) to another variable. The original owner can then no longer be used to access the value.
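A short sketch of a move, with a made-up function consume. The commented-out line is the kind of code the compiler rejects.

```rust
/// Takes ownership of `text`; the String is freed when this function returns.
fn consume(text: String) -> usize {
    text.len()
}

fn main() {
    let greeting = String::from("hello");
    let moved = greeting; // ownership moves from `greeting` to `moved`
    // println!("{}", greeting); // would not compile: value used after move

    let copy = moved.clone(); // an explicit, independent copy
    let n = consume(moved); // `moved` is given away to the function
    // `moved` can no longer be used here, but `copy` can.
    println!("{} has {} bytes", copy, n);
}
```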
Values can be borrowed by creating references. Holding a reference does not mean you own the value. References can be mutable or immutable.
Rust has two rules for references:
- At any given time, you can have either one mutable reference or any number of immutable references.
- References must always be valid.
The compiler can check these rules at compile time. It will not compile a program unless it can verify the rules are followed. The part of the compiler that checks these rules is called the borrow checker.
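A minimal sketch of the two kinds of borrows, using made-up functions total and push_total. The comment in main marks a spot where the borrow checker would refuse a mutation.

```rust
/// Immutable borrow: can read the values but not modify them.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Mutable borrow: may modify the caller's vector.
fn push_total(values: &mut Vec<i32>) {
    let sum = total(values.as_slice());
    values.push(sum);
}

fn main() {
    let mut values = vec![1, 2, 3];

    let first = &values[0]; // an immutable borrow
    let also = &values; // a second immutable borrow is fine
    // Calling values.push(4) at this point would be rejected, because
    // `first` and `also` are still used on the next line.
    println!("{} {:?}", first, also);

    push_total(&mut values); // one mutable borrow, after the others ended
    println!("{:?}", values);
}
```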
The rules together provide the memory safety of Rust.
- You can't use memory before it is allocated.
- You can't use memory after it has been freed.
- You can't have data race conditions, where data is modified without some form of locking. Only one part at a time can mutate, and while it does that, no other part can read the data.
- There are no NULL pointers.
The rules don't prevent all bugs, but they eliminate memory management and concurrency problems quite effectively. They are the safety belts of programming.
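The same rules apply across threads. In the sketch below, handing a plain mutable reference to the counter to several threads would not compile, so the counter is wrapped in the standard library's Arc (shared ownership) and Mutex (locking). The function count_in_parallel is made up for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment a shared counter from several threads and return the result.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();

    for _ in 0..threads {
        // Each thread gets its own handle to the same counter.
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // The value can only be reached by taking the lock.
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_in_parallel(4, 1000));
}
```

Because the data lives inside the Mutex, there is no way to forget the lock: the type system only hands out access through it.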
Advice for self study
- Use clone liberally, if the borrow checker gets in your way. When you're learning, you have many things to worry about. You can get friendly with the borrow checker once you've learnt enough to do useful things.
- Use cargo fmt to format your code in the canonical way.
- Use cargo clippy to learn language idioms.
- The anyhow crate makes errors for applications easy. thiserror is better for anything that resembles a library.
- Learn to use and implement traits. They're deceptively simple.
- Take small steps. No, much smaller than that.
Acknowledgements
Several people have pointed out things to improve: Richard Braakman, Matthew Wilcox, Dagfinn Ilmari Mannsåker, MicroPanda123, mewsleah, Marcos Dione, and probably some I failed to add.