Rust Ownership, Borrowing, and Lifetimes
I’ve been learning Rust recently. This will probably be my third (lazy) attempt to learn the language. The reason I’ve failed previously is simply because I had no reason to learn it.
Other than the memory safety aspects, which I like a lot, I don’t actually like the design of the language at all (but that’s a conversation for another day).
This time around I want to learn the language as it’s pertinent to my job. A few months ago I started working for Fastly as a Staff Software Engineer in their Developer Relations team, and their powerful Compute@Edge platform has strong support for Rust.
I want to start building things on the Compute@Edge platform, so here I am giving Rust another go (at least until TinyGo support matures).
So I know from prior experience the real learning curve involved with this language is going to be its memory management model, which is broken down into three sections:
Understanding the first section (ownership) requires an understanding of ‘stack’ vs ‘heap’ memory, which you may be unfamiliar with depending on your programming experience and if you’ve used only high-level programming languages. So here’s a quick rundown…
Stack Memory
The stack is memory that is available to your program at runtime. Data that is assigned to variables or passed as arguments to function calls are allocated onto the stack in a ‘Last In, First Out’ (LIFO) model, much like a stack of plates.
The only type of data that can be stored on the stack is data that has a known and fixed size. All other ‘unknown’ data must go onto the heap.
The stack is very fast to access data because the data is close together, unlike heap memory, and also because the data that is stored is literal (i.e. primitive types like string literals, booleans, integers etc) it means the data itself can be hard coded into the compiled Rust binary.
NOTE: Rust primitive/scalar types (int, bool, float, char, string literal etc) are stored in stack memory.
Heap Memory
The heap is memory that is also available to your program at runtime but is much slower to access than stack memory. This is because it requires a ‘pointer’ to locate the stored data, of which the storage area is large and very unorganized.
The heap is for memory that grows or has unknown size (such as when accepting user input, you don’t know what data will be provided), and it will be allocated onto the heap by a ‘memory allocator’ that needs to first find a space in the heap large enough to hold the data, and then return a pointer to that space in the heap.
NOTE: Rust complex types (String, Box etc.) are stored into heap memory.
Ownership
The rules for ownership are quite simple:
- Data is assigned to a variable.
- The variable becomes the ‘owner’ of the data.
- There can only be one owner at a time.
- When the owner goes out of scope, the data will be dropped.
Primitive types are popped from stack memory automatically when they go out of scope (e.g. when a function block ends), while complex types must implement a drop
function which Rust will call when out of scope (to explicitly deallocate the heap memory).
So here are some of the gotchas that trip people up:
- Primitive types are copied (because it’s cheap to copy stack memory).
- Primitive types have a
Copy
trait that enable this behaviour. - Complex types move ownership.
- Complex types do not have a
Copy
trait.
As an example, consider the following code, which compiles correctly because we’re dealing with primitives and so the println!()
macro used is safely able to reference both the variable a
and b
.
fn main() {
let a = 123;
let b = a;
println!("a: {}, b: {}", a, b); // a: 123, b: 123
}
Now consider a similar example which doesn’t work because we’re dealing with a complex type (String
). The value assigned to a
is moved to b
. The b
variable has now become the new owner of the data, and this means a
is not allowed to be used again (e.g. we can’t reference it in println!()
).
fn main() {
let a = String::from("foo");
let b = a;
println!("a: {}, b: {}", a, b); // COMPILER ERROR
}
This will generate the following compiler error:
error[E0382]: borrow of moved value: `a`
--> src/main.rs:5:30
|
2 | let a = String::from("foo");
| - move occurs because `a` has type `String`, which does not implement the `Copy` trait
3 | let b = a;
| - value moved here
4 |
5 | println!("a: {}, b: {}", a, b);
| ^ value borrowed here after move
The only solution here is to manually copy the value using the .clone()
method of the String
type, which means b
no longer becomes the new owner of the data (the data itself is duplicated and so it’s new data that b
is the owner of):
fn main() {
let a = String::from("foo");
let b = a.clone();
println!("a: {}, b: {}", a, b); // a: foo, b: foo
}
Using clone()
will duplicate the heap memory, which isn’t cheap. This forces the programmer to opt into this more expensive behaviour, and is a performance/design decision most users of high-level programming languages don’t often have to think about.
It’s important to realise that passing a variable to a function will either move or copy (just as assignment does), and that even a function’s return value can transfer ownership in the same way. So for example, returning a complex type will move ownership to the caller (and the variable the returned value is assigned to becomes the new owner).
In that scenario the complex type’s drop
function is not called, as would normally be the case if a variable went out of scope at the end of a function (even if the original variable/owner was created within the function) as the compiler is able to determine that the value shouldn’t be dropped and instead is being moved to a new owner somewhere else in your program.
Borrowing
The concept of borrowing is designed to make dealing with ownership changes easier. It does this by avoiding the moving of owners. The way it does this is by letting your program provide a ‘reference’ to the data. This means the receiver of the reference (e.g. a function, struct field or a variable etc) can use the value temporarily without taking ownership of it.
To pass a reference instead of passing over ownership, all you have to do is prefix your variable with an ampersand:
fn main() {
let s = String::from("foo");
borrow(&s) // pass a 'reference' to s
}
fn borrow(s: &String) { // accept a 'reference type'
println!("s: {}", s);
}
In the above example, the s
argument variable of the borrow
function will be created on the stack, and will point to data in heap memory. When the borrow
function finishes and the s
variable is popped off the stack it won’t delete any data because it never owned the data that s
was pointing to.
The next thing you’ll likely want to do in your programs is to have functions that borrow data to be able to mutate it. That’s simple enough by using the mut
keyword.
Notice in the below example code we not only define the main
function’s s
variable to be mutable but we also have to change the reference in the call to borrow()
as well as the borrow
function’s signature to accept a mutable reference.
Also notice that after borrowing the value, we call take_ownership()
and we don’t pass a ‘reference’, meaning the function is the new owner of the data that was belonging to s
:
fn main() {
let mut s = String::from("foo");
println!("s: {}", s); // s: foo
borrow(&mut s);
println!("s: {}", s); // s: foobar
take_ownership(s);
println!("s: {}", s); // COMPILER ERROR (ownership of s was moved)
}
fn borrow(s: &mut String) {
s.push_str("bar");
}
fn take_ownership(s: String) {
println!("s: {}", s); // s: foobar
}
It’s also important at this point to understand that defining a variable as being ‘mutable’ and passing a ‘mutable reference’ are two different things. You can see in the below example code that we have said s
is mutable and then we pass it as an immutable reference to borrow_no_mut()
and then as a mutable reference to borrow_with_mut()
:
fn main() {
let mut s = String::from("foo");
borrow_no_mut(&s);
borrow_with_mut(&mut s);
println!("s: {}", s) // foobar
}
fn borrow_no_mut(s: &String) {
println!("s: {}", s) // foo
}
fn borrow_with_mut(s: &mut String) {
s.push_str("bar");
}
Gotchas
- You can’t take a reference and then modify the original variable’s value.
Here’s an example that doesn’t compile:
fn main() {
let mut x;
x = 42;
let y = &x;
x = 43;
println!("{:?}", y);
}
In the above example we define x
and assign the value 42
. Next we define y
and take a reference to x
(i.e. we ‘borrow’ it). Lastly we try to reassign a new value to x
while still holding a reference to it, which isn’t allowed because it would potentially invalidate the reference.
This would result in the following compiler error:
$ cargo run
Compiling chapter1 v0.1.0 (/Users/integralist/Code/rust/rust-for-rustaceans/chapter1)
warning: value assigned to `x` is never read
--> src/main.rs:5:5
|
5 | x = 43;
| ^
|
= note: `#[warn(unused_assignments)]` on by default
= help: maybe it is overwritten before being read?
error[E0506]: cannot assign to `x` because it is borrowed
--> src/main.rs:5:5
|
4 | let y = &x;
| -- borrow of `x` occurs here
5 | x = 43;
| ^^^^^^ assignment to borrowed `x` occurs here
6 | println!("{:?}", y);
| - borrow later used here
For more information about this error, try `rustc --explain E0506`.
warning: `chapter1` (bin "chapter1") generated 1 warning
error: could not compile `chapter1` due to previous error; 1 warning emitted
To solve this problem you need y
to fall out of scope (or alternatively create a function so that when the function is called with a reference the reference drops out of scope at the end). This can be done using block scope syntax like so:
fn main() {
let mut x;
x = 42;
{
let y = &x;
println!("{:?}", y); // 42
}
x = 43;
println!("{:?}", x); // 43
}
Another gotcha:
- You can have only one mutable reference (i.e. this prevents data races).
…unless! the scope allows for it.
So here is an example where it isn’t allowed:
fn main() {
let mut s = String::from("foo");
let a = &mut s;
borrow(a);
let b = &mut s;
borrow(b);
println!("a: {}", a); // COMPILER ERROR
println!("b: {}", b);
}
fn borrow(s: &mut String) {
s.push_str("bar");
}
To make this example work we need the scope rules to allow for it, which means moving the first mutable reference assignment into its own block where the end of the newly defined block’s scope will cause a
to be dropped:
fn main() {
let mut s = String::from("foo");
{
let a = &mut s;
borrow(a);
println!("a: {}", a); // foobar
} // <-- a is dropped
let b = &mut s;
borrow(b);
println!("b: {}", b); // foobarbar
}
fn borrow(s: &mut String) {
s.push_str("bar");
}
Another gotcha:
- You cannot have a mutable reference and an immutable reference.
Here’s an example where the compiler complains:
fn main() {
let mut s = String::from("foo");
let a = &s; // immutable reference
let b = &mut s; // mutable reference
borrow(b);
println!("a: {}", a); // COMPILER ERROR
println!("b: {}", b);
}
fn borrow(s: &mut String) {
s.push_str("bar");
}
Multiple immutable references are safe because you’re only able to read the data and not mutate it, but you cannot have an immutable reference while also holding a mutable reference because this otherwise could change the value of the immutable reference (and that would be unexpected for the part of the program using the immutable reference).
The only way this would be allowed is if the immutable reference goes out of scope before the mutable reference(s) were assigned:
fn main() {
let mut s = String::from("foo");
{
let a = &s;
println!("a: {}", a); // foo
} // <-- immutable reference `a` is dropped
let b = &mut s;
borrow(b);
println!("b: {}", b); // foobar
}
fn borrow(s: &mut String) {
s.push_str("bar");
}
One last gotcha:
- You can’t return a function defined variable as a reference.
Here’s what that might look like:
fn main() {
let r = return_ref();
println!("r: {}", r); // foo
}
fn return_ref<'a>() -> &'a String {
let s = String::from("foo");
return &s; // COMPILER ERROR
}
NOTE: the
<'a>
and&'a
syntax will be explained in the next section called “Lifetimes”.
The above code will cause the following compiler error:
error[E0515]: cannot return reference to local variable `s`
--> src/main.rs:8:12
|
8 | return &s; // COMPILER ERROR
| ^^ returns a reference to data owned by the current function
If you look at the compiler explanation for the error (rustc --explain E0515
) it describes the reason, and the solution:
Local variables, function parameters and temporaries are all dropped before the end of the function body. So a reference to them cannot be returned. Consider returning an owned value instead.
So the solution to this problem is to ‘move’ ownership of the data to whoever is calling the function, like so:
fn main() {
let r = return_ref();
println!("r: {}", r); // foo
}
fn return_ref() -> String {
let s = String::from("foo");
return s;
}
In the above example the r
variable is now the new owner of the data.
Lifetimes
Lifetimes are tightly coupled to ‘references’.
A ‘lifetime’ is how long a reference lives for, and the compiler wants to be sure that any reference that is currently active doesn’t refer to data that no longer exists (i.e. a ‘dangling pointer’).
The compiler needs a way to track a reference so it can be sure the reference lives long enough for no errors to occur. To achieve this goal, the Rust compiler uses a “borrow checker” to validate a reference’s lifetime.
So how do we identify the lifetime of a reference? Well, it begins when the reference is created and it ends when the reference is last used (you might have expected it to be when the reference was dropped/out of scope, which would be incorrect).
What does a lifetime look like? It’s just an annotation that has a specific naming convention: '<T>
where <T>
is a letter like 'a
or 'b
. The letters don’t mean anything special, they’re just a way for the compiler to track a reference.
The Rust online book has a good example of this, where they define a function that accepts two arguments x
and y
(both of type &str
, i.e. a string literal) and depending on the length of the given strings you’ll get back either the x
or y
string.
Without the lifetime feature this would be a problem for the compiler because it wouldn’t be able to statically determine which variable (x
or y
) is going to be returned. That can only be determined at runtime not compile time.
The below code example highlights how defining a single lifetime called 'a
and assigning it to both arguments (and to the return value) allows the compiler to track these references and ensure they both live long enough to prevent any errors at runtime.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
The above code states all the references in the signature must have the same lifetime, and it tells the borrow checker it should reject any values that don’t adhere to these constraints.
This means that if either of the arguments x
or y
don’t live long enough to be used safely, the compiler will let you know about it.
Here is an example that demonstrates the potential error when the code is poorly designed:
fn main() {
let string1 = String::from("a very long string");
let result;
{
let string2 = String::from("short string");
result = longest(string1.as_str(), string2.as_str());
}
println!("The longest string is {}", result);
}
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
The above code will cause the following compiler error:
error[E0597]: `string2` does not live long enough
--> src/main.rs:6:44
|
6 | result = longest(string1.as_str(), string2.as_str());
| ^^^^^^^ borrowed value does not live long enough
7 | }
| - `string2` dropped here while still borrowed
8 | println!("The longest string is {}", result);
| ------ borrow later used here
Conclusion
Although I’m very new to these concepts that Rust defines, I get the feeling that although understanding them at a high-level (as I’ve described them in this post) is reasonably easy. Being able to fully appreciate them and more importantly getting comfortable with them is just something that’s going to take time.
Let me know if this helped you. Reach out to me on twitter.