Explore the ownership system in Rust

simfoo · on Jan 20, 2015

Am I crazy to say I prefer the ownership and copy/move system of C++11? It feels much easier to reason about but then again maybe that's just because I'm more familiar with it.

dan00 · on Jan 20, 2015

> It feels much easier to reason about but then again maybe that's just because I'm more familiar with it.

I've been a professional C++ programmer for a decade and have started to look into Rust, and there's no way that C++ is easier to reason about.

There's no automatic copying of complex data in Rust, you can't move something away and later use it again, you can't invalidate iterators, and the list goes on and on.
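
A minimal sketch of those three points, assuming current stable Rust rather than the January 2015 alpha. The helper names `consume` and `demo` are made up for illustration; the commented-out lines are the ones the compiler rejects.

```rust
// Hypothetical helper: takes ownership of the vector and returns its length.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

fn demo() -> (usize, usize, Vec<i32>) {
    let a = vec![1, 2, 3];

    // Deep copies are never implicit: ask for one with clone().
    let b = a.clone();

    // Passing `a` by value moves it; `a` is unusable afterwards.
    let moved_len = consume(a);
    // let again = a.len(); // rejected: `a` was moved into `consume`

    // Iterator invalidation is rejected at compile time.
    let mut c = b;
    // for x in c.iter() { c.push(*x); } // rejected: `c` is borrowed by the iterator
    let snapshot: Vec<i32> = c.iter().map(|x| x * 10).collect();
    c.extend(snapshot); // fine: the iterator over `c` has already ended

    (moved_len, c.len(), c)
}

fn main() {
    let (moved_len, len, c) = demo();
    println!("{} {} {:?}", moved_len, len, c);
}
```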

At the beginning you have to get used to the borrow system of Rust, learn the idioms, but after that, it really seems to be in a lot of parts a better C++.

exDM69 · on Jan 20, 2015

> Am I crazy to say I prefer the ownership and copy/move system of C++11?

Yes you are, and the only reason for that is that Rust's system is checked by the compiler and C++'s is not.

It's really easy to screw up with C++, forgetting to make something noncopyable and then causing double free or use after free issues.

sanxiyn · on Jan 20, 2015

It's easier (well, certainly simpler) and less powerful. The system certainly can be simplified a lot if you don't need memory safety.

twic · on Jan 20, 2015

Perhaps it's easier in the sense of "less arduous", but ultimately not easier in the sense of "more productive". The C++ model is less spiky and painful, but just doesn't let you be as confident about your program as the Rust model.

djur · on Jan 20, 2015

This was very clear and made sense to me, but it unfortunately doesn't cover lifetime annotations, which make sense to me 80% of the time and are completely mystifying the other 20%.

Animats · on Jan 20, 2015

If you've written some Rust, and you're more confused after reading that, it's not your fault.

Here's how to think about ownership in Rust. First, realize that Rust has basically the same memory model as C/C++. C/C++ has three big sources of memory trouble: "how big is it", "who owns it", and "who locks it". C/C++ provides little help in dealing with those issues. Rust locks down all those problems, mostly through compile-time checks.

Ownership in Rust starts out simple. There's single-ownership, and multiple-ownership with reference counting. The latter comes in two flavors, with and without concurrency locking. Reference counting works roughly the way you think it should.
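
As a rough illustration of those flavors in current stable Rust: `Box` for single ownership, `Rc` for single-threaded reference counting, and `Arc` for the thread-safe variant. Note that `Arc` only makes the count atomic; the locking of the shared data in this sketch comes from a separate `Mutex`. The function name is hypothetical.

```rust
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

// Returns (Rc strong count while two handles exist, final counter value).
fn ownership_flavors() -> (usize, i32) {
    // Single ownership: the Box owns its heap value and frees it when dropped.
    let single: Box<i32> = Box::new(1);

    // Shared ownership, single-threaded: a plain (non-atomic) reference count.
    let shared = Rc::new(*single);
    let shared2 = Rc::clone(&shared);
    let count = Rc::strong_count(&shared);
    drop(shared2);

    // Shared ownership across threads: atomic count, plus a Mutex for mutation.
    let counter = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();

    (count, total)
}

fn main() {
    println!("{:?}", ownership_flavors());
}
```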

In addition to ownership, there's "borrowing". This usually means creating a local reference to something. The local reference must have a scope that doesn't outlive the thing being borrowed. That makes borrowing safe. Borrowing is a compile-time thing with no run-time representation. Borrowed objects with reference counts don't need reference count updates on the borrowed reference, which speeds things up a lot. Borrowing in Rust is cheap and easy, and should be done frequently.
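
A small sketch of that last point, assuming current stable Rust: borrowing the contents of an `Rc` does not touch its reference count. The names `total` and `borrow_demo` are illustrative only.

```rust
use std::rc::Rc;

// Borrowing: the function sees the data but never owns it.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Returns (sum, count before the borrow, count after the borrow).
fn borrow_demo() -> (i32, usize, usize) {
    let shared: Rc<Vec<i32>> = Rc::new(vec![1, 2, 3]);
    let before = Rc::strong_count(&shared);

    // `&shared` coerces to `&Vec<i32>` and then to `&[i32]`; no Rc::clone
    // happens, so the reference count is not updated.
    let sum = total(&shared);

    let after = Rc::strong_count(&shared);
    (sum, before, after)
}

fn main() {
    println!("{:?}", borrow_demo());
}
```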

When a reference to something is passed into a function, the compiler needs to know if it's being borrowed, or whether ownership is being handed off to the function. The default is to borrow; for more complex situations, there's special lifetime syntax. A similar issue appears when a function returns a reference. Returning a reference is complicated - is it a new object, or a borrowed reference to an input object? Creating new objects in functions and returning them should be avoided if possible. It's a Rust idiom to create the object in the caller and pass it into a function to be modified.
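
A hedged sketch of how this looks in current stable Rust, where the parameter type in the signature decides between a hand-off (`String`) and a borrow (`&str`), and a lifetime parameter ties a returned reference to its inputs. All four function names are invented for the example.

```rust
// Takes ownership: the caller cannot use the String afterwards.
fn consume(s: String) -> usize {
    s.len()
}

// Borrows: the caller keeps ownership.
fn inspect(s: &str) -> usize {
    s.len()
}

// Returns a borrowed reference to one of the inputs. The lifetime `'a`
// tells the compiler the result may not outlive either argument.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// Returns a brand-new owned value; no lifetime annotation is needed.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = String::from("hello");
    let other = String::from("hi");
    println!("{}", inspect(&owned));
    println!("{}", longest(&owned, &other));
    println!("{}", shout(&owned));
    println!("{}", consume(owned));
    // println!("{}", owned); // rejected: `owned` was moved into `consume`
}
```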

Single ownership can be handed off to another owner. This, after some controversy, is the default action for the "=" operator. (Using "<-" for ownership transfer probably would have been better.) Such a handoff invalidates the variable that gave up ownership, and that variable can't be used again. For some types, though, you get a copy instead. For other types, you have to explicitly ask for a copy. This part of the language is kind of ugly.
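
To make the three cases concrete, here is a minimal sketch in current stable Rust. `Point` is a hypothetical type that opts in to copying via `#[derive(Clone, Copy)]`; `String` is a standard non-`Copy` type.

```rust
// A small plain-data type that opts in to implicit copies.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn assign_demo() -> (Point, Point, String, String) {
    // Copy type: `=` duplicates the value and both variables stay usable.
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1;

    // Non-Copy type: `=` hands ownership to the new variable.
    let s1 = String::from("bob");
    let s2 = s1;
    // println!("{}", s1); // rejected: `s1` was moved into `s2`

    // An explicit deep copy needs clone().
    let s3 = s2.clone();

    (p1, p2, s2, s3)
}

fn main() {
    println!("{:?}", assign_demo());
}
```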

Rust has the concept of "mutability". This is just the inverse of "const". Since immutability is the default, you write "mut" in a lot of places.

Programs in Rust need more design effort than in some other languages. You need to plan out who's going to own what, and who gets to change what. "Agile" types may find this troublesome. The payoff is that once the program has compiled, whole classes of errors have been eliminated.

sanxiyn · on Jan 20, 2015

Historically, Rust used to have separate operators for copy and move (= vs <-). It was an ergonomic disaster. You may find the current state ugly, but I assure you that if you actually used it, you would have found what you suggested even uglier.

This is one example of Rust failing initial design because developers followed intuition. They thought the same "it would be better to use a separate operator for ownership transfer!" and ignored existing research in the area. The fact is that every single language that has linear types does ownership transfer by default, based on type.

The experience was reported here for posterity: http://smallcultfollowing.com/babysteps/blog/2012/10/01/move...

smosher_ · on Jan 20, 2015

> but I assure you that if you actually used it, you would have found what you suggested even uglier.

It's not fair to make that judgment for someone, particularly because people using Rust at the time weren't complaining much about how ugly it was. At the time it was questioned whether the change was desirable, and that very blog post braces for opposition under "Wait, doesn’t this make it hard to know what your program does?"

Fwiw, that's one instance where unnecessary information/features were removed that I would rather have seen preserved.

sanxiyn · on Jan 20, 2015

I exaggerated for effect. I am sorry, and I apologize.

Being a long-time Rust user, it is annoying to read "why doesn't Rust do X instead of Y?", insinuating X is clearly better than Y and they can't believe Rust developers haven't considered the obvious alternative. Because it is very frequently (estimate: >50%) the case that Rust did do X, and changed to Y only after lots of actual experience and deliberation. For most design decisions people complain about in Rust, Rust tried them both ways. In a few cases, changes were reverted after trying both ways, but usually changes stuck, because they were made carefully.

dan00 · on Jan 20, 2015

> Fwiw, that's one instance where unnecessary information/features were removed that I would rather have seen preserved.

If the information isn't necessary to reason about your program, or to write a correct and performant program, then you just have more information to consume, and instead of the unnecessary information you could be consuming other, necessary information.

smosher_ · on Jan 20, 2015

The information isn't necessary for the compiler to reason about the program. The information is, in my opinion, useful for humans to reason about the program. The fact is I would reason about programs quicker if this was explicit.

There isn't any more information to consume. The information is exactly the same. It's just different, which is why I think it should look different.

drapper · on Jan 20, 2015

> Creating new objects in functions and returning them should be avoided if possible. It's a Rust idiom to create the object in the caller and pass it into a function to be modified.

> immutability is the default

Haven't yet written any Rust code, but aren't those two things contradicting each other? I'd have thought that with immutability the most natural way to operate is to write pure functions (for the uninitiated: a pure function doesn't modify anything, it can only produce new things).

dan00 · on Jan 20, 2015

> Haven't yet written any Rust code, but aren't those two things contradicting each other?

I don't think so.

One way to ensure that data isn't mutated by different parts of a program, is to have immutable data structures.

The other way - the Rust way - is to seriously track who holds mutable references to the same data and just not allow it.

So you can have more optimized mutable data structures, but can be sure, that if you're holding a mutable reference, that no other can modify the data during the lifetime/scope of your mutable reference.
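
A minimal sketch of that guarantee, assuming current stable Rust. The function names are illustrative; the commented-out line shows the kind of overlap the compiler refuses.

```rust
// Mutates the caller's buffer in place through an exclusive borrow.
fn add_exclamation(s: &mut String) {
    s.push('!');
}

fn exclusive_demo() -> String {
    let mut greeting = String::from("hello");

    {
        let m = &mut greeting; // exclusive borrow starts
        // let r = &greeting;  // rejected: `m` is still in use below
        add_exclamation(m);
    } // exclusive borrow has ended

    // Any number of shared borrows may coexist once no `&mut` is live.
    let r1 = &greeting;
    let r2 = &greeting;
    assert_eq!(r1, r2);

    greeting
}

fn main() {
    println!("{}", exclusive_demo());
}
```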

MichaelMoser123 · on Jan 20, 2015

Similar to C++11 - here they added some smart pointers

  std::shared_ptr - 'owner' smart pointer with reference counter

  std::weak_ptr - borrowing thing (non-owning smart pointer)

  std::unique_ptr - exclusive ownership (transferable)

Well, reference counting has its problems, like circular reference chains that cause leaks; one is supposed to break these cycles with borrowing (non-owning) smart pointers (it still sucks). Also, in shared_ptr the reference count is updated with atomic operations even if you only use it from one thread, and there is no way to change that (I think that's an oversight).
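
For comparison, a sketch of the same cycle-breaking pattern in current stable Rust, where `Rc` plays the role of a (non-atomic) `shared_ptr` and `Weak` the role of `weak_ptr`; Rust's atomic variant is the separate `Arc` type. The `Node` type and `build_tree` function are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node: children are owned, the parent link is non-owning.
struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

// Returns (strong count of parent, weak count of parent, name seen via the child).
fn build_tree() -> (usize, usize, String) {
    let parent = Rc::new(Node {
        name: String::from("parent"),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        name: String::from("child"),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });

    // Parent owns the child (strong); child only observes the parent (weak),
    // so no strong cycle exists and both nodes are freed on scope exit.
    parent.children.borrow_mut().push(Rc::clone(&child));
    *child.parent.borrow_mut() = Rc::downgrade(&parent);

    // A weak pointer must be upgraded before use; it yields None if the
    // target is already gone.
    let seen = child
        .parent
        .borrow()
        .upgrade()
        .map(|p| p.name.clone())
        .unwrap_or_default();

    (Rc::strong_count(&parent), Rc::weak_count(&parent), seen)
}

fn main() {
    println!("{:?}", build_tree());
}
```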

However, what really sucks in C++ are the default constructor, copy constructor, and similar implicitly generated operations. This was done so that C structures behave like classes; most of the C++ goodness is due to these backward-compatibility requirements with C.

azth · on Jan 20, 2015

> Creating new objects in functions and returning them should be avoided if possible.

Isn't that how "constructors" in Rust work (i.e. `Foo::new()` calls)? Plus, with return value optimization, it should be pretty performant.
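
For reference, a minimal sketch of the `Foo::new()` convention in current stable Rust. `Foo` and its fields are invented for the example; the point is only that `new` is an ordinary associated function that builds a value and returns it by value, which transfers ownership to the caller rather than deep-copying.

```rust
#[derive(Debug, PartialEq)]
struct Foo {
    id: u32,
    label: String,
}

impl Foo {
    // By convention an associated function named `new` plays the role of a
    // constructor. It builds the value and returns it by value (a move).
    fn new(id: u32, label: &str) -> Foo {
        Foo { id, label: label.to_string() }
    }
}

fn main() {
    let foo = Foo::new(7, "seven");
    println!("{:?}", foo);
}
```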

lohankin · on Jan 20, 2015

Noisy, unreadable code, obsessed with ownership, may give rise to whole new classes of errors that otherwise won't be there. Good that someone experimented with building a language around ownership, but for me it all demonstrates that the idea is not viable. I tried to read 10 tutorials already, each leaving me more confused than ever before.

Animats · on Jan 20, 2015

> I tried to read 10 tutorials already, each leaving me more confused than ever before.

I can understand that. The language has been changing so fast that most of the stuff on the Web is outdated or wrong. I just found a place in the official reference manual that's out of sync with the compiler, and sent in a note. (The lambda/closure syntax is still in a state of flux.)

Rust roughly follows C/C++/Java syntax, except that, like Go, it's a "name: type" language. Talking about ownership in code is new, as is the syntax for that. The ownership syntax is a bolt-on, and it shows. The lambda syntax is kind of weird; the designers went for conciseness over clarity. Some of this will take getting used to. The syntax is no worse than C++, and in some areas, better.

The syntax isn't the hard part, anyway. It's living within the ownership restrictions that's hard. It's hard because that's a design issue. There's a Rust port of Doom, and it has far too much unsafe code, because Doom's data structures were not designed with Rust in mind. It's not yet clear how big an issue this will be. The remark that 10% of Servo, the web page renderer, is unsafe code is a bad sign.

I don't have a personal opinion on this yet. I haven't written enough Rust code. I'm writing an RSS feed parser to get a sense of what it's like to deal with a complex tree in Rust. So far, it's going well, with no need for unsafe code.

If Rust can eliminate buffer overflows, dangling pointers, and memory leaks, that's enough to justify using it in place of C/C++.

dbaupp · on Jan 20, 2015

Where did the 10% number come from?

FWIW, I just looked at the code in the mozilla/servo repository: more than half (1291) of all mentions of 'unsafe' (2150) are inside autogenerated bindings to CEF (Chromium Embedding Framework), which doesn't seem at all like the interesting/relevant part of the application (it's providing an optional "minimal working" browser chrome for the rendering engine). A lot of the other uses are similarly for other FFI tasks, interfacing with libraries like FreeType, HarfBuzz and Spidermonkey. Obviously it would be nice for those libraries to be in Rust and thus avoid the FFI-is-always-unsafe problem, but the ecosystem is still young and those libraries are seriously non-trivial. Mozilla and the Servo team presumably want to be able to do the interesting parts of their experimentation rather than spending years porting all the low-level libs to Rust first.

For some "success stories", Servo's CSS parser is now pure Rust, and has a single `unsafe` block: https://github.com/servo/rust-cssparser/blob/38877bc0a50a64a... . Similarly, the pure Rust HTML parser written for Servo doesn't have much unsafe (certainly very little compulsory unsafe, most exists in an alternate DOM implementation, or in the C interface the library exposes): https://github.com/servo/html5ever

That said, Rust is definitely not perfect in this regard. Three of the major failings with Rust that force Servo to have unsafe code are insufficient data-parallel libraries, lack of inheritance-like features to allow the DOM to be CPU and memory efficient, and lack of garbage collection hooks in the compiler.

> The ownership syntax is a bolt-on, and it shows.

Could you go into more detail about what you mean by this?

wycats · on Jan 20, 2015

For what it's worth, I've written two fairly large pieces of code in Rust: Skylight and much of Cargo. In both cases, the amount of unsafe code was very small (tiny) and restricted to FFI boundaries, which are "unsafe" even in high level languages like Ruby.

Servo has been under active development for far longer, and sometimes used unsafe code to work around historical issues that no longer require unsafe code. I find the idea of an idiomatic codebase (written today), that is 10% unsafe code (outside of FFI boundaries) to be difficult to imagine.

eddyb · on Jan 20, 2015

Mentioning Go for `name: type` may not be inaccurate, but it is an odd choice (we're seeing more comparisons of Rust and Go than Rust and any language it is closely related to). I believe the origin for Rust's choice is actually the ML family. EDIT: I just checked more carefully and Go doesn't even have the colon ([this blog post](http://blog.golang.org/gos-declaration-syntax) used it in pseudocode).

Animats · on Jan 20, 2015

Sorry. "name : type" comes from Pascal, and the Modula/Ada/Oberon family of languages continue it. Now that type expressions have become so complex, it's better to have a syntax where you know a type expression is expected. C/C++ struggle with this; they have to know which names are types just to parse. This is a pain for tools which try to parse C/C++ without reading all the include files. C originally had only built-in type names and "struct". "typedef" came later, and made the syntax context-dependent.

adamnemecek · on Jan 20, 2015

> obsessed with ownership, may give rise to whole new classes of errors that otherwise won't be there

How does baking the ability to statically determine resource ownership into a language lead to a class of errors? What sort of errors are these?

lohankin · on Jan 20, 2015

By taking away the programmer's ability to read and understand code written by somebody else, or by himself one week ago.

wycats · on Jan 20, 2015

As someone who has written many thousands of lines of production Rust code, this doesn't ring true to me at all. Ownership semantics have dramatically improved my ability to reason about my code, not the opposite.

adamnemecek · on Jan 20, 2015

I have no clue what you are talking about and I don't think that you have either so I'll just leave this discussion now.

Retra · on Jan 20, 2015

Have you written any code?

jonalmeida · on Jan 20, 2015

I found the rust-lang guide (now the rust-lang book[1]) very useful when first learning rust. The style used to teach ownership in that guide is very similar to this, but helped clear out a few remaining doubts I've had.

Consider contributing to the book!

[1]: http://doc.rust-lang.org/book/README.html

steveklabnik · on Jan 20, 2015

Thanks for the kind words. I certainly plan on expanding it.

zaroth · on Jan 20, 2015

I have zero experience in Rust, so maybe I shouldn't comment on an intermediate tutorial, but just some thoughts. I'm sure the answers to most of this are out there, this is more stream of consciousness...

"let i = 1", so let isn't picky about types, and types can be implied. In the 'fn foo(i: i64)' we see the "<name>: <type>" syntax, which I'm not an immediate fan of, but no big deal.

We have value types, defined as a 'struct' with a 'Copy' flag. The "impl Copy for Info {}" syntax seems weird, and the "#[derive(Copy)]" only slightly better. Also, why does the struct have a hanging comma after the last (only) member?

Next we add some methods to the Bob struct. Seems like there are multiple ways, depending on the type of method? 'new' is just nested inside an 'impl Bob' but 'Drop' is more explicitly nested inside 'impl Drop for Bob' and then a lowercase 'drop' function. Huh.

Inside the 'new' function we see the first use of return values. And starting to catch on that Rust likes chatty syntax, e.g. the '->' before the return type. But yet, a very implicit structure to both initializing and returning the new 'Bob' object! I get the sinking feeling the 'name: name.to_string()' in the Bob initializer is important, and that '{ name: name }' would sadly not work... Not loving the underscore in method names.

I guess the 'drop' function is a first example of an instance method, identified by having '&self' or in this case '&mut self'. For a function that internally doesn't explicitly mutate bob, it's curious to see the &mut in that case. I'm going to guess 'drop' is a special case which always must be 'mut'.

The next step is "make bob value format-able when outputting to console" which sounds an awful lot like implementing to_string() but is accomplished through the 'Show::fmt' trait which is implemented using 'fmt::Show'. Is it really called the "Show::fmt" trait, and not the "fmt::Show" trait?

Here's where I like the syntax less and less: "fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result". Why must 'f' be 'mut' here? Why is the '&mut' before the type and not before the name? The one colon for name/value versus two colons for namespace is starting to hurt. Why a double colon instead of a dot? Also, I have to figure out what all these ampersands are actually doing... they are notably absent from the 'black_hole' function.

Skipping ahead... owned values can be mutated, you just have to flag it with 'let mut'. Almost wish that was simply 'met' vs 'let', or even 'mlet'. Any owner can flag their value mutable, it doesn't have to start that way.

Then we get to "fn mutate(mut value: Bob) { ... }". Earlier we saw the 'fmt' function apply 'mut' before the type definition, here mut keyword is before the variable name. Is this the same, or something different happening? It seems weird there's not just one place to put the 'mut' flag.

BTW, can you really not just do 'bob.name = "mutant";' versus 'bob.name = String::from_str("mutant");'? Up until this point string handling seems reasonably sane.

Further down we start putting things on the heap instead of the stack. In 'let bob = Box::new(Bob::new("A"));' -- do you always have to 'Box::new' to get to the heap, or is there a way to define Bob as always living on the heap, i.e. 'class' vs 'struct'? ... I think I spoke too soon, because further down when it takes all of 'Rc::new(RefCell::new(Bob::new("A")))' to simply have a stack pointer to a mutable instance on the heap, I think I'm ready to bolt.

It's an impressive level of control over the semantics of how objects are created and destroyed without having to explicitly control every aspect of memory management, and there do seem to be a lot of follow-on benefits for going through all this trouble, but perhaps by 2.0 they will have made strides to improve the syntax to something a bit less obnoxious.

Twisol · on Jan 20, 2015

> Next we add some methods to the Bob struct.

`impl Bob` defines methods inherent to the type; `impl Drop for Bob` implements methods defined by the Drop trait. (Think Java interfaces.)

> I get the sinking feeling [...]

`{name: name}` works just fine.

> I'm going to guess 'drop' is a special case [...]

Since the signature for `drop` is defined elsewhere, by the `Drop` trait, we must use `&mut`, regardless of if we actually mutate `self`.

> Why must 'f' be 'mut' here?

This time the trait is `Show`, and the same reason as for `Drop` applies. Now, however, we're actually using the fact that it's `mut`: you can't write to an immutable formatter.
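
A sketch of the three `impl` blocks discussed so far, written for current stable Rust. One assumption to flag loudly: today's standard library has no `fmt::Show`, so this example uses `fmt::Display` in its place. The `Bob` type follows the article's naming, but the bodies are illustrative, not the article's code.

```rust
use std::fmt;

struct Bob {
    name: String,
}

// Inherent methods: defined directly on the type.
impl Bob {
    fn new(name: &str) -> Bob {
        // `name` is a borrowed &str; the field is an owned String.
        Bob { name: name.to_string() }
    }
}

// Trait methods: the signature `fn drop(&mut self)` is fixed by the Drop trait.
impl Drop for Bob {
    fn drop(&mut self) {
        println!("destroying {}", self.name);
    }
}

// The Formatter is `&mut` because writing to it changes its state.
impl fmt::Display for Bob {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Bob({})", self.name)
    }
}

fn main() {
    let bob = Bob::new("A");
    println!("{}", bob); // prints "Bob(A)"
} // `bob` goes out of scope here and prints "destroying A"
```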

> Why is the '&mut' before the type and not before the name?

Because mutability and ownership is information carried by the type, not by the value.

> It seems weird there's not just one place to put the 'mut' flag.

You use `mut` to define whether a stack-allocated value can be mutated, and also to define whether a reference can be used to mutate its referent. Function parameters are stack-allocated, so you can put `mut` before them in the parameter list.
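
A small sketch of the two places `mut` can appear on a reference parameter, assuming current stable Rust. The function names are hypothetical.

```rust
// `v: &mut T`: may write through `v`, may not re-point `v`.
fn write_through(v: &mut i32) {
    *v += 1;
}

// `mut v: &T`: may re-point `v`, may not write through it.
fn pick_larger<'a>(mut v: &'a i32, other: &'a i32) -> &'a i32 {
    if *other > *v {
        v = other; // re-binding the local reference
    }
    // *v += 1; // rejected: `*v` is behind a shared `&` reference
    v
}

// `mut v: &mut T`: may do both.
fn write_then_repoint<'a>(mut v: &'a mut i32, other: &'a mut i32) -> &'a mut i32 {
    *v += 1; // writes to the first target
    v = other; // now points at the second target
    *v += 10;
    v
}

fn main() {
    let mut a = 1;
    write_through(&mut a);
    println!("{}", a);
    println!("{}", pick_larger(&3, &5));
    let (mut p, mut q) = (1, 100);
    println!("{}", write_then_repoint(&mut p, &mut q));
}
```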

> Up until this point string handling seems reasonably sane.

`String` is a resizable string. `str` is not. It's a bit like Vec<T> vs. [T] - see http://cosmic.mearie.org/2014/01/periodic-table-of-rust-type... .

> I think I spoke too soon, because further down [...]

That incantation effectively disables Rust's compile-time ownership checking and replaces it with a runtime-checked structure. RefCell implements `&mut` semantics at runtime, and Rc allows multiple owners of the same data. There are often ways to avoid needing these constructs, and it's considered a code smell, so the incantation is long for a good reason. (If you really want a shorter version, you can use `unsafe {}` and raw pointers. Or C++.)

You can have a mutable heap-allocated value simply by using `Box::new` and marking the stack variable as `mut`: `let mut foo = Box::new(Bob::new("A"))` .
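
To contrast the two approaches, here is a hedged sketch in current stable Rust: a `mut` binding to a `Box` (checked at compile time) next to `Rc<RefCell<...>>` (checked at run time). `Bob` is reduced to a single field and `heap_demo` is an invented name.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Bob {
    name: String,
}

fn heap_demo() -> (String, String, bool) {
    // One owner, checked at compile time: a mutable binding to a Box is enough.
    let mut boxed = Box::new(Bob { name: String::from("A") });
    boxed.name.push('!');

    // Several owners, checked at run time: Rc shares, RefCell guards mutation.
    let shared = Rc::new(RefCell::new(Bob { name: String::from("B") }));
    let other = Rc::clone(&shared);
    other.borrow_mut().name.push('?');

    // While one mutable borrow is live, a second one is refused at run time.
    let first = shared.borrow_mut();
    let second_refused = other.try_borrow_mut().is_err();
    drop(first);

    let shared_name = shared.borrow().name.clone();
    (boxed.name.clone(), shared_name, second_refused)
}

fn main() {
    println!("{:?}", heap_demo());
}
```

Calling `borrow_mut()` instead of `try_borrow_mut()` in the refused case would panic rather than return an error, which is the run-time counterpart of the compile-time borrow check.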

I agree that the syntax sometimes isn't quite there. There used to be a `box` keyword for heap allocations, but it's been feature-gated for the alpha.

sanxiyn · on Jan 20, 2015

> `{name: name}` works just fine.

It doesn't in this case. "name" field is String, "name" is &str.

Twisol · on Jan 20, 2015

That's true. I thought he was making a point about syntax.

sanxiyn · on Jan 20, 2015

I will reply point by point. I think all of these are good FAQ materials. Thanks for that.

Yes, Rust does type inference. Yes, Rust uses "name: type". This will not change.

Copy can be considered "type of type", aka kind. "impl kind for type" is the syntax for that. Not "type: kind", because type can belong to multiple kinds. "#[derive(Copy)]" is a macro that generates "impl Copy for Info".

There are two ways to define methods. One on types, the other on traits, another "type of type". Drop is a trait. Drop::drop is a method that can be called on any type that belongs to Drop.

Rust uses "->". This will not change. "name" is a string view, not a string. "name.to_string()" converts a string view to a string. Creating a string allocates, creating a string view doesn't. Rust convention is snake cased method names. This will not change.

Yes, Drop::drop is a special case. There was an attempt to change this, but the conclusion was that it's not worth the effort. Sorry.

"fmt::Show" is a trait. "fmt::Show::fmt" is a method defined in the trait. It may be referred to by the less qualified name, "Show::fmt". No place in the original article says "Show::fmt" is a trait. It is a method.

"mut f: T" and "f: &mut T" have completely different meanings. See below. "f" must be "mut" here because you write to the formatter, which can mutate the formatter. There can be many formatters: one formatter may write to the console, but another formatter may write to memory, a growable buffer owned by the formatter. For the latter you definitely want "mut". In Rust, a dot is used for fields and methods, a double colon for everything else. This distinction is useful, because in Rust a dot can dereference a pointer, that is, both "." and "->" in C are written ".". No such thing for a double colon. This will not change. Ampersands are borrow operators. That's why "black_hole" drops, but the "fmt" method does not drop a formatter.

In "let mut x", it is "let (mut x)", not "(let mut) x". "met" and "mlet" do not make sense because "mut" is not part of a special "let mut" syntax. "mut x" in "let mut x" and "mut value" in function arguments are exactly the same construct. You can also do "let (mut x, y)", or "let (x, mut y)". This will not change.

"mut f" means you can assign to "f". "f: &mut T" means you can mutate through "f", for example by assigning to "f.buffer", but you can't assign to "f" itself; to do that, use "mut f: &mut T". This is exactly the same as the distinction between "T * const v" and "const T * v" in C. The former is written "v: &mut T", the latter "mut v: &T". "T * v" is written "mut v: &mut T". This will not change.
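
A minimal sketch of `mut` as part of the binding pattern, assuming current stable Rust. The function names are made up for illustration.

```rust
// `mut` on a parameter is the same pattern syntax as in `let mut`.
fn bump(mut value: i32) -> i32 {
    value += 1; // changes the function's own copy only
    value
}

fn pattern_demo() -> (i32, i32, i32) {
    // `mut` attaches to the individual binding inside the pattern.
    let (mut x, y) = (1, 2);
    x += y;

    let original = 10;
    let bumped = bump(original); // `original` is unchanged: i32 is Copy

    (x, original, bumped)
}

fn main() {
    println!("{:?}", pattern_demo());
}
```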

Yes, a string view does not implicitly convert to a string, because it is an allocating operation. This will not change. There is a change in the queue that implicitly converts a string to a string view in certain places to improve usability.

There is no way to define Bob as always living on the heap. You can alias, aka typedef, using "type Bob = Box<Bob_>". Then Bob always lives on the heap, and you can convert it to a value of type Bob_. This will not change. There is a change in the queue that lets you use "box expr" instead of "Box::new(expr)" to improve usability. This also can be used with Rc, RefCell, etc. Don't bolt too soon, Rc and RefCell are rarely needed.
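
A sketch of the alias approach in current stable Rust. The `box expr` form is not available on stable Rust, so this uses `Box::new`; `make_bob` and `unbox` are hypothetical helpers.

```rust
// Names follow the comment above: `Bob_` is the plain value type,
// `Bob` is an alias that always refers to a heap-allocated one.
struct Bob_ {
    name: String,
}

type Bob = Box<Bob_>;

fn make_bob(name: &str) -> Bob {
    Box::new(Bob_ { name: name.to_string() })
}

fn unbox(bob: Bob) -> Bob_ {
    *bob // moves the value out of the Box and frees the allocation
}

fn main() {
    let on_heap: Bob = make_bob("A");
    let by_value: Bob_ = unbox(on_heap);
    println!("{}", by_value.name);
}
```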

As you can see, there are planned changes to improve the syntax, but most of the syntax you complain about is that way for good reasons, and will not change by 2.0, and likely will not change ever. Syntax improvement suggestions made after you understand what the syntax means are of course very welcome, although for 1.x you can't make a backward-incompatible change.

Astrobastard · on Jan 20, 2015

Would these checks be possible to implement as some sort of diagnostic or sanitizer in clang?

sanxiyn · on Jan 20, 2015

Some of it, yes. As a matter of fact, Clang already includes one, which is deployed on a large scale at Google.

http://clang.llvm.org/docs/ThreadSafetyAnalysis.html

lohankin · on Jan 20, 2015

Some serious voodoo science is going on here, with a good chance to become a next fad. Programming is difficult as it is, but apparently not difficult enough for someone's taste.

andolanra · on Jan 20, 2015

This 'ownership' system is a pattern which is used informally in C, C++, and other languages with manual memory management: the idea that you have a bit of your program responsible for allocating and freeing memory, and another bit which merely uses it. Rust takes this pattern and has the compiler enforce it.

I suppose in some ways it makes programming more difficult, but it's also going to make low-level programming safer by virtue of the fact that certain invalid programs are no longer possible to express. Rust is one of very few languages which can offer that.

(Also, having used Rust a fair amount, I don't think it's actually that difficult once you're used to it. I can think of much more complicated programming features that have been present in popular languages for decades!)

rubiquity · on Jan 20, 2015

For me, understanding Rust's type system has been far more complicated than understanding the ownership model. But that's likely because I have zero C, C++ or Haskell experience.

andolanra · on Jan 20, 2015

Has it been something about the type system in particular, or just keeping track of all the parts? I would not call it a simple language, but I think it's ultimately tractable, albeit with work.

moonchrome · on Jan 20, 2015

These problems already exist with low-level programming (i.e. no GC); Rust just forces you to deal with them explicitly at compile time through its type system rather than implicitly (docs/conventions/correct library usage).

Anyone who's dealt with manual memory management already knows about these problems and rust solutions shouldn't be that foreign.

My biggest issue with rust is that it's still in early phases even considering the 1.0 release, the language is still not really that expressive and plenty of stuff still looks incredibly tedious to do.

I think 1.0 release is "here's something you can use" but it's not "here's something you'd want to use". I think I will need to wait till 2.0 so things get polished out and early adopters find all the initial design problems.

iopq · on Jan 20, 2015

Recall that under semantic versioning you can ADD really nice sugar in a 1.1 version that will make life easier. I think some 1.x release this year will make the language much more approachable.

Especially regarding closures, they are really painful to use because nothing is properly inferred. You can easily improve type inference by just improving the compiler, this is almost not even a language issue. But it's a real quality of life issue.

So after 1.0 is out you will see a lot of RFCs for things like variadic generics, higher kinded types, etc. that will make someone's life easier (and remove really ugly hacks due to the lack of said things) but are completely backwards compatible.

moonchrome · on Jan 20, 2015

Fair enough, I guess I should have said next big release.

I really want a language that can replace C++ but ATM Rust doesn't feel like it would be more productive for the things I want to do as a language and then there's the tooling/libraries/platforms.