2020-01-12

Introduction to Rust for Node Developers

In this article we will build a simple command line program that returns the word count of a file. This will essentially be a simpler version of the Unix utility wc, written in Rust. The goal of this article is to give an introduction to some core Rust concepts for readers who might be more familiar with web-focused languages such as JavaScript and TypeScript. Therefore, the Rust code examples will be compared to similar code and concepts in JavaScript or TypeScript. This guide assumes no prior knowledge of Rust or related tools, but it does assume you already have node installed on your machine.

Notes

A couple of notes and assumptions:

Setting up

In order to get started, first we need to set up a new Rust project. If you haven't yet installed Rust on your computer, you can take a look at the official 'getting started' guide, or the first chapter of The Rust Book.

Once you have cargo available, go ahead and run cargo new miniwc --bin in a suitable directory.

Project structure

The logical next question is "What is cargo?". cargo is Rust's built-in package manager and build tool, a direct parallel to npm in the Node ecosystem. You can view popular crates (packages) available at crates.io.

The cargo new miniwc --bin command tells cargo to create a new binary (able to run on our machine) Rust project named miniwc in the directory ./miniwc and set up the basic boilerplate project structure: Cargo.toml, src/main.rs, and a .gitignore.

Running the project

That's it for the project structure, but what about actually running the code? In node, we have npm which allows us to define scripts such as start and test, and then run those commands via npm run start or npm run test. cargo gives us similar functionality. Running cargo run in our project directory will run our boilerplate project. Try it out, and you should see Hello, world! printed to your console.

You may have noticed a new target/ directory appear after you ran cargo run. This is a folder managed by cargo to store build artifacts and other dependencies of the compilation process. For a more detailed guide to cargo and an overview of concepts like the target/ directory, check out The Cargo Book.

Tour of a "Hello World" program in Rust

Let's take a moment to take a look at the auto-generated code within main.rs and draw some basic parallels from the JavaScript world to that of Rust:

File: src/main.rs

fn main() {
    println!("Hello, world!");
}

If we ported the above Rust program to JavaScript it would look like:

function main() {
  console.log("Hello, world!");
}

// Since `main()` isn't a special function in JavaScript,
// we have to invoke it if we want our code to run:
main();

If the distinction between compiled and interpreted languages is a bit hazy for you, take a look at this article for a more in-depth treatment.

fn is the function keyword in Rust, and main is the name of the function. main is a special function name in Rust (as it is in other compiled languages like C): it lets the Rust compiler know that this is the entry point of an executable program. () is the list of parameters. In this case there are none, so the parentheses are empty.

The body of the main function is declared with { }, and represents its scope. Inside the body of main, we have println!("Hello, world!");. This looks like a function call, but it is in fact a macro invocation. In Rust, macro invocations are denoted by the ! at the end of the macro's name.

There is no great parallel for macros in JavaScript, but a simple definition is that macros are code that generate other code when the program is compiled. Rust will replace println! with code for printing to standard out that works for whatever computer architecture you're compiling the Rust code for. In my case this would be code for printing in macOS, but it might be different for you.

With the basic setup and syntax tour out of the way, we can move on to an overview of our miniwc program.

cargo isn't strictly necessary to create Rust binaries; it just provides some convenient tools and a bit of boilerplate to get you started. All you need to compile a Rust program is the Rust compiler (rustc). Running rustc foobar.rs on any valid Rust program with a main function will output an executable binary. Don't believe me? Try it with the code above!

The miniwc program

At the end of this article, we will have an executable program that takes a filename as an argument and returns the word count of that document.

Let's get into it.

Building a foundation

Before we can begin tackling the program requirements we've outlined above, there are several Rust concepts that we need to anchor to their counterparts in JavaScript. I'm a big advocate for understanding bedrock concepts, especially as you move past the beginner stage where you know how to get things done, but maybe not why you're doing them that way. I feel that Rust is a great tool to put the effort in and really learn, so before we go ahead and actually write the code for our program, we're going to explore a prelude of necessary concepts, step by step. These include:

There are some concepts here that may seem very foreign, but they all map to JavaScript concepts you probably already know and use regularly. If you have a good grasp on the above topics already, feel free to skip the next few sections. Otherwise, let's unpack them one at a time.

Types

Rust is a statically typed language, and therefore it expects explicit type annotations in the places in your code where it isn't obvious what the type of a value is. If you have experience with TypeScript, this concept should be familiar.

Two common ways you'll interact with types in Rust are through argument types and return types:

fn example_function(
    integer_arg: i64,
    string_arg: String,
    other_arg: OurCustomType,
) -> String {
    // ---snip---
}

In the above example, we pass three arguments to our example_function, integer_arg with the type i64 (a 64-bit signed integer), string_arg with the type String, and other_arg with the made-up example type OurCustomType. These type annotations are denoted by the colon (:) following the argument name. After the list of arguments, there's an arrow (->) followed by String which signifies that this function will return a String value.
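To make the signature above concrete, here is a small runnable sketch with the same shape. The describe function and the Temperature type are invented for this illustration (they stand in for example_function and OurCustomType); they are not part of the miniwc program.

```rust
// `Temperature` is a made-up custom type, standing in for `OurCustomType`.
struct Temperature {
    celsius: f64,
}

// Every parameter carries an explicit type, and `-> String` is the return type.
fn describe(city: String, day: i64, temp: Temperature) -> String {
    // The final expression (no trailing semicolon) is the function's return value.
    format!("Day {} in {}: {:.1} C", day, city, temp.celsius)
}

fn main() {
    let report = describe(String::from("Oslo"), 3, Temperature { celsius: 4.5 });
    println!("{}", report);
}
```

Passing the arguments in the wrong order, or passing a number where a String is expected, is rejected when the program is compiled rather than when it runs.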

JavaScript is a dynamically typed language, which means all of the type behavior we have to specifically define in our Rust code is handled under the hood by the JavaScript runtime. JavaScript has primitive types like Number and String, but it doesn't require the programmer to be explicit about what types correspond to each value. JavaScript also doesn't let the programmer declare their own types in this way, like the made-up OurCustomType in the example above. This is both powerful and limiting, depending on the context and use-case.

Structures (struct)

With the basics of types in Rust under our belts, let's take a moment to unwrap another fundamental Rust concept that we'll need going forward: struct. Rust, unlike modern JavaScript, has no concept of class and it doesn't have a catch-all, ubiquitous name/value collection like JavaScript's Object type. Instead, Rust allows you to associate fields and related functions using structures, via the keyword struct. This is somewhat similar to how objects are used in JavaScript. Compare the following two examples:

let message = {
  title: "Message title",
  body: "This is a message."
};

struct Message {
  title: String,
  body: String
}

let message = Message {
  title: String::from("Message title"),
  body: String::from("This is a message.")
};

Since Rust doesn't give you an arbitrary bucket of key/value pairs to work with (like JavaScript does with Objects), we first need to define the structure of our Message type, via the struct keyword. Note how in the JavaScript example, we just assign String values to the title and body keys. This is a very common pattern, and in some cases is extremely powerful and simple. In the Rust example, we have to be explicit about the type of value each field holds (note that in Rust, we call these key/value pairs fields, while in JavaScript they're called properties). Once we've told the Rust compiler what our Message fields will contain, we can then create a new Message with our specific field values.
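As a rough sketch of how a struct is used once it is defined, the following complete program builds a Message and reads its fields with dot syntax, much like reading properties in JavaScript. The summarize helper is hypothetical and only exists for this example.

```rust
struct Message {
    title: String,
    body: String,
}

// Hypothetical helper: borrows a `Message` and reads its fields with dot syntax.
fn summarize(message: &Message) -> String {
    format!("{}: {}", message.title, message.body)
}

fn main() {
    let message = Message {
        title: String::from("Message title"),
        body: String::from("This is a message."),
    };
    println!("{}", summarize(&message));
    // Unlike a JavaScript object, we cannot add new fields later:
    // `message.author = ...` would fail to compile, because `Message` has no such field.
}
```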

Implementations (impl)

JavaScript uses an inheritance model called Prototypal Inheritance in order to allow for extending and reusing behavior in your code. Another familiar model that accomplishes something similar is the more traditional class-based model you may have come across in other languages like Java and TypeScript (JavaScript has class syntax, but it's just sugar over its prototypal inheritance model).

For the purposes of this project, you don't need to be super familiar with the ins and outs of Prototypal Inheritance or Object Oriented Programming, but if you're interested in diving in, Mozilla offers an in-depth treatment here. What we're specifically interested in is how JavaScript allows you to implement and reuse behavior, versus how Rust does it. Consider the following JavaScript example:

// Using JavaScript's `class` syntax because
// it's simpler for this example
class Message {
  send(content) {
    console.log(content);
  }
}

class PrivateMessage extends Message {
  send(content) {
    super.send("private: " + content);
  }
}

var message = new Message();
message.send("hello"); // hello

var privateMessage = new PrivateMessage();
privateMessage.send("hello"); // private: hello

Here, we've modeled PrivateMessage as a Message. It inherits the send function we defined on Message, but we can change it to be specific for our PrivateMessage class. Rust has a different way of doing things. Let's take a look at the same idea, expressed in Rust:

struct PrivateMessage {}
struct NormalMessage {}

pub trait Message {
    fn send(&self, content: &str) {
        println!("{}", content);
    }
}

impl Message for NormalMessage {} // Use the default `send`

impl Message for PrivateMessage {
    fn send(&self, content: &str) {
        println!("private: {}", content);
    }
}

pub fn main() {
  let message = NormalMessage {};
  message.send("hello"); // hello

  let private_message = PrivateMessage {};
  private_message.send("hello"); // private: hello
}

In this version of the program, we've defined Message as a trait, which can be implemented by our other code, in this case our PrivateMessage and NormalMessage structs. NormalMessage uses the default send implementation that we define in the Message trait, while PrivateMessage implements its own version of send.
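One practical payoff of traits is that a function can accept any type that implements one. The sketch below is a variation on the example above, not the same code: the trait method is renamed render and returns a String instead of printing, purely so the result is easy to inspect, and deliver is a hypothetical function added for illustration.

```rust
struct NormalMessage {}
struct PrivateMessage {}

trait Message {
    // Default implementation, used by any type that does not override it.
    fn render(&self, content: &str) -> String {
        content.to_string()
    }
}

impl Message for NormalMessage {} // Use the default `render`

impl Message for PrivateMessage {
    fn render(&self, content: &str) -> String {
        format!("private: {}", content)
    }
}

// Accepts a reference to any type that implements `Message`.
fn deliver(message: &impl Message, content: &str) -> String {
    message.render(content)
}

fn main() {
    println!("{}", deliver(&NormalMessage {}, "hello"));
    println!("{}", deliver(&PrivateMessage {}, "hello"));
}
```

Note that deliver never needs to know which concrete struct it was given; it only relies on the behavior the trait promises.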

Hopefully this sheds a bit of light on how Rust shares behavior (via traits and impl) compared with JavaScript's inheritance (via prototypes). Strictly speaking, Rust has no inheritance between structs; traits fill that role. If any of this still feels opaque, take some time to dive into the relevant sections in the Rust Book:

Enumerations (enum)

If you're familiar with TypeScript, then Rust's enum type is a close parallel. If not, enumerations are relatively straightforward: they define a type that can be one of several variants. For example, we can create an enum that represents the different types of common U.S. coinage like so:

enum Coin {
  Penny,
  Nickel,
  Dime,
  Quarter
}

And we can reference any single variant via:

let penny: Coin = Coin::Penny;
let dime: Coin = Coin::Dime;

As you can see, both penny and dime are Coins (they have the Coin type), but we can get more specific and state the variant of Coin that each variable holds. In JavaScript
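A common thing to do with an enum value is to branch on which variant it holds. The following is a minimal sketch using Rust's match expression; value_in_cents is an illustrative helper and is not needed for miniwc.

```rust
enum Coin {
    Penny,
    Nickel,
    Dime,
    Quarter,
}

// `match` must cover every variant, otherwise the program will not compile.
fn value_in_cents(coin: &Coin) -> u32 {
    match coin {
        Coin::Penny => 1,
        Coin::Nickel => 5,
        Coin::Dime => 10,
        Coin::Quarter => 25,
    }
}

fn main() {
    let coins = [Coin::Penny, Coin::Dime, Coin::Quarter, Coin::Nickel];
    let total: u32 = coins.iter().map(value_in_cents).sum();
    println!("{} cents", total);
}
```

If a new variant were added to Coin later, the compiler would point at every match that forgot to handle it.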

Handling arguments

Now that we've explored the foundational concepts we need to understand and implement our miniwc program, let's get back to it. As mentioned before, our program should:

Currently, our program does none of the things outlined above. When you execute cargo run from the command line, we still just see Hello, world! printed out. Let's take it step by step, and first handle taking a filename as an argument.

In node, one of the global variables made available to our programs during runtime is the process.argv variable. This variable contains all of the arguments passed to your node program. To take command line arguments and print them out using node, we could do the following:

File: main.js

for (let arg of process.argv) {
  console.log(arg);
}

If you save and run that program in the root of the project using node main.js hello, you should get three outputs. The first output is the program running our JavaScript code (in this case node). The second is the filename of the program being run, and the third is the argument we passed in.

Rust does not have a runtime environment like node, so how can we get arguments passed to our program?

Although Rust doesn't have a language-specific runtime environment, the operating system your Rust program runs on is technically a runtime. And luckily for us, the operating system provides a way to inject variables into programs. We won't need to get into the specifics of how that happens (and the potential pitfalls), because the Rust standard library provides an easy way for us to access the arguments passed to our program, via the std::env module. Similar to how process.argv works in node, the std::env module will allow us to get a list of arguments we can then use how we'd like.

In order to make the std::env module more ergonomic to use, we can bring it into scope at the top of our program like so: use std::env. The std library is already available to our program, so we could just type std::env::foo_function every time we wanted to use something from the env module, but with use we can refer to the module simply as env. A loose JavaScript parallel to use would be taking a globally available function like global.console.log and assigning it to its own variable for easier use, for example let log = global.console.log. With the env module in scope, we can now use the public function args, which exists in the env module.

This function will return a value with the type of Args. Args implements the trait Iterator, which allows us to iterate over the returned arguments. The function signature for args looks like so: fn args() -> Args.

Except for Iterator and the idea of iterating, these are all concepts we've explored in the last few sections, so now let's put them to work. Once you've added the use statement for std::env, your program should look like this:

File: src/main.rs

use std::env;

fn main() {
    println!("Hello, world!");
}

Let's enhance our program and print out all of the arguments that we pass in from the command line:

File: src/main.rs

use std::env;

fn main() {
  for arg in env::args() {
    println!("{}", arg);
  }
}

If the println! macro call seems a bit strange, you can dive deeper here, but you can also simply think of println! as similar to JavaScript template literals: each {} in the format string is replaced, in order, by one of the values you pass as subsequent arguments. Play around with it a bit to get a more intuitive feel for how it works.
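As a quick illustration of how the {} placeholders line up with the arguments that follow, here is a small sketch. It uses format!, which accepts the same format strings as println! but returns a String instead of printing. The describe helper and its sample values are made up for this example.

```rust
// Hypothetical helper: each `{}` is filled, in order, by the arguments that follow.
fn describe(name: &str, words: usize) -> String {
    format!("{} contains {} words", name, words)
}

fn main() {
    // `println!` takes the same kind of format string and prints the result.
    println!("{} contains {} words", "notes.txt", 4);
    println!("{}", describe("notes.txt", 4));
}
```

The closest JavaScript counterpart would be a template literal such as `${name} contains ${words} words`.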

Now let's run the program and pass it some arguments via cargo run -- hello world (the -- separates the arguments meant for cargo from the arguments meant for our program). You should get the following output:

target/debug/miniwc
hello
world

The first line of our output is, by convention, the path used to invoke the running program. It's target/debug/miniwc because that's where cargo puts the debug binary it builds for us. If you built the project for release, or compiled it with rustc and ran the binary yourself, the first item in the args() value would instead reflect that binary's path (for example target/release/miniwc or ./miniwc). On the next two lines we see the two arguments we passed in.
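If we only care about the arguments the user typed, one option is to skip that first item. The sketch below does this with the Iterator method skip; user_args is a hypothetical helper that takes any iterator of Strings so it can be tried without a real command line.

```rust
use std::env;

// Hypothetical helper: drops the program path (the first item) and keeps the rest.
fn user_args(args: impl Iterator<Item = String>) -> Vec<String> {
    args.skip(1).collect()
}

fn main() {
    // With `cargo run -- hello world`, this loop sees only "hello" and "world".
    for arg in user_args(env::args()) {
        println!("{}", arg);
    }
}
```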

Our program now nominally supports passing in arguments via the command line. Now we're ready to do something with them.

Using Iterators

Let's start by binding the value of the first argument passed in by the user (ignoring the program path argument, which comes first) using the nth method on the Args type. Args is the type of the value returned from std::env::args(), and it implements the Iterator trait, which gives it all of the methods defined on Iterator. As per the Args documentation, Args specifically gives us an Iterator whose values are Strings.

One of the methods Args gets by implementing Iterator is nth, which returns the item at the index given to nth. For example, env::args().nth(1) should give us the value at index 1 of the argument list. You can think of Iterator as sort of giving the properties of a JavaScript Array to any type that implements it. Like Arrays, Iterators come with all sorts of useful methods.
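To give a feel for those Array-like methods, here is a small sketch that chains filter, map, and collect over a slice of words, much as you might chain filter() and map() on a JavaScript Array. The long_words helper and its sample input are made up for illustration and are not part of miniwc.

```rust
// Hypothetical helper: keeps words longer than three bytes and upper-cases them.
fn long_words(words: &[&str]) -> Vec<String> {
    words
        .iter()
        .filter(|word| word.len() > 3)
        .map(|word| word.to_uppercase())
        .collect()
}

fn main() {
    let words = ["this", "is", "an", "example"];
    for word in long_words(&words) {
        println!("{}", word);
    }
}
```

Unlike Array methods in JavaScript, the filter and map steps here do no work on their own; nothing runs until collect (or a for loop) asks the iterator for items.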

With nth, we should now be able to grab the first argument passed to our program. Let's set that value to a variable, and try to print it out with the following code:

File: src/main.rs

use std::env;

pub fn main() {
    let filename = env::args().nth(1);
    println!("{}", filename)
}

After a cargo run -- hello, we see:

error[E0277]: `std::option::Option<std::string::String>` doesn't implement `std::fmt::Display`
 --> src/main.rs:5:20
  |
5 |     println!("{}", filename)
  |                    ^^^^^^^^ `std::option::Option<std::string::String>` cannot be formatted with the default formatter
  |
  = help: the trait `std::fmt::Display` is not implemented for `std::option::Option<std::string::String>`
  = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
  = note: required by `std::fmt::Display::fmt`

error: aborting due to previous error

An error! What happened?

Handling all Options

The issue with our code is that nth doesn't return a String directly, but instead returns a type called Option. Option is part of an interesting feature of Rust: it has no null value. Unlike most languages, which have a null (and very much unlike JavaScript, which has both null and undefined), Rust forces you to account for a possibly missing value whenever an operation might not produce one, such as accepting command line arguments or doing file I/O. To do this, Rust makes use of the Option enum, which can either be Some(value) or None. Because an Option<String> is a different type from a String, the compiler won't let us treat it as a String until we've dealt with the None case; that mismatch is what caused the compile-time error above. While this may seem overly rigid, it is one of the features of Rust that leads to less error-prone programs.

Let's look at a JavaScript example that illustrates this point:

// Get the first argument passed in by the user
let arg = process.argv[2];

// Do really important stuff
console.log(arg.split(""));

There's a subtle error that will only happen sometimes in this code. Can you spot it? If we pass an argument to our program -- node main.js hello -- then it behaves as expected. However, if we don't pass an argument, we'll get an error that's probably very familiar if you use JavaScript a lot:

console.log(arg.split(''))
                  ^

TypeError: Cannot read property 'split' of undefined

In this case, it's easy to see what went wrong: if we don't pass an argument to our program, we end up setting our arg variable to the value at an array index that doesn't exist. JavaScript defaults that value to undefined, which then causes an error when we try to call split() on it.

While this example is trivial to fix, it's very easy to introduce this kind of bug into a larger JavaScript program, where it's potentially much harder to find the original cause of the undefined value. A typical fix would have us check that the value exists before trying to use it, but that requires more code and more diligent programmers.

In cases where we're dealing with input to our program that can be undefined, Rust forces us to handle the potential undefined value with the Option type before the program will even compile. We can see the Option type in action if we tweak our println! call a bit:

File: src/main.rs

use std::env;

pub fn main() {
    let filename = env::args().nth(1);
    println!("{:?}", filename)
}

This solution was hinted at in our error message from before. By adding :? inside the curly brackets, we're telling the println! macro to format the value using the Debug trait instead of the Display trait. Debug output is aimed at programmers rather than end users, and many more types implement it, including Option.

If this doesn't make much sense, don't worry about it for now. In general, the Rust compiler is very helpful, and you can usually rely on its suggestions to fix your code if you've gotten stuck. In this case, let's follow its advice and see what we get.

After a cargo run -- hello, you should see:

Some("hello")

There it is! Since we passed in an argument to our program, env::args().nth(1) contains a Some value. Now, try running the program without an argument. This time you should get the None variant, just as we expected.

Now that we understand a bit about what's going on with Rust's Option type, how do we actually get to the value inside Some? Conveniently, Rust offers us a shortcut for grabbing values we are pretty sure are going to exist in our program:

File: src/main.rs

use std::env;

pub fn main() {
    let filename = env::args().nth(1).unwrap();
    println!("{}", filename) // we no longer need the ':?'
}

unwrap() is a method available on Option, and it's pretty straightforward. If there is Some(value), then return the value. If not, then panic (error out). unwrap() also serves as a sort of "TODO" flag, because it signals that you should replace it before releasing your program into the world.

When we run our program with at least one argument now, we should get that argument printed to the console. If we run it without any arguments, we should get a panic along the lines of:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value'
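When the time comes to replace unwrap(), one possible approach is to handle both Option variants explicitly with match. The following is only a sketch: pick_filename is a hypothetical helper (it takes any iterator of Strings so it is easy to try out), and the usage message is invented for this example.

```rust
use std::env;

// Hypothetical helper: returns the first user-supplied argument, if any.
fn pick_filename(mut args: impl Iterator<Item = String>) -> Option<String> {
    args.nth(1)
}

fn main() {
    match pick_filename(env::args()) {
        // An argument was passed: use the value inside `Some`.
        Some(filename) => println!("{}", filename),
        // No argument: print a friendly message instead of panicking.
        None => eprintln!("usage: miniwc <filename>"),
    }
}
```

The article's program keeps unwrap() for brevity; this is just one way the "TODO" could eventually be resolved.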

With that brief foray into Rust Options out of the way, let's next move on to actually reading text files from the system.

Reading file contents

The Rust standard library contains a module for filesystem operations. This module is very similar in functionality to the fs module in the Node standard library. In Node, we could use the contents of a file like so:

const fs = require("fs");

fs.readFile("words.txt", "utf8", function (err, data) {
  console.log(data);
});

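The readFile() function takes a file path, an optional encoding, and a callback to handle either an error or the returned contents. The Rust std::fs::read_to_string function does something very similar: it takes a file path and returns an io::Result<String>, which is shorthand for Result<String, std::io::Error>. Unlike readFile(), it is synchronous and hands back its outcome as a return value rather than through a callback.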

Result and expect()

Result is similar to Option in that it can either produce a value or something else (None being the 'something else' for Option). In the case of Result, the results are either:

In the case of fs::read_to_string, the Ok result is Ok(String), since on a successful "read this file to a string" operation, the value we want back is a String.
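To see both outcomes of a Result side by side, here is a minimal sketch that matches on the value returned by fs::read_to_string. The describe_read helper and the file names are assumptions made for this example; either file may or may not exist on your machine, and the program handles both cases without panicking.

```rust
use std::fs;

// Hypothetical helper: reports how the read went instead of panicking.
fn describe_read(path: &str) -> String {
    match fs::read_to_string(path) {
        // Success: the file's contents are inside `Ok`.
        Ok(contents) => format!("read {} bytes from {}", contents.len(), path),
        // Failure: the reason (a `std::io::Error`) is inside `Err`.
        Err(error) => format!("could not read {}: {}", path, error),
    }
}

fn main() {
    println!("{}", describe_read("words.txt"));
    println!("{}", describe_read("no-such-file.txt"));
}
```

This plays the same role as checking the err parameter in the Node callback, except that the compiler will not let us reach the contents without going through the Result first.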

Let's add a simple text file to our project and test it out. Add the following text to a file called words.txt in the root of the project:

File: words.txt

This is a file containing words
There are several words on this line
This one is short
The end

Now let's use read_to_string to read words.txt to a variable:

File: src/main.rs

use std::env;
use std::fs;

pub fn main() {
  let filename = env::args().nth(1).unwrap();

  let file_contents = fs::read_to_string(filename).expect("Error reading file to string");

  println!("{}", file_contents)
}

Here we use expect(), which is very similar to unwrap() except that it allows us to pass a custom panic message. If we run our program and pass it the path of our text file as an argument (cargo run -- words.txt), we should see our text printed to the console.

Now that we've successfully read our text file and put its contents in a variable, we can complete the final step of counting the words in that file.

Counting words

Simple text manipulation like counting the number of individual words (separated by whitespace) is a great way to explore the power behind one of Rust's core philosophies, that of zero-cost abstractions. The gist of this idea is two-fold: first, you shouldn't pay (in performance or size) for any part of the programming language that you don't use, and second, if you do choose to use a language feature, it should be at least as fast as what you could reasonably write by hand. By following this simple philosophy, Rust positions itself as a prime choice for writing programs that need to be mindful of space and speed considerations.

To illustrate this point, let's take another example from JavaScript. A JavaScript implementation (node, the browser, etc.) has to include a garbage collector in order to manage the memory the program uses. Even if all you do is console.log('Hello World'), the entirety of the JavaScript runtime, including the garbage collector, has to be there. In Rust, when you println!, the only code that gets compiled and run is the code specifically needed to print things.

It is worth noting that sometimes we don't really care that much about the speed or size of our programs, and in those cases Rust doesn't have much of an advantage over JavaScript or any other language. But when we do care about those things, Rust really comes into its own. In many cases with Rust you get the flexibility and expressive power of a very high level programming language while also getting near-unmatched performance. Let's look at an example:

use std::env;
use std::fs;

pub fn main() {
  let filename = env::args().nth(1).unwrap();

  let file_contents = fs::read_to_string(filename).expect("Error retrieving file");

  let number_of_words = file_contents.split_whitespace().count();

  println!("{}", number_of_words)
}

Here we've added a single line to our program, changed another, and essentially achieved our desired functionality. Let's take it step-by-step.

Once we have the file contents from our words.txt file bound to a variable, we take that file_contents String and split it up on any Unicode whitespace via split_whitespace. This returns an Iterator value. This is roughly equivalent to using the split() method on a String in JavaScript, for example:

let exampleString = "This is an example";
console.log(exampleString.split(" ")); // Array(4) [ "This", "is", "an", "example" ]

Once we've done that, we can consume the Iterator with count() to get the number of items in it. A similar approach in JavaScript would be to use the length property of the returned Array from before.

Finally, we print the resulting count to the console. And that's it! Run cargo run -- words.txt to see the number of words in our text file.
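To connect this back to the zero-cost idea, the sketch below puts the one-line iterator version next to a hand-written loop that does the same job. Both functions are hypothetical helpers written for this comparison, and the sample text is the contents of words.txt from earlier; no performance measurements are claimed here, only that the two approaches agree.

```rust
// The approach used in miniwc: split on Unicode whitespace, then count the pieces.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

// A hand-written equivalent: count each transition from whitespace into a word.
fn count_words_by_hand(text: &str) -> usize {
    let mut count = 0;
    let mut in_word = false;
    for ch in text.chars() {
        if ch.is_whitespace() {
            in_word = false;
        } else if !in_word {
            in_word = true;
            count += 1;
        }
    }
    count
}

fn main() {
    let text = "This is a file containing words\n\
                There are several words on this line\n\
                This one is short\n\
                The end\n";
    println!("{}", count_words(text));
    println!("{}", count_words_by_hand(text));
}
```

One difference from the JavaScript comparison is worth keeping in mind: split_whitespace treats a run of spaces or newlines as a single separator and never yields empty items, whereas "a  b".split(" ") in JavaScript produces an empty string between the two spaces.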

Conclusion

This program is very simple, but it illustrates a plethora of core Rust concepts. It also leaves out some other very important tools and ideas. For example:

If you've made it this far, thanks so much for reading! Writing this article has been a learning process for me, and I still very much consider myself a Rust beginner. If you spot any mistakes, or see any grievous infractions of best practices, please reach out at tindleaj[at]gmail[dot]com or @tindleaj. If you're interested in learning more Rust, there are a ton of other great, free, and current resources to do so.

Additional resources

Are you a JavaScript developer trying to learn Rust? Send me an email at tindleaj@gmail.com. I'm working on stuff you'll be interested in.

Books

Projects

Other

 
