Deep Dive: Strings in Rust

I covered Strings briefly in the beginner's article, but this time we're going to explore them in more depth. I will repeat some of what I've already covered so that a single article takes you from zero to almost everything you need to know about strings.

Table of Contents

  1. What are Strings?
  2. Creating and Updating Strings
  3. String Slices

What are Strings?

When we say strings, we basically mean a sequence of characters. A 'string' of characters, if you will. Rust has two string types: one built into the core language, and one provided by the standard library.

The string type in the core of Rust, available at all times, is the string slice, usually seen in its borrowed form &str.
The string type in the standard library is String, which has more functionality than the string slice. It's a growable, owned, UTF-8 encoded collection of characters that you can modify when it is bound with mut.

In this article, when I write String with a capital S, I mean the String type, not &str (string slices). In general Rust usage, "string" can refer to either one.

The standard library contains a few extra string types such as OsString, OsStr, CString, and CStr. There are also crates on crates.io that provide even more functionality and string types. As you can see, the names of these other types end with String or Str, referring to the owned and borrowed variants respectively. They can store text in a different encoding or be represented differently in memory. You can refer to the respective APIs for more information on each of them.

Creating and Updating Strings

Let's use both types of strings. We'll begin with the String type, since it is more straightforward.

fn main() {
    let mut empty_string = String::new();
    let mut loaded_string = "This is some pre-existing data".to_string();
    // let mut loaded_string = String::from("This is some pre-existing data");
}

This creates an empty String called empty_string, into which we can load data. Usually we have some initial data we want to start with, and we can use the .to_string() method for that, as seen in loaded_string! The text "This is some pre-existing data" in the source code is what is called a string literal. On its own it is a &str, and .to_string() turns it into an owned String.
You can use .to_string() on any string literal, or on any type that implements the Display trait (which string slices do).
An alternative to .to_string() that produces the same result is the String::from() function, which I've left commented out.
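
Here's a minimal sketch of that equivalence. The helper name make_strings is my own invention for illustration, and .to_owned() is a third option that the text above doesn't mention:

```rust
// Hypothetical helper for illustration: builds the same text three different ways.
fn make_strings() -> (String, String, String) {
    let a = "pre-existing data".to_string(); // via ToString, which Display types get for free
    let b = String::from("pre-existing data"); // via the From<&str> implementation
    let c = "pre-existing data".to_owned(); // via ToOwned, also common in practice
    (a, b, c)
}

fn main() {
    let (a, b, c) = make_strings();
    // All three hold identical contents.
    println!("{}", a == b && b == c);

    // .to_string() is not limited to string literals: integers implement Display too.
    let number = 42.to_string();
    println!("{}", number);
}
```

Which one you pick is mostly a matter of taste; they all end up with an owned String.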

Strings are UTF-8 encoded, which means we are not limited to just Latin letters. Chinese, Japanese, Hindi, emojis, any text that can be encoded as UTF-8 can be used. Borrowing from the book and my previous article, all of the following are valid strings.

    let hello = String::from("السلام عليكم");
    let hello = String::from("Dobrý den");
    let hello = String::from("Hello");
    let hello = String::from("שָׁלוֹם");
    let hello = String::from("नमस्ते");
    let hello = String::from("こんにちは");
    let hello = String::from("안녕하세요");
    let hello = String::from("你好");
    let hello = String::from("Olá");
    let hello = String::from("Здравствуйте");
    let hello = String::from("Hola");

A String can grow in size using various methods, just like a Vec can. There's a push method and a push_str method, and you can even use the + operator or the format! macro.

Let's start with push_str. As we've seen before, str stands for string slice, which means push_str takes a borrowed string. We wouldn't want to take ownership of some string data to append it somewhere, and then be unable to ever use that string on its own again.

fn main() {
    let mut some_string = "This is some string ".to_string();
    let to_add = "Let's add some Data";

    some_string.push_str(to_add);

    println!("{}", some_string);
}

The push_str method takes a string slice and appends it to the end of the String we call the method on. The push method works the same way, but takes a single character instead.
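
To see both methods side by side, here's a small sketch. The helper name build_greeting is made up for this example:

```rust
// Hypothetical helper: appends a string slice and then a single character.
fn build_greeting() -> String {
    let mut greeting = String::from("Hello");
    greeting.push_str(", world"); // push_str takes a &str
    greeting.push('!'); // push takes a char, note the single quotes
    greeting
}

fn main() {
    println!("{}", build_greeting());
}
```

Note that the String has to be declared with mut, since both methods modify it in place.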

Most often you'll want to combine two already existing Strings. To do that, you can simply use the + operator, with one small detail to keep in mind.

fn main() {
    let first_string = "This is some string ".to_string();
    let second_string = "Let's add some Data".to_string();

    let final_string = first_string + &second_string;

    // println!("First string is: {}", first_string); ----> This won't work.
    println!("Second string is: {}", second_string);

    println!("Finally we have: {}", final_string);
}

We cannot print first_string because the value was moved. Why was it moved, and why can't we just borrow it like we did with second_string?
The answer to both is the way the + operator works. It calls the add method, and this is its signature (this is the version for strings; in the standard library, Add is defined using generics):

fn add(self, s: &str) -> String

It takes self by value, plus a borrowed string. Because self is taken by value, first_string is moved into add and can no longer be used afterwards. The right-hand side has to be a string reference or slice; we cannot add two String values directly. However, those of you who have been following my articles might think, "But if we write &s for a String s, the type is &String, not &str". And that is correct! The compiler can coerce a &String into a &str (this is called deref coercion). Since add doesn't take ownership of the s parameter in its signature, we can still use second_string after using the + operator in our example.
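
If you do want to keep using the left-hand String afterwards, one option is to clone it before adding. This is just a sketch of that idea, not the only way to do it, and the helper name concat is made up for illustration:

```rust
// Hypothetical helper: concatenates two slices without consuming either input.
fn concat(first: &str, second: &str) -> String {
    // to_owned() creates the owned left-hand side that `+` consumes.
    first.to_owned() + second
}

fn main() {
    let first_string = String::from("This is some string ");
    let second_string = String::from("Let's add some Data");

    // Cloning the left side leaves first_string usable, at the cost of an extra allocation.
    // &second_string is a &String, which the compiler coerces to &str.
    let final_string = first_string.clone() + &second_string;

    println!("First string is: {}", first_string);
    println!("Second string is: {}", second_string);
    println!("Finally we have: {}", final_string);

    // The same coercion happens for function arguments that expect &str.
    assert_eq!(final_string, concat(&first_string, &second_string));
}
```

The clone costs an extra copy of the data, so it's only worth it when you really need the original afterwards.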

However, if we need to concatenate more than two strings, it can quickly get unwieldy, as you can probably see. Taking the example from the book:

    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    let s = s1 + "-" + &s2 + "-" + &s3;

For this kind of concatenation, we can use the format! macro, which works much like println!:

fn main() {
    let first_string = "This is some string ".to_string();
    let second_string = "Let's add some Data";
    let third_string = " And some more data!";
    let final_string = format!("{}, {}, {}", first_string, second_string, third_string);

    println!("{}", final_string);
}

As you can see, format! works the way I've been using println!, but instead of just printing, it returns a String that we can save into a variable and use in its new, nicely formatted form.
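
One more difference from the + operator is worth a quick sketch: format! only borrows its arguments, so nothing gets moved. The helper name tic_tac_toe is my own; the strings come from the book's example above:

```rust
// Hypothetical helper: returns the joined text together with the untouched first input.
fn tic_tac_toe() -> (String, String) {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    // format! borrows s1, s2 and s3; none of them is moved.
    let s = format!("{}-{}-{}", s1, s2, s3);

    // s1 is still ours to return, which `s1 + "-" + &s2 ...` would not allow.
    (s, s1)
}

fn main() {
    let (joined, first) = tic_tac_toe();
    println!("{}", joined);
    println!("{}", first);
}
```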

In other languages you could write, for example, first_string[0] to get the T character. Rust does not allow indexing a String by an integer, sadly. The reason is basically how Rust stores Strings internally: String is a wrapper around Vec<u8> holding UTF-8 bytes, and the number of bytes does not necessarily correspond to the number of characters. Please refer to the Strings Representation link in the References section for a better description.
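
If all you want is the first character, here's one way to get it without indexing. Treat it as a sketch; the helper name first_char is made up for this example:

```rust
// Hypothetical helper: returns the first character, or None for an empty string.
fn first_char(text: &str) -> Option<char> {
    text.chars().next()
}

fn main() {
    let first_string = "This is some string ".to_string();
    // let t = first_string[0]; // does not compile: a String cannot be indexed by an integer

    // Ask for the first char explicitly.
    println!("{:?}", first_char(&first_string)); // Some('T')

    // Or ask for the first byte explicitly, which is a u8 and not a char.
    println!("{}", first_string.as_bytes()[0]); // 84, the UTF-8 byte for 'T'
}
```

Either way, you have to say whether you mean characters or bytes, which is exactly the ambiguity that plain indexing would hide.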

If we need to perform operations on the individual characters of the string, or on its bytes, there are two handy iterators we can use on the Strings themselves: chars() and bytes().

fn main() {
    for c in "नमस्ते".chars() {
        println!("{}", c);
    }

    for b in "नमस्ते".bytes() {
        println!("{}", b);
    }
}

As the output shows, the length in chars and in bytes is not the same: this word has 6 chars but 18 bytes. So iterating with chars() is the better way of working with individual characters.
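
To put numbers on that difference, here's a short sketch that counts both and also shows where each char starts in the byte sequence. The helper name char_and_byte_len is hypothetical:

```rust
// Hypothetical helper: returns (number of chars, number of bytes).
fn char_and_byte_len(text: &str) -> (usize, usize) {
    (text.chars().count(), text.len())
}

fn main() {
    let word = "नमस्ते";
    let (chars, bytes) = char_and_byte_len(word);
    println!("{} chars, {} bytes", chars, bytes);

    // char_indices pairs each char with the byte offset where it starts.
    for (offset, c) in word.char_indices() {
        println!("{} starts at byte {}", c, offset);
    }
}
```

Keep in mind that len() on a string always counts bytes, never characters.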

String Slices

String slices are references to parts of a string. They are created like this:

    let s = String::from("hello world");

    let hello = &s[0..5];
    let world = &s[6..11];

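As you can see, they use the & operator to reference the string, followed by a range [starting_index..ending_index]. The range is half-open: starting_index is included and ending_index is not, so ending_index is one more than the index of the last byte in the slice. Both indices are byte offsets, not character counts.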

If the starting index is 0, you can leave it out and write [..ending_index]. In exactly the same way, you can leave out the ending index if the slice goes up to the end of the string, as in [starting_index..].
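
The different range forms can be sketched like this. The helper name head is made up, and the inclusive ..= form is an extra that the text above doesn't cover:

```rust
// Hypothetical helper: returns the first `end` bytes of a string as a slice.
// Panics if `end` is out of range or not on a character boundary.
fn head(s: &str, end: usize) -> &str {
    &s[..end]
}

fn main() {
    let s = String::from("hello world");

    assert_eq!(&s[0..5], &s[..5]); // starting index omitted
    assert_eq!(&s[6..11], &s[6..]); // ending index omitted
    assert_eq!(&s[..], "hello world"); // both omitted: the whole string
    assert_eq!(&s[0..=4], "hello"); // inclusive range, same slice as 0..5

    println!("{} {}", head(&s, 5), &s[6..]);
}
```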

Note: String slice range indices must fall on valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will panic at runtime.
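
If you can't be sure the indices are valid, the get method gives you an Option instead of a panic. Here's a minimal sketch; the helper name try_slice is my own:

```rust
// Hypothetical helper: slices by byte range, returning None instead of panicking.
fn try_slice(text: &str, start: usize, end: usize) -> Option<&str> {
    text.get(start..end)
}

fn main() {
    let hello = "Здравствуйте"; // each of these Cyrillic letters takes 2 bytes

    println!("{:?}", try_slice(hello, 0, 4)); // the first two letters
    println!("{:?}", try_slice(hello, 0, 1)); // None: byte 1 is inside the first letter
    println!("{}", hello.is_char_boundary(1)); // false
    // let bad = &hello[0..1]; // this would panic at runtime
}
```

The is_char_boundary method is handy when you want to check an index before slicing.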

Using slices to work with Strings gives us an extra safety measure. Since Rust is careful with references, once you get a string slice from a String, you cannot modify that String for as long as the slice is still in use, because the slice immutably borrows the String itself. The compiler rejects any attempt to do so.
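
Here's a sketch of that rule in action. The first_word helper is a simplified take on the idea from the book, not a copy of it:

```rust
// Hypothetical helper: returns the text up to the first space, or the whole input.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i], // find returns a byte offset on a character boundary
        None => s,
    }
}

fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s); // the immutable borrow of s starts here
    // s.clear(); // compile error if uncommented: s is still borrowed by `word`
    println!("{}", word); // last use of `word`, so the borrow ends here

    s.clear(); // fine now, nothing refers to the old contents anymore
    println!("{}", s.is_empty());
}
```

The borrow lasts until the last use of the slice, not until the end of the block, which is why the second s.clear() compiles.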

String literals are slices, and everywhere that a function takes &str as a parameter, you can pass a string literal, a &String, or a slice of a String. One function with a &str parameter therefore covers all of those cases. Instead of having multiple functions for multiple types, you can have one that works with the various string types. It's really flexible!
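
A quick sketch of that flexibility, using a made-up helper called shout:

```rust
// Hypothetical helper: accepts any string slice and returns a new String.
fn shout(text: &str) -> String {
    text.to_uppercase() + "!"
}

fn main() {
    let owned = String::from("hello world");

    println!("{}", shout("literal")); // a string literal is already a &str
    println!("{}", shout(&owned)); // &String is coerced to &str
    println!("{}", shout(&owned[..5])); // a slice of a String is a &str too
    // shout(owned); // would not compile: a String passed by value is not a &str
}
```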

That's it for today's deep dive into Strings. There are a few explanations that I have left out since they fall outside the scope of this article, but as always, the References section has all the links to the extra info! Hope you've enjoyed reading!

References

Strings Chapter in The Rust Book

Strings Representation

Slice Type, String Slices