Pipelines

One of the core designs of Nu is the pipeline, a design idea that traces its roots back decades to some of the original philosophy behind Unix. Just as Nu extends from the single string data type of Unix, Nu also extends the idea of the pipeline to include more than just text.

Basics

A pipeline is composed of three parts: the input, the filter, and the output.

open Cargo.toml | update workspace.dependencies.base64 0.24.2 | save Cargo_new.toml

The first command, open Cargo.toml, is an input (sometimes also called a "source" or "producer"). This creates or loads data and feeds it into a pipeline. It's from input that pipelines have values to work with. Commands like ls are also inputs, as they take data from the filesystem and send it through the pipelines so that it can be used.

The second command, update workspace.dependencies.base64 0.24.2, is a filter. Filters take the data they are given and often do something with it. They may change it (as with the update command in our example), or they may perform other operations, like logging, as the values pass through.

The last command, save Cargo_new.toml, is an output (sometimes called a "sink"). An output takes input from the pipeline and does some final operation on it. In our example, we save what comes through the pipeline to a file as the final step. Other types of output commands may take the values and display them to the user.

The $in variable will collect the pipeline into a value for you, allowing you to access the whole stream as a parameter:

[1 2 3] | $in.1 * $in.2
# => 6

Multi-line pipelines

If a pipeline is getting a bit long for one line, you can enclose it within parentheses ():

let year = (
    "01/22/2021" |
    parse "{month}/{day}/{year}" |
    get year
)

Semicolons

Take this example:

line1; line2 | line3

Here, semicolons are used in conjunction with pipelines. When a semicolon is used, no output data is produced to be piped. As such, the $in variable will not work when used immediately after the semicolon.

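Because there is a semicolon after line1, that command runs to completion before line2 starts. Anything it prints itself (as external commands and print do) appears on the screen, but its return value is not passed on to line2.
line2 | line3 is a normal pipeline. It runs, and its result is displayed after any output from line1.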
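
As a concrete sketch of the same shape (the message and values are illustrative and not part of the original example), the statement before the semicolon finishes first, and only the pipeline after it supplies the displayed result:

```nu
# `print` writes its message itself; nothing is piped across the semicolon.
print "first statement done"; [4 5 6] | math sum
# => first statement done
# => 15
```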

Pipeline Input and the Special `$in` Variable

Much of Nu's composability comes from the special $in variable, which holds the current pipeline input.

$in is particularly useful when used in:

Command or external parameters
Filters
Custom command definitions or scripts that accept pipeline input

`$in` as a Command Argument or as Part of an Expression

Compare the following two equivalent command lines, each of which creates a directory with tomorrow's date as part of its name.

Using subexpressions:

mkdir $'((date now) + 1day | format date '%F') Report'

or using pipelines:

date now                    # 1: today
| $in + 1day                # 2: tomorrow
| format date '%F'          # 3: Format as YYYY-MM-DD
| $'($in) Report'           # 4: Format the directory name
| mkdir $in                 # 5: Create the directory

While the second form may be overly verbose for this contrived example, you'll notice several advantages:

It can be composed step-by-step with a simple ↑ (up arrow) to repeat the previous command and add the next stage of the pipeline.
It's arguably more readable.
Each step can, if needed, be commented.
Each step in the pipeline can be inspected for debugging.

Let's examine the contents of $in on each line of the above example:

On line 2, $in refers to the results of line 1's date now (a datetime value).
On line 4, $in refers to tomorrow's formatted date from line 3 and is used in an interpolated string.
On line 5, $in refers to the results of line 4's interpolated string, e.g. '2024-05-14 Report'

Pipeline Input in Filter Closures

Certain filter commands may modify the pipeline input to their closure in order to provide more convenient access to the expected context. For example:

1..10 | each {$in * 2}

Rather than referring to the entire range of 10 numbers, the each filter modifies $in to refer to the value of the current iteration.

In most filters, the pipeline input and its resulting $in will be the same as the closure parameter. For the each filter, the following example is equivalent to the one above:

1..10 | each {|value| $value * 2}

However, some filters will assign an even more convenient value to their closures' input. The update filter is one example. The pipeline input to the update command's closure (as well as $in) refers to the column being updated, while the closure parameter refers to the entire record. As a result, the following two examples are also equivalent:

ls | update name {|file| $file.name | str upcase}
ls | update name {str upcase}

With most filters, the second version would refer to the entire file record (with name, type, size, and modified columns). However, with update, it refers specifically to the contents of the column being updated, in this case name.

Pipeline Input in Custom Command Definitions and Scripts

See: Custom Commands -> Pipeline Input

When Does `$in` Change (and when can it be reused)?

Rule 1: When used in the first position of a pipeline in a closure or block, $in refers to the pipeline (or filter) input to the closure/block.
Example:
```
def echo_me [] {
  print $in
}
true | echo_me
# => true
```
Rule 1.5: This is true throughout the current scope. Even on subsequent lines in a closure or block, $in is the same value when used in the first position of any pipeline inside that scope.
Example:
```
[ a b c ] | each {
  print $in
  print $in
  $in
}
```
All three of the $in values are the same on each iteration, so this outputs:
```
a
a
b
b
c
c
╭───┬───╮
│ 0 │ a │
│ 1 │ b │
│ 2 │ c │
╰───┴───╯
```

Rule 2: When used anywhere else in a pipeline (other than the first position), $in refers to the previous expression's result:

Example:

4               # Pipeline input
| $in * $in     # $in is 4 in this expression
| $in / 2       # $in is now 16 in this expression
| $in           # $in is now 8
# =>   8

Rule 2.5: Inside a closure or block, Rule 2 usage occurs inside a new scope (a sub-expression) where that "new" $in value is valid. This means that Rule 1 and Rule 2 usage can coexist in the same closure or block.

Example:

4 | do {
  print $in            # closure-scope $in is 4

  let p = (            # explicit sub-expression, but one will be created regardless
    $in * $in          # initial-pipeline position $in is still 4 here
    | $in / 2          # $in is now 16
  )                    # $p is the result, 8 - Sub-expression scope ends

  print $in            # At the closure-scope, the "original" $in is still 4
  print $p
}

So the output from the 3 print statements is:

4
4
8

Again, this would hold true even if the command above used the more compact, implicit sub-expression form:

Example:

4 | do {
  print $in                       # closure-scope $in is 4
  let p = $in * $in | $in / 2     # Implicit let sub-expression
  print $in                       # At the closure-scope, $in is still 4
  print $p
}

4
4
8

Rule 3: When used with no input, $in is null.

Example:

# Input
1 | do { $in | describe }
# =>   int
"Hello, Nushell" | do { $in | describe }
# =>   string
{||} | do { $in | describe }
# =>   closure

# No input
do { $in | describe }
# =>   nothing

Rule 4: In a multi-statement line separated by semicolons, $in cannot be used to capture the results of the previous statement.
This is the same as having no input:
```
ls / | get name; $in | describe
# => nothing
```
Instead, simply continue the pipeline:
```
ls / | get name | $in | describe
# => list<string>
```

Best practice for `$in` in Multiline Code

While $in can be reused as demonstrated above, assigning its value to another variable in the first line of your closure/block will often aid in readability and debugging.

Example:

def "date info" [] {
  let day = $in
  print ($day | format date '%v')
  print $'... was a ($day | format date '%A')'
  print $'... was day ($day | format date '%j') of the year'
}

'2000-01-01' | date info
# =>  1-Jan-2000
# => ... was a Saturday
# => ... was day 001 of the year

Collectability of `$in`

Currently, the use of $in on a stream in a pipeline results in a "collected" value, meaning the pipeline "waits" on the stream to complete before handling $in with the full results. However, this behavior is not guaranteed in future releases. To ensure that a stream is collected into a single variable, use the collect command.
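
A minimal sketch of explicit collection, assuming a small numeric stream chosen purely for illustration: collect gathers the entire stream and passes it to its closure as a single value (named all here, a hypothetical parameter name).

```nu
# `collect` waits for the whole stream, then runs the closure once on the result.
1..5
| each {|n| $n * 2 }                 # streams 2, 4, 6, 8, 10
| collect {|all| $all | math sum }   # `all` is the complete list
# => 30
```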

Likewise, avoid using $in when normal pipeline input will suffice, as internally $in forces a conversion from PipelineData to Value and may result in decreased performance and/or increased memory usage.

Working with External Commands

Nu commands communicate with each other using the Nu data types (see types of data), but what about commands outside of Nu? Let's look at some examples of working with external commands:

internal_command | external_command

Data will flow from the internal_command to the external_command. This data will be converted to a string so that it can be sent to the stdin of the external_command.
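
As a small illustration, assuming a Unix-like system where an external cat is available on the PATH, a Nu string can be handed to an external command through its stdin:

```nu
# The string is sent to the stdin of the external `cat`, which echoes it back.
"hello from Nu" | ^cat
# => hello from Nu
```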

external_command | internal_command

Data coming from an external command into Nu will come in as bytes that Nushell will try to automatically convert to UTF-8 text. If successful, a stream of text data will be sent to internal_command. If unsuccessful, a stream of binary data will be sent to internal_command. Commands like lines make it easier to bring in data from external commands, since they split the text into discrete lines to work with.
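
A short sketch of this direction, assuming an external printf is on the PATH (typical on Unix-like systems); the sample text is made up for illustration:

```nu
# The external command emits two lines of text; `lines` turns them into list items.
^printf 'alpha\nbeta\n' | lines | length
# => 2
```

Once the text has been split by lines, the usual Nu filters (such as each, where, or first) can operate on the individual items.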

external_command_1 | external_command_2

Nu works with data piped between two external commands in the same way as other shells, such as Bash. The stdout of external_command_1 is connected to the stdin of external_command_2. This lets data flow naturally between the two commands.

Command Input and Output Types

The Basics section above describes how commands can be combined in pipelines as input, filters, or output. How you can use commands depends on what they offer in terms of input/output handling.

You can check what a command supports with help <command name>, which shows the relevant Input/output types.

For example, through help first we can see that the first command supports multiple input and output types:

help first
# => […]
# => Input/output types:
# =>   ╭───┬───────────┬────────╮
# =>   │ # │   input   │ output │
# =>   ├───┼───────────┼────────┤
# =>   │ 0 │ list<any> │ any    │
# =>   │ 1 │ binary    │ binary │
# =>   │ 2 │ range     │ any    │
# =>   ╰───┴───────────┴────────╯

[a b c] | first
# => a

1..4 | first
# => 1

As another example, the ls command supports output but not input:

help ls
# => […]
# => Input/output types:
# =>   ╭───┬─────────┬────────╮
# =>   │ # │  input  │ output │
# =>   ├───┼─────────┼────────┤
# =>   │ 0 │ nothing │ table  │
# =>   ╰───┴─────────┴────────╯

This means, for example, that attempting to pipe into ls (echo .. | ls) leads to unintended results. The input stream is ignored, and ls defaults to listing the current directory.

To integrate a command like ls into a pipeline, you have to explicitly reference the input and pass it as a parameter:

echo .. | ls $in

Note that this only works if $in matches the argument type. For example, [dir1 dir2] | ls $in will fail with the error can't convert list<string> to string.
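
One possible way around the type mismatch, sketched here with "." and ".." standing in for real directory names, is to let each pass one string at a time to ls and then merge the results:

```nu
# `each` hands a single string to ls per iteration, so the argument type matches.
["." ".."]
| each {|dir| ls $dir }   # one table per directory
| flatten                 # merge the per-directory tables into one
```

This is only an illustration of matching the argument type; other approaches may suit a given Nu version better.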

Other commands without default behavior may fail in different ways, and with explicit errors.

For example, help sleep tells us that sleep supports no input and no output types:

help sleep
# => […]
# => Input/output types:
# =>   ╭───┬─────────┬─────────╮
# =>   │ # │  input  │ output  │
# =>   ├───┼─────────┼─────────┤
# =>   │ 0 │ nothing │ nothing │
# =>   ╰───┴─────────┴─────────╯

When we erroneously pipe into it, instead of unintended behavior like in the ls example above, we receive an error:

echo 1sec | sleep
# => Error: nu::parser::missing_positional
# => 
# =>   × Missing required positional argument.
# =>    ╭─[entry #53:1:18]
# =>  1 │ echo 1sec | sleep
# =>    ╰────
# =>   help: Usage: sleep <duration> ...(rest) . Use `--help` for more information.

While there is no hard-and-fast rule, Nu generally tries to follow established conventions in command behavior, or do what 'feels right'. For example, sleep not accepting an input stream matches the behavior of sleep in Bash.

Many commands do accept piped input or produce piped output, however. If it's ever unclear, check their help documentation as described above.

Behind the Scenes

You may have wondered how we see a table if ls is an input and not an output. Nu adds this output for us automatically using another command called table. The table command is appended to any pipeline that doesn't have an output. This allows us to see the result.

In effect, the command:

ls

And the pipeline:

ls | table

Are one and the same.

Note

The phrase "are one and the same" above only applies to the graphical output in the shell; it does not mean the two data structures are the same:

(ls) == (ls | table)
# => false

ls | table is not even structured data!
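
To see the difference, describe can be applied to both forms. This sketch uses a small literal table (an assumption, standing in for ls output so the example does not depend on the filesystem):

```nu
# A literal table with hypothetical file names and sizes.
let data = [[name size]; [a.txt 10] [b.txt 20]]

$data | describe           # reports a table type with name and size columns
$data | table | describe   # reports a string type: the output of `table` is rendered text
```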

Output Result to External Commands

Sometimes you want to output Nushell structured data to an external command for further processing. However, Nushell's default formatting options for structured data may not be what you want. For example, suppose you want to find a file named "tutor" under "/usr/share/nvim/runtime" and check its ownership:

ls /usr/share/nvim/runtime/
# => ╭────┬───────────────────────────────────────┬──────┬─────────┬───────────────╮
# => │  # │                 name                  │ type │  size   │   modified    │
# => ├────┼───────────────────────────────────────┼──────┼─────────┼───────────────┤
# => │  0 │ /usr/share/nvim/runtime/autoload      │ dir  │  4.1 KB │ 2 days ago    │
# => ..........
# => ..........
# => ..........
# => 
# => │ 31 │ /usr/share/nvim/runtime/tools         │ dir  │  4.1 KB │ 2 days ago    │
# => │ 32 │ /usr/share/nvim/runtime/tutor         │ dir  │  4.1 KB │ 2 days ago    │
# => ├────┼───────────────────────────────────────┼──────┼─────────┼───────────────┤
# => │  # │                 name                  │ type │  size   │   modified    │
# => ╰────┴───────────────────────────────────────┴──────┴─────────┴───────────────╯

You decide to use grep and pipe the result to the external ^ls:

ls /usr/share/nvim/runtime/ | get name | ^grep tutor | ^ls -la $in
# => ls: cannot access ''$'\342\224\202'' 32 '$'\342\224\202'' /usr/share/nvim/runtime/tutor        '$'\342\224\202\n': No such file or directory

What's wrong? Nushell renders lists and tables (by adding a border with characters like ╭,─,┬,╮) before piping them as text to external commands. If that's not the behavior you want, you must explicitly convert the data to a string before piping it to an external. For example, you can do so with to text:

ls /usr/share/nvim/runtime/ | get name | to text | ^grep tutor | tr -d '\n' | ^ls -la $in
# => total 24
# => drwxr-xr-x@  5 pengs  admin   160 14 Nov 13:12 .
# => drwxr-xr-x@  4 pengs  admin   128 14 Nov 13:42 en
# => -rw-r--r--@  1 pengs  admin  5514 14 Nov 13:42 tutor.tutor
# => -rw-r--r--@  1 pengs  admin  1191 14 Nov 13:42 tutor.tutor.json

(Actually, for this simple usage you can just use find:)

ls /usr/share/nvim/runtime/ | get name | find tutor | ansi strip | ^ls -al ...$in

Command Output in Nushell

Unlike external commands, Nushell commands are akin to functions. Most Nushell commands do not print anything to stdout and instead just return data.

do { ls; ls; ls; "What?!" }

This means that the above code will not display the files under the current directory three times. In fact, running this in the shell will only display "What?!" because that is the value returned by the do command in this example. However, using the system ^ls command instead of ls would indeed print the directory thrice because ^ls does print its result once it runs.

Knowing when data is displayed is important when using configuration variables that affect the display output of commands such as table.

do { $env.config.table.mode = "none"; ls }

For instance, the above example sets the $env.config.table.mode configuration variable to none, which causes the table command to render data without additional borders. However, as shown earlier, the command is effectively equivalent to:

do { $env.config.table.mode = "none"; ls } | table

Because Nushell $env variables are scoped, this means that the table command in the example is not affected by the environment modification inside the do block and the data will not be shown with the applied configuration.

When displaying data early is desired, it is possible to explicitly apply | table inside the scope, or use the print command.

do { $env.config.table.mode = "none"; ls | table }
do { $env.config.table.mode = "none"; print (ls) }