A stream is an abstraction that hides the details of how collections are processed.
Because the details are hidden:
- The implementation is free to perform the processing in the most efficient way, and
- Concurrent (parallel) processing of the collection is facilitated.
In Java 8, streams are preferred over iteration for processing collections.
Suppose we have a file "alice.txt" and we want to know how many words are longer than 12 characters.
The following code will read the words into a list of strings called words:
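Here is a minimal sketch of that step. It assumes that alice.txt is UTF-8 text in the working directory and that words are separated by runs of non-letter characters (the regular expression \PL+). ReadWords and splitWords are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ReadWords {
    // Split text into words on runs of non-letter characters (an assumed tokenization).
    static List<String> splitWords(String contents) {
        return Arrays.asList(contents.split("\\PL+"));
    }

    public static void main(String[] args) throws IOException {
        // Read the whole file into one string (assumes UTF-8 encoding).
        String contents = new String(Files.readAllBytes(Paths.get("alice.txt")), StandardCharsets.UTF_8);
        List<String> words = splitWords(contents);
        System.out.println(words.size() + " words read");
    }
}
```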
Using iteration:
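Here is a sketch of the loop-based count, assuming words is a List<String> like the one above. A small hard-coded list stands in for the file so the example runs on its own. CountIteration and countLongWords are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class CountIteration {
    // Walk the list and increment a counter for every word longer than 12 characters.
    static long countLongWords(List<String> words) {
        long count = 0;
        for (String w : words) {
            if (w.length() > 12) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("incomprehensible", "Alice", "conversations");
        System.out.println(countLongWords(words)); // 2
    }
}
```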
This works, but it is difficult to parallelize the code, i.e., to split the counting work among multiple threads.
Using streams:
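Here is a sketch of the stream version under the same assumptions: a stand-in list, and hypothetical class and method names. filter builds a new stream of the long words. count, a terminal operation, then computes the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class CountStream {
    static long countLongWords(List<String> words) {
        // filter describes which words to keep; no counting happens yet.
        Stream<String> longWords = words.stream().filter(w -> w.length() > 12);
        // count is a terminal operation: it runs the pipeline and returns the total.
        return longWords.count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("incomprehensible", "Alice", "conversations");
        System.out.println(countLongWords(words)); // 2
    }
}
```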
More succinctly:
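The same count can be written as one chained expression. This is a sketch; CountSuccinct is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

public class CountSuccinct {
    // The intermediate stream variable is dropped and the calls are chained.
    static long countLongWords(List<String> words) {
        return words.stream().filter(w -> w.length() > 12).count();
    }

    public static void main(String[] args) {
        System.out.println(countLongWords(Arrays.asList("incomprehensible", "Alice"))); // 1
    }
}
```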
An alternate format:
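Here is the same pipeline laid out with one operation per line, a common style for longer pipelines. CountFormatted is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

public class CountFormatted {
    // One stage per line makes each step of the pipeline easy to read.
    static long countLongWords(List<String> words) {
        return words.stream()
                    .filter(w -> w.length() > 12)
                    .count();
    }

    public static void main(String[] args) {
        System.out.println(countLongWords(Arrays.asList("conversations", "Alice"))); // 1
    }
}
```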
Suppose we replace "stream()" with "parallelStream()":
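Here is a sketch of the parallel version. CountParallel is a hypothetical name. Whether the work is actually split across threads is left to the library, but the result is the same as in the sequential version.

```java
import java.util.Arrays;
import java.util.List;

public class CountParallel {
    // parallelStream lets the library split the filtering and counting across threads.
    static long countLongWords(List<String> words) {
        return words.parallelStream()
                    .filter(w -> w.length() > 12)
                    .count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("incomprehensible", "Alice", "conversations");
        System.out.println(countLongWords(words)); // 2
    }
}
```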
Now the underlying stream implementation can do the filtering and
counting in parallel.
There are significant differences between streams and collections
(lists, sets, etc.):
- A stream does not store its elements, though they can be
generated on demand
- Stream operations do not mutate their source. They return new
streams that are transformations of the source
- Stream operations are lazy when possible. They are not
executed until their result is needed. (Therefore, there can be
infinite streams.)
- Once a stream has been reduced to a result, it cannot be
reused
Recall that in Java 8 a lambda expression can replace an object that is an instance of a class implementing an interface that has only one abstract method.
Such an interface, called a functional interface, is completely characterized by the parameter and return types of its single abstract method.
Stream operations take instances of functional interfaces, usually written as lambda expressions or method references, as arguments.
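Some of the functional interfaces used by streams, all in java.util.function, are listed below and will be referred to later:
- Supplier<T>: takes no arguments and returns a value of type T (method get)
- UnaryOperator<T>: takes one argument of type T and returns a value of type T (method apply)
- Predicate<T>: takes one argument of type T and returns a boolean (method test)
- Function<T,R>: takes one argument of type T and returns a value of type R (method apply)
- BinaryOperator<T>: takes two arguments of type T and returns a value of type T (method apply)
- BiFunction<T,U,R>: takes arguments of types T and U and returns a value of type R (method apply)
- Consumer<T>: takes one argument of type T and returns no value (method accept)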
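Here is a short sketch that pairs each of these interfaces with a lambda or method reference. The constant names and sample values are illustrative assumptions, not part of any API.

```java
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

public class FunctionalInterfaces {
    static final Supplier<String> ECHO = () -> "Echo";                      // () -> T
    static final UnaryOperator<Integer> NEXT = n -> n + 1;                  // T -> T
    static final Predicate<String> IS_LONG = w -> w.length() > 12;          // T -> boolean
    static final Function<String, Integer> LENGTH = String::length;         // T -> R
    static final BinaryOperator<Integer> ADD = (x, y) -> x + y;             // (T, T) -> T
    static final BiFunction<Integer, String, Integer> ADD_LENGTH =
            (total, word) -> total + word.length();                         // (T, U) -> R
    static final Consumer<String> PRINT = System.out::println;              // T -> no value

    public static void main(String[] args) {
        PRINT.accept(ECHO.get());                           // Echo
        System.out.println(NEXT.apply(41));                 // 42
        System.out.println(IS_LONG.test("Alice"));          // false
        System.out.println(LENGTH.apply("Alice"));          // 5
        System.out.println(ADD.apply(2, 3));                // 5
        System.out.println(ADD_LENGTH.apply(10, "Alice"));  // 15
    }
}
```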
Stream operations express "what" is done when processing streams, not "how" it is done.
Stream operations occur in three stages:
- Stream creation
- Stream transformation into other streams
- Stream termination through terminal operations
Terminal operations force execution of any lazy operations that
preceded them.
Once a terminal operation has been performed on a stream, the stream
can no longer be used.
As of Java 8, any java.util.Collection can be converted into a stream using the stream() and parallelStream() methods.
The new Stream interface (in java.util.stream) also includes some static methods for creating streams:
- From arrays
- As infinite sequences
To create a stream from an array, use the static Stream.of method:
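Here is a minimal sketch. StreamOfArray and wordsOf are hypothetical names. Passing an existing object array to Stream.of streams its elements; Arrays.stream(array) is an equivalent alternative.

```java
import java.util.stream.Stream;

public class StreamOfArray {
    static Stream<String> wordsOf(String sentence) {
        String[] words = sentence.split(" ");
        // The existing String[] is passed directly as the varargs array.
        return Stream.of(words);
    }

    public static void main(String[] args) {
        wordsOf("the quick brown fox").forEach(System.out::println);
    }
}
```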
Stream.of's parameter is a "varargs" parameter, meaning it can accept a variable number of arguments:
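Here is a sketch with the arguments listed individually. VarargsStream, song and the sample words are hypothetical.

```java
import java.util.stream.Stream;

public class VarargsStream {
    // The four arguments are bundled into a String[] when Stream.of is called.
    static Stream<String> song() {
        return Stream.of("gently", "down", "the", "stream");
    }

    public static void main(String[] args) {
        song().forEach(System.out::println);
    }
}
```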
Varargs parameters are bundled into arrays at the time the method is
called.
Infinite streams can be created using:
- Stream.generate(...)
- Stream.iterate(...)
These require certain kinds of lambda expressions, or objects
implementing functional interfaces, as parameters.
The Stream.generate method requires a function (lambda expression) with no arguments.
Such a function is an instance of a class implementing the Supplier<T> functional interface (in java.util.function), which takes no arguments and returns a value of type T.
Here is an example that generates a stream of strings:
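Here is a minimal sketch. The class name and the string "Echo" are assumptions. Because the stream is infinite, main uses limit (described below) to print only three elements.

```java
import java.util.stream.Stream;

public class GenerateEchos {
    // The Supplier () -> "Echo" is called each time another element is needed.
    static Stream<String> echos() {
        return Stream.generate(() -> "Echo");
    }

    public static void main(String[] args) {
        echos().limit(3).forEach(System.out::println); // Echo, three times
    }
}
```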
Here is an example that generates a stream of random doubles:
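Here is a sketch using a lambda that calls Math.random(). GenerateRandoms is a hypothetical name, and limit again keeps the printed output finite.

```java
import java.util.stream.Stream;

public class GenerateRandoms {
    // Each element comes from a fresh call to Math.random(), a double in [0.0, 1.0).
    static Stream<Double> randoms() {
        return Stream.generate(() -> Math.random());
    }

    public static void main(String[] args) {
        randoms().limit(3).forEach(System.out::println);
    }
}
```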
Note that this can be simplified using a method reference:
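Here is the same generator with the lambda replaced by the method reference Math::random. GenerateRandomsRef is a hypothetical name.

```java
import java.util.stream.Stream;

public class GenerateRandomsRef {
    // Math::random is equivalent to () -> Math.random()
    static Stream<Double> randoms() {
        return Stream.generate(Math::random);
    }

    public static void main(String[] args) {
        randoms().limit(3).forEach(System.out::println);
    }
}
```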
The Stream.iterate method requires a "seed" value and a function (lambda expression) with one argument.
Such a function is an instance of a class implementing the UnaryOperator<T> functional interface (in java.util.function), which takes one argument of type T and returns a value of type T.
Suppose the function is f. Then Stream.iterate(seed, f) creates the sequence seed, f(seed), f(f(seed)), ...
Here is an example that generates 1, 2, 3, ...:
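Here is a minimal sketch with seed 1 and f(n) = n + 1. IterateIntegers and naturals are hypothetical names.

```java
import java.util.stream.Stream;

public class IterateIntegers {
    // seed = 1, f = n -> n + 1, giving 1, f(1) = 2, f(f(1)) = 3, ...
    static Stream<Integer> naturals() {
        return Stream.iterate(1, n -> n + 1);
    }

    public static void main(String[] args) {
        naturals().limit(5).forEach(System.out::println); // 1 to 5, one per line
    }
}
```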
Here is an example that generates A, B, C, ...:
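Here is a minimal sketch. IterateLetters and letters are hypothetical names. The cast is needed because c + 1 is an int.

```java
import java.util.stream.Stream;

public class IterateLetters {
    // c + 1 is an int, so it is cast back to char (and then boxed to Character).
    static Stream<Character> letters() {
        return Stream.iterate('A', c -> (char) (c + 1));
    }

    public static void main(String[] args) {
        letters().limit(3).forEach(System.out::println); // A, B, C
    }
}
```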
Calling the filter method on a stream "filters" out all elements that do not satisfy a condition.
filter requires as argument an instance of a class implementing the Predicate<T> functional interface (in java.util.function), that is, a function that takes an argument of type T and returns a boolean value.
We have already seen an example of filtering a stream of strings to
obtain those that are longer than 12 characters:
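Here is the filtering step on its own. FilterLongWords and longWords are hypothetical names, and the predicate is the lambda w -> w.length() > 12.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class FilterLongWords {
    // Keep only the words for which the Predicate returns true.
    static Stream<String> longWords(List<String> words) {
        return words.stream().filter(w -> w.length() > 12);
    }

    public static void main(String[] args) {
        longWords(Arrays.asList("incomprehensible", "Alice")).forEach(System.out::println);
    }
}
```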
The result of calling filter is a new stream; the original stream is not mutated.
Calling the map method on a stream creates a new stream that results from applying a function to each element.
map requires as argument an instance of a class implementing the Function<T,R> functional interface (in java.util.function), that is, a function that takes an argument of type T and returns a value of type R.
T and R can be the same type.
This example transforms words in a stream to lower case:
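Here is a minimal sketch with a lambda as the Function<String,String>. MapLowerCase and lowerCase are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class MapLowerCase {
    // Apply w -> w.toLowerCase() to every element, producing a new stream.
    static Stream<String> lowerCase(List<String> words) {
        return words.stream().map(w -> w.toLowerCase());
    }

    public static void main(String[] args) {
        lowerCase(Arrays.asList("Alice", "QUEEN")).forEach(System.out::println); // alice, queen
    }
}
```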
The same thing can be accomplished with a method reference:
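Here is the same mapping written with the method reference String::toLowerCase. MapLowerCaseRef is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class MapLowerCaseRef {
    // String::toLowerCase is shorthand for w -> w.toLowerCase()
    static Stream<String> lowerCase(List<String> words) {
        return words.stream().map(String::toLowerCase);
    }

    public static void main(String[] args) {
        lowerCase(Arrays.asList("Alice", "QUEEN")).forEach(System.out::println);
    }
}
```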
In the preceding, a stream of strings is transformed into another stream of strings, because T and R are the same type.
Here is an example where a stream of characters is transformed into a
stream of strings:
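Here is a sketch where T is Character and R is String. MapCharsToStrings, asStrings and the sample characters are hypothetical.

```java
import java.util.stream.Stream;

public class MapCharsToStrings {
    // A Function<Character, String>: the input and output types differ.
    static Stream<String> asStrings(Stream<Character> chars) {
        return chars.map(c -> c.toString());
    }

    public static void main(String[] args) {
        asStrings(Stream.of('a', 'b', 'c'))
            .forEach(s -> System.out.println(s + " has length " + s.length()));
    }
}
```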
Calling limit(n) on a stream returns a new stream that ends after n elements.
Recall how to make an unlimited stream of characters A, B, C, ... with Stream.iterate.
Here is how to produce the finite stream A, B, C, D:
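Here is a minimal sketch that builds the unlimited letter stream and then cuts it to four elements. LimitLetters, letters and firstFour are hypothetical names.

```java
import java.util.stream.Stream;

public class LimitLetters {
    // Unlimited: A, B, C, ...
    static Stream<Character> letters() {
        return Stream.iterate('A', c -> (char) (c + 1));
    }

    // Finite: the stream ends after four elements.
    static Stream<Character> firstFour() {
        return letters().limit(4);
    }

    public static void main(String[] args) {
        firstFour().forEach(System.out::println); // A, B, C, D
    }
}
```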
For another example, the following yields a stream of 100 random
numbers:
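Here is a minimal sketch combining generate and limit. HundredRandoms is a hypothetical name.

```java
import java.util.stream.Stream;

public class HundredRandoms {
    // An infinite supply of random doubles, cut off after 100 elements.
    static Stream<Double> hundredRandoms() {
        return Stream.generate(Math::random).limit(100);
    }

    public static void main(String[] args) {
        System.out.println(hundredRandoms().count()); // 100
    }
}
```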
It is important to understand that streams are lazy in the sense that:
- Computations on their source data are only performed when
a terminal stream operation is performed, and
- Source elements are consumed only as needed
We gave an example of a terminal operation earlier when count() was used to count words longer than 12 characters.
However, transformation operations such as filter, map, and limit do not by themselves actually process any source elements.
It is only after a terminal operation is invoked that any filtering,
mapping, etc. is actually done.
We can observe a stream's laziness through the peek method, a stream transformation operation that allows an additional action to be performed on elements as they are actually consumed.
peek exists mainly to support debugging, as it allows you to see
elements as they flow past a point in a stream pipeline.
Here is an example:
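Here is a minimal sketch, assuming an unlimited stream chars of letters A, B, C, ... and a peek action that prints "Consuming " followed by the character. PeekLazy and the message text are hypothetical. Running main prints nothing.

```java
import java.util.stream.Stream;

public class PeekLazy {
    // Unlimited letters; peek reports each letter only when it is actually consumed.
    static Stream<Character> chars() {
        return Stream.iterate('A', c -> (char) (c + 1))
                     .peek(c -> System.out.println("Consuming " + c));
    }

    public static void main(String[] args) {
        Stream<Character> chars = chars();
        // No terminal operation has been invoked, so no element is consumed or printed.
    }
}
```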
Since peek is not a terminal operation, this code will not produce any output, showing that although the unlimited stream chars is created, its elements are not actually consumed until they are needed.
Even a call to limit is not a terminal operation:
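Here is a sketch under the same assumptions as before (a printing peek action; LimitLazy is a hypothetical name). Adding limit still produces no output.

```java
import java.util.stream.Stream;

public class LimitLazy {
    static Stream<Character> chars() {
        return Stream.iterate('A', c -> (char) (c + 1))
                     .peek(c -> System.out.println("Consuming " + c));
    }

    public static void main(String[] args) {
        // limit is an intermediate operation, so this still consumes nothing.
        Stream<Character> firstFour = chars().limit(4);
    }
}
```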
The above code still does not produce output. Now consider:
Because toArray is a terminal operation, this code will produce the output:
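Here is a self-contained sketch of such a pipeline, assuming peek prints "Consuming " followed by each character. ToArrayTriggers and chars are hypothetical names. Running it prints exactly four lines, because limit stops pulling elements once it has four.

```java
import java.util.stream.Stream;

public class ToArrayTriggers {
    static Stream<Character> chars() {
        return Stream.iterate('A', c -> (char) (c + 1))
                     .peek(c -> System.out.println("Consuming " + c));
    }

    public static void main(String[] args) {
        // toArray is terminal: elements are now pulled through peek and limit.
        Object[] firstFour = chars().limit(4).toArray();
        // Printed output:
        // Consuming A
        // Consuming B
        // Consuming C
        // Consuming D
    }
}
```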
Intermediate stream operations are divided into stateless
and stateful operations.
- Q: Why does the distinction between stateful and stateless
stream operations matter?
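- A: Stateless transformations can be performed concurrently on different elements without coordination between threads. Stateful ones may need to see or buffer other elements first, which makes parallel execution harder and more costly.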
Stateless operations are those where processing an element of a stream does not require remembering anything about previously seen elements, i.e., there is no need to retain state between elements.
Examples:
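filter and map are examples of stateless operations: each element is handled on its own. Here is a sketch; StatelessOps, lengthsOfLongWords and the threshold of 3 are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StatelessOps {
    static List<Integer> lengthsOfLongWords(List<String> words) {
        return words.stream()
                    .filter(w -> w.length() > 3)   // stateless: decides per element
                    .map(String::length)           // stateless: transforms per element
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengthsOfLongWords(Arrays.asList("a", "stream", "of", "words"))); // [6, 5]
    }
}
```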
Stateful operations may incorporate state from previously seen elements
when processing new elements.
For example, limit is stateful because it must know how many previous elements have been processed.
Other stateful operations:
- skip: returns the elements from a given stream with
the first n elements skipped (opposite of limit)
- distinct: returns the elements from a given stream
without duplicates
- sorted: returns the elements from a given stream in sorted order
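Here is a sketch of skip, distinct and sorted. StatefulOps, its methods and the sample words are hypothetical. Each of these operations has to remember earlier elements: how many were skipped, which were seen, or all of them for sorting.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StatefulOps {
    // distinct remembers elements already seen; sorted must see every element first.
    static List<String> distinctSorted(Stream<String> words) {
        return words.distinct().sorted().collect(Collectors.toList());
    }

    // skip counts how many elements it has dropped so far.
    static List<String> skipFirst(Stream<String> words, long n) {
        return words.skip(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctSorted(Stream.of("pear", "apple", "pear", "fig"))); // [apple, fig, pear]
        System.out.println(skipFirst(Stream.of("a", "b", "c", "d"), 2));              // [c, d]
    }
}
```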
Once you have created and (possibly) transformed a stream, you will most likely want to get at, or "consume," the stream data in some way.
Stream operations that allow you to consume stream data are called "terminal" operations because once applied, the stream becomes unusable.
Terminal operations can be divided into those that:
- reduce a stream to a value,
- iterate over a stream's elements, and
- collect stream elements into an aggregation of some sort
We have already seen a simple example of a reduction: the count method returns the number of elements in a stream:
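Here is a minimal sketch. CountElements and countWords are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class CountElements {
    // count reduces the whole stream to a single long.
    static long countWords(List<String> words) {
        return words.stream().count();
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("Alice", "was", "beginning"))); // 3
    }
}
```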
This section presents several other examples of reduction operations.
Many of them make use of a concept new in Java 8, the optional value, and return a value of type Optional<T>:
- In Java 8 an Optional<T> value is preferred to
using null when an operation that normally returns a value of
type T fails to return a value at all.
- Such a value is a container object
which may or may not contain a non-null value.
- Methods exist to:
- Check if a value isPresent,
- get the value if it is present
- Return a different value if not (orElse)
- Invoke a lambda ifPresent
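Here is a short sketch of these Optional methods. OptionalBasics, describe and the fallback text "(none)" are hypothetical.

```java
import java.util.Optional;

public class OptionalBasics {
    // orElse supplies a fallback when no value is present.
    static String describe(Optional<String> maybeWord) {
        return maybeWord.orElse("(none)");
    }

    public static void main(String[] args) {
        Optional<String> present = Optional.of("Queen");
        Optional<String> absent = Optional.empty();

        System.out.println(present.isPresent());                   // true
        System.out.println(present.get());                         // Queen
        System.out.println(describe(absent));                      // (none)
        present.ifPresent(w -> System.out.println("Found " + w));  // Found Queen
        absent.ifPresent(w -> System.out.println("Found " + w));   // prints nothing
    }
}
```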
The max (or min) method returns the largest (or smallest) element of a stream according to a provided comparator.
In the example below the returned value is of type Optional<String>, so the value must be accessed through methods of the java.util.Optional class.
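Here is a minimal sketch, assuming a case-insensitive comparison is wanted. MaxWord, largest and smallest are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class MaxWord {
    // String::compareToIgnoreCase serves as a Comparator<String>.
    static Optional<String> largest(List<String> words) {
        return words.stream().max(String::compareToIgnoreCase);
    }

    static Optional<String> smallest(List<String> words) {
        return words.stream().min(String::compareToIgnoreCase);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "Zebra", "mango");
        System.out.println("largest: " + largest(words).get());   // Zebra
        System.out.println("smallest: " + smallest(words).get()); // apple
    }
}
```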
Note that the returned Optional value would be empty (contain no value) if the stream were empty.
Note that in this case the comparator is provided through a method
reference rather than a lambda expression.
This example finds the first word that starts with the letter 'Q', if one exists.
Note that it works in conjunction with filter:
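Here is a minimal sketch. FindFirstQ and firstQ are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FindFirstQ {
    // filter keeps the Q-words; findFirst returns the first one, if any.
    static Optional<String> firstQ(List<String> words) {
        return words.stream().filter(s -> s.startsWith("Q")).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(firstQ(Arrays.asList("Alice", "Queen", "Quick")).orElse("none")); // Queen
    }
}
```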
If you want to find any word that starts with the letter 'Q', use findAny.
Since any will do, you can parallelize the stream. Note the conversion using parallel:
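Here is a minimal sketch. FindAnyQ and anyQ are hypothetical names. With a parallel stream, which matching word is returned is not fixed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FindAnyQ {
    // parallel() converts the stream; findAny may return whichever match is found first.
    static Optional<String> anyQ(List<String> words) {
        return words.stream().parallel().filter(s -> s.startsWith("Q")).findAny();
    }

    public static void main(String[] args) {
        System.out.println(anyQ(Arrays.asList("Alice", "Queen", "Quick")).orElse("none"));
    }
}
```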
If you just want to know whether there is a match (not the matching element), use anyMatch.
Like filter, this method requires as argument an instance of a class implementing the Predicate<T> functional interface (in java.util.function).
So it does not need to be combined with filter, and it too can benefit from parallelization.
There are similar methods allMatch and noneMatch.
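Here is a sketch of all three matching methods on a parallel stream. MatchQ and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MatchQ {
    // Each method takes a Predicate directly, so no separate filter step is needed.
    static boolean anyStartsWithQ(List<String> words) {
        return words.parallelStream().anyMatch(s -> s.startsWith("Q"));
    }

    static boolean allStartWithQ(List<String> words) {
        return words.parallelStream().allMatch(s -> s.startsWith("Q"));
    }

    static boolean noneStartsWithQ(List<String> words) {
        return words.parallelStream().noneMatch(s -> s.startsWith("Q"));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Alice", "Queen");
        System.out.println(anyStartsWithQ(words));  // true
        System.out.println(allStartWithQ(words));   // false
        System.out.println(noneStartsWithQ(words)); // false
    }
}
```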
Sometimes you want to reduce a stream to a value in a way not offered by the previous methods.
A simple version of the reduce method takes as argument an instance of a class implementing the BinaryOperator<T> functional interface (in java.util.function), that is, a function that takes two arguments of type T and returns a value of type T.
This example computes the sum of a stream's integer elements:
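Here is a minimal sketch. ReduceSum is a hypothetical name, and the sample values are an assumption chosen so that their sum is 44.

```java
import java.util.Optional;
import java.util.stream.Stream;

public class ReduceSum {
    // Combine elements pairwise with (x, y) -> x + y; empty input gives an empty Optional.
    static Optional<Integer> sum(Stream<Integer> values) {
        return values.reduce((x, y) -> x + y);
    }

    public static void main(String[] args) {
        Optional<Integer> total = sum(Stream.of(3, 5, 7, 11, 18));
        System.out.println(total.get()); // 44
    }
}
```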
Note that the result, being an Optional value, requires the use of get to extract the number (44).
If the stream were empty, the sum would not have a value and a call to get would throw a NoSuchElementException.
Another version of reduce takes an additional "identity" argument that is used as the start of the computation, ensuring that a value will be produced even if the stream is empty:
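Here is a minimal sketch with 0 as the identity. ReduceWithIdentity is a hypothetical name. The result is a plain Integer, and an empty stream yields 0.

```java
import java.util.stream.Stream;

public class ReduceWithIdentity {
    // Start from 0 and add each element; no Optional is needed.
    static Integer sum(Stream<Integer> values) {
        return values.reduce(0, (x, y) -> x + y);
    }

    public static void main(String[] args) {
        System.out.println(sum(Stream.of(3, 5, 7, 11, 18))); // 44
        System.out.println(sum(Stream.empty()));             // 0
    }
}
```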
Note that the result of the operation is then an Integer rather than an Optional.
Suppose you want to add up the lengths of words in a stream.
The elements of the stream are of type String, but we want to compute a value of type Integer. Thus we want to apply a lambda like (total, word) -> total + word.length(), where total is a number and word is a string.
A BinaryOperator<T> is not sufficient for this, because it takes two arguments of type T and returns a value of type T.
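A third version of reduce takes three arguments: an identity, an accumulator, and a combiner.
The accumulator is a BiFunction<T,U,R> (in java.util.function), that is, a function that takes two arguments of types T and U and returns a value of type R. For reduce, its first argument and its result are both the running total (here an Integer), and its second argument is a stream element (here a String).
The accumulator is repeatedly applied to partial totals. In our example: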
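Here is a minimal sketch of the three-argument reduce. SumLengths and its methods are hypothetical names. The parallel variant is included to show where the combiner comes into play.

```java
import java.util.Arrays;
import java.util.List;

public class SumLengths {
    static int totalLength(List<String> words) {
        return words.stream().reduce(
            0,                                       // identity: start of the computation
            (total, word) -> total + word.length(),  // accumulator: BiFunction<Integer, String, Integer>
            (total1, total2) -> total1 + total2);    // combiner: merges partial totals
    }

    // Same reduction on a parallel stream; here the combiner merges per-chunk totals.
    static int totalLengthParallel(List<String> words) {
        return words.parallelStream().reduce(
            0,
            (total, word) -> total + word.length(),
            (total1, total2) -> total1 + total2);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Alice", "was", "beginning");
        System.out.println(totalLength(words));         // 17
        System.out.println(totalLengthParallel(words)); // 17
    }
}
```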
It may seem that the third argument, the combiner (a BinaryOperator, here combining two Integer totals), is unnecessary. However, when this operation is parallelized, each chunk of the stream produces its own partial total, and these partial totals need to be combined.
Sometimes, instead of reducing a stream to a single value, you simply
want to iterate over its elements. Here are methods that support this:
- iterator: yields an old-fashioned java.util.Iterator object
- toArray: returns an Object[] array that can be looped over
- forEach: takes as argument an instance of a class implementing the Consumer<T> functional interface (in java.util.function), that is, a function that takes an argument of type T and returns no value. The function is applied to each element of the stream. This method can be used, for example, to print elements or to send them to a database.
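Here is a sketch of all three ways to iterate. IterateOverStream, viaIterator and viaForEach are hypothetical names. Each use needs a fresh stream because a consumed stream cannot be reused. Collecting into an ArrayList from forEach, as viaForEach does, is only safe for sequential streams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class IterateOverStream {
    // iterator: old-style external iteration.
    static List<String> viaIterator(Stream<String> s) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = s.iterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // forEach with a Consumer (sequential streams only for this side effect).
    static List<String> viaForEach(Stream<String> s) {
        List<String> out = new ArrayList<>();
        s.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("gently", "down", "the", "stream");

        // toArray: an Object[] that can be looped over.
        for (Object o : words.stream().toArray()) {
            System.out.println(o);
        }
        // forEach with a method reference ...
        words.stream().forEach(System.out::println);
        // ... or with a lambda.
        words.stream().forEach(w -> System.out.println(w));
    }
}
```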
To convert a stream into a java.util.Collection, for example, a java.util.List, use the collect method.
One version of this method requires an object (a java.util.stream.Collector) that can supply new instances of the target collection, accumulate stream elements into such a collection, and combine multiple collections into one.
Such objects are provided by static methods in the Collectors class in java.util.stream. For example:
Also:
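Collectors.toCollection is another option. It takes a supplier for the target collection; here TreeSet::new is used so the result is sorted and free of duplicates. This is a sketch; CollectToTreeSet and sortedUniqueWords are hypothetical names.

```java
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectToTreeSet {
    // TreeSet::new is the Supplier of the target collection.
    static TreeSet<String> sortedUniqueWords(Stream<String> words) {
        return words.collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedUniqueWords(Stream.of("pear", "apple", "pear"))); // [apple, pear]
    }
}
```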
Other methods exist to convert streams into sets and maps.
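Here is a sketch of the set and map cases, assuming words are used as map keys and their lengths as values. Collectors.toMap as called here throws an IllegalStateException if two elements produce the same key. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectToSetAndMap {
    // toSet drops duplicates; no particular Set implementation is guaranteed.
    static Set<String> uniqueWords(Stream<String> words) {
        return words.collect(Collectors.toSet());
    }

    // Key: the word itself; value: its length. Duplicate keys cause IllegalStateException.
    static Map<String, Integer> lengthsByWord(Stream<String> words) {
        return words.collect(Collectors.toMap(Function.identity(), String::length));
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords(Stream.of("a", "b", "a")).size());               // 2
        System.out.println(lengthsByWord(Stream.of("Alice", "Queen")).get("Alice"));    // 5
    }
}
```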