
Friday, October 4, 2019

Become a Master of Java Streams - Part 1: Creating Streams

Declarative code (e.g. functional composition with Streams) provides superior code metrics in many cases. Code your way through this hands-on lab article series and mature into a better Java programmer by becoming a Master of Java Streams.

The whole idea of Streams is to represent a pipeline through which data flows while the pipeline's functions operate on the data. This way, functional-style operations on Streams of elements can be expressed. This article is the first of five in which you will learn firsthand how to become a Master of Streams. We start with basic stream examples and progress with more complex tasks until you know how to connect standard Java Streams to databases in the Cloud.

Once you have completed all five articles, you will be able to drastically reduce your codebase and know how to write pure Java code for entire applications in a blink.

Here is a summary of the upcoming articles:


Since we are firm believers in the concept of "Learning by doing", the series is complemented by a GitHub repository that contains Stream exercises split into 5 Units, each corresponding to the topic of an article. Instructions on how to use the source code are provided in the README file.

What are Java Streams?

The Java Stream interface was first introduced in Java 8 and, together with lambdas, acts as a milestone in the development of Java since it contributes greatly to facilitating a declarative (functional) programming style. If you want to learn more about the advantages of declarative coding we refer you to this article.

A Java Stream can be visualized as a pipeline through which data will flow (see the image below). The pipeline’s functions will operate on the data by e.g. filtering, mapping and sorting the items. Lastly, a terminal operation can be performed to collect the items in a preferred data structure such as a List, an Array or a Map. An important thing to notice is that a Stream can only be consumed once.


A Stream Pipeline contains three main parts: the stream source, the intermediate operation(s) (zero to many) and a terminal operation.

Let’s have a look at an example to get a glimpse of what we will be teaching throughout this series. We encourage you to look at the code below and try to figure out what the print-statement will result in before reading the next paragraph.

List<String> list = Stream.of("Monkey", "Lion", "Giraffe","Lemur")
    .filter(s -> s.startsWith("L"))
    .map(String::toUpperCase)
    .sorted()
    .collect(toList());
System.out.println(list);

Since the Stream API is descriptive and most often intuitive to use, you will probably have a pretty good understanding of the meaning of these operations whether or not you have encountered them before. We start off with a Stream of four Strings, each representing an African animal. The operations then filter out the elements that start with the letter “L”, convert the remaining elements to uppercase, sort them in natural order (which in this case means alphabetical order) and lastly collect them into a List, resulting in the output [“LEMUR”, “LION”].

It is important to understand that Streams are “lazy” in the sense that elements are “requested” by the terminal operation (in this case the .collect() statement). If the terminal operation only needs one element (like, for example, the terminal operation .findFirst()), then at most one element is ever going to reach the terminal operation and the remaining elements (if any) will never be produced by the source. This also means that just creating a Stream is often a cheap operation whereas consuming it might be expensive depending on the stream pipeline and the number of potential elements in the stream.
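To see this laziness in action, here is a small illustrative snippet (not part of the original example) where peek() reveals which elements actually flow through the pipeline when the terminal operation is findFirst():

Optional<String> first = Stream.of("Monkey", "Lion", "Giraffe", "Lemur")
    .peek(s -> System.out.println("Produced: " + s)) // logs each element actually pulled from the source
    .filter(s -> s.startsWith("L"))
    .findFirst();                                     // short-circuits after the first match
System.out.println(first); // Optional[Lion]

Only "Produced: Monkey" and "Produced: Lion" are printed; "Giraffe" and "Lemur" are never requested from the source.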

In the example above, the stream source was a handful of Strings given directly to Stream.of(), although many other types can act as a data source. We will spend the rest of this article describing some of the most useful source alternatives.


Stream Sources

Streams are mainly suited for handling collections of objects and can operate on elements of any type T. However, there are three special Stream implementations: IntStream, LongStream and DoubleStream, which are restricted to handling the corresponding primitive types.

An empty Stream of any of these types can be generated by calling the corresponding empty() method in the following manner:

Stream<T>     Stream.empty()
IntStream     IntStream.empty()
LongStream    LongStream.empty()
DoubleStream  DoubleStream.empty()

Empty Streams are indeed handy in some cases, but the majority of the time we are interested in filling our Stream with elements. This can be accomplished in a large number of ways. We will start by looking at the special case of an IntStream since it provides a variety of useful methods.


Useful IntStreams

A basic case is generating a Stream over a small number of items. This can be accomplished by listing the integers using IntStream.of(). The code below yields a simple stream of elements 1, 2 and 3.
IntStream oneTwoThree = IntStream.of(1, 2, 3);
Listing all elements manually can be tedious if the number of items grows large. In cases where we are interested in values in a certain range, the method .rangeClosed() is more convenient. The operation is inclusive, meaning that the following code will produce a stream of all elements from 1 to 9.
IntStream positiveSingleDigits = IntStream.rangeClosed(1, 9);

An even more powerful method is .iterate(), which enables greater flexibility in terms of which numbers to include. Below, we show an example of how it can be used to produce an infinite Stream of all numbers that are powers of two.
IntStream powersOfTwo = IntStream.iterate(1, i -> i * 2);
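Note that .iterate() produces an infinite stream, so it is typically combined with a short-circuiting operation such as .limit() before a terminal operation is applied. A small sketch:

IntStream.iterate(1, i -> i * 2)
    .limit(5)                      // keep only the first five powers of two
    .forEach(System.out::println); // prints 1, 2, 4, 8 and 16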
There are also several perhaps more unexpected ways of producing a Stream. The method chars() can be used to stream over the characters of a String as an IntStream of their character values, in this case the elements 'A', 'B' and 'C'.
IntStream chars = "ABC".chars();
There is also a simple way to generate a Stream of random integers.
IntStream randomInts = new Random().ints();

Stream an Array

Streaming existing data collections is another option. We can stream the elements of an existing Array or choose to list items manually using Stream.of() as previously shown and repeated below.
String[] array = {"Monkey", "Lion", "Giraffe", "Lemur"};
Stream<String> stream2 = Stream.of(array);
Stream<String> stream = Stream.of("Monkey", "Lion", "Giraffe", "Lemur");
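As a side note (not covered in the original text), Arrays.stream() is an alternative that, for a primitive int[], yields an IntStream rather than a boxed Stream<Integer>:

int[] ints = {1, 2, 3};
IntStream intStream = Arrays.stream(ints);     // primitive IntStream, no boxing
Stream<String> animals = Arrays.stream(array); // equivalent to Stream.of(array) above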

Stream from a Collection

It is also very simple to stream any Collection. The examples below demonstrate how a List or Set can be streamed with the simple command .stream().
List<String> list = Arrays.asList("Monkey", "Lion", "Giraffe", "Lemur");
Stream<String> streamFromList = list.stream();
Set<String> set = new HashSet<>(list);
Stream<String> streamFromSet = set.stream();
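A Map is not itself a Collection, but its entry, key and value views can be streamed in the same way (a small complementary example):

Map<String, Integer> ages = new HashMap<>();
ages.put("Lion", 5);
ages.put("Lemur", 3);
Stream<Map.Entry<String, Integer>> entries = ages.entrySet().stream();
Stream<String> keys = ages.keySet().stream();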

Stream from a Text File

Sometimes it can also be useful to stream the contents of a text file. The following command will provide a Stream<String> that holds every line from the referenced file as a separate element.

Stream<String> lines = Files.lines(Paths.get("file.txt"));
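Since Files.lines() keeps the underlying file open, the returned Stream should be closed when we are done with it, for example with try-with-resources (a hedged sketch; "file.txt" is just a placeholder):

try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
    lines.forEach(System.out::println); // prints every line of the file
} catch (IOException ioe) {
    ioe.printStackTrace();
}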


Exercise

Now that we have familiarized you with some of the ways of creating a Stream, we encourage you to clone this GitHub repo and start practicing. The content of the article will be enough to solve the first Unit, which is called Create. The Unit1Create interface contains JavaDocs that describe the intended implementation of the methods in Unit1MyCreate.
public interface Unit1Create {
 /**
  * Creates a new Stream of String objects that contains
  * the elements "A", "B" and "C" in order.
  *
  * @return a new Stream of String objects that contains
  *   the elements "A", "B" and "C" in order
  */
  Stream<String> newStreamOfAToC();
The provided tests (e.g. Unit1MyCreateTest) will act as an automatic grading tool, letting you know if your solution is correct or not.
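As a taste of what an implementation in Unit1MyCreate might look like (one possible sketch, not the official answer key), the method above could simply build on Stream.of():

@Override
public Stream<String> newStreamOfAToC() {
    return Stream.of("A", "B", "C"); // the three elements in the required order
}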


If you have not done so yet, go ahead and solve the work items in the Unit1MyCreate class. “Gotta catch ‘em all”.

In the next article, we will continue to describe several intermediate operations that can be applied to these Streams and that will convert them into other Streams. See you soon!

Authors 

Per Minborg
Julia Gustafsson

Monday, October 8, 2018

Java: Gain Performance Using SingletonStream

Java streams with just one element sometimes create unnecessary overhead in your applications. Learn how to use SingletonStream objects and gain over tenfold performance for some of these kinds of streams and learn how, at the same time, you can simplify your code.

Background

The Stream library in Java 8 is one of the most powerful additions to the Java language ever. Once you start to understand its versatility and resulting code readability, your Java code-style will change forever. Instead of bloating your code with all the nitty-gritty details of for, if and switch statements and numerous intermediate variables, you can use a Stream that just contains a description of what to do, and not really how it is done.

Some years ago, we had to make an API decision for a Java project: which return types should we select for the two fast local in-memory data cache methods with:

  • a unique search key which returns either a value or no value
  • a non-unique search key which returns any number of values (zero to infinity). 


This was the initial idea:

Optional<T> searchUnique(K key); // For unique keys
Stream<T> search(K key);         // For non-unique keys

But, we would rather have the two methods look exactly the same and both return a Stream<T>. The API would then look much cleaner because a unique cache would then look exactly the same as a non-unique cache.

However, the unique search had to be very efficient and able to create millions of result objects each second without creating too much overhead.

The Solution

By implementing a SingletonStream that only takes a single element (and therefore can be highly optimized compared to a normal Stream with any number of elements), we were able to let both methods return a Stream while retaining performance. The method searchUnique(K key) would return an empty stream (Stream.empty()) if the key was not found, and it would return a SingletonStream with the value associated with the key if the key existed. We would get:
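Conceptually, the unique lookup then boils down to something like the following sketch (the backing map and field names are illustrative assumptions, not Speedment's actual implementation):

private final Map<K, T> uniqueIndex = new ConcurrentHashMap<>(); // illustrative backing store

public Stream<T> searchUnique(K key) {
    final T value = uniqueIndex.get(key);
    return (value == null)
        ? Stream.empty()             // key not found: empty stream
        : SingletonStream.of(value); // key found: optimized one-element stream
}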

Stream<T> searchUnique(K key); // For unique keys
Stream<T> search(K key);       // For non-unique keys

Great! We can eat the cookie and still have it!

The Implementation

The SingletonStream is a part of the Speedment Stream ORM and can be viewed here on GitHub. Feel free to use Speedment and any of its components in your own projects using the Speedment initializer.

The SingletonStream is a good candidate for stack allocation using the JVM's Escape Analysis (read more on Escape Analysis in my previous posts here and here). The implementation comes in two shapes. If we set the STRICT value to true, we will get a completely lazy Stream, but the drawback is that we will lose the Singleton property once we call some intermediate operations like .filter(), .map(), etc. If we, on the other hand, set the STRICT value to false, the SingletonStream will perform many of the intermediate operations eagerly and will be able to return a new SingletonStream, thereby retaining the Singleton property. This will give better performance in many cases.

The solution devised here for reference streams could also easily be modified to the primitive incarnations of singleton streams. So, it would be almost trivial to write a SingletonIntStream, a SingletonLongStream and a SingletonDoubleStream. Here is a SingletonLongStream.

It should be noted that the class could be further developed so that it supports lazy evaluation while always remaining highly performant. This is future work.

Performance

There are many ways one could test the performance of the SingletonStream and compare it with a standard Stream implementation with one element.

Here is one way of doing it using JMH. The first pair of tests (count) just counts the number of elements in the stream and the second pair (forEach) does something with the single element of the stream.

@Benchmark
public long singletonStreamCount() {
    return SingletonStream.of("A").count();
}

@Benchmark
public long streamCount() {
    return Stream.of("A").count();
}

@Benchmark
public void singletonStreamForEach() {
    SingletonStream.of("A")
        .limit(1)
        .forEach(blackHole());
}

@Benchmark
public void streamForEach() {
   Stream.of("A")
        .limit(1)
        .forEach(blackHole());
}

private static <T> Consumer<T> blackHole() {
    return t -> {};
}


This will produce the following result when run on my MacBook Pro laptop:
...
Benchmark                               Mode  Cnt           Score   Error  Units
SingletonBench.singletonStreamCount    thrpt        333419753.335          ops/s
SingletonBench.singletonStreamForEach  thrpt       2312262034.214          ops/s
SingletonBench.streamCount             thrpt         27453782.595          ops/s
SingletonBench.streamForEach           thrpt         26156364.956          ops/s
...

That's a speedup factor of over 10 for the "count" operation. For the "forEach" operation, it looks like the JVM was able to optimize away the complete code path for the SingletonStream.

Test It

Download Speedment using the Speedment initializer.

The complete test class is available here.

Conclusions

The SingletonStream works more or less as an extended Optional and allows high performance while retaining the benefits of the Stream library.

You can select between two versions of it by setting the STRICT value according to your preferred stringency/performance trade-off.

The SingletonStream could be further improved.

Friday, December 23, 2016

Day 23, Java Holiday Calendar 2016, Use Mappable Types Instead of Bloated Ones




Today's tip is about mappable types. Traditionally, we Java developers have relied on inheritance and types with a number of methods to support convenient use of our classes. A traditional Point class would contain x and y coordinates and perhaps also a number of functions that allow us to easily work with our Point.

Imagine now that we add new shapes like Circle and that Circle inherits from Point and adds a radius. Further imagine that there are a bunch of other geometric shapes like Line, Square and the like that also appear in the code. After a while, all our classes become entangled in each other and we end up with a messy hairball.

However, there is another way of structuring our geometric classes so that there is a minimum of inter-class coupling. Instead of letting each geometric class provide specific methods for translations like add() and flipAroundXAxis(), we could add just one or two generic methods that operate on the geometric figure and return a value of any type. We could then break out the old methods from the geometric classes, convert them to functions and just apply them to the objects rather than letting the objects handle that responsibility.

Let us take the concept for a spin!

Traits

We start with some basic Traits of the geometrical shapes that are shared amongst most shapes. Here are the basic traits:

interface HasX { int x(); } // x is the x-coordinate

interface HasY { int y(); } // y is the y-coordinate

interface HasR { int r(); } // r is the radius

What's the Point?

Now it is time to create our Point interface like this:

interface Point extends HasX, HasY {

    static Point point(int x, int y) {
        return new Point() {
            @Override public int x() { return x; }
            @Override public int y() { return y; }
        };
    }

    static Point origo() { return point(0, 0); }

    default <R> R map(Function<? super Point, ? extends R> mapper) {
        return Objects.requireNonNull(mapper).apply(this);
    }

    default <R, U> R map(BiFunction<? super Point, ? super U, ? extends R> mapper, U other) {
        return Objects.requireNonNull(mapper).apply(this, other);
    }

}

I have purposely refrained from creating an implementation class of the Point interface. Instead, each time the static point() method is called, an instance from an anonymous class is created. This illustrates that the implementation class is "pure" and does not inherit or override anything. In a real solution, there could of course be an implementation of an immutable class PointImpl.

In the middle, there is a convenience method that returns a point at the origin (i.e. point(0, 0)).

The two methods at the end of the class are where the interesting stuff starts. They allow us to apply almost any function to a Point.

The first one takes a mapping function that, in turn, takes a point and maps it to something else (i.e. a Function).

The last one takes a mapping function that takes two points and maps them to something else (i.e. a BiFunction). The mapping function is applied with the current Point (i.e. "this") as its first argument and with another point as its second argument. That other point is given as the second argument to the map() function. Complicated? Not really. Read on and it will become apparent what is going on.

The Functions

Now we can define a number of useful functions that we could apply to the Point class.

interface PointFunctions {

    static UnaryOperator<Point> NEGATE = (f) -> point(-f.x(), -f.y());
    static BinaryOperator<Point> ADD = (f, s) -> point(f.x() + s.x(), f.y() + s.y());
    static BinaryOperator<Point> SUBTRACT = (f, s) -> point(f.x() - s.x(), f.y() - s.y());
    static UnaryOperator<Point> SWAP_OVER_X_AXIS = (f) -> point(f.x(), -f.y());
    static UnaryOperator<Point> SWAP_OVER_Y_AXIS = (f) -> point(-f.x(), f.y());
    static BinaryOperator<Point> BETWEEN = (f, s) -> point((f.x() + s.x()) / 2, (f.y() + s.y()) / 2);
    static Function<Point, String> TO_STRING = (p) -> String.format("(%d, %d)", p.x(), p.y());
    static BiFunction<Point, Point, Boolean> EQUALS = (f, s) -> (f.x() == s.x()) && (f.y() == s.y());
    //
    static BiFunction<Point, Point, Double> DISTANCE = (f, s) -> Math.sqrt(Math.pow(f.x() - s.x(), 2) + Math.pow(f.y() - s.y(), 2));
    //
    static Function<Point, UnaryOperator<Point>> ADD2 = s -> (f) -> point(f.x() + s.x(), f.y() + s.y());
}


These methods (or any other similar methods or lambdas) can now be applied to points without polluting the classes.

Example of Usage

This code snippet will create a first point at origo, apply a number of translations to it and then convert it to a string using the TO_STRING mapper. Note how easy it would be to use another string mapper because now the "to string" functionality is separate from the class itself. It would also be easy to use a custom lambda instead of one of the pre-defined functions.

System.out.println(
     origo()
         .map(ADD, point(1, 1))       // 1
         .map(SWAP_OVER_X_AXIS)       // 2
         .map(NEGATE)                 // 3
         .map(BETWEEN, point(-1, -1)) // 4
         .map(TO_STRING)
    );

This will produce the following output:

(-1, 0)

As can be seen in the picture below, this seems to be correct:




Expanding the Concept 

If we later introduce a Circle interface like this

interface Circle extends HasX, HasY, HasR {
  // Stuff similar to Point...
}

and we change our methods so that they can work with the traits directly (and after some additional refactoring not shown here), we get a decoupled environment where the functions can be made to operate on any relevant shape. For example, the following method returns a "toString" mapper that can work on anything having an x and a y value (e.g. both Point and Circle):

static <T extends HasX & HasY> Function<T, String> toStringXY() {
    return (T t) -> String.format("(%d, %d)", t.x(), t.y());
}

After further modification, we could take a Circle and ADD a Point to it and the circle will be translated using the point's coordinates.

Alternate Solutions

It is possible to wrap any class in an Optional and then use the Optional's map() method to map values. In that case we do not need the map() functions in the shapes. On the other hand, we would have to create an Optional and then also get() the end value once all mappings have been applied.
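As a quick illustration of that alternative (reusing the Point and functions defined earlier in this post):

String result = Optional.of(point(1, 1))
    .map(NEGATE)     // a UnaryOperator<Point> is also a Function<Point, Point>
    .map(TO_STRING)  // map the Point to its String representation
    .get();          // unwrap the end value once all mappings have been applied

System.out.println(result); // prints (-1, -1)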

Another way would be to just have a bunch of static methods that take a Point and other shapes as parameters, like this:

static <T extends HasX & HasY> Point add(Point p, T other) {
    return point(p.x() + other.x(), p.y() + other.y());
}

But that... is another story...

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season. I am contributing to open-source Speedment, a stream based ORM tool and runtime. Please check it out on GitHub.

Wednesday, December 21, 2016

Day 21, Java Holiday Calendar 2016, Concatenate Java Streams





Today's tip is about concatenating streams. The task of the day is to construct a concatenated stream that lazily consumes a number of underlying streams. So, dumping the content of the various streams into a List and then streaming from the list, or using Stream.builder(), will not do.

As an example, we have three streams with words that are relevant to US history and the constitution:

        // From 1787
        final Stream<String> preamble = Stream.of(
            "We", "the", "people", "of", "the", "United", "States"
        );

        // From 1789
        final Stream<String> firstAmendment = Stream.of(
            "Congress", "shall", "make", "no", "law", 
            "respecting", "an", "establishment", "of", "religion"
        );

        // From more recent days
        final Stream<String> epilogue = Stream.of(
            "In", "reason", "we", "trust"
        );


Creating a concatenated stream can be done in many ways including these:

        // Works for a small number of streams
        Stream.concat(
            preamble,
            Stream.concat(firstAmendment, epilogue)
        )
            .forEach(System.out::println);

        
        // Works for any number of streams
        Stream.of(preamble, firstAmendment, epilogue)
            .flatMap(Function.identity())
            .forEach(System.out::println);


Both methods will produce the same result and they will also close the underlying streams upon completion. Personally, I prefer the latter method since it is more general and can concatenate any number of streams. This is the output of the program:

We
the
people
of
the
United
States
Congress
shall
make
no
law
respecting
an
establishment
of
religion
In
reason
we
trust

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season. I am contributing to open-source Speedment, a stream based ORM tool and runtime. Please check it out on GitHub.

Sunday, December 18, 2016

Day 18, Java Holiday Calendar 2016, Easily Create Database Content


Day 18, Easily Create Database Content




Today's tip is about creating database content. There are a number of ways to do this, ranging from writing our own entity beans combined with using JDBC directly, to fully automating the entire process.

Suppose we already have a database table like this:

mysql> explain country
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| id         | int(11)     | NO   | PRI | NULL    | auto_increment |
| name       | varchar(45) | YES  | UNI | NULL    |                |
| local_name | varchar(45) | YES  |     | NULL    |                |
| code       | int(11)     | YES  |     | NULL    |                |
| domain     | varchar(10) | YES  |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

Then we could add the Speedment plugin and dependency to our POM and launch the Speedment graphical tool, which will analyze the database and generate code automatically for us.

After generation we can do this:

Initialization:

final MyApplication app = new MyApplicationBuilder()
    .withPassword("myPwd729") // Replace with the real pwd
    .build();

final CountryManager countries = app.getOrThrow(CountryManager.class);

Insert DB Content:

countries.persist(
    new CountryImpl()
        .setName("Sweden")
        .setLocalName("Sverige")
        .setCode(40)       // Intentionally wrong, should be 46!!
        .setDomain(".se")
);

Update DB Content:

countries.stream()
    .filter(Country.NAME.equal("Sweden"))  // Filter out Sweden
    .map(c -> c.setCode(46))               // Update code to 46
    .forEach(countries.updater());         // Apply the database updater

Read more on Speedment on GitHub here.

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season. I am contributing to open-source Speedment, a stream based ORM tool and runtime. Please check it out on GitHub.

Saturday, December 17, 2016

Day 17, Java Holiday Calendar 2016, Parallel Streams in Custom Thread Pools


17. Parallel Streams in Custom Thread Pools



Today's tip is about executing parallel streams in custom thread pools. One of the main drivers for developing the Java 8 streams was their ability to abstract away parallelism. In theory, we could take any stream and make it parallel with the .parallel() method. In reality... not so easy...

One of the caveats with parallel streams is that they, by default, execute on the common Fork Join Pool. So, if there are a lot of things going on in our application, those parallel tasks will compete with all other tasks in the same pool.

This is something that can be fixed easily. A nice thing with parallel streams is that we can submit the parallel tasks to any Thread Pool, not just the common Thread Pool. In the example below, I am showing how to stream database content in parallel using Speedment, a stream based ORM tool and runtime. This way, complex database operations may execute much faster since the tasks can be spread out over several threads.

However, the same principle would work with any type of stream source such as the ones we get from Java's built-in Collections like List or Set.

Do this:

    final ForkJoinPool forkJoinPool = new ForkJoinPool(3);
    forkJoinPool.submit(() -> 
        
        users.stream() 
            .parallel()
            .filter(User.EMAIL.isNotEmpty())
            .forEach(slowOperation()); 
            
    );

    try {
        forkJoinPool.shutdown();
        forkJoinPool.awaitTermination(1, TimeUnit.HOURS);
    } catch (InterruptedException ie) {
        ie.printStackTrace();
    } 

This will create a new Fork Join pool with three threads in which the parallel stream will be executed. This way, we limit the number of parallel tasks to three and they will operate separately from the common Thread Pool. Well, not entirely separately, because tasks in all pools compete for the same limited CPU resources. But that is another story...


Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season. I am contributing to open-source Speedment, a stream based ORM tool and runtime. Please check it out on GitHub.

Wednesday, December 14, 2016

Day 14, Java Holiday Calendar 2016, Submitting a Task


14. Submitting a Task



Today's tip is about submitting tasks. Historically, we Java developers commonly used a new Thread directly when we wanted something to be done in parallel with the current thread. These days, there are a number of better and more convenient ways of getting the job done.

For the sake of simplicity, we assume that we have a task we want to run that does not return anything. This can be modeled using the Runnable interface, and we can use lambdas to express our tasks.

Here are a number of alternate ways to execute a task:

public class Main {

    public static void main(String[] args) {
        // Runs in the main thread 
        helloWorld();

        // Runs in a new thread
        new Thread(Main::helloWorld).start();

        // Runs in the default fork join pool
        ForkJoinPool.commonPool().submit(Main::helloWorld);

        // Runs in the default fork join pool via a CompletableFuture
        CompletableFuture.runAsync(Main::helloWorld);

        // Runs in a custom fork join pool (with three workers)
        new ForkJoinPool(3).submit(Main::helloWorld);

        // Queues up task in a single thread executor
        Executors.newSingleThreadExecutor().execute(Main::helloWorld);

        // Caches tasks so that short lived re-occurring tasks can execute faster
        Executors.newCachedThreadPool().execute(Main::helloWorld);

        // Runs in a separate thread pool with the given delay
        Executors.newScheduledThreadPool(2).schedule(Main::helloWorld, 10, TimeUnit.MILLISECONDS);
    }

    public static void helloWorld() {
        System.out.println("Hello World greeting from " + Thread.currentThread().getName());
    }

}

This will produce the following print out:

Hello World greeting from main
Hello World greeting from Thread-0
Hello World greeting from ForkJoinPool.commonPool-worker-1
Hello World greeting from ForkJoinPool.commonPool-worker-1
Hello World greeting from ForkJoinPool-1-worker-1
Hello World greeting from pool-1-thread-1
Hello World greeting from pool-2-thread-1
Hello World greeting from pool-3-thread-1

Remember that in a real-world scenario, we need to shut down the custom thread pools we are creating or else our program will not terminate properly (because there are still threads alive). Shutting down a thread pool can be done like this:

try {
     forkJoinPool.shutdown();
     forkJoinPool.awaitTermination(1, TimeUnit.HOURS);
 } catch (InterruptedException ie) {
     ie.printStackTrace();
 } 

Read more on thread executing principles on DZone here.

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season. I am contributing to open-source Speedment, a stream based ORM tool and runtime. Please check it out on GitHub.

Tuesday, December 13, 2016

Day 13, Java Holiday Calendar 2016, Try Higher Order of Functionality

13. Higher Order of Functionality



Today's tip is to explore the world of Higher Order Functionality and how to work with functions that operate on functions.

In the good old pre-Java 8 days, algorithms mostly operated on data structures. But with the introduction of functions in Java 8, our programs may now also reason about behavior. Writing a program that operates on other programs opens up a whole new level of abstraction, allowing for elegant declarative programs that express what is to be done rather than how, leaving the details of execution to a framework that translates the "what" into "how" as a higher order operation.

We could, for example, write a QuickSort algorithm that may sort anything (that extends Object), stored in any way, by just providing functional parameters in the form of a getter, a comparator, a swapper and the number of elements to sort, as sketched below. This enables the QuickSort algorithm to sort lists, arrays or even serialized off-heap objects using the same basic algorithm. The QuickSort algorithm just applies the provided functions agnostically. Thus, we only need to write QuickSort once and then we can re-use it by just providing the appropriate functions.
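Here is a minimal sketch of what such a function-driven QuickSort could look like (an illustrative implementation written for this text, not taken from any particular library; the Swapper interface is a hypothetical helper):

import java.util.Comparator;
import java.util.function.IntFunction;

public final class FunctionalQuickSort {

    /** Swaps the elements at the two given indexes in the underlying data structure. */
    @FunctionalInterface
    public interface Swapper {
        void swap(int first, int second);
    }

    /** Sorts the given number of elements seen only through the getter, comparator and swapper. */
    public static <T> void sort(IntFunction<T> getter,
                                Comparator<? super T> comparator,
                                Swapper swapper,
                                int length) {
        quickSort(getter, comparator, swapper, 0, length - 1);
    }

    private static <T> void quickSort(IntFunction<T> getter,
                                      Comparator<? super T> comparator,
                                      Swapper swapper,
                                      int low, int high) {
        if (low >= high) {
            return;
        }
        final T pivot = getter.apply(high); // Lomuto partitioning, last element as pivot
        int i = low - 1;
        for (int j = low; j < high; j++) {
            if (comparator.compare(getter.apply(j), pivot) <= 0) {
                swapper.swap(++i, j);
            }
        }
        swapper.swap(++i, high);            // put the pivot in its final position
        quickSort(getter, comparator, swapper, low, i - 1);
        quickSort(getter, comparator, swapper, i + 1, high);
    }
}

Sorting an ordinary List then only requires handing over the appropriate functions:

List<String> animals = new ArrayList<>(Arrays.asList("Monkey", "Lion", "Giraffe", "Lemur"));
FunctionalQuickSort.sort(animals::get,
                         Comparator.naturalOrder(),
                         (a, b) -> Collections.swap(animals, a, b),
                         animals.size());
System.out.println(animals); // prints [Giraffe, Lemur, Lion, Monkey]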

Speedment is an Open Source ORM with an API founded on Java 8 streams. With Speedment you can apply functions that take functions as parameters. For example, you can:

users.stream()
  .filter(User.BORN.between(1985, 1995)) // Filters out Users born 1985 up to and including 1994
  .map(User.CATEGORY.setTo(3))           // Applies a function that sets their category to 3
  .forEach(users.updater());             // Applies the updater function to the selected users

The code snippet above will:

a) extract users from an underlying database where the users are born between 1985 and 1995 (and only those users)
b) for each such user, it will apply a mapping from a user to an updated user where the category has been set to 3 (but all other fields remain the same)
c) for each updated user, a database updater method will be applied that will result in the updated user being persisted in the database.

So, the snippet above is a sequence of methods that are given other methods as parameters, as per the paradigm of Higher Order of Functionality. If we later elect to store our data not in a database but in a file, in memory or even in an Excel spreadsheet, then we only need to provide another set of functions. The stream logic will remain exactly the same.


Read more on Higher Order of Functionality here.

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season.

Saturday, December 10, 2016

Day 10, Java Holiday Calendar 2016, MapStream




Today's tip is about the open-source class MapStream, which allows us to stream not only over elements but over pairs of keys and values, and make changes to the keys, the values or both.

You can find the source code for MapStream here together with some examples of how to use it. It is free so go ahead and use or copy it in your application! MapStream is a part of open-source Speedment, a stream ORM tool and runtime.

With MapStream you can do this:

Map<String, Integer> numberOfCats = new HashMap<>();

numberOfCats.put("Anne", 3);
numberOfCats.put("Berty", 1);
numberOfCats.put("Cecilia", 1);
numberOfCats.put("Denny", 0);
numberOfCats.put("Erica", 0);
numberOfCats.put("Fiona", 2);

System.out.println(
  MapStream.of(numberOfCats)
      .filterValue(v -> v > 0)
      .sortedByValue(Integer::compareTo)
      .mapKey(k -> k + " has ")
      .mapValue(v -> v + (v == 1 ? " cat." : " cats."))
      .map((k, v) -> k + v)
      .collect(Collectors.joining("\n"))
);

This would produce the following:

Cecilia has 1 cat.
Berty has 1 cat.
Fiona has 2 cats.
Anne has 3 cats.

Learn more on MapStream here.

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season.




Tuesday, December 6, 2016

Day 6, Java Holiday Calendar 2016, Lazy

6. Be Lazy With Java 8




Today's tip is about lazy initialization. Sometimes, we want our classes to do only what is absolutely necessary and nothing more. Immutable classes are particularly good candidates for laziness. Speedment, a Stream ORM Java Toolkit and Runtime, uses Lazy internally and you can find the complete Lazy source code here. It's free, so steal it!

By copying this small Lazy class:

public final class Lazy<T> {

    private volatile T value;

    public T getOrCompute(Supplier<T> supplier) {
        final T result = value;  // Read volatile just once...
        return result == null ? maybeCompute(supplier) : result;
    }

    private synchronized T maybeCompute(Supplier<T> supplier) {
        if (value == null) {
            value = requireNonNull(supplier.get());
        }
        return value;
    }

}

You Can Do This:

public class Point {

    private final int x, y;
    private final Lazy<String> lazyToString;

    public Point(int x, int y) {
        this.x = x; 
        this.y = y;
        lazyToString = new Lazy<>();
    }

    @Override
    public String toString() {
        return lazyToString.getOrCompute( () -> "(" + x + ", " + y + ")");
    }

    // The calculation of the toString value is only done once
    // regardless if toString() is called one or several times.
    //
    // If toString() is never called, then the toString value is never
    // calculated.

}

Read more in the original article at http://minborgsjavapot.blogspot.com/2016/01/be-lazy-with-java-8.html

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season.

Sunday, December 4, 2016

Creating Maps With Named Lambdas


The Magical Map

Wouldn't it be great if we could create Java maps like this?

Map<String, Integer> map = mapOf(
      one -> 1,
      two -> 2
);

Map<String, String> map2 = mapOf(
      one -> "eins",
      two -> "zwei"
);



Well, we can! Read this post and learn more about lambdas and how we can get the name of their parameters.

The solution

By introducing the following interface we get the functionality above.

public interface KeyValueStringFunction<T> extends Function<String, T>, Serializable {

    default String key() {
        return functionalMethod().getParameters()[0].getName();
    }

    default T value() {
        return apply(key());
    }

    default Method functionalMethod() {
        final SerializedLambda serializedLambda = serializedLambda();
        final Class<?> implementationClass = implementationClass(serializedLambda);
        return Stream.of(implementationClass.getDeclaredMethods())
            .filter(m -> Objects.equals(m.getName(), serializedLambda.getImplMethodName()))
            .findFirst()
            .orElseThrow(RuntimeException::new);
    }

    default Class<?> implementationClass(SerializedLambda serializedLambda) {
        try {
            final String className = serializedLambda.getImplClass().replaceAll("/", ".");
            return Class.forName(className);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    default SerializedLambda serializedLambda() {
        try {
            final Method replaceMethod = getClass().getDeclaredMethod("writeReplace");
            replaceMethod.setAccessible(true);
            return (SerializedLambda) replaceMethod.invoke(this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @SafeVarargs
    static <V> Map<String, V> mapOf(KeyValueStringFunction<V>... mappings) {
        return Stream.of(mappings).collect(toMap(KeyValueStringFunction::key, KeyValueStringFunction::value));
    }

}

Limitations

We can only create maps with keys that are of type String (or any supertype of String, such as CharSequence, Serializable or Comparable<String>) because lambda parameter names are, of course, Strings.

We must use a Java version that is higher than Java 8u80 because it was only from that version that lambda parameter names could be retrieved at run time.

The most annoying limitation is that we have to compile (i.e. "javac") our code with the "-parameters" flag or else the parameter names will not be included in the compiled classes (and thus not in the runtime artifact, e.g. a JAR or WAR). We can do this automatically by modifying our POM file like this:

          <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <compilerArgs>
                        <arg>-Xlint:all</arg>

                        <!-- Add this line to your POM -->
                        <arg>-parameters</arg>
                    </compilerArgs>
                    <showWarnings>true</showWarnings>
                    <showDeprecation>true</showDeprecation>
                </configuration>
            </plugin>

If you forget to add the "-parameters" flag, the Java runtime will always report the default name "arg0" as the name of the parameter. This means that the maps will (at most) contain a single key, "arg0", which is not what we want.
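A quick way to verify that the parameter names actually made it into the class files is to ask the reflection API via Parameter::isNamePresent (an illustrative check, not from the original post; MyClass and myMethod are placeholder names):

Method method = MyClass.class.getDeclaredMethod("myMethod", String.class); // declares NoSuchMethodException
System.out.println(method.getParameters()[0].isNamePresent());             // true only if compiled with -parameters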

Opportunities

The values can be of any type. In particular, they can be other maps, enabling us to construct more advanced map hierarchies. We could, for example, create a map of countries whose values are themselves maps containing the largest cities and the number of inhabitants of each city, like this:


Map<String, Map<String, Integer>> map3 = mapOf(
            usa -> mapOf(
                new_york    -> 8_550_405,
                los_angeles -> 3_971_883,
                chicago     -> 2_720_546
            ),
            canada -> mapOf(
                toronto  -> 2_615_060,
                montreal -> 1_649_519,
                calgary  -> 1_096_833
            )
        );

Update: Important Notes

A number of people have pointed out that SerializedLambda is "thin ice" that we should not rely on in production code. Tagir Valeev (@tagir_valeev) made a performance test of the scheme in this post and compared it to the old-fashioned way of just creating a Map and using a regular put() to enter data. The findings were that the old way is orders of magnitude faster. You can find the entire benchmark here. Thanks Tagir for making this test available. You should view this way of entering data into a Map as academic and as an inspiration for what can be done in Java. Do not use it in production code.

Keep on mapping!

Day 4, Java Holiday Calendar 2016, RemoveIf

4. Use RemoveIf in Java Collections



Today's tip is to use the removeIf() method (which all collection classes like List have) rather than manually iterating over the elements and removing them. For large data sets, removeIf() can be orders of magnitude faster than other approaches. It also looks much better in your code. Why? Read more here and see for yourself!

Do This:

items.removeIf(i -> predicate(i));

Don't Do This:

for (Iterator it = items.iterator(); it.hasNext();) {  
    if (predicate(it.next())) {
        it.remove();    
    }
}

Read more in the original article at http://javadeau.lawesson.se/2016/09/java-8-removeif.html

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season.

Saturday, December 3, 2016

Day 3, Java Holiday Calendar 2016, Initialize Maps

3. Initializing Maps in the Smartest Way


Today's tip is how to initialize Java Maps in a type-safe way with Java 8. With Java 9, we will get even better ways of creating immutable Maps. Until then, by defining two utility methods:

    public static <K, V> Entry<K, V> entry(K key, V value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    public static <K, U> Collector<Entry<K, U>, ?, Map<K, U>> entriesToMap() {
        return Collectors.toMap(Entry::getKey, Entry::getValue);
    }

You Can Do This:

protected static Map<Integer, String> createMap() {
        return Stream.of(
                entry(0, "zero"),
                entry(1, "one"),
                entry(2, "two"),
                entry(3, "three"),
                entry(4, "four"),
                entry(5, "five"),
                entry(6, "six"),
                entry(7, "seven"),
                entry(8, "eight"),
                entry(9, "nine"),
                entry(10, "ten"),
                entry(11, "eleven"),
                entry(12, "twelve")).
                collect(entriesToMap());
    }

Read more in the original article at http://minborgsjavapot.blogspot.com/2014/12/java-8-initializing-maps-in-smartest-way.html

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season.

Thursday, December 1, 2016

Day 2, Java Holiday Calendar 2016, Composition

2. Favor Composition Over Inheritance



Today's tip is to avoid inheritance. For good reasons, there can be only one superclass for any given Java class. Furthermore, exposing abstract or base classes in your API that are supposed to be inherited by client code is a very big and problematic API commitment. Avoid API inheritance altogether, and instead consider providing static interface methods that take one or several lambda parameters and apply those given lambdas to a default internal API implementation class.

This also creates a much clearer separation of concerns. For example, instead of inheriting from a public API class AbstractReader and overriding abstract void handleError(IOException ioe), it is better to expose a static method or a builder in the Reader interface that takes a Consumer<IOException> and applies it to an internal generic ReaderImpl.

Do This:

Reader reader = Reader.builder()
    .withErrorHandler(IOException::printStackTrace)
    .build();

Don't Do This:

Reader reader = new AbstractReader() {

    @Override
    public void handleError(IOException ioe) {
        ioe.printStackTrace();
    }
};

Read more in the original article at https://dzone.com/articles/the-java-8-api-design-principles

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season.

Day 1, Java Holiday Calendar 2016, Functional Interfaces

1. Use the @FunctionalInterface Annotation



Today's tip is to annotate a functional interface with the @FunctionalInterface annotation, thereby signaling that API users may use lambdas to implement the interface. It also ensures that the interface remains usable for lambdas over time by preventing abstract methods from accidentally being added to the API later on.

Do This:

@FunctionalInterface
public interface CircleSegmentConstructor {

    CircleSegment apply(Point cntr, Point p, double ang);

    // abstract methods cannot be added

}


Don't Do This:

public interface CircleSegmentConstructor {

    CircleSegment apply(Point cntr, Point p, double ang);

    // abstract methods may be accidentally added later

}


Read more in the original article at https://dzone.com/articles/the-java-8-api-design-principles

Follow the Java Holiday Calendar 2016 with small tips and tricks all the way through the winter holiday season.

Thursday, November 10, 2016

Work with Parallel Database Streams using Custom Thread Pools

Parallel Database Streams

In my previous post, I wrote about processing database content in parallel using parallel streams and Speedment. Parallel streams can, under many circumstances, be significantly faster than the usual sequential database streams.

The Thread Pool
By default, parallel streams are executed on the common ForkJoinPool where they potentially might compete with other tasks. In this post we will learn how we can execute parallel database streams on our own custom ForkJoinPool, allowing much better control of our execution environment.

Speedment is an open-source Stream ORM Java Toolkit and Runtime that wraps an existing database and its tables into Java 8 streams. We can use an existing database and run the Speedment tool, and it will generate POJO classes that correspond to the tables we have selected using the tool. One distinct feature of Speedment is that it supports parallel database streams and that it can use different parallel strategies to further optimize performance.

Getting Started With Speedment

Head out to open-source Speedment on GitHub and learn how to get started with a Speedment project. Connecting the tool to an existing database is really easy. Read my previous post for more information on what the database table and the PrimeUtil class look like for the examples below.

Executing on the Default ForkJoinPool

Here is the application that I talked about in my previous post. It scans a database table in parallel for undetermined prime number candidates, determines whether they are primes or not and updates the table accordingly. This is how it looks:

Manager<PrimeCandidate> candidatesHigh = app.configure(PrimeCandidateManager.class)
            .withParallelStrategy(ParallelStrategy.computeIntensityHigh())
            .build();

        candidatesHigh.stream() 
            .parallel()                                                // Use a parallel stream
            .filter(PrimeCandidate.PRIME.isNull())                     // Only consider nondetermined prime candidates
            .map(pc -> pc.setPrime(PrimeUtil.isPrime(pc.getValue())))  // Sets if it is a prime or not
            .forEach(candidatesHigh.updater());                        // Apply the Manager's updater

First, we create a stream over all candidates (using a parallel strategy named ParallelStrategy.computeIntensityHigh()) where the 'prime' column is null using the stream().filter(PrimeCandidate.PRIME.isNull()) method. Then, for each such prime candidate pc, we either set the 'prime' column to true if pc.getValue() is a prime or false if pc.getValue() is not a prime. Interestingly, the pc.setPrime() method returns the entity pc itself, allowing us to easily tag on multiple stream operations. On the last line, we update the database with the result of our check by applying the candidatesHigh.updater() function.

Again, make sure to check out my previous post on the details and benefits of parallel strategies. In short, Java's default parallel strategy works well for low computational demands because it places a large number of initial work items on each thread. Speedment's parallel strategies work much better for medium to high computational demands, whereby a smaller number of work items is laid out on the participating threads.

The stream will determine prime numbers fully in parallel and the execution threads will use the common ForkJoinPool, as can be seen in this picture (my laptop has 4 CPU cores and 8 CPU threads):

Use a Custom Executor Service


As we learned in the beginning of this post, parallel streams are executed by the common ForkJoinPool by default. But, sometimes we want to use our own Executor, perhaps because we are afraid of flooding the common ForkJoinPool, so that other tasks cannot run properly. Defining our own executor can easily be done for Speedment (and other stream libraries) like this:

    final ForkJoinPool forkJoinPool = new ForkJoinPool(3);
    forkJoinPool.submit(() -> 
        
        candidatesHigh.stream() 
            .parallel()
            .filter(PrimeCandidate.PRIME.isNull())
            .map(pc -> pc.setPrime(PrimeUtil.isPrime(pc.getValue())))
            .forEach(candidatesHigh.updater()); 
            
    );

    try {
        forkJoinPool.shutdown();
        forkJoinPool.awaitTermination(1, TimeUnit.HOURS);
    } catch (InterruptedException ie) {
        ie.printStackTrace();
    } 

The application code is unmodified, but wrapped in a custom ForkJoinPool that we can control ourselves. In the example above, we set up a thread pool with just three worker threads. The worker threads are not shared with the threads in the common ForkJoinPool.

Here is what the threads look like when using the custom executor service:


This way we can control both the actual ThreadPool itself and precisely how work items are laid out in that pool using a parallel strategy!

Keep up the heat in your pools!

Monday, October 24, 2016

Work with Parallel Database Streams using Java 8

What is a Parallel Database Stream?

Read this post and learn how you can process data from a database in parallel using parallel streams and Speedment. Parallel streams can, under many circumstances, be significantly faster than the usual sequential streams.

With the introduction of Java 8, we got the long-awaited Stream library. One of the advantages of streams is that it is very easy to make them parallel. Basically, we can take any stream and just apply the method parallel(), and we get a parallel stream instead of a sequential one. By default, parallel streams are executed by the common ForkJoinPool.
Spire and Duke Working in Parallel

Parallel streams are good if the work items to be performed in the parallel stream pipelines are largely uncoupled and when the effort of dividing up the work among several threads is relatively low. Equally, the effort of combining the parallel results must also be relatively low.

So, if we have work items that are relatively compute intensive, then parallel streams would often make sense.

Speedment is an open-source Stream ORM Java Toolkit and Runtime that wraps an existing database and its tables into Java 8 streams. We can use an existing database and run the Speedment tool, and it will generate POJO classes that correspond to the tables we have selected using the tool.

One cool feature of Speedment is that the database streams support parallelism using the standard Stream semantics. This way, we can easily work with database content in parallel and produce results much faster than if we processed the streams sequentially!

Getting Started With Speedment

Visit open-source Speedment on GitHub and learn how to get started with a Speedment project. It should be very easy to connect the tool to an existing database.

In this post, the following MySQL table is used for the examples below.

CREATE TABLE `prime_candidate` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `value` bigint(20) NOT NULL,
  `prime` bit(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;


The idea is that people may insert values into this table and then we will write an application that computes whether the inserted values are prime numbers or not. In a real-world scenario, we could use any table in a MySQL, PostgreSQL or MariaDB database.

Writing a Sequential Stream Solution

First, we need a method that returns whether a value is a prime number. Here is a simple way of doing it. Note that the algorithm is purposely made slow so we can clearly see the effects of parallel streams over an expensive operation.

public class PrimeUtil {

    /**
     * Returns if the given parameter is a prime number.
     *
     * @param n the given prime number candidate
     * @return if the given parameter is a prime number
     */
    static boolean isPrime(long n) {
        // primes are equal or greater than 2 
        if (n < 2) {
            return false;
        }
        // check if n is even
        if (n % 2 == 0) {
            // 2 is the only even prime
            // all other even n:s are not
            return n == 2;
        }
        // if odd, then just check the odds
        // up to the square root of n
        // for (int i = 3; i * i <= n; i += 2) {
        //
        // Make the methods purposely slow by
        // checking all the way up to n
        for (int i = 3; i <= n; i += 2) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

}

Again, the objective of this post is not to devise an efficient prime number determination method.

Given this simple prime number method, we can now easily write a Speedment application that will scan the database table for undetermined prime number candidates, determine whether they are primes or not and update the table accordingly. This is how it might look:

final JavapotApplication app = new JavapotApplicationBuilder()
            .withPassword("javapot") // Replace with the real password
            .withLogging(LogType.STREAM)
            .build();
        
        final Manager<PrimeCandidate> candidates = app.getOrThrow(PrimeCandidateManager.class);
        
        candidates.stream()
            .filter(PrimeCandidate.PRIME.isNull())                      // Filter out undetermined primes
            .map(pc -> pc.setPrime(PrimeUtil.isPrime(pc.getValue())))   // Sets if it is a prime or not
            .forEach(candidates.updater());                             // Applies the Manager's updater

The last part contains the interesting stuff. First, we create a stream over all candidates where the 'prime' column is null using the stream().filter(PrimeCandidate.PRIME.isNull()) method. It is important to understand that the Speedment stream implementation will recognize the filter predicate and will be able to use that to reduce the number of candidates that are actually pulled in from the database (e.g. a "SELECT * FROM candidate WHERE prime IS NULL" will be used). Then, for each such prime candidate pc, we either set the 'prime' column to true if pc.getValue() is a prime or false if pc.getValue() is not a prime. Interestingly, the pc.setPrime() method returns the entity pc itself, allowing us to easily tag on multiple stream operations. On the last line, we update the database with the result of our check by applying the candidates.updater() function. So, this application's main functionality is really a one-liner (broken up into five lines for improved readability).

Now, before we can test our application, we need to generate some test data input. Here is an example of how that can be done using Speedment:

final JavapotApplication app = new JavapotApplicationBuilder()
    .withPassword("javapot") // Replace with the real password
    .build();

final Manager<PrimeCandidate> candidates = app.getOrThrow(PrimeCandidateManager.class);

final Random random = new SecureRandom();

// Create a bunch of new prime candidates
random.longs(1_100, 0, Integer.MAX_VALUE)
    .mapToObj(value -> new PrimeCandidateImpl().setValue(value)) // Create a fresh candidate per random value
    .forEach(candidates.persister());                            // Apply the Manager's persister function

Again, we can accomplish our task with just a few lines of code.

Try the Default Parallel Stream

If we want to parallelize our stream, we just need to add a single operation to our previous solution:

candidates.stream()
    .parallel()                                                // Now indicates a parallel stream
    .filter(PrimeCandidate.PRIME.isNull())
    .map(pc -> pc.setPrime(PrimeUtil.isPrime(pc.getValue())))
    .forEach(candidates.updater());                            // Applies the Manager's updater

And we are parallel! However, by default, Speedment uses Java's default parallelization behavior (as defined in Spliterators::spliteratorUnknownSize), which is optimized for operations that are not compute-intensive. If we analyze this default behavior, we find that it assigns the first 1024 work items to a first thread, the following 2*1024 = 2048 work items to a second thread, then 3*1024 = 3072 items to a third thread, and so on. This is bad for our application, where the cost of each operation is very high. If we are computing 1100 prime candidates, we will only use two threads, because the first thread takes on the first 1024 items and the second thread takes on the remaining 76. Modern servers have many more threads than that. Read the next section to see how we can fix this issue.
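
We can actually observe this batching behavior with a small, self-contained sketch. The exact numbers reflect the JDK's internal batch size for Iterator-backed spliterators and are not part of any public contract:

import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;

public class DefaultSplitSizes {

    public static void main(String[] args) {
        // 1100 "work items", exposed only via an Iterator of unknown size
        Iterator<Integer> candidates = IntStream.range(0, 1_100).boxed().iterator();
        Spliterator<Integer> remaining = Spliterators.spliteratorUnknownSize(candidates, 0);

        Spliterator<Integer> firstBatch  = remaining.trySplit(); // grabs the first 1024 elements
        Spliterator<Integer> secondBatch = remaining.trySplit(); // would take 2048, but only 76 are left

        System.out.println("First batch:  " + firstBatch.estimateSize());      // 1024
        System.out.println("Second batch: " + secondBatch.estimateSize());     // 76
        System.out.println("More batches: " + (remaining.trySplit() != null)); // false
    }

}

Since each batch may end up on its own worker thread, 1100 expensive items effectively become just two uneven chunks of work.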

Built-in Parallelization Strategies

Speedment has a number of built-in parallelization strategies that we can select depending on each work item's expected computational demands. This is an improvement over plain Java 8, which only has the single default strategy. The built-in parallel strategies are:

@FunctionalInterface
public interface ParallelStrategy {

    /**
     * A Parallel Strategy that is Java's default <code>Iterator</code> to
     * <code>Spliterator</code> converter. It favors relatively large sets (in
     * the ten thousands or more) with low computational overhead.
     *
     * @return a ParallelStrategy
     */
    static ParallelStrategy computeIntensityDefault() {...}

    /**
     * A Parallel Strategy that favors relatively small to medium sets with
     * medium computational overhead.
     *
     * @return a ParallelStrategy
     */
    static ParallelStrategy computeIntensityMedium() {...}

    /**
     * A Parallel Strategy that favors relatively small to medium sets with high
     * computational overhead.
     *
     * @return a ParallelStrategy
     */
    static ParallelStrategy computeIntensityHigh() {...}

    /**
     * A Parallel Strategy that favors small sets with extremely high
     * computational overhead. The set will be split up in solitary elements
     * that are executed separately in their own thread.
     *
     * @return a ParallelStrategy
     */
    static ParallelStrategy computeIntensityExtreme() {...}

    <T> Spliterator<T> spliteratorUnknownSize(Iterator<? extends T> iterator, int characteristics);

    static ParallelStrategy of(final int... batchSizes) {
        return new ParallelStrategy() {
            @Override
            public <T> Spliterator<T> spliteratorUnknownSize(Iterator<? extends T> iterator, int characteristics) {
                return ConfigurableIteratorSpliterator.of(iterator, characteristics, batchSizes);
            }
        };
    }
}

Applying a Parallel Strategy

The only thing we have to do is configure a parallelization strategy for a manager, like this, and we are good to go:

Manager<PrimeCandidate> candidatesHigh = app.configure(PrimeCandidateManager.class)
    .withParallelStrategy(ParallelStrategy.computeIntensityHigh())
    .build();

candidatesHigh.stream() // Better parallel performance for our case!
    .parallel()
    .filter(PrimeCandidate.PRIME.isNull())
    .map(pc -> pc.setPrime(PrimeUtil.isPrime(pc.getValue())))
    .forEach(candidatesHigh.updater());

The ParallelStrategy.computeIntensityHigh() strategy breaks up the work items into much smaller chunks. This gives us considerably better performance, since all the available threads are now put to use. If we look under the hood, we can see that the strategy's batch sizes are defined like this:

    private final static int[] BATCH_SIZES = IntStream.range(0, 8)
            .map(ComputeIntensityUtil::toThePowerOfTwo)
            .flatMap(ComputeIntensityUtil::repeatOnHalfAvailableProcessors)
            .toArray();


This means that, on a computer with 8 threads, it will put one item on each of threads 1-4 and two items on each of threads 5-8. As tasks complete, the next four available threads each get four items, then eight, and so on, until we reach 256, which is the maximum number of items put on any single thread. Obviously, this strategy suits this particular problem much better than Java's standard strategy.
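
ComputeIntensityUtil is internal to Speedment, but the following sketch shows what the two helper methods plausibly do. The stand-ins below (toThePowerOfTwo, repeatOnHalfAvailableProcessors) are my own approximations, not the actual implementation:

import java.util.Arrays;
import java.util.stream.IntStream;

public class BatchSizeSketch {

    // Assumed behavior: 2^exponent
    static int toThePowerOfTwo(int exponent) {
        return 1 << exponent;
    }

    // Assumed behavior: repeat each batch size once per half of the available processors
    static IntStream repeatOnHalfAvailableProcessors(int batchSize) {
        int repetitions = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
        return IntStream.generate(() -> batchSize).limit(repetitions);
    }

    public static void main(String[] args) {
        int[] batchSizes = IntStream.range(0, 8)
                .map(BatchSizeSketch::toThePowerOfTwo)
                .flatMap(BatchSizeSketch::repeatOnHalfAvailableProcessors)
                .toArray();

        // On an 8-thread machine: [1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 4, ...]
        System.out.println(Arrays.toString(batchSizes));
    }

}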

Here is what the threads in the common ForkJoinPool look like on my 8-threaded laptop:


Create Your Own Parallel Strategy

One cool thing with Speedment is that we can very easily write our own parallelization strategy and just inject it into our streams. Consider this custom parallelization strategy:

    public static class MyParallelStrategy implements ParallelStrategy {

        private final static int[] BATCH_SIZES = {1, 2, 4, 8};

        @Override
        public <T> Spliterator<T> spliteratorUnknownSize(Iterator<? extends T> iterator, int characteristics) {
            return ConfigurableIteratorSpliterator.of(iterator, characteristics, BATCH_SIZES);
        }

    }

This can, in fact, be expressed even more concisely:

    ParallelStrategy myParallelStrategy = ParallelStrategy.of(1, 2, 4, 8);


This strategy will put one work item on the first available thread, two on the second, four on the third and eight on the fourth, eight being the last entry in our array. The last entry is then reused for all subsequent available threads, so the batch sizes really become 1, 2, 4, 8, 8, 8, 8, ... We can now use our new strategy as follows:

Manager<PrimeCandidate> candidatesCustom = app.configure(PrimeCandidateManager.class)
    .withParallelStrategy(myParallelStrategy)
    .build();

candidatesCustom.stream()
    .parallel()
    .filter(PrimeCandidate.PRIME.isNull())
    .map(pc -> pc.setPrime(PrimeUtil.isPrime(pc.getValue())))
    .forEach(candidatesCustom.updater());

Voilà! We have full control over how the work items are laid out over the available execution threads.
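
To make the 1, 2, 4, 8, 8, 8, ... progression above concrete, here is a tiny, purely illustrative sketch of how such a batch-size array is typically indexed; the real ConfigurableIteratorSpliterator may well do this differently:

public class BatchSizeLookup {

    // The n:th split gets batchSizes[n] elements; once the array is exhausted,
    // the last entry is reused for every further split.
    static int batchSizeFor(int splitIndex, int... batchSizes) {
        return batchSizes[Math.min(splitIndex, batchSizes.length - 1)];
    }

    public static void main(String[] args) {
        for (int split = 0; split < 7; split++) {
            System.out.print(batchSizeFor(split, 1, 2, 4, 8) + " "); // prints: 1 2 4 8 8 8 8
        }
    }

}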

Benchmarks

All benchmarks used the same input of prime candidates. Tests were run on a MacBook Pro, 2.2 GHz Intel Core i7 with 4 physical cores and 8 threads.

Strategy                           Time    Comment
Sequential                         265 s   One thread processed all 1100 items
Parallel, Java 8 default           235 s   1024 items went to thread 1 and the remaining 76 to thread 2
Parallel, computeIntensityHigh()    69 s   All 4 hardware cores were used

Conclusions

Speedment supports parallel processing of database content.

Speedment supports a variety of parallel strategies to allow full utilization of the execution environment.

We can easily create our own parallel strategies and use them in our Speedment streams.

It is possible to improve performance significantly by carefully selecting a parallel strategy rather than just settling for Java's default one.

Thursday, October 20, 2016

Java 8: A Closer Look at Speedment 3.0.1 “Forest” Stream ORM

Following the Road

I have been contributing to the open-source project Speedment (a Stream ORM Java Toolkit and Runtime), and a new major version called 3.0.1 “Forest” has just been released. Releases are named after the avenues in Palo Alto, California, where most of the contributors work; each new major release gets its name by following Middlefield Road southwards. The new version is modularized, which helps the developers keep up a good pace. There is also a large number of new features for Speedment users, and in this article we will look into some of the things to discover!

Persistence

People used to older ORMs can now work with Speedment in the same way when creating, updating or removing entities in a database. For example, we can create entities “JPA-style” like this:

Hare hare = new HareImpl();
hare.setName("Flopsy");
hare.setAge(1);
hare.setColor("Gray");

entityManager.persist(hare);  // Persists (=inserts) the new Hare in the database

While this is not a big change, it is still convenient.

Declarative Stream Composition

Speedment database queries are expressed as operations on standard Java 8 Streams. In the new version, the Speedment API provides methods that return functions rather than operating on objects directly. This simplifies something called Declarative Stream Composition, which simply means that it becomes easier and more efficient to write streams.

Let us take a closer look at an example where we want to join objects from two different tables. We have two tables, “hare” and “carrot”, where “carrot” has a field named “owner” that is a foreign key to the column “hare”.”id”. The task is to build a Map with all Hare entities as keys and, as values, a List of the Carrot entities that belong to each particular Hare via the foreign key. This can be expressed like this:

Map<Hare, List<Carrot>> joinMap = carrots.stream()
    .collect(
        groupingBy(hares.finderBy(Carrot.OWNER)) // Applies the finderBy(Carrot.OWNER) classifier
    );

The groupingBy() method takes a Function that maps from a Carrot to a Hare entity. So, by working with methods that return functions, our code becomes very compact. This also opens up future ways of optimizing the stream, since these functions can be identified and analyzed in the stream pipeline before the stream is started. It should be noted that both collect() and groupingBy() are standard Java 8 methods.
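
Once built, the Map can be consumed like any other Java Map. For instance, something like the following would print each hare together with its number of carrots (assuming the generated Hare entity exposes a getName() getter matching the setName() used earlier; adjust to the actual accessor names):

// getName() is assumed to exist on the generated Hare entity
joinMap.forEach((hare, carrotList) ->
    System.out.format("%s owns %d carrots%n", hare.getName(), carrotList.size()));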

Even Better Code Generation

Speedment generates code automatically from the database schema data. One good thing with Speedment is that we can see, understand and change the generated code. This makes things less “magic” compared to other ORMs and puts the developer in the driving seat. The new code generation functionalities include:

Support for Primitive Types

Now we can use primitive types like int, long or double for columns and improve both execution speed and memory usage. Nullable fields can be mapped to specialized Optional types like OptionalInt, OptionalLong and OptionalDouble, consistent with Java 8 code styling.
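
As a rough illustration of what this can look like (the actual generated code will differ, and the member names below are only assumptions), a nullable INT column such as an optional age could surface in the entity interface like this:

import java.util.OptionalInt;

// Hypothetical shape of a generated entity interface with primitive mapping
interface Hare {
    int getId();             // non-nullable INT  -> plain primitive int
    OptionalInt getAge();    // nullable INT      -> OptionalInt, no boxing when present
    Hare setAge(int age);    // fluent setter, as elsewhere in this article
}

A caller can then write hare.getAge().ifPresent(age -> ...) instead of checking for null.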

Modular Code Generation

We can plug in our own code generation logic and adapt the default code generator. This comes in handy for developers who understand their domain model in depth and want to leverage that knowledge. When new functionality is added by customizing the code generator, the new features are applied immediately to all generated code. Code the code and get leverage!

Compatibility Mode

Some older solutions are not prepared for Optional fields and so a new “compatibility” mode was added where, for example, a nullable integer will be returned as an Integer and not as an OptionalInt.

Configurable Name Space

We can now configure the code generator to put entities, managers and configuration objects individually in any namespace. This is good for modularized projects.

Improved Code Renderer

Speedment uses a Model View Controller (MVC) paradigm for code generation. This means that the code Model (which is an Abstract Syntax Tree) is separate from the actual code rendering (the View). The Views have been updated and improved so that they produce better-looking code.

Checksum Protection

Manually changed classes are protected by checksums so that they are retained even if we decide to change the namespace.

Increased Type Safety

Speedment can now map columns that take values from small sets of strings to Enums, further improving type safety. When the generated code uses an Enum, any mismatch between the database model and the values used in the business logic is found as early as possible by the compiler, instead of later in the development cycle.
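
Purely as an illustration (this is not actual generated code, and the names are assumptions), a “color” column that only ever holds a few known strings could be generated as an enum, so the setter takes the enum type instead of a String:

// Hypothetical generated enum for a 'color' column with a small value set
enum Color { GRAY, WHITE, BROWN }

// Assumed shape of the corresponding accessors once enum mapping is enabled
interface HareWithEnumColor {
    Color getColor();
    HareWithEnumColor setColor(Color color);
}

// hare.setColor(Color.GRAY);   // compiles
// hare.setColor("Gray");       // no longer compiles: the mismatch is caught by
//                              // the compiler instead of surfacing at runtime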

Improved Logging for Transparency

Speedment has a new logging system that shows the exact SQL being sent to the database. This is good for transparency and lets us see precisely what is happening under the hood. We can easily enable logging of all CRUD operations like this:

HaresApplication loggingApp = new HaresApplicationBuilder()
    .withPassword("secretDbPassword")
    .withLogging(STREAM)
    .withLogging(PERSIST)
    .withLogging(UPDATE)
    .withLogging(REMOVE)
    .build();

Manager<Hare> hares = loggingApp.getOrThrow(HareManager.class);

long oldHares = hares.stream()
    .filter(Hare.AGE.greaterThan(8))
    .count();

System.out.println("There are " + oldHares + " old hares");

This will produce the following log:
2016-10-19T20:50:21.957Z DEBUG [main] (#SELECT) - 
    SELECT COUNT(*) FROM `hares`.`hare` WHERE (`hares`.`hare`.`age` > ?), values:[8]

There are 30 old hares


Improved User Interface

The graphical tool has been improved in many ways. We now get warnings and tips that give us better guidance. Several code generator configuration options have been added, and we also see more relevant information when selecting different configuration objects.

New Maven Goals

There are two new Maven goals, “clear” and “reload”, that can be used to automate and simplify the build process. The “clear” goal removes all generated code (that has not been manually changed) and “reload” reloads the domain model directly from an existing database (its metadata).

Take it for a Spin

Check out open-source Speedment on GitHub, where there is also a Wiki and a quick-start guide. Feel free to give feedback and join the discussion via Gitter.

Drive safely!