Showing posts with label Java. Show all posts

Spring, Reactor & Event driven programming on the JVM

Following my previous outing playing with the Spring-Boot microservices stuff, I once again found myself looking through some of the Spring libraries and came across the Reactor integration stuff.  It looked interesting, and thought I would have a quick look at the asynchronous event-driven model.

As always with Spring Boot, it was super simple to get an app up and running - it's worth noting that the app doesn't really get that much into the async processing side (or really expose scenarios with potential benefits/pitfalls of the approach) - but it gets an app up and running pretty easily.


Event streaming

Obviously, to get started I needed some kind of event source (I could have just stubbed out some code to just randomly create events in the system, but I wanted something more real). Obviously, with the modern web & big data, we have a tonne of events that we could use. I went for the obvious choice of the Twitter streaming API - as it exposes a streaming API I could just connect to that and on receipt of any tweet then just push an event on to my EventBus for any interested parties to process.

The basics of connecting to the Twitter streaming API are pretty simple using Spring-Boot and Spring-Social - I just created a simple Spring-Boot webapp that just exposed a simple page to connect to Twitter (OAuth via Spring-Social) and then on connection just connected to the streaming API and started listening.




First I just connected to the sample "firehose" stream, which is supposed to be 1% of the total twitter stream (I saw reported that there are 500million tweets a day, so you'd be looking at about 5million random tweets sent a day) - But I decided to consume the tweet events to pull out data about the current ongoing Rugby World Cup (England 2015), so I switched to the search stream limiting by references to the world cup.

The searching stream provided a reasonable number of tweets, and I think whilst I was running during the England vs Australia game I processed ~500,000 rugby related tweets.


Configuring reactor

Getting reactor up and running is really easy with Spring:

The EventBus configuration is interesting, as it has options to use different patterns including the LMAX pattern - in this case I just went with a standard thread pool approach.

The tweet-eater

To be honest, the use of reactor was overkill for this experiment, as the Spring-Social API allows you to define listeners for the streaming APIs which have a handle tweet method, much like an event consuming interface. But as it was just an experiment I continued and just used that listener to push the events on to the event bus.

As you can see above, its pretty simple - the listener just creates the event object and pushes it onto the EventBus.  So, with that done, and the basic Spring-Social connection setup to listen to the stream we will have a nice flow of events being pushed (at quite a high rate!) onto the EventBus. Now we just need something to consume those events.


Event consumers

The first consumer I created was just a very basic logging consumer - all it did was count events and then log numbers

Pretty simple


Next up,  I created a basic consumer to inspect the tweets, identify rugby teams mentioned and persist the data to Redis - this was also pretty easy, as Redis integration works pretty simply and Twitter created a set of standard team hashtags for the competition - So I just mapped those to countries, and setup my consumer to check for those

I quickly stuck some lipstick on the interface to display number of tweets per country and we were off!



Like I said, it's only barely scratching the surface of async event based programming, but it shows how easy it is to get the EventBus up and running with Spring.

As always, the code for the full app is over on GitHub - if you just clone the project and add in your own Twitter API keys in the config then you should just be able to build the JAR and run it directly.

Spring MVC - Page caching and If-Modified-Since

Spring (and in general Java/groovy etc) support a range of caching layers for your web application - with options for Hibernate first/second level cache, Spring's @Cacheable and simple integration with lots of providers (EHCache, Redis, etc), but aside from server side caching, you can also implement page level caching using the standard If-Modified-Since cache headers.

If your web application is sitting behind an apache web server, then using this mechanism, you can set it up such that Apache will do a lot of the work, and a lot of requests never even trouble our web application.  This will of course only work if you have fairly static pages that don't update to frequently, and aren't user specific (e.g. so its actually possible to cache at a web-page level - user account pages obviously are harder to cache at this level as they are very user specific).  Even without Apache in the mix, most modern browsers are built to handle If-Modified-Since headers and 304 response codes, so using the approach can mean that browsers are less eager to even request the page from your server whilst a user is browsing the site if we have said its cache-able.


Thankfully, the machinery for this stuff is all baked into Spring MVC.


Updating the RequestMappingHandlerAdapter

This walkthrough is assuming that your app is using standard @Controller annotations and a standard RequestMappingHandlerAdapter to route the requests - although there are still relatively easy mechanisms to do this if you are using other Controller/HandlerAdapter patterns.

The RequestMappingHandlerAdapter class has a method that can be overridden called getLastModifiedInternal() - This method simply returns a long value (the epoch time) that the requested resource was last modified. All we need to do is extend the RequestMappingHandlerAdapter and implement this method to return a timestamp.  For example:



The above assumes we have initialised a timestamp at startup (easy to do using Java config) and assumes that if you have visited the site since last app startup, then there is no change (in reality, we will likely need something more complicated to calculate this)

And that's really all we need, as Spring will handle the rest for us - from this, if a brand new request comes in with no If-Modified-Since headers, then Spring will take this date/time stamp and return it with our response as the Last-Modified date (HTTP standard header - this will inform the browsers/web servers action next time).  If however, a user has already visited your site and the browser has received a valid Last-Modified timestamp, then on the next request it will include this value in the request If-Modified-Since header, when Spring recieves this request, it compares the timestamp against the value that is returned from our getLastModifiedInternal() method, and if there has been no change then Spring will automatically return the response to the client with a 304 response - so none of the Controller code will be executed.

As you can probably guess, this can provide huge efficiencies and improvement on latency, server overhead etc.



More complex Last Modified Dates

As mentioned, in all likelyhood you will need a more sophisticated mechanism than just checking the application start time - So we have plenty of options here: we could query the DB for changes, we could have other flags/properties set for when particular resources are set (bare in mind that if static resources like CSS are changed, these need to be considered)

If you need to consider a fairly unique LastModified date for every endpoint, then this solution isn't a good fit, as this is the more generic approach - you can alternatively implement a similar last modified method on your (every) controllers.


Another option that I have considered is a halfway compromise - where I need endpoints to have specific considerations, but I only have a set of 4-5 (or relatively manageable) specific queries/checks that need to happen, and every endpoint will just need to check some subset of these conditions - this solution involves custom annotations and marking up Controller methods with this annotation to signal to our HandlerAdapter what checks should be considered for the Last Modified timestamp.  This has the advantage of relatively little intrusion in all my controller endpoints, but granular enough to provide enough control for effective page caching.


A custom annotation

First we define a simple annotation that can be used to mark up Controller methods to indicate which conditions are important to the endpoint:



The above code shows the annotation code, a sample enum (just to provide some type safety whilst using the annotation - this could be a string or anything else) and an example of the annotation on a controller endpoint.


Updating our Last Modified method

Now, in our last modified method, we have access to the Handler (e.g. the controller method being invoked) so we can quite simply check the controller method for our annotation, and if found we can just check the values said and perform the appropriate checks to work out what the last modified date needs to consider:




As mentioned, this won't work if there are lots of pages with lots of different requirements, but if you have a manageable of changing entities in your application this can strike a nice balance between clean code and flexibility.



Thoughts on Spring Boot

 Having long been a fan of the Spring framework (Spring MVC, Spring Social etc), I have recently started using Spring Boot for two different projects I am currently working on.
In both cases, they are web applications - both with some key differences in technology (in terms of what I am using for client side libraries and data persistence).

Spring Boot is an opinionated implementation of Spring - and it works really nicely (most of the time).  You simply add the Spring Boot dependencies to your build file (Gradle or Maven both well supported) and you can have an application up and running with a very small amount of code (you may have seen the Spring boot application in a Tweet a while ago)



Now of course, in reality you do need other configuration stuff, but the take away point is that if you are happy with Spring opinions, you can get an application up and running in pretty quick time.  For me, this is really a turning point for Java/Spring productivity - especially as Groovy has become more mature and sits so easily in Spring Boot, I don't think there is much to the argument that Java/Spring is too slow for early stage companies/prototypes/rapid development process (Aside: I know grails has been around for a while, but I think Spring and Java have stepped up here, plus I honestly have my suspicions that Spring Boot and Grails may clash at some point - they have certainly been on a collision course since Spring Boot was announced at Spring One).


The way it works is quite simple really, you add a relevant dependency to your buildfile, for example:

compile("org.springframework.boot:spring-boot-starter-web")  
compile("org.springframework.boot:spring-boot-starter-thymeleaf")


The first one is just your standard web application Spring Boot dependency, and the second is the opinionated version of Thymeleaf configuration.  Then, at startup Spring Boot has lots of AutoConfiguration files that check for the existence of key classes in the classpath, and if present it executes the auto configuration.

See here for the Thymeleaf autoconfiguration - you will see that there is heavy use of the @ConditionalOnClass annotation - which checks for relevant Thymeleaf classes, and if they are on the classpath it configures them.



Undoubtedly, there are a few times where you find yourself scratching your head at the magic, and I have on more than one occasion had to go through the Spring source code.  And sometimes, you want most of the auto configuration, but just want to tweak one or two properties, and you are left having to turn off the autoconfig and do it yourself, but for me the biggest take away point is the speed at which it is now possible to get up and running with Spring/Java/Groovy and have a decent web platform for building your product or company.

Android: Building a cloud based quiz application

A long time ago, when Android was still in its infancy (1.5 I think..) I built and open sourced a basic quiz app.  The app was just a basic multiple choice question-answer app, driven from some questions in the database, but it did ok - it has had over 10k downloads on the app store, and the blog post tutorial here is still one of the most popular articles.

But here we are, Android 5.0 is released and the state of Android development is very different now. But I didn't really want to just re-skin/tweak the old app and push it out again - and I also wanted to write up some notes on using parse.com as a backend service - so this seemed like a good opportunity.

The source code for the app is all on GitHub.


The goal

So the aim is to create a an android app quiz game, but rather than using local storage, using the cloud to power the questions. This avoids the need for lots of boiler plate DB code and also makes it easier for us to update questions.  The tutorial will be broken into two parts - the first part will be basic quiz stuff with cloud questions, the second part will enhance the app to support user login and to track scores to allow users to compete against each other.


Parse

Before you start the tutorial, you need to get an account setup at parse.com - its a cloud DB/backend as a service that was recently bought by Facebook. They allow real easy setup of DBs and provide a load of nice libraries/APIs to really easily interact with their endpoints across lots of platforms (its also incredibly well priced -the free tiew is really good and if you find your mobile app is going beyond that then you can probably think about monetising the app!).  All you need to do is head over there, sign-up and then create a new app - all you need to do is give it a name and hey presto!  You can either make a note of the keys then, or come back and grab them later. There are no other changes you need to make now, as that will get handled from our source code. The only other thing to do is to download the parse android library to make use of their sdk in android (if you have grabbed the source code from GitHub then you will not have to worry about these)



OK, lets get started!

Again, I am not going to spend a lot of time covering the real basics of Android and only really mention the things of note or that are different from standard application development - hopefully the code & general explanation will be clear enough to get an understanding of what is going on.


Android manifest
First, lets get our AndroidManifest.xml file configured.  The only things to note here are the permissions we are setting - we will request internet access and network state permissions. Also worth noting that I have set the min sdk for my sample app at version 16.


Our Application class
We will have to create a custom implementation of the Android Application class. This class is instantiated on application startup, and hopefully if you are looking at Android development you should be familiar with this class.  We will do a couple of things in this class:

  1. Register our parse.com application with out secret keys
  2. Initialise the Parse library and our domain objects
  3. Try to fetch all the questions for the quiz and store them for offline usage 
  4. Create a GamePlay object, that will keep track of the state of the current game in progress
First lets look at the Parse setup - this is standard parse boilerplate and is covered in parse docs and sample apps - you just need to add your ID/Key here (also note, we have just registered the Parse object class Question - this is our domain object - like a JPA entity etc - if we add more domain objects they need to be added here too)

Next we will make a call to parse.com to fetch the questions from our cloud API - we will save this in the background (make an asynchronous call) and "pin it" to make it available for offline usage. Also note that we do not un-pin existing questions until we have successfully found new ones - that way users should always have questions once they have successfully loaded them the first time.
Hopefully the above is quite clear - the parse libraries are quite straight forward to understand - we create a query (typed Question) and then call findInBackground and implement an on success handler.


Domain objects: Question
Parse library provides a nice interface to create POJOs to model your domain model, if you are familiar with JPA/Hibernate/etc and the approach of POJOs representing a domain model its much like this. Using these classes you can easily query/load/save data from the cloud by just using these objects. You will have spotted that in the query we use in the application class to load all our questions we just run a plain query with the Question class - this, as you should expect, will just retrieve all Question objects from your cloud table (parse). The domain models are just an annotated POJO, and you just define appropriate getter/setters for the fields you want to include.


Welcome activity
Now we are into Android basic stuff really - we have setup parse and fetched the questions for local usage, now we just need a basic menu to start the quiz and some activities to display the questions and the end results.

We will just apply the layout and then implement a method to handle the button clicks. For this tutorial we are skipping the high score stuff and just playing the quiz.
All we need to do is reset the current GamePlay object and grab the questions from the local store (by this point they should be updated from the cloud so no problems, then kick off the quiz!


Question activity
There is nothing really interesting to see here - it's all on github if you want to take a look in detail (or have already downloaded and working along) - This is just a standard Android activity that pulls out the question and possible answers and presents them.




This just progresses along fairly simply, until it gets to the end of the quiz and then it presents a simple screen saying the score - all this stuff can be tweaked/styled etc - but there are the basics for a cloud powered multiple choice quiz app!


Creating the questions

All the questions will be stored in the cloud in parse.com - once you have a look at the interface, it should be pretty clear - you can easily create the data for questions either manually or by importing a csv/json file.



You will need to login to your parse account and the quiz app and create a Question class. This will just match the domain POJO we have created. Just login, go to "CORE" then "DATA" Then select "+ Add Class", adda a custom class called "Question" (this must be exactly the same as the name provided in the POJO annotation for the Question class).. Then select "Add Col" and add the fields to match the POJO (question[string], option1[string], etc).  Once you have the class/table added on parse, you can simply add data by selecting "Add Row" and just manually entering the data, or using the import function.



Source code

Tech: Functional programming in Groovy

I have recently started the Coursera Functional Programming with Scala course (taught by Martin Odersky - the creator of Scala) - which is actually serving as an introduction to both FP and Scala at the same time having done neither before. The course itself is great, however, trying to watch the videos and take in both the new Scala syntax and the FP concepts at the same time can take a bit of effort.

I wanted to work through some of the core FP concepts in a more familiar context, so am going to apply some of the lessons/principles/exercises in Groovy.


Functions:

If you have done any Groovy programming then you will have come across Groovy functions/closures. As Groovy is dynamically typed (compared to Scala's static typing), you can play it fairly fast and loose.

For example, if we take the square root function that is demonstrated in the Scala course, it is defined as follows:



As you can see, Scala expects the values to be typed (aside, you don't actually always need to provide a return type in Scala). But in the Groovy function it is:



A groovy function can be defined and assigned to any variable, thereby allowing it to be passed around as a first class object. If we look at the complete solution for calculating the square root of a number (using Newton's method - To compute the square root of "x" we start with an estimate of the square root, "y" and continue to improve the the guess by taking the mean of x and x/y )


So that's in Scala, if we try Groovy we will see we can achieve pretty much the same thing easily:


Recursion (and tail-recursion):

As FP avoids having mutable state, the most common approach to solve problems is to break the problem down in to simple functions and call them recursively - This avoids having to maintain state whilst iterating through loops, and each function call is given its input and produces an output.

If we again consider an example from the Scala course, with the example of a simple function that calculates the factorial for a given number.

This simple function recursively calculates the factorial, continuing to call itself until all numbers to zero have been considered. As you can see, there is no mutable state - every call to the factorial function simply takes the input and returns an output (the value of n is never changed or re-assigned, n is simply used to calculate output values)

There is a problem here, and that is as soon as you attempt to calculate the factorial of a significantly large enough number you will encounter a StackOverflow exception - this is because in the JVM every time a function is called, a frame is added to the stack, so working recursively its pretty easy to hit upon the limit of the stack and encounter this problem.  The common way to solve this is by using Tail-Call recursion. This trick is simply to have the last code that is evaluated in the function to be the recursive call - normally in FP languages the compiler/interpreter will recognise this pattern and under the hood, it will really just run the code as a loop (e.g. if we know the very last piece of code in the block of code is calling itself, its really not that different to just having the block of code/function inside a loop construct)

In the previous factorial example, it might look like the last code to be executed is the recursive call factorial(n-1) - however, the value of that call is actually returned to the function and THEN multiplied by n - so actually the last piece of code to be evaluated in the function call is actually n * return value of factorial(n-1).
Let's have a look at re-writing the function so it is tail-recursive.


Now, using an accumulator, the last code to be evaluated in the function is our recursive function call. In most FP languages, including Scala, this is enough - however, the JVM doesn't automatically support tail-call recursion, so you actually need to use a rather clunkier approach in Groovy:

The use of the trampoline() method means that the function will now be called using tail-call recursion, so there should never be a StackOverflow exception. It's not as nice as in Scala or other languages, but the support is there so we can continue.


Currying:

This is like function composition - the idea being you take a generic function, and then you curry it with some value to make a more specific application of the function. For example, if we look at a function that given values x and y, it returns z which is the value x percent of y (e.g. given x=10, y=100, it returns the 10 percent of 100,  z=10)

The above simple function is a generic mechanism to get a percentage value of another, but if we consider that we wanted a common application of this function was to always calculate 10% of a given value - rather than write a slightly modified version of the function we can simply curry the function as follows:

Now, if the function tenPercent(x) is called, it uses the original percentage() function, but curries the value 10 as the first argument. (If you need to curry other argument positions you can also use the rcurry() function to curry the right most argument, or ncurry() which also takes an argument position - check the Groovy docs on currying for more info)


Immutability:

Immutability is partially supported in Java normally with use of the final keyword (meaning variables can't be changed after being initially set on object instantiation). Groovy also provides a quick and easy @Immutable annotation that can be added to a class to easily make it immutable.  But really, there is more to avoiding immutable state than just having classes as immutable - As we have functions as first class objects, we can easily assign variables and mutate them within a function - so this is more of a mindset or philosophy that you have to get used to. For example:

The first example is probably more like the Groovy/Java code we are used to writing, but that is mutating the state of the list - where as the second approach leaves the original list unchanged.


Map Reduce:

As a final note, there are some functions in FP that are pretty common techniques - the most famous of which these days (in part thanks to Google) is Map-Reduce, but the trio of functions are actually Map, Reduce(also known as Fold) & Filter - you can read more about the functions here (or just google them!), but these functions actually correlate pretty nicely to core Groovy functions that you probably use a lot of (assuming you are groovy programmers).

Map

map is the easiest to understand of the three. It takes in two inputs - a function, and a list. It then applies this function to every element in the list. You can basically do the same thing with a list comprehension however. 
Sound familiar? This is basically the .collect{} function in Groovy

Reduce/Fold

This one is a bit more complicated to descibe, but is the same as the .inject{} function in groovy

Filter

And another simple one - filtering out a list for desired elements, this is Groovy's .findAll{} function



As I said at the start, I am new to FP and coming from an OO background, but hopefully the above isn't too far from the truth!  As I get further through the Coursera course I will try to post again, maybe with some of the assignments attempted in Groovy to see how it really stands up.


Some useful references:

Tech: Building an RSS reader for Android (RIP Google Reader)

This tutorial will walk through building an RSS reader on the Android platform (focusing on 3.0 + as it will be using Fragments). All the code is available as a complete, working Android app that you can fork/download and fire up straight away on a compatible Android device/emulator. So feel free to go grab that from GitHub before continuing.

It is not an unusual requirement for mobile apps these days to be able to consume an RSS feed from another site (or many) to aggregate the content -  Or maybe you just want to build your own Android app now that Google has announced it will be retiring Reader.

Those of you who have worked with RSS in other JVM languages will know that there are plenty of libraries available that can handle parsing RSS - however, because the android platform doesn't actually contain all the core java classes, almost all of the RSS libraries have not been supported.

Fear not though, as Java's SAX parser is available, so with a bit of code we can get a custom RSS parser up and running in no time!

This walk through will cover off the basics of getting an RSS reader app up and running quickly and will also cover some details of Android's fragment system for tablet optimization as well as some things to watch out for (such as changes in the platform that mean network operations cannot be run in the main thread, which requires some tweaking if you have worked on earlier versions).

All the code for the app is also available on our GitHub so feel free to fork that and try loading it up on your Android device/emulator.

Parsing an RSS Feed:


So to get started we will look at parsing the feed - if you have any experience parsing XML using SAX in Java then you will know how easy this is. All we need to do is to tell the parser which XML nodes we care about, and what to do with them.

If  you have never implemented a SAX parser before, there are three primary methods that we will override: 
  • startElement() - This is called by the parser every time a new XML node is found
  • endElement() - This is called by the parser every time an XML node is closed (e.g. </.. )
  • chars() - this is called when characters are found between nodes



Because we only really care about capturing data from the leaf nodes, our startElement() method is left empty. The chars() element has to be watched, as there is no guarantee when it will be called (e.g. in a node like hello world  this method might be called several times between the start and end) so every time we will just append the contents to a StringBuffer - that way we can be sure that we will have captured all the data in the node.  By the time the endElement() method is called, we know that we have the contents of the node itself, and we just have to store the data.  At this point, we just quickly knocked up a POJO with the attributes that we wanted to capture - the Strings that we match on are the node names from the ATOM RSS feed (that Blogger uses) - if you are using another feed, just have a quick look at the feed and update the node names appropriately.

Using our Feed in an Android App

So, that was easy right? Once that parser has run through (and you could use that code standalone in any java app really) then you will have a list of Java objects that have the core details about the latest blog posts on the feed (title, author, datecreated, content, etc) - So now lets look at using it in an Android app.

We will assume a basic understanding of the Android SDK and the concept of Fragments, so won't go back to basics with that stuff.

What we will do, is create a basic ListFragment and an RSSService class that we will use to populate the list. In our ListFragment we will simply tell our RSS service to populate the list:



Simple right?

Let's take a look at what our helpful RSS service is doing for us.



The first thing to note is that this class is extending Android's AsyncTask- The reason for this is that since Android 3.0, you are no longer able to perform network operations in the main application thread, so being as our class is going to have to fetch some RSS feeds we are going to have to spin off a new thread.

As you can see, the constructor just sets some context info that we will use later on, and then builds a progress dialogue - this is then displayed in the onPreExecute() method - This lets us show a "loading" spinning disk whilst we fetch the data.

Android's AsyncTask's primary method that handles the actual work that you want to do asynchronously is called "doInBackground()" - In our case, this is simple - we just invoke our SAX RSS parser and fetch our feed data:



Finally, we will override the "onPostExecute()" method of the async class to use our newly fetched list to populate our ListFragment.  You note how when we overrode the doInBackground() method earlier we set the return to List of Articles (where Article is my simple POJO containing my RSS blog post info) - well this must correspond to the argument of the "onPostExecute()" method, which looks like this:



Actually, all we really needed to do in this method would be pass our new List or articles to the ListFragment and notify it of the change as below:



However, in our application we have added a bit more sugar on the app - and we have actually backed the app with a simple DB that records the unique IDs of the posts and tracks whether or not they have been read to provide a nicer highlighting of listed blog posts.

So that's it - there's plenty more you can add to your RSS reader app, such as storing posts for reading later, and supporting multiple feeds/feed types - but feel free to fork the code on GitHub, or just download on to your android device to enjoy all our NerdAbility updates!

Application running in Andriod emulator
RSS application running in an Andriod  tablet emulator




Graph data structures - Searching

Previously we looked at an introduction to graph data structures and designed a very basic Graph implementation.  I also mentioned that the main things we would likely want to do with a graph is search/explore the graph.


There are two primary tools that we will use to explore graphs - these are basic computer science concepts, and should be familiar to everyone who has studied computer science at uni and faced graphs before.


Depth First Search (DFS) 

The concept behinds DFS is that given a starting point or root node, we will
search as deep as we can one route before backtracking (e.g. select a neighbour to the root node, visit that neighbour, then select a neighbour from that node and visit - continue this until we reach a node that we cannot follow any more edges from, and then backtrack up the graph considering alternate neighbours at each step)




DFS is pretty simple to implement and nataurally uses a Stack data structure to keep track of the backtracking (the easiest way to do this is to solve recursively using the implicit Stack)

Below is a simple Java implementation of DFS using recursion to handle the backtracking.







Breadth First Search (BFS)

This search takes the different approach of looking as wide as possible before moving down a level. For example, we will visit all the immediate neighbours of the root, then select one of the immediate neighbours and visit their immediate neighbours - then backtrack up to visit other neighbours at this level.



(image also from wikipedia)



BFS is also fairly simple, and pretty close to DFS but it uses a Queue (FIFO) structure rather than a stack to visit nodes from the root down first.  Below is example code of BFS implemented in Java.



Graph data structures - An introduction

With most problems it is largely pretty clear which data structure is going to be appropriate to use - If you just care about storing a list and iterating over it, then you probably want an Array based structure - if you particularly care about the elements being unique you could look as a Set. If you want to capture a key-value pair dictionary data structure then you can use a Map.

Graph data structures are no different, and are very applicable to a subset of problems, and once familiar with graphs and the common algorithms it becomes quite easy to quickly identify the problem type as a graph problem.


The basics

A graph has two main elements:
  • Node - a given data point in the graph
  • Edge - a connection that joins any two Nodes. A graph can be "directed" or "un-directed" - this simply determines whether the Edge goes both ways or is purely one way. 

A graph is a data strucutre that stores a set of connected elements - the easiest way to understand the structure is with a real world example, the most famous is probably like the social-network graph. If you think about your profile on any popular social network (Facebook/LinkedIn/etc), your profile is a Node in the graph, and each of your friendship/connections is an Edge to another Node in the graph

A while ago, a Facebook intern created a visualisation of the Facebook social graph around the world - you can't really make out the individual Nodes/Edges, but you get the idea.


In Facebook's graph, it is un-directed, that is, when you become friends with another Node in the graph the relationship goes both ways - you are their friend and they are yours.

Twitter, however, is a directed graph - once you follow someone an Edge is created between your Node and theirs, but they don't automatically follow you as well, so the Edge has direction.


If you are a LinkedIn user, you may have noticed whilst browsing another user's profile a widget saying something like

"X of your connections can introduce you to someone who knows Y"


What LinkedIn is telling you, is that the shortest path between your Node and Y's Node in the graph is 3 Edges (3 "hops" - following an Edge between nodes is often called a "hop") - To be able to do this, LinkedIn is searching the social graph space to discover the shortest path (and number of unique paths that are of the shortest distance) between you and this other user.


Graph representations

There are two primary graph representations:
  • Adjacency Matrix - This is a matrix/2-D array that captures the relationship between every node. Every node is mapped against the X and Y axis, and the value in the intersecting cell determine if their is an Edge between the nodes. E.g. if we wanted to know if there was an edge between Node "4" and node "13" we would look at matrix[4][13] - the value there would tell us. Normally, value of 1 represents an edge, but other values can be used (for example, if it is a weighted graph the values could represent the weights, or if it is a directed graph it could use -ve/+ve values to represent direction).  This representation is good for "dense" graphs.
  • Adjacency List - This representation is simply a List of all Edges and a List of all Nodes. This is a simple representation and is more memory efficient for "sparse" graphs

For the code samples here, I will focus on the Adjacency List representation


Graph representation - Java

Below is a very rudimentary implementation of a Graph class in Java. It uses a Adjacency List representation and will be used in later examples I go through.


Below is a sample unit test setup that shows how a simple graph can be initialised:



In the next post I will go through basic techniques for searching and exploring graph spaces, as well as a post looking at how to solve the LinkedIn shortest path recommendation problem.





Quick Sort - A Java implementation

And now the same for a Quick Sort I implemented. Normally Quick Sort also runs in O(nlogn) time, but its worth noticing that the implementation below just uses the first element as the pivot value, which is not an optimal pivot (performs very badly in partially sorted lists for example), so will leave it to you to think about better ways to choose the pivot value.



MergeSort - A Java implementation

I wrote a Java implementation of Merge Sort a little while ago, just for fun really. I was just about to close the file, noticing it was still open in my IDE, and thought I might as well just post it here quickly. It might be interesting for someone along side the Merge Sort analysis I previously wrote up.



Creating a Java URL shortener

A long time ago I posted a brief article about using Google's URL shortening API to easily create shortened URLs for your application (e.g. if you wanted to post stuff server side to twitter and cared about character usage, you could fire your URL over to Google's API and just use the result).

However, recently I have had to think about how to shorten URLs myself, and really, it's pretty easy to implement yourself.


Creating a unique, repeatable identifier for a URL
I think a lot of people's first instinct might be to go for hashing the URL string - this isn't a good idea for a few reasons though:
  • Length - most normal hashing algorithms (md5/sha-*) produce long strings, which kind of goes against the point of a url shortener
  • Unique-ness - Obviously, if this is going to be a URL identifier then it needs to be unique, and hashes by their very nature are not unique - which means you would need to handle the scenario where a URL creates an already used hash and has an alternative
  • Look-up - as hashes are not (easily) reversible, you would need to look up the URL using the hash as the db key - which may not be ideal given a very large set of URLs (imagine number of URLs bitly.com has)


Thankfully there is a viable, easy solution available.


Lets first think about our database structure for persisting our URLs - In the simplest case we could probably get by with two columns:
  • id (DB generated sequence ID)
  • url - text field to capture the URL value

Generating the identifier from a DB
  1. Now, if you provide a String URL value, your code just needs to insert it into the table, this will create the row and the unique ID.
  2. Next, fetch that unique numeric ID, and convert it to base-62 (this will convert the numeric value into the base-62 representation (rather than normal base10, it will allow 0-9, a-z, A-Z as characters.  This gives you both an identifier in the form of "1jLPSIv" but also provides a massive ID space that can fit into relatively few characters (that will be part of the shortened URL) - 6 characters of base 62 provides a possible 6^62 different unique combinations (1.7594524073e+48 in total)

You may not want to directly leak the DB ids to the URL, in which case you can easily salt the IDs as you choose appropriate.



Bit.ly example

Looking at bit.ly shortening URLs it would appear that they follow the same pattern. I shortened two of my previous blog post URLs one after another, and the URLs generated as follows:

bit.ly/1oYgrsG

bit.ly/1tR63Zb

If we look at those two url identifiers, they look a lot like base-6, and if we convert the identifiers to base 62(and let's look at base-64 as well, just for funsies they may be using that)

URL - Base64>Base10 - Base62>Base10
1oYgrsG - 3685493160710 - 103119489480
1tR63Zb - 3690751293019 - 107587946123

As you can see, they look reasonable - there's a chance that there were a few URLs created between my two URLs, but I suspect there is a reasonable amount of salting going on so it is just a one-by-one increment. (although interestingly, if you get a bit.ly url and then just increase the last char by one - assuming base62 - then it will likely provide a new url - so you can have your own game of manual-webpage-roulette!)

Spring 4: XML to @Annotation Configuration

I got 99 problems, but an XML based security configuration aint one..

I have for a long time preferred to use Spring (or generally Java) annotations over XML config for my development (I concede the point on XML configuration allowing a central place to understand all Controllers etc, but with modern decent IDEs this info can still be viewed centrally even if using annotations).

A while ago I made the switch to pure code configuration for my Spring webapps, with the exception of the web.xml (dependency on using Tomcat7 that I didn't want to enforce just yet) and the security config - which Spring didn't support.  Along with Spring 4, Spring have announced that they will now support security config as code, which is great news!


XML Configuration

Below is an example of a typical XML based configuration.

This basically turns on security for the webapp, it:
  • Adds a single intercept URL rule (in this case, our rule says all URLs are open to everyone - so not that secure! We would likely add more URL intercept rules above this as we have more URLs we want to secure - these rules are read top to bottom and fall through until a match is found, so permitAll must be the lowest rule found)
  • Defines a login form - including location of the form, the submission target URL, login failure redirect URL and the logout URL. Spring will automatically handle all these. If you submit a form to the login-processing-url then Spring will intercept it and attempt to authenticate - you don't need a specific controller handler defined.
  • Defines an authentication manager - this will be used to authenticate submitted requests. In this case, I have overriden the standard Spring User Service with my own implementation (this will be a common requirement as you will want to authenticate against users in your DB etc)


The config is all pretty simple really - the only unpleasantness is the requirement to have XML!


Code Configuration

And here is the config in code using the latest and greatest Spring dependencies.

Let's have a lookat what is going on here.
  • I am @Autowiring in my custom User Service class - this will be used in the Authentication manager later.
  • The configure() method is where we set the URL intercept rules and login form - it should be pretty self explanatory reading it through if you are familiar with old fashioned xml config (will come back to CSRF config shortly - but it is NOT recommended that you disable this! this is just in development)
  • The registerAuthentication() method sets up the Authentication Manager using our User Service implementation. Whilst setting up the Authentication Manager, we also define a password encryptor - you will note in the code config I am using the BCrypt password encoder, and in XML using a base64 MD5 hash - but whichever encoder you want to use you can configure it in the same fashion using the secondary method to instantiate and return the encoder desired (although you probably shouldn't be using MD5 over BCrypt.. really, you shouldn't ).



Observations and Gotchas

As you can see in the above code example, I am explicitly disabling CSRF. As of Spring 3.2 onwards, the security layer provides CSRF protection by default - so if this is not disabled then you have to provide a CSRF token to prove your request is legit. Should you need to disable it, you can using the above syntax, but obviously CSRF protection is built in to help you out, so probs best not to disable it on a prod system!

The link above details how to include tokens in requests for your webapp. It's really pretty easy to do.


The second point to note is one that caught me out.  Previously in Spring, when submitting a form to authenticate, the default field names had to be j_username and j_password. If you are hoping to switch the config out on an existing webapp then you have to make sure you update your login form with the correct field names (Spring 3.2 with XML config you still need j_*, its only Spring 3.2 & Java config..)



As always, the code is on GitHub - feel free to check it out.

JVM Memory Management Primer: Groovy PermGen the Prequel

I am shortly going to be writing a post about managing PermGen memory with Groovy in production, but before getting to that this is a primer/reminder on some key parts of JVM memory management.

Out of Memory: Heap Space

Throughout dev on a JVM project, it is not entirely uncommon to see an OOM exception along the way. The most common will be a "Heap Space" OOME - The Heap is the part of memory that the JVM stores data/objects that are created by your application. For example, if you create a class called "User" and then create an instance of it, the instance of the class will be placed in the Heap.  This class of OOM are more common as it can happen if you attempt to load to much data or if your application is growing during dev and you haven't configured max heap space size (e.g. if you have a low heap space size defined and load a lot of data into your application you may see this error)

The Heap is split in to two** sections (also called generations) - the Young(sometimes called nursery) and the Old (sometimes called tenured generation). The JVM uses a generational Garbage Collection approach:

  • Young Generation - This is where most newly instantiated objects are placed. As many objects are relatively short lived, many are born and then die in this space. This space is collected more frequently and quickly.  This itself is normally split into two generations - Eden & Survivor - These just represent the age of the object, and as an object survives more GC it is promoted up, after a set threshold it is promoted into..
  • Old/Tenured Generation - This space stores objects that have outlived the young generation. These are assumed to be long lived objects, GC of this generation is less frequent and takes a lot longer (as it has to inspect all objects)

There are several GC strategies that handle GC of the heap, and can be tuned for your application (depending on what is important to you) - you can read more on the Oracle docs about strategies.


** In some versions/JVMs PermGen space is also part of the heap, so that would make it three. Also the Young generation is sometimes split further, but for a high level review of OOM Heap Space - we will consider two


Out of Memory: PermGen

The other class of OOME you might see is a PermGen OOM.  By and large this is rarer than the heap space errors as the Permanent Generation (PermGen) is the part of memory that stores Classes. In our User class example, the instances of the User class are on the heap, but the Class itself will be stored in the Perm Gen (in this example, there maybe any number of instances of the User class - one per user! - but there should only be one version of the Class). Just using standard Java, perm gen won't often be a problem, as there should be a static number of total classes for your application and as long as you haven't set the PermGen size too low then this shouldn't happen, but with the use of dynamic languages the risk of PermGen errors become more frequent, with languages like Groovy dynamically creating adhoc classes.

Whilst PermGen stores classes rather than object data, it still obeys the simple principle of GC: If a class no longer has any references to it, then that class could be subject to GC; whilst there are still objects that hold reference to a class then it cannot be GC'd.


A final important thing to note is as the norm in Java/JVM is for a static number of classes to be used, the default GC strategy is to not collect PermGen - So if you are doing anything that involves changing PermGen and adding more classes to that memory then you will inevitably see a memory leak.


Next post I will go into details about a few Groovy memory gotchas I have come across after 18months of production Groovy code.




Spring MVC 4.0

I have had a post coming for some time on my thoughts experience in starting to play with the changes that come as part of the latest major upgrade to the Spring suite.

There are lots of changes, that were announced at last year's Spring One event - including a flashy new spring.io website and several new packages to the framework such as Spring Boot (which they describe as an oppionated way to start building Spring apps - basically out of the box components that reduce work, but configured to Spring's tastes and conventions).



The interesting parts for me are in Spring MVC (as that is where I don most of my Spring dev) - most notably, the ability to write Spring apps in pure Groovy is of interest (as I do a lot of Groovy dev, and once you get used to the luxuries of its collection based closure functions then its hard to go back!) and also the ability to now configure the Spring Security stuff also entirely programatically (my previous attempt of pure code config spring app were thwarted only by having to use XML configuration for the security).

I am yet to convert an app to Groovy, but I have created a basic web app using the Spring 4.0 Milestone releases and converted the security config to code.  Full source code of the webapp is on github.

I will stick that up on GitHub soon, and when I do I will write in more detail about it (I also played around with LESS on that project and configured MAven/Eclipse to build the LESS/CSS files nicely, which I will also write up soon)

As it happens, I also stuck the demo app up on cloud hosting service AppFog.com (given that CloudFoundry is no longer with us in it's free form.. which kinda sucks, as that was really nice). It's by no means a complete app, and doesn't really do anything - just lets you connect your Facebook/Twitter accounts and see all your contacts etc - but as you can see - there is a lot of filler text and the LogOut link is always in the navbar etc.. But anyway, its here, for now..

Spring 4 App homepage


Spring 4 App Dashboard


  http://socialcrm.eu01.aws.af.cm/

Google Shortening URLs

As another quick follow on note from the below, one of the features that was built in to the application was the ability to share your resume or particular achievements with your friends on Twitter. To do this, I obviously wanted to share a link back to the URL of the resume, so to maximise the potential additional text I investigated URL shortening.

Their is a bit.ly API that uses OAuth, but for what I wanted to do, I decided that was overkill, as I didn't necessarily need to associate the shortened URLs to a users bit.ly account, all I really cared about was getting a shortened URL.

Fortunately, Google came to the rescue with their goo.gl URL shortening service that also exposes a public API without need for authentication.

So I simply wrote a service class that utilised the Spring RestTemplate class to shorten URLs:

@Service("urlShortenService")
public class UrlShortenService {

       private RestTemplate restTemplate;

       public UrlShortenService() {
              restTemplate = new RestTemplate(ClientHttpRequestFactorySelector.getRequestFactory());
              List<HttpMessageConverter<?>> messageConverters = new ArrayList<HttpMessageConverter<?>>();
              messageConverters.add(new StringHttpMessageConverter());
              messageConverters.add(new MappingJacksonHttpMessageConverter());
              restTemplate.setMessageConverters(messageConverters);
       }
	   
       public String shortenUrl(String url) {
              Map<String, String> request = new HashMap<String, String>();
              request.put("longUrl", url);
              LinkedHashMap<String, String> shortUrl = restTemplate.postForObject("https://www.googleapis.com/urlshortener/v1/url", request, LinkedHashMap.class);
              return shortUrl.get("id");
       }
}


I didn't worry too much about validating that the string passed in was a URL for the time being as I always had control of that, but that should be something that would need to be considered.

Integrating MongoDB & Spring

Further to my previous post with a deck walking through MongoDB & Spring integration, I also recently wrote it up as a technical article for work (my day job), and I thought I would share the detailed article here for you as well..


Introduction

With the explosion in the amounts of data being generated in recent years, more and more organisations are looking at alternative data storage options to the traditional relational model. This in turn has lead to huge growth in the NoSQL space, with leading web companies such as Facebook, Google, Twitter, Reddit, etc adopting NoSQL solutions.
Within the NoSQL space there are several different implementation options (Graph based DBs such as Neo4J, Wide Column DBS such as Cassandra and Haddop, Document based DBs such as MongoDB and CouchDB) and careful consideration is needed before choosing an implementation.
This article will look at integrating the Document oriented database MongoDB with a Spring MVC Web Application – it is important to note that due to the nature of Document based storage solutions, they are not applicable for all problems. For example, if the data you are modelling cannot naturally be stored as “documents”, then a Document-oriented DB probably isn’t the best solution (an easy way to think about whether the model can be stored as a document is to think of it being stored on paper – does it make sense for all the elements to be intrinsically grouped in to a single document? E.g. if you were storing an essay, all the chapters are intrinsically linked to the essay as a whole document, and a single chapter doesn’t make much sense as an individual object on its own).
For that reason, this article will look at a simple web app that allows users to create Resumes – Similar to an essay, resume storage is naturally suited to the Document based approach rather than a Relational approach.


Getting Started

This article will not cover details of creating a Spring MVC web application, and will assume a prior knowledge of the Spring MVC framework and core Spring principles. The example application itself is very simple, and there are several aspects of it that have not been included (these have been intentionally left out for simplicity), such as security/authentication features, advanced UI/screen flow/editing, etc, the purpose of the example application is purely to demonstrate MongoDB integration with Spring.
The source code for the application is all available from my GitHub Account (see links on the right), and if you want to follow the code below are the required pre-requisites:
  • Install STS Eclipse build and install the CloudFoundry extension (Optional) – If you want to follow the steps to deploy the application to the cloud then this is required
  • Sign up for a CloudFoundry account (www.cloudfoundry.com) (Optional) – Currently in Beta so its recommended that you sign up for the account as soon as possible as requests can take time to come through






Project Dependencies

Spring currently has a project underway to integrate core Spring functionality with various non–relational data technologies, including MongoDB, called “Spring Data”. We will use this library to facilitate our integration, so we will need to include the following maven dependency in our project pom:

              <dependency>
                     <groupId>org.springframework.data</groupId>
                     <artifactId>spring-data-mongodb</artifactId>
                     <version>1.0.0.M3</version>
              </dependency>


We also need to include the MongoDB Java driver:

              <dependency>
                     <groupId>org.mongodb</groupId>
                     <artifactId>mongo-java-driver</artifactId>
                     <version>2.6.5</version>
              </dependency>


And finally, to deploy on to CloudFoundry’s hosted service we need their runtime library:

              <dependency>
                     <groupId>org.cloudfoundry</groupId>
                     <artifactId>cloudfoundry-runtime</artifactId>
                     <version>0.7.1</version>
              </dependency>




Building the Domain Model

If you are familiar with using popular JPA frameworks in your Spring web apps (Hibernate/Eclipselink, etc) then this part should look familiar – Like any application we will need to create some objects to model our underlying data (the “M” in the MVC). For our simple application we will have three basic objects:



As you can see, we have three simple objects in our model, Resume, ResumePage, Section (we will just use these objects to split our document down in to several sections/chapters). One of the advantages of using Document-oriented DBs is that as it is schema-less, objects can be added to the document in the future without affecting existing data.
We model these objects as simple POJOs, just like modelling entities in JPA, but there are a few simple annotations Spring-Data provides for us
Resume.java:

@Document
public class Resume {
      
       @Id
       private String id;
      
       private List<ResumePage> pages = new ArrayList<ResumePage>();
...


That’s it! We use the @Document and @Id annotation to indicate any object that we want to treat as a Document, and the ID. We only need to add these annotations on our Resume object, as in this case the Resume is our document, we do not want to store our pages or sections as individual documents in themselves.



Data Access Layer

Now we have our data model we need to create our Data Access layer, so we can easily perform CRUD updates on our documents.
Creating our Data Access Objects is incredibly simple using MongoDB’s MongoRepository, and we automatically get basic CRUD functionality by just extending that interface. In our example app, we only want CRUD functionality so we just create a basic Interface that extends MongoRepository:

@Transactional
public interface IResumeRepository extends MongoRepository<Resume, String>{}

Next, in our Service layer we will auto-wire our new Interface in to our service classes (this is normal Spring stuff, using the @Autowired annotation we are telling Spring to handle the dependency injection for this interface):

@Repository("profileService")
@Transactional
public class ProfileService{

       @Autowired
       private IResumeRepository resumeRepo;


This makes the CRUD functionality available in our Service class to perform common functions, such as loadResume (in this case, we are using the resume owner’s name as the ID for the documents):

public Resume get(String userName) {
       return resumeRepo.findOne(userName.toLowerCase());
}


Or createResume:

       public Boolean createResume(Resume r) {
              try {
                     // Insert to db
                     resumeRepo.save(r);
                     return true;

              } catch (Exception e) {
                     //log exceptions here!
                     return false;
              }
       }


By simply extending the MongoRepository Interface and then auto-wiring it into our Service classes, we have provided basic CRUD functionality through the MongoRepository API as well as ensuring that our service class is not tightly coupled with the DAO implementation. However, often applications will need custom queries beyond basic CRUD. This can also be simply achieved using the MongoTemplate class provided, for example:

Query query = new Query(where("id").is(“rob”));
Resume r = mongoTemplate.findOne("mycollection", query, Resume.class);

The above query will retrieve the Resume belonging to “Rob”. The Query object provides rich API for accessing and updating documents in your data store.



Spring Configuration

The final step of our integration is to configure the Spring beans to handle the data source properties.
The Spring configuration resides in the Application Context XML file (in our case, it’s a file named “applicationContext.xml” and can be found in src/main/resources/META-INF/spring/.
This file is used to declare some of the Spring beans (although not all beans will be declared in here – some beans are declared using annotations, such as our Service classes), for our MongoDB integration we need to include the following configuration:

       <!-- Mongo Configuration -->
       <mongo:repositories base-package="com.tmm.nosql.mongodb.repo" />
      
       <bean id="mongoTemplate" class="org.springframework.data.document.mongodb.MongoTemplate">
              <constructor-arg ref="mongoDbFactory" />
       </bean>
      
       <!-- local config -->
       <mongo:db-factory id="mongoDbFactory" dbname="resume_db" host="localhost" port="27017"/>


The first element of the config declares the name of the package that our Repositories are stored in (this is where the our DAO Repository lives) – we declare this so to enable the package to be scanned for relevant repository classes.
The second configuration element declares the Mongo Template bean (provided by the Spring Data library) and we declare that we want Spring to inject our MongoDbFactory bean using Constructor based Dependency Injection.
The final configuration option declares the MongoDbFactory that we will be injecting in to our MongoTemplate bean – here we need to define the database server and port name. The default port number is 27017, so unless you have altered from the standard MongoDB install and configuration then you should use these details.




Running the Application

Running Locally

To test the application, set up a Tomcat server within your Eclipse (IDE) and add the project to the server, start the server and navigate to http://localhost:8080/ and you should see a simple welcome page.

Running on CloudFoundry

Before running on CloudFoundry we need to make a very minor tweak to the configuration. Previously we had added the configuration option:

<mongo:db-factory id="mongoDbFactory" dbname="resume_db" host="localhost" port="27017"/>

As we are no longer running on localhost and want to bind these options to CloudFoundry’s services, we change this to:

<cloud:mongo-db-factory id="mongoDbFactory"/>

Simple enough again – we just declare our MongoDbFactory, and leave CloudFoundry to inject the required parameters.


The next step to deploying the application on CloudFoundry is to add a CloudFoundry server instance to your eclipse:
  • Open the “Servers” view in Eclipse (Window>Show View>Servers)
  • Right-click and select New>Server

  • Select VMWare > Cloud Foundry and then press the “Next” button
  • You will then have to enter your Cloud Foundry account information:



  • Complete your email and password information, leaving the URL as it is and select the Next button.
  • The final screen will give you the option to add the project to the server, add the server and press Finish



Now the Cloud Foundry server instance is setup within your IDE you need to configure the deployed application so you can bind it to the required services:
  • Double-click on your deployed project within the “Servers” view:


  • This will open the “Applications” page – you will notice in the bottom left had corner an empty box titled Services, 




  • Press the “Add Service” button to the top right of the Services panel:




  • The MongoDB service will now be listed in the Services box, now drag it across to the right to the “Application Services” section. The “Applications” panel should now appear something like this:



  • Press the “Start” button under the General tab, then watch the log as the application is started up on the CloudFoundry – there should be no errors reported
  • Now navigate to the URL you assigned to the application (this can be edited in the above panel under the “General” section


You now have a MongoDB powered NoSQL application running in the Cloud!