
Groovy Retrospective: An Addendum - Memory usage & PermGen

I can't really have a Groovy retrospective without mentioning memory.

Over the last four years I have spent more time than any sane person should have to investigating memory leaks in production Groovy code. The dynamic nature of Groovy, and its dynamic meta-programming in particular, presents different considerations for memory management compared to Java, simply because the set of loaded classes is no longer fixed. Java has a fixed number of classes that would normally be loaded into memory (hot-reloading in long-living containers aside), whereas Groovy can easily change or create new classes on the fly. As a result, PermGen GC is not as sophisticated (before PermGen was moved to the normal heap, anyway), and in Java, if you experienced an Out-Of-Memory PermGen exception, you would largely just increase your PermGen size to support the required number of classes being loaded by the application.

To be fair, the majority of the problems encountered were due to trying to hot-reload our code, coupled with the setup of the application container (in this case Tomcat) and having Groovy on the server classpath rather than bundled with the application (much like you wouldn't bundle Java itself with an application; however, bundling Groovy with your application is recommended).

A couple of points of interest if you are considering Groovy (especially relevant in a long-running process where you attempt to reload your code):


The MetaClassRegistry

When you meta-program a Groovy class, e.g. dynamically add a method to a class,  its MetaClass is added to the MetaClassRegistry, which is a static object on the GroovySystem class. This means that any dynamically programmed class creates a tie back to the core Groovy classes.

The main consideration to keep in mind when meta-programming in a Groovy environment is that if you want to reload your classes, you now have a link between your custom code and the core Groovy code, so you must either 1) explicitly clear out the MetaClassRegistry, or 2) reload the core Groovy classes as well (throw everything out on reload).

Coming from a Java environment, where you would likely use the JAVA_HOME on the server for long-running applications, it can often seem logical to have a similar server classpath entry for Groovy too - but actually, the easiest approach is to bundle the Groovy classes with your application so they are a normal candidate for reloading.

If you decide not to reload Groovy, you can add explicit code to clear out the registry - this is pretty simple code, but a note of warning: without just throwing everything away, there are still plenty of risks that you leave links that stop your classes being collected (which was the case for me for a long time, until I made all the third-party classes (Groovy included) candidates to be thrown away and reloaded). Even after throwing away all third-party libraries you can still be caught out if you use shutdown hooks (e.g. a JVM shutdown hook to clean up connections will tie your classes back to your underlying JRE, meaning that no classes can be collected until you restart your application!)

Anyway, below is a sketch of code to clear the registry; it assumes you have access to the GroovyClassLoader, but you can also follow the same approach by just grabbing the MetaClassRegistry from any arbitrary Groovy class and iterating through that. Give it a try - if you play around a little you will probably find it's quite easy to create a leak if you want to!
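(The original snippet isn't preserved here, so this is a minimal illustrative sketch, assuming you have kept a reference to the GroovyClassLoader your application classes were loaded with.)

import groovy.lang.GroovySystem
import groovy.lang.MetaClassRegistry

void clearMetaClasses(GroovyClassLoader classLoader) {
    MetaClassRegistry registry = GroovySystem.metaClassRegistry
    classLoader.loadedClasses.each { Class clazz ->
        registry.removeMetaClass(clazz)   // drop the cached MetaClass so the class can be collected
    }
}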


Anonymous Classes

Another thing to keep in mind in Groovy applications is the generation of classes by your application. As you would expect, as you load Groovy classes into your application they will be compiled to class files (or will have been, if you are pre-compiling into a JAR or similar), which will be added to PermGen. However, in addition to this, the code being executed may also result in additional (possibly anonymous) classes being created and added to PermGen - so without care, these can start to fill up that space and cause OOM exceptions (although generated classes are often very small, so it might take a while before it actually errors).

An example of what might do this is loading Groovy config files - if you are loading sensibly and just doing it once then it won't be an issue, but if you find yourself re-loading the config every request/execution then it can keep adding those classes to PermGen. Another example of where this happens (surprisingly) is if you are using Groovy templating. Consider the following code:

(Taken from the SimpleTemplateEngine JavaDocs)
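(The original snippet isn't preserved here, so this is a reconstruction along the lines of the JavaDoc example - the key point being that every call to createTemplate() compiles a new class.)

import groovy.text.SimpleTemplateEngine

def text = 'Dear "$firstname $lastname", so nice to meet you in ${city}.'
def binding = [firstname: 'Sam', lastname: 'Pullara', city: 'San Francisco']

def engine = new SimpleTemplateEngine()
def template = engine.createTemplate(text).make(binding)   // a new script class lands in PermGen here

println template.toString()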

This is a simple example of binding a Groovy template with a map of values - maybe something that you would do to send an email or create a customised document for someone - but behind the scenes Groovy will create a class that is added to PermGen for each execution. Now this isn't a lot on its own; however, if you are dealing with high throughput it can certainly add up pretty quickly.


Lazy Garbage Collection

Another interesting behaviour that I observed over the last few years is that, in the JVM implementation I was using, the PermGen garbage collector collected lazily. As I mentioned, because nothing interesting traditionally happened in the permanent generation of the JVM, the garbage collectors didn't do anything interesting there. Furthermore, because it was always assumed that the contents of PermGen were fairly static (as the name suggests), collection happens fairly infrequently, and often only kicks in as part of a full collection (which is more costly). What this means is that even if every time you reload you free up lots of classes for collection (say, your entire application), it might not GC PermGen for several reloads, as a full collection isn't required, and the JVM will just lazily perform the collection when PermGen is almost full.



If you look at the diagram above, it displays a common pattern I observed - each little step in the upswing of the chart represents a full application reload, but you will see that there are multiple reloads before the PermGen usage approaches the limit, and it is only when the usage is close to the limit that it actually performs the collection.

The challenge this can present is that if you push PermGen close to the limit without triggering a full collection, then the following reload can once again spike PermGen usage (because the entire application and its associated third-party code is being reloaded into memory), pushing it over the limit and causing OOM exceptions. This was not something I ever saw on production environments, but it was fairly common in desktop/development environments where fewer resources were available.

(Another interesting observation in this particular pattern is that the GC seems to be intermittently collecting fewer classes. I never got to the bottom of that one: there was no difference in activity between application reloads, so there was no change in application behaviour to trigger a leak, and the middle period of reloading maintains a constant level of memory usage - which also shows no leak behaviour.)



Hopefully someone smarter than me about all this stuff is reading this and can shed some better insights into it; otherwise, hopefully it's helpful if you are about to start doing crazy things with PermGen.

Groovy: A Retrospective

I have been using Groovy, in ernest, for just over 4 years now, and as I am possibly moving to using something else (Scala) I wanted to write up some notes about it all (not getting into memory leaks, as that's a whole other story!).

Over the years I have read people saying that it is not production-ready, or is only really a language designed for scripting and testing, but I have been using it in a production setting for the last four years and generally find that it is now at a stable enough level for production use (yes, it is still slower than Java, even with @CompileStatic, so if you know from the start that you are building some low-latency stuff, then maybe go with something else).

Java to Groovy

I started using Groovy 4 years ago when I joined a team that was already using it as the primary development language, coming straight from an enterprise Java background.  The nice thing about Groovy (certainly versus something like Scala) is that it is, for the most part, very similar to Java syntax, and if you can write Java then you can also write Groovy - which is great if you either have an existing Java team or a large Java hiring pool and you want to start using Groovy. The downside is that it takes considerable discipline and code review to ensure a consistently styled code base, as from my experience, whilst the Java-to-Groovy switch can start in an instant, the journey is a gradual one.

From my experience, both of making the transition myself and from hiring other Java developers into the team, there are three main stages of the journey:

  1. I like the return keyword! - If making an immediate switch to a Groovy environment, most former Java devs (myself included) likely end up just writing Java in *.groovy files - most people find dropping the semicolon the easiest first step, but lots of Java developers won't fully embrace idiomatic Groovy, a good example of this being use of the return keyword.

    I remember early code review conversations where I was advised that I didn't need the return, to which I insisted that I liked it and found it clearer to read - a conversation I have since been on the opposite end of with several newer colleagues.  The effect of this is fairly minimal - it means as you bring more Java developers on board, you will find patches of your codebase which are distinctly Java in style (return keyword, occasional semicolons, getters & setters, for-each loops etc) - still readable and functional Groovy, just not idiomatic.

  2. I can type less!? - The second phase that appears to be common is the opposite end of the spectrum - the discovery of dynamic typing and the ability to define everything as def (or to omit the type altogether in the case of method arguments).

    I'm not sure if this is a result of laziness (fewer characters to type, and less thinking required as to specific types) - which I don't really buy - or whether it is just perceived as the idiomatic way.  Another common trait here is to overuse scripts (they have a use, but are harder to unit-test and, in my opinion, don't offer much over just using a Groovy class with a simple main method which handles the CliBuilder and instantiates the class to run).

    Again, in my opinion, typing is better - I still don't buy that typing def rather than the actual type is that much of a saving, and it often ends up being a code smell (even if it didn't start out as one).  The official Groovy recommendation is to type things, with the exception of local method-scoped variables - but even in that case I cannot see the benefit: there is no real gain to the developer in typing those precious few characters less, and future readability suffers as other developers have to track through the code to be sure what a variable is doing.

  3. Idiomatic, typed Groovy - In a circle-of-life analogy, the resulting code is actually quite close to Java (well, it's statically typed and I don't focus on brevity of code at the cost of readability), but makes full use of the best Groovy features - automatic getters and setters with dot notation for access (where brevity is good!), all the collection methods (each, findAll, collect, inject), explicit Map and List declaration, optional (but typed!) method arguments, truthy-ness, etc (see the sketch after this list).

    I suspect the evolution to this stage, in part, comes from working on a sufficiently large code base for long enough to have to start dealing with the fallout of nothing being accurately typed - for me this was having to port some Groovy code to Java and finding several methods with several un-typed (sometimes defaulted) arguments, where the method body was just a large if block checking the types of the passed data and handling them differently.

    A week or so ago I put together a small web-scraping project in Groovy - it's just a few classes, but it gives a good picture of how I code Groovy now after 4 years.
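As a rough flavour of that end state, here is a small sketch of my own (illustrative only, not code from the post or the web-scraping project):

class Team {
    String name
    int points
}

List<Team> teams = [new Team(name: 'England', points: 18),
                    new Team(name: 'Australia', points: 16)]

Map<String, List<Integer>> scores = [England: [3, 15], Australia: [16]]   // explicit Map/List declaration

List<String> leaders = teams.findAll { it.points > 16 }.collect { it.name }   // filter, then map

if (leaders) {                                    // truthy-ness: a non-empty list is true
    println "Current leaders: ${leaders.join(', ')}"
}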

The good bits

I think it was probably a combination of the above journey and Groovy's maturing (I started using it at version 1.7), but it took probably 2 1/2 years before I wanted to start using Groovy at home and for my own projects, continuing to use Java in those first few years.  But now, I am happy to recommend Groovy (performance concerns noted); it's a nice syntactic sugar that can really complement the way you would normally program in Java, and there are some things that I really miss when going back to Java.

  • The explicit Map and List definition is awesome - this is a good example where brevity and readability come together, and it is kind of puzzling that this hasn't been natively supported in Java anyway.  The closest thing I can think of in Java is its Arrays.asList(..) method, which is a nice helper, but still more clunky than it needs to be.  When I go back to Java I definitely miss the ability to just natively define nested Map/List structures.
  • The extended collection methods with closure support, for iterating, collecting, filtering etc, are also great, and probably the biggest part of Groovy that I miss going back to Java (I know that Java streams have started to address this area, and my limited experience of them has been pretty nice, but not quite as good).  This isn't just a nice syntax for iterating or filtering (although, again, how has there not been better native support for find/findAll in Java until now?), but also the support for more functional programming techniques such as Map-Reduce.
  • Another simple feature that adds readability and conciseness - being able to simply assert if ( list ) or if ( map ) to determine whether a collection has any values is great.
  • Automatic getters and setters - this is a nice one in theory. I agree that it makes no sense to always have to create getters and setters for everything (even if it is normally the IDE generating them), and I like the dot notation for accessing them.  At first glance it feels like it is breaking the encapsulation of the object (and really, maybe they should have implemented this feature as an annotation rather than messing with the standard Java access-modifier behaviour) - but really it's just doing what you would do by hand: if you leave your variables with the default modifier (and don't add your own getters and setters) then Groovy will add them, at compile time, and you are free to use the getter and setter methods throughout the code. The idea is that you then use the private modifier for variables that you want for just internal use, not exposed. However, there is a long-standing bug that means this doesn't work, and you actually get getters and setters for everything! Still, a nice idea.


It's not quite a farewell, as I will continue to use Groovy (partly because I still enjoy using the Spring framework so much) for the time being, and only time will tell whether Scala and some other frameworks take its place as my go-to hobbyist platform of choice!

Spring, Reactor & Event driven programming on the JVM

Following my previous outing playing with the Spring-Boot microservices stuff, I once again found myself looking through some of the Spring libraries and came across the Reactor integration stuff.  It looked interesting, and thought I would have a quick look at the asynchronous event-driven model.

As always with Spring Boot, it was super simple to get an app up and running - it's worth noting that the app doesn't really get that much into the async processing side (or really expose scenarios with the potential benefits/pitfalls of the approach) - but it gets an app up and running pretty easily.


Event streaming

To get started I needed some kind of event source (I could have just stubbed out some code to randomly create events in the system, but I wanted something more real), and with the modern web & big data, we have a tonne of events that we could use. I went for the obvious choice of the Twitter streaming API - as it exposes a streaming API, I could just connect to that and, on receipt of any tweet, push an event onto my EventBus for any interested parties to process.

The basics of connecting to the Twitter streaming API are pretty simple using Spring-Boot and Spring-Social - I created a simple Spring-Boot webapp that exposed a page to connect to Twitter (OAuth via Spring-Social) and then, on connection, connected to the streaming API and started listening.




First I connected to the sample "firehose" stream, which is supposed to be 1% of the total Twitter stream (I have seen it reported that there are 500 million tweets a day, so you'd be looking at about 5 million random tweets a day) - but I decided to consume the tweet events to pull out data about the ongoing Rugby World Cup (England 2015), so I switched to the search stream, limiting by references to the world cup.

The search stream provided a reasonable number of tweets, and whilst I was running during the England vs Australia game I processed ~500,000 rugby-related tweets.


Configuring reactor

Getting reactor up and running is really easy with Spring:
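(The original gist isn't preserved here, so this is a hedged sketch based on the Reactor 2.x API of the time.)

import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import reactor.Environment
import reactor.bus.EventBus

@Configuration
class ReactorConfig {

    @Bean
    Environment reactorEnvironment() {
        Environment.initializeIfEmpty()
    }

    @Bean
    EventBus eventBus(Environment environment) {
        EventBus.create(environment, Environment.THREAD_POOL)   // standard thread pool dispatcher
    }
}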

The EventBus configuration is interesting, as it has options to use different patterns including the LMAX pattern - in this case I just went with a standard thread pool approach.

The tweet-eater

To be honest, the use of Reactor was overkill for this experiment, as the Spring-Social API allows you to define listeners for the streaming APIs which have an on-tweet handler method, much like an event-consuming interface. But as it was just an experiment I continued, and used that listener to push the events on to the event bus.
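(The original gist isn't preserved, so here is a hedged reconstruction of the listener: a Spring-Social StreamListener that wraps each tweet in an Event and pushes it onto the bus - the "tweets" topic name is my own assumption.)

import org.springframework.social.twitter.api.StreamDeleteEvent
import org.springframework.social.twitter.api.StreamListener
import org.springframework.social.twitter.api.StreamWarningEvent
import org.springframework.social.twitter.api.Tweet
import reactor.bus.Event
import reactor.bus.EventBus

class TweetEater implements StreamListener {

    private final EventBus eventBus

    TweetEater(EventBus eventBus) {
        this.eventBus = eventBus
    }

    void onTweet(Tweet tweet) {
        eventBus.notify('tweets', Event.wrap(tweet))   // push the tweet event onto the bus
    }

    void onDelete(StreamDeleteEvent deleteEvent) { }
    void onLimit(int numberOfLimitedTweets) { }
    void onWarning(StreamWarningEvent warningEvent) { }
}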

As you can see above, it's pretty simple - the listener just creates the event object and pushes it onto the EventBus.  So, with that done, and the basic Spring-Social connection set up to listen to the stream, we have a nice flow of events being pushed (at quite a high rate!) onto the EventBus. Now we just need something to consume those events.


Event consumers

The first consumer I created was just a very basic logging consumer - all it did was count events and periodically log the numbers:
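(Again a hedged sketch, registered against the assumed "tweets" topic; the original gist isn't preserved.)

import java.util.concurrent.atomic.AtomicLong
import reactor.bus.Event
import reactor.bus.EventBus
import static reactor.bus.selector.Selectors.$

class LoggingConsumer {

    private final AtomicLong count = new AtomicLong()

    void register(EventBus eventBus) {
        eventBus.on($('tweets')) { Event<?> event ->
            long total = count.incrementAndGet()
            if (total % 1000 == 0) {
                println "Consumed $total tweets"   // stand-in for proper logging
            }
        }
    }
}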

Pretty simple


Next up, I created a basic consumer to inspect the tweets, identify rugby teams mentioned and persist the data to Redis - this was also pretty easy, as the Redis integration works pretty simply, and Twitter had created a set of standard team hashtags for the competition - so I just mapped those to countries and set up my consumer to check for those:
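(A hedged sketch - the hashtag-to-country map below is illustrative, not the full competition set.)

import org.springframework.data.redis.core.StringRedisTemplate
import org.springframework.social.twitter.api.Tweet
import reactor.bus.Event
import reactor.bus.EventBus
import static reactor.bus.selector.Selectors.$

class TeamCountConsumer {

    static final Map<String, String> TEAMS = ['#eng': 'England', '#aus': 'Australia',
                                              '#wal': 'Wales', '#fij': 'Fiji']

    TeamCountConsumer(EventBus eventBus, StringRedisTemplate redis) {
        eventBus.on($('tweets')) { Event<Tweet> event ->
            String text = event.data.text.toLowerCase()
            TEAMS.each { tag, country ->
                if (text.contains(tag)) {
                    redis.opsForValue().increment(country, 1)   // bump the per-country counter
                }
            }
        }
    }
}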

I quickly stuck some lipstick on the interface to display the number of tweets per country and we were off!



Like I said, it's only barely scratching the surface of async event based programming, but it shows how easy it is to get the EventBus up and running with Spring.

As always, the code for the full app is over on GitHub - if you just clone the project and add in your own Twitter API keys in the config then you should just be able to build the JAR and run it directly.

Spring-Boot & Netflix OSS - An adventure into microservices

Honestly, I still need convincing on microservices.

I can see that they are a compelling argument compared to a monolithic application, but I think I need to get my head around some of the challenges they face - the first one that comes to mind being how to effectively define the microservice boundaries, as it seems to me a lot of the applications I have worked with are monolithic precisely because those boundaries are so blurred.


Anyway, I wanted to do some tech stuff, so decided to start building out an application using the microservice architectural pattern and Spring Boot seemed like a good place to get started.

This is very much a work in progress, and I am continuing to progress through different aspects of the application; at the moment there is very little actual code written (in part due to the simplicity that Spring-Boot provides).  All code is being kept up to date in GitHub, so feel free to have a look.


There are lots of great blogs covering this stuff already, so I won't re-cover their work; the following article gives a great write-up of the Netflix OSS and the Spring integration, and is worth reading:

http://callistaenterprise.se/blogg/teknik/2015/04/10/building-microservices-with-spring-cloud-and-netflix-oss-part-1/

(Image from: Building Microservices with Spring-Cloud and Netflix OSS)


Getting started: A service registry - Eureka

One of the first things that is needed is a central service registry to allow service discovery - this is not a new concept to microservices and is an approach used by SOA.  Straight out of the box, Spring-Boot provides integration with Netflix's OSS application Eureka, which provides this.  I opted to have a dedicated application for my registry (code can be seen here) and it really is as simple as adding the relevant dependencies to the build.gradle file, adding an @EnableEurekaServer annotation to our application config, then a simple config file defining the server port/name etc, and it's done!  You can just run gradle assemble in that project to build the JAR file, then run java -jar [the new JAR file] and the application will spin up - you should then be able to go to http://localhost:1111 and you will see the Eureka dashboard (with no microservices registered of course).
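For illustration, the whole registry application amounts to little more than this (a hedged sketch; the port matching the localhost:1111 mentioned above would be set in the config file):

import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer

@SpringBootApplication
@EnableEurekaServer
class ServiceRegistryApplication {

    static void main(String[] args) {
        SpringApplication.run(ServiceRegistryApplication, args)
    }
}

// application.yml (assumed values):
//   server:
//     port: 1111
//   eureka:
//     client:
//       registerWithEureka: false
//       fetchRegistry: false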



My first microservice

So, I had Eureka up and running, but it was looking pretty lonely with no services registered.

A microservice in Spring is also very simple - as really, all it is is a simple web application that runs in its own process with a limited domain - so spinning up a Spring Boot MVC RESTful webapp with a single controller/endpoint is enough to get me a microservice (even just a tweet would do it..)
So we can create our new microservice to do anything we like; in my case I created a QuoteService (the application is slowly evolving into an insurance engine).  Just having the standalone app isn't helping much, so we need to add some configuration to tell the service to register with our Eureka server - this will make our new microservice discoverable by other services wanting to use it.

Again this is quite simple: we need to tell our application it should try and register with Eureka, and we should add the config to do so:
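(A hedged sketch of what that looks like - annotation plus properties; the service-name/port values are assumptions.)

import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.cloud.netflix.eureka.EnableEurekaClient

@SpringBootApplication
@EnableEurekaClient
class QuoteServiceApplication {

    static void main(String[] args) {
        SpringApplication.run(QuoteServiceApplication, args)
    }
}

// application.yml (assumed values):
//   spring:
//     application:
//       name: quote-service
//   eureka:
//     client:
//       serviceUrl:
//         defaultZone: http://localhost:1111/eureka/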

You can see that we simply annotate our application config, and then add some properties that define where the Eureka server is hosted - and that's basically it.

Now if we build the project JAR and start it up again (while our Eureka service registry is still running), then after 30 seconds or so you should see the Quote-Service registered and ready to use.


On to the next one.. 

Now, we have a microservice, and we have a registry that makes it discoverable, but still - just one microservice is pretty lonely. So next I created another dummy RESTful microservice, this time called ProductService, which just followed the same pattern as the first.

Once that was started up, the Eureka dashboard started looking a bit happier with the two services registered - the obvious next challenge is seamless interaction between the two: splitting the services into their own processes is all well and good, but meaningless if you can't easily integrate them.  The way I look at it is that when reading the application code of a service (or of an application using microservices) it should just look like a normal application with service classes - there shouldn't be any fanfare around the fact that my service class actually gets the data from a dedicated microservice over HTTP/AMQP rather than getting it directly from the DB in the traditional way.


So, still just stubbing out the endpoints, I updated my QuoteService endpoint to make a call to my ProductService, and then just jammed that response into the JSON response I was returning anyway:
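(A hedged sketch of the endpoint - names and response shape are assumptions based on the description.)

import org.springframework.beans.factory.annotation.Autowired
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController

@RestController
class QuoteController {

    @Autowired
    ProductService productService

    @RequestMapping('/quote')
    Map getQuote() {
        [quote: 'a stubbed quote', product: productService.product]   // jam the product response into the JSON
    }
}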

As you can see, it could be a standard controller in a normal monolithic application at this point; we are just calling a method on our autowired ProductService class and returning that.


So the really interesting part is in the ProductService class - at the moment this isn't a really elegant, abstracted class yet, so there is still some boilerplate, but that has the advantage of making it clear what is going on:
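(A hedged sketch of the class - the endpoint path is an assumption, and the post injects the service name via config rather than hardcoding it as done here.)

import org.springframework.beans.factory.annotation.Autowired
import org.springframework.cloud.client.loadbalancer.LoadBalanced
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.stereotype.Service
import org.springframework.web.client.RestTemplate

@Configuration
class RestConfig {

    @Bean
    @LoadBalanced
    RestTemplate restTemplate() {
        new RestTemplate()
    }
}

@Service
class ProductService {

    @Autowired
    RestTemplate restTemplate

    Map getProduct() {
        // "PRODUCT-SERVICE" is the Eureka service name, not a real hostname
        restTemplate.getForObject('http://PRODUCT-SERVICE/product', Map)
    }
}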

As you can see, it's just making a REST call to the Product microservice and returning the response cast as a Map - but the really nice part is that the service URL is just the service name (in this case "PRODUCT-SERVICE", which is injected into the class), and with the RestTemplate annotated with Spring-Cloud's @LoadBalanced, that microservice will be looked up in Eureka (and load-balanced if there is more than one PRODUCT-SERVICE running).

So our setup is starting to take shape now - we have two microservices, both registered with Eureka and able to interact with each other in a fairly clean, loosely coupled way.


Don't push me, 'cos I'm close to the edge..

As your microservices start to proliferate, you will get different levels of service granularity, and undoubtedly you won't just want to expose all your microservices as a public API.  One option would be to create a RESTful application, define the nicely named endpoints you want to expose, and then use the standard integration described above to integrate it.

Fortunately, there is an easier way - Netflix provides a library called Zuul that can be simply configured to map URL patterns to given defined service names (again looked up in the Eureka service registry).  Much like Eureka, this is super easy to set up and just needs an annotation and some config again:
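(A hedged sketch of the edge server application; the class name is an assumption.)

import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.cloud.netflix.zuul.EnableZuulProxy

@SpringBootApplication
@EnableZuulProxy
class EdgeServerApplication {

    static void main(String[] args) {
        SpringApplication.run(EdgeServerApplication, args)
    }
}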

And the config is pretty easy to understand:
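(Something along these lines - route names and paths are assumptions, in standard Spring Cloud Netflix property form.)

zuul.routes.quote.path=/quote/**
zuul.routes.quote.serviceId=quote-service
zuul.routes.product.path=/product/**
zuul.routes.product.serviceId=product-service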

As you can see, we just define service names against URL patterns.

Now once all our apps are up and running, and the microservices are registered on Eureka, you have a single API interface to start interacting with the services (rather than having to access each service on its designated port etc).


Conclusion

So that's as far as I have got - I wired up the QuoteService to MongoDB so the data all gets persisted there (and have added a get-quote endpoint which gets the same data from Mongo), and am starting to wire up the product service with JPA.  So far it's been enjoyable, and things are making more sense than when I started - but there are a few questions still:

  • It seems like there is still duplication of service names throughout the different projects - for example the ProductService name ("product-service" - case-insensitive) is proliferated throughout: the service itself defines it, the QuoteService needs to know the name of the service, the Zuul edge server needs to know the name, etc.  I guess this is unavoidable as these are intrinsic dependencies, but it still seems a bit flaky.
  • It feels like the service classes could be factored out - our ProductService class that allows HTTP REST interactions with the Product microservice would likely need to be re-used across all applications/microservices that need to use the Product microservice.

Spring MVC - Page caching and If-Modified-Since

Spring (and Java/Groovy in general) supports a range of caching layers for your web application - with options for the Hibernate first/second-level cache, Spring's @Cacheable, and simple integration with lots of providers (EHCache, Redis, etc) - but aside from server-side caching, you can also implement page-level caching using the standard If-Modified-Since cache headers.

If your web application is sitting behind an Apache web server, then using this mechanism you can set things up so that Apache does a lot of the work, and a lot of requests never even trouble your web application.  This will of course only work if you have fairly static pages that don't update too frequently and aren't user-specific (i.e. it is actually possible to cache at a web-page level - user account pages are obviously harder to cache at this level as they are very user-specific).  Even without Apache in the mix, most modern browsers are built to handle If-Modified-Since headers and 304 response codes, so using this approach can mean that browsers are less eager to even request the page from your server whilst a user is browsing the site, if we have said it's cacheable.


Thankfully, the machinery for this stuff is all baked into Spring MVC.


Updating the RequestMappingHandlerAdapter

This walkthrough assumes that your app is using standard @Controller annotations and a standard RequestMappingHandlerAdapter to route requests - although there are still relatively easy mechanisms to do this if you are using other Controller/HandlerAdapter patterns.

The RequestMappingHandlerAdapter class has a method that can be overridden called getLastModifiedInternal() - this method simply returns a long value (the epoch time) at which the requested resource was last modified. All we need to do is extend the RequestMappingHandlerAdapter and implement this method to return a timestamp.  For example:
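(A minimal sketch of the override, assuming - as described below - that the app start time is what we return.)

import javax.servlet.http.HttpServletRequest
import org.springframework.web.method.HandlerMethod
import org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter

class CachingHandlerAdapter extends RequestMappingHandlerAdapter {

    private final long startupTime = System.currentTimeMillis()   // initialised at startup

    @Override
    protected long getLastModifiedInternal(HttpServletRequest request, HandlerMethod handlerMethod) {
        startupTime   // nothing is considered modified since the app started
    }
}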



The above assumes we have initialised a timestamp at startup (easy to do using Java config) and assumes that if you have visited the site since the last app startup, then there has been no change (in reality, we will likely need something more complicated to calculate this).

And that's really all we need, as Spring will handle the rest for us. From this, if a brand new request comes in with no If-Modified-Since header, then Spring will take this date/timestamp and return it with our response as the Last-Modified date (a standard HTTP header - this will inform the browser's/web server's action next time).  If, however, a user has already visited your site and the browser has received a valid Last-Modified timestamp, then on the next request it will include this value in the If-Modified-Since request header; when Spring receives this request, it compares the timestamp against the value returned from our getLastModifiedInternal() method, and if there has been no change then Spring will automatically return a 304 response to the client - so none of the Controller code will be executed.

As you can probably guess, this can provide huge efficiencies and improvements in latency, server overhead etc.



More complex Last Modified Dates

As mentioned, in all likelihood you will need a more sophisticated mechanism than just checking the application start time - and we have plenty of options here: we could query the DB for changes, or we could have other flags/properties set when particular resources change (bear in mind that if static resources like CSS are changed, these need to be considered too).

If you need to consider a fairly unique Last-Modified date for every endpoint, then this solution isn't a good fit, as it is the more generic approach - you can alternatively implement a similar last-modified method on each of your controllers.


Another option that I have considered is a halfway compromise - where I need endpoints to have specific considerations, but I only have a set of 4-5 (or a relatively manageable number of) specific queries/checks that need to happen, and every endpoint just needs to check some subset of these conditions. This solution involves custom annotations: marking up controller methods with an annotation signals to our HandlerAdapter which checks should be considered for the Last-Modified timestamp.  This has the advantage of relatively little intrusion in my controller endpoints, but is granular enough to provide effective page caching.


A custom annotation

First we define a simple annotation that can be used to mark up Controller methods to indicate which conditions are important to the endpoint:
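(A hedged reconstruction of the three pieces described below - all names are assumptions.)

import java.lang.annotation.ElementType
import java.lang.annotation.Retention
import java.lang.annotation.RetentionPolicy
import java.lang.annotation.Target

enum CacheCheck {
    NEWS_ITEMS, USER_PROFILES, STATIC_RESOURCES
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface LastModifiedChecks {
    CacheCheck[] value()
}

// Example usage on a controller endpoint:
//   @LastModifiedChecks([CacheCheck.NEWS_ITEMS, CacheCheck.STATIC_RESOURCES])
//   @RequestMapping('/news')
//   String news(Model model) { ... }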



The above code shows the annotation, a sample enum (just to provide some type safety whilst using the annotation - this could be a string or anything else) and an example of the annotation on a controller endpoint.


Updating our Last Modified method

Now, in our last-modified method, we have access to the handler (e.g. the controller method being invoked), so we can quite simply check the controller method for our annotation, and if found we can just check the values set and perform the appropriate checks to work out what the last-modified date needs to consider:
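(A hedged sketch of that check, sitting in the handler adapter from earlier; lastModifiedFor() is a hypothetical placeholder for your own per-condition queries.)

import javax.servlet.http.HttpServletRequest
import org.springframework.web.method.HandlerMethod
import org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter

class AnnotationDrivenCachingHandlerAdapter extends RequestMappingHandlerAdapter {

    @Override
    protected long getLastModifiedInternal(HttpServletRequest request, HandlerMethod handlerMethod) {
        LastModifiedChecks checks = handlerMethod.getMethodAnnotation(LastModifiedChecks)
        if (!checks) {
            return -1   // no annotation - never short-circuit with a 304
        }
        checks.value().collect { check -> lastModifiedFor(check) }.max()   // most recent relevant change wins
    }

    private long lastModifiedFor(CacheCheck check) {
        // hypothetical placeholder - query the DB / check resource flags per condition
        System.currentTimeMillis()
    }
}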




As mentioned, this won't work if there are lots of pages with lots of different requirements, but if you have a manageable number of changing entities in your application this can strike a nice balance between clean code and flexibility.



Thoughts on Spring Boot

Having long been a fan of the Spring framework (Spring MVC, Spring Social etc), I have recently started using Spring Boot for two different projects I am currently working on.
In both cases they are web applications, with some key differences in technology (in terms of what I am using for client-side libraries and data persistence).

Spring Boot is an opinionated implementation of Spring - and it works really nicely (most of the time).  You simply add the Spring Boot dependencies to your build file (Gradle and Maven are both well supported) and you can have an application up and running with a very small amount of code (you may have seen the Spring Boot application that fits in a tweet a while ago).
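(From memory it looked something like this - runnable with the Spring Boot CLI, which supplies the imports and dependencies: spring run app.groovy.)

@RestController
class ThisWillActuallyRun {

    @RequestMapping('/')
    String home() {
        'Hello World!'
    }
}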



Now of course, in reality you do need other configuration stuff, but the take-away point is that if you are happy with Spring's opinions, you can get an application up and running in pretty quick time.  For me, this is really a turning point for Java/Spring productivity - especially as Groovy has become more mature and sits so easily in Spring Boot, I don't think there is much to the argument that Java/Spring is too slow for early-stage companies/prototypes/rapid development processes.  (Aside: I know Grails has been around for a while, but I think Spring and Java have stepped up here; plus I honestly have my suspicions that Spring Boot and Grails may clash at some point - they have certainly been on a collision course since Spring Boot was announced at SpringOne.)


The way it works is quite simple really, you add a relevant dependency to your buildfile, for example:

compile("org.springframework.boot:spring-boot-starter-web")  
compile("org.springframework.boot:spring-boot-starter-thymeleaf")


The first is just your standard web application Spring Boot dependency, and the second is the opinionated version of the Thymeleaf configuration.  Then, at startup, Spring Boot has lots of AutoConfiguration classes that check for the existence of key classes on the classpath, and if they are present, it executes the auto-configuration.

See here for the Thymeleaf autoconfiguration - you will see that there is heavy use of the @ConditionalOnClass annotation, which checks for relevant Thymeleaf classes, and if they are on the classpath it configures them.



Undoubtedly, there are a few times where you find yourself scratching your head at the magic, and I have on more than one occasion had to go digging through the Spring source code.  And sometimes you want most of the auto-configuration but just want to tweak one or two properties, and you are left having to turn off the autoconfig and do it yourself.  But for me the biggest take-away is the speed at which it is now possible to get up and running with Spring/Java/Groovy and have a decent web platform for building your product or company.

Tech: Functional programming in Groovy

I have recently started the Coursera Functional Programming with Scala course (taught by Martin Odersky, the creator of Scala) - which is actually serving as an introduction to both FP and Scala at the same time, having done neither before. The course itself is great; however, trying to watch the videos and take in both the new Scala syntax and the FP concepts at the same time can take a bit of effort.

I wanted to work through some of the core FP concepts in a more familiar context, so am going to apply some of the lessons/principles/exercises in Groovy.


Functions:

If you have done any Groovy programming then you will have come across Groovy functions/closures. As Groovy is dynamically typed (compared to Scala's static typing), you can play it fairly fast and loose.

For example, if we take the square root function that is demonstrated in the Scala course, it is defined with explicitly typed parameters and return value - along the lines of def sqrt(x: Double): Double.



As you can see, Scala expects the values to be typed (as an aside, you don't always need to provide an explicit return type in Scala). But the Groovy function can drop the types entirely:
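(The original snippet isn't preserved here - a hedged sketch of the untyped Groovy signature being described:)

def sqrt = { x ->
    // implementation below, in the full Newton's method example
}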



A Groovy function can be defined and assigned to any variable, thereby allowing it to be passed around as a first-class object. If we look at the complete solution for calculating the square root of a number (using Newton's method: to compute the square root of "x" we start with an estimate of the square root, "y", and continue to improve the guess by taking the mean of y and x/y)...


The Scala version from the course solves this with a few small nested functions; trying the same in Groovy, we can achieve pretty much the same thing easily:
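(A hedged Groovy reconstruction - the original snippet isn't preserved, and the tolerance check is an assumption.)

def isGoodEnough = { double guess, double x -> Math.abs(guess * guess - x) / x < 0.0001 }
def improve = { double guess, double x -> (guess + x / guess) / 2 }   // mean of y and x/y

def sqrtIter
sqrtIter = { double guess, double x -> isGoodEnough(guess, x) ? guess : sqrtIter(improve(guess, x), x) }

def sqrt = { double x -> sqrtIter(1.0d, x) }

assert Math.abs(sqrt(4) - 2) < 0.001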


Recursion (and tail-recursion):

As FP avoids mutable state, the most common approach to solving problems is to break the problem down into simple functions and call them recursively - this avoids having to maintain state whilst iterating through loops, and each function call is given its input and produces an output.

If we again consider an example from the Scala course: a simple function that calculates the factorial of a given number.
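(A hedged Groovy rendering of the function described - the original snippet, shown in Scala in the course, isn't preserved here.)

def factorial
factorial = { BigInteger n ->
    n == 0G ? 1G : n * factorial(n - 1G)
}

assert factorial(5G) == 120G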

This simple function recursively calculates the factorial, continuing to call itself until all numbers down to zero have been considered. As you can see, there is no mutable state - every call to the factorial function simply takes the input and returns an output (the value of n is never changed or re-assigned; n is simply used to calculate output values).

There is a problem here: as soon as you attempt to calculate the factorial of a sufficiently large number you will encounter a StackOverflow exception - this is because in the JVM, every time a function is called, a frame is added to the stack, so working recursively it's pretty easy to hit the limit of the stack and encounter this problem.  The common way to solve this is by using tail-call recursion. The trick is simply to have the last code that is evaluated in the function be the recursive call - normally in FP languages the compiler/interpreter will recognise this pattern and, under the hood, will really just run the code as a loop (e.g. if we know the very last piece of code in the block is calling itself, it's really not that different to just having the block of code/function inside a loop construct).

In the previous factorial example, it might look like the last code to be executed is the recursive call factorial(n-1) - however, the value of that call is actually returned to the function and THEN multiplied by n - so the last piece of code to be evaluated in the function call is actually n * (the return value of factorial(n-1)).
Let's have a look at re-writing the function so it is tail-recursive.
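(A hedged sketch of the re-written version - the original snippet isn't preserved - note the recursive call is now the last expression evaluated:)

def factorial
factorial = { BigInteger n, BigInteger acc = 1G ->
    n == 0G ? acc : factorial(n - 1G, n * acc)
}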


Now, using an accumulator, the last code to be evaluated in the function is our recursive function call. In most FP languages, including Scala, this is enough - however, the JVM doesn't automatically support tail-call recursion, so you actually need to use a rather clunkier approach in Groovy:
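(A hedged sketch, following the trampoline idiom from the Groovy docs:)

def factorial
factorial = { BigInteger n, BigInteger acc = 1G ->
    n == 0G ? acc : factorial.trampoline(n - 1G, n * acc)   // defer the recursive call rather than stacking it
}.trampoline()

assert factorial(10000G) > 0G   // no StackOverflowError, even for large n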

The use of the trampoline() method means that the function will now be called using tail-call recursion, so there should never be a StackOverflow exception. It's not as nice as in Scala or other languages, but the support is there, so we can continue.


Currying:

This is like function composition - the idea being that you take a generic function and then curry it with some value to make a more specific application of the function. For example, consider a function that, given values x and y, returns z, which is x percent of y (e.g. given x=10, y=100, it returns 10 percent of 100: z=10).
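(A hedged reconstruction of the function described - the original snippet isn't preserved:)

def percentage = { x, y -> x / 100 * y }

assert percentage(10, 100) == 10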

The above simple function is a generic mechanism to get a percentage of a value, but if a common application of this function were to always calculate 10% of a given value, then rather than write a slightly modified version of the function we can simply curry it as follows:
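(Carrying on from the percentage function above, a hedged sketch of the curried version:)

def tenPercent = percentage.curry(10)   // fixes the left-most argument, x = 10

assert tenPercent(100) == 10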

Now, when tenPercent(x) is called, it uses the original percentage() function with the value 10 curried as the first argument. (If you need to curry other argument positions you can also use the rcurry() function to curry the right-most argument, or ncurry(), which also takes an argument position - check the Groovy docs on currying for more info.)


Immutability:

Immutability is partially supported in Java with use of the final keyword (meaning variables can't be changed after being initially set on object instantiation). Groovy also provides a quick and easy @Immutable annotation that can be added to a class to easily make it immutable.  But really, there is more to avoiding mutable state than just making classes immutable - as we have functions as first-class objects, we can easily assign variables and mutate them within a function - so this is more of a mindset or philosophy that you have to get used to. For example:
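(A hedged reconstruction of the two approaches described below - the original snippet isn't preserved - the first mutates the list passed in, the second leaves it untouched.)

def mutateDouble = { List list ->
    for (int i = 0; i < list.size(); i++) {
        list[i] = list[i] * 2      // re-assigns elements in place - mutable state
    }
    list
}

def copyDouble = { List list ->
    list.collect { it * 2 }        // builds a new list - the input is unchanged
}

def nums = [1, 2, 3]
assert copyDouble(nums) == [2, 4, 6]
assert nums == [1, 2, 3]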

The first example is probably more like the Groovy/Java code we are used to writing, but it mutates the state of the list - whereas the second approach leaves the original list unchanged.


Map Reduce:

As a final note, there are some functions in FP that are pretty common techniques - the most famous of which these days (in part thanks to Google) is Map-Reduce, but the trio of functions is actually Map, Reduce (also known as Fold) & Filter - you can read more about these functions here (or just Google them!), but they actually correlate pretty nicely to core Groovy functions that you probably use a lot (assuming you are a Groovy programmer).

Map

Map is the easiest of the three to understand. It takes two inputs - a function and a list - and applies the function to every element in the list (in some languages you can achieve the same thing with a list comprehension).
Sound familiar? This is basically the .collect{} function in Groovy.

Reduce/Fold

This one is a bit more complicated to describe, but it is the same as the .inject{} function in Groovy.

Filter

And another simple one - filtering a list for desired elements: this is Groovy's .findAll{} function.
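To round these off, a quick illustration of the three Groovy equivalents (my own example):

def nums = [1, 2, 3, 4, 5]
assert nums.collect { it * 2 } == [2, 4, 6, 8, 10]    // map
assert nums.inject(0) { acc, n -> acc + n } == 15     // reduce/fold
assert nums.findAll { it % 2 == 0 } == [2, 4]         // filter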



As I said at the start, I am new to FP and coming from an OO background, but hopefully the above isn't too far from the truth!  As I get further through the Coursera course I will try to post again, maybe with some of the assignments attempted in Groovy to see how it really stands up.



Algorithm analysis & complexity: MergeSort - part 1

I have just signed up for the Coursera Algorithm Design & Analysis course from Stanford. I have been meaning to take it for a while but kept missing when it was running. As it happens, this time I have all but missed it too, with the final assessments already due, but I have signed up and am going through the material and lectures on my own.

I am only on week one so far, but it is all good - the lecturer seems good and articulates the content well (so far the presentation and sophistication of the course is better than the Scala & FP course and the Logic courses I have previously done on Coursera).

After I completed the Functional Programming & Scala course, I wrote up some notes on some of the techniques and problems in Groovy and thought it might be fun to write up some groovy notes from the algos course.

Week one covers an introduction to algorithm analysis, divide and conquer (with merge sort), and some other stuff.


Mergesort - Analysis

Mergesort has a few key steps:
  1. Split the list into two halves
  2. Recursively sort the first half
  3. Recursively sort the second half
  4. Merge the two sorted halves
This is a well-known example of the divide & conquer paradigm - the recursive sorting continues to recurse and divide the list into two halves until sorting is trivial (i.e. the base case of one item in each list).


Let's look at the merge step:

(pseudocode taken from the course)
result = output [length = n]
leftList = 1st sorted array [n/2]
rightList = 2nd sorted array [n/2]
i = 1
j = 1
for k = 1 to n
  if leftList(i) < rightList(j)
    result(k) = leftList(i)
    i++
  else [rightList(j) < leftList(i)]
    result(k) = rightList(j)
    j++
end



The above should be fairly straightforward to understand:
  • result stores our final merged list, and will be length n (the number of elements in the starting array)
  • leftList/rightList - these are the two inputs to the merge, and are simply the two halves of the list that need to be merged. This assumes that both lists are already sorted prior to merging
  • i/j - these are pointers to what we consider the "head" of each list. Rather than removing items from the lists, we simply keep a pointer for each list to track how many items of each have been merged
  • We then iterate through all "n" items, comparing the "heads" of our two lists, with the smaller value getting added to our result list (see the sketch below)
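For reference, a hedged Groovy sketch of this merge step (zero-indexed, and handling the case where one list runs out - an edge case noted later in the post):

List merge(List left, List right) {
    List result = []
    int i = 0, j = 0
    for (int k = 0; k < left.size() + right.size(); k++) {
        if (j >= right.size() || (i < left.size() && left[i] < right[j])) {
            result << left[i++]    // take the head of the left list
        } else {
            result << right[j++]   // take the head of the right list
        }
    }
    result
}

assert merge([1, 3, 5], [2, 4, 6]) == [1, 2, 3, 4, 5, 6]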


Complexity analysis of merge step:

In the course, the lecturer states that the worst-case running time = 4n + 2, which he then further simplifies to 6n (this simplification is simply because n must always be at least 1, so 4n + 2 will always be no worse than 6n).

Let's look at how we get to that figure (we will later look at what the asymptotic run time is). This figure is simply the number of operations that need to be executed when this algorithm is run, so let's count them:

result = output [length = n]
leftList = 1st sorted array [n/2]
rightList = 2nd sorted array [n/2]
i = 1
j = 1

In the above, the only operations actually being executed are the assignments of i & j (the others are just describing the function input/output). So that costs us 2 operations (constant, regardless of how big "n" is).

for k = 1 to n
  if leftList(i) < rightList(j)
    result(k) = leftList(i)
    i++
  else [rightList(j) < leftList(i)]
    result(k) = rightList(j)
    j++
end



In the above, we have one condition check/increment of "k" in the for statement (executed every iteration, so this is done "n" times); we then have an IF conditional check, which is also done every iteration (another "n" operations); then, whichever branch of that condition is executed, a further two operations always run (the assignment of the head to the result and the increment of the head pointer) - another two operations executed "n" times.

So the above block executes 4 operations for each of the "n" iterations. The first block executes 2 operations =

4n + 2

Seems straightforward enough.  Really though, this is just a fairly simplified piece of pseudocode, and there is a bunch of additional code needed to handle other stuff, edge cases etc (e.g. what happens if you have added all of the left list to the result and only have the right list remaining? Obviously it can just be appended to the results, but the above code doesn't handle this). Then there are further questions about different language implementations that might need more operations (not to mention differences when it gets compiled down to machine code) - we will get more into this later when we talk about asymptotic performance. For the time being we will just say the run time (worst case) for merging two lists is 4n + 2.


Now, let's look at recursively sorting..

This part of the algorithm is also relatively simple: we just keep recursing through the list, splitting it into two halves until each half is just one element (one element is already sorted!), then we can merge those lists easily.

So, if we are recursing through and splitting the list into two halves, how many times will we have to perform the merge?

Let's step through this with an example input list of 8 elements (n=8):
  1. One list of 8 elements
  2. Two lists of 4 elements
  3. Four lists of 2 elements
  4. Eight lists of 1 element
This is just a basic binary tree - see the illustration below, taken from Wikipedia:


As you can see in the above tree, it splits down into individual elements, then re-combines them using the merge technique. To work out the complexity of the algorithm we need to know two things:
  1. How deep is our tree going to be?
  2. What is the runtime of a given depth in the tree (e.g. for a given depth "j" in the tree, what is the cost of re-merging all the lists at that level)?

The first question is relatively easy: as it is a binary tree, we know the depth is simply going to be log2(n)+1. How do we know that? The log of a number is simply how many times it needs to be divided by 2 (assuming 2 is the logarithm base) before the number is <= 1 - which is exactly what we are doing when we recursively divide our lists in half until we reach the single-element lists.

E.g. the log2(8) = 3 (8/2=4; 4/2=2; 2/2=1)



So now let's calculate the running time for any given level of the tree.  Let's consider a list of length "n"; we want to know a few things:

How many merge operations do we need to perform at depth "j" of a list "n"?
In a similar way to how we calculated the depth, we can work out the number of merges (i.e. the number of leaves at that level of the tree).  If we have halved the lists at each step 0..j, then the number of leaves will be 2^j.

E.g. at the first level down, after just halving the original list, we have j=1, so expect 2^1 = 2 leaves.
At the second level down, we again halve each of those two lists; we have j=2, so expect 2^2 = 4 leaves.

Note that j cannot exceed the depth of the tree, which we already know.


What is the complexity of each of these merge operations?
We know the complexity of a merge from our earlier analysis of the merge step: 6m (where m is the size of the halved lists - not the original value "n" of the starting list size), and we know we have 2^j of these merges, which means that at any level of the tree, the run time is:

(2^j) * 6m

Now the value of "m" will itself depend on the value of "j" - because as we get further down the tree, we have a greater number of merges to perform, but the lists are also getting smaller (so the actual value of "m" is decreasing). So what is the value of m when we are at level "j" of the tree?  We can work that out in a similar fashion - we know the original "n" (the size of the starting list) has been halved "j" times - so it is simply:

n/(2^j)

So now we can combine those and resolve the algebra. We can see that the 2^j terms cancel each other out (intuitively really, as we are increasing the number of merges at exactly the same rate as we are reducing the size of the lists to be merged):

(2^j) * 6 * (n/(2^j)) = 6n




What is the overall complexity of the tree?
So we now know the complexity of any given level of the tree (independent of the depth) and we know the depth of the tree (the number of levels):

6n * (log2(n)+1)  =  6nlog2(n) + 6n

So there we have it - we have calculated the worst-case running time for merge sort.


There are a few further things to consider in later posts: in one post I will take a quick look at asymptotic analysis (e.g. "Big-Oh" notation) and what it means; in another I will look at a Groovy implementation of merge sort and the analysis of that.








Groovy bug - Stackoverflow calling super methods

I recently stumbled upon a bug in the version of Groovy that I was working with (Groovy 2.1.5). It's a strange one that seems to have been solved in Groovy 2.2, but I thought I would post it here in case it is useful to anyone else.

The problem is demonstrated in the code below; the exact use case seems to be having an inheritance structure more than two classes deep, whereby the method return type changes.
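(The original snippet isn't preserved here, but reconstructing from the description, the shape of the code was along these lines - class names are illustrative:)

class GrandParent {
    Object doStuff() {
        'grandparent stuff'
    }
}

class Parent extends GrandParent {
    String doStuff() {                       // return type narrowed from Object to String
        'parent stuff: ' + super.doStuff()
    }
}

class Child extends Parent {
    String doStuff() {
        'child stuff: ' + super.doStuff()    // on Groovy 2.1.5 this super call recursed back into Child.doStuff()
    }
}

println new Child().doStuff()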



The result of running the above code in Groovy 2.1.5 is a StackOverflow - it seems as though Groovy just recursively calls the doStuff() method on the Child class until it errors, rather than calling the method on the parent class.  Running on Groovy 2.2.0+ appears to work correctly.

I posted the question on stackoverflow.com, and it was suggested that it was maybe related to this bug, which has a very reassuring resolution comment:

looks like the issue was fixed somewhen between 2.2.2 and 2.3.0

JVM Memory Management Primer: Groovy PermGen the Prequel

I am shortly going to be writing a post about managing PermGen memory with Groovy in production, but before getting to that, this is a primer/reminder on some key parts of JVM memory management.

Out of Memory: Heap Space

Throughout development on a JVM project, it is not entirely uncommon to see an OOM exception along the way. The most common will be a "Heap Space" OOME - the Heap is the part of memory where the JVM stores data/objects that are created by your application. For example, if you create a class called "User" and then create an instance of it, the instance of the class will be placed in the Heap.  This class of OOM is more common, as it can happen if you attempt to load too much data, or if your application is growing during development and you haven't configured the max heap size (e.g. if you have a low heap size defined and load a lot of data into your application, you may see this error).

The Heap is split into two** sections (also called generations) - the Young (sometimes called the nursery) and the Old (sometimes called the tenured generation). The JVM uses a generational garbage collection approach:

  • Young Generation - this is where most newly instantiated objects are placed. As many objects are relatively short-lived, many are born and then die in this space. This space is collected more frequently and quickly.  It is itself normally split into two areas - Eden & Survivor - which just represent the age of the object; as an object survives more GCs it is promoted up, and after a set threshold it is promoted into..
  • Old/Tenured Generation - this space stores objects that have outlived the young generation. These are assumed to be long-lived objects; GC of this generation is less frequent and takes a lot longer (as it has to inspect all objects).

There are several GC strategies that handle GC of the heap, and can be tuned for your application (depending on what is important to you) - you can read more on the Oracle docs about strategies.


** In some versions/JVMs, PermGen space is also part of the heap, which would make it three. Also, the Young generation is sometimes split further, but for a high-level review of OOM Heap Space we will consider two.


Out of Memory: PermGen

The other class of OOME you might see is a PermGen OOM.  By and large this is rarer than heap space errors, as the Permanent Generation (PermGen) is the part of memory that stores classes. In our User class example, the instances of the User class are on the heap, but the class itself will be stored in PermGen (in this example, there may be any number of instances of the User class - one per user! - but there should only be one version of the class). Just using standard Java, PermGen won't often be a problem, as there should be a static number of total classes for your application, and as long as you haven't set the PermGen size too low this shouldn't happen; but with the use of dynamic languages the risk of PermGen errors becomes more frequent, with languages like Groovy dynamically creating ad-hoc classes.

Whilst PermGen stores classes rather than object data, it still obeys the simple principle of GC: if a class no longer has any references to it, then it could be subject to GC; whilst there are still objects that hold a reference to a class, it cannot be GC'd.


A final important thing to note: as the norm in Java/the JVM is for a static number of classes to be used, the default GC strategy is to not collect PermGen - so if you are doing anything that involves changing PermGen and adding more classes to that memory, then you will inevitably see a memory leak.


In the next post I will go into detail about a few Groovy memory gotchas I have come across after 18 months of production Groovy code.




Traversing data structures - A Groovy Visitor Implementation

I have been using Groovy for a while now, having come from a solid Java/J2EE/Spring/ORM background where patterns and solid OO are a mainstay.  Although it took me a bit to get into the swing of the Groovy stuff, I have now really taken to it (in no small part encouraged by having taken the Coursera FP class) - the simplicity and ease of the built-in functional stuff like .each{}, .findAll{}, .collect{} is really neat, and once you get started with them there really is no stopping!

As Groovy is dynamically typed (not weakly typed, though!) you end up using a lot fewer POJO classes and a lot more Maps, Lists of Maps, etc (Groovy also makes it really easy to work with these guys), and I find myself fairly often needing to traverse complex, dynamically typed data structures to do some kind of data processing.  Normally, with my Java OO background, when frequently processing tree-type structures (a nested Map of Lists/Maps/simple elements can really be thought of as taking this form) I would fall back to the Visitor pattern (if you aren't familiar with the GoF Visitor pattern, see here, here, etc for details), but if you have a dynamically typed complex data structure, and you don't really want each potential node in your structure to have to implement a set interface with a visit() method on it, then I use the following approach.

(Disclaimer: yes, it feels as though it is a hacky, dodgy take on the pattern - but it works well with Groovy and has been working well for me. If you have ideas on how to improve it, I would love to hear your thoughts in the comments!  As such, I like to refer to it as "The Unwelcome Visitor Pattern".)

First, I create an iterator class - this basically has the code to iterate through a nested, complex structure. I see this as boilerplate code that is a pain to re-write and will be used by all code that wants to traverse a complex data structure (a Map with n-level-deep nested Lists/Maps):
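(The original gist isn't preserved here, so this is a hedged reconstruction of that boilerplate: a visitor interface plus a class that recursively walks nested Maps/Lists, handing each node to the visitor.)

interface Visitor {
    void visitMap(Map map)
    void visitList(List list)
}

class MapIterator {

    static void traverse(node, Visitor visitor) {
        if (node instanceof Map) {
            visitor.visitMap(node)
            node.values().each { traverse(it, visitor) }
        } else if (node instanceof List) {
            visitor.visitList(node)
            node.each { traverse(it, visitor) }
        }
        // simple values need no traversal - the visitor sees them via their parent Map/List
    }
}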



As you can see, it's just a basic recursive piece of code to traverse Lists/Maps - you will note that it expects a visitor to be passed in as an argument on the first call, with visitMap() and visitList() methods.  In normal Java, this would need to be a class that implements a particular interface/abstract class with implementations of the required methods. However, as Groovy is a little more dynamic, we can do some pretty nice on-the-fly stuff (yes, I know, if you are performing some really common processing you may still want the traditional Java interface/explicit class approach as well, but that's not why we are here!).  The code below is an example of doing some on-the-fly processing of a dynamically typed complex data structure (in this case, we are just converting all Date objects to Strings, but this is just an example for funsies):
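(Again a hedged sketch of the on-the-fly usage described: a Map of closures coerced to the Visitor interface with "as", converting Dates to Strings.)

def dateConverter = [
    visitMap : { Map map ->
        map.each { k, v -> if (v instanceof Date) map[k] = v.format('yyyy-MM-dd') }
    },
    visitList: { List list ->
        list.eachWithIndex { v, i -> if (v instanceof Date) list[i] = v.format('yyyy-MM-dd') }
    }
] as Visitor

def data = [name: 'report', created: new Date(), runs: [[ran: new Date(), ok: true]]]
MapIterator.traverse(data, dateConverter)
assert data.created instanceof String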



As you can see, in the above we are using Groovy's ability to treat a Map as an interface implementation, just defining a simple closure for each of the visitMap()/visitList() methods.


It may not be the most graceful solution, but it works simply and allows easy definition of closures that can process Maps/Lists (it could also be used in the same way for traversing JSON structures etc).