
Machine Learning with AWS & Scala

Recently, in an attempt to start learning React, I began building an akka-http backend API as a starting point. I quickly got distracted building the backend and ended up integrating with both the Twitter streaming API and AWS' Comprehend sentiment analysis API - which is what this post is about.

Similar to an old idea, where I built an app consuming tweets about the 2015 Rugby World Cup, this time my app consumed tweets about the FIFA World Cup in Russia - splitting tweets by country and recording sentiment for each one (and so a rolling average sentiment for each team).


Overview

The premise was simple:

  1. Connect to the Twitter streaming API (aka the firehose), filtering on World Cup related keywords
  2. Pass the body of each tweet to AWS Comprehend to get a sentiment score
  3. Update the in-memory store of stats (count and average sentiment) for each country

In terms of technology used:
  1. Scala & Akka-Http
  2. Twitter4s Scala client
  3. AWS Java SDK

As always, all the code is on GitHub. To run it locally, you will need a Twitter dev API key (add an application.conf as per the readme on the Twitter4s GitHub) and an AWS key/secret - the code will look for credentials stored locally, but you can also just set them in environment variables before starting. The free tier supports up to 50,000 Comprehend API requests in the first 12 months - and as you can imagine, plugging this directly into Twitter can result in lots of calls, so make sure you restrict it (or at least monitor it) before you leave it running!


Consuming Tweets

Consuming tweets is really simple with the Twitter4s client - we just define a partial function that will handle the incoming tweet. 
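Here is a minimal sketch of what that looks like with Twitter4s (the helper names - countriesMentioned, sentimentService, statsActor and the TweetScored message - are placeholders for the real code in the repo):

import com.danielasfregola.twitter4s.TwitterStreamingClient
import com.danielasfregola.twitter4s.entities.Tweet
import com.danielasfregola.twitter4s.entities.streaming.StreamingMessage

val streamingClient = TwitterStreamingClient()

// Partial function invoked for every message that arrives on the stream
def processTweet: PartialFunction[StreamingMessage, Unit] = {
  case tweet: Tweet =>
    val countries = countriesMentioned(tweet.text)      // placeholder helper
    val sentiment = sentimentService.score(tweet.text)  // placeholder helper
    statsActor ! TweetScored(countries, sentiment)      // placeholder actor message
}

// Start the stream, filtering on our World Cup keywords
streamingClient.filterStatuses(tracks = Seq("worldcup", "WorldCup2018"))(processTweet)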

The other functions for parsing countries/teams are excluded for brevity - and you can see it's quite simple: for each inbound tweet we make a call to the sentiment service (we will look at that later), then pass it with the additional data to our update service, which stores it in memory. You will also see it is ridiculously easy to start the Twitter streaming client filtering by keywords.


Detecting Sentiment

Because I wanted to be able to stub out the sentiment analysis without being tied to AWS, you will notice I am using the self-type annotation on my Twitter class above, which requires a SentimentModule to be passed in at construction - I am using a simple cake pattern to manage all my dependencies here. In the GitHub repo there is also a dummy implementation, which just picks a random number for the score, so you can still see the rest of the API working - but the interesting part is the AWS integration:
Once again, the SDK makes the integration really painless - in my code I am simplifying the actual results down to a much cruder Positive/Neutral/Negative rating (plus a numeric score in the range -100..100).
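A minimal sketch of the Comprehend call, using the AWS Java SDK from Scala (the mapping down to my cruder rating and -100..100 score is not shown here):

import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder
import com.amazonaws.services.comprehend.model.DetectSentimentRequest

// Builds a client using credentials from the default provider chain
val comprehend = AmazonComprehendClientBuilder.defaultClient()

// Returns the overall rating (POSITIVE/NEGATIVE/NEUTRAL/MIXED) plus the per-label scores
def detectSentiment(text: String) = {
  val request = new DetectSentimentRequest().withText(text).withLanguageCode("en")
  val result = comprehend.detectSentiment(request)
  (result.getSentiment, result.getSentimentScore)
}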

The AWSCredentials class is the bit that looks in the normal places for an AWS key.


Storing and updating our stats

So now we have our inbound tweets and a way to assess their sentiment score - I then set up a very simple akka actor to manage the state, and just stored the API data in memory (if you restart the app, the store gets reset and the API stops serving data).
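A sketch of the shape of that actor (the message and class names here are illustrative rather than the exact ones in the repo):

import akka.actor.Actor

case class TweetScored(countries: Seq[String], sentiment: Double)
case object GetStats
case class CountryStats(count: Int, averageSentiment: Double)

class StatsActor extends Actor {
  // All mutation happens inside the actor, so access is thread safe
  private var stats = Map.empty[String, CountryStats]

  override def receive: Receive = {
    case TweetScored(countries, sentiment) =>
      countries.foreach { country =>
        val current = stats.getOrElse(country, CountryStats(0, 0.0))
        val count = current.count + 1
        val average = (current.averageSentiment * current.count + sentiment) / count
        stats += country -> CountryStats(count, average)
      }
    case GetStats =>
      sender() ! stats
  }
}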

Again, very simple out-of-the-box stuff for akka, but it allows easy and thread-safe management of the in-memory data store. I also track a rolling list of the last twenty tweets processed, which is managed by a second, almost identical, actor.


The results

I ran the app during several games; below are some sample outputs from the API. The response from the stats API is fairly boring reading (just numbers), but the examples show a positive and a neutral tweet correctly identified (apologies for the expletives in the tweet about Poland - I guess that fan wasn't too happy about being beaten by the Senegalese!). You will also notice the app captures the countries being mentioned, which exposes one flaw of the design: in the negative tweet from the Polish fan losing two goals to Senegal, it correctly identifies the sentiment as negative, but we have no way to determine the subject - as both teams are mentioned, the app naively assigns it as a negative tweet for both teams, whereas on reading, it is clearly negative only with regards to Poland (I wasn't too concerned for my experiment, of course, but it is an observation worth noting).

Sample tweet from the latest-tweets API:

Sample response from the stats API:

When I finally did get around to starting to learn React, I just plugged in the APIs and paid no attention to styling, which is a roundabout way of apologising for the horrible appearance of the screenshot below (I'm really sorry about the CSS gradient)!





An opinionated guide to building APIs with Akka-Http

Akka-Http is my preferred framework for building APIs, but there are some things I have picked up along the way. For one thing, Akka-Http is very un-opinionated in its approach: there are often lots of ways to do the same thing, and there isn't a lot of opinionated guidance about how to do things.

I have been writing Akka-Http APIs for about 18 months now (not long, I know), having previously worked predominantly with libraries like Spring, and I have seen some pretty nasty code result from this (by which I mean I have written nasty code - not intentionally, of course, but from good intentions, starting off trying to write clean, idiomatic Akka-Http code and ending up with huge sprawling routing classes which are unreadable and generally not very nice).

The routing DSL in Akka-Http is pretty nice, but it can quickly become unwieldy. For example, let's imagine you start off with something like this:
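Something along these lines - a hello-world endpoint built from three nested directives:

import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

val route: Route =
  pathPrefix("hello") {
    pathEndOrSingleSlash {
      get {
        complete("Hello, world!")
      }
    }
  }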


This looks nice, right? A simple nested approach to the routing structure that reflects the URL hierarchy, the HTTP method, etc. However, as you can probably imagine, try to scale this up to a full application and it can very easily become fairly messy. The nested directives make it nice to group routes under similar root URLs, but as you do that you end up with very long, arrow-shaped code that isn't that easy to follow - if you have several endpoints nested within the structure, it becomes quite hard to work out what endpoints there are and what is handling what.

Another problem that needs to be managed is that with the first one or two endpoints you might put the handling code directly in the routing structure, which is OK in very small numbers, but it needs to be managed sensibly as the endpoints grow and your routing structure starts to look more and more sprawling.

It is of course personal preference, but even with the simple example above, I don't like the level of nesting that already exists there simply to map the GET HTTP method to a given URL - and if you add more endpoints and start to break down the URL with additional directives per URL section, the nesting only increases.


To simplify the code, and keep it clean from the start I go for the following approach:

  1. Make sure your Routing classes are sensibly separated - probably by the URL root (e.g. have a single UserRoutes class that handles all URLs under /users) to avoid them growing too much
  2. Hand off all business logic (well, within reason) to a service class - I use Scala’s Self-Type notation to handle this and keep it nicely de-coupled
  3. Use custom directives & non-nested routings to make the DSL more concise

Most of these steps are simple and self-explanatory, so it's probably just step 3 that needs some more explanation. To start with, here is a simple example:
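A minimal sketch of the shape this takes (UserRoutes and UserService are illustrative names, and getPath/respond are the custom helpers covered in the next sections):

import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

// Business logic lives behind a service trait (point 2)
trait UserService {
  def listUsers(): Response[List[String]]
}

// A routing class scoped to URLs under /users (point 1),
// with the service mixed in via a self-type
class UserRoutes { self: UserService =>

  val routes: Route = getAllUsers

  // Flat, named endpoint functions using the custom directives (point 3)
  private def getAllUsers: Route = getPath("users") {
    respond(listUsers())
  }
}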

You can see points 1 and 2 simply enough, but you will also notice that my endpoints are simple functions, without multiple levels of nesting (we may need some additional nesting at some point, as some endpoints will likely need other akka-http directives, but we can strive to keep it minimal).

You might notice I have duplicated the URL section “users” rather than nesting it - some people might not like this duplication (and, I guess, the risk of error/divergence of URLs - but that can be mitigated by having predefined constants instead of explicit strings), but I prefer the readability and simplicity of this over extensive nesting.


Custom Directives

First off, I have simply combined a couple of existing directives to make things more concise. Normally you might have several levels of nested directives, such as one or more pathPrefix(“path”) sections, the HTTP method such as get{}, and another one to match pathEndOrSingleSlash{} - to avoid this, I have concatenated some of these into convenient single directives.
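A sketch of how such a combined directive can be built (shown here for path matchers that extract no values; the real library generalises this):

import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.{Directive0, PathMatcher}
import akka.http.scaladsl.server.Directives._

// HTTP method + trailing-slash redirect + exact path match, in one directive
def getPath(pm: PathMatcher[Unit]): Directive0 =
  get & redirectToTrailingSlashIfMissing(StatusCodes.MovedPermanently) & path(pm ~ Slash)

def postPath(pm: PathMatcher[Unit]): Directive0 =
  post & redirectToTrailingSlashIfMissing(StatusCodes.MovedPermanently) & path(pm ~ Slash)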


getPath, postPath, putPath, etc simply combine the HTTP method with the URL path-matcher, and also include the existing Akka-Http directive “redirectToTrailingSlashIfMissing”, which avoids having to specify matching on either a slash or path end and instead allows you to always match exact paths - it basically squashes the three directives in the original HelloWorld example above down to one simple, readable directive.


Custom Serialisation

You may also notice that I have implemented a custom method called “respond” - I use this to handle the serialisation of the response to a common JSON shape and to handle errors. Using this approach, I define a custom Response wrapper type that is essentially an Either of our internal custom error type and a valid response type T (implementation details below) - this means that throughout our code we have a consistent type that can be used to handle errors and ensure consistent responses.

This respond method simply expects a Response to be passed to it (along with an optional success status code - defaulting to 200 OK, but available to support alternative success codes). The method then uses Circe and Shapeless to convert the Response to a common JSON object.

Let’s have a look at some of the details, first the custom types I have defined for errors and custom Response type:
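A sketch of those types (the exact error hierarchy in the library differs slightly, but the shape is the same):

object ApiModel {

  // Internal custom error type - the concrete cases here are illustrative
  sealed trait ApiError { def message: String }
  case class NotFoundError(message: String) extends ApiError
  case class UnexpectedError(message: String) extends ApiError

  // Either an internal error or a valid response payload of type T
  type Response[T] = Either[ApiError, T]
}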


Simple, now let’s take a look at the implementation of the respond method:
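A sketch of the method (statusFor and the wrap call from the ResponseWrapperEncoder trait mentioned below are assumed; the real version also wraps the payload in the common message shape):

import akka.http.scaladsl.model.{StatusCode, StatusCodes}
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import io.circe.Encoder
import io.circe.syntax._

def respond[A](response: Response[A], status: StatusCode = StatusCodes.OK)
              (implicit payloadEncoder: Encoder[A],
                        errorEncoder: Encoder[ApiError]): Route =
  response match {
    case Right(payload) => complete(status -> payload.asJson.noSpaces)         // wrap(payload) in the real version
    case Left(error)    => complete(statusFor(error) -> error.asJson.noSpaces) // assumed error-to-status mapping
  }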

It might look daunting (or not, depending on your familiarity with Scala and Shapeless), but it's relatively simple. The two implicit Encoder arguments included on the method signature simply ensure that whatever type A is in the provided Response[A], Circe & Shapeless are able to serialise it - if you try to pass a response to this method that can't be serialised, you get a compile error. After that, all it does is wrap the response A in a common message and return that along with an appropriate (or provided) HTTP status code.

You might also notice the final result is built using the wrap method in the ResponseWrapperEncoder trait - this allows easy extension/overriding of what the common response message looks like.


Conclusion

All of this machinery is of course abstracted away into a common library that can be used across different projects, which in reality means we get a consistent, clean API with routing classes as simple and neat as below, whilst also handing off our business logic to neater, testable services.

All the code for my opinionated library and an example API is on GitHub, and it is currently in progress, with more ideas underway!

Intro to Genetic Algorithms in Scala

This is intended to be quite a simple, high-level approach to using some of these ideas and techniques for searching solution spaces. As part of my dissertation back in university, I used this approach to find optimal configurations for neural networks. More recently, I have applied it to finding good configurations for training an OpenNLP model, with pretty good results - which is really what made me think to write a little about the topic.

So what is it?

Genetic algorithms are a subset of evolutionary computing that borrow techniques from classic evolution theories to help find solutions to problems within a large search space.

So, what does that mean? It sounds a little like a bunch of academic rhetoric that makes nothing any clearer, right? But actually the concept is quite simple, and works like the evolutionary theory of "survival of the fittest" - that is, you generate a large set of random possible solutions to the problem and see how they all perform at solving it, and from there you "evolve" the solutions, allowing the best to progress and evolve while cutting out the worst performers.

How it works

In practical terms, at a high level, it involves the following steps:
  1. Create an encoded representation for the problem solution - this might sound complicated, but it just means capturing all the possible inputs or configuration for a solution to the problem.

  2. Randomly generate a set of possible solutions - that is, a set of configurations/inputs (this is called a "population")

  3. Assess the performance of these solutions - to do this you need a concept of "fitness", or how well a solution performs - and rank the population

  4. Evolve the population of solutions - this can involve a range of techniques of varying complexity, often keeping the top n candidates, dropping the bottom m candidates and then evolving/mutating the rest of the new population

  5. Repeat the rank & evolve cycle (depending on how it performs)


To understand it, let's first consider a simple, hypothetical numeric problem: say you have some function f(x, y, z) that returns an integer, and we want to find a combination of x, y and z that gives a value closest to 10,000 (e.g. you want a cuboid with a volume of 10,000 and you want to find dimensions for the width, depth and height that achieve this - like I said, this is just hypothetical). The first step would be to encode the solution, which is pretty simple - we have three inputs, so our candidate looks like this:

case class Candidate(x: Int, y: Int, z: Int)

We will also need a fitness function that will assess how a candidate performs (really, this is just evaluation of the inputs against our function f):

def fitness(candidate: Candidate): Int = math.abs(10000 - (candidate.x * candidate.y * candidate.z))


Now that we have a generic representation of the solution, and a function that can measure how well a candidate performs, we can randomly generate a population. The size of the population can be chosen as suits: the bigger the population, the longer an evolution cycle will take, but of course a bigger population gives you a wider gene pool, which improves your chances of finding a better solution.

Once we have our initial population, we can evaluate them all against our fitness function and rank them - from there, we can evolve!

Survival of the fit, only the strong survive

So how do we evolve our candidates? There are a few techniques, we will look at the simplest here:

  • Survival of the fittest - the simple and sensible approach whereby the top-performing candidate in your population always survives as-is (in the scenario where we have stumbled upon the optimal solution in the first random generation, we want to make sure it is preserved)

  • Mutation - much like genes get mutated in normal evolution, we can take a number of candidates and slightly adjust their values - for numeric "genes" it's easy: we can just adjust the number. A simple, popular technique for doing this is Gaussian mutation, which ensures that in most cases the value will be close to the original, with a smaller chance of a larger deviation (it's a bell-curve distribution, whereby the peak of the bell curve is the original value).

  • Cross breeding - again, like normal reproduction, randomly select two parent candidates (probably from the top x% of the candidate pool) and take some attributes from each parent to make up a child

Between mutation and cross breeding you can create a new pool of candidates and continue the search for a solution.
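For the Candidate defined earlier, minimal sketches of both techniques could look like this (the max(1, ...) guard is just to keep the dimensions positive):

import scala.util.Random

// Gaussian mutation: each new value is normally distributed around the original
def mutate(c: Candidate, stdDev: Double = 5.0): Candidate = {
  def gaussian(v: Int): Int = math.max(1, (v + Random.nextGaussian() * stdDev).round.toInt)
  Candidate(gaussian(c.x), gaussian(c.y), gaussian(c.z))
}

// Cross breeding: each attribute is taken from a randomly chosen parent
def crossover(mum: Candidate, dad: Candidate): Candidate =
  Candidate(
    if (Random.nextBoolean()) mum.x else dad.x,
    if (Random.nextBoolean()) mum.y else dad.y,
    if (Random.nextBoolean()) mum.z else dad.z
  )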

Some code

Partly because I wanted to learn a bit more about the Shapeless library (more on that in another post) and partly for fun, I created a simple GitHub project with some very simple evolutionary search machinery that lets you try out these techniques.

The shapeless stuff was to support generic encoding of our solution - so the library can easily be used to define a solution representation and a fitness function.

Following on from the example above to find three dimensions that have a volume of 10,000, I have the following configuration:
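Illustrative only - the exact API lives in the GitHub project, but the shape is roughly:

// Gene[Int] attributes carry their own generation/mutation rules and bounds
case class ThreeNumbers(x: Gene[Int], y: Gene[Int], z: Gene[Int])

// Fitness function of the form B => Double: distance from the target volume
// (.value is an assumed accessor for the underlying number)
val fitness: ThreeNumbers => Double =
  t => math.abs(10000 - (t.x.value * t.y.value * t.z.value))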
In the above, we first define the case class ThreeNumbers - this can be any case class you like, but it must have Gene[A] attributes - these are custom types that I created which have defined generation and mutation methods, plus additional configuration (such as max and min values for numbers etc). We then pass all the configuration in, along with a fitness function of the form (a: B => Double), and run it.

It varies on each run, as the initial pool of candidates is generated at random, but in most cases this example solves the problem (finds an error rate of zero) within 10 generations:



As mentioned, I have used this approach to train OpenNLP NER models - the different training-data generation parameters and the training configuration were all encoded as a candidate class, and I then used this method to evolve towards the best training configuration for named entity recognition - but it can be applied to all sorts of problems!

Feel free to try it out and let me know if it works well for any problem domains - you can also check out the code and make it more sophisticated in terms of evolution/mutation techniques and introducing a learning rate.

Managing your personal finance (a Java webapp)

"Cash Rules Everything Around Me" - Method Man

A few years ago I was fed up with how useless all the banks were at helping me understand my transactions/spending patterns through their online banking. Basically, they all just seemed to let you log on and see a list of transactions (often just referencing codes) for payments in and out by date. Nothing else. It seemed like there were lots of simple things missing that could make it a lot more useful, to name a few:


  • Smarter tagging/categorisation - for when a direct debit is just listed by the code of the recipient rather than meaningful info, or to be able to group all the spending in coffee shops
  • Alerting on changes/unexpected behaviour - for example when a fixed-price term contract comes to an end and the price changes - detecting a change in a regular fixed amount/recipient should be easy

So this is what I started building. A lot of the functionality has since been built into banks such as Monzo; the aim of this webapp was simply to let you import transaction histories (most online banks I use provide the ability to export transactions as CSV) so you could view/filter/tag any existing bank data. It only supports Santander at the moment, and the alerting stuff never actually got built, but I thought I would chuck it on GitHub rather than have it sit around gathering dust in a private repo.

I wrote about the project a while ago, when it was first underway, but as it has not been worked on for a while, I thought I would move it to GitHub - hence the post here today. The app and all the code can be downloaded here: https://github.com/robhinds/cream


Building & Running

It's a simple Java, Maven app (it was written several years ago, so has not been migrated to Spring Boot, Gradle, Groovy etc) and it builds a WAR file. It also uses LESS for the styling, which is also built by Maven (although watch out for silent failures if you make changes!). If you are using an Eclipse-based IDE you can follow the guide here to get incremental build support for LESS (e.g. just change the LESS source, refresh in the browser and it will reload).

You can run the WAR file in any Tomcat/IDE server, and the backend currently expects a MySQL DB (but it can easily be switched to a different DB driver if required).


Roadmap


  • Migrate to Spring Boot, Groovy, Gradle
  • Add further bank transaction importers
  • Add alerting framework



Screenshots

(I loaded in a generated transaction history - which is why some of the charts look a little weird)




Turn your GitHub Page into a Personalised Resume

A little while ago, I decided I wanted to update my CV, and figured that, given I work in tech, it made sense for my CV to be online. I was aware of GitHub Pages - which give you a nice-looking URL that seemed like a perfect location for my tech CV.


http://robhinds.github.io/


Once I had it looking pretty decent, and updated to modern Bootstrap styling so it was fully responsive, I thought I would stick it on GitHub as a GitHub Page. GitHub provides support for everyone to have a free hosted page with normal HTML/JS resources etc (which is pretty nice of them!) and gives you a nice, shareable URL like http://{username}.github.io.

Whilst I was reading about GitHub Pages, I noticed that they have native support for Jekyll - a static HTML generator tool for building websites - which is when I had my second realisation of the day: I could make my CV open-source-able by turning it into a configurable Jekyll project that lets users just update some config in their GitHub account and, hey presto, have a nicely styled, personalised tech CV!

So I started porting it over to Jekyll, which just involved moving the configurable, user-specific items into a config file (_config.yml) and then breaking the HTML sections into fragments to make it more manageable to understand what is going on. The idea of Jekyll is pretty straightforward - it's just a simple tokenised/template approach to putting together static HTML output - but it works well and I really didn't find myself wanting for anything in the process. The native GitHub support was also really nice: all I needed to do was upload the source of the project to my GitHub account and GitHub handled the build and serving of the site out of the box!
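The configuration is just YAML in _config.yml - the keys below are purely illustrative (see the file in the repo for the real set):

name: Your Name
headline: Software Engineer
social:
  github: yourusername
  twitter: yourusername
  stackoverflow: 1234567
skills:
  - Scala
  - Java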

And that's all the configuration it takes! The YAML format is pretty readable - it largely just works with indenting - and hopefully, looking over the nested sections of data, it's fairly easy to understand how you can modify parts to customise it.


You can see my GitHub CV page here: robhinds.github.io - out of the box you can configure lots of aspects: custom text blocks, key skills, blogs, apps, GitHub projects, StackOverflow, etc.

How can you have a custom GitHub CV?

It really is super simple to get your own GitHub CV:
  1. Create a Github account (if you don't already have one)

  2. Go to the project repository and fork the repository

  3. Change the name of the repository (in the settings menu) to {{yourusername}}.github.io

  4. Edit the /_config.yml file in your repository - it should be pretty straightforward as to what links/details you need to add.

  5. Visit your new profile page: {{yourusername}}.github.io and start sharing it!


Spring-Boot & Netflix OSS - An adventure into microservices

Honestly, I still need convincing on microservices.

I can see that they are a compelling argument compared to a monolithic application, but I think I need to get my head around some of the challenges they face - the first that comes to mind being how to effectively define the microservice boundaries, as it seems to me that a lot of the applications I have ever worked with are monolithic precisely because these boundaries are so blurred.


Anyway, I wanted to do some tech stuff, so decided to start building out an application using the microservice architectural pattern and Spring Boot seemed like a good place to get started.

This is very much a work in progress, and I am continuing to work through different aspects of the application; at the moment there is very little actual code written (in part due to the simplicity that Spring-Boot provides). All code is being kept up to date on GitHub, so feel free to have a look.


There are lots of great blogs covering this stuff already, so I won't re-cover their work; the following article gives a great write-up of the Netflix OSS and the Spring integration, which is worth reading:

http://callistaenterprise.se/blogg/teknik/2015/04/10/building-microservices-with-spring-cloud-and-netflix-oss-part-1/

(Image from: Building Microservices with Spring-Cloud and Netflix OSS)


Getting started: A service registry - Eureka

One of the first things needed is a central service registry to enable service discovery - this is not a new concept to microservices and is an approach used by SOA. Straight out of the box, Spring-Boot provides integration with Netflix's OSS application Eureka, which provides exactly this. I opted to have a dedicated application for my registry (code can be seen here), and it really is as simple as adding the relevant dependencies to the build.gradle file, adding an @EnableEurekaServer annotation to our application config, then a simple config file defining the server port/name etc, and it's done! You can just run gradle assemble in that project to build the JAR file, then run java -jar [the new JAR file] and the application will spin up - you should then be able to go to http://localhost:1111 and see the Eureka dashboard (with no microservices registered, of course).
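The standalone registry config amounts to a few properties along these lines (standard Spring Cloud Netflix property names; values as described above):

server.port=1111
spring.application.name=eureka-service
# a standalone registry should not try to register with itself
eureka.client.registerWithEureka=false
eureka.client.fetchRegistry=false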



My first microservice

So, I had Eureka up and running, but it was looking pretty lonely with no services registered.

A microservice in Spring is also very simple - really, it is just a simple web application that runs in its own process with a limited domain - so spinning up a Spring Boot MVC RESTful webapp with a single controller/endpoint is enough to get me a microservice (even just a tweet's worth of code would do it..)
So we can create our new microservice to do anything we like - in my case I created a QuoteService (the application is slowly evolving into an insurance engine). Just having the standalone app isn't helping much though, so we need to add some configuration to tell the service to register with our Eureka server - this will make our new microservice discoverable by other services wanting to use it.

Again this is quite simple: we need to tell our application it should try and register with Eureka, and we should add the config to do so:
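A sketch of the client side (standard Spring Cloud Netflix property names - the annotation on the application class would be along the lines of @EnableDiscoveryClient):

spring.application.name=quote-service
# where to find the Eureka registry we started earlier
eureka.client.serviceUrl.defaultZone=http://localhost:1111/eureka/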

You can see that we simply annotate our application config in Java, and then add some properties that define where the Eureka server is hosted - and that's basically it.

Now if we build the project JAR and start it up again (while our Eureka service registry is still running), then after 30 seconds or so you should see the Quote-Service registered and ready to use.


On to the next one.. 

Now we have a microservice, and we have a registry that makes it discoverable, but still - just one microservice is pretty lonely. So next I created another dummy RESTful microservice, this time called ProductService, which just followed the same pattern as the first.

Once that was started up, the Eureka dashboard started looking a bit happier with the two services registered - the obvious next challenge is seamless interaction between the two: splitting the services into their own processes is all well and good, but meaningless if you can't easily integrate them. The way I look at it is that when reading the application code of a service (or application using microservices), it should just look like a normal application with service classes - there shouldn't be any fanfare around the fact that my service class actually gets the data from a dedicated microservice over HTTP/AMQP rather than getting it directly from the DB in the traditional way.


So, still just stubbing out the endpoints, I updated my QuoteService endpoint to make a call to my ProductService, and then just jammed that response into the JSON response I was returning anyway:

As you can see, it could be a standard controller in a normal monolithic application at this point - we are just calling a method on our autowired ProductService class and returning the response.


So the really interesting part is in the ProductService class - at the moment this isn't a really elegant, abstracted class yet, so there is still some boilerplate, but that has the advantage of making it clear what is going on:

As you can see, it's just making a REST call to the Product microservice and returning the response cast as a Map - but the really nice part is that the service URL is just the service name (in this case "PRODUCT-SERVICE", which is injected into the class), and with the RestTemplate annotated with Spring-Cloud's @LoadBalanced, that microservice will be looked up in Eureka (and load balanced if there is more than one PRODUCT-SERVICE running).

So our setup is starting to take shape now - we have two microservices, both registered with Eureka and able to interact with each other in a fairly clean, loosely coupled way.


Don't push me, 'cos I'm close to the edge..

As your microservices start to proliferate, you will get different levels of service granularity, and undoubtedly you won't want to expose all of your microservices as a public API. One option would be to create a RESTful application that defines the nicely named endpoints you want to expose, and then use the standard integration described above behind them.

Fortunately, there is an easier way - Netflix provides a library called Zuul that can simply be configured to map URL patterns to defined service names (again looked up in the Eureka service registry). Much like Eureka, this is super easy to set up and just needs an annotation and some config:

And the config is pretty easy to understand:
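Assuming the standard Spring Cloud Zuul property names, the routing config looks something like:

zuul.routes.quotes.path=/quotes/**
zuul.routes.quotes.serviceId=quote-service
zuul.routes.products.path=/products/**
zuul.routes.products.serviceId=product-service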

As you can see, we just define service names against URL patterns.

Now, once all our apps are up and running and the microservices are registered with Eureka, you have a single API interface to start interacting with the services (rather than having to access each service on its designated port etc).


Conclusion

So that's as far as I have got - I wired up the QuoteService to MongoDB so the data all gets persisted there (and added a get-quote endpoint which reads the same data back from Mongo), and I am starting to wire up the ProductService with JPA. So far it's been enjoyable, and things are making more sense than when I started - but there are a few questions still:

  • It seems like there is still duplication of service names throughout the different projects - for example the ProductService name ("product-service" - case insensitive) is proliferated throughout: the service itself defines it, the QuoteService needs to know the name of the service, the Zuul edge server needs to know the name, etc. I guess this is unavoidable as these are intrinsic dependencies, but it still seems a bit flaky.
  • It feels like the Service classes could be factored out - our ProductService class that allows HTTP REST interactions with the Product microservice would likely need to be re-used across all applications/microservices that need to use the Product microservice

Android: Building a cloud based quiz application

A long time ago, when Android was still in its infancy (1.5 I think..) I built and open-sourced a basic quiz app. The app was just a basic multiple-choice question-answer app, driven from some questions in the database, but it did OK - it has had over 10k downloads on the app store, and the blog post tutorial here is still one of the most popular articles.

But here we are: Android 5.0 is released and the state of Android development is very different now. I didn't really want to just re-skin/tweak the old app and push it out again, and I also wanted to write up some notes on using parse.com as a backend service - so this seemed like a good opportunity.

The source code for the app is all on GitHub.


The goal

So the aim is to create an Android quiz game, but rather than using local storage, using the cloud to power the questions. This avoids the need for lots of boilerplate DB code and also makes it easier to update the questions. The tutorial will be broken into two parts - the first part will cover basic quiz functionality with cloud-powered questions; the second part will enhance the app to support user login and track scores, allowing users to compete against each other.


Parse

Before you start the tutorial, you need to get an account set up at parse.com - it's a cloud DB/backend-as-a-service that was recently bought by Facebook. They allow really easy setup of DBs and provide a load of nice libraries/APIs to interact with their endpoints across lots of platforms (it's also incredibly well priced - the free tier is really good, and if you find your mobile app is going beyond that, you can probably think about monetising the app!). All you need to do is head over there, sign up and create a new app - you just give it a name and hey presto! You can either make a note of the keys then, or come back and grab them later. There are no other changes you need to make now, as the rest is handled from our source code. The only other thing to do is to download the Parse Android library to make use of their SDK in Android (if you have grabbed the source code from GitHub then you will not have to worry about this).



OK, lets get started!

Again, I am not going to spend a lot of time covering the real basics of Android - I will only really mention things of note or that differ from standard application development - hopefully the code & general explanation will be clear enough to give an understanding of what is going on.


Android manifest
First, let's get our AndroidManifest.xml file configured. The only things to note here are the permissions we are setting - we will request internet access and network state permissions. It's also worth noting that I have set the min SDK for my sample app at version 16.


Our Application class
We will have to create a custom implementation of the Android Application class. This class is instantiated on application startup, and if you are looking at Android development you should hopefully be familiar with it. We will do a couple of things in this class:

  1. Register our parse.com application with our secret keys
  2. Initialise the Parse library and our domain objects
  3. Try to fetch all the questions for the quiz and store them for offline usage 
  4. Create a GamePlay object, that will keep track of the state of the current game in progress
First let's look at the Parse setup - this is standard Parse boilerplate and is covered in the Parse docs and sample apps - you just need to add your ID/key here (also note that we have registered the Parse object class Question - this is our domain object, like a JPA entity etc - if we add more domain objects they need to be added here too)

Next we will make a call to parse.com to fetch the questions from our cloud API - we will do this in the background (an asynchronous call) and "pin" the results to make them available for offline usage. Also note that we do not un-pin existing questions until we have successfully fetched new ones - that way users should always have questions once they have successfully loaded them the first time.
Hopefully the above is quite clear - the Parse libraries are quite straightforward to understand - we create a query (typed to Question), then call findInBackground and implement an on-success handler.


Domain objects: Question
The Parse library provides a nice interface for creating POJOs to model your domain - if you are familiar with JPA/Hibernate/etc and the approach of POJOs representing a domain model, it's much like that. Using these classes you can easily query/load/save data from the cloud just by using these objects. You will have spotted that in the query we use in the Application class to load all our questions, we just run a plain query with the Question class - this, as you would expect, simply retrieves all Question objects from your cloud table (Parse). The domain model is just an annotated POJO, and you define appropriate getters/setters for the fields you want to include.


Welcome activity
Now we are into basic Android stuff really - we have set up Parse and fetched the questions for local usage; now we just need a basic menu to start the quiz and some activities to display the questions and the end results.

We will just apply the layout and then implement a method to handle the button clicks. For this tutorial we are skipping the high-score stuff and just playing the quiz.
All we need to do is reset the current GamePlay object and grab the questions from the local store (by this point they should be updated from the cloud, so no problems), then kick off the quiz!


Question activity
There is nothing really interesting to see here - it's all on GitHub if you want to take a look in detail (or you have already downloaded it and are working along) - this is just a standard Android activity that pulls out the question and possible answers and presents them.




The quiz just progresses along fairly simply until it gets to the end, and then it presents a simple screen showing the score - all this stuff can be tweaked/styled etc - but those are the basics for a cloud-powered multiple-choice quiz app!


Creating the questions

All the questions will be stored in the cloud at parse.com - once you have a look at the interface it should be pretty clear - you can easily create the data for questions either manually or by importing a CSV/JSON file.



You will need to log in to your Parse account and the quiz app and create a Question class. This will just match the domain POJO we have created. Just log in, go to "CORE" then "DATA", then select "+ Add Class", add a custom class called "Question" (this must be exactly the same as the name provided in the POJO annotation for the Question class), then select "Add Col" and add the fields to match the POJO (question[string], option1[string], etc). Once you have the class/table added on Parse, you can simply add data by selecting "Add Row" and manually entering the data, or by using the import function.



Source code

GitHub as a CV (GaaC)

Over the last year or so, as it has become more popular throughout the tech industry, there has been a growing amount of discussion around the idea of "GitHub as a CV" - using your online tech footprint, which for most people primarily means your GitHub profile, as a CV and a better representation of a potential employee's ability/preferences/mindset. There is even a GitHub project that can automatically create a CV for you based on your profile: http://resume.github.io/ (here is mine - depressing that ColdFusion features so highly in the stats though!). Over at NerdAbility we have really taken that idea forward (incorporating other sites like BitBucket, Google Code, Coursera, LinkedIn, StackOverflow, etc), and it is obviously something that we think is a good idea.

But it's a bad idea..

A big argument against it is that it furthers the already ingrained bias towards white men: if you look at the demographic of the most active GitHub profiles, there is no denying the common pattern.

I agree completely that using GitHub as a filtering mechanism or prerequisite for a candidate sucks. You really shouldn't do that. I have had conversations with agents where they have told me that a client only wants to see candidates who contribute to OSS, and I have declined. It doesn't qualify a candidate as being halfway competent, and it rules out lots of very competent people who don't have spare time to work on OSS (multiple jobs, family responsibilities).


Just another data point..

I guess this is really the point here. You shouldn't rule out candidates because they don't have GitHub accounts, just like you shouldn't rule out a candidate for not having a degree. I think we can all agree tech recruiting is hard, and it's really hard to assess whether someone is actually a good developer and not just blagging it - so, as a computer scientist, I'm grateful for as much data to help with this decision as possible.

It's not that a candidate with an active GitHub profile trumps one without, but it is another point in an interview that we can use to tease out a little more insight into the candidate's skills, interests and passions.

GitHub is another point on your CV (if you are fortunate enough to have the time to set up and contribute to a GitHub profile) - just like your academic achievements, career history or anything else you choose to put on there. If you have an interesting project, it can be a talking point for an interview, in much the same way an interesting role on your CV would be. From my point of view, I love going into interviews and hearing that the interviewer has checked out my projects on GitHub, and when the inevitable "tell us about an interesting/challenging project/problem you have had" comes up, it's great to be able to talk about projects on GitHub that they have seen - not just because I know the project well, but also because it will inevitably be a project that I am passionate about (otherwise I wouldn't be doing it in my spare time!).


We shouldn't be demanding OSS contributions, or set online profiles, or StackOverflow credibility - but we probably shouldn't be dismissing them as irrelevant either.



Looking for a new job with an awesome high-tech company or startup?







I have just been involved in launching a new site: http://nerdability.com

The site is a new place that allows us decent, hardworking tech folks to build real, representative resumes to help cut through the endless crap that is so often the tech recruiting process.

No one is really at fault for the problems in the current recruitment process. We know it's really tough to hire tech people, as it's a specialist skill blended with a touch of creativity and art, so how in the world can a potential employer/recruiter realistically assess whether we can actually do what we claim? Enter NerdAbility. NerdAbility is an online resume builder that allows users to sign up and then connect their resume with their blog, StackOverflow profile, GitHub account, etc - the end result being a living CV: every time you get up-voted on StackOverflow, or commit to an open source project you are working on in GitHub, it gets reflected in your resume. So now you can share a resume (all users have a publicly facing URL that they can tweet/Facebook/email to recruiters) that really shows what you have achieved, and employers can go ahead and see the code you have committed, or see how you solve problems through your answers on StackOverflow.


There are of course plans to continue building the product, including further code repository integration, but also an opportunity for companies to create profiles and start pitching to you why you should want to go and work for them (plus, of course, they will be able to list job openings).

It is open to everyone worldwide (y'all know how the internet works) - but being based in London, we will first be targeting the UK and Europe to get wicked-cool startups and hi-tech companies on board.


All in all, we think its pretty ace.

So come on over and join the party

http://nerdability.com 



Spring Social Integration

So, you may have noticed that I am something of a Spring fan, and generally like to try out different Spring projects where I get the chance. I have been aware of the Spring Social project for a little while (I looked at integrating LinkedIn with flutterby but decided against it), so recently, when a competition was announced at work to create a Java web app in the cloud for a chance to win a new Android tablet, I thought this would be a good chance to put together something of a technology showcase.

As always, the web app is a straightforward Spring MVC project built with the current latest Spring libraries (3.1 - also making use of declarative caching with ehcache, which was nice) and also includes the Spring Data MongoDB integration (again, primarily because it seems to be the most commonly supported NoSQL DB in the cloud.. well, on CloudFoundry and OpenShift anyway). I also took the opportunity to integrate some Spring Social stuff, so I will go through that, as it's pretty easy (I will post again on some thoughts about Red Hat's OpenShift platform and further Spring Social stuff).



Maven Configuration

First of all, there is obviously some Maven configuration needed in the POM. It took me a little while to find the correct repositories to get all the different Spring Social libraries from (some implementations are less mature than others, so not all have full, proper releases yet):


	<repository>
		<id>spring-snapshot</id>
		<name>Spring Maven Snapshot Repository</name>
		<url>http://maven.springframework.org/snapshot</url>
	</repository>
	<repository>
		<id>spring-milestone</id>
		<name>Spring Maven Milestone Repository</name>
		<url>http://maven.springframework.org/milestone</url>
	</repository>




 Next we need to add the libraries we need:


	<dependency>
		<groupId>org.springframework.social</groupId>
		<artifactId>spring-social-web</artifactId>
		<version>${spring-social.version}</version>
		<exclusions>
			<exclusion>
				<artifactId>spring-core</artifactId>
				<groupId>org.springframework</groupId>
			</exclusion>
			<exclusion>
				<artifactId>spring-web</artifactId>
				<groupId>org.springframework</groupId>
			</exclusion>
			<exclusion>
				<artifactId>spring-webmvc</artifactId>
				<groupId>org.springframework</groupId>
			</exclusion>
		</exclusions>
	</dependency>
	<dependency>
		<groupId>org.springframework.social</groupId>
		<artifactId>spring-social-github</artifactId>
		<version>${spring-social-github.version}</version>
	</dependency>
	<dependency>
		<groupId>org.springframework.social</groupId>
		<artifactId>spring-social-twitter</artifactId>
		<version>${spring-social-twitter.version}</version>
	</dependency>

As you can see, we take the core Spring Social library, and for my project I also used the Twitter and GitHub integrations (I also had LinkedIn early on, but removed it as the LinkedIn terms of use on the API are pretty limiting). I added the exclusions to ensure that the latest Spring libraries were being used, as these were declared elsewhere in the POM.



Storing User Connection Details

Spring Social supports both OAuth1 and OAuth2, and provides the underlying mechanisms to perform the "OAuth dance", which enables your application (once approved by a user) to capture the access token required to access the third-party API. Usually your application will capture the user details, including the access token, and store them (encrypted, of course!) so the user does not need to approve the application every time they use it. There are examples in the Spring Social showcase on GitHub of doing this with a standard JDBC connection implementation, but as I was persisting my user details with Hibernate, I decided to implement a Hibernate version.

To implement your own storage mechanism, you need two classes implementing the ConnectionRepository and UsersConnectionRepository interfaces:

public class HibernateUsersConnectionRepository implements UsersConnectionRepository { 

public class HibernateConnectionRepository implements ConnectionRepository {


You can see my Hibernate implementations here. Obviously I also needed a Hibernate entity to represent the connection details, but that was a straightforward entity as follows:

@Entity
@Table(name = "CV_CONNECTION")
public class SocialConnection {

       @Id
       private Long id;
       private String providerId;
       private String providerUserId;
       private String displayName;
       private String profileUrl;
       private String imageUrl;
       private String accessToken;
       private String secret;
       private String refreshToken;
       private Long expireTime;
       private Integer rank;

       // getters and setters omitted for brevity
}


Configuring the Beans

The next step is to define the Spring Social beans (such as Twitter, GitHub etc) so they can be used throughout the application.

For this I used a configuration class rather than adding the config directly to my application context XML (this just involves creating a POJO and annotating it with the @Configuration annotation - this then allows you to create code equivalents of the XML config that would normally live in the app context XML - we will see more in the details below).

@Configuration
public class SocialConfiguration {


The beans we need to define are the ConnectionRepository, UsersConnectionRepository, ConnectionFactoryLocator and then any social interface that we are using - in our case Twitter and GitHub.

ConnectionFactoryLocator:


@Bean
public ConnectionFactoryLocator connectionFactoryLocator() {
	  ConnectionFactoryRegistry registry = new ConnectionFactoryRegistry();
	  registry.addConnectionFactory(new GitHubConnectionFactory(gitHubKey, gitHubSecret));
	  registry.addConnectionFactory(new TwitterConnectionFactory(twitterKey, twitterSecret));
	  return registry;
}

The ConnectionFactoryLocator is simply used to initialise the UsersConnectionRepository; for every Spring Social third party you want to use, you need to add a ConnectionFactory instance as per the above - all Spring Social implementations provide a ConnectionFactory implementation.

The @Bean annotation is like defining a <bean> in the application context xml.

UsersConnectionRepository:


@Bean
@Scope(value = "singleton", proxyMode = ScopedProxyMode.INTERFACES)
public UsersConnectionRepository usersConnectionRepository() {
	  HibernateUsersConnectionRepository repository = new HibernateUsersConnectionRepository(connectionFactoryLocator(), Encryptors.noOpText());
	  return repository;
}

This is to configure the UsersConnectionRepository - as you can see, in this case we are telling Spring to instantiate a HibernateUsersConnectionRepository.


ConnectionRepository:

This one is straightforward as well - again, using the Hibernate implementation of the interface:

@Bean
@Scope(value = "request", proxyMode = ScopedProxyMode.INTERFACES)
public ConnectionRepository connectionRepository() {
	  Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
	  if (authentication == null) {
			 throw new IllegalStateException("Unable to get a ConnectionRepository: no user signed in");
	  }
	  ApplicationUser user = (ApplicationUser) authentication.getPrincipal();
	  return usersConnectionRepository().createConnectionRepository(String.valueOf(user.getAccountId()));
}

You will note that this method also includes an @Scope annotation with the value set to "request" - this is because for every individual user that accesses the application, we want them to have access to their own connections (so the application connects to their Twitter/GitHub accounts etc), so this bean is scoped to each individual user request (but lives for the user's session).

ThirdParty Interfaces:

Finally, we need a config method for every third-party interface that we want to use - they all take a common form, as follows:

@Bean
@Scope(value = "request", proxyMode = ScopedProxyMode.INTERFACES)
public Twitter twitter() {
	if (connectionRepository().findPrimaryConnection(Twitter.class) != null) {
		return connectionRepository().findPrimaryConnection(Twitter.class).getApi();
	}
	return null;
}

You will note once again that these are scoped to a user's request, and simply get the connection from the ConnectionRepository for the given interface.



Using the Interfaces

Every implementation exposes the third-party API, so each will have a specific API available to allow you to perform common tasks (depending on the maturity of the implementation, it may only offer a limited subset of the overall API).

Below is an example of using the Twitter interface in the Spring-Social-Twitter library to post an update:

@Autowired
private Twitter twitter;
public void postTweet(String content) {
	  twitter.timelineOperations().updateStatus(content);
}

As you can see, once the Twitter interface has been configured and injected, it becomes very simple to utilise the API.



On the whole, it is a relatively simple set of libraries to use and offers an ever-growing set of integrations with third-party sites, so it is worth having a look at.


As always, the entire code base is on my GitHub account (if this link fails, check the GitHub link on the right), and for the time being the application is being hosted on the CloudFoundry platform (although this is still in development, so it may come up and down, and data may be cleared out intermittently - it is not intended as a live production site and you should not put any data in there that you want to keep!). The application is a dynamic online resume/CV application that allows users to create a CV as well as pull in data from their GitHub accounts to show off actual examples of code and skills.

Moving to GitHub



Despite usually being a committed (no pun intended) SVN user, I have decided to make the switch to Git & GitHub.

There were two main reasons: one was curiosity to see what Git was all about and how it worked, and the other was that I just really wanted to start using GitHub.

My GitHub profile is here: https://github.com/robhinds

As a starting point, I have moved all my demo source code over to GitHub, so it can all be accessed directly from there (although the old links will still work).

If you head to my profile you will also see a few repositories associated to a RoR tutorial that I am currently working on - so there will be a post coming out of that shortly..