
An overview: How to implement Google's Structured Data Schemas

As a result of working on web products (both personal and professional) I have become quite au fait with SEO from a technical point of view, and with the purely technical things you can do to put yourself in good standing - all of which, really, are just web best practices for being a good web citizen - things that we should be doing to make the web better anyway (proper use of HTTP headers, mobile friendly designs, good performance for people on slower connections etc).

One thing that is a bit above and beyond that is the use of Google's Structured Data. I will talk about what it is and what it does below, but if you are dynamically loading webpages (e.g. your website isn't just HTML on a web server, but either served from a server-side application, or is an API driven JS application), then you are most likely well placed to easily and immediately start implementing it.


1. What is Structured Data?

Google has defined a set of Structured Data schemas for websites. These schemas allow key data points on a given website to be defined explicitly. It's a sensible move by Google and a natural progression for their search engine.

Think about it: there are millions of websites out there made up of sprawling HTML and content - whilst HTML is a standard (more or less!) across the web, there are probably millions of different ways people use it. It would be nice if everyone used the H1 and other heading tags consistently, or if everyone used the <em> tag the same way (emphasis vs italics), but the reality is they don't - some sites will be using HTML as intended, but many, many more undoubtedly just rely on <span> or <div> tags combined with CSS classes to re-define every element they might need.

This is all fine, Google is smart enough to pull out the content for indexing - yes, if you use span elements with custom styling for headings on your website rather than the H1+ tags then Google will penalize you, but it won't stop Google reading and indexing the site. What's more, it's getting smarter all the time - I'd probably back Google to be able to pull out relevant clips or question/answers directly in a fairly reliable way. However, they are Google, and much like the Microsoft/IE of the 90s, they have the dominant market share so they can define their own standards for the web if they want to. That's exactly what Structured Data is.

It's Google saying: 

Hey, if you provide some data in your code that looks like this then we will read that, so we don't have to guess or work stuff out. Or we can just keep trying to work it out from your HTML content... it's your call.


As mentioned earlier, if you have control over the source code and the data on your website, then this is really powerful. You can define specific (keyword heavy) snippets of content and explicitly tell Google about them - what's more, the Structured Data schema lets you define content such as FAQs or how-to directions, so you can go beyond keyword heavy snippets and actually create FAQs for questions that you have identified from Google search trends (or whatever SEO tools you might use).

Hopefully you get the picture by now - this is a pretty powerful tool.


2. Schema tags: FAQ and HowTo

Two specific parts of the Structured Data Schema stood out to me as quite universally useful for websites, and also produce decent tangible results: FAQ schema and HowTo schema.

  • FAQ schema allows high level Q&A points to be provided in the metadata for Google - generally useful as most sites will have some element of their pages that could be presented as FAQ

  • HowTo schema allows step-by-step how-to guides to be defined - less widely applicable, but if your website provides how-to guides or anything with instructions then it's a good fit.

What exactly do these tags do and why do we care? Well, as well as trying to win favour with the all-seeing Google search bot, if it gets picked up it also means we get more search real estate and increased accessibility for our content, which should also increase the chance of click-through conversion.

If you have ever seen search results that include expandable FAQ entries or step-by-step instructions directly beneath a result, those points are being scraped from the site by Google from its schema tags - if you can automate the creation and inclusion of these for your site then you can be in a pretty good position to improve your SEO (relatively few sites implement these schemas so far).


3. Automating your schema tags

As I have mentioned a couple of times, and as you have hopefully realised - if you have a dynamic website, you are likely already taking structured data (from your database, for example, so reliably structured!) and building HTML pages from it (either server-side, or by sending the data as JSON to a JavaScript app for page creation client-side). Either way, we are starting off with structured data, and Google's Structured Data wants... you got it, structured data! So if you have the content, it really is a generic, simple transform between structured data formats.

Below is some example code - it is based on Jekyll, as that's what my most recent personal project has been, but it's also pretty close to pseudocode, so hopefully you can easily translate it to whatever tech you use:
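(A minimal sketch of the idea - a Jekyll include that emits FAQ structured data from page front matter; the page.faqs structure here is an assumption for illustration, not the exact code from the project:)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {% for faq in page.faqs %}
    {
      "@type": "Question",
      "name": {{ faq.question | jsonify }},
      "acceptedAnswer": { "@type": "Answer", "text": {{ faq.answer | jsonify }} }
    }{% unless forloop.last %},{% endunless %}
    {% endfor %}
  ]
}
</script>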

As you can see, it's a simple JSON-based data structure and you just fill in the relevant parts of the object with your data.

You can see the full code in my Jekyll food website over on GitHub - likewise, you can see the end result in action on the website (hosted by GitHub Pages, of course). The project was a food-science site covering science and recipes, so a perfect match for FAQ (science pages) and HowTo (recipe pages). For example, if you go to a chilli recipe page, which naturally has structured data for step-by-step instructions, and view the page source, you will see the JSON schema at the top of the page, using the HowTo schema elements to lay out the resources required and then the steps. Likewise, in the page source of the science of humidity in cooking page you will see the JSON schema with the FAQ:









4. Conclusion

If you have control of the data being loaded on to the page (a custom site - something like WordPress or another off-the-shelf platform might make this harder, but at the same time there are undoubtedly lots of plugins that already handle this for you on your given platform, making it even easier!), and you are even vaguely interested in increasing organic traffic and search ranking, then I'd recommend implementing this. It's very easy to add (it's also primarily behind-the-scenes tech - it doesn't impact the visuals or design of the pages it's added to) and can only be beneficial.

As with anything aimed at improving SEO, it's always hard to measure and get concrete details on, but it's relatively low cost to add, with minimal ongoing maintenance, so I think it's worth a shot.


If you have had experiences with Google's Structured data, good or bad, then I'd love to hear about it!

Generic Programming with Scala & Shapeless part 2

Last year I spent some time playing with, and writing about, Scala & Shapeless - walking through the simple example of generating random test data for a case class.

Recently, I have played some more with Shapeless, this time with the goal of generating React (javascript) components for case classes. It was a very similar exercise, but this time I made use of the LabelledGeneric object, so I could access the field names - so I thought I'd re-visit here and talk a bit about some of the internals of what is going on.


Getting started

As before, I had to define implicits for the simple types I wanted to be able to handle, and the starting point is of course accepting a case class as input.
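The snippet isn't reproduced here, but based on the description that follows, the entry point looked roughly like this (a simplified sketch: ComponentGenerator and its String output stand in for the project's real React-generation types, and the third, React-specific implicit mentioned below is omitted):

import shapeless.{HList, LabelledGeneric}

// simplified sketch - in the real project ComponentGenerator produces a React
// component definition rather than a plain String
trait ComponentGenerator[A] {
  def generate(value: A): String
}

object ComponentGenerator {
  // accepts any case class A for which a LabelledGeneric exists, and for whose
  // HList representation Repr we also have a ComponentGenerator
  implicit def caseClassToGenerator[A, Repr <: HList](
      implicit generic: LabelledGeneric.Aux[A, Repr],
      gen: ComponentGenerator[Repr]): ComponentGenerator[A] =
    new ComponentGenerator[A] {
      def generate(value: A): String = gen.generate(generic.to(value))
    }
}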

So there are a few interesting things going on here:

First of all, the method is parameterised with two types: caseClassToGenerator[A, Repr <: HList]. A is simply going to be our case class type, and Repr is going to be a Shapeless HList.

Next up, we are expecting several implicit method arguments (we will ignore the third implicit for now - that is an implicit that I am using purely for the React side of things, and can be skipped if the method handles everything itself):

implicit generic: LabelledGeneric.Aux[A, Repr], gen: ComponentGenerator[Repr],

Now, as this method's purpose is to handle the input of a case class, and as we are using Shapeless, we want to make sure that from that starting input we can transform it into a HList so we can then start dealing with the fields one by one (in other words, this is the first step in converting a case class to a generic list that we can then handle element by element). In this setting, the second implicit argument is asking the compiler to check that we have also defined an appropriate ComponentGenerator (this is my custom type for generating React components) that can handle the generic HList representation (it's no good being able to convert the case class to its generic representation if we then have no means to actually process a generic HList).

Straightforward so far?

The first implicit argument is a bit more interesting. Functionally, all LabelledGeneric.Aux[A, Repr] is doing is asking the compiler to make sure we have an implicit LabelledGeneric instance that can handle converting between our parameter A (the case class input type) and Repr (the HList representation). This implicit means that if we try to pass some type A to this method, the compiler will check that we have a Shapeless LabelledGeneric that can handle it - if not, we will get a compile error.

But things get more interesting if we look more at what the .Aux is doing!


Path dependent types & the Aux pattern

The best way to work out what is going on is to just jump into the Shapeless code and have a dig. I will use Generic as an example, as it's a simpler case, but it's the same for LabelledGeneric:
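(Abridged, the relevant part of the trait looks like this:)

trait Generic[T] {
  type Repr
  def to(t: T): Repr
  def from(r: Repr): T
}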

That's a lot simpler than I expected to find, to be honest, but as you can see from the above, there are two types involved: the trait parameter T and the inner type Repr, and the Generic trait is just concerned with converting between these two types.

The inner type, Repr, is what is called a path dependent type in Scala. That is, the type is dependent on the actual instance of the enclosing trait or class. This is a powerful mechanism in Scala (but one that can also catch you out, if you are in the habit of defining classes etc. within other classes or traits). This is an important detail for our Generic here: it could be given any parameter T, so the corresponding HList could be anything, but this makes sure it must match the given case class T - that is, the Repr is dependent on what T is.

To try and get our head around it, let's take a look at an example:
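(A minimal sketch, using a made-up Person case class, showing the conversion and the resulting Repr type:)

import shapeless.Generic

case class Person(name: String, age: Int)

val gen = Generic[Person]
// gen.Repr - the path dependent type - is String :: Int :: HNil

val repr = gen.to(Person("Sally", 35))    // "Sally" :: 35 :: HNil
val person = gen.from(repr)               // Person("Sally", 35)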

Cool - so, as we expected, in our Generic example the type Repr has been defined to match the HList representation of our case class. It makes sense that we want the transformed output HList to have its own, specific type (based on whatever input it was transforming), but it would be a real pain to have to actually define that as a type parameter on the class along with our case class type, so it uses this path-dependent type approach.

So, we still haven't got any closer to what this Aux is actually doing, so let's dig into that..


The Aux Pattern

We can see from our code that the Aux type takes two type parameters: firstly A, which is the parameter that our Generic will take; but it also takes the parameter Repr - which we know (or at least pretend we can guess) corresponds to the path dependent type that is defined nested inside the Generic trait.

The best way to work out what is going on is to take a look at the Shapeless code!
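(The relevant part of the Generic companion object, abridged:)

object Generic {
  type Aux[T, Repr0] = Generic[T] { type Repr = Repr0 }

  def apply[T](implicit gen: Generic[T]): Aux[T, gen.Repr] = gen
}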

As we can see, the Aux type (this is defined within the Generic object) is just an alias for a Generic[T], where the inner path-dependent type is defined as Repr - they have a pretty decent explanation of what is going on in the comments, so I will reproduce that here:

(that is abbreviated for the more relevant bits - they have even more detail in the comments that can be read).

That pretty nicely sums up the Aux pattern - it allows us to essentially promote the result of a type-level computation (the path dependent type) up to a type parameter where we can refer to it. It can be used for a variety of other things where we want to reason about path dependent types, but this is a common use for the pattern.



So that's all I wanted to get into for now - you can see the code here, and hopefully with this overview, and the earlier Shapeless overview, you can get an understanding of what the LabelledGeneric stuff is doing and how Shapeless is helping me generate React components.

Generic programming in Scala with Shapeless

In my last post about evolutionary computing, I mentioned I started building the project partly just so I could have a play with Shapeless. Shapeless is a generic programming library for Scala, which is growing in popularity - but can be fairly complicated to get started with.

I had used Shapeless from a distance up until this point - using libraries like Circe or Spray-Json-Shapeless that use Shapeless under the hood to do stuff like JSON de/serialisation without the boilerplate overhead of having to define serialisers for all the different case classes/sealed traits that we want to serialise.

You can read the previous post for more details (and the code) if you want to understand exactly what the goal was, but to simplify: the first thing I needed to achieve was, for a given case class, to generate a brand new instance.

Now I wanted the client of the library to be able to pass in any case class they so wished, so I needed to be able to generically inspect all attributes of any given case class* and then decide how to generate a new instance for it.

Thinking about this problem in traditional JVM terms, you may be thinking you could use reflection - which is certainly one option, although some people still have a strong dislike for reflection on the JVM, and it also has the downside that it's not type safe - that is, if someone passes in some attribute that you don't want to support (a DateTime, UUID, etc.) then it would have to be handled at runtime (which also means more exception handling code, as well as losing the compile time safety). Likewise, you could just ask clients of the code to pass in a Map of the attributes, which dodges the need for reflection but still has no type safety either.



Enter Shapeless

Shapeless allows us to represent case classes and sealed traits in a generic structure, that can be easily inspected, transformed and put into new classes.

A case class, product and HList walk into a bar

A case class can be thought of as a product (in the functional programming, algebraic data type sense), that is, case class Person(name: String, age: Int, living: Boolean) is the product of name AND age AND living - that is, every time you have an instance of that case class you will always have an instance of name AND age AND living. Great. As well as being a product, the case class in this example also carries semantic meaning in the type itself - that is, for this instance, as well as having these three attributes we also know an additional piece of information: that this is a Person. This is of course super helpful, and kinda central to the idea of a type system!

But maybe sometimes we want to be able to just generically (but still type-safely!) say: I have some code that doesn't care about the semantics of what something is, but if it has three attributes of String, Int, Boolean, then I can handle it - and that's what Shapeless provides. It provides a structure called a HList (a Heterogeneous List) that allows you to define a type as a list of different types, but in a compile-time-checked way.

Rather than a standard List in Scala (or Java), whereby we define the single type that the list will contain, with a HList we can give each element of the list its own specific type, for example:
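(A sketch - the values are just for illustration:)

import shapeless.{::, HNil}

val hlist: String :: Int :: Boolean :: HNil = "Tim" :: 32 :: true :: HNil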
The above example allows us to define our HList with specific types, which provides our compile time safety - the :: notation is some syntactic sugar provided by Shapeless to mirror normal Scala list behaviour, and HNil is the type for an empty HList.

Case class to HList and back again

Ok, cool - we have a way to explicitly define very specific heterogeneous lists of types - how can this help us? We might not want to be explicitly defining these lists everywhere. Once again Shapeless steps in and provides a type class called Generic.

The Generic type class allows conversion of a case class to a HList, and back again, for example:
Through compile time macros, Shapeless provides us these Generic type classes for any case class. We can then use these Generic classes to convert to and from case classes and HLists:
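(A sketch of what that looks like - the case class and values are illustrative:)

import shapeless.Generic

case class Person(name: String, age: Int, living: Boolean)

val personGen = Generic[Person]

val repr = personGen.to(Person("Tim", 32, true))          // "Tim" :: 32 :: true :: HNil
val person = personGen.from("Sam" :: 28 :: true :: HNil)  // Person("Sam", 28, true)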



Back to the problem at hand

So, we now have the means to convert any given case class to a HList (that can be inspected, traversed etc) - which is great, as this means we can allow our client to pass in any case class and we just need to be able to handle the generation of/from a HList.

To start with, we will define our type class for generation:
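(A sketch, simplified from the real project code:)

trait Generator[A] {
  def generate: A
}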

Now, in the companion object we will define a helper method, that can be called simply without needing anything passed into scope, and the necessary Generator instance will be provided implicitly (and if there aren't any appropriate implicits, then we get our compile time error, which is our desired behaviour)
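(Something along these lines:)

object Generator {
  // summons whatever implicit Generator[A] is in scope and uses it
  def generate[A](implicit gen: Generator[A]): A = gen.generate
}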

As you can see, we can just call the helper method with an appropriate type argument and as long as there is an implicit Generator in scope for that type, then it will invoke that generate method and *hopefully* generate a new instance of that type.

Next, we need to define some specific generators - for now, I will just add the implicits to support some basic types:
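(A sketch - the exact set of supported types in the original differs slightly:)

import scala.util.Random

// added to the Generator companion object alongside the helper above, so they
// are always in implicit scope
implicit val intGenerator: Generator[Int] = new Generator[Int] {
  def generate: Int = Random.nextInt(100)
}

implicit val doubleGenerator: Generator[Double] = new Generator[Double] {
  def generate: Double = Random.nextDouble()
}

implicit val booleanGenerator: Generator[Boolean] = new Generator[Boolean] {
  def generate: Boolean = Random.nextBoolean()
}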
All simple so far - if we call Generator.generate[Int] then it gets the implicit Generator[Int] and generates a random int (not that useful, perhaps, but it works!)

Now comes the part where our generate method can handle a case class being passed in, and subsequently the Shapeless HList representation of a case class (we will recursively traverse the HList structure using the appropriate implicit generators). First up, let's look at how we can implicitly handle a case class being passed in - at first this may start to look daunting, but it's really not that bad.. honestly:
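(A sketch matching the description below - again this would sit in the Generator companion object so it is found implicitly:)

import shapeless.{Generic, HList}

// for any case class T: convert it to its HList representation L, generate a
// new L, then convert back to a brand new T
implicit def caseClassGenerator[T, L <: HList](
    implicit generic: Generic.Aux[T, L],
    lGen: Generator[L]): Generator[T] =
  new Generator[T] {
    def generate: T = generic.from(lGen.generate)
  }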
So, what is going on there? The implicit is bound by two types: [T, L <: HList] - which are then used in the implicit arguments passed into the method, implicit generic: Generic.Aux[T, L] and lGen: Generator[L]. This matches any type T for which there exists in scope a Shapeless Generic instance that can convert it into some HList L, and for which an implicit Generator[L] exists. Now, we know from our earlier look at Shapeless that it will provide a Generic instance for any case class (and case classes will be converted into a HList), so this implicit will be used in the scenario whereby we pass in a case class instance, as long as we have an implicit Generator in scope that can handle a HList. So let's add that next!

Let's start with the base case: as we saw earlier, HNil is the type for an empty HList, so let's start with that one:
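(Roughly:)

import shapeless.HNil

// generating an "empty" HList is trivial - there is only one value it can be
implicit val hnilGenerator: Generator[HNil] = new Generator[HNil] {
  def generate: HNil = HNil
}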
Remember, the goal is, from a given case class (and therefore a given HList), to generate a new one, so for an empty HList we just want to return the same.

Next we need to handle a generator for a non-empty HList:
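(A sketch:)

import shapeless.{::, HList}

// recursive case: generate the head, generate the tail, and cons them back together
implicit def hconsGenerator[H, T <: HList](
    implicit hGen: Generator[H],
    tGen: Generator[T]): Generator[H :: T] =
  new Generator[H :: T] {
    def generate: H :: T = hGen.generate :: tGen.generate
  }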
So this case is also relatively simple: because our HList is essentially like a linked list, we will handle it recursively - the type parameters [H, T <: HList] represent the type of the head of the HList (probably a simple type, in which case it will try to resolve the implicit using our original Int/Boolean/Double generators) and the tail of the HList, which will be another HList (possibly HNil if we are at the end of the list, otherwise this will resolve recursively to this generator again). We grab the implicit Generator for the head and the tail, then simply call generate on both, and that's really all there is to it! We have defined the implicit for the top level case class (which converts it to the generic HList and then calls generate on that), we have the implicit to recursively traverse the HList structure, and finally we have the implicits to handle the simple types that might be in our case class.

Now we can simply define a case class, pass it to our helper method and it will generate a brand new instance for us:
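(For example - the Config case class here is just an illustration, using only the basic types supported above:)

case class Config(threshold: Double, retries: Int, enabled: Boolean)

val config: Config = Generator.generate[Config]   // a brand new, randomly generated Config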

I was using this approach to generate candidates for my evolutionary algorithm, however the exact same approach could be used easily to generate test data (or much more). I used the same approach documented here to later transform instances of case classes as well.


Here is the completed code put together:


Footnote

I have mentioned so far that Shapeless provides the means to transform case classes and sealed traits to generic structures, but have only really talked about case classes. The sealed trait is a slightly different shape to a case class (it is analogous to the algebraic data type co-product) and accordingly, Shapeless has a different structure to handle it, rather than a HList it has a type called Coproduct. The above code can be extended to also handle sealed traits, to do so you just need to add the implicit to handle the empty coproduct (CNil) and recursively handle the Coproduct.

Intro to Genetic Algorithms in Scala

This is intended to be quite a simple high level approach to using some of these ideas and techniques for searching solution spaces. As part of my dissertation back when I was in university, I used this approach to find optimal configurations for Neural Networks. More recently, I have applied this approach to finding good configurations for training an OpenNLP model, with pretty good results - which is really what made me think to write a little about the topic.

So what is it?

Genetic algorithms are a subset of evolutionary computing that borrow techniques from classic evolution theories to help find solutions to problems within a large search space.

So, what does that mean? It sounds a little like a bunch of academic rhetoric that makes nothing any clearer, right? But actually, the concept is quite simple and works like the evolutionary theory of "survival of the fittest" - that is, you generate a large set of random possible solutions to the problem and see how they all perform at solving it, and from there you "evolve" the solutions, allowing the best solutions to progress and evolve while cutting out the worst performing ones.

How it works

In practical terms, at a high level, it involves the following steps:
  1. Create an encoded representation for the problem solution - This might sound complicated, but this just means capture all the possible inputs or configuration for a solution to the problem.

  2. Randomly generate a set of possible solutions - that is, a set of configurations/inputs (this is called a "population")

  3. Assess the performance of these solutions - to do this, you need a concept of "fitness" (how well a solution performs) - and rank the population

  4. Evolve the population of solutions - this can involve a range of techniques of varying complexity, often keeping the top n candidates, dropping the bottom m candidates and then evolving/mutating the rest of the new population

  5. Repeat the rank & evolve cycle (depending on how it performs)


To understand it, let's first consider a simple, hypothetical numeric problem: let's say you have some function f(x, y, z) that returns an integer, and we want to find a combination of x, y and z that gives a value closest to 10,000 (e.g. you want a cuboid with a volume of 10,000 and you want to find dimensions for the width, depth and height that achieve this - like I said, this is just hypothetical). The first step would be to encode the solution, which is pretty simple - we have three inputs, so our candidate looks like this:

case class Candidate(x: Int, y: Int, z: Int)

We will also need a fitness function that will assess how a candidate performs (really, this will just be evaluation of the inputs against our function f):

def fitness(candidate: Candidate): Int = Math.abs(10000 - (candidate.x * candidate.y * candidate.z))


Now we have a representation of the solution, in generic terms, and we have a function that can measure how well a candidate performs, so we can randomly generate a population - the size of the population can be chosen as suits: the bigger the population, the longer an evolution cycle will take, but of course it gives you a wider gene pool for potential solutions, which improves your chances of finding a better solution.

Once we have our initial population, we can evaluate them all against our fitness functions and rank them - from there, we can evolve!

Survival of the fit, only the strong survive

So how do we evolve our candidates? There are a few techniques, we will look at the simplest here:

  • Survival of the fittest - the simple and sensible approach that the top performing candidate in your population always survives as is (in the scenario where by we have stumbled upon the optimal solution in the first random generation, we want to make sure this is preserved)

  • Mutation - much like you might imagine genes get mutated in normal evolution, we can take a number of candidates and slightly adjust the values in each - for numeric "genes" it's easy, we can just adjust the number; a simple, popular technique for doing this is Gaussian mutation, which ensures that in most cases the value will be close to the original, but there is a chance that it could be a larger deviation (it's a bell curve distribution, whereby the peak of the bell curve is the original value).

  • Cross breeding - again, like normal reproduction, randomly select two parent candidates (probably from the top x% of the candidate pool) and take some attributes from each parent to make up a child

Between mutation and cross breeding you can create a new pool of candidates and continue the search for a solution.
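To make the mutation and cross-breeding ideas concrete, here is a minimal sketch for the Candidate case class above (the standard deviation and the choice of which attributes to take from each parent are arbitrary):

import scala.util.Random

def mutate(c: Candidate, stdDev: Double = 5.0): Candidate =
  Candidate(
    c.x + (Random.nextGaussian() * stdDev).round.toInt,   // Gaussian mutation: mostly small tweaks
    c.y + (Random.nextGaussian() * stdDev).round.toInt,
    c.z + (Random.nextGaussian() * stdDev).round.toInt)

def crossBreed(mother: Candidate, father: Candidate): Candidate =
  Candidate(mother.x, father.y, if (Random.nextBoolean()) mother.z else father.z)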

Some code

Partly because I wanted to learn a bit more about the Shapeless library (more of that in another post) and partly for fun, I created a simple GitHub project with some code that has very simple evolutionary search machinery that allows you to try and use these techniques.

The shapeless stuff was to support generic encoding of our solution - so the library can easily be used to define a solution representation and a fitness function.

Following on from the example above to find three dimensions that have a volume of 10,000 I have the following configuration:
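(The real configuration lives in the GitHub project - the sketch below is only to give the shape of it, and the Gene wrapper and its value accessor are hypothetical stand-ins rather than the library's actual API:)

// hypothetical sketch only - see the GitHub repo for the real types
case class ThreeNumbers(x: Gene[Int], y: Gene[Int], z: Gene[Int])

val fitness: ThreeNumbers => Double =
  c => Math.abs(10000 - (c.x.value * c.y.value * c.z.value)).toDouble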
In the above, we first define the case class ThreeNumbers - this can be any case class you like, but must have Gene[A] as its attributes - these are custom types that I created that have defined generation and mutation methods, and have additional configuration (such as max and min values for numbers etc).  We then pass all the configuration in, along with a fitness function of the form (a: B => Double) and run it.

It varies on each run, as the initial pool of candidates is generated at random, but in most cases this example solves the problem (finds an error rate of zero) within 10 generations:



As mentioned, I have used this approach to train OpenNLP NER models - the different training data generation parameters and the training configuration were all encoded as a candidate class, and I then used this method to evolve and find the best training configuration for named entity recognition - but it can be applied to all sorts of problems!

Feel free to try it out and let me know if it works well for any problem domains - you can also checkout the code and make it more sophisticated in terms of evolution/mutation techniques and introducing a learning rate.

Using OpenNLP for Named-Entity-Recognition in Scala

A common challenge in Natural Language Processing (NLP) is Named Entity Recognition (NER) - this is the process of extracting specific pieces of data from a body of text, commonly people, places and organisations (for example trying to extract the name of all people mentioned in a wikipedia article). NER is a problem that has been tackled many times over the evolution of NLP, from dictionary based, to rule based, to statistical models and more recently using Neural Nets to solve the problem.

Whilst there have been recent attempts to crack the problem without it, the crux of the issue is really that for an approach to learn it needs a large corpus of marked-up training data (there are some marked-up corpora available, but the problem is still quite domain specific, so training on the WSJ data might not perform particularly well against your domain-specific data), and finding a set of 100,000 marked-up sentences is no easy feat. There are some approaches that can be used to tackle this by generating training data - but it can be hard to generate truly representative data, and so this approach always risks over-fitting to the generated data.

Having previously looked at Stanford's NLP library for some sentiment analysis, this time I am looking at using the OpenNLP library. Stanford's library is often referred to as the benchmark for several NLP problems; however, these benchmarks are always against the data it is trained for - so out of the box we likely won't get amazing results against a custom dataset. Further to this, the Stanford library is licensed under the GPL, which makes it harder to use in any kind of commercial/startup setting. The OpenNLP library has been around for several years, and one of its strengths is its API - it's pretty well documented, easy to get up and running with, and very extendable.


Training a custom NER

Once again, for this exercise we are going back to the BBC recipe archive for the source data - we are going to try and train an OpenNLP model that can identify ingredients.

To train the model we need some example sentences - they recommend at least 15,000 marked up sentences to train a model - so for this I annotated a bunch of the recipe steps and ended up with somewhere in the region of about 45,000 sentences.
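(The sentences use OpenNLP's standard markup format - a couple of illustrative lines, not from the real training set:)

Melt the <START:ingredient> butter <END> in a pan and add the <START:ingredient> onion <END> .
Stir in the <START:ingredient> sugar <END> and cook until golden .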

As you can see in the above example, the marked-up sentences are quite straightforward. We just wrap the ingredient in the tags as above (although note that if the word itself isn't padded by a space on either side inside the tags, it will fail!).

Once we have our training data, we can just easily setup some code to feed it in and train our model:
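(Something along these lines - the exact method signatures vary a little between OpenNLP versions; this is roughly the 1.5.x-era API:)

import java.io.{BufferedOutputStream, FileInputStream, FileOutputStream}
import java.nio.charset.StandardCharsets
import java.util.Collections

import opennlp.tools.namefind.{NameFinderME, NameSampleDataStream, TokenNameFinderModel}
import opennlp.tools.util.{PlainTextByLineStream, TrainingParameters}

// stream the marked-up training sentences in
val lineStream = new PlainTextByLineStream(new FileInputStream("ingredient-training-data.txt"), StandardCharsets.UTF_8)
val sampleStream = new NameSampleDataStream(lineStream)

// train a maximum entropy model for the "ingredient" entity type
val model: TokenNameFinderModel = NameFinderME.train(
  "en", "ingredient", sampleStream, TrainingParameters.defaultParams(),
  null, Collections.emptyMap[String, AnyRef]())

// write the model to disk for future use
val out = new BufferedOutputStream(new FileOutputStream("ingredient-model.bin"))
model.serialize(out)
out.close()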

This is a very simple example of how you can do it, not always paying attention to engineering best practices, but you get the idea of what's going on. We are getting an input stream of our training data set, then we instantiate the Maximum Entropy name finder class and ask it to train a model, which we can then write to disk for future use.

When we want to use the model, we can simply load it back into the OpenNLP Name Finder class and use that to parse the input text we want to check:
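(Again roughly - in practice the input text would first go through a tokenizer rather than a hand-written token array:)

import java.io.FileInputStream
import opennlp.tools.namefind.{NameFinderME, TokenNameFinderModel}

val model = new TokenNameFinderModel(new FileInputStream("ingredient-model.bin"))
val nameFinder = new NameFinderME(model)

val tokens = Array("Melt", "the", "butter", "in", "a", "large", "pan")
val spans = nameFinder.find(tokens)   // spans over the tokens identified as ingredients
spans.foreach(span => println(tokens.slice(span.getStart, span.getEnd).mkString(" ")))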

So, once I had created some training data in the required format and trained a model, I wanted to see how well it had actually worked. Obviously, I didn't want to run it against one of the original recipes, as they were used to train the model, so I selected this recipe for rosemary-caramel millionaire shortbread to see how it performed - here are the ingredients it found:

  • butter
  • sugar
  • rosemary
  • caramel
  • shortbread


All in all, pretty good - it missed some ingredients, but given the training data was created in about 20 minutes just manipulating the original recipe set with some Groovy, that's to be expected really, and it did well in not returning false positives.


In conclusion, if you have a decent training set, or have the means to generate some data with a decent range, you can get some pretty good results using the library. As usual, the code for the project is on GitHub (although it is little more than the code shown in this post).

Machine (re)learning: Neural Networks from scratch

Having recently changed roles, I am now in the enviable position of starting to do some work with machine learning (ML) tools. My undergraduate degree was actually in Artificial Intelligence, but that was over a decade ago, which is a long time in computer science in general, let alone the field of Machine Learning and AI which has progressed massively in the last few years.

So needless to say, coming back to it there has been a lot to learn. In the future I will write a bit more about some of the available tools and libraries that exist these days (both expanding on the traditional AI Stanford libraries I have mentioned previously with my tweet sentiment analysis, plus newer frameworks that cover "deep learning").

Anyway, inspired by this post, I thought it would be a fun Sunday night refresher to write my own neural network. The first, and last, time that I wrote a neural network was for my final year dissertation (and that code is long gone), so I was writing from first principles. The rest of this post will be a very straightforward introduction to the ideas and the code for a basic single-layer neural network with a simple sigmoid activation function. I am training and testing on a very simple labeled data set, and with the current configuration it scores 90% on the unseen classification tests. I am by no means any kind of expert in this field, and there are plenty of resources and papers written by people who are inventing this stuff as we speak, so this will be more the musings of someone re-learning this stuff.

As always, all the code is on GitHub (and, as per my change in roles, this time it is all written in Scala).

A brief overview

The basic idea is that a Neural Network (NN) attempts to mimic the parallel architecture of the human brain. The human brain is a massive network of billions of simple neural cells that are interconnected; given a stimulus, they each either "fire" or don't, and it's these firing neural cells and synapses that enable us to learn (excuse my crude explanation, I'm clearly not a biologist..).

So that is what we are trying to build: A connected network of neurons that given some stimuli either "fire" or don't.

A Neuron

Ok, so this seems like a sensible place to start, right? We all know what a network is, so what are these nodes in our network that we are connecting? These are the decision points, and in themselves are incredibly simple. They are basically a function that takes n input values  and multiplies them by a pre-defined weight (per input), adds a bias and then runs it through an activation function (think of this as our fire/don't-fire function):



In terms of our code, this is pretty simple (don't worry about the sigmoid derivative values we are setting, we are just doing this to save time later):
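(A simplified sketch - the ActivationFunction trait it relies on is shown in the next snippet:)

import scala.util.Random

class Neuron(numInputs: Int, activation: ActivationFunction) {

  var bias: Double = Random.nextDouble()
  var weights: Array[Double] = Array.fill(numInputs)(Random.nextDouble())

  // cached from the last activation - used later during back propagation
  var output: Double = 0.0
  var outputDerivative: Double = 0.0

  def activate(inputs: Array[Double]): Double = {
    val weightedSum = inputs.zip(weights).map { case (in, w) => in * w }.sum + bias
    output = activation(weightedSum)
    outputDerivative = activation.derivative(output)
    output
  }
}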

As you can see, the neuron holds the state about the weights of the different inputs (a NN is normally fixed in terms of number of neurons, so once initialised at the start we know that we will get the same number of inputs).

You may have also noticed the first line of the class, where we require an ActivationFunction to be applied - as I mentioned, this is our final processing of the output. In this example I am just using the Sigmoid function:
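(A sketch of the trait and the sigmoid implementation:)

trait ActivationFunction {
  def apply(x: Double): Double
  def derivative(output: Double): Double
}

object Sigmoid extends ActivationFunction {
  // squashes any input into the range (0, 1)
  def apply(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // conveniently, the sigmoid derivative can be written in terms of its own output
  def derivative(output: Double): Double = output * (1.0 - output)
}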

As you can see, it's a pretty simple function. Much like the brain, the neurons are very simple processors, and it's only the combination of them that makes the brain so powerful (or deep learning, for that matter).

The network

Ok, so a simple "shallow" NN has an input layer, a single hidden layer and an output layer (deep NNs will have multiple hidden layers). The network is fully connected between layers - that is to say, each node is connected to every node in the next layer up, and that's it - no neurons are connected to other layers and no connections are missed.



The exact setup of the network is largely problem dependent, but there are some observations:

  1. - The input layer has to correlate to your inputs: if your dataset has two input values - say it contains house price data with the number of rooms and the square footage - then you would have to have two input neurons.

  2. - Similarly, if you were using the NN for a classification problem and you know you have a fixed number of classifications, the output neurons would correlate to that. For example, if you were using the NN to recognise handwritten digits (0-9) then you would likely have 10 output neurons to group those possible outputs.

 The number of hidden neurons, or the number of layers is a lot more problem dependent and is something that needs to be tuned per problem.

If you have ever worked with graph type structures in code before, then setting up a simple network of these neurons is also relatively straight forward given the uniform structure of the network.

The weights

Oh right, yes. So all these neurons are connected, but it's a weighted graph, in CS terms. That is, the connection between each pair of nodes is assigned a weight - as a simple example, if we take the house price dataset, after looking at the data we might determine that the number of rooms is a more significant factor in the end result, so that connection should have a greater weighting than the other input. Now that is not exactly what happens, but in simple terms it's a good way of thinking about it.

Given that the end goal is for the NN to learn from the data, it really doesn't matter too much what we initialise the weights for all the connections to, so at start-up we can just assign random values. Just think of our NN as a newborn baby - it has no idea how important different inputs are, or how important the different neural cells that fire in response to the stimuli are - everything is just firing off all over the place as they slowly start to learn!

So what next?

Ok, so we have our super-simple neurons, that just mimic a single brain cell and we have them connected in a structured but randomly weighted fashion, so how does the network learn? Well, this is the interesting bit..

When it comes to training the network we need a pretty large dataset to allow it to be able to learn enough to start generalising - but at the same time, we don't want to train it on all the data, as we want to hold some back to test it at the end, just to see just how smart it really is.

In AI terms, we are going to be performing "supervised" learning - this just means that we will train it with a dataset where we know the correct answer, so we can make adjustments based on how well (or badly) the network is doing - this is different to "unsupervised" learning, where we have lots of data, but we don't have the right answer for each data point.

Training: Feed forward


The first step is the "feed-forward" step - this is where we grab the first record from our training data set and feed the inputs into our randomly initialised NN; that input is fed through all of the neurons in the NN until we get to the output layer and we have the network's attempt at the answer. As the weighting is all random, this is going to be way off (imagine a toddler's guess the first time they ever see a smart phone).
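(A sketch of the feed-forward step, reusing the Neuron sketched earlier - layers is just the ordered list of the network's layers:)

def feedForward(inputs: Array[Double], layers: List[Array[Neuron]]): Array[Double] =
  layers.foldLeft(inputs) { (layerInputs, layer) =>
    // the outputs of one layer become the inputs to the next
    layer.map(_.activate(layerInputs))
  }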

As you can see, the code is really simple, just iterate through the layers calculating the output for each neuron. Now, as this is supervised learning we also have the expected output for the dataset, and this means we can work out how far off the network is and attempt to adjust the weights in the network so that next time it performs a bit better.

Training: Back propagation

This is where the magic, and the maths, comes in.  The TL;DR overview for this is basically, we know how wrong the network was in its guess, so we attempt to update each of the connection weights based on that error, and to do so we use differentiation to work out the direction to go to minimise the error.

This took a while of me staring at equations and racking my brain trying to remember my A-level maths stuff to get this, and even so, I wouldn't like to have to go back and attempt to explain this to my old maths teacher, but I'm happy I have enough of a grasp of what is going on, having now coded it up.

In a very broad, rough overview, here is what we are going to do now:

  1. - Calculate the squared error of the outputs. That is a fairly simple equation of
    1/2 * (target - output)^2

  2. - From there, we will work out the derivative of this function. A lot of the error calculations we will work with now use differentiation and the derivatives of the result - this takes a little bit of calculus, but the basic reason is relatively straightforward if you think about what differentiation is for.

     
  3. - We work back through the network, calculating the errors for every neuron, combining the error, the derivatives and the weighting (to determine how much a particular connection contributed to an error, the weight also needs to be considered when correcting it).

Once we have adjusted the weight for each connection based on its share of the error, we start again and run the next training data record. Given the variation in the dataset, the next training record might be quite different to the previous one, so the adjustment might be quite different too - after lots (maybe millions) of iterations over the data, incrementally tweaking and adjusting the weights for the different cases, hopefully it starts to converge on something like a reasonable performance.
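As a concrete (and simplified) illustration of steps 1-3 for a single output neuron, assuming the Neuron and Sigmoid sketched earlier and a fixed learning rate:

val learningRate = 0.1

def updateOutputNeuron(neuron: Neuron, inputs: Array[Double], target: Double): Unit = {
  // derivative of 1/2 * (target - output)^2 w.r.t. the output, times the sigmoid derivative
  val delta = (neuron.output - target) * neuron.outputDerivative

  // each weight moves against the gradient, scaled by its input and the learning rate
  neuron.weights = neuron.weights.zip(inputs).map { case (w, in) => w - learningRate * delta * in }
  neuron.bias -= learningRate * delta
}

Hidden-layer neurons are handled similarly, except their delta is the weighted sum of the deltas of the layer above combined with their own derivative - which is the "working back through the network" part.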

Maths fun: Why the derivative of the errors?

So, why is calculating the derivative relevant in adjusting the errors?

If you imagine the network as a function of all the different weights, it's pretty complicated, but if we were to reduce this, for the sake of easier visualisation, to just a 3-d space of possible points (e.g. we have just 3 weights to adjust, and those weights are plotted on a 3-d graph) - now imagine our function plots a graph something like this:


(taken from the wikipedia page on partial derivatives)

The derivative of the function allows us to work out the direction of the slope in the graph from a given point, so with our three current weights (co-ordinates) we can use the derivative to work out the direction in which we need to adjust these weights.

Conclusion

The whole NN in Scala is currently on GitHub, and there isn't much more code than I have included here. I have found it pretty helpful to code it up from scratch and actually have to think about it, and it has been fun seeing it train up to 90% accuracy on the unseen data set I was using (just a simple two-in, two-out dataset, so not that impressive).

Next up I will see how the network performs against the MNIST dataset (the Hello World benchmark for machine learning: the classification of handwritten digits).

Oh, and if you are interested as to why the images look like a dodgy photocopy, they are photos of original diagrams that I included in my final year dissertation in university, just for old time sake!

Spring-Boot & Netflix OSS - An adventure into microservices

Honestly, I still need convincing on microservices.

I can see that they are a compelling argument compared to a monolithic application, but I think I need to get my head around some of the challenges they face - the first one that comes to mind being how to effectively define the microservice boundaries - as it seems to me a lot of the applications I have ever worked with are monolithic because these boundaries are so blurred.


Anyway, I wanted to do some tech stuff, so decided to start building out an application using the microservice architectural pattern and Spring Boot seemed like a good place to get started.

This is very much a work in progress, and I am continuing to progress through different aspects of the application and at the moment there is very little actual code written (in part that is due to the simplicity that Spring-Boot provides).  All code is being kept up to date in GitHub so feel free to have a look at that.


There are lots of great blogs covering this stuff already, so I won't re-cover their work; the following article gives a great write-up of the Netflix OSS and the Spring integration, which is worth reading:

http://callistaenterprise.se/blogg/teknik/2015/04/10/building-microservices-with-spring-cloud-and-netflix-oss-part-1/

(Image from: Building Microservices with Spring-Cloud and Netflix OSS)


Getting started: A service registry - Eureka

One of the first things that is needed is a central Service Registry to allow service discovery - this is not a new concept to microservices and is an approach used by SOA. Straight out of the box, Spring-Boot provides integration with Netflix's OSS application Eureka, which provides this. I opted to have a dedicated application for my registry (code can be seen here) and it really is as simple as adding the relevant dependencies to the build.gradle file, adding an @EnableEurekaServer annotation to our application config, then a simple config file defining the server port/name etc. and it's done! You can just run gradle assemble in that project to build the JAR file, then run java -jar [the new JAR file] and the application will spin up - you should then be able to go to http://localhost:1111 and you will see the Eureka dashboard (with no microservices registered of course).
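For reference, a registry application along these lines is about all there is to it (the class name is illustrative; build.gradle just needs the Eureka server starter dependency):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

@SpringBootApplication
@EnableEurekaServer
public class RegistryApplication {

    public static void main(String[] args) {
        SpringApplication.run(RegistryApplication.class, args);
    }
}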



My first microservice

So, I had Eureka up and running, but it was looking pretty lonely with no services registered.

A microservice in Spring is also very simple - as really, all it is is a simple web application that runs in its own process with a limited domain - so spinning up a Spring Boot MVC RESTful webapp with a single controller/endpoint is enough to get me a microservice (even just a tweet would do it..)
So we can create our new microservice to do anything we like, in my case I created a QuoteService (the application is slowly evolving into an insurance engine).  Just having the standalone app isn't helping much, so we need to add some configuration to tell the service to register with our Eureka server - this will make our new microservice discoverable by other services wanting to use it. 

Again this is quite simple: we need to tell our application it should try and register with Eureka, and we should add the config to do so:
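(Roughly - the class name, service name and port here are illustrative; the property names are the standard Spring Cloud ones:)

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@SpringBootApplication
@EnableDiscoveryClient
public class QuoteServiceApplication {

    public static void main(String[] args) {
        SpringApplication.run(QuoteServiceApplication.class, args);
    }
}

And an application.yml along these lines:

spring:
  application:
    name: quote-service

eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:1111/eureka/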

You can see that we simply annotate our application config in Java, and then add some properties that define where the Eureka server is hosted, and that's basically it.

Now if we build the project JAR and start up again (and we still have our Eureka service registry running) then after 30 seconds or so you should see the Quote-Service registered and ready to use.


On to the next one.. 

Now, we have a microservice, and we have a registry that makes it discoverable, but still - just one microservice is pretty lonely. So next I created another dummy RESTful microservice, this time called ProductService which just followed the same pattern as the first.

Once that was started up, the Eureka dashboard started looking a bit happier with the two services registered - the obvious next challenge is seamless interaction between the two: splitting the services into their own processes is all well and good, but meaningless if you can't easily integrate them. The way I look at it is that when reading the application code of a service (or an application using microservices), it should just look like a normal application with service classes - there shouldn't be any fanfare around the fact that my service class actually gets the data from a dedicated microservice over HTTP/AMQP rather than just getting it directly from the DB in the traditional way.


So, still just stubbing out the endpoints, I updated my QuoteService endpoint to make a call to my ProductService, and then just jammed that response into the JSON response I was returning anyway:
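(A sketch - the endpoint and payload shape here are illustrative, not the exact project code:)

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class QuoteController {

    @Autowired
    private ProductService productService;

    @RequestMapping("/quote")
    public Map<String, Object> getQuote() {
        Map<String, Object> quote = new HashMap<>();
        quote.put("quoteId", UUID.randomUUID().toString());
        quote.put("product", productService.getProduct());   // response from the Product microservice
        return quote;
    }
}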

As you can see, it could be a standard controller in a normal monolithic application from this point, we are just calling a method on our autowired ProductService class and returning that.


So the really interesting part is in the ProductService class - at the moment this isn't a really elegant, abstracted class yet, so there is still some boilerplate, but that will have the advantage of making it clear what is going on:
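(Sketched from the description - depending on the Spring Cloud version you may also need to declare the @LoadBalanced RestTemplate bean yourself:)

import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class ProductService {

    // the "host" is just the Eureka service name - no hard-coded host or port
    private static final String PRODUCT_SERVICE_URL = "http://PRODUCT-SERVICE/product";

    @Autowired
    @LoadBalanced
    private RestTemplate restTemplate;

    public Map getProduct() {
        return restTemplate.getForObject(PRODUCT_SERVICE_URL, Map.class);
    }
}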

As you can see, it's just making a REST call to the Product microservice and returning the response cast as a Map - but the really nice part of this is that the service URL is just the service name (in this case "PRODUCT-SERVICE", which is injected into the class), and with the RestTemplate annotated with Spring-Cloud's @LoadBalanced, that microservice will be looked up in Eureka (and load balanced if there is more than one PRODUCT-SERVICE running).

So our setup is starting to take shape now - we have two microservices, both registered with Eureka and able to interact with each other in a fairly clean, loosely coupled way.


Don't push me, 'cos I'm close to the edge..

As your microservices start to proliferate, you will get different levels of service granularity, and undoubtedly you won't just want to expose all your microservices as a public API.  One option would be to create a RESTful application and define nicely named endpoints you want to expose and then use the standard integration described above to integrate it.

Fortunately, there is an easier way - Netflix provides a library called Zuul that can be simply configured to map URL patterns to given defined service names (and again looks up in the Eureka service registry).  Much like Eureka, this is super easy to setup and just needs an annotation and the config again:
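(A sketch - the class name is illustrative:)

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

@SpringBootApplication
@EnableZuulProxy
public class EdgeServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(EdgeServerApplication.class, args);
    }
}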

And the config is pretty easy to understand:
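(Route names and paths here are illustrative:)

zuul:
  routes:
    quotes:
      path: /quotes/**
      serviceId: quote-service
    products:
      path: /products/**
      serviceId: product-service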

As you can see, we just define service names against URL patterns.

Now once all our apps are up and running, and the microservices are registered on Eureka, you have a single API interface to start interacting with the services (rather than having to access each service on its designated port etc).


Conclusion

So that's as far as I have got - I wired up the QuoteService to MongoDB so the data all gets persisted there (and have added a get-quote endpoint which gets the same data from Mongo), and I am starting to wire up the product service with JPA. So far it's been enjoyable, and things are making more sense than when I started - but there are a few questions still:

  • It seems like there is still duplication of service names throughout the different projects - for example the ProductService name ("product-service" - case insensitive) is proliferated throughout: the service itself defines it, the QuoteService needs to know the name of the service, the Zuul edge server needs to know the name etc. I guess this is unavoidable as these are intrinsic dependencies, but it still seems a bit flaky.
  • It feels like the Service classes could be factored out - our ProductService class that allows HTTP REST interactions with the Product microservice would likely need to be re-used across all applications/microservices that need to use the Product microservice

Android: Building a cloud based quiz application

A long time ago, when Android was still in its infancy (1.5 I think..) I built and open sourced a basic quiz app.  The app was just a basic multiple choice question-answer app, driven from some questions in the database, but it did ok - it has had over 10k downloads on the app store, and the blog post tutorial here is still one of the most popular articles.

But here we are, Android 5.0 is released and the state of Android development is very different now. But I didn't really want to just re-skin/tweak the old app and push it out again - and I also wanted to write up some notes on using parse.com as a backend service - so this seemed like a good opportunity.

The source code for the app is all on GitHub.


The goal

So the aim is to create an Android quiz game app, but rather than using local storage, using the cloud to power the questions. This avoids the need for lots of boilerplate DB code and also makes it easier for us to update questions. The tutorial will be broken into two parts - the first part will be basic quiz stuff with cloud questions, the second part will enhance the app to support user login and to track scores to allow users to compete against each other.


Parse

Before you start the tutorial, you need to get an account set up at parse.com - it's a cloud DB/backend-as-a-service that was recently bought by Facebook. They allow really easy setup of DBs and provide a load of nice libraries/APIs to easily interact with their endpoints across lots of platforms (it's also incredibly well priced - the free tier is really good, and if you find your mobile app is going beyond that then you can probably think about monetising the app!). All you need to do is head over there, sign up and then create a new app - you just give it a name and hey presto! You can either make a note of the keys then, or come back and grab them later. There are no other changes you need to make now, as that will get handled from our source code. The only other thing to do is to download the Parse Android library to make use of their SDK in Android (if you have grabbed the source code from GitHub then you will not have to worry about these).



OK, lets get started!

Again, I am not going to spend a lot of time covering the real basics of Android and only really mention the things of note or that are different from standard application development - hopefully the code & general explanation will be clear enough to get an understanding of what is going on.


Android manifest
First, let's get our AndroidManifest.xml file configured. The only things to note here are the permissions we are setting - we will request internet access and network state permissions. Also worth noting is that I have set the min SDK for my sample app at version 16.
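(The relevant additions are just the two permissions:)

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />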


Our Application class
We will have to create a custom implementation of the Android Application class. This class is instantiated on application startup, and hopefully if you are looking at Android development you should be familiar with this class.  We will do a couple of things in this class:

  1. Register our parse.com application with our secret keys
  2. Initialise the Parse library and our domain objects
  3. Try to fetch all the questions for the quiz and store them for offline usage 
  4. Create a GamePlay object, that will keep track of the state of the current game in progress
First let's look at the Parse setup - this is standard Parse boilerplate and is covered in the Parse docs and sample apps - you just need to add your ID/key here (also note, we have just registered the Parse object class Question - this is our domain object, like a JPA entity etc. - if we add more domain objects they need to be added here too):
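(A sketch of the Application class - the class name is illustrative, and the keys come from your parse.com dashboard:)

import android.app.Application;
import com.parse.Parse;
import com.parse.ParseObject;

public class QuizApplication extends Application {

    @Override
    public void onCreate() {
        super.onCreate();
        Parse.enableLocalDatastore(this);             // needed to "pin" data for offline use
        ParseObject.registerSubclass(Question.class); // register our domain object
        Parse.initialize(this, "YOUR_APPLICATION_ID", "YOUR_CLIENT_KEY");
    }
}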

Next we will make a call to parse.com to fetch the questions from our cloud API - we will save this in the background (make an asynchronous call) and "pin it" to make it available for offline usage. Also note that we do not un-pin existing questions until we have successfully found new ones - that way users should always have questions once they have successfully loaded them the first time.
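(Roughly - QUESTIONS_LABEL is just a constant string used as the pin name:)

ParseQuery<Question> query = ParseQuery.getQuery(Question.class);
query.findInBackground(new FindCallback<Question>() {
    @Override
    public void done(final List<Question> questions, ParseException e) {
        if (e == null && questions != null && !questions.isEmpty()) {
            // only replace the locally pinned questions once we have new ones
            ParseObject.unpinAllInBackground(QUESTIONS_LABEL, new DeleteCallback() {
                @Override
                public void done(ParseException e) {
                    ParseObject.pinAllInBackground(QUESTIONS_LABEL, questions);
                }
            });
        }
    }
});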
Hopefully the above is quite clear - the Parse libraries are quite straightforward to understand - we create a query (typed to Question) and then call findInBackground and implement an on-success handler.


Domain objects: Question
The Parse library provides a nice way to create POJOs to model your domain; if you are familiar with JPA/Hibernate/etc. and the approach of POJOs representing a domain model, it's much like that. Using these classes you can easily query/load/save data from the cloud just by using these objects. You will have spotted that in the query we use in the application class to load all our questions, we just run a plain query with the Question class - this, as you should expect, will just retrieve all Question objects from your cloud table in Parse. The domain models are just annotated POJOs, and you define appropriate getters/setters for the fields you want to include.
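(A sketch of the Question object - the field names match the columns created later in the parse.com dashboard:)

import com.parse.ParseClassName;
import com.parse.ParseObject;

@ParseClassName("Question")
public class Question extends ParseObject {

    public String getQuestion() {
        return getString("question");
    }

    public String getOption1() {
        return getString("option1");
    }

    // ...plus similar getters/setters for the other options and the answer
}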


Welcome activity
Now we are into Android basic stuff really - we have setup parse and fetched the questions for local usage, now we just need a basic menu to start the quiz and some activities to display the questions and the end results.

We will just apply the layout and then implement a method to handle the button clicks. For this tutorial we are skipping the high score stuff and just playing the quiz.
All we need to do is reset the current GamePlay object and grab the questions from the local store (by this point they should be updated from the cloud, so no problem), then kick off the quiz!


Question activity
There is nothing really interesting to see here - it's all on GitHub if you want to take a look in detail (or if you have already downloaded it and are working along) - this is just a standard Android activity that pulls out the question and possible answers and presents them.




This just progresses along fairly simply until it gets to the end of the quiz, and then it presents a simple screen showing the score - all of this can be tweaked/styled etc. - but those are the basics for a cloud-powered multiple choice quiz app!


Creating the questions

All the questions will be stored in the cloud on parse.com - once you have a look at the interface it should be pretty clear - you can easily create the data for the questions either manually or by importing a CSV/JSON file.



You will need to log in to your Parse account, select the quiz app and create a Question class. This will just match the domain POJO we have created. Go to "CORE" then "DATA", select "+ Add Class" and add a custom class called "Question" (this must be exactly the same as the name provided in the POJO annotation for the Question class). Then select "Add Col" and add the columns to match the POJO (question [string], option1 [string], etc.). Once you have the class/table added on Parse, you can simply add data by selecting "Add Row" and manually entering the data, or by using the import function.



Source code

Tech: Functional programming in Groovy

I have recently started the Coursera Functional Programming with Scala course (taught by Martin Odersky - the creator of Scala) - which is actually serving as an introduction to both FP and Scala at the same time, as I have done neither before. The course itself is great; however, trying to watch the videos and take in both the new Scala syntax and the FP concepts at the same time can take a bit of effort.

I wanted to work through some of the core FP concepts in a more familiar context, so am going to apply some of the lessons/principles/exercises in Groovy.


Functions:

If you have done any Groovy programming then you will have come across Groovy functions/closures. As Groovy is dynamically typed (compared to Scala's static typing), you can play it fairly fast and loose.

For example, if we take the square root function that is demonstrated in the Scala course, it is defined as follows:
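(Roughly - the body isn't important here, just the explicitly typed parameter and return value:)

    def sqrt(x: Double): Double = ???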



As you can see, Scala expects the values to be typed (as an aside, you don't actually always need to provide a return type in Scala). But the Groovy function is just:
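(Again just an outline - no parameter or return types in sight:)

    def sqrt = { x -> /* ... */ }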



A Groovy function can be defined and assigned to any variable, thereby allowing it to be passed around as a first-class object. Let's look at the complete solution for calculating the square root of a number (using Newton's method - to compute the square root of "x" we start with an estimate of the square root, "y", and continue to improve the guess by taking the mean of y and x/y). First, in Scala:
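Something along these lines (the helper names such as sqrtIter and isGoodEnough follow the style used in the course, though the exact details may differ):

    def sqrt(x: Double): Double = {
      def isGoodEnough(guess: Double): Boolean =
        math.abs(guess * guess - x) / x < 0.001

      def improve(guess: Double): Double =
        (guess + x / guess) / 2

      def sqrtIter(guess: Double): Double =
        if (isGoodEnough(guess)) guess else sqrtIter(improve(guess))

      sqrtIter(1.0)
    }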


So that's Scala; if we try Groovy, we will see we can achieve pretty much the same thing quite easily:
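A Groovy sketch of the same approach (again, the helper names are just illustrative):

    def sqrt = { x ->
        def isGoodEnough = { guess -> Math.abs(guess * guess - x) / x < 0.001 }
        def improve = { guess -> (guess + x / guess) / 2 }
        def sqrtIter
        sqrtIter = { guess -> isGoodEnough(guess) ? guess : sqrtIter(improve(guess)) }
        sqrtIter(1.0d)
    }

    assert Math.abs(sqrt(2) - 1.4142) < 0.001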


Recursion (and tail-recursion):

As FP avoids having mutable state, the most common approach to solving problems is to break the problem down into simple functions and call them recursively - this avoids having to maintain state whilst iterating through loops; each function call is given its input and produces an output.

If we again consider an example from the Scala course: a simple function that calculates the factorial of a given number.
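In Groovy, a straightforward (non-tail-recursive) version might look like this:

    def factorial
    factorial = { n -> n == 0 ? 1 : n * factorial(n - 1) }

    assert factorial(5) == 120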

This simple function recursively calculates the factorial, continuing to call itself until all numbers down to zero have been considered. As you can see, there is no mutable state - every call to the factorial function simply takes an input and returns an output (the value of n is never changed or re-assigned; it is simply used to calculate output values).

There is a problem here though: as soon as you attempt to calculate the factorial of a sufficiently large number you will encounter a StackOverflowError - this is because on the JVM every time a function is called, a frame is added to the stack, so working recursively it's pretty easy to hit the limit of the stack and run into this problem. The common way to solve this is to use tail recursion. The trick is simply to make the recursive call the last code that is evaluated in the function - normally in FP languages the compiler/interpreter will recognise this pattern and, under the hood, run the code as a loop (if we know the very last piece of code in the function is calling itself, it's really not that different from just having the body of the function inside a loop construct).

In the previous factorial example it might look like the last code to be executed is the recursive call factorial(n - 1) - however, the value of that call is returned to the function and THEN multiplied by n - so the last piece of code to be evaluated is actually n * (the return value of factorial(n - 1)).
Let's have a look at re-writing the function so it is tail-recursive.
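Using an accumulator argument, a sketch of the tail-recursive version looks like this (the 1G default just makes the accumulator a BigInteger so large results don't overflow):

    def factorial
    factorial = { n, acc = 1G -> n == 0 ? acc : factorial(n - 1, n * acc) }

    assert factorial(5) == 120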


Now, using an accumulator, the last code to be evaluated in the function is our recursive call. In many FP languages, including Scala, that is enough - the compiler spots the pattern and optimises it - however, the JVM doesn't perform tail-call optimisation and the Groovy compiler doesn't do it for you either, so you need a slightly clunkier approach in Groovy.
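The trick is Groovy's Closure.trampoline() support - a sketch:

    def factorial
    factorial = { n, acc = 1G ->
        n == 0 ? acc : factorial.trampoline(n - 1, n * acc)
    }
    factorial = factorial.trampoline()

    assert factorial(5) == 120
    factorial(2000)   // fine now - no StackOverflowError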

The use of the trampoline() method means the recursive calls are no longer nested on the stack - under the hood each call just returns a new closure, which is then invoked in a loop - so there should never be a StackOverflowError. It's not as nice as in Scala or other languages, but the support is there so we can continue.


Currying:

This is a bit like partially applying a function - the idea being that you take a generic function and then curry it with some value to make a more specific version of it. For example, let's look at a function that, given values x and y, returns z, which is x percent of y (e.g. given x=10, y=100, it returns 10 percent of 100, z=10).
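In Groovy that might be:

    def percentage = { x, y -> y * x / 100 }

    assert percentage(10, 100) == 10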

The above simple function is a generic mechanism to get a percentage of a value, but if a common use of it were to always calculate 10% of a given value, then rather than write a slightly modified version of the function we can simply curry it.
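Using Groovy's curry() method, that looks like:

    def tenPercent = percentage.curry(10)

    assert tenPercent(250) == 25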

Now, when tenPercent(x) is called, it uses the original percentage() function but curries the value 10 as the first argument. (If you need to curry other argument positions you can also use the rcurry() function to curry the right-most argument, or ncurry() which also takes an argument position - check the Groovy docs on currying for more info.)


Immutability:

Immutability is partially supported in Java with the final keyword (meaning a variable can't be re-assigned after it has been set). Groovy also provides a quick and easy @Immutable annotation that can be added to a class to make it immutable. But really, there is more to avoiding mutable state than just making classes immutable - as we have functions as first-class objects, we can easily assign variables and mutate them within a function - so this is more of a mindset or philosophy that you have to get used to. For example, consider two ways of adding an item to a list:
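(A quick sketch of the two styles - the closure names are just illustrative:)

    // mutating style - appends to the original list in place
    def addItemMutable = { List list, item ->
        list << item
        return list
    }

    // immutable style - returns a new list and leaves the original untouched
    def addItemImmutable = { List list, item -> list + item }

    def original = [1, 2, 3]
    assert addItemImmutable(original, 4) == [1, 2, 3, 4]
    assert original == [1, 2, 3]          // unchanged

    addItemMutable(original, 4)
    assert original == [1, 2, 3, 4]       // the original has been modified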

The first example is probably more like the Groovy/Java code we are used to writing, but it mutates the state of the list - whereas the second approach leaves the original list unchanged.


Map Reduce:

As a final note, there are some functions in FP that are pretty common techniques - the most famous of which these days (in part thanks to Google) is Map-Reduce, but the trio of functions is actually Map, Reduce (also known as Fold) & Filter - you can read more about the functions here (or just google them!), but they correlate pretty nicely to core Groovy methods that you probably already use a lot (assuming you are a Groovy programmer).

Map

map is the easiest of the three to understand. It takes two inputs - a function and a list - and applies the function to every element in the list (you can achieve much the same thing with a list comprehension).
Sound familiar? This is basically the .collect{} function in Groovy.
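For example:

    // map: apply a function to every element of a list
    assert [1, 2, 3].collect { it * 2 } == [2, 4, 6]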

Reduce/Fold

This one is a bit more complicated to describe, but it is the same as the .inject{} function in Groovy.
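For example:

    // reduce/fold: combine the elements into a single result, starting from an initial value
    assert [1, 2, 3, 4].inject(0) { acc, n -> acc + n } == 10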

Filter

And another simple one - filtering a list down to the desired elements; this is Groovy's .findAll{} function.
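For example:

    // filter: keep only the elements that match a predicate
    assert (1..10).findAll { it % 2 == 0 } == [2, 4, 6, 8, 10]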



As I said at the start, I am new to FP and coming from an OO background, but hopefully the above isn't too far from the truth!  As I get further through the Coursera course I will try to post again, maybe with some of the assignments attempted in Groovy to see how it really stands up.


Some useful references: