
Technical aptitude testing & recruiting

Having spent a reasonable amount of time on both sides of the interview table, plus having co-founded NerdAbility, it may come as little surprise that I am pretty opinionated on the topic.

It's a pretty divisive topic, and a lot of people feel quite strongly about it, but here's my opinion anyway..


Tech tests

First up is the question of whether or not potential candidates should have to take some kind of tech test. Personally, I like them - but with a few caveats:

  1. They should be easy. 
    This might sound counterintuitive, but I prefer a relatively simple tech test.  In reality, I'm not convinced that you gain much from the more complex tests - at worst, you probably just get false negatives and end up mistakenly ruling out great candidates.  I have seen tests that take days (unit testing/mocking/designing/coding/reviewing/refactoring etc.) and if anything they put candidates off, and as mentioned probably don't provide much extra information.

    Something nice and simple - let's say a modest estimate of an hour all in - is probably about right.  As ranted about by many folk, famously Jeff Atwood, even a basic FizzBuzz programming test will rule out a lot of people (a minimal sketch of that level of test follows this list).

    Furthermore, when I review test submissions I don't really care whether the code works - what I am really looking at is the coding style and overall approach: class structuring, use of nice libraries/core functionality/data structures, unit tests.  If you have a simple test that should take no more than an hour to code then candidates really don't have any excuse not to make their best effort with how the code is structured, unit tested etc. - so you can set the bar pretty high.
  2. They shouldn't be timed
    Timing the tests just blurs the lines - if you are timing the tests then you have to lower the bar. With a simple, non-timed test, you can say you will rule out people who haven't submitted unit tests, for example, but if you set a time limit then you have to excuse people - and will inevitably find yourself saying things like

    "Well, sure its 200 lines of code all in the main() method, and variable names like 'tmpString', and it probably doesn't work, and it's not unit-tested.. but maybe they were rushed for time.. maybe we should bring them in..? "

    It happens, the bar slips lower and lower, and eventually the test is serving no purpose other than ruling out those people who just can't be bothered.
  3. They should test core technology concepts
    There is no point having tests that cover specific domain knowledge or expect experience beyond core language competencies. Even if your business is in a very specific niche, you will do much better hiring great tech folk if you test core competencies rather than specific libraries/tools/technologies.
  4. They should be done beforehand
    On-site testing adds a different dimension to the test - whether it be on a whiteboard or on a machine, there are other variables that can end up being a distraction. On a whiteboard, candidates can end up worrying over exact method signatures and missing semicolons (but whiteboard is much preferable to actual coding - in my opinion, you should never expect a candidate to bring a laptop, and asking a candidate to use another machine/OS/IDE is also fraught with potential distractions).
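
For reference, the FizzBuzz-level baseline mentioned above - a minimal sketch, one of many possible variants:

public class FizzBuzz {

    // Classic FizzBuzz: multiples of 3 print "Fizz", multiples of 5 print "Buzz",
    // multiples of both print "FizzBuzz", everything else prints the number itself.
    public static String fizzBuzz(int i) {
        if (i % 15 == 0) return "FizzBuzz";
        if (i % 3 == 0) return "Fizz";
        if (i % 5 == 0) return "Buzz";
        return String.valueOf(i);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            System.out.println(fizzBuzz(i));
        }
    }
}

Anything at this level that rules out candidates who can't write a loop and a couple of conditionals is doing its job.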



What's the point?

As mentioned, I am not a big believer in using the tests to really measure whether someone is a great programmer. For me, they serve two simple purposes (well, three actually, but I will come to the third point later):
  1. They can be bothered. They are actually interested enough in the role and the company to invest their own time and energy. This will rule out a few people who are just machine-gunning resumes out to lots of companies blindly, or those who are just after some interview practice.
  2. They have demonstrated an understanding of core technology approaches/patterns - simple things like single responsibility, unit testing (use of good assertions, sensible test cases, messaging etc.), class organisation show that they have actually spent a reasonable amount of time programming and keep up with technology.  A nice little example of this, that I like to see, is use of Java's Collections/Arrays convenience methods (assuming we are testing Java!) - Arrays.asList( "1", "2", "3" ) makes declaration of explicit lists easier (nice for testing etc.) and shows a knowledge of core Java stuff (see the snippet below).
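
For example, a hypothetical JUnit fragment along these lines (WordSplitter is a made-up class - the point is just the concise declaration of expected values):

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;

public class WordSplitterTest {

    // WordSplitter is a hypothetical class under test - Arrays.asList keeps
    // the expected values nice and concise.
    @Test
    public void splitsWordsOnWhitespace() {
        List<String> expected = Arrays.asList("first", "second", "third");
        assertEquals(expected, new WordSplitter().split("first second third"));
    }
}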


So all we know so far is that they can be bothered, and that they have a good understanding of (and interest in) program design/architecture - not much that indicates whether or not they are an awesome developer.



The interview

This is where the test really comes into its own.  I think using the test to drive the tech interview is really a great way to go.

You can walk through the code and ask about design decisions, and with a little extra thought you can easily push into variations of the test - how would they handle other constraints? - and continue to push through varying degrees of complexity. If you are consistent with your tests you get a consistent sliding scale to compare candidates - you know exactly what point each candidate got to on the scale of questions. See this article as a great example of this technique more generally (it's also fun/good practice to try working through the problems the author is asking before reading the answers, to see how far you get).

This approach of starting very easy and working up a sliding scale of difficulty is a common practice used by big co's like Google et al.



An example

Here's an approach I quite like:

Tech test: the problem

Given a webpage address, find the most common word on the page.

This tests core technology concepts - things like good usage of a HashMap (or similar data structure) for counting words - and is extensive enough to need some proper unit testing, but small enough to be completed relatively quickly.
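
As a rough illustration, a minimal sketch of one possible solution shape - no HTML stripping, punctuation handling or error handling, which are exactly the kinds of detail the follow-up conversation can dig into:

import java.io.IOException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class MostCommonWord {

    // Returns the most common whitespace-separated word on the page.
    // A sketch only - it treats the raw page source as plain text.
    public static String mostCommonWord(URL page) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(page.openStream())) {
            while (scanner.hasNext()) {
                String word = scanner.next().toLowerCase();
                Integer count = counts.get(word);
                counts.put(word, count == null ? 1 : count + 1);
            }
        }
        String best = null;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (best == null || entry.getValue() > counts.get(best)) {
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mostCommonWord(new URL("https://example.com")));
    }
}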


Interview: followup

There are a few follow-up questions that can be used to further probe the candidate's understanding:
  1. What would you need to change if I wanted the most common word on a whole website (e.g. Wikipedia)?  This can go into how to crawl webpages and potential pitfalls, if that is a relevant area, but otherwise can go into the challenges around the amount of information that needs to be stored - e.g. if you have limited memory, how can you store the info?
  2. If I wanted the top 5 most common words, how would you change it?  This is interesting as there are a variety of solutions, and unless they go straight for the optimal solution you can keep asking whether they can think of a better one. 

    For example, they might just keep track of the top 5 words during the counting, which is pretty efficient, but less flexible when top 5 becomes top X words.

    Alternatively they may just count all the words, then implement a comparator for the Map entries, sort them all and take the top X - which is flexible but is always going to be O(n log n) performance (comparison-based sorting is at best n log n).

    Another approach is to use a heap (PriorityQueue in Java): heapify the counted set (heapify can be completed in O(n) time), then just take the top X elements from the queue. X is a constant not dependent on the size of the dataset, and pulling those X elements (O(X log n)) is lower order than the linear time to heapify the data upfront, so overall performance is O(n). A sketch of this approach follows this list.

    You can also follow up this question with further questioning about performance and Big-O - if that's something that you think is interesting/relevant for the position - which it might not be..
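
For what it's worth, a rough sketch of that heap-based approach, assuming the words have already been counted into a Map<String, Integer> as in the earlier sketch (the WordCount wrapper is only there so the PriorityQueue(Collection) constructor can do its O(n) heapify, since PriorityQueue has no constructor taking both a Collection and a Comparator):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopWords {

    // Wraps a word count so the PriorityQueue(Collection) constructor can heapify it.
    // Ordered so that the head of the queue is the most common word (a max-heap on count).
    private static class WordCount implements Comparable<WordCount> {
        final String word;
        final int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public int compareTo(WordCount other) {
            return Integer.compare(other.count, this.count); // descending by count
        }
    }

    // Returns the top x most common words given the counts.
    // The PriorityQueue(Collection) constructor heapifies in O(n); pulling x elements
    // is O(x log n), so for a constant x this is O(n) overall.
    public static List<String> topWords(Map<String, Integer> counts, int x) {
        List<WordCount> entries = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            entries.add(new WordCount(entry.getKey(), entry.getValue()));
        }
        PriorityQueue<WordCount> heap = new PriorityQueue<>(entries);
        List<String> top = new ArrayList<>();
        for (int i = 0; i < x && !heap.isEmpty(); i++) {
            top.add(heap.poll().word);
        }
        return top;
    }
}

If the position doesn't care about this level of Big-O detail, the sorted-comparator version is perfectly reasonable too - the value is in hearing the candidate reason about the trade-off.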



Whatever test you choose, as long as you have some sensible and interesting questions to follow up with, I think it makes for a pretty productive process, and in many ways one optimized for the candidate to make the best impression they can.

Tech cheat sheets - Maps

Also sometimes called an associative array, symbol table or dictionary, Maps are collections of key-value pairs and are probably another of the most common data structures you might come across day-to-day.

HashMap

The most commonly used Map implementation in Java is probably the HashMap. The HashMap makes use of the equals() and hashCode() methods on Java's Object API.

The basic premise is that the HashMap has a collection of "buckets", each of which can hold several objects. When an object is added to a HashMap, the hashCode() method on the key is used to select the bucket, and the object is then added to that bucket.  Retrieval is the same process - hashCode() is used to determine the bucket, then the entries in the bucket are inspected and the equals() method is used to determine the match.  In Java's HashMap, the bucket is essentially implemented as a linked list (not a LinkedList - but each Entry<K,V> has a pointer to the next entry).

The obvious implication of this is that performance depends largely on the design of good equals() and hashCode() methods.  For example, if you designed a hashCode() method that always returned the constant 1, every entry would end up in a single bucket. This would be perfectly legal - the Java contract is that if two objects are equal() then they must have the same hashCode(), but two objects with the same hashCode() do not need to be equal() - just slow.


The hashCode() method

Designing a good hash code implementation is very important - for performance (see this stackoverflow discussion on the performance impact of large HashMaps with poor hashCode implementations), but also for correctness: if your hash code is erroneous then your HashMaps might just not work, and you may insert objects into your Map and never be able to retrieve them (if hashCode() doesn't return consistent values, for example, an object could be placed in one bucket on insert, then generate a different hashCode() on retrieval so the lookup checks a different bucket).

If you know the complete key set, and it fits into the int range (hashCode() returns an int), then you could design a perfect hashing algorithm that gives every unique key its own bucket, guaranteeing O(1) time for insert/retrieve. In practice this is quite unlikely, so ideally you want to design for as even a spread across buckets as possible.
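
As an illustration, a fairly typical equals()/hashCode() pair for a simple key class (CustomerKey is a made-up example) - both methods are based on the same fields, so the contract holds and the hash codes get a reasonable spread across buckets:

import java.util.Objects;

// A made-up example key class - the point is that equals() and hashCode()
// are based on the same fields, so equal objects always share a hash code.
public final class CustomerKey {

    private final String region;
    private final long customerId;

    public CustomerKey(String region, long customerId) {
        this.region = region;
        this.customerId = customerId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerKey)) return false;
        CustomerKey other = (CustomerKey) o;
        return customerId == other.customerId && Objects.equals(region, other.region);
    }

    @Override
    public int hashCode() {
        return Objects.hash(region, customerId);
    }
}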


Performance

Due to the dependency on the implementation of the objects used as keys, and on the data set itself, the worst case and best case performance vary considerably.

Search/insert/remove - all these operations suffer the same problem: in the best/average case they can be done in constant O(1) time, but in the worst case (all elements in one bucket) performance drops to linear O(n)

In practice, HashMaps are usually more efficient than search trees and other lookups, which is why they are so commonly used.

Tech cheat sheets - Stacks & queues

A Stack data structure is a Last-In-First-Out (LIFO) list.  Java has a legacy Stack<T> class, but the recommended structure to use is the Deque (an interface with both array-based and linked implementations).

A Queue data structure is simply the opposite, a First-In-First-Out (FIFO) structure. The current Java recommendation is also to use the Deque (normally pronounced "deck", if you were interested, and standing for Double-Ended Queue).

If you have read the discussion of ArrayList vs LinkedList, many of the same considerations apply - but given the common use-patterns of stacks and queues, the trade-offs between the implementations shift a little.


Deque

Java's Deque interface extends the Queue interface, and can be used as either a queue or a stack, offering methods appropriate for either use.
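
For example, a quick sketch of ArrayDeque driven first with the stack-style methods and then with the queue-style methods:

import java.util.ArrayDeque;
import java.util.Deque;

public class DequeExamples {

    public static void main(String[] args) {
        // Used as a stack (LIFO): push/pop both work against the head of the deque
        Deque<String> stack = new ArrayDeque<>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop()); // prints "second"

        // Used as a queue (FIFO): offer adds to the tail, poll removes from the head
        Deque<String> queue = new ArrayDeque<>();
        queue.offer("first");
        queue.offer("second");
        System.out.println(queue.poll()); // prints "first"
    }
}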


ArrayDeque vs LinkedList

As with lists, the array-based implementation (ArrayDeque) is the most popular and, by and large, the most recommended implementation to use.

Based on what we already know about ArrayList and LinkedList, and what we know about stack vs queue behaviour, there would seem to be a natural use for each (e.g. LinkedList looks like a good option for a stack/LIFO - we can easily push by adding new objects to the front of the list and pop by removing from the head of the list, both operations O(1) - compared to the cost of adding to the front of an ArrayList, which requires a lot of copying).

However, the ArrayDeque implementation is backed by a circular array - so no copying is required and adding/removing at either end is constant O(1) - while the LinkedList implementation carries a slight overhead from the extra memory of the "node" objects it creates for each element in the list.






Tech cheat sheets - Lists & arrays

In Java, arrays and lists are ordered collections of non-unique elements. They are probably among the most common data structures you will come across in your day-to-day programming.

Array vs List

In most cases in Java you are more likely to use Lists than arrays - Lists provide more functionality as part of their API than arrays do, so given the option, most people will use a List.

The two most common use cases for an array in Java are:
  1. An array of primitive types - Java generics only support object references, although autoboxing reduces the need for this, as you can still insert int values straight into a List<Integer>, for example (see the snippet after this list).
  2. Micro optimization in performance critical systems
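
A trivial illustration of the autoboxing point (class and variable names are made up):

import java.util.ArrayList;
import java.util.List;

public class PrimitivesExample {

    public static void main(String[] args) {
        // An array of primitives - no object wrappers involved
        int[] primitives = {1, 2, 3};

        // Autoboxing lets you add int values straight into a List<Integer>
        List<Integer> boxed = new ArrayList<>();
        boxed.add(1);                // the int literal is boxed to an Integer automatically
        boxed.add(primitives[2]);    // so is a value read from the primitive array
        System.out.println(boxed);   // prints [1, 3]
    }
}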

In most other cases people generally use Lists, as the List interface offers more convenient functionality and also offers further control over the type of List:


ArrayList

ArrayList is a simple array-based implementation of the List interface.

Performance

add( T item ) - Adding a single element to an ArrayList using this add method just adds the element at the end of the list, which is very cheap - amortized O(1) (occasionally the backing array has to be resized and copied, but this averages out to constant time)

add( int index, T item ) - Adding a single element to an ArrayList at a specified position is less performant, as it needs to copy all elements to the right of the specified position, so this is more expensive and runs in linear time - O(n)

remove( T item ) - Similar to adding at a specified position, this is less performant as it involves an array copy (plus, if we remove a specific object rather than an item at a given position, it potentially needs to inspect every item in the list to find it). Again, linear time - O(n)

set/get( int index ) - This is very cheap in an ArrayList: as it is just backed by an array, the element can be looked up in constant O(1) time


LinkedList

LinkedList in Java is a doubly linked list implementation of the List interface (i.e. every element in the list stores a pointer to the previous & next elements in the list, and these pointers are used to access/traverse the list).

Performance

add( T item ) - Adding a single element to a LinkedList using this add method also just adds the element at the end of the list, the same as ArrayList, which is very cheap - O(1)

add( int index, T item ) - Again, adding at a given position (** using this method!) is more expensive. The insertion itself is cheap in a linked list, as the pointers just need to be adjusted, but you have to traverse the list to find the position, which puts you back into linear time - O(n)

remove( T item ) - Again, this method has the same performance issues as adding at a position above, having to traverse the list to find the element to remove. So again, it runs in linear time - O(n)

set/get( int index ) - As we need to traverse the list to find the position, this potentially has to visit every element in the list, so it also runs in linear time - O(n)


** As you will note from the above summary, ArrayList is better for the get & set methods and has equal performance for the add/remove methods. However, LinkedList does have the benefit of being able to use an Iterator (ListIterator) to add/delete elements in constant O(1) time - e.g. if you are iterating a list and are already at the position you wish to insert/delete then it is very cheap (see the example below, and the JavaDocs)
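
A contrived example of that iterator editing - once the iterator is positioned at the element, the remove and add are constant time for a LinkedList:

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class IteratorEditing {

    public static void main(String[] args) {
        List<String> words = new LinkedList<>(Arrays.asList("a", "b", "c"));

        // Once the iterator is positioned at an element, add/remove via the
        // ListIterator is O(1) for a LinkedList - no traversal or copying needed.
        ListIterator<String> it = words.listIterator();
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove();     // delete "b" in constant time
                it.add("b2");    // insert a replacement in constant time
            }
        }
        System.out.println(words); // prints [a, b2, c]
    }
}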



Conclusion


By and large, for most simple List cases (not considering wanting to use Sets, Queues, Stacks etc.) the most common choice is ArrayList, as it generally offers the best performance.

If you know you need a very large list, and know that you will always be inserting new elements towards the head of the list, then LinkedList may be a better alternative - if you only ever insert at the start of a list, LinkedList will perform in constant O(1) time, whereas ArrayList will perform in O(n) time.

There are further implementations of the List interface in Java, such as Stack and Vector.  I will look at some of those in other posts.