© Russell John Childs, PhD. Date: 2015-03-21. Algorithms for calculating order statistic.
Module K-STATISTIC Specifications
Description: Given an unsigned integer, k, and a set of arrays, S:={A}, the k smallest items in S shall be
found and returned.
Specifications:
K-STATISTIC-1: "A" shall be a sorted array of unsigned, non-duplicative integers.
K-STATISTIC-2: "S" shall be a set of sorted arrays, A.
K-STATISTIC-3: An unsigned integer, "k", shall be within the range .
K-STATISTIC-4: "R" shall be a sorted array of unsigned integers.
K-STATISTIC-5: The k smallest elements, over all arrays, A, in the set S, shall be found and returned in R.
K-STATISTIC-6: T shall be the number of threads allocated to the module.
K-STATISTIC-7: The time-order-of-complexity of this module shall be assessed.
Interface Specifications:
K-STATISTIC-8: This module shall provide the interface:
template<typename Type>
vector<Type> k_statistic(const vector< vector<Type> >& S, unsigned k)
K-STATISTIC-8.1: k_statistic shall take the set, S:={A}, as an argument.
K-STATISTIC-8.2: k_statistic shall take, k, as an argument.
K-STATISTIC-8.3: k_statistic shall take a template parameter, Type, which shall map to an integer.
K-STATISTIC-9: k_statistic shall return the set R.
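For comparison with the methods analysed below, here is a sequential reference for the K-STATISTIC-8 interface. It is a minimal sketch that uses a different technique (a k-way merge with a min-heap), not the author's bucket-sort or merge-sort method. It assumes the arrays in S are sorted as required by K-STATISTIC-1, and it returns fewer than k items if S holds fewer. Its cost is O((|S| + k) log |S|).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Sequential baseline sketch (not the bucket- or merge-sort method of this
// document): k-way merge of the sorted arrays in S using a min-heap.
template<typename Type>
std::vector<Type> k_statistic(const std::vector<std::vector<Type>>& S, unsigned k)
{
    // Heap entry: (value, index of array in S, position within that array)
    using Entry = std::tuple<Type, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Seed the heap with the smallest element of every non-empty array
    for (std::size_t a = 0; a < S.size(); ++a)
    {
        if (!S[a].empty())
        {
            heap.emplace(S[a][0], a, std::size_t(0));
        }
    }

    std::vector<Type> R;
    R.reserve(k);
    // Pop the global minimum until k items are found or S is exhausted,
    // refilling the heap from the array the popped item came from
    while (R.size() < k && !heap.empty())
    {
        const Entry top = heap.top();
        heap.pop();
        const std::size_t a = std::get<1>(top);
        const std::size_t pos = std::get<2>(top) + 1;
        R.push_back(std::get<0>(top));
        if (pos < S[a].size())
        {
            heap.emplace(S[a][pos], a, pos);
        }
    }
    return R;
}
```

Because at most k items are popped, no array is read beyond its first k elements. This is the same property that the bucket-sort method exploits.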
Data Specifications:
K-STATISTIC-10: Each sorted array, A, shall be no larger than 1MB ( bytes).
K-STATISTIC-11: "D" shall be an ordered dataset of unsigned integers.
K-STATISTIC-12: "D" shall be partitioned into the set S:={A} and A shall satisfy K-STATISTIC-10
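K-STATISTIC-10 and K-STATISTIC-12 can be met by cutting D into consecutive chunks. The sketch below shows one possible way to do this and is an assumption on our part. The helper name `partition_dataset` and its byte-budget parameter are hypothetical, and the caller decides whether 1MB means 2^20 or 10^6 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split an ordered dataset D into consecutive arrays A,
// each occupying at most max_bytes of element storage (K-STATISTIC-10/12).
template<typename Type>
std::vector<std::vector<Type>> partition_dataset(const std::vector<Type>& D,
                                                 std::size_t max_bytes)
{
    // Elements per array; at least one so that progress is always made
    const std::size_t per_array = std::max<std::size_t>(1, max_bytes / sizeof(Type));
    std::vector<std::vector<Type>> S;
    for (std::size_t first = 0; first < D.size(); first += per_array)
    {
        const std::size_t last = std::min(D.size(), first + per_array);
        // A contiguous slice of an ordered D is itself sorted (K-STATISTIC-1)
        S.emplace_back(D.begin() + first, D.begin() + last);
    }
    return S;
}
```

For example, `partition_dataset(D, 1u << 20)` reads 1MB as 2^20 bytes. Because D is ordered, every resulting array is sorted. If D has no duplicates, every array is also non-duplicative.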
Module K-STATISTIC Analysis
Multi-threaded Bucket Sort Analysis:
Let B be a bucket sort.
Let N be the number of items to be sorted.
Let M be the number of buckets in B.
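Let $B_i$ be the i-th bucket in B.
Let $N_i$ be the number of items in $B_i$.
Items, $x$, are placed in container, $C_i$, in bucket $B_i$, where $i = \lfloor (x - \ell)(M-1)/(u - \ell) \rfloor$ for keys in the range $[\ell, u]$. Two cases arise: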
(1) C is a vectorised, ordered linked list in which each thread owns $N_i/T$ consecutive nodes, where T is the
thread count, and $x$ is multicast to all threads. Insertions and searches are $O(N_i/T)$, and copying to linear
memory is also $O(N_i/T)$. Thus, for $T \ge N_i$ it is possible to achieve $O(1)$ insertion, search and copy. Thread
overhead may make SIMD a more suitable choice.
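(2) C is a non-vectorised ordered set:
Item insertion/search: $\sum_i \frac{N_i}{N}\, O(\log N_i) = O(\log N - H)$, where H is the
entropy of the bucket occupancy: $H = -\sum_i p_i \log p_i$, with $p_i = N_i/N$ (using $\log N_i = \log N + \log p_i$).
Sort: $O\big(N(\log N - H)\big)$. The entropy of the integers therefore directly affects the cost of the sort. A uniform
distribution has the lowest cost, since $H = \log M$: $O\big(N \log(N/M)\big)$. A $\delta$-function has the highest cost, since $H = 0$: $O(N \log N)$.
With one thread per array: $O\big(k(\log N - H)\big)$,
since each thread needs to process no more than $k$ items from each array.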
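The per-array bound relies on a thread abandoning a sorted array as soon as its next item cannot be among the k smallest found so far. Below is a single-threaded sketch of that rule. It illustrates the rule only and is not the multi-threaded BucketSort from the listing. For brevity it uses a `std::multiset` in place of the buckets, and the function name `k_smallest_early_exit` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical sequential sketch of the early-termination rule: keep the k
// smallest items seen so far and abandon an array as soon as its next item
// cannot enter that set. At most k items are inserted from each sorted array.
template<typename Type>
std::vector<Type> k_smallest_early_exit(const std::vector<std::vector<Type>>& S,
                                        std::size_t k)
{
    if (k == 0) return {};
    std::multiset<Type> best;  // ordered set standing in for the buckets
    for (const auto& A : S)
    {
        for (const Type& x : A)
        {
            // A is sorted, so once x is not smaller than the current k-th
            // smallest, no later item of A can qualify either
            if (best.size() == k && !(x < *best.rbegin())) break;
            best.insert(x);
            // Evict the largest item to keep exactly k candidates
            if (best.size() > k) best.erase(std::prev(best.end()));
        }
    }
    return std::vector<Type>(best.begin(), best.end());
}
```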
Multi-threaded Merge Sort Analysis:
Two sorted arrays are merged according to the following prescription:
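Let $m$ be the value of the median element of the left-hand array, $L$, i.e. $m = L[a]$ with $a = \lfloor n_L/2 \rfloor$, where $n_L$ is the size of $L$.
Let $b$ be the position of $m$ in the right-hand array, $R$, found through a binary search (the first position with $R[b] > m$).
Array $L$ is split about its mid-point $a$: $L \to L[0, a) \,\|\, L[a, n_L)$.
Array $R$ is split about $b$: $R \to R[0, b) \,\|\, R[b, n_R)$, where $n_R$ is the size of $R$.
Sub-arrays are then recombined: $\big(L[0,a), R[0,b)\big)$ and $\big(L[a,n_L), R[b,n_R)\big)$ are merged independently and concatenated, since every item in the first pair is $\le m$ and every item in the second is $\ge m$, as the following diagram depicts: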
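Case 1: The number of threads, $T$, is unbounded. Each bisection level requires one binary search, $O(\log n)$, and there are $\log_2 n$ levels, so the order of complexity for merging two arrays of equal size, $n$, is
given by: $O(\log^2 n)$.
Given $P$ arrays of equal size, $k$, and an infinite number of threads, the arrays may be merged in pairs ($\log_2 P$ rounds) to
give: $O(\log P \cdot \log^2 k)$.
For 1 billion arrays and a k-statistic of 1 billion, $\log_2 10^9 \approx 30$, so this would be of order $30 \times 30^2 \approx 2.7 \times 10^4$ steps (constant factors ignored).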
Case 2: The number of threads, , is finite. The expression for this case is ( threads for merge, for
reduction by pairs, ):
, NB:
.
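The Case 1 bound can be derived roughly as follows. This assumes unlimited threads, so that the two halves produced by each split proceed concurrently and only one branch lies on the critical path. Here $n$ is the size of each array in a pair, and $P$ is the number of arrays, each truncated to $k$ items. Thread creation and synchronisation costs are not included.

```latex
\begin{aligned}
T_{\mathrm{merge}}(n) &= T_{\mathrm{merge}}(n/2) + O(\log n)
  && \text{one binary search for } b \text{ per level} \\
&= \sum_{j=0}^{\log_2 n} O\!\left(\log_2 n - j\right) = O(\log^2 n), \\
T_{\mathrm{total}}(P, k) &= \log_2 P \cdot T_{\mathrm{merge}}(k) = O(\log P \cdot \log^2 k)
  && \text{pairwise reduction in } \log_2 P \text{ rounds}
\end{aligned}
```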
[Figure: threads 0 to 6; synchronisation of pushes to task queue; ordered array.]
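The split-and-recombine prescription can be sketched sequentially. The recursion below is a hypothetical illustration, not the listing's task-queue implementation. It emits the median $m$ directly between the two recursive calls, so each call strictly shrinks the left-hand range and the recursion always terminates. The listing, by contrast, keeps $m$ in the upper left-hand half. The two recursive calls touch disjoint ranges, which is what would allow them to run on separate threads. Here they simply run one after the other.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical sequential sketch of the median-split merge: merges the sorted
// ranges [l_first, l_last) and [r_first, r_last) into out, returning the end
// of the output. Requires random-access iterators for the input ranges.
template<typename It, typename Out>
Out split_merge(It l_first, It l_last, It r_first, It r_last, Out out)
{
    // Base cases: one side exhausted, the other is already sorted
    if (l_first == l_last) return std::copy(r_first, r_last, out);
    if (r_first == r_last) return std::copy(l_first, l_last, out);

    // a: mid-point of the left-hand range; m = *a is its median value
    It a = l_first + (l_last - l_first) / 2;
    // b: first position in the right-hand range holding a value > m
    It b = std::upper_bound(r_first, r_last, *a);

    // Lower pair: L[first, a) and R[first, b) hold only values <= m
    out = split_merge(l_first, a, r_first, b, out);
    // Emit m itself, so each recursive call strictly shrinks the left range
    *out++ = *a;
    // Upper pair: L[a+1, last) holds values >= m, R[b, last) values > m
    return split_merge(std::next(a), l_last, b, r_last, out);
}
```

For example, merging {1, 3, 5, 7, 9} with {2, 3, 4, 10} through `std::back_inserter` yields {1, 2, 3, 3, 4, 5, 7, 9, 10}.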
Results (using: Visual Studio 2013, Intel Quad-Core 2.6 GHz i7-3720QM, 8 GB RAM, Dell Precision M4700 Mobile Workstation):
For 1, 2 and 4 threads, the vectorised bucket sort is around five times faster than successive, single-threaded std::merge operations.
Sadly, the performance decreases with thread-count, suggesting that thread overhead is an issue. Performance-profiling with Intel VTune
Amplifier has not yet been undertaken, so the amount of time spent in locks, RFOs and memory fetches to cache is unknown.
The vectorised merge-sort is very slow. This may, again, be the result of thread-overhead and thread oversubscription.
Overall, it is surmised that this sort of vectorisation is better accomplished through SIMD or FPGAs, where each "thread" performs
relatively few operations and the thread-count can be far higher.
/**
* @brief Algorithms for calculating the Kth order statistic given multiple,
* sorted arrays. This compiles under Visual Studio 2013. As yet,
* code has not been ported to Eclipse and Linux. Code is untested
* but available for review.
* @details This file contains a multi-threaded bucket sort and two methods.
* One of the methods uses the bucket sort to obtain the Kth order
* statistic and the other uses a merge sort whose merge is
* vectorised.
* @author Russell John Childs, PhD.
* @date 2015-03-21
* @copyright Russell John Childs, PhD.
*/
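#include <vector>
#include <set>
#include <queue>
#include <chrono>
#include <thread>
#include <future>
#include <mutex>
#include <atomic>
#include <condition_variable>
#include <random>
#include <iostream>
#include <sstream>
#include <algorithm>
#include <functional>
#include <iterator>
#include <memory>
#include <cmath>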
/**
* @namespace Sort
* @brief The namespace for k statistics sorting
*/
namespace Sort
{
/** @class BucketSort
* @param Type. The type of the element in the array to be sorted
*
* @param BucketCount. The number of buckets for the sort
*
* @param Comp. The comparator used to order elements within each bucket
* (default std::less<Type>). The key extractor, "unsigned pred(const Type&)",
* is passed to sort() and find() instead. It must return a unique unsigned
* integer within [lower, upper] specified in this class's constructor.
*
*/
template< typename Type, unsigned BucketCount, typename Comp=std::less<Type>>
class BucketSort
{
public:
/**
* @param lower : unsigned. The lower value of the range.
*
* @param upper : unsigned. The upper value of the range.
*
* Notes: If an element lies outside the range it is ignored and not
* included in the final, sorted result
*/
BucketSort(unsigned lower, unsigned upper) :
m_lower(lower),
m_upper(upper),
m_normalisation(static_cast<long double>(BucketCount-1) / static_cast<long double>(upper - lower)),
m_size(0)
{
//Reset all the buckets
reset();
}
/**
* No operations specified.
*/
~BucketSort(void)
{
}
/**
* Empties the buckets
*/
void reset(void)
{
m_buckets.clear();
m_buckets.resize(BucketCount);
m_size = 0;
}
/**
* Sorts the passed array.
* @param arr : Type. The fixed array, of type Type, to be sorted
*
* Notes: This is a convenience function for handling fixed arrays.
* It is, at this time, unimplemented and does nothing.
*/
template<int Size, typename Pred>
void sort(Type (&arr) [Size], Pred)
{
//This method to be implemented at a later date.
}
/**
* Sorts the passed container.
* @param arr : const Container&. The container to be sorted.
*
* @param pred : Pred. A function "unsigned func(const Type& elem)" that
* returns the unique unsigned key associated with elem.
* Keys outside [lower, upper] cause elem to be skipped.
*
* Notes: (1) Container must provide size() and const Type& operator[](unsigned).
* (2) The result is returned by Type& operator[](unsigned) and
* get_result(). This allows a succession of arrays to be passed to
* sort() before the result is obtained.
* (3) sort() may be terminated early by pred() returning
* static_cast<unsigned>(-1).
* (4) This implementation uses locks. Conversion to lock-free is
* relatively straightforward, but requires substantially more
* testing, due to subtle bugs that arise with lock-free.
*/
template< typename Container, typename Pred >
void sort(const Container &arr, Pred pred )
{
//Get array size
unsigned size = arr.size();
//Add each source element to sorted list
for (unsigned i = 0; i < size; ++i)
{
unsigned key = pred(arr[i]);
if (m_lower <= key && key <= m_upper)
{
unsigned index = unsigned((key - m_lower)*m_normalisation);
std::lock_guard<std::mutex> lck(m_mutexes[index]);
//Update total element count only if the element was new.
//NB: buckets have separate locks, so m_size must be atomic.
if (m_buckets[index].insert(arr[i]).second)
{
++m_size;
}
}
else if (key == static_cast<unsigned>(-1))
{
//predicate has signalled that sort should terminate
break;
}
}
}
/**
* Returns an element in the sorted array
* @param index : unsigned. The index of the element in the sorted array.
*
* @return Type&. The element at position index.
* Notes: 1. An O(1) version will be provided at a future date. It requires
* that BucketSort utilise a linear, contiguous buffer for its buckets.
* At present this method simply indexes the buffer filled by the
* last call to get_result(), which must therefore be called first.
*/
Type& operator[](unsigned index)
{
return m_result.at(index);
}
/**
* Returns a vector containing the sorted elements
*
* @param k : unsigned. Number of sorted elements to return. Default = All.
* @return : const std::vector<Type>& . The sorted elements
*
*/
const std::vector<Type>& get_result(unsigned k=0)
{
//Get total number of elements
unsigned total = 0;
for (auto& bucket : m_buckets)
{
total += unsigned(bucket.size());
}
//Return all elements if k == 0, otherwise no more than k
unsigned size = (k == 0) ? total : std::min(k, total);
//Resize result vector
m_result.resize( size );
//Store sorted result, walking the buckets in ascending order
unsigned index = 0;
for (auto& bucket : m_buckets)
{
for (auto iter = bucket.begin(); iter != bucket.end() && index < size; ++iter)
{
m_result[index++] = *iter;
}
}
//Return sorted result
return m_result;
}
/**
* Finds a specified element
* @param in : const Type&. The element sought.
*
* @param pred : Pred. Key extractor, as for sort().
*
* @param advance : int. For the specified element, finds the element that
* is "advance" elements before (if advance < 0) or
* after (if advance > 0) the specified element "in".
*
* @return typename std::set<Type, Comp>::iterator . An iterator to the
* element, if present, or end().
*
* Notes: This implementation uses locks. Conversion to lock-free is
* relatively straightforward, but requires substantially more
* testing, due to subtle bugs that arise with lock-free.
*/
template<typename Pred>
typename std::set<Type, Comp>::iterator find(const Type& in,
Pred pred, int advance)
{
//Get bucket bounds and bucket index
int bounds = BucketCount - 1;
unsigned key = pred(in);
int index = unsigned((key - m_lower)*m_normalisation);
//Get beginning and end of the bucket table
std::unique_lock<std::mutex> lck(m_mutexes[0]);
auto begin = m_buckets[0].begin();
lck.unlock();
std::unique_lock<std::mutex> lck1(m_mutexes[bounds]);
auto end = m_buckets[bounds].end();
lck1.unlock();
//Create return var
typename std::set<Type, Comp>::iterator ret_val;
bool is_not_found = index > bounds;
if (is_not_found == false )
{
std::lock_guard<std::mutex> lck(m_mutexes[index]);
ret_val = m_buckets[index].find(in);
if (ret_val == m_buckets[index].end())
{
ret_val = end;
is_not_found = true;
}
}
else
{
//Out of bounds
ret_val = end;
}
//Increment iterator whilst within bounds
while (is_not_found == false && advance > 0)
{
std::unique_lock<std::mutex> lck(m_mutexes[index]);
if (ret_val != m_buckets[index].end() &&
++ret_val != m_buckets[index].end())
{
//Next element lies within the current bucket
--advance;
}
else if (++index <= bounds)
{
//Current bucket exhausted: move to the start of the next bucket,
//which counts as a step only if that bucket is non-empty
lck.unlock();
std::lock_guard<std::mutex> lck1(m_mutexes[index]);
ret_val = m_buckets[index].begin();
if (ret_val != m_buckets[index].end())
{
--advance;
}
}
else
{
//Out-of-bounds
ret_val = end;
is_not_found = true;
}
}
//Decrement iterator whilst within bounds
while (is_not_found == false && advance < 0)
{
std::unique_lock<std::mutex> lck(m_mutexes[index]);
if (ret_val != m_buckets[index].begin())
{
//Decrement if within bounds of current bucket
--ret_val;
++advance;
}
else if (--index >= 0)
{
//If within bounds of table, go to end of prev bucket
lck.unlock();
std::lock_guard<std::mutex> lck1(m_mutexes[index]);
//auto in = m_buckets[index];
//ret_val = in.begin() == in.end() ? in.end() : in.end()--;
ret_val = m_buckets[index].end();
}
else
{
//Out-of-bounds
ret_val = begin;
is_not_found = true;
}
}
return ret_val;
}
/**
* Returns total count of elements in bucket sort
*
* @return unsigned . Total count of elements.
*/
unsigned size(void)
{
return m_size;
}
private:
unsigned m_lower;
unsigned m_upper;
long double m_normalisation;
std::atomic<unsigned> m_size;
std::vector<std::set<Type, Comp>> m_buckets;
std::mutex m_mutexes[BucketCount];
std::vector<Type> m_result;
};
template<typename T= unsigned> using Element=std::pair<unsigned, T>;
template<typename T> using ElementArray = std::vector<Element<T>>;
template<typename T> using ElementArrays = std::vector<ElementArray<T>>;
/** k_statistic
* @param arrays : ElementArrays. The sorted arrays to be merged
*
* @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays".
*
* Notes: (1) At least one input array must have size >= k. If necessary,
* pad the 1st array with values > k-th smallest.
* (2) Function uses bucket sort to keep track of k smallest elements
* so far. For each array, if an element is larger than the largest
* of the k smallest found so far, then all later elements are
* discarded. Otherwise, the set of k smallest elements is updated
* with the new element.
*/
namespace KStatisticBucketSort
{
template< typename T >
const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k)
{
std::atomic<unsigned> last_found;
//Find an array, A, with at least k elements
unsigned index = 0;
while (arrays[index++].size() < k && index < arrays.size());
unsigned upper = last_found = arrays[index - 1][k - 1].first;
//Create a bucket sort shared amongst threads.
//TODO: Remove magic number (2000) and use dynamically sized buckets
struct Comp{
bool operator()(const Element<T>& lhs, const Element<T>& rhs ) const
{
return lhs.first < rhs.first;
}
};
BucketSort<Element<T>, 2000, Comp> bucket_sort(0, upper);
//Add A as the base case
{
auto pred = [&](const Element<T>& elem){ return elem.first; };
bucket_sort.sort(arrays[index - 1], pred);
}
unsigned start_array = index - 1;
//Create a predicate for the bucket sort
auto pred = [&](const Element<T>& elem)
{
//Update last element in list of k smallest found so far
unsigned old_val = last_found; //last of the k smallest
unsigned new_val = old_val; //new val for last of k smallest
bool stop_processing=false; //Flag to continue or discontinue
do
{
old_val = last_found;
new_val = old_val;
stop_processing = false;
if (bucket_sort.size() < k)
{
new_val =std::max<unsigned>( elem.first, old_val );
}
else if (elem.first > old_val)
{
//Simply stop processing array if elem > max(k-smallest)
new_val = old_val;
stop_processing = true;
}
else
{
//Add elem and update max(k-smallest)
auto tmp = [&](const Element<T>& in){ return in.first; };
new_val = std::max<unsigned>(elem.first, bucket_sort.find(
std::make_pair(old_val, elem.second), tmp, -1)->first);
}
} while (last_found.compare_exchange_weak(old_val, new_val) == false
&&
stop_processing == false );
return stop_processing == false ? elem.first : -int(1);
};
//Create a thread function that adds a new array to the bucket sort
index = 0;
std::atomic<bool> start_thread(false);
std::atomic<unsigned> pop_count(0);
auto add_array = [&]( void )
{
//Wait for start signal
while (start_thread == false);
//Loop over arrays, "popping" each one processed: the atomic
//fetch-and-increment claims the next unprocessed array index
for (unsigned claimed = pop_count++; claimed < arrays.size();
claimed = pop_count++)
{
//Skip the starting array, which was added as the base case
if (claimed != start_array)
{
bucket_sort.sort(arrays[claimed], pred);
}
}
};
//Add arrays to the bucket sort, limit number of threads (4=magic num,
//but this is only proof-of-concept code)
unsigned thread_limit = 4;
unsigned thread_count = 0;
std::vector< std::future<void> >results;
for (auto& arr : arrays)
{
//Add array to sort
results.push_back(
std::async(std::launch::async, add_array));
//If thread lim reached:
if (thread_count > thread_limit)
{
//Wait for existing threads to finish
start_thread = true;
for (auto& res : results)
{
res.wait();
}
//Reset limit checks
thread_count = 0;
results.clear();
}
++thread_count;
}
//Start threads and wait for results
start_thread = true;
for (auto& res : results)
{
res.wait();
}
//Extract k-th order statistic from bucket table
return bucket_sort.get_result(k);
}
}
/** k_statistic
* @param arrays : ElementArrays. The sorted arrays to be merged
*
* @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays".
*
* Notes: (1) Function uses merge sort to find smallest elements. The merge is vectorised.
* (2) The implementation of this method is in a STATE OF FLUX.
* Focussed on stress-testing Bucket-Sort algorithm
*/
namespace KStatisticMergeSort
{
template<typename T>
const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k)
{
//std::atomic isn't copy- or move-constructible, so need to wrap it for std::vector
//Very annoying. Naughty C++ committee.
struct AtomicUnsigned
{
AtomicUnsigned(void) : m_val(new std::atomic<unsigned>(0))
{
}
void operator=(unsigned i)
{
*m_val = i;
}
void operator+=(unsigned i)
{
m_val->fetch_add(i);
}
operator unsigned(void)
{
return *m_val;
}
//unique_ptr makes the wrapper movable (for vector::resize) and avoids
//the double delete that copying a raw owning pointer would cause
std::unique_ptr<std::atomic<unsigned>> m_val;
};
//EOF annoying code-bloat
//Create sync id for each queue
std::vector<AtomicUnsigned> sync_id;
//Struct to hold id, used to sync pushes to queue,
//and the boundaries of the left and right array chunks to merge
struct QueueData
{
unsigned m_sync_id;
unsigned m_generations_skipped;
typename ElementArray<T>::const_iterator m_lhs_first;
typename ElementArray<T>::const_iterator m_lhs_last;
typename ElementArray<T>::const_iterator m_rhs_first;
typename ElementArray<T>::const_iterator m_rhs_last;
};
//Create task queues. One queue for each array-pair. Length will halve
//with each iteration: N/2 -> N/4 -> N/8 ... -> 1 merged array
std::vector<std::queue<QueueData>> queues;
//Create array of sorted counts (2*k sorted elems => done!)
std::vector<AtomicUnsigned> sorted_count;
//Create pending task count (1 thread spawned per task up to thr limit)
std::atomic<int> pending_tasks(0);
std::vector<AtomicUnsigned> queue_tasks;
//Create merge function
std::vector<std::mutex> mut(arrays.size()); //This function should be made lock-free
//Lambda performing one bisection step of the merge held in queue i
std::function<void(unsigned)> merge_pair = [&](unsigned i)
{
//Flag to indicate pair is merged
bool done = false;
//Lock queue and peek 1st task
std::unique_lock<std::mutex> lck(mut[i]);
QueueData q_data = queues[i].front();
//Check for termination condition (2*k sorted items)
if (sorted_count[i] == k << 1)
{
//Check final tree level fully populated
//Level of node
unsigned node_level =
static_cast<unsigned>(std::log2(q_data.m_sync_id+1)+1);
//Depth of tree
unsigned last_level =
static_cast<unsigned>(std::log2(sync_id[i] + 1) + 1);
//Node is at tree depth?
done = node_level == last_level;
}
//Terminate if finished
if (done == true)
{
lck.unlock();
}
//Only process queue[i] if we have not reached last node in last level
else
{
//Get latest chunk from queue
queues[i].pop();
lck.unlock();
//Get middle of lhs
//-------------------- -------------
//| | M | | OR | M | |
//------------------- -------------
auto lhs_middle = q_data.m_lhs_first +
(q_data.m_lhs_last - q_data.m_lhs_first) / 2;
//Get "middle" of rhs(insertion point of M)
//--------------------
//| |M'<M | I>=M |
//--------------------
auto rhs_middle = std::upper_bound(q_data.m_rhs_first,
q_data.m_rhs_last, *lhs_middle);
//Create (lhs.lower_half, rhs.lower_half).
// ( [beg, mid) , [beg, I) )
QueueData left
{
q_data.m_sync_id * 2 + 1,
q_data.m_generations_skipped,
q_data.m_lhs_first,
lhs_middle,//NB: first==mid implies [mid,mid), i.e. "empty"
q_data.m_rhs_first,
rhs_middle
};
//Create (lhs.upper_half, rhs.upper_half).
// ( [mid,last+1) , [I, last+1) )
QueueData right
{
q_data.m_sync_id * 2 + 2,
q_data.m_generations_skipped,
lhs_middle,
q_data.m_lhs_last,
rhs_middle,
q_data.m_rhs_last
};
//Termination conditions
//1. Prev gen lhs sorted(q_data.m_lhs_first==q_data.m_lhs_last)
//2. All data >= lhs.mid ---> left=empty
if ( right.m_generations_skipped > 0 ||
q_data.m_lhs_first == q_data.m_lhs_last ||
/*q_data.m_rhs_first == q_data.m_rhs_last ||*/
(left.m_lhs_first == left.m_lhs_last &&
left.m_rhs_first == left.m_rhs_last) )
{
//Store in + just keep rhs branch
right = q_data;
right.m_sync_id = q_data.m_sync_id * 2 + 2;
//Keep track of tree levels skipped in bifurcation process
++right.m_generations_skipped;
//Update count of sorted elements
if (right.m_generations_skipped == 1)
{
sorted_count[i].m_val->fetch_add(
((q_data.m_lhs_last - q_data.m_lhs_first) +
(q_data.m_rhs_last - q_data.m_rhs_first)) );
}
}
//Wait for synchronisation value
// / 
// / 
// /  /  wait for right-2 (0 gens skipped)
// / / /  wait for right-2 (1 gen skipped)
// //// //  wait for right-4 (2 gens skipped)
unsigned skipped = std::max<unsigned>(right.m_generations_skipped, 1);
unsigned prev_id;
do
{
prev_id = right.m_sync_id - (1 << skipped);
//std::this_thread::sleep_for(std::chrono::microseconds(1));
std::this_thread::yield();
} while (sync_id[i].m_val->compare_exchange_weak(prev_id, prev_id)
== false);
//Only push lhs if generation not skipped
lck.lock();
if (right.m_generations_skipped == 0)
{
//Push chunk to q and update task count
queues[i].push(left);
auto& task = *(queue_tasks[i].m_val);
task++;
++pending_tasks;
}
//Push rhs chunk, update task count & sync id
queues[i].push(right);
auto& task = *(queue_tasks[i].m_val);
task++;
++pending_tasks;
sync_id[i] = right.m_sync_id;
}
};
//Lambda for performing container.begin()+n
auto advance = [&]( const ElementArray<T>& a,
typename ElementArray<T>::const_iterator it, unsigned ind)
{
std::advance(it, std::min<unsigned>(ind, a.end()-it));
return it;
};
//Lambda to push array pairs onto queues
auto push_pairs = [&](const ElementArrays<T>& arrs)
{
//Resize queues, task counts, sorted counts, sync ids
auto size = arrs.size();
queues.clear();
queues.resize(size / 2);
queue_tasks.clear();
queue_tasks.resize(size / 2);
sorted_count.clear();
sorted_count.resize(size/2);
sync_id.clear();
sync_id.resize(queues.size());
//Loop over array pairs
unsigned count = 0;
for (unsigned i = 0; i < (size>>1)<<1; i += 2)
{
//Populate thread id, gens skipped, start/end of lhs of pair,
//start/end rhs of pair. NB interval = [beg,end) = [0,1,..,k-1,k)
QueueData data
{
0,
0,
arrs[i].begin(),
advance(arrs[i], arrs[i].cbegin(), k),
arrs[i + 1].cbegin(),
advance(arrs[i + 1], arrs[i + 1].cbegin(), k)
};
//Push chunk to q and update task count
queues[count].push(data);
auto& tmp = *(queue_tasks[count++].m_val);
tmp++;
++pending_tasks;
}
};
//Push original arrays
push_pairs(arrays);
//Lambda to extract queue results to an array
auto extract = [&](ElementArrays<T>& arrs)
{
unsigned count = 0;
//Loop over queues
for (auto& q : queues)
{
ElementArray<T> arr;
while (q.empty() == false)
{
//Get start/end of chunk range
auto pair = q.front();
q.pop();
//Only extract if range not empty and elems <= k
if (pair.m_lhs_first != pair.m_lhs_last)
{
std::copy(pair.m_lhs_first, pair.m_lhs_last,
std::back_inserter(arr));
}
//Only extract if range not empty and elems <= k
if (pair.m_rhs_first != pair.m_rhs_last)
{
std::copy(pair.m_rhs_first, pair.m_rhs_last,
std::back_inserter(arr));
}
}
//Return sorted array
arr.resize(k);
arrs.push_back(arr);
}
};
//This section attempts to balance workload evenly across available threads.
//
//For each array-pair in turn, priority is given to vectorising the merge
//of each pair.
//
//Array-pairs are, successively, processed in parallel, until all threads
//are consumed in the vectorised merges of the pairs processed so far.
//
//As each vectorised merge bifurcates (1->2->4->8...) it will consume more
//threads until all threads are being used to vectorise merges. When this
//point is reached, array-pair parallel processing ceases, since all threads
//are vectorising the existing merges.
//
//The aim is to minimise the latency of the merge in the hope that this
//minimises the latency associated with merging all the array-pairs.
//
//Storage for results (toggle between ret_val[0] and ret_val[1], so one holds
//the previous results and the other the new results, in seq N/2 -> N/4 -> N/8)
ElementArrays<T> ret_val[2];
unsigned toggle = 0;
//Vector of futures returned by tasks
std::vector<std::future<void>> results;
//Thread limit and count. Magic number, since code=proof-of-concept only
unsigned thread_limit = 1024;
unsigned thread_count = 0;
//Flag to indicate that sorting is complete
bool done;
bool once = true;
do
{
done = true;
//Keep processing any tasks on the queues
while (pending_tasks > 0)
{
//Loop over queues
unsigned q = 0;
for (auto& queue : queues)
{
//Process tasks for this queue
std::unique_lock<std::mutex> lck(mut[q]);
unsigned num = queue_tasks[q];
for (unsigned task = 0; task < num; ++task)
{
//Limit num threads spawned
if (thread_count < thread_limit)
{
//Spawn 1 thrd/task and update task count
results.push_back(std::async(std::launch::async,
merge_pair,
q));
--pending_tasks;
auto& tmp = *(queue_tasks[q].m_val);
tmp--;
++thread_count;
}
}
++q;
lck.unlock();
}
//Wait for current tasks to finish spawning new tasks
for (auto& res : results)
{
res.wait();
--thread_count;
}
results.clear();
}
//Extract results
auto& old_ret_val = ret_val[toggle];
toggle = (toggle + 1) % 2;
ret_val[toggle].clear();
if (once && arrays.size()%2 != 0)
{
ret_val[toggle].push_back(arrays[arrays.size() - 1]);
}
once = false;
if (old_ret_val.size() % 2 != 0)
{
ret_val[toggle].push_back(old_ret_val[old_ret_val.size() - 1]);
}
extract(ret_val[toggle]);
//Clear queues and add extracted results
if (ret_val[toggle].size() > 1)
{
queues.clear();
push_pairs(ret_val[toggle]);
done = false;
}
} while (done == false);
//Return sorted results
return ret_val[toggle][0];
}
}
}
namespace TestSort
{
/**
* @brief Creates random number of randomly-sized, ordered arrays of random numbers.
*
* @param k : unsigned k-th order statistic.
*
* @param max_num_arrays : unsigned . Max number of arrays to be sorted.
*
* @param max_size_of_array : unsigned . Max number of elements in array.
*
* @param out : Sort::ElementArrays<unsigned>& . The arrays to be returned.
*
*/
template<typename T>
void get_random_arrays(unsigned k, unsigned max_num_arrays,
unsigned max_size_of_array, Sort::ElementArrays<T>& out)
{
//Random generator (replace 0 with dev() for random seed)
std::random_device dev;
std::mt19937 generator(0/*dev()*/);
Sort::ElementArrays<unsigned> element_arrays;
//Create random number for number of arrays
std::uniform_int_distribution<T> arrays_rnd(2, max_num_arrays);
unsigned num_arrays = arrays_rnd(generator);
//Create arrays
std::set<unsigned> s;
for (unsigned i = 0; i < num_arrays; ++i)
{
//Create random number for size of array
Sort::ElementArray<T> element_array;
std::uniform_int_distribution<T> elements_rnd(k, max_size_of_array);
unsigned num_elements = elements_rnd(generator);
//Create random numbers for elements and add to array
std::uniform_int_distribution<T> element_rnd(0, 1u << 31);
for (unsigned j = 0; j < num_elements; ++j)
{
T elem = element_rnd(generator);
/*bool res = s.insert(elem).second;
if (res == false)
{
//std::cout << "duplicate removed" << std::endl;
j = j-1;
}
else*/
{
element_array.push_back(std::make_pair(elem, i));
}
}
//Sort the arrays and return them
std::sort(element_array.begin(), element_array.end());
out.push_back(element_array);
}
}
/**
* @brief Creates random number of randomly-sized, ordered arrays of random numbers.
* Adds all of these arrays to the sorting algorithm and validates the result
* against the known result obtained by successive std::merge operations.
*
* @param k : unsigned k-th order statistic.
*
* @param max_num_arrays : unsigned . Max number of arrays to be sorted.
*
* @param max_size_of_array : unsigned . Max number of elements in array.
*
* @return bool : Pass = true.
*
*/
bool test_bucket(unsigned k, unsigned max_num_arrays,
unsigned max_size_of_array)
{
//Declare arrays
Sort::ElementArrays<unsigned> test_sort;
//Print banner
std::cout << "Creating up to " << max_num_arrays
<< " arrays, each of size <= "
<< max_size_of_array
<< ". Please wait ..." << std::endl;
//Create a set of ordered, random integer arrays
get_random_arrays<unsigned>(k, max_num_arrays,
max_size_of_array, test_sort);
//Print banner
unsigned ave = 0;
for (auto& arr : test_sort) ave += arr.size();
ave /= test_sort.size();
std::cout << "Testing bucket algorithm with: Num arrays = "
<< test_sort.size()
<< ", average size = "
<< ave
<< ", k = " << k
<< ". Please wait ... " << std::endl;
//Run sorting algorithm
auto start = std::chrono::high_resolution_clock::now();
auto result = Sort::KStatisticBucketSort::k_statistic(test_sort, k);
auto end = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "Algorithm execution completed, after "
<< std::chrono::duration<double, std::milli>(end - start).count()
<< " ms." << std::endl;
//Run STL merge
//Print banner
std::cout << "Running std::merge. Please wait ..." << std::endl;
//Start timer
auto start_1 = std::chrono::high_resolution_clock::now();
//Merge (out[0]=arr1)+arr2 -> out[1], out[1]+arr2 -> out[0], out[0]+arr3 -> out[1] ....
std::vector<Sort::Element<unsigned>> output[2];
output[0].insert(output[0].begin(),
test_sort[0].begin(), test_sort[0].begin() + k);
output[0].resize(2 * k);
output[1].resize(2 * k);
unsigned toggle;
for (unsigned i = 1; i < test_sort.size(); ++i)
{
toggle = (i - 1) % 2;
std::merge(output[toggle].begin(), output[toggle].begin() + k,
test_sort[i].begin(), test_sort[i].begin() + k,
output[(toggle + 1) % 2].begin());
}
//Resize back to k
output[(toggle + 1) % 2].resize(k);
auto& validate = output[(toggle + 1) % 2];
//stop timer
auto end_1 = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "std::merge execution completed, after "
<< std::chrono::duration<double, std::milli>(end_1 - start_1).count()
<< " ms." << std::endl;
std::cout << "Validating results. Please wait ..."
<< std::endl;
//Validate that the algorithm's result matches the std::merge result
return result == validate;
}
bool test_merge(unsigned k, unsigned max_num_arrays,
unsigned max_size_of_array)
{
using namespace Sort;
//Declare arrays
Sort::ElementArrays<unsigned> test_sort;
//Print banner
std::cout << "Creating up to " << max_num_arrays
<< " arrays, each of size <= "
<< max_size_of_array
<< ". Please wait ..." << std::endl;
//Preliminary debugging tests.
/*test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
test_sort.push_back(
{ { 40, 40 }, { 42, 42 }, { 44, 44 }, { 46, 46 }, {48, 48 }, { 410, 410 },
{ 412, 412 }, { 414, 414 }, { 416, 416 }, { 418, 418 }, { 420, 420 },
{ 422, 422 }, { 424, 424 }, { 426, 426 }, { 428, 428 }, { 430, 430 } });*/
/*test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });*/
/*test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
test_sort.push_back(
{ { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
{ 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 },
{ 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
test_sort.push_back(
{ { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
{ 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 },
{ 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });*/
/*
for (unsigned i = 0; i < 4; ++i)
{
Sort::ElementArray<unsigned> arr;
for (unsigned j = 0; j < 2 * k; ++j)
{
arr.push_back(std::make_pair(i + (j * 4), i + (j * 4)));
}
test_sort.push_back(arr);
}
test_sort.pop_back();*/
//Randomised stress-test.
get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array, test_sort);
//Print banner
unsigned ave = 0;
for (auto& arr : test_sort) ave += arr.size();
ave /= test_sort.size();
std::cout << "Testing merge algorithm with: Num arrays = "
<< test_sort.size()
<< ", average size = "
<< ave
<< ", k = " << k
<< ". Please wait ... " << std::endl;
//Run sorting algorithm
auto start = std::chrono::high_resolution_clock::now();
auto result = Sort::KStatisticMergeSort::k_statistic(test_sort, k);
auto end = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "Algorithm execution completed, after "
<< std::chrono::duration<double, std::milli>(end - start).count()
<< " ms." << std::endl;
//Run STL merge
//Print banner
std::cout << "Running std::merge. Please wait ..." << std::endl;
//Start timer
auto start_1 = std::chrono::high_resolution_clock::now();
//Merge successively, keeping the k smallest elements each time:
//out[0] = arr0, out[0]+arr1 -> out[1], out[1]+arr2 -> out[0], ...
std::vector<Sort::Element<unsigned>> output[2];
//End of the first min(k, size) elements of an array (never reads past the end)
auto first_k = [k](const Sort::ElementArray<unsigned>& a)
{
return a.cbegin() + std::min<std::size_t>(k, a.size());
};
output[0].assign(test_sort[0].cbegin(), first_k(test_sort[0]));
unsigned current = 0; //Index of the buffer holding the latest merge result
for (unsigned i = 1; i < test_sort.size(); ++i)
{
unsigned next = 1 - current;
output[next].clear();
std::merge(output[current].cbegin(), first_k(output[current]),
test_sort[i].cbegin(), first_k(test_sort[i]),
std::back_inserter(output[next]));
current = next;
}
//Resize back to k
output[current].resize(k);
auto& validate = output[current];
//stop timer
auto end_1 = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "std::merge execution completed, after "
<< std::chrono::duration<double, std::milli>(end_1 - start_1).count()
<< " ms." << std::endl;
std::cout << "Validating results. Please wait ..."
<< std::endl;
//Validate that the std::merge result matches the algorithm's result
return result == validate;
}
}
int main(void)
{
bool result_bucket = TestSort::test_bucket(1000, 1000, 10000);
std::cout << "Bucket Sort Algorithm: " << (result_bucket ? "Passed." : "Failed.") <<
std::endl;
std::cout << std::endl;
bool result_merge = TestSort::test_merge(100, 500, 10000);
std::cout << "Merge Sort Algorithm: " << (result_merge ? "Passed." : "Failed.") <<
std::endl;
return 0;
}
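The validation in `test_merge` merges the arrays one after another with `std::merge`. It touches up to 2k elements per array, so it costs on the order of m·k operations for m arrays. A common alternative oracle is a k-way merge driven by a min-heap, which costs about O((m + k) log m). The sketch below is not the author's method. It assumes plain `unsigned` values instead of the `Sort::Element<unsigned>` pairs used in the test, and `k_smallest` is a hypothetical name.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical helper (not part of the original module): returns the k
// smallest values over all sorted input arrays, in ascending order, using a
// k-way merge with one cursor per array held in a min-heap.
std::vector<unsigned> k_smallest(const std::vector<std::vector<unsigned>>& arrays,
                                 std::size_t k)
{
    // Heap entry: (value, array index, position within that array)
    using Entry = std::tuple<unsigned, std::size_t, std::size_t>;
    // std::greater turns std::priority_queue into a min-heap
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Seed the heap with the first element of every non-empty array
    for (std::size_t a = 0; a < arrays.size(); ++a)
        if (!arrays[a].empty()) heap.emplace(arrays[a][0], a, std::size_t{0});

    std::vector<unsigned> result;
    while (result.size() < k && !heap.empty())
    {
        unsigned value;
        std::size_t a, pos;
        std::tie(value, a, pos) = heap.top(); // smallest remaining element
        heap.pop();
        result.push_back(value);
        // Advance the cursor of the array that supplied the value
        if (pos + 1 < arrays[a].size())
            heap.emplace(arrays[a][pos + 1], a, pos + 1);
    }
    return result;
}

// Usage: k_smallest({{0, 2, 4}, {1, 3, 5}}, 4) returns {0, 1, 2, 3}.
```

The heap holds only one element per array. It stops after exactly k pops, so arrays whose elements never reach the k smallest are only read at their first element.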
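`test_merge` times `Sort::KStatisticMergeSort::k_statistic`, which follows the median-split prescription of the "Multi-threaded Merge Sort Analysis". It takes the median M of the left-hand array and binary-searches M's insertion point in the right-hand array with `std::upper_bound`. It then merges the two lower halves and the two upper halves independently.

The single-threaded sketch below shows only that splitting logic, assuming plain `unsigned` elements. `median_split_merge` and `merge_k_smallest` are hypothetical names. The author's version instead pushes each half onto a task queue serviced by separate threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Iter = std::vector<unsigned>::const_iterator;

// Merges sorted [l_first, l_last) and [r_first, r_last) into out, which must
// have room for both ranges. The two recursive calls write to disjoint parts
// of out, so they are independent of each other.
void median_split_merge(Iter l_first, Iter l_last, Iter r_first, Iter r_last,
                        std::vector<unsigned>::iterator out)
{
    // Base case: splitting a lhs shorter than two elements would not shrink it
    if (l_last - l_first < 2)
    {
        std::merge(l_first, l_last, r_first, r_last, out);
        return;
    }
    // M = median element of lhs
    Iter l_mid = l_first + (l_last - l_first) / 2;
    // Insertion point of M in rhs: every element before it is <= M
    Iter r_mid = std::upper_bound(r_first, r_last, *l_mid);
    // Lower halves (all elements <= M) go to the front of out
    median_split_merge(l_first, l_mid, r_first, r_mid, out);
    // Upper halves (all elements >= M) start right after the lower halves
    std::vector<unsigned>::iterator out_mid = out + ((l_mid - l_first) + (r_mid - r_first));
    median_split_merge(l_mid, l_last, r_mid, r_last, out_mid);
}

// Hypothetical wrapper: the k smallest of two sorted arrays, merging only the
// first min(k, size) elements of each, as the test harness does.
std::vector<unsigned> merge_k_smallest(const std::vector<unsigned>& a,
                                       const std::vector<unsigned>& b, std::size_t k)
{
    std::size_t na = std::min(k, a.size());
    std::size_t nb = std::min(k, b.size());
    std::vector<unsigned> out(na + nb);
    median_split_merge(a.begin(), a.begin() + na, b.begin(), b.begin() + nb, out.begin());
    if (out.size() > k) out.resize(k);
    return out;
}
```

Each recursive call writes to a precomputed, disjoint region of `out`, so the two calls could be dispatched with `std::async` without locking. The base case for a left-hand range shorter than two elements is required: splitting a single-element range would otherwise recurse forever.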

Algorithms devised for a google interview

  • 1. © Russell John Childs, PhD. Date: 2015-03-21. Algorithms for calculating order statistic. Module K-STATISTIC Specifications Description: Given an unsigned integer, k, and a set of arrays, S:={A}, the k smallest items in S shall be found and returned. Specifications: K-STATISTIC-1: "A" shall be a sorted array of unsigned, non-duplicative integers. K-STATISTIC-2: "S" shall be a set of sorted arrays, A. K-STATISTIC-3: An unsigned integer, "k", shall be within the range . K-STATISTIC-4: : "R" shall be a sorted array of unsigned integers. K-STATISTIC-5: The k smallest elements, over all arrays, A, in the set S, shall be found and returned in R. K-STATISTIC-6: T shall be the number of threads allocated to the module. K-STATISTIC-7: The time-order-of-complexity of this module shall be assessed. Interface Specifications: K-STATISTIC-8: This module shall provide the interface: template<typename Type> vector<Type> k_statistic(const vector< vector<Type> >& S, unsigned k) K-STATISTIC-8.1: k_statistic shall take the set, S:={A}, as an argument. K-STATISTIC-8.2: k_statistic shall take, k, as an argument. K-STATISTIC-8.3: k_statistic shall take a template parameter, Type, which shall map to an integer. K-STATISTIC-9: k_statistic shall return the set R. Data Specifications: K-STATISTIC-10: Each sorted array, A, shall be no larger than 1MB ( bytes). K-STATISTIC-11: "D" shall be an ordered dataset of unsigned integers. K-STATISTIC-12: "D" shall be partitioned into the set S:={A} and A shall satisfy K-STATISTIC-10 Module K-STATISTIC Analysis Multi-threaded Bucket Sort Analysis: Let B be a bucket sort. Let N be the number of items to be sorted. Let M be the number of buckets in B. Let be i-th bucket in B. Let be the number of items in . Items, , are placed in container, , in bucket , . Two cases arise: (1) C is a vectorised, ordered linked-list, s.t. consecutive nodes belong to each thread, where T is thread-count, and is multi-casted to all threads. 
Insertions and searches are : Copying to linear memory is . Thus, for it is possible to achieve are insertion, search and copy. Thread- overhead may make SIMD a more suitable choice. (2) C is a non-vectorised ordered set: Item insertion/search: , where H is entropy: . Sort: . The entropy of the integers directly affects . A uniform distribution has the lowest : . A -func has the highest: . With one thread per array: , since each thread needs to process no more than from each array.
  • 2. Multi-threaded Merge Sort Analysis: Two sorted arrays are merged according to the following prescription: Let be the value of the median element of the left-hand array, , where is the size. Let be the position of in the right-hand array, , through a binary search. Array is split about its mid-point , where . Array is split about b, , where . Sub-arrays are then recombined: , as the following diagram depicts: Case 1: The number of threads, . The order of complexity for merging two arrays of equal size is given by: Given arrays, of equal size, and an infinite number of threads, , the arrays may be merged in pairs to give: For 1 billion arrays and a k-statistic of 1 billion, would be: Case 2: The number of threads, , is finite. The expression for this case is ( threads for merge, for reduction by pairs, ): , NB: . Threads0 1 2 3 4 5 6Synchronisation of pushes to task queue Ordered array
  • 3. Results (using: Visual Studio 2013, Intel Quad-Core 2.6 GHz i7-3720QM, 8 GB RAM, Dell Precision M4700 Mobile Workstation): For 1, 2 and 4 threads, the vectorised bucket sort is around five times faster than successive, single-threaded std::merge operations. Sadly, the performance decreases with thread-count, indicating thread-overhead is an issue. Performance-profiling with Intel VTune Amplifier has not yet been undertaken, so the amount of time spent in locks, RFOs and memory fetches to cache is unknown. The vectorised merge-sort is very slow. This may, again, be the result of thread-overhead and thread oversubscription. Overall, it is surmised that this sort of vectorisation is better accomplished through SIMD or FPGAs, where each "thread" performs relatively few operations and the thread-count can be far higher.
  • 4. /** * brief Algorithms for calculating Kth order statistic given multiple, * sorted arrays. This compiles under Visual Studio 2013. As yet, * code has not been ported to Eclipse and Linux. Code is untested * but available for review. * details This file contains a multi-threaded bucket sort and two methods. * One of the methods uses the bucket sort to obtain Kth order statistic and the other uses a merge-sort whose merge is vectorised * author Russell John Childs, PhD. * date 2015-03-21 * copyright Russell John Childs, PhD. */ #include <vector> #include <set> #include <queue> #include <chrono> #include <thread> #include <future> #include <condition_variable> #include <random> #include <iostream> #include <sstream> /** * @namespace Sort * @brief The namespace for k statistics sorting */ namespace Sort { /** class * @param Type. The type of the element in the array to be sorted * * @param BucketCount. The number of buckets for the sort * * @param Predicate. An overloaded method "unsigned operator()(const Type&) * This must take a parameter of type const Type& and return a unique unsigned * integer within [lower, upper] specified in this class's constructor. * */ template< typename Type, unsigned BucketCount, typename Comp=std::less<Type>> class BucketSort { public: /** * @param lower : unsigned. The lower value of the range. * * @param upper : unsigned. The upper value of the range. * * Notes: If an element lies outside the range it is ignored and not * included in the final, sorted result */ BucketSort(unsigned lower, unsigned upper) : m_lower(lower), m_upper(upper), m_normalisation(long double(BucketCount-1) / long double(upper - lower)), m_size(0) { //Reset all the buckets reset(); } /** * No operations specified. */ ~BucketSort(void) { }
  • 5. /** * Emtpies the buckets */ void reset(void) { m_buckets.clear(); m_buckets.resize(BucketCount); m_size = 0; } /** * Sorts the passed array. * @param arr : Type. The fixed array, of type Type, to be sorted * * Notes: This is a convenience function for handling fixed arrays. * It is, at this time, unimplemented and does nothing. */ template<int Size, typename Pred> void sort(Type (&arr) [Size], Pred) { //This method to be implemented at a later date. } /** * Sorts the passed container. * @param arr : const Container&. The conainer to be sorted. * * @param pred : Pred. A function "unsigned func(const Type& elem)" that * returns the unique unsigned key associated with elem. * Keys outside [lower, upper] cause elem to be skipped. * * Notes: (1) Container must provide method const Type& operator[](unsigned). * (2) The result is returned by Type& operator[](unsigned) and * get_result() This allows a succession of arrays to be passed to * sort() before the result is obtained. * (3) sort() may be terminated by pred() returning val > arr.size(). * (4) This implementation uses locks. Conversion to lock-free is * relatively straightforward, but requires substantially more * testing, due to subtle bugs that arise with lock-free. */ template< typename Container, typename Pred > void sort(const Container &arr, Pred pred ) { //Get array size unsigned size = arr.size(); //Add each each source element to sorted list for (unsigned i = 0; i < size; ++i) { unsigned key = pred((arr[i])); if (m_lower <= key && key <= m_upper) { unsigned index = unsigned((key - m_lower)*m_normalisation); { std::lock_guard<std::mutex> lck(m_mutexes[index]); m_buckets[index].insert(arr[i]); //Update total element count; ++m_size; } } else if (key == -int(1)) { //predicate has signalled that sort should terminate i = key-1; } } }
  • 6. /** * Returns an element in the sorted array * @param index : unsigned. The index of the element in the sorted array. * * @return const std::vector<Type>&. The list of sorted objects. * Notes: 1. This method will be provided at a future date. It requires that * BucketSort utilise a linear, contiguous buffer for its buckets * allowing for O(1) retrieval of an element. It is, at this time, * not available to the user, who must use the far less efficient * mechanism "get_result()[index]". */ Type& operator[](unsigned index) { } /** * Returns a vector containing the sorted elements * * @param k : unsigned. Number of sorted elements to return. Default = All. * @ return : const std::vector<Type>& . The sorted elements * */ const std::vector<Type>& get_result(unsigned k=0) { unsigned size = k; if (k == 0) { //Get total number of elements for (auto& bucket : m_buckets) { size += bucket.size(); } } //Resize result vector m_result.resize( size ); //Store sorted result unsigned index = 0; for (auto& bucket : m_buckets) { if (index < k) { auto lim = std::min<unsigned>(bucket.size(), k - index); auto iter = bucket.begin(); for (unsigned i = 0; i < lim; ++i) { m_result[index++] = *iter++; } } } //Return sorted result return m_result; } /** * Finds a specified element * @param in : const Type&. The element sought. * * @param advance : int. For the specified element, finds the element that * is "advance" elements before (if advance < 0) or * after (if advance > 0) the specified element "in".
  • 7. * * @return typename std::set<Type>::iterator . An iterator to the element, * if present, or end(). * * Notes: This implementation uses locks. Conversion to lock-free is * relatively straightforward, but requires substantially more * testing, due to subtle bugs that arise with lock-free. */ template<typename Pred> typename std::set<Type>::iterator find(const Type& in, Pred pred, int advance) { //Get bucket bounds and bucket index int bounds = BucketCount - 1; unsigned key = pred(in); int index = unsigned((key - m_lower)*m_normalisation); //Get beginning and end of the bucket table std::unique_lock<std::mutex> lck(m_mutexes[0]); auto begin = m_buckets[0].begin(); lck.unlock(); std::unique_lock<std::mutex> lck1(m_mutexes[bounds]); auto end = m_buckets[bounds].end(); lck1.unlock(); //Create return var typename std::set<Type, Comp>::iterator ret_val; bool is_not_found = index > bounds; if (is_not_found == false ) { std::lock_guard<std::mutex> lck(m_mutexes[index]); ret_val = m_buckets[index].find(in); if (ret_val == m_buckets[index].end()) { ret_val = end; is_not_found = true; } } else { //Out of bounds ret_val = end; } //Increment iterator whilst within bounds while (is_not_found == false && advance > 0) { std::unique_lock<std::mutex> lck(m_mutexes[index]); if (ret_val != m_buckets[index].end()) { //Increment if within bounds of current bucket ++ret_val; --advance; } else if (++index <= bounds) { //If within bounds of table, get start of next bucket lck.unlock(); std::lock_guard<std::mutex> lck1(m_mutexes[index]); ret_val = m_buckets[index].begin(); } else { //Out-of-bounds ret_val = end; is_not_found = true;
  • 8. } } //Decrement iterator whilst within bounds while (is_not_found == false && advance < 0) { std::unique_lock<std::mutex> lck(m_mutexes[index]); if (ret_val != m_buckets[index].begin()) { //Decrement if within bounds of current bucket --ret_val; ++advance; } else if (--index >= 0) { //If within bounds of table, go to eof prev bucket lck.unlock(); std::lock_guard<std::mutex> lck1(m_mutexes[index]); //auto in = m_buckets[index]; //ret_val = in.begin() == in.end() ? in.end() : in.end()--; ret_val = m_buckets[index].end(); } else { //Out-of-bounds ret_val = begin; is_not_found = true; } } return ret_val; } /** * Returns total count of elements in bucket sort * * @return unsigned . Total count of elements. */ unsigned size(void) { return m_size; } private: unsigned m_lower; unsigned m_upper; long double m_normalisation; unsigned m_size; std::vector<std::set<Type, Comp>> m_buckets; std::mutex m_mutexes[BucketCount]; std::vector<Type> m_result; }; template<typename T= unsigned> using Element=std::pair<unsigned, T>; template<typename T> using ElementArray = std::vector<Element<T>>; template<typename T> using ElementArrays = std::vector<ElementArray<T>>; /** k_statistic * @param arrays : ElementArrays. The sorted arrays to be merged * * @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays". * * Notes: (1) At least one input array must have sizeof >= k. If necessary, * pad the 1st array with value > k-th smallest. * (2 )Function uses bucket sort to keep track of k smallest elements * so far. For each array, if an element is larger than the largest * of the k smallest found so far, then all later elements are * discarded. Otherwise, the set of k smallest elements is updated
  • 9. * with the new element. */ namespace KStatisticBucketSort { template< typename T > const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k) { std::atomic<unsigned> last_found; //Find an array, A, with at least k elements unsigned index = 0; while (arrays[index++].size() < k && index < arrays.size()); unsigned upper = last_found = arrays[index - 1][k - 1].first; //Create a buacket sort shared amongst thread. //TODO: Remove magic number (100) and use dynamically sized buckets struct Comp{ bool operator()(const Element<T>& lhs, const Element<T>& rhs ) { return lhs.first < rhs.first; } }; BucketSort<Element<T>, 2000, Comp> bucket_sort(0, upper); //Add A as the base case { auto pred = [&](const Element<T>& elem){ return elem.first; }; bucket_sort.sort(arrays[index - 1], pred); } unsigned start_array = index - 1; //Create a predicate for the bucket sort auto pred = [&](const Element<T>& elem) { //Update last element in list of k smallest found so far unsigned old_val = last_found; //last of the k smallest unsigned new_val = old_val; //new val for last of k smallest bool stop_processing=false; //Flag to continue or discontinue do { old_val = last_found; new_val = old_val; stop_processing = false; if (bucket_sort.size() < k) { new_val =std::max<unsigned>( elem.first, old_val ); } else if (elem.first > old_val) { //Simply stop processing array if elem > max(k-smallest) new_val = old_val; stop_processing = true; } else { //Add elem and update max(k-smallest) auto tmp = [&](const Element<T>& in){ return in.first; }; new_val = std::max<unsigned>(elem.first, bucket_sort.find( std::make_pair(old_val, elem.second), tmp, -1)->first); } } while (last_found.compare_exchange_weak(old_val, new_val) == false && stop_processing == false ); return stop_processing == false ? elem.first : -int(1); }; //Create a thread function that adds a new array to the bucket sort index = 0;
  • 10. std::atomic<bool> start_thread = false; std::atomic<unsigned> pop_count = 0; auto add_array = [&]( void ) { //Wait for start signal while (start_thread == false); //Loop over arrays "popping" each one processed unsigned old_count = pop_count; unsigned new_count = old_count + 1; while (old_count < arrays.size()) { //Claim an array, by capturing pop count and incrmeenting while (pop_count.compare_exchange_weak(old_count, old_count+1) == false); //Check pop count is within bounds and not the starting array if (old_count != start_array && old_count < arrays.size()) { bucket_sort.sort(arrays[old_count], pred); } } }; //Add arrays to the bucket sort, limit number of threads (4=magic num, //but this is only proof-of-concept code unsigned thread_limit = 4; unsigned thread_count = 0; std::vector< std::future<void> >results; for (auto& arr : arrays) { //Add array to sort results.push_back( std::async(std::launch::async, add_array)); //If thread lim reached: if (thread_count > thread_limit) { //Wait for existing threads to finish start_thread = true; for (auto& res : results) { res.wait(); } //Reset limit checks thread_count = 0; results.clear(); } ++thread_count; } //Start threads and wait for results start_thread = true; for (auto& res : results) { res.wait(); } //Extract k-th order statistic from bucket table return bucket_sort.get_result(k); } } /** k_statistic * @param arrays : ElementArrays. The sorted arrays to be merged * * @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays". * * Notes: (1) Function uses merge sort to find smallest elements. The merge is vectorised. * (2) The implementation of this method is in a STATE OF FLUX. * Focussed on stress-testing Bucket-Sort algorithm */
  • 11. namespace KStatisticMergeSort { template<typename T> const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k) { //std::atomic isn't copy-constructible, so need to wrap it for std::vec //Very annoying. Naughty C++ committee. struct AtomicUnsigned { AtomicUnsigned(void) : m_val(new std::atomic<unsigned>(0)) { } ~AtomicUnsigned(void) { delete m_val; } void operator=(unsigned i) { *m_val = i; } void operator+=(unsigned i) { unsigned old = *m_val; while (m_val->compare_exchange_weak(old, old + i) == false); } operator unsigned(void) { return *m_val; } std::atomic<unsigned>* m_val; }; //EOF annoying code-bloat //Create sync id for each queue std::vector<AtomicUnsigned> sync_id; //Struct to hold id, used to sync pushes to queue, //and the boundaries of the left and right array chunks to merge struct QueueData { unsigned m_sync_id; unsigned m_generations_skipped; typename ElementArray<T>::const_iterator m_lhs_first; typename ElementArray<T>::const_iterator m_lhs_last; typename ElementArray<T>::const_iterator m_rhs_first; typename ElementArray<T>::const_iterator m_rhs_last; }; //Create task queues. One queue for each array-pair. Length will halve //with each iteration: N/2 -> N/4 -> N/8 ... -> 1 merged array std::vector<std::queue<QueueData>> queues; //Create array of sorted counts (2*k sorted elems => done!) std::vector<AtomicUnsigned> sorted_count; //Create pneding task count (1 thread spawned per task up to thr limit) std::atomic<int> pending_tasks(0); std::vector<AtomicUnsigned> queue_tasks; //Create merge function std::vector<std::mutex> mut(arrays.size()); //This function should be made lock-free //Lambda to extract queue results to an array
  • 12. std::function<void(unsigned)> merge_pair = [&](unsigned i) { //Flag to indicate pair is merged bool done = false; //Lock queue and peek 1st task std::unique_lock<std::mutex> lck(mut[i]); QueueData q_data = queues[i].front(); //Check for termination condition (2*k sorted items) if (sorted_count[i] == k << 1) { //Check final tree level fully populated //Level of node unsigned node_level = static_cast<unsigned>(std::log2(q_data.m_sync_id+1)+1); //Depth of tree unsigned last_level = static_cast<unsigned>(std::log2(sync_id[i] + 1) + 1); //Node is at tree depth? done = node_level == last_level; } //Terminate if finished if (done == true) { lck.unlock(); } //Only process queue[i] if we have not reached last node in last level else { //Get latest chunk from queue queues[i].pop(); lck.unlock(); //Get middle of lhs //-------------------- ------------- //| | M | | OR | M | | //------------------- ------------- auto lhs_middle = q_data.m_lhs_first + (q_data.m_lhs_last - q_data.m_lhs_first) / 2; //Get "middle" of rhs(insertion point of M) //-------------------- //| |M'<M | I>=M | //-------------------- auto rhs_middle = std::upper_bound(q_data.m_rhs_first, q_data.m_rhs_last, *lhs_middle); //Create (lhs.lower_half, rhs.lower_half). // ( [beg, mid) , [beg, I) ) QueueData left { q_data.m_sync_id * 2 + 1, q_data.m_generations_skipped, q_data.m_lhs_first, lhs_middle,//NB: first==mid implies [mid,mid), i.e. "empty" q_data.m_rhs_first, rhs_middle }; //Create (lhs.upper_half, rhs.upper_half). // ( [mid,last+1) , [I, last+1) ) QueueData right { q_data.m_sync_id * 2 + 2, q_data.m_generations_skipped, lhs_middle, q_data.m_lhs_last, rhs_middle,
  • 13. q_data.m_rhs_last }; //Termination conditions //1. Prev gen lhs sorted(q_data.m_lhs_first==q_data.m_lhs_last) //2. All data >= lhs.mid ---> left=empty if ( right.m_generations_skipped > 0 || q_data.m_lhs_first == q_data.m_lhs_last || /*q_data.m_rhs_first == q_data.m_rhs_last ||*/ (left.m_lhs_first == left.m_lhs_last && left.m_rhs_first == left.m_rhs_last) ) { //Store in + just keep rhs branch right = q_data; right.m_sync_id = q_data.m_sync_id * 2 + 2; //Keep track of tree levels skipped in bifurcation process ++right.m_generations_skipped; //Update count of sorted elements if (right.m_generations_skipped == 1) { sorted_count[i].m_val->fetch_add( ((q_data.m_lhs_last - q_data.m_lhs_first) + (q_data.m_rhs_last - q_data.m_rhs_first)) ); } } //Wait for synchronisation value // / // / // / / wait for right-2 (0 gens skipped) // / / / wait for right-2 (1 gen skipped) // //// // wait for right-4 (2 gens skipped) unsigned skipped = std::max<unsigned>(right.m_generations_skipped, 1); unsigned prev_id; do { prev_id = right.m_sync_id - (1 << skipped); //std::this_thread::sleep_for(std::chrono::microseconds(1)); std::this_thread::yield(); } while (sync_id[i].m_val->compare_exchange_weak(prev_id, prev_id) == false); //Only push lhs if generation not skipped lck.lock(); if (right.m_generations_skipped == 0) { //Push chunk to q and update task count queues[i].push(left); auto& task = *(queue_tasks[i].m_val); task++; ++pending_tasks; } //Push rhs chunk, update task count & sync id queues[i].push(right); auto& task = *(queue_tasks[i].m_val); task++; ++pending_tasks; sync_id[i] = right.m_sync_id; } }; //Lambda for performing container.begin()+n auto advance = [&]( const ElementArray<T>& a, typename ElementArray<T>::const_iterator it, unsigned ind) {
        std::advance(it, std::min<unsigned>(ind, a.end() - it));
        return it;
    };
    //Lambda to push array pairs onto queues
    auto push_pairs = [&](const ElementArrays<T>& arrs) {
        //Resize queues, task counts, sorted counts, sync ids
        auto size = arrs.size();
        queues.clear();
        queues.resize(size / 2);
        queue_tasks.clear();
        queue_tasks.resize(size / 2);
        sorted_count.clear();
        sorted_count.resize(size / 2);
        sync_id.clear();
        sync_id.resize(queues.size());
        //Loop over array pairs
        unsigned count = 0;
        for (unsigned i = 0; i < (size >> 1) << 1; i += 2) {
            //Populate sync id, gens skipped, start/end of lhs of pair,
            //start/end of rhs of pair. NB interval = [beg,end) = [0,1,..,k-1,k)
            QueueData data {
                0, 0,
                arrs[i].begin(), advance(arrs[i], arrs[i].cbegin(), k),
                arrs[i + 1].cbegin(), advance(arrs[i + 1], arrs[i + 1].cbegin(), k)
            };
            //Push chunk to q and update task count
            queues[count].push(data);
            auto& tmp = *(queue_tasks[count++].m_val);
            tmp++;
            ++pending_tasks;
        }
    };
    //Push original arrays
    push_pairs(arrays);
    //Lambda to extract queue results to an array
    auto extract = [&](ElementArrays<T>& arrs) {
        unsigned count = 0;
        //Loop over queues
        for (auto& q : queues) {
            ElementArray<T> arr;
            while (q.empty() == false) {
                //Get start/end of chunk range
                auto pair = q.front();
                q.pop();
                //Only extract if range not empty (result is truncated to k after the loop)
                if (pair.m_lhs_first != pair.m_lhs_last) {
                    std::copy(pair.m_lhs_first, pair.m_lhs_last, std::back_inserter(arr));
                }
                //Only extract if range not empty (result is truncated to k after the loop)
                if (pair.m_rhs_first != pair.m_rhs_last) {
                    std::copy(pair.m_rhs_first, pair.m_rhs_last, std::back_inserter(arr));
                }
            }
            //Store sorted array, truncated to k elements
            arr.resize(k);
            arrs.push_back(arr);
        }
    };
    //This section attempts to balance workload evenly across available threads.
    //
    //For each array-pair in turn, priority is given to vectorising the merge
    //of that pair.
    //
    //Array-pairs are processed successively in parallel, until all threads
    //are consumed by the vectorised merges of the pairs processed so far.
    //
    //As each vectorised merge bifurcates (1->2->4->8...), it consumes more
    //threads, until all threads are being used to vectorise merges. When this
    //point is reached, parallel processing of further array-pairs ceases, since
    //all threads are vectorising the existing merges.
    //
    //The aim is to minimise the latency of each merge, in the hope that this
    //minimises the latency of merging all the array-pairs.
    //
    //Storage for results: toggle between ret_val[0] and ret_val[1], so that one
    //holds the previous round's results and the other the new results, in the
    //sequence N/2 -> N/4 -> N/8 ...
    ElementArrays<T> ret_val[2];
    unsigned toggle = 0;
    //Vector of futures returned by tasks
    std::vector<std::future<void>> results;
    //Thread limit and count. Magic number, since code = proof-of-concept only
    unsigned thread_limit = 1024;
    unsigned thread_count = 0;
    //Flag to indicate that sorting is complete
    bool done;
    bool once = true;
    do {
        done = true;
        //Keep processing any tasks on the queues
        while (pending_tasks > 0) {
            //Loop over queues
            unsigned q = 0;
            for (auto& queue : queues) {
                //Process tasks for this queue
                std::unique_lock<std::mutex> lck(mut[q]);
                unsigned num = queue_tasks[q];
                for (unsigned task = 0; task < num; ++task) {
                    //Limit num threads spawned
                    if (thread_count < thread_limit) {
                        //Spawn 1 thread per task and update task count
                        results.push_back(std::async(std::launch::async, merge_pair, q));
                        --pending_tasks;
                        auto& tmp = *(queue_tasks[q].m_val);
                        tmp--;
                        ++thread_count;
                    }
                }
                ++q;
                lck.unlock();
            }
            //Wait for the current tasks, which push new tasks onto the queues, to finish
            for (auto& res : results) {
                res.wait();
                --thread_count;
            }
            results.clear();
        }
        //Extract results
        auto& old_ret_val = ret_val[toggle];
        toggle = (toggle + 1) % 2;
        ret_val[toggle].clear();
        if (once && arrays.size() % 2 != 0) {
            ret_val[toggle].push_back(arrays[arrays.size() - 1]);
        }
        once = false;
        if (old_ret_val.size() % 2 != 0) {
            ret_val[toggle].push_back(old_ret_val[old_ret_val.size() - 1]);
        }
        extract(ret_val[toggle]);
        //Clear queues and add extracted results
        if (ret_val[toggle].size() > 1) {
            queues.clear();
            push_pairs(ret_val[toggle]);
            done = false;
        }
    } while (done == false);
    //Return sorted results
    return ret_val[toggle][0];
}
}
}
namespace TestSort {
/**
 * @brief Creates a random number of randomly sized, sorted arrays of random numbers.
 *
 * @param k : unsigned. k-th order statistic.
 *
 * @param max_num_arrays : unsigned. Max number of arrays to be sorted.
 *
 * @param max_size_of_array : unsigned. Max number of elements in an array.
 *
 * @param out : Sort::ElementArrays<T>&. The arrays to be returned.
 *
 */
template<typename T>
void get_random_arrays(unsigned k, unsigned max_num_arrays, unsigned max_size_of_array,
                       Sort::ElementArrays<T>& out) {
    //Random generator (replace 0 with dev() for random seed)
    std::random_device dev;
    std::mt19937 generator(0/*dev()*/);
    Sort::ElementArrays<unsigned> element_arrays;
    //Create random number for number of arrays
    std::uniform_int_distribution<T> arrays_rnd(2, max_num_arrays);
    unsigned num_arrays = arrays_rnd(generator);
    //Create arrays
    std::set<unsigned> s;
    for (unsigned i = 0; i < num_arrays; ++i) {
        //Create random number for size of array
        Sort::ElementArray<T> element_array;
        std::uniform_int_distribution<T> elements_rnd(k, max_size_of_array);
        unsigned num_elements = elements_rnd(generator);
        //Create random numbers for elements and add to array
        //(1u << 31 avoids the signed overflow of 1 << 31)
        std::uniform_int_distribution<T> element_rnd(0, 1u << 31);
        for (unsigned j = 0; j < num_elements; ++j) {
            T elem = element_rnd(generator);
            /*bool res = s.insert(elem).second;
            if (res == false) {
                //std::cout << "duplicate removed" << std::endl;
                j = j-1;
            }
            else*/
            {
                element_array.push_back(std::make_pair(elem, i));
            }
        }
        //Sort the array and return it
        std::sort(element_array.begin(), element_array.end());
        out.push_back(element_array);
    }
}
/**
 * @brief Creates a random number of randomly sized, sorted arrays of random numbers,
 * runs the bucket-sort algorithm on all of them and validates the result against
 * the k smallest elements obtained by successive std::merge calls.
 *
 * @param k : unsigned. k-th order statistic.
 *
 * @param max_num_arrays : unsigned. Max number of arrays to be sorted.
 *
 * @param max_size_of_array : unsigned. Max number of elements in an array.
 *
 * @return bool : Pass = true.
 *
 */
bool test_bucket(unsigned k, unsigned max_num_arrays, unsigned max_size_of_array) {
    //Declare arrays
    Sort::ElementArrays<unsigned> test_sort;
    //Print banner
    std::cout << "Creating up to " << max_num_arrays << " arrays, each of size <= "
              << max_size_of_array << ". Please wait ..." << std::endl;
    //Create a set of ordered, random integer arrays
    get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array, test_sort);
    //Print banner
    unsigned ave = 0;
    for (auto& arr : test_sort) ave += arr.size();
    ave /= test_sort.size();
    std::cout << "Testing bucket algorithm with: Num arrays = " << test_sort.size()
              << ", average size = " << ave << ", k = " << k << ". Please wait ... " << std::endl;
    //Run sorting algorithm
    auto start = std::chrono::high_resolution_clock::now();
    auto result = Sort::KStatisticBucketSort::k_statistic(test_sort, k);
    auto end = std::chrono::high_resolution_clock::now();
    //Print banner
    std::cout << "Algorithm execution completed, after "
              << std::chrono::duration<double, std::milli>(end - start).count() << " ms." << std::endl;
    //Run STL merge
    //Print banner
    std::cout << "Running std::merge. Please wait ..." << std::endl;
    //Start timer
    auto start_1 = std::chrono::high_resolution_clock::now();
    //Merge (out[0] = arr0) + arr1 -> out[1], out[1] + arr2 -> out[0], out[0] + arr3 -> out[1], ...
    std::vector<Sort::Element<unsigned>> output[2];
    output[0].insert(output[0].begin(), test_sort[0].begin(), test_sort[0].begin() + k);
    output[0].resize(2 * k);
    output[1].resize(2 * k);
    unsigned toggle = 0;
    for (unsigned i = 1; i < test_sort.size(); ++i) {
        toggle = (i - 1) % 2;
        std::merge(output[toggle].begin(), output[toggle].begin() + k,
                   test_sort[i].begin(), test_sort[i].begin() + k,
                   output[(toggle + 1) % 2].begin());
    }
    //Resize back to k
    output[(toggle + 1) % 2].resize(k);
    auto& validate = output[(toggle + 1) % 2];
    //Stop timer
    auto end_1 = std::chrono::high_resolution_clock::now();
    //Print banner
    std::cout << "std::merge execution completed, after "
              << std::chrono::duration<double, std::milli>(end_1 - start_1).count() << " ms." << std::endl;
    std::cout << "Validating results. Please wait ..." << std::endl;
    //Validate that the result from the algorithm == the std::merge result
    return result == validate;
}
bool test_merge(unsigned k, unsigned max_num_arrays, unsigned max_size_of_array) {
    using namespace Sort;
    //Declare arrays
    Sort::ElementArrays<unsigned> test_sort;
    //Print banner
    std::cout << "Creating up to " << max_num_arrays << " arrays, each of size <= "
              << max_size_of_array << ". Please wait ..." << std::endl;
    //Preliminary debugging tests.
    /*test_sort.push_back(
        { { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 }, { 12, 12 }, { 14, 14 },
          { 16, 16 }, { 18, 18 }, { 20, 20 }, { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
    test_sort.push_back(
        { { 40, 40 }, { 42, 42 }, { 44, 44 }, { 46, 46 }, { 48, 48 }, { 410, 410 }, { 412, 412 }, { 414, 414 },
          { 416, 416 }, { 418, 418 }, { 420, 420 }, { 422, 422 }, { 424, 424 }, { 426, 426 }, { 428, 428 }, { 430, 430 } });*/
    /*test_sort.push_back(
        { { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 }, { 12, 12 }, { 14, 14 },
          { 16, 16 }, { 18, 18 }, { 20, 20 }, { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });*/
    /*test_sort.push_back(
        { { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 }, { 12, 12 }, { 14, 14 },
          { 16, 16 }, { 18, 18 }, { 20, 20 }, { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
    test_sort.push_back(
        { { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 }, { 13, 13 }, { 15, 15 },
          { 17, 17 }, { 19, 19 }, { 21, 21 }, { 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
    test_sort.push_back(
        { { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 }, { 13, 13 }, { 15, 15 },
          { 17, 17 }, { 19, 19 }, { 21, 21 }, { 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
    test_sort.push_back(
        { { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 }, { 12, 12 }, { 14, 14 },
          { 16, 16 }, { 18, 18 }, { 20, 20 }, { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });*/
    /*
    for (unsigned i = 0; i < 4; ++i) {
        Sort::ElementArray<unsigned> arr;
        for (unsigned j = 0; j < 2 * k; ++j) {
            arr.push_back(std::make_pair(i + (j * 4), i + (j * 4)));
        }
        test_sort.push_back(arr);
    }
    test_sort.pop_back();*/
    //Randomised stress test.
    get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array, test_sort);
    //Print banner
    unsigned ave = 0;
    for (auto& arr : test_sort) ave += arr.size();
    ave /= test_sort.size();
    std::cout << "Testing merge algorithm with: Num arrays = " << test_sort.size()
              << ", average size = " << ave << ", k = " << k << ". Please wait ... " << std::endl;
    //Run sorting algorithm
    auto start = std::chrono::high_resolution_clock::now();
    auto result = Sort::KStatisticMergeSort::k_statistic(test_sort, k);
    auto end = std::chrono::high_resolution_clock::now();
    //Print banner
    std::cout << "Algorithm execution completed, after "
              << std::chrono::duration<double, std::milli>(end - start).count() << " ms." << std::endl;
    //Run STL merge
    //Print banner
    std::cout << "Running std::merge. Please wait ..." << std::endl;
    //Start timer
    auto start_1 = std::chrono::high_resolution_clock::now();
    //Merge (out[0] = arr0) + arr1 -> out[1], out[1] + arr2 -> out[0], out[0] + arr3 -> out[1], ...
    std::vector<Sort::Element<unsigned>> output[2];
    output[0].insert(output[0].begin(), test_sort[0].begin(), test_sort[0].begin() + k);
    output[0].resize(2 * k);
    output[1].resize(2 * k);
    unsigned toggle = 0;
    for (unsigned i = 1; i < test_sort.size(); ++i) {
        toggle = (i - 1) % 2;
        std::merge(output[toggle].begin(), output[toggle].begin() + k,
                   test_sort[i].begin(), test_sort[i].begin() + k,
                   output[(toggle + 1) % 2].begin());
    }
    //Resize back to k
    output[(toggle + 1) % 2].resize(k);
    auto& validate = output[(toggle + 1) % 2];
    //Stop timer
    auto end_1 = std::chrono::high_resolution_clock::now();
    //Print banner
    std::cout << "std::merge execution completed, after "
              << std::chrono::duration<double, std::milli>(end_1 - start_1).count() << " ms." << std::endl;
    std::cout << "Validating results. Please wait ..." << std::endl;
    //Validate that the result from the algorithm == the std::merge result
    return result == validate;
}
}
int main(void) {
    bool result_bucket = TestSort::test_bucket(1000, 1000, 10000);
    std::cout << "Bucket Sort Algorithm: " << (result_bucket ? "Passed." : "Failed.") << std::endl;
    std::cout << std::endl;
    bool result_merge = TestSort::test_merge(100, 500, 10000);
    std::cout << "Merge Sort Algorithm: " << (result_merge ? "Passed." : "Failed.") << std::endl;
    return 0;
}
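The heart of merge_pair is a split that turns one merge into two independent merges: take the middle element M of the lhs range, find its insertion point I in the rhs range with std::upper_bound, then pair the two lower parts and the two upper parts. Every element of ([lhs_first, M), [rhs_first, I)) is <= M and every element of ([M, lhs_last), [I, rhs_last)) is >= M, so the two halves write to disjoint, adjacent parts of the output and can run concurrently. The following is a minimal sketch of that split, not the module's implementation: it assumes plain sorted std::vector inputs instead of ElementArray/QueueData, recurses directly instead of using the per-pair task queues and sync ids, and the names `split_merge`, `parallel_merge` and the `depth` parameter (which caps the 1->2->4->8 bifurcation at 2^depth leaf merges) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Hypothetical sketch of the split used by merge_pair.
// Merges sorted [lf, ll) and sorted [rf, rl) into out, which must have room
// for (ll - lf) + (rl - rf) elements. At most `depth` levels of splitting are
// performed; each split hands the lower half to a new std::async task.
template <typename It, typename Out>
void split_merge(It lf, It ll, It rf, It rl, Out out, unsigned depth) {
    // Base case: lhs has at most one element, or no further splitting allowed.
    if (ll - lf <= 1 || depth == 0) {
        std::merge(lf, ll, rf, rl, out);
        return;
    }
    // M = middle element of lhs (lhs has >= 2 elements, so both lhs halves are non-empty).
    It lmid = lf + (ll - lf) / 2;
    // I = first element of rhs that is greater than M.
    It rmid = std::upper_bound(rf, rl, *lmid);
    assert(rmid == rl || *lmid < *rmid);
    // Lower half ([lf, lmid), [rf, rmid)) <= M <= upper half ([lmid, ll), [rmid, rl)),
    // so the upper half's output starts right after the lower half's elements.
    Out upper_out = out + ((lmid - lf) + (rmid - rf));
    // Lower half on a new task, upper half on this thread.
    auto lower = std::async(std::launch::async, [=] {
        split_merge(lf, lmid, rf, rmid, out, depth - 1);
    });
    split_merge(lmid, ll, rmid, rl, upper_out, depth - 1);
    lower.get();  // wait for (and propagate any exception from) the lower half
}

// Hypothetical convenience wrapper: merge two sorted vectors.
template <typename T>
std::vector<T> parallel_merge(const std::vector<T>& lhs, const std::vector<T>& rhs,
                              unsigned depth = 3) {
    std::vector<T> out(lhs.size() + rhs.size());
    split_merge(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(), out.begin(), depth);
    return out;
}
```

For example, merging {1, 3, 5, 7, 9} with {2, 3, 4, 10} first splits at M = 5, where I points at 10: {1, 3} merged with {2, 3, 4} fills the first five output slots while {5, 7, 9} merged with {10} fills the last four.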
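The outer do/while loop reduces the arrays pairwise, N -> N/2 -> N/4 ..., keeping only the first k elements of every merged pair and carrying an unpaired array forward into the next round. Below is a minimal sequential sketch of that reduction, assuming plain sorted std::vector<T> inputs and a default-constructible T; it substitutes std::merge for the multithreaded merge_pair, and `k_smallest` is a hypothetical name rather than the module's k_statistic interface. Truncating every input to its first k elements up front is safe because an element beyond the k-th position of a sorted array cannot be among the k smallest of the union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: the k smallest elements of the union of sorted arrays,
// found by merging the arrays pairwise, round after round (N -> N/2 -> N/4 ...).
// std::merge stands in for the module's multithreaded pair merge.
template <typename T>
std::vector<T> k_smallest(std::vector<std::vector<T>> arrays, std::size_t k) {
    if (arrays.empty()) return {};
    // Only the first k elements of a sorted array can be among the k smallest.
    for (auto& a : arrays) {
        assert(std::is_sorted(a.begin(), a.end()));
        if (a.size() > k) a.resize(k);
    }
    while (arrays.size() > 1) {
        std::vector<std::vector<T>> next;
        // Merge neighbouring pairs, keeping at most k elements of each result.
        for (std::size_t i = 0; i + 1 < arrays.size(); i += 2) {
            std::vector<T> merged(arrays[i].size() + arrays[i + 1].size());
            std::merge(arrays[i].begin(), arrays[i].end(),
                       arrays[i + 1].begin(), arrays[i + 1].end(),
                       merged.begin());
            if (merged.size() > k) merged.resize(k);
            next.push_back(std::move(merged));
        }
        // An unpaired array is carried forward to the next round unchanged.
        if (arrays.size() % 2 != 0) next.push_back(std::move(arrays.back()));
        arrays = std::move(next);
    }
    return arrays.front();
}
```

Reducing N arrays this way performs N - 1 merges of at most 2k elements each, i.e. O(Nk) merge work; the module's multithreaded version aims to cut the latency of each of those merges rather than their count.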