© Russell John Childs, PhD. Date: 2015-03-21. Algorithms for calculating order statistic.
Module K-STATISTIC Specifications
Description: Given an unsigned integer, k, and a set of arrays, S:={A}, the k smallest items in S shall be
found and returned.
Specifications:
K-STATISTIC-1: "A" shall be a sorted array of unsigned, non-duplicative integers.
K-STATISTIC-2: "S" shall be a set of sorted arrays, A.
K-STATISTIC-3: An unsigned integer, "k", shall be within the range .
K-STATISTIC-4: "R" shall be a sorted array of unsigned integers.
K-STATISTIC-5: The k smallest elements, over all arrays, A, in the set S, shall be found and returned in R.
K-STATISTIC-6: T shall be the number of threads allocated to the module.
K-STATISTIC-7: The time-order-of-complexity of this module shall be assessed.
Interface Specifications:
K-STATISTIC-8: This module shall provide the interface:
template<typename Type>
vector<Type> k_statistic(const vector< vector<Type> >& S, unsigned k)
K-STATISTIC-8.1: k_statistic shall take the set, S:={A}, as an argument.
K-STATISTIC-8.2: k_statistic shall take, k, as an argument.
K-STATISTIC-8.3: k_statistic shall take a template parameter, Type, which shall map to an integer.
K-STATISTIC-9: k_statistic shall return the set R.
Data Specifications:
K-STATISTIC-10: Each sorted array, A, shall be no larger than 1MB ( bytes).
K-STATISTIC-11: "D" shall be an ordered dataset of unsigned integers.
K-STATISTIC-12: "D" shall be partitioned into the set S:={A} and A shall satisfy K-STATISTIC-10
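As a point of reference for the specifications above, the following is a minimal single-threaded sketch of the requirement in K-STATISTIC-5, assuming only that each array in S is sorted. It is not the multi-threaded method analysed below. The name k_statistic_baseline and the heap-of-cursors approach are illustrative assumptions, chosen because they give a simple result to validate against.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Baseline sketch (hypothetical helper, not part of the module):
// returns the k smallest elements over all sorted arrays in S.
// A min-heap holds one cursor per non-empty array.
template<typename Type>
std::vector<Type> k_statistic_baseline(const std::vector<std::vector<Type>>& S,
                                       unsigned k)
{
    // (value, index of array in S, position within that array)
    using Cursor = std::tuple<Type, std::size_t, std::size_t>;
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;

    // Seed the heap with the first (smallest) element of each array
    for (std::size_t a = 0; a < S.size(); ++a)
    {
        if (!S[a].empty()) heap.emplace(S[a][0], a, std::size_t{0});
    }

    std::vector<Type> R;
    R.reserve(k);
    while (R.size() < k && !heap.empty())
    {
        Cursor top = heap.top();
        heap.pop();
        R.push_back(std::get<0>(top));
        // Advance the cursor of the array the element came from
        std::size_t a = std::get<1>(top);
        std::size_t next = std::get<2>(top) + 1;
        if (next < S[a].size()) heap.emplace(S[a][next], a, next);
    }
    return R; // sorted; shorter than k if S holds fewer than k items
}
```

Each of the k extractions costs a heap operation over at most |S| cursors, so this baseline runs in O(|S| + k log |S|) time on one thread.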
Module K-STATISTIC Analysis
Multi-threaded Bucket Sort Analysis:
Let B be a bucket sort.
Let N be the number of items to be sorted.
Let M be the number of buckets in B.
Let $B_i$ be the i-th bucket in B.
Let $N_i$ be the number of items in $B_i$.
Items, , are placed in container, , in bucket , . Two cases arise:
(1) C is a vectorised, ordered linked-list, s.t. consecutive nodes belong to each thread, where T is
thread-count, and is multi-casted to all threads. Insertions and searches are : Copying to linear
memory is . Thus, for it is possible to achieve are insertion, search and copy. Thread-
overhead may make SIMD a more suitable choice.
(2) C is a non-vectorised ordered set:
Item insertion/search:
, where H is
entropy: .
Sort: . The entropy of the integers directly affects . A uniform
distribution has the lowest : . A -func has the highest: .
With one thread per array: ,
since each thread needs to process no more than from each array.
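The non-vectorised case (2) can be illustrated with a minimal single-threaded sketch, assuming keys are unsigned integers in a known range and each bucket is an ordered set. The function name bucket_sort_sketch and its parameters are hypothetical. The linear key-to-bucket mapping mirrors the normalisation used by the BucketSort class in the listing further down, but this is a simplification and not that class.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Sketch: distribute keys in [lower, upper] over `bucket_count` ordered
// buckets, then concatenate the buckets to obtain the sorted sequence.
// std::set drops duplicates, matching the non-duplicative input assumed
// by K-STATISTIC-1.
std::vector<unsigned> bucket_sort_sketch(const std::vector<unsigned>& items,
                                         unsigned lower, unsigned upper,
                                         unsigned bucket_count)
{
    assert(bucket_count > 0 && lower < upper);
    // Maps lower -> bucket 0 and upper -> bucket (bucket_count - 1)
    const long double scale =
        static_cast<long double>(bucket_count - 1) /
        static_cast<long double>(upper - lower);

    std::vector<std::set<unsigned>> buckets(bucket_count);
    for (unsigned key : items)
    {
        if (key < lower || key > upper) continue; // out-of-range keys are skipped
        unsigned index = static_cast<unsigned>((key - lower) * scale);
        buckets[index].insert(key);               // ordered insert within the bucket
    }

    // The mapping is monotonic, so bucket order is also key order
    std::vector<unsigned> sorted;
    for (const auto& bucket : buckets)
    {
        sorted.insert(sorted.end(), bucket.begin(), bucket.end());
    }
    return sorted;
}
```

With a roughly uniform key distribution each bucket stays small, so the per-item insertion cost is governed by the bucket size rather than by N.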
Multi-threaded Merge Sort Analysis:
Two sorted arrays are merged according to the following prescription:
Let be the value of the median element of the left-hand array, , where is the size.
Let be the position of in the right-hand array, , through a binary search.
Array is split about its mid-point , where .
Array is split about b, , where .
Sub-arrays are then recombined: , as the following diagram depicts:
Case 1: The number of threads, . The order of complexity for merging two arrays of equal size is
given by:
Given arrays, of equal size, and an infinite number of threads, , the arrays may be merged in pairs to
give:
For 1 billion arrays and a k-statistic of 1 billion, would be:
Case 2: The number of threads, , is finite. The expression for this case is ( threads for merge, for
reduction by pairs, ):
, NB:
.
[Diagram: threads 0 to 6; synchronisation of pushes to task queue; ordered array.]
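The merge prescription given earlier (take the median of the left-hand array, locate it in the right-hand array by binary search, split both arrays there and merge the two halves independently) can be sketched as follows. This is an illustrative simplification, assuming plain unsigned elements and one std::async task per split. The names split_merge and merge_sorted and the depth parameter are hypothetical, and the task-queue synchronisation used by the listing below is deliberately omitted.

```cpp
#include <algorithm>
#include <future>
#include <vector>

using Iter = std::vector<unsigned>::const_iterator;
using Out = std::vector<unsigned>::iterator;

// Merges sorted ranges [lf, ll) and [rf, rl) into out. While depth > 0 the
// lower and upper halves are merged concurrently; otherwise std::merge is used.
void split_merge(Iter lf, Iter ll, Iter rf, Iter rl, Out out, unsigned depth)
{
    if (depth == 0 || ll - lf < 2)
    {
        std::merge(lf, ll, rf, rl, out);
        return;
    }
    Iter lm = lf + (ll - lf) / 2;              // median of the left-hand array
    Iter rm = std::upper_bound(rf, rl, *lm);   // its position in the right-hand array
    Out out_hi = out + (lm - lf) + (rm - rf);  // first output slot of the upper halves

    // Everything in [lf, lm) and [rf, rm) is <= *lm, and everything in
    // [lm, ll) and [rm, rl) is >= *lm, so the halves can be merged independently.
    auto lower = std::async(std::launch::async, split_merge,
                            lf, lm, rf, rm, out, depth - 1);
    split_merge(lm, ll, rm, rl, out_hi, depth - 1);
    lower.wait();
}

// Convenience wrapper: depth d allows up to 2^d concurrent sub-merges
std::vector<unsigned> merge_sorted(const std::vector<unsigned>& a,
                                   const std::vector<unsigned>& b,
                                   unsigned depth = 2)
{
    std::vector<unsigned> out(a.size() + b.size());
    split_merge(a.begin(), a.end(), b.begin(), b.end(), out.begin(), depth);
    return out;
}
```

Because each sub-merge writes to a disjoint slice of the output, no locking is needed in this sketch. The cost is one thread launch per split, which is the kind of overhead discussed in the results.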
Results (using: Visual Studio 2013, Intel Quad-Core 2.6 GHz i7-3720QM, 8 GB RAM, Dell Precision M4700 Mobile Workstation):
For 1, 2 and 4 threads, the vectorised bucket sort is around five times faster than successive, single-threaded std::merge operations.
Sadly, the performance decreases with thread-count, indicating thread-overhead is an issue. Performance-profiling with Intel VTune
Amplifier has not yet been undertaken, so the amount of time spent in locks, RFOs and memory fetches to cache is unknown.
The vectorised merge-sort is very slow. This may, again, be the result of thread-overhead and thread oversubscription.
Overall, it is surmised that this sort of vectorisation is better accomplished through SIMD or FPGAs, where each "thread" performs
relatively few operations and the thread-count can be far higher.
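One common way to limit thread oversubscription of the kind suspected above is to launch a fixed number of workers and let each claim work items through an atomic counter. The sketch below is an assumption-laden illustration, not a measured remedy for the results reported here. The name run_claimed and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs task(i) for every i in [0, task_count) on a bounded number of workers.
// Each worker claims the next index with an atomic fetch_add, so every index
// is processed exactly once and no more threads exist than the cap allows.
void run_claimed(std::size_t task_count, unsigned max_threads,
                 const std::function<void(std::size_t)>& task)
{
    unsigned hw = std::thread::hardware_concurrency(); // may be 0 if unknown
    unsigned workers = std::max(1u, std::min(max_threads, hw == 0 ? 1u : hw));
    std::atomic<std::size_t> next(0);

    auto worker = [&]()
    {
        for (std::size_t i = next.fetch_add(1); i < task_count;
             i = next.fetch_add(1))
        {
            task(i);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

In the context of this module, task(i) would process the i-th array, so the thread-count stays fixed however many arrays S contains.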
/**
* @brief Algorithms for calculating the k-th order statistic given multiple,
* sorted arrays. This compiles under Visual Studio 2013. As yet, the
* code has not been ported to Eclipse and Linux. Code is untested
* but available for review.
* @details This file contains a multi-threaded bucket sort and two methods.
* One of the methods uses the bucket sort to obtain the k-th order
* statistic and the other uses a merge-sort whose merge is
* vectorised.
* @author Russell John Childs, PhD.
* @date 2015-03-21
* @copyright Russell John Childs, PhD.
*/
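#include <vector>
#include <set>
#include <queue>
#include <chrono>
#include <thread>
#include <future>
#include <condition_variable>
#include <random>
#include <iostream>
#include <sstream>
//Headers below are used by the code but were only included transitively
#include <algorithm>
#include <atomic>
#include <cmath>
#include <functional>
#include <iterator>
#include <mutex>
#include <utility>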
/**
* @namespace Sort
* @brief The namespace for k statistics sorting
*/
namespace Sort
{
/** class BucketSort
* @param Type. The type of the element in the array to be sorted
*
* @param BucketCount. The number of buckets for the sort
*
* @param Comp. A functor "bool operator()(const Type&, const Type&) const"
* that orders the elements within each bucket. Defaults to std::less<Type>.
* The key-extracting predicate, which must return a unique unsigned integer
* within [lower, upper] as specified in this class's constructor, is passed
* separately to sort() and find().
*
*/
template< typename Type, unsigned BucketCount, typename Comp=std::less<Type>>
class BucketSort
{
public:
/**
* @param lower : unsigned. The lower value of the range.
*
* @param upper : unsigned. The upper value of the range.
*
* Notes: If an element lies outside the range it is ignored and not
* included in the final, sorted result
*/
BucketSort(unsigned lower, unsigned upper) :
m_lower(lower),
m_upper(upper),
m_normalisation(static_cast<long double>(BucketCount - 1) / static_cast<long double>(upper - lower)),
m_size(0)
{
//Reset all the buckets
reset();
}
/**
* No operations specified.
*/
~BucketSort(void)
{
}
/**
* Empties the buckets
*/
void reset(void)
{
m_buckets.clear();
m_buckets.resize(BucketCount);
m_size = 0;
}
/**
* Sorts the passed array.
* @param arr : Type. The fixed array, of type Type, to be sorted
*
* Notes: This is a convenience function for handling fixed arrays.
* It is, at this time, unimplemented and does nothing.
*/
template<int Size, typename Pred>
void sort(Type (&arr) [Size], Pred)
{
//This method to be implemented at a later date.
}
/**
* Sorts the passed container.
* @param arr : const Container&. The container to be sorted.
*
* @param pred : Pred. A function "unsigned func(const Type& elem)" that
* returns the unique unsigned key associated with elem.
* Keys outside [lower, upper] cause elem to be skipped.
*
* Notes: (1) Container must provide method const Type& operator[](unsigned).
* (2) The result is returned by get_result() (and, once implemented,
* by Type& operator[](unsigned)). This allows a succession of arrays
* to be passed to sort() before the result is obtained.
* (3) sort() may be terminated by pred() returning unsigned(-1).
* (4) This implementation uses locks. Conversion to lock-free is
* relatively straightforward, but requires substantially more
* testing, due to subtle bugs that arise with lock-free.
*/
template< typename Container, typename Pred >
void sort(const Container &arr, Pred pred )
{
//Get array size
unsigned size = arr.size();
//Add each source element to sorted list
for (unsigned i = 0; i < size; ++i)
{
unsigned key = pred((arr[i]));
if (m_lower <= key && key <= m_upper)
{
unsigned index = unsigned((key - m_lower)*m_normalisation);
{
std::lock_guard<std::mutex> lck(m_mutexes[index]);
m_buckets[index].insert(arr[i]);
//Update total element count;
++m_size;
}
}
else if (key == unsigned(-1))
{
//predicate has signalled that sort should terminate
break;
}
}
}
/**
* Returns an element in the sorted array
* @param index : unsigned. The index of the element in the sorted array.
*
* @return const std::vector<Type>&. The list of sorted objects.
* Notes: 1. This method will be provided at a future date. It requires that
* BucketSort utilise a linear, contiguous buffer for its buckets
* allowing for O(1) retrieval of an element. It is, at this time,
* not available to the user, who must use the far less efficient
* mechanism "get_result()[index]".
*/
Type& operator[](unsigned index)
{
}
/**
* Returns a vector containing the sorted elements
*
* @param k : unsigned. Number of sorted elements to return. Default (0) = All.
* @return : const std::vector<Type>& . The sorted elements
*
*/
const std::vector<Type>& get_result(unsigned k=0)
{
//Get total number of elements
unsigned total = 0;
for (auto& bucket : m_buckets)
{
total += unsigned(bucket.size());
}
//k == 0 requests all elements; never return more than are stored
unsigned size = (k == 0) ? total : std::min<unsigned>(k, total);
//Resize result vector
m_result.resize( size );
//Store sorted result
unsigned index = 0;
for (auto& bucket : m_buckets)
{
if (index < size)
{
auto lim = std::min<unsigned>(unsigned(bucket.size()), size - index);
auto iter = bucket.begin();
for (unsigned i = 0; i < lim; ++i)
{
m_result[index++] = *iter++;
}
}
}
//Return sorted result
return m_result;
}
/**
* Finds a specified element
* @param in : const Type&. The element sought.
*
* @param advance : int. For the specified element, finds the element that
* is "advance" elements before (if advance < 0) or
* after (if advance > 0) the specified element "in".
*
* @return typename std::set<Type, Comp>::iterator . An iterator to the element,
* if present, or end().
*
* Notes: This implementation uses locks. Conversion to lock-free is
* relatively straightforward, but requires substantially more
* testing, due to subtle bugs that arise with lock-free.
*/
template<typename Pred>
typename std::set<Type, Comp>::iterator find(const Type& in,
Pred pred, int advance)
{
//Get bucket bounds and bucket index
int bounds = BucketCount - 1;
unsigned key = pred(in);
int index = unsigned((key - m_lower)*m_normalisation);
//Get beginning and end of the bucket table
std::unique_lock<std::mutex> lck(m_mutexes[0]);
auto begin = m_buckets[0].begin();
lck.unlock();
std::unique_lock<std::mutex> lck1(m_mutexes[bounds]);
auto end = m_buckets[bounds].end();
lck1.unlock();
//Create return var
typename std::set<Type, Comp>::iterator ret_val;
bool is_not_found = index > bounds;
if (is_not_found == false )
{
std::lock_guard<std::mutex> lck(m_mutexes[index]);
ret_val = m_buckets[index].find(in);
if (ret_val == m_buckets[index].end())
{
ret_val = end;
is_not_found = true;
}
}
else
{
//Out of bounds
ret_val = end;
}
//Increment iterator whilst within bounds
while (is_not_found == false && advance > 0)
{
std::unique_lock<std::mutex> lck(m_mutexes[index]);
if (ret_val != m_buckets[index].end())
{
//Increment if within bounds of current bucket
++ret_val;
--advance;
}
else if (++index <= bounds)
{
//If within bounds of table, get start of next bucket
lck.unlock();
std::lock_guard<std::mutex> lck1(m_mutexes[index]);
ret_val = m_buckets[index].begin();
}
else
{
//Out-of-bounds
ret_val = end;
is_not_found = true;
}
}
//Decrement iterator whilst within bounds
while (is_not_found == false && advance < 0)
{
std::unique_lock<std::mutex> lck(m_mutexes[index]);
if (ret_val != m_buckets[index].begin())
{
//Decrement if within bounds of current bucket
--ret_val;
++advance;
}
else if (--index >= 0)
{
//If within bounds of table, go to eof prev bucket
lck.unlock();
std::lock_guard<std::mutex> lck1(m_mutexes[index]);
//auto in = m_buckets[index];
//ret_val = in.begin() == in.end() ? in.end() : in.end()--;
ret_val = m_buckets[index].end();
}
else
{
//Out-of-bounds
ret_val = begin;
is_not_found = true;
}
}
return ret_val;
}
/**
* Returns total count of elements in bucket sort
*
* @return unsigned . Total count of elements.
*/
unsigned size(void)
{
return m_size;
}
private:
unsigned m_lower;
unsigned m_upper;
long double m_normalisation;
std::atomic<unsigned> m_size; //Atomic: incremented under different bucket mutexes
std::vector<std::set<Type, Comp>> m_buckets;
std::mutex m_mutexes[BucketCount];
std::vector<Type> m_result;
};
template<typename T= unsigned> using Element=std::pair<unsigned, T>;
template<typename T> using ElementArray = std::vector<Element<T>>;
template<typename T> using ElementArrays = std::vector<ElementArray<T>>;
/** k_statistic
* @param arrays : ElementArrays. The sorted arrays to be merged
*
* @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays".
*
* Notes: (1) At least one input array must have size >= k. If necessary,
* pad the 1st array with values > k-th smallest.
* (2) Function uses bucket sort to keep track of k smallest elements
* so far. For each array, if an element is larger than the largest
* of the k smallest found so far, then all later elements are
* discarded. Otherwise, the set of k smallest elements is updated
* with the new element.
*/
namespace KStatisticBucketSort
{
template< typename T >
const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k)
{
std::atomic<unsigned> last_found;
//Find an array, A, with at least k elements
unsigned index = 0;
while (arrays[index++].size() < k && index < arrays.size());
unsigned upper = last_found = arrays[index - 1][k - 1].first;
//Create a bucket sort shared amongst threads.
//TODO: Remove magic number (2000) and use dynamically sized buckets
struct Comp{
bool operator()(const Element<T>& lhs, const Element<T>& rhs ) const
{
return lhs.first < rhs.first;
}
};
BucketSort<Element<T>, 2000, Comp> bucket_sort(0, upper);
//Add A as the base case
{
auto pred = [&](const Element<T>& elem){ return elem.first; };
bucket_sort.sort(arrays[index - 1], pred);
}
unsigned start_array = index - 1;
//Create a predicate for the bucket sort
auto pred = [&](const Element<T>& elem)
{
//Update last element in list of k smallest found so far
unsigned old_val = last_found; //last of the k smallest
unsigned new_val = old_val; //new val for last of k smallest
bool stop_processing=false; //Flag to continue or discontinue
do
{
old_val = last_found;
new_val = old_val;
stop_processing = false;
if (bucket_sort.size() < k)
{
new_val =std::max<unsigned>( elem.first, old_val );
}
else if (elem.first > old_val)
{
//Simply stop processing array if elem > max(k-smallest)
new_val = old_val;
stop_processing = true;
}
else
{
//Add elem and update max(k-smallest)
auto tmp = [&](const Element<T>& in){ return in.first; };
new_val = std::max<unsigned>(elem.first, bucket_sort.find(
std::make_pair(old_val, elem.second), tmp, -1)->first);
}
} while (last_found.compare_exchange_weak(old_val, new_val) == false
&&
stop_processing == false );
return stop_processing == false ? elem.first : -int(1);
};
//Create a thread function that adds a new array to the bucket sort
index = 0;
std::atomic<bool> start_thread(false);
std::atomic<unsigned> pop_count(0);
auto add_array = [&]( void )
{
//Wait for start signal
while (start_thread == false);
//Loop over arrays, claiming each one by atomically incrementing pop count
unsigned claimed = pop_count.fetch_add(1);
while (claimed < arrays.size())
{
//Skip the starting array, which has already been added
if (claimed != start_array)
{
bucket_sort.sort(arrays[claimed], pred);
}
claimed = pop_count.fetch_add(1);
}
};
//Add arrays to the bucket sort, limit number of threads (4 = magic number,
//but this is only proof-of-concept code)
unsigned thread_limit = 4;
unsigned thread_count = 0;
std::vector< std::future<void> >results;
for (auto& arr : arrays)
{
//Add array to sort
results.push_back(
std::async(std::launch::async, add_array));
//If thread limit reached:
if (++thread_count >= thread_limit)
{
//Wait for existing threads to finish
start_thread = true;
for (auto& res : results)
{
res.wait();
}
//Reset limit checks
thread_count = 0;
results.clear();
}
}
//Start threads and wait for results
start_thread = true;
for (auto& res : results)
{
res.wait();
}
//Extract k-th order statistic from bucket table
return bucket_sort.get_result(k);
}
}
/** k_statistic
* @param arrays : ElementArrays. The sorted arrays to be merged
*
* @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays".
*
* Notes: (1) Function uses merge sort to find smallest elements. The merge is vectorised.
* (2) The implementation of this method is in a STATE OF FLUX.
* Focussed on stress-testing Bucket-Sort algorithm
*/
namespace KStatisticMergeSort
{
template<typename T>
const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k)
{
//std::atomic isn't copy-constructible, so need to wrap it for std::vec
//Very annoying. Naughty C++ committee.
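struct AtomicUnsigned
{
AtomicUnsigned(void) : m_val(new std::atomic<unsigned>(0))
{
}
//Owns m_val, so copying is disabled and moving transfers ownership
AtomicUnsigned(const AtomicUnsigned&) = delete;
AtomicUnsigned& operator=(const AtomicUnsigned&) = delete;
AtomicUnsigned(AtomicUnsigned&& rhs) : m_val(rhs.m_val)
{
rhs.m_val = nullptr;
}
~AtomicUnsigned(void)
{
delete m_val;
}
void operator=(unsigned i)
{
*m_val = i;
}
void operator+=(unsigned i)
{
unsigned old = *m_val;
while (m_val->compare_exchange_weak(old, old + i) == false);
}
operator unsigned(void)
{
return *m_val;
}
std::atomic<unsigned>* m_val;
};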
//EOF annoying code-bloat
//Create sync id for each queue
std::vector<AtomicUnsigned> sync_id;
//Struct to hold id, used to sync pushes to queue,
//and the boundaries of the left and right array chunks to merge
struct QueueData
{
unsigned m_sync_id;
unsigned m_generations_skipped;
typename ElementArray<T>::const_iterator m_lhs_first;
typename ElementArray<T>::const_iterator m_lhs_last;
typename ElementArray<T>::const_iterator m_rhs_first;
typename ElementArray<T>::const_iterator m_rhs_last;
};
//Create task queues. One queue for each array-pair. Length will halve
//with each iteration: N/2 -> N/4 -> N/8 ... -> 1 merged array
std::vector<std::queue<QueueData>> queues;
//Create array of sorted counts (2*k sorted elems => done!)
std::vector<AtomicUnsigned> sorted_count;
//Create pending task count (1 thread spawned per task up to thr limit)
std::atomic<int> pending_tasks(0);
std::vector<AtomicUnsigned> queue_tasks;
//Create merge function
std::vector<std::mutex> mut(arrays.size()); //This function should be made lock-free
//Lambda to extract queue results to an array
std::function<void(unsigned)> merge_pair = [&](unsigned i)
{
//Flag to indicate pair is merged
bool done = false;
//Lock queue and peek 1st task
std::unique_lock<std::mutex> lck(mut[i]);
QueueData q_data = queues[i].front();
//Check for termination condition (2*k sorted items)
if (sorted_count[i] == k << 1)
{
//Check final tree level fully populated
//Level of node
unsigned node_level =
static_cast<unsigned>(std::log2(q_data.m_sync_id+1)+1);
//Depth of tree
unsigned last_level =
static_cast<unsigned>(std::log2(sync_id[i] + 1) + 1);
//Node is at tree depth?
done = node_level == last_level;
}
//Terminate if finished
if (done == true)
{
lck.unlock();
}
//Only process queue[i] if we have not reached last node in last level
else
{
//Get latest chunk from queue
queues[i].pop();
lck.unlock();
//Get middle of lhs
//-------------------- -------------
//| | M | | OR | M | |
//------------------- -------------
auto lhs_middle = q_data.m_lhs_first +
(q_data.m_lhs_last - q_data.m_lhs_first) / 2;
//Get "middle" of rhs(insertion point of M)
//--------------------
//| |M'<M | I>=M |
//--------------------
auto rhs_middle = std::upper_bound(q_data.m_rhs_first,
q_data.m_rhs_last, *lhs_middle);
//Create (lhs.lower_half, rhs.lower_half).
// ( [beg, mid) , [beg, I) )
QueueData left
{
q_data.m_sync_id * 2 + 1,
q_data.m_generations_skipped,
q_data.m_lhs_first,
lhs_middle,//NB: first==mid implies [mid,mid), i.e. "empty"
q_data.m_rhs_first,
rhs_middle
};
//Create (lhs.upper_half, rhs.upper_half).
// ( [mid,last+1) , [I, last+1) )
QueueData right
{
q_data.m_sync_id * 2 + 2,
q_data.m_generations_skipped,
lhs_middle,
q_data.m_lhs_last,
rhs_middle,
q_data.m_rhs_last
};
//Termination conditions
//1. Prev gen lhs sorted(q_data.m_lhs_first==q_data.m_lhs_last)
//2. All data >= lhs.mid ---> left=empty
if ( right.m_generations_skipped > 0 ||
q_data.m_lhs_first == q_data.m_lhs_last ||
/*q_data.m_rhs_first == q_data.m_rhs_last ||*/
(left.m_lhs_first == left.m_lhs_last &&
left.m_rhs_first == left.m_rhs_last) )
{
//Store in + just keep rhs branch
right = q_data;
right.m_sync_id = q_data.m_sync_id * 2 + 2;
//Keep track of tree levels skipped in bifurcation process
++right.m_generations_skipped;
//Update count of sorted elements
if (right.m_generations_skipped == 1)
{
sorted_count[i].m_val->fetch_add(
((q_data.m_lhs_last - q_data.m_lhs_first) +
(q_data.m_rhs_last - q_data.m_rhs_first)) );
}
}
//Wait for synchronisation value. A node may push only after the node
//before it in the bifurcation tree has pushed:
// wait for right-2 (0 gens skipped)
// wait for right-2 (1 gen skipped)
// wait for right-4 (2 gens skipped)
unsigned skipped = std::max<unsigned>(right.m_generations_skipped, 1);
unsigned prev_id;
do
{
prev_id = right.m_sync_id - (1 << skipped);
//std::this_thread::sleep_for(std::chrono::microseconds(1));
std::this_thread::yield();
} while (sync_id[i].m_val->compare_exchange_weak(prev_id, prev_id)
== false);
//Only push lhs if generation not skipped
lck.lock();
if (right.m_generations_skipped == 0)
{
//Push chunk to q and update task count
queues[i].push(left);
auto& task = *(queue_tasks[i].m_val);
task++;
++pending_tasks;
}
//Push rhs chunk, update task count & sync id
queues[i].push(right);
auto& task = *(queue_tasks[i].m_val);
task++;
++pending_tasks;
sync_id[i] = right.m_sync_id;
}
};
//Lambda for performing container.begin()+n
auto advance = [&]( const ElementArray<T>& a,
typename ElementArray<T>::const_iterator it, unsigned ind)
{
std::advance(it, std::min<unsigned>(ind, a.end()-it));
return it;
};
//Lambda to push array pairs onto queues
auto push_pairs = [&](const ElementArrays<T>& arrs)
{
//Resize queues, task counts, sorted counts, sync ids
auto size = arrs.size();
queues.clear();
queues.resize(size / 2);
queue_tasks.clear();
queue_tasks.resize(size / 2);
sorted_count.clear();
sorted_count.resize(size/2);
sync_id.clear();
sync_id.resize(queues.size());
//Loop over array pairs
unsigned count = 0;
for (unsigned i = 0; i < (size>>1)<<1; i += 2)
{
//Populate thread id, gens skipped, start/end of lhs of pair,
//start/end rhs of pair. NB interval = [beg,end) = [0,1,..,k-1,k)
QueueData data
{
0,
0,
arrs[i].begin(),
advance(arrs[i], arrs[i].cbegin(), k),
arrs[i + 1].cbegin(),
advance(arrs[i + 1], arrs[i + 1].cbegin(), k)
};
//Push chunk to q and update task count
queues[count].push(data);
auto& tmp = *(queue_tasks[count++].m_val);
tmp++;
++pending_tasks;
}
};
//Push original arrays
push_pairs(arrays);
//Lambda to extract queue results to an array
auto extract = [&](ElementArrays<T>& arrs)
{
unsigned count = 0;
//Loop over queues
for (auto& q : queues)
{
ElementArray<T> arr;
while (q.empty() == false)
{
//Get start/end of chunk range
auto pair = q.front();
q.pop();
//Only extract if range not empty and elems <= k
if (pair.m_lhs_first != pair.m_lhs_last)
{
std::copy(pair.m_lhs_first, pair.m_lhs_last,
std::back_inserter(arr));
}
//Only extract if range not empty and elems <= k
if (pair.m_rhs_first != pair.m_rhs_last)
{
std::copy(pair.m_rhs_first, pair.m_rhs_last,
std::back_inserter(arr));
}
}
//Return sorted array
arr.resize(k);
arrs.push_back(arr);
}
};
//This section attempts to balance workload evenly across available threads.
//
//For each array-pair in turn, priority is given to vectorising the merge
//of each pair.
//
//Array-pairs are, successively, processed in parallel, until all threads
//are consumed in the vectorised merges of the pairs processed so far.
//
//As each vectorised merge bifurcates (1->2->4->8...) it will consume more
//threads until all threads are being used to vectorise merges. When this
//point is reached, array-pair parallel processing ceases, since all threads
//are vectorising the existing merges.
//
//The aim is to minimise the latency of the merge in the hope that this
//minimises the latency associated with merging all the array-pairs.
//
//Storage for results (toggle between ret_val[0<->1], so ret_val[a] has prev
//results ret_val[b] has new results in seq N/2 -> N/4 -> N/8
ElementArrays<T> ret_val[2];
unsigned toggle = 0;
//Vector of futures returned by tasks
std::vector<std::future<void>> results;
//Thread limit and count. Magic number, since code=proof-of-concept only
unsigned thread_limit = 1024;
unsigned thread_count = 0;
//Flag to indicate that sorting is complete
bool done;
bool once = true;
do
{
done = true;
//Keep processing any tasks on the queues
while (pending_tasks > 0)
{
//Loop over queues
unsigned q = 0;
for (auto& queue : queues)
{
//Process tasks for this queue
std::unique_lock<std::mutex> lck(mut[q]);
unsigned num = queue_tasks[q];
for (unsigned task = 0; task < num; ++task)
{
//Limit num threads spawned
if (thread_count < thread_limit)
{
//Spawn 1 thrd/task and update task count
results.push_back(std::async(std::launch::async,
merge_pair,
q));
--pending_tasks;
auto& tmp = *(queue_tasks[q].m_val);
tmp--;
++thread_count;
}
}
++q;
lck.unlock();
}
//Wait for current tasks to finish spawning new tasks
for (auto& res : results)
{
res.wait();
--thread_count;
}
results.clear();
}
//Extract results
auto& old_ret_val = ret_val[toggle];
toggle = (toggle + 1) % 2;
ret_val[toggle].clear();
if (once && arrays.size()%2 != 0)
{
ret_val[toggle].push_back(arrays[arrays.size() - 1]);
}
once = false;
if (old_ret_val.size() % 2 != 0)
{
ret_val[toggle].push_back(old_ret_val[old_ret_val.size() - 1]);
}
extract(ret_val[toggle]);
//Clear queues and add extracted results
if (ret_val[toggle].size() > 1)
{
queues.clear();
push_pairs(ret_val[toggle]);
done = false;
}
} while (done == false);
//Return sorted results
return ret_val[toggle][0];
}
}
}
namespace TestSort
{
/**
* @brief Creates random number of randomly-sized, ordered arrays of random numbers.
*
* @param k : unsigned. The k-th order statistic.
*
* @param max_num_arrays : unsigned . Max number of arrays to be sorted.
*
* @param max_size_of_array : unsigned . Max number of elements in array.
*
* @param out : Sort::ElementArrays<T>& . The arrays to be returned.
*
*/
template<typename T>
void get_random_arrays(unsigned k, unsigned max_num_arrays,
unsigned max_size_of_array, Sort::ElementArrays<T>& out)
{
//Random generator (replace 0 with dev() for random seed)
std::random_device dev;
std::mt19937 generator(0/*dev()*/);
Sort::ElementArrays<unsigned> element_arrays;
//Create random number for number of arrays
std::uniform_int_distribution<T> arrays_rnd(2, max_num_arrays);
unsigned num_arrays = arrays_rnd(generator);
//Create arrays
std::set<unsigned> s;
for (unsigned i = 0; i < num_arrays; ++i)
{
//Create random number for size of array
Sort::ElementArray<T> element_array;
std::uniform_int_distribution<T> elements_rnd(k, max_size_of_array);
unsigned num_elements = elements_rnd(generator);
//Create random numbers for elements and add to array
std::uniform_int_distribution<T> element_rnd(0, 1u<<31);
for (unsigned j = 0; j < num_elements; ++j)
{
T elem = element_rnd(generator);
/*bool res = s.insert(elem).second;
if (res == false)
{
//std::cout << "duplicate removed" << std::endl;
j = j-1;
}
else*/
{
element_array.push_back(std::make_pair(elem, i));
}
}
//Sort the arrays and return them
std::sort(element_array.begin(), element_array.end());
out.push_back(element_array);
}
}
/**
* @brief Creates random number of randomly-sized, ordered arrays of random numbers.
* Adds all of these arrays to the sorting algorithm and validates the result
* against the known result obtained by adding arrays to a sorted set.
*
* @param k : unsigned. The k-th order statistic.
*
* @param max_num_arrays : unsigned . Max number of arrays to be sorted.
*
* @param max_size_of_array : unsigned . Max number of elements in array.
*
* @return bool : Pass = true.
*
*/
bool test_bucket(unsigned k, unsigned max_num_arrays,
unsigned max_size_of_array)
{
//Declare arrays
Sort::ElementArrays<unsigned> test_sort;
//Print banner
std::cout << "Creating up to " << max_num_arrays
<< " arrays, each of size <= "
<< max_size_of_array
<< ". Please wait ..." << std::endl;
//Create a set of ordered, random integer arrays
get_random_arrays<unsigned>(k, max_num_arrays,
max_size_of_array, test_sort);
//Print banner
unsigned ave = 0;
for (auto& arr : test_sort) ave += arr.size();
ave /= test_sort.size();
std::cout << "Testing bucket algorithm with: Num arrays = "
<< test_sort.size()
<< ", average size = "
<< ave
<< ", k = " << k
<< ". Please wait ... " << std::endl;
//Run sorting algorithm
auto start = std::chrono::high_resolution_clock::now();
auto result = Sort::KStatisticBucketSort::k_statistic(test_sort, k);
auto end = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "Algorithm execution completed, after "
<< std::chrono::duration<double, std::milli>(end - start).count()
<< " ms." << std::endl;
//Run STL merge
//Print banner
std::cout << "Running std::merge. Please wait ..." << std::endl;
//Start timer
auto start_1 = std::chrono::high_resolution_clock::now();
//Merge (out[0]=arr1)+arr2 -> out[1], out[1]+arr2 -> out[0], out[0]+arr3 -> out[1] ....
std::vector<Sort::Element<unsigned>> output[2];
output[0].insert(output[0].begin(),
test_sort[0].begin(), test_sort[0].begin() + k);
output[0].resize(2 * k);
output[1].resize(2 * k);
unsigned toggle = 1; //Initialised so that output[0] is the result if there is only one array
for (unsigned i = 1; i < test_sort.size(); ++i)
{
toggle = (i - 1) % 2;
std::merge(output[toggle].begin(), output[toggle].begin() + k,
test_sort[i].begin(), test_sort[i].begin() + k,
output[(toggle + 1) % 2].begin());
}
//Resize back to k
output[(toggle + 1) % 2].resize(k);
auto& validate = output[(toggle + 1) % 2];
//stop timer
auto end_1 = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "std::merge execution completed, after "
<< std::chrono::duration<double, std::milli>(end_1 - start_1).count()
<< " ms." << std::endl;
std::cout << "Validating results. Please wait ..."
<< std::endl;
//Validate that values extracted from set == result from algorithm
return result == validate;
}
bool test_merge(unsigned k, unsigned max_num_arrays,
unsigned max_size_of_array)
{
using namespace Sort;
//Declare arrays
Sort::ElementArrays<unsigned> test_sort;
//Print banner
std::cout << "Creating up to " << max_num_arrays
<< " arrays, each of size <= "
<< max_size_of_array
<< ". Please wait ..." << std::endl;
//Preliminary debugging tests.
/*test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
test_sort.push_back(
{ { 40, 40 }, { 42, 42 }, { 44, 44 }, { 46, 46 }, {48, 48 }, { 410, 410 },
{ 412, 412 }, { 414, 414 }, { 416, 416 }, { 418, 418 }, { 420, 420 },
{ 422, 422 }, { 424, 424 }, { 426, 426 }, { 428, 428 }, { 430, 430 } });*/
/*test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });*/
/*test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
test_sort.push_back(
{ { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
{ 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 },
{ 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
test_sort.push_back(
{ { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
{ 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 },
{ 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
test_sort.push_back(
{ { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
{ 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
{ 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });*/
/*
for (unsigned i = 0; i < 4; ++i)
{
Sort::ElementArray<unsigned> arr;
for (unsigned j = 0; j < 2 * k; ++j)
{
arr.push_back(std::make_pair(i + (j * 4), i + (j * 4)));
}
test_sort.push_back(arr);
}
test_sort.pop_back();*/
//Randomised stress-test.
get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array, test_sort);
//Print banner
unsigned ave = 0;
for (auto& arr : test_sort) ave += arr.size();
ave /= test_sort.size();
std::cout << "Testing merge algorithm with: Num arrays = "
<< test_sort.size()
<< ", average size = "
<< ave
<< ", k = " << k
<< ". Please wait ... " << std::endl;
//Run sorting algorithm
auto start = std::chrono::high_resolution_clock::now();
auto result = Sort::KStatisticMergeSort::k_statistic(test_sort, k);
auto end = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "Algorithm execution completed, after "
<< std::chrono::duration<double, std::milli>(end - start).count()
<< " ms." << std::endl;
//Run STL merge
//Print banner
std::cout << "Running std::merge. Please wait ..." << std::endl;
//Start timer
auto start_1 = std::chrono::high_resolution_clock::now();
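//Merge (out[0]=arr1)+arr2 -> out[1], out[1]+arr3 -> out[0], out[0]+arr4 -> out[1] ....
//NB: each merge reads the first k elements of an array, so every array must hold at least k elements.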
std::vector<Sort::Element<unsigned>> output[2];
output[0].insert(output[0].begin(),
test_sort[0].begin(), test_sort[0].begin() + k);
output[0].resize(2 * k);
output[1].resize(2 * k);
unsigned toggle = 1; //Initialised so that output[0] is selected below if the loop never runs (single array)
for (unsigned i = 1; i < test_sort.size(); ++i)
{
toggle = (i - 1) % 2;
std::merge(output[toggle].begin(), output[toggle].begin() + k,
test_sort[i].begin(), test_sort[i].begin() + k,
output[(toggle + 1) % 2].begin());
}
//Resize back to k
output[(toggle + 1) % 2].resize(k);
auto& validate = output[(toggle + 1) % 2];
//stop timer
auto end_1 = std::chrono::high_resolution_clock::now();
//Print banner
std::cout << "std::merge execution completed, after "
<< std::chrono::duration<double, std::milli>(end_1 - start_1).count()
<< " ms." << std::endl;
std::cout << "Validating results. Please wait ..."
<< std::endl;
//Validate that the k smallest values from std::merge == result from algorithm
return result == validate;
}
}
int main(void)
{
bool result_bucket = TestSort::test_bucket(1000, 1000, 10000);
std::cout << "Bucket Sort Algorithm: " << (result_bucket ? "Passed." : "Failed.") <<
std::endl;
std::cout << std::endl;
bool result_merge = TestSort::test_merge(100, 500, 10000);
std::cout << "Merge Sort Algorithm: " << (result_merge ? "Passed." : "Failed.") <<
std::endl;
return 0;
}
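
The std::merge cross-check in test_merge can be illustrated on its own. The following is a minimal sketch, not the author's code: the helper name k_smallest_reference is hypothetical, it works on plain unsigned values rather than Sort::Element pairs, and it clamps each input to its actual size so that arrays shorter than k are handled. It merges the running result with each array in turn and truncates to k after every merge.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the original listing): returns the k
// smallest values across several individually sorted arrays by merging the
// running result with the next array and truncating to k after each merge.
std::vector<unsigned> k_smallest_reference(
    const std::vector<std::vector<unsigned>>& arrays, std::size_t k)
{
    std::vector<unsigned> current; // k smallest found so far (sorted)
    std::vector<unsigned> next;    // scratch buffer for the next merge
    for (const auto& arr : arrays)
    {
        // Only the first k elements of a sorted array can contribute.
        const auto take =
            static_cast<std::ptrdiff_t>(std::min(k, arr.size()));
        next.clear();
        next.reserve(current.size() + static_cast<std::size_t>(take));
        std::merge(current.begin(), current.end(),
                   arr.begin(), arr.begin() + take,
                   std::back_inserter(next));
        // Keep at most k elements, then swap the two buffers.
        if (next.size() > k)
        {
            next.resize(k);
        }
        current.swap(next);
    }
    return current;
}
```

Called with the arrays {0, 4, 8, 12}, {1, 5} and {2, 6, 10, 14, 18} and k = 5, the helper returns {0, 1, 2, 4, 5}. Each merge touches at most 2k elements, so the cost is O(k) per array.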

Algorithms devised for a google interview

  • 1. © Russell John Childs, PhD. Date: 2015-03-21. Algorithms for calculating order statistic. Module K-STATISTIC Specifications Description: Given an unsigned integer, k, and a set of arrays, S:={A}, the k smallest items in S shall be found and returned. Specifications: K-STATISTIC-1: "A" shall be a sorted array of unsigned, non-duplicative integers. K-STATISTIC-2: "S" shall be a set of sorted arrays, A. K-STATISTIC-3: An unsigned integer, "k", shall be within the range . K-STATISTIC-4: : "R" shall be a sorted array of unsigned integers. K-STATISTIC-5: The k smallest elements, over all arrays, A, in the set S, shall be found and returned in R. K-STATISTIC-6: T shall be the number of threads allocated to the module. K-STATISTIC-7: The time-order-of-complexity of this module shall be assessed. Interface Specifications: K-STATISTIC-8: This module shall provide the interface: template<typename Type> vector<Type> k_statistic(const vector< vector<Type> >& S, unsigned k) K-STATISTIC-8.1: k_statistic shall take the set, S:={A}, as an argument. K-STATISTIC-8.2: k_statistic shall take, k, as an argument. K-STATISTIC-8.3: k_statistic shall take a template parameter, Type, which shall map to an integer. K-STATISTIC-9: k_statistic shall return the set R. Data Specifications: K-STATISTIC-10: Each sorted array, A, shall be no larger than 1MB ( bytes). K-STATISTIC-11: "D" shall be an ordered dataset of unsigned integers. K-STATISTIC-12: "D" shall be partitioned into the set S:={A} and A shall satisfy K-STATISTIC-10 Module K-STATISTIC Analysis Multi-threaded Bucket Sort Analysis: Let B be a bucket sort. Let N be the number of items to be sorted. Let M be the number of buckets in B. Let be i-th bucket in B. Let be the number of items in . Items, , are placed in container, , in bucket , . Two cases arise: (1) C is a vectorised, ordered linked-list, s.t. consecutive nodes belong to each thread, where T is thread-count, and is multi-casted to all threads. 
Insertions and searches are : Copying to linear memory is . Thus, for it is possible to achieve are insertion, search and copy. Thread- overhead may make SIMD a more suitable choice. (2) C is a non-vectorised ordered set: Item insertion/search: , where H is entropy: . Sort: . The entropy of the integers directly affects . A uniform distribution has the lowest : . A -func has the highest: . With one thread per array: , since each thread needs to process no more than from each array.
  • 2. Multi-threaded Merge Sort Analysis: Two sorted arrays are merged according to the following prescription: Let be the value of the median element of the left-hand array, , where is the size. Let be the position of in the right-hand array, , through a binary search. Array is split about its mid-point , where . Array is split about b, , where . Sub-arrays are then recombined: , as the following diagram depicts: Case 1: The number of threads, . The order of complexity for merging two arrays of equal size is given by: Given arrays, of equal size, and an infinite number of threads, , the arrays may be merged in pairs to give: For 1 billion arrays and a k-statistic of 1 billion, would be: Case 2: The number of threads, , is finite. The expression for this case is ( threads for merge, for reduction by pairs, ): , NB: . Threads0 1 2 3 4 5 6Synchronisation of pushes to task queue Ordered array
  • 3. Results (using: Visual Studio 2013, Intel Quad-Core 2.6 GHz i7-3720QM, 8 GB RAM, Dell Precision M4700 Mobile Workstation): For 1, 2 and 4 threads, the vectorised bucket sort is around five times faster than successive, single-threaded std::merge operations. Sadly, the performance decreases with thread-count, indicating thread-overhead is an issue. Performance-profiling with Intel VTune Amplifier has not yet been undertaken, so the amount of time spent in locks, RFOs and memory fetches to cache is unknown. The vectorised merge-sort is very slow. This may, again, be the result of thread-overhead and thread oversubscription. Overall, it is surmised that this sort of vectorisation is better accomplished through SIMD or FPGAs, where each "thread" performs relatively few operations and the thread-count can be far higher.
  • 4. /** * brief Algorithms for calculating Kth order statistic given multiple, * sorted arrays. This compiles under Visual Studio 2013. As yet, * code has not been ported to Eclipse and Linux. Code is untested * but available for review. * details This file contains a multi-threaded bucket sort and two methods. * One of the methods uses the bucket sort to obtain Kth order statistic and the other uses a merge-sort whose merge is vectorised * author Russell John Childs, PhD. * date 2015-03-21 * copyright Russell John Childs, PhD. */ #include <vector> #include <set> #include <queue> #include <chrono> #include <thread> #include <future> #include <condition_variable> #include <random> #include <iostream> #include <sstream> /** * @namespace Sort * @brief The namespace for k statistics sorting */ namespace Sort { /** class * @param Type. The type of the element in the array to be sorted * * @param BucketCount. The number of buckets for the sort * * @param Predicate. An overloaded method "unsigned operator()(const Type&) * This must take a parameter of type const Type& and return a unique unsigned * integer within [lower, upper] specified in this class's constructor. * */ template< typename Type, unsigned BucketCount, typename Comp=std::less<Type>> class BucketSort { public: /** * @param lower : unsigned. The lower value of the range. * * @param upper : unsigned. The upper value of the range. * * Notes: If an element lies outside the range it is ignored and not * included in the final, sorted result */ BucketSort(unsigned lower, unsigned upper) : m_lower(lower), m_upper(upper), m_normalisation(long double(BucketCount-1) / long double(upper - lower)), m_size(0) { //Reset all the buckets reset(); } /** * No operations specified. */ ~BucketSort(void) { }
  • 5. /** * Emtpies the buckets */ void reset(void) { m_buckets.clear(); m_buckets.resize(BucketCount); m_size = 0; } /** * Sorts the passed array. * @param arr : Type. The fixed array, of type Type, to be sorted * * Notes: This is a convenience function for handling fixed arrays. * It is, at this time, unimplemented and does nothing. */ template<int Size, typename Pred> void sort(Type (&arr) [Size], Pred) { //This method to be implemented at a later date. } /** * Sorts the passed container. * @param arr : const Container&. The conainer to be sorted. * * @param pred : Pred. A function "unsigned func(const Type& elem)" that * returns the unique unsigned key associated with elem. * Keys outside [lower, upper] cause elem to be skipped. * * Notes: (1) Container must provide method const Type& operator[](unsigned). * (2) The result is returned by Type& operator[](unsigned) and * get_result() This allows a succession of arrays to be passed to * sort() before the result is obtained. * (3) sort() may be terminated by pred() returning val > arr.size(). * (4) This implementation uses locks. Conversion to lock-free is * relatively straightforward, but requires substantially more * testing, due to subtle bugs that arise with lock-free. */ template< typename Container, typename Pred > void sort(const Container &arr, Pred pred ) { //Get array size unsigned size = arr.size(); //Add each each source element to sorted list for (unsigned i = 0; i < size; ++i) { unsigned key = pred((arr[i])); if (m_lower <= key && key <= m_upper) { unsigned index = unsigned((key - m_lower)*m_normalisation); { std::lock_guard<std::mutex> lck(m_mutexes[index]); m_buckets[index].insert(arr[i]); //Update total element count; ++m_size; } } else if (key == -int(1)) { //predicate has signalled that sort should terminate i = key-1; } } }
  • 6. /** * Returns an element in the sorted array * @param index : unsigned. The index of the element in the sorted array. * * @return const std::vector<Type>&. The list of sorted objects. * Notes: 1. This method will be provided at a future date. It requires that * BucketSort utilise a linear, contiguous buffer for its buckets * allowing for O(1) retrieval of an element. It is, at this time, * not available to the user, who must use the far less efficient * mechanism "get_result()[index]". */ Type& operator[](unsigned index) { } /** * Returns a vector containing the sorted elements * * @param k : unsigned. Number of sorted elements to return. Default = All. * @ return : const std::vector<Type>& . The sorted elements * */ const std::vector<Type>& get_result(unsigned k=0) { unsigned size = k; if (k == 0) { //Get total number of elements for (auto& bucket : m_buckets) { size += bucket.size(); } } //Resize result vector m_result.resize( size ); //Store sorted result unsigned index = 0; for (auto& bucket : m_buckets) { if (index < k) { auto lim = std::min<unsigned>(bucket.size(), k - index); auto iter = bucket.begin(); for (unsigned i = 0; i < lim; ++i) { m_result[index++] = *iter++; } } } //Return sorted result return m_result; } /** * Finds a specified element * @param in : const Type&. The element sought. * * @param advance : int. For the specified element, finds the element that * is "advance" elements before (if advance < 0) or * after (if advance > 0) the specified element "in".
  • 7. * * @return typename std::set<Type>::iterator . An iterator to the element, * if present, or end(). * * Notes: This implementation uses locks. Conversion to lock-free is * relatively straightforward, but requires substantially more * testing, due to subtle bugs that arise with lock-free. */ template<typename Pred> typename std::set<Type>::iterator find(const Type& in, Pred pred, int advance) { //Get bucket bounds and bucket index int bounds = BucketCount - 1; unsigned key = pred(in); int index = unsigned((key - m_lower)*m_normalisation); //Get beginning and end of the bucket table std::unique_lock<std::mutex> lck(m_mutexes[0]); auto begin = m_buckets[0].begin(); lck.unlock(); std::unique_lock<std::mutex> lck1(m_mutexes[bounds]); auto end = m_buckets[bounds].end(); lck1.unlock(); //Create return var typename std::set<Type, Comp>::iterator ret_val; bool is_not_found = index > bounds; if (is_not_found == false ) { std::lock_guard<std::mutex> lck(m_mutexes[index]); ret_val = m_buckets[index].find(in); if (ret_val == m_buckets[index].end()) { ret_val = end; is_not_found = true; } } else { //Out of bounds ret_val = end; } //Increment iterator whilst within bounds while (is_not_found == false && advance > 0) { std::unique_lock<std::mutex> lck(m_mutexes[index]); if (ret_val != m_buckets[index].end()) { //Increment if within bounds of current bucket ++ret_val; --advance; } else if (++index <= bounds) { //If within bounds of table, get start of next bucket lck.unlock(); std::lock_guard<std::mutex> lck1(m_mutexes[index]); ret_val = m_buckets[index].begin(); } else { //Out-of-bounds ret_val = end; is_not_found = true;
  • 8. } } //Decrement iterator whilst within bounds while (is_not_found == false && advance < 0) { std::unique_lock<std::mutex> lck(m_mutexes[index]); if (ret_val != m_buckets[index].begin()) { //Decrement if within bounds of current bucket --ret_val; ++advance; } else if (--index >= 0) { //If within bounds of table, go to eof prev bucket lck.unlock(); std::lock_guard<std::mutex> lck1(m_mutexes[index]); //auto in = m_buckets[index]; //ret_val = in.begin() == in.end() ? in.end() : in.end()--; ret_val = m_buckets[index].end(); } else { //Out-of-bounds ret_val = begin; is_not_found = true; } } return ret_val; } /** * Returns total count of elements in bucket sort * * @return unsigned . Total count of elements. */ unsigned size(void) { return m_size; } private: unsigned m_lower; unsigned m_upper; long double m_normalisation; unsigned m_size; std::vector<std::set<Type, Comp>> m_buckets; std::mutex m_mutexes[BucketCount]; std::vector<Type> m_result; }; template<typename T= unsigned> using Element=std::pair<unsigned, T>; template<typename T> using ElementArray = std::vector<Element<T>>; template<typename T> using ElementArrays = std::vector<ElementArray<T>>; /** k_statistic * @param arrays : ElementArrays. The sorted arrays to be merged * * @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays". * * Notes: (1) At least one input array must have sizeof >= k. If necessary, * pad the 1st array with value > k-th smallest. * (2 )Function uses bucket sort to keep track of k smallest elements * so far. For each array, if an element is larger than the largest * of the k smallest found so far, then all later elements are * discarded. Otherwise, the set of k smallest elements is updated
  • 9. * with the new element. */ namespace KStatisticBucketSort { template< typename T > const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k) { std::atomic<unsigned> last_found; //Find an array, A, with at least k elements unsigned index = 0; while (arrays[index++].size() < k && index < arrays.size()); unsigned upper = last_found = arrays[index - 1][k - 1].first; //Create a buacket sort shared amongst thread. //TODO: Remove magic number (100) and use dynamically sized buckets struct Comp{ bool operator()(const Element<T>& lhs, const Element<T>& rhs ) { return lhs.first < rhs.first; } }; BucketSort<Element<T>, 2000, Comp> bucket_sort(0, upper); //Add A as the base case { auto pred = [&](const Element<T>& elem){ return elem.first; }; bucket_sort.sort(arrays[index - 1], pred); } unsigned start_array = index - 1; //Create a predicate for the bucket sort auto pred = [&](const Element<T>& elem) { //Update last element in list of k smallest found so far unsigned old_val = last_found; //last of the k smallest unsigned new_val = old_val; //new val for last of k smallest bool stop_processing=false; //Flag to continue or discontinue do { old_val = last_found; new_val = old_val; stop_processing = false; if (bucket_sort.size() < k) { new_val =std::max<unsigned>( elem.first, old_val ); } else if (elem.first > old_val) { //Simply stop processing array if elem > max(k-smallest) new_val = old_val; stop_processing = true; } else { //Add elem and update max(k-smallest) auto tmp = [&](const Element<T>& in){ return in.first; }; new_val = std::max<unsigned>(elem.first, bucket_sort.find( std::make_pair(old_val, elem.second), tmp, -1)->first); } } while (last_found.compare_exchange_weak(old_val, new_val) == false && stop_processing == false ); return stop_processing == false ? elem.first : -int(1); }; //Create a thread function that adds a new array to the bucket sort index = 0;
  • 10. std::atomic<bool> start_thread = false; std::atomic<unsigned> pop_count = 0; auto add_array = [&]( void ) { //Wait for start signal while (start_thread == false); //Loop over arrays "popping" each one processed unsigned old_count = pop_count; unsigned new_count = old_count + 1; while (old_count < arrays.size()) { //Claim an array, by capturing pop count and incrmeenting while (pop_count.compare_exchange_weak(old_count, old_count+1) == false); //Check pop count is within bounds and not the starting array if (old_count != start_array && old_count < arrays.size()) { bucket_sort.sort(arrays[old_count], pred); } } }; //Add arrays to the bucket sort, limit number of threads (4=magic num, //but this is only proof-of-concept code unsigned thread_limit = 4; unsigned thread_count = 0; std::vector< std::future<void> >results; for (auto& arr : arrays) { //Add array to sort results.push_back( std::async(std::launch::async, add_array)); //If thread lim reached: if (thread_count > thread_limit) { //Wait for existing threads to finish start_thread = true; for (auto& res : results) { res.wait(); } //Reset limit checks thread_count = 0; results.clear(); } ++thread_count; } //Start threads and wait for results start_thread = true; for (auto& res : results) { res.wait(); } //Extract k-th order statistic from bucket table return bucket_sort.get_result(k); } } /** k_statistic * @param arrays : ElementArrays. The sorted arrays to be merged * * @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays". * * Notes: (1) Function uses merge sort to find smallest elements. The merge is vectorised. * (2) The implementation of this method is in a STATE OF FLUX. * Focussed on stress-testing Bucket-Sort algorithm */
  • 11. namespace KStatisticMergeSort { template<typename T> const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k) { //std::atomic isn't copy-constructible, so need to wrap it for std::vec //Very annoying. Naughty C++ committee. struct AtomicUnsigned { AtomicUnsigned(void) : m_val(new std::atomic<unsigned>(0)) { } ~AtomicUnsigned(void) { delete m_val; } void operator=(unsigned i) { *m_val = i; } void operator+=(unsigned i) { unsigned old = *m_val; while (m_val->compare_exchange_weak(old, old + i) == false); } operator unsigned(void) { return *m_val; } std::atomic<unsigned>* m_val; }; //EOF annoying code-bloat //Create sync id for each queue std::vector<AtomicUnsigned> sync_id; //Struct to hold id, used to sync pushes to queue, //and the boundaries of the left and right array chunks to merge struct QueueData { unsigned m_sync_id; unsigned m_generations_skipped; typename ElementArray<T>::const_iterator m_lhs_first; typename ElementArray<T>::const_iterator m_lhs_last; typename ElementArray<T>::const_iterator m_rhs_first; typename ElementArray<T>::const_iterator m_rhs_last; }; //Create task queues. One queue for each array-pair. Length will halve //with each iteration: N/2 -> N/4 -> N/8 ... -> 1 merged array std::vector<std::queue<QueueData>> queues; //Create array of sorted counts (2*k sorted elems => done!) std::vector<AtomicUnsigned> sorted_count; //Create pneding task count (1 thread spawned per task up to thr limit) std::atomic<int> pending_tasks(0); std::vector<AtomicUnsigned> queue_tasks; //Create merge function std::vector<std::mutex> mut(arrays.size()); //This function should be made lock-free //Lambda to extract queue results to an array
  • 12. std::function<void(unsigned)> merge_pair = [&](unsigned i) { //Flag to indicate pair is merged bool done = false; //Lock queue and peek 1st task std::unique_lock<std::mutex> lck(mut[i]); QueueData q_data = queues[i].front(); //Check for termination condition (2*k sorted items) if (sorted_count[i] == k << 1) { //Check final tree level fully populated //Level of node unsigned node_level = static_cast<unsigned>(std::log2(q_data.m_sync_id+1)+1); //Depth of tree unsigned last_level = static_cast<unsigned>(std::log2(sync_id[i] + 1) + 1); //Node is at tree depth? done = node_level == last_level; } //Terminate if finished if (done == true) { lck.unlock(); } //Only process queue[i] if we have not reached last node in last level else { //Get latest chunk from queue queues[i].pop(); lck.unlock(); //Get middle of lhs //-------------------- ------------- //| | M | | OR | M | | //------------------- ------------- auto lhs_middle = q_data.m_lhs_first + (q_data.m_lhs_last - q_data.m_lhs_first) / 2; //Get "middle" of rhs(insertion point of M) //-------------------- //| |M'<M | I>=M | //-------------------- auto rhs_middle = std::upper_bound(q_data.m_rhs_first, q_data.m_rhs_last, *lhs_middle); //Create (lhs.lower_half, rhs.lower_half). // ( [beg, mid) , [beg, I) ) QueueData left { q_data.m_sync_id * 2 + 1, q_data.m_generations_skipped, q_data.m_lhs_first, lhs_middle,//NB: first==mid implies [mid,mid), i.e. "empty" q_data.m_rhs_first, rhs_middle }; //Create (lhs.upper_half, rhs.upper_half). // ( [mid,last+1) , [I, last+1) ) QueueData right { q_data.m_sync_id * 2 + 2, q_data.m_generations_skipped, lhs_middle, q_data.m_lhs_last, rhs_middle,
  • 13. q_data.m_rhs_last }; //Termination conditions //1. Prev gen lhs sorted(q_data.m_lhs_first==q_data.m_lhs_last) //2. All data >= lhs.mid ---> left=empty if ( right.m_generations_skipped > 0 || q_data.m_lhs_first == q_data.m_lhs_last || /*q_data.m_rhs_first == q_data.m_rhs_last ||*/ (left.m_lhs_first == left.m_lhs_last && left.m_rhs_first == left.m_rhs_last) ) { //Store in + just keep rhs branch right = q_data; right.m_sync_id = q_data.m_sync_id * 2 + 2; //Keep track of tree levels skipped in bifurcation process ++right.m_generations_skipped; //Update count of sorted elements if (right.m_generations_skipped == 1) { sorted_count[i].m_val->fetch_add( ((q_data.m_lhs_last - q_data.m_lhs_first) + (q_data.m_rhs_last - q_data.m_rhs_first)) ); } } //Wait for synchronisation value // / // / // / / wait for right-2 (0 gens skipped) // / / / wait for right-2 (1 gen skipped) // //// // wait for right-4 (2 gens skipped) unsigned skipped = std::max<unsigned>(right.m_generations_skipped, 1); unsigned prev_id; do { prev_id = right.m_sync_id - (1 << skipped); //std::this_thread::sleep_for(std::chrono::microseconds(1)); std::this_thread::yield(); } while (sync_id[i].m_val->compare_exchange_weak(prev_id, prev_id) == false); //Only push lhs if generation not skipped lck.lock(); if (right.m_generations_skipped == 0) { //Push chunk to q and update task count queues[i].push(left); auto& task = *(queue_tasks[i].m_val); task++; ++pending_tasks; } //Push rhs chunk, update task count & sync id queues[i].push(right); auto& task = *(queue_tasks[i].m_val); task++; ++pending_tasks; sync_id[i] = right.m_sync_id; } }; //Lambda for performing container.begin()+n auto advance = [&]( const ElementArray<T>& a, typename ElementArray<T>::const_iterator it, unsigned ind) {
  • 14. std::advance(it, std::min<unsigned>(ind, a.end()-it)); return it; }; //Lambda to push array pairs onto queues auto push_pairs = [&](const ElementArrays<T>& arrs) { //Resize queues, task counts, sorted counts, sync ids auto size = arrs.size(); queues.clear(); queues.resize(size / 2); queue_tasks.clear(); queue_tasks.resize(size / 2); sorted_count.clear(); sorted_count.resize(size/2); sync_id.clear(); sync_id.resize(queues.size()); //oop over array pairs unsigned count = 0; for (unsigned i = 0; i < (size>>1)<<1; i += 2) { //Populate thread id, gens skipped, start/end of lhs of pair, //start/end rhs of pair. NB interval = [beg,end) = [0,1,..,k-1,k) QueueData data { 0, 0, arrs[i].begin(), advance(arrs[i], arrs[i].cbegin(), k), arrs[i + 1].cbegin(), advance(arrs[i + 1], arrs[i + 1].cbegin(), k) }; //Push chunk to q and update task count queues[count].push(data); auto& tmp = *(queue_tasks[count++].m_val); tmp++; ++pending_tasks; } }; //Push original arrays push_pairs(arrays); //Lambda to extract queue results to an array auto extract = [&](ElementArrays<T>& arrs) { unsigned count = 0; //Lopp over queues for (auto& q : queues) { ElementArray<T> arr; while (q.empty() == false) { //Get start/end of chunk range auto pair = q.front(); q.pop(); //Only extract if range not empty and elems <= k if (pair.m_lhs_first != pair.m_lhs_last) { std::copy(pair.m_lhs_first, pair.m_lhs_last, std::back_inserter(arr)); } //Only extract if range not empty and elems <= k if (pair.m_rhs_first != pair.m_rhs_last) {
                        std::copy(pair.m_rhs_first, pair.m_rhs_last,
                            std::back_inserter(arr));
                    }
                }
                //Truncate to k elements and return sorted array
                arr.resize(k);
                arrs.push_back(arr);
            }
        };

        //This section attempts to balance workload evenly across available threads.
        //
        //For each array-pair in turn, priority is given to vectorising the merge
        //of each pair.
        //
        //Array-pairs are, successively, processed in parallel, until all threads
        //are consumed in the vectorised merges of the pairs processed so far.
        //
        //As each vectorised merge bifurcates (1->2->4->8...) it will consume more
        //threads until all threads are being used to vectorise merges. When this
        //point is reached, array-pair parallel processing ceases, since all threads
        //are vectorising the existing merges.
        //
        //The aim is to minimise the latency of the merge in the hope that this
        //minimises the latency associated with merging all the array-pairs.
        //
        //Storage for results (toggle between ret_val[0<->1], so ret_val[a] has prev
        //results, ret_val[b] has new results in seq N/2 -> N/4 -> N/8)
        ElementArrays<T> ret_val[2];
        unsigned toggle = 0;

        //Vector of futures returned by tasks
        std::vector<std::future<void>> results;

        //Thread limit and count. Magic number, since code=proof-of-concept only
        unsigned thread_limit = 1024;
        unsigned thread_count = 0;

        //Flag to indicate that sorting is complete
        bool done;
        bool once = true;
        do
        {
            done = true;
            //Keep processing any tasks on the queues
            while (pending_tasks > 0)
            {
                //Loop over queues
                unsigned q = 0;
                for (auto& queue : queues)
                {
                    //Process tasks for this queue
                    std::unique_lock<std::mutex> lck(mut[q]);
                    unsigned num = queue_tasks[q];
                    for (unsigned task = 0; task < num; ++task)
                    {
                        //Limit num threads spawned
                        if (thread_count < thread_limit)
                        {
                            //Spawn 1 thrd/task and update task count
                            results.push_back(std::async(std::launch::async,
                                merge_pair, q));
                            --pending_tasks;
                            auto& tmp = *(queue_tasks[q].m_val);
                            tmp--;
                            ++thread_count;
                        }
                    }
                    ++q;
                    lck.unlock();
                }
                //Wait for current tasks to finish spawning new tasks
                for (auto& res : results)
                {
                    res.wait();
                    --thread_count;
                }
                results.clear();
            }

            //Extract results
            auto& old_ret_val = ret_val[toggle];
            toggle = (toggle + 1) % 2;
            ret_val[toggle].clear();
            if (once && arrays.size()%2 != 0)
            {
                ret_val[toggle].push_back(arrays[arrays.size() - 1]);
            }
            once = false;
            if (old_ret_val.size() % 2 != 0)
            {
                ret_val[toggle].push_back(old_ret_val[old_ret_val.size() - 1]);
            }
            extract(ret_val[toggle]);

            //Clear queues and add extracted results
            if (ret_val[toggle].size() > 1)
            {
                queues.clear();
                push_pairs(ret_val[toggle]);
                done = false;
            }
        } while (done == false);

        //Return sorted results
        return ret_val[toggle][0];
    }
}
}

namespace TestSort
{
    /**
     * @brief Creates random number of randomly-sized, ordered arrays of random numbers.
     *
     * @param k : unsigned k-th order statistic.
     *
     * @param max_num_arrays : unsigned . Max number of arrays to be sorted.
     *
     * @param max_size_of_array : unsigned . Max number of elements in array.
     *
     * @param out : Sort::ElementArrays<unsigned>& . The arrays to be returned.
     *
     */
    template<typename T>
    void get_random_arrays(unsigned k, unsigned max_num_arrays,
        unsigned max_size_of_array, Sort::ElementArrays<T>& out)
    {
        //Random generator (replace 0 with dev() for random seed)
        std::random_device dev;
        std::mt19937 generator(0/*dev()*/);
        Sort::ElementArrays<unsigned> element_arrays;

        //Create random number for number of arrays
        std::uniform_int_distribution<T> arrays_rnd(2, max_num_arrays);
        unsigned num_arrays = arrays_rnd(generator);
        //Create arrays
        std::set<unsigned> s;
        for (unsigned i = 0; i < num_arrays; ++i)
        {
            //Create random number for size of array
            Sort::ElementArray<T> element_array;
            std::uniform_int_distribution<T> elements_rnd(k, max_size_of_array);
            unsigned num_elements = elements_rnd(generator);

            //Create random numbers for elements and add to array
            //(1u avoids signed overflow in the shift)
            std::uniform_int_distribution<T> element_rnd(0, 1u<<31);
            for (unsigned j = 0; j < num_elements; ++j)
            {
                T elem = element_rnd(generator);
                /*bool res = s.insert(elem).second;
                if (res == false)
                {
                    //std::cout << "duplicate removed" << std::endl;
                    j = j-1;
                }
                else*/
                {
                    element_array.push_back(std::make_pair(elem, i));
                }
            }
            //Sort the arrays and return them
            std::sort(element_array.begin(), element_array.end());
            out.push_back(element_array);
        }
    }

    /**
     * @brief Creates random number of randomly-sized, ordered arrays of random numbers.
     * Adds all of these arrays to the sorting algorithm and validates the result
     * against the reference result obtained by merging the arrays, in turn,
     * with std::merge.
     *
     * @param k : unsigned k-th order statistic.
     *
     * @param max_num_arrays : unsigned . Max number of arrays to be sorted.
     *
     * @param max_size_of_array : unsigned . Max number of elements in array.
     *
     * @return bool : Pass = true.
     *
     */
    bool test_bucket(unsigned k, unsigned max_num_arrays,
        unsigned max_size_of_array)
    {
        //Declare arrays
        Sort::ElementArrays<unsigned> test_sort;

        //Print banner
        std::cout << "Creating up to " << max_num_arrays
            << " arrays, each of size <= " << max_size_of_array
            << ". Please wait ..." << std::endl;

        //Create a set of ordered, random integer arrays
        get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array,
            test_sort);

        //Print banner
        unsigned ave = 0;
        for (auto& arr : test_sort) ave += arr.size();
        ave /= test_sort.size();
        std::cout << "Testing bucket algorithm with: Num arrays = "
            << test_sort.size() << ", average size = " << ave
            << ", k = " << k << ". Please wait ... " << std::endl;
        //Run sorting algorithm
        auto start = std::chrono::high_resolution_clock::now();
        auto result = Sort::KStatisticBucketSort::k_statistic(test_sort, k);
        auto end = std::chrono::high_resolution_clock::now();

        //Print banner
        std::cout << "Algorithm execution completed, after "
            << std::chrono::duration<double, std::milli>(end - start).count()
            << " ms." << std::endl;

        //Run STL merge
        //Print banner
        std::cout << "Running std::merge. Please wait ..." << std::endl;

        //Start timer
        auto start_1 = std::chrono::high_resolution_clock::now();

        //Merge (out[0]=arr1)+arr2 -> out[1], out[1]+arr3 -> out[0],
        //out[0]+arr4 -> out[1] ....
        std::vector<Sort::Element<unsigned>> output[2];
        output[0].insert(output[0].begin(), test_sort[0].begin(),
            test_sort[0].begin() + k);
        output[0].resize(2 * k);
        output[1].resize(2 * k);
        unsigned toggle = 0;
        for (unsigned i = 1; i < test_sort.size(); ++i)
        {
            toggle = (i - 1) % 2;
            std::merge(output[toggle].begin(), output[toggle].begin() + k,
                test_sort[i].begin(), test_sort[i].begin() + k,
                output[(toggle + 1) % 2].begin());
        }
        //Resize back to k
        output[(toggle + 1) % 2].resize(k);
        auto& validate = output[(toggle + 1) % 2];

        //Stop timer
        auto end_1 = std::chrono::high_resolution_clock::now();

        //Print banner
        std::cout << "std::merge execution completed, after "
            << std::chrono::duration<double, std::milli>(end_1 - start_1).count()
            << " ms." << std::endl;
        std::cout << "Validating results. Please wait ..." << std::endl;

        //Validate that std::merge result == result from algorithm
        return result == validate;
    }

    bool test_merge(unsigned k, unsigned max_num_arrays,
        unsigned max_size_of_array)
    {
        using namespace Sort;

        //Declare arrays
        Sort::ElementArrays<unsigned> test_sort;

        //Print banner
        std::cout << "Creating up to " << max_num_arrays
            << " arrays, each of size <= " << max_size_of_array
            << ". Please wait ..." << std::endl;

        //Preliminary debugging tests.
        /*test_sort.push_back(
        {
            { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
            { 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
            { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 }
        });
        test_sort.push_back(
        {
            { 40, 40 }, { 42, 42 }, { 44, 44 }, { 46, 46 }, { 48, 48 },
            { 410, 410 }, { 412, 412 }, { 414, 414 }, { 416, 416 },
            { 418, 418 }, { 420, 420 }, { 422, 422 }, { 424, 424 },
            { 426, 426 }, { 428, 428 }, { 430, 430 }
        });*/

        /*test_sort.push_back(
        {
            { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
            { 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
            { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 }
        });*/

        /*test_sort.push_back(
        {
            { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
            { 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
            { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 }
        });
        test_sort.push_back(
        {
            { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
            { 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 },
            { 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 }
        });
        test_sort.push_back(
        {
            { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
            { 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 },
            { 23, 23 }, { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 }
        });
        test_sort.push_back(
        {
            { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
            { 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 },
            { 22, 22 }, { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 }
        });*/

        /*
        for (unsigned i = 0; i < 4; ++i)
        {
            Sort::ElementArray<unsigned> arr;
            for (unsigned j = 0; j < 2 * k; ++j)
            {
                arr.push_back(std::make_pair(i + (j * 4), i + (j * 4)));
            }
            test_sort.push_back(arr);
        }
        test_sort.pop_back();*/

        //Randomised stress-test.
        get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array,
            test_sort);

        //Print banner
        unsigned ave = 0;
        for (auto& arr : test_sort) ave += arr.size();
        ave /= test_sort.size();
        std::cout << "Testing merge algorithm with: Num arrays = "
            << test_sort.size() << ", average size = " << ave
            << ", k = " << k << ". Please wait ... " << std::endl;

        //Run sorting algorithm
        auto start = std::chrono::high_resolution_clock::now();
        auto result = Sort::KStatisticMergeSort::k_statistic(test_sort, k);
        auto end = std::chrono::high_resolution_clock::now();

        //Print banner
        std::cout << "Algorithm execution completed, after "
            << std::chrono::duration<double, std::milli>(end - start).count()
            << " ms." << std::endl;

        //Run STL merge
        //Print banner
        std::cout << "Running std::merge. Please wait ..." << std::endl;

        //Start timer
        auto start_1 = std::chrono::high_resolution_clock::now();

        //Merge (out[0]=arr1)+arr2 -> out[1], out[1]+arr3 -> out[0],
        //out[0]+arr4 -> out[1] ....
        std::vector<Sort::Element<unsigned>> output[2];
        output[0].insert(output[0].begin(), test_sort[0].begin(),
            test_sort[0].begin() + k);
        output[0].resize(2 * k);
        output[1].resize(2 * k);
        unsigned toggle = 0;
        for (unsigned i = 1; i < test_sort.size(); ++i)
        {
            toggle = (i - 1) % 2;
            std::merge(output[toggle].begin(), output[toggle].begin() + k,
                test_sort[i].begin(), test_sort[i].begin() + k,
                output[(toggle + 1) % 2].begin());
        }
        //Resize back to k
        output[(toggle + 1) % 2].resize(k);
        auto& validate = output[(toggle + 1) % 2];

        //Stop timer
        auto end_1 = std::chrono::high_resolution_clock::now();

        //Print banner
        std::cout << "std::merge execution completed, after "
            << std::chrono::duration<double, std::milli>(end_1 - start_1).count()
            << " ms." << std::endl;
        std::cout << "Validating results. Please wait ..." << std::endl;

        //Validate that std::merge result == result from algorithm
        return result == validate;
    }
}

int main(void)
{
    bool result_bucket = TestSort::test_bucket(1000, 1000, 10000);
    std::cout << "Bucket Sort Algorithm: "
        << (result_bucket ? "Passed." : "Failed.") << std::endl;
    std::cout << std::endl;
    bool result_merge = TestSort::test_merge(100, 500, 10000);
    std::cout << "Merge Sort Algorithm: "
        << (result_merge ? "Passed." : "Failed.") << std::endl;
    return 0;
}
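
The reference check used by test_bucket and test_merge (merge the first k items of each sorted array into a running result, then truncate back to k) can be isolated as a small stand-alone sketch. The function name k_smallest_reference, and the use of plain std::vector<T> in place of Sort::ElementArray<T>, are assumptions made for illustration only; this is not part of the module and makes no attempt at multi-threading.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the original listing): returns the k
// smallest values over all sorted input arrays, in ascending order.
template <typename T>
std::vector<T> k_smallest_reference(const std::vector<std::vector<T>>& arrays,
                                    std::size_t k)
{
    std::vector<T> current; // running result, never longer than k
    std::vector<T> next;    // scratch buffer for the next merge

    for (const auto& arr : arrays)
    {
        // Precondition taken from K-STATISTIC-1: each array is sorted.
        assert(std::is_sorted(arr.begin(), arr.end()));

        // Only the first k items of a sorted array can be among the
        // k smallest overall, so the rest is never read.
        const std::size_t take = std::min(k, arr.size());
        const auto arr_end =
            std::next(arr.begin(), static_cast<std::ptrdiff_t>(take));

        next.clear();
        std::merge(current.begin(), current.end(),
                   arr.begin(), arr_end,
                   std::back_inserter(next));

        // Truncate back to k before merging the next array.
        if (next.size() > k)
        {
            next.resize(k);
        }
        current.swap(next);
    }
    return current;
}
```

For example, for the arrays {1, 4, 9}, {2, 3, 10} and {0, 8} with k = 4, the function returns {0, 1, 2, 3}. Unlike the test harness above, this sketch does not assume that every array holds at least k items, at the cost of growing the scratch buffer with std::back_inserter rather than writing into pre-sized storage.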