International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 03 Issue: 02 | Feb-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET ISO 9001:2008 Certified Journal Page 1174
“AN OPTIMIZED PARALLEL ALGORITHM FOR LONGEST COMMON
SUBSEQUENCE USING OPENMP” – A Review
1Hanok Palaskar, 2Prof. Tausif Diwan
1 M.Tech Student, CSE Department, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
2Assistant Professor, CSE Department, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The longest common subsequence (LCS)
problem is to find a maximum-length common
subsequence of two or more given sequences. It has
many applications in bioinformatics and
computational genomics. The LCS problem has optimal
substructure and overlapping subproblems, and
problems with these properties can be solved by
dynamic programming. As databases of biological
sequences grow, parallel algorithms are needed
to solve these large instances in less time than
sequential algorithms. In this paper we briefly
survey parallel algorithms and approaches for
parallelizing the LCS problem on multi-core CPUs
as well as on GPUs. We also propose an optimized
parallel algorithm for solving the LCS problem on
multi-core CPUs using OpenMP.
Key Words: LCS, Dynamic Programming, Parallel
Algorithm, OpenMP.
1. INTRODUCTION
One of the classical problems in computer science
is the longest common subsequence (LCS) problem.
We are given two sequences A = (a1, a2, ..., am)
and B = (b1, b2, ..., bn) and wish to find a
maximum-length common subsequence of A and B.
Using dynamic programming, LCS can be solved in
O(mn) time. Dynamic programming algorithms
recursively break the problem into overlapping
subproblems and store the answers to the
subproblems for later reference. If the
subproblems overlap enough, the time complexity
can be reduced drastically, typically from
exponential to polynomial. LCS has applications
in many areas, such as speech recognition, file
comparison, and especially bioinformatics.
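The effect of storing subproblem answers can be seen in a small experiment. The sketch below is illustrative only and is not taken from the paper: the function names `lcs_length_plain` and `lcs_length_memo` are hypothetical, and the call counters are added just to make the difference visible. Both functions evaluate the same recursion on prefixes; the second caches each (i, j) result so that it is computed at most once.

```python
from functools import lru_cache


def lcs_length_plain(a, b):
    """LCS length by plain recursion; returns (length, number of calls)."""
    calls = 0

    def go(i, j):
        # go(i, j) is the LCS length of the prefixes a[:i] and b[:j].
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0
        if a[i - 1] == b[j - 1]:
            return go(i - 1, j - 1) + 1
        return max(go(i, j - 1), go(i - 1, j))

    return go(len(a), len(b)), calls


def lcs_length_memo(a, b):
    """Same recursion with cached results; returns (length, cache misses)."""
    calls = 0

    @lru_cache(maxsize=None)
    def go(i, j):
        # The body runs only on a cache miss, so `calls` counts the
        # distinct subproblems that were actually solved.
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0
        if a[i - 1] == b[j - 1]:
            return go(i - 1, j - 1) + 1
        return max(go(i, j - 1), go(i - 1, j))

    return go(len(a), len(b)), calls
```

For two strings of lengths m and n, the cached version solves at most (m+1)(n+1) subproblems, which is where the O(mn) bound comes from; the plain version can revisit the same prefix pair many times when the strings share few characters.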
Studies in bioinformatics have moved towards
larger scales, for example the analysis of a
whole genome or proteome instead of a single gene
or protein. Hence, it becomes more and more
difficult to carry out these analyses using
classical sequential algorithms on a single
computer. Bioinformatics now requires parallel
algorithms for the massive computation these
analyses involve. Unlike a traditional serial
algorithm, a parallel algorithm can be executed
a piece at a time on many different processing
devices, and the partial results are combined at
the end to obtain the correct result.
Due to the spread of multi-core machines and
multithreading processors in the marketplace,
parallel programs can now be written for ordinary
desktop computers as well and used to solve
large-scale instances of problems like LCS. A
widely used tool for parallel processing on
multi-core CPUs is OpenMP, a shared-memory
application programming interface (API) that is
used to describe how work is to be shared among
threads that execute on different processors
or cores.
2. THE LONGEST COMMON SUBSEQUENCE
PROBLEM
Finding the longest common subsequence of
two or more sequences is an active problem in
bioinformatics, pattern matching and data
mining. It is frequently used as a comparison
technique to measure the degree of similarity
between two or more sequences.
2.1 Definition
A sequence is a finite ordered list of characters
or symbols. If P = <a1, a2, ..., an> is a sequence,
where a1, a2, ..., an are characters, the integer n
is the length of P. A sequence Q = <b1, b2, ..., bm>
is a subsequence of P = <a1, a2, ..., an> if there
are integers i1, i2, ..., im (1 ≤ i1 < i2 < ... <
im ≤ n) such that b_k = a_{i_k} for k ∈ [1, m].
For example, X = <B,C,D,E> is a subsequence of Y
= <A,B,C,D,E,F>. A sequence W is a common
subsequence of sequences X and Y if W is a
subsequence of both X and Y. A common
subsequence is a longest common subsequence if it
has maximum length. For example, the sequences
<C,D,C,B> and <C,E,B,C> are both longest common
subsequences of <B,C,D,C,E,B,C> and
<C,E,D,B,C,B>.
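The definition above can be checked mechanically. The following is a minimal sketch, with the hypothetical helper names `is_subsequence` and `is_common_subsequence` introduced only for illustration; it scans P once from left to right, which mirrors the requirement that the indices i1 < i2 < ... < im be strictly increasing.

```python
def is_subsequence(q, p):
    """Return True if q can be obtained from p by deleting zero or more items."""
    remaining = iter(p)
    # `ch in remaining` advances the iterator until ch is found, so each
    # character of q must appear after the match for the previous one.
    return all(ch in remaining for ch in q)


def is_common_subsequence(w, x, y):
    """Return True if w is a subsequence of both x and y."""
    return is_subsequence(w, x) and is_subsequence(w, y)
```

With the examples from the text, `is_subsequence("BCDE", "ABCDEF")` holds, and both `"CDCB"` and `"CEBC"` pass `is_common_subsequence` against `"BCDCEBC"` and `"CEDBCB"`.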
2.2 Score Matrix
A classical approach for solving the LCS problem
is dynamic programming. This technique is
based on filling a score matrix using a
scoring rule. The last calculated score is
the length of the LCS, and by tracing back through
the table we can recover the subsequence.
Let n and m be the lengths of the two strings
to be compared. We determine the length of a
longest common subsequence of X =
<x1, x2, ..., xn> and Y = <y1, y2, ..., ym>.
Let L(i, j) be the length of a longest common
subsequence of the prefixes <x1, x2, ..., xi> and
<y1, y2, ..., yj> (0 ≤ i ≤ n, 0 ≤ j ≤ m). Then

$$
L(i,j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
L(i-1,j-1) + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\max\bigl(L(i,j-1),\, L(i-1,j)\bigr) & \text{otherwise.}
\end{cases}
$$
We use the above scoring function to fill
the matrix row by row (Fig. 1).
Fig. 1: Example of Filling LCS Score Matrix
The length of the LCS is the highest calculated
score in the score matrix. In Fig. 1, the length is 4.
To find the LCS we trace back from the highest
score point (4) to the score 1 in the score matrix.
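The row-by-row filling and the trace-back described in this section can be sketched as follows. This is the textbook dynamic programming procedure, not code from the paper; the function names `lcs_table` and `traceback` are hypothetical, and the tie-breaking rule in the trace-back (prefer moving up) is an arbitrary choice, so a different rule may return a different LCS of the same length.

```python
def lcs_table(x, y):
    """Fill the (n+1) x (m+1) score matrix L row by row."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    return L


def traceback(x, y, L):
    """Recover one LCS by walking back from L[n][m] to the border."""
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # this match is part of the LCS
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1  # assumed tie-break: move up first
        else:
            j -= 1
    return "".join(reversed(out))
```

For the pair `"BCDCEBC"` and `"CEDBCB"` from Section 2.1, the bottom-right entry of the table is 4, matching the length stated in the text.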
3. LITERATURE SURVEY
Research on optimized solutions to the LCS problem
has been going on for over thirty years.
The solutions are based on dynamic
programming, divide-and-conquer, the dominant
point technique, and others. Many attempts have
also been made to parallelize the existing
algorithms. This section briefly explains the
techniques and approaches used by various authors
to parallelize the LCS problem.
• Amine Dhraief, Raik Issaoui and Abdelfettah
Belghith in “Parallel Computing the Longest
Common Subsequence (LCS) on GPUs:
Efficiency and Language Suitability” focused on
parallelizing the LCS problem using the dynamic
programming approach. They presented a
parallelization approach for solving the LCS
problem on GPUs and implemented their proposed
algorithm on the NVIDIA platform using CUDA and
OpenCL. To parallelize the algorithm they
computed the score matrix in the anti-diagonal
direction instead of row-wise, since the cells on
one anti-diagonal do not depend on each other.
They also implemented their algorithm on the CPU
using the OpenMP API. Their algorithm achieves
better speedup on the GPU than on the CPU, and on
the NVIDIA platform CUDA proved more suitable for
their algorithm than OpenCL [1].
• Qingguo Wang, Dmitry Korkin, and Yi Shang
in “A Fast Multiple Longest Common
Subsequence (MLCS) Algorithm” presented an
algorithm for the general case of multiple LCS
and its parallelization. Their algorithm is based
on the dominant point approach, and a fast
divide-and-conquer technique is used to compute
the dominant points. The parallelization is
carried out using multiple processors, with one
master processor and the others as slaves. The
master processor starts the divide-and-conquer
algorithm, splits the dominant points and assigns
them evenly to two child processors. The
program is executed recursively at the child
processors, forming a binary tree of parent-child
relationships. This algorithm shows
asymptotically linear speed-up with respect to
the sequential algorithm [2].
• Amit Shukla and Suneeta Agrawal in “A
Relative Position based Algorithm to find out
the Longest Common Subsequence from
Multiple Biological Sequences” proposed a
parallel algorithm for LCS based on calculating
the relative positions of the characters in
any number of given sequences. The speed-up
is achieved by recognizing and rejecting all
subsequences that cannot generate the next
character of the LCS. The algorithm requires the
set of characters used in the sequences to be
known in advance. The parallelization uses
multiple processors, where the number of
processors equals the number of characters in
the finite symbol set. Calculations are done at
each processor and the results are stored in the
local memory of each processor; they are then
combined to get the final LCS. The reported
complexity of the sequential algorithm is O(n),
where n is the length of the sequence, and that
of the parallel algorithm is O(k), where k is
determined by the slowest processor. The
complexity of the parallel algorithm is
independent of the number of sequences [3].
• R. Devika Ruby and Dr. L. Arockiam in
“Positional LCS: A position based algorithm to
find Longest Common Subsequence (LCS) in
Sequence Database (SDB)” presented a
position-based algorithm for LCS that is useful
in sequence database applications. Their
algorithm focuses only on the matched positions
of the sequences, rather than the unmatched
positions, to obtain the LCS. The primary idea
is to eliminate backtracking time by storing
only the matched positions where the LCS
occurs. To reduce the running time further, a
matched-position array is used instead of
searching the entire score matrix. The score
matrix is still computed for the entire
sequences, but to find the LCS the algorithm
scans only the matched positions. The authors
report that the running time of their algorithm
is about half that of the dynamic programming
LCS algorithm [4].
• Jiamei Liu and Suping Wu in “Research on
Longest Common Subsequence Fast
Algorithm” proposed a fast algorithm for the LCS
of two sequences of lengths m and n by
transforming the LCS problem into the problem of
solving a matrix m[p,m] (where p < min(m,n)).
They also presented a parallelization of
their algorithm based on OpenMP. To optimize the
computation of each element of the score matrix,
they used theorems proposed in their paper.
To find the LCS, instead of backtracking through
the score matrix, they used a simple formula
that gives the required LCS in constant time.
The time complexity of their algorithm is
O(np) and the space complexity is O(m+n). They
parallelized the algorithm by applying OpenMP
constructs to the outer loops of the nested
loops [5].
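The anti-diagonal ordering mentioned for [1] in this survey can be sketched as follows. This is a sequential Python illustration of the traversal order only, not the authors' CUDA, OpenCL or OpenMP code; the function name `lcs_length_wavefront` is hypothetical. Each cell (i, j) with i + j = d reads only cells from anti-diagonals d-1 and d-2, so the inner loop over one anti-diagonal is the part that a parallel implementation could distribute among threads.

```python
def lcs_length_wavefront(x, y):
    """LCS length computed one anti-diagonal (i + j == d) at a time."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]  # border cells stay 0
    for d in range(2, n + m + 1):
        # Keep 1 <= i <= n and 1 <= j = d - i <= m.
        i_lo = max(1, d - m)
        i_hi = min(n, d - 1)
        # The iterations of this loop are independent of each other:
        # they only read cells from the two previous anti-diagonals.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    return L[n][m]
```

The number of cells per anti-diagonal grows from 1 up to min(n, m) and then shrinks again, so the available parallelism is uneven across the sweep; this is one reason load distribution matters for this family of algorithms.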
4. PROPOSED WORK
We propose the new optimized parallel LCS
algorithm. Major factor in finding the LCS is the
computation of score matrix hence we will
optimize the calculation of elements in the score
matrix by using theorems, instead of classical
method. We will implement our parallel algorithm
on the multi-core CPUs using OpenMP API
constructs. To increase the performance of our
parallel in terms of speed and memory
optimization we will divide the load among the
threads by applying load balancing techniques
and cache optimization respectively. We expect
that our proposed algorithm will be faster than
the existing Parallel LCS algorithms. In future we
will expand our algorithm, to support the Multiple
Longest Common Subsequence (MLCS) and also
to make the hybrid version of our algorithm by
combining OpenMP and MPI.
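The paper does not specify which load-balancing scheme will be used. Purely as an illustration of the idea, the sketch below shows one simple option: dividing a block of independent work items (for example, the cells of one anti-diagonal of the score matrix) into contiguous chunks whose sizes differ by at most one. The function name `split_evenly` and the choice of contiguous chunks are assumptions, not the authors' design.

```python
def split_evenly(count, workers):
    """Split range(count) into `workers` contiguous chunks of near-equal size."""
    base, extra = divmod(count, workers)
    chunks = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

For 10 items and 4 workers this gives chunk sizes 3, 3, 2 and 2. When there are fewer items than workers, some chunks are empty, which corresponds to idle threads on the short anti-diagonals near the corners of the matrix.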
5. CONCLUSION
Problem of LCS have the variety of applications in
the domain of pattern matching, data mining and
bioinformatics. Due to the recent developments in
the multi-core CPUs, parallel algorithms using
OpenMP are one of the best ways to solve the
problems having large size inputs. This paper
presented a review of parallel algorithm for the
Longest Common Subsequence problem and
approaches to parallelize LCS problem on the
multi-core CPUs and as well as on GPUs. We also
have proposed our parallel algorithm and the
optimizations in order to increase the
performance, which we expect to be the faster
algorithm in comparison to the existing parallel
algorithms for solving LCS.
6. REFERENCES
[1] Amine Dhraief, Raik Issaoui, Abdelfettah
Belghith, “Parallel Computing the Longest
Common Subsequence (LCS) on GPUs: Efficiency
and Language Suitability”, The First International
Conference on Advanced Communications and
Computation, 2011.
[2] Qingguo Wang, Dmitry Korkin, Yi Shang, “A
Fast Multiple Longest Common Subsequence
(MLCS) Algorithm”, IEEE Transactions on
Knowledge and Data Engineering, 2011.
[3] Amit Shukla, Suneeta Agrawal, “A Relative
Position based Algorithm to find out the Longest
Common Subsequence from Multiple Biological
Sequences”, 2010 International Conference on
Computer and Communication Technology, pages
496 – 502.
[4] R. Devika Ruby, Dr. L. Arockiam, “Positional
LCS: A position based algorithm to find Longest
Common Subsequence (LCS) in Sequence
Database (SDB)”, IEEE International Conference
on Computational Intelligence and Computing
Research, 2012.
[5] Jiamei Liu, Suping Wu, “Research on Longest
Common Subsequence Fast Algorithm”, 2011
International Conference on Consumer
Electronics, Communications and Networks,
pages 4338 – 4341.
[6] Zhong Zheng, Xuhao Chen, Zhiying Wang, Li
Shen, Jaiwen Li, “Performance Model for OpenMP
Parallelized Loops”, 2011 International
Conference on Transportation, Mechanical and
Electrical Engineering (TMEE), pages 383-387.
[7] Rahim Khan, Mushtaq Ahmad, Muhammad
Zakarya, “Longest Common Subsequence Based
Algorithm for Measuring Similarity Between Time
Series: A New Approach”, World Applied Sciences
Journal, pages 1192-1198.
[8] Jiaoyun Yang, Yun Xu, “A Space-Bounded
Anytime Algorithm for the Multiple Longest
Common Subsequence Problem”, IEEE Transactions
on Knowledge and Data Engineering, 2014.
[9] I-Hsuan Yang, Chien-Pin Huang, Kun-Mao
Chao, “A fast algorithm for computing a longest
common increasing subsequence”, Information
Processing Letters, ELSEVIER, 2004.
[10] Yu Haiying, Zhao Junlan, Application of
Longest Common Subsequence Algorithm in
Similarity Measurement of Program Source Codes.
Journal of Inner Mongolia University, vol. 39, pp.
225–229, Mar 2008.
[11] Krste Asanovic, Ras Bodik, Bryan, Joseph,
Parry, Samuel Williams, “The Landscape of
Parallel Computing Research: A view from
Berkeley” Electrical Engineering and Computer
Sciences University of California at Berkeley,
December 2006.
[12] Barbara Chapman, Gabriele Jost, Ruud van
der Pas, “Using OpenMP”, 1-123

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 03 Issue: 02 | Feb-2016 www.irjet.net p-ISSN: 2395-0072 © 2016, IRJET ISO 9001:2008 Certified Journal Page 1174 “AN OPTIMIZED PARALLEL ALGORITHM FOR LONGEST COMMON SUBSEQUENCE USING OPENMP” – A Review 1Hanok Palaskar, 2Prof. Tausif Diwan 1 M.Tech Student, CSE Department, Shri Ramdeobaba College of Engineering and Management, Nagpur, India 2Assistant Professor, CSE Department, Shri Ramdeobaba College of Engineering and Management, Nagpur, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The LCS problem is to find the maximum length common subsequence of two or more given sequences. Finding the Longest Common Subsequence has many applications in the areas of bioinformatics and computational genomics. LCS problem has optimal substructure and overlapping sub problems, problems with such properties can be approached by a dynamic programming problem solving technique. Due to growth of database sizes of biological sequences, parallel algorithms are the best solution to solve these large size problems in less amount of time than sequential algorithms. In this paper we have carried out a brief survey of different parallel algorithms and approaches to parallelize LCS problem on the multi-core CPUs and as well as on GPUs. We have also proposed our optimized parallel algorithm to solve LCS problem on multi-core CPUs using a tool OpenMP. Key Words: LCS, Dynamic Programming, Parallel Algorithm, OpenMP. 1. INTRODUCTION One of the classical problems in computer science is the longest common subsequence. In LCS problem we are given two sequences A = (a1, a2,….am) and B = (b1,b2,….bn) and wish to find a maximum length common subsequence of A and B. By using the Dynamic Programming technique LCS can be solved in O(mn) time. 
Dynamic Programming algorithms recursively break the problem up into overlapping sub problems, and store the answer to the sub problems for later reference. If there is an enough overlapping of sub problems, then the time complexity can be reduced drastically, typically from exponential to polynomial. LCS has the application in many areas, such as speech recognition, file comparison, and especially bioinformatics. Most common studies in the bioinformatics field have evolved towards a more large scale, for example, analysis and study of genome/proteome instead of a single gene/protein. Hence, it becomes more and more difficult to achieve these analyses using classical sequential algorithms on a single computer. The bioinformatics requires now parallel algorithms for the massive computation for their analysis. Parallel algorithms, are different from a traditional serial algorithms, and can be executed a piece at a time on many different processing devices, and at the end to get the correct result can be combined together. Due to the spread of multicore machines and multithreading processors in the marketplace, we can create parallel programs for uniprocessor computers also, and can be used to solve large scale instances problems like LCS. One of the best tools to do parallel processing on multi-core CPUs is OpenMP, which is a shared-memory application programming interface (API) and can be used to describe how the work is to be shared among threads that will execute on different processors or cores. 2. THE LONGEST COMMON SUBSEQUENCE PROBLEM The deduction of longest common subsequence of two or more sequences is a current problem in the domain of bioinformatics, pattern matching and data mining. The deduction of these subsequences is frequently used as a technique of comparison in order to get the similarity degree between two or more sequences. 2.1 Definition A sequence is a finite set of characters or symbols. 
If P = <a1,a2, ..,an> is a sequence, where a1,a2, ..,an are characters, the integer n is the magnitude of P. A sequence Q = <b1,b2, ..,bn> is a subsequence of P = <a1,a2, ..,an> if there are integers i1, i2, .., im(1 ≤ i1 < i2 < .. < im ≤ n) where bk = aik for k ∈ [1,m]. For example, X = <B,C,D,E> is a subsequence of Y = <A,B,C,D,E,F>. A sequence W is a common
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 03 Issue: 02 | Feb-2016 www.irjet.net p-ISSN: 2395-0072 © 2016, IRJET ISO 9001:2008 Certified Journal Page 1175 subsequence to sequences X and Y if W is a subsequence of X and of Y. A common subsequence is largest subsequence if it is having maximum length. For example: sequences <C,D,C,B> and <C,E,B,C> are the longest common subsequences of <B,C,D,C,E,B,C> and of <C,E,D,B,C,B>. 2.2 Score Matrix A classical approach for solving the LCS problem is the dynamic programming. This technique is based on the filling of a score matrix by using a scoring mechanism. The last calculated score is the length of the LCS and by tracing back the table we can get the subsequence. Consider n and m be the lengths of two strings which are to be compared. We determine the length of a largest common subsequence in X = <x1,x2, ..,xn> and Y = <y1,y2, ..,ym>. We find L(i, j) the length of a largest common subsequence in <a1,a2, ..ai> and <b1,b2, ..,bj>(0 ≤ j ≤m,0 ≤ i ≤n). 0 if i=0 or j=0 L(i,j) = L(i-1,j-1) + 1 if ai = bj Max(L(i,j-1), L(i-1,j)) else We use the above scoring function in order to fill the matrix row by row (fig. 1). Fig. 1: Example of Filling LCS Score Matrix The length of the LCS is the highest calculated score in the score matrix. In Fig. 1, the length is 4. To find the LCS we trace back from the highest score point (4) to the score 1 in the score matrix. 3. LITERATURE SURVEY To find an optimized solution, lot of research is going on for over thirty years for the LCS problem. The solutions are based on Dynamic Programming, Divide-and-conquer, or Dominant Point technique etc. also, many attempts has been made to parallelize the existing algorithms. This section briefly explains the techniques and approaches used by the various authors to parallelize the LCS problem in order to get the optimized solution. 
• Amine Dhraief, Raik Issaoui and Abdelfettah Belghith, in “Parallel Computing the Longest Common Subsequence (LCS) on GPUs: Efficiency and Language Suitability”, focused on parallelizing the LCS problem using the Dynamic Programming approach. They presented a parallelization approach for solving the LCS problem on GPUs and implemented their proposed algorithm on the NVIDIA platform using CUDA and OpenCL. To parallelize the algorithm they computed the score matrix in the anti-diagonal direction instead of row-wise. They also implemented their proposed algorithm on the CPU using the OpenMP API. Their algorithm achieves a better speedup on the GPU than on the CPU, and for their proposed algorithm CUDA is more suitable than OpenCL on the NVIDIA platform [1].

• Qingguo Wang, Dmitry Korkin, and Yi Shang, in “A Fast Multiple Longest Common Subsequence (MLCS) Algorithm”, presented an algorithm for the general case of multiple LCS and its parallelization. Their algorithm is based on the dominant point approach, and a fast divide-and-conquer technique is used to compute the dominant points. The parallelization is carried out using multiple processors, with one master processor and the others as slaves. The master processor starts the divide-and-conquer algorithm, splits the dominant points and assigns them evenly to two child processors. The program is executed recursively at the child processors, forming a binary tree based on the parent-child relationship. This algorithm shows asymptotically linear speed-up with respect to the sequential algorithm [2].
• Amit Shukla and Suneeta Agrawal, in “A Relative Position based Algorithm to find out the Longest Common Subsequence from Multiple Biological Sequences”, proposed a parallel algorithm for LCS based on calculating the relative positions of the characters in any number of given sequences. The speed-up in this algorithm is achieved by recognizing and rejecting all those subsequences which cannot generate the next character of the LCS. The algorithm requires the number of characters used in the sequences to be known in advance. The parallelization approach uses multiple processors, where the number of processors is equal to the number of characters in the finite symbol set. Calculations are done at each processor and the results are stored in the local memory of each processor; these are then combined to obtain the final LCS. The complexity of the sequential algorithm is O(n), where n is the length of the sequence, and the complexity of the parallel algorithm is O(k), where k is determined by the slowest processor. The complexity of the parallel algorithm is independent of the number of sequences [3].

• R. Devika Ruby and Dr. L. Arockiam, in “Positional LCS: A position based algorithm to find Longest Common Subsequence (LCS) in Sequence Database (SDB)”, presented a position based algorithm for LCS which is useful in Sequence Database applications. Their proposed algorithm focuses only on the matched positions of the sequences, instead of the unmatched positions, to get the LCS. The primary idea of their algorithm is to remove the backtracking time by storing only the matched positions, where the LCS occurs. Also, to reduce the time complexity, a matched position array is used instead of searching the entire score matrix.
In their proposed algorithm, the score matrix is computed for the entire sequences, but to find the LCS the algorithm scans only the matched positions. The time complexity of their proposed algorithm is reduced to about half that of the classical dynamic programming LCS algorithm [4].

• Jiamei Liu and Suping Wu, in “Research on Longest Common Subsequence Fast Algorithm”, proposed a fast algorithm for the LCS of two sequences of lengths ‘m’ and ‘n’ by transforming the LCS problem into the problem of solving a matrix m[p,m] (where p < min(m,n)). They also presented the parallelization of their proposed algorithm based on OpenMP. To optimize the computation of each element in the score matrix, they used their proposed theorems. To find the LCS, instead of backtracking through the score matrix, they used a simple formula which gives the required LCS in constant time. The time complexity of their proposed algorithm is O(np) and the space complexity is O(m+n). They realized the parallelization of their proposed algorithm by using OpenMP constructs on the nested outer loops [5].

4. PROPOSED WORK
We propose a new optimized parallel LCS algorithm. The major factor in finding the LCS is the computation of the score matrix; hence we will optimize the calculation of the elements in the score matrix by using theorems rather than the classical method. We will implement our parallel algorithm on multi-core CPUs using OpenMP API constructs. To improve the speed and the memory behaviour of our parallel algorithm, we will divide the load among the threads by applying load balancing techniques and apply cache optimization, respectively. We expect that our proposed algorithm will be faster than the existing parallel LCS algorithms. In future we will extend our algorithm to support the Multiple Longest Common Subsequence (MLCS) problem and also build a hybrid version of our algorithm by combining OpenMP and MPI.

5. CONCLUSION
The LCS problem has a variety of applications in the domains of pattern matching, data mining and bioinformatics. Due to recent developments in multi-core CPUs, parallel algorithms using OpenMP are one of the best ways to solve problems with large inputs. This paper presented a review of parallel algorithms for the Longest Common Subsequence problem and of approaches to parallelize the LCS problem on multi-core CPUs as well as on GPUs. We have also proposed our parallel algorithm and the optimizations intended to increase its performance, and we expect it to be faster than the existing parallel algorithms for solving LCS.
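The anti-diagonal scheme surveyed in Section 3 [1] shows how the score matrix can be filled in parallel on a multi-core CPU. The following C sketch is only an illustration under stated assumptions, not the optimized algorithm proposed in Section 4 and not the implementation of [1]. It relies on the fact that L(i, j) depends only on cells of the two preceding anti-diagonals, so all cells with the same i + j can be computed independently. The function name lcs_length_wavefront and the static schedule are choices made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative helper (hypothetical name): returns the LCS length of x and
 * y by filling the score matrix one anti-diagonal at a time, or -1 if
 * memory allocation fails. */
int lcs_length_wavefront(const char *x, const char *y)
{
    long n = (long)strlen(x), m = (long)strlen(y);
    long w = m + 1;                           /* row width of the flat matrix */
    int *L = calloc((size_t)(n + 1) * (size_t)w, sizeof *L);
    if (L == NULL)
        return -1;

    /* Anti-diagonal d contains the cells (i, j) with i + j == d. */
    for (long d = 2; d <= n + m; d++) {
        long i_lo = (d - m > 1) ? d - m : 1;  /* keeps j = d - i <= m */
        long i_hi = (d - 1 < n) ? d - 1 : n;  /* keeps j = d - i >= 1 */

        /* Cells on one anti-diagonal do not depend on each other, so the
         * iterations of this loop can be shared among the threads. */
        #pragma omp parallel for schedule(static)
        for (long i = i_lo; i <= i_hi; i++) {
            long j = d - i;
            if (x[i - 1] == y[j - 1]) {
                L[i * w + j] = L[(i - 1) * w + (j - 1)] + 1;
            } else {
                int left = L[i * w + (j - 1)];
                int up = L[(i - 1) * w + j];
                L[i * w + j] = left > up ? left : up;
            }
        }
    }

    int result = L[n * w + m];
    free(L);
    return result;
}
```

The sketch needs an OpenMP-enabled compiler (for example the -fopenmp option of GCC); without it the pragma is ignored and the loop runs sequentially with the same result. Opening a parallel region for every anti-diagonal adds overhead, and the short anti-diagonals near the corners of the matrix leave threads idle, which is one reason why load balancing matters for this kind of parallelization.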
6. REFERENCES
[1] Amine Dhraief, Raik Issaoui, Abdelfettah Belghith, “Parallel Computing the Longest Common Subsequence (LCS) on GPUs: Efficiency and Language Suitability”, The First International Conference on Advanced Communications and Computation, 2011.
[2] Qingguo Wang, Dmitry Korkin, Yi Shang, “A Fast Multiple Longest Common Subsequence (MLCS) Algorithm”, IEEE Transactions on Knowledge and Data Engineering, 2011.
[3] Amit Shukla, Suneeta Agrawal, “A Relative Position based Algorithm to find out the Longest Common Subsequence from Multiple Biological Sequences”, 2010 International Conference on Computer and Communication Technology, pages 496-502.
[4] R. Devika Ruby, Dr. L. Arockiam, “Positional LCS: A position based algorithm to find Longest Common Subsequence (LCS) in Sequence Database (SDB)”, IEEE International Conference on Computational Intelligence and Computing Research, 2012.
[5] Jiamei Liu, Suping Wu, “Research on Longest Common Subsequence Fast Algorithm”, 2011 International Conference on Consumer Electronics, Communications and Networks, pages 4338-4341.
[6] Zhong Zheng, Xuhao Chen, Zhiying Wang, Li Shen, Jaiwen Li, “Performance Model for OpenMP Parallelized Loops”, 2011 International Conference on Transportation, Mechanical and Electrical Engineering (TMEE), pages 383-387.
[7] Rahim Khan, Mushtaq Ahmad, Muhammad Zakarya, “Longest Common Subsequence Based Algorithm for Measuring Similarity Between Time Series: A New Approach”, World Applied Sciences Journal, pages 1192-1198.
[8] Jiaoyun Yang, Yun Xu, “A Space-Bounded Anytime Algorithm for the Multiple Longest Common Subsequence Problem”, IEEE Transactions on Knowledge and Data Engineering, 2014.
[9] I-Hsuan Yang, Chien-Pin Huang, Kun-Mao Chao, “A fast algorithm for computing a longest common increasing subsequence”, Information Processing Letters, Elsevier, 2004.
[10] Yu Haiying, Zhao Junlan, “Application of Longest Common Subsequence Algorithm in Similarity Measurement of Program Source Codes”, Journal of Inner Mongolia University, vol. 39, pp. 225-229, Mar 2008.
[11] Krste Asanovic, Ras Bodik, Bryan, Joseph, Parry, Samuel Williams, “The Landscape of Parallel Computing Research: A View from Berkeley”, Electrical Engineering and Computer Sciences, University of California at Berkeley, December 2006.
[12] Barbara Chapman, Gabriele Jost, Ruud van der Pas, “Using OpenMP”, 1-123.