Predictive Job Scheduling in a Connection Limited
System using Parallel Genetic Algorithm

(Synopsis)
INTRODUCTION
Most job-scheduling approaches for parallel machines apply space
sharing, which means allocating CPUs/nodes to jobs in a dedicated
manner and sharing the machine among multiple jobs by allocating
different subsets of nodes. Some approaches apply time sharing (or,
more precisely, a combination of time and space sharing), i.e. they use
multiple time slices per CPU/node. Job scheduling determines when
and where to execute a job, given a stream of parallel jobs and a set
of computing resources. In a standard working model, when a parallel
job arrives at the system, the scheduler tries to allocate the required
number of processors to the job for the duration of its runtime and, if
they are available, starts the job immediately. If the requested
processors are currently unavailable, the job is queued and scheduled
to start at a later time. The most common metrics evaluated include
system metrics such as system utilization and throughput, and user
metrics such as turnaround time and wait time. The typical charging
model is based on the total amount of resources used (resources ×
runtime) by a job.
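The working model described above can be illustrated with a small simulation. This is a minimal sketch, assuming strict first-come-first-served ordering with no backfilling; the names `Job` and `simulate_fcfs` and the metric keys are hypothetical and do not come from the original system.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float    # time the job enters the system
    processors: int   # number of processors requested
    runtime: float    # how long the job holds its processors


def simulate_fcfs(jobs, total_processors):
    """Strict FCFS space sharing; returns per-job metrics keyed by job name."""
    free = total_processors
    running = []  # min-heap of (finish_time, processors) for running jobs
    clock = 0.0
    results = {}
    for job in sorted(jobs, key=lambda j: j.arrival):
        if job.processors > total_processors:
            raise ValueError(f"{job.name} requests more processors than exist")
        clock = max(clock, job.arrival)
        # Release processors held by jobs that have already finished.
        while running and running[0][0] <= clock:
            free += heapq.heappop(running)[1]
        # Not enough free processors: the job waits until running jobs finish.
        while free < job.processors:
            finish, procs = heapq.heappop(running)
            clock = max(clock, finish)
            free += procs
        free -= job.processors
        heapq.heappush(running, (clock + job.runtime, job.processors))
        wait = clock - job.arrival
        results[job.name] = {
            "wait": wait,                               # user metric
            "turnaround": wait + job.runtime,           # user metric
            "charge": job.processors * job.runtime,     # resources x runtime
        }
    return results
```

For instance, on a 4-processor machine a job that needs 2 processors and arrives while 3 are busy is queued until the running job finishes; its charge is still only its own processors multiplied by its own runtime.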
Data mining, the extraction of hidden predictive information from
large databases, is a powerful new technology with great potential to
help companies focus on the most important information in their data
warehouses. Data mining tools predict future trends and behaviors,
allowing businesses to make proactive, knowledge-driven decisions.
The automated, prospective analyses offered by data mining move
beyond the analyses of past events provided by retrospective tools
typical of decision support systems. Data mining tools can answer
business questions that traditionally were too time consuming to
resolve. They scour databases for hidden patterns, finding predictive
information that experts may miss because it lies outside their
expectations.
Most companies already collect and refine massive quantities of
data. Data mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of existing
information resources, and can be integrated with new products and
systems as they are brought on-line. When implemented on high
performance client/server or parallel processing computers, data
mining tools can analyze massive databases to deliver answers to
questions such as, "Which clients are most likely to respond to my
next promotional mailing, and why?"
Data mining (DM), also called Knowledge Discovery in
Databases (KDD) or Knowledge Discovery and Data Mining, is the
process of automatically searching large volumes of data for patterns
using tools such as classification, association rule mining, and
clustering. Data mining is a complex topic with links to multiple core
fields of computer science, and it builds on computational techniques
from statistics, information retrieval, machine learning and pattern
recognition.
Data mining techniques are the result of a long process of research
and product development. This evolution began when business data
was first stored on computers, continued with improvements in data
access, and more recently, generated technologies that allow users to
navigate through their data in real time. Data mining takes this
evolutionary process beyond retrospective data access and navigation
to prospective and proactive information delivery. Data mining is ready
for application in the business community because it is supported by
three technologies that are now sufficiently mature:
o Massive data collection
o Powerful multiprocessor computers
o Data mining algorithms
Commercial databases are growing at unprecedented rates. A recent
META Group survey of data warehouse projects found that 19% of
respondents are beyond the 50 gigabyte level, while 59% expect to be
there by the second quarter of 1996. In some industries, such as retail,
these numbers can be much larger. The accompanying need for
improved computational engines can now be met in a cost-effective
manner with parallel multiprocessor computer technology. Data mining
algorithms embody techniques that have existed for at least 10 years,
but have only recently been implemented as mature, reliable,
understandable tools that consistently outperform older statistical
methods.
Overview of the System
There are two main types of scheduling: system-level scheduling and
application-level scheduling. In system-level scheduling, the
scheduling system analyzes the load on every node and selects one
node to run the job. The scheduling policy aims to optimize the total
performance of the whole system. If the system is heavily loaded, the
scheduling system has to balance the load and increase throughput
and resource utilization under restricted conditions.
If multiple jobs arrive within a unit scheduling time slot, the
scheduling system allocates an appropriate number of jobs to every
node in order to finish these jobs under a defined objective. The
objective is usually the minimal average execution time. This
scheduling policy is application-oriented, so we call it
application-level scheduling.
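The two levels can be contrasted in a few lines of code. The sketch below is only an illustration under stated assumptions: node load is a single number, and batches are placed with a greedy longest-job-first heuristic, which is a simple stand-in and not the method proposed in this synopsis. The function names are hypothetical.

```python
def select_node(loads):
    """System-level choice: return the id of the least-loaded node."""
    return min(loads, key=loads.get)


def assign_batch(job_costs, loads):
    """Application-level choice: place a batch of jobs, longest first,
    each on the node that is least loaded at that moment."""
    loads = dict(loads)  # work on a copy so the caller's data is untouched
    placement = {}
    for job, cost in sorted(job_costs.items(), key=lambda kv: -kv[1]):
        node = select_node(loads)
        placement[job] = node
        loads[node] += cost
    return placement, loads
```

Calling `select_node` once per arriving job corresponds to system-level scheduling, while `assign_batch` handles several jobs that arrive in the same time slot.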
A genetic algorithm (or GA) is a search technique used in computing
to find true or approximate solutions to optimization and search
problems. Genetic algorithms are categorized as global search
heuristics. Genetic algorithms are a particular class of evolutionary
algorithms that use techniques inspired by evolutionary biology such
as inheritance, mutation, selection, and crossover (also called
recombination).
Genetic algorithms are implemented as a computer simulation in which
a population of abstract representations (called chromosomes or the
genotype or the genome) of candidate solutions (called individuals,
creatures, or phenotypes) to an optimization problem evolves toward
better solutions. Traditionally, solutions are represented in binary as
strings of 0s and 1s, but other encodings are also possible. The
evolution usually starts from a population of randomly generated
individuals and happens in generations. In each generation, the fitness
of every individual in the population is evaluated, multiple individuals
are stochastically selected from the current population (based on their
fitness), and modified (recombined and possibly mutated) to form a
new population. The new population is then used in the next iteration
of the algorithm. Commonly, the algorithm terminates when either a
maximum number of generations has been produced, or a satisfactory
fitness level has been reached for the population. If the algorithm has
terminated due to a maximum number of generations, a satisfactory
solution may or may not have been reached.
A typical genetic algorithm requires two things to be defined:
1. a genetic representation of the solution domain,
2. a fitness function to evaluate the solution domain.
A standard representation of the solution is an array of bits. Arrays
of other types and structures can be used in essentially the same way.
The main property that makes these genetic representations
convenient is that their parts are easily aligned due to their fixed size,
which facilitates a simple crossover operation. Variable-length
representations may also be used, but the crossover implementation is
more complex in this case. Tree-like representations are explored in
genetic programming, and free-form representations are explored in
human-based genetic algorithms (HBGA).
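Because fixed-length chromosomes align position by position, crossover reduces to swapping tails at a cut point. The following is a minimal sketch of one-point crossover on Python lists; the function name and the optional `rng` parameter are assumptions made for illustration.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("fixed-length crossover needs equal-length parents")
    if len(parent_a) < 2:
        raise ValueError("need at least two genes to cut between")
    cut = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Each child keeps the head of one parent and the tail of the other, so both children have the same length as the parents.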
The fitness function is defined over the genetic representation and
measures the quality of the represented solution. The fitness function
is always problem dependent. For instance, in the knapsack problem
we want to maximize the total value of objects that we can put in a
knapsack of some fixed capacity. A representation of a solution might
be an array of bits, where each bit represents a different object, and
the value of the bit (0 or 1) represents whether or not the object is in
the knapsack. Not every such representation is valid, as the size of
objects may exceed the capacity of the knapsack. The fitness of the
solution is the sum of values of all objects in the knapsack if the
representation is valid, or 0 otherwise. In some problems, it is hard or
even impossible to define the fitness expression; in these cases,
interactive genetic algorithms are used.
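The knapsack fitness just described can be written directly. This is a small sketch; the function and parameter names are chosen here for illustration.

```python
def knapsack_fitness(bits, values, sizes, capacity):
    """Sum of the values of selected objects, or 0 if they do not fit."""
    total_size = sum(size for bit, size in zip(bits, sizes) if bit)
    if total_size > capacity:
        return 0  # invalid representation: the objects exceed the capacity
    return sum(value for bit, value in zip(bits, values) if bit)
```

With values 60, 100, 120, sizes 10, 20, 30 and a capacity of 50, the bit array [0, 1, 1] scores 220, while [1, 1, 1] scores 0 because its total size of 60 exceeds the capacity.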
Once we have the genetic representation and the fitness function
defined, GA proceeds to initialize a population of solutions randomly,
then improve it through repetitive application of mutation, crossover,
and selection operators.
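The overall loop can be sketched as follows. This is one possible arrangement and not necessarily the one used in the proposed system: binary tournament selection, one-point crossover, bit-flip mutation, and all parameter defaults are assumptions made for this example.

```python
import random


def run_ga(fitness, n_bits, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve bit-string solutions and return the best individual seen."""
    rng = random.Random(seed)
    # Initialize the population randomly.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Selection: pick two individuals at random, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_population = []
        while len(next_population) < pop_size:
            mother, father = tournament(), tournament()
            if n_bits > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)       # one-point crossover
                child = mother[:cut] + father[cut:]
            else:
                child = mother[:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population + [best], key=fitness)
    return best


# Example: maximize the number of 1 bits (a toy objective, assumed here).
best = run_ga(fitness=sum, n_bits=12)
```

The loop stops after a fixed number of generations, so, as noted earlier, the returned individual is the best one found and is not guaranteed to be optimal.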
Abstract
Job scheduling is the key feature of any computing environment
and the efficiency of computing depends largely on the scheduling
technique used. Intelligence is the key factor which is lacking in the
job scheduling techniques of today. Genetic algorithms are powerful
search techniques based on the mechanisms of natural selection and
natural genetics.

Multiple jobs are handled by the scheduler, and the resources a
job needs are in remote locations. Here we assume that the resources a
job needs are in one location and not split over nodes, and that each
node that has a resource runs a fixed number of jobs.

Existing algorithms are non-predictive and employ greedy
algorithms or variants of them. The efficiency of the job scheduling
process would increase if previous experience and genetic algorithms
were used.

In this paper, we propose a model of the scheduling algorithm
in which the scheduler learns from previous experience, so that job
scheduling becomes more effective as time progresses.
Description of Problem
Similar systems that are already available are non-predictive and
employ greedy algorithms or variants of them. That is, the existing
system does not predict the situation in advance, so jobs cannot be
scheduled in the network in such a way that resources are utilized at
the optimal level. The problem is to reduce the processing overhead
during scheduling.

The proposed system deals with data transfer between computers
(PCs) on two different networks.

Existing Method
Data mining algorithms can be categorized into the following:
• Association Algorithm
• Classification
• Clustering Algorithm

Classification:
The process of dividing a dataset into mutually exclusive groups
such that the members of each group are as "close" as possible to one
another, and different groups are as "far" as possible from one
another, where distance is measured with respect to specific
variable(s) you are trying to predict. For example, a typical
classification problem is to divide a database of companies into groups
that are as homogeneous as possible with respect to a
creditworthiness variable with values "Good" and "Bad."
Clustering:
The process of dividing a dataset into mutually exclusive groups
such that the members of each group are as "close" as possible to one
another, and different groups are as "far" as possible from one
another, where distance is measured with respect to all available
variables.
Given databases of sufficient size and quality, data mining technology
can generate new business opportunities by providing these
capabilities:

• Automated prediction of trends and behaviors. Data mining
automates the process of finding predictive information in large
databases. Questions that traditionally required extensive
hands-on analysis can now be answered directly from the data,
quickly. A typical example of a predictive problem is targeted
marketing. Data mining uses data on past promotional mailings
to identify the targets most likely to maximize return on
investment in future mailings. Other predictive problems include
forecasting bankruptcy and other forms of default, and
identifying segments of a population likely to respond similarly to
given events.

• Automated discovery of previously unknown patterns.
Data mining tools sweep through databases and identify
previously hidden patterns in one step. An example of pattern
discovery is the analysis of retail sales data to identify seemingly
unrelated products that are often purchased together. Other
pattern discovery problems include detecting fraudulent credit
card transactions and identifying anomalous data that could
represent data entry keying errors.
Proposed System
Job scheduling is the key feature of any computing environment
and the efficiency of computing depends largely on the scheduling
technique used. A popular technique, the genetic algorithm, is used in
the systems across the network to schedule jobs according to the
predicted load.
Here the system takes care of the scheduling of data packets between
the source and destination computers.
• Job scheduling routes the packets at all the ports in the router
• A queue of data packets is maintained and the scheduling algorithm
is implemented on it
• First Come First Serve scheduling and genetic algorithm scheduling
are invoked for the source and destination
• A comparison of the two algorithms is shown in this proposed system
Hardware specifications:
Processor    : Intel Processor IV
RAM          : 128 MB
Hard disk    : 20 GB
CD drive     : 40 x Samsung
Floppy drive : 1.44 MB
Monitor      : 15" Samtron color
Keyboard     : 108 mercury keyboard
Mouse        : Logitech mouse

Software Specification
Operating System – Windows XP/2000
Language used – J2sdk1.4.0, JCreator
Module Design
Simulated Model :
The simulated model of the network is constructed by grouping
the computers into Network 0 and Network 1. A router is placed between
the two networks, through which data flows from one network to the
other.
First Come First Serve Algorithm:
Packet transfer between the networks is implemented
using the FCFS algorithm.
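A first-come-first-served router queue can be sketched in a few lines. This is only an illustration, assuming packets are simple (source, destination, payload) tuples; the class and method names are hypothetical and are not taken from the project's Java implementation.

```python
from collections import deque


class FcfsRouter:
    """Forwards packets between two networks in strict arrival order."""

    def __init__(self):
        self.queue = deque()  # oldest packet sits at the left end

    def receive(self, source, destination, payload):
        """Enqueue a packet arriving at the router."""
        self.queue.append((source, destination, payload))

    def forward_next(self):
        """Return the oldest queued packet, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

Packets leave in exactly the order they arrived, regardless of their destination or size, which is the baseline the genetic algorithm scheduling is compared against.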

Genetic Algorithm:
Packet transfer between the networks is implemented
using the genetic algorithm. The algorithm details are discussed in
the Proposed System section.
Projecting Result and Comparison:
The data transfer between the source and destination
networks is shown by drawing the path between source and
destination. To draw the path, the points across the network are
also collected. The results of the two algorithms are displayed to
the user in a separate frame to show the efficiency of the genetic
algorithm.

More Related Content

DOC
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
PDF
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
PPTX
Seminar Presentation
PDF
Recommendation system using bloom filter in mapreduce
PDF
Document Classification Using Expectation Maximization with Semi Supervised L...
PDF
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
PDF
Effective data mining for proper
PDF
Anomaly detection via eliminating data redundancy and rectifying data error i...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
Seminar Presentation
Recommendation system using bloom filter in mapreduce
Document Classification Using Expectation Maximization with Semi Supervised L...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
Effective data mining for proper
Anomaly detection via eliminating data redundancy and rectifying data error i...

What's hot (19)

PDF
A statistical data fusion technique in virtual data integration environment
PDF
Application of data mining tools for
PDF
Enhancement techniques for data warehouse staging area
PDF
Introduction to feature subset selection method
PDF
An efficient algorithm for sequence generation in data mining
PDF
Predictive Data Mining with Normalized Adaptive Training Method for Neural Ne...
PDF
V2 i9 ijertv2is90699-1
PDF
Certain Investigation on Dynamic Clustering in Dynamic Datamining
PDF
Column store decision tree classification of unseen attribute set
PDF
Feature Selection : A Novel Approach for the Prediction of Learning Disabilit...
PDF
Novel Ensemble Tree for Fast Prediction on Data Streams
PDF
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
PDF
Tutorial Knowledge Discovery
PDF
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
DOCX
QUERY AWARE DETERMINIZATION OF UNCERTAIN OBJECTS
PDF
The pertinent single-attribute-based classifier for small datasets classific...
PDF
(2016)application of parallel glowworm swarm optimization algorithm for data ...
PDF
A survey of modified support vector machine using particle of swarm optimizat...
PDF
Query aware determinization of uncertain
A statistical data fusion technique in virtual data integration environment
Application of data mining tools for
Enhancement techniques for data warehouse staging area
Introduction to feature subset selection method
An efficient algorithm for sequence generation in data mining
Predictive Data Mining with Normalized Adaptive Training Method for Neural Ne...
V2 i9 ijertv2is90699-1
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Column store decision tree classification of unseen attribute set
Feature Selection : A Novel Approach for the Prediction of Learning Disabilit...
Novel Ensemble Tree for Fast Prediction on Data Streams
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
Tutorial Knowledge Discovery
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
QUERY AWARE DETERMINIZATION OF UNCERTAIN OBJECTS
The pertinent single-attribute-based classifier for small datasets classific...
(2016)application of parallel glowworm swarm optimization algorithm for data ...
A survey of modified support vector machine using particle of swarm optimizat...
Query aware determinization of uncertain
Ad

Viewers also liked (8)

DOC
Constructing inter domain packet filters to control ip (synopsis)
DOC
Controlling ip spoofing through inter domain packet filters(synopsis)
PDF
Proposed Methods of IP Spoofing Detection & Prevention
PPTX
Oceanography 1
PPT
Parallel Computing
PPT
I P S P O O F I N G
PDF
Genetic Approach to Parallel Scheduling
PPTX
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
Constructing inter domain packet filters to control ip (synopsis)
Controlling ip spoofing through inter domain packet filters(synopsis)
Proposed Methods of IP Spoofing Detection & Prevention
Oceanography 1
Parallel Computing
I P S P O O F I N G
Genetic Approach to Parallel Scheduling
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
Ad

Similar to Predictive job scheduling in a connection limited system using parallel genetic algorithm (synopsis) (20)

DOC
Table
PDF
DGBSA : A BATCH JOB SCHEDULINGALGORITHM WITH GA WITH REGARD TO THE THRESHOLD ...
PDF
An Improve Object-oriented Approach for Multi-objective Flexible Job-shop Sch...
PDF
AN IMPROVE OBJECT-ORIENTED APPROACH FOR MULTI-OBJECTIVE FLEXIBLE JOB-SHOP SCH...
PDF
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEM
DOCX
Introduction
PPTX
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHM
PDF
50120130406046
PDF
Multiprocessor Scheduling of Dependent Tasks to Minimize Makespan and Reliabi...
PDF
Multiprocessor scheduling of dependent tasks to minimize makespan and reliabi...
PDF
DOC
genetic paper
PDF
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
PDF
Comparison of Dynamic Scheduling Techniques in Flexible Manufacturing System
PDF
LOAD DISTRIBUTION COMPOSITE DESIGN PATTERN FOR GENETIC ALGORITHM-BASED AUTONO...
PDF
Load Distribution Composite Design Pattern for Genetic Algorithm-Based Autono...
PPTX
Time Table Management system
PDF
Comparison+of+memetic+algorithm+and+pso+in+optimizing++multi+job+shop+schedul...
PDF
Bragged Regression Tree Algorithm for Dynamic Distribution and Scheduling of ...
PPTX
Genetic algorithm
Table
DGBSA : A BATCH JOB SCHEDULINGALGORITHM WITH GA WITH REGARD TO THE THRESHOLD ...
An Improve Object-oriented Approach for Multi-objective Flexible Job-shop Sch...
AN IMPROVE OBJECT-ORIENTED APPROACH FOR MULTI-OBJECTIVE FLEXIBLE JOB-SHOP SCH...
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEM
Introduction
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHM
50120130406046
Multiprocessor Scheduling of Dependent Tasks to Minimize Makespan and Reliabi...
Multiprocessor scheduling of dependent tasks to minimize makespan and reliabi...
genetic paper
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
Comparison of Dynamic Scheduling Techniques in Flexible Manufacturing System
LOAD DISTRIBUTION COMPOSITE DESIGN PATTERN FOR GENETIC ALGORITHM-BASED AUTONO...
Load Distribution Composite Design Pattern for Genetic Algorithm-Based Autono...
Time Table Management system
Comparison+of+memetic+algorithm+and+pso+in+optimizing++multi+job+shop+schedul...
Bragged Regression Tree Algorithm for Dynamic Distribution and Scheduling of ...
Genetic algorithm

More from Mumbai Academisc (20)

DOC
Non ieee java projects list
DOC
Non ieee dot net projects list
DOC
Ieee java projects list
DOC
Ieee 2014 java projects list
DOC
Ieee 2014 dot net projects list
DOC
Ieee 2013 java projects list
DOC
Ieee 2013 dot net projects list
DOC
Ieee 2012 dot net projects list
PPT
Spring ppt
PDF
Ejb notes
PDF
Java web programming
PDF
Java programming-examples
PPTX
Hibernate tutorial
DOCX
J2ee project lists:-Mumbai Academics
PPT
Web based development
PPTX
Java tutorial part 4
PPTX
Java tutorial part 3
PPTX
Java tutorial part 2
PDF
Engineering
Non ieee java projects list
Non ieee dot net projects list
Ieee java projects list
Ieee 2014 java projects list
Ieee 2014 dot net projects list
Ieee 2013 java projects list
Ieee 2013 dot net projects list
Ieee 2012 dot net projects list
Spring ppt
Ejb notes
Java web programming
Java programming-examples
Hibernate tutorial
J2ee project lists:-Mumbai Academics
Web based development
Java tutorial part 4
Java tutorial part 3
Java tutorial part 2
Engineering

Recently uploaded (20)

PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
Machine learning based COVID-19 study performance prediction
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
A comparative analysis of optical character recognition models for extracting...
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
Approach and Philosophy of On baking technology
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
MIND Revenue Release Quarter 2 2025 Press Release
Review of recent advances in non-invasive hemoglobin estimation
NewMind AI Weekly Chronicles - August'25-Week II
Machine learning based COVID-19 study performance prediction
Encapsulation_ Review paper, used for researhc scholars
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
A comparative analysis of optical character recognition models for extracting...
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
sap open course for s4hana steps from ECC to s4
Assigned Numbers - 2025 - Bluetooth® Document
Approach and Philosophy of On baking technology
Digital-Transformation-Roadmap-for-Companies.pptx
Spectral efficient network and resource selection model in 5G networks
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
20250228 LYD VKU AI Blended-Learning.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Dropbox Q2 2025 Financial Results & Investor Presentation
Mobile App Security Testing_ A Comprehensive Guide.pdf

Predictive job scheduling in a connection limited system using parallel genetic algorithm (synopsis)

  • 1. Predictive Job Scheduling in a Connection Limited System using Parallel Genetic Algorithm (Synopsis)
  • 2. INTRODUCTION Most job-scheduling approaches for parallel machines apply space sharing which means allocating CPUs/nodes to jobs in a dedicated manner and sharing the machine among multiple jobs by allocation on different subsets of nodes. Some approaches apply time sharing (or better to say a combination of time and space sharing), i.e. use multiple time slices per CPU/node. Job scheduling determines when and where to execute the job, given a stream of parallel jobs and set of computing resources. In a standard working model, when a parallel job arrives to the system, the scheduler tries to allocate required number of processors for the duration of runtime to the job and, if available, starts the job immediately. If the requested processors are currently unavailable, the job is queued and scheduled to start at a later time. The most common metrics evaluated include system metrics such as the system utilization, throughput, etc. and users metrics such as turnaround time, wait time, etc. The typical charging model is based on the amount of total resources used (resources $times$ runtime) by any job.
  • 3. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
  • 4. Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining, clustering, etc.. Data mining is a complex topic and has links with multiple core fields such as computer science and adds value to rich seminal computational techniques from statistics, information retrieval, machine learning and pattern recognition. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature: o Massive data collection o Powerful multiprocessor computers o Data mining algorithms Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of
  • 5. respondents are beyond the 50 gigabyte level, while 59% expect to be there by second quarter of 1996.1 In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.
  • 6. Overview of the System There are mainly two types of scheduling namely the system level scheduling and the application level scheduling. The scheduling system will analyze the load situation of every node and select one node to run the job. The scheduling policy is to optimize the total performance of the whole system. If the system is heavily loaded, the scheduling system has to realize the load balancing and increase the throughput and resource utilization under restricted conditions. This kind of scheduling is known as the system level scheduling. If multiple jobs arrive within a unit scheduling time slot, the scheduling system shall allocate an appropriate number of jobs to every node in order to finish these jobs under a defined objective. Obviously, the objective is usually the minimal average execution time. This scheduling policy is application-oriented so we call it application-level scheduling. A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).
  • 7. Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes or the genotype or the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached. A typical genetic algorithm requires two things to be defined: 1. a genetic representation of the solution domain, 2. a fitness function to evaluate the solution domain.
  • 8. A standard representation of the solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, that facilitates simple crossover operation. Variable length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in Genetic programming and free-form representations are explored in HBGA. The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem we want to maximize the total value of objects that we can put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the size of objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
  • 9. Once the genetic representation and the fitness function are defined, a GA initializes a population of solutions randomly, then improves it through repeated application of the mutation, crossover, and selection operators.
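The loop just described (random initialization, then repeated selection, crossover, and mutation) might look like the sketch below. It is written in Python for brevity and makes several assumptions the source does not specify: tournament selection of size two, single-point crossover, bit-flip mutation, keeping the best individual each generation (elitism), and stopping after a fixed number of generations. The function name `run_ga` and all parameter values are hypothetical.

```python
import random


def run_ga(fitness, n_bits, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve fixed-length bit strings toward higher fitness (illustrative only)."""
    rng = random.Random(seed)
    # Initialization: a population of random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def select():
        # Tournament selection (assumed): the fitter of two random individuals.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best individual over
        while len(next_population) < pop_size:
            p1, p2 = select(), select()
            if n_bits > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each bit flips with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Toy fitness: the number of 1 bits in the string.
best = run_ga(sum, n_bits=20)
print(best, sum(best))
```

Because the best individual is always copied into the next generation, the best fitness never decreases; as the text notes, stopping on a generation limit does not guarantee that a satisfactory solution has been reached.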
  • 10. Abstract: Job scheduling is a key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. Intelligence is the key factor lacking in today's job scheduling techniques. Genetic algorithms are powerful search techniques based on the mechanisms of natural selection and natural genetics. The scheduler handles multiple jobs, and the resources the jobs need are in remote locations. Here we assume that the resources a job needs are in one location and not split over nodes, and that each node that has a resource runs a fixed number of jobs. The existing algorithms are non-predictive and employ greedy algorithms or variants of them. The efficiency of the job scheduling process would increase if previous experience and genetic algorithms were used. In this paper, we propose a model of the scheduling algorithm in which the scheduler learns from previous experience, so that job scheduling becomes more effective as time progresses.
  • 11. Description of Problem: Similar systems that are already available are non-predictive and employ greedy algorithms or variants of them. That is, the existing system does not predict the situation in advance, so jobs cannot be scheduled across the network in a way that uses resources at the optimal level. The problem is to reduce the processing overhead during scheduling. The proposed system works on data transfer between PCs on two different networks. Existing Method: Data mining algorithms can be categorized into the following: • Association Algorithm • Classification • Clustering Algorithm Classification:
  • 12. The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad." Clustering: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities: • Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data
  • 13. quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events. • Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
  • 14. Proposed System: Job scheduling is a key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. A genetic algorithm is used in the systems across the network to schedule jobs according to the predicted load. Here the system takes care of scheduling data packets between the source and destination computers. • Job scheduling routes the packets at all the ports in the router • A queue of data packets is maintained and the scheduling algorithm is applied to it • First Come First Serve scheduling and genetic algorithm scheduling are each called for the source and destination • A comparison of the two algorithms is shown in the proposed system
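As a rough illustration of the comparison listed above, the sketch below orders a queue of packets once by First Come First Serve and once with a small genetic algorithm, then compares the mean waiting time. It is written in Python and is not the project's implementation: the source does not state the chromosome encoding, the fitness measure, or how load prediction enters, so the permutation encoding, the mean-wait cost, order crossover, swap mutation, and every name and number here are assumptions. All packets are assumed to be queued at time zero, with at least two packets in the queue.

```python
import random


def mean_wait(order, service_times):
    """Average time packets spend queued when sent one at a time in `order`."""
    elapsed, total_wait = 0, 0
    for idx in order:
        total_wait += elapsed
        elapsed += service_times[idx]
    return total_wait / len(order)


def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place; fill the rest in the order genes appear in p2."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    kept = p1[i:j]
    kept_set = set(kept)
    rest = [g for g in p2 if g not in kept_set]
    return rest[:i] + kept + rest[i:]


def ga_schedule(service_times, pop_size=40, generations=100,
                mutation_rate=0.2, seed=0):
    """Search packet orderings with a GA (hypothetical parameters throughout)."""
    rng = random.Random(seed)
    n = len(service_times)

    def cost(order):
        return mean_wait(order, service_times)

    fcfs = list(range(n))  # arrival order is used as one starting individual
    population = [fcfs] + [rng.sample(fcfs, n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:pop_size // 2]  # truncation selection (assumed)
        children = [population[0][:]]         # elitism: keep the best ordering
        while len(children) < pop_size:
            child = order_crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < mutation_rate:
                a, b = rng.sample(range(n), 2)  # swap mutation
                child[a], child[b] = child[b], child[a]
            children.append(child)
        population = children
    return min(population, key=cost)


# Hypothetical transfer times for six queued packets, in arrival order.
times = [8, 3, 5, 1, 9, 2]
fcfs_order = list(range(len(times)))
ga_order = ga_schedule(times)
print("FCFS mean wait:", mean_wait(fcfs_order, times))
print("GA mean wait:  ", mean_wait(ga_order, times))
```

Since the arrival order is seeded into the initial population and the best ordering is always kept, the GA result in this sketch is never worse than FCFS under this cost. For this simple cost, sorting by transfer time is already known to be optimal, which gives a convenient sanity check; a GA becomes more interesting once the cost includes constraints such as the connection limits the project targets.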
  • 15. Hardware specifications:
      Processor : Intel Processor IV
      RAM : 128 MB
      Hard disk : 20 GB
      CD drive : 40x Samsung
      Floppy drive : 1.44 MB
      Monitor : 15-inch Samtron color
      Keyboard : 108 mercury keyboard
      Mouse : Logitech mouse
    Software Specification:
      Operating System : Windows XP/2000
      Language used : J2sdk1.4.0, JCreator
  • 16. Module Design Simulated Model: The simulated model of the network is constructed by grouping the computers into Network 0 and Network 1. A router is placed between the two networks, and data flows from one network to the other through it. First Come First Serve Algorithm: The packet transfer between the networks is implemented using the FCFS algorithm. Genetic Algorithm: The packet transfer between the networks is implemented using the genetic algorithm. The algorithm details are discussed in the Proposed System design. Projecting Result and Comparison: The data transfer between the source and destination networks is shown by drawing the path between source and destination. To draw the path, the points across the network are also collected. The results of the two algorithms are displayed to the user in separate frames so that the efficiency of the genetic algorithm can be seen.