2. What is a Computer Program?
• To know exactly what a data structure is, we must first know:
– What is a computer program?
Input → Some mysterious processing → Output
3. Data Structures
• Prepares students for the more advanced material they will encounter in later courses.
• Covers well-known data structures such as dynamic arrays, linked lists, stacks, queues, trees, and graphs.
• Implements data structures in C++.
4. Example
• Data structures for storing student data:
– Arrays
– Linked lists
• Issues
– Space needed
– Efficiency of operations (time required to complete them)
• Retrieval
• Insertion
• Deletion
– Frequency of use of the above operations
5. What data structure to use?
Data structures let the input and output be represented in a way that can be handled efficiently and effectively.
Candidates: array, linked list, tree, queue, stack
6. Organizing Data
• A data structure is any organization for a collection of records that can be searched, processed in any order, or modified.
• The choice of data structure and algorithm can make the difference between a program running in a few seconds and one running for many days.
7. What’s the difference?
• Different types of values
• Different structures
– No structure – just a collection of values
– Linear structure of values – the order matters
– Set of key-value pairs
– Hierarchical structures
– Grid/table
– ….
• Different access disciplines
– get, put, remove anywhere
– get, put, remove only at the ends, or only at the top, or …
– get, put, remove by position, or by value, or by key, or …
– ….
8. Good Algorithms?
• Run in less time
• Consume less memory
But of these two computational resources, time (time complexity) is usually the more important.
9. Complexity
• In examining algorithm efficiency, we must understand the idea of complexity.
• Complexity is the consumption of resources.
• The most important aspects of complexity are:
– Space complexity
– Time complexity
10. Space Complexity
• When memory was expensive, we focused on making programs as space-efficient as possible and developed schemes to make memory appear larger than it really was (virtual memory and memory paging schemes).
• Space complexity is still important in the field of embedded computing (handheld computer-based equipment such as cell phones, palm devices, etc.).
11. Time Complexity
• Is the algorithm “fast enough” for my needs?
• How much longer will the algorithm take if I increase the amount of data it must process?
• Given a set of algorithms that accomplish the same thing, which is the right one to choose?
12. Running Time of an Algorithm
• Depends upon:
– Input size
– Nature of input
• Generally, time grows with the size of the input, so the running time of an algorithm is usually measured as a function of input size.
• Running time is measured in terms of the number of steps (primitive operations) performed.
• This makes the measure independent of the machine and OS.
13. Finding the Running Time of an Algorithm / Analyzing an Algorithm
• Running time is measured by the number of steps (primitive operations) performed.
• Steps are elementary operations such as +, *, <, =, A[i], etc.
• We will measure the number of steps taken in terms of the size of the input.
14. Simple Example
// Input: int A[N], array of N integers
// Output: Sum of all numbers in array A
int Sum(int A[], int N)
{
    int s=0;
    for (int i=0; i< N; i++)
        s = s + A[i];
    return s;
}
How should we analyse this?
15. Simple Example
// Input: int A[N], array of N integers
// Output: Sum of all numbers in array A
int Sum(int A[], int N){
    int s=0;                    // 1: s = 0
    for (int i=0; i< N; i++)    // 2: i = 0    3: i < N    4: i++
        s = s + A[i];           // 5: A[i]     6: +        7: =
    return s;                   // 8: return s
}
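1, 2, 8: executed once
3, 4, 5, 6, 7: executed once per iteration of the for loop, i.e. N times
Total: 5N + 3
(Strictly, the test i < N is evaluated N + 1 times, because the last, failing test ends the loop, which gives 5N + 4. The extra constant does not change the analysis.)
The complexity function of the algorithm is: f(N) = 5N + 3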
16. Simple Example / Growth of 5N + 3
Estimated running time for different values of N:
N = 10 => 53 steps
N = 100 => 503 steps
N = 1,000 => 5,003 steps
N = 1,000,000 => 5,000,003 steps
As N grows, the number of steps for the function “Sum” grows in linear proportion to N.
17. What Dominates in the Previous Example?
What about the +3 and the 5 in 5N + 3?
– As N gets large, the +3 becomes insignificant.
– The 5 is imprecise, since different operations require different amounts of time, and it does not change how the running time grows.
What is fundamental is that the time is linear in N.
Asymptotic Complexity: As N gets large, concentrate on the highest-order term:
• Drop lower-order terms such as +3
• Drop the constant coefficient of the highest-order term
What remains is N: the running time of Sum grows linearly in N.
19. BIG OMEGA NOTATION
• If we want to say “the running time is at least…”, we use Ω.
• Big Omega notation, Ω, is used to express lower bounds on a function.
• If f(n) and g(n) are two complexity functions, then we say that f(n) is Ω(g(n)) if there exist positive constants c and n0 such that
0 ≤ c * g(n) ≤ f(n) for all n ≥ n0
• Example: 5n + 3 = Ω(n), with c = 5 and n0 = 1, since 5n ≤ 5n + 3 for all n ≥ 1.
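20. BIG THETA NOTATION
• If we wish to express tight bounds, we use the theta notation, Θ.
• f(n) = Θ(g(n)) means that f(n) = O(g(n)) and f(n) = Ω(g(n)). That is, there exist positive constants c1, c2 and n0 such that 0 ≤ c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0.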
21. WHAT DOES THIS ALL MEAN?
• If f(n) = Θ(g(n)) we say that f(n) and g(n)
grow at the same rate, asymptotically
• If f(n) = O(g(n)) and f(n) ≠ Ω(g(n)), then we
say that f(n) is asymptotically slower
growing than g(n).
• If f(n) = Ω(g(n)) and f(n) ≠ O(g(n)), then we
say that f(n) is asymptotically faster growing
than g(n).
22. WHICH NOTATION DO WE USE?
• To express the efficiency of our algorithms
which of the three notations should we use?
• As computer scientists, we generally express the efficiency of our algorithms with big O, since we would like to know their upper bounds.
• If we know the worst case, then we can aim to improve it and/or avoid it.
23. Big Oh Notation
If f(N) and g(N) are two complexity functions, we say
f(N) = O(g(N))
(read "f(N) is order g(N)", or "f(N) is big-O of g(N)")
if there are positive constants c and N0 such that
f(N) ≤ c * g(N) for all N ≥ N0,
that is, for all sufficiently large N.
24. Big Oh Notation
• O(f(n)) = {g(n) : there exist positive constants c and n0 such that 0 ≤ g(n) ≤ c * f(n) for all n ≥ n0}
• O(f(n)) is a set of functions.
• n = O(n²) means that the function n belongs to the set of functions O(n²).
25. Big-Oh Notation
• Even though it is correct to say “7n - 3 is O(n³)”, a better statement is “7n - 3 is O(n)”; that is, one should make the approximation as tight as possible.
• Simple Rule:
Drop lower order terms and constant factors
7n - 3 is O(n)
8n² log n + 5n² + n is O(n² log n)
27. Performance Classification
f(n) and its classification:
• 1 (Constant): run time is fixed and does not depend upon n. Most instructions are executed once, or only a few times, regardless of the amount of information being processed.
• log n (Logarithmic): when n increases, so does run time, but much more slowly. Common in programs that solve large problems by transforming them into smaller problems.
• n (Linear): run time varies directly with n. Typically, a small amount of processing is done on each element.
• n log n: when n doubles, run time slightly more than doubles. Common in programs that break a problem down into smaller sub-problems, solve them independently, then combine the solutions.
• n² (Quadratic): when n doubles, run time increases fourfold. Practical only for small problems; typically the program processes all pairs of input (e.g. in a doubly nested loop).
• n³ (Cubic): when n doubles, run time increases eightfold.
• 2ⁿ (Exponential): when n doubles, run time squares. This is often the result of a natural, “brute force” solution.
28. Size does matter
What happens if we double the input size N?
N      log₂N   5N      N log₂N   N²       2ᴺ
8      3       40      24        64       256
16     4       80      64        256      65536
32     5       160     160       1024     ~10⁹
64     6       320     384       4096     ~10¹⁹
128    7       640     896       16384    ~10³⁸
256    8       1280    2048      65536    ~10⁷⁷
30. Size does matter
• Suppose a program has run time O(n!) and the run time for n = 10 is 1 second. Then:
For n = 12, the run time is 2 minutes
For n = 14, the run time is 6 hours
For n = 16, the run time is 2 months
For n = 18, the run time is 50 years
For n = 20, the run time is 200 centuries
31. Standard Analysis Techniques
• Constant time statements
• Analyzing Loops
• Analyzing Nested Loops
• Analyzing Sequence of Statements
• Analyzing Conditional Statements
32. Constant time statements
• Simplest case: O(1) time statements
• Assignment statements of simple data types
int x = y;
• Arithmetic operations:
x = 5 * y + 4 - z;
• Array referencing:
A[j] = 5;
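• Assigning to every element of an array (for all j, A[j] = 5;):
constant time only when the array has a fixed, constant size; for an array of n elements it takes O(n) time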
• Most conditional tests:
if (x < 12) ...
33. Analyzing Loops
• Any loop has two parts:
– How many iterations are performed?
– How many steps per iteration?
int sum = 0,j;
for (j=0; j < N; j++)
    sum = sum +j;
– Loop executes N times (0..N-1)
– 4 = O(1) steps per iteration
• Total time is N * O(1) = O(N*1) = O(N)
34. ANALYZING LOOPS – LINEAR LOOPS
• Example (have a look at this code segment):
• Efficiency is proportional to the number of iterations.
• Efficiency time function is:
f(n) = 1 + (n-1) + c*(n-1) + (n-1)
= (c+2)*(n-1) + 1
= (c+2)n - (c+2) + 1
• Asymptotically, efficiency is: O(n)
35. Analyzing Loops
• What about this for loop?
int sum =0, j;
for (j=0; j < 100; j++)
    sum = sum +j;
• Loop executes 100 times
• 4 = O(1) steps per iteration
• Total time is 100 * O(1) = O(100 * 1) = O(100)
= O(1)
36. Analyzing Nested Loops
• Treat just like a single loop and evaluate each level of
nesting as needed:
int sum = 0, j, k;
for (j=0; j<N; j++)
    for (k=N; k>0; k--)
        sum += k+j;
• Start with outer loop:
– How many iterations? N
– How much time per iteration? Need to evaluate inner loop
• Inner loop uses O(N) time
• Total time is N * O(N) = O(N*N) = O(N²)
37. HOW DID WE GET THIS ANSWER?
• When doing Big-O analysis, we sometimes have to compute a series like: 1 + 2 + 3 + ... + (n-1) + n
• i.e. the sum of the first n numbers. What is the complexity of this?
• Gauss figured out that the sum of the first n numbers is always:
1 + 2 + ... + n = n(n + 1)/2 = (n² + n)/2, which is O(n²)
38. Analyzing Sequence of Statements
• For a sequence of statements, compute their
complexity functions individually and add them
up
for (j=0; j < N; j++)             // O(N²)
    for (k =0; k < j; k++)
        sum = sum + j*k;
for (l=0; l < N; l++)             // O(N)
    sum = sum -l;
cout<<"Sum="<<sum;                // O(1)
Total cost is O(N²) + O(N) + O(1) = O(N²) (SUM RULE)
39. Analyzing Conditional Statements
What about conditional statements such as
if (condition)
    statement1;
else
    statement2;
where statement1 runs in O(N) time and statement2 runs in O(N²) time?
We use "worst case" complexity: among all inputs of size N, what is the maximum running time?
The analysis for the example above is O(N²).
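40. Best Case
• The best case is the input of size n that is cheapest among all inputs of size n.
• “The best case for my algorithm is n = 1 because that is the fastest.” WRONG!
This is a common misunderstanding: the best case is taken over inputs of a fixed size n, so it is not obtained by choosing a small n.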
41. Selecting a Data Structure
Select a data structure as follows:
1. Analyze the problem to determine the
resource constraints a solution must
meet.
2. Determine the basic operations that must
be supported. Quantify the resource
constraints for each operation.
3. Select the data structure that best meets
these requirements.
42. Data Type, Data Structure, and Abstract Data Types
• Data Type
– Set of values that the variable may assume
– E.g., boolean = {false, true}, digit = {0, 1, 2, …., 9}
• Abstract Data Type
– A mathematical model, together with various operations defined on the model
– Algorithms are designed in terms of ADTs and implemented in terms of the data types
and operators supported by the programming language
• Data Structures
– Physical implementation of an ADT
– Data structures used in implementations are provided in a language (primitive or built-in)
or are built from the language constructs (user-defined)
– Each operation associated with the ADT is implemented by one or more
subroutines in the implementation
43. Array
• An ordered set (sequence) with a fixed
number of elements, all of the same type,
where the basic operation is direct access to each element in the array, so values can be retrieved from or stored in that element.
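44. Array representation
• Logical array: [5, 2, 4, 8, 1] (indices i = 0 … 4)
• Some of the possible implementations in a block of 10 memory cells (0 … 9):
location(i) = i: cells 0, 1, 2, 3, 4 hold 5, 2, 4, 8, 1
location(i) = 9 - i: cells 9, 8, 7, 6, 5 hold 5, 2, 4, 8, 1
location(i) = (7 + i) % 10: cells 7, 8, 9, 0, 1 hold 5, 2, 4, 8, 1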
45. Arrays
Properties:
– Ordered so there is a first element, a second one, etc.
– Fixed number of elements — fixed capacity
– Elements must be the same type (and size);
use arrays only for homogeneous data sets.
– Direct access: Access an element by giving its location
• The time to access each element is the same for all elements,
regardless of position.
• in contrast to sequential access (where to access an element, one
must first access all those that precede it.)
46. Declaring Arrays in C++
element_type array_name[CAPACITY];
where
element_type is any type
array_name is the name of the array (any valid identifier)
CAPACITY (a positive integer constant) is the number of elements in the array
e.g., double score[100];
The elements (or positions) of the array are indexed 0, 1, 2, . . ., CAPACITY - 1.
The compiler reserves a block of “consecutive” memory locations, enough to hold CAPACITY values of type element_type.
(Figure: score[0], score[1], score[2], score[3], …, score[99] in consecutive memory locations.)
47. Array Initialization
In C++, arrays can be initialized when they are declared.
Numeric arrays:
element_type num_array[CAPACITY] = {list_of_initial_values};
Example (the braced list is an array literal):
double rate[5] = {0.11, 0.13, 0.16, 0.18, 0.21};
rate:  [0] 0.11  [1] 0.13  [2] 0.16  [3] 0.18  [4] 0.21
Note 1: If fewer values are supplied than the array's capacity, the remaining elements are assigned 0.
double rate[5] = {0.11, 0.13, 0.16};
rate:  [0] 0.11  [1] 0.13  [2] 0.16  [3] 0  [4] 0
Note 2: It is an error if more values are supplied than the declared size of the array. Standard C++ treats this as ill-formed, so a conforming compiler must report it, although the exact message varies from one compiler to another.
element_type num_array[CAPACITY] = {list_of_initial_values};
48. Addresses
When an array is declared, the address of the first byte (or word) in the block of memory associated with the array is called the base address of the array.
Each array reference must be translated into an offset from this base address.
For example, suppose each element of array score is stored in 8 bytes and the base address of score is 0x1396. Then a statement such as
cout << score[3] << endl;
requires that the array reference score[3] be translated into a memory address:
0x1396 + 3 * sizeof(double) = 0x1396 + 3 * 8 = 0x13ae
The contents of the memory word with address 0x13ae can then be retrieved and displayed.
An address translation like this is carried out each time an array element is accessed.
(Figure: score at base address 0x1396, holding score[0] … score[99]; score[3] is at 0x13ae.)
What will be the time complexity of such an access?
49. Problems with Arrays
1. The capacity of an array can NOT change during program execution.
What is the problem?
– Memory wastage
– Out-of-range errors
2. Arrays are NOT self-contained objects.
What is the problem?
– No way to find the last value stored.
– Not a self-contained object as per OOP principles.
50. Dynamic Arrays
You would like to use an array data structure
but you do not know the size of the array at
compile time.
You find out when the program executes that
you need an integer array of size n=20.
Allocate an array using the new operator:
int* y = new int[20]; // or int* y = new int[n]
y[0] = 10;
y[1] = 15; // use is the same
51. Dynamic Arrays
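‘y’ is an lvalue; it is a pointer that holds the address of 20 consecutive cells in memory.
It can be assigned a value. The new operator returns an address that is stored in y.
Given an ordinary array x (for example, int x[20];), we can write:
y = &x[0];
y = x; // x can appear on the right;
       // y gets the address of the first cell of the x array
If y holds the only copy of the address returned by new, that memory must be released with delete[] before y is reassigned; otherwise it is leaked.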
52. Dynamic Arrays
We must free the memory we got using the
new operator once we are done with the y
array.
delete[] y;
We would not do this to the x array because we
did not use new to create it.
53. Multidimensional Arrays
Most high-level languages support arrays with more than one dimension.
2D arrays are useful when data has to be arranged in tabular form.
Higher-dimensional arrays are appropriate when several characteristics are associated with the data.
Example: a table of test scores for several different students on several different tests.
            Test 1   Test 2   Test 3   Test 4
Student 1    99.0     93.5     89.0     91.0
Student 2    66.0     68.0     84.5     82.0
Student 3    88.5     78.5     70.0     65.0
   :           :        :        :        :
Student n   100.0     99.5    100.0     99.0
For storage and processing, use a two-dimensional array.
54. Declaring Two-Dimensional Arrays
Standard form of declaration:
element_type array_name[NUM_ROWS][NUM_COLUMNS];
Example:
const int NUM_ROWS = 30,
          NUM_COLUMNS = 4;
double scoresTable[NUM_ROWS][NUM_COLUMNS];
(Figure: scoresTable drawn as a grid with rows [0] … [29] and columns [0] … [3].)
Initialization
List the initial values in braces, row by row. Internal braces may be used for each row to improve readability. Only the first dimension may be left empty, because the compiler needs the number of columns.
Example:
double rates[][3] = {{0.50, 0.55, 0.53},  // first row
                     {0.63, 0.58, 0.55}}; // second row
55. Processing Two-Dimensional Arrays
Remember: rows and columns are numbered from zero!
Use doubly-indexed variables:
scoresTable[2][3] is the entry in row 2 and column 3 (row index 2, column index 3).
Use nested loops to vary the two indices, most often in a row-wise manner.