PYTHON
Data – raw data – it can be from any source
Python is used in data analysis and machine learning (ML) concepts
ETL – extract, transform, load
Oracle or MySQL DB -> filter -> load -> table => report
Based on the data, it performs some analysis (visualization analysis (charts))
MACHINE LEARNING:
Subset of AI
Working based on algorithms
Train the machine based on inputs and outputs
Machine – learns by itself by seeing the data, understands it and creates its own rules
Machine – has to learn by itself – this is called machine learning
Eg: ChatGPT, stocks, photos, recommendations, chatbot, spam
Spam – detected based on content, subject, sender
PYTHON
Text analysis
Statistical analysis
Predictive analysis
Diagnostic analysis
TEXT ANALYSIS – translation, speech – convert to text
Understand the emotions just from symbols such as :) :(
Eg: Google Translate
STATISTICAL ANALYSIS - used to summarize stored data
Eg: 100 years of data
Works on the data only
PREDICTIVE ANALYSIS - You have data – based on the data (weather report) you predict
The predictions are not 100 percent accurate, but you can still make useful predictions
You are going to give predictions for the future
Eg: Weather – store 10 days of reports
Based on that data you predict the coming days
DIAGNOSTIC ANALYSIS – based on the data it finds the root cause of an issue
PYTHON => when you want to perform some analysis, Python is used
-programming language
Difference between scripting language and programming language:
Scripting – not much interaction with a compiler
An interpreter is used
Execution is line by line
Programming language – compiled – then run
code – compiled – machine code – execute
Python – interpreted language – but there is a hidden compilation step (the source is compiled to bytecode, which the interpreter runs)
It is a programming language, but it is also interpreted
Variable names – case sensitive
DATA TYPES:
Mutable – change
1. LIST []
2. SET {}
3. DICTIONARY {key:value}
Immutable – cannot change after creating
1. NUMBER - int, float, complex
2. STRING - characters
3. TUPLE – ()
LIBRARIES:
1. NumPy
2. Pandas
3. re (regular expressions)
4. Matplotlib
NUM:
No need to declare the datatype in Python
# starts a comment
print(type(a)) --prints the class of a, e.g. <class 'int'> for a=10
type – gives the datatype of a
--declared variable names are case sensitive
STRING:
"oracle" == 'oracle' --double and single quotes are equivalent
Forward indexing starts at 0
Backward indexing starts at -1
Slice => a[1:5] prints index 1 to 4
INDEXING:
a="ORACLE"
print(a) --ORACLE
print(a[-1]) --E (last character)
print(a.lower()) --oracle
print(a.capitalize()) --Oracle (first character capitalized, the rest lowercased)
a[2] -- retrieves the character at index 2 (A)
a[1:4] --- RAC (index 1 to 4-1)
Index starts from 0
split function:
a="oracle python"
print(a.split('e'))
output:
['oracl', ' python']
It splits the string wherever it finds e and returns the pieces as a list (the e itself is dropped)
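The indexing, slicing and split notes above can be tried out with a small sketch like this one. The variable names (first, last, middle, parts and so on) are only illustrative and are not from the notes.

```python
# Illustrative sketch; variable names are assumptions, not from the notes.
a = "ORACLE"

first = a[0]                  # forward indexing starts at 0
last = a[-1]                  # backward indexing starts at -1
middle = a[1:4]               # slice: index 1 up to, but not including, 4
lowered = a.lower()           # all characters in lower case
capitalized = a.capitalize()  # first character upper case, rest lower case

# split() cuts the string at every separator and drops the separator itself
parts = "oracle python".split("e")

print(first, last, middle, lowered, capitalized)
print(parts)
```

Note that the second piece keeps its leading space, because only the separator character is removed.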
LIST:
Index starts from 0
Stores heterogeneous elements
Ordered collection
a=[10,20,30,'hi','HELLO',[10,20]] -- nested list (list inside a list)
a[1] -- 20
a[5][1] -- 20
a[1:6] -- [20, 30, 'hi', 'HELLO', [10, 20]]
Add:
l1.append(2) --adds 2 to the end of the list
print(l1)
l2=[] #empty list
Access: print(l2)
Output: []
l1[1]=80
print(l1)
[10,80,20,……….]
It replaces the old value at index 1
Because a list is mutable
TUPLE: ( )
Stores heterogeneous elements
Ordered collection
Supports indexing, but you can't change the values
Immutable data type – you can't add, change or remove anything
t1=(10,20,'hi','hello')
print(t1)
t1.append(4) -- raises AttributeError: a tuple is immutable, so it has no append
t1[2]='oracle' -- raises TypeError: you can't change a value once the tuple is created
You can still access (read) the elements of a tuple
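A minimal sketch comparing the mutable list with the immutable tuple described above. The values and the helper names (third, tuple_error, has_append) are made up for illustration.

```python
# Illustrative sketch; values and helper names are assumptions.
l1 = [10, 20, 30]
l1.append(40)      # lists can grow
l1[1] = 80         # and their elements can be replaced in place

t1 = (10, 20, 'hi', 'hello')
third = t1[2]      # reading by index works on a tuple

# Assigning to a tuple element is not allowed
try:
    t1[2] = 'oracle'
    tuple_error = None
except TypeError as err:
    tuple_error = type(err).__name__

# A tuple does not even have an append method
has_append = hasattr(t1, 'append')

print(l1, third, tuple_error, has_append)
```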
User input:
a=input('enter the value for a:')
print(a)
Output: enter the value for a:10
10
input() always returns a string ('10'); use int(a) to get a number
SET:
Unordered collection
Can't perform indexing
Can't predict the order in which elements are stored
Set is a mutable data type
Duplicate elements are not allowed
{ } -- denotes a set
s1={} --this is not an empty set, it is treated as an empty dictionary
If you want to create an empty set –
s1=set() --this is treated as an empty set
print(s1)
output: set()
Can't perform indexing – because a set is unordered (duplicates are also removed)
Even if you give the elements in an order, there is no guarantee how they are stored
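The set rules above (empty set, no duplicates, no indexing) can be checked with a short sketch. The sample values and names are assumptions chosen for illustration.

```python
# Illustrative sketch; sample values and names are assumptions.
empty_dict = {}       # {} creates an empty dictionary, not a set
empty_set = set()     # set() is the way to create an empty set

s1 = {10, 20, 20, 30, 10}   # duplicates are dropped automatically
s1.add(40)                  # sets are mutable, so elements can be added

# Indexing is not supported because a set has no positions
try:
    s1[0]
    index_error = None
except TypeError as err:
    index_error = type(err).__name__

print(type(empty_dict).__name__, type(empty_set).__name__)
print(len(s1), index_error)
```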
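DICTIONARY:
Unordered collection -- accessed by key, not by position (since Python 3.7 a dict does keep insertion order)
Data is stored in key value pairs
You can't change a key in place -- keys must be immutable (hashable) objects
Values are mutable – you can change them
Key    value
10     suganya
d1={10:'suganya','a':'harini'}
print(d1)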
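A small sketch of the dictionary behaviour described above. The replacement value 'priya' and the key 'b' are hypothetical, added only to show updating and inserting.

```python
# Illustrative sketch; 'priya' and the key 'b' are hypothetical values.
d1 = {10: 'suganya', 'a': 'harini'}

name = d1[10]          # access a value by its key
d1['a'] = 'priya'      # values can be changed
d1['b'] = 'new entry'  # assigning to a new key adds a pair

# Keys must be immutable (hashable); a list cannot be used as a key
try:
    bad = {[1, 2]: 'x'}
    key_error = None
except TypeError as err:
    key_error = type(err).__name__

print(name, d1, key_error)
```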
LIST:
Mutable data type
Heterogeneous
Ordered collection
a=[10,20,30,'hi','HELLO',[10,20]]
a[1] -- 20
a[5][0] -- 10
Supports indexing
TUPLE:
immutable data type
()
SET
Duplicate elements are not allowed
s1={} --it is not an empty set (it is an empty dictionary)
s1=set()
Unordered collection - display order is not guaranteed
You can't perform indexing
DICTIONARY
Key    value
10     suganya
a      harini
print(d1[10]) --suganya
NUMBER - int, float, complex
STRING - forward and backward indexing
TUPLE - () => ordered collection, immutable, indexing
LIST - [] => ordered collection, mutable, indexing
SET - {} / set() => unordered collection
DICTIONARY - {key:value} => unordered collection
CONTROL STRUCTURE:
Used to choose which statements run (if/else) or to run statements multiple times (loops)
Eg:
a=10
b=30
if a>b:
    print("a is greater than b")
else:
    print("b is greater than a")
-------
l1=[1,20,40,46,56]
print(l1)
for i in l1: --each element (not its index) is assigned directly to i
    print(i)
l1=[10,20,30,40,30,40]
for i in range(1,5,2): --i takes the index values 1 and 3
    print(l1[i]) --prints 20 and 40
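The if/else and for examples above can be put together in one runnable sketch. The names message, elements and picked are illustrative only.

```python
# Illustrative sketch; message, elements and picked are assumed names.
a = 10
b = 30

if a > b:
    message = "a is greater than b"
else:
    message = "b is greater than a"

l1 = [10, 20, 30, 40, 30, 40]

# Looping over the list gives the elements themselves
elements = []
for value in l1:
    elements.append(value)

# Looping over range(start, stop, step) gives index values: 1, 3
picked = []
for i in range(1, 5, 2):
    picked.append(l1[i])

print(message)
print(picked)
```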
FUNCTIONS:
def fun(name):
    print(name)
fun("suganya")
def add():
    a=10
    b=20
    c=a+b
    return c
A function can be defined with or without a return statement
def add(a,b):
    c=a+b
    return c
Call:
x=10
y=30
total=add(x,y)
print(total) --40; arguments are passed by object reference
def f1(a,b=2): --if a value is passed for b it is used; if nothing is passed for b, the default value 2 is assigned to b
    print(a+b)
Passing: f1(10,20) --prints 30
f1(10) --prints 12
def f1(name,age):
    print("hi",name)
    print("age",age)
Passing: f1(age=10,name="sugan") --keyword arguments can be passed in any order
def name(*a): --variable length arguments. If we don't know how many arguments will be passed, use this to accept any number of arguments (a is a tuple)
Eg:
def name(*a):
    for i in a:
        print(i)
Passing: name(10,20,30)
Out:
10
20
30
Variable length keyword arguments:
def f1(**kwargs): --when we don't know the argument names in advance, we use this (kwargs is a dictionary)
    for a,b in kwargs.items():
        print("%s==%s" % (a,b))
Passing: f1(firstname="sugan",lastname="b") --here we pass the argument names to the function as well
Out:
firstname==sugan
lastname==b
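The four ways of passing arguments covered above (default, keyword, *args, **kwargs) are sketched below. The function names greet, collect and describe are hypothetical, and the functions return values instead of printing so the results are easy to check.

```python
# Illustrative sketch; greet, collect and describe are hypothetical names.

def f1(a, b=2):
    # b falls back to 2 when the caller does not pass it
    return a + b

def greet(name, age):
    return "hi %s, age %s" % (name, age)

def collect(*args):
    # args is a tuple holding every positional argument
    return list(args)

def describe(**kwargs):
    # kwargs is a dictionary of argument name -> value
    return ["%s==%s" % (key, value) for key, value in kwargs.items()]

print(f1(10), f1(10, 20))
print(greet(age=10, name="sugan"))   # keyword arguments, any order
print(collect(10, 20, 30))
print(describe(firstname="sugan", lastname="b"))
```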
SCOPE OF VARIABLE:
global – accessible throughout the module
local – accessible only within the function
lambda – anonymous function,
does not have a proper name
it can contain only one expression
eg:
add=lambda x,y:x+y
passing: print(add(1,4))
output: 5
l1=[11,50,60]
l2=list(filter(lambda x: (x % 2 == 0), l1)) --filter – keeps the elements for which the expression is true: [50, 60]
l2=list(map(lambda x: x * 2, l1)) --map – applies the expression to every element: [22, 100, 120]
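A runnable version of the lambda, filter and map lines above. The list comprehensions at the end are an addition, shown only as a common alternative way to write the same thing.

```python
# Illustrative sketch; the comprehension variants are an added comparison.
add = lambda x, y: x + y

l1 = [11, 50, 60]

# filter keeps the elements for which the lambda returns True
evens = list(filter(lambda x: x % 2 == 0, l1))

# map applies the lambda to every element
doubled = list(map(lambda x: x * 2, l1))

# The same results written as list comprehensions
evens_lc = [x for x in l1 if x % 2 == 0]
doubled_lc = [x * 2 for x in l1]

print(add(1, 4), evens, doubled)
```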
RECURSIVE FUNCTION:
Function that calls itself
Factorial:
def fact(x):
    if x<=1: --base case, stops the recursion
        return 1
    else:
        return x * fact(x-1)
Passing: print(fact(4)) --24
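A minimal sketch of the recursive factorial above. The check for negative input is an assumption added here so the recursion always ends, and math.factorial is used only to compare the result.

```python
import math

# Illustrative sketch; the negative-input check is an added assumption.
def fact(x):
    if x < 0:
        raise ValueError("factorial is not defined for negative numbers")
    if x <= 1:               # base case: stops the recursion
        return 1
    return x * fact(x - 1)   # the function calls itself with a smaller value

print(fact(4))
print(fact(5) == math.factorial(5))
```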
Pop:
a=l1.pop(1) --in pop you pass the index; the removed element is returned
print(l1)
Remove:
In remove you pass the value
l1.remove(3) -- removes the first occurrence of the value 3
l1.clear() -- removes all values
Delete:
del l1 -- the entire list is deleted, not just an element
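The difference between pop, remove, clear and del can be seen in a short sketch. The sample list and the snapshot variables are assumptions for illustration.

```python
# Illustrative sketch; the sample list and snapshot names are assumptions.
l1 = [1, 2, 3, 4, 3]

popped = l1.pop(1)          # pop takes an index and returns the element
after_pop = list(l1)

l1.remove(3)                # remove takes a value; only the first 3 goes
after_remove = list(l1)

l1.clear()                  # the list still exists but is now empty
after_clear = list(l1)

del l1                      # the name l1 itself is gone after this
try:
    l1
    deleted = False
except NameError:
    deleted = True

print(popped, after_pop, after_remove, after_clear, deleted)
```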
SET AND SUBSET:
All elements of a subset must be present in the set
pop() – removes an arbitrary element and returns it (so it can be printed)
remove(20) – removes the element; raises KeyError if the element does not exist
discard(20) – removes the element; does not raise an error if the element does not exist
s1.intersection(s2) – returns a new set with only the common elements
s1.difference(s2) – common elements are removed; returns the elements remaining in the first set
s1.symmetric_difference(s2) – common elements are removed; returns the elements that are in s1 or s2 but not in both
s1.intersection_update(s2) – removes the unwanted (non-common) items from the original set
s1 – is the original set, so the unwanted items are removed from this set itself
s2.issubset(s1) - returns True if every element of s2 is present in s1
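The set methods listed above are sketched here on two small sets. The values and result names are made up for illustration.

```python
# Illustrative sketch; sample values and result names are assumptions.
s1 = {10, 20, 30, 40}
s2 = {30, 40, 50}
s3 = {10, 20}

common = s1.intersection(s2)                # elements in both sets
only_s1 = s1.difference(s2)                 # elements of s1 not in s2
not_shared = s1.symmetric_difference(s2)    # in s1 or s2, but not both
is_sub = s3.issubset(s1)                    # every element of s3 is in s1

s1.discard(99)                              # missing element: no error
try:
    s1.remove(99)                           # missing element: KeyError
    remove_error = None
except KeyError as err:
    remove_error = type(err).__name__

s1.intersection_update(s2)                  # changes s1 itself

print(common, only_s1, not_shared, is_sub, remove_error, s1)
```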
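RE:
import re
out=re.match("a","suganya")
match – checks only at the start of the string; returns a match object if it matches, otherwise None (here None, because "suganya" starts with s)
Eg:
text="hello world, hello world"
result=re.search("world",text)
search – searches the whole string and returns the first occurrence
re.findall("world",text)
findall – searches the entire string and returns all the occurrences as a list
"\d" – matches only a digit; \ is also used to escape a metacharacter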
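A short sketch of match, search and findall from the re module. The extra strings "room 42" and "a.b.c" are hypothetical inputs added to show \d and an escaped metacharacter.

```python
import re

# Illustrative sketch; "room 42" and "a.b.c" are hypothetical inputs.
text = "hello world, hello world"

at_start = re.match("world", text)       # None: text does not start with it
first_hit = re.search("world", text)     # match object for first occurrence
all_hits = re.findall("world", text)     # list of every occurrence

digits = re.findall(r"\d", "room 42")    # \d matches a single digit
dots = re.findall(r"\.", "a.b.c")        # \. matches a literal dot

print(at_start, first_hit.start(), all_hits, digits, dots)
```

Raw strings (the r prefix) are used for the patterns so that the backslash reaches the regular expression engine unchanged.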
NUMPY:
Array concept
Homogeneous (all elements have the same data type)
np – the usual alias for the library (predefined package)
You have to import it - import numpy as np
Eg:
a1=np.array([1,4,6])
print(a1)
Specifying the data type:
a1=np.array([3,4,5],dtype=float) --or dtype=complex
print(a1)
Functions:
np.arange(1,8,2) -- start 1, stop before 8, step 2 (like an increment)
[1 3 5 7]
np.zeros(5) -- 5 floating point zeros
[0. 0. 0. 0. 0.]
np.ones(4) -- 4 floating point ones
[1. 1. 1. 1.]
np.linspace(0,4,3) --creates 3 equally spaced values from 0 to 4
[0. 2. 4.]
np.random.rand(5) -- 5 random floating point numbers from 0 to 1 (uniform)
np.random.randn(5) -- 5 random floating point numbers from the standard normal distribution
np.random.randint(1,20,4) -- 4 random integers from 1 to 19 (the upper bound 20 is excluded)
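The array creation functions above can be checked with a sketch like this, assuming NumPy is installed. The variable names are illustrative, and the random results differ on every run, so only their ranges are described.

```python
import numpy as np

# Illustrative sketch; variable names are assumptions.
a1 = np.array([3, 4, 5], dtype=float)   # force a floating point array

stepped = np.arange(1, 8, 2)     # start 1, stop before 8, step 2
zeros = np.zeros(5)              # five 0.0 values
ones = np.ones(4)                # four 1.0 values
spaced = np.linspace(0, 4, 3)    # 3 equally spaced values from 0 to 4

uniform = np.random.rand(5)          # floats in the range [0, 1)
normal = np.random.randn(5)          # floats from a standard normal
ints = np.random.randint(1, 20, 4)   # 4 integers from 1 to 19

print(a1, stepped, zeros, ones, spaced)
print(uniform, normal, ints)
```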
Want to know the dimensions:
a2=np.array([[1,5,3,6],[3,5,8,4]])
print(a2.ndim) -- shows the number of dimensions (2)
print(a2.shape) -- shows the number of rows and columns in the array: (2, 4)
a2.shape=(4,2) --reshapes the array in place to 4 rows and 2 cols
print(a2)
print(a2.size) --number of elements in the array (8)
print(a2.dtype) -- displays the datatype, e.g. int32 or int64 depending on the platform
a2=a1[:3] -- first 3 rows (or the first 3 elements of a 1D array)
[:3,:3,:2] -- one slice per axis of a 3D array
Want diagonal elements - a1[[0,1,2,3],[0,1,2,3]] (a1 must be at least 4x4)
Boolean indexing - want to access only negative elements - a2=a2[a2<0]
print(a1*2) -- multiplies every element by 2
print(a1.sum()) --prints the total sum of elements
print(np.power(a1,a2)) --a1^a2, element by element
np.power(a1,2) -- squares every element (a1^2)
a1.max() --prints the highest value
a1.min() -- prints the minimum value
a1.max(axis=0) --max value of each column
a1.max(axis=1) -- max value of each row
print(a1+a2) -- adds the arrays element by element (a1*a2 multiplies them)
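A sketch of the shape, indexing and arithmetic notes above, assuming NumPy is installed. The small array m is a hypothetical example added to show boolean indexing and diagonal access.

```python
import numpy as np

# Illustrative sketch; the array m is a hypothetical example.
a2 = np.array([[1, 5, 3, 6], [3, 5, 8, 4]])

ndim = a2.ndim                 # number of dimensions
shape = a2.shape               # (rows, columns)
size = a2.size                 # total number of elements
reshaped = a2.reshape(4, 2)    # same data viewed as 4 rows, 2 columns

col_max = a2.max(axis=0)       # max of each column
row_max = a2.max(axis=1)       # max of each row
doubled = a2 * 2               # every element multiplied by 2
squared = np.power(a2, 2)      # every element raised to the power 2

m = np.array([[1, -2], [-3, 4]])
negatives = m[m < 0]           # boolean indexing
diagonal = m[[0, 1], [0, 1]]   # elements at (0,0) and (1,1)

print(ndim, shape, size)
print(col_max, row_max, negatives, diagonal)
```

reshape returns a new view and leaves a2 unchanged, unlike assigning to a2.shape, which changes the array in place.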
Strings can also be handled in numpy:
print(np.char.add(['a','b'],['c','d']))
output: ['ac' 'bd'] -- the first elements of both lists are joined, then the second elements
np.char.multiply('a',20)
aaaaaaaaaaaaaaaaaaaa
np.char.center('a',10,fillchar='*')
out: ****a*****
np.random.sample() -- generates a random number using the sample function
np.random.sample(size=(2,2)) -- 2 rows and 2 columns; here you specify only the size
random number – generated as a float from 0 to 1 (1 excluded)
print(np.argsort(a1)) --indices that would sort the array
print(np.argmax(a1)) --index of the highest value
print(np.sort(a1)) --sorted copy of the array
print(np.count_nonzero(a1)) --prints the count of non zero elements
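The sorting and counting functions above return different things (indices versus values), which a small sketch makes clearer. NumPy is assumed to be installed and the sample array is made up.

```python
import numpy as np

# Illustrative sketch; the sample array is an assumption.
a1 = np.array([30, 0, 10, 20])

order = np.argsort(a1)           # indices that would sort the array
sorted_copy = np.sort(a1)        # sorted values; a1 itself is unchanged
top_index = np.argmax(a1)        # index of the largest value
nonzero = np.count_nonzero(a1)   # how many elements are not 0

sample = np.random.sample(size=(2, 2))   # 2x2 floats in [0, 1)

print(order, sorted_copy, top_index, nonzero)
print(sample.shape)
```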
PANDAS:
Series – 1D
DataFrame – 2D
PYTHON:
A tuple inside a list is accessed using two indexes – [][] -> a separate pair of square brackets for each level
List
remove – we have to specify the element
pop() – pops the last element and returns it
discard – (sets only) no error if the element is missing
pop(1) – we can also specify the index
extend – extend([2,3]) adds multiple values to the list; append([2,3]) adds them as one nested list
Dictionary – to add, use the assignment operator (d[key]=value)
Set – use add
List – use append and extend
Tuple – can't use any of these
Dictionary:
Delete elements by
1. del dict[1]
2. dict.pop(3)
3. dict.clear()
Set:
Use remove and discard
List:
Use del ---- del list[1]
Tuple – does not allow this operation
Common prog:
Prime
Prime below 50
Average
Largest element
Second max
Largest list
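A few of the common programs listed above are sketched below under stated assumptions: the function names are hypothetical, "second max" is taken to mean the second largest distinct value, and "Largest list" is left out because the notes do not say what it refers to.

```python
# Illustrative sketch; function names are hypothetical and "second max"
# is assumed to mean the second largest distinct value.

def is_prime(n):
    if n < 2:
        return False
    # a divisor, if any, must appear at or below the square root of n
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def primes_below(limit):
    return [n for n in range(2, limit) if is_prime(n)]

def average(values):
    return sum(values) / len(values)

def second_max(values):
    distinct = sorted(set(values))   # drop duplicates, sort ascending
    return distinct[-2]

numbers = [10, 40, 30, 40]
print(primes_below(50))
print(average(numbers), max(numbers), second_max(numbers))
```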