NumPy in Python: Array Operations Using NumPy
Unit 3: Basics of NumPy
21BCA2T452: Python Programming
Prof. Sherin Rappai
Assistant Professor, Dept. of Computer Science
NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific computing. It
provides support for arrays (multi-dimensional, homogeneous data structures) and a wide range of
mathematical functions to perform vectorized computations efficiently.
Installing NumPy
Before using NumPy, you need to make sure it's installed. You can install it using pip:
pip install numpy
Importing NumPy
To use NumPy in your Python code, you should import it:
import numpy as np
By convention, it's common to import NumPy as np for brevity.
Why Use Arrays?
Arrays are more efficient than lists when performing operations. For example, to add 2 to every element of a list you would need a loop in plain Python, but with NumPy you can do this in a single line:
arr = np.array([1, 2, 3, 4, 5])
new_arr = arr + 2 # Adds 2 to every element in the array
print(new_arr)
Output: [3 4 5 6 7]
Creating NumPy Arrays
You can create NumPy arrays using various methods:
1. From Python Lists:
arr = np.array([1, 2, 3, 4, 5])
2. Using NumPy Functions:
zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1
3. Using NumPy's Range Function:
range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
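Because ndarrays are homogeneous, every array carries a single dtype. As a small sketch of how to set and inspect it (standard NumPy attributes, shown here for illustration):

arr = np.array([1, 2, 3], dtype=np.float64) # Explicitly request a float array
print(arr.dtype) # Output: float64
int_arr = arr.astype(np.int32) # Convert to another dtype (returns a new array)
print(int_arr) # Output: [1 2 3]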
BASIC ARRAY OPERATIONS
Once you have NumPy arrays, you can perform various operations on them:
1. Element-wise Operations:
NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication, and
division:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]
2. Indexing and Slicing:
Indexing means accessing a specific element in an array by its position (index). In NumPy,
indices start from 0.
arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])
Slicing: Slicing allows you to access a range or subset of elements from an array. It is done using the syntax arr[start:end], where start is the index where the slice begins (inclusive), and end is where it stops (exclusive).
arr = np.array([10, 20, 30, 40, 50])
# Getting a slice of elements from index 1 to 3 (exclusive of 3)
print(arr[1:3]) # Output: [20 30]
# Getting a slice from the start till the third element
print(arr[:3]) # Output: [10 20 30]
# Getting a slice from index 2 to the end of the array
print(arr[2:]) # Output: [30 40 50]
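One behavior worth knowing alongside slicing (standard NumPy semantics, not shown on the slide): basic slices are views onto the original array, not copies, so writing through a slice modifies the source. A quick sketch:

arr = np.array([10, 20, 30, 40, 50])
view = arr[1:4] # A view, not a copy
view[0] = 99 # Writes through to the original array
print(arr) # Output: [10 99 30 40 50]
safe = arr[1:4].copy() # Use .copy() when you need an independent array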
Negative Indexing:
You can also use negative indices to access elements from the end of the array. For example, -1 refers to
the last element, -2 refers to the second last element, and so on.
Example:
arr = np.array([10, 20, 30, 40, 50])
# Accessing the last element
print(arr[-1]) # Output: 50
# Accessing the second last element
print(arr[-2]) # Output: 40
Slicing with Steps: You can also specify a step value, which tells how many elements to skip in the slice. The syntax is arr[start:end:step].
Example:
arr = np.array([10, 20, 30, 40, 50, 60])
# Getting every second element from index 1 to 5
print(arr[1:5:2]) # Output: [20 40]
# Reversing the array using negative step
print(arr[::-1]) # Output: [60 50 40 30 20 10]
•The array is [10, 20, 30, 40, 50, 60].
•Index positions: [0, 1, 2, 3, 4, 5].
•The slice starts at index 1, which is 20.
•2 is the step value, which means "skip every second element."
•It skips the next element and picks the element at index 3, which is 40.
•The slice stops before reaching index 5.
3. Array Shape and Reshaping:
The shape of an array tells us how many elements it contains along each dimension (or axis).
You can check the shape of an array using the .shape attribute.
You can check and change the shape of NumPy arrays:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)
Reshaping:
Reshaping allows you to change the shape of an array without changing its data. You can convert a 1D array to a 2D array, or a 2D array to a 3D array, etc., as long as the total number of elements stays the same.
Example:
# Creating a 1D array with 6 elements
arr = np.array([1, 2, 3, 4, 5, 6])
# Reshaping the 1D array into a 2D array (2 rows, 3 columns)
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
Reshape Rules:
When reshaping an array, the new shape must contain the same total number of elements as the original array. For example, if you have an array with 12 elements, you could reshape it to: a 2x6 array (2 rows x 6 columns), a 3x4 array (3 rows x 4 columns), or a 4x3 array (4 rows x 3 columns).
Example
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Reshaping into 3 rows and 4 columns
reshaped_arr = arr.reshape(3, 4)
print(reshaped_arr)
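A related convenience (standard NumPy behavior): you can pass -1 for one dimension and NumPy infers it from the total element count.

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
auto_arr = arr.reshape(3, -1) # NumPy infers the second dimension (4)
print(auto_arr.shape) # Output: (3, 4)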
Flattening an Array: If you want to convert a multi-dimensional array back into a 1D array, you can flatten it using the .flatten() method.
Example
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Flattening the 2D array into a 1D array
flat_arr = arr_2d.flatten()
print(flat_arr)
Output: [1 2 3 4 5 6]
Shape: Tells you the dimensions of an array (rows, columns, etc.).
Reshaping: Lets you change the shape of an array while keeping the same number of elements.
Flattening: Converts a multi-dimensional array back into a 1D array.
4. Aggregation Functions:
Aggregation functions are used to perform calculations on an entire array or along a specific axis (e.g., summing all elements, finding the maximum, etc.). These functions are essential for data analysis and numerical computations.
Common Aggregation Functions:
Here are some of the most commonly used aggregation functions in NumPy:
1. Sum: The sum() function adds all the elements of an array.
2. Mean: The mean() function calculates the average of the elements.
3. Maximum and Minimum: max() gives the maximum value in the array; min() gives the minimum value in the array.
4. Product: The prod() function returns the product of all elements in the array (i.e., multiplies all elements together).
5. Standard Deviation and Variance: std() calculates the standard deviation (how spread out the numbers are); var() calculates the variance (the square of the standard deviation).
6. Cumulative Sum and Product: cumsum() gives the cumulative sum (the sum of the elements up to each index); cumprod() gives the cumulative product (the product of elements up to each index).
NumPy provides functions to compute statistics on arrays:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr) # Calculate the mean (average)
max_val = np.max(arr) # Find the maximum value
min_val = np.min(arr) # Find the minimum value
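To round out the list above, a short sketch of the other aggregation functions, including the axis argument for 2D arrays (outputs shown in comments):

arr = np.array([1, 2, 3, 4, 5])
print(np.prod(arr)) # 120
print(np.std(arr)) # ~1.414 (standard deviation)
print(np.cumsum(arr)) # [ 1 3 6 10 15]
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr2d, axis=0)) # Column sums: [5 7 9]
print(np.sum(arr2d, axis=1)) # Row sums: [ 6 15]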
VECTORIZED COMPUTATION
Vectorized computation in Python refers to performing operations on entire arrays or sequences of data without the need for explicit loops. This approach leverages highly optimized, low-level code to achieve faster and more efficient computations. The primary library for vectorized computation in Python is NumPy.
Traditional Loop-Based Computation
In traditional Python programming, you might use explicit loops to perform operations on arrays or lists.
For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# Result: [5, 7, 9]
Vectorized Computation with NumPy
NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's how you can
achieve the same result using NumPy:
import numpy as np
# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
# Result: array([5, 7, 9])
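To quantify the speedup, a minimal benchmark sketch using Python's timeit module (the array size is an arbitrary choice for illustration; absolute timings vary by machine):

import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)
loop_time = timeit.timeit(lambda: [x + 2 for x in py_list], number=10) # Pure-Python loop
vec_time = timeit.timeit(lambda: np_arr + 2, number=10) # Vectorized NumPy
print(f"loop: {loop_time:.3f}s, numpy: {vec_time:.3f}s") # NumPy is typically 10-100x faster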
INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: the DataFrame and the Series. These data structures are designed to handle structured data, making it easier to work with datasets in a tabular format.
DataFrame:
A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
It consists of rows and columns, where each column can have a different data type (e.g., integers, floats, strings, or
even custom data types).
You can think of a DataFrame as a collection of Series objects, where each Series is a column.
DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data cleaning,
exploration, and transformation.
Here's a basic example of how to create a DataFrame using Pandas:
import pandas as pd
# Creating a DataFrame from a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
Series:
A Series is a one-dimensional labeled array that can hold data of any data type.
It is like a column in a DataFrame or a single variable in statistics.
Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
Homogeneous Data: Like NumPy arrays (and unlike plain Python lists), a Pandas Series stores its data under a single dtype. For example, if you create a Series with integer values, the Series will have an integer dtype; mixed inputs fall back to the generic object dtype.
Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or names for each data point in the Series. By default, Series have a numeric index starting from 0, but you can specify custom labels if needed.
Size and Shape: A Series has a size (the number of elements) and shape (1-dimensional) but does not have columns or rows like a DataFrame.
import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Display the Series
print(series)
0 10
1 20
2 30
3 40
4 50
dtype: int64
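Since the index labels can be customized, here is a small sketch of a Series with user-supplied labels (the labels are illustrative):

marks = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(marks['b']) # Output: 20 (label-based access)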
Some common tasks you can perform with Pandas:
Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL databases,
and more.
Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and
transforming data types.
Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data based on specific criteria.
Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and Seaborn to
create informative plots and charts.
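As a minimal sketch tying these tasks together (the file name data.csv and the column names Name, Age, and City are hypothetical):

import pandas as pd

df = pd.read_csv('data.csv') # Data loading (hypothetical file)
df = df.dropna().drop_duplicates() # Data cleaning
subset = df[['Name', 'Age']] # Data selection (assumed columns)
avg_age = df.groupby('City')['Age'].mean() # Data aggregation
df['Age'].plot(kind='hist') # Data visualization (requires Matplotlib)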
DataFrame
A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous tabular data
structure provided by the popular library called Pandas. It is a fundamental data structure for data manipulation and
analysis in Python.
Here's how you can work with DataFrames in Python using Pandas:
1. Import Pandas:
First, you need to import the Pandas library.
import pandas as pd
2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few
common methods:
From a dictionary:
data = {'Column1': [value1, value2, ...],
'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)
• From a list of lists:
data = [[value1, value2],
[value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
• From a CSV file:
df = pd.read_csv('file.csv')
3. Viewing Data:
You can use various methods to view and explore your DataFrame:
df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
df = pd.concat([df, pd.DataFrame([{'Column1': value1, 'Column2': value2}])], ignore_index=True) # Append a new row (DataFrame.append was removed in pandas 2.0)
6. Data Analysis:
Pandas provides various functions for data analysis, such as
describe(), groupby(), agg(), and more.
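A brief illustration of these helpers (the City and Age columns are assumptions for the example):

df = pd.DataFrame({'City': ['NY', 'NY', 'LA'], 'Age': [25, 30, 35]})
print(df.describe()) # Summary statistics for numeric columns
print(df.groupby('City')['Age'].mean()) # Mean Age per City
print(df.groupby('City').agg({'Age': ['min', 'max']})) # Several aggregates at once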
7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
INDEX OBJECTS: INDEXING, SELECTION, AND FILTERING
In Pandas, the Index object is a fundamental component of both Series and DataFrame data structures. It provides the labels or names for the rows or columns of your data. You can use indexing, selection, and filtering techniques with these indexes to access specific data points or subsets of your data. Here's how you can work with index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels. You can use .loc[] for label-based indexing and .iloc[] for integer-based indexing.
• Label-based indexing:
df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name
EXAMPLE
import pandas as pd
# Create a DataFrame with custom labels
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data, index=['A1', 'B2', 'C3'])
# Access the row with label 'B2'
print(df.loc['B2'])
# Access the value in the row with label 'B2' and the column 'City'
print(df.loc['B2', 'City'])
• Integer-based indexing:
df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index
2. Selection:
Selection refers to choosing specific columns or rows from a DataFrame based on their labels or positions. You use selection when you want to extract specific columns or rows without applying any condition. It's about choosing specific data (columns/rows) directly; no conditional logic is applied.
df['Column1'] # Select 'Column1' from the DataFrame
df[['Column1', 'Column2']] # Select 'Column1' and 'Column2'
df.loc[0] # Select the first row by index label
df.iloc[2] # Select the third row by integer position
3. Filtering:
You can use various methods to select specific data based on conditions or criteria.
• Select rows based on a condition:
df[df['Column'] > 5] # Select rows where 'Column' is greater than 5
• Select rows by multiple conditions:
df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
Filtering allows you to create a boolean mask based on a condition and then apply that mask to your DataFrame to
select rows meeting the condition.
Create a boolean mask:
condition = df['Column'] > 5
Apply the mask to the DataFrame:
filtered_df = df[condition]
4. Setting a New Index:
You can set a specific column as the index of your DataFrame using the .set_index() method.
df.set_index('Column_Name', inplace=True)
5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you can use the .reset_index()
method.
df.reset_index(inplace=True)
6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data
structures.
df.set_index(['Index1', 'Index2'], inplace=True)
Index objects in Pandas are versatile and powerful for working with data because they enable you to
access and manipulate your data in various ways, whether it's for data retrieval, filtering, or
restructuring.
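To make multi-level indexing (section 6 above) concrete, a small sketch with illustrative column names:

df = pd.DataFrame({'Index1': ['A', 'A', 'B'], 'Index2': [1, 2, 1], 'Value': [10, 20, 30]})
df.set_index(['Index1', 'Index2'], inplace=True)
print(df.loc[('A', 2), 'Value']) # Output: 20 (tuple-based lookup on the MultiIndex)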
ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and
DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of the objects
involved in the operation, which ensures that the result of the operation maintains data integrity and is aligned correctly.
Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two Series or DataFrames, Pandas aligns the data based on their labels (index or column names) and performs the operation only on matching labels.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
print(result)
# A    NaN
# B    6.0
# C    8.0
# D    NaN
# dtype: float64
In this example, the result Series has NaN values for the 'A' and 'D' labels because those labels don't match between series1 and series2.
2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The alignment occurs both for rows (based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
In this case, result will have NaN values in columns 'A' and 'C' (each exists in only one DataFrame), and only the shared row 'Y' gets a computed value in column 'B'.
4. Handling Missing Data:
You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or columns
with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows or columns with NaN values
5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to match the
shape of the Series.
series = pd.Series([1, 2, 3])
scalar = 2
result = series * scalar
In this example, result will be a Series with values [2, 4, 6].
Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work with
datasets of different shapes without needing to manually align them. It ensures that operations are performed in a way
that maintains the integrity and structure of your data.
ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy, like Pandas, performs arithmetic and data alignment when working with arrays. However, unlike Pandas, NumPy
is primarily focused on numerical computations with homogeneous arrays (arrays of the same data type). Here's how
arithmetic and data alignment work in NumPy:
Automatic Alignment:
NumPy arrays perform element-wise operations, and they automatically align data based on the shape of the arrays being operated on. This means that if you perform an operation between two NumPy arrays of different shapes, NumPy will broadcast the smaller array to match the shape of the larger one, element-wise.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4])
result = arr1 + arr2
In this example, NumPy will automatically broadcast arr2 to match the shape of arr1, resulting in [5, 6, 7].
Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:
If the arrays have a different number of dimensions, pad the smaller shape with 1s on the left side.
For example:
•Shape (3, 5) and shape (5) become (3, 5) and (1, 5). NumPy adds a 1 on the left to make both arrays 2D.
Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them is 1, they are
compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
Shape (3, 5) and (1, 5): The second dimensions (5 and 5) are the same, and the first dimensions (3 and 1) are compatible because 1 can be stretched to 3.
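A quick sketch of these rules in action:

a = np.ones((3, 5))
b = np.arange(5) # Shape (5,) is padded to (1, 5), then stretched to (3, 5)
print((a + b).shape) # Output: (3, 5)
c = np.ones((3, 4))
# a + c would raise: ValueError: operands could not be broadcast together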
Handling Missing Data:
NumPy has no label-based missing-data handling like Pandas (float arrays can store np.nan, but nothing is filled in automatically). If you perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on whether broadcasting is possible.
Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting array is the result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2
APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques, including
vectorized functions, np.apply_along_axis(), and the np.vectorize() function. Additionally, you can use the np.vectorize()
function for mapping operations. Here's an overview of these approaches:
Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire arrays or elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be applied element-wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element
In this example, the np.square() function is applied element-wise to the arr array.
36. SHERIN RAPPAI
‘np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional array.This
is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr) # [ 6 15]
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array, [6, 15].
37. SHERIN RAPPAI
np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be applied
element-wise to NumPy arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Apply the vectorized function to the array
result = vectorized_func(arr) # [2 4 6 8]
This approach is useful when you have a custom function that you want to apply to an array.
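Worth noting: np.vectorize() is implemented as a Python-level loop, so it offers convenience rather than speed. For a
simple function like this one, plain array arithmetic gives the same result and is genuinely vectorized:
result = arr * 2 # same output as vectorized_func(arr): [2 4 6 8]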
38. SHERIN RAPPAI
Mapping with np.vectorize():
Mapping a function to each element of an array uses exactly the pattern shown above: the function returned by
np.vectorize() maps my_function over every element of arr, producing [2, 4, 6, 8]. This is similar to applying a
function element-wise but can also be used for more complex mapping operations.
These methods allow you to apply functions and perform mapping operations efficiently on NumPy
arrays, making it a powerful library for numerical and scientific computing tasks.
39. SHERIN RAPPAI
SORTING AND RANKING
Sorting and ranking are common data manipulation operations in data analysis and are widely supported in Python
through libraries like NumPy and Pandas. These operations help organize data in a desired order or rank elements
based on specific criteria. Here's how to perform sorting and ranking in both libraries:
Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.
np.sort(): This function returns a new sorted array without modifying the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
sorted_arr = np.sort(arr) # [1 1 2 3 3 4 5 5 6 9]
40. SHERIN RAPPAI
np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the original
array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
print("Indices of sorted array:", indices)
sorted_arr = arr[indices]
print("Sorted array:", sorted_arr)
Output:
Indices of sorted array: [1 3 6 0 9 2 4 8 7 5]
Sorted array: [1 1 2 3 3 4 5 5 6 9]
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s) to sort by
and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
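sort_values() also accepts a list of columns with per-column sort orders; a quick sketch reusing the df above:
# Sort by 'Age' descending, breaking ties by 'Name' ascending
sorted_df2 = df.sort_values(by=['Age', 'Name'], ascending=[False, True])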
41. SHERIN RAPPAI
Ranking in NumPy:
NumPy doesn't have a built-in ranking function, but you can use np.argsort() to get the ranking of elements. You can
then use these rankings to create a ranked array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
# ranked_arr: [ 4 1 6 2 7 10 3 9 8 5]; note that tied values receive distinct ranks
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle ties (e.g.,
assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average') # [3.0, 1.5, 4.0, 1.5]
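Other tie-handling options of rank() include 'min' and 'dense'; a short sketch on the same column:
df['RankMin'] = df['Age'].rank(ascending=False, method='min') # ties share the lowest rank: [3.0, 1.0, 4.0, 1.0]
df['RankDense'] = df['Age'].rank(ascending=False, method='dense') # like 'min' but with no gaps: [2.0, 1.0, 3.0, 1.0]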
42. SHERIN RAPPAI
SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.
import numpy as np
data = np.array([25, 30, 22, 35, 28])
mean = np.mean(data) # 28.0
median = np.median(data) # 28.0
std_dev = np.std(data) # ~4.43 (population std, ddof=0 by default)
variance = np.var(data) # 19.6
43. SHERIN RAPPAI
2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25) # 25.0
percentile_75 = np.percentile(data, 75) # 30.0
3. Correlation and Covariance:
You can compute correlation and covariance between arrays using np.corrcoef() and np.cov(). Here, data1 and data2
stand for two 1D arrays of equal length:
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)
44. SHERIN RAPPAI
CORRELATION AND COVARIANCE
In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov() functions,
respectively. These functions are useful for analyzing relationships and dependencies between variables. Here's how to
use them:
Computing Correlation Coefficient (Correlation):
The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges
from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
45. SHERIN RAPPAI
# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)
# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will be 1.0, since y = x + 1 is a perfectly linear relationship.
46. SHERIN RAPPAI
Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values indicate a positive relationship
(both variables increase or decrease together), while negative values indicate an inverse relationship (one variable
increases as the other decreases).
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)
# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will be 2.5 (np.cov() uses the unbiased estimator, ddof=1, by default).
Both np.corrcoef() and np.cov() can accept multiple arrays as input, allowing you to compute correlations and
covariances for multiple variables simultaneously. For example, if you have a dataset with multiple columns, you can
compute the correlation matrix or covariance matrix for all pairs of variables.
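A sketch of the multi-variable case (the data values here are made up): with one row per variable, both functions
return the full pairwise matrix.
import numpy as np
data = np.array([[1, 2, 3, 4, 5],
                 [2, 3, 4, 5, 6],
                 [5, 3, 4, 1, 2]])
corr = np.corrcoef(data) # 3x3 matrix: correlation of every pair of rows
cov = np.cov(data) # 3x3 covariance matrix
print(corr.shape, cov.shape) # (3, 3) (3, 3)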
47. SHERIN RAPPAI
HANDLING MISSING DATA
Handling missing data in NumPy is an important aspect of data analysis and manipulation. NumPy provides several ways
to work with missing or undefined values, typically represented as NaN (Not-a-Number). Here are some common
techniques for handling missing data in NumPy:
Using np.nan: NumPy represents missing data using np.nan. You can create arrays with missing values like this:
import numpy as np
arr = np.array([1.0, 2.0, np.nan, 4.0])
Now, arr contains a missing value represented as np.nan.
48. SHERIN RAPPAI
Checking for Missing Data: You can check for missing values using the np.isnan() function. For example:
np.isnan(arr) # Returns a boolean array indicating which elements are NaN.
Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing. For example:
arr[~np.isnan(arr)] # Returns an array without NaN values.
Replacing Missing Data: You can replace missing values with a specific value using np.nan_to_num() or boolean
indexing; to fill with a statistic instead, compute it with a NaN-aware function such as np.nanmean(). For example:
arr[np.isnan(arr)] = 0 # Replace NaN with 0 (np.nan_to_num(arr) also maps NaN to 0.0)
Or, to replace NaN with the mean of the non-missing values:
mean = np.nanmean(arr)
arr[np.isnan(arr)] = mean
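An equivalent, non-mutating version uses np.where() to pick values element by element:
cleaned = np.where(np.isnan(arr), mean, arr) # NaN -> mean, everything else unchanged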
49. SHERIN RAPPAI
Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing values. You can use
functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore NaN values when computing the result.
Interpolation: If you have a time series or ordered data, you can use interpolation methods to fill missing values.
NumPy provides functions like np.interp() for this purpose.
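A minimal sketch of filling NaNs by linear interpolation with np.interp(), assuming evenly spaced positions:
import numpy as np
arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
x = np.arange(arr.size) # positions (e.g., time steps)
mask = np.isnan(arr)
arr[mask] = np.interp(x[mask], x[~mask], arr[~mask]) # interpolate the missing points
print(arr) # [1. 2. 3. 4. 5.]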
Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with missing data more
explicitly by creating a mask that specifies which values are missing. This can be useful for certain computations.
import numpy as np
import numpy.ma as ma
arr = np.array([1, 2, np.nan, 4])
masked_arr = ma.masked_array(arr, np.isnan(arr)) # Mask NaN values
mean_val = masked_arr.mean() # Mean ignoring NaNs: 2.333..., the mean of [1, 2, 4]
Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional arrays, you can apply
the above techniques along a specific axis, or use NaN-aware reductions such as np.nanmean() with the axis parameter
to handle missing data along specific dimensions.
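For example, a sketch that fills each NaN in a 2D array with its column mean:
import numpy as np
m = np.array([[1.0, np.nan], [3.0, 4.0]])
col_means = np.nanmean(m, axis=0) # [2. 4.]: per-column means, ignoring NaNs
rows, cols = np.where(np.isnan(m))
m[rows, cols] = col_means[cols] # replace each NaN with its column mean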
Keep in mind that the specific method you choose to handle missing data depends on your data
analysis goals and the context of your data. Some methods may be more appropriate than others,
depending on your use case.
50. SHERIN RAPPAI
HIERARCHICAL INDEXING
Hierarchical indexing, often referred to as "MultiIndexing", allows you to work with data where each axis has multiple
levels or labels. This is particularly useful when you want to represent higher-dimensional data with more complex
hierarchical structures. NumPy itself only offers positional integer indexing; hierarchical indexing is provided by
pandas, built on top of NumPy arrays, via the pandas.MultiIndex class. Here's a basic example:
import numpy as np
import pandas as pd # Import pandas
# Create a MultiIndex with two levels
index = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']],
names=['Level1', 'Level2', 'Level3'])
# Create a random data array
data = np.random.rand(4, 3)
# Create a DataFrame with MultiIndex
df = pd.DataFrame(data, index=index, columns=['Value1', 'Value2', 'Value3'])
print(df)
Value1 Value2 Value3
Level1 Level2 Level3
A 1 X 0.654321 0.123456 0.987654
2 Y 0.234567 0.345678 0.456789
B 1 X 0.987654 0.876543 0.765432
2 Y 0.123456 0.234567 0.345678
(Your numbers will differ, since the data is random.)
51. SHERIN RAPPAI
In this example, we've created a MultiIndex with three levels: 'A'/'B' as the first level, 1/2 as the second, and 'X'/'Y'
as the third. Then, we've created a DataFrame with this MultiIndex and some random data.
You can access data from this DataFrame using hierarchical indexing. For example:
# Accessing data using hierarchical indexing
value_A1_X = df.loc[('A', 1, 'X')]['Value1'] # Access Value1 for ('A', 1, 'X')
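Partial indexing by only the outer level(s) also works; a quick sketch on the same df:
print(df.loc['A']) # all rows whose Level1 label is 'A'
print(df.loc[('A', 1)]) # rows with Level1='A' and Level2=1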
52. SHERIN RAPPAI
Some common operations with hierarchical indexing include:
Slicing: You can perform slices at each level of the index, allowing you to select specific subsets of the data.
Stacking and Unstacking:
Stacking: Converts columns into a new level of the index.
Unstacking: Moves one level of the index back into columns.
Swapping Levels: You can swap levels to change the order of the levels in the index. For example, for a DataFrame
indexed by 'Letter' and 'Number' levels (a different df than the one above; its output appears after this list):
# Swap 'Letter' and 'Number' levels
print(df.swaplevel('Letter', 'Number'))
Grouping and Aggregating: You can group data based on levels of the index and perform aggregation functions like
mean, sum, etc.
Reordering Levels: You can change the order of levels in the index.
Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
Output of the swaplevel() example above:
Value1 Value2
Number Letter
1 A 10 100
2 A 20 200
1 B 30 300
2 B 40 400
53. SHERIN RAPPAI
Hierarchical indexing is especially valuable when dealing with multi-dimensional data, such as panel
data or data with multiple categorical variables. It allows for more expressive data organization and
manipulation. The pd.MultiIndex class provides many further methods for creating and manipulating
MultiIndex objects beyond what is shown here.
Editor's Notes
#9: By using :: and specifying -1 for the step, you're telling Python to start from the end of the array and move backward (step size of -1).
This effectively reverses the array.
#16: Versatility: DataFrames are incredibly powerful and versatile. They can handle various tasks, such as:
Data cleaning: Fixing or removing incorrect, incomplete, or duplicate data.
Exploration: Summarizing data, performing statistical calculations, and visualizing trends.
Transformation: Applying functions, aggregations, and pivoting data for further analysis
#18: Examples of time series: stock prices recorded every minute, daily temperatures over a year, monthly sales revenue of a company, heartbeat measurements from a fitness tracker over time.
Size:
Series: The size refers to the total number of elements in the Series, similar to the length of a list. It counts how many data points the Series holds.
Shape
A Series is always 1-dimensional, which means it only has a single axis (the values), even if it looks like a column of data. The shape of a Series will be (n,), where n is the number of elements.
#20: Data Loading:
pd.read_csv('file.csv')
pd.read_excel('file.xlsx')
pd.read_sql_query()
Data Cleaning:
Handling missing values: Replace or remove missing data using fillna() or dropna().
Removing duplicates: Identify and remove duplicate rows with drop_duplicates().
Transforming data types: Convert data types using astype().
Data Selection:
Pandas provides multiple ways to select specific rows and columns:
Select columns using df['column_name'] or df[['col1', 'col2']].
Data Aggregation:
You can group data by specific criteria and perform aggregations such as sum, mean, count, etc. The groupby() function is used to split the data into groups before applying an aggregation.
Common aggregation methods include sum(), mean(), count(), max(), and min()
Data Visualization:
While Pandas itself has basic plotting capabilities (df.plot()), it is commonly used alongside libraries like Matplotlib and Seaborn to create more sophisticated plots and charts.
#23: This creates a Boolean Series (a sequence of True or False values), where each value corresponds to whether the condition 'Column1' > 5 is satisfied for each row in the DataFrame df. For example, if df['Column1'] contains values [3, 7, 1, 9], the condition > 5 is applied to each element, resulting in the Boolean Series [False, True, False, True].
This line filters the rows of the DataFrame based on a condition. df['Column1'] > 5 returns a boolean Series where True represents the rows where 'Column1' has values greater than 5, and False represents the rows where it doesn't. df[...] returns the subset of rows where the condition is True. The result will be a pandas DataFrame with only the rows that meet the condition.
df.at[index, 'Column1'] = new_value - Update a specific value:
The .at[] method is used to access and update a specific cell in the DataFrame.
index refers to the row index, and 'Column1' refers to the column.
The value at the intersection of the given row and column is updated to new_value.
df = df.append({'Column1': value1, 'Column2': value2}, ignore_index=True) - Append a new row (note: DataFrame.append was removed in pandas 2.0; pd.concat is the modern replacement):
This appends a new row to the DataFrame df.
A dictionary {'Column1': value1, 'Column2': value2} defines the data for the new row.
ignore_index=True resets the index of the new row, ensuring it gets added with a new sequential index rather than trying to maintain the original index.
The result is a new DataFrame with the appended row.
#24: In pandas, the index=False argument is used to exclude the DataFrame's index (row labels) when saving it to a file (e.g., CSV, Excel, etc.).
In pandas, the describe() function generates descriptive statistics for the DataFrame or a specific column. It provides a quick summary of the central tendency, dispersion, and shape of a dataset’s distribution, including:
count: The number of non-null entries.
mean: The average of the values.
std: The standard deviation, a measure of the spread of the data.
min: The minimum value.
25%: The 25th percentile (first quartile).
50%: The 50th percentile (median).
75%: The 75th percentile (third quartile).
max: The maximum value.
#27: import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data)
# Access the first row
print(df.iloc[0])
Name Alice
Age 25
City NY
Name: 0, dtype: object
#28: Purpose:
It sets the values of the specified column ('Column_Name') as the new index of the DataFrame df.
This operation replaces the default integer-based index (0, 1, 2, …) with the values from 'Column_Name'.
inplace=True: This means the operation will be performed in-place, meaning the DataFrame df will be modified directly without needing to assign it to a new variable. If this parameter were set to False, the original DataFrame would remain unchanged, and a new DataFrame with the new index would be returned.
#29: import pandas as pd
# Create a sample DataFrame
data = {'Country': ['USA', 'USA', 'Canada', 'Canada'],
'State': ['New York', 'California', 'Ontario', 'Quebec'],
'Population': [19.45, 39.51, 14.57, 8.43]}
df = pd.DataFrame(data)
# Set 'Country' and 'State' as a multi-level index
df.set_index(['Country', 'State'], inplace=True)
print(df)
inplace=True:
This parameter ensures that the operation is performed on the original DataFrame itself, rather than returning a new DataFrame with the reset index. If inplace=False (the default), the method returns a new DataFrame and does not modify the original one.
#34: In NumPy, broadcasting is a powerful feature that allows operations on arrays of different shapes, as long as they are compatible based on specific rules. Broadcasting makes it easy to perform element-wise operations without having to reshape or replicate arrays manually.
After aligning the shapes by padding, NumPy compares the dimensions one by one from right to left. For each dimension:
If the dimensions are the same size, they are compatible. If one of the dimensions is 1, it is "stretched" to match the other dimension. If they are neither the same nor 1, the shapes are incompatible, and broadcasting fails.
#36: If you wanted to apply the function along the columns (i.e., column-wise), you would set axis=0
The first row [1, 2, 3] is passed to sum_of_row, which returns 1 + 2 + 3 = 6. The second row [4, 5, 6] is passed to sum_of_row, which returns 4 + 5 + 6 = 15.
The first arr is the parameter name that apply_along_axis expects, telling it which array to process. The second arr is the variable name you defined earlier in your code (the actual array [[1, 2, 3], [4, 5, 6]]).
#37: Element-wise Operations:
Vectorized functions apply an operation to each element of an array simultaneously, enabling batch processing rather than one-at-a-time processing.
np.vectorize() creates a new function vectorized_func that applies my_function element-wise to a NumPy array.
The vectorized function is applied to each element of arr: my_function(1) returns 1 * 2 = 2
my_function(2) returns 2 * 2 = 4
my_function(3) returns 3 * 2 = 6
my_function(4) returns 4 * 2 = 8
#40: The np.argsort() function in NumPy returns the indices that would sort an array. Unlike np.sort(), which returns the sorted array itself, np.argsort() provides the indices of the sorted elements. This can be particularly useful when you want to keep track of the original order of the elements after sorting.
Use ascending=False for descending order.
#41: This line ranks the entries in the 'Age' column:
ascending=False: This means that higher ages receive a higher rank.
method='average': This specifies that if there are ties in the ranks (like Bob and David both being age 30), they will be assigned the average of their ranks.
#43: Percentiles are values below which a given percentage of observations in a group of observations falls. For example, the 25th percentile (also known as the first quartile) is the value below which 25% of the data points lie.
Quartiles divide the data into four equal parts:
Q1 (First Quartile): 25th percentile
Q2 (Second Quartile): 50th percentile (median)
Q3 (Third Quartile): 75th percentile
Correlation measures the strength and direction of a linear relationship between two variables. It ranges from -1 to 1:
A correlation of 1 indicates a perfect positive linear relationship.
A correlation of -1 indicates a perfect negative linear relationship.
A correlation of 0 indicates no linear relationship.
Covariance measures the degree to which two variables change together.
#49: Interpolation is a technique for estimating unknown values in a sequence based on surrounding data. This is useful in time series or other ordered data when you have gaps or missing values.
np.interp(): It linearly interpolates between points to fill missing values. However, it requires you to provide the x-values (indices or times) and corresponding y-values (data) to interpolate.
3. Masked Arrays
NumPy provides the numpy.ma module for creating masked arrays. This allows you to explicitly handle missing or invalid data by "masking" certain values.
Masked Arrays (numpy.ma): Let you mask specific values (e.g., NaN), so they are ignored during calculations.
#50: np.vstack(): Vertically stacks arrays, so now you're adding a third level: ['X', 'Y', 'X', 'Y']. This adds another layer of labeling.
.T (Transpose): The T transposes the array so that each "row" is now a tuple of 3 labels (outer, inner, and sub-level).
MultiIndexing or hierarchical indexing allows you to represent higher-dimensional data in a structured way by breaking down the indices into multiple levels. In this example, you created a MultiIndex with 3 levels and then created a pandas DataFrame with random data, indexed by this MultiIndex. This is useful for organizing complex data where each observation belongs to multiple categories, making it easier to analyze and manipulate.
#52: Hierarchical indexing in pandas (also known as MultiIndexing) enables you to work with multi-level indexed data efficiently. With it, you can perform various operations that provide more flexibility when working with complex datasets. Here’s a brief explanation of common operations associated with hierarchical indexing:
1. Slicing:
You can slice the data at different levels of the index to retrieve specific subsets.
df.loc['A'] # Slicing by the first level 'A'
df.loc[('A', 1)] # Slicing by both the first and second levels
This helps in isolating parts of the data easily, depending on which levels you want to focus on.
2. Stacking and Unstacking:
Stacking turns columns into rows (long format), while unstacking moves rows into columns (wide format). This is useful for reshaping data.
df_stacked = df.stack() # Stack columns into rows
df_unstacked = df.unstack() # Unstack rows into columns
Stacking makes the DataFrame more compact (often useful in time series data).
Unstacking can help in widening the data for better readability.
3. Swapping Levels:
You can swap the levels in the index to reorder them or change the hierarchy.
df_swapped = df.swaplevel(0, 1) # Swap the first and second levels
This changes how pandas interprets your hierarchical structure, which can affect operations like slicing.
4. Grouping and Aggregating:
You can group data based on levels of the index and then apply aggregation functions like mean, sum, etc.
df.groupby(level=0).sum() # Group by the first level and sum the values
df.groupby(level=[0, 1]).mean() # Group by the first two levels and calculate the mean
This is useful for summarizing data across categories (levels).
5. Reordering Levels:
You can change the order of the levels in the hierarchical index.
df_reordered = df.reorder_levels([2, 0, 1]) # Reorder the levels (third, first, second)
Changing the level order can be useful for making slicing or other operations more intuitive based on your analysis needs.
6. Resetting Index:
You can "flatten" the hierarchical index and turn it back into regular columns.
df_reset = df.reset_index() # Convert the multi-index back to columns
Resetting the index is helpful when you no longer need the hierarchical structure and prefer to work with simple columns.
Swapping refers to exchanging the positions of two specific levels of a MultiIndex. This operation changes the order of the two levels you specify but leaves the other levels in their original order.
Use case: When you want to switch the positions of exactly two levels in the index.