Effective Numerical Computation in NumPy and SciPy 
Kimikazu Kato 
PyCon JP 2014 
September 13, 2014 
1 / 35
About Myself 
Kimikazu Kato 
Chief Scientist at Silver Egg Technology Co., Ltd.
Ph.D. in Computer Science
Background in Mathematics, Numerical Computation, Algorithms, etc.
Less than 2 years of experience in Python
More than 10 years of experience in numerical computation
Now designing algorithms for recommendation systems, and doing research
on machine learning and data analysis.
2 / 35
This talk... 
is about effective usage of NumPy/SciPy
is NOT an exhaustive introduction to their capabilities, but shows some case
studies based on my experience and interests
3 / 35
Table of Contents 
Introduction 
Basics about NumPy 
Broadcasting 
Indexing 
Sparse matrix 
Usage of scipy.sparse 
Internal structure 
Case studies 
Conclusion 
4 / 35
Numerical Computation 
Differential equations 
Simulations 
Signal processing 
Machine Learning 
etc... 
Why Numerical Computation in Python? 
Productivity 
Easy to write 
Easy to debug 
Connectivity with visualization tools 
Matplotlib 
IPython 
Connectivity with web systems
Many frameworks (Django, Pyramid, Flask, Bottle, etc.) 
5 / 35
But Python is Very Slow! 
Code in C
#include <stdio.h>
int main() {
    int i; double s=0;
    for (i=1; i<=100000000; i++) s+=i;
    printf("%.0f\n",s);
    return 0;
}
Code in Python
s=0.
for i in xrange(1,100000001):
    s+=i
print s
Both programs compute the sum of the integers from 1 to 100,000,000.
Benchmark results in one particular environment:
C: 0.109 sec (compiled with the -O3 option)
Python: 8.657 sec
(about 80 times slower!!)
6 / 35
Better code 
import numpy as np 
a=np.arange(1,100000001) 
print a.sum() 
Now it takes 0.188 sec (measured with the Linux "time" command, loading time
included).
Still slower than C, but fast enough for a scripting language.
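As a sanity check that the loop and the NumPy version compute the same thing, here is a minimal sketch written for Python 3 (range instead of xrange) with a smaller n so it runs quickly. The function names loop_sum and numpy_sum are hypothetical; both results are compared with the closed form n(n+1)/2.

```python
import numpy as np

def loop_sum(n):
    # Plain Python loop: every iteration runs in the interpreter
    s = 0
    for i in range(1, n + 1):
        s += i
    return s

def numpy_sum(n):
    # One call into compiled code; dtype=np.int64 avoids overflow
    # on platforms where the default integer type is 32-bit
    return int(np.arange(1, n + 1, dtype=np.int64).sum())

n = 1_000_000
closed_form = n * (n + 1) // 2
print(loop_sum(n) == numpy_sum(n) == closed_form)  # True
```

Timing the two functions (for example with timeit) shows the gap described above; the exact ratio depends on the machine.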
7 / 35
Lessons 
Python is very slow when written badly
Translating C (or Java, C#, etc.) code directly into Python is often a bad idea.
Python-friendly rewriting sometimes results in a drastic performance
improvement
8 / 35
Basic rules for better performance 
Avoid for-loops as far as possible
Utilize libraries' capabilities instead
Forget about the cost of copying memory
A typical C programmer might care about it, but ...
9 / 35
Basic techniques for NumPy 
Broadcasting 
Indexing 
10 / 35
Broadcasting 
>>> import numpy as np 
>>> a=np.array([0,1,2]) 
>>> a*3 
array([0, 3, 6]) 
>>> b=np.array([1,4,9]) 
>>> np.sqrt(b) 
array([ 1., 2., 3.]) 
A function that is applied to each element when called on an array is called
a universal function (ufunc).
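To make the definition concrete, here is a small sketch (Python 3) comparing a ufunc call with the equivalent element-by-element loop using math.sqrt from the standard library. The results agree; the difference is that the ufunc does the looping in compiled code. It also shows that the * operator on an array dispatches to the ufunc np.multiply.

```python
import math
import numpy as np

b = np.array([1., 4., 9.])

# ufunc: applied to every element in one call
via_ufunc = np.sqrt(b)

# Equivalent explicit loop over the elements (what we want to avoid)
via_loop = np.array([math.sqrt(x) for x in b])

print(np.array_equal(via_ufunc, via_loop))  # True
print(isinstance(np.sqrt, np.ufunc))        # True

# a*3 uses the binary ufunc np.multiply, with 3 broadcast to every element
a = np.array([0, 1, 2])
print(np.array_equal(a * 3, np.multiply(a, 3)))  # True
```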
11 / 35
Broadcasting (2D) 
>>> import numpy as np 
>>> a=np.arange(9).reshape((3,3)) 
>>> b=np.array([1,2,3]) 
>>> a 
array([[0, 1, 2], 
[3, 4, 5], 
[6, 7, 8]]) 
>>> b 
array([1, 2, 3]) 
>>> a*b 
array([[ 0, 2, 6], 
[ 3, 8, 15], 
[ 6, 14, 24]]) 
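In the example above, b of shape (3,) is treated as a row and stretched over every row of a, so column j of a is multiplied by b[j]. A short sketch (Python 3, same a and b) showing that this equals the explicit row form, and how np.newaxis turns b into a (3, 1) column when we want to scale rows instead:

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
b = np.array([1, 2, 3])

# b is broadcast as a row: column j of a is multiplied by b[j]
row_wise = a * b
assert np.array_equal(row_wise, a * b[np.newaxis, :])

# b reshaped into a (3, 1) column: row i of a is multiplied by b[i]
col_wise = a * b[:, np.newaxis]

print(row_wise)
print(col_wise)
```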
12 / 35
Indexing 
>>> import numpy as np 
>>> a=np.arange(10) 
>>> a 
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
>>> indices=np.arange(0,10,2) 
>>> indices 
array([0, 2, 4, 6, 8]) 
>>> a[indices]=0 
>>> a 
array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9]) 
>>> b=np.arange(100,600,100) 
>>> b 
array([100, 200, 300, 400, 500]) 
>>> a[indices]=b 
>>> a 
array([100, 1, 200, 3, 300, 5, 400, 7, 500, 9]) 
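For comparison, a sketch (Python 3) of the same update written three ways: with the index array as above, with an explicit Python loop, and with the slice a[0::2], which selects the same positions because the stride is regular. All three give identical arrays; only the loop runs element by element in the interpreter.

```python
import numpy as np

indices = np.arange(0, 10, 2)
b = np.arange(100, 600, 100)

# Index array: one vectorized assignment
a1 = np.arange(10)
a1[indices] = b

# The same assignment as an explicit loop
a2 = np.arange(10)
for k, i in enumerate(indices):
    a2[i] = b[k]

# For a regular stride, a slice expresses the same positions
a3 = np.arange(10)
a3[0::2] = b

assert np.array_equal(a1, a2) and np.array_equal(a1, a3)
print(a1)
```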
13 / 35
References
Gabriele Lanaro, "Python High Performance Programming," Packt
Publishing, 2013.
Stéfan van der Walt, "NumPy Medkit"
14 / 35
Sparse matrix 
A matrix in which most elements are zero
A compressed data structure is used to represent it, so that it is...
Space efficient
Time efficient
15 / 35
scipy.sparse 
The module scipy.sparse mainly provides three types for representing a sparse
matrix. (There are other types, but they are not covered here.)
lil_matrix : convenient for setting data; setting a[i,j] is fast
csr_matrix : convenient for computation; fast to retrieve a row
csc_matrix : convenient for computation; fast to retrieve a column
Usually, set the data into a lil_matrix, and then convert it to a csr_matrix or
csc_matrix.
For csr_matrix and csc_matrix, calculations between matrices of the same type are
fast, but you should avoid calculations that mix different types.
16 / 35
Use case 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,0]=1.; a[0,2]=2. 
>>> a=a.tocsr() 
>>> print a 
(0, 0) 1.0 
(0, 2) 2.0 
>>> a.todense() 
matrix([[ 1., 0., 2.], 
[ 0., 0., 0.], 
[ 0., 0., 0.]]) 
>>> b=lil_matrix((3,3)) 
>>> b[1,1]=3.; b[2,0]=4.; b[2,2]=5. 
>>> b=b.tocsr() 
>>> b.todense() 
matrix([[ 0., 0., 0.], 
[ 0., 3., 0.], 
[ 4., 0., 5.]]) 
>>> c=a.dot(b) 
>>> c.todense() 
matrix([[ 8., 0., 10.], 
[ 0., 0., 0.], 
[ 0., 0., 0.]]) 
>>> d=a+b 
>>> d.todense() 
matrix([[ 1., 0., 2.], 
[ 0., 3., 0.], 
[ 4., 0., 5.]])
17 / 35
Internal structure: csr_matrix 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. 
>>> b=a.tocsr() 
>>> b.todense() 
matrix([[ 0., 1., 2.], 
[ 0., 0., 3.], 
[ 4., 5., 0.]]) 
>>> b.indices 
array([1, 2, 2, 0, 1], dtype=int32) 
>>> b.data 
array([ 1., 2., 3., 4., 5.]) 
>>> b.indptr 
array([0, 2, 3, 5], dtype=int32) 
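To see how the three arrays fit together, here is a small sketch (Python 3) that rebuilds the dense matrix from data, indices and indptr by hand: the stored values of row i are data[indptr[i]:indptr[i+1]], located in columns indices[indptr[i]:indptr[i+1]]. The helper csr_to_dense is hypothetical and only for illustration; in practice toarray() or todense() does this.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_to_dense(data, indices, indptr, shape):
    # Hypothetical helper: expand CSR arrays into a dense ndarray
    out = np.zeros(shape)
    for i in range(shape[0]):
        start, end = indptr[i], indptr[i + 1]
        # Row i's nonzeros: values data[start:end] at columns indices[start:end]
        out[i, indices[start:end]] = data[start:end]
    return out

dense = np.array([[0., 1., 2.],
                  [0., 0., 3.],
                  [4., 5., 0.]])
b = csr_matrix(dense)
rebuilt = csr_to_dense(b.data, b.indices, b.indptr, b.shape)
assert np.array_equal(rebuilt, dense)
print(b.indptr)
```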
18 / 35
Internal structure: csc_matrix 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. 
>>> b=a.tocsc() 
>>> b.todense() 
matrix([[ 0., 1., 2.], 
[ 0., 0., 3.], 
[ 4., 5., 0.]]) 
>>> b.indices 
array([2, 0, 2, 0, 1], dtype=int32) 
>>> b.data 
array([ 4., 1., 5., 2., 3.]) 
>>> b.indptr 
array([0, 1, 3, 5], dtype=int32) 
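Going the other way, here is a sketch (Python 3) that passes the CSC arrays printed above straight to the csc_matrix constructor in its (data, indices, indptr) form and checks that the same matrix comes back. In CSC, indices holds row numbers and indptr delimits columns rather than rows.

```python
import numpy as np
from scipy.sparse import csc_matrix

data = np.array([4., 1., 5., 2., 3.])
indices = np.array([2, 0, 2, 0, 1])   # row index of each stored value
indptr = np.array([0, 1, 3, 5])       # column j spans data[indptr[j]:indptr[j+1]]

m = csc_matrix((data, indices, indptr), shape=(3, 3))
expected = np.array([[0., 1., 2.],
                     [0., 0., 3.],
                     [4., 5., 0.]])
assert np.array_equal(m.toarray(), expected)

# Column 1 holds the values 1 and 5, in rows 0 and 2
j = 1
print(indices[indptr[j]:indptr[j + 1]], data[indptr[j]:indptr[j + 1]])
```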
19 / 35
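Benefit of knowing the internal structure
Setting a csr_matrix or csc_matrix directly through its internal structure is much
faster than setting a lil_matrix element by element.
See the benchmark of setting the following n × n matrix:
$$\begin{pmatrix} 2 & 1 & & & 0 \\ & 2 & 1 & & \\ & & \ddots & \ddots & \\ & & & 2 & 1 \\ 0 & & & & 2 \end{pmatrix}$$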
20 / 35
from scipy.sparse import lil_matrix, csr_matrix
import numpy as np
from timeit import timeit

def set_lil(n):
    a=lil_matrix((n,n))
    for i in xrange(n):
        a[i,i]=2.
        if i+1<n:
            a[i,i+1]=1.
    return a

def set_csr(n):
    data=np.empty(2*n-1)
    indices=np.empty(2*n-1,dtype=np.int32)
    indptr=np.empty(n+1,dtype=np.int32)
    # to be fair, a for-loop is intentionally used here
    # (using the indexing technique is faster)
    for i in xrange(n):
        indices[2*i]=i
        data[2*i]=2.
        if i<n-1:
            indices[2*i+1]=i+1
            data[2*i+1]=1.
        indptr[i]=2*i
    indptr[n]=2*n-1
    a=csr_matrix((data,indices,indptr),shape=(n,n))
    return a

print 'lil:',timeit('set_lil(10000)',
    number=10,setup='from __main__ import set_lil')
print 'csr:',timeit('set_csr(10000)',
    number=10,setup='from __main__ import set_csr')
21 / 35
Result: 
lil: 11.6730761528 
csr: 0.0562081336975 
Remark 
When you deal with already sorted data, setting a csr_matrix or csc_matrix
with data, indices, indptr is much faster than setting a lil_matrix.
But the code tends to be more complicated if you use the internal structure
of csr_matrix or csc_matrix.
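The comment in set_csr notes that the indexing technique is faster than the for-loop. A sketch of what that might look like, written for Python 3 (the name set_csr_vectorized is hypothetical): the data, indices and indptr arrays for the same bidiagonal matrix are filled with slices and np.arange only, with no Python-level loop.

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_csr_vectorized(n):
    # Row i stores (column i, 2.) and, except for the last row, (column i+1, 1.)
    data = np.empty(2 * n - 1)
    indices = np.empty(2 * n - 1, dtype=np.int32)
    data[0::2] = 2.                    # diagonal entries at even slots
    data[1::2] = 1.                    # superdiagonal entries at odd slots
    indices[0::2] = np.arange(n)       # column i
    indices[1::2] = np.arange(1, n)    # column i+1
    indptr = np.empty(n + 1, dtype=np.int32)
    indptr[:n] = 2 * np.arange(n)      # row i starts at slot 2*i
    indptr[n] = 2 * n - 1              # total number of stored values
    return csr_matrix((data, indices, indptr), shape=(n, n))

# Compare against a dense construction for a small n
n = 6
expected = np.diag(np.full(n, 2.)) + np.diag(np.ones(n - 1), 1)
assert np.array_equal(set_csr_vectorized(n).toarray(), expected)
```

It can be timed the same way as set_lil and set_csr; no measured numbers are claimed here.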
22 / 35
Case Studies 
23 / 35
Case 1: Norms 
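If $v \in \mathbb{R}^n$ is dense:
norm=np.dot(v,v)
$$\|v\|^2 = v^T v$$
The squared norm is expressed as a product of matrices. (dot means the matrix
product, but you don't have to take the transpose explicitly.)
When $v$ is sparse, suppose that $v$ is expressed as a $1 \times n$ matrix:
$$\|v\|^2 = \sum_i v_i^2$$
norm=v.multiply(v).sum()
(multiply() is the element-wise product.)
We avoid v.dot(v.T) because taking the transpose of a sparse matrix changes its
type (the transpose of a csr_matrix is a csc_matrix), and calculations that mix
types are slow.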
24 / 35
Frobenius norm:
norm=a.multiply(a).sum()
$$\|A\|_F^2 = \sum_{i,j} a_{ij}^2$$
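A sketch (Python 3) checking both multiply().sum() formulas against dense results, assuming the sparse vector is stored as a 1 × n csr_matrix as described above. Note that both formulas give the squared norm; take the square root if the norm itself is needed.

```python
import numpy as np
from scipy.sparse import csr_matrix

v_dense = np.array([3., 0., 0., 4., 0.])
v = csr_matrix(v_dense)                  # 1 x 5 sparse matrix

sq_dense = np.dot(v_dense, v_dense)      # dense: v^T v
sq_sparse = v.multiply(v).sum()          # sparse: element-wise square, then sum
assert np.isclose(sq_dense, sq_sparse)   # both equal 3^2 + 4^2 = 25

A_dense = np.array([[1., 0., 2.],
                    [0., 0., 3.]])
A = csr_matrix(A_dense)
fro_sq = A.multiply(A).sum()             # squared Frobenius norm
assert np.isclose(fro_sq, np.linalg.norm(A_dense, 'fro') ** 2)

print(np.sqrt(sq_sparse), np.sqrt(fro_sq))
```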
25 / 35
Case 2: Applying a function to all of the elements of a 
sparse matrix 
A universal function can be applied to a dense matrix:
>>> import numpy as np
>>> a=np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.tanh(a)
array([[ 0.        ,  0.76159416,  0.96402758],
       [ 0.99505475,  0.9993293 ,  0.9999092 ],
       [ 0.99998771,  0.99999834,  0.99999977]])
This is convenient and fast. 
However, we cannot do the same thing for a sparse matrix. 
26 / 35
>>> from scipy.sparse import lil_matrix
>>> a=lil_matrix((3,3))
>>> a[0,0]=1.
>>> a[1,0]=2.
>>> b=a.tocsr()
>>> np.tanh(b)
<3x3 sparse matrix of type '<type 'numpy.float64'>'
	with 2 stored elements in Compressed Sparse Row format>
This is because, for an arbitrary function, applying it to a sparse matrix does
not necessarily produce a sparse matrix.
However, if a universal function $f$ satisfies $f(0) = 0$, the sparsity is
preserved.
Then, how can we compute it?
27 / 35
Use the internal structure!! 
The positions of the non-zero elements are not changed after application of 
the function. 
Keep indices and indptr, and just change data. 
Solution: 
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape) 
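The same idea as a reusable sketch (Python 3). The helper name apply_to_nonzeros is hypothetical; it refuses functions with f(0) != 0, because for those the implicit zeros would change too and the result would no longer be sparse.

```python
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix

def apply_to_nonzeros(f, m):
    # Hypothetical helper: apply ufunc f to the stored values of a sparse matrix.
    # Only valid when f(0) == 0; otherwise the implicit zeros would change too.
    if f(0.0) != 0.0:
        raise ValueError("f(0) must be 0 to keep the sparsity pattern")
    m = csr_matrix(m)
    # Keep indices and indptr; only the data array is transformed
    return csr_matrix((f(m.data), m.indices, m.indptr), shape=m.shape)

a = lil_matrix((3, 3))
a[0, 0] = 1.
a[1, 0] = 2.
b = apply_to_nonzeros(np.tanh, a.tocsr())
assert np.allclose(b.toarray(), np.tanh(a.toarray()))
print(b.nnz)  # 2
```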
28 / 35
Case 3: A formula that appears in a paper
In the algorithm for recommendation systems in [1], the following formula
appears:
$$A^T D A$$
where $A$ is an $n \times f$ dense matrix, and $D$ is an $n \times n$ diagonal matrix defined from a
given array $d = (d_1, \dots, d_n)$ as:
$$D = \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{pmatrix}$$
Here, $n$ (which corresponds to the number of users or items) is big and
$f$ (the number of latent factors) is small.
[1] Y. Hu, Y. Koren, and C. Volinsky, "Collaborative Filtering for Implicit
Feedback Datasets," ICDM, 2008.
29 / 35
Solution 1:
There is a special class, dia_matrix, for diagonal sparse matrices.
import scipy.sparse as sparse
import numpy as np
def f(a,d):
    """a: 2d array of shape (n,f), d: 1d array of length n"""
    dd=sparse.diags([d],[0])
    return np.dot(a.T,dd.dot(a))
30 / 35
Solution 2:
Pack a csr_matrix directly with data, indices, indptr:
data=d
indices=[0,1,...,n-1]
indptr=[0,1,...,n]
def g(a,d):
    n,f=a.shape
    data=d
    indices=np.arange(n)
    indptr=np.arange(n+1)
    dd=sparse.csr_matrix((data,indices,indptr),shape=(n,n))
    return np.dot(a.T,dd.dot(a))
31 / 35
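Solution 3:
$$A^T D = \begin{pmatrix} a_{11} & \cdots & a_{n1} \\ \vdots & & \vdots \\ a_{1f} & \cdots & a_{nf} \end{pmatrix} \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{pmatrix} = \begin{pmatrix} d_1 a_{11} & \cdots & d_n a_{n1} \\ \vdots & & \vdots \\ d_1 a_{1f} & \cdots & d_n a_{nf} \end{pmatrix}$$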
This is exactly what broadcasting computes: a.T*d multiplies column i of a.T by d[i]!
def h(a,d):
    return np.dot(a.T*d,a)
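A quick consistency check, written for Python 3 with a small, arbitrary problem size (n = 50, f = 4): it restates f, g and h from the three solutions and confirms that they agree with the dense formula $A^T D A$ built with np.diag, which is feasible here only because n is small.

```python
import numpy as np
import scipy.sparse as sparse

def f(a, d):
    # Solution 1: diagonal sparse matrix
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))

def g(a, d):
    # Solution 2: csr_matrix packed from its internal arrays
    n = a.shape[0]
    dd = sparse.csr_matrix((d, np.arange(n), np.arange(n + 1)), shape=(n, n))
    return np.dot(a.T, dd.dot(a))

def h(a, d):
    # Solution 3: broadcasting; column i of a.T is scaled by d[i]
    return np.dot(a.T * d, a)

rng = np.random.RandomState(0)
a = rng.random_sample((50, 4))
d = rng.random_sample(50)

reference = a.T @ np.diag(d) @ a   # dense A^T D A, only viable for small n
for func in (f, g, h):
    assert np.allclose(func(a, d), reference)
print(h(a, d).shape)  # (4, 4)
```

Unlike the dense reference, h never forms the n × n matrix D, which is why it stays cheap when n is large.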
32 / 35
Benchmark 
def datagen(n,f):
    np.random.seed(0)
    a=np.random.random((n,f))
    d=np.random.random(n)
    return a,d

from timeit import timeit
print 'dia_matrix :',timeit('f(a,d)',number=10,
    setup='from __main__ import f,datagen; a,d=datagen(1000000,10)')
print 'csr_matrix :',timeit('g(a,d)',number=10,
    setup='from __main__ import g,datagen; a,d=datagen(1000000,10)')
print 'broadcasting :',timeit('h(a,d)',number=10,
    setup='from __main__ import h,datagen; a,d=datagen(1000000,10)')
Result: 
dia_matrix : 1.60458707809 
csr_matrix : 1.32580018044 
broadcasting : 1.30032682419 
33 / 35
Conclusion 
Try not to use for-loops; use libraries' capabilities instead.
Knowledge of the internal structure of sparse matrices is useful for
extracting further performance.
Mathematical derivation is important. The key is to find a mathematically
equivalent, Python-friendly formula.
Computational speed does not always matter. Finding better code in
a short time is valuable; otherwise, you shouldn't pursue it too far.
34 / 35
Acknowledgment 
I would like to thank
(@shima__shima),
who gave me useful advice on Twitter.
35 / 35
