Sequential Function Approximation
In High Dimensions for Big Data
Dongbin Xiu
Collaborators: Yeonjong Shin, Kailiang Wu
Department of Mathematics
Ohio State University
Function Approximation: Setup
• Consider an unknown target function f(x), x \in \mathbb{R}^d
• Let f = (f(x_1), \ldots, f(x_m)) be the function samples
• Construct an approximation \tilde{f}(x) \approx f(x)
• Standard approach:
  • Choose a linear space
  • Choose a set of basis functions (for example, polynomials) \phi_i(x), i = 1, \ldots, n
  • Let \tilde{f}(x) = \sum_{j=1}^{n} c_j \phi_j(x)
  • Determine the coefficients c = (c_1, \ldots, c_n)
Example: Polynomial Interpolation
• Model: \tilde{f}(X) = \sum_{i=1}^{M} c_i \phi_i(X) \; (= c_0 + c_1 X + c_2 X^2 + \cdots)
• Interpolation condition: \tilde{f}(x_j) = f(x_j)
• Linear system of equations: A c = f, where A is the M \times N matrix whose columns are the basis functions 1, X, X^2, X^3, \ldots evaluated at the sample points, with M = N
Overdetermined Case
• A c \approx f, where A is M \times N with M > N
• How "big" is big?
  • KB → MB → GB → TB → PB → EB → ZB → YB → ?
  • Big = infinity-B
  • M → infinity
• Always an overdetermined system
• "Easy" case: least squares works mathematically (see the sketch below)
  • Allows larger N for complex models
  • Cost: O(MN^2)
FYI: Peta, Exa, Zetta, Yotta
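To make the cost comparison concrete, here is a minimal least-squares sketch in Python (not from the slides): an overdetermined system with M >> N solved by ordinary least squares. The target function, sizes, and monomial basis are illustrative placeholders.

```python
# A minimal sketch (not from the slides) of the "easy" overdetermined case: ordinary
# least squares with M >> N samples of a 1D target, using a polynomial basis.
import numpy as np

rng = np.random.default_rng(0)
M, N = 100_000, 20                              # many samples, few basis functions
x = rng.uniform(-1.0, 1.0, size=M)
f = np.exp(x) + 0.01 * rng.standard_normal(M)   # noisy samples of an example target

A = np.vander(x, N, increasing=True)            # columns 1, x, x^2, ..., x^(N-1): M x N
c, *_ = np.linalg.lstsq(A, f, rcond=None)       # min ||A c - f||_2, cost ~ O(M N^2)

x_test = np.linspace(-1, 1, 5)
print(np.vander(x_test, N, increasing=True) @ c - np.exp(x_test))  # small errors
```

The dense solve stores the full M x N matrix and scales like O(MN^2), which is what motivates the matrix-free sequential alternative on the following slides.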
Sequential Approximation: Framework
• Sequential: Constructing approximation using one data point at a time
• Data: f_k = f(x_k) + \epsilon_k, \; k = 1, \ldots, with
  E(\epsilon_k) = 0, \quad E(\epsilon_k^2) = \sigma^2(x_k) \le \sigma^2_{\max} < \infty
• Starting with an arbitrary initial approximation at k = 0
• At the k-th data point, we seek to "minimally interpolate the current data":
  \tilde{f}^{(k)}(x) = \arg\min_{p \in V} \left( \| p(x) - \tilde{f}^{(k-1)}(x) \|^2_{L^2_\omega} \right), \quad \text{subject to } p(x_k) = f_k
• Or, in general, for noisy data:
  \tilde{f}^{(k)}(x) = \arg\min_{p \in V} \left( \| p(x) - \tilde{f}^{(k-1)}(x) \|^2_{L^2_\omega} + \frac{1}{\gamma_k} |p(x_k) - f_k|^2 \right), \quad \gamma_k \sim O(\sigma)
Sequential Approximation: Algorithm
• Starting with an initial coefficient vector c^{(0)}
• Draw i.i.d. random samples from D with probability measure \nu: \{x_k\}_{k=1,\ldots} \sim \nu, and collect data \{f(x_k)\}_{k=1,\ldots}
• Assuming an orthonormal basis: \tilde{f}(x) = \sum_{j=1}^{N} c_j \phi_j(x) = \langle \Phi(x), c \rangle
• Algorithm: compute (see the sketch below)
  c^{(k)} = c^{(k-1)} + \frac{f(x_k) - \langle c^{(k-1)}, \Phi(x_k) \rangle}{\| \Phi(x_k) \|_2^2 + \gamma_k} \, \Phi(x_k), \quad k = 1, \ldots
• Remarks:
  • Only vector operations; no matrices needed
  • Can handle streaming data
  • Motivation: randomized Kaczmarz for Ax = b (Strohmer & Vershynin, 2009)
  • Question: Convergence? If so, accuracy?
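A minimal sketch (not the authors' code) of this update in d = 1, assuming the orthonormal Legendre basis on [-1,1] and uniform sampling as an illustrative choice of basis and measure; gamma_k = 0 corresponds to noiseless data.

```python
# Sequential approximation update, one data point at a time (illustrative 1D sketch).
import numpy as np
from numpy.polynomial import legendre

def basis(x, N):
    """Orthonormal Legendre basis w.r.t. the uniform probability measure on [-1,1]."""
    return np.array([np.sqrt(2*j + 1) * legendre.legval(x, [0]*j + [1]) for j in range(N)])

def sa_update(c, x_k, f_k, gamma_k=0.0):
    """c^(k) = c^(k-1) + (f_k - <c, Phi(x_k)>) / (||Phi(x_k)||^2 + gamma_k) * Phi(x_k)."""
    phi = basis(x_k, len(c))
    return c + (f_k - c @ phi) / (phi @ phi + gamma_k) * phi

# Usage: approximate f(x) = exp(x) with N = 8 basis functions, one sample at a time.
rng = np.random.default_rng(0)
c = np.zeros(8)
for _ in range(20_000):
    x_k = rng.uniform(-1.0, 1.0)
    c = sa_update(c, x_k, np.exp(x_k))
print(c[:4])   # leading coefficients of the approximation <Phi(x), c>
```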
Convergence
• Theorem: upper and lower bounds for a general sampling measure (skipped)
• Remarks:
  • The rate is optimal; it cannot be improved
  • This is not an error bound; it is an error equality
  • Sampling can be done by, e.g., MCMC (a sketch follows below)
• Theorem (limiting case \gamma_k = 0): Let the sampling probability measure be
  d\nu(x) = \left( \frac{1}{N} \sum_{j=1}^{N} \phi_j^2(x) \right) d\omega(x).
  Then the following holds:
  E \| c^{(k)} - \hat{c} \|_2^2 = \| f - Pf \|_\omega^2 + r^k \left( \| \hat{c} \|_2^2 - \| f - Pf \|_\omega^2 \right), \quad r = 1 - \frac{1}{N},
  E \| f - \tilde{f}^{(k)} \|_\omega^2 = 2 \| f - Pf \|_\omega^2 + r^k \left( \| Pf \|_\omega^2 - \| f - Pf \|_\omega^2 \right), \quad r = 1 - \frac{1}{N}.
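As one way to realize the MCMC remark above, here is a minimal sketch (not the authors' code) of random-walk Metropolis sampling of the optimal measure d\nu in d = 1, assuming the orthonormal Legendre basis on [-1,1] with \omega the uniform probability measure; step size and chain length are illustrative choices.

```python
# Metropolis-Hastings sampling of the optimal measure dnu ~ (1/N) sum_j phi_j(x)^2 domega.
import numpy as np
from numpy.polynomial import legendre

def optimal_density(x, n):
    """Unnormalized density (1/N) * sum_{j=0}^{n} phi_j(x)^2, phi_j = sqrt(2j+1) P_j."""
    vals = np.array([np.sqrt(2*j + 1) * legendre.legval(x, [0]*j + [1])
                     for j in range(n + 1)])
    return np.sum(vals**2) / (n + 1)

def sample_optimal_measure(n, num_samples, seed=0):
    """Random-walk Metropolis on [-1, 1] targeting the optimal density."""
    rng = np.random.default_rng(seed)
    x, samples = 0.0, []
    for _ in range(num_samples):
        prop = x + 0.3 * rng.standard_normal()
        if -1.0 <= prop <= 1.0:                              # zero density outside the domain
            if rng.random() < optimal_density(prop, n) / optimal_density(x, n):
                x = prop
        samples.append(x)
    return np.array(samples)

xs = sample_optimal_measure(n=10, num_samples=5000)
print(xs.mean(), xs.std())   # samples concentrate toward the endpoints, as expected
```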
Test Functions
• "Standard" functions from Genz (84):
  GAUSSIAN:      f = \exp\left[ -\sum_{i=1}^{d} c_i^2 \left( \frac{x_i + 1}{2} - \alpha_i \right)^2 \right]
  CONTINUOUS:    f = \exp\left[ -\sum_{i=1}^{d} c_i \left| \frac{x_i + 1}{2} - \alpha_i \right| \right]
  CORNER PEAK:   f = \left[ 1 + \sum_{i=1}^{d} \frac{x_i + 1}{2 i^2} \right]^{-(d+1)}
  PRODUCT PEAK:  f = \prod_{i=1}^{d} \left[ c_i^{-2} + \left( \frac{x_i + 1}{2} - \alpha_i \right)^2 \right]^{-1}
• In two dimensions (d = 2): Franke function
Numerical Tests
• Target function: GAUSSIAN, d = 10
• [Figure: approximation errors at degree n versus number of iterations (0 to 3,500K) for polynomial degrees n = 5, 6, 7, 8, 9; the errors decay from 10^0 to below 10^{-5}]
Converged Solutions
• Target function: GAUSSIAN, d = 10
• Complexity: N = p, M = m
• [Figure: converged approximation errors (between 10^{-6} and 10^0) versus polynomial degree n = 1, ..., 10 at d = 10]
• [Cropped fragment of Table 4.1 of the paper: numerical rate of convergence s_num (4.3) against the optimal rate s_opt (4.4) in two dimensions (d = 2) for the test functions (4.1) at different polynomial degrees]

From the paper: The errors saturate at lower levels for higher-degree polynomials, as expected. The numerical errors of the converged results are plotted in Fig. 4.4 with respect to the polynomial degree. One clearly observes the exponential convergence of the approximation error with increasing polynomial degree, which is expected for this smooth target function. The number of iterations m needed to reach the converged solutions is tabulated in Table 4.2; for reference, we also tabulate the cardinality of the polynomial space, p, at each degree. In this case, the number of iterations m is also the number of function data. These results indicate that the proposed RK method is indeed suitable for "big data" problems, where a large amount of data can be collected. At high degree n >= 9, the method never requires the formation of the model matrix, which would be of size O(10^6 × 10^5) and cannot be easily handled; instead it requires only operations on row vectors of size O(10^5 × 1). All computations were performed on a standard desktop computer with an Intel i7-4770 CPU at 3.40 GHz and 24.0 GB RAM.

Table 4.2: The number of iterations (m) used to reach the converged solutions in Fig. 4.3, along with the cardinality of the polynomial space (p), at dimension d = 10. The target function is the GAUSSIAN.

d = 10   n = 1     n = 2     n = 3       n = 4       n = 5
p        11        66        286         1,001       3,003
m        10,000    10,000    10,000      26,000      80,000

d = 10   n = 6     n = 7     n = 8       n = 9       n = 10
p        8,008     19,448    43,758      92,378      184,756
m        220,000   500,000   1,700,000   3,000,000   7,000,000
High Dimensions
• Legendre approximation at d = 20 and d = 40
• Complexity: N = p, M = m
• [Fig. 4.5: error convergence with respect to the iteration count for the GAUSSIAN function f1 with c_i = 1 and w_i = 0.5, by Legendre polynomials. Left: d = 20 (n = 3, 4, 5); right: d = 40 (n = 2, 3, 4). All simulations were conducted on the same desktop computer.]

Table 4.4: The number of iterations (m) used to reach the converged solutions in Fig. 4.5, along with the cardinality of the polynomial space (p).

d = 20   n = 1     n = 2      n = 3       n = 4        n = 5
p        21        231        1,771       10,626       53,130
m        10,000    50,000     200,000     900,000      4,000,000

d = 40   n = 1     n = 2      n = 3       n = 4        n = 5
p        41        861        12,341      135,751      1,221,759
m        1,000     100,000    1,500,000   20,700,000   —
Irregular Data Set
• [Figure: an irregularly distributed point set \Theta = \{x_j\}_{j=1}^{M} in [-1,1]^2]
• Fact: In practice, data sets rarely follow the optimal measure
Nearest Neighbor Replacement
Given the data set \Theta = \{x_j\}_{j=1}^{M}, at each step k = 1, 2, \ldots:
• Draw a sample \xi_k \sim \mu
• Find the nearest neighbor to \xi_k inside \Theta:
  \| x_{i_k} - \xi_k \|_2 \le \| x_j - \xi_k \|_2, \quad \forall x_j \in \Theta
• Conduct the SA update at the selected point x_{i_k}
Properties (see the sketch below):
• Still only vector operations
• Additional cost of finding the nearest neighbor: 2-norms, sorting
• The size of the data set M can vary
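A minimal sketch (not the authors' code) of one NNR step on a fixed data set Theta; the candidate draw xi_k is taken uniformly on [-1,1]^d for illustration (any measure mu could be used), and `basis` is a user-supplied placeholder returning the N orthonormal basis values at a point.

```python
# One nearest-neighbor-replacement (NNR) step of the sequential approximation.
import numpy as np

def nnr_step(c, Theta, f_data, basis, rng):
    """SA update at the data point nearest to a fresh draw from mu (noiseless case)."""
    d = Theta.shape[1]
    xi = rng.uniform(-1.0, 1.0, size=d)                   # xi_k ~ mu (illustrative choice)
    i_k = np.argmin(np.linalg.norm(Theta - xi, axis=1))   # nearest neighbor inside Theta
    phi = basis(Theta[i_k])                                # Phi(x_{i_k}), shape (N,)
    residual = f_data[i_k] - c @ phi
    return c + residual / (phi @ phi) * phi                # gamma_k = 0 update
```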
NNR: Theoretical Justification
• The point set defines a Voronoi tessellation:
  D_i = \{ x \in D : \| x - x_i \|_2 < \| x - x_j \|_2 \text{ for any } x_j \ne x_i \}, \quad \bar{D} = \bigcup_{i=1}^{M} \bar{D}_i
• The NNR creates a discrete measure to sample the points: \xi_k \sim \mu \;\rightarrow\; x_{i_k} \sim \hat{\mu}
• The discrete sampling measure is a weak approximation of \mu:
  \left| \int_D g(x)\, d\hat{\mu} - \int_D g(x)\, d\mu \right| \le \left( \sup_{x \in D} \| \nabla g(x) \|_2 \right) \delta, \quad \delta = \max_{1 \le i \le M} \mathrm{diam}(D_i)
• Main result: If \mu is the optimal sampling measure, then the NNR on \Theta converges:
  \lim_{k \to \infty} E\left( \| p^{(k)} - f \|^2_{L^2_\omega} \right) = 2 \| f - P_V f \|^2_{L^2_\omega} + O(\delta)
Numerical Tests
• "Standard" functions from Genz (84): the GAUSSIAN, CONTINUOUS, CORNER PEAK, and PRODUCT PEAK functions listed above
• Sampling methods:
  • Uniform probability (SA: uniform)
  • Optimal measure + NNR (SA: NNR)
  • Randomized Kaczmarz (RK)
    • For linear systems of equations Ax = b (Strohmer and Vershynin, 2009)
    • Randomly sample the rows with probability \propto \| \Phi(x_i) \|^2 (see the sketch below)
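For reference, a minimal sketch (not the authors' code) of the randomized Kaczmarz iteration of Strohmer and Vershynin (2009) for a consistent system Ax = b, with rows sampled proportionally to their squared Euclidean norms; the test system is an illustrative placeholder.

```python
# Randomized Kaczmarz for Ax = b with row sampling proportional to ||a_i||_2^2.
import numpy as np

def randomized_kaczmarz(A, b, num_iters, rng=np.random.default_rng(0)):
    m, n = A.shape
    row_norms2 = np.sum(A**2, axis=1)
    probs = row_norms2 / row_norms2.sum()          # p_i proportional to ||a_i||_2^2
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.choice(m, p=probs)                 # pick a row at random
        x = x + (b[i] - A[i] @ x) / row_norms2[i] * A[i]   # project onto {x : a_i.x = b_i}
    return x

# Usage: recover the solution of a small consistent system.
A = np.random.default_rng(1).standard_normal((200, 10))
x_true = np.arange(10.0)
x_hat = randomized_kaczmarz(A, A @ x_true, num_iters=5000)
print(np.linalg.norm(x_hat - x_true))   # small error
```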
• [Figure: four irregularly distributed point sets \Theta_1, \Theta_2, \Theta_3, \Theta_4 in [-1,1]^2, each with M = 20,000 points]
Error Convergence with n = 10 polynomials
• [Figure: errors versus number of iterations (0 to 20K) for RK, SA: Uniform, and SA: NNR; four panels: f1 on \Theta_1, f2 on \Theta_2, f3 on \Theta_3, f4 on \Theta_4]
High dimensional (d = 10) Dynamic Data Set
• Domain: D = [-1,1]^d; D_1 = \{ |x| < 0.5 \} containing 99% of the points; D_2 = D \setminus D_1
• Dynamic set: initial M = 10^6; every 100 steps, new points are added, with the number of additions Poisson distributed with \lambda = 2,000
• [Figure: errors versus number of iterations (0 to 500K) for RK, SA: Uniform, and SA: NNR]
Back to Regular Grids
• Cardinality of the polynomial space: N = \binom{n+d}{d} = \frac{(n+d)!}{n!\, d!}
• Question: What are the (almost) best points for polynomial approximation in d = 1?
• Answer: Gauss quadrature points
• Most straightforward extension for d > 1: tensorized Gauss points
  • Advantage: retains all the good mathematical properties from d = 1
  • Disadvantage: fastest growth of the number of points. With m points in each dimension, the total number of points is M = m^d
  • Worst choice for d >> 1
• Our approach for d >> 1 (a quick size comparison follows below):
  • Use the least likely choice: tensorized quadrature points
  • Apply the sequential approximation algorithm
  • And let's see what happens ...
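A quick numeric comparison (not from the slides) of the total-degree cardinality N = C(n+d, d) against the tensor-grid size M = (n+1)^d, for a few (d, n) pairs that appear later in the talk.

```python
# How the polynomial-space dimension N compares with the full tensor grid size M.
from math import comb

for d, n in [(10, 5), (40, 4), (500, 2)]:
    N = comb(n + d, d)      # dimension of the total-degree polynomial space
    M = (n + 1) ** d        # number of tensorized Gauss points, n+1 per dimension
    print(f"d={d:>3}, n={n}:  N = {N:,},  M = (n+1)^d = {M:.3e}")
```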
Main Algorithm
• Starting with an initial coefficient vector c^{[0]}
• Randomly draw a point from the M tensor Gauss points, with the probability p_j^* defined in (3.6) of the paper excerpt below

From the paper:
  \Phi_n^d := \mathrm{span}\{ x^{\mathbf{k}} = x_1^{k_1} \cdots x_d^{k_d}, \; |\mathbf{k}|_\infty \le n \}
is the tensor product of the one-dimensional polynomial space \Pi_n^1. Since \Pi_n^d \subseteq \Phi_n^d for any d \ge 1 and n \ge 0, the 2n polynomial exactness (3.5) holds for all f \in \Pi_{2n}^d. Other one-dimensional choices such as the Gauss-Lobatto rule and the Gauss-Radau rule can certainly be used. The number of points m in each dimension may be different.

3.1.2. A randomized Kaczmarz iteration. With the tensor quadrature points defined, we proceed to apply the following randomized Kaczmarz iteration. For any tensor quadrature point z^{(j)} \in \Theta_M, define the sampling probability
  p_j^* = \frac{w^{(j)} \| \Phi(z^{(j)}) \|_2^2}{N}, \quad j = 1, \ldots, M, \qquad (3.6)
which satisfies \sum_{j=1}^{M} p_j^* = 1 by the 2n polynomial exactness of the tensor quadrature. Using the vector notation (2.7) and setting an initial choice of c^{[0]} = 0, one then computes, for k = 0, 1, \ldots,
  c^{[k+1]} = c^{[k]} + \frac{f(z^{(j[k])}) - \langle c^{[k]}, \Phi(z^{(j[k])}) \rangle}{\| \Phi(z^{(j[k])}) \|_2^2} \, \Phi(z^{(j[k])}), \quad z^{(j[k])} \sim d\mu_{p^*}, \qquad (3.7)
where d\mu_{p^*} denotes the probability measure corresponding to (3.6), i.e.
  d\mu_{p^*} := \sum_{j=1}^{M} p_j^* \, \delta(x - z^{(j)}) \, dx, \qquad (3.8)
and \delta(x) is the Dirac delta function.
The implementation of the algorithm is remarkably simple: one randomly draws a point from the tensor quadrature set \Theta_M using the discrete probability (3.6) and applies the update (3.7).
• Compute the randomized Kaczmarz update (3.7) at the drawn point
• The sampling discrete probability is not a common one
• But it can be sampled efficiently using the tensor structure (a sketch of one iteration follows below)
• Consider the M = (n+1)^d tensor-product grid formed by the (n+1)-point Gauss-Legendre quadrature in each dimension
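A minimal sketch (not the authors' code) of one randomized iteration on the tensor Gauss-Legendre grid. The full grid of M = (n+1)^d points is never formed; a point is assembled coordinate by coordinate. For simplicity each coordinate index is drawn from the 1D quadrature weights, an illustrative stand-in for the exact optimal probability (3.6), and `basis` is a user-supplied placeholder returning the N orthonormal basis values at a point.

```python
# One Kaczmarz-type update at a randomly drawn tensor Gauss-Legendre point.
import numpy as np

def tensor_rk_step(c, n, d, f, basis, rng):
    """Assemble a random tensor quadrature point coordinate-wise and update c."""
    nodes, weights = np.polynomial.legendre.leggauss(n + 1)   # 1D rule with n+1 points
    probs = weights / weights.sum()
    idx = rng.choice(n + 1, size=d, p=probs)   # one index per dimension
    z = nodes[idx]                             # the selected tensor quadrature point
    phi = basis(z)                             # Phi(z), shape (N,)
    return c + (f(z) - c @ phi) / (phi @ phi) * phi
```

Only the d-dimensional point and the length-N basis vector are ever stored, which is what keeps the per-iteration cost independent of M.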
Error Analysis
• Discrete inner product induced by the tensor quadrature:
  [f, g]_w := \sum_{j=1}^{M} w^{(j)} f(z^{(j)}) g(z^{(j)}),
  for functions f and g; the corresponding induced discrete norm is denoted by \| \cdot \|_w.

Theorem 3.1. Assume c^{[0]} = 0. The k-th iterative solution of the algorithm (3.7) satisfies
  E \| c^{[k]} - \hat{c} \|_2^2 = \| f - P_\Pi f \|_w^2 + E + r^k \left( \| P_\Pi f \|_w^2 - \| f - P_\Pi f \|_w^2 - E \right), \qquad (3.9)
where
  E = 2 [f - P_\Pi f, P_\Pi f]_w - \langle \hat{c}, e \rangle, \quad r = 1 - 1/N,
and e = \tilde{c} - \hat{c} with \tilde{c}_j := [f, \phi_j]_w. And,
  \lim_{k \to \infty} E \| c^{[k]} - \hat{c} \|_2^2 = \| f - P_\Pi f \|_w^2 + E = \| f \|_w^2 - \| \tilde{c} \|_2^2 + \| e \|_2^2. \qquad (3.10)
Furthermore, the resulting approximation \tilde{f}^{[k]} = \langle c^{[k]}, \Phi(x) \rangle satisfies
  E \| \tilde{f}^{[k]} - f \|_{L^2_\omega}^2 = \| f - P_\Pi f \|_{L^2_\omega}^2 + \| f - P_\Pi f \|_w^2 + E + r^k \left( \| P_\Pi f \|_w^2 - \| f - P_\Pi f \|_w^2 - E \right). \qquad (3.11)
And,
  \lim_{k \to \infty} E \| \tilde{f}^{[k]} - f \|_{L^2_\omega}^2 = \| f - P_\Pi f \|_{L^2_\omega}^2 + \| f - P_\Pi f \|_w^2 + E. \qquad (3.12)
The proof is in the next section.

Theorem 3.1, as an equality, gives an estimate of the numerical errors of the proposed algorithm (3.7). The expectation E in (3.9) shall be understood as the expectation over the random sequence \{ z^{(j[\ell])} \}_{0 \le \ell \le k-1} of the algorithm. The convergence rate in expectation is 1 - 1/N, which is the optimal convergence rate for randomized Kaczmarz methods.

• Error equality
• The rate cannot be improved
Computational Complexity
• Convergence criterion: let the total number of iterations be K = \gamma N; then the iteration error decays by roughly e^{-\gamma}

From the paper (Section 3.4.2, convergence criterion): The convergence rate (1 - 1/N) in Theorem 3.1 is in the form of an equality. This provides a very sharp convergence criterion. Let K be the total number of iterations of the main algorithm (3.7), and let K = \gamma N, where \gamma is a constant independent of N. Then we derive
  \left( 1 - \frac{1}{N} \right)^K = \exp\left( \gamma N \ln\left( 1 - \frac{1}{N} \right) \right) = \exp\left( -\gamma \sum_{i=1}^{\infty} \frac{1}{i} \frac{1}{N^{i-1}} \right) = \exp\left( -\gamma + O\left( \frac{1}{N} \right) \right) \approx e^{-\gamma}, \quad \text{if } N \gg 1.
According to Theorem 3.1, this implies that the square of the iteration error, the last term in (3.11), becomes roughly e^{-\gamma} times smaller. For example, when \gamma = 5, e^{-\gamma} \approx 6.7 \times 10^{-3}; when \gamma = 10, e^{-\gamma} \approx 4.5 \times 10^{-5}. In most problems \gamma = 10 is sufficient. Our extensive numerical tests verify that K \sim 10N is a good criterion for accurate solutions. On the other hand, if one desires the iteration error to be at a certain small level \epsilon \ll 1, the iteration can be stopped after \sim -\log(\epsilon)\, N steps. In high dimensions d \gg 1, the total number of tensor quadrature points M = (n+1)^d grows exceptionally fast; it is much bigger than the cardinality of the polynomial space.

• A good stopping criterion: \gamma = 10, e^{-\gamma} \approx 4.5 \times 10^{-5} (see the numerical check below)
• Complexity:
  • Memory storage: O(d \times N) real numbers
  • Each iteration costs O(d \times N) flops
  • At high dimensions, N \sim d^n \gg d; also assume K = \gamma N
    • Storage: O(N) real numbers, and O(KN) = O(N^2) flops in total
  • The tensor structure of the points is the key
• For reference: solving the regression by least squares with J > N points requires
  • Storage: O(J \times N) = O(N^2) real numbers
  • Flops: O(J \times N^2) = O(N^3)
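A quick numeric check (not from the slides) of the stopping criterion: with K = gamma * N iterations, the factor (1 - 1/N)^K multiplying the iteration-error term is very close to e^{-gamma}.

```python
# Verify (1 - 1/N)^(gamma*N) ~ exp(-gamma) for a representative N.
import math

N = 19448                        # e.g., cardinality of the degree-7 space at d = 10
for gamma in (5, 10):
    K = gamma * N
    print(gamma, (1 - 1/N)**K, math.exp(-gamma))
# gamma = 5 gives ~6.7e-3 and gamma = 10 gives ~4.5e-5, matching the slide's numbers.
```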
Unbounded Domain with Hermite Polynomials (d = 2)
• [Figure: approximation errors versus number of iterations (0 to 20K) for the test functions f1, f2, f3, f4 at several polynomial degrees (n = 5 to 40), together with the theoretical rates]
Verification: Legendre Approximation in d=10
• Target function: GAUSSIAN f1
• [Fig. 4.4: coefficient errors (left) and function approximation errors (right) versus number of iterations, by Legendre polynomials of different degrees n at d = 10. The target function is the GAUSSIAN function f1 in (4.1) with c_i = 1 and w_i = 0.375.]

From the paper: The exponential convergence of the errors with respect to the iteration count can be clearly observed. For the polynomial degrees n = 5, 6, 7, 8, 9, the cardinality of the polynomial space is N = 3,003, 8,008, 19,448, 43,758 and 92,378, respectively. We observe from Fig. 4.4 that the \sim 10N iteration count is indeed a good (and quite conservative) criterion for a converged solution.

• Estimates of the theoretical convergence: for this separable function, all the terms in the theoretical convergence formula (3.11) in Theorem 3.1 can be accurately computed. We thus obtain
  n = 7: \quad E \| \tilde{f}^{[k]} - f \|_{L^2(\omega)}^2 \simeq 6.1279 \times 10^{-5} + 172.5288 \times \left( 1 - \frac{1}{19448} \right)^k,
  n = 8: \quad E \| \tilde{f}^{[k]} - f \|_{L^2(\omega)}^2 \simeq 2.8532 \times 10^{-6} + 172.5289 \times \left( 1 - \frac{1}{43758} \right)^k.
• [Figure: numerical convergence over a single simulation plotted against these "theoretical curves" for n = 7 (left) and n = 8 (right); the two agree so well that the difference is indistinguishable.]

From the paper: For higher dimensions, we present the results of d = 40 in Fig. 4.6. As the complexity of the problem grows exponentially in higher dimensions, we confine ourselves to polynomial degrees n <= 4. Again, the "theoretical convergence curves" agree very well with the actual numerical convergence. We now examine the convergence rate with different sampling probabilities. As Theorem 3.2 in Section 3.2 indicates, the randomized tensor quadrature method converges with any proper discrete sampling probability, as in the general algorithm (3.14); however, the optimal sampling probability (3.6) in the main algorithm (3.7) gives the optimal rate of convergence. This can be clearly seen in Figs. 4.7 and 4.8, which compare the numerical convergence under the optimal sampling probability (3.6) and under the discrete uniform probability.

[Also visible in the screenshot: the caption of Fig. 4.3 of the paper, on function approximation errors versus number of iterations by trigonometric polynomials for four of the test functions in (4.1) and (4.2) at d = 2.]
From the paper (Section 3.2, general algorithm): Instead of using the discrete probability p_j^* in (3.6), the randomized iteration (3.7) in our main algorithm can be carried out with any discrete probability. This results in a more general algorithm. Let
  d\mu_p := \sum_{j=1}^{M} p_j \, \delta(x - z^{(j)}) \, dx, \qquad (3.13)
be a general discrete probability measure, where p_j is any discrete probability mass satisfying \sum_{j=1}^{M} p_j = 1. Then the same iteration (3.7) can be adopted with the measure d\mu_p, i.e.,
  c^{[k+1]} = c^{[k]} + \frac{f(z^{(j[k])}) - \langle c^{[k]}, \Phi(z^{(j[k])}) \rangle}{\| \Phi(z^{(j[k])}) \|_2^2} \, \Phi(z^{(j[k])}), \quad z^{(j[k])} \sim d\mu_p. \qquad (3.14)
• Theorem (3.2, statement skipped): the general algorithm converges for any proper discrete sampling probability; the optimal probability (3.6) gives the optimal rate.
High Dimension: d = 500
• Target function (from (4.2) of the paper): f5(x) = \cos(\| x - w \|_2) with the shift parameters w_i = 0
• Polynomial degree: n = 2; N = 125,751
• Total number of tensor points: M = 3^{500} \approx 3.6 \times 10^{238}
• Legendre polynomials in the bounded domain [-1,1]^{500}; Hermite polynomials in the unbounded domain \mathbb{R}^{500}
• The method converges after \sim 1.2 \times 10^6 steps, using only a tiny portion of the full tensor points
• [Fig. 4.13: function approximation errors versus number of iterations for n = 2 at d = 500. Left: Legendre polynomials in [-1,1]^{500}; right: Hermite polynomials in \mathbb{R}^{500}. The target function is f5 in (4.2) with w_i = 0.]

From the paper: The cardinality of the polynomial space is N = 125,751. The left panel of Fig. 4.13 shows the results in the bounded domain by Legendre polynomials, and the right panel those in the unbounded domain by Hermite polynomials. Again, we observe the expected exponential convergence and its agreement with the theoretical convergence. Note that in this case the tensor quadrature grid \Theta_M consists of M = 3^{500} \approx 3.6 \times 10^{238} points, a number too large to be handled by most computers. However, the randomized tensor quadrature method converges after \sim 1.2 \times 10^6 steps, following the \sim 10N convergence rule and using only a tiny portion, \sim 1/10^{232}, of the tensor quadrature points. In addition to f1 (GAUSSIAN) through f4 (PRODUCT PEAK), where c = [c_1, \ldots, c_d] are parameters controlling the difficulty of the functions and w = [w_1, \ldots, w_d] are shifting parameters, the paper also considers the two functions in (4.2):
  f5(x) = \cos( \| x - w \|_2 ), \qquad f6(x) = \frac{9}{5 - 4 \cos\left( \sum_{i=1}^{d} x_i \right)}.
Convergence with QMC Grids
• [Figure, d = 2, n = 10: errors versus number of iterations (0 to 10K) for points from a Halton sequence, a Sobol sequence, 2D Fibonacci lattice rules, uniform samples, Chebyshev samples, and optimal samples]
• [Figure, d = 10, n = 8: errors versus number of iterations (0 to 3M) for points from a Halton sequence, a Sobol sequence, uniform samples, Chebyshev samples, and optimal samples]
Summary
• Sequential approximation method
• Motivated by randomized Kaczmarz method
• Optimal sampling measure
• Highly efficient algorithm using tensorized quadrature
• Nearest neighbor replacement algorithm for arbitrary data sets
• Reference:
• Shin and Xiu, A randomized method for multivariate
function approximation, SIAM J. Sci. Comput.
• Wu, Shin and Xiu, Randomized tensor quadrature method for
high dimensional polynomial approximation, SIAM J. Sci. Comput.
• Wu and Xiu, Sequential function approximation on arbitrarily
distributed point sets, J. Comput. Phys.