Sequential Function Approximation
In High Dimensions for Big Data
Dongbin Xiu
Collaborators: Yeonjong Shin, Kailiang Wu
Department of Mathematics
Ohio State University
Function Approximation: Setup
• Consider an unknown target function f(x), x \in \mathbb{R}^d
• Let f = (f(x_1), \ldots, f(x_m)) be the function samples
• Construct an approximation \tilde{f}(x) \approx f(x)
• Standard approach:
  • Choose a linear space
  • Choose a set of basis functions (for example, polynomials) \phi_i(x), i = 1, \ldots, n
  • Let \tilde{f}(x) = \sum_{j=1}^{n} c_j \phi_j(x)
  • Determine the coefficients c = (c_1, \ldots, c_n)
Example: Polynomial Interpolation
• Model: \tilde{f}(X) = \sum_{i=1}^{M} c_i \phi_i(X) \; (= c_0 + c_1 X + c_2 X^2 + \cdots)
• Interpolation condition: \tilde{f}(x_j) = f(x_j)
• Linear system of equations: A c = f, where A is the M \times N matrix whose columns are the basis functions 1, X, X^2, X^3, \ldots evaluated at the sample points, with M = N
Overdetermined Case
• A c \approx f, where A is M \times N with M > N
• How "big" is big?
  • KB → MB → GB → TB → PB → EB → ZB → YB → ?
  • Big = infinity-B
  • M → infinity
• Always an overdetermined system
• "Easy" case: least squares works mathematically (see the sketch below)
  • Allows larger N for complex models
  • Cost: O(MN^2)
FYI: Peta, Exa, Zetta, Yotta
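To make the cost comparison concrete, here is a minimal least-squares sketch in Python (not from the slides): an overdetermined system with M >> N solved by ordinary least squares. The target function, sizes, and monomial basis are illustrative placeholders.

```python
# A minimal sketch (not from the slides) of the "easy" overdetermined case: ordinary
# least squares with M >> N samples of a 1D target, using a polynomial basis.
import numpy as np

rng = np.random.default_rng(0)
M, N = 100_000, 20                              # many samples, few basis functions
x = rng.uniform(-1.0, 1.0, size=M)
f = np.exp(x) + 0.01 * rng.standard_normal(M)   # noisy samples of an example target

A = np.vander(x, N, increasing=True)            # columns 1, x, x^2, ..., x^(N-1): M x N
c, *_ = np.linalg.lstsq(A, f, rcond=None)       # min ||A c - f||_2, cost ~ O(M N^2)

x_test = np.linspace(-1, 1, 5)
print(np.vander(x_test, N, increasing=True) @ c - np.exp(x_test))  # small errors
```

The dense solve stores the full M x N matrix and scales like O(MN^2), which is what motivates the matrix-free sequential alternative on the following slides.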
Sequential Approximation: Framework
• Sequential: Constructing approximation using one data point at a time
• Data: f_k = f(x_k) + \epsilon_k, \; k = 1, \ldots, with
  E(\epsilon_k) = 0, \quad E(\epsilon_k^2) = \sigma^2(x_k) \le \sigma^2_{\max} < \infty
• Starting with an arbitrary initial approximation at k = 0
• At the k-th data point, we seek to "minimally interpolate the current data":
  \tilde{f}^{(k)}(x) = \arg\min_{p \in V} \left( \| p(x) - \tilde{f}^{(k-1)}(x) \|^2_{L^2_\omega} \right), \quad \text{subject to } p(x_k) = f_k
• Or, in general, for noisy data:
  \tilde{f}^{(k)}(x) = \arg\min_{p \in V} \left( \| p(x) - \tilde{f}^{(k-1)}(x) \|^2_{L^2_\omega} + \frac{1}{\gamma_k} |p(x_k) - f_k|^2 \right), \quad \gamma_k \sim O(\sigma)
Sequential Approximation: Algorithm
• Starting with an initial coefficient vector c^{(0)}
• Draw i.i.d. random samples from D with probability measure \nu: \{x_k\}_{k=1,\ldots} \sim \nu, and collect data \{f(x_k)\}_{k=1,\ldots}
• Assuming an orthonormal basis: \tilde{f}(x) = \sum_{j=1}^{N} c_j \phi_j(x) = \langle \Phi(x), c \rangle
• Algorithm: compute (see the sketch below)
  c^{(k)} = c^{(k-1)} + \frac{f(x_k) - \langle c^{(k-1)}, \Phi(x_k) \rangle}{\| \Phi(x_k) \|_2^2 + \gamma_k} \, \Phi(x_k), \quad k = 1, \ldots
• Remarks:
  • Only vector operations; no matrices needed
  • Can handle streaming data
  • Motivation: randomized Kaczmarz for Ax = b (Strohmer & Vershynin, 2009)
  • Question: Convergence? If so, accuracy?
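A minimal sketch (not the authors' code) of this update in d = 1, assuming the orthonormal Legendre basis on [-1,1] and uniform sampling as an illustrative choice of basis and measure; gamma_k = 0 corresponds to noiseless data.

```python
# Sequential approximation update, one data point at a time (illustrative 1D sketch).
import numpy as np
from numpy.polynomial import legendre

def basis(x, N):
    """Orthonormal Legendre basis w.r.t. the uniform probability measure on [-1,1]."""
    return np.array([np.sqrt(2*j + 1) * legendre.legval(x, [0]*j + [1]) for j in range(N)])

def sa_update(c, x_k, f_k, gamma_k=0.0):
    """c^(k) = c^(k-1) + (f_k - <c, Phi(x_k)>) / (||Phi(x_k)||^2 + gamma_k) * Phi(x_k)."""
    phi = basis(x_k, len(c))
    return c + (f_k - c @ phi) / (phi @ phi + gamma_k) * phi

# Usage: approximate f(x) = exp(x) with N = 8 basis functions, one sample at a time.
rng = np.random.default_rng(0)
c = np.zeros(8)
for _ in range(20_000):
    x_k = rng.uniform(-1.0, 1.0)
    c = sa_update(c, x_k, np.exp(x_k))
print(c[:4])   # leading coefficients of the approximation <Phi(x), c>
```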
Convergence
• Theorem: upper and lower bounds for a general sampling measure (skipped)
• Remarks:
  • The rate is optimal; it cannot be improved
  • This is not an error bound; it is an error equality
  • Sampling can be done by, e.g., MCMC (a sketch follows below)
• Theorem (limiting case \gamma_k = 0): Let the sampling probability measure be
  d\nu(x) = \left( \frac{1}{N} \sum_{j=1}^{N} \phi_j^2(x) \right) d\omega(x).
  Then the following holds:
  E \| c^{(k)} - \hat{c} \|_2^2 = \| f - Pf \|_\omega^2 + r^k \left( \| \hat{c} \|_2^2 - \| f - Pf \|_\omega^2 \right), \quad r = 1 - \frac{1}{N},
  E \| f - \tilde{f}^{(k)} \|_\omega^2 = 2 \| f - Pf \|_\omega^2 + r^k \left( \| Pf \|_\omega^2 - \| f - Pf \|_\omega^2 \right), \quad r = 1 - \frac{1}{N}.
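As one way to realize the MCMC remark above, here is a minimal sketch (not the authors' code) of random-walk Metropolis sampling of the optimal measure d\nu in d = 1, assuming the orthonormal Legendre basis on [-1,1] with \omega the uniform probability measure; step size and chain length are illustrative choices.

```python
# Metropolis-Hastings sampling of the optimal measure dnu ~ (1/N) sum_j phi_j(x)^2 domega.
import numpy as np
from numpy.polynomial import legendre

def optimal_density(x, n):
    """Unnormalized density (1/N) * sum_{j=0}^{n} phi_j(x)^2, phi_j = sqrt(2j+1) P_j."""
    vals = np.array([np.sqrt(2*j + 1) * legendre.legval(x, [0]*j + [1])
                     for j in range(n + 1)])
    return np.sum(vals**2) / (n + 1)

def sample_optimal_measure(n, num_samples, seed=0):
    """Random-walk Metropolis on [-1, 1] targeting the optimal density."""
    rng = np.random.default_rng(seed)
    x, samples = 0.0, []
    for _ in range(num_samples):
        prop = x + 0.3 * rng.standard_normal()
        if -1.0 <= prop <= 1.0:                              # zero density outside the domain
            if rng.random() < optimal_density(prop, n) / optimal_density(x, n):
                x = prop
        samples.append(x)
    return np.array(samples)

xs = sample_optimal_measure(n=10, num_samples=5000)
print(xs.mean(), xs.std())   # samples concentrate toward the endpoints, as expected
```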
Test Functions
• "Standard" functions from Genz (84):
  GAUSSIAN:      f = \exp\left[ -\sum_{i=1}^{d} c_i^2 \left( \frac{x_i + 1}{2} - \alpha_i \right)^2 \right]
  CONTINUOUS:    f = \exp\left[ -\sum_{i=1}^{d} c_i \left| \frac{x_i + 1}{2} - \alpha_i \right| \right]
  CORNER PEAK:   f = \left[ 1 + \sum_{i=1}^{d} \frac{x_i + 1}{2 i^2} \right]^{-(d+1)}
  PRODUCT PEAK:  f = \prod_{i=1}^{d} \left[ c_i^{-2} + \left( \frac{x_i + 1}{2} - \alpha_i \right)^2 \right]^{-1}
• In two dimensions (d = 2): Franke function
Numerical Tests
• Target function: GAUSSIAN, d = 10
• [Figure: approximation errors at degree n versus number of iterations (0 to 3,500K) for polynomial degrees n = 5, 6, 7, 8, 9; the errors decay from 10^0 to below 10^{-5}]
Converged Solutions
• Target function: GAUSSIAN, d = 10
• Complexity: N = p, M = m
• [Figure: converged approximation errors (between 10^{-6} and 10^0) versus polynomial degree n = 1, ..., 10 at d = 10]
• [Cropped fragment of Table 4.1 of the paper: numerical rate of convergence s_num (4.3) against the optimal rate s_opt (4.4) in two dimensions (d = 2) for the test functions (4.1) at different polynomial degrees]

From the paper: The errors saturate at lower levels for higher-degree polynomials, as expected. The numerical errors of the converged results are plotted in Fig. 4.4 with respect to the polynomial degree. One clearly observes the exponential convergence of the approximation error with increasing polynomial degree, which is expected for this smooth target function. The number of iterations m needed to reach the converged solutions is tabulated in Table 4.2; for reference, we also tabulate the cardinality of the polynomial space, p, at each degree. In this case, the number of iterations m is also the number of function data. These results indicate that the proposed RK method is indeed suitable for "big data" problems, where a large amount of data can be collected. At high degree n >= 9, the method never requires the formation of the model matrix, which would be of size O(10^6 × 10^5) and cannot be easily handled; instead it requires only operations on row vectors of size O(10^5 × 1). All computations were performed on a standard desktop computer with an Intel i7-4770 CPU at 3.40 GHz and 24.0 GB RAM.

Table 4.2: The number of iterations (m) used to reach the converged solutions in Fig. 4.3, along with the cardinality of the polynomial space (p), at dimension d = 10. The target function is the GAUSSIAN.

d = 10   n = 1     n = 2     n = 3       n = 4       n = 5
p        11        66        286         1,001       3,003
m        10,000    10,000    10,000      26,000      80,000

d = 10   n = 6     n = 7     n = 8       n = 9       n = 10
p        8,008     19,448    43,758      92,378      184,756
m        220,000   500,000   1,700,000   3,000,000   7,000,000
High Dimensions
• Legendre approximation at d = 20 and d = 40
• Complexity: N = p, M = m
• [Fig. 4.5: error convergence with respect to the iteration count for the GAUSSIAN function f1 with c_i = 1 and w_i = 0.5, by Legendre polynomials. Left: d = 20 (n = 3, 4, 5); right: d = 40 (n = 2, 3, 4). All simulations were conducted on the same desktop computer.]

Table 4.4: The number of iterations (m) used to reach the converged solutions in Fig. 4.5, along with the cardinality of the polynomial space (p).

d = 20   n = 1     n = 2      n = 3       n = 4        n = 5
p        21        231        1,771       10,626       53,130
m        10,000    50,000     200,000     900,000      4,000,000

d = 40   n = 1     n = 2      n = 3       n = 4        n = 5
p        41        861        12,341      135,751      1,221,759
m        1,000     100,000    1,500,000   20,700,000   —
Irregular Data Set
• [Figure: an irregularly distributed point set \Theta = \{x_j\}_{j=1}^{M} in [-1,1]^2]
• Fact: In practice, data sets rarely follow the optimal measure
Nearest Neighbor Replacement
Given the data set \Theta = \{x_j\}_{j=1}^{M}, at each step k = 1, 2, \ldots:
• Draw a sample \xi_k \sim \mu
• Find the nearest neighbor to \xi_k inside \Theta:
  \| x_{i_k} - \xi_k \|_2 \le \| x_j - \xi_k \|_2, \quad \forall x_j \in \Theta
• Conduct the SA update at the selected point x_{i_k}
Properties (see the sketch below):
• Still only vector operations
• Additional cost of finding the nearest neighbor: 2-norms, sorting
• The size of the data set M can vary
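A minimal sketch (not the authors' code) of one NNR step on a fixed data set Theta; the candidate draw xi_k is taken uniformly on [-1,1]^d for illustration (any measure mu could be used), and `basis` is a user-supplied placeholder returning the N orthonormal basis values at a point.

```python
# One nearest-neighbor-replacement (NNR) step of the sequential approximation.
import numpy as np

def nnr_step(c, Theta, f_data, basis, rng):
    """SA update at the data point nearest to a fresh draw from mu (noiseless case)."""
    d = Theta.shape[1]
    xi = rng.uniform(-1.0, 1.0, size=d)                   # xi_k ~ mu (illustrative choice)
    i_k = np.argmin(np.linalg.norm(Theta - xi, axis=1))   # nearest neighbor inside Theta
    phi = basis(Theta[i_k])                                # Phi(x_{i_k}), shape (N,)
    residual = f_data[i_k] - c @ phi
    return c + residual / (phi @ phi) * phi                # gamma_k = 0 update
```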
NNR: Theoretical Justification
• The point set defines a Voronoi tessellation:
  D_i = \{ x \in D : \| x - x_i \|_2 < \| x - x_j \|_2 \text{ for any } x_j \ne x_i \}, \quad \bar{D} = \bigcup_{i=1}^{M} \bar{D}_i
• The NNR creates a discrete measure to sample the points: \xi_k \sim \mu \;\rightarrow\; x_{i_k} \sim \hat{\mu}
• The discrete sampling measure is a weak approximation of \mu:
  \left| \int_D g(x)\, d\hat{\mu} - \int_D g(x)\, d\mu \right| \le \left( \sup_{x \in D} \| \nabla g(x) \|_2 \right) \delta, \quad \delta = \max_{1 \le i \le M} \mathrm{diam}(D_i)
• Main result: If \mu is the optimal sampling measure, then the NNR on \Theta converges:
  \lim_{k \to \infty} E\left( \| p^{(k)} - f \|^2_{L^2_\omega} \right) = 2 \| f - P_V f \|^2_{L^2_\omega} + O(\delta)
Numerical Tests
• "Standard" functions from Genz (84): the GAUSSIAN, CONTINUOUS, CORNER PEAK, and PRODUCT PEAK functions listed above
• Sampling methods:
  • Uniform probability (SA: uniform)
  • Optimal measure + NNR (SA: NNR)
  • Randomized Kaczmarz (RK)
    • For linear systems of equations Ax = b (Strohmer and Vershynin, 2009)
    • Randomly sample the rows with probability \propto \| \Phi(x_i) \|^2 (see the sketch below)
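For reference, a minimal sketch (not the authors' code) of the randomized Kaczmarz iteration of Strohmer and Vershynin (2009) for a consistent system Ax = b, with rows sampled proportionally to their squared Euclidean norms; the test system is an illustrative placeholder.

```python
# Randomized Kaczmarz for Ax = b with row sampling proportional to ||a_i||_2^2.
import numpy as np

def randomized_kaczmarz(A, b, num_iters, rng=np.random.default_rng(0)):
    m, n = A.shape
    row_norms2 = np.sum(A**2, axis=1)
    probs = row_norms2 / row_norms2.sum()          # p_i proportional to ||a_i||_2^2
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.choice(m, p=probs)                 # pick a row at random
        x = x + (b[i] - A[i] @ x) / row_norms2[i] * A[i]   # project onto {x : a_i.x = b_i}
    return x

# Usage: recover the solution of a small consistent system.
A = np.random.default_rng(1).standard_normal((200, 10))
x_true = np.arange(10.0)
x_hat = randomized_kaczmarz(A, A @ x_true, num_iters=5000)
print(np.linalg.norm(x_hat - x_true))   # small error
```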
• [Figure: four irregularly distributed point sets \Theta_1, \Theta_2, \Theta_3, \Theta_4 in [-1,1]^2, each with M = 20,000 points]
Error Convergence with n = 10 polynomials
• [Figure: errors versus number of iterations (0 to 20K) for RK, SA: Uniform, and SA: NNR; four panels: f1 on \Theta_1, f2 on \Theta_2, f3 on \Theta_3, f4 on \Theta_4]
High dimensional (d = 10) Dynamic Data Set
• Domain: D = [-1,1]^d; D_1 = \{ |x| < 0.5 \} containing 99% of the points; D_2 = D \setminus D_1
• Dynamic set: initial M = 10^6; every 100 steps, new points are added, with the number of additions Poisson distributed with \lambda = 2,000
• [Figure: errors versus number of iterations (0 to 500K) for RK, SA: Uniform, and SA: NNR]
Back to Regular Grids
• Cardinality of the polynomial space: N = \binom{n+d}{d} = \frac{(n+d)!}{n!\, d!}
• Question: What are the (almost) best points for polynomial approximation in d = 1?
• Answer: Gauss quadrature points
• Most straightforward extension for d > 1: tensorized Gauss points
  • Advantage: retains all the good mathematical properties from d = 1
  • Disadvantage: fastest growth of the number of points. With m points in each dimension, the total number of points is M = m^d
  • Worst choice for d >> 1
• Our approach for d >> 1 (a quick size comparison follows below):
  • Use the least likely choice: tensorized quadrature points
  • Apply the sequential approximation algorithm
  • And let's see what happens ...
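A quick numeric comparison (not from the slides) of the total-degree cardinality N = C(n+d, d) against the tensor-grid size M = (n+1)^d, for a few (d, n) pairs that appear later in the talk.

```python
# How the polynomial-space dimension N compares with the full tensor grid size M.
from math import comb

for d, n in [(10, 5), (40, 4), (500, 2)]:
    N = comb(n + d, d)      # dimension of the total-degree polynomial space
    M = (n + 1) ** d        # number of tensorized Gauss points, n+1 per dimension
    print(f"d={d:>3}, n={n}:  N = {N:,},  M = (n+1)^d = {M:.3e}")
```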
Main Algorithm
• Starting with an initial coefficient vector c^{[0]}
• Randomly draw a point from the M tensor Gauss points, with the probability p_j^* defined in (3.6) of the paper excerpt below

From the paper:
  \Phi_n^d := \mathrm{span}\{ x^{\mathbf{k}} = x_1^{k_1} \cdots x_d^{k_d}, \; |\mathbf{k}|_\infty \le n \}
is the tensor product of the one-dimensional polynomial space \Pi_n^1. Since \Pi_n^d \subseteq \Phi_n^d for any d \ge 1 and n \ge 0, the 2n polynomial exactness (3.5) holds for all f \in \Pi_{2n}^d. Other one-dimensional choices such as the Gauss-Lobatto rule and the Gauss-Radau rule can certainly be used. The number of points m in each dimension may be different.

3.1.2. A randomized Kaczmarz iteration. With the tensor quadrature points defined, we proceed to apply the following randomized Kaczmarz iteration. For any tensor quadrature point z^{(j)} \in \Theta_M, define the sampling probability
  p_j^* = \frac{w^{(j)} \| \Phi(z^{(j)}) \|_2^2}{N}, \quad j = 1, \ldots, M, \qquad (3.6)
which satisfies \sum_{j=1}^{M} p_j^* = 1 by the 2n polynomial exactness of the tensor quadrature. Using the vector notation (2.7) and setting an initial choice of c^{[0]} = 0, one then computes, for k = 0, 1, \ldots,
  c^{[k+1]} = c^{[k]} + \frac{f(z^{(j[k])}) - \langle c^{[k]}, \Phi(z^{(j[k])}) \rangle}{\| \Phi(z^{(j[k])}) \|_2^2} \, \Phi(z^{(j[k])}), \quad z^{(j[k])} \sim d\mu_{p^*}, \qquad (3.7)
where d\mu_{p^*} denotes the probability measure corresponding to (3.6), i.e.
  d\mu_{p^*} := \sum_{j=1}^{M} p_j^* \, \delta(x - z^{(j)}) \, dx, \qquad (3.8)
and \delta(x) is the Dirac delta function.
The implementation of the algorithm is remarkably simple: one randomly draws a point from the tensor quadrature set \Theta_M using the discrete probability (3.6) and applies the update (3.7).
• Compute the randomized Kaczmarz update (3.7) at the drawn point
• The sampling discrete probability is not a common one
• But it can be sampled efficiently using the tensor structure (a sketch of one iteration follows below)
• Consider the M = (n+1)^d tensor-product grid formed by the (n+1)-point Gauss-Legendre quadrature in each dimension
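A minimal sketch (not the authors' code) of one randomized iteration on the tensor Gauss-Legendre grid. The full grid of M = (n+1)^d points is never formed; a point is assembled coordinate by coordinate. For simplicity each coordinate index is drawn from the 1D quadrature weights, an illustrative stand-in for the exact optimal probability (3.6), and `basis` is a user-supplied placeholder returning the N orthonormal basis values at a point.

```python
# One Kaczmarz-type update at a randomly drawn tensor Gauss-Legendre point.
import numpy as np

def tensor_rk_step(c, n, d, f, basis, rng):
    """Assemble a random tensor quadrature point coordinate-wise and update c."""
    nodes, weights = np.polynomial.legendre.leggauss(n + 1)   # 1D rule with n+1 points
    probs = weights / weights.sum()
    idx = rng.choice(n + 1, size=d, p=probs)   # one index per dimension
    z = nodes[idx]                             # the selected tensor quadrature point
    phi = basis(z)                             # Phi(z), shape (N,)
    return c + (f(z) - c @ phi) / (phi @ phi) * phi
```

Only the d-dimensional point and the length-N basis vector are ever stored, which is what keeps the per-iteration cost independent of M.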
Error Analysis
• Discrete inner product induced by the tensor quadrature:
  [f, g]_w := \sum_{j=1}^{M} w^{(j)} f(z^{(j)}) g(z^{(j)}),
  for functions f and g; the corresponding induced discrete norm is denoted by \| \cdot \|_w.

Theorem 3.1. Assume c^{[0]} = 0. The k-th iterative solution of the algorithm (3.7) satisfies
  E \| c^{[k]} - \hat{c} \|_2^2 = \| f - P_\Pi f \|_w^2 + E + r^k \left( \| P_\Pi f \|_w^2 - \| f - P_\Pi f \|_w^2 - E \right), \qquad (3.9)
where
  E = 2 [f - P_\Pi f, P_\Pi f]_w - \langle \hat{c}, e \rangle, \quad r = 1 - 1/N,
and e = \tilde{c} - \hat{c} with \tilde{c}_j := [f, \phi_j]_w. And,
  \lim_{k \to \infty} E \| c^{[k]} - \hat{c} \|_2^2 = \| f - P_\Pi f \|_w^2 + E = \| f \|_w^2 - \| \tilde{c} \|_2^2 + \| e \|_2^2. \qquad (3.10)
Furthermore, the resulting approximation \tilde{f}^{[k]} = \langle c^{[k]}, \Phi(x) \rangle satisfies
  E \| \tilde{f}^{[k]} - f \|_{L^2_\omega}^2 = \| f - P_\Pi f \|_{L^2_\omega}^2 + \| f - P_\Pi f \|_w^2 + E + r^k \left( \| P_\Pi f \|_w^2 - \| f - P_\Pi f \|_w^2 - E \right). \qquad (3.11)
And,
  \lim_{k \to \infty} E \| \tilde{f}^{[k]} - f \|_{L^2_\omega}^2 = \| f - P_\Pi f \|_{L^2_\omega}^2 + \| f - P_\Pi f \|_w^2 + E. \qquad (3.12)
The proof is in the next section.

Theorem 3.1, as an equality, gives an estimate of the numerical errors of the proposed algorithm (3.7). The expectation E in (3.9) shall be understood as the expectation over the random sequence \{ z^{(j[\ell])} \}_{0 \le \ell \le k-1} of the algorithm. The convergence rate in expectation is 1 - 1/N, which is the optimal convergence rate for randomized Kaczmarz methods.

• Error equality
• The rate cannot be improved
Computational Complexity
• Convergence criterion: let the total number of iterations be K = \gamma N; then the iteration error decays by roughly e^{-\gamma}

From the paper (Section 3.4.2, convergence criterion): The convergence rate (1 - 1/N) in Theorem 3.1 is in the form of an equality. This provides a very sharp convergence criterion. Let K be the total number of iterations of the main algorithm (3.7), and let K = \gamma N, where \gamma is a constant independent of N. Then we derive
  \left( 1 - \frac{1}{N} \right)^K = \exp\left( \gamma N \ln\left( 1 - \frac{1}{N} \right) \right) = \exp\left( -\gamma \sum_{i=1}^{\infty} \frac{1}{i} \frac{1}{N^{i-1}} \right) = \exp\left( -\gamma + O\left( \frac{1}{N} \right) \right) \approx e^{-\gamma}, \quad \text{if } N \gg 1.
According to Theorem 3.1, this implies that the square of the iteration error, the last term in (3.11), becomes roughly e^{-\gamma} times smaller. For example, when \gamma = 5, e^{-\gamma} \approx 6.7 \times 10^{-3}; when \gamma = 10, e^{-\gamma} \approx 4.5 \times 10^{-5}. In most problems \gamma = 10 is sufficient. Our extensive numerical tests verify that K \sim 10N is a good criterion for accurate solutions. On the other hand, if one desires the iteration error to be at a certain small level \epsilon \ll 1, the iteration can be stopped after \sim -\log(\epsilon)\, N steps. In high dimensions d \gg 1, the total number of tensor quadrature points M = (n+1)^d grows exceptionally fast; it is much bigger than the cardinality of the polynomial space.

• A good stopping criterion: \gamma = 10, e^{-\gamma} \approx 4.5 \times 10^{-5} (see the numerical check below)
• Complexity:
  • Memory storage: O(d \times N) real numbers
  • Each iteration costs O(d \times N) flops
  • At high dimensions, N \sim d^n \gg d; also assume K = \gamma N
    • Storage: O(N) real numbers, and O(KN) = O(N^2) flops in total
  • The tensor structure of the points is the key
• For reference: solving the regression by least squares with J > N points requires
  • Storage: O(J \times N) = O(N^2) real numbers
  • Flops: O(J \times N^2) = O(N^3)
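A quick numeric check (not from the slides) of the stopping criterion: with K = gamma * N iterations, the factor (1 - 1/N)^K multiplying the iteration-error term is very close to e^{-gamma}.

```python
# Verify (1 - 1/N)^(gamma*N) ~ exp(-gamma) for a representative N.
import math

N = 19448                        # e.g., cardinality of the degree-7 space at d = 10
for gamma in (5, 10):
    K = gamma * N
    print(gamma, (1 - 1/N)**K, math.exp(-gamma))
# gamma = 5 gives ~6.7e-3 and gamma = 10 gives ~4.5e-5, matching the slide's numbers.
```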
Unbounded Domain with Hermite Polynomials (d = 2)
• [Figure: approximation errors versus number of iterations (0 to 20K) for the test functions f1, f2, f3, f4 at several polynomial degrees (n = 5 to 40), together with the theoretical rates]
Verification: Legendre Approximation in d=10
• Target function: GAUSSIAN f1
• [Fig. 4.4: coefficient errors (left) and function approximation errors (right) versus number of iterations, by Legendre polynomials of different degrees n at d = 10. The target function is the GAUSSIAN function f1 in (4.1) with c_i = 1 and w_i = 0.375.]

From the paper: The exponential convergence of the errors with respect to the iteration count can be clearly observed. For the polynomial degrees n = 5, 6, 7, 8, 9, the cardinality of the polynomial space is N = 3,003, 8,008, 19,448, 43,758 and 92,378, respectively. We observe from Fig. 4.4 that the \sim 10N iteration count is indeed a good (and quite conservative) criterion for a converged solution.

• Estimates of the theoretical convergence: for this separable function, all the terms in the theoretical convergence formula (3.11) in Theorem 3.1 can be accurately computed. We thus obtain
  n = 7: \quad E \| \tilde{f}^{[k]} - f \|_{L^2(\omega)}^2 \simeq 6.1279 \times 10^{-5} + 172.5288 \times \left( 1 - \frac{1}{19448} \right)^k,
  n = 8: \quad E \| \tilde{f}^{[k]} - f \|_{L^2(\omega)}^2 \simeq 2.8532 \times 10^{-6} + 172.5289 \times \left( 1 - \frac{1}{43758} \right)^k.
• [Figure: numerical convergence over a single simulation plotted against these "theoretical curves" for n = 7 (left) and n = 8 (right); the two agree so well that the difference is indistinguishable.]

From the paper: For higher dimensions, we present the results of d = 40 in Fig. 4.6. As the complexity of the problem grows exponentially in higher dimensions, we confine ourselves to polynomial degrees n <= 4. Again, the "theoretical convergence curves" agree very well with the actual numerical convergence. We now examine the convergence rate with different sampling probabilities. As Theorem 3.2 in Section 3.2 indicates, the randomized tensor quadrature method converges with any proper discrete sampling probability, as in the general algorithm (3.14); however, the optimal sampling probability (3.6) in the main algorithm (3.7) gives the optimal rate of convergence. This can be clearly seen in Figs. 4.7 and 4.8, which compare the numerical convergence under the optimal sampling probability (3.6) and under the discrete uniform probability.

[Also visible in the screenshot: the caption of Fig. 4.3 of the paper, on function approximation errors versus number of iterations by trigonometric polynomials for four of the test functions in (4.1) and (4.2) at d = 2.]
From the paper (Section 3.2, general algorithm): Instead of using the discrete probability p_j^* in (3.6), the randomized iteration (3.7) in our main algorithm can be carried out with any discrete probability. This results in a more general algorithm. Let
  d\mu_p := \sum_{j=1}^{M} p_j \, \delta(x - z^{(j)}) \, dx, \qquad (3.13)
be a general discrete probability measure, where p_j is any discrete probability mass satisfying \sum_{j=1}^{M} p_j = 1. Then the same iteration (3.7) can be adopted with the measure d\mu_p, i.e.,
  c^{[k+1]} = c^{[k]} + \frac{f(z^{(j[k])}) - \langle c^{[k]}, \Phi(z^{(j[k])}) \rangle}{\| \Phi(z^{(j[k])}) \|_2^2} \, \Phi(z^{(j[k])}), \quad z^{(j[k])} \sim d\mu_p. \qquad (3.14)
• Theorem (3.2, statement skipped): the general algorithm converges for any proper discrete sampling probability; the optimal probability (3.6) gives the optimal rate.
High Dimension: d = 500
• Target function (from (4.2) of the paper): f5(x) = \cos(\| x - w \|_2) with the shift parameters w_i = 0
• Polynomial degree: n = 2; N = 125,751
• Total number of tensor points: M = 3^{500} \approx 3.6 \times 10^{238}
• Legendre polynomials in the bounded domain [-1,1]^{500}; Hermite polynomials in the unbounded domain \mathbb{R}^{500}
• The method converges after \sim 1.2 \times 10^6 steps, using only a tiny portion of the full tensor points
• [Fig. 4.13: function approximation errors versus number of iterations for n = 2 at d = 500. Left: Legendre polynomials in [-1,1]^{500}; right: Hermite polynomials in \mathbb{R}^{500}. The target function is f5 in (4.2) with w_i = 0.]

From the paper: The cardinality of the polynomial space is N = 125,751. The left panel of Fig. 4.13 shows the results in the bounded domain by Legendre polynomials, and the right panel those in the unbounded domain by Hermite polynomials. Again, we observe the expected exponential convergence and its agreement with the theoretical convergence. Note that in this case the tensor quadrature grid \Theta_M consists of M = 3^{500} \approx 3.6 \times 10^{238} points, a number too large to be handled by most computers. However, the randomized tensor quadrature method converges after \sim 1.2 \times 10^6 steps, following the \sim 10N convergence rule and using only a tiny portion, \sim 1/10^{232}, of the tensor quadrature points. In addition to f1 (GAUSSIAN) through f4 (PRODUCT PEAK), where c = [c_1, \ldots, c_d] are parameters controlling the difficulty of the functions and w = [w_1, \ldots, w_d] are shifting parameters, the paper also considers the two functions in (4.2):
  f5(x) = \cos( \| x - w \|_2 ), \qquad f6(x) = \frac{9}{5 - 4 \cos\left( \sum_{i=1}^{d} x_i \right)}.
Convergence with QMC Grids
• [Figure, d = 2, n = 10: errors versus number of iterations (0 to 10K) for points from a Halton sequence, a Sobol sequence, 2D Fibonacci lattice rules, uniform samples, Chebyshev samples, and optimal samples]
• [Figure, d = 10, n = 8: errors versus number of iterations (0 to 3M) for points from a Halton sequence, a Sobol sequence, uniform samples, Chebyshev samples, and optimal samples]
Summary
• Sequential approximation method
• Motivated by randomized Kaczmarz method
• Optimal sampling measure
• Highly efficient algorithm using tensorized quadrature
• Nearest neighbor replacement algorithm for arbitrary data sets
• Reference:
• Shin and Xiu, A randomized method for multivariate
function approximation, SIAM J. Sci. Comput.
• Wu, Shin and Xiu, Randomized tensor quadrature method for
high dimensional polynomial approximation, SIAM J. Sci. Comput.
• Wu and Xiu, Sequential function approximation on arbitrarily
distributed point sets, J. Comput. Phys.