Development of Methods for Deep Neural
Network Architectures Optimization based on
Tensor Factorization Algorithms
Supervisor: Revin Ilya Evgenyevich, research associate,
Laboratory for Composite AI, Research Center “Strong AI in Industry”
Presented by: Zakharov Denis, J4232c
Pain: Modern deep networks have an excessive number of parameters, which leads to
long training times and high costs for storing weights in production.
Hypothesis: By applying an optimization method we can reduce the size of the
stored model and increase throughput.
Work: Development of Methods for Deep Neural Network Architectures
Optimization based on Tensor Factorization Algorithms
Problem
2
Purpose and objectives of the study
Goal
Development of Methods for Deep Neural Network Architectures Optimization based on Tensor
Factorization Algorithms
Objectives
• Conduct a literature review of the related field to provide background
• Study tensor algorithms (optimization, operations)
• Study the LoRA approach
• Perform experiments with time series (TS) models
• Based on these findings, propose a product that can be integrated into an AutoML solution
• Develop the optimization method
• Contribute it to the Fedot.Industrial framework
3
Modern Networks
4
Model trend: number of parameters by year (2018–2024)
• BERT 340M
• T5 11B
• GPT-3 175B
• Megatron-Turing 530B
• GPT-4 1.76T
• Gemini Pro ≈30T
• Gemini Ultra ≈60T
Tensor Decomposition
5
[Diagrams of popular tensor decompositions]
• t-SVD (Singular Value Decomposition): X ≈ U ∗ S ∗ Vᵀ, where ∗ is the t-product
• CANDECOMP/PARAFAC (CP): X ≈ Σ_{r=1..R} a_r ∘ b_r ∘ c_r
• Tucker: X ≈ G ×₁ A ×₂ B ×₃ C
• Block Term: X ≈ Σ_{r=1..R} G_r ×₁ A_r ×₂ B_r ×₃ C_r
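A minimal sketch of computing CP and Tucker decompositions with the TensorLy library (assumed available); the tensor shape and ranks are toy values, not those used in this work:

```python
# Minimal sketch: CP and Tucker decompositions via TensorLy (assumed installed).
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac, tucker

X = tl.tensor(np.random.rand(8, 6, 4))   # toy 3-way tensor

# CP: X ≈ sum_r a_r ∘ b_r ∘ c_r
cp_factors = parafac(X, rank=3)
X_cp = tl.cp_to_tensor(cp_factors)

# Tucker: X ≈ G ×1 A ×2 B ×3 C
core, factors = tucker(X, rank=[3, 3, 2])
X_tucker = tl.tucker_to_tensor((core, factors))

print("CP relative error:    ", float(tl.norm(X - X_cp) / tl.norm(X)))
print("Tucker relative error:", float(tl.norm(X - X_tucker) / tl.norm(X)))
```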
Low-rank decomposition
6
M ≈ L_k × R_k, where M is m×n, L_k is m×k, and R_k is k×n
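Such a rank-k factorization can be obtained, for example, with a truncated SVD; a minimal NumPy sketch with illustrative sizes and rank:

```python
# Minimal sketch: rank-k approximation M ≈ L_k @ R_k via truncated SVD.
import numpy as np

m, n, k = 512, 256, 16
M = np.random.randn(m, n)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
L_k = U[:, :k] * S[:k]   # m×k factor (singular values absorbed)
R_k = Vt[:k, :]          # k×n factor

M_k = L_k @ R_k
print("stored parameters:", L_k.size + R_k.size, "vs original:", M.size)
print("relative error:", np.linalg.norm(M - M_k) / np.linalg.norm(M))
```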
Statement of Experiment
7
Evaluate the performance of the time series models NBEATS, Transformer, and ARIMA
on the M4 dataset to check how well they forecast time series from historical data.
Models
8
Transformer — a state-of-the-art deep learning model introduced in 2017.
It is an encoder-decoder architecture whose core feature is the
multi-head attention mechanism.
[Diagrams: Scaled Dot-Product Attention (MatMul → Scale → Mask → SoftMax → MatMul), Multi-Head Attention (Linear ×3 → Scaled Dot-Product Attention → Concat → Linear), and the full encoder-decoder Transformer architecture with Add & Norm and Feed Forward layers]
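A minimal PyTorch sketch of the scaled dot-product attention at the core of multi-head attention; the shapes and masking below are illustrative, not the exact configuration used in the experiments:

```python
# Minimal sketch: scaled dot-product attention (the core of multi-head attention).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # MatMul + Scale
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # Mask
    weights = torch.softmax(scores, dim=-1)                    # SoftMax
    return weights @ v                                         # MatMul

q = k = v = torch.randn(2, 4, 10, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 10, 16])
```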
Models
9
NBEATS — a deep learning-based approach to time series forecasting.
ARIMA — a popular statistical model used to forecast future values of a time
series from its past values.
[Diagram: NBEATS architecture; each block is an FC stack producing a backcast and a forecast; Blocks 1…K form a stack, and Stacks 1…M are combined into the global forecast]
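A minimal sketch of a single NBEATS-style block with a generic basis: a fully connected stack that emits a backcast and a forecast; the layer sizes are illustrative assumptions:

```python
# Minimal sketch: one NBEATS-style block (generic basis) producing backcast + forecast.
import torch
import torch.nn as nn

class NBeatsBlock(nn.Module):
    def __init__(self, backcast_len=64, forecast_len=8, hidden=128, n_layers=4):
        super().__init__()
        layers, width = [], backcast_len
        for _ in range(n_layers):            # the FC stack
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        self.fc_stack = nn.Sequential(*layers)
        self.backcast_head = nn.Linear(hidden, backcast_len)
        self.forecast_head = nn.Linear(hidden, forecast_len)

    def forward(self, x):
        h = self.fc_stack(x)
        return self.backcast_head(h), self.forecast_head(h)

backcast, forecast = NBeatsBlock()(torch.randn(32, 64))
print(backcast.shape, forecast.shape)  # (32, 64) (32, 8)
```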
M4 Dataset
10
The M4 dataset is a collection of 100,000 time series used for the fourth
edition of the Makridakis Forecasting Competition.
It consists of series of different frequencies (average training length in observations):
• Yearly — 63
• Quarterly — 125
• Monthly — 302
• Weekly — 2035
• Daily — 475
• Hourly — 682
Experiment - Monthly
11
[Forecast plots on monthly M4 series: ARIMA, NBEATS, Transformer]
Interpretation of results
12
On short series, all three models performed well and are suitable for forecasting.
However, on longer ranges:
• NBEATS performs better than the other models
• Some models show a critical difference in their predictions
• At even longer ranges, all models struggle
Training for just 50 epochs took:
• Almost an hour for NBEATS
• 20 minutes for the Transformer
• 1 minute for ARIMA
LoRA + rSVD
LoRA: a frozen pretrained weight W ∈ ℝ^{d×k} is adapted with a low-rank update of rank r, h = Wx + BAx, where A is initialized from N(0, σ²) and B = 0.
rSVD: the weight matrix is additionally factorized with a truncated (randomized) SVD, W ≈ U Σ Vᵀ, and the rank-r factors are used to recover the low-rank pair.
[Diagram: LoRA adapter around the frozen weight; SVD factorization and recovery of the low-rank matrices]
13
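A minimal sketch of a LoRA-style linear layer whose low-rank factors can optionally be initialized from a truncated SVD of the frozen weight. The exact rSVD initialization and scaling used in this work are not reproduced here; treat the choices below as assumptions for illustration:

```python
# Minimal sketch: LoRA-style linear layer with optional SVD-based initialization.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, init_from_svd: bool = False):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weight
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # A ~ N(0, sigma^2)
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B = 0
        if init_from_svd:
            # Recover the top-r directions of the frozen weight: W ≈ U Σ Vᵀ.
            # (Some schemes also subtract the recovered part from W; omitted here.)
            U, S, Vh = torch.linalg.svd(base.weight, full_matrices=False)
            self.B.data = U[:, :r] * S[:r]
            self.A.data = Vh[:r, :]

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T       # h = Wx + B A x

layer = LoRALinear(nn.Linear(64, 64), r=8, init_from_svd=True)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```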
Results NBEATS
[Diagram: base model with LoRA layers]
Model latency throughput
No LoRA 0.00108 4364808.0
LoRA Layer 0.00166 4159606.0
MS Default 0.00106 4748564.0
MS All 0.00108 4482670.0
MS LoRA 0.00105 5365210.0
14
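For context, latency and throughput numbers like those in the table can be measured roughly as below; the actual benchmarking protocol (batch size, warm-up, hardware) is not given on the slide, so this setup is an assumption:

```python
# Minimal sketch: measuring forward-pass latency and throughput of a model.
import time
import torch

def benchmark(model, sample, n_iters=100):
    model.eval()
    with torch.no_grad():
        for _ in range(10):                       # warm-up runs
            model(sample)
        start = time.perf_counter()
        for _ in range(n_iters):
            model(sample)
        elapsed = time.perf_counter() - start
    latency = elapsed / n_iters                        # seconds per forward pass
    throughput = sample.shape[0] * n_iters / elapsed   # samples per second
    return latency, throughput

lat, thr = benchmark(torch.nn.Linear(64, 64), torch.randn(4096, 64))
print(f"latency={lat:.5f} s, throughput={thr:.0f} samples/s")
```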
Results Transformer
Model latency throughput
No LoRA 0.00085 9306328.0
LoRA Layer 0.00084 9782205.0
MS Default 0.00085 9572622.0
MS All 0.00085 9463282.0
MS LoRA 0.00084 8586088.0
[Chart: model training with early stopping; metric per epoch (epochs 1–8), y-axis 0–0.07, Base vs LoRA]
15
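A minimal sketch of a training loop with early stopping on the validation loss, in the spirit of the chart above; the patience, optimizer, and stopping criterion are assumptions:

```python
# Minimal sketch: training with early stopping on the validation loss.
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_with_early_stopping(model, train_dl, val_dl, loss_fn,
                              max_epochs=50, patience=3):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    best_val, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_dl:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_dl)
        if val < best_val:
            best_val, bad_epochs = val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:   # stop once validation stops improving
                break
    return model

xs, ys = torch.randn(256, 16), torch.randn(256, 1)
dl = DataLoader(TensorDataset(xs, ys), batch_size=32)
train_with_early_stopping(torch.nn.Linear(16, 1), dl, dl, torch.nn.MSELoss())
```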
LoRA Implementation
14
Summary
In this work:
• Conducted an experiment to evaluate the performance of:
  • NBEATS
  • ARIMA
  • Transformer
• Reasoned that layers in these models can be replaced using the LoRA approach
• Implemented this logic as part of the master thesis
15
TN diagrams of some popular decompositions
THANK YOU
FOR YOUR TIME!
@misterzurg
