SlideShare a Scribd company logo
Indonesian Journal of Electrical Engineering and Computer Science
Vol. 25, No. 1, January 2022, pp. 223~231
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v25.i1.pp223-231  223
Journal homepage: http://ijeecs.iaescore.com
A new function of stereo matching algorithm based on hybrid
convolutional neural network
Mohd Saad Hamid1,2
, Nurulfajar Abd Manap1
, Rostam Affendi Hamzah2
, Ahmad Fauzan Kadmin1,2
,
Shamsul Fakhar Abd Gani2
, Adi Irwan Herman3
1
Fakulti Kejuruteraan Elektronik dan Kejuruteraan Komputer, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
2
Fakulti Teknologi Kejuruteraan Elektrik dan Elektronik, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
3
Product & Test Engineering, Texas Instruments, Batu Berendam, Melaka, Malaysia
Article Info ABSTRACT
Article history:
Received Apr 23, 2021
Revised Nov 23, 2021
Accepted Nov 30, 2021
This paper proposes a new hybrid method between the learning-based and
handcrafted methods for a stereo matching algorithm. The main purpose of
the stereo matching algorithm is to produce a disparity map. This map is
essential for many applications, including three-dimensional (3D)
reconstruction. The raw disparity map computed by a convolutional neural
network (CNN) is still prone to errors in the low texture region. The
algorithm is set to improve the matching cost computation stage with hybrid
CNN-based combined with truncated directional intensity computation. The
difference in truncated directional intensity value is employed to decrease
radiometric errors. The proposed method’s raw matching cost went through
the cost aggregation step using the bilateral filter (BF) to improve accuracy.
The winner-take-all (WTA) optimization uses the aggregated cost volume to
produce an initial disparity map. Finally, a series of refinement processes
enhance the initial disparity map for a more accurate final disparity map.
This paper verified the performance of the algorithm using the Middlebury
online stereo benchmarking system. The proposed algorithm achieves the
objective of generating a more accurate and smooth disparity map with
different depths at low texture regions through better matching cost quality.
Keywords:
Convolutional neural network
Directional intensity
Stereo matching algorithm
Stereo vision
This is an open access article under the CC BY-SA license.
Corresponding Author:
Mohd Saad Hamid
Fakulti Teknologi Kejuruteraan Elektrik dan Elektronik, Universiti Teknikal Malaysia Melaka
Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia
Email: mohdsaad@utem.edu.my
1. INTRODUCTION
Recent years have seen a rise in the number of significant advancements in the stereo vision area. As
one of the active topics under stereo vision, the stereo matching problem still appears to be actively discussed
among researchers around the globe today [1]. One of the objectives in stereo matching algorithm
development is to produce a disparity or depth map. The map contains the disparity values between pixel
locations of matching features in the left and right images. The map can provide the depth information
between the camera and object. This information is valuable for applications such as three-dimensional (3D)
reconstruction, obstacle avoidance, robotics, and navigation. The stereo vision approach also has been used to
generate the 3D facial image for facial recognition [2].
Besides the stereo-based approach, as mentioned by [3], the single view or monocular-based method
can also produce depth-related information. This approach can generate a disparity map using a single image.
However, as [3] and [4] pointed out, the monocular depth method was less accurate than the stereo-based
approach. It is because the multi-view contains a broader amount of information compared to the single-view
 ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231
224
approach. Another merit for the stereo-based method, as mentioned by [4], is the capability to outperform the
monocular method in recovering the objects of interest.
Another method for generating depth-related information is light detection and ranging (LiDAR).
LiDAR uses the laser approach to sense the depth and perform the depth measurement. LiDAR method can
provide accurate 3D points [4]. However, this method is not very cost-effective and time-consuming [5]. Due
to the highlighted shortcomings in monocular and LiDAR methods, we are inspired to continue working on
the stereo-based method and propose our algorithm.
Figure 1 illustrates the crucial steps for the stereo vision algorithm. The formalization of the
algorithm steps is related to the previous literature [6]-[9]. The first step is matching cost computation. Some
researchers utilize the directional intensity difference to compute matching costs [10]. The calculation of
absolute and squared differences is also commonly used because it requires low computational complexity.
The next step, cost aggregation, can decrease the errors in the initial matching cost. The edge-preserving
filters such as bilateral filter (BF) [11] and guided filter (GF) [12] can achieve the purpose. These filters
maintain a good edge while smoothing the input. Thus it provides better results in the aggregation step than
the low pass filters (Gaussian and box filter). The third step is in charge of assigning a value to the disparity
map. Winner-take-all (WTA) optimization is the most popular method for this step for the local approach. In
WTA, the smallest cost value disparity will be selected for every pixel location [7] to generate an initial
disparity map. The third step’s initial disparity may still contain errors caused by occlusions, low textures,
and invalid matches [13]. So, the final step may include multiple post-processing steps to refine the map. The
authors of [12] used the left-to-right consistency (LRC) check process to identify the void matched pixels.
The median filter is also commonly used for local refinement [14]-[16]. It is another type of non-linear filter
for denoising purposes. Due to its complexity, this final step can add additional time to the overall process.
Figure 1. Stereo vision algorithm steps
Deep learning has become the driving force behind advancements in the computer vision field [17].
Deep learning utilizes artificial neural networks to learn a large amount of data [18] mentioned that
conventional stereo vision performance will be improved by incorporating deep learning for recognition
tasks. Implementation of deep learning on stereo vision can be divided into two approaches. The first
approach is the deep learning approach combined with a traditional handcraft algorithm matching cost with a
convolutional neural network (MC-CNN-acrt) [14]. MC-CNN-acrt architecture outperforms other stereo
matching conventional methods on Middlebury [19] benchmarking systems [14] construct and train their
convolutional neural network (CNN) based network using binary classification to solve matching cost
computation steps. Another researcher in [20] proposed CNN-based similarity learning in Euclidean space
using a dot product. The method computes faster than MC-CNN, but the result is less accurate. CNN is also
used in stereo matching because of its feature extraction capabilities and vigorous radiometric difference [15].
The second approach is the pure deep learning end-to-end network. This approach does not require any
handcrafted algorithm. The end-to-end style deep learning network performs all stereo matching stages in
one combination network. Geometry and context network (GC-Net) by [21] uses 2D CNN to form cost
volumes and uses soft-argmin layer to regress the disparity values [22] introduce PSMNet, a faster and more
accurate disparity map than GC-Net.
The hybrid method between learning and handcraft algorithm was also introduced by [23] recently.
Their deep learning network produces a disparity map and predictions of uncertainties. They also propose a
handcrafted method to enhance their disparity map using a modified SGBM algorithm (SGBMP) by [23], [24]
introduces a hybrid model between CNN and the learning-based conditional random field (CRF) model to
form an end-to-end learning network. The performance of the hybrid and pure end-to-end methods will be
discussed further in section 3.
The author of [25] concluded, the end-to-end method still has some demerits in the ill-posed region
and is computationally expensive. Additionally, [14] also pointed out that the raw disparity map generated by
a CNN is prone to errors in the low texture and the occluded region. The problem with the low texture region
is also mentioned by [13], affecting the similarity measurement. Other researchers [26]-[28] also stated that
the low texture region is one of the challenging areas to cater to in stereo matching methods. So, in the
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 
A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid)
225
following section of this paper, we will present our proposed stereo matching algorithm. This paper’s main
contribution is the hybrid converged classification CNN fused with directional intensity information for the
matching cost computation stage. The main objective is to produce a better disparity map by focusing on the
low texture region errors.
2. THE PROPOSED METHOD
As mentioned in the earlier section, we present our proposed algorithm as categorized in [6]. A
summary view of our algorithm is illustrated in Figure 2. Our proposed algorithm is comprised of four stages,
Figure 2. The proposed stereo vision algorithm steps
2.1. Matching cost computation
Our CNN-based model is inspired by the MC-CNN-acrt [14] architecture. However, we did improve
the architecture, as discussed in [29]. In this paper, we extend the improvement through our new findings. As
discussed in our previous work, the converged classification method in our Siamese-based CNN model [29]
has been used for the matching cost computation stage. We then fused it with additional directional intensity
differences to improve the accuracy of matching costs computed in this stage. We maintained the eight layers
of our CNN architecture as in our previous work. The network will produce a similarity score that provides a
binary classification of good or bad matching. Based on (1), CCNN reflects the cost value for all disparities d at
each pixel position p.
𝐶𝐶𝑁𝑁(𝑝, 𝑑) = −𝑠(𝑃𝑁
𝐿(𝑝), 𝑃𝑁
𝑅(𝑝 − 𝑑)) (1)
Input patches of NxN size from left and right image, 𝑃𝑁
𝐿
and 𝑃𝑁
𝑅
respectively supplied to the CNN.
The minus sign would translate the score of similarity to the initial matching cost. The matching cost values
obtained by (1) are also discussed in [29]. Another crucial part of this cost computation stage is the
directional intensity difference. The second part of the cost function is defined in (2).
𝐶𝐷𝐼(𝑝, 𝑑) = 𝛼 ⋅ 𝑚𝑖𝑛(∥ 𝐼𝐿(𝑝) − 𝐼𝑅(𝑝) ∥,𝜏1) + (1 − 𝛼) ⋅ 𝑚𝑖𝑛 (∥▽
𝑥
𝐼𝐿(𝑝) −▽
𝑥
𝐼𝑅(𝑝) ∥,𝜏2) (2)
Here 𝐼(𝑝) represent the colour vector of a pixel at position p. ▽x is the directional intensity
difference in the x-direction. Α balances the directional intensity difference values, while τ1 and τ2 are
truncation values. This paper improves the matching cost step from our previous work [29] by combining it
with the directional intensity difference-based cost volume, CDI, as in (3).
𝐶𝐼𝑀(𝑝, 𝑑) = 𝐶𝐷𝐼(𝑝, 𝑑) + 𝜆 ⋅ 𝐶𝐶𝑁𝑁(𝑝, 𝑑) (3)
The parameter λ will balance the cost volumes. The raw matching cost volume cost from this stage is defined
as CIM.
2.2. Cost aggregation
In the second stage, we refine the raw matching costs to create a more accurate disparity map. It is
because the initial cost volume generated from the previous stage is prone to noises. Therefore, we aggregate
cost volumes in this stage using a bilateral filter (BF). The BF is responsible for performing smoothing
 ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231
226
operations while keeping the edges sharp on the matching costs. The BF [11] defined in (4) will aggregate
the raw matching cost volume, CIM.
𝐵𝐹[𝐼]𝑝 =
1
𝑊𝑝
∑ 𝐺𝜎𝑠
𝑞𝜖𝑆 (∥ 𝑝 − 𝑞 ∥)𝐺𝜎𝑟
(∣ 𝐼𝑝 − 𝐼𝑞 ∣) (4)
Where 𝑊p is the normalization factor. 𝐼𝑝 and 𝐼𝑞 represent the intensity at pixel position 𝑝 and 𝑞, respectively.
𝑝 represents the (𝑥, 𝑦) coordinates pixel of interest while the neighbouring coordinate 𝑞 is within support
region 𝑆. The aggregated cost is denoted as CAGG in (5) at the end of this step.
𝐶𝐴𝐺𝐺(𝑝, 𝑑) = 𝐵𝐹[𝐼]𝐶𝐼𝑀(𝑝, 𝑑) (5)
2.3. Disparity computation / optimization
Using winner-take-all (WTA) optimization, we computed the disparity map for this disparity
computation stage. We employ WTA optimization as defined in (6) to produce an initial disparity map, 𝑑(𝑝).
𝑑(𝑝) = argmin
𝑑𝜖𝑑𝑟
(𝐶𝐴𝐺𝐺(𝑝, 𝑑)) (6)
The sets of disparity values of 𝑑𝑟 obtained from the ground truth. The 𝑑(𝑝) is the chosen disparity value for
position 𝑑(𝑝), which represents 2D coordinates (𝑥, 𝑦).
2.4. Disparity refinement
This refinement stage consists of several steps. As discussed by [12], left-right consistency (LRC)
check and interpolation were also used to deal with occlusion and mismatches. Firstly, we employ LRC to
identify invalid pixels. Then the detected invalid pixel is replaced with a valid pixel using the hole-filling
process. After that, we implemented a guided image filter (GIF) due to its good edge-preserving capabilities.
The filter kernel for GIF is implemented in this paper as defined in (7).
𝐺𝑝,𝑞(𝐼) =
1
∣𝑤∣
2 ∑ (1 +
(𝐼𝑝−𝜇𝑘)(𝐼𝑞−𝜇𝑘)
𝜎𝑘
2+𝜀
)
𝑞𝜖𝑊𝑘
(7)
Where I the guidance image and p represent the (x; y) coordinates pixel of interest. Another pixel position, q,
denote the neighbouring pixel in the support region wk. Then, σ and µ are the variance and mean of the
intensity values, respectively. The parameter, 𝜀 is used as a control element for the smoothness term. The
refined disparity map using GIF is defined as 𝑑𝐺𝐹 in (8).
𝑑𝐺𝐹(𝑝) = 𝐺𝑝,𝑞(𝐼)𝑑(𝑝) (8)
The weighted median filter (WMF) was implemented to remove any existing outliers in the disparity
map. We use the following cosine similarity weight function as defined in (9) for the weighted median
filtering.
𝑊
𝑝
𝑐𝑜𝑠
=
𝐼𝑝⋅𝐼𝑞
∥𝐼𝑝∥⋅∥𝐼𝑞∥
(9)
𝑑𝑓(𝑝) = 𝑊
𝑝
𝑐𝑜𝑠
ℎ(𝑝)𝑑𝐺𝐹(𝑝) (10)
After the WMF is implemented, the final disparity map generated from the whole proposed
algorithm represented by 𝑑𝑓(𝑝) in (10). The following section will provide an overview of the proposed
method’s performance quantitatively and qualitatively,
3. RESULTS AND DISCUSSION
Several tests were executed on a personal computer (PC) platform to examine the proposed method’s
performance using Middlebury v3 datasets [19]. The main code runs on the Python platform, utilizing Keras
and Tensorflow library to perform the CNN architecture’s inference. The hardware used in this test is a PC with
Intel Core i5 3.0 GHz with 16 GB DDR3 RAM and the Nvidia GTX1060 GPU. We maintained the
hyperparameters of our CNN model, similar to our previous work. The image datasets for the tests were taken
from Middlebury online benchmarking system [19]. The results of the proposed algorithm are also compared
with other methods such as MC-CNN-acrt and MC-CNN-fst by [14], pyramid stereo matching network
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 
A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid)
227
(PSMNet_ROB) by [22], hybrid CNN+CRF models by [24] (labelled as JMR), SGBMP by [23], MC-CNN-WS
by [30] and line segment based efficient large scale stereo matching (LS_ELAS) by [31]. Table 1 shows the
results obtained for All errors (error of invalid disparity values in all pixels). Another metric used is shown in
Table 2, which illustrates the NonOcc error percentage (error of invalid disparity values in non-occluded pixel).
Based on the All errors results in Table 1, the proposed algorithm framework outperforms the
original MC-CNN-acrt and other published methods. The proposed method can reduce the average errors
down to 9.29%. It can also perform better than another recently published deep learning method denoted as
SGBMP [23] and JMR [24]. However, there is a considerable difference between the SGBMP method’s
results and ours, especially on the Jadeplant, PianoL, Vintage and PlayTable images.
Based on the quantitative comparison of NonOcc error percentage in Table 2, the proposed method
performed moderately. However, the proposed method still performed better than the end-to-end network-
based method, PSMNet [22]. The proposed method generates output with 6.05% of errors, while PSMNet
produces 9.60%. Thus, except for the Vintage and Piano images, our proposed method outperforms PSMNet
in almost every image.
Table 1. Results from Middlebury benchmark - All error
Method
Proposed
JMR
SGBMP
MC-CNN-acrt
MC-CNN-fst
PSMNet_ROB
MC-CNN-WS
LS_ELAS
Adirondack 5.37 2.17 6.50 4.24 5.32 8.83 5.73 9.31
ArtL 8.36 18.00 9.33 18.70 19.20 13.90 20.50 5.90
Jadeplant 37.60 24.70 56.80 34.10 32.60 68.40 36.30 64.50
Motorcycle 7.24 5.98 5.04 7.21 8.75 8.26 9.39 7.24
MotorcycleE 7.29 6.90 5.43 7.22 8.83 9.16 9.37 7.65
Piano 6.63 6.14 4.77 6.00 8.12 5.89 8.13 6.25
PianoL 7.88 7.27 14.80 9.35 17.20 10.50 16.10 9.69
Pipes 11.40 11.00 7.85 13.50 15.50 14.40 16.70 12.80
Playroom 9.78 17.50 7.62 18.30 18.60 9.38 18.70 10.10
Playtable 4.72 8.18 10.60 9.71 13.60 5.54 11.50 23.90
PlaytableP 4.48 7.44 3.78 9.37 9.75 5.52 10.10 4.27
Recycle 3.79 2.96 3.19 4.64 5.00 4.98 5.05 7.39
Shelves 8.44 7.81 5.00 6.62 8.91 11.60 9.83 8.48
Teddy 3.64 8.98 3.35 9.31 10.40 3.87 11.00 2.98
Vintage 9.96 10.30 30.00 21.60 15.80 9.66 20.80 14.00
Average 9.29 9.57 11.20 11.80 12.80 13.30 13.70 12.90
Table 2. Results from Middlebury benchmark - NonOcc error
Method
JMR
MC-CNN-acrt
MC-CNN-fst
MC-CNN-WS
Proposed
SGBMP
PSMNet_ROB
LS_ELAS
Adirondack 0.92 0.76 1.21 1.66 3.23 3.87 7.32 8.46
ArtL 2.18 2.49 2.84 4.27 5.79 4.96 9.69 3.83
Jadeplant 6.01 16.30 10.00 12.80 19.00 29.30 44.50 41.10
Motorcycle 1.26 1.27 1.62 2.26 4.59 3.45 5.55 5.12
MotorcycleE 1.27 1.27 1.61 2.18 4.50 3.89 6.12 5.80
Piano 2.21 1.83 3.17 3.21 5.88 3.82 5.01 5.54
PianoL 4.03 5.07 13.20 11.70 7.31 14.40 9.82 8.97
Pipes 2.12 2.29 3.20 4.27 7.05 3.94 9.86 7.44
Playroom 1.94 2.27 3.13 3.49 6.75 5.09 7.33 8.76
Playtable 2.20 3.11 5.78 3.78 3.58 9.74 4.40 22.40
PlaytableP 1.65 3.03 2.97 3.31 3.51 2.70 4.43 3.47
Recycle 1.30 2.48 1.95 1.83 2.97 2.91 3.73 6.93
Shelves 5.51 4.41 6.26 7.02 7.39 4.64 11.10 8.26
Teddy 1.15 1.07 1.12 2.00 2.51 1.80 3.44 2.29
Vintage 3.73 14.80 9.16 14.30 8.08 26.10 8.07 13.10
Average 2.30 3.81 3.87 4.63 6.05 7.25 9.60 9.66
 ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231
228
The qualitative comparison of the methods is illustrated in Figure 3. The ground truth image of
PlayTable image is shown in Figure 3(a). The PlayTable image pair (illustrated as left and right image in
Figure 3(b) and Figure 3(c) respectively) contains a low texture region in the floor area of images. Based on
Figure 3(d), the proposed method produces a smooth disparity map with different depths in the region with low
texture and plain colour. Qualitatively, the map produced is more accurate than the disparity map generated by
other methods such as SGBMP, JMR, MC-CNN-WS, MC-CNN-acrt and LS_ELAS (illustrated in Figure 3(e),
Figure 3(f), Figure 3(g), Figure 3(h) and Figure 3(i) respectively). This proposed method achieved the objective
of this work, to reduce the error in the low texture region, as mentioned in the earlier section. The All error
recorded for the PlayTable image is the best among other methods, as illustrated in Table 1.
(a) (b) (c)
(d) (e) (f)
(g) (h) (i)
Figure 3. Middlebury - PlayTable image comparison, (a) ground truth, (b) left, (c) right, (d) proposed
method, (e) SGBMP, (f) JMR, (g) MC-CNN-WS, (h) MC-CNN-acrt, (i) LS_ELAS
The effectiveness of combining the cost volumes 𝐶𝐷𝐼 and 𝐶𝐶𝑁𝑁 into CIM in step 1 as discussed in
section 2.1, is proven. The combined cost volumes, CIM helps reduce the error from 16.2% to 9.29% for All
errors and from 14.4% to 6.05% for NonOcc error in the Middlebury online stereo benchmarking system. For
qualitative comparison, Figure 4(a) depicts the left image of the Adirondack image. Figure 4(b) illustrated the
disparity map for the Adirondack image when CDI was used without CCNN. In addition, Figure 4(c)
demonstrates the effect of combining both cost volumes (CIM). Thus, this combination contributes to better
accuracy generated through the proposed matching cost computation steps.
(a) (b) (c)
Figure 4. Middlebury-Adirondack image (a) left image, (b) without proposed method, and (c) proposed method
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 
A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid)
229
4. CONCLUSION
In conclusion, we demonstrate the hybrid learning-based method combined with a handcrafted
method to perform matching cost computation. The initial cost volume created may contain noises, so the
cost aggregation step was implemented to reduce errors. BF has been used as an edge-preserving filter to
maintain the sharp edge while smoothing the input. WTA was employed to compute the initial disparity map.
The final disparity map generated through post-processing steps includes LRC, hole filling, GIF, and WMF.
Based on the final disparity map, we conclude that the proposed algorithm enhances the disparity map’s
accuracy in the low texture region while maintaining the object edge. Furthermore, the proposed algorithm
can perform competitively compared to other published methods based on the Middlebury stereo
benchmarking system.
ACKNOWLEDGEMENTS
This work is supported by the Centre for Research and Innovation Management (CRIM), Universiti
Teknikal Malaysia Melaka (UTeM).
REFERENCES
[1] T. Xue, A. Owens, D. Scharstein, M. Goesele, and R. Szeliski, “Multi-frame stereo matching with edges, planes, and
superpixels,” Image Vis. Comput., 2019, doi: 10.1016/j.imavis.2019.05.006.
[2] E. Winarno, I. H. Al Amin, and W. Hadikurniawati, “Asymmetrical half-join method on dual vision face recognition,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 6, pp. 3411-3420, 2017, doi:
10.11591/ijece.v7i6.pp3411-3420.
[3] H. Ham, J. Wesley, and Hendra, “Computer vision based 3D reconstruction : A review,” International Journal of Electrical and
Computer Engineering (IJECE), vol. 9, no. 4, pp. 2394-2402, 2019, doi: 10.11591/ijece.v9i4.pp2394-2402.
[4] C. Strecha, W. V. Hansen, L. V. Gool, P. Fua, and U. Thoennessen, “On benchmarking camera calibration and multi-view stereo
for high resolution imagery,” in 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2008, doi:
10.1109/CVPR.2008.4587706.
[5] G. Yang, J. Manela, M. Happold, and D. Ramanan, “Hierarchical Deep Stereo Matching on High-Resolution Images,” in 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019,
pp. 5510-5519, doi: 10.1109/CVPR.2019.00566.
[6] D. Scharstein and R. Szeliski, “A Taxonomy and Evaluation of Dense Two-Frame Stereo,” Int. J. Comput. Vis., vol. 47, no. 1, pp.
7-42, 2002, doi: 10.1109/SMBV.2001.988771.
[7] R. A. Hamzah and H. Ibrahim, “Literature survey on stereo vision disparity map algorithms,” J. Sensors, vol. 2016, 2016, doi:
10.1155/2016/8742920.
[8] M. S. Hamid, N. A. Manap, R. A. Hamzah, and A. F. Kadmin, “Stereo matching algorithm based on deep learning: A survey,” J.
King Saud Univ. - Comput. Inf. Sci., 2020, doi: 10.1016/j.jksuci.2020.08.011.
[9] N. A. Manap, S. F. Hussin, A. M. Darsono, and M. M. Ibrahim, “Performance Analysis on Stereo Matching Algorithms Based on
Local and Global Methods for 3D Images Application,” J. Telecommun. Electron. Comput. Eng., vol. 10, no. 2, pp. 23-28, 2018.
[10] K. Zhang, Y. Fang, D. Min, L. Sun, S. Yang, and S. Yan, “Cross-Scale Cost Aggregation for Stereo Matching,” IEEE Trans.
Circuits Syst. Video Technol., vol. 27, no. 5, pp. 965-976, 2017, doi: 10.1109/TCSVT.2015.2513663.
[11] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, “Bilateral filtering: Theory and applications,” Found. Trends Comput. Graph.
Vis., vol. 4, no. 1, pp. 1-73, 2009, doi: 10.1561/0600000020.
[12] S. Zhu, Z. Wang, X. Zhang, and Y. Li, “Edge-preserving guided filtering based cost aggregation for stereo matching,” J. Vis.
Commun. Image Represent., vol. 39, pp. 107-119, 2016, doi: 10.1016/j.jvcir.2016.05.012.
[13] P. Brandao, E. Mazomenos, and D. Stoyanov, “Widening siamese architectures for stereo matching,” Pattern Recognit. Lett., vol.
120, pp. 75-81, Apr. 2019, doi: 10.1016/j.patrec.2018.12.002.
[14] J. ˇZbontar and Y. LeCun, “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches,” J. Mach.
Learn. Res., vol. 17, pp. 1-32, 2016, doi: 10.1186/s13568-015-0106-7.
[15] S. Wen, “Convolutional neural network and adaptive guided image filter based stereo matching,” in 2017 IEEE International
Conference on Imaging Systems and Techniques (IST), 2017, no. 1, pp. 1-6, doi: 10.1109/IST.2017.8261530.
[16] H. Xu, “Stereo matching and depth map collection algorithm based on deep learning,” in IST 2017 - IEEE International
Conference on Imaging Systems and Techniques, Proceedings, vol. 2018-Janua, no. 1, 2018, pp. 1-6, doi:
10.1109/IST.2017.8261504.
[17] T. Q. Vinh, L. H. Duy, and N. T. Nhan, “Vietnamese handwritten character recognition using convolutional neural network,”
Indonesian Journal of Applied Informatics (IJAI), vol. 9, no. 2, pp. 276-283, 2020, doi: 10.11591/ijai.v9.i2.pp276-283.
[18] D. Ciresan, U. Meier, J. Schmidhuber, D. Cires, and U. Meier, “Multi-column Deep Neural Networks for Image Classification,”
in 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, no. February,
pp. 3642-3649, doi: 10.1109/CVPR.2012.6248110.
[19] D. Scharstein et al., “High-resolution stereo datasets with subpixel-accurate ground truth,” in German Conference on Pattern
Recognition, 2014, doi: 10.1007/978-3-319-11752-2_3.
[20] Z. Chen, X. Sun, L. Wang, Y. Yu, and C. Huang, “A Deep Visual Correspondence Embedding Model,” in 2015 IEEE
International Conference on Computer Vision (ICCV), 2015, pp. 1-9, doi: 10.1109/ICCV.2015.117.
[21] A. Kendall et al., “End-to-End Learning of Geometry and Context for Deep Stereo Regression,” in Proceedings of the IEEE
International Conference on Computer Vision, 2017, vol. 2017-Octob, pp. 66-75, doi: 10.1109/ICCV.2017.17.
[22] J.-R. Chang and Y.-S. Chen, “Pyramid Stereo Matching Network,” in 2018 IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2018, doi: 10.1109/CVPR.2017.730.
[23] Y. Hu, W. Zhen, and S. Scherer, “Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction,” in 2020
IEEE International Conference on Robotics and Automation (ICRA), 2020,
pp. 8637-8643, doi: 10.1109/ICRA40945.2020.9196655.
 ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231
230
[24] P. Knöbelreiter, C. Reinbacher, A. Shekhovtsov, and T. Pock, “End-to-end training of hybrid CNN-CRF models for stereo,” Proc.
- 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-Janua, 2017,
pp. 1456-1465, doi: 10.1109/CVPR.2017.159.
[25] X. Song, X. Zhao, H. Hu, and L. Fang, “EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching,” Lect.
Notes Comput. Sci., vol. 11365, pp. 20-35, 2019, doi: 10.1007/978-3-030-20873-8_2.
[26] S. Zhu, H. Xu, and L. Yan, “A Stereo Matching and Depth Map Acquisition Algorithm Based on Deep Learning and Improved
Winner Takes All-Dynamic Programming,” IEEE Access, vol. 7, pp. 74625-74639, 2019, doi: 10.1109/ACCESS.2019.2921395.
[27] J. Kang, L. Chen, F. Deng, and C. Heipke, “Context pyramidal network for stereo matching regularized by disparity gradients,”
ISPRS J. Photogramm. Remote Sens., vol. 157, no. March, pp. 201-215, 2019, doi: 10.1016/j.isprsjprs.2019.09.012.
[28] F. Li, Q. Li, T. Zhang, Y. Niu, and G. Shi, “Depth acquisition with the combination of structured light and deep learning stereo
matching,” Signal Process. Image Commun., vol. 75, no. April, pp. 111-117, 2019, doi: 10.1016/j.image.2019.04.001.
[29] M. S. Hamid, N. A. Manap, R. A. Hamzah, and A. F. Kadmin, “Converged classification network for matching cost
computation,” Int. J. Adv. Sci. Technol., vol. 29, no. 6, 2020.
[30] S. Tulyakov, A. Ivanov, and F. Fleuret, “Weakly Supervised Learning of Deep Metrics for Stereo Reconstruction,” in
Proceedings of the IEEE International Conference on Computer Vision, vol. 2017-Octob, 2017, pp. 1348-1357, doi:
10.1109/ICCV.2017.150.
[31] R. A. Jellal, M. Lange, B. Wassermann, A. Schilling, and A. Zell, “LS-ELAS: Line segment based efficient large scale stereo
matching,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017,
pp. 146-152, doi: 10.1109/ICRA.2017.7989019.
BIOGRAPHIES OF AUTHORS
Mohd Saad Hamid received his B. Eng degree majoring in Computer from
Multimedia University. He completed his M. Eng majoring in Computer and Communication
from Universiti Kebangsaan Malaysia in 2014. Currently, he is a lecturer at Universiti
Teknikal Malaysia Melaka. His research interests are computer vision, deep learning, image
processing and embedded systems. He can be contacted at email: mohdsaad@utem.edu.my.
Nurulfajar Abd Manap earned a Master of Electrical Engineering in image
processing (2002, Universiti Teknologi Malaysia) and Bachelor of Electrical Engineering
(2000, Universiti Teknologi Malaysia). He obtained his PhD (Image and Video Processing) at
the University of Strathclyde, Glasgow in 2012. At present, he teaches various electronics
engineering subjects and courses in his affiliation for undergraduate and postgraduate levels.
His subjects include Advanced Digital Signal Processing, Computer Vision and Pattern
Recognition, Data Structures, Multimedia Technology & Applications and Distributed & High
Performance Computing. His research interests are 3D image processing, stereo vision and
video processing. He also one of MIET Chartered Engineer and Apple Distinguished
Educators (ADE). He can be contacted at email: nurulfajar@utem.edu.my.
Rostam Affendi Hamzah graduated from Universiti Teknologi Malaysia where
he received his B. Eng majoring in Electronic Engineering. Then he received his M. Sc.
majoring in Electronic System Design engineering from the Universiti Sains Malaysia in 2010.
In 2017, he received PhD majoring in Electronic Imaging from Universiti Sains Malaysia.
Currently, he is a senior lecturer in the Universiti Teknikal Malaysia Melaka teaching Digital
Electronics, Digital Image Processing and Embedded Systems. His research interests are
computer vision, pattern recognition and digital image processing. He can be contacted at
email: rostamaffendi@utem.edu.my.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 
A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid)
231
Ahmad Fauzan Kadmin graduated with a Bachelor Degree in Electronics
Engineering from Universiti Sains Malaysia and Master Degree in Computer &
Communication Engineering from Universiti Kebangsaan Malaysia. He has over 14 years of
experience in the electronic & computer engineering field with technical expert in R&D
engineering, computer vision & medical electronics. He is also one of the MIET Chartered
Engineers. He can be contacted at email: fauzan@utem.edu.my.
Shamsul Fakhar Abd Gani graduated from Universiti Malaysia Perlis
(UniMAP) in Bachelor of Engineering (Computer Engineering) with honours in 2006 and later
received his Master's degree in Internet & Web Computing in 2015 from Royal Melbourne
Institute of Technology (RMIT) Australia. He started his career as an R&D electronic engineer
specializing in software design for meter cluster development in Siemens VDO Automotive.
Shamsul is now a lecturer in the electronic and computer engineering technology department
of FTKEE UTeM. He can be contacted at email: shamsulfakhar@utem.edu.my.
Adi Irwan Herman graduated in 2015 with a Bachelor Degree in Computer
Engineering Technology (Computer Systems) from the Universiti Teknikal Malaysia Melaka.
Currently, he has been working with Texas Instrument for more than five years, with his
current research interests being computer engineering-related fields of studies. He can be
contacted at email: adiirwanherman@gmail.com.

More Related Content

PDF
A deep learning based stereo matching model for autonomous vehicle
PDF
Stereo matching algorithm using census transform and segment tree for depth e...
PDF
A Review Paper on Stereo Vision Based Depth Estimation
PDF
Development of stereo matching algorithm based on sum of absolute RGB color d...
PDF
Development of depth map from stereo images using sum of absolute differences...
PDF
6 - Conception of an Autonomous UAV using Stereo Vision (presented in an Indo...
PDF
A novel fast block matching algorithm considering cost function and stereo al...
PDF
A novel fast block matching algorithm considering cost function and stereo al...
A deep learning based stereo matching model for autonomous vehicle
Stereo matching algorithm using census transform and segment tree for depth e...
A Review Paper on Stereo Vision Based Depth Estimation
Development of stereo matching algorithm based on sum of absolute RGB color d...
Development of depth map from stereo images using sum of absolute differences...
6 - Conception of an Autonomous UAV using Stereo Vision (presented in an Indo...
A novel fast block matching algorithm considering cost function and stereo al...
A novel fast block matching algorithm considering cost function and stereo al...

Similar to A new function of stereo matching algorithm based on hybrid convolutional neural network (20)

PDF
Stereo matching based on absolute differences for multiple objects detection
PDF
A novel tool for stereo matching of images
PDF
A novel tool for stereo matching of images
PDF
A novel tool for stereo matching of images
PDF
Stereo Matching by Deep Learning
PDF
Passive stereo vision with deep learning
PDF
Structure tensor-based Gaussian kernel edge-adaptive depth map refinement wit...
PDF
Multiple Ant Colony Optimizations for Stereo Matching
PDF
Intelligent indoor mobile robot navigation using stereo vision
PDF
Dense Image Matching - Challenges and Potentials (Keynote 3D-ARCH 2015)
PDF
An Assessment of Image Matching Algorithms in Depth Estimation
PDF
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
PDF
branch-SGM
PDF
Stereo Correspondence Estimation by Two Dimensional Real Time Spiral Search A...
PDF
COMPUTATIONALLY EFFICIENT TWO STAGE SEQUENTIAL FRAMEWORK FOR STEREO MATCHING
PDF
An Enhanced Computer Vision Based Hand Movement Capturing System with Stereo ...
PDF
COMPUTATIONALLY EFFICIENT TWO STAGE SEQUENTIAL FRAMEWORK FOR STEREO MATCHING
PPTX
Lec13 stereo converted
PDF
Vision based non-invasive tool for facial swelling assessment
Stereo matching based on absolute differences for multiple objects detection
A novel tool for stereo matching of images
A novel tool for stereo matching of images
A novel tool for stereo matching of images
Stereo Matching by Deep Learning
Passive stereo vision with deep learning
Structure tensor-based Gaussian kernel edge-adaptive depth map refinement wit...
Multiple Ant Colony Optimizations for Stereo Matching
Intelligent indoor mobile robot navigation using stereo vision
Dense Image Matching - Challenges and Potentials (Keynote 3D-ARCH 2015)
An Assessment of Image Matching Algorithms in Depth Estimation
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
branch-SGM
Stereo Correspondence Estimation by Two Dimensional Real Time Spiral Search A...
COMPUTATIONALLY EFFICIENT TWO STAGE SEQUENTIAL FRAMEWORK FOR STEREO MATCHING
An Enhanced Computer Vision Based Hand Movement Capturing System with Stereo ...
COMPUTATIONALLY EFFICIENT TWO STAGE SEQUENTIAL FRAMEWORK FOR STEREO MATCHING
Lec13 stereo converted
Vision based non-invasive tool for facial swelling assessment
Ad

More from IJEECSIAES (17)

PDF
Slantlet transform used for faults diagnosis in robot arm
PDF
The IoT and registration of MRI brain diagnosis based on genetic algorithm an...
PDF
A comparative study for the assessment of Ikonos satellite image-fusion techn...
PDF
Elliptical curve cryptography image encryption scheme with aid of optimizatio...
PDF
Design secure multi-level communication system based on duffing chaotic map a...
PDF
An internet of things-based automatic brain tumor detection system
PDF
A YOLO and convolutional neural network for the detection and classification ...
PDF
Development of smart machine for sorting of deceased onions
PDF
Efficient resampling features and convolution neural network model for image ...
PDF
State and fault estimation based on fuzzy observer for a class of Takagi-Suge...
PDF
Hunting strategy for multi-robot based on wolf swarm algorithm and artificial...
PDF
Agricultural harvesting using integrated robot system
PDF
Characterization of silicon tunnel field effect transistor based on charge pl...
PDF
A new smart approach of an efficient energy consumption management by using a...
PDF
Parameter selection in data-driven fault detection and diagnosis of the air c...
PDF
Electrical load forecasting through long short term memory
PDF
A simple faulted phase-based fault distance estimation algorithm for a loop d...
Slantlet transform used for faults diagnosis in robot arm
The IoT and registration of MRI brain diagnosis based on genetic algorithm an...
A comparative study for the assessment of Ikonos satellite image-fusion techn...
Elliptical curve cryptography image encryption scheme with aid of optimizatio...
Design secure multi-level communication system based on duffing chaotic map a...
An internet of things-based automatic brain tumor detection system
A YOLO and convolutional neural network for the detection and classification ...
Development of smart machine for sorting of deceased onions
Efficient resampling features and convolution neural network model for image ...
State and fault estimation based on fuzzy observer for a class of Takagi-Suge...
Hunting strategy for multi-robot based on wolf swarm algorithm and artificial...
Agricultural harvesting using integrated robot system
Characterization of silicon tunnel field effect transistor based on charge pl...
A new smart approach of an efficient energy consumption management by using a...
Parameter selection in data-driven fault detection and diagnosis of the air c...
Electrical load forecasting through long short term memory
A simple faulted phase-based fault distance estimation algorithm for a loop d...
Ad

Recently uploaded (20)

PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PDF
PPT on Performance Review to get promotions
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
Digital Logic Computer Design lecture notes
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Foundation to blockchain - A guide to Blockchain Tech
OOP with Java - Java Introduction (Basics)
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
CH1 Production IntroductoryConcepts.pptx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Lecture Notes Electrical Wiring System Components
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPT on Performance Review to get promotions
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Digital Logic Computer Design lecture notes
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
CYBER-CRIMES AND SECURITY A guide to understanding
Internet of Things (IOT) - A guide to understanding
bas. eng. economics group 4 presentation 1.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026

A new function of stereo matching algorithm based on hybrid convolutional neural network

  • 1. Indonesian Journal of Electrical Engineering and Computer Science Vol. 25, No. 1, January 2022, pp. 223~231 ISSN: 2502-4752, DOI: 10.11591/ijeecs.v25.i1.pp223-231  223 Journal homepage: http://ijeecs.iaescore.com A new function of stereo matching algorithm based on hybrid convolutional neural network Mohd Saad Hamid1,2 , Nurulfajar Abd Manap1 , Rostam Affendi Hamzah2 , Ahmad Fauzan Kadmin1,2 , Shamsul Fakhar Abd Gani2 , Adi Irwan Herman3 1 Fakulti Kejuruteraan Elektronik dan Kejuruteraan Komputer, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia 2 Fakulti Teknologi Kejuruteraan Elektrik dan Elektronik, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia 3 Product & Test Engineering, Texas Instruments, Batu Berendam, Melaka, Malaysia Article Info ABSTRACT Article history: Received Apr 23, 2021 Revised Nov 23, 2021 Accepted Nov 30, 2021 This paper proposes a new hybrid method between the learning-based and handcrafted methods for a stereo matching algorithm. The main purpose of the stereo matching algorithm is to produce a disparity map. This map is essential for many applications, including three-dimensional (3D) reconstruction. The raw disparity map computed by a convolutional neural network (CNN) is still prone to errors in the low texture region. The algorithm is set to improve the matching cost computation stage with hybrid CNN-based combined with truncated directional intensity computation. The difference in truncated directional intensity value is employed to decrease radiometric errors. The proposed method’s raw matching cost went through the cost aggregation step using the bilateral filter (BF) to improve accuracy. The winner-take-all (WTA) optimization uses the aggregated cost volume to produce an initial disparity map. Finally, a series of refinement processes enhance the initial disparity map for a more accurate final disparity map. This paper verified the performance of the algorithm using the Middlebury online stereo benchmarking system. The proposed algorithm achieves the objective of generating a more accurate and smooth disparity map with different depths at low texture regions through better matching cost quality. Keywords: Convolutional neural network Directional intensity Stereo matching algorithm Stereo vision This is an open access article under the CC BY-SA license. Corresponding Author: Mohd Saad Hamid Fakulti Teknologi Kejuruteraan Elektrik dan Elektronik, Universiti Teknikal Malaysia Melaka Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia Email: [email protected] 1. INTRODUCTION Recent years have seen a rise in the number of significant advancements in the stereo vision area. As one of the active topics under stereo vision, the stereo matching problem still appears to be actively discussed among researchers around the globe today [1]. One of the objectives in stereo matching algorithm development is to produce a disparity or depth map. The map contains the disparity values between pixel locations of matching features in the left and right images. The map can provide the depth information between the camera and object. This information is valuable for applications such as three-dimensional (3D) reconstruction, obstacle avoidance, robotics, and navigation. The stereo vision approach also has been used to generate the 3D facial image for facial recognition [2]. Besides the stereo-based approach, as mentioned by [3], the single view or monocular-based method can also produce depth-related information. This approach can generate a disparity map using a single image. However, as [3] and [4] pointed out, the monocular depth method was less accurate than the stereo-based approach. It is because the multi-view contains a broader amount of information compared to the single-view
  • 2.  ISSN: 2502-4752 Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231 224 approach. Another merit for the stereo-based method, as mentioned by [4], is the capability to outperform the monocular method in recovering the objects of interest. Another method for generating depth-related information is light detection and ranging (LiDAR). LiDAR uses the laser approach to sense the depth and perform the depth measurement. LiDAR method can provide accurate 3D points [4]. However, this method is not very cost-effective and time-consuming [5]. Due to the highlighted shortcomings in monocular and LiDAR methods, we are inspired to continue working on the stereo-based method and propose our algorithm. Figure 1 illustrates the crucial steps for the stereo vision algorithm. The formalization of the algorithm steps is related to the previous literature [6]-[9]. The first step is matching cost computation. Some researchers utilize the directional intensity difference to compute matching costs [10]. The calculation of absolute and squared differences is also commonly used because it requires low computational complexity. The next step, cost aggregation, can decrease the errors in the initial matching cost. The edge-preserving filters such as bilateral filter (BF) [11] and guided filter (GF) [12] can achieve the purpose. These filters maintain a good edge while smoothing the input. Thus it provides better results in the aggregation step than the low pass filters (Gaussian and box filter). The third step is in charge of assigning a value to the disparity map. Winner-take-all (WTA) optimization is the most popular method for this step for the local approach. In WTA, the smallest cost value disparity will be selected for every pixel location [7] to generate an initial disparity map. The third step’s initial disparity may still contain errors caused by occlusions, low textures, and invalid matches [13]. So, the final step may include multiple post-processing steps to refine the map. The authors of [12] used the left-to-right consistency (LRC) check process to identify the void matched pixels. The median filter is also commonly used for local refinement [14]-[16]. It is another type of non-linear filter for denoising purposes. Due to its complexity, this final step can add additional time to the overall process. Figure 1. Stereo vision algorithm steps Deep learning has become the driving force behind advancements in the computer vision field [17]. Deep learning utilizes artificial neural networks to learn a large amount of data [18] mentioned that conventional stereo vision performance will be improved by incorporating deep learning for recognition tasks. Implementation of deep learning on stereo vision can be divided into two approaches. The first approach is the deep learning approach combined with a traditional handcraft algorithm matching cost with a convolutional neural network (MC-CNN-acrt) [14]. MC-CNN-acrt architecture outperforms other stereo matching conventional methods on Middlebury [19] benchmarking systems [14] construct and train their convolutional neural network (CNN) based network using binary classification to solve matching cost computation steps. Another researcher in [20] proposed CNN-based similarity learning in Euclidean space using a dot product. The method computes faster than MC-CNN, but the result is less accurate. CNN is also used in stereo matching because of its feature extraction capabilities and vigorous radiometric difference [15]. The second approach is the pure deep learning end-to-end network. This approach does not require any handcrafted algorithm. The end-to-end style deep learning network performs all stereo matching stages in one combination network. Geometry and context network (GC-Net) by [21] uses 2D CNN to form cost volumes and uses soft-argmin layer to regress the disparity values [22] introduce PSMNet, a faster and more accurate disparity map than GC-Net. The hybrid method between learning and handcraft algorithm was also introduced by [23] recently. Their deep learning network produces a disparity map and predictions of uncertainties. They also propose a handcrafted method to enhance their disparity map using a modified SGBM algorithm (SGBMP) by [23], [24] introduces a hybrid model between CNN and the learning-based conditional random field (CRF) model to form an end-to-end learning network. The performance of the hybrid and pure end-to-end methods will be discussed further in section 3. The author of [25] concluded, the end-to-end method still has some demerits in the ill-posed region and is computationally expensive. Additionally, [14] also pointed out that the raw disparity map generated by a CNN is prone to errors in the low texture and the occluded region. The problem with the low texture region is also mentioned by [13], affecting the similarity measurement. Other researchers [26]-[28] also stated that the low texture region is one of the challenging areas to cater to in stereo matching methods. So, in the
  • 3. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752  A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid) 225 following section of this paper, we will present our proposed stereo matching algorithm. This paper’s main contribution is the hybrid converged classification CNN fused with directional intensity information for the matching cost computation stage. The main objective is to produce a better disparity map by focusing on the low texture region errors. 2. THE PROPOSED METHOD As mentioned in the earlier section, we present our proposed algorithm as categorized in [6]. A summary view of our algorithm is illustrated in Figure 2. Our proposed algorithm is comprised of four stages, Figure 2. The proposed stereo vision algorithm steps 2.1. Matching cost computation Our CNN-based model is inspired by the MC-CNN-acrt [14] architecture. However, we did improve the architecture, as discussed in [29]. In this paper, we extend the improvement through our new findings. As discussed in our previous work, the converged classification method in our Siamese-based CNN model [29] has been used for the matching cost computation stage. We then fused it with additional directional intensity differences to improve the accuracy of matching costs computed in this stage. We maintained the eight layers of our CNN architecture as in our previous work. The network will produce a similarity score that provides a binary classification of good or bad matching. Based on (1), CCNN reflects the cost value for all disparities d at each pixel position p. 𝐶𝐶𝑁𝑁(𝑝, 𝑑) = −𝑠(𝑃𝑁 𝐿(𝑝), 𝑃𝑁 𝑅(𝑝 − 𝑑)) (1) Input patches of NxN size from left and right image, 𝑃𝑁 𝐿 and 𝑃𝑁 𝑅 respectively supplied to the CNN. The minus sign would translate the score of similarity to the initial matching cost. The matching cost values obtained by (1) are also discussed in [29]. Another crucial part of this cost computation stage is the directional intensity difference. The second part of the cost function is defined in (2). 𝐶𝐷𝐼(𝑝, 𝑑) = 𝛼 ⋅ 𝑚𝑖𝑛(∥ 𝐼𝐿(𝑝) − 𝐼𝑅(𝑝) ∥,𝜏1) + (1 − 𝛼) ⋅ 𝑚𝑖𝑛 (∥▽ 𝑥 𝐼𝐿(𝑝) −▽ 𝑥 𝐼𝑅(𝑝) ∥,𝜏2) (2) Here 𝐼(𝑝) represent the colour vector of a pixel at position p. ▽x is the directional intensity difference in the x-direction. Α balances the directional intensity difference values, while τ1 and τ2 are truncation values. This paper improves the matching cost step from our previous work [29] by combining it with the directional intensity difference-based cost volume, CDI, as in (3). 𝐶𝐼𝑀(𝑝, 𝑑) = 𝐶𝐷𝐼(𝑝, 𝑑) + 𝜆 ⋅ 𝐶𝐶𝑁𝑁(𝑝, 𝑑) (3) The parameter λ will balance the cost volumes. The raw matching cost volume cost from this stage is defined as CIM. 2.2. Cost aggregation In the second stage, we refine the raw matching costs to create a more accurate disparity map. It is because the initial cost volume generated from the previous stage is prone to noises. Therefore, we aggregate cost volumes in this stage using a bilateral filter (BF). The BF is responsible for performing smoothing
  • 4.  ISSN: 2502-4752 Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231 226 operations while keeping the edges sharp on the matching costs. The BF [11] defined in (4) will aggregate the raw matching cost volume, CIM. 𝐵𝐹[𝐼]𝑝 = 1 𝑊𝑝 ∑ 𝐺𝜎𝑠 𝑞𝜖𝑆 (∥ 𝑝 − 𝑞 ∥)𝐺𝜎𝑟 (∣ 𝐼𝑝 − 𝐼𝑞 ∣) (4) Where 𝑊p is the normalization factor. 𝐼𝑝 and 𝐼𝑞 represent the intensity at pixel position 𝑝 and 𝑞, respectively. 𝑝 represents the (𝑥, 𝑦) coordinates pixel of interest while the neighbouring coordinate 𝑞 is within support region 𝑆. The aggregated cost is denoted as CAGG in (5) at the end of this step. 𝐶𝐴𝐺𝐺(𝑝, 𝑑) = 𝐵𝐹[𝐼]𝐶𝐼𝑀(𝑝, 𝑑) (5) 2.3. Disparity computation / optimization Using winner-take-all (WTA) optimization, we computed the disparity map for this disparity computation stage. We employ WTA optimization as defined in (6) to produce an initial disparity map, 𝑑(𝑝). 𝑑(𝑝) = argmin 𝑑𝜖𝑑𝑟 (𝐶𝐴𝐺𝐺(𝑝, 𝑑)) (6) The sets of disparity values of 𝑑𝑟 obtained from the ground truth. The 𝑑(𝑝) is the chosen disparity value for position 𝑑(𝑝), which represents 2D coordinates (𝑥, 𝑦). 2.4. Disparity refinement This refinement stage consists of several steps. As discussed by [12], left-right consistency (LRC) check and interpolation were also used to deal with occlusion and mismatches. Firstly, we employ LRC to identify invalid pixels. Then the detected invalid pixel is replaced with a valid pixel using the hole-filling process. After that, we implemented a guided image filter (GIF) due to its good edge-preserving capabilities. The filter kernel for GIF is implemented in this paper as defined in (7). 𝐺𝑝,𝑞(𝐼) = 1 ∣𝑤∣ 2 ∑ (1 + (𝐼𝑝−𝜇𝑘)(𝐼𝑞−𝜇𝑘) 𝜎𝑘 2+𝜀 ) 𝑞𝜖𝑊𝑘 (7) Where I the guidance image and p represent the (x; y) coordinates pixel of interest. Another pixel position, q, denote the neighbouring pixel in the support region wk. Then, σ and µ are the variance and mean of the intensity values, respectively. The parameter, 𝜀 is used as a control element for the smoothness term. The refined disparity map using GIF is defined as 𝑑𝐺𝐹 in (8). 𝑑𝐺𝐹(𝑝) = 𝐺𝑝,𝑞(𝐼)𝑑(𝑝) (8) The weighted median filter (WMF) was implemented to remove any existing outliers in the disparity map. We use the following cosine similarity weight function as defined in (9) for the weighted median filtering. 𝑊 𝑝 𝑐𝑜𝑠 = 𝐼𝑝⋅𝐼𝑞 ∥𝐼𝑝∥⋅∥𝐼𝑞∥ (9) 𝑑𝑓(𝑝) = 𝑊 𝑝 𝑐𝑜𝑠 ℎ(𝑝)𝑑𝐺𝐹(𝑝) (10) After the WMF is implemented, the final disparity map generated from the whole proposed algorithm represented by 𝑑𝑓(𝑝) in (10). The following section will provide an overview of the proposed method’s performance quantitatively and qualitatively, 3. RESULTS AND DISCUSSION Several tests were executed on a personal computer (PC) platform to examine the proposed method’s performance using Middlebury v3 datasets [19]. The main code runs on the Python platform, utilizing Keras and Tensorflow library to perform the CNN architecture’s inference. The hardware used in this test is a PC with Intel Core i5 3.0 GHz with 16 GB DDR3 RAM and the Nvidia GTX1060 GPU. We maintained the hyperparameters of our CNN model, similar to our previous work. The image datasets for the tests were taken from Middlebury online benchmarking system [19]. The results of the proposed algorithm are also compared with other methods such as MC-CNN-acrt and MC-CNN-fst by [14], pyramid stereo matching network
  • 5. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752  A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid) 227 (PSMNet_ROB) by [22], hybrid CNN+CRF models by [24] (labelled as JMR), SGBMP by [23], MC-CNN-WS by [30] and line segment based efficient large scale stereo matching (LS_ELAS) by [31]. Table 1 shows the results obtained for All errors (error of invalid disparity values in all pixels). Another metric used is shown in Table 2, which illustrates the NonOcc error percentage (error of invalid disparity values in non-occluded pixel). Based on the All errors results in Table 1, the proposed algorithm framework outperforms the original MC-CNN-acrt and other published methods. The proposed method can reduce the average errors down to 9.29%. It can also perform better than another recently published deep learning method denoted as SGBMP [23] and JMR [24]. However, there is a considerable difference between the SGBMP method’s results and ours, especially on the Jadeplant, PianoL, Vintage and PlayTable images. Based on the quantitative comparison of NonOcc error percentage in Table 2, the proposed method performed moderately. However, the proposed method still performed better than the end-to-end network- based method, PSMNet [22]. The proposed method generates output with 6.05% of errors, while PSMNet produces 9.60%. Thus, except for the Vintage and Piano images, our proposed method outperforms PSMNet in almost every image. Table 1. Results from Middlebury benchmark - All error Method Proposed JMR SGBMP MC-CNN-acrt MC-CNN-fst PSMNet_ROB MC-CNN-WS LS_ELAS Adirondack 5.37 2.17 6.50 4.24 5.32 8.83 5.73 9.31 ArtL 8.36 18.00 9.33 18.70 19.20 13.90 20.50 5.90 Jadeplant 37.60 24.70 56.80 34.10 32.60 68.40 36.30 64.50 Motorcycle 7.24 5.98 5.04 7.21 8.75 8.26 9.39 7.24 MotorcycleE 7.29 6.90 5.43 7.22 8.83 9.16 9.37 7.65 Piano 6.63 6.14 4.77 6.00 8.12 5.89 8.13 6.25 PianoL 7.88 7.27 14.80 9.35 17.20 10.50 16.10 9.69 Pipes 11.40 11.00 7.85 13.50 15.50 14.40 16.70 12.80 Playroom 9.78 17.50 7.62 18.30 18.60 9.38 18.70 10.10 Playtable 4.72 8.18 10.60 9.71 13.60 5.54 11.50 23.90 PlaytableP 4.48 7.44 3.78 9.37 9.75 5.52 10.10 4.27 Recycle 3.79 2.96 3.19 4.64 5.00 4.98 5.05 7.39 Shelves 8.44 7.81 5.00 6.62 8.91 11.60 9.83 8.48 Teddy 3.64 8.98 3.35 9.31 10.40 3.87 11.00 2.98 Vintage 9.96 10.30 30.00 21.60 15.80 9.66 20.80 14.00 Average 9.29 9.57 11.20 11.80 12.80 13.30 13.70 12.90 Table 2. Results from Middlebury benchmark - NonOcc error Method JMR MC-CNN-acrt MC-CNN-fst MC-CNN-WS Proposed SGBMP PSMNet_ROB LS_ELAS Adirondack 0.92 0.76 1.21 1.66 3.23 3.87 7.32 8.46 ArtL 2.18 2.49 2.84 4.27 5.79 4.96 9.69 3.83 Jadeplant 6.01 16.30 10.00 12.80 19.00 29.30 44.50 41.10 Motorcycle 1.26 1.27 1.62 2.26 4.59 3.45 5.55 5.12 MotorcycleE 1.27 1.27 1.61 2.18 4.50 3.89 6.12 5.80 Piano 2.21 1.83 3.17 3.21 5.88 3.82 5.01 5.54 PianoL 4.03 5.07 13.20 11.70 7.31 14.40 9.82 8.97 Pipes 2.12 2.29 3.20 4.27 7.05 3.94 9.86 7.44 Playroom 1.94 2.27 3.13 3.49 6.75 5.09 7.33 8.76 Playtable 2.20 3.11 5.78 3.78 3.58 9.74 4.40 22.40 PlaytableP 1.65 3.03 2.97 3.31 3.51 2.70 4.43 3.47 Recycle 1.30 2.48 1.95 1.83 2.97 2.91 3.73 6.93 Shelves 5.51 4.41 6.26 7.02 7.39 4.64 11.10 8.26 Teddy 1.15 1.07 1.12 2.00 2.51 1.80 3.44 2.29 Vintage 3.73 14.80 9.16 14.30 8.08 26.10 8.07 13.10 Average 2.30 3.81 3.87 4.63 6.05 7.25 9.60 9.66
  • 6.  ISSN: 2502-4752 Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231 228 The qualitative comparison of the methods is illustrated in Figure 3. The ground truth image of PlayTable image is shown in Figure 3(a). The PlayTable image pair (illustrated as left and right image in Figure 3(b) and Figure 3(c) respectively) contains a low texture region in the floor area of images. Based on Figure 3(d), the proposed method produces a smooth disparity map with different depths in the region with low texture and plain colour. Qualitatively, the map produced is more accurate than the disparity map generated by other methods such as SGBMP, JMR, MC-CNN-WS, MC-CNN-acrt and LS_ELAS (illustrated in Figure 3(e), Figure 3(f), Figure 3(g), Figure 3(h) and Figure 3(i) respectively). This proposed method achieved the objective of this work, to reduce the error in the low texture region, as mentioned in the earlier section. The All error recorded for the PlayTable image is the best among other methods, as illustrated in Table 1. (a) (b) (c) (d) (e) (f) (g) (h) (i) Figure 3. Middlebury - PlayTable image comparison, (a) ground truth, (b) left, (c) right, (d) proposed method, (e) SGBMP, (f) JMR, (g) MC-CNN-WS, (h) MC-CNN-acrt, (i) LS_ELAS The effectiveness of combining the cost volumes 𝐶𝐷𝐼 and 𝐶𝐶𝑁𝑁 into CIM in step 1 as discussed in section 2.1, is proven. The combined cost volumes, CIM helps reduce the error from 16.2% to 9.29% for All errors and from 14.4% to 6.05% for NonOcc error in the Middlebury online stereo benchmarking system. For qualitative comparison, Figure 4(a) depicts the left image of the Adirondack image. Figure 4(b) illustrated the disparity map for the Adirondack image when CDI was used without CCNN. In addition, Figure 4(c) demonstrates the effect of combining both cost volumes (CIM). Thus, this combination contributes to better accuracy generated through the proposed matching cost computation steps. (a) (b) (c) Figure 4. Middlebury-Adirondack image (a) left image, (b) without proposed method, and (c) proposed method
  • 7. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752  A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid) 229 4. CONCLUSION In conclusion, we demonstrate the hybrid learning-based method combined with a handcrafted method to perform matching cost computation. The initial cost volume created may contain noises, so the cost aggregation step was implemented to reduce errors. BF has been used as an edge-preserving filter to maintain the sharp edge while smoothing the input. WTA was employed to compute the initial disparity map. The final disparity map generated through post-processing steps includes LRC, hole filling, GIF, and WMF. Based on the final disparity map, we conclude that the proposed algorithm enhances the disparity map’s accuracy in the low texture region while maintaining the object edge. Furthermore, the proposed algorithm can perform competitively compared to other published methods based on the Middlebury stereo benchmarking system. ACKNOWLEDGEMENTS This work is supported by the Centre for Research and Innovation Management (CRIM), Universiti Teknikal Malaysia Melaka (UTeM). REFERENCES [1] T. Xue, A. Owens, D. Scharstein, M. Goesele, and R. Szeliski, “Multi-frame stereo matching with edges, planes, and superpixels,” Image Vis. Comput., 2019, doi: 10.1016/j.imavis.2019.05.006. [2] E. Winarno, I. H. Al Amin, and W. Hadikurniawati, “Asymmetrical half-join method on dual vision face recognition,” International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 6, pp. 3411-3420, 2017, doi: 10.11591/ijece.v7i6.pp3411-3420. [3] H. Ham, J. Wesley, and Hendra, “Computer vision based 3D reconstruction : A review,” International Journal of Electrical and Computer Engineering (IJECE), vol. 9, no. 4, pp. 2394-2402, 2019, doi: 10.11591/ijece.v9i4.pp2394-2402. [4] C. Strecha, W. V. Hansen, L. V. Gool, P. Fua, and U. Thoennessen, “On benchmarking camera calibration and multi-view stereo for high resolution imagery,” in 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2008, doi: 10.1109/CVPR.2008.4587706. [5] G. Yang, J. Manela, M. Happold, and D. Ramanan, “Hierarchical Deep Stereo Matching on High-Resolution Images,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5510-5519, doi: 10.1109/CVPR.2019.00566. [6] D. Scharstein and R. Szeliski, “A Taxonomy and Evaluation of Dense Two-Frame Stereo,” Int. J. Comput. Vis., vol. 47, no. 1, pp. 7-42, 2002, doi: 10.1109/SMBV.2001.988771. [7] R. A. Hamzah and H. Ibrahim, “Literature survey on stereo vision disparity map algorithms,” J. Sensors, vol. 2016, 2016, doi: 10.1155/2016/8742920. [8] M. S. Hamid, N. A. Manap, R. A. Hamzah, and A. F. Kadmin, “Stereo matching algorithm based on deep learning: A survey,” J. King Saud Univ. - Comput. Inf. Sci., 2020, doi: 10.1016/j.jksuci.2020.08.011. [9] N. A. Manap, S. F. Hussin, A. M. Darsono, and M. M. Ibrahim, “Performance Analysis on Stereo Matching Algorithms Based on Local and Global Methods for 3D Images Application,” J. Telecommun. Electron. Comput. Eng., vol. 10, no. 2, pp. 23-28, 2018. [10] K. Zhang, Y. Fang, D. Min, L. Sun, S. Yang, and S. Yan, “Cross-Scale Cost Aggregation for Stereo Matching,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 5, pp. 965-976, 2017, doi: 10.1109/TCSVT.2015.2513663. [11] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, “Bilateral filtering: Theory and applications,” Found. Trends Comput. Graph. Vis., vol. 4, no. 1, pp. 1-73, 2009, doi: 10.1561/0600000020. [12] S. Zhu, Z. Wang, X. Zhang, and Y. Li, “Edge-preserving guided filtering based cost aggregation for stereo matching,” J. Vis. Commun. Image Represent., vol. 39, pp. 107-119, 2016, doi: 10.1016/j.jvcir.2016.05.012. [13] P. Brandao, E. Mazomenos, and D. Stoyanov, “Widening siamese architectures for stereo matching,” Pattern Recognit. Lett., vol. 120, pp. 75-81, Apr. 2019, doi: 10.1016/j.patrec.2018.12.002. [14] J. ˇZbontar and Y. LeCun, “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches,” J. Mach. Learn. Res., vol. 17, pp. 1-32, 2016, doi: 10.1186/s13568-015-0106-7. [15] S. Wen, “Convolutional neural network and adaptive guided image filter based stereo matching,” in 2017 IEEE International Conference on Imaging Systems and Techniques (IST), 2017, no. 1, pp. 1-6, doi: 10.1109/IST.2017.8261530. [16] H. Xu, “Stereo matching and depth map collection algorithm based on deep learning,” in IST 2017 - IEEE International Conference on Imaging Systems and Techniques, Proceedings, vol. 2018-Janua, no. 1, 2018, pp. 1-6, doi: 10.1109/IST.2017.8261504. [17] T. Q. Vinh, L. H. Duy, and N. T. Nhan, “Vietnamese handwritten character recognition using convolutional neural network,” Indonesian Journal of Applied Informatics (IJAI), vol. 9, no. 2, pp. 276-283, 2020, doi: 10.11591/ijai.v9.i2.pp276-283. [18] D. Ciresan, U. Meier, J. Schmidhuber, D. Cires, and U. Meier, “Multi-column Deep Neural Networks for Image Classification,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, no. February, pp. 3642-3649, doi: 10.1109/CVPR.2012.6248110. [19] D. Scharstein et al., “High-resolution stereo datasets with subpixel-accurate ground truth,” in German Conference on Pattern Recognition, 2014, doi: 10.1007/978-3-319-11752-2_3. [20] Z. Chen, X. Sun, L. Wang, Y. Yu, and C. Huang, “A Deep Visual Correspondence Embedding Model,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1-9, doi: 10.1109/ICCV.2015.117. [21] A. Kendall et al., “End-to-End Learning of Geometry and Context for Deep Stereo Regression,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, vol. 2017-Octob, pp. 66-75, doi: 10.1109/ICCV.2017.17. [22] J.-R. Chang and Y.-S. Chen, “Pyramid Stereo Matching Network,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, doi: 10.1109/CVPR.2017.730. [23] Y. Hu, W. Zhen, and S. Scherer, “Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 8637-8643, doi: 10.1109/ICRA40945.2020.9196655.
  • 8.  ISSN: 2502-4752 Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 223-231 230 [24] P. Knöbelreiter, C. Reinbacher, A. Shekhovtsov, and T. Pock, “End-to-end training of hybrid CNN-CRF models for stereo,” Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-Janua, 2017, pp. 1456-1465, doi: 10.1109/CVPR.2017.159. [25] X. Song, X. Zhao, H. Hu, and L. Fang, “EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching,” Lect. Notes Comput. Sci., vol. 11365, pp. 20-35, 2019, doi: 10.1007/978-3-030-20873-8_2. [26] S. Zhu, H. Xu, and L. Yan, “A Stereo Matching and Depth Map Acquisition Algorithm Based on Deep Learning and Improved Winner Takes All-Dynamic Programming,” IEEE Access, vol. 7, pp. 74625-74639, 2019, doi: 10.1109/ACCESS.2019.2921395. [27] J. Kang, L. Chen, F. Deng, and C. Heipke, “Context pyramidal network for stereo matching regularized by disparity gradients,” ISPRS J. Photogramm. Remote Sens., vol. 157, no. March, pp. 201-215, 2019, doi: 10.1016/j.isprsjprs.2019.09.012. [28] F. Li, Q. Li, T. Zhang, Y. Niu, and G. Shi, “Depth acquisition with the combination of structured light and deep learning stereo matching,” Signal Process. Image Commun., vol. 75, no. April, pp. 111-117, 2019, doi: 10.1016/j.image.2019.04.001. [29] M. S. Hamid, N. A. Manap, R. A. Hamzah, and A. F. Kadmin, “Converged classification network for matching cost computation,” Int. J. Adv. Sci. Technol., vol. 29, no. 6, 2020. [30] S. Tulyakov, A. Ivanov, and F. Fleuret, “Weakly Supervised Learning of Deep Metrics for Stereo Reconstruction,” in Proceedings of the IEEE International Conference on Computer Vision, vol. 2017-Octob, 2017, pp. 1348-1357, doi: 10.1109/ICCV.2017.150. [31] R. A. Jellal, M. Lange, B. Wassermann, A. Schilling, and A. Zell, “LS-ELAS: Line segment based efficient large scale stereo matching,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 146-152, doi: 10.1109/ICRA.2017.7989019. BIOGRAPHIES OF AUTHORS Mohd Saad Hamid received his B. Eng degree majoring in Computer from Multimedia University. He completed his M. Eng majoring in Computer and Communication from Universiti Kebangsaan Malaysia in 2014. Currently, he is a lecturer at Universiti Teknikal Malaysia Melaka. His research interests are computer vision, deep learning, image processing and embedded systems. He can be contacted at email: [email protected]. Nurulfajar Abd Manap earned a Master of Electrical Engineering in image processing (2002, Universiti Teknologi Malaysia) and Bachelor of Electrical Engineering (2000, Universiti Teknologi Malaysia). He obtained his PhD (Image and Video Processing) at the University of Strathclyde, Glasgow in 2012. At present, he teaches various electronics engineering subjects and courses in his affiliation for undergraduate and postgraduate levels. His subjects include Advanced Digital Signal Processing, Computer Vision and Pattern Recognition, Data Structures, Multimedia Technology & Applications and Distributed & High Performance Computing. His research interests are 3D image processing, stereo vision and video processing. He also one of MIET Chartered Engineer and Apple Distinguished Educators (ADE). He can be contacted at email: [email protected]. Rostam Affendi Hamzah graduated from Universiti Teknologi Malaysia where he received his B. Eng majoring in Electronic Engineering. Then he received his M. Sc. majoring in Electronic System Design engineering from the Universiti Sains Malaysia in 2010. In 2017, he received PhD majoring in Electronic Imaging from Universiti Sains Malaysia. Currently, he is a senior lecturer in the Universiti Teknikal Malaysia Melaka teaching Digital Electronics, Digital Image Processing and Embedded Systems. His research interests are computer vision, pattern recognition and digital image processing. He can be contacted at email: [email protected].
  • 9. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752  A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid) 231 Ahmad Fauzan Kadmin graduated with a Bachelor Degree in Electronics Engineering from Universiti Sains Malaysia and Master Degree in Computer & Communication Engineering from Universiti Kebangsaan Malaysia. He has over 14 years of experience in the electronic & computer engineering field with technical expert in R&D engineering, computer vision & medical electronics. He is also one of the MIET Chartered Engineers. He can be contacted at email: [email protected]. Shamsul Fakhar Abd Gani graduated from Universiti Malaysia Perlis (UniMAP) in Bachelor of Engineering (Computer Engineering) with honours in 2006 and later received his Master's degree in Internet & Web Computing in 2015 from Royal Melbourne Institute of Technology (RMIT) Australia. He started his career as an R&D electronic engineer specializing in software design for meter cluster development in Siemens VDO Automotive. Shamsul is now a lecturer in the electronic and computer engineering technology department of FTKEE UTeM. He can be contacted at email: [email protected]. Adi Irwan Herman graduated in 2015 with a Bachelor Degree in Computer Engineering Technology (Computer Systems) from the Universiti Teknikal Malaysia Melaka. Currently, he has been working with Texas Instrument for more than five years, with his current research interests being computer engineering-related fields of studies. He can be contacted at email: [email protected].