This document presents an improved reinforcement-learning-based routing (IRLR) algorithm for wireless mesh networks. It addresses a weakness of traditional reinforcement-learning routing algorithms, in particular the high packet drop rate that the ε-greedy policy incurs during the exploration phase. IRLR uses additional control packets during exploration and sends all data packets during exploitation, which improves the packet delivery ratio and reduces latency in simulations run with OMNeT++. The research is motivated by the growing demand for efficient handling of wireless network traffic, and it emphasizes the effectiveness of reinforcement learning in optimizing routing protocols.
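
The contrast between the two policies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-neighbor cost table `q_values` (lower is better), and the learning rate `alpha` are all assumptions made for illustration. Under ε-greedy, a data packet itself may be forwarded to a randomly chosen neighbor; in the IRLR-style split described above, data packets always follow the best-known neighbor, and random exploration is left to separate control packets.

```python
import random


def epsilon_greedy_next_hop(q_values, epsilon, rng):
    """Classic epsilon-greedy: the data packet itself may explore."""
    if rng.random() < epsilon:
        # Exploration: a random neighbor, possibly a poor one.
        return rng.choice(sorted(q_values))
    # Exploitation: the neighbor with the lowest estimated cost.
    return min(q_values, key=q_values.get)


def data_next_hop(q_values):
    """IRLR-style assumption: data packets are always exploited."""
    return min(q_values, key=q_values.get)


def probe_next_hop(q_values, rng):
    """IRLR-style assumption: a control packet explores a random neighbor."""
    return rng.choice(sorted(q_values))


def update_estimate(q_values, neighbor, observed_cost, alpha=0.5):
    """Move the estimate for `neighbor` toward the cost a packet observed."""
    q_values[neighbor] += alpha * (observed_cost - q_values[neighbor])
    return q_values[neighbor]
```

In this sketch, a neighbor whose estimate is improved by probe feedback only starts carrying data once its estimated cost drops below that of the current best neighbor, so no data packet is risked on an untested route.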