Building a Reward Matrix – Designing Your Datasets
Experimentation and implementation are the two main approaches to artificial intelligence. Experimentation largely entails trying ready-to-use datasets with black-box, ready-to-use Python examples. Implementation involves preparing a dataset, developing preprocessing algorithms, and then choosing a model along with the proper parameters and hyperparameters.
Implementation usually involves white box work that entails knowing exactly how an algorithm works and even being able to modify it.
In Chapter 1, Getting Started with Next-Generation Artificial Intelligence through Reinforcement Learning, the MDP-driven Bellman equation relied on a reward matrix. In this chapter, we will get our hands dirty in a white box process to create that reward matrix.
An MDP cannot run without a reward matrix. The reward matrix determines whether it is possible to go from one cell to another, from A to B. It is like a map of a city that tells you whether you are allowed to take a street, or whether a street is one-way. It can also set a goal, such as a place you would like to visit in the city.
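As a minimal sketch, a reward matrix can be encoded as a square array in which each cell marks whether a move between two locations is allowed and whether it reaches the goal. The six locations, the specific values, and the `can_move` helper below are illustrative assumptions, not the book's exact matrix:

```python
import numpy as np

# Hypothetical reward matrix for six locations A-F.
# R[i][j] = 0 means the move from location i to j is allowed,
# R[i][j] = 100 marks a move that reaches the goal, and
# R[i][j] = -1 means no direct path (like a blocked street on a map).
R = np.array([
    [-1,   0,  -1,  -1,  -1,  -1],
    [ 0,  -1,   0,  -1,   0,  -1],
    [-1,   0,  -1,   0,  -1,  -1],
    [-1,  -1,   0,  -1,   0, 100],
    [-1,   0,  -1,   0,  -1, 100],
    [-1,  -1,  -1,   0,   0, 100],
])

def can_move(i, j):
    # Any non-negative reward means the transition is possible.
    return R[i][j] >= 0
```

Reading the matrix row by row answers exactly the "map of a city" question: row `i` lists every street you may take out of location `i`.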
To achieve the goal of designing a reward matrix, the raw data provided by other systems, software, and sensors needs to go through preprocessing. A machine learning program will not provide efficient results if the data has not gone through a standardization process.
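One common form of the standardization mentioned above is rescaling each feature to zero mean and unit variance, so that no single raw measurement dominates the learning process. The sensor values below are made up for illustration:

```python
import numpy as np

# Hypothetical raw sensor readings on very different scales:
# column 0 is a distance in meters, column 1 a voltage.
raw = np.array([[200.0, 0.02],
                [220.0, 0.05],
                [180.0, 0.03]])

# Standardize each column: subtract its mean, divide by its
# standard deviation, yielding zero mean and unit variance.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)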
The reward matrix, R, will be built using a McCulloch-Pitts neuron in TensorFlow. Warehouse management has grown exponentially as e-commerce has taken over many market segments. This chapter introduces automated guided vehicles (AGVs), the warehouse equivalent of a self-driving car (SDC), used to store and retrieve products.
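The core idea of a McCulloch-Pitts neuron is simple enough to sketch before the TensorFlow version: a weighted sum of binary inputs compared against a threshold, producing a binary output. This plain NumPy sketch uses made-up weights and a made-up threshold purely to show the mechanism:

```python
import numpy as np

def mcp_neuron(inputs, weights, threshold):
    # McCulloch-Pitts neuron: fire (output 1) if the weighted
    # sum of the inputs reaches the threshold, else output 0.
    return 1 if np.dot(inputs, weights) >= threshold else 0
```

For example, with weights `[0.5, 0.5]` and threshold `1`, the neuron fires only when both inputs are active, behaving like a logical AND gate.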
The challenge in this chapter will be to understand the preprocessing phase in detail. The quality of the processed dataset will directly influence the accuracy of any machine learning algorithm.
This chapter covers the following topics:
- The McCulloch-Pitts neuron will take the raw data and transform it
- Logistic classifiers will begin the neural network process
- The logistic sigmoid will squash the values
- The softmax function will normalize the values
- The one-hot function will choose the target for the reward matrix
- An example of AGVs in a warehouse
These topics form a list of tools that, in turn, form a pipeline that will take raw data and transform it into a reward matrix for an MDP.
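The last three steps of that pipeline can be sketched end to end. The raw scores below are hypothetical stand-ins for the neuron's output; the functions show only the standard definitions of the logistic sigmoid, softmax, and one-hot steps:

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Normalizes a vector into probabilities that sum to 1
    # (subtracting the max first for numerical stability).
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical raw scores for six warehouse locations.
raw_scores = np.array([0.2, 1.5, -0.3, 2.7, 0.9, -1.1])

squashed = sigmoid(raw_scores)      # squash the values
normalized = softmax(squashed)      # normalize the values
# One-hot: keep only the highest value as the chosen target.
one_hot = (normalized == normalized.max()).astype(int)
```

Because the sigmoid and softmax are both monotonic, the location with the highest raw score ends up as the single 1 in the one-hot vector, which is exactly the target that gets written into the reward matrix.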