How-To Tutorials


Making a simple Web based SSH client using Node.js and Socket.io

Jakub Mandula
28 Oct 2015
7 min read
If you are reading this post, you probably know what SSH stands for. But just for the sake of formality, here we go: SSH stands for Secure Shell. It is a network protocol for secure access to the shell on a remote computer. You can do much more over SSH besides commanding your computer. You can find further information here: http://en.wikipedia.org/wiki/Secure_Shell.

In this post, we are going to create a very simple web terminal. And when I say simple, I mean it! However much you like colors, it will not support them, because the parsing is just beyond the scope of this post. If you want a good client-side terminal library, use term.js. It is made by the same guy who wrote pty.js, which we will be using. It is able to handle pretty much all key events and COLORS!

Installation

I am going to assume you already have node and npm installed. First we will install all of the npm packages we will be using:

npm install express pty.js socket.io

Express is a super cool web framework for Node. We are going to use it to serve our static files. I know it is a bit overkill, but I like Express. pty.js is where the magic will be happening. It forks processes into virtual pseudo terminals and provides bindings for communication. Socket.io is what we will use to transmit the data from the web browser to the server and back. It uses modern WebSockets, but provides fallbacks for backward compatibility. Anytime you want to create a real-time application, Socket.io is the way to go.

Planning

First things first, we need to think about what we want the program to do. We want the program to create an instance of a shell on the server (remote machine) and send all of the text to the browser. Back in the browser, we want to capture any user events and send them back to the server shell.

The WebSSH server

This is the code that will power the terminal forwarding. Open a new file named server.js and start by importing all of the libraries:

var express = require('express');
var https = require('https');
var http = require('http');
var fs = require('fs');
var pty = require('pty.js');

Set up express:

// Setup the express app
var app = express();
// Static file serving
app.use("/", express.static("./"));

Next we are going to create the server.

// Creating an HTTP server
var server = http.createServer(app).listen(8080)

If you want to use HTTPS, which you probably will, you need to generate a key and certificate and import them as shown.

var options = {
  key: fs.readFileSync('keys/key.pem'),
  cert: fs.readFileSync('keys/cert.pem')
};

Then use the options object to create the actual server. Notice that this time we are using the https package.

// Create an HTTPS server
var server = https.createServer(options, app).listen(8080)

CAUTION: Even if you use HTTPS, do not use this example program on the Internet. You are not authenticating the client in any way and are thus providing a free open gate to your computer. Please make sure you only use this on your private network, protected by a firewall!

Now bind the socket.io instance to the server:

var io = require('socket.io')(server);

After this, we can set up the place where the magic happens.
// When a new socket connects
io.on('connection', function(socket){
  // Create terminal
  var term = pty.spawn('sh', [], {
    name: 'xterm-color',
    cols: 80,
    rows: 30,
    cwd: process.env.HOME,
    env: process.env
  });
  // Listen on the terminal for output and send it to the client
  term.on('data', function(data){
    socket.emit('output', data);
  });
  // Listen on the client and send any input to the terminal
  socket.on('input', function(data){
    term.write(data);
  });
  // When socket disconnects, destroy the terminal
  socket.on("disconnect", function(){
    term.destroy();
    console.log("bye");
  });
});

In this block, all we do is wait for new connections. When we get one, we spawn a new virtual terminal and start to pump the data from the terminal to the socket and vice versa. After the socket disconnects, we make sure to destroy the terminal.

As you may have noticed, I am using the simple sh shell. I did this mainly because I don't have a fancy prompt on it. Because we are not adding any parsing logic, my bash prompt would show up like this: ]0;piman@mothership: ~ _[01;32m✓ [33mpiman_[0m ↣ _[1;34m[~]_[37m$[0m - Eww! But you may use any shell you like. This is all that we need on the server side. Save the file and close it.

Client side

The client side is going to be just a very simple HTML file. Start with a very simple HTML markup:

<!doctype html>
<html>
<head>
<title>SSH Client</title>
<script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/socket.io/1.3.5/socket.io.min.js"></script>
<script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>
<style>
body {
  margin: 0;
  padding: 0;
}
.terminal {
  font-family: monospace;
  color: white;
  background: black;
}
</style>
</head>
<body>
<h1>SSH</h1>
<div class="terminal">
</div>
<script>
</script>
</body>
</html>

I am downloading the client-side libraries jquery and socket.io from cdnjs. All of the client code will be written in the script tag below the terminal div. Surprisingly, the code is very simple:

// Connect to the socket.io server
var socket = io.connect('http://localhost:8080');
// Wait for data from the server
socket.on('output', function (data) {
  // Insert some line breaks where they belong
  data = data.replace("\n", "<br>");
  data = data.replace("\r", "<br>");
  // Append the data to our terminal
  $('.terminal').append(data);
});
// Listen for user input and pass it to the server
$(document).on("keypress", function(e){
  var char = String.fromCharCode(e.which);
  socket.emit("input", char);
});

Notice that we do not have to explicitly append the text the client types to the terminal, mainly because the server echoes it back anyway.

Now we are done! Run the server and open up the URL in your browser.

node server.js

You should see a small prompt and be able to start typing commands. You can now explore your machine from the browser! Remember that our web terminal does not support the Tab, Ctrl, Backspace, or Esc characters. Implementing this is your homework.

Conclusion

I hope you found this tutorial useful. You can apply the knowledge in any real-time application where communication with the server is critical. All the code is available here. Please note that if you'd like to use a browser terminal, I strongly recommend term.js. It supports colors and styles and all the basic keys, including Tab, Backspace, and so on. I use it in my PiDashboard project. It is much cleaner and less tedious than the example I have here. I can't wait to see what amazing apps you will invent based on this.
About the Author

Jakub Mandula is a student interested in anything to do with technology, computers, mathematics, or science.


How to handle categorical data for machine learning algorithms

Packt Editorial Staff
20 Sep 2019
9 min read
The quality of data and the amount of useful information are key factors that determine how well a machine learning algorithm can learn. Therefore, it is absolutely critical that we make sure to encode categorical variables correctly before we feed data into a machine learning algorithm. In this article, with simple yet effective examples, we will explain how to deal with categorical data in machine learning algorithms and how to map ordinal and nominal feature values to integer representations.

The article is an excerpt from the book Python Machine Learning - Third Edition by Sebastian Raschka and Vahid Mirjalili. This book is a comprehensive guide to machine learning and deep learning with Python. It acts as both a clear step-by-step tutorial and a reference you'll keep coming back to as you build your machine learning systems.

It is not uncommon for real-world datasets to contain one or more categorical feature columns. When we are talking about categorical data, we have to further distinguish between nominal and ordinal features. Ordinal features can be understood as categorical values that can be sorted or ordered. For example, t-shirt size would be an ordinal feature, because we can define an order XL > L > M. In contrast, nominal features don't imply any order and, to continue with the previous example, we could think of t-shirt color as a nominal feature, since it typically doesn't make sense to say that, for example, red is larger than blue.

Categorical data encoding with pandas

Before we explore different techniques to handle such categorical data, let's create a new DataFrame to illustrate the problem:

>>> import pandas as pd
>>> df = pd.DataFrame([
...            ['green', 'M', 10.1, 'class1'],
...            ['red', 'L', 13.5, 'class2'],
...            ['blue', 'XL', 15.3, 'class1']])
>>> df.columns = ['color', 'size', 'price', 'classlabel']
>>> df
    color size  price classlabel
0   green    M   10.1     class1
1     red    L   13.5     class2
2    blue   XL   15.3     class1

As we can see in the preceding output, the newly created DataFrame contains a nominal feature (color), an ordinal feature (size), and a numerical feature (price) column. The class labels (assuming that we created a dataset for a supervised learning task) are stored in the last column.

Mapping ordinal features

To make sure that the learning algorithm interprets the ordinal features correctly, we need to convert the categorical string values into integers. Unfortunately, there is no convenient function that can automatically derive the correct order of the labels of our size feature, so we have to define the mapping manually. In the following simple example, let's assume that we know the numerical difference between features, for example, XL = L + 1 = M + 2:

>>> size_mapping = {
...                 'XL': 3,
...                 'L': 2,
...                 'M': 1}
>>> df['size'] = df['size'].map(size_mapping)
>>> df
    color  size  price classlabel
0   green     1   10.1     class1
1     red     2   13.5     class2
2    blue     3   15.3     class1

If we want to transform the integer values back to the original string representation at a later stage, we can simply define a reverse-mapping dictionary, inv_size_mapping = {v: k for k, v in size_mapping.items()}, which can then be used via the pandas map method on the transformed feature column, similar to the size_mapping dictionary that we used previously.
We can use it as follows:

>>> inv_size_mapping = {v: k for k, v in size_mapping.items()}
>>> df['size'].map(inv_size_mapping)
0     M
1     L
2    XL
Name: size, dtype: object

Encoding class labels

Many machine learning libraries require that class labels are encoded as integer values. Although most estimators for classification in scikit-learn convert class labels to integers internally, it is considered good practice to provide class labels as integer arrays to avoid technical glitches. To encode the class labels, we can use an approach similar to the mapping of ordinal features discussed previously. We need to remember that class labels are not ordinal, and it doesn't matter which integer number we assign to a particular string label. Thus, we can simply enumerate the class labels, starting at 0:

>>> import numpy as np
>>> class_mapping = {label: idx for idx, label in
...                  enumerate(np.unique(df['classlabel']))}
>>> class_mapping
{'class1': 0, 'class2': 1}

Next, we can use the mapping dictionary to transform the class labels into integers:

>>> df['classlabel'] = df['classlabel'].map(class_mapping)
>>> df
    color  size  price  classlabel
0   green     1   10.1           0
1     red     2   13.5           1
2    blue     3   15.3           0

We can reverse the key-value pairs in the mapping dictionary as follows to map the converted class labels back to the original string representation:

>>> inv_class_mapping = {v: k for k, v in class_mapping.items()}
>>> df['classlabel'] = df['classlabel'].map(inv_class_mapping)
>>> df
    color  size  price classlabel
0   green     1   10.1     class1
1     red     2   13.5     class2
2    blue     3   15.3     class1

Alternatively, there is a convenient LabelEncoder class directly implemented in scikit-learn to achieve this:

>>> from sklearn.preprocessing import LabelEncoder
>>> class_le = LabelEncoder()
>>> y = class_le.fit_transform(df['classlabel'].values)
>>> y
array([0, 1, 0])

Note that the fit_transform method is just a shortcut for calling fit and transform separately, and we can use the inverse_transform method to transform the integer class labels back into their original string representation:

>>> class_le.inverse_transform(y)
array(['class1', 'class2', 'class1'], dtype=object)

Performing one-hot encoding on nominal features

In the Mapping ordinal features section, we used a simple dictionary-mapping approach to convert the ordinal size feature into integers. Since scikit-learn's estimators for classification treat class labels as categorical data that does not imply any order (nominal), we used the convenient LabelEncoder to encode the string labels into integers. It may appear that we could use a similar approach to transform the nominal color column of our dataset, as follows:

>>> X = df[['color', 'size', 'price']].values
>>> color_le = LabelEncoder()
>>> X[:, 0] = color_le.fit_transform(X[:, 0])
>>> X
array([[1, 1, 10.1],
       [2, 2, 13.5],
       [0, 3, 15.3]], dtype=object)

After executing the preceding code, the first column of the NumPy array X now holds the new color values, which are encoded as follows: blue = 0, green = 1, red = 2.

If we stop at this point and feed the array to our classifier, we will make one of the most common mistakes in dealing with categorical data. Can you spot the problem? Although the color values don't come in any particular order, a learning algorithm will now assume that green is larger than blue, and red is larger than green.
Although this assumption is incorrect, the algorithm could still produce useful results. However, those results would not be optimal.

A common workaround for this problem is to use a technique called one-hot encoding. The idea behind this approach is to create a new dummy feature for each unique value in the nominal feature column. Here, we would convert the color feature into three new features: blue, green, and red. Binary values can then be used to indicate the particular color of an example; for example, a blue example can be encoded as blue=1, green=0, red=0. To perform this transformation, we can use the OneHotEncoder that is implemented in scikit-learn's preprocessing module:

>>> from sklearn.preprocessing import OneHotEncoder
>>> X = df[['color', 'size', 'price']].values
>>> color_ohe = OneHotEncoder()
>>> color_ohe.fit_transform(X[:, 0].reshape(-1, 1)).toarray()
array([[0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])

Note that we applied the OneHotEncoder to a single column (X[:, 0].reshape(-1, 1)) only, to avoid modifying the other two columns in the array as well. If we want to selectively transform columns in a multi-feature array, we can use the ColumnTransformer, which accepts a list of (name, transformer, column(s)) tuples, as follows:

>>> from sklearn.compose import ColumnTransformer
>>> X = df[['color', 'size', 'price']].values
>>> c_transf = ColumnTransformer([
...     ('onehot', OneHotEncoder(), [0]),
...     ('nothing', 'passthrough', [1, 2])
... ])
>>> c_transf.fit_transform(X).astype(float)
array([[0.0, 1.0, 0.0, 1, 10.1],
       [0.0, 0.0, 1.0, 2, 13.5],
       [1.0, 0.0, 0.0, 3, 15.3]])

In the preceding code example, we specified that we only want to modify the first column and leave the other two columns untouched via the 'passthrough' argument.

An even more convenient way to create those dummy features via one-hot encoding is to use the get_dummies method implemented in pandas. Applied to a DataFrame, the get_dummies method will only convert string columns and leave all other columns unchanged:

>>> pd.get_dummies(df[['price', 'color', 'size']])
    price  size  color_blue  color_green  color_red
0    10.1     1           0            1          0
1    13.5     2           0            0          1
2    15.3     3           1            0          0

When we are using one-hot encoding on datasets, we have to keep in mind that it introduces multicollinearity, which can be an issue for certain methods (for instance, methods that require matrix inversion). If features are highly correlated, matrices are computationally difficult to invert, which can lead to numerically unstable estimates. To reduce the correlation among variables, we can simply remove one feature column from the one-hot encoded array. Note that we do not lose any important information by removing a feature column, though; for example, if we remove the column color_blue, the feature information is still preserved, since observing color_green=0 and color_red=0 implies that the observation must be blue.

If we use the get_dummies function, we can drop the first column by passing a True argument to the drop_first parameter, as shown in the following code example:

>>> pd.get_dummies(df[['price', 'color', 'size']],
...                drop_first=True)
    price  size  color_green  color_red
0    10.1     1            1          0
1    13.5     2            0          1
2    15.3     3            0          0

In order to drop a redundant column via the OneHotEncoder, we need to set drop='first' and set categories='auto' as follows:

>>> color_ohe = OneHotEncoder(categories='auto', drop='first')
>>> c_transf = ColumnTransformer([
...            ('onehot', color_ohe, [0]),
...            ('nothing', 'passthrough', [1, 2])
... ])
>>> c_transf.fit_transform(X).astype(float)
array([[ 1. ,  0. ,  1. , 10.1],
       [ 0. ,  1. ,  2. , 13.5],
       [ 0. ,  0. ,  3. , 15.3]])

In this article, we have gone through some of the methods to deal with categorical data in datasets. We distinguished between nominal and ordinal features, and with examples we explained how they can be handled. To harness the power of the latest Python open source libraries in machine learning, check out the book Python Machine Learning - Third Edition, written by Sebastian Raschka and Vahid Mirjalili.

Other interesting reads in data:

The best business intelligence tools 2019: when to use them and how much they cost
Introducing Microsoft's AirSim, an open-source simulator for autonomous vehicles built on Unreal Engine
Media manipulation by Deepfakes and cheap fakes require both AI and social fixes, finds a Data & Society report


What is LSTM?

Richard Gall
11 Apr 2018
3 min read
What does LSTM stand for?

LSTM stands for long short-term memory. It is a model, or architecture, that extends the memory of recurrent neural networks. Typically, recurrent neural networks have 'short-term memory' in that they use persistent previous information in the current neural network. Essentially, the previous information is used in the present task. That means we do not have a list of all of the previous information available for the neural node.

Find out how LSTM works alongside recurrent neural networks. Watch this short video tutorial.

How does LSTM work?

LSTM introduces long-term memory into recurrent neural networks. It mitigates the vanishing gradient problem, which is where the neural network stops learning because the updates to the various weights within a given neural network become smaller and smaller. It does this by using a series of 'gates'. These are contained in memory blocks which are connected through layers.

There are three types of gates within a unit:

Input Gate: Scales input to cell (write)
Output Gate: Scales output from cell (read)
Forget Gate: Scales old cell value (reset)

Each gate is like a switch that controls the read/write, thus incorporating the long-term memory function into the model.

Applications of LSTM

There is a huge range of ways that LSTM can be used, including:

Handwriting recognition
Time series anomaly detection
Speech recognition
Learning grammar
Composing music

The difference between LSTM and GRU

There are many similarities between LSTM and GRU (Gated Recurrent Units). However, there are some important differences that are worth remembering:

A GRU has two gates, whereas an LSTM has three gates.
GRUs don't possess any internal memory that is different from the exposed hidden state. They don't have the output gate, which is present in LSTMs.
There is no second nonlinearity applied when computing the output in a GRU.

GRU, as a concept, is a little newer than LSTM. It is generally more efficient, as it trains models at a quicker rate than LSTM. It is also easier to use: any modifications you need to make to a model can be done fairly easily. However, LSTM should perform better than GRU where longer-term memory is required. Ultimately, comparing performance is going to depend on the dataset you are using.

4 ways to enable Continual learning into Neural Networks
Build a generative chatbot using recurrent neural networks (LSTM RNNs)
How Deep Neural Networks can improve Speech Recognition and generation
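Returning to the gate structure and the LSTM versus GRU comparison above: both cell types are available as drop-in recurrent layers in most deep learning frameworks. The following sketch is not part of the original article; it assumes TensorFlow's bundled Keras is installed, and the build_model helper, layer size, sequence length, and random dummy data are arbitrary choices made purely for illustration.

# Minimal sketch: the same tiny sequence model built once with an LSTM layer
# and once with a GRU layer, to compare their parameter counts.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(cell="lstm", timesteps=20, features=8):
    # Identical model apart from the recurrent cell type
    recurrent = layers.LSTM(32) if cell == "lstm" else layers.GRU(32)
    model = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        recurrent,
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Random dummy sequences purely to make the example runnable
x = np.random.rand(64, 20, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")

lstm_model = build_model("lstm")
gru_model = build_model("gru")
lstm_model.fit(x, y, epochs=1, verbose=0)
gru_model.fit(x, y, epochs=1, verbose=0)

# The GRU layer has fewer parameters than the LSTM layer (two gates instead of
# three), which is one reason it tends to train faster
print(lstm_model.count_params(), gru_model.count_params())

Swapping the cell type is a one-line change, which makes it easy to compare the two architectures on your own dataset.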


How to deploy Serverless Applications in Go using AWS Lambda [Tutorial]

Savia Lobo
03 Oct 2018
12 min read
Building a serverless application allows you to focus on your application code instead of managing and operating infrastructure. If you choose AWS for this purpose, you do not have to think about provisioning or configuring servers, since AWS will handle all of this for you. This reduces your infrastructure management burden and helps you get faster time-to-market.

This tutorial is an excerpt taken from the book Hands-On Serverless Applications with Go, written by Mohamed Labouardy. In this book, you will learn how to design and build a production-ready application in Go using AWS serverless services with zero upfront infrastructure investment.

This article will cover the following points:

Build, deploy, and manage our Lambda functions going through some advanced AWS CLI commands
Publish multiple versions of the API
Learn how to separate multiple deployment environments (sandbox, staging, and production) with aliases
Cover the usage of the API Gateway stage variables to change the method endpoint's behavior

Lambda CLI commands

In this section, we will go through the various AWS Lambda commands that you might use while building your Lambda functions. We will also learn how you can use them to automate your deployment process.

The list-functions command

As its name implies, it lists all Lambda functions in the AWS region you provided. The following command will return all Lambda functions in the North Virginia region:

aws lambda list-functions --region us-east-1

For each function, the response includes the function's configuration information (FunctionName, resource usage, environment variables, IAM role, runtime environment, and so on), as shown in the following screenshot:

To list only some attributes, such as the function name, you can use the query filter option, as follows:

aws lambda list-functions --query Functions[].FunctionName[]

The create-function command

You should be familiar with this command, as it has been used multiple times to create a new Lambda function from scratch. In addition to the function's configuration, you can use the command to provide the deployment package (ZIP) in two ways:

ZIP file: It provides the path to the ZIP file of the code you are uploading with the --zip-file option:

aws lambda create-function --function-name UpdateMovie \
 --description "Update an existing movie" \
 --runtime go1.x \
 --role arn:aws:iam::ACCOUNT_ID:role/UpdateMovieRole \
 --handler main \
 --environment Variables={TABLE_NAME=movies} \
 --zip-file fileb://./deployment.zip \
 --region us-east-1

S3 bucket object: It provides the S3 bucket and object name with the --code option:

aws lambda create-function --function-name UpdateMovie \
 --description "Update an existing movie" \
 --runtime go1.x \
 --role arn:aws:iam::ACCOUNT_ID:role/UpdateMovieRole \
 --handler main \
 --environment Variables={TABLE_NAME=movies} \
 --code S3Bucket=movies-api-deployment-package,S3Key=deployment.zip \
 --region us-east-1

The aforementioned commands will return a summary of the function's settings in a JSON format, as follows:

It's worth mentioning that while creating your Lambda function, you might override the compute usage and network settings based on your function's behavior, with the following options:

--timeout: The default execution timeout is three seconds. When the three seconds are reached, AWS Lambda terminates your function. The maximum timeout you can set is five minutes.
--memory-size: The amount of memory given to your function when executed.
The default value is 128 MB and the maximum is 3,008 MB (in increments of 64 MB).
--vpc-config: This deploys the Lambda function in a private VPC. While it might be useful if the function requires communication with internal resources, it should ideally be avoided, as it impacts Lambda performance and scaling.

AWS doesn't allow you to set the CPU usage of your function, as it's calculated automatically based on the memory allocated for your function. CPU usage is proportional to the memory.

The update-function-code command

In addition to the AWS Management Console, you can update your Lambda function's code with the AWS CLI. The command requires the target Lambda function name and the new deployment package. Similarly to the previous command, you can provide the package as follows:

The path to the new .zip file:

aws lambda update-function-code --function-name UpdateMovie \
 --zip-file fileb://./deployment-1.0.0.zip \
 --region us-east-1

The S3 bucket where the .zip file is stored:

aws lambda update-function-code --function-name UpdateMovie \
 --s3-bucket movies-api-deployment-packages \
 --s3-key deployment-1.0.0.zip \
 --region us-east-1

This operation prints a new unique ID (called RevisionId) for each change in the Lambda function's code:

The get-function-configuration command

In order to retrieve the configuration information of a Lambda function, issue the following command:

aws lambda get-function-configuration --function-name UpdateMovie --region us-east-1

The preceding command will provide the same information in the output that was displayed when the create-function command was used. To retrieve configuration information for a specific Lambda version or alias (see the following section), you can use the --qualifier option.

The invoke command

So far, we have invoked our Lambda functions directly from the AWS Lambda Console and through HTTP events with API Gateway. In addition to that, Lambda can be invoked from the AWS CLI with the invoke command:

aws lambda invoke --function-name UpdateMovie result.json

The preceding command will invoke the UpdateMovie function and save the function's output in the result.json file:

The status code is 400, which is normal, as UpdateFunction is expecting a JSON input. Let's see how to provide a JSON to our function with the invoke command. Head back to the DynamoDB movies table, and pick a movie that you want to update. In this example, we will update the movie with the ID of 13, shown as follows:

Create a JSON file with a body attribute that contains the new movie item attribute, as the Lambda function is expecting the input to be in the API Gateway proxy request format:

{
  "body": "{\"id\":\"13\", \"name\":\"Deadpool 2\"}"
}

Finally, run the invoke command again with the JSON file as the input parameter:

aws lambda invoke --function-name UpdateMovie --payload file://input.json result.json

If you print the result.json content, the updated movie should be returned, shown as follows:

You can verify that the movie's name is updated in the DynamoDB table by invoking the FindAllMovies function:

aws lambda invoke --function-name FindAllMovies result.json

The body attribute should contain the new updated movie, shown as follows:

Head back to the DynamoDB Console; the movie with the ID of 13 should have a new name, as shown in the following screenshot:

The delete-function command

To delete a Lambda function, you can use the following command:

aws lambda delete-function --function-name UpdateMovie

By default, the command will delete all function versions and aliases.
To delete a specific version or alias, you might want to use the --qualifier option. By now, you should be familiar with all the AWS CLI commands you might need while building your serverless applications in AWS Lambda. In the upcoming section, we will see how to create different versions of your Lambda functions and maintain multiple environments with aliases.

Versions and aliases

When you're building your serverless application, you must separate your deployment environments to test new changes without impacting your production. Therefore, having multiple versions of your Lambda functions makes sense.

Versioning

A version represents a state of your function's code and configuration in time. By default, each Lambda function has the $LATEST version pointing to the latest changes of your function, as shown in the following screenshot:

In order to create a new version from the $LATEST version, click on Actions and Publish new version. Let's call it 1.0.0, as shown in the next screenshot:

The new version will be created with an ID=1 (incremental). Note the Lambda function ARN at the top of the window in the following screenshot; it has the version ID:

Once the version is created, you cannot update the function code, shown as follows:

Moreover, advanced settings, such as IAM roles, network configuration, and compute usage, cannot be changed, shown as follows:

Versions are called immutable, which means they cannot be changed once they're published; only the $LATEST version is editable.

Now we know how to publish a new version from the console. Let's publish a new version with the AWS CLI. But first, we need to update the FindAllMovies function, as we cannot publish a new version if no changes were made to $LATEST since publishing version 1.0.0. The new version will have a pagination system. The function will return only the number of items requested by the user. The following code will read the Count header parameter, convert it to a number, and use the Scan operation with the Limit parameter to fetch the movies from DynamoDB:

func findAll(request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
  size, err := strconv.Atoi(request.Headers["Count"])
  if err != nil {
    return events.APIGatewayProxyResponse{
      StatusCode: http.StatusBadRequest,
      Body: "Count Header should be a number",
    }, nil
  }
  ...
  svc := dynamodb.New(cfg)
  req := svc.ScanRequest(&dynamodb.ScanInput{
    TableName: aws.String(os.Getenv("TABLE_NAME")),
    Limit: aws.Int64(int64(size)),
  })
  ...
}

Next, we update the FindAllMovies Lambda function's code with the update-function-code command:

aws lambda update-function-code --function-name FindAllMovies \
 --zip-file fileb://./deployment.zip

Then, publish a new version, 1.1.0, based on the current configuration and code with the following command:

aws lambda publish-version --function-name FindAllMovies --description 1.1.0

Go back to the AWS Lambda Console and navigate to your FindAllMovies function; a new version should be created with a new ID=2, as shown in the following screenshot:

Now that our versions are created, let's test them out by using the AWS CLI invoke command.
FindAllMovies v1.0.0

Invoke the FindAllMovies v1.0.0 version with its ID in the qualifier parameter with the following command:

aws lambda invoke --function-name FindAllMovies --qualifier 1 result.json

result.json should have all the movies in the DynamoDB movies table, shown as follows: the output showing all the movies in the DynamoDB movies table. To know more about the output of FindAllMovies v1.1.0 and more about semantic versioning, head over to the book.

Aliases

An alias is a pointer to a specific version; it allows you to promote a function from one environment to another (such as staging to production). Aliases are mutable, unlike versions, which are immutable. To illustrate the concept of aliases, we will create two aliases, as illustrated in the following diagram: a Production alias pointing to the FindAllMovies Lambda function's 1.0.0 version, and a Staging alias that points to the function's 1.1.0 version. Then, we will configure API Gateway to use these aliases instead of the $LATEST version:

Head back to the FindAllMovies configuration page. If you click on the Qualifiers drop-down list, you should see a default alias called Unqualified pointing to your $LATEST version, as shown in the following screenshot:

To create a new alias, click on Actions and then Create a new alias called Staging. Select version 5 as the target, shown as follows:

Once created, the new alias should be added to the list of aliases, shown as follows:

Next, create a new alias for the Production environment that points to version 1.0.0 using the AWS command line:

aws lambda create-alias --function-name FindAllMovies \
 --name Production --description "Production environment" \
 --function-version 1

Similarly, the new alias should be successfully created:

Now that our aliases have been created, let's configure API Gateway to use those aliases with stage variables.

Stage variables

Stage variables are environment variables that can be used to change the runtime behavior of the API Gateway methods for each deployment stage. The following section will illustrate how to use stage variables with API Gateway.
On the API Gateway Console, navigate to the Movies API, click on the GET method, and update the target Lambda function to use a stage variable instead of a hardcoded Lambda function name, as shown in the following screenshot:

When you save it, a new prompt will ask you to grant API Gateway the permissions to call your Lambda function aliases, as shown in the following screenshot:

Execute the following commands to allow API Gateway to invoke the Production and Staging aliases:

Production alias:

aws lambda add-permission --function-name "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:FindAllMovies:Production" \
 --source-arn "arn:aws:execute-api:us-east-1:ACCOUNT_ID:API_ID/*/GET/movies" \
 --principal apigateway.amazonaws.com \
 --statement-id STATEMENT_ID \
 --action lambda:InvokeFunction

Staging alias:

aws lambda add-permission --function-name "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:FindAllMovies:Staging" \
 --source-arn "arn:aws:execute-api:us-east-1:ACCOUNT_ID:API_ID/*/GET/movies" \
 --principal apigateway.amazonaws.com \
 --statement-id STATEMENT_ID \
 --action lambda:InvokeFunction

Then, create a new stage called production, as shown in the next screenshot:

Next, click on the Stage Variables tab, and create a new stage variable called lambda with FindAllMovies:Production as its value, shown as follows:

Do the same for the staging environment, with the lambda variable pointing to the Lambda function's Staging alias, shown as follows:

To test the endpoint, use the cURL command or any REST client you're familiar with. I opt for Postman. A GET request on the API Gateway production stage's invoke URL should return all the movies in the database, shown as follows:

Do the same for the staging environment, with a new header key called Count=4; you should have only four movie items in return, shown as follows:

That's how you can maintain multiple environments of your Lambda functions. You can now easily promote the 1.1.0 version into production by changing the Production pointer to point to 1.1.0 instead of 1.0.0, and roll back to the previous working version in case of failure, without changing the API Gateway settings.

To summarize, we learned how to deploy serverless applications using AWS Lambda functions. If you've enjoyed reading this and want to know more about how to set up a CI/CD pipeline from scratch to automate the process of deploying Lambda functions to production with the Go programming language, check out the book Hands-On Serverless Applications with Go.

Keep your serverless AWS applications secure [Tutorial]
Azure Functions 2.0 launches with better workload support for serverless
How Serverless computing is making AI development easier


Step Detector and Step Counters Sensors

Packt
14 Apr 2016
13 min read
In this article by Varun Nagpal, author of the book Android Sensor Programming By Example, we will focus on learning about the use of the step detector and step counter sensors. These sensors are very similar to each other and are used to count steps. Both sensors are based on a common hardware sensor, which internally uses the accelerometer, but Android still treats them as logically separate sensors. Both of these sensors are highly battery optimized and consume very low power. Now, let's look at each individual sensor in detail.

The step counter sensor

The step counter sensor is used to get the total number of steps taken by the user since the last reboot (power on) of the phone. When the phone is restarted, the value of the step counter sensor is reset to zero. In the onSensorChanged() method, the number of steps is given by event.values[0]; although it's a float value, the fractional part is always zero. The event timestamp represents the time at which the last step was taken. This sensor is especially useful for those applications that don't want to run in the background and maintain the history of steps themselves.

This sensor works in batches and in continuous mode. If we specify 0 or no latency in the SensorManager.registerListener() method, then it works in continuous mode; otherwise, if we specify any latency, then it groups the events in batches and reports them at the specified latency. For prolonged usage of this sensor, it's recommended to use the batch mode, as it saves power. The step counter uses the on-change reporting mode, which means it reports the event as soon as there is a change in the value.

The step detector sensor

The step detector sensor triggers an event each time a step is taken by the user. The value reported in the onSensorChanged() method is always one, the fractional part being always zero, and the event timestamp is the time when the user's foot hit the ground. The step detector sensor has very low latency in reporting the steps, which is generally within 1 to 2 seconds.

The step detector sensor has lower accuracy and produces more false positives compared to the step counter sensor. The step counter sensor is more accurate, but has more latency in reporting the steps, as it uses this extra time after each step to remove any false positive values. The step detector sensor is recommended for those applications that want to track the steps in real time and want to maintain their own history of each and every step with its timestamp.

Time for action – using the step counter sensor in an activity

Now, you will learn how to use the step counter sensor with a simple example. The good thing about the step counter is that, unlike other sensors, your app doesn't need to tell the sensor when to start counting the steps and when to stop counting them. It automatically starts counting as soon as the phone is powered on.
For using it, we just have to register the listener with the sensor manager and then unregister it after using it. In the following example, we will show, in an Android activity, the total number of steps taken by the user since the last reboot (power on) of the phone.

We created a PedometerActivity and implemented it with the SensorEventListener interface, so that it can receive the sensor events. We initiated the SensorManager and Sensor objects of the step counter and also checked the sensor availability in the onCreate() method of the activity. We registered the listener in the onResume() method and unregistered it in the onPause() method as a standard practice. We used a TextView to display the total number of steps taken and update its latest value in the onSensorChanged() method.

public class PedometerActivity extends Activity implements SensorEventListener {
  private SensorManager mSensorManager;
  private Sensor mSensor;
  private boolean isSensorPresent = false;
  private TextView mStepsSinceReboot;

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_pedometer);
    mStepsSinceReboot = (TextView)findViewById(R.id.stepssincereboot);
    mSensorManager = (SensorManager) this.getSystemService(Context.SENSOR_SERVICE);
    if(mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_COUNTER) != null) {
      mSensor = mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_COUNTER);
      isSensorPresent = true;
    } else {
      isSensorPresent = false;
    }
  }

  @Override
  protected void onResume() {
    super.onResume();
    if(isSensorPresent) {
      mSensorManager.registerListener(this, mSensor, SensorManager.SENSOR_DELAY_NORMAL);
    }
  }

  @Override
  protected void onPause() {
    super.onPause();
    if(isSensorPresent) {
      mSensorManager.unregisterListener(this);
    }
  }

  @Override
  public void onSensorChanged(SensorEvent event) {
    mStepsSinceReboot.setText(String.valueOf(event.values[0]));
  }

Time for action – maintaining step history with the step detector sensor

The step counter sensor works well when we have to deal with the total number of steps taken by the user since the last reboot (power on) of the phone. It doesn't solve the purpose when we have to maintain the history of each and every step taken by the user. The step counter sensor may combine some steps and process them together, and it will only update with an aggregated count instead of reporting individual step details. For such cases, the step detector sensor is the right choice.

In our next example, we will use the step detector sensor to store the details of each step taken by the user, and we will show the total number of steps for each day since the application was installed. Our next example will consist of three major components of Android, namely a service, an SQLite database, and an activity. The Android service will be used to listen to all the individual step events using the step detector sensor when the app is in the background. All the individual step details will be stored in the SQLite database, and finally the activity will be used to display the list of total numbers of steps along with dates. Let's look at each component in detail.

The first component of our example is PedometerListActivity. We created a ListView in the activity to display the step count along with dates. Inside the onCreate() method of PedometerListActivity, we initiated the ListView and the ListAdapter required to populate the list.
Another important task that we do in the onCreate() method is starting the service (StepsService.class), which will listen to all the individual step events. We also make a call to the getDataForList() method, which is responsible for fetching the data for the ListView.

public class PedometerListActivity extends Activity {
  private ListView mSensorListView;
  private ListAdapter mListAdapter;

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    mSensorListView = (ListView)findViewById(R.id.steps_list);
    getDataForList();
    mListAdapter = new ListAdapter();
    mSensorListView.setAdapter(mListAdapter);
    Intent mStepsIntent = new Intent(getApplicationContext(), StepsService.class);
    startService(mStepsIntent);
  }

In our example, the DateStepsModel class is used as a POJO (Plain Old Java Object) class, which is a handy way of grouping logical data together, to store the total number of steps and the date. We also use the StepsDBHelper class to read and write the steps data in the database (discussed further in the next section). Inside the getDataForList() method, we initiate the object of the StepsDBHelper class and call the readStepsEntries() method of the StepsDBHelper class, which returns an ArrayList of DateStepsModel objects containing the total number of steps along with dates after reading from the database. The ListAdapter class is used for populating the values for the ListView, which internally uses the ArrayList of DateStepsModel as the data source. Each individual list item is a string, which is the concatenation of the date and the total number of steps.

class DateStepsModel {
  public String mDate;
  public int mStepCount;
}

private StepsDBHelper mStepsDBHelper;
private ArrayList<DateStepsModel> mStepCountList;

public void getDataForList() {
  mStepsDBHelper = new StepsDBHelper(this);
  mStepCountList = mStepsDBHelper.readStepsEntries();
}

private class ListAdapter extends BaseAdapter {
  private TextView mDateStepCountText;

  @Override
  public int getCount() {
    return mStepCountList.size();
  }

  @Override
  public Object getItem(int position) {
    return mStepCountList.get(position);
  }

  @Override
  public long getItemId(int position) {
    return position;
  }

  @Override
  public View getView(int position, View convertView, ViewGroup parent) {
    if(convertView == null){
      convertView = getLayoutInflater().inflate(R.layout.list_rows, parent, false);
    }
    mDateStepCountText = (TextView)convertView.findViewById(R.id.sensor_name);
    mDateStepCountText.setText(mStepCountList.get(position).mDate + " - Total Steps: " + String.valueOf(mStepCountList.get(position).mStepCount));
    return convertView;
  }
}

The second component of our example is StepsService, which runs in the background and listens to the step detector sensor until the app is uninstalled. We implemented this service with the SensorEventListener interface so that it can receive the sensor events. We also initiated the objects of StepsDBHelper, SensorManager, and the step detector sensor inside the onCreate() method of the service. We only register the listener when the step detector sensor is available on the device. A point to note here is that we never unregister the listener, because we expect our app to log the step information indefinitely until the app is uninstalled. Both the step detector and step counter sensors are very low on battery consumption and are highly optimized at the hardware level, so if the app really requires it, it can use them for longer durations without affecting battery consumption much.
We get a step detector sensor callback in the onSensorChanged() method whenever the operating system detects a step, and from this callback we call the createStepsEntry() method of the StepsDBHelper class to store the step information in the database.

public class StepsService extends Service implements SensorEventListener {
  private SensorManager mSensorManager;
  private Sensor mStepDetectorSensor;
  private StepsDBHelper mStepsDBHelper;

  @Override
  public void onCreate() {
    super.onCreate();
    mSensorManager = (SensorManager) this.getSystemService(Context.SENSOR_SERVICE);
    if(mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_DETECTOR) != null) {
      mStepDetectorSensor = mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_DETECTOR);
      mSensorManager.registerListener(this, mStepDetectorSensor, SensorManager.SENSOR_DELAY_NORMAL);
      mStepsDBHelper = new StepsDBHelper(this);
    }
  }

  @Override
  public int onStartCommand(Intent intent, int flags, int startId) {
    return Service.START_STICKY;
  }

  @Override
  public void onSensorChanged(SensorEvent event) {
    mStepsDBHelper.createStepsEntry();
  }

The last component of our example is the SQLite database. We created a StepsDBHelper class and extended it from the SQLiteOpenHelper abstract utility class provided by the Android framework to easily manage database operations. In the class, we created a database called StepsDatabase, which is automatically created on the first object creation of the StepsDBHelper class by the onCreate() method. This database has one table, StepsSummary, which consists of only three columns (id, stepscount, and creationdate). The first column, id, is the unique integer identifier for each row of the table and is incremented automatically on creation of every new row. The second column, stepscount, is used to store the total number of steps taken on each date. The third column, creationdate, is used to store the date in the mm/dd/yyyy string format.

Inside the createStepsEntry() method, we first check whether there is an existing step count for the current date. If we find one, we read the existing step count of the current date and update the step count by incrementing it by 1. If there is no step count for the current date, then we assume that it is the first step of the current date and we create a new entry in the table with the current date and a step count value of 1. The createStepsEntry() method is called from onSensorChanged() of the StepsService class whenever a new step is detected by the step detector sensor.
public class StepsDBHelper extends SQLiteOpenHelper {
  private static final int DATABASE_VERSION = 1;
  private static final String DATABASE_NAME = "StepsDatabase";
  private static final String TABLE_STEPS_SUMMARY = "StepsSummary";
  private static final String ID = "id";
  private static final String STEPS_COUNT = "stepscount";
  private static final String CREATION_DATE = "creationdate"; // Date format is mm/dd/yyyy
  private static final String CREATE_TABLE_STEPS_SUMMARY = "CREATE TABLE " + TABLE_STEPS_SUMMARY + "(" + ID + " INTEGER PRIMARY KEY AUTOINCREMENT," + CREATION_DATE + " TEXT," + STEPS_COUNT + " INTEGER" + ")";

  StepsDBHelper(Context context) {
    super(context, DATABASE_NAME, null, DATABASE_VERSION);
  }

  @Override
  public void onCreate(SQLiteDatabase db) {
    db.execSQL(CREATE_TABLE_STEPS_SUMMARY);
  }

  public boolean createStepsEntry() {
    boolean isDateAlreadyPresent = false;
    boolean createSuccessful = false;
    int currentDateStepCounts = 0;
    Calendar mCalendar = Calendar.getInstance();
    String todayDate = String.valueOf(mCalendar.get(Calendar.MONTH)) + "/" + String.valueOf(mCalendar.get(Calendar.DAY_OF_MONTH)) + "/" + String.valueOf(mCalendar.get(Calendar.YEAR));
    String selectQuery = "SELECT " + STEPS_COUNT + " FROM " + TABLE_STEPS_SUMMARY + " WHERE " + CREATION_DATE + " = '" + todayDate + "'";
    try {
      SQLiteDatabase db = this.getReadableDatabase();
      Cursor c = db.rawQuery(selectQuery, null);
      if (c.moveToFirst()) {
        do {
          isDateAlreadyPresent = true;
          currentDateStepCounts = c.getInt((c.getColumnIndex(STEPS_COUNT)));
        } while (c.moveToNext());
      }
      db.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
    try {
      SQLiteDatabase db = this.getWritableDatabase();
      ContentValues values = new ContentValues();
      values.put(CREATION_DATE, todayDate);
      if(isDateAlreadyPresent) {
        values.put(STEPS_COUNT, ++currentDateStepCounts);
        int row = db.update(TABLE_STEPS_SUMMARY, values, CREATION_DATE + " = '" + todayDate + "'", null);
        if(row == 1) {
          createSuccessful = true;
        }
        db.close();
      } else {
        values.put(STEPS_COUNT, 1);
        long row = db.insert(TABLE_STEPS_SUMMARY, null, values);
        if(row != -1) {
          createSuccessful = true;
        }
        db.close();
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
    return createSuccessful;
  }

The readStepsEntries() method is called from PedometerListActivity to display the total number of steps along with the date in the ListView. The readStepsEntries() method reads all the step counts along with their dates from the table and fills the ArrayList of DateStepsModel, which is used as the data source for populating the ListView in PedometerListActivity.

public ArrayList<DateStepsModel> readStepsEntries() {
  ArrayList<DateStepsModel> mStepCountList = new ArrayList<DateStepsModel>();
  String selectQuery = "SELECT * FROM " + TABLE_STEPS_SUMMARY;
  try {
    SQLiteDatabase db = this.getReadableDatabase();
    Cursor c = db.rawQuery(selectQuery, null);
    if (c.moveToFirst()) {
      do {
        DateStepsModel mDateStepsModel = new DateStepsModel();
        mDateStepsModel.mDate = c.getString((c.getColumnIndex(CREATION_DATE)));
        mDateStepsModel.mStepCount = c.getInt((c.getColumnIndex(STEPS_COUNT)));
        mStepCountList.add(mDateStepsModel);
      } while (c.moveToNext());
    }
    db.close();
  } catch (Exception e) {
    e.printStackTrace();
  }
  return mStepCountList;
}

What just happened?

We created a small pedometer utility app that maintains the step history along with dates using the step detector sensor. We used PedometerListActivity to display the list of the total number of steps along with their dates. StepsService is used to listen to all the steps detected by the step detector sensor in the background.
And finally, the StepsDBHelper class is used to create and update the total step count for each date, and to read the total step counts along with dates from the database.

Resources for Article:

Further resources on this subject:
Introducing the Android UI [article]
Building your first Android Wear Application [article]
Mobile Phone Forensics – A First Step into Android Forensics [article]


Brett Lantz on implementing a decision tree using C5.0 algorithm in R

Packt Editorial Staff
29 Mar 2019
9 min read
Decision tree learners are powerful classifiers that utilize a tree structure to model the relationships among the features and the potential outcomes. This structure earned its name due to the fact that it mirrors the way a literal tree begins at a wide trunk and splits into narrower and narrower branches as it is followed upward. In much the same way, a decision tree classifier uses a structure of branching decisions that channel examples into a final predicted class value. In this article, we demonstrate the implementation of a decision tree using the C5.0 algorithm in R.

This article is taken from the book Machine Learning with R, Fourth Edition, written by Brett Lantz. This 10th Anniversary Edition of the classic R data science book is updated to R 4.0.0 with newer and better libraries. This book features several new chapters that reflect the progress of machine learning in the last few years and help you build your data science skills and tackle more challenging problems.

There are numerous implementations of decision trees, but the most well-known is the C5.0 algorithm. This algorithm was developed by computer scientist J. Ross Quinlan as an improved version of his prior algorithm, C4.5 (C4.5 itself is an improvement over his Iterative Dichotomiser 3 (ID3) algorithm). Although Quinlan markets C5.0 to commercial clients (see http://www.rulequest.com/ for details), the source code for a single-threaded version of the algorithm was made public, and has therefore been incorporated into programs such as R.

The C5.0 decision tree algorithm

The C5.0 algorithm has become the industry standard for producing decision trees because it does well for most types of problems directly out of the box. Compared to other advanced machine learning models, the decision trees built by C5.0 generally perform nearly as well but are much easier to understand and deploy. Additionally, as shown in the following lists, the algorithm's weaknesses are relatively minor and can be largely avoided.

Strengths:
An all-purpose classifier that does well on many types of problems.
Highly automatic learning process, which can handle numeric or nominal features, as well as missing data.
Excludes unimportant features.
Can be used on both small and large datasets.
Results in a model that can be interpreted without a mathematical background (for relatively small trees).
More efficient than other complex models.

Weaknesses:
Decision tree models are often biased toward splits on features having a large number of levels.
It is easy to overfit or underfit the model.
Can have trouble modeling some relationships due to reliance on axis-parallel splits.
Small changes in training data can result in large changes to decision logic.
Large trees can be difficult to interpret and the decisions they make may seem counterintuitive.

To keep things simple, our earlier decision tree example ignored the mathematics involved in how a machine would employ a divide and conquer strategy. Let's explore this in more detail to examine how this heuristic works in practice.

Choosing the best split

The first challenge that a decision tree will face is to identify which feature to split upon. In the previous example, we looked for a way to split the data such that the resulting partitions contained examples primarily of a single class. The degree to which a subset of examples contains only a single class is known as purity, and any subset composed of only a single class is called pure.
There are various measurements of purity that can be used to identify the best decision tree splitting candidate. C5.0 uses entropy, a concept borrowed from information theory that quantifies the randomness, or disorder, within a set of class values. Sets with high entropy are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality. The decision tree hopes to find splits that reduce entropy, ultimately increasing homogeneity within the groups.

Typically, entropy is measured in bits. If there are only two possible classes, entropy values can range from 0 to 1. For n classes, entropy ranges from 0 to log2(n). In each case, the minimum value indicates that the sample is completely homogenous, while the maximum value indicates that the data are as diverse as possible, and no group has even a small plurality. In mathematical notation, entropy is specified as:

Entropy(S) = \sum_{i=1}^{c} -p_i \log_2(p_i)

In this formula, for a given segment of data (S), the term c refers to the number of class levels, and p_i refers to the proportion of values falling into class level i. For example, suppose we have a partition of data with two classes: red (60 percent) and white (40 percent). We can calculate the entropy as:

> -0.60 * log2(0.60) - 0.40 * log2(0.40)
[1] 0.9709506

We can visualize the entropy for all possible two-class arrangements. If we know the proportion of examples in one class is x, then the proportion in the other class is (1 - x). Using the curve() function, we can then plot the entropy for all possible values of x:

> curve(-x * log2(x) - (1 - x) * log2(1 - x),
    col = "red", xlab = "x", ylab = "Entropy", lwd = 4)

This results in the following figure (caption: The total entropy as the proportion of one class varies in a two-class outcome). As illustrated by the peak in entropy at x = 0.50, a 50-50 split results in the maximum entropy. As one class increasingly dominates the other, the entropy reduces to zero.

To use entropy to determine the optimal feature to split upon, the algorithm calculates the change in homogeneity that would result from a split on each possible feature, a measure known as information gain. The information gain for a feature F is calculated as the difference between the entropy in the segment before the split (S1) and the partitions resulting from the split (S2):

InfoGain(F) = Entropy(S_1) - Entropy(S_2)

One complication is that after a split, the data is divided into more than one partition. Therefore, the function to calculate Entropy(S2) needs to consider the total entropy across all of the partitions. It does this by weighting each partition's entropy according to the proportion of all records falling into that partition. This can be stated in a formula as:

Entropy(S) = \sum_{i=1}^{n} w_i \, Entropy(P_i)

In simple terms, the total entropy resulting from a split is the sum of the entropy of each of the n partitions, weighted by the proportion of examples falling in that partition (w_i). The higher the information gain, the better a feature is at creating homogeneous groups after a split on that feature. If the information gain is zero, there is no reduction in entropy for splitting on this feature. On the other hand, the maximum information gain is equal to the entropy prior to the split. This would imply the entropy after the split is zero, which means that the split results in completely homogeneous groups.

The previous formulas assume nominal features, but decision trees use information gain for splitting on numeric features as well.
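To make the entropy and information gain formulas above concrete, here is a minimal R sketch of our own (not code from the book). The helper names entropy() and info_gain(), along with the toy red/white labels and the "left"/"right" split, are assumptions introduced purely for illustration.

# Shannon entropy (in bits) of a vector of observed class labels
entropy <- function(labels) {
  p <- prop.table(table(labels))   # proportion of each observed class
  -sum(p * log2(p))
}

# Information gain from splitting `labels` according to a grouping vector `groups`
info_gain <- function(labels, groups) {
  w <- prop.table(table(groups))                   # weight w_i of each partition
  parts <- split(labels, groups)                   # the partitions themselves
  weighted <- sum(w[names(parts)] * sapply(parts, entropy))
  entropy(labels) - weighted                       # Entropy(S1) - Entropy(S2)
}

# Reproducing the 60 percent red / 40 percent white example from above
labels <- c(rep("red", 6), rep("white", 4))
entropy(labels)    # 0.9709506, matching the value computed earlier

# A hypothetical split: the first partition is pure, the second is mixed
groups <- c(rep("left", 5), rep("right", 5))
info_gain(labels, groups)

The same weighted-entropy arithmetic is what the algorithm repeats for every candidate feature before picking the split with the largest gain.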
For numeric features, a common practice is to test various splits that divide the values into groups greater than or less than a threshold. This reduces the numeric feature into a two-level categorical feature that allows information gain to be calculated as usual. The numeric cut point yielding the largest information gain is chosen for the split.

Note: Though it is used by C5.0, information gain is not the only splitting criterion that can be used to build decision trees. Other commonly used criteria are the Gini index, chi-squared statistic, and gain ratio. For a review of these (and many more) criteria, refer to An Empirical Comparison of Selection Measures for Decision-Tree Induction, Mingers, J, Machine Learning, 1989, Vol. 3, pp. 319-342.

Pruning the decision tree

As mentioned earlier, a decision tree can continue to grow indefinitely, choosing splitting features and dividing into smaller and smaller partitions until each example is perfectly classified or the algorithm runs out of features to split on. However, if the tree grows overly large, many of the decisions it makes will be overly specific and the model will be overfitted to the training data. The process of pruning a decision tree involves reducing its size such that it generalizes better to unseen data.

One solution to this problem is to stop the tree from growing once it reaches a certain number of decisions or when the decision nodes contain only a small number of examples. This is called early stopping or prepruning the decision tree. As the tree avoids doing needless work, this is an appealing strategy. However, one downside to this approach is that there is no way to know whether the tree will miss subtle but important patterns that it would have learned had it grown to a larger size.

An alternative, called post-pruning, involves growing a tree that is intentionally too large and pruning leaf nodes to reduce the size of the tree to a more appropriate level. This is often a more effective approach than prepruning because it is quite difficult to determine the optimal depth of a decision tree without growing it first. Pruning the tree later on allows the algorithm to be certain that all of the important data structures were discovered.

Note: The implementation details of pruning operations are very technical and beyond the scope of this book. For a comparison of some of the available methods, see A Comparative Analysis of Methods for Pruning Decision Trees, Esposito, F, Malerba, D, Semeraro, G, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19, pp. 476-491.

One of the benefits of the C5.0 algorithm is that it is opinionated about pruning: it takes care of many of the decisions automatically using fairly reasonable defaults. Its overall strategy is to post-prune the tree. It first grows a large tree that overfits the training data. Later, the nodes and branches that have little effect on the classification errors are removed. In some cases, entire branches are moved further up the tree or replaced by simpler decisions. These processes of grafting branches are known as subtree raising and subtree replacement, respectively.

Getting the right balance of overfitting and underfitting is a bit of an art, but if model accuracy is vital, it may be worth investing some time with various pruning options to see if they improve performance on the test dataset; a brief sketch of how these options are typically exposed in R follows below. To summarize, decision trees are widely used due to their high accuracy and ability to formulate a statistical model in plain language.
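As a rough illustration of those pruning controls, the following sketch shows how they are typically exposed by the C50 package on CRAN. The argument names (trials, CF, minCases, noGlobalPruning) reflect that package's interface as we recall it, and credit_train with its default outcome column is a hypothetical placeholder dataset, so verify the details against the package documentation before relying on them.

library(C50)   # assumed: the CRAN C50 package, which provides C5.0() and C5.0Control()

# credit_train and its 'default' column are hypothetical placeholders for your own data
ctrl <- C5.0Control(
  CF = 0.10,               # a lower confidence factor prunes more aggressively
  minCases = 10,           # require more examples per leaf, a simple form of prepruning
  noGlobalPruning = FALSE  # keep the final global post-pruning pass enabled
)

model <- C5.0(default ~ ., data = credit_train,
              trials = 1,          # boosting iterations; 1 builds a single pruned tree
              control = ctrl)

summary(model)   # inspect tree size and training error under these settings

Comparing the test-set accuracy of a few such configurations is usually a quicker way to find a good bias-variance balance than reasoning about tree depth in the abstract.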
Here, we looked at a highly popular and easily configurable decision tree algorithm, C5.0. The major strength of the C5.0 algorithm over other decision tree implementations is that it is very easy to adjust the training options. Harness the power of R to build flexible, effective, and transparent machine learning models with Brett Lantz's latest book, Machine Learning with R, Fourth Edition.

Dr. Brandon explains Decision Trees to Jon
Building a classification system with Decision Trees in Apache Spark 2.0
Implementing Decision Trees

“Rust is the future of systems programming, C is the new Assembly”: Intel principal engineer, Josh Triplett

Bhagyashree R
27 Aug 2019
10 min read
At the Open Source Technology Summit (OSTS) 2019, Josh Triplett, a Principal Engineer at Intel, gave an insight into what Intel is contributing to bring the most loved language, Rust, to full parity with C. In his talk, titled Intel and Rust: the Future of Systems Programming, he also spoke about the history of systems programming, how C became the "default" systems programming language, which features of Rust give it an edge over C, and much more.

Until now, OSTS was Intel's closed event where the company's business and tech leaders came together to discuss the various trends, technologies, and innovations that will help shape the open-source ecosystem. However, this year was different, as the company welcomed non-Intel attendees, including media, partners, and developers, for the first time. The event hosts keynotes, more than 50 technical sessions, panels, and demos covering all the open source technologies Intel is involved in. These include integrated software stacks (edge, AI, infrastructure), firmware, embedded and IoT projects, and cloud system software. This year the event took place from May 14 to 16 in Stevenson, Washington.

What is systems programming?

Systems programming is the development and management of software that serves as a platform for other software to be built upon. The system software also directly or closely interfaces with computer hardware in order to gain necessary performance and expose abstractions. Unlike application programming, where software is created to provide services to the user, it aims to produce software that provides services to the computer hardware.

Triplett broadly defines systems programming as "anything that isn't an app." It includes things like BIOS, firmware, boot loaders, operating system kernels, embedded and similar types of low-level code, and virtual machine implementations. Triplett also counts a web browser as system software, as browsers are more than "just an app"; they are actually "platforms for websites and web apps," he says.

How C became the "default" systems programming language

Previously, most system software, including BIOS, boot loaders, and firmware, was written in Assembly. In the 1960s, experiments to bring hardware support to high-level languages started, which resulted in the creation of languages such as PL/S, BLISS, BCPL, and extended ALGOL. Then in the 1970s, Dennis Ritchie created the C programming language for the Unix operating system. Derived from the typeless B programming language, C was packed with powerful high-level functionality and detailed features that were best suited for writing an operating system. Several UNIX components, including its kernel, were eventually rewritten in C. Much other system software, including the Oracle database, a large portion of the Windows source code, and the Linux operating system, was written in C. C was seeing huge adoption at this point.

But what exactly made developers comfortable moving to C? Triplett believes that in order to make this move from one language to another, developers have to be comfortable in terms of two things: features and parity. First, the language should offer "sufficiently compelling" features. "It can't just be a little bit better. It has to be substantially better to warrant the effort and engineering time needed to move," he adds. As compared to Assembly, C had a lot to offer. It had some degree of type safety, provided portability, and offered better productivity with high-level constructs and much more readable code.
Second, the language has to provide parity, which means developers had to be confident that it is no less capable than Assembly. He states, "It can't just be better, it also has to be no worse." In addition to being faster and expressing any type of data that Assembly was able to, C also had what Triplett calls an "escape hatch." This means you are allowed to make the move incrementally and also combine Assembly if required.

Triplett believes that C is now becoming what Assembly was years ago. "C is the new Assembly," he concludes. Developers are looking for a high-level language that not only addresses the problems in C that can't be fixed but also leverages other exciting features that these languages provide. A language that aims to be compelling enough to make developers move away from C should be memory safe, provide automatic memory management and security, and much more. "Any language that wants to be better than C has to offer a lot more than just protection from buffer overflows if it's actually going to be a compelling alternative. People care about usability and productivity. They care about writing code that is self-explanatory, which accomplishes more work in less code. It also needs to address security issues. Usability and productivity go hand in hand with security. The less code you need to write to accomplish something, the less chance you have of introducing bugs, security bugs or otherwise," he explains.

Comparing Rust with C

Back in 2006, Graydon Hoare, a Mozilla employee, started writing Rust as a personal project. Mozilla, in 2009, started sponsoring the project and also expanded the team to drive further development of the language. One of the reasons why Mozilla got interested is that Firefox was written in more than 4 million lines of C++ code and had quite a few highly critical vulnerabilities. Rust was built with safety and concurrency in mind, making it the perfect choice for rewriting many components of Firefox under Project Quantum. Mozilla is also using Rust to develop Servo, an HTML rendering engine that will eventually replace Firefox's rendering engine. Many other companies have also started using Rust for their projects, including Microsoft, Google, Facebook, Amazon, Dropbox, Fastly, Chef, Baidu, and more.

Rust addresses the memory management problem in C. It offers automatic memory management so that developers do not have to manually call free on every object. What sets it apart from other modern languages is that it does not have a garbage collector or runtime system of any kind. Rust instead has the concepts of ownership, borrowing, references, and lifetimes (a minimal sketch of ownership and borrowing follows below). "Rust has a system of declaring whether any given use of an object is the owner of that object or whether it's just borrowing that object temporarily. If you're just borrowing an object the compiler will keep track of that. It'll make sure that the original sticks around as long as you reference it. Rust makes sure that the owner of the object frees it when it's done and it inserts the call to free at compile time with no extra runtime overhead," Triplett explains.

Not having a runtime is also a plus for Rust. Triplett believes that languages that have a runtime are difficult to use for systems programming. He adds, "You have to initialize that runtime before you can call any code, you have to use that runtime to call functions, and the runtime itself might run extra code behind your back at unexpected times." Rust also aims to provide safe concurrent programming.
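Before turning to concurrency, here is a minimal, self-contained Rust sketch of the ownership and borrowing rules Triplett describes above; this is our own illustration rather than code from the talk.

fn measure(s: &str) -> usize {
    s.len() // read-only access through a borrow; the caller keeps ownership
}

fn main() {
    let owner = String::from("systems programming"); // `owner` owns the heap allocation

    let len = measure(&owner);                // lend the value out as an immutable borrow
    println!("{} has {} bytes", owner, len);  // still usable: it was only borrowed above

    let new_owner = owner;                    // ownership moves; `owner` can no longer be used
    println!("{}", new_owner);
    // println!("{}", owner);                 // uncommenting this would be a compile-time error

    // When `new_owner` goes out of scope, the compiler inserts the call to free for us.
}

The compile-time move and borrow checks are exactly the "no runtime overhead" bookkeeping described in the quote above.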
The same features that make it memory safe keep track of things like which thread owns which object, which objects can be passed between threads, and which objects require acquiring locks. These features make Rust compelling enough for developers to choose for systems programming. However, on the second criterion, Rust does not have enough parity with C yet. "Achieving parity with C is exactly what got me involved in Rust," says Triplett.

Teaching Rust about C-compatible unions

Triplett's first contribution to the Rust programming language was in the form of RFC 1444, which was started in 2015 and got accepted in 2016. This RFC proposed to bring native support for C-compatible unions in Rust that would be defined via a new "contextual keyword," union. Triplett understood the need for this proposal when he wanted to build a virtual machine in Rust and the Linux kernel interface for that, /dev/kvm, required unions. "I worked with the Rust community and with the language team to get unions into Rust and because of that work I'm actually now part of the Rust language governance team helping to evaluate and guide other changes into the language," he adds. He talked about this RFC in much detail at the very first RustConf in 2016: https://www.youtube.com/watch?v=U8Gl3RTXf88

Support for unnamed struct and union types

Another feature that Triplett worked on was the support for unnamed struct and union types in Rust. This has been a widespread C compiler extension for decades and was also included in the C11 standard. This allowed developers to group and lay out fields in arbitrary ways to match C data structures used in the Foreign Function Interface (FFI). With this proposal implemented, Rust will be able to represent such types using the same names as the structures, without interposing artificial field names that would confuse users of well-established interfaces from existing platforms.

Stabilized support for inline Assembly in Rust

Systems programming often involves low-level manipulations and requires low-level details of the processors, such as privileged instructions. For this, Rust supports inline Assembly via the 'asm!' macro. However, it is only present in the nightly compiler and not yet stabilized. Triplett, in collaboration with other Rust developers, is writing a proposal to introduce a more robust syntax for inline Assembly. To know more about support for inline Assembly, check out this pre-RFC.

BFLOAT16 support in Rust

Many Intel processors, including Xeon Scalable 'Cooper Lake-SP', now support BFLOAT16, a new floating-point format. This truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format was mainly designed for deep learning. This format is also used in machine learning libraries like TensorFlow that work with huge datasets. It also makes interoperating with existing systems, functions, and storage much easier. This is why Triplett is working on adding support for BFLOAT16 in Rust, so that developers will be able to use the full capabilities of their hardware.

FFI/C Parity Working Group

This was one of the important announcements that Triplett made. He is starting a working group that will focus on achieving full parity with C. Under this group, he aims to collaborate with both the Rust community and other Intel developers to develop the specifications for the remaining features that need to be implemented in Rust for systems programming.
This group will also focus on bringing support for systems programming using the stable releases of Rust, not just experimental nightly releases of the compiler. In last week's Reddit discussion, Triplett shared the current status of the working group: "To pre-answer, one question: the FFI / C Parity working group is in the process of being launched, and hasn't quite kicked off yet. I'll be posting about it here and elsewhere when it is, along with the initial goals."

Watch Josh Triplett's full OSTS talk to know more about Intel's contribution to Rust: https://www.youtube.com/watch?v=l9hM0h6IQDo

Update: We have made the following corrections based on feedback from Josh Triplett: This year OSTS was open to Intel's partners and press. Previously, the article read 'escape patch', but it is 'escape hatch.' RFC 1444 wasn't last year; it was started in 2015 and accepted in 2016. 'dev KVM' is now corrected to '/dev/kvm'.

AMD competes with Intel by launching EPYC Rome, world's first 7 nm chip for data centers, luring in Twitter and Google
Hot Chips 31: IBM Power10, AMD's AI ambitions, Intel NNP-T, Cerebras largest chip with 1.2 trillion transistors and more
Intel's 10th gen 10nm 'Ice Lake' processor offers AI apps, new graphics and best connectivity

Setting up Logistic Regression model using TensorFlow

Packt Editorial Staff
25 Apr 2018
8 min read
TensorFlow is another open source library developed by the Google Brain Team to build numerical computation models using data flow graphs. The core of TensorFlow was developed in C++ with the wrapper in Python. The tensorflow package in R gives you access to the TensorFlow API composed of Python modules to execute computation models. TensorFlow supports both CPU- and GPU-based computations. In this article, we will cover the application of TensorFlow in setting up a logistic regression model. The example will use a similar dataset to that used in the H2O model setup.

The tensorflow package in R calls the Python tensorflow API for execution, so it is essential to install the tensorflow package in both R and Python to make R work. The following are the dependencies for tensorflow:

- Python 2.7 / 3.x
- R (>3.2)
- devtools package in R for installing TensorFlow from GitHub
- TensorFlow in Python
- pip

Getting ready

The code for this section is created on Linux but can be run on any operating system. To start modeling, load the tensorflow package in the environment. R loads the default TensorFlow environment variable and also the NumPy library from Python in the np variable:

library("tensorflow") # Load TensorFlow
np <- import("numpy") # Load numpy library

How to do it...

The data is imported using a standard function from R, as shown in the following code. The data is imported using the csv file and transformed into the matrix format, followed by selecting the features used to model as defined in xFeatures and yFeatures. The next step in TensorFlow is to set up a graph to run optimization:

# Loading input and test data
xFeatures = c("Temperature", "Humidity", "Light", "CO2", "HumidityRatio")
yFeatures = "Occupancy"
occupancy_train <-as.matrix(read.csv("datatraining.txt",stringsAsFactors = T))
occupancy_test <- as.matrix(read.csv("datatest.txt",stringsAsFactors = T))

# subset features for modeling and transform to numeric values
occupancy_train<-apply(occupancy_train[, c(xFeatures, yFeatures)], 2, FUN=as.numeric)
occupancy_test<-apply(occupancy_test[, c(xFeatures, yFeatures)], 2, FUN=as.numeric)

# Data dimensions
nFeatures<-length(xFeatures)
nRow<-nrow(occupancy_train)

Before setting up the graph, let's reset the graph using the following command:

# Reset the graph
tf$reset_default_graph()

Additionally, let's start an interactive session as it will allow us to execute variables without referring to the session-to-session object:

# Starting session as interactive session
sess<-tf$InteractiveSession()

Define the logistic regression model in TensorFlow:

# Setting-up Logistic regression graph
x <- tf$constant(unlist(occupancy_train[, xFeatures]), shape=c(nRow, nFeatures), dtype=np$float32)
W <- tf$Variable(tf$random_uniform(shape(nFeatures, 1L)))
b <- tf$Variable(tf$zeros(shape(1L)))
y <- tf$matmul(x, W) + b

The input feature x is defined as a constant as it will be an input to the system. The weight W and bias b are defined as variables that will be optimized during the optimization process. The y is set up as a symbolic representation between x, W, and b. The weight W is initialized with a random uniform distribution and b is assigned the value zero.
The next step is to set up the cost function for logistic regression:

# Setting-up cost function and optimizer
y_ <- tf$constant(unlist(occupancy_train[, yFeatures]), dtype="float32", shape=c(nRow, 1L))
cross_entropy<-tf$reduce_mean(tf$nn$sigmoid_cross_entropy_with_logits(labels=y_, logits=y, name="cross_entropy"))
optimizer <- tf$train$GradientDescentOptimizer(0.15)$minimize(cross_entropy)

# Start a session
init <- tf$global_variables_initializer()
sess$run(init)

Execute the gradient descent algorithm for the optimization of weights using cross entropy as the loss function:

# Running optimization
for (step in 1:5000) {
  sess$run(optimizer)
  if (step %% 20 == 0)
    cat(step, "-", sess$run(W), sess$run(b), "==>", sess$run(cross_entropy), "\n")
}

How it works...

The performance of the model can be evaluated using AUC:

# Performance on Train
library(pROC)
ypred <- sess$run(tf$nn$sigmoid(tf$matmul(x, W) + b))
roc_obj <- roc(occupancy_train[, yFeatures], as.numeric(ypred))

# Performance on test
nRowt<-nrow(occupancy_test)
xt <- tf$constant(unlist(occupancy_test[, xFeatures]), shape=c(nRowt, nFeatures), dtype=np$float32)
ypredt <- sess$run(tf$nn$sigmoid(tf$matmul(xt, W) + b))
roc_objt <- roc(occupancy_test[, yFeatures], as.numeric(ypredt))

AUC can be visualized using the plot.auc function from the pROC package, as shown in the figure following this command. The performance for training and testing (hold-out) is very similar.

plot.roc(roc_obj, col = "green", lty=2, lwd=2)
plot.roc(roc_objt, add=T, col="red", lty=4, lwd=2)

(Figure: Performance of logistic regression using TensorFlow)

Visualizing TensorFlow graphs

TensorFlow graphs can be visualized using TensorBoard. It is a service that utilizes TensorFlow event files to visualize TensorFlow models as graphs. Graph model visualization in TensorBoard is also used to debug TensorFlow models.

Getting ready

TensorBoard can be started using the following command in the terminal:

$ tensorboard --logdir home/log --port 6006

The following are the major parameters for TensorBoard:

- --logdir: To map to the directory to load TensorFlow events
- --debug: To increase log verbosity
- --host: To define the host to listen to (localhost by default)
- --port: To define the port to which TensorBoard will serve

The preceding command will launch the TensorFlow service on localhost at port 6006, as shown in the screenshot (TensorBoard). The tabs on the TensorBoard capture relevant data generated during graph execution.

How to do it...

This section covers how to visualize TensorFlow models and output in TensorBoard. To visualize summaries and graphs, data from TensorFlow can be exported using the FileWriter command from the summary module. A default session graph can be added using the following command:

# Create Writer Obj for log
log_writer = tf$summary$FileWriter('c:/log', sess$graph)

The graph for logistic regression developed using the preceding code is shown in the screenshot (Visualization of the logistic regression graph in TensorBoard). Details about symbol descriptions on TensorBoard can be found at https://www.tensorflow.org/get_started/graph_viz.

Similarly, other variable summaries can be added to the TensorBoard using correct summaries, as shown in the following code:

# Adding histogram summary to weight and bias variable
w_hist = tf$histogram_summary("weights", W)
b_hist = tf$histogram_summary("biases", b)

Create a cross entropy evaluation for test.
An example script to generate the cross entropy cost function for test and train is shown in the following command:

# Set-up cross entropy for test
nRowt<-nrow(occupancy_test)
xt <- tf$constant(unlist(occupancy_test[, xFeatures]), shape=c(nRowt, nFeatures), dtype=np$float32)
ypredt <- tf$nn$sigmoid(tf$matmul(xt, W) + b)
yt_ <- tf$constant(unlist(occupancy_test[, yFeatures]), dtype="float32", shape=c(nRowt, 1L))
cross_entropy_tst<-tf$reduce_mean(tf$nn$sigmoid_cross_entropy_with_logits(labels=yt_, logits=ypredt, name="cross_entropy_tst"))

Add summary variables to be collected:

# Add summary ops to collect data
w_hist = tf$summary$histogram("weights", W)
b_hist = tf$summary$histogram("biases", b)
crossEntropySummary<-tf$summary$scalar("costFunction", cross_entropy)
crossEntropyTstSummary<-tf$summary$scalar("costFunction_test", cross_entropy_tst)

Open the writing object, log_writer. It writes the default graph to the location, c:/log:

# Create Writer Obj for log
log_writer = tf$summary$FileWriter('c:/log', sess$graph)

Run the optimization and collect the summaries:

for (step in 1:2500) {
  sess$run(optimizer)
  # Evaluate performance on training and test data after every 50 iterations
  if (step %% 50 == 0){
    ### Performance on Train
    ypred <- sess$run(tf$nn$sigmoid(tf$matmul(x, W) + b))
    roc_obj <- roc(occupancy_train[, yFeatures], as.numeric(ypred))
    ### Performance on Test
    ypredt <- sess$run(tf$nn$sigmoid(tf$matmul(xt, W) + b))
    roc_objt <- roc(occupancy_test[, yFeatures], as.numeric(ypredt))
    cat("train AUC: ", auc(roc_obj), " Test AUC: ", auc(roc_objt), "\n")
    # Save summary of Bias and weights
    log_writer$add_summary(sess$run(b_hist), global_step=step)
    log_writer$add_summary(sess$run(w_hist), global_step=step)
    log_writer$add_summary(sess$run(crossEntropySummary), global_step=step)
    log_writer$add_summary(sess$run(crossEntropyTstSummary), global_step=step)
  }
}

Collect all the summaries to a single tensor using the merge_all command from the summary module:

summary = tf$summary$merge_all()

Write the summaries to the log file using the log_writer object:

log_writer = tf$summary$FileWriter('c:/log', sess$graph)
summary_str = sess$run(summary)
log_writer$add_summary(summary_str, step)
log_writer$close()

We have learned how to perform logistic regression using TensorFlow and covered the application of TensorFlow in setting up a logistic regression model.

This article is a book excerpt taken from R Deep Learning Cookbook, co-authored by PKS Prakash & Achyutuni Sri Krishna Rao. This book contains powerful and independent recipes to build deep learning models in different application areas using R libraries.

Read More:
Getting started with Linear and logistic regression
Healthcare Analytics: Logistic Regression to Reduce Patient Readmissions
Using Logistic regression to predict market direction in algorithmic trading

Connecting Cloud Object Storage with Databricks Unity Catalog

Pulkit Chadha
22 Oct 2024
10 min read
This article is an excerpt from the book Data Engineering with Databricks Cookbook, by Pulkit Chadha. This book shows you how to use Apache Spark, Delta Lake, and Databricks to build data pipelines, manage and transform data, optimize performance, and more. Additionally, you'll implement DataOps and DevOps practices, and orchestrate data workflows.

Introduction

Databricks Unity Catalog allows you to manage and access data in cloud object storage using a unified namespace and a consistent set of APIs. With Unity Catalog, you can do the following:

- Create and manage storage credentials, external locations, storage locations, and volumes using SQL commands or the Unity Catalog UI
- Access data from various cloud platforms (AWS S3, Azure Blob Storage, or Google Cloud Storage) and storage formats (Parquet, Delta Lake, CSV, or JSON) using the same SQL syntax or Spark APIs
- Apply fine-grained access control and data governance policies to your data using Databricks SQL Analytics or Databricks Runtime

In this article, you will learn what Unity Catalog is and how it integrates with AWS S3.

Getting ready

Before you start setting up and configuring Unity Catalog, you need to have the following prerequisites:

- A Databricks workspace with administrator privileges
- A Databricks workspace with the Unity Catalog feature enabled
- A cloud storage account (such as AWS S3, Azure Blob Storage, or Google Cloud Storage) with the necessary permissions to read and write data

How to do it…

In this section, we will first create a storage credential, the IAM role, with access to an S3 bucket. Then, we will create an external location in Databricks Unity Catalog that will use the storage credential to access the S3 bucket.

Creating a storage credential

You must create a storage credential to access data from an external location or a volume. In this example, you will create a storage credential that uses an IAM role to access the S3 bucket. The steps are as follows:

1. Go to Catalog Explorer: Click on Catalog in the left panel and go to Catalog Explorer.

2. Create storage credentials: Click on +Add and select Add a storage credential.
Figure 10.1 – Add a storage credential

3. Enter storage credential details: Give the credential a name, the IAM role ARN that allows Unity Catalog to access the storage location on your cloud tenant, and a comment if you want, and click on Create.
Figure 10.2 – Create a new storage credential

Important note: To learn more about IAM roles in AWS, you can reference the user guide here: https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html.

4. Get External ID: In the Storage credential created dialog, copy the External ID value and click on Done.
Figure 10.3 – External ID for the storage credential

5. Update the trust policy with an External ID: Update the trust policy associated with the IAM role and add the External ID value for sts:ExternalId.
Figure 10.4 – Updated trust policy with External ID

Creating an external location

An external location contains a reference to a storage credential and a cloud storage path. You need to create an external location to access data from a custom storage location that Unity Catalog uses to reference external tables. In this example, you will create an external location that points to the de-book-ext-loc folder in an S3 bucket. To create an external location, you can follow these steps:

1. Go to Catalog Explorer: Click on Catalog in the left panel to go to Catalog Explorer.
2. Create external location: Click on +Add and select Add an external location.
Figure 10.5 – Add an external location

3. Pick an external location creation method: Select Manual and then click on Next.
Figure 10.6 – Create a new external location

4. Enter external location details: Enter the external location name, select the storage credential, and enter the S3 URL; then, click on the Create button.
Figure 10.7 – Create a new external location manually

5. Test connection: Test the connection to make sure you have set up the credentials accurately and that Unity Catalog is able to access cloud storage.
Figure 10.8 – Test connection for external location

If everything is set up right, you should see a screen like the following. Click on Done.
Figure 10.9 – Test connection results

See also

- Databricks Unity Catalog: https://www.databricks.com/product/unity-catalog
- What is Unity Catalog: https://docs.databricks.com/en/data-governance/unity-catalog/index.html
- Databricks Unity Catalog documentation: https://docs.databricks.com/en/compute/access-mode-limitations.html
- Databricks SQL documentation: https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html
- Databricks Unity Catalog: A Comprehensive Guide to Features, Capabilities, and Architecture: https://atlan.com/databricks-unity-catalog/
- Step By Step Guide on Databricks Unity Catalog Setup and its key Features: https://medium.com/@sauravkum780/step-by-step-guide-on-databricks-unitycatalog-setup-and-its-features-1d0366c282b7

Conclusion

In summary, connecting to cloud object storage using Databricks Unity Catalog provides a streamlined approach to managing and accessing data across various cloud platforms such as AWS S3, Azure Blob Storage, and Google Cloud Storage. By utilizing a unified namespace, consistent APIs, and powerful governance features, Unity Catalog simplifies the process of creating and managing storage credentials and external locations. With built-in fine-grained access controls, you can securely manage data stored in different formats and cloud environments, all while leveraging Databricks' powerful data analytics capabilities. This guide walks through setting up an IAM role and creating an external location in AWS S3, demonstrating how easy it is to connect cloud storage with Unity Catalog.

Author Bio

Pulkit Chadha is a seasoned technologist with over 15 years of experience in data engineering. His proficiency in crafting and refining data pipelines has been instrumental in driving success across diverse sectors such as healthcare, media and entertainment, hi-tech, and manufacturing. Pulkit's tailored data engineering solutions are designed to address the unique challenges and aspirations of each enterprise he collaborates with.

Building RESTful web services with Kotlin

Natasha Mathur
01 Jun 2018
9 min read
Kotlin has been eating up the Java world. It has already become a hit in the Android Ecosystem which was dominated by Java and is welcomed with open arms. Kotlin is not limited to Android development and can be used to develop server-side and client-side web applications as well. Kotlin is 100% compatible with the JVM so you can use any existing frameworks such as Spring Boot, Vert.x, or JSF for writing Java applications. In this tutorial, we will learn how to implement RESTful web services using Kotlin. This article is an excerpt from the book 'Kotlin Programming Cookbook', written by, Aanand Shekhar Roy and Rashi Karanpuria. Setting up dependencies for building RESTful services In this recipe, we will lay the foundation for developing the RESTful service. We will see how to set up dependencies and run our first SpringBoot web application. SpringBoot provides great support for Kotlin, which makes it easy to work with Kotlin. So let's get started. We will be using IntelliJ IDEA and Gradle build system. If you don't have that, you can get it from https://www.jetbrains.com/idea/. How to do it… Let's follow the given steps to set up the dependencies for building RESTful services: First, we will create a new project in IntelliJ IDE. We will be using the Gradle build system for maintaining dependency, so create a Gradle project: When you have created the project, just add the following lines to your build.gradle file. These lines of code contain spring-boot dependencies that we will need to develop the web app: buildscript { ext.kotlin_version = '1.1.60' // Required for Kotlin integration ext.spring_boot_version = '1.5.4.RELEASE' repositories { jcenter() } dependencies { classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version" // Required for Kotlin integration classpath "org.jetbrains.kotlin:kotlin-allopen:$kotlin_version" // See https://kotlinlang.org/docs/reference/compiler-plugins.html#kotlin-spring-compiler-plugin classpath "org.springframework.boot:spring-boot-gradle-plugin:$spring_boot_version" } } apply plugin: 'kotlin' // Required for Kotlin integration apply plugin: "kotlin-spring" // See https://kotlinlang.org/docs/reference/compiler-plugins.html#kotlin-spring-compiler-plugin apply plugin: 'org.springframework.boot' jar { baseName = 'gs-rest-service' version = '0.1.0' } sourceSets { main.java.srcDirs += 'src/main/kotlin' } repositories { jcenter() } dependencies { compile "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version" // Required for Kotlin integration compile 'org.springframework.boot:spring-boot-starter-web' testCompile('org.springframework.boot:spring-boot-starter-test') } Let's now create an App.kt file in the following directory hierarchy: It is important to keep the App.kt file in a package (we've used the college package). Otherwise, you will get an error that says the following: ** WARNING ** : Your ApplicationContext is unlikely to start due to a `@ComponentScan` of the default package. The reason for this error is that if you don't include a package declaration, it considers it a "default package," which is discouraged and avoided. Now, let's try to run the App.kt class. 
We will put the following code to test if it's running: @SpringBootApplication open class App { } fun main(args: Array<String>) { SpringApplication.run(App::class.java, *args) } Now run the project; if everything goes well, you will see output with the following line at the end: Started AppKt in 5.875 seconds (JVM running for 6.445) We now have our application running on our embedded Tomcat server. If you go to http://localhost:8080, you will see an error as follows: The preceding error is 404 error and the reason for that is we haven't told our application to do anything when a user is on the / path. Creating a REST controller In the previous recipe, we learned how to set up dependencies for creating RESTful services. Finally, we launched our backend on the http://localhost:8080 endpoint but got 404 error as our application wasn't configured to handle requests at that path (/). We will start from that point and learn how to create a REST controller. Let's get started! We will be using IntelliJ IDE for coding purposes. For setting up of the environment, refer to the previous recipe. You can also find the source in the repository at https://gitlab.com/aanandshekharroy/kotlin-webservices. How to do it… In this recipe, we will create a REST controller that will fetch us information about students in a college. We will be using an in-memory database using a list to keep things simple: Let's first create a Student class having a name and roll number properties: package college class Student() { lateinit var roll_number: String lateinit var name: String constructor( roll_number: String, name: String): this() { this.roll_number = roll_number this.name = name } } Next, we will create the StudentDatabase endpoint, which will act as a database for the application: @Component class StudentDatabase { private val students = mutableListOf<Student>() } Note that we have annotated the StudentDatabase class with @Component, which means its lifecycle will be controlled by Spring (because we want it to act as a database for our application). We also need a @PostConstruct annotation, because it's an in-memory database that is destroyed when the application closes. So we would like to have a filled database whenever the application launches. So we will create an init method, which will add a few items into the "database" at startup time: @PostConstruct private fun init() { students.add(Student("2013001","Aanand Shekhar Roy")) students.add(Student("2013165","Rashi Karanpuria")) } Now, we will create a few other methods that will help us deal with our database: getStudent: Gets the list of students present in our database: fun getStudents()=students addStudent: This method will add a student to our database: fun addStudent(student: Student): Boolean { students.add(student) return true } Now let's put this database to use. We will be creating a REST controller that will handle the request. We will create a StudentController and annotate it with @RestController. Using @RestController is simple, and it's the preferred method for creating MVC RESTful web services. Once created, we need to provide our database using Spring dependency injection, for which we will need the @Autowired annotation. Here's how our StudentController looks: @RestController class StudentController { @Autowired private lateinit var database: StudentDatabase } Now we will set our response to the / path. We will show the list of students in our database. 
For that, we will simply create a method that lists out students. We will need to annotate it with @RequestMapping and provide parameters such as path and request method (GET, POST, and such): @RequestMapping("", method = arrayOf(RequestMethod.GET)) fun students() = database.getStudents() This is what our controller looks like now. It is a simple REST controller: package college import org.springframework.beans.factory.annotation.Autowired import org.springframework.web.bind.annotation.RequestMapping import org.springframework.web.bind.annotation.RequestMethod import org.springframework.web.bind.annotation.RestController @RestController class StudentController { @Autowired private lateinit var database: StudentDatabase @RequestMapping("", method = arrayOf(RequestMethod.GET)) fun students() = database.getStudents() } Now when you restart the server and go to http://localhost:8080, we will see the response as follows: As you can see, Spring is intelligent enough to provide the response in the JSON format, which makes it easy to design APIs. Now let's try to create another endpoint that will fetch a student's details from a roll number: @GetMapping("/student/{roll_number}") fun studentWithRollNumber( @PathVariable("roll_number") roll_number:String) = database.getStudentWithRollNumber(roll_number) Now, if you try the http://localhost:8080/student/2013001 endpoint, you will see the given output: {"roll_number":"2013001","name":"Aanand Shekhar Roy"} Next, we will try to add a student to the database. We will be doing it via the POST method: @RequestMapping("/add", method = arrayOf(RequestMethod.POST)) fun addStudent(@RequestBody student: Student) = if (database.addStudent(student)) student else throw Exception("Something went wrong") There's more… So far, our server has been dependent on IDE. We would definitely want to make it independent of an IDE. Thanks to Gradle, it is very easy to create a runnable JAR just with the following: ./gradlew clean bootRepackage The preceding command is platform independent and uses the Gradle build system to build the application. Now, you just need to type the mentioned command to run it: java -jar build/libs/gs-rest-service-0.1.0.jar You can then see the following output as before: Started AppKt in 4.858 seconds (JVM running for 5.548) This means your server is running successfully. Creating the Application class for Spring Boot The SpringApplication class is used to bootstrap our application. We've used it in the previous recipes; we will see how to create the Application class for Spring Boot in this recipe. We will be using IntelliJ IDE for coding purposes. To set up the environment, read previous recipes, especially the Setting up dependencies for building RESTful services recipe. How to do it… If you've used Spring Boot before, you must be familiar with using @Configuration, @EnableAutoConfiguration, and @ComponentScan in your main class. These were used so frequently that Spring Boot provides a convenient @SpringBootApplication alternative. The Spring Boot looks for the public static main method, and we will use a top-level function outside the Application class. If you noted, while setting up the dependencies, we used the kotlin-spring plugin, hence we don't need to make the Application class open. 
Here's an example of the Spring Boot application: package college import org.springframework.boot.SpringApplication import org.springframework.boot.autoconfigure.SpringBootApplication @SpringBootApplication class Application fun main(args: Array<String>) { SpringApplication.run(Application::class.java, *args) } The Spring Boot application executes the static run() method, which takes two parameters and starts a autoconfigured Tomcat web server when Spring application is started. When everything is set, you can start the application by executing the following command: ./gradlew bootRun If everything goes well, you will see the following output in the console: This is along with the last message—Started AppKt in xxx seconds. This means that your application is up and running. In order to run it as an independent server, you need to create a JAR and then you can execute as follows: ./gradlew clean bootRepackage Now, to run it, you just need to type the following command: java -jar build/libs/gs-rest-service-0.1.0.jar We learned how to set up dependencies for building RESTful services, creating a REST controller, and creating the application class for Spring boot. If you are interested in learning more about Kotlin then be sure to check out the 'Kotlin Programming Cookbook'. Build your first Android app with Kotlin 5 reasons to choose Kotlin over Java Getting started with Kotlin programming Forget C and Java. Learn Kotlin: the next universal programming language

Mastering Code Generation: Exploring the LLVM Backend

Kai Nacke, Amy Kwan
24 Oct 2024
10 min read
This article is an excerpt from the book, Learn LLVM 17 - Second Edition, by Kai Nacke, Amy Kwan. Learn how to build your own compiler, from reading the source to emitting optimized machine code. This book guides you through the JIT compilation framework, extending LLVM in a variety of ways, and using the right tools for troubleshooting.Introduction Generating optimized machine code is a critical task in the compilation process, and the LLVM backend plays a pivotal role in this transformation. The backend translates the LLVM Intermediate Representation (IR), derived from the Abstract Syntax Tree (AST), into machine code that can be executed on target architectures. Understanding how to generate this IR effectively is essential for leveraging LLVM's capabilities. This article delves into the intricacies of generating LLVM IR using a simple expression language example. We'll explore the necessary steps, from declaring library functions to implementing a code generation visitor, ensuring a comprehensive understanding of the LLVM backend's functionality. Generating code with the LLVM backend The task of the backend is to create optimized machine code from the LLVM IR of a module. The IR is the interface to the backend and can be created using a C++ interface or in textual form. Again, the IR is generated from the AST. Textual representation of LLVM IR Before trying to generate the LLVM IR, it should be clear what we want to generate. For our example expression language, the high-level plan is as follows: 1. Ask the user for the value of each variable. 2. Calculate the value of the expression. 3. Print the result. To ask the user to provide a value for a variable and to print the result, two library functions are used: calc_read() and calc_write(). For the with a: 3*a expression, the generated IR is as follows: 1. The library functions must be declared, like in C. The syntax also resembles C. The type before the function name is the return type. The type names surrounded by parenthesis are the argument types. The declaration can appear anywhere in the file: declare i32 @calc_read(ptr) declare void @calc_write(i32) 2. The calc_read() function takes the variable name as a parameter. The following construct defines a constant, holding a and the null byte used as a string terminator in C: @a.str = private constant [2 x i8] c"a\00" 3. It follows the main() function. The parameter names are omitted because they are not used.  Just as in C, the body of the function is enclosed in braces: define i32 @main(i32, ptr) { 4. Each basic block must have a label. Because this is the first basic block of the function, we name it entry: entry: 5. The calc_read() function is called to read the value for the a variable. The nested getelemenptr instruction performs an index calculation to compute the pointer to the first element of the string constant. The function result is assigned to the unnamed %2 variable.  %2 = call i32 @calc_read(ptr @a.str) 6. Next, the variable is multiplied by 3:  %3 = mul nsw i32 3, %2 7. The result is printed on the console via a call to the calc_write() function:  call void @calc_write(i32 %3) 8. Last, the main() function returns 0 to indicate a successful execution:  ret i32 0 } Each value in the LLVM IR is typed, with i32 denoting the 32-bit bit integer type and ptr denoting a pointer. Note: previous versions of LLVM used typed pointers. For example, a pointer to a byte was expressed as i8* in LLVM. Since L LVM 16, opaque pointers are the default. 
An opaque pointer is just a pointer to memory, without carrying any type information about it. The notation in LLVM IR is ptr.Previous versions of LLVM used typed pointers. For example, a pointer to a byte was expressed as i8* in LLVM. Since L LVM 16, opaque pointers are the default. An opaque pointer is just a pointer to memory, without carrying any type information about it. The notation in LLVM IR is ptr. Since it is now clear what the IR looks like, let’s generate it from the AST.  Generating the IR from the AST The interface, provided in the CodeGen.h header file, is very small:  #ifndef CODEGEN_H #define CODEGEN_H #include "AST.h" class CodeGen { public: void compile(AST *Tree); }; #endifBecause the AST contains the information, the basic idea is to use a visitor to walk the AST. The CodeGen.cpp file is implemented as follows: 1. The required includes are at the top of the file: #include "CodeGen.h" #include "llvm/ADT/StringMap.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/LLVMContext.h" #include "llvm/Support/raw_ostream.h" 2. The namespace of the LLVM libraries is used for name lookups: using namespace llvm; 3. First, some private members are declared in the visitor. Each compilation unit is represented in LLVM by the Module class and the visitor has a pointer to the module called M. For easy IR generation, the Builder (of type IRBuilder<>) is used. LLVM has a class hierarchy to represent types in IR. You can look up the instances for basic types such as i32 from the LLVM context. These basic types are used very often. To avoid repeated lookups, we cache the needed type instances: VoidTy, Int32Ty, PtrTy, and Int32Zero. The V member is the current calculated value, which is updated through the tree traversal. And last, nameMap maps a variable name to the value returned from the calc_read() function: namespace { class ToIRVisitor : public ASTVisitor { Module *M; IRBuilder<> Builder; Type *VoidTy; Type *Int32Ty; PointerType *PtrTy; Constant *Int32Zero; Value *V; StringMap<Value *> nameMap;4. The constructor initializes all members: public: ToIRVisitor(Module *M) : M(M), Builder(M->getContext()) { VoidTy = Type::getVoidTy(M->getContext()); Int32Ty = Type::getInt32Ty(M->getContext()); PtrTy = PointerType::getUnqual(M->getContext()); Int32Zero = ConstantInt::get(Int32Ty, 0, true); }5. For each function, a FunctionType instance must be created. In C++ terminology, this is a function prototype. A function itself is defined with a Function instance. The run() method defines the main() function in the LLVM IR first:  void run(AST *Tree) { FunctionType *MainFty = FunctionType::get( Int32Ty, {Int32Ty, PtrTy}, false); Function *MainFn = Function::Create( MainFty, GlobalValue::ExternalLinkage, "main", M); 6. Then we create the BB basic block with the entry label, and attach it to the IR builder: BasicBlock *BB = BasicBlock::Create(M->getContext(), "entry", MainFn); Builder.SetInsertPoint(BB);7. With this preparation done, the tree traversal can begin:     Tree->accept(*this); 8. After the tree traversal, the computed value is printed via a call to the calc_write() function. Again, a function prototype (an instance of FunctionType) has to be created. The only parameter is the current value, V:  FunctionType *CalcWriteFnTy = FunctionType::get(VoidTy, {Int32Ty}, false); Function *CalcWriteFn = Function::Create( CalcWriteFnTy, GlobalValue::ExternalLinkage, "calc_write", M); Builder.CreateCall(CalcWriteFnTy, CalcWriteFn, {V});9. 
The generation finishes  by returning 0 from the main() function: Builder.CreateRet(Int32Zero); }10. A WithDecl node holds the names of the declared variables. First, we create a function prototype for the calc_read() function:  virtual void visit(WithDecl &Node) override { FunctionType *ReadFty = FunctionType::get(Int32Ty, {PtrTy}, false); Function *ReadFn = Function::Create( ReadFty, GlobalValue::ExternalLinkage, "calc_read", M);11. The method loops through the variable names:  for (auto I = Node.begin(), E = Node.end(); I != E; ++I) { 12. For each  variable, a string  with a variable name is created:  StringRef Var = *I; Constant *StrText = ConstantDataArray::getString( M->getContext(), Var); GlobalVariable *Str = new GlobalVariable( *M, StrText->getType(), /*isConstant=*/true, GlobalValue::PrivateLinkage, StrText, Twine(Var).concat(".str"));13. Then the IR code to call the calc_read() function is created. The string created in the previous step is passed as a parameter:  CallInst *Call = Builder.CreateCall(ReadFty, ReadFn, {Str});14. The returned value is stored in the mapNames map for later use:  nameMap[Var] = Call; }15. The tree traversal continues with the expression:  Node.getExpr()->accept(*this); };16. A Factor node is either a variable name or a number. For a variable name, the value is looked up in the mapNames map. For a number, the value is converted to an integer and turned into a constant value: virtual void visit(Factor &Node) override { if (Node.getKind() == Factor::Ident) { V = nameMap[Node.getVal()]; } else { int intval; Node.getVal().getAsInteger(10, intval); V = ConstantInt::get(Int32Ty, intval, true); } };17. And last, for a BinaryOp node, the right calculation operation must be used:  virtual void visit(BinaryOp &Node) override { Node.getLeft()->accept(*this); Value *Left = V; Node.getRight()->accept(*this); Value *Right = V; switch (Node.getOperator()) { case BinaryOp::Plus: V = Builder.CreateNSWAdd(Left, Right); break; case BinaryOp::Minus: V = Builder.CreateNSWSub(Left, Right); break; case BinaryOp::Mul: V = Builder.CreateNSWMul(Left, Right); break; case BinaryOp::Div: V = Builder.CreateSDiv(Left, Right); break;    }       };       };       }18. With this, the visitor class is complete. The compile() method creates the global context and the  module, runs the tree traversal, and dumps the generated IR to the console:  void CodeGen::compile(AST *Tree) { LLVMContext Ctx; Module *M = new Module("calc.expr", Ctx); ToIRVisitor ToIR(M); ToIR.run(Tree); M->print(outs(), nullptr); }We now have implemented the frontend of the compiler, from reading the source up to generating the IR. Of course, all these components must work together on user input, which is the task of the compiler driver. We also need to implement the functions needed at runtime. Both are topics of the next section covered in the book. Conclusion In conclusion, the process of generating LLVM IR from an AST involves multiple steps, each crucial for producing efficient machine code. This article highlighted the structure and components necessary for this task, including function declarations, basic block management, and tree traversal using a visitor pattern. By carefully managing these elements, developers can harness the power of LLVM to create optimized and reliable machine code. The integration of all these components, alongside user input and runtime functions, completes the frontend implementation of the compiler. 
Conclusion

In conclusion, the process of generating LLVM IR from an AST involves multiple steps, each crucial for producing efficient machine code. This article highlighted the structure and components necessary for this task, including function declarations, basic block management, and tree traversal using a visitor pattern. By carefully managing these elements, developers can harness the power of LLVM to create optimized and reliable machine code. The integration of all these components, alongside user input and runtime functions, completes the frontend implementation of the compiler. This sets the stage for the next phase, focusing on the compiler driver and runtime functions, ensuring seamless execution and integration of the compiled code.

Author Bio

Kai Nacke is a professional IT architect currently residing in Toronto, Canada. He holds a diploma in computer science from the Technical University of Dortmund, Germany, and his diploma thesis on universal hash functions was recognized as the best of the semester. With over 20 years of experience in the IT industry, Kai has extensive expertise in the development and architecture of business and enterprise applications. In his current role, he evolves an LLVM/clang-based compiler. For several years, Kai served as the maintainer of LDC, the LLVM-based D compiler. He is the author of D Web Development and Learn LLVM 12, both published by Packt. In the past, he was a speaker in the LLVM developer room at the Free and Open Source Software Developers' European Meeting (FOSDEM).

Amy Kwan is a compiler developer currently residing in Toronto, Canada. Originally from the Canadian prairies, Amy holds a Bachelor of Science in Computer Science from the University of Saskatchewan. In her current role, she leverages LLVM technology as a backend compiler developer. Previously, Amy was a speaker at the LLVM Developer Conference in 2022 alongside Kai Nacke.

PowerShell Basics for IT Professionals

Savia Lobo
16 Dec 2019
6 min read
PowerShell is Microsoft's automation platform for IT pros. Of late, there have been a lot of questions around the complexity of this latest automation tool by Microsoft. At Microsoft Ignite 2018, Jason Himmelstein, Director of Technical Strategy and Strategic Partnerships, Office Apps & Services MVP, explained the basics of PowerShell and how to truly optimize your SharePoint implementation using this powerful IT pro toolset. While in this post we look at the big picture, you can check out the complete video here: 'Introduction to PowerShell for the anxious IT pro'.

Want to do more with PowerShell? After learning the basics, you can learn how to use PowerShell to automate complex Windows server tasks. You can also improve PowerShell's usability, and control and manage Windows-based environments, by working through the recipes given in Windows Server 2019 Automation with PowerShell Cookbook - Third Edition written by Thomas Lee.

Himmelstein starts off by saying that PowerShell isn't a packaged executable, nor is it developer-centric in a way that requires one to understand code; it is easy for an IT pro to understand.

What is PowerShell?

Windows PowerShell is Microsoft's task automation framework, consisting of a command-line shell and an associated scripting language built on the .NET Framework. It provides full access to COM and WMI, enabling administrators to perform administrative tasks on both local and remote Windows systems. In simple words, PowerShell is an object-based, not a text-based, command-line interface for Microsoft technologies. This means results in PowerShell can be acted upon and not just read. One can cause huge damage to an environment using PowerShell, as there is no back button in PowerShell. To check what went wrong you can read the logs, but you cannot undo actions.

Why PowerShell matters

Regardless of the platform a person uses, such as Office 365, Azure, and so on, PowerShell can be easily implemented due to its cross-platform capability. Himmelstein also highlights that one can get started with Azure PowerShell by trying it out in an Azure Cloud Shell environment, an interactive, authenticated, browser-accessible shell for managing Azure resources. Azure Cloud Shell comes equipped with commonly used CLI tools, including Linux shell interpreters, PowerShell modules, Azure tools, text editors, source control, build tools, container tools, database tools, and more. Cloud Shell also includes language support for several popular programming languages such as Node.js, .NET, and Python. Cloud Shell also authenticates securely and automatically for instant access to your resources through the Azure CLI or Azure PowerShell cmdlets. Users can use PowerShell in Cloud Shell. One can also develop applications using PowerShell, or use PowerShell via Source Control Management (SCM).

Basics of PowerShell

PowerShell Hardware

There are two ways one can use PowerShell: one is via the PowerShell Console, which is similar to a command line; the other is the PowerShell ISE (Integrated Scripting Environment). One thing Himmelstein encourages is, "we run PowerShell in the Console and we write PowerShell in the ISE." The reason is that there are certain functionalities that do not work in the ISE when one hits the 'Run' command. In such cases, the user will have to take that PowerShell out, copy it, save the file, and run it in a command window.

cmdlets

Cmdlets are the main building blocks of PowerShell. These are mini commands that perform one action.
These have the ability to pipe the output of one cmdlet into further cmdlets. They can also perform equality tests with expressions such as -eq, -lt, and -match, so one can easily compare values within PowerShell.

Modules

There are four types of modules in PowerShell:

Script: A script module is a file (.psm1) that contains any valid Windows PowerShell code.
Binary: A binary module is a .NET Framework assembly (.dll) that contains compiled code.
Manifest: A module manifest is a Windows PowerShell data file (.psd1) that describes the contents of a module and determines how a module is processed.
Dynamic: A dynamic module does not persist to disk. It is created using New-Module, is intended to be short-lived, and cannot be accessed by Get-Module. Himmelstein prefers not to use dynamic modules as they persist for just one session.

Objects and Members

Objects are instances of classes and have properties and methods. Members are the properties and methods of an object. Properties define what an object is, and methods define what you can do with the object. Himmelstein puts all these terms together in a simple way:

Objects = stuff
Cmdlets = things you can do with the stuff
Modules = lists of things you can do with the stuff
Properties = details about the stuff
Methods = instructions for things you can do with the stuff

Pipeline

Using pipelines, one can chain objects together for processing. The output of one pipelined cmdlet becomes the input of the next.

Functional Explanation

Get-Command: Gets all the cmdlets installed on your computer.
Get-Help: Displays additional information about a cmdlet.
Get-Member: Lists the properties and methods of a command or object.
Get-Verb: Gets the approved Windows PowerShell verbs.
Start-Transcript: Logs everything you do in that PowerShell window to a file.
Get-History: If you didn't start a transcript, you can still review your history before closing your shell or ISE window.

Tips for PowerShell beginners

Use variables: You can use any variables except the ones reserved by the system; you will be prompted when you try to enter a reserved variable.
Call one thing at a time.
Comment your scripts, as this may save you a lot of time.
Create scripts using an ISE/IDE; you can also use Visual Studio Code and then execute the script in the shell.
Dispose of your objects.
Close the command window by typing Exit.
Test before using in production.
Write reusable scripts.

What PowerShell beginners should avoid

Rewriting your variables.
Hard-coding values such as passwords into your scripts.
Taking code from the internet or a vendor and just running it in your environment (you should read every piece of code before you run it in your environment).
Assuming the code is not harmful; it is. There is no back button in PowerShell and you cannot undo things.
Running your code in an IDE/ISE and expecting everything to work.

PowerShell Syntax and Bracketology

Syntax
'#' is for a comment
'+' is for add
'=' and '-eq' are for equal
'!', '-ne', and '-not' are for not equal

Brackets
'()' Curved brackets, also known as parentheses, are used for required options, compulsory arguments, or control structures.
'{}' Curly brackets are used for block expressions within a command block and are also used to open a code block.
'[]' Square brackets are used to denote optional elements or parameters and are also used for match functions.

Now that you know the basics of PowerShell, you can start performing key admin tasks on Windows Server 2019. A short example that ties cmdlets, the pipeline, and objects together is sketched below.
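As an illustration only (this snippet is not from Himmelstein's session), here is a minimal sketch of how cmdlets, objects, and the pipeline fit together. It lists the five processes using the most memory and saves the result to a hypothetical C:\Temp\top-processes.csv path; adjust the path for your own environment.

# Get process objects, sort them by memory usage, and keep the top five
Get-Process |
    Sort-Object -Property WorkingSet -Descending |
    Select-Object -First 5 -Property Name, Id, WorkingSet |
    Export-Csv -Path 'C:\Temp\top-processes.csv' -NoTypeInformation

# Inspect the members (properties and methods) of the objects a cmdlet returns
Get-Process | Get-Member -MemberType Property

Because every stage of the pipeline passes full .NET objects rather than plain text, Sort-Object and Select-Object can work directly with the WorkingSet property instead of parsing strings.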
To further learn how to employ best practices for writing PowerShell scripts and configuring Windows Server 2019 and leverage PowerShell to automate complex Windows server tasks, check out our book, Windows Server 2019 Automation with PowerShell Cookbook - Third Edition written by Thomas Lee. Weaponizing PowerShell with Metasploit and how to defend against PowerShell attacks [Tutorial] Scripting with Windows Powershell Desired State Configuration [Video] Automate tasks using Azure PowerShell and Azure CLI [Tutorial]

Bayesian Network Fundamentals

Packt
10 Aug 2015
25 min read
In this article by Ankur Ankan and Abinash Panda, the authors of Mastering Probabilistic Graphical Models Using Python, we'll cover the basics of random variables, probability theory, and graph theory. We'll also see the Bayesian models and the independencies in Bayesian models. A graphical model is essentially a way of representing joint probability distribution over a set of random variables in a compact and intuitive form. There are two main types of graphical models, namely directed and undirected. We generally use a directed model, also known as a Bayesian network, when we mostly have a causal relationship between the random variables. Graphical models also give us tools to operate on these models to find conditional and marginal probabilities of variables, while keeping the computational complexity under control. (For more resources related to this topic, see here.) Probability theory To understand the concepts of probability theory, let's start with a real-life situation. Let's assume we want to go for an outing on a weekend. There are a lot of things to consider before going: the weather conditions, the traffic, and many other factors. If the weather is windy or cloudy, then it is probably not a good idea to go out. However, even if we have information about the weather, we cannot be completely sure whether to go or not; hence we have used the words probably or maybe. Similarly, if it is windy in the morning (or at the time we took our observations), we cannot be completely certain that it will be windy throughout the day. The same holds for cloudy weather; it might turn out to be a very pleasant day. Further, we are not completely certain of our observations. There are always some limitations in our ability to observe; sometimes, these observations could even be noisy. In short, uncertainty or randomness is the innate nature of the world. The probability theory provides us the necessary tools to study this uncertainty. It helps us look into options that are unlikely yet probable. Random variable Probability deals with the study of events. From our intuition, we can say that some events are more likely than others, but to quantify the likeliness of a particular event, we require the probability theory. It helps us predict the future by assessing how likely the outcomes are. Before going deeper into the probability theory, let's first get acquainted with the basic terminologies and definitions of the probability theory. A random variable is a way of representing an attribute of the outcome. Formally, a random variable X is a function that maps a possible set of outcomes ? to some set E, which is represented as follows: X : ? ? E As an example, let us consider the outing example again. To decide whether to go or not, we may consider the skycover (to check whether it is cloudy or not). Skycover is an attribute of the day. Mathematically, the random variable skycover (X) is interpreted as a function, which maps the day (?) to its skycover values (E). So when we say the event X = 40.1, it represents the set of all the days {?} such that  , where  is the mapping function. Formally speaking, . Random variables can either be discrete or continuous. A discrete random variable can only take a finite number of values. For example, the random variable representing the outcome of a coin toss can take only two values, heads or tails; and hence, it is discrete. Whereas, a continuous random variable can take infinite number of values. 
For example, a variable representing the speed of a car can take any number values. For any event whose outcome is represented by some random variable (X), we can assign some value to each of the possible outcomes of X, which represents how probable it is. This is known as the probability distribution of the random variable and is denoted by P(X). For example, consider a set of restaurants. Let X be a random variable representing the quality of food in a restaurant. It can take up a set of values, such as {good, bad, average}. P(X), represents the probability distribution of X, that is, if P(X = good) = 0.3, P(X = average) = 0.5, and P(X = bad) = 0.2. This means there is 30 percent chance of a restaurant serving good food, 50 percent chance of it serving average food, and 20 percent chance of it serving bad food. Independence and conditional independence In most of the situations, we are rather more interested in looking at multiple attributes at the same time. For example, to choose a restaurant, we won't only be looking just at the quality of food; we might also want to look at other attributes, such as the cost, location, size, and so on. We can have a probability distribution over a combination of these attributes as well. This type of distribution is known as joint probability distribution. Going back to our restaurant example, let the random variable for the quality of food be represented by Q, and the cost of food be represented by C. Q can have three categorical values, namely {good, average, bad}, and C can have the values {high, low}. So, the joint distribution for P(Q, C) would have probability values for all the combinations of states of Q and C. P(Q = good, C = high) will represent the probability of a pricey restaurant with good quality food, while P(Q = bad, C = low) will represent the probability of a restaurant that is less expensive with bad quality food. Let us consider another random variable representing an attribute of a restaurant, its location L. The cost of food in a restaurant is not only affected by the quality of food but also the location (generally, a restaurant located in a very good location would be more costly as compared to a restaurant present in a not-very-good location). From our intuition, we can say that the probability of a costly restaurant located at a very good location in a city would be different (generally, more) than simply the probability of a costly restaurant, or the probability of a cheap restaurant located at a prime location of city is different (generally less) than simply probability of a cheap restaurant. Formally speaking, P(C = high | L = good) will be different from P(C = high) and P(C = low | L = good) will be different from P(C = low). This indicates that the random variables C and L are not independent of each other. These attributes or random variables need not always be dependent on each other. For example, the quality of food doesn't depend upon the location of restaurant. So, P(Q = good | L = good) or P(Q = good | L = bad)would be the same as P(Q = good), that is, our estimate of the quality of food of the restaurant will not change even if we have knowledge of its location. Hence, these random variables are independent of each other. In general, random variables  can be considered as independent of each other, if: They may also be considered independent if: We can easily derive this conclusion. 
We know the following from the chain rule of probability: P(X, Y) = P(X) P(Y | X) If Y is independent of X, that is, if X | Y, then P(Y | X) = P(Y). Then: P(X, Y) = P(X) P(Y) Extending this result on multiple variables, we can easily get to the conclusion that a set of random variables are independent of each other, if their joint probability distribution is equal to the product of probabilities of each individual random variable. Sometimes, the variables might not be independent of each other. To make this clearer, let's add another random variable, that is, the number of people visiting the restaurant N. Let's assume that, from our experience we know the number of people visiting only depends on the cost of food at the restaurant and its location (generally, lesser number of people visit costly restaurants). Does the quality of food Q affect the number of people visiting the restaurant? To answer this question, let's look into the random variable affecting N, cost C, and location L. As C is directly affected by Q, we can conclude that Q affects N. However, let's consider a situation when we know that the restaurant is costly, that is, C = high and let's ask the same question, "does the quality of food affect the number of people coming to the restaurant?". The answer is no. The number of people coming only depends on the price and location, so if we know that the cost is high, then we can easily conclude that fewer people will visit, irrespective of the quality of food. Hence,  . This type of independence is called conditional independence. Installing tools Let's now see some coding examples using pgmpy, to represent joint distributions and independencies. Here, we will mostly work with IPython and pgmpy (and a few other libraries) for coding examples. So, before moving ahead, let's get a basic introduction to these. IPython IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, which offers enhanced introspection, rich media, additional shell syntax, tab completion, and a rich history. IPython provides the following features: Powerful interactive shells (terminal and Qt-based) A browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media Support for interactive data visualization and use of GUI toolkits Flexible and embeddable interpreters to load into one's own projects Easy-to-use and high performance tools for parallel computing You can install IPython using the following command: >>> pip3 install ipython To start the IPython command shell, you can simply type ipython3 in the terminal. For more installation instructions, you can visit http://ipython.org/install.html. pgmpy pgmpy is a Python library to work with Probabilistic Graphical models. As it's currently not on PyPi, we will need to build it manually. You can get the source code from the Git repository using the following command: >>> git clone https://github.com/pgmpy/pgmpy Now cd into the cloned directory switch branch for version used and build it with the following code: >>> cd pgmpy >>> git checkout book/v0.1 >>> sudo python3 setup.py install For more installation instructions, you can visit http://pgmpy.org/install.html. With both IPython and pgmpy installed, you should now be able to run the examples. 
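Before moving on to the pgmpy classes, here is a small plain-Python sanity check of the conditional independence idea from the restaurant example. The numbers are made up purely for illustration: the number of visitors N is constructed to depend only on the cost C, so conditioning on C makes N independent of the quality Q.

# Toy CPDs (invented numbers): P(Q), P(C|Q), P(N|C)
P_Q = {'good': 0.3, 'average': 0.5, 'bad': 0.2}
P_C_given_Q = {'good': {'high': 0.8, 'low': 0.2},
               'average': {'high': 0.6, 'low': 0.4},
               'bad': {'high': 0.1, 'low': 0.9}}
P_N_given_C = {'high': {'few': 0.7, 'many': 0.3},
               'low': {'few': 0.2, 'many': 0.8}}

# Build the joint distribution P(Q, C, N) = P(Q) * P(C|Q) * P(N|C)
joint = {(q, c, n): P_Q[q] * P_C_given_Q[q][c] * P_N_given_C[c][n]
         for q in P_Q for c in ('high', 'low') for n in ('few', 'many')}

def p_few_given(c, q=None):
    """P(N = 'few' | C = c), optionally also conditioning on Q = q."""
    keep = [(k, v) for k, v in joint.items()
            if k[1] == c and (q is None or k[0] == q)]
    total = sum(v for _, v in keep)
    return sum(v for k, v in keep if k[2] == 'few') / total

print(p_few_given('high'))           # 0.7
print(p_few_given('high', 'good'))   # also 0.7, so N is independent of Q given C

Conditioning on Q does not change the distribution of N once C is known, which is exactly the conditional independence statement made for the restaurant example above.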
Representing independencies using pgmpy To represent independencies, pgmpy has two classes, namely IndependenceAssertion and Independencies. The IndependenceAssertion class is used to represent individual assertions of the form of  or  . Let's see some code to represent assertions: # Firstly we need to import IndependenceAssertion In [1]: from pgmpy.independencies import IndependenceAssertion # Each assertion is in the form of [X, Y, Z] meaning X is # independent of Y given Z. In [2]: assertion1 = IndependenceAssertion('X', 'Y') In [3]: assertion1 Out[3]: (X _|_ Y) Here, assertion1 represents that the variable X is independent of the variable Y. To represent conditional assertions, we just need to add a third argument to IndependenceAssertion: In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z') In [5]: assertion2 Out [5]: (X _|_ Y | Z) In the preceding example, assertion2 represents . IndependenceAssertion also allows us to represent assertions in the form of  . To do this, we just need to pass a list of random variables as arguments: In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z') In [5]: assertion2 Out[5]: (X _|_ Y | Z) Moving on to the Independencies class, an Independencies object is used to represent a set of assertions. Often, in the case of Bayesian or Markov networks, we have more than one assertion corresponding to a given model, and to represent these independence assertions for the models, we generally use the Independencies object. Let's take a few examples: In [8]: from pgmpy.independencies import Independencies # There are multiple ways to create an Independencies object, we # could either initialize an empty object or initialize with some # assertions.   In [9]: independencies = Independencies() # Empty object In [10]: independencies.get_assertions() Out[10]: []   In [11]: independencies.add_assertions(assertion1, assertion2) In [12]: independencies.get_assertions() Out[12]: [(X _|_ Y), (X _|_ Y | Z)] We can also directly initialize Independencies in these two ways: In [13]: independencies = Independencies(assertion1, assertion2) In [14]: independencies = Independencies(['X', 'Y'],                                          ['A', 'B', 'C']) In [15]: independencies.get_assertions() Out[15]: [(X _|_ Y), (A _|_ B | C)] Representing joint probability distributions using pgmpy We can also represent joint probability distributions using pgmpy's JointProbabilityDistribution class. Let's say we want to represent the joint distribution over the outcomes of tossing two fair coins. So, in this case, the probability of all the possible outcomes would be 0.25, which is shown as follows: In [16]: from pgmpy.factors import JointProbabilityDistribution as         Joint In [17]: distribution = Joint(['coin1', 'coin2'],                              [2, 2],                              [0.25, 0.25, 0.25, 0.25]) Here, the first argument includes names of random variable. The second argument is a list of the number of states of each random variable. The third argument is a list of probability values, assuming that the first variable changes its states the slowest. 
So, the preceding distribution represents the following: In [18]: print(distribution) +--------------------------------------+ ¦ coin1   ¦ coin2   ¦   P(coin1,coin2) ¦ ¦---------+---------+------------------¦ ¦ coin1_0 ¦ coin2_0 ¦   0.2500         ¦ +---------+---------+------------------¦ ¦ coin1_0 ¦ coin2_1 ¦   0.2500         ¦ +---------+---------+------------------¦ ¦ coin1_1 ¦ coin2_0 ¦   0.2500         ¦ +---------+---------+------------------¦ ¦ coin1_1 ¦ coin2_1 ¦   0.2500         ¦ +--------------------------------------+ We can also conduct independence queries over these distributions in pgmpy: In [19]: distribution.check_independence('coin1', 'coin2') Out[20]: True Conditional probability distribution Let's take an example to understand conditional probability better. Let's say we have a bag containing three apples and five oranges, and we want to randomly take out fruits from the bag one at a time without replacing them. Also, the random variables  and  represent the outcomes in the first try and second try respectively. So, as there are three apples and five oranges in the bag initially,  and  . Now, let's say that in our first attempt we got an orange. Now, we cannot simply represent the probability of getting an apple or orange in our second attempt. The probabilities in the second attempt will depend on the outcome of our first attempt and therefore, we use conditional probability to represent such cases. Now, in the second attempt, we will have the following probabilities that depend on the outcome of our first try:  ,  ,  , and  . The Conditional Probability Distribution (CPD) of two variables  and  can be represented as  , representing the probability of  given  that is the probability of  after the event  has occurred and we know it's outcome. Similarly, we can have  representing the probability of  after having an observation for . The simplest representation of CPD is tabular CPD. In a tabular CPD, we construct a table containing all the possible combinations of different states of the random variables and the probabilities corresponding to these states. Let's consider the earlier restaurant example. Let's begin by representing the marginal distribution of the quality of food with Q. As we mentioned earlier, it can be categorized into three values {good, bad, average}. For example, P(Q) can be represented in the tabular form as follows: Quality P(Q) Good 0.3 Normal 0.5 Bad 0.2 Similarly, let's say P(L) is the probability distribution of the location of the restaurant. Its CPD can be represented as follows: Location P(L) Good 0.6 Bad 0.4 As the cost of restaurant C depends on both the quality of food Q and its location L, we will be considering P(C | Q, L), which is the conditional distribution of C, given Q and L: Location Good Bad Quality Good Normal Bad Good Normal Bad Cost             High 0.8 0.6 0.1 0.6 0.6 0.05 Low 0.2 0.4 0.9 0.4 0.4 0.95 Representing CPDs using pgmpy Let's first see how to represent the tabular CPD using pgmpy for variables that have no conditional variables: In [1]: from pgmpy.factors import TabularCPD   # For creating a TabularCPD object we need to pass three # arguments: the variable name, its cardinality that is the number # of states of the random variable and the probability value # corresponding each state. 
In [2]: quality = TabularCPD(variable='Quality',                              variable_card=3,                                values=[[0.3], [0.5], [0.2]]) In [3]: print(quality) +----------------------+ ¦ ['Quality', 0] ¦ 0.3 ¦ +----------------+-----¦ ¦ ['Quality', 1] ¦ 0.5 ¦ +----------------+-----¦ ¦ ['Quality', 2] ¦ 0.2 ¦ +----------------------+ In [4]: quality.variables Out[4]: OrderedDict([('Quality', [State(var='Quality', state=0),                                  State(var='Quality', state=1),                                  State(var='Quality', state=2)])])   In [5]: quality.cardinality Out[5]: array([3])   In [6]: quality.values Out[6]: array([0.3, 0.5, 0.2]) You can see here that the values of the CPD are a 1D array instead of a 2D array, which you passed as an argument. Actually, pgmpy internally stores the values of the TabularCPD as a flattened numpy array. In [7]: location = TabularCPD(variable='Location',                               variable_card=2,                              values=[[0.6], [0.4]]) In [8]: print(location) +-----------------------+ ¦ ['Location', 0] ¦ 0.6 ¦ +-----------------+-----¦ ¦ ['Location', 1] ¦ 0.4 ¦ +-----------------------+ However, when we have conditional variables, we also need to specify them and the cardinality of those variables. Let's define the TabularCPD for the cost variable: In [9]: cost = TabularCPD(                      variable='Cost',                      variable_card=2,                      values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],                              [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],                      evidence=['Q', 'L'],                      evidence_card=[3, 2]) Graph theory The second major framework for the study of probabilistic graphical models is graph theory. Graphs are the skeleton of PGMs, and are used to compactly encode the independence conditions of a probability distribution. Nodes and edges The foundation of graph theory was laid by Leonhard Euler when he solved the famous Seven Bridges of Konigsberg problem. The city of Konigsberg was set on both sides by the Pregel river and included two islands that were connected and maintained by seven bridges. The problem was to find a walk to exactly cross all the bridges once in a single walk. To visualize the problem, let's think of the graph in Fig 1.1: Fig 1.1: The Seven Bridges of Konigsberg graph Here, the nodes a, b, c, and d represent the land, and are known as vertices of the graph. The line segments ab, bc, cd, da, ab, and bc connecting the land parts are the bridges and are known as the edges of the graph. So, we can think of the problem of crossing all the bridges once in a single walk as tracing along all the edges of the graph without lifting our pencils. Formally, a graph G = (V, E) is an ordered pair of finite sets. The elements of the set V are known as the nodes or the vertices of the graph, and the elements of  are the edges or the arcs of the graph. The number of nodes or cardinality of G, denoted by |V|, are known as the order of the graph. Similarly, the number of edges denoted by |E| are known as the size of the graph. Here, we can see that the Konigsberg city graph shown in Fig 1.1 is of order 4 and size 7. In a graph, we say that two vertices, u, v ? V are adjacent if u, v ? E. In the City graph, all the four vertices are adjacent to each other because there is an edge for every possible combination of two vertices in the graph. Also, for a vertex v ? V, we define the neighbors set of v as  . 
In the City graph, we can see that b and d are neighbors of c. Similarly, a, b, and c are neighbors of d. We define an edge to be a self loop if the start vertex and the end vertex of the edge are the same. We can put it more formally as, any edge of the form (u, u), where u ? V is a self loop. Until now, we have been talking only about graphs whose edges don't have a direction associated with them, which means that the edge (u, v) is same as the edge (v, u). These types of graphs are known as undirected graphs. Similarly, we can think of a graph whose edges have a sense of direction associated with it. For these graphs, the edge set E would be a set of ordered pair of vertices. These types of graphs are known as directed graphs. In the case of a directed graph, we also define the indegree and outdegree for a vertex. For a vertex v ? V, we define its outdegree as the number of edges originating from the vertex v, that is,  . Similarly, the indegree is defined as the number of edges that end at the vertex v, that is,  . Walk, paths, and trails For a graph G = (V, E) and u,v ? V, we define a u - v walk as an alternating sequence of vertices and edges, starting with u and ending with v. In the City graph of Fig 1.1, we can have an example of a - d walk as . If there aren't multiple edges between the same vertices, then we simply represent a walk by a sequence of vertices. As in the case of the Butterfly graph shown in Fig 1.2, we can have a walk W : a, c, d, c, e: Fig 1.2: Butterfly graph—a undirected graph A walk with no repeated edges is known as a trail. For example, the walk  in the City graph is a trail. Also, a walk with no repeated vertices, except possibly the first and the last, is known as a path. For example, the walk  in the City graph is a path. Also, a graph is known as cyclic if there are one or more paths that start and end at the same node. Such paths are known as cycles. Similarly, if there are no cycles in a graph, it is known as an acyclic graph. Bayesian models In most of the real-life cases when we would be representing or modeling some event, we would be dealing with a lot of random variables. Even if we would consider all the random variables to be discrete, there would still be exponentially large number of values in the joint probability distribution. Dealing with such huge amount of data would be computationally expensive (and in some cases, even intractable), and would also require huge amount of memory to store the probability of each combination of states of these random variables. However, in most of the cases, many of these variables are marginally or conditionally independent of each other. By exploiting these independencies, we can reduce the number of values we need to store to represent the joint probability distribution. For instance, in the previous restaurant example, the joint probability distribution across the four random variables that we discussed (that is, quality of food Q, location of restaurant L, cost of food C, and the number of people visiting N) would require us to store 23 independent values. By the chain rule of probability, we know the following: P(Q, L, C, N) = P(Q) P(L|Q) P(C|L, Q) P(N|C, Q, L) Now, let us try to exploit the marginal and conditional independence between the variables, to make the representation more compact. Let's start by considering the independency between the location of the restaurant and quality of food over there. As both of these attributes are independent of each other, P(L|Q) would be the same as P(L). 
Therefore, we need to store only one parameter to represent it. From the conditional independence that we have seen earlier, we know that  . Thus, P(N|C, Q, L) would be the same as P(N|C, L); thus needing only four parameters. Therefore, we now need only (2 + 1 + 6 + 4 = 13) parameters to represent the whole distribution. We can conclude that exploiting independencies helps in the compact representation of joint probability distribution. This forms the basis for the Bayesian network. Representation A Bayesian network is represented by a Directed Acyclic Graph (DAG) and a set of Conditional Probability Distributions (CPD) in which: The nodes represent random variables The edges represent dependencies For each of the nodes, we have a CPD In our previous restaurant example, the nodes would be as follows: Quality of food (Q) Location (L) Cost of food (C) Number of people (N) As the cost of food was dependent on the quality of food (Q) and the location of the restaurant (L), there will be an edge each from Q ? C and L ? C. Similarly, as the number of people visiting the restaurant depends on the price of food and its location, there would be an edge each from L ? N and C ? N. The resulting structure of our Bayesian network is shown in Fig 1.3: Fig 1.3: Bayesian network for the restaurant example Factorization of a distribution over a network Each node in our Bayesian network for restaurants has a CPD associated to it. For example, the CPD for the cost of food in the restaurant is P(C|Q, L), as it only depends on the quality of food and location. For the number of people, it would be P(N|C, L) . So, we can generalize that the CPD associated with each node would be P(node|Par(node)) where Par(node) denotes the parents of the node in the graph. Assuming some probability values, we will finally get a network as shown in Fig 1.4: Fig 1.4: Bayesian network of restaurant along with CPDs Let us go back to the joint probability distribution of all these attributes of the restaurant again. Considering the independencies among variables, we concluded as follows: P(Q,C,L,N) = P(Q)P(L)P(C|Q, L)P(N|C, L) So now, looking into the Bayesian network (BN) for the restaurant, we can say that for any Bayesian network, the joint probability distribution  over all its random variables {X1,X2,...,Xn} can be represented as follows: This is known as the chain rule for Bayesian networks. Also, we say that a distribution P factorizes over a graph G, if P can be encoded as follows: Here, ParG(X) is the parent of X in the graph G. Summary In this article, we saw how we can represent a complex joint probability distribution using a directed graph and a conditional probability distribution associated with each node, which is collectively known as a Bayesian network. Resources for Article:   Further resources on this subject: Web Scraping with Python [article] Exact Inference Using Graphical Models [article] wxPython: Design Approaches and Techniques [article]
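To connect the restaurant example back to code, here is a hedged sketch of how the network in Fig 1.3 could be declared with pgmpy. It assumes the same pgmpy era used in this article (where the model class is BayesianModel; in recent releases the classes live under pgmpy.models.BayesianNetwork and pgmpy.factors.discrete.TabularCPD), and the CPD numbers for N are invented for illustration.

In [1]: from pgmpy.models import BayesianModel
In [2]: from pgmpy.factors import TabularCPD

# Structure from Fig 1.3: Q -> C, L -> C, L -> N, C -> N
In [3]: model = BayesianModel([('Q', 'C'), ('L', 'C'), ('L', 'N'), ('C', 'N')])

In [4]: quality = TabularCPD('Q', 3, [[0.3], [0.5], [0.2]])
In [5]: location = TabularCPD('L', 2, [[0.6], [0.4]])
In [6]: cost = TabularCPD('C', 2,
                          [[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                           [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],
                          evidence=['Q', 'L'], evidence_card=[3, 2])
# Invented numbers for the number of visitors, conditioned on C and L
In [7]: visitors = TabularCPD('N', 2,
                              [[0.6, 0.8, 0.1, 0.3],
                               [0.4, 0.2, 0.9, 0.7]],
                              evidence=['C', 'L'], evidence_card=[2, 2])

In [8]: model.add_cpds(quality, location, cost, visitors)
In [9]: model.check_model()   # True if the CPDs are consistent with the graph
Out[9]: True

The four CPDs attached to the graph are exactly the factors in P(Q, L, C, N) = P(Q) P(L) P(C|Q, L) P(N|C, L), which is the chain rule for Bayesian networks described above.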

Optimizing Salesforce Flows

Andy Forbes, Philip Safir, Joseph Kubon, Francisco Fálder
24 Jan 2024
13 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, ChatGPT for Accelerating Salesforce Development, by Andy Forbes, Philip Safir, Joseph Kubon, Francisco Fálder. Harness ChatGPT for streamlined flows, effective configuration, proficient code writing, and enhanced project activities.IntroductionEmbark on a journey through Salesforce wizardry in this article as we delve into the art of optimizing Flows for email functionality within the Snowflake data realm. Picture this: you, the savvy sales rep, armed with a wizard that guides you through the intricate process of selecting the perfect contact for your email endeavors. Join us in dissecting a case study, navigating through user stories, and tackling challenges head-on. We'll explore the nitty-gritty details, gaps, and tricks to ensure a seamless Salesforce experience. Get ready to enhance your Flow game and witness the magic unfold in the world of Salesforce efficiency!Enhancing Email Selection Wizards for Sales EfficiencyThe case study from Appendix A has several user stories that can be addressed with Flows.Let’s use the base flow request:“Please provide detailed directions on how to create a Salesforce Flow to address the following user story. Please note any gaps in the user story that leave an opportunity for misunderstandings, as well as any parts of the flow that may be tricky to create. Include error handling in the Flow, and write a description of what the Flow does for the project documentation.”We’ll also use epic 3 and user story 2 from the case study to investigate ChatGPT’s capabilities.Epic 3 – email functionality with Snowflake dataUser Story 2 – create a Flow-based wizard to select a contact:• Goal: As a sales rep, I want a wizard to guide me through the process of selecting a contact to send the email to so that I can choose the correct recipient.• Acceptance criteria:‚ Given a sales rep clicks the Email button in the Snowflake section‚ When the wizard starts‚ Then the sales rep should be prompted to select a contact associated with the accountLet’s take a look:Prompt:Please provide detailed directions on how to create a Salesforce flow to address the following user story. Please note any gaps in the user story that leave an opportunity for misunderstandings, as well as any parts of the flow that may be tricky to create. Include error handling in the flow, and write a description of what the Flow does for the project documentation.Epic 3 – email functionality with Snowflake dataUser story 2 – create a Flow-based wizard to select a contact:•    Goal: As a sales rep, I want a wizard to guide me through the process of selecting a contact to send the email to so that I can choose the correct recipient•    Acceptance criteria:‚    Given a sales rep clicks the Email button in the Snowflake section‚    When the wizard starts‚    Then the sales rep should be prompted to select a contact associated with the accountResponse: Certainly! Let’s address the user story and design a Flow-based wizard to select a contact for emailing in Salesforce. We’ll also examine any potential gaps and challenges in the process.Flow design for email functionality with Snowflake data:1. 
Start the Flow:Trigger: The Flow will be triggered by clicking an Email button within the Snowflake sectionAction: A wizard screen will appear, prompting the sales rep to proceed with selecting a contact associated with the account2. Screen Element for Contact Selection:Create a screen element with a dropdown that lists contacts associated with the accountThis list should be dynamically generated based on the account related to the Snowflake section3. Error handling and validation:Include logic to check if there are any contacts associated with the account If no contacts are found, an error message should be displayed to the sales rep4. Finish the Flow:Once the contact has been selected, the wizard should end, and the selection should be available for further processing, such as sending an email Gaps and challenges:Ambiguity in the Snowflake section: The story does not clearly define what the “Snowflake section” refers to. Understanding its context is essential to properly link the Email button to the flow.Selection process clarity: The story does not provide details on what happens after contact selection, which might require collaboration with other stakeholders for complete understanding.Account association: Ensuring that the listed contacts are relevant to the specific account can be tricky and needs careful handling.Project documentationFlow name: Contact selection wizard for Snowflake email functionality.Description: This Salesforce flow enables sales reps to select a contact associated with an account within the Snowflake section by guiding them through a wizard. Triggered by clicking the Email button, the Flow provides a drop-down list of relevant contacts for selection, includes error handling to deal with scenarios where no contacts are found and ends with the selected contact being available for email communication.Associated epic: Epic 3 – email functionality with Snowflake data.User story: User story 2 – create a Flow-based wizard to select a contact.Goal: To facilitate the correct selection of an email recipient by guiding the sales rep through a structured process.By carefully considering these steps, challenges, and potential gaps, this Flow aims to provide an intuitive and error-free way for sales reps to select the appropriate contact for their email communications within the Snowflake section. It underscores the importance of alignment with stakeholders to ensure the full context and intended behavior of the Flow are understood and accurately implemented.ChatGPT identified three gaps and challenges. Let’s prompt ChatGPT to refine the answer based on answers to the gaps and challenges:Prompt:The Snowflake section on the contact page layout is a tab dedicated to displaying information from Snowflake. There are additional user stories that address what happens after this user story. The contacts to be listed are the ones where the account lookup field on the contact is all the same account. With these answers, please update and rewrite the response. Anything unaffected by these answers does not need to change. Response:Certainly! With the additional context provided, let’s refine the Flow design to create a Salesforce Flow-based wizard that aligns with the user story and its requirements.Flow design for email functionality with Snowflake data:1. 
Start the Flow:Trigger: The flow will be triggered by clicking an Email button within the Snowflake tab on the contact page layoutAction: A wizard screen will appear, prompting the sales rep to proceed with selecting a contact associated with the account2. Screen element for contact selection:Create a screen element with a dropdown that lists contacts associated with the same account, where the account lookup field on the contact matches the specific accountThis list should be dynamically generated based on the account related to theSnowflake tab3. Error handling and validation:Include logic to check if there are any contacts associated with the specific account. If no contacts are found, an error message should be displayed to the sales rep4. Finish the Flow:Once the contact has been selected, the wizard should end, and the selection should be available for further processing, as defined in the subsequent user stories.Gaps and challenges:Account association: Ensuring that the listed contacts are relevant to the specific account and filtered by the account lookup field can be tricky and needs careful handling.Project documentationFlow name: Contact selection wizard for Snowflake email functionality.Description: This Salesforce flow enables sales reps to select a contact associated with an account within the Snowflake tab on the contact page layout. Triggered by clicking the Email button, the flow provides a drop-down list of contacts where the account lookup field matches the specific account. It includes error handling to deal with scenarios where no contacts are found and ends with the selected contact being available for further processing as detailed in subsequent user stories.Associated Epic: Epic 3 – mail functionality with Snowflake data.User Story: User story 2 – create a Flow-based wizard to select a contact.Goal: To facilitate the correct selection of an email recipient by guiding the sales rep through a structured process within the Snowflake tab.By integrating the details about the Snowflake section and the specific Account relationship, this refined flow design aims to provide a seamless experience for Sales Reps in selecting the appropriate contact for email communications. It emphasizes the precision needed in filtering contacts based on the Account lookup field, thus catering to the specific requirements of the given user story.Even though the “Account Association” concern seems to have been addressed, ChatGPT is still listing it as a concern. This could be a ChatGPT quirk or there could still be a gap. Let’s directly ask for an explanation:Prompt: Under “Gaps and Challenges” is Account Association: ensuring that the listed contacts are relevant to the specific account and filtered by the account lookup field can be tricky and needs careful handling.I thought I addressed this by telling you that “The contacts to be listed are the ones where the account lookup field on the contact are all the same account.”Why is this tricky? What is needed to resolve this “Gap and Challenge”?Response:My apologies for the confusion. Your direction does indeed clarify the requirement, but let me further explain why I considered this as a potential challenge.The “tricky” part I mentioned relates to the design and implementation phase of the flow, specifically in the context of Salesforce. Here’s why:1. Query complexity: Retrieving contacts based on the specific account lookup field requires a well-constructed SOQL query. 
While it’s a straightforward requirement, the developer must ensure that the query is efficient, especially if there are large numbers of contacts related to an account.2. Contextual handling: The Flow must recognize the specific account related to the Snowflake tab from which the Email button is clicked. This requires proper handling of contextual information, ensuring that the correct account ID is used in filtering the contacts.3. Dynamic filtering: If the Account field is ever updated, or if the context changes dynamically within the user interface, the flow must respond accordingly, updating the list of contacts. Ensuring that this dynamic filtering works smoothly can require careful testing and validation.4. Permissions and security: Ensuring that the user has the appropriate permissions to view and select the contacts related to the specific account must be considered.This involves compliance with the Salesforce org’s security model.Resolution:1. Construct a robust SOQL query: Build a well-designed SOQL query to fetch contacts based on the specific account lookup field.2. Contextual Handling: Utilize the context of the button click within the Snowflake tab to identify the relevant account, possibly using Apex if necessary.3. Test dynamically: Implement tests to ensure that the dynamic filtering of contacts works as expected as the context changes.4. Consider security: Align the flow with the Salesforce org’s security protocols to ensure that it adheres to the relevant permissions and profiles.In the hands of an experienced Salesforce developer like yourself, these challenges would be well within reach to overcome. I mentioned them to highlight potential areas that might require attention, and they can be seen as aspects to validate rather than insurmountable obstacles.And since the authors want to get along with our AI overlords, here’s the prompt we provided:Prompt:Thank you for your explanation of how to address this “Gap and Challenge” as well as for your help with this user story!Response:You’re welcome! It’s been a pleasure working alongside you on this user story. Your in-depth knowledge and detailed questions have enabled a clear and precise understanding of the requirements. Should you have any more questions or need further assistance, feel free to reach out. Here’s to a successful implementation of the Flow, and may it greatly enhance the functionality and user experience within your Salesforce environment. Happy developing!ConclusionAnd there you have it – a deep dive into the world of Salesforce wizardry! From refining Flows to addressing challenges and optimizing for Snowflake data, we've navigated the intricate landscape of email functionality. Picture your sales team empowered with intuitive wizards, seamlessly selecting contacts for emails. As we wrap up, remember the key: precision in design and collaboration with stakeholders. Let this be a guide for a successful implementation, enhancing your Salesforce environment. Here's to smooth workflows, error-free communication, and a Salesforce experience that truly works its magic for you! Happy developing! Author BioAndy Forbes stands at the vanguard of technological innovation. With a career spanning four decades, he is currently focusing his robust IT background on exploring the transformative potential of artificial intelligence, particularly Generative AI, on Salesforce project delivery. 
Andy has honed a deep expertise in CRM and project management which is backed by his repertoire of ITIL and Salesforce certifications. Andy's eight-year tenure at a Global Systems Integrator, coupled with his entrepreneurial spirit, has propelled him to the leadership of multiple Salesforce projects for Fortune 500 clients.Philip Safir is a consulting executive and business architect. He serves enterprises via delivery of technology roadmaps, process improvement, and solutions on the Salesforce platform. As the Head of Salesforce Professional Services Delivery & Talent for a Global Systems Integrator, he is responsible for a team of 250 consultants and a $100M portfolio. His career spans Fortune 500, start-up, and international clients across various industry domains like Manufacturing, Retail, Financial Services, Telecom, and so on.Joseph Kubon, an experienced Solution Architect for global enterprise deliveries, Salesforce MVP with 40+ Salesforce certifications and inventor (holding several patents), navigates the realms of manufacturing, health and media industries with a results-driven approach. Skilled in Agile methodologies, Business Process, and Architecture values, he carries a toolkit replete with Salesforce configuration and customization expertise for groundbreaking development. Guided by the wisdom that 'just because it can be built doesn't mean it should be', Joseph embraces the multiplicity of solutions to tomorrow's challenges measuring success with his “Time to Value” principles.Francisco Fálder is a seasoned Salesforce maven and master of digital transformation. His career stands as a testament to a commitment to delivering top-tier, complex projects, seamlessly merging business and tech to deliver standout customer experiences. With every project, Paco re-imagines and redefines the digital landscape, fostering an environment where innovation is not only encouraged but celebrated. Passionate about the ever-evolving tech world, he has honed a deep-rooted affinity for Artificial Intelligence, Continuous Integration/Continuous Deployment (CI/CD), and Agile methodologies.

What are Slowly changing Dimensions (SCD) and why you need them in your Data Warehouse?

Savia Lobo
07 Dec 2017
8 min read
[box type="note" align="" class="" width=""]Below given post is an excerpt from a book by Rahul Malewar titled Learning Informatica PowerCenter 10.x. The book is a quick guide to explore Informatica PowerCenter and its features such as working on sources, targets, transformations, performance optimization, and managing your data at speed. [/box] Our article explores what Slowly Changing Dimensions (SCD) are and how to implement them in Informatica PowerCenter. As the name suggests, SCD allows maintaining changes in the Dimension table in the data warehouse. These are dimensions that gradually change with time, rather than changing on a regular basis. When you implement SCDs, you actually decide how you wish to maintain historical data with the current data. Dimensions present within data warehousing and in data management include static data about certain entities such as customers, geographical locations, products, and so on. Here we talk about general SCDs: SCD1, SCD2, and SCD3. Apart from these, there are also Hybrid SCDs that you might come across. A Hybrid SCD is nothing but a combination of multiple SCDs to serve your complex business requirements. Types of SCD The various types of SCD are described as follows: Type 1 dimension mapping (SCD1): This keeps only current data and does not maintain historical data. Note : Use SCD1 mapping when you do not want history of previous data. Type 2 dimension/version number mapping (SCD2): This keeps current as well as historical data in the table. It allows you to insert new records and changed records using a new column (PM_VERSION_NUMBER) by maintaining the version number in the table to track the changes. We use a new column PM_PRIMARYKEY to maintain the history. Note : Use SCD2 mapping when you want to keep a full history of dimension data, and track the progression of changes using a version number. Consider there is a column LOCATION in the EMPLOYEE table and you wish to track the changes in the location on employees. Consider a record for Employee ID 1001 present in your EMPLOYEE dimension table. Steve was initially working in India and then shifted to USA. We are willing to maintain history on the LOCATION field. Type 2 dimension/flag mapping: This keeps current as well as historical data in the table. It allows you to insert new records and changed records using a new column (PM_CURRENT_FLAG) by maintaining the flag in the table to track the changes. We use a new column PRIMARY_KEY to maintain the history. Note : Use SCD2 mapping when you want to keep a full history of dimension data, and track the progression of changes using a flag. Let's take an example to understand different SCDs. Type 2 dimension/effective date range mapping: This keeps current as well as historical data in the table. SCD2 allows you to insert new records and changed records using two new columns (PM_BEGIN_DATE and PM_END_DATE) by maintaining the date range in the table to track the changes. We use a new column PRIMARY_KEY to maintain the history. Note : Use SCD2 mapping when you want to keep a full history of dimension data, and track the progression of changes using start date and end date. Type 3 Dimension mapping: This keeps current as well as historical data in the table. We maintain only partial history by adding a new column PM_PREV_COLUMN_NAME, that is, we do not maintain full history. Note: Use SCD3 mapping when you wish to maintain only partial history. 
EMPLOYEE_ID NAME LOCATION 1001 STEVE INDIA Your data warehouse table should reflect the current status of Steve. To implement this, we have different types of SCDs. SCD1 As you can see in the following table, INDIA will be replaced with USA, so we end up having only current data, and we lose historical data: PM_PRIMARY_KEY EMPLOYEE_ID NAME LOCATION 100 1001 STEVE USA Now if Steve is again shifted to JAPAN, the LOCATION data will be replaced from USA to JAPAN: PM_PRIMARY_KEY EMPLOYEE_ID NAME LOCATION 100 1001 STEVE JAPAN The advantage of SCD1 is that we do not consume a lot of space in maintaining the data. The disadvantage is that we don't have historical data. SCD2 - Version number As you can see in the following table, we are maintaining the full history by adding a new record to maintain the history of the previous records: PM_PRIMARYKEY EMPLOYEE_ID NAME LOCATION PM_VERSION_NUMBER 100 1001 STEVE INDIA 0 101 1001 STEVE USA 1 102 1001 STEVE JAPAN 2 200 1002 MIKE UK 0 We add two new columns in the table: PM_PRIMARYKEY to handle the issues of duplicate records in the primary key in the EMPLOYEE_ID (supposed to be the primary key) column, and PM_VERSION_NUMBER to understand current and history records. SCD2 - FLAG As you can see in the following table, we are maintaining the full history by adding new records to maintain the history of the previous records:   PM_PRIMARYKEY EMPLOYEE_ID NAME LOCATION PM_CURRENT_FLAG 100 1001 STEVE INDIA 0 101 1001 STEVE USA 1 We add two new columns in the table: PM_PRIMARYKEY to handle the issues of duplicate records in the primary key in the EMPLOYEE_ID column, and PM_CURRENT_FLAG to understand current and history records. Again, if Steve is shifted, the data looks like this: PM_PRIMARYKEY EMPLOYEE_ID NAME LOCATION PM_CURRENT_FLAG 100 1001 STEVE INDIA 0 101 1001 STEVE USA 0 102 1001 STEVE JAPAN 1 SCD2 - Date range As you can see in the following table, we are maintaining the full history by adding new records to maintain the history of the previous records: PM_PRIMARYKEY EMPLOYEE_ID NAME LOCATION PM_BEGIN_DATE PM_END_DATE 100 1001 STEVE INDIA 01-01-14 31-05-14 101 1001 STEVE USA 01-06-14 99-99-9999 We add three new columns in the table: PM_PRIMARYKEY to handle the issues of duplicate records in the primary key in the EMPLOYEE_ID column, and PM_BEGIN_DATE and PM_END_DATE to understand the versions in the data. The advantage of SCD2 is that you have complete history of the data, which is a must for data warehouse. The disadvantage of SCD2 is that it consumes a lot of space. SCD3 As you can see in the following table, we are maintaining the history by adding new columns: PM_PRIMARYKEY EMPLOYEE_ID NAME LOCATION PM_PREV_LOCATION 100 1001 STEVE USA INDIA An optional column PM_PRIMARYKEY can be added to maintain the primary key constraints. We add a new column PM_PREV_LOCATION in the table to store the changes in the data. As you can see, we added a new column to store data as against SCD2,where we added rows to maintain history. If Steve is now shifted to JAPAN, the data changes to this: PM_PRIMARYKEY EMPLOYEE_ID NAME LOCATION PM_PREV_LOCATION 100 1001 STEVE JAPAN USA As you can notice, we lost INDIA from the data warehouse, that is why we say we are maintaining partial history. Note : To implement SCD3, decide how many versions of a particular column you wish to maintain. Based on this, the columns will be added in the table. SCD3 is best when you are not interested in maintaining the complete but only partial history. 
The drawback of SCD3 is that it doesn't store the full history. At this point, you should be very clear about the different types of SCDs. We need to implement these concepts practically in Informatica PowerCenter. Informatica PowerCenter provides a utility called wizard to implement SCD. Using the wizard, you can easily implement any SCD. In the next topics, you will learn how to use the wizard to implement SCD1, SCD2, and SCD3. Before you proceed to the next section, please make sure you have a proper understanding of the transformations in Informatica PowerCenter. You should be clear about the source qualifier, expression, filter, router, lookup, update strategy, and sequence generator transformations. Wizard creates a mapping using all these transformations to implement the SCD functionality. When we implement SCD, there will be some new records that need to be loaded into the target table, and there will be some existing records for which we need to maintain the history. Note : The record that comes for the first time in the table will be referred to as the NEW record, and the record for which we need to maintain history will be referred to as the CHANGED record. Based on the comparison of the source data with the target data, we will decide which one is the NEW record and which is the CHANGED record. To start with, we will use a sample file as our source and the Oracle table as the target to implement SCDs. Before we implement SCDs, let's talk about the logic that will serve our purpose, and then we will fine-tune the logic for each type of SCD. Extract all records from the source. Look up on the target table, and cache all the data. Compare the source data with the target data to flag the NEW and CHANGED records. Filter the data based on the NEW and CHANGED flags. Generate the primary key for every new row inserted into the table. Load the NEW record into the table, and update the existing record if needed. In this article we concentrated on a very important table feature called slowly changing dimensions. We also discussed different types of SCDs, i.e., SCD1, SCD2, and SCD3. If you are looking to explore more in Informatica Powercentre, go ahead and check out the book Learning Informatica Powercentre 10.x.  
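The six-step logic above is what the Informatica wizard generates for you, but it can help to see it outside the tool. The following is a minimal, illustrative Python sketch of SCD2 (date-range) processing; the column names mirror the tables above, while the sample data and function are invented for illustration. It is not Informatica code and only makes the NEW/CHANGED decision concrete.

from datetime import date

HIGH_DATE = date(9999, 12, 31)   # open-ended PM_END_DATE for the current row

# Invented sample data: current target rows and an incoming source extract
target = [
    {"PM_PRIMARYKEY": 100, "EMPLOYEE_ID": 1001, "NAME": "STEVE",
     "LOCATION": "INDIA", "PM_BEGIN_DATE": date(2014, 1, 1), "PM_END_DATE": HIGH_DATE},
]
source = [
    {"EMPLOYEE_ID": 1001, "NAME": "STEVE", "LOCATION": "USA"},   # CHANGED record
    {"EMPLOYEE_ID": 1002, "NAME": "MIKE", "LOCATION": "UK"},     # NEW record
]

def load_scd2(source, target, load_date):
    """Flag each source row as NEW or CHANGED and apply SCD2 date-range logic."""
    # Cache the current target rows, like the lookup transformation does
    current = {row["EMPLOYEE_ID"]: row for row in target
               if row["PM_END_DATE"] == HIGH_DATE}
    # Stand-in for the sequence generator transformation
    next_key = max(row["PM_PRIMARYKEY"] for row in target) + 1

    for row in source:
        existing = current.get(row["EMPLOYEE_ID"])
        if existing is None:
            pass                                      # NEW record: just insert
        elif existing["LOCATION"] != row["LOCATION"]:
            existing["PM_END_DATE"] = load_date       # CHANGED: expire old version
        else:
            continue                                  # unchanged: nothing to do

        target.append({"PM_PRIMARYKEY": next_key, **row,
                       "PM_BEGIN_DATE": load_date, "PM_END_DATE": HIGH_DATE})
        next_key += 1
    return target

for row in load_scd2(source, target, date(2014, 6, 1)):
    print(row)

Running this expires Steve's INDIA row by setting its PM_END_DATE, inserts a new USA version with an open-ended date range, and inserts Mike as a brand-new record, which is exactly the behavior the wizard-generated SCD2 mapping implements with lookup, expression, filter, update strategy, and sequence generator transformations.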