Aaron Lazar
24 May 2018
7 min read

What the Python Software Foundation & JetBrains 2017 Python Developer Survey had to reveal

The Python Software Foundation and JetBrains conduct a Python developer survey every year. At the end of 2017, more than 9,500 developers from all over the world took part. The findings are pretty interesting, so I thought I'd quickly summarise the most relevant points here. So here we go.

TL;DR:
• Adoption of Python 3 is growing rapidly, but 25% of developers have yet to migrate to Python 3 despite the looming end-of-support deadline for Python 2 (1 January 2020).
• There are as many Python developers doing primarily data science as there are developers focused on web development. This is quite different from the 2016 findings, where there was a 17% difference between web development and data science.
• A majority of Python developers use JavaScript, HTML/CSS and SQL along with Python.
• Django, pandas, NumPy and Matplotlib are the most popular Python frameworks and libraries.
• Jupyter Notebook and Docker are the most popular technologies used along with Python.
• Among cloud platforms, AWS is the most popular.
• Both editions of PyCharm (Community and Professional) are the most popular tools for Python development, followed by Sublime Text.
• Code autocompletion, code refactorings and writing unit tests are the most widely used editor/IDE features for Python development.
• More than half the respondents had a full-time job working with Python.
• A majority of respondents held the role of Developer/Programmer and belonged to the 21-39 age group.
• Most of the developers are located in the US, India and China.

These stats are certainly interesting, and they got me thinking about the reasons behind the numbers. Here's my perspective on some of the findings. I would love to hear yours in the comments section below.

How is Python being used?

Starting with usage, the survey revealed that close to 80% of respondents used Python as their primary language for development. When asked which languages they generally used it with, the top responses were JavaScript, HTML/CSS and SQL.
On the other hand, many Java developers and Bash/shell users seem to use Python as their secondary language. This suggests that Python interoperates well with a number of other languages, which makes it versatile for web development, enterprise work and server-side scripting.

As for the tasks Python is used for in day-to-day development, it wasn't a surprise that respondents mentioned data analysis. More than 50% use Python as their primary language for data analysis. However, only 32% said that they used it for machine learning, while 54% said that they used it for web development and 36% used it for DevOps and system administration. This isn't surprising, as most developers tend to stick to a particular tool or stack as far as possible.

When asked what they used Python for the most, developers put web development first, with machine learning and data analysis close on its heels. Most DevOps engineers and system administrators use Python as their secondary language, perhaps because shell/Bash is their primary language. In the 2016 survey, there were far more web developers than ML/data analysts, but the gap has narrowed considerably.

What roles do these developers hold?

When asked about their roles, the responses were quite interesting! While nearly a quarter of respondents combined data analysis and machine learning roles, another quarter combined data analysis and web development. 15% said they worked in both web development and machine learning. This combination, although unexpected, is extremely interesting and worth exploring further. One reason could be that developers are building machine learning solutions that are offered to customers as web applications rather than desktop applications.
Another reason could be that many web apps these days are becoming more data-driven and require some kind of machine learning component running under the hood.

What version of Python are developers rolling with and what tools do they use it with?

A very surprising fact that surfaced from the survey was that 25% of developers still haven't migrated their codebases to Python 3 and are still working with Python 2. This is quite astonishing, since support for Python 2 will be discontinued in less than two years (on January 1, 2020, to be precise). Although adoption of Python 3 has grown steadily over the years, most of the developers still using Python 2 turned out to be web developers. This may be because data scientists moved to Python relatively recently, whereas web developers may have been using Python for a long time and haven't yet migrated their legacy code.

What tools do they prefer to use with Python?

When asked about the tools they used, a majority of web developers (76%) said they used Django, while 53% used Requests and 49% used Flask. When it came to GUI frameworks, 9% of developers used PyQt / PyGTK / wxPython, while 6% used Tkinter. 29% of these developers said they used scientific libraries such as NumPy / pandas / Matplotlib / SciPy, which supports the overlap between web development and data science roles noted earlier.

Data scientists, on the other hand, said that 65% of them used NumPy / pandas / Matplotlib / SciPy, 38% used Keras / Theano / TensorFlow / scikit-learn, and 31% and 27% used Django and Flask respectively. Django was the clear winner in the tools section, used by 41% of all developers.

When asked what tools they used along with Python, 47% of web developers said they used Docker, 46% used an ORM such as SQLAlchemy or PonyORM, and 40% used Redis. 27% of them used Jupyter Notebook.
Data scientists, on the other hand, used Jupyter Notebook a lot (52%). 40% of them used Anaconda and 23% used Docker. Of the various cloud platforms, developers chose AWS the most (65%).

When it came to the editor/IDE features used most for Python development, code autocompletion (84%), code refactorings (82%) and writing unit tests (81%) topped the list. 75% of developers used SQL databases, while only 46% used NoSQL databases. Of the various IDEs and editors, PyCharm, in both its Community and Professional editions, was the most popular, followed by Sublime Text, Vim, IDLE, Atom and VS Code. While web developers preferred PyCharm, data scientists preferred Jupyter Notebook.

Developer Profile: Employment, Job Roles and Experience

Unsurprisingly, 52% of Python developers said that they had a full-time job. This ties in well with the 2018 Stack Overflow Developer Survey, which labeled Python the "most wanted" programming language. So, developers out there: if you're well versed in Python, you may be more likely to get hired.

As for job roles, 73% held the role of Developer/Programmer, 19% held the role of Data Analyst and 18% the role of Architect. Interestingly, 7% held the role of CIO / CEO / CTO. In terms of years of experience, the results were well balanced, with almost as many developers having more than 5 years of experience as those with less than 5 years. 67% of respondents were in the 21-39 age group, meaning that a majority of Python users seem to be young. If you're one of them and are looking to progress in your career, check out our extensive catalog of Python titles.

As for geographic location, 18% of the developers were from the US, 13% from India and 7% from China.

Savia Lobo
15 Sep 2017
6 min read

Is Facebook-backed PyTorch better than Google's TensorFlow?

The rapid rise of tools and techniques in artificial intelligence and machine learning of late has been astounding. Deep learning, or "machine learning on steroids" as some call it, is one area where data scientists and machine learning experts are spoilt for choice in terms of the libraries and frameworks available. Two libraries are starting to emerge as frontrunners. TensorFlow is the best in class, but PyTorch is a new entrant in the field that could compete. So, PyTorch vs TensorFlow: which one is better? How do the two deep learning libraries compare to one another?

TensorFlow and PyTorch: the basics

Google's TensorFlow is a widely used machine learning and deep learning framework. Open sourced in 2015 and backed by a huge community of machine learning experts, TensorFlow has quickly grown to be THE framework of choice for many organizations' machine learning and deep learning needs.

PyTorch, on the other hand, is a recently developed Python package from Facebook for training neural networks, adapted from the Lua-based deep learning library Torch. PyTorch is one of the few available deep learning frameworks that use a tape-based autograd system to build dynamic neural networks quickly and flexibly.

PyTorch vs TensorFlow

Let's get into the details. Let the PyTorch vs TensorFlow match-up begin...

What programming languages support PyTorch and TensorFlow?

Although TensorFlow is primarily written in C++ and CUDA, it provides a Python API that sits over the core engine, making it easier for Pythonistas to use. APIs or bindings for C++, Java, Go, Haskell and Rust are also available, which means developers can code in their preferred language.

Although PyTorch is a Python package, it also provides APIs that let you code in C/C++. If you are comfortable with the Lua programming language, you can instead build neural network models with the original Lua-based Torch, from which PyTorch was derived.

How easy are PyTorch and TensorFlow to use?
TensorFlow can be a bit complex to use as a standalone framework, and training deep learning models with it can be difficult. To reduce this complexity, you can use the Keras wrapper, which sits on top of TensorFlow's complex engine and simplifies the development and training of deep learning models. TensorFlow also has mature support for distributed training, which was still limited in PyTorch at the time of writing. TensorFlow is also production-ready, i.e., it can be used to train and deploy enterprise-level deep learning models.

PyTorch brings Torch to Python, sidestepping the complexities of working with the Lua-based Torch. This makes PyTorch feel more native to Python developers. It is an easy-to-use framework that provides maximum flexibility and speed, and it allows quick changes to the code during training without hampering performance. If you already have some experience with deep learning and have used Torch before, you will like PyTorch even more, because of its speed, efficiency and ease of use. PyTorch includes a custom GPU memory allocator, which makes deep learning models highly memory efficient and therefore makes training large deep learning models easier. Large organizations such as Facebook, Twitter and Salesforce are embracing PyTorch.

In this PyTorch vs TensorFlow round, PyTorch wins out in terms of ease of use.

Training Deep Learning models with PyTorch and TensorFlow

Both TensorFlow and PyTorch are used to build and train neural network models. TensorFlow works on an SCG (Static Computational Graph), which means the graph is defined statically before the model starts executing. Once execution starts, the only way to tweak the model is through tf.Session and tf.placeholder tensors.

PyTorch is well suited to training RNNs (Recurrent Neural Networks), which often run faster in PyTorch than in TensorFlow. It works on a DCG (Dynamic Computational Graph), so you can define and change the model on the go.
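To make "tape-based autograd" and "dynamic graph" more concrete, here is a toy, pure-Python sketch of the idea. It is not PyTorch's implementation, which works on tensors and is largely written in C++. The names Var, backward and rnn_like are hypothetical. Each arithmetic operation appends itself to a tape while ordinary Python code runs, so the graph is defined by running it. A reverse pass over the tape then accumulates gradients. Because the loop length depends on the input, every call records a different graph, which is the flexibility that matters for RNNs.

```python
import math


class Var:
    """Toy scalar that records each operation producing it on a shared tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def _record(self, value, parents):
        # parents: list of (parent Var, local derivative d(out)/d(parent))
        out = Var(value, self.tape)
        self.tape.append((out, parents))
        return out

    def __add__(self, other):
        return self._record(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])

    def tanh(self):
        t = math.tanh(self.value)
        return self._record(t, [(self, 1.0 - t * t)])


def backward(tape, output):
    """Reverse-mode pass: walk the tape backwards, applying the chain rule."""
    output.grad = 1.0
    for node, parents in reversed(tape):
        for parent, local_grad in parents:
            parent.grad += local_grad * node.grad


def rnn_like(xs, w_value):
    """Toy recurrence h = tanh(w * h + x), unrolled once per element of xs.

    Returns (final h, d(final h)/dw, number of recorded operations).
    """
    tape = []
    w = Var(w_value, tape)
    h = Var(0.0, tape)
    for x in xs:  # ordinary Python control flow shapes the graph
        h = (w * h + Var(x, tape)).tanh()
    backward(tape, h)
    return h.value, w.grad, len(tape)


print(rnn_like([1.0, 2.0], 0.5)[2])       # 6 operations recorded
print(rnn_like([1.0, 2.0, 3.0], 0.5)[2])  # 9 operations: a different graph
```

In a static-graph framework, the equivalent computation would be declared once up front, and values would be fed in through placeholders. Here, each call simply re-records whatever operations actually ran.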
In a DCG, each block can be debugged separately, which makes training neural networks easier.

TensorFlow has recently come up with TensorFlow Fold, a library designed to create TensorFlow models that work on structured data. Like PyTorch, it implements DCGs, and it delivers computational speedups of up to 10x on CPU and more than 100x on GPU! With the help of dynamic batching, you can now implement deep learning models that vary in size as well as structure.

Comparing GPU and CPU optimizations

TensorFlow has faster compile times than PyTorch and provides flexibility for building real-world applications. It can run on almost any kind of processor: CPUs, GPUs, TPUs, mobile devices and even a Raspberry Pi (IoT devices).

PyTorch, on the other hand, includes tensor computations that can speed up deep neural network models by up to 50x or more using GPUs. These tensors can reside on the CPU or the GPU. The CPU and GPU backends are written as independent libraries, which makes PyTorch efficient to use irrespective of the neural network's size.

Community Support

TensorFlow is one of the most popular deep learning frameworks today, and with this comes huge community support. It has great documentation and an extensive set of online tutorials. TensorFlow also includes numerous pre-trained models, which are hosted and available on GitHub. These models give developers and researchers who are keen to work with TensorFlow some ready-made material that saves them time and effort.

PyTorch, on the other hand, has a relatively small community, since it was developed fairly recently. Compared with TensorFlow, its documentation isn't as good, and code examples are less readily available. However, PyTorch does allow individuals to share their pre-trained models with others.

PyTorch and TensorFlow - A David & Goliath story

As it stands, TensorFlow is clearly favoured and used more than PyTorch, for a variety of reasons. TensorFlow is best suited for a wide range of practical purposes.
It is the obvious choice for many machine learning and deep learning experts because of its vast array of features, and its maturity in the market is important too. It has better community support, along with APIs in multiple languages. It has good documentation and is production-ready, with plenty of ready-to-use code available. Hence, it is better suited for someone who wants to get started with deep learning, or for organizations wanting to productize their deep learning models.

PyTorch is relatively new and has a smaller community than TensorFlow, but it is fast and efficient. In short, it gives you all the power of Torch wrapped in the usefulness and ease of Python. Because of its efficiency and speed, it's a good option for small, research-based projects. As mentioned earlier, companies such as Facebook, Twitter and many others are using PyTorch to train deep learning models. However, its adoption has yet to go mainstream. The potential is evident, but PyTorch is just not ready yet to challenge the beast that is TensorFlow. However, considering its growth, the day is not far off when PyTorch is further optimized and offers more functionality, to the point that it becomes the David to TensorFlow's Goliath.

Packt Editorial Staff
17 Oct 2019
9 min read

What is the history behind C Programming and Unix?

If you think C programming and Unix are unrelated, you are making a big mistake. Back in the 1970s and 1980s, if the Unix engineers at Bell Labs had decided to use another programming language instead of C to develop a new version of Unix, we would be talking about that language today. The relationship between the two is simple: Unix was the first operating system implemented in C, which was then regarded as a high-level programming language, and C got its fame and power from Unix. Of course, our statement about C being a high-level programming language is no longer true by today's standards.

This article is an excerpt from the book Extreme C by Kamran Amini. In it, Kamran teaches you to harness C's power and to apply object-oriented design principles to your procedural C code. You will gain new insight into algorithm design, functions and structures. You'll also understand how C works with Unix, how to implement OO principles in C, and what multiprocessing is.

In this article, we are going to look at the history of C programming and Unix.

Multics OS and Unix

Even before Unix, there was the Multics OS. It was launched in 1964 as a cooperative project led by MIT, General Electric and Bell Labs. Multics was a huge success because it introduced the world to a real, working and secure operating system. Multics was installed everywhere from universities to government sites. Fast-forward to 2019, and every operating system today borrows some ideas from Multics indirectly through Unix.

In 1969, for various reasons that we will talk about shortly, some people at Bell Labs, especially the pioneers of Unix such as Ken Thompson and Dennis Ritchie, gave up on Multics, and subsequently Bell Labs quit the Multics project. But this was not the end for Bell Labs; they went on to design their own simpler and more efficient operating system, which was called Unix.

It is worthwhile to compare the Multics and Unix operating systems.
The following list shows the similarities and differences found when comparing Multics and Unix:

• Both follow the onion architecture as their internal structure. By this we mean that both have the same rings in their onion architecture, especially the kernel and shell rings. Therefore, programmers could write their own programs on top of the shell ring. Also, Unix and Multics both expose a set of utility programs, such as ls and pwd. In the following sections, we will explain the various rings found in the Unix architecture.
• Multics needed expensive resources and machines to be able to work. It was not possible to install it on ordinary commodity machines, and that was one of the main drawbacks that let Unix thrive and eventually made Multics obsolete after about 30 years.
• Multics was complex by design. This was the reason behind the frustration of Bell Labs employees and, as we said earlier, the reason why they left the project. Unix, by contrast, tried to remain simple. In the first version, it was not even multitasking or multi-user!

You can read more about Unix and Multics online and follow the events that happened in that era. Both were successful projects, but Unix has been able to thrive and survive to this day. It is also worth mentioning that Bell Labs later developed a distributed operating system called Plan 9, which is based on the Unix project.

Figure 1-1: Plan 9 from Bell Labs

Suffice it to say that Unix was a simplification of the ideas and innovations that Multics presented; it was not something new, and so I can stop talking about Unix and Multics history at this point.

So far, there are no traces of C in this history because it had not been invented yet. The first versions of Unix were written purely in assembly language. Only in 1973 was Unix version 4 written using C.

Now we are getting close to discussing C itself, but before that, we must talk about BCPL and B because they were the gateway to C.
About BCPL and B

BCPL was created by Martin Richards as a language for writing compilers. The people from Bell Labs were introduced to the language when they were working on the Multics project. After quitting the Multics project, Bell Labs first started to write Unix in assembly language. That's because, back then, it was unusual to develop an operating system in any programming language other than assembly. For instance, it was considered strange that the people on the Multics project were using PL/1 to develop Multics. By doing so, however, they showed that operating systems could be successfully written in a higher-level programming language. As a result, Multics became the main inspiration for using another language to develop Unix.

The idea of writing operating system modules in a programming language other than assembly stayed with Ken Thompson and Dennis Ritchie at Bell Labs. They tried to use BCPL, but it turned out that they needed to modify the language to be able to use it on minicomputers such as the DEC PDP-7. These changes led to the B programming language.

While we won't go too deep into the properties of the B language here, you can read more about it and the way it was developed in the following articles:

• The B Programming Language
• The Development of the C Language

Dennis Ritchie authored the latter article himself. It is a good account of the development of the C programming language that also shares valuable information about B and its characteristics.

B also had its shortcomings as a system programming language. B was typeless, which meant that it was only possible to work with a word (not a byte) in each operation. This made it hard to use the language on machines with a different word length.
Therefore, over time, further modifications were made to the language until they led to the NB (New B) language, which later evolved into C. Whereas B was typeless, C introduced types. Finally, in 1973, the fourth version of Unix could be developed using C, although it still contained a lot of assembly code.

In the next section, we talk about the differences between B and C, and why C is a top-notch modern system programming language for writing an operating system.

The way to C programming and Unix

I do not think we can find anyone better than Dennis Ritchie himself to explain why C was invented after the difficulties met with B. In this section, we're going to list the causes that prompted Dennis Ritchie, Ken Thompson and others to create a new programming language instead of using B to write Unix.

Limitations of the B programming language:

• B could only work with words in memory: Every single operation had to be performed in terms of words. Back then, having a programming language that was able to work with bytes was a dream. This was because of the hardware available at the time, which addressed memory in a word-based scheme.
• B was typeless: More accurately, B was a single-type language. All variables were of the same type: word. So, if you had a string with 20 characters (21 including the null character at the end), you had to divide it up by words and store it in more than one variable. For example, if a word was 4 bytes, you would need 6 variables to store the 21 characters of the string.
• Being typeless meant that many byte-oriented algorithms, such as string manipulation algorithms, could not be written efficiently in B: This was because B worked with memory words, not bytes, and words could not be used efficiently to manage data types of other sizes, such as characters and integers.
• B didn't support floating-point operations: At the time, these operations were becoming increasingly available on new hardware, but there was no support for them in the B language.
• With the availability of machines such as the PDP-11, which could address memory on a byte basis, B proved inefficient at addressing bytes of memory: This became even clearer with B pointers, which could only address words in memory, not bytes. In other words, for a program wanting to access a specific byte or byte range in memory, more computations had to be done to calculate the corresponding word index.

The difficulties with B, particularly its slow development and execution on the machines available at the time, forced Dennis Ritchie to develop a new language. This new language was called NB, or New B, at first, but it eventually became C. This newly developed language, C, tried to overcome the difficulties and flaws of B and became the de facto programming language for system development, replacing assembly language. In less than 10 years, newer versions of Unix were written completely in C, and all newer operating systems based on Unix became tied to C and its crucial presence in the system.

As you can see, C was not born as an ordinary programming language. Instead, it was designed with a complete set of requirements in mind. You may consider languages such as Java, Python and Ruby to be higher-level languages, but they cannot be considered direct competitors, as they are different and serve different purposes. For instance, you cannot write a device driver or a kernel module in Java or Python, and these languages themselves have been built on top of a layer written in C.

Unlike some programming languages, C is standardized by ISO, and if a certain feature is required in the future, the standard can be modified to support it.
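The word-versus-byte limitation of B can be made concrete with a little arithmetic. The following sketch is written in Python for consistency with the other code on this page; it is not B. It assumes 4-byte words and big-endian packing within each word, and words_needed, pack_into_words and load_byte are hypothetical helpers. The sketch reproduces the 21-bytes-into-6-words example above and shows the extra index computation needed to read a single byte when memory can only be fetched a word at a time.

```python
WORD_SIZE = 4  # assumption for illustration: 4-byte words, as in the example above


def words_needed(n_bytes, word_size=WORD_SIZE):
    """Number of whole words needed to hold n_bytes (ceiling division)."""
    return (n_bytes + word_size - 1) // word_size


def pack_into_words(data, word_size=WORD_SIZE):
    """Store a byte string as a list of word-sized integers (big-endian in each word)."""
    words = []
    for start in range(0, len(data), word_size):
        chunk = data[start:start + word_size].ljust(word_size, b"\0")  # pad the last word
        words.append(int.from_bytes(chunk, "big"))
    return words


def load_byte(words, byte_index, word_size=WORD_SIZE):
    """Read one byte when memory can only be accessed one whole word at a time."""
    word_index, offset = divmod(byte_index, word_size)  # which word, and where inside it
    shift = 8 * (word_size - 1 - offset)                # big-endian: offset 0 is the top byte
    return (words[word_index] >> shift) & 0xFF


text = b"Hello, Unix and C!!!\0"  # 20 characters plus the terminating null = 21 bytes
words = pack_into_words(text)
print(words_needed(len(text)), len(words))  # 6 6
print(chr(load_byte(words, 7)))             # U
```

The divmod call and the shift are the extra per-access work the excerpt refers to. In a byte-addressed language such as C, a char pointer reaches the same byte directly.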
To summarize

In this article, we began with the relationship between Unix and C. Even in non-Unix operating systems, you can see traces of a design similar to Unix systems. We also looked at the history of C and explained how Unix emerged from Multics OS and how C was derived from the B programming language.

The book Extreme C, written by Kamran Amini, will help you make the most of C's low-level control, flexibility and high performance.

Janu Verma
16 Apr 2015
7 min read

Introduction to Sklearn

This is an introductory post on scikit-learn, in which we will learn the basic terminology and functionality of this amazing Python package. We will also explore the basic principles of machine learning and how machine learning can be done with sklearn.

What is scikit-learn (sklearn)?

scikit-learn is a Python framework for machine learning. It has efficient implementations of various machine learning and data mining algorithms. It is easy to use and accessible to everybody: it is open source and released under a commercially usable BSD license. Data scientists love Python, and many data scientists in industry use this as their data science stack:

numpy + pandas + sklearn

Dependencies

• Python (>= 2.6)
• numpy (>= 1.6.1)
• scipy (>= 0.9)
• matplotlib (for some tasks)

Installation

• Mac: pip install -U numpy scipy scikit-learn
• Linux: sudo apt-get install build-essential python-dev python-setuptools python-numpy python-scipy libatlas-dev libatlas3gf-base (this installs the build tools and dependencies; then install scikit-learn itself with pip install -U scikit-learn)

After you have installed sklearn and all its dependencies, you are ready to dive further.

Input data

Most machine learning algorithms implemented in sklearn expect the input data in the form of a numpy array of shape [nSamples, nFeatures].

• nSamples is the number of samples in the data. Each sample is an observation or an instance of the data. A sample can be a text document, a picture, a row in a database or a CSV file: anything you can describe with a fixed set of quantitative traits.
• nFeatures is the number of features, or distinct traits, that describe each sample quantitatively. Features can be real-valued, boolean or discrete.

The data can be very high-dimensional, with hundreds of thousands of features, and it can be sparse, i.e., most of the feature values are zero.

Example

As an example, we will look at the Iris dataset, which comes with sklearn and every other ML package that I know of!
```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # feature array, shape (nSamples, nFeatures)
y = iris.target  # target values, one per sample
```

What are the numbers of samples and features in this dataset? Since the input data is a numpy array, we can access its shape as follows:

```python
nSamples = X.shape[0]   # 150
nFeatures = X.shape[1]  # 4
```

This dataset has 150 samples, and each sample has 4 features. Let's look at the names of the target output:

```python
iris.target_names
# array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
```

To get a better idea of the data, let's look at a sample:

```python
X[0]
# array([5.1, 3.5, 1.4, 0.2])
y[0]
# 0
```

The data is given as a numpy array of shape (150, 4), which consists of measurements of physical traits for three species of irises. The features are:

• sepal length in cm
• sepal width in cm
• petal length in cm
• petal width in cm

The target values {0, 1, 2} denote three species:

• Setosa
• Versicolour
• Virginica

Here is the basic idea of machine learning. The basic setting for a supervised machine learning model is as follows:

• We have a labeled training set, i.e., samples with known values of a target.
• We are given an unlabeled testing set, i.e., samples for which the target values are unknown.
• The goal is to build a model that trains on the labeled data to predict the output for the unlabeled data.

Supervised learning is further broken down into two categories: classification and regression. In classification, the target value is discrete; in regression, the target value is continuous.

There are various machine learning methods that can be used to build a supervised learning model, for example decision trees, k-nearest neighbors, SVMs, linear and logistic regression, random forests and more. I won't discuss these methods and their differences in this post. Instead, I will illustrate predictive modeling with sklearn using one classification model and one regression model.
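To evaluate a supervised model, we need a test set whose labels the model does not see during training. The following is a minimal pure-Python sketch of the idea behind randomly holding out part of the labeled data. The function split_train_test is hypothetical: it is not part of sklearn, and sklearn's own train_test_split handles more input types and options. The sketch shuffles the sample indices with an optional seed, then cuts off test_size of them as the test set.

```python
import random


def split_train_test(samples, targets, test_size=0.4, seed=None):
    """Randomly partition labeled data into a training set and a held-out test set."""
    if len(samples) != len(targets):
        raise ValueError("samples and targets must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # random assignment to train/test
    n_test = int(round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [targets[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [targets[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Usage with 150 stand-in samples (the size of the iris dataset)
samples = list(range(150))
targets = [i % 3 for i in samples]
X_train, X_test, y_train, y_test = split_train_test(samples, targets, test_size=0.4, seed=0)
print(len(X_train), len(X_test))  # 90 60
```

Passing a fixed seed makes the split reproducible. Leaving it as None gives a different split on each run, which is why accuracy figures measured on a random split can vary from run to run.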
Iris Example continued (Classification)

We saw that the data is a numpy array of shape (150, 4) consisting of measurements of physical traits for three iris species.

Goal

The task is to build a machine learning model that predicts the species of a sample from the values of its features. We will split the iris dataset into a training set and a test set. The model will be built on the training set and evaluated on the test set. Before we do that, let's look at the general outline of a machine learning model in sklearn.

Outline of sklearn models

The basic outline of a sklearn model is given by the following pseudocode:

```
# Pseudocode: labeled_data, unlabeled_data and ClassImplementingTheAlgorithm are placeholders
X_train = labeled_data.features
Y_train = labeled_data.target
algorithm = ClassImplementingTheAlgorithm(parameters_of_the_algorithm)
algorithm.fit(X_train, Y_train)          # estimate the model's parameters
X_test = unlabeled_data.features
prediction = algorithm.predict(X_test)   # estimated targets for the test samples
```

Here, as before, the labeled training data is in the form of numpy arrays, with X_train as the array of feature values and Y_train as the corresponding target values. In sklearn, different machine learning algorithms are implemented as classes, and we choose the class corresponding to the algorithm we want to use. Each class has a method called fit, which fits the model to the training data to estimate the parameters of the algorithm. With these estimated parameters, the predict method then computes the estimated target value for each test example.

sklearn model on iris data

Following the general outline of the sklearn model, we will now build a model on the iris data to predict the species.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation module
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data
Y = iris.target

# Randomly hold out 40% of the samples as the test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4)

algorithm = KNeighborsClassifier(n_neighbors=5)
algorithm.fit(X_train, Y_train)
prediction = algorithm.predict(X_test)
```

The iris dataset is split into a training set and a test set using sklearn's train_test_split function: 60% of the iris data forms the training set and the remaining 40% forms the test set. train_test_split picks the training and test examples randomly. We used the k-nearest neighbors algorithm to build this model. There is no reason for choosing this method other than simplicity. For each test case, the sklearn model predicts a label from {0, 1, 2}.

Let's check how well this model performed:

```python
from sklearn.metrics import accuracy_score

accuracy_score(Y_test, prediction)
# e.g. 0.97 (the exact value depends on the random split)
```

Regression

We will discuss the simplest example: fitting a line through the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Create some simple data: y = 3x + 2 plus Gaussian noise
np.random.seed(0)
X = np.random.random(size=(20, 1))
y = 3 * X.squeeze() + 2 + np.random.normal(size=20)

# Fit a linear regression to it
model = LinearRegression(fit_intercept=True)
model.fit(X, y)
print("Model coefficient: %.5f, and intercept: %.5f" % (model.coef_[0], model.intercept_))
# Model coefficient: 3.93491, and intercept: 1.46229

# Model prediction on 100 evenly spaced points in [0, 1]
X_test = np.linspace(0, 1, 100)[:, np.newaxis]
y_test = model.predict(X_test)
```

This gives us the predicted values of the target, which are continuous.

We built simple models based on sklearn's implementations of the k-nearest neighbors algorithm and linear regression. You can try other models: the Python code will be the same for most of the methods in sklearn, except for the name of the algorithm's class.
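To make the fit/predict pattern and the "only the class name changes" point concrete, here is a pure-Python sketch of two hypothetical estimator classes, ToyKNNClassifier and ToyLineRegression. They are not part of sklearn and are far slower than its implementations, and the sketch assumes Python 3.8+ for math.dist. The KNN class "fits" by memorising the training data and predicts by majority vote among the k closest training samples. The line fit uses the closed-form least-squares estimates: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄.

```python
import math
from collections import Counter


class ToyKNNClassifier:
    """Hypothetical k-nearest-neighbors classifier with an sklearn-style fit/predict API."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # KNN has no parameters to estimate: "fitting" just stores the training data
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Order training samples by Euclidean distance to this row
            order = sorted(range(len(self.X_)), key=lambda i: math.dist(row, self.X_[i]))
            votes = Counter(self.y_[i] for i in order[:self.n_neighbors])
            predictions.append(votes.most_common(1)[0][0])  # majority label
        return predictions


class ToyLineRegression:
    """Hypothetical single-feature least-squares line fit with the same API."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        x_mean, y_mean = sum(xs) / len(xs), sum(y) / len(y)
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(xs, y))
        sxx = sum((x - x_mean) ** 2 for x in xs)
        self.coef_ = sxy / sxx                          # slope
        self.intercept_ = y_mean - self.coef_ * x_mean  # line passes through the means
        return self

    def predict(self, X):
        return [self.coef_ * row[0] + self.intercept_ for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


clf = ToyKNNClassifier(n_neighbors=3).fit([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]],
                                          [0, 0, 0, 1, 1, 1])
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # [0, 1]
```

Because both classes expose fit and predict, code that calls model.fit(X_train, y_train) and then model.predict(X_test) works unchanged with either one. This shared interface is what lets sklearn users swap algorithms by changing only the class name.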
Discover more Machine Learning content and tutorials on our dedicated Machine Learning page.

About the Author: Janu Verma is a Quantitative Researcher at the Buckler Lab, Cornell University, where he works on problems in bioinformatics and genomics. His background is in mathematics and machine learning, and he leverages tools from these areas to answer questions in biology. He holds a Master's in Theoretical Physics from the University of Cambridge in the UK, and he dropped out of the mathematics PhD program at Kansas State University after 3 years. He has held research positions at the Indian Statistical Institute – Delhi, the Tata Institute of Fundamental Research – Mumbai and the JN Center for Advanced Scientific Research – Bangalore. He is a voracious reader and an avid traveler. He hangs out at local coffee shops, which serve as his office away from the office. He writes about data science, machine learning and mathematics at Random Inferences.

Guest Contributor
05 Oct 2018
6 min read

6 common use cases of Reverse Proxy scenarios

Proxy servers are used as intermediaries between a client and a website or online service. By routing traffic through a proxy server, users can disguise their geographic location and their IP address. This makes proxies a popular tool for individuals who want to stay hidden online, but they are also widely used in enterprise settings, where they can improve security, allow tasks to be carried out anonymously, and control the way employees use the internet. Reverse proxies, in particular, can be configured to provide a greater level of control and abstraction, keeping the flow of traffic between clients and servers smooth.

What is a Reverse Proxy?
A reverse proxy server is a type of proxy server that usually sits behind the firewall of a private network and directs client requests to the appropriate backend server. Reverse proxies are also used to cache common content and to compress inbound and outbound data, resulting in a faster and smoother flow of traffic between clients and servers. A reverse proxy can also handle other tasks, such as SSL encryption, further reducing the load on web servers.

There are many scenarios in which a reverse proxy can make all the difference to the speed and security of your corporate network. Because it gives you a point at which you can inspect traffic, route it to the appropriate server, or even transform the request, a reverse proxy can be used to achieve a variety of goals.

Load Balancing to route incoming HTTP requests
This is probably the most familiar use of reverse proxies for many users. Load balancing involves configuring the proxy server to route incoming HTTP requests to a set of identical servers. By spreading incoming requests across these servers, the reverse proxy balances the load so that no single server carries much more than its share.
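The simplest load-balancing policy, round-robin, hands each new request to the next server in a fixed rotation. The following Python sketch shows just that selection step, assuming a hypothetical list of backend addresses; production reverse proxies such as HAProxy or NGINX also offer other policies (for example, least connections) plus health checks, none of which this sketch attempts.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical helper: returns backend addresses in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        # itertools.cycle repeats the list endlessly: a, b, c, a, b, c, ...
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self):
        return next(self._cycle)


# Illustrative addresses only
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
assigned = [balancer.next_backend() for _ in range(6)]
print(assigned)
```

With three identical servers, six consecutive requests land on each server exactly twice, which is the "balanced" outcome the paragraph above describes.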
The most common scenario for load balancing is a website that requires multiple servers because the volume of requests is too much for one server to handle efficiently. Balancing the load across multiple servers also lets you move away from an architecture with a single point of failure. Usually, the servers all host the same content, but in some situations the reverse proxy retrieves specific information from one of a number of different servers.

Provide security by monitoring and logging traffic
By acting as the mediator between clients and your system's backend, a reverse proxy server can hide the overall structure of your backend servers: it captures requests that would otherwise go directly to those servers and handles them securely. A reverse proxy also gives businesses a point at which they can monitor and log traffic flowing through their network.

A common way to bolster network security with a reverse proxy is to use it as an SSL gateway. This allows you to communicate using plain HTTP behind the firewall without compromising security, and it saves you from configuring security individually for each server behind the firewall.

A rotating residential proxy, also known as a backconnect proxy, is a type of proxy that frequently changes the IP addresses and connections the user presents. This allows users to hide their identity and generate a large number of requests without setting off alarms. A reverse rotating residential proxy can also be used to improve the security of a corporate network or website, because the servers in question expose only the proxy server's information while keeping their own hidden from potential attackers.
No need to install certificates on your backend servers with SSL Termination
SSL termination is the point at which an SSL connection ends and traffic switches from encrypted to unencrypted. By using a reverse proxy to handle incoming HTTPS connections, you can have the proxy server decrypt each request and then pass the unencrypted request on to the appropriate server. This approach offers practical benefits: it eliminates the need to install certificates on your backend servers, and it gives you a single configuration point for managing SSL/TLS. Because your web servers no longer have to decrypt traffic, their processing load is also reduced.

Serve static content on behalf of backend servers
Some reverse proxy servers can also be configured to act as web servers. Websites contain a mixture of dynamic content, which changes over time, and static content, which stays the same. If your reverse proxy serves static content on behalf of the backend servers, it can greatly reduce their load, freeing up capacity for rendering dynamic content. Alternatively, a reverse proxy can be configured to behave like a cache, storing and serving frequently requested content and thereby further reducing the load on backend servers.

URL Rewriting before requests go on to the backend servers
Anything a business can do to easily improve its SEO score is worth considering: without investment in SEO, your business or website will remain invisible to search engine users. URL rewriting lets you compensate for legacy systems that produce URLs that are less than ideal for SEO. With a reverse proxy server, URLs can be automatically reformatted before requests are passed on to the backend servers.
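As a rough sketch of the rewriting step, the Python function below maps clean, search-friendly public paths onto the query-string URLs a legacy backend might expect, and passes anything else through unchanged. The rules, paths and the rewrite() helper are all hypothetical; real proxies express such rules in their own configuration languages (for example, NGINX's rewrite directive).

```python
import re

# Hypothetical rules: public, search-friendly path -> path the legacy backend expects
REWRITE_RULES = [
    (re.compile(r"/products/(?P<slug>[a-z0-9-]+)/?"),
     r"/catalog/item.php?slug=\g<slug>"),
    (re.compile(r"/blog/(?P<year>\d{4})/(?P<slug>[a-z0-9-]+)/?"),
     r"/cms/index.php?y=\g<year>&p=\g<slug>"),
]


def rewrite(path: str) -> str:
    """Return the backend path for a public path; unmatched paths pass through."""
    for pattern, template in REWRITE_RULES:
        match = pattern.fullmatch(path)   # the whole path must match the rule
        if match:
            # expand() substitutes the named groups into the template
            return match.expand(template)
    return path


print(rewrite("/products/blue-widget"))
print(rewrite("/blog/2018/reverse-proxies/"))
print(rewrite("/about"))
```

Search engines and visitors only ever see the clean paths; the translation to the legacy format happens inside the proxy before the request is forwarded.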
Combine Different Websites into a Single URL Space
It is often desirable for a business to adopt a distributed architecture in which different functions are handled by different components. With a reverse proxy, it is easy to route different paths under a single URL to a multitude of components. To anyone using your URL, it simply appears that they are moving to another page on the website, when in fact each page might be served by a completely different backend service. This approach is widely used for web service APIs.

To sum up, the primary function of a reverse proxy is load balancing, ensuring that no individual backend server receives more traffic or requests than it can handle. However, there are a number of other scenarios in which a reverse proxy can offer substantial benefits.

About the author: Harold Kilpatrick is a cybersecurity consultant and a freelance blogger. He's currently working on a cybersecurity campaign to raise awareness of the threats that businesses can face online.

Read Next: HAProxy introduces stick tables for server persistence, threat detection, and collecting metrics; How to Configure Squid Proxy Server; Acting as a proxy (HttpProxyModule)

Natasha Mathur
04 Jul 2018
7 min read

8 ways to improve your data visualizations

In Dr. W. Edwards Deming's words, “In God we trust, all others must bring data.” Organizations worldwide revolve around data like planets revolve around the sun. Because data is so central to organizations, they rely on data visualization tools to understand it and make better business decisions. Organizations are generating and collecting more data than ever before, so how do you make sense of all of it?

Humans are visual creatures, and the human brain processes visual information far better than text. In fact, according to research done at the University of Minnesota back in 1986, presentations that use visual aids such as colors, shapes and images are far more persuasive. Data visualization is a process that translates collected information into engaging visuals. It's easy and cheap, and it doesn't require design expertise. However, some professionals feel that data visualization is just a matter of slapping on charts and graphs, which is not the case: data visualization is about conveying the right information in a way that enhances the audience's experience. So, if you want your graphs and charts to be more succinct and understandable, here are eight ways to improve your data visualization process:

1. Get rid of unneeded information
Less is sometimes more, and the same goes for data visualization. Excessive colors, jargon, pie charts and metrics take focus away from the important information. For instance, when using colors, don't turn your charts and graphs into a rainbow; instead, use a specific set of colors with a clear purpose and meaning. Do you see the difference color and chart type make to the visualizations in the images below? Source: Podio

Similarly, when it comes to expressing your data, note how people interact at your workplace. Keep the tone of your visuals as natural as possible to make it easy for the audience to interpret your data.
For metrics, show only the ones that truly add value to your storytelling, and filter out the less important ones to reduce clutter. Tread cautiously with pie charts, as they can sometimes be difficult to understand, and remove any chart elements that cause unnecessary confusion. Source: Dashboard Zone

2. Use conditional formatting for tabular data
Data visualization doesn't require fancy tools or designs. Take your standard Excel table, for example: do you want to point out patterns or outliers in your data? Conditional formatting is a great tool for people working with data. You define simple rules for a given data set, and the cells that match those rules are highlighted, so the data that matters most to you stands out and the key information is easy to track. Conditional formatting can be used for different things; for example, it can help spot duplicate data in your table. Using the built-in conditional formatting, you set bounds for the data, and the cells are then formatted based on those bounds. For instance, if a sales quota attainment above 65% is good, between 55% and 65% is average, and below 55% is poor, conditional formatting lets you quickly see who is meeting the expected sales quota and who is not.

3. Add trendlines to unearth patterns for prediction
Another feature that can amp up your data visualization is the trendline. Trendlines describe the relationship between two variables in your existing data, and they are also useful for predicting future values. Trendlines are simple to add and help discover trends in a given data set. Source: Interworks

Trendlines can also show data trends or moving averages in your charts. Depending on the kind of data you're working with, there are a number of trendline types you can use in your visualizations. Questions such as whether a new strategy seems to be working in the organization's favor can be answered with the help of trendlines.
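As a rough illustration of what a linear trendline computes, the Python sketch below fits a least-squares straight line with NumPy's polyfit and extends it two periods ahead. The monthly sales figures are invented for illustration; charting tools such as Excel or Tableau also offer other trendline models (e.g. exponential or polynomial), which this sketch does not cover.

```python
import numpy as np

# Invented monthly sales figures, for illustration only
months = np.arange(1, 7)   # months 1..6
sales = np.array([100, 108, 115, 121, 130, 138], dtype=float)

# Fit a straight line (degree-1 polynomial) by least squares
slope, intercept = np.polyfit(months, sales, 1)

# Extend the trendline to forecast the next two months
future = np.array([7, 8])
forecast = slope * future + intercept
print(f"trend: {slope:.2f} per month, forecast: {forecast.round(1)}")
```

The slope is the "trend" a chart's trendline visualizes; extending the same line past the last data point is the simple form of prediction the tip refers to.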
The insight a trendline provides, in turn, helps predict future outcomes. Trendlines use statistical models to make these predictions. Once you add trendlines to a view, it's up to you to decide how you want them to look and behave.

4. Implement filter by rule to get more specific
Filters display just the information you need, and filter by rule lets you add a filter option to your data set. Organizations produce huge amounts of data on a regular basis. Suppose you want to know which employees within your organization are consistent performers. Instead of creating a visualization that includes all employees and their performance, you can filter it down so that it shows only the employees who consistently do well. Similarly, if you want to find out on which days sales went up or down, you can filter the data to show results for only the past week or month, depending on your preference.

5. For complex or dense data representation, add hierarchy
Hierarchies eliminate the need to create extra visualizations. You can view data from a high level and dig deeper into the specifics as questions arise. Adding a hierarchy lets you combine multiple levels of information in one visualization. Source: dzone

For instance, suppose you create a hierarchy that shows the total sales achieved by different sales representatives within an organization in the past month. You can break this down further by selecting a particular sales rep, and then go even further by selecting a specific product assigned to that rep. This cuts down on a lot of extra work.

6. Make visuals more appealing by formatting data
Data formatting takes only a few seconds, but it can make a huge difference to how the audience interprets your data. Source: dzone

Formatting makes the numbers more visually appealing and easier for the audience to read. It can be applied to charts such as bar charts and column charts.
Formatting data to show a certain number of decimals, comma separators, a number font, currency or percentages can make your visualizations more engaging.

7. Include comparison for more insight
Comparisons give readers a better perspective on the data; adding them to your charts can both improve your visualizations and add insight. For instance, if you want to inform your audience about the organization's growth in the current year as well as the past year, you can include that comparison within the visualization. You can also use a comparison chart to compare two data points, such as budgeted versus actual spending.

8. Sort data to improve readability
Sorting is a great way to make things easy for the audience when dealing with huge quantities of data. For instance, if you want to include information about the highest- and lowest-performing products, you can sort your data. Sorting can be done in the following ways:
Ascending - Sorts the data from lowest to highest.
Descending - Sorts the data from highest to lowest.
Data source order - Sorts the data in the order it appears in the data source.
Alphabetic - Sorts the data alphabetically.
Manual - Lets you sort the data manually in the order you prefer.

Effective data visualization helps people see information in data that they could not see before, change their minds and take action. These were some of the tricks and features to take your data visualization game to the next level. There are many data visualization tools available on the market; Tableau and Microsoft Power BI are among the top ones, offering great features for data visualization. Now that we've covered some of the best practices for data visualization, it's your turn to put these tips into practice and create some strong visual data stories. Do you have any DataViz tips to share with our readers? Please add them in the comments below.
Getting started with Data Visualization in Tableau What is Seaborn and why should you use it for data visualization? “Tableau is the most powerful and secure end-to-end analytics platform”: An interview with Joshua Milligan  
Fatema Patrawala
21 Nov 2019
11 min read

What does a data science team look like?

Until a couple of years ago, people barely knew the term 'data science', which has since evolved into an extremely popular career field. The Harvard Business Review dubbed the data scientist the sexiest job of the 21st century, and expert professionals jumped on the 'data is the new oil' bandwagon. According to the Figure Eight Report 2018, which takes the pulse of the data science community in the US, a lot has changed rapidly in the data science field over the years. For the 2018 report, they surveyed approximately 240 data scientists and found that machine learning projects have multiplied and that more and more data is required to power them. Data science and machine learning jobs are LinkedIn's fastest-growing jobs, and the internet creates 2.5 quintillion bytes of data to process and analyze each day. With all these changes, it is no surprise that data science teams are evolving across organizations.

The data science team is responsible for delivering complex projects in which system analysis, software engineering, data engineering, and data science are combined to deliver the final solution. To achieve this, the team includes not only a data scientist or a data analyst but also other roles such as business analyst, data engineer or architect, and chief data officer. In this post, we will differentiate and discuss the various job roles within a data science team, the skill sets each requires, and the compensation for each. For an in-depth understanding of data science teams, read the book Managing Data Science by Kirill Dubovikov, which has interesting case studies on building successful data science teams. He also explores how the team can efficiently manage data science projects through the use of DevOps and ModelOps.
Now let's look at the individual data science roles and functions, but first, let's look at the structure of the team. There are three basic team structures, matching different stages of AI/ML adoption:

IT-centric team structure
Sometimes hiring a data science team is not an option for a company, and it has to leverage in-house talent. In such situations, companies take advantage of a fully functional in-house IT department. The IT team manages functions such as data preparation, model training, creating user interfaces, and model deployment within the corporate IT infrastructure. This approach is fairly limited, but MLaaS solutions make it practical. Environments such as Microsoft Azure or Amazon Web Services (AWS) provide approachable user interfaces to clean datasets, train models, evaluate them, and deploy them. Microsoft Azure, for instance, supports its users with detailed documentation that lowers the entry threshold. The documentation helps with fast training and early deployment of models, even without expert data scientists on board.

Integrated team structure
Within the integrated structure, companies have a data science team that focuses on dataset preparation and model training, while IT specialists take charge of the interfaces and infrastructure for model deployment. Combining machine learning expertise with IT resources is the most viable option for continuous and scalable machine learning operations. Unlike the IT-centric approach, the integrated method requires an experienced data scientist within the team. This approach ensures better operational flexibility in terms of available techniques. Additionally, the team can leverage a deeper understanding of machine learning tools and libraries, such as TensorFlow or Theano, which are aimed at researchers and data science experts.
Specialized data science team
Companies can also have an independent data science department to build all-encompassing machine learning applications and frameworks. This approach entails the highest cost. All operations, from data cleaning and model training to building front-end interfaces, are handled by a dedicated data science team. This doesn't necessarily mean that all team members need a data science background, but they should have a technology background and certain service management skills. A specialized structure helps address complex data science tasks that include research, the use of multiple ML models tailored to various aspects of decision-making, or multiple ML-backed services. Today's most successful Silicon Valley tech companies operate with specialized data science teams, which are custom-built and wired for specific tasks to achieve different business goals. The team structure at Airbnb, for example, is one of the most interesting cases. In this talk, Martin Daniel, a data scientist at Airbnb, explains how the team emphasizes an experimentation-centric culture and applies machine learning rigorously to address unique product challenges.

Job roles and responsibilities within data science team
As discussed earlier, there are many roles within a data science team. According to Michael Hochster, Director of Data Science at Stitch Fix, there are two types of data scientists: Type A and Type B. Type A stands for analysis. Type A data scientists are statisticians who make sense of data without necessarily having strong programming knowledge; they perform data cleaning, forecasting, modeling, visualization, etc. Type B stands for building. These individuals use data in production: they're good software engineers with strong programming knowledge and a statistics background, and they build recommendation systems, personalization use cases, etc. It is rare, though, for one expert to fit neatly into a single category.
Understanding these data science functions, however, helps make sense of the roles described below.

Chief data officer/Chief analytics officer
The chief data officer (CDO) role has been taking organizations by storm. NewVantage Partners' Big Data Executive Survey 2018 found that 62.5% of Fortune 1000 business and technology decision-makers said their organization had appointed a chief data officer. The chief data officer oversees a range of data-related functions that may include data management, ensuring data quality and creating a data strategy. He or she may also be responsible for data analytics and business intelligence, the process of drawing valuable insights from data. Even though chief data officer and chief analytics officer (CAO) are two distinct roles, they are often handled by the same person. Expert professionals and leaders in analytics also own the data strategy and decide how a company should treat its data. This makes sense, as analytics is what extracts insight and value from the data. Hence, with a CDO+CAO combination, companies can benefit from a good data strategy and proper data management without sacrificing quality. According to compensation analysis from PayScale, the median chief data officer salary is $177,405 per year, including bonuses and profit share, ranging from $118,427 to $313,791 annually.
Skill sets required: Data science and analytics, programming skills, domain expertise, and leadership and visionary abilities.

Data analyst
The data analyst role covers data collection and interpretation. The person in this role ensures that collected data is relevant and exhaustive, and interprets the results of the data analysis. Some companies also require data analysts to have visualization skills to turn alienating numbers into tangible insights through graphics. According to Indeed, the average salary for a data analyst in the United States is $68,195 per year.
Skill sets required: Programming languages such as R, Python, JavaScript, C/C++ and SQL. Critical thinking, data visualization and presentation skills are also good to have.

Data scientist
Data scientists are data experts who have the technical skills to solve complex problems and the curiosity to explore which problems need to be solved. A data scientist develops machine learning models to make predictions and is well versed in algorithm development and computer science. This person also knows the complete lifecycle of model development. A data scientist requires large amounts of data to develop hypotheses, make inferences, and analyze customer and market trends. Basic responsibilities include gathering and analyzing data and using various types of analytics and reporting tools to detect patterns, trends and relationships in data sets. According to Glassdoor, the current U.S. average salary for a data scientist is $118,709.
Skill sets required: A data scientist requires knowledge of big data platforms and tools such as Seahorse powered by Apache Spark, JupyterLab, TensorFlow and MapReduce; programming languages including SQL, Python, Scala and Perl; and statistical computing languages such as R. They should also have cloud computing skills and knowledge of cloud platforms such as AWS and Microsoft Azure. You can also read this post on how to ace a data science interview to learn more.

Machine learning engineer
Data scientists are sometimes confused with machine learning engineers, but machine learning engineer is a distinct role with different responsibilities. A machine learning engineer combines software engineering and machine learning modeling skills. This person determines which model to use and what data should be used for each model. Probability and statistics are also their forte.
Everything that goes into training, monitoring, and maintaining a model is the ML engineer's job. The average machine learning engineer salary in the US is $146,085, and the role is ranked No. 1 on Indeed's Best Jobs in 2019 list.
Skill sets required: Machine learning engineers need expertise in computer science and programming languages such as R, Python, Scala and Java. They also need knowledge of probability, data modeling and model evaluation techniques.

Data architects and data engineers
Data architects and data engineers work in tandem to conceptualize, visualize, and build an enterprise data management framework. The data architect visualizes the complete framework to create a blueprint, which the data engineer uses to build the digital framework. The data engineering role has recently evolved from the traditional software engineering field. Recent enterprise data management experiments indicate that data-focused software engineers are needed to work alongside data architects to build a strong data architecture. According to a recent LinkedIn survey, the average salary for a data architect in the US ranges from $122,000 to $129,000 annually.
Skill sets required: A data architect or engineer should have a keen interest and experience in programming languages and frameworks such as HTML5, RESTful services, Spark, Python, Hive, Kafka, and CSS. They should have the knowledge and experience to handle database and data processing technologies such as PostgreSQL, MapReduce and MongoDB, and visualization platforms such as Tableau and Spotfire.

Business analyst
A business analyst (BA) essentially handles the chief analytics officer's role, but at the operational level. This means converting business expectations into data analysis. If your core data scientist lacks domain expertise, a business analyst can bridge the gap.
BAs are responsible for using data analytics to assess processes, determine requirements and deliver data-driven recommendations and reports to executives and stakeholders. They engage with business leaders and users to understand how data-driven changes will be implemented in processes, products, services, software and hardware. They then articulate these ideas and balance them against what is technologically feasible and financially reasonable. According to Indeed, the average salary for a business analyst in the United States is $75,078 per year.
Skill sets required: Excellent domain and industry expertise. Good communication and data visualization skills, as well as knowledge of business intelligence tools, are also good to have.

Data visualization engineer
This role is not present in every data science team, as some of its responsibilities are covered by either a data analyst or a data architect; it is mainly needed in the specialized data science team model. A data visualization engineer needs a solid understanding of UI development to create custom data visualization elements for stakeholders. Regardless of the technology, successful data visualization engineers have to understand the principles of design, both graphical design and, more generally, user-centered design. According to PayScale, the average salary for a data visualization engineer is $98,264.
Skill sets required: A data visualization engineer needs rigorous knowledge of data visualization methods and must be able to produce various charts and graphs to represent data. They must also understand the fundamentals of design principles and the visual display of information.

To sum it up, the data science team has evolved to create a number of job roles and opportunities, but companies still face challenges in building a team from scratch and find it hard to figure out where to start.
If you are facing a similar dilemma, check out the book Managing Data Science by Kirill Dubovikov. It covers concepts and methodologies to manage and deliver top-notch data science solutions, and it provides guidance on hiring, growing and sustaining a successful data science team.

How to learn data science: from data mining to machine learning; How to ace a data science interview; Data science vs. machine learning: understanding the difference and what it means today; 30 common data science terms explained; 9 Data Science Myths Debunked

Pavan Ramchandani
27 Jul 2018
12 min read

Common problems in Delphi parallel programming

This tutorial explains how to find performance bottlenecks and apply the correct algorithm to fix them when working with Delphi. It also shows you how to improve your algorithms before taking you through parallel programming. The article is an excerpt from the book Delphi High Performance by Primož Gabrijelčič.

Never access UI from a background thread
Let's start with the biggest source of hidden problems: manipulating a user interface from a background thread. This is, surprisingly, quite a common problem, even though all Delphi resources on multithreaded programming will simply tell you never to do it. Still, the advice doesn't seem to reach some programmers, who will always try to find an excuse to manipulate a user interface from a background thread. Indeed, there may be situations where VCL or FireMonkey can be manipulated from a background thread, but you'll be treading on thin ice if you do that. Even if your code works with the current Delphi, nobody can guarantee that changes to the graphical libraries introduced in future Delphi versions won't break it. It is always best to cleanly decouple background processing from the user interface.

Let's look at an example that nicely demonstrates the problem. The ParallelPaint demo has a simple form with eight TPaintBox components and eight threads. Each thread runs the same drawing code and draws a pattern into its own TPaintBox. As every thread accesses only its own Canvas, and no other user interface components, a naive programmer would assume that drawing into paint boxes directly from background threads would not cause problems. A naive programmer would be very much mistaken. If you run the program, you will notice that although the code paints constantly into some of the paint boxes, others stop being updated after some time. You may even get a "Canvas does not allow drawing" exception. It is impossible to tell in advance which threads will continue painting and which will not.
The following image shows an example of the ParallelPaint output. The first two paint boxes in the first row, and the last one in the last row, were no longer being updated when I grabbed the image.

The lines are drawn in the DrawLine method. It does nothing special; it just sets the color for the line and draws it. Still, that is enough to break the user interface when it is called from multiple threads at once, even though each thread uses its own Canvas:

    procedure TfrmParallelPaint.DrawLine(canvas: TCanvas; p1, p2: TPoint;
      color: TColor);
    begin
      canvas.Pen.Color := color;
      canvas.MoveTo(p1.X, p1.Y);
      canvas.LineTo(p2.X, p2.Y);
    end;

Is there a way around this problem? Indeed there is. Delphi's TThread class implements a method, Queue, which executes some code in the main thread. Queue takes a procedure or anonymous method as a parameter and sends it to the main thread. After a short time, the code is executed in the main thread. It is impossible to tell how much time will pass before that happens, but the delay will typically be very short, on the order of milliseconds. As Queue accepts an anonymous method, we can use the magic of variable capturing to write the corrected code, as shown here:

    procedure TfrmParallelPaint.QueueDrawLine(canvas: TCanvas; p1, p2: TPoint;
      color: TColor);
    begin
      TThread.Queue(nil,
        procedure
        begin
          canvas.Pen.Color := color;
          canvas.MoveTo(p1.X, p1.Y);
          canvas.LineTo(p2.X, p2.Y);
        end);
    end;

In older Delphi versions you don't have such a nice Queue method, only a version of Synchronize that accepts a normal method. If you have to use that method, you cannot count on the anonymous method mechanism to handle parameters. Instead, you have to copy them to fields and then Synchronize a parameterless method operating on these fields.
The following code fragment shows how to do that:

    procedure TfrmParallelPaint.SynchronizedDraw;
    begin
      FCanvas.Pen.Color := FColor;
      FCanvas.MoveTo(FP1.X, FP1.Y);
      FCanvas.LineTo(FP2.X, FP2.Y);
    end;

    procedure TfrmParallelPaint.SyncDrawLine(canvas: TCanvas; p1, p2: TPoint;
      color: TColor);
    begin
      // Copy the parameters into form fields, then run the parameterless
      // SynchronizedDraw in the main thread.
      FCanvas := canvas;
      FP1 := p1;
      FP2 := p2;
      FColor := color;
      TThread.Synchronize(nil, SynchronizedDraw);
    end;

Keep in mind that FCanvas, FP1, FP2, and FColor are shared by all threads, so this pattern is only safe if one background thread at a time calls SyncDrawLine; otherwise, another thread can overwrite the fields before the main thread runs SynchronizedDraw.

If you run the corrected program, the final result should always be similar to the following image, with all eight TPaintBox components showing a nicely animated image.

Simultaneous reading and writing

The next situation I regularly see when looking at badly written parallel code is simultaneous reading and writing from/to a shared data structure, such as a list. The SharedList program demonstrates how things can go wrong when you share a data structure between threads. Actually, scrap that, it shows how things will go wrong if you do that.

This program creates a shared list, FList: TList<Integer>. Then it creates one background thread that runs the ListWriter method and multiple background threads, each running the ListReader method. Indeed, you can run the same code in multiple threads. This is perfectly normal behavior and is sometimes extremely useful.

The ListReader method is incredibly simple. It just reads all the elements in a list and does that over and over again. As I've mentioned before, the code in my examples makes sure that problems in multithreaded code really do occur, but because of that, my demo code most of the time also looks terribly stupid. In this case, the reader just reads and reads the data because that's the best way to expose the problem:

    procedure TfrmSharedList.ListReader;
    var
      i, j, a: Integer;
    begin
      for i := 1 to CNumReads do
        for j := 0 to FList.Count - 1 do
          a := FList[j];
    end;

The ListWriter method is a bit different. It also loops around, but it also sleeps a little inside each loop iteration.
After the Sleep, the code either adds to the list or deletes from it. Again, this is designed so that the problem appears quickly:

    procedure TfrmSharedList.ListWriter;
    var
      i: Integer;
    begin
      for i := 1 to CNumWrites do
      begin
        Sleep(1);
        if FList.Count > 10 then
          FList.Delete(Random(10))
        else
          FList.Add(Random(100));
      end;
    end;

If you start the program in a debugger and click the Shared lists button, you'll quickly get an EArgumentOutOfRangeException. A look at the stack trace will show that it is raised in the line a := FList[j];.

In retrospect, this is quite obvious. The code in ListReader starts the inner for loop and reads FList.Count; a Delphi for loop evaluates its bounds only once, when the loop starts. At that time, FList has 11 elements, so Count is 11. At the end of the loop, the code tries to read FList[10], but in the meantime ListWriter has deleted one element and the list now has only 10 elements. Accessing element [10] therefore raises an exception.

We'll return to this topic later, in the section about Locking. For now, just keep in mind that sharing data structures between threads causes problems.

Sharing a variable

OK, so rule number two is "Shared structures bad". What about sharing a simple variable? Nothing can go wrong there, right? Wrong! There are actually multiple ways something can go wrong. The program IncDec demonstrates one of the bad things that can happen. The code contains two methods: IncValue and DecValue.
The former increments a shared FValue: integer a fixed number of times, and the latter decrements it the same number of times:

    procedure TfrmIncDec.IncValue;
    var
      i: integer;
      value: integer;
    begin
      for i := 1 to CNumRepeat do
      begin
        value := FValue;
        FValue := value + 1;
      end;
    end;

    procedure TfrmIncDec.DecValue;
    var
      i: integer;
      value: integer;
    begin
      for i := 1 to CNumRepeat do
      begin
        value := FValue;
        FValue := value - 1;
      end;
    end;

A click on the Inc/Dec button sets the shared value to 0, runs IncValue, then DecValue, and logs the result:

    procedure TfrmIncDec.btnIncDec1Click(Sender: TObject);
    begin
      FValue := 0;
      IncValue;
      DecValue;
      LogValue;
    end;

I know you can all tell what FValue will hold at the end of this program: zero, of course. But what will happen if we run IncValue and DecValue in parallel? That is, actually, hard to predict!

A click on the Multithreaded button does almost the same, except that it runs IncValue and DecValue in parallel. How exactly that is done is not important at the moment (but feel free to peek into the code if you're interested):

    procedure TfrmIncDec.btnIncDec2Click(Sender: TObject);
    begin
      FValue := 0;
      RunInParallel(IncValue, DecValue);
      LogValue;
    end;

Running this version of the code may still sometimes put zero in FValue, but that will be extremely rare; you most probably won't see that result unless you are very lucky. Most of the time, you'll just get a seemingly random number in the range -10,000,000 to 10,000,000 (10,000,000 being the value of the CNumRepeat constant). In the following image, the first number is the result of the single-threaded code, while all the rest were calculated by the parallel version of the algorithm.

To understand what's going on, you should know that Windows (like all other operating systems) does many things at once. At any given time, there are hundreds of threads running in different programs, and they are all fighting for the limited number of CPU cores.
As our program is the active one (it has focus), its threads will get most of the CPU time, but they'll still sometimes be paused so that other threads can run. Because of that, it can easily happen that IncValue reads the current value of FValue into value (let's say that the value is 100) and is then paused. DecValue reads the same value and then runs for some time, decrementing FValue. Let's say that it gets it down to -20,000. (That is just a number without any special meaning.) After that, the IncValue thread is awakened. It should increment the value to -19,999, but instead it adds 1 to 100 (stored in value), gets 101, and stores that into FValue. Ka-boom! In each run of the program, this will happen at different times and will cause a different result to be calculated.

You may complain that the problem is caused by the two-stage increment and decrement, but you'd be wrong. I dare you: go ahead, change the code so that it modifies FValue with Inc(FValue) and Dec(FValue), and it still won't work correctly. Even a single Inc is a read-modify-write operation, and it is not atomic when two cores execute it at the same time.

Well, I hear you say, so I shouldn't even modify one variable from two threads at the same time? I can live with that. But surely, it is OK to write into a variable from one thread and read from another? The answer, as you can probably guess given the general tendency of this section, is again: no, you may not. There are some situations where this is OK (for example, when a variable is only one byte long) but, in general, even simultaneous reading and writing can be a source of weird problems.

The ReadWrite program demonstrates this problem. It has a shared buffer, FBuf: Int64, and a pointer variable used to read and modify the data, FPValue: PInt64. At the beginning, the buffer is initialized to an easily recognized number and the pointer variable is set to point to the buffer:

    FPValue := @FBuf;
    FPValue^ := $7777777700000000;

The program runs two threads.
One thread just reads from the location and stores all the read values into a list. The list is created with its Sorted and Duplicates properties set so that it does not store duplicate values:

    procedure TfrmReadWrite.Reader;
    var
      i: integer;
    begin
      for i := 1 to CNumRepeat do
        FValueList.Add(FPValue^);
    end;

The second thread repeatedly writes two values into the shared location:

    procedure TfrmReadWrite.Writer;
    var
      i: integer;
    begin
      for i := 1 to CNumRepeat do
      begin
        FPValue^ := $7777777700000000;
        FPValue^ := $0000000077777777;
      end;
    end;

At the end, the contents of the FValueList list are logged on the screen. We would expect to see only two values, $7777777700000000 and $0000000077777777. In reality, we see four, as the following screenshot demonstrates.

The reason for that strange result is that 32-bit code on Intel processors reads and writes a 64-bit number (such as an Int64) in two 32-bit steps. In other words, reading and writing 64-bit numbers in 32-bit code is not atomic. When multithreading programmers talk about something being atomic, they mean that an operation executes in one indivisible step. Any other thread will see either the state before the operation or the state after it, but never some undefined intermediate state.

How do the values $7777777777777777 and $0000000000000000 appear in the test application? Let's say that FPValue^ contains $7777777700000000. The code then starts writing $0000000077777777 into FPValue^ by first storing $77777777 into the bottom four bytes. After that, it starts writing $00000000 into the upper four bytes of FPValue^, but in the meantime Reader reads the value and gets $7777777777777777. In a similar way, Reader will sometimes see $0000000000000000 in FPValue^.

We'll look into a way to solve this situation soon, but in the meantime, you may wonder: when is it okay to read/write from/to a variable at the same time? Sadly, the answer is: it depends.
It depends not only on the CPU family (Intel and ARM processors behave completely differently) but also on the specific architecture of a processor. For example, older and newer Intel processors may not behave the same in this respect.

You can always depend on access to byte-sized data being atomic, but that is all. Access (reads and writes) to larger quantities of data (words, integers) is atomic only if the data is correctly aligned. You can access word-sized data atomically if it is word-aligned, and integer data if it is double-word-aligned. If the code was compiled in 64-bit mode, you can also atomically access Int64 data if it is quad-word-aligned. When you are not using data packing (such as packed records), the compiler will take care of alignment, and data access should automatically be atomic. You should, however, still check the alignment in code, if nothing else to prevent stupid programming errors.

If you want to write and read larger amounts of data, modify the data, or work on shared data structures, correct alignment will not be enough. You will need to introduce synchronization into your program.

If you found this post useful, do check out the book Delphi High Performance to learn more about the intricacies of high-performance programming with Delphi.
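As a rough illustration of what "introduce synchronization" means for the IncDec and SharedList demos, here is a minimal Python sketch. It is again not Delphi: Python's threading.Lock stands in for whatever locking primitive you would use in Delphi, such as a critical section, and the class and method names only mirror the demos and are hypothetical. Every read-modify-write of the counter, and every complete pass over the list, happens while holding a lock, so no thread can observe or overwrite a half-finished update. Note that in Python, as with Inc(FValue) in Delphi, a one-line self.value += 1 would not be atomic either.

```python
import random
import threading
import time


class SharedCounter:
    """FValue from the IncDec demo, with each read-modify-write done under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def inc_value(self, repeat):
        for _ in range(repeat):
            with self._lock:  # read and write form one critical section
                value = self.value
                self.value = value + 1

    def dec_value(self, repeat):
        for _ in range(repeat):
            with self._lock:
                value = self.value
                self.value = value - 1


class SharedList:
    """FList from the SharedList demo; the writer and all readers share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def list_writer(self, num_writes):
        for _ in range(num_writes):
            time.sleep(0.001)
            with self._lock:
                if len(self._items) > 10:
                    del self._items[random.randrange(10)]
                else:
                    self._items.append(random.randrange(100))

    def list_reader(self, num_reads):
        total = 0
        for _ in range(num_reads):
            # Holding the lock for the whole pass keeps the length from
            # changing between reading it and indexing the last element.
            with self._lock:
                for j in range(len(self._items)):
                    total += self._items[j]
        return total

    def snapshot(self):
        with self._lock:
            return list(self._items)


def run_in_parallel(*tasks):
    """Run each task in its own thread and return any exceptions raised."""
    errors = []
    errors_lock = threading.Lock()

    def guarded(task):
        try:
            task()
        except Exception as exc:
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=guarded, args=(task,)) for task in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Holding the lock for a whole reader pass is the simplest fix, but it also serializes the readers with one another; a reader-writer lock would let readers run concurrently at the cost of extra complexity.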

Guest Contributor
04 Mar 2019
6 min read

Alteryx vs. Tableau: Choosing the right data analytics tool for your business

Data visualization is widely used in the modern world, where most business decisions are made by analyzing data. One of the most significant benefits of data visualization is that it lets us take in huge amounts of data through easily understandable visuals. Data visualization is used in many areas, and the available tools include Tableau, Alteryx, Infogram, ChartBlocks, Datawrapper, Plotly, Visual.ly, and others. Tableau and Alteryx are industry-standard tools that have dominated the data analytics market for a few years now and are still running strong without any strong competition. In this article, we will look at the core differences between Alteryx and Tableau, which will help us decide which tool to use for which purpose.

Tableau is one of the top-rated tools for helping analysts carry out business intelligence and data visualization activities. Using Tableau, users can generate compelling dashboards and stunning data visualizations. Tableau's interactive user interface helps users quickly generate reports in which they can drill down into the information at a granular level.

Alteryx is a powerful tool widely used in data analytics that also provides meaningful insights to executive-level personnel. With its user-friendly interface, users can extract, transform, and load data within the Alteryx tool.

Why use Alteryx with Tableau?

Using Alteryx with Tableau is a powerful combination when it comes to making value-added, data-driven decisions. With Alteryx, businesses can manipulate their data and feed it into Tableau, which in turn can present it in strong data visualizations. This helps businesses take appropriate actions backed by data analysis.
Alteryx and Tableau are widely used in organizations where decisions are made based on insights obtained from data analysis. For data handling, Alteryx is a powerful ETL platform that can analyze data in different formats. For data representation, Tableau is a perfect match, and Tableau reports can be shared across team members. Nowadays, most businesses want to see real-time data and understand business trends. The combination of Alteryx and Tableau allows data analysts to analyze the data and deliver meaningful insights to users on the fly. The data analysis is executed in Alteryx, where the raw data is handled, and the data representation or visualization is then done in Tableau, so the two tools go hand in hand.

Tableau vs Alteryx

The table below lists the differences between the tools.

1. Alteryx: known as a smart data analytics platform. Tableau: known for its data visualization capabilities.
2. Alteryx: can connect with different data sources and synthesize the raw data; a standard ETL process is possible. Tableau: can connect with different data sources and provide data visualization within minutes from the gathered data.
3. Alteryx: helps with data analysis. Tableau: helps with building appealing graphs.
4. Alteryx: the GUI is okay and widely accepted. Tableau: the GUI is one of its best features; graphs can be built easily using drag and drop.
5. Alteryx: technical knowledge is necessary, because the work involves data source integration and data blending. Tableau: technical knowledge is not necessary, because the data arrives already polished and the user only has to build graphs/visualizations.
6. Alteryx: once the data blending is completed, users can share the file, which can be consumed by Tableau. Tableau: once the graphs are prepared, the reports can be shared among team members without any hassle.
7. Alteryx: a lot of flexibility for data blending. Tableau: flexibility for data visualization.
8. Alteryx: users can perform spatial and predictive analysis. Tableau: possible by representing the data in an appropriate format.
9. Alteryx: one of the best tools for data preparation. Tableau: not well suited to preparing data, compared with Alteryx.
10. Alteryx: data representation cannot be done accurately. Tableau: a wonderful tool for data representation.
11. Alteryx: charged as an annual fee. Tableau: also offers an option to pay monthly.
12. Alteryx: a drag-and-drop interface for developing workflows easily. Tableau: a drag-and-drop interface for building visualizations in no time.

Alteryx and Tableau Integration

As discussed earlier, these two tools have their own advantages and disadvantages, but when integrated, they can do wonders with the data. The integration between Tableau and Alteryx makes visualizing Alteryx-generated answers quite simple. The data is first loaded into Alteryx and then extracted in the form of .tde files (Tableau Data Extract files). These .tde files are consumed by Tableau for the data visualization part. On a regular basis, Alteryx generates a new extract file (.tde) that replaces the old one. Thus, by integrating Alteryx and Tableau, we can:

- Cleanse, combine, and collect all the relevant data sources, and enrich them with third-party data, all in one workflow.
- Give analytical context to the data by providing predictive, location-based, and deep spatial analytics.
- Publish the results of analytic workflows to Tableau for intuitive, rich visualizations that help you make decisions more quickly.
Tableau and Alteryx do not require any advanced skill set, as both tools have simple drag-and-drop interfaces. You can create a workflow in Alteryx that processes data sequentially. Similarly, Tableau lets you build charts by dragging the fields you want to use into specified areas. Companies that have a lot of data to analyze, and can spend large amounts of money on analytics, can use these two tools. There are no significant challenges in integrating Tableau and Alteryx.

Conclusion

Used together, Tableau and Alteryx are really useful for businesses, because senior management can make decisions based on the data insights these tools provide. The two tools complement each other and provide high-quality service to businesses.

Author Bio

Savaram Ravindra is a Senior Content Contributor at Mindmajix.com. His passion lies in writing articles on different niches, including some of the most innovative and emerging software technologies, digital marketing, business, and so on. As a guest blogger, he helps his company acquire quality traffic to its website and build its domain name and search engine authority. Before devoting himself full time to writing, he was a programmer analyst at Cognizant Technology Solutions. Follow him on LinkedIn and Twitter.

Guest Contributor
08 Oct 2019
8 min read

Should you use Bootstrap or Material Design for your next web or app development project?

Superior user experience is becoming increasingly important for businesses, as it helps them engage users and boost brand loyalty. Front-end website and app development platforms such as Bootstrap and Material Design empower developers to create websites with a robust structure and advanced functionality, thereby delivering outstanding business solutions and an unbeatable user experience. Both Twitter's Bootstrap and Material Design are used by developers to create functional, high-quality websites and apps. If you are an aspiring front-end developer, here's a direct comparison between the two, so you can choose the one that's better suited to your upcoming project.

Bootstrap

Bootstrap is an open-source, intuitive, and powerful framework for building responsive, mobile-first solutions on the web. For several years, Bootstrap has helped developers create splendid mobile-ready front-end websites. In fact, Bootstrap is the most popular CSS framework, as it's easy to learn and offers a consistent design through reusable components. Let's dive deeper into the pros and cons of Bootstrap.

Pros

High speed of development

If you have limited time for website or app development, Bootstrap is an ideal choice. It offers ready-made blocks of code that get you started in no time, so you don't have to start coding from scratch. Bootstrap also provides ready-made themes, templates, and other resources that can be downloaded and customized to suit your needs, allowing you to create a unique website as quickly as possible.

Bootstrap is mobile first

Since July 1, 2019, Google has used mobile-first indexing by default for all new websites. This is because users prefer sites that fit the screen size of the device they are using; in other words, they prefer responsive sites.
Bootstrap is an ideal choice for responsive sites, as it has an excellent fluid grid system and responsive utility classes that make the task at hand easy and quick.

Strong community support

Bootstrap has a huge number of resources available on its official website and enjoys immense support from the developer community, which helps developers fix issues promptly. At present, Bootstrap is developed and maintained on GitHub by Mark Otto, currently Principal Design & Brand Architect at GitHub, with nearly 19 thousand commits and 1,087 contributors. The team regularly releases updates to fix new issues and improve the effectiveness of the framework. For instance, the Bootstrap team is currently working on version 5, which will drop jQuery in favor of plain JavaScript. This is primarily because jQuery adds 30KB to the page size and is tricky to configure with bundlers like Webpack. Similarly, Flexbox support is a new feature added in Bootstrap 4. In fact, Bootstrap 4 is rich with features such as a Flexbox-based grid, responsive sizing and floats, auto margins, vertical centering, and new spacing utilities. Further, you will find plenty of websites offering Bootstrap tutorials and a wide collection of themes, templates, plugins, and UI kits that can be used according to your taste and the nature of the project.

Cons

All Bootstrap sites look the same

The Twitter team introduced Bootstrap with the objective of helping developers use a standardized interface to create websites in a short time. However, one of the major drawbacks of this framework is that websites created with it are highly recognizable as Bootstrap sites. Open Airbnb, Twitter, Apple Music, or Lyft: they all look the same, with bold headlines, rounded sans-serif fonts, and lots of negative space.

Bootstrap sites can be heavy

Bootstrap is notorious for adding unnecessary bloat to websites, as the files it generates are large.
This leads to longer loading times and battery drain. Further, if you delete the unneeded files manually, it defeats the purpose of using the framework. So, if you use this popular front-end UI library in your project, pay extra attention to page weight and page speed.

May not be suitable for simple websites

Bootstrap may not be the right front-end framework for all types of websites, especially ones that don't need a full-fledged framework. This is because Bootstrap's theme packages are incredibly heavy, with battery-draining scripts. Also, Bootstrap ships 126KB of CSS and 29KB of JavaScript, which can increase a site's loading time. In such cases, Bootstrap alternatives such as Foundation, Skeleton, Pure, and Semantic UI are adaptable, lightweight frameworks that can meet your development needs and improve your site's user-friendliness.

Material Design

Compared with Bootstrap, Material Design is harder to customize and learn. This design language was introduced by Google in 2014 with the objective of enhancing the design and user interface of Android apps. The language is quite popular among developers, as it offers a quick and effective approach to web development. It includes responsive transitions and animations, lighting and shadow effects, and grid-based layouts. When developing a website or app using Material Design, designers should play to its strengths but be wary of its weaknesses. Let's see why.

Pros

Offers numerous components

Material Design offers numerous components that provide a base design, guidelines, and templates. Developers can build on these to create a suitable website or application for the business. The Material Design guidelines offer the necessary information on how to use each component. Moreover, Material Design Lite is quite popular for its customization. Many designers are creating customized components to take their projects to the next level.
Is compatible across various browsers

Both Bootstrap and Material Design have sound browser compatibility, as they work across most browsers. Material Design has implementations for popular frameworks, such as Angular Material and Material-UI for React, and it also uses the Sass preprocessor.

Doesn't require JavaScript frameworks

Bootstrap's JavaScript components depend on jQuery. Material Design, however, doesn't need any JavaScript frameworks or libraries to design websites or apps. In fact, the platform provides a Material Design framework that allows developers to create innovative components such as cards and badges.

Cons

The animations and vibrant colors can be distracting

Material Design makes extensive use of animated transitions and vibrant colors and images that help bring the interface to life. However, these animations can adversely affect the human brain's ability to take in information.

It is affiliated with Google

Since Material Design is promoted by Google, Android is its most prominent adopter. Consequently, developers looking to create apps with a platform-independent UX may find it tough to work with Material Design. However, when Google introduced the language, it had a broad vision for Material Design that encompassed many platforms, including iOS. Google offers several Material Design components for iOS that can be used to render interesting effects using a flexible header, standard Material colors, typography, and sliding tabs.

Carries performance overhead

Material Design makes extensive use of animations that carry a lot of overhead. For instance, effects like drop shadows, color fills, and transform/translate transitions can be jerky and unpleasant for regular users.

Wrapping up: Should you use Bootstrap or Material Design for your next web or app development project?

Bootstrap is great for responsive, simple, and professional websites. It enjoys immense support and documentation, making it easy for developers to work with.
So, if you are working on a project that needs to be completed in a short time, opt for Bootstrap. The framework is mainly focused on creating responsive, functional, and high-quality websites and apps that enhance the user experience. Notice how these websites have used Bootstrap to build responsive, mobile-first sites. (Source: cssreel) (Source: Awwwards)

Material Design, on the other hand, is specifically a design language and is great for building websites that focus on appearance, innovative designs, and beautiful animations. You can use Material Design for your portfolio sites, for instance. The framework is pretty detailed and straightforward to use, and it helps you create websites with striking effects. Check out how these websites and apps use the customized themes, popups, and buttons of Material Design. (Source: Nimbus 9) (Source: Digital Trends)

What do you think? Which framework works better for you, Bootstrap or Material Design? Let us know in the comments section below.

Author Bio

Gaurav Belani is a Senior SEO and Content Marketing Analyst at The 20 Media, a content marketing agency that specializes in data-driven SEO. He has more than seven years of experience in digital marketing and loves to read and write about AI, machine learning, data science, and other emerging technologies. In his spare time, he enjoys watching movies and listening to music. Connect with him on Twitter and LinkedIn.
Sam Wood
14 Oct 2015
4 min read

A Brief History of Python

From data to web development, Python has come to stand as one of the most important and most popular open source programming languages in use today. But whilst some see it as almost a new kid on the block, Python is actually older than Java, R, and JavaScript. So what are the origins of our favorite open source language?

In the beginning...

Python's origins lie way back in distant December 1989, making it the same age as Taylor Swift. Created by Guido van Rossum (the Python community's Benevolent Dictator for Life) as a hobby project to work on during the week around Christmas, Python is famously named not after the constrictor snake but after the British comedy troupe Monty Python's Flying Circus. (We're quite thankful for this at Packt - we have no idea what we'd put on the cover if we had to pick for 'Monty' programming books!) Python was born out of the ABC language, a terminated project of the Dutch CWI research institute that van Rossum worked for, and the Amoeba distributed operating system. When Amoeba needed a scripting language, van Rossum created Python. One of the principal strengths of this new language was how easy it was to extend, along with its support for multiple platforms - a vital innovation in the days of the first personal computers. Capable of communicating with libraries and differing file formats, Python quickly took off.

Computer Programming for Everybody

Python grew throughout the early nineties, acquiring the lambda, reduce(), filter() and map() functional programming tools (supposedly courtesy of a Lisp hacker who missed them and thus submitted working patches), keyword arguments, and built-in support for complex numbers. During this period, Python also played a central role in van Rossum's Computer Programming for Everybody initiative. CP4E's goal was to make programming more accessible to the 'layman' and to encourage a basic level of coding literacy as essential knowledge, on a par with English literacy and math skills.
Because of Python's focus on clean syntax and accessibility, it played a key part in this. Although CP4E is now inactive, Python remains easy to learn and is one of the most common languages that would-be programmers are pointed to.

Going Open with 2.0

As Python grew in the nineties, one of the key issues in its uptake was its continued dependence on van Rossum. 'What if Guido was hit by a bus?' Python users lamented, 'or if he dropped dead of exhaustion or was rubbed out by a member of a rival language following?' In 2000, Python 2.0 was released by the BeOpen Python Labs team. The ethos of 2.0 was much more open and community-oriented in its development process, with much greater transparency. Python moved its repository to SourceForge, granting write access to its CVS tree to more people and providing an easy way to report bugs and submit patches. As the release notes stated, 'the most important change in Python 2.0 may not be to the code at all, but to how Python is developed'. Python 2.7 is still used today - and will be supported until 2020. But the word from development is clear - there will be no 2.8. Instead, support remains focused upon 2.7's usurping younger brother - Python 3.

The Rise of Python 3

In 2008, Python 3 was released on an almost-unthinkable premise - a complete overhaul of the language, with no backwards compatibility. The decision was controversial, and born in part of the desire to clean house on Python. There was a great emphasis on removing duplicative constructs and modules, to ensure that in Python 3 there was one - and only one - obvious way of doing things. Despite the introduction of tools such as '2to3', which could quickly identify what needed to change in Python 2 code to make it work in Python 3, many users stuck with their classic codebases. Even today, there is no assumption that Python programmers will be working with Python 3.
Despite flame wars raging across the Python community, Python 3's eventual ascendancy was something of an inevitability. Python 2 remains a supported language (for now). But however much it may still be the default choice for many Python users, Python 3 is the language's future.
The Future
Python's userbase is vast and growing - it's not going away any time soon. Used by the likes of Nokia, Google, and even NASA for its easy syntax, it looks to have a bright future ahead of it, supported by a huge community of open source developers. Its support for multiple programming paradigms, including object-oriented programming, functional programming, and parallel programming models, makes it a highly adaptable choice - and its uptake keeps growing.
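The early-nineties functional tools mentioned above are still part of Python today, though the Python 3 overhaul changed them. map() and filter() now return lazy iterators rather than lists, and reduce() was moved out of the built-ins into the functools module. The following is a minimal Python 3 sketch showing all four tools together. The sample list and variable names are illustrative only.

```python
from functools import reduce  # in Python 3, reduce() lives in functools

numbers = [1, 2, 3, 4, 5, 6]

# map() applies a function to every item and, in Python 3, returns a lazy iterator
lazy = map(lambda n: n * n, numbers)
print(type(lazy).__name__)  # map
squares = list(lazy)  # materialise the iterator into a list

# filter() keeps only the items for which the predicate returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce() folds the sequence into a single value, here a running sum
total = reduce(lambda acc, n: acc + n, numbers, 0)

print(squares)  # [1, 4, 9, 16, 25, 36]
print(evens)    # [2, 4, 6]
print(total)    # 21
```

Code written for Python 2 that treated the result of map() as a list is exactly the kind of thing the 2to3 tool flags, typically by wrapping the call in list().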

Aaron Mills
03 Jun 2015
7 min read

A Brief History of Minecraft Modding

Minecraft modding has been around since nearly the beginning. During that time it has gone through several transformations, or “eras.” The early days and early mods looked very different from today. I first became involved in the community during mid-Beta, so everything that happened before then is second-hand knowledge. A great deal has been lost to the sands of time, but the important stops along the way are remembered, as we shall explore.
Minecraft has gone through several development stages over the years. Interestingly, these stages also correspond to the various “eras” of Minecraft modding. Minecraft Survival was first experienced as Survival Test during Classic, then again in the Indev stage, which gave way to Infdev, then to Alpha and Beta before finally reaching Release. But before all that was Classic.
Classic was released in May of 2009 and development continued into September of that year. Classic saw the introduction of Survival and Multiplayer. During this period of Minecraft’s history, modding was in its infancy. On the one hand, server modding thrived during this stage, with several different server mods available. (These mods were the predecessors to Bukkit, which we will cover later.) Generally, the purpose of these mods was to give server admins more tools for maintaining their servers. On the other hand, client-side mods, the ones that add new content, didn’t really start appearing until the Alpha stage.
Alpha was released in late June of 2010, and it would continue for the rest of the year. Before Alpha came Indev and Infdev, but there isn’t much evidence of any mods from that period, possibly because Indev and Infdev lacked Multiplayer. Alpha brought the return of Multiplayer, and during this time Minecraft began to see its first simple client mods. Initially these were just simple modifications of existing content: adding support for higher-resolution textures, new arrow types, bug fixes, compass modifications, etc.
The mods were simple and small. This began to change, though, with the creation of the Minecraft Coder Pack, later renamed the Mod Coder Pack and commonly known as MCP. (One of the primary creators of MCP, Michael “Searge” Stoyke, now actually works for Mojang.) MCP saw its first release for Alpha 1.1.2_01 sometime in mid-2010. Although Minecraft's code was easy to decompile, it was also obfuscated. Obfuscation is when you take all the meaningful names and words in the code and replace them with non-human-readable nonsense. The computer can still make sense of it just fine, but humans have a hard time. MCP resolved this limitation by applying meaningful names to the code, making modding significantly easier than ever before.
At the same time, but developing completely independently, was the server mod hMod, which gave some simple but absolutely necessary tools to server admins. However, hMod was in trouble, as its main developer was MIA. This situation eventually led to the creation of Bukkit, a server mod designed from the ground up to support “plugins” and do everything that hMod couldn’t do. Bukkit was created by a group of people who were also eventually hired by Mojang: Nathan 'Dinnerbone' Adams, Erik 'Grum' Broes, Warren 'EvilSeph' Loo, and Nathan 'Tahg' Gilbert. Bukkit went on to become possibly the most popular Minecraft mod ever created. In fact, many believe its existence is largely responsible for the popularity of online Minecraft servers. However, it would remain largely incompatible with client-side mods for some time.
Not to be left behind, the client saw another major development late in the year: Risugami’s ModLoader. ModLoader was transformational. Before ModLoader existed, if you wanted to use two mods, you had to manually merge their code, line by line, yourself. There were many common tasks that couldn’t be done without editing Minecraft’s base code, such as adding new blocks and items.
ModLoader changed that by creating a framework where simple mods could hook into ModLoader code to perform common tasks that previously required base edits. It was simple, and it would never really expand beyond its original scope. Still, it led modding into a new era.
Minecraft Beta, the period many call the “Golden Age” of modding, was released just before Christmas in 2010 and would continue through 2011. Beta saw the rise of many familiar mods that are still recognized today, including my own mod, Railcraft. IndustrialCraft, Buildcraft, Redpower, and Better than Wolves also got their start during this period. These were major mods that added many new blocks and features to Minecraft. Additionally, the massive Aether mod, which recently received a modern reboot, was also released during Beta. These mods and more redefined the meaning of “Minecraft mods”. They existed on a completely new scale, sometimes completely changing the game.
But there were still flaws. Mods were still painful to create and painful to use. You couldn’t use IndustrialCraft and Buildcraft at the same time, because they edited too many of the same base files. ModLoader covered only the most common base edits, barely touching the code and not nearly enough for a major mod. Additionally, to use a mod, you still had to manually insert code into the Minecraft jar, a task that turned many players off modding.
Seeing that their mods couldn’t be used together, the creators of several major mods launched a new project. They would call it Minecraft Forge. Started by Eloraam of Redpower and SpaceToad of Buildcraft, it saw rapid adoption by many of the major mods of the time. Forge built on top of ModLoader, greatly expanding the number of base hooks and allowing many more mods to work together than was previously possible. This ushered in the true “Golden Age” of modding, which would continue from Beta and into Release.
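The idea of "hooks" can be made a little more concrete with a toy sketch in Python. Treat it as an illustration of the general pattern only. ModLoader and Forge were written in Java, and the HookRegistry class, the "register_blocks" event name and both mods below are hypothetical. Neither project's actual API looked like this. The sketch shows one thing: when the base game exposes an extension point, two mods can each add content through it instead of both editing the same base file.

```python
from typing import Callable, Dict, List

# Hypothetical toy loader, NOT the real ModLoader/Forge API (those were Java).
Handler = Callable[[List[str]], None]


class HookRegistry:
    """The base game calls named hooks; mods register handlers for them."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Handler]] = {}

    def register(self, event: str, handler: Handler) -> None:
        # A mod subscribes to a named extension point
        self._hooks.setdefault(event, []).append(handler)

    def fire(self, event: str, registry: List[str]) -> None:
        # The base game runs every handler for this hook, in registration order
        for handler in self._hooks.get(event, []):
            handler(registry)


# The "base game" owns the block list and exposes one hook for extending it
blocks: List[str] = ["stone", "dirt"]
loader = HookRegistry()

# Two independent (hypothetical) mods add blocks without editing base code
loader.register("register_blocks", lambda reg: reg.append("rail"))
loader.register("register_blocks", lambda reg: reg.append("machine"))

loader.fire("register_blocks", blocks)
print(blocks)  # ['stone', 'dirt', 'rail', 'machine']
```

Neither mod patches the block list directly, so the clash described above, where two mods rewrite the same base file, cannot happen. Each handler simply runs in turn.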
Minecraft 1.0 was released in November of 2011, heralding Minecraft’s “official” release. Around the same time, client modding was undergoing a shift. Many of the most prominent developers were moving on to other things, including the entire Forge team. For the most part, their mods would survive without them, but some would not. Redpower, for example, ceased all development in late 2012. Eloraam, SpaceToad, and Flowerchild would hand the reins of Forge over to LexManos, a relatively unknown name at the time. The “Golden Age” was at an end, but in its place came an explosion of new mods, and modding became more popular than ever.
The new Forge team, consisting mainly of LexManos and cpw, would bring many new innovations to modding. Eventually they even developed a replacement for Risugami’s ModLoader, naming it Forge Mod Loader (FML) and incorporating it into Forge. Users would no longer be required to muck around with Minecraft’s internals to install mods. Innovation has continued to the present day, and mods for Minecraft have become too numerous to count.
However, the picture for server mods hasn’t been so rosy. Bukkit, the long-dominant server mod, suffered a killing blow in 2014. Licensing conflicts developed between the original creators and the maintainers, largely revolving around who “owned” the project after the primary maintainers resigned. Ultimately, one of the most prolific maintainers used a technicality to revoke the project's right to use his code, effectively killing the entire project. A replacement has yet to emerge, leaving the server community limping along on increasingly outdated code.
But one shouldn’t be too concerned about the future. There have been challenges in the past, but nearly every time a project died, it was soon replaced by something even better. Minecraft has one of the largest, most vibrant, and most mainstream modding communities ever to exist.
It’s had a long and varied history, and this has been just a brief glimpse into that heritage. There are many more events, both large and small, that have helped shape the community. May the future of Minecraft modding be just as interesting.
About the Author
Aaron Mills was born in 1983 and lives in the Pacific Northwest, which is a land rich in lore, trees, and rain. He has a Bachelor's degree in Computer Science and studied at Washington State University Vancouver. He is best known for his work on the Minecraft mod Railcraft, but has also contributed significantly to the Minecraft mods Forestry and Buildcraft, and has made some contributions to the Minecraft Forge project.

Richard Gall
28 Jul 2015
5 min read

DevOps engineering and full-stack development – 2 sides of the same agile coin

Two of the most talked-about and on-trend roles in tech dominated our Skill Up survey: DevOps engineers and full-stack developers. Even before we started exploring our data, we knew that both would feature heavily. Given the amount of time spent online arguing about DevOps and the merits and drawbacks of full-stack development, it’s interesting to see exactly what it means to be a DevOps engineer or a full-stack developer. From salary to tool use, our Web Development and SysAdmin and Security Salary and Skills Reports offer an insight into the professional lives of the people actually performing these roles every day.
The similarities between DevOps engineering and full-stack development
The similarities between the two roles are striking. Both DevOps engineering and full-stack development are having a considerable impact on the way in which technology is used and understood within organizations and businesses – which makes them particularly valuable. In SMEs, for example, DevOps engineers command almost the same salaries as they do in enterprises. Considering the current economic climate, it’s a clear signal of the value of DevOps practices in environments where flexibility and the ability to adapt to changing demands and expectations are crucial to growth.
Full-stack developers also command the highest salaries in many industries. In consultancy, for example, full-stack developers earn significantly more than any other web development role. While this could suggest that organizations aren’t yet willing to invest in (or simply don’t need) in-house full-stack developers, it highlights that they are nevertheless willing to spend money on individuals with full-stack knowledge who are capable of delivering cutting-edge insight. However, just as we saw cloud consultancies dominate the tech consultancy market a few years ago, over time it’s likely that full-stack development will become more and more established as a standard.
DevOps engineers and full-stack developers share the same philosophical germ. They are symptoms of a growing business demand for greater agility and flexibility, and they hint at a trend towards greater generalization in the skillsets of technical professionals.
"part of the thrill of #devops to me is how there's no true agreement about what it is. it's like watching LOST all over again" (jon devops hendren, @devops, May 18, 2015)
Full-stack developers are using DevOps tools
I’ve always seen the two roles as manifestations of similar ideas in different technical areas. However, when you look at the data we’ve collected in our survey, alongside some wider research, the relationship between the DevOps engineer and the full-stack developer might be more than purely conceptual. ‘Full-Stack’ and ‘DevOps’ are both terms that blur the lines between developer and engineer, and they are two sides of an intriguing form of cross-pollination: the tools our full-stack developers reported using included technologies more commonly used for deployment and automation. Docker and Vagrant were the most notable, highlighting the impact of containerization and virtualization on web development, but we also found a number of references to the Microsoft automation tool PowerShell – a distinctly DevOps-esque tool if ever there was one.
Perhaps there’s a danger of overstating my point – surely we shouldn’t be surprised if web developers are using these tools – it’s not that strange, right? Maybe, but the fact that tools such as these are being used by web developers in their day-to-day work suggests that they are no longer simply expected to develop: they also need to deploy and configure their projects. Indeed, it’s worth noting that across all our web development respondents, a large number plan on learning Docker over the next 12 months.
DevOps engineers use a huge range of tools
DevOps engineers were even more eclectic in their tool usage than full-stack developers.
Python is the language of choice and Puppet the go-to configuration management tool, but web tools such as JavaScript and PHP are also being used. References to Flask, the Python microframework, for example, emphasise the way in which DevOps engineers keep an eye on web development while they’re automating infrastructure.
Taken alone, these responses might not fully evidence the relationship between DevOps engineers and full-stack developers. However, there are jobs out there asking for a combination of both skillsets. One, posted by a recruiter working for a nameless ‘creative media house’ in London, was looking for someone to become ‘a key member of multi-party cloud research projects, helping to bring a microservices-based video automation system to life, integrate development and developed systems into onside and global infrastructure’. The tools being asked for were very varied indeed. From a high-level language such as JavaScript, to scripting languages such as Bash, Python and Perl, to continuous integration tools, configuration management tools and containerization technologies, whoever eventually gets the job certainly deserves to be called a polyglot.
Blurring the line between full-stack and DevOps
A further indication of the blurred line between engineers and developers can be found in this article from computing.co.uk. It’s an interesting tale of how working practices develop according to necessity, and of how methodologies and ideas interact with the practical details of a given situation. It tells the story of how the Washington Post went about building its submission platform, and how the way the project was resourced and managed changed under certain pressures, both internal and external. The article's title might actually be misleading – if you read it, it’s not so much that DevOps necessitates full-stack development, more that each grows out of the other.
It might even be said that the reverse is true – that full-stack development necessitates DevOps thinking. The relationship between DevOps and full-stack development gives a real indication of the state of the tech world in 2015. Within a tech landscape of increasing complexity and cross-pollination, there are going to be opportunities for developers and engineers to significantly increase their value as technical professionals. It’s simply a question of learning more, and of being open to new challenges and ideas about how to work effectively. It probably won’t be easy, but it might just be a fun journey.
Guest Contributor
12 Jul 2018
8 min read

10 great tools to stay completely anonymous online

Everybody is facing a battle these days. Though it may not be immediately apparent, it is already affecting a majority of the global population. This battle is not fought with bombs, planes, or tanks, or with any physical weapons for that matter. This battle is for our online privacy.
A survey conducted last year found that 69% of data breaches were related to identity theft. Another survey shows that the number of data breaches related to identity theft has risen steadily worldwide over the last 4 years. And it is likely to keep increasing as hackers gain easier access to more advanced tools.
The EU’s GDPR may curb this trend by imposing stricter data protection standards on data controllers and processors. These entities have been collecting and storing our data for years through ads that track our online habits, which is another reason to protect our online anonymity. However, this new regulation has been in force for just over a month, and only within the EU. So, it's going to take some time before we feel its long-term effects.
The question is, what should we do when hackers try to steal and maliciously use our personal information? Simple: we defend ourselves with the tools at our disposal to keep ourselves completely anonymous online. So, here’s a list you may find useful.
1. VPNs
A VPN helps you maintain anonymity by hiding your real IP address and internet activity from prying eyes. Normally, every request your browser sends carries your IP address. When you visit a website, your device typically asks a DNS server (often one run by your ISP) to translate the site's name into an address, and your traffic then travels through your ISP's network to reach it. Of course, your ISP (and every server your traffic passes through) can, and likely will, view and monitor all the data you route through them, including your personal information and IP address. This allows them to keep tabs on all your internet activity. A VPN protects your identity by assigning you an anonymous IP address and encrypting your data.
This means your ISP sees only encrypted traffic heading to the VPN server, and the sites you visit see the VPN's IP address instead of your real one. This is why using a VPN is one of the best ways to stay anonymous online. However, not all VPNs are created equal. You have to choose the best one if you want airtight security. Also, beware of free VPNs. Most of them make money by selling your data to advertisers. You’ll want to compare and contrast several VPNs to find the best one for you. But that’s easier said than done with so many different VPNs out there. Look for reviews on trustworthy sites to find the best VPN for your needs.
2. Tor Browser
Tor (The Onion Router) is a network, and Tor Browser is the browser built on top of it. It strengthens your online anonymity even more by using multiple layers of encryption, thereby protecting your internet activity, which includes “visits to Web sites, online posts, instant messages, and other communication forms”. It works by first wrapping your data in three layers of encryption. Your data is then relayed through three servers, each of which removes one layer, so no single relay knows both who you are and where your data is going. Replies travel back along the same path, with each relay adding a layer that your device then removes. You can even improve Tor by using it in combination with a compatible VPN. It is important to note, though, that using Tor won’t hide the fact that you’re using it. Some sites may restrict or block access from Tor.
3. Virtual machine
A virtual machine is basically a second computer within your computer. It lets you emulate another device through an application. This emulated computer can then be configured according to your preferences. The best use for this tool, however, is for tasks that don’t involve an internet connection. It is best used when you want to open a file and make sure no one is watching over your shoulder. After opening the file, you simply delete the virtual machine. You can try VirtualBox, which is available on Windows, Linux, and Mac.
4. Proxy servers
A proxy server is an intermediary between your device and the internet. It’s basically another computer that you use to process internet requests. It’s similar to a virtual machine in concept, but it’s an entirely separate physical machine. It protects your anonymity in a similar way to a VPN (by hiding your IP address), but it can also send a different user agent to keep your browser unidentifiable, and it can block or accept cookies while keeping them from passing to your device. Most VPN companies also offer proxy servers, so they’re a good place to look for a reliable one.
5. Fake emails
A fake email is exactly what the name suggests: an email address that isn’t linked to your real identity. Fake emails aid your online anonymity not only by hiding your real identity but also by keeping you safe from phishing emails and malware, which can easily be sent to you via email. Making a fake email can be as easy as signing up for an email account without using your real information, or using a fake email service.
6. Incognito mode
“Going incognito” is the easiest anonymity tool to come by. While in this mode, your browser will not store your browsing history, cookies, site data, or information entered in forms. Most browsers have a privacy mode that you can easily use to hide your online activity from other users of the same device.
7. Ad blockers
Ads are everywhere these days. Advertising has always been, and always will be, a lucrative business. That said, there is a difference between good ads and bad ads. Good ads are those that target a population as a whole. Bad ads (interest-based advertising, as the companies behind them like to call it) target each of us individually by tracking our online activity and location, which compromises our online privacy. Tracking algorithms aren’t illegal, though, and have even been considered “clever”. But the worst ads are those that contain malware that can infect your device and prevent you from using it.
You can use ad blockers to combat these threats to your anonymity and security. Ad blockers usually come in the form of browser extensions that work instantly, with no additional configuration needed. For Google Chrome, you can choose Adblock Plus, uBlock Origin, or AdBlock. For Opera, you can choose Opera Ad Blocker, Adblock Plus, or uBlock Origin.
8. Secure messaging apps
If you need to use an online messaging app, you should know that the popular ones aren’t as secure as you’d like them to be. True, Facebook Messenger does have a “secret conversation” feature, but Facebook hasn’t exactly been the most secure social network to begin with. Instead, use tools like Signal or Telegram. These apps offer end-to-end encryption (by default in Signal, and in Telegram's Secret Chats) and can even be used to make voice calls.
9. File shredder
The right to be forgotten has surfaced in mainstream media with the onset of the EU’s General Data Protection Regulation. This right basically lets a data subject require data-collecting or data-processing entities to remove their personally identifiable information (PII) from their records. You can practice this same right on your own device by using a “file shredding” tool. But here's the thing: completely removing sensitive files from your device is hard. Simply deleting a file and emptying your device’s recycle bin doesn’t actually remove it. Your device just treats the space it occupied as empty and available. These “dead” files can still haunt you when they are found by someone who knows where to look. You can use software like Dr. Cleaner (for Mac) or Eraser (for Windows) to “shred” your sensitive files by overwriting them several times with random data.
10. DuckDuckGo
DuckDuckGo is a search engine that doesn’t track your behaviour (unlike Google and Bing, which use behavioural trackers to target you with ads). It emphasizes your privacy and avoids the filter bubble of personalized search results.
It offers useful features like region-specific searching, Safe Search (to protect against explicit content), and an instant answer feature that shows an answer across the top of the screen, separate from the search results.
To sum it up: our online privacy is being attacked from all sides. Ads legally track our online activities and hackers steal our personal information. The GDPR may help in the long run, but that remains to be seen. What's important is what we do now. These tools will set you on the path to a more secure and private internet experience today.
About the Author
Dana Jackson is a U.S. expat living in Germany and the founder of PrivacyHub. She loves all things related to security and privacy. She holds a degree in Political Science, and loves to call herself a scientist. Dana also loves morning coffee and her dog Paw.
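As a technical footnote to the file-shredder tip (item 9) above, the core idea of overwriting a file before deleting it fits in a few lines of Python. This is a minimal sketch under stated assumptions, not how Dr. Cleaner or Eraser actually work. The shred_file() helper and its three-pass default are hypothetical choices. On SSDs (which remap writes for wear levelling), on journaling or copy-on-write filesystems, and wherever backups or cloud sync exist, overwriting a file in place may not destroy every copy of its data. Full-disk encryption or a dedicated, well-maintained tool is the safer choice.

```python
import os
import tempfile


def shred_file(path: str, passes: int = 3) -> None:
    """Hypothetical helper: overwrite a file with random bytes, then delete it.

    Caveat: SSD wear levelling, journaling/copy-on-write filesystems and
    backups can keep older copies of the data that this never touches.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:  # open for in-place binary overwrite
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 64 * 1024)
                f.write(os.urandom(chunk))  # random pattern for this pass
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to disk
    os.remove(path)  # finally unlink the (now overwritten) file


# Usage: create a throwaway file containing "sensitive" text, then shred it
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"account number: 12345678")
    secret_path = tmp.name

shred_file(secret_path)
print(os.path.exists(secret_path))  # False
```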

Amey Varangaonkar
28 Aug 2017
7 min read

Is Python edging R out in the data science wars?

When it comes to the ‘lingua franca’ of data science, there seems to be a face-off between R and Python. R has long been established as the language of researchers and statisticians, but Python has quickly emerged as a bona fide challenger, helping embed analytics as a necessity for businesses and other organizations in 2017. If a tech war does exist between the two languages, it’s a battle fought not so much on technical features as on the wider changes within modern business and technology. R is a language purpose-built for statistics, for performing accurate and intensive analysis. So, the fact that R is being challenged by Python, a language that is flexible, fast, and relatively easy to learn, suggests we are seeing a change in who’s actually doing data science, where they’re doing it, and what they’re trying to achieve.
Python versus R: A Closer Look
Let’s make a quick comparison of the two languages on aspects important to those working with data, and see what we can learn about the two worlds where R and Python operate.
Learning curve
Python is the easier language to learn. While R certainly isn’t impenetrable, Python’s syntax makes it a great language to learn even if you’re completely new to programming. The fact that such an easy language has come to rival R within data science indicates the pace at which the field is expanding. More and more people are taking on data-related roles, possibly without a great deal of programming knowledge, and Python lowers the barrier to entry much more than R does. That said, once you get to grips with the basics of R, it becomes relatively easier to learn the more advanced material. This is why statisticians and experienced programmers find R easier to use.
Packages and libraries
Many R packages are built in. Python, meanwhile, depends upon a range of external packages.
This makes R more convenient as a statistical tool out of the box: if you’re using Python, you need to know exactly what you’re trying to do and what external support you’re going to need.
Data Visualization
R is well known for its excellent graphical capabilities. This makes it easy to present and communicate data in varied forms. For statisticians and researchers, the importance of that is obvious. It means you can perform your analysis and present your work in a way that is relatively seamless. The ggplot2 package in R, for example, allows you to create complex and elegant plots with ease, and as a result its popularity in the R community has increased over the years.
Python also offers a wide range of libraries that can be used for effective data storytelling. The breadth of external packages available for Python means the scope of what’s possible is always expanding. Matplotlib has been a mainstay of Python data visualization. It’s also worth remarking on newer libraries like Seaborn, a neat little library that sits on top of Matplotlib, wrapping its functionality and giving you a neater API for specific applications. So, to sum up, you have sufficient options to perform your data visualization tasks effectively using either R or Python!
Analytics and Machine Learning
Thanks to libraries like scikit-learn, Python helps you build machine learning systems with relative ease. This takes us back to the point about the barrier to entry. If machine learning is upending how we use and understand data, it makes sense that more people want a piece of the action without having to put in too much effort. But Python also has another advantage: it’s great for creating web services where data can be uploaded by different people. In a world where accessibility and data empowerment have never been more important (i.e., where everyone takes an interest in data, not just the data team), this could prove crucial.
With packages such as caret, MICE, and e1071, R too gives you the power to perform effective machine learning and get crucial insights out of your data. However, R falls short in comparison to Python, thanks to the latter’s superior libraries and more diverse use cases.
Deep Learning
Both R and Python have libraries for deep learning. It’s much easier and more efficient with Python, though, most likely because the Python world changes much more quickly, with new libraries and tools springing up as soon as the data science world latches on to a new buzzword. Theano, and more recently Keras and TensorFlow, have all made a huge impact on making it relatively easy to build incredibly complex and sophisticated deep learning systems. If you’re clued-up and experienced with R, it shouldn’t be too hard to do the same using libraries such as MXNetR, deepr, and H2O. That said, if you want to switch models, you may need to switch tools, which could be a bit of a headache.
Big Data
With Python, you can write efficient MapReduce applications with ease, and with R you can scale your programs on Hadoop to work with petabytes of data. Both R and Python are equally good when it comes to working with Big Data, as they can be integrated seamlessly with Big Data tools such as Apache Spark and Apache Hadoop, among many others. It’s likely that it’s in this field that we’re going to see R moving more and more into industry, as businesses look for a concise way to handle large datasets. This is especially likely in industries such as bioinformatics, which have a close connection with the academic world and necessarily depend upon a combination of size and accuracy when it comes to working with data.
So, where does this comparison leave us? Ultimately, what we see are two different languages offering great solutions to very different problems in data science.
In Python, we have a flexible and adaptable language with a vibrant community of developers working on a huge range of problems and tasks, each one trying to find more effective and more intelligent ways of doing things. In R, we have a purely statistical language with a large repository of over 8000 packages for data analysis and visualization. While Python is production-ready and better suited to organizations looking to harness technical innovation to their advantage, R’s analytical and data visualization capabilities can make your life as a statistician or data analyst easier. Recent surveys indicate that Python commands a higher salary than R, most likely because it’s a language that can be used across domains: a problem-solving language. That’s not to say that R isn’t a valuable language. Rather, Python is the language that just seems to fit the times at the moment.
In the end, it all boils down to your background and the kind of data problems you want to solve. If you come from a statistics or research background and your problems revolve only around statistical analysis and visualization, then R would best fit the bill. However, if you’re a Computer Science graduate looking to build a general-purpose, enterprise-wide data model that can integrate seamlessly with other business workflows, you will find Python easier to use.
R and Python are two different animals. Instead of comparing the two, maybe it’s time we understood where and how each can best be used, and then harnessed their power to the fullest to solve our data problems. One thing is for sure, though: neither is going away anytime soon. Both R and Python occupy a large chunk of the data science market today, and it would take a major disruption to take either one of them out of the equation completely.
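The Big Data section's point about MapReduce becomes clearer with a minimal, single-process Python sketch of the pattern it names. A map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. The function names and sample lines are illustrative assumptions, not part of any framework's API. On a real cluster, tools such as Hadoop Streaming or Apache Spark distribute these same phases across many machines.

```python
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key, as a framework does between the map and reduce phases
    groups: Dict[str, List[int]] = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all the values for one key into a single result
    return key, sum(values)


lines = ["R and Python", "Python for data", "R for statistics"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts["python"], counts["r"], counts["for"])  # 2 2 2
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be spread across machines independently. That independence is what makes the pattern suit Hadoop-scale data.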