
Tech Guides

852 Articles
Sugandha Lahoti
15 Feb 2018
5 min read

Using Meta-Learning in Nonstationary and Competitive Environments with Pieter Abbeel et al

This ICLR 2018 accepted paper, Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments, addresses the use of meta-learning to operate in nonstationary environments, represented as a Markov chain of distinct tasks. The paper is authored by Pieter Abbeel, Maruan Al-Shedivat, Trapit Bansal, Yura Burda, Ilya Sutskever, and Igor Mordatch.

Pieter Abbeel has been a professor at UC Berkeley since 2008. He was also a Research Scientist at OpenAI (2016-2017). His current research focuses on robotics and machine learning, with a particular focus on meta-learning and deep reinforcement learning. Another author of this paper, Ilya Sutskever, is the co-founder and Research Director of OpenAI. He was also a Research Scientist on the Google Brain team for 3 years.

Meta-learning, or learning to learn, typically uses metadata to understand how automatic learning can become flexible in solving learning problems, i.e. to learn the learning algorithm itself. Continuous adaptation in real-world environments is essential for any learning agent, and a meta-learning approach is an appropriate choice for this task. This article discusses one of the top accepted research papers in the field of meta-learning at the 6th annual ICLR conference, scheduled for April 30 - May 03, 2018.

Using a gradient-based meta-learning algorithm for Nonstationary Environments

What problem is the paper attempting to solve?

Reinforcement learning algorithms have achieved impressive results, ranging from playing games to applications in dialogue systems and robotics, but they are mostly limited to solving tasks in stationary environments. The real world, on the other hand, is often nonstationary, whether due to complexity, changes in the dynamics of the environment over the lifetime of a system, or the presence of multiple learning actors. Nonstationarity breaks the standard assumptions and requires agents to adapt continuously, both at training and at execution time, in order to succeed.

The classical approaches to dealing with nonstationarity are usually based on context detection and tracking, i.e. reacting to changes that have already happened in the environment by continuously fine-tuning the policy. However, nonstationarity allows only limited interaction before the properties of the environment change. This immediately puts learning into the few-shot regime and often renders simple fine-tuning methods impractical. To learn and adapt continuously from limited experience in nonstationary environments, the authors of this paper propose a learning-to-learn (or meta-learning) approach.

Paper summary

This paper proposes a gradient-based meta-learning algorithm suitable for continuous adaptation of RL agents in nonstationary environments. The agents meta-learn to anticipate changes in the environment and update their policies accordingly. The method builds on previous work on gradient-based model-agnostic meta-learning (MAML), which has been shown to be successful in few-shot settings. The algorithm re-derives MAML for multi-task reinforcement learning from a probabilistic perspective and then extends it to dynamically changing tasks. The paper also considers the problem of continuous adaptation to a learning opponent in a competitive multi-agent setting, and introduces RoboSumo, a 3D environment with simulated physics that allows pairs of agents to compete against each other.

The paper answers the following questions:

  • What is the behavior of different adaptation methods (in nonstationary locomotion and competitive multi-agent environments) when the interaction with the environment is strictly limited to one or very few episodes before it changes?
  • What is the sample complexity of different methods, i.e. how many episodes are required for a method to successfully adapt to the changes?

Additionally, it answers the following questions specific to the competitive multi-agent setting:

  • Given a diverse population of agents that have been trained under the same curriculum, how do different adaptation methods rank in a competition against each other?
  • When the population of agents is evolved for several generations, what happens to the proportions of different agents in the population?

Key Takeaways

This work proposes a simple gradient-based meta-learning approach suitable for continuous adaptation in nonstationary environments. The method was applied to nonstationary locomotion and within a competitive multi-agent setting, the RoboSumo environment. The key idea of the method is to regard nonstationarity as a sequence of stationary tasks and train agents to exploit the dependencies between consecutive tasks, so that they can handle similar nonstationarities at execution time. In both cases, i.e. nonstationary locomotion and the competitive multi-agent setting, meta-learned adaptation rules were more efficient than the baselines in the few-shot regime. Additionally, agents that meta-learned to adapt demonstrated the highest level of skill when competing in iterated games against each other.

Reviewer feedback summary

Overall Score: 24/30
Average Score: 8

The reviewers called the paper a great contribution to ICLR. According to them, the paper addresses a very important problem for general AI and is well written. They also appreciated the careful experiment design and the thorough comparisons, which make the results convincing. They found that editorial rigor and image quality could be better, but suggested no content-related improvements. The paper was appreciated for being dense and rich on rapid meta-learning.
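For readers who prefer symbols, the key idea in the takeaways above can be sketched roughly as follows. The first two expressions are the standard MAML formulation. The third is only a simplified reading of "exploit the dependencies between consecutive tasks" and is not the exact objective from the paper. The notation (θ for the shared parameters, φ for the adapted parameters, α for the step size, L_T for the loss on task T) is assumed here for illustration.

```latex
% Inner (adaptation) step on a task T: one gradient step from the shared parameters \theta
\phi_T = \theta - \alpha \nabla_\theta L_T(\theta)

% Standard MAML meta-objective: choose \theta so that the adapted parameters do well on each task
\min_\theta \; \mathbb{E}_{T \sim p(T)} \big[ L_T(\phi_T) \big]

% Simplified nonstationary variant (assumed notation): adapt on task T_i, evaluate on the next task T_{i+1}
\phi_i = \theta - \alpha \nabla_\theta L_{T_i}(\theta), \qquad
\min_\theta \; \mathbb{E}_{(T_i, T_{i+1})} \big[ L_{T_{i+1}}(\phi_i) \big]
```

In the third expression the update is scored on the task that comes next, so the agent is pushed towards updates that anticipate the change and not ones that only fit the task it has just seen.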

Sugandha Lahoti
14 Feb 2018
4 min read

Pieter Abbeel et al on how to improve the exploratory behaviour of Deep Reinforcement Learning algorithms

The paper, Parameter Space Noise for Exploration, proposes parameter space noise as an efficient solution for exploration, a big problem in deep reinforcement learning. The paper is authored by Pieter Abbeel, Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, and Marcin Andrychowicz.

Pieter Abbeel has been a professor at UC Berkeley since 2008. He was also a Research Scientist at OpenAI (2016-2017). Pieter is one of the pioneers of deep reinforcement learning for robotics, including learning locomotion and visuomotor skills. His current research focuses on robotics and machine learning, with a particular focus on deep reinforcement learning, meta-learning, and AI safety.

Deep reinforcement learning combines deep learning with reinforcement learning to create artificial agents that achieve human-level performance across many challenging domains. This article discusses one of Pieter’s top accepted research papers in the field of deep reinforcement learning at the 6th annual ICLR conference, scheduled for April 30 - May 03, 2018.

Improving the exploratory behavior of Deep RL algorithms with Parameter Space Noise

What problem is the paper attempting to solve?

This paper is about the exploration challenge in deep reinforcement learning (RL) algorithms. The main purpose of exploration is to ensure that the agent’s behavior does not converge prematurely to a local optimum. Enabling efficient and effective exploration is difficult because it is not directed by the reward function of the underlying Markov decision process (MDP). A large number of methods have been proposed to tackle this challenge in high-dimensional and/or continuous-action MDPs. These methods increase the exploratory nature of the algorithms through the addition of temporally-correlated noise or through the addition of parameter noise. Their main limitation is that they are either proposed and evaluated only for the on-policy setting with relatively small and shallow function approximators, or they disregard all temporal structure and gradient information.

Paper summary

This paper proposes adding noise to the parameters of a deep network (parameter space noise) when taking actions in deep reinforcement learning, in order to encourage exploration. The effectiveness of this approach is demonstrated through empirical analysis across a variety of reinforcement learning tasks and algorithms (DQN, DDPG, and TRPO). It answers the following questions:

  • Do existing state-of-the-art RL algorithms benefit from incorporating parameter space noise?
  • Does parameter space noise aid in exploring sparse reward environments more effectively?
  • How does parameter space noise exploration compare against evolution strategies for deep policies with respect to sample efficiency?

Key Takeaways

  • The paper presents parameter space noise as a conceptually simple yet effective replacement for traditional action space noise such as ε-greedy and additive Gaussian noise.
  • The work shows that parameter perturbations can be combined successfully with contemporary on- and off-policy deep RL algorithms such as DQN, DDPG, and TRPO, and that this often results in improved performance compared to action noise.
  • The paper uses experiments to argue that parameter noise allows solving environments with very sparse rewards, in which action noise is unlikely to succeed.
  • Parameter space noise is a viable and interesting alternative to action space noise, which is still the de facto standard in most reinforcement learning applications.

Reviewer feedback summary

Overall Score: 20/30
Average Score: 6.66

The reviewers were pleased with the paper. They described it as a simple strategy for exploration that is effective empirically. The paper was found to be clear and well written, with thorough experiments across deep RL domains. The authors have also released open-source code along with their paper for reproducibility, which the reviewers appreciated.

However, a common trend among the reviews was that the authors overstated their claims and contributions. The reviewers called out some statements in particular (e.g. the discussion of ES and RL). They also felt that the paper lacked a strong justification for the method other than it being empirically effective and intuitive.
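To make the contrast between the two kinds of noise concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the paper works with deep networks, whereas this uses a two-weight linear policy, and the function names, weights, and noise scale are all invented for illustration. It shows only the structural difference: action space noise adds fresh noise to every action, while parameter space noise perturbs the policy's weights once and then acts deterministically.

```python
import random

def act(weights, state):
    """Deterministic linear policy: action = weights . state."""
    return sum(w * s for w, s in zip(weights, state))

def action_noise_episode(weights, states, sigma, rng):
    """Action space noise: fresh Gaussian noise is added to every action."""
    return [act(weights, s) + rng.gauss(0.0, sigma) for s in states]

def parameter_noise_episode(weights, states, sigma, rng):
    """Parameter space noise: perturb the weights once, then act deterministically."""
    noisy = [w + rng.gauss(0.0, sigma) for w in weights]
    return [act(noisy, s) for s in states]

rng = random.Random(0)                          # seeded so runs are repeatable
weights = [0.5, -0.2]                           # hypothetical policy parameters
states = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]   # the same state visited three times

a_actions = action_noise_episode(weights, states, 0.1, rng)
p_actions = parameter_noise_episode(weights, states, 0.1, rng)
print(a_actions)  # three different actions for the same state
print(p_actions)  # the same action repeated three times
```

With parameter noise the perturbed policy responds consistently to a repeated state within an episode, which is one intuition for why it can explore more coherently than independent per-step action noise.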

Aaron Lazar
13 Feb 2018
8 min read

How to create a strong data science project portfolio that lands you a job

Okay, you’re probably here because you’ve got just a few months to graduate and the projects section of your resume is blank. Or you’re just an inquisitive little nerd scraping the WWW for ways to crack that dream job. Either way, you’re not alone: there are ten thousand others trying to build a great data science portfolio to land a good job. Look no further, we’ll do our best to help you make a portfolio that catches the recruiter’s eye! David “Trent” Salazar’s portfolio is a great example of a well-rounded one, and Sajal Sharma’s is a good example of how to display a data science portfolio on a platform like GitHub.

Companies are on the lookout for employees who can add value to the business. To showcase this on your resume effectively, the first step is to understand the different ways in which you can add value.

4 things you need to show in a data science portfolio

Data science can be broken down into 4 broad areas:

  • Obtaining insights from data and presenting them to the business leaders
  • Designing an application that directly benefits the customer
  • Designing an application or system that directly benefits other teams in the organisation
  • Sharing expertise on data science with other teams

You’ll need to ensure that your portfolio portrays all, or at least most, of the above in order to make it through a job selection. So let’s see what we can do to make a great portfolio.

Demonstrate that you know what you're doing

The idea is to show the recruiter that you’re capable of performing the critical aspects of data science, i.e. importing a data set, cleaning the data, extracting useful information from it using various techniques, and finally visualising the findings and communicating them. Apart from the technical skills, a few soft skills are expected as well: for instance, the ability to communicate and collaborate with others, and the ability to reason and take the initiative when required. If your project is able to communicate these things, you’re in!

Stay focused and be specific

You might know a lot, but rather than throwing all your skills, projects and knowledge in the employer’s face, it’s always better to focus on doing one thing and doing it right. Just as you keep things short and sweet in your resume, do the same while building your portfolio. Always remember, the interviewer is looking for specific skills.

Research the data science job market

Find 5-6 jobs that interest you, probably on LinkedIn or Indeed, and go through their descriptions thoroughly. Understand what kind of skills the employer is looking for. For example, it could be classification, machine learning, statistical modeling or regression. Pick up the tools that are required for the job, for example Python, R, TensorFlow, Hadoop, or whatever might get the job done. If you don’t know how to use a tool, you’ll want to skill up as you work your way through the projects. Also identify the kind of data they would like you to work on, such as text or numerical data. Once you have this information at hand, start building your project around these skills and tools.

Be a problem solver

Projects that don’t solve an actual problem won’t stand out in your portfolio. The closer your projects are to the real world, the easier it will be for the recruiter to decide to choose you. This will also showcase your analytical skills and how you’ve applied data science to solve a prevailing problem.

Put at least 3 diverse projects in your data science portfolio

A nice way to create a portfolio is to list 3 good projects that are diverse in nature. Here are some interesting projects to get you started on your portfolio:

Data Cleaning and wrangling

Data cleaning is one of the most critical tasks that a data scientist performs. By taking a group of diverse data sets, consolidating them and making sense of them, you give the recruiter confidence that you know how to prep data for analysis. For example, you can take Twitter or WhatsApp data and clean it for analysis. The process is pretty simple: you first find a “dirty” data set, then spot an interesting angle to approach the data from, clean it up and perform analysis on it, and finally present your findings.

Data storytelling

Storytelling showcases not only your ability to draw insight from raw data, but also how well you’re able to convey the insights to others and persuade them. For example, you can use data from the bus system in your country and gather insights to identify which stops incur the most delays. This could then be fixed by changing the route. Make sure your analysis is descriptive and that your code and logic can be followed. Here’s what you do: first you find a good dataset, then you explore the data and spot correlations in it. Then you visualize it before you start writing up your narrative. Tackle the data from various angles and pick the most interesting one. If it’s interesting to you, it will most probably be interesting to anyone else who’s reviewing it. Break down and explain each step and each code snippet in detail, as if you were describing it to a friend. The idea is to teach the reviewer something new as you run through the analysis.

End to end data science

If you’re more into machine learning or algorithm writing, you should do an end-to-end data science project. The project should be capable of taking in data, processing it and finally learning from it, every step of the way. For example, you can pick up fuel pricing data for your city, or maybe stock market data. The data needs to be dynamic and updated regularly. The trick for this one is to keep the code simple so that it’s easy to set up and run. You first need to identify a good topic. Understand that here you will not be working with a single dataset; you will need to import and parse all the data and bring it together into a single dataset yourself. Next, get the training and test data ready to make predictions. Document your code and other findings and you’re good to go.

Prove you have the data science skill set

If you want to get that job, you’ve got to have the appropriate tools to get the job done. Here’s a list of some of the most popular tools with a link to the right material for you to skill up:

Data science languages

There are a number of key languages in data science that are essential. It might seem obvious, but making sure they're on your resume and demonstrated in your portfolio is incredibly important. Include things like:

  • Python
  • R
  • Java
  • Scala
  • SQL

Big Data tools

If you're applying for big data roles, demonstrating your experience with the key technologies is a must. It not only proves you have the skills, but also shows that you are aware of which tools can be used to build a big data solution or project. You'll need:

  • Hadoop
  • Spark
  • Hive

Machine learning frameworks

With machine learning so in demand, if you can prove you've used a number of machine learning frameworks, you've already done a lot to impress. Remember, many organizations won't actually know as much about machine learning as you think. In fact, they might even be hiring you with a view to building out this capability. Remember to include:

  • TensorFlow
  • Caffe2
  • Keras
  • PyTorch

Data visualisation tools

Data visualization is a crucial component of any data science project. If you can visualize and communicate data effectively, you're immediately demonstrating that you're able to collaborate with others and make your insights accessible and useful to the wider business. Include tools like these in your resume and portfolio:

  • D3.js
  • Excel charts
  • Tableau
  • ggplot2

So there you have it. You know what to do to build a decent data science portfolio. It’s really worth taking part in competitions and challenges. It will not only help you keep your skills up to date and well oiled, but also give you a broader picture of what people are actually working on and which tools they use to solve problems.
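As a rough illustration of the data cleaning project described above (find a "dirty" data set, clean it up, analyse it, present a finding), here is a minimal sketch that uses only Python's standard library. The sample rows, the column names and the `clean_rows` helper are all made up for this example, loosely following the bus delay idea from the storytelling section. A real project would more likely read a file and use a library such as pandas.

```python
import csv
import io

# Hypothetical "dirty" export: stray whitespace, inconsistent case,
# a missing value and a duplicate row
RAW = """stop,delay_minutes
 Central ,5
central,5
North,
South,12
"""

def clean_rows(text):
    """Normalise stop names, drop rows without a delay, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        stop = row["stop"].strip().title()
        delay = row["delay_minutes"].strip()
        if not delay:            # skip rows with a missing measurement
            continue
        record = (stop, int(delay))
        if record not in seen:   # keep only the first copy of a duplicate
            seen.add(record)
            cleaned.append(record)
    return cleaned

rows = clean_rows(RAW)
worst = max(rows, key=lambda r: r[1])  # the stop with the longest delay
print(rows)
print(worst)
```

In a portfolio write-up, each cleaning decision (why a row was dropped, how names were normalised) is worth a sentence of explanation, since that reasoning is what the reviewer is looking for.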

Amey Varangaonkar
12 Feb 2018
10 min read

15 Useful Python Libraries to make your Data Science tasks Easier

Python has become a big hit in the data science community over the last five years. So much so that it is slowly taking over from R, the ‘lingua franca of statistics’, as the preferred tool for many. The recently published Stack Overflow Developer Survey 2018 suggests Python is the next big programming language, and its adoption in the industry is only going to increase.

Python’s rise has been staggering, but not really surprising. Its general-purpose nature, coupled with its efficiency and ease of use, makes it easier for you to build your data science solutions without any hassle. You also have a rich suite of Python libraries at your disposal for all your data science-related tasks, from basic web scraping to something as complex as training deep learning models. In this article, we take a look at some of the most popular and widely used Python libraries and their application areas.

Web Scraping

Web scraping is a popular technique for extracting information from the web, using the HTTP protocol directly or through a web browser. The two most commonly used tools for web scraping are, unsurprisingly, Python-based.

1. Beautiful Soup

Beautiful Soup is a popular Python library for extracting information from HTML and XML files. It provides a unique, easy way to navigate, search and modify the parsed data, potentially saving you hours of needless work. It works with both versions of Python, i.e. 2.7 and 3.x, and is very easy to use. Check out our latest tutorial on how to scrape a web page using Beautiful Soup.

Editor's Tip: If you’re new to the concept of web scraping, Beautiful Soup should be your go-to library. You can learn more about how to use this library more efficiently in our book Python Web Scraping Cookbook.

2. Scrapy

Scrapy is a free, open source framework written in Python. Although developed for web scraping, it can also be used as a general web crawler and to extract data using different APIs. Following the ‘Don’t Repeat Yourself’ philosophy of frameworks such as Django, Scrapy includes a set of self-contained crawlers, each following specific instructions with a specific objective.

Editor's Tip: To learn how to use Scrapy for your scraping projects, our book Python Web Scraping, Second Edition is definitely worth checking out.

Scientific Computation and Data Analysis

These are arguably the most common data science tasks, and Python proves to be of great worth to data scientists by providing libraries for data manipulation and analysis, as well as for mathematical computation.

3. NumPy

NumPy is the most popular library for scientific computing in Python and is a part of the larger Python stack for scientific computation called SciPy (discussed below). Apart from its uses in linear algebra and other mathematical functions, it can also be used as a multi-dimensional container, or array, of generic data with arbitrary data types. NumPy integrates seamlessly with languages such as C/C++, and because of its support for multiple data types, it works well with a variety of databases as well.

4. SciPy

SciPy is a Python-based framework containing open source libraries for mathematics, scientific computation and data analysis. The SciPy library is a collection of algorithms and tools for advanced mathematical computations, statistics and much more. The SciPy stack consists of the following libraries:

  • NumPy - Python package for numerical computation
  • SciPy - One of the core packages of the SciPy stack, for signal processing, optimization and advanced statistics
  • matplotlib - Popular Python library for data visualization
  • SymPy - Library for symbolic mathematics and algebra
  • pandas - Python library for data manipulation and analysis
  • IPython - Interactive console to run Python-based code

5. pandas

pandas is a widely used Python package providing data structures and tools for effective data manipulation and analysis. It is widely used for quantitative analysis and finds a lot of application in algorithmic trading and risk analysis. With a large community of dedicated users, pandas is regularly updated with API changes, performance improvements and bug fixes. This is one library you definitely need to work with to truly realize its power.

Editor's Tip: To get a more hands-on understanding of how to effectively use pandas for data analysis, make sure you check out our highly popular title pandas Cookbook.

Machine Learning and Deep Learning

Python trumps all other languages when it comes to implementing efficient machine learning and deep learning models, simply by virtue of its diverse, effective and easy to use set of libraries. It is worth having a look at the experts’ take on why Python is great for machine learning and Artificial Intelligence. In this section, we look at some of the most popular and commonly used Python libraries for machine learning and deep learning:

6. scikit-learn

scikit-learn is the most popular Python library for data mining, analysis and machine learning. It is built on the capabilities of NumPy, SciPy and matplotlib, and is commercially usable. You can implement a variety of machine learning techniques such as classification, regression, clustering and more using scikit-learn. It is very easy to install and has clean, slick documentation for anyone looking to get started with it.

Editor's Tip: To understand how to use scikit-learn in your machine learning projects, our bestselling book Python Machine Learning, Second Edition is all you need. If you’re looking to specifically master scikit-learn, Mastering Machine Learning with scikit-learn will prove to be a very useful resource. Check it out!

7. TensorFlow

TensorFlow is the popular machine learning library everyone seems to be talking about today. It is a Python-based framework for effective machine learning and deep learning using multiple CPUs or GPUs. Backed by Google, it was initially developed by the Google Brain research team, and is one of the most widely used frameworks in the world for machine intelligence. It enjoys the support of a large community of active users and is finding widespread application for advanced machine learning across a multitude of industrial domains, from manufacturing and retail to healthcare and smart cars. If you are interested in learning more about TensorFlow, you can quickly check out the tutorial here.

Editor's Tip: With TensorFlow being the most popular framework for machine learning and deep learning, it is one library you should definitely master. Check out the following books to skill up quickly!

  • Machine Learning with TensorFlow 1.x
  • TensorFlow Machine Learning Cookbook
  • Deep Learning with TensorFlow
  • TensorFlow 1.x Deep Learning Cookbook
  • Mastering TensorFlow 1.x

8. Keras

Keras is a Python-based neural networks API that offers a simplified interface to train and deploy your deep learning models with ease. It can run on top of deep learning backends such as TensorFlow, Theano and CNTK. Keras is very user-friendly, follows a modular approach and supports both CPU and GPU-based computations. If you want to make the deep learning process simpler and more effective, this library is definitely worth checking out!

Editor's Tip: If you’re looking for a resource that teaches you how to use Keras effectively, our trending book Deep Learning with Keras will be of great help to you!

9. PyTorch

One of the more recent additions to the Python deep learning family is PyTorch, a neural network modeling library with strong GPU support. Although still in a beta stage, this project is backed by bigwigs such as Facebook and Twitter. PyTorch builds on the architecture of Torch, another popular deep learning library, to enable more efficient tensor computation and implementation of dynamic neural networks.

Editor's Tip: Here is Deep Learning with PyTorch to get you started with this amazing tool.

Natural Language Processing

Natural Language Processing pertains to the design of systems that process, interpret and analyze human language, spoken or written. Python offers libraries for performing a variety of tasks such as working with structured and unstructured text, predictive analytics and much more.

10. NLTK

NLTK is a popular Python library for language processing. It offers easy to use interfaces for a variety of NLP tasks such as text classification, tokenization, text parsing, semantic reasoning and much more. It is an open source, community-driven project, and supports both Python 2 and Python 3.

11. spaCy

spaCy is another library for advanced natural language processing, based on Python and Cython. It has extensive support for various deep learning libraries and frameworks such as TensorFlow and PyTorch. With spaCy, you can build complex statistical models for NLP with relative ease. spaCy is easy to install and use, and proves to be of great help when it comes to extracting and analyzing textual information at scale.

Editor's Tip: To know more about how these libraries are used for natural language processing, make sure you check out the book Natural Language Processing with Python Cookbook.

Data Visualization

Data visualization is a popular data science technique for visually analysing and communicating information and valuable business insights through graphs, charts, dashboards and reports. Python offers a lot of popular libraries for effective data storytelling. Some of them are listed below:

12. matplotlib

matplotlib is the most popular Python library for data visualization, and allows for enterprise-grade 2D and 3D plotting. With matplotlib, you can build different kinds of visualizations such as histograms, bar charts, scatter plots and much more, with just a few lines of code. The popularity of matplotlib rivals that of R’s highly acclaimed ggplot2, and deciding which library is better has been a hot topic of debate for many years now. matplotlib runs seamlessly on all Python consoles, including IPython and Jupyter notebooks, giving you all the necessary tools to create and share your data visualizations with others.

Editor's Tip: Get started with matplotlib today, with the help of Matplotlib 2.x By Example.

13. Seaborn

Seaborn is a Python-based data visualization library which finds its roots in matplotlib. Apart from offering attractive and insightful data visualizations, seaborn also offers strong support for other Python libraries such as NumPy and pandas. Per the official seaborn page: “If matplotlib “tries to make easy things easy and hard things possible”, seaborn tries to make a well-defined set of hard things easy too.”

14. Bokeh

Bokeh is an interactive data visualization library based on Python. It aims to provide elegant, D3.js-style graphics and visualizations and runs primarily in modern web browsers. Apart from the ability to create a wide variety of visualizations, Bokeh also supports large-scale interactivity and visualizations of real-time datasets.

15. Plotly

Plotly is a popular Python library used across the world for making publication-quality plots and graphs. With Plotly, you can build interactive dashboards, scatter plots, histograms, candlestick charts, heat maps, and a whole host of other data visualizations with ease. With superior interactivity, deployment and publication capabilities, Plotly is used across different domains, notably the finance and geospatial industries, for effective data storytelling.

So there you have it! Python has an extensive suite of libraries for every data science related task, each equipped with unique features to make the task fast and hassle-free. While there are a lot more Python libraries out there, we cherry-picked these 15 based on their popularity, usefulness and the value they bring to the table. Also, the extensive community support for Python means you can get help for any kind of problem you might come across while using these tools. It's time now for you to go out there and crunch some data with some of these Python powered libraries!
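As a small taste of one of the libraries above, the sketch below illustrates the NumPy description from the list: a multi-dimensional array with an explicit data type, used for a little linear algebra. The numbers are arbitrary and chosen only to keep the arithmetic easy to follow. It assumes NumPy is installed (`pip install numpy`).

```python
import numpy as np

# A 2-D array (matrix) with an explicit data type, and a 1-D array (vector)
matrix = np.array([[2.0, 0.0], [1.0, 3.0]], dtype=np.float64)
vector = np.array([1.0, 2.0])

product = matrix @ vector                     # matrix-vector product
solution = np.linalg.solve(matrix, product)   # solve matrix @ x = product for x

print(matrix.shape, matrix.dtype)
print(product)
print(solution)
```

Solving the system recovers the original vector (up to floating-point rounding), which is a quick way to check that the two operations are inverses of each other here.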

Raka Mahesa
31 Jan 2018
5 min read

How IoT is going to change tech teams

The Internet of Things is going to transform the way we live in the future. It will change how we commute, how we work, even simple day-to-day activities. But one thing that’s often overlooked when we talk about the Internet of Things is how it will impact IT teams. We’ve seen a lot of change in the shape of the modern IT team over the last 10 years thanks to things like DevOps, but IoT is going to shape things further in the near future.

To better understand how the Internet of Things will shape IT teams in the future, we first need to understand the application of the Internet of Things, especially in the sector closest to IT teams, the enterprise sector.

IoT in the enterprise sector

If you look at consumer media, the most common applications of the Internet of Things are the small-scale ones like smart gadgets and smart home systems. Unfortunately, this class of IoT products hasn't really caught on with mainstream consumers; its audience is limited to hobbyists and people in tech. However, it's a whole different story in the enterprise sector, because companies all over the world are starting to realize the benefit of applying IoT in their line of business.

Different industries have different applications of IoT. Usually though, IoT is used to either increase efficiency or reduce cost. For example, a shipping service may apply a monitoring system on its vehicles to track their speed and mileage and find ways to reduce fuel usage. Similarly, an airline could fit sensors to its fleet of airplanes to monitor engine conditions and maintain them properly. A company may also apply IoT to manage its energy consumption so that it can reduce unneeded expenses.

What new skills does IoT demand of tech pros?

All of these applications of IoT are going to require new skills and maybe even new job roles. So while we’ll see efficiencies thanks to these innovations, to really make an impact it's still going to need both personal and organizational investment in skills and knowledge to ensure IoT is really helping to drive positive change.

IoT and the second data explosion

Let’s start with the most obvious change: the growth of data. Yes, the big data explosion has been happening all around us for the last decade, but IoT is bringing with it a second explosion that will be even bigger. This means everyone is going to have to become more data-savvy. That’s not to say that everyone will need to moonlight as a data scientist, but they will need an awareness of how data is stored and processed, who needs access to it and who needs to act on it.

Device management will become more important than ever

IoT isn’t just about data. It’s also about devices. With more gadgets and sensors connected to a given network, device management and maintenance will be an essential part of the IT team’s work. To tackle this problem, the team will need to grow to handle more work, or it will need to use a more powerful device management tool that can handle a large number of connected devices.

New security risks presented by IoT

An increase in the number of connected devices also presents increased security risks. This means pressure will be on IT departments to tighten up security. Managing networks is one part of that, but a further challenge will be managing the human side of security: ensuring good practice is followed by staff and taking steps to minimize social engineering threats.

IT teams will have to customize IoT solutions to meet their needs

IoT doesn’t yet have many standards. That means today’s organizations face opportunities and challenges in how they customize solutions and tools for their own needs. This can be daunting, but for people working in IT teams it’s also really exciting, because it gives them more control and ownership of the work they are doing. Third party solutions will no doubt remain, but they won’t be quite so important when it comes to IoT. True, companies like IBM will be working on IoT solutions right now to capture the market; however, because these innovations are in their infancy there’s a limit on traditional technology corporations’ ability to shape and define the IoT landscape in the way they have done with innovations in the past.

And that's just a small part of how the Internet of Things will affect the IT team. When IoT takes off, it will change our lives in ways we can hardly imagine, so of course there will be even more changes for the IT teams in charge of it. But then again, the world of technology is rife with changes and disruptions, so I'm sure we're all used to change and will be able to adapt.

Raka Mahesa is a game developer at Chocoarts who is interested in digital technology in general. Outside of work, he enjoys working on his own projects, with Corridoom VR being his latest released game. Raka also regularly tweets @legacy99.

Erik Kappelman
30 Jan 2018
6 min read

What is Seaborn and why should you use it for data visualization?

Seaborn is a Python library created for enhanced data visualization. It's a very timely and relevant tool for data professionals working today precisely because effective data visualization, and communication in general, is a particularly essential skill. Being able to bridge the gap between data and insight is hugely valuable, and Seaborn is a tool that fits comfortably in the toolchain of anyone interested in doing just that. There is, of course, a huge range of data visualization libraries out there, but if you're wondering why you should use Seaborn, put simply it brings some serious power to the table that other tools can’t quite match. Follow this Seaborn tutorial and you’ll find out what makes Seaborn such a good data visualization library.

How to get started with Seaborn

To get started, I recommend becoming familiar with Anaconda, if you are not already. I find that using Anaconda and its various tools makes coding in Python, especially package and library management, a whole lot easier. So, let's load the packages we are going to need. (I am assuming you have already downloaded and set up Seaborn.)

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd

Now that we have our packages on board, let's just make a basic plot. The function below creates a series of sine functions and then graphs all of these functions; take a look:

    # Fix the random seed so the session is reproducible
    np.random.seed(sum(map(ord, "aesthetics")))

    def sinplot(flip=1):
        # Draw six sine curves with shifted phase and shrinking amplitude
        x = np.linspace(0, 14, 100)
        for i in range(1, 7):
            plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)

    sinplot()
    plt.savefig("sin.png")

It’s a pretty basic set of sine curves, and while it looks pretty professional and clean, it doesn’t really tell us much more about what makes Seaborn unique. So what makes Seaborn different?

What are the benefits of Seaborn?

Well, let's take a look at what Seaborn refers to as ‘joint plots.’ These plots pair a scatter plot with the distribution of each variable in the scatter plot on the axes.
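The joint plot code in this article reads from a DataFrame named df with columns x and y, but the article never shows how df was built. A minimal sketch of one way to create a stand-in follows, assuming synthetic data: the gamma and normal distributions, the sample size, the slope and the seed are all illustrative choices, not the author's actual dataset.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic stand-in data, not the dataset used in the article.
rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 200

# x is right-skewed; y depends linearly on x plus symmetric noise,
# so the two columns are positively correlated.
x = rng.gamma(shape=2.0, scale=1.0, size=n)
y = 0.5 * x + rng.normal(loc=0.0, scale=1.0, size=n)

df = pd.DataFrame({"x": x, "y": y})
print(df.describe())
```

Any DataFrame with two numeric columns named x and y would work equally well with the jointplot calls.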
Let's look at the code for the next two graphs and then we’ll discuss why they matter:

    # df is a DataFrame with numeric columns "x" and "y" (it is not defined in this article)
    join1 = sns.jointplot(x="x", y="y", data=df)
    join1.savefig("join1.png")
    join2 = sns.jointplot(x="x", y="y", data=df, kind="kde")
    join2.savefig("join2.png")
    plt.clf()

The first plot isn’t unique to Seaborn. I've created very similar plots in R; here, however, the plot took one single line of code. In R, at the very least you're looking at five or six lines, and you’re going to have to use the default plotting package because I’ve never been able to figure out marginal plots in ggplot2. Graphs like this really show us a lot about the data we are examining. We can simultaneously see that the two sets of data are correlated and that they are both somewhat skewed and non-normal, although the y variable could probably pass as normal. If marginal plots were this easy in R, I would leverage them a whole lot more because they are informative.

The next plot, however, is different. In fact, I hadn’t really seen something like it before I learned about Seaborn. This plot uses a kernel density plot instead of a scatter plot, and the distributions are estimated smoothly instead of using histograms. This could be a helpful graph if you were specifically interested in densities and correlations as well as the distributions of the data. This could be quite beneficial in various spatial analysis applications, as well as traditional statistical fields.

The third joint plot includes a regression line in the scatter plot as well as an assessment of the fit of the linear model used. The code used to produce this plot is below:

    tips = sns.load_dataset('tips')
    sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg")
    plt.savefig('join3.png')

The inclusion of error bands around the line helps you to better visualize the accuracy of the linear regression. Additionally, the distribution of the data is available in the margins. Normally, it would take three separate graphs to convey all of this information. Seaborn makes this much simpler. With a single line of code, we are able to create a graph that covers all of the relevant information related to this linear regression.

Another somewhat novel graph type that’s available in Seaborn is the violin plot. Again, we can create this complex graph with the simple code shown below:

    iris = sns.load_dataset("iris")
    plt.figure()  # start a fresh figure so the violin plot is not drawn onto the previous joint plot
    sns.violinplot(x="species", y="sepal_length", data=iris)
    plt.savefig("violin.png")

This is data from the famous Iris data set. The violin plot is essentially an amalgamation of a box plot and a kernel density estimate of a distribution. Both box plots and graphs of univariate distributions are very helpful when first beginning analysis of some dataset. Again, Seaborn takes a lot of the work out of this process by making it easy to produce single graphs that would normally take multiple graphs using other analysis tools.

The final chart I would like to show is really useful. It summarizes the results of univariate logistic regression graphically. This is a tough thing to display and until I came across Seaborn I had really never seen an example I would consider good. The chart is created with the code below:

    # True when the tip is at least 20 percent of the bill
    tips['big_tip'] = (tips['tip'] / tips['total_bill']) >= 0.2
    # logistic=True requires the statsmodels package
    sns.lmplot(x="total_bill", y="big_tip", data=tips, logistic=True, y_jitter=.03)
    plt.savefig("tiplogit.png")

The chart displays the results of regressing a binary indicator of whether a tip was at least 20 percent of the bill, or ‘big’, against the total cost of the meal. The chart illustrates very clearly that people are not tipping as much when their meals are more expensive, at least in terms of proportions. Summarizing the results of logistic regressions is always challenging, but as you can see, thanks to Seaborn, you can do a pretty good job with just one line of code.

Seaborn is simply a really great library that's worth your time exploring. I hope this post has convinced you and inspired you to go and try it for yourself if you haven't already. There is always room for improvement when it comes to data visualization. Seaborn might be the improvement you need. I know I'll be using it.
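As a footnote to the logistic regression chart, the big_tip indicator is just a ratio compared against a threshold. The sketch below shows that derivation in plain Python so that it runs without downloading the tips dataset; the rows and the is_big_tip helper are hypothetical and exist only for illustration.

```python
# Hypothetical (total_bill, tip) pairs; these are not rows from the real tips dataset.
bills = [(10.00, 2.50), (20.00, 3.00), (50.00, 12.00), (80.00, 8.00)]


def is_big_tip(total_bill, tip, threshold=0.2):
    """Return True when the tip is at least `threshold` of the bill."""
    return tip / total_bill >= threshold


big_tip = [is_big_tip(total_bill, tip) for total_bill, tip in bills]
print(big_tip)  # [True, False, True, False]
```

In these made-up rows the largest bill does not earn a big tip, which is the kind of pattern the logistic curve summarizes across the whole dataset.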
Jay LaCroix
30 Jan 2018
6 min read

Is Linux hard to learn?

This post is an extract from Linux Mint Essentials by Jay LaCroix. Quite often, I am asked whether or not Linux is hard to learn. The reputation Linux has of being hard to use and learn most likely stems from the early days when typical distributions actually were quite difficult to use. I remember a time when simply installing a video card driver required manually recompiling the kernel (which took many hours) and enabling support for media such as MP3s required multiple manual commands. Nowadays, however, how difficult Linux is to learn and use is determined by which distribution you pick. If, for example, you're a beginner and you choose a distribution tailored for advanced users, you are likely to find yourself frustrated very quickly. In fact, there are distros available that make you do everything manually, such as choosing which version of the kernel to run and installing and configuring the desktop environment. This level of customizability is wonderful for advanced users who wish to build their own Linux system from the ground up, though it is more likely that beginners would be put off by it. General purpose distributions such as Mint are actually very easy to learn, and in some cases, some tasks in Mint are even easier to perform than in other operating systems. The ease of use we enjoy with a number of Linux distributions is due in part to the advancements that Ubuntu has made in usability. Around the time when Windows Vista was released, a renaissance of sorts occurred in the Linux community. At that time, quite a few people were so outraged by Windows Vista that a lot more effort was put into making Ubuntu easier to use. It can be argued that the time period of Vista was the fastest growth in usability that Linux ever saw. Tasks that were once rites of passage (such as installing drivers and media codecs) became trivial. The exciting changes in Ubuntu during that time inspired other distributions to make similar changes. 
Nowadays, usage of Ubuntu is beginning to decline due to the fact that not everyone is pleased about its new user interface (Unity); however, there is no denying the positive impact it had on Linux usability. Being based on Ubuntu, Mint inherits many of those benefits, but also aims to improve on its perceived weaknesses. Due to its great reception, it eventually went on to surpass Ubuntu itself. Mint currently sits at the very top of the charts on Distrowatch.com, and with good reason: it's an amazing distribution. Distributions such as Mint are incredibly user friendly. Even the installation procedure is a cinch, and most users can get through it by simply accepting the defaults. Installing new software is also straightforward, as everything is included in software repositories and managed through a graphical application. In fact, I recently acquired an HP printer that comes with a CD full of required software for Windows, but when connected to my Mint computer, it just worked. No installation of any software was required. Linux has never been easier!

Why use Linux Mint

When it comes to Linux, there are many distributions available, each vying for your attention. But which Linux distribution should you use? In this post, taken from Linux Mint Essentials, we’ll explore why you should choose Linux Mint rather than larger distributions such as Fedora and Ubuntu. In the first instance, the user-friendly nature of Linux Mint is certainly a good reason to use it. However, there’s much more to it than just that. Of course, it’s true that Ubuntu is the big player when it comes to Linux distributions, but because Linux Mint is built on Ubuntu it has the power of its foundations. That means by choosing Mint, you’re not compromising on what has become a standard in Linux.
So, Linux Mint takes the already solid foundation of Ubuntu, and improves on it by using a different user interface, adding custom tools, and including a number of further tweaks to make its media formats recognized right from the start. It’s not uncommon for a Linux distribution to be based on other distributions. This is because it's much easier to build a distribution on an already existing foundation, since building your own base is quite time consuming (and expensive). By utilizing the existing foundation of Ubuntu, Mint benefits from the massive software repository that Ubuntu has at its disposal, without having to reinvent the wheel and recreate everything from the ground up. The development time saved by doing this allows the Linux Mint developers to focus on adding exciting features and tweaks to improve its ease of use. Given the fact that Ubuntu is open source, it's perfectly fine to use it as a base for a completely separate distribution. Unlike the proprietary software market, the developers of Mint aren't at risk of being sued for recycling the package base of another distribution. In fact, Ubuntu itself is built on the foundation of another distribution (Debian), and Mint is not the only distribution to use Ubuntu as a base. As mentioned before, Mint utilizes a different user interface than Ubuntu. Ubuntu ships with the Unity interface, which (so far) has not been highly regarded by the majority of the Linux community. Unity split Ubuntu's user community in half as some people loved the new interface, though others were not so enthused and made their distaste well-known. Rather than adopt Unity during this transition, Mint opted for two primary environments instead, Cinnamon and MATE. Cinnamon is recommended for more modern computers, and MATE is useful for older computers that are lower in processing power and memory. MATE is also useful for those who prefer the older style of Linux environments, as it is a fork of GNOME 2.x. 
Many people consider Cinnamon to be the default desktop environment in Linux Mint, but that is open to debate. The Mint developers have yet to declare either of them as the default. Mint actually ships five different versions (also known as spins) of its distribution. Four of them (Cinnamon, MATE, KDE, and Xfce) feature different user interfaces as the main difference, while the fifth is a completely different distribution that is based on Debian instead of Ubuntu. Due to its popularity, Cinnamon is the closest thing to a default in Mint and as such, it is a recommended starting point.

Savia Lobo
28 Jan 2018
8 min read

Customer Relationship management just got better with Artificial Intelligence

According to an International Data Corporation (IDC) report, artificial intelligence (AI) has the potential to impact many areas of customer relationship management (CRM). AI will take over mundane tasks for CRM teams, which means they will be able to address more customer queries through an automated approach. An AI-based CRM offers predictive and intuitive ways to solve customer problems, helping to hold customer attention. With AI, CRM platforms within different departments such as sales, finance, and marketing are not limited to getting service feedback from their customers. They can also gain information from the data that customers generate online, for example on social media or through IoT devices. With such a massive amount of data arriving from various channels, it becomes tricky for organizations to keep track of their customers, and extracting detailed insights from that data is harder still. This is the gap where organizations feel the need to bring in an AI-based, optimized approach for their CRM platform.

The AI-enabled platform can help CRM teams gain insights from the large aggregation of customer data, while also paving the way for seamless customer interactions. Organizations can not only provide customers with helpful suggestions, but also recommend products to boost their business profitability. AI-infused CRM platforms can take over straightforward tasks, such as collecting client feedback, that are otherwise time consuming. This allows businesses to focus on customers that provide higher business value, who might previously have been neglected. It also acts as a guide for executive-level employees via a virtual assistant, allowing them to tackle customer queries without any assistance from senior executives.

AI techniques such as natural language processing (NLP) and predictive analytics are used within the CRM domain to gain intelligent insights that enhance human decision making. NLP interprets incoming emails, categorizes them on the basis of intent, and automatically drafts responses by identifying the priority level. Predictive analytics helps in detecting the optimal time for solving customer queries, and the mode of communication best suited to engaging the customer. With such functionality, organizations that leverage AI can make a smarter move towards digitizing their solutions and reap significant profits.

How AI is transforming CRM

Businesses aim to satisfy customers who use their services, because keeping a customer happy can lead to further increases in revenue. Organizations can achieve this rapidly with the help of AI. Salesforce, the market leader in the CRM space, integrated an AI assistant known as Einstein. Einstein makes CRM an easy-to-use platform by allowing customers to import their data into Salesforce, and it automatically provides ready-to-use, data-driven insights across different channels. Other organizations such as SAP and Oracle are implementing AI-based technologies in their CRM platforms to provide an improved customer experience. Let’s explore how AI benefits different parts of an organization:

Steering Sales

With AI, the sales team can shift its focus away from mundane administrative tasks and get to know its customers better. Sales CRM teams leverage novel scoring techniques, which help in prioritizing quality leads, thus generating maximum revenue for the organization. Sales leaders, with the help of AI, can work towards improving sales productivity. After analyzing the company’s historical data and employee activities, the AI-infused CRM software can present a performance report of the top sales representatives. Such a feature helps sales leaders to work out what the bottom-ranked representatives should learn from the top representatives in order to drive conversations with customers who show a likelihood of buying. People.ai, a sales management platform, uses AI to deliver performance analytics, personalized coaching, and reviews of the sales pipeline. This can help sales leaders get a complete view of the sales activities going on within their organizations.

Marketing it better

Triggering a customer sale requires extensive push marketing strategies. With AI-enabled marketing, customers are guided along a predictive journey, designed so that each journey ends in a sale or a subscription. Either way, it is a win for the organization. Predictive scoring can determine the likelihood that a customer will subscribe to a newsletter or make a purchase. AI can also analyze images across social media sources such as Pinterest and Facebook, and can provide suggestions for the visuals of an upcoming advertising campaign. Also, by carrying out sentiment analysis on product reviews and customer feedback, the marketing team can take into account users’ sentiment about a particular brand or product. This helps brands to announce discount offers when sales decrease, or to increase the production of a product in demand. Marketo, a marketing automation platform, includes software that helps different CRM platforms gain rich behavioral insights into their customers and drive business strategies.

24*7 customer support

Whenever a customer query arises within a CRM, AI anticipates the likely issues and resolves them before they turn into a problem. Different customer cases are classified with the help of predictive analytics techniques and directed to the right service agent. Also, NLP-based digital assistants known as chatbots are used to analyze the written content of emails. A chatbot efficiently responds to customer emails; only in rare cases does it direct the email to a service agent. Chatbots can even notify a customer about an early-bird offer on a product that they are likely to buy. They can also set up meetings and schedule reminders for them, which fits an era of push notifications and smart wearables. Hence, with AI in CRM, organizations can not only offer customers better services but also provide 24*7 support. Agent.ai, an AI-based customer service platform, allows organizations to provide 24*7*365 customer support, including holidays, weekends, and non-staffed hours.

Application development no more a developer’s play

Building an application has become an important milestone for any organization. If the application has a seamless and user-friendly interface, it is favoured by many customers, and thus the organization gets more customer traction. Building an application used to be considered ‘a developer’s job only’, as it involves coding. However, with the rise of platforms that help build an application with less coding, or in fact no coding, any non-coder can easily develop an application. CRM platforms help businesses to build applications that provide insight-driven predictions and recommendations to their customers. Salesforce assures its customers that each application built on its platform includes intelligent data modeling, tracking, and monitoring. Business users, data scientists, or any non-developer can now build applications without learning to code. This helps them to create prediction-based applications their way, without the IT hassle.

Challenges & Limitations

AI implementations are becoming common, with an increasing number of organizations adopting AI on both a small and a large scale. Many businesses are moving towards smart customer management by infusing AI within their organizations. AI undoubtedly makes work easier, but there are challenges that the CRM platform can face which, if unaddressed, may cause revenue to decline. Below are the challenges organizations might face while setting up AI in their CRM platform:

Outdated data: Organizations collect a huge amount of data during various business activities to derive meaningful insights about sales, customer preferences, and so on. This data is a treasure trove for the marketing team when planning strategies to attract new customers and retain existing ones. However, if the data is not kept up to date, CRM teams may find it difficult to understand the current customer relationship status. To avoid this, a comprehensive data cleanup project is essential to maintain data quality.

Partially automated: AI creates an optimized environment for the CRM through predictive analytics and natural language processing for better customer engagement. This takes the mundane elements off the CRM team’s plate, so they can focus on more strategic outcomes. This does not imply that AI is completely replacing humans. Instead, a human touch is required to monitor whether the solutions given by the AI benefit the customer, and to work out how to tweak it into a smarter AI.

Intricacies of language: An AI is trained on data that includes various sets of phrases and questions, along with the desired output for each. If a customer’s query is not phrased in a way the AI recognizes, it is unable to provide a correct solution. Hence, customers have to take care to phrase their queries correctly, or the machine will not understand what they are asking.

Infusing AI into CRM has multiple benefits, but the three most important ones are predictive scoring, forecasting, and recommendations. These benefits empower AI-enabled CRM to outsmart its traditional counterpart by helping organizations to serve their customers with state-of-the-art results. Customers appreciate it when their query is addressed in less time, which leaves a positive impression of the organization. Additionally, digital assistants help firms to solve customer queries quickly.

Raka Mahesa
26 Jan 2018
5 min read

Unity and Unreal comparison

If you want to find out how to get into game development, you’ve probably come across the two key game engines in the industry: Unreal Engine and Unity. But how do Unity and Unreal Engine compare? What are the differences between them? Is one better than the other?

Unity and Unreal price comparison

Unreal Engine has a simple pricing scheme: you get everything for free, but you have to pay 5 percent of your earnings. Unity also has a free tier that includes the core features of the engine, but if your company has an annual revenue of more than $100,000, you have to use the paid tier, which will cost you $35 per month. The paid tier also gives you additional features including a custom splash screen, an enhanced analytics feature, and expanded multiplayer hosting.

The question here is which pricing scheme fits your business model (and budget). If you have a small, nimble team, Unity might be the better option, but if you have a big team developing a complex game, Unreal Engine might be more cost effective. The good thing is, without spending a dime, you can get the full capability of both tools, so you can't really go wrong starting with either of them.

How do Unity's and Unreal's capabilities compare?

We'll start with a simple, but important, question: what platforms do Unreal Engine and Unity support? Unreal Engine supports developing games for mobile platforms like iOS and Android, for consoles like PS4, Xbox One, and Nintendo Switch, and for desktop operating systems like Windows, Mac, and Linux. It also has support for VR platforms such as Oculus, SteamVR, PSVR, Google Daydream, and Samsung Gear VR.

Unity, on the other hand, not only supports all of those platforms, it also supports smart TV platforms like Android TV and Samsung SmartTV, as well as augmented reality platforms like Apple ARKit and Google ARCore. And Unity doesn't simply support more platforms than Unreal; it is also usually the first game engine to provide compatibility when a new platform is launched. Unity is the clear winner when it comes to compatibility, and if you're looking to release your game on as many platforms as possible, then Unity is your best choice.

Comparing Unity and Unreal's feature sets

Even though both engines have similar capabilities, Unreal Engine provides more built-in tools that make game development easier. Unreal has an extensive built-in material editor as well as a built-in cinematic editor that allows developers to easily create cinematic sequences in their games. Meanwhile, Unity relies on third-party add-ons from its asset store to provide similar functionality. That said, the 2D development tooling provided by Unity is much more effective than Unreal’s.

Do keep in mind that features can't be judged by their number alone. One of the most important qualities in a tool is how easy it is to use. Ease of use is, of course, relatively subjective: what one person loves using might be a nightmare for another.

Is Unity or Unreal easier to use?

Based on the built-in tools provided by the engine, we can see that Unreal is the more powerful of the two options. But that also means Unity is simpler to use. The same contrast can be seen in programming. Unity uses C# as its main programming language, which is easier to use and learn. Unreal, on the other hand, uses C++, which is much more powerful, but is also harder to learn and more prone to mistakes. Fortunately, Unreal makes up for its complexity by providing an alternative, easy-to-use scripting language: Blueprint. Blueprint is a scripting language where developers can simply connect nodes together to program gameplay elements. Using this tool, non-programmers like artists and writers are able to script gameplay events without relying on programmers.
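To make the earlier price comparison concrete, here is a rough sketch of the arithmetic, assuming the simplified rules quoted in this article: a flat 5 percent royalty for Unreal, and for Unity a $35 monthly fee once annual revenue exceeds $100,000. The function names, the seats parameter (the article does not say whether the fee is per developer) and the example studios are all hypothetical; real licensing terms are more detailed than this.

```python
def unreal_cost(annual_revenue, royalty_rate=0.05):
    """Royalty owed under the flat 5 percent scheme quoted in the article."""
    return annual_revenue * royalty_rate


def unity_cost(annual_revenue, seats=1, monthly_fee=35.0, free_tier_limit=100_000):
    """Yearly subscription cost; zero while revenue stays within the free tier.

    `seats` is an assumption: the article quotes $35 per month without
    saying whether that is charged per developer.
    """
    if annual_revenue <= free_tier_limit:
        return 0.0
    return seats * monthly_fee * 12


# Hypothetical studios, chosen only to illustrate the comparison.
for label, revenue, seats in [
    ("solo developer", 50_000, 1),
    ("small team", 500_000, 5),
    ("large team", 200_000, 50),
]:
    print(f"{label}: Unreal ${unreal_cost(revenue):,.0f}, "
          f"Unity ${unity_cost(revenue, seats):,.0f}")
```

Under these assumptions the small team with higher revenue pays less with Unity, while the large team with modest revenue pays less with Unreal, which matches the article's rule of thumb.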
Comparing the Unity and Unreal communities

The last point we're going to address is not directly related to the engine itself, but it is nevertheless pretty important: the community. A big community makes it much easier to get help when you run into trouble; it also means more tool and resource development. Unity is the winner on this front, as can be seen from the huge number of tutorials and third-party libraries created for it.

It's important to remember one thing: both development tools are fully capable of producing great games with amazing graphics and good performance that can sell millions. One tool may need more work than the other to get the same result, but that result is perfectly achievable with both engines. So you don't need to worry that choosing one tool over the other will negatively affect your end product. So, have you made up your mind on which tool you're going to use?

Raka Mahesa is a game developer at Chocoarts who is interested in digital technology in general. Outside of work, he enjoys working on his own projects, with Corridoom VR being his latest released game. Raka also regularly tweets @legacy99.

Aaron Lazar
24 Jan 2018
4 min read

10 To-dos for Industrial Internet Architects

This is a guest post by Robert Stackowiak, a technology business strategist at the Microsoft Technology Center. Robert has co-authored the book Architecting the Industrial Internet with Shyam Nath, who is the director of technology integrations for Industrial IoT at GE Digital. You may also check out our interview with Shyam for expert insights into the world of IIoT, Big Data, Artificial Intelligence and more.

Just about every day, one can pick up a technology journal or view an online technology article about what is new in the Industrial Internet of Things (IIoT). These articles usually provide insight into IIoT solutions to business problems or into how a specific technology component is evolving to provide a needed function. Various industry consortia, such as the Industrial Internet Consortium (IIC), provide extremely useful documentation defining key aspects of the IIoT architecture that the architect must consider. These broad reference architecture patterns have also begun to consistently include specific technologies and common components.

The authors of Architecting the Industrial Internet felt the time was right for a practical guide for architects. The book provides guidance on how to define and apply an IIoT architecture in a typical project today by describing architecture patterns. In this article, we explore ten to-dos for Industrial Internet architects designing these solutions.

Just as technology components are showing up in common architecture patterns, their justification and use cases are also being discovered through repeatable processes. The sponsorship and requirements for these projects are almost always driven by leaders in a company's line of business. Techniques for uncovering these projects can be replicated as architects gain the needed discovery skills.
Industrial Internet Architects To-dos:

1. Understand IIoT: Architects first will seek to gain an understanding of what is different about the Industrial Internet, the evolution to specific IIoT solutions, and how legacy technology footprints might fit into that architecture.
2. Understand IIoT project scope and requirements: They next research guidance from industry consortia and gather functional viewpoints. This helps to better understand the requirements their architecture must deliver solutions to, and the scope of effort they will face.
3. Act as a bridge between business and technical requirements: They quickly come to realize that since successful projects are driven by responding to business requirements, the architect must bridge the line of business and IT divide present in many companies. They are always on the lookout for requirements and means to justify these projects.
4. Narrow down viable IIoT solutions: Once the requirements are gathered and a potential project appears to be justifiable, requirements and functional viewpoints are aligned in preparation for defining a solution.
5. Evaluate IIoT architectures and solution delivery models: Time to market of a proposed Industrial Internet solution is often critical to business sponsors. Most architecture evaluations include consideration of applications or pseudo-applications that can be modified to deliver the needed solution in a timely manner.
6. Have a good grasp of IIoT analytics: Intelligence delivered by these solutions is usually linked to the timely analysis of data streams, and care is taken in defining Lambda architectures (or Lambda variations), including machine learning and data management components, and where analysis and response must occur.
7. Evaluate deployment options: Technology deployment options are explored, including the capabilities of proposed devices, networks, and cloud or on-premises backend infrastructures.
8. Assess IIoT security considerations: Security is top of mind today, and proper design includes not only securing the backend infrastructure but also securing networks and the edge devices themselves.
9. Conform to governance and compliance policies: The viability of the Industrial Internet solution can be determined by whether proper governance is put into place and whether compliance standards can be met.
10. Keep up with the IIoT landscape: While relying on current best practices, the architect must keep an eye on the future, evaluating emerging architecture patterns and solutions.

Author's Bio

Robert Stackowiak is a technology business strategist at the Microsoft Technology Center in Chicago, where he gathers business and technical requirements during client briefings and defines Internet of Things and analytics architecture solutions, including those that reside in the Microsoft Azure cloud. He joined Microsoft in 2016 after a 20-year stint at Oracle, where he was Executive Director of Big Data in North America. Robert has spoken at industry conferences around the world and co-authored many books on analytics and data management, including Big Data and the Internet of Things: Enterprise Architecture for A New Age, published by Apress; five editions of Oracle Essentials, published by O'Reilly Media; Oracle Big Data Handbook, published by Oracle Press; Achieving Extreme Performance with Oracle Exadata, published by Oracle Press; and Oracle Data Warehousing and Business Intelligence Solutions, published by Wiley. You can follow him on Twitter at @rstackow.
Raka Mahesa
24 Jan 2018
4 min read

Why Metadata is so important for IoT

The Internet of Things is growing all the time. However, as IoT takes over the world, there are more and more aspects of it that need to be addressed, such as security and standardization. It might not be ideal to live in a wild west where everything is connected but there are no guidelines or rules for how to manage and analyze these networks. A crucial part of all this is metadata: as data grows in size, the way we label, categorize and describe it will become more important than ever.

We probably shouldn't be that surprised. If IoT is all about connecting things that weren't previously connected (traffic lights, lamps, car parts), good metadata allows us to make sure those connections remain clear and legible. It helps to ensure that things are working properly. A system without definitions, without words and labels, would, after all, get chaotic pretty quickly.

Metadata makes it easier to organize IoT data

If metadata is, quite simply, data about data, it's not hard to see why it might be so important when dealing with the expanse of data that is about to be generated thanks to the Internet of Things. IoT will clearly run largely on data, in the form of information and messages passing between objects and moving within a given system. Metadata is incredibly useful in this new world because it allows us to better understand the systems that we are developing. What's more, once we have that level of insight, we can begin to further improve and optimize IoT systems using machine learning and artificial intelligence.

Consider how metadata organizes your media library: without it, the library would be a mess, practically unusable. Scale that up to IoT and the same holds. With good metadata we can make much smarter use of connected systems; without it, we might well be lost in a chaotic mess of connections.

Metadata, then, allows us to organize and catalog data.
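To make the cataloging idea concrete, here is a minimal Go sketch of readings that carry their own metadata and are grouped by it. The Metadata and Reading types, their field names, and the GroupByClass helper are all hypothetical names chosen for illustration; they do not come from any IoT standard or library.

```go
package main

import (
	"fmt"
	"sort"
)

// Metadata describes a reading. Every field here is a hypothetical example.
type Metadata struct {
	DeviceClass string // e.g. "lamp" or "traffic-light"
	Unit        string // unit of the measured value
	Location    string // where the device is installed
}

// Reading pairs a raw value with the metadata that describes it.
type Reading struct {
	Value float64
	Meta  Metadata
}

// GroupByClass catalogs readings under the device class in their metadata.
func GroupByClass(readings []Reading) map[string][]Reading {
	groups := make(map[string][]Reading)
	for _, r := range readings {
		groups[r.Meta.DeviceClass] = append(groups[r.Meta.DeviceClass], r)
	}
	return groups
}

func main() {
	readings := []Reading{
		{Value: 21.5, Meta: Metadata{DeviceClass: "lamp", Unit: "watt", Location: "hall"}},
		{Value: 1, Meta: Metadata{DeviceClass: "traffic-light", Unit: "state", Location: "junction-4"}},
		{Value: 18.0, Meta: Metadata{DeviceClass: "lamp", Unit: "watt", Location: "porch"}},
	}

	groups := GroupByClass(readings)

	// Sort the class names so the output order is stable.
	classes := make([]string, 0, len(groups))
	for class := range groups {
		classes = append(classes, class)
	}
	sort.Strings(classes)

	for _, class := range classes {
		fmt.Printf("%s: %d reading(s)\n", class, len(groups[class]))
	}
}
```

Without the Meta field the three values would be indistinguishable numbers; with it, they can be sorted, filtered and compared by class, unit or location.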
Metadata solves IoT's interoperability problem

Metadata can also help with the biggest problem of IoT: interoperability. Interoperability refers to the ability of one device to communicate and exchange data with another device. This is really important in the context of the Internet of Things, because great interoperability means more devices can connect with each other.

How does metadata solve interoperability? By using metadata, a device can quickly identify a new device that tries to connect to it by looking at its model number, device class, and other attributes. Once the new device has been identified, our device can find a communication protocol that is supported by both devices and use it to exchange data. Metadata can also be added to the exchanged data, so both devices can read and process the data correctly, just like adding image format metadata allows any application to display that image.

Metadata helps to protect legacy hardware and software

There's another aspect that metadata can help with. The Internet of Things is an evolving technology where new products are introduced every day, bringing changes and innovations along with them. But what happens to the old products that have been replaced by new ones? With metadata, we can archive and protect the future accessibility of our devices, making sure that new devices can still communicate with older, legacy devices.

That's why metadata is important for the Internet of Things. There are many benefits to be gained from having a robust system of metadata, and as the Internet of Things grows and is used to manage more crucial aspects of our lives, the need for this system will also grow.

Raka Mahesa is a game developer at Chocoarts who is interested in digital technology. Outside of work, he enjoys working on his own projects, with Corridoom VR being his latest released game. Raka also regularly tweets @legacy99.

Mihalis Tsoukalos
24 Jan 2018
17 min read

Systems programming with Go in UNIX and Linux

This is a guest post by Mihalis Tsoukalos. Mihalis is a Unix administrator, programmer, and mathematician who enjoys writing. He is the author of Go Systems Programming, from which this Go programming tutorial is taken.

What is Go?

Back when UNIX was first introduced, the only way to write systems software was by using C; nowadays you can write systems software in several programming languages, including Go. Apart from Go, other preferred languages for developing system utilities are Python, Perl, Rust and Ruby.

Go is a modern, general-purpose, open-source programming language that was officially announced at the end of 2009. It began as an internal Google project and has been inspired by many other programming languages, including C, Pascal, Alef and Oberon. Its spiritual fathers are Robert Griesemer, Ken Thompson and Rob Pike, who designed Go as a language for professional programmers who want to build reliable and robust software. Apart from its syntax and standard functions, Go comes with a pretty rich and convenient standard library.

What is systems programming?

Systems programming is a special area of programming that is most often associated with UNIX machines, although it is not limited to them. Most commands that have to do with system administration tasks, such as disk formatting, network interface configuration, module loading and kernel performance tracking, are implemented using the techniques of systems programming. Additionally, the /etc directory, which can be found on all UNIX systems, contains plain text files that deal with the configuration of a UNIX machine and its services and are also manipulated using systems software.

You can group the various areas of systems software and related system calls in the following sets:

File I/O: This area deals with file reading and writing operations, which is the most important task of an operating system. File input and output must be fast and efficient and, above all, it must be reliable.
Advanced File I/O: Apart from the basic input and output system calls, there are also more advanced ways to read or write a file, including asynchronous I/O and non-blocking I/O.
System files and Configuration: This group of systems software includes functions that allow you to handle system files such as /etc/passwd and get system-specific information such as system time and DNS configuration.
Files and Directories: This cluster includes functions and system calls that allow the programmer to create and delete directories and get information such as the owner and the permissions of a file or a directory.
Process Control: This group of software allows you to create and interact with UNIX processes.
Threads: When a process has multiple threads, it can perform multiple tasks. However, threads must be created, terminated and synchronized, which is the purpose of this collection of functions and system calls.
Server Processes: This set includes techniques that allow you to develop server processes, which are processes that get executed in the background without the need for an active terminal. Go is not that good at writing server processes in the traditional UNIX way, but let me explain this a little more. UNIX servers like Apache use fork(2) to create one or more child processes; this is called forking and refers to cloning the parent process into a child process that continues executing the same executable from the same point and, most importantly, starts with a copy of the parent's memory. Although Go does not offer an equivalent to the fork(2) function, this is not an issue because you can use goroutines to cover most of the uses of fork(2).
Interprocess Communication: This set of functions allows processes that run on the same UNIX machine to communicate with each other using features such as pipes, FIFOs, message queues, semaphores and shared memory.
Signal Processing: Signals offer processes a way of handling asynchronous events, which can be very handy. Almost all server processes have extra code that allows them to handle UNIX signals using the system calls of this group.
Network Programming: This is the art of developing applications that work over computer networks with the help of TCP/IP and is not systems programming per se. However, most TCP/IP servers and clients deal with system resources, users, files and directories, so most of the time you cannot create network applications without doing some kind of systems programming.

The challenging thing with systems programming is that you cannot afford to have an incomplete program; you can either have a fully working, secure program that can be used on a production system or nothing at all. This mainly happens because you cannot trust end users and hackers! The key difficulty in systems programming is the fact that an erroneous system call can make your UNIX machine misbehave or, even worse, crash it! Most security issues on UNIX systems come from wrongly implemented systems software, because bugs in systems software can compromise the security of an entire system. The worst part is that this can happen many years after you start using a certain piece of software!

Systems programming examples with Go

Printing the permission of a file or a directory

With the help of the ls(1) command, you can find out the permissions of a file:

    $ ls -l /bin/ls
    -rwxr-xr-x  1 root  wheel  38624 Mar 23 01:57 /bin/ls

The presented Go program, which is named permissions.go, will teach you how to print the permissions of a file or a directory using Go and will be presented in two parts.
The first part is the following:

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // Exactly one path is expected as the first argument.
        arguments := os.Args
        if len(arguments) == 1 {
            fmt.Println("Please provide an argument!")
            os.Exit(1)
        }
        file := arguments[1]

The second part contains the important Go code:

        // os.Stat() returns a FileInfo describing the file or directory.
        info, err := os.Stat(file)
        if err != nil {
            fmt.Println("Error:", err)
            os.Exit(1)
        }
        mode := info.Mode()
        fmt.Print(file, ": ", mode, "\n")
    }

Once again, most of the Go code is for dealing with the command line argument and making sure that you have one! The Go code that does the actual job is mainly the call to the os.Stat() function, which returns a FileInfo structure that describes the file or directory examined by os.Stat(). From the FileInfo structure you can discover the permissions of a file by calling its Mode() method.

Executing permissions.go creates the following kind of output:

    $ go run permissions.go /bin/ls
    /bin/ls: -rwxr-xr-x
    $ go run permissions.go /usr
    /usr: drwxr-xr-x
    $ go run permissions.go /us
    Error: stat /us: no such file or directory
    exit status 1

How to write to files using fmt.Fprintf()

The fmt.Fprintf() function allows you to write formatted text to files in a way that is similar to the way the fmt.Printf() function works. The Go code that illustrates the use of fmt.Fprintf() will be named fmtF.go and is going to be presented in three parts.

The first part is the expected preamble of the program:

    package main

    import (
        "fmt"
        "os"
    )

The second part has the following Go code:

    func main() {
        if len(os.Args) != 2 {
            fmt.Println("Please provide a filename")
            os.Exit(1)
        }
        filename := os.Args[1]

        // os.Create() truncates the file if it already exists.
        destination, err := os.Create(filename)
        if err != nil {
            fmt.Println("os.Create:", err)
            os.Exit(1)
        }
        defer destination.Close()

First, you make sure that you have one command line argument before continuing. Then, you read that command line argument and give it to os.Create() in order to create the file! Please note that the os.Create() function will truncate the file if it already exists.
The last part is the following:

        fmt.Fprintf(destination, "[%s]: ", filename)
        fmt.Fprintf(destination, "Using fmt.Fprintf in %s\n", filename)
    }

Here, you write the desired text data to the file that is identified by the destination variable using fmt.Fprintf() as if you were using the fmt.Printf() function.

Executing fmtF.go will generate the following output:

    $ go run fmtF.go test
    $ cat test
    [test]: Using fmt.Fprintf in test

In other words, you can create plain text files using fmt.Fprintf().

Developing wc(1) in Go

The principal idea behind the code of the wc.go program is that you read a text file line by line until there is nothing left to read. For each line you read, you find out the number of characters and the number of words it has. As you need to read your input line by line, the use of bufio is preferred over plain io because it simplifies the code. However, trying to implement wc.go on your own using io would be a very educational exercise.

But first you will see the kind of output the wc(1) utility generates:

    $ wc wc.go cp.go
          68     160    1231 wc.go
          45     112     755 cp.go
         113     272    1986 total

So, if wc(1) has to process more than one file, it automatically generates summary information.

Counting words

The trickiest part of the implementation is word counting, which is implemented using Go regular expressions:

    r := regexp.MustCompile("[^\\s]+")
    for range r.FindAllString(line, -1) {
        numberOfWords++
    }

The provided regular expression matches every run of non-whitespace characters, so it separates the words of a line based on whitespace in order to count them afterwards!

The code!

After this little introduction, it is time to see the Go code of wc.go, which will be presented in five parts.
The first part is the expected preamble:

    package main

    import (
        "bufio"
        "flag"
        "fmt"
        "io"
        "os"
        "regexp"
    )

The second part is the implementation of the count() function, which includes the core functionality of the program:

    func count(filename string) (int, int, int) {
        var err error
        var numberOfLines int
        var numberOfCharacters int
        var numberOfWords int
        numberOfLines = 0
        numberOfCharacters = 0
        numberOfWords = 0

        f, err := os.Open(filename)
        if err != nil {
            fmt.Printf("error opening file %s", err)
            os.Exit(1)
        }
        defer f.Close()

        r := bufio.NewReader(f)
        for {
            // Read up to and including the next newline character.
            line, err := r.ReadString('\n')
            if err == io.EOF {
                break
            } else if err != nil {
                fmt.Printf("error reading file %s", err)
                break // stop on a read error instead of retrying forever
            }
            numberOfLines++
            // Every run of non-whitespace characters counts as one word.
            r := regexp.MustCompile("[^\\s]+")
            for range r.FindAllString(line, -1) {
                numberOfWords++
            }
            numberOfCharacters += len(line)
        }
        return numberOfLines, numberOfWords, numberOfCharacters
    }

There are a lot of interesting things here. First of all, you can see the Go code presented in the previous section for counting the words of each line. Counting lines is easy because each time the bufio reader reads a new line, the value of the numberOfLines variable is increased by one. The ReadString() function tells the program to read until the first occurrence of a '\n' in the input; multiple calls to ReadString() mean that you are reading a file line by line. Next, you can see that the count() function returns three integer values. Last, counting characters is implemented with the help of the len() function, which returns the number of bytes in a given string (the same as the number of characters for plain ASCII text), which in this case is the line that was read. The for loop terminates when you get the io.EOF error, which signifies that there is nothing left to read from the input file.
The third part of wc.go starts with the beginning of the implementation of the main() function, which also includes the configuration of the flag package:

    func main() {
        minusC := flag.Bool("c", false, "Characters")
        minusW := flag.Bool("w", false, "Words")
        minusL := flag.Bool("l", false, "Lines")
        flag.Parse()
        // flag.Args() holds the remaining arguments: the input files.
        flags := flag.Args()

        if len(flags) == 0 {
            fmt.Printf("usage: wc <file1> [<file2> [... <fileN>]]\n")
            os.Exit(1)
        }

        totalLines := 0
        totalWords := 0
        totalCharacters := 0
        printAll := false

        for _, filename := range flag.Args() {

The last for statement is for processing all input files given to the program. The wc.go program supports three flags: the -c flag is for printing the character count, the -w flag is for printing the word count and the -l flag is for printing the line count.

The fourth part is the following:

            numberOfLines, numberOfWords, numberOfCharacters := count(filename)
            totalLines = totalLines + numberOfLines
            totalWords = totalWords + numberOfWords
            totalCharacters = totalCharacters + numberOfCharacters

            // All three flags, or none of them, means "print everything".
            if (*minusC && *minusW && *minusL) || (!*minusC && !*minusW && !*minusL) {
                fmt.Printf("%d", numberOfLines)
                fmt.Printf("\t%d", numberOfWords)
                fmt.Printf("\t%d", numberOfCharacters)
                fmt.Printf("\t%s\n", filename)
                printAll = true
                continue
            }

            if *minusL {
                fmt.Printf("%d", numberOfLines)
            }
            if *minusW {
                fmt.Printf("\t%d", numberOfWords)
            }
            if *minusC {
                fmt.Printf("\t%d", numberOfCharacters)
            }
            fmt.Printf("\t%s\n", filename)
        }

This part deals with the printing of the information on a per file basis depending on the command line flags. As you can see, most of the Go code here is for handling the output according to the command line flags.
The last part is the following:

        // Totals are printed only when more than one file was given.
        if (len(flags) != 1) && printAll {
            fmt.Printf("%d", totalLines)
            fmt.Printf("\t%d", totalWords)
            fmt.Printf("\t%d", totalCharacters)
            fmt.Println("\ttotal")
            return
        }

        if (len(flags) != 1) && *minusL {
            fmt.Printf("%d", totalLines)
        }
        if (len(flags) != 1) && *minusW {
            fmt.Printf("\t%d", totalWords)
        }
        if (len(flags) != 1) && *minusC {
            fmt.Printf("\t%d", totalCharacters)
        }
        if len(flags) != 1 {
            fmt.Printf("\ttotal\n")
        }
    }

This is where you print the total number of lines, words and characters read according to the flags of the program. Once again, most of the Go code here is for modifying the output according to the command line flags.

Executing wc.go will generate the following kind of output:

    $ go build wc.go
    $ ls -l wc
    -rwxr-xr-x  1 mtsouk  staff  2264384 Apr 29 21:10 wc
    $ ./wc wc.go sparse.go notGoodCP.go
    120     280     2319    wc.go
    44      98      697     sparse.go
    27      61      418     notGoodCP.go
    191     439     3434    total
    $ ./wc -l wc.go sparse.go
    120     wc.go
    44      sparse.go
    164     total
    $ ./wc -w -l wc.go sparse.go
    120     280     wc.go
    44      98      sparse.go
    164     378     total

If you do not execute go build wc.go in order to create an executable file, then executing go run wc.go using Go source files as arguments will fail, because the compiler will try to compile the Go source files instead of treating them as command line arguments to the go run wc.go command:

    $ go run wc.go sparse.go
    # command-line-arguments
    ./sparse.go:11: main redeclared in this block
        previous declaration at ./wc.go:49
    $ go run wc.go wc.go
    package main: case-insensitive file name collision: "wc.go" and "wc.go"
    $ go run wc.go cp.go sparse.go
    # command-line-arguments
    ./cp.go:35: main redeclared in this block
        previous declaration at ./wc.go:49
    ./sparse.go:11: main redeclared in this block
        previous declaration at ./cp.go:35

Additionally, trying to execute wc.go on a Linux system with Go version 1.3.3 will fail because it uses features of Go that can be found only in newer versions. If you use the latest Go version, you will have no problem running wc.go.
The error message you get with Go 1.3.3 is the following:

    $ go version
    go version go1.3.3 linux/amd64
    $ go run wc.go
    # command-line-arguments
    ./wc.go:40: syntax error: unexpected range, expecting {
    ./wc.go:46: non-declaration statement outside function body
    ./wc.go:47: syntax error: unexpected }

Reading a text file character by character

Although reading a text file character by character is not needed for the development of the wc(1) utility, it is good to know how to implement it in Go. The name of the file will be charByChar.go and it will be presented in four parts.

The first part comes with the following Go code:

    package main

    import (
        "bufio"
        "fmt"
        "io/ioutil"
        "os"
        "strings"
    )

Although charByChar.go does not have many lines of Go code, it needs lots of Go standard packages, which is a naïve indication that the task it implements is not trivial.

The second part is:

    func main() {
        arguments := os.Args
        if len(arguments) == 1 {
            fmt.Println("Not enough arguments!")
            os.Exit(1)
        }
        input := arguments[1]

The third part is the following:

        // Read the whole file into memory.
        buf, err := ioutil.ReadFile(input)
        if err != nil {
            fmt.Println(err)
            os.Exit(1)
        }

The last part has the following Go code:

        in := string(buf)
        s := bufio.NewScanner(strings.NewReader(in))
        // Make the scanner return one rune per token.
        s.Split(bufio.ScanRunes)

        for s.Scan() {
            fmt.Print(s.Text())
        }
    }

ScanRunes is a split function that returns each character (rune) as a token. The call to Scan() then allows us to process each character one by one. There also exist ScanWords and ScanLines for getting words and lines scanned, respectively. If you use fmt.Println(s.Text()) as the last statement of the program instead of fmt.Print(s.Text()), then each character will be printed on its own line and the task of the program will be more obvious.
Executing charByChar.go generates the following kind of output:

    $ go run charByChar.go test
    package main
    …

The wc(1) command can verify the correctness of the Go code of charByChar.go by comparing the input file with the output generated by charByChar.go:

    $ go run charByChar.go test | wc
          32      54     439
    $ wc test
          32      54     439 test

How to create sparse files in Go

Big files that are created with the Seek() method of os.File may have holes in them and occupy fewer disk blocks than files with the same size but without holes in them; such files are called sparse files. This section will develop a program that creates sparse files.

The Go code of sparse.go will be presented in three parts. The first part is:

    package main

    import (
        "fmt"
        "log"
        "os"
        "path/filepath"
        "strconv"
    )

The second part of sparse.go has the following Go code:

    func main() {
        if len(os.Args) != 3 {
            fmt.Printf("usage: %s SIZE filename\n", filepath.Base(os.Args[0]))
            os.Exit(1)
        }

        SIZE, _ := strconv.ParseInt(os.Args[1], 10, 64)
        filename := os.Args[2]

        // Refuse to overwrite a file that already exists.
        _, err := os.Stat(filename)
        if err == nil {
            fmt.Printf("File %s already exists.\n", filename)
            os.Exit(1)
        }

The strconv.ParseInt() function is used for converting the command line argument that defines the size of the sparse file from its string value to its integer value. Additionally, the os.Stat() call makes sure that you will not accidentally overwrite an existing file.

The last part is where the action takes place:

        fd, err := os.Create(filename)
        if err != nil {
            log.Fatal("Failed to create output")
        }

        // Move to the last byte without writing anything before it.
        _, err = fd.Seek(SIZE-1, 0)
        if err != nil {
            fmt.Println(err)
            log.Fatal("Failed to seek")
        }

        _, err = fd.Write([]byte{0})
        if err != nil {
            fmt.Println(err)
            log.Fatal("Write operation failed")
        }

        err = fd.Close()
        if err != nil {
            fmt.Println(err)
            log.Fatal("Failed to close file")
        }
    }

First, you try to create the desired sparse file using os.Create(). Then, you call fd.Seek() in order to make the file bigger without adding actual data. Last, you write a byte to it using fd.Write().
As you do not have anything more to do with the file, you call fd.Close() and you are done.

Executing sparse.go generates the following output:

    $ go run sparse.go 1000 test
    $ go run sparse.go 1000 test
    File test already exists.
    exit status 1

How can you tell whether a file is a sparse file or not? You will learn in a while, but first let us create some files:

    $ go run sparse.go 100000 testSparse
    $ dd if=/dev/urandom bs=1 count=100000 of=noSparseDD
    100000+0 records in
    100000+0 records out
    100000 bytes (100 kB) copied, 0.152511 s, 656 kB/s
    $ dd if=/dev/urandom seek=100000 bs=1 count=0 of=sparseDD
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.000159399 s, 0.0 kB/s
    $ ls -l noSparseDD sparseDD testSparse
    -rw-r--r-- 1 mtsouk mtsouk 100000 Apr 29 21:43 noSparseDD
    -rw-r--r-- 1 mtsouk mtsouk 100000 Apr 29 21:43 sparseDD
    -rw-r--r-- 1 mtsouk mtsouk 100000 Apr 29 21:40 testSparse

So, how can you tell if any of the three files is a sparse file or not? The -s flag of the ls(1) utility shows the number of file system blocks actually used by a file. So, the output of the ls -ls command allows you to detect whether you are dealing with a sparse file or not:

    $ ls -ls noSparseDD sparseDD testSparse
    104 -rw-r--r-- 1 mtsouk mtsouk 100000 Apr 29 21:43 noSparseDD
      0 -rw-r--r-- 1 mtsouk mtsouk 100000 Apr 29 21:43 sparseDD
      8 -rw-r--r-- 1 mtsouk mtsouk 100000 Apr 29 21:40 testSparse

Now look at the first column of the output. The noSparseDD file, which was generated using the dd(1) utility, is not a sparse file: it occupies 104 blocks. The sparseDD file is a sparse file generated using the dd(1) utility and occupies no blocks at all. Last, testSparse is also a sparse file, created using sparse.go, and occupies only 8 blocks even though all three files report the same size of 100000 bytes.

Mihalis Tsoukalos is a Unix administrator, programmer, DBA and mathematician who enjoys writing. He is currently writing Mastering Go. His research interests include programming languages, databases and operating systems. He holds a B.Sc in Mathematics from the University of Patras and an M.Sc in IT from University College London (UK). He has written various technical articles for Sys Admin, MacTech, C/C++ Users Journal, Linux Journal, Linux User and Developer, Linux Format and Linux Voice.

Hari Vignesh
23 Jan 2018
4 min read

Why are Android developers switching from Java to Kotlin?

When we talk about Android app development, the first programming language that comes to mind is 'Java'. However Java isn’t the only language you can use for Android programming – you can use any language that compiles to the JVM. Recently, a new language has caught the attention of the Android community – Kotlin. Kotlin has actually been around since 2011, but it was only in May 2017 that Google announced that the language was to become an officially supported language in the Android operating system. This is one of the many reasons why Kotlin’s adoption has been so dramatic. The Realm report, published at the end of 2017 suggests that Kotlin is likely to overtake Java in terms of usage in the next couple of years. When you want to work on custom Android applications, an advanced technology will help you achieve your goals. Java and Kotlin are commonly used languages for Google for writing Android Apps. A great importance is given to programming languages because it might cut down some of your time and money. Want to learn Kotlin? Find Kotlin eBooks and videos in our library. There are many reasons why mobile developers are choosing to switch from Java to Kotlin. Below are some of the most significant. Kotlin is easy for anyone who knows Java to learn Similarities in typing and syntax make Kotlin very easy to master for anyone who’s already working with Java. If you’re worried about a steep learning curve, you'll be pleasantly surprised by how easy it is for developers to dive into coding in Kotlin. Kotlin is evolving with a lot of support from the developer community. A lot of developers who contribute to Kotlin’s evolution are freelancers who find work on different platforms and experience a wide range of smaller projects with varied needs. Other contributors include larger companies and industry giants like Google. Kotlin needs 20 percent less coding compared to Java. 
Java is an older language, and every new release has to keep supporting the features of previous versions. Over time this increases the amount of code you have to write and makes a cleanly layered architecture harder to maintain. If you compare a Java class with the equivalent Kotlin class, you will find that the Kotlin version is much more compact.

Kotlin has Android Studio support

Because Kotlin is built by JetBrains, it's unsurprising that Android Studio (which is based on JetBrains' IntelliJ IDEA) has excellent support for Kotlin. Android Studio makes it incredibly easy to configure Kotlin in your project; it's as straightforward as opening a few menus. Your IDE will have no problem understanding, compiling and running Kotlin code once you have set up Kotlin for Android Studio. After configuring Kotlin, you can convert an entire Java source file into a Kotlin file. Because Kotlin is compatible with Java, it can run on the JVM and, at the same time, be used to update and improve enterprise-level solutions that have enormous codebases written in Java.

Kotlin is great for procedural programming

Every programming paradigm has its own set of strengths and weaknesses. There will always be certain scenarios where one is more effective than another. One thing that's so appealing about Kotlin is that it combines the strengths of two different approaches: procedural and functional. True, the largely procedural approach can sometimes be the most challenging aspect of the language when you first start to get to grips with it. However, the level of control such an approach gives you is well worth the investment of your time.

Kotlin makes development more efficient and your life easier

This follows on nicely from the point above. While certain aspects of Kotlin require patience and concentration to master, in the long run less code means far fewer errors and bugs.
That saves you time and makes coding much more enjoyable, rather than an administrative nightmare of spaghetti code. There are plenty of features in Kotlin that make it a practical solution to today's programming challenges. Where JetBrains takes the language next remains to be seen. We could, perhaps, see Kotlin make a move towards iOS development, and because it can also compile to JavaScript, we may begin to see it used more and more within web development. Of course, this will largely be down to JetBrains' goals and just how much they want Kotlin to dominate the developer landscape.

Hari Vignesh Jayapalan is a Google Certified Android app developer, IDF Certified UI & UX Professional, street magician, fitness freak, technology enthusiast, and wannabe entrepreneur. He can be found on Twitter @HariofSpades.
Savia Lobo
22 Jan 2018
8 min read

Here's how you can handle the bias variance trade-off in your ML models

Many organizations rely on machine learning techniques in their day-to-day workflow to cut down on the time required to do a job. These techniques are robust because they undergo various tests to make sure they produce correct predictions about any data fed into them. During this phase, certain errors are also generated, which can lead to an inconsistent ML model. The two common errors we are going to look at in this article are bias and variance, along with how a trade-off can be achieved between the two in order to generate a successful ML model.

Let's first have a look at what creates these kinds of errors. Machine learning techniques, or more precisely supervised learning techniques, involve training, often the most important stage in the ML workflow. The machine learning model is trained using the training data. How is this training data prepared? By using a dataset for which the expected output is already known. During the training stage, the algorithm analyzes the training data and captures the patterns it finds in an inferred function. This inferred function is the model that is then used to map new examples.

An ideal model generated from this training data should be able to generalize well. This means it should learn from the training data and correctly predict or classify any new problem instance. In general, the more complex the model is, the better it classifies the training data. However, if the model is too complex, it will pick up random features (noise) in the training data; the model is then said to overfit. On the other hand, if the model is not complex enough and misses important dynamics present in the data, it is a case of underfitting. Both overfitting and underfitting are basically errors in the ML models or algorithms.
Also, it is generally impossible to minimize both these errors at the same time, and this leads to what is called the Bias-Variance Tradeoff. Before looking at how to achieve the trade-off, let's understand how bias and variance errors occur.

The Bias and Variance Error

Let's understand each error with the help of an example. Suppose you have three training datasets, say T1, T2, and T3, and you pass these datasets through a supervised learning algorithm. The algorithm generates three different models, say M1, M2, and M3, one from each training dataset. Now let's say you have a new input A. The whole idea is to apply each model to this new input A. Two types of error can occur here. If the outputs generated by the models on input A differ (B1, B2, B3), the algorithm is said to have a high variance error. On the other hand, if the output from all three models is the same (B) but incorrect, the algorithm is said to have a high bias error.

High variance also means that the algorithm produces a model that is too specific to the training data, which is a typical case of overfitting. High bias means that the algorithm has not picked up the defining patterns in the dataset, which is a case of underfitting. Some examples of high-bias ML algorithms are Linear Regression, Linear Discriminant Analysis and Logistic Regression. Examples of high-variance ML algorithms are Decision Trees, k-Nearest Neighbors and Support Vector Machines.

How to achieve a Bias-Variance Trade-off?

For any supervised algorithm, having a high bias error usually means it has a low variance error, and vice versa. To be more specific, parametric or linear ML algorithms often have high bias but low variance. On the other hand, non-parametric or non-linear algorithms often have low bias but high variance. The goal of any ML model is to reach a state of low variance and low bias, which is often a difficult task because of the way machine learning algorithms are parametrized.
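To make the T1, T2, T3 thought experiment concrete, here is a minimal sketch in plain Python. It is not taken from the article: the ground-truth function `true_fn`, the data generator `make_dataset`, and the two toy learners `fit_mean` and `fit_1nn` are all hypothetical names chosen for illustration, and it uses 200 training sets rather than three so that the estimates are less noisy. It trains one model per dataset, applies each to the same new input A, and then measures how far the average prediction is from the truth (bias) and how much the predictions disagree with each other (variance).

```python
import random
import statistics


def true_fn(x):
    """Hypothetical ground-truth signal (an assumption for this sketch)."""
    return x * x


def make_dataset(rng, n=30, noise=0.1):
    """Draw one noisy training set T_i of (x, y) pairs."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_fn(x) + rng.gauss(0.0, noise)) for x in xs]


def fit_mean(data):
    """Very simple model: ignores x and always predicts the mean of y."""
    mean_y = statistics.fmean([y for _, y in data])
    return lambda x: mean_y


def fit_1nn(data):
    """Very flexible model: copies the y of the single nearest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def bias_and_variance(fit, a, n_datasets=200, seed=0):
    """Train one model per dataset and compare their predictions at input a."""
    rng = random.Random(seed)
    preds = [fit(make_dataset(rng))(a) for _ in range(n_datasets)]
    bias_sq = (statistics.fmean(preds) - true_fn(a)) ** 2  # same answer, but wrong
    variance = statistics.pvariance(preds)                 # answers disagree
    return bias_sq, variance


mean_bias, mean_var = bias_and_variance(fit_mean, a=0.9)
nn_bias, nn_var = bias_and_variance(fit_1nn, a=0.9)
print(f"mean model: bias^2={mean_bias:.4f} variance={mean_var:.4f}")
print(f"1-NN model: bias^2={nn_bias:.4f} variance={nn_var:.4f}")
```

Under these assumptions the constant predictor plays the role of the underfitting, high-bias algorithm, while the one-nearest-neighbor predictor plays the role of the overfitting, high-variance one.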
So how can we achieve a trade-off between the two? Following are some ways to achieve the Bias-Variance Tradeoff:

By minimizing the total error: The optimum location for any model is the level of complexity at which the increase in bias is equivalent to the reduction in variance. In practice, there is no analytical method to find this optimal level. One should use an accurate measure of prediction error, explore different levels of model complexity, and then choose the complexity level that minimizes the overall error. Generally, resampling-based measures such as cross-validation should be preferred over theoretical measures such as Akaike's Information Criterion.

Figure source: http://scott.fortmann-roe.com/docs/BiasVariance.html (The irreducible error is the noise that cannot be reduced by algorithms but can be reduced with better data cleaning.)

Using Bagging and Resampling techniques: These can be used to reduce the variance in model predictions. In bagging (Bootstrap Aggregating), several replicas of the original dataset are created using random selection with replacement. One modeling algorithm that makes use of bagging is Random Forests. In the Random Forest algorithm, the bias of the full model is equivalent to the bias of a single decision tree, which itself has high variance. By creating many of these trees, in effect a "forest", and then averaging them, the variance of the final model can be greatly reduced compared with that of a single tree.

Adjusting minor values in algorithms: Both the k-nearest neighbors and Support Vector Machine (SVM) algorithms have low bias and high variance, but the trade-off in both cases can be changed. In the k-nearest neighbors algorithm, the value of k can be increased, which increases the number of neighbors that contribute to the prediction. This in turn increases the bias of the model.
In the SVM algorithm, the trade-off can be changed by increasing the C parameter, taken here as the budget for margin violations allowed in the training data. This will increase the bias but decrease the variance. (Some libraries, scikit-learn among them, define C the other way round, as a penalty on violations, in which case it is decreasing C that increases the bias.)

Using a proper Machine learning workflow: This means you have to ensure proper training by:

Maintaining separate training and test sets - Split the dataset into training (50%), testing (25%), and validation (25%) sets. The training set is used to build the model, the test set to check the accuracy of the model, and the validation set to evaluate the performance of your model hyperparameters.

Optimizing your model by using systematic cross-validation - A cross-validation technique is a must for fine-tuning the model parameters, especially for unknown instances. In supervised machine learning, validation or cross-validation is used to estimate the predictive accuracy of models of varying complexity, in order to find the best model. For instance, one can use the k-fold cross-validation method. Here, the dataset is divided into k folds. For each fold, train the algorithm on the other k-1 folds, using the remaining fold (also called the 'holdout fold') as the test set. Repeat this process until each fold has acted as the test set. The average of the k recorded errors is called the cross-validation error and can serve as the performance metric for the model.

Trying out appropriate algorithms - Before relying on any model, we first need to ensure that the model works best for our assumptions. Keep in mind the No Free Lunch theorem, which states that no single model works best for every problem. For instance, under the No Free Lunch theorem, a random search performs as well as any heuristic optimization algorithm when averaged over all possible problems.
Tuning the hyperparameters that can give an impactful performance - Any machine learning model requires different hyperparameters, such as constraints, weights or learning rates, for generalizing different data patterns. Tuning these hyperparameters is necessary so that the model can optimally solve the machine learning problem. Grid search and randomized search are two methods commonly used for hyperparameter tuning.

So, we have listed some of the ways in which you can achieve a trade-off between the two. Bias and variance are related to each other: if you increase one, the other decreases, and vice versa. A good trade-off is an optimal balance between bias and variance that gives us a model that is neither underfit nor overfit. And finally, the ultimate goal of any supervised machine learning algorithm lies in isolating the signal from the dataset and making sure that it eliminates the noise.
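The k-fold procedure and the k-nearest-neighbors adjustment described above can be combined in one short sketch. What follows is an illustration under stated assumptions rather than a reference implementation: it uses only the Python standard library, the noisy sine-wave data is made up, and `knn_predict` and `k_fold_cv_error` are hypothetical helper names. It computes the cross-validation error for several neighbor counts and picks the one with the lowest error, which is the "explore different levels of complexity" strategy in miniature.

```python
import math
import random
import statistics


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def k_fold_cv_error(data, n_folds, k_neighbors):
    """Mean squared error averaged over n_folds holdout folds."""
    # The data below is generated in random order, so striding is enough
    # to form the folds; shuffle first if your data is ordered.
    folds = [data[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for i, holdout in enumerate(folds):
        # Train on the other n_folds - 1 folds, test on the holdout fold.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors = [(knn_predict(train, x, k_neighbors) - y) ** 2
                  for x, y in holdout]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


rng = random.Random(42)
# Hypothetical data: a noisy sine wave (an assumption for this sketch).
xs = [rng.uniform(0.0, 2.0) for _ in range(120)]
data = [(x, math.sin(3 * x) + rng.gauss(0.0, 0.3)) for x in xs]

# Small k: low bias, high variance. Large k: high bias, low variance.
# With 5 folds each training set has 96 points, so k=96 predicts their mean.
cv_errors = {k: k_fold_cv_error(data, n_folds=5, k_neighbors=k)
             for k in (1, 3, 5, 10, 20, 40, 96)}
best_k = min(cv_errors, key=cv_errors.get)

for k, err in cv_errors.items():
    print(f"k={k:>2}  cv_mse={err:.3f}")
print("lowest cross-validation error at k =", best_k)
```

The same loop works for any hyperparameter: swap the neighbor count for a tree depth or a regularization strength and compare the cross-validation errors in the same way.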

Guest Contributor
19 Jan 2018
7 min read

What you need to know about Generative Adversarial Networks

Note: We have come to you with another guest post by Indra den Bakker, an experienced deep learning engineer and a mentor on Udacity for many budding data scientists. Indra has also written one of our best-selling titles, Python Deep Learning Cookbook, which covers solutions to various problems in modeling deep neural networks.

In 2014, we took a significant step in AI with the introduction of Generative Adversarial Networks, better known as GANs, by Ian Goodfellow and his co-authors. The real breakthrough of GANs didn't follow until 2016; however, the original paper includes many novel ideas that would be exploited in the years to come. Deep learning had already revolutionized many industries by achieving above-human performance. However, many critics argued that these deep learning models couldn't compete with human creativity. With the introduction of GANs, Goodfellow showed that these critics could be wrong.

Figure 1: example of style transfer with deep learning

The idea behind GANs is to create new examples based on a training set, for example to demonstrate the ability to create new paintings or new handwritten digits. In GANs, two deep learning models are trained simultaneously, and they compete against each other. One model, called the generator, tries to generate new realistic examples. The other, called the discriminator, tries to classify whether an example originates from the training set or from the generator. In other words, the generator tries to mislead the discriminator by generating new examples. In the figure below we can see the general structure of GANs.

Figure 2: GAN structure with X as training examples and Z as noise input.

GANs are fundamentally different from other machine learning applications. The task of a GAN is unsupervised: we try to extract patterns and structure from data without additional information. Therefore, we don't have a truth label.
GANs shouldn't be confused with autoencoder networks. With autoencoders we know what the output should be: the same as the input. With GANs, however, we try to create new examples that look like the training examples but are different. It's a new way of teaching an agent to learn complex tasks by imitating an "expert". If the generator is able to fool the discriminator, one could argue that the agent has mastered the task (think about the Turing test).

The best way to explain GANs is to use images as an example. The resulting output of GANs can be fascinating. The most used dataset for GANs is the popular MNIST dataset. This dataset has been used in many deep learning papers, including the original Generative Adversarial Nets paper.

Figure 3: example of MNIST training images

Let's say we have a bunch of handwritten digits as input. We want our model to take these examples and create new handwritten digits, that is, to learn how to write digits in such a way that they look handwritten. Note that we don't care which digits the model creates as long as each one looks like one of the digits from 0 to 9. As you may suspect, there is a thin line between generating exact copies of the training set and generating newly created images. We need to make sure that the generator generates new images that follow the distribution of the training examples but are slightly different. This is where the creativity needs to come in. In Figure 2, we've shown that the generator uses noise (random values) as input. This noise is random to make sure that the generator creates different output each time.

Now that we know what we need and what we want to achieve, let's have a closer look at both model architectures. Let's start with the generator. We will feed the generator with random noise: a vector of 100 values randomly drawn between -1 and 1. Next, we stack multiple fully connected layers with the Leaky ReLU activation function.
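As a rough sketch of the generator's forward pass, the plain-Python code below draws the 100-value noise vector, pushes it through two fully connected Leaky ReLU layers, and finishes with a tanh layer of 784 units, the output size and activation the article goes on to describe for 28x28 images. Several details are assumptions for illustration only: the hidden width of 128, the Leaky ReLU slope of 0.2, the weight initialization, and the helper names `dense`, `init_layer` and `generator`. The weights are random and untrained, so the output is just structured noise; a real model would be built and trained with a deep learning framework.

```python
import math
import random


def leaky_relu(values, alpha=0.2):
    """Leaky ReLU: pass positives through, scale negatives by alpha (assumed 0.2)."""
    return [v if v > 0 else alpha * v for v in values]


def dense(values, weights, biases):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def init_layer(rng, n_in, n_out, scale=0.05):
    """Random, untrained weights; the initialization scale is an assumption."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
NOISE_DIM = 100         # from the article: a vector of 100 noise values
HIDDEN = 128            # hypothetical hidden width
IMAGE_PIXELS = 28 * 28  # 784 outputs, one per pixel of a flattened 28x28 image

layers = [
    init_layer(rng, NOISE_DIM, HIDDEN),
    init_layer(rng, HIDDEN, HIDDEN),
    init_layer(rng, HIDDEN, IMAGE_PIXELS),
]


def generator(z):
    """Map a noise vector to a flattened image with values in [-1, 1]."""
    h = z
    for weights, biases in layers[:-1]:
        h = leaky_relu(dense(h, weights, biases))
    out_weights, out_biases = layers[-1]
    return [math.tanh(v) for v in dense(h, out_weights, out_biases)]


# Noise drawn uniformly between -1 and 1, as described in the text.
z = [rng.uniform(-1.0, 1.0) for _ in range(NOISE_DIM)]
fake_image = generator(z)
print(len(fake_image), min(fake_image), max(fake_image))
```

Because a fresh noise vector is drawn for every call, the same generator produces a different image each time, which is the point of feeding it noise.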
Our training images are grayscale and 28x28 pixels in size. Flattened, this means we need an output of 784 units for the final layer of our generator, so that the output of the generator matches the size of the training images. As the activation function for our final layer we will use TanH to make sure the resulting values are squeezed between -1 and 1. The final model architecture of our generator looks as follows:

Figure 4: model architecture of the generator

Next, we define our discriminator model. The most common approach is to use a mirrored version of the generator, with 784 values as input and, as the final layer, a fully connected layer with a single output neuron and a sigmoid activation function for binary classification. Keep in mind that both the generator and discriminator are trained at the same time. The model looks like this:

Figure 5: model architecture of the discriminator

In general, generating new images is the harder task. Therefore, it can sometimes be beneficial to train the generator twice for each step while training the discriminator only once. Another option is to set the learning rate for the discriminator a bit smaller than the learning rate for the generator.

Tracking the performance of GANs can be tricky. Sometimes a lower loss doesn't represent a better output. That's why it's a good idea to output the generated images during the training process. In the following figure we can see the digits generated by a GAN after 20 epochs.

Figure 6: example output of generated MNIST images

As we stated in the introduction, GANs didn't get much traction until 2016. GANs were mostly unstable and hard to train. Small adjustments to the model or training parameters led to unsatisfying results. Advancements in model architecture and other improvements fixed some of the previous limitations and unlocked the real potential of GANs. An important improvement was introduced by Deep Convolutional GANs (DCGANs).
A DCGAN is a network architecture in which both the discriminator and generator are fully convolutional. The output is more stable, especially for datasets with higher translation invariance, like the Fashion MNIST dataset.

Figure 7: example of Fashion MNIST images generated by a Deep Convolutional Generative Adversarial Network (DCGAN)

There is so much more to discover with GANs, and there is huge potential still to be unlocked. According to Yann LeCun, one of the fathers of deep learning, GANs are the most interesting idea in machine learning in the last 10 years. GANs can be used for many different applications, ranging from 3D face generation to upscaling the resolution of images and text-to-image generation. GANs might be the stepping stone we have been waiting for to add creativity to machines.

Author's Bio: Indra den Bakker is an experienced deep learning engineer and mentor on Udacity. He is the founder of 23insights, a machine learning start-up in NVIDIA's Inception program that builds solutions to transform the world's most important industries. For Udacity, he mentors students pursuing a Nanodegree in deep learning and related fields, and he is also responsible for reviewing student projects. Indra has a background in computational intelligence and worked for several years as a data scientist for IPG Mediabrands and Screen6 before founding 23insights.