9 Data Science Myths Debunked

Amey Varangaonkar
03 Jul 2018
9 min read
The benefits of data science are evident for all to see. Not only does it equip you with the tools and techniques to make better business decisions, but the predictive power of analytics also allows you to determine future outcomes, something that can prove crucial to businesses. Despite all these advantages, data science remains a touchy topic for many businesses. It’s worth looking at some glaring stats that show why businesses are reluctant to adopt it:

Poor data across businesses and organizations, both private and government, costs the U.S. economy close to $3 trillion per year. Only 29% of enterprises are able to properly leverage the power of Big Data and derive useful business value from it.

These stats point to a general lack of awareness or knowledge when it comes to data science. Preconceived notions, or simply a lack of knowledge of the field and its applications, seem to be a huge hurdle for these companies. In this article, we attempt to take down some of these notions and give a much clearer picture of what data science really is. Here are 9 of the most common myths and misconceptions about data science, and why they are absolutely wrong.

Data science is just a fad, it won’t last long

This is probably the most common misconception. Many tend to forget that although ‘data science’ is a recently coined term, the field is the culmination of decades of research and innovation in statistical methodologies and tools. It has been in use since the 1960s or even earlier, just at a much smaller scale. Back in the day there were no ‘data scientists’, just statisticians and economists who used now largely forgotten terms such as ‘data fishing’ or ‘data dredging’. Even the terms ‘data analysis’ and ‘data mining’ only went mainstream in the 1990s, though they were in use well before then. Data science’s rise to fame has coincided with the exponential rise in the amount of data being generated every minute.
The need to understand this information and make positive use of it has driven up the demand for data science. Now, with Big Data and the Internet of Things going wild, the rate of data generation and the subsequent need for its analysis will only increase. So if you think data science is a fad that will go away soon, think again.

Data science and Business Intelligence are the same

Those who are unfamiliar with what data science and Business Intelligence actually entail often confuse the two and think they’re one and the same. They’re not. Business Intelligence is an umbrella term for the tools and techniques that answer the operational and contextual questions about your business or organization. Data science, on the other hand, has more to do with collecting information in order to build patterns and insights. Learning about your customers or your audience is Business Intelligence. Understanding why something happened, or whether it will happen again, is data science. If you want to gauge how changing a certain process will affect your business, data science, not Business Intelligence, is what will help you.

Data science is only meant for large organizations with large resources

Many businesses and entrepreneurs are wrongly of the opinion that data science works only, or best, for large organizations, and that you need sophisticated infrastructure to process and get the most value out of your data. In reality, all you need is a group of smart people who know how to get the best value out of the available data. When it comes to taking a data-driven approach, there is no need to invest a fortune in setting up an analytics infrastructure, whatever the scale of your organization. There are many open source tools out there that can easily be leveraged to process large-scale data with efficiency and accuracy. All you need is a good understanding of those tools.
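That last point is easy to underestimate. Even plain Python, with no cluster behind it, can stream-aggregate very large datasets on a single machine, because it never needs to hold everything in memory at once. A minimal, stdlib-only sketch (the log records here are invented for illustration):

```python
from collections import Counter

def count_events(lines):
    """Stream over records one at a time; memory use stays flat
    regardless of how many records flow through."""
    counts = Counter()
    for line in lines:
        event = line.strip().split(",")[0]
        counts[event] += 1
    return counts

# Stand-in for a large log file; in practice this could be
# an open file handle over a multi-gigabyte CSV.
log = iter(["click,user1", "view,user2", "click,user3", "buy,user1"])
print(count_events(log).most_common())
```

The memory footprint grows with the number of distinct event types, not the number of records, which is why modest hardware often goes further than people expect.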
It is difficult to integrate data science systems with the organizational workflow

One critical challenge that technology has now made much easier to overcome is getting several different software systems to work together. With the rise of general-purpose programming languages, it is now possible to build a variety of software systems using a single language. Take Python, for example. You can use it to analyze your data, perform machine learning, or develop neural networks to work on more complex data models, and all the while the web API that communicates with these data science systems can be written in Python too. Provisions are also being made to integrate code written in different programming languages while ensuring smooth interoperability and no added latency. So if you’re wondering how to incorporate your analytics workflow into your organizational workflow, don’t worry too much.

Data scientists will be replaced by Artificial Intelligence soon

Although automation is increasingly being adopted in data science, the notion that an AI algorithm will soon take over the work of a data scientist is a curious one. Currently, there is an acute shortage of data scientists, as this McKinsey Global Report suggests. Could this change in the future? Will automation completely replace human effort when it comes to data science? Machines are surely a lot better than humans at finding patterns; an AI beat the world’s best Go player, remember. This seems to be the common perception, but it is not true. However sophisticated the algorithms become at automating data science tasks, we will always need a capable data scientist to oversee them and fine-tune their performance. Not just that, businesses will always need professionals with strong analytical and problem-solving skills and relevant domain knowledge, and they will always need someone to communicate the insights coming out of the analysis to non-technical stakeholders.
Machines don’t ask questions of data. Machines don’t convince people. Machines don’t understand the ‘why’. Machines don’t have intuition. At least, not yet. Data scientists are here to stay, and their demand is not expected to go down anytime soon.

You need a Ph.D. in statistics to be a data scientist

No, you don’t. Data science involves crunching numbers to get interesting insights, and it often involves the use of statistics to better understand the results. When it comes to performing advanced tasks such as machine learning and deep learning, sure, advanced knowledge of statistics helps. But that does not imply that people without a degree in maths or statistics cannot become expert data scientists. Today, organizations are facing a severe shortage of data professionals capable of leveraging data to get useful business insights. This has led to the rise of citizen data scientists: professionals who are not experts in data science, but who can use data science tools and techniques to create efficient data models. These data scientists are no experts in statistics and maths; they just know the tools inside out, ask the right questions, and have the knowledge needed to turn data into insights.

Having expertise in the data science tools is enough

Many people wrongly think that learning a statistical tool such as SAS, or mastering Python and its associated data science libraries, is enough to earn the data scientist tag. While learning a tool or skill is always helpful (and also essential), it is by no means the only requirement for doing effective data science. One needs to go beyond the tools and also master skills such as non-intuitive thinking, problem-solving, and knowing the correct practical applications of a tool to tackle any given business problem.
Not just that, it requires excellent communication skills to present your insights and findings, even for the most complex of analyses, to other stakeholders in a way they can easily understand and interpret. So if you think that a SAS certification is enough to get you a high-paying data science job and keep it, think again.

You need access to a lot of data to get useful insights

Many small and medium-sized businesses don’t adopt a data science framework because they think it takes lots and lots of data to be able to use analytics tools and techniques. Data in bulk always helps, true, but you don’t need hundreds of thousands of records to identify a pattern or to extract relevant insights. Per IBM, Big Data is characterized by the 4 Vs: Volume, Velocity, Veracity, and Variety. If you are able to model your existing data along these dimensions, it automatically becomes useful and valuable. Volume is important to an extent, but it’s the other three parameters that add the required quality.

More data = more accuracy

Many businesses collect large troves of information and use the modern tools and frameworks at their disposal for analyzing this data. Unfortunately, this does not always guarantee accurate results, nor does it guarantee useful actionable insights or more value. Once the data is collected, a preliminary analysis of what needs to be done with the data is required. Then we use the tools and frameworks at our disposal to extract the relevant insights and build an appropriate data model. These models need to be fine-tuned for the processes in which they will be used; only then do we get the desired degree of accuracy from the model. Data in itself is quite useless. It’s how we work on it, and more precisely how effectively we work on it, that makes all the difference.
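A toy illustration of that last myth: a large but biased sample can estimate a quantity far worse than a small representative one. The numbers below are entirely synthetic, and only the standard library is used:

```python
import random
import statistics

random.seed(0)

# True process: values centred on 50.
population = [random.gauss(50, 5) for _ in range(100_000)]

# Small but representative sample.
small_sample = random.sample(population, 100)

# Large but biased sample: only values above 52 were collected
# (think of a logging pipeline that drops "uninteresting" records).
biased_sample = [x for x in population if x > 52][:10_000]

true_mean = statistics.mean(population)
small_err = abs(statistics.mean(small_sample) - true_mean)
biased_err = abs(statistics.mean(biased_sample) - true_mean)

print(f"error with 100 representative points: {small_err:.2f}")
print(f"error with 10,000 biased points:      {biased_err:.2f}")
```

The large biased sample lands several units away from the true mean, while the small representative one lands far closer: volume alone buys nothing if the collection process is skewed.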
Data science is one of the most popular skills to have on your resume today, but it is important to first clear up the confusion and misconceptions you may have about it. When it comes to leveraging the power of data science within a business, a lack of information or misinformation can do more harm than good, especially considering it could prove to be the differentiating factor between success and failure. Do you agree with our list? Do you think there are any other commonly observed myths around data science that we may have missed? Let us know.

Read more

30 common data science terms explained
Why is data science important?
15 Useful Python Libraries to make your Data Science tasks Easier


8 ways Artificial Intelligence can improve DevOps

Prasad Ramesh
01 Sep 2018
6 min read
DevOps combines development and operations in an agile manner. ITOps refers to network infrastructure, computer operations, and device management. AIOps is artificial intelligence applied to ITOps, a term coined by Gartner. It makes us wonder what AI applied to DevOps would look like. Currently, there are some problem areas in DevOps that mainly revolve around data: accessing the large pool of data, acting on it, managing alerts, and so on. Moreover, there are errors caused by human intervention. AI works heavily with data and can help improve DevOps in numerous ways. Before we get into how AI can improve DevOps, let’s take a look at some of the problem areas in DevOps today.

The trouble with DevOps

Human errors: When testing or deployment is performed manually and there is an error, it is hard to reproduce and fix. Software development is also often outsourced, and in such cases there is a lack of coordination between the dev and ops teams.

Environment inconsistency: Software functionality breaks when code moves to a different environment, because each environment has a different configuration. Teams can waste a lot of time chasing bugs when the software works fine in one environment but not in another.

Change management: Many companies have change management processes well in place, but they are outdated for DevOps. The time taken for reviews, for passing a new module, and so on is manual and proves to be a bottleneck. Changes happen frequently in DevOps, and functioning suffers under the old processes.

Monitoring: Monitoring is key to ensuring smooth functioning in Agile, yet many companies do not have the expertise to monitor the pipeline and infrastructure. Moreover, monitoring only the infrastructure is not enough: application performance also needs to be monitored, solutions need to be logged, and analytics need to be tracked.

Now let’s take a look at 8 ways AI can improve DevOps given the above context.
1. Better data access

One of the most critical issues DevOps teams face is the lack of ready access to data. There is also a large amount of it; teams rarely look at all of the data, focusing instead on outliers. The outliers work only as an indicator and do not give robust information. Artificial intelligence can compile and organize data from multiple sources for repeated use. Organized data is much easier to access and understand than heaps of raw data. This helps with predictive analysis and, eventually, better decision making. It is very important and enables many of the other improvements listed below.

2. Superior implementation efficiency

Artificially intelligent systems can work with minimal or no human intervention. Currently, DevOps teams follow a rules-based environment managed by humans. AI can transform this into self-governed systems and greatly improve operational efficiency. There are limits to the volume and complexity of analysis a human can perform; given the large volumes of data to be analyzed and processed, AI systems, which excel at exactly this, can set optimal rules to maximize operational efficiency.

3. Root cause analysis

Conducting root cause analysis is very important for fixing an issue permanently. Not getting to the root cause allows it to persist and affect other areas further down the line. Often, engineers don’t investigate failures in depth and are more focused on getting the release out, which is not surprising given the limited time they have to work with. If fixing a superficial area gets things working, the root cause is never found. AI can take all the data into account and spot patterns between activity and cause to find the root cause of a failure.

4. Automation

Automation in DevOps is still incomplete: many routine tasks need to be done by humans. An AI model can automate these repeatable tasks and speed up the process significantly.
A well-trained model increases the complexity of the tasks that can be automated by machines. AI can keep human intervention to a minimum so that developers can focus on more complex, interactive problems. Complete automation also allows errors to be reproduced and fixed promptly.

5. Reduce operational complexity

AI can be used to simplify operations by providing a unified view. An engineer can see all the alerts and relevant data produced by the tools in a single place, an improvement over the current scenario in which engineers switch between different tools to manually analyze and correlate data. Alert prioritization, root cause analysis, and evaluating unusual behavior are complex, time-consuming tasks that depend on data, and an organized, singular view is a great benefit when looking up data.

“AI and machine learning makes it possible to get a high-level view of the tool-chain, but at the same time zoom in when it is required.” -SignifAI

6. Predicting failures

A critical failure in a particular tool or area of DevOps can cripple the process and delay cycles. With enough data, machine learning models can predict when an error will occur, and this goes beyond simple predictions: if a past fault is known to produce certain readings, AI can read the patterns and predict the signs of failure, seeing indicators that humans may not be able to. Early failure prediction and notification enable the team to fix the problem before it can affect the software development life cycle (SDLC).

7. Optimizing a specific metric

AI can work towards solutions where uptime is maximized. An adaptive machine learning system can learn how the system works and improve it. Improving could mean tweaking a specific metric in the workflow for optimized performance. Configurations can be changed by AI for optimal performance as required during different production phases. Real-time analysis plays a big part in this.
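The early-warning idea in point 6 can be reduced to its simplest possible form: flag any metric reading that drifts too far from its recent history. This is only a hedged, stdlib-only sketch, not a production anomaly detector, and the memory readings are invented for illustration:

```python
import statistics

def warn_if_anomalous(history, latest, k=3.0):
    """Flag a reading that deviates more than k standard deviations
    from the historical mean of the metric."""
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(latest - mean) > k * sigma

# Hypothetical memory-usage readings (MB) from a healthy service,
# followed by a new reading that might precede an out-of-memory failure.
history = [410, 405, 412, 408, 411, 406, 409, 407, 410, 408]
print(warn_if_anomalous(history, 480))  # large spike: flagged
print(warn_if_anomalous(history, 409))  # within normal range: not flagged
```

Real systems would use rolling windows, seasonality-aware baselines, or learned models, but the principle of comparing new readings against an expected envelope is the same.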
8. Managing alerts

DevOps systems can be flooded with alerts, which are hard for humans to read and act upon. AI can analyze these alerts in real time and categorize them. Assigning priorities to alerts helps teams work on fixing them rather than wading through a long list. The alerts can simply be tagged with a common ID for specific areas, or AI can be trained to classify good and bad alerts. Prioritizing alerts so that flaws are surfaced first, to be fixed first, will help keep things running smoothly.

Conclusion

As we saw, most of these areas depend heavily on data, so getting the system right to enhance data accessibility is the first step to take. Predictions work better when data is organized, and performing root cause analysis is also easier. Automation can take over mundane, repetitive tasks and allow engineers to focus on the more interactive problems that machines cannot handle. With machine learning, the overall operational efficiency, simplicity, and speed of DevOps teams can be improved.

Why Agile, DevOps and Continuous Integration are here to stay: Interview with Nikhil Pathania, DevOps practitioner
Top 7 DevOps tools in 2018
GitLab’s new DevOps solution


Top 5 programming languages for crunching Big Data effectively

Amey Varangaonkar
04 Apr 2018
8 min read
One of the most important decisions Big Data professionals have to make, especially those who are new to the scene or just starting out, is choosing the best programming language for Big Data manipulation and analysis. Understanding the Big Data problem and framing the architecture to solve it is not quite enough these days - the execution needs to be perfect as well, and choosing the right language goes a long way.

The best languages for big data

In this article, we look at five of the most popular - not to mention highly effective - programming languages for developing Big Data solutions.

Scala

A beautiful crossover of the object-oriented and functional programming paradigms, Scala is fast and robust, and a popular choice of language for many Big Data professionals. The fact that two of the most popular Big Data processing frameworks, Apache Spark and Apache Kafka, have been built on top of Scala tells you everything you need to know about its power. Scala runs on the JVM, which means code written in Scala can easily be used within a Java-based Big Data ecosystem. One significant factor that differentiates Scala from Java, though, is that Scala is a lot less verbose: hundreds of lines of confusing-looking Java code can often be expressed in fewer than 15 lines of Scala. One negative aspect of Scala is its steep learning curve compared to languages like Go and Python, which may put off beginners looking to use it.

Why use Scala for big data?

Fast and robust
Suitable for working with Big Data tools like Apache Spark for distributed Big Data processing
JVM compliant, can be used in a Java-based ecosystem

Python

Python was declared one of the fastest growing programming languages of 2018 by the Stack Overflow Developer Survey. Its general-purpose nature means it can be used across a broad spectrum of use-cases, and Big Data programming is one major area of application.
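Part of that appeal is how little code a typical data-manipulation task takes. As an illustration only, here is a stdlib-only group-by-and-aggregate over some made-up session records; libraries like pandas generalize exactly this pattern to millions of rows:

```python
from collections import defaultdict
import statistics

# Hypothetical click-stream records: (country, session_seconds).
sessions = [
    ("IN", 120), ("US", 300), ("IN", 180), ("US", 240),
    ("DE", 90), ("IN", 150), ("DE", 60),
]

# Group sessions by country...
by_country = defaultdict(list)
for country, secs in sessions:
    by_country[country].append(secs)

# ...then aggregate each group: a group-by in a handful of lines.
summary = {c: statistics.mean(v) for c, v in sorted(by_country.items())}
print(summary)
```

The same shape of computation - split, apply, combine - underlies most day-to-day Big Data analysis, which is why such a concise scripting language fits the job so well.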
Many of the libraries for data analysis and manipulation that are increasingly being used in Big Data frameworks to clean and manipulate large chunks of data - such as pandas, NumPy, and SciPy - are Python-based. Not just that: the most popular machine learning and deep learning frameworks, such as scikit-learn and TensorFlow, are also written in Python and are finding increasing application within the Big Data ecosystem. One drawback of Python, and a reason why it is not yet a first-class citizen in Big Data programming, is that it’s slow. Although it is very easy to use, Big Data professionals have found systems built with languages such as Java or Scala to be faster and more robust than systems built with Python. However, Python makes up for this limitation with other qualities. As Python is primarily a scripting language, interactive coding and development of analytical solutions for Big Data becomes very easy. Python can integrate effortlessly with existing Big Data frameworks such as Apache Hadoop and Apache Spark, allowing you to perform predictive analytics at scale without any problem.

Why use Python for big data?

General-purpose
Rich libraries for data analysis and machine learning
Easy to use
Supports iterative development
Rich integration with Big Data tools
Interactive computing through Jupyter notebooks

R

It won’t come as a surprise to many that those who love statistics, love R. The ‘language of statistics’, as it is popularly called, R is used to build data models for effective and accurate data analysis. Powered by a large repository of R packages (CRAN, the Comprehensive R Archive Network), with R you have just about every type of tool to accomplish any task in Big Data processing - from analysis to data visualization. R can be integrated seamlessly with Apache Hadoop and Apache Spark, among other popular frameworks, for Big Data processing and analytics.
One issue with using R as a programming language for Big Data is that it is not very general-purpose: code written in R is not production-deployable and generally has to be translated into some other programming language such as Python or Java. That said, if your goal is only to build statistical models for Big Data analytics, R is an option you should definitely consider.

Why use R for big data?

Built for data science
Support for Hadoop and Spark
Strong statistical modeling and visualization capabilities
Support for Jupyter notebooks

Java

Then there’s always good old Java. Some of the traditional Big Data frameworks, such as Apache Hadoop and the tools within its ecosystem, are Java-based and still in use today in many enterprises. Not to mention the fact that Java is the most stable and production-ready language of all the languages we have discussed so far! Using Java to develop your Big Data applications gives you the ability to use a large ecosystem of tools and libraries for interoperability, monitoring, and much more, most of which have already been tried and tested. One major drawback of Java is its verbosity: the fact that you have to write hundreds of lines of code for a task that can be written in barely 15-20 lines of Python or Scala can turn off many budding programmers. However, the introduction of lambda expressions in Java 8 does make life quite a bit easier. Java also does not support iterative development the way newer languages like Python do, and this is an area of focus for future Java releases. Despite these flaws, Java remains a strong contender for the preferred Big Data programming language because of its history and the continued reliance on traditional Big Data tools and frameworks.

Why use Java for big data?
Traditional Big Data tools and frameworks are written in Java
Stable and production-ready
Large ecosystem of tried and tested tools and libraries

Go

Last but not least, there’s Go - one of the fastest rising programming languages in recent times. Designed by a group of Google engineers who were frustrated with C++, we think Go is a good shout in this list simply because it powers so many tools in the Big Data infrastructure, including Kubernetes and Docker. Go is fast, easy to learn, and fairly easy to develop applications with, not to mention deploy them. More importantly, as businesses look at building data analysis systems that can operate at scale, Go-based systems are being used to integrate machine learning and parallel processing of data. It is also possible to interface other languages with Go-based systems with relative ease.

Why use Go for big data?

Fast, easy to use
Many tools used in the Big Data infrastructure are Go-based
Efficient distributed computing

There are a few other languages you might want to consider - Julia, SAS, and MATLAB being some major ones that are useful in their own right. However, when compared to the languages discussed above, we thought they fell a bit short in some aspect - be it speed, efficiency, ease of use, documentation, or community support, among other things.

Let’s take a quick look at a comparison of all the languages discussed above. Note that the ✓ symbol marks the best language(s) on each criterion. This is just our view, and that’s not to say that the other languages are any worse!

                                   Scala   Python   R      Java   Go
Speed                              ✓                       ✓      ✓
Ease of use                                ✓        ✓             ✓
Quick learning curve                       ✓                      ✓
Data analysis capability           ✓       ✓        ✓
General-purpose                    ✓       ✓               ✓      ✓
Big Data support                   ✓       ✓        ✓      ✓      ✓
Interfacing with other languages   ✓                       ✓      ✓
Production-ready                   ✓                       ✓      ✓

So...which language should you choose?
To answer the question in short - it all depends on the use-case you want to develop. If your focus is hardcore data analysis which involves a lot of statistical computing, R would be your go-to language. On the other hand, if you want to develop streaming applications for your Big Data, Scala can be a preferable choice. If you wish to use Machine Learning to leverage your Big Data and build predictive models, Python will come to your rescue. Lastly, if you plan to build Big Data solutions using just the traditionally-available tools, Java is the language for you. You also have the option of combining the power of two languages to get a more efficient and powerful solution. For example, you can train your machine learning model in Python and deploy it on Spark in a distributed mode. Ultimately, it all depends on how efficiently your solution can function, and more importantly, how fast and accurate it is. Which language do you prefer for crunching your Big Data? Do let us know!


Top 5 cybersecurity assessment tools for networking professionals

Savia Lobo
07 Jun 2018
6 min read
Security is one of the major concerns when setting up data centers in the cloud. Although most organizations deploy firewalls and managed networking components for their data centers, they still fear being attacked by intruders. As such, organizations constantly seek tools that can assist them in gauging how vulnerable their network is and how they can secure their applications. Many confuse security assessment with penetration testing and use the terms interchangeably. However, there is a notable difference between the two. Security assessment is the process of finding the different vulnerabilities within a system and prioritizing them based on severity and business criticality. Penetration testing, on the other hand, simulates a real-life attack and maps out paths that a real attacker would take to carry out the attack. You can check out our article, Top 5 penetration testing tools for ethical hackers, to learn about some of the pentesting tools. A plethora of tools exists on the market, and every tool claims to be the best. Here is our top 5 list of tools to secure your organization over the network.

Wireshark

Wireshark is one of the most popular tools for packet analysis. It is open source under the GNU General Public License. Wireshark has a user-friendly GUI and also supports a command-line interface. It is a great debugging tool for developers who wish to build network applications, and it runs on multiple platforms including Windows, Linux, Solaris, and NetBSD. The Wireshark community also hosts SharkFest, launched in 2008, for Wireshark developers and the user community. The main aim of this conference is to support Wireshark development and to educate current and future generations of computer science and IT professionals on how to use the tool to manage, troubleshoot, diagnose, and secure traditional and modern networks.

Some benefits of using this tool include:

Wireshark features live real-time traffic analysis and also supports offline analysis.
Depending on the platform, one can read live data from Ethernet, PPP/HDLC, USB, IEEE 802.11, Token Ring, and many others.
Decryption support for several protocols such as IPsec, ISAKMP, Kerberos, SNMPv3, SSL/TLS, WEP, and WPA/WPA2.
Captured network data can be browsed via the GUI, or via the TTY-mode TShark utility, a network protocol analyzer for working with packets without a UI.
Wireshark also has the most powerful display filters in the industry.

Nmap

Network Mapper, popularly known as Nmap, is an open source tool for network discovery and security auditing. It is also used for tasks such as network inventory management, monitoring host or service uptime, and much more. Nmap works by using raw IP packets to find out which hosts are available on the network, the services they offer, the OS they are running, the firewall they are using, and much more. Nmap is essential for quickly scanning large networks, and it can also be used to scan single hosts. It runs on all major operating systems and provides official binary packages for Windows, Linux, and Mac OS X. It also includes:

Zenmap - an advanced security scanner GUI and results viewer
Ncat - a tool used for data transfer, redirection, and debugging
Ndiff - a utility for comparing scan results
Nping - a packet generation and response analysis tool

Nmap is traditionally a command-line tool run from a Unix shell or the Windows Command Prompt. This makes Nmap easy to script and allows easy sharing of useful commands within the user community, so experts do not have to move through different configuration panels and scattered option fields.

Nessus

Nessus, a product of Tenable, is one of the most popular vulnerability scanners, particularly for UNIX systems. The tool is constantly updated and has more than 70,000 plugins. Nessus is available in both free and paid versions.
The paid version costs around $2,190 per year, whereas the free version, ‘Nessus Home’, offers limited usage and is licensed only for home network use. Customers choose Nessus because:

It includes simple steps for policy creation and needs just a few clicks to scan an entire corporate network.
It offers vulnerability scanning at a low total cost of ownership (TCO).
One can carry out quick and accurate scanning with fewer false positives.
It has an embedded scripting language that lets users write their own plugins and understand the existing ones.

QualysGuard

QualysGuard is a well-known SaaS (Software-as-a-Service) vulnerability management tool. It has a comprehensive vulnerability knowledge base, which allows it to provide continuous protection against the latest worms and security threats. It proactively monitors all the network access points, so security managers can spend less time researching, scanning, and fixing network vulnerabilities. This helps organizations avoid network vulnerabilities before they can be exploited. It provides a detailed technical analysis of threats via powerful, easy-to-read reports. The detailed report includes the security threat, the consequences faced if the vulnerability is exploited, and a recommended solution for fixing the vulnerability. One can get a summary of overall security with QualysGuard’s executive dashboard, which displays the number of new, active, and re-opened vulnerabilities, as well as a graph of vulnerabilities by severity level. Get to know more about QualysGuard on its official website.

Core Impact

Core Impact is widely used as a comprehensive tool to assess and test security vulnerabilities within any organization. It includes a large, regularly updated database of professional exploits. It can cleanly exploit one machine and then create an encrypted tunnel through it to exploit other machines.
Core Impact provides a controlled environment to mimic real attacks, which helps one secure a network before an actual attack occurs. One interesting feature of Core Impact is that one can fully test a network, regardless of its size, quickly and efficiently. These are five popular tools network security professionals use for assessing their networks. However, there are many other tools, such as Netsparker, OpenVAS, and Nikto, for assessing the security of a network. Every security assessment tool is unique in its own way, but it all boils down to one's own expertise and experience, and the kind of project environment in which the tool is used.

Top 5 penetration testing tools for ethical hackers
Intel's Spectre variant 4 patch impacts CPU performance
Pentest tool in focus: Metasploit

Julian Ursell
29 Jan 2015
3 min read

Promising DevOps Projects

The DevOps movement is currently driving a wave of innovations in technology, which are contributing to the development of powerful systems and software development architectures, as well as generating a transformation in "systems thinking". Game changers like Docker have revolutionized the way system engineers, administrators, and application developers approach their jobs, and there is now a concerted push to iterate on the new paradigms that have emerged. The maturing of container virtualization methods is producing a different perspective on service infrastructures, enabling a degree of modularity and granularity not so imaginable a decade ago. Powerful configuration management tools such as Chef, Puppet, and Ansible allow infrastructure to be defined literally as code. The flame of innovation is burning brightly in this space, and the concept of the "DevOps engineer" is becoming a reality, not the idealistic myth it appeared to be before.

Now that DevOps practitioners know roughly where they're going, a feverish development drive is gathering pace, with projects looking to build upon the flagship technologies that contributed to the initial spark. The next few years are going to be fascinating in terms of seeing how the DevOps foundations laid down will be built upon moving forward.

The major foundation of modern DevOps development is the refinement of the concept and implementation of containerization. Docker has demonstrated how it can be leveraged to host, run, and deploy applications, servers, and services in an incredibly lightweight fashion, abstracting resources by isolating parts of the operating system in separate containers. The sea change in thinking this has created has been resounding. Still, however, a particular challenge for DevOps engineers working at scale with containers is developing effective orchestration services.
Enter Kubernetes (apparently meaning "helmsman" in Greek), the project open sourced by Google for the orchestration and management of container clusters. The value of Kubernetes is that it works alongside Docker, building beyond simply booting containers to allow a finer degree of management and monitoring. It utilizes units called "pods" that facilitate communication and data sharing between Docker containers and the grouping of application-specific containers. The Docker project has actually taken the orchestration service Fig under its wing for further development, but there are a myriad of ways in which containers can be orchestrated. Kubernetes illustrates how the wave of DevOps-oriented technologies like Docker is driving large-scale companies to open source their own solutions, and to contribute to the spirit of open source collaboration that underlines the movement.

The influence of DevOps can also be seen in the reappraisal of operating system architectures. CoreOS, for example, is a Linux distribution that has been designed with scale, flexibility, and lightweight resource consumption in mind. It hosts applications as Docker containers, and makes the development of large-scale distributed systems easier by being "natively" clustered, meaning it is adapted naturally for use over multiple machines. Under the hood it offers powerful tools, including Fleet (CoreOS's cluster orchestration system) and etcd for service discovery and information sharing between cluster nodes.

A tool to watch out for in the future is Terraform (built by the same team behind Vagrant), which offers at its core the ability to build infrastructures with combined resources from multiple service providers, such as Digital Ocean, AWS, and Heroku, describing this infrastructure as code with an abstracted configuration syntax. It will be fascinating to see whether Terraform catches on and becomes opened up to a greater mass of major service providers.
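To give a flavor of what that abstracted configuration syntax looks like, here is a rough, hypothetical sketch of a Terraform configuration for a single Digital Ocean server; the variable name, image slug, region, and size are illustrative assumptions, not details from the article:

```hcl
# Hypothetical sketch: one web server on Digital Ocean,
# described entirely as code. Values are illustrative only.
provider "digitalocean" {
  token = "${var.do_token}"
}

resource "digitalocean_droplet" "web" {
  image  = "ubuntu-14-04-x64"
  name   = "web-1"
  region = "nyc2"
  size   = "512mb"
}
```

Running `terraform apply` against a file like this would reconcile the provider's real infrastructure with the declared state, which is the core idea behind describing infrastructure as code.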
Kubernetes, CoreOS, and Terraform all convey the immense development pull generated by the DevOps movement, and one that looks set to roll on for some time yet.

Sugandha Lahoti
07 Oct 2018
9 min read

What role does Linux play in securing Android devices?

In this article, we will talk about the Android model, particularly the Linux kernel layer, over which Android is built. We will also talk about Android's security features and offerings and how Linux plays a role in securing Android OS. This article is taken from the book Practical Mobile Forensics - Third Edition by Rohit Tamma et al. In this book, you will investigate, analyze, and report on iOS, Android, and Windows devices.

The Android architecture

Android is open source and the code is released under the Apache license. Practically, this means anyone (especially device manufacturers) can access it, freely modify it, and use the software according to the requirements of any device. This is one of the primary reasons for its wide acceptance. Notable players that use Android include Samsung, HTC, Sony, and LG. As with any other platform, Android consists of a stack of layers running one above the other. To understand the Android ecosystem, it's essential to have a basic understanding of what these layers are and what they do. The following figure summarizes the various layers involved in the Android software stack (figure: Android architecture). Each of these layers performs several operations that support specific operating system functions. Each layer provides services to the layers lying on top of it.

The Linux kernel layer

Android OS is built on top of the Linux kernel, with some architectural changes made by Google. There are several reasons for choosing the Linux kernel. Most importantly, Linux is a portable platform that can be compiled easily on different hardware. The kernel acts as an abstraction layer between the software and hardware present on the device. Consider the case of a camera click. What happens when you take a photo using the camera button on your device? At some point, the hardware instruction (pressing a button) has to be converted to a software instruction (to take a picture and store it in the gallery).
The kernel contains drivers to facilitate this process. When the user presses the button, the instruction goes to the corresponding camera driver in the kernel, which sends the necessary commands to the camera hardware, similar to what occurs when a key is pressed on a keyboard. In simple words, the drivers in the kernel control the underlying hardware. The Linux kernel is responsible for managing the core functionality of Android, such as process management, memory management, security, and networking. Linux is a proven platform when it comes to security and process management. Android has leveraged the existing open source Linux OS to build a solid foundation for its ecosystem. Each version of Android has a different version of the underlying Linux kernel. The Marshmallow Android version is known to use Linux kernel 3.18.10, whereas the Nougat version is known to use Linux kernel 4.4.1.

Android security

Android was designed with a specific focus on security. Android as a platform offers and enforces certain features that safeguard the user data present on the mobile through multi-layered security. There are certain safe defaults that will protect the user, and certain offerings that can be leveraged by the development community to build secure applications. The following are issues that are to be kept in mind while incorporating Android security controls:

Protecting user-related data
Safeguarding the system resources
Making sure that one application cannot access the data of another application

The next few sections will help us understand more about Android's security features and offerings.

Secure kernel

Linux has evolved as a trusted platform over the years, and Android has leveraged this fact by using it as its kernel. The user-based permission model of Linux has, in fact, worked well for Android. As mentioned earlier, there is a lot of Android-specific code built into the Linux kernel.
With each Android version release, the kernel version has also changed. The following table shows Android versions and their corresponding kernel versions:

Android version    Linux kernel version
1.0                2.6.25
1.5                2.6.27
1.6                2.6.29
2.2                2.6.32
2.3                2.6.35
3.0                2.6.36
4.0                3.0.1
4.1                3.0.31
4.2                3.4.0
4.3                3.4.39
4.4                3.8
5.0                3.16.1
6.0                3.18.1
7.0                4.4.1

The permission model

As shown in the following screenshot, any Android application must be granted permissions by the user to access sensitive functionality, such as the internet, dialer, and so on. This provides an opportunity for the user to know in advance which functions on the device are being accessed by the application. Simply put, an application requires the user's permission to perform any kind of potentially harmful activity (stealing data, compromising the system, and so on). This model helps the user to prevent attacks, but if the user is unaware and grants a lot of permissions, it leaves them in trouble (remember, when it comes to installing malware on any device, the weakest link is always the user). Until Android 6.0, users needed to grant permissions at install time: they had to either accept all the permissions or not install the application. Starting from Android 6.0, users grant permissions to apps while the app is running. This new permission system also gives the user more control over the app's functionality by allowing the user to grant selective permissions. For example, a user can deny a particular app access to his location but provide access to the internet. The user can revoke the permissions at any time by going to the app's Settings screen.

Application sandbox

In Linux systems, each user is assigned a unique user ID (UID), and users are segregated so that one user cannot access the data of another user. However, all applications under a particular user are run with the same privileges. Similarly, in Android, each application runs as a unique user.
In other words, each application is assigned a UID and runs as a separate process. This concept ensures an application sandbox at the kernel level. The kernel manages the security restrictions between the applications by making use of existing Linux concepts, such as UID and GID. If an application attempts to do something malicious, say to read the data of another application, this is not permitted as the application does not have the required user privileges. Hence, the operating system protects an application from accessing the data of another application.

Secure inter-process communication

Android offers secure inter-process communication through which an activity in one application can send messages to an activity in the same application or a different application. To achieve this, Android provides inter-process communication (IPC) mechanisms: intents, services, content providers, and so on.

Application signing

It is mandatory that all installed applications are digitally signed. Developers can place their applications in Google's Play Store only after signing them. The private key with which the application is signed is held by the developer. Using the same key, a developer can provide updates to their application, share data between applications, and so on.

Security-Enhanced Linux

Security-Enhanced Linux (SELinux) is a security feature that was introduced in Android 4.3 and fully enforced in Android 5.0. Until this addition, Android security was based on Discretionary Access Control (DAC), which means applications can ask for permissions, and users can grant or deny those permissions. Thus, malware can create havoc on phones by gaining those permissions. SE Android, however, uses Mandatory Access Control (MAC), which ensures that applications work in isolated environments. Hence, even if a user installs a malware app, the malware cannot access the OS and corrupt the device.
SELinux is used to enforce MAC over all processes, including those running with root privileges. SELinux operates on the principle of default denial: anything that is not explicitly allowed is denied. SELinux can operate in one of two global modes: permissive mode, in which permission denials are logged but not enforced, and enforcing mode, in which denials are both logged and enforced.

Full Disk Encryption

With Android 6.0 Marshmallow, Google has mandated Full Disk Encryption (FDE) for most devices, provided that the hardware meets certain minimum standards. Encryption is the process of converting data into cipher text using a secret key. On Android devices, full disk encryption refers to the process of encrypting all user data using a secret key. This key is then encrypted by the lock screen PIN/pattern/password before being securely stored in a trusted location. Once a device is encrypted, all user-created data is automatically encrypted before being written to disk, and all reads automatically decrypt data before returning it to the calling process. Full disk encryption in Android works only with an Embedded Multimedia Card (eMMC) and similar flash devices that present themselves to the kernel as block devices. Starting from Android 7.x, Google decided to shift the encryption feature from full-disk encryption to file-based encryption. In file-based encryption, different files are encrypted with different keys. By doing so, those files can be unlocked independently without requiring an entire partition to be decrypted at once. As a result, the system can now decrypt and use the files needed to boot the system, and open notifications, without having to wait until the user unlocks the phone.

Trusted Execution Environment

Trusted Execution Environment (TEE) is an isolated area (typically a separate microprocessor) intended to guarantee the security of data stored inside it, and also to execute code with integrity.
The main processor on mobile devices is considered untrusted and cannot be used to store secret data (such as cryptographic keys). Hence, the TEE is used specifically to perform such operations, and the software running on the main processor delegates any operations that require the use of secret data to the TEE processor. Thus, we talked about the Linux kernel layer, over which Android is built. We also talked about Android's security features and offerings and how Linux plays a role in securing Android OS. To learn more about methods for accessing the data stored on Android devices, read our book Practical Mobile Forensics - Third Edition.

The kernel community attempting to make Linux more secure
Google open sources Filament – a physically based rendering engine for Android, Windows, Linux and macOS
Google becomes a new platinum member of the Linux Foundation
Fatema Patrawala
10 Sep 2018
15 min read

6 most commonly used Java Machine learning libraries

There are over 70 Java-based open source machine learning projects listed on the MLOSS.org website and probably many more unlisted projects live at university servers, GitHub, or Bitbucket. In this article, we will review the major machine learning libraries and platforms in Java, the kind of problems they can solve, the algorithms they support, and the kind of data they can work with. This article is an excerpt taken from Machine learning in Java, written by Bostjan Kaluza and published by Packt Publishing Ltd. Weka Weka, which is short for Waikato Environment for Knowledge Analysis, is a machine learning library developed at the University of Waikato, New Zealand, and is probably the most well-known Java library. It is a general-purpose library that is able to solve a wide variety of machine learning tasks, such as classification, regression, and clustering. It features a rich graphical user interface, command-line interface, and Java API. You can check out Weka at http://www.cs.waikato.ac.nz/ml/weka/. At the time of writing this book, Weka contains 267 algorithms in total: data pre-processing (82), attribute selection (33), classification and regression (133), clustering (12), and association rules mining (7). Graphical interfaces are well-suited for exploring your data, while Java API allows you to develop new machine learning schemes and use the algorithms in your applications. Weka is distributed under GNU General Public License (GNU GPL), which means that you can copy, distribute, and modify it as long as you track changes in source files and keep it under GNU GPL. You can even distribute it commercially, but you must disclose the source code or obtain a commercial license. In addition to several supported file formats, Weka features its own default data format, ARFF, to describe data by attribute-data pairs. It consists of two parts. 
The first part contains the header, which specifies all the attributes (that is, features) and their type; for instance, nominal, numeric, date, and string. The second part contains the data, where each line corresponds to an instance. The last attribute in the header is implicitly considered as the target variable, and missing data are marked with a question mark. For example, the Bob instance written in the ARFF file format would be as follows:

@RELATION person_dataset

@ATTRIBUTE `Name`  STRING
@ATTRIBUTE `Height`  NUMERIC
@ATTRIBUTE `Eye color`  {blue, brown, green}
@ATTRIBUTE `Hobbies`  STRING

@DATA
'Bob', 185.0, blue, 'climbing, sky diving'
'Anna', 163.0, brown, 'reading'
'Jane', 168.0, ?, ?

The file consists of three sections. The first section starts with the @RELATION <String> keyword, specifying the dataset name. The next section starts with the @ATTRIBUTE keyword, followed by the attribute name and type. The available types are STRING, NUMERIC, DATE, and a set of categorical values. The last attribute is implicitly assumed to be the target variable that we want to predict. The last section starts with the @DATA keyword, followed by one instance per line. Instance values are separated by commas and must follow the same order as the attributes in the second section. Weka's Java API is organized in the following top-level packages:

weka.associations: These are data structures and algorithms for association rules learning, including Apriori, predictive apriori, FilteredAssociator, FP-Growth, Generalized Sequential Patterns (GSP), Hotspot, and Tertius.
weka.classifiers: These are supervised learning algorithms, evaluators, and data structures.
The package is further split into the following components:

weka.classifiers.bayes: This implements Bayesian methods, including naive Bayes, Bayes net, Bayesian logistic regression, and so on
weka.classifiers.evaluation: These are supervised evaluation algorithms for nominal and numerical prediction, such as evaluation statistics, confusion matrix, ROC curve, and so on
weka.classifiers.functions: These are regression algorithms, including linear regression, isotonic regression, Gaussian processes, support vector machine, multilayer perceptron, voted perceptron, and others
weka.classifiers.lazy: These are instance-based algorithms such as k-nearest neighbors, K*, and lazy Bayesian rules
weka.classifiers.meta: These are supervised learning meta-algorithms, including AdaBoost, bagging, additive regression, random committee, and so on
weka.classifiers.mi: These are multiple-instance learning algorithms, such as citation k-nn, diverse density, MI AdaBoost, and others
weka.classifiers.rules: These are decision tables and decision rules based on the separate-and-conquer approach, Ripper, Part, Prism, and so on
weka.classifiers.trees: These are various decision tree algorithms, including ID3, C4.5, M5, functional tree, logistic tree, random forest, and so on

weka.clusterers: These are clustering algorithms, including k-means, CLOPE, Cobweb, DBSCAN, hierarchical clustering, and FarthestFirst.
weka.core: These are various utility classes, data presentations, configuration files, and so on.
weka.datagenerators: These are data generators for classification, regression, and clustering algorithms.
weka.estimators: These are various data distribution estimators for discrete/nominal domains, conditional probability estimations, and so on.
weka.experiment: These are a set of classes supporting necessary configuration, datasets, model setups, and statistics to run experiments.
weka.filters: These are attribute-based and instance-based selection algorithms for both supervised and unsupervised data preprocessing.
weka.gui: These are graphical interfaces implementing the Explorer, Experimenter, and Knowledge Flow applications. Explorer allows you to investigate datasets and algorithms, as well as their parameters, and to visualize datasets with scatter plots and other visualizations. Experimenter is used to design batches of experiments, but it can only be used for classification and regression problems. Knowledge Flow implements a visual drag-and-drop user interface to build data flows, for example: load data, apply filter, build classifier, and evaluate.

Java-ML for machine learning

Java machine learning library, or Java-ML, is a collection of machine learning algorithms with a common interface for algorithms of the same type. It only features a Java API, and it is therefore primarily aimed at software engineers and programmers. Java-ML contains algorithms for data preprocessing, feature selection, classification, and clustering. In addition, it features several Weka bridges to access Weka's algorithms directly through the Java-ML API. It can be downloaded from http://java-ml.sourceforge.net; the latest release was in 2012 (at the time of writing this book). Java-ML is also a general-purpose machine learning library. Compared to Weka, it offers more consistent interfaces and implementations of recent algorithms that are not present in other packages, such as an extensive set of state-of-the-art similarity measures and feature-selection techniques, for example, dynamic time warping, random forest attribute evaluation, and so on. Java-ML is also available under the GNU GPL license. Java-ML supports any type of file as long as it contains one data sample per line and the features are separated by a symbol such as a comma, semicolon, or tab.
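To make that expected input format concrete, here is a small, self-contained sketch (not part of the Java-ML API itself) that parses one delimiter-separated sample per line into a feature array and a class label, assuming the label is the last column:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of parsing the file format Java-ML expects:
// one sample per line, features separated by a delimiter, with the
// class label assumed to be the last column. Not part of Java-ML.
public class SampleParser {

    // Parses the numeric feature columns of one line,
    // skipping the final label column.
    public static double[] parseFeatures(String line, String delimiter) {
        String[] parts = line.trim().split(delimiter);
        double[] features = new double[parts.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(parts[i].trim());
        }
        return features;
    }

    // Returns the final column of one line as the class label.
    public static String parseLabel(String line, String delimiter) {
        String[] parts = line.trim().split(delimiter);
        return parts[parts.length - 1].trim();
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("5.1,3.5,1.4,0.2,Iris-setosa");
        lines.add("6.2,2.9,4.3,1.3,Iris-versicolor");
        for (String line : lines) {
            double[] x = parseFeatures(line, ",");
            String y = parseLabel(line, ",");
            System.out.println(x.length + " features, label=" + y);
        }
    }
}
```

A real Java-ML program would hand such parsed values to its `Instance` and `Dataset` classes, but the sketch shows why the one-sample-per-line, single-delimiter convention keeps the loader trivially simple.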
The library is organized around the following top-level packages:

net.sf.javaml.classification: These are classification algorithms, including naive Bayes, random forests, bagging, self-organizing maps, k-nearest neighbors, and so on
net.sf.javaml.clustering: These are clustering algorithms such as k-means, self-organizing maps, spatial clustering, Cobweb, AQBC, and others
net.sf.javaml.core: These are classes representing instances and datasets
net.sf.javaml.distance: These are algorithms that measure instance distance and similarity, for example, Chebyshev distance, cosine distance/similarity, Euclidean distance, Jaccard distance/similarity, Mahalanobis distance, Manhattan distance, Minkowski distance, Pearson correlation coefficient, Spearman's footrule distance, dynamic time warping (DTW), and so on
net.sf.javaml.featureselection: These are algorithms for feature evaluation, scoring, selection, and ranking, for instance, gain ratio, ReliefF, Kullback-Leibler divergence, symmetrical uncertainty, and so on
net.sf.javaml.filter: These are methods for manipulating instances by filtering, removing attributes, setting classes or attribute values, and so on
net.sf.javaml.matrix: This implements in-memory or file-based arrays
net.sf.javaml.sampling: This implements sampling algorithms to select a subset of a dataset
net.sf.javaml.tools: These are utility methods for datasets, instance manipulation, serialization, the Weka API interface, and so on
net.sf.javaml.utils: These are utility methods for algorithms, for example, statistics, math methods, contingency tables, and others

Apache Mahout

The Apache Mahout project aims to build a scalable machine learning library. It is built atop scalable, distributed architectures, such as Hadoop, using the MapReduce paradigm, which is an approach for processing and generating large datasets with a parallel, distributed algorithm using a cluster of servers.
Mahout features a console interface and a Java API to scalable algorithms for clustering, classification, and collaborative filtering. It is able to solve three business problems: item recommendation, for example, recommending items such as people who liked this movie also liked…; clustering, for example, grouping text documents into sets of topically-related documents; and classification, for example, learning which topic to assign to an unlabeled document. Mahout is distributed under the commercially-friendly Apache License, which means that you can use it as long as you keep the Apache license included and display it in your program's copyright notice. Mahout features the following libraries:

org.apache.mahout.cf.taste: These are collaborative filtering algorithms based on user-based and item-based collaborative filtering and matrix factorization with ALS
org.apache.mahout.classifier: These are in-memory and distributed implementations, including logistic regression, naive Bayes, random forest, hidden Markov models (HMM), and multilayer perceptron
org.apache.mahout.clustering: These are clustering algorithms such as canopy clustering, k-means, fuzzy k-means, streaming k-means, and spectral clustering
org.apache.mahout.common: These are utility methods for algorithms, including distances, MapReduce operations, iterators, and so on
org.apache.mahout.driver: This implements a general-purpose driver to run the main methods of other classes
org.apache.mahout.ep: This is evolutionary optimization using the recorded-step mutation
org.apache.mahout.math: These are various math utility methods and implementations in Hadoop
org.apache.mahout.vectorizer: These are classes for data presentation, manipulation, and MapReduce jobs

Apache Spark

Apache Spark, or simply Spark, is a platform for large-scale data processing built atop Hadoop, but, in contrast to Mahout, it is not tied to the MapReduce paradigm.
Instead, it uses in-memory caches to extract a working set of data, process it, and repeat the query. This is reported to be up to ten times as fast as a Mahout implementation that works directly with disk-stored data. It can be grabbed from https://spark.apache.org. There are many modules built atop Spark, for instance, GraphX for graph processing, Spark Streaming for processing real-time data streams, and MLlib, a machine learning library featuring classification, regression, collaborative filtering, clustering, dimensionality reduction, and optimization. Spark's MLlib can use a Hadoop-based data source, for example, Hadoop Distributed File System (HDFS) or HBase, as well as local files. The supported data types include the following:

Local vector is stored on a single machine. A dense vector is presented as an array of double-typed values, for example, (2.0, 0.0, 1.0, 0.0); while a sparse vector is presented by the size of the vector, an array of indices, and an array of values, for example, [4, (0, 2), (2.0, 1.0)].
Labeled point is used for supervised learning algorithms and consists of a local vector labeled with a double-typed class value. The label can be a class index, a binary outcome, or a list of multiple class indices (multiclass classification). For example, a labeled dense vector is presented as [1.0, (2.0, 0.0, 1.0, 0.0)].
Local matrix stores a dense matrix on a single machine. It is defined by matrix dimensions and a single double array arranged in column-major order.
Distributed matrix operates on data stored in Spark's Resilient Distributed Dataset (RDD), which represents a collection of elements that can be operated on in parallel.
There are three presentations: row matrix, where each row is a local vector that can be stored on a single machine and row indices are meaningless; indexed row matrix, which is similar to a row matrix, but the row indices are meaningful, that is, rows can be identified and joins can be executed; and coordinate matrix, which is used when a row cannot be stored on a single machine and the matrix is very sparse. Spark's MLlib API provides interfaces to various learning algorithms and utilities, as outlined in the following list:

org.apache.spark.mllib.classification: These are binary and multiclass classification algorithms, including linear SVMs, logistic regression, decision trees, and naive Bayes
org.apache.spark.mllib.clustering: This is k-means clustering
org.apache.spark.mllib.linalg: These are data presentations, including dense vectors, sparse vectors, and matrices
org.apache.spark.mllib.optimization: These are the various optimization algorithms used as low-level primitives in MLlib, including gradient descent, stochastic gradient descent, update schemes for distributed SGD, and limited-memory BFGS
org.apache.spark.mllib.recommendation: This is model-based collaborative filtering implemented with alternating least squares matrix factorization
org.apache.spark.mllib.regression: These are regression learning algorithms, such as linear least squares, decision trees, Lasso, and Ridge regression
org.apache.spark.mllib.stat: These are statistical functions for samples in sparse or dense vector format to compute the mean, variance, minimum, maximum, counts, and nonzero counts
org.apache.spark.mllib.tree: This implements classification and regression decision tree-learning algorithms
org.apache.spark.mllib.util: These are a collection of methods to load, save, preprocess, generate, and validate the data

Deeplearning4j

Deeplearning4j, or DL4J, is a deep-learning library written in Java.
It features a distributed as well as a single-machine deep-learning framework that includes and supports various neural network structures such as feedforward neural networks, RBM, convolutional neural nets, deep belief networks, autoencoders, and others. DL4J can solve distinct problems, such as identifying faces, voices, spam, or e-commerce fraud. Deeplearning4j is distributed under the Apache 2.0 license and can be downloaded from http://deeplearning4j.org. The library is organized as follows:

org.deeplearning4j.base: These are loading classes
org.deeplearning4j.berkeley: These are math utility methods
org.deeplearning4j.clustering: This is the implementation of k-means clustering
org.deeplearning4j.datasets: This is dataset manipulation, including import, creation, iterating, and so on
org.deeplearning4j.distributions: These are utility methods for distributions
org.deeplearning4j.eval: These are evaluation classes, including the confusion matrix
org.deeplearning4j.exceptions: This implements exception handlers
org.deeplearning4j.models: These are supervised learning algorithms, including deep belief network, stacked autoencoder, stacked denoising autoencoder, and RBM
org.deeplearning4j.nn: These are implementations of components and algorithms based on neural networks, such as neural network, multi-layer network, convolutional multi-layer network, and so on
org.deeplearning4j.optimize: These are neural net optimization algorithms, including back propagation, multi-layer optimization, output layer optimization, and so on
org.deeplearning4j.plot: These are various methods for rendering data
org.deeplearning4j.rng: This is a random data generator
org.deeplearning4j.util: These are helper and utility methods

MALLET

Machine Learning for Language Toolkit (MALLET) is a large library of natural language processing algorithms and utilities.
It can be used in a variety of tasks, such as document classification, document clustering, information extraction, and topic modeling. It features a command-line interface as well as a Java API for several algorithms such as naive Bayes, HMMs, latent Dirichlet topic models, logistic regression, and conditional random fields. MALLET is available under the Common Public License 1.0, which means that you can even use it in commercial applications. It can be downloaded from http://mallet.cs.umass.edu. A MALLET instance is represented by a name, label, data, and source. There are two methods to import data into the MALLET format, as shown in the following list:

Instance per file: Each file, that is, document, corresponds to an instance, and MALLET accepts the directory name for the input.
Instance per line: Each line corresponds to an instance, where the following format is assumed: instance_name label token. Data will be a feature vector, consisting of the distinct words that appear as tokens and their occurrence counts.

The library comprises the following packages:

cc.mallet.classify: These are algorithms for training and classifying instances, including AdaBoost, bagging, C4.5, as well as other decision tree models, multivariate logistic regression, naive Bayes, and Winnow2.
cc.mallet.cluster: These are unsupervised clustering algorithms, including greedy agglomerative, hill climbing, k-best, and k-means clustering.
cc.mallet.extract: This implements tokenizers, document extractors, document viewers, cleaners, and so on.
cc.mallet.fst: This implements sequence models, including conditional random fields, HMMs, maximum entropy Markov models, and corresponding algorithms and evaluators.
cc.mallet.grmm: This implements graphical models and factor graphs, such as inference algorithms, learning, and testing; for example, loopy belief propagation, Gibbs sampling, and so on.
cc.mallet.optimize: These are optimization algorithms for finding the maximum of a function, such as gradient ascent, limited-memory BFGS, stochastic meta ascent, and so on.
cc.mallet.pipe: These are methods as pipelines to process data into MALLET instances.
cc.mallet.topics: These are topic modeling algorithms, such as latent Dirichlet allocation, four-level pachinko allocation, hierarchical PAM, DMRT, and so on.
cc.mallet.types: This implements fundamental data types such as dataset, feature vector, instance, and label.
cc.mallet.util: These are miscellaneous utility functions such as command-line processing, search, math, test, and so on.

To design, build, and deploy your own machine learning applications by leveraging key Java machine learning libraries, check out the book Machine Learning in Java, published by Packt Publishing.

5 JavaScript machine learning libraries you need to know
A non programmer's guide to learning Machine learning
Why use JavaScript for machine learning?
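MALLET's "instance per line" import format described earlier (instance_name label token ...) is easy to picture with a short plain-Java sketch that turns one such line into a token-count feature vector. The helper below is purely illustrative, not MALLET's actual importer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of MALLET's "instance per line" format:
//   instance_name label token token token ...
// The data becomes a feature vector of token occurrence counts.
// This is an illustration, not MALLET's actual importer.
public class InstancePerLineSketch {

    static Map<String, Integer> tokenCounts(String line) {
        String[] parts = line.trim().split("\\s+");
        // parts[0] is the instance name, parts[1] the label;
        // everything after that is a token to be counted.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 2; i < parts.length; i++) {
            counts.merge(parts[i], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tokenCounts("doc1 spam buy now buy cheap"));
        // {buy=2, now=1, cheap=1}
    }
}
```

In MALLET itself this transformation is what the pipes in cc.mallet.pipe perform when building instances, with the counts stored as a sparse feature vector rather than a map.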

Best practices for deploying self-service BI with Qlik Sense

Amey Varangaonkar
31 May 2018
7 min read
As part of a successful deployment of Qlik Sense, it is important that IT recognizes that self-service Business Intelligence has its own dynamics and adoption rules. The various use cases and subsequent user groups thus need to be assessed and captured. Governance should always be present, but power users should never get the feeling that they are restricted. Once they are won over, the rest of the traction and the adoption of other user types is very easy. In this article, we will look at the most important points to keep in mind while deploying self-service with Qlik Sense. The following excerpt is taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio. This book demonstrates useful techniques to design useful and highly profitable Business Intelligence solutions using Qlik Sense. Here's the list of points to be kept in mind:

Qlik Sense is not QlikView

Not even nearly. The biggest challenge and fallacy is that the organization was sold, by Qlik or someone else, just the next version of the tool. It did not help at all that Qlik itself was working for years on Qlik Sense under the initial product name Qlik.Next. Whatever you are being told, however it is being sold to you, Qlik Sense is at best the cousin of QlikView. Same family, but no blood relation. Thinking otherwise sets the wrong expectation, so the business gives the wrong message to stakeholders and does not raise awareness in IT that self-service BI cannot be deployed in the same fashion as guided analytics (QlikView, in this case). Disappointment is imminent when stakeholders realize Qlik Sense cannot replicate their QlikView dashboards.

Simply installing Qlik Sense does not create a self-service BI environment

Installing Qlik Sense and giving users access to the tool is a start, but there is more to it than simply installing it.
The infrastructure requires design and planning, data quality processing, data collection, and determining who intends to use the platform to consume what type of data. If data is not available and accessible to the user, data analytics serves no purpose. Make sure a data warehouse or similar is in place and the business has a use case for self-service data analytics. A good indicator for this is when the business or project works with a lot of data, and there are business users who have lots of Excel spreadsheets lying around analyzing it in different ways. That's your best-case candidate for Qlik Sense.

IT should monitor the Qlik Sense environment rather than control it

IT needs to unlearn in order to learn new things, and the same applies when it comes to deploying self-service. Create a framework with guidelines and principles and monitor that users are following it, rather than limiting them in their capabilities. This framework needs the input of the users as well, and it needs to be elastic. Also, not many IT professionals agree with giving away too much power to the user in the development process, believing this leads to chaos and anarchy. While the risk is there, this fear needs to be overcome. Users love data analytics, and they are keen to get the help of IT to create the most valuable dashboard possible and ensure it will be well received by a wide audience.

Identifying key users and user groups is crucial

For a strong adoption of the tool, IT needs to prepare the environment, identify the key power users in the organization, and win them over to using the technology. It is important that they are intensively supported, especially in the beginning, and that they are allowed to drive how the technology should be used rather than having principles imposed on them. Governance should always be present, but power users should never get the feeling they are restricted by it. Because once they are won over, the rest of the traction and the adoption of other user types is very easy.
Qlik Sense sells well – do a lot of demos

Data analytics, compelling visualizations, and the interactivity of Qlik Sense are something almost everyone is interested in. The business wants to see its own data aggregated and distilled in a cool and glossy dashboard. Utilize the momentum and do as many demos as you can to win advocates of the technology and promote a consciousness of becoming a data-driven culture in the organization. Even the simplest Qlik Sense dashboards amaze people and boost their creativity for use cases where data analytics in their area could apply and create value.

Promote collaboration

Sharing is caring. This not only applies to insights, which naturally are shared with the excitement of having found out something new and valuable, but also to how the new insight has been derived. People keep their secrets on the approach and methodology to themselves, but this is counterproductive. It is important that applications, visualizations, and dashboards created with Qlik Sense are shared and demonstrated to other Qlik Sense users as frequently as possible. This not only promotes a data-driven culture but also encourages the collaboration of users and teams across various business functions, which would not have happened otherwise. They could either be sharing knowledge, tips, and tricks or even realizing they look at the same slices of data and could create additional value by connecting them together.

Market the success of Qlik Sense within the organization

If Qlik Sense has had a successful achievement in a project, tell others about it. Create a success story and propose doing demos of the dashboard and its analytics. IT has historically been very bad at promoting its work, which is counterproductive. Data analytics creates value, and there is nothing embarrassing about boasting about its success; as Muhammad Ali suggested, it's not bragging if it's true.
Introduce guidelines on design and terminology

Avoid the pitfalls of having multiple different-looking dashboards by promoting a consistent branding look across all Qlik Sense dashboards and applications, including terminology and best practices. Ensure the document is easily accessible to all users. Also, create predesigned templates with some sample sheets so the users can duplicate them, modify them to their liking, and extend them, applying the same design.

Protect less experienced users from complexities

Don't overwhelm users if they have never developed in their life. Approach less technically savvy users in a different way by providing them with sample data and sample templates, including a library of predefined visualizations, dimensions, or measures (so-called Master Key Items). Be aware that what is intuitive to Qlik professionals or power users is not necessarily intuitive to other users – be patient and appreciative of their feedback, and try to understand how a typical business user might think.

If you found the excerpt useful, make sure you check out the book Mastering Qlik Sense to learn more of these techniques on efficient Business Intelligence using Qlik Sense.

How Qlik Sense is driving self-service Business Intelligence
Overview of a Qlik Sense® Application's Life Cycle
What we learned from Qlik Qonnections 2018

4 key benefits of using Firebase for mobile app development

Guest Contributor
19 Oct 2018
6 min read
A powerful backend solution is essential for building sophisticated mobile apps. In recent years, Firebase has risen to prominence as a power-packed Backend-as-a-Service (BaaS), thanks to its wide-ranging features and performance-boosting elements. After Firebase was acquired by Google in 2014, several of its features got a further performance boost. These features have made Firebase quite a popular backend solution for app developers and other emerging IT sectors. Let us look at its 4 key benefits for cross-platform mobile app development.

Unleashing the power of Google Analytics

Google Analytics for Firebase is a completely free solution with unconstrained reporting on many aspects. The reporting feature allows you to evaluate client behavior and report on broken links, user interactions, and all other aspects of user experience and user interface. The reporting helps developers make informed decisions while optimizing the UI and the app performance.

The unmatched scale of reporting: Firebase analytics allows access to unlimited reports on as many as 500 different events. The developers can also create custom events for reporting as their need suits.
Robust audience segmentation: Firebase analytics also allows segmenting the app audience on different parameters and grounds. The integrated console allows segmenting the audience on the basis of device information, custom events, and user characteristics.

Crash reporting to fix bugs

Firebase also helps to address performance issues of an app by fixing bugs right from its backend solution. It is equipped with a robust crash reporting feature, which helps to deliver intricate and detailed bug and crash reports to address all the coding errors in an app. The reporting feature is capable of grouping together the issues in different categories as per the characteristics of the problem. Here are some of the attributes of this reporting feature.
Monitoring errors: It is capable of monitoring fatal errors for iOS apps and both fatal and non-fatal errors for Android apps. Generally, reports are initiated as per the impact caused by such errors on the user experience.
Required data collection to fix errors: The reports also list all the details concerning the device in use, performance shortfalls, and user scenarios concerning the erroneous events. According to the contributing factors and other similarities, the issues are grouped in different categories.
Email alerts: It also allows sending email alerts as and when such issues or problems are detected.
Configuration of error reporting: The error reporting can also be configured remotely to control who can access the reports and the list of events that occurred before an event.
It is free: Crash and bug reporting is free with Firebase. You don't need to pay a penny to access this feature.

Synchronizing data with the Realtime Database

With Firebase you can sync the offline and online data through a NoSQL database. This makes the application data available in both the offline and online states of the app. This boosts collaboration on the application data in real time. Here are some of its benefits.

Real-time: Unlike HTTP requests, which work to update the data across interfaces, the Firebase Realtime Database syncs data with every change, thus helping to reflect the change in real time across any device in use.
Offline: As the Firebase Realtime Database SDK saves your data to local disk, you can always access the data offline. As and when connectivity is back, the changes are synced with the present state of the server.
Access from multiple devices: The Firebase Realtime Database allows accessing application data from multiple devices and interfaces, including mobile devices and the web.
Splitting and scaling your data: Thanks to the Firebase Realtime Database, you can split your data across multiple database instances within the same project and set rules for each database instance.

Firebase is feature-rich for futuristic app development

In addition to the above, Firebase is fully empowered with a host of rich features required for building sophisticated and feature-rich mobile apps. Let us have a look at some of the key features of Firebase that have made it a reliable platform for cross-platform development.

Hosting: The hosting feature of Firebase allows developers to update their contents in the Content Delivery Network (CDN) during production. Firebase offers full hosting support with a custom domain, global CDN, and an automatically provisioned SSL certificate.
Authentication: The Firebase backend service offers a powerful authentication feature. It comes equipped with simple SDKs and easy-to-use libraries to integrate authentication with any mobile app.
Storage: The Firebase storage feature is powered by Google Cloud Storage and allows users to easily download media files and visual contents. This feature is also helpful in making use of user-generated content.
Cloud Messaging: With Cloud Messaging, a mobile app can easily send messages to users and engage in real-time communication.
Remote Config: This feature of Firebase allows developers to incorporate certain changes in the app remotely. Thanks to this, the changes are reflected in the existing version, and the user does not need to download the latest updated version.
Test Lab: With Test Lab, developers can easily test the app on all the devices listed in the Google data center. It can even do the testing without requiring any test code for the respective app.
Notifications: This feature gives developers a console to manage and send user-focused custom notifications to the users.
App Indexing: This feature allows developers to index the app in Google Search and achieve higher search ranks in app marketplaces like the Play Store and App Store.
Dynamic Links: Firebase also equips the app to create dynamic links, or smart URLs, to present the respective app across all digital platforms, including social media, mobile app, web, email, and other channels.

All the above-mentioned benefits and useful features that empower mobile app developers to create a dynamic user experience have helped Firebase achieve such unprecedented popularity among developers worldwide. No wonder, in a short time span, it has become a very popular backend solution for so many successful cross-platform mobile apps.

Some exemplary use cases of Firebase

Here we have picked two use cases of Firebase: one relatively new and successful app, and one leading app in its niche.

Fabulous

Fabulous is a unique app that trains users to dispose of bad habits and get used to good habits to ensure health and wellbeing. By customizing the onboarding process through Firebase, the app managed to double its retention rate. The app could incorporate a custom user experience for different groups of users as per their preferences.

OneFootball

The leading mobile soccer app OneFootball experienced a more than 5% increase in user session time thanks to Firebase. The new backend solution powered by Firebase helped the app engage its audience more efficiently than ever before. The custom contents created by this popular app enjoy better traction with users thanks to higher engagement.

Author Bio: Juned Ahmed works as an IT consultant at IndianAppDevelopers, a leading mobile app development company, which offers to hire app developers in India for mobile solutions. He has more than 10 years of experience in developing and implementing marketing strategies.

How to integrate Firebase on Android/iOS applications natively.
Build powerful progressive web apps with Firebase.
How to integrate Firebase with NativeScript for cross-platform app development.

Top 4 Business Intelligence Tools

Ed Bowkett
04 Dec 2014
4 min read
With the boom of data analytics, Business Intelligence has taken something of a front stage in recent years, and as a result, a number of Business Intelligence (BI) tools have appeared. These allow a business to obtain a reliable set of data faster and more easily, and to set business objectives. This is a list of the more prominent tools, with the advantages and disadvantages of each.

Pentaho

Pentaho was founded in 2004 and offers a suite, among others, of open source BI applications under the name Pentaho Business Analytics. It has two suites, enterprise and community. It allows easy access to data, and even easier ways of visualizing this data, from a variety of different sources including Excel and Hadoop, and it covers almost every platform, ranging from mobile, Android, and iPhone through to Windows and even web-based. With the pros, though, come cons: the Pentaho Metadata Editor is difficult to understand, and the documentation provided offers few solutions for this tool (which is a key component). Also, compared to other tools, which we will mention below, the advanced analytics in Pentaho need improving. However, given that it is open source, there is continual improvement.

Tableau

Founded in 2003, Tableau also offers a range of suites, focusing on three products: Desktop, Server, and Public. Some benefits of using Tableau over other products include ease of use and a pretty simple UI involving drag-and-drop tools, which allows pretty much everyone to use it. Creating a highly interactive dashboard with various sources to obtain your data from is simple and quick. To sum up, Tableau is fast. Incredibly fast! There are relatively few cons when it comes to Tableau, but some automated features you would usually expect in other suites aren't offered for most of the processes and uses here.
Jaspersoft

As well as being another suite that is open source, Jaspersoft ships with a number of data visualization, data integration, and reporting tools. Added to the small licensing cost, Jaspersoft is justifiably one of the leaders in this area. It can be used with a variety of databases, including Cassandra, CouchDB, MongoDB, Neo4j, and Riak. Other benefits include ease of installation, and the functionality of the tools in Jaspersoft is better than that of most competitors on the market. However, the documentation has been said to be lacking in helping customers dive deeper into Jaspersoft, and if you do customize it, customer service can no longer assist you if it breaks. Given the functionality and the ability to extend it, though, these cons seem minor.

Qlikview

Qlikview is one of the oldest Business Intelligence software tools on the market, having been around since 1993. It has multiple features, and as a result many pros and cons, including some that I have mentioned for previous suites. Some advantages of Qlikview are that it takes a very small amount of time to implement and it's incredibly quick; quicker than Tableau in this regard! It also has 64-bit in-memory processing, which is among the best in the market. Qlikview also has good data mining tools, good features (having been in the market for a long time), and a visualization function. These aspects make it so much easier to deal with than others on the market. The learning curve is relatively small. Some cons in relation to Qlikview include that, while Qlikview is easy to use, Tableau is seen as the better suite for analyzing data in depth. Qlikview also has difficulties integrating map data, which other BI tools are better at doing. This list is not definitive! It lays out some open source tools that companies and individuals can use to help them analyze data to prepare business performance KPIs.
There are other tools used by businesses, including Microsoft BI tools, Cognos, MicroStrategy, and Oracle Hyperion. I've chosen to explore some BI tools that are quick to use out of the box and are incredibly popular and expanding in usage.

How do AWS developers manage Web apps?

Guest Contributor
04 Jul 2019
6 min read
When it comes to hosting and building a website on the cloud, Amazon Web Services (AWS) is one of the most preferred choices for developers. According to Canalys, AWS dominates the global public cloud market, holding around one-third of the total market share. AWS offers numerous services that can be used for compute power, content delivery, database storage, and more. Developers can use it to build a high-availability production website, whether it is a WordPress site, Node.js web app, LAMP stack web app, Drupal website, or a Python web app. AWS developers need to set up, maintain, and evolve the cloud infrastructure of web apps. Aside from these, they are also responsible for applying best practices related to security and scalability. Having said that, let's take a deep dive into how AWS developers manage a web application.

Deploying a website or web app with Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) offers developers secure and scalable computing capacity in the cloud. For hosting a website or web app, developers need to use virtual app servers called instances. With Amazon EC2 instances, developers gain complete control over computing resources. They can scale the capacity on the basis of requirements and pay only for the resources they actually use. There are tools like AWS Lambda, Elastic Beanstalk, and Lightsail that allow the isolation of web apps from common failure cases. Amazon EC2 supports a number of main operating systems, including Amazon Linux, Windows Server 2012, CentOS 6.5, and Debian 7.4. Here is how developers get started with Amazon EC2 for deploying a website or web app:

The first step is to set up an AWS account and log into it.
Select "Launch Instance" from the Amazon EC2 Dashboard. It will enable the creation of a VM.
Now configure the instance by choosing an Amazon Machine Image (AMI), instance type, and security group.
Click on Launch.
In the next step, choose 'Create a new key pair' and name it.
A key pair file gets downloaded automatically and needs to be saved; it will be needed for logging in to the instance. Click on 'Launch Instances' to finish the set-up process. Once the instance is ready, it can be used to build high-availability websites or web apps.

Using Amazon S3 for cloud storage

Amazon Simple Storage Service, or Amazon S3, is a secure and highly scalable cloud storage solution that makes web-scale computing seamless for developers. It is used for the objects that are required to build a website, such as HTML pages, images, CSS files, videos, and JavaScript. S3 comes with a simple interface so that developers can fetch and store large amounts of data from anywhere on the internet, at any time. The storage infrastructure provided with Amazon S3 is known for scalability, reliability, and speed. Amazon itself uses this storage option to host its own websites. Within S3, developers need to create buckets for data storage. Each bucket can store a large amount of data, allowing developers to upload a high number of objects into it. An object can contain up to 5 TB of data. Objects are stored in and fetched from a bucket using a unique key. A bucket serves several purposes: it can be used to organize the S3 namespace, identify the account assigned for storage and data transfer, and work as the aggregation unit for usage.

Elastic load balancing

Load balancing is a critical part of a website or web app, used to distribute and balance the traffic load across multiple targets. AWS provides elastic load balancing to its developers, which allows them to distribute the traffic across a number of targets, like Amazon EC2 instances, IP addresses, Lambda functions, and containers. With elastic load balancing, developers can ensure that their projects run efficiently even when there is heavy traffic.
There are three kinds of load balancers available with AWS elastic load balancing: Application Load Balancer, Network Load Balancer, and Classic Load Balancer. The Application Load Balancer is an ideal option for HTTP and HTTPS traffic. It provides advanced routing for requests, meant for the delivery of microservices and containers. For balancing the load of Transmission Control Protocol (TCP), Transport Layer Security (TLS), and User Datagram Protocol (UDP) traffic, developers opt for the Network Load Balancer. The Classic Load Balancer is best suited for typical load distribution across EC2 instances. It works for both requests and connections.

Debugging and troubleshooting

A web app or website can include numerous features and components. Often, a few of them might face issues or not work as expected because of coding errors or other bugs. In such cases, AWS developers follow a number of processes and techniques and check the useful resources that help them to debug a recipe or troubleshoot the issues:

See the service issue at Common Debugging and Troubleshooting Issues.
Check the Debugging Recipes for issues related to recipes.
Check the AWS OpsWorks Stack Forum. It is a forum where other developers discuss their issues. The AWS team also monitors these issues and helps in finding solutions.
Get in touch with the AWS OpsWorks Stacks support team to solve the issue.

Traffic monitoring and analysis

Analysing and monitoring the traffic and network logs help in understanding the way websites and web apps perform on the internet. AWS provides several tools for traffic monitoring, including Real-Time Web Analytics with Kinesis Data Analytics, Amazon Kinesis, Amazon Pinpoint, Amazon Athena, etc. For tracking website metrics, Real-Time Web Analytics with Kinesis Data Analytics is used by developers. This tool provides insights into visitor counts, page views, time spent by visitors, actions taken by visitors, channels driving the traffic, and more.
Additionally, the tool comes with an optional dashboard that can be used for the monitoring of web servers. Developers can see custom metrics of the servers to know about the performance of the servers, average network packet processing, errors, etc.

Wrapping up

Management of a web application is a tedious task and requires quality tools and technologies. Amazon Web Services makes things easier for web developers, providing them with all the tools required to handle the app.

Author Bio: Vaibhav Shah is the CEO of Techuz, a mobile app and web development company in India and the USA. He is a technology maven, a visionary who likes to explore innovative technologies, and has empowered 100+ businesses with sophisticated web solutions.

Why use JVM (Java Virtual Machine) for deep learning

Guest Contributor
10 Nov 2019
5 min read
Deep learning is one of the revolutionary breakthroughs of the decade for enterprise application development. Today, the majority of organizations and enterprises have to transform their applications to exploit the capabilities of deep learning. In this article, we will discuss how to leverage the capabilities of the JVM (Java Virtual Machine) to build deep learning applications.

Enterprises prefer the JVM

The major JVM languages used in enterprises are Java, Scala, Groovy, and Kotlin. Java is the most widely used programming language in the world. Nearly all major enterprises in the world use Java in some way or the other. Enterprises use JVM-based languages such as Java to build complex applications because JVM features are optimal for production applications. JVM applications are also significantly faster and require much fewer resources to run compared to their counterparts such as Python. Java can perform more computational operations per second compared to Python. Here is an interesting performance benchmark for the same.

The JVM optimizes performance benchmarks

Production applications represent a business and are very sensitive to performance degradation, latency, and other disruptions. Application performance is estimated from latency/throughput measures. Memory overload and high resource usage can influence the above-said measures. Applications that demand more resources or memory require good hardware and further optimization from the application itself. The JVM helps in optimizing performance benchmarks and tuning the application to the hardware's fullest capabilities. The JVM can also help in avoiding memory footprints in the application. We have discussed JVM features so far, but there's an important context on why there's a huge demand for JVM-based deep learning in production. We're going to discuss that next. Python is undoubtedly the leading programming language used in deep learning applications.
For the same reason, the majority of enterprise developers, i.e., Java developers, are forced to switch to a technology stack that they're less familiar with. On top of that, they need to address compatibility issues and deployment in a production environment while integrating neural network models.

DeepLearning4J, a deep learning library for the JVM

Java developers working on enterprise applications would want to exploit deployment tools like Maven or Gradle for hassle-free deployments. So, there's a demand for a JVM-based deep learning library to simplify the whole process. Although there are multiple deep learning libraries that serve the purpose, DL4J (Deeplearning4J) is one of the top choices. DL4J is a deep learning library for the JVM and is among the most popular repositories on GitHub. DL4J, developed by the Skymind team, is the first open-source deep learning library that is commercially supported. What makes it so special is that it is backed by ND4J (N-Dimensional Arrays for Java) and JavaCPP. ND4J is a scientific computation library developed by the Skymind team. It acts as the required backend dependency for all neural network computations in DL4J. ND4J is much faster in computations than NumPy. JavaCPP acts as a bridge between Java and native C++ libraries. ND4J internally depends on JavaCPP to run native C++ libraries. DL4J also has a dedicated ETL component called DataVec. DataVec helps to transform the data into a format that a neural network can understand. Data analysis can be done using DataVec just like with Pandas, a popular Python data analysis library. Also, DL4J uses the Arbiter component for hyperparameter optimization. Arbiter finds the best configuration to obtain good model scores by performing random/grid search using the hyperparameter values defined in a search space.

Why choose DL4J for your deep learning applications?

DL4J is a good choice for developing distributed deep learning applications.
It can leverage the capabilities of Apache Spark and Hadoop to develop high-performing distributed deep learning applications. Its performance is equivalent to that of Caffe when multi-GPU hardware is used. We can use DL4J to develop multi-layer perceptrons, convolutional neural networks, recurrent neural networks, and autoencoders. There are a number of hyperparameters that can be adjusted to further optimize neural network training. The Skymind team has done a good job of explaining the important basics of DL4J on their website. On top of that, they also have a Gitter channel where you can start a discussion or report bugs straight to their developers. If you are keen on exploring reinforcement learning further, there's a dedicated library called RL4J (Reinforcement Learning for Java) developed by Skymind. It can already play the game Doom! DL4J combines all the above-mentioned components (DataVec, ND4J, Arbiter, and RL4J) into the deep learning workflow, thus forming a powerful software suite. Most importantly, DL4J enables the productionization of deep learning applications for the business. If you are interested in learning how to develop real-time applications on DL4J, check out my new book, Java Deep Learning Cookbook. In this book, I show you how to install and configure Deeplearning4j to implement deep learning models. You can also explore recipes for training and fine-tuning your neural network models using Java. By the end of this book, you'll have a clear understanding of how you can use Deeplearning4j to build robust deep learning applications in Java.

Author Bio

Rahul Raj has more than 7 years of IT industry experience in software development, business analysis, client communication, and consulting for medium and large-scale projects. He has extensive experience in development activities comprising requirement analysis, design, coding, implementation, code review, testing, user training, and enhancements.
He has written a number of articles about neural networks in Java and has been featured by DL4J and the official Java community channel. You can follow Rahul on Twitter, LinkedIn, and GitHub.

Top 6 Java Machine Learning/Deep Learning frameworks you can't miss
6 most commonly used Java Machine learning libraries
Deeplearning4J 1.0.0-beta4 released with full multi-datatype support, new attention layers, and more!
Amey Varangaonkar
05 Oct 2017
5 min read

Say hello to Streaming Analytics

In this data-driven age, businesses want fast, accurate insights from their huge data repositories in the shortest time span, and in real time when possible. These insights are essential: they help businesses understand relevant trends, improve their existing processes, enhance customer satisfaction, improve their bottom line, and, most importantly, build and sustain their competitive advantage in the market.

Doing all of this is quite an ask, one that is becoming increasingly difficult to achieve using just the traditional data processing systems, where analytics is limited to the back-end. There is now a burning need for a newer kind of system, where larger, more complex data can be processed and analyzed on the go.

Enter: Streaming Analytics

Streaming analytics, also referred to as real-time event processing, is the processing and analysis of large streams of data in real time. These streams are basically events that occur as a result of some action: a transaction, a system failure, or a trigger that changes the state of a system at any point in time. Even something as minor or granular as a click would constitute an event, depending on the context.

Consider this scenario: you are the CTO of an organization that deals with sensor data from wearables. Your organization has to deal with terabytes of data coming in on a daily basis from thousands of sensors. One of your biggest challenges as a CTO would be to implement a system that processes and analyzes the data from these sensors as it enters the system. Here's where streaming analytics can help, by giving you the ability to derive insights from your data on the go.
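The difference from batch processing can be sketched in a few lines of plain Java (no streaming framework; the sensor-reading event model here is invented for illustration). Each event updates a running aggregate the moment it arrives, rather than being stored first and analyzed later:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class StreamingSketch {

    // Running aggregate that is updated per event, rather than
    // batch-computed after all data has landed in a store.
    static final class RunningMean {
        double mean = 0.0;
        long count = 0;

        void onEvent(double reading) {
            count++;
            mean += (reading - mean) / count; // incremental mean update
            // In a real system, an alert or dashboard update could fire here.
        }

        double mean() { return mean; }
    }

    public static void main(String[] args) {
        // Simulated stream of sensor readings arriving one at a time.
        Queue<Double> incoming = new ArrayDeque<>();
        incoming.add(36.5);
        incoming.add(37.0);
        incoming.add(38.5);

        RunningMean aggregate = new RunningMean();
        while (!incoming.isEmpty()) {
            aggregate.onEvent(incoming.poll()); // insight available per event
        }
        System.out.println("mean after stream: " + aggregate.mean());
    }
}
```

A batch system would instead collect all readings, persist them, and compute the mean in a later job; the per-event update above is what lets a streaming system react while the data is still fresh.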
According to IBM, a streaming system demonstrates the following qualities:

It can handle large volumes of data.
It can handle a variety of data, structured or unstructured, analyze it efficiently, and identify relevant patterns.
It can process every event as it occurs, unlike traditional analytics systems that rely on batch processing.

Why is Streaming Analytics important?

The humongous volume of data that companies have to deal with today is almost unimaginable. Add to that the varied nature of the data these companies must handle, and the urgency with which value needs to be extracted from it, and it all makes for a pretty tricky proposition. In such scenarios, choosing a solution that integrates seamlessly with different data sources, is fine-tuned for performance, is fast and reliable, and, most importantly, is flexible to changes in technology, is critical. Streaming analytics offers all these features, thereby empowering organizations to gain a significant edge over their competition.

Another significant argument in favour of streaming analytics is the speed at which one can derive insights from the data. Data in a real-time streaming system is processed and analyzed before it registers in a database. This is in stark contrast to analytics on traditional systems, where information is gathered and stored first, and the analytics is performed afterwards. Thus, streaming analytics supports much faster decision-making than traditional data analytics systems.

Is Streaming Analytics right for my business?

Not all organizations need streaming analytics, especially those that deal with static data or data that hardly changes over longer intervals of time, or those that do not require real-time insights for decision-making. For instance, consider the HR unit of a call centre. It is sufficient, and efficient, to use a traditional analytics solution to analyze thousands of past employee records rather than run them through a streaming analytics system.
On the other hand, the same call centre can find real value in implementing streaming analytics for something like a real-time customer log monitoring system, where customer interactions and context-sensitive information are processed on the go. This can help the organization find opportunities to provide unique customer experiences and improve its customer satisfaction score, alongside a whole host of other benefits.

Streaming analytics is slowly finding adoption in a variety of domains where companies are looking for that crucial competitive advantage: sensor data analytics, mobile analytics, and business activity monitoring being some of them. With the rise of the Internet of Things, data from IoT devices is also increasing exponentially, and streaming analytics is the way to go there as well. In short, streaming analytics is ideal for businesses dealing with time-critical missions and those working with continuous streams of incoming data, where decision-making has to be instantaneous. Companies that obsess about real-time monitoring of their businesses will also find streaming analytics useful: just integrate your dashboards with your streaming analytics platform!

What next?

It is safe to say that, with time, the amount of information businesses manage is going to rise exponentially, and so will its variety. As a result, it will get increasingly difficult to process volumes of unstructured data and gain insights from them using just the traditional analytics systems. Adopting streaming analytics into the business workflow will therefore become a necessity for many businesses. Apache Flink, Spark Streaming, Microsoft's Azure Stream Analytics, SQLstream Blaze, Oracle Stream Analytics, and SAS Event Processing are all good places to begin your journey through the fleeting world of streaming analytics. You can browse through this list of learning resources from Packt to know more.
Learning Apache Flink
Learning Real Time processing with Spark Streaming
Real Time Streaming using Apache Spark Streaming (video)
Real Time Analytics with SAP Hana
Real-Time Big Data Analytics
Aaron Lazar
21 Jun 2018
6 min read

4 operator overloading techniques in Kotlin you need to know

Operator overloading is a form of polymorphism. Some operators change behaviors on different types. The classic example is the operator plus (+): on numeric values, plus is a sum operation, and on Strings it is a concatenation. Operator overloading is a useful tool to provide your API with a natural surface. Let's say that we're writing a time and date library; it'll be natural to have the plus and minus operators defined on time units. In this article, we'll understand how operator overloading works in Kotlin. This article has been extracted from the book, Functional Kotlin, by Mario Arias and Rivu Chakraborty.

Kotlin lets you define the behavior of operators on your own or existing types with functions, normal or extension, marked with the operator modifier:

    class Wolf(val name: String) {
        operator fun plus(wolf: Wolf) = Pack(mapOf(name to this, wolf.name to wolf))
    }

    class Pack(val members: Map<String, Wolf>)

    fun main(args: Array<String>) {
        val talbot = Wolf("Talbot")
        val northPack: Pack = talbot + Wolf("Big Bertha") // talbot.plus(Wolf("..."))
    }

The operator function plus returns a Pack value. To invoke it, you can use the infix operator way (Wolf + Wolf) or the normal way (Wolf.plus(Wolf)). Something to be aware of about operator overloading in Kotlin: the operators that you can override are limited; you can't create arbitrary operators.

Binary operators

Binary operators receive a parameter (there are exceptions to this rule: invoke and indexed access). The Pack.plus extension function receives a Wolf parameter and returns a new Pack. Note that MutableMap also has a plus (+) operator:

    operator fun Pack.plus(wolf: Wolf) = Pack(this.members.toMutableMap() + (wolf.name to wolf))

    val biggerPack = northPack + Wolf("Bad Wolf")

The following table shows all the binary operators that can be overloaded:

    Operator    Equivalent                         Notes
    x + y       x.plus(y)
    x - y       x.minus(y)
    x * y       x.times(y)
    x / y       x.div(y)
    x % y       x.rem(y)                           From Kotlin 1.1, previously mod.
    x..y        x.rangeTo(y)
    x in y      y.contains(x)
    x !in y     !y.contains(x)
    x += y      x.plusAssign(y)                    Must return Unit.
    x -= y      x.minusAssign(y)                   Must return Unit.
    x *= y      x.timesAssign(y)                   Must return Unit.
    x /= y      x.divAssign(y)                     Must return Unit.
    x %= y      x.remAssign(y)                     From Kotlin 1.1, previously modAssign. Must return Unit.
    x == y      x?.equals(y) ?: (y === null)       Checks for null.
    x != y      !(x?.equals(y) ?: (y === null))    Checks for null.
    x < y       x.compareTo(y) < 0                 Must return Int.
    x > y       x.compareTo(y) > 0                 Must return Int.
    x <= y      x.compareTo(y) <= 0                Must return Int.
    x >= y      x.compareTo(y) >= 0                Must return Int.

Invoke

When we introduced lambda functions, we showed the definition of Function1:

    /** A function that takes 1 argument. */
    public interface Function1<in P1, out R> : Function<R> {
        /** Invokes the function with the specified argument. */
        public operator fun invoke(p1: P1): R
    }

The invoke function is an operator, a curious one: the invoke operator can be called without a name. The class Wolf has an invoke operator:

    enum class WolfActions {
        SLEEP, WALK, BITE
    }

    class Wolf(val name: String) {
        operator fun invoke(action: WolfActions) = when (action) {
            WolfActions.SLEEP -> "$name is sleeping"
            WolfActions.WALK -> "$name is walking"
            WolfActions.BITE -> "$name is biting"
        }
    }

    fun main(args: Array<String>) {
        val talbot = Wolf("Talbot")
        talbot(WolfActions.SLEEP) // talbot.invoke(WolfActions.SLEEP)
    }

That's why we can call a lambda function directly with parentheses; we are, indeed, calling the invoke operator. The following table shows declarations of invoke with different numbers of arguments:

    Operator            Equivalent
    x()                 x.invoke()
    x(y)                x.invoke(y)
    x(y1, y2)           x.invoke(y1, y2)
    x(y1, y2..., yN)    x.invoke(y1, y2..., yN)

Indexed access

The indexed access operator covers the array read and write operations with square brackets ([]) that are used in languages with C-like syntax. In Kotlin, we use the get operator for reading and set for writing.
With the Pack.get operator, we can use Pack as an array:

    operator fun Pack.get(name: String) = members[name]!!

    val badWolf = biggerPack["Bad Wolf"]

Most Kotlin data structures have a definition of the get operator; in this case, Map<K, V> returns a V?. The following table shows declarations of get with different numbers of arguments:

    Operator            Equivalent
    x[y]                x.get(y)
    x[y1, y2]           x.get(y1, y2)
    x[y1, y2..., yN]    x.get(y1, y2..., yN)

The set operator has a similar syntax:

    enum class WolfRelationships {
        FRIEND, SIBLING, ENEMY, PARTNER
    }

    operator fun Wolf.set(relationship: WolfRelationships, wolf: Wolf) {
        println("${wolf.name} is my new $relationship")
    }

    talbot[WolfRelationships.ENEMY] = badWolf

The operators get and set can contain arbitrary code, but it is a very well-known and old convention that indexed access is used for reading and writing. When you write these operators (and, by the way, all the other operators too), use the principle of least surprise. Limiting the operators to their natural meaning on a specific domain makes them easier to use and read in the long run. The following table shows declarations of set with different numbers of arguments:

    Operator                Equivalent                 Notes
    x[y] = z                x.set(y, z)                Return value is ignored.
    x[y1, y2] = z           x.set(y1, y2, z)           Return value is ignored.
    x[y1, y2..., yN] = z    x.set(y1, y2..., yN, z)    Return value is ignored.

Unary operators

Unary operators don't have parameters and act directly on the dispatcher. We can add a not operator to the Wolf class:

    operator fun Wolf.not() = "$name is angry!!!"

    !talbot // talbot.not()

The following table shows all the unary operators that can be overloaded:

    Operator    Equivalent        Notes
    +x          x.unaryPlus()
    -x          x.unaryMinus()
    !x          x.not()
    x++         x.inc()           Postfix; must be a call on a var, should return a type compatible with the dispatcher type, and shouldn't mutate the dispatcher.
    x--         x.dec()           Postfix; must be a call on a var, should return a type compatible with the dispatcher type, and shouldn't mutate the dispatcher.
    ++x         x.inc()           Prefix; must be a call on a var, should return a type compatible with the dispatcher type, and shouldn't mutate the dispatcher.
    --x         x.dec()           Prefix; must be a call on a var, should return a type compatible with the dispatcher type, and shouldn't mutate the dispatcher.

Postfix increment and decrement return the original value and then change the variable to the operator's returned value. Prefix returns the operator's returned value and then changes the variable to that value.

Now you know how operator overloading works in Kotlin. If you found this article interesting and would like to read more, head on over to get the whole book, Functional Kotlin, by Mario Arias and Rivu Chakraborty.

Extension functions in Kotlin: everything you need to know
Building RESTful web services with Kotlin
Building chat application with Kotlin using Node.js, the powerful Server-side JavaScript platform
Guest Contributor
01 Oct 2018
6 min read

9 reasons to choose Agile Methodology for Mobile App Development

As mobile application development becomes the new trend in the business world, the competition to enter the mobile market has intensified. Business leaders are hunting for every possible way to reach the market at the earliest and outshine the competition, albeit without compromising on the quality and quantity of the opportunities in the market. One such method prevalent in the market is the adoption of the Agile methodology.

Agile mobile app development methodology

The Agile methodology is an incremental and iterative mobile application development approach, in which the complete app development cycle is divided into multiple sub-modules, treated as mini-projects. Every sub-module is assigned to an individual team and subjected to the complete development cycle, right from designing to development, testing, and delivery.

Image Source: AppInventiv

Benefits of the Agile Mobile App Development Approach

Due to its iterative nature, the methodology is highly recommended in the market. Here are 9 reasons to choose Agile for your mobile app development project.

Faster Development

In the Agile model, the complete mobile app project is divided into smaller modules which are treated as independent sub-projects. These sub-projects are handled by different teams independently, with little to no dependency on each other. Besides, everyone has a clear idea of what their contribution will be, along with the associated resources and deadline, which accelerates the development process. Every developer puts their best efforts into completing their part of the mobile app development project, the outcome of which is a more streamlined development process with faster delivery.

Reduced Risks

With changing market needs and trends, it is quite risky to launch your own application. It often happens that the market data you took into consideration while developing your app is outdated by the time you launch it.
The outcome is poor ROI and a bleak future in the market. Agile, in this situation, is a tool that allows you to take calculated risks and improve your project's market scope. In other words, the methodology enables a mobile app development company to make changes to any particular sprint without disturbing the code of the previous sprints, thus making the application more suitable for the market.

Better Quality

Agile, unlike the traditional app development models, does not test the app only at the end of the development phase. Rather, it fosters testing of every single module at the most basic level. This reduces the risk of encountering a bug during quality testing of the complete project. It also helps mobile app developers inspect the app's elements at every stage of the development process and make adjustments as required, eventually helping deliver a higher quality of service.

Seamless Project Management

By transforming the complete app development project into multiple individual modules, the Agile methodology provides you with the facility to manage your project easily. You can easily assign tasks to different teams and reduce the dependencies and discussions at the inter-team level. You can also keep a record of the activities performed on each mini-project and, in this way, determine if something is missing or not working as per the proposed plan. Besides, you can check the productivity of every individual and put your efforts into developing or hiring more efficient experts.

Enhanced Customer Experience

The Agile mobile app development approach puts a strong emphasis on people and collaboration, which gives the development team an opportunity to work closely with their clients and understand their vision. Besides, projects are delivered to the clients in the form of multiple sprints, which brings transparency to the process.
It also enables the team to determine whether both parties are on the same page and, if not, make the required changes before proceeding further. This lessens the chances of launching an app that does not live up to the idea behind it, and therefore ensures an enhanced customer experience.

Lower Development Cost

Since every step is well planned, executed, and delivered, you can easily calculate the cost of making an app and thus justify your app budget. Besides, if at any stage of development you feel the need to raise the app budget, you can easily do so with the Agile methodology. In this way, you can avoid leaving the project incomplete due to a lack of required resources and funds.

Customization

The Agile mobile app development approach also provides developers with an opportunity to customize their development process. There are no rules that an app must be created in one particular way. Experts can look for different ways to develop and launch the mobile app, and integrate cutting-edge technologies and tools into the process. In a nutshell, the Agile process enables developers to customize the development timeline as they see fit and deliver a user-centric solution.

Higher ROI

The Agile methodology lets mobile app development companies enter the market with the most basic app (an MVP) and update the app with each iteration. This makes it easier for app owners to test their idea, gather the required data and insights, build a brand presence, and thus deliver the best features as per customer needs and market trends. This helps the app owners and the associated mobile app development company make the right decisions for gaining better ROI in the market.

Earlier Market Reach

By dividing the complete app project into sub-modules, the Agile mobile app development approach encourages the team to deliver every module within the stipulated deadline, making it practically impossible to lag behind schedule.
The outcome of this is that the complete app project is designed and delivered on time, or even earlier, which means earlier market reach. The Agile methodology can do wonders for mobile app development. However, everyone must be well aware of the end goal and contribute to this approach wisely; only then can you enjoy all of the aforementioned benefits. With this, are you ready to go Agile? Are you ready to develop a mobile app with an Agile mobility strategy?

Author Bio

Holding a Bachelor's degree in Technology and 2 years of work experience in a mobile app development company, Bhupinder is focused on making technology digestible to all. Being someone who stays updated with the latest tech trends, she's always armed to write and spread the knowledge. When not found writing, you will find her answering questions on Quora while sipping coffee.