
Tech Guides


Looking at the different types of Lookup cache

Savia Lobo
20 Nov 2017
6 min read
[box type="note" align="" class="" width=""]The following is an excerpt from a book by Rahul Malewar titled Learning Informatica PowerCenter 10.x. We walk through the various types of lookup cache based on how a cache is defined in this article.[/box] Cache is the temporary memory that is created when you execute a process. It is created automatically when a process starts and is deleted automatically once the process is complete. The amount of cache memory is decided based on the property you define at the transformation level or session level. You usually set the property as default, so as required, it can increase the size of the cache. If the size required for caching the data is more than the cache size defined, the process fails with the overflow error. There are different types of caches available. Building the Lookup Cache - Sequential or Concurrent You can define the session property to create the cache either sequentially or concurrently. Sequential cache When you select to create the cache sequentially, Integration Service caches the data in a row-wise manner as the records enter the lookup transformation. When the first record enters the lookup transformation, lookup cache gets created and stores the matching record from the lookup table or file in the cache. This way, the cache stores only the matching data. It helps in saving the cache space by not storing unnecessary data. Concurrent cache When you select to create cache concurrently, Integration service does not wait for the data to flow from the source; it first caches complete data. Once the caching is complete, it allows the data to flow from the source. When you select a concurrent cache, the performance enhances as compared to sequential cache since the scanning happens internally using the data stored in the cache. Persistent cache - the permanent one You can configure the cache to permanently save the data. By default, the cache is created as non-persistent, that is, the cache will be deleted once the session run is complete. If the lookup table or file does not change across the session runs, you can use the existing persistent cache. Suppose you have a process that is scheduled to run every day and you are using lookup transformation to lookup on the reference table that which is not supposed to change for six months. When you use non-persistent cache every day, the same data will be stored in the cache; this will waste time and space every day. If you select to create a persistent cache, the integration service makes the cache permanent in the form of a file in the $PMCacheDir location. So, you save the time every day, creating and deleting the cache memory. When the data in the lookup table changes, you need to rebuild the cache. You can define the condition in the session task to rebuild the cache by overwriting the existing cache. To rebuild the cache, you need to check the rebuild option on the session property. Sharing the cache - named or unnamed You can enhance the performance and save the cache memory by sharing the cache if there are multiple lookup transformations used in a mapping. If you have the same structure for both the lookup transformations, sharing the cache will help in enhancing the performance by creating the cache only once. This way, we avoid creating the cache multiple times, which in turn, enhances the performance. You can share the cache--either named or unnamed Sharing unnamed cache If you have multiple lookup transformations used in a single mapping, you can share the unnamed cache. 
Since the lookup transformations are present in the same mapping, naming the cache is not mandatory. Integration service creates the cache while processing the first record in first lookup transformation and shares the cache with other lookups in the mapping. Sharing named cache You can share the named cache with multiple lookup transformations in the same mapping or in another mapping. Since the cache is named, you can assign the same cache using the name in the other mapping. When you process the first mapping with lookup transformation, it saves the cache in the defined cache directory and with a defined cache file name. When you process the second mapping, it searches for the same location and cache file and uses the data. If the Integration service does not find the mentioned cache file, it creates the new cache. If you run multiple sessions simultaneously that use the same cache file, Integration service processes both the sessions successfully only if the lookup transformation is configured for read-only from the cache. If there is a scenario when both lookup transformations are trying to update the cache file or a scenario where one lookup is trying to read the cache file and other is trying to update the cache, the session will fail as there is conflict in the processing. Sharing the cache helps in enhancing the performance by utilizing the cache created. This way we save the processing time and repository space by not storing the same data multiple times for lookup transformations. Modifying cache - static or dynamic When you create a cache, you can configure them to be static or dynamic. Static cache A cache is said to be static if it does not change with the changes happening in the lookup table. The static cache is not synchronized with the lookup table. By default, Integration service creates a static cache. The Lookup cache is created as soon as the first record enters the lookup transformation. Integration service does not update the cache while it is processing the data. Dynamic cache A cache is said to be dynamic if it changes with the changes happening in the lookup table. The static cache is synchronized with the lookup table. You can choose from the lookup transformation properties to make the cache dynamic. Lookup cache is created as soon as the first record enters the lookup transformation. Integration service keeps on updating the cache while it is processing the data. The Integration service marks the record as an insert for the new row inserted in the dynamic cache. For the record that is updated, it marks the record as an update in the cache. For every record that doesn't change, the Integration service marks it as unchanged. You use the dynamic cache while you process the slowly changing dimension tables. For every record inserted in the target, the record will be inserted in the cache. For every record updated in the target, the record will be updated in the cache. A similar process happens for the deleted and rejected records.
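The sequential versus concurrent distinction is easiest to see in miniature. The following Python sketch is purely conceptual - it is not Informatica code or its API - and the lookup data, key names, and function names are hypothetical. It only contrasts building a cache lazily as rows arrive with building it eagerly up front.

```python
# Conceptual sketch only: lazy (sequential-style) vs. eager (concurrent-style) caching.

def sequential_cache_lookup(source_rows, lookup_table, key):
    """Cache lookup rows only as matching source rows arrive (lazy)."""
    cache = {}
    for row in source_rows:
        k = row[key]
        if k not in cache:
            # Fetch only the matching record; keys never seen are never cached.
            cache[k] = lookup_table.get(k)
        yield row, cache[k]

def concurrent_cache_lookup(source_rows, lookup_table, key):
    """Cache the whole lookup table up front, then stream the source (eager)."""
    cache = dict(lookup_table)  # the full lookup scan happens once, before the run
    for row in source_rows:
        yield row, cache.get(row[key])

# Hypothetical data for illustration only.
lookup = {101: "Gold", 102: "Silver"}
source = [{"cust_id": 101}, {"cust_id": 101}, {"cust_id": 103}]

for row, match in concurrent_cache_lookup(source, lookup, "cust_id"):
    print(row["cust_id"], match)
```

In the eager version the full lookup scan happens once before any source row is processed, which mirrors why a concurrent cache tends to perform better when most of the lookup data will eventually be needed.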


Handpicked for your Weekend Reading - 17th Nov '17

Aarthi Kumaraswamy
18 Nov 2017
2 min read
The weekend is here! You have your laundry to do, Netflix episodes of your favorite show to binge on, that elusive sleep to catch up on, and friends to go out with - and if you are married, spending quality time with your family is also on your priority list. The last thing you want to do is spend hours shortlisting content worth your reading time. So here is a handpicked list of the best of Datahub published this week. Enjoy!

3 things you should know that happened this week in News
- Introducing Tile: A new machine learning language with auto-generating GPU Kernels
- What we are learning from Microsoft Connect(); 2017
- Tensorflow Lite developer preview is here

Get hands-on with these Tutorials
- Implementing Object detection with Go using TensorFlow
- Machine Learning Algorithms: Implementing Naive Bayes with Spark MLlib
- Using R to implement Kriging - A Spatial Interpolation technique for Geostatistics data

Do you agree with these Insights & Opinions?
- 3 ways JupyterLab will revolutionize Interactive Computing
- Of Perfect Strikes, Tackles and Touchdowns: How Analytics is Changing Sports
- 13 reasons why Exit Polls get it wrong sometimes

Just relax and have fun reading these
- Date with Data Science Episode 04: Dr. Brandon explains 'Transfer Learning' to Jon
- Implementing K-Means Clustering in Python, Scotland Yard style!


3 ways JupyterLab will revolutionize Interactive Computing

Amey Varangaonkar
17 Nov 2017
4 min read
The history of the Jupyter notebook is quite interesting. It started as a spin-off project of IPython in 2011, with support for the leading data science languages such as R, Python, and Julia. As the project grew, Jupyter's core focus shifted to being more interactive and user-friendly. It soon became clear that Jupyter wasn't just an extension of IPython, leading to the 'Big Split' in 2014. Code reusability, easy sharing and deployment, and extensive support for third-party extensions are some of the factors that have made Jupyter the notebook of choice for most data professionals. And now, Jupyter plans to go a level beyond with JupyterLab - the next-gen Jupyter notebook with strong interactive and collaborative computing features.

[What is JupyterLab? JupyterLab is the next-generation end-user version of the popular Jupyter notebook, designed to enhance interaction and collaboration among users. It takes all the familiar features of the Jupyter notebook and presents them through a powerful, user-friendly interface.]

Here are 3 ways, or reasons shall we say, to look forward to this exciting new project, and how it will change interactive computing as we know it.

1. Improved UI/UX

One of Jupyter's strongest and most popular features is that it is very user-friendly, and the overall experience of working with Jupyter is second to none. With improvements in the UI/UX, JupyterLab offers a cleaner interface, with an overall feel very similar to the current Jupyter notebooks. Although JupyterLab has been built with a web-first vision, it also provides a native Electron app for a simplified user experience. The other key difference is that JupyterLab is quite command-centric, encouraging users to prefer keyboard shortcuts for quicker tasks. These shortcuts differ a bit from those of other text editors and IDEs, but they are customizable.

2. Better workflow support

Many data scientists start coding on an interactive shell and then migrate their code into a notebook for building and deployment purposes. With JupyterLab, users can perform all these activities more seamlessly and with minimal effort. It offers a document-less console for quick data exploration and an integrated text editor for running blocks of code outside the notebook.

3. Better interactivity and collaboration

Probably the defining feature that propels JupyterLab past Jupyter and other notebooks is how interactive and collaborative it is. JupyterLab has a side-by-side editing feature and provides a crisp layout that lets you view your data, the notebook, your command console, and a graphical display all at the same time. Better real-time collaboration is another big feature promised by JupyterLab: users will be able to share their notebooks Google Drive or Dropbox style, without having to switch to a different tool. It will also support a plethora of third-party extensions to this effect, with the Google Drive extension being the most talked about. Popular Python visualization libraries such as Bokeh will be integrated with JupyterLab, as will extensions to view and handle different file types, such as CSV for interactive rendering and GeoJSON for geographic data structures.

JupyterLab has gained a lot of traction in the last few years. While it is still some time away from being generally available, the current indicators look quite strong. With over 2,500 stars and 240 enhancement requests on GitHub already, the strong interest among users is clear. Judging by the initial impressions it has made on some users, JupyterLab hasn't made a bad start at all, and looks well and truly set to replace the current Jupyter notebooks in the near future.


5 cool ways Transfer Learning is being used today

Savia Lobo
15 Nov 2017
7 min read
Machine learning has gained a lot of traction over the years because of the predictive solutions it provides, including the development of intelligent and reliable models. However, training the models is a laborious task, because it takes time to curate the labeled data for the model and then to get the model ready. The time involved in training and labeling can be reduced by using the novel approach of transfer learning - a smarter and more effective form of machine learning, where you can take the learnings from one scenario and apply them to a different but related problem.

How exactly does Transfer Learning work?

Transfer learning reduces the effort of building a model from scratch by taking the fundamental logic or base algorithms learned in one domain and applying them to another. For instance, in the real world, the balancing skill learned while riding a bicycle transfers to driving other two-wheeled vehicles. Similarly, in machine learning, transfer learning can be used to transfer the algorithmic logic from one ML model to another. Let's look at some of the possible use cases of transfer learning.

1. Real-world simulations

Digital simulation is better than creating a physical prototype for real-world implementations. Training a robot in real-world surroundings is both time-consuming and costly. To minimize this, robots can now be trained in simulation, and the knowledge acquired can then be transferred onto a real-world robot. This is done using progressive networks, which are ideal for simulation-to-real-world transfer of policies in robot control domains. These networks consist of essential features for learning numerous tasks in sequence while enabling transfer, and they are resistant to catastrophic forgetting - the tendency of artificial neural networks (ANNs) to completely forget previously learned information when learning new information. Another application of simulation can be seen in training self-driving cars, which are trained using simulations built on video games. Udacity has open sourced its self-driving car simulator, which allows training self-driving cars through GTA 5 and many other video games. However, not all features of a simulation are replicated successfully when brought into the real world, as real-world interactions are more complex.

2. Gaming

The adoption of artificial intelligence has taken gaming to an altogether new level. DeepMind's neural network program AlphaGo is a testament to this, as it successfully defeated a professional Go player. AlphaGo is a master at Go but fails when tasked with playing other games, because its algorithm is tailored to play Go. The disadvantage of using ANNs in gaming is that they cannot master all games the way a human brain does: to do so, AlphaGo would have to totally forget Go and adapt itself to the algorithms and techniques of the new game. With transfer learning, the tactics learned in one game can be reapplied to play another. An example of transfer learning in gaming can be seen in MadRTS, a commercial real-time strategy game developed to carry out military simulations. MadRTS uses CARL (CAse-based Reinforcement Learner), a multi-tiered architecture that combines case-based reasoning (CBR) and reinforcement learning (RL). CBR provides an approach to tackling unseen but related problems based on past experiences within each level of the game. RL algorithms, on the other hand, allow the model to make good approximations to a situation based on the agent's experience in its environment - also known as a Markov decision process. These CBR/RL transfer learning agents are evaluated on tasks given in MadRTS and are expected to learn better across tasks by transferring experience.

3. Image classification

Neural networks are experts at recognizing objects within an image, as they are trained on huge datasets of labeled images, which is time-consuming. Transfer learning helps here by reducing the time needed to train the model: the model is pre-trained on ImageNet, which contains millions of images from different categories. Let's assume that a convolutional neural network - for instance, a VGG-16 ConvNet - has to be trained to recognize images within a dataset. First, it is pre-trained on ImageNet. Then it is trained layer-wise, starting by replacing the final layer with a softmax layer and training it until training saturates; the other dense layers are then trained progressively. By the end of training, the ConvNet model has learned to detect images from the dataset provided. In cases where the dataset is not similar to the pre-trained model's data, one can fine-tune the weights in the higher layers of the ConvNet using backpropagation. The dense layers contain the logic for detecting the image, so tuning the higher layers won't affect the base logic. Such convolutional neural networks can be trained in Keras, using TensorFlow as a backend (a minimal sketch of this fine-tuning recipe appears at the end of this article). An example of this kind of image classification can be seen in medical imaging, where a convolutional model pre-trained on ImageNet is used to solve kidney detection in ultrasound images.

4. Zero-shot translation

Zero-shot translation is an extension of supervised learning, where the goal of the model is to learn to predict novel values that are not present in the training dataset. The prominent working example of zero-shot translation is Google's Neural Machine Translation model (GNMT), which allows for effective cross-lingual translations. Prior to the zero-shot implementation, two languages had to be translated through a pivot language. For instance, to translate Korean to Japanese, Korean was first translated into English and then English into Japanese; English here is the pivot language acting as a medium between Korean and Japanese. This resulted in translations full of distortions created by the first language pair. Zero-shot translation removes the need for a pivot language: it uses the available training data to learn translational knowledge and applies it to translate a new language pair. Another instance of the zero-shot approach can be seen in Image2Emoji, which combines visuals and text to predict unseen emoji icons.

5. Sentiment classification

Businesses can get to know their customers better by implementing sentiment analysis, which helps them understand the emotions and polarity (negative or positive) underlying feedback and product reviews. Building sentiment analysis for a new text corpus is difficult, because training models to detect different emotions is hard. A solution to this is transfer learning: train a model on one domain, Twitter feeds for instance, and fine-tune it for another domain you wish to perform sentiment analysis on, say movie reviews. Here, deep learning models are trained on Twitter feeds to carry out sentiment analysis of the text corpus and to detect the polarity of each statement. Once the model has learned to understand emotions through the polarity of the Twitter feeds, its underlying language model and learned representations are transferred to the model assigned the task of analyzing sentiment in movie reviews. For example, an RNN model trained with a logistic regression classifier carries out sentiment analysis on the Twitter feeds; the word embeddings and the recurrent weights learned from the source domain (Twitter feeds) are then reused in the target domain (movie reviews) to classify sentiment there.

Conclusion

Transfer learning has brought in a new wave of learning in machines by reusing algorithms and applied logic, thus speeding up their learning process. This directly reduces the capital investment and the time invested in training a model, which is why many organizations are looking to replicate such learning in their machine learning models. Transfer learning has already been carried out successfully in image processing, simulations, gaming, and other fields. How transfer learning affects the learning curve of machines in other sectors is worth watching out for.
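As a companion to the image classification section above, here is a minimal Keras (TensorFlow) sketch of the pre-train-then-fine-tune recipe. The class count, head layers, and commented training call are illustrative assumptions, not the exact setup described in the article.

```python
# Minimal sketch: reuse an ImageNet-pre-trained VGG-16 and train a new softmax head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 5  # hypothetical number of classes in the target dataset

# 1. Start from a VGG-16 ConvNet pre-trained on ImageNet, without its classifier.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional layers; only the new head trains first

# 2. Replace the final layer with a fresh softmax head for the new dataset.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# 3. Train the head until it saturates, e.g.:
# model.fit(train_images, train_labels, epochs=5, validation_split=0.1)

# 4. If the target data differs a lot from ImageNet, unfreeze the top convolutional
#    block and fine-tune with a small learning rate (backpropagation through those layers).
```

The frozen base acts as a generic feature extractor, which is exactly the "transfer" step: only the small new head has to be learned from the target dataset.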


How to plan a system migration in 10 steps

Hari Vignesh
14 Nov 2017
6 min read
How do I plan a system migration? A system migration is the process of moving an application from one environment to another (such as from an on-premises enterprise server to a cloud-based environment, from one server to another, or from cloud to cloud). You might, for example, migrate to or from custom-built systems on platforms like Microsoft Azure, Google App Engine, Force.com, MySQL, or Amazon Web Services. Software migration is always a challenge, but fortunately many system migrations can be managed, and even automated, by a third-party middleware solution. Sometimes a system migration is smaller in scale: you might want to move installed applications and data from one piece of hardware to another (as you would when you give your team new computers), rather than moving an app's entire development environment. While this is pretty easy in technical terms, making sure it is carefully managed for users is nevertheless important.

Why migration?

Migrations are done to improve efficiency or to bring all applications from a legacy system into a current one. That's why it is becoming such a pressing issue for many organizations as they seek to undergo 'digital transformation' or optimize their existing setup. Often, organizations will want to virtualize their software. This is ultimately about disassociating it from specific operating systems, instead hosting programs in separate environments for sandboxing at runtime. Here are some migration scenarios:

Example 1: You want to move a team using Adobe Creative Cloud (CC) from old PCs to new Macs. You need to ensure that once team members are working on Macs with Adobe CC installed, they can still use paths to the server to access all creative assets.

Example 2: Your team uses custom software developed on one type of cloud environment, like Amazon Web Services (AWS), and now your organization is moving en masse to Google Cloud Platform (GCP). You need to map each piece of functionality your app had on AWS to GCP, despite the major differences in how each environment operates.

How to successfully plan a system migration

The hard part is actually planning and executing a system migration. There is a lot that can go wrong, from both a technical and a people perspective. Here are 10 steps you should follow to remain (relatively) calm and in control when you're making a move.

1. Establish your cross-functional representatives. Because of the many hands required to see a software migration project through, and its long timeframe and far-off ROI, you need a champion in your corner from every corner of the business. Get one key representative from each business function relevant to the software that's moving, be it production, sales, accounting, IT, or another department. These people will help you retain support for the project while it is still without ROI and when it comes under budget threats during, say, a lean quarter.

2. Frame the project for stakeholders. Be it department heads, the C-suite, or the board of directors, lay out the plan and just how essential it is for growth. Set out what the project entails and what it isn't, and lay out goalposts for each phase. Whenever it comes under review, you'll have this initial framework that you and the stakeholders agreed upon.

3. Build a team of internal experts. Find technical experts within your organization who can assist with each part of the migration, even if you're ultimately using a third-party vendor or software for the migration. Put these people in charge of cleaning (or writing programs to clean) existing data, knowing where everything is stored, and understanding the limitations of the platforms at each end of the migration. Depending on the size of your organization, each member of this team may lead their own small team to handle their portion of the project.

4. Take inventory of assets. There's no way to judge a migration as successful if you're not sure whether you lost any data along the way. For data, some of your internal experts can check what is stored, make backups, and export to lightweight .CSV files or hard copies (in the case of legal and other vital documentation); a small, hypothetical export sketch follows this article. For software or applications, take inventory of each action and function possible with the software, how it interfaces with its databases, what it is and isn't compatible with, and the unique custom configurations that separate it from off-the-shelf software's documentation.

5. Create a risk assessment report. Determine all relevant risks to the migration, including opportunity costs and compliance issues. This will be vital for getting final approval from stakeholders and will insulate project runners from being blindsided later. A risk assessment matrix template can help you get started.

6. Determine technical, time, and financial requirements. Work with individuals in finance to work out long-term budget needs and rates of approval over the whole project. Work with IT, developers, and engineering to figure out the technical aspects and requirements, which migration method is appropriate, and who will be forced into downtime at which stages of the project. Compile all of this to arrive at realistic timing and checkpoints for the migration.

7. Create a project management system for all parties. With the data you gathered in the previous step, and all the teams you've assembled (technical, cross-functional, and stakeholder teams), create a common project management hub where everyone can see progress, send messages, attach files and findings, and generally gain visibility into the process. It should be intuitive for all users. Set up the project management software with the budget and time expectations agreed upon for each phase. You can present this information to the stakeholders for final approval prior to project kickoff, and use it to submit regular reports as they request them.

8. Perform the migration in phases. Using the appropriate methods, perform the migration and document every step. Use the project management tool to keep everyone informed and to gather documentation. Along the way, when some employees inevitably leave or join the team, you can use this tool to quickly get newcomers up to speed.

9. Test after each phase. After each phase, test whatever you've migrated into the new environment and document the outcomes. Regular testing and sandboxing will allow your team to catch problems early and regroup or change direction before data is lost and progress is wasted.

10. Record the results. Once the migration is complete, record the final results and compare them to the goalposts set up and tracked in your project management tool. Combine all documentation, deliver a final report to stakeholders, and begin reaping the rewards of your newer, faster, better software, operating system, cloud environment, or whatever else you migrated.

By following the steps above, you should find your system migration a little more stress-free than it might otherwise be!
Hari Vignesh Jayapalan is a Google Certified Android app developer, IDF Certified UI & UX Professional, street magician, fitness freak, technology enthusiast, and wannabe entrepreneur. He can be found on Twitter @HariofSpades.
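As a small illustration of the inventory step (step 4), the sketch below exports every table of a SQLite database to CSV backups before a migration begins. The database path, output directory, and helper name are hypothetical assumptions; adapt the idea to whatever data stores your migration actually touches.

```python
# Hypothetical helper: dump every table of a SQLite database to lightweight CSV backups.
import csv
import sqlite3
from pathlib import Path

def export_tables_to_csv(db_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(db_path)
    try:
        # List the user tables recorded in the database catalog.
        tables = [r[0] for r in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        for table in tables:
            cur = con.execute(f"SELECT * FROM {table}")
            headers = [d[0] for d in cur.description]
            with open(out / f"{table}.csv", "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(headers)   # column names first
                writer.writerows(cur.fetchall())
    finally:
        con.close()

# Example (assumed file names):
# export_tables_to_csv("legacy_app.db", "backups/2017-11-14")
```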


Of perfect strikes, tackles and touchdowns: how analytics is changing sports

Amey Varangaonkar
14 Nov 2017
7 min read
The rise of big data and analytics is drastically changing the landscape of many businesses, and the sports industry is one of them. In today's age of cut-throat competition, data-based strategies are slowly taking the front seat when it comes to crucial decision making, helping teams gain that decisive edge over their competition. Sports analytics is slowly becoming the next big thing! In the past, many believed that the key to conquering the opponent in any professional sport was to make the player or the team better - be it making them stronger, faster, or more intelligent. 'Analysis' then was limited to mere clipboard statistics and the intuition built by coaches on the basis of raw video footage of games. This is not the case anymore. From handling media contracts and merchandising to evaluating individual or team performance on matchday, analytics is slowly changing the landscape of sports.

The explosion of data in sports

The amount and quality of information available to decision-makers within a sports organization have increased exponentially over the last two decades. Several factors contribute to this:
- Incredible innovation in sports science over the last decade
- In-depth records maintained by trainers, coaches, medical staff, nutritionists, and even the sales and marketing departments
- Improved processing power and lower storage costs, which allow large amounts of historical data to be maintained

Of late, the adoption of motion capture technology and wearable devices has proved to be a real game-changer in sports, where every movement on the field can be tracked and recorded. Today, many teams in a variety of sports - the Boston Red Sox and Houston Astros in Major League Baseball (MLB), the San Antonio Spurs in the NBA, and clubs like Arsenal, Manchester City, and Liverpool FC in football (soccer) - are adopting analytics in different capacities.

Turning sports data into insights

Needless to say, all the crucial sports data being generated today needs equally good analytics techniques to extract the most value from it. This is where sports analytics comes into the picture. Sports analytics is the use of analytics on current as well as historical sport-related data to identify useful patterns that can be used to gain a competitive advantage on the field of play. Several techniques and algorithms fall under the umbrella of sports analytics. Machine learning, among them, is a widely used set of techniques that sports analysts use to derive insights: a popular form of artificial intelligence where systems are trained on large datasets to give reliable predictions on new data. With the help of a variety of classification and recommendation algorithms, analysts can now identify patterns within a player's existing attributes and determine how they can best be optimized to improve that player's performance. Using cross-validation techniques, the machine learning models then ensure no bias creeps in and the predictions generalize even to unknown datasets (an illustrative cross-validated model appears at the end of this article). Analytics is being put to use by many sports teams today, in many different ways. Here are some key use cases of sports analytics.

Pushing the limit: Optimizing player performance

Right from tracking an athlete's heartbeats per minute to finding injury patterns, analytics can play a crucial role in understanding how an individual performs on the field. With the help of video, wearables, and sensor data, it is possible to identify exactly when an athlete's performance drops, and corrective steps can be taken accordingly. It is now possible to assess a player's physiological and technical attributes and work on specific drills in training to push them to an optimal level. Developing search-powered data intelligence platforms seems to be the way forward. One example is Tellius, a search-based data intelligence tool that lets you determine a player's efficiency in terms of fitness and performance through search-powered analytics.

Smells like team spirit: Better team and athlete management

Analytics also helps coaches manage their team better. For example, Adidas has developed a system called miCoach, which works by having players use wearables during games and training sessions. The data obtained from the devices highlights the top performers and the ones who need rest. It is also possible to identify and improve patterns in a team's playing style, developing a 'system' to improve efficiency in gameplay. For individual athletes, real-time stats such as speed, heart rate, and acceleration help trainers plan training and conditioning sessions accordingly. Getting intelligent answers about player and team performance and real-time in-game tactics is something that will make the lives of coaches and management a lot easier going forward.

All in the game: Improving game-day strategy

By analyzing real-time training data, it is possible to identify the fitter, in-form players to pick for the game. Not just that - analyzing the opposition and picking the right strategy to beat them becomes easier once you have the relevant data insights with you. Different data visualization techniques can be used not just with historical data but also with real-time data while the game is in progress.

Splashing the cash: Boosting merchandising

What are fans buying once they're inside the stadium? Is it the home team's shirt, or is it scarves and posters? What food are they eating in the stadium eateries? By analyzing all this data, retailers and club merchandise stores can stock fan-favorite merchandise and other items in adequate quantities, so that they never run out of stock. Analyzing sales via online portals and e-stores also helps teams identify the countries or areas where buyers live - a good indicator of where to concentrate sales and marketing efforts. Analytics also plays a key role in product endorsements and sponsorships. Determining which brands to endorse, identifying the best possible sponsor, the ideal duration of a sponsorship, and the sponsorship fee are some key decisions that can be taken by analyzing current trends along with historical data.

Challenges in sports analytics

Although the advantages offered by analytics are there for all to see, many sports teams have still not incorporated analytics into their day-to-day operations. Lack of awareness seems to be the biggest factor: many teams underestimate, or still don't understand, the power of analytics. Choosing the right big data and analytics tool is another challenge. With humongous amounts of data in particular, the time investment needed to clean and format the data for effective analysis is something many teams aren't prepared for. Another challenge is the rising demand for analytics talent and a sharp deficit in supply, driving higher salaries. Add to that the need for a thorough understanding of the sport in order to find effective insights in the data, and it becomes even more difficult to get the right data experts.

What next for sports analytics?

Understanding data and how it can be used in sports - to improve performance and maximize profits - is now deemed by many teams to be the key differentiator between success and failure. And it's not just success that teams are after; it's sustained success, and analytics goes a long way in helping teams achieve that. Gone are the days when traditional ways of finding insights were enough. Sports have evolved, and teams are now digging deeper into data to get that slightest edge over the competition, which can prove to be massive in the long run. If you found this article insightful, make sure you check out our interview on sports analytics with ESPN Senior Stats Analyst Gaurav Sundararaman.
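To make the machine-learning and cross-validation point above concrete, here is an illustrative scikit-learn sketch on synthetic data. The feature names, thresholds, and the "performance dip" label are invented for demonstration and are not drawn from any real team's data.

```python
# Illustrative sketch: cross-validated classifier predicting a performance dip
# from simple (synthetic) training-load and recovery features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
# Hypothetical wearable/training features: weekly distance (km), average heart
# rate (bpm), sprint count, and hours of sleep per night.
X = np.column_stack([
    rng.normal(40, 8, n),
    rng.normal(150, 12, n),
    rng.normal(25, 6, n),
    rng.normal(7, 1, n),
])
# Hypothetical label: 1 = performance dip expected in the next match
# (here, simply high training load combined with poor sleep).
y = ((X[:, 0] > 48) & (X[:, 3] < 6.5)).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
# Cross-validation guards against overfitting to a single train/test split,
# which is the bias check mentioned in the article.
scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```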

Why are open source developers more in demand than ever?

Erik Kappelman
13 Nov 2017
3 min read
There are many reasons why open source professionals are in high demand. But let's start with the obvious one first: one of the biggest reasons open source professionals are in such demand today has nothing to do with the fact that they are open source professionals, but simply with the fact that technology has become a core part of just about every organization in every industry. But there's more to it than that.

Why open source developers are good for businesses

Let's talk for a minute about what 'open source professional' actually means, or what it could mean. I would separate open source professionals into two broad groups that overlap quite a bit. There are people who use open source tools in professional settings, such as Node.js, Angular 2, and others. Then there are the people who create open source tools, such as those at Red Hat, Ubuntu, and Firebase. But why are so many more people using open source tools in a professional setting? Sure, fashion is a part of it, but there's obviously a much more prosaic answer: open source tools are free to use. Technological innovation is seen as powering growth, and while the right people might cost money, taking advantage of these tools can make a transformative impact. Proprietary tools, after all, limit organizations: they require specific skill sets, and they demand that you work in specific ways. It's for this reason that businesses are starting to consider the impact an open source tech strategy can have. The consequence is that these skills are more in demand than ever.

Why is open source better?

Open source is better because it allows more freedom. It empowers businesses - even very small ones - to innovate and create new solutions to problems. Traditional software, where you have vendor lock-in, doesn't enable the same degree of innovation. And even for medium-sized companies, it can still be very expensive to get the software solution you want.

The world's biggest software companies are embracing open source

The open source model is the antithesis of the monopolistic behemoth model we are accustomed to. The product is free and its ingredients aren't a secret. This means that software will become the best version of itself, because interest, not position, determines who is developing software and how they are developing it. Furthermore, because the product is free, companies need to turn elsewhere to make a profit. They usually do this by selling enterprise-level support, and I would say that this business model is working. The open source model forces competition by decentralizing power and allowing anyone with talent to get noticed very quickly. This is a recipe for success, and there is one sure way to tell: what have Microsoft, IBM, Google, Amazon, and the like been up to recently? Getting started on creating their own sets of open source tools, of course. The titans of industry are now right down here with you and me, because they know they would be missing out if they weren't.

Erik Kappelman wears many hats including blogger, developer, data consultant, economist, and transportation planner. He lives in Helena, Montana and works for the Department of Transportation as a transportation demand modeler.


13 reasons why Exit Polls get it wrong sometimes

Sugandha Lahoti
13 Nov 2017
7 min read
An exit poll, as the name suggests, is a poll taken immediately after voters exit the polling booth. Private companies working for popular newspapers or media organizations conduct these exit polls and are popularly known as pollsters. Once the data is collected, data analysis and estimation are used to predict the winning party and the number of seats captured. Turnout models, built using logistic regression or random forest techniques, are used to predict turnout in the exit poll results. Exit polls depend on sampling, so a margin of error does exist. This describes how close pollsters expect to be to an election result relative to the true population value. Normally, a margin of error of plus or minus 3 percentage points is acceptable (a quick sketch of this calculation follows the article). In recent times, however, there have been instances where the poll average was off by a larger percentage. Let us analyze some of the reasons why exit polls can get their predictions wrong.

1. Sampling inaccuracy/quality

Exit polls depend on the sample size, that is, the number of respondents or the number of precincts chosen. Incorrect estimation of this may lead to error margins. The quality of the sample data also matters: whether the selected precincts are representative of the state, whether the polled audience in each precinct represents the whole, and so on.

2. The model did not consider multiple turnout scenarios

Voter turnout refers to the percentage of eligible voters who cast a vote during an election. Pollsters may misjudge the number of people who actually vote relative to the total population eligible to vote, and they often base their turnout prediction on past trends. However, voter turnout depends on many factors. For example, some voters might not turn up out of indifference or a perception that their vote might not count - which is not true. In such cases, pollsters adjust the weighting to reflect high- or low-turnout conditions, keeping the total turnout count in mind. Observations taken during a low turnout are also considered and the weights adjusted accordingly. In short, pollsters try their best to stay true to the original data.

3. The model did not consider past patterns

Pollsters may make the mistake of not delving into the past. They can gauge current turnout rates by taking into account presidential turnout or previous midterm elections. Although one may assume that turnout percentages have been stable over the years, a check on past voter turnout is a must.

4. The model was not recalibrated for the year and timing of the election, such as odd-year midterms

Timing is a crucial factor in getting people to vote. At times, some social issues are much more hyped and talked about than the elections themselves. For instance, news of the Ebola virus outbreak in Texas was more prominent than news about the candidates standing in the 2014 midterm elections. Another example would be an election day set on a Friday versus any other weekday.

5. Number of contestants

Everyone has a personal favorite. In cases where there are just two contestants, it is straightforward to arrive at a clear winner. For pollsters, it is easier to predict votes when the whole world is talking about the race and they know which candidate is most talked about. As the number of candidates increases, carrying out an accurate survey becomes more challenging: pollsters have to reach out to more respondents to run the survey effectively.

6. Swing voters/undecided respondents

Another possible explanation for discrepancies between poll predictions and the outcome is a large proportion of undecided voters in the poll samples. Possible solutions include asking relative questions instead of absolute ones, and allotting undecided voters in proportion to party support levels while making estimates.

7. Number of down-ballot races

Sometimes a popular party leader helps attract votes to another, less popular candidate of the same party. This is the down-ballot effect. At times, down-ballot candidates may receive more votes than party leader candidates, even when third-party candidates are included. Down-ballot outcomes also tend to be influenced by the turnout for the polls at the top of the ballot, so the number of down-ballot races needs to be taken into account.

8. The cost incurred to commission a quality poll

A huge capital investment is required to commission a quality poll. The cost of a poll depends on the sample size (the number of people interviewed), the length of the questionnaire (the longer the interview, the more expensive it becomes), and the time within which interviews must be conducted. Hiring a polling firm or including cell phones in the survey adds further to the expense.

9. Over-relying on historical precedence

Historical precedence is an estimate of the type of people who have shown up previously in a similar type of election. This precedent should be taken into consideration for better estimation of election results, but care should be taken not to over-rely on it.

10. Effect of statewide ballot measures

Poll estimates also depend on state and local governments. Certain issues are pushed by local ballot measures, but some voters feel that power over specific issues should belong exclusively to state governments. This causes opposition to local ballot measures in some states, and such issues should be taken into account for better result prediction.

11. Oversampling due to factors such as faulty survey design or respondents' willingness or unwillingness to participate

Exit polls may also oversample certain voters for many reasons. One example relates to people in the US with cultural ties to Latin America: although more than one-fourth of Latino voters prefer speaking Spanish to English, exit polls are almost never offered in Spanish, which may oversample English-speaking Latinos.

12. Social desirability bias in respondents

People may not always tell the truth about who they voted for. When asked by pollsters, they are likely to place themselves on the safer side, as exit polls are a sensitive topic. Voters may tell pollsters that they voted for a minority candidate when they actually voted against the minority candidate. Social desirability bias has no link to issues of race or gender; it is simply that people like to be liked and like to be seen as doing what everyone else is doing, or what the "right" thing to do is - they play it safe. Brexit polling, for instance, showed strong signs of social desirability bias.

13. The spiral of silence theory

People may not reveal their true thoughts to news reporters if they believe the media has an inherent bias. Voters may not declare their stand publicly for fear of reprisal or isolation, and choose to remain silent. This too can hinder pollsters' estimates.

The above is just a short list of the many reasons why exit poll results must be taken with a pinch of salt. Even with all its shortcomings, though, the striking feature of an exit poll is that rather than predicting a future action, it records an action that has just happened, so you rely on present indicators rather than ambiguous historical data. Exit polls are also cost-effective for obtaining very large samples. If exit polls are conducted properly, keeping in mind the points described above, they can predict election results with greater reliability.
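As promised above, here is a quick sketch of where the "plus or minus 3 percentage points" figure comes from, assuming a simple random sample. Real exit polls use cluster samples and design effects, so treat this as a back-of-the-envelope approximation only.

```python
# 95% margin of error for a sampled proportion p with n respondents:
# roughly 1.96 * sqrt(p * (1 - p) / n) under simple random sampling.
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sampled vote share."""
    return z * math.sqrt(p * (1 - p) / n)

# With about 1,000 respondents and a 50/50 race, the error comes out to
# roughly +/- 3 percentage points, which is why that figure is usually
# treated as acceptable.
print(round(100 * margin_of_error(0.5, 1000), 1))  # ~3.1
```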


Know Your Customer: Envisaging customer sentiments using Behavioral Analytics

Sugandha Lahoti
13 Nov 2017
6 min read
"All the world's a stage, and all the men and women merely players." Shakespeare may have considered men and women mere players, but as large numbers of users connect through smart devices and the online world, these men and women - your customers - become your most important assets. Knowing your customer and envisaging their sentiments using behavioral analytics has therefore become paramount.

Behavioral analytics: Tracking user events

Say you order a pizza through an app on your phone. After customizing it and choosing the crust size, type, and ingredients, you land in the payment section. Suppose that, instead of paying, you abandon the order altogether. Immediately you get an SMS and an email alerting you that you are just a step away from buying your choice of pizza. So how does this happen? Behavioral analytics runs in the background here: by tracking user navigation, it prompts the user to complete an order or offers a suggestion. The rise of smart devices has enabled almost everything to transmit data. Most of this data is captured between sessions of user activity and is in raw form. By user activity we mean social media interactions, time spent on a site, the user's navigation path, click activity, responses to changes in the market, purchasing history, and much more. Some form of understanding is therefore required to make sense of this raw, scrambled data and extract definite patterns. Here is where behavioral analytics steps in. It goes through a user's entire e-commerce journey and focuses on understanding the what and how of their activities, and based on this, it predicts their future moves. This, in turn, helps generate opportunities for businesses to become more customer-centric.

Why behavioral analytics over traditional analytics

Previous analytical tools lacked a single architecture and a simple workflow. Although they assisted with tracking clicks and page loads, they required a separate data warehouse and separate visualization tools, creating an unstructured workflow. Behavioral analytics goes a step beyond standard analytics by combining rule-based models with deep machine learning: where the former tells what users do, the latter reveals the how and why of their actions. These tools keep track of where customers click, which pages are viewed, how many continue down the process, and who abandons a website at which step, among other things. Unlike traditional analytics, behavioral analytics aggregates data from diverse sources (websites, mobile apps, CRM, email marketing campaigns, and so on) collected across various sessions. Cloud-based behavioral analytics platforms can intelligently integrate and unify all sources of digital communication into a complete picture, offering a seamless, structured view of the entire customer journey. Such platforms typically capture real-time data in raw format, then automatically filter and aggregate it into a structured dataset, and provide visualization tools to observe this data while predicting trends. The data is aggregated in a way that allows it to be queried in an unlimited number of ways: analyzing retention and churn trends, tracing abnormalities, performing multidimensional funnel analysis, and much more (a minimal funnel-analysis sketch follows this article). Let's look at some specific use cases across industries where behavioral analytics is heavily used.

Analyzing customer behavior in e-commerce

E-commerce platforms are at the top of the list of sectors that can benefit greatly from mapping their digital customer journey. Analytics strategies can track whether a customer spends more time on product page X than on product page Y by displaying views and data pointers of customer activity in a structured format. This enables businesses to resolve issues that may hinder a page's popularity, such as slow-loading pages or expensive products. By tracking the user session, right from when the user enters the platform to the point a sale is made, behavioral analytics predicts future customer behavior and business trends. Some of the parameters considered include the number of customers viewing reviews and ratings before adding an item to their cart, what similar products the customer views, and how often items in the cart are deleted or added. Behavioral analytics can also identify top-performing products and help build powerful recommendation engines. By analyzing changes in customer behavior across demographic conditions or regional differences, it helps achieve customer-to-customer personalization. KISSmetrics is a powerful analytics tool that provides detailed customer behavior reports for businesses to slice through and find meaningful insights. RetentionGrid provides color-coded visualizations and multiple strategies tailor-made for customers, based on customer segmentation and demographics.

How can online gaming benefit from behavioral analytics?

Online gaming is a surging community with millions of daily active users. Marketers are always looking for ways to acquire customers and retain users. Monetization is another important focal point: the aim is not only to get more users to play but also to pay. Behavioral analytics keeps track of a user's gaming sessions - skill levels, time spent at different stages, favorite features and activities within gameplay, and drop-off points from the game. At an overall level, it tracks active users, game logs, demographic data, and social interaction between players over various community channels. On the basis of this data, visualizations are generated that can be used to drive market strategies, such as identifying features that work, how to attract more players, or how to keep existing players engaged - thus helping increase player retention and assisting game developers and marketers in releasing new versions based on players' reactions. Behavioral analytics can also identify common characteristics of users: it helps in understanding what gets a user to play longer and in identifying the group of users most likely to pay, based on common characteristics. All of this helps gaming companies place the right advertising and content in front of users. Mr Green's casino launched a Green Gaming tool to predict a person's playing behavior; on the basis of a gamer's risk-taking behavior, it generates personalized insights about their gaming. Nektan PLC has partnered with the machine-learning customer insights firm Newlette, whose models analyze player behavior based on individual playing styles, helping to increase player engagement and reduce bonus costs by providing players with optimal offers and bonuses.

The applications of behavioral analytics are not limited to e-commerce or gaming alone. The security and surveillance domain uses behavioral analytics to conduct risk assessments of organizational resources and to raise alerts against individual entities that pose a potential threat, by sifting through large amounts of company data and identifying patterns that indicate irregularity or change. End-to-end monitoring of customers also helps app developers track customer adoption of newly developed features. It can also provide reports on the exact point where customers drop off, helping avoid expensive technical issues. All these benefits highlight how tracking customers and knowing user behavior is an essential tool for driving a business forward. As Leo Burnett, the founder of a prominent advertising agency, said: "What helps people, helps business."
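To illustrate the funnel analysis mentioned above, here is a minimal pandas sketch that counts unique users at each step of a hypothetical checkout flow and computes the drop-off between steps. The event names and log data are invented for demonstration.

```python
# Minimal funnel analysis over a (hypothetical) raw event log.
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "event":   ["view_product", "add_to_cart", "payment",
                "view_product", "add_to_cart",
                "view_product", "add_to_cart", "payment", "purchase"],
})

funnel_steps = ["view_product", "add_to_cart", "payment", "purchase"]

# Count the unique users who reached each step of the funnel.
counts = [events.loc[events["event"] == step, "user_id"].nunique()
          for step in funnel_steps]

funnel = pd.DataFrame({"step": funnel_steps, "users": counts})
# Drop-off: share of users lost relative to the previous step.
funnel["drop_off_%"] = (1 - funnel["users"] / funnel["users"].shift(1)).mul(100).round(1)
print(funnel)
```

A real platform would compute the same table from millions of events per day, but the logic - order the steps, count users per step, compare adjacent steps - is exactly the drop-off analysis described in the pizza example above.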

article-image-nlp-deep-learning
Savia Lobo
10 Nov 2017
7 min read
Save for later

Facelifting NLP with Deep Learning

In recent years, the world has witnessed a global move towards digitization. Massive improvements in computational capabilities have been made, thanks to the boom in the AI chip market as well as computation farms. These have resulted in data abundance and fast data-processing ecosystems that are accessible to everyone - important pillars for the growth of AI and allied fields. Terms such as 'machine learning' and 'deep learning' in particular have gained a lot of traction in the data science community, mainly because of the multitude of domains they lend themselves to. Along with image processing, computer vision, and games, one key area transformed by machine learning, and more recently by deep learning, is Natural Language Processing, simply known as NLP.

Human language is a heady concoction of otherwise incoherent words and phrases with more exceptions than rules, full of jargon and words with multiple meanings. Making machines comprehend a human language in all its glory, not to mention its users' idiosyncrasies, can be quite a challenge. Then there is the matter of there being thousands of languages, dialects, accents, and slangs. Yet it is a challenge worth taking up, mainly because language finds its application in almost everything humans do, from web search to email to content curation, and more. According to Tractica, a market intelligence firm, the natural language processing market will reach $22.3 billion by 2025.

NLP Evolution - From Machine Learning to Deep Learning

Before deep learning turned NLP into a smarter version of a conversational machine, machine learning based systems were used to process natural language. These systems were trained on shallow models, often built on incomplete, time-consuming, custom-made features. They included algorithms such as support vector machines (SVM) and logistic regression, and found their applications in tasks such as spam detection in emails, grouping together similar words in a document, spinning articles, and much more. ML-based NLP systems relied heavily on the quality of the training data. Because of the limited capabilities of classical machine learning, these models fell short when it came to understanding high-level text and speech produced by humans. The conclusion was that such algorithms can handle only narrow features and cannot perform the high-level reasoning that human conversations often require. Also, as the scale of the data grew, machine learning could not effectively tackle the NLP problems of training models efficiently and optimizing them.

Here is where deep learning proves to be a stepping stone. Deep learning is built on Artificial Neural Networks (ANNs) that function similarly to neurons in a human brain, which is why they are considered to emulate human thinking remarkably well. Deep learning models perform significantly better as the quantity of data fed to them increases. For instance, Google's Smart Reply can generate relevant responses to the emails a user receives. This system uses a pair of RNNs, one to encode the incoming mail and the other to predict relevant responses. With the incorporation of deep learning in NLP, the need for feature engineering is greatly reduced, saving time - a major asset.
This means machines can be trained to understand languages other than English without complex, custom feature engineering, simply by applying deep neural network models. In spite of the constant changes a language goes through, the quest to make machines friendlier to humans is made possible using deep learning.

Key Deep Learning techniques used for NLP

NLP-based deep learning models make use of word embeddings, pre-trained on a large corpus or collection of unlabeled data. With advancements in word embedding techniques, the ability of machines to derive deeper insights from language has increased. A widely used technique here is word2vec, which converts a given word into a vector so that machines can reason about it. The continuous-bag-of-words (CBOW) and skip-gram models, both used for learning word vectors, help in capturing patterns within sentences: the skip-gram model predicts the surrounding words using the center word as input and works well on large datasets, whereas CBOW does the reverse. Similarly, GloVe also computes vector representations, but using matrix factorization over word co-occurrence statistics. Negative sampling, a frequency-based sampling of non-context words, is commonly used while training word2vec models to keep training efficient.

A disadvantage of the word embedding approach on its own is that it does not understand phrases and sentences. A bag-of-words model converts each word into a corresponding vector; this simplifies many problems, but it can also lose the context of the text. For instance, it may not collectively understand idioms or sub-phrases such as "break a leg", and indicative or negating words such as 'not' or 'but', which attach semantic meaning to neighboring words, are difficult for such a model to capture. This is where neural networks come into play.

CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks) are the two widely used neural network models in NLP. CNNs are good performers for text classification; the downside is that they are poor at learning sequential information from text. Expresso, built on Caffe, is one of the many tools used to develop CNNs. RNNs are preferred over CNNs for NLP as they allow sequential processing. For example, an RNN can differentiate between the words 'fan' and 'fan-following'. This means RNNs are better equipped to handle complex dependencies and unbounded texts. Also, unlike CNNs, RNNs can handle input contexts of arbitrary length because of their flexible computational steps. All of the above highlights why RNNs have better modeling potential than CNNs as far as NLP is concerned. Although RNNs are the preferred choice, they have a limitation: the vanishing gradient problem. This problem can be addressed using LSTM (Long Short-Term Memory) networks, which help in understanding the association between words within a text and back-propagate the error across many steps. An LSTM includes a forget gate, which discards learned state when carrying it forward adds little value, so long-term dependencies can be captured more reliably. Other than LSTM, GRUs (Gated Recurrent Units) are also widely adopted to solve the vanishing gradient problem. A small sketch of the word-embedding step follows.
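As an illustration of the word-embedding step described above, here is a minimal sketch of training a skip-gram word2vec model with negative sampling using the gensim library. The toy corpus is made up for the example, and parameter names may differ slightly between gensim releases (`vector_size` is the gensim 4.x name; older versions call it `size`).

```python
from gensim.models import Word2Vec

# Toy corpus: in practice this would be a large collection of tokenized sentences
sentences = [
    ["deep", "learning", "transforms", "natural", "language", "processing"],
    ["word", "embeddings", "map", "words", "to", "dense", "vectors"],
    ["skip", "gram", "predicts", "context", "words", "from", "a", "center", "word"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the word vectors
    window=3,         # context window size
    min_count=1,      # keep every word in this tiny corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # number of negative samples per positive example
    epochs=50,
)

# Look up the learned vector for a word and its nearest neighbours
vector = model.wv["words"]
print(vector.shape)
print(model.wv.most_similar("words", topn=3))
```

On a realistic corpus, the resulting vectors place semantically related words close together, which is the representation that downstream CNN or RNN models build on.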
Current Implementations

Deep learning is good at identifying patterns within unstructured data, and social media is a major dump of unstructured content - a goldmine for sentiment analysis. Facebook uses DeepText, a deep learning based text understanding engine that can understand the textual content of thousands of posts with near-human accuracy. CRM systems strive to maximize customer lifetime value by understanding what customers want and then taking appropriate measures; TalkIQ uses neural-network based text analysis and deep learning models to extract meaning from the conversations organizations have with their customers, in order to gain deeper insights in real time. Google's Cloud Speech API helps convert audio to text and can recognize audio in 110 languages. Other implementations include automated text summarization for condensing the key ideas of a huge document, speech processing for converting voice requests into search recommendations, and much more. Many other areas that make use of speech and text analytics, such as fraud detection tools, UI/UX, and IoT devices, can perform markedly better by adopting deep learning models.

The future of NLP with Deep Learning

With the advancements in deep learning, machines will be able to understand human communication in a much more comprehensive way. They will be able to extract complex patterns and relationships and decipher the variations and ambiguities in various languages. This will enable some interesting use cases, smarter chatbots being a very important one. Understanding complex, longer customer queries and giving out accurate answers is what we can expect from these chatbots in the near future. Advancements in NLP and deep learning could also lead to expert systems that perform smarter searches, allowing applications to search for content using informal, conversational language. Understanding and interpreting unindexed, unstructured information, which is currently a challenge for NLP, becomes possible as well. The possibilities are definitely there; how NLP evolves by blending itself with the innovations in Artificial Intelligence is all that remains to be seen.
article-image-6-use-cases-machine-learning-healthcare
Sugandha Lahoti
10 Nov 2017
7 min read
Save for later

6 use cases of Machine Learning in Healthcare

While hospitals have sophisticated processes and highly skilled administrators, management can still be an administrative nightmare for already time-starved healthcare professionals. A sprinkle of automation can do wonders here. It could free up a practitioner's invaluable time and, thereby, allow them to focus on tending to critically ill patients and complex medical procedures. At the most basic level, machine learning can mechanize routine tasks such as documentation, billing, and regulatory processes. It can also provide ways and tools to diagnose and treat patients more efficiently. However, these tasks only scratch the surface. Machine learning is here to revolutionize healthcare and allied industries such as pharma and medicine. Below are some ways it is being put to use in these domains.

Helping with disease identification and drug discovery

Healthcare systems generate copious amounts of data and use them for disease prediction. However, the software necessary to generate meaningful insights from this unstructured data is often not in place, so drug and disease discovery end up taking time. Machine learning algorithms can discover signatures of diseases at rapid rates by allowing systems to learn and make predictions based on previously processed data. They can also be used to determine which chemical compounds could work together to aid drug discovery, eliminating the time-consuming process of experimenting with and testing millions of compounds. With faster discovery of diseases, the chances of detecting symptoms earlier and the probability of survival increase, and the range of available treatment options grows. IBM has collaborated with Teva Pharmaceutical to discover new treatment options for respiratory and central nervous system diseases using machine learning techniques such as predictive and visual analytics that run on the IBM Watson Health Cloud. To gain more insights on how IBM Watson is changing the face of healthcare, check this article.

Enabling precision medicine

Precision medicine revolves around healthcare practices specific to a particular patient. This includes analyzing a person's genetic information, health history, environmental exposure, and needs and preferences to guide diagnosis and subsequent treatment. Here, machine learning algorithms are used to sift through vast databases of patient data to identify factors, such as genetic history and predisposition to diseases, that strongly determine treatment success or failure. ML techniques in precision medicine exploit molecular and genomic data to assist doctors in directing therapies to patients and to shed light on disease mechanisms and heterogeneity. They also predict which diseases are likely to occur in the future and suggest methods to avoid them. Cellworks, a life sciences technology company, offers a SaaS-based platform for building precision medicine products. Their platform analyzes the genomic profile of the patient and then provides patient-specific reports for improved diagnosis and treatment.

Assisting radiology and radiotherapy

CT and MRI scans for radiological diagnosis and interpretation are burdensome and laborious, not to mention time-consuming. They involve segmentation (differentiating between healthy and diseased tissue), which, when done manually, has a good probability of resulting in errors and misdiagnosis.
Machine learning algorithms can speed up the segmentation process while also increasing accuracy in radiotherapy planning. ML can provide physicians with information for better diagnostics, which helps in pinpointing tumor locations accurately. It can also predict radiotherapy response to help create a personalized treatment plan. Apart from these, ML algorithms find use in medical image analysis as they learn from examples: classification techniques analyze images and the available clinical information to generate the most likely diagnosis. Deep learning can also be used for detecting lung cancer nodules in early screening CT scans and displaying the results in clinically useful ways. Google's machine learning division, DeepMind, is automating radiotherapy treatment for head and neck cancers using scans from almost 700 diagnosed patients; an ML algorithm compares the reports of symptomatic patients against these previous scans to help physicians develop a suitable treatment process. Arterys, a cloud-based platform, automates cardiac analysis using deep learning.

Providing neurocritical care

A large number of neurological diseases develop gradually or in stages, so the decay of the brain happens over time. Traditional approaches to neurological care, such as peak activation, EEG epileptic spikes, and pronator drift, are not accurate enough to diagnose and classify neurological and psychiatric disorders. This is because they are typically used to assess end results rather than to progressively analyze how a brain disease develops. Moreover, timely, personalized neurological treatment and diagnosis rely heavily on the constant availability of an expert. Machine learning algorithms can advance detection and prediction by learning how the brain progressively develops into these conditions. Deep learning techniques are applied in neuroimaging to detect abstract and complex patterns from single-subject data in order to detect and diagnose brain disorders. Machine learning techniques such as SVM, RBFN, and RF are combined with PDT (pronator drift tests) to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. Machine learning algorithms can also be used for detecting signs of dementia before its onset. The Douglas Mental Health University Institute uses PET scans of patients with mild cognitive impairment to train ML algorithms to spot signs of dementia, and then runs the scans of symptomatic patients through the trained algorithm to predict the likelihood of dementia.

Predicting epidemic outbreaks

Epidemic prediction traditionally relies on manual accounting. This includes self-reports or the aggregation of information from healthcare services, such as reports by health protection agencies like the CDC, NHIS, and the National Immunization Survey. These methods are time-consuming and error-prone, which makes predicting and prioritizing outbreaks challenging. ML algorithms can automatically perform the analysis, improve the calculations, and verify information with minimal human intervention. Machine learning techniques like support vector machines and artificial neural networks can predict the epidemic potential of a disease and provide alerts for disease outbreaks. They do this using data collected from satellites, real-time social media updates, historical information on the web, and other sources. They also use geospatial data such as temperature, weather conditions, wind speed, and other data points to predict the magnitude of impact an epidemic can cause in a particular area and to recommend measures for preventing and containing it early on. A small classifier sketch along these lines follows.
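As a purely illustrative sketch of the kind of model described above, the snippet below trains a support vector machine on synthetic, made-up outbreak features (temperature, rainfall, reported cases). The feature set, data, and labeling rule are assumptions for the example, not a real epidemiological model; only the scikit-learn and NumPy libraries are used.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic features per region-week: [avg_temperature_C, rainfall_mm, reported_cases]
X = rng.normal(loc=[28.0, 120.0, 40.0], scale=[4.0, 60.0, 30.0], size=(500, 3))

# Made-up labeling rule: warm, wet weeks with many reported cases count as outbreaks
y = ((X[:, 0] > 27) & (X[:, 1] > 100) & (X[:, 2] > 50)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scale the features, then fit an RBF-kernel SVM classifier
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

print("Outbreak-risk accuracy on held-out data:", model.score(X_test, y_test))
```

A production system would replace the synthetic matrix with engineered features from the satellite, social media, and surveillance sources mentioned above, but the train-evaluate loop looks the same.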
AIME, a medical startup, has come up with an algorithm to predict the outcome and even the epicenter of epidemics such as dengue fever before they occur.

Better hospital management

Machine learning can transform traditional hospital management systems by reimagining hospitals as digital, patient-centric care centers. This includes automating routine tasks such as billing, admission and clearance, and monitoring patients' vitals. With administrative tasks out of the way, hospital authorities can fully focus on the care and treatment of patients. ML techniques such as computer vision can be used to feed a patient's vital signs directly into the EHR from monitoring devices. Smart tracking devices worn by patients provide their real-time whereabouts. Predictive analysis techniques work on a continuous stream of real-time images and data; this analysis can sense risk and prioritize activities for the benefit of all patients. ML can also automate non-clinical functions, including pharmacy, laundry, and food delivery. The Johns Hopkins Hospital has its own command center that uses predictive analytics for efficient operational flow.

Conclusion

The digital health era focuses on health and wellness rather than disease. The incorporation of machine learning in healthcare provides an improved patient experience and better public health management, and reduces costs by automating manual labor. The next step in this amalgamation is a successful collaboration of clinicians and doctors with machines. This would bring about a futuristic health revolution with improved, precise, and more efficient care and treatment.

article-image-generative-adversarial-networks-gans-next-milestone-deep-learning
Savia Lobo
09 Nov 2017
7 min read
Save for later

Generative Adversarial Networks (GANs): The next milestone In Deep Learning

With the rise in popularity of deep learning as a concept and a paradigm, neural networks are captivating the interest of machine learning enthusiasts and developers alike by being able to replicate the human brain for efficient predictions, image recognition, text recognition, and much more. However, can these neural networks do something more, or are they just limited to predictions? Can they generate new data by learning from a training dataset? Generative Adversarial Networks (GANs) are here to answer these questions.

So, what are GANs all about?

Generative Adversarial Networks follow an unsupervised machine learning approach, unlike traditional neural networks. When a neural network is taught to identify a bird, it is fed a huge number of images that include birds as training data, and each picture is labeled before it is used to train the model. This labeling of data is both costly and time-consuming. So, how can you train your neural networks while giving them less data to train on? GANs are of great help here. They offer a way to train deep learning algorithms by slashing the amount of data required to train the neural network models, with no labeling of the data required. The architecture of a GAN includes a generator model (G), which produces fake images or text, and an adversarial model, also known as the discriminator (D), which distinguishes between real and fake productions by comparing the content produced by the generator with the training data it has. Both are trained separately, each being fed training data and a competitive goal.

Source: Learning Generative Adversarial Networks

GANs in action

GANs were introduced by Ian Goodfellow, an AI researcher at Google Brain. He compares the generator and discriminator models to a counterfeiter and a police officer. "You can think of this being like a competition between counterfeiters and the police," Goodfellow said. "Counterfeiters want to make fake money and have it look real, and the police want to look at any particular bill and determine if it's fake." Both the discriminator and the generator are trained simultaneously to create a powerful GAN architecture.

Let's peek into how a GAN model is trained:

1. Specify the problem statement and the type of manipulation the GAN model is expected to carry out.
2. Collect data based on the problem statement. For instance, for image manipulation, a large number of images need to be collected.
3. The discriminator is fed two kinds of images: one from the training set and one produced by the generator.
4. The discriminator can be termed 'successfully trained' if it returns 1 for a real image and 0 for a fake image.
5. The goal of the generator is to fool the discriminator into returning 1 for each of its generated images.
6. At the beginning of training, the discriminator loss (its ability to differentiate real and fake data) is minimal. As training advances, the generator loss decreases and the discriminator loss increases, which means the generator is now able to generate realistic images.

Real world applications of GANs

The basic application of GANs can be seen in generating photo-realistic images, but there is more to what GANs can do. Some of the areas where GANs are put to major use include:

Image synthesis

Image synthesis is one of the primary use cases of GANs.
Here, multilayer perceptron models are used in both the generator and the discriminator to generate photo-realistic images based on a training dataset of images.

Text-to-image synthesis

Generative Adversarial Networks can also be utilized for text-to-image synthesis, for example generating a photo-realistic image based on a caption. To do this, a dataset of images with their associated captions is given as training data. The dataset is first encoded using a hybrid neural network called a character-level convolutional recurrent neural network, which creates a joint representation of the image and the caption in a multimodal space for both the generator and the discriminator. Both generator and discriminator are then trained on this encoded data.

Image inpainting

Images that have missing parts or too much noise are given as input to the generator, which produces a near-real image. For instance, using the TensorFlow framework, DCGANs (Deep Convolutional GANs) can generate a complete image from a broken one. DCGANs are GANs built from convolutional networks, which helps stabilize training for practical use.

Video generation

Static images can be transformed into short scenes with plausible motion using GANs. These GANs use scene dynamics in order to add motion to static images. The videos generated by these models are not real, but illusions.

Drug discovery

Unlike text and image manipulation, Insilico Medicine uses GANs to build an artificially intelligent drug discovery mechanism. Here, the generator is trained to propose a drug for a disease that was previously incurable, while the task of the discriminator is to determine whether the drug could actually treat the disease.

Challenges in training a GAN

Whenever a competition is laid out, there has to be a distinct winner. In GANs, two models compete against each other, so training them can be difficult. Here are some challenges faced while training GANs:

Fair training: While training both models, care has to be taken that the discriminator does not overpower the generator. If it does, the generator fails to train effectively. On the other hand, if the discriminator is too lenient, it allows any illegitimate content to be generated.

Failure to understand the number and dimensions of objects present in an image: This usually occurs during the initial learning phase. For instance, GANs at times output an image that ends up having more than two eyes, which is not normal in the real world. Sometimes they render a 3D object as if it were 2D, because they cannot differentiate between the two.

Failure to understand holistic structure: GANs struggle to produce globally consistent images. They may generate an image that is completely at odds with how the subject looks in reality, for instance a cat with an elongated body, or a cow standing on its hind legs.

Mode collapse: This occurs when a low-variation dataset is processed by a GAN. The real world includes complex and multimodal distributions, where data may contain several concentrated sub-groups. The problem is that the generator may only yield images based on any one sub-group, resulting in inaccurate output; this is a mode collapse.

To tackle these and other challenges that arise while training GANs, researchers have come up with DCGANs (Deep Convolutional GANs), Wasserstein GANs, and CycleGANs to ensure fair training, enhance accuracy, and reduce training time, while AdaGANs have been proposed specifically to address mode collapse. A minimal sketch of the adversarial training loop described above follows.
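To make the generator-versus-discriminator loop concrete, here is a heavily simplified sketch of GAN training in PyTorch on one-dimensional toy data. The network sizes, data distribution, and hyperparameters are arbitrary choices for illustration, not a recipe from the article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim = 8, 1

# Generator: maps random noise to a fake sample
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
# Discriminator: outputs the probability that a sample is real
D = nn.Sequential(nn.Linear(data_dim, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch(n=64):
    # Toy "real" data: samples from a Gaussian centered at 4.0
    return torch.randn(n, data_dim) * 0.5 + 4.0

for step in range(2000):
    # Train the discriminator: real samples should score 1, generated samples 0
    real = real_batch()
    fake = G(torch.randn(real.size(0), latent_dim)).detach()
    loss_D = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Train the generator: try to make the discriminator output 1 for fakes
    fake = G(torch.randn(64, latent_dim))
    loss_G = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# After training, generated samples should drift towards the real distribution
print(G(torch.randn(5, latent_dim)).detach().squeeze())
```

The same loop structure scales up to image GANs such as DCGAN, where the linear layers are replaced with convolutional and transposed-convolutional blocks.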
Conclusion

Although the adoption of GANs is not as widespread as one might imagine, there is no doubt that they could change the way unsupervised machine learning is used today. It is not too far-fetched to think that their future implementations could find practical applications not just in image or text processing, but also in domains such as cryptography and cybersecurity. Innovation in developing newer GAN models with improved accuracy and shorter training time is the key here, and it is surely something worth keeping an eye on.

article-image-8-myths-rpa-robotic-process-automation
Savia Lobo
08 Nov 2017
9 min read
Save for later

8 Myths about RPA (Robotic Process Automation)

Many say we are on the cusp of a fourth industrial revolution that promises to blur the lines between the real, virtual, and biological worlds. Among many trends, Robotic Process Automation (RPA) is one of the buzzwords surrounding the hype of this revolution. Although poised to be a $6.7 trillion industry by 2025, RPA is shrouded in just as much fear as it is brimming with potential. We have heard time and again how automation can improve productivity, efficiency, and effectiveness while conducting business in transformative ways. We have also heard how automation, and machine-driven automation in particular, can displace humans and thereby lead to a dystopian world. As humans, we make assumptions based on what we see and understand. But sometimes those assumptions become so ingrained that they evolve into myths which many start accepting as facts. Here is a closer look at some of the myths surrounding RPA.

Myth 1: RPA means robots will automate processes

The term robot evokes in our minds a picture of a metal humanoid with stiff joints that speaks in a monotone. RPA does mean robotic process automation, but the robot doing the automation is nothing like the ones we are used to seeing in the movies. These are software robots that perform routine processes within organizations. They are often referred to as virtual workers or a digital workforce, complete with their own identity and credentials. They essentially consist of algorithms programmed by RPA developers with the aim of automating mundane business processes. These processes are repetitive, highly structured, fall within a well-defined workflow, consist of a finite set of tasks or steps, and may often be monotonous and labor intensive.

Let us consider a real-world example: automating the invoice generation process. The RPA system runs through all the emails in the system and downloads the PDF files containing details of the relevant transactions. Then it fills a spreadsheet with the details and maintains all the records therein. Later, it logs on to the enterprise system and generates appropriate invoice reports for each entry in the spreadsheet. Once the invoices are created, the system sends a confirmation mail to the relevant stakeholders. Here, the RPA user only specifies the individual tasks that are to be automated, and the system takes care of the rest of the process. A simplified sketch of such an invoice bot follows. So, yes, while it is true that RPA involves robots automating processes, it is a myth that these robots are physical entities or that they can automate all processes.
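The following is a minimal, illustrative sketch of the invoice workflow described above, reduced to plain Python over local folders and a CSV file. Real RPA platforms such as UiPath or Blue Prism orchestrate the same steps against mailboxes and enterprise systems; every path, field name, and value here is a made-up placeholder.

```python
import csv
from pathlib import Path

INBOX_DIR = Path("inbox_attachments")    # placeholder: PDFs already pulled from email
RECORDS_CSV = Path("transaction_records.csv")
INVOICE_DIR = Path("invoices")

def extract_details(pdf_path: Path) -> dict:
    # Placeholder extraction step: a real bot would parse the PDF contents
    return {"source_file": pdf_path.name, "amount": "100.00", "customer": "ACME"}

def run_invoice_bot():
    INVOICE_DIR.mkdir(exist_ok=True)
    records = [extract_details(p) for p in sorted(INBOX_DIR.glob("*.pdf"))]

    # Step 1: maintain a spreadsheet (CSV) of all extracted transaction details
    with RECORDS_CSV.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["source_file", "amount", "customer"])
        writer.writeheader()
        writer.writerows(records)

    # Step 2: generate one invoice report per record
    for i, rec in enumerate(records, start=1):
        invoice = INVOICE_DIR / f"invoice_{i:04d}.txt"
        invoice.write_text(f"Invoice for {rec['customer']}: {rec['amount']}\n")

    # Step 3: notify stakeholders (stubbed out; a real bot would send an email)
    print(f"Processed {len(records)} transactions; invoices written to {INVOICE_DIR}/")

if __name__ == "__main__":
    run_invoice_bot()
```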
Myth 2: RPA is useful only in industries that rely heavily on software

"Almost anything that a human can do on a PC, the robot can take over without the need for IT department support." - Richard Bell, former Procurement Director at Averda

RPA is software that can be injected into a business process. Traditional industries such as banking and finance, healthcare, and manufacturing, which have significant routine tasks that depend on software for some of their functioning, can benefit from RPA; loan processing and patient data processing are some examples. RPA, however, cannot help with automating the assembly line in a manufacturing unit or with performing regular tests on patients. Even in industries that provide daily essential utilities such as cooking gas, electricity, and telephone services, RPA can be put to use for generating automated bills, invoices, and meter readings.

By adopting RPA, businesses, irrespective of the industry they belong to, can achieve significant cost savings, operational efficiency, and higher productivity. To leverage the benefits of RPA, it is more important that users have a clear understanding of business workflow processes and domain knowledge than of the software development lifecycle. Industry professionals can be easily trained on how to put RPA into practice. The bottom line: RPA is not limited to industries that rely heavily on software to exist, but it is true that RPA can be used only where some form of software is already used to perform tasks manually.

Myth 3: RPA will replace humans in most frontline jobs

Many organizations employ a large workforce in frontline roles to do routine tasks such as data entry, managing processes, customer support, and IT support. But frontline jobs are just as diverse as the people performing them. Take sales reps, for example. They bring in new business through their expert understanding of the company's products and their potential customer base, coupled with the associated soft skills. Currently, they spend significant time on administrative tasks such as developing and finalizing business contracts, updating the CRM database, and making daily status reports. Imagine the spike in productivity if these chores could be taken off the plates of sales reps so they could just focus on cultivating relationships and converting leads. By replacing human effort in mundane tasks within frontline roles, RPA can help employees focus on higher value-yielding tasks. In conclusion, RPA will not replace humans in most frontline jobs. It will, however, replace humans in a few roles that are very rule-based and narrow in scope, such as simple data entry operators or basic invoice processing executives. In most frontline roles, like sales or customer support, RPA is quite likely to change significantly, at least in some ways, how one sees their job responsibilities. Also, the adoption of RPA will generate new job opportunities around the development, maintenance, and sale of RPA-based software.

Myth 4: Only large enterprises can afford to deploy RPA

The cost of implementing and maintaining RPA software and training employees to use it can be quite high. This can make it an unfavorable business proposition for SMBs with fairly simple organizational processes and cross-departmental considerations. On the other hand, large organizations with higher revenue generation capacity, complex business processes, and a large army of workers can deploy an RPA system to automate high-volume tasks quite easily and recover that cost within a few months. It is obvious that large enterprises will benefit from RPA systems due to the economies of scale offered and the faster recovery of investments made. SMBs (small to medium-sized businesses) can also benefit from RPA to automate their business processes, but only if they treat RPA as a strategic investment whose cost will be recovered over a longer period of, say, two to four years.

Myth 5: RPA adoption should be owned and driven by the organization's IT department

The team handling the automation process need not be from the IT department. The main role of the IT department is to provide the necessary resources for the software to function smoothly. An RPA reliability team trained in using RPA tools typically consists of business operations professionals rather than IT professionals.
In simple terms, RPA is not owned by the IT department but by the whole business, and it is driven by the RPA team.

Myth 6: RPA is an AI virtual assistant specialized to do a narrow set of tasks

An RPA bot performs a narrow set of tasks based on the given data and instructions. It is a system of rule-based algorithms that can be used to capture, process, and interpret streams of data, trigger appropriate responses, and communicate with other processes. However, it cannot learn on its own - a key trait of an AI system. Advanced AI concepts such as reinforcement learning and deep learning are yet to be incorporated into robotic process automation systems. Thus, an RPA bot is not an AI virtual assistant like Apple's Siri, for example. That said, it is not impractical to think that in the future these systems will be able to think on their own, decide the best possible way to execute a business process, and learn from their own actions to improve the system.

Myth 7: To use RPA software, one needs basic programming skills

Surprisingly, this is not true. Associates who use an RPA system need not have any programming knowledge. They only need to understand how the software works on the front end and how they can assign tasks to the RPA worker for automation. On the other hand, RPA system developers do require some programming skills, such as knowledge of scripting languages. Today, there are various platforms for developing RPA tools, such as UiPath, Blue Prism, and more, which empower RPA developers to build these systems without hassle, reducing their coding responsibilities even further.

Myth 8: RPA software is fully automated and does not require human supervision

This is a big myth. RPA is often misunderstood as a completely automated system. Humans are indeed required to program the RPA bots, to feed them tasks for automation, and to manage them. The automation factor lies in aggregating and performing various tasks that would otherwise require more than one human to complete. There is also the efficiency factor: RPA systems are fast and almost completely avoid the faults in a system or process that are otherwise caused by human error. Having a digital workforce in place is far more profitable than recruiting a human workforce.

Conclusion

One of the most talked-about areas of technological innovation, RPA is clearly still in its early days and is surrounded by a lot of myths. However, there is little doubt that its adoption will take off rapidly as RPA systems become more scalable, more accurate, and faster to deploy. AI-, cognitive-, and analytics-driven RPA will take it up a notch or two and help businesses improve their processes even more by taking dull, repetitive tasks away from people. Hype can get ahead of reality, as we've seen quite a few times, but RPA is an area definitely worth keeping an eye on despite all the hype.
article-image-newsql-what-hype-about
Amey Varangaonkar
06 Nov 2017
6 min read
Save for later

NewSQL: What the hype is all about

First, there was data. Data became databases. Then came SQL. Next came NoSQL. And now comes NewSQL.

NewSQL Origins

For decades, the relational database, or SQL, was the reigning data management standard in enterprises all over the world. With the advent of Big Data and cloud-based storage rose the need for a faster, more flexible, and scalable data management system, one that didn't necessarily comply with SQL's ACID guarantees. This was popularly dubbed NoSQL, and databases like MongoDB, Neo4j, and others gained prominence in no time. We can attribute the emergence and eventual adoption of NoSQL databases to a couple of very important factors. The high costs and lack of flexibility of traditional relational databases drove many SQL users away. Also, NoSQL databases are mostly open source, and their enterprise versions are comparatively cheaper too. They are schema-less, meaning they can be used to manage unstructured data effectively. In addition, they scale well horizontally: you can add more machines to increase computing power and use it to handle high volumes of data. All these features of NoSQL come with an important tradeoff, however - these systems can't simultaneously ensure total consistency.

Of late, there has been a rise in another type of database system that aims to combine the best of both worlds. Popularly dubbed 'NewSQL', it promises to combine the relational data model of SQL with the scalability and speed of NoSQL.

NewSQL - The dark horse in the databases race

NewSQL is 'SQL on steroids', say many. This is mainly because all NewSQL systems start with the relational data model and the SQL query language, but also incorporate the features that led to the rise of NoSQL, addressing the issues of scalability, flexibility, and high performance. They offer the assurance of ACID transactions as in the relational models. However, what makes them really unique is that they allow the horizontal scaling of NoSQL and can process large volumes of data with high performance and reliability. This is why businesses really like the concept of NewSQL: the performance of NoSQL and the reliability and consistency of the SQL model, all packed into one. To understand what the hype surrounding NewSQL is all about, it's worth comparing NewSQL database systems with traditional SQL and NoSQL database systems, and seeing where they stand out:

| Characteristic | Relational (SQL) | NoSQL | NewSQL |
| --- | --- | --- | --- |
| ACID compliance | Yes | No | Yes |
| OLTP/OLAP support | Yes | No | Yes |
| Rigid schema structure | Yes | No | In some cases |
| Support for unstructured data | No | Yes | In some cases |
| Performance with large data | Moderate | Fast | Very fast |
| Performance overhead | Huge | Moderate | Minimal |
| Support from community | Very high | High | Low |

As we can see from the table above, NewSQL really comes through as the best choice when you're dealing with larger datasets and want to lower performance overheads. To give you a practical example, consider an organization that has to work with a large number of short transactions, accesses a limited amount of data, but executes those queries repeatedly. For such an organization, a NewSQL database system would be a perfect fit. These features are leading to the gradual growth of NewSQL systems, though it will take some time for more industries to adopt them.

Not all NewSQL databases are created equal

Today, one has a host of NewSQL solutions to choose from. Some popular solutions are Clustrix, MemSQL, VoltDB, and CockroachDB. A quick sketch of what working with such a system looks like from application code follows.
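Several NewSQL engines, CockroachDB among them, speak the PostgreSQL wire protocol, so a standard driver is often enough to try an ACID transaction against a distributed cluster. The sketch below uses Python's psycopg2 with placeholder connection details and a hypothetical accounts table; it illustrates the programming model under those assumptions, not a benchmark of any particular product.

```python
import psycopg2

# Placeholder connection string: point this at your own PostgreSQL-compatible NewSQL cluster
conn = psycopg2.connect("host=localhost port=26257 dbname=bank user=app password=secret")

try:
    with conn:
        with conn.cursor() as cur:
            # A classic ACID transfer: both updates commit together or not at all,
            # even when the rows live on different nodes of a distributed cluster.
            cur.execute(
                "UPDATE accounts SET balance = balance - %s WHERE id = %s", (100, 1)
            )
            cur.execute(
                "UPDATE accounts SET balance = balance + %s WHERE id = %s", (100, 2)
            )
    # Leaving the `with conn` block commits the transaction; an exception rolls it back.
finally:
    conn.close()
```

The appeal of NewSQL is that this familiar transactional code keeps working as the cluster scales out horizontally, which is exactly the combination the table above summarizes.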
Cloud Spanner, the latest NewSQL offering from Google, became generally available in February 2017, indicating Google's interest in the NewSQL domain and the value a NewSQL database can add to its existing cloud offerings. It is important to understand that there are significant differences among these various NewSQL solutions, so you should choose one carefully, after evaluating your organization's data requirements and problems. As this article on Dataconomy points out, some databases handle transactional workloads well but do not offer the benefit of native clustering - SAP HANA is one such example. NuoDB focuses on cloud deployments, but its overall throughput is found to be rather sub-par. MemSQL is a suitable choice when it comes to clustered analytics but falls short on consistency. Thus, the choice of database depends purely on the task you want to accomplish and the trade-offs you are willing to accept without letting them affect your workflow too much.

DBAs and Programmers in the NewSQL world

Regardless of which database system an enterprise adopts, the role of DBAs will continue to be important going forward. Core database administration and maintenance tasks such as backup, recovery, and replication will still need to be taken care of. The major challenge for NewSQL DBAs will be choosing and then customizing the right database solution to fit organizational requirements. Some capacity planning and overall database administration skills might also have to be recalibrated. Likewise, NewSQL database programmers may find themselves dealing with data manipulation and querying tasks similar to those faced while working with traditional database systems, but they will be doing these tasks at a much larger, or shall we say more 'distributed', scale.

In conclusion

When it comes to solving a particular problem related to data management, it's often said that 80% of the solution comes down to selecting the right tool, and 20% is about understanding the problem at hand. In order to choose the right database system for your organization, you must ask yourself two questions: What is the nature of the data you will work with? And what are you willing to trade off - in other words, how important are factors such as the scalability and performance of the database system? For example, if you primarily work with transactional data and prioritize high performance and high scalability, then NewSQL databases might fit the bill perfectly. If you're going to work with volatile data, NewSQL might help you there as well; however, there are better NoSQL solutions to tackle that data problem. As we have seen earlier, NewSQL databases have been designed to combine the advantages and power of both relational and NoSQL systems. It is important to know that NewSQL databases are not designed to replace either NoSQL or relational SQL models. They are, rather, purpose-built alternatives for data processing that mask the flaws and shortcomings of both relational and non-relational database systems. The ultimate goal of NewSQL is to deliver a high-performance, highly available solution for handling modern data, without compromising on data consistency and high-speed transaction capabilities.

article-image-best-tools-improve-your-development-workflow
Antonio Cucciniello
01 Nov 2017
5 min read
Save for later

The best tools to improve your development workflow

For those web developers out there who are looking for tools that can help them become more productive and efficient, this list is for you. I will go through some of the basic tools you can look at in order to become a better web developer.

Text editors

First off, where do we write our code? That's right, in text editors. You need a text editor that you trust and one that you love. I have used a couple that I will recommend you check out. Currently I am using Atom. It is a text editor that is minimal but can have plenty of features added to it. You may install various plugins that make things easier, or connect it to things like GitHub for your project's source control needs. Another text editor I use is Sublime Text. It is extremely similar to Atom, and actually a little faster to open. The only issue with this editor is that when you are using the free version, it asks you to donate or buy every few times you save a file. This can get annoying, but nonetheless it's still a very powerful editor. The main point here is to find something that you love. Stick with it, and learn its ins and outs. You want to become a pro with it. Knowing your text editor inside and out will greatly increase your productivity.

Source control

High on the list of must-have tools for web development, or even development in general, is a form of source control. You need a place to back up your code and save it in multiple states. It also allows for better and easier collaboration between multiple people working in different branches. I recommend using Git and GitHub. The user interface is very friendly and the two integrate seamlessly. I have also used Subversion and AWS CodeCommit, but these did not leave as great an impression as GitHub did. Source control is very important, so make sure that you have selected it and use it for every project. It also doubles as a place to display your code over time.

Command line interfaces

This is not a specific tool per se, because it is already part of your computer. If you have a Mac or Linux machine, it is the terminal. If you have Windows, it is the command shell. I recommend learning as many of the commands as you can. You can do things like create files, create directories, delete files, edit files, and so much more. It allows you to be so much more productive. You should think of this as your go-to tool for plenty of the things you end up doing. Pro tip for Windows users: I personally do not like the Windows Command Prompt nearly as much as the Unix one. Look into using Cygwin, which lets you use Unix commands in a Windows command prompt.

Debugging

If you are doing anything in web development, you need to be able to debug your code. No one writes perfect code, and you will inevitably spend time fixing bugs. In order to reduce the time spent on bugs, you should look into a tool that can help you with debugging. Personally, if you are using Google Chrome, I suggest using Chrome DevTools. It allows you to set breakpoints, edit code, and manipulate page elements, as well as check all the CSS properties of the different HTML elements on the page. It is extremely powerful and can help you see what is happening on the web page in real time while debugging.

HTTP client

I believe you need something like Postman to test HTTP requests against web services. It makes it extremely easy to test and create APIs. You can make all the different types of requests, pass in whatever headers you want, and even see what the response looks like!
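The same kind of check a tool like Postman performs can also be scripted. Below is a small illustrative snippet using Python's requests library against the echo service httpbin.org as a placeholder endpoint; swap in your own URL, headers, and query parameters.

```python
import requests

# Placeholder endpoint that echoes the request back; replace with your own API
url = "https://httpbin.org/get"
headers = {"Accept": "application/json", "X-Demo-Header": "hello"}

response = requests.get(url, headers=headers, params={"q": "test"}, timeout=10)

print(response.status_code)                      # HTTP status, e.g. 200
print(response.headers.get("Content-Type"))      # response headers
print(response.json())                           # parsed JSON body of the response
```

Postman remains handier for exploring an unfamiliar API interactively, while a script like this is easy to drop into automated tests.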
This is important for any developer who needs to make API requests.

So, there you have it. These are the best tools for web development, in my opinion. I hope this list has helped you get started in improving your web development workflow. Always be on the lookout for ways to improve your toolset as time goes on. You can always improve, so why not let these tools make it easier for you?

About the Author

Antonio Cucciniello is a software engineer from New Jersey with a background in C, C++, and JavaScript (Node.js). His most recent project, Edit Docs, is an Amazon Echo skill that allows users to edit Google Drive files with their voice. He loves building cool things with software and reading books on self-help and improvement, finance, and entrepreneurship. Follow him on Twitter @antocucciniello, and follow him on GitHub here: https://github.com/acucciniello.