
Monday, August 25, 2008

PHP Email Address validation through SMTP

Here is a PHP class, written for PHP4 and PHP5, that validates email addresses by querying the SMTP (Simple Mail Transfer Protocol) server responsible for each address. It is meant to complement syntax validation of the email address: check the syntax first, and only then validate via SMTP, which is more time- and resource-consuming.
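A minimal sketch of that first, cheap stage, assuming PHP 5.2 or later for `filter_var()`. The helper name `filter_syntax_valid()` is hypothetical: it drops syntactically invalid addresses so only the survivors are handed to the slower SMTP check. `FILTER_VALIDATE_EMAIL` implements PHP's own approximation of the address grammar, so treat it as a reasonable pre-filter rather than a definitive parser.

```php
<?php
/**
 * Hypothetical helper: keep only addresses that pass a syntax check,
 * so the expensive SMTP validation runs on plausible addresses only.
 */
function filter_syntax_valid($emails) {
    $valid = array();
    foreach ($emails as $email) {
        // filter_var() returns the address on success and false on failure
        if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
            $valid[] = $email;
        }
    }
    return $valid;
}

$candidates = array('user@example.com', 'not-an-email', 'user@@example.com');
$plausible = filter_syntax_valid($candidates);
// $plausible could now be passed to SMTP_validateEmail::validate()
print_r($plausible);
```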

Update: Sept 8, 2008

The class has been updated to work with Windows MTAs such as Hotmail, and many other fixes have been made. See changes. The class will no longer get you blacklisted by Hotmail due to an improper HELO procedure.

Update: Sept 10, 2008

Windows support added through Net_DNS (the PEAR DNS class). Added support for validating multiple emails on the same domain through a single socket. Improved the email parsing to support literal @ signs.

Update: Sept 29, 2008

The code for this project has been moved to Google Code. The latest source can be grabbed from SVN.

Update: Nov 22, 2008

The SMTP Email Validation Class has been added to the Yii PHP Framework (http://www.yiiframework.com/). Yii is a high-performance, component-based PHP framework for developing large-scale Web applications.

<?php
 
 /**
 * Validate Email Addresses Via SMTP
 * This queries the SMTP server to see if the email address is accepted.
 * @copyright http://creativecommons.org/licenses/by/2.0/ - Please keep this comment intact
 * @author [email protected]
 * @contributers [email protected]
 * @version 0.1a
 */
class SMTP_validateEmail {

 /**
  * PHP Socket resource to remote MTA
  * @var resource $sock 
  */
 var $sock;

 /**
  * Current User being validated
  */
 var $user;
 /**
  * Current domain where user is being validated
  */
 var $domain;
 /**
  * List of domains to validate users on (domain => array of users)
  */
 var $domains = array();
 /**
  * SMTP Port
  */
 var $port = 25;
 /**
  * Maximum Connection Time to an MTA 
  */
 var $max_conn_time = 30;
 /**
  * Maximum time to read from socket
  */
 var $max_read_time = 5;
 
 /**
  * username of sender
  */
 var $from_user = 'user';
 /**
  * Host Name of sender
  */
 var $from_domain = 'localhost';
 
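 /**
  * Nameservers to use when making DNS queries for MX entries.
  * Only used by the Net_DNS fallback (e.g. on Windows); replace the
  * example address below with your own DNS server(s).
  * @var Array $nameservers 
  */
 var $nameservers = array(
  '192.168.0.1'
 );
 
 /**
  * Set to true to echo the SMTP conversation for debugging
  */
 var $debug = false;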

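 /**
  * Initializes the Class (PHP4-style constructor)
  * @return SMTP_validateEmail Instance
  * @param $emails Array[optional] List of Emails to Validate
  * @param $sender String[optional] Email of validator
  */
 function SMTP_validateEmail($emails = false, $sender = false) {
  if ($emails) {
   $this->setEmails($emails);
  }
  if ($sender) {
   $this->setSenderEmail($sender);
  }
 }

 /**
  * PHP5+ constructor. PHP 8 no longer treats the PHP4-style method
  * above as a constructor, so delegate to it explicitly.
  */
 function __construct($emails = false, $sender = false) {
  $this->SMTP_validateEmail($emails, $sender);
 }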
 
 /**
  * Split an email into user and domain at the LAST @ sign, so that a
  * quoted local part containing a literal @ stays intact
  * @return Array list($user, $domain)
  */
 function _parseEmail($email) {
  $parts = explode('@', $email);
  $domain = array_pop($parts);
  $user = implode('@', $parts);
  return array($user, $domain);
 }
 
 /**
  * Set the Emails to validate, grouped by domain
  * @param $emails Array List of Emails
  */
 function setEmails($emails) {
  foreach($emails as $email) {
   list($user, $domain) = $this->_parseEmail($email);
   if (!isset($this->domains[$domain])) {
    $this->domains[$domain] = array();
   }
   $this->domains[$domain][] = $user;
  }
 }
 
 /**
  * Set the Email of the sender/validator
  * @param $email String
  */
 function setSenderEmail($email) {
  $parts = $this->_parseEmail($email);
  $this->from_user = $parts[0];
  $this->from_domain = $parts[1];
 }
 
 /**
 * Validate Email Addresses
 * @param Array $emails Emails to validate (recipient emails)
 * @param String $sender Sender's Email
 * @return Array Associative List of Emails and their validation results
 */
 function validate($emails = false, $sender = false) {

  $results = array();

  if ($emails) {
   $this->setEmails($emails);
  }
  if ($sender) {
   $this->setSenderEmail($sender);
  }

  // query the MTAs on each Domain
  foreach($this->domains as $domain=>$users) {

   $mxs = array();

   // retrieve SMTP Server via MX query on domain
   list($hosts, $mxweights) = $this->queryMX($domain);

   // retrieve MX priorities (lowest value = most preferred)
   for($n=0; $n < count($hosts); $n++){
    $mxs[$hosts[$n]] = $mxweights[$n];
   }
   asort($mxs);
   $try_hosts = array_keys($mxs);

   // last fallback is the original domain
   if (!in_array($domain, $try_hosts)) {
    $try_hosts[] = $domain;
   }

   $this->debug(print_r($try_hosts, 1));

   // split the connection time budget across the hosts (never empty)
   $timeout = $this->max_conn_time/count($try_hosts);

   // try each host until one accepts a TCP connection
   $this->sock = false;
   foreach($try_hosts as $host) {
    // connect to SMTP server
    $this->debug("try $host:$this->port\n");
    if ($this->sock = @fsockopen($host, $this->port, $errno, $errstr, (float) $timeout)) {
     stream_set_timeout($this->sock, $this->max_read_time);
     break;
    }
   }

   // did we get a TCP socket?
   if (!$this->sock) {
    // no MTA reachable, so these users cannot be confirmed
    foreach($users as $user) {
     $results[$user.'@'.$domain] = false;
    }
    continue;
   }

   $reply = fread($this->sock, 2082);
   $this->debug("<<<\n$reply");

   preg_match('/^([0-9]{3}) /ims', $reply, $matches);
   $code = isset($matches[1]) ? $matches[1] : '';

   if($code != '220') {
    // MTA gave an error...
    foreach($users as $user) {
     $results[$user.'@'.$domain] = false;
    }
    fclose($this->sock);
    continue;
   }

   // say helo
   $this->send("HELO ".$this->from_domain);
   // tell of sender
   $this->send("MAIL FROM: <".$this->from_user.'@'.$this->from_domain.">");

   // ask for each recipient on this domain
   foreach($users as $user) {

    // ask of recipient
    $reply = $this->send("RCPT TO: <".$user.'@'.$domain.">");

    // get code and msg from response
    preg_match('/^([0-9]{3}) /ims', $reply, $matches);
    $code = isset($matches[1]) ? $matches[1] : '';

    if ($code == '250') {
     // you received 250 so the email address was accepted
     $results[$user.'@'.$domain] = true;
    } elseif ($code == '451' || $code == '452') {
     // you received 451/452 so the email address was greylisted (or some temporary error occurred on the MTA) - so assume it is ok
     $results[$user.'@'.$domain] = true;
    } else {
     $results[$user.'@'.$domain] = false;
    }
   }

   // quit without ever sending DATA, so no mail is delivered
   $this->send("QUIT");
   // close socket
   fclose($this->sock);
  }
  return $results;
 }


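 /**
  * Send an SMTP command and return the server's raw reply
  * @param $msg String SMTP command without the trailing CRLF
  * @return String reply (up to 2082 bytes)
  */
 function send($msg) {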
  fwrite($this->sock, $msg."\r\n");

  $reply = fread($this->sock, 2082);

  $this->debug(">>>\n$msg\n");
  $this->debug("<<<\n$reply");
  
  return $reply;
 }
 
 /**
  * Query DNS server for MX entries.
  * Uses getmxrr() when available, otherwise falls back to PEAR Net_DNS.
  * @param $domain String domain to look up
  * @return Array list($hosts, $mxweights)
  */
 function queryMX($domain) {
  $hosts = array();
  $mxweights = array();
  if (function_exists('getmxrr')) {
   getmxrr($domain, $hosts, $mxweights);
  } else {
   // windows, we need Net_DNS
   require_once 'Net/DNS.php';

   $resolver = new Net_DNS_Resolver();
   $resolver->debug = $this->debug;
   // nameservers to query
   $resolver->nameservers = $this->nameservers;
   $resp = $resolver->query($domain, 'MX');
   if ($resp) {
    foreach($resp->answer as $answer) {
     $hosts[] = $answer->exchange;
     $mxweights[] = $answer->preference;
    }
   }
  }
  return array($hosts, $mxweights);
 }
 
 /**
  * Simple function to replicate PHP 5 behaviour. http://php.net/microtime
  */
 function microtime_float() {
  list($usec, $sec) = explode(" ", microtime());
  return ((float)$usec + (float)$sec);
 }

 function debug($str) {
  if ($this->debug) {
   echo htmlentities($str);
  }
 }

}

 
?>

Using the PHP SMTP Email Address Validation Class

Example Usage:

// the email to validate
$email = '[email protected]';
// an optional sender
$sender = '[email protected]';
// instantiate the class
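$SMTP_Valid = new SMTP_validateEmail();
// do the validation (validate() expects an array of emails)
$results = $SMTP_Valid->validate(array($email), $sender);
// view results (an array keyed by email address)
var_dump($results);
$result = !empty($results[$email]);
echo $email.' is '.($result ? 'valid' : 'invalid')."\n";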

// send email? 
if ($result) {
  //mail(...);
}

Code Status

This is a very basic, alpha version of this PHP class; I wrote it mainly to demonstrate the technique, and it has a few limitations. First, it is not optimized. The original version made a new MX DNS query and opened a new TCP connection to the SMTP server for every email it verified; neither the DNS result nor the TCP socket was reused for the next query, even when it went to the same domain or the same SMTP server. Since the Sept 10 update, emails on the same domain share one MX query and one socket, but nothing is cached across different domains or across calls to validate().
Second, the original version only worked on Linux, because PHP on Windows lacked the getmxrr() DNS function it needs. Since the Sept 10 update, the class falls back to the PEAR Net_DNS library when getmxrr() is not available.
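One way to cache MX lookups across domains and calls is to memoize them per domain for the lifetime of the process. This is a minimal sketch, assuming PHP 5.3+ for closures; `MxCache` and its resolver callable are hypothetical names, and in practice the resolver would wrap `getmxrr()` or the class's `queryMX()` method (e.g. `array($validator, 'queryMX')`). A real cache would also want to honour DNS TTLs, which this sketch ignores.

```php
<?php
/**
 * Hypothetical memoizing wrapper around an MX resolver.
 * The resolver is any callable that takes a domain and returns
 * array($hosts, $weights), like SMTP_validateEmail::queryMX().
 */
class MxCache {
    private $resolver;
    private $cache = array();

    public function __construct($resolver) {
        $this->resolver = $resolver;
    }

    public function lookup($domain) {
        $key = strtolower($domain); // domain names are case-insensitive
        if (!array_key_exists($key, $this->cache)) {
            $this->cache[$key] = call_user_func($this->resolver, $key);
        }
        return $this->cache[$key];
    }
}

// Usage with a stub resolver that counts how often DNS would be queried
$calls = 0;
$stub = function ($domain) use (&$calls) {
    $calls++;
    return array(array('mx.' . $domain), array(10));
};
$mx = new MxCache($stub);
$mx->lookup('example.com');
$mx->lookup('EXAMPLE.com'); // served from the cache
echo $calls, "\n";
```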

Limitations of verifying via SMTP

Not all SMTP servers are configured to tell you that an email address does not exist on the server. If the SMTP server responds with "OK", it does not mean that the email address exists; it only means that the server will accept mail for that address and not bounce it at that point. What it then does with the actual email varies: it may deliver it to the recipient, or it may just send it to a black hole.
If the SMTP server rejects the address, however, you can be fairly sure your email will bounce if you actually send it.
You should also NOT use this class to try to guess email addresses for spamming purposes. You will quickly get blacklisted on Spamhaus or a similar list.
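The distinction above maps onto the first digit of the SMTP reply code: 2xx means accepted, 4xx a temporary failure (such as greylisting), 5xx a permanent rejection. A minimal sketch of that classification; the function name `classify_smtp_reply()` is hypothetical. It reads the code from the last line because, per RFC 5321, multi-line replies use `250-` on continuation lines and `250 ` (or a bare code) on the final one.

```php
<?php
/**
 * Hypothetical helper: classify an SMTP reply, e.g. to RCPT TO.
 * Returns 'accepted', 'temporary', 'rejected' or 'unknown'.
 */
function classify_smtp_reply($reply) {
    $lines = preg_split('/\r\n|\n/', trim($reply));
    $last = end($lines); // the final line carries "NNN " (or a bare code)
    if (!preg_match('/^([2-5])[0-9]{2}(?: |$)/', $last, $m)) {
        return 'unknown';
    }
    switch ($m[1]) {
        case '2': return 'accepted';   // e.g. 250: recipient accepted
        case '4': return 'temporary';  // e.g. 451/452: greylisting or temporary failure
        case '5': return 'rejected';   // e.g. 550: mailbox unknown or refused
        default:  return 'unknown';    // 3xx is not expected after RCPT TO
    }
}

echo classify_smtp_reply("550 5.1.1 User unknown\r\n"), "\n";
```

Like the class above, a caller might choose to treat 'temporary' as acceptable rather than as a rejection.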

Good uses of verifying via SMTP

If you have forms, such as registration forms, where users enter their email addresses, it may be a good idea to first check the syntax of each address to see if it is valid according to the SMTP protocol specifications. Then, if it is valid, you may want to verify that the email will be accepted (will not bounce). This lets you notify users of a problem with their email address, in case they made a typo or knowingly entered an invalid address, which could increase the number of successful registrations.

How it works

If you're interested in how it works, it is quite simple. The class first takes an email and splits it into the user and host (domain) portions. The host portion tells us which domain to send the email to. However, a domain's mail may be handled by an SMTP server on a different host, so we retrieve the list of SMTP servers for the domain with a DNS query of type MX. We iterate through the returned SMTP servers in order of MX preference, falling back to the domain itself, trying to make a connection. Once connected, we send SMTP commands to the server: first "HELO", then the sender ("MAIL FROM"), then each recipient ("RCPT TO"). If a recipient is rejected, we know that actually sending an email to it would fail. No message data is ever sent: we finish with "QUIT" and close the TCP connection to the SMTP server.
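The MX step described above boils down to ordering the returned hosts by preference (lowest value first) and keeping the domain itself as a last resort. A minimal sketch, with the hypothetical name `order_mx_hosts()`; note that RFC 5321 only specifies using the domain itself when it has no MX records at all, whereas the class always appends it as a fallback.

```php
<?php
/**
 * Hypothetical helper: order MX hosts by preference and append the
 * domain itself as the final fallback, mirroring the class above.
 */
function order_mx_hosts($hosts, $weights, $domain) {
    $mxs = array();
    foreach ($hosts as $i => $host) {
        $mxs[$host] = $weights[$i];
    }
    asort($mxs);                 // sort by preference value, keeping host keys
    $ordered = array_keys($mxs);
    if (!in_array($domain, $ordered, true)) {
        $ordered[] = $domain;    // fallback: connect to the domain directly
    }
    return $ordered;
}

// getmxrr() may return hosts in any order
$ordered = order_mx_hosts(
    array('mx2.example.com', 'mx1.example.com'),
    array(20, 10),
    'example.com'
);
print_r($ordered);
```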

Sunday, January 20, 2008

Cleaning xHTML markup with PHP Tidy

Everyone makes mistakes. Even the best xHTML coders will sometimes write invalid xHTML. Not to worry, PHP can automatically clean up xHTML before display using the PHP Tidy Extension.

PHP Tidy uses the Tidy parser. Tidy has been ported to many programming languages and lets them clean up HTML and XML documents. It works well for xHTML.

In PHP5, the tidy extension is bundled with PHP (though it may still need to be enabled); in PHP4, you will need to download the Tidy PHP4 extension and compile the PHP executable with Tidy support.

How to use Tidy in PHP is documented here. Here are some examples of what Tidy can do.

Example use of Tidy in PHP

For code portability and distribution, it's necessary to first check whether the tidy extension is available in your PHP installation. You can do this by checking for the existence of the tidy functions or classes (among other methods). So first, check that Tidy support is available:

if (function_exists('tidy_parse_string')) {
// do your tidy stuff
}
Then comes the tidying. For simplicity, I'll use the single PHP Tidy function, 'tidy_repair_string'.

// $html holds the (X)HTML markup to clean up
// Specify configuration
$config = array(
 'indent'         => true,
 'output-xhtml'   => true,
 'wrap'           => 200);
// Specify encoding
$encoding = 'utf8';
// repair HTML
$html = tidy_repair_string($html, $config, $encoding);

This works for both PHP4 and PHP5. PHP5 also supports an OO syntax.
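For comparison, here is a minimal sketch of the same repair using PHP5's object-oriented `tidy` class (`parseString()` followed by `cleanRepair()`). The wrapper name `tidy_clean_oo()` is hypothetical, and it returns the markup unchanged when the extension is not loaded, in the same spirit as the availability check above.

```php
<?php
/**
 * Hypothetical wrapper around PHP5's OO Tidy API; degrades gracefully
 * when the tidy extension is not loaded.
 */
function tidy_clean_oo($html) {
    if (!class_exists('tidy')) {
        return $html; // no Tidy available: return the markup unchanged
    }
    $config = array(
        'indent'       => true,
        'output-xhtml' => true,
        'wrap'         => 200);
    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();
    return (string) $tidy; // a tidy object converts to its repaired markup
}

echo tidy_clean_oo('<p>Unclosed paragraph');
```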

Example Implementation: PHP Tidy Plugin for Joomla

Here is how I implemented the PHP Tidy Plugin into Joomla.

Joomla is a Content Management System, thus you cannot directly control the xHTML that will go into your articles. Some of your users may not be very xHTML savvy. The main reason I implemented Tidy is to clean content inserted automatically from feeds - which you have absolutely no control over.

A Joomla Plugin implements a basic Observer Pattern into Joomla. Functions are registered as observers, which are triggered during certain events. One such event is the preparation of content for display. The tidy plugin thus registers as a handler of content preparation. It then passes all content through the tidy parser, and returns the clean xHTML to Joomla.
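To make the Observer idea concrete, here is a minimal, hypothetical dispatcher in the spirit of `$_MAMBOTS->registerFunction()`. It is not Joomla's actual implementation; the `EventDispatcher` class and its `trigger()` method are illustrative names, and PHP 5.3+ is assumed for the closure in the usage example.

```php
<?php
/**
 * Hypothetical, minimal event dispatcher illustrating the Observer
 * pattern that Joomla/Mambo plugins rely on. Not Joomla's real code.
 */
class EventDispatcher {
    private $observers = array();

    // Register a function name (or any callable) as an observer of $event
    public function registerFunction($event, $callback) {
        $this->observers[$event][] = $callback;
    }

    // Notify every observer of $event, passing the same arguments to each
    public function trigger($event, $args = array()) {
        $results = array();
        if (isset($this->observers[$event])) {
            foreach ($this->observers[$event] as $callback) {
                $results[] = call_user_func_array($callback, $args);
            }
        }
        return $results;
    }
}

// An observer that rewrites content, much like bot_tidy() does with Tidy
$dispatcher = new EventDispatcher();
$dispatcher->registerFunction('onPrepareContent', function ($published, $row) {
    if ($published) {
        $row->text = strtoupper($row->text); // stand-in for the Tidy cleanup
    }
    return true;
});

$row = new stdClass();
$row->text = 'hello';
// objects are passed by handle, so the observer's change to $row->text sticks
$dispatcher->trigger('onPrepareContent', array(true, $row));
echo $row->text, "\n";
```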

The Joomla Tidy Plugin Code


<?php
/**
* @copyright Copyright (C) 2007 Fiji Web Design. All rights reserved.
* @license http://www.gnu.org/copyleft/gpl.html GNU/GPL
* @author [email protected]
*/

// no direct access
defined( '_VALID_MOS' ) or die( 'Restricted access' );

// register content event handlers
$_MAMBOTS->registerFunction( 'onPrepareContent', 'bot_tidy' );

/**
*  Tidy up the xHTML of your content
*/
function bot_tidy( $published, &$row, &$params, $page=0 ) {
 
 if ($published) {
  // get the plugin parameters
  //$botParams = bot_tidy_getParams('bot_tidy');

  if (isset($row->text) && $row->text) {
   $row->text = bot_tidy_parse($row->text);
  }

 }
 return true;
}

/**
* Parses a string with tidy taking into consideration the Joomla encoding
* @param String xHTML
* @return String tidied xHTML (or the original markup if Tidy is unavailable or fails)
*/
function bot_tidy_parse($html) {
 if (function_exists('tidy_repair_string')) {

  // Specify configuration
  $config = array(
       'indent'         => true,
       'output-xhtml'   => true,
       'wrap'           => 200);
  // get Joomla content encoding, e.g. _ISO = 'charset=utf-8' -> 'utf8'
  // (explode() replaces split(), which was removed in PHP 7)
  $iso = explode( '=', _ISO );
  $encoding = '';
  if (isset($iso[1])) {
   $jos_enc = strtolower(str_replace('-', '', $iso[1]));
   if (in_array($jos_enc, array('ascii', 'latin0', 'latin1', 'raw', 'utf8', 'iso2022', 'mac', 'win1252', 'ibm858', 'utf16', 'utf16le', 'utf16be', 'big5', 'shiftjis'))) {
    $encoding = $jos_enc;
   }
  }

  // Tidy; keep the original markup if repair fails (returns false)
  $tidied = tidy_repair_string($html, $config, $encoding);
  if ($tidied !== false) {
   $html = $tidied;
  }
 }
 return $html."\r\n";
}

Here is the tidy plugin for Joomla.

Tidy is great for Content Management Systems where content is contributed by users with differing levels of xHTML knowledge. It is also necessary if you want content from RSS feeds to pass W3C validation (when the feeds contain xHTML, like the Google News feeds). I've noticed, however, that PHP Tidy does not always create valid xHTML content, although it does create well-formed XML every time. This is yet to be explored further, as I have just released the Joomla Tidy Plugin for alpha testing.
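To check the observation that Tidy's output always parses as XML, a minimal sketch using the DOM extension; the helper name `is_well_formed_xml()` is hypothetical. It only tests well-formedness; it does not validate against the XHTML DTD, which is the stricter requirement that Tidy sometimes misses.

```php
<?php
/**
 * Hypothetical helper: is $markup well-formed XML?
 * Checks well-formedness only, not validity against a DTD.
 */
function is_well_formed_xml($markup) {
    if ($markup === '') {
        return false; // loadXML() rejects empty input (ValueError in PHP 8)
    }
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return $ok !== false;
}

// e.g. is_well_formed_xml(tidy_repair_string($html, $config, 'utf8'))
var_dump(is_well_formed_xml('<p>closed</p>'));
```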