Validating email addresses in Delphi

Is an email address valid?



Nowadays our programs commonly store email addresses in databases as part of the data of personnel, customers, providers, etc. When prompting the user for an email address, how do we know whether the entered value is formally correct? In this article I'll show you how to validate email addresses using a variation of RFC 822.



RFC 822 defines the "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES".



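According to this standard, a mailbox can carry a display name in front of the address. Strictly speaking, the RFC 822 grammar requires angle brackets around the address in that case, although the forms without them are also seen in practice: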



  John Doe johndoe@server.com

  John Doe <johndoe@server.com>

  "John Doe" johndoe@server.com

  "John Doe" <johndoe@server.com>



The purpose of my code is not to validate such things, but strictly what is necessary to reach a single recipient (like "johndoe@server.com"), which the specification refers to as an "addr-spec" and which has the form:



  local-part@domain


  • local-part = one "word" or more, separated by periods
  • domain = one "sub-domain" or more, separated by periods
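
As a rough illustration of the addr-spec shape, the following sketch splits an address into its two parts. The name SplitAddrSpec is hypothetical and is not part of the article's code. It assumes that the domain never contains '@' (true for the restricted domain form used later in this article), so it takes the last '@' as the separator, because a quoted local-part may itself contain one.

```pascal
// Hypothetical helper (not from the original article): splits an
// "addr-spec" into its local-part and domain at the LAST '@'.
// Assumption: the domain never contains '@', while a quoted
// local-part may, so the string is scanned from the right.
function SplitAddrSpec(const addr: string;
  out localPart, domain: string): boolean;
var
  i, atPos: integer;
begin
  atPos := 0;
  for i := Length(addr) downto 1 do
    if addr[i] = '@' then begin
      atPos := i;
      break;
    end;
  // Both parts must be non-empty
  Result := (atPos > 1) and (atPos < Length(addr));
  if Result then begin
    localPart := Copy(addr, 1, atPos - 1);
    domain := Copy(addr, atPos + 1, Length(addr) - atPos);
  end else begin
    localPart := '';
    domain := '';
  end;
end;
```

With this sketch, SplitAddrSpec('johndoe@server.com', l, d) returns True and leaves 'johndoe' in l and 'server.com' in d. It only separates the parts; it does not check that either part is well formed.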





A "word" can be an "atom" or a "quoted-string":


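  • atom = one or more chars in the range #33..#126 except ()<>@,;:\/".[] (RFC 822 itself does not list '/' among the special characters; excluding it is part of this variation)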
  • quoted-string = A text enclosed in double quotes that can contain 0 or more characters (#0..#127) except '"' and #13. A backslash ('\') quotes the next character.
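
The "atom" rule above can be sketched as a small standalone check. IsAtom is a hypothetical name that does not appear in the article's code, and the sketch follows the strict #33..#126 range from the definition above rather than the extended range accepted later.

```pascal
// Hypothetical helper (not from the original article): returns True
// if s is an "atom" as defined above, i.e. one or more characters in
// the range #33..#126, none of them a special character.
function IsAtom(const s: string): boolean;
const
  specials = '()<>@,;:\/".[]';
var
  i: integer;
begin
  Result := Length(s) > 0;   // an atom has at least one character
  i := 1;
  while Result and (i <= Length(s)) do begin
    Result := (s[i] >= #33) and (s[i] <= #126) and
              (Pos(s[i], specials) = 0);
    inc(i);
  end;
end;
```

Under these assumptions, IsAtom('johndoe') is True, while IsAtom('john.doe') is False because the period separates two words, and IsAtom('') is False.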





A "sub-domain" can be a "domain-ref" (an "atom") or a "domain-literal":


  • domain-literal = A text enclosed in brackets that can contain 0 or more characters (#0..#127) except '[', ']' and #13. A backslash ('\') quotes the next character.





According to RFC 822, extended characters (#128..#255) cannot be part of an email address. However, many mail servers accept them and people use them, so I'm going to take them into account.



RFC 822 is very permissive about domain names. For a real Internet email address we should probably restrict the domain part. You can read more about domain names in RFC 1034 and RFC 1035.



According to RFC 1034 and RFC 1035, a domain name is formed by "sub-domains" separated by periods. Each sub-domain starts with a letter ('a'..'z', 'A'..'Z'), followed by zero or more letters, digits and hyphens, and cannot end with a hyphen. We are going to require that a valid domain have at least two "sub-domains" (like "host.com").
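
The sub-domain rule just described can be sketched on its own as follows. IsValidSubdomain is a hypothetical name introduced only for illustration; it checks a single sub-domain, not a whole domain name.

```pascal
// Hypothetical helper (not from the original article): checks one
// "sub-domain" against the rule described above: it starts with a
// letter, continues with letters, digits or hyphens, and does not
// end with a hyphen.
function IsValidSubdomain(const s: string): boolean;
var
  i, n: integer;
  c: char;
begin
  n := Length(s);
  // Non-empty, first character is a letter, last one is not a hyphen
  Result := (n > 0) and
    (((s[1] >= 'a') and (s[1] <= 'z')) or
     ((s[1] >= 'A') and (s[1] <= 'Z'))) and
    (s[n] <> '-');
  i := 2;
  while Result and (i <= n) do begin
    c := s[i];
    Result := ((c >= 'a') and (c <= 'z')) or
              ((c >= 'A') and (c <= 'Z')) or
              ((c >= '0') and (c <= '9')) or (c = '-');
    inc(i);
  end;
end;
```

Under this rule 'server' and 'my-host' are accepted, while '3com' (starts with a digit), 'host-' (ends with a hyphen) and the empty string are rejected.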



Now that the rules are clear, let's get to work. The algorithm for the function resembles a state-transition machine. The characters of the string are processed in a loop; for each character we first determine which state the machine is in and then process the character accordingly, to decide whether the machine should stay in that state, switch to a different state, or report an error (breaking the loop). This kind of algorithm is treated extensively in programming textbooks, so let's get right to the code:



function ValidEmail(email: string): boolean;
  // Returns True if the email address is valid
  // Author: Ernesto D'Spirito
  // Note: the set tests below assume single-byte characters; Unicode
  // versions of Delphi warn about WideChar in set expressions.
  const
    // Valid characters in an "atom" (extended characters #128..#255
    // are deliberately accepted)
    atom_chars = [#33..#255] - ['(', ')', '<', '>', '@', ',', ';', ':',
                                '\', '/', '"', '.', '[', ']', #127];
    // Valid characters in a "quoted-string"
    quoted_string_chars = [#0..#255] - ['"', #13, '\'];
    // Valid characters in a subdomain
    letters = ['A'..'Z', 'a'..'z'];
    letters_digits = ['0'..'9', 'A'..'Z', 'a'..'z'];
  type
    States = (STATE_BEGIN, STATE_ATOM, STATE_QTEXT, STATE_QCHAR,
      STATE_QUOTE, STATE_LOCAL_PERIOD, STATE_EXPECTING_SUBDOMAIN,
      STATE_SUBDOMAIN, STATE_HYPHEN);
  var
    State: States;
    i, n, subdomains: integer;
    c: char;
  begin
    State := STATE_BEGIN;
    n := Length(email);
    i := 1;
    subdomains := 1;
    while (i <= n) do begin
      c := email[i];
      case State of
      // Start of the local-part: expect an atom or a quoted-string
      STATE_BEGIN:
        if c in atom_chars then
          State := STATE_ATOM
        else if c = '"' then
          State := STATE_QTEXT
        else
          break;
      // Inside an atom of the local-part
      STATE_ATOM:
        if c = '@' then
          State := STATE_EXPECTING_SUBDOMAIN
        else if c = '.' then
          State := STATE_LOCAL_PERIOD
        else if not (c in atom_chars) then
          break;
      // Inside a quoted-string
      STATE_QTEXT:
        if c = '\' then
          State := STATE_QCHAR
        else if c = '"' then
          State := STATE_QUOTE
        else if not (c in quoted_string_chars) then
          break;
      // Character quoted by a backslash: accept anything
      STATE_QCHAR:
        State := STATE_QTEXT;
      // Just after the closing quote of a quoted-string
      STATE_QUOTE:
        if c = '@' then
          State := STATE_EXPECTING_SUBDOMAIN
        else if c = '.' then
          State := STATE_LOCAL_PERIOD
        else
          break;
      // Just after a period in the local-part: expect another word
      STATE_LOCAL_PERIOD:
        if c in atom_chars then
          State := STATE_ATOM
        else if c = '"' then
          State := STATE_QTEXT
        else
          break;
      // Start of a subdomain: it must begin with a letter
      STATE_EXPECTING_SUBDOMAIN:
        if c in letters then
          State := STATE_SUBDOMAIN
        else
          break;
      // Inside a subdomain, after a letter or a digit
      STATE_SUBDOMAIN:
        if c = '.' then begin
          inc(subdomains);
          State := STATE_EXPECTING_SUBDOMAIN
        end else if c = '-' then
          State := STATE_HYPHEN
        else if not (c in letters_digits) then
          break;
      // After a hyphen: a letter or digit must follow eventually
      STATE_HYPHEN:
        if c in letters_digits then
          State := STATE_SUBDOMAIN
        else if c <> '-' then
          break;
      end;
      inc(i);
    end;
    // The loop was broken before the end of the string: invalid character
    if i <= n then
      Result := False
    else
      // Must end inside a subdomain and have at least two subdomains
      Result := (State = STATE_SUBDOMAIN) and (subdomains >= 2);
  end;
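
To try the function against a few sample addresses, a small check like the following could be used. This is only a sketch: TEmailValidator and CountMismatches are hypothetical names that are not part of the original code, and the expected results were obtained by tracing the state machine above by hand.

```pascal
type
  // Hypothetical function type matching the signature of ValidEmail
  TEmailValidator = function(email: string): boolean;

// Hypothetical helper (not from the original article): returns how
// many sample addresses Validate classifies differently than expected.
function CountMismatches(Validate: TEmailValidator): integer;
const
  Samples: array[1..5] of string = (
    'johndoe@server.com',       // plain atom and two subdomains
    '"John Doe"@server.com',    // quoted-string as local-part
    'johndoe@server',           // only one subdomain
    'john..doe@server.com',     // empty word between two periods
    'johndoe@server-.com');     // subdomain ends with a hyphen
  Expected: array[1..5] of boolean = (True, True, False, False, False);
var
  i: integer;
begin
  Result := 0;
  for i := Low(Samples) to High(Samples) do
    if Validate(Samples[i]) <> Expected[i] then
      inc(Result);
end;
```

In Delphi, calling CountMismatches(ValidEmail) should return 0 if the function behaves as traced. Other compilers or modes may require the @ operator to pass the function.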


Any contribution to improve this function is welcome.



You can find the full source code of this article in the archive that accompanies the Pascal Newsletter #22.

 
