How To Extract HTML Meta Data with Perl

Perl is a general-purpose programming language that many programmers choose because it is easy to use. Experienced programmers who are new to Perl usually find it easy to learn, because its syntax borrows from C, sed, awk and the Unix shell. Larry Wall created Perl in 1987 as a text-processing language, and later versions have grown far beyond that original purpose. Often described as the "Practical Extraction and Report Language", Perl is now used in network programming, video games, web development and much more.

Metadata is data that describes other data, such as its name, type or size, and it helps readers and programs understand what kind of content they are looking at. In a web page, metadata lives in the <title> element and in <meta> tags inside the <head> section, where it describes what the page is about without being displayed on the page itself. Search engine crawlers read this information when they index a page, and some of it, such as the description, may be shown in search results. With a short Perl program you can download a page, extract its HTML metadata and print a report of the values it contains.

Extract HTML Metadata with HTML::HeadParser

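Open your Perl editor and start a script that extracts the metadata found between the <head> and </head> HTML tags. The strict and warnings pragmas catch common mistakes such as misspelled variable names. If the script runs as a CGI program on a web server, it must print an HTTP header before any other output. The report is plain text, so text/plain keeps its line breaks in the browser; you can leave the header out when running the script from the command line:

use strict;
use warnings;

print "Content-type: text/plain\n\n";

The script uses two modules. LWP::Simple downloads the web page you want to extract the metadata from, and HTML::HeadParser parses that page's <head> section:

use LWP::Simple;
use HTML::HeadParser;

The get function from LWP::Simple fetches the page at the address you give it and returns its HTML as a string, or undef if the request fails. Put the page's address between the quotes. The HTML is then passed to an HTML::HeadParser object, which reads the <head> section and stops when the body begins:

my $html = get("") or die "Could not download the page\n";

my $head = HTML::HeadParser->new;
$head->parse($html);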
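To try the parser without a network connection, you can give it HTML from a string instead of a downloaded page. This is a minimal sketch: the sample page is made up for illustration, and the %meta hash is just a convenient place to collect the values. It also shows which header name each tag in the <head> section ends up under.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# A made-up page used only for illustration; no download is needed.
my $html = <<'HTML';
<html>
<head>
<title>Example Widgets</title>
<meta name="description" content="Handmade widgets and repair tips">
<meta name="keywords" content="widgets, widget repair, handmade">
<meta http-equiv="Content-Language" content="en">
</head>
<body><p>The parser stops before reaching this text.</p></body>
</html>
HTML

my $head = HTML::HeadParser->new;
$head->parse($html);

# Collect the fields in a hash; a field the page lacks is stored as undef.
my %meta = map { $_ => scalar $head->header($_) }
    qw(Title X-Meta-Description X-Meta-Keywords Content-Language);

for my $field (sort keys %meta) {
    printf "%-20s %s\n", $field, $meta{$field} // '(not set)';
}
```

Replacing the heredoc with the string returned by get turns this back into the downloading version.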


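The header method returns the values the parser collected. The page title is stored as Title, each <meta name="..."> tag is stored with an X-Meta- prefix (so name="description" becomes X-Meta-Description), and each <meta http-equiv="..."> tag is stored under its own name, such as Content-Type or Content-Language. The loop below prints the title, description, keywords, content type and content language, and marks any field the page does not define:

for my $field ('Title', 'X-Meta-Description', 'X-Meta-Keywords', 'Content-Type', 'Content-Language') {
    my $value = $head->header($field) // '(not set)';
    print "$field: $value\n\n";
}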
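Those five names are only the common ones; pages often carry other tags such as <meta name="author"> or <meta name="robots">. Called with no argument, header returns the parser's underlying HTTP::Headers object, and its scan method visits every stored field. A minimal sketch, again using a made-up sample page in place of a download:

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Made-up sample page standing in for a downloaded one.
my $html = <<'HTML';
<html><head>
<title>Example Widgets</title>
<meta name="author" content="Example Author">
<meta name="robots" content="noindex, follow">
</head><body></body></html>
HTML

my $head = HTML::HeadParser->new;
$head->parse($html);

# scan() calls the callback once per stored value with (field name, value).
my @fields;
$head->header->scan(sub {
    my ($name, $value) = @_;
    push @fields, [$name, $value];
    print "$name: $value\n";
});
```

Because scan reports whatever the page defines, this version also picks up tags that a fixed list of names would miss.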


Run the program, and it prints each metadata value found in the page's <head> section.

Perl is a practical and efficient language, and as the code shows, extracting metadata from a website takes only a short program. The extracted values can help you analyze how a particular website was optimized for search engines. In particular, the description and keywords found between the <head> and </head> HTML tags show which topics and keywords the site's authors are targeting, which can help you decide which keywords to target for your own niche. Keep in mind that these tags show what a page aims for, not what it actually ranks for.
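If a page does define a keywords tag, splitting it into a clean list makes it easier to compare one site with another. The sketch below assumes the comma-separated format most pages use; split_keywords is a hypothetical helper name, and the sample string is made up. In the full script you would pass it $head->header('X-Meta-Keywords') instead.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a comma-separated keywords string into a
# lower-cased list with surrounding whitespace, empty entries and repeats removed.
sub split_keywords {
    my ($raw) = @_;
    return () unless defined $raw;
    my %seen;
    return grep { length($_) && !$seen{$_}++ }
           map  { my $k = lc $_; $k =~ s/^\s+|\s+$//g; $k }
           split /,/, $raw;
}

# Made-up sample value; replace it with $head->header('X-Meta-Keywords').
my @keywords = split_keywords('Widgets, widget repair, , Handmade, widgets');
print scalar(@keywords), " keywords:\n";
print "  $_\n" for @keywords;
```

Returning an empty list for a missing tag lets the caller treat "no keywords" and "empty keywords" the same way.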

