Introduction
I will never forget the first
assignment of my apprenticeship. My new employer had a website with
over two hundred pages and wanted to add a new item to the navigation
menu. It was my first day on the job and I was determined and eager to
please as I opened the first page, added the new link and, after
saving, closed the file. Then the next one and the next one and the
next one.... After opening, editing, saving and closing fifty or so
files, I was starting to wonder if there
wasn't a better way to do this. After opening, editing, saving and
closing the one-hundred-and-twenty-fourth file, I knew that there had
to be a better way because I was not about to trade my sanity for an
apprenticeship. Sure enough, after a bit of searching, I found the
better way: PHP and query strings.
The Basics
The layout and menu were identical on all the pages; the only things that differed were the title and the content. I took one of the pages, replaced the title and content with two PHP variables and saved it as "index.php". I then defined the two PHP variables in an external file, which was included at the top of my new index.php.
An old page looked like this:
<html>
<head><title>Products</title></head>
<body>
<div class="menu">
  <ul>
    <li><a href="home.html">HOME</a></li>
    <li><a href="contact.html">CONTACT</a></li>
    <li><a href="links.html">LINKS</a></li>
    <li><a href="products.html">PRODUCTS</a></li>
  </ul>
</div>
<div class="content">
  <h1>Our Products</h1>
  <p>Bla bla bla blub.</p>
</div>
</body>
</html>
And the new template (index.php), where the title and content are inserted through PHP variables, looked like this:
<?php include('page.php'); ?>
<html>
<head><title><?php echo($title); ?></title></head>
<body>
<div class="menu">
  <ul>
    <li><a href="index.php?page=home">HOME</a></li>
    <li><a href="index.php?page=contact">CONTACT</a></li>
    <li><a href="index.php?page=links">LINKS</a></li>
    <li><a href="index.php?page=products">PRODUCTS</a></li>
  </ul>
</div>
<div class="content">
<?php echo($content); ?>
</div>
</body>
</html>
And the external file ("page.php") which was included at the top of the template:
<?php
$title = "Products";
$content = "\t<h1>Our Products</h1>\n\t<p>Bla bla bla blub.</p>";
?>
The HTML source code generated by the old and the new pages was identical, but now I could use PHP to change the content of the $title and $content variables. I then created a sub-folder called "pages" and some files for testing, each setting its own $title and $content: "home.php", "contact.php", "links.php" and "products.php". By modifying the PHP at the top of index.php, I could dynamically load these pages using the value in the query string:
<?php include('./pages/'.$_GET['page'].'.php'); ?>
A short explanation: the $_GET array holds the values passed in the query string, which follows the "?" question mark and consists of name=value pairs separated by "&" ampersands. So a URL like "http://www.mysite.com/index.php?page=start&id=236&name=john_doe" stores the following values in the $_GET array:
$_GET['page'] contains "start"
$_GET['id'] contains "236"
$_GET['name'] contains "john_doe"
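To see how a query string maps onto the array, PHP's built-in parse_str() function can be used: it splits a query string into an array in essentially the same way PHP fills $_GET. This is only a quick way to experiment; the $query and $params variable names are illustrative.

```php
<?php
// parse_str() turns a query string into an array much like PHP builds $_GET,
// so it shows which keys and values a given URL would produce.
$query = 'page=start&id=236&name=john_doe';

parse_str($query, $params);   // $params now holds what $_GET would contain

foreach ($params as $key => $value) {
    echo "\$_GET['$key'] contains \"$value\"\n";
}

// Every value arrives as a string, even the numeric id.
var_dump($params['id']);      // string(3) "236"
```

Because every value is a string, a check like $_GET['page'] != '' is comparing strings, which is what the code in the following sections relies on.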
I still had to convert all the old pages to .php files, but from then on, changes to the layout would mean changing only my template instead of two hundred pages.
Security Issues
This one little line of PHP that I used to include the pages is completely inadequate. Not only can it easily cause PHP errors, it is also a huge security risk. For example, what happens if someone enters a URL like "http://www.mysite.com/index.php?page=bla"? The PHP interpreter will try to include "pages/bla.php" and will report an error because that file does not exist. Another common problem is that the "page" variable is missing from the query string altogether ("http://www.mysite.com/"). This also results in PHP errors: one for the missing $_GET['page'] value and another when the interpreter tries to include "pages/.php".
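The worst-case scenario is that, on a Unix server, a malicious user could climb out of the "pages" folder and output the contents of your "passwd" file with a URL like "http://www.mysite.com/index.php?page=../../etc/passwd". (The script appends ".php" to the name, but PHP versions before 5.3.4 could be tricked into dropping that suffix with an encoded null byte, %00, at the end of the value.) If your server security is not configured properly, system files like this one, which lists the user accounts on the server, will be readable from any browser in the world! So, how do we go about preventing this?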
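The problem is easiest to see by printing the paths the unchecked include would try to open. The sketch below assumes the include builds its path as "./pages/" + page + ".php", as in the examples above; naive_include_path() is a hypothetical helper that only builds the string and never touches the filesystem.

```php
<?php
// Hypothetical helper: builds the same path as the unchecked include,
// './pages/' . $_GET['page'] . '.php', without opening any file.
function naive_include_path(?string $page): string
{
    // null stands in for a missing ?page= parameter; it concatenates as ''.
    return './pages/' . $page . '.php';
}

$requests = ['products', 'bla', null, '../../etc/passwd'];
foreach ($requests as $page) {
    echo var_export($page, true), ' => ', naive_include_path($page), "\n";
}
```

Only the first request names a real page. The second and third point at files that do not exist, and the last one walks two folders up from pages/ before PHP ever checks whether the file exists.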
Handling 404s
The first step towards making this method secure is preventing clients from requesting pages that do not exist. First we check whether the $_GET['page'] variable is valid, and then we check whether the requested file exists before we include it. I use the ternary operator (an inline if statement) to check the $_GET['page'] variable: if it is set and contains a value, it is written into the $page variable; otherwise $page is set to 'home'. Then I check whether the requested file exists and, if not, set $page to 'error':
<?php
// check the $_GET['page'] variable
$page = ((isset($_GET['page']) && $_GET['page'] != '') ? $_GET['page'] : 'home');
// check if the file exists
$page = (file_exists('./pages/'.$page.'.php') ? $page : 'error');
include('./pages/'.$page.'.php');
?>
Of course there has to be an "error.php" page in your "pages" folder that tells the user the requested page is not available. A redirect to the standard 404 page is also a good option.
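The same checks can be wrapped in a function so they are easy to test. In this sketch, resolve_page() and its $pagesDir parameter are hypothetical names, the is_string() test (which rejects requests like ?page[]=x) is an extra safeguard not in the code above, and sending a 404 status with http_response_code() is shown as one alternative to redirecting.

```php
<?php
// Sketch of the 404 handling; resolve_page() and $pagesDir are hypothetical.
function resolve_page(array $query, string $pagesDir = './pages'): string
{
    // Use 'home' when ?page= is missing, empty, or not a plain string.
    $page = (isset($query['page']) && is_string($query['page']) && $query['page'] !== '')
        ? $query['page']
        : 'home';

    // Use 'error' when the pages folder has no matching file.
    return file_exists($pagesDir . '/' . $page . '.php') ? $page : 'error';
}

$page = resolve_page($_GET);
if ($page === 'error') {
    // Must run before any HTML is sent: marks the response as "Not Found"
    // while error.php still supplies a friendly message.
    http_response_code(404);
}
// ...then include('./pages/' . $page . '.php') as before.
```

Returning a real 404 status keeps search engines from indexing the error page under every mistyped URL.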
Prevent Browsing of the File Structure
It is also important that clients are only allowed to request pages from your "pages" folder. That means the value stored in $_GET['page'] may only be a string of alphanumeric characters plus "-", "_", "." and spaces; in other words, a valid filename without the .php extension. There are various ways to check this: Perl-compatible regular expressions (PCRE), POSIX extended regular expressions (the ereg functions, which were removed in PHP 7), character type functions, or the normal string functions. I will use the PCRE functions preg_match and preg_replace to check for malicious requests and remove illegal characters. First I catch requests up or down the folder structure by checking for ".." or "/" in the $page variable; if either is found, $page is simply reset to "home". Then I remove all illegal characters from $page by replacing them with an empty string. This code comes right after the line where the $_GET['page'] variable is validated and before the file_exists function is called:
// prevent file browsing
$page = (preg_match('/(\.\.|\/)/', $page) ? 'home' : $page);
// replace illegal characters
$page = preg_replace('/[^a-zA-Z0-9 \._-]/', '', $page);
Now the user can only request valid files from the pages folder. All other requests are caught and dealt with.
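The two filtering steps can be tried in isolation. This sketch bundles them into a hypothetical sanitize_page() function with the same regular expressions as above and runs a few sample inputs through it.

```php
<?php
// Hypothetical helper combining the two filtering steps from this section.
function sanitize_page(string $page): string
{
    // Any attempt to move up or down the folder tree sends the user home.
    if (preg_match('/(\.\.|\/)/', $page)) {
        return 'home';
    }
    // Strip everything except letters, digits, space, dot, underscore and hyphen.
    return preg_replace('/[^a-zA-Z0-9 \._-]/', '', $page);
}

foreach (['contact', '../../etc/passwd', 'pro<script>ducts', 'my-page_2'] as $input) {
    echo $input, ' => ', sanitize_page($input), "\n";
}
```

One ordering subtlety: because illegal characters are stripped after the ".." test, an input such as ".<." comes out as "..". Without a slash that cannot leave the pages folder, and file_exists() then routes it to the error page, but running the stripping step first would close that gap as well.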
Finished!
<?php
// check the $_GET['page'] variable
$page = ((isset($_GET['page']) && $_GET['page'] != '') ? $_GET['page'] : 'home');
// prevent file browsing
$page = (preg_match('/(\.\.|\/)/', $page) ? 'home' : $page);
// replace illegal characters
$page = preg_replace('/[^a-zA-Z0-9 \._-]/', '', $page);
// check if the requested file exists
$page = (file_exists('./pages/'.$page.'.php') ? $page : 'error');
// and include the page
include('./pages/'.$page.'.php');
?>
<html>
<head><title><?php echo($title); ?></title></head>
<body>
<div class="menu">
  <ul>
    <li><a href="index.php?page=home">HOME</a></li>
    <li><a href="index.php?page=contact">CONTACT</a></li>
    <li><a href="index.php?page=links">LINKS</a></li>
    <li><a href="index.php?page=products">PRODUCTS</a></li>
  </ul>
</div>
<div class="content">
<?php echo($content); ?>
</div>
</body>
</html>
An example of an included file:
<?php
$title = "Welcome";
$content = "\t<h1>Welcome</h1>\n\t<p>to our super cool website!</p>";
?>
Of course the included files can contain more than just two variable declarations...
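For longer pages, writing all the HTML inside one string gets awkward. One option, sketched here under the assumption that a page file only needs to leave $title and $content behind for index.php, is to write the body as ordinary HTML and capture it with PHP's output buffering functions. The file name pages/products.php and the $products list are placeholders.

```php
<?php
// Hypothetical pages/products.php: the page body is written as ordinary HTML
// and captured into $content with output buffering.
$title = 'Products';
$products = ['Widget', 'Gadget', 'Gizmo'];   // placeholder data for the sketch

ob_start();   // from here on, output is collected instead of sent
?>
<h1>Our Products</h1>
<ul>
<?php foreach ($products as $product): ?>
  <li><?php echo htmlspecialchars($product); ?></li>
<?php endforeach; ?>
</ul>
<?php
$content = ob_get_clean();   // the collected HTML, ready for index.php to echo
```

Since the template only echoes $content, it does not matter to index.php whether a page builds that string by hand or with output buffering.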
A Bit of Cleanup
There are some drawbacks to this method. The first is that query strings do not make very pretty URLs. They also let people know what kind of technology you're using to generate your pages. This can be a plus or a minus; personally, I see it as the latter. There is a way of cleaning up your URLs using mod_rewrite. Getting mod_rewrite to work is too complicated a process to describe here, but if you've already got it up and running on your site, you can use this rewrite rule to get rid of those query strings:
RewriteEngine On
RewriteRule ^([A-Za-z0-9_]+)/?$ index.php?page=$1 [L]
Now a nice clean URL like http://www.yoursite.com/contact will be silently rewritten on the server to http://www.yoursite.com/index.php?page=contact, while the visitor's address bar keeps showing the clean URL.
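Apache's rewrite patterns are Perl-compatible regular expressions, so preg_match() can preview which paths the rule would rewrite. The sketch assumes the rule lives in an .htaccess file, where the path is matched without its leading slash ("contact", not "/contact"); rewrite_target() is a hypothetical helper for experimenting, not part of Apache.

```php
<?php
// Hypothetical helper: applies the RewriteRule's pattern with preg_match()
// to show what a request path would be rewritten to.
function rewrite_target(string $path): ?string
{
    // Same pattern as the rule; '#' delimiters avoid escaping the '/'.
    if (preg_match('#^([A-Za-z0-9_]+)/?$#', $path, $m)) {
        return 'index.php?page=' . $m[1];
    }
    return null; // no match: Apache serves the path unchanged
}

foreach (['contact', 'contact/', 'my-page', 'index.php'] as $path) {
    echo $path, ' => ', rewrite_target($path) ?? 'not rewritten', "\n";
}
```

"index.php" contains a dot, so requests for the template itself are never rewritten. Note also that "my-page" does not match: the rule's character class has no hyphen, although the filter in index.php allows hyphens, dots and spaces. If your page names use hyphens, adding "-" to the class ([A-Za-z0-9_-]) brings the two in line.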
So, before you copy your layout fifty times into fifty new .html files, at least consider $_GET['ting'] pages (safely) with PHP.