Tuesday, February 23, 2010

Data validation musings

As I take on the responsibility of supporting more and more web applications I've come to realize that the most difficult and time consuming bugs to not only eradicate, but clean up after, are those which leak bad data into the backing databases.  As a result I find myself spending generous amounts of time working on data validation aspects in the new applications I create.

In web applications there are usually several layers of validation working in concert:

  1. Client validation.  Read: Javascript.  This form of validation became the norm in web applications over the past decade as Javascript has gained wider acceptance in the browser.  Since web applications are largely about the user experience in many ways, these weak checks can be quite important for a variety of reasons, the least of which relates to the basic integrity of a to-be-persisted piece of data.  From my perspective as data defender, I don't really consider Javascript validation to be, well, validation, at all.  It's 100% user experience in my view.  That's not to say that it isn't a key player in maintaining the correctness of data; after all, since users of web applications tend to interact with other players indirectly via the database, the means by which they see and interact with the data can have a serious impact on how they subsequently manipulate it.  AJAX helps web applications respond to and keep the users informed of events that have changed or which cannot be ascertained at the time of the initial request.
  2. Application code validation.  I've seen a lot of programs limp along with all validation being done at the server-side/application code level.  This is traditionally where most of the business logic lives.  And what are validations if not pure, codified business rules?
  3. Database validation.  This is where the edge cases are caught.  The place where the scenario of two people hitting a button at the exact same time is either caught and dealt with, or blissfully ignored at the cost of data integrity.  Functional tests can easily miss these cases, as they're usually the hardest to replicate.  Stress testing probably unearths more of them, but in a lot of production applications there are usually some cracks that, while small and unlikely to show themselves, exist, and which seem to be the most troublesome to debug.
The enumerations above are in practice not really as distinct as the numbered list would have you believe.  AJAX can blur the line between client-side validation and server-side validation, acting as a mere messenger rather than doing any work itself.  And the line between application code and database constraints is even more blurry, with transactions and vendor-specific features like stored procedures living in the fuzzy ether between the two.

While the validation of data is typically driven by business rules explicitly or implicitly, the method by which validations are written to enforce those rules varies quite a bit.  For example, a rule that users must enter their first and last names prior to registration is rather simple to enforce, as the presence or lack of values is itself enough information to determine whether the requirement has been met.  To contrast this, it becomes more complex when one must consult existing stored values in order to validate some new entry; as an example, a bank must check an account balance before determining whether a debit of $500 is allowed.  The difference between these two validations may seem minimal, especially when they sit side by side as brothers in a requirements document, but when the code hits the pavement the second example is many times more difficult to enforce absolutely.

In the first example we could simply slap a NOT NULL constraint on the relevant columns in the database and it would be enough to ensure the validity of our persistent data store.  It may not be handled in the most friendly manner, but there's no way that record will get inserted until values are provided.  If our application is the single point of access to the database and we enforce this requirement in every place in the application code which inserts or updates records, we could theoretically shift this check from the database to the application code and happily chug along for years in production with not a single problem.

The second example is not so simple.  Not only do we need to examine existing persisted data to determine whether the account has enough funds to cover the debit, we also need to serialize each transaction (overloaded term in this context) such that a situation cannot arise where simultaneous debits pass the balance check before either of them posts to the account, with the end result being a negative balance.  We cannot live in our own little bubble anymore; we must be fully aware of our environment and the other potential players.

To truly fulfill the basic integrity and correctness of data in modern day web applications, one typically must be competent in SQL, the vendor-specific quirks and features (such as PL/SQL or T-SQL) of the database being used, a server-side application programming language like C#, Java, Ruby, PHP or ASP, and most likely Javascript for the browser.  But simply "knowing" (whatever depth of comprehensive understanding that happens to convey) the appropriate languages is not enough.  One must be able to understand all the moving parts so that validations can be written in a way that all points at which these languages and tiers of the system interact with each other are conducted in such a fashion that things can't break regardless of the data being affected by simultaneous executions of the same code by other users and processes.

Saturday, February 13, 2010

Preparing a WAMP stack for Yii development - Part 1 of 2

This is the first part of a 2-part tutorial on setting up a WAMP stack for Yii Framework development. This part covers the installation of each individual component of a WAMP stack. The second part of the tutorial assumes you have your server set up, and picks up with the installation of the Yii Framework, configuration of your MySQL database, and the creation of a project.

Setting up a new development machine for Windows, Apache HTTP Server, MySQL and PHP + Yii Framework, i.e. WAMP-Yii, is a fairly easy process. WAMP being such a common configuration for web application development, some easy installation packages are out there which install and configure all these complementary components in one easy step.

I'm writing this walk-through to help those who don't have experience setting up a WAMP-Yii stack, in order to help them get up and running with the Yii Framework with minimal effort. I detest environment setup myself, and have wasted plenty of time trying to get various environments running the way I want them. It can be quite frustrating to start learning a new programming language or framework when you get stuck at the setup and configuration step.

As an aside: I recently had the experience of installing WampServer on a Windows XP machine and found it to be extremely user-friendly. If you're not comfortable installing each of the components separately, or if you just want to be up and running in a jiffy, you should definitely consider taking a look at one of the pre-configured installation packages. Unlike those I tried out several years ago, WampServer is a real treat in that it offers a nice GUI from which you can configure and manage your installation without having to make changes to your php.ini and httpd.conf files manually. For those who haven't administered PHP or Apache HTTP Server installations manually, these files, respectively, are where you do most of the tweaking to get things running to your specification. The Apache component is highly configurable, so it's common to make tweaks here and there as the need arises. If you wish to use WampServer instead of setting up each component separately, you can skip to Part 2 of the tutorial once you've got your server up and running.

I'm going to install each of these components separately, however, in the event that WampServer or another package doesn't fit your needs or isn't available for your particular environment or with the individual versions you desire.  Even if you do use a packaged one-click solution, it can be helpful to see the whole process so that you understand how each part interacts with the others, without having the ugly bits covered up by a GUI.  A warning, though: at the end of the process you'll probably wish you took the easy route ;)

Step 1: The first step is technically optional, but is quite important. As you're going to be downloading several installation packages from different websites, it's important to make sure you're getting what you expect. Most large open-source software packages are available from a series of mirrors rather than a single source, so verifying that the mirror served up the correct, unaltered file you were expecting is key from a security standpoint. In order to verify your downloads you need to get a checksum verification utility. Since you're using Windows I recommend Microsoft's File Checksum Integrity Verifier utility. Follow the instructions on the download page to install the "fciv.exe" checksum utility. It provides both md5 and sha1 checksum verification, which is what most vendors provide checksums for.

Step 2: Install MySQL Server. Select the desired MySQL installer version. I would recommend getting one of the MSI installers, as they're simple to install on a Windows machine. If you're on a 64 bit operating system you should opt for the 64 bit MSI installer. If you're on a 32 bit operating system, make sure you go for the 32 bit MSI installer, as the 64 bit version will not work. Once you've downloaded the installer, run the checksum utility you downloaded in Step 1 to verify the integrity of the file. If you downloaded the checksum utility to the same folder as the MySQL MSI file, it'll be a simple process. Open up the Windows command prompt, navigate to the download folder, and type fciv.exe <mysql-msi-file.msi> where <mysql-msi-file.msi> is the name of the MySQL msi file you downloaded (leave off the '<' and '>'). Compare the output of this command with the MD5 value provided on the download page. Currently MySQL provides MD5 checksums, but if they change to SHA1 you can type fciv.exe -sha1 <mysql-msi-file.msi> to get the SHA1 value instead of the default MD5. If the string of numbers and digits output matches the one on the download page, you can feel fairly comfortable that you have an unaltered file.  As an extra precaution you can scan the downloaded file with your Antivirus program.

Now run the MSI file to install MySQL on your machine.  You should be safe choosing the "Typical" setup type.

Continue with the wizard and hit the "Install" button.  Once complete it will pop up a window that says it finished installing.  Now make sure the "Configure the MySQL Server Now" box is ticked.
Next, make sure you select the "Detailed Configuration" radio button, so that we can make some tweaks to our MySQL Server installation.
Next, choose "Developer Machine".
The following option is up to you, but I recommend "Transactional Database only."  "Multifunction Database" is okay as well.  We're going to want to use the InnoDB storage engine for most web applications, as it supports transactions, so we need to make sure it is available.  You'll probably avoid MyISAM unless you are writing a web application that is used only for querying your database rather than issuing inserts or updates, and where speed is of the utmost importance.

You can leave the InnoDB Tablespace Setting as "Installation Path":

The next setting isn't terribly important, but I like to manually set a low number of concurrent connections since I don't typically have more than one or two open at a time during development:

The networking settings can be left as the defaults:

The character set is a decision you'll have to make based on your requirements.  I typically select UTF8:

We want our MySQL server to be running all the time on our machine as a Windows Service, rather than having to manually start it up every time, so we'll leave these boxes ticked.  The MySQL server doesn't use up to many resources, do don't fret about it hogging your RAM. You'll also want to tick the "Include Bin Directory in Windows PATH" option, as it will allow you to run the mysql utilities from the command line without having to prefix the installation directory path.

It now prompts you to enter a root password for your MySQL Server instance.  My screenshot may look a little different than what you see, because I've already set up MySQL with a password.  You'll probably see the same thing except there will be no "Current root password" field.  Just enter a root password in the "New root password" and "Confirm" fields.  Make sure you remember the password, as you'll need to know it later on; it can be difficult to reset this, so don't forget it!  Don't tick the root access on remote machines or the anonymous account options unless you have a reason to do so.

All done! You can test out your mysql server installation by opening up a command prompt. Note that you'll need to open a new one, as the PATH setting we enabled during setup will not be available on any existing command prompts you have open.

mysql -u root -p

Then you'll need to enter the root password you previously selected. If all goes well, you'll see this:

That's it for MySQL.  Now we'll move on to setting up Apache HTTP Server.

Step 3:  Download the Apache HTTP Server MSI.  According to Apache, version 1.3x was not meant to be installed as a Windows service, so they suggest version 2.x+.  I'm going to go with version 2.2, non-ssl version.  Again, run the checksum verifier to ensure the MD5 or SHA1 signatures match those provided by Apache.

After you accept the licensing agreement, you'll get to the Sever Information section.  Use localhost for the network domain and for the server name:

Next, choose the "Typical" installation.  The Apache HTTP Server will now be installed to your computer.  Once completed you should see the Apache Service Monitor icon on your toolbar.  If you right click and open it, you'll see the Apache server status:

If you need to make any alterations to the Apache HTTP Server settings in its httpd.conf file, or to the PHP settings in the php.ini file, you'll need to open up the Apache Service Monitor and restart Apache in order for them to take effect.

To test out your Apache installation, open a up browser and enter: http://localhost

If your browser gives you an error you've got a problem, but if you get the text "It works!" then everything was successful.
Now it's time to set up PHP.

Step 4:  We'll need to download a PHP installer.  If you're curious about the various different installers, here is a comparison of the thread safe and non-thread safe versions.  The decision between thread safe and non-thread safe comes down to whether we want to run PHP as a CGI or a module.

We're going to select the PHP 5.3.1 VC6 x86 Thread Safe installer.  Be careful and heed the warning on the download page not to install VC9 binaries in conjunction with the Apache HTTP Server binaries.

After accepting the license agreement you'll be prompted to select the web server.  Since we installed Apache HTTP 2.2, we'll opt to run PHP as a module:

After hitting next and selecting the install directory, we're prompted to select which extensions we want enabled.  The default selections should include, among other things, the MySQL and MySQLi extensions, as well as the PDO driver for MySQL.  IMPORTANT: If you downloaded the no-SSL version of Apache, you'll need to uncheck the OpenSSL extension.  Untick any extensions you don't think you'll need; you can always add them back in the future.  Be sure not to include any other databases than MySQL here or you may run into problems when you attempt to run php.

After clicking "Next" it will install PHP with the extensions we selected.  If you get an error message that says it had trouble accessing the httpd.conf file, you'll have to edit the Apache HTTP Server's httpd.conf file so that it can interpret PHP files.

If you get this error, don't worry: PHP probably installed successfully, it's just that your Apache HTTP Server doesn't realize it.

At this point you should open a new Command Prompt, navigate to the directory that you just installed PHP to, and try to run it by typing: php

If you get some errors about missing dll files, then you'll need to run the PHP MSI file again -- this time it will have a "Change" installation option -- and pay very careful attention to which extensions you're installing.  Uncheck MSSQL, PostgreSQL, and any other databases, as they create dependencies that your system may not be able to support.  Any extensions that have dependencies could cause problems.

If no errors occurred when you ran the "php" command you're probably in good shape.  Now you need to configure Apache so that it runs PHP as a module.

You need to navigate to your Apache 2.2 installation's conf folder.  It's probably located at C:\Program Files\Apache Software Foundation\Apache2.2\conf.  Once you locate it, open up the httpd.conf file within.  If the file is read-only, then you'll need to run your text editor as an administrator and then open the file.  To do this, locate your text editor's executable, e.g. Notepad (Start->Accessories->Notepad), right click on it and select "Run as administrator" and then go to File->Open, and then locate and open the httpd.conf file in the aforementioned location.

Search for this line in the httpd.conf file:

#LoadModule rewrite_module modules/mod_rewrite.so

And remove the '#' from the beginning of it in order to uncomment it. This will enable Mod Rewrite, the magical module that makes our URLs pretty.

At the end of all the LoadModule lines, add the following two lines (edit the path to your PHP install directory if applicable):

LoadModule php5_module "C:/Program Files/PHP/php5apache2_2.dll"

PHPIniDir "C:/Program Files/PHP"

Search for this line: <IfModule mime_module>

You need to add the following line between the opening <IfModule mime_module> and closing </IfModule> tags:

AddType application/x-httpd-php .php

Now search the file for the DocumentRoot tag... it probably looks like this:

DocumentRoot "C:/Program Files/Apache Software Foundation/Apache2.2/htdocs"

You'll want to change that directory path to the location where your web application will be developed, so that by navigating to http://localhost it will point to the root of your project folder.

Since my application files will reside at C:\yiiprojects\testproject I'll update this tag as follows:

DocumentRoot "C:/yiiprojects/testproject"

When you restart the Apache HTTP Server it will now start serving up files from that location.

A little farther down you'll see a tag with the original DocumentRoot; it has a comment preceding it stating, "This should be changed to whatever you set DocumentRoot to."  Make sure you copy the DocumentRoot value here as well.

For me it is:
<Directory C:/yiiprojects/testproject>

Now save and close the httpd.conf file.  Open up the Apache Service Monitor application that should still be running from before, or open it if not, and hit the "Restart" button (or "Start" if the server is not currently running).

If you get errors when you attempt to restart the Apache HTTP Server, go to the command line, navigate to your Apache installation's bin folder and try to run "httpd" from the command prompt.  It should display the errors there, which you can use to debug the problem.

If it restarts without any errors, navigate to the folder you specified as your DocumentRoot folder in the httpd.conf file, create a file called index.php with the following contents:

Now open up a browser, navigate to http://localhost and if it prints out all the details of your PHP installation you've got PHP and Apache HTTP Server running together.

To get Yii Framework integrated with your new WAMP stack, proceed to Preparing a WAMP stack for Yii development - Part 2.