This style guide is a distillation of my own experience with additional examples and verbiage compiled from several sources, including:
In many cases, the above cited references conflict with each other, and with what I have observed as preferable practice. In those situations, I've done my best to state the justification for the recommendation I made.
Fred Koschara
May 2013
Source code development usually involves balancing requirements, some of which are often in conflict. For example, from a computer's perspective, the best source code requires the least interpretation at run time to achieve maximum efficiency. From a network's perspective, the best source code has the least total number of bytes transmitted to achieve maximum efficiency. For someone doing maintenance, source code that is neatly formatted, well commented, and blocked into logical groups of statements will require the least effort to do their job, and is therefore most efficient. For the original programmer, writing code in a creative streak, using only enough structure to keep the code organized in their mind will seem to be the most efficient pattern - until it's time to debug the code, and to ensure all of the requirements are met. When that mindset change occurs, the original programmer suddenly finds their needs to be quite similar to those of the maintainer, and stopping to fix the code structure and add documentation makes sense.
For compiled languages such as C and C++, the conflict between the formatted source and least interpretation requirements is resolved by the compiler, but that adds at least one more step to the development process, and makes the code able to run on only one type of platform. Languages that use a just-in-time (JIT) "compiler" such as PHP, Python and Java or C# don't require additional steps in development, but the "compiler" must be run each time the code is used, generally producing "bytecodes" that are then interpreted by a virtual machine. This reduces portability issues but increases the processing requirements. If a bytecode cache is used, the JIT compiler only has to be run when a source code change is detected, but the bytecode interpreter is still less efficient than running native code directly on the underlying hardware. In most cases, the difference is negligible, and the increased portability is used to justify the loss of efficiency.
Network efficiency is most often increased by using compression to reduce the amount of data transmitted, but any compression algorithm has to be in place on both ends of the wire (the receiver must use a matching decompression algorithm to recover the original data without loss or corruption) and increases the computation overhead. Compression patterns are generally established in server configuration, and are therefore out of the scope of program development. What an application developer needs to be concerned with, however, is reducing the total amount of data presented to the compressor in the first place. Things such as eliminating extraneous whitespace may not make much of a difference on a page that is only viewed infrequently, but for a Web site with millions of page views a month, it could make as much as a 10% reduction in network traffic.
In order to decide where the "best" balance can be found, a major consideration is how the lowest total lifetime cost of a piece of software can be achieved. For long-lived products, having neatly formatted and well commented source code is a major consideration because the cost of maintenance skyrockets every time someone has to figure out what the existing code is doing. Over the life of a properly maintained program, maintenance costs can be expected to far outweigh the original cost of development. For a throw-away piece of code that will only be used once, it may make sense to forgo formatting and documentation, but doing so can lead to bad programming habits, and if the code is retained as part of the documentation, or could be used "occasionally" rather than once, it should be treated as carefully as any other part of the system.
Similarly, expending original development effort to achieve maximum network performance versus reducing development time should be decided in light of reducing the total lifetime cost of the application. With a very high traffic Web site, sending neatly formatted HTML code to the browser could significantly increase transmission costs, especially for uncompressed traffic. However, the higher cost of discovering an error in an unformatted page could be more than the savings from eliminating whitespace sent to the browser. As the number of times a particular page is served declines, the savings from eliminating HTML formatting drop, but maintenance costs are likely to stay the same. Thus it makes sense to try to send properly formatted HTML code when serving Web pages in nearly every case. Most of what a Web developer can do to reduce network overhead involves things like using tabs rather than spaces, and commenting out code in PHP rather than in HTML so that the comments are never transmitted.
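As a rough sketch of the last point, the fragment below renders the same markup twice, once with the maintainer's note as an HTML comment and once as a PHP comment. The function name render_fragment() and the note text are made up for illustration; output buffering is used only so the two results can be compared.

```php
<?php
// Hypothetical helper, for illustration only: render the same fragment with
// the maintainer's note written either as an HTML or as a PHP comment.
function render_fragment($useHtmlComment)
{
	ob_start();		// capture the output so its size can be compared
	if ($useHtmlComment)
	{ ?>
<!-- TODO: revisit this greeting -->
<?php
	}
	/* TODO: revisit this greeting (PHP comment, stays on the server) */ ?>
<p>Hello</p>
<?php
	return ob_get_clean();
}

$withHtmlComment=render_fragment(TRUE);
$withPhpComment=render_fragment(FALSE);
```

Only the HTML comment becomes part of the bytes sent to the browser; the PHP comment is discarded on the server, so $withPhpComment holds nothing but the paragraph itself.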
Balancing requirements for source formatting, computation and network overhead is only one example of resolving conflicts during program development. Other issues are beyond the scope of this document, however, so these are the ones addressed here.
Within an application, pages should have a consistent look-and-feel so a new user can quickly learn their way around. Having pages that react and/or display information in similar ways makes finding information and using it easier for both novices and experienced users.
While the original author of a Web page may have a clear understanding of how the code is constructed when they first finish writing it, the structure and functionality may be obscure to another developer who is given the task of debugging or enhancing the page as interactions with other resources and requirements change. Even the original author may have difficulty following the code after a significant time has passed and they have been working on other tasks. This is one of the major reasons why software maintenance costs frequently far outweigh the costs of development. Following this Style Guide when writing source files will make their structure easier to follow.
The original source code for any given program should be human readable. The parsers, interpreters and/or compilers for nearly every programming language ignore whitespace outside of quoted strings. There are rare exceptions, e.g. Python, where indentation is used to define logical structure rather than delimiters or control structure keywords. However, for every other language whitespace is allowed for the convenience of human readers and writers. Use the whitespace to make your source code easier to comprehend. Not only will it make maintaining the program easier, but it will also make it much simpler to read the code as it is being written and to confirm that it functions correctly, even for large, complex projects.
While it is frequently used to perform server operations not directly visible to a user, PHP is also used to generate source code - HTML, Javascript, CSS - that is sent to a browser for interpretation. Users occasionally look at the source code for Web pages, and skilled QA personnel and maintainers will use the source sent to the browser as a tool in their efforts. Without paying attention, it is trivial to get PHP to create Web source code that is nearly incomprehensible - badly formatted HTML with random line lengths and no rhyme or reason for use of whitespace is a frequent result.
There is no excuse for such a mess: PHP can and should be used to generate HTML code that is just as readable as the original PHP code itself. It doesn't take a lot more work to ensure the generated code is properly formatted in the first place. Once it's done, no more work is required to create comprehensible HTML code unless the PHP source is changed: The server will just as happily emit properly formatted Web source code as not. Use the power of automation to write a better Web!
As of this writing (May 2013), most of MIT Sloan’s existing PHP Web pages include a DOCTYPE specification indicating they are compliant with XHTML 1.0 Strict, the most restrictive HTML standard. However, a very large number of the pages are not actually compliant with the DTD that defines the standard, which could result in cross-browser compatibility issues: The fact that some (or most) browsers ignore or fix up coding errors doesn't mean they all do (e.g., Internet Exploiter frequently has its own ideas of how things should work). Validated HTML code is most likely to function in the greatest number of browsers. Using a validation tool, such as the HTML Validator add-on for Firefox, during development is strongly advisable to eliminate errors and warnings.
Code must run error free and not rely on warnings and notices to be hidden to
meet this requirement. For instance, never access a variable that you did not
set yourself (such as $_POST array keys) without first checking to see that it
isset(). When developing code, check the Apache error log frequently to catch
problems early, and eliminate warnings and errors as they are introduced.
PHP should be configured to report as many errors and warnings as possible: Use
E_ALL|E_NOTICE, or preferably E_ALL|E_NOTICE|E_STRICT.
With error logging enabled and display errors disabled, even a badly written
script won't tell the world everything that's wrong, but the problems will be
logged so they can be corrected. If a sufficiently high level of reporting is
not configured in php.ini, an error_reporting() call at the start of each
script can fix the problem.
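A minimal sketch of these recommendations follows. The helper request_value() is a made-up name, not part of PHP, and the 'name' key is only an example. The sketch passes plain E_ALL, on the assumption of PHP 5.4 or later, where E_ALL already includes both notices and strict-standards messages.

```php
<?php
// Report everything. E_ALL already covers notices, and on PHP 5.4 and later
// it covers strict-standards messages as well.
error_reporting(E_ALL);
ini_set('display_errors','0');	// keep the details away from visitors
ini_set('log_errors','1');		// but record them for the developers

// Hypothetical helper (not part of PHP): read a request value without
// triggering an "undefined index" notice when the key is missing.
function request_value($source,$key,$default=NULL)
{
	if (isset($source[$key]))
		return $source[$key];
	return $default;
}

$name=request_value($_POST,'name','anonymous');
```

Routing every access to request data through one guarded function keeps the isset() check from being forgotten in individual scripts.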
No debugging code may be left in place when pushing to the production server
unless it is commented out: no var_dump() or print_r() calls, and no die() or
exit() calls that were used solely during development.
Use example.com, example.org and example.net for all example URLs and email
addresses, per RFC 2606.
Every source file should have a copyright declaration, even if it is a "copyleft" declaration placing the code into the public domain. This will avoid ambiguities about the author's intent. Note that under U.S. copyright law once a document has been published without making a copyright claim, it cannot be legally copyrighted in the future. Therefore, if any copyrights are going to be reserved, any documents (including source code) must have a copyright declaration affixed before they are first made publicly available.
Similarly, every Web page sent to a browser should have a copyright declaration
embedded in the HTML <head> section of the document and a visible copyright
claim statement if copyrights are being reserved for the page and/or its
contents.
Never use shorthand PHP start tags. Always use full PHP tags. This is important
because in XML, and therefore XHTML, “<?” starts a PI (processing instruction)
and must be followed by a name.
<?php
// INCORRECT: ?>
<? ... ?>
<?= $var ?>
<?php
// CORRECT: ?>
<?php ... ?>
<?php echo $var ?>
PHP generally uses semicolons “;” to mark the end of a statement. However, if
the PHP closing tag is on the same line after a statement, the semicolon is
optional and should not be used.
A code block, enclosed in braces “{}”, is not a statement, it is a group of
statements. Each statement within the code block must be terminated with a
semicolon, but the code block itself is NOT terminated with a semicolon.
The closing “?>” tag at the end of a PHP file is optional to the PHP parser and
is not required. However, if used, any whitespace following the closing tag,
whether introduced by the developer, user, or an FTP application, will be
immediately written to the output. This will prevent any more headers from
being sent to the browser, can cause PHP errors, and if the latter are
suppressed, blank pages. Leaving it out reduces the processing necessary for
the module.
For source files that end in PHP mode (as opposed to HTML mode), omit the closing PHP tag and instead use a comment block to mark the end of file:
<?php
//
// EOF: filename.php
where “filename” is replaced with the name of the source file. There should be
one newline after the last non-empty line in the file: When the cursor is at
the very end of the file, it should be one line below the closing text.
Having the “end of file” marker in place will make it easier to find unwanted truncations of the file that might accidentally occur.
Source code statements should always be limited to 80 characters per line, taking into account expansion of tabs to four character spaces. While wide desktop display screens and horizontally scrolling editors make it possible to create and view longer lines in an original development environment, the longer lines are more difficult to comprehend, and will cause wrapping issues on less capable displays. (The first time you open a file with long lines in a terminal window on your smart phone you'll understand the problem.) In addition, when attempting to do side-by-side comparisons of different versions of the same file, the longer lines will either cause wrapping or scrolling issues in the comparison utility.
If necessary, an empty PHP segment can be inserted to wrap a long line of HTML code and keep it within the 80 character line length.
<?php
// For example: ?>
<a href="http://www.example.com/n/deeply/nested/page.php" rel="external" <?php
?>class="anchor-class-name">The Anchor Text</a>
There will be times when limiting lines to 80 characters will be impossible, such as within a "heredoc" where there isn't a mechanism available for folding long lines. Such cases are generally rare, so following the 80 character limit usually should not be an issue.
Do not combine statements on one line. Doing so reduces the line count (volume) but increases the line density, and more dense lines are harder to comprehend. In addition, the fact that multiple statements are combined on one line may be overlooked when quickly reading code during maintenance, resulting in confusion at best, and quite probably, leading to insertion of additional errors.
<?php
// INCORRECT:
$foo='this'; $bar='that'; $bat=str_replace($foo,$bar,$bag);
// CORRECT:
$foo='this';
$bar='that';
$bat=str_replace($foo,$bar,$bag);
There are a few cases where this rule may be broken to improve readability of the code. The permitted exceptions are:
if, else, for, foreach, do while or while - as long as line length limits are maintained.
A switch block where the case, assignment and break can all fit within the line length limits:
<?php
switch ($_REQUEST['tag'])
{ default: $string=FALSE; break;
case 1: $string='first choice'; break;
case 2: $string='another possible choice'; break;
case 3: $string='maybe something else'; break;
case 4: $string='yet another idea'; break;
}
The PHP parser ignores whitespace outside of quoted strings. Whitespace is solely used for the convenience and comprehension of human readers.
No whitespace can precede a file's opening PHP tag or follow a closing PHP tag: extraneous whitespace at the boundaries of your files can cause output to begin before it is supposed to, leading to errors and, potentially, blank pages.
Whitespace is required after the PHP start tag - a single space, one or more tab characters, or a newline. For single line statements or control structures where an opening brace starts the next line, the start tag and statement can be on the same line - use a space as the separator if the current indentation level is less than two tabstops from the left margin, otherwise use the correct number of tabs to indent the statement for the current indentation level.
A single space should be used between the PHP code and the close tag unless the close tag is at the start of a new line where there should be no leading space.
In general, parentheses and brackets should not use any additional spaces. The
exception is that a space must always follow PHP control structure keywords
that take arguments in parentheses (declare, do-while, while, if-else, switch,
for, foreach), to help distinguish them from function calls and increase
readability. Function names should not have any whitespace between them and the
parentheses enclosing their argument list. When referring to array items, never
use spaces around the index.
<?php
// INCORRECT:
foreach( $query->result() as $row )
// CORRECT:
foreach ($query->result() as $row) // single space after PHP keyword, not within parenthesis
// INCORRECT:
function Foo ( $bar )
{
}
// CORRECT:
function Foo($bar) // no spaces around parenthesis in function declarations
{
}
// INCORRECT:
$arr[ $foo ] = 'foo';
// CORRECT:
$arr[$foo]='foo'; // no spaces around array keys
Remove trailing spaces at the end of each line of code. Extraneous whitespace at the end of a line serves no useful purpose, may cause diff errors, and increases network overhead. Your text editor should have an option to assist in meeting this requirement.
To support readability, the equal signs may be aligned in block-related assignments:
<?php
$short     =foo($bar);
$longername=foo($baz);
The rule should be broken when the length of the variable name is at least eight characters longer or shorter than the surrounding ones:
<?php
$short=foo($bar);
$thisVariableNameIsVeeeeeeeeeeryLong=foo($baz);
One thing that needs to be understood to have PHP generate readable HTML code is when PHP sends whitespace to the browser.
Any whitespace to the left of a PHP opening tag will be emitted - once. This
means that if you indent an opening tag for an include statement, the first
line, and only the first line, of any HTML or text in the included file will be
indented by the whitespace to the left of the opening tag: PHP does not
interpret indenting an include statement to mean "indent each line in the file
by the amount the include statement was indented by."
On the other hand, unless there are non-whitespace characters on the line after a PHP closing tag, PHP ignores any whitespace up to and including the newline terminating the source statement. This frequently results in run-on HTML statements with embedded tabs where the code's author was expecting nicely formatted output:
<?php
// expected newlines are not emitted ?>
<tr>
	<td>
		<?php /* intentional indent */ echo $some_variable ?>
	</td>
</tr>
<?php // yields this HTML output: ?>
<tr>
	<td>
		some_variable's value	</td>
</tr>
<?php // better code would be: ?>
<tr>
	<td><?php echo $some_variable; /* non-whitespace after close tag */ ?></td>
</tr>
<?php // which yields this HTML output: ?>
<tr>
	<td>some_variable's value</td>
</tr>
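The behavior of an indented include statement, described earlier, can be checked the same way. This is only a sketch: the partial is written to a temporary file so the example is self-contained, and the variable names and list markup are invented for illustration.

```php
<?php
// Hypothetical partial, written to a temporary file so the sketch is
// self-contained; in a real page it would be an ordinary include file.
$partial=tempnam(sys_get_temp_dir(),'inc');
file_put_contents($partial,"<li>one</li>\n<li>two</li>\n");

ob_start();		// capture the output so it can be inspected ?>
<ul>
	<?php include $partial ?>
</ul>
<?php
$html=ob_get_clean();
unlink($partial);
```

In the captured $html, only the first list item carries the tab that precedes the opening tag; the second item starts at the left margin. The newline after the closing tag is swallowed as well, which is why the closing </ul> still lands on a line of its own only because the included text ends with a newline.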
The PHP parser doesn't care about indentation: it is solely used for the convenience and comprehension of human readers. Consequently, indentation should be used to enhance readability of the source code. Three simple rules underlie the rest of the indentation patterns to be used:
Your indentation should always reflect the logical structure of the code.
At the start of a code module, the code is at the [local] root level - so start all code lines in column 1 (the leftmost column of the page). Each time a new nesting level is entered, whether braces or parentheses are present or not, indent one tab, and use that level of indentation until the nesting level changes - whether indenting another tab to enter another nesting level, or outdenting when leaving the current nesting level.
Most people find that 4-space tabstops provide the best balance between making indentation visible and using screen space. By using tabs, rather than spaces, for indentation, that preference can be adjusted without reformatting the code: If you think four spaces is too much for each indentation level, set your editor to use two or three space tabstops and the tab width will adjust to suit your view. On the other hand, if you want more indentation, you can use eight space tabstops with the same file. Just be sure that when you save the file that you are using TAB characters, not SPACES, for the indentation written to permanent storage.
Within switch statements, the switch statement is the parent indentation level,
the case statements (including the default statement, if present) are the next
indentation level, and the action statements within each case, including the
break statement, are at the next indentation level. Thus, the correct
indentation structure for a switch block is:
<?php
switch (condition)
{
	case 1:
		action1;
		break;
	case 2:
		action2;
		break;
	default:
		defaultaction;
		break;
}
Use real tabs and not spaces, allowing the most flexibility across editors and operating systems. An acceptable exception: if a block of code would be more readable with things aligned, use spaces for the alignment:
<?php
[tab]$foo  ='somevalue';
[tab]$foo2 ='somevalue2';
[tab]$foo34='somevalue3';
[tab]$foo5 ='somevalue4';
For associative arrays, values should start on a new line. Note the comma after the last array item: This is recommended because it makes it easier to change the order of the array, and makes for cleaner diffs too. (Unlike Javascript running in InternetExploiter, PHP ignores a trailing comma in an array declaration.)
<?php
$my_array=array
([tab]'foo'  =>'somevalue',
[tab]'foo2' =>'somevalue2',
[tab]'foo3' =>'somevalue3',
[tab]'foo34'=>'somevalue3',
);
The rule of thumb here is that tabs should be used for indentation at the beginning of the line and spaces for alignment within the line.
When concatenating strings in an assignment, long lines should be broken at
clauses to improve readability or if the line length limit would be exceeded.
In these cases, each successive line should be padded with white space so the
"." operator is aligned under the "=" operator:
<?php
$sql='SELECT id,name FROM people '
    ."WHERE name='Susan' "
    .'ORDER BY name ASC';
The exception handling try and catch mark control structure blocks, just as if
and else do. They are indented to the same level as the surrounding code, with
try and catch aligned with each other, and with the braces surrounding their
code blocks:
<?php
try
{
	// code that might fail
}
catch (FirstExceptionType $e)
{
	// first catch body
}
catch (OtherExceptionType $e)
{
	// other catch body
}
When HTML and PHP are interspersed, ALWAYS put PHP start tags (“<?php”) at the
left margin unless one of these specific conditions exists:
<?php
<p><strong>Area</strong><br />
<?php /* intentional indent */ echo $person['AREANAME'] ?></p>
<?php
<a href="<?php echo $theLink ?>"><?php echo $theAnchorText ?></a>
When HTML and PHP are interspersed, indent the PHP statements at the indentation level that would be in force if there were no HTML tags: The HTML and PHP codes should be considered as maintaining separate indentation levels.
As a general rule, only use parentheses where they are required. Additional parentheses may be used to clarify groupings in complex conditional constructs, but knowing operator precedence should eliminate their necessity.
Do not use parentheses when using language constructs such as echo, print,
include, or require. These are not functions and don't require parentheses
around their parameters.
When calling class constructors with no arguments, always include parentheses: The constructors are functions, so constructor calls need to look like function calls.
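A short sketch of both rules follows; the Greeter class and its greet() method are invented purely for illustration.

```php
<?php
// Hypothetical class used only for illustration.
class Greeter
{
	function greet($name)
	{
		return 'Hello, '.$name;
	}
}

// A constructor call always gets parentheses, even with no arguments.
$greeter=new Greeter();

// echo is a language construct, so its arguments take no parentheses.
echo $greeter->greet('world'),"\n";
```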
Use Allman style indenting, or preferably, Horstmann style (a.k.a. "compacted Allman") indenting. Braces are never at the end of a line, but rather always placed on a new line, and indented at the same level as the control statement that "owns" them. This makes it easier to find the matching braces and provides a logical view of the structure of the code. Code within a block enclosed by braces must be indented one level from the surrounding code, and all statements at the current nesting level begin in the same vertical column of the page. This makes it easier to identify the structure of the code.
With Allman style code, braces are always on a line by themselves. In Horstmann style, the opening brace is also at the same indentation level as the parent statement, but is followed by a tab and the first (or only) statement within the child block.
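The two styles can be compared side by side in the sketch below. The function names sum_allman() and sum_horstmann() are invented for illustration; both functions behave identically and differ only in brace placement.

```php
<?php
// Allman style: every brace sits alone on its own line.
function sum_allman($values)
{
	$total=0;
	foreach ($values as $value)
	{
		$total+=$value;
	}
	return $total;
}

// Horstmann style: the opening brace is followed by a tab and the first
// statement of the block, saving one line per block.
function sum_horstmann($values)
{	$total=0;
	foreach ($values as $value)
	{	$total+=$value;
	}
	return $total;
}
```

In both forms the braces line up vertically with the statement that owns them, so matching pairs can be found by scanning a single column.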
If you have a really long block, consider whether it can be broken into two or more shorter blocks or functions. If you consider such a long block unavoidable, put a short comment at the end, on the same line as the closing brace, so people can tell at a glance what that ending brace ends. Typically this is appropriate for a logic block longer than about 35 rows, but any code that’s not intuitively obvious should be commented.
<?php
if (some_condition && !some_other_condition)
{	// ...
	// ...
	// ...
}	// end if (some_condition && !some_other_condition)
while (yet_other_condition)	// describe how this happens
{	// ...
	// ...
	// ...
}	// end of "how this happens"
Single line blocks can omit braces for brevity as long as the indentation rules
are followed as though the braces were present. The only exception is that
single statement else clauses are preferably written on the same line as the
else keyword:
<?php
if (condition)
	action1();
else if (condition2)
	action2();
else action3();
if and else are words, "elseif" is not. Always use else if when specifying an
alternate branch in an if control structure, not elseif.
PHP supports using an alternative syntax for some of its control structures -
if, while, for, foreach, and switch. In each case, the basic form of the
alternate syntax is to change the opening brace to a colon (:) and the closing
brace to endif;, endwhile;, endfor;, endforeach;, or endswitch;, respectively.
Using the alternative syntax yields code EXTREMELY difficult to grasp at a
glance (unless you are a BASIC programmer, maybe), especially when interspersed
with HTML. It also requires using a whole set of additional keywords instead of
consistent braces.
<?php
// INCORRECT:
?>
<table>
<tbody>
<?php
foreach ($foo as $bar) : ?>
<tr>
<?php
if ($bar == 'example') : ?>
<th>My Heading</th>
<?php
else : ?>
<td>My Data</td>
<?php
endif; ?>
</tr>
<?php
endforeach; ?>
</tbody>
</table>
<?php
// CORRECT:
?>
<table>
<tbody>
<?php foreach ($foo as $bar)
{ ?>
<tr>
<?php if ($bar == 'example')
{ ?>
<th>My Heading</th>
<?php
}
else
{ ?>
<td>My Data</td>
<?php
} ?>
</tr>
<?php
} ?>
</tbody>
</table>
When considering using the alternative syntax, keep in mind a simple rule: DON’T DO IT!!
Single-quoted strings are not examined for escape sequences or embedded variable names and therefore require less processing when a page is being parsed. Always use single quoted strings unless you need variables or escape sequences parsed, or to avoid excessive quote escaping. In most cases where you would want variables embedded in a string parsed, it is preferable to use single-quoted strings concatenated to either side of the variable which is faster to parse. Use double-quoted strings if the string contains single quotes so you do not have to use escape characters.
<?php
// INCORRECT:
"My String" // no variable parsing, so no use for double quotes
"My string $foo" // not optimal
'SELECT foo FROM bar WHERE baz=\'bag\'' // ugly
$foo='something';
"My string is $something_else" // PHP looks for $something_else
"My string is ${something}_else" // ugly and unobvious
// CORRECT:
'My String'
'My string '.$foo // string catenation is faster than embedding variables
"SELECT foo FROM bar WHERE baz='bag'"
$foo='something';
'My string is '.$something.'_else' // no ambiguity
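The difference in parsing can be seen in a small runnable sketch; the variable names and values are illustrative only.

```php
<?php
$name='Susan';	// illustrative value

// Single quotes: nothing is parsed, so $5 and \n stay exactly as typed.
$literal='Cost: $5\n';

// Double quotes: the variable and the \n escape sequence are both parsed.
$parsed="Name: $name\n";

// Concatenation produces the same result as the parsed string.
$joined='Name: '.$name."\n";

// Double quotes here only to avoid escaping the embedded single quotes.
$query="SELECT id FROM people WHERE name='Susan'";
```

Here $literal ends in a backslash followed by the letter n, while $parsed and $joined both end in a real newline.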
In general, code should be commented prolifically. It not only helps describe the flow and intent of the code for less experienced programmers, but can prove invaluable when returning to your own code months down the line. Non-documentation comments - those which describe the logic and flow of the program, rather than the interface with the rest of the system - are strongly encouraged. A general rule of thumb is that if you look at a section of code and think "Wow, I don't want to try and describe that," you need to comment it before you forget how it works.
C style comments (delimited by /* */) should be used for creating large comment
blocks, comments within a line of PHP code, or when commenting out sections of
code. C++ style inline comments (delimited by //) may be used when the comment
extends through the remainder of the current line, or for commenting out single
PHP statements.
Do not use Perl/shell style inline comments (delimited by #).
When adding end of line comments, separate the code statement from the comment delimiter using a single tab. If multiple statements in a group have end of line comments attached, the comment delimiters can be tab aligned to improve readability.
Do not use C++ style inline comment markers (//) at the start of a series of
lines for a multi-line comment: It looks sloppy, takes more typing, increases
the file size, and it requires more processing power, since the interpreter has
to repeatedly go in and out of its "parsing a comment" mode.
Further, N.B. - Do NOT use // comment delimiters to comment out blocks of code:
It takes more work to comment/uncomment the block, in addition to looking
sloppy. If you want to comment out a block of code, insert a /* before the
block and a /**/ after it. Then, if you want to uncomment it, all you have to
do is add a */ immediately after the opening mark, or remove the comment
delimiters - one or two changes rather than having to modify every line that
had been commented out.
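The toggle works because /**/ is a complete, empty comment on its own, yet also closes a comment that is already open. A small sketch, using an invented $log array to show which block actually runs:

```php
<?php
$log=array();

// Disabled: the opening mark starts a comment that runs to the end mark.
/*
$log[]='disabled';
/**/

// Enabled: adding the close right after the opening mark turns it into an
// empty comment, so the statement below is live again.
/**/
$log[]='enabled';
/**/
```

After this runs, $log contains only the entry from the second block. Note that the technique breaks down if the disabled code itself contains a C style comment, since the first close mark encountered ends the comment.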
It is sometimes useful to write a case statement that falls through to the next
case by not including a break or return within the first case. To distinguish
code so constructed from bugs, any case statement where break or return is
omitted should contain a comment indicating that the break was intentionally
omitted:
<?php
switch (condition)
{
	case 1:
		action1;
		// no "break" here, we drop through
	case 2:
		action2;
		break;
	default:
		defaultaction;
		break;
}
Complete inline documentation comment blocks (docblocks) must be provided for files, classes and functions (including class methods). A docblock is a special type of comment that provides verbose information about an element in your code. The information can be used by developers to gain understanding of the purpose and operation of a given element. It can also be used by integrated development environments (IDEs) to provide hints and auto-completion, and by automatic tools to generate API documentation.
Two popular programs designed for reading docblocks to produce documentation are phpDocumentor and Doxygen.
Using consistently constructed docblocks makes "newly found" code easier to understand when doing maintenance, and simplifies re-use of existing code by providing a firm grasp of the interface, effects and results of a functional block or class. This is an example docblock for a function:
<?php
/**
* brief description of the function
*
* (optional) longer description of the function, side effects, etc.
* the longer description usually spans more than one line.
*
* @param type $param1, what it's for
* @param type $param2, what it's for, default='something'
* @global type $global1, what's expected in the global variable
* @global type $global2, what's expected in the global variable
* @return type, description of the return value
*/
function FunctionName($param1,$param2='something')
{ global $global1,$global2;
$internalName=processed($param1,$param2);
return $internalName;
}
Docblocks must also precede class and method declarations. In this example, some tags needed for publishing classes in repositories such as PEAR (the PHP Extension and Application Repository) are illustrated. They are not, however, required for internal use:
<?php
/**
* Super Class
*
* @package Package Name
* @subpackage Subpackage
* @category Category
* @author Author Name
* @link http://example.com
*/
class Super_class
{
/**
* Encodes string for use in XML
*
* @access public
* @param string
* @return string
*/
function xml_encode($str)
{ // ...
}
}
In general, a docblock comment starts with a C-style comment start tag with an extra asterisk attached, /**, followed by a newline. Each docblock line starts with an asterisk under the slash of the comment start marker to provide visual continuity of the extent of the docblock. Do not space the column of asterisks over to line up under the first opening asterisk because editors with "smart indentation" (following the indentation of the line above when starting a new line) will improperly indent the first line of the comment or function declaration. Text within the docblock is separated from the column of asterisks by a single space, and the docblock is terminated with a standard C-style comment close tag, */, under the column of asterisks and immediately above the code being described (no intervening blank lines).
A docblock generally contains three sections, separated from each other by a blank line (consisting solely of the required asterisk in the left column): a brief description, an optional longer description, and a set of tags such as @param and @return. Tags represent meta-data which human readers, IDEs, external tooling or even the application itself can use to know how to interpret an element.
The most common tags are:
Documentation of any globals used by functions or methods is not required, but should be included.
An @global tag may be used in a docblock preceding the definition of a global variable. Only one @global tag is allowed per global variable docblock. A global variable docblock must be followed by the global variable’s definition before any other element or docblock occurs in the source. The name must be the exact name of the global variable as it is declared in the source.
<?php
/**
* short description of this variable
*
* longer description of the variable, e.g., where it's used and how it affects
* the rest of the application
*
* @global int $foo
*/
$foo=0;
Anywhere you are unconditionally including a [class] file, use require_once. Anywhere you are conditionally including a [class] file (for example, factory methods), use include_once. Either of these will ensure that class files are included only once. They share the same file list, so you don't need to worry about mixing them - a file included with require_once will not be included again by include_once.
include_once and require_once are language constructs, not functions. Parentheses should not surround the subject filename.
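As a rough illustration of the shared file list, the following sketch writes a throwaway file and includes it with both constructs. The file name and the loadCount counter are made up for this example and are not part of any real code base.

```php
<?php
// Hypothetical throwaway file, created only so the example is self-contained.
$classFile=sys_get_temp_dir().'/styleguide_once_'.uniqid().'.php';
file_put_contents($classFile,'<?php $GLOBALS[\'loadCount\']++;');

$GLOBALS['loadCount']=0;
require_once $classFile;          // unconditional include: the file is loaded here
$second=include_once $classFile;  // already on the shared list, so not loaded again

unlink($classFile);               // clean up the throwaway file
echo $GLOBALS['loadCount'];       // prints 1
```

Note that include_once simply returns TRUE when the file has already been included, so mixing the two constructs is harmless.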
TRUE, FALSE, and NULL are PHP keywords that should always be written fully uppercase.
TRUE and FALSE are BOOLEAN values which express truth (or not), not defined constants with respective values of one and zero. Symbolic constants are specifically designed to always and only reference their constant value. Booleans are not symbolic constants; they are distinct values. TRUE happens to cast to integer 1 when you print it or use it in an expression, but it's not the same as a constant for the integer value 1 and you shouldn't use it as one. FALSE casts to empty when you print it or to zero if you cast it to an integer or use it in an expression. Again, it's not a constant for empty or zero, and should not be used as such.
<?php
echo FALSE; // prints nothing - FALSE is empty
echo (FALSE); // prints nothing - FALSE is empty
echo FALSE+FALSE; // prints 0 - FALSE is cast to integer for addition
echo (FALSE+FALSE); // prints 0 - FALSE is cast to integer for addition
echo intval(FALSE); // prints 0 - FALSE is zero when explicitly cast
echo '"'.FALSE.'"'; // prints "" - FALSE is empty
echo TRUE; // prints 1
echo (TRUE); // prints 1
echo TRUE+TRUE; // prints 2
echo (TRUE+TRUE); // prints 2
echo intval(TRUE); // prints 1
echo '"'.TRUE.'"'; // prints "1"
Similarly, NULL is a special value indicating the absence of anything, not a defined constant equal to zero. NULL, zero and FALSE are all empty, but they are not equivalent.
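The distinction can be checked directly. This is only a sketch; the $candidates and $report names are invented for the example.

```php
<?php
$candidates=array('NULL'=>NULL,'zero'=>0,'FALSE'=>FALSE);
$report=array();
foreach ($candidates as $label=>$value)
{
    $report[$label]=array(
        'empty'   =>empty($value),      // TRUE for all three values
        'is NULL' =>($value===NULL),    // TRUE only for NULL
        'is zero' =>($value===0),       // TRUE only for the integer 0
        'is FALSE'=>($value===FALSE),   // TRUE only for FALSE
    );
}
var_export($report);
```

Each value passes the empty() test, but the identity comparisons show that none of them is the same as either of the others.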
Always use the || and && operators instead of the words OR and AND because the word operators have lower priority than assignment operators, which can lead to very unobvious logical errors. For example, what is the value of $z after this code sequence?
<?php
$x=TRUE;
$y=FALSE;
$z=$y OR $x;
ANSWER: $z is FALSE because the last statement is equivalent to ($z=$y) OR $x rather than $z=($y OR $x) as would naively be expected. On the other hand, after this code sequence:
<?php
$x=TRUE;
$y=FALSE;
$z=$y || $x;
$z is TRUE because the || operator has higher precedence than assignment operators.
When writing code meant to be shared across more than one application, global names (classes, functions, variables, defines) must be prefixed to prevent name collisions with PHP itself or other code. When selecting a prefix, pick one that is relevant to the code being developed and isn't likely to clash with PHP.
Other than in names of constants, or to specify class hierarchy, underscores should not be used within names. They should only be used as a prefix for private members of classes (variables or methods).
Caution: PHP reserves all function names starting with two underscores (__) as magical. Do not use names starting with two underscores unless you want some documented magic functionality.
Names of constants use all capital letters, with underscores separating the words in the name.
<?php
define('A_STRING_CONSTANT','Hello World!');
define('SOME_BOOLEAN',TRUE);
define('ZERO',0);
Global function names should be ProperCased (a.k.a. StudlyCaps): They start with an uppercase letter, and each new word begins with an uppercase letter. Acronyms are treated as normal words when used as a name: The first letter is capitalized, others are lower case.
Variables should be named concisely, using camelCase (also known as bumpyCase). Make names descriptive without being overly long. Don't create new variables by appending an integer to an existing variable name. Removing vowels from variable names may shorten them, but don't remove so many that the name becomes incomprehensible: don't use indecipherable abbreviations.
<?php
$aGlobalVariable=1;
$someThing=array();
function MyPublicFunction()
Classes should be given descriptive names. Avoid using abbreviations where possible. Class names should be ProperCased, starting with an uppercase letter, and each new word begins with an uppercase letter. The PEAR class hierarchy is also reflected in class names, where each level of the hierarchy is separated with a single underscore.
Class variables (a.k.a. properties) and methods should be named using camelCase. Private class methods and variables that are only accessed internally by your class, such as utility and helper functions that your public methods use for code abstraction, should have their names prefixed with a single underscore.
<?php
class Log {} // shows PEAR hierarchy
class Net_Finger {} // shows PEAR hierarchy
class HTML_Upload_Error {} // shows PEAR hierarchy
class AMoreCompleteExample
{
    public $counter; // public property
    function connect() {} // public method
    function getData() {} // public method
    function buildSomeWidget() {} // public method
    private $_status; // private property
    private function _sort() {} // private method
    private function _initTree() {} // private method
    function convertText() {} // public method
    private function _convertText() {} // private method
}
PHP requires class definitions to be contained within a single file. When writing classes designed for reuse beyond the page they were originally written for, each class should be in its own file whose name is the same as the class. Several closely related classes may be included within one file, e.g., a base class and subclasses directly derived from it. In such a case, the file should be named for the base class.
When several classes are defined specifically for use in a single page, they should be put in a file in an include directory under the one containing the page's script. The filename should then be the page's name with _classes appended, e.g., index.php would use classes found in include/index_classes.php.
Assignments in arrays may be aligned. When splitting array definitions onto several lines, the last value should also have a trailing comma. This is valid PHP syntax and helps to keep code diffs minimal.
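A minimal sketch of both points follows; the variable name and the keys are invented for the example.

```php
<?php
$pageSettings=array(
    'title'      =>'Faculty Directory',
    'showLeftNav'=>TRUE,
    'maxItems'   =>25,  // trailing comma: adding an entry later touches only one line
);
echo count($pageSettings); // prints 3
```

Because the last entry already ends with a comma, appending a fourth entry shows up in a diff as a single added line rather than one changed line plus one added line.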
Function declarations follow the Horstmann (preferred) or Allman style:
<?php
function FooFunction($arg1,$arg2='')
{ if (condition)
{ statement;
}
return $val;
}
Whenever appropriate, provide default values for function arguments to reduce the overhead required for calling the function. Do not use default values for required parameters to avoid having errors logged when they are not supplied: If a parameter is truly required to be supplied at run time, the logged error message will help debug the faulty code, rather than masking it through use of a default value.
As required by the PHP language specification, arguments with default values must follow ones that do not have default values. When deciding the order of arguments with default values, consider which one(s) will be changed most often: If a non-default value is needed for a function call, all of the default values to the left of the one being modified must be specified with their default value to preserve their default behaviour. Consequently, defaults that are most likely to be kept should be rightmost in the argument list, and the most often changed specified first (leftmost).
<?php
function MyFunction($required,$changed='often',$mostly='static')
{ return $required.' = '.$changed.' '.$mostly;
}
MyFunction(); // returns " = often static" (NULL used for $required)
// also "PHP Notice: Undefined variable: $required" is written to the log file
// (PHP 7.1 and later throw an ArgumentCountError here instead)
MyFunction(1); // returns "1 = often static"
MyFunction(2,'frequently'); // returns "2 = frequently static"
MyFunction(3,'sometimes'); // returns "3 = sometimes static"
MyFunction(4,'often','noise'); // returns "4 = often noise"
MyFunction(5,'screeching'); // returns "5 = screeching static"
MyFunction(6,'','screeching'); // returns "6 =  screeching" (two spaces)
Always return a meaningful value from a function if one is appropriate. If any code branches return a value from a function, all branches MUST return a value. N.B. The functions used as illustration here will not work as expected - see the notes re. booleans above in the TRUE, FALSE, and NULL section.
<?php
// INCORRECT:
function Broken($param)
{ // ...
if ($param == 'true')
return TRUE;
else if ($param == 123)
return $param;
// error: NULL is returned for any other $param values
}
// CORRECT:
function Fixed($param)
{ // ...
if ($param == 'true')
return TRUE;
else if ($param == 123)
return $param;
return 'error: invalid param value: '.strval($param);
}
Functions with many parameters may need to be split onto several lines to keep within the 80 characters per line limit. The first parameters may be put onto the same line as the function name if there is enough space. Subsequent parameters on following lines are to be indented two tab stops. The closing parenthesis immediately follows the last parameter. The opening brace follows on the next line, at the same indentation level as the "function" keyword.
<?php
function MyVeryLongFunctionName($firstRequiredParameter,$secondRequiredOne,
$thirdRequiredParameter,$firstOptionalOne=TRUE,$lastOptional=NULL)
{ // code starts here
// ...
}
Functions should be called with no spaces in the statement. For example:
<?php
$var=foo($bar,$baz,$quux);
As displayed above, there should be no spaces on either side of an equals sign used to assign the return value of a function to a variable. In the case of a block of related assignments, more space may be inserted left of the equals sign to promote readability:
<?php
$short =foo($bar);
$long_variable=foo($baz);
Some PHP functions return FALSE on failure and also have return values which evaluate to FALSE in loose comparisons, such as an empty string or zero. Be explicit by comparing the variable type when using these return values in conditionals to ensure the return value is indeed what you expect, and not a value that has an equivalent loose-type evaluation.
Use the same stringency in returning and checking your own variables. Use the === and !== comparison operators as necessary.
<?php
// INCORRECT:
/* If 'foo' is at the beginning of the string, strpos will return a 0,
* resulting in this conditional evaluating as TRUE
*/
if (strpos($str,'foo') == FALSE)
// CORRECT:
if (strpos($str,'foo') === FALSE)
// INCORRECT:
function build_string($str='')
{
if ($str=='') // uh-oh! What if FALSE or the integer 0 is passed as an argument?
{
}
}
// CORRECT:
function build_string($str='')
{
if ($str==='')
{
}
}
Typecasting has a slightly different effect which may be desirable. When casting a variable as a string, NULL and FALSE become empty strings, zero (and other numbers) become strings of digits, and TRUE becomes '1':
<?php
$str=(string) $str; // cast $str as a string
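The effect of the cast on each kind of value can be seen in a short sketch; the sample values and variable names here are chosen only for illustration.

```php
<?php
$samples=array(NULL,FALSE,TRUE,0,42,3.5);
$asStrings=array();
foreach ($samples as $sample)
    $asStrings[]=(string) $sample;   // explicit cast, as in the statement above
var_export($asStrings);
```

The resulting array holds '', '', '1', '0', '42' and '3.5', so after the cast a strict comparison against '' matches NULL and FALSE as well as an empty string, but not zero.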
SQL keywords are always capitalized: SELECT, INSERT, UPDATE, WHERE, AS, JOIN, ON, IN, etc.
Break up long queries into multiple lines for legibility, preferably breaking for each clause. Use string concatenation to allow alignment of the clauses in the source code without introducing extraneous whitespace into the SQL query.
<?php
// INCORRECT: // keywords are lowercase and query is too long for a
// single line (... indicates continuation of the line)
$query = $this->db->query("select foo, bar, baz, foofoo, foobar as raboof, foobaz from exp_pre_email_addresses
...where foo != 'oof' and baz != 'zab' order by foobaz limit 5, 100");
// CORRECT:
$query=$this->db->query( 'SELECT foo,bar,baz,foofoo,foobar AS raboof,foobaz '
.'FROM exp_pre_email_addresses '
."WHERE foo!='oof' AND baz!='zab' "
.'ORDER BY foobaz LIMIT 5,100');
When creating a new Web page, use a prototype to lay out the file structure and implement common elements of the presentation. Using a template helps to ensure the file format will be more readily grasped by new readers, and makes it easier to remember to include all of the necessary details.
Three prototype files are available in the /shared/inc directory that should be used when constructing any new pages:
/shared/inc/inc/leftnav.php
/shared/inc/inc/leftnav.php
These three prototypes all use PageFrame.php to build the overall HTML framework for the page. They are heavily commented with instructions for what needs to be placed where and how to customize them for a particular application. In the simplest case, setting the page title and adding some content is all that will be needed to create a new Web page. Additional code can be added as needed to create any level of complexity desired, yet still fit within the overall design framework of the MIT Sloan site.
Avoid heavy logic within presentational code (HTML). While some processing and logic often needs to be done amid the tag soup of HTML, avoid making in-page coding complex. One should not be doing more than basic foreach (), if (), and $obj->get*() within the presentation parts of the PHP document source.
When parts of an HTML page are [temporarily] disabled, they should be commented out using PHP rather than HTML comments: If the code is disabled by PHP, it will not be sent to the browser, reducing network traffic, and eliminating false "hits" by search engines.
In general, HTML comments should only be used to provide hidden information in the HTML source code that may be useful to debuggers and maintainers, such as the date the page was last edited.
<?php
define('LastModified','April 9, 2013 @ 5:29 pm');
/*
* Copyright 2013 by MIT Sloan School of Management. All rights reserved.
*
* $Id: /path/to/page.php,v $
*/ ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title><?php echo $pagetitle ?> - MIT Sloan</title>
</head>
<body>
<ul>
<?php // INCORRECT: // this gets sent to the browser ?>
<!--<li>November 1, 2012
<br/>
<a href="mapsearch=Wong+Auditorium">Wong Auditorium</a>, 12 Noon<br/>
Light lunch, 11:30am.</li>-->
<?php // CORRECT: // this stays on the server ?>
<?php /*
<li>November 1, 2012
<br/>
<a href="mapsearch=Wong+Auditorium">Wong Auditorium</a>, 12 Noon<br/>
Light lunch, 11:30am.</li>
*/ ?>
</ul>
<?php // CORRECT: maintenance documentation ?>
<!-- <?php echo LastModified ?> -->
</body>
</html>
<?php
//
// EOF: page.php
To enhance readability of functions and methods, it is wise to return early if simple conditions apply that can be checked at the beginning of a method: It's better to return early, keeping indentation and the brain power needed to follow the code lower.
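The difference can be sketched with a made-up shipping calculation. The function names, the array layout and the prices are all assumptions for illustration, not code from the site.

```php
<?php
// Nested version: the main calculation ends up two levels deep.
function ShippingCostNested($order)
{
    if (is_array($order))
    {
        if (!empty($order['items']))
            $cost=5+2*count($order['items']);
        else
            $cost=0;
    }
    else
        $cost=FALSE;
    return $cost;
}

// Early-return version: the simple cases leave at the top of the function.
function ShippingCost($order)
{
    if (!is_array($order))
        return FALSE;               // not an order at all
    if (empty($order['items']))
        return 0;                   // nothing to ship
    return 5+2*count($order['items']);
}

echo ShippingCost(array('items'=>array('book','pen'))); // prints 9
```

Both functions return the same values, but in the second one the main calculation sits at the outermost indentation level and every branch still returns a value.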
Long if statements may be split onto several lines when the limit on the number of characters per line would be exceeded. The condition clauses are moved to following line(s), indented one tab. Logical operators (&&, ||, etc.) should be aligned under the opening clause to make it easier to comment (and exclude) the condition. When splitting statements this way, the closing parenthesis may be on its own line, positioned under the opening parenthesis. The opening brace for the conditional code goes on the next line, aligned with the start of the statement.
Keeping the operators at the beginning of the line has two advantages: It is trivial to comment out a particular line during development while keeping syntactically correct code (except for the first line). It also keeps the logic at the front of the code where it's more readily observed. Scanning such conditions is very easy since they are aligned below each other.
<?php
if (( $condition1
|| $condition2
)
&& $condition3
&& $condition4
)
{
//code here
}
The first condition may be aligned to the others.
<?php
if ($condition1
|| $condition2
|| $condition3) // closing parenthesis on the last clause's line is OK
{
//code here
}
When an if statement is really long enough to be split, it might be better to simplify it. In such cases, you could express conditions as variables and compare them in the if statement. This names the condition sets and splits them into smaller, more understandable chunks. The disadvantage in doing so is the increased processing overhead needed to create the additional variables, which should be avoided within loops.
<?php
$is_foo=($condition1 || $condition2);
$is_bar=($condition3 && $condition4);
if ($is_foo && $is_bar)
{
// ....
}
The same rule as for if statements also applies for the ternary operator: It may be split onto several lines, keeping the question mark and the colon at the front.
<?php
$a = $condition1 && $condition2
? $foo : $bar;
$b = $condition3 && $condition4
? $foo_man_this_is_too_long_what_should_i_do
: $bar;
The style guide permits a maximum line length of 80 characters. When calling functions or methods with many parameters it may be impossible to respect the line limit. In that case, splitting the function calls between parameters is needed.
Several parameters per line are allowed, filling the line as much as possible. Subsequent parameter lines need to be indented one tab compared to the level of the function call. If there is room for one or more parameters on the function call line, the opening parenthesis is between the function name and parameter name as usual. The closing parenthesis then immediately follows the last parameter:
<?php
$this->someObject->subObject->callThisFunctionWithALongName($parameterOne,
$parameterTwo,$aVeryLongParameterThree);
If the function call and the first parameter will not fit on the same line, the opening parenthesis is aligned with the function call on the next line, followed by a tab and the first parameter. Subsequent parameters can be used to fill the rest of the line, or can be specified on following lines, indented to fall under the first parameter. The closing parenthesis then goes on a line after the last parameter, aligned with the opening parenthesis.
The same applies not only for parameter variables, but also for nested function calls and for arrays.
<?php
$this->someObject->subObject->callThisFunctionWithALongName
( $this->someOtherFunc
( $this->someEvenOtherFunc
( 'Help me!',
array('foo' =>'bar',
'spam'=>'eggs'),
23
),
$this->someEvenOtherFunc()
),
$this->wowowowowow(12)
);
Nesting those function parameters is allowed if it helps to make the code more readable, not only when it is necessary when the characters per line limit is reached.
Using fluent application programming interfaces often leads to many concatenated function calls. Those calls may be split onto several lines. When doing this, all subsequent lines are indented by one tab and begin with the -> arrow.
<?php
$someObject->someFunction('some','parameter')
->someOtherFunc(23,42)
->andAThirdFunction();
Assignments may be split onto several lines when the character/line limit would be exceeded. The equal sign has to be positioned on the following line, and indented by one tab.
<?php
$GLOBALS['TSFE']->additionalHeaderData[$this->strApplicationName]
= $this->xajax->getJavascript(t3lib_extMgm::siteRelPath('nr_xajax'));
PHP is used as an interpreted language, rather than a compiled one. Each time a page loads, the source file(s) has/have to be read from disk, parsed, reduced to bytecodes, and interpreted by the Zend engine. Modern operating systems cache disk accesses, and using a bytecode cache with enough memory allocated to its buffers can nearly eliminate the necessity of parsing source files once active development is finished. Between the disk and bytecode caches, PHP code can be just as fast as, or even faster than, compiled languages which run on a virtual machine, such as Java. There are no optimizations, however, which can make up for sloppy, inefficient programming: It's up to you, the developer, to be wary of practices that lead to slower page loads and higher processing requirements.
You should always use echo rather than print because it is more flexible and doesn't have the overhead of returning a value. (print always returns 1, so its return value isn't terribly useful in the first place.)
Don't call echo repeatedly; use string concatenation instead to eliminate the overhead of multiple calls: Neither echo nor print automatically emits a newline at the end of the string(s) it is passed, so there is absolutely no advantage to having multiple calls in a row.
<?php
// INCORRECT: // prints "thisisreallybad!"
echo 'this'; // no whitespace in any of these statements
echo 'is';
echo 'really';
echo 'bad!';
// CORRECT: // prints "This is much cleaner."
echo 'This ' // spaces embedded in each source string
.'is '
.'much '
.'cleaner.';
Create variables used to store "constant" values (ones which do not change with each iteration) before beginning loops. Otherwise, the variable will be created and destroyed during each loop iteration, which can be a very expensive bit of overhead.
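A small sketch of the idea, using a made-up price list and tax rate (both are assumptions for the example):

```php
<?php
$prices=array(10.00,24.50,3.25);   // hypothetical sample data

// INCORRECT: $taxRate is rebuilt on every pass through the loop
$totalSlow=0;
foreach ($prices as $price)
{
    $taxRate=1+(8.25/100);
    $totalSlow+=$price*$taxRate;
}

// CORRECT: the unchanging value is created once, before the loop starts
$taxRate=1+(8.25/100);
$total=0;
foreach ($prices as $price)
    $total+=$price*$taxRate;

echo number_format($total,2); // prints 40.86
```

Both loops produce the same total; the second simply does the unchanging part of the work once instead of once per iteration.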
Object oriented programming seeks to deal with systems and data as objects that interact with each other by passing messages through interfaces. Conceptually it's an attractive programming model because the boundaries between objects are well defined, making it easier to keep track of who owns what and how things can be manipulated. With that surface simplicity, however, comes the burden of increased processing overhead, and, behind the scenes, a much more complex system to support the programming model.
While lean classes can be constructed that provide only the necessary methods for the data they are encapsulating, many classes are built with all sorts of bells and whistles "in case" somebody needs them. There are even style guides suggesting that public properties should not be used and class data members have to be read and written using getters and setters. One problem with writing Web code that way is every time a Web page is loaded, the entire class [file] has to be read and parsed, even if only a small part of the class functionality is being used. On a busy Web server, the extra overhead can add up quickly. Another problem, caused by only having one class per file, is pages that need functionality out of lots of classes will need to open and parse lots of files, which adds to congestion on the server's disk access bus.
Procedural code, on the other hand, requires a more intimate understanding of the data and logic for an application, resulting in an apparently more complex system design. The resulting code is more easily tuned for performance, though, and can be pared down to the minimum required to do the job without a major effort. As a result, pages load faster with smaller server requirements, and they don't have to carry around unused code the way OOP classes often do.
There are many cases where PHP classes are the most logical way to deal with a data set: Having an array of class objects representing data records makes it easy to present the data uniformly - a foreach loop can iterate over the array, calling the same method for each object to write the data to the browser. That makes the body of the loop a single statement, rather than a mass of mingled PHP and HTML; the markup is more easily understood in a class method where the related variables are close at hand for reference. A balance can be found between design and implementation simplicity, but for the best performance, keep in mind that smaller files require less processing power, and will have lower communication overhead, even if the reduction is only internal to the server.
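The pattern described above might look something like the following sketch. The NewsItem class, its properties and the sample records are hypothetical, invented only to show the single-statement loop body.

```php
<?php
/**
 * one record in a hypothetical news list
 */
class NewsItem
{
    public $title;
    public $date;

    function __construct($title,$date)
    {
        $this->title=$title;
        $this->date=$date;
    }

    /**
     * writes this record to the browser as a list item
     */
    function show()
    {
        echo '<li>'.htmlspecialchars($this->date).': '
            .htmlspecialchars($this->title)."</li>\n";
    }
}

$news=array(
    new NewsItem('Spring seminar announced','April 9, 2013'),
    new NewsItem('Library hours extended','April 2, 2013'),
);

echo "<ul>\n";
foreach ($news as $item)
    $item->show();      // the loop body is a single statement
echo "</ul>\n";
```

All of the markup lives in show(), next to the properties it uses, so changing how a record is displayed means editing one method.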
Duplicated code is A REALLY BAD THING™: Multiple copies of the same code require (approaching exponentially) more effort to debug or update, duplicated code increases the size of files, having repeated copies of the same thing (with the names changed or not) makes the code harder to comprehend, and when changes are introduced, verifying the correctness of the code set becomes "rather" difficult.
Copy-paste-edit development is the cause of most code duplication, often the result of an "add another element to my list" request: An existing list item is copied, pasted in above or below the original, then modified with the different information for the new item. While this procedure will add one item with a minimum of immediate effort, it makes changing the presentation of the data in the list effectively impossible unless an unreasonable amount of effort is expended. In addition, if editing accidentally removes characters from HTML tags - or even removes the tags altogether - fixing the resulting problems will most likely take a significant amount of time.
Avoid copy-paste-edit development as though it were poison ivy! If you need to duplicate some existing HTML code, look at the similarities and differences between the two instances. Write a PHP function (use the language to do what it was designed for) to emit the common elements, and pass parameters to control introduction of the differences. In all but the most trivial cases, the result will be a smaller file that is easier to maintain. Smaller files require less processing and often have lower communication overhead.
While the code in the left column of the illustration below appears to be about the same size as the code in the right column, there are a couple of problems. For example, if the class attribute needed to be changed from headshot to bio-thumb, it would have to be corrected in three places in this subset example vs. once in the right column.
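As a separate, simplified sketch of the same principle, a function can emit the common markup so that the class attribute is written exactly once. The function name, the array keys and the sample people are all made up for this example.

```php
<?php
/**
 * returns the markup for one entry in a staff list
 *
 * @param string $name, the person's display name
 * @param string $photo, URL of the person's photo
 * @return string, the list item for the entry
 */
function StaffListItem($name,$photo)
{
    return '<li><img class="headshot" src="'.htmlspecialchars($photo)
        .'" alt="" /> '.htmlspecialchars($name)."</li>\n";
}

// Hypothetical sample data; in practice it would come from the data source.
$staff=array(
    array('name'=>'Pat Example',  'photo'=>'/images/pat.jpg'),
    array('name'=>'Lee Sample',   'photo'=>'/images/lee.jpg'),
    array('name'=>'Sam Specimen', 'photo'=>'/images/sam.jpg'),
);

$html='';
foreach ($staff as $person)
    $html.=StaffListItem($person['name'],$person['photo']);
echo "<ul>\n".$html."</ul>\n";
```

Changing headshot to bio-thumb now means editing one line in StaffListItem(), no matter how many people are on the list.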
When building a list such as the one illustrated above, consider the question "what's the data source?" If the information is coming from a database query or an XML feed, the display function can probably be written such that it's fed a record directly from the data source, and building the list becomes nothing more than a foreach that repeatedly calls the function, passing the data records in succession:
<?php
require_once $_SERVER['DOCUMENT_ROOT'].'/path/to/data-source.php';
$Faculty=GetFacultyInGroup($groupName);
/**
* displays the faculty member's name as a link to their profile
*
* @param array $person, database record with info about the faculty member
*/
function ShowFaculty($person)
{ ?>
<li><a href="/faculty/profile.php?id=<?php
echo $person['PERSONID']?>"><?php
echo $person['FULLNAME']?></a></li>
<?php
}
?>
<h3>Our Staff</h3>
<ul>
<?php foreach ($Faculty as $person) ShowFaculty($person) ?>
</ul>
Don't reinvent the wheel. In addition to the extensive library of functions built into PHP, a significant number of routines have been written by other members of the team. There are files of functions that can be used in the /shared/inc directory, and other places on the server. As you find or write other code that is useful in more than just your current project, encapsulate it in a well-documented function (or class, if it's a more complex set of data and operations) and either add it to one of the existing files containing similar functionality, or create a new one with a name describing the type of code to be found inside (almost always required when writing a new class). Building and using such a code library not only reduces the effort to build new applications, but it makes debugging and maintaining the entire code base much more efficient: Rather than having to track down an ill-defined set of copies of similar code if an error is detected, having a common code base means one fix can update a host of applications with the correction.
Always remember the cardinal rule of network security: Data coming from userland cannot be trusted. Even if you build a form that limits line lengths and data values, and uses JavaScript to ensure only valid data can get to the server when the submit button is pressed, there's nothing to stop Joe Hacker from creating a form that connects to your script and sends you a load of bull. Even if it's not Joe, it could be a glitch on the network, or any of a host of other problems that could corrupt the data - it can't be trusted. As a result, YOU MUST ALWAYS VALIDATE DATA ON THE SERVER before using it or passing it to a database (or any other trusting application) if it came from userland - form submittals, email messages, tweets, etc.
There are two basic types of data errors that need to be protected against - invalid data, and incorrect data.
Invalid data includes things such as strings that are longer than the maxlength attribute on an <input> field allows, or values that cannot be chosen from a <select> list. A successful submission of a valid form will not result in receipt of invalid data, so if it is detected, all of the data received must be discarded, and displaying nothing more than a terse Invalid data message is an appropriate response. (Don't be too harsh, though - the invalid data could be the result of a network error.)
Incorrect data, on the other hand, is a common occurrence: Users type their passwords with the CapsLock key on, or enter an email address in a telephone number field, etc. In such cases, the job of the validation code is to detect as many of these types of errors as possible and cause an appropriate error message to be displayed so the user can intelligently correct the problem. When incorrect data is detected, be nice - the error message is supposed to help the user, not belittle them. Don't spend forever trying to make a foolproof system, though, because fools are too ingenious. Besides, only a fool will [be likely to] use a foolproof system...
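One way the two kinds of checks might be separated on the server is sketched below. The form fields, the 40 character limit and the department list are assumptions made up for the example; only filter_var() and the other built-in functions are real PHP.

```php
<?php
/**
 * checks a hypothetical contact form submission
 *
 * @param array $input, the submitted fields (normally $_POST)
 * @return mixed, FALSE for invalid data, otherwise an array of messages
 *         describing incorrect data (empty if everything is acceptable)
 */
function CheckContactForm($input)
{
    $departments=array('finance','marketing','operations'); // the <select> options
    $name=isset($input['name']) ? $input['name'] : '';
    $dept=isset($input['dept']) ? $input['dept'] : '';
    $email=isset($input['email']) ? $input['email'] : '';

    // Invalid data: a valid form could not have sent this, so discard everything.
    if (!is_string($name)
        || !is_string($dept)
        || !is_string($email)
        || strlen($name)>40                     // maxlength="40" on the form
        || !in_array($dept,$departments,TRUE))
        return FALSE;

    // Incorrect data: honest mistakes the user can fix, so explain them politely.
    $errors=array();
    if (trim($name)==='')
        $errors[]='Please enter your name.';
    if (filter_var($email,FILTER_VALIDATE_EMAIL)===FALSE)
        $errors[]='Please check the email address; it does not look complete.';
    return $errors;
}

$result=CheckContactForm(array('name'=>'Ann','dept'=>'finance','email'=>'555-1212'));
echo ($result===FALSE) ? 'Invalid data' : implode("\n",$result);
```

The caller compares the result with === FALSE, because an empty array (no errors) would also evaluate to FALSE in a loose comparison.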
PHP has grown to be an extremely powerful language. When combined with proper server technology (e.g., a bytecode cache), it is also a very efficient one. Use its power, code well, and you can easily build systems that rival anything constructed using proprietary or otherwise closed-source development tools.