Byte Order Mark (BOM) and PHP

Posted on Feb 22, 2010

The Byte Order Mark is a special character (U+FEFF) placed at the start of a text file to indicate its endianness (byte order). For example, if we have a text file encoded as UTF-16, a program opening it needs to know which encoding was used (UTF-8, UTF-16, UTF-32 or some other encoding) and, for the 16- and 32-bit encodings, in which order the bytes are stored. The BOM tells it both.

Many programs add a BOM by default when saving a file in one of these encodings; others let us choose whether to include it. While this may seem irrelevant to many web developers, the BOM can give us web developers quite a headache.
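Because the BOM is the character U+FEFF written in the file's own encoding, each encoding produces a different byte signature. As a minimal sketch (the helper name detect_bom() is hypothetical, not a PHP built-in), a program can compare a file's first bytes against those signatures:

```php
<?php
// Sketch: identify which BOM, if any, starts a string of raw bytes.
// detect_bom() is a hypothetical helper, not part of PHP.
function detect_bom(string $bytes): ?string
{
    // Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
    // UTF-16 LE BOM (FF FE), so the longer signatures are checked first.
    $signatures = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($signatures as $encoding => $bom) {
        // strncmp() is binary-safe, so NUL bytes compare correctly
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null; // no BOM: the encoding has to be known some other way
}

var_dump(detect_bom("\xEF\xBB\xBF<?php echo 'hi';")); // string(5) "UTF-8"
var_dump(detect_bom("\xFF\xFEa\x00"));                // string(8) "UTF-16LE"
var_dump(detect_bom("<?php echo 'hi';"));            // NULL
```

Checking UTF-32 LE first has one trade-off: a UTF-16 LE file whose first character is U+0000 would be misreported, which is rare for real text.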

Imagine a PHP file in which, at some point in the code, we have to set HTTP headers or use a session. Both alter the header information at runtime, so they must happen before any HTML output is sent. Once output starts, PHP sends the headers along with it (unless output buffering is enabled), and any later attempt to change them fails with a warning such as "Cannot modify header information - headers already sent by (output started at somefile.php:34)". A BOM, if present, is the first character in our file, located before the opening "<?php" tag. PHP therefore treats it as ordinary output and sends it, and the headers with it, before our code gets a chance to run. The same happens if a file we include starts with a BOM.

A file with a BOM is included and causes the headers to be sent
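To see this without a web server, the following sketch writes a throwaway script that starts with a UTF-8 BOM to a temporary file, includes it, and captures whatever it prints using output buffering. It assumes PHP's default settings (in particular zend.multibyte left off) and a writable temporary directory.

```php
<?php
// Sketch: show that a UTF-8 BOM in front of "<?php" is sent as output.
// Write a tiny script to a temp file, include it, capture what it prints.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, "\xEF\xBB\xBF<?php echo 'Hello';");

ob_start();
include $path;               // the BOM sits outside the PHP tags: plain output
$output = ob_get_clean();
unlink($path);

echo bin2hex($output), "\n"; // efbbbf48656c6c6f: three BOM bytes, then "Hello"
```

Here output buffering catches the three extra bytes; in a normal request without buffering, they are the first thing that reaches the browser and they make a later header() or session_start() call fail.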


Another problem arises when the HTML we send back to the client contains BOM characters somewhere inside the page. Browsers only treat a BOM as a byte order mark at the very start of a document; anywhere else it is just another character. It has no meaningful visual representation for us, yet it can still mess up our design: it behaves much like a space character and can push our HTML elements around. This happens whenever we include an external file in the middle of our page, because the included file's BOM comes along with its contents. Here is an example:

included.html:
<div style="width: 50px; height: 50px; border: 1px solid #000000;">Div2</div>


Example.php:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=ENCODING" /></head>
<body>
<div style="width: 50px; height: 50px; border: 1px solid #000000;"></div>
<?php include("included.html"); ?>
</body></html>
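The real fix is to save included.html without a BOM. Where that is not possible, one workaround, sketched below with a hypothetical helper include_html_without_bom(), is to read the fragment and skip a leading UTF-8 BOM before echoing it. This only suits static HTML: the file's contents are echoed, not executed as PHP.

```php
<?php
// Sketch: echo a static HTML fragment, dropping a leading UTF-8 BOM.
// include_html_without_bom() is a hypothetical helper, not part of PHP.
function include_html_without_bom(string $path): void
{
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (strncmp($html, "\xEF\xBB\xBF", 3) === 0) {
        $html = substr($html, 3); // skip the 3-byte UTF-8 BOM
    }
    echo $html;
}
```

In Example.php, the line <?php include_html_without_bom("included.html"); ?> would then take the place of the include() call.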

If included.html starts with a BOM, we get something like the image below. Different page encodings display the BOM character differently, but in both cases it causes an unwanted gap between the two boxes.

The BOM character under different encodings.
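Why it looks different: a UTF-8 BOM is the three bytes EF BB BF. A page declared as UTF-8 decodes them as the single invisible character U+FEFF, while a page declared as ISO-8859-1 maps each byte to a separate visible character. The sketch below performs that second decoding with a hypothetical hand-written helper, latin1_to_utf8(), so that it does not depend on the mbstring extension.

```php
<?php
// Sketch: decode bytes as ISO-8859-1 and re-encode the result as UTF-8,
// to show what a Latin-1 page makes of a UTF-8 BOM.
// latin1_to_utf8() is a hypothetical helper, not part of PHP.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte); // in ISO-8859-1, byte value == code point
        $out .= $code < 0x80
            ? $byte // ASCII is identical in both encodings
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    return $out;
}

echo latin1_to_utf8("\xEF\xBB\xBF"), "\n"; // ï»¿
```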

Notepad++ is a lightweight code editor for Windows that supports many languages. In this editor we can select a file's encoding and choose whether the BOM character should be present.

Removing the BOM character using Notepad++


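In the menu bar, click Format (called Encoding in later versions of Notepad++). Here you can select the encoding you wish to use and whether the file should start with a BOM. The Big Endian and Little Endian options are the two byte orders of 16-bit Unicode (UCS-2/UTF-16), and each is saved with its own BOM: FE FF for big-endian, FF FE for little-endian.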
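An editor fixes one file at a time. For a whole project, a short script can do the same for UTF-8 files. The sketch below uses a hypothetical helper, strip_utf8_boms(), that walks a directory, rewrites every .php and .html file starting with a UTF-8 BOM, and returns the paths it changed. It rewrites files in place, so keep a backup, and it deliberately leaves UTF-16/32 files alone, since those need re-encoding rather than just dropping the BOM.

```php
<?php
// Sketch: strip UTF-8 BOMs from matching files under a directory.
// strip_utf8_boms() is a hypothetical helper; it rewrites files in place.
function strip_utf8_boms(string $dir, array $extensions = ['php', 'html']): array
{
    $fixed = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if (!$file->isFile()
            || !in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue; // skip directories and non-matching extensions
        }
        $path = $file->getPathname();
        $contents = file_get_contents($path);
        // Only the UTF-8 BOM (EF BB BF) is removed
        if ($contents !== false && strncmp($contents, "\xEF\xBB\xBF", 3) === 0) {
            file_put_contents($path, substr($contents, 3));
            $fixed[] = $path;
        }
    }
    return $fixed;
}
```

For example, print_r(strip_utf8_boms('/path/to/site')); lists the files it changed, where the path is only a placeholder.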
