Well-engineered Web pages should not contain non-essential header data (e.g., between the <head> and </head> HTML tags). All header data shall be consciously included by the Web page developer(s), and shall be of direct value in meeting the information or service objectives for the target-user community.
Document type declaration
Well-engineered Web pages shall have initial lines as typically provided by the server for static Web pages, but which may be required for dynamically generated Web pages. The document type declaration indicates the DTD applicable for this page. XHTML pages should have the initial declaration, and for HTML consistency may need to include both HTML and XHTML head elements.
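As an illustrative sketch only, the presence of a document type declaration could be checked with the Python standard library. The class and function names (`DoctypeFinder`, `find_doctype`) are hypothetical, and the sample page is an assumption made for the example, not text from this recommended practice.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Records the first <!DOCTYPE ...> declaration seen in a page."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE HTML PUBLIC ...'
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def find_doctype(html_text):
    """Return the document type declaration text, or None if absent."""
    finder = DoctypeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.doctype


# Hypothetical sample page used only for illustration.
SAMPLE_PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Sample</title></head><body></body></html>"
)
```

A page generator could call `find_doctype` on its output and report pages for which it returns `None`.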
The page title shall include a useful and distinctive indication of the contents. The HTML title should be chosen carefully, considering its role in search engine indexing, in query responses, in the window title bar, and in bookmark labels. If structured consistently across pages, it may also improve the orientation of the user in the site.
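One way to keep titles consistently structured is to assemble them from fixed parts. The sketch below assumes a hypothetical convention of "page topic, then section, then site name"; the function name `build_title`, the separator, and the ordering are illustrative choices, not requirements of this recommended practice.

```python
def build_title(page_topic, section, site_name, separator=" - "):
    """Join the non-empty title parts, most specific part first."""
    parts = [
        part.strip()
        for part in (page_topic, section, site_name)
        if part and part.strip()
    ]
    return separator.join(parts)


# Example with hypothetical values.
example_title = build_title("Header Data", "Page Engineering", "Example Site")
```

Placing the most specific part first keeps the distinctive text visible when a bookmark label or title bar is truncated.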
Well-engineered Web pages shall incorporate appropriate metadata to provide for accurate cataloguing and indexing of pages for the environment in which the pages are accessible. Well-engineered Web pages shall not provide duplicate data to search engines or indexing systems, other than divergent spellings or grammatical forms. Header tags should include data needed for page processing (link, style, script) or page indexing (title, meta/keywords, meta/description, PICS, and Dublin Core items). Where more than four metatags are included, a link to profiles should be used. Links to style sheets and script files should also be used to facilitate reuse and to reduce network overhead.
The DESCRIPTION metatag may be used to provide guidance to search engines on what to present to users in the search response. Search engines often display the first few lines of a Web page to help searchers identify the sites they want. Some engines display the META tag DESCRIPTION attribute instead. This display can persist long after the actual Web page has been deleted.
Therefore, if you want specific information to be visible, early page placement can help. If you do not want information to be visible, then avoid early page placement (note that, for various reasons, search engines may be displaying pages that you did not intend to have publicly available). Finally, to ensure that old information is not presented by search engines, it may be necessary to replace the page with a “no longer available” message page for an extended period of time, so that search engines replace the earlier data (resubmission may also be useful).
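The behavior described above can be approximated to preview what a search response might show for a page. This is a minimal sketch under stated assumptions: the names `SnippetParser` and `likely_snippet` are hypothetical, the 150-character limit is arbitrary, and real search engines apply their own, undocumented rules.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collects the description metatag and the visible body text."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.body_text = []
        self._in_body = False
        self._skip_depth = 0  # greater than zero while inside script/style

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth and data.strip():
            self.body_text.append(data.strip())


def likely_snippet(html_text, max_chars=150):
    """Prefer the description metatag; otherwise use the earliest body text."""
    parser = SnippetParser()
    parser.feed(html_text)
    parser.close()
    if parser.description:
        return parser.description[:max_chars]
    return " ".join(parser.body_text)[:max_chars]
```

Running such a preview before publication shows which early page text would be exposed if no description is supplied.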
Search engines should be expected to consider only a limited number of keywords when indexing pages. Well-engineered Web pages shall present keywords in priority order and without duplication.
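The ordering and de-duplication requirement can be applied mechanically when the keywords metatag is generated. The following sketch assumes the caller supplies keywords already in priority order; the function name `keywords_meta` and the default limit of ten keywords are hypothetical, since the number actually considered by a given search engine is not specified here.

```python
from html import escape


def keywords_meta(keywords, limit=10):
    """Build a keywords metatag, keeping priority order and dropping repeats."""
    seen = set()
    ordered = []
    for keyword in keywords:
        normalized = keyword.strip().lower()  # compare case-insensitively
        if normalized and normalized not in seen:
            seen.add(normalized)
            ordered.append(keyword.strip())
    content = ", ".join(ordered[:limit])
    return '<meta name="keywords" content="{}">'.format(escape(content, quote=True))
```

Note that this sketch treats only case variants as duplicates; divergent spellings and grammatical forms are retained.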
The Dublin Core DTD was developed by the library sciences community, but may be applicable to general-purpose indexing of well-engineered Web pages. The Dublin Core Metadata (see Annex D for a recent version) shall be used for fields of information that are of value in indexing or cataloguing the well-engineered Web page.
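Dublin Core fields are commonly carried in HTML as metatags whose names use a `DC.` prefix, with a `link` element identifying the element set. The sketch below assumes that encoding convention and the Dublin Core 1.1 namespace URL; Annex D is not reproduced here, so the function name `dublin_core_meta` and the sample field names should be treated as illustrative.

```python
from html import escape


def dublin_core_meta(fields):
    """Render a mapping of Dublin Core element names to head-section markup."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in fields.items():
        lines.append(
            '<meta name="DC.{}" content="{}">'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


# Hypothetical field values for illustration.
example_head = dublin_core_meta({"title": "Header data", "creator": "A & B"})
```

Only fields of real indexing value should be passed in, consistent with the guidance against non-essential header data.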
Well-engineered Web site design shall include consideration of content-selection mechanisms. Within the context of Intranets/Extranets, PICS rating services and mechanisms may be useful to ensure that users are accessing the preferred information sources. For example, an index search within an organization for information about a corporate policy may yield pages with opinions, local implementations, or other variations. A rating system within an organization may distinguish between “corporate” policy data, legal requirements, and other guidelines.
The PICS mechanism could then be used to provide users with a view of the data that is relevant to their environment, rather than forcing them to locate the relevant views from a much wider set of responses. The use of metadata and content included for the purpose of content selection (indexing) shall not be misleading. Emerging tools, such as XML and RDF, will provide additional mechanisms for content selection that should be considered in the future.
Well-engineered Web sites shall incorporate robot exclusion elements (see Annex E) as the method for indicating pages to be indexed or searched by automated means and those to be excluded.
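As a sketch of how exclusion rules can be verified before deployment, the Python standard library includes a parser for robots.txt files. This assumes the exclusion elements are expressed as a robots.txt file (Annex E is not reproduced here); the rule set, the paths, and the function name `is_indexable` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion rules for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /internal/
"""


def is_indexable(url, user_agent="ExampleBot"):
    """Return True if the rules above allow the given robot to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

A site check could run `is_indexable` over every published path and confirm that the result matches the intended indexing policy.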
The first bytes transferred have the most impact on network overhead. Transmission Control Protocol (TCP) operates with a “slow start,” awaiting an acknowledgment of the initial packets sent before initiating a full sequence of transmissions. This avoids congesting the network with traffic directed to a nonresponsive site. It also makes the data transferred first from the server, and the initial elements of the page, more critical to response time and network loading. Data early in the sequence should be focused to minimize overhead and to provide essential data to the client. Unfortunately, the HTML format calls for all metadata to be in the head section (see Nielsen, et al. [B55] and W3C Web HTTP Performance Overview [B63] for more details on bandwidth impact).
Tags expected in the head section of a well-engineered Web page with minimal overhead would include ‘title’, ‘link’ (to style sheets), ‘meta’ (as designated in Dublin Core, plus ‘keywords’, ‘description’, or ‘http-equiv’), ‘base’, ‘script’, and ‘object’. Where extended sets of metadata, styles, or scripts are included, the ‘link’ element should be used to reduce in-page overhead. Relevant information about the metadata should be indicated with the ‘profile’ attribute of the ‘head’ tag.
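The effect of slow start on early page data can be illustrated with a rough calculation. The sketch below is an idealized model, not a description of any particular TCP implementation: it assumes a 1460-byte segment size, an initial window of one segment, a window that doubles every round trip, and no packet loss or receiver limits. The function name `slow_start_round_trips` is hypothetical.

```python
import math


def slow_start_round_trips(total_bytes, mss=1460, initial_window=1):
    """Estimate round trips needed to deliver total_bytes under ideal slow start."""
    segments_left = math.ceil(total_bytes / mss)
    window = initial_window  # segments that may be sent in this round trip
    round_trips = 0
    while segments_left > 0:
        segments_left -= window
        window *= 2  # idealized doubling after each acknowledged round
        round_trips += 1
    return round_trips
```

Under these assumptions, a 1000-byte head section fits in a single round trip, while 10 000 bytes of head data need three round trips before any body content arrives, which is why extended metadata is better moved behind a ‘link’ element.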
Human language specification
To facilitate accurate indexing and ease of access for users, well-engineered Web pages shall include the LANG metatag declaring the primary language environment(s) for each page.
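A language declaration can be checked automatically. The sketch below assumes that either a `lang` attribute on the `html` element or a `Content-Language` metatag using `http-equiv` counts as the declaration; that interpretation, and the names `LanguageFinder` and `declared_languages`, are assumptions made for the example.

```python
from html.parser import HTMLParser


class LanguageFinder(HTMLParser):
    """Collects language codes declared for the page as a whole."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.languages.append(attrs["lang"])
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            content = attrs.get("content") or ""
            # A Content-Language value may list several comma-separated codes.
            self.languages.extend(
                code.strip() for code in content.split(",") if code.strip()
            )


def declared_languages(html_text):
    """Return the language codes declared by the page, in document order."""
    finder = LanguageFinder()
    finder.feed(html_text)
    finder.close()
    return finder.languages
```

An empty result indicates a page that declares no primary language environment.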
Digital signature and other fingerprinting mechanisms should be applied, when appropriate, to ensure page integrity and authentication. This would be appropriate when it is necessary to assure that the material presented has not been changed, as when posting price data or other data that must be secured for legal or business reasons. Information related to this may be communicated through header extensions or related files, or it may be implicit in the content body. Re-signing pages may be problematic, so extra care should be given to ensure the immutability of the data (including links, etc.) within the signed area (see 4.2.8). (Testing for this is not easily automated.)
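A simple fingerprint can be sketched with a cryptographic hash of the page bytes. This example covers only change detection: a bare hash does not authenticate the publisher, which would require a keyed or public-key signature scheme. The choice of SHA-256 and the function names `fingerprint` and `is_unchanged` are assumptions made for illustration.

```python
import hashlib
import hmac


def fingerprint(page_bytes):
    """Return a hexadecimal SHA-256 digest of the exact page bytes."""
    return hashlib.sha256(page_bytes).hexdigest()


def is_unchanged(page_bytes, published_fingerprint):
    """Compare the current page against a previously published fingerprint."""
    return hmac.compare_digest(fingerprint(page_bytes), published_fingerprint)
```

Because any byte change alters the digest, including an edited link inside the fingerprinted area, the covered content has to remain immutable after the fingerprint is published.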