Refining your searches with zones and fields

One of the strengths of Verity is its ability to perform full-text searches on documents of many formats. However, you often want to restrict a search to certain portions of a document to improve search relevance. For example, if a Verity collection contains some documents about baseball and other documents about caves, a search for the word bat might retrieve several irrelevant results.

If the documents are structured documents, you can take advantage of the ability to search zones and fields. The following are some examples of structured documents:

Note:   Although your word processor might open with what appears to be a blank page, the document has many regions such as title, subject, and author. Refer to your application's documentation or online help system for how to view a document's properties.

Zone searches

You can perform zone searches on markup language documents. The Verity zone filter includes built-in support for HTML and several file formats; for a list of supported file formats, see "Building a Search Interface". Verity searches XML files by treating the XML tags as zones. When you use the zone filter, the Verity engine builds zone information into the collection's full-word index. This index, enhanced with zone information, permits quick and efficient searches over zones. The zone filter can automatically define a zone, or you can define it yourself in the style.zon file. You can use zone searching to limit your search to a particular zone. This can produce more accurate, but not necessarily faster, search results than searching an entire file.

Note:   The contents of a zone cannot be returned in the results list of an application.

Examples

The following examples perform zone searching on XML files. In a list of rock bands, you could have XML files with tags for the instruments and for comments. In the following XML file, the word Pete appears in a comment field:

<band.xml>
   <Lead_Guitar>Dan</Lead_Guitar>
   <Rhythm_Guitar>Jake</Rhythm_Guitar>
   <Bass_Guitar>Mike</Bass_Guitar>
   <Drums>Chris</Drums>
   <COMMENT_A>Dan plays guitar, better than Pete.</COMMENT_A>
   <COMMENT_B>Jake plays rhythm guitar.</COMMENT_B>
</band.xml>

The following CFML code shows a search for the word Pete:

<cfsearch name = "band_search"
  collection="my_collection"
  type = "simple"
  criteria="Pete">

The above search for Pete returns this XML file because this search target is in the COMMENT_A field. In contrast, Pete is the lead guitarist in the following XML file:

<band.xml>
   <Lead_Guitar>Pete</Lead_Guitar>
   <Rhythm_Guitar>Roger</Rhythm_Guitar>
   <Bass_Guitar>John</Bass_Guitar>
   <Drums>Kenny</Drums>
   <COMMENT_A>Who knows who's better than this band?</COMMENT_A>
   <COMMENT_B>Ticket prices correlated with decibels.</COMMENT_B>
</band.xml>

To retrieve only the files in which Pete is the lead guitarist, perform a zone search using the IN operator according to the following syntax:

(query) <IN> (zone1, zone2, ...)

Note:   As with other operators, IN can be uppercase or lowercase. Unlike AND, OR, or NOT, you must enclose IN within angle brackets (<IN>).

Thus, the following explicit search retrieves files in which Pete is the lead guitarist:

(Pete) <in> Lead_Guitar

This is expressed in CFML as follows:

<cfsearch name = "band_search"
  collection="my_collection" 
  type = "explicit"
  criteria="(Pete) <in> Lead_Guitar">

To retrieve files in which Pete plays either lead or rhythm guitar, use the following explicit search:

(Pete) <in> (Lead_Guitar,Rhythm_Guitar)

This is expressed in CFML as follows:

<cfsearch name = "band_search"
  collection="my_collection"
  type = "explicit"
  criteria="(Pete) <in> (Lead_Guitar,Rhythm_Guitar)">
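
The explicit query string can also be assembled from variables, which helps when the member name or zone list changes per request. The following is a minimal sketch, not a definitive implementation: the variable names memberName and zoneList are hypothetical, and the collection name is taken from the earlier examples. Because the contents of a zone cannot be returned in the results list, the output shows only document-level columns.

```cfml
<!--- Hypothetical variables; in a real page these might come from a form. --->
<cfset memberName = "Pete">
<cfset zoneList = "Lead_Guitar,Rhythm_Guitar">

<!--- Builds the explicit query (Pete) <in> (Lead_Guitar,Rhythm_Guitar). --->
<cfsearch name = "band_search"
  collection = "my_collection"
  type = "explicit"
  criteria = "(#memberName#) <in> (#zoneList#)">

<!--- The results list holds document-level columns, not zone contents. --->
<cfoutput query = "band_search">
  #band_search.key# (score: #band_search.score#)<br>
</cfoutput>
```

If the values come from user input, validate them before placing them in the criteria attribute, because operator characters in the input would change the meaning of the query.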

Field searches

Fields are extracted from the document and stored in the collection for retrieval and searching, and can be returned on a results list. Zones, on the other hand, are merely the definitions of "regions" of a document for searching purposes, and are not physically extracted from the document in the same way that fields are extracted.

You must define a region of text as a zone before it can be a field. Therefore, a region can be a zone only, or it can be both a zone and a field. Which you choose depends on your particular requirements.

A field must be defined in the style file, style.ufl, before you create the collection. To map zones to fields (to display field data), you must define and add these extra fields to style.ufl.

You can specify the values for the cfindex attributes TITLE, KEY, URL, and CUSTOM as document fields for use with relational operators in the criteria attribute. (The SCORE and SUMMARY attributes are automatically returned by a cfsearch; these attributes are different for each record of a collection as the search criteria changes.) Text comparison operators can reference the following document fields:

To explore how to use document fields to refine a search, consider the following database table, named Calls. This table has four fields and three records, as the following table shows:

| call_ID | Problem_Description | Short_Description | Product |
|---------|---------------------|-------------------|---------|
| 1 | Can't bold text properly under certain conditions | Bold Problem | HomeSite |
| 2 | Certain optional attributes are acting as required attributes | Attributes Problem | ColdFusion |
| 3 | Can't do a File/Open in certain cases | File Open Problem | HomeSite |

A Verity search for the word certain returns three records. However, you can use the document fields to restrict your search; for example, a search to retrieve HomeSite problems with the word certain in the problem description.

These are the requirements to run this procedure:

The following table shows the relationship between the database column and cfindex attribute:

| Database column | cfindex attribute | Comment |
|-----------------|-------------------|---------|
| call_ID | key | The primary key of a database table is often a key attribute. |
| Problem_Description | body | This column is the information to be indexed. |
| Short_Description | title | A short description is conceptually equivalent to a title, as in a running title of a journal article. |
| Product | custom1 | This field refines the search. |

You begin by selecting all data in a query:

<cfquery name = "Calls" datasource = "MyDSN">
  Select * from Calls
</cfquery>

The following code shows the cfindex tag for indexing the collection (the type attribute is set to custom for tabular data):

<cfindex
  query = "Calls"
  collection = "training"
  action = "UPDATE"
  type = "CUSTOM"
  title = "Short_Description"
  key = "Call_ID"
  body = "Problem_Description"
  custom1 = "Product">

To perform the refined search for HomeSite problems with the word certain in the problem description, the cfsearch tag uses the CONTAINS operator in its criteria attribute:

<cfsearch
  collection = "training"
  name = "search_calls"
  criteria = "certain and CF_CUSTOM1 <CONTAINS> HomeSite">

The following code displays the results of the refined search:

<table border="1" cellspacing="5">
<tr>
  <th align="LEFT">KEY</th>
  <th align="LEFT">TITLE</th>
  <th align="LEFT">CUSTOM1</th>
</tr>

<cfoutput query = "search_calls">
<tr>
  <td>#KEY#</td>
  <td>#TITLE#</td>
  <td>#CUSTOM1#</td>
</tr>
</cfoutput>
</table>

In a browser, the following retrieved results appear:

The results of a refined search return two records
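
The same pattern can combine more than one document field. The following is a minimal sketch under stated assumptions: CF_TITLE is assumed to name the title field by analogy with CF_CUSTOM1 above, and search_titles is a hypothetical result name. With the Calls data, this query is intended to match only the HomeSite call whose short description contains the word Open. The output also shows the SCORE and SUMMARY values that cfsearch returns automatically.

```cfml
<!--- Sketch only. CF_TITLE is assumed to name the title field, by analogy with CF_CUSTOM1. --->
<cfsearch
  collection = "training"
  name = "search_titles"
  criteria = "CF_TITLE <CONTAINS> Open and CF_CUSTOM1 <CONTAINS> HomeSite">

<!--- SCORE and SUMMARY are returned automatically with each record. --->
<cfoutput query = "search_titles">
  #KEY#: #TITLE# (#CUSTOM1#), score #SCORE#<br>
  #SUMMARY#<br>
</cfoutput>
```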
