luceneIndex

Create and append to a Lucene text index

Module

urn:org:ten60:netkernel:mod:lucene

Definition

Format

<instr>
  <type>luceneIndex</type>
  <operand>file:/addressbook.xml</operand>
  <operator>
    <luceneIndex>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <reset />
      <close />
      <flat />
      <id>My Addressbook</id>
    </luceneIndex>
  </operator>
</instr>

Syntax

Element    Rules      Description
type       Mandatory  luceneIndex
operand    Optional   the document to index
operator   Mandatory  an operator document containing the index URI and, optionally, a <reset/> flag to initialise or empty the index, a <close/> flag to close and optimise it, a <flat/> flag to index all text at the document root, and an <id> under which to index the operand document.

Purpose

Lucene is a full-text indexing and searching library from the Apache Jakarta project. It provides low-level text indexing and searching facilities. This accessor adds a layer over Lucene to support indexing the content of XML documents while preserving the XPath location of that content. This allows content to be located down to the element level across multiple documents.

The luceneIndex accessor supports creating and clearing indexes. An index is specified with the <index> element in the operator document. The index location must be an ffcpl: URI that points to a unique directory; the physical Lucene index occupies this directory and creates its files within it.

Before documents can be indexed, the index must be created. Specifying the <reset/> flag in the operator document initialises a new index or empties an existing one.
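A minimal sketch of the reset step, reusing the index URI from the Format example above. The operator names only the index and the <reset/> flag, and no operand is supplied, so this instruction only initialises (or empties) the index and adds no content.

```xml
<instr>
  <type>luceneIndex</type>
  <operator>
    <luceneIndex>
      <!-- ffcpl: directory that will hold the physical Lucene index files -->
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- create a new index, or empty an existing one -->
      <reset />
    </luceneIndex>
  </operator>
</instr>
```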

After documents have been added, the index must be closed and optimised for searching. This is done with a single luceneIndex instruction whose operator specifies only the index and a <close/> flag.
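The sketch below shows one way to order these steps in a DPML program, assuming the usual <idoc>/<seq> wrapper around a sequence of instructions; the second operand, file:/contacts.xml, is a hypothetical document used only for illustration. Each document is added in its own instruction, and a final instruction carrying only <index> and <close/> closes and optimises the index once indexing is finished.

```xml
<idoc>
  <seq>
    <!-- add a document to an index previously created with <reset/> -->
    <instr>
      <type>luceneIndex</type>
      <operand>file:/addressbook.xml</operand>
      <operator>
        <luceneIndex>
          <index>ffcpl:/org/ten60/test/myIndex/</index>
        </luceneIndex>
      </operator>
    </instr>
    <!-- hypothetical second document added to the same index -->
    <instr>
      <type>luceneIndex</type>
      <operand>file:/contacts.xml</operand>
      <operator>
        <luceneIndex>
          <index>ffcpl:/org/ten60/test/myIndex/</index>
        </luceneIndex>
      </operator>
    </instr>
    <!-- close and optimise the index after all documents are added -->
    <instr>
      <type>luceneIndex</type>
      <operator>
        <luceneIndex>
          <index>ffcpl:/org/ten60/test/myIndex/</index>
          <close />
        </luceneIndex>
      </operator>
    </instr>
  </seq>
</idoc>
```

Closing once at the end, rather than after every document, matches the description above: the close step prepares the whole index for searching.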

By default, the accessor indexes the content of the operand document against the XPath of that content within the document. This is useful when indexing well-structured documents and searching for particular fields. Specifying the <flat/> flag in the operator document instead merges all the text of the document and indexes it at the root. This produces a smaller index and more efficient searches over free-form, text-based documents.
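A sketch of flat indexing, assuming an existing index at the URI from the Format example; the operand file:/docs/release-notes.xml is a hypothetical, mostly free-form document. With <flat/> present, all of its text is merged and indexed at the document root rather than against individual XPath locations.

```xml
<instr>
  <type>luceneIndex</type>
  <!-- hypothetical free-form document -->
  <operand>file:/docs/release-notes.xml</operand>
  <operator>
    <luceneIndex>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- merge all text and index it at the root -->
      <flat />
    </luceneIndex>
  </operator>
</instr>
```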

The URI of each indexed document is stored in the index, but it is sometimes also useful to store an independent id field. This can be specified with an <id> element in the operator document.
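A sketch of indexing a document under an explicit id, reusing the operand and id values from the Format example. The document URI is still recorded in the index; the <id> value is stored alongside it as a separate field.

```xml
<instr>
  <type>luceneIndex</type>
  <operand>file:/addressbook.xml</operand>
  <operator>
    <luceneIndex>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- independent id stored in addition to the document URI -->
      <id>My Addressbook</id>
    </luceneIndex>
  </operator>
</instr>
```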

References

Apache Jakarta Lucene Homepage


(C) 2003, 1060 Research Limited
© 2003,2004, 1060® Research Limited
1060 registered trademark, NetKernel trademark of 1060 Research Limited