luceneSearch

Search over a Lucene index

Module

urn:org:ten60:netkernel:mod:lucene

Definition

Format

<instr>
  <type>luceneSearch</type>
  <operator>
    <luceneSearch>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <query>red cat</query>
      <unique />
    </luceneSearch>
  </operator>
</instr>

Syntax

Element    Rules       Description
type       Mandatory   luceneSearch
operator   Mandatory   An operator document containing an index URI and the search criteria
target     Optional    A results document containing all the locations where matches were found

Purpose

Lucene is a full text indexing and searching technology developed at Apache. Lucene provides low-level text indexing and searching facilities. This accessor adds a layer over Lucene to support indexing the content of XML documents while preserving the XPath locations of that content. This approach allows content to be located down to the element level across multiple documents.

The luceneSearch accessor supports searching over a single Lucene index.

The <unique/> tag in the operator document causes the search results to be filtered so that only the best match per indexed docId is returned.
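
The effect of <unique/> can be pictured with a small Python sketch. This is only an illustration of "best match per docId" applied to already-returned matches, not the accessor's actual implementation; the Match type, the best_match_per_docid function and the doc2.xml entry are hypothetical names introduced for the example.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    docid: str    # id the document was indexed under
    xpath: str    # location of the matching element
    score: float  # normalized score, 1.0 is a perfect match


def best_match_per_docid(matches: List[Match]) -> List[Match]:
    """Keep only the highest-scoring match for each docid."""
    best: Dict[str, Match] = {}
    for m in matches:
        current = best.get(m.docid)
        if current is None or m.score > current.score:
            best[m.docid] = m
    # Return the survivors with the highest score first.
    return sorted(best.values(), key=lambda m: m.score, reverse=True)


# Two matches in doc1.xml plus one in a hypothetical doc2.xml.
matches = [
    Match("doc1.xml", "/root/name[1]", 1.0),
    Match("doc1.xml", "/root", 0.53795576),
    Match("doc2.xml", "/root/name[2]", 0.7),
]
unique = best_match_per_docid(matches)
```

Here the lower-scoring /root match in doc1.xml is dropped, leaving one entry per document.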

Query Syntax

By default the search looks for complete words in the text content of the document. Multiple words can be specified and these are 'OR'ed together (matches with all of them score highest). 'AND' can be used to find only matches that contain all of the keywords.

Examples:

  • cow - finds only documents that mention the word cow
  • blue cow - finds only documents that mention the word cow or the word blue
  • blue AND cow - finds only documents that mention both the word cow and the word blue
  • cow AND basis:/animal/name - finds only documents with the word cow in elements with the path /animal/name
  • docid:addressbook.xml - finds only matches in the document indexed under the id addressbook.xml

This is not necessarily the complete query syntax; the Lucene documentation may describe further options.
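
As a rough sketch of how one of these queries could be placed into an instruction from client code, the following Python builds the <instr> document shown under Format using the standard library. The build_search_instr function is a hypothetical helper, not part of NetKernel; it only assembles the XML and does not issue the request.

```python
import xml.etree.ElementTree as ET


def build_search_instr(index_uri: str, query: str, unique: bool = False) -> str:
    """Assemble a luceneSearch <instr> document as a string."""
    instr = ET.Element("instr")
    ET.SubElement(instr, "type").text = "luceneSearch"
    operator = ET.SubElement(instr, "operator")
    search = ET.SubElement(operator, "luceneSearch")
    ET.SubElement(search, "index").text = index_uri
    ET.SubElement(search, "query").text = query
    if unique:
        # Empty element requesting only the best match per docid.
        ET.SubElement(search, "unique")
    return ET.tostring(instr, encoding="unicode")


xml_text = build_search_instr(
    "ffcpl:/org/ten60/test/myIndex/",
    "cow AND basis:/animal/name",
    unique=True,
)
```

Building the document through an XML API rather than string concatenation means any characters in the query that are special in XML are escaped automatically.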

Search result document

Example result document:

<luceneQuery>
  <match>
    <basis>/root/name</basis>
    <xpath>/root/name[1]</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>1.0</score>
  </match>
  <match>
    <basis>/root</basis>
    <xpath>/root</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>0.53795576</score>
  </match>
</luceneQuery>

<basis> contains a basis XPath expression that describes the effective element type. Multiple elements may have the same basis.
<xpath> contains an XPath expression that locates a single unique element within the document.
<uri> contains the URI of the originally indexed document.
<docid> contains the id that the document was indexed under.
<score> contains a score for the match, normalized between zero and one, where one is a perfect match. The score is lower if the match is found within a larger body of text, or if not all of multiple keywords matched.
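
A consumer of the result document might read it as follows. This is a minimal Python sketch using the standard library on the example result above; parse_matches, above_threshold and the 0.8 cut-off are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# The example result document from this page.
RESULT = """<luceneQuery>
  <match>
    <basis>/root/name</basis>
    <xpath>/root/name[1]</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>1.0</score>
  </match>
  <match>
    <basis>/root</basis>
    <xpath>/root</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>0.53795576</score>
  </match>
</luceneQuery>"""


def parse_matches(result_xml: str) -> list:
    """Turn each <match> element into a dict with a numeric score."""
    root = ET.fromstring(result_xml)
    matches = []
    for match in root.findall("match"):
        matches.append({
            "basis": match.findtext("basis"),
            "xpath": match.findtext("xpath"),
            "uri": match.findtext("uri"),
            "docid": match.findtext("docid"),
            "score": float(match.findtext("score")),
        })
    return matches


def above_threshold(matches: list, minimum: float) -> list:
    """Keep matches whose score is at least `minimum`."""
    return [m for m in matches if m["score"] >= minimum]


matches = parse_matches(RESULT)
strong = above_threshold(matches, 0.8)  # 0.8 is an arbitrary cut-off
```

Because scores are normalized, a fixed threshold like this is one simple way to discard weak matches such as a keyword found in a large enclosing element.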

References

Apache Jakarta Lucene Homepage


(C) 2003, 1060 Research Limited
© 2003,2004, 1060® Research Limited
1060 registered trademark, NetKernel trademark of 1060 Research Limited