  Translation 70.6% (updated 2010-11-28, revision 23415)

    Chapter 10. Getting Started with Zend_Search_Lucene

    10.1. Zend_Search_Lucene Introduction

    The Zend_Search_Lucene component is intended to provide a ready-to-use full-text search solution. It requires no PHP extensions[1] or additional software, and can be used immediately after Zend Framework is installed.

    Zend_Search_Lucene is a pure PHP port of the popular open source full-text search engine known as Apache Lucene. See http://lucene.apache.org/ for the details.

    Information must be indexed to be available for searching. Zend_Search_Lucene, like Java Lucene, uses the document as its atomic indexing item.

    Each document is a set of fields: <name, value> pairs where name and value are UTF-8 strings[2]. Any subset of the document fields may be marked as "indexed" to include field data in the text indexing process.

    Field values may or may not be tokenized while indexing. If a field is not tokenized, then the field value is stored as one term; otherwise, the current analyzer is used for tokenization.
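    The field concepts above map directly onto the Zend_Search_Lucene_Document API. A minimal sketch (the field names and values here are illustrative, not prescribed by the component):

```php
<?php
// Create a document; each addField() call attaches one <name, value> pair.
$doc = new Zend_Search_Lucene_Document();

// Keyword: stored and indexed, but NOT tokenized -- kept as a single term.
$doc->addField(Zend_Search_Lucene_Field::Keyword('docId', 'article-42'));

// Text: stored, indexed, and tokenized by the current analyzer.
$doc->addField(Zend_Search_Lucene_Field::Text('title', 'Getting Started with Lucene'));

// UnStored: indexed and tokenized, but not kept in the index --
// suitable for large document bodies.
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $articleBody));
```

    Here $articleBody stands in for whatever full text your application supplies; which fields to mark as Keyword, Text, or UnStored depends entirely on your data.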

    Several analyzers are provided within the Zend_Search_Lucene package. The default analyzer works with ASCII text (since the UTF-8 analyzer needs the mbstring extension to be turned on). It is case insensitive, and it skips numbers. Use other analyzers or create your own analyzer if you need to change this behavior.

    [Note] Using analyzers during indexing and searching

    Search queries are also tokenized using the current analyzer, so the same analyzer must be set as the default during both indexing and searching. This guarantees that source text and searched text are transformed into terms in the same way.
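    In practice this means setting the default analyzer once, in code that runs before both your indexing and your searching logic. A sketch using one of the bundled analyzers (here the UTF-8, number-aware, case-insensitive variant, which requires the mbstring extension):

```php
<?php
// Must execute before both index writes and query execution,
// so that documents and queries are tokenized identically.
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive()
);
```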

    Field values are optionally stored within an index. This allows the original field data to be retrieved from the index while searching. This is the only way to associate search results with the original data (internal document IDs may be changed after index optimization or auto-optimization).
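    Stored field values are what you read back from a search hit. A sketch, assuming an index at ./data/index whose documents carry stored docId and title fields (both names are illustrative):

```php
<?php
$index = Zend_Search_Lucene::open('./data/index');

// find() returns a list of query hits, ordered by relevance score.
$hits = $index->find('lucene');

foreach ($hits as $hit) {
    // Stored field values are exposed as properties of the hit...
    echo $hit->docId, ': ', $hit->title, ' (score ', $hit->score, ")\n";

    // ...or through the full stored document. Don't persist $hit->id:
    // internal document IDs may change after (auto-)optimization.
    $doc = $hit->getDocument();
}
```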

    Keep in mind that a Lucene index is not a database. It provides no backup mechanism other than backing up the index directory on the file system, and no transactional mechanisms, although concurrent index updates, as well as concurrent updating and reading, are supported. It also cannot compete with a database in data retrieval speed.

    So it's a good idea:

    • Not to use the Lucene index as primary storage, since doing so may dramatically decrease search hit retrieval performance. Store only unique document identifiers (document paths, URLs, database primary keys) and lightweight associated data within the index, e.g. title, annotation, category, language info, avatar. (Note: a field may be indexed but not stored, or stored but not indexed.)

    • To write functionality that can rebuild an index completely if it's corrupted for any reason.
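    Both recommendations can be combined in a single rebuild routine that recreates the index from scratch out of the canonical data store, keeping only identifiers and light metadata as stored fields. In this sketch, fetchAllArticles() is a hypothetical stand-in for your own data-access code:

```php
<?php
function rebuildIndex($indexPath)
{
    // create() overwrites any existing (possibly corrupted) index.
    $index = Zend_Search_Lucene::create($indexPath);

    foreach (fetchAllArticles() as $article) {  // hypothetical data source
        $doc = new Zend_Search_Lucene_Document();

        // Store only the unique identifier and light metadata...
        $doc->addField(Zend_Search_Lucene_Field::Keyword('docId', $article['id']));
        $doc->addField(Zend_Search_Lucene_Field::Text('title', $article['title']));

        // ...and index, but do not store, the full text.
        $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $article['body']));

        $index->addDocument($doc);
    }

    $index->commit();
    $index->optimize();
}
```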

    Individual documents in the index may have completely different sets of fields, and the same field in different documents does not need to have the same attributes; e.g. a field may be indexed for one document and skipped during indexing for another. The same applies to storing, tokenizing, or treating a field value as a binary string.

    [1] Though some UTF-8 processing functionality requires the mbstring extension to be turned on

    [2] Binary strings are also allowed to be used as field values
