Lucene (and Solr built on top of it) is widely used for full-text indexing. I used Lucene 4.x in a previous project; now that Lucene has reached 5.1, let's revisit it and write a demo!
First, the download and documentation links:
a: download page
b: documentation
Required jars (Lucene-related only):
- lucene-analyzers-common-5.1.0.jar
- lucene-core-5.1.0.jar
- lucene-queries-5.1.0.jar
- lucene-queryparser-5.1.0.jar
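If the project is built with Maven instead of managing jars by hand, the same four libraries are available from Maven Central under the `org.apache.lucene` group:

```xml
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>5.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers-common</artifactId>
  <version>5.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queries</artifactId>
  <version>5.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>5.1.0</version>
</dependency>
```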
Creating the index
Let's start with index creation. The official distribution already ships demo code; here we adapt it to something closer to real-world use, with the following requirements:
1. Index building has two modes: (a) a full rebuild from scratch, and (b) an incremental update of part of an existing index.
2. A timer rebuilds or updates the index at fixed intervals (a manual trigger can be added on top if needed).
3. Index settings should live in a configuration file so they are easy to change.
First, suppose we need to search product-related information; define the PO:
```java
package com.demo.test;

import java.util.Date;

public class Product {
    private String id;
    private String name;
    private String keywords;
    private String description;
    private String sn;
    private Date updatedTime;

    public Product() {
        super();
    }

    public Product(String id, String name, String keywords, String description, String sn) {
        super();
        this.id = id;
        this.name = name;
        this.keywords = keywords;
        this.description = description;
        this.sn = sn;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getKeywords() { return keywords; }
    public void setKeywords(String keywords) { this.keywords = keywords; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
    public String getSn() { return sn; }
    public void setSn(String sn) { this.sn = sn; }
    public Date getUpdatedTime() { return updatedTime; }
    public void setUpdatedTime(Date updatedTime) { this.updatedTime = updatedTime; }
}
```
Which Product fields get indexed, and where the index files live, should be configurable. Define a ConfigBean to hold this configuration and load it through Spring:
```java
package com.demo.test.config;

import java.util.List;

public class ConfigBean {
    private String indexName;
    private String storePath;
    private String tempPath;
    private int everyPage;
    private Field key;
    private List<Field> fields;

    public static class Field {
        private String name;
        private boolean stored;
        private int indexOption;
        private boolean tokenized;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public boolean isStored() { return stored; }
        public void setStored(boolean stored) { this.stored = stored; }
        public int getIndexOption() { return indexOption; }
        public void setIndexOption(int indexOption) { this.indexOption = indexOption; }
        public boolean isTokenized() { return tokenized; }
        public void setTokenized(boolean tokenized) { this.tokenized = tokenized; }
    }

    public String getIndexName() { return indexName; }
    public void setIndexName(String indexName) { this.indexName = indexName; }
    public int getEveryPage() { return everyPage; }
    public void setEveryPage(int everyPage) { this.everyPage = everyPage; }
    public Field getKey() { return key; }
    public void setKey(Field key) { this.key = key; }
    public List<Field> getFields() { return fields; }
    public void setFields(List<Field> fields) { this.fields = fields; }
    public String getStorePath() { return storePath; }
    public void setStorePath(String storePath) { this.storePath = storePath; }
    public String getTempPath() { return tempPath; }
    public void setTempPath(String tempPath) { this.tempPath = tempPath; }
}
```
Lucene 5.1 dropped the `indexed` property that `FieldType` had in 4.x; indexing behavior is now controlled through `IndexOptions`. So our config needs an index-option setting, and an IndexTypeOptions enum makes it easy to map an int from the config file to a Lucene `IndexOptions` value:
```java
package com.demo.test.config;

import org.apache.lucene.index.IndexOptions;

public enum IndexTypeOptions {
    NONE(0, IndexOptions.NONE),
    DOCS(1, IndexOptions.DOCS),
    DOCS_AND_FREQS(2, IndexOptions.DOCS_AND_FREQS),
    DOCS_AND_FREQS_AND_POSITIONS(3, IndexOptions.DOCS_AND_FREQS_AND_POSITIONS),
    DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS(4, IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

    private IndexTypeOptions(int type, IndexOptions option) {
        this.type = type;
        this.option = option;
    }

    private int type;
    private IndexOptions option;

    public int getType() { return type; }
    public void setType(int type) { this.type = type; }
    public IndexOptions getOption() { return option; }
    public void setOption(IndexOptions option) { this.option = option; }

    public static IndexOptions fromType(int type) {
        for (IndexTypeOptions ito : IndexTypeOptions.values()) {
            if (ito.getType() == type) {
                return ito.getOption();
            }
        }
        return IndexTypeOptions.NONE.getOption();
    }
}
```
Next, add the storage-path settings for the product index, alongside the per-field index configuration:
```properties
product.indexName = itemmaster
product.indexPath = E\:/TestLucene/index/itemmaster/0
product.tempPath = E:/TestLucene/index/tmp
product.everyPage = 5000
```
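The code below references a `ConfigurationLoader` whose implementation is not shown in this post; it is assumed to read this file and expose a populated `ConfigBean` via `getProductConf()` (with `product.indexPath` mapping to the bean's `storePath`). As a minimal, hypothetical sketch of the parsing side, the file can be read with `java.util.Properties` — note that in `.properties` syntax a backslash escapes the next character, so `E\:` is read back as `E:`:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ConfigParseDemo {

    // Parse raw .properties text; stands in for loading the real file
    // off the classpath or filesystem.
    public static Properties parse(String raw) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(raw));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // The same four keys as the snippet above.
        String raw = "product.indexName = itemmaster\n"
                + "product.indexPath = E\\:/TestLucene/index/itemmaster/0\n"
                + "product.tempPath = E:/TestLucene/index/tmp\n"
                + "product.everyPage = 5000\n";
        Properties props = parse(raw);
        // The backslash escape is consumed by the properties parser.
        System.out.println(props.getProperty("product.indexPath")); // E:/TestLucene/index/itemmaster/0
        int everyPage = Integer.parseInt(props.getProperty("product.everyPage")); // 5000
        System.out.println(everyPage);
    }
}
```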
Now we can start building the index. The approach:
1. Full rebuild (the live index must stay usable throughout, so the new index is built in a temporary directory and only moved into the real index directory once complete)
a: Query the full list of products to index (paged if the data set is large)
b: Clean the build directory (i.e. the tmp directory)
c: Build the index according to the configuration
d: Copy the index files from the temporary directory to the real one
2. Incremental update
a: Query the list of products that need updating (again paged for large data sets), typically selected by an updatedTime field
b: Update the index according to the configuration
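The selection in step 2a can be sketched as a simple filter on `updatedTime`; the result is what gets fed to the incremental update. This is a hypothetical helper (the `Item` class is a trimmed stand-in for the Product PO, and persisting the last-run timestamp is left out):

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class IncrementalSelect {

    // Trimmed stand-in for the article's Product PO (only the fields needed here).
    public static class Item {
        final String id;
        final Date updatedTime;
        public Item(String id, Date updatedTime) {
            this.id = id;
            this.updatedTime = updatedTime;
        }
    }

    /** Keep only records modified at or after the last index run. */
    public static List<Item> changedSince(List<Item> all, Date lastRun) {
        List<Item> changed = new ArrayList<Item>();
        for (Item it : all) {
            if (it.updatedTime != null && !it.updatedTime.before(lastRun)) {
                changed.add(it);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Date lastRun = new Date(1000000L);
        List<Item> all = new ArrayList<Item>();
        all.add(new Item("a", new Date(500000L)));  // unchanged since last run
        all.add(new Item("b", new Date(2000000L))); // modified -> reindex
        System.out.println(changedSince(all, lastRun).size()); // prints 1
    }
}
```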
Index creation steps
- Obtain an IndexWriter // writes documents into the index files; each document corresponds to one PO record
- Convert each PO into a Document
- Write the Document to the index through the IndexWriter
The index-building code:
```java
package com.demo.test;

import java.io.File;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.nio.file.Paths;

import org.apache.commons.beanutils.BeanUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import com.demo.test.config.ConfigBean;
import com.demo.test.config.ConfigurationLoader;
import com.demo.test.config.IndexTypeOptions;

public class IndexUtils {

    protected static Logger logger = Logger.getLogger(IndexUtils.class);

    public static void rebuildOrUpdateIndex(Iterable<Product> products, boolean create)
            throws IOException {
        ConfigBean config = ConfigurationLoader.getProductConf();
        if (create) {
            FileUtils.cleanDirectiory(config.getTempPath());
        }
        IndexWriter writer = getIndexWriter(config, create);
        try {
            for (Product product : products) {
                if (!create) {
                    // Incremental update: delete the old document for this key first.
                    ConfigBean.Field key = config.getKey();
                    writer.deleteDocuments(new Term(key.getName(), product.getId()));
                }
                writer.addDocument(createProductDoc(product, config));
            }
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
            throw e;
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
                writer = null;
            }
        }
        if (create) {
            try {
                FileUtils.copyDirectiory(config.getTempPath(), config.getStorePath());
                logger.info("index files copied from " + config.getTempPath()
                        + " to " + config.getStorePath());
            } catch (IOException e) {
                logger.error("error while copying index files: ", e);
                throw e;
            }
        }
    }

    private static IndexWriter getIndexWriter(ConfigBean config, boolean create)
            throws IOException {
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        String indexPath = null;
        if (create) {
            // Create a new index in the directory, removing any
            // previously indexed documents:
            iwc.setOpenMode(OpenMode.CREATE);
            indexPath = config.getTempPath();
        } else {
            // Add new documents to an existing index:
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
            indexPath = config.getStorePath();
        }
        System.out.println("indexPath:" + indexPath);
        File file = new File(indexPath);
        if (!file.exists()) {
            file.mkdirs();
        }
        Directory dir = FSDirectory.open(Paths.get(indexPath));
        // Optional: for better indexing performance, if you
        // are indexing many documents, increase the RAM
        // buffer. But if you do this, increase the max heap
        // size given to the JVM (e.g. add -Xmx512m or -Xmx1g):
        //
        // iwc.setRAMBufferSizeMB(256.0);
        return new IndexWriter(dir, iwc);
    }

    private static Document createProductDoc(Product product, ConfigBean config) {
        ConfigBean.Field key = config.getKey();
        Document doc = new Document();
        FieldType keyType = new FieldType();
        keyType.setStored(key.isStored());
        keyType.setIndexOptions(IndexTypeOptions.fromType(key.getIndexOption()));
        keyType.setTokenized(key.isTokenized());
        doc.add(new Field(key.getName(), product.getId(), keyType));
        for (ConfigBean.Field field : config.getFields()) {
            String value = getProperty(product, field.getName());
            if (StringUtils.isNotEmpty(value)) {
                FieldType fieldType = new FieldType();
                fieldType.setStored(field.isStored());
                fieldType.setIndexOptions(IndexTypeOptions.fromType(field.getIndexOption()));
                fieldType.setTokenized(field.isTokenized());
                doc.add(new Field(field.getName(), value, fieldType));
            }
        }
        return doc;
    }

    private static String getProperty(Object obj, String name) {
        String result = null;
        try {
            result = BeanUtils.getProperty(obj, name);
        } catch (IllegalAccessException e) {
            logger.error(e.getMessage(), e);
        } catch (InvocationTargetException e) {
            logger.error(e.getMessage(), e);
        } catch (NoSuchMethodException e) {
            logger.error(e.getMessage(), e);
        }
        return result;
    }
}
```
```java
package com.demo.test;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.log4j.Logger;

public class FileUtils {

    protected static Logger logger = Logger.getLogger(FileUtils.class);

    public static void copyDirectiory(String sourceDir, String targetDir) throws IOException {
        File source = new File(sourceDir);
        File target = new File(targetDir);
        if (!source.exists()) {
            logger.error("source " + sourceDir + " is empty");
            return;
        }
        if (!target.exists()) {
            if (!target.mkdirs()) {
                logger.error("fail to create folder " + targetDir);
                return;
            }
        }
        cleanDirectiory(targetDir);
        for (File file : source.listFiles()) {
            if (file.isFile()) {
                copyFile(file, new File(target, file.getName()));
            }
            if (file.isDirectory()) {
                copyDirectiory(file.getAbsolutePath(),
                        target.getAbsolutePath() + File.separator + file.getName());
            }
        }
    }

    public static void copyFile(File source, File target) throws IOException {
        BufferedInputStream inBuff = null;
        BufferedOutputStream outBuff = null;
        try {
            inBuff = new BufferedInputStream(new FileInputStream(source));
            outBuff = new BufferedOutputStream(new FileOutputStream(target));
            byte[] b = new byte[1024 * 5];
            int len;
            while ((len = inBuff.read(b)) != -1) {
                outBuff.write(b, 0, len);
            }
            outBuff.flush();
        } finally {
            if (inBuff != null) {
                try {
                    inBuff.close();
                } catch (IOException e) {
                    // ignore
                }
            }
            if (outBuff != null) {
                try {
                    outBuff.close();
                } catch (IOException e) {
                    // ignore
                }
            }
        }
    }

    public static void cleanDirectiory(String sourceDir) {
        File folder = new File(sourceDir);
        if (folder.exists()) {
            if (folder.isDirectory()) {
                for (File file : folder.listFiles()) {
                    if (file.isDirectory()) {
                        cleanDirectiory(file.getAbsolutePath());
                    }
                    file.delete();
                }
            } else {
                logger.warn("'" + sourceDir + "' is not a Directory!");
            }
        }
    }
}
```
Searching the index
We start with a simple search: given one or more keywords (separated by spaces), find every record that matches in any of the indexed fields.
Index search steps
- Obtain an IndexReader to read the existing index files
- Create an IndexSearcher
- Build a Query (the crucial step: there are many query types to choose from, and the choice determines your results)
- Run the search to get TopDocs, then its scoreDocs
- Fetch each Document via the scoreDocs and convert it back into a PO
The search code:
```java
package com.demo.test;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang3.StringUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.FSDirectory;

import com.demo.test.config.ConfigurationLoader;

/** Simple command-line based search demo. */
public class SearchUtils {

    private final static String[] SEARCH_FIELD_NAMES = { "name", "sn", "keywords" };

    public static List<Product> search(String keywords) throws IOException, ParseException {
        List<Product> result = new ArrayList<Product>();
        String index = ConfigurationLoader.getProductConf().getStorePath();
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
        IndexSearcher searcher = new IndexSearcher(reader);
        /*
        Analyzer analyzer = new StandardAnalyzer();
        QueryParser parser = new QueryParser("sn", analyzer);
        Query query = parser.parse(keywords);
        */
        keywords = keywords.trim().toLowerCase();
        BooleanQuery query = new BooleanQuery();
        String[] ks = keywords.split(" ");
        for (String fieldName : SEARCH_FIELD_NAMES) {
            // Match the full keyword string, exactly or as a substring...
            query.add(new TermQuery(new Term(fieldName, QueryParser.escape(keywords))),
                    BooleanClause.Occur.SHOULD);
            query.add(new WildcardQuery(new Term(fieldName, "*" + QueryParser.escape(keywords) + "*")),
                    BooleanClause.Occur.SHOULD);
            // ...and each space-separated token individually.
            for (String k : ks) {
                if (StringUtils.isNotEmpty(k.trim())) {
                    String kTrim = k.trim();
                    query.add(new TermQuery(new Term(fieldName, QueryParser.escape(kTrim))),
                            BooleanClause.Occur.SHOULD);
                    query.add(new WildcardQuery(new Term(fieldName, "*" + QueryParser.escape(kTrim) + "*")),
                            BooleanClause.Occur.SHOULD);
                }
            }
        }
        TopDocs results = searcher.search(query, 100);
        ScoreDoc[] hits = results.scoreDocs;
        int numTotalHits = results.totalHits;
        System.out.println(numTotalHits + " total matching documents");
        for (ScoreDoc scoreDoc : hits) {
            Document doc = searcher.doc(scoreDoc.doc);
            result.add(getItemFromDoc(doc));
        }
        reader.close();
        return result;
    }

    private static Product getItemFromDoc(Document doc) {
        Product product = new Product();
        product.setId(doc.get("id"));
        product.setKeywords(doc.get("keywords"));
        product.setName(doc.get("name"));
        product.setSn(doc.get("sn"));
        return product;
    }
}
```
Don't let a single query return an unbounded result set; fetch perhaps 5-10 pages' worth, cache them, and paginate over the cached list.
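Slicing the cached hits into pages is plain list arithmetic. A hypothetical helper (`PageUtils` is not part of the demo) might look like:

```java
import java.util.ArrayList;
import java.util.List;

public class PageUtils {

    /** Return one page (1-based pageNo) out of a cached result list. */
    public static <T> List<T> page(List<T> cached, int pageNo, int pageSize) {
        int from = (pageNo - 1) * pageSize;
        if (from < 0 || from >= cached.size()) {
            return new ArrayList<T>(); // past the end: empty page
        }
        int to = Math.min(from + pageSize, cached.size());
        // Copy so callers don't hold a view into the cache.
        return new ArrayList<T>(cached.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < 23; i++) {
            hits.add(i);
        }
        System.out.println(page(hits, 3, 10)); // last, partial page: [20, 21, 22]
    }
}
```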
For more Query types and operations, see here.
Appendix: my test code
```java
package com.demo.test;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.lucene.queryparser.classic.ParseException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class Startup {

    private static final Logger logger = Logger.getLogger(Startup.class);

    private ApplicationContext context;

    public ApplicationContext getContext() { return context; }
    public void setContext(ApplicationContext context) { this.context = context; }

    public static void main(String[] args) throws IOException, ParseException {
        Startup startup = new Startup();
        String[] locations = new String[] { "applicationContext-*.xml" };
        ApplicationContext _context = new ClassPathXmlApplicationContext(locations);
        startup.setContext(_context);
        logger.info("Loading the Spring container into the BeanFactory...");

        /*
        // Index-building test: generate some products, dump them to a data file,
        // then build (or incrementally update) the index.
        // Needs java.util.ArrayList/Date/Random/UUID when uncommented.
        List<Product> products = new ArrayList<Product>();
        Random ran = new Random();
        File dataFile = new File("E://data.txt");
        FileWriter fileWriter = new FileWriter(dataFile);
        BufferedWriter bw = new BufferedWriter(fileWriter);
        for (int i = 0; i < 2; i++) {
            String id = UUID.randomUUID().toString();
            String name = "test" + i + ran.nextInt(5000);
            String sn = "sn-" + 333;
            String keywords = "just test-" + UUID.randomUUID().toString().replace("-", "");
            products.add(new Product(id, name, keywords, null, sn));
            bw.write(id);
            bw.write("\t");
            bw.write(name);
            bw.write("\t");
            bw.write(keywords);
            bw.write("\t");
            bw.write(sn);
            bw.newLine();
        }
        bw.flush();
        bw.close();
        System.out.println(products.size());
        try {
            System.out.println("start build index " + new Date());
            IndexUtils.rebuildOrUpdateIndex(products, false);
            System.out.println("end build index " + new Date());
        } catch (IOException e) {
            e.printStackTrace();
        }
        */

        // Search test: write the hits to a result file.
        String keywords = " 333";
        List<Product> search = SearchUtils.search(keywords);
        File resFile = new File("E://resFile.txt");
        FileWriter fileWriter = new FileWriter(resFile);
        BufferedWriter bw = new BufferedWriter(fileWriter);
        for (Product product : search) {
            bw.write(product.getId());
            bw.write("\t");
            // bw.write(product.getName());
            // bw.write("\t");
            bw.write(product.getKeywords());
            bw.write("\t");
            bw.write(product.getSn());
            bw.newLine();
        }
        bw.flush();
        bw.close();
    }
}
```
The project layout is as follows