Tuesday, December 11, 2012

Lucene Tutorial (1) - How to index with Lucene


I'm new to Lucene and I'm not a professional, so this blog may not be perfect for you.
But what I wrote here comes from reading and studying the book Lucene in Action (2nd edition),
so I hope it is going to be useful to some of you.

1. Indexing (3.6.1 version)

Why use it?

When can I use Lucene?

Suppose you have many text files on your local computer. The search programs in Windows XP, Windows 7 and so on let you find a file by its name, but not by the content of the text files.
So how do you search the content?

For that you have to use a search engine library like Lucene.


How to index?


Explanation:

indexDir = destination directory where the index files will be stored
dataDir = directory that contains the files you want to index (a.txt, b.txt, c.txt)

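IndexWriter = it is like a translator that turns the original files into an index

TextFilesFilter class = filters files by extension (only .txt files are accepted)
Document = it is like a file info class; it holds the fields of one file (content, filename, fullpath, filesize)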
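To get a feeling for what "translating files into an index" means, here is a tiny sketch in plain Java. It is only a conceptual toy, not how Lucene really stores its index: the class name ToyInvertedIndex and its add/search methods are made up for this example. The idea is that each word points to the files that contain it, so a search becomes a lookup instead of reading every file again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual sketch only: this is NOT Lucene's real index format or API.
final class ToyInvertedIndex {
    // word -> sorted set of document names that contain the word
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Split the text into lowercase words and remember which document each word came from.
    void add(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Look the word up directly; no file is read again at search time.
    List<String> search(String word) {
        Set<String> hits = postings.get(word.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("a.txt", "Lucene is a search library");
        index.add("b.txt", "Search the content of text files");
        index.add("c.txt", "Nothing relevant here");
        System.out.println(index.search("search")); // [a.txt, b.txt]
        System.out.println(index.search("lucene")); // [a.txt]
    }
}
```

Lucene does much more than this (analyzers, stored fields, scoring, files on disk), but the word-to-documents lookup is the basic reason why an index makes content search fast.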

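Run the code below with the two directories as arguments (index directory first, data directory second),
and your first experience with Lucene is done.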


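import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java " + Indexer.class.getName() + " <index dir> <data dir>");
        }
        final String indexDir = args[0]; // where the index will be written
        final String dataDir = args[1];  // where the .txt files to index live
        final long start = System.currentTimeMillis();
        final Indexer indexer = new Indexer(indexDir);
        int numIndexed;

        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
        } finally {
            // Always close the writer so the index is committed and the lock is released.
            indexer.close();
        }
        final long end = System.currentTimeMillis();
        System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
    }

    private IndexWriter writer;

    public Indexer(String indexDir) throws IOException {
        final Directory dir = FSDirectory.open(new File(indexDir));
        // 'true' creates a new index and overwrites any existing one in indexDir.
        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close();
    }

    // Indexes every readable, non-hidden file in dataDir that the filter accepts.
    public int index(String dataDir, FileFilter filter) throws Exception {
        final File[] files = new File(dataDir).listFiles();
        if (files == null) {
            // listFiles() returns null when dataDir is not a directory or cannot be read.
            throw new IOException(dataDir + " is not a readable directory");
        }
        for (final File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead() && (filter == null || filter.accept(f))) {
                indexFile(f);
            }
        }
        return writer.numDocs();
    }

    // Accepts only files whose name ends with ".txt" (case-insensitive).
    private static class TextFilesFilter implements FileFilter {
        @Override
        public boolean accept(File path) {
            return path.getName().toLowerCase().endsWith(".txt");
        }
    }

    // Builds one Document (the set of fields Lucene stores and searches) for one file.
    protected Document getDocument(File f) throws Exception {
        final Document doc = new Document();
        // File content: analyzed and indexed from the reader, but not stored.
        doc.add(new Field("content", new FileReader(f)));
        // File name and full path: stored and indexed as single, unanalyzed terms.
        doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("fullpath", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        // File size as a numeric field; the int cast overflows for files of 2 GB or more.
        doc.add(new NumericField("filesize").setIntValue((int) f.length()));
        System.out.println(f.length()); // debug output: file size in bytes

        return doc;
    }

    private void indexFile(File f) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        System.out.println(f.lastModified()); // debug output: last-modified time in milliseconds
        final Document doc = getDocument(f);
        writer.addDocument(doc);
    }
}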
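
If you want to check which files the indexer would pick up before you put Lucene on the classpath, the file-selection step can be tried on its own with plain java.io. This is only a sketch: the class TxtFilterDemo and the method acceptedNames are names I made up, and I folded the "regular file" check into the filter itself, which the listing above does separately inside index().

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical helper for experimenting with the file filter; not part of Lucene.
final class TxtFilterDemo {
    // Same extension rule as TextFilesFilter, plus a check that the path is a regular file.
    static final FileFilter TXT_ONLY = new FileFilter() {
        @Override
        public boolean accept(File path) {
            return path.isFile() && path.getName().toLowerCase().endsWith(".txt");
        }
    };

    // Returns the sorted names of the files in dir that the filter accepts.
    static String[] acceptedNames(File dir) {
        File[] files = dir.listFiles(TXT_ONLY);
        if (files == null) {
            return new String[0]; // dir is not a directory or cannot be read
        }
        String[] names = new String[files.length];
        for (int i = 0; i < files.length; i++) {
            names[i] = files[i].getName();
        }
        Arrays.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory with a mix of entries.
        File dir = Files.createTempDirectory("txt-filter-demo").toFile();
        new File(dir, "a.txt").createNewFile();
        new File(dir, "B.TXT").createNewFile();
        new File(dir, "notes.doc").createNewFile();
        new File(dir, "sub.txt").mkdir(); // a directory that only looks like a text file
        System.out.println(Arrays.toString(acceptedNames(dir)));
    }
}
```

Only a.txt and B.TXT are accepted here; notes.doc fails the extension check and sub.txt is skipped because it is a directory.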



