Module: CodeZauker

Defined in:
lib/code_zauker.rb,
lib/code_zauker/version.rb,
lib/code_zauker/constants.rb

Overview

This module implements a simple reverse indexer based on Redis. The idea is inspired by swtch.com/~rsc/regexp/regexp4.html
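To make the trigram idea concrete, here is a minimal sketch of a reverse index. It is not CodeZauker's actual implementation: it keeps the postings in an in-memory Hash instead of Redis, and the class `TinyTrigramIndex` and its methods `trigrams`, `add` and `candidates` are hypothetical names chosen for illustration. Case folding and skipping the all-space trigram (one plausible use of SPACE_GUY) are assumptions as well.

```ruby
require "set"

# Hypothetical in-memory stand-in for a Redis-backed trigram index.
class TinyTrigramIndex
  GRAM_SIZE = 3
  SPACE_GUY = " " * GRAM_SIZE

  def initialize
    # trigram => Set of file names that contain it
    @postings = Hash.new { |hash, gram| hash[gram] = Set.new }
  end

  # Split text into overlapping, case-folded trigrams,
  # dropping the all-space trigram (assumption).
  def self.trigrams(text)
    text.downcase.chars
        .each_cons(GRAM_SIZE)
        .map(&:join)
        .reject { |gram| gram == SPACE_GUY }
        .uniq
  end

  # Record every trigram of the text under the given file name.
  def add(file_name, text)
    self.class.trigrams(text).each { |gram| @postings[gram] << file_name }
  end

  # Files that contain every trigram of the query. These are only
  # candidates: a real text match is still needed to confirm a hit.
  def candidates(query)
    grams = self.class.trigrams(query)
    return Set.new if grams.empty?

    grams.map { |gram| @postings.fetch(gram, Set.new) }.reduce(:&)
  end
end

index = TinyTrigramIndex.new
index.add("a.rb", "def hello_world")
index.add("b.rb", "puts 'goodbye'")
index.candidates("hello").to_a # => ["a.rb"]
```

Intersecting the posting sets narrows the search to a few files before any expensive matching runs, which is the point of the approach described in the linked article.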

Defined Under Namespace

Classes: FileScanner, Util

Constant Summary

GRAM_SIZE =
3
SPACE_GUY =
" "*GRAM_SIZE
VERSION =
"0.0.3"
MAX_PUSH_TRIGRAM_RETRIES =
3
TRIGRAM_DEFAULT_PUSH_SIZE =

Stats: it is difficult to decide on the best trigram push size. A larger value lets more processing happen in memory but can lead to longer transactions. 6000 is a heuristic value kept for historical reasons.

6000
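The trade-off between push size and transaction length can be illustrated with a small batching sketch. The helper `push_in_batches` is hypothetical and not part of CodeZauker. The block stands in for whatever actually writes one batch (for example a single Redis transaction). Treating MAX_PUSH_TRIGRAM_RETRIES as the maximum number of attempts per batch is an assumption.

```ruby
MAX_PUSH_TRIGRAM_RETRIES = 3
TRIGRAM_DEFAULT_PUSH_SIZE = 6000

# Hypothetical helper: hand trigrams to the block in fixed-size batches,
# retrying a failed batch up to `retries` attempts before giving up.
# Returns the number of trigrams pushed.
def push_in_batches(trigrams,
                    batch_size: TRIGRAM_DEFAULT_PUSH_SIZE,
                    retries: MAX_PUSH_TRIGRAM_RETRIES)
  pushed = 0
  trigrams.each_slice(batch_size) do |batch|
    attempts = 0
    begin
      attempts += 1
      yield batch
      pushed += batch.size
    rescue StandardError
      retry if attempts < retries
      raise
    end
  end
  pushed
end

sizes = []
push_in_batches((1..13_000).map(&:to_s)) { |batch| sizes << batch.size }
sizes # => [6000, 6000, 1000]
```

A larger `batch_size` means fewer round trips but more work to repeat when a batch fails, which is one way to read the comment on the push size above.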
DEFAULT_EXCLUDED_EXTENSION =
[
# Documents
".pdf",
".xps",
".zip",".7z",
# MS Office zip-like files...
".pptx",".docx",".xlsx",
".ppt",".xls",".rtf",".vsd", ".odf",
# Binary bad stuff
".dll",".exe",".out",".elf",".lib",".so",
# Redis db
".rdb",
# Ruby and java stuff-like
".gem",
".jar",".class",".ear",".war",
".tar",
".gz",
".dropbox",
".svn-base",".pdb",".cache",                             
# Music exclusion
".mp3",".mp4",".wav",
# Image exclusion
".png",".gif",".jpg",".bmp",
# Temp stuff
".tmp","~",
# Oracle exports...
".exp"
]
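A list like this is typically consulted before a file is read. The sketch below shows one way to do that. The helper `excluded?` and the constant `EXCLUDED_SAMPLE` are hypothetical, the sample is only a subset of the list above so the example runs on its own, and case-insensitive matching is an assumption. A suffix test is used rather than `File.extname` because entries such as "~" are not dot-prefixed extensions.

```ruby
# Subset of DEFAULT_EXCLUDED_EXTENSION, repeated so the sketch is self-contained.
EXCLUDED_SAMPLE = [".pdf", ".zip", ".exe", ".svn-base", ".png", ".tmp", "~"].freeze

# Hypothetical helper: true when the file name ends with an excluded suffix.
def excluded?(path, suffixes = EXCLUDED_SAMPLE)
  name = File.basename(path).downcase
  suffixes.any? { |suffix| name.end_with?(suffix) }
end

excluded?("docs/manual.PDF")    # => true
excluded?("lib/notes.rb~")      # => true
excluded?("lib/code_zauker.rb") # => false
```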