Mr.G

Tuesday, December 9, 2014

Lucene Search API : Lucene Search implementation - Part 2

To understand what a Lucene index is and how to create one, please take a look at my previous article (Part 1: Introduction to Lucene).

Searching on the Index file:

Searching is the process of looking for words in the index and finding the documents that contain those words.
The following are the fundamental Lucene classes for searching a given text: Searcher, Term, and Query.

Searcher : Searcher is an abstract base class that has various search methods (in Lucene 3.x and earlier; from 4.0 onward, IndexSearcher is used directly). The search method returns an ordered collection of documents ranked by computed scores. Lucene calculates a score for each of the documents that match a given query.

IndexSearcher is the most commonly used subclass; it allows searching indices stored in a given directory. IndexSearcher is thread-safe: a single instance can be used by multiple threads concurrently.

SearcherFactory is a factory class for creating new IndexSearcher instances, which lets you customize a searcher (for example, warm it up) before it is used.

Term : Term is the most fundamental unit for searching. It is composed of two elements: the text of the word and the name of the field (column name) in which the text occurs.

Query : Query is an abstract base class for queries. Searching for a specified word or phrase involves wrapping them in a term, adding the terms to a query object, and passing this query object to IndexSearcher's search method.

Lucene comes with various types of concrete query implementations, such as TermQuery, BooleanQuery, PhraseQuery, PrefixQuery, RangeQuery, MultiTermQuery, FilteredQuery, SpanQuery, etc.

Displaying Results :

IndexSearcher returns an array of references to the ranked search results (the matching documents). The primary classes involved in retrieving the search results are ScoreDoc and TopDocs.

ScoreDoc : A simple pointer to a document contained in the search results. It encapsulates the position of a document in the index and the relevance score computed by Lucene.

TopDocs : Encapsulates the total number of search results and an array of ScoreDoc.
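
To make the roles of these two classes concrete, here is a minimal sketch in plain Java. It does not use Lucene: Hit and TopHits are hypothetical stand-ins for ScoreDoc and TopDocs, and the score is simply the number of times the term occurs in a document, which is a crude placeholder for Lucene's real scoring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-ins for the roles of ScoreDoc and TopDocs; these are NOT the Lucene classes.
final class RankedSearchSketch {
    // Plays the role of a ScoreDoc: a document's position in the index plus its score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Plays the role of a TopDocs: the total number of matches plus the best-scoring hits.
    static final class TopHits {
        final int totalHits;
        final List<Hit> hits;
        TopHits(int totalHits, List<Hit> hits) { this.totalHits = totalHits; this.hits = hits; }
    }

    // Scores each document by how often the term occurs (an assumption made for
    // this sketch only) and keeps the n best matches.
    static TopHits search(List<String> docs, String term, int n) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Hit> matches = new ArrayList<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            int count = 0;
            for (String token : docs.get(doc).toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.equals(wanted)) {
                    count++;
                }
            }
            if (count > 0) {
                matches.add(new Hit(doc, count));
            }
        }
        // Highest score first; ties fall back to index order.
        matches.sort(Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc));
        List<Hit> top = new ArrayList<>(matches.subList(0, Math.min(n, matches.size())));
        return new TopHits(matches.size(), top);
    }
}
```

For example, searching the four texts "lucene search", "search search search", "no match here" and "search and search" for the term "search" with n = 2 gives totalHits = 3, and the two returned hits are document 1 (score 3) followed by document 3 (score 2).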

Charles proxy - configuration behind firewall

Charles is an HTTP proxy / HTTP monitor / Reverse Proxy that enables a developer to view all of the HTTP and SSL / HTTPS traffic between their machine and the Internet. This includes requests, responses and the HTTP headers (which contain the cookies and caching information). (Source )

You can download Charles proxy from the Charles website.

If your machine is not behind a firewall, then configuring Charles is pretty straightforward and is clearly explained here

However, if your machine is behind a firewall, then the following changes are required on your machine.

Steps to do this on Windows 7 machine :

Control Panel --> Network and Internet --> Network and Sharing Center
Click on Windows Firewall (located at the bottom left)
Click on Advanced Settings
Click on Inbound Rules (located at top left)
Click on New Rule (located at top right)
Rule Type --> Program --> Next
Program --> Browse to the Charles.exe installation path --> Next
Action --> Allow the connection --> Next
Profile --> Select all the options --> Next
Name --> Assign any name and click Finish.

Now, in the Windows Firewall window, click "Allow a program or feature.." to verify the rule that was created. If the rule was created successfully, it should appear there and you are good to play with Charles Proxy.

Happy Proxying !!

Saturday, February 2, 2013

Lucene Search API : Introduction to Lucene - Part 1

In Part 1 of the Lucene Search API series, let's look into the following:

What is Lucene
What is Indexing
How to do Indexing

If you are interested in Lucene searching, please look into my other post on Lucene Searching.

What is Lucene :

Lucene is an open source, highly scalable text search-engine Java library available from the Apache Software Foundation. You can use Lucene in commercial and open source applications. Lucene's powerful APIs focus mainly on text indexing and searching.

It can be used to build search capabilities for applications such as e-mail clients, mailing lists, Web searches, database search, etc.

Lucene Core provides Java-based indexing and search technology, as well as spellchecking, hit highlighting, and advanced analysis/tokenization capabilities.
Web sites like Wikipedia, TheServerSide, jGuru, Eclipse help system and LinkedIn have been powered by Lucene.

Basically, developing search applications with Lucene Core involves three steps:

Preparing the Index file of the database
Searching on the Index file
Displaying Results.

Preparing Index file for the database

What is Indexing :
Indexing is the process of converting text data into a format that facilitates rapid searching, much like an index on a database table, at the cost of slower writes and increased storage space.
Indexing and analysis are the two steps involved in creating an index file.

Indexing : Lucene stores the input data in a data structure called an inverted index, which is stored on the file system or in memory as a set of index files. It lets users perform fast keyword look-ups and finds the documents that match a given query. Before the text data is added to the index, it is processed by an analyzer.
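
The idea behind an inverted index can be sketched in a few lines of plain Java. This is only an illustration of the data structure, not Lucene's API or its on-disk format; the class name InvertedIndexSketch and its methods are made up for this example, and lowercasing plus splitting on whitespace stands in for a real analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of ids of the documents that contain it.
// Illustration only; this is not how Lucene stores or exposes its index.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Adds one document. Lowercasing and splitting on whitespace stand in for a real analyzer.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Keyword look-up: a single map access instead of a scan over every document.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
    }
}
```

After adding document 0 with the text "Lucene is a search library" and document 1 with "An inverted index makes search fast", lookup("search") returns the ids 0 and 1, while lookup("lucene") returns only 0.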

Analysis : Analysis converts the text data into the fundamental unit of searching, which is called a term. During analysis, the text data goes through multiple operations: extracting the words, removing common words, ignoring punctuation, reducing words to root form, changing words to lowercase, etc. Analysis happens just before indexing and query parsing. Analysis converts text data into tokens, and these tokens are added as terms in the Lucene index.
Note : Because analysis removes words before indexing, it decreases the index size, but it can reduce the precision of query processing. You can have more control over the analysis process by creating custom analyzers using the basic building blocks provided by Lucene.
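
As a rough illustration of these operations, the sketch below tokenizes a string, drops punctuation, lowercases each token and removes common words. It is plain Java and does not use Lucene's Analyzer classes; the class name AnalyzerSketch and the small stop-word list are assumptions made for this example, and reducing words to root form (stemming) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis chain: tokenize, lowercase, drop stop words.
// Plain Java for illustration; Lucene's own Analyzer classes are not used here.
final class AnalyzerSketch {
    // Hypothetical stop-word list chosen for this sketch.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "in", "on"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Splitting on anything that is not a letter or digit also discards punctuation.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

Calling AnalyzerSketch.analyze("The Quick fox, in a Box!") yields the terms quick, fox and box. The words "The", "in" and "a" never reach the index, which keeps it smaller but also means a query for one of those words can no longer match.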

The following are the fundamental Lucene classes for indexing text: Directory, IndexWriter, Analyzer, Document, and Field.
Directory : An abstract class that represents the location where index files are stored.
FSDirectory : Base class for Directory implementations that store index files in the file system.
RAMDirectory : A memory-resident Directory implementation. This class is not intended to work with huge indexes.
IndexWriter : The IndexWriter class creates a new index and adds documents to an existing index. It provides methods to add, delete, or update documents in the index.

Before text is indexed, it is passed through an Analyzer. Analyzer classes are in charge of extracting indexable tokens out of text to be indexed and eliminating the rest. Lucene comes with a few different Analyzer implementations. Some of them deal with skipping stop words (frequently used words that don't help distinguish one document from the other, such as a, an, the, in, and on), for instance, while others deal with converting all tokens to lowercase letters, so that searches are not case sensitive.

Analyzer : As discussed, the analyzers are responsible for preprocessing the text data and converting it into tokens stored in the index. IndexWriter accepts an analyzer used to tokenize data before it is indexed. To index text properly, you should use an analyzer that's appropriate for the language of the text that needs to be indexed.
Ex: WhitespaceAnalyzer, SimpleAnalyzer, StandardAnalyzer.
Default analyzers work well for the English language. There are several other analyzers in the Lucene sandbox, including those for Chinese, Japanese, and Korean.

Other Important terms of Lucene

An index consists of a set of Documents, and each Document consists of one or more Fields.
Each Field has a name and a value. You can think of a Document as a row in an RDBMS, and Fields as columns in that row.

Field : A field is a section of a Document. Each field has two parts, a name and a value. Fields are optionally stored in the index, so that they may be returned with hits on the document.

Document : Documents are the unit of indexing and search. A Document is a set of fields. Each document should typically contain one or more stored fields which uniquely identify it.

Searching on index files is explained in Part 2 (Lucene Search implementation).

Thursday, May 19, 2011

Development Environment setup

BlackBerry development can be done either in RIM's JDE or in the Eclipse IDE with the BlackBerry plugin.

Let's look into the complete details:

Wednesday, May 18, 2011

1.1 List of Blackberry devices with resolutions

Here is the list of BlackBerry devices available as of now, with their screen resolutions in pixels (width x height).

BlackBerry 7100 ------------------- 240 x 260

BlackBerry 7100i ------------------- 240 x 260

BlackBerry 7100g ------------------- 240 x 260

BlackBerry 7100r ------------------- 240 x 260

BlackBerry 7100v ------------------- 240 x 260

BlackBerry 7100x ------------------- 240 x 260

BlackBerry 7100t ------------------- 324 x 352

BlackBerry 7105t -------------------- 240 x 260

BlackBerry 7130 --------------------- 240 x 260

BlackBerry 7130c -------------------- 240 x 260

BlackBerry 7130e --------------------- 240 x 260

BlackBerry 7130g -------------------- 240 x 260

BlackBerry 7130v --------------------- 240 x 260

BlackBerry 7210 ---------------------- 240 x 160

BlackBerry 7220 --------------------- 240 x 160

BlackBerry 7230 --------------------- 240 x 160

BlackBerry 7250 --------------------- 240 x 160

BlackBerry 7270 --------------------- 240 x 160

BlackBerry 7280 --------------------- 240 x 160

BlackBerry 7290 -------------------- 240 x 160

BlackBerry 7510 ---------------------- 240 x 160

BlackBerry 7520 ---------------------- 240 x 160

BlackBerry 7730 ----------------------- 240 x 240

BlackBerry 7750 ----------------------- 240 x 240

BlackBerry 7780 ---------------------- 240 x 240

BlackBerry 8100 ---------------------- 240 x 260

BlackBerry 8120 ----------------------- 240 x 260

BlackBerry 8130 ------------------------ 240 x 260

BlackBerry 8220 ------------------------- 240 x 320

BlackBerry 8300 ------------------------- 320 x 240

BlackBerry 8310 -------------------------- 320 x 240

Blackberry 8320 -------------------------- 320 x 240

BlackBerry 8330 -------------------------- 320 x 240

BlackBerry 8350i ------------------------- 320 x 240

BlackBerry 8520 ------------------------ 320 x 240

BlackBerry 8530 ---------------------- 320 x 240

BlackBerry 857 --------------------------- 160 x 160

BlackBerry 8700 (c/r/f/g) ---------------- 320 x 240

BlackBerry 8703e ------------------------- 320 x 240

BlackBerry 8707 (g/h/v) --------------------- 320 x 240

BlackBerry 8800 ----------------------------- 320 x 240

BlackBerry 8820 ----------------------------- 320 x 240

BlackBerry 8830 ------------------------------ 320 x 240

BlackBerry 8900 ------------------------------ 480 x 360

BlackBerry 9000 ----------------------------- 480 x 320

BlackBerry 9105 ------------------------------- 360 x 400

BlackBerry 9501 -------------------------------- 32 x 65

BlackBerry 9500 ------------------------------ 360 x 480

BlackBerry 9520 ------------------------------- 360 x 480

BlackBerry 9530 ------------------------------- 360 x 480

BlackBerry 9550 ------------------------------- 360 x 480

BlackBerry 957 --------------------------------- 160 x 160

BlackBerry 9630 ------------------------------- 480 x 360

BlackBerry 9700 -------------------------------- 480 x 360

BlackBerry 9800 --------------------------------- 360X480

1. Introduction to Blackberry

Hi All,

Welcome !!

BlackBerry is one of the leading business phones, from the Canadian telecom giant RIM.

BlackBerry phones earned their fame for their ability to send and receive push messages and emails over mobile or wireless networks. Indeed, it was the first of its kind.

BlackBerry provides robust, efficient, effective and high-speed connectivity through

BlackBerry Enterprise Server (BES)

BlackBerry Internet Service (BIS)

Blackberry extends its services in multiple forms like

PDAs

Smartphones

Blackberry Messenger

PlayBook (tablet)

Blackberry App World (where you can find multiple apps)

The BlackBerry OS has gone through several versions: 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 5.0 and 6.0.

As a matter of fact, when it comes to UI, BlackBerry is behind its competitors iPhone and Android. Don't panic, here is the good news: RIM is going to release a new OS that is highly competitive in terms of UI and is supposed to compete with iPhone and Android.

The list of BlackBerry devices available as of now is in post 1.1 (more will join the list).

Wednesday, May 11, 2011

Unix Commands - Know about other people

About other people

w --- tells you who's logged in, and what they're doing. Especially useful: the 'idle' part. This allows you to see whether they're actually sitting there typing away at their keyboards right at the moment.
who --- tells you who's logged on, and where they're coming from. Useful if you're looking for someone who's actually physically in the same building as you, or in some other particular location.
finger username --- gives you lots of information about that user, e.g. when they last read their mail and whether they're logged in. Often people put other practical information, such as phone numbers and addresses, in a file called .plan. This information is also displayed by 'finger'.
last -1 username --- tells you when the user last logged on and off and from where. Without any options, last will give you a list of everyone's logins.
talk username --- lets you have a (typed) conversation with another user
write username --- lets you exchange one-line messages with another user
elm --- lets you send e-mail messages to people around the world (and, of course, read them). It's not the only mailer you can use, but the one we recommend. See the elm page, and find out about the departmental mailing lists (which you can also find in /user/linguistics/helpfile).