HierarchyViewer missing in Android SDK r07 for Windows

I just updated to Android SDK r07 and found out that I no longer have the hierarchy-viewer tool, what a nice surprise 🙂
I know the program is stored in hierarchyviewer.jar, so I just created hv.bat in the tools directory containing the single line

java -Xmx512m  -jar ./lib/hierarchyviewer.jar

.. and it works for me.

Another way is to download Android SDK r06 and take hierarchyviewer.bat from it. I still don't know how it happened that Android SDK r07 is missing the launcher for that great tool.

For those who don't know anything about that great tool, I recommend taking a look at the articles about layout efficiency and layout optimization in Android, and start using the tags they describe in your layouts. And of course, spend some time exploring your views in HierarchyViewer.

Posted in Android, Tips and Tricks

Show dynamic progress of time-consuming process in Seam/RichFaces

Sometimes you need to start a really time-consuming task on the server, but doing that with a simple h:commandButton which re-renders the page after the task is done is not a good idea. Showing a4j:status (which could be blocking – see "avoid concurrent call to conversation") is a bit better and will work for relatively short tasks (5–15 seconds, a matter of taste actually), but it is still not a good solution for really long tasks (anything over a minute).

For really long tasks I recommend showing a progress indicator (it's also a known usability fact that users perceive a task with a dynamic progress bar as faster than the same task without one).

There is quite a good component to do this in RichFaces – the progressBar component. The only problem when using it with Seam is that you should initiate the progress (by calling the action), and that action should return immediately while starting some background process. Using raw Threads in a web container is always bad practice; in Seam we have an alternative – Asynchronous tasks. But the problem with them is that they run in a completely separate scope (they know nothing about your conversation scope).

The trick here is to pass all the required parameters to the asynchronous method and use them for returning the results too (in the conversation). That way we don't have to worry about memory issues: as soon as the long computation ends, the memory is released (the only reference to the object is held by the initiator – the conversation-scoped bean).

So, the solution may look like this.

  1. Define ProgressBean.java, which holds all the required initialization parameters for the process and the methods to access the progress state
  2. Add a "progressBean" attribute (+ getter) to your action class (ConversationBean.java)
  3. Define a method to start the process, which makes a call to an asynchronous method in another class ("LongProcess.java") and passes the progress bean to it
  4. Add rich:progressBar and start/stop buttons in a "commandPanel" (this could also be done inside the progressBar alone), plus "updatedResults" – here you can show intermediate and final results during the process (optional)

In short, that is all you need to show a very informative progress indicator for a long-running process.
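The steps above revolve around the ProgressBean, which the later listings use but don't show. Here is a minimal sketch of what such a bean could look like; apart from shouldStop(), isInProgress(), stop() and the progress value read by rich:progressBar, the field and method names are assumptions of mine, not the original code:

```java
import java.io.Serializable;

// Minimal sketch of a progress-state holder shared between the conversation and the async task.
// Only shouldStop(), isInProgress(), stop() and getProgress() appear in the listings below;
// the rest (start/finish/setProgress) is an assumed completion.
public class ProgressBean implements Serializable {

    private volatile int progress = 0;          // 0..100, read by rich:progressBar
    private volatile boolean inProgress = false;
    private volatile boolean stopRequested = false;

    public void start() {
        inProgress = true;
        stopRequested = false;
        progress = 0;
    }

    // called by the long-running process to publish its state
    public void setProgress(int progress) {
        this.progress = progress;
    }

    public int getProgress() {
        return progress;
    }

    public boolean isInProgress() {
        return inProgress;
    }

    // called from the "Stop" button via the conversation bean
    public void stop() {
        stopRequested = true;
    }

    // polled by the long-running process
    public boolean shouldStop() {
        return stopRequested;
    }

    // called by the async task when it ends (normally or after a stop request)
    public void finish() {
        inProgress = false;
    }
}
```

The volatile fields matter here: the bean is written by the background task and read by the request threads that render the progress bar, so each update must be visible across threads.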

Future Notes:

RichFaces has a few other ways to build your own progress indicator – the a4j:poll and a4j:push components. rich:progressBar actually uses the "polling" approach; in most cases periodically updating the progress/results for the user is quite enough, so there is not much sense in writing your own approach with a4j:poll. a4j:push may be quite a good alternative to rich:progressBar since it uses much less traffic and doesn't update the JSF tree (so it's potentially better). I think you can easily adapt the described approach to use a4j:push – you just need to add a few pieces (an addListener method, and send events to the listener during the process).


Java Code:

public class LongProcess {
    private ProgressBean progress;

    @Asynchronous
    public void startProcess(ProgressBean progress) {
        this.progress = progress;
        try {
            runProcess();
        } finally {
            progress.finish(); //mark the process as finished, whatever happened
        }
    }

    private void runProcess() {
        //perform your time-consuming operations here and periodically update the progress status
        if (progress.shouldStop()) {
            //finish the long process and return
            return;
        }
    }
}

public class ConversationBean {
    private ProgressBean progressBean = new ProgressBean();
    @In LongProcess longProcess;

    public void startProcess() {
        if (!progressBean.isInProgress()) {
            progressBean = createNewProgressBean(); //initialize it with the required parameters here
            longProcess.startProcess(progressBean);
        }
    }

    public ProgressBean getProgressBean() {
        return progressBean;
    }

    public void stopProcess() {
        progressBean.stop(); //update the internal state of the progress bean so the long process will check it and stop
    }
}

RichFaces/JSF code

<h:panelGroup id="commandPanel">
    <a4j:commandButton action="#{conversationBean.startProcess}"
            value="Start" onclick="this.disabled=true;"
            reRender="proggressPanel, commandPanel"/>
    <a4j:commandButton action="#{conversationBean.stopProcess}"
            value="Stop" onclick="this.disabled=true;"
            reRender="commandPanel"/>
</h:panelGroup>

<a4j:outputPanel id="proggressPanel">
    <rich:progressBar value="#{conversationBean.progressBean.progress}"
                      label="#{conversationBean.progressBean.progress} %"
                      minValue="-1" maxValue="100"
                      interval="#{conversationBean.updateRate}"
                      reRender="#{conversationBean.shouldUpdateTable ? 'updatedResults' : 'anyEmptyComponent'}"
                      reRenderAfterComplete="proggressPanel, updatedResults, commandPanel">
        <f:facet name="initial">
            <h:outputText value="&lt; Click to start"/>
            <!-- we may also show a button here to start the process as in the RichFaces example (I use a separate commandPanel instead) -->
        </f:facet>
        <f:facet name="complete">
            <h:outputText value="Process Completed"/>
            <!-- we may also show a button here to restart the process as in the RichFaces example -->
        </f:facet>
    </rich:progressBar>
</a4j:outputPanel>

<h:panelGroup id="updatedResults">
    <!-- this can be used to show results (for example a list of processed or generated rows) -->
    <rich:dataTable value="#{conversationBean.progressBean.items}" var="item">
        ...
    </rich:dataTable>
</h:panelGroup>

Please take notice of two parameters used in rich:progressBar.

interval="#{conversationBean.updateRate}"
It uses the method conversationBean.updateRate to determine the update rate. It's optional and could just be hardcoded to some value like "1000" (1 second). It can be used to set an appropriate value dynamically, so your progress bar is not updated too often, and it can even be changed during the process to match your real update rate.

reRender="#{conversationBean.shouldUpdateTable ? 'updatedResults':'anyEmptyComponent'}"
As you can see, reRender here uses a dynamic condition, so conversationBean has control over it and can skip the heavy "updatedResults" update to save traffic.

Posted in Software Development, Tips and Tricks

Free Hosted Redmine

I like Redmine: it's simple yet powerful, one tool with almost everything you need for managing a project (issue tracker, time sheets, wiki, file sharing, cool filtering and reporting tools).

Just Like It.

Here is the link to the free Redmine hosting (you can still keep your project private). The only limitation I found is that you can't create users on your own (i.e. you don't have admin rights, which is of course correct for a free public service). And of course you can't install plugins, but even without that Redmine offers you a lot.

Thank you guys for your service https://www.hostedredmine.com/

And thanks to the Redmine creators – I hope I will contribute to Redmine too.

Posted in Software Development

How to view android sources in Eclipse

At the moment I don't know any more straightforward way to work with (view/browse) the Android sources in Eclipse. I had downloaded the sources from Git before, but that doesn't actually let me browse them in Eclipse. I still have to find each source file on disk to browse it, since they are scattered across different folders.

I found this article very helpful: http://android.opensourceror.org/2010/01/18/android-source/ (all the information below is actually taken from it).

So, in short – download the zipped sources (the article above has the download link).

Unzip them into the corresponding source folder for the particular platform/SDK: ${android-sdk-home}/platforms/android-4/sources

And refresh your project in Eclipse – that's all. It works for me in Eclipse Galileo and Eclipse Helios, so I believe it should work for you too.

There is a possible problem with viewing sources while debugging, which is covered in the source article: "Just click on that "Edit Source Lookup Path" button to add a source lookup path."

[Screenshot: the "Edit Source Lookup Path" button]

In that dialog leave “Default” selected and click “Add.”

[Screenshot: leave "Default" selected]

In the following "Add Source" dialog choose "File System Directory" and hit the OK button. Then choose the source directory where you unzipped the code. The debugger should now show all the code you can debug into.

[Screenshot: define the path to the unpacked sources]

Posted in Android, Software Development

3 Magic words from Google “Upload, Train, Predict” = Prediction API

Yesterday, May 20, 2010, at Google I/O, a brand new API was announced by Google: the Prediction API.
It looks very promising and of course opens up a wide range of areas to use it in.
In short, it exposes to everyone (at the moment only to a "selected everyone", since access is limited) the ability to build their own "supervised classification" without coding, without writing algorithms to analyze and learn from their data.

To use the API you have to sign up and wait until Google decides to give you access, so don't wait – sign up for Prediction API access if you are interested.
So, generally you have to do 3 relatively simple things (besides thinking through the description of the classification task):

  • Upload – upload your data to Google Storage

  • Train – have Google classify your data set

  • Predict – use the model which Google creates from your data set to predict (classify)

[Screenshot: usage of Google Prediction]

Looking at the answers from Google representatives, I can say this is an absolutely fresh pilot product, but a very promising one, especially if they add an API to describe the data set, configure prediction models and algorithms, etc. It would allow embedding complex prediction-based logic even in mobile phones, without any actual computation or programming on the client side. One more example of how to transform the complex and scary into the simple.

Posted in Software Development

Lucene spellchecker for TinyMCE

I just finished creating a Lucene-based spellchecker for the TinyMCE editor. It is based on the same code as the two previous ones, "Jazzy-based" and "JMySpell-based".

You can download the code from the jspellchecker project (http://jspellchecker.svn.sourceforge.net/viewvc/jspellchecker/trunk/).

All the TinyMCE configuration stays the same; just use the updated path to the spellchecker servlet:

spellchecker_rpc_url : "/spellchecker/lucene-spellchecker",

The current implementation is based on org.apache.lucene.search.spell.PlainTextDictionary (just a list of words delimited by newlines) and has an additional memory-configuration servlet parameter, "max_memory_usage" (a value in megabytes which defines the maximum size of the Lucene indexes that may be stored in memory).

Indexes for the spellchecker are created at the first access to a particular language after web-application startup (or pre-created for "preloadedLanguages" on servlet startup).
To speed up index access (and, as a result, the spell checking), spellchecker indexes are initially created on the file system and after that moved to memory.
It uses a 2-level cache to achieve maximum performance with managed memory.

  • The 1st level of the cache holds SpellCheckers which use in-memory (RAMDirectory) Lucene indexes
  • The 2nd level of the cache stores file-system SpellCheckers (FSDirectory) which don't take memory but just hold a reference to the Directory object

The 1st-level cache implementation (based on LinkedHashMap) is also responsible for memory management: it guarantees that the total size of all in-memory indexes stays below "maxMemoryUsage" (this parameter is configured in the servlet init parameters, in megabytes).
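To illustrate the idea (this is a sketch, not the actual jspellchecker code; the class and method names are mine): a LinkedHashMap in access order gives you LRU iteration for free, so evicting the least-recently-used in-memory index whenever the summary size would exceed the limit takes only a few lines:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a size-bounded LRU cache: entries report their own size in bytes
// (for real spellcheckers that would be the RAMDirectory index size), and the
// least-recently-used entries are evicted until the total fits under maxBytes.
public class SizeBoundedCache<K> {

    // a cached entry that knows its own memory footprint
    public interface Sized {
        long sizeInBytes();
    }

    private final long maxBytes;
    private long currentBytes = 0;
    private final LinkedHashMap<K, Sized> map =
            new LinkedHashMap<K, Sized>(16, 0.75f, true); // true = access order (LRU)

    public SizeBoundedCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    public synchronized void put(K key, Sized value) {
        Sized old = map.remove(key);
        if (old != null) {
            currentBytes -= old.sizeInBytes();
        }
        map.put(key, value);
        currentBytes += value.sizeInBytes();
        // evict least-recently-used entries until the summary size fits the limit
        Iterator<Map.Entry<K, Sized>> it = map.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, Sized> eldest = it.next();
            if (eldest.getKey().equals(key)) {
                continue; // never evict the entry we just added
            }
            currentBytes -= eldest.getValue().sizeInBytes();
            it.remove();
        }
    }

    public synchronized Sized get(K key) {
        return map.get(key); // the access moves the entry to the "recently used" end
    }

    public synchronized long currentBytes() {
        return currentBytes;
    }
}
```

A get() on an entry protects it from the next round of eviction, which is exactly the behavior you want for frequently used language indexes.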

At the moment I have found one issue with the Lucene spellchecker, related to multi-word processing. For example, I have "New-York" in my dictionary, but it isn't processed as one word (the Lucene index reader splits it into two words, of course).

The extension points of this spellchecker could be:

  1. Usage of an IndexReader to read existing Lucene indexes
    Dictionary dictionary = new LuceneDictionary(reader, indexedField);
  2. Use of the extended form of suggestSimilar, which boosts the "most popular" terms (it needs the initial index reader, so it is applicable only to a LuceneDictionary based on an IndexReader)
    suggestions = spellChecker.suggestSimilar(word, maxSuggestionsCount, fieldIR, suggestedField, true);
    See the code examples for that in "Did-you-mean feature with Hibernate Search, Lucene and Seam. Example application".
Posted in Software Development

How to launch android camera using intents

Instead of writing your own activity to capture pictures, in most cases you will probably prefer to use the existing Camera activity, which actually has a good UI and feature set. It's really easy to do – just launch it with an Intent as in the code below.

Be warned that this approach doesn't work well on API levels before 2.0 (at least on a G1 with 1.6 it saves pictures at 512×384 resolution). On phones with API 2.0+ it saves the full-sized picture. So you could use "How to use autofocus in Android" as the starting point to create your own Camera application.

//define the file-name to save photo taken by Camera activity
String fileName = "new-photo-name.jpg";
//create parameters for Intent with filename
ContentValues values = new ContentValues();
values.put(MediaStore.Images.Media.TITLE, fileName);
values.put(MediaStore.Images.Media.DESCRIPTION,"Image capture by camera");
//imageUri is the current activity attribute, define and save it for later usage (also in onSaveInstanceState)
imageUri = getContentResolver().insert(
		MediaStore.Images.Media.EXTERNAL_CONTENT_URI, values);
//create new Intent
Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
intent.putExtra(MediaStore.EXTRA_OUTPUT, imageUri);
intent.putExtra(MediaStore.EXTRA_VIDEO_QUALITY, 1);
startActivityForResult(intent, CAPTURE_IMAGE_ACTIVITY_REQUEST_CODE);

The code above should start the default Camera activity on your phone. Now let's define the code to handle the results returned by this Intent. Please note that "imageUri" is an activity attribute: define it and save it for later usage (also in onSaveInstanceState).

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == CAPTURE_IMAGE_ACTIVITY_REQUEST_CODE) {
        if (resultCode == RESULT_OK) {
            //use imageUri here to access the image
        } else if (resultCode == RESULT_CANCELED) {
            Toast.makeText(this, "Picture was not taken", Toast.LENGTH_SHORT).show();
        } else {
            Toast.makeText(this, "Picture was not taken", Toast.LENGTH_SHORT).show();
        }
    }
}

To get a reference to a File object from imageUri you can use the following code.

public static File convertImageUriToFile(Uri imageUri, Activity activity) {
    Cursor cursor = null;
    try {
        String[] proj = {MediaStore.Images.Media.DATA, MediaStore.Images.Media._ID,
                MediaStore.Images.ImageColumns.ORIENTATION};
        cursor = activity.managedQuery(imageUri,
                proj,  // Which columns to return
                null,  // WHERE clause; which rows to return (all rows)
                null,  // WHERE clause selection arguments (none)
                null); // Order-by clause (ascending by name)
        int file_ColumnIndex = cursor.getColumnIndexOrThrow(MediaStore.Images.Media.DATA);
        int orientation_ColumnIndex = cursor.getColumnIndexOrThrow(MediaStore.Images.ImageColumns.ORIENTATION);
        if (cursor.moveToFirst()) {
            String orientation = cursor.getString(orientation_ColumnIndex);
            return new File(cursor.getString(file_ColumnIndex));
        }
        return null;
    } finally {
        if (cursor != null) {
            cursor.close();
        }
    }
}

As you can see, you may get some more information about the image, like its orientation. But for things like the thumbnail you will get null results in most cases, since the thumbnail has not been created yet: you should either open the Gallery to create it or initiate its creation by calling MediaStore.Images.Thumbnails.getThumbnail (this is a blocking method, introduced in API 5).

Posted in Android, Tips and Tricks

Did-you-mean feature with Hibernate Search, Lucene and Seam. Example application

In the previous post I described the list of changes needed to build a spellchecker index based on the existing Lucene indexes created by Hibernate-Search.

In this post I will show a working web application with full-text search and a Lucene-based did-you-mean feature (I call it the suggester service). I think it could give anybody a good starting point for later extension, and it shows the way it can be integrated with applications.

To make it a really fast start, let's use the existing Seam example application "dvdstore" (use this article if you are not familiar with Seam).

In short, to start this example you need to:

  1. Get JBoss 5 (I had issues running it on JBoss 4.2.3, because of the hibernate library version I believe)
  2. Download Seam 2.2.1-CR1
  3. In $seam-home, define jboss.home in build.properties
  4. Go to /examples/dvdstore and run "ant deploy"

That should build and deploy the dvdstore example application on your JBoss (exploded deployment doesn't work for me by default because of issues with JBoss 5 classloading/hot-redeployment, so just deploy it as an EAR file). After that you can access it at http://localhost:${jboss-port}/seam-dvdstore.
Once you open it, go to the "Shopping" tab and use the search box, which actually executes a full-text search against the Product entity (the list of DVDs).

Now, let's update this application to have a "did-you-mean" feature in addition to the existing full-text search.

1. Add the Lucene-Spellchecker library to the project

  1. Download Lucene 2.4.1 (it is the version used by Seam 2.2)
  2. Put "contrib/spellchecker/lucene-spellchecker-2.4.1.jar" into the dvdstore/lib directory
  3. Add the following to "dvdstore/build.xml" (to add the library to the EAR and to the compilation classpath)

<!--the new library was added to lib/lucene-spellchecker.jar, include it in the EAR-->
<fileset id="ear.lib.extras" dir=".">
    <include name="/lib/*.jar"/>
</fileset>

<!--the new library was added to lib, include it in the compilation classpath-->
<path id="build.classpath.extras">
    <fileset refid="ear.lib.extras"/>
</path>

2. Create the Lucene indexes for the spellchecker

/**
 * Creates indexes for the n-gram based suggestion algorithm for selected entities/fields
 * @author Andrey Chorniy
 */
public class SpellCheckIndexerProcessor {

    @In
    private EntityManager entityManager;

    @Logger
    private Log logger;

    @Asynchronous
    public void scheduleIndexing(@Duration long duration) {
        process();
    }

    private void process() {
        indexSpellchecker(Product.class, "title");
        indexSpellchecker(Product.class, "description");
    }

    private void indexSpellchecker(Class indexedClass, String indexedField) {
        SearchFactory searchFactory = getSearchFactory();
        DirectoryProvider[] directoryProviders = searchFactory.getDirectoryProviders(indexedClass);
        ReaderProvider readerProvider = searchFactory.getReaderProvider();
        IndexReader reader = readerProvider.openReader(directoryProviders);
        try {
            final SpellChecker sp = new SpellChecker(getSpellCheckerDirectory(indexedClass, indexedField));
            Dictionary dictionary = new LuceneDictionary(reader, indexedField);
            logger.info("Create spellchecker index for {0} field {1}", indexedClass.toString(), indexedField);
            sp.indexDictionary(dictionary);
        } catch (IOException e) {
            logger.error("Failed to create SpellChecker", e);
        } finally {
            readerProvider.closeReader(reader);
        }
    }

    private SearchFactory getSearchFactory() {
        return ((FullTextEntityManager) entityManager).getSearchFactory();
    }

    /**
     * @param indexedClass
     * @param indexedField
     * @return the Lucene Directory object for indexedClass and indexedField. It is constructed as
     * "${base-spellchecker-directory}/${indexed-class-name}/${indexedField}" so each field's indexes are stored in its
     * own file-system directory inside the owning-class directory
     * @throws IOException
     */
    private Directory getSpellCheckerDirectory(Class indexedClass, String indexedField) throws IOException {
        String path = "./spellchecker/" + indexedClass.getName() + "/" + indexedField;
        return FSDirectory.getDirectory(path);
    }
}

In the process() method we create two indexes, for the "title" and "description" fields of the Product entity. For some reason the "description" field isn't indexed by default, so let's add the Hibernate-Search @Field annotation to that field in the Product class:

    @Field
    public String getDescription() {
        return description;
    }

Now let's launch the indexing at application startup. The spellchecker indexing should happen after the indexes for the Product fields have been created by Hibernate-Search, and those are launched at application startup by the IndexerAction.index() method (on EJB3 bean creation). Here we have an issue, since Hibernate-Search/Lucene creates indexes asynchronously and there is no guarantee that they will be created within some period of time after IndexerAction.index() starts. So I put in a 60-second delay before launching the spellchecker index creation. Here is the code:

@Startup(depends = "indexer")
public class SpellCheckIndexer {

    @Logger
    private Log logger;

    @In(create = true)
    private SpellCheckIndexerProcessor spellCheckIndexerProcessor;

    //method auto-started on the Seam postInitialization event
    @Observer("org.jboss.seam.postInitialization")
    public void scheduleProcess() {
        //the delay is needed since the initial Lucene index for entities may not have been created yet at this moment
        int delayInSeconds = 60;
        logger.info("SpellCheckIndexer will start in 60 seconds");
        spellCheckIndexerProcessor.scheduleIndexing(delayInSeconds * 1000L);
    }
}

Now we have everything needed to run the spellchecker. After deploying the updated application you should find a new directory, ${jboss.home}/bin/spellchecker, with subdirectories for the "title" and "description" indexes.
In real applications with a lot of indexed data we should do a slightly more advanced index creation, to ensure that the indexes we build the spellchecker from have already been created – or even rework the spellchecker-index creation to index the entity-attribute values directly, as is done in the Hibernate-Search org.hibernate.search.impl.FullTextSessionImpl.index() method.

3. Use the Lucene spellchecker to create the did-you-mean feature

The findSuggestions method of FullTextSuggestionService is the place where the magic happens. It runs the suggestion lookup for each word in the query, for each suggested field ("title" and "description" in our case), and then merges the results into a single list of suggestions. The code is not perfect, since we shouldn't include suggestions for a particular word if one of the fields contains an exact match for it.

/**
 * @author Andrey Chorniy
 * Date: 21.04.2010
 */
public class FullTextSuggestionService {

    @Logger
    private Log logger;

    /**
     * @param searchQuery user-defined search criteria (used as a list of words)
     * @param indexedClass entity class
     * @param maxSuggestionsPerFieldCount maximum number of suggestions per field (usually 2..3 is enough)
     * @param suggestedFields list of entity fields to look for suggestions in
     * @return list of suggestions
     */
    public List<String> findSuggestions(String searchQuery, Class<Product> indexedClass, int maxSuggestionsPerFieldCount,
                                        String... suggestedFields) {
        Map<String, List<String>> fieldSuggestionsMap = new LinkedHashMap<String, List<String>>();

        for (String suggestedField : suggestedFields) {
            List<String> fieldSuggestions = findSuggestionsForField(searchQuery, indexedClass,
                    maxSuggestionsPerFieldCount, suggestedField);
            fieldSuggestionsMap.put(suggestedField, fieldSuggestions);
        }

        return mergeSuggestions(maxSuggestionsPerFieldCount, fieldSuggestionsMap);
    }

    public List<String> findSuggestionsForField(String searchQuery, Class<Product> indexedClass,
                                                int maxSuggestionsCount,
                                                String suggestedField) {
        try {
            final SpellChecker sp = new SpellChecker(getSpellCheckerDirectory(indexedClass, suggestedField));

            //get the suggested words
            String[] words = searchQuery.split("\\s+");
            for (String word : words) {
                if (sp.exist(word)) {
                    //no need to include suggestions for that word
                    //TODO in case of multiple-field suggestions that word should be excluded from suggestions for other fields too
                    continue;
                }
                String[] suggestions = sp.suggestSimilar(word, maxSuggestionsCount);
                return Arrays.asList(suggestions);
            }
        } catch (IOException e) {
            logger.error("Failed to create SpellChecker for {0} field of class {1}", suggestedField,
                    indexedClass.getName(), e);
        }
        return Collections.emptyList();
    }

    private List<String> mergeSuggestions(int suggestionNumber, Map<String, List<String>> fieldSuggestionsMap) {
        List<String> suggestionList = new ArrayList<String>();
        for (int suggestionPosition = 0; suggestionPosition <= suggestionNumber; suggestionPosition++) {
            for (Map.Entry<String, List<String>> fieldSuggestionsEntry : fieldSuggestionsMap.entrySet()) {
                List<String> suggestedTerms = fieldSuggestionsEntry.getValue();
                if (suggestedTerms.size() > suggestionPosition) {
                    String suggestion = suggestedTerms.get(suggestionPosition);
                    if (!suggestionList.contains(suggestion)) {
                        suggestionList.add(suggestion);
                    }
                }
            }
        }
        return suggestionList;
    }

    /**
     * @param indexedClass
     * @param indexedField
     * @return the Lucene Directory object in which the spellchecker indexes are located for the specified entity class and field
     * @throws IOException
     */
    public Directory getSpellCheckerDirectory(Class indexedClass, String indexedField) throws IOException {
        String path = "./spellchecker/" + indexedClass.getName() + "/" + indexedField;
        return FSDirectory.getDirectory(path);
    }
}

As you can see, the code to get the suggestions for a single word is pretty simple.

String[] suggestions = sp.suggestSimilar(word, maxSuggestionsCount);

However, we could increase the relevancy of the suggested results by using the alternative method, providing the entity index reader (which is easy to get) and the field name (which we know):

//return only the suggested words that are as frequent as or more frequent than the searched word
String[] suggestions = sp.suggestSimilar(word, maxSuggestionsCount, entityIndex, fieldName, true);

OK, we now have all the code to produce suggestions; we just need to use it from the existing full-text search when it returns no results. To do that we can update the FullTextSearchAction class:

//add the list of suggestions
private List<String> suggestions;

//inject the fullTextSuggestionService we created before
@In private FullTextSuggestionService fullTextSuggestionService;

//run the suggestions at the end of the updateResults() method
//look for suggestions if the full-text search returns nothing
suggestions = null;
if (numberOfResults == 0) {
    suggestions = fullTextSuggestionService.findSuggestions(searchQuery, Product.class, 2, "title", "description");
}

//add a method to run a search from the page (just for testing purposes)
/**
 * Helper method to run a search with a query (used in browse.xhtml to launch a search for one of the suggestions)
 * To be updated: replace it with restful links (low importance for this example project)
 */
@Begin(join = true)
public String searchFor(String query) {
    currentPage = 0;
    searchQuery = query;
    return "browse";
}
In the article I skipped the getter method for suggestions and the @Local interface method declarations for that EJB3 bean; see the link to the full code.

And at the end, add code to browse.xhtml to show the suggestion results with links:

<f:subview id="searchresults" rendered="#{searchResults.rowCount == 0}">
    <h:outputText id="NoResultsMessage" value="#{messages.noSearchResultsHeader}"/>
    <h:panelGroup rendered="#{not empty search.suggestions}">
        <h:outputText value="Did you mean..."/>
        <h:dataTable value="#{search.suggestions}" var="suggestion">
            <h:column>
                <h:commandLink action="#{search.searchFor(suggestion)}">
                    <h:outputText value="#{suggestion}"/>
                </h:commandLink>
            </h:column>
        </h:dataTable>
    </h:panelGroup>
</f:subview>

That's all. Now we can redeploy the application and try searches with misspelled titles/words: "tarminul", "fuction" or "flawers". It works! What else is needed? The answer is improvements: there are a lot of places to work on and extend to achieve better search results and easier integration, but essentially it just works at the moment, and we already have an engine which shows quite relevant results, as far as I'm concerned.


Steps for the future

  1. Check the correctness of using Lucene indexes to create the spellchecker indexes. The question is: does the set of words returned by the index created by Hibernate-Search correspond to the set of words that would be produced by iterating the "title" property of Product? Since the SpellChecker algorithm uses word frequency (with the more-popular option), the word frequency in the index makes a difference, so it should have the same value as in direct attribute-value iteration. The fast way to check this would be to create the spellchecker index by iterating the attribute values and to compare the indexes.
  2. Update the spellchecker index creation approach. The options are to write code like that in org.hibernate.search.impl.FullTextSessionImpl.index(), or even to extend FullTextSessionImpl to do it (of course, the latter variant is only possible if the Hibernate-Search team finds that feature worth integrating into the framework). At the moment I see that it's possible to create a few custom annotations and process them.
  3. Enable closer integration with Hibernate-Search by looking up not just the suggestion text but the related objects.
  4. Probably related to the previous one: update the spellchecker algorithm to store the index fields in the same directory as the entity indexes.
  5. Compound search. The current algorithm is well suited to finding suggestions for single-word queries. Even taking into account that we could find a suggestion for each word, it doesn't look like the greatest approach: it works well as a spellchecker but is not so cool for "did you mean".
Posted in Software Development

Suggestion engine with Hibernate Search and Lucene. Intro

I was inspired by the InfoQ article "Implementing Google's "Did you mean" Feature In Java", which shows a way to create a "suggester" service based on Lucene. Here are two more links you should read if you are interested in the topic.

The articles are related to what I've done before – a spellchecker. Yeah, "spellchecker" and "suggester" have quite a lot in common, but they have different areas of usage:

  • A "spellchecker" is used when you type/edit some text and want to check whether it is spelled correctly; you need a good, big dictionary to implement it, as well as a good spellchecking algorithm
  • A "suggester" is a much more commercialized tool (for sure it can be used for non-commerce searches too), but its commercial usage is really obvious and therefore "good to have"

Just imagine the situation when you are looking for something in an e-commerce shop and mistype one or more letters (I swear I do it quite often). As a common rule I will see nothing in that case and just give up the idea of finding this product on that site. I actually think it would be cool to have this for every text-based search.
Hey, but Google will suggest you something even for mistyped searches. Yeah, it's true, Google does that, but don't you think it's good to have it not only at Google?

In the InfoQ article, PlainTextDictionary is used as the source dictionary (word source) for the SpellChecker. That's good for a start, but I don't remember any application which has such a list – it has to be generated from the DB data. For sure, we can write an algorithm to create it based on the data in a DB table, but I think that's not a perfect solution for production systems whose data is updated often (lists of products, companies, customers, etc.) – and let's also think about keeping the DB and the spellchecker indexes in sync.
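For completeness, generating such a word list is trivial once you have the attribute values in hand (how you load them from the DB is up to you); a sketch, with class and method names of my own choosing:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Collection;
import java.util.TreeSet;

// Sketch: turn a collection of attribute values (e.g. product titles loaded from the DB)
// into the newline-delimited word list that PlainTextDictionary expects.
public class PlainTextDictionaryWriter {

    public static void write(Collection<String> values, File target) throws IOException {
        // collect distinct lower-cased words; PlainTextDictionary is just one word per line
        TreeSet<String> words = new TreeSet<String>();
        for (String value : values) {
            for (String word : value.toLowerCase().split("\\W+")) {
                if (word.length() > 1) {
                    words.add(word);
                }
            }
        }
        BufferedWriter writer = new BufferedWriter(new FileWriter(target));
        try {
            for (String word : words) {
                writer.write(word);
                writer.newLine();
            }
        } finally {
            writer.close();
        }
    }
}
```

The real pain point described above remains, though: you would have to re-run this every time the data changes, which is exactly the synchronization problem Hibernate-Search removes.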

And a tool which already does the DB/entity-mapping/Lucene integration exists – it’s another cool tool from JBoss/Hibernate: “Hibernate-Search”. So I decided to use it as the base for that integration.

Actually, the first task we need to solve is how to create the IndexReader which will provide the list of words. Originally it was instantiated as

IndexReader indexReader = IndexReader.open(originalIndexDirectory);

Happily, with Hibernate-Search it is easy to get access to the IndexReader for a particular entity/field, so the code to do it in my application looks like

//"indexedClass" is the indexed entity (in my case it is the Product entity).
SearchFactory searchFactory = ((FullTextEntityManager) entityManager).getSearchFactory();
DirectoryProvider[] directoryProviders = searchFactory.getDirectoryProviders(indexedClass);
ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(directoryProviders);
//the IndexReader instance is created by the ReaderProvider and could actually be a compound index-reader if we use a sharded index

After that we create the Dictionary object with the same reader and index it:

SpellChecker sp = new SpellChecker(getSpellCheckerDirectory(indexedClass, indexedField));
Dictionary dictionary = new LuceneDictionary(reader, indexedField);
sp.indexDictionary(dictionary);
readerProvider.closeReader(reader); //release the reader obtained from the ReaderProvider

/**
 * @param indexedClass
 * @param indexedField
 * @return the Lucene Directory object for indexedClass and indexedField. It is constructed as
 * "${base-spellchecker-directory}/${indexed-class-name}/${indexedField}", so each field's indexes are stored in its
 * own file-directory inside the owning-class directory
 * @throws IOException
 */
private Directory getSpellCheckerDirectory(Class indexedClass, String indexedField) throws IOException {
   String path = "./spellchecker/" + indexedClass.getName() + "/" + indexedField;
   return FSDirectory.getDirectory(path);
}
Generally, with those two small changes we can create the search engine as described in the java.net article.
In the next post I will show how to create a web application with a did-you-mean feature using Seam, Hibernate-Search and Lucene. It will also have full-text search against the Product entity, and the suggestion engine will be used if the full-text search fails to find results.

Tagged with: , , , , , , , ,
Posted in Software Development

Google introduced new format for JSON results

Google announced a new JSON-C format for the Youtube API. In that announcement they mentioned that a direct transformation of ATOM to JSON is not very effective and actually makes not much sense.
It’s actually not just an announcement – they have already created docs which explain the new format in detail.

Our existing JSON format isn’t perfect, however. It’s very much a literal translation from Atom. As is often the case with literal translations, the current JSON format is wordier than it needs to be, and it lacks some of the elegance that a native dialect would offer.

We’ve rethought our current JSON implementation, and moved away from a literal representation of the Atom data to a format that we hope will be more pleasing to those who are fluent in JSON. The vestigial XML namespace prefixes are no more, and we’ve removed many pieces of metadata specific to Atom documents that come across as noise in JSON. Repeating data elements are always structured as true JSON lists, and useful video metadata that exist as XML attributes in Atom have been rearranged to make more sense in the JSON document. You’ll also find that the new JSON results are more compact than Atom XML, which is of special importance to code running from limited-bandwidth mobile applications.

Sounds very promising, and I hope that the new format (which is more compact and lightweight) will open the door for mobile devices to use the GData API. Especially for Android, which has no official support for the GData API. I believe that happens because the current protocol (based on the quite complex and big ATOM) is not very well suited for mobile devices – it’s too big (network) and too complex (performance/memory). However, there are a couple of implementations of the existing GData API:

  1. Android-GData – it looks like a working solution and is used by a couple of projects, but its main drawback is that it is too big and has a lot of dependencies
  2. Another solution is to use code which is not included in the android-platform now but still exists in the GIT.

I just tried to use alt=jsonc with the Calendar and Picasa-Web-Albums APIs, and it looks like at least the Picasa API also supports the new JSON-C format! And the generated response is about 2 times smaller than the standard JSON.
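For illustration, switching a GData request to the new format is just a matter of the `alt` query parameter. The feed URL below is a made-up example and `withAlt` is a hypothetical helper, not part of any Google client library:

```java
public class JsonCRequest {

    // Append an "alt" query parameter to a GData feed URL,
    // e.g. alt=jsonc to request the compact JSON-C format.
    static String withAlt(String feedUrl, String alt) {
        String separator = feedUrl.contains("?") ? "&" : "?";
        return feedUrl + separator + "alt=" + alt;
    }

    public static void main(String[] args) {
        // Hypothetical Picasa albums feed.
        String feed = "https://picasaweb.google.com/data/feed/api/user/someuser";
        System.out.println(withAlt(feed, "jsonc"));
        // → https://picasaweb.google.com/data/feed/api/user/someuser?alt=jsonc
    }
}
```

Fetching the same feed once with alt=json and once with alt=jsonc is the quickest way to compare the response sizes yourself.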

If you want to see more details from Google about JSON-C and other GData protocol updates and tricks, you can watch that youtube-video.  I was pleasantly surprised to see the really nice Partial-GET feature of GData, which allows you to enumerate the fields/objects you want to see in the response.
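As a sketch of that Partial-GET idea, you add a `fields` parameter enumerating the elements you want back. The feed URL and the field selection below are hypothetical examples, and `withFields` is a made-up helper, but `fields` itself is the real GData partial-response parameter:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PartialGet {

    // Append a GData "fields" parameter (URL-encoded) so the server
    // returns only the listed elements instead of the full entries.
    static String withFields(String feedUrl, String fields) {
        String separator = feedUrl.contains("?") ? "&" : "?";
        return feedUrl + separator + "fields="
                + URLEncoder.encode(fields, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical videos feed; ask only for each entry's title and link.
        String feed = "https://gdata.youtube.com/feeds/api/videos?alt=jsonc";
        System.out.println(withFields(feed, "entry(title,link)"));
    }
}
```

Combined with alt=jsonc this should cut the payload down even further, which is exactly what a bandwidth-constrained mobile client wants.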

Tagged with: , , , ,
Posted in Software Development