Penlets.com provides resources for users and developers of the Pulse smart pen from LiveScribe.

 


Tutorial

Creating a Custom Vocabulary

Tutorial by Robert Hanson

If you are building a penlet that needs to "comprehend" what is being written, you may be better served with a custom vocabulary instead of using one of the built-in lexicons. For instance, a penlet that needed to recognize stock symbol names would fare pretty poorly without a custom vocabulary. In this tutorial I will show you how to create a custom lexicon file and use it from your penlet.

Lexicon or is it Vocabulary? A lexicon is a stock of terms, a vocabulary. The title and introduction of this tutorial use the word "vocabulary" because it is widely understood, while the SDK refers to it as a "lexicon". The rest of this tutorial will use the word "lexicon" so that it is consistent with the SDK terminology.

To begin, create a new project or open an existing one. Add a new directory, /src/icr/, which is where your custom lexicon files will be placed.

To create a lexicon file, add a new file under /src/icr/ with whatever name you like and give it a ".lex" extension. The file name must end in ".lex" in order for it to be processed.

When you build your penlet, all files matching /src/icr/*.lex are processed and compiled. Each compiled file is named /icr/LEX_[name].res, where "[name]" is the name of the original file without the file extension. So /src/icr/custom.lex would be compiled to /icr/LEX_custom.res.
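The naming rule above can be sketched as a small helper. This is plain Java meant for a desktop-side check or a unit test, not penlet code, and LexiconPaths and compiledPath are hypothetical names of my own, not part of the SDK.

```java
/** Hypothetical helper; not part of the Livescribe SDK. */
final class LexiconPaths {

    private LexiconPaths() {}

    /**
     * Maps a lexicon source path such as "/src/icr/custom.lex" to the
     * compiled resource path the tutorial describes, "/icr/LEX_custom.res".
     */
    static String compiledPath(String sourcePath) {
        if (sourcePath == null || !sourcePath.endsWith(".lex")) {
            throw new IllegalArgumentException("Not a .lex file: " + sourcePath);
        }
        // Keep only the file name; lastIndexOf returns -1 when there is no slash.
        String fileName = sourcePath.substring(sourcePath.lastIndexOf('/') + 1);
        // Strip the 4-character ".lex" extension.
        String name = fileName.substring(0, fileName.length() - 4);
        if (name.length() == 0) {
            throw new IllegalArgumentException("Missing file name: " + sourcePath);
        }
        return "/icr/LEX_" + name + ".res";
    }
}
```

Calling LexiconPaths.compiledPath("/src/icr/custom.lex") gives the string you would pass when loading the compiled lexicon, which helps avoid typing the renamed path by hand.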

Organizing lexicons into directories: You may create directories under /src/icr/ to organize large numbers of lexicons, but beware that this directory structure is flattened when the lexicons are compiled. Your /src/icr/stocks/stocks.lex will be compiled to /icr/LEX_stocks.res, and if you have two lexicon files with the same name in different directories, the build process will silently include only one of them. Bottom line: avoid sub-directories when you can.
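Because the build gives no warning about clashing names, a quick check before building can help. The sketch below is plain desktop Java, not penlet code. LexiconCollisions is a hypothetical name, and it assumes file names are compared case-sensitively, which the tutorial does not specify.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-build check; not part of the Livescribe SDK. */
final class LexiconCollisions {

    private LexiconCollisions() {}

    /**
     * Returns every source path whose file name repeats one seen earlier
     * in the list, i.e. the files that would clash once directories are
     * flattened during compilation.
     */
    static List<String> findCollisions(List<String> sourcePaths) {
        Set<String> seenNames = new HashSet<String>();
        List<String> collisions = new ArrayList<String>();
        for (String path : sourcePaths) {
            // The directory part is discarded, mirroring the flattening.
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            // Set.add returns false when the name was already present.
            if (!seenNames.add(fileName)) {
                collisions.add(path);
            }
        }
        return collisions;
    }
}
```

Given /src/icr/stocks.lex and /src/icr/stocks/stocks.lex, the second path is reported, since both would compile to /icr/LEX_stocks.res.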

The contents of the lexicon source file are simply the words that you want to match, one per line. So a lexicon of stock ticker symbols would look like this:

ko
msft
tgt
wmt
mmm
phm
hpq
ifx
intc
...
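If your word list comes from another source, the file contents can be generated rather than typed. This is a minimal sketch in plain desktop Java. LexiconSource is a hypothetical name, and lower-casing follows the advice in Troubleshooting Tip #2 later in this tutorial. Dropping duplicates and blank entries is my own choice, since the tutorial does not say how the lexicon compiler treats them.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical generator for .lex file contents; not part of the SDK. */
final class LexiconSource {

    private LexiconSource() {}

    /**
     * Builds the text of a lexicon source file: one lower-case word per
     * line, in first-seen order, without blanks or duplicates.
     */
    static String build(String[] words) {
        // LinkedHashSet removes duplicates while keeping insertion order.
        Set<String> unique = new LinkedHashSet<String>();
        for (String word : words) {
            String cleaned = word.trim().toLowerCase(Locale.ROOT);
            if (cleaned.length() > 0) {
                unique.add(cleaned);
            }
        }
        StringBuilder text = new StringBuilder();
        for (String word : unique) {
            text.append(word).append('\n');
        }
        return text.toString();
    }
}
```

The returned string can be saved as, for example, /src/icr/custom.lex before building the penlet.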

With your custom lexicon in hand, it is time to look at how to use it in your application. If you aren't already familiar with how to use Intelligent Character Recognition (ICR) in your application, read the Using ICR tutorial first.

For the purposes of this tutorial I have created a method, initICRContext, that gets the ICR context and sets the resources. It assumes that the enclosing class implements the HWRListener interface, and it does not do things like set up a StrokeListener. If you have a question about what any of this means, check out the Using ICR tutorial.

private ICRContext icrContext;

private void initICRContext()
{
    // "this" is passed as the listener, so this class must implement HWRListener.
    this.icrContext = this.context.getICRContext(2000, this);
    Resource[] resources = {
        /* [1] default AK resource and the alpha SK */
        this.icrContext.getDefaultAlphabetKnowledgeResource(),
        this.icrContext.createSKSystemResource(ICRContext.SYSRES_SK_ALPHA),
        /* [2] the compiled custom lexicon */
        this.icrContext.createAppResource("/icr/LEX_custom.res"),
        /* [3] allow results that are not in the lexicon */
        this.icrContext.createLKSystemResource(ICRContext.SYSRES_LK_OUT_OF_LEXICON)
    };
    this.icrContext.addResourceSet(resources);
}

For our selection of resources we use the default AK resource and the alpha SK [1]. The alpha SK makes sense in this case because our sample lexicon file only contains letters.

Next we get to the purpose of this tutorial: loading our custom lexicon [2]. As noted earlier, the compiled lexicon will always be in the /icr/ directory, begin with "LEX_", and have an extension of ".res".

The last resource that we specify is ICRContext.SYSRES_LK_OUT_OF_LEXICON [3]. If you are using a fairly limited lexicon, it probably makes sense to use this. It tells the HWR engine that it should not limit itself to the lexicon. Without it, the HWR engine will look for the best match in the lexicon, which may not be appropriate.

For instance, the HWR engine might return "MSFT" when you write "NKFT" simply because it was the closest match. By adding ICRContext.SYSRES_LK_OUT_OF_LEXICON you can increase the likelihood that a result returned from the HWR engine is accurate. I do recommend a good dose of testing, though, because using this resource isn't always appropriate.
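With out-of-lexicon results enabled, the engine may hand back text that is not one of your words, so the penlet may want to check a result before acting on it. The sketch below assumes the penlet keeps its own copy of the word list, and LexiconFilter is a hypothetical class, not part of the SDK. How the recognized text reaches your code is covered in the Using ICR tutorial. It is written in plain Java for readability and may need adjusting to the collection classes available on the pen.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical result check; not part of the Livescribe SDK. */
final class LexiconFilter {

    private final Set<String> words = new HashSet<String>();

    /** Stores the lexicon words in lower case for case-insensitive lookup. */
    LexiconFilter(String[] lexiconWords) {
        for (String word : lexiconWords) {
            words.add(word.trim().toLowerCase(Locale.ROOT));
        }
    }

    /** Returns true when the recognized text is one of the lexicon words. */
    boolean isKnown(String recognized) {
        if (recognized == null) {
            return false;
        }
        return words.contains(recognized.trim().toLowerCase(Locale.ROOT));
    }
}
```

A filter built from the sample lexicon accepts "MSFT" and rejects "NKFT", so the penlet can treat the second as unrecognized input instead of guessing.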

Troubleshooting Tip #1: The filename shuffle. If your penlet fails to load, double-check that you specified the right path to the compiled lexicon. It is very easy to get the path wrong because of the file renaming done by the build script. Remember that your compiled lexicon is /icr/LEX_[name].res.

Troubleshooting Tip #2: Lexicon case-sensitivity and SK selection. In my own experience, using lower-case in your lexicon file results in better matching. I have also found that ICRContext.SYSRES_SK_ALPHA works best for alpha-only lexicons. ICRContext.SYSRES_SK_UPPER and ICRContext.SYSRES_SK_LOWER do not seem to work as expected. My advice is to experiment and test your penlet thoroughly.

If you have found a set of resources that works well for a specific use case, let us know what you came up with.

Happy coding!


Project Information

Tested for use with: PreRelease-SDK
