Penlets.com provides resources for users and developers of the Pulse smart pen from LiveScribe.
Tutorial
Creating a Custom Vocabulary
Tutorial by Robert Hanson
If you are building a penlet that needs to "comprehend" what is being written, you may be better served by a custom vocabulary than by one of the built-in lexicons. For instance, a penlet that needs to recognize stock symbol names would fare pretty poorly without a custom vocabulary. In this tutorial I will show you how to create a custom lexicon file and use it from your penlet.
Lexicon or is it Vocabulary?
A lexicon is a stock of terms, a vocabulary. The title and introduction of this tutorial use the word "vocabulary" because it is widely understood, while the SDK refers to it as a "lexicon". The rest of this tutorial will use the word "lexicon" so that it is consistent with the SDK terminology.
To begin, create a new project or open an existing one. Add a new directory, /src/icr/; this is where your custom lexicon files will be placed.
To create a lexicon file, create a new file under /src/icr/ with whatever name you desire and give it a ".lex" extension. The file name must end in ".lex" in order for it to be processed.
When you build your penlet, all files matching /src/icr/*.lex will be processed and a compiled lexicon file created for each. The compiled file name will be /icr/LEX_[name].res, where "[name]" is the name of the original file without the file extension. So /src/icr/custom.lex would be compiled to /icr/LEX_custom.res.
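The naming rule above is mechanical enough to capture in a small helper. What follows is only a sketch: the class LexiconPaths and its method compiledPath are hypothetical names of my own, not part of the SDK or its build script, and the code sticks to String methods that should also be available on J2ME.

```java
// Hypothetical helper (not part of the SDK): derives the compiled resource
// path from a lexicon source path, following the naming rule described above.
final class LexiconPaths {
    private LexiconPaths() {}

    static String compiledPath(String sourcePath) {
        if (sourcePath == null || !sourcePath.endsWith(".lex")) {
            throw new IllegalArgumentException("Lexicon source files must end in .lex");
        }
        // Keep only the file name part of the path.
        String fileName = sourcePath.substring(sourcePath.lastIndexOf('/') + 1);
        // Strip the ".lex" extension to get [name].
        String name = fileName.substring(0, fileName.length() - ".lex".length());
        if (name.length() == 0) {
            throw new IllegalArgumentException("Lexicon file name is empty");
        }
        return "/icr/LEX_" + name + ".res";
    }
}
```

Calling LexiconPaths.compiledPath("/src/icr/custom.lex") gives the string you would pass when loading the compiled lexicon, so the path only has to be worked out in one place.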
Organizing lexicons into directories
You may create directories under /src/icr/ to organize large numbers of lexicons, but beware that this directory structure is flattened when the lexicons are compiled. Your /src/icr/stocks/stocks.lex will be compiled to /icr/LEX_stocks.res, and if you have two lexicon files with the same name in different directories, the build process will quietly include only one of them. Bottom line: avoid sub-directories when you can.
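Because the build drops one of the colliding files without complaint, it can be worth checking for duplicate file names yourself. Below is a minimal sketch of such a check, assuming you can supply the list of lexicon source paths as an array. The class LexiconCollisionCheck and its method are hypothetical and are not part of the SDK or its build script. It uses Hashtable and Vector without generics so that it stays within what J2ME offers.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical pre-build check (not part of the SDK or its build script).
final class LexiconCollisionCheck {
    private LexiconCollisionCheck() {}

    // Returns the source paths whose file name was already used by an earlier
    // path in the array, i.e. the ones that would collide once the directory
    // structure is flattened.
    static Vector findCollisions(String[] sourcePaths) {
        Hashtable seen = new Hashtable(); // file name -> first path that used it
        Vector collisions = new Vector();
        for (int i = 0; i < sourcePaths.length; i++) {
            String path = sourcePaths[i];
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            if (seen.containsKey(fileName)) {
                collisions.addElement(path);
            } else {
                seen.put(fileName, path);
            }
        }
        return collisions;
    }
}
```

An empty result means every lexicon has a unique file name. Anything in the returned Vector should be renamed before you build.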
The actual contents of the lexicon source file are simply the words that you want to match. A lexicon of stock ticker symbols would look like this:
ko msft tgt wmt mmm phm hpq ifx intc ...
With your custom lexicon in hand, it is time to look at how to use it in your application. If you aren't already familiar with how to use Intelligent Character Recognition (ICR) in your application, read the Using ICR tutorial first.
For the purposes of this tutorial I have created a method, initICRContext, that gets the ICR context and sets the resources. It assumes that this class implements the HWRListener interface. It does not do things like set up a StrokeListener. If you have a question about what any of this means, check out the Using ICR tutorial.
private ICRContext icrContext;

private void initICRContext ()
{
    this.icrContext = this.context.getICRContext(2000, this);
    Resource[] resources = {
        /* [1] default AK resource and the alpha SK */
        this.icrContext.getDefaultAlphabetKnowledgeResource(),
        this.icrContext.createSKSystemResource(ICRContext.SYSRES_SK_ALPHA),
        /* [2] the compiled custom lexicon */
        this.icrContext.createAppResource("/icr/LEX_custom.res"),
        /* [3] do not limit results to the lexicon */
        this.icrContext.createLKSystemResource(ICRContext.SYSRES_LK_OUT_OF_LEXICON)
    };
    this.icrContext.addResourceSet(resources);
}
For our selection of resources we use the default AK resource and the alpha SK [1]. The alpha SK makes sense in this case because our sample lexicon file only contains letters.
Next we get to the purpose of this tutorial, loading our custom lexicon [2]. As noted previously, the compiled lexicon will always be in the /icr/ directory, begin with "LEX_", and have an extension of ".res".
The last resource that we specify is ICRContext.SYSRES_LK_OUT_OF_LEXICON [3]. If you are using a fairly limited lexicon, it probably makes sense to use this. It informs the HWR engine that it should not limit itself to only the lexicon. Without it, the HWR engine will look for the best match in the lexicon, which may not be appropriate. For instance, the HWR engine might return "MSFT" when you write "NKFT" just because it was the closest match. By adding ICRContext.SYSRES_LK_OUT_OF_LEXICON you can increase the likelihood that a result returned from the HWR engine is accurate. I do, though, recommend a good dose of testing, because using this resource isn't always appropriate.
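Since the out-of-lexicon resource lets the engine return text that is not in your lexicon, a penlet that only cares about known words may want to check the result itself. Here is a minimal sketch of that check. The class KnownWords is hypothetical and not part of the SDK. It assumes you keep the same word list in your code (or load it yourself) and pass in whatever text your HWRListener callback receives.

```java
import java.util.Hashtable;

// Hypothetical helper (not part of the SDK): holds the same words as the
// lexicon source file so that recognized text can be checked against them.
final class KnownWords {
    private final Hashtable words = new Hashtable();

    KnownWords(String[] lexiconWords) {
        for (int i = 0; i < lexiconWords.length; i++) {
            // Store words trimmed and lower-cased so lookups ignore case.
            String word = lexiconWords[i].trim().toLowerCase();
            if (word.length() > 0) {
                words.put(word, Boolean.TRUE);
            }
        }
    }

    // True if the recognized text is one of the known lexicon words.
    boolean contains(String recognized) {
        if (recognized == null) {
            return false;
        }
        return words.containsKey(recognized.trim().toLowerCase());
    }
}
```

With the sample lexicon, contains("MSFT") is true and contains("NKFT") is false, so the penlet can decide for itself how to treat a result that falls outside the lexicon.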
Troubleshooting Tip #1: The filename shuffle
If your penlet fails to load, double-check that you specified the right path to the compiled lexicon. It is very easy to get the path wrong because of the file renaming done by the build script. Remember that your compiled lexicon is /icr/LEX_[name].res.
Troubleshooting Tip #2: Lexicon case-sensitivity and SK selection
Based only on my own experience, using lower-case in your lexicon file results in better matching. I have also found that for alpha-only lexicons, ICRContext.SYSRES_SK_ALPHA works best. ICRContext.SYSRES_SK_UPPER and ICRContext.SYSRES_SK_LOWER do not seem to work as expected. My advice is to experiment and QA your penlet very well.
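If you generate your lexicon file from another data source, the two observations above can be applied before the words are written out. The sketch below is one way to do that. The class LexiconWords and its methods are hypothetical, not part of the SDK, and "alpha-only" is assumed here to mean the letters a to z.

```java
// Hypothetical helper (not part of the SDK) for preparing lexicon words.
final class LexiconWords {
    private LexiconWords() {}

    // Trims the word and converts it to lower-case.
    static String normalize(String word) {
        return word.trim().toLowerCase();
    }

    // True if the normalized word is non-empty and contains only a-z.
    // A false result is a hint that the alpha SK may not suit this lexicon.
    static boolean isAlphaOnly(String word) {
        String normalized = normalize(word);
        if (normalized.length() == 0) {
            return false;
        }
        for (int i = 0; i < normalized.length(); i++) {
            char c = normalized.charAt(i);
            if (c < 'a' || c > 'z') {
                return false;
            }
        }
        return true;
    }
}
```

For example, normalize(" MSFT ") yields "msft", and isAlphaOnly("brk.b") is false because of the period.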
If you have found a good set of resources for a specific use case, let us know what you came up with.
Happy coding!
Project Information
Tested for use with: PreRelease-SDK