Berkeley DB's Java API
$Id: README,v 11.2 2003/03/06 00:42:16 mjc Exp $
Berkeley DB's Java API is now generated with SWIG
(http://www.swig.org). This document describes how SWIG is used:
what we trust it to do, and what we needed to work around.
Overview
========
SWIG is a tool that generates wrappers around native (C/C++) APIs for
various languages (mainly scripting languages) including Java.
By default, SWIG creates an API in the target language that exactly
replicates the native API (for example, each pointer type in the API
is wrapped as a distinct type in the language). Although this
simplifies the wrapper layer (type translation is trivial), it usually
doesn't result in a natural API in the target language.
A further constraint for Berkeley DB's Java API was backwards
compatibility. The original hand-coded Java API is in widespread use,
and included many design decisions about how native types should be
represented in Java. As an example, callback functions are
represented by Java interfaces that applications using Berkeley DB
could implement. The SWIG implementation was required to maintain
backwards compatibility for those applications.
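As a rough illustration of the callback style described above, the
sketch below shows a comparison callback expressed as a Java interface
that an application implements. The names KeyComparator, compareKeys,
SortedStore and LengthFirstComparator are hypothetical stand-ins chosen
for this example; they are not the actual com.sleepycat.db types.

```java
// Hypothetical stand-in for a callback interface; not a real Berkeley DB type.
interface KeyComparator {
    int compareKeys(byte[] a, byte[] b);
}

// Hypothetical handle that keeps a reference to the application's callback,
// much as a database handle would before the callback is invoked.
final class SortedStore {
    private KeyComparator comparator;

    void setComparator(KeyComparator comparator) {
        this.comparator = comparator;
    }

    // Helper that a wrapper layer could call for each comparison.
    int compare(byte[] a, byte[] b) {
        if (comparator == null) {
            throw new IllegalStateException("no comparator registered");
        }
        return comparator.compareKeys(a, b);
    }
}

// Application-side implementation: orders keys by length, then by
// unsigned byte value.
final class LengthFirstComparator implements KeyComparator {
    public int compareKeys(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return a.length < b.length ? -1 : 1;
        }
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }
}
```

Because the application only sees the interface, the code behind the
handle can change without breaking existing implementations.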
Running SWIG
============
The simplest way to use SWIG is to run it with a C include file as
input. SWIG parses the file and generates wrapper code for the target
language. For Java, this includes a Java class for each C struct and
a C source file containing the Java Native Interface (JNI) function
calls for each native method.
The s_swig shell script in db/dist runs SWIG, and then post-processes
each Java source file with the sed commands in
libdb_java/java-post.sed. The Java sources are placed in
java/src/com/sleepycat/db, and the native wrapper code is in a single
file in libdb_java/db_java_wrap.c.
The post-processing step modifies code in ways that are difficult with
SWIG (given my current level of knowledge). This includes changing
some access modifiers to hide some of the implementation methods,
selectively adding "throws" clauses to methods, and adding calls to
"initialize" methods in Db and DbEnv after they are constructed (more
below on what these calls do).
In addition to the source code generated by SWIG, some of the Java
classes are written by hand, and constants and code to fill statistics
structures are generated by the script dist/s_java. The native
statistics code is in libdb_java/java_stat_auto.c, and is compiled
into the db_java_wrap object file with a #include directive. This
allows most functions in that object to be static, which encourages
compiler inlining and reduces the number of symbols we export.
The Implementation
==================
For the reasons mentioned above, Berkeley DB requires a more
sophisticated mapping between the native API and Java, so additional
SWIG directives are added to the input. In particular:
* The general intention is for db.i to contain the full DB API (just
like db.h). As much as possible, this file is kept Java independent
so that it can be updated easily when the API changes. SWIG doesn't
have any built-in rules for how to handle function pointers in a
struct, so each DB method must be added in a SWIG "%extend" block
which includes the method signature and a call to the method.
* SWIG's automatically generated function names happen to collide
with Sleepycat's naming convention. For example, in a SWIG class
called __db, a method called "open" would result in a wrapper
function called "__db_open", which already exists in DB. This is
another reason why making these static functions is important.
* The main Java support starts in db_java.i - this file includes all
Java code that is explicitly inserted into the generated classes,
and is responsible for defining object lifecycles (handling
allocation and cleanup).
* Methods that need to be wrapped for special handling in Java code
are renamed with a trailing zero (e.g., close becomes close0).
This is invisible to applications.
* Most DB classes that are wrapped have method calls that imply the
cleanup of any native resources associated with the Java object
(for example, Db.close or DbTxn.abort). These methods are wrapped
so that if the object is accessed after the native part has been
destroyed, an exception is thrown rather than a trap that crashes
the JVM.
* Db and DbEnv initialization is more complex: a global reference is
stored in the corresponding struct so that native code can
efficiently map back to Java code. In addition, if a Db is
created without an environment (i.e., in a private environment),
the initialization wraps the internal DbEnv to simplify handling
of various Db methods that just call the corresponding DbEnv
method (like err, errx, etc.). It is important that the global
references are cleaned up before the DB and DB_ENV handles are
closed, so the Java objects can be garbage collected.
* In the case of DbLock and DbLsn, there are no such methods. In
these cases, there is a finalize method that does the appropriate
cleanup. No other classes have finalize methods (in particular,
the Dbt class is now implemented entirely in Java, so no
finalization is necessary).
* Overall initialization code, including the System.loadLibrary call,
is in java_util.i. This includes looking up all class, field and
method handles once so that execution is not slowed down by repeated
runtime type queries.
* Exception handling is in java_except.i. The main non-obvious design
choice was to create a db_ret_t type for methods that return an
error code as an int in the C API, but return void in the Java API
(and throw exceptions on error).
* The only other odd case with exceptions is DbMemoryException -
this is thrown as normal when a call returns ENOMEM, but there is
special handling for the case where a Dbt with DB_DBT_USERMEM is
not big enough to handle a result: in this case, the Dbt handling
code calls the method update_dbt on the exception that is about to
be thrown to register the failed Dbt in the exception.
* Statistics handling is in java_stat.i - this mainly just hooks into
the automatically-generated code in java_stat_auto.c.
* Callbacks: the general approach is that Db and DbEnv maintain
references to the objects that handle each callback, and have a
helper method for each call. This is primarily to simplify the
native code, and performs better than more complex native code.
* One difference with the new approach is that the implementation is
more careful about calling DeleteLocalRef on objects created for
callbacks. This is particularly important for callbacks like
bt_compare, which may be called repeatedly from native code.
Without the DeleteLocalRef calls, the Java objects that are
created cannot be collected until the original call returns.
* Most of the rest of the code is in java_typemaps.i. A typemap is a
rule describing how a native type is mapped onto a Java type for
parameters and return values. These handle most of the complexity
of creating exactly the Java API we want.
* One of the main areas of complexity is Dbt handling. The approach
taken is to accept whatever data is passed in by the application,
pass that to native code, and reflect any changes to the native
DBT back into the Java object. In other words, the Dbt typemaps
don't replicate DB's rules about whether Dbts will be modified or
not - they just pass the data through.
* As noted above, when a Dbt is "released" (i.e., no longer needed
in native code), one of the checks is whether a DbMemoryException
is pending, and if so, whether this Dbt might be the cause. In
that case, the Dbt is added to the exception via the "update_dbt"
method.
* Constant handling has been simplified by making DbConstants an
interface. This allows the Db class to inherit the constants, and
most can be inlined by javac.
* The danger here is if applications are compiled against one
version of db.jar, but run against another. This danger existed
previously, but was partly ameliorated by a separation of
constants into "case" and "non-case" constants (the non-case
constants were arranged so they could not be inlined). The only
complete solution to this problem is for applications to check the
version returned by DbEnv.get_version* versus the Db.DB_VERSION*
constants.
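The version check suggested in the last item could look roughly like
the sketch below. It is self-contained so that it runs without db.jar:
CompiledVersion and VersionCheck are hypothetical names, and the
version numbers are illustrative values only.

```java
// Stand-in for the constants an application was compiled against. javac
// copies compile-time constants like these into the application's classes.
interface CompiledVersion {
    int MAJOR = 4; // hypothetical value, for illustration only
    int MINOR = 2; // hypothetical value, for illustration only
}

final class VersionCheck {
    private VersionCheck() {}

    // True when the library found at run time matches the compiled-in version.
    static boolean matches(int compiledMajor, int compiledMinor,
                           int runtimeMajor, int runtimeMinor) {
        return compiledMajor == runtimeMajor && compiledMinor == runtimeMinor;
    }

    // Fail at startup rather than run with stale inlined constants.
    static void require(int runtimeMajor, int runtimeMinor) {
        if (!matches(CompiledVersion.MAJOR, CompiledVersion.MINOR,
                     runtimeMajor, runtimeMinor)) {
            throw new IllegalStateException(
                "compiled against " + CompiledVersion.MAJOR + "."
                + CompiledVersion.MINOR + " but running against "
                + runtimeMajor + "." + runtimeMinor);
        }
    }
}
```

In a real application, the run-time values would come from the
DbEnv.get_version* methods and the compiled-in values from the
Db.DB_VERSION* constants.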
Application-visible changes
===========================
* The new API is around 5x faster for many operations.
* Some internal methods and constructors that were previously public
have been hidden or removed.
* A few methods that were inconsistent have been cleaned up (e.g.,
Db.close now returns void; it previously returned an int that was
always zero). The synchronized attribute has been toggled on some
methods - this is an attempt to prevent multi-threaded applications
from shooting themselves in the foot by calling close() or similar
methods concurrently from multiple threads.
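The sketch below illustrates, under stated assumptions, how a
synchronized close() that wraps a renamed close0() can turn use after
close into a Java exception. GuardedHandle and its members are
hypothetical, the exception type is an assumption, and close0() only
simulates the native call.

```java
// Hypothetical handle illustrating the close()/close0() wrapping pattern.
final class GuardedHandle {
    private boolean open = true;  // stands in for a valid native pointer
    private int nativeCloses = 0; // counts simulated native cleanups

    // Stand-in for the generated native method; a real one would call into C.
    private void close0() {
        nativeCloses++;
    }

    // synchronized, so two threads cannot both release the native side.
    synchronized void close() {
        if (!open) {
            throw new IllegalStateException("handle already closed");
        }
        open = false;
        close0();
    }

    // Any other operation checks the handle before touching native state.
    synchronized int read() {
        if (!open) {
            throw new IllegalStateException("handle used after close");
        }
        return 42; // placeholder result
    }

    synchronized int nativeCloseCount() {
        return nativeCloses;
    }
}
```

With this shape, a second close() or a later read() fails with an
ordinary Java exception, and the simulated native cleanup runs once.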