Chapter 4. Advanced Topics

Fred Toussi

HSQLDB Development Group

Copyright 2002-2004 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQLDB Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.

$Date: 2004/11/24 23:06:35 $

Table of Contents

Purpose
Connections
Connection properties
Properties Files
Server and Web Server Properties
Starting a Server from your application
Individual Database Properties
JDBC Stream Based Methods
Managing Database Connections
Memory and Disk Use
Cache Memory Allocation
Upgrading and Managing Databases
Upgrading Using the SCRIPT Command
Manual Changes to the .script File
Backing Up Databases

Purpose

Many questions repeatedly asked in Forums and mailing lists are answered in this guide. If you want to use HSQLDB with your application, you should read this guide. This document covers system related issues. For issues related to SQL see the SQL Issues chapter.

Connections

The normal method of accessing an HSQLDB database is via the JDBC Connection interface. An introduction to different methods of providing database services and accessing them can be found in the SQL Issues chapter. Details and examples of how to connect via JDBC are provided in our JavaDoc for jdbcConnection.

Version 1.7.2 introduces a uniform method of distinguishing between different types of connection, alongside new capabilities to provide access to multiple databases. The common driver identifier is jdbc:hsqldb: followed by a protocol identifier (mem: file: res: hsql: http: hsqls: https:) then followed by host and port identifiers in the case of servers, then followed by database identifier.

Table 4.1. Hsqldb URL Components

Driver and ProtocolHost and PortDatabase
jdbc:hsqldb:mem:
not available
accounts

Lowercase, single-word identifier creates the in-memory database when the first connection is made. Subsequent use of the same Connection URL connects to the existing DB.

The old form for the URL, jdbc:hsqldb:. creates or connects to the same database as the new form for the URL, jdbc:hsqldb:mem:.

jdbc:hsqldb:file:
not available
mydb
/opt/db/accounts
C:/data/mydb

The file path specifies the database file. In the above examples the first one refers to a set of mydb.* files in the directory where the javacommand for running the application was issued. The second and third examples refer to absolute paths on the host machine.

jdbc:hsqldb:res:
not available
/adirectory/dbname
Database files can be loaded from one of the jars specified as part of the Java command the same way as resource files are accessed in Java programs. The /adirectory above stands for a directory in one of the jars.
jdbc:hsqldb:hsql:
jdbc:hsqldb:hsqls:
jdbc:hsqldb:http:
jdbc:hsqldb:https:
//localhost
//192.0.0.10:9500
//dbserver.somedomain.com
/an_alias
/enrollments
/quickdb

The host and port specify the IP address or host name of the server and an optional port number. The database to connect to is specified by an alias. This alias is a lowercase string defined in the server.properties file to refer to an actual database on the file system of the server or a transient, in-memory database on the server. The following example lines in server.properties or webserver.properties define the database aliases listed above and accessible to clients to refer to different file and in-memory databases.

    database.0=file:/opt/db/accounts
    dbname.0=an_alias

    database.1=file:/opt/db/mydb
    dbname.1=enrollments

    database.2=mem:adatabase
    dbname.2=quickdb

The old form for the server URL, e.g., jdbc:hsqldb:hsql//localhost connects to the same database as the new form for the URL, jdbc:hsqldb:hsql//localhost/ where the alias is a zero length string. In the example below, the database files lists.* in the /home/dbmaster/ directory are associated with the empty alias:

    database.3=/home/dbmaster/lists
    dbname.3=

Connection properties

Each new JDBC Connection to a database can specify connection properties. The properties user and password are always required. In 1.7.2 the following optional properties can also be used.

Connection properties are specified either by establishing the connection via the:

    DriverManager.getConnection (String url, Properties info);

method call, or the property can be appended to the full Connection URL.

Table 4.2. Connection Properties

get_column_nametruecolumn name in ResultSet

This property is used for compatibility with other JDBC driver implementations. When true (the default), ResultSet.getColumnName(int c) returns the underlying column name

When false, the above method returns the same value as ResultSet.getColumnLabel(int column) Example below:

    jdbc:hsqldb:hsql://localhost/enrollments;get_column_name=false
                    

When a ResultSet is used inside a user-defined stored procedure, the default, true, is always used for this property.

ifexistsfalseconnect only if database already exists

Has an effect only with mem: and file: database. When true, will not create a new database if one does not already exist for the URL.

When false (the default), a new mem: or file: database will be created if it does not exist.

Setting the property to true is useful when troubleshooting as no database is created if the URL is malformed. Example below:

    jdbc:hsqldb:file:enrollments;ifexists=true

In addition, when a connection to an in-process database creates a new database, or opens an existing database (i.e. it is the first connection made to the database by the application), all the user-defined database properties can be specified as URL properties. This is used to specify properties to enforce more strict SQL adherence, or to change cache_scale or similar properties before the database files are created.

Properties Files

HSQLDB relies on a set of properties files for different settings. Version 1.7.0 streamlines property naming and introduces a number of new properties (in this document, all references to versions 1.7.0 also apply to versions 1.7.1 and 1.7.2 unless stated otherwise). This process will continue with future versions and the properties will be used in a hierarchical manner.

In all properties files, values are case-sensitive. All values apart from names of files or pages are required in lowercase (e.g. server.silent=FALSE will have no effect, but server.silent=false will work).

The properties files and the settings stored in them are as follows:

Table 4.3. Hsqldb Server Properties Files

File NameLocationFunction
server.properties the directory where the command to run the Server class is issued settings for running HSQLDB as a database server communicating with the HSQL protocol
webserver.properties the directory where the command to run the WebServer class is issued settings for running HSQLDB as a database server communicating with the HTTP protocol
<dbname>.properties the directory where all the files for a database are located settings for each particular database

Properties files for running the servers are not created automatically. You should create your own files that contain server.property=value pairs for each property.

The properties file for each database is generated by the database engine. This file can be edited after closing the database. In 1.7.2, some property values can be changed via SQL commands.

Server and Web Server Properties

In both server.properties and webserver.properties files, supported values and their defaults are as follows:

Table 4.4. Property File Properties

ValueDefaultDescription
server.database.0test the path and file name of the first database file to use
server.dbname.0""lowercase server alias for the first database file
server.urlid.0NONESqlTool urlid used by UNIX init script. (This property is not used if your are running Server/Webserver on a platform other than UNIX, or of you are not using our UNIX init script).
server.silenttrue no extensive messages displayed on console
server.tracefalseJDBC trace messages displayed on console

In 1.7.2, each server can serve up to 10 different databases simultaneously. The server.database.0 property defines the filename / path whereas the server.dbname.0 defines the lowercase alias used by clients to connect to that database. The digit 0 is incremented for the second database and so on. Values for the server.database.{0-9} property can use the mem:, file: or res: prefixes and properties as discussed above under CONNECTIONS. For example,

    database.0=mem:temp;sql.enforce_strict_size=true;

Values specific to server.properties are:

Table 4.5. Server Property File Properties

ValueDefaultDescription
server.port9001 TCP/IP port used for talking to clients. All databases are served on the same port.
server.no_system_exittrue no System.exit() call when the database is closed

Values specific to webserver.properties are:

Table 4.6. WebServer Property File Properties

ValueDefaultDescription
server.port80TCP/IP port used for talking to clients
server.default_pageindex.htmlthe default web page for server
server.root./the location of served pages
.<extension>? multiple entries such as .html=text/html define the mime types of the static files served by the web server. See the source for WebServer.java for a list.

All the above values can be specified on the command line to start the server by omitting the server. prefix.

Starting a Server from your application

If you want to start the server from within your application, as opposed to the command line or batch files, you should create an instance of Server or Web Server, then assign the properties in the form of a String and start the Server. An example of this can be found in the org.hsqldb.test.TestBase source.

Note

Upgrading: If you have existing custom properties files, change the values to the new naming convention. Note the use of digits at the end of server.database.n and server.dbname.n properties.

Individual Database Properties

Each database has its own <dbname>.properties file as part of a small group of files which also includes <dbname>.script and <dbname>.data. The properties files contain key/value pairs for some important settings.

In version 1.7.2 a new SQL command allows some user-accessible database properties to be modified as follows:

    SET PROPERTY "property_name" property_value

Properties that can be modified via SET PROPERTY are indicated in the table below. Other properties are indicated as PROPERTIES FILE ONLY and can be modified only by editing the .properties file after a shutdown and before a restart. The *.properties file can be edited after closing the database. Only the user-defined values listed below should ever be modified. Changing any other value will result in unexpected malfunction in database operations. Most of these values have been introduced for the new features since 1.7.0 and are listed below with their default values in different contexts:

Table 4.7. Database-specific Property File Properties

ValueDefaultDescription
readonlynowhole database is read-only

When true, the database cannot be modified in use. This setting can be changed to yes if the database is to be opened from a CD. Prior to changing this setting, the database should be closed with the SHUTDOWN COMPACT command to ensure consistency and compactness of the data. (PROPERTIES FILE ONLY)

hsqldb.files_readonlyfalsedatabase files will not be written to

When true, data in MEMORY tables can be modified and new MEMORY tables can be added. However, these changes are not saved when the database is shutdown. CACHED and TEXT tables are always readonly when this setting is true. (PROPERTIES FILE ONLY)

hsqldb.first_identity0first identity value for a new table

The first value assigned automatically to the identity column of new tables. (SET PROPERTY)

sql.enforce_sizefalsetrimming and padding string columns

When true, all CHARACTER and VARCHAR values that are in a row affected by an INSERT INTO or UPDATE statement are trimmed to the size specified in the SQL table definition. Also all char strings that are shorter than the specified size are padded with spaces. When false (default), stores the exact string that is inserted. (SET PROPERTY)

sql.enforce_strict_sizefalse size enforcement and padding string columns

Conforms to SQL standards. When true, all CHARACTER and VARCHAR values that are in a row affected by an INSERT INTO or UPDATE statement are checked against the size specified in the SQL table definition. An exception is thrown if the value is too long. Also all char strings that are shorter than the specified size are padded with spaces. When false (default), stores the exact string that is inserted. (SET PROPERTY)

sql.compare_in_localefalselocale used for sorting

CHARACTER and VARCHAR columns are by default sorted according to POSIX standards. Setting the value to true will result in sorting in the character set of the current JRE locale.

Changing this value for an existing database that contains cached tables will break the indexing and result in inconsistent operation. To avoid this, first change the value in the properties file, then open the database and issue the SHUTDOWN COMPACT command to recreate all the indexes.

If set true, this setting affects all the database in the JVM. (PROPERTIES FILE ONLY)

hsqldb.cache_scale14memory cache exponent

Indicates the maximum number of rows of cached tables that are held in memory, calculated as 3 *(2**value) (three multiplied by (two to the power value)). The default results in up to 3*16384 rows from all cached tables being held in memory at any time.

The value can range between 8-18. (SET PROPERTY). If the value is set via SET PROPERTY then it becomes effective after the next database SHUTDOWN or CHECKPOINT. (SET PROPERTY)

hsqldb.cache_size_scale10memory cache exponent

Indicates the average size of each row in the memory cache used with cached tables, calculated as 2**value (two to the power value). This result value is multiplied by the maximum number of rows defined by hsqldb.cache_scale to form the maximum number of bytes for all the rows in memory cache. The default results in 1024 bytes per row. This default, combined with the default number of rows, results in approximately 50MB of the .data file to be stored in the memory cache.

The value can range between 6-20. (SET PROPERTY). If the value is set via SET PROPERTY then it becomes effective after the next database SHUTDOWN or CHECKPOINT. (SET PROPERTY)

hsqldb.log_size200size of log when checkpoint is performed

The value is the size in megabytes that the .log file can reach before an automatic checkpoint occurs. A checkpoint and rewrites the .script file and clears the .log file. The value can be changed via the SET LOGSIZE nnn SQL command.

runtime.gc_interval0forced garbage collection

This setting forces garbage collection each time a set number of result set row or cache row objects are created. The default, "0" means no garbage collection is forced by the program.

This should not be set when the database engine is acting as a server inside an exclusive JVM. The setting can be useful when the database is used in-process with the application with some Java Runtime Environments (JRE's). Some JRE's increase the size of the memory heap before doing any automatic garbage collection. This setting would prevent any unnecessary enlargement of the heap. Typical values for this setting would probably be between 10,000 to 100,000. (PROPERTIES FILE ONLY)

hsqldb.nio_data_filetrue use of nio access methods for the .data file

When HSQLDB is compiled and run in Java 1.4, setting this property to false will avoid the use of nio access methods, resulting in reduced speed. If the data file is larger than 512MB when it is first opened, nio access methods are not used. Also, if the file gets larger than the amount of available computer memory that needs to be allocated for nio access, non-nio access methods are used.

(SET PROPERTY). If used before defining any CACHED table, it applies to the current session, otherwise it comes to effect after a SHUTDOWN and restart or CHECKPOINT.

textdb.*0default properties for new text tables

Properties that override the database engine defaults for newly created text tables. Settings in the text table SET <tablename> SOURCE <source string> command override both the engine defaults and the database properties defaults. Individual textdb.* properties are listed in the Text Tables chapter. (SET PROPERTY)

When connecting to an in-process database creates a new database, or opens an existing database (i.e. it is the first connection made to the database by the application), all the user-defined database properties listed in this section can be specified as URL properties.

Note

Upgrading: From 1.7.0, the location of the database files can no longer be overridden by paths defined in the properties file. All files belonging to a database should reside in the same directory.

JDBC Stream Based Methods

The JDBC interface has been significantly improved in 1.7.2. There is almost full support for JDBC2. Under JDK 1.4, there is also some support for JDBC3 methods. ResultSetMetaData methods are now fully supported, as are ParameterMetaData and Savepoint (JDBC3). When upgrading from previous versions, certain issues may arise as several JDBC methods that previously returned incorrect values or were not supported now return correct values. All changes have been documented in the Javadoc for the jdbcXXX classes.

Since 1.7.0, the ResultSet interface methods, getAsciiStream(), getUnicodeStream() and getCharacterStream() are supported to return byte or char values from CHARACTER columns and its variants. Complementary methods in PreparedStatement, setAsciiStream(), setUnicodeStream() and setCharacterStream() are also supported. Unlike most other databases, the getString() methods can be used to retrieve very long character strings and is faster than the stream based methods.

Managing Database Connections

In all running modes (server or in-process) multiple connections to the database engine are supported. In-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients.

Connection pooling software can be used to connect to the database but it is not generally necessary. With other database engines, connection pools are used for reasons that may not apply to HSQLDB.

  • To allow new queries to be performed while a time-consuming query is being performed in the background. This is not possible with HSQLDB as it blocks while performing the first query and deals with the next query once it has finished it.

  • To limit the maximum number of simultaneous connections to the database for performance reasons. With HSQLDB this can be useful only if your application is designed in a way that opens and closes connections for each small task.

  • To control transactions in a multi-threaded application. This can be useful with HSQLDB as well. For example, in a web application, a transaction may involve some processing between the queries or user action across web pages. A separate connection should be used for each session so that the work can be committed when completed or rolled back otherwise.

An application that is not both multi-threaded and transactional, such as an application for recording user login and logout actions, does not need more than one connection. The connection can stay open indefinitely and reopened only when it is dropped due to network problems.

When using an in-process database with versions prior to 1.7.2 the application program had to keep at least one connection to the database open, otherwise the database would have been closed and further attempts to create connections could fail. This is not necessary in 1.7.2, which does not automatically close an in-process database that is opened by establishing a connection. An explicit SHTDOWN command, with or without an argument, is required to close the database.

When using a server database (and to some extent, an in-process database), care must be taken to avoid creating and dropping JDBC Connections too frequently. Failure to observe this will result in unsuccessful connection attempts when the application is under heavy load.

Memory and Disk Use

Memory used by the program can be thought of as two distinct pools: memory used for table data, and memory used for building result sets and other internal operations. In addition, when transactions are used, memory is utilised for storing the information needed for a rollback.

Since version 1.7.1, memory use has been significantly reduced compared to previous versions. The memory used for a MEMORY table is the sum of memory used by each row. Each MEMORY table row is a Java object that has 2 slots for int or reference variables (10 slots in previous versions). It contains an array of objects for the fields in the row. Each field is an object such as Integer, Long, String, etc. In addition each index on the table adds a node object to the row. Each node object has 6 slots for int or reference variables (12 slots in previous versions). As a result, a table with just one column of type INTEGER will have four objects per row, with a total of 10 slots of 4 bytes each - currently taking up 80 bytes per row. Beyond this, each extra column in the table adds at least a few bytes to the size of each row.

The memory used for a result set row has fewer overheads (fewer slots and no index nodes) but still uses a lot of memory. All the rows in the result set are built in memory, so very large result sets may not be possible. In server mode databases, the result set memory is released from the server once the database server has returned the result set. In-process databases release the memory when the application program releases the java.sql.ResultSet object. Server modes require additional memory for returning result sets, as they convert the full result set into an array of bytes which is then transmitted to the client.

When UPDATE and DELETE queries are performed on CACHED tables, the full set of rows that are affected, including those affected due to ON UPDATE actions, is held in memory for the duration of the operation. This means it may not possible to perform deletes or updates involving very large numbers of rows of CACHED tables. Such operations should be performed in smaller sets.

When transactions support is enabled with SET AUTOCOMMIT OFF, lists of all insert, delete or update operations are stored in memory so that they can be undone when ROLLBACK is issued. Transactions that span hundreds of modification to data will take up a lot of memory until the next COMMIT or ROLLBACK clears the list.

Most JVM implementations allocate up to a maximum amount of memory (usually 64 MB by default). This amount is generally not adequate when large memory tables are used, or when the average size of rows in cached tables is larger than a few hundred bytes. The maximum amount of allocated memory can be set on the java ... command line that is used for running HSQLDB. For example, with Sun JVM version 1.3.0 the parameter -Xmx256m increases the amount to 256 MB.

1.7.2 uses a fast cache for immutable objects such as Integer or String that are stored in the database. In most circumstances, this reduces the memory footprint still further as fewer copies of the most frequently-used objects are kept in memory.

Cache Memory Allocation

With CACHED tables, the data is stored on disk and only up to a maximum number of rows are held in memory at any time. The default is up to 3*16384 rows. The hsqldb.cache_scale database property can be set to alter this amount. As any random subset of the rows in any of the CACHED tables can be held in the cache, the amount of memory needed by cached rows can reach the sum of the rows containing the largest field data. For example if a table with 100,000 rows contains 40,000 rows with 1,000 bytes of data in each row and 60,000 rows with 100 bytes in each, the cache can grow to contain nearly 50,000 rows, including all the 40,000 larger rows.

In 1.7.2, an additional property, hsqldb.cache_size_scale is introduced. This property, combined with the hsqldb.cache_scale property, puts a limit in bytes on the total size of rows that are cached. When the default values is used for both properties, the limit on the total size of rows is approximately 50MB. (This is the size of binary images of the rows and indexes. It translates to more actual memory, typically 2-4 times, used for the cache because the data is represented by Java objects.)

If memory is limited, the hsqldb.cache_scale or hsqldb.cache_size_scale database properties can be reduced. In the example above, if the hsqldb.cache_size_scale is reduced from 10 to 8, then the total binary size limit is reduced from 50MB to 12.5 MB. This will allow the number of cached rows to reach 50,000 small rows, but only 12,500 of the larger rows.

Upgrading and Managing Databases

Any database not produced with the release version of HSQLDB 1.7.2 must be upgraded to this version. This includes databases created with the ALPHA versions of 1.7.2. The instructions under the Upgrading Using the SCRIPT Command section should be followed in all cases.

Once a database is upgraded to 1.7.2, it can no longer be used with Hypersonic or HSQLDB 1.6.x, 1.7.0 or 1.7.1.

There may be some potential problems in the upgrade which should be resolved by editing the .script file:

  • Version 1.7.2 does not accept duplicate names for indexes.

  • Version 1.7.2 does not accept duplicate names for table columns.

  • Version 1.7.2 does not create the same type of index for foreign keys as previous versions.

Upgrading Using the SCRIPT Command

In all versions of HSQLDB and Hypersonic 1.43, the SCRIPT 'filename' command (used as an SQL query) allows you to save a full record of your database, including database object definitions and data, to a file of your choice. You can export a script file using the old version of the database engine and open the script as a database with 1.7.2.

Procedure 4.1. Upgrade Using SCRIPT procedure

  1. Open the original database in the old version of DatabaseManager

  2. Issue the SCRIPT command, for example SCRIPT 'newversion.script' to create a script file containing a copy of the database.

  3. You can now edit the newversion.script file to change any table and index definition so long as it is consistent with the data. Use the guidelines in the next section (Manual Changes to the .script File). Use a programming editor that is capable of handling very large files and does not wrap long lines of text.

  4. Use the 1.7.2 version of DatabaseManager to create a new database, in this example 'newversion' in a different directory.

  5. SHUTDOWN this database.

  6. Copy the newversion.script file from step 2 over the file of the same name for the new database created in 4.

  7. Try to open the new database using DatabaseManager.

  8. If there is any inconsistency in the data, the script line number is reported on the console and the opening process is aborted. Edit and correct any problems in the newversion.script before attempting to open again.

Manual Changes to the .script File

In 1.7.2 several ALTER TABLE commands are available to change the data structures and their names. However, if an old database cannot be opened due to data inconsistencies, or when there is no ALTER TABLE command for the particular change, manual editing of the SCRIPT file can sometimes be used.

Index and row data for CACHED tables is stored in the *.data file. Because of this, in 1.7.2, a new command, SHUTDOWN SCRIPT, has been introduced to save all the CACHED table data in the .script file and delete the .data and *.backup files. After issuing this command, you can make changes to the *.script file as listed below. This procedure can also be used for adding and removing indexes or constraints to CACHED tables when the size of the *.data file is over 1GB and the normal SQL commands do not work due to the *.data file growing beyond 2GB.

The following changes can be applied so long as they do not affect the integrity of existing data.

  • Names of tables, columns and indexes can be changed.

  • CREATE UNIQUE INDEX ... to CREATE INDEX ... and vice versa

    A unique index can always be converted into a normal index. A non-unique index can only be converted into a unique index if the table data for the column(s) is unique in each row.

  • NOT NULL

    A not-null constraint can always be removed. It can only be added if the table data for the column has no null values.

  • PRIMARY KEY

    A primary key constraint can be removed or added. It cannot be removed if there is a foreign key referencing the column(s).

  • COLUMN TYPES

    Some changes to column types are possible. For example an INTEGER column can be changed to BIGINT, or DATE, TIME and TIMESTAMP columns can be changed to VARCHAR.

Any other changes to data structures should be made only through the supported ALTER commands.

After completing the changes and saving the modified *.script file, you can open the database as normal.

Backing Up Databases

The data for each database consists of up to 5 files in the same directory. The endings are *.properties, *.script, *.data, *.backup and *.log (a file with the *.lck ending is used for controlling access to the database and should not be backed up). These should be backed up together. The files can be backed up while the engine is running but care should be taken that a CHECKPOINT or SHUTDOWN operation does not take place during the backup. It is more efficient to perform the backup immediately after a CHECKPOINT. The *.data file can be excluded from the backup. In this case, when restoring, a dummy *.data file is needed which can be an empty, 0 length file. The engine will expand the *.backup file to replace this dummy file if the backup is restored. If the *.data file is not backed up, the *.properties file may have to be modified to ensure it contain modified=yes instead of modified=no prior to restoration. If a backup immediately follows a checkpoint, then the *.log file can also be excluded, reducing the significant files to *.properties, *.script and *.backup. Normal backup methods, such as archiving the files in a compressed bundle can be used.