JDBC Queue (for lack of a better name)

JDBC queue is a library for writing transactional, messaging-oriented software. It does this by managing tasks in a set of queues in a plain SQL database. It has so far only been tested with PostgreSQL but should be fairly portable.

It consists of three major parts:

A core queue part which implements CRUD access to the queues, tasks and configuration.
An async part that works on top of a single queue. It controls a consumer thread and dispatches tasks to a normal Java Executor.
A Spring layer that integrates connection and transaction handling with the standard Spring tools.

The queue interface is intended to be used by:

Management code that wants to get queue statistics or reconfigure the queues
Cron jobs that want to consume everything that has been scheduled
Applications that are run just to insert a small number of tasks

The async layer provides a JMS-like interface for each queue. It creates a consumer thread that polls the database at a specified interval, marks each task it finds for processing and passes it on to the executor. By using a multi-threaded executor it can scale up quite easily.
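The polling loop described above can be sketched roughly as follows. This is not the library's actual API: `TaskStore`, `claimNext` and `PollingConsumer` are hypothetical names, tasks are reduced to plain strings, and the store is an interface so that an in-memory implementation can stand in for the database.

```java
import java.util.Optional;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the queue's storage; a real one would run SQL.
interface TaskStore {
    // Returns the next new task after marking it for processing, if there is one.
    Optional<String> claimNext();
}

final class PollingConsumer {
    private final TaskStore store;
    private final Executor workers;
    private final Consumer<String> handler;
    private final ScheduledExecutorService poller =
            Executors.newSingleThreadScheduledExecutor();

    PollingConsumer(TaskStore store, Executor workers, Consumer<String> handler) {
        this.store = store;
        this.workers = workers;
        this.handler = handler;
    }

    // One polling pass: claim every available task and hand it to the executor.
    int pollOnce() {
        int dispatched = 0;
        Optional<String> task;
        while ((task = store.claimNext()).isPresent()) {
            String payload = task.get();
            workers.execute(() -> handler.accept(payload));
            dispatched++;
        }
        return dispatched;
    }

    // Starts the consumer thread; it polls again intervalMillis after each pass ends.
    // Note that an exception escaping pollOnce() cancels all further passes.
    void start(long intervalMillis) {
        poller.scheduleWithFixedDelay(this::pollOnce, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        poller.shutdownNow();
    }
}
```

Passing a fixed thread pool (for example `Executors.newFixedThreadPool(8)`) as `workers` gives the multi-threaded behaviour; the single consumer thread only claims and dispatches.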

The Spring layer makes sure that the parts play along nicely with the existing JDBC/JPA/Hibernate code that you already have.

Features

Transactionality: each task is performed in an SQL transaction, ensuring consistency between the task table and the other tables used when processing the task.
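As an illustration of what this guarantee amounts to at the JDBC level, here is a minimal sketch. `TransactionalTaskRunner` and `TaskBody` are hypothetical names, not part of the library; only the standard `java.sql.Connection` calls are real. The task body is expected to update both the task row and any application tables through the same connection, so that everything commits or rolls back together.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class TransactionalTaskRunner {
    // Hypothetical callback: does the task's work and updates the task row,
    // all through the connection it is given.
    interface TaskBody {
        void run(Connection connection) throws SQLException;
    }

    // Runs the body in one transaction: commit on success, rollback on failure.
    static void runInTransaction(Connection connection, TaskBody body) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            body.run(connection);
            connection.commit();
        } catch (SQLException | RuntimeException e) {
            // Undo both the task-table change and the application changes.
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

With the Spring layer this bookkeeping would normally be left to Spring's transaction management rather than written by hand.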

A task has:

state
parent
created_date
last_updated
completed_date

Each task has an optional parent reference: this allows you to trace the messages around in your system to see what effects each task had.
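To make the tracing idea concrete, the sketch below walks the parent references from a task back to the task that started the chain. It is an in-memory illustration only: `TaskRow` and `TaskTrace` are hypothetical names, the `id` field is an assumption (the field list above does not name a key), the state value `NEW` is made up for the example, and the date columns are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory view of a task row.
final class TaskRow {
    final long id;        // assumed primary key
    final Long parent;    // id of the parent task, or null for a root task
    final String state;

    TaskRow(long id, Long parent, String state) {
        this.id = id;
        this.parent = parent;
        this.state = state;
    }
}

final class TaskTrace {
    // Returns the ids from the given task up to its root, child first.
    static List<Long> ancestry(Map<Long, TaskRow> tasksById, long taskId) {
        List<Long> chain = new ArrayList<>();
        Long current = taskId;
        while (current != null) {
            if (chain.contains(current)) {
                throw new IllegalStateException("cycle in parent references at task " + current);
            }
            TaskRow row = tasksById.get(current);
            if (row == null) {
                break; // unknown task or dangling parent reference: stop here
            }
            chain.add(current);
            current = row.parent;
        }
        return chain;
    }
}
```

Against a real database the same walk could be done with one query per step, or with a recursive query where the database supports it.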

"queue system" allowing multiple queue systems to be run in a single JVM

Push: Intra-JVM notification of new elements on a queue for instant processing.

Implementation

Performance

Use this library if you want correctness and manageability over speed.

Possible improvements

Batch processing of tasks in a single transaction: let the consumer thread fetch a batch of N tasks, set all of them to PROCESSING in a transaction and send the batch to a processor thread which will process all of them in one transaction.

This will significantly reduce the number of transactions required, thus increasing speed. A possible issue is that if one of the tasks fails it will abort the entire transaction. If this happens consistently it can keep all of the tasks from completing, so some sort of mechanism to only pick tasks that haven't failed before might be useful.
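One possible way to keep a single bad task from blocking a whole batch is sketched below. Note that it uses a different mechanism from the one suggested above: instead of filtering out tasks that have failed before, it retries each task in its own transaction once the batch transaction has failed. `BatchProcessor` and `Transactions` are hypothetical names; `Transactions` is assumed to roll back and return false when the body throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchProcessor {
    // Hypothetical: runs the body in one transaction and returns false if
    // the transaction had to be rolled back.
    interface Transactions {
        boolean inTransaction(Runnable body);
    }

    // Processes the batch in one transaction if possible. If that fails,
    // falls back to one transaction per task and returns the tasks that
    // still failed on their own.
    static <T> List<T> process(List<T> batch, Consumer<T> handler, Transactions tx) {
        List<T> failed = new ArrayList<>();
        boolean batchOk = tx.inTransaction(() -> batch.forEach(handler));
        if (batchOk) {
            return failed;
        }
        // The rollback undid the partial work, so every task is run again.
        for (T task : batch) {
            if (!tx.inTransaction(() -> handler.accept(task))) {
                failed.add(task);
            }
        }
        return failed;
    }
}
```

In the good case a batch of N tasks costs one transaction; in the bad case it costs N + 1, which is only slightly worse than processing tasks one by one.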

Error handling strategies: Currently there is no retrying or anything smart around tasks that fail. This definitely needs to be improved.

A generic class that re-schedules a task for execution and can be used as a TimerTask might be useful.

Support locking rows instead of extra states: This might significantly improve performance and reduce write pressure on the database.
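One way to read this idea on PostgreSQL is to claim tasks with a row lock instead of first writing a PROCESSING state. The sketch below is an assumption about how that could look, not something the library does today: the table and column names (`task`, `queue`, `state`, `id`) and the state value `NEW` are hypothetical, and `FOR UPDATE SKIP LOCKED` requires PostgreSQL 9.5 or later.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class RowLockingClaim {
    // Rows locked by another consumer are skipped instead of waited for.
    static final String CLAIM_SQL =
            "SELECT id FROM task WHERE queue = ? AND state = 'NEW' "
          + "ORDER BY id LIMIT ? FOR UPDATE SKIP LOCKED";

    // Must run with auto-commit off: the row locks are held until the
    // surrounding transaction commits or rolls back.
    static List<Long> claim(Connection connection, String queue, int limit) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(CLAIM_SQL)) {
            statement.setString(1, queue);
            statement.setInt(2, limit);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    ids.add(rows.getLong(1));
                }
            }
        }
        return ids;
    }
}
```

The trade-off is that the claiming transaction has to stay open while the task is processed, so the task can no longer be handed to another thread with its own connection without extra care.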

Configurable state machine: Right now the possible states a task can be in are hard-coded.

Utilities to do routing: this library does not intend to compete with normal JMS servers or specialized tools like Apache Camel but it might still be useful to have some tools with the package:

A consumer that can be configured to replicate the task to a set of other queues creating a classic MQ topic.
A consumer that can be configured to replicate the task from this database to another. As this will span two transactions the operation has to be idempotent, but that should be doable. It might be useful to add some fields to a task that point to the remote task.
A consumer that takes tasks that have failed too many times and moves them to a dead letter queue.
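As a rough idea of the last of these, the sketch below moves exhausted tasks from one queue to a dead letter queue. It is purely illustrative: the queues are in-memory deques, `FailedTask` and `DeadLetterMover` are hypothetical names, and the per-task failure counter is an assumption, since the library currently has no retry bookkeeping.

```java
import java.util.Deque;
import java.util.Iterator;

// Hypothetical task view carrying an assumed failure counter.
final class FailedTask {
    final String payload;
    final int failureCount;

    FailedTask(String payload, int failureCount) {
        this.payload = payload;
        this.failureCount = failureCount;
    }
}

final class DeadLetterMover {
    // Moves every task that has failed at least maxFailures times to the
    // dead letter queue, keeping their order; returns how many were moved.
    static int moveExhausted(Deque<FailedTask> source, Deque<FailedTask> deadLetters, int maxFailures) {
        int moved = 0;
        for (Iterator<FailedTask> it = source.iterator(); it.hasNext();) {
            FailedTask task = it.next();
            if (task.failureCount >= maxFailures) {
                it.remove();
                deadLetters.addLast(task);
                moved++;
            }
        }
        return moved;
    }
}
```

With both queues in the same database the move can happen in a single transaction, so a task is never lost or duplicated along the way.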

Optional push notification between JVMs: use a simple MQ with in-memory storage to provide push notification after new tasks have been committed to the database. This will allow the system to behave like an RPC-like system, just with proper transactional semantics. The normal database poller can then be set to poll much less frequently, only to pick up old messages whose notification was lost.

Schema-dependent features: JDBC queue does not depend on a very specific schema; it mainly requires two tables with a certain set of columns. Features like the parent reference might not be useful for all applications, so it might be useful for a queue system to inspect the task table to see whether the column is there, and to fail if someone tries to create a task with a parent reference when it is not supported.

This might also be implemented in a simpler fashion by configuring it when creating the QueueSystem, so the app doesn't have to discover anything.