This article describes the design of the database management system (DBMS) in the Suneido integrated application platform.
Suneido has an integrated client-server relational database. The
database is accessed via a language which includes administration
requests and update operations as well as actual queries. The query
language is based on the relational algebra language in An Introduction
to Database Systems by C.J.Date.
The database can store the following types of values:
- booleans (true or false)
- strings (including binary e.g. image data)
- number
- dates
- Suneido objects (i.e. arrays or records)
Suneido's DBMS, like its language, is dynamically typed i.e. database
columns (fields) do not have fixed types - they can hold any type of
value. Validating data is left up to the application. All fields and
records are variable length.
Suneido stores the entire
database as a single operating system file. This includes the schema
(table layout) information, data records, indexes, and concurrency &
recovery information. The database is accessed as a memory mapped file.
The database can operate in single-user local mode, or in
multi-user client-server mode. TCP/IP is used to communicate between the
clients and the server.
System Tables
Information about the database (the schema) is stored in the following system tables:
tables - lists the tables that exist
columns - lists the columns for each table
indexes - lists the keys and indexes for each table
triggers - lists the summarize triggers for each table
views - lists the view definitions
These tables may only be altered by the system. However, they can be read from just like any other table.
Administration
The following administrative requests allow you to create, alter, rename, and destroy database tables.
create
Every table must specify at least one key - one or more columns that
uniquely identify records. You are not required to identify a "primary
key". You cannot add a record with a key that already exists in the
table.
create customers (name, phone, fax, email) key(name)
If a table should only have at most one record you can specify an empty key:
create configuration (option1, option2, option3) key()
Tables can also have "indexes". These are not part of the "logical"
design of the database; they serve simply to speed up queries. Suneido
will automatically use indexes to execute queries faster. For example,
if you often searched for customers by phone number:
create customers (name, phone, fax, email) key(name) index(phone)
ensure
Ensure has the same syntax as create, with "create" replaced by
"ensure". If the table doesn't exist, it is the same as create.
Otherwise. it compares the specification with the existing table and
adds any columns, keys, indexes, or summarizes that don't exist. It does
not remove any extra columns, keys, indexes, or
summarizes. In other words, it ensures that the table has at least the
specified schema. For example:
ensure customers (name, phone, fax, email) key(name)
alter
Alter can be used to add or remove columns, keys, indexes, or summarizes, with a syntax similar to create and ensure.
alter customers create (city) index(city)
alter customers delete (fax)
rename
Rename tables, for example:
rename calls to phone_calls
view
Define a view, for example:
view local_customers = customers where city = 'Saskatoon'
or:
view calls = phone_calls union fax_calls
Views permit an application's "logical" view of the database to differ
from the "physical" design. For example, the application can refer to a
view as if it is a simple table, regardless of whether it is actually
implemented as a single table or a union, or a where.
Currently only some views are update-able, depending on their query.
Update-able queries include WHERE, PROJECT that includes a key, JOIN
that is one to one, RENAME, and EXTEND. Theoretically, according to
C.J.Date, all queries are update-able.
destroy
Destroy a table. For example:
destroy customers
Destroy can also be used to un-define a view. This does not affect the actual data.
destroy local_customers
Foreign Keys
A foreign key consists of one or more columns in one table that
uniquely identify a row in another table, i.e. are a key in the other
table. Foreign keys place constraints both on the source table - foreign
key fields must contain values that exist in the other table, and on
the target table - you can't delete or update records if doing so would
make source records invalid.
create calls (date, customer, comments, minutes) key(date)
index(customer) in customers(name)
You can also specify cascading deletes and updates. For example, if you
delete an invoice header record, you can have the invoice line records
automatically deleted. Or if you update the invoice number on the
header, you can have the invoice line items automatically updated. For
example:
create invoice_lines ... index (invoice_num) in invoices cascade
summarize
Summarize creates and maintains stored summaries of a table.
For example:
alter calls create summarize(customer, count, total
minutes)
would create a table called _summarize_calls_customer which would store
the count of calls and total minutes of calls for each customer. This
table would be automatically updated as records are added, updated, or
deleted from the calls table.
See also: Queries - summarize.
Queries
where (select/restrict)
Produces only those records that match certain criteria.
For example:
customers where city = "Saskatoon"
would produce only the customer records where the city was Saskatoon.
Where expressions are a subset of the expressions allowed in the
Suneido language, including arithmetic and string operations, and also
allow calling user defined functions.
project
Extract specified columns and eliminate any resulting duplicates.
For example:
customers project city
would produce a list of cities where you have customers.
rename
Rename columns - useful for join, leftjoin, union, minus, and intersect.
For example:
(sales rename salesman to employee) join staff
extend
Add calculated columns. For example:
sales extend amount = quantity * price
join (natural eq-join)
The result of join is a table with all the columns from the input
queries (without duplicates) and with the set of rows formed by
combining each pair of rows with equal common columns. The input queries
must have at least one column in common.
For example:
customers join sales
Join excludes rows from the first table that do not have a matching row in the second table.
leftjoin (left outer natural eq-join)
Similar to join except leftjoin includes rows from the first table that
do not have a matching row in the second table. These rows will have
empty ("") values for the columns of the second table.
For example:
customers leftjoin sales
will include all customers, whether they had sales or not.
times (product)
The result of times is a table with all the columns from both queries
and with the set of rows formed by combining each possible pair of rows
from the queries.
For example, to get all possible combinations of city and items:
city times items
summarize
Summarize a table by specified columns, with added columns for accumulated functions (count, total, max, min, average).
For example, to get the total sales for each city:
summarize sales city, total price
See also: Administration - summarize
union
minus (difference)
intersect
These are the standard set operations. Union eliminates any duplicates.
These operations require that their inputs have the same set of columns
- project, extend, and rename are often useful to achieve this.
For example, to get a combined list of cities for your customers and suppliers:
(customers project city) union (suppliers project city)
Note:
Although many of the example above show the query operations being
applied to tables, the real power of the query language comes from being
able to apply operations to any other query, not just tables.
Query Optimization
Query optimization has two main phases. The first phase applies some
standard tranformations to the query that are almost always
advantageous. For example, moving where's towards tables and combining
adjacent operations. In the second phase, operations choose appropriate
strategies, indexes, and temporary indexes based on estimated costs.
Data sizes are estimated using the indexes. Some operations have
multiple strategies (e.g. project) they can use, other operations (e.g.
rename) have only a single strategy.
Updating
Database update requests can be used to modify the database.
For example, to insert a record into a table:
insert { name: 'Fred', salary: 37500 } into employees
or to insert records from a query into a table:
insert sales where city = "Fargo" into fargo_sales
or to update all the records from a query:
update parts where category = 2 set price = price * 1.1
or to delete all the records from a query:
delete sales where price < .10
Rules
Unlike many systems, which limit business rules to constraints,
Suneido's business rules support a variety of uses including supplying
default values to fields, performing calculations, and summarizing other
data. Business rules have many advantages. They keep your business
logic separate from your user interface and reports, enable code re-use,
and allow your code to be written in smaller modules that are easier to
test and maintain.
You can define rules for fields by
defining functions called Rule_fieldname. When you access a field that
the record does not contain, if there is a rule it will be called. If
the rule returns a value, it will be stored in that field of the record.
When rules are executed, Suneido automatically tracks their
dependencies on other fields they access. If a dependency is changed,
then the rule field is invalidated. This means that the next time the
field is accessed, the rule will be executed again. Dependencies can be
stored in the database (by creating a field called fieldname_deps) so
that when old records are manipulated, rules will be triggered just as
on new records. Invalidations also trigger record.Observer - this is
used to update the user interface when records change. Invalidations do
not affect non-rule values. i.e. if the user has overridden a derived
value, then the rule on that field will no longer be triggered. Rules
can be used without actually storing the values, or calculated columns
can be stored in the database. Rules can also be used to adjust user
interface controls.
Triggers
Whenever records
are output, updated, or deleted from a table, the system checks for a
user defined trigger named "Trigger_" followed by the table name.
Triggers are only called after the operation succeeds. i.e. If the
output, update, or delete fails, the trigger will not be called.
Triggers can be used to maintain secondary tables such as summaries.
History
The "history" for a table can be accessed with history(table). Any
records removed from a table by deleting or updating will automatically
appear in the history table. The history has two additional columns,
_date is the timestamp when that version of the record was deleted (or
updated), and _action is whether the record was created or deleted.
Note: Since Suneido needs old versions of records for concurrency and recovery, this history facility is virtually "free".
Concurrency
Suneido's DBMS can operate in one of two modes - single-user local
mode, or multi-user client-server mode. In either case, the database
file itself is only ever accessed by a single program exclusively, so no
operating system file locking is required.
All access to
the database must be done within transactions. Transactions can be
either read-only or update. Transactions see the database as of their
start time, as if they were viewing a "snapshot" of the database.
Suneido uses a multi-version, optimistic, timestamp-based approach to
concurrency, which provides full transaction isolation, i.e. is
serializable. Because of this, read-only transactions (e.g. reports)
always succeed - they will never conflict with other transactions.
Update transactions check for conflicts with other transactions when
they complete, and fail (rollback) if conflicts are found.
On-line backups are done using a single read-only transaction to get a
"snap-shot" of the entire database without interfering with use of the
database.
Recovery
Suneido uses the database
itself as its "log" i.e. to keep track of updates, both to support
concurrency (e.g. multi-version and rollback) and to support recovery.
This is possible because data records are only ever added at the end of
the database file, never updated in place. This means that recovery
(e.g. from a crash) is largely a matter of determining what portion of
the end of the file must be discarded in order to rollback to a
consistent state. Since indexes are updated in-place for
efficiency, they are ignored by the recovery process and rebuilt from
the data. Checksums are used to determine the integrity of the database.
References
An Introduction to Database Systems, C.J.Date