Gadfly SQL constructs

Version:

1.2

This document describes SQL constructs supported by Gadfly. The presentation does not define the complete syntax -- see sqlgram.py for the precise syntax as BNF -- nor the complete semantics -- see a good book on SQL for more detailed coverage of semantic (or use the source, Luke ;c) ). Also, please have a look at my evolving database course notes for more coverage of SQL. Examples of all supported constructs are also shown in the test suite source file gftest.py. This document is only a very brief guide, primarily of use to those who already understand something about SQL -- it is neither a tutorial nor a detailed discussion of syntax and semantics.

Contents

The Standard, with omissions
Statements
Conditions
Expressions

The Standard, with omissions

Gadfly supports a large subset of ODBC 2.0 SQL. The reason ODBC 2.0 is chosen is because it provides a fairly strong set of constructs, but does not include some of the more obscure features of other SQL standards which would be extremely difficult and complex to implement correctly (and perhaps, not used very frequently (?)).

Supported features include views, groupings, aggregates, subquery expressions, quantified subquery comparisons, EXISTS, IN, UNION, EXCEPT, INTERSECT, searched mutations and Indices, among others (see below).

Some important omissions from ODBC 2.0 at this point are

Nulls.
Outer joins.
CHECK conditions.
Enforced data type constraints.
Alter table (can't implement until NULLs arrive).
Date, Time, and Interval data types

It is hoped these will be implemented at some future time.

Less important omissions include

Cursor based updates and deletes: (justification: if you really need them the db design is flawed, and it's possible to use python instead).
LIKE string predicate: (justification: use Python regexes in python code).
Users and permissions: (justification: Gadfly is not intended for full multiuser use at this time).

These may or may not be implemented at some future time.

Statements

All interaction with SQL databases is mediated by SQL statements, or statement sequences. Statement sequences are statements separated by semicolons. SQL keywords and user defined names are not case sensitive (but string values are, of course).

SQL statements include the following.

Select Statement

The select statement derives a table from tables in the database. It's general form is:

sub_query
  optorder_by

Where sub_query is given by:

SELECT alldistinct select_list
FROM table_reference_list
optwhere
optgroup
opthaving
optunion

Read the statement:

SELECT [DISTINCT|ALL] expressions or *
FROM tables
[WHERE condition]
[GROUP BY group-expressions]
[HAVING aggregate-condition]
[union-clause]
[ORDER BY columns]

as follows:

Make all combinations of rows from the tables (FROM line)
Eliminate those combinations not satisfying condition (WHERE line)
(if GROUP present) form aggregate groups that match on group-expressions
(if HAVING present) eliminate aggregate groups that don't satisfy the aggregate-condition.
compute the columns to keep (SELECT line).
(if union-clause present) combine (union, except, intersect) the result with the result of another select statement.
if DISTINCT, throw out redundant entries.
(if ORDER present) order the result by the columns (ascending or descending as specified, with precedence as listed).

This reading has little to do with the actual implementation, but the answer produced should match this intuitive reading.

Create and drop table

The create and drop table constructs initialize and destroy a table structure, respectively:

CREATE TABLE user_defined_name ( colelts )

DROP TABLE user_defined_name

The colelts declare the names of the columns for the table and their data types. The data types are not checked or enforced in any way at this time.

Table mutations (INSERT, UPDATE, DELETE)

Insert, Update, and Delete statements insert rows into tables, modify rows in tables in place, or remove rows from tables respectively:

INSERT INTO table_name optcolids insert_spec

DELETE FROM user_defined_name optwhere

UPDATE user_defined_name
SET assns
optwhere

The insert statement has two variants (in this implementation) INSERT sub-select and INSERT VALUES:

insert into r (a,b,c) select a,b,c from s

insert into r (a,b,c) values (1,2,3)

The first inserts the result of a SELECT statement into the target table and the other inserts explicit values (which may be dynamic parameters, see below).

Cursor based updates are not supported at the SQL level, eg:

update r set a=1 where current of curs

is not supported.

Indices

The create and drop index statements initialize and destroy index structures respectively:

CREATE INDEX user_defined_name
ON user_defined_name
( namelist )

DROP INDEX user_defined_name

Indices allow fast access to a table, based on values for the indexed columns in the namelist.

Indices can be UNIQUE, meaning that the attributes of the index cannot take on the same values in the table twice:

CREATE UNIQUE INDEX user_defined_name
ON user_defined_name ( namelist )

Unique indices can be used to enforce primary and secondary key constraints. After a UNIQUE index on a table is created inserts that attempt to insert repeat values for the indexed columns will be rejected.

Views

Create view and drop view statements initialize and drop views, respectively:

CREATE VIEW user_defined_name optnamelist
AS select_statement

DROP VIEW user_defined_name

Views are "derived tables" which are defined as stored SELECT statements. They can be used as tables, except that they cannot be directly mutated.

It is possible to "implement your own views in Python". Please see remotetest.py, gfintrospect and the FAQ for discussion.

Conditions

Conditions are truth valued boolean expressions formed from basic conditions possibly combined using NOT, AND, OR (where NOT has highest precedence and OR has lowest precedence) and parentheses.

Basic conditions include simple comparisons:

expression = expression
expression < expression
expression > expression
expression <= expression
expression >= expression
expression <> expression

Variants of the simple comparisons are the quantified subquery comparisons:

expression = ANY ( subquery )
expression = ALL ( subquery )

(and similarly for the other comparison operators). The IN predicate tests membership (like =ANY):

expression IN ( subquery )
expression NOT IN ( subquery )

For all the quantified comparisons and IN the subquery must generate a single column table.

Also included are the the BETWEEN and NOT BETWEEN predicates:

expression BETWEEN expression AND expression
expression NOT BETWEEN expression AND expression

The most general subquery predicate is EXISTS and NOT EXISTS which places no restriction on the subquery:

EXISTS (subquery)
NOT EXISTS (subquery)

Expressions

Expressions occur in conditions (WHERE, HAVING, etc.), in UPDATE searched assignments, and in the select list of select statements.

Expressions are formed from primary expressions, possibly combined using the standard arithmetic operators and parentheses with the normal precedence.

Primary expressions include numeric and string literals. Numeric literals supported are the Python numeric literals. String constants are set off by apostrophies, where two apostrophe's in sequence represent an apostrophy in the string: 'SQL string literals ain''t pretty'

Column name expressions may be unqualified if they are unambiguous, or may be qualified with a table name or table alias:

bar
frequents.bar
f.bar

The rules for scoping of column names are not covered here. Column names in subqueries may refer to bindings in the query (or queries) that contain the sub-query.

Subquery expressions of form:

( select_statement )

must produce a single column and single row table.

Aggregate operations are only permitted in the select list or in the HAVING condition of SELECT statements (including subselects):

COUNT(*)
COUNT(expression)
AVG(expression)
MAX(expression)
SUM(expression)
MIN(expression)

and also including the non-standard extension MEDIAN:

MEDIAN(expression)

Aggregate operations can be applied to distinct values as in:

COUNT(DISTINCT expression)

The Dynamic expression "?" is a placeholder for a value bound at evaluation time (from Python values). See the gadfly overview doc "Dynamic Values" section for more details on the use of dynamic parameters.