Version: |
1.2 |
To permit multiple processes to access and modify a single database instance, and to reduce the overhead per process of connecting to a Gadfly database a Gadfly database may be run in server mode. A Gadfly server can use a DBA (data base administrator) configured start-up script to set up optimized query accesses and certain forms of security.
For example to startup a server for the test database "test" in directory "_test_dir" (created by ./run_tests) use:
gfserve 2222 test _test_dir admin
or to start up the same server with some non-priviledged policies and some named prepared queries (as initialized in test/gfstest.startup(...)) use:
gfserve 2222 test _test_dir admin gfstest
In both cases the admin password for the server is "admin" and the server runs on port 2222. See the doc string for gadfly.server for more information on the command line arguments.
Only one process should directly access a gadfly database at once (not mediated by a server), so if a server is running, no other server for that database should be started and no other process should connect in "non-server" mode to that database.
There are several reasons to run a server: to allow multiple processes to access the same database; to allow password protected restricted access to the database by non-priviledged agents; and to permit faster access to the database by providing globally shared prepared statements. Using a server also eliminates the need to start up and load the database many times -- and startup time could be considerable if the database is large.
For example I imagine that simple Gadfly servers may be of use to implement database enabled CGI scripts, whereas the "non-server" Gadfly will only run with CGI scripts that do not modify the database, and the startup time for Gadfly might make those scripts unacceptibly slow if the database is large. Furthermore, by using the security features a Gadfly server could be configured to allow restricted data distribution across a network without compromising the integrity of the database.
The primary goal of Gadfly server security is to prevent accidental or malicious destruction of a database.
Security is arbitrated by policies. Policies have passwords that are never transmitted in clear text. However, a "captured" command could potentially be repeated by a hostile program even without knowing the password. It is not a good idea to run admin or other unrestricted commands on a network that may have hostile parties sniffing the network. As with the rest of the system I provide no guarantees, but for many purposes the level of security provided may be acceptible. To be specific passwords are used to generate md5 certificates for all server accesses (please see gadfly.gfsocket for implementation details).
A server always has an "admin" policy that is permitted to shutdown, restart, or force a checkpoint on the server. By default the admin policy also has the ability to run arbitrary SQL statements such as "drop table x". This ability can be disabled in a startup function if needed:
admin_policy.general_queries=0
Other policies can be created that have very restricted access. For example the following startup function initializes two policies beyond the admin policy that can only access certain tables in specific ways (from test/gfstest.py):
def startup(admin_policy, connection, Server_instance): """example startup script. add a policies test and test1 passwords same test1 is allowed to query the likess table by name test is allowed to update likes where drinker='nan' also add prepared query dumpwork to admin_policy. """ from gadfly.server import Policy admin_policy["dumpwork"] = "select * from work" test1 = Policy("test1", "test1", connection, queries=0) test = Policy("test", "test", connection, queries=0) test1["qlike"] = "select * from likes where drinker=?" test["updatenan"] = """ update likes set beer=?, perday=? where drinker='nan' """ test["getnan"] = """ select * from likes where drinker='nan' """ return {"test": test, "test1": test1}
Please see the doc string for gadfly.server for more information on creating startup functions.
A policy with queries disabled (queries=0) can only execute named queries. By using such policies a DBA can configure a server such that client programs can only read certain tables, can only update certain rows of certain tables in certain ways, and so forth.
Even policies with "unrestricted access" (queries=1) can provide performance benefits if they have associated named, prepared queries (like "dumpwork" above). At the moment the SQL parser slows down gadfly a bit, and prepared queries will only be parsed once for all clients. After the first access subsequent accesses may be noticably faster (10x faster in some cases), especially if the server has the kjbuckets builtin C module. However, with queries=1 the policy can execute any SQL statement.
NOTE: The server runs all accesses (once the complete message has been read from the network) serially -- there is no concurrent access permitted to a Gadfly instance at this time. For this reason a "large query" may cause the server to "freeze" and delay other accesses. Incomplete requests due to network delays or other problems will not freeze the server, however (sockets are polled using select.select).
NOTE: All server accesses run in "autocommit mode" at this time. A successful access automatically triggers a database commit (but an unsuccessful access will rollback).
As an optimization, however, checkpoints only occur occasionally, once per a given number of accesses, configurable by setting:
Server_instance.check_loop = 100
Servers can be started from the command line using the gadfly.server script interpretation (as shown above) or using gadfly.server.Server(...) from another program. See the doc strings and source for gadfly.server and test/gfstest.py for more information.
NOTE: this is current inaccurate - gfserver needs to be extended to provide this functionality
Servers can be shut down from the command line interpretation of gfclient or from another program using the gadfly.client.gfclient(...) class shutdown() method, but only using the admin policy with the admin password. For example to shut down the server started above:
gfclient shutdown 2222 admin
See the doc strings and source for gadfly.server and test/gfstest.py for more information.
Client access to a gadfly server is similar to the normal Python DB-SIG DBAPI access to gadfly, except that it is sometimes faster and can potentially be run from any machine reachable on the network (if the client program knows the password).
To access a gadfly server from a remote machine the only python modules required (in addition to the standard libraries) are gadfly.client and gadfly.gfsocket.
Initialize a connection with a given "POLICY" with "PASSWORD" to a running server on "machine.domain.com" using port number 2222 with: from gadfly.client import gfclient:
conn = gfclient("POLICY", 2222, "PASSWORD", "machine.domain.com")
Note that policy names and passwords are case sensitive.
Queries and other statements are normally executed via cursors. Obtain a cursor from a connection using:
cursor = connection.cursor()
Execute a statement in a cursor using:
cursor.execute(statement)
or to provide dynamic parameters:
cursor.execute(statement, dynamic_parameters)
For example:
cursor.execute("select * from work") ... cursor.execute("select * from work where name=?", ("carla",))
The dynamic parameters work the same as described in the the main gadfly documentation page. In particular INSERT VALUES can insert several rows at once by using a list of tuples for the rows.
If there is any problem (bad policy name, bad password, server not running, queries not allowed for this policy) the execute will generate an exception.
To run a named/prepared query (initialized at startup) use execute_prepared, which takes a prepared statement name rather than a query string:
cursor.execute_prepared("updatenan", ("rollingrock", 1)) ... cursor.execute_prepared("getnan")
The execute_prepared method works just like the execute method except that the "name" must be the name of a query initialized by the startup(...) function at server startup.
NOTE: by default any execution that sends or recieves "too much data" will be aborted. Edit gadfly.gfsocket (both on the client end and on the server end if different) if you wish to disable this sanity check feature:
LEN_LIMIT=10e8
As with other dbapi cursors the results of a query can be extracted as a list of tuples using (after execute):
result_list = cursor.fetchall()
The other fetches (fetchone and fetchmany) have not been implemented yet (partially since they don't make much sense in this context).
Both named and unnamed statements may be semicolon separated sequences of several SQL statements, but if they are they will return no results.
The gfplus command provides interactive access to a Gadfly server.
For your information the server/client interaction is much like "finger" or "http" -- each client access is a separate TCP/Stream connection where the client sends a request and the server sends a response. After each access the connection is closed and the next access generates a new connection. I did it that way, because it was a simple and robust strategy (witness the success of HTTP).
Please note: Although I have attempted to provide a robust implementation for this software I do not guarantee its correctness. I hope it will work well for you but I do not assume any legal responsibility for problems anyone may have during use of these programs.