|
Chordate Solutions
Gadfly FAQ
General information
The following topics are of a general nature.
- What is gadfly?
Gadfly is a relational database management system
which uses a large subset of very standard SQL
as its query language and Python modules and optional
Python/C extension modules as its underlying engine. Gadfly
stores the active database in memory, with recovery logging
to a file system. It supports an optional TCP/IP based
client server mode and log based failure recovery for
system or software failures (but not for disk failures).
- What is the current version?
The current version is Gadfly 1.0. To be sure you have the
current version, please consult the version number that comes
in the documentation.
Gadfly 1.0 attempts to fix all known bugs in previous versions and adds
a small number of features (hopefully without adding too many
new bugs).
- Where can I get it?
Please get it from http://www.chordate.com
- Why SQL and the relational model?
Gadfly is based in SQL and the relational model
primarily because the SQL query language and the relational
model have been highly successful, are highly standard, and
are well understood by a large number of programmers.
If you understand standard SQL you already know how to use
Gadfly -- and this distinguishes Gadfly from many similar
database systems out there -- even ones which aren't "really free."
- Why Python?
Gadfly is based in Python because Python made the development
of such a relatively sophisticated and standard SQL implementation
feasible for one programmer, in his spare time, and also because
it automatically made Gadfly portable to almost any computing
platform. In addition, Gadfly can be used with Python to develop
sophisticated data manipulation applications easily.
- What is the License? Is it Free? Why?
Gadfly can be used and modified in source form for any purpose,
including commercial purposes. There is no explicit or implied
warrantee for Gadfly, even though it is intended to be useful.
Gadfly is free because I wanted people to use it, and I suspected
they wouldn't if it wasn't free and highly standard, and furthermore
I didn't want to risk the possibility of legal responsibility
if people did use it and had problems.
- You mean I can bundle it into my own product
for free? Really?
Yes. Please do. Several organizations have done so already.
- Where does it run?
To run Gadfly you need (1) a file system and (2) Python.
If Python runs on your platform Gadfly should run there too.
I have not been able to verify that this version runs on the
Macintosh yet (since the Mac apparently has some fairly nonstandard
file system conventions), but even if it doesn't it should
run after a few trivial adjustments. Please let me know if
Gadfly doesn't run on any of your platforms.
Gadfly is expected to run "out of the box" on Windows 95/98, Windows NT,
all Unices or unix like platforms (Solaris, IX, BSD, Irix, Linux) and
anywhere else where Python runs and there is a filesystem
(Windows CE, QNX, Mac, etcetera).
Furthermore the Gadfly database storage format (based in the Python
marshal format) is binary compatible between all these systems -- you
can create a database on any one of the systems and use a binary
transfer mechanism to any other system and the database should load
and run.
- What are the software/hardware requirements?
Gadfly requires a Python installation and sufficient memory and disk
on the target machine to store the live databases in question.
Since live Gadfly databases are kept in memory during active use
Gadfly is not appropriate for databases that approach the size of
virtual memory. In practice this means that on modern inexpensive
machines Gadfly can store and manipulate databases with tens
of thousands of rows without difficulty. Unless your machine is
large databases with millions of rows might be infeasible -- and
even if memory is sufficient the loading and storing of the database
might make startup and shutdown speed an issue. Many interesting
database instances never approach this size.
- Is there a query optimizer?
Yes, Gadfly automatically optimizes queries using some
relatively sophisticated methods that attempt to take
advantage of any available indices, among other methods.
Technically all optimization uses only "equality predicates"
at this time, so queries involving many inequalities (eg, BETWEEN)
and few or no equalities might be slow, or might create very
large intermediate results. Most standard database applications
use equality constraints extensively (eg, as foriegn keys or
relationships) but if your application is, say, a scientific
application that requires many inequality comparisons
across multiple tables among
floating point data, the Gadfly optimizer may not help much
-- you may want to load the data into your own structures and
hand optimize the access and combination methods.
On the other hand, if your application uses equality
predicates often enough you may find that the Gadfly optimizer does pretty
well, perhaps better than a "hand coded home grown database"
would do without a lot of work.
- Is it fast?
You probably want a benchmark comparison. I have none to offer.
The query evaluation and load/store facilities for Gadfly seem
to be pretty fast for non-huge databases -- particularly if you
make intelligent use of indices. At this point the slowest part
is the SQL parser itself. If you use a lot of similar SQL statements
please consult the documentation on how to avoid reparsing
statements by use of multiple cursors and dynamic parameters.
Unverified anecdotal reports suggest that Gadfly is not
noticably slower than other free or freeish portable database
systems; some have suggested it can even be faster, at least
for certain types of use.
The entire query engine is designed to use the kjbuckets Python
C extension module, and use of kjbuckets should speed the performance
of Gadfly considerably -- Gadfly will run without it, but noticably
slower (2x slower for even small databases, more for larger ones).
- Are there data size limitations?
There are no intrinsic limitations. For example a "varchar"
can be of any size. That said, remember that a gadfly database,
in the absense of hackery (eg, storing a filename instead of a
value) stores all data in memory, so at some point large values
may fill up virtual memory and the load/store operations can get
slowish.
- What about recovery after crashes?
Gadfly supports a log with deferred updates recovery procedure,
together with commits and rollbacks. This means that if your program
crashes or if your computer crashes it should be able to restore
a Gadfly database to the state of the database at the point of the
last "commit". Any remaining bugs in this area are likely to have to do
with data definition statements ("create" or "drop") so be careful
to commit and checkpoint after modifying the data definitions.
Of course, there are no known bugs, but it is possibly that not all
possible combinations have been tested (this area is quite subtle,
unfortunately :( ). Please see the recovery documentation for more
information.
- What about client/server based access?
Gadfly supports a TCP/IP based client/server mode, together with a
non-standard but highly flexible security paradigm for the server.
Please see the client/server mode documentation.
- What about concurrency control?
Gadfly does not have true concurrency control. The only safe
way to allow two concurrent agents to modify a single database
is to have both of them communicate with the database via a gadfly
server that arbitrates database accesses. The server will serve
each agent serially -- one at a time -- in separate transactions
each committed immediately.
Of course, two programs can have read-only access to the same
database in separate memory instances without any problems.
- Is there a mailing list?
Yes. Please go
http://www.egroups.com/list/gadfly-rdbms/ for more information on
the gadfly-rdbms mailing list, hosted by egroups.com.
- Who uses it?
My email suggests that quite a few people have been using
Gadfly, including professional and industrial organizations
of various sorts. From my vantage point it is not clear
how serious all of these uses have been -- but some of them
appear quite serious. Gadfly has been included in a number
of CD-rom publications, such as Linux distributions.
- Is support available?
Yes, please contact
Chordate
for paid support. Unpaid support is more popular, but
less reliable. Please report bugs and suggestions
even if you don't want to pay ;).
- What are the plans for Gadfly?
At this point the current plan is to maintain the current
implementation and fix any bugs that arise. Some time in
the distant future (several months maybe) a new release with
major new features might arrive, but only if the new features
allow existing applications to run. Gadfly will support
backwards compatibility of existing databases.
- How do I contribute some code that uses Gadfly?
There is no formal mechanism. One possible approach would be to
join the
Python Software Association
and then get a
starship account
and then put your contributed code there. Chordate may add links
to contributed code to the Gadfly documentation at their discretion.
This section relates to the installation of Gadfly and connecting
Gadfly to other software.
- Where is sqlwhere?
If you have problems with the sqlwhere module you probably have not
installed gadfly or have installed it incorrectly. Gadfly requires
an installation procedure. Please see the documentation.
- I get an ImportError!
The gadfly modules must all be on the Python Path for a given
application to use gadfly.
import sys
gadflydirectory = "C:\\gadfly" # edit as appropriate
sys.path.append(gadflydirectory)
Or if you don't understand that please look to your Python
documentation.
- Why not use hierarchical packages?
Gadfly doesn't use the standard Python package mechanism
due to the need to support Python1.4 and to be backwards
compatibly to existing applications. Please suggest a way
to use packages that won't break existing applications.
- Does Gadfly support ODBC or JDBC?
Although the Gadfly SQL subset is based on the ODBC 2.0 SQL definition
Gadfly does not support the C-level ODBC calling mechanism.
All direct accesses to Gadfly must use Python modules at this time.
- How can I move a Gadfly database instance
from one location/machine to another?
A gadfly database instance as stored in a file system directory
may be copied to another directory even on another machine using
any binary file copy mechanism. You must copy all files relating
to the database instance in the
gadfly database directory to the destination, and be sure that
if you are copying across different platforms to use a binary
copy mechanism (eg, ftp in binary mode).
- How can I access gadfly from another program?
The only supported API (applications program interface) for gadfly
at this writing is the Python Database API either using direct access
or via the client/server mechanism. Although you must use Python
to access a gadfly database at this time it is possible to embed
a Python instance in another program. Please see the Python documentation.
- What database size limits are there?
As mentioned there are no intrinsic limits (eg, a varchar can be of
any size) except for the limitations of memory and the possibility
that the load/store mechanism may get too slow if the database grows
too large.
- What are the file types in the gadfly database
directory?
*.grl -- a relation representation where * is the relation name.
*.brl -- a back-up relation representation (for possible recovery).
*.gfd -- a data definitions file where * is the database name.
*.bfd -- a back-up data definitions file (for possible recovery).
*.gfl -- a log file where * is the database name
*.glb -- a back-up log file (for possible recovery).
Use
The following section relates to the use of gadfly.
- Does Gadfly support virtual tables (VIEWS)?
Quantifiers? EXISTS? Aggregates? Groupings? Indices?
Of course it does! Gadfly supports a very large
SQL subset. See the additional documentation for more detailed information.
- How standard is the SQL subset supported?
Gadfly SQL is based on ODBC 2.0 SQL which in turn is based on
an SQL standard draft. This means that Gadfly SQL adheres closely
to the SQL you find in many other database products and documented
in many books and other documentation. A number of people have
implemented Gadfly databases and transferred the SQL
with few to no modifications from the implementation
directly to Oracle or MS-Access, for example.
- Are BLOBS (binary large objects)
and arbitrary string values supported?
Yes, varchars have no intrinsic size limit and can be used to store
marshalled or pickled Python BLOBS -- but the application using gadfly
will have to know when to "deserialize" the objects (using marshal
or pickle or other mechanisms). For example it is even possible to
store Python byte code objects in a gadfly database in this way.
In particular gadfly stores all strings, including strings with null
bytes and newlines, but for such strings you must use
dynamic parameters (or other mechanisms that avoid embedding the
string directly into the SQL syntax), like:
s = chr(0)+chr(10)
cursor.insert("insert into table x(v) values (?)", (s,))
NOT
s = chr(0)+chr(10)
cursor.insert("insert into table x(v) values ('%s')" % (s,))
Since the SQL parser will choke on the latter.
- How can I find the column names in a "select *"?
The nasty answer is "don't use select *", but if you really want to you can get
the order of names for the columns from the description attribute for
a cursor.
>>> from gadfly import gadfly
>>> g = gadfly("test", "dbtest")
>>> c = g.cursor()
>>> c.execute("select * from work")
>>> c.description
(('RATE', None, None, None, None, None, None), ('NAME', None, None, None, None,
None, None), ('HOURS', None, None, None, None, None, None))
>>> print c.pp()
RATE | NAME | HOURS
=======================
40.2 | sam | 30
10.2 | norm | 45
5.4 | woody | 80
4.4 | diane | 3
12.9 | rebecca | 120
200.0 | cliff | 26
3.5 | carla | 9
IE, c.description[i][0] gives the name of the ith column. The query
mechanism essentially randomizes the order of the columns in a select
* and you cannot rely on the engine producing columns in any specific
order unless you specify the order in the query explicitly.
- Can I add my own table implementation?
Yes you can, to a limited extent. Use the remote view protocol in
gfintrospect.py. The table you add can extract information from any
source but must not recursively query the same
database instance (using a Python implementation)
and place it into a Gadfly database either only once on first
usage or once per query. The table added cannot be updated via SQL
and you cannot load "only the part of the table you need
for this query." The table
added must be explicitly re-added during the database initialization
on each usage, and if you implement the table incorrectly you may
cause some gadfly queries to crash. Use with caution: this is an
advanced feature. See remotetest.py.
- Why can't I update my own table implementation via SQL?
Although it is possible to create a protocol which allows
non standard table implementations to be updated and optimized
via SQL,
the implications to the query optimization mechanism
and the crash recovery strategy are not
clear and may be quite subtle. At this time the implementation
opts to avoid possible bugs by not supporting such features,
although programmers are welcome to experiment at their own
risk, with the understanding that their experimental modifications
may not be supported in future releases.
It is perfectly possible to change the number of rows
or values in the rows of your own table implementations
without using SQL or gadfly, however and have the updates
automatically reflected in the table instance in the database.
[You should not, however, change the number or names of the columns.]
For example gfintrospect.DictKeyValueView will "wrap" a Python dictionary
as a gadfly table and automatically reflect modifications
made by an external Python program to
the dictionary.
- How do you define a primary key?
New in 1.0 you can get the effect of a primary key by defining
a unique index on the primary key columns
create unique index pkey on person(firstname, lastname)
Effectively will enforce a primary key constraint for (firstname, lastname)
on the person table.
- What about NULLs, Triggers and other missing stuff?
The present release opted not to add missing standard or non-standard
features that were likely to cause major modifications to large sections
of the implementation, and therefore were likely to introduce bugs.
- Where is the LIKE predicate?
The LIKE predicate for string matching is still not supported at the
SQL level. For what it's worth, it is easy to use Python's string matching
(regex, re, string.search, etcetera)
facilities on the result of a query. Also, for what it's worth, since
the gadfly optimizer won't easily be able to optimize for string matching
the "by hand" method would essentially be what gadfly would do anyway,
without major modifications to the implementation.
- After a crash Gadfly won't recover! Help!
This shouldn't happen, but did happen (at least once) for a previous
release. If it happens again, you can explicitly delete the log files
from the database directory in order to recover the database to a state
which may or may not correspond to the state of the database at the second
to last commit/checkpoint operation. It is possible, but not likely,
that the database state will include some but not all updates from the
last commit, but, to repeat, it shouldn't happen. Please report the
problem if it occurs again.
feedback
home
Gadfly home
|