SQL: Just say No!

Michael Ubell, Sleepycat Software

SQL was designed as a general purpose query language for access to relational databases. As such, it is wildly successful. However, the utility of SQL for ad hoc queries does not necessarily translate into its use in high-performance transaction-processing applications. High-performance transaction-processing systems typically use a few standard queries against a database. As such, their use of SQL is unnecessary.

How do we assess transaction-processing performance? We create benchmarks that are representative of the needs of high-performance systems. What are the benchmarks we most commonly use? TPC-C and Spec-Web. Each specification devotes considerable effort to specifying the (small) set of queries executed by the application. Thus, the power of SQL is largely ignored. Using SQL to express database access may shorten the initial development cycle when the precise nature of the queries is unknown. However, as a product nears deployment, the small number of queries actually used become established and remain so for the lifetime of the application. Thus, once in the field, the application continues to pay the performance penalty for flexibility that is never used. Neither the query mix nor the underlying database schema changes after application deployment.

The result of the reliance on SQL is that application designers tune their systems indirectly. That is, they must figure out how changes to their application can/will be manifested in the database system and its query optimizer. In effect, when application designers do this, they are relegating the tuning of their application to the database vendors' engineering organization. The only tuning the application designers can do is modify their use of the database in an attempt to allow the standard database optimizations to do better for the application in question. This is not the shortest path to a high-performing system. Instead, effort is better spent optimizing the application itself rather than trying to game the database.

The relational database vendors all but concede that performance tuning requires gaming the database. For years we have seen special purpose tuning of these "general purpose" systems to win the standard benchmarks. At first this tuning benefitted the customers by raising the general performance of the system. As time went by general performance improvements were not enough and the vendors resorted to fastpathing of the benchmark operations or even inventing new operations like Oracle's discrete transactions. These performance hacks are simply an admission that the conventional approach is flawed.

Relying on database vendor tuning ignores the massive application knowledge that the developer brings to bear on the problem. It is frequently impossible or startlingly difficult to take advantage of a priori knowledge about relationships in the database, access patterns, query frequency, peak activity, etc. As mentioned above, it is often impossible for applications to take advantage of the high-performance features that the databases offer. Even if they do help, critical performance tuning is relegated to the database vendors' engineers instead of to the application developers who understand the needs of the application better.

The second obstacle to high performance that SQL introduces is that of translation. Business logic is rarely encapsulated in SQL. Instead, it is formulated in standard programming languages, XML, Java Beans, or some other format. As such, resources are dedicated to translating from the application's "native" format to SQL and back again. This is wasted effort, both in implementation and in performance. Coding database accesses directly in a native interface leads to less code, a better overall design, easier tuning, and higher performance.

Reliance on SQL also introduces serious additional security risks. For example, it dramatically increases the safety of many systems to disallow generic queries, allowing access to the database only through a narrow protocol which supports only canned requests. "Narrowing the aperture" by disallowing generic queries means that breaking into the web server does not provide complete access to the database system.

It is important to note that designing high-performance TP applications without SQL does not mean that ad hoc query mechanisms are unavailable. Instead, we propose using SQL for ad hoc queries, but using a more direct, programmatic, non client-server interface to the data for high performance concerns.

In conclusion, by definition, a high performance transaction-processing environment demands performance. As such, the overhead for an ad hoc query language that goes unused in this environment is an impediment. Instead, applications need the tools that empower them to directly and efficiently access and manipulate data. Both commercial vendors and the research community should devote time and energy to this endeavor.