How Would You Improve SQL? 271
theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"
Indexes (Score:2, Interesting)
Hierarchical queries (Score:5, Interesting)
Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY", there is also a PostgreSQL patch [moonbone.ru] available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.
Re:Better NULL handling? (Score:3, Interesting)
Your method may be technically correct (I could never get those normal things straight) but as the GP points out, it's an unwieldy, inflexible way to do it.
Replace it by Lisp (Score:3, Interesting)
SQL is only 1/2 the story (Score:3, Interesting)
Re:Better NULL handling? (Score:3, Interesting)
Re:Better NULL handling? (Score:5, Interesting)
I've got mod points but I just have to give them up to reply to this....
You're wrong. I agree totally with the grandparent post.
You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other.
And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space.
A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
Re:Better NULL handling? (Score:1, Interesting)
Something like:
ALTER TABLE Quiz ADD COLUMN Answer101 BIT NULL
So now you have to deal with all the previouslly submitted quizzes only having the first 100. Same for how to do you delete quizzes? Add another quiz? Now you've got to deal with stuff like [Quiz12].[Question50].
How about this, sorry about errors, dinner's on the stove:
CREATE TABLE Quiz
{
ID INT IDENTITY(1, 1)
-- add any other attributes
)
CREATE TABLE Question
(
ID INT IDENTITY(1, 1),
Text VARCHAR(1000)
)
CREATE TABLE QuizQuestion
(
QuizID INT, --add FK
QuestionID INT, --add FK
QuestionNumber INT
)
CREATE TABLE QuizResponse
(
ID INT IDENTITY(1, 1),
QuizID INT, --add FK here
Submitted VARCHAR(1000)
-- add any other attributes
)
CREATE TABLE Answer
(
QuizResponseID INT, --add FK whatever here
QuestionID INT, --add FK whatever here
Answer VARCHAR(1000) --or BIT, INT, whatever
)
Now you can have as many quizzes, questions, or answers as you need. Yes, it's a little more complicated and odds are I missed a major component of what's needed, but you should see the flexibility that this solution would offer over:
CREATE TABLE Quiz1
{
Answer1 BIT,
Answer2 BIT,
Answer3 BIT,
AnswerXX BIT
}
Sorry to ramble, but I've just spent a great deal of time fixing issues exactly like what you describe.
In conclusion, no. Bad. Please don't do any database work.
Dates (Score:4, Interesting)
2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.
Re:Better NULL handling? (Score:2, Interesting)
Re:Better NULL handling? (Score:3, Interesting)
any DBA worth his weight in salt knows that you ever see a single table with 100 columns then you have a major design issue.
splitting that up into several tables in the same database will offer a significant performance increase.
Java is your friend (Score:1, Interesting)
With CMP, instead of fooling around with SQL queries you deal with business objects. Why bother with "update inventory" or "insert into cashRegister" when you can call methods on objects, like inventoryItem.substract() or cashRegister.feed() ? This provides you with a good layer between the application and the database, so whenever one has to change you don't have to mess with both. All you got to do to have all this magic is to keep some XML configuration up to date, and if you have the big bucks you can even get JBuilder to do it for you!
SQL is sooo yesterday. Stop wasting time on technical issues: add business value to your applications instead. And with JBoss you don't even have to pay for a good EJB container. (Of course you'll need a good CPU and a little more RAM, but hey, magic has a price!).
Re:Winning the special olumpics and debating an AC (Score:3, Interesting)
((Lease JOIN Invoice) WHERE LeaseNum = "1234")
[LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
ORDER BY whatever
Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.
Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.
Why do I leave off the Theta from the join (the equality test)? Because I *already* set the foreign keys on those tables. The DBMS should be able to *deduce* which columns to join on and generate an error if it can't.
Here's what's happening:
I used the JOIN operator on two base relation values, stored in variables with the names Lease and Invoice. I got a third (anonymous) relation with all of the combined columns.
I then applied the RESTRICT ("where") operator on that relation, along with a boolean expression, and I got a fourth relation with just the rows where that expression was true.
I then PROJECTED that relation to get a fifth relation with just the desired columns.
Note that ORDER BY is a non-relational operator. It turns the relational result (unordered set) into an an ordered array. So for a final step, the relation was 1) turned into a regular array and 2) ordered.
Of course I didn't actually perform those steps. I told the DBMS what I wanted, and *it* did the work.
Do you see how the SQL conceals the underlying algebra? And how it makes YOU do the work (in the join for instance)? You might not see it. Study the theory more, and you will. Compare with the A+3 example. Imagine a 12-table join. Imagine having to do a query like "where all rows of table X are a subset of all rows in table Y" right in the middle. An algebraic notation would make this MUCH easier. Just break it down, and apply the next operator to the result of the last. You do it all the time when programming your favorite language, why not in SQL?
And yes, the differences get MUCH deeper than this. SQL can't even represent all possible relational queries!
Disallow DELETE or UPDATE without WHERE (Score:2, Interesting)
The SELECT syntax is usually good enough for me, but the capabilities of most implementations leave much to be desired. They only support a small subset of what the syntax suggests.
Re:dot-qualified column names are unique (Score:1, Interesting)
SELECT 1 AS A, 2 AS A, 3 AS A
The fact that SQL allows this is a flaw. You have to constantly think about the names of your columns, even when they are hidden under "SELECT *".
2) Your point would be fine if SQL consistently used qualified table names, but it doesn't. It doesn't *return* them in a qualified state for example.
3) What qualifier should be used in a derived table (a "view").. the name of the view or the original table? If it uses the original table, that violates the requirement that views be indistinguishable from tables. If it uses the view, then the "full name" of the column has changed from what you specified in the query, and you still can't put two columns with the same "unqualified" name in the same view.
4) What name should be used in a self-join?
5) Relational theory just doesn't allow multiple attributes with the same name in a single relation, because there's no way to to tell them apart (relation headers are UNORDERED SETS, not arrays).
The syntax of what SQL does isn't as important, it's the broken inconsistent semantics.
There are extensions (Score:2, Interesting)
SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;
or
SELECT * FROM A EXCEPT SELECT "B" FROM A;
Other engines do it differently. I think one of the best things about SQL is that it's a loose standard. You can easily choose the engine that works best for you... unless you are from the Cult of Microsoft (SQL Server). DB/2, Oracle, and even Sybase have very cool features that make queries much more powerful.
While SQL is hard to use at times (remembering double outer joins), it's that way for a reason. You don't want to be as easy to use as VB, for instance. Being forced to think in terms of lists and cartesean products forces you to think about speed and abstraction.
SQL is as easy as it should be, IMO. Specializing the access modifiers will only add to the complexity and make query optimization an impossibility. If you don't care about speed, then your needs probably aren't serious enough for a full blown SQL RDBMS. Text, XML, or even MS Access could be better suited.
Complaining about SQL is like complaining about Linear Algebra. These systems exist for exceptionally good reason. They are constrained to reduce or eliminate unsolvable situations.
Replace SQL With Prolog (Score:1, Interesting)
Actually Prolog can do anything an RDBMS can do, but transactions (along with the ACID properties: atomicity, consistency, isolation, durability) are missing. It would be simplest to add transaction processing to existing Prolog implementations and kiss the current RDBMS/SQL implementations goodbye.