How Would You Improve SQL? 271
theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"
Indexes (Score:2, Interesting)
Hierarchical queries (Score:5, Interesting)
Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY", there is also a PostgreSQL patch [moonbone.ru] available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.
No poetry (Score:4, Funny)
(yes, well, I ran out of ideas)
Re:No poetry (Score:5, Funny)
2 row(s) returned.
Re:No poetry (Score:5, Funny)
violets are blue
you just did a Cartesian Product
your DBA will be talking to you
thank you, thank you, shows at 7 and 10, remember to tip the waitstaff
Re:No poetry (Score:3, Insightful)
Re:No poetry (Score:3, Funny)
Re:No poetry (Score:3, Funny)
And it's also theoretically impossible to get this far in life without a sense of humor.
I guess we were both wrong.
Check out LINQ... (Score:3, Informative)
http://msdn.microsoft.com/netframework/future/lin
LINQ (Language Integrated Query) project for c# 3.0. Some cool stuff tht i never really thought about.
For example, their select statements go backwords ie from table, select column1, n2, n3 etc... Seems kinda wacky at first, but it makes sense since you really should know what table your'e selecting from before you specify the columns.
ex.
public void Linq3() {
List products = GetProductList();
var expensiveInStockProducts =
from p in products
where p.UnitsInStock > 0 && p.UnitPrice > 3.00M
select p;
Console.WriteLine("In-stock products that cost more than 3.00:");
foreach (var product in expensiveInStockProducts) {
Console.WriteLine("{0} is in stock and costs more than 3.00.", product.ProductName);
}
}
Re:Check out LINQ... (Score:2)
*shudder*
Re:Check out LINQ... (Score:2)
Prehaps MS meant 'Laugh I Never Queried' or 'Laugh I Nearly Quit'
Better NULL handling? (Score:5, Insightful)
If this is your main problem with SQL, then you have other problems as well. Who in their right mind needs a table with 100 columns? If you have 100 columns, you seriously need to normalize your database.
Ok, I might not be a database buff. Actually, my experience with SQL is purely academical (although I've worked with object-oriented databases). But if I were to improve SQL, my attempts would be in the direction of making it into a more pure mapping of a relational database, not in adding yet more syntactic sugar.
Re:Better NULL handling? (Score:2)
You're kidding, right? I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized. Say you had a questionaire on a web form. Let's say it had 100 questions. Let's say you had a checkbox next to each. When the user checked the
Re:Better NULL handling? (Score:3, Interesting)
Your method may be technically correct (I could never get those normal things straight) but as the GP points out, it's an unwieldy, inflexible way to do it.
Re:Better NULL handling? (Score:3, Interesting)
Re:Better NULL handling? (Score:2, Interesting)
Re:Better NULL handling? (Score:2)
Re:Better NULL handling? (Score:4, Informative)
No, it simplifies them. RDBMS's make it very easy to do things on a row basis, not so much on a column basis. It is utterly trivial to get "usefull" [sic] data out of a table structure that's properly designed, but nightmarish to get it if it's not. For instance, if you have 100 columns how do you answer the question "what percentage of respondants answered at least 80% of all questions?". You can't easily. And if you add or remove a question then you will have to touch a great deal more code than if you had implemented the table sanely in the first place.
he OPs point stands - these are not normalization issues
Yes they are! This is practically a poster child of normalization (or lack thereof). In fact it's an utterly trivial example of normalization that doesn't actually raise any difficult issues over speed, accessibility, etc. where you often run into problems with normalization.
is a real weakness of the relational model compared to OO design
There's a vast difference between relational databases and object oriented databases. I've yet to see anyone (including Oracle) do objects-in-relational decently. And SQL is inherently a relational paradigm -- you wouldn't want to use it for an OODB because it would just be inappropriate.
Re:Better NULL handling? (Score:2)
Re:Better NULL handling? (Score:5, Interesting)
I've got mod points but I just have to give them up to reply to this....
You're wrong. I agree totally with the grandparent post.
You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other.
And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space.
A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
Re:Better NULL handling? (Score:3, Insightful)
I completely disagree. Breaking up the questionaire into seperate tables does nothing. I want every question every time, it's all one HTML page with checkboxes and links to the questions. If they were seper
Re:Better NULL handling? (Score:5, Insightful)
That is not a SQL or database issue, it's an issue with your scripting language and Web server. This would be cached.
You certainly can have keep the table at 100 rows. But at that point, you might ask yourself why you're using a database at all. A flatfile will probably have less overhead, even with file locking issues. Especially considering the simplistic questionaire you're using as an example--a long line of 0s and 1s would do you.
Re:Better NULL handling? (Score:3, Interesting)
any DBA worth his weight in salt knows that you ever see a single table with 100 columns then you have a major design issue.
splitting that up into several tables in the same database will offer a significant performance increase.
Re:Better NULL handling? (Score:3)
Re:Better NULL handling? (Score:3, Informative)
Well from my experience, breaking up the questionaire into seperate tables does accomplish making maintenance of the app simpler. Which depending on whether or not you want to be stuck supporting the same dumb app for the rest of your life or not, is important.
Now you auto generate your questionaire based on a Questions table... and as an added bonus when someone decides they really need to ask Question #101, they can input it through a mainten
denormalized questionnaire (Score:3, Insightful)
Re:Better NULL handling? (Score:2)
Yes, you can come up with a way that you can organize data and tables so that this problem is easier to solve, but there's certainly a way that the data and the language can be made so that you don't have to fiddle with that sort of crap.
And that said, someone may have an instance where, for whatever reason, the data absolutely should be laid out in a table that large. They shouldn't be punished for using the system in th
Re:Better NULL handling? (Score:2)
An answer has to point to a specific question and makes no sense with just an answerid and a userid, so you should have the questionid in the answers table, it's a nonsense to asssociate them in a
Re:Better NULL handling? (Score:2)
Re:Better NULL handling? (Score:5, Insightful)
Doesn't make it right tho... If you are given a clean sheet to design a system as the grandparent post suggested (questionare) and you put the results in one table with a row for each question, then you deserve to work with fucked-up systems for the rest of your life.
Re:Better NULL handling? (Score:2)
Don't be so pompus. Recently I worked with a database containing all the census results for New Zealand. Ok, its not as large as the US, but its still not trivial.
The DBA who built & loaded the database from the raw CSV data we got from the department of stats really knew his shit. The database was structured with a table containing the questions, and a table containing the unit mea
Re:Better NULL handling? (Score:4, Insightful)
How can you make a request to find which user responded to more than 3 questions, for example ? you'd have to test each column individually, and then you'd have to modify the request if you add or remove some columns. Furthermore, you have to change the schema itself each time you add or remove a question.
And finally, you don't use the relationnal part of the database, at all. Your schema supposes that there is a question table, with a questionid. And in your anwsers table, you have columns named after the ids, right ? Now how can the database check if you delete a question that there are no answers pointing to it ? it can't, because you're using the database as if it was a flat file. So the database can't check data integrity for you, because your schema isn't normalized at all. as Johnno74 says, you need at least a question table, a response table, and an user/question/response association table. If you don't do that, the database has zero advantages over a flat file.
Re:Better NULL handling? (Score:2)
Re:Better NULL handling? (Score:2)
Re:Better NULL handling? (Score:3, Informative)
And you're completely misunderstanding what they're saying.
I'm talking about a list of the questions, and whether or not they are completed on a main page. Simple as that.
If you're just displaying a list of questions that's not gonna change, why are you using SQL at all? Why are you not simply using a flat text file?
my way is quick and to the point
And wrong.
Who needs 100 columns? (Score:2)
Do you think most SQL queries are executed by people who created the database in the first place, or even have any control over how it is designed? Don't be silly. It's the usual problem with software. Computers are so flexible, and commercial software is so inflexible, that a huge amount of energy is expended trying to get data from point A to point B when
Re:Better NULL handling? (Score:2)
I will tell you one case. In BILLING! As a business rule we have to have a record of exactly what was billed and this can be handled in one of two ways or both. The first way is to write a record that has everything you could possibly need to bill (LOTS OF FIELDS). Lots of redundent data, yes, but the data is easy to get to when needed.
What's wrong with something like Order Number, Quantity, Part Number, Description, Price, Discount? That's 6 columns, 2 which would be copies from your global part list
ugh... (Score:5, Insightful)
You know, I totally understand why Fabian Pascal [dbdebunk.com] is always pissed off.
Here's something for you to chew on:
Why do programmers write this:
A + 3
when they want to add 3 to A? Why do we not write some lovely crap like:
OPERATE ON A WITH 3 USING ADDITION
why do we write:
(A + 3) * 2
and not:
OPERATE ON (OPERATE ON A WITH 3 USING ADDITION) WITH 2 USING MULTIPLICATION
Why do we do that?? Because algebraic notation is 1) declarative
Do you agree with me that the verbose syntax clouds your thinking? Keeps you from seeing the underlying operations? Makes it difficult to apply the basic algebraic skills you learned in high school? Makes it difficult for the compiler writer to do the same?
Now I ask you, why do we write:
SELECT * FROM Order
and not
Order
Why do we write:
SELECT * FROM Order JOIN OrderItem WHERE Order.order_id = OrderItem.order_id
and not
Order JOIN OrderItem
And here you are, dwelling on some little detail about projecting columns.. this is an easy one: use an "ALL BUT" operator for example:
RelvarWith100Attributes ALL BUT (Attribute100)
Once you see that relational algebra is just values, variables, and operators nested in any arbitrary way, just like arithmetic, you have opened the door a little more to understanding the fundamental theory of data management and how backwards and primitive "modern" data management is.
And let's not even get into all the crap that SQL gives us like duplicate rows, NULLs, brain-dead table-oriented storage, lack of 100% updateable views, lack of arbitrary constraints, (often) lack of composite types (why the hell do we splat objects into MULTIPLE COLUMNS?? They should be stored in ONE column). SQL also confuses logical and physical layers (keys vs. indexes), and has basically kept the database industry in the dark ages for decades now.
So the answer to your question is pretty simple: I would ditch SQL and use something that looks like relational algebra, which has been understood and documented for a probably longer than you've been *alive*. No offense.
Re:ugh... (Score:2)
Re:ugh... (Score:2)
> months. Do you *really* think it's impossible to change SQL to
> something else?
Looks that way. How long has Codd been prophesying from the wilderness?
Has anyone ever implemented one of these relational calculus languages?
Re:ugh... (Score:2)
The thing with the buzzword of the month is- after its month is up, it dies. The stuff that works sticks around for a long, long time.
Re:ugh... (Score:5, Insightful)
Why do you assume Order and OrderItem will be joined on order_id? They don't have to, you know. So you have a "Order JOIN OrderItem ON order_id" format. Except, why do you assume both columns will have the same name? So you now have a "Order JOIN OrderItem ON order_id AND order_id_submitted" format.
And before long, you've got an equally baroque language for describing a query as SQL is now. SQL's point is to have something reasonably human-readable. It's amazingly flexible, and easy for beginners to pick up with simple queries. Your New and Improved query language had better be leaps and bounds ahead, not just simpler to type.
(Especially since probably 90% of queries are written only once, and stored for future use in a script or as a stored procedure. Building a query can be arduous for complex data, but it doesn't have to be typed every time you use it.)
Re:ugh... (Score:2)
Re:ugh... (Score:2)
Re:ugh... (Score:2)
Re:ugh... (Score:2)
Winning the special olumpics and debating an AC... (Score:5, Insightful)
SELECT
Lease.LeaseNum,
Lease.LesseeNum,
Invoice.InvoiceNum,
Invoice.AmountBilled
FROM
Lease INNER JOIN
Invoice ON
Lease.LeaseNum = Invoice.InvoiceNum
WHERE
Lease.LeaseNum = "1234"
ORDER BY
LeaseNum, InvoiceNum
Okay, that's pretty big to get some basic lease and invoice info. Now how you you write that?
Lease.LeaseNum,
Lease.LesseeNum,
Invoi
Invoice.AmountBilled
Lease JOIN
Invoice ON
Lease.LeaseNum AND Invoice.LeaseNum
Lease.LeaseNum = "1234"
Lease.LeaseNum
Invoice.InvoiceNum
??? All that's been accomplished is the removal of key words. I'm not seeing any benefit, and I'm seeing the pitfall of it being hard as hell to read.
-Rick
Re:Winning the special olumpics and debating an AC (Score:3, Interesting)
((Lease JOIN Invoice) WHERE LeaseNum = "1234")
[LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
ORDER BY whatever
Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.
Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.
Why do I lea
dot-qualified column names are unique (Score:2)
Result, C naming syndrome.
LeaseNumber and Lease.Number are functionally equivalent, they're both unique, however the latter lends itself to convenient simplification in contexts where it wouldn't create ambiguity. Otherwise you end up repeating yourself a lot. LongTableNamePrefixFoo, LongTableNamePrefixBar, LongTableNamePrefixBaz etc.
Re:dot-qualified column names are unique (Score:2)
Re:ugh... (Score:3, Funny)
Now start inserting and updating.
maybe joining.
yeah -- sql is stupid, sure. But you're proposal here really isn't any brighter.
sort by... (Score:2)
Re:sort by... (Score:2)
easily achieved in SQL by using SELECT DISTINCT instead of SELECT.
i guess what you really mean is "group by everything which is not explicitly in some form of aggregation in the select statement" which is a more special case, ironically one that fits exactly into the "* except x" pattern of the original poster.
Re:sort by... (Score:2)
select fielda, fieldb, fieldc, fieldd, count(*) from foo where status=1 group by fielda, fieldb, fieldc, field;
Just a personal gripe I know, but it seems redundant to tell it what to group by when 99% of the time it's the same order as the fields are selected by...
Re:What's the 1%? (Score:3, Insightful)
This is non-optional. Something is either an aggregate expression or it isn't, so why are we expected to explain this in the statement? Probably because aggregates aren't 'first class' in terms of SQL. Aggregates are functions.
Can you give an example where it would be something else? I too would like GROUP BY DEFAULT or somesuch.
This is my idea:
SELECT a, b, c
AGGREGATE x, y, z
FROM foo
GROUP BY vanishes. Non-aggregates expressions are val
Standard stored procedure/trigger language (Score:2)
It would be great if there was a common language for store procedures. The RDBMS may support many languages, but that scripting language would be available on most platforms since it was part of the SQL standard.
Another feature missing is optimization hints. Oracle uses special S
Re:Standard stored procedure/trigger language (Score:2)
If your database does not support SQL/PSM and is not working on implementing it you have only your choice of database to blame for that.
Replace it by Lisp (Score:3, Interesting)
SQL is only 1/2 the story (Score:3, Interesting)
Dates (Score:4, Interesting)
2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.
UPDATE, INSERT look different (Score:3, Informative)
Fundamentally, though, what we need is much better interfaces to our applications. Having to convert C++ data into some format appropriate for SQL and back again is a pain in the behind. Every existing interface between the two that I've seen is crap. Heck, most of the C++ interfaces don't even let you use the std::string type. What are we, back in 1995? Forget vectors and maps, or the fancy boost multiindexed templates. Anyway, these really aren't problems for SQL to solve. And yes, there are object-oriented databases and the like. But relational databases are still pretty much universal. I just mention it because it is a pain in the behind.
Re:UPDATE, INSERT look different (Score:2)
Infuriating (Score:4, Insightful)
Out of all the annoying issues, I've pulled out the most hair over unique id's, and INSERT vs UPDATE.
Most SQL implementations give you some way of assigning unique id's to newly INSERT'ed rows. It would be nice if there were a standardized way, but that's a side issue. Once rows have unique id's, you can identify rows to be updated by id. This is very fast and simple.
Except .. in order to find out what id the DBMS has assigned a row, I usually have to follow my INSERT with a SELECT, to read the id column. Slow and annoying. Sometimes the DBMS I am working with takes a few seconds to perform the INSERT, and ten minutes to perform the SELECT. That takes it beyond an optimization issue, and into a workability issue.
Also, if I do not yet know if a data record has been INSERT'ed, and I need to either UPDATE the existing record or INSERT a new one (say, with just a new timestamp), then I need to either attempt an UPDATE and then fall back on INSERT if the UPDATE fails (ew!) or attempt a SELECT and either INSERT or UPDATE depending on whether it returned any rows (ew!).
If SQL came up with a standardized way to associate unique id's with newly INSERT'ed rows, it would be very, very nice if the id column(s) assigned were returned to the client in the same packet as the message confirming that the INSERT succeeded. Nearly zero additional overhead, neat, fast, and easy.
To solve the UPDATE/INSERT issue, I'm less sure. Say, for instance, that I have a daemon which periodically scans the filesystems in a cluster of machines, and it wants to UPDATE the "exists" column of a given row identified by a ( hostname, mountpoint, path, filename) tuple with the current time, if that row already exists, or INSERT a whole new row for that file if it does not exist. Perhaps there could be a "WRITE" command which is just like INSERT but overwrites a row if it already exists? That seems like the wrong solution, too. In the meantime, I play with caches of hashes to unique id's and lose more hair.
-- TTK
Re:Infuriating (Score:3, Informative)
For other DBs that can use triggers and sequence generators (Oracle, PostGres, Interbase/Firebird), it can be a bit easier, as these DBs have ways to query the sequence generator for its current or next value (as long as you're still in the same transacti
Re:Infuriating (Score:2)
For Postgres, this is patently false. [postgresql.org] You can get the value of a sequence (serial) either before or after your INSERT.
Re:Infuriating (Score:2)
however if you do some locking and synchronization, it can be used
>begin
>allocate_id -> x
>insert (x,'fishy')
>commit
if commit won't come, the x value will be released again, if a another process between this processes allocate&insert comes by, it just gets
Yeah, INSERT/UPDATE sucks (Score:4, Insightful)
INSERT INTO someTable (fld1,fld2) VALUES ("foo","bar")
UPDATE someTable SET fld1=foo, fld2=bar
It is REALLY annoying when you have to write some code that generates a SQL statement because you must code for two completely different syntaxes. Someone replied about the Oracle MERGE which seems like a nice way to go.
Fortunately though, there are lots of good frameworks around SQL that make it so that writing SQL is becoming a thing of the past. I would like to see SQL treated like HTTP - nobody writes HTTP. It's a protocol. Let it be. Just use the tools. I guess it will never go away though...
Insert / Update combination (Score:2)
Perhaps, instead of replacing SQL, let's just add a clause to the update statement (or insert statement):
insert into table tabname (field1, field2) values (1, 2) [ WITH UPDATE [ALL,FIRST,ADD,NONE] ON FIELD1=1 ];
Thus, if the row exists with field1=1, it will update that row to with value field2=2.
If there are multiple rows where field1=1, define either update all of them (uses 'ALL'), update the first one found (uses 'FIRST'), insert one more (uses 'ADD'), or updates none of them (uses 'NONE').
Alternately, y
Re:Infuriating (Score:2, Informative)
insert into tab (col1, col2, col3,
returning expr1, expr2,
into var1, var2,
or
update tab
set col1=v1, col2=v2, col3=v3,
returning expr1, expr2,
into var1, var2,
Re:Infuriating (Score:2)
Re:Infuriating (Score:2)
What database requires this? Every database I've ever used has an efficient, documented method of getting the automatically generated ID.
To solve the UPDATE/INSERT issue, I'm less sure.
MySQL has this:
INSERT INTO table (...) VALUES (...) ON DUPLICATE KEY UPDATE column=...
It also lets you use UPDATE syntax for INSERTs:
INSERT INTO table SET name=...
In g
some solutions (Score:2, Informative)
in MySQL, the statement REPLACE INTO will perform an update or an insert, depending on whether the primary key value exists in the table. It performs exactly like your WRITE command would.
in MySQL, you can perform a SELECT LAST_INSERT_ID() to get the last inserted value.
in MSSQL, use a SELECT @@IDENTITY to get the same. (check in the docs whether you need @@IDENTITY OR @SCOPE_IDENTITY or the third version, I always forget).
wild card/regexs (Score:2)
Disallow DELETE or UPDATE without WHERE (Score:2, Interesting)
The SELECT syntax is usually good enough for me, but the capabilities of most implementations leave much to be desired. They only support a small subset of what the syntax suggests.
Re:Disallow DELETE or UPDATE without WHERE (Score:2)
I agree wholeheartedly. My main beef with SQL is that it is optimised for doing the wrong thing in so many cases:
One thing I do more often than anything else operating o
mind reader (Score:4, Funny)
Join syntax stinks (Score:2)
I've never been a fan of the join syntax. 'Inner' and 'outer' and 'left' and 'right' and 'full' joins? Let's for gosh sakes use the simple set-theory math terms we all understand.
Instead of:
select A.f1, b.f1, b.f2 from A join B on A.f1 = b.f1
do:
Is this a test? (Score:2, Funny)
SELECT * FROM table WHERE column !="99";
Did I get the job? Or for that matter DID YOU?
Slightly offtopic (Score:2)
Object oriented (Score:2)
SELECT FROM base-class
and get back all the fields from sub-classes so that you can populate entire objects from them. Real object oriented databases let you do that.
There are extensions (Score:2, Interesting)
SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;
or
SELECT * FROM A EXCEPT SELECT "B" FROM A;
Other engines do it differently. I think one of the best things about SQL is that it's a loose standard. You can easily choose the engine that works best for you... unless you are from the Cult of Microsoft (SQL Server). DB/2, Oracle, and even Sybase have very cool features that make queries much more powerful.
While SQL is hard to use at times (remembering d
Copying rows with slight modifications (Score:2)
So what I'm looki
syntax free (Score:2)
As we all know, unfettered application of this principle leads to some terrible languages (see BASIC, COBOL). SQL is one such language: every extension, every new functionality that looks "sql-like" (eg, not a user function) has to be integrated into the grammar. Furthermore, although perhaps not oft considered by SQL veterans, there are key words that, in a
Re:syntax free (Score:2)
C2 wiki on SQL overhauls and fixes (Score:2)
http://www.c2.com/cgi/wiki?SqlFlaws [c2.com]
http://www.c2.com/cgi/wiki?TqlRoadmap [c2.com]
insert vs update (Score:2)
e.g. (field1,field2,field3) values ('1','2','blah') vs field1='1', field2='2', field3='blah'. Kinda dumb.
Plus why wasn't there a command to "merge/replace/put" rows into a table much earlier?
SQL's badly designed, but looks like we'll have to live with it for decades to come.
Standardization (Score:3, Insightful)
Every vendor seems to have their own ways to define (or arbitrarily break standard) date functions (add/compare/convert/get current timestamp), string manipulation (like uppercasing, substrings and concatenation), getting the generated id from an inserted row (identity/serial/auto_increment), limiting the number of rows returned (TOP or FIRST?), getting a subset of rows (ie a standard way to get rows 100-150 that works with most DBs) or even getting a list of tables or viewing the schema.
Trying to make an app portable across DBs is next to impossible, and that's not even counting stored procedures or different behaviours for the same syntax (like NULL handling across the various functions). This is very irritating and should have been fixed long ago. Instead, we get this crap which makes the differences we see across different web browsers look like child's play.
Re:Drop it for something relational (Score:2)
Re:Drop it for something relational (Score:5, Insightful)
1. SQL syntax doesn't look like relational algebra (see my long rant above). This clouds thinking and hides the simplicity of the underlying model.
2. Relations are sets. SQL allows duplicate rows, so its tables aren't sets, and therefore aren't relations. (This property alone is enough to make it "not relational" by the way).
3. Relation *attributes* (the column names) are also sets. SQL allows columns with the SAME NAME in a query result!!
4. SQL has no "table equality" operator. You'd think the first operator you'd implement for a data type, especially a fundamental data type, would be equality! Imagine a programming language with no integer equality for instance.
5. Relations require each attribute to be drawn from a single type or domain. SQL allows NULLs, which are values not drawn from the column's type. And SQL gives you very little to help you work without NULLs. To add insult to injury, the default for columns is NULLable.
6. (related) Relational algebra requires boolean logic. SQL uses three-valued logic because of NULLs. And it uses it *inconsistently*.
7. The relational model does not specify a type system, it just requires one. Yet SQL specifies it's own particular type system (integers, chars, etc). What if you want to store XML or audio in one of your columns?
8. The relational model specifies nothing about physical implementation. Yet, almost every SQL product stores the columns of tables "together" in such a way that makes joins needlessly expensive.
9. SQL distinguishes between "base tables" and "views". The relational model requires them to be indistinguishable to the end user. Specifically, most SQL implementations don't let you update views! Pretty unbelievable. Imagine a programming language that didn't let you pass arguments to any function for instance.
10. SQL lets you do meaningless things like multiply the primary key values of two tables or add a weight to a height. This is related to the type system issues.
11. SQL confuses KEYs (logical) with INDEXes (physical implementation).
12. (This one gets me all the time) SQL has an EXISTS operator (is this statement true for at least one value of this result?) but not a FOREACH operator (is this statement true for all values in this result?)
13. SQL implementations don't have ANY brains whatsoever. They don't know that book_id from column A and book_id from column B are equal in a join, and that you don't need both of them in the query result. They don't "look inside" your CHECKs and foreign keys to deduce information about your database and use that information to optimize queries.
I'm sure if you picked up a basic theory book you'd find plenty of other nitpicks for the syntax, the semantics, and the basic underlying model of SQL.
And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.
If you are interested in learning more, get Date's O'Reilly book "Database in Depth". It's very short, roughly 200 pages, and tells you all you need to know about data management theory.
Bullshit, at least partial (Score:2)
Re:Drop it for something relational (Score:2)
SQL tables aren't relations, SQL data types are way too limited, NULLs are broken, classes are equated to tables, there are pointers... the ISO SQL standards themselves have long given up on even using relational terms.
Re:Drop it for something relational (Score:2)
I agree. One of the great (potential) advantages of relational data bases (unlike previous horrors like IMS -- shudder) is that they had a theoretical basis in relational algebra. SQL loses some of that advantage.
Chris Date has written a lot about the deficiencies of SQL (e.g., how joins work). Check out his book Database in Depth, published by O'Reilly.
Re:Drop it for something relational (Score:5, Insightful)
Computings needs for well structured access and manipulation of large data sets has been well served by SQL.
A clear replacement has yet to emerge. There are pretenders to the throne, of which Tutorial D is certainly technically nice, XQuery is a mess and ODBMSs (and their query tools) really haven't caught on.
Its just that SQL passes a simple test - its good enough for the job and relatively ubiquitous. And standards do exist [wiscorp.com] (that every major vendor breaks. sigh.).
Re:Drop it for something relational (Score:3, Informative)
Not for those who haven't learnt data fundamentals.
By defining and manipulating relations.
It is not. Lisp is, Scheme, Haskell, not SQL. Never was, nor intended to be.
SQL tables are not relations. The very words relation and relational have been dropped from the ISO SQL standards since 1999 at least.
Re:Be practical (Score:5, Informative)
If you really can't stand to write queries containing the actual column names, you should be using some type of abstraction layer in whatever language you're writing your code in.
If you're not writing code and just making queries by hand to test the results, then you're even further off your rocker. (this also applies in general to the statement you made) Why would you ever NOT want to select that last value out of 100? is it going to keep your output from wrapping? (lol)
Also, those of you saying that you should never have 100 columns in your table, you're certifiable lunatics as well. If you have 100 columns that are used in every record and have very little or no duplication per row, there is no reason you should break this up into multiple tables!!! Then the database has to do joins, which again require more processing power and disk usage. It's also hard to maintain multiple tables when you really have one table after you normalize it.
For those of you that say this isn't normalized... I'm not even really sure how to answer that.... If you have several tables all with a strict 1:1 relationship, they should be in ONE table. Anything else is considered denormalized, not yet normalized. (aside from being just plain BAD!)
For those of you that say you'd never need that many columns in one table or split across multiple tables, however you'd like to think the world should work. I have an example of just that. My wife does genetic research, primarily statistical analysis of sequence data (in various forms, but that's the easiest way to sum it up). We've had discussions on this particular topic, where she had been told by someone else that she would get better performance in Oracle if she split her one table into several tables containing a smaller number of columns, each.
This is just simply not true. It also is a perfect example of a situation where you would actually need a large number of columns. There were specific bits of data that needed to be looked up quickly (like, 45'ish). You can't store it all in one column (or even just a few) and use regexes to find the bits you're looking for. You also don't want to be doing a lot of joins unless you really need to, you know.. when you actually have data that would fit into some form of normalization. Technically, you CAN do this stuff, but not if you want decent performance. If you didn't want decent performance, you could just leave the data in a text file and shell out a grep command. *sigh*
Anyway, enough ranting, but seriously people... Get a clue. Get some experience with these issues. Don't just pipe up because "hey, I've worked with databases and while I probably don't understand them very well, I don't know anybody else that understands them at all, so I'm kind of an expert!"
Re:Better handling of relationships between tables (Score:2)
logins had an id and a name
projects has, among other things, submit by and coded by, both of which reference the logins table.
So if you do default joining, which field should it join by?
If you limit the default join to when there's one and only one reference to a given table, then existing code may break once you add a new reference.
And lastly, what if you really want a cartesian join? Granted I've needed this
Re:Java is your friend (Score:2)
term "data persistence" shows ignorance (Score:3)
Re:Native Queries / DLING (Score:2, Funny)
This sounds like the fuel for a massive OO-vs-Relational and/or Dynamic-vs-Static typing holy war geek battle. Been in them and seen them rage for 1000+ messages. Batten down the hatches and hide the women and children.
Re:Native Queries / DLING (Score:2)