Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
Programming Technology

How Would You Improve SQL? 271

theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"
This discussion has been archived. No new comments can be posted.

How Would You Improve SQL?

Comments Filter:
  • Indexes (Score:2, Interesting)

    by diamondmagic ( 877411 ) on Thursday November 03, 2005 @06:30PM (#13945573) Homepage
    Right now there is no standard way of maintaining indexes. Most databases have some sort of CREATE INDEX query, but it is by no means standardized.
  • Hierarchical queries (Score:5, Interesting)

    by Bogtha ( 906264 ) on Thursday November 03, 2005 @06:38PM (#13945641)

    Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY", there is also a PostgreSQL patch [moonbone.ru] available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.

  • by Otter ( 3800 ) on Thursday November 03, 2005 @07:05PM (#13945896) Journal
    I don't work with databases, just with enormous flat text and SAS files spit out by them, but -- isn't the proper way to do that with one table with user, question# (1-100), and response (0,1) and another with question# and question?

    Your method may be technically correct (I could never get those normal things straight) but as the GP points out, it's an unwieldy, inflexible way to do it.

  • Replace it by Lisp (Score:3, Interesting)

    by RAMMS+EIN ( 578166 ) on Thursday November 03, 2005 @07:06PM (#13945908) Homepage Journal
    I'd replace it by a special-purpose Lisp, and compose it like s-expressions. Mix and match query elements in a flexible manner, yet never run the risk of injections, because it all happens in a structured way. I've done things like this on a small scale (contact information database), and it works really nicely.
  • by klui ( 457783 ) on Thursday November 03, 2005 @07:08PM (#13945930)
    No standard way to extract/load data. No standard way to get all tables in a database. Basically DDL is entirely separate and each type of database has its own way of doing things. Let's not talk about embedded SQL and optimizing queries (like Oracle hints.. *ugh*).
  • by hawkbug ( 94280 ) <psxNO@SPAMfimble.com> on Thursday November 03, 2005 @07:09PM (#13945939) Homepage
    You could easily do it the way you propose - the only thing is, you now have 100 rows where before you had 1 - which I'm not saying is the wrong way... but either way you're going to have 100 columns or 100 rows. Performance wise, I'm not sure which is better or if it matters.
  • by Johnno74 ( 252399 ) on Thursday November 03, 2005 @07:10PM (#13945952)
    I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized

    I've got mod points but I just have to give them up to reply to this....

    You're wrong. I agree totally with the grandparent post.
    You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other.

    And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space.

    A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
  • by Anonymous Coward on Thursday November 03, 2005 @07:19PM (#13946019)
    I'm sorry to bitch but that's so so wrong. What happens when your questionaire has a question added?

    Something like:
    ALTER TABLE Quiz ADD COLUMN Answer101 BIT NULL

    So now you have to deal with all the previouslly submitted quizzes only having the first 100. Same for how to do you delete quizzes? Add another quiz? Now you've got to deal with stuff like [Quiz12].[Question50].

    How about this, sorry about errors, dinner's on the stove:

    CREATE TABLE Quiz
    {
    ID INT IDENTITY(1, 1)
    -- add any other attributes
    )

    CREATE TABLE Question
    (
    ID INT IDENTITY(1, 1),
    Text VARCHAR(1000)
    )

    CREATE TABLE QuizQuestion
    (
    QuizID INT, --add FK
    QuestionID INT, --add FK
    QuestionNumber INT
    )

    CREATE TABLE QuizResponse
    (
    ID INT IDENTITY(1, 1),
    QuizID INT, --add FK here
    Submitted VARCHAR(1000)
    -- add any other attributes
    )

    CREATE TABLE Answer
    (
    QuizResponseID INT, --add FK whatever here
    QuestionID INT, --add FK whatever here
    Answer VARCHAR(1000) --or BIT, INT, whatever
    )

    Now you can have as many quizzes, questions, or answers as you need. Yes, it's a little more complicated and odds are I missed a major component of what's needed, but you should see the flexibility that this solution would offer over:

    CREATE TABLE Quiz1
    {
    Answer1 BIT,
    Answer2 BIT,
    Answer3 BIT, .....
    AnswerXX BIT
    }

    Sorry to ramble, but I've just spent a great deal of time fixing issues exactly like what you describe.

    In conclusion, no. Bad. Please don't do any database work.
  • Dates (Score:4, Interesting)

    by omibus ( 116064 ) on Thursday November 03, 2005 @07:30PM (#13946122) Homepage Journal
    1. Standard date functions and handling.
    2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
    3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.
  • by boxxa ( 925862 ) on Thursday November 03, 2005 @07:42PM (#13946201) Homepage
    I think its better performance wise to have more rows than columns. I develop a lot of PHP/MySQL applications, comerically and OSS and I find that using multiple small tables with more rows seems to load and run smoother than tables with a large number of columns. Maybe something with the indexing but I have no actual proof of what works better. On the original topic, I believe that MySQL has its downfalls compared to other database systems but for me personally, I find that I can work around the problems with it to build highly deployable PHP/MySQL web applications very cheaply. Hey, when its free, I can deal with it.
  • by Naikrovek ( 667 ) <jjohnson@ps g . com> on Thursday November 03, 2005 @08:23PM (#13946513)
    if you put 0's in unanswered fields, then you'll have an awful lot of zeros, and since you're using bits, that's a lot of 'no' answers, which isn't what's really there.

    any DBA worth his weight in salt knows that you ever see a single table with 100 columns then you have a major design issue.

    splitting that up into several tables in the same database will offer a significant performance increase.
  • Java is your friend (Score:1, Interesting)

    by lucm ( 889690 ) on Thursday November 03, 2005 @08:28PM (#13946549)
    In the world of J2EE you can use container-managed persistence on entity beans (a flavor of EJB). Such object will handle the interaction with the database for you so you don't have to worry about SQL. And it will also address transaction issues so no need to fiddle with manual rollbacks.

    With CMP, instead of fooling around with SQL queries you deal with business objects. Why bother with "update inventory" or "insert into cashRegister" when you can call methods on objects, like inventoryItem.substract() or cashRegister.feed() ? This provides you with a good layer between the application and the database, so whenever one has to change you don't have to mess with both. All you got to do to have all this magic is to keep some XML configuration up to date, and if you have the big bucks you can even get JBuilder to do it for you!

    SQL is sooo yesterday. Stop wasting time on technical issues: add business value to your applications instead. And with JBoss you don't even have to pay for a good EJB container. (Of course you'll need a good CPU and a little more RAM, but hey, magic has a price!).

  • by Anonymous Coward on Thursday November 03, 2005 @08:38PM (#13946608)
    You're missing the point. You really have to study the theory, and you'd get something like this (the exact syntax is unimportant of course):

    ((Lease JOIN Invoice) WHERE LeaseNum = "1234")
    [LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
    ORDER BY whatever

    Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.

    Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.

    Why do I leave off the Theta from the join (the equality test)? Because I *already* set the foreign keys on those tables. The DBMS should be able to *deduce* which columns to join on and generate an error if it can't.

    Here's what's happening:

    I used the JOIN operator on two base relation values, stored in variables with the names Lease and Invoice. I got a third (anonymous) relation with all of the combined columns.

    I then applied the RESTRICT ("where") operator on that relation, along with a boolean expression, and I got a fourth relation with just the rows where that expression was true.

    I then PROJECTED that relation to get a fifth relation with just the desired columns.

    Note that ORDER BY is a non-relational operator. It turns the relational result (unordered set) into an an ordered array. So for a final step, the relation was 1) turned into a regular array and 2) ordered.

    Of course I didn't actually perform those steps. I told the DBMS what I wanted, and *it* did the work.

    Do you see how the SQL conceals the underlying algebra? And how it makes YOU do the work (in the join for instance)? You might not see it. Study the theory more, and you will. Compare with the A+3 example. Imagine a 12-table join. Imagine having to do a query like "where all rows of table X are a subset of all rows in table Y" right in the middle. An algebraic notation would make this MUCH easier. Just break it down, and apply the next operator to the result of the last. You do it all the time when programming your favorite language, why not in SQL?

    And yes, the differences get MUCH deeper than this. SQL can't even represent all possible relational queries!
  • by dtfinch ( 661405 ) * on Thursday November 03, 2005 @08:58PM (#13946733) Journal
    I feel lucky to have not been burned by this, but you can bet a lot of people have. If you really want to delete everything, you should have to type something like "delete from table where true" or "truncate table".

    The SELECT syntax is usually good enough for me, but the capabilities of most implementations leave much to be desired. They only support a small subset of what the syntax suggests.
  • by Anonymous Coward on Thursday November 03, 2005 @09:42PM (#13946985)
    1) Relation values can be anonymous, just like "3" is anonymous when it isn't assigned to a variable. What's the table name in that case? In other words, what does this mean:

    SELECT 1 AS A, 2 AS A, 3 AS A

    The fact that SQL allows this is a flaw. You have to constantly think about the names of your columns, even when they are hidden under "SELECT *".

    2) Your point would be fine if SQL consistently used qualified table names, but it doesn't. It doesn't *return* them in a qualified state for example.

    3) What qualifier should be used in a derived table (a "view").. the name of the view or the original table? If it uses the original table, that violates the requirement that views be indistinguishable from tables. If it uses the view, then the "full name" of the column has changed from what you specified in the query, and you still can't put two columns with the same "unqualified" name in the same view.

    4) What name should be used in a self-join?

    5) Relational theory just doesn't allow multiple attributes with the same name in a single relation, because there's no way to to tell them apart (relation headers are UNORDERED SETS, not arrays).

    The syntax of what SQL does isn't as important, it's the broken inconsistent semantics.
  • There are extensions (Score:2, Interesting)

    by pestilence669 ( 823950 ) on Thursday November 03, 2005 @11:32PM (#13947530)
    Most SQL dialects include some sort of exclusion operator.

    SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;

    or

    SELECT * FROM A EXCEPT SELECT "B" FROM A;

    Other engines do it differently. I think one of the best things about SQL is that it's a loose standard. You can easily choose the engine that works best for you... unless you are from the Cult of Microsoft (SQL Server). DB/2, Oracle, and even Sybase have very cool features that make queries much more powerful.

    While SQL is hard to use at times (remembering double outer joins), it's that way for a reason. You don't want to be as easy to use as VB, for instance. Being forced to think in terms of lists and cartesean products forces you to think about speed and abstraction.

    SQL is as easy as it should be, IMO. Specializing the access modifiers will only add to the complexity and make query optimization an impossibility. If you don't care about speed, then your needs probably aren't serious enough for a full blown SQL RDBMS. Text, XML, or even MS Access could be better suited.

    Complaining about SQL is like complaining about Linear Algebra. These systems exist for exceptionally good reason. They are constrained to reduce or eliminate unsolvable situations.
  • by Anonymous Coward on Friday November 04, 2005 @12:18AM (#13947716)
    Prolog was designed to work with the relations that relational databases store and is thus the natural language for RDBMS. It's also easy to express hierarchical or network relationships in Prolog.

    Actually Prolog can do anything an RDBMS can do, but transactions (along with the ACID properties: atomicity, consistency, isolation, durability) are missing. It would be simplest to add transaction processing to existing Prolog implementations and kiss the current RDBMS/SQL implementations goodbye.

Understanding is always the understanding of a smaller problem in relation to a bigger problem. -- P.D. Ouspensky

Working...