Stored Procedures - Good or Bad?
superid asks: "I'd like to get opinions and real-world experiences from people who have built database-centric applications that rely extensively on stored procedures. I believe that most enterprise-class databases such as Oracle, MS-SQL, PostgreSQL, DB2 and others implement stored procedures. MySQL has been criticized for not supporting stored procedures and will be adding them in MySQL 5. The ANSI-92 SQL standard also requires implementing some form of stored procedure (section 4.17). So, I'm asking Slashdot readers: if you were architecting a highly data-centric, web-based application today from a clean slate, how much (if at all) would/should stored procedures factor into your design? Where are they indispensable, and where do they get in the way?"
"The arguments for stored procedures are pretty straightforward: 1) centralized code; 2) compiled SQL is faster; 3) enhanced security. Our application is over 15 years old and consists of much legacy code, reimplementation, and feature creep, and it now includes over 3000 stored procedures. At one time we had a client/server architecture, so those three advantages were relevant. However, in the past 4 years we have moved everything to web front ends, and I have argued that this is no longer true. Does it really matter whether my business rules are centralized in stored procedures or in a set of PHP/ASP scripts (i.e., in the web tier)? Is it really important to shave compilation time when connection and execution times dominate (and overall response is OK anyway)? Since the focal point is the web server, shouldn't security be enforced there rather than in the DB?
In addition, you either need a dedicated T-SQL or PL/SQL coder who then becomes the weak link in your coding chain, or your pool of developers must become fluent both in your scripting language of choice and in the SP language. I have tried both approaches and found that they cause bottlenecks when 'the database guy' is unavailable, and learning-curve problems (bugs) as new coders get familiar with the DB language.
Finally, after staying with our DB engine choice for all these years we are acknowledging that they may not be around forever. Management has asked us to look into migrating our data and business logic to another DB choice. We'd sure love to just be able to point the web tier at a new data source but that is unattainable due to a convoluted tangle of db specific code."
Good or bad? (Score:4, Informative)
I only put triggers or constraints or whatever in the database for one reason: to make sure only valid data enters the database.
For instance if ColumnA must be between 5 and ColumnB+34, that should go in the database. The database itself should guarantee the data is clean. "Data logic" you could call it.
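That sort of rule can live in the schema itself as a CHECK constraint. A minimal sketch, using SQLite from Python (the table and column names are just stand-ins):

```python
import sqlite3

# "Data logic" enforced by the database itself: ColumnA must lie
# between 5 and ColumnB + 34, no matter which application writes the row.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        ColumnA INTEGER,
        ColumnB INTEGER,
        CHECK (ColumnA BETWEEN 5 AND ColumnB + 34)
    )
""")
conn.execute("INSERT INTO t VALUES (10, 0)")       # valid: 5 <= 10 <= 34
try:
    conn.execute("INSERT INTO t VALUES (100, 0)")  # invalid: 100 > 34
except sqlite3.IntegrityError:
    print("rejected by the database")
```

The invalid insert never reaches the table, regardless of how buggy the higher layers are.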
Business logic and everything else should go in the higher layers. There is some ambiguity about what is "data logic" and what is "Business logic" but it's usually pretty clear.
Why? Maintenance. Stored procedures tend to rust in place over time. If you're an "agile" developer you'll go nuts not being able to refactor business logic or cover your database procedures with unit tests. If you're a "BDUF" (big design up front) shop, you might like it, but thankfully many are moving away from that.
Postgres supports (Score:3, Informative)
Just define the function (in java or python) and SELECT it with whatever arguments you've designed it for. I don't know what the overhead involved is, but we used it more for convenience than anything else.
I was told basically, "the fewer the better" and "keep them confined to the innermost loops."
Re:Good or bad? (Score:2, Informative)
Speed
Code Reuse - if everyone uses stored procedures, it's less likely that your developers will just write another (duplicate?) query when they don't want to go mucking around in all your php/asp classes/functions.
Bandwidth between the database server and your web server will be reduced - instead of passing huge queries across the link, you send a simple stored procedure call.
Plus, PL/SQL is really easy to use and learn and is relatively portable across at least PostgreSQL and Oracle.
They also provide a good facility for logging or tracking that can be (should be!) completely transparent to the web application.
Re:Two answers. (Score:5, Informative)
Because a person who knows what the hell they are doing can write a good function (or stored procedure) containing what the hell they know so that a person who doesn't know a hell of a lot about what the first person knows but a lot about something else on a higher level can interface with that logic and get a result without having to be a guru.
Or something like that...
As always, it depends (Score:4, Informative)
In addition, depending on your situation, it shouldn't be too hard for the developers to learn how to write stored procedures for the database. Once you know one language, learning another isn't that hard. The developers might write some inefficient code at first, but they'll get better very quickly. Plus, it will give them a better idea of how the database really works and performs, improving their overall designs.
Of course, IANADBA (I am not a DBA), so take it with a grain of salt.
An example of good (Score:2, Informative)
This of course involved creating a schema that made use of referential integrity, plus stored procedures for certain key operations to enforce data-integrity rules that the code required but that fall outside what a relational database enforces natively.
As I was completing the source code conversion I noticed that a lot of the data checks that had to be done under MySQL 3.x disappeared as PostgreSQL enforced it for me.
The creation of users and other entities became much simpler, as did their removal. This cleaned up the code and made it easier for me to modify the scripts without second-guessing whether another script would have adverse effects.
The scripts themselves still handle logic, albeit at a higher level. The process of using stored procedures to handle data integrity and enforcing certain rules simply allowed me to concentrate on the bigger picture when dealing with scripts.
Of course, the trade-off was speed. To this day MySQL still seems able to handle more load, since the site is dominated by SELECT statements. But that is more an issue between PostgreSQL and MySQL proper.
I like stored procedures (Score:5, Informative)
You have to be extra careful with dynamic SQL due to SQL injection bugs that we all know about. This isn't really an issue when you're dealing with stored procedures that take defined data types as opposed to creating SQL on the fly based upon your data (which could have injected SQL).
Controlling which DB accounts can use which stored procedures is also a handy mechanism for determining permissions. Stored procedures represent everything your application might do, so choosing which DB connections (which have credentials) can access them is a nice place to control those permissions.
Granted, you can still do lots of stupid things to mess up security.
Also, there are places where SPs are not really possible. Highly dynamic reports are a good example (assuming you allow that functionality in your application). There are definitely times when you HAVE to generate SQL on the fly. In any event, try to create a "data access layer" in your code, and if you have to dynamically generate SQL, run all sorts of checks on it with regexps etc. to check for injection.
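The contrast the parent describes, SQL built on the fly versus parameters with defined types, can be sketched in a few lines (SQLite from Python here; the table and data are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

malicious = "x' OR '1'='1"

# Dynamic SQL built by string concatenation: the input is parsed as SQL.
unsafe = conn.execute(
    "SELECT secret FROM users WHERE name = '" + malicious + "'"
).fetchall()

# Parameter binding: the input is treated strictly as data.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (malicious,)
).fetchall()

print(unsafe)  # the concatenated query leaks the row
print(safe)    # empty: no user is literally named "x' OR '1'='1"
```

Stored procedure parameters work on the same principle: the input is bound as a typed value and never re-parsed as SQL.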
It depends (as always) (Score:3, Informative)
Of course you can also solve this using an additional tier (like an app server and use web services) and it could be easier to maintain, but if performance is too much of an issue, then you could go for SP's for some of the logic.
I don't think it has to be an all-or-nothing decision, though. You usually end up with some logic in the app code and sometimes some logic in SP's.
Re:I don't use em unless I have to (Score:0, Informative)
Re:Stored Procedures are a must (Score:3, Informative)
Like any good tool, it can be used for evil (Score:3, Informative)
triggers that keep history tables
triggers that enforce referential integrity that a foreign key simply can't express
stored procedures that loop through large amounts of data that only return a small amount back
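The first item, a trigger that keeps a history table, might look like this minimal sketch (SQLite from Python; the table names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE account_history (id INTEGER, old_balance INTEGER,
                                  new_balance INTEGER);
    -- Every UPDATE is recorded, no matter which application issued it.
    CREATE TRIGGER account_audit AFTER UPDATE ON account
    BEGIN
        INSERT INTO account_history VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 75 WHERE id = 1")
print(conn.execute("SELECT * FROM account_history").fetchall())  # [(1, 100, 75)]
```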
A stored procedure can be an evil, evil thing:
implementing complicated algorithms that can't be debugged
creating HTML
I've seen both sides. It's like asking if javascript is a good thing for web browsers.
John, who gets a 503 when trying to log on
Distributed Computing (Score:1, Informative)
1. Nearly all the valid arguments in favor of stored procedures have lost validity as technology has shifted.
1a. Arguments like the one for compiled apps used to make sense. These days, apps tend to be n-tier web apps, so the compiled argument goes out the window.
1b. Arguments like "stored procedures provide atomicity" were never valid. Atomicity is attained via transactions whether you use SPs or not.
2. SP rely on a single implementation of a sometimes arcane language. Such implementations can almost never advance the way independent languages can. PL/SQL will not get the same advances as say, Python. Oracle's implementation of Java will not advance as well as Java in general. If you have a stored Java procedure, you're stuck with how Oracle does it. If you write an external procedure, you can choose the best JVM available from Sun, IBM, Oracle, or whoever.
3. SP centralize your code. Many people see this as an advantage. I see it as a disadvantage. Every single SP must be executed on your DB server/cluster. If I have a single puny P4 DB server behind 4 load-balanced app servers, I can distribute the load of the external procedure across the 4 app servers. I can load the DB with super fast disks and lots of RAM and modest CPU, then load the App servers with no disk, lots of CPU and decent RAM.
External procedures allow me significant flexibility in my architecture. They allow me to develop in the most effective language. They allow me to use collaborative tools more easily. They allow me to load balance in almost any way I can imagine.
Good, no question (Score:2, Informative)
I run web for a medium size telecom, and recently hired two people. I questioned all the applicants extensively about web security and just about none of them had anything remotely resembling a clue. Most of them listed web sites they'd worked on on their resumes, and more than half of these sites were vulnerable to both Form and URL SQL injection attacks, which are largely (in our case, completely) defeated with stored procedures.
(Even more of a good thing when the PHBs insist on being an MS shop...)
Your "Database" Guy Here (Score:2, Informative)
A few bullets:
1. Application coders write poorly performing SQL, and running SQL dynamically makes the performance problems worse.
2. Forget your pipe dream about writing SQL one time and running it on all database platforms; SQL that generic is so inefficient that it inherently leads back to number 1.
3. It's better to enforce data integrity in the database. Coders who leave all their logic in ASP and middle-tier objects don't put things like foreign keys on their tables, flip flags incorrectly, and then wonder why there are orphan records all over the database. True story: during a conversion of another bank's records, I found a customer who hadn't been billed for approximately $70,000 worth of interest because of poor data integrity.
4. Rolling out changes to stored procedures is much easier than pushing out changes to application code, and they are easier to unit test.
5. Stored procedures facilitate the use of transactions and set-based operations.
I have found VERY few app coders (10+) who could actually write good, clean, efficient SQL code and design schemas to match. The problem arises when you're dealing with a large data-centric application: ALMOST ALWAYS, YOUR PERFORMANCE ISSUE LIES IN THE SQL! If you have someone writing dynamic SQL that loops through a 500,000-record resultset doing something like this (this comes from an application we purchased), your performance will be horrendous:
BEGIN TRAN
UPDATE IAR SET ARIBDATE = '7/30/2004' WHERE ARRECID = 38196.810543788582
COMMIT TRAN
Your application coders will write this sort of crap instead of using one set-based operation and you will all wonder why your application takes so long to complete an operation.
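To illustrate, here is a sketch of both patterns side by side (SQLite from Python; the IAR table is a stand-in for the one quoted above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IAR (ARRECID INTEGER PRIMARY KEY, ARIBDATE TEXT)")
conn.executemany("INSERT INTO IAR VALUES (?, NULL)", [(i,) for i in range(1000)])

# Row-by-row: one statement per record (and, over a network,
# one round trip and one parse per record).
for (rec_id,) in conn.execute("SELECT ARRECID FROM IAR").fetchall():
    conn.execute("UPDATE IAR SET ARIBDATE = '7/30/2004' WHERE ARRECID = ?",
                 (rec_id,))

# Set-based: the same result in a single statement the engine can optimize.
conn.execute("UPDATE IAR SET ARIBDATE = '7/30/2004'")
```

On a real server the per-row version also pays the network and parsing overhead on every iteration, which is where the horrendous runtimes come from.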
Your best bet: find two or three good SQL developers and have them run traces to find out how many of those 3000+ stored procedures are actually being used. My guess is that a whole bunch of them are unused, or were simply never cleaned up after retirement. Next, let them figure out how many of the remaining procedures are one-offs of one another; you can probably take out a good 20% from there. Now you're getting toward a more manageable number. Very few people are good at database development. It is a unique skill set that can have a major impact on the maintainability and performance of your application. Stay with stored procedures and find some additional database developers if you need them. Let your app and web coders code what they're good at, let database developers code what they're good at, and you will have a wonderful application.
Re:I don't use em unless I have to (Score:5, Informative)
Especially if login information is used as a parameter (for this reason I always use username and password as parameters to look up access levels in all user-executable procedures): if the user cannot access a single table in your database directly, then they can only do what the stored procedures allow them to do. This gives you a central place where security logic is executed, while business logic may live either in the procedures (ideal) or in the app. Just remember, leaving business logic up to the app poses a risk if the app has a flaw and the procedures allow the user more access than the app does. If your database security model has more flaws than your app could, then maybe you should consider switching databases.
Re:Two answers. (Score:3, Informative)
Throw in Hibernate to get rid of the SQL-specifics of the database and reduce the OO-RDBMS nightmare.
I actually do more Java coding than database management (I've set it up so it's pretty much self managing); I wouldn't let it get all over the place, because I would be the person writing the session beans.
Avoid corporate-speak like the plague it is (Score:1, Informative)
A good read... (Score:3, Informative)
the myth of portability (Score:3, Informative)
You are just as likely to want to change your middle tier as your DB. More likely, in my opinion. If your business logic is in sprocs, then it's as easy to call from Java as it is from
Re:What's the point of this question? (Score:4, Informative)
Would you want to send 5 million rows to the application just to check a few fields in each of them, and how they relate to the records before and after them? Hell no; sending 5 million rows uses a ton of bandwidth, even for small row sizes. Also, SQL is a language for set operations, while most application languages (99%; Lisp etc. are the other 1%) are designed around single values, not sets of values. For example:
To check if array a[50000] is a subset of array b[2000000], in most languages, you must somehow iterate through both arrays (sorting would help, if possible) and see if each value in a (all 50000 of them) also exists somewhere in the 2 million values for b.
In SQL, something like SELECT COUNT(*) FROM a WHERE a.key NOT IN (SELECT key FROM b) would do the same thing, only much faster than downloading 2,050,000 rows and comparing them. Each vendor handles internal set operations differently, but all of them optimize their internal data structures and abilities to do exactly these types of operations.
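A runnable version of that subset check, using SQLite from Python with tiny stand-in tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (key INTEGER)")
conn.execute("CREATE TABLE b (key INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in (1, 2, 99)])
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(10)])

# a is a subset of b iff no key in a is missing from b;
# the engine does the set comparison without shipping rows to the client.
missing = conn.execute(
    "SELECT COUNT(*) FROM a WHERE a.key NOT IN (SELECT key FROM b)"
).fetchone()[0]
print(missing)  # 1: the key 99 is not in b, so a is not a subset of b
```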
Don't make me get my soapbox! (Score:2, Informative)
When I came to my current job, my opinion was to keep as much of the logic in the application as possible; stored procedures should help make the developer's life easier and make the database more robust (triggers mostly do this part).
When I came here, however, they forced EVERY SINGLE QUERY into a stored procedure, and like you said, it brought development speed to a halt, because every project was assigned "a database guy" to do the work, but there weren't enough database guys for the projects.
It was pathetic, and when our big layoff occurred (we got bought out), we found a lot of really, really bad stored procedures that the so-called experts had made for us.
Worse, it makes code releases a nightmare, because not only does the codebase have to be updated, but the database too, every single time. Unlike code bases where you can checkout your latest release, with databases you have to a) find all the altered database scripts, b) package them up for the release, c) halt the database for the release and d) run the script at go-live and hope you didn't miss one more thing.
Mind you I still use stored procedures, views and triggers where they make sense (when it's obvious that it just belongs in the database), but mostly I try to put as much logic as possible in the app.
Now you want to make me get into NULL values and candidate keys issues too! Bah, no NULLs anywhere, #1, and #2, don't make up integer primary keys when an existing field will work. Don't let your database administrators talk you into it, I beg you!
They can help a lot... (Score:2, Informative)
We decided to use stored procedures instead of inline SQL so that we could make modifications on the server instead of having to change the code in the application, recompile it, repackage it and deploy it to over 100 PCs.
The drawback is that it's not always easy to know what's going on because the application is broken into various parts. That is there are business rules being enforced in the PowerBuilder code and in the stored procedures on the server. I guess once you get to know the application really well it doesn't matter but it can make it a real pain in the ass for a new programmer.
Re:Stored Procedures are a must (Score:2, Informative)
Re:Good. (Score:4, Informative)
Stored procedures are great when you have complex relationships between data that can't be defined by simple constraints. Any time you need to apply procedural logic to ensure data integrity, you need a stored procedure.
Taking the aggregate information and dealing with it is the job of the business logic, which is the middle tier. But the middle tier should be unable to insert data that is incorrect, no matter how poorly written the middle tier is.
Re:Why stored procedures are bad. (Score:2, Informative)
Re:My take (Score:3, Informative)
This depends solely on your own discipline. I've seen quite a bit of horrible T-SQL code, but in 90% of the cases the worst part about it was not the logic. That could be followed, somewhat. It was the formatting. In the same way that you wouldn't write C code all on a single line (even though you could), you shouldn't do that with SQL code, either. And yet, it still happens all the time. Which is easier to read:
or: I know which one I'd rather read, and which one will more quickly make sense to me. And yet, most programmers still write their SQL code as the former, not the latter. Nasty, nasty "programmers"!

That's a deficiency of your build process, then. At work, we have our build process set up to "compile" (i.e., load objects and stored procedures into a database) T-SQL code at build time, helping us catch errors long before we get to runtime. It's not perfect, but it's quite a bit better than waiting until runtime to find out your query is broken.
Big deal. Chances are, you're not going to be writing portable SQL code in the first place if you're doing anything more advanced than a simple select, insert, update, or delete.
Re:I don't use em unless I have to (Score:3, Informative)
Well, you certainly know how to pitch the Oracle monolithic viewpoint on database architectures. It works, I did about 7 years on Oracle 5-7.2 before I ended up working with other RDBMS for a while. I'm not knocking your opinion, just saying that it's specific to Oracle.
DB/2 UDB used to have relatively poor stored proc performance, because they ran out-of-process. I think that's changed with the latest releases, but I'm not sure. Yet you could still get blazing performance for certain apps by using precompiler binding stubs instead of stored procs -- the stub is effectively a special case, one-statement stored proc. It doesn't save on network data transfer, but it makes a huge difference for individual statements. Plus it's an interesting approach to database security.
Sybase takes a bit of a hybrid approach. They have the high performance stored procs, but they also support statement caching and precompiling. All in all I'd say it's the most flexible database if you need to build a few different styles of data access into the same repository.
As someone mentioned earlier, none of the approaches is inherently good or bad. There are different ways of slicing your performance, and the decision of what to do and when to do it depends on knowing your specific tools, not just general theory or "rules".
It's surprising how many DBA's and architects keep trying to shove every application into the same design framework they're familiar with, instead of taking the time to learn and try different approaches. There really is only one real "rule" I know of:
Re:Two answers. (Score:3, Informative)
Yes. Get rid of SQL specifics and go to HQL specifics. I do not know why products like Hibernate call themselves Object Relational mappers.
They are Object Tabular mappers. They can only map tables or views to objects, but not other types of relations such as queries.
For something closer to a true Object Relational mapper, try iBatis SQL Maps. Ironically, they say their product is not an object relational mapper, but is the only product I have seen that actually can map any relation to objects.
SQL rules. HQL, JDOQL, EJBQL, and *QL (except SQL) suck.
Re:I don't use em unless I have to (Score:4, Informative)
There's your answer... Most != All. The DB is connected to something, which is connected to something that is accessible from the outside. But attacks don't just happen from the outside, either. Probably 90% of the attacks you really need to be concerned about with most DB applications are an inside job. Can your company trust every user that has some access to the database?
Re:I don't use em unless I have to (Score:2, Informative)
From MSDN:
"Using sp_executesql is recommended over using the EXECUTE statement to execute a string. Not only does the support for parameter substitution make sp_executesql more versatile than EXECUTE, it also makes sp_executesql more efficient because it generates execution plans that are more likely to be reused by SQL Server."
Good and bad of stored procs (Score:5, Informative)
Triggers are more important, and they usually use stored procedures to ensure that the information in the database is always meaningful, or that some other automated activity happens on an insert or update.
Unless I absolutely have to, I try to avoid having my application call stored procedures directly. A relational database manager *should* be able to hide the stored procedures behind a view, allowing a nice standard interface for your data. This means that if you have to move to another RDBMS later, porting is much simpler and mostly confined to the backend.
BTW, I agree with the points about having your business logic in one place. Stored procedures allow you to move business logic which is inherent to the database into the database, thus making it available from all clients regardless of the language they are written in. For a single app/db pair this is not an issue, but if you have a large DB with many different clients it is a larger issue. Maintaining your logic in one location is a LOT less work than reimplementing it all in every one of your apps.
Triggers, BTW, as I mentioned before, are very powerful mechanisms. They are not called by the app directly but are run every time a row is inserted, updated, or deleted (some RDBMSs also allow select triggers, though some have alternate ways of implementing this). They can be used to implement additional security restrictions, enforce referential integrity, or do more advanced things such as send email when a new order is added to the database. Again, this happens regardless of whether the order is filed using a web app or a Windows app for the rep in the call center. Since the logic is unknown to the app, the app isn't even aware of what happens after the order is placed. Talk about clean interfaces..... This requires stored procedures.
So, these are the great things about stored procedures. But when they are used badly, you end up with the stored procedures reducing the agility of the application because they tie it in too closely to the database. What do you do when your app is tied to Sybase and your customers want MS SQL Server? What if *all* your logic is in that database? Do you rewrite *all* of it for MS SQL Server? Probably Not. You are stuck and out of your investment.
In my opinion, it is important to maintain a clear line between what is the database's job and what is the application's job. If this line is blurred, bad things can result. Stored procedures are a very easy way to blur this line.
I design my apps with the following layers to them:
UI
Application-specific logic
Database Access Layer
Information Presentation (SQL API using views)
Information Management and Data Logic (usually stored procedures)
Information Storage (highly normalized, usually).
This allows the database to be a server app in its own right, and the client logic to run in the apps themselves. HERMES (see my sig) is mostly built this way, but there are a few things I need to change before I am happy with the interfaces. This is one reason it no longer supports MySQL.
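One way to sketch the split between the presentation layer (views) and the management layer above is a view with an INSTEAD OF trigger standing in for the stored-procedure logic. A minimal sketch using SQLite from Python (all names are hypothetical, and a real SP language offers far more):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Information Storage: the normalized base table
    CREATE TABLE person (id INTEGER PRIMARY KEY, first TEXT, last TEXT);

    -- Information Presentation: a view acting as the SQL API
    CREATE VIEW person_api AS
        SELECT id, first || ' ' || last AS full_name FROM person;

    -- Information Management: logic hidden behind the view
    CREATE TRIGGER person_api_insert INSTEAD OF INSERT ON person_api
    BEGIN
        INSERT INTO person (first, last)
        VALUES (substr(NEW.full_name, 1, instr(NEW.full_name, ' ') - 1),
                substr(NEW.full_name, instr(NEW.full_name, ' ') + 1));
    END;
""")
# The client only ever talks to the view; the storage layout is hidden.
conn.execute("INSERT INTO person_api (full_name) VALUES ('Ada Lovelace')")
print(conn.execute("SELECT first, last FROM person").fetchall())
```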
Re:I don't use em unless I have to (Score:3, Informative)
I agree totally. I've seen too many "enterprise" apps that reside on DMZ boxes that have JDBC connections straight into the data tables in the DB. Not really that great from a security standpoint, IMO.
I prefer to make all remote calls via Stored Procedure API's, via a "proxy" database schema that only has execute permissions on the procedures, and no direct data table access.
This also means that the middle-tier devs can write code for a published API, and the DBA's can screw with the data model to their hearts content without (theoretically) breaking the app.
It also lets the DBA deal with SQL tuning, etc., while the middle-tier (typically Java) devs don't have to worry about understanding SQL tuning details.
I'm sure there are a number of views on this, but this has worked out very well for me over the years.
Re:Square hole, round peg.. (Score:3, Informative)
1) Provide a consistent interface to applications regardless of changes to the underlying database structure.
2) Enforce row-level security.
3) Enforce data integrity.
Trees in relational tables without recursion is a problem that has been solved for years. http://www.sqlteam.com/item.asp?ItemID=8866 [sqlteam.com] It is trivial to implement nested sets with a few lines of SQL and after the tree is constructed, a simple SELECT will give you children, descendants, or ancestors.
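The nested-set idea can be shown in a few lines: each node stores a left/right interval, and a node's descendants are exactly the rows whose interval nests inside the parent's. A minimal sketch (SQLite from Python; the tree data is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (name TEXT, lft INTEGER, rgt INTEGER)")
# A small tree: root(1,8) contains a(2,5) and b(6,7); a contains a1(3,4).
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", [
    ("root", 1, 8), ("a", 2, 5), ("a1", 3, 4), ("b", 6, 7),
])

# All descendants of "a": one SELECT, no recursion.
rows = conn.execute("""
    SELECT child.name FROM tree child, tree parent
    WHERE parent.name = 'a'
      AND child.lft > parent.lft AND child.rgt < parent.rgt
""").fetchall()
print(rows)  # [('a1',)]
```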
Great for *processing* data (Score:3, Informative)
Stored procs are great, though, for processing lots of data on the server, without wasting resources (network bandwidth, client memory and CPU) sending the data to a client for processing and then posting results back to the database. Even if you have to do record-by-record processing instead of set operations, it's much more efficient to use a cursor on the server than a loop through a resultset on the client.
Stored procs are good at encapsulating a series of operations done periodically, such as end-of-month processing. In some databases you can define an event on the server that will automatically fire the SP. With such an event on the server rather than on a client machine, you don't need to worry about whether the correct client is running and connected when the event is supposed to fire.
Complex reports often require more data crunching than just a single query. I use an SP to process the data and leave the results in a temporary table. My client program runs the SP, then reports with a simple SELECT on the newly generated result table.
Depends on the project and the team. (Score:3, Informative)
The first time we tried sprocs, we basically treated them as functions. I would pass in a bunch of arguments (db column contents) to sprocs that would insert a new Activity, for instance. This got old very fast. It was much faster for me to write the business object that would insert itself from the middleware layer, than it was to wait for the DBA to create the sproc, after which I would have to create a middleware layer to the sproc anyway. It also didn't make financial sense for the client, because DBA's usually charge a higher hourly rate than middleware programmers.
The other sprocs were ones where I would supply several search criteria to a sproc (basically portions of a where clause), where he would assemble it into a sql query and then return the result set back to me. That was a bit more useful, but ended up kind of silly too, because it wasn't efficient to involve the DBA in actual application logic - we kept on having to go back to him whenever someone wanted to add a new dropdown to the search form.
If you've been an intermediate programmer, you've painted yourself into the hellacious corner of trying to dynamically generate a SQL query that may or may not join across multiple tables. It ended up being a lot easier for the DBA to simply create a view for each family of search queries. Then I'd assemble the SQL query in the middleware layer. Easy then, because I'd never have to worry about dynamic joins: the view would already have the joins, and I'd only be querying against the single view. And if there was a query change, I wouldn't have to involve him unless it actually required adding new columns to the view.
Right now the complexity of our projects don't require the remaining cases where a sproc would really make a huge positive difference. One such case would be a multi-step atomic transaction where we were worried about performance. A sproc would be perfect for that. But in general you can do just fine with inserting into tables and selecting from views without having to deal with the cost of having a significant amount of your project in sprocs.
Finally, an important tiebreaker between having logic in sprocs and in middleware is a pragmatic one: system resources. If you're making a lot of changes, you're going to be dealing with source code management: trunks, branches, and multiple staging installations. It's much easier to do this with your code repository than with your database. With most companies I've worked with, it's a lot easier to set up a new vhost for a code installation than it is to set up a third Oracle installation. If you have a lot of quick changes to make, it's easier to make them in the codebase.
Beyond that though, it really depends on the team. If we had a full-time DBA rather than our 10-hour/week guy, and a less competent middleware programmer than myself, and a project with more fixed requirements, then we might defer more to sprocs. But our DBA is swamped, our projects tend to have ever-changing scopes, and I'm quite comfortable with MVC and keeping the control layer thin, to be able to respond quickly to the scope changes without having to majorly rework business objects on the model layer. It works well for us - and these are for large scale bank intranets, not simple little webapp one-offs.
Many people that know too many buzzwords think that "the business object layer" by definition MEANS the database and sprocs. It doesn't have to. It can just as easily mean the Model layer of an MVC middleware layer. With my work style, it's faster to leave it there and then use the database for storage and data-level calculations that can be embedded in the queries themselves.
non-proprietary stored procedure languages (Score:1, Informative)