Obtaining Multi-Tier Application Logs for Research?
arohann asks: "I'm a research assistant at a well-known university in the US. As part of the research work my group is doing, we need access to the logs from a production system of an n-tier web application. I've been looking around for a while with no result. Most places reply with a flat 'No!'. I was wondering if there is anyone who could help or advise with this. Please read about our requirement below and let me know if you can help."
"We want to examine the request arrival behaviour of a real-world web-application and will also need to examine how long each request takes to be processed at each tier. We would collect this data over a few days and then use it to build a real-world model of the request behaviour of an internet application. This model would be used in our analysis and profiling of clustered, multi-tier, internet applications.
Of course, we realize it may be that some of this data cannot be shared due to client privacy concerns. However, let me assure you that we are not interested in any client details, and we're not particularly concerned with what kind of application it is, as long as it's at least 3-tier, is a production system (we need a real-world model), and is used daily. We are also willing to sign a confidentiality agreement if necessary and follow any company protocol required to ensure that security and confidentiality are preserved.
Of course, if this results in any research paper publications, we would give credit to the supplier of the data.
Hoping to hear back from everyone soon ;)"
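For a sense of what the requested analysis involves: given per-tier log records that share a request ID, the two quantities asked for (request arrival behaviour and time spent at each tier) can be pulled out roughly like this. This is a minimal sketch assuming a hypothetical record layout of (request_id, tier, start, end) with times in seconds; real production logs would first need a request ID propagated across tiers, which many systems do not have.

```python
from collections import defaultdict
from statistics import mean


def arrival_and_tier_stats(records):
    """Summarize request arrivals and per-tier processing times.

    `records` is an iterable of (request_id, tier, start, end) tuples with
    times in seconds. This layout is hypothetical: it assumes every tier
    logs a shared request ID plus start and end timestamps.
    """
    first_start = {}               # request_id -> earliest start across tiers
    durations = defaultdict(list)  # tier -> list of (end - start) values
    for request_id, tier, start, end in records:
        first_start[request_id] = min(start, first_start.get(request_id, start))
        durations[tier].append(end - start)

    # Treat each request's earliest start as its arrival time.
    arrivals = sorted(first_start.values())
    inter_arrival = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]

    # Mean time per tier; this includes time spent waiting on downstream tiers.
    mean_per_tier = {tier: mean(times) for tier, times in durations.items()}
    return inter_arrival, mean_per_tier


if __name__ == "__main__":
    sample = [
        ("r1", "web", 0.00, 0.50), ("r1", "app", 0.05, 0.40), ("r1", "db", 0.10, 0.30),
        ("r2", "web", 1.00, 1.20), ("r2", "app", 1.02, 1.15), ("r2", "db", 1.05, 1.10),
    ]
    gaps, per_tier = arrival_and_tier_stats(sample)
    print(gaps)  # [1.0]
    print(per_tier)
```

Because a web tier's timestamps usually enclose the app and database calls it makes, these are inclusive times; subtracting the child tier's duration would give the time spent in each tier alone.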
own them (Score:1)
Re:own them (Score:2, Insightful)
Re:own them (Score:1)
Don't think so (Score:1, Troll)
Have you tried... (Score:1)
Before you laugh: it goes amazingly far where I work...
Re:Have you tried... (Score:2)
That said, the OP may find that if he gave us the log analysis tools and algorithms he wants to apply to the log files, a bunch of us would run the analysis on our own logs and send him the results. That way he would get the benefit of a slew of different data sets, instead of just one or two.
Companies are not too keen on sending out their internal datasets (Sarbanes-Oxley might have something to do with that, or the thought of being caught i
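The "send us the tools, we send back the results" idea works best if the results are aggregates rather than log lines. As a minimal sketch (the function name and bucket width are illustrative assumptions), each site could run something like this over its own measurements, such as inter-arrival gaps or per-tier times, and share only the bucket counts:

```python
from collections import Counter


def bucket_counts(values, bucket_width):
    """Turn raw measurements into (bucket_start, count) pairs.

    Only these aggregate counts would be sent to the researcher; the raw
    log lines, and anything identifying in them, stay with the site owner.
    """
    if bucket_width <= 0:
        raise ValueError("bucket_width must be positive")
    counts = Counter(int(v // bucket_width) for v in values)
    return [(index * bucket_width, counts[index]) for index in sorted(counts)]


print(bucket_counts([0.1, 0.2, 1.5, 2.9], 1.0))  # [(0.0, 2), (1.0, 1), (2.0, 1)]
```

A histogram loses the ordering of requests, so it suits fitting an arrival distribution but not studying bursts over time; a site willing to share more could send per-minute counts instead.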
My Suggestions, (Score:3, Insightful)
no (Score:1, Insightful)
System logs are for the machine's administrators and for software developers, not researchers.
If you guys want research material, build your own systems and sink in the tens of millions of dollars to do that. If your app is decent, you'll have more log data than you could possibly wish for.
Re:no (Score:2)
Although I see limited use for a
Re:no (Score:2)
and will also need to examine how long each request takes to be processed at each tier.
That can vary greatly in N-Tier apps. In N-Tier apps, many people put business logic in sprocs in the database (and sometimes some in the clients, which is usually poor design, although it can be used to "double check" things), while others will have exactly 100.0% of the BL in the Business Logic Layer and none anywhere else (no sprocs whatsoever). Things like that will affect results g
Re:no (Score:1)
We've sunk $20+M into one project for one app. Granted, the app is very complex, with all sorts of crap in it. I am sure we've passed the 100,000-man-hour mark in the last 8 months, too. It's got some
Re:no (Score:1)
Welcome to America.
Re:no (Score:1)
Besides, I lived in France. It's no better.
The key is that if you work hard, at least you CAN.
This country is not going straight to hell, since, well, countries don't go to hell. Or heaven, for that matter.
Probably not. (Score:1)
Talk to research friendly companies (Score:5, Informative)
Just cold calling or sending in letters or email is about as effective as you've found it to be.
You should also look through published articles in trade journals to find out which companies are sponsoring research in your field, based on their association with existing published research.
The fact is that you'll certainly have to sign an NDA and likely they will have to scrub the data anyway. One way or another it's going to cost the donors $$$ that you aren't going to reimburse. Your project will have to fit in with their research goals or they'll be returning a favor from someone else.
-- John.
Use your professors (Score:3, Insightful)
Most companies will consider this to be a security risk. They don't even want you to know the rough design of their backends let alone collect data from it.
Some companies wouldn't know how to gather what you want and wouldn't risk letting you touch their systems.
Most of these systems are probably messy, kludged together by former employees and hacked by current employees just enough to keep them running.
If you have some time, get an internship and do your research on the side.
What's it worth to the supplier? (Score:3, Informative)
Welcome to capitalism, we hope you enjoy your stay. While here, please note that TANSTAAFL [wikipedia.org].
Asking for data from a business requires a lot of work on your part. You must somehow convince them that all the effort they will spend collecting, sanitizing, and providing you with the information is going to pay off for them in a reasonable way. Since this request involves several months of data and more employee involvement than a 5-minute survey, you'll have to build a strong relationship with a company that has this data.
Opportunities include:
You may have better luck calling at the outset, introducing yourself and your research, then asking who at the company would be suited to help you out. Then engage that person. Don't go too low on the totem pole, or you may end up with someone who is ineffective within the company at getting you what you want. Certain companies (Google, for instance) are resource-rich and may be easier to work with, especially if you can get one or two workers involved and spending their 20% time helping you. If your research isn't exciting on a general level, you're in for a rough ride.
Once you've started a conversation (with several people at different companies - you're still trying to get something they will be reluctant to give) then you can start edging into what you need to complete your research. This whole process will take 2-6 months just to set everything up. I hope you've started early.
Good luck.
-Adam
be more helpful, talk to open-source website (Score:3, Interesting)
> if this results in any research paper publications,
> we would give credit to the supplier of the data.
If that's all you offer in return, which company will allocate the resources to verify that (a) this breaches no privacy laws and (b) no business advantage is sacrificed?
Some suggestions:
1. Offer a quid-pro-quo to companies you contact: in return for access, you will deliver (say) a multi-page detailed architectural review and specific recommendations on potential improvements, reviewed, say, by your professor.
2. Talk to people who run websites for non-profits, or open-source/Creative Commons websites like wikipedia.org, sourceforge.net, or even Slashdot. The attitude there may be more sympathetic to your efforts, and the admins more willing to knock up a few Perl scripts to strip logs of sensitive information.
3. Offer to be a website maintainer for a large independent open-source/community effort and obtain agreement on your access to logs.
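To show how small such a scrubbing script can be, here is a rough equivalent in Python rather than Perl. It is a sketch under stated assumptions: the input is an Apache-style access log line, the IPv4 pattern is approximate (it would also match invalid addresses like 999.1.1.1), and the secret key is a hypothetical value the site keeps to itself. A keyed HMAC is used instead of a plain hash because the IPv4 space is small enough that unkeyed hashes could be reversed by brute force, while the same client still maps to the same pseudonym, preserving per-client request patterns.

```python
import hashlib
import hmac
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # approximate IPv4 match
QUERY = re.compile(r"\?[^\s\"]*")                   # URL query string up to space or quote


def scrub_line(line, key):
    """Pseudonymize IPv4 addresses and strip URL query strings from one log line."""
    def pseudonym(match):
        digest = hmac.new(key, match.group(0).encode(), hashlib.sha256).hexdigest()
        return "ip-" + digest[:12]

    line = IPV4.sub(pseudonym, line)
    return QUERY.sub("", line)


sample = '203.0.113.7 - - [10/Oct/2006:13:55:36 -0700] "GET /cart?user=alice HTTP/1.0" 200 2326'
print(scrub_line(sample, b"site-secret-key"))
```

Timestamps, paths, status codes, and sizes survive, which is what arrival and timing analysis needs; query strings are dropped wholesale because they are where user names and session IDs tend to leak.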
I can probably help (Score:2, Interesting)
If you reply to this comment with your email address, then maybe we can work something out.
I need some help with testing my current project, and you need some data. It's actually more work for me to have someone besides myself test the software, but the quality should be higher and it could help you out.
Re:I can probably help (Score:1)
Are you for real? (Score:1)
Re:Are you for real? (Score:2)
If you are from a "large university", how come you can't find any big app log files right on campus? Most "large universities" have plenty of "n-tier web-applications". Methinks your request smells bad.
At my university the response would be a big fat NO! I have asked the admins for some software (OSS) that I desperately needed to do my research work efficiently, and they just said "no, that's a security risk". I fail to see how some basic image processing software is a security risk, but they have it in t
Underlying reason (Score:1)
Since you're already on
ever thought ... (Score:1)
Smells fishy (Score:2)
Send a data shrouder with your request (Score:2)
That will make them realize that you understand some of the constraints they are under, and that you're a nice person (:-))
In particular, transform http://www.sin.com/porno into 193'd-seen-HTTP-address
--dave
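The shrouder described above replaces each distinct value with the order in which it was first seen, so the researcher can still tell that two requests hit the same URL without learning what the URL was. A minimal sketch follows; the class name and label are hypothetical, and the mapping table must stay with the data owner, since it is the only thing that links the ordinals back to real addresses.

```python
class Shrouder:
    """Replace each distinct value with the order in which it was first seen."""

    def __init__(self, label):
        self.label = label
        self.seen = {}  # original value -> first-seen ordinal (keep private)

    def shroud(self, value):
        if value not in self.seen:
            self.seen[value] = len(self.seen) + 1
        return f"{self.seen[value]}-seen-{self.label}"


urls = Shrouder("HTTP-address")
print(urls.shroud("http://example.com/a"))  # 1-seen-HTTP-address
print(urls.shroud("http://example.com/b"))  # 2-seen-HTTP-address
print(urls.shroud("http://example.com/a"))  # 1-seen-HTTP-address
```

Unlike a keyed hash, ordinals leak first-appearance order, which is harmless for request-rate modelling and makes the output easy to read.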
Sourceforge? (Score:1)
use your own? (Score:2)
+0, Obvious?
Christ, what do they teach in schools these days?
Sweeten the pot (Score:2)
If you can't get in on your own, convince some important vendor (e.g., IBM's WebSphere group) that you're legit and can help them if they'll help you.
And if nobody