Ask Slashdot: Getting a Grip On an Inherited IT Mess?
First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"
methodically and late into the night (Score:5, Insightful)
Escalate (Score:5, Insightful)
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
Get management buy in... (Score:5, Insightful)
You need to document it and get management to approve spending money.
I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and why the guy who was there before you has left.
99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.
this is a majorly funny story (Score:5, Insightful)
Facts:
1. The job has lasted for 1 month so far.
2. The e-commerce company is 'thriving', apparently.
3. All of the systems have been "reverse engineered" in that 1 month.
4. All of the documents are written in that 1 month.
5. In 1 month there have been: network and phone upgrades and database maintenance with Perl, and it all has been 'immensely rewarding'.
6. The entire infrastructure is 'a few problems away from a total meltdown'.
7. Single person IT operation to do everything.
Question: is this for real? What's the size of the company and what's the budget?
Re:Quit (Score:5, Insightful)
No!
This is actually the kind of career building stuff one should leap at. What would you rather say in an interview for your next job:
- I took this system that was falling apart and made it run like clockwork. Downtime and issue frequency went from "it's down again" to "been up all year"
- Yeah it was pretty good when I got there, and I maintained the status quo
My thoughts on original question:
First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then for each key component, break it down to its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.
Next part is assessment. For each component you’ve identified, what is its current state?
And then it’s time to do triage. Prioritize stuff by largest potential impact.
And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.
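For what it's worth, the comprehension / assessment / triage steps above can be sketched in a few lines of Python. This is only an illustration: the component names, the states, the 1-to-5 impact scale, and the scoring formula are all made-up assumptions, not anything from the submitter's actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    depends_on: list = field(default_factory=list)  # names of other components
    state: str = "unknown"  # assessment result: "ok", "fragile", "broken", "unknown"
    impact: int = 1         # assumed scale: 1 (minor) to 5 (business stops)


# Hypothetical inventory; replace with whatever you actually find.
INVENTORY = [
    Component("web storefront", ["database", "network"], "fragile", 5),
    Component("database", ["backups", "network"], "fragile", 5),
    Component("backups", [], "unknown", 4),
    Component("phones", ["network"], "ok", 3),
    Component("network", [], "ok", 5),
]

# Assumed weights: an unassessed component is treated as badly as a fragile one.
STATE_RISK = {"ok": 0, "unknown": 2, "fragile": 2, "broken": 3}


def dependents(inventory, name):
    """Names of the components that list `name` as a dependency."""
    return [c.name for c in inventory if name in c.depends_on]


def triage(inventory):
    """Sort components so the largest potential impact comes first."""
    def score(c):
        # Bad state, weighted by own impact plus how many things rely on it.
        return STATE_RISK[c.state] * (c.impact + len(dependents(inventory, c.name)))
    return sorted(inventory, key=score, reverse=True)


if __name__ == "__main__":
    for c in triage(INVENTORY):
        print(c.name, c.state, c.impact)
```

The point is less the scoring than having the list written down somewhere you can re-sort as the assessment changes.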
Speak UP (Score:2, Insightful)
Tell/email/post your opinion and observations, in as much detail as you can, along with your concerns. Make sure your managers see it. Do not expect them to do anything about it. Do it for your own reference, so you may continue working normally. Do not overwork or overworry yourself, for that will not bring you or the failing systems any closer to resolution. Do your normal job, stay cool and speak up. You are in the driver's seat.
Wait a minute .... (Score:5, Insightful)
"I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."
Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.
You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.
Re:you don't (Score:3, Insightful)
Or bring in contractors / consultants, have them do their part, and then part ways. The biggest mistake you can make is taking everything on your own shoulders; that = loss of life & health. It's a job, and work != life.
Re:Getting a Grip (Score:4, Insightful)
This is the only solid advice I've read so far. Band-aid solutions indicate one of two things: an admin too shy to ask management for a bigger budget, or management's reluctance to increase the budget. Generally it is the latter.
Re:Quit (Score:4, Insightful)
Quit? Do you give up on every task before you start?
Some of us like a challenge.
Easy on the finger pointing (Score:2, Insightful)
No offense, but if you don't have the necessary background to know what/where the tools are, who are you to say everything is band-aided? I see this a lot with new IT people: they see something different from what they would have done and instantly label their predecessor a moron, only to make "their" change later and break everything. Easy on the finger pointing.
The first thing you need to do is make a comprehensive assessment; don't jump in and start making changes until you have documented everything. If you can contact your predecessor and ask about design and/or documentation that may be stored in an industry standard tool that YOU are unaware of, do so. Once you know how all the pieces move, then start to plan how to improve/repair it. If you dive in and it breaks, you will be blamed; if it breaks and you fix it with minimal down time, you're the hero.
Start over... slowly (Score:5, Insightful)
Re:this is a majorly funny story (Score:5, Insightful)
Question: is this for real?
It's an "Ask Slashdot". They're as real as "Letters to Penthouse". Both carefully crafted to create a fantasy situation to excite readers. Read them if the subject is something you're interested in, but don't waste your time giving advice.
Re:Quit (Score:5, Insightful)
I worked in this environment for one year as to not tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're adopted. When you fix things, you'll inevitably miss something (I did because of the pace, which was not dictated by me). Get out. It's not worth the challenge of getting proper budgeting to put the right tools in place; otherwise the organization as a whole wouldn't have let things get how they are in the first place. The business model I came from is failing. If you're good, there are better paying, better rewarding, less "heart and soul" companies out there. You're doing basically startup work for at-will employment pay.
Re:Quit (Score:5, Insightful)
All of what you said (and the initial reaction to quit in the GP) hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're correct about jumping at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.
Re:this is a majorly funny story (Score:5, Insightful)
I don't see where he said that all systems have been reverse engineered and documented in one month; only that he is currently reverse engineering systems and documenting.
And, maybe this guy likes what he is doing, getting his hands dirty with network and phone stuff. And some people really like writing Perl (I don't; I think it's the devil's language). If he finds his work rewarding, who are you to mock him?
Re:methodically and late into the night (Score:5, Insightful)
Money is the last step not the first (Score:2, Insightful)
99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.
It is certainly true that a great many companies are penny-wise-pound-foolish when it comes to IT, but it is VERY premature to jump to that conclusion here. I've seen almost as many cases where companies overspent on IT for things they didn't really need. My current company has a piece of accounting software that is seriously overkill for our relatively pedestrian needs. It cost our company $80,000 when $3000 on Quickbooks Enterprise would have done the job fine. (Bought by the previous owners, who were all engineers without a lick of business savvy.)
In any case it is much more likely that any "half assed" solutions were due to a lack of competence rather than a lack of money. It sounds like this guy has done a lot to improve things without throwing big bucks at the problem so I'm inclined to suspect his predecessor was not especially gifted.
Money whipping a problem should always be the solution of last resort. While it is certainly possible this company isn't spending enough, you don't spend money on anything without a reasonable expected ROI. Spending money as a first impulse usually means you haven't really thought about the problem sufficiently and are just assuming that a more expensive product will solve all your problems. If I hired an IT guy and the first thing out of his mouth was that I wasn't spending enough I'd be seriously worried.
Re:Quit (Score:5, Insightful)
It is probably a combination of the two, because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.
We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s), and give you a hard copy report on their findings that you can present to MGMT.
If the situation is as I believe, it is worse than he even suspects. He needs more help than one person can provide to get ahead of the curve.
Re:Quit (Score:4, Insightful)
Re:methodically and late into the night (Score:5, Insightful)
My guess is he's not... I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence. The fact that this does not appear to be a small mom and pop with two or three servers making up the "e-presence" adds fuel to the fire. I'm getting the image of a fairly large company that relies heavily on its web and e-commerce presence. And has one guy to take care of that. What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?
There's no bullpen here. If anything, anything at all, breaks, there's only one guy to fix it. Day or night. If two things break you're already triaging. Surely a "thriving" company can afford a backup to what is pretty clearly a business critical unit?
Your biggest problem will be funding (Score:4, Insightful)
Either because your predecessor 'made it work' with little or no funding (better translated as 'he made it almost work'), or because your predecessor failed to acquire sufficient funding to do it 'right'.
As a former field tech/consultant, try to avoid bringing in consultants to explain why the stuff needs to be bought. Many a manager ends up believing the consultants and disbelieving their staff. You get to either hire the consultant to justify your plan or find yourself undercut by that lack of confidence.
And of course nail some problems and show improvements as early as you can. It's wise to both solve pressing impactful problems first, and gain trust.
I always loved going into a client with lots of problems. Not just for the thrill of making things right, but knowing that if I did it right I had a referral for my next client - because the end result was most often working my way out of the gig. Either I passed the client on to another tech to maintain, or they got their staff's legs under them and could carry on. So long as there are more clients, this is good. Great fun to figure things out, isn't it?
Re:Quit (Score:5, Insightful)
This.
I'd be willing to bet a year's pay that the previous guy wasn't straight-up incompetent. He was probably relatively skilled, and doing the best he could with the resources at his disposal. Which were probably not actually the resources he needed.
Odds are good that there's a reason why the place is in the condition it is now.
Odds are good that there's a reason why the last guy isn't there anymore.
Odds are good that you're going to need more than one guy in IT to get it all straightened-out.
Re:Escalate (Score:5, Insightful)
I'd like to add that v1 above made a great comment too, namely that in addition to the crucial steps you mentioned, it's also important to keep management informed of your task list AND your progress on it, since often your fixes will be behind-the-scenes things that they may not notice. Make sure they know you're working hard to make their system better and more robust; do NOT assume that anyone else notices what you're doing.
Re:methodically and late into the night (Score:5, Insightful)
First thing to do is find out why. Why was there only one IT person, and why did they quit? Look through the junk on all the systems - if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened.
Re:methodically and late into the night (Score:5, Insightful)
"if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened."
I have done this before.. the response was...
"Find another job and run, run as fast as you can. Oh and trust no one."
Re:Quit (Score:1, Insightful)
I suggest getting more English lessons, as the words "I worked in this environment for one year as to not tarnish my resume" do not by any stretch of the language mean "So you worked for one whole year in this industry".
So what is your native language where the word "environment" is equivalent to "industry"?
And you claim you have worked in IT for 22 years, yet you can't read English properly.
Show me your H1B card; I think you are fluffing us.
Re:Quit (Score:4, Insightful)
When I'm given a spoon and told to storm the hill and kill everyone in the machine gun nest?
Yes. I quit before I even try. After you have been in IT long enough you can spot a suicide mission a mile away.
Re:methodically and late into the night (Score:5, Insightful)
That's assuming the predecessor wasn't the problem. I have learned over the years that there are far too many tech types who prefer to be the only one that does a particular task and will make any excuse to management to make sure things stay that way. When these lone wolf types happen to not be as competent as they pretend to be, they tend to dig themselves into too deep a hole, so they either get fired or quit in frustration, but when you talk to them it will always be some other person's fault.
I'm not saying management isn't at fault; they very well could be, but don't assume that right off. The first step is to try and get a read on how good the predecessor was at their job; otherwise he can get very misleading info.
Re:Configuration management (Score:4, Insightful)
Put up the fight; at least if there is a paper trail, your ass is covered. If your company has auditors, buy them a coffee and see if they can help you explain to senior management why root access for everyone is a bad thing. I needed it years ago as the support person, but when I moved back to development they kept my access. They gave me very strange looks when I asked for access to be revoked, but when they got audited they didn't get nailed by the auditor for giving developers full access to the prod machines.
As a compromise see if you can get a 'SYSTEST' area defined where an image of prod data is stored and the new code that is to be promoted can be staged. That way developers can put up their code and prove it works with prod data and if it gets signed off by management you can 'promote' the code to the prod servers.
Re:Quit (Score:5, Insightful)
Oh god yes do this.
If your bosses will sign off on getting a second opinion, great, stick around and fix stuff. If they don't even want to know that it's screwed up, get out as soon as you can.
Just be very careful when selecting who you'll bring in to do the audit, and be very clear that if anyone is brought on to help fix the problems, it absolutely will not be the same as the evaluators. Otherwise you're essentially handing them a blank check to say whatever they feel like is wrong, and fix it any way they want.
My suggestion is to generally avoid letting contractors do more than consult with you on a project--they know very well how to set things up so that it's easy for them to work on in the future, and are generally not very good at making the stuff actually fit in well with your business processes.
Re:Escalate (Score:5, Insightful)
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
It's sort of unrelated, but my brother was doing some independent audio work for some VIP wedding in Italy when he realized the electrical hardware & connections were a mess (meaning they were actually dangerous to use). He first talked with the management for the event and let them know about the situation. They ignored him. He quit the job, and was highly criticized for it.
As he was disconnecting all of his hardware with his team, a short circuit caused a fire, which fortunately was controlled easily.
The event's management immediately contacted him to offer him a formal apology and pay for the damages to his hardware. They also offered to hire him back, double the salary. The last part was kind of luck, but had the fire not been controlled as easily as it was, my brother would have shared the responsibility.
Long story short: sometimes you have to know when to step down.
Re:methodically and late into the night (Score:4, Insightful)
As opposed to management and their insistence that anyone is replaceable (except themselves) for pennies on the dollar. Sometimes the lone wolf thing is just HR-speak for "not a team player"; sometimes it's a precursor to replacing someone with someone else who costs less.
Re:Quit (Score:5, Insightful)
Completely agree. Perhaps the previous guy didn't take the time to inform the management of what was required to do the job properly, or didn't know himself, or was more interested in painting himself as indispensable than doing the right thing. First things first, if this is genuinely a thriving e-commerce company then their website is their number one priority and their fulfilment systems are the number two priority, phones are number 3 with everything else taking a back seat - and they REALLY need to get a second employee. If you are ill, on holiday, or, deity forbid, something happens to you, then they need someone else who can step in. If their infrastructure is as shot as you suspect then you're going to need a second brain to sort it all out and help you implement it.
You must make sure that backups are being taken and are robust. You need a disaster recovery plan. You need both short term and long term plans to scale the infrastructure as the business grows and reactively if there's a sudden growth spurt. You need to know where the next bottleneck in the system is and come up with a plan to fix it. Do you have an adequate handle on monitoring traffic to the site from when they first land through to placing an order? Do management have the stats required to make informed decisions about the business? Management will also need to be aware of when IT will need extra funds as mapped against their own sales growth targets.
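As a rough illustration of the "backups are being taken" part, here is a minimal Python sketch that checks how old the newest file in a backup directory is. The directory layout, the function names, and the 24-hour threshold are assumptions made up for the example; adjust them to whatever the backup job really produces.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Age in hours of the newest regular file in backup_dir, or None if it has no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_stale(backup_dir, max_age_hours=24, now=None):
    """True if there is no backup at all or the newest one is older than max_age_hours."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours
```

Run something like this from cron and have it complain loudly when `backup_is_stale` returns True. Note that a fresh file only shows the job ran; it says nothing about whether the backup restores, so a periodic test restore is still needed to call the backups robust.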
Once all of the above is sorted, and decent management allowing (and presuming this isn't something that is already being taken care of), you need to start suggesting to management the skillsets of people and / or contractors and / or agencies that need to be brought in to proactively grow the business. Be it SEO, PPC, UX, new features, etc. whatever it is, you have the opportunity to help the business understand it all and be instrumental in their success.
Re:methodically and late into the night (Score:5, Insightful)
Oh, that's easy:
In fact, I'd lay odds that's how the vacancy occurred.
Re:methodically and late into the night (Score:3, Insightful)
It's not a matter of whether he can keep up with it, it's a matter of backup. Did you ever take a vacation? Not a "going on a trip this weekend" thing, a real week long vacation? How many times did you log in remotely? What if you got really sick? What if something broke while you were really sick? I'm not talking about, "gosh, I don't feel too good today I'm staying home and can log in remotely if there's a problem" sick, I'm talking bedridden or hospitalized sick. How important was it that your servers stay up 99.9% of the time? This sounds like a "Our servers are down, we're hemorrhaging money" kind of situation. I'm not saying they need a team of 15 people, but one extra guy who can fix the business critical systems because you have jury duty today is always nice.
Not to mention that he's also the programmer. So while he's ripping out, documenting and fixing all the IT his predecessor left him, he's also supposed to be building and maintaining the site and the backend code. Oh, and fixing the boss's laptop. 'Cause hey, he's the IT guy.
Re:methodically and late into the night (Score:5, Insightful)
Yes, in a larger company, you'd hire an Exchange pro, an AD pro, a networking pro, a programmer or two and a couple techs that are slightly more generalized guys to manage backups, the server room and help desk. The unfortunate truth is that specialized individuals are rarely any good outside their specialty... which is unhelpful to a small business that can't afford a stable full of tech talent.
I know, as I've been this guy. It's brutal work but can be pretty satisfying. Every day your work is different. But you're never an expert at one particular thing and you're never paid like someone who specialized early on.
Re:methodically and late into the night (Score:3, Insightful)
First thing to do is find out why. Why was there only one IT person, and why did they quit?
If there was only one, I'm betting it was a real small shop run by somebody who thought they could pay a local computer geek $10/hr to run everything, and the guy left the second he found a job that paid what the work is worth.
Re:methodically and late into the night (Score:5, Insightful)
I call this the 'Hit By A Bus' scenario. If you're hit by a bus in the next five minutes can the business carry on without you? If the answer is no for any reason then the business has major problems.
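One hedged way to make the bus question concrete is to write down who can actually run each system and flag anything with a single name next to it. The systems and people below are hypothetical placeholders, not details from the story.

```python
# Hypothetical map of system -> people who could keep it running alone.
WHO_KNOWS = {
    "web storefront": {"sole IT person"},
    "database": {"sole IT person"},
    "phones": {"sole IT person", "phone vendor"},
}


def bus_factor_risks(who_knows):
    """Systems (sorted by name) that at most one person can keep running."""
    return sorted(name for name, people in who_knows.items() if len(people) <= 1)


if __name__ == "__main__":
    for name in bus_factor_risks(WHO_KNOWS):
        print("single point of failure:", name)
```

Anything this prints is a system the business loses if that one person is unavailable, which is a useful list to hand to management.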