
Ask Slashdot: Getting a Grip On an Inherited IT Mess?

Posted by timothy
from the can-is-open-worms-are-everywhere dept.
First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"

  • say goodbye to your life for the next year. hope you're getting paid to mislay it....
  • Escalate (Score:5, Insightful)

    by sociocapitalist (2471722) on Tuesday December 06, 2011 @12:47PM (#38281240)

    Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.

  • by Lumpy (12016) on Tuesday December 06, 2011 @12:48PM (#38281258) Homepage

    You need to document it and get management to approve spending money.

    I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and why the guy there beforehand has left.

    99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.

  • by roman_mir (125474) on Tuesday December 06, 2011 @12:49PM (#38281270) Homepage Journal

    Facts:

    1. The job has lasted for 1 month so far.
    2. The e-commerce company is 'thriving', apparently.
    3. All of the systems have been "reverse engineered" in that 1 month.
    4. All of the documents are written in that 1 month.
    5. In 1 month there have been: network and phone upgrades and database maintenance with Perl, and it all has been 'immensely rewarding'.
    6. The entire infrastructure is 'a few problems away from a total meltdown'.
    7. Single person IT operation to do everything.

    Question: is this for real? What's the size of the company and what's the budget?

  • Re:Quit (Score:5, Insightful)

    by Anonymous Coward on Tuesday December 06, 2011 @12:50PM (#38281284)

    No!

    This is actually the kind of career building stuff one should leap at. What would you rather say in an interview for your next job:
    - I took this system that was falling apart and made it run like clockwork.. downtime and issue frequency went from "it's down again" to "been up all year" ..
    - Yeah it was pretty good when I got there, and I maintained the status quo

    My thoughts on original question:

    First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then for each key component, break it down to its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.

    Next part is assessment. For each component you’ve identified, what is its current state.

    And then it’s time to do triage. Prioritize stuff by largest potential impact.

    And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.
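
The comprehension and triage steps described above could be sketched roughly as follows. This is only an illustration in Python: the `DEPENDS_ON` inventory and its component names are hypothetical, not taken from the submitter's systems, and "how many things rely on this" is just one possible stand-in for "largest potential impact".

```python
from typing import Dict, List, Optional

# Hypothetical inventory: each component maps to the things it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "storefront": ["web server", "orders database"],
    "web server": ["network"],
    "orders database": ["backups", "network"],
    "backups": [],
    "network": [],
}


def expand(component: str, inventory: Dict[str, List[str]],
           seen: Optional[List[str]] = None) -> List[str]:
    """Return the component plus everything it depends on, depth first, each once."""
    if seen is None:
        seen = []
    if component in seen:
        return seen
    seen.append(component)
    for dep in inventory.get(component, []):
        expand(dep, inventory, seen)
    return seen


def blast_radius(inventory: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many other components rely on each one, directly or indirectly."""
    counts = {name: 0 for name in inventory}
    for name in inventory:
        # Skip the first entry, which is the component itself.
        for dep in expand(name, inventory)[1:]:
            counts[dep] = counts.get(dep, 0) + 1
    return counts


def triage_order(inventory: Dict[str, List[str]]) -> List[str]:
    """List components with the most dependents first."""
    counts = blast_radius(inventory)
    return sorted(counts, key=lambda name: -counts[name])


if __name__ == "__main__":
    for name in triage_order(DEPENDS_ON):
        print(name, blast_radius(DEPENDS_ON)[name])
```

With this sample inventory, `network` lands at the top of the list because three of the four other components rely on it. A real inventory would need the assessment step (current state of each component) recorded alongside.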

  • Speak UP (Score:2, Insightful)

    by Anonymous Coward on Tuesday December 06, 2011 @12:50PM (#38281292)

    Tell/email/post your opinion and observation, as detailed as you can, along with your concerns. Make sure your managers see it. Do not expect them to do anything about it. Do it for your own reference, so you may continue working normally. Do not overwork or overworry yourself, for that will not bring you nor the failing systems anywhere closer to resolution. Do your normal job, stay cool and speak up. You are in the driver's seat.

  • Wait a minute .... (Score:5, Insightful)

    by CuriousGeorge113 (47122) on Tuesday December 06, 2011 @12:51PM (#38281328) Homepage

    "I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."

    Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.

    You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.

  • Re:you don't (Score:3, Insightful)

    by Synerg1y (2169962) on Tuesday December 06, 2011 @12:54PM (#38281400)

    Or bring in contractors / consultants and have them serve their part and then part ways, the biggest mistake you can make is taking everything on your shoulders, that = loss of life & health. It's a job and work != life.

  • Re:Getting a Grip (Score:4, Insightful)

    by Hadlock (143607) on Tuesday December 06, 2011 @12:55PM (#38281432) Homepage Journal

    This is the only solid advice I've read so far. Band-aid solutions are indicative of two things: too shy to ask management for a bigger budget, or management's reluctance to improve their budget. Generally it is the latter.

  • Re:Quit (Score:4, Insightful)

    by 1s44c (552956) on Tuesday December 06, 2011 @12:55PM (#38281438)

    Quit? Do you give up on every task before you start?

    Some of us like a challenge.

  • by Anonymous Coward on Tuesday December 06, 2011 @12:57PM (#38281476)

    No offense, but if you don't have the necessary background to know what/where the tools are; who are you to say everything is band-aided? I see this a lot with new ITs, they see something different than they would have done and instantly label their predecessor a moron; later to make "their" change and break everything. Easy on the finger pointing.

    The first thing you need to do is make a comprehensive assessment; don't jump in and start making changes until you have documented everything. If you can contact your predecessor and ask about design and/or documentation that may be stored in an industry standard tool that YOU are unaware of; do so. Once you know how all the pieces move, then start to plan how to improve/repair it. If you dive in and it breaks, you will be blamed; if it breaks and you fix it with minimal down time, you're the hero.

  • by rabenja (919226) on Tuesday December 06, 2011 @01:04PM (#38281580) Journal
    I was in much the same position 12 years ago at this company. I am now CIO with 7 people on my team with several business partners to help manage the infrastructure. My advice for what it is worth:
    • - Take time every day to assess and analyze the bigger picture before allowing yourself to get drawn into the details.
    • - Look at the entire system from a risk mitigation perspective. What areas are most likely to cause "meltdown". Spend the most effort there.
    • - What are incremental changes that can be made that improve the overall risk picture? Focus on the biggest bang for the buck.
    • - Defer anything that works well enough for the time being.
    • - Avoid big bang solutions unless they can be contained and tested well, with the capability of rolling back.
    • - Get help where necessary.
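
One common way to put numbers on the risk-mitigation advice above is a simple likelihood-times-impact score. The sketch below assumes that scoring scheme; the 1 to 5 scales, the `defer_below` threshold, and the example risks are all hypothetical and would need to be replaced with your own judgment calls.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (expected soon)
    impact: int      # assumed scale: 1 (annoyance) to 5 (business stops)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def triage(risks: List[Risk], defer_below: int = 6) -> Tuple[List[Risk], List[Risk]]:
    """Split risks into (act on now, defer), each sorted with the highest score first."""
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    act = [r for r in ranked if r.score >= defer_below]
    defer = [r for r in ranked if r.score < defer_below]
    return act, defer


if __name__ == "__main__":
    # Hypothetical examples only.
    risks = [
        Risk("old phone firmware", likelihood=2, impact=2),
        Risk("untested backups", likelihood=4, impact=5),
        Risk("single database server", likelihood=3, impact=5),
    ]
    act, defer = triage(risks)
    print("act on:", [r.name for r in act])
    print("defer:", [r.name for r in defer])
```

The deferred list corresponds to "anything that works well enough for the time being"; revisit the scores as you learn more about the systems.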
  • by 1u3hr (530656) on Tuesday December 06, 2011 @01:06PM (#38281618)

    Question: is this for real?

    It's an "Ask Slashdot". They're as real as "Letters to Penthouse". Both carefully crafted to create a fantasy situation to excite readers. Read them if the subject is something you're interested in, but don't waste your time giving advice..

  • Re:Quit (Score:5, Insightful)

    by Anonymous Coward on Tuesday December 06, 2011 @01:07PM (#38281636)

    I worked in this environment for one year as to not tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're adopted. When you fix things, you'll inevitably miss something (I did because of the pace, not dictated by me). Get out. It's not worth the challenge to get proper budgeting to get the right tools in place or the organization as a whole wouldn't let things get how they are in the first place. The business model I came from is failing. If you're good, there are better paying, better rewarding, less "heart and soul" companies out there. You're doing basically startup work for at will employment pay.

  • Re:Quit (Score:5, Insightful)

    by The Moof (859402) on Tuesday December 06, 2011 @01:08PM (#38281642)
    I'd amend that to a big "maybe" for sticking around.

    All of what you said (and the initial reaction to quit in the GP) all hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're correct at jumping at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.
  • by KermodeBear (738243) on Tuesday December 06, 2011 @01:11PM (#38281686) Homepage

    I don't see where he said that all systems have been reverse engineered and documented in one month; only that he is currently reverse engineering systems and documenting.

    And, maybe this guy likes what he is doing, getting his hands dirty with network and phone stuff. And some people really like writing Perl (I don't; I think it's the devil's language). If he finds his work rewarding, who are you to mock him?

  • by mattventura (1408229) on Tuesday December 06, 2011 @01:16PM (#38281780) Homepage
    This. From what I've heard, it often involves weekends too.
  • by sjbe (173966) on Tuesday December 06, 2011 @01:17PM (#38281788)

    99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.

    It is certainly true that a great many companies are penny-wise-pound-foolish when it comes to IT but it is VERY premature to jump to that conclusion here. I've seen almost as many cases where companies overspent on IT for things they didn't really need. My current company has a piece of accounting software that is seriously overkill for our relatively pedestrian needs. Cost our company $80,000 when $3000 on Quickbooks Enterprise would have done the job fine. (Bought by the previous owners, who were all engineers without a lick of business savvy.)

    In any case it is much more likely that any "half assed" solutions were due to a lack of competence rather than a lack of money. It sounds like this guy has done a lot to improve things without throwing big bucks at the problem so I'm inclined to suspect his predecessor was not especially gifted.

    Money whipping a problem should always be the solution of last resort. While it is certainly possible this company isn't spending enough, you don't spend money on anything without a reasonable expected ROI. Spending money as a first impulse usually means you haven't really thought about the problem sufficiently and are just assuming that a more expensive product will solve all your problems. If I hired an IT guy and the first thing out of his mouth was that I wasn't spending enough I'd be seriously worried.

  • Re:Quit (Score:5, Insightful)

    by Archangel Michael (180766) on Tuesday December 06, 2011 @01:18PM (#38281812) Journal

    It is probably a combination of the two. Because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.

    We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s), and give you a hard copy report on their findings that you can present to MGMT.

    If the situation is as I believe, it is worse than he even suspects. He needs more help than he can do by himself, to get ahead of the curve.

  • Re:Quit (Score:4, Insightful)

    by Nethemas the Great (909900) on Tuesday December 06, 2011 @01:23PM (#38281882)
    I'm familiar with that kind of mess... Best advice is to piss on all the fires as quickly as they arise and unfortunately put in over-time putting in place alternative implementations that prevent it from happening again. During those brief intervals when something isn't burning survey the land and determine the highest risk (read cost and likelihood of failure) and take proactive steps to mitigate. At first you'll feel like you're in the worst kind of hell but eventually, things will start to come in place and you'll be able to enjoy your hard earned vacation. Further, it's quite probable that you don't have a complete knowledge of the technology being used. Get it. There's no substitute for actually knowing what you're doing. It's unfortunately far too common for people with no time and/or interest to search the web for a snippet of code, or set of procedure steps, and hack these into place without the slightest clue what they're doing nor the consequences thereof. It's usually better to go sharpen the ax before you go into the woods even if it seems like it will take more time. Trust me, it will pay dividends later.
  • by DrgnDancer (137700) on Tuesday December 06, 2011 @01:30PM (#38281984) Homepage

    My guess is he's not... I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence. The fact that this does not appear to be a small mom and pop with two or three servers making up the "e-presence" adds fuel to the fire. I'm getting the image of a fairly large company that relies heavily on its web and e-commerce presence. And has one guy to take care of that. What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

    There's no bullpen here, if anything, anything at all, breaks there's only one guy to fix it. Day or night. If two things break you're already triaging. Surely a "thriving" company can afford a backup to what is pretty clearly a business critical unit?

  • by rickb928 (945187) on Tuesday December 06, 2011 @01:31PM (#38282006) Homepage Journal

    Either because your predecessor 'made it work' with little or no funding (better translated as 'he made it almost work'), or because your predecessor failed to acquire sufficient funding to do it 'right'.

    As a former field tech/consultant, try to avoid bringing in consultants to explain why the stuff needs to be bought. Many a manager ends up believing the consultants and disbelieving their staff. You get to either hire the consultant to justify your plan or find yourself undercut by that lack of confidence.

    And of course nail some problems and show improvements as early as you can. It's wise to both solve pressing impactful problems first, and gain trust.

    I always loved going into a client with lots of problems. Not just for the thrill of making things right, but knowing that if I did it right I had a referral for my next client - because the end result was most often working my way out of the gig. Either I passed the client on to another tech to maintain, or they got their staff's legs under them and could carry on. So long as there are more clients, this is good. Great fun to figure things out, isn't it?

  • Re:Quit (Score:5, Insightful)

    by Ephemeriis (315124) on Tuesday December 06, 2011 @01:34PM (#38282052)

    This.

    I'd be willing to bet a year's pay that the previous guy wasn't straight-up incompetent. He was probably relatively skilled, and doing the best he could with the resources at his disposal. Which were probably not actually the resources he needed.

    Odds are good that there's a reason why the place is in the condition it is now.

    Odds are good that there's a reason why the last guy isn't there anymore.

    Odds are good that you're going to need more than one guy in IT to get it all straightened-out.

  • by datavirtue (1104259) on Tuesday December 06, 2011 @01:44PM (#38282208)

    Judging from the questions, it seems he needs professional help. Seriously.

  • Re:Escalate (Score:5, Insightful)

    by gknoy (899301) <gknoy@anasazi s y s t e m s .com> on Tuesday December 06, 2011 @01:45PM (#38282214)

    I'd like to add that v1 above made a great comment too, namely that in addition to the crucial steps you mentioned, it's also important to keep management informed of your task list AND your progress on it, since often your fixes will be behind-the-scenes things that they may not notice. Make sure they know you're working hard to make their system better and more robust, do NOT assume that anyone else notices what you're doing.

  • I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence.

    First thing to do is find out why. Why was there only one IT person, and why did they quit? Look through the junk on all the systems - if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened.

  • by Lumpy (12016) on Tuesday December 06, 2011 @01:52PM (#38282298) Homepage

    "if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened."

    I have done this before.. the response was...

    "Find another job and run, run as fast as you can. Oh and trust no one."

  • Re:Quit (Score:1, Insightful)

    by Anonymous Coward on Tuesday December 06, 2011 @01:57PM (#38282374)

    I suggest getting more English lessons as the words, "I worked in this environment for one year as to not tarnish my resume" do not by any stretch of the language mean, "So you worked for one whole year in this industry"

    So what is your native language where the word "environment" is equivalent to "industry"?

    And you claim you have worked in IT for 22 years but yet you can't read English properly.

    Show me your H1B card, I think you are fluffing us.

  • Re:Quit (Score:4, Insightful)

    by Lumpy (12016) on Tuesday December 06, 2011 @01:58PM (#38282388) Homepage

    When I'm given a spoon and told to storm the hill and kill everyone in the machine gun nest?

    yes. I quit before I even try. After you have been in IT long enough you can spot a suicide mission a mile away.

  • That's assuming the predecessor wasn't the problem. I have learned over the years that there are far too many tech types who prefer to be the only one that does a particular task and will make any excuse to management to make sure things stay that way. When these lone wolf types happen to not be as competent as they pretend to be they tend to dig themselves into too deep a hole so they either get fired or quit in frustration but when you talk to them it will always be some other person's fault.

    I'm not saying management isn't at fault, they very well could be but don't assume that right off. The first step is to try and get a read on how good the predecessor was at their job otherwise he can get very misleading info.

  • by TomTraynor (82129) on Tuesday December 06, 2011 @02:14PM (#38282586) Homepage

    Do the fight, at least if there is a paper trail your ass is covered. If your company has auditors, buy them a coffee and see if they can help you explain to senior management why root access for everyone is a bad thing. I needed it years ago as the support person, but, when I moved back to development they kept my access. They gave me very strange looks when I asked for access to be revoked, but, when they got audited they didn't get nailed by the auditor for having developers full access to the prod machines.

    As a compromise see if you can get a 'SYSTEST' area defined where an image of prod data is stored and the new code that is to be promoted can be staged. That way developers can put up their code and prove it works with prod data and if it gets signed off by management you can 'promote' the code to the prod servers.

  • Re:Quit (Score:5, Insightful)

    by 19thNervousBreakdown (768619) <davec-slashdot@@@lepertheory...net> on Tuesday December 06, 2011 @02:17PM (#38282616) Homepage

    Oh god yes do this.

    If your bosses will sign off on getting a second opinion, great, stick around and fix stuff. If they don't even want to know that it's screwed up, get out as soon as you can.

    Just be very careful when selecting who you'll bring in to do the audit, and be very clear that if anyone is brought on to help fix the problems, it absolutely will not be the same as the evaluators. Otherwise you're essentially handing them a blank check to say whatever they feel like is wrong, and fix it any way they want.

    My suggestion is to generally avoid letting contractors do more than consult with you on a project--they know very well how to set things up so that it's easy for them to work on in the future, and are generally not very good at making the stuff actually fit in well with your business processes.

  • Re:Escalate (Score:5, Insightful)

    by Decameron81 (628548) on Tuesday December 06, 2011 @02:19PM (#38282624)

    Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.

    It's sort of unrelated. But my brother was doing some independent audio work for some VIP wedding in Italy, when he realized the electrical hardware & connections were a mess (meaning they were actually dangerous to use). He first talked with the management for the event and let them know about the situation. They ignored him. He quit the job, and was highly criticized for it.

    As he was disconnecting all of his hardware with his team, a short circuit caused a fire, which fortunately was controlled easily.

    The event's management immediately contacted him to offer him a formal apology and pay for the damages to his hardware. They also offered to hire him back, double the salary. The last part was kind of luck, but had the fire not been controlled as easily as it was, my brother would have shared the responsibility.

    Long story short: sometimes you have to know when to step down.

  • by lightknight (213164) on Tuesday December 06, 2011 @02:29PM (#38282754) Homepage

    As opposed to management and their insistence that anyone is replaceable (except themselves) for pennies on the dollar. Sometimes, the lone wolf thing is just HR-speak for "not a team player," sometimes it's a precursor to replacing someone with someone else who costs less.

  • Re:Quit (Score:5, Insightful)

    by myurr (468709) on Tuesday December 06, 2011 @02:34PM (#38282824)

    Completely agree. Perhaps the previous guy didn't take the time to inform the management of what was required to do the job properly, or didn't know himself, or was more interested in painting himself as indispensable than doing the right thing. First things first, if this is genuinely a thriving e-commerce company then their website is their number one priority and their fulfilment systems are the number two priority, phones are number 3 with everything else taking a back seat - and they REALLY need to get a second employee. If you are ill, on holiday, or, deity forbid, something happens to you, then they need someone else who can step in. If their infrastructure is as shot as you suspect then you're going to need a second brain to sort it all out and help you implement it.

    You must make sure that backups are being taken and are robust. You need a disaster recovery plan. You need both short term and long term plans to scale the infrastructure as the business grows and reactively if there's a sudden growth spurt. You need to know where the next bottleneck in the system is and come up with a plan to fix it. Do you have an adequate handle on monitoring traffic to the site from when they first land through to placing an order? Do management have the stats required to make informed decisions about the business? Management will also need to be aware of when IT will need extra funds as mapped against their own sales growth targets.

    Once all of the above is sorted, and decent management allowing (and presuming this isn't something that is already being taken care of), you need to start suggesting to management the skillsets of people and / or contractors and / or agencies that need to be brought in to proactively grow the business. Be it SEO, PPC, UX, new features, etc. whatever it is, you have the opportunity to help the business understand it all and be instrumental in their success.
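
As a small first step toward "make sure that backups are being taken", a freshness check along these lines could be run daily. It is a minimal sketch, assuming backups land as files in a single directory; the directory path and the 26-hour threshold are hypothetical, and a recent file only shows that something was written, not that it can be restored.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Age in hours of the most recently modified file in backup_dir, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_stale(backup_dir: str, max_age_hours: float = 26.0,
                    now: Optional[float] = None) -> bool:
    """True if there is no backup file at all, or the newest one is older than max_age_hours."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours


if __name__ == "__main__":
    # Hypothetical location; point this at wherever your backups are actually written.
    if backup_is_stale("/var/backups/orders"):
        print("WARNING: no recent backup found")
```

Periodically restoring a backup onto a scratch machine is still the only real test of robustness.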

  • by xdroop (4039) on Tuesday December 06, 2011 @02:36PM (#38282844) Homepage Journal

    What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

    Oh, that's easy:

    • He gets called in from being on vacation or sick;
    • he gets to work uncompensated time to fix the problem;
    • if he fails to either respond to the call OR fails to fix the problem, he gets fired;
    • if he succeeds in fixing the problem, he gets threatened with termination should something else fail while he's "unavailable".

    In fact, I'd lay odds that's how the vacancy occurred.

  • by DrgnDancer (137700) on Tuesday December 06, 2011 @02:47PM (#38282958) Homepage

    It's not a matter of whether he can keep up with it, it's a matter of backup. Did you ever take a vacation? Not a "going on a trip this weekend" thing, a real week long vacation? How many times did you log in remotely? What if you got really sick? What if something broke while you were really sick? I'm not talking about, "gosh, I don't feel too good today I'm staying home and can log in remotely if there's a problem" sick, I'm talking bedridden or hospitalized sick. How important was it that your servers stay up 99.9% of the time? This sounds like a "Our servers are down, we're hemorrhaging money" kind of situation. I'm not saying they need a team of 15 people, but one extra guy who can fix the business critical systems because you have jury duty today is always nice.

    Not to mention that he's also the programmer. So while he's ripping out, documenting and fixing all the IT his predecessor left him, he's also supposed to be building and maintaining the site and the backend code. Oh, and fixing the boss's laptop. 'Cause hey, he's the IT guy.
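
To make the 99.9% figure mentioned above concrete, the arithmetic can be sketched as below. The function name is made up for illustration, and it assumes a 365-day year with every hour counted equally.

```python
def allowed_downtime_hours(availability_percent: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted over the period at the given availability target."""
    return period_hours * (1 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours per year")
```

At 99.9% the budget works out to roughly 8.76 hours a year, which a single person on vacation or in hospital could easily exhaust with one outage.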

  • by Anonymous Coward on Tuesday December 06, 2011 @03:36PM (#38283612)
    Small operations like his are common. I'd guess he is a reasonably capable person. Where his world and yours differ, is that he's a jack of all trades (master of none)... because that's what that kind of business requires.

    Yes, in a larger company, you'd hire an Exchange pro, an AD pro, a networking pro, a programmer or two and a couple techs that are slightly more generalized guys to manage backups, the server room and help desk. The unfortunate truth is that specialized individuals are rarely any good outside their specialty... which is unhelpful to a small business that can't afford a stable full of tech talent.

    I know, as I've been this guy. It's brutal work but can be pretty satisfying. Every day your work is different. But you're never an expert at one particular thing and you're never paid like someone who specialized early on.
  • by Anonymous Coward on Tuesday December 06, 2011 @03:49PM (#38283768)

    First thing to do is find out why. Why was there only one IT person, and why did they quit?

    If there was only one, I'm betting it was a real small shop run by somebody who thought they could pay a local computer geek $10/hr to run everything, and the guy left the second he found a job that paid what the work is worth.

  • by Ransak (548582) on Tuesday December 06, 2011 @04:06PM (#38284004) Homepage Journal
    What if you got really sick?

    I call this the 'Hit By A Bus' scenario. If you're hit by a bus in the next five minutes can the business carry on without you? If the answer is no for any reason then the business has major problems.
