Networking Businesses Programming IT Technology

Ask Slashdot: Getting a Grip On an Inherited IT Mess? 424

First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"
This discussion has been archived. No new comments can be posted.

  • by p43751 ( 170402 ) on Tuesday December 06, 2011 @12:44PM (#38281160)

    You work at RIM?

  • say goodbye to your life for the next year. hope you're getting paid to mislay it....
    • by mattventura ( 1408229 ) on Tuesday December 06, 2011 @01:16PM (#38281780) Homepage
      This. From what I've heard, it often involves weekends too.
      • Comment removed (Score:4, Insightful)

        by account_deleted ( 4530225 ) on Tuesday December 06, 2011 @01:44PM (#38282208)
        Comment removed based on user account deletion
        • by Anonymous Coward on Tuesday December 06, 2011 @03:36PM (#38283612)
          Small operations like his are common. I'd guess he is a reasonably capable person. Where his world and yours differ, is that he's a jack of all trades (master of none)... because that's what that kind of business requires.

          Yes, in a larger company, you'd hire an Exchange pro, an AD pro, a networking pro, a programmer or two and a couple techs that are slightly more generalized guys to manage backups, the server room and help desk. The unfortunate truth is that specialized individuals are rarely any good outside their specialty... which is unhelpful to a small business that can't afford a stable full of tech talent.

          I know, as I've been this guy. It's brutal work but can be pretty satisfying. Every day your work is different. But you're never an expert at one particular thing and you're never paid like someone who specialized early on.
    • by DrgnDancer ( 137700 ) on Tuesday December 06, 2011 @01:30PM (#38281984) Homepage

      My guess is he's not... I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence. The fact that this does not appear to be a small mom and pop with two or three servers making up the "e-presence" adds fuel to the fire. I'm getting the image of a fairly large company that relies heavily on its web and e-commerce presence. And has one guy to take care of that. What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

      There's no bullpen here; if anything, anything at all, breaks, there's only one guy to fix it. Day or night. If two things break you're already triaging. Surely a "thriving" company can afford a backup to what is pretty clearly a business-critical unit?

      • I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence.

        First thing to do is find out why. Why was there only one IT person, and why did they quit? Look through the junk on all the systems - if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened.

        • by Lumpy ( 12016 ) on Tuesday December 06, 2011 @01:52PM (#38282298) Homepage

          "if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened."

          I have done this before.. the response was...

          "Find another job and run, run as fast as you can. Oh and trust no one."

          • Seriously, I want the story on this one.
          • by Jane Q. Public ( 1010737 ) on Tuesday December 06, 2011 @04:30PM (#38284294)
            Absolutely. I once found myself working on a web project that had been through 3 previous developers -- and it wasn't even that big of a project -- but of course I did not know that when I took the job. If I had known the history of the project, I probably would not have taken it.

            I ended up trying to reverse-engineer a huge mess, without really being given the time to do so. They kept me busy making stupid little changes to the graphics, when it really needed some serious underlying code work.

            Then, out of the blue, they sprung a deadline on me of like 4 days, AND they wanted to release on a holiday. I said "No way. I would need at least another week to get this working properly." I did not get the week. PLUS they kept making changes up until literally the last hour, PLUS guess who got blamed when things -- inevitably -- did not work right?

            I was glad to get the hell out of there. As -- it turned out later -- were the 3 developers before me.
          • Comment removed (Score:4, Interesting)

            by account_deleted ( 4530225 ) on Tuesday December 06, 2011 @06:27PM (#38285646)
            Comment removed based on user account deletion
        • by gmack ( 197796 ) <gmack@noSpAM.innerfire.net> on Tuesday December 06, 2011 @02:05PM (#38282470) Homepage Journal

          That's assuming the predecessor wasn't the problem. I have learned over the years that there are far too many tech types who prefer to be the only one that does a particular task and will make any excuse to management to make sure things stay that way. When these lone wolf types happen to not be as competent as they pretend to be, they tend to dig themselves into too deep a hole, so they either get fired or quit in frustration, but when you talk to them it will always be some other person's fault.

          I'm not saying management isn't at fault; they very well could be, but don't assume that right off. The first step is to try and get a read on how good the predecessor was at their job; otherwise he can get very misleading info.

        • Re: (Score:3, Insightful)

          by Anonymous Coward

          First thing to do is find out why. Why was there only one IT person, and why did they quit?

          If there was only one, I'm betting it was a real small shop run by somebody who thought they could pay a local computer geek $10/hr to run everything, and the guy left the second he found a job that paid what the work is worth.

      • by Colin Smith ( 2679 ) on Tuesday December 06, 2011 @01:52PM (#38282312)

        They have a guy who finds upgrading phone systems immensely satisfying! If he's sick he'll come in and fix it and who needs vacation anyway, he'll take the cash instead.

        I'm betting it's a psychotic break and he IS his predecessor.
         

      • by afidel ( 530433 ) on Tuesday December 06, 2011 @02:21PM (#38282662)
        Dude, for three years I was the sole server, network, and SAN guy for an S&P 500 company. If you can't handle a small e-commerce company with a handful of servers (we grew from 60 to 160 servers in those 3 years) then get out of IT. It took most of those 3 years to eradicate the poor work of my predecessor (hey, let's buy an $800 RAID card and then do Windows software RAID AND compression) but I eventually got there. It was a lot of work but I managed to keep it to 50 hours a week for most of that time.
        • Re: (Score:3, Insightful)

          by DrgnDancer ( 137700 )

          It's not a matter of whether he can keep up with it, it's a matter of backup. Did you ever take a vacation? Not a "going on a trip this weekend" thing, a real week long vacation? How many times did you log in remotely? What if you got really sick? What if something broke while you were really sick? I'm not talking about, "gosh, I don't feel too good today I'm staying home and can log in remotely if there's a problem" sick, I'm talking bedridden or hospitalized sick. How important was it that your ser

          • by Ransak ( 548582 ) on Tuesday December 06, 2011 @04:06PM (#38284004) Homepage Journal
            What if you got really sick?

            I call this the 'Hit By A Bus' scenario. If you're hit by a bus in the next five minutes can the business carry on without you? If the answer is no for any reason then the business has major problems.

          • It's an ecommerce company. The only way they sell products is using computers, network, and software. It is beyond comprehension that they have a single person to do all of these tasks.

        • by tnk1 ( 899206 ) on Tuesday December 06, 2011 @03:18PM (#38283360)

          I don't think there is any doubt that under certain circumstances, one person could do a lot of work in what is, at its heart, a set of automated processes to begin with. The problem here is that having one person do anything is a horrible idea. Even a small company should do its best to have two IT people, or at least two people who know how to run the IT department if the company relies on their IT resources for their business. People do get sick or get hit by buses, or even have heart attacks on the soccer field while playing with co-workers after work. More often, they simply find other jobs and leave you with two weeks' notice, and that's not enough time to get the best transition, especially for the guy who runs "everything".

          As an IT person, I appreciate your level of workmanship for keeping things together yourself, but your boss should have been fired for allowing you to run things yourself. It's not a matter of having the skillset to pull it off yourself, it's a matter of continuing operations and work-life balance. Maintaining staffing levels is not your responsibility, but I hope that you didn't believe that it was a good idea for you to be by yourself either. Your company got lucky that you were competent and didn't leave prematurely, but you aren't supposed to run successful companies on luck.

        • Re: (Score:3, Interesting)

          by BitZtream ( 692029 )

          After reading your post ... I call bullshit.

          A quick check of the S&P 500 shows that you'd have to be in several places at once to work as the only guy at any of those companies, and that 160 'servers' would be far lower than ... well ANY of them actually have, probably by an order of magnitude at least.

          • by afidel ( 530433 )
            Whatever, we were on the S&P 500 at the time (got removed after we went from ~$7B valuation to ~$3B). We are a REIT and so we have fairly small IT needs relative to the size of the company (peak of ~850 employees, and as I said $7B valuation with revenues of about a half billion a year). We have offices in 36 states and four countries but only have centralized IT operations at our HQ and our DR site.
      • Thank you. That sentence bothered me as well.

        The only place I can see that possibly making any sense is if you were a genius level programmer, at the head of your own company, and had a net worth of more than a million. Since he mentioned at least one other person (predecessor) and talks about a position...I'm thinking not.

        And dual-timing IT / programming for commerce is NOT a good idea, for any site with any traffic. You split those roles if you have that much traffic.

        So, I'm guessing it's a LAMP shop, doi

      • by xdroop ( 4039 ) on Tuesday December 06, 2011 @02:36PM (#38282844) Homepage Journal

        What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

        Oh, that's easy:

        • He gets called in from being on vacation or sick;
        • he gets to work uncompensated time to fix the problem;
        • if he fails to either respond to the call OR fails to fix the problem, he gets fired;
        • if he succeeds in fixing the problem, he gets threatened with termination should something else fail while he's "unavailable".

        In fact, I'd lay odds that's how the vacancy occurred.

      • by racermd ( 314140 ) on Tuesday December 06, 2011 @06:12PM (#38285486)

        I'm likely commenting too deeply for the person that asked the original question, but my advice seems to fit best here. What the company needs is an IT manager, whether hired directly or outsourced.

        Firstly, assess the corporate attitude towards hiring (competent) staff directly and buying or leasing hardware directly vs. purchasing outsourced services. Once you know where that conversation leads, you'll have a better idea of how to address the larger problems that only a bunch of time (and usually money) can solve.

        If the former, start the interviewing process ASAP. What you're looking for is self-starters that really do know their stuff. Take a handful of real-world scenarios, change some of the minor details a bit, and ask candidates what they'd do in that situation (or if they've encountered something like it before). Don't take them at their word, ask them to back it up with details of their own. Also, since you're going to wind up spending money on staff, you're probably going to be spending money on tools like new systems, software, and basic architecture hardware. Use an appropriate procurement process (and make sure it's followed) to meet your specific needs.

        If the latter, like I and many others here suspect it is, be sure to negotiate favorable contract terms with this in mind - everything is about money. You might be able to get a better rate on some services if you limit support to 8x5 instead of going 24x7, for instance. Is remote support acceptable or do you want someone on-site when you have to make that call? What is the response time to various levels of service calls? Do you want to host hardware on-site or have that done elsewhere? Things like that should be priced out and assessed against the needs of the business.

        Lastly, an important bit regardless of how the company wants to do it, the goal is to streamline operations which includes any support that's required when systems are not operating properly. Identify the weak subsystems and put them on a roadmap to be replaced with something more robust. It's a boring exercise in IT management that involves budgets and change control procedures but it does pay off in the long-run. If you need to get approval for spending, it helps to show what the current cost is, what the cost could be if things go wrong, and what costs could be if replaced with the more robust system. As long as you speak to your management in terms of money, they should listen.

  • by Anonymous Coward on Tuesday December 06, 2011 @12:45PM (#38281180)

    start drinking

  • by Neil Watson ( 60859 ) on Tuesday December 06, 2011 @12:45PM (#38281182) Homepage

    Automate your servers so you can focus your time elsewhere. I use Cfengine.
    http://watson-wilson.ca/2011/03/enterprise-system-administration-using-configuration-management.html [watson-wilson.ca]

    • by 1s44c ( 552956 ) on Tuesday December 06, 2011 @12:51PM (#38281322)

      Yes, automate everything, monitor everything, backup everything, document everything.

      I used to use cfengine but find puppet an easier tool to work with. Nagios and BackupPC are also wonderful tools but you might want to choose alternatives if they better fit your needs.

      You might want to express some concerns to management so that if something critical does fall over, you don't look quite so bad.

      • by vajrabum ( 688509 ) on Tuesday December 06, 2011 @01:05PM (#38281598)
        Lots of folks here have talked about backups, but if your company is really successful then restores could be more of a problem than backups. Restoring large databases and system configuration can take a loooong time. Develop a plan for restore and execute it regularly as a test. Make sure management understands the time for restoration. Two other things--virtualize (that reduces the coefficient of friction for moving things considerably) and consider using Amazon or some other cloud provider in your restore plan in case your cage/server room/whatever burns. Some of those services are low or no cost until you start loading things up. If you go the cloud route be sure to get a read on your traffic, storage and other billable numbers. If that's the disaster plan then if the numbers are of any size at all you need to run the cost by the CFO to make sure that it's sustainable.
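A restore drill along these lines can be scripted. The sketch below is only an illustration and not something from this thread: it assumes a backup has already been restored into a scratch directory by whatever tool is in use, and it compares SHA-256 checksums between the live tree and the restored tree. The function names and the directory layout are hypothetical.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: str, restored_dir: str) -> dict:
    """Compare every file under source_dir with its restored copy.

    Returns the relative paths that are missing or differ, plus how long
    the comparison took (useful when reporting restore times to management).
    """
    source, restored = Path(source_dir), Path(restored_dir)
    started = time.monotonic()
    missing, mismatched = [], []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        copy = restored / rel
        if not copy.is_file():
            missing.append(rel.as_posix())
        elif sha256_of(src_file) != sha256_of(copy):
            mismatched.append(rel.as_posix())
    return {
        "missing": missing,
        "mismatched": mismatched,
        "seconds": time.monotonic() - started,
    }
```

Running something like this on a schedule turns "we have backups" into "we restored last week and everything matched", which is the number management actually needs. Live data that changes between backup and check will show up as mismatches, so it fits best against a quiesced copy or a snapshot.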
      • by charnov ( 183495 ) on Tuesday December 06, 2011 @02:09PM (#38282508) Homepage Journal

        The combo of Observium (network monitoring), Hobbit (monitor everything with extreme ease), and either ESXi or Proxmox VE for consolidation and ease of management/isolation/testing/etc has served me well for years to take control of large organizations quickly. Last two businesses I was hired to fix, I set this up and then built a parallel enterprise as VMs (the right way this time) and then cut everyone over in a weekend. No one noticed the change except to say stuff didn't crash anymore and it was really fast.

        Also OpenFiler and NexentaStor make for a great SAN.

        If you need more: PFSense for firewall or VLAN router, BlueIris for IP cameras, PBX in a Flash for VoIP, SoGo for Outlook compatible email, LibreOffice, etc.

    • Inherited my mess about a year ago. I've done much to clean it up and monitor it.

      I may have to investigate Cfengine soon, but for now, since I am comfortable with creating my own RPMs since all of our servers are CentOS, I simply use yum with rpm. It works very nicely. If I make changes (I use git to track/branch/etc), I then just rsync the repository to our production server once I am happy that everything is correct. Building, git, etc, is all automated from within vim with some simple scripts that I

      • by bill_mcgonigle ( 4333 ) * on Tuesday December 06, 2011 @02:13PM (#38282576) Homepage Journal

        They all have root access still :-( A political fight I'm not yet prepared to have. I was able to take it away on the web servers, at least, and that's the only thing our developers touch, so life is a bit better.

        A fine baby step is to move everybody over to sudo. If you can get buy-in that everybody will track changes with git, then you have somebody to blame and can build a case if they break it. With sudo you have a record of who was mucking (in your /var/log/secure).

        If they're perfectly reasonable/responsible and you can track changes, it's not such a problem, really, unless you're worried that they're secret agents meaning to break your stuff. I typically only see frustrating carelessness where people can get away with it.
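To show what that sudo record can look like in practice, here is a small sketch that tallies commands per user from sudo log lines. It assumes the common syslog layout (`user : TTY=... ; PWD=... ; USER=... ; COMMAND=...`); the exact format varies with distribution and sudo configuration, and the sample lines are made up.

```python
import re
from collections import defaultdict

# Matches the usual sudo syslog entry; real logs may differ by distro/config.
SUDO_LINE = re.compile(
    r"sudo:\s+(?P<user>\S+)\s+:\s+.*?USER=(?P<target>\S+)\s+;\s+COMMAND=(?P<command>.+)$"
)


def commands_by_user(lines):
    """Group the commands run through sudo by the invoking user."""
    seen = defaultdict(list)
    for line in lines:
        match = SUDO_LINE.search(line.rstrip("\n"))
        if match:
            seen[match["user"]].append(match["command"])
    return dict(seen)


# Invented sample entries in the assumed format; the third is not a sudo line.
sample = [
    "Dec  6 14:13:01 web1 sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/bin/vi /etc/httpd/conf/httpd.conf",
    "Dec  6 14:20:45 web1 sudo:      bob : TTY=pts/1 ; PWD=/tmp ; USER=root ; COMMAND=/sbin/service httpd restart",
    "Dec  6 14:21:02 web1 sshd[812]: Accepted publickey for alice",
]
print(commands_by_user(sample))
```

Paired with git history on the config files, a summary like this gives you the "who touched what" trail without having to win the root-access argument first.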

      • by TomTraynor ( 82129 ) <thomas.traynor@gmail.com> on Tuesday December 06, 2011 @02:14PM (#38282586)

        Do the fight, at least if there is a paper trail your ass is covered. If your company has auditors, buy them a coffee and see if they can help you explain to senior management why root access for everyone is a bad thing. I needed it years ago as the support person, but, when I moved back to development they kept my access. They gave me very strange looks when I asked for access to be revoked, but, when they got audited they didn't get nailed by the auditor for having developers full access to the prod machines.

        As a compromise see if you can get a 'SYSTEST' area defined where an image of prod data is stored and the new code that is to be promoted can be staged. That way developers can put up their code and prove it works with prod data and if it gets signed off by management you can 'promote' the code to the prod servers.

    • And document every step of the way. What you did, how you did it, where you did it and WHY! I did support for three years on a legacy mainframe app and a lot was never documented, especially the WHY. Half the time I put into fixing the outage was documentation.

  • by Anonymous Coward on Tuesday December 06, 2011 @12:46PM (#38281214)

    Dude, that is too easy. There are serious wiseacres on this board.

  • by shentino ( 1139071 ) <shentino@gmail.com> on Tuesday December 06, 2011 @12:47PM (#38281232)

    Did the last guy outsource everything to India?

  • Escalate (Score:5, Insightful)

    by sociocapitalist ( 2471722 ) on Tuesday December 06, 2011 @12:47PM (#38281240)

    Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.

    • Re:Escalate (Score:5, Insightful)

      by Decameron81 ( 628548 ) on Tuesday December 06, 2011 @02:19PM (#38282624)

      Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.

      It's sort of unrelated. But my brother was doing some independent audio work for some VIP wedding in Italy, when he realized the electrical hardware & connections were a mess (meaning they were actually dangerous to use). He first talked with the management for the event and let them know about the situation. They ignored him. He quit the job, and was highly criticized for it.

      As he was disconnecting all of his hardware with his team, a short circuit caused a fire, which fortunately was controlled easily.

      The event's management immediately contacted him to offer him a formal apology and pay for the damages to his hardware. They also offered to hire him back, double the salary. The last part was kind of luck, but had the fire not been controlled as easily as it was, my brother would have shared the responsibility.

      Long story short: sometimes you have to know when to step down.

  • by Lumpy ( 12016 ) on Tuesday December 06, 2011 @12:48PM (#38281258) Homepage

    You need to document it and get management to approve spending money.

    I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and the guy there beforehand has left.

    99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.

    • by kiwimate ( 458274 ) on Tuesday December 06, 2011 @01:25PM (#38281918) Journal

      Exactly correct.

      Step 1.

      Document. Look at your critical systems. Document what they are. Start at a high level - line of business, internal (HR, etc). Drill down - I have an Oracle server, I have a Citrix system to allow the users to remote connect, which uses a VPN, etc.

      Cost: your time.

      Step 2.

      Prioritize. What are the most important systems? Start with the systems which, if they go down, will cause the company to lose money. Then the ones which support internal processes. Rank order.

      Cost: your time. Possibly management's time - they may have input into priorities.

      Step 3.

      Audit. Start at the top and find out just what state they're in. If you don't feel sufficiently comfortable with a particular technology to do this yourself, hire an SME for a few hours.

      Cost: potentially the consulting SME to evaluate various systems. Note - the initial contract is an audit, not a "find everything and fix".

      Step 4.

      Fix. If you have audit notes which say "this critical line of business system is on the verge of death and once it dies it can't be resurrected", that goes first. If you have audit notes which say "this is a system which provides some reporting capabilities and it's a bit shaky, but worst case is you have to reboot the server and the reports to management go out a bit late", not so bad.

      If you get to step 3 and management won't pay, then you have a problem.

      If you get to step 2 and management won't give up their time, then you have a very big problem.

      A big question will be the level of support from management. If they are not supportive, or if money is tight and they say "we'd like to pay for the consultants but", then that's why you've rank ordered.

      If they're cooperative but don't have the money, work with them to figure out some kind of timeline based on highest risk.

      If they're stubborn, urgh, bad spot. Do your best to determine level of risk. Work with the company accountant to figure out the cost to the company if a critical line of business system goes down for 10 minutes. 2 hours. Include some waffle about reputation, if you can. Include any penalties or SLA violations, if you have those.
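The outage-cost conversation with the accountant can start from simple arithmetic. The sketch below assumes lost revenue scales linearly with outage length and that any SLA penalty is a flat fee per outage; both are simplifications, and every figure and system name is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    revenue_per_hour: float   # revenue that stops while the system is down
    sla_penalty: float = 0.0  # flat penalty owed per outage (an assumption)


def outage_cost(system: System, minutes: float) -> float:
    """Estimated cost of one outage of the given length."""
    return system.revenue_per_hour * minutes / 60 + system.sla_penalty


def rank_by_cost(systems, minutes):
    """Return (name, cost) pairs, most expensive outage first."""
    costs = [(s.name, outage_cost(s, minutes)) for s in systems]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


# Hypothetical inventory; replace with figures agreed with the accountant.
systems = [
    System("storefront", revenue_per_hour=6000.0),
    System("order database", revenue_per_hour=6000.0, sla_penalty=500.0),
    System("management reports", revenue_per_hour=0.0),
]

# The two durations mentioned above: 10 minutes and 2 hours.
for minutes in (10, 120):
    print(minutes, rank_by_cost(systems, minutes))
```

Reputation damage does not fit into a formula like this, but even a crude table gives the rank order a dollar figure that management can react to.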

  • by roman_mir ( 125474 ) on Tuesday December 06, 2011 @12:49PM (#38281270) Homepage Journal

    Facts:

    1. The job has lasted for 1 month so far.
    2. The e-commerce company is 'thriving', apparently.
    3. All of the systems have been "reverse engineered" in that 1 month.
    4. All of the documents are written in that 1 month.
    5. In 1 month there have been: network and phone upgrades and database maintenance with Perl and it all has been 'immensely rewarding'.
    6. The entire infrastructure is 'a few problems away from a total meltdown'.
    7. Single person IT operation to do everything.

    Question: is this for real? What's the size of the company and what's the budget?

    • by 1u3hr ( 530656 ) on Tuesday December 06, 2011 @01:06PM (#38281618)

      Question: is this for real?

      It's an "Ask Slashdot". They're as real as "Letters to Penthouse". Both carefully crafted to create a fantasy situation to excite readers. Read them if the subject is something you're interested in, but don't waste your time giving advice.

    • by vlm ( 69642 )

      My gut level guess is my house's IT infrastructure is more elaborate / complicated. Admittedly very little of my gross income depends on my home infrastructure.
      My guess is he's a noob to IT. "A few problems away from a total meltdown" describes every IT infrastructure I've seen in the past 20 years, including Fortune 500 corporates. Nothing new there.

      I'm serious about the house analogy. Just treat it like an extremely advanced home LAN, except you have more time, and outages are much more costly.

      I keep

    • by KermodeBear ( 738243 ) on Tuesday December 06, 2011 @01:11PM (#38281686) Homepage

      I don't see where he said that all systems have been reverse engineered and documented in one month; only that he is currently reverse engineering systems and documenting.

      And, maybe this guy likes what he is doing, getting his hands dirty with network and phone stuff. And some people really like writing Perl (I don't; I think it's the devil's language). If he finds his work rewarding, who are you to mock him?

  • by anom ( 809433 ) on Tuesday December 06, 2011 @12:50PM (#38281288)

    Just buy a few cases of your energy drink of choice and put Eye of the Tiger on repeat until you've got it all fixed.

    I believe in you.

  • Wait a minute .... (Score:5, Insightful)

    by CuriousGeorge113 ( 47122 ) on Tuesday December 06, 2011 @12:51PM (#38281328) Homepage

    "I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."

    Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.

    You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.

  • by slazzy ( 864185 ) on Tuesday December 06, 2011 @12:51PM (#38281330) Homepage Journal
    Always start by making sure the backups are working properly.
  • by Okian Warrior ( 537106 ) on Tuesday December 06, 2011 @12:57PM (#38281464) Homepage Journal

    You're going to spend time rewriting things that currently work? That's a recipe for disaster.

    Unless you can predict when something will fail (as in - the database uses 16-bit indexing, so when we hit 65,536 orders the database will crash), it's much more effective to leave things alone.

    Wait until changes are needed, then straighten out only those pieces that you have to touch when implementing new functionality.

    Work to a benefit. Unless you can point to some aspect which will change in a measurable way (it's crashing frequently, it will crash *less* when I'm done, it will cost less in terms of server rental, &c), leave it alone.
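The 16-bit example is the kind of failure that can be predicted ahead of time. As a rough sketch, assuming an unsigned 16-bit key (largest value 65,535) and a steady order rate, the remaining headroom is easy to estimate; the order numbers below are invented.

```python
def days_until_exhausted(current_max_id: int, ids_per_day: float, bits: int = 16) -> float:
    """Estimate days left before an unsigned integer key of `bits` width runs out."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    largest_id = 2 ** bits - 1  # 65,535 for 16 bits
    remaining = max(largest_id - current_max_id, 0)
    return remaining / ids_per_day


# Hypothetical numbers: 50,000 orders so far, about 120 new orders a day.
print(round(days_until_exhausted(50_000, 120), 1))
```

A figure like this is exactly the measurable benefit that justifies touching something that currently works.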

  • What, where, why... (Score:5, Informative)

    by ScottyLad ( 44798 ) on Tuesday December 06, 2011 @01:01PM (#38281544)

    I've spent the best part of my career undertaking tasks like this (as an external consultant), with my average time on an assignment lasting somewhere between 18 months and 3 years.

    My aim on every project is to make myself obsolete - in that I try to get documentation up to a point where a suitably qualified individual could come in, read the documentation, and work the rest out for themselves.

    My primary objectives are to implement some form of inventory control to document the what / where / why...

    • What - What have you got (servers, software, services, contracts, operating systems, databases, users)
    • Where - Where is it - where are your servers, what machine is this software licence running on?
    • Why - What is the Business Justification for this service - what is the Business Impact if this database stopped running tomorrow?

    Once you've got to that stage, then you're ready to get in to the real technical details. Remember that you are pitching your documentation to your successor, or to some imaginary "suitably qualified individual", so documenting what a system does and why is a higher priority than commenting every line of code.
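One lightweight way to hold the what / where / why is a structured record per asset, which can later be exported to a spreadsheet or wiki. The field names and sample entries below are hypothetical, a sketch of the idea rather than a prescribed schema.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Asset:
    what: str            # server, software, service, contract, database...
    where: str           # physical or logical location
    why: str             # business justification
    impact_if_down: str  # business impact if it stopped running tomorrow


def to_csv(assets) -> str:
    """Render the inventory as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(Asset)],
        lineterminator="\n",
    )
    writer.writeheader()
    for asset in assets:
        writer.writerow(asdict(asset))
    return buffer.getvalue()


# Invented entries for illustration only.
inventory = [
    Asset("orders database", "db01, rack 2", "stores customer orders", "no sales can be taken"),
    Asset("reporting server", "vm-report", "weekly management reports", "reports go out late"),
]
print(to_csv(inventory))
```

Keeping the "why" and the impact next to each item means the inventory doubles as the starting point for prioritising, and a successor can read it without knowing any of the code.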

    It is possible to do with one person, depending on the size of the organisation, and it can be particularly rewarding to do on your own. In a small business you often find some of the users have a good understanding of some of the systems, or are keen to learn.

    You stated in your post that you've assumed the role of programmer and sole IT personnel - which means you need to learn to think like a manager as well as a techie (which is harder than most people imagine!). Once you learn to focus on the business priorities, you'll understand where to begin with the technical detail, and what level of documentation is required.

  • by rabenja ( 919226 ) on Tuesday December 06, 2011 @01:04PM (#38281580) Journal
    I was in much the same position 12 years ago at this company. I am now CIO with 7 people on my team with several business partners to help manage the infrastructure. My advice for what it is worth:
    • - Take time every day to assess and analyze the bigger picture before allowing yourself to get drawn into the details.
    • - Look at the entire system from a risk mitigation perspective. What areas are most likely to cause "meltdown". Spend the most effort there.
    • - What are incremental changes that can be made that improve the overall risk picture? Focus on the biggest bang for the buck.
    • - Defer anything that works well enough for the time being.
    • - Avoid big bang solutions unless they can be contained and tested well, with the capability of rolling back.
    • - Get help where necessary.
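
    One way to make the "biggest bang for the buck" idea concrete is a small scoring sketch like the one below. The 1 to 5 scales, the deferral threshold and the sample risks are assumptions invented for the example, not a standard method; the numbers are judgment calls you would fill in yourself.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int     # 1 (unlikely) to 5 (almost certain), a judgment call
    impact: int         # 1 (nuisance) to 5 (business stops)
    effort_days: float  # rough estimate of the work needed to fix it

    @property
    def exposure(self):
        """How bad the risk is, ignoring the cost of fixing it."""
        return self.likelihood * self.impact

    @property
    def bang_for_buck(self):
        """Exposure removed per day of effort."""
        return self.exposure / self.effort_days


def plan(risks, defer_below=6):
    """Split risks into (work_on_now, deferred).

    Anything with exposure under `defer_below` "works well enough for the
    time being"; the rest is ordered by bang for the buck, best first.
    """
    now = [r for r in risks if r.exposure >= defer_below]
    later = [r for r in risks if r.exposure < defer_below]
    now.sort(key=lambda r: r.bang_for_buck, reverse=True)
    return now, later


# Hypothetical sample data.
risks = [
    Risk("no tested backups", likelihood=4, impact=5, effort_days=2),
    Risk("single ageing database server", likelihood=3, impact=5, effort_days=10),
    Risk("untidy cabling", likelihood=1, impact=2, effort_days=3),
]
now, later = plan(risks)
```

    Here the backups come first because they remove the most exposure per day of work, and the cabling is deferred.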
  • Me too (Score:5, Interesting)

    by weave ( 48069 ) on Tuesday December 06, 2011 @01:10PM (#38281666) Journal

    I walked into a similar nightmare two years ago. Before I even took the job I assessed the situation and gave them a proposal for what needed to be done and a price estimate for the software and hardware. I told them I would not take the job unless they committed funds to support the function. I also warned them that there were numerous ticking time bombs and that I would defuse them as fast as possible, but that there was no magic fix, it would take some time, and they could still have a disaster.

    I then convinced them to hire me only part-time and to also hire a part-time desktop support person, for a few reasons: they don't want to pay me to do desktop support, and having two IT people at least gives you some continuity. Even if the desktop support guy doesn't know the high-end stuff, if I leave, the desktop person can still guide the new person and save them a lot of time, which is help I never got.

    My line of attack was:

    1. Back up data. Wasn't easy. They had old cart tape drive units that were problematic. I ended up getting cheap TB externals to at least make mirrored copies of things. But at least if there was a disaster, I'd have their data safe somewhere -- even if it took me weeks to reconstruct systems to use it.
    2. Secure data. Everything was wide open. All domain users WERE DOMAIN ADMINISTRATORS. Locking that down was a pain. An understanding of what would be impacted ahead of time would have taken months, so I didn't tell anyone what access they had, then started removing people from domain admins a few at a time and waited to hear what broke, then fixed access issues. Not user friendly, but getting that under control fast was necessary.
    3. Renovated room with servers in it (that were 5+ year old deskside servers) so as to accommodate a rack with proper A/C flow, electrical feed, and physical security.
    4. Had them throw ~$50k into a virtual infrastructure and SAN, then virtualized all their old deskside servers until I could migrate apps on them to fresh OS installs. Used vSphere's DRS product to back up the OS images and data to another system I had them buy for their other site (thankfully not too far away and connected by fiber).
    5. Identifying all in-house written programs and finding turnkey replacements for them, preferably cloud-based to reduce their dependency on in-house IT staff in future.
    6. Documenting everything as best I can as I go.
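
    For step 1, a mirrored copy is only useful if it actually matches the source. The sketch below is one possible check, not what I actually ran: it compares SHA-256 checksums between a source directory and its mirror. The function names are made up for the example, and it assumes both trees are reachable as ordinary paths.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mirror(source_dir, mirror_dir):
    """Return relative paths that are missing from, or differ in, the mirror."""
    source_dir, mirror_dir = Path(source_dir), Path(mirror_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_dir)
        copy = mirror_dir / relative
        if not copy.is_file() or file_digest(src) != file_digest(copy):
            problems.append(str(relative))
    return problems
```

    An empty list from verify_mirror means every source file has an identical copy; it says nothing about files that exist only in the mirror.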

    Getting back to original point, a one-person IT shop is suicide. Them having a two person part-time crew is better because if one leaves, at least the other can provide some sort of continuity -- and that happened already. The fairly young guy I hired for desktop support two years ago died last month :-(

  • by scamper_22 ( 1073470 ) on Tuesday December 06, 2011 @01:13PM (#38281734)

    The first step is to define your goals. What do you want out of this?

    1. a job
    2. learning new skills
    3. leadership
    4. a chance to grow in the company

    If you are the sole IT/programmer person, this is a company in dire need of management with a clue about IT. You could be that change and end up being a manager of IT for this company. You have to work your butt off, fixing things, dealing with budgets and hiring staff. Can you deal with upper management to accomplish everything? That's up to you to decide.

    What I won't recommend is killing yourself for a company that is unwilling to learn from its mistakes and do it right. In that case, just treat it as a good learning opportunity, but don't kill yourself. They won't always be able to hire a superhero to come in and keep things running. Or if they do, it will be a well-paid consultant and they will learn their lesson quickly how much it costs.

    There is a reason this company has such poor IT systems. You could end up being the IT guy in a long line of IT idiots.

  • by YojimboJango ( 978350 ) on Tuesday December 06, 2011 @01:23PM (#38281892)

    The number one best thing you can ever do in your situation is ask your bosses what they think the system should be doing.

    Step 1: All the squirrelly business logic and the rationale behind each system you have to maintain should have a plain text description. You have to know the 'Why' before the mess of band aids that is the 'How' will ever make sense. Have your boss (or his secretary, or whoever) document it and get it to you. Do NOT do this step yourself. Repeat: do NOT perform this step yourself.

    Step 2: Put out fires till someone not you finishes step 1. Start making backups of every last scrap of data you can get your grubby hands on.

    Step 3: Once step 1 is done compare it to the mess. Note where the realities that are in your boss's head diverge from what is actually happening. Your job is to now create a detailed functional spec that takes what your boss says, and expands on it with what is really happening. Try to include worst case scenarios and document them as intended features.

    Step 4: Have your boss and sales and marketing, and every other top level manager sign off on it. This will not happen. No two managers in your company will fully agree on what the current system is actually doing. Your goal is to figure out what sales and marketing are telling your users that your products do. Do not disregard this step or it will come back and bite you very hard.

    Step 5: Once every department actually agrees on what your job really is, you will be well equipped to start the long process of fixing things. Again make lots and lots of backups. Management will sign off on step 4, then you'll fix a gaping security hole, and some customer somewhere will throw a raging fit because sales promised that they'd be able to get admin access to your databases or something ridiculous.

    Step 6: Don't be an ass. When step 5 inevitably happens, explain the misstep in communication graciously, and roll back. If you pulled not being an ass off properly, you now have a great platform to explain to management why X was a bad idea, and present an idea to fix it.

    I'm a grizzled vet of your situation. If someone had told me what I just told you when I started out, there would have been a lot less headache and stress. Hang in there, it can be an intensely rewarding experience.

  • by nine-times ( 778537 ) <nine.times@gmail.com> on Tuesday December 06, 2011 @01:27PM (#38281932) Homepage

    I've been through similar situations a number of times. For the people who are telling you to get out of this job, I say: not necessarily. If you manage to fix these things, it can be a great learning experience and it can help you earn a name for yourself.

    So my advice is to start out bringing these problems to the attention of management. You don't need to be pushy, but be very clear that you have found these problems, that you think they're serious problems, and that the problems may endanger the success of the company. Give them a little leeway on how to direct you. They probably won't want to throw lots of money at the problem, but if they don't seem genuinely concerned and looking for solutions, then start looking for a new job.

    Second, get ready to learn about project management, because you're not fixing all of this at once. Make a list of what needs to be done. Prioritize that list. Estimate the time needed to do each task. If there's something extremely high priority that will run up against a specific deadline, then figure out what's necessary to meet that deadline. Start working on a budget.

    Start setting schedules for each thing that needs to be done, but recognize that the schedule will have to be flexible. In fact, don't bother scheduling things that are low priority until you've put out some fires. Keep them on your todo list, but consider making a separate "to do eventually, but I'm not going to bother thinking about it right now" list. When you have a schedule set, get to work. Keep track of your progress, and keep management informed of your progress. Keep them informed about problems and obstacles that you encounter along the way, especially if they'll cause an increase in your budget or a delay in your schedule.

    You'll want to gather some good project management tools along the way. At a bare minimum, these tools will include a calendar, a todo list, and a way to keep organized notes. Set aside time every week to review your notes, your calendar, your todo list.

    You can take project management classes, but most of what they teach you comes down to this: Make sure you understand what you're trying to accomplish, and that what you're doing is actually the best way to accomplish it. Keep your stakeholders informed, and listen to their feedback on your progress.

  • You document what's there. You've already started that. Next you document what's deficient. Then you put together a plan that, in stages, makes things better. Then you propose that plan to your management in terms that make sense to business people (happier customers, money saved, disaster avoided, etc...). Then you execute the plan.

  • by rickb928 ( 945187 ) on Tuesday December 06, 2011 @01:31PM (#38282006) Homepage Journal

    Either because your predecessor 'made it work' with little or no funding (better translated as 'he made it almost work'), or because your predecessor failed to acquire sufficient funding to do it 'right'.

    As a former field tech/consultant, try to avoid bringing in consultants to explain why the stuff needs to be bought. Many a manager ends up believing the consultants and disbelieving their staff. You get to either hire the consultant to justify your plan or find yourself undercut by that lack of confidence.

    And of course nail some problems and show improvements as early as you can. It's wise to both solve pressing impactful problems first, and gain trust.

    I always loved going into a client with lots of problems. Not just for the thrill of making things right, but knowing that if I did it right I had a referral for my next client - because the end result was most often working my way out of the gig. Either I passed the client on to another tech to maintain, or they got their staff's legs under them and could carry on. So long as there are more clients, this is good. Great fun to figure things out, isn't it?

  • by TomTraynor ( 82129 ) <thomas.traynor@gmail.com> on Tuesday December 06, 2011 @01:54PM (#38282336)

    You will probably be getting a large number of suggestions. I have done both support and development on mainframes and servers so here is some input:

    1. Let management know at a high level the state of the machine(s) and get permission to spend part of your time documenting the system. When you get permission ask them how often they need updates and how much detail. Keeping them in the loop seems to make them happy and feel important.

    2. Document the current state and highlight areas of concern. Put down what the concerns are, the risks and the potential costs to the company if it fails.

    3. Go through the document and organize it by risks. Try to figure out the size of the risk and how much work it will take to fix it and what is needed to fix the problem.

    4. Automate as much of your process as possible. Any task you have to do on a regular basis (in my humble opinion if you do it more than once then automate it) should be automated. Dedicate time to document what you did.

    5. Senior management probably does not want to see details. When you present, keep it simple and short. Point out the costs of failure and if you need software to help put that forward as an 'investment in infrastructure'.

    6. If the company has an internal auditor make friends with him/her. Getting them on your side to present to management will help. Having the auditor explain to them the financial costs will help your cause a lot.

    7. When you do things take the time to document what you are doing, WHY you are doing it, how you did it and where to go for the programs/scripts/data.

    8. Pick the brains as much as possible of all the people there. Offering to buy coffee and donuts seems to make them more receptive to an informal session and the amount of information they have could help you.

    Part of every project we do now is dedicated to documentation and the client now knows the importance of that documentation and is happy to pay for it. The current system is over 25 years old and a lot of business knowledge has been lost due to people retiring or leaving. When we find things we put them into a document. The hardest thing to find is the 'WHY', but, once you get that the rest of the information starts to make more sense. Our most popular section is the 'HOW TO DO' as this is the short cut for every other document in the system.

    When you do your documentation try to keep the documents as open as possible. Try to avoid proprietary packages as much as possible. We had an old flow chart program that we didn't have the program for and it took me a week to find an open source package that could read and export the files.

  • by Overzeetop ( 214511 ) on Tuesday December 06, 2011 @02:08PM (#38282504) Journal

    When I was hired to run the IT department of a major company my predecessor left three letters in the desk that was now mine. Each letter was clearly labeled; System Failure #1, System Failure #2, System Failure #3. A post-it note was attached to the bundle of letters.

            In case of a substantial system failure open the letters in order, once per failure, and they will help you through the problem.

    I put the letters back in the desk and forgot about them.

    About one year later we had a cascading server failure that left our corporate intranet and several important production servers off-line. While repairing the problem I remembered the letters. Curious, I opened the first letter.

            Blame me, your predecessor

    The day after we got the servers back up I was called in to my boss's office to explain what happened and why we were down for so long. Taking my cue from the letter I blamed my predecessor. My boss was satisfied with my answer and let me go.

    About six months down the road we had another big failure. This time our primary database server went down and the secondary was having trouble dealing with the load. I had to put a lot of extra hours into getting them back up and we lost a few transactions due to the backup server not being able to function under the load.

    Once again, I reached into that desk drawer and opened letter #2.

            Blame the equipment

    This time I lamented to the boss about how it wasn't my fault. It was that backup server! If we had some good equipment to run on these things just would not happen. He was satisfied with my answer and I went back to work.

    Things ran smoothly for the next 18 months. Then we got hit with a virus that somehow got past our firewall and wreaked havoc on our systems.

    I opened the third letter.

            Write three letters

    (Sorry, this was the first thing I thought of when I read the summary)

  • by thesandbender ( 911391 ) on Tuesday December 06, 2011 @02:19PM (#38282632)
    You already know that it's a tangled mess. You need to map that tangle thoroughly before you start fixing/replacing/retiring anything. The conversation you do not want to have with your superiors is why retiring system X (which costs $5,000/month) took down system Y (which makes $100,000/month). You need to map out both the business processes (which systems they touch) and the system dependencies (trust no one, log network data and look at the traffic between boxes). Do not start pulling strings until you know what they're connected to.
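
    To make "know what the strings are connected to" concrete, here is a minimal Python sketch. It assumes you have already boiled your logged traffic down to (client, server) pairs; how you capture that is up to you, and the system names below are invented for the example.

```python
from collections import defaultdict, deque


def build_dependents(connections):
    """Map each server to the set of clients seen connecting to it.

    `connections` is an iterable of (client, server) pairs taken from
    whatever network logging you have; the format is an assumption.
    """
    dependents = defaultdict(set)
    for client, server in connections:
        if client != server:
            dependents[server].add(client)
    return dependents


def affected_by_retiring(system, dependents):
    """Everything that directly or indirectly talks to `system`."""
    seen = set()
    queue = deque([system])
    while queue:
        current = queue.popleft()
        for client in dependents.get(current, ()):
            if client not in seen and client != system:
                seen.add(client)
                queue.append(client)
    return seen


# Hypothetical observations: (who connected, to what).
observed = [
    ("storefront", "orders-db"),
    ("reporting", "orders-db"),
    ("orders-db", "legacy-auth"),
    ("mail-relay", "legacy-auth"),
]
dependents = build_dependents(observed)
```

    Here affected_by_retiring("legacy-auth", dependents) includes the storefront, even though the storefront never talks to legacy-auth directly. Traffic logs only show dependencies that were exercised while you were watching, so a monthly batch job can still surprise you.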

    You're not going to do this by yourself... at the very least you're going to need someone who knows the business side thoroughly. I've walked into a situation like this before for a very, very large company and I swear it took years off of my life but I learned a whole hell of a lot from the experience. Best of luck.
  • by plurgid ( 943247 ) on Tuesday December 06, 2011 @03:01PM (#38283134)

    You wouldn't be in this situation if your employer gave a crap. It's plain and simple: you report to someone. They know the extent of the problem and that there is only one of you. If they cared, there would be more than one of you. But there isn't. So turnabout is fair play.

    This is the true American solution to your problem: find other people to exploit and skim off the top ...

    Step 1: tell them you're going to become a telecommuter so that you can work 100% of the time
    Step 2: get on elance or some other such site: hire gobs of cheap (dubious) overseas help at $1/hour
    Step 3: instruct them all to send emails from your address and answer the phone with your name.
    Step 4: find a different job and just let your sub-contractors handle that one until the house of cards falls apart

    If your current employer calls you out on the fact that you have 15 different accents and sometimes answer the phone in a female voice, ask them why they're so racist.

    bonus if you used a pseudonym when hiring for your present job.
