Ask Slashdot: Getting a Grip On an Inherited IT Mess?
First-time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, and backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"
Explains a lot (Score:5, Funny)
You work at RIM?
Re:Explains a lot (Score:5, Funny)
So you are asking him if he got a RIM job?
Re:Explains a lot (Score:5, Funny)
It doesn't seem to work anymore, but for a while they had "http://rim.jobs"...
Re:Explains a lot (Score:5, Funny)
http://steve.jobs/ does not seem to be operational either :)... I will probably get marked as troll by Apple fanboys... still funny :p
Re:Explains a lot (Score:4, Funny)
http://steve.jobs/ does not seem to be operational either :)... I will probably get marked as troll by Apple fanboys... still funny :p
Nah, couldn't find the +1 Troll.
They get such a bad rap... poor trolls.
Re:Explains a lot (Score:5, Funny)
Of course not, he said "thriving".
methodically and late into the night (Score:5, Insightful)
Re:methodically and late into the night (Score:5, Insightful)
Yes, in a larger company, you'd hire an Exchange pro, an AD pro, a networking pro, a programmer or two, and a couple of more generalist techs to manage backups, the server room, and the help desk. The unfortunate truth is that specialized individuals are rarely any good outside their specialty... which is unhelpful to a small business that can't afford a stable full of tech talent.
I know, as I've been this guy. It's brutal work but can be pretty satisfying. Every day your work is different. But you're never an expert at one particular thing and you're never paid like someone who specialized early on.
Re:methodically and late into the night (Score:5, Insightful)
My guess is he's not... I'm immediately concerned by "position of programmer and sole IT personnel" and "thriving e-commerce company" appearing together in the same sentence. The fact that this does not appear to be a small mom-and-pop shop with two or three servers making up the "e-presence" adds fuel to the fire. I'm getting the image of a fairly large company that relies heavily on its web and e-commerce presence, and has one guy to take care of that. What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?
There's no bullpen here: if anything, anything at all, breaks, there's only one guy to fix it, day or night. If two things break, you're already triaging. Surely a "thriving" company can afford a backup for what is pretty clearly a business-critical unit?
Re:methodically and late into the night (Score:5, Insightful)
First thing to do is find out why. Why was there only one IT person, and why did they quit? Look through the junk on all the systems - if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened.
Re:methodically and late into the night (Score:5, Insightful)
"if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened."
I have done this before. The response was...
"Find another job and run, run as fast as you can. Oh and trust no one."
Re:methodically and late into the night (Score:5, Interesting)
I ended up trying to reverse-engineer a huge mess, without really being given the time to do so. They kept me busy making stupid little changes to the graphics, when it really needed some serious underlying code work.
Then, out of the blue, they sprung a deadline on me of like 4 days, AND they wanted to release on a holiday. I said "No way. I would need at least another week to get this working properly." I did not get the week. PLUS they kept making changes up until literally the last hour, PLUS guess who got blamed when things -- inevitably -- did not work right?
I was glad to get the hell out of there. As -- it turned out later -- were the 3 developers before me.
Re:methodically and late into the night (Score:5, Insightful)
That's assuming the predecessor wasn't the problem. I have learned over the years that there are far too many tech types who prefer to be the only one who does a particular task and will make any excuse to management to make sure things stay that way. When these lone-wolf types happen not to be as competent as they pretend to be, they tend to dig themselves into too deep a hole, so they either get fired or quit in frustration, but when you talk to them it will always be someone else's fault.
I'm not saying management isn't at fault; they very well could be, but don't assume that right off. The first step is to try to get a read on how good the predecessor was at the job; otherwise he can get very misleading info.
Re:methodically and late into the night (Score:4, Insightful)
As opposed to management and their insistence that anyone is replaceable (except themselves) for pennies on the dollar. Sometimes the lone wolf thing is just HR-speak for "not a team player"; sometimes it's a precursor to replacing someone with someone else who costs less.
Re: (Score:3, Insightful)
First thing to do is find out why. Why was there only one IT person, and why did they quit?
If there was only one, I'm betting it was a real small shop run by somebody who thought they could pay a local computer geek $10/hr to run everything, and the guy left the second he found a job that paid what the work is worth.
Why would they need a backup? (Score:5, Funny)
They have a guy who finds upgrading phone systems immensely satisfying! If he's sick he'll come in and fix it and who needs vacation anyway, he'll take the cash instead.
I'm betting it's a psychotic break and he IS his predecessor.
Re: (Score:3, Insightful)
It's not a matter of whether he can keep up with it, it's a matter of backup. Did you ever take a vacation? Not a "going on a trip this weekend" thing, a real week long vacation? How many times did you log in remotely? What if you got really sick? What if something broke while you were really sick? I'm not talking about, "gosh, I don't feel too good today I'm staying home and can log in remotely if there's a problem" sick, I'm talking bedridden or hospitalized sick. How important was it that your ser
Re:methodically and late into the night (Score:5, Insightful)
I call this the 'Hit By A Bus' scenario. If you're hit by a bus in the next five minutes can the business carry on without you? If the answer is no for any reason then the business has major problems.
Re: (Score:3)
It's an ecommerce company. The only way they sell products is using computers, network, and software. It is beyond comprehension that they have a single person to do all of these tasks.
Re:methodically and late into the night (Score:4, Informative)
The pride and arrogance in this "I am god" trend here is sickening. I was part of a two-man crew at a company of 40 people and it sucked. I got called on my way to vacation, on vacation, and on the way back. I was called while ON the doctor's table. I left, and while I value the learning opportunities I was given, it is no way to run a company.
If a company does not value IT and the efforts put into maintaining it, then they are bleeding you and deceiving themselves.
This gig has writings on the wall all over (Score:5, Interesting)
My backup was my boss who was technically competent, so there was that, but it's not like I've never worked a job as a one man show. You buckle down and do what you have to and make it so things don't break just because you're not around (yes, this requires budget, but I've been fortunate enough that anyone willing to pay me what I'm worth is also willing to invest in a solid infrastructure).
This made you (and the situation you described) an outlier, one with a positive outcome. Your experiences cannot be applied in general. In general, this is rarely the case, for one-man-tech shops that is.
For the most part, conditions like those described by the original submitter typically have "GTFO ASAP!" written all over them. I've done IT in companies, small and large, and I can attest that what you say is true: yes, it is possible to be the one-guy-IT-slash-programmer shop at a small e-commerce company. But the question is why? I wouldn't do it (again) unless a good compensation package came with it (which is typically never the case), or if I were fresh out of school with nothing else on my plate (in which case, it is OK).
Good companies are never based on one-man-IT-slash-dev-shops, regardless of size (or at least they try not to.) I know, again, I've worked with companies big and small. Conditions like that are typically good proxies for more systemic problems, and at the end of the day (whenever possible), you want a paycheck, a rewarding job and good working conditions. Rarely you see that with one-man-IT-slash-dev-shop gigs, rarely if ever, regardless of the size of the company.
That's just my $0.02 input from what I've seen. YMMV so readers be warned and please take this anecdotal piece with a grain of salt.
Re:methodically and late into the night (Score:4, Interesting)
I don't think there is any doubt that under certain circumstances, one person could do a lot of work in what is, at its heart, a set of automated processes to begin with. The problem here is that having one person do anything is a horrible idea. Even a small company should do its best to have two IT people, or at least two people who know how to run the IT department, if the company relies on its IT resources for its business. People do get sick or get hit by buses, or even have heart attacks on the soccer field while playing with co-workers after work. More often, they simply find other jobs and leave you with two weeks' notice, and that's not enough time for a proper transition, especially for the guy who runs "everything".
As an IT person, I appreciate your level of workmanship in keeping things together yourself, but your boss should have been fired for allowing you to run things alone. It's not a matter of having the skillset to pull it off yourself; it's a matter of continuity of operations and work-life balance. Maintaining staffing levels is not your responsibility, but I hope you didn't believe it was a good idea for you to be by yourself either. Your company got lucky that you were competent and didn't leave prematurely, but you aren't supposed to run successful companies on luck.
Re: (Score:3, Interesting)
After reading your post ... I call bullshit.
A quick check of the S&P 500 shows that you'd have to be in several places at once to work as the only guy at any of those companies, and that 160 'servers' would be far lower than ... well ANY of them actually have, probably by an order of magnitude at least.
Re: (Score:3)
Thank you. That sentence bothered me as well.
The only place I can see that possibly making any sense is if you were a genius level programmer, at the head of your own company, and had a net worth of more than a million. Since he mentioned at least one other person (predecessor) and talks about a position...I'm thinking not.
And dual-timing IT / programming for commerce is NOT a good idea, for any site with any traffic. You split those roles if you have that much traffic.
So, I'm guessing it's a LAMP shop, doi
Re:methodically and late into the night (Score:5, Insightful)
Oh, that's easy:
In fact, I'd lay odds that's how the vacancy occurred.
Re:methodically and late into the night (Score:5, Informative)
I'm likely commenting too deeply for the person that asked the original question, but my advice seems to fit best here. What the company needs is an IT manager, whether hired directly or outsourced.
Firstly, assess the corporate attitude towards hiring (competent) staff directly and buying or leasing hardware directly vs. purchasing outsourced services. Once you know where that conversation leads, you'll have a better idea of how to address the larger problems that only a bunch of time (and usually money) can solve.
If the former, start the interviewing process ASAP. What you're looking for is self-starters that really do know their stuff. Take a handful of real-world scenarios, change some of the minor details a bit, and ask candidates what they'd do in that situation (or if they've encountered something like it before). Don't take them at their word, ask them to back it up with details of their own. Also, since you're going to wind up spending money on staff, you're probably going to be spending money on tools like new systems, software, and basic architecture hardware. Use an appropriate procurement process (and make sure it's followed) to meet your specific needs.
If the latter, like I and many others here suspect it is, be sure to negotiate favorable contract terms with this in mind - everything is about money. You might be able to get a better rate on some services if you limit support to 8x5 instead of going 24x7, for instance. Is remote support acceptable or do you want someone on-site when you have to make that call? What is the response time to various levels of service calls? Do you want to host hardware on-site or have that done elsewhere? Things like that should be priced out and assessed against the needs of the business.
Lastly, and this matters regardless of how the company wants to do it: the goal is to streamline operations, including any support that's required when systems are not operating properly. Identify the weak subsystems and put them on a roadmap to be replaced with something more robust. It's a boring exercise in IT management that involves budgets and change control procedures, but it does pay off in the long run. If you need to get approval for spending, it helps to show what the current cost is, what the cost could be if things go wrong, and what the costs would be if replaced with the more robust system. As long as you speak to your management in terms of money, they should listen.
1 suggestion (Score:5, Funny)
start drinking
Re:1 suggestion (Score:5, Informative)
Heavily
Configuration management (Score:5, Informative)
Automate your servers so you can focus your time elsewhere. I use Cfengine.
http://watson-wilson.ca/2011/03/enterprise-system-administration-using-configuration-management.html
Re:Configuration management (Score:5, Informative)
Yes, automate everything, monitor everything, backup everything, document everything.
I used to use cfengine but find Puppet an easier tool to work with. Nagios and BackupPC are also wonderful tools, but you might want to choose alternatives if they better fit your needs.
You might want to express some concerns to management now, so that if something critical does fall over, you don't look quite so bad.
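To make "monitor everything" concrete before a real tool like Nagios is in place, here is a minimal sketch of a TCP reachability check in Python. It is an illustration, not a Nagios plugin: the function names and the host/port list are hypothetical, and a successful connection only tells you something is listening, not that it is healthy.

```python
import socket
from typing import Dict, Iterable, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, unreachable networks.
        return False


def run_checks(targets: Iterable[Tuple[str, int]],
               timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Check each (host, port) pair and return a map of pair -> reachable."""
    return {(host, port): tcp_port_open(host, port, timeout) for host, port in targets}


if __name__ == "__main__":
    # Hypothetical targets: replace with your own servers and service ports.
    targets = [("127.0.0.1", 22), ("127.0.0.1", 80), ("127.0.0.1", 3306)]
    for (host, port), ok in run_checks(targets).items():
        print(f"{'OK  ' if ok else 'DOWN'} {host}:{port}")
```

Run from cron, this gives a crude early warning; a proper monitoring system adds alert routing, history, and service-level checks on top of the same idea.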
Observium, ESXi, and Hobbit (Score:5, Informative)
The combo of Observium (network monitoring), Hobbit (monitor everything with extreme ease), and either ESXi or Proxmox VE for consolidation and ease of management/isolation/testing/etc. has served me well for years in taking control of large organizations quickly. For the last two businesses I was hired to fix, I set this up, built a parallel enterprise as VMs (the right way this time), and then cut everyone over in a weekend. No one noticed the change except to say stuff didn't crash anymore and it was really fast.
Also OpenFiler and NexentaStor make for a great SAN.
If you need more: pfSense for firewall or VLAN router, Blue Iris for IP cameras, PBX in a Flash for VoIP, SOGo for Outlook-compatible email, LibreOffice, etc.
Re: (Score:3)
Inherited my mess about a year ago. I've done much to clean it up and monitor it.
I may have to investigate Cfengine soon, but for now, because all of our servers are CentOS and I am comfortable creating my own RPMs, I simply use yum with rpm. It works very nicely. When I make changes (I use git to track/branch/etc.), I just rsync the repository to our production server once I am happy that everything is correct. Building, git, etc. are all automated from within vim with some simple scripts that I
Re:Configuration management (Score:4, Informative)
They all have root access still :-( A political fight I'm not yet prepared to have. I was able to take it away on the web servers, at least, and that's the only thing our developers touch, so life is a bit better.
A fine baby step is to move everybody over to sudo. If you can get buy-in that everybody will track changes with git, then you have somebody to blame and can build a case if they break it. With sudo you have a record of who was mucking (in your /var/log/secure).
If they're perfectly reasonable/responsible and you can track changes, it's not such a problem, really, unless you're worried that they're secret agents meaning to break your stuff. I typically only see frustrating carelessness where people can get away with it.
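Once everyone is on sudo, the log can be summarized per user. The sketch below assumes the common sudo syslog line shape (`sudo:  alice : TTY=... ; PWD=... ; USER=root ; COMMAND=...`); the exact format and file vary by distribution and sudo configuration (/var/log/secure on RHEL/CentOS, typically /var/log/auth.log on Debian/Ubuntu), so treat the regex and the function names as assumptions to check against your own logs.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed format: "... sudo:  <invoking user> : [note ;] TTY=.. ; PWD=.. ; USER=.. ; COMMAND=.."
SUDO_RE = re.compile(
    r"\bsudo(?:\[\d+\])?:\s+(?P<user>[A-Za-z0-9_.-]+)\s*:\s*(?P<details>.*)$"
)


def parse_sudo_line(line: str) -> Optional[Dict[str, str]]:
    """Parse one sudo command entry into a dict, or return None if it isn't one."""
    m = SUDO_RE.search(line)
    if not m:
        return None  # e.g. pam_unix session lines, or non-sudo entries
    entry = {"invoker": m.group("user")}
    # COMMAND is the last field and may itself contain ';', so split it off first.
    head, sep, command = m.group("details").partition("COMMAND=")
    if sep:
        entry["COMMAND"] = command.strip()
    for part in head.split(";"):
        part = part.strip()
        if not part:
            continue
        key, eq, value = part.partition("=")
        if eq:
            entry[key] = value
        else:
            entry["note"] = part  # e.g. "3 incorrect password attempts"
    return entry


def commands_by_user(lines: Iterable[str]) -> Counter:
    """Count commands per invoking user, skipping entries that carry an error note."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_sudo_line(line)
        if entry and "COMMAND" in entry and "note" not in entry:
            counts[entry["invoker"]] += 1
    return counts
```

To use it, open the log (reading it usually requires root) with `open("/var/log/secure", errors="replace")` and pass the file object to `commands_by_user`; calling `.most_common()` on the result lists the busiest sudo users first.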
Re:Configuration management (Score:4, Insightful)
Do the fight; at least if there is a paper trail, your ass is covered. If your company has auditors, buy them a coffee and see if they can help you explain to senior management why root access for everyone is a bad thing. I needed it years ago as the support person, but when I moved back to development they kept my access. They gave me very strange looks when I asked for it to be revoked, but when they got audited they didn't get nailed by the auditor for giving developers full access to the prod machines.
As a compromise, see if you can get a 'SYSTEST' area defined where an image of prod data is stored and new code that is to be promoted can be staged. That way developers can put up their code and prove it works with prod data, and if it gets signed off by management you can 'promote' the code to the prod servers.
Re: (Score:3)
And document every step of the way. What you did, how you did it, where you did it and WHY! I did support for three years on a legacy mainframe app and a lot was never documented, especially the WHY. Half the time I put into fixing the outage was documentation.
"A little over a month ago I assumed the position" (Score:5, Funny)
Dude, that is too easy. There are serious wiseacres on this board.
related story (Score:3, Funny)
Did the last guy outsource everything to India?
Escalate (Score:5, Insightful)
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
Re:Escalate (Score:5, Insightful)
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
It's sort of unrelated, but my brother was doing some independent audio work for a VIP wedding in Italy when he realized the electrical hardware and connections were a mess (meaning they were actually dangerous to use). He first talked with the event's management and let them know about the situation. They ignored him. He quit the job and was highly criticized for it.
As he was disconnecting all of his hardware with his team, a short circuit caused a fire, which fortunately was controlled easily.
The event's management immediately contacted him to offer him a formal apology and pay for the damages to his hardware. They also offered to hire him back, double the salary. The last part was kind of luck, but had the fire not been controlled as easily as it was, my brother would have shared the responsibility.
Long story short: sometimes you have to know when to step down.
Re:Escalate (Score:5, Insightful)
I'd like to add that v1 above made a great comment too, namely that in addition to the crucial steps you mentioned, it's also important to keep management informed of your task list AND your progress on it, since often your fixes will be behind-the-scenes things that they may not notice. Make sure they know you're working hard to make their system better and more robust, do NOT assume that anyone else notices what you're doing.
Get management buy in... (Score:5, Insightful)
You need to document it and get management to approve spending money.
I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and why the guy there beforehand has left.
99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.
Re:Get management buy in... (Score:5, Informative)
Exactly correct.
Step 1.
Document. Look at your critical systems. Document what they are. Start at a high level - line of business, internal (HR, etc). Drill down - I have an Oracle server, I have a Citrix system to allow the users to remote connect, which uses a VPN, etc.
Cost: your time.
Step 2.
Prioritize. What are the most important systems? Start with the systems which, if they go down, will cause the company to lose money. Then the ones which support internal processes. Rank order.
Cost: your time. Possibly management's time - they may have input into priorities.
Step 3.
Audit. Start at the top and find out just what state they're in. If you don't feel sufficiently comfortable with a particular technology to do this yourself, hire an SME for a few hours.
Cost: potentially the consulting SME to evaluate various systems. Note - the initial contract is an audit, not a "find everything and fix".
Step 4.
Fix. If you have audit notes which say "this critical line of business system is on the verge of death and once it dies it can't be resurrected", that goes first. If you have audit notes which say "this is a system which provides some reporting capabilities and it's a bit shaky, but worst case is you have to reboot the server and the reports to management go out a bit late", not so bad.
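The rank ordering from steps 2 to 4 can be kept as data rather than in your head. A minimal sketch, assuming you score each system yourself after the audit: the `System` fields, the tier numbers, and the 0-10 risk scale are hypothetical choices, not any standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class System:
    name: str
    tier: int          # hypothetical: 1 = loses money when down, 2 = internal process, 3 = other
    audit_risk: int    # hypothetical 0-10 score from the audit: how close it is to failing
    recoverable: bool  # False if "once it dies it can't be resurrected"


def fix_order(systems: List[System]) -> List[str]:
    """Order systems for remediation.

    Sort by tier first, then unrecoverable before recoverable (False sorts
    before True), then by audit risk, highest first.
    """
    ranked = sorted(systems, key=lambda s: (s.tier, s.recoverable, -s.audit_risk))
    return [s.name for s in ranked]
```

Keeping the ranking in one place also makes it easy to show management exactly why one system is being fixed before another.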
If you get to step 3 and management won't pay, then you have a problem.
If you get to step 2 and management won't give up their time, then you have a very big problem.
A big question will be the level of support from management. If they are not supportive, or if money is tight and they say "we'd like to pay for the consultants but", then that's why you've rank ordered.
If they're cooperative but don't have the money, work with them to figure out some kind of timeline based on highest risk.
If they're stubborn, urgh, bad spot. Do your best to determine level of risk. Work with the company accountant to figure out the cost to the company if a critical line of business system goes down for 10 minutes. 2 hours. Include some waffle about reputation, if you can. Include any penalties or SLA violations, if you have those.
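The accountant conversation boils down to simple arithmetic. A rough sketch, assuming lost revenue scales linearly with outage length and no orders are recovered later; every input (revenue per hour, idle staff cost, SLA penalty) is a placeholder to replace with real figures, and reputation damage is deliberately left out because it does not reduce to a formula.

```python
from typing import Dict, Iterable


def outage_cost(revenue_per_hour: float, minutes: float,
                idle_staff_cost_per_hour: float = 0.0,
                sla_penalty: float = 0.0) -> float:
    """Estimate the direct cost of an outage lasting `minutes`.

    Assumes revenue is lost linearly over time and that idle staff cost
    accrues at a flat hourly rate; the SLA penalty is a one-off amount.
    """
    hourly_loss = revenue_per_hour + idle_staff_cost_per_hour
    return hourly_loss * minutes / 60.0 + sla_penalty


def cost_table(revenue_per_hour: float, durations_minutes: Iterable[float],
               **kwargs: float) -> Dict[float, float]:
    """Map each outage duration (in minutes) to its estimated cost."""
    return {m: outage_cost(revenue_per_hour, m, **kwargs) for m in durations_minutes}


if __name__ == "__main__":
    # Placeholder figures: get the real ones from the company accountant.
    for minutes, cost in cost_table(1200.0, [10, 120]).items():
        print(f"{minutes:>4} min outage: ${cost:,.2f}")
```

For example, at a hypothetical $1,200/hour in online sales, a 10-minute outage costs about $200 and a 2-hour outage about $2,400, before any penalties.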
this is a majorly funny story (Score:5, Insightful)
Facts:
1. The job has lasted for 1 month so far.
2. The e-commerce company is 'thriving', apparently.
3. All of the systems have been "reverse engineered" in that 1 month.
4. All of the documents are written in that 1 month.
5. In 1 month there have been network and phone upgrades and database maintenance with Perl, and it has all been 'immensely rewarding'.
6. The entire infrastructure is 'a few problems away from a total meltdown'.
7. Single person IT operation to do everything.
Question: is this for real? What's the size of the company and what's the budget?
Re:this is a majorly funny story (Score:5, Insightful)
Question: is this for real?
It's an "Ask Slashdot". They're as real as "Letters to Penthouse": both are carefully crafted to create a fantasy situation to excite readers. Read them if the subject is something you're interested in, but don't waste your time giving advice.
Re: (Score:3)
My gut level guess is my house's IT infrastructure is more elaborate / complicated. Admittedly very little of my gross income depends on my home infrastructure.
My guess is he's a noob to IT. "A few problems away from a total meltdown" describes every IT infrastructure I've seen in the past 20 years, including Fortune 500 corporations. Nothing new there.
I'm serious about the house analogy. Just treat it like an extremely advanced home LAN, except you have more time and outages are much more costly.
I keep
Re:this is a majorly funny story (Score:5, Insightful)
I don't see where he said that all systems have been reverse engineered and documented in one month; only that he is currently reverse engineering systems and documenting.
And, maybe this guy likes what he is doing, getting his hands dirty with network and phone stuff. And some people really like writing Perl (I don't; I think it's the devil's language). If he finds his work rewarding, who are you to mock him?
It's the Eye of the Tiger! (Score:5, Funny)
Just buy a few cases of your energy drink of choice and put Eye of the Tiger on repeat until you've got it all fixed.
I believe in you.
Wait a minute .... (Score:5, Insightful)
"I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."
Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.
You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.
Backups (Score:3)
Re: (Score:3)
Don't forget to verify that the restore process using the backups works too.
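One way to check a file-level restore, sketched in Python under the assumption that you restore into a scratch directory and compare it against the source (ideally a snapshot taken at backup time, since live files keep changing): hash every file on both sides and diff the results. The function names are hypothetical; a database restore needs a different check, such as loading the dump into a test instance and querying it.

```python
import hashlib
import os
from typing import Dict, List


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large files don't have to fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def verify_restore(source: str, restored: str) -> List[str]:
    """Return a list of problems; an empty list means the restore matches the source."""
    src, dst = tree_digests(source), tree_digests(restored)
    problems: List[str] = []
    for rel in sorted(src.keys() | dst.keys()):
        if rel not in dst:
            problems.append(f"missing from restore: {rel}")
        elif rel not in src:
            problems.append(f"extra in restore: {rel}")
        elif src[rel] != dst[rel]:
            problems.append(f"content differs: {rel}")
    return problems
```

Scheduling this after each test restore turns "we have backups" into "we have backups that we know come back intact".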
If it works, don't fix it. (Score:3)
You're going to spend time rewriting things that currently work? That's a recipe for disaster.
Unless you can predict when something will fail (as in - the database uses 16-bit indexing, so when we hit 65,536 orders the database will crash), it's much more effective to leave things alone.
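The 16-bit example is easy to turn into an actual forecast. A minimal sketch, assuming IDs are sequential integers with no gaps and orders arrive at a steady rate; `bits` and `signed` describe the column type (a 16-bit unsigned column tops out at 65,535), and the function names are hypothetical.

```python
import math


def ids_remaining(current_max_id: int, bits: int = 16, signed: bool = False) -> int:
    """How many more sequential IDs fit in an integer column of the given width."""
    max_id = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return max(max_id - current_max_id, 0)


def days_until_exhausted(current_max_id: int, orders_per_day: float,
                         bits: int = 16, signed: bool = False) -> float:
    """Days until the column overflows at a steady order rate (inf if no growth)."""
    if orders_per_day <= 0:
        return math.inf
    return ids_remaining(current_max_id, bits, signed) / orders_per_day


if __name__ == "__main__":
    # Hypothetical numbers: current highest order ID and average orders per day.
    print(f"16-bit: {days_until_exhausted(60000, 100):.1f} days left")
    print(f"32-bit: {days_until_exhausted(60000, 100, bits=32):.1f} days left")
```

A number like "about 55 days" is exactly the kind of measurable benefit that justifies touching working code.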
Wait until changes are needed, then straighten out only those pieces that you have to touch when implementing new functionality.
Work to a benefit. Unless you can point to some aspect which will change in a measurable way (it's crashing frequently, it will crash *less* when I'm done, it will cost less in terms of server rental, &c), leave it alone.
What, where, why... (Score:5, Informative)
I've spent the best part of my career undertaking tasks like this (as an external consultant), with my average time on an assignment lasting somewhere between 18 months and 3 years.
My aim on every project is to make myself obsolete - in that I try to get documentation up to a point where a suitably qualified individual could come in, read the documentation, and work the rest out for themselves.
My primary objectives are to implement some form of inventory control to document the what / where / why...
Once you've got to that stage, then you're ready to get in to the real technical details. Remember that you are pitching your documentation to your successor, or to some imaginary "suitably qualified individual", so documenting what a system does and why is a higher priority than commenting every line of code.
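The what/where/why inventory can start life as plain data that renders to a text file a successor can read. A sketch with hypothetical field names; it also flags dependencies that are mentioned but not yet documented, which is a cheap way to find gaps in the inventory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InventoryItem:
    what: str   # the system, e.g. "Order DB"
    where: str  # host, rack, hosting provider, or account
    why: str    # the business function it supports
    depends_on: List[str] = field(default_factory=list)  # other items' `what`


def render(items: List[InventoryItem]) -> str:
    """Render the inventory as plain text, one block per system, sorted by name."""
    lines: List[str] = []
    for item in sorted(items, key=lambda i: i.what.lower()):
        lines.append(item.what)
        lines.append(f"  where: {item.where}")
        lines.append(f"  why:   {item.why}")
        if item.depends_on:
            lines.append(f"  needs: {', '.join(item.depends_on)}")
        lines.append("")
    return "\n".join(lines)


def undocumented_dependencies(items: List[InventoryItem]) -> List[str]:
    """Dependencies that are referenced but have no inventory entry of their own."""
    known = {i.what for i in items}
    return sorted({d for i in items for d in i.depends_on if d not in known})
```

Writing the "why" field first, before any technical detail, keeps the focus on what the business would lose if the system disappeared.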
It is possible to do with one person, depending on the size of the organisation, and it can be particularly rewarding to do on your own: in a small business you often find that some of the users have a good understanding of some of the systems, or are keen to learn.
You stated in your post that you've assumed the role of programmer and sole IT personnel - which means you need to learn to think like a manager as well as a techie (which is harder than most people imagine!). Once you learn to focus on the business priorities, you'll understand where to begin with the technical detail, and what level of documentation is required.
Start over... slowly (Score:5, Insightful)
Me too (Score:5, Interesting)
I walked into a similar nightmare two years ago. Before I even took the job I assessed the situation and gave them a proposal for what needed to be done and a price estimate for the software and hardware. I told them I would not take the job unless they committed funds to support the function. I also warned them that there were numerous ticking time bombs and I'll defuse them as fast as possible but there was no magic fix and it would take some time and they could have a disaster still
I then convinced them to only hire me part-time and to also hire a part-time desktop support person for a few reasons including they don't want to pay me to do that and having two IT people at least gives you some continuity. Even if the desktop support guy doesn't know the high-end stuff, if I leave the desktop person can still guide the new person and save them a lot of time I never got.
My line of attack was:
Getting back to original point, a one-person IT shop is suicide. Them having a two person part-time crew is better because if one leaves, at least the other can provide some sort of continuity -- and that happened already. The fairly young guy I hired for desktop support two years ago died last month :-(
Define your goals. (Score:3)
The first step is to define your goals. What do you want out of this?
1. a job
2. learning new skills
3. leadership
4. a chance to grow in the company
If you are the sole IT/programmer person, this is a company in dire need of management with a clue about IT. You could be that change and end up being a manager of IT for this company. You'll have to work your butt off, fixing things, dealing with budgets and hiring staff. Can you deal with upper management to accomplish everything? That's up to you to decide.
What I won't recommend is killing yourself for a company that is unwilling to learn from its mistakes and do it right. In that case, just treat it as a good learning opportunity, but don't kill yourself. They won't always be able to hire a superhero to come in and keep things running. Or if they do, it will be a well-paid consultant and they will learn their lesson quickly how much it costs.
There is a reason this company has such poor IT systems. You could end up being the IT guy in a long line of IT idiots.
In this situation currently. (Score:5, Interesting)
The number one best thing you can ever do in your situation is ask your bosses what they think the system should be doing.
Step 1: All the squirrelly business logic and the rationale behind each system you have to maintain should have a plain text description. You have to know the 'Why' before the mess of band aids that is the 'How' will ever make sense. Have your boss (or his secretary, or whoever) document it and get it to you. Do NOT do this step yourself. Repeat do NOT perform this step.
Step 2: Put out fires till someone not you finishes step 1. Start making backups of every last scrap of data you can get your grubby hands on.
Step 3: Once step 1 is done, compare it to the mess. Note where the reality in your boss's head diverges from what is actually happening. Your job is now to create a detailed functional spec that takes what your boss says and expands on it with what is really happening. Try to include worst-case scenarios and document them as intended features.
Step 4: Have your boss and sales and marketing, and every other top level manager sign off on it. This will not happen. No two managers in your company will fully agree on what the current system is actually doing. Your goal is to figure out what sales and marketing are telling your users that your products do. Do not disregard this step or it will come back and bite you very hard.
Step 5: Once every department actually agrees on what your job really is, you will be well equipped to start the long process of fixing things. Again make lots and lots of backups. Management will sign off on step 4, then you'll fix a gaping security hole, and some customer somewhere will throw a raging fit because sales promised that they'd be able to get admin access to your databases or something ridiculous.
Step 6: Don't be an ass. When step 5 inevitably happens, graciously explain the misstep in communication and roll back. If you pulled off not being an ass properly, you now have a great platform to explain to management why X was a bad idea and to present an idea to fix it.
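Step 2's "make backups of every last scrap of data" can start very small. The sketch below is one possible way to do it in Python, under stated assumptions: `snapshot` and `sha256_of` are hypothetical helper names, the source is a plain directory you can read, and the destination should be a separate disk or share. It writes a timestamped .tar.gz plus a SHA-256 manifest, so you can later tell whether a copy was altered or is incomplete. It is not a substitute for proper dumps of live databases, whose files may be mid-write while being archived.

```python
import hashlib
import tarfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in 64 KiB chunks so large files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(source: Path, dest_dir: Path) -> Path:
    """Archive `source` into a timestamped .tar.gz under `dest_dir`, plus a
    matching .sha256 manifest with one checksum line per file."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)  # adds the directory recursively
    manifest = dest_dir / f"{source.name}-{stamp}.sha256"
    lines = [
        f"{sha256_of(p)}  {p.relative_to(source).as_posix()}"
        for p in sorted(source.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n")
    return archive
```

A call such as `snapshot(Path("/var/www/shop"), Path("/mnt/backup-disk/shop"))` (both paths hypothetical) could be scheduled nightly; the manifest uses the same "digest, two spaces, filename" layout that `sha256sum` prints.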
I'm a grizzled vet to your situation. If someone would've told me what I just told you when I started out, there would have been a lot less headache and stress. Hang in there, it can be an intensely rewarding experience.
Been there... (Score:3)
I've been through similar situations a number of times. For the people who are telling you to get out of this job, I say: not necessarily. If you manage to fix these things, it can be a great learning experience and it can help you earn a name for yourself.
So my advice is to start out bringing these problems to the attention of management. You don't need to be pushy, but be very clear that you have found these problems, that you think they're serious problems, and that the problems may endanger the success of the company. Give them a little leeway on how to direct you. They probably won't want to throw lots of money at the problem, but if they don't seem genuinely concerned and looking for solutions, then start looking for a new job.
Second, get ready to learn about project management, because you're not fixing all of this at once. Make a list of what needs to be done. Prioritize that list. Estimate the time needed to do each task. If there's something extremely high priority that will run up against a specific deadline, then figure out what's necessary to meet that deadline. Start working on a budget.
Start setting schedules for each thing that needs to be done, but recognize that the schedule will have to be flexible. In fact, don't bother scheduling things that are low priority until you've put out some fires. Keep them on your todo list, but consider making a separate "to do eventually, but I'm not going to bother thinking about it right now" list. When you have a schedule set, get to work. Keep track of your progress, and keep management informed of your progress. Keep them informed about problems and obstacles that you encounter along the way, especially if they'll cause an increase in your budget or a delay in your schedule.
You'll want to gather some good project management tools along the way. At a bare minimum, these tools will include a calendar, a todo list, and a way to keep organized notes. Set aside time every week to review your notes, your calendar, your todo list.
You can take project management classes, but most of what they teach you comes down to this: Make sure you understand what you're trying to accomplish, and that what you're doing is actually the best way to accomplish it. Keep your stakeholders informed, and listen to their feedback on your progress.
Document, plan, propose, execute (Score:3)
You document what's there. You've already started that. Next you document what's deficient. Then you put together a plan that, in stages, makes things better. Then you propose that plan to your management in terms that make sense to business people (happier customers, money saved, disaster avoided, etc...). Then you execute the plan.
Your biggest problem will be funding (Score:4, Insightful)
Either because your predecessor 'made it work' with little or no funding (better translated as 'he made it almost work'), or because your predecessor failed to acquire sufficient funding to do it 'right'.
As a former field tech/consultant, my advice is to avoid bringing in consultants to explain why the stuff needs to be bought. Many a manager ends up believing the consultants and disbelieving their staff. You then either have to hire the consultant to justify your plan, or find yourself undercut by that lack of confidence.
And of course nail some problems and show improvements as early as you can. It's wise to both solve pressing impactful problems first, and gain trust.
I always loved going into a client with lots of problems. Not just for the thrill of making things right, but knowing that if I did it right I had a referral for my next client, because the end result was most often working my way out of the gig. Either I passed the client on to another tech to maintain, or they got their staff's legs under them and could carry on. So long as there are more clients, this is good. Great fun to figure things out, isn't it?
Document and chat with management. (Score:3)
You will probably be getting a large number of suggestions. I have done both support and development on mainframes and servers so here is some input:
1. Let management know at a high level the state of the machine(s) and get permission to spend part of your time documenting the system. When you get permission, ask them how often they need updates and how much detail. Keeping them in the loop seems to make them happy and feel important.
2. Document the current state and highlight areas of concern. Put down what the concerns are, the risks and the potential costs to the company if it fails.
3. Go through the document and organize it by risks. Try to figure out the size of the risk and how much work it will take to fix it and what is needed to fix the problem.
4. Automate as much of your process as possible. Any task you have to do on a regular basis (in my humble opinion if you do it more than once then automate it) should be automated. Dedicate time to document what you did.
5. Senior management probably doesn't want to see details. When you present, keep it simple and short. Point out the costs of failure, and if you need software, present it as an 'investment in infrastructure'.
6. If the company has an internal auditor make friends with him/her. Getting them on your side to present to management will help. Having the auditor explain to them the financial costs will help your cause a lot.
7. When you do things take the time to document what you are doing, WHY you are doing it, how you did it and where to go for the programs/scripts/data.
8. Pick the brains of all the people there as much as possible. Offering to buy coffee and donuts seems to make them more receptive to an informal session, and the amount of information they have could help you.
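Item 3's "organize it by risks" can be made concrete with a simple risk register. This is a rough sketch, not a formal risk methodology: the `Risk` class, its fields, and the scoring rule (likelihood times cost of failure, divided by days of effort to fix) are assumptions chosen for illustration, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: float     # guessed chance it fails in the next year, 0.0 to 1.0
    cost_if_fails: float  # guessed cost to the business if it does fail
    effort_days: float    # guessed days of work to fix it

    def __post_init__(self) -> None:
        if self.effort_days <= 0:
            raise ValueError(f"{self.name}: effort_days must be positive")

    @property
    def exposure(self) -> float:
        """Expected loss: likelihood times cost."""
        return self.likelihood * self.cost_if_fails

    @property
    def priority(self) -> float:
        """Expected loss removed per day of work; higher means fix sooner."""
        return self.exposure / self.effort_days


def rank(risks: list[Risk]) -> list[Risk]:
    """Highest-priority risks first."""
    return sorted(risks, key=lambda r: r.priority, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, purely illustrative.
    register = [
        Risk("single web server", 0.2, 100_000, 10),
        Risk("untested backups", 0.3, 50_000, 2),
        Risk("expired SSL certificate", 0.5, 5_000, 0.5),
    ]
    for r in rank(register):
        print(f"{r.name:25} exposure ${r.exposure:>9,.0f}  priority {r.priority:>7,.0f}/day")
```

Sorting by exposure removed per day of effort puts cheap fixes for expensive failures first, which also tends to surface early wins. The inputs are guesses; the value is in writing them down where management can see and question them.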
Part of every project we do now is dedicated to documentation and the client now knows the importance of that documentation and is happy to pay for it. The current system is over 25 years old and a lot of business knowledge has been lost due to people retiring or leaving. When we find things we put them into a document. The hardest thing to find is the 'WHY', but, once you get that the rest of the information starts to make more sense. Our most popular section is the 'HOW TO DO' as this is the short cut for every other document in the system.
When you do your documentation, try to keep the documents in open formats. Avoid proprietary packages as much as possible. We had files from an old flowchart program that we no longer had, and it took me a week to find an open source package that could read and export them.
Three Letters (Score:4, Funny)
When I was hired to run the IT department of a major company, my predecessor left three letters in the desk that was now mine. Each letter was clearly labeled: System Failure #1, System Failure #2, System Failure #3. A post-it note was attached to the bundle of letters.
In case of a substantial system failure, open the letters in order, once per failure, and they will help you through the problem.
I put the letters back in the desk and forgot about them.
About one year later we had a cascading server failure that left our corporate intranet and several important production servers off-line. While repairing the problem I remembered the letters. Curious, I opened the first letter.
Blame me, your predecessor
The day after we got the servers back up, I was called in to my boss's office to explain what happened and why we were down for so long. Taking my cue from the letter, I blamed my predecessor. My boss was satisfied with my answer and let me go.
About six months down the road we had another big failure. This time our primary database server went down and the secondary was having trouble dealing with the load. I had to put a lot of extra hours into getting them back up and we lost a few transactions due to the backup server not being able to function under the load.
Once again, I reached into that desk drawer and opened letter #2.
Blame the equipment
This time I lamented to the boss about how it wasn't my fault. It was that backup server! If we had some good equipment to run on these things just would not happen. He was satisfied with my answer and I went back to work.
Things ran smoothly for the next 18 months. Then we got hit with a virus that somehow got past our firewall and wreaked havoc on our systems.
I opened the third letter.
Write three letters
(Sorry, this was the first thing I thought of when I read the summary)
Map the dependencies (Score:3)
You're not going to do this by yourself... at the very least you're going to need someone who knows the business side thoroughly. I've walked into a situation like this before for a very, very large company and I swear it took years off of my life, but I learned a whole hell of a lot from the experience. Best of luck.
if they don't care, why should you? (Score:5, Funny)
You wouldn't be in this situation if your employer gave a crap. It's plain and simple: you report to someone. They know the extent of the problem and that there is only one of you. If they cared, there would be more than one of you. But there isn't. So turnabout is fair play.
This is the true American solution to your problem: find other people to exploit and skim off the top ...
Step 1: tell them you're going to become a telecommuter so that you can work 100% of the time
Step 2: get on elance or some other such site: hire gobs of cheap (dubious) overseas help at $1/hour
Step 3: instruct them all to send emails from your address and answer the phone with your name.
Step 4: find a different job and just let your sub-contractors handle that one until the house of cards falls apart
If your current employer calls you out on the fact that you have 15 different accents and sometimes answer the phone in a female voice, ask them why they're so racist.
bonus if you used a pseudonym when hiring for your present job.
Re:Quit (Score:5, Insightful)
No!
This is actually the kind of career-building stuff one should leap at. What would you rather say in an interview for your next job:
- I took this system that was falling apart and made it run like clockwork.. downtime and issue frequency went from "it's down again" to "been up all year"
- Yeah it was pretty good when I got there, and I maintained the status quo
My thoughts on original question:
First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then for each key component, break it down into its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.
Next part is assessment. For each component you’ve identified, what is its current state.
And then it’s time to do triage. Prioritize stuff by largest potential impact.
And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.
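The "break each component into its parts and dependencies" approach above maps naturally onto a dependency graph. A minimal Python sketch, assuming Python 3.9+ for the standard-library `graphlib` module; every component name in `DEPENDS_ON` is a hypothetical placeholder for whatever your own inventory turns up. `impacted_by` answers the triage question (what breaks if this fails?), and `bottom_up_order` lists components so each comes after everything it relies on.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each component maps to the things it needs in order to work.
DEPENDS_ON = {
    "storefront": {"web server", "product database", "payment gateway"},
    "web server": {"network switch", "power"},
    "product database": {"database server"},
    "database server": {"network switch", "power", "backups"},
    "payment gateway": {"internet uplink"},
    "internet uplink": {"network switch"},
}


def all_dependencies(component: str, graph: dict = DEPENDS_ON) -> set:
    """Everything `component` relies on, directly or indirectly."""
    seen: set = set()
    stack = list(graph.get(component, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


def impacted_by(failed: str, graph: dict = DEPENDS_ON) -> set:
    """Every listed component that stops working if `failed` goes down."""
    return {c for c in graph if failed in all_dependencies(c, graph)}


def bottom_up_order(graph: dict = DEPENDS_ON) -> list:
    """Components ordered so each appears after everything it depends on.
    Raises graphlib.CycleError if the inventory contains a loop."""
    return list(TopologicalSorter(graph).static_order())
```

Counting how many components `impacted_by` returns for each item gives a crude "largest potential impact" ranking for the triage step.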
Re:Quit (Score:5, Insightful)
I worked in this environment for one year so as not to tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're adopted. When you fix things, you'll inevitably miss something (I did because of the pace, not dictated by me). Get out. It's not worth the challenge to get proper budgeting to get the right tools in place or the organization as a whole wouldn't let things get how they are in the first place. The business model I came from is failing. If you're good, there are better paying, more rewarding, less "heart and soul" companies out there. You're doing basically startup work for at-will employment pay.
Re: (Score:3, Interesting)
So you worked for one whole year in this industry, and that gives you insight enough to know that there is only ONE reason that things get to this state?
That's interesting because I've been in this industry for 22 years and I can list at least two possible reasons. The obvious one you're missing is that there is budget but the previous guy was an idiot.
I'd say there's a 90% chance it's the latter. Budget is easy to come by in a thriving business, but people who know what they are doing are still rare (hel
Re:Quit (Score:5, Informative)
I've been in a few similar situations over the years. The first thing you put on the table is "This is not an acceptable situation. Your risks are .".
If they don't cover this, then that's really not your problem. I've been coding for 32 years, and doing sysadmin stuff as well for about 20 (among other strings to the bow), and I despair of people who really don't understand that this stuff doesn't happen by waving a magic wand, and that there is more to it than making pretty buttons appear on a screen.
At interview, if someone said they'd reverse engineered and documented a system in this environment (and yes, I interview people for dev/admin jobs from time to time), I would seriously ask them why they didn't get management a junior to cover the paperwork and cover duties, while they dealt with the heavy lifting of reverse engineering and planning. I want someone around who will grok the risks, take responsibility and come up with a resilient service (not just a few machines that may be able to fail over). Budget isn't always easy to come by, especially if there are political axes to grind.
I'm with the AC on this, from the limited info available. Either get them to get you a second, or get out. If the business is thriving, they can afford it, and they're just being cheapskates (and in many years, I've met quite a few like that) if they don't. You don't want to work for a cheapskate.
The time to take this kind of work on solo is if you're part of a startup, when you've got a lot invested in the success of the company. You live or fall on your wits, capability, and ability to lose every evening, weekend, and many a night too, on keeping this up and running as cheaply as possible.
Once the 'thriving' level arrives, you'd better make sure you're not still carrying that load alone, otherwise your own lifespan (as well as that of the company) may be quite severely limited.
Re:Quit (Score:5, Insightful)
Completely agree. Perhaps the previous guy didn't take the time to inform the management of what was required to do the job properly, or didn't know himself, or was more interested in painting himself as indispensable than doing the right thing. First things first, if this is genuinely a thriving e-commerce company then their website is their number one priority and their fulfilment systems are the number two priority, phones are number 3 with everything else taking a back seat - and they REALLY need to get a second employee. If you are ill, on holiday, or, deity forbid, something happens to you, then they need someone else who can step in. If their infrastructure is as shot as you suspect then you're going to need a second brain to sort it all out and help you implement it.
You must make sure that backups are being taken and are robust. You need a disaster recovery plan. You need both short term and long term plans to scale the infrastructure as the business grows and reactively if there's a sudden growth spurt. You need to know where the next bottleneck in the system is and come up with a plan to fix it. Do you have an adequate handle on monitoring traffic to the site from when they first land through to placing an order? Do management have the stats required to make informed decisions about the business? Management will also need to be aware of when IT will need extra funds as mapped against their own sales growth targets.
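One part of "make sure that backups are being taken" that is easy to automate is checking that each backup location actually contains a recent file. This Python sketch assumes each backup set lands as files in its own directory; `stale_backups`, the set names, the paths, and the 26-hour threshold are hypothetical. A fresh file only proves that something was written, so periodic test restores are still needed to show the backups are robust.

```python
import time
from pathlib import Path
from typing import Optional


def stale_backups(backup_dirs: dict[str, Path], max_age_hours: float,
                  now: Optional[float] = None) -> list[str]:
    """Return one problem line per backup set whose newest file is missing or too old."""
    now = time.time() if now is None else now
    problems = []
    for name, folder in backup_dirs.items():
        files = [p for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []
        if not files:
            problems.append(f"{name}: no backup files in {folder}")
            continue
        newest = max(p.stat().st_mtime for p in files)
        age_hours = (now - newest) / 3600
        if age_hours > max_age_hours:
            problems.append(f"{name}: newest backup is {age_hours:.0f} hours old")
    return problems
```

For example, `stale_backups({"orders db": Path("/backups/db")}, max_age_hours=26)` (path hypothetical) returns an empty list when the nightly dump is on time and a one-line complaint otherwise, which is easy to feed into an email or chat alert.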
Once all of the above is sorted, and decent management allowing (and presuming this isn't something that is already being taken care of), you need to start suggesting to management the skillsets of people and / or contractors and / or agencies that need to be brought in to proactively grow the business. Be it SEO, PPC, UX, new features, etc. whatever it is, you have the opportunity to help the business understand it all and be instrumental in their success.
Re: (Score:3, Interesting)
It's not worth the challenge to get proper budgeting to get the right tools in place or the organization as a whole wouldn't let things get how they are in the first place.
I haven't been there, but it sounds like it would be very beneficial to learn to present the business case for upgrades and budgeting. Explain the difference in downtime that it would entail, and the benefits the company will get. From what I've seen our previous IT guy do, it seems that bosses are NOT opposed to spending money, as long as you can make a good case for why it's necessary. Put it in terms of dollars that it will save you and that will go a long way.
Re: (Score:3)
Depends on the management. 7 months ago I inherited a dysfunctional department with morale in the crapper and a seeming inability to do anything. The organization brought in a new director (my boss) and new department manager (me). I got a lot of funding and a lot of resources to fix things. Net result is that we are now ahead of the game for the first time in 5 years, people like being here, and we're having a blast.
So if management is providing support and resources, I'd say go for it. If they're say
Re:Quit (Score:5, Insightful)
All of what you said (and the initial reaction to quit in the GP) all hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're correct at jumping at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.
Re:Quit (Score:5, Insightful)
It is probably a combination of the two. Because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.
We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s), and give you a hard copy report on their findings that you can present to MGMT.
If the situation is as I believe, it is worse than he even suspects. He needs more help than he can provide by himself to get ahead of the curve.
Re:Quit (Score:5, Insightful)
This.
I'd be willing to bet a year's pay that the previous guy wasn't straight-up incompetent. He was probably relatively skilled, and doing the best he could with the resources at his disposal. Which were probably not actually the resources he needed.
Odds are good that there's a reason why the place is in the condition it is now.
Odds are good that there's a reason why the last guy isn't there anymore.
Odds are good that you're going to need more than one guy in IT to get it all straightened-out.
Re: more than one guy in IT (Score:3)
I'll go out on a limb based on my own current experience.
I think just about all companies bigger than say seven people need two people split half IT and half "line functions".
Then when everything is humming, they can "just work". But when a cascade situation comes up, you do those Tier Levels. Level 1 does all the End User fallout. (Every computer needs to get that new utility installed, then all the printers quit working because of a 2 minute power outage (winter is coming), User 1 wants to know where their fi
Re: (Score:3)
As someone who has inherited a bowl of spaghetti more than once in his day, I can say definitively that it's all driven by upper management/ownership. You're given a limited set of tools and an even smaller budget to make sure everything not only runs, but runs at peak efficiency. Then, add in incompetent end-users that are allowed, nay, ENCOURAGED to build undocumented, unstructured and barely maintainable "applications", and you're in for some real fun.
Re:Quit (Score:5, Insightful)
Oh god yes do this.
If your bosses will sign off on getting a second opinion, great, stick around and fix stuff. If they don't even want to know that it's screwed up, get out as soon as you can.
Just be very careful when selecting who you'll bring in to do the audit, and be very clear that if anyone is brought on to help fix the problems, it absolutely will not be the same as the evaluators. Otherwise you're essentially handing them a blank check to say whatever they feel like is wrong, and fix it any way they want.
My suggestion is to generally avoid letting contractors do more than consult with you on a project--they know very well how to set things up so that it's easy for them to work on in the future, and are generally not very good at making the stuff actually fit in well with your business processes.
Re:Quit (Score:5, Informative)
Yep. Hop into the waders and get to work. It can be a very rewarding experience turning a steaming pile back into a smooth running good looking machine.
To add to the above, document everything. Though it sounds like you're already doing that. Make sure it's documentation that works for anyone not just you. Don't take anything for granted. Automate whatever you can, including problem detection and notification. (save yourself from having to check things daily or weekly, have it shoot you an email or something if a common issue crops up again)
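The "have it shoot you an email" idea can be sketched as a tiny check runner meant to be run from cron or a scheduled task. Everything here is an assumption for illustration: the function names, the check names, the free-space threshold, and the SMTP host, sender, and recipient you would pass to `email_notifier`. Each check returns None when healthy or a short message when not, and a check that crashes is reported rather than silently skipped.

```python
import shutil
import smtplib
from email.message import EmailMessage
from typing import Callable, Optional

Check = Callable[[], Optional[str]]  # returns None if OK, else a problem description


def disk_space_check(path: str, min_free_fraction: float) -> Check:
    """Build a check that complains when free space on `path` drops below the threshold."""
    def check() -> Optional[str]:
        usage = shutil.disk_usage(path)
        free = usage.free / usage.total
        if free < min_free_fraction:
            return f"{path}: only {free:.0%} free"
        return None
    return check


def run_checks(checks: dict[str, Check], notify: Callable[[str], None]) -> list[str]:
    """Run every check; send one combined notification if anything is wrong."""
    problems = []
    for name, check in checks.items():
        try:
            result = check()
        except Exception as exc:  # a crashing check is itself worth reporting
            result = f"check raised {exc!r}"
        if result:
            problems.append(f"[{name}] {result}")
    if problems:
        notify("\n".join(problems))
    return problems


def email_notifier(smtp_host: str, sender: str, recipient: str) -> Callable[[str], None]:
    """Build a notify function that mails the problem list via a plain SMTP relay."""
    def notify(body: str) -> None:
        msg = EmailMessage()
        msg["Subject"] = "Automated check failures"
        msg["From"] = sender
        msg["To"] = recipient
        msg.set_content(body)
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(msg)
    return notify
```

For example, `run_checks({"root disk": disk_space_check("/", 0.10)}, email_notifier("mail.example.com", "checks@example.com", "you@example.com"))` would mail you only when the root filesystem drops below 10% free (all addresses here are placeholders).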
Make sure your employer fully understands the situation you and they are in, so they don't expect you to be doing improvements and striking things off their sore to-do-list that they were probably hoping you'd tackle the day you started. Get them a timeline as soon as you get something of a grip on the situation, tell them where you're going to be spending your time to start with, and the reasons why it's essential and going to delay their getting their bells and whistles and visible bang-for-the-buck of hiring you. Otherwise they may think you're just sitting on your butt because they're seeing no tangible benefits.
If you've got a LOT of things that need to be fixed, things that can be done by closer to trained-monkey level, consider getting a temp assistant to help you dig out. Someone to run around and reimage machines, fix networks, repair stations, do RMAs, etc while you pull up your sleeves and unhack the servers. But if they're not in that big of a hurry this may not be appropriate.
Good luck with it, sounds like fun actually, a challenge at the least.
Transparency (Score:3)
. . . document everything. . . [m]ake sure your employer fully understands the situation you and they are in.
This was the first thought that came to me. While there surely are employers that are the personification of Evil (I, too, have met my share), most are simply trying to do the best they can, but are hampered in their ability to help you by a lack of time, a lack of knowledge of IT subjects, or both. Because of this, they can't independently judge the quality of the advice you're giving them -- i.e., they have to trust you. Since (at least in my experience) most conversations with management IT initiates
Re:Quit (Score:4, Interesting)
Best advice, right there. It's a challenge for certain, but making things better is the best thing you can do - for the company (ha) and, far more importantly, for yourself.
Hang in there!
And although it may feel like the whole place is going to fall apart any moment, it hasn't yet, you're in charge, and it sounds like you're gradually making it all better. Take a deep breath, Don't Panic, it'll be okay.
Re: (Score:3)
Then build an Action Priority Matrix. It'll help you fit together an action plan and block out time for what appear to be major projects. It also allows you to get some Quickies done to show management you're the right guy to keep doing the job.
http://www.showingnaturally.com/ActionPriorityMatrix.png [showingnaturally.com]
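The matrix in the linked image sorts tasks by impact and effort. A small Python sketch of that sorting, assuming 1 to 10 scores and a cutoff of 5; the quadrant labels are the ones commonly used for this matrix ("Quick Win" corresponds to the "Quickies" mentioned above), and the linked chart may word them differently. The function names and task names are hypothetical.

```python
def quadrant(impact: int, effort: int, threshold: int = 5) -> str:
    """Place a task in one of the four Action Priority Matrix quadrants."""
    high_impact = impact > threshold
    high_effort = effort > threshold
    if high_impact and not high_effort:
        return "Quick Win"       # do these first to build credibility
    if high_impact and high_effort:
        return "Major Project"   # plan and schedule these
    if not high_impact and not high_effort:
        return "Fill-In"         # do when there is spare time
    return "Thankless Task"      # avoid or push back on these


def plan(tasks: dict[str, tuple[int, int]]) -> dict[str, list[str]]:
    """Group tasks, given as name -> (impact, effort), by quadrant."""
    buckets: dict[str, list[str]] = {
        "Quick Win": [], "Major Project": [], "Fill-In": [], "Thankless Task": [],
    }
    for name, (impact, effort) in tasks.items():
        buckets[quadrant(impact, effort)].append(name)
    return buckets
```

For example, `plan({"renew SSL cert": (8, 1), "replace web server": (9, 8)})` files the certificate renewal under "Quick Win" and the server replacement under "Major Project".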
Re: (Score:3)
Based on this - "I assumed the position of programmer and sole IT personnel at a thriving e-commerce company." I am assuming...
1) He says he "assumed" the position which would imply that he worked elsewhere in the company and was made the de-facto IT person based on having an Android phone or a PS3 or whatever other metrics they decided to use. I am going to give the Poster the benefit of the doubt, and assume he is not in over his head.
2
Re:Quit (Score:4, Insightful)
Quit? Do you give up on every task before you start?
Some of us like a challenge.
Re: (Score:3)
It sounds like it was your boss that was the problem rather than the project. If you can't communicate properly to your boss why there is a problem, what it is, what the consequences are, what you will have to do to fix it, approximately how long it will take and which problems/systems have and have not been fixed (and therefore problems are all your responsibility) then it isn't going to work out. That's a lot of work, unlikely to be a lot of fun, and takes two people: you, to give the right information, a
Re:Quit (Score:4, Insightful)
When I'm given a spoon and told to storm the hill and kill everyone in the machine gun nest?
Yes. I quit before I even try. After you have been in IT long enough, you can spot a suicide mission a mile away.
Re:I wouldn't worry... (Score:4, Informative)
Sadly there's a lot of truth to this. In my experience the difference between most "good" and "bad" networks is whether the WTFs are vendor-blessed hacks or in-house hacks.
Of course, there are always those places where this is not the case but I've seen enough IT environments to believe that for a majority of companies this is sadly the state of things. If maintenance in the average factory was handled the same way IT is handled at the average company most machines would consist of approximately 30-50% duct tape, newspaper, string and glue...
Re: (Score:3, Insightful)
Or bring in contractors / consultants, have them serve their part, and then part ways. The biggest mistake you can make is taking everything on your shoulders; that = loss of life & health. It's a job, and work != life.
Re:Getting a Grip (Score:4, Insightful)
This is the only solid advice I've read so far. Band-aid solutions are indicative of two things: too shy to ask management for a bigger budget, or management's reluctance to improve their budget. Generally it is the latter.
Re:Getting a Grip (Score:5, Informative)
Agreed. As soon as I saw this was an IT department of one, I could tell exactly how much management cares about getting things like this corrected. These things are in place because management does not want to provide what is needed. If they only want to pay for band-aids, that is all they will have.
This isn't necessarily the case though. I have a friend who took over IT at a small business. When he walked in they were using pirated software and their IT was a complete mess. After he put in hours to get it fixed up (with personal support from the owner), they ended up offering him an executive position with a massive pay increase. Some small shops with one IT guy really just don't know what they are doing, and haven't had a person in the job to tell them what is being done wrong. Your advice is still good though. A person in that situation needs to test whether they have management support to do things better. If so, it can turn into a career making opportunity to turn things around. If you can't get the management on your side though, it very well could be time to start looking for another job with more supportive management.
Re: (Score:3)
We happened to have Sharepoint already installed (as part of SBS2008), so we started using its Wiki feature for our documentation.
We use its lists feature to keep track of license keys and firewall settings (not in the same list of course).
Just make as comprehensive a list as possible.