Ask Slashdot: Remote Server Support and Monitoring Solution?
New submitter Crizzam writes: I have about 500 clients that have my servers installed in their data centers as a hosted solution for time & attendance (employee attendance / vacation / etc). I want to actively monitor all the client servers from my desktop, so I know when a server failure has occurred. I am thinking I need to trap SNMP data and collect it in a dashboard. I'd also like to have each client connect to my server via HTTP tunnel using something like OpenVPN. In this way I keep a site-to-site tunnel open, so if I need to access my server remotely, I can. Any suggestions as to the technology stack I should put together to pull off this task? I was looking at Zabbix / Nagios for SNMP monitoring and OpenVPN for the other part. What else should I include? How does one put together a good remote monitoring / access solution that clients can live with and that will still allow me to offer great proactive service to my servers located on-site?
Reverse-SSH tunnel phone-home from remote device (Score:2, Informative)
Set up a script to initiate a reverse-SSH tunnel from the remote device back to a monitoring server, set up no-login on the tunnel but distribute keys for the monitoring user on the remote devices.
You should be able to passwordless login from the monitoring box over a completely secure link that doesn't require port-forwarding at the remote site.
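A minimal sketch of the phone-home step in Python, assuming the remote device has the OpenSSH client installed. The host name monitor.example.com, the account name "tunnel", the key path and port 22001 are all made up for illustration; the ssh options themselves (-N, -R, ServerAliveInterval, ExitOnForwardFailure) are standard OpenSSH.

```python
import subprocess


def reverse_tunnel_command(monitor_host, remote_port, user="tunnel",
                           key_path="/etc/phonehome/id_ed25519", local_port=22):
    """Build the ssh argv that exposes this device's sshd on the monitoring server."""
    return [
        "ssh",
        "-N",                              # forwarding only, no remote command
        "-i", key_path,                    # dedicated key for the tunnel account (hypothetical path)
        "-o", "ServerAliveInterval=30",    # send keepalives so dead links are noticed
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{user}@{monitor_host}",
    ]


def run_tunnel(monitor_host, remote_port):
    """Run the tunnel once; cron, systemd or autossh would be responsible for restarts."""
    return subprocess.call(reverse_tunnel_command(monitor_host, remote_port))
```

With a tunnel like this up, something like `ssh -p 22001 someuser@localhost` on the monitoring box should land on the remote device, without any port-forwarding at the client site.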
Re:Reverse-SSH tunnel phone-home from remote devic (Score:5, Insightful)
Or, do the right thing and hire a network admin so someone with a clue is involved.
If you have to ask this question on slashdot, you need to change the question to something appropriate. Based on exactly what was posted, he doesn't have any idea what his requirements are. He knows the conceptual goals, but not the actual goals or requirements. Unless he is trying to change careers from whatever he is to a full time network infrastructure person he is going to be wasting a lot of time getting a clue. That means time he won't be spending doing whatever his actual job is.
He needs someone who can look at his actual setup, figure out what actually needs to be monitored, and knows the appropriate ways to do it.
Short of multiple Bennett Haselton-length posts, and many discussions in depth, no answer coming from slashdot or all of them combined is going to be useful.
Everyone here posting solutions has their own, certainly incorrect idea of what he wants but no one actually knows. No one so far has even started by asking the right questions. It's the blind leading the blind at best.
Re: (Score:1)
I agree. A meeting needs to be held with the technical team to determine what exactly needs to be monitored.
With that being said, ask yourself a few questions:
Are you looking for a heartbeat?
Are you actually more concerned for the applications running on the servers?
Are you looking to monitor individual pieces of hardware, e.g. CPU, RAM, etc.?
Are you trying to determine if there is a network hardware failure as well, e.g. router, switch, etc. (did a switchport die and did I lose a particular subnet or clu
Re: (Score:3, Funny)
Just because you're unfamiliar with networking administration doesn't mean this needs to be blown up into "hire a network guy". That's just ignorance and
As someone who's been a network admin for a few years, I'm fairly confident in my statements. Do you do even minor surgery on yourself if you're not a surgeon? If you come to slashdot to ask how to do something for your business, you already fucked up and the only valid responses you should be getting from slashdot are help on finding someone who can help you. If he asked 'how do I find someone, like a consultant for a short term project, like this' that would be one thing. He didn't, he came here expec
Re: (Score:2)
As someone who's been a network admin for a few years, I'm fairly confident in my statements. Do you do even minor surgery on yourself if you're not a surgeon?
I am a network (and Linux) admin by profession, but I can also repair my audio equipment and do some repairs on my car, even though I do not work as a car mechanic or electronics repair guy. While I could find a mechanic to repair my car (and sometimes I do), a lot of the time it is cheaper and faster to do it myself.
So, if the OP wants to create a monitoring solution himself (assuming he knows something about the monitoring systems) more power to him. I probably would ask a similar question if I had to mon
Re: (Score:2)
The reverse-SSH tunnel is the correct way to "phone home". Maintaining a VPN is a shit show.
A blanket statement like this shows your cluelessness and shear ignorance.
Without considerably more information neither you nor I nor anyone else can make such a statement.
Pure Storage does it this way, and they are quite the experts.
Oh well, since a company that's barely 5 years old does it this way, and since their primary business line is selling flash drive arrays ... not network administration and monitoring ... they must be the most qualified and perfect example to follow.
IS IT the right way for THEM? Maybe. Maybe not. To pretend that just because they do it t
Re: (Score:3)
You sound like a Windows admin for a gov't entity.
You spend a lot of energy telling people they do it wrong without having any real insight or advice on how to do it correctly.
A blanket statement like this shows your cluelessness and shear ignorance.
What does his knowledge of a specific cutting tool have to do with anything?
Scratch my back (Score:2, Interesting)
Re: (Score:2)
Would this centralized server be your universal remote server?
Is this a serious question? lol
It would be a Desktop PC, constantly mobile. Works only in 64bit with local mouse and keyboard inputs.
Re: (Score:2)
I would elaborate on that a bit. I would have in the colo facility a Cisco ASA or other hardened appliance, and use that for the VPN connection.
I would then build a hardened server that accepts the stuff the parent points out, SNMP traps, syslog (both TCP and UDP), but I would recommend a tool like Splunk or a similar item. Splunk has served me well in my dealings. Once that is in place, I'd set up Splunk forwarders on critical machines for more detailed monitoring.
From there, I'd create a dashboard for
NSA (Score:1)
Should ask them!
I just discovered NewRelic ... (Score:4, Interesting)
Check out www.newrelic.com - even their free service tier offers great features and it's easy to deploy on all servers
Re:I just discovered NewRelic ... (Score:4, Informative)
NewRelic is pretty sweet, as the parent says, even at the free tier. They will definitely bombard your email and phone with hard-sales pitches, though, and there's a giant cost leap from free to the next tier.
Re: (Score:1)
To that point ... I installed it on 11 servers in 14 day "Pro" trial period. Sales guy contacted me by email, we exchanged 3 emails since I will be subscribing in the future but when I told him that I'm happy with free tier for now, there was no further push from their side. Since then I'm up to 30 servers and loving it.
FYI: Server monitoring is a side product of theirs. Their main product is app stack monitoring - great for finding failures and bottlenecks in PHP, Ruby, Java apps etc
Re: (Score:1)
We have NewRelic deployed and pay for it. It is worth the money for us because we not only get the "Is it up" but get to see the software stack interact with the hardware. We had one client who had feature creep and we watched their VM start to die because of memory creep and it justified putting them on another box. When we showed them the reports, they were quite happy to write a check for the upgrade.
Ping? (Score:3)
For Server active status (eg: am i dead?)
Inside a while loop or sleep() if you can't be bothered.
for (int i = 0; i < MAX_SERVERS; i++)
{
IcmpSendEcho(..........);
}
For everything else monitoring related: employ someone to make a custom monitoring application, or Google "server monitoring software".
Re:Ping? (Score:4, Informative)
For some reason, disabling ping is considered a security feature, so a lot of places block it at the firewall. Cloud services (I'm looking at you, Azure) also either don't allow it or can't do it.
Ping is not reliable (Score:2)
Ping is almost the worst way to check to see if your server is up. In fact, certain machines will return an ICMP response even after you've broken into their bios-equivalent (hello, Solaris).
Do a service-level check. It's not that hard to do a curl instead of a ping. A curl's results can show you if the service is present and functioning. A ping just shows you whether the network interface is responding or not.
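A rough Python equivalent of "curl instead of ping", using only the standard library. The function name check_http, the 64 KB read limit and the optional expect_text argument are illustrative choices; which URL to hit and what text to look for depend entirely on the application being monitored.

```python
import http.client
import urllib.error
import urllib.request


def check_http(url, timeout=5.0, expect_text=None):
    """Return (ok, detail) for a single HTTP GET against url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read a bounded amount so a huge page cannot stall the check.
            body = resp.read(65536).decode("utf-8", errors="replace")
            if expect_text is not None and expect_text not in body:
                return False, f"HTTP {resp.status} but expected text missing"
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # DNS failure, refused connection, timeout, malformed response, etc.
        return False, f"unreachable: {exc}"
```

Unlike a ping, a check like this only passes when the web stack answers, and with expect_text set it also requires the page to contain something the application itself generated.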
People disable ping because if you don't know a server is there you can't attack it. It's like enabling MAC addr
Re: (Score:2)
People disable ping because if you don't know a server is there you can't attack it. It's like enabling MAC address filtering - it doesn't really help that much, but in a specific set of circumstances it can help a bit.
If there are no other services presented to the world, yes. But a simple port scan will tell you it's up, and that doesn't take long to do.
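To show how little a dropped ping hides, here is a small TCP connect probe in Python. The port list (22, 80, 443) is just an example set; the same function also works as a poor man's "is it up" check when ICMP is blocked.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def probe(host, ports=(22, 80, 443)):
    """Map each port to whether it accepted a connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```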
reverse ssh (Score:1)
Have the clients connect with ssh to your server and open a reverse port. They'll each have to pick a different port on your server.
Use something like autossh ( http://www.harding.motd.ca/aut... [harding.motd.ca] ) to make sure the ssh connection is always open.
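Since every client needs its own port on your server, you need some bookkeeping for who got which one. A minimal Python sketch, assuming a hypothetical port range of 22000-22999 and an in-memory dict as the registry (in practice you would persist it somewhere):

```python
def assign_port(registry, client_id, base_port=22000, max_port=22999):
    """Return client_id's port, allocating the lowest free one on first sight."""
    if client_id in registry:
        return registry[client_id]
    used = set(registry.values())
    for port in range(base_port, max_port + 1):
        if port not in used:
            registry[client_id] = port
            return port
    raise RuntimeError("no free port left in the configured range")
```

A client that reconnects keeps the port it was given the first time, so the monitoring side can rely on "port N means client X".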
Having said all that, sounds like a great security hole if your server is ever breached. Plus lots of potential privacy violations.
Marqis
Keeping track.. (Score:2)
HTTPS solutions like NewRelic aren't an option because you want to be able to ssh back into the host..
Assuming all clients will allow it I can only think t
Re: (Score:3)
You'll need a means of knowing that 10.20.20.x is client x and 10.20.20.y is client y. Of course OpenVPN allows you to do this but maintaining that table by hand could be a bit of a pain.
You mean like the common name of the ssl certificate used to connect in the first place? Combine this with a client-connect script to update dns and/or the ifconfig-pool-persist option and you've got a great solution.
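A sketch of what such a client-connect script could do, in Python. OpenVPN passes connection details to the script through environment variables; this assumes common_name and ifconfig_pool_remote_ip are set (i.e. the address comes from the server's pool). The domain vpn.example.internal and the hosts-file output format are placeholders, and the actual DNS update is left out.

```python
import os


def updated_entries(entries, env):
    """Return a copy of {common_name: vpn_ip} with this connection recorded."""
    result = dict(entries)
    result[env["common_name"]] = env["ifconfig_pool_remote_ip"]
    return result


def render_hosts(entries, domain="vpn.example.internal"):
    """Render the mapping as hosts-file lines, one client per line."""
    return "".join(
        f"{address}\t{name}.{domain}\n" for name, address in sorted(entries.items())
    )


def record_connection(entries):
    """What the script itself would call: read OpenVPN's environment and update the map."""
    return updated_entries(entries, os.environ)
```

Keying on the certificate's common name means the table maintains itself: whoever connects with client X's certificate is, by definition, client X.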
Re: (Score:3)
There's no need to install Ansible on the remote systems, only on the machine running the playbooks. All Ansible activity is run over SSH and has no remote dependencies.
Re: (Score:3)
Managing the OpenVPN connections is not that bad. You give each client its own key and certificate and you use OpenVPN's ccd/ directory to assign VPN IP addresses.
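For 500 clients you would probably generate the ccd/ files rather than write them by hand. A Python sketch, assuming the server uses "topology subnet" (so ifconfig-push takes an address and a netmask) and a made-up VPN network of 10.20.0.0/16; each file is named after the client's certificate common name.

```python
import ipaddress
import os


def ccd_files(client_names, network="10.20.0.0/16", first_host=10):
    """Map each client name to the text of its ccd file (a static VPN address).

    Addresses follow list order, so only ever append new clients to the list.
    """
    net = ipaddress.ip_network(network)
    files = {}
    for index, name in enumerate(client_names):
        address = net.network_address + first_host + index
        if address not in net or address == net.broadcast_address:
            raise ValueError("network too small for the client list")
        files[name] = f"ifconfig-push {address} {net.netmask}\n"
    return files


def write_ccd(directory, files):
    """Write one file per client into the client-config-dir directory."""
    os.makedirs(directory, exist_ok=True)
    for name, text in files.items():
        with open(os.path.join(directory, name), "w", encoding="utf-8") as fh:
            fh.write(text)
```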
We use the following tools to monitor our servers, but we're only monitoring about 30, not 500:
Re: (Score:2)
I personally have used Xymon with more than that many systems. It takes time to classify them, but it is doable.
The price is right on Xymon, however, if I were to recommend a monitoring solution for both real time, "oh shit" monitoring such as a drive array about to fail as well as a historical log (for security and finding a baseline), I'd go with Splunk if possible due to the tools available, and the fact that you can send management-friendly reports about the health of the enterprise up the chain.
Again,
Or you could (Score:2)
Re:Or you could (Score:4, Informative)
Nagios is Open Source.. GPL V2 specifically..
Re: (Score:1)
Actually, forget Nagios. Lately it has been turning into a NIH syndrome/copyright/ego-clash fight. Go with Shinken instead. It's a drop-in replacement for Nagios that scales and does not have childish problems.
Re: (Score:1)
"BUT before you set this up, be damn sure that you don't punch a hole in your customers' firewalls by having a VPN to your monitoring server. Having 500+ VPN connections from one Linux box to servers located in customers' internal networks might backfire at some point if it's implemented incorrectly."
Just disable client-to-client in the OpenVPN server (which routes all activity through the tun device) and set up iptables to accept only incoming/established connections on the tun device. Only allow the server
Openvpn and x11vnc (Score:3)
I do something similar. I use openvpn and x11vnc. I have a cron job on each client that runs a small Perl script that grabs the output of several programs like top, uptime, and sensors and then saves the results in an easy-to-parse file that my server periodically grabs, so that I have stuff like CPU temperature, CPU usage, memory usage, etc.
I also grab a screenshot of x11vnc using vnccapture.
I also have a way to remotely activate reverse ssh if for some reason openvpn fails.
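For anyone wanting to copy the cron-script idea, here is a Python stand-in for the Perl script (not the poster's actual code). It assumes a Linux client with /proc/meminfo; the output path /var/tmp/status.txt and the key names are invented, and temperature readings from sensors are left out.

```python
import os
import time


def parse_meminfo(text):
    """Extract total and available memory (kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values.get("MemTotal"), values.get("MemAvailable")


def render_snapshot(load1, mem_total_kb, mem_available_kb, now=None):
    """Render one key=value line per metric, trivially parseable on the server."""
    stamp = int(time.time() if now is None else now)
    lines = [
        f"timestamp={stamp}",
        f"load1={load1:.2f}",
        f"mem_total_kb={mem_total_kb}",
        f"mem_available_kb={mem_available_kb}",
    ]
    return "\n".join(lines) + "\n"


def collect(path="/var/tmp/status.txt"):
    """Gather metrics on a Linux host and write them atomically to path."""
    with open("/proc/meminfo", encoding="utf-8") as fh:
        total, available = parse_meminfo(fh.read())
    text = render_snapshot(os.getloadavg()[0], total, available)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp, path)  # rename so the server never fetches a half-written file
```

Run from cron every minute or so, this leaves a small file the server can pull over the VPN; the timestamp line lets the server tell a fresh snapshot from a stale one.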
My only problem with openvpn is key management. Creating and distributing unique keys to each client is kind of a pain.
Hopefully this goes without saying (Score:4, Insightful)
Make damn sure your clients are aware of exactly what you're doing. They probably don't care about the specifics (e.g. openvpn, reverse ssh); but they need to know you can remotely access the boxes.
It's probably a good idea to have some sort of document to give them that does spell out all the specifics - something they need to acknowledge/sign, with both of you keeping copies.
Re:Hopefully this goes without saying (Score:5, Informative)
Actually, the model of remotely-managed on-premise appliances is not that crazy. Assuming it's done securely, you get the best of both worlds:
If the customer's Internet access goes down, they're not dead in the water as they would be with a cloud solution.
If you manage everything for them, then the box is completely hands-off... just like a cloud solution.
There's an entire business category called "Managed Service Providers" whose vendors do exactly this: Remotely manage all aspects of your IT infrastructure so you don't need to worry about anything. For mom-and-pop non-technical businesses, it's an excellent model.
Re: (Score:3)
The fact that a well-managed cloud service is multiply-redundant is of little consolation if your crappy DSL line goes down for 6 hours and your salespeople cannot access the CRM tool.
What's more likely to happen: the loss of access to Amazon cloud services/internet, or a local box getting cacked
Unequivocally for us: Loss of Internet access happens far more often than a server failure.
Re: (Score:2)
Our DSL is not particularly unreliable. However, our servers are spectacularly reliable. They run Linux on decent hardware and we almost never have a server failure. Our most common cause of a server failure over the last 10 years has been power failures long enough for the UPS to decide we'd better shut down.
Security and liability: think Target (Score:2)
The media says Target was breached due to a compromise at their HVAC vendor. Do you want to be the vendor that gets hit with a liability suit because someone broke in through your network?
It's obvious from your question that you're not really sure what you're doing. SNMP? That's for network crap, not for server and application level stuff. Why would you even talk about SNMP? Why would you even want a VPN into the customer network?
If you need access to your server, write it into your support contract, and as
OpenVPN + Nagios (Score:1)
Re: (Score:2)
Icinga rather than nagios... always... the simple basic changes to Icinga make it so much nicer to work with, even the v1 branch which is just a fork with some updates
zabbix is NOT an snmp manager (Score:3)
not really. snmp is an afterthought for them and it's clumsy as hell to add snmp to it. I tried and gave up. instead, I picked hobbit (uhm, the new name is 'xymon').
xymon has its quirks but it was not hard to modify to add more snmp features, and its code was not too bad to get through. it's not written in a lot of 'strange' languages, and that's a plus, to me, too.
personally, I usually just write snmp code fresh, from scratch, using net-snmp mgr tools. it's not hard, you get just what you want, and you are not bogged down in lots of 'infrastructure' that someone else thought was good but is useless to you (like zabbix).
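a sketch of the "write it yourself on top of net-snmp" approach, in Python, shelling out to the snmpget tool. this assumes net-snmp's command-line tools are installed and the device speaks SNMP v2c; the community string "public" is only a placeholder, and the run parameter exists so the function can be exercised without a real agent.

```python
import subprocess


def snmpget_command(host, oid, community="public"):
    """Build an argv for net-snmp's snmpget using SNMP v2c, printing only the value."""
    return ["snmpget", "-v", "2c", "-c", community, "-Oqv", host, oid]


def snmp_get(host, oid, community="public", timeout=10, run=subprocess.run):
    """Return the value printed by snmpget, or None if the query failed."""
    try:
        result = run(snmpget_command(host, oid, community),
                     capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # snmpget missing from PATH, or the agent never answered.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

for example, snmp_get("10.20.20.7", "1.3.6.1.2.1.1.3.0") would ask a (hypothetical) host for sysUpTime; anything that comes back as None goes on the "look at this box" list.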
BMC Patrol (Score:1)
Zenoss is awesome (Score:1)
Zenoss is awesome, and as your business scales so can it. Our organization monitors 5000+ servers worldwide in all sorts of places. Zenoss lets you do everything you'd want: set up notifications for one or more servers, types of errors, and filters within filters. It's a rocking platform and if you're big enough, they'll set it all up for you for a fee.
Re:Zenoss is awesome - Zenoss Core + OpenVPN (Score:2)
OP - For what it's worth, any open source monitoring software should play just fine with OpenVPN. However, the monitoring feature set should be simplified into a single interface; you don't want to have to be fixing scripts and
Stunnel for secure connection. (Score:1)
NAV works great (Score:2)
Look at the ELK Stack (Score:2)
The ELK Stack (ElasticSearch, Logstash, Kibana) are great tools for capturing logs from *anything*, indexing and massaging of the data captured, and then offering up visualization, searches, and dashboards (that refresh). Built with Angular.js so the speed happens.
We could be talkin' web server logs of the NY Times servers, centralized and displaying dashboards in real-time, or maybe 24/7 sensor data streaming from the ocean floor. The ELK Stack can do it.
First googled citation, and there's plenty more wher
Re: (Score:2)
ELK works but frankly its defaults do just about nothing. As a stack, sure, it's great, but it needs to be added as an adjunct to a real monitoring system and it needs useful defaults and/or some sort of add-on repository. The OpenNMS boys are working on getting RRD data into ES.
Pretty much you set up ELK and go "great, my logs are all in one place", but it does nothing by default, nor is it easy to do anything useful with it. Ad hoc searches of logs are great and all, but you're basically replacing ssh cat | grep. Tak
Call me silly (Score:2, Insightful)
But shouldn't this have been part of the design BEFORE you rolled out 500 servers?
Re: (Score:2)
I'll bite, and I'll call you silly.
Many projects evolve over their lifetimes. This isn't just an IT thing. In many cases during the construction / commissioning stage you'll come out of the end with a wishlist of things and features to add in the future. Many such things would be impossibly expensive (both in money and lost time) to add during the project stage, and many projects which demand everything from the very beginning end up turning into an unmanageable behemoth.
If the primary goal was to get 500 se
Nagios with NSCA? (Score:1)
You could outsource this. (Score:1)
Let me know if you want to do something like this and we can work something out. Reply to this and we can connect.
I would do exactly what you outlined (Score:1)
SolarWinds Server & Application Monitor? (Score:1)
Lemme rephrase (Score:1)
bash (Score:2)
Re: (Score:2)
Duct-tapey but it works. Let's add some baling wire and include a phone with a limited data plan that you can pair with via Bluetooth or USB. That way, when both their internet connection *and* the box you are monitoring go tits up at the same time you can be notified as well.
Check out "The Assimilation Project" (Score:2)
Status updates (Score:2)
I manage a hub server and a backup server. Every 60 seconds the backup server's crontab (wget) fetches a 'web page' from the hub server, which as a side effect records the caller's IP address into a file. Even though the backup server has a dynamic IP address, I can always find it by going to the hub server and looking in that file.
I have a page I can go to on the hub server which checks the timestamp on the file BackupServer.ip. If it is suspiciously old then that web page turns red and tells me that thin
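The "is the file suspiciously old" test is only a few lines in Python. This is a sketch of the idea, not the poster's code; the 180-second threshold (three missed 60-second check-ins) is an arbitrary choice.

```python
import os
import time


def heartbeat_age(path, now=None):
    """Seconds since path was last modified, or None if it does not exist."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return None
    return (time.time() if now is None else now) - mtime


def heartbeat_status(path, max_age=180, now=None):
    """Return 'ok', 'stale', or 'missing' for a heartbeat file."""
    age = heartbeat_age(path, now)
    if age is None:
        return "missing"
    return "ok" if age <= max_age else "stale"
```

A status page would turn red on anything other than "ok". With 500 clients you would keep one file per client and loop over the directory.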
Whats Up (Score:2)
www.pulseway.com Ultimate Flexibility (Score:1)
This is an amazing product. I've used this in the past and LOVE it! Need to run a remote powershell command from your android? It does that. Dashboards for all the things? Has that covered.
Check it out:
http://www.pulseway.com/
Another option to reverse ssh tunnels and openvpn (Score:1)
If you're using linux or BSD, another option to reverse ssh tunnels or openvpn would be EPS Conduits: http://eps-conduits.sourceforg... [sourceforge.net]
It was written with the goal of having a large number of remote devices form a virtual network for ease of management/maintenance.
Cacti (Score:2)
We have VPNs to each data centre and client site and administer them over SSH generally. Some systems (eg ones dealing with customer details like credit cards) we have a single external facing hos
PRTG is the most cost effective and feature rich (Score:2)
Continuum (Score:2)
Check out http://www.continuum.net/ [continuum.net]. I've been using their services for over 5 years, and they've been steadily improving it since they split from Zenith Infotech. No, it's not free, but it's quite cheap per unit and you get a lot of bang for your buck. Remote monitoring and alerts on any service, remote access, at-a-glance dashboard, etc. With 500 clients, I'm guessing you'd rather spend your time monitoring the situation than putting together a custom solution.