Category Archives: IT

Overview of Modern Storage for the Enterprise

By | CTO, IT, Storage, SysAd | No Comments

Hyper-converged, scale-out, black-box, roll-your-own; on-premises enterprise data storage is a bit more than a hobby of ours here at Symbio. Until recently, most medium-scale enterprises bought their storage from one of a few vendors: HP, Dell/Compellent, Nimble, NetApp, EMC, etc. These products arrive as more-or-less plug-and-play appliances and are usually fully supported by their respective vendors. They are also expensive. Symbio got its start using a home-brewed, Linux-based storage appliance we built ourselves because we couldn't afford anything else.

This was great until I found myself debugging production storage issues during (literally) the birth of my first child. After that experience, and at the urging of my wife, we committed to a Compellent investment. The cost was extreme for a company of Symbio's size (the initial purchase price was nearly 20% of our annual revenue, and annual support renewals were almost 10% of revenue for the first couple of years). The Compellent served us well, and its stability allowed us to reliably quadruple the size of our customer base. After the first couple of years, however, we outpaced its performance.

As technology decision makers, we're used to basing a storage platform choice on metrics like cost per GB, IOPS, features, and support reputation. This time, however, we wanted to take a deeper look at our options, as it's clear the very ways we think about storing data are changing. This presents fantastic opportunities for cost control and new capabilities, but it also presents new risks for businesses to contend with. This series of articles explores a few of the emerging trends in on-premises enterprise storage, the ideal applications for each technology, and our specific experiences with each approach. Symbio is currently running all of these systems in a production capacity.

Hyper-converged – VMware vSAN: Solutions like vSAN or Nutanix place the storage directly inside your processing hosts, then use software on the "back end" to provide performance and data redundancy. These solutions can offer extreme performance at a moderate cost, but they dramatically change failure models and require very careful, experienced planning to implement reliably. Symbio uses vSAN as our primary storage for high-performance needs, specifically databases and virtual desktops.

Traditional "Black Box" SAN – Nimble Storage: The "usual" enterprise approach: an appliance provided and supported by a vendor. This approach offers moderate performance and generally very high reliability, and it is compatible with existing thinking about failure modes (storage and compute can be treated as isolated components of an overall system). Cost is often high compared to the alternatives, but the "one-ass-to-kick" nature of the support can be of tremendous value to shops that lack deep IT talent. Symbio uses Nimble for our "general purpose" workloads: things that don't demand extreme performance or capacity, but where we derive value from some of the "nice to have" features that aren't available on our other solutions.

Open Source, Scale Out – Red Hat Ceph: Ceph is rapidly emerging as a favorite low-cost, high-capacity solution for shops with strong technical capability. Ceph uses a mathematical model to decide where to place data on the underlying disks, and clients talk directly to the disks to request data. This means your controller is no longer a bottleneck or failure point as with a traditional SAN. Ceph can scale to petabytes simply and without the enormous cost a traditional SAN would require. Ceph is open source, community supported (though enterprise support is available), and runs on commodity hardware. Symbio repurposed all our old Compellent hardware into Ceph clusters (which, yes, we will write a blog post about), and we use them as low-performance, high-capacity storage for backups and our SymbioVault off-site backup product. Ceph is presently limited in some very important ways, however.
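Ceph's actual placement algorithm (CRUSH) is far more sophisticated than this, but the core idea is easy to sketch: because placement is a deterministic function of the object name and the disk list, any client can independently compute where data lives without consulting a controller. The function and names below are our own toy illustration, not Ceph's API:

```python
import hashlib

def place(obj, osds, replicas=3):
    """Toy CRUSH-like placement: rank OSDs by a hash of (object, osd).

    Deterministic, so every client that knows the OSD list computes the
    same replica set without asking a central controller on the data path."""
    ranked = sorted(osds, key=lambda osd: hashlib.sha256(
        (obj + ":" + osd).encode()).hexdigest())
    return ranked[:replicas]

osds = ["osd." + str(i) for i in range(8)]
replica_set = place("backup-img-0042", osds)
assert replica_set == place("backup-img-0042", osds)  # same answer every time
assert len(replica_set) == 3
```

Removing the controller from the data path is exactly why Ceph scales out so cheaply: adding capacity means adding disks and recomputing the map, not buying a bigger head unit.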

We’ll explore each of these technologies in depth in the coming series of articles.

To Outsource or Not To Outsource Your IT

By | Business, IT, Outsourcing | No Comments

Outsourcing IT can be a challenging proposition, as IT has become increasingly critical to operations. Just think: in years past you could get work done with a pad of paper and a telephone. Today, what happens when you can't get into your computer, or you don't have access to your email? For most organizations, this causes a complete standstill.

When it comes to deciding whether to outsource, the best course is usually to outsource certain functions and not others. This depends heavily on the size and type of your organization, although managed service providers, or MSPs, are starting to perform many critical IT functions better than an internal team can. One of the hardest challenges for smaller organizations is that at 20 or 30 people a dedicated IT resource is often needed, but the cost to hire that person isn't in the budget (see the IT Hiring Guide here:). This is also a bit of a burnout job at this smaller scale, because there is nobody to cover for that person during vacation or sick days.

We’ve put together a guide to help you determine the right type of IT for you. The guide contains a clear comparison of the different types of IT vendors, how to choose a vendor and a checklist for evaluating service level agreements.

Grab the guide here:

IT Security for SMBs and the Rising Risk of Cyber-threats

By | Business, Food for thought, IT, Security | No Comments

70% of Cyber Attacks Target Small Businesses.

This scary stat came out in late 2016 from the National Cyber Security Alliance. The reason for the very high volume of attacks against small businesses is that they make easy targets. While the payday may not be huge, the results can be devastating for the business. Most small businesses are under the impression that they are too small to be interesting to cyber criminals. As a result, routine IT security patches and server upgrades or maintenance are frequently deferred, exposing vulnerabilities.

Consider one of the more popular cyber crimes, ransomware, which systematically encrypts as many files as it can across your entire network. Once the files are encrypted, a ransom demand is sent to the company asking for anywhere from thousands to tens of thousands of dollars in exchange for the key to unlock them. The price is usually set below the cost of hiring an IT firm to fix the problem, and paying it saves the time of performing lengthy restores.

At Symbio we encounter this scenario fairly often: an employee opens a file or visits a site they shouldn't have. To combat this, we have architected our IT as a Service platform to stop the spread of the attack and then rapidly restore from the point just before the attack occurred. Whatever form the attack takes, make sure you have a strong IT security policy in place, with the proper processes and infrastructure to back it up.

We're going to do a series of posts around IT security, liability, and data compliance (HIPAA, FERPA, HITRUST, PCI, etc.), because these are very important topics, and one thing we see over and over is smaller organizations suffering avoidable IT disasters.

Lessons from “The Grid”

By | Business, Entrepreneurship, Food for thought, IT, Outsourcing | No Comments

Recently, on Bill Gates' suggestion, I picked up 'The Grid' by Gretchen Bakke. To borrow Bill's words: "This book, about our aging electrical grid, fits in one of my favorite genres: 'Books About Mundane Stuff That Are Actually Fascinating.'"

I often use the term "utility-style computing" when talking about our IT-as-a-Service product, and the parallels between the IT industry today and the way our electrical grid developed in the late 1800s and early 1900s are striking. To (very) briefly summarize the first few chapters of the book: when electrification first became popular, we were limited to DC (direct current, as opposed to today's more common AC, or alternating current). The primary limitation this imposed was range; it was very difficult to transmit DC electricity more than a mile or so. Thus, "private" small generation plants became the de facto standard for electrical power generation. Factories, private estates, and even the occasional municipality would deploy their own stand-alone infrastructure to power their facilities.

During this time, many small power companies emerged, and due to their limited range, their plants sat idle much of the time (a factory only runs during the day, street lamps only at night, etc.). Because there was no cost-effective way to store electrical power for later use, the capital investment was huge and the return very limited. Many of the small utilities turned to selling and servicing private plants as a way to make ends meet.

This really reminds me of the way IT service for most SMBs developed during the '90s and '00s: lots of businesses with servers sitting in closets, idle 96% of the time. Yet that degree of investment was required because there was no shared infrastructure (or grid) comparable to today's cloud services.

About 30 years pass, and by 1920 AC (alternating current) had become the norm, and long-distance transmission and voltage conversion had become cost-effective. Samuel Insull, Thomas Edison's longtime aide and business manager, took control of Chicago Edison, a small generating station downtown. He came to a series of interlocking insights about the utility business:

1) The key to making money as a utility is keeping your infrastructure as close to 100% utilization 24×7 as possible.
2) The lower your prices, the more money you will make.
3) Subscription is ultimately cheaper for the consumer than independence.

Those insights were enough to propel Chicago Edison under Insull's leadership from bit player to regional monopoly. He realized that federal and state regulation would turn utilities into natural monopolies, meaning competition in those markets would largely disappear due to high infrastructure costs. It remains to be seen whether something similar will play out in the cloud space, but with major infrastructure players developing their own custom chips, and what Cory Doctorow (a personal hero) called a "war on general purpose computing," it wouldn't be a surprising outcome.

Practical vSAN: Increasing Congestion Thresholds

By | IT, Storage, SysAd, VMware | No Comments

As I described in this post, Practical vSAN: Measuring Congestion, vSAN uses congestion as one of the primary metrics in determining whether to add latency to incoming writes. As congestion increases, the vSAN's overall performance declines until it eventually stops accepting incoming writes altogether, and machines on the vSAN effectively hang, similar to an APD (All Paths Down) condition in the core storage (NFS/iSCSI) subsystem in ESXi.

One incredibly useful technique to "buy yourself time" is to increase the congestion thresholds. The upper limit, according to PSS, is 128GB. Remember, this only buys you time; it does not resolve the underlying problem. "LowLimitGB" is the threshold at which latency starts being added; "HighLimitGB" is the threshold at which incoming writes are halted. 64 and 128 appear to be the maximums for these values. You will need to execute these commands on every host in the cluster that is experiencing congestion, and we suggest setting them identically on all hosts. Also, don't set limits larger than the size of your cache devices.

esxcfg-advcfg -s 64 /LSOM/lsomLogCongestionLowLimitGB
esxcfg-advcfg -s 128 /LSOM/lsomLogCongestionHighLimitGB

In one recent case, we were able to use these commands to buy ourselves a few hours while we diagnosed an underlying hardware issue, and to finish up the working day without any further performance complaints. We ended up leaving the values at these levels rather than reverting them to the defaults as recommended by PSS, as I don't really see a downside if you're proactive about monitoring for congestion. In our next post, Practical vSAN: Part 3 – Adjusting the Number of Background Resync Processes, we'll show you how to change the number of simultaneous resync processes in case your storage woes are being exacerbated by a background resync.

Practical vSAN: Measuring Congestion

By | IT, Storage, SysAd, VMware | No Comments

VMware’s vSAN, while an amazing product in many respects, leaves something to be desired when it comes to troubleshooting issues while in production. Many of the knobs and dials are “under the hood” and really not exposed in a way that is obvious in a crisis. In this series of posts we’ll document some of the troubleshooting techniques and tools we’ve gathered over the last several months. As always, use these tools at your own risk; we highly advise engaging VMware PSS if you can.

Part 1: Measuring Congestion

In vSAN, as writes are committed to the cache tier, other writes are concurrently destaged to the capacity tier. In essence, the cache tier is a buffer for incoming writes to the capacity tier. If there is an issue with the underlying capacity tier, this buffer can start to fill. In vSANese, this is known as "log congestion." In my opinion, congestion is one of the primary health metrics of a vSAN. If you start to experience persistent log congestion during a resync or other intensive IO operation, that's a very good sign that some underlying component has a fault or that there is a driver/firmware issue. I should also note that log congestion does not necessarily indicate a problem with the capacity tier; logs are also used when persisting data to the caching tier.
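To make that buffer dynamic concrete, here is a toy Python model (our own illustration, not anything from vSAN's internals): the log fills at the ingest rate and drains at the destage rate, so a struggling capacity tier shows up as a log level that only ever climbs.

```python
def simulate_log_buffer(ingest_gib_s, destage_gib_s, seconds, start_gib=0.0):
    """Toy model of the cache-tier log: it fills with incoming writes
    and drains as data is destaged to the capacity tier."""
    level = start_gib
    history = []
    for _ in range(seconds):
        level = max(0.0, level + ingest_gib_s - destage_gib_s)
        history.append(level)
    return history

# Healthy cluster: destaging keeps pace and the log stays empty.
assert simulate_log_buffer(0.5, 0.5, 60)[-1] == 0.0
# Ailing capacity tier: the log climbs steadily toward the thresholds.
assert simulate_log_buffer(0.5, 0.1, 60)[-1] > 20.0
```

The second scenario is exactly what persistent congestion during a resync looks like: ingest outpaces destaging, and nothing short of fixing the underlying component reverses the trend.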

As an aside, the health check plugin reports these log levels on what appears to be a scale of 0 to 255, with 255 being 100% full (though support is unclear on this point).

As the log buffer fills up, the vSAN starts to "add latency" to incoming write requests as a way to throttle them. When a resync is triggered, if the underlying storage can't keep up for whatever reason, these buffers WILL fill. Once they hit a certain threshold (16GB by default), latency is added. More and more latency is added until a second threshold is reached (24GB by default), at which point incoming writes are completely halted until enough data has been destaged to resume operation. At this point your entire cluster may enter a kind of bizarro-world, APD-type state, where individual hosts start dropping out of vCenter and VMs hang and pause. Note: the only indication you will get from vCenter or the vSAN itself at this point is that log congestion is too high.
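The two-threshold behavior described above can be sketched as a simple function. The linear ramp and the 50 ms cap are our illustrative guesses; vSAN's real latency curve is internal and undocumented:

```python
from typing import Optional

def write_delay_ms(log_gib, low_gib=16.0, high_gib=24.0, max_delay_ms=50.0):
    # type: (float, float, float, float) -> Optional[float]
    """Two-threshold throttle: no delay below low_gib, an increasing delay
    between the thresholds, and a full halt (None) at or above high_gib."""
    if log_gib < low_gib:
        return 0.0
    if log_gib >= high_gib:
        return None  # incoming writes halted until destaging catches up
    ramp = (log_gib - low_gib) / (high_gib - low_gib)
    return ramp * max_delay_ms

assert write_delay_ms(8.0) == 0.0       # healthy
assert write_delay_ms(20.0) == 25.0     # halfway up the ramp
assert write_delay_ms(24.0) is None     # bizarro-world APD territory
```

Raising low_gib and high_gib (as in our previous post) just pushes both breakpoints to the right; it does not change the shape of the problem.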

You can check the current size of these log buffers by running the following from each host in your vSAN:

esxcli vsan storage list | grep -A1 "SSD: true" | grep UUID | awk '{print $3}' | \
while read i; do
  # Sum the two "Log space consumed" counters reported for this cache device
  sumTotal=$(vsish -e get /vmkModules/lsom/disks/$i/info \
    | grep "Log space consumed" | awk -F: '{print $2}' \
    | sed 'N;s/\n/ /' | awk '{print $1 + $2}')
  # Convert bytes to GiB
  gibTotal=$(echo $sumTotal | awk '{print $1 / 1073741824}')
  echo "SSD \"$i\" total log space consumed: $gibTotal GiB"
done

Another useful tip is to save this into a shell script and use the Unix command "watch" to observe these values over time (e.g. watch ./)

Again, any value below 16GB should not cause the vSAN to introduce latency. Any value between 16GB and 24GB is bad news, and if you hit 24GB, you're really having a bad time. Watch these values over time: if they are generally moving upwards, that's bad; if they are generally decreasing, you can start breathing again.
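If you're scripting this check, a small parser over the one-liner's output can classify each device against the default thresholds. The helper below is our own convenience, not a VMware tool:

```python
import re

READING = re.compile(r'total log space consumed: ([0-9.]+) GiB')

def classify_congestion(output, low_gib=16.0, high_gib=24.0):
    """Map each log-space reading in the command output to a status:
    ok (< low), throttling (between thresholds), or halted (>= high)."""
    results = []
    for match in READING.finditer(output):
        gib = float(match.group(1))
        if gib < low_gib:
            results.append((gib, "ok"))
        elif gib < high_gib:
            results.append((gib, "throttling"))
        else:
            results.append((gib, "halted"))
    return results

sample = ('SSD "52a1" total log space consumed: 3.2 GiB\n'
          'SSD "52b7" total log space consumed: 19.5 GiB\n')
assert classify_congestion(sample) == [(3.2, "ok"), (19.5, "throttling")]
```

Feed it the same output you'd otherwise eyeball under watch, and alert on anything that isn't "ok".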
These values can be increased, which can buy you some breathing room in a crisis. You can read about that in our next post, Practical vSAN Part 2: Increasing Congestion Thresholds.

Is IT as a Service right for you?

By | Business, IT, Outsourcing | No Comments

In short: IT as a Service can make complete sense for small and mid-sized businesses that want access to enterprise-class infrastructure and features without having to manage all the mundane support needs, all while consolidating tech expenditures and overhead into a predictable monthly fee — just like a utility. It does not, however, fit all businesses.

I've been working with some very innovative people here at Symbio and am compelled to share some of what I've learned about IT as a Service. First off, the business model for ITaaS is unique in that it aligns our business interests with those of the client. By charging a flat, per-user fee that covers ALL hardware, licensing, and support expenses, Symbio is incentivized to ensure that the support lines ring as little as possible. That means the infrastructure is extremely well maintained and preventive measures are taken to meet client needs before they arise. When was the last time you called your utility company to tell them what a great job they're doing because your power hasn't gone out in three months? Exactly.

The other thing that stands out is the cost savings to the customer, who at the same time always has up-to-date tech. One possible IT as a Service model, and the one Symbio uses, leverages desktop virtualization in our own hosted environment, which essentially means that your desktop lives in a data center (where it should be) and you access it via software or a 'thin client'. The benefits far outweigh the cons for most organizations: efficient upgrades and patches deployed remotely, a desktop that can be accessed from anywhere on any workstation, rapid resolution of issues, disaster recovery, and audit compliance, to name a few. Additionally, what I've seen on average is that most firms lower their annual IT cost by over 20% when embracing the IT as a Service model, while at the same time dramatically improving their technology and productivity.

There are, however, businesses where this does not make sense. Organizations that are heavily dependent on graphics processing, or that primarily use Macs, are not a good fit, as the enterprise management tools aren't quite there yet. Also, systems dependent on absolutely guaranteed uptime (high-frequency trading, emergency medicine, etc.) are best served by locating all infrastructure as close to the user as possible. In some of these cases it makes sense to bring all the expertise in-house; in others it may be best to hire a managed service provider to manage your infrastructure. One size does not fit all, but it may fit most.

Just like outsourcing payroll or legal functions, more and more often outsourcing IT makes complete business sense for small to mid-sized businesses. Some food for thought.

Upgrading to a High Performance Storage System

By | Business, IT, News, Outsourcing | No Comments

With summer almost upon us, we're excited to announce that we've completed the deployment of our new high-performance storage system. This system is orders of magnitude more performant than the one it replaces, and it means we will no longer have a single spinning hard drive in production. We're going to complete the data migration portion of the project over the next couple of months, and then the last vestiges of 1970s-era technology will finally be behind us. We are now a Virtual Desktop provider that doesn't use hard drives for anything other than data backups!

What this means for our clients is that our VDI environment will be much more responsive from a user experience perspective and load times for applications will be dramatically reduced. We’re excited to roll out this upgrade to all our clients as a value-add and competitive differentiator for the years to come — aren’t you glad you got rid of that clunky desktop computer?

Spear Phishing: What you need to know

By | IT, Security, SysAd | No Comments

As your IT advisor it is very important for us to remind you to never ever click on attachments that you are not expecting, and if you are responding to a questionable email, please check the address for accuracy.

Here’s why:
Over the last few years, we have made excellent strides toward improving operating system security, and we have seen a decline in traditional computer viruses. However, there is still a lot of money to be made in the business of compromising your computer (or Virtual Desktop). As a result, there are a lot of people diligently trying to trick you into installing malicious software. We have all seen infected websites, usually pop-ups, which try to trick you into thinking you have a problem that can only be fixed by installing some piece of software. If this has ever happened to you, hopefully you know to exit the web browser (Alt-F4 on the keyboard rather than the X in the upper right) and that you should never install 'security' software from a random website.

Just as you should never trust website pop-ups, you should also be very careful about trusting your email. Our industry has spent many years developing very complicated software in an attempt to automatically remove things like spam, malicious software, and questionable web links from your incoming mail before you ever see it. This security software works remarkably well given how hard people are working to get around it. The fact is: email was never designed to be secure.

As security systems have improved over the years, smart attackers have shifted their techniques from attacking our filters and trying to get past them, to more direct, personalized emails and contacts. Instead of poorly written emails that look like gibberish, we’re seeing well written emails that reference people by name and occasionally mention specific details about your company which can be gleaned from your website.

These social engineering techniques have been a part of the systems security landscape for a long time, but we’re now seeing enough of it that computer security folks decided to give it a name: Spear-Phishing.

vSphere Appliance 6 update 1b issues – Regenerate your certificates!

By | SysAd, Uncategorized, VMware | No Comments

Recently we decided to upgrade one of our clusters to the latest vSphere/ESXi 6.0 U1b. While it's early in the release cycle to apply these fixes to a production system, we've been having some issues with this cluster that we hoped the upgrade would resolve. The cluster has 4 hosts running vSAN for storage, is primarily used for VDI, and makes extensive use of App Volumes. It uses the certificates that were generated when the appliance was deployed.

Last Friday night, we mounted the ISO and ran the update. Afterwards, things seemed fine for a while, apart from an apparently cosmetic "This node cannot communicate with all the other nodes in the VSAN cluster" warning in the native VI client, which is documented in this VSAN Forum discussion. However, after approximately 2 hours online, the vpxd service would become unresponsive and the VI and Web Client tools would hang. It would intermittently come back, but you could only click on one or two things in the VI Client before it became unresponsive again. VCS servers would be unable to execute power operations. If we rebooted the VCSA, it would be responsive for a few hours until hanging again. None of this behavior presented itself before the upgrade to U1b.

We opened an SR with VMware support, but after hours of looking at logs, the best they could suggest was rebuilding the appliance from scratch and re-importing our database from the broken appliance. Ultimately, we started looking at SSO as the probable cause of our issues, as we noted our SSO logs appeared much larger than they should be. At the moment the appliance became unresponsive, the SSO service started failing authentication requests. Given that VMware's solution was to rebuild the appliance, we decided we had little to lose by attempting to regenerate all of the certificates using the certificate-manager utility. After giving that a go, the problem was resolved. Our best guess is that one of the solution certificates was, to use the technical term, "borked," and that U1b either has some new throttling in place or handles broken solution certificates differently than 6.0a.

We'll continue troubleshooting with VMware to attempt to determine the underlying cause and will update this post if we learn more. It seems as though the common wisdom was true, at least this time: "If thouest have performance issues with vSphere, SSO and certificates are thy cause."