Dell Storage


Happy New Year from Planetchopstick

By |January 12th, 2016|Dell Storage, DellXC, Nutanix, Storage|0 Comments

Ahh, a brand new year. It smells fresh …

I’m not a New Year’s resolution guy, because I always break them. I prefer nondescript dates: innocuous, inconsequential (I’m all out of big words). A good example is when I started the LCHF eating plan; I picked Sunday, Nov 2nd 2014 for no other reason than it wasn’t a key date and it wasn’t a Monday :)

So I’m breaking my own habit: like a lot of people I’m making a resolution to post more, and about more things. And what an easy start, a post about posting more.

Just before the Christmas break we got a bunch of the Queensland Dell specialists and solution consultants together in the boardroom and we brought in all our toys.

  • Dell XC Nutanix
  • Dell Networking running Cumulus
  • Dell Storage SC Series SCv2020 iSCSI array
  • “Cough” Supermicro Nutanix :)
  • Dell Storage PS Series PS6210XV
  • And a bunch of other bits, including my trusty 1G TP-LINK 16 port switch, which I call THE WORKHORSE

We may run out of ports here

Cumulus switch deployment and configuration with Ansible on a Dell Networking S4048-ON open networking switch.

Lots of playing around with the XC clusters: connecting SC storage into XC nodes, pulling cables, running failure scenarios. The networking team automated the deployment and configuration of Cumulus via Ansible, which was pretty awesome. Cam from Nutanix tested replication from ESX to Acropolis and did a restore. (I won’t steal his thunder here as he is going to do his own post about this at http://invisibleinfra.com/.)

One thing we did (sort-of) get working is the Cumulus integration into Prism. Very cool. Full doco on how to do that is here.

We learnt a lot, the main lesson being that we tried to do too much at once, which slowed us down a bit. We will be looking to run these once a quarter, and if we get better at it we’ll get customers involved as well. I was using the hashtag #bnedellplugfest, but I think that was a bit long and it would have been more interesting if it wasn’t just me tweeting :)

As you can see, fodder for a few interesting blog posts. Now to follow through …

I hope everyone has a great year, I know as a Storage guy in Dell this year is going to be VERY interesting.

Cheers Daniel


Why you should use TLC flash in your storage arrays

By |December 10th, 2015|Compellent, Dell Storage, Storage|0 Comments

*Ed note* I wrote this over 3 months ago but didn’t get around to posting. Still relevant though.

Everyone knew SSD drives would change the storage landscape dramatically, but the speed of development and the rate at which capacities are growing is still impressive. Dell has taken the next step in SSD evolution by announcing support for TLC SSD drives in our SC series storage arrays. We were first to market with high capacity TLC drives in an enterprise storage array, and as far as I know we are still the only vendor that can mix multiple SSD types in the same pool.

Why do you care? It all comes down to the different SSD drive costs, quality, resiliency, and perhaps most importantly, *capacity*.

There is a lot of information out there about the various SSD types and their use cases, so I won’t go into much detail here (see the Pure Storage SSD breakdown article). There are three types of SSD drives supported in Dell SC series (Compellent) arrays:

  • SLC – WI (Write Intensive): great for writes, great for reads; high $/TB
  • MLC – PRI (Premium Read Intensive): OK for writes, great for reads; better $/TB and higher capacity drives
  • TLC – MRI (Mainstream Read Intensive): average for writes, still great for reads; excellent $/TB and the highest capacities (in a 2.5 inch form factor to boot!). Massively outperforms a 15K drive.

*Ed Note* The 1.6TB WI drives are Mixed Use WI drives.

Where the SC series has its *magic sauce* is in using tiers of different disk types and speeds to move data within the array: hot data on Tier 1, warm or write-heavy data on Tier 2, and older, colder data on Tier 3. Typically Tier 3 would be NLSAS 7.2K spinning drives as they offer the best cost/TB. The SC series can mix and match drive types in the same pool because of *data progression*: new writes and the heavy lifting are handled by the top tiers, and the bottom tiers are only used periodically, and only for reads.

The largest TLC drive at the time of writing (Sept 2015) is 3.8TB. That’s 3.8TB in a 2.5 inch caddy, with low power consumption and no moving parts. I don’t have exact performance details, but for read workloads the MRI drives perform about the same as the PRI drives, while for random write workloads they are about half the performance of a PRI SSD. (*Rule of thumb: every workload is different. Speak to your local friendly storage specialist to get the right solution for your workloads.) Compare that with a 15K spindle and you get better rack density, power savings and a huge performance boost per drive. Then consider a 4TB NLSAS drive: 3.5 inch, 80 – 100 IOPS with a random workload, spinning constantly so higher power consumption, and moving parts. Sure, there are situations where a NLSAS drive can spin down when it isn’t being used, but that’s not the norm. The TLC drive is going to be more expensive than the NLSAS drive, but once you take power, footprint and the added performance over the life of the array into account it becomes a very different calculation.

*magic sauce – secret sauce and magic dust together, like ghost chilli sauce, just with more tiers (geddit? hot sauce .. tears? )

SC Drive Types Dec 2015

You can see there are 4 capacities being supported at the moment.

  • 480GB, 960GB, 1.9TB, 3.8TB!!!

Yup, a nearly 4TB drive in a 2.5 inch form factor that is low power and orders of magnitude faster than a 15K spinning drive, at about the same cost per GB as a 15K drive. This is just the beginning; there are larger capacities on the roadmap. I wouldn’t be surprised to see the end of 15K and 10K drives in our storage arrays by the end of next year.

While we are on the topic, this is an excellent blog on the newer types of flash storage being tested and developed to help take Enterprise Storage into the future, whatever it looks like.

What are the gotchas? It can’t all be peaches and cream. As you can see in the table above, there are different SSD types for different workloads. If you have a write-heavy environment then the RI drives may not be a good fit because of the high erase cost and lower NAND cell endurance; for that workload you would be better off with the WI SSDs.

However, most of the workloads I see, and the stats that come from our awesome free DPACK tool, show that most environments sit at about 70/30 R/W% with an average 32K IO size (a typical VM environment). These are a great candidate for the RI drives.

Here is the great part for Compellent SC: if you want the best of both worlds we can do that, using tiering and Data Progression to leverage a small group of WI drives to handle the write workload and a larger group of RI drives to handle all the read traffic, even though to the application it’s just one bucket of flash. Now we can provide an all-flash array, or a hybrid array with loads of flash, at a much, much lower $/GB, which is essential with current data growth rates.

Data Progression in SC series

Here is an example. You have a VMware workload that you would like to turbo charge. You want to be able to support more IOPS but you also want those IOPS to be sub millisecond. You reach out to me, I talk about myself for the first 15 mins and then we run the free DPACK tool to analyse your workload.

  • DPACK reports 70/30 R/W% and an average 32K IO size, with 95% of the time sitting at 5000 IOPS, peaking at 12000 during backups.
  • There are also latency spikes throughout the day when the SQL devs run large queries at 10am and 2pm, but it usually sits at about 3ms – 10ms. Not too bad, although during backups read latency sometimes jumps up to 30ms.
  • Queue depth is pretty good and CPU/MEM usage is fine. Capacity is 60TB used, but a lot of that is probably cold data.
  • Looking at the backups, about 2TB of data changes per day.
  • The SQL devs want to lock the SQL volumes into flash because they write shitty queries and can’t be assed optimising them. (I used to be an Oracle DBA; devs are lazy.)
  • Growth is no more than 30% a year, but a lot of that will be data growth, not workload growth.

This is a very common workload. It helps that Australia and New Zealand are very highly virtualised, so a lot of the workloads we see are ESX, with Hyper-V becoming more common. With this much information it’s reasonably simple to design an SC array that I would be 100% confident would nail that workload.

It’s not a massive system and growth will mainly be in Tier 3, but there are a fair few writes from the SQL databases, so an SC4020 array with WI SSD, RI SSD, and NLSAS for the cold tier should do the trick.

The SC array handles tiering and incoming writes very differently to a lot of arrays in the market. All new writes land in Tier 1 (the fastest tier) as RAID 10 (the lowest write penalty). This is done on purpose to get the write committed and the ack back to the application as fast as possible. The challenge is that R10 has a 50% capacity overhead, and with flash that can mean $$$, which is where having two tiers of SSD comes into its own. Every couple of hours (2 hours by default) the SC array takes a replay (snapshot) and marks the volume’s blocks as read only. That data is then migrated to the second, RI flash tier as R5 to maximise usable capacity; because the data isn’t read/write anymore there is no need for it to be R10. SC uses redirect on write, so new writes keep landing in Tier 1 as R10 and the volume pointers are simply updated.

A lot of info in a small paragraph, but you can see what is happening there: the WI tier does all the heavy lifting in the array, and older data is moved to the RI tier to be read from. Then, as the data gets cold, it is typically moved to Tier 3 (NLSAS in my example) as R6. Same data, moved to the right tier at the right time to maximise performance and $/GB.
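If it helps to see that flow written down, here is a toy model of the behaviour described above. It is deliberately simplified (a real array keeps the replay copy when a block is overwritten, for a start) and is not how Storage Center implements Data Progression internally; the class and method names are invented for the sketch:

```python
# Toy model of the write/replay/Data Progression flow described above.
# This is NOT how Storage Center implements it internally; the class and
# method names are invented purely to illustrate which tier and RAID level
# a block passes through over its life.
from dataclasses import dataclass, field

@dataclass
class Block:
    volume: str
    lba: int
    tier: int = 1            # 1 = WI SSD, 2 = RI SSD, 3 = NLSAS
    raid: str = "R10"
    read_only: bool = False

@dataclass
class ToySCArray:
    blocks: dict = field(default_factory=dict)    # (volume, lba) -> Block

    def write(self, volume: str, lba: int) -> None:
        # Redirect on write: a new or changed block always lands in Tier 1 as R10,
        # and the volume "pointer" (the dict entry) is simply updated.
        self.blocks[(volume, lba)] = Block(volume, lba, tier=1, raid="R10")

    def take_replay(self) -> None:
        # Replay (snapshot): active Tier 1 blocks become read only and are
        # migrated to the RI tier as R5 to maximise usable capacity.
        for blk in self.blocks.values():
            if blk.tier == 1:
                blk.read_only, blk.tier, blk.raid = True, 2, "R5"

    def age_out(self) -> None:
        # Periodic Data Progression: cold, read-only data drops to Tier 3 as R6.
        for blk in self.blocks.values():
            if blk.tier == 2 and blk.read_only:
                blk.tier, blk.raid = 3, "R6"

array = ToySCArray()
array.write("sql01", 42)    # new write lands in Tier 1 as R10
array.take_replay()         # now read only, sitting in Tier 2 as R5
array.write("sql01", 42)    # a fresh version is redirected back into Tier 1 as R10
```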

The replay is taken every 2 hours and the data is then moved down to Tier 2. This means we only need to size Tier 1 for the required IOPS plus enough capacity to hold 2 hours’ worth of writes x 2 (the R10 overhead). In my example above there is about 2TB of data written every day (assuming the worst case, where every write is a new write). Break that into 2-hour chunks and it’s less than 200GB per replay; double it for R10 and I would only need around 400GB of WI SSD to service that 60TB workload. The reality is that there are spikes during the day, and DPACK identifies those, but you get my drift.
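To make the back-of-the-envelope maths concrete, here is the same Tier 1 calculation as a small Python sketch, using the numbers from my example. Rough sizing only; it is no substitute for DPACK or the proper sizing tools:

```python
# Rough Tier 1 (WI SSD) sizing using the numbers from the example above.
# Assumption: ~2TB of new writes per day (worst case), replays every 2 hours,
# and everything lands as RAID 10, which doubles the raw capacity required.
writes_per_day_tb = 2.0
replay_interval_hours = 2
r10_overhead = 2                                      # R10 mirrors every write

replays_per_day = 24 / replay_interval_hours          # 12 replays a day
tb_per_replay = writes_per_day_tb / replays_per_day   # ~0.17 TB (<200 GB)
tier1_needed_tb = tb_per_replay * r10_overhead        # ~0.33 TB as R10

print(f"Data per 2-hour replay : {tb_per_replay * 1000:.0f} GB")
print(f"Tier 1 capacity needed : {tier1_needed_tb * 1000:.0f} GB (as R10)")
# Roughly 170 GB per replay and ~330 GB of WI flash, so 6 x 400GB WI drives
# (one of them a hot spare) leaves plenty of headroom for spikes during the day.
```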

So .. Tier 1: let’s go with 6 x 400GB WI drives (1 is a hot spare). I won’t put the exact figures here, but those drives with that workload would smash it out of the park at 0.2ms latency.

Now I can size Tier 2 almost purely from a capacity standpoint. Remember, this tier holds the data being moved down from the WI tier, but it also holds data classified as hot that gets read a lot. Everything in this tier will be R5 to get the best usable capacity number. They have 60TB, change 2TB a day, and the SQL DB they want to pin is 10TB, so I want to aim for about 18TB usable in this tier just to be safe. I don’t have to worry about SSD write performance on this tier because it will be nearly 100% read, except when data is moved down every couple of hours.

So .. Tier 2: I’ll use 12 x 1.9TB MRI drives (1 hot spare). This gives me about 18TB usable (not raw; you’ll find Dell guys always talk usable). Plenty of room for hot data and to lock the entire SQL workload into this tier. You would need shelves of 15K to get the same performance.
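For the curious, here is a quick sanity check of that Tier 2 number. It treats the tier as a simple single-parity R5 set, which the array’s real layout only approximates:

```python
# Quick sanity check of the Tier 2 (MRI SSD) capacity, treating the tier as a
# single R5 set with one parity drive's worth of overhead. The real SC layout
# and formatting overhead will shave a little off this figure.
drive_tb = 1.9
drives_total = 12
hot_spares = 1

data_drives = drives_total - hot_spares       # 11 drives hold data + parity
usable_tb = (data_drives - 1) * drive_tb      # single-parity R5 approximation

print(f"Approx usable Tier 2 capacity: {usable_tb:.1f} TB")   # ~19 TB
# Comfortably covers the ~18TB target: 10TB of pinned SQL volumes plus hot read
# data and the ~2TB/day dropping down from Tier 1 between Data Progression runs.
```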

Splitting the WI & RI tiers gives a level of flexibility that is difficult to achieve without tiering. If the write workload stays static, in other words stays around the same IOPS and TB/day, there is no need to grow it. But say some other business units see the benefits the SQL guys are getting and want in on that action: we can grow the WI and RI tiers separately. Simply add a couple more 1.9TB RI drives and that tier gets bigger. We then change the Storage Profile on that volume (and with VVols we’ll change it on the VM) and voila, that volume is now pinned to flash.

Finally, we need another 40TB for the rest of the workload, plus 30% a year of growth over three years, which comes to approximately 90TB.

Note: you can add drives to an SC array at any time and the pool will expand and rebalance, so you don’t have to purchase everything upfront. Also, with thin provisioning, thin writes, compression, RAID optimisation etc. there are extra savings, but I’ll leave those out for now.

Like the RI tier, all the data in here will be read only and, being larger drives, will be R6. Because we don’t write to this tier (besides internal data movement) we are squeezing as much performance out of the spinning rust as possible. The key is to not have too big a divide between the SSD tiers and the NLSAS tier, and again DPACK allows us to size for the workload instead of guessing. We know the workload is 5000 IOPS, so I want this tier to handle about 15-20% of that number: 1000 IOPS (that’s convenient). The NLSAS drives aren’t being written to, so there is no RAID write penalty and I can assume 80 IOPS per drive; 12 drives gets me very close to my IOPS number with a hot spare, and magically it’s also the number of 3.5 inch drives we can fit in a 2U enclosure. It’s almost like I’m making this up :) That covers the drive count for performance, but I also want to get to 90TB usable at R6. Capacity is a different story: with 24 x 6TB drives we get about 100TB usable. The good thing is I know I have met the performance brief.
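And the same back-of-the-envelope treatment for Tier 3, covering both the growth maths and the IOPS-driven spindle count. Again, these are the rough numbers from my example rather than a proper sizing exercise:

```python
# Rough Tier 3 (NLSAS) sizing: a capacity target for growth plus enough
# spindles to serve roughly 15-20% of the workload's IOPS. Back-of-the-envelope
# only; a real sizing exercise uses DPACK and the SC sizing tools.
current_tb = 40.0                     # data left over after the SSD tiers
growth_rate = 0.30                    # ~30% growth per year
years = 3
capacity_target_tb = current_tb * (1 + growth_rate) ** years
print(f"Capacity target after {years} years: {capacity_target_tb:.0f} TB")   # ~88, call it 90

workload_iops = 5000
tier3_share = 0.20                    # this tier should carry ~15-20% of the IOPS
iops_per_nlsas = 80                   # read-mostly 7.2K NLSAS, no RAID write penalty here
drives_for_iops = workload_iops * tier3_share / iops_per_nlsas
print(f"Drives needed for the IOPS target: {drives_for_iops:.1f}")            # 12.5

# 12 x 3.5" drives fill a 2U SC200 enclosure, and two enclosures (24 x 6TB) at
# R6 land around the ~100TB usable quoted above once spares and overhead are counted.
```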

Still with me? This has been a longer explanation than I intended. Speaking of puns, I hoped some of the 10 puns I have in this post would make you laugh, but sadly no pun in ten did.

End result: I have an SC4020 with 18 SSD drives (6 spare slots left for expansion) and 2 extra SC200 enclosures with 24 x 6TB NLSAS drives. 6RU in total, and it nails the performance and growth rates needed.

You can see, having the option for multiple flash types makes for very flexible and cost effective solutions.

Where to from here? I’m sure drive capacities will continue to grow and grow, with the newer flash types becoming more mainstream. Samsung announced an almost 16TB SSD recently, and without doubt we’ll see that sort of capacity in our arrays over time. Imagine having a 16TB SSD in 2.5 inch, or 32TB? A 1RU XC630 Nutanix node with 24 x 4TB 1.8 inch SSDs. The only issue is we still have to back it all up!!!

*Final Ed note* Since I wrote this post Dell has released the SC9000 platform. When it is paired up with all flash it is a monster.


Dell to explain Project Blue Thunder and SDS strategy at Dell User Forum in Sydney.

By |September 24th, 2014|Dell Storage, DUF, Storage|0 Comments

Dell User Forum ANZ

This is a follow-up post to Dell User Forum registrations open for Sydney 15th October 2014.

Date: Wednesday, 15th October, 2014 @ Randwick Racecourse, Sydney
Website: http://www.delluserforumaustralia.com.au/
Twitter: #DUFAU14
Register here: http://www.etouches.com/97034

Every large vendor needs their own shindig, and here at Dell Australia, we have ours coming up in 3 weeks on a grand scale.

Dell User Forum will be in Sydney on October 15th and is Dell’s largest customer event in ANZ this year. It’s being held at Randwick Racecourse which should be really interesting. Apparently the function centre has had some renovations recently and the word internally is that it looks fantastic.

I’m particularly keen to see what it’s like inside because in all the years I lived in Sydney I never went to the track to watch a horse race. This is mainly because horse racing is phenomenally boring, I have no idea what I’m doing when placing a bet, and I’m too big to be a jockey.

Fun fact: in the Triathlon series in Sydney, they have a separate category for guys over 92kg called the “Clydesdale Class”.

I did go there once to set up a CX3 system to keep track of all the horse husbandry data, which is an enormous industry in itself, e.g. the son of Silver Sovereign and I’llShoutyouGetEm was called MadonnaCantSing, but that’s a story for another time.

Now that the event has scaled to include all the main Dell pillars of Storage (Woo!!), Server, Software & Security, there will be tracks you can follow if you are interested in a certain topic. It’s buzzword city, but after DUF in Miami in June these were the main topics attendees were interested in:

  • Big Data
  • Cloud
  • Mobility

Of course, you can also just pick and choose and go to whatever sessions you want.

DUF Agenda

The formal agenda has been released to the DUF website. The PDF file has all the descriptions on what the breakout sessions will be covering.

I’m a storage guy, or Dan Dan Storage Man as I seem to get called lately, so I HIGHLY recommend you ditch all the other crap and only come to the storage sessions. In particular ignore servers; servers are the past man, storage is the future, like *cough* .. Nutanix .. EVO:Rail .. Storage Spaces …….. oh shit :S

I’m only kidding. SDS & convergence is a big topic this year and Dell seems to have its paws all over it: the upcoming OEM agreement with Nutanix, the new VMware EVO:Rail appliance, and Microsoft is a Platinum sponsor so there will be heaps of info on Storage Spaces and what’s coming from their part of the world. I have heard a rumour that we will have an EVO:Rail box on display at the Solutions Expo, but that’s not confirmed. I’m keen to check it out. Hopefully I’ll know soon and will put it up on Twitter.

Helping the big push is the newly announced PowerEdge 13G platform that is going to drive all these SDS products. Density, flexibility in storage and connectivity, and of course raw compute power are needed to take this hyper-converged industry to the next level, and that is where the new R730XD and FX2 blade system will shine.

There will be 13G hardware demos and displays on the Expo floor.

Dell Storage

This time last year Dell was announcing our industry-first ability to mix Write Intensive SLC with Read Intensive MLC flash drives using our On Demand Data Progression (ODDP) technology. This lets us combine the blinding write performance and endurance of SLC with the capacity and read performance of MLC, at the same price/capacity as an equivalent spinning disk system.

You can see from this Register article from 17/09/14 that the author wistfully dreams of a future where you can tier between two types of flash; Compellent has been able to do it for over a year!

It’s a year later, we have shipped a shed tonne of flash in that time, and now we will be showcasing the latest Storage Center SC6.5 code, which features compression, greater addressable capacity and some other goodness. As always, the additional software features are free if you have a valid support contract.

There will be a running SC4020 on the Expo floor for you to get hands on and try out some live demos.

One session not to miss is “What’s coming next with Dell Storage” (NDA session). We will have our very own storage Zen Master Andrew Diamond co-presenting with two of the guys running the Nextgen Storage strategy for Dell: Keith Swindell and Peter Korce. I expect to learn stuff in that session too, and I will be live tweeting it :)

On a side note, how do you live tweet an NDA session?

As most people reading this post are aware, Dell is on a path to merge all of our storage systems into one common architecture over the next few years and SDS the fudge out of it. CRN has a good article on Project Blue Thunder with an interview with Alan Atkinson (VP Storage) about where Dell Storage is heading. Andrew’s session will explain all of this in more detail.

Also, EqualLogic v8 firmware is about to be released with support for compression and VMware VVols. In the roadmap session we’ll take you through the next generation of the PS4110 and PS6510 systems.

The “Talent”

As part of the Keynote, Forrester Research will be “presenting their exclusive findings on how Australian businesses are innovating using Cloud, Big Data and Mobility to securely meet the changing needs of their customers.” This will be an interesting session as I’m curious to see what they think the different Australian states are doing. Here in QLD we are seeing a massive push for Cloud and managed services, not just for storage but for everything in the datacenter. This is partially driven by government policy up here, but private businesses are looking to make the move too. In NSW & VIC the cloudy push isn’t as widespread (IMHO). It probably helps that Dell already has an established Data Centre presence in QLD.

We have a lot of Dell execs coming down to present and talk with customers fresh after DUF in Miami, including

  • Alan Atkinson – VP Dell Storage @dell_storage
  • Enrico Bracalente – Director of Product Management for 13G
  • Kishore Gagrani – Product Director – Fluid Cache for SAN @kgagrani
  • Amit Midha – Dell APJ President
  • Daniel Morris – All around good bloke and chicken enthusiast @danmoz

I haven’t personally met the other guys, but if you can, try and have a chat with Alan. He is remarkably approachable and happy to take the time to talk for someone at his level. Top bloke.

And then finally, at the end of the day we get to celebrate and have a beer or two. Sadly we have training at 8am the next morning in Frenchs Forest, so that’s not so hot. The beers are sponsored by Cloud, which is about as ambiguous as you can get.

Orange whip? Orange whip? Three Orange whips!!

If you see me at DUF come and say G’day. I’ll be the guy frantically trying to cut and paste ■ ■ ■ ■ continually on my phone :)

Register now

Dell User Forum ANZ


Compellent and EqualLogic Management with vCenter Operations Manager

By |August 5th, 2014|Dell Storage, Storage|0 Comments

Ever since VMware has been around it’s been difficult to troubleshoot and isolate issues to do with storage. In the early days you had a RAID set and a volume; you presented the volume to VMware and then carved it up into virtual machines. VMware is like an Ogre, it has layers. If there ever was a performance issue it was difficult to know where in the stack the problem resided: was it the storage, the memory balloon driver, the network, vMotion? I remember when Navisphere was first able to show a LUN and its corresponding VMware volume name. It was like finding that last ice cream at the bottom of the freezer. From there we’ve had plug-ins, tools, different tools, SRM etc., but still, if there was an issue you had to go through each layer in the stack looking for it. Enter vCenter™ Operations Manager (vCOps), an analytics VM that is constantly watching and learning about your virtual environment and providing dashboards to help resolve current issues, prevent future issues and look at ways the environment can be better optimized.

vCOps Overview

Unlike traditional monitoring tools that work on thresholds (space full, latency, etc.), vCOps learns how your environment typically operates day to day and flags events that are above or below normal. Over time it knows when your backups run, when the Monday morning report is run, when everyone takes off to the pub for Friday afternoon drinks. It looks for patterns and then triggers when events or metrics fall outside that normal pattern. It takes about 30 days of collecting data before vCOps becomes useful. For example, if a virtual machine that typically does 300 IOPS sustained changes to 700 IOPS sustained, the virtual machine health status will change and alert in vCOps. Likewise, every Thursday a full backup runs and the storage gets smashed; vCOps knows that’s the expected workload for a Thursday night, so it’s all OK. There is a lot more to vCOps, and if you want more info I suggest checking out @mwpreston’s site as he loves the stuff: http://blog.mwpreston.net/tag/vcops/. As a product vCOps has been around for a few years, but with the latest version (5.7.1 when I wrote this) it now supports external data sources and custom dashboards, in particular around external storage metrics.
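vCOps’ analytics are far more sophisticated than this, but the core idea of learning a per-time-slot baseline and flagging deviations can be sketched in a few lines of Python. Everything here (the BaselineMonitor class, the thresholds, the sample numbers) is made up for illustration and says nothing about how vCOps actually does it:

```python
# Purely illustrative sketch of baseline-based monitoring, NOT VMware's actual
# analytics: learn what "normal" looks like for each (weekday, hour) slot, then
# flag samples that sit well outside the learned band.
from collections import defaultdict
from statistics import mean, stdev

class BaselineMonitor:
    def __init__(self, sigma: float = 3.0):
        self.history = defaultdict(list)   # (weekday, hour) -> observed IOPS samples
        self.sigma = sigma

    def learn(self, weekday: int, hour: int, iops: float) -> None:
        self.history[(weekday, hour)].append(iops)

    def is_anomaly(self, weekday: int, hour: int, iops: float) -> bool:
        samples = self.history[(weekday, hour)]
        if len(samples) < 10:              # not enough history yet (vCOps wants ~30 days)
            return False
        mu, sd = mean(samples), stdev(samples)
        return abs(iops - mu) > self.sigma * max(sd, 1.0)

monitor = BaselineMonitor()
for week in range(12):                     # a VM that normally does ~300 IOPS at this time,
    monitor.learn(weekday=0, hour=9, iops=300 + week * 5)   # with a little drift week to week
print(monitor.is_anomaly(0, 9, 310))       # False - within the learned band
print(monitor.is_anomaly(0, 9, 700))       # True  - a sustained jump to 700 IOPS stands out
```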

vCOps Storage Adapters

Heads up: you need either the Advanced or Enterprise edition of vCenter Operations Manager for the storage adapters to work. Also, the Dell Storage Adapter is a licensed product, but not too expensive in the scheme of things. Finally, most of this content is taken from a deck by Jason Boche and David Glynn at Dell User Forum in Miami this year, from a session called Monitoring a vSphere Environment with Storage Center and EqualLogic vCenter Operations Manager Adapters … say that three times with marbles in your mouth.

vCOps Key Features

The Dell storage solution pack adds value in a number of ways:

  • It delivers a server-to-storage view of the health of your VMware infrastructure
  • Provides increased visibility, metrics, and rich storage analytics
  • Pre-packaged Dell storage dashboards for alerts, health, performance, capacity, and top-utilization metrics
  • Performs analysis from VM to storage volume to speed up root cause analysis

The storage adapter collects data through Enterprise Manager for Compellent (SC) and SAN HQ for EqualLogic (PS). The data collection isn’t real time and can be up to 15 minutes behind depending on the values you choose (the default is 5 mins for SC and 2 mins for PS). This information goes into vCOps, which does its analytic magic from there. There are different UIs for vCOps: the standard out-of-the-box UI, the Admin UI, and the Custom UI. The storage UI will have its own custom page.

Standard UI

  • https://vcops UI FQDN
  • https://vcops UI FQDN/vcops-vsphere
  • Native VMware objects

Admin UI

  • https://vcops UI FQDN/admin
  • Configure/update vC Ops and install adapters

Custom UI

  • https://vcops UI FQDN/vcops-custom
  • 3rd party adapter configuration
  • 3rd party adapter objects
  • Custom dashboards

Loads of metrics are gathered by the adapter: array stats, controller, volume, front-end and back-end port stats. By default there are a number of dashboards included with the adapter, but that doesn’t stop you creating your own custom dashboards to suit your business. One bright feature is that it shows you a visualization of the volumes presented to vCenter. Volumes with issues come up highlighted; you can then click on them and get information about the specific volume. There are some small differences between the Compellent and EqualLogic metrics and I have included a screenshot of the differences below.

There is a great video on YouTube by Josh Raw, a Dell Storage Product Manager, that goes through the Solutions Pack, shows how to install it, and demos a lot of the different dashboards (the demo starts at about the 7 min mark). He goes through the process of discovering an issue with a virtual machine, drilling down to the volume that is the source of the problem, and eventually working out that another VM on that volume is running a rogue process. https://www.youtube.com/watch?v=_6ZkNOolQnY

Requirements

As of July 2014, best to check the release notes:

  • vCenter Operations Manager 5.7.1 or higher (Advanced or Enterprise)
  • Storage Center 5.5 or newer, 6.3 or newer
  • Enterprise Manager 2014 R1 or newer
  • Enterprise Manager listens on TCP 3033
  • Credentials for EM or SAN HQ
  • SAN HQ version 3.0 or greater
  • EqualLogic Statistics Agent installed on SAN HQ Server
  • EqualLogic Statistics Agent listens on TCP 5040
  • EqualLogic firmware 6.0.7 or newer

Screenshots of Storage Dashboards
