Less More with Virtual Provisioning and Linux on System z
Gail Riley
EMC Corporation
February 7, 2013
Thursday @ 3:00pm
Session Number 12317

Agenda
Introduction to Virtual Provisioning
Virtual Provisioning features
  – FBA
  – CKD
Virtual Provisioning Benefits
Fully Automated Storage Tiering for Virtual Pools (FAST VP) Overview

Virtual Provisioning = Thin Provisioning
From wiki:
  “Thin provisioning is the act of using virtualization technology to give the appearance of having more physical resources than are actually available.”
  “Thin provisioning is a mechanism that applies to large-scale centralized computer disk storage systems, SANs, and storage virtualization systems. Thin provisioning allows space to be easily allocated to servers, on a just-enough and just-in-time basis.”
Virtual Provisioning is the EMC term for thin provisioning

Data Layout – disk device
Capacity for a disk device is allocated from a group of physical disks
  – Example: RAID 5 with striped data + parity
Workload is spread across multiple physical disks
[Figure: Logical Units (LUN) laid out across Physical Disk Drives]

Data Layout – Pool-based Allocation
Virtual Provisioning
  – Storage capacity is structured in pools
  – Thin devices are disk devices that are provisioned to hosts
[Figure: Thin Device (TDEV) bound to a Thin Pool]

Storage Requirement: Performance
Storage Layout: Go Wide Before Deep!
Goal is to spread workload across all available system resources
  – Optimize resource utilization
  – Maximize performance
Three approaches:
  – RAID data protection
  – Meta Devices (Symmetrix)
  – Virtual Provisioning

VP Components
Thin Data Device (TDAT)
  – An internal, non-addressable device
  – Provides the physical storage for a thin device
  – Multiple RAID protection types: RAID 1, RAID 5, RAID 6
Thin Pool
  – A shared, physical storage resource of a single RAID protection and drive technology
  – The first TDAT added determines the protection type
[Figure: Thin Pool FC_Raid1; add (4) x 25GB RAID 1 TDATs]

VP Components
Thin Device (TDEV)
  – Host-addressable, cache-only device
  – Bound to a thin pool and provisioned to hosts
  – Seen by the operating system as a “normal” device
  – Used in the same way as other host-addressable devices; can be replicated both locally and remotely
  – Physical storage need not be completely allocated at device creation
  – Physical storage is allocated from a thin pool of DATA devices
Thin Device Extent
  – Unit of allocation from a thin pool when a host writes to a new area of a thin device
  – 12 Symmetrix tracks, 768 KB (aka track group)
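
The extent size above implies some simple arithmetic, sketched here in Python. Only the two constants (12 tracks, 768 KB) come from the slide. The function name is hypothetical, and the sketch assumes extents are aligned starting at offset 0 of the thin device.

```python
TRACKS_PER_EXTENT = 12   # from the slide: one extent is 12 Symmetrix tracks
EXTENT_KB = 768          # from the slide: one extent is 768 KB
TRACK_KB = EXTENT_KB // TRACKS_PER_EXTENT  # derived: 64 KB per track


def extents_touched(offset_kb: int, length_kb: int) -> int:
    """Count the distinct extents covered by a write.

    Assumption: extents are aligned at multiples of EXTENT_KB from the
    start of the device.
    """
    if length_kb <= 0:
        return 0
    first = offset_kb // EXTENT_KB
    last = (offset_kb + length_kb - 1) // EXTENT_KB
    return last - first + 1
```

Under these assumptions, a 4 KB write to a new area still consumes one full 768 KB extent from the pool, and a 2 KB write that straddles an extent boundary consumes two.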

Virtual Provisioning for FBA (SCSI) devices with Linux on System z

VP Concepts for FBA as a SCSI LUN
Thin Provisioning - SCSI
  – Space efficient technology
  – Data storage never 100% full
  – Present thin device to Linux
  – Only consumes storage as the host writes to the thin device
  – Physical storage allocated from a shared pool
Over Subscription
  – Thin device capacity > pool
[Figure: Virtual Provisioning. Application-perceived thin devices show their reported capacity; only the allocated portions consume space in the common storage pool of data devices]

Binding a Thin Device
A thin device must be bound to a pool in order to be allocated any storage
One extent is allocated from the pool when it’s bound
Any write to a new area of a thin device will trigger an extent allocation from the pool the device is bound to
  – New allocations are performed using a round robin algorithm to spread extents across all of the enabled data devices in the thin pool
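
A minimal Python sketch of round robin extent placement, as described above. The class, method, and TDAT names are hypothetical. The sketch ignores per-device capacity and disabled data devices, and it is not Enginuity's actual implementation.

```python
from itertools import cycle


class ThinPool:
    """Toy thin pool that hands out extents to TDATs in round robin order."""

    def __init__(self, tdat_names):
        # Count of extents placed on each enabled data device.
        self.allocated = {name: 0 for name in tdat_names}
        self._next = cycle(tdat_names)

    def allocate_extent(self):
        """Place one new extent on the next TDAT in the rotation."""
        name = next(self._next)
        self.allocated[name] += 1
        return name


pool = ThinPool(["TDAT0", "TDAT1", "TDAT2", "TDAT3"])
placements = [pool.allocate_extent() for _ in range(10)]
```

After ten allocations the extents are spread 3, 3, 2, 2 across the four data devices, which is the wide striping effect the slide refers to.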

Virtual Provisioning Bind
bind allocates initial extent in thin pool
[Figure: Thin pool FC_Raid1 built from (4) x 25GB RAID 1 TDATs gives 100GB of storage capacity in the pool. A 200GB TDEV in the storage group is bound to it, so the thin pool is oversubscribed 2:1. The host sees a 200GB device (Ready)]
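
The 2:1 figure follows from the numbers on this slide: four 25GB TDATs and one 200GB TDEV. A short Python sketch of that arithmetic (the function name is hypothetical):

```python
def subscription_percent(tdev_sizes_gb, tdat_sizes_gb):
    """Configured thin device capacity as a percentage of pool capacity."""
    pool_gb = sum(tdat_sizes_gb)
    return 100.0 * sum(tdev_sizes_gb) / pool_gb


# Values from the slide: one 200GB TDEV bound to a pool of (4) x 25GB TDATs.
pct = subscription_percent([200], [25] * 4)
```

Here `pct` is 200.0, that is, a pool oversubscribed 2:1.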

Virtual Provisioning Writes
Write to new area of tdev will allocate extents in thin pool
[Figure: Host writes to the 200GB TDEV in the storage group allocate extents across the (4) x 25GB RAID 1 TDATs of thin pool FC_Raid1. 100GB of storage capacity in the pool; thin pool is oversubscribed 2:1]

Host Reads from Thin Devices
Thin devices are cache-only devices that contain pointers to the allocated extents on the data devices
When a read is performed to a thin device, the data is retrieved from the appropriate data device
Reading from a previously unallocated logical block address will:
  – return a block containing all zeros
  – not trigger an allocation of a new extent
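
The read and write behavior can be illustrated with a toy model in Python. The class and its dictionary-based pointer table are assumptions made for illustration only. The initial extent allocated at bind time is not modeled.

```python
EXTENT_BYTES = 768 * 1024  # extent size from the earlier slide


class ThinDevice:
    """Toy thin device: a table of pointers to allocated extents."""

    def __init__(self):
        self.extents = {}  # extent index -> bytearray holding its data

    def write(self, offset: int, data: bytes) -> None:
        """Write data, allocating an extent for each new area touched."""
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, EXTENT_BYTES)
            chunk = data[pos:pos + EXTENT_BYTES - start]
            extent = self.extents.setdefault(index, bytearray(EXTENT_BYTES))
            extent[start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        """Read data; unallocated areas return zeros and stay unallocated."""
        out = bytearray()
        while len(out) < length:
            index, start = divmod(offset + len(out), EXTENT_BYTES)
            n = min(length - len(out), EXTENT_BYTES - start)
            extent = self.extents.get(index)  # .get() never allocates
            out += extent[start:start + n] if extent is not None else bytes(n)
        return bytes(out)
```

In this model a read of a never-written address returns zeros and leaves `extents` empty, while a 4-byte write that crosses an extent boundary allocates two extents.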

VP Threshold Settings

Over Subscription with SCSI devices
A thin pool can be over subscribed
  – Provision more space than exists in the pool
A thin device’s entire configured capacity counts against the bound pool’s maximum subscription percentage
  – Even if the device remains thin (or all of its allocated extents are promoted/demoted to other pools by FAST VP)

Extended Pool Functions and Attributes
Pool Rebalancing
  – Rebalancing Variance %: controls whether a data device (TDAT) will be chosen for a possible rebalance
  – Maximum Rebalance Scan Device Range: the maximum number of data devices (TDATs) to balance concurrently
Attributes (for FBA as a SCSI device)
  – Maximum Subscription %: controls whether a pool can be over subscribed (allocated)
  – Pool Reserve Capacity (PRC): the portion of the pool’s enabled capacity to be reserved for allocating new extents for the bound devices in the pool
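
A hedged Python sketch of how a Maximum Subscription % limit could be evaluated before binding another thin device. The function and parameter names are hypothetical. The only rule taken from the slides is that each thin device's entire configured capacity counts against the limit, regardless of how much is allocated.

```python
def bind_allowed(pool_capacity_gb, bound_tdev_gb, new_tdev_gb,
                 max_subscription_pct):
    """Return True if binding new_tdev_gb keeps the pool within its limit.

    bound_tdev_gb lists the configured (not allocated) sizes of the thin
    devices already bound to the pool.
    """
    subscribed_gb = sum(bound_tdev_gb) + new_tdev_gb
    return 100.0 * subscribed_gb / pool_capacity_gb <= max_subscription_pct


# A 100GB pool that already has one 200GB TDEV bound, limit set to 300%.
ok_small = bind_allowed(100, [200], 50, 300)    # 250% subscribed
ok_large = bind_allowed(100, [200], 150, 300)   # 350% subscribed
```

Under these assumptions the 50GB device can be bound and the 150GB device cannot. A limit of 100% would disallow any over subscription.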

Space Reclamation Use Case
Some migration methods between regular and thin devices will leave the target thin devices fully allocated
Extents that are allocated on the thin devices may be eligible to be returned to the thin pool
  – Some extents may never have been written to by a host
  – Some extents may contain all-zero data
Available capacity in the thin pool can be maximized by returning unneeded extents back to the pool
Space Reclamation is an extension of the existing Virtual Provisioning space de-allocation mechanism

Space Reclamation Feature
Reclamation operations are run against individual thin devices
Enginuity* will examine all of the allocated groups on the specified thin device
  – All tracks will be examined to see if they contain all-zero data
  – If all tracks in an extent contain all-zero data, the extent will be de-allocated
  – Tracks that are marked Never Written By Host (NWBH) do not need to be examined by Enginuity
Space Reclamation is a slow running process
  – Enginuity does not reclaim space at the expense of host performance
*Enginuity is the EMC Symmetrix storage operating environment
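
The zero-detection rule can be sketched in Python as follows. This is an illustration of the rule as stated on the slide, not Enginuity code. The function names are hypothetical, the 64 KB track size is derived from 768 KB / 12 tracks, and the NWBH shortcut is not modeled.

```python
TRACK_BYTES = 64 * 1024   # derived: 768 KB extent / 12 tracks
TRACKS_PER_EXTENT = 12    # from the slides


def extent_is_reclaimable(extent: bytes) -> bool:
    """An extent qualifies only if every one of its tracks is all zeros."""
    tracks = [extent[i:i + TRACK_BYTES]
              for i in range(0, len(extent), TRACK_BYTES)]
    return all(not any(track) for track in tracks)


def reclaim(extents: dict) -> list:
    """De-allocate all-zero extents in place and return their indexes."""
    freed = [index for index, data in extents.items()
             if extent_is_reclaimable(data)]
    for index in freed:
        del extents[index]
    return freed
```

A single non-zero byte anywhere in an extent is enough to keep the whole 768 KB allocated.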

Thin Provisioning “cleanup”
Terms are used loosely, which can be confusing
SCSI standard: T10 Technical Committee on SCSI Storage Interfaces
Host-based SCSI commands for thin devices
  – SCSI unmap
  – SCSI write same with unmap
Support for these SCSI commands is
  – Kernel dependent (Linux vendor and release)
  – Storage array dependent
Any new technology should be tested and fully understood before being put into production!
Check the vendor’s documentation and support matrix for requirements and/or restrictions

Thin Provisioning “cleanup” Terminology
Unmap
  – SCSI command
  – Sent to thin device to unmap (or deallocate) one or more logical blocks
Write Same (with unmap flag)
  – SCSI command to write at least one block and unmap other logical blocks
fstrim
  – Executable, batch command used on filesystems
Discard
  – Option on the mount and mkfs commands for ext4 and xfs filesystems
  – Controls whether the filesystem issues the SCSI unmap command so thin devices can free specific blocks

Filesystem mount discard option
Linux releases supporting the discard option on the filesystem mount command
  – SLES 11 SP2*
  – RHEL 6.2 with a hot fix and ext4
  – RHEL 6.3 and ext4
Storage Array
  – EMC VMAX @ Enginuity level 5876*
  – Other?
*Check the vendor’s support matrix for the specific details

Verification of discard support
Thin device must be mapped and masked to Linux
Examine file(s) to verify discard support for the device
  /sys/block/<device>/queue/discard_max_bytes
  # cat discard_max_bytes
  25165824
“The discard_max_bytes parameter is set by the device driver to the maximum number of bytes that can be discarded in a single operation. Discard requests issued to the device must not exceed this limit. A discard_max_bytes value of 0 means that the device does not support discard functionality.”
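
The same check can be scripted. The following Python sketch reads the sysfs file named above. The function names and the `sysfs_root` parameter are hypothetical (the parameter exists only so the code can be exercised against a test directory), and the device name `sdb` is taken from the later slides.

```python
from pathlib import Path


def discard_max_bytes(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read queue/discard_max_bytes for a block device such as 'sdb'."""
    path = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    return int(path.read_text().strip())


def supports_discard(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Per the quoted text, a value of 0 means discard is not supported."""
    return discard_max_bytes(device, sysfs_root) > 0
```

On a Linux guest, `supports_discard("sdb")` would report whether the device advertises discard. As a side note on the arithmetic, the value shown on the slide, 25165824 bytes, is 24 MiB, which equals 32 extents of 768 KB.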

Create ext4 filesystem with discard
ext4 filesystem created with discard first discards blocks on thin device, then creates filesystem
  # mke2fs -F -t ext4 -E discard -vvv /dev/sdb
  mke2fs 1.41.12 (17-May-2010)
  fs_types for mke2fs.conf resolution: 'ext4', 'default'
  Discarding device blocks: done
  Discard succeeded and will return 0s - skipping inode table wipe
  ...

mount ext4 with discard
Filesystem mounted with the discard option
  – Frees up space on the thin device at the time of file deletion, and when the array receives the actual write request
  – NOTE: there is overhead associated with active discard, so this should be tested in your own environment
  # mount -o discard -t ext4 /dev/sdb /thin_mount
  # mount
  /dev/sdb on /thin_mount type ext4 (rw,discard)

fstrim
mount ext4 filesystem without discard mount option
  – Filesystem mounted without the discard option
  – Does not free up space on the thin device at the time of file deletion
You may free up space on a filesystem on a thin device, where files were previously deleted, with fstrim
fstrim is executed against a filesystem and its underlying thin device
Linux support is release and vendor dependent. Check the vendor’s support matrix for proper support requirements
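
A minimal Python sketch of running fstrim as a batch step, for example from a scheduled job. The wrapper functions and the mount point `/thin_mount` are assumptions for illustration. `fstrim -v` is the util-linux command with its verbose flag, it normally requires root, and whether it works depends on the kernel and array support noted above.

```python
import subprocess


def fstrim_command(mount_point: str) -> list:
    """Build the fstrim invocation for a mounted filesystem."""
    return ["fstrim", "-v", mount_point]


def run_fstrim(mount_point: str) -> str:
    """Run fstrim and return its report of how much was trimmed.

    Raises subprocess.CalledProcessError if fstrim fails, for example
    when the underlying device does not support discard.
    """
    result = subprocess.run(fstrim_command(mount_point), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

Calling `run_fstrim("/thin_mount")` would trim the example filesystem. Running it periodically is a common alternative to paying the discard overhead on every file deletion.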

Virtual Provisioning for CKD devices with Linux on System z

Standard Provisioning Concept (CKD)
[Figure: Host-addressable devices 101D (3390-9) and 101E (3390-9) pass through the front end (FICON), track mapping tables, cache, and back end to the disks of a single RAID rank]

Virtual (thin) Provisioning Concept
[Figure: Thin devices 101D (3390-9) and 101E (3390-9) pass through the front end (FICON), track group mapping tables, cache, and back end to the data devices and disks of a virtual pool]

VP Components for CKD
VP components are the same for CKD as they are for FBA:
  – Thin Pool: a shared, physical storage resource of a single RAID protection and drive technology
  – Data Device (TDAT): RAID-protected devices that provide the actual storage for a thin pool
  – Thin Device (TDEV): cache-only devices that are bound to a thin pool and provisioned to hosts
  – Thin Device Extent: allocation unit from a thin pool when a host writes to a new area of a thin device; 12 Symmetrix tracks, 768 KB (aka track group)

VP for CKD with Linux on System z
Present thin CKD device to z/VM and/or Linux on z
Thin CKD device must be fully provisioned for z/VM and Linux
  – Initial format of thin CKD device fully allocates device
    – cpfmtxa
    – dasdfmt
Benefits
  – Wide striping
  – EMC FAST – Fully Automated Storage Tiering

Common Functions of VP for CKD and FBA
Underlying VP technology is the same for FBA and CKD, so certain management activities are also the same
  – Rebalancing
  – Drain
  – Fully Automated Storage Tiering (FAST)

Rebalancing
Should be started after adding new TDATs to an existing pool
Runs at a very low priority
Can be influenced by two extended pool attributes:
  – Rebalancing Variance %: controls whether a data device (TDAT) will be chosen for a possible rebalance
  – Maximum Rebalance Scan Device Range: the maximum number of data devices (TDATs) to balance concurrently

VP Benefits
Improved capacity utilization (with VP LUNs and Linux)
  – Reduces the amount of allocated but unused physical storage
  – Avoids over-allocation of physical storage to applications
Efficient utilization of available resources
  – Wide striping distributes I/O across spindles
  – Reduces disk contention and enhances performance
  – Maximizes return on investment
Ease and speed of provisioning
  – Simplifies data layout
  – Lowers operational and administrative costs
Basis for Automated Tiering (FAST VP)
  – Active performance management at a sub-volume, sub-dataset level

Basis for FAST
With information growth trends, all Fibre Channel (FC) configurations will:
  – Cost too much
  – Consume too much energy
  – Take up too much space
FAST helps by leveraging disk drive technologies (EFD, FC, SATA)
What makes FAST work in real-world environments?
  – Skew: At any given time, only a small address range is active. The smaller the range, the better
  – Persistence: If an address range is active (or inactive), it remains so for a while. The longer the duration, the better
Workload: 80% of IOs on 20% of capacity
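
Skew can be measured from per-extent I/O counts. A small Python sketch, with a hypothetical function name and made-up sample counts chosen to reproduce the 80/20 case on the slide:

```python
def io_share_of_busiest(io_counts, capacity_fraction):
    """Fraction of all I/O that lands on the busiest part of the capacity.

    io_counts holds one I/O count per equally sized extent;
    capacity_fraction is the share of extents to consider, e.g. 0.2.
    """
    ranked = sorted(io_counts, reverse=True)
    keep = max(1, round(len(ranked) * capacity_fraction))
    return sum(ranked[:keep]) / sum(ranked)


# Illustrative data: 2 hot extents and 8 cold ones.
sample = [400, 400] + [25] * 8
share = io_share_of_busiest(sample, 0.2)
```

For this sample, `share` is 0.8: 80% of the I/O goes to 20% of the capacity. The higher that share, the more a small fast tier can absorb.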

Fully Automated Storage Tiering VP
FAST VP is a policy-based system that promotes and demotes data at the sub-volume level and, more importantly, at the sub-dataset/sub-LUN level, which makes it responsive to the workload and efficient in its use of control unit resources
Performance behavior analysis is ongoing
  – Active performance management
FAST VP delivers all these benefits without using any host resources

Virtual Provisioning with Tiers
[Figure: VP devices with extents spread across an R53 EFD pool, an R1 FC pool, and an R6 SATA pool]

Storage Elements
Symmetrix Tier: a shared storage resource with common technologies (Virtual Pools)
FAST Policy: manages Symmetrix Tiers to achieve service levels for one or more Storage Groups
FAST Storage Group: logical grouping of thin devices for common management
[Figure: FAST Storage Groups (VP Prod DB2, VP QA DB2) are associated through FAST Policies (Automatic: 100%, 100%; Custom: x%, y%, z%) with Symmetrix Tiers:
  – R53 EFD 200GB: 200 GB EFD, RAID 5 (3+1)
  – R1 FC 450GB: 450 GB 15K FC, RAID 1
  – R66 SATA 1TB: 1 TB SATA, RAID 6 (6+2)]

FAST VP Implementation
Performance data collected by the system
Intelligent Tiering algorithm generates movement requests based on performance data
Allocation Compliance algorithm generates movement requests based on capacity utilization
Algorithms continuously assess I/O statistics and capacity use, and make decisions for promotion and demotion
[Figure: Performance Data Collection feeds Analyze Performance Data, which feeds the Intelligent Tiering Algorithm and the Allocation Compliance Algorithm, which lead to Execute Data Movement]
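
To make promotion and demotion concrete, here is a deliberately simple Python sketch that ranks extents by I/O count and fills tiers from fastest to slowest. This is not the FAST VP algorithm, whose details are not given in these slides. The function name, tier capacities, and extent labels are all hypothetical.

```python
def place_extents(io_by_extent, tiers):
    """Assign the busiest extents to the fastest tiers.

    io_by_extent maps an extent label to its observed I/O count.
    tiers is a list of (tier_name, capacity_in_extents), fastest first.
    Extents beyond the total capacity are left unplaced.
    """
    ranked = sorted(io_by_extent, key=io_by_extent.get, reverse=True)
    placement = {}
    position = 0
    for name, capacity in tiers:
        for extent in ranked[position:position + capacity]:
            placement[extent] = name
        position += capacity
    return placement


io_counts = {"e0": 5, "e1": 900, "e2": 40, "e3": 0, "e4": 300}
tiers = [("EFD", 1), ("FC", 2), ("SATA", 10)]
placement = place_extents(io_counts, tiers)
```

With this data the hottest extent (`e1`) lands on EFD, the next two on FC, and the idle ones on SATA. Re-running the placement as the counts change is a crude analogue of the continuous assessment described above.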

Summary
Virtual Provisioning = Thin Provisioning
  – Available for FBA/SCSI and CKD devices
FBA as SCSI devices
  – Space is allocated as needed
  – Over subscription
  – Cleanup of unused space via space reclamation or T10 SCSI command standards
    – Linux and storage array dependent
CKD
  – Fully allocated
  – Wide striping
FAST VP – Fully Automated Storage Tiering VP
  – Active performance management

THANK YOU
Gail Riley
EMC
