CentOS / Red Hat Linux: Working with iSCSI

Here is a great article on how to install the iSCSI software initiator on Linux and then connect to volumes.
http://www.cyberciti.biz/tips/rhel-centos-fedora-linux-iscsi-howto.html

The article mentions the steps were tested on Red Hat Enterprise Linux (RHEL) v5, CentOS v5, Fedora v7 and Debian/Ubuntu Linux.
I went through the steps on a CentOS v5.3 x64 machine and it worked flawlessly.

Here’s another link on working with Linux Device-mapper Multipathing with iSCSI:
http://www.cyberciti.biz/tips/rhel-linux4-setup-device-mapper-multipathing-devicemapper.html

Here is a link on working with SAN Snapshots and mounting that snapshot volume to a Linux host:
http://knowledgelayer.softlayer.com/questions/405/How+to+connect+to+an+iSCSI+Snapshot

Here are also some useful Linux and iSCSI documents from HP Lefthand that we’ve uploaded to this blog:
Setting Up iSCSI volumes on CentOS 5, Red Hat 5, Fedora 7, and Debian
Configuring CHAP authentication with the linux iscsi initiator
LeftHand Volumes with SUSE Linux iSCSI

Why is my backup running slow?

Backup systems, while a necessary part of any well managed IT system, are often a large source of headaches for IT staff. One of the biggest issues with any backup system is poor performance. It is often assumed that performance is related to the efficiency of the backup software or the performance capabilities of backup hardware. There are, however, many places within the entire backup infrastructure that could create a bottleneck.
Weekly and nightly backups tend to place a much higher load on systems than normal daily activities. For example, a standard file server may access around 5% of its files during the course of a day, but a full backup reads every file on the system. Backups put strain on all components of a system from the storage through the internal buses to the network. A weakness in any component along the path can cause performance problems. Starting with the backup client itself, let’s look at some of the issues which could impact backup performance.

  • File size and file system tuning
  • Small Files

A file system with many small files is generally slower to back up than one with the same amount of data in fewer large files. Generally systems with home directories and other shares which house user files will take longer to back up than database servers and systems with fewer large files. The primary reason for this is the overhead involved in opening and closing a file.
In order to read a file the operating system must first acquire the proper locks then access the directory information to ascertain where the data is located on the physical disk. After the data is read, additional processing is required to release those locks and close the file. If the amount of time required to read one block of data is x, then it is a minimum of 2-3x to perform the open operations and x to perform the close. The best case scenario, therefore, would require 4x to open, read and close a 1 block file (2x to open, x to read, x to close). A 100 block file would require 103x. A file system with four 100 block files will require around 412x to back up. The same amount of data stored in 400 1 block files would require 1600x or about 4 times as much time.
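
The arithmetic above can be reproduced with a short Python sketch. The function name `backup_cost` and the fixed costs of 2x to open and 1x to close are illustrative assumptions taken from the best case described here, not measurements from any real system.

```python
# Cost model from the paragraph above, in units of x
# (x = time to read one block of data).
# Assumption: best case of 2x to open a file and 1x to close it.
OPEN_COST = 2
CLOSE_COST = 1


def backup_cost(num_files, blocks_per_file):
    """Return the total time, in units of x, to open, read and close every file."""
    per_file = OPEN_COST + blocks_per_file + CLOSE_COST
    return num_files * per_file


large = backup_cost(4, 100)   # four 100 block files
small = backup_cost(400, 1)   # four hundred 1 block files
print(large, small, round(small / large, 2))  # prints: 412 1600 3.88
```

Plugging in 3x for the open cost instead of 2x widens the gap further, since the per-file overhead is paid 400 times instead of 4.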

So, what is the solution? Multiple strategies exist which can help alleviate the situation.
The use of synthetic full backups only copies the changed files from the client to the backup server (as with an incremental backup) and a new full is generated on the backup server from the previous full backup and the subsequent incrementals. A synthetic full strategy at a minimum requires multiple tape drives and disk based backup is recommended. Adequate server I/O performance is a must as well since the creation of the synthetic full requires a large number of read and write operations.
Another strategy can be to use storage level snapshots to present the data to the backup server. The snapshot will relieve the load from the client but will not speed up the overall backup as the open/close overhead still exists. It just has been moved to a different system. Snapshots can also be problematic if the snapshot is not properly synchronized with the original server. Backup data can be corrupted if open files are included in the snapshot.
Some backup tools allow for block level backups of file systems. This removes the performance hit due to small files but requires a full file system recovery to another server in order to extract a single file.
Continuous Data Protection (CDP) is a method of writing the changes within a file system to another location either in real time or at regular, short intervals. CDP overcomes the small file issue by only copying the changed blocks but requires reasonable bandwidth and may put an additional load on the server.
Moving older, seldom accessed files to a different server via file system archiving tools will speed up the backup process while also reducing required investment in expensive infrastructure for unused data.

  • Fragmentation

A system with a lot of fragmentation can take longer to back up as well. If large files are broken up into small pieces a read of that file will require multiple seek operations as opposed to a sequential operation if the file has no fragmentation.
File systems with a large amount of fragmentation should regularly run some sort of de-fragmentation process, which can improve both system and backup performance.

  • Client throughput

In some cases a client system may be perfectly suited for the application but not have adequate internal bandwidth for good backup performance. A backup operation requires a large amount of disk read operations which are passed along a system’s internal bus to the network interface card (NIC). Any slow device along the path from the storage itself, through the host bus adapter, the system’s backplane and the NIC can cause a bottleneck.
Short of replacing the client hardware the solution to this issue is to minimize the effect on the remainder of the backup infrastructure. Strategies such as backup to disk before copying to tape (D2D2T) or multiplexing limit the adverse effects of a slow backup on tape performance and life. In some cases a CDP strategy might be considered as well.

  • Network throughput

Network bandwidth and latency can also affect the performance of a backup system. A very common issue arises when either a client or media server has connected to the network but the automatic configuration has set the connection to a lower speed or incorrect duplex. Using 1Gb/sec hardware has no advantage when the port is incorrectly set to 10Mb/half duplex.
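
On a Linux client or media server, the negotiated speed and duplex can be checked without extra tools. The following is a minimal Python sketch, assuming the kernel exposes the `speed` and `duplex` files under `/sys/class/net/<interface>/`; the interface name `eth0`, the helper names, and the 1000Mb/sec expectation are hypothetical and should be adjusted for the system in question.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def link_settings(interface, sysfs_root="/sys/class/net"):
    """Return (speed in Mb/sec, duplex) for an interface; None for unknown values."""
    base = Path(sysfs_root) / interface
    raw_speed = _read(base / "speed")
    duplex = _read(base / "duplex")
    try:
        speed = int(raw_speed) if raw_speed is not None else None
    except ValueError:
        speed = None
    return speed, duplex


def is_degraded(speed_mbps, duplex, expected_mbps=1000):
    """Flag links below the expected speed or not at full duplex.

    Unknown values are treated as degraded so they get a second look.
    """
    if speed_mbps is None or duplex is None:
        return True
    return speed_mbps < expected_mbps or duplex != "full"


if __name__ == "__main__":
    speed, duplex = link_settings("eth0")  # "eth0" is an assumed interface name
    print(speed, duplex, "CHECK LINK" if is_degraded(speed, duplex) else "ok")
```

A port that reports 10 and half on 1Gb/sec hardware is exactly the mis-negotiation described above.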
Remote sites can also cause problems as those sites often utilize much slower speeds than local connections. Synthetic full backups can alleviate the problem but may not be ideal if there is a high daily change rate. CDP is often a good fit, as long as the change rate does not exceed the available bandwidth. In many cases a remote media server with deduplicated disk replicated to the main site is the most efficient method for remote sites.

  • Media server throughput

Like each client system the media server can have internal bandwidth issues. When designing a backup solution be certain that systems used for backup servers have adequate performance characteristics to meet requirements. Often a site will choose an out of production server to become the backup system. While such systems usually meet the performance needs of a backup server, in many cases obsolete servers are not up to the task.
In some cases a single media server cannot provide adequate throughput to complete the backups within required windows. In these cases multiple media servers are recommended. Most enterprise class backup software allows for sharing of tape and disk media and can automatically load balance between media servers. In such cases multiple media servers allow for both performance and availability advantages.

  • Storage network

When designing the Storage Area Network (SAN) be certain that the link bandwidth matches the requirements of attached devices. A single LTO-4 tape drive writes data at 120MB/sec. In network bandwidth terms this is equivalent to roughly 1.2Gb/sec (using the rule of thumb of about 10 bits on the wire per byte of data). If this tape drive is connected to an older 1Gb SAN, the network will not be able to write at tape speeds. In many cases multiple drives are connected to a single Fibre Channel link. This is not an issue if the link allows for at least the bandwidth of the total of the connected devices. The rule of thumb for modern LTO devices and 4Gb Fibre Channel is to put no more than 4 LTO-3 and no more than 2 LTO-4 drives on a single link.
For disk based backup media, be certain that the underlying network infrastructure (LAN for network attached or iSCSI disk and SAN for Fibre Channel) can support the required bandwidth. If a network attached disk system can handle 400MB/sec writes but is connected to a single 1Gb/sec LAN it will only be able to write up to the network speed, 100MB/sec. In such a case, 4 separate 1Gb connections will be required to meet the disk system’s capabilities.
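
The link sizing above can be sketched in a few lines of Python. This assumes the same rule of thumb used in this post, where 1Gb/sec of link speed carries about 100MB/sec of data (10 bits per byte); the function names are made up for illustration.

```python
import math

# Rule of thumb used in this post: 1Gb/sec of link ~ 100MB/sec of data.
BITS_PER_BYTE_RULE = 10


def link_mb_per_sec(link_gbps):
    """Approximate usable MB/sec for a link of the given Gb/sec speed."""
    return link_gbps * 1000 / BITS_PER_BYTE_RULE


def links_needed(device_mb_per_sec, link_gbps):
    """Number of links required so the network is not the bottleneck."""
    return math.ceil(device_mb_per_sec / link_mb_per_sec(link_gbps))


print(link_mb_per_sec(1))     # 100.0 MB/sec on a single 1Gb link
print(links_needed(400, 1))   # 4 links for a 400MB/sec disk system
print(links_needed(120, 1))   # 2 links, so one 1Gb link cannot feed an LTO-4 drive
```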

  • Storage devices

The final stage of any backup is the write of data to the backup device. While these devices are usually not the source of performance problems there may be some areas of concern. When analyzing a backup system for performance, be sure to take into account the capabilities of the target devices. A backup system with 1Gb throughput throughout the system with a single LTO-1 target will never exceed the 15MB/sec (150Mb/sec) bandwidth of that device.

  • Disk

For disk systems the biggest performance issue is the write capability of each individual disk and the number of disks (spindles) within the system. A single SATA disk can write between 75 and 100 MB/sec. An array with 10 SATA drives can, therefore, expect to be able to write between 750MB/sec and 1GB/sec. RAID processing overhead and inline deduplication processing will limit the speed, so expect the real performance to be somewhat lower, as much as 50% less than the raw disk performance depending on the specific system involved. When deciding on a disk subsystem, be sure to evaluate the manufacturer’s performance specifications.
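
As a rough sizing aid, the estimate above can be written as a small Python function. The per-disk range and the 50% worst case overhead come from the paragraph above; the function name and its defaults are illustrative, and real arrays should be sized from the manufacturer’s specifications.

```python
def array_write_range(num_disks, per_disk_low=75, per_disk_high=100, overhead=0.5):
    """Return (raw_low, raw_high, worst_low, worst_high) in MB/sec.

    raw_*   : sum of the individual disk write speeds
    worst_* : raw speed reduced by the given overhead fraction
              (RAID and inline deduplication processing)
    """
    raw_low = num_disks * per_disk_low
    raw_high = num_disks * per_disk_high
    return raw_low, raw_high, raw_low * (1 - overhead), raw_high * (1 - overhead)


print(array_write_range(10))  # (750, 1000, 375.0, 500.0)
```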

  • Tape

With modern high speed tape subsystems the biggest problem is not exceeding the device’s capability but not meeting the write speed. A tape device performs best when the tape is passing the heads at full speed. If data is not streamed to the tape device at a sufficient rate to continuously write, the tape will have to stop while the drive’s buffer is filled with enough data to perform the next write. In order to get up to speed, the tape must rewind a small amount and then restart. Such activity is referred to as “shoe shining” and drastically reduces the life of both the tape and the drive.
Techniques such as multiplexing (intermingling backup data from multiple clients) can alleviate the problem but be certain that the last, slow client is not still trickling data to the tape after all other backup jobs have completed. In most cases D2D2T is the best solution, provided that the disk can be read fast enough to meet the tape’s requirements.

  • Conclusion

In most backup systems there are multiple components which cause performance issues. Be certain to investigate each stage of the backup process and analyze all potential causes of poor performance.

SAS vs. SATA Differences, Technology and Cost

One of our resources at HP Lefthand Networks (thanks Ben!) made the following comment to one of our customers and I thought it’d be a perfect post for the blog as it contains some useful information that some might not be aware of.

Here are the high-level differences between SAS and SATA disk drives:

Capacity:

  • SATA disk drives are the largest on the market.  The largest SATA drives available with widespread distribution today are 1.5TB-2TB.
  • SAS disk drives are typically smaller than SATA.  The largest SAS drives available with widespread distribution today are 450GB.
  • So, for capacity, a SATA disk drive is 3x-4x as dense as SAS.
  • A good way to quantify capacity comparison is $/GB.  SATA will have best $/GB.

Performance:

  • SATA disk drives spin at 7.2k RPMs.  Average seek time on SATA is 9.5msec.  Raw Disk IOPS (IOs per second) are 106.
  • SAS disk drives spin at 15k RPMs.  Average seek time on SAS is 3.5msec.  Raw Disk IOPS (IOs per second) are 294.
  • So, for performance, a SAS hard drive is nearly 3X as fast as SATA.
  • A good way to quantify performance comparison is $/IOP.  SAS will have best $/IOP.

Reliability: there are two reliability measures – MTBF and BER.

  • MTBF is mean time between failures.  MTBF is a statistical measure of drive reliability.
  • BER is Bit Error Rate.  BER is a measure of read error rates for disk drives.
  • SATA drives have a MTBF of 1.2 million hours.  SAS drives have a MTBF of 1.6 million hours.  SAS drives are more reliable than SATA when looking at MTBF.
  • SATA drives have a BER of 1 read error in 10^15 bits read.  SAS drives have a BER of 1 read error in 10^16 bits read.  SAS drives are 10x more reliable for read errors.  Keep in mind a read error is data loss without other mechanisms (RAID or Network RAID) in place to recover the data.
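
To put the BER figures in perspective, a quick Python calculation converts "1 error per N bits read" into the amount of data read, on average, per read error. The function name is made up for illustration, and decimal terabytes (10^12 bytes) are assumed.

```python
def terabytes_per_read_error(bits_per_error):
    """Average data read per read error, in decimal TB (10**12 bytes)."""
    return bits_per_error / 8 / 1e12


sata_tb = terabytes_per_read_error(10**15)
sas_tb = terabytes_per_read_error(10**16)
print(sata_tb, sas_tb)  # 125.0 1250.0
```

So a SATA drive would be expected to hit one read error for roughly every 125TB read, against roughly 1250TB for SAS.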

Here are some good links for comparing disk types:
http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_es_2.pdf
http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah_15k_7.pdf
http://h18004.www1.hp.com/products/servers/proliantstorage/drives-enclosures/index.html

Microsoft Desktop Licensing in a Virtual Environment (VDI)

We’ve already posted some useful info on licensing servers for a virtual environment, but we also get the question a lot of how virtualization changes licensing for hosted desktop (aka VDI) environments. Well, here are some very useful links.

Here’s a high level:

Virtualization brings about new use cases that did not previously exist in traditional desktop environments. These use cases include the ability to create multiple desktops dynamically, enable user access to multiple virtual machines (VMs) simultaneously, and move desktop VMs across multiple platforms, especially in load-balancing and disaster recovery situations. Microsoft designed Windows Virtual Enterprise Centralized Desktop (VECD) to enable organizations to license virtual copies of Windows client operating systems in virtual environments.

Virtual Enterprise Centralized Desktop Licensing:
http://www.microsoft.com/windows/enterprise/solutions/virtualization/licensing.aspx

More Licensing Details:
http://blogs.technet.com/virtualization/archive/2009/07/13/Microsoft_1920_s-new-VDI-licensing_3A00_-VDI-Suites.aspx

Desktop Virtualization (VDI) Info:
http://www.microsoft.com/virtualization/products/desktop/default.mspx
http://www.vmware.com/products/view/

 Hope it helps!

Raw SAN Network Speeds

Here are some raw SAN network speeds that I found in some post somewhere (which I didn’t write down).. Obviously there are a lot of caveats related to this, but from a pure bandwidth perspective, I thought this was interesting for reference.

1 gig = 125 MB/sec

2 gig = 250 MB/sec

4 gig = 500 MB/sec

8 gig = 1000 MB/sec

10 gig = 1250 MB/sec
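
These figures are just the link speed in bits divided by 8, with no allowance for encoding or protocol overhead. A small Python sketch (the function name is made up for illustration) reproduces the list:

```python
def gbit_to_mbytes(link_gbps):
    """Raw MB/sec for a link speed in Gb/sec, at 8 bits per byte and no overhead."""
    return link_gbps * 1000 / 8


for gig in (1, 2, 4, 8, 10):
    print(f"{gig} gig = {gbit_to_mbytes(gig):.0f} MB/sec")
```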

Windows 7 XP Mode RC Available

There is an article over at TechTree.com that talks about the Windows XP Mode (XPM) RC which is now available for download. Good Stuff!

http://www.techtree.com/India/News/Windows_XP_Mode_RC_for_Windows_7_Arrives/551-105212-580.html

A similar article over at Virtualization Review – http://virtualizationreview.com/articles/2009/08/04/windows-xp-mode-rc-released.aspx

Oracle Kills Virtual Iron Brand and Fires Employees

Wow, I thought this was crazy.. Virtual Iron was gobbled up by Oracle a couple months ago and I know a lot of us were wondering what was in store for Virtual Iron.. well, the future isn’t a good one for them. It looks like Oracle has killed the Virtual Iron brand, assimilating their technology and moving on (yes, that was a geeky Star Trek reference hehe). Read more about it at these links:

http://storageinformer.com/oracle-to-terminate-virtual-iron-business/
http://gregarius.dropcode.net/demo/virtualization.info/2009/06/19/Oracle_kills_Virtual_Iron_brand,_fires_all_employees_but_10

Oh, and if you need a VMware design/demo/quote to replace Virtual Iron in your environment, just let us know! ;-)
UPDATE: VMware has also provided special promo pricing for ex-Virtual Iron customers to move over to VMware! See the details here:
http://www.vmware.com/company/news/releases/virtualiron-safepassage.html

How To Collect Data From a Fibre Channel (FC) Switch

Sometimes you will be asked by either the manufacturer’s support or perhaps by Lewan for data from your Fibre Channel switch. Here is how you can gather that information in a format that helps support and/or Lewan:

Brocade – How-To Collect a “supportshow” from a Brocade Switch from a Windows Host with HyperTerminal
Follow these steps:

  1. Start the HyperTerminal program by selecting Start -> Programs -> Accessories -> Communications -> HyperTerminal.
  2. Make a new connection and select a name and icon for the connection.
  3. A “Connect to” window is displayed.
  4. Change the Connection using modem to TCP/IP (Winsock) and enter the IP address of the Brocade switch.
  5. Click the OK button.
  6. Log in to the Brocade switch (default user: admin/default password: password), and then start to capture text. Select Transfer -> Capture text -> File C:\supportshow.wri.
  7. Run the Brocade supportshow command.
  8. After the command completes, stop the “capture text” process (Transfer -> Capture text -> Stop).
  9. After completing this for all switches in all related fabrics, type quit and close the HyperTerminal session.

Cisco Support Logs

To capture support logs for a Cisco FC switch, follow these instructions:

1) For firmware 1.2(x) and above telnet to the switch and open a capture session.
2) Run the following commands:
    term len 0
    show tech-support details
3) For firmware 1.0(4):  There is not a single command like a supportshow or data collection. There are two ways to get the outputs needed to troubleshoot most Cisco switch issues. Contact Lewan for additional information.

McDATA Switch Data Collection

In order to collect data from a McDATA switch being managed by McDATA’s EFCM utility, follow these instructions:

  1. Select the switch that you want to collect data from.
  2. Select Maintenance and then Data Collection.
  3. Enter a name for the file and then select save. Note the directory where the data is saved.  Once you select save, the data collection takes over and the file is downloaded to the local PC and stored in the directory specified.

How to collect switch information and related data from a McDATA DS-16M, DS-32M or another switch with EWS:

These switches (also known as ES3016 and ES3032) have an Embedded Web Server (EWS) GUI. You can access this through a web browser by entering the IP address in the URL address line (that is, http://10.14.1.92). Once you have logged in you can run a script that collects switch information including: Network Info, Operating Parameters, Zone Info, Port Login Data, Port Data and Port Types, and Switch Status.

Note:  These model switches do not support serial port connectivity for information retrieval.

To collect this information, follow these steps:

  1. Once you have logged in to the EWS GUI, click on “Operations” from the left frame of the EWS GUI.
  2. Click the third tab called “Maintenance.”
  3. Click the secondary tab labeled Product Info.
  4. Click Product Information. This will generate a report.
  5. Click “File” on the web browser toolbar and select “Save As” to save the .txt file with either the default name or one that you rename it to. Save it on the desktop or to a directory where you can locate it so that you can email it to Technical Support.

To locate the switch firmware revision, follow these steps:

  1. Click “View” from the left frame of the EWS GUI.
  2. Select Unit Properties. The last entry of that page has the firmware level.

Oracle Buys Virtual Iron

I guess when Oracle bought Sun, they had some more money lying around.. they also bought Virtual Iron, a virtualization company whose product is an offshoot of the Xen hypervisor.

http://www.oracle.com/us/corporate/press/018535
http://www.oracle.com/virtualiron/index.html

It’ll be interesting to see what they do with it, now that Oracle has Sun’s virtual offerings as well as Virtual Iron’s offerings..