Ensure Your Success in the World of NFS Cache Appliances: Things to consider during your evaluation
By Doug Rainbolt on June 28, 2011
Before you start evaluating caching appliances for your NFS (network file system) environment, there are some important things to consider. Several options are available, and the results one site sees from a given appliance can differ greatly from those at another site deploying the same solution. Let’s focus on the big factors that can shape your evaluation experience and ultimately determine how successful your implementation will be. The process requires some preparatory thought.
First, have a sense for the file set size that you are looking to support and what percentage of this file set size actually comprises the most active data. Your file set might be 7TB but the active data making up the working set might be a fraction of this. The more of the working set that can reside in the cache, the higher your hit rate will be. The hit rate is the percentage of operations that are satisfied by the cache.
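As a rough illustration of these two quantities, the sketch below computes a hit rate from operation counters and estimates how much of a working set fits in a cache. The function names, the 10% active fraction, and the 0.5 TB cache size are all hypothetical values chosen for the example, not figures from any particular appliance. Note that coverage is only a guide: the actual hit rate also depends on access patterns.

```python
def hit_rate(cache_hits: int, total_ops: int) -> float:
    """Percentage of operations that were satisfied by the cache."""
    if total_ops == 0:
        return 0.0
    return 100.0 * cache_hits / total_ops


def working_set_coverage(cache_capacity_tb: float, working_set_tb: float) -> float:
    """Fraction of the working set that fits in the cache, capped at 1.0."""
    if working_set_tb <= 0:
        raise ValueError("working set size must be positive")
    return min(1.0, cache_capacity_tb / working_set_tb)


# Hypothetical sizing: a 7 TB file set of which 10% is active, and a 0.5 TB cache.
file_set_tb = 7.0
working_set_tb = file_set_tb * 0.10
coverage = working_set_coverage(0.5, working_set_tb)

print(f"working set: {working_set_tb:.2f} TB, coverage: {coverage:.0%}")
print(f"hit rate for 900 hits in 1000 ops: {hit_rate(900, 1000):.1f}%")
```

Measuring the real hit rate during a trial, rather than relying on the coverage estimate alone, gives a better picture of how the appliance behaves with your data.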
Second, know your NFS mix. What percentage of the mix is reads, what percentage is writes, and what percentage is made up of NFS metadata operations? If the percentage of reads is high and the working set isn’t too large, the cache can greatly reduce traffic to the backend NAS (network-attached storage). The same applies to many NFS metadata operations such as GETATTRs and LOOKUPs. If your NFS mix is extremely write heavy, the performance of the cache appliance can vary widely, depending upon how the device handles writes. Let’s address this further by comparing a write-through versus write-back model.
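One way to get a first look at the mix is to bucket per-operation counts into reads, writes, and metadata. The sketch below assumes you already have the counts in a dictionary (for example, copied by hand from your NAS or client statistics). The sample numbers are invented, and treating every operation other than READ and WRITE as metadata is a simplification.

```python
from collections import Counter


def nfs_mix(op_counts: dict) -> dict:
    """Return the percentage of reads, writes, and metadata operations.

    Simplifying assumption: anything that is not READ or WRITE
    (GETATTR, LOOKUP, ACCESS, READDIR, ...) is counted as metadata.
    """
    total = sum(op_counts.values())
    if total == 0:
        raise ValueError("no operations recorded")

    buckets = Counter()
    for op, count in op_counts.items():
        name = op.upper()
        if name == "READ":
            buckets["read"] += count
        elif name == "WRITE":
            buckets["write"] += count
        else:
            buckets["metadata"] += count

    return {kind: 100.0 * buckets[kind] / total
            for kind in ("read", "write", "metadata")}


# Hypothetical counters for illustration only.
sample_counts = {"READ": 500, "WRITE": 100, "GETATTR": 300, "LOOKUP": 100}
mix = nfs_mix(sample_counts)

for kind, pct in mix.items():
    print(f"{kind:>8}: {pct:.1f}%")
```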
In a write-through model, the appliance may store an incoming write, but it does not make the newly written data available to subsequent reads until an acknowledgment has been returned from the NAS. The latency associated with passing the write through can be very low, on the order of switch latencies, but the latency associated with receiving the acknowledgment back depends on the NAS.
The write-through model has the benefit of ensuring data consistency with the backend NAS. If the NFS mix favors reads or NFS metadata operations, this model can deliver very high performance in both throughput and response times. The NAS is offloaded, rather than bypassed as in a write-back model, so it can focus on write performance and return acknowledgments faster.
When using the write-through model, it is advisable to connect all of the clients that would otherwise be connected to the NAS to the cache appliance instead. This will produce the best results. If some clients are connected to the cache appliance and some to the NAS directly, those connected directly will place load on the NAS. This added load can hurt the performance of the clients connected to the cache appliance, because operations that can’t be satisfied by the cache depend on the NAS. The best way to reduce latency for all clients is to connect them to the cache appliance.
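The dependence on the NAS can be made concrete with a simple weighted average: operations that hit the cache see the cache latency, and the rest see the NAS latency. This is a back-of-the-envelope model, not a measurement method, and the latency figures below are hypothetical.

```python
def effective_latency_ms(hit_rate: float,
                         cache_latency_ms: float,
                         nas_latency_ms: float) -> float:
    """Average latency seen by clients of the cache appliance.

    hit_rate is a fraction between 0.0 and 1.0. Misses (and, in a
    write-through model, writes) are charged the NAS latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * cache_latency_ms + (1.0 - hit_rate) * nas_latency_ms


# Hypothetical numbers: the same 90% hit rate, but the NAS responds more
# slowly when other clients bypass the cache and load it directly.
nas_lightly_loaded = effective_latency_ms(0.9, 0.2, 5.0)
nas_heavily_loaded = effective_latency_ms(0.9, 0.2, 15.0)

print(f"NAS lightly loaded: {nas_lightly_loaded:.2f} ms")
print(f"NAS heavily loaded: {nas_heavily_loaded:.2f} ms")
```

Even with an unchanged hit rate, the average latency in this model rises with NAS latency, which is why direct-attached clients can affect the clients behind the cache.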
Alternatively, in a write-back model, the appliance doesn’t wait for an acknowledgment from the NAS before satisfying subsequent reads. It stores the writes locally and, at some predetermined time, passes them to the NAS.
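The difference between the two models can be sketched with a toy in-memory cache. This is only an illustration of the ordering of operations, not how any real appliance is implemented: a plain dictionary stands in for the NAS, an assignment stands in for a write plus its acknowledgment, and the class and method names are hypothetical.

```python
class WriteThroughCache:
    """Toy model: data becomes readable from cache only after the NAS has it."""

    def __init__(self, nas: dict):
        self.nas = nas
        self.cache = {}

    def write(self, key, data):
        self.nas[key] = data    # pass the write through and wait for the NAS
        self.cache[key] = data  # only then serve it to subsequent reads

    def read(self, key):
        if key not in self.cache:           # miss: fetch from the NAS
            self.cache[key] = self.nas[key]
        return self.cache[key]


class WriteBackCache:
    """Toy model: writes are held locally and flushed to the NAS later."""

    def __init__(self, nas: dict):
        self.nas = nas
        self.cache = {}
        self.dirty = set()      # keys written but not yet on the NAS

    def write(self, key, data):
        self.cache[key] = data  # readable immediately, no NAS round trip
        self.dirty.add(key)

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.nas[key]
        return self.cache[key]

    def flush(self):
        """Consistency point: push all held writes to the NAS."""
        for key in sorted(self.dirty):
            self.nas[key] = self.cache[key]
        self.dirty.clear()


nas = {}
wb = WriteBackCache(nas)
wb.write("report.txt", "v1")
print("readable from cache:", wb.read("report.txt"))
print("on NAS before flush:", "report.txt" in nas)
wb.flush()
print("on NAS after flush: ", "report.txt" in nas)
```

Between a write and the next flush, the toy NAS does not hold the new data. That window is the gap between consistency points.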
If you are thinking of using a write-back model, there are several things you should keep in mind:
- Be comfortable with the appliance holding the write data. The longer the writes are held, the farther apart the consistency points are and the less consistent the appliance will be with the data stored on the NAS. If your NAS is deploying extensive data protection tools, you need to feel comfortable with the risk that these tools will not have visibility of uncommitted writes stored on the appliance.
- Your write-back model may require several more clustered nodes to deliver the throughput you’re targeting. If you are comfortable with consistency points being far apart, the benefit of this model is that it depends far less on NAS performance. You should compare the cost of the nodes added to the cache cluster with the savings on the back end.
Third, make sure that the cache appliance selected can deliver the response times required. Several things affect response times. First, ensure adequate network bandwidth on the appliance. Second, learn what type of data acceleration is being offered to move data with minimal latency. SSDs and DRAM offer superior latencies over conventional disk drives, but that advantage becomes meaningless if you can’t get data in and out of the caches and across the network quickly. Third, make sure that the appliance has the raw processing horsepower to scale. As more NAS instances are added to the cache appliance, does it have the headroom to deliver consistent throughput and low latency?
In summary, there are a number of things to consider when evaluating and choosing a network caching solution. The same product may produce different results and benefits for different companies. It’s important for you to know the pressing objectives of today and how these might evolve over time, especially as data grows. This includes having a sense for what performance is expected based upon your data and I/O mix. You should also have a sense for what you are willing to spend in capital outlay and operating expenses. Factor in power and space utilization too. Define what you expect from the NAS when adding a networked cache solution to support it. For example, is the expectation that the cache appliance will always be consistent with the NAS, meaning the NAS always has the most recent writes safeguarded? If yes, the write-through model is probably best. If the answer is no, a write-back cache could be the way to go. But if this is the case, given the write-back cache appliances on the market today, is the perceived need really for a networked cache featuring write-back, or is it for tiered storage? This will be a topic of future blog postings.