Baseline Efficiency of Current File System Environment

 
As file-based workflows continue to evolve, a variety of problems with the efficiency, effectiveness, and productivity of production-line data systems have become apparent. Because analysis of these production data systems is often limited to cleanup and crisis response, the systems are widely regarded as inefficient and unproductive, but without the specific characterization or full root-cause analysis that would put a dollar figure on the problem and point the way to a solution.
 
File-based workflows closely resemble conventional production lines, with the file representing the work in progress moving down the line. However, the efficiency metrics and other management tools typically employed in the manufacture of conventional products are missing or inadequate. The equivalent of inventory and production-line tools is needed. In the case of file-based products, inventory refers to data, or more specifically files, so these tools would be product- or project-based data management tools.
 

The manufacture of conventional production-line products is supported by a rich variety of such tools. For example, tools measuring

  • Inventory carrying costs,
  • Inventory turns, and
  • Average inventory costs per project / product

are common in physical industries. However, they are not generally available for “data production lines.”

To determine the value of this specialized data management, the efficiency of current systems should be analyzed. Full characterization is akin to taking a “wall-to-wall” inventory, except that the costs are even greater. Because of the sheer size (often hundreds of millions of files or more), the frequent churn, and the dispersed, heterogeneous nature of storage resources, facilities almost always have great difficulty determining the status of their data production line.
 
DataFrameworks has developed a set of tools to analyze the current state of a data production line without requiring full analysis. This results in a bounded estimate of current file system efficiency and provides the data needed to estimate return on investment when considering solutions to the problem.
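
One generic way to obtain a bounded estimate without examining every file is to classify a random sample and report a confidence interval for the out-of-policy share. The sketch below illustrates that idea only; it is not DataFrameworks' tooling or methodology. The function names, the is_out_of_policy callback, and the choice of a Wilson score interval at roughly 95 percent confidence are all assumptions made for the example.

```python
import math
import random


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("sample is empty")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


def estimate_out_of_policy(paths, is_out_of_policy, sample_size, seed=0):
    """Estimate the out-of-policy share from a simple random sample of paths.

    `is_out_of_policy` is a hypothetical callback that applies whatever
    policy checks the facility defines and returns True or False.
    """
    rng = random.Random(seed)
    sample = rng.sample(paths, min(sample_size, len(paths)))
    hits = sum(1 for p in sample if is_out_of_policy(p))
    return hits / len(sample), wilson_interval(hits, len(sample))
```

A larger sample narrows the interval, so the sample size can be chosen to match how tight a bound the return-on-investment estimate needs. A simple random sample also assumes a list of paths is available to draw from; on very large or dispersed storage, a different sampling scheme may be required.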
 

To establish a baseline of a current system, a statistical analysis of files on the storage infrastructure is performed, with the goal of answering the following questions:

  • Should each file even be on the production line (is it associated with a valid project, work order, customer, etc.)?
  • Is each file in the proper location?
  • Is each file / directory named properly?
  • Are duplicates present?
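
The four questions above can be expressed as simple per-file checks. The sketch below is a minimal illustration under an invented policy: the /prod/<project>/<stage>/<name> layout, the project and stage lists, and the naming pattern are all hypothetical and would be replaced by a facility's own rules. Duplicates are detected here by comparing SHA-256 digests of file content, which is one common approach rather than the method used in the analysis described in this article.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical policy, for illustration only.
VALID_PROJECTS = {"proj_a", "proj_b"}
ALLOWED_STAGES = {"raw", "work", "final"}
NAME_PATTERN = re.compile(r"^[a-z0-9_]+\.[a-z0-9]+$")


def check_path(path):
    """Return policy violations for a path like /prod/<project>/<stage>/<name>."""
    parts = PurePosixPath(path).parts
    if len(parts) != 5 or parts[:2] != ("/", "prod"):
        return ["wrong_location"]
    _, _, project, stage, name = parts
    problems = []
    if project not in VALID_PROJECTS:
        problems.append("no_valid_project")
    if stage not in ALLOWED_STAGES:
        problems.append("wrong_location")
    if not NAME_PATTERN.match(name):
        problems.append("bad_name")
    return problems


def find_duplicates(contents):
    """Group paths whose content is identical; `contents` maps path -> bytes."""
    by_digest = defaultdict(list)
    for path, data in contents.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

In this sketch a file counts as out of policy when check_path returns a non-empty list or when it appears in a duplicate group. For real files, the digest would normally be computed by reading each file in chunks rather than holding its full content in memory.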
 

Our analyses to date show that the potential gains are large: between thirty-six and fifty-four percent of the files on the storage in use were found to be out of policy and unneeded.

<click here for complete detailed analysis and methodology>