Existing Approaches to the Problem

Background

Excess storage consumption and inefficient data management operations have several causes:
- Current tools for scrubbing the system are slow and crude.
- Tools focus on individual storage locations and are poorly suited to data spread across multiple physical storage locations.
- Doing the job right takes too much time. Data managers are already spending as much time as they can on the problem.

Native File Management Tools

The de facto approach to data management relies on the file manager tools provided by the native operating system. These tools have the following limitations:
- Native file managers (Explorer, Finder, and Nautilus) all slow down and become massive time wasters when browsing multiple terabytes. Waiting for roll-up information can take minutes to hours.
- Answering "why" a location is so large then requires more roll-ups at a lower level.
- Reports based on du and df are slow and usually require follow-on reports.
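
To make the "roll up" idea concrete, here is a minimal Python sketch of a du-style size roll-up. It is an illustration only, not the method used by any of the tools named above; the function name rollup_sizes is an assumption introduced for this example. It totals every directory in a single bottom-up pass, so the lower-level answers are already available once the top-level number is known.

```python
import os


def rollup_sizes(root):
    """Return {directory: total bytes of all files at or below it}.

    Hypothetical helper for illustration; sizes are apparent file sizes,
    not allocated disk blocks, so results can differ from du.
    """
    totals = {}
    # Walk bottom-up so every child directory is totalled before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            try:
                # lstat does not follow symlinks, so link targets are not counted twice.
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
        for name in dirnames:
            # Symlinked or unreadable directories are never walked, so they count as 0.
            total += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = total
    return totals
```

Sorting the returned dictionary by value lists the largest directories at every depth, which is the question a file manager answers one folder at a time. The full walk still touches every file, so the scan cost described above remains.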
Ad-Hoc Data Management in Excel

Customers develop processes to try to streamline this, with imperfect results. Data managers run a tool such as du, or a custom script, and enter the results in a spreadsheet:
- Run scripts using the find, df, and du commands.
- Dump the data to a raw spreadsheet, typically one row per file.
- Create a pivot table.
- Manually attach business tags and metadata.
- Sort by size, business unit, user, etc.
- Rinse and repeat as needed!

Limitations of Data Management Via Excel

Analysis and ad-hoc data management in Excel yield imperfect results for the following reasons:
- File system scanning can take days. A script at one animation studio takes 2.5 days to complete, and the information is already out of date by then.
- Wake of debris. Cleanup usually targets only the largest space wasters, leaving an ever-increasing pile of small files. Repeatedly re-crawling those small files can add as much as 10x to future scanning times.
- Infirmity. As engineers move on, custom scripts age and deteriorate.
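
The script-and-spreadsheet workflow described in this section can be sketched roughly as follows. This is a hedged illustration in Python rather than any customer's actual script: the function names, the CSV columns, and the prefix-to-business-unit mapping are all assumptions made for the example.

```python
import csv
import os
from collections import defaultdict


def scan_to_rows(root):
    """Yield one record per file: path, size in bytes, owner uid, mtime."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield {"path": path, "size": st.st_size,
                   "uid": st.st_uid, "mtime": int(st.st_mtime)}


def write_csv(rows, out_path, fields=("path", "size", "uid", "mtime")):
    """Dump raw rows to a CSV file that a spreadsheet can open."""
    with open(out_path, "w", newline="") as f:
        # extrasaction="ignore" drops any columns not listed in `fields`.
        writer = csv.DictWriter(f, fieldnames=list(fields), extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def attach_unit(rows, unit_by_prefix, default="untagged"):
    """Tag each row with a business unit chosen by path prefix.

    `unit_by_prefix` is a hypothetical mapping maintained by hand.
    """
    for row in rows:
        unit = default
        for prefix, name in unit_by_prefix.items():
            if row["path"].startswith(prefix):
                unit = name
                break
        yield {**row, "unit": unit}


def pivot_by(rows, key):
    """Sum file sizes per value of `key`, largest first (a one-column pivot)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["size"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

A typical run would collect list(scan_to_rows(root)) once, write it out with write_csv, then call pivot_by on the tagged rows with "unit" or "uid" as the key. Every step repeats the full crawl and the hand-maintained tagging, which is where the limitations listed above come from.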