Accounting for just 260 working days a year, 5 terabytes a day quickly mounts up to 1.3 petabytes a year, meaning CSIRO’s data bank will double in less than two years. While that seems an enormous growth rate, it is smack in the middle of Gartner’s forecast of 40-60 per cent data growth per year across all industry sectors.
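The arithmetic behind those figures can be checked with a short sketch. The daily volume and working-day count come from the article; the decimal conversion (1,000 terabytes to the petabyte) and the helper names are assumptions made for illustration, and the implied collection sizes are only what the Gartner percentages would suggest, not figures reported by CSIRO.

```python
# Figures quoted in the article.
TB_PER_DAY = 5
WORKING_DAYS_PER_YEAR = 260

# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000


def annual_growth_pb(tb_per_day=TB_PER_DAY, days=WORKING_DAYS_PER_YEAR):
    """Data added per year, in petabytes."""
    return tb_per_day * days / TB_PER_PB


def implied_holdings_pb(annual_pb, growth_rate):
    """Collection size at which annual_pb equals the given fractional growth rate."""
    return annual_pb / growth_rate


annual = annual_growth_pb()
print(f"Added per year: {annual} PB")

# Hypothetical check against the 40-60 per cent forecast range.
for rate in (0.4, 0.5, 0.6):
    size = implied_holdings_pb(annual, rate)
    print(f"{rate:.0%} growth implies roughly {size:.2f} PB held today")
```

At the 50 per cent midpoint, 1.3 petabytes a year corresponds to a collection of about 2.6 petabytes.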
Dr Robert Bell, technical services manager for the CSIRO’s advanced scientific computer centre, says that at present the CSIRO’s storage systems have enough headroom to handle that level of growth. The SGI Copan MAID (massive array of idle disks) system was brought into production at the CSIRO in September with 870 terabytes of formatted capacity, and can be built out to 1.7 petabytes.
Dr Bell said the MAID layer of storage helps to keep running costs under control while also providing much faster access to data than is possible using tape. (CSIRO has access to 30 petabytes of tape storage should it need it.)
Two decades ago CSIRO started using Cray’s Data Migration Facility, a system designed to allow data storage to be sensibly tiered. That technology was inherited by SGI when it bought Cray and has since been redeveloped, but it remains a fixture at the CSIRO.
One of the clear benefits of the tiered approach was reduced power costs, according to Dr Bell.
He said that the storage collection, housed in the CSIRO’s Melbourne Docklands facility, today consumed about $13,000 worth of electricity each year. Had the entire collection been housed on fast, expensive disk, the electricity bill alone would have blown out to $500,000 a year.
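To put the two electricity figures side by side, here is a minimal sketch of the comparison. The dollar amounts are the ones Dr Bell quoted; the variable names are illustrative only.

```python
# Annual electricity costs quoted by Dr Bell, in dollars.
TIERED_COST = 13_000     # current tiered collection
ALL_DISK_COST = 500_000  # same collection held entirely on fast disk

# Absolute and relative difference between the two scenarios.
annual_saving = ALL_DISK_COST - TIERED_COST
cost_ratio = ALL_DISK_COST / TIERED_COST

print(f"Annual saving: ${annual_saving:,}")
print(f"All-disk storage would cost about {cost_ratio:.0f} times as much to power")
```

On those numbers the tiered collection saves $487,000 a year, with the all-disk option costing roughly 38 times as much to power.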
“Power costs are not going down and demand for computer power is going up,” said Dr Bell. While the CSIRO’s data storage challenges are particularly acute, there are lessons for corporate Australia, he believes.
Dr Bell said that there were significant power savings for organisations able to sensibly cascade their storage requirements.
However, he acknowledged that for critical corporate applications it might not be sensible to wait the 90 seconds or so it could take for data stored in a tape library to be made available to an application. Scientists could be a little more patient, he said.
But the MAID could be a half-way house: only a quarter of the disks in the chassis are powered at any one time, leading to lower energy costs, and data stored in the MAID could be made accessible within 15-20 seconds.
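The trade-off Dr Bell describes, waiting longer in exchange for lower running costs, can be sketched as a simple tier lookup. The tape and MAID latencies are the worst-case figures quoted above; the one-second figure for always-on disk, the cost ordering of the list, and the function name are assumptions made for illustration, not a description of how CSIRO's Data Migration Facility actually places data.

```python
# Worst-case access latency in seconds, listed cheapest-to-run first.
# Tape and MAID values come from the article; the disk value is an assumption.
TIERS = [
    ("tape", 90),
    ("maid", 20),
    ("disk", 1),  # assumed figure for always-on disk
]


def cheapest_tier(max_wait_seconds):
    """Return the cheapest tier whose worst-case latency fits the wait budget."""
    for name, latency in TIERS:
        if latency <= max_wait_seconds:
            return name
    return None  # no tier is fast enough


# A patient scientific workload versus a less tolerant corporate application.
print(cheapest_tier(120))  # tape
print(cheapest_tier(30))   # maid
print(cheapest_tier(5))    # disk
```

An application that can tolerate a 30-second wait lands on the MAID tier, which is the half-way house described above.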
“We can never predict the workload – science research varies from day to day,” he said, adding that the CSIRO was still expecting “new avalanches of data”. There was no real pattern controlling when scientists might need access to that data. “Uncontrolled workload is a difficult thing to do,” he said, adding that this was one area where corporates might have more predictability.