knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Server Errors.

Meanwhile, WCG have not said that they're now running regular scans for large directory-files that have a high proportion of entries for deleted files. They've added more RAM to the servers, but unless there's a lid on the size of the problem they may hit the RAM limit again at some stage.


The filesystem at this point contains about 13.5 million files in a directory structure that has two top-level directories, each with 1,024 subdirectories. This gives about 6,500 files per directory. We see a turnover of about 50-65% of the files every 2-3 days. It is a very volatile filesystem.
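A rough Python sketch of that layout, assuming a simple hash-based placement scheme; the place_file function and the file name below are hypothetical illustrations, not WCG's actual code:

    import hashlib
    import os

    TOP_DIRS = 2            # two top-level directories
    SUB_DIRS = 1024         # 1,024 subdirectories under each
    TOTAL_FILES = 13_500_000

    # Average leaf-directory population: ~6,590 files, i.e. the "about 6,500" above.
    print(TOTAL_FILES // (TOP_DIRS * SUB_DIRS))

    def place_file(name):
        # Hypothetical mapping of a file name to one of the 2,048 leaf directories.
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)
        top = h % TOP_DIRS
        sub = (h // TOP_DIRS) % SUB_DIRS
        return os.path.join(str(top), "%04d" % sub, name)

    print(place_file("result_12345.dat"))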

The issue with the large number of directory blocks being assigned to the directory inode occurs because once GPFS assigns blocks to a directory inode, they are never released, even if the number of files in the directory is significantly reduced. We had a bug last year that resulted in a very large number of files being created in the subdirectories (specifically, temp files were created that were not being deleted properly). This resulted in the directories having 60-70 thousand files in them, so GPFS assigned more blocks to store this data; some of the directory inodes had 14MB of blocks assigned.

Only about 1/4 of the subdirectories were impacted in this way, and based on our calculations, GPFS needed about 9GB of RAM to cache enough of the inodes to achieve optimal performance. We only had 2.0GB of RAM assigned to the cache, so performance was not optimal. Once we reduced the size of the directory inodes down to a maximum of 1MB, the cache only needs to be about 2.0GB to perform optimally. We added RAM to the servers so that we could increase the cache and still have additional RAM available. We are also adding monitoring to the servers so that we will automatically become aware if the size of the directory inodes grows excessively again.
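Those figures line up with some back-of-the-envelope arithmetic, assuming the unaffected three quarters of the directories sat near the 1MB mark (an assumption; only the 14MB and 1MB sizes are given above):

    LEAF_DIRS = 2 * 1024             # 2,048 leaf directories
    bloated = LEAF_DIRS // 4         # ~1/4 grew to ~14MB directory inodes
    normal = LEAF_DIRS - bloated     # assumed to stay near 1MB each

    before_gb = (bloated * 14 + normal * 1) / 1024.0   # ~8.5GB, roughly the 9GB quoted
    after_gb = (LEAF_DIRS * 1) / 1024.0                # ~2.0GB once capped at 1MB each
    print(before_gb, after_gb)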
[Sep 6, 2012 2:31:30 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Server Errors.

Thanks, KNReed: "We are also adding monitoring to the servers so that we will automatically become aware if size of the directory inodes grows excessively again."
That's what I was really asking.
It would be better if the monitoring & re-creation of directories were handled automatically by the filesystem driver software. That might avoid the need to take the system offline to do it manually. ( ==> IBM wishlist? )
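Until something like that lives in the filesystem itself, here is a minimal sketch of the kind of external check such monitoring might run (not WCG's or IBM's actual tooling; the mount point and the 1MB threshold are made up, and it assumes stat() on a leaf directory reflects the blocks assigned to its directory inode):

    import os
    import sys

    ROOT = "/gpfs/wcg"                  # hypothetical mount point
    THRESHOLD_BYTES = 1 * 1024 * 1024   # warn once a directory inode exceeds ~1MB

    def oversized_dirs(root):
        # Walk the two top-level directories and their leaf subdirectories,
        # yielding any leaf whose directory size has grown past the threshold.
        for top in sorted(os.listdir(root)):
            top_path = os.path.join(root, top)
            if not os.path.isdir(top_path):
                continue
            for sub in sorted(os.listdir(top_path)):
                sub_path = os.path.join(top_path, sub)
                if not os.path.isdir(sub_path):
                    continue
                size = os.stat(sub_path).st_size
                if size > THRESHOLD_BYTES:
                    yield sub_path, size

    if __name__ == "__main__":
        flagged = list(oversized_dirs(ROOT))
        for path, size in flagged:
            print("WARNING: %s directory inode is %.1f MB" % (path, size / 1048576.0))
        sys.exit(1 if flagged else 0)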

I haven't looked at *ix filesystems since the 16-bit days (e.g. Bell Labs UNIX v6). The directory behaviour that you describe is the way it was done back then, and it's probably been done the same way ever since.

On to the next WCG hurdle ... HCCGPU?
[Sep 7, 2012 4:40:05 AM]