knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Server Errors.

Meanwhile, WCG have not said that they're now running regular scans for large directory-files that have a high proportion of entries for deleted files. They've added more RAM to the servers, but unless there's a lid on the size of the problem they may hit the RAM limit again at some stage.


The filesystem at this point contains about 13.5 million files in a directory structure that has two top-level directories, each with 1,024 subdirectories. This gives about 6,500 files per directory. We see a turnover of about 50-65% of the files every 2-3 days. It is a very volatile filesystem.
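A rough Python sketch of that layout, assuming a simple hash-based placement scheme; the place_file function and the file name below are hypothetical illustrations, not WCG's actual code:

    import hashlib
    import os

    TOP_DIRS = 2            # two top-level directories
    SUB_DIRS = 1024         # 1,024 subdirectories under each
    TOTAL_FILES = 13_500_000

    # Average leaf-directory population: ~6,590 files, i.e. the "about 6,500" above.
    print(TOTAL_FILES // (TOP_DIRS * SUB_DIRS))

    def place_file(name):
        # Hypothetical mapping of a file name to one of the 2,048 leaf directories.
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)
        top = h % TOP_DIRS
        sub = (h // TOP_DIRS) % SUB_DIRS
        return os.path.join(str(top), "%04d" % sub, name)

    print(place_file("result_12345.dat"))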

The issue with the large number of directory blocks being assigned to the directory inode occurs because once GPFS assigns blocks to a directory inode, they are never released, even if the number of files in the directory is significantly reduced. We had a bug last year that resulted in a very large number of files being created in the subdirectories (specifically, temp files were created that were not being deleted properly). This resulted in the directories having 60-70 thousand files in them, so GPFS assigned more blocks to store this data; some of the directory inodes had 14MB of blocks assigned.

Only about 1/4 of the subdirectories were impacted in this way, and based on our calculations, GPFS needed about 9GB of RAM to cache enough of the inodes to achieve optimal performance. We only had 2.0GB of RAM assigned to the cache, so performance was not optimal. Once we reduced the size of the directory inodes down to a maximum of 1MB, the cache only needs to be about 2.0GB to perform optimally. We added RAM to the servers so that we could increase the cache and still have additional RAM available. We are also adding monitoring to the servers so that we will automatically become aware if the size of the directory inodes grows excessively again.
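Those figures line up with some back-of-the-envelope arithmetic, assuming the unaffected three quarters of the directories sat near the 1MB mark (an assumption; only the 14MB and 1MB sizes are given above):

    LEAF_DIRS = 2 * 1024             # 2,048 leaf directories
    bloated = LEAF_DIRS // 4         # ~1/4 grew to ~14MB directory inodes
    normal = LEAF_DIRS - bloated     # assumed to stay near 1MB each

    before_gb = (bloated * 14 + normal * 1) / 1024.0   # ~8.5GB, roughly the 9GB quoted
    after_gb = (LEAF_DIRS * 1) / 1024.0                # ~2.0GB once capped at 1MB each
    print(before_gb, after_gb)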
[Sep 6, 2012 2:31:30 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Server Errors.

Thanks, KNReed: "We are also adding monitoring to the servers so that we will automatically become aware if size of the directory inodes grows excessively again."
That's what I was really asking.
It would be better if the monitoring & re-creation of directories were handled automatically by the filesystem driver software. That might avoid the need to take the system offline to do it manually. ( ==> IBM wishlist? )
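Until something like that lives in the filesystem itself, here is a minimal sketch of the kind of external check such monitoring might run (not WCG's or IBM's actual tooling; the mount point and the 1MB threshold are made up, and it assumes stat() on a leaf directory reflects the blocks assigned to its directory inode):

    import os
    import sys

    ROOT = "/gpfs/wcg"                  # hypothetical mount point
    THRESHOLD_BYTES = 1 * 1024 * 1024   # warn once a directory inode exceeds ~1MB

    def oversized_dirs(root):
        # Walk the two top-level directories and their leaf subdirectories,
        # yielding any leaf whose directory size has grown past the threshold.
        for top in sorted(os.listdir(root)):
            top_path = os.path.join(root, top)
            if not os.path.isdir(top_path):
                continue
            for sub in sorted(os.listdir(top_path)):
                sub_path = os.path.join(top_path, sub)
                if not os.path.isdir(sub_path):
                    continue
                size = os.stat(sub_path).st_size
                if size > THRESHOLD_BYTES:
                    yield sub_path, size

    if __name__ == "__main__":
        flagged = list(oversized_dirs(ROOT))
        for path, size in flagged:
            print("WARNING: %s directory inode is %.1f MB" % (path, size / 1048576.0))
        sys.exit(1 if flagged else 0)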

I haven't looked at *ix filesystems since the 16-bit days (e.g. Bell Labs UNIX v6). The directory behaviour that you describe is the way it was done back then, and it's probably been done the same way ever since.

On to the next WCG hurdle ... HCCGPU?
[Sep 7, 2012 4:40:05 AM]