World Community Grid Forums
Category: Support Forum: BOINC Agent Support Thread: ubuntu fdupes crunching improvements |
Thread Status: Active Total posts in this thread: 28
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have seen a pretty big jump by running the fdupes program on the BOINC working filesystem tree for a mix of CEP2 and MD.
----------------------------------------edit: my tuning results thus far, with/without:

11/26/10  0:016:00:13:13  70,115   80
11/25/10  0:013:20:53:36  49,686   82
11/24/10  0:015:03:46:29  48,962  105
11/23/10  0:007:13:24:27  34,541   50
11/22/10  0:018:11:06:49  78,323  146
11/21/10  0:018:16:22:31  83,977   88
11/20/10  0:016:16:58:35  72,952   66
11/19/10  0:009:02:15:42  39,606   35

I would like it if someone can experiment with filesystem normalization like myself, to determine if there's a possible win on multicore systems. On Ubuntu, on a 12-thread i980X, I run the following:

sudo screen

The second line above could be tacked onto the run_client script handily for lights-out operation. What this does is hardlink redundant non-empty files in a sleep loop of 10 minutes. It is a fact that this reduces the storage size of my work slots to 2.8 GB, from whatever size (9+ GB) of CEP2 slots are open. This can be tested before and after with:

du -sh /var/lib/boinc-client

This reduction in allocated size reduces the load on the Linux filesystem buffer cache when providing the normalized filesystem nodes to multiple concurrent slots; without it, duplicate data is fetched from redundant blocks, over and over, for the redundant binary executable images in each slot. Put simply, we reduce page faulting and all kinds of context overhead by normalizing the data. The CPU L3 cache can also yield a higher hit rate on a uniform lineup of concurrent shared executable buffers: instead of fetching the code, the hardware can fit more loops into the cycles. I would like it if someone else gave it a try; my jaw dropped at the relative difference in the graph over a couple of days' sustained trial. [Edit 2 times, last edit by Former Member at Nov 27, 2010 3:23:53 AM] |
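The loop line itself did not survive the forum formatting; pieced together from the options the poster lists later in the thread (-L, -r, -n, -q, a 600-second sleep, bash syntax), it would look something like the sketch below. Treat it as a reconstruction, not the poster's verbatim command.

```shell
# Reconstruction of the 10-minute dedupe loop described above (bash/sh).
# Options per the follow-up post: -L hardlink dupes, -r recurse,
# -n skip empty files, -q quiet. Requires an fdupes build with -L,
# e.g. 1.50-PR2-3 from Maverick.
BOINC_DIR=/var/lib/boinc-client
dedupe_loop() {
  while true; do
    fdupes -Lrnq "$BOINC_DIR"
    sleep 600                  # re-check every 10 minutes
  done
}
# Inside "sudo screen": call dedupe_loop (it never returns), or append
# the loop body to the run_client script for lights-out operation.
```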
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
hi sqqqas,
----------------------------------------Will give it a try on Linux 2.6.32-26 plus some scheduling hacks on a Q6600... 6600 being, by coincidence, the same number as the total of files a CEP2 task slot exceeds. If you run 12 concurrent, I can see your disk glowing from afar. Presently, in a left-alone mode**, I achieve better than 96% efficiency with 2 concurrent CEP2 and either HCMD2/HFCC/C4CW on the other 2 cores. Notably, regardless of the large number of [soft] page faults, under W7-64 running a mix with 2 concurrent was hitting 99% efficiency according to BOINCTasks for full 8-10 hour runs, with barely any kernel time. If your solution gets rid of most of the PF delta, that could mean an interesting gain... but remain cautious that a valid-looking result might in fact not be OK; my impression is that the validators use rather fault-tolerant rules. Given that, you might want to send a note to support, f.a.o. armstrdj, listing which machines are set to do this, so they can target your results for extra inspection. It would be great if it's confirmed to work fine. cheers

** Xorg proved to be a major cycle eater when GUI gloss is enabled.
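For anyone wanting to watch the PF delta mentioned above on the Linux side, procps ps can report per-process fault counters directly; a generic sketch (the science-app process name you would query is an assumption):

```shell
# Page-fault counters on Linux via ps: min_flt (soft/minor faults) and
# maj_flt (hard/major faults) accumulated since the process started.
# $$ is this shell's own PID; for a science app, substitute a real PID,
# e.g. from "pgrep -f cep2" (process name here is an assumption).
faults=$(ps -o min_flt=,maj_flt= -p $$)
echo "$faults"
```

Sampling the same PID twice, some minutes apart, gives the delta the efficiency discussion is about.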
WCG Global & Research > Make Proposal Help: Start Here!
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Nov 26, 2010 12:20:03 PM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
BTW, we know about the duplicate download files, which it seemed, from a commentary by a WCG tech, were needed, but that was before Linux version 6.37 was released. It was observed on Windows. Don't know if the Windows crunchers still see the 2x 66 MB file being downloaded when newly starting the CEP2 application (after a reset, for instance).
----------------------------------------For my Linux, fdupes 1.50-PR2-2 build 1 is installed, but the command fails on the -Lrnq. Tried variations but could not get it to work, even after visiting http://www.cyberciti.biz/faq/linux-unix-finds...les-in-given-directories/ . I did try fdupes -r /var/lib/boinc-client, which found many duplicates, but those were like /slots1 with a copy in /slots2 etc. Don't think I'd want to delete anything unique on a per-slot basis until understanding exactly what's going on. A manual here, for others looking for some documentation: http://linux.die.net/man/1/fdupes

ps, is the correct line maybe one replacing the capital L with a 1, the 1rnq then doing -1 --sameline, -r --recurse, -n --noempty, -q --quiet, thus:

fdupes -1rnq /var/lib/boinc-client ;do sleep 600;done

Hoping to soon hear, to go try this out. Noobish fall, and stand up to fall again: seems my version has an issue with the semicolon. Should these be colons instead, as in:

fdupes -1rnq /var/lib/boinc-client :do sleep 600:done

Sorry for my ignorance
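For what it's worth, the ";do sleep 600;done" fragment fails in any shell because "do ... done" is only the body of a loop; bash needs a "while <condition>" (or "for") head in front of it, and the separators are indeed semicolons, never colons. A generic illustration of the same loop shape, with fast iterations instead of a 10-minute sleep:

```shell
# A bare ";do ...;done" is a syntax error: the loop needs a head.
# Same shape as the fdupes loop, but finite so it can run to completion:
i=0
while [ "$i" -lt 3 ]; do
  i=$((i + 1))
done
echo "$i"   # prints 3
```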
[Edit 1 times, last edit by Sekerob at Nov 26, 2010 12:13:57 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
apologies Sekerob
----------------------------------------The options I am using are specific: I'm on Ubuntu 10.10, which has a recent fdupes. The options I use:

-L  hardlink duplicates instead of deleting or symlinking them
-r  recurse
-q  quiet
-n  ignore empty files (it's a bad idea to link empty files together)

The syntax I posted was sh/bash shell syntax, not csh/ksh/zsh; just paste it into a bash shell as-is. I run "screen" because it's a workstation in use, so I don't need to spend any time writing a script to test a simple repeat loop of fdupes. [Edit 2 times, last edit by Former Member at Nov 26, 2010 12:58:23 PM] |
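What -L buys can be previewed with plain coreutils, no fdupes required: hardlinking makes two directory entries share one inode, so the data blocks are stored once. A small throwaway-file sketch:

```shell
# Demonstrate what hardlinking does: two names, one inode, one copy of data.
tmp=$(mktemp -d)
echo "duplicate payload" > "$tmp/a"
ln "$tmp/a" "$tmp/b"                   # hardlink instead of a second copy
links=$(stat -c %h "$tmp/a")           # link count is now 2
[ "$tmp/a" -ef "$tmp/b" ] && same=yes  # -ef: same device and inode
echo "$links $same"                    # prints "2 yes"
rm -rf "$tmp"
```

This is also why linking empty files together is a bad idea: every zero-length file would collapse into one inode, and a task writing to "its" empty file would be writing to everyone's.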
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
OK, yours is then probably 1.50-PR2-3, per https://launchpad.net/ubuntu/+source/fdupes/1.50-PR2-3 for Maverick Meerkat. I wanted to stick with Ubuntu 10.04.1 LTS for longer, but will likely make the jump if kernel 2.6.38 comes out.
----------------------------------------
|
||
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges: |
Like Sekerob, I have Ubuntu 10.04, so my available version of "fdupes" doesn't have the -L option.
----------------------------------------It does have the following option:

-H --hardlinks   normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior

Is that at all similar to -L? Is it worth trying? |
||
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges: |
Well that's interesting. I have an earlier kernel version than you, but the same fdupes version. And my fdupes has no -L.
----------------------------------------kate@system76-pc:~$ more /proc/version
Linux version 2.6.32-26-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #47-Ubuntu SMP Wed Nov 17 15:58:05 UTC 2010
kate@system76-pc:~$ fdupes --version
fdupes 1.50-PR2
kate@system76-pc:~$ fdupes -h
Usage: fdupes [options] DIRECTORY...

 -r --recurse     for every directory given follow subdirectories encountered within
 -R --recurse:    for each directory given after this option follow subdirectories encountered within
 -s --symlinks    follow symlinks
 -H --hardlinks   normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior
 -n --noempty     exclude zero-length files from consideration
 -A --nohidden    exclude hidden files from consideration
 -f --omitfirst   omit the first file in each set of matches
 -1 --sameline    list each set of matches on a single line
 -S --size        show size of duplicate files
 -m --summarize   summarize dupe information
 -q --quiet       hide progress indicator
 -d --delete      prompt user for files to preserve and delete all others; important: under particular circumstances, data may be lost when using this option together with -s or --symlinks, or when specifying a particular directory more than once; refer to the fdupes documentation for additional information
 -N --noprompt    together with --delete, preserve the first file in each set of duplicates and delete the rest without prompting the user
 -v --version     display fdupes version
 -h --help        display this help message |
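The help text above shows this build can only detect existing hardlinks (-H), not create them (no -L). The detection side of what fdupes does, grouping files by content hash and reporting collisions, can be mimicked with coreutils alone; a miniature sketch on throwaway files:

```shell
# What fdupes does in essence: hash every file, report hash collisions.
# md5 hex digests are 32 chars, hence uniq -w32; -D prints all repeats.
tmp=$(mktemp -d)
echo "same bytes"  > "$tmp/x"
echo "same bytes"  > "$tmp/y"
echo "other bytes" > "$tmp/z"
dupes=$(find "$tmp" -type f -exec md5sum {} + | sort | uniq -w32 -D | wc -l)
echo "$dupes"   # x and y collide, so 2 duplicate lines are reported
rm -rf "$tmp"
```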
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
kateiacy,
----------------------------------------it's the 1.50-PR2-3 version you'll need. NOT tested on Lucid; get it here: http://packages.debian.org/sid/fdupes as a .deb package for your system's CPU:

https://launchpad.net/ubuntu/maverick/amd64/fdupes/1.50-PR2-3
https://launchpad.net/ubuntu/maverick/i386/fdupes/1.50-PR2-3

Don't know what the library dependencies are... living dangerously, will dig for some backport. :P

edit: corrected the amd64 link, and to confirm that this version works on Lucid with the -L switch.
[Edit 3 times, last edit by Sekerob at Nov 27, 2010 10:19:34 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It's not obvious, but the fslint GUI also has a utility that does something similar:
----------------------------------------[edit: tested], and this one has finer-grained inclusion/exclusion options, so we could exclude possibly volatile slot files from deduplication (an important ongoing consideration):

/usr/share/fslint/fslint/findup -m /var/lib/boinc-client -size +2

[edit:] it takes 4x the wall clock of fdupes; that's enough of a metric for me to avoid it. [Edit 1 times, last edit by Former Member at Nov 27, 2010 1:19:22 AM] |
||