Thread Status: Active
Total posts in this thread: 28
This topic has been viewed 15302 times and has 27 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
ubuntu fdupes crunching improvements

I have seen a pretty big jump by running the fdupes program on the boinc working filesystem tree for a mix of cep2 and MD.

edit: my tuning results thus far with/without:


11/26/10 0:016:00:13:13 70,115 80
11/25/10 0:013:20:53:36 49,686 82
11/24/10 0:015:03:46:29 48,962 105
11/23/10 0:007:13:24:27 34,541 50
11/22/10 0:018:11:06:49 78,323 146
11/21/10 0:018:16:22:31 83,977 88
11/20/10 0:016:16:58:35 72,952 66

11/19/10 0:009:02:15:42 39,606 35



i would like someone else to experiment with filesystem normalization, as i have, to determine if there's a possible win on multicore systems.

on ubuntu on a 12 thread i980x i run the following

sudo screen 
while fdupes -Lrnq /var/lib/boinc-client ;do sleep 600;done

the second line above could handily be tacked on to the run_client script for lights-out operation.

what this does is hardlink redundant non-empty files in a sleep loop of 10 minutes.
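
For anyone unsure what hardlinking duplicates means in practice, here is a minimal sketch that does by hand, for a single pair of files, roughly what the -L option is described as doing. It assumes GNU coreutils (stat -c) and a throwaway temp directory; the file names are made up for illustration and are not real BOINC slot files.

```shell
#!/bin/sh
# Illustration only: hardlink one pair of identical files by hand.
# All paths are temporary and hypothetical, not taken from a BOINC install.
demo_dir=$(mktemp -d)
printf 'same payload\n' > "$demo_dir/slot0_file"
cp "$demo_dir/slot0_file" "$demo_dir/slot1_file"

# Before: two separate inodes holding identical content.
inode_a_before=$(stat -c %i "$demo_dir/slot0_file")
inode_b_before=$(stat -c %i "$demo_dir/slot1_file")

# Replace the second copy with a hard link to the first, but only
# if the two files really are byte-for-byte identical.
if cmp -s "$demo_dir/slot0_file" "$demo_dir/slot1_file"; then
    ln -f "$demo_dir/slot0_file" "$demo_dir/slot1_file"
fi

# After: both names point at one inode, and its link count is 2.
inode_a=$(stat -c %i "$demo_dir/slot0_file")
inode_b=$(stat -c %i "$demo_dir/slot1_file")
link_count=$(stat -c %h "$demo_dir/slot0_file")

echo "before: $inode_a_before $inode_b_before"
echo "after:  $inode_a $inode_b (links: $link_count)"
rm -rf "$demo_dir"
```

Hard links only work within one filesystem, so this assumes all the slot directories live on the same volume.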

on my system this reduces the storage size of my work slots to 2.8 GB, regardless of how large (9+ GB) the open cep2 slots were beforehand.

this can be tested before and after with

du -sh  /var/lib/boinc-client
fdupes -Lrnq /var/lib/boinc-client
du -sh /var/lib/boinc-client
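
The same before/after check can be wrapped in a small helper that prints the difference in KiB. This is only a sketch: the function name measure_savings is invented here, and it takes the dedupe command as arguments so it can be tried on a scratch directory before pointing it at the real one.

```shell
#!/bin/sh
# measure_savings DIR CMD [ARGS...]
# Hypothetical helper: report disk usage of DIR before and after running CMD.
# du counts a hardlinked file once, so linking duplicates lowers the total.
measure_savings() {
    ms_dir=$1
    shift
    ms_before=$(du -sk "$ms_dir" | cut -f1)
    "$@"    # e.g. fdupes -Lrnq "$ms_dir"
    ms_after=$(du -sk "$ms_dir" | cut -f1)
    echo "before=${ms_before}K after=${ms_after}K saved=$((ms_before - ms_after))K"
}

# Example call, using the path from this thread (run with enough privileges):
#   measure_savings /var/lib/boinc-client fdupes -Lrnq /var/lib/boinc-client
```

Using du -sk instead of du -sh keeps the numbers in plain KiB so they can be subtracted.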



this reduction in allocated size reduces the load on the linux filesystem buffer cache. once duplicate files are hardlinked, multiple concurrent slots share the same filesystem nodes, so the kernel no longer fetches duplicate data from redundant blocks, over and over, for the redundant binary executable images in each slot.

put simply, we reduce page faulting and all kinds of context overhead by normalizing data. as i understand it, the cpu l3 cache can yield a higher hit rate on a uniform lineup of concurrently shared executable buffers, so instead of fetching code, the hardware can fit more loops into the cycles.

i would like someone else to give it a try. My jaw dropped at the relative difference in the graph over a couple of days of sustained trial.
----------------------------------------
[Edit 2 times, last edit by Former Member at Nov 27, 2010 3:23:53 AM]
[Nov 26, 2010 4:14:36 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: ubuntu fdupes crunching improvements

hi sqqqas,

Will give it a try on Linux 2.6.32.26 + some scheduling hacks on a Q6600... 6600 being, by coincidence, the number of files that a CEP2 task slot exceeds. If you run 12 concurrent, I can see your disk glowing from afar. Presently, in a left-alone mode **, I achieve better than 96% efficiency with 2 concurrent and either HCMD2/HFCC/C4CW on the other 2 cores.

Notably, regardless of the large amount of [soft] page faulting, under W7-64 a mix with 2 concurrent was hitting 99% efficiency according to BOINCTasks for full 8-10 hour runs, with barely any kernel time. If your solution gets rid of most of the PF Delta, that could mean some interesting gain... but remain cautious: a result marked valid might in fact not be OK. The validators, is my impression, use rather fault-tolerant rules. Given that, you might want to send a note to support, f.a.o. armstrdj, saying which machines are set up to do this so they can target your results for extra inspection. It would be great if it is confirmed to work fine.

cheers

** Xorg proved to be a major cycle eater when GUI gloss is
cheers
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Nov 26, 2010 12:20:03 PM]
[Nov 26, 2010 11:19:14 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: ubuntu fdupes crunching improvements

BTW, we know about the duplicate download files, which, judging from a comment by a WCG tech, seemed to be needed, but that was before Linux version 6.37 was released. It was observed for Windows. Don't know if the Windows crunchers still see the 2x 66MB file being downloaded when newly starting the CEP2 application (post reset, for instance).

For my Linux, fdupes 1.50-PR2-2 build 1 is installed, but the command fails on the -Lrnq. Tried variations but could not get it to work, even after visiting http://www.cyberciti.biz/faq/linux-unix-finds...les-in-given-directories/ . Did try fdupes -r /var/lib/boinc-client, which found many duplicates, but those were like /slots1 with a copy in /slots2 etc. Don't think I'd want to delete anything unique on a per-slot basis until I understand exactly what's going on.

A manual here, for others looking for some documentation: http://linux.die.net/man/1/fdupes

ps, is the correct line maybe one that replaces the capital L with 1, giving -1rnq, which then does:

-1 --sameline
-r --recurse
-n --noempty
-q --quiet

thus

fdupes -1rnq /var/lib/boinc-client ;do sleep 600;done

hoping to soon hear, to go try this out.

noobish fall and standup to fall again, seems my version has an issue with the semicolon. Should these be colon instead as in:

fdupes -1rnq /var/lib/boinc-client :do sleep 600:done

Sorry for my ignorance.
----------------------------------------
[Edit 1 times, last edit by Sekerob at Nov 26, 2010 12:13:57 PM]
[Nov 26, 2010 12:07:15 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: ubuntu fdupes crunching improvements

apologies Sekerob

the options i am using are specific; i'm using ubuntu 10.10, which has a recent fdupes

opts i use

L - hardlink instead of delete or symlink
r - recurse
q - quiet
n - ignore empty files, bad idea to link empty files together.

the syntax i posted is sh/bash (Bourne-style) shell syntax, not csh/tcsh; just paste it into a bash shell as is.

i run "screen" because it's a workstation in use, so i don't need to spend any time writing a script to test a simple repeat loop of fdupes
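
For readers who would rather have a script than a pasted one-liner, here is a rough sketch of the same loop spelled out. The names BOINC_DIR, INTERVAL and dedupe_loop are invented for this example; the defaults simply mirror the path and the 600 second sleep posted above.

```shell
#!/bin/sh
# Sketch of the posted one-liner as a reusable function.
# BOINC_DIR and INTERVAL are illustrative names, overridable from the environment.
BOINC_DIR=${BOINC_DIR:-/var/lib/boinc-client}
INTERVAL=${INTERVAL:-600}

dedupe_loop() {
    # Keep going only while fdupes succeeds, exactly like the original
    # `while ...; do sleep 600; done`: a failing run ends the loop.
    while fdupes -Lrnq "$BOINC_DIR"; do
        sleep "$INTERVAL"
    done
}

# To start it, call the function from a root shell (or under screen):
#   dedupe_loop
```

The `while cmd; do ...; done` form is plain POSIX shell, which is why it works when pasted into bash but not into csh.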
----------------------------------------
[Edit 2 times, last edit by Former Member at Nov 26, 2010 12:58:23 PM]
[Nov 26, 2010 12:53:06 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: ubuntu fdupes crunching improvements

OK, yours is then probably 1.50-PR2-3 per https://launchpad.net/ubuntu/+source/fdupes/1.50-PR2-3 for Maverick Meerkat. Wanted to stick with Ubuntu 10.04.1 LTS for longer, but will likely make the jump up if kernel 2.6.38 comes out.
----------------------------------------
[Nov 26, 2010 1:06:17 PM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: ubuntu fdupes crunching improvements

Like Sekerob, I have Ubuntu 10.04 so my available version of "fdupes" doesn't have the -L option.

It does have the following option:

-H --hardlinks normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior

Is that anything similar to the -L? Is it worth trying?
----------------------------------------

[Nov 26, 2010 1:56:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: ubuntu fdupes crunching improvements

root@i980xxl:/var/lib/boinc-client# while fdupes -Lrqn .;do sleep 600;done


root@i980xxl:/var/lib/boinc-client# fdupes --help

Usage: fdupes [options] DIRECTORY...

-r --recurse for every directory given follow subdirectories
encountered within
-R --recurse: for each directory given after this option follow
subdirectories encountered within (note the ':' at
the end of the option, manpage for more details)
-s --symlinks follow symlinks
-H --hardlinks normally, when two or more files point to the same
disk area they are treated as non-duplicates; this
option will change this behavior
-n --noempty exclude zero-length files from consideration
-A --nohidden exclude hidden files from consideration
-f --omitfirst omit the first file in each set of matches
-1 --sameline list each set of matches on a single line
-S --size show size of duplicate files
-m --summarize summarize dupe information
-q --quiet hide progress indicator
-d --delete prompt user for files to preserve and delete all
others; important: under particular circumstances,
data may be lost when using this option together
with -s or --symlinks, or when specifying a
particular directory more than once; refer to the
fdupes documentation for additional information
-L --linkhard hardlink duplicate files to the first file in
each set of duplicates without prompting the user
-N --noprompt together with --delete, preserve the first file in
each set of duplicates and delete the rest without
without prompting the user
-D --debug enable debugging information
each set of duplicates without prompting the user
-v --version display fdupes version
-h --help display this help message

root@i980xxl:/var/lib/boinc-client# fdupes --version
fdupes 1.50-PR2
root@i980xxl:/var/lib/boinc-client# uname -a
Linux i980xxl 2.6.35-23-generic #40-Ubuntu SMP Wed Nov 17 22:14:33 UTC 2010 x86_64 GNU/Linux

[Nov 26, 2010 2:04:28 PM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: ubuntu fdupes crunching improvements

Well, that's interesting. I have an earlier kernel version than you, but the same fdupes version. And my fdupes has no -L.

kate@system76-pc:~$ more /proc/version
Linux version 2.6.32-26-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #47-Ubuntu SMP Wed Nov 17 15:58:05 UTC 2010

kate@system76-pc:~$ fdupes --version
fdupes 1.50-PR2

kate@system76-pc:~$ fdupes -h
Usage: fdupes [options] DIRECTORY...
-r --recurse for every directory given follow subdirectories encountered within
-R --recurse: for each directory given after this option follow subdirectories encountered within
-s --symlinks follow symlinks
-H --hardlinks normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior
-n --noempty exclude zero-length files from consideration
-A --nohidden exclude hidden files from consideration
-f --omitfirst omit the first file in each set of matches
-1 --sameline list each set of matches on a single line
-S --size show size of duplicate files
-m --summarize summarize dupe information
-q --quiet hide progress indicator
-d --delete prompt user for files to preserve and delete all others; important: under particular circumstances, data may be lost when using this option together with -s or --symlinks, or when specifying a particular directory more than once; refer to the fdupes documentation for additional information
-N --noprompt together with --delete, preserve the first file in each set of duplicates and delete the rest without without prompting the user
-v --version display fdupes version
-h --help display this help message
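
Since both machines above report the same fdupes 1.50-PR2 version string yet differ on -L, one hedged way to check a given install is to look for --linkhard in its help text rather than trusting the version number. A small sketch; the function name is made up here:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the installed fdupes lists
# --linkhard (the long form of -L) in its own help output.
fdupes_supports_hardlink() {
    fdupes --help 2>&1 | grep -q -e '--linkhard'
}

# Example use:
#   if fdupes_supports_hardlink; then
#       echo "this fdupes has -L"
#   else
#       echo "this fdupes lacks -L"
#   fi
```

This only tells you whether the option is advertised, not whether it behaves the same across builds.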
----------------------------------------

[Nov 26, 2010 2:26:23 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: ubuntu fdupes crunching improvements

kateiacy,

it's the 1.50-PR2-3 version you'll need. NOT tested for Lucid, get it here: http://packages.debian.org/sid/fdupes as a .deb package for the CPU architecture of your system.

https://launchpad.net/ubuntu/maverick/amd64/fdupes/1.50-PR2-3

https://launchpad.net/ubuntu/maverick/i386/fdupes/1.50-PR2-3

Don't know what the library dependencies are... living dangerously, will dig for some backport. :P

edit: corrected the amd64 link, and to confirm that this version works on Lucid with the -L switch.
----------------------------------------
[Edit 3 times, last edit by Sekerob at Nov 27, 2010 10:19:34 AM]
[Nov 26, 2010 2:58:28 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: ubuntu fdupes crunching improvements

it's not obvious, but the fslint gui also has a util to do something similar:

[edit: tested] it offers finer-grained inclusion/exclusion options, so that we can collect potentially volatile slot files and keep them out of the linking (important ongoing consideration)



/usr/share/fslint/fslint/findup -m /var/lib/boinc-client -size +2

[edit:] it takes 4x the wall-clock time of fdupes. that's enough of a metric for me to avoid this.
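
The worry about volatile slot files can be shown with a tiny experiment. This is a sketch on made-up file names in a temp directory, assuming GNU coreutils, and it says nothing about how the BOINC applications actually write their files: a program that appends in place changes the data seen through every linked name, while a program that writes a new file and renames it over the old name ends up with its own private copy again.

```shell
#!/bin/sh
# Illustration only: what happens to hardlinked files when one "slot" writes.
demo_dir=$(mktemp -d)
printf 'original\n' > "$demo_dir/slot0_state"
ln "$demo_dir/slot0_state" "$demo_dir/slot1_state"

# Case 1: in-place append through one name is visible through the other.
printf 'written by slot1\n' >> "$demo_dir/slot1_state"
lines_seen_by_slot0=$(wc -l < "$demo_dir/slot0_state")

# Case 2: write-new-then-rename replaces only that name, breaking the link.
printf 'fresh\n' > "$demo_dir/tmp_new"
mv "$demo_dir/tmp_new" "$demo_dir/slot1_state"
inode_slot0=$(stat -c %i "$demo_dir/slot0_state")
inode_slot1=$(stat -c %i "$demo_dir/slot1_state")

echo "slot0 sees $lines_seen_by_slot0 lines after slot1 appended"
echo "inodes after rename: $inode_slot0 $inode_slot1"
rm -rf "$demo_dir"
```

Which of the two cases applies to a given slot file depends on the application, which is why excluding files that change during a run seems the cautious choice.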
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 27, 2010 1:19:22 AM]
[Nov 27, 2010 12:25:38 AM]