World Community Grid Forums
Category: Support Forum: BOINC Agent Support Thread: BOINC 6.10 Alpha Testing
Thread Status: Active Total posts in this thread: 70
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
BOINC tells me:
Sat 26 Sep 2009 02:48:42 AM EDT||A new version of BOINC (6.6.40) is available for your computer
Sat 26 Sep 2009 02:48:42 AM EDT||Visit http://boinc.berkeley.edu/download.php to get it.
This is on Linux, so should I try 6.10 for Linux (I'm waiting for FLU to run dry first), or go to the 6.6.40 it's telling me to download? Thanks!
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
6.6 is passé, but I take it that 6.6.40 was rapidly elevated to 'recommended', though it has barely been out a few days and I see it listed only for Linux!
If you wish to spend time on testing and reporting on WCG science-related issues, you're most welcome. For now I've just picked up the latest 6.10.9 alpha. There are still problems with small versus large project weights, where the small projects have short run times and short deadlines, though the buffer size is shorter. They take way over their share of resource time, jumping the queue and continuing to pull work while going deep into long/short-term debt, so be warned that hands-on management is needed. These shorties are now taking all 4 cores of the quad, basically blocking out WCG, with several running in high priority. It makes no sense when scheduled contact is daily, i.e. once more I'm disallowing all but WCG to continue the longer-running stability testing.
edit: Numbers: the short-running project has a 0.5% resource share and has taken 4.5 hours so far, with one last job probably finishing in another 3 hours. The switch time is longer than the longest job run time, so all jobs can finish in one run without loitering in memory (with LAIM, "leave applications in memory", on). The presumed scheduler conflict is still there, and it was reported at Berkeley during the 6.6 pre-recommended period: small resource shares cause panic. But why more work is pulled has never been answered, even though the project is in deep debt, 15,500 seconds now. I'm sure there's an equation for it, but that's not of interest. What's of interest is that a user needs a stable client with scheduler logic a human can comprehend, without needing to take a 3-year degree. No, it's not ready.
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Sep 26, 2009 10:38:19 AM]
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974
Quoting Sekerob: "6.6 is passé, but I take it that 6.6.40 found a rapid elevation to 'recommended' though it's been barely out a few days and see it only listed for Linux!!."
v6.6.40 is Linux-only; it's doubtful there will be a Windows release, since v6.6.38 was recently elevated to "recommended".
Quoting Sekerob: "If you wish to spend time on testing and reporting on WCG sciences related issues you're most welcome. For now I've just picked up latest 6.10.9 alpha. [...] No it's not ready."
Until it's announced, v6.10.9 is pre-alpha. Since they've apparently already fixed a crashing bug with GPUs, the next alpha build after v6.10.7 will possibly be v6.10.10.
As for your scheduling problem...
For already-downloaded work, a project with a very low resource share will have a very high estimated run time. For example, a 1-hour CPU task with a 1% resource share has an estimated run time of 100 hours, so if the deadline is 4 days (96 hours), this task obviously can't run purely by resource share in round-robin, but must run in "high priority". Why not wait until 2 hours before the deadline to run it? That's a logical question, and it would be an option, but it has some disadvantages, like a bigger chance of missing the deadline, and the work being unnecessarily delayed for a long time before it's returned. In either case, the downloaded low-resource-share task will take 1 hour out of 96 from the other projects, so running it immediately after download or later makes little difference, since the client must run the task either way...
For work requests, on the other hand... AFAIK due to complaints from dial-up users who didn't get enough work to last until their next connection (only their positive-LTD projects were contacted, and these didn't supply enough), the work-request scheduler was changed in v6.6.xx. AFAIK the rules are basically:
1: If total cached work for a resource type < "Connected..." => start with the contactable project with the highest LTD, and contact projects in order of decreasing LTD until enough is cached.
2: If total cached work for a resource type > "Connected..." but < "Connected..." + "Additional..." => contact any contactable zero-LTD projects until enough is cached.
I'm not sure whether the rule "-LTD < "Switch between applications..." => ask for work" is still present in the v6.6.xx scheduling or not... Rule #1 is a major change from the v5.10.xx scheduler, but was made because dial-up users weren't getting enough work. Anyone who is permanently connected is highly recommended to use "Connected..." = zero; this is also the BOINC default.
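A minimal Python sketch of the two mechanisms described above, under simplifying assumptions: a single CPU, strict round-robin by resource share, and LTD normalised so that the highest value is zero (as this thread describes v6.6.xx). The function names, the `Project` class and the day-based `connect_days` / `additional_days` parameters are hypothetical stand-ins for the "Connected..." and "Additional..." preferences; this is not the BOINC client's code.

```python
from dataclasses import dataclass


def round_robin_hours(cpu_hours: float, share_fraction: float) -> float:
    """Wall-clock hours a task needs if its project only ever gets
    `share_fraction` of one CPU (e.g. 0.01 for a 1% resource share)."""
    if not 0 < share_fraction <= 1:
        raise ValueError("share_fraction must be in (0, 1]")
    return cpu_hours / share_fraction


def needs_high_priority(cpu_hours: float, share_fraction: float,
                        deadline_hours: float) -> bool:
    """True if plain round-robin by resource share would miss the deadline."""
    return round_robin_hours(cpu_hours, share_fraction) > deadline_hours


@dataclass
class Project:
    name: str
    ltd: float               # long-term debt in seconds; 0 is the highest value
    contactable: bool = True


def projects_to_ask(projects, cached_days, connect_days, additional_days):
    """Order in which projects would be asked for one resource type under
    rules 1 and 2 above. The caller stops once enough work is cached;
    how much each project actually sends is not modelled."""
    candidates = [p for p in projects if p.contactable]
    if cached_days < connect_days:
        # Rule 1: below "Connected...": any contactable project, highest LTD first.
        return [p.name for p in sorted(candidates, key=lambda p: p.ltd, reverse=True)]
    if cached_days < connect_days + additional_days:
        # Rule 2: between "Connected..." and "Connected..." + "Additional...":
        # only projects whose LTD is still zero.
        return [p.name for p in candidates if p.ltd >= 0]
    return []  # buffer full: no work request


if __name__ == "__main__":
    # The example above: 1 CPU-hour, 1% share, 4-day (96-hour) deadline.
    print(round_robin_hours(1.0, 0.01), needs_high_priority(1.0, 0.01, 96))

    projects = [Project("WCG", 0.0), Project("Tiny", -15_500.0)]
    # "Connected..." = 0 (the default): rule 1 can never fire, only rule 2.
    print(projects_to_ask(projects, cached_days=0.5, connect_days=0, additional_days=1))
```

With "Connected..." at zero, the cached amount can never be below it, which is why a permanently connected host only ever sees rule #2.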
So, apart from dial-up users and anyone else with a non-permanent connection, clients will use rule #2, which is fairly similar to the old v5.10.xx rule, except that there LTD could also be positive. So... is BOINC not following these rules, and asking for work when it shouldn't? No idea, but if you think you've detected a bug, please enable <work_fetch_debug>, capture a relevant log showing this, and post it in the correct location. "I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
Back to 6.2.28, which was still on my disk ready to run, until the next test release shows up in the alpha announcement list. 6.10.9 was listed there, at least when I saw it, which was my cue to try it... so I'm not sure how it can be qualified as pre-alpha. http://boinc.berkeley.edu/dev/forum_thread.php?id=2518
This thread, here, is the right location, so that members here can read about the pitfalls of buggy alphas, with old bugs, I might say, as the behavior has not improved since 6.6, when it was reported and discussed all through the official 'recommended' releases, which I trust can be discussed on the Berkeley forums. HP (high priority) problem and excess work allocation to projects with small resource shares fixed? No!
[Edit 1 times, last edit by Sekerob at Sep 26, 2009 2:36:08 PM]
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
I fell off my chair, breaking the safety belt. The first job paused at 48 minutes done (43%), with -41,419 debt. Unacceptable, if this is supposed to be proper resource allocation. 41,419 seconds is the equivalent of 12% of one day of quad crunching, not 0.5% (0.47% really), so in effect BOINC has advanced this project a factor of 24 beyond its designated rights. If BOINC is 'a slow learner', I'd consider that an understatement. Voted worst 'feature', if this is by design. My reading is that complaining dial-up users do get a bigger hearing.
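Those figures check out: a quad offers 4 × 86,400 = 345,600 CPU-seconds per day, so 41,419 seconds of debt is about 12% of one day, roughly 24 times a 0.5% share (about 25.5 times against 0.47%). A short Python check; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def debt_fraction_of_day(debt_seconds: float, ncpus: int) -> float:
    """Debt expressed as a fraction of one day of crunching on `ncpus` cores."""
    return abs(debt_seconds) / (ncpus * SECONDS_PER_DAY)


if __name__ == "__main__":
    frac = debt_fraction_of_day(-41_419, ncpus=4)
    print(f"{frac:.1%} of a quad-day")
    for share in (0.005, 0.0047):
        print(f"vs. a {share:.2%} share: factor {frac / share:.1f}")
```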
Just fired up 6.2.28 again.
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974
Quoting Sekerob: "I fell off my chair, breaking the safety belt. [...] Just fired up 6.2.28 again."
v6.2.xx doesn't work the same way as v6.6.xx when it comes to scheduling and work requests, so each time you downgrade, v6.6.xx and later clients will need some time to re-adjust when you upgrade again... As for your misbehaving client, I haven't looked at HP, so apart from knowing that a 4-day CPDN task won't sit for 300+ days before it starts running HP, and instead probably runs HP fairly soon after download if CPDN has a very low resource share... Not tested, but it's possible it will spread the running out across the whole year, so while it starts off in HP, it will pause frequently and other work can run even when it isn't running HP...
Long-term debt: it's this feature I've mainly looked at, and a run with <debt_debug> on can be informative, and also shows that user choices can have "unforeseen consequences". To make things easy, I suspended all projects except 2, and this revealed the following rule:
#1: Any project that is "suspended", or set to "no new work", is ineligible, and its LTD won't change.
So, if you suspended some projects or set them to "no new work", your 0.5% resource-share project got a higher effective resource share. Also, if you set that project itself to "no new work", its LTD wouldn't continue dropping either.
Work fetch: I haven't looked much at this, but it didn't work exactly as expected when it came to how "Additional..." was handled.
But it did reveal an interesting rule:
#2: Contactable projects are contacted in decreasing LTD order, as long as -LTD < #cpus * ("Connected..." + "Additional..."), with the cache settings converted from days to seconds.
So, for a quad you'll get a table like this:
cache: LTD
So, the larger your cache size, the more you're likely to crunch your low-resource-share project initially... But since its LTD immediately starts to drop once it starts crunching, as long as your zero-LTD project is contactable, the low-resource-share project won't be asked to refill again. The problem is that projects either don't respond, or respond but don't completely fill the cache, so the client must defer before they are contactable again... In any case, with a 2-project setup, the LTD for the very-low-resource-share project should drop, while the other's should increase but can't, since it's already zero; so the LTD will drop by 8 seconds per second if it runs 4 threads on a quad. So you'll only need to run for 1/2 day with a 1-day cache setting, and so on.
Probably the best rule when it comes to BOINC is: don't micro-manage, but let the BOINC client handle things itself, after you've configured your resource shares and set your cache size. Another good rule would be: let the BOINC client run on its own for at least a week before changing anything.
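The quad thresholds follow directly from rule #2, and the "8 seconds per second" from the two-project normalisation described above. A hedged Python sketch: it assumes the cache preferences are in days and LTD in seconds, and models the LTD update in a simplified way (credit by resource share, debit by CPU time used, then shift everything so the top project sits at zero). Names are hypothetical; this is not BOINC's accounting code.

```python
SECONDS_PER_DAY = 86_400


def ltd_limit_seconds(ncpus, connect_days, additional_days):
    """Rule #2 as observed: a project is still contacted while -LTD is below this."""
    return ncpus * (connect_days + additional_days) * SECONDS_PER_DAY


def ltd_step(ltds, shares, cpu_seconds, wall_seconds, ncpus):
    """One simplified LTD update: credit each project its share of the
    available CPU time, debit what it actually used, then shift all values
    so that the highest LTD is zero."""
    available = ncpus * wall_seconds
    total_share = sum(shares.values())
    new = {p: ltds[p] + available * shares[p] / total_share - cpu_seconds.get(p, 0.0)
           for p in ltds}
    top = max(new.values())
    return {p: v - top for p, v in new.items()}


if __name__ == "__main__":
    # Quad, tiny project (0.5% share) running all 4 threads for one second.
    after = ltd_step({"WCG": 0.0, "Tiny": 0.0}, {"WCG": 995, "Tiny": 5},
                     {"Tiny": 4.0}, wall_seconds=1.0, ncpus=4)
    drop_per_second = -after["Tiny"]          # just under 8 s/s
    for cache_days in (0.25, 0.5, 1, 2):
        limit = ltd_limit_seconds(4, 0, cache_days)
        hours = limit / drop_per_second / 3600
        print(f"cache {cache_days} d: limit {limit:,.0f} s, reached after ~{hours:.1f} h")
```

For a 1-day cache the limit is 345,600 s, which at 8 s/s is reached after 43,200 s, i.e. the half day mentioned above.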
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
It seems that the upgrade to 6.10.10 has done things not documented in the alpha change log on the Berkeley forum... Whatever they did, there is improvement, but it goes back to my oft-repeated point: "expect the unexpected, including changes without notice".
Observationally, it's less aggressive in raising the rDCF when a single job for the same project ran longer than the mean. The <dcf_debug> log flag is quite helpful in documenting that. While the very-low-weighted project has built up -78,000 LTD (rights for 150 days), I helped it out of its misery, and mine, by ending that project's work fetch. It's now finally getting a limited, yet still way over the top, 30 minutes per day (the switch time) and probably needs 1 more day, but I expect panic to hit anyway, as it's due Oct 1, early in the morning. The 0.47% share shown in the interface is actually, according to the calculator, 0.596%, given 3 out of 503 total resource share for active projects... a tiny fractional difference, but I knew that already. Still, what do I know about "don't micro-manage", when I want to understand whether this client will finally perform and not short-change WCG when I'm not there to do that very thing.
There's an interesting option mentioned in this latest version, where users can control per project whether NVIDIA/ATI/CPU use is permitted for work fetch. I haven't seen it in the interface yet; my interpretation is that since WCG now reports it has none (no GPU apps), it isn't shown. More config-flag indicators in the startup log will most certainly help support diagnose problem reports. I wish there were a report saying there was no config file, or one listing it in full, not only 'a couple more'. ttyl (doing 6.10 now for a week or longer, uninterrupted, and work remains 100% valid for WCG... the most important part of all.)
Change Log 6.10.10
- client: fix crashing bug in GPU message display
- client: show a couple more config flags on startup
- client: fix bug in CPU prefs enforcement: enforce "suspend if no recent input" and "exclusive apps" only if overall mode is RUN_MODE_AUTO (run according to prefs)
- client/scheduler/web: add per-project preferences for whether to accept CPU, NVIDIA and ATI jobs. These prefs are shown only where relevant: e.g., only for processor types for which the project has app versions, and if it has versions for only one type, no pref is shown. These prefs affect both client and scheduler. The client won't ask for work for a device blocked by prefs, and the scheduler won't send it. This replaces earlier optional project-specific prefs for "no CPU jobs" and "no GPU jobs". (However, these prefs continue to be honored on the server side).
- client: if NVIDIA driver is unknown, say that rather than 0
- client: add missing Windows SKUs. From Robert Kreß
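The visibility and work-request rules in the per-project preferences entry can be restated as a small Python sketch. It only mirrors the change-log wording; the function names are hypothetical and the real client and scheduler implement this differently. It also fits the reading above: a project with CPU app versions only gets no device pref at all.

```python
def device_prefs_shown(app_version_types):
    """Device types a per-project pref is offered for: only types the
    project has app versions for, and none if there is only one type."""
    types = set(app_version_types)
    return types if len(types) > 1 else set()


def may_request_work(device, blocked_devices):
    """The client doesn't ask for work for a device blocked by the prefs
    (per the change log, the scheduler won't send it either)."""
    return device not in blocked_devices


if __name__ == "__main__":
    print(device_prefs_shown(["CPU"]))                     # CPU-only project
    print(sorted(device_prefs_shown(["CPU", "NVIDIA"])))
    print(may_request_work("NVIDIA", {"NVIDIA"}))
```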
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974
Quoting Sekerob: "Still how little do I know about "don't micro-manage" yet wanting to understand if this client will finally produce and not short change WCG when I'm not there to do the very thing."
You won't get any idea whether the BOINC client works as it should or not if you set "no new work", since then LTD isn't updated for the affected project... It will also be impossible to fix any bugs, if there are any, in how low-resource-share projects get their LTD updated, if no one lets their BOINC client do its job of trying to balance work requests according to the user's resource shares. If you detect any irregularities, a log with <debt_debug> showing this will be very useful.
Quoting Sekerob: "(doing 6.10 now for a week or longer, uninterruptedly, and work remains 100% valid, for WCG... the most important part of all.)"
There's a bug in v6.10.10 if <ncpus> is zero. Apart from this, it seems to work as it should...
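Rule #1 from Ingleside's earlier post is why "no new work" distorts the picture: ineligible projects drop out of the share calculation (and their LTD stays frozen), so the remaining projects' effective shares grow. A minimal Python sketch, using made-up shares and a hypothetical function name:

```python
def effective_shares(shares, ineligible):
    """Resource fraction each eligible project is entitled to when projects
    that are suspended or set to "no new work" are left out (rule #1).
    Ineligible projects get no entry: their LTD is simply not updated."""
    eligible = {p: s for p, s in shares.items() if p not in ineligible}
    total = sum(eligible.values())
    return {p: s / total for p, s in eligible.items()}


if __name__ == "__main__":
    shares = {"A": 100, "B": 100, "C": 100, "Tiny": 1}   # hypothetical shares
    all_in = effective_shares(shares, set())
    two_out = effective_shares(shares, {"B", "C"})
    print(f"all eligible: Tiny gets {all_in['Tiny']:.2%}")
    print(f"B and C on 'no new work': Tiny gets {two_out['Tiny']:.2%}")
```

With two of the large projects set to "no new work", the tiny project's effective share roughly triples (1/301 to 1/101).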
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043
It was reported months ago... The behavior follows an identical profile; I'm just not letting it run for weeks. 78,000 debt, the equivalent of 150 times the allowed daily resource, is more than enough evidence for me that this client is as cooked on scheduling as 6.6 was/is. Neither project has had a single case of 'no work for...', which further closes the case on why it remains up the creek.
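The "150" figure can be reproduced if the daily entitlement is counted on one core at the 3/503 (0.596%) share quoted earlier in the thread; counted across all four cores of the quad, the same debt is nearer 38 days. A short Python check; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_of_entitlement(debt_seconds, share_fraction, ncpus):
    """How many days of the project's resource share a given debt represents."""
    per_day = ncpus * SECONDS_PER_DAY * share_fraction
    return abs(debt_seconds) / per_day


if __name__ == "__main__":
    share = 3 / 503   # the 0.596% share quoted earlier in the thread
    print(f"single core: {days_of_entitlement(-78_000, share, 1):.0f} days")
    print(f"quad:        {days_of_entitlement(-78_000, share, 4):.0f} days")
```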
As for STD & LTD not being updated: with work for said project still in the buffer, would it be getting the 'S' (suspended) qualification, or not?... But it is being updated, just very slowly per day: it's now at 76,000 debt, so can you tell me where it says it definitely would not be? As for the less aggressive DCF/TTC increase I thought I observed, that lasted about 4 results. The 5th set the estimated time of all remaining results to the wall-clock (aka Elapsed) time of the last completed result, which ran overlong. Suddenly I've got a day more in the queue for all WCG projects, not only the relevant science. It's time for my 18:00 UTC acid reflux pill, a peppermint, royal seal by Queen Wilhelmina, giant size, special import ;>)
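The jump described above, where one overlong result re-estimates every queued task, fits a model in which the duration correction factor (DCF) is raised at once to the new elapsed/estimate ratio but lowered only gradually. In the 6.x clients the factor is kept per project, which would explain why all WCG sciences moved together. A simplified Python sketch; the 10% down-step and the function names are assumptions, not taken from the BOINC source.

```python
def update_dcf(dcf, raw_estimate_s, elapsed_s, down_weight=0.1):
    """Hypothetical simplified DCF update for one completed task.

    raw_estimate_s is the task's estimate *before* DCF is applied.
    Overruns raise the factor at once; underruns lower it only gradually."""
    ratio = elapsed_s / raw_estimate_s
    if ratio > dcf:
        return ratio                           # jump straight up
    return dcf + down_weight * (ratio - dcf)   # creep down


def queued_estimates(raw_estimates_s, dcf):
    """Every queued task of the project is rescaled by the same factor."""
    return [e * dcf for e in raw_estimates_s]


if __name__ == "__main__":
    dcf = 1.0
    dcf = update_dcf(dcf, raw_estimate_s=3600, elapsed_s=7200)  # one overlong result
    print(dcf, queued_estimates([3600, 5400], dcf))
```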
[Edit 1 times, last edit by Sekerob at Sep 29, 2009 4:57:36 PM]
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974
Quoting Sekerob: "It's been reported months ago... [...] which shuts the case further why it remains up the creek."
I was surprised by how it worked the first time I saw it, but maybe it's an extension of the old rule where "Switch between applications..." played a role. A couple of quick tests on Einstein@home give these logs:
Increasing the buffer size by 1 day gives this:
29.09.2009 23:51:28 Einstein@Home chosen: CPU minor shortfall
Also, if you switch preferences so that "Connect..." = 9 days and "Additional..." = zero, you get:
As the first two logs show, a project can be asked for work as long as -LTD < cache size. Also, even if -LTD > cache size, the project can be asked for work if "Connected..." > currently cached work.
Quoting Sekerob: "STD & LTD not being updated with work for said project in the buffer would be getting the S qualification, or without... but, it's being updated... just very slowly per day... it's now 76,000 debt, so you need to tell where it says it definitely would not?"
Based on <debt_debug>, it's not updated. With SETI@home allowed to ask for work, I get this:
29.09.2009 22:53:04 AQUA@home [debt] CPU ineligible; debt 0.00
With SETI@home set to "no new work", the result is this:
29.09.2009 22:53:25 SETI@home work fetch suspended by user
All projects besides CPDN, WCG and SETI are set to "no new work" and are out of work. WCG is also currently out of work.
Anyway, v6.10.11 is out, with two changes:
- client: fix crash with <ncpus>0</ncpus>
- client: fix bug in coproc summary string.
[Edit 1 times, last edit by Ingleside at Sep 29, 2009 10:29:04 PM]
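The <debt_debug> and <work_fetch_debug> flags used in this thread (along with <dcf_debug>) go in the <log_flags> section of the client's cc_config.xml. A small Python sketch that only builds the file contents with the standard-library ElementTree; it deliberately writes nothing, since overwriting an existing cc_config.xml would drop any other options already in it. The function name is hypothetical; restart the client afterwards so it picks up the change.

```python
import xml.etree.ElementTree as ET


def build_cc_config(log_flags):
    """Return cc_config.xml text with the given <log_flags> switched on."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name in log_flags:
        ET.SubElement(flags, name).text = "1"
    # No <options><ncpus> element is written: per this thread, v6.10.10
    # crashed with <ncpus>0</ncpus> (fixed in v6.10.11).
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(["debt_debug", "work_fetch_debug"]))
```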