robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 442
Very long running workunit

Is there a problem with this workunit?

2/1/2009 5:05:01 PM|World Community Grid|Restarting task mf189_00038_13 using hpf2 version 603

It's already run over 23 CPU hours on my machine, and BOINC manager says that it's expected to take over 14 CPU hours more. However, its progress appears stuck at 50.807%.

I don't think I've seen any of the proteome workunits take more than perhaps 24 CPU hours before. Also, it seems to be taking more time without suspending than usual - I have BOINC set to decide which workunit gets the next timeslice every two hours, but it looks like this one's been running for at least 19 CPU hours without giving any other workunit a timeslice.
[Feb 2, 2009 6:43:35 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Very long running workunit

robertmiles, just stop the BOINC service (or exit BOINC) and restart it. The task is nearly guaranteed to resume a few percentage points lower, losing the time it spent cycling in the loop. After that it should quickly pass the 50.807% mark and finish. This is an old problem that one HPF2 cruncher or another runs into from time to time.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Feb 2, 2009 6:46:57 PM]
robertmiles
Re: Very long running workunit

I've manually suspended it for long enough for another workunit to start getting a timeslice.

I don't know if it makes a difference that I've recently lowered the percentage of CPU time for BOINC from 100% to 98% in order to help look for a problem in Ralph@home workunits.
[Feb 2, 2009 6:52:32 PM]
Sekerob
Re: Very long running workunit

That won't work if you have "Leave application in memory when pre-empted" turned on. The science application needs to unload from memory so that it resumes from the last good checkpoint.
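
To illustrate the point about unloading, here is a toy model in Python. It is only a sketch of the behaviour described in this thread, not BOINC's actual code: the `ToyTask` class and its method names are made up for illustration, and the assumption that a reload from the checkpoint clears the loop is taken from the advice above.

```python
class ToyTask:
    """Hypothetical model of a task with an on-disk checkpoint."""

    def __init__(self, checkpoint_pct):
        self.checkpoint_pct = checkpoint_pct  # last good checkpoint on disk
        self.in_memory_pct = checkpoint_pct   # progress held by the running process
        self.looping = False                  # True once the task is stuck in a loop
        self.loaded = True                    # is the process still in memory?

    def enter_loop(self, pct):
        # The running process gets stuck at some progress value.
        self.in_memory_pct = pct
        self.looping = True

    def preempt(self, leave_in_memory):
        # With "leave in memory" on, the stuck process is merely paused.
        if not leave_in_memory:
            self.loaded = False  # process unloaded, in-memory state is lost

    def resume(self):
        if not self.loaded:
            # Assumption: a fresh start reloads the checkpoint and escapes the loop.
            self.in_memory_pct = self.checkpoint_pct
            self.looping = False
            self.loaded = True
        return self.in_memory_pct, self.looping


task = ToyTask(checkpoint_pct=50.765)
task.enter_loop(50.807)

task.preempt(leave_in_memory=True)
print(task.resume())   # still stuck: the paused process just continues

task.preempt(leave_in_memory=False)
print(task.resume())   # back at the checkpoint, no longer looping
```

In this model a plain suspend changes nothing, which matches why exiting BOINC (or rebooting) is needed when the option is on.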

FAQ for those who do not wish to follow the short instruction: http://worldcommunitygrid.org/forums/wcg/viewthread?thread=16378

cheers
[Feb 2, 2009 6:57:37 PM]
robertmiles
Re: Very long running workunit

Just suspending for a few minutes didn't help, so I suspended everything and rebooted. It's now at 50.765% progress, with only about 5 CPU hours reported as used so far, and less than 6 CPU hours estimated as needed for completion. I'll watch it to see if this problem happens again.
[Feb 2, 2009 7:14:27 PM]
robertmiles
Re: Very long running workunit

I have that option ("Leave application in memory when pre-empted") turned on.

Looks like the reboot procedure I used is an even more thorough, but easier to remember, method of doing essentially the same thing to BOINC workunits.
[Feb 2, 2009 7:30:05 PM]
rilian
Veteran Cruncher
Ukraine - we rule!
Joined: Jun 17, 2007
Post Count: 1443
Re: Very long running workunit

I think there is some info about such WUs in this topic:

http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=22340
[Feb 3, 2009 5:21:45 PM]
Sekerob
Re: Very long running workunit

Good news on the early-warning front: the author of BOINCTasks has put a progress monitor function on his to-do list. Once it is implemented, you will be able to set an alert for, say, 30 minutes of run time with less than 0.2% progress while CPU time consumption remains normal. BOINCTasks [v 0.42] already has a warning function for low CPU use, i.e. a job that is allowed 100% CPU but is using hardly any.
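
For anyone who wants to try the idea before it lands in BOINCTasks, the rule can be sketched in a few lines of Python. This is not how BOINCTasks does or will do it; the `Sample` type, the `is_stalled` function and the `min_cpu_ratio` cut-off for "normal" CPU use are all assumptions made for illustration. Only the 30 minute and 0.2% figures come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    wall_minutes: float  # wall-clock minutes since monitoring started
    cpu_minutes: float   # cumulative CPU time of the task
    progress_pct: float  # progress reported by the task, 0 to 100


def is_stalled(samples, window_minutes=30.0, min_progress_pct=0.2,
               min_cpu_ratio=0.5):
    """True if the task used CPU normally over the last window
    but gained less than min_progress_pct progress."""
    if len(samples) < 2:
        return False
    latest = samples[-1]
    # Newest earlier sample that is at least window_minutes older than the latest.
    baseline = None
    for s in reversed(samples[:-1]):
        if latest.wall_minutes - s.wall_minutes >= window_minutes:
            baseline = s
            break
    if baseline is None:
        return False  # not enough history yet
    elapsed = latest.wall_minutes - baseline.wall_minutes
    cpu_used = latest.cpu_minutes - baseline.cpu_minutes
    progress = latest.progress_pct - baseline.progress_pct
    # Assumed definition of "normal": CPU time is at least half the wall time.
    cpu_is_normal = cpu_used >= min_cpu_ratio * elapsed
    return cpu_is_normal and progress < min_progress_pct


stuck = [Sample(0, 0, 50.807), Sample(15, 14.5, 50.807), Sample(30, 29, 50.807)]
healthy = [Sample(0, 0, 10.0), Sample(30, 29, 12.5)]
print(is_stalled(stuck), is_stalled(healthy))
```

The CPU check matters because a task that is simply not being scheduled also shows no progress, and that case is already covered by the low-CPU warning.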
[Mar 5, 2010 9:47:04 AM]