Thread Status: Active
Total posts in this thread: 28
Posts: 28   Pages: 3
This topic has been viewed 6477 times and has 27 replies.
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Stuck unit?

This unit has been running at 100% for a few hours:

MCM1_0000007_0793_3

Progress: 100.000%
Elapsed time: 07:07:41
CPU time: 07:06:59
Last checkpoint: none
CPU utilization: 99.84%
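The CPU utilization shown is consistent with CPU time divided by elapsed (wall-clock) time. Assuming that definition (the thread doesn't say how the figure is computed), a quick check reproduces it from the two times above. `hms_to_seconds` and `cpu_utilization` are hypothetical helper names, not part of BOINC.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string, as shown for elapsed/CPU time, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_utilization(cpu_time: str, elapsed_time: str) -> float:
    """Assumed definition: CPU time as a percentage of elapsed time."""
    return 100.0 * hms_to_seconds(cpu_time) / hms_to_seconds(elapsed_time)


if __name__ == "__main__":
    # Figures reported for MCM1_0000007_0793_3.
    print(f"{cpu_utilization('07:06:59', '07:07:41'):.2f}%")  # prints 99.84%
```

A value this close to 100% means the process is actually burning CPU the whole time rather than sitting idle at 100% progress.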

Abort, Retry, Fail?
----------------------------------------
[Edited 1 time, last edit by Mumak at Nov 19, 2013 4:26:23 PM]
[Nov 19, 2013 4:25:45 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Stuck unit?

Exit BOINC and restart it; that may give it a boot up the backside ;)
[Nov 19, 2013 4:38:38 PM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Re: Stuck unit?

Quote: Exit BOINC and restart it; that may give it a boot up the backside ;)

Or suspend and resume the WU.
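Suspending and resuming a single task can also be done from the command line with boinccmd, BOINC's command-line tool, using its `--task URL task_name operation` command. The sketch below only builds (and optionally runs) those commands. The project URL is an assumption: it must match the master URL the client has registered for World Community Grid, and `task_command` / `suspend_and_resume` are hypothetical helpers.

```python
import subprocess
from typing import List

# Assumption: the World Community Grid master URL as registered in the client.
WCG_URL = "http://www.worldcommunitygrid.org/"


def task_command(task_name: str, operation: str, project_url: str = WCG_URL) -> List[str]:
    """Build a boinccmd argument list for a per-task operation."""
    if operation not in {"suspend", "resume", "abort"}:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--task", project_url, task_name, operation]


def suspend_and_resume(task_name: str, dry_run: bool = True) -> List[List[str]]:
    """Suspend then resume a task; with dry_run=True only return the commands."""
    commands = [task_command(task_name, "suspend"), task_command(task_name, "resume")]
    if not dry_run:
        for cmd in commands:
            # Requires boinccmd on PATH and a running BOINC client.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in suspend_and_resume("MCM1_0000007_0793_3"):
        print(" ".join(cmd))
```

Depending on the setup, boinccmd may also need the GUI RPC password (its `--passwd` option) or to be run from the BOINC data directory.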
----------------------------------------
[Nov 19, 2013 4:41:43 PM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Re: Stuck unit?

After 9 hours I did a suspend/resume with LAIM (leave applications in memory) off. The task was rescheduled and is still running in the same state as before (100% complete, full CPU load).

MCM1_0000007_0793_3  -- -  In Progress  11/19/13 04:48:22  11/22/13 04:48:22  0.00  0.0 / 0.0
MCM1_0000007_0793_2  -- -  In Progress  11/19/13 04:48:12  11/22/13 04:48:12  0.00  0.0 / 0.0
MCM1_0000007_0793_1  -- -  No Reply     11/9/13 04:48:16   11/19/13 04:48:16  0.00  0.0 / 0.0
MCM1_0000007_0793_0  -- -  No Reply     11/9/13 04:48:05   11/19/13 04:48:05  0.00  0.0 / 0.0

...the first two machines haven't returned it yet (perhaps they are still running it?).
Must be a buggy one.

I'm aborting it.
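The dates in the result table hint at how long a copy can stay out before it is written off. Assuming the second timestamp in each row is the due date (the column headers did not survive), the original copies had 10 days and the reissued copies 3 days. The reissues (_2 and _3) were sent within seconds of the due times of _0 and _1, which suggests those two "No Reply" copies were timed out and resent rather than still counted as running. A minimal sketch of that arithmetic; `deadline_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Rows from the result table: (result name, sent time, due/return time).
# Assumption: the second timestamp is the due date.
ROWS = [
    ("MCM1_0000007_0793_3", "11/19/13 04:48:22", "11/22/13 04:48:22"),
    ("MCM1_0000007_0793_2", "11/19/13 04:48:12", "11/22/13 04:48:12"),
    ("MCM1_0000007_0793_1", "11/9/13 04:48:16", "11/19/13 04:48:16"),
    ("MCM1_0000007_0793_0", "11/9/13 04:48:05", "11/19/13 04:48:05"),
]

TABLE_TIME_FORMAT = "%m/%d/%y %H:%M:%S"  # e.g. "11/19/13 04:48:22"


def deadline_window(sent: str, due: str) -> timedelta:
    """Time between a copy being sent and its due date."""
    return datetime.strptime(due, TABLE_TIME_FORMAT) - datetime.strptime(sent, TABLE_TIME_FORMAT)


if __name__ == "__main__":
    for name, sent, due in ROWS:
        print(f"{name}: {deadline_window(sent, due).days} days to report")
```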
----------------------------------------
[Edited 3 times, last edit by Mumak at Nov 19, 2013 8:24:52 PM]
[Nov 19, 2013 8:19:06 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2955
Re: Stuck unit?

Would these eventually time out, I wonder?
----------------------------------------

[Nov 19, 2013 8:46:52 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Re: Stuck unit?

Mumak,

I ran this workunit in what we call standalone mode, which means running it on a development machine outside the BOINC client. It ran fine, reported regular percent-complete updates, and finished after reaching 100%. This must be an issue related to running under the client. I am currently running all of my clients exclusively on MCM1 to try to catch one myself, but have not had any luck yet.
If you experience this issue, please supply the following:

  • Operating system and version
  • BOINC client version
  • Does the process keep accumulating CPU time? (On Windows use Task Manager; on Unix use top or ps.)
  • Locate the slot directory the task is running in and post the contents of stderr.txt
  • Search the log file stdoutdae.txt in the BOINC data directory for any entries containing the workunit ID and post those messages
  • What are the BOINC client settings for CPU throttling and LAIM?
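For the stdoutdae.txt step, a plain substring search is enough, because a task name such as MCM1_0000061_6178_3 starts with its workunit ID, so searching for the workunit ID also catches every copy of it. A minimal sketch; the script name and the helpers `matching_entries` / `search_log` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def matching_entries(lines: Iterable[str], workunit_id: str) -> List[str]:
    """Return the log lines that mention the given workunit or task ID."""
    return [line.rstrip("\n") for line in lines if workunit_id in line]


def search_log(log_path: Path, workunit_id: str) -> List[str]:
    """Scan a BOINC client log such as stdoutdae.txt for a workunit ID."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with log_path.open(encoding="utf-8", errors="replace") as log:
        return matching_entries(log, workunit_id)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        for entry in search_log(Path(sys.argv[1]), sys.argv[2]):
            print(entry)
    else:
        print("usage: python search_log.py <path to stdoutdae.txt> <workunit id>")
```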

[Nov 21, 2013 3:43:48 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2955
Re: Stuck unit?

Hi armstrdj. I'm just about to send you an e-mail about a stuck WU...
----------------------------------------

[Nov 21, 2013 4:40:48 PM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Re: Stuck unit?

That unit ran on:
1. Win7 x64, Core i7-860, HT on
2. BOINC 7.2.28
3. Yes, CPU time kept increasing for several hours after reaching 100%
4. Will do when I catch it again
5. Ditto
6. No throttle, LAIM=on. I disabled LAIM before restarting the task (suspend/resume), but as I said, it continued to run at 100%

That particular task had problems on other machines as well: 2x User Aborted, 2x No Reply
----------------------------------------

[Nov 21, 2013 6:05:19 PM]
X-Files 27
Senior Cruncher
Canada
Joined: May 21, 2007
Post Count: 391
Re: Stuck unit?

1. Win7 x64 v6.1

2. 7.2.28

3. Yes

4. stderr.txt
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.24_windows_x86_64 -SettingsFile MCM1_0000061_6178.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Initializing
wcg_learn_limit = 500000
Running

5. These are the only related entries:
19-Nov-2013 02:16:36 [World Community Grid] Starting task MCM1_0000061_6178_3 using mcm1 version 724 in slot 1
21-Nov-2013 00:43:51 [World Community Grid] task MCM1_0000061_6178_3 suspended by user
21-Nov-2013 00:43:56 [World Community Grid] task MCM1_0000061_6178_3 resumed by user
21-Nov-2013 00:43:57 [World Community Grid] Resuming task MCM1_0000061_6178_3 using mcm1 version 724 in slot 1

6. No throttle and LAIM = ON
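The log entries in answer 5 also show how long the task had been running in wall-clock terms before the suspend/resume: about 1 day 22.5 hours from "Starting task" to "suspended by user". A minimal sketch of that calculation, assuming the client log's "DD-Mon-YYYY HH:MM:SS" line prefix, English month abbreviations (strptime's `%b` is locale-dependent), and that both timestamps are in the same local time zone. `log_timestamp` and `wall_time_between` are hypothetical helpers.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%d-%b-%Y %H:%M:%S"  # e.g. "19-Nov-2013 02:16:36"


def log_timestamp(line: str) -> datetime:
    """Parse the leading date and time of a BOINC client log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TIME_FORMAT)


def wall_time_between(start_line: str, end_line: str) -> timedelta:
    """Wall-clock time between two log entries."""
    return log_timestamp(end_line) - log_timestamp(start_line)


if __name__ == "__main__":
    start = ("19-Nov-2013 02:16:36 [World Community Grid] Starting task "
             "MCM1_0000061_6178_3 using mcm1 version 724 in slot 1")
    suspended = ("21-Nov-2013 00:43:51 [World Community Grid] task "
                 "MCM1_0000061_6178_3 suspended by user")
    print(wall_time_between(start, suspended))  # prints 1 day, 22:27:15
```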
----------------------------------------

[Nov 21, 2013 7:04:13 PM]
X-Files 27
Senior Cruncher
Canada
Joined: May 21, 2007
Post Count: 391
Re: Stuck unit?

This is a record-breaking WU. LOL

2 days and 9 hours (CPU time), still counting
----------------------------------------

[Nov 21, 2013 7:05:47 PM]