Thread Status: Active
Total posts in this thread: 8
This topic has been viewed 11487 times and has 7 replies
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
WU seems to be hung at checkpoint [COMPLETED]

Got a WU that seems to be hung at a checkpoint. Should I let it continue, or suspend it and reboot the computer to see if progress continues?

It's an old IBM R40 laptop with Win 7 x86 and 1 GB DDR RAM. It has been a good cruncher, typically running 24/7 for about a year now.

9/2/2011 3:05:01 PM Processor: 1 GenuineIntel Intel(R) Pentium(R) M processor 1300MHz [Family 6 Model 9 Stepping 5]


DSFL_ 00000002_ 0000045_ 0647_ 0-- - In Progress 9/2/11 12:58:51 9/12/11 12:58:51 0.00 <-- Mine
DSFL_ 00000002_ 0000045_ 0647_ 1-- 619 Pending Validation 9/2/11 12:58:25 9/3/11 18:24:07 5.39

Elapsed Time: 37 Hrs, 27 Min, 07 Sec and continuing
Progress: 30%



"stdout.txt" file last modified 9/2/2011 6:08 PM:

[09:48:01] [INFO] Checkpoint complete.
[10:34:08] [INFO] Checkpoint complete.
[11:19:19] [INFO] Checkpoint complete.
[12:05:38] [INFO] Checkpoint complete.
[12:14:30] [INFO] Checkpoint complete.
[12:23:25] [INFO] Checkpoint complete.
[12:32:20] [INFO] Checkpoint complete.
[12:41:14] [INFO] Checkpoint complete.
[16:37:10] [INFO] Checkpoint complete.
[18:08:07] [INFO] Checkpoint complete.


"stderr.txt" file last modified 9/2/2011 6:08 PM:

INFO: No state to restore. Start from the beginning.
[09:01:10] Number of tasks = 36
[09:01:10] Starting job 0,CPU time is 0.000000.
[09:01:10] ZINC20604122.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[09:48:01] Finished Job #0 cpu time used 2778.114733
[09:48:01] Starting job 1,CPU time is 2778.114733.
[09:48:01] ZINC20604122.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[10:34:08] Finished Job #1 cpu time used 2761.681102
[10:34:08] Starting job 2,CPU time is 5539.795835.
[10:34:09] ZINC20604122.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[11:19:19] Finished Job #2 cpu time used 2704.739224
[11:19:19] Starting job 3,CPU time is 8244.535059.
[11:19:19] ZINC20604122.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[12:05:38] Finished Job #3 cpu time used 2772.276338
[12:05:38] Starting job 4,CPU time is 11016.811397.
[12:05:38] ZINC20604123.pdbqt size = 21 3 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[12:14:30] Finished Job #4 cpu time used 530.813272
[12:14:30] Starting job 5,CPU time is 11547.624669.
[12:14:30] ZINC20604123.pdbqt size = 21 3 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[12:23:25] Finished Job #5 cpu time used 533.517160
[12:23:25] Starting job 6,CPU time is 12081.141829.
[12:23:25] ZINC20604123.pdbqt size = 21 3 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[12:32:20] Finished Job #6 cpu time used 533.587261
[12:32:20] Starting job 7,CPU time is 12614.729090.
[12:32:21] ZINC20604123.pdbqt size = 21 3 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[12:41:14] Finished Job #7 cpu time used 530.442739
[12:41:14] Starting job 8,CPU time is 13145.171829.
[12:41:14] ZINC20604125.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
Quit requested: Exiting
[15:07:03] Number of tasks = 36
[15:07:03] Starting job 8,CPU time is 13145.171829.
[15:07:03] ZINC20604125.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[16:37:10] Finished Job #8 cpu time used 2641.999008
[16:37:10] Starting job 9,CPU time is 15787.170837.
[16:37:10] ZINC20604125.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
[18:08:07] Finished Job #9 cpu time used 2678.611654
[18:08:07] Starting job 10,CPU time is 18465.782491.
[18:08:07] ZINC20604125.pdbqt size = 34 9 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000002.pdbqt size = 2474 0
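
The log above ends with job 10 starting at 18:08:07 and no matching "Finished" line, which is what makes the task look hung. Below is a minimal sketch of how that check could be automated. It assumes the stderr.txt line format stays exactly as shown in this post; the function name and regular expressions are my own illustration, not part of BOINC or the DSFL application.

```python
import re

# A few lines copied from the stderr.txt excerpt above (shortened sample).
LOG = """\
[09:01:10] Number of tasks = 36
[09:01:10] Starting job 0,CPU time is 0.000000.
[09:48:01] Finished Job #0 cpu time used 2778.114733
[16:37:10] Starting job 9,CPU time is 15787.170837.
[18:08:07] Finished Job #9 cpu time used 2678.611654
[18:08:07] Starting job 10,CPU time is 18465.782491.
"""

# Patterns are assumptions based only on the lines quoted in this thread.
START = re.compile(r"^\[(\d\d:\d\d:\d\d)\] Starting job (\d+),")
FINISH = re.compile(r"^\[(\d\d:\d\d:\d\d)\] Finished Job #(\d+) cpu time used ([\d.]+)")


def unfinished_jobs(log_text):
    """Return {job_number: start_time} for jobs that started but never logged a finish."""
    started = {}
    for line in log_text.splitlines():
        m = START.match(line)
        if m:
            # A restarted job simply overwrites its earlier start time.
            started[int(m.group(2))] = m.group(1)
            continue
        m = FINISH.match(line)
        if m:
            started.pop(int(m.group(2)), None)
    return started


print(unfinished_jobs(LOG))  # {10: '18:08:07'}
```

If the reported job's start time is many hours old compared with the usual 9 to 90 minutes per job seen in this log, the task is probably stuck rather than just slow.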
----------------------------------------
[Edit 2 times, last edit by BSD at Sep 7, 2011 1:26:07 PM]
[Sep 5, 2011 2:53:39 PM]
etienne06
Advanced Cruncher
France
Joined: Jun 11, 2009
Post Count: 56
Status: Offline
Re: WU seems to be hung at checkpoint

I'm sorry, I can't really help you, but my problem may be linked to yours: yesterday I thought it was better to abort a WU because after 2 1/2 hours it was only 0.875% complete, which corresponds to about 286 hours to reach 100%.

Is that normal? Is it due to the weakness of my computer?

Thanks to all, and sorry to use this thread, but I wonder if the problems are linked...
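
For reference, the 286-hour figure is a straight linear extrapolation from the elapsed time and the progress percentage. Here is a small sketch of that arithmetic; it assumes progress is linear in time, which the job timings elsewhere in this thread suggest is only roughly true, and the function name is just for illustration.

```python
def estimated_total_hours(elapsed_hours, percent_done):
    """Linear extrapolation: total runtime = elapsed time / fraction completed."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours / (percent_done / 100.0)


# 2.5 hours elapsed at 0.875% progress, as described in the post above.
print(round(estimated_total_hours(2.5, 0.875), 1))  # 285.7
```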
[Sep 5, 2011 3:16:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU seems to be hung at checkpoint

Switch off "Leave Application in Memory when preempted" in the preferences [if it's on], then:

1. Suspend all "Ready to start" tasks.
2. Suspend the suspect task for 30 seconds and check in Task Manager that it's unloaded from memory (the process wcg_dsfl_vina_6.19_windows_intelx86).
3. Resume the task, which will cause it to go back to the last checkpoint, and observe.
4. Resume the other suspended tasks.
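
Steps 2 and 3 above can also be done from a script instead of the BOINC Manager. The sketch below only covers the suspend, wait, resume part for one task, using the `boinccmd --task <URL> <task_name> suspend|resume` form. It assumes `boinccmd` is on the PATH and can talk to the local client; the project URL and task name are placeholders I filled in from this thread and should be replaced with the values your own client reports.

```python
import subprocess
import time

# Placeholders (assumptions): copy the real URL and task name from your client.
PROJECT_URL = "http://www.worldcommunitygrid.org/"
TASK_NAME = "DSFL_00000002_0000045_0647_0"


def task_command(op, url=PROJECT_URL, task=TASK_NAME):
    """Build the boinccmd argument list for suspending or resuming one task."""
    if op not in ("suspend", "resume"):
        raise ValueError("op must be 'suspend' or 'resume'")
    return ["boinccmd", "--task", url, task, op]


def bounce_task(wait_seconds=30, run=subprocess.run):
    """Suspend the task, wait for it to unload from memory, then resume it."""
    run(task_command("suspend"), check=True)
    time.sleep(wait_seconds)
    run(task_command("resume"), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so nothing changes by accident.
    print(task_command("suspend"))
```

Calling `bounce_task()` would run both commands for real. The "Leave Application in Memory" preference still has to be switched off first, otherwise the suspended task stays loaded and resumes where it was instead of from the last checkpoint.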

Let us know what happens.

--//--

P.S. it's easier to monitor saved progress by activating the <checkpoint_debug> log flag in the cc_config.xml.
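
For anyone who has not edited cc_config.xml before, the snippet below shows a minimal version of the file with that flag switched on, plus a small check that the XML is well formed. The file layout (`<cc_config>`, `<log_flags>`, `<checkpoint_debug>`) is the standard BOINC client one; the Python helper around it is only my own illustration. The file is expected to live in the BOINC data directory, and the client has to re-read it (restart, or the Manager's read-config option) before the flag takes effect.

```python
import xml.etree.ElementTree as ET

# Minimal cc_config.xml content that turns on checkpoint logging.
CC_CONFIG = """\
<cc_config>
  <log_flags>
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
</cc_config>
"""


def enabled_log_flags(xml_text):
    """Return the names of log flags set to 1 in a cc_config.xml document."""
    root = ET.fromstring(xml_text)
    flags = root.find("log_flags")
    if flags is None:
        return []
    return [flag.tag for flag in flags if (flag.text or "").strip() == "1"]


print(enabled_log_flags(CC_CONFIG))  # ['checkpoint_debug']
```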
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 5, 2011 3:32:13 PM]
[Sep 5, 2011 3:24:46 PM]
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Re: WU seems to be hung at checkpoint

Started your instructions. The elapsed time was 41:07:26 and progress 30%.

Unchecked LAIM.

Suspended the only task, opened Task Manager, and sorted by username "boinc_*". There was no CPU activity for any BOINC-related process. I had the CPU preference set to use 50%, changed it to 100%, and changed it back to 50%. Waited for about 1 minute; no BOINC-related process unloaded. Whatever BOINC process was originally there stayed there.

Resumed the task: elapsed time changed to 05:11:00, progress stayed at 30%, and status changed to "Waiting to run (waiting for GPU memory)". BOINC then downloaded a CEP2 WU, and the status for that went to "Running". This IBM R40 laptop has no GPU; the BOINC message says "No usable GPUs found". This device uses the same device profile as 4 of my other devices, which also don't have any GPUs. I've had no problem with them crunching DSFL tasks, except a recent 195 error on one device.

I suspended the CEP2 task, no change on the DSFL task.

Rechecked LAIM, suspended DSFL task and resumed, no change on the status "Waiting to run ...".

Did Activity Suspend, rebooted, did Activity Resume. The DSFL task status changed to "Running", elapsed time 05:10:55, progress 30.277%. I resumed the CEP2 task, whose status changed to "Waiting to run".

Looks like all is happy again.
[Sep 5, 2011 10:42:39 PM]
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Re: WU seems to be hung at checkpoint [COMPLETED]

DSFL_ 00000002_ 0000045_ 0647_ 0-- 619 Valid 9/2/11 12:58:51 9/7/11 10:56:57 16.59 126.1 / 117.2 <-- Mine
DSFL_ 00000002_ 0000045_ 0647_ 1-- 619 Valid 9/2/11 12:58:25 9/3/11 18:24:07 5.39 108.2 / 117.2

CPU Time: 16:35
Elapsed Time: 16:44

Wasted Electricity Time: ~96 Hours


Changed subject from RESOLVED to COMPLETED to more accurately reflect the final outcome of this troublesome WU. If this will be an ongoing problem with DSFL WUs on this IBM R40 laptop, I'll just have to make sure not to include any more of this project.
[Sep 7, 2011 1:39:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU seems to be hung at checkpoint [COMPLETED]

I have the same problem you had above. Is there a known reason or solution now, or why is the thread marked completed?

My last WU ran for 30 hrs but only 15 hrs were counted.

It looks like my little old Win XP computer is losing a lot of time... :-(
[Sep 8, 2011 6:08:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU seems to be hung at checkpoint [COMPLETED]

Hello elRonaldo,
SekeRob's method is a gentle reboot for a single stuck task. It does almost as much as a reboot for the selected work unit, but without the loss of time for other work units.
Being both lazy and suspicious, I usually reboot when I have strange behavior like this. It can happen in the best computers.

Lawrence
[Sep 8, 2011 11:39:47 PM]
pcwr
Ace Cruncher
England
Joined: Sep 17, 2005
Post Count: 10903
Status: Offline
Re: WU seems to be hung at checkpoint [COMPLETED]

I had one on my laptop.
Luckily I noticed it had stopped.
After a laptop reboot, it continued.

Patrick
[Sep 12, 2011 8:45:55 PM]