Thread Status: Active
Total posts in this thread: 33
This topic has been viewed 13505 times and has 32 replies
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Some VINA-based WUs take forever and don't checkpoint

I posted about this in a previous thread (here: https://secure.worldcommunitygrid.org/forums/...ead,35055_offset,0#419848), when it happened with GFAM. Now Say No to Schistosoma seems to be showing the same pattern.

Namely, work units take 2-3 days (48-60+ hours) to "complete" and eventually time out. Under Properties, I can see very little CPU time logged against a lot of elapsed time, and checkpoints are rare.

You can check the old thread; a fresh screenshot is below.

I only see this on one computer, which is also my first hyper-threaded i7 (8 threads on 4 cores). Last time I fixed it by reinstalling WCG and making sure all the files and folders in WCG and Data were writable.

Any suggestions would be appreciated.

By the way, checking my results stats, I don't see any aborted or computation error WUs listed, so I'm suspicious.


----------------------------------------
[Edit 1 times, last edit by ashes999 at May 24, 2013 2:25:47 PM]
[May 14, 2013 1:42:54 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WUs take forever and don't checkpoint

CPU time 00:03:16 [the important bit] against Elapsed 10:29:50, the time on the clock during which BOINC was allowed to compute. Is something wrong? Yes, it smells that way from where I'm sitting. Why? Check whether your security software is locking the process [no different from the actions proposed in the old topic] and/or whether something devious is running on your computer [Task Manager, show processes from all users; we assume it's Windows, since you don't say and I'm not going to look it up]. A copy of the message log from startup would tell us the basic details.
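
To put a number on those two figures, here is a minimal sketch of the arithmetic. The helper names `to_seconds` and `cpu_efficiency` are made up for illustration and are not part of BOINC; the only inputs are the CPU and Elapsed times quoted above.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time: str, elapsed_time: str) -> float:
    """Return CPU time as a fraction of elapsed (wall-clock) time."""
    return to_seconds(cpu_time) / to_seconds(elapsed_time)


# Figures quoted above from the task's properties.
ratio = cpu_efficiency("00:03:16", "10:29:50")
print(f"CPU efficiency: {ratio:.2%}")  # prints "CPU efficiency: 0.52%"
```

That is 196 seconds of CPU time out of 37,790 seconds on the clock, so the task was actually computing for roughly half a percent of the time it was nominally running.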
----------------------------------------
[Edit 2 times, last edit by Former Member at May 14, 2013 2:08:40 PM]
[May 14, 2013 2:02:04 PM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

Sorry if this is a double post. Last time I thought the problem was science-specific, so I'm reposting here. Feel free to delete the old thread.

I don't understand all your instructions, so please help me out here.

First, here is the event/activity log for WCG (77 KB): http://pastebin.com/QnsjrA2p

Second, it is a new machine, and I didn't see any suspicious/malicious services or activities taking up CPU. In fact, WCG is pretty much hogging all of it (as expected, which is good).

In terms of locking, I'm not sure what to look for. I can use Process Explorer to try and search to see which processes are locking/holding certain files, but I'm not sure what file(s)/substring(s) to look for.

Again, this is Windows 7.
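
For anyone wanting a quick first pass before reaching for Process Explorer, a rough sketch follows. It walks a directory and tries to open every file for appending, which fails on a permissions problem and, on Windows, typically also when another process holds the file exclusively. The function name `find_unwritable` is hypothetical, and the demo runs against a temporary directory rather than any assumed BOINC path.

```python
import os
import tempfile


def find_unwritable(root):
    """Return (path, reason) pairs for files under root that cannot be opened for appending."""
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Append mode opens the file for writing without truncating it;
                # nothing is written, so the contents are left untouched.
                with open(path, "ab"):
                    pass
            except OSError as exc:
                blocked.append((path, exc.strerror))
    return blocked


# Demo on a throwaway directory; point the function at the real data folder instead.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "example.txt"), "w") as handle:
        handle.write("placeholder")
    print(find_unwritable(demo_dir))  # an empty list when every file is writable
```

Any path it reports is a candidate to look up in Process Explorer to see which process is holding it.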
----------------------------------------

[May 14, 2013 2:50:38 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WUs take forever and don't checkpoint

Hmmm, D:\temp\... How does cleaning/security software interpret names like that? I don't know, but that's one odd name to ponder.

The log shows one long series of "Resuming task xxxxxx" messages, with an occasional task finishing and a new one starting. What's going on there, when there are no signs of BOINC being paused for one reason or another [that would be logged]? I propose at the very least running the installer and choosing the 'Repair' option, so permissions and ownership are confirmed. Check Windows Event Viewer and your AV/security software to see whether anything is being logged there. If things do not improve, do a project reset so that the science apps get freshly downloaded.
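
To see which tasks are doing all that resuming, a small sketch like the one below could tally the messages per task. It assumes only that each relevant log line contains the text "Resuming task" followed by the task name, as quoted above; the sample lines and task names are invented for illustration, and `count_resumes` is a hypothetical helper, not a BOINC tool.

```python
import re
from collections import Counter

# Assumed wording: "Resuming task <name>", with the name running up to the next space.
RESUME_PATTERN = re.compile(r"Resuming task (\S+)")


def count_resumes(log_lines):
    """Count how many times each task name appears in a 'Resuming task' message."""
    counts = Counter()
    for line in log_lines:
        match = RESUME_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; in practice, read the lines from a saved copy of the message log.
sample_log = [
    "Resuming task example_task_A",
    "Resuming task example_task_B",
    "Resuming task example_task_A",
    "Starting task example_task_C",
    "Resuming task example_task_A",
]

for task, resumes in count_resumes(sample_log).most_common():
    print(task, resumes)
```

A task that shows up dozens of times without ever finishing is the one to look at first.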
[May 14, 2013 3:10:11 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WUs take forever and don't checkpoint

P.S. With all that 'resuming' without a logged reason, at least enable "Leave application in memory when suspended". With that set, tasks are held in memory when paused. They would be anyway, by default, until the first checkpoint is reached [many projects have long setup periods for their models, so on a device that is used frequently and intermittently a task might never reach that first checkpoint, which is why tasks are paused in memory up to that point].

P.P.S. Is the Activity menu set to 'Run always' or 'Run based on preferences'? If the latter, set it to 'Run always' to see if things improve. I run that way for CPU and never notice that BOINC is running, even when I use the computer quite intensively.
[May 14, 2013 3:18:19 PM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

I'll switch from Run Scheduled to Run Always. I already had "leave in memory when suspended" checked.

I'll monitor for 24 hours and try (yet another) science app reset if things don't improve.
----------------------------------------

[May 14, 2013 5:19:46 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WUs take forever and don't checkpoint

Make sure to scan your logs for anything suspicious interrupting computing if things do not improve with 'run always'.

I've noticed several times [on Linux], with VINA but with Q-Chem too, that after a crash [power outage] a task would drop back to the beginning or to the last good checkpoint, ending up with an Elapsed time double the CPU time. I suspect something related is going on with you, though it should not be: interruptions lead to drop-backs, so the Elapsed time keeps building while the CPU time goes nowhere.
[May 14, 2013 5:43:34 PM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

It looks like, even with Run Always, some tasks just constantly resume, and never make any progress. See: http://pastebin.com/VRFi8YVp

I'm going to try:

1) Run the installer's Repair option
2) Reset the WCG project
3) Set the application switch interval to something obscene like one day (instead of 60 minutes)
----------------------------------------

[May 15, 2013 2:24:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WUs take forever and don't checkpoint

OK, well, let us know. The obscene switch time won't do much, though, unless you also participate in projects outside of WCG and have work for them that's waiting for a time slot.
[May 15, 2013 2:27:38 PM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

The repair seems stuck on setting BOINC directory permissions.

Very suspicious.

I will keep you posted. I'm going to nuke my BOINC installation and reinstall.
----------------------------------------

[May 15, 2013 2:55:41 PM]