World Community Grid Forums
Category: Support | Forum: BOINC Agent Support | Thread: Some VINA based WU take forever and don't checkpoint
Thread Status: Active Total posts in this thread: 33
NightBlade
Advanced Cruncher Joined: Jun 10, 2008 Post Count: 89 Status: Offline Project Badges: |
I posted about this in a previous thread (here: https://secure.worldcommunitygrid.org/forums/...ead,35055_offset,0#419848), when it happened with GFAM. Now it seems that Say No To Schiso is showing the same pattern.
Namely, work units take 2-3 days (48-60+ hours) to "complete" and eventually time out. Under Properties, I can see very little CPU time logged against a lot of elapsed time, and checkpoints are rare. You can check the old thread; a fresh screenshot is below.
I only see this on one computer, which is also my first hyper-threaded i7 (8 threads on 4 cores). I fixed this last time by reinstalling WCG and making sure all the files/folders in WCG and Data are writeable.
Any suggestions would be appreciated. By the way, checking my results stats, I don't see any aborted or computation-error WUs listed, so I'm suspicious.
[Edit 1 times, last edit by ashes999 at May 24, 2013 2:25:47 PM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
CPU time 00:03:16 [the important bit] against Elapsed 10:29:50, which is the time on the clock that BOINC was allowed to compute. Is something wrong? Yes, it smells that way from where I sit. Why? Check whether your security software is locking the process [no different from the actions proposed in the old topic] and/or whether something devious is running on your computer [Task Manager, show processes from all users; we assume it's Windows, since you don't say and I'm not going to look it up]. A copy of the message log from startup would tell us the basic details.
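To put a number on that gap, here is a minimal Python sketch. It is not part of BOINC, and the function names are made up for illustration. It simply turns the two "HH:MM:SS" figures quoted above into a CPU-to-elapsed ratio:

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time: str, elapsed_time: str) -> float:
    """Fraction of the elapsed wall-clock time that was spent on the CPU."""
    elapsed = hms_to_seconds(elapsed_time)
    if elapsed == 0:
        raise ValueError("elapsed time must be non-zero")
    return hms_to_seconds(cpu_time) / elapsed


# Figures taken from the task properties quoted in this thread.
ratio = cpu_efficiency("00:03:16", "10:29:50")
print(f"CPU efficiency: {ratio:.2%}")
```

On those figures the ratio comes out at roughly half a percent, where a task that has a CPU thread to itself would normally be expected to sit much closer to 100%.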
[Edit 2 times, last edit by Former Member at May 14, 2013 2:08:40 PM]
||
|
NightBlade
Advanced Cruncher Joined: Jun 10, 2008 Post Count: 89 Status: Offline Project Badges: |
Sorry if it's double-posted. Last time, I thought it was science-specific, so I'm reposting here. Feel free to delete the old thread.
I don't understand all your instructions, so please help me out here.
First, here is the event/activity log for WCG (77 KB): http://pastebin.com/QnsjrA2p
Second, it is a new machine, and I didn't see any suspicious/malicious services or activities taking up CPU. In fact, WCG is pretty much hogging all of it (as expected, which is good).
In terms of locking, I'm not sure what to look for. I can use Process Explorer to search for processes that are locking/holding certain files, but I'm not sure what file(s)/substring(s) to look for. Again, this is Windows 7.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hmmm, D:\temp\... How does cleaning/security software interpret names like that? I do not know, but that's one odd name to ponder over.
The log shows one long series of "Resuming task xxxxxx" messages, with an occasional task finishing and a new one starting. What is going on there, when there is no sign of BOINC being paused for one reason or another [that would be logged]? I propose, at the very least, running the installer and choosing the 'Repair' option, so that properties and ownership are confirmed. Check the Windows Event Viewer and your AV/security software to see whether anything is being logged. If things do not improve, do a project reset so that the science apps are downloaded afresh.
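One way to see which tasks are affected is to count the "Resuming task" lines per task in a saved copy of the log. A rough Python sketch follows. The sample lines and task names are invented for illustration, and the only assumption about the real log is that each such line contains the text "Resuming task" followed by the task name:

```python
import re
from collections import Counter

# Assumption: a resume event appears as "Resuming task <name>" somewhere in the line.
RESUME_RE = re.compile(r"Resuming task (\S+)")


def count_resumes(log_lines):
    """Return a Counter mapping task name -> number of 'Resuming task' lines."""
    counts = Counter()
    for line in log_lines:
        match = RESUME_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log excerpt; real lines would come from the saved event log.
sample_log = [
    "Starting task task_A",
    "Resuming task task_A",
    "Resuming task task_B",
    "Resuming task task_A",
    "Computation for task task_B finished",
    "Resuming task task_A",
]

resume_counts = count_resumes(sample_log)
for task, resumes in resume_counts.most_common():
    print(f"{task}: resumed {resumes} time(s)")
```

A task that is resumed many times without ever finishing is a candidate for the "never reaches a checkpoint" pattern described here.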
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
P.S. With all that 'resuming' sans logged reason, at least set "Leave application in memory when suspended". With that, the tasks will at least be held in memory, although by default they are anyway until the first checkpoint is reached [many projects have long setup periods for their models, so on a device that is used frequently and intermittently a task might never reach that first checkpoint, which is why tasks are paused in memory until that point].
P.P.S. Is the Activity menu set to 'Run always' or 'Run based on preferences'? If the latter, set it to 'Run always' as well to see if things improve. I run that way for CPU and never notice BOINC running, even when I use the computer quite intensely.
||
|
NightBlade
Advanced Cruncher Joined: Jun 10, 2008 Post Count: 89 Status: Offline Project Badges: |
I'll switch from Run Scheduled to Run Always. I already had "leave in memory when suspended" checked.
I'll monitor for 24 hours and try (yet another) science app reset if things don't improve.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If things do not improve with 'Run always', make sure to scan your logs for anything suspicious that is interrupting computing.
I've noticed several times [on Linux], with VINA but with Q-Chem too, that after a crash [power outage] a task would drop back to the beginning or to the last good checkpoint and end up with an Elapsed time double the CPU time. I suspect something related is going on with you, though it should not be: interruptions lead to drop-backs, so the Elapsed time keeps building while the CPU time goes nowhere.
||
|
NightBlade
Advanced Cruncher Joined: Jun 10, 2008 Post Count: 89 Status: Offline Project Badges: |
It looks like, even with Run Always, some tasks just constantly resume and never make any progress. See: http://pastebin.com/VRFi8YVp
I'm going to try:
1) Run the installer's Repair option
2) Reset the WCG project
3) Set the application switch interval to something obscene like one day (instead of 60 minutes)
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
OK, well, let us know. The obscene switch time won't do much, though, unless you also participate in projects outside of WCG and have work for them that's looking for a time slot.
|
||
|
NightBlade
Advanced Cruncher Joined: Jun 10, 2008 Post Count: 89 Status: Offline Project Badges: |
The repair seems stuck on setting BOINC directory permissions.
Very suspicious. I will keep you posted. I'm going to nuke my BOINC installation and reinstall.
||
|
|