This topic has been viewed 44875 times and has 98 replies.
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,37626

Please post your issues/comments/questions for this beta test here.

If you can, please suspend and resume these work units with LAIM (Leave Applications In Memory) turned off. This will show whether the application properly restores from its checkpoints.

Thanks,
-Uplinger
[Jan 7, 2015 10:00:02 PM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3265

I got one. I ran it for just over two minutes until it checkpointed at about 1%.
After restarting it with LAIM off, it went back to 0.204%.
----------------------------------------


AMD Ryzen 5 1600AF 4C/8T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
Intel Z3740 4C/4T 1.8 GHz - 6W
[Jan 7, 2015 10:18:07 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

Falconet,

Thanks for the info; that sounds like it restored from a mid-run checkpoint. I may send out more rigid work units here soon.

Thanks,
-Uplinger
[Jan 7, 2015 10:22:04 PM]
dango
Senior Cruncher
Joined: Jul 27, 2009
Post Count: 307

Got 6...
[Jan 7, 2015 10:22:27 PM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3265

You are welcome. I tried it again and it went from 3.265% to 2.244%.
[Jan 7, 2015 10:24:36 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

Work units with 298 in their names are the first of the flexible work units.

Those with 295, 296 and 297 are rigid work units with more than one job per work unit.

The 298 work units have only one job per work unit. They are more complicated and may run longer.

Thanks,
-Uplinger
[Jan 7, 2015 10:41:50 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

All of the initial results have been sent out. If we decide to send out more results, I'll let you know.

Thanks,
-Uplinger
[Jan 7, 2015 11:32:01 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

I got four on two machines: three flexible and one rigid. The flexible ones are running well and do seem to be checkpointing a little more often, closer to the "Write to disk at most every" interval, which I have set to 180 seconds. The rigid one ran for nearly 28 minutes before it checkpointed (at 20% progress), but shortly afterwards I suspended and resumed it with LAIM off, and it seemed to restart correctly from that checkpoint. (I'm monitoring checkpoints with BoincTasks 1.66, which may not be fully reliable in that regard.)

However, if the times are accurate, I think 28 minutes is far too long, and now, at 37 minutes, progress appears stuck at 20%.

Edit: At 38 minutes, progress appears to have retreated to 10%. This is similar to the strange numbers that appear on long-running tasks that don't checkpoint, so maybe not too surprising, but certainly unwelcome (and very off-putting for newbies). No further checkpoints have occurred either.

Edit 2: Checked stderr.txt in the slot directory and, yes, it does look like it correctly restarted. Also, looking at the times of the .ckp files, it does appear that the checkpoints are at extended intervals. It just checkpointed for a second time at 55 minutes.
----------------------------------------
[Edit 3 times, last edit by Former Member at Jan 8, 2015 12:12:01 AM]
[Jan 7, 2015 11:51:16 PM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 673

I've tried suspending and restarting a selection of units with LAIM off and all restarted without any problems. Work lost suspending and restarting was minimal for all the units tested when compared against units with similar run times and completion percentages that weren't suspended and restarted.

I wouldn't say no to a few more betas, though, just to make sure. I'm still miles off my next badge.
[Jan 8, 2015 12:24:37 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

That rigid unit checkpointed a third time at 69 minutes. This time I noticed that the number of checkpoint files grows with each checkpoint, both in the slot directory and in the vina_checkpoint subdirectory it contains. Is that supposed to happen?
[Jan 8, 2015 12:29:43 AM]