Thread Status: Locked
Total posts in this thread: 177
This topic has been viewed 104049 times and has 176 replies
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Project Badges:
Re: Linux Only Beta Test

widdershins,
Yes, since the DDDT2 beta testing of the very long A-type WUs, beta tests are limited to one job per thread at any given time, to avoid "endless" beta test sessions and to increase the chances that the widest variety of beta testing machines get some.
As long as you have as many beta jobs as running threads in a given machine, the servers will not send more, even if some are Ready to Report.
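
A minimal sketch of the rule as described above, assuming the server simply compares the number of beta jobs already on the host with the number of running threads. The function name and arguments are hypothetical; this is not the actual WCG scheduler code.

```python
def beta_jobs_to_send(running_threads: int, beta_jobs_on_host: int) -> int:
    """Return how many more beta jobs the server would send (hypothetical rule).

    beta_jobs_on_host counts every beta job still on the machine,
    including those that are Ready to Report.
    """
    return max(0, running_threads - beta_jobs_on_host)


# A host with 4 running threads holding 4 beta jobs, some Ready to Report.
print(beta_jobs_to_send(running_threads=4, beta_jobs_on_host=4))
# The same host once 2 of them have been reported and removed.
print(beta_jobs_to_send(running_threads=4, beta_jobs_on_host=2))
```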
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edit 1 times, last edit by JmBoullier at May 29, 2010 10:22:30 AM]
[May 29, 2010 10:21:34 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Linux Only Beta Test

The timekeeping has its learning curve. When production is running, the estimated flops will get adjusted on a daily basis, allowing a more accurate projection, as usual.
Actually they are rather well estimated when looking at jobs waiting to start. It's just that the percentage computation seems to be based on the maximum 8-hour duration whether the task will exceed the limit or not. That should be rather easy to correct before going to production.

Can't say that is an estimate if the progress is simply computed as a fraction of 8 hours. If the same job on yours is showing near 8 hours and mine is too, the last one shows 8:01:59 in ready to start, there might be another field to feed that time into the client TTC. Kind of like: Hello, I'm client X, with Y benchmark and the servers responding: Hello client, here's 8x3600 seconds worth of flops, the result with a header fitted to the benchmark. Wonder if knreed made the server that smart already... would be great. Will be fun to watch when mixed with HCMD2 when a series of great grandchildren hop by.
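
To illustrate the two ideas in this post, here is a rough sketch: progress taken as a plain fraction of the 8-hour cap, and a work estimate sized as 8x3600 seconds of flops for a given benchmark. The function names and the benchmark figure are made up for illustration; this is not how the WCG servers are known to compute either value.

```python
MAX_SECONDS = 8 * 3600  # the 8-hour limit mentioned above


def progress_percent(elapsed_seconds: float) -> float:
    """Progress as a simple fraction of the 8-hour cap, clamped to 100 %."""
    return min(100.0, 100.0 * elapsed_seconds / MAX_SECONDS)


def flops_for_full_run(benchmark_flops_per_second: float) -> float:
    """Flops a host with this benchmark would do in 8 hours (assumed model)."""
    return benchmark_flops_per_second * MAX_SECONDS


print(progress_percent(4 * 3600))  # 4 hours of run time
print(flops_for_full_run(2.0e9))   # hypothetical 2 GFLOPS benchmark
```

With this model a task that really needs only 4 hours would finish while showing about half progress, which matches the complaint that such a figure is not an estimate.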
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 29, 2010 11:55:25 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Project Badges:
Re: Linux Only Beta Test

If the same job on yours is showing near 8 hours and mine is too, the last one shows 8:01:59 in ready to start, there might be another field to feed that time into the client TTC.
Sorry, since I started these beta WUs their estimated time before starting has always been in the range of what they have actually needed, say the final time has been between 60 % and 140 % of what was "announced". Not that precise, but much better than what we used to see during beta tests, and the DCF is currently at 0.78.
I don't know why yours are announced at around 8 hours.
I have one waiting to start which is estimated at 3:14:15, probably a bit below what will be really needed, but not that much. The 4 last ones have used from 3.81 to 4.42 hours.
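
As a quick check of the figures above, the sketch below converts the 3:14:15 estimate to hours and compares it with the 3.81 and 4.42 hour extremes. It assumes, purely for illustration, that the waiting job would behave like the last four; the helper name is hypothetical.

```python
def hms_to_hours(hms: str) -> float:
    """Convert 'H:MM:SS' to decimal hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


estimate = hms_to_hours("3:14:15")  # estimate shown for the waiting job
for actual in (3.81, 4.42):         # shortest and longest of the last four, in hours
    ratio = actual / estimate
    print(f"{actual:.2f} h is {ratio:.0%} of the estimate")
```

Both ratios stay inside the 60 % to 140 % band quoted above.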

I am using the standard version 6.10.17 delivered with Ubuntu 10.04 if that matters.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[May 29, 2010 3:42:14 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Project Badges:
Re: Linux Only Beta Test

Houston, I have a more serious problem than duration estimates indeed: my last job has exploded after normal completion with a "file too big" condition.

sam. 29 mai 2010 17:18:58 CEST World Community Grid Output file BETA_A.19.C15H10N2SSe.1.2_1_4 for task BETA_A.19.C15H10N2SSe.1.2_1 exceeds size limit.
sam. 29 mai 2010 17:18:58 CEST World Community Grid File size: 53957882.000000 bytes. Limit: 52428800.000000 bytes

Please take measures for other jobs to come... sad

Edit: And one more... crying

sam. 29 mai 2010 18:11:16 CEST World Community Grid Computation for task BETA_A.19.C15H11NSSeSi.2.1_0 finished
sam. 29 mai 2010 18:11:16 CEST World Community Grid Output file BETA_A.19.C15H11NSSeSi.2.1_0_4 for task BETA_A.19.C15H11NSSeSi.2.1_0 exceeds size limit.
sam. 29 mai 2010 18:11:16 CEST World Community Grid File size: 55755276.000000 bytes. Limit: 52428800.000000 bytes

And since jobs seem to be getting bigger and bigger, I find this problem very serious.
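
For reference, a small sketch of the arithmetic behind the two messages above: the limit of 52428800 bytes is exactly 50 * 1024 * 1024, and the code prints how far each file overshoots it. The function name is hypothetical.

```python
LIMIT_BYTES = 52428800  # limit reported in the messages above (50 * 1024 * 1024)


def overshoot(file_size_bytes: int, limit_bytes: int = LIMIT_BYTES) -> int:
    """Bytes by which a file exceeds the limit (0 if it fits)."""
    return max(0, file_size_bytes - limit_bytes)


for size in (53957882, 55755276):  # file sizes from the two messages above
    extra = overshoot(size)
    print(f"{size} bytes: {extra} bytes over the limit ({extra / LIMIT_BYTES:.1%})")
```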
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edit 1 times, last edit by JmBoullier at May 29, 2010 4:26:37 PM]
[May 29, 2010 3:46:08 PM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 673
Status: Offline
Project Badges:
Re: Linux Only Beta Test

One has errored out. It appears it wasn't an error with the computation but rather with the transmission of the data back to WCG.

Result Name: BETA_A.19.C14H12N2SSi2.1_0--



<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[08:36:00] Number of jobs = 16
[08:36:00] Starting job 0,CPU time has been restored to 0.000000.
[08:36:00] Creating Scratch Dir
[08:36:00] Copying jobfile
[08:36:00] Copying regular version
[08:36:00] Starting new Job
[08:36:00] Qink name = fldman
[08:36:00] Qink name = gesman
[08:36:00] Qink name = scfman
[08:37:35] Qink name = anlman
[08:37:35] End of Job
[08:37:38] Updating rkrun
[08:37:38] Copying Output Files
[08:37:38] Delete Scratch Dir
[08:37:38] Saving State and Checkpointing
[08:37:38] Finished Job #0
[08:37:38] Starting job 1,CPU time has been restored to 82.189136.
[08:37:38] Creating Scratch Dir
[08:37:38] Copying jobfile
[08:37:38] Copying regular version
[08:37:38] Starting new Job
[08:37:38] Qink name = fldman
[08:37:39] Qink name = gesman
[08:37:39] Qink name = scfman
[08:42:00] Qink name = anlman
[08:42:09] End of Job
[08:42:11] Updating rkrun
[08:42:11] Copying Output Files
[08:42:12] Delete Scratch Dir
[08:42:12] Saving State and Checkpointing
[08:42:12] Finished Job #1
[08:42:12] Starting job 2,CPU time has been restored to 312.967558.
[08:42:12] Creating Scratch Dir
[08:42:12] Copying jobfile
[08:42:12] Copying regular version
[08:42:12] Starting new Job
[08:42:12] Qink name = fldman
[08:42:13] Qink name = gesman
[08:42:13] Qink name = scfman
[08:45:56] Qink name = anlman
[08:45:56] Qink name = drvman
[08:46:38] Qink name = optman
[08:46:38] Qink name = fldman
[08:46:38] Qink name = gesman
[08:46:38] Qink name = scfman
[08:52:19] Qink name = anlman
[08:52:19] Qink name = drvman
[08:52:59] Qink name = optman
[08:52:59] Qink name = fldman
[08:52:59] Qink name = gesman
[08:52:59] Qink name = scfman
[08:58:26] Qink name = anlman
[08:58:27] Qink name = drvman
[08:59:08] Qink name = optman
[08:59:08] Qink name = fldman
[08:59:08] Qink name = gesman
[08:59:09] Qink name = scfman
[09:04:01] Qink name = anlman
[09:04:01] Qink name = drvman
[09:04:40] Qink name = optman
[09:04:40] Qink name = fldman
[09:04:40] Qink name = gesman
[09:04:40] Qink name = scfman
[09:09:08] Qink name = anlman
[09:09:08] Qink name = drvman
[09:09:49] Qink name = optman
[09:09:49] Qink name = fldman
[09:09:49] Qink name = gesman
[09:09:50] Qink name = scfman
[09:13:53] Qink name = anlman
[09:13:53] Qink name = drvman
[09:14:31] Qink name = optman
[09:14:31] Qink name = anlman
[09:14:40] End of Job
[09:14:43] Updating rkrun
[09:14:43] Copying Output Files
[09:14:43] Delete Scratch Dir
[09:14:43] Saving State and Checkpointing
[09:14:43] Finished Job #2
[09:14:43] Starting job 3,CPU time has been restored to 2023.570464.
[09:14:43] Creating Scratch Dir
[09:14:43] Copying jobfile
[09:14:43] Copying regular version
[09:14:44] Starting new Job
[09:14:44] Qink name = fldman
[09:14:44] Qink name = gesman
[09:14:44] Qink name = scfman
[09:19:23] Qink name = anlman
[09:19:32] End of Job
[09:19:34] Updating rkrun
[09:19:34] Copying Output Files
[09:19:34] Delete Scratch Dir
[09:19:34] Saving State and Checkpointing
[09:19:34] Finished Job #3
[09:19:34] Starting job 4,CPU time has been restored to 2283.950736.
[09:19:34] Creating Scratch Dir
[09:19:34] Copying jobfile
[09:19:34] Copying regular version
[09:19:34] Starting new Job
[09:19:34] Qink name = fldman
[09:19:34] Qink name = gesman
[09:19:34] Qink name = scfman
[09:22:45] Qink name = anlman
[09:22:55] End of Job
[09:22:57] Updating rkrun
[09:22:57] Copying Output Files
[09:22:57] Delete Scratch Dir
[09:22:57] Saving State and Checkpointing
[09:22:57] Finished Job #4
[09:22:57] Starting job 5,CPU time has been restored to 2471.374449.
[09:22:57] Creating Scratch Dir
[09:22:57] Copying jobfile
[09:22:57] Copying regular version
[09:22:57] Starting new Job
[09:22:57] Qink name = fldman
[09:22:58] Qink name = gesman
[09:22:58] Qink name = scfman
[09:26:29] Qink name = anlman
[09:26:37] End of Job
[09:26:39] Updating rkrun
[09:26:39] Copying Output Files
[09:26:39] Delete Scratch Dir
[09:26:39] Saving State and Checkpointing
[09:26:39] Finished Job #5
[09:26:39] Starting job 6,CPU time has been restored to 2667.674717.
[09:26:39] Creating Scratch Dir
[09:26:39] Copying jobfile
[09:26:39] Copying regular version
[09:26:39] Starting new Job
[09:26:39] Qink name = fldman
[09:26:40] Qink name = gesman
[09:26:40] Qink name = scfman
[09:30:10] Qink name = anlman
[09:30:21] End of Job
[09:30:23] Updating rkrun
[09:30:23] Copying Output Files
[09:30:23] Delete Scratch Dir
[09:30:24] Saving State and Checkpointing
[09:30:24] Finished Job #6
[09:30:24] Starting job 7,CPU time has been restored to 2865.491079.
[09:30:24] Creating Scratch Dir
[09:30:24] Copying jobfile
[09:30:24] Copying regular version
[09:30:24] Starting new Job
[09:30:24] Qink name = fldman
[09:30:25] Qink name = gesman
[09:30:25] Qink name = scfman
[09:34:42] Qink name = anlman
[09:34:50] End of Job
[09:34:53] Updating rkrun
[09:34:53] Copying Output Files
[09:34:53] Delete Scratch Dir
[09:34:53] Saving State and Checkpointing
[09:34:53] Finished Job #7
[09:34:53] Starting job 8,CPU time has been restored to 3115.086677.
[09:34:53] Creating Scratch Dir
[09:34:53] Copying jobfile
[09:34:53] Copying regular version
[09:34:54] Starting new Job
[09:34:54] Qink name = fldman
[09:34:54] Qink name = gesman
[09:34:54] Qink name = scfman
[09:38:07] Qink name = anlman
[09:38:16] End of Job
[09:38:19] Updating rkrun
[09:38:19] Copying Output Files
[09:38:19] Delete Scratch Dir
[09:38:19] Saving State and Checkpointing
[09:38:19] Finished Job #8
[09:38:19] Starting job 9,CPU time has been restored to 3303.686463.
[09:38:19] Creating Scratch Dir
[09:38:19] Copying jobfile
[09:38:19] Copying regular version
[09:38:19] Starting new Job
[09:38:19] Qink name = fldman
[09:38:19] Qink name = gesman
[09:38:19] Qink name = scfman
[09:45:59] Qink name = anlman
[09:46:14] End of Job
[09:46:16] Updating rkrun
[09:46:16] Copying Output Files
[09:46:16] Delete Scratch Dir
[09:46:16] Saving State and Checkpointing
[09:46:16] Finished Job #9
[09:46:16] Starting job 10,CPU time has been restored to 3745.298062.
[09:46:16] Creating Scratch Dir
[09:46:16] Copying jobfile
[09:46:16] Copying regular version
[09:46:16] Starting new Job
[09:46:16] Qink name = fldman
[09:46:17] Qink name = gesman
[09:46:17] Qink name = scfman
[09:53:13] Qink name = anlman
[09:53:26] End of Job
[09:53:29] Updating rkrun
[09:53:29] Copying Output Files
[09:53:29] Delete Scratch Dir
[09:53:29] Saving State and Checkpointing
[09:53:29] Finished Job #10
[09:53:29] Starting job 11,CPU time has been restored to 4148.239244.
[09:53:29] Creating Scratch Dir
[09:53:29] Copying jobfile
[09:53:29] Copying regular version
[09:53:29] Starting new Job
[09:53:29] Qink name = fldman
[09:53:30] Qink name = gesman
[09:53:30] Qink name = scfman
[09:58:21] Qink name = anlman
[09:58:36] End of Job
[09:58:38] Updating rkrun
[09:58:38] Copying Output Files
[09:58:38] Delete Scratch Dir
[09:58:39] Saving State and Checkpointing
[09:58:39] Finished Job #11
[09:58:39] Starting job 12,CPU time has been restored to 4426.328623.
[09:58:39] Creating Scratch Dir
[09:58:39] Copying jobfile
[09:58:39] Copying regular version
[09:58:39] Starting new Job
[09:58:39] Qink name = fldman
[09:58:40] Qink name = gesman
[09:58:40] Qink name = scfman
[10:19:55] Qink name = anlman
[10:22:08] End of Job
[10:22:10] Updating rkrun
[10:22:10] Copying Output Files
[10:22:10] Delete Scratch Dir
[10:22:11] Saving State and Checkpointing
[10:22:11] Finished Job #12
[10:22:11] Starting job 13,CPU time has been restored to 5719.677452.
[10:22:11] Creating Scratch Dir
[10:22:11] Copying jobfile
[10:22:11] Copying regular version
[10:22:11] Starting new Job
[10:22:11] Qink name = fldman
[10:22:13] Qink name = gesman
[10:22:13] Qink name = scfman
[11:37:36] Qink name = anlman
[11:40:40] End of Job
[11:40:42] Updating rkrun
[11:40:42] Copying Output Files
[11:40:42] Delete Scratch Dir
[11:40:42] Saving State and Checkpointing
[11:40:42] Finished Job #13
[11:40:42] Starting job 14,CPU time has been restored to 10083.394167.
[11:40:42] Creating Scratch Dir
[11:40:42] Copying jobfile
[11:40:42] Copying regular version
[11:40:43] Starting new Job
[11:40:43] Qink name = fldman
[11:40:45] Qink name = gesman
[11:40:45] Qink name = scfman
[13:07:37] Qink name = anlman
[13:10:29] End of Job
[13:10:31] Updating rkrun
[13:10:31] Copying Output Files
[13:10:31] Delete Scratch Dir
[13:10:32] Saving State and Checkpointing
[13:10:32] Finished Job #14
[13:10:32] Starting job 15,CPU time has been restored to 15190.037312.
[13:10:32] Creating Scratch Dir
[13:10:32] Copying jobfile
[13:10:32] Copying regular version
[13:10:32] Starting new Job
[13:10:32] Qink name = fldman
[13:10:34] Qink name = gesman
[13:10:34] Qink name = scfman
[14:45:48] Qink name = anlman
[14:50:37] End of Job
[14:50:40] Updating rkrun
[14:50:40] Copying Output Files
[14:50:40] Delete Scratch Dir
[14:50:40] Saving State and Checkpointing
[14:50:40] Finished Job #15
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>BETA_A.19.C14H12N2SSi2.1_0_4</file_name>
<error_code>-131</error_code>
</file_xfer_error>


</message>
]]>
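
The growth in job length is easier to see as wall-clock time per job. The sketch below reads it off a short excerpt of the stderr above by pairing each "Starting job" line with its "Finished Job" line. The function name is hypothetical, and it assumes no run crosses midnight.

```python
import re
from datetime import datetime

# Short excerpt of the stderr above; the full log has the same line shapes.
LOG = """\
[08:36:00] Starting job 0,CPU time has been restored to 0.000000.
[08:37:38] Finished Job #0
[13:10:32] Starting job 15,CPU time has been restored to 15190.037312.
[14:50:40] Finished Job #15
"""

START = re.compile(r"\[(\d\d:\d\d:\d\d)\] Starting job (\d+),")
FINISH = re.compile(r"\[(\d\d:\d\d:\d\d)\] Finished Job #(\d+)")


def job_durations(log: str) -> dict[int, int]:
    """Map job number to wall-clock seconds between its start and finish lines."""
    started = {}
    durations = {}
    for line in log.splitlines():
        if m := START.match(line):
            started[int(m.group(2))] = datetime.strptime(m.group(1), "%H:%M:%S")
        elif m := FINISH.match(line):
            job = int(m.group(2))
            end = datetime.strptime(m.group(1), "%H:%M:%S")
            durations[job] = int((end - started[job]).total_seconds())
    return durations


print(job_durations(LOG))
```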
[May 29, 2010 5:36:15 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Linux Only Beta Test

No transmission issue. -131 is indicating what Jean wrote about:
ERR_FILE_TOO_BIG (-131): file size too big; an output file was bigger than max_nbytes
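
A small sketch of how the error block in a result message could be decoded, using only the one code quoted here. The lookup table and function name are hypothetical, and the sample message is copied from the result posted earlier in this thread.

```python
import re

# Only the code quoted in this thread; other BOINC error codes are not listed here.
KNOWN_ERRORS = {-131: "ERR_FILE_TOO_BIG"}

MESSAGE = """\
<file_xfer_error>
<file_name>BETA_A.19.C14H12N2SSi2.1_0_4</file_name>
<error_code>-131</error_code>
</file_xfer_error>
"""


def explain(message: str) -> list[tuple[str, str]]:
    """Return (file name, error name) pairs found in a result message."""
    pattern = re.compile(
        r"<file_name>(.*?)</file_name>\s*<error_code>(-?\d+)</error_code>"
    )
    return [
        (name, KNOWN_ERRORS.get(int(code), f"unknown ({code})"))
        for name, code in pattern.findall(message)
    ]


print(explain(MESSAGE))
```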

----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 29, 2010 5:52:44 PM]
pirogue
Veteran Cruncher
USA
Joined: Dec 8, 2008
Post Count: 685
Status: Offline
Project Badges:
Re: Linux Only Beta Test

So far, I've had 4 with error -131 and 22+ hours down the tubes. Is this one of the types of errors for which credit is granted?
----------------------------------------

[May 29, 2010 6:02:43 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Linux Only Beta Test

It's a genuine application [parm] fault, i.e. Don't Panic, Mr Mainwaring. It's not down the tubes either, since now, at larger test scale, it has been learned how big things can get. And however big the result files may get, I am not noticing anything on the machine: the largest memory used is 89MB over 259MB for RAM and VM according to the top view.

So far the longest run I had on the quad has been 5:46 hours at 2.4 GHz. Only one of 5 had a wingman agreeing, and 4 more are crunching now.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 29, 2010 6:15:46 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1294
Status: Offline
Project Badges:
Re: Linux Only Beta Test

Earlier BETAs with runtimes between 2.94 and 4.60 hours ended successfully.
The last good one had an upload file of 47MB.

Now 2 longer running ones errored out with the same error as mentioned by Jean: exceeds size limit (52428800)

World Community Grid 29-05-2010 21:55:39 Output file BETA_A.19.C15H11NOS2.4.4_0_4 for task BETA_A.19.C15H11NOS2.4.4_0 exceeds size limit.
Run time 6hr42min
World Community Grid 29-05-2010 22:34:14 Output file BETA_A.19.C16H10S2Se.1.2_1_4 for task BETA_A.19.C16H10S2Se.1.2_1 exceeds size limit.
Run time 5hr25min

Dual Core Processor: Intel P8400 @ 2.26GHz
Memory 3GB
OS: Linux 2.6.28-18-generic

Waiting for new BETAs and will try to increase the max_nbytes value.
----------------------------------------

[May 29, 2010 8:55:39 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Linux Only Beta Test

CP, are you sure you want to do that? Not sure if the servers will accept over-sized result files.

Just now my quad is sweating on a 49,382k upload, the critical level to watch for in the transfer screen presently being 51,300k. As long as that is going, not going faster than 50k, no replacement B type will be fetched.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 29, 2010 9:07:00 PM]