Total posts in this thread: 109
This topic has been viewed 73242 times and has 108 replies
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Your reading of your stats still does not account for the results sitting in Pending Validation (the "PV jail"). By about day 4 you should reach a fairly constant level, but HPF2 is a special case: its output depends heavily on office machines that only crunch Monday to Friday, and the quorum 15 mechanism adds to that. Just look at this roller coaster:
http://i137.photobucket.com/albums/q210/Sekerob/WCGHPF2ProdChart.png

and compare that to the project continuity of the others.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 4, 2010 6:32:52 PM]
[Mar 4, 2010 6:31:42 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

a
aa
aaa
aaaa
aaaaa
aaaaaa
aaaaaaa
aaaaaaaa
aaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...............................
----------------------------------------

[Mar 4, 2010 8:47:03 PM]
rilian
Veteran Cruncher
Ukraine - we rule!
Joined: Jun 17, 2007
Post Count: 1442
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

One of my hosts started getting random errors today/yesterday

Result Name: ne416_ 00037_ 6--
<core_client_version>6.10.32</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C81A3E1

Engaging BOINC Windows Runtime Debugger...

++ there is a long block of debug info below



The WU quit after 60 hours of crunching [crying] [confused]

Besides this WU, some others quit after anywhere from 0.02 hours up to 7.xx hours with the same error

<core_client_version>6.10.32</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

</stderr_txt>
]]>

Result Name         Device        Status              Sent              Due / Returned    CPU (h)  Claimed / Granted
ne751_ 00038_ 3--   computername  Error               3/5/10 11:12:52   3/5/10 11:15:15   0.02     0.1 / 0.0
ne752_ 00050_ 9--   computername  Error               3/5/10 11:12:25   3/5/10 11:15:15   0.02     0.2 / 0.0
ne753_ 00044_ 7--   computername  Error               3/5/10 11:12:25   3/5/10 11:15:15   0.02     0.1 / 0.0
ne741_ 00042_ 13--  computername  Error               3/5/10 11:12:25   3/5/10 11:15:15   0.02     0.2 / 0.0
ne735_ 00006_ 3--   computername  Error               3/5/10 07:00:19   3/5/10 11:12:24   3.84     32.0 / 0.0
ne727_ 00029_ 17--  computername  Error               3/5/10 07:00:19   3/5/10 11:12:24   3.37     28.0 / 0.0
ne691_ 00073_ 18--  computername  Pending Validation  3/4/10 15:39:23   3/5/10 07:00:19   13.28    110.5 / 0.0
ne691_ 00043_ 2--   computername  Error               3/4/10 15:39:23   3/5/10 11:12:24   7.37     61.3 / 0.0
ne691_ 00041_ 10--  computername  Error               3/4/10 15:39:23   3/5/10 11:12:24   5.82     48.4 / 0.0

[confused]

I can get the messages log from this machine later...
----------------------------------------
----------------------------------------
[Edit 2 times, last edit by rilian at Mar 5, 2010 3:24:00 PM]
[Mar 5, 2010 3:20:25 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

rilian,

Please see my HPF2 forum post of today... BOINCTasks is getting an alert system to warn about tasks stuck in a loop. HPF2 is the only project I know of at WCG that does that, and only rarely.

For now I'm using RosettaView (no longer available on the intertube).
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 5, 2010 3:56:09 PM]
[Mar 5, 2010 3:52:55 PM]
rilian
Veteran Cruncher
Ukraine - we rule!
Joined: Jun 17, 2007
Post Count: 1442
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Sekerob, thanks, I've seen this post ( http://www.worldcommunitygrid.org/forums/wcg/...24380_lastpage,yes#270466 ) about the BOINCTasks tool.

Unfortunately my machines are quite remote, so even if it warns me about a WU, I could not do anything in time.

Is there anything I can do, other than not running HPF2 on that machine?

The machine is:

GenuineIntel Intel(R) Xeon(TM) CPU 3.00GHz [x86 Family 15 Model 4 Stepping 10]
Microsoft Windows Server 2003
Enterprise Server x86 Edition, Service Pack 2, (05.02.3790.00)
----------------------------------------
[Mar 5, 2010 8:43:46 PM]
robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 442
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

I got somewhat similar errors on my laptop for a while, before I decided I had credit for enough workunits of this type for now and switched all three of my computers to another WCG subproject.

A few details about that computer:
64-bit Windows Vista SP2
BOINC 6.10.18
Several other BOINC projects, including GPU projects and one that uses both a full CPU and a full GPU (Einstein)
8 GB memory for 2 CPU cores; BOINC is allowed to use only 40% of it, due to problems on my other two computers when more is allowed
"Keep workunits in memory when suspended" is turned off, again due to problems on my other two computers
BOINC is allowed to use 60% of the CPU time, compared to 100% on the two computers that get better results on this subproject
Errors generally occur well after the workunit is started, when it's trying to resume from a checkpoint

The GPU and Einstein workunits tend to suspend themselves whenever I use the keyboard or the touchpad. For Einstein workunits, at least, this gives a CPU-only workunit a much shorter than usual piece of a timeslot; I suspect that could cause problems for CPU workunits with infrequent checkpoints if BOINC counts those pieces the same as a full timeslot. The GPU workunits resume within minutes after I stop using the keyboard and the touchpad. So do the Einstein workunits, even when that requires suspending a CPU-only workunit early, much as happens when some other workunit goes into high-priority mode.

I've never been interested in overclocking enough to find instructions on how to do it, but that laptop is rather hot to put on my lap even with the current settings, and tends to use the high speed of its fan much more often now that I've found some GPU projects compatible with its GPU board (a G105M).
----------------------------------------
[Edit 1 times, last edit by robertmiles at Mar 7, 2010 5:17:05 AM]
[Mar 7, 2010 4:17:50 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

I may have mentioned this before, but on my quad-core W7 64-bit machine with the 64-bit client (6.10.36) I have been observing a pattern of HPF2 failures, exclusively when HPF2 runs in combination with the AutoDock sciences. I first saw a number of 2-minute error-outs, and one 50 minutes into the job, all with the same lines in the result log ending in /401, while HFCC was running. So I deselected that project, and once none were left in the mix, everything ran happily alongside RICE, HCC and HCMD2. Then yesterday a few FAAH tasks came in and I forced one to start. Sure enough, while it was running, several HPF2 jobs failed in the familiar 2-minute style. The FAAH task finished, and since then 4 more HPF2 jobs have returned without issue, with 2 still in progress and 2 hours under the belt.

So, is anyone else having this particular experience, or can anyone reconstruct it from the Result Status pages or the BOINCTasks history (v 0.45)? A sample:

First set when a FAAH ran:

World Community Grid 6.03 hpf2 nf439_00014 06:35:57 (06:13:36) 17-03-2010 10:10 17-03-2010 10:10 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400045200707131445 04:56:10 (04:46:55) 17-03-2010 09:57 17-03-2010 09:57 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400129200708021915 04:53:47 (04:44:07) 17-03-2010 09:24 17-03-2010 09:24 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400276200707121410 04:30:44 (04:28:19) 17-03-2010 06:31 17-03-2010 06:31 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400316200707121409 04:34:58 (04:32:13) 17-03-2010 05:01 17-03-2010 05:01 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400589200707121404 04:38:46 (04:36:22) 17-03-2010 04:30 17-03-2010 04:31 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00010 05:32:54 (05:31:14) 17-03-2010 03:34 17-03-2010 03:34 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00015 05:48:11 (05:44:52) 17-03-2010 02:00 17-03-2010 02:00 Reported: Ok
World Community Grid 6.03 hpf2 nf406_00064 04:29:17 (04:24:05) 16-03-2010 23:51 16-03-2010 23:52 Reported: Ok
World Community Grid 6.07 faah faah11385_ZINC11800521_xMut_md21780_02 06:24:48 (06:14:08) 16-03-2010 23:42 16-03-2010 23:43 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00011 00:01:16 (00:01:15) 16-03-2010 22:01 16-03-2010 22:02 Reported: Computation error (1,)
World Community Grid 6.06 hcc1 X0000090370235200708021803 04:45:19 (04:38:38) 16-03-2010 22:00 16-03-2010 22:00 Reported: Ok
World Community Grid 6.03 hpf2 nf406_00058 05:25:20 (04:50:20) 16-03-2010 20:12 16-03-2010 20:12 Reported: Ok
World Community Grid 6.03 hpf2 nf405_00046 06:10:32 (05:41:34) 16-03-2010 19:48 16-03-2010 19:48 Reported: Ok
World Community Grid 6.03 hpf2 nf389_00078 05:25:14 (05:11:54) 16-03-2010 17:55 16-03-2010 17:56 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00023 00:01:20 (00:01:12) 16-03-2010 17:18 16-03-2010 17:20 Reported: Computation error (1,)
World Community Grid 6.03 hpf2 nf382_00032 05:33:06 (05:25:04) 16-03-2010 16:36 16-03-2010 16:36 Reported: Ok
World Community Grid 6.06 hcc1 X0000090281140200708021314 04:58:47 (04:42:34) 16-03-2010 13:40 16-03-2010 13:40 Reported: Ok
World Community Grid 6.03 hpf2 nf380_00030 05:12:20 (04:50:29) 16-03-2010 13:36 16-03-2010 13:37 Reported: Ok

Second set when several HFCC ran:

World Community Grid 6.06 hcc1 X0000084650807200703070838 04:35:01 (04:33:19) 10-03-2010 05:54 10-03-2010 05:55 Reported: Ok
World Community Grid 6.03 hpf2 ne861_00046 04:53:04 (04:50:50) 10-03-2010 03:52 10-03-2010 03:52 Reported: Ok
World Community Grid 6.03 hpf2 ne863_00000 05:51:41 (05:49:11) 10-03-2010 01:19 10-03-2010 01:20 Reported: Ok
World Community Grid 6.03 hpf2 ne858_00011 05:07:29 (05:04:20) 09-03-2010 23:00 09-03-2010 23:06 Reported: Ok
World Community Grid 6.03 hpf2 ne858_00042 05:06:34 (05:03:57) 09-03-2010 22:59 09-03-2010 23:06 Reported: Ok
World Community Grid 6.03 hpf2 ne843_00019 05:22:19 (05:16:21) 09-03-2010 22:27 09-03-2010 23:06 Reported: Ok
World Community Grid 6.03 hpf2 ne859_00044 05:47:49 (05:14:02) 09-03-2010 19:28 09-03-2010 19:28 Reported: Ok
World Community Grid 6.06 hcc1 X0000084630459200703161915 05:17:39 (05:01:45) 09-03-2010 17:52 09-03-2010 17:53 Reported: Ok
World Community Grid 6.03 hpf2 ne853_00105 06:44:52 (06:27:13) 09-03-2010 17:52 09-03-2010 17:53 Reported: Ok
World Community Grid 6.06 hcc1 X0000084640008200703021829 05:34:07 (05:19:53) 09-03-2010 16:55 09-03-2010 16:55 Reported: Ok
World Community Grid 6.03 hpf2 ne820_00038 06:26:45 (06:11:06) 09-03-2010 13:07 09-03-2010 13:07 Reported: Ok
World Community Grid 6.03 hpf2 ne852_00027 00:01:11 (00:01:02) 09-03-2010 12:35 09-03-2010 12:36 Reported: Computation error (1,)
World Community Grid 6.10 hfcc HFCC_s2_00419591_s2_0001 09:32:57 (09:20:57) 09-03-2010 12:33 09-03-2010 12:34 Reported: Ok
World Community Grid 6.10 hfcc HFCC_s2_00418320_s2_0001 10:22:15 (10:10:21) 09-03-2010 11:53 09-03-2010 11:54 Reported: Ok
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne853_00092 00:50:50 (00:48:20) 09-03-2010 10:34 09-03-2010 10:36 Reported: Computation error (1,)
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne825_00040 00:01:26 (00:01:12) 09-03-2010 09:38 09-03-2010 09:39 Reported: Computation error (1,)

World Community Grid 6.06 Help Conquer Cancer X0000084600343200703161822 05:08:41 (05:00:26) 09-03-2010 09:37 09-03-2010 09:37 Reported: Ok
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne820_00036 00:01:19 (00:01:15) 09-03-2010 05:53 09-03-2010 05:54 Reported: Computation error (1,)
World Community Grid 6.10 Help Fight Childhood Cancer HFCC_s2_00418006_s2_0001 07:54:03 (07:50:56) 09-03-2010 05:52 09-03-2010 05:52 Reported: Ok
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne816_00007 05:46:25 (05:42:25) 09-03-2010 04:28 09-03-2010 04:28 Reported: Ok

To emphasize: when no AutoDock jobs ran concurrently, the hpf2 success rate was 100%, even with a 300-hour CPDN model periodically being scheduled in preemptively.
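
For anyone who wants to check their own history for the same pattern, here is a minimal Python sketch. It assumes each pasted history line has the layout of the samples above (project, app version, app name, task name, elapsed time, CPU time in brackets, completed, reported, status) and it approximates a task's run window as "completed minus elapsed", which is only rough if the task was preempted along the way. The function and set names are made up for this illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout, inferred from the pasted samples (not an official format).
LINE_RE = re.compile(
    r"^World Community Grid (?P<version>\d+\.\d+) (?P<app>.+) (?P<task>\S+) "
    r"(?P<elapsed>\d\d:\d\d:\d\d) \((?P<cpu>\d\d:\d\d:\d\d)\) "
    r"(?P<completed>\d\d-\d\d-\d{4} \d\d:\d\d) "
    r"(?P<reported>\d\d-\d\d-\d{4} \d\d:\d\d) "
    r"Reported: (?P<status>.+)$"
)

# App labels as they appear in the samples (short and long forms).
AUTODOCK_APPS = {"faah", "hfcc", "Help Fight Childhood Cancer"}
HPF2_APPS = {"hpf2", "Human Proteome Folding - Phase 2"}


def parse_line(line):
    """Return a dict for one history line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hours, minutes, seconds = (int(x) for x in m["elapsed"].split(":"))
    end = datetime.strptime(m["completed"], "%d-%m-%Y %H:%M")
    # Rough run window: assumes the task ran without long pauses.
    start = end - timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return {
        "app": m["app"],
        "task": m["task"],
        "start": start,
        "end": end,
        "ok": m["status"] == "Ok",
    }


def failed_hpf2_vs_autodock(lines):
    """List (task, overlapped) for each failed HPF2 task.

    overlapped is True when the task's run window intersects the run
    window of at least one AutoDock task (FAAH or HFCC).
    """
    rows = [r for r in map(parse_line, lines) if r is not None]
    autodock = [r for r in rows if r["app"] in AUTODOCK_APPS]
    result = []
    for r in rows:
        if r["app"] in HPF2_APPS and not r["ok"]:
            overlapped = any(
                a["start"] < r["end"] and r["start"] < a["end"] for a in autodock
            )
            result.append((r["task"], overlapped))
    return result
```

Feed it the lines copied from the history tab; a failed HPF2 task that shows `False` would be a counterexample to the AutoDock theory.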

edit: italics on jobs of interest.
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 17, 2010 10:49:08 AM]
[Mar 17, 2010 10:48:29 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

I will not be able to try out your success formula on my Win7-64 for another 8-9 hours, but I will certainly be testing this tonight!
[Mar 17, 2010 12:38:02 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Now that I have stopped HPF2, I wanted to take a more detailed look at the thousands of errors I got, and unfortunately it does not fail only within the first one or two minutes.
Below is a list of the errors with the highest crunch times before failing. I do not care about the loss of points, but I am certainly not happy about the loss of time.


Result Name         Device   Status  Sent               Due / Returned     CPU (h)  Claimed / Granted
ne580_ 00022_ 13--  Ceres    Error   03.03.10 02:35:43  04.03.10 21:06:48  31.03    705.9 / 0.0
nf185_ 00033_ 5--   Uranus   Error   12.03.10 01:22:10  13.03.10 23:49:13  30.49    681.8 / 0.0
ne998_ 00018_ 18--  Ceres    Error   09.03.10 09:26:57  11.03.10 11:47:13  30.09    695.0 / 0.0
nf023_ 00025_ 11--  Pluto    Error   09.03.10 16:50:42  11.03.10 12:41:16  29.80    695.0 / 0.0
ne867_ 00077_ 13--  Pluto    Error   07.03.10 08:13:40  09.03.10 03:58:41  29.49    770.0 / 0.0
ne870_ 00038_ 14--  Ceres    Error   07.03.10 10:27:16  09.03.10 03:58:16  29.26    658.5 / 0.0
ne684_ 00019_ 7--   Saturn   Error   04.03.10 13:15:55  06.03.10 01:44:05  28.86    723.5 / 0.0
nf225_ 00030_ 10--  Ceres    Error   12.03.10 17:22:58  14.03.10 10:36:41  28.57    641.8 / 0.0
ne956_ 00041_ 1--   Mercury  Error   08.03.10 18:35:22  10.03.10 09:11:37  28.20    700.6 / 0.0
ne762_ 00044_ 1--   Saturn   Error   05.03.10 14:15:05  07.03.10 11:24:50  26.73    670.0 / 0.0
nf049_ 00036_ 2--   Mars     Error   10.03.10 01:28:50  10.03.10 23:17:40  5.00     120.7 / 0.0
nf086_ 00088_ 1--   Mars     Error   10.03.10 13:14:57  11.03.10 11:31:40  4.32     101.3 / 0.0
ne859_ 00088_ 20--  Terra    Error   07.03.10 09:20:54  08.03.10 05:26:53  3.40     79.4 / 0.0
ne845_ 00043_ 3--   Ceres    Error   06.03.10 21:48:37  07.03.10 12:22:30  3.24     76.0 / 0.0
nf116_ 00029_ 4--   Mars     Error   10.03.10 23:17:42  11.03.10 15:08:24  3.21     76.0 / 0.0
ne768_ 00051_ 4--   Pluto    Error   05.03.10 16:41:01  06.03.10 04:47:06  2.29     56.3 / 0.0
ne963_ 00031_ 14--  Jupiter  Error   08.03.10 20:47:25  09.03.10 23:52:13  2.26     55.4 / 0.0
ne851_ 00028_ 10--  Mars     Error   07.03.10 00:23:02  07.03.10 13:03:27  2.10     50.6 / 0.0
nf030_ 00005_ 4--   Ceres    Error   09.03.10 19:03:08  10.03.10 10:01:26  0.77     17.5 / 0.0
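
To put a number on the lost time, rows like the ones above can be totalled per device with a few lines of Python. This is only a sketch: it assumes every row is whitespace-separated in the order shown (three name tokens, a one-word device name, status, sent date and time, returned date and time, CPU hours, claimed / granted credit), and the function names are invented for this example.

```python
from collections import defaultdict


def parse_result_row(row):
    """Split one pasted Result Status row; return None if it does not fit."""
    tokens = row.split()
    # Assumed layout: 3 name tokens, device, status word(s), 4 date/time
    # tokens, CPU hours, claimed credit, "/", granted credit.
    if len(tokens) < 13 or tokens[-2] != "/":
        return None
    try:
        cpu_hours = float(tokens[-4])
    except ValueError:
        return None  # e.g. a header line rather than a data row
    return {
        "name": " ".join(tokens[:3]),
        "device": tokens[3],
        "status": " ".join(tokens[4:-8]),
        "cpu_hours": cpu_hours,
    }


def lost_hours_per_device(rows):
    """Sum the CPU hours of rows with status Error, grouped by device."""
    totals = defaultdict(float)
    for row in rows:
        rec = parse_result_row(row)
        if rec is not None and rec["status"] == "Error":
            totals[rec["device"]] += rec["cpu_hours"]
    return dict(totals)
```

Rows with any other status (for example Pending Validation) are ignored, so the totals only cover time spent on results that errored out.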
----------------------------------------

[Mar 17, 2010 2:02:38 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Hypernova,

I suggest you look at the log detail. The ones that ran up to, say, 5 hours are all the /401 failures or the absent-output-file type **. Those at 27-31 hours are probably time-out loopers, which get stopped once they have computed something like 10x the fpops amount given in the task headers.
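
As a rough illustration of that time-out, here is a small Python sketch. It assumes the client stops a task once its elapsed time passes the workunit's fpops bound divided by the host's estimated speed, and that the bound is about 10x the estimate as described above. The names `rsc_fpops_est` and `rsc_fpops_bound` follow the BOINC workunit fields; the function name and the numbers in the comment are hypothetical.

```python
def max_elapsed_hours(rsc_fpops_est, host_flops, bound_factor=10.0):
    """Approximate wall-clock limit before a "Maximum elapsed time exceeded" abort.

    rsc_fpops_est: estimated floating-point operations for the task
    host_flops:    estimated speed of the host, in operations per second
    bound_factor:  assumed ratio of rsc_fpops_bound to rsc_fpops_est
                   ("like 10x" per the post above, not a confirmed value)
    """
    rsc_fpops_bound = rsc_fpops_est * bound_factor
    return rsc_fpops_bound / host_flops / 3600.0


# Hypothetical numbers: a task estimated at 3.6e13 operations on a host
# rated at 1e9 operations per second would be expected to run about
# 10 hours and would be cut off at roughly 10x that.
limit = max_elapsed_hours(3.6e13, 1e9)
```

A faster host reaches the limit sooner, which would fit the cut-off times differing from machine to machine.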

I'll drop a note in the back room to see if the lord of the wrench can do something about the time part.

ttyl

** I was collecting the different error messages from my own results and from all the wingmen's errors, then lost the list. There were about 6, of which at least 3 are surely device issues, such as "too many exits" and a time exceeded.
----------------------------------------
[Mar 17, 2010 2:21:33 PM]