Total posts in this thread: 17
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
[Explained, but not resolved until a new BOINC server version is applied] [Bug?] Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

Not sure if reported previously, but here it is:

Situation:
1. One of the two regular WU copies became "No Reply".
2. A repair copy was sent to my device (Win7 64-bit SP1).
3. The "No Reply" regular copy was then returned to the server and validated as "Valid" together with the other regular copy, which had been returned on time.
4. The repair copy was crunched, finished and uploaded successfully.
5. The repair copy was marked as "Error" and reported as not having been crunched at all (0.00 hours).
Edit: After checking the event log, it turns out that the WU was not really crunched, so it's correct to have 0.00 hour crunching time. However, it should be treated as "Server Aborted" instead of "Error".


Workunit Status

Project Name: Computing for Sustainable Water
Created: 07/10/2012 13:49:46
Name: cfsw_8020_08020188
Minimum Quorum: 2
Replication: 2


Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
cfsw_8020_08020188_2-- | 611 | Error | 07/22/2012 11:48:28 | 07/22/2012 14:20:49 | 0.00 | 0.0 / 0.0  <-- mine
cfsw_8020_08020188_1-- | 611 | Valid | 07/12/2012 11:48:35 | 07/12/2012 15:45:13 | 1.15 | 22.3 / 21.1
cfsw_8020_08020188_0-- | 611 | Valid | 07/12/2012 11:48:16 | 07/22/2012 14:06:42 | 1.15 | 19.8 / 21.1  <-- originally the "No Reply" copy


Result Log

Result Name: cfsw_8020_08020188_2--
<core_client_version>7.0.31</core_client_version>

Will add the Event Log entries when I have access to that device later today.
Edit: Nothing was found in the event log regarding this WU (see the edit above).

This is the second time I've encountered this issue (once with BOINC 7.0.28 and once with 7.0.31, both 64-bit).

Edit:
Note: I have never encountered this problem with regular, non-repair copies.

Edit 2: Added further details after investigating the event log.

Edit 3: Changed the tag in the title to reflect that we have to wait for the server-side code to be changed before this problem is solved.
----------------------------------------
[Edit 4 times, last edit by Former Member at Aug 1, 2012 5:14:14 AM]
[Jul 23, 2012 5:11:37 AM]
mikey
Veteran Cruncher
Joined: May 10, 2009
Post Count: 821
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

This is NORMAL BOINC behavior. The 'no reply' copy of the workunit does NOT get marked as invalid, even though a replacement has been sent out. It happens only once in a great while, because most 'no reply' workunits are NOT returned before you return the replacement. If you had returned yours first, YOU would have gotten the credit and the original 'no reply' PC would have gotten none. One way to make this less likely is to reduce the size of your cache: you will then return units faster, hopefully before the 'no reply' unit comes back. Oh, and the reason you got a copy at all is that the 'no reply' copy expired without being returned before its deadline. We have been asking Dr. Anderson to change this for many years, but there are higher priorities, and probably always will be. Dr. David Anderson, of Berkeley, wrote and still maintains the BOINC program.
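
As a rough illustration of that race, here is a minimal Python sketch using the return times from the workunit status in the first post. It is a simplified model, not BOINC's actual server logic: the helper name `split_by_quorum` and the rule that the earliest returns fill the quorum are assumptions made for illustration only.

```python
from datetime import datetime

def split_by_quorum(return_times, min_quorum=2):
    """Split results into those that fill the quorum and the surplus.

    Hypothetical helper: assumes the earliest-returned results fill the
    quorum, which simplifies whatever the real server does.
    """
    ordered = sorted(return_times, key=return_times.get)
    return ordered[:min_quorum], ordered[min_quorum:]

# Return times taken from the workunit status in the first post (2012).
returned = {
    "cfsw_8020_08020188_0": datetime(2012, 7, 22, 14, 6, 42),   # late "No Reply" copy
    "cfsw_8020_08020188_1": datetime(2012, 7, 12, 15, 45, 13),  # on-time copy
    "cfsw_8020_08020188_2": datetime(2012, 7, 22, 14, 20, 49),  # repair copy
}

in_quorum, surplus = split_by_quorum(returned)
print("fills quorum:", in_quorum)
print("surplus:", surplus)
```

In this model the late copy came back about 14 minutes before the repair copy was reported, so the repair copy ends up as the surplus one.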
[Jul 23, 2012 1:15:19 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

Hi mikey159b,
I think that Moonian is concerned with the behavior of his own work unit copy in this case, rather than the late unit that got validated.

Lawrence
[Jul 23, 2012 2:24:21 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1294
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

Edit: After checking the event log, it turns out that the WU was not really crunched, so it's correct to have 0.00 hour crunching time. However, it should be treated as "Server Aborted" instead of "Error".

Result Name: cfsw_8020_08020188_2--
<core_client_version>7.0.31</core_client_version>

Hello Moonian,

This bug was introduced in BOINC client version 7.0.27 because of new exit codes. The bug is still present up to version 7.0.31, but David Anderson has meanwhile made a fix in BOINC's source code.
It should be fixed in version 7.0.32, but that version is not available yet, so we have to live with 'error' instead of 'server aborted' or 'cancelled by server'.
[Jul 23, 2012 4:05:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

Ah, this would explain why the "Error" status occurs. (I was using the older v7.0.25 until I got this newer machine recently.)

Thanks for the information :D
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 23, 2012 6:09:07 PM]
[Jul 23, 2012 6:05:48 PM]
Bugg
Senior Cruncher
USA
Joined: Nov 19, 2006
Post Count: 271
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

I would be willing to bet that if you used 6.10.58 (recommended by WCG, after all), or at the very least 6.12.34, things like this possibly wouldn't even happen. Just a guess, as I only use 6.12.34: that's what I found before I started back with WCG, and I have stuck with it. :)
[Jul 24, 2012 6:33:02 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

This bug was introduced in BOINC client version 7.0.27 because of new exit codes. The bug is still present up to version 7.0.31, but David Anderson has meanwhile made a fix in BOINC's source code.
It should be fixed in version 7.0.32, but that version is not available yet, so we have to live with 'error' instead of 'server aborted' or 'cancelled by server'.

The release of BOINC client v7.0.32 or later won't have any effect in this case since, as the linked code snippet shows, the "bug" is in the web code.

This means that until WCG upgrades its web code, you'll continue to see results marked as "Error" on WCG's web pages.
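
A minimal sketch of how that can happen, assuming the status page translates exit codes into labels with a lookup table and falls back to "Error" for any code it does not know. The tables, the function `status_label` and the numeric value of `ABORT_CODE` are placeholders for illustration, not BOINC's or WCG's actual code or values.

```python
# Placeholder for "result was cancelled by the server".
# This is NOT a real BOINC exit code; it only stands in for one.
ABORT_CODE = 9001

def status_label(exit_code, known_labels):
    """Return the label a status page would show for an exit code.

    Hypothetical rule: any code missing from the table is shown as "Error".
    """
    return known_labels.get(exit_code, "Error")

# Older web code: has no entry for the newer abort code.
old_web_labels = {}
# Upgraded web code: recognises the abort code.
new_web_labels = {ABORT_CODE: "Server Aborted"}

print(status_label(ABORT_CODE, old_web_labels))
print(status_label(ABORT_CODE, new_web_labels))
```

The client reports the same code in both cases; only the table on the web side differs, which is why a newer client alone would not change what the page shows.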
[Jul 24, 2012 12:13:31 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

I would be willing to bet that if you used 6.10.58 (recommended by WCG, after all), or at the very least 6.12.34, things like this possibly wouldn't even happen. Just a guess, as I only use 6.12.34: that's what I found before I started back with WCG, and I have stuck with it. :)

Well, I can live with it without any problem; at least it doesn't consume any crunch time at all. Anyway, this is one of the costs of using "cutting-edge" versions, which is not unexpected :D
[Jul 25, 2012 5:14:04 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

The release of BOINC client v7.0.32 or later won't have any effect in this case since, as the linked code snippet shows, the "bug" is in the web code.

This means that until WCG upgrades its web code, you'll continue to see results marked as "Error" on WCG's web pages.

Oh, so this is a server-side issue? Anyway, I guess we should let them resolve the upload/download issue first before dealing with somewhat trivial things like this one :)
[Jul 25, 2012 5:16:12 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1294
Re: [Bug?]Repair WU copy reported LATER than a "No Reply" copy and got an "ERROR" status

The latter is surely more important!

I'm not sure whether it's a server issue.
It could be that the server code is OK, but the client is falsely returning code 202 where it should be 203.
I've seen it happen with other projects too, such as SIMAP, SETI and Yoyo.
[Jul 25, 2012 9:27:45 AM]