Thread Status: Active
Total posts in this thread: 171
Posts: 171   Pages: 18
This topic has been viewed 114218 times and has 170 replies.
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: Invalid GPU work units

I think those two cases must have different causes.
Each BOINC project uses its own validator (it has to know what values to look for, specific to its own science), and in general projects keep the exact process fairly private. But I think they all follow the same basic procedure:

1) Check that the output file(s) have been uploaded properly, and have the right sort of 'shape' - a reasonable size, expected format elements, etc.
2) Compare the actual numerical results with another computer running the same workunit.
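For anyone who hasn't seen a BOINC validator, the two steps above can be sketched roughly like this. It is only a minimal Python illustration under stated assumptions: WCG does not publish its OPNG validator, so the result-file format (one `key value` pair per line), the required keys, the size limit, the tolerance and every function name here are hypothetical.

```python
import math

# Everything below is hypothetical: WCG does not publish the OPNG result
# format or validator. Assume one "key value" pair per line, all numeric.
REQUIRED_KEYS = {"score", "rmsd"}
MAX_BYTES = 10_000_000
REL_TOL = 1e-5  # assumed tolerance for rounding differences between devices

def parse_result(text: str) -> dict[str, float]:
    """Step 1: check the file's 'shape'. Raises ValueError if it is malformed."""
    if not text or len(text.encode()) > MAX_BYTES:
        raise ValueError("empty or oversized result file")
    values: dict[str, float] = {}
    for line in text.splitlines():
        key, _, raw = line.partition(" ")
        values[key] = float(raw)  # non-numeric or missing values raise ValueError
    missing = REQUIRED_KEYS - set(values)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return values

def results_agree(a: dict[str, float], b: dict[str, float]) -> bool:
    """Step 2: same fields, and every value equal within the tolerance."""
    return a.keys() == b.keys() and all(
        math.isclose(a[k], b[k], rel_tol=REL_TOL) for k in a
    )

def validate(mine: str, wingman: str) -> str:
    """Status for `mine`, judged against one wingman's result."""
    try:
        a = parse_result(mine)
    except ValueError:
        return "Invalid"  # failed step 1: nothing sensible to compare
    try:
        b = parse_result(wingman)
    except ValueError:
        return "Inconclusive"  # the wingman's file is the broken one
    # Two results that disagree can't say which one is wrong, so a
    # BOINC-style server would send out another replica at this point.
    return "Valid" if results_agree(a, b) else "Inconclusive"
```

In this sketch a malformed file fails before any comparison happens, while small numerical differences only matter once they exceed the assumed tolerance.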

We know that WCG does use comparison checking - otherwise why would they deliberately send all replications to 'similar' computers (same OS, same device class)? But although all iGPUs should in theory be the same, in practice their accuracy varies. I think that's what has tripped up your iGPU example.
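BOINC calls this 'similar computers' scheme homogeneous redundancy. The Python sketch below shows the idea; the class key (OS, device class), the `Host` type and the device-class labels are assumptions for illustration, since WCG's actual class definitions aren't public. A coarse key like this would put every Linux distribution in one class, which would be consistent with the tables in this thread, where Ubuntu, Fedora and Debian hosts share workunits.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Host:
    host_id: int
    os: str            # e.g. "Linux" or "Windows"; any distribution counts as "Linux" here
    device_class: str  # hypothetical labels such as "nvidia" or "intel_igpu"

def hr_class(host: Host) -> tuple[str, str]:
    """Assumed class key: hosts with equal keys count as 'similar'."""
    return (host.os, host.device_class)

def pick_wingmen(first: Host, candidates: list[Host], n: int) -> list[Host]:
    """Pick up to n other hosts in the same class as the host of the first replica."""
    key = hr_class(first)
    same = [h for h in candidates if h.host_id != first.host_id and hr_class(h) == key]
    return same[:n]
```

Because every iGPU with the same key lands in the same class, small numerical differences between iGPU models would still reach the comparison step.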

But the NVidia example, with every replication invalid, seems to have tripped over the first part of the test: something about the data has failed the 'sensible structure' check on the returned files.
[Aug 31, 2021 2:43:37 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 1985
Re: Invalid GPU work units

Another mix of Valids and Invalids on the same device, across several samples. Again, Invalids are normally rare on this device. My results are the ones marked in blue.

workunit 799806294:
OPNG_0082346_00063_3--   Linux Ubuntu   728   Server Aborted         9/4/21 13:22:15     9/5/21 05:01:34     0.00       0.0 / 0.0
OPNG_0082346_00063_2--   Linux Ubuntu   728   Valid                  9/4/21 13:19:08     9/5/21 04:59:13     0.38       0.2 / 167.1
OPNG_0082346_00063_1--   Linux Fedora   728   Valid                  9/3/21 03:15:34     9/3/21 20:48:20     0.21       0.9 / 678.6
OPNG_0082346_00063_0--   Linux Fedora   728   Invalid                9/3/21 03:14:50     9/4/21 13:18:30     0.31       1.0 / 1.0
---------------------------------------------------------------------------------------------------------------------------------------
workunit 797034683:
OPNG_0081490_00056_3--   Linux Ubuntu   728   Server Aborted         9/2/21 13:57:04     9/2/21 20:32:07     0.00       0.0 / 0.0
OPNG_0081490_00056_2--   Linux Ubuntu   728   Valid                  9/2/21 13:54:49     9/2/21 20:28:02     0.22       0.8 / 697.9
OPNG_0081490_00056_1--   Linux Fedora   728   Invalid                8/31/21 12:31:54    9/2/21 13:54:14     0.29       0.9 / 0.9
OPNG_0081490_00056_0--   Linux Debian   728   Valid                  8/31/21 12:30:21    8/31/21 17:45:52    0.06       0.9 / 807.5
---------------------------------------------------------------------------------------------------------------------------------------
workunit 796199435:
OPNG_0081168_00029_3--   ManjaroLinux   728   Valid                  8/31/21 21:38:25    8/31/21 22:28:55    0.21       2.1 / 619.8
OPNG_0081168_00029_4--   Linux Debian   728   Server Aborted         8/31/21 21:38:22    8/31/21 22:44:49    0.00       0.0 / 0.0
OPNG_0081168_00029_2--   Linux Fedora   728   Valid                  8/30/21 17:19:02    8/31/21 21:37:59    0.20       0.6 / 500.8
OPNG_0081168_00029_1--   Linux Debian   728   Error                  8/30/21 17:15:01    8/30/21 17:18:21    0.03       2.2 / 0.0
OPNG_0081168_00029_0--   Linux Fedora   728   Invalid                8/30/21 17:14:31    8/30/21 17:22:00    0.12       0.0 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
workunit 796184219:
OPNG_0081120_00040_4--   Linux Debian   728   Valid                  8/31/21 14:52:02    9/1/21 00:05:45     0.08       1.2 / 871.5
OPNG_0081120_00040_3--   Linux Debian   728   Too Late               8/31/21 14:51:01    9/1/21 00:45:11     0.15       0.9 / 0.0
OPNG_0081120_00040_2--   Linux Fedora   728   Invalid                8/30/21 19:15:14    8/31/21 23:00:01    0.38       1.2 / 1.2
OPNG_0081120_00040_1--   Linux Debian   728   Invalid                8/30/21 19:15:01    8/31/21 14:48:56    0.10       0.8 / 0.8
OPNG_0081120_00040_0--   Linux Ubuntu   728   Invalid                8/30/21 16:51:52    8/30/21 19:14:52    0.36       1.1 / 1.1
---------------------------------------------------------------------------------------------------------------------------------------


Too bad that one task is marked Too Late (too late to validate), because OPNG_0081120_00040_3-- was returned in time. Luckily it's not mine this time.
----------------------------------------
[Edit 3 times, last edit by adriverhoef at Sep 5, 2021 2:38:06 PM]
[Sep 1, 2021 9:12:49 AM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 234
Re: Invalid GPU work units

Errors in OPNG units have generally been rare for me, but once in a while there's been a spate of malformed ones. I haven't seen any recently, in any case; I can go to the WCG stats page and select for "Error" without any OPNG units showing.

The other day, though, I got a few resends with 1½-day deadlines, all of which came up valid for me on a GTX 960.
----------------------------------------
[Edit 1 times, last edit by spRocket at Sep 1, 2021 11:47:00 AM]
[Sep 1, 2021 11:46:13 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 1985
Re: Invalid GPU work units

This time one of my devices with an Intel GPU returned a task that was declared Invalid, while a single wingman was enough for a Valid result.

workunit 798259029:
OPNG_0081873_00102_2--   Linux Ubuntu   728   Valid                  9/2/21 20:56:51     9/3/21 02:16:55     0.59       1.9 / 1,000.3
OPNG_0081873_00102_1--   ManjaroLinux   728   Server Aborted         9/2/21 20:55:55     9/3/21 06:54:21     0.00       0.0 / 0.0
OPNG_0081873_00102_0--   Linux Fedora   728   Invalid                9/1/21 15:22:49     9/2/21 20:55:41     0.45       1.6 / 1.6
---------------------------------------------------------------------------------------------------------------------------------------
EDIT: Two days later, another Invalid on that same device, while also producing about one Valid result per hour.
workunit 800032096:
OPNG_0082420_00056_4--   Linux Ubuntu   728   Server Aborted         9/4/21 19:37:05     9/5/21 11:35:01     0.00       0.0 / 0.0
OPNG_0082420_00056_3--   Linux Ubuntu   728   Valid                  9/4/21 19:35:48     9/4/21 20:51:59     0.31       1.0 / 989.8
OPNG_0082420_00056_2--   Linux Fedora   728   Valid                  9/4/21 19:34:47     9/5/21 11:31:40     0.08       0.6 / 618.3
OPNG_0082420_00056_1--   Linux Ubuntu   728   Invalid                9/3/21 08:20:52     9/4/21 19:34:10     0.25       1.0 / 1.0
OPNG_0082420_00056_0--   Linux Fedora   728   Invalid                9/3/21 08:20:47     9/4/21 17:45:53     0.23       0.9 / 0.9
---------------------------------------------------------------------------------------------------------------------------------------


'Invalid' is not the same as an Error status. With an Invalid, your task might still be counted towards Credits, Time and Results Returned; with an Error result, however, your task doesn't count towards Credits, Time or Results Returned.
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Sep 5, 2021 1:42:11 PM]
[Sep 3, 2021 10:07:06 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 1985
Re: Invalid GPU work units

It was time for another case of all Invalids, I guess …
workunit 803021600:
OPNG_0083295_00022_4--   Linux Fedora   728   Server Aborted         9/7/21 15:03:10     9/7/21 15:27:34     0.00       0.0 / 0.0
OPNG_0083295_00022_3--   Linux Ubuntu   728   Server Aborted         9/7/21 15:03:04     9/7/21 15:32:46     0.00       0.0 / 0.0
OPNG_0083295_00022_2--   Linux Ubuntu   728   Invalid                9/7/21 10:51:57     9/7/21 15:02:44     0.25       0.9 / 0.0
OPNG_0083295_00022_1--   LinuxMint      728   Invalid                9/7/21 10:51:52     9/7/21 15:23:32     1.24       0.1 / 0.0
OPNG_0083295_00022_0--   Linux Fedora   728   Invalid                9/6/21 02:10:20     9/7/21 10:51:43     0.24       0.9 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------

[Sep 7, 2021 10:38:33 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 1985
Re: Invalid GPU work units

Another one bites the dust … (Intel GPU)

workunit 811943579:
OPNG_0085681_00077_4--   Linux Ubuntu   728   Server Aborted         9/16/21 11:54:12    9/16/21 12:17:05    0.00       0.0 / 0.0
OPNG_0085681_00077_3--   Linux Ubuntu   728   Invalid                9/16/21 03:54:54    9/16/21 11:53:37    0.35       3.9 / 0.0
OPNG_0085681_00077_2--   Linux Debian   728   Invalid                9/16/21 03:53:52    9/16/21 12:12:48    0.13       1.3 / 0.0
OPNG_0085681_00077_1--   Linux Fedora   728   Invalid                9/14/21 15:33:24    9/16/21 03:51:28    0.36       1.1 / 0.0
OPNG_0085681_00077_0--   Linux Fedora   728   Too Late               9/14/21 15:32:13    9/14/21 23:54:27    0.11       0.8 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
And another one gone …

workunit 808736232:
OPNG_0084827_00243_4--   Linux Ubuntu   728   Server Aborted         9/12/21 00:39:42    9/13/21 10:46:06    0.00       0.0 / 0.0
OPNG_0084827_00243_3--   Linux Fedora   728   Invalid                9/12/21 00:38:10    9/13/21 10:42:25    0.29       1.1 / 0.0
OPNG_0084827_00243_2--   Linux Fedora   728   Invalid                9/12/21 00:36:56    9/12/21 11:28:30    0.09       0.7 / 0.0
OPNG_0084827_00243_1--   Linux Ubuntu   728   Invalid                9/11/21 14:04:07    9/12/21 00:36:48    0.26       0.8 / 0.0
OPNG_0084827_00243_0--   Linux Ubuntu   728   Invalid                9/11/21 14:02:26    9/11/21 19:28:57    0.32       1.1 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------
[Edit 2 times, last edit by adriverhoef at Sep 16, 2021 6:55:54 PM]
[Sep 13, 2021 11:58:42 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 1985
Re: Invalid GPU work units

More all-Invalid workunits (NVIDIA):

workunit 813538160:
OPNG_0085854_00229_4--   Linux Debian   728   Invalid                9/16/21 22:49:59    9/16/21 22:56:01    0.07       0.7 / 0.0
OPNG_0085854_00229_3--   Linux          728   Server Aborted         9/16/21 22:49:58    9/16/21 23:15:44    0.00       0.0 / 0.0
OPNG_0085854_00229_1--   Linuxmint      728   Invalid                9/16/21 22:41:23    9/16/21 22:57:58    0.27       0.7 / 0.0
OPNG_0085854_00229_2--   Linux Fedora   728   Invalid                9/16/21 22:41:22    9/16/21 22:49:50    0.07       0.8 / 0.0
OPNG_0085854_00229_0--   Linux Ubuntu   728   Invalid                9/16/21 06:20:31    9/16/21 22:41:15    0.15       0.7 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
workunit 814164903:
OPNG_0085889_00528_3--   Linux Gentoo   728   Too Late               9/16/21 21:17:23    9/16/21 21:21:36    0.05       0.7 / 0.0
OPNG_0085889_00528_4--   Linux Fedora   728   Invalid                9/16/21 21:17:23    9/16/21 21:21:37    0.06       0.7 / 0.0
OPNG_0085889_00528_2--   Linux Ubuntu   728   Server Aborted         9/16/21 21:17:22    9/16/21 21:24:26    0.00       0.0 / 0.0
OPNG_0085889_00528_1--   Linux Ubuntu   728   Invalid                9/16/21 21:10:36    9/16/21 21:15:35    0.08       0.5 / 0.0
OPNG_0085889_00528_0--   Linux Ubuntu   728   Invalid                9/16/21 21:10:30    9/16/21 21:17:13    0.07       0.7 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
workunit 813741495:
OPNG_0085828_00048_4--   Linux Fedora   728   Invalid                9/16/21 11:09:06    9/16/21 11:13:20    0.06       0.6 / 0.0
OPNG_0085828_00048_3--   Linuxmint      728   Invalid                9/16/21 11:09:04    9/16/21 11:17:35    0.08       0.6 / 0.0
OPNG_0085828_00048_2--   Linux Ubuntu   728   Server Aborted         9/16/21 10:58:56    9/16/21 11:15:16    0.00       0.0 / 0.0
OPNG_0085828_00048_1--   Linux          728   Invalid                9/16/21 10:58:55    9/16/21 11:08:55    0.05       0.5 / 0.0
OPNG_0085828_00048_0--   Linux Ubuntu   728   Invalid                9/16/21 10:54:31    9/16/21 10:58:47    0.06       0.4 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
workunit 813181825:
OPNG_0085694_00446_3--   Linuxmint      728   Invalid                9/16/21 02:55:29    9/16/21 03:00:46    0.07       0.6 / 0.0
OPNG_0085694_00446_4--   Linux Fedora   728   Invalid                9/16/21 02:55:29    9/16/21 03:01:53    0.05       0.7 / 0.0
OPNG_0085694_00446_1--   Linux Ubuntu   728   Invalid                9/16/21 02:47:39    9/16/21 02:55:20    0.12       0.7 / 0.0
OPNG_0085694_00446_2--   Linux Ubuntu   728   Server Aborted         9/16/21 02:47:39    9/16/21 03:28:28    0.00       0.0 / 0.0
OPNG_0085694_00446_0--   Linux Ubuntu   728   Invalid                9/15/21 21:50:22    9/16/21 02:47:29    0.13       0.5 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
workunit 813318525:
OPNG_0085785_00315_3--   Linux Ubuntu   728   Invalid                9/16/21 01:34:18    9/16/21 01:43:35    0.15       0.7 / 0.0
OPNG_0085785_00315_4--   Linuxmint      728   Invalid                9/16/21 01:34:17    9/16/21 01:39:53    0.08       0.8 / 0.0
OPNG_0085785_00315_2--   Linux Fedora   728   Invalid                9/16/21 01:29:42    9/16/21 01:34:08    0.07       0.9 / 0.0
OPNG_0085785_00315_1--   Linux Ubuntu   728   Invalid                9/16/21 01:29:40    9/16/21 01:44:39    0.13       0.8 / 0.0
OPNG_0085785_00315_0--   Linux Ubuntu   728   Invalid                9/16/21 01:10:14    9/16/21 01:29:30    0.06       0.5 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
workunit 813129825:
OPNG_0085719_00349_3--   Linux Ubuntu   728   Server Aborted         9/16/21 00:14:26    9/16/21 00:19:22    0.00       0.0 / 0.0
OPNG_0085719_00349_4--   Linux Fedora   728   Invalid                9/16/21 00:14:26    9/16/21 00:18:42    0.07       0.9 / 0.0
OPNG_0085719_00349_1--   Linux openSU   728   Invalid                9/16/21 00:07:11    9/16/21 00:16:30    0.15       0.5 / 0.0
OPNG_0085719_00349_2--   Linux Ubuntu   728   Invalid                9/16/21 00:07:11    9/16/21 00:14:18    0.10       0.6 / 0.0
OPNG_0085719_00349_0--   Linux Ubuntu   728   Invalid                9/15/21 20:39:06    9/16/21 00:07:02    0.09       0.6 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------
[Edit 3 times, last edit by adriverhoef at Sep 17, 2021 12:17:18 AM]
[Sep 16, 2021 9:09:16 AM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: Invalid GPU work units

I have 11 "four invalids and an abort" - 10 on NVidia (both Windows and Linux) and one iGPU. I think we have to conclude that the datasets resulted in an "unexpected item in the result file", rather than a systemic fault in the volunteer community.
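Spotting the "four invalids and an abort" pattern in your own results can be automated. A rough Python sketch, assuming rows copied from the workunit pages in the layout quoted in this thread (task name, OS, version, status, sent time, and so on); the regular expression and the function names are hypothetical, not part of any WCG tool.

```python
import re
from collections import Counter

# Hypothetical pattern for rows in the layout quoted in this thread.
# Group 1 is the workunit part of the task name (replica suffix dropped);
# group 2 is the status text between the version number and the first date.
ROW = re.compile(r"^(OPNG_\d+_\d+)_\d+--\s+.*?\s+\d+\s+(\D+?)\s+\d+/\d+/\d+\s")

def tally(rows: list[str]) -> dict[str, Counter]:
    """Count result statuses per workunit; rows that don't match are skipped."""
    per_wu: dict[str, Counter] = {}
    for row in rows:
        m = ROW.match(row.strip())
        if m:
            wu, status = m.groups()
            per_wu.setdefault(wu, Counter())[status] += 1
    return per_wu

def nothing_valid(counts: Counter) -> bool:
    """True for workunits like 'four invalids and an abort': Invalids but no Valid."""
    return counts["Valid"] == 0 and counts["Invalid"] > 0
```

Grouping on the name without its `_0`, `_1`, ... suffix is what ties the replicas of one workunit together.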
[Sep 16, 2021 9:38:37 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 1887
Re: Invalid GPU work units

Yup, me too. 3 invalids, and one abort so far.
[Sep 16, 2021 11:41:55 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 739
Re: Invalid GPU work units

Richard Haselgrove wrote:
I have 11 "four invalids and an abort" - 10 on NVidia (both Windows and Linux) and one iGPU. I think we have to conclude that the datasets resulted in an "unexpected item in the result file", rather than a systemic fault in the volunteer community.

Yup - this is the second time that a cluster of work units associated with the receptor 7jji_001--ALYS417_inert_rigid has thrown up lots of Invalids. There were a load of tasks with numbers in the 0045xxx and low 0046xxx sequences, from around the 2nd June 2021 for about three or four days.
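To pick out tasks from a given range, a task name can be split into its fields. A minimal Python sketch, assuming (this is not confirmed by WCG) that a name such as OPNG_0082346_00063_3-- consists of a batch number (the 0045xxx-style sequence mentioned above), a job number within the batch, and the replica suffix; `TaskName`, `parse_task` and `in_batch_range` are hypothetical names.

```python
import re
from typing import NamedTuple

class TaskName(NamedTuple):
    batch: int    # assumed: the 0045xxx-style sequence number
    job: int      # assumed: job number within the batch
    replica: int  # the _0, _1, ... suffix that distinguishes the copies

# Assumed layout: OPNG_<batch>_<job>_<replica>--
NAME = re.compile(r"^OPNG_(\d+)_(\d+)_(\d+)--$")

def parse_task(name: str) -> TaskName:
    """Split a task name into its numeric fields; raise ValueError if it doesn't fit."""
    m = NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    return TaskName(*(int(g) for g in m.groups()))

def in_batch_range(name: str, lo: int, hi: int) -> bool:
    """True if the task's batch number lies between lo and hi inclusive."""
    return lo <= parse_task(name).batch <= hi
```

For example, `in_batch_range(name, 45000, 46499)` would flag the June cluster if 46499 is a reasonable reading of "low 0046xxx"; that upper bound is only a guess.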

In both cases, the majority of work units seem to end up with most tasks Invalid, perhaps one Too Late(!) and one or two Server Aborted. However, sometimes there is a Valid result (or even two on very odd occasions!), so it looks as if whatever is causing the "unexpected item(s)" may be to do with some odd edge case that gets processed in different ways on different GPUs. (I had a few where mine was Valid in the June set, but so far none in the September set.)

I presume/hope that these find their way back to the scientists in some form, and that they can decide what to do about them. As has been pointed out elsewhere, a result that comes back Invalid may actually reflect valid science!

Cheers - Al.

P.S. I have task names and outcomes for all the examples I've seen, should they be needed...
[Sep 16, 2021 9:15:23 PM]