Thread Status: Active
Total posts in this thread: 14
This topic has been viewed 18369 times and has 13 replies
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Periodic Issues with File Uploads and Downloads [Resolved]

We are currently experiencing an intermittent issue that causes a significant slowdown on the filesystems that support file downloads and uploads. As a result, file uploads and downloads will be intermittently unavailable.
----------------------------------------
[Edit 3 times, last edit by knreed at Sep 5, 2012 5:33:52 AM]
[Jul 20, 2012 3:34:08 AM]
knreed
Re: Unstable Server Environment

We are open again, but it is not clear why it slowed down for a period of time. We will continue to investigate.
[Jul 20, 2012 3:50:08 AM]
knreed
Re: Unstable Server Environment

And it is back again.
[Jul 20, 2012 12:34:39 PM]
knreed
Re: Unstable Server Environment

Currently, when this issue occurs, it impacts scheduler requests, website/forum access and file uploads/downloads. However, only file uploads/downloads are involved in the actual root cause of the issue.

As a result, we are going to change things so that website/forum traffic and scheduler requests go through one set of servers while file upload/download requests go through a separate set of servers.

This change will be largely transparent. However, we will be changing the scheduler URL to https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi from https://grid.worldcommunitygrid.org/boinc/wcg_cgi/fcgi. In order to force the clients to change their setting, we will disable the scheduler at https://grid.worldcommunitygrid.org/boinc/wcg_cgi/fcgi. This means that your client will try 10 times to connect before querying the website again for the current location of the scheduler. It will then get the new location and connect properly. No action is needed on your part, but you will see messages in the software client.
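For anyone curious how that switchover plays out, here is a minimal sketch in Python of the behaviour described above. It is not the actual client code: the function names, the `give_up_after` limit and the injected callables are assumptions made for illustration, and a real client also waits between attempts. Only the two URLs and the count of 10 come from the post.

```python
from typing import Callable, Tuple

OLD_SCHEDULER = "https://grid.worldcommunitygrid.org/boinc/wcg_cgi/fcgi"
NEW_SCHEDULER = "https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi"
MAX_FAILURES = 10  # number of attempts quoted in the post


def contact_scheduler(
    cached_url: str,
    try_request: Callable[[str], bool],       # hypothetical: True if the request succeeded
    lookup_scheduler_url: Callable[[], str],  # hypothetical: asks the website for the current URL
    max_failures: int = MAX_FAILURES,
    give_up_after: int = 30,                  # assumption: stop eventually instead of looping forever
) -> Tuple[str, int]:
    """Return (URL that finally worked, number of failed attempts before it)."""
    url = cached_url
    failures = 0
    while failures < give_up_after:
        if try_request(url):
            return url, failures
        failures += 1
        # After every max_failures failed attempts, refresh the scheduler location.
        if failures % max_failures == 0:
            url = lookup_scheduler_url()
    raise ConnectionError("scheduler unreachable after %d attempts" % failures)


# Simulate the change: the old URL is disabled, the website hands out the new one.
result = contact_scheduler(
    OLD_SCHEDULER,
    try_request=lambda url: url == NEW_SCHEDULER,
    lookup_scheduler_url=lambda: NEW_SCHEDULER,
)
print(result)
```

In this simulation the client fails 10 times against the old address, looks up the new one, and connects on the next attempt.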
[Jul 20, 2012 2:23:33 PM]
knreed
Re: Unstable Server Environment

The issue with uploading/downloading is not yet resolved. The change that we made on July 20th was to ensure that our volunteers could reliably access the website and forums. We also moved scheduler requests, since they too are independent of the filesystem issue.

We are working with the support groups for Red Hat Linux and IBM GPFS (the filesystem) to resolve this issue.

Users will unfortunately continue to see messages such as this until the issue is resolved:
24/07/2012 12:43:58 | World Community Grid | Started upload of c4cw_target06_088990280_0_0
24/07/2012 12:43:58 | World Community Grid | Started upload of c4cw_target06_088989962_0_0
24/07/2012 12:44:20 | World Community Grid | Temporarily failed upload of c4cw_target06_088990280_0_0: connect() failed
24/07/2012 12:44:20 | World Community Grid | Backing off 1 hr 29 min 3 sec on upload of c4cw_target06_088990280_0_0
24/07/2012 12:44:20 | World Community Grid | Temporarily failed upload of c4cw_target06_088989962_0_0: connect() failed
24/07/2012 12:44:20 | World Community Grid | Backing off 11 min 18 sec on upload of c4cw_target06_088989962_0_0
24/07/2012 12:44:21 | | Project communication failed: attempting access to reference site
24/07/2012 12:44:24 | | Internet access OK - project servers may be temporarily down.
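If you want to see how long your own client is waiting before it retries, a short Python sketch like the one below can pull the backoff delays out of log lines in this format. The regular expression is an assumption based only on the two "Backing off" lines quoted above; other wordings (for example a delay given in seconds only) are guessed at, not confirmed.

```python
import re
from datetime import timedelta
from typing import Dict

# Assumed pattern, derived from the sample lines above; each unit is optional.
BACKOFF_RE = re.compile(
    r"Backing off (?:(?P<hr>\d+) hr )?(?:(?P<min>\d+) min )?(?:(?P<sec>\d+) sec )?"
    r"on upload of (?P<file>\S+)"
)


def parse_backoffs(log_text: str) -> Dict[str, timedelta]:
    """Map each file name to the most recent backoff delay reported for it."""
    delays: Dict[str, timedelta] = {}
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match is None:
            continue  # not a backoff message
        delays[match["file"]] = timedelta(
            hours=int(match["hr"] or 0),
            minutes=int(match["min"] or 0),
            seconds=int(match["sec"] or 0),
        )
    return delays


SAMPLE = """\
24/07/2012 12:44:20 | World Community Grid | Backing off 1 hr 29 min 3 sec on upload of c4cw_target06_088990280_0_0
24/07/2012 12:44:20 | World Community Grid | Backing off 11 min 18 sec on upload of c4cw_target06_088989962_0_0
24/07/2012 12:44:24 | | Internet access OK - project servers may be temporarily down.
"""

for name, delay in parse_backoffs(SAMPLE).items():
    print(name, int(delay.total_seconds()), "seconds")
```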
[Jul 24, 2012 6:53:05 PM]
knreed
Re: Unstable Server Environment

We continue to experience this issue. We have investigated a few items suggested by the support groups, but none has resolved the issue or led us to understand the root cause.

In the meantime, we are implementing some workarounds that will allow us to limit the duration of this issue when it occurs. Although this won't eliminate the issue, the goal is to minimize the impact on the end users. Part of the workaround involves disabling scheduler requests and file uploads when the issue occurs. As a result, there will be times when both file uploads/downloads and scheduler requests are disabled.
[Jul 26, 2012 9:04:34 PM]
knreed
Re: Unstable Server Environment

We have worked on a number of items this weekend to both mitigate and further investigate this ongoing issue. At the moment, we have started running some new code for some of our backend processes that allows them to pause quickly when this issue appears. This should significantly reduce the time that our volunteers are not able to upload/download work.
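To give a rough idea of what "pause quickly" can look like, here is a minimal Python sketch of a worker that checks filesystem latency before each job and waits while it is too high. This is purely illustrative and is not the code running on the servers: the threshold, the pause length and every name in it are assumptions.

```python
import time
from typing import Callable, Iterable


def run_with_pause(
    jobs: Iterable[Callable[[], None]],
    probe_latency: Callable[[], float],  # hypothetical: seconds taken by a small filesystem operation
    threshold_s: float = 2.0,            # assumed latency above which the filesystem counts as "slow"
    pause_s: float = 30.0,               # assumed wait before probing again
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run each job in order, pausing while the filesystem is slow. Returns the number of pauses."""
    pauses = 0
    for job in jobs:
        # Hold this job back until a probe comes in under the threshold.
        while probe_latency() > threshold_s:
            pauses += 1
            sleep(pause_s)
        job()
    return pauses


# Simulated run: the filesystem is slow for two probes in the middle.
latencies = iter([0.1, 5.0, 5.0, 0.2, 0.1])
completed = []
waits = []
pause_count = run_with_pause(
    jobs=[lambda i=i: completed.append(i) for i in (1, 2, 3)],
    probe_latency=lambda: next(latencies),
    sleep=waits.append,  # record the waits instead of actually sleeping
)
print(completed, pause_count, waits)
```

The point of the design is that the worker stops adding load as soon as the slowdown is detected, rather than piling more requests onto a struggling filesystem.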
[Jul 30, 2012 3:00:55 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: Unstable Server Environment

We are going to disable the servers shortly while we change some settings. The outage should last only a few minutes.

Thanks,
-Uplinger
[Jul 30, 2012 3:36:18 PM]
uplinger
Re: Unstable Server Environment

The server changes were completed successfully and things should be running again.

Thanks,
-Uplinger
[Jul 30, 2012 4:22:31 PM]
uplinger
Re: Unstable Server Environment

Over the past 12 hours we have been running some tests to help determine the cause of our issues. These have included some diagnostic tools on the filesystem as well as network tracing between the storage system and the servers. We have disabled those now and things should be returning to normal.

Thank you for your patience,
-Uplinger
[Aug 13, 2012 6:30:00 PM]