knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Degraded database performance [Resolved]

We have been experiencing poor database performance. While investigating how to improve it, we started running the MySQL optimize command on two tables. Unfortunately, this is proceeding much more slowly than we expected, so the BOINC grid is stopped for a couple of hours. We will post here when things are started again.
----------------------------------------
[Edit 1 times, last edit by knreed at Mar 13, 2013 3:12:58 PM]
[Jan 9, 2013 6:37:36 PM]
knreed
Re: Degraded database performance

We have started things up again. Unfortunately, we didn't get as much of a performance increase as we had hoped. We have ordered additional memory for the servers, which we expect to have installed in the next 2-3 weeks.
[Jan 9, 2013 8:45:26 PM]
knreed
Re: Degraded database performance

We are currently running backups of the database starting at 7:30 UTC on Sunday and Wednesday mornings. In order to allow them to finish relatively quickly while we wait for the additional RAM, we are stopping the backend processes (i.e. validation) while the backup runs.

This means that a large backlog of work needing validation will build up while the backup runs. Validation will resume around 15:00 UTC and take 12-18 hours to catch up.
[Jan 13, 2013 5:19:48 PM]
knreed
Re: Degraded database performance

The daemons that clean up the filesystem have fallen very far behind and our filesystem is filling up. We have turned off all the backend daemons except the file delete daemons to allow those to catch up. This means that validation will be stopped for several hours.
[Jan 14, 2013 4:37:48 PM]
knreed
Re: Degraded database performance

We have to modify one of the indexes on the 'result' table. As a result, we will have to stop the schedulers for about 75 minutes (revised from 45).
----------------------------------------
[Edit 2 times, last edit by knreed at Jan 14, 2013 8:49:29 PM]
[Jan 14, 2013 7:22:21 PM]
knreed
Re: Degraded database performance

Work is flowing again and we have started all backend processes up EXCEPT for hcc1 validation. We want to let everything finish catching up before we start hcc1 validation again. We appreciate your patience.
[Jan 14, 2013 10:05:33 PM]
knreed
Re: Degraded database performance

Ok - I'm overdue for an update here, but I finally have some good news.

All processes are running again and we have put the system under max load and everything is behaving well. Here are some of the things that we had to change:

1) Our MySQL buffer pool is 44GB, but we had innodb_buffer_pool_instances set to 1, which resulted in heavy contention for locks. MySQL threads spin for a short period of time while waiting for a lock, so this contention caused very high CPU use on the server and degraded performance for all database activity. We are now using innodb_buffer_pool_instances=8, which has significantly reduced lock contention and lowered CPU use. This is allowing better performance for all transactions.

2) We lowered the transaction isolation level for some backend daemons. There was contention in particular on some of the indexes for the result table (it is a 96GB table on disk). By changing the isolation level, fewer locks are held on the indexes which has increased throughput.

3) We have changed the way the jobs that load work compute how much work needs to be loaded. In particular, there was one query that computes the average jobs completed per day over the past 4 days. This value is part of the calculation used to determine how much work needs to be loaded. This query was executed repeatedly while work was being loaded and is particularly resource intensive. We have replaced the dynamically computed value with a manually computed value for now. Sometime in the future we will replace this with a job that runs once per day.

4) In a few cases we have improved the SQL used to make the queries more efficient.

5) We have added additional instances of daemons for the very backend processes in order to ensure that data is deleted from the database and filesystems promptly so that we don't fall behind again.
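
To illustrate the idea behind item 3, here is a minimal Python sketch of caching an expensive value so that it is recomputed at most once per day instead of on every work-loading pass. This is not the code World Community Grid runs. The class `DailyCachedValue`, the helper `average_jobs_per_day`, and the sample counts are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, Optional, Sequence


class DailyCachedValue:
    """Cache an expensive value and recompute it at most once per max_age_s."""

    def __init__(self, compute: Callable[[], float],
                 max_age_s: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self._compute = compute          # the expensive query (hypothetical)
        self._max_age_s = max_age_s      # how long a cached value stays valid
        self._clock = clock              # injectable so the cache can be tested
        self._value: Optional[float] = None
        self._computed_at: Optional[float] = None

    def get(self) -> float:
        now = self._clock()
        stale = (self._computed_at is None
                 or now - self._computed_at >= self._max_age_s)
        if stale:
            # Only here do we pay for the expensive computation.
            self._value = self._compute()
            self._computed_at = now
        return self._value


def average_jobs_per_day(daily_counts: Sequence[int]) -> float:
    """Mean of completed-job counts, one entry per day."""
    return sum(daily_counts) / len(daily_counts)


# Made-up counts for the past 4 days; a real system would query the database.
cached_avg = DailyCachedValue(lambda: average_jobs_per_day([100, 200, 300, 400]))
print(cached_avg.get())  # 250.0
print(cached_avg.get())  # 250.0 again, served from the cache
```

Every caller within the same day gets the stored value, so the expensive query no longer competes with the work loaders for database resources.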

At this time the backend processes are catching up quickly. There is only a backlog of 98,000 workunits for hcc1 validation. It has been catching up at a rate of 20,000 workunits/hour, so we will be almost caught up by the end of the day.
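
For a rough sense of what those numbers imply, the arithmetic can be sketched in Python. The function name `hours_to_clear` is made up for this example, and it assumes the 20,000 workunits/hour figure is a steady net catch-up rate.

```python
def hours_to_clear(backlog: int, net_rate_per_hour: float) -> float:
    """Hours needed to work off a backlog at a constant net catch-up rate."""
    if net_rate_per_hour <= 0:
        raise ValueError("backlog never clears unless the net rate is positive")
    return backlog / net_rate_per_hour


# Figures from the post: 98,000 workunits at 20,000 workunits/hour.
print(hours_to_clear(98_000, 20_000))  # 4.9
```

That works out to a little under 5 hours if the rate holds.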
----------------------------------------
[Edit 1 times, last edit by knreed at Jan 15, 2013 10:13:05 PM]
[Jan 15, 2013 9:36:31 PM]
knreed
Re: Degraded database performance

I should mention that we will still be stopping the backend daemons while the database backups are run. We will wait until the new memory is installed before we change this.
[Jan 15, 2013 10:14:00 PM]
knreed
Re: Degraded database performance

Unfortunately we did not build up enough work for Help Conquer Cancer before the daemons were stopped for the database backup. As a result we ran out of work ready to send about 3 hours ago. The daemons are running now and there will be work ready to send within an hour.
[Jan 16, 2013 2:30:44 PM]
knreed
Re: Degraded database performance

We are caught up now and work is flowing freely.

In order to help volunteers keep their machines contributing during these outages, we have raised some settings that control how much work can be cached. We are now using the following settings:
<daily_result_quota>300</daily_result_quota>
<gpu_multiplier>15</gpu_multiplier>
<initial_daily_result_quota>5</initial_daily_result_quota>
<max_wus_to_send>30</max_wus_to_send>
<max_wus_in_progress>90</max_wus_in_progress>
<max_wus_in_progress_gpu>1200</max_wus_in_progress_gpu>
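
As a rough guide to what these values mean for a single machine, the Python sketch below parses the settings and estimates per-host limits. It assumes the stock BOINC server interpretation, in which the in-progress limits apply per CPU and per GPU, and the daily quota is scaled by CPUs plus gpu_multiplier times GPUs. World Community Grid's scheduler may apply them differently. The `<config>` wrapper element, the function names, and the 4-CPU/1-GPU host are made up for the example.

```python
import xml.etree.ElementTree as ET

# The settings from the post, wrapped in a hypothetical <config> root element
# so the fragment parses as a single XML document.
SETTINGS_XML = """
<config>
  <daily_result_quota>300</daily_result_quota>
  <gpu_multiplier>15</gpu_multiplier>
  <initial_daily_result_quota>5</initial_daily_result_quota>
  <max_wus_to_send>30</max_wus_to_send>
  <max_wus_in_progress>90</max_wus_in_progress>
  <max_wus_in_progress_gpu>1200</max_wus_in_progress_gpu>
</config>
"""


def load_settings(xml_text: str) -> dict:
    """Return each child element of the root as a {tag: int} entry."""
    root = ET.fromstring(xml_text)
    return {child.tag: int(child.text) for child in root}


def host_limits(settings: dict, ncpus: int, ngpus: int) -> dict:
    """Estimate per-host limits under the assumed stock BOINC formulas."""
    # Assumed: in-progress limit = N * NCPUS + M * NGPUS
    in_progress = (settings["max_wus_in_progress"] * ncpus
                   + settings["max_wus_in_progress_gpu"] * ngpus)
    # Assumed: daily ceiling = quota * (NCPUS + gpu_multiplier * NGPUS)
    per_day = settings["daily_result_quota"] * (
        ncpus + settings["gpu_multiplier"] * ngpus)
    return {"max_in_progress": in_progress, "max_per_day": per_day}


settings = load_settings(SETTINGS_XML)
print(host_limits(settings, ncpus=4, ngpus=1))
# {'max_in_progress': 1560, 'max_per_day': 5700}
```

Under those assumptions, a 4-CPU host with one GPU could hold up to 1,560 results at once, which is what lets it keep working through a multi-hour outage.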

[Jan 16, 2013 6:13:01 PM]