Northgrid-tech: June 2011

Wednesday 22 June 2011

How to remove apel warnings and avoid nagios alerts

Quite few sites have few entries in APEL that don't quite match. They can appear with two messages

OK [ Minor discrepancy in even numbers ]
WARN [ Missing data detected ]

They don't look good on the Sync page and nagios also sends alerts for this problem which is even more annoying.

The problem is caused by few records with the wrong time stamp (StartTime=01-01-1970). These records need to be deleted from the local database and the period were they appear republished with the gap publisher. To delete the records connect to your local APEL mysql and run:

mysql> delete from LcgRecords where StartTimeEpoch = 0;

Then for each month were the entries appear rerun the gap publisher. And finally rerun the publisher in missing records mode to update the SYNC page or you can wait the next proper run if you are not impatient.

Thanks to Cristina for this useful tip she gave me in this ticket.

Tuesday 14 June 2011

DPM optimization next round

After I applied 3 of the mysql parameters changes I talk about in this post I didn't see the improvement I was hoping with atlas jobs time outs.

This is another set of optimizations I put together after further search

First of all I started to systematically count the time TIME_WAIT connections every five minutes. I also correlated them in the same log file to the number of concurrent threads the server keeps mostly in sleep mode. You can get the last bit running mysqladmin -p proc stat or from within a mysql command line. The number of threads was near to the max allowed default value in mysql so I doubled that in my.cnf

max_connections=200

then I halved the kernel time out for TIME_WAIT connections

sysctl -w net.ipv4.tcp_fin_timeout=30

the default value is 60 sec. If you add it to /etc/sysctl.conf it becomes permanent.

Finally I found this article which explicitly talks about mysql tunings to reduce connection timeouts: Mysql Connection Timeouts and I set the following

sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.core.somaxconn=512

again add to /etc/sysctl.conf to make it permanent; and added in my.cnf

back_log=500

I calculated my numbers on 500 connections/s because that's what I have observed when I did all this (I obeserved even larger numbers). Admittedly now they are stable at 330 connections per second but we haven't had any heavy ramp up since Saturday. Only a mild one but that didn't cause any time out. I'm waiting for a serious ramp as definitive test. Said that since Saturday we haven't seen any timeout errors not even the low background that was always present. So there is already an improvement.

Update 16/06/2011

Today there was an atlas ramp from almost 0 to >1400 jobs and no time outs so far.

Few timeouts were seen yesterday but they were due to authentication between the head node and a couple of data servers which I will have to investigate but they are a handful, nowhere near the scale observed before and not due to mysql. I will still keep things under observation for a while longer. Just in case.

Friday 10 June 2011

DPM Optimization

My quest to optimize DPM continues. Bottlenecks are like Russian dolls and hide behind each other. After optimizing the data servers increasing the block device read ahead; enabling lacp on network channel bonding and multiplying the atlas hotdisk files there is still a problem with mysql on the head node which causes time outs.

When atlas ramps up there is often a increase of connection in TIME_WAIT. I observed >2600 at times. The mysql database becomes completely unresponsive and causes the time outs. Restarting the database causes the connections to finally close and the database to resume normal activity. Although a restart might alleviate the problem as usual it's not a cure. So I went on a quest. What follows might not alleviate my specific problem, I haven't tested in production yet, but it certainly helps with another: DB reload.

Sam already wrote some performance tuning tips here: Performance and Tuning most notably the setting of innodb_buffer_pool_size. After a discussion on the DPM user forum and some testing this is what I'd add:

I set "DPM REQCLEAN 3m" when I upgraded to DPM 1.7.4 and this, after a reload, has reduced Manchester DB file size from 17GB to 7.6GB. Dumping the db took 7m34s. I then reloaded it with different combinations of suggested my.cnf innodb parameters and the effects of some of them are dramatic.

The default parameters should definitely be avoided. Reloading a database with the default parameters takes several hours. Last time it took 17-18 hours, this time I interrupted after 4.

With a combination of the parameters suggested by Maarten the time is drastically reduced. In particular the most effective have been setting innodb_buffer_pool_size and innodb_log_file_size. Below are the results of the upload tests I made in decreasing order of time. I then followed Jean Philippe suggestion to drop the requests tables. Dropping the tables took several minutes and it was slightly faster with a single db file. After I dropped the tables and the indexes ibdata1 size dropped to 1.2GB and using combination 4 below it took 1m7s to dump and 5m7s to reload. With one file per table configuration reloading was slightly faster but after I dropped the requests tables there was no difference and it is also balanced by the fact that deletion seems slower and the effects are probably more visible when the database is bigger so these small tests don't give any compelling reason in favour nor against for now.

This are steps that help reducing the time it takes to reload the database:

1) Enable REQCLEAN in shift.conf (I set it to 3 months to comply with security requirements.)
2) set innodb_buffer_pool_size in my.cnf (I set it at 10% of the machine memory and I couldn't see much difference eventually when I set it to 22.5% but in production it might be another story with repeated queries for the same input files)
3) set innodb_log_file_size in my.cnf (didn't give much thought to this, Maarten value of 50MB seemed good enough. Binary log files need to be removed to enable this and the database restarted but check the docs this might not be a valid strategy if you make heavier use of the binary logs.)
4) set innodb_flush_log_at_trx_commit = 2 in my.cnf (although this parameter seems less effective during reload it might be useful in production 2 is slightly safer than 0).
5) Use the script Jean-Philippe gave me to drop the requests tables before an upgrade.

Hopefully they will help stop also the time outs.

Tests:

COMBINATION 1

innodb_buffer_pool_size = 400MB
# innodb_log_file_size = 50MB
innodb_flush_log_at_trx_commit = 2
# innodb_file_per_table

real 167m30.226s
user 1m41.860s
sys 0m9.987s

============================
COMBINATION 2
innodb_buffer_pool_size = 900MB
# innodb_log_file_size = 50MB
# innodb_flush_log_at_trx_commit = 2
# innodb_file_per_table

real 155m2.996s
user 1m40.843s
sys 0m9.935s

===========================
COMBINATION 3
innodb_buffer_pool_size = 900MB
innodb_log_file_size = 50MB
# innodb_flush_log_at_trx_commit = 2
# innodb_file_per_table

real 49m2.683s
user 1m39.137s
sys 0m9.902s
===========================
COMBINATION 4
innodb_buffer_pool_size = 400MB
innodb_log_file_size = 50MB
innodb_flush_log_at_trx_commit = 2 <-- test also with 0 instead of 2 but it didn't change the time it took and 2 is slightly safer
# innodb_file_per_table

real 48m32.398s
user 1m40.638s
sys 0m9.733s
===========================
COMBINATION 5
innodb_buffer_pool_size = 900MB
innodb_log_file_size = 50MB
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table

real 47m25.109s
user 1m39.230s
sys 0m9.985s
===========================
COMBINATION 6
innodb_buffer_pool_size = 400MB
innodb_log_file_size = 50MB
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table

real 46m46.850s
user 1m40.378s
sys 0m9.950s
===========================