In: MySQL
20 Apr 2011For a long time I’d been seeing high server load on the server MyWapBlog.com is hosted. I thought the increasing traffic was the cause. MySQL, I knew was the causing the load, but I thought it was all natural for a dynamic site with NO caching.
One day, just when I was seriously thinking about implementing some kind of cache, I found about mtop
– small tool like top
to display running MySQL queries in real-time, running it I found that every second there were many queries (same query but made by different requests) that got stuck in the “PREPARING” state and sometimes took as long as 2 secs. to complete. The evil query was:
SELECT c.post_id FROM category_post_relationship c WHERE (c.cat_id IN (SELECT c2.id AS c2__id FROM categories c2 WHERE (c2.user_id = ?)))
Thinking what it does? Let me make it easier – it fetches “post ids” of all the posts of a user that are categorized.
1. Running the query
184 rows, Query took 0.0812 sec
2. Profiling:
starting 0.000132 checking permissions 0.000011 checking permissions 0.000011 Opening tables 0.000037 System lock 0.000015 Table lock 0.000022 init 0.000065 optimizing 0.000025 statistics 0.000027 preparing 0.000027 executing 0.000011 Sending data 0.000047 optimizing 0.000023 statistics 0.000086 preparing 0.075275 end 0.000015 query end 0.000010 freeing items 0.000033 logging slow query 0.000009 cleaning up 0.000011
3. EXPLAIN:
id select_type table type possible_keys key key_len ref row Extra 1 PRIMARY c index NULL PRIMARY 8 NULL 53898 Using where; Using index 2 DEPENDENT SUBQUERY c2 unique_subquery PRIMARY,user_id PRIMARY 4 func 1 Using where
If you’d ask some novice to do what the query does many of them would use two separate queries and believe me (I’ve done the testing as well) that’ll be much faster!
Googling “subquery optimization” got me this:
MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.
From http://dev.mysql.com/doc/refman/5.1/en/in-subquery-optimization.html
The page also tells us some tips on how to optimize subqueries, going by the suggestions we get the following query:
SELECT c.post_id FROM category_post_relationship c WHERE EXISTS (SELECT 1 FROM categories c2 WHERE c2.user_id = 7639 AND c.cat_id = c2.id)
Still, running it takes similar query times and as such doesn’t help.
Again from the same page:
After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
Though I didn’t read the whole page thoroughly (I’m lazy and since I got the job done by some other technique) still I’m sure that this “optimized” query only “optimizes” the subquery while our biggest problem is that the outer table (with about 50000 rows) is getting evaluated.
I felt, this was one query you’d rather not “optimize” and be happy with two separate queries.
But, this is not to say it can’t be optimized, it can be very easily – by NOT using subqueries at all. I used JOINS:
SELECT cc.post_id FROM categories c RIGHT JOIN category_post_relationship cc ON c.id = cc.cat_id<br> WHERE c.user_id = ?
Now the query takes 0.0008 sec compared to 0.0812, that’s a hundred fold improvement!
Some tips:
WHERE
condition on the outer query.
Comments are closed.