Counting Comments: Why I didn’t use COUNT(*)

October 3rd, 2008

When developing this blog, one decision I had to make was how I wanted to retrieve the number of comments per post to be displayed on the blog index page. This is a specific example, but this is an issue that is raised in many web applications, whenever you need to display how many sub-elements there are per item (eg. threads per forum, replies per message, posts per category, etc.). It’s also an example where conventional wisdom is wrong.

For the sake of this article, let’s say our tables look like this:


posts
---------------
id INT
title VARCHAR
body TEXT

comments
---------------
id INT
post_id INT
comment TEXT

The obvious answer is to query the posts table for the post data, join the comments table and count the comments where the post_id matches. Well, that will get you your answer…eventually. Aggregate functions in MySQL, especially COUNT(*) work very well when you’re only returning the COUNT(*) value, and when you’re only querying one MyISAM table. If you’re querying multiple tables, or using InnoDB tables then you will run into some pretty nasty performance issues. So, in this case, it’s not the best approach.

“Well then,” you say, “why don’t we just query for post data, then run our results through a loop to query for the comment counts?” Well, you could do that, but it’s not a solution that scales well. Let’s pretend for the moment that we’re not using caching in anyway, and let’s assume there are ten posts per page. That means in order to get the comment count for each post we would have to run one query to get our post data, and 10 subsequent COUNT(*) queries on the comments table to get the comment counts for each post. That’s 11 queries in total, which is hardly an efficient use of our resources. It means that each time the page loads we’re making 11 queries just for our post results ? on a high traffic site that’s a huge drain on our MySQL server.

In an ideal world, we’d like to make one query, on one table, and get our results. No complicated joins, aggregate functions, or excessive querying. This can be achieved very simply by setting up our data in a way that makes sense for us to retrieve it. What if our posts table looked like this:


posts
---------------
id INT
title VARCHAR
body TEXT
comment_count INT

Well, if the comment count was part of the posts table then a simple query to the posts table would get us all the information we need wouldn’t it? How can this be achieved? When comments are stored to the database all we need to do, in addition to inserting our comment data is to simply increment the value in the comment_count column in the posts table. This means an extra query is involved in adding a comment, but MySQL handles SET column_name = column_name + 1 very quickly and efficiently, and we make this additional query just once when the comment is written, but we reap the savings on every page load.

So, yes, we are storing a little extra data, and making an extra query, but storage space is cheap, and the savings on the reads are greater than the costs on the writes.

Next time you’re faced with a similar scenario, look at how you can structure your tables to make your reads easier because in the real world while writing the a the fanciest query conceivable may be gratifying, but it just doesn’t scale.

Tweet This

Leave a Reply