MySQL JOIN with LIMIT 1 on joined table

I prefer another approach, described in an answer to a similar question: https://stackoverflow.com/a/11885521/2215679

This approach is better, especially if you need to show more than one field in the SELECT. It avoids both "Error Code: 1241. Operand should contain 1 column(s)" and a duplicated sub-select for each column.
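To illustrate the two problems mentioned above, here is a minimal sketch using the same `categories` and `products` tables and column names as the query in this answer. A scalar subquery in the SELECT list may return only one column, so the first statement fails in MySQL with error 1241; the second works, but has to repeat the sub-select for each column:

```sql
-- Fails in MySQL with Error Code: 1241 (Operand should contain 1 column(s)):
-- a scalar subquery in the SELECT list may return only one column.
SELECT
  c.id,
  c.title,
  (SELECT p.id, p.title
   FROM products AS p
   WHERE p.category_id = c.id
   ORDER BY p.id LIMIT 1) AS product
FROM categories AS c;

-- Works, but the sub-select is repeated once per column.
SELECT
  c.id,
  c.title,
  (SELECT p.id FROM products AS p
   WHERE p.category_id = c.id
   ORDER BY p.id LIMIT 1) AS product_id,
  (SELECT p.title FROM products AS p
   WHERE p.category_id = c.id
   ORDER BY p.id LIMIT 1) AS product_title
FROM categories AS c;
```

Note that the scalar sub-select form also returns categories that have no products (with NULLs), while an inner JOIN leaves them out.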

For your situation the query should look like this (it also works in PostgreSQL and is pretty fast, see my update below):

SELECT
 c.id,
 c.title,
 p.id AS product_id,
 p.title AS product_title
FROM categories AS c
JOIN products AS p ON
 p.id = (                                 -- the PRIMARY KEY
  SELECT p1.id FROM products AS p1
  WHERE c.id=p1.category_id
  ORDER BY p1.id LIMIT 1
 );

PS. I ran a performance test of this query against the others proposed here, and this query is the best option so far!

UPDATE (2022-07-20, PostgreSQL)

I haven’t worked with MySQL for a while, so I decided to compare the performance of my solution (which works perfectly in both MySQL and PostgreSQL) with the solution provided by @Gravy, using PostgreSQL v12.9.

For that I created dummy tables and data with 100 categories and 100000 products. You can check the code on this gist.
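A setup of this kind could look roughly like the following PostgreSQL sketch. It is not the gist's code: the column types, the generated titles, and the even spread of products over categories are assumptions made for illustration.

```sql
-- Hypothetical schema; only the column names come from the queries in this answer.
CREATE TABLE categories (
  id    integer PRIMARY KEY,
  title text NOT NULL
);

CREATE TABLE products (
  id          integer PRIMARY KEY,
  category_id integer NOT NULL REFERENCES categories (id),
  title       text NOT NULL
);

-- 100 categories.
INSERT INTO categories (id, title)
SELECT g, 'Category ' || g
FROM generate_series(1, 100) AS g;

-- 100000 products, spread evenly: product g goes to category ((g - 1) % 100) + 1.
INSERT INTO products (id, category_id, title)
SELECT g, ((g - 1) % 100) + 1, 'Product ' || g
FROM generate_series(1, 100000) AS g;
```

Real data is rarely distributed this evenly, so timings on such a table are only indicative.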

I ran my query above and it took only 13ms.

Then I slightly modified the query from @Gravy (for Postgres):

SELECT
  id,
  category_title,
  (array_agg(product_title))[1]  
FROM
    (SELECT c.id, c.title AS category_title, p.id AS product_id, p.title AS product_title
    FROM categories AS c
    JOIN products AS p ON c.id = p.category_id
    ORDER BY c.id ASC) AS a 
GROUP BY id, category_title;

and ran it too.
It took more than 150ms on my machine, which is more than 10x slower.

In defense of @Gravy’s solution, I agree about the n+1 problem. But in this particular case the number of products is usually far larger than the number of categories, so running a lookup for each category is far less expensive than running through every product, as @Gravy’s query does.
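How cheap each per-category lookup actually is depends on the indexes available. The sketch below assumes a composite index on `(category_id, id)`, which the measurements above do not mention (the index name is hypothetical), and uses PostgreSQL's `EXPLAIN ANALYZE` to show the real plan and timing on your own data:

```sql
-- Assumed composite index: the first product id of a category can be read
-- from the start of that category's range in the index.
CREATE INDEX products_category_id_id_idx ON products (category_id, id);

-- PostgreSQL: execute the query and report the chosen plan with actual timings.
EXPLAIN ANALYZE
SELECT c.id, c.title, p.id AS product_id, p.title AS product_title
FROM categories AS c
JOIN products AS p ON p.id = (
  SELECT p1.id FROM products AS p1
  WHERE p1.category_id = c.id
  ORDER BY p1.id LIMIT 1
);
```

Comparing the output with and without the index is a quick way to check whether the per-category subquery is the cheaper option for a given table.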

By the way, if your table has 1 million products with 100 categories, the speed of my query stays the same (between 9 and 17ms), but the query from @Gravy takes more than 2 seconds to run.

In summary, at this moment my query is the most performant solution for this task.

Feel free to comment.
