query-optimization - w3toppers.com

Difference in MySQL JOIN vs LEFT JOIN

I thought that by not specifying a type of join it was assumed to be a LEFT JOIN. Is this not the case? No, the default join is an INNER JOIN. Here is a visual explanation of SQL joins (figures: inner join, left join).
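
The point above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained), and the `customers` and `orders` tables are hypothetical. In both engines a bare JOIN behaves like INNER JOIN.

```python
import sqlite3

# Hypothetical tables for illustration; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'book');
""")

QUERY = """
    SELECT c.name, o.item
    FROM customers AS c
    {join_type} orders AS o ON o.customer_id = c.id
    ORDER BY c.id
"""

def run(join_type):
    """Run the same query with a different join keyword."""
    return conn.execute(QUERY.format(join_type=join_type)).fetchall()

bare_join = run("JOIN")         # no join type specified
inner_join = run("INNER JOIN")  # explicit inner join
left_join = run("LEFT JOIN")    # keeps customers without orders

print(bare_join)   # [('Ann', 'book')]
print(inner_join)  # [('Ann', 'book')]
print(left_join)   # [('Ann', 'book'), ('Bob', None)]
```

Bob has no orders, so he appears only in the LEFT JOIN result, with NULL (None) for the order columns.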

mysql select from n last rows

Starting from the answer given by @chaos, but with a few modifications: You should always use ORDER BY if you use LIMIT. There is no implicit order guaranteed for an RDBMS table. You may usually get rows in the order of the primary key, but you can’t rely on this, nor is it portable. If …
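
One common way to apply this advice is sketched below. It is not necessarily the query from the original answer. The `events` table and the `last_n_rows` helper are hypothetical, and SQLite (via Python's sqlite3 module) stands in for MySQL. The inner query picks the newest rows with an explicit ORDER BY ... DESC LIMIT, and the outer query puts them back in ascending order.

```python
import sqlite3

# Hypothetical table for illustration; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO events (id, label) VALUES (?, ?)",
    [(i, f"event-{i}") for i in range(1, 11)],
)

def last_n_rows(n):
    """Return the n rows with the highest id, in ascending id order."""
    sql = """
        SELECT id, label
        FROM (
            SELECT id, label
            FROM events
            ORDER BY id DESC   -- explicit order: never rely on storage order
            LIMIT ?
        ) AS newest            -- MySQL requires an alias on a derived table
        ORDER BY id ASC
    """
    return conn.execute(sql, (n,)).fetchall()

print(last_n_rows(3))  # [(8, 'event-8'), (9, 'event-9'), (10, 'event-10')]
```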

Counting DISTINCT over multiple columns

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns. Once it is persisted, provided the column is deterministic and you are using “sane” database settings, it can be indexed and / or statistics can be created on it. …

Subqueries with EXISTS vs IN – MySQL

An Explain Plan would have shown you exactly why you should use EXISTS. Usually the question is EXISTS vs COUNT(*). EXISTS is faster. Why? With regard to the challenges presented by NULL: when the subquery returns NULL, the IN comparison can itself evaluate to NULL, so you need to handle that as well. But using EXISTS, it’s merely …
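
The NULL issue is easiest to see with the negated forms. The sketch below uses hypothetical `customers` and `orders` tables, with SQLite standing in for MySQL. Both engines follow standard SQL three-valued logic here: a single NULL in the subquery makes NOT IN return no rows, while NOT EXISTS is unaffected.

```python
import sqlite3

# Hypothetical tables for illustration; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (100, 1), (101, NULL);  -- one order has no customer
""")

def run(sql):
    return conn.execute(sql).fetchall()

in_rows = run("""
    SELECT id FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    ORDER BY id
""")
not_in_rows = run("""
    SELECT id FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY id
""")
not_exists_rows = run("""
    SELECT c.id FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.id
""")

print(in_rows)          # [(1,)]
print(not_in_rows)      # []  (2 NOT IN (1, NULL) evaluates to NULL, not true)
print(not_exists_rows)  # [(2,), (3,)]
```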

MySQL indexes – what are the best practices?

You should definitely spend some time reading up on indexing; there’s a lot written about it, and it’s important to understand what’s going on. Broadly speaking, an index imposes an ordering on the rows of a table. For simplicity’s sake, imagine a table is just a big CSV file. Whenever a row is inserted, it’s …
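
The CSV analogy can be made concrete with a toy model in plain Python. This is only an illustration of the idea that an index is a sorted structure kept next to the unordered rows. It is not how MySQL stores indexes (InnoDB uses B-trees), and all names below are hypothetical.

```python
import bisect

# The "table": rows in insertion order, like lines appended to a CSV file.
table = [
    ("carol", 31),
    ("alice", 25),
    ("dave", 40),
    ("bob", 19),
]

# The "index" on the name column: (key, row position) pairs kept sorted by key.
name_index = sorted((row[0], pos) for pos, row in enumerate(table))

def full_scan(name):
    """Check every row, as a query without a usable index must."""
    return [row for row in table if row[0] == name]

def index_lookup(name):
    """Binary-search the sorted index, then fetch matching rows by position."""
    i = bisect.bisect_left(name_index, (name, -1))
    matches = []
    while i < len(name_index) and name_index[i][0] == name:
        matches.append(table[name_index[i][1]])
        i += 1
    return matches

def insert_row(row):
    """Append to the table and keep the index sorted (the cost of an index)."""
    table.append(row)
    bisect.insort(name_index, (row[0], len(table) - 1))

insert_row(("erin", 52))
print(index_lookup("bob"))   # [('bob', 19)]
print(index_lookup("erin"))  # [('erin', 52)]
```

Lookups through the index need only a binary search, while every insert pays a little extra to keep the index ordered.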

Execute Hive Query with IN clause parameters in parallel

There is no need to read the same data many times in separate queries to achieve better parallelism. Tune mapper and reducer parallelism properly to achieve the same effect. First of all, enable PPD with vectorizing, use CBO and Tez:
SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.cbo.enable=true;
SET hive.stats.autogather=true;
SET hive.compute.query.using.stats=true;
SET …

JOIN queries vs multiple queries

For inner joins, a single query makes sense, since you only get matching rows. For left joins, multiple queries are much better. Look at the following benchmark I did:
Single query with 5 joins: query time 8.074508 seconds, result size 2268000
5 queries in a row: combined query time 0.00262 seconds, result size 165 (6 + …
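
One plausible reason for such a gap in result sizes is row multiplication: left-joining several one-to-many tables in a single query yields the product of the child row counts, while separate queries yield their sum. The sketch below shows the effect on hypothetical `authors`, `books`, and `awards` tables, with SQLite standing in for MySQL. It does not reproduce the benchmark above.

```python
import sqlite3

# Hypothetical schema for illustration; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE awards (id INTEGER PRIMARY KEY, author_id INTEGER, award TEXT);
    INSERT INTO authors VALUES (1, 'Ann');
""")
conn.executemany("INSERT INTO books (author_id, title) VALUES (1, ?)",
                 [(f"book-{i}",) for i in range(3)])
conn.executemany("INSERT INTO awards (author_id, award) VALUES (1, ?)",
                 [(f"award-{i}",) for i in range(4)])

# One query with two LEFT JOINs: every book is paired with every award.
single_query_rows = conn.execute("""
    SELECT a.name, b.title, w.award
    FROM authors AS a
    LEFT JOIN books AS b ON b.author_id = a.id
    LEFT JOIN awards AS w ON w.author_id = a.id
""").fetchall()

# Three separate queries: each table's rows are fetched once.
separate_rows = [
    conn.execute("SELECT name FROM authors WHERE id = 1").fetchall(),
    conn.execute("SELECT title FROM books WHERE author_id = 1").fetchall(),
    conn.execute("SELECT award FROM awards WHERE author_id = 1").fetchall(),
]
separate_total = sum(len(rows) for rows in separate_rows)

print(len(single_query_rows))  # 12  (3 books x 4 awards)
print(separate_total)          # 8   (1 + 3 + 4)
```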

How to do the Recursive SELECT query in MySQL?

Edit: The solution mentioned by @leftclickben is also effective. We can also use a stored procedure for the same purpose.
CREATE PROCEDURE get_tree(IN id int)
BEGIN
  DECLARE child_id int;
  DECLARE prev_id int;
  SET prev_id = id;
  SET child_id = 0;
  SELECT col3 INTO child_id FROM table1 WHERE col1 = id;
  CREATE TEMPORARY TABLE IF NOT EXISTS temp_table AS (SELECT * …
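
As an alternative to the stored procedure (a different technique, not the one from the answer), MySQL 8.0 and later support recursive common table expressions. The sketch below runs the same kind of query on SQLite through Python's sqlite3 module. The sample rows and the `follow_chain` helper are hypothetical, and it assumes, as the procedure above suggests, that `col3` holds the `col1` value of the next row in the chain.

```python
import sqlite3

# Hypothetical data for illustration; SQLite stands in for MySQL 8.0+ here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 INTEGER);
    INSERT INTO table1 VALUES
        (1, 'a', 5), (5, 'd', 3), (3, 'k', 7), (6, 'o', 2), (2, 'x', 8);
""")

def follow_chain(start_id):
    """Return the starting row and every row reachable by following col3."""
    sql = """
        WITH RECURSIVE tree(col1, col2, col3) AS (
            -- anchor: the row we start from
            SELECT col1, col2, col3 FROM table1 WHERE col1 = ?
            UNION ALL
            -- recursive step: the row whose col1 equals the previous col3
            SELECT t.col1, t.col2, t.col3
            FROM table1 AS t
            JOIN tree ON t.col1 = tree.col3
        )
        SELECT col1, col2, col3 FROM tree
    """
    return conn.execute(sql, (start_id,)).fetchall()

print(sorted(follow_chain(1)))  # [(1, 'a', 5), (3, 'k', 7), (5, 'd', 3)]
```

With UNION ALL, a cycle in the data would make the recursion run forever. Using UNION instead drops duplicate rows and so stops on cycles.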

PostgreSQL LIKE query performance variations

FTS does not support LIKE. The previously accepted answer was incorrect. Full Text Search with its full text indexes is not for the LIKE operator at all; it has its own operators and doesn’t work for arbitrary strings. It operates on words based on dictionaries and stemming. It does support prefix matching for words, but …

How to describe performance issue in relational database?

For Oracle Database, provide this information. Describe the symptoms of the problem: describe the behavior that causes the problem. Is the behavior of the query stable, or does the problem occur only sometimes, with specific parameters or simply at random? Can you reproduce this behavior in an IDE (e.g. SQL Developer)? Describe the environment: define the …