Finding similar strings with PostgreSQL quickly

The way you have it, the similarity between every element and every other element of the table has to be calculated (almost a cross join). If your table has 1000 rows, that’s already 1,000,000 (!) similarity calculations before any of them can be checked against the condition and sorted. That scales terribly. Use SET pg_trgm.similarity_threshold and the % operator …
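A minimal sketch of the approach the excerpt names, assuming a hypothetical table names(name text), an arbitrary threshold of 0.8, and a trigram GiST index (the index name and the choice of GiST are illustrative assumptions, not taken from the excerpt):

```sql
-- Hypothetical table: names(name text). The pg_trgm extension must be installed.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A trigram index gives the % operator something to search,
-- so not every pair of rows has to be compared.
CREATE INDEX names_name_trgm_idx ON names USING gist (name gist_trgm_ops);

-- The % operator is true only for pairs whose similarity is above this value.
-- 0.8 is an assumed value; tune it for your data.
SET pg_trgm.similarity_threshold = 0.8;

SELECT similarity(n1.name, n2.name) AS sim
     , n1.name AS name1
     , n2.name AS name2
FROM   names n1
JOIN   names n2 ON n1.name <> n2.name   -- skip identical strings
               AND n1.name % n2.name    -- index-supported similarity filter
ORDER  BY sim DESC;
```

The point of the design is that the filter moves into the join condition, where an index can support it, instead of computing similarity() for all pairs first and filtering afterwards.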

Keep PostgreSQL from sometimes choosing a bad query plan

If the query planner makes bad decisions, it’s mostly one of two things: 1. The statistics are inaccurate. Do you run ANALYZE often enough? It is also popular in its combined form, VACUUM ANALYZE. If autovacuum is on (which is the default in modern-day Postgres), ANALYZE is run automatically. But consider: Are regular VACUUM ANALYZE still recommended under …
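To check the first point in practice, something like the following may help. It is only a sketch, assuming a hypothetical table named tbl:

```sql
-- Refresh planner statistics for one table (tbl is a hypothetical name).
ANALYZE tbl;

-- The combined form: reclaim dead rows and refresh statistics in one pass.
VACUUM ANALYZE tbl;

-- See when statistics were last gathered, manually or by autovacuum,
-- and how many rows have changed since the last ANALYZE.
SELECT relname
     , last_analyze
     , last_autoanalyze
     , n_mod_since_analyze
FROM   pg_stat_user_tables
WHERE  relname = 'tbl';
```

If last_autoanalyze is old while n_mod_since_analyze is large, stale statistics are a plausible cause of a bad plan.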

Any downsides of using data type “text” for storing strings?

Generally, there is no downside to using text in terms of performance/memory. On the contrary: text is the optimum. Other types have more or less relevant downsides. text is literally the “preferred” type among string types in the Postgres type system, which can affect function or operator type resolution. In particular, never use char(n) (alias …
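Both claims can be inspected directly. The queries below are a sketch and rely only on the system catalog pg_type and the built-in function octet_length():

```sql
-- List the string types and show which one is flagged as "preferred"
-- in the type system (typcategory 'S' is the string category).
SELECT typname, typispreferred
FROM   pg_type
WHERE  typcategory = 'S'
ORDER  BY typname;

-- char(n) blank-pads values to the declared length; text stores them as given.
-- Compare the stored byte counts of the same three-character string.
SELECT octet_length('abc'::text)     AS text_bytes
     , octet_length('abc'::char(10)) AS char_bytes;
```

The second query is one illustration of a downside of char(n): the padding costs storage without adding information.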

Optimize GROUP BY query to retrieve latest row per user

For best read performance you need a multicolumn index: CREATE INDEX log_combo_idx ON log (user_id, log_date DESC NULLS LAST); To make index-only scans possible, add the otherwise unneeded column payload to a covering index with the INCLUDE clause (Postgres 11 or later): CREATE INDEX log_combo_covering_idx ON log (user_id, log_date DESC NULLS LAST) INCLUDE …
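One common way to write a query that such an index can serve is DISTINCT ON. This is a sketch, assuming the table log has the columns user_id, log_date, and payload mentioned above; it is not necessarily the fastest form for every data distribution:

```sql
-- Latest row per user. The ORDER BY matches the index definition
-- (user_id, log_date DESC NULLS LAST), so Postgres can read rows
-- in index order and keep the first one for each user_id.
SELECT DISTINCT ON (user_id)
       user_id, log_date, payload
FROM   log
ORDER  BY user_id, log_date DESC NULLS LAST;
```

The NULLS LAST in the query has to match the index, otherwise the sort order differs and the index is less likely to be used.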