Postgres query optimization (forcing an index scan)

For testing purposes, you can force the use of the index by “disabling” sequential scans, ideally in your current session only: SET enable_seqscan = OFF; Do not use this on a production server. Details are in the manual. I put “disabling” in quotes because you cannot actually disable sequential table scans. But any other available option …
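As a rough illustration of keeping the setting confined to a test session, the sketch below wraps the SET in a small Python helper. The helper name `explain_without_seqscan` is hypothetical, and it assumes a DB-API style cursor (for example from a PostgreSQL driver) on a connection in autocommit mode; it is not taken from the original answer.

```python
def explain_without_seqscan(cursor, query):
    """Return the EXPLAIN plan lines for `query` with sequential scans discouraged.

    Assumption: `cursor` is a DB-API cursor on an autocommit connection,
    so each statement takes effect immediately for the current session.
    """
    # Discourage sequential scans for this session only.
    cursor.execute("SET enable_seqscan = OFF")
    try:
        cursor.execute("EXPLAIN " + query)
        # Each EXPLAIN row holds one line of the plan in its first column.
        return [row[0] for row in cursor.fetchall()]
    finally:
        # Restore the default so later queries in the session plan normally.
        cursor.execute("RESET enable_seqscan")
```

The RESET in the `finally` clause is a design choice for test code: it returns the session to its default planner setting even if the EXPLAIN raises.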

What's the fastest way to look up big tables for points within radius in MySQL (latitude, longitude)

Well, first of all, if you have a lot of geospatial data, you should be using MySQL’s geospatial extensions rather than calculations like this. You can then create spatial indexes that speed up many queries, and you don’t have to write long, drawn-out queries like the one above. Using a comparison with ST_Distance …

Hive query performance for high cardinality field

Use ORC with bloom filters: CREATE TABLE employee ( employee_id BIGINT, name STRING ) STORED AS ORC TBLPROPERTIES ("orc.bloom.filter.columns"="employee_id"); Enable PPD (predicate pushdown) with vectorization, and use CBO and Tez: SET hive.optimize.ppd=true; SET hive.optimize.ppd.storage=true; SET hive.vectorized.execution.enabled=true; SET hive.vectorized.execution.reduce.enabled=true; SET hive.cbo.enable=true; SET hive.stats.autogather=true; SET hive.compute.query.using.stats=true; SET hive.stats.fetch.partition.stats=true; SET hive.execution.engine=tez; SET hive.stats.fetch.column.stats=true; SET hive.map.aggr=true; SET hive.tez.auto.reducer.parallelism=true; Ref: …
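To show why a bloom filter helps with lookups on a high-cardinality column such as employee_id, here is a toy Python sketch. It is not ORC's actual implementation; the class name, bit count, and hashing scheme are assumptions chosen for illustration. The idea is that a reader can skip a chunk of data whenever the filter reports that the value is definitely absent.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not memory efficient).
        self.bits = bytearray(num_bits)

    def _positions(self, value):
        # Derive several bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent", so the chunk can be skipped.
        return all(self.bits[pos] for pos in self._positions(value))
```

A query filtering on a single employee_id would consult one such filter per chunk and read only the chunks where `might_contain` returns True.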

60 million entries, select entries from a certain month. How to optimize database?

To get entries for a particular month of a particular year faster, you will need to index the time column: CREATE INDEX idx_time ON ENTRIES(time) USING BTREE; Additionally, use: SELECT e.* FROM ENTRIES e WHERE e.time BETWEEN '2010-04-01' AND DATE_SUB('2010-05-01', INTERVAL 1 SECOND) …because BETWEEN is inclusive, so you’d get anything dated “2010-05-01 00:00:00” …
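The inclusive bounds used with BETWEEN above can also be computed in application code. The following is a minimal Python sketch; the function name `month_bounds` is hypothetical, and it mirrors the DATE_SUB(..., INTERVAL 1 SECOND) step by subtracting one second from the start of the following month.

```python
from datetime import datetime, timedelta


def month_bounds(year, month):
    """Return (first_second, last_second) of the given month, both inclusive."""
    start = datetime(year, month, 1)
    # Start of the following month, rolling over to January after December.
    if month == 12:
        next_start = datetime(year + 1, 1, 1)
    else:
        next_start = datetime(year, month + 1, 1)
    # Step back one second so an inclusive BETWEEN excludes the next month.
    return start, next_start - timedelta(seconds=1)
```

For April 2010 this yields 2010-04-01 00:00:00 and 2010-04-30 23:59:59, which can be passed as the two BETWEEN parameters. If the column stores sub-second precision, a half-open comparison (time >= start AND time < start of next month) avoids missing rows in the final second.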