Привет всем. Помогите пожалуйста понять как оптимизировать запрос.
Есть 2 таблицы к которым нужно применить JOIN, в одной таблице около 65 млн. строк (`csco_idesind`), в другой 336 млн. строк (`csco_ifndq`).
На входе был такой запрос:
SELECT k.gvkey, k.datadate, i.valuei, i.item, i.effdate, i.thrudate, k.datafmt
FROM csco_idesind AS k
INNER JOIN csco_ifndq i ON k.coifnd_id = i.coifnd_id
WHERE k.gvkey IN (...gvkeys ~1600)
AND k.datadate BETWEEN '2013-10-01 00:00:00' AND '2015-04-01 00:00:00'
выхлоп анализатора:
Merge Join (cost=11777177.79..119779025883.98 rows=7904066233200 width=42) (actual time=5929148.383..22674304.150 rows=39931455 loops=1)
Merge Cond: (i.coifnd_id = k.coifnd_id)
-> Index Scan using csco_ifndq_coifnd_id_idx on csco_ifndq i (cost=0.57..1205401664.09 rows=336723168 width=30) (actual time=2.838..22524320.405 rows=325599420 loops=1)
-> Materialize (cost=11777177.22..11800650.71 rows=4694697 width=20) (actual time=108488.998..111424.490 rows=39931551 loops=1)
-> Sort (cost=11777177.22..11788913.96 rows=4694697 width=20) (actual time=108488.981..108646.332 rows=179749 loops=1)
Sort Key: k.coifnd_id
Sort Method: external merge Disk: 6024kB
-> Gather (cost=1000.56..11064387.75 rows=4694697 width=20) (actual time=3.677..108369.271 rows=179749 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Scan using csco_idesind_datadate_idx on csco_idesind k (cost=0.56..10593918.05 rows=1956124 width=20) (actual time=122.282..108262.504 rows=59916 loops=3)
Index Cond: ((datadate >= '2013-10-01 00:00:00'::timestamp without time zone) AND (datadate <= '2015-04-01 00:00:00'::timestamp without time zone))
Filter: (gvkey = ANY ('{~1600 items}'::integer[]))
Rows Removed by Filter: 1536564
Planning Time: 3.558 ms
JIT:
Functions: 25
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 26.070 ms, Inlining 210.727 ms, Optimization 269.278 ms, Emission 182.134 ms, Total 688.210 ms
Execution Time: 22676429.641 ms