PostgreSQL: The lower the LIMIT, the slower the query

I have the following query:
SELECT translation.id
FROM "TRANSLATION" translation
  INNER JOIN "UNIT" unit
    ON translation.fk_id_translation_unit = unit.id
  INNER JOIN "DOCUMENT" document
    ON unit.fk_id_document = document.id
WHERE document.fk_id_job = 3665
ORDER BY translation.id ASC
LIMIT 50
It runs for a terrible 110 seconds.
The table sizes:
+----------------+-------------+
| Table          | Records     |
+----------------+-------------+
| TRANSLATION    | 6,906,679   |
| UNIT           | 6,906,679   |
| DOCUMENT       | 42,321      |
+----------------+-------------+
However, when I change the LIMIT parameter from 50 to 1000, the query finishes in 2 seconds. Here is the query plan for the slow one:
Limit  (cost=0.00..146071.52 rows=50 width=8) (actual time=111916.180..111917.626 rows=50 loops=1)
  ->  Nested Loop  (cost=0.00..50748166.14 rows=17371 width=8) (actual time=111916.179..111917.624 rows=50 loops=1)
        Join Filter: (unit.fk_id_document = document.id)
        ->  Nested Loop  (cost=0.00..39720545.91 rows=5655119 width=16) (actual time=0.051..15292.943 rows=5624514 loops=1)
              ->  Index Scan using "TRANSLATION_pkey" on "TRANSLATION" translation  (cost=0.00..7052806.78 rows=5655119 width=16) (actual time=0.039..1887.757 rows=5624514 loops=1)
              ->  Index Scan using "UNIT_pkey" on "UNIT" unit  (cost=0.00..5.76 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=5624514)
                    Index Cond: (unit.id = translation.fk_id_translation_unit)
        ->  Materialize  (cost=0.00..138.51 rows=130 width=8) (actual time=0.000..0.006 rows=119 loops=5624514)
              ->  Index Scan using "DOCUMENT_idx_job" on "DOCUMENT" document  (cost=0.00..137.86 rows=130 width=8) (actual time=0.025..0.184 rows=119 loops=1)
                    Index Cond: (fk_id_job = 3665)
and for the fast one:
Limit  (cost=523198.17..523200.67 rows=1000 width=8) (actual time=2274.830..2274.988 rows=1000 loops=1)
  ->  Sort  (cost=523198.17..523241.60 rows=17371 width=8) (actual time=2274.829..2274.895 rows=1000 loops=1)
        Sort Key: translation.id
        Sort Method: top-N heapsort  Memory: 95kB
        ->  Nested Loop  (cost=139.48..522245.74 rows=17371 width=8) (actual time=0.095..2252.710 rows=97915 loops=1)
              ->  Hash Join  (cost=139.48..420861.93 rows=17551 width=8) (actual time=0.079..2005.238 rows=97915 loops=1)
                    Hash Cond: (unit.fk_id_document = document.id)
                    ->  Seq Scan on "UNIT" unit  (cost=0.00..399120.41 rows=5713741 width=16) (actual time=0.008..1200.547 rows=6908070 loops=1)
                    ->  Hash  (cost=137.86..137.86 rows=130 width=8) (actual time=0.065..0.065 rows=119 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 5kB
                          ->  Index Scan using "DOCUMENT_idx_job" on "DOCUMENT" document  (cost=0.00..137.86 rows=130 width=8) (actual time=0.009..0.041 rows=119 loops=1)
                                Index Cond: (fk_id_job = 3665)
              ->  Index Scan using "TRANSLATION_idx_unit" on "TRANSLATION" translation  (cost=0.00..5.76 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=97915)
                    Index Cond: (translation.fk_id_translation_unit = unit.id)
The execution plans are clearly very different, and the second one makes the query about 50 times faster.
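As a rough sanity check on the cost figures in the two plans above, the Limit cost in the slow plan looks like a proportional share of its child node's total cost. This is only a sketch, assuming the planner charges a Limit node the fraction of the child's cost that corresponds to the rows it expects to fetch (a simplification that happens to reproduce the numbers shown); the column aliases are made up for illustration:

```sql
-- Slow plan: the planner expects to stop after 50 of an estimated 17371 rows,
-- so it charges only that fraction of the nested loop's total cost (50748166.14).
SELECT round(50748166.14 * 50 / 17371, 2)   AS limit_50_cost,   -- about 146071.52, as in the slow plan
       round(50748166.14 * 1000 / 17371, 2) AS limit_1000_cost; -- roughly 2.9 million, well above 523200.67
```

If that reading is right, the index-ordered plan looks cheapest at LIMIT 50 but not at LIMIT 1000, where the sort-based plan's estimated 523200.67 wins. The actual figures suggest the estimate is far off: the slow plan read 5624514 rows from "TRANSLATION" before it found its 50th match.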
I have indexes on all the fields involved in the query, and I ran ANALYZE on all tables just before running the queries.
Can anyone see what is wrong with the first query?
UPDATE: Table definitions
CREATE TABLE "public"."TRANSLATION" (
  "id" BIGINT NOT NULL,
  "fk_id_translation_unit" BIGINT NOT NULL,
  "translation" TEXT NOT NULL,
  "fk_id_language" INTEGER NOT NULL,
  "relevance" INTEGER,
  CONSTRAINT "TRANSLATION_pkey" PRIMARY KEY("id"),
  CONSTRAINT "TRANSLATION_fk" FOREIGN KEY ("fk_id_translation_unit")
    REFERENCES "public"."UNIT"("id")
    ON DELETE CASCADE
    ON UPDATE NO ACTION
    DEFERRABLE
    INITIALLY DEFERRED,
  CONSTRAINT "TRANSLATION_fk1" FOREIGN KEY ("fk_id_language")
    REFERENCES "public"."LANGUAGE"("id")
    ON DELETE NO ACTION
    ON UPDATE NO ACTION
    NOT DEFERRABLE
) WITHOUT OIDS;

CREATE INDEX "TRANSLATION_idx_unit" ON "public"."TRANSLATION"
  USING btree ("fk_id_translation_unit");

CREATE INDEX "TRANSLATION_language_idx" ON "public"."TRANSLATION"
  USING hash ("translation");

CREATE TABLE "public"."UNIT" (
  "id" BIGINT NOT NULL,
  "text" TEXT NOT NULL,
  "fk_id_language" INTEGER NOT NULL,
  "fk_id_document" BIGINT NOT NULL,
  "word_count" INTEGER DEFAULT 0,
  CONSTRAINT "UNIT_pkey" PRIMARY KEY("id"),
  CONSTRAINT "UNIT_fk" FOREIGN KEY ("fk_id_document")
    REFERENCES "public"."DOCUMENT"("id")
    ON DELETE CASCADE
    ON UPDATE NO ACTION
    NOT DEFERRABLE,
  CONSTRAINT "UNIT_fk1" FOREIGN KEY ("fk_id_language")
    REFERENCES "public"."LANGUAGE"("id")
    ON DELETE NO ACTION
    ON UPDATE NO ACTION
    NOT DEFERRABLE
) WITHOUT OIDS;

CREATE INDEX "UNIT_idx_document" ON "public"."UNIT"
  USING btree ("fk_id_document");

CREATE INDEX "UNIT_text_idx" ON "public"."UNIT"
  USING hash ("text");

CREATE TABLE "public"."DOCUMENT" (
  "id" BIGINT NOT NULL,
  "fk_id_job" BIGINT,
  CONSTRAINT "DOCUMENT_pkey" PRIMARY KEY("id"),
  CONSTRAINT "DOCUMENT_fk" FOREIGN KEY ("fk_id_job")
    REFERENCES "public"."JOB"("id")
    ON DELETE SET NULL
    ON UPDATE NO ACTION
    NOT DEFERRABLE
) WITHOUT OIDS;
UPDATE: Database parameters
shared_buffers = 2048MB
effective_cache_size = 4096MB
work_mem = 32MB
Total memory: 32GB
CPU: Intel Xeon X3470 @ 2.93 GHz, 8MB cache
Can you post the table definitions? –
@JohnTotetWoo Updated – twoflower
Is your installation tuned at all? What are your settings for shared_buffers, effective_cache_size, and work_mem, and what are your system specs? – eevar