Join Algorithms
ęåę“ę°äŗ
ęåę“ę°äŗ
ä¼ē§ēę°ę®åŗč®¾č®”åŗčÆ„åå°ę°ę®åä½ćčę们ē Joins åŗčÆ„č¦åå°čæäøē¹ć
ę¬ēÆå ³ę³Øäø¤äøŖéåƹäø¤äøŖč”Øē inner equijoin ē®ę³ć equijoin ē®ę³ęÆåƹäŗ keys äøę ·ēč”Øć
åƹäŗäøäøŖ tuple åå¦äøäøŖ tuple č¦å¹é join å±ę§ļ¼join ęä½ä¼äø²č å ć å ¶å 容åå³äŗ processing model, storage model, and the query itselfć äø¤ē§ę¹ę³å³å®č¾åŗēå 容ēę¹å¼ļ¼
Dataļ¼ ē“ę„ęęęå¼ę·č“å°č¾åŗč”Øäøļ¼čæę ·äøéč¦å夓åēåå§č”Øćē¼ŗē¹ęÆéč¦ę“å¤ēå åę„ååØč”Øļ¼
Record Idsļ¼ä» ä» ę·č“ join keysļ¼å ¶ä»ēę·č“å ¶ēøåƹåŗē record idsćčæęÆåå¼ååØēēę³ę¹ę³ļ¼å äøŗ DBMS äøéč¦ä»»ä½ query äøéč¦ēę°ę®ćčæäøŖē§°ä½ late materializationć
å¦ęę°ę®éčæ大ļ¼åæåæ č¦ęŗ¢åŗå°ē”¬ēäøļ¼I/O ꬔę°å°±ęÆę们蔔é join ē®ę³ēę åć
Variables used in this lecture: |
---|
ęē®åä¹ęÆęč ¢ēåę³ļ¼å°±ęÆåéåµå„å¾ŖēÆ joinļ¼å¾ē®åļ¼äø¤äøŖč”Øļ¼äø¤äøŖ for å¾ŖēÆļ¼å¦ęę»”č¶³ęē join č°čÆļ¼é£ä¹č¾åŗćäøč¬ččØ DBMS ä¼č®© smaller ē table ä½äøŗ outer forć Cost:
å¦ę inner table ę indexļ¼éåŗ¦čæåÆ仄åęåļ¼å äøŗę们äøēØäøꬔꬔēę«ęę“äøŖē£ēļ¼čęÆä½æēØē“¢å¼ē“ę„ę¾å°ć
Pick the smaller table as the outer table.
Buffer as much of the outer table in memory as possible.
Loop over the inner table or use an index.
Phase #1: Sort
Sort both tables on the join key(s).
We can use the external merge sort algorithm that we talked about last class.
Phase #2: Merge
Step through the two sorted tables with cursors and emit
matching tuples.
May need to backtrack depending on the join type.
One or both tables are already sorted on join key.
Output must be sorted on join key.
The input relations may be sorted by either by an explicit sort operator, or by scanning the relation using an index on the join key.
Hash Join ē high level ęę³ęÆę join attributes éčæ hash table åę small chuncksļ¼čæę ·å°±åå°äŗåå¼åę ·ēęÆč¾čæē®äŗć
Phase #1: Build
Scan the outer relation and populate a hash table using the hash function h1 on the join attributes.
Phase #2: Probe
Scan the inner relation and use h1 on each tuple to jump to a location in the hash table and find a matching tuple.
ä½ęÆčæę ·ęäøäøŖē¼ŗē¹ļ¼å°±ęÆäøäøå®å®å Øę¾čæå åäøć
Build Phase: Hash both tables on the join attribute into partitions.
Probe Phase: Compares tuples in corresponding partitions for each table.
Hash join å ä¹ęÆę儽ēéę©ččæ sort-merge joinćå½ē¶åØäøäŗę åµ sort-merge ę“ä¼ćę仄 DBMS ä¼ę··ēØčæäø¤ē§ē®ę³ć
čæäøŖē®ę³ēęęęÆåØå¤å±å äøäø¤ę¬”å¾ŖēÆļ¼åå«åƹäŗ outer table block å inner table blockćčæę ·å仄 block äøŗåä½ļ¼åå°äŗäøäŗ I/Oć Cost:
å½ē¶ä»„äøęÆ仄 3 äøŖ buffers äøŗč®”ē®éļ¼å¦ęåé äŗ B äøŖ buffersļ¼äøäøŖåé ē» inner blockļ¼äøäøŖåé ē» input blockļ¼å©ä½ēé½åé äøŗ outer blockć Cost:
Cost:
Sort Cost for Table R:
Sort Cost for Table S:
Merge Cost:
Partitioning Phase Cost: Probe Phase Cost: Total Cost:
Algorithm | IO Cost | Example |
---|---|---|
Simple Nested Loop Join
1.3 hours
Block Nested Loop Join
seconds
Index Nested Loop Join
20 seconds
Sort-Merge Join
0.59 seconds
Hash Join
0.45 seconds
M pages in table R (Outer Table), m tuples total
N pages in table S (Inner Table), n tuples total