Why are partial PostgreSQL HASH indices not smaller than full indices?


I want to create the most efficient index for a sparsely populated column. I only need equality lookups, so a HASH index should be beneficial.

Now I'm wondering why a partial HASH index isn't smaller than a full HASH index:

CREATE INDEX full_hash ON mytable USING HASH (my_id); -- 256 MB
CREATE INDEX partial_hash ON mytable USING HASH (my_id) WHERE my_id IS NOT NULL; -- 256 MB

CREATE INDEX full_btree ON mytable (my_id); -- 537 MB
CREATE INDEX partial_btree ON mytable (my_id) WHERE my_id IS NOT NULL; -- 32 MB

Both hash indices take exactly the same amount of space (as shown in PgHero). With standard BTREE indices, however, the partial index takes only about 6% of the space of the full index (32 MB vs. 537 MB).

Are partial HASH indices not supported in PostgreSQL 10?

Tags: postgresql, index, index-tuning, postgresql-10

asked 10 hours ago by Ortwin Gentz
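The sizes quoted in the question can also be read directly from the system catalogs instead of PgHero. The following is a minimal sketch, assuming the four indexes from the question exist on `mytable` and that the table is reachable through the current search path; `pg_relation_size` and `pg_size_pretty` are built-in PostgreSQL functions.

```sql
-- List every index defined on mytable with its on-disk size, largest first.
SELECT c.relname                               AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM   pg_class c
JOIN   pg_index i ON i.indexrelid = c.oid
WHERE  i.indrelid = 'mytable'::regclass
ORDER  BY pg_relation_size(c.oid) DESC;
```

Running this before and after creating each index makes it easy to confirm whether a predicate such as `WHERE my_id IS NOT NULL` changed the size at all.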




















2 Answers
I would argue that this is a bug in the hash index code. When you create an index on an already-populated table, the build tries to pre-size the index to hold all the data, so that it doesn't have to keep splitting buckets while the index is being built. But the pre-sizing code takes into account neither the NULL fraction of the column nor (apparently) the selectivity of the partial index predicate, so it arrives at a number that is too large.

If you create the index first and then populate the table, you will find that the hash index is small, whether you made it partial or not. And if the table is going to grow substantially after the index is created, the extra space consumed at creation time will be put to good use.

answered 8 hours ago by jjanes
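The create-first-then-populate comparison described in this answer could be tried along the following lines. This is only a sketch under stated assumptions: the table `hash_demo`, its columns, the index names, and the row counts are all hypothetical and chosen for illustration, and the actual sizes will depend on the PostgreSQL version and configuration.

```sql
-- Hypothetical demo table: 1,000,000 rows, only every 1000th row gets a non-NULL my_id.
CREATE TABLE hash_demo (id bigint, my_id bigint);

-- Case 1: the index is created while the table is still empty; rows are loaded afterwards.
CREATE INDEX hash_before_load ON hash_demo USING HASH (my_id);

INSERT INTO hash_demo (id, my_id)
SELECT g, CASE WHEN g % 1000 = 0 THEN g END   -- NULL for all other rows
FROM   generate_series(1, 1000000) AS g;

-- Case 2: the index is created on the already-populated table (pre-sized at build time).
CREATE INDEX hash_after_load ON hash_demo USING HASH (my_id);

-- Compare the two on-disk sizes.
SELECT pg_size_pretty(pg_relation_size('hash_before_load')) AS created_before_load,
       pg_size_pretty(pg_relation_size('hash_after_load'))  AS created_after_load;
```

If the explanation above holds, the index created before the load should come out noticeably smaller than the one built afterwards, even though both index the same column of the same table.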







• I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it. – jjanes, 3 hours ago

• Oh, and I submitted a bug already: postgresql.org/message-id/… – Ortwin Gentz, 2 hours ago
It's not explicitly stated in the documentation, but the source code contains the following comment:

/*
 * We do not insert null values into hash indexes. This is okay because
 * the only supported search operator is '=', and we assume it is strict.
 */

So the IS NOT NULL predicate indeed changes nothing, because NULL values are never stored in hash indexes (which makes sense, as comparing NULL values with = never returns true).

answered 10 hours ago by a_horse_with_no_name, edited 10 hours ago
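The strictness of `=` that the source comment relies on is easy to check in any session. A small sketch follows; the second query assumes the `mytable` and `my_id` names from the question and simply counts how many rows a hash index would actually need to store.

```sql
-- '=' is strict: comparing NULL with anything yields NULL, never true.
SELECT NULL = NULL  AS null_equals_null,   -- NULL
       NULL IS NULL AS null_is_null;       -- true

-- count(*) counts all rows; count(my_id) counts only rows where my_id is not NULL.
SELECT count(*)     AS total_rows,
       count(my_id) AS non_null_rows
FROM   mytable;
```

Because an equality lookup can never match a NULL, leaving NULLs out of the index loses nothing, which is why the partial predicate makes no difference to the contents of a hash index.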







• Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with an even more sparsely populated column (only a few hundred records out of more than 10 million) and the index took 256 MB as well. So it looks like the size of a hash index depends only on the table size, not on the number of different indexable values. – Ortwin Gentz, 10 hours ago

• This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes. – jjanes, 9 hours ago

• The full btree index is more than double the size of the hash index. – Ortwin Gentz, 3 hours ago