MySQL – Identify potential duplicate Indexes

Background

A little while ago I was trying to get a quick education on the Information schema in MySQL.

Shlomi Noach

Ran into a nice post by Shlomi Noach.

The post is titled “Useful database analysis queries with INFORMATION_SCHEMA” and published here.

Code

Table

Test Table

CREATE TABLE `City`
(
  `ID` int(11) NOT NULL auto_increment,
  `Name` char(35) character set utf8 NOT NULL default '',
  `CountryCode` char(3) NOT NULL default '',
  `District` char(20) NOT NULL default '',
  `Population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `ID` (`ID`),
  KEY `Population` (`Population`),
  KEY `Population_2` (`Population`,`CountryCode`)
)
-- ENGINE=MyISAM
ENGINE=INNODB
AUTO_INCREMENT=4080
DEFAULT CHARSET=latin1

Outline By Shlomi Noach

We can see that the Population_2 index covers the Population index, so the latter is redundant and should be removed. We also see that the ID index is redundant, since there is a PRIMARY KEY on ID, which is in itself a unique key. How can we test such cases by querying the INFORMATION_SCHEMA? Turns out we can do that using the STATISTICS table.

[Update: thanks to Roland Bouman’s comments. The following queries only consider BTREE indexes, and do not verify FULLTEXT or HASH indexes]

 

Query

Original


SELECT * FROM (
  SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns
  FROM `information_schema`.`STATISTICS`
  WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
    AND NON_UNIQUE = 1 AND INDEX_TYPE='BTREE'
  GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i1 INNER JOIN (
  SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns
  FROM `information_schema`.`STATISTICS`
  WHERE INDEX_TYPE='BTREE'
  GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i2
USING (TABLE_SCHEMA, TABLE_NAME)
WHERE i1.columns != i2.columns
AND LOCATE(CONCAT(i1.columns, ','), i2.columns) = 1

Revised


SELECT 

          i1.TABLE_SCHEMA as `schema`
        , i1.TABLE_NAME as `table`
        , i1.INDEX_NAME as `index1`
        , replace(i1.`columns`, '~', '') as `columns1`
        , i2.INDEX_NAME as `index2`
        , replace(i2.`columns`, '~', '') as `columns2`          

FROM
  (
        SELECT
                  TABLE_SCHEMA
                , TABLE_NAME
                , INDEX_NAME
                , GROUP_CONCAT(CONCAT('~', COLUMN_NAME, '~')
                                ORDER BY SEQ_IN_INDEX
                               ) AS columns

        FROM `information_schema`.`STATISTICS`

        WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')

        -- AND NON_UNIQUE = 1 

        AND INDEX_TYPE='BTREE' 

        GROUP BY
              TABLE_SCHEMA
            , TABLE_NAME
            , INDEX_NAME
    ) AS i1 

    INNER JOIN 

    (

        SELECT
              TABLE_SCHEMA
            , TABLE_NAME
            , INDEX_NAME
            , GROUP_CONCAT(CONCAT('~', COLUMN_NAME, '~')
                            ORDER BY SEQ_IN_INDEX
                           ) AS columns

        FROM `information_schema`.`STATISTICS`

        WHERE INDEX_TYPE='BTREE' 

        GROUP BY
              TABLE_SCHEMA
            , TABLE_NAME
            , INDEX_NAME
    ) AS i2

/*
 *
USING (TABLE_SCHEMA, TABLE_NAME)

WHERE i1.columns != i2.columns 

AND LOCATE(CONCAT(i1.columns, ','), i2.columns) = 1 

*/  

  /*
   * Same Schema and Table
  */
  on i1.TABLE_SCHEMA = i2.TABLE_SCHEMA

  and i1.TABLE_NAME = i2.TABLE_NAME

  /* Different Index */
  and i1.INDEX_NAME != i2.INDEX_NAME

  and
      (

            /*
             Same exact column list
             */
            (
                   ( i1.columns = i2.columns )
               and ( i1.INDEX_NAME < i2.INDEX_NAME )
            )

             /*
                Different column list
                One column list is fully menetioned in the other
             */
             or (

                         ( i1.columns  i2.columns )

                    and  (
                            locate( i1.columns, i2.columns ) > 0
                         )   

                )    

      )

order by      

          `schema`
        , `table`
        , `index1`
        , `columns1`

.

Dedication

Dedicated to Shlomi Noach.

In time we all get to recognize a giving humble spirit.

The code is by no means perfect.

But, he is like I, think this is a good building block.  And, here are the the things I considered and the goal I have in mind.

Anyone can share when things are perfect and no one holds it against you if you choose not to share.

To me it takes a lot of confidence in far & disparate areas to place things in the public space when one is just starting out.

And that is the spirit of blogging, posting things on YouTube or doing a start-up.

Is one humble enough to grow in the public space.  And, can the relationships supplicate thoughtful innovation.

Sometimes in life we all enter those seasons, when one does not want to keep things as they are.

 

MySQL – information_schema.statistics

Background

As a quick follow-up to our discussion of metadata on tables, let us discuss indexes.

BTW, the discussion on tables is here.

INFORMATION_SCHEMA.STATISTICS

Code


select

		tblStat.TABLE_NAME as 'table'

	  , tblStat.INDEX_NAME as 'index'

	  , case tblStat.NON_UNIQUE
			   when 0 then 'No'
			   when 1 then 'Yes'
			   else '?'
		end as 'Unique'


	  , GROUP_CONCAT( COLUMN_NAME order by SEQ_IN_INDEX ) as colList


	  , tblStat.CARDINALITY


from  INFORMATION_SCHEMA.STATISTICS tblStat


where  tblStat.TABLE_SCHEMA = 'drupal'


group by

		 tblStat.TABLE_SCHEMA
	   , tblStat.TABLE_NAME
	   , tblStat.INDEX_NAME


order by

		 tblStat.TABLE_SCHEMA
	  ,  tblStat.TABLE_NAME
	  ,  tblStat.INDEX_NAME


LIMIT 25

;


Output

 

Comparison with Other Databases

  1. GROUP_CONCAT
    • The group_concat is very, very powerful and has few equal in other database platforms
    • Function Documentation

 

SHOW INDEX

To get a bit more information, specifically cardinality ( Number of unique values in each column combination ), we can issue “show index” against the table.

Code

Syntax


show index from [table] from [database];

Sample


show index from search_index from drupal;

Output

References

  1. Reference
    • INFORMATION_SCHEMA Tables
      • INFORMATION_SCHEMA STATISTICS Table
    • Show Index
    •  Group_Function
      • Group By Function
        Link
    • Group_Concat