Background
The first thing one learns about Database Performance is the importance of “up to date” Statistics. As soon as things start running slow, one hears the harried echoes – Is there blocking, are the stats up to date, are the Indexes Defragged, are those bad queries running again?
I am always amazed as to how many things get thrown at the lowly and lonely DBA.
No Join Predicate
And, so that I am really jazzed that Microsoft Research along with the Database Core group continues to think ahead and evolve the product. Let us take for example this “No Join predicate” thing.
Optimization Path
Rules Based
We has DBAs have to quickly understand what is new, how it helps us, and what do we have to do to ensure we are reaping unearned credits.
How does it work? Well, it looks at the query and tries to decipher whether it’s join clause is fully defined.
In the Join clause it says have we included all the columns needed to ensure that a unique row is returned. Uniqueness is satisfied via:
- Primary Key
- Unique Index
- Unique Constraints
In previous post, we defined our table as
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO if schema_id('njp') is null begin exec('create schema [njp] authorization [dbo]') end go if object_id('[njp].[AttributeList]') is not null begin drop table [njp].[AttributeList] end go if object_id('[njp].[listofAttributes]') is not null begin drop table [njp].[listofAttributes] end go if object_id('[njp].[person]') is not null begin drop table [njp].[person] end go create table [njp].[person] ( [personID] bigint not null identity(1,1) CONSTRAINT [PK_PERSON_ID] PRIMARY KEY CLUSTERED ( [personID] ASC ) ) create table [njp].[listofAttributes] ( [attributeName] nvarchar(50) not null primary key ) go CREATE TABLE [njp].[AttributeList] ( [pkAttributeListID] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL , [personID] bigint not null , [attributeName] [nvarchar](50) not null , [attributeValue] [nvarchar](380) null , [addedBy] [sysname] NOT NULL CONSTRAINT constrantDefaultNJPAttributeList default (SUSER_SNAME()) , [addedOn] [datetime] NOT NULL CONSTRAINT constrantDefaultNJPAttributeListAddedOn default getdate() CONSTRAINT [PK_AttributeList] PRIMARY KEY NONCLUSTERED ( [pkAttributeListID] ASC ) , CONSTRAINT [FK_AttributeList_PersonID] FOREIGN KEY ( [personID] ) REFERENCES [njp].[person] ( [personID] ) , CONSTRAINT [FK_AttributeList_AttributeName] FOREIGN KEY ( [attributeName] ) REFERENCES [njp].[listofAttributes] ( [attributeName] ) ) go create clustered index idx_personID_attributeName on [njp].[AttributeList] ( [personID] , [attributeName] ) GO create nonclustered index idx_pkAttributeListID on [njp].[AttributeList] ( [pkAttributeListID] ) GO /* drop index [njp].[AttributeList].idx_unique_personID_attributeName create unique index idx_unique_personID_attributeName on [njp].[AttributeList] ( [personID] , [attributeName] ); update statistics [njp].[AttributeList] idx_unique_personID_attributeName with fullscan; */ go /* -- drop index [njp].[AttributeList].idx_unique_pkAttributeListID create unique nonclustered index idx_unique_pkAttributeListID on [njp].[AttributeList] ( [pkAttributeListID] ) */ go /* alter table [njp].[AttributeList] drop constraint [PK_AttributeList] alter table [njp].[AttributeList] add constraint PK_AttributeList primary key ( [personID] , [attributeName] ); */ /* -- Check for unique constraints select object_name(id) as objectName , object_name(constid) as constraintName , tblC.name as columnName , tblSC.colid as columnID from sys.sysconstraints tblSC inner join sys.columns tblC on tblSC.id = tblC.object_id and tblSC.colid = tblC.column_id where tblSC.id = object_id('[njp].[AttributeList]') and tblSC.constid = object_id('constraint_AttributeList_Unique_personID_attributeName') */ /* select * from sys.check_constraints select * from sys.default_constraints */ /* --Drop Unique Constraint, if is already exists if exists ( select * from sys.indexes tblSI where tblSI.is_unique_constraint = 1 and tblSI.object_id = object_id('[njp].[AttributeList]') and tblSI.name = 'constraint_AttributeList_Unique_personID_attributeName' ) begin alter table [njp].[AttributeList] drop constraint constraint_AttributeList_Unique_personID_attributeName end alter table [njp].[AttributeList] add constraint constraint_AttributeList_Unique_personID_attributeName unique ( [personID] , [attributeName] ); */ go
And, did a self join
declare @personID bigint declare @attributeName nvarchar(50) declare @attributeNameAlt nvarchar(50) declare @attributeNameAlt2 nvarchar(50) set @attributeName = 'BirthMonth' set @attributeNameAlt = 'favoriteColor' set @attributeNameAlt2 = 'BirthCity' set @personID = 1 select tblAL.personID , tblAL.attributeName , tblAL.attributeValue , tblALAlt.attributeName , tblALAlt.attributeValue from [njp].[AttributeList] tblAL left outer JOIN [njp].[AttributeList] tblALAlt ON tblAL.personID = tblALAlt.personID AND tblALAlt.attributeName = @attributeNameAlt where tblAL.personID = @personID and tblAL.attributeName = @attributeName ;
Upon returning to our criteria, we can see that our join columns ( personID and attributeName) does not satisfy our uniqueness criteria:
- The Primary Key is our identity column ( pkAttributeListID). This ensures that we have an ever increasing index. The good is that we will not incur split pages; but we will have hotspots at the shared insertion page
- We have an index on personID and attributeName
- We do not have unique constraints
And, so we are hoping the system will hint us with “No Join Predicate” warning.
But, no it is giving us a pass.
Cost Based
Let us see whether we can trip the warning by introducing a bit of cost. There are a couple of ways to do so:
- Add data
- Play game with the statistics
Statistics Tweaking
Let us play games with our statistics
Set hard-coded Statistics
Syntax:
update statistics [njp].[AttributeList] with rowcount = <rowcount>, pagecount = <page-count>
Sample:
In the example below, we wll set the Number of rows to 1000000
update statistics [njp].[AttributeList] with rowcount = 1000000;
Query Plan
Once we have statistics, we have a far more helpful query plan.
Hints
Here are our hints:
- No Join Predicate
- No Stats
No Join Predicates
No Stats
- TableCardinality :- 1000000
- The table cardinality of 1 million is obviously due to us faking a rowcount of 1 million records
- Warnings :- Columns With No Statistics: [DBLAB].[njp].[AttributeList].personID, [DBLAB].[njp].[AttributeList].attributeName
- The columns with no statistics is indicative of the fact that we have table level statistics, but we have no way of creating column level data stats
Check Statistics
DBCC SHOW_STATISTICS ('[njp].[AttributeList]', 'idx_personID_attributeName')
Output:
Stats:
Here are the Stats through GUI:
We can also use the GUI by transversing Database \ Tables \ <Table name> \ Statistics \ <Index-name>
Steps
Output
Either path we take, issuing SQL statement or via GUI, we have no stats.
Take Care of Hints
Let us go address our helpful hints.
No Join Predicate
As we discussed on a previous post, we need to ensure that our joins result in a single row being returned.
We also offered up primary key or unique indexes. A unique constraint is a first cousin of unique index and we will use it.
--Drop Unique Constraint, if is already exists if exists ( select * from sys.indexes tblSI where tblSI.is_unique_constraint = 1 and tblSI.object_id = object_id('[njp].[AttributeList]') and tblSI.name = 'constraint_AttributeList_Unique_personID_attributeName' ) begin alter table [njp].[AttributeList] drop constraint constraint_AttributeList_Unique_personID_attributeName end alter table [njp].[AttributeList] add constraint constraint_AttributeList_Unique_personID_attributeName unique ( [personID] , [attributeName] );
Btw, I prefer to check if an object exists, before attempting to create it. That is the basis for the if exists code-line above.
And, as always say our blind spots are not obvious outside of getting out a pen and writing it out.
To check for unique constraint, my original thoughts were geared towards checking the constraints table
-- Check for unique constraints select object_name(id) as objectName , object_name(constid) as constraintName , tblC.name as columnName , tblSC.colid as columnID from sys.sysconstraints tblSC inner join sys.columns tblC on tblSC.id = tblC.object_id and tblSC.colid = tblC.column_id where tblSC.id = object_id('[njp].[AttributeList]') and tblSC.constid = object_id('constraint_AttributeList_Unique_personID_attributeName')
But, sys.sysconstrainsts does not expose unique constraints, just default and check constraints.
Also, keep in mind that we have sys.check_constraints and sys.default_constraints. But, no sys.unique_constraints.
No Stats
Once we create the unique constraint ( index ), we were able to rid ourselves of the corresponding missing stats warning, as well.
But, keep in mind, we will still without detailed column stats, until we add data rows.
Once we add data rows, we can now see stats:
Source Control
GitHub
The Transact SQL is available GitHub @ https://github.com/DanielAdeniji/TransactSQLWarningNoJoinPredicate .
Additional Reading
As I prepared this post, Goggled on this and that.
And, was lucky enough to find this gem…
More Common Mistakes Java Developers Make when Writing SQL
http://blog.jooq.org/2013/08/12/10-more-common-mistakes-java-developers-make-when-writing-sql/
Item 8 – Not using row value expressions where they are supported
SELECT c.first_name, c.last_name, a.street
FROM customer c
JOIN address a
ON (c.id, c.tenant_id) = (a.id, a.tenant_id)
Unfortunately, not all databases support row value expressions in the same way. But the SQL standard had defined them already in 1992, and if you use them, sophisticated databases like Oracle or Postgres can use them for calculating better execution plans.
Dedicated
I will be remiss if I do not dedicate this post to Microsoft – Tech Support, Research, and SQL Server. The “No Join predicates” is very useful tooling.
Yes, I know that often I use this forum to throw things at MSFT. And, anyone sympathetic to their cause will and should borrow from Chris Brown (CB) and say
“Miss me with that BS”.
But, this is house business. Though we take it to the streets ever so succinctly; it doesn’t stay that way.
Summary:
We need declarative tooling to establish row level uniqueness. Primary Keys, Unique Indexes or Constraints should suffice.
But, it seems we also might need actual data or faked number of row statistics to trigger “No Join Predicate” warnings.