Friday, December 11, 2009

Keys, Constraints, Indexes

A constraint is a property assigned to a column or a set of columns in a table that prevents certain types of inconsistent data values from being placed in the column(s). Constraints are used to enforce data integrity, which ensures the accuracy and reliability of the data in the database. The following categories of data integrity exist:
Entity Integrity ensures that there are no duplicate rows in a table.
Domain Integrity enforces valid entries for a given column by restricting the type, the format, or the range of possible values.
Referential Integrity ensures that rows that are referenced by other records cannot be deleted (in other words, corresponding data values between tables stay consistent).
User-Defined Integrity enforces some specific business rules that do not fall into entity, domain, or referential integrity categories.
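
As a rough sketch (the table and column names below are hypothetical), the first three categories map naturally onto different constraint types, while user-defined integrity is usually handled with triggers or stored procedures:

CREATE TABLE OrderItem (
OrderItemId INT NOT NULL PRIMARY KEY,                    -- entity integrity: no duplicate rows
Quantity INT NOT NULL CHECK (Quantity > 0),              -- domain integrity: restrict the range of values
ProductId INT NOT NULL REFERENCES Product (ProductId)    -- referential integrity: assumes a Product table already exists
)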

A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table should have a primary key constraint to uniquely identify each row, and only one primary key constraint can be created per table. Primary key constraints are used to enforce entity integrity.

CREATE TABLE employee(
EmployeeId INT NOT NULL CONSTRAINT pk_employee PRIMARY KEY,
LName VARCHAR(30) NOT NULL,
FName VARCHAR(30) NOT NULL,
Address VARCHAR(100) NOT NULL,
HireDate DATETIME NOT NULL,
Salary MONEY NOT NULL CONSTRAINT check_salary CHECK (Salary > 0)
)
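
A couple of quick test inserts (hypothetical values) show these constraints at work:

-- Succeeds: all constraints are satisfied
INSERT INTO employee (EmployeeId, LName, FName, Address, HireDate, Salary)
VALUES (1, 'Smith', 'John', '1 Main St', '2009-01-15', 50000)

-- Fails: violates the CHECK constraint (salary must be greater than 0)
INSERT INTO employee (EmployeeId, LName, FName, Address, HireDate, Salary)
VALUES (2, 'Jones', 'Mary', '2 Oak Ave', '2009-02-01', -100)

-- Fails: violates the PRIMARY KEY constraint (EmployeeId 1 already exists)
INSERT INTO employee (EmployeeId, LName, FName, Address, HireDate, Salary)
VALUES (1, 'Brown', 'Ann', '3 Elm Rd', '2009-03-01', 40000)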
Constraint Enhancements

You may want to control what happens when you attempt to update or delete a key to which existing foreign keys point. You can do this by using the ON DELETE and ON UPDATE clauses in the REFERENCES clause of the CREATE TABLE and ALTER TABLE statements. For example, in previous versions of SQL Server, if you wanted to cascade a delete to the referencing (child) table when the corresponding row in the parent table was deleted, you had to create a trigger that executed on delete of the parent table; now you can simply specify the ON DELETE clause in the REFERENCES clause.
CREATE TABLE Books (
BookID INT NOT NULL PRIMARY KEY,
AuthorID INT NOT NULL,
BookName VARCHAR(100) NOT NULL,
Price MONEY NOT NULL
)
GO

CREATE TABLE Authors (
AuthorID INT NOT NULL PRIMARY KEY,
Name VARCHAR(100) NOT NULL
)
GO

ALTER TABLE Books
ADD CONSTRAINT fk_author
FOREIGN KEY (AuthorID)
REFERENCES Authors (AuthorID) ON DELETE CASCADE
GO
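
To see the cascade in action, a quick test with made-up data might look like this:

INSERT INTO Authors (AuthorID, Name) VALUES (1, 'Some Author')
INSERT INTO Books (BookID, AuthorID, BookName, Price) VALUES (10, 1, 'Some Book', 19.99)
GO

-- Deleting the parent row also deletes the dependent row in Books
DELETE FROM Authors WHERE AuthorID = 1

SELECT * FROM Books WHERE AuthorID = 1   -- returns no rows
GO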

Constraints are the built-in mechanism for enforcing data integrity. Using constraints is preferable to using triggers, rules, and defaults, because the built-in integrity features incur much less overhead and perform faster than code you write yourself. Hand-written code that duplicates what a constraint can do is also more error-prone, so constraints are not only faster but also more consistent and reliable. Use triggers and rules only when constraints do not provide all the functionality you need.

Candidate Key - A Candidate Key is any column or combination of columns that can qualify as a unique key in the database. There can be multiple Candidate Keys in one table, and each Candidate Key can qualify as the Primary Key.
Primary Key - A Primary Key is a column or combination of columns that uniquely identifies a record. Only one Candidate Key can become the Primary Key.



Select a key that does not contain NULL
It may be possible that some Candidate Keys currently contain no NULL values but are technically allowed to contain NULL. In that case they do not qualify for the Primary Key. For example, even though a column such as [name] may not contain any NULL value today, it does not qualify because it has the potential to contain a NULL value in the future (see the sketch below).
A table can have multiple Candidate Keys, each unique either as a single column or as a combination of columns. They are all candidates for the Primary Key. Candidate Keys that follow all three rules - 1) Not Null, 2) Unique Value in Table, and 3) Static - are the best candidates for the Primary Key. If multiple Candidate Keys satisfy these criteria, the narrowest and most stable one is usually the best choice.
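
For example (the table below is hypothetical), a nullable column can never be the Primary Key, even if it happens to contain no NULL values today:

CREATE TABLE Person (
PersonId INT NOT NULL,     -- NOT NULL and unique: a valid candidate for the Primary Key
SSN CHAR(11) NOT NULL,     -- also NOT NULL and unique: another Candidate Key
Name VARCHAR(50) NULL      -- allows NULL, so it cannot qualify as the Primary Key
)

Here PersonId and SSN are both Candidate Keys; PersonId, being narrower and static, would normally be chosen as the Primary Key.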

When you use the multiple-column constraint format, you can create a composite key. A composite key specifies multiple columns for a primary-key or foreign-key constraint.
The next example creates two tables. The first table has a composite key that acts as a primary key, and the second table has a composite key that acts as a foreign key.
CREATE TABLE accounts (
acc_num INTEGER,
acc_type INTEGER,
acc_descr CHAR(20),
PRIMARY KEY (acc_num, acc_type))

CREATE TABLE sub_accounts (
sub_acc INTEGER PRIMARY KEY,
ref_num INTEGER NOT NULL,
ref_type INTEGER NOT NULL,
sub_descr CHAR(20),
FOREIGN KEY (ref_num, ref_type)
REFERENCES accounts (acc_num, acc_type))
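
A pair of sample inserts (made-up values) shows how the child row must supply both columns of the parent's composite key:

INSERT INTO accounts (acc_num, acc_type, acc_descr) VALUES (100, 1, 'Checking')

INSERT INTO sub_accounts (sub_acc, ref_num, ref_type, sub_descr)
VALUES (5001, 100, 1, 'Payroll')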
Clustered Index & Nonclustered Index
Let's assume we have a table, Table_Name, whose Primary_Key column is a monotonically increasing integer and whose other three columns (Customer_No, Customer_Name, Customer_Address) are VARCHAR columns. Let's also assume that there is a clustered index on the Primary_Key column, and that there are no nonclustered indexes on the table.
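
For reference, a table matching these assumptions might be declared like this (the definition below is assumed; only the column roles are stated above):

CREATE TABLE Table_Name (
Primary_Key INT IDENTITY(1,1) NOT NULL,
Customer_No VARCHAR(20) NOT NULL,
Customer_Name VARCHAR(100) NOT NULL,
Customer_Address VARCHAR(200) NOT NULL,
CONSTRAINT PK_Table_Name PRIMARY KEY CLUSTERED (Primary_Key)
)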

Now, let's assume we run the following query:

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Primary_Key = 1001

When the query optimizer analyzes this query, it will know that the Primary_Key column is the key of the clustered index, so the clustered index will (most likely) be used to locate the row with Primary_Key 1001 very quickly using a clustered index seek. In addition, because the leaf level of a clustered index contains the full data rows, the values of the other three columns are immediately available and can be returned as soon as the row is found. Keep in mind that all of the table's columns are stored at the leaf level of the clustered index, so SQL Server, in this example, does not have to look anywhere else to find the data to be returned.

Nonclustered Index
Now, let's take a look at the following query:

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Customer_No = 'ABC123'

In this case, when the query optimizer analyzes this query, there is no index on the Customer_No column. Because of this, SQL Server will perform a clustered index scan to find the designated record. Since this column is neither indexed nor unique, every row in the table has to be scanned until the matching record is found. Once it is found, the rest of the record is immediately available (because the row's data is stored at the leaf level of the clustered index) and the data is returned.

If we need to perform the above query often, it would be to our advantage to add a nonclustered index to the Customer_No column. This way, when we run a query like the one above, instead of performing a time-consuming clustered index scan, SQL Server can use the nonclustered index on Customer_No to perform an index seek, which is much faster than a clustered index scan. But is this all we need to do to optimize the above query? Actually, we have forgotten something. If we run the query (let's look at it again below), something else has to happen before our data is returned.

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Customer_No = 'ABC123'

When this query runs, it will use the nonclustered index on the Customer_No column to quickly identify the matching row using an index seek. But unlike a clustered index, a nonclustered index contains only the data stored in the index, which in this case is the Customer_No column (plus the clustered index key, which it uses to locate the full row). Our query, however, wants to return three columns: Customer_No, Customer_Name, and Customer_Address, not just Customer_No. Because of this, SQL Server has to perform another step before it can return our data: once it has located the correct row in the nonclustered index, it must look up the values of the other two columns from the clustered index, where that data is stored. This extra step is known as a key (bookmark) lookup.
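
A minimal sketch of that index (the index name is made up):

-- Narrow nonclustered index: only Customer_No is in the index key,
-- so returning Customer_Name and Customer_Address still requires a
-- key lookup into the clustered index for each matching row.
CREATE NONCLUSTERED INDEX IX_Table_Name_Customer_No
ON Table_Name (Customer_No)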
One way to avoid this extra lookup is to create a covering index: a nonclustered index whose key contains every column the query references. Covering indexes, if used judiciously, can speed up many types of commonly run queries. But covering indexes have some limitations. For example, they are limited to a maximum of 16 key columns; the index key has a 900-byte maximum size; certain data types cannot be included in them; and adding more columns to an index makes the index wider, which in turn requires more disk I/O to read and write the rows, potentially hurting performance, especially if the table is subject to many INSERTs, UPDATEs, and DELETEs.
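
A covering index for our example query would put all three columns in the index key (again, the index name is assumed):

-- Covering index: every column the query references is a key column,
-- so the query can be answered entirely from the index with no lookup.
CREATE NONCLUSTERED INDEX IX_Table_Name_Covering
ON Table_Name (Customer_No, Customer_Name, Customer_Address)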

"nonkey column nonclustered indexes." This new feature is a variation of the standard covering index we have been talking about up to this point, but with some advantages over covering indexes.

Let's go back to our example query.

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Customer_No = 'ABC123'

Let's assume, as before, that there is a clustered index on Primary_Key. Now, instead of creating a covering index as described above, we can create a nonclustered index on Customer_No and, rather than adding Customer_Name and Customer_Address as additional key columns, add them as nonkey (included) columns of the index.

There are several benefits to using a nonclustered index with nonkey columns instead of a covering index. The benefits include:

All data types are supported, except text, ntext, and image. So you have more datatype options than a covering index.
The maximum number of columns is 1,023, not 16 as with covering indexes.
Because the actual index key is narrower, the index is more efficient and can offer better performance than a covering index in which all of the columns are part of the key.
Yes, these are small differences, but lots of small differences add up and can help boost the performance of your SQL Server. In fact, you may want to review your existing covering indexes and change them to nonclustered indexes with nonkey columns to see whether your query performance increases. Depending on the circumstances, performance could indeed increase.

Nonkey columns are created using the INCLUDE clause of the CREATE INDEX statement. For example:

CREATE INDEX IX_Table ON [Table] (Key_Index_Column) INCLUDE (Column1, Column2, Column3)
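
Applied to the running example, that might look like this (index name assumed):

-- Customer_No is the only key column; Customer_Name and Customer_Address
-- are stored only at the leaf level of the index, keeping the key narrow
-- while still covering the example query.
CREATE NONCLUSTERED INDEX IX_Table_Name_Customer_No_Included
ON Table_Name (Customer_No)
INCLUDE (Customer_Name, Customer_Address)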

However, there are also some drawbacks to consider:
More space is required to store indexes with nonkey columns, because nonkey column data is stored at both the leaf level of the index and in the table itself.
Larger indexes mean fewer rows can fit on a page, potentially increasing disk I/O.
Index maintenance is increased for data modifications, potentially hurting performance if nonkey columns are large and the database experiences a high level of data modifications.

Copyright © 2009 Angel