Single Table Queries (DML)
Single-table queries
Elements of the SELECT statement –20
- The FROM clause –18 underlined
- Delimiting identifier names
- The WHERE clause
- The GROUP BY clause
- The HAVING clause
- The SELECT clause
- The ORDER BY clause
- The TOP and OFFSET-FETCH filters
- The TOP filter
- The OFFSET-FETCH filter
Predicates and operators
CASE expressions
NULLs
All-at-one operations
Working with character data
- Data types
- Collation
- String operators and functions
- String concatenation (+ sign operator and CONCAT function)
- The SUBSTRING function
- The LEFT and RIGHT functions
- The LEN and DATALENGTH functions
- The CHARINDEX function
- The PATINDEX function
- The REPLACE function
- The REPLICATE function
- The STUFF function
- The UPPER and LOWER functions
- The RTRIM and LTRIM functions
- The FORMAT function
- The COMPRESS and DECOMPRESS functions
- The STRING_SPLIT function
- The LIKE predicate
- The % (percent) wildcard
- The _(underscore) wildcard
- The [<list of characters>] wildcard
- The [<character>-<character>] wildcard
- The [^<character list or range>] wildcard
- The ESCAPE character
Working with Date and Time data
- Date and time data types
- Literals
- Working with date and time separately
- Filtering date ranges
- Date and time functions
- Current data and time
- The CAST, CONVERT, and PARSE functions and their TRY_ counterparts
- The SWITCHOFFSET function
- The TODATETIMEOFFSET function
- The AT TIME ZONE function
- The DATEADD function
- The DATEDIFF and DATEDIFF_BIG functions
- The DATEPART function
- The YEAR, MONTH, and DAY functions
- The DATENAME function
- The ISDATE function
- The FROMPARTS functions
- The EOMONTH function
Querying metadata
- Catalog views
- Information schema views
- System stored procedures and function
(INCLUDE DIAGRAMS FOR ALL SYNTAX FOR THE CLAUSES)
Elements of the SELECT statement
The purpose of the SELECT statement is to query database tables, apply logical manipulation to the data, and result a result set.
SQL code is processed differently than most programming languages, where code is processed in the order they were written. In SQL, the clauses are logically processed in the following order:
- FROM
- WHERE
- GROUP BY
- HAVING
- SELECT
- ORDER BY
We cannot write SQL code in the correct logical order. Most SQL code begins with the SELECT clause because the designers of SQL intended SQL to be a declarative language where the syntax resembles the English language.
(Note that the use of the semicolon in SQL code is mostly optional and only required in situations where the parsing of the code is ambiguous. However, it is good practice to end SQL statements with a semicolon because it improves readability and because not using the semicolon will be a deprecated feature.)
The FROM clause
The FROM clause is the query clause that is logically processed first, and this clause is used to specify the names of the tables we are querying, as well as the table operators that operate on those tables.
(Note that it is good practice to always schema-qualify objects in our SQL code). If the schema name is not explicitly specified, the RDBMS implicitly resolves it based on implicit name-resolution rules. This creates a minor cost in performance, and there’s also the risk of choosing a different object than what was intended.)
Here, we use the FROM clause to returns all rows from the SalesOrderHeader table in the Sales schema, selecting columns SalesOrderID, CustomerID, SalesPersonID, OrderDate, and Freight:
USE AdventureWorks2017; SELECT SalesOrderID, CustomerID, SalesPersonID, OrderDate, Freight FROM Sales.SalesOrderHeader;
Note the order of the rows in the result data set is not guaranteed.
Delimiting identifier names
Irregular identifiers are ones having embedded spaces, special characters, starts with a digit, or is a reserved keyword. We need to delimit irregular identifiers for those to be recognized by the RDBMS. Standard SQL delimits irregular identifiers is using double quotes. T-SQL accomplishes this by using the standard SQL method, as well as using square brackets. Delimiting identifiers is optional only when the identifiers comply with the rules for the format of regular identifiers (as outlined in SQL Server Books Online). In is recommended to use only regular identifiers and delimit identifiers only if they are irregular so that the code doesn’t get clutters.
The WHERE clause
The WHERE clause is used to apply a predicate or logical expression to filter rows returned from the FROM clause. Only rows for which the predicate evaluates to TRUE are returned in the result set.
Here, we filter out for rows in the Orders table to return only information for customer 71:
SELECT SalesOrderID, SalesPersonID, OrderDate, Freight FROM Sales.SalesOrderHeader WHERE CustomerID=71;
Without the WHERE clause, we would just have the FROM clause returning 830 records. The WHERE clause filters for records where customer ID is 71 evaluates to TRUE, returning 31 out of the 830 records.
Based on the what the predicate or filter expression is, SQL Server may use indexes to access the queried data. The WHERE clause can be significant to query performance, because using indexes enables SQL Server to sometimes retrieve the queried data with far less work when compared to full table scans. Query filters also reduced the network traffic that’s created by returning all possible rows to the caller and filtering on the client side.
Note that the WHERE clause only filters for records that evaluate to TRUE and discards all records that evaluate to FALSE or UNKNOWN.
The GROUP BY clause
Using the GROUP BY clause we can group the records of the returned by the previous logical processing phase(s). The element(s) by which the records will be grouped are specified in the GROUP BY clause. For example, the records in the result set of the following query are grouped by empid and YEAR(orderdate):
SELECT SalesPersonID, YEAR(OrderDate) AS orderyear, FROM Sales.SalesOrderHeader WHERE CustomerID=71 GROUP BY SalesPersonID,Year(OrderDate);
The WHERE clause itself returns 31 records, and when we apply grouping to that we return 16 records reflecting the unique combinations of empid and YEAR(orderdate) values.
All subsequent clauses after the GROUP BY (including HAVING, SELECT, and ORDER BY) must operate on groups rather than individual rows. Each group is represented by a single row in the result set. This means that all subsequent clauses after GROUP BY must return a scalar (single value) per group.
Attributes that not participate in the GROUP BY clause are allowed only as inputs to an aggregate function (COUNT, SUM, AVG, MIN, MAX). For example, the following query returns the total freight and number of orders per employee and order year:
SELECT SalesPersonID, YEAR(OrderDate) AS orderyear, SUM(Freight) AS totalfreight, COUNT(*) AS numorders FROM Sales.SalesOrderHeader WHERE CustomerID=71 GROUP BY SalesPersonID,YEAR(OrderDate);
Attributes that do not participate in a GROUP BY clause and are not an input for an aggregate function will return an error in the query because there’s no guarantee that the attribute will return a single value per group. For example, the following query will return an error message:
SELECT SalesPersonID, YEAR(OrderDate) AS orderyear, Freight), FROM Sales.SalesOrderHeader WHERE CustomerID=71 GROUP BY SalesPersonID,YEAR(OrderDate);
(Include the error message from page 56)
Aggregate functions ignore NULLs, with the exception of COUNT(*). The following is an example of using the DISTINCT option with an aggregate function:
SELECT SalesPersonID, YEAR(OrderDate) AS orderyear, COUNT(DISTINCT CustomerID) as numcusts FROM Sales.SalesOrderHeader GROUP BY SalesPersonID,YEAR(OrderDate);
Here, we return the number of distinct customers handled by each employee and each year.
The HAVING clause
The WHERE clause is a row filter, whereas the HAVING clause is a group filter. Only groups that evaluated to TRUE by the HAVING predicate are returned to the next logical query phase (if one exists). Any group that evaluates to FALSE or UNKNOWN will not be returned. Because the HAVING clause is processed after rows have been grouped, we can apply aggregate functions in conjunction with the HAVING clause. For example, the following query will return records grouped by employee and year with more than one row:
SELECT SalesPersonID, YEAR(OrderDate) AS orderyear, FROM Sales.SalesOrderHeader GROUP BY SalesPersonID,YEAR(OrderDate) HAVING COUNT(*)>1;
Recall from the previous query that we have 16 rows that were grouped by employee and year. Out of the 16, 9 records have more than 1 row.
The SELECT clause
The SELECT clause is used to specify the attributes (columns) to return in the result set of a query. If the SELECT clause specifies an attribute with no manipulation, the name of the target attribute retains the same name as the name of the source attribute. Optionally, we can add our own name to an attribute by aliasing them using the AS clause. Attributes that are manipulated (such as YEAR(orderdate)), or attributes not based on a source attribute (such as the SYSDATETIME function), won’t have a name unless they are aliased. While T-SQL querying allows for columns with no name to be returned, the relational model does not. It is good practice to alias attributes when applicable so that all attributes returned from a query have names.
In addition to aliasing using the AS clause, T-SQL also allows <alias>=<expression> (alias equals expression) and <expression> <alias> (expression space alias).
If there is no comma between two column names in the SELECT list, the RDBMS will assume the second name as the alias for the first column name. For example:
SELECT SalesOrderID OrderDate FROM Sales.SalesOrderHeader;
Recall the SELECT clause is processed after the FROM, WHERE, GROUP BY< and HAVING clauses. This means that column aliases in the SELECT do not exist for those logical query phases preceding it. For this reason, the RDBMS will return an error message is we refer to column aliases in any clause processed before the SELECT clause. For example:
SELECT SalesOrderID, Year(OrderDate) AS orderyear FROM Sales.SalesOrderHeader WHERE orderyear>2015;
This behavior is by design and is not a bug. A work-around to this problem is to repeat YEAR(orderyear) expression in the WHERE clause:
SELECT SalesOrderID, Year(OrderDate) AS orderyear FROM Sales.SalesOrderHeader WHERE YEAR(OrderDate)>2015;
Similarly, we would have an error if we refer to a column alias in the HAVING clause since HAVING is also processed before the SELECT clause. Like the previous example, the work-around to this problem is to repeat the expression COUNT(*) in the HAVING clause:
SELECT SalesPersonID, Year(OrderDate) AS orderyear, COUNT(*) AS numorders FROM Sales.SalesOrderHeader Where CustomerID=71 GROUP BY SalesPersonID, YEAR(orderdate) HAVING COUNT(*)>1;
In set theory of the relational model, there are no duplicates. While the relational model is based on mathematical set theory, SQL is based on the multiset theory, which allows for duplicate records. In SQL, it is not a requirement for tables to have a key, and therefore a table can have duplicate and not be relational. Even if a table has a key, a SELECT query against the table can still return duplicate rows. For example, the Orders table has a primary key defined on the orderid column, but querying against the Orders table returns duplicate records:
SELECT SalesPersonID, Year(OrderDate) AS orderyear, COUNT(*) AS numorders FROM Sales.SalesOrderHeader Where CustomerID=71;
Using the DISTINCT clause in SQL allows the removal of duplicate records and return a relational result set:
SELECT DISTINCT SalesPersonID, Year(OrderDate) AS orderyear, COUNT(*) AS numorders FROM Sales.SalesOrderHeader Where CustomerID=71;
The previous query returned 31 records, but after the removal of duplicates we now have 16 distinct records.
Using an asterisk (*) in the SELECT query returns all attributes of the table without explicitly listing them:
SELECT * FROM Sales.SalesOrderHeader;
Use of the asterisk is regarded as bad practice and it is recommended we explicitly list all attributes. Unlike the relational model, SQL keeps ordinal positions of the columns based on the order specified in the CREATE TABLE statement. Explicitly specifying attributes needed will always return the right attributes as long as they exist in the table. If a column specified in a query was dropped from the table, we will get an error message and fix accordingly. There is some extra work involved in resolving column names when the asterisk is used, but the cost is negligible in comparison to other costs involved in the processing of a query.
It is not allowed to refer column aliases created in the SELECT clause in other expressions within the same SELECT clause, even if the expression is to the right of the expression that created the alias. For example:
SELECT SalesOrderID, YEAR(OrderDate) AS orderyear, orderyear + 1 AS nextyear FROM Sales.SalesOrderHeader;
As explained earlier, the work-around to this problem is to repeat the expression:
SELECT SalesOrderID, Year(OrderDate) AS orderyear, YEAR(OrderDate) + 1 AS nextyear FROM Sales.SalesOrderHeader;
The ORDER BY clause
The ORDER BY clause is used to sort records of the result set for presentation purposes. The ORDER BY clause is the very last clause processed in the phases of logical query processing. The following query sorts records in the result set by employee ID and order year:
SELECT SalesPersonID, YEAR(OrderDate) AS orderyear, COUNT(*) AS numorders FROM Sales.SalesOrderHeader WHERE CustomerID=71 GROUP BY Sales.SalesPersonID,YEAR(OrderDate) HAVING COUNT(*)>1 ORDER BY SalesPersonID,orderyear;
In SQL, tables (whether existing in a database or is a result set) have no guaranteed order. A table of represents a set of rows (or multisets if contains duplicates), and sets have no order. Without the ORDER BY clause, the RDBMS will return records from a query in any unspecified order. The ORDER BY clause guarantees the ordering of the result set.
Note that by specifying an ORDER BY clause in a query, the result can no longer be regarded as a table because it is now ordered. Such a result is called a cursor in standard SQL. This is important because some SQL language elements and operations work with table results of queries and not with cursors.
Because the ORDER BY phase is processed after the SELECT phase, the ORDER BY clause is the only clause that can refer to column alias created in the SELECT clause.
The ASC keyword sorts records in ascending order, and is optional because by ascending is the default sorting of the ORDER BY clause. The DESC keyword is used to sort records in descending order.
In T-SQL it is possible to specify ordinal positions of columns in the ORDER BY clause, based on the order of the columns specified in the SELECT list. For example, the following query is an alternative to the previous query:
SELECT SalesPersonID, YEAR(OrderDate) AS orderyear, COUNT(*) AS numorders FROM Sales.SalesOrderHeader WHERE CustomerID=71 GROUP BY Sales.SalesPersonID,YEAR(OrderDate) HAVING COUNT(*)>1 ORDER BY 1,2;
This however is considered bad practice. In the relational model attributes don’t have ordinal positions and must be referenced by name. Also, when making revisions to the SELECT clause it is possible for us to forget the corresponding revisions in the ORDER BY clause. By explicitly specifying columns names, our code is safe from this kind of mistake.
With T-SQL it’s possible to ORDER BY records of a query by attributes not in the SELECT list, meaning we can sort by attributes we do not necessarily wish to return. The drawback to this is that we’re unable to check the sorting is correct by looking at the query result. For example, the following query sorts by hiredate while not returning the hiredate column:
(Include code and results from page 65)
However, when we use the DISTINCT clause in the query, the ORDER BY list is restricted only to elements that appear in the SELECT list. This is because when a DISTICT is specified, a single result row might represent multiple source rows, so it may be unclear which of the values in the multiple rows should be used for the ordering. For example:
(Include code and results from page 65)
In the Employees table, there are a total of nine employees – five from the United States and four from the United Kingdom. After omitting the invalid ORDER BY clause in the query, two rows are returned-one for each distinct country. Because in the source table each country appears in multiple rows, and each row has a different employee ID, the meaning of ORDER BY empid is not defined.
The TOP and OFFSET-FETCH BY filters
The TOP filter
TOP is a filter that is proprietary to T-SQL that is used to limit the number or percentage of records returned by a query. There are two parts that define the TOP clause: the number or percent of records to return, and the ordering of the result. For example, the following query returns the five most recent orders from the Orders table:
SELECT TOP(5) SalesOrderID, OrderDate, CustomerID, SalesPersonID FROM Sales.SalesOrderHeader ORDER BY OrderDate DESC;
Recall that ORDER BY is processed after the SELECT clause, including the DISTINCT option. The same applies to the TOP filter because it relies on the ORDER BY clause as part of its definition. This means that if DISTINCT is used with the SELECT clause, the TOP filter is processed after all duplicate records are discarded.
Using TOP with the PERCENT keyword, SQL Server can calculate the number of rows to return based on a percentage, rounded up. For example, the following query returns the top 1 percent of the most recent orders:
SELECT TOP(1) PERCENT SalesOrderID, OrderDate, CustomerID, SalesPersonID FROM Sales.SalesOrderHeader ORDER BY OrderDate DESC;
The Orders table has 830 rows, and the query returns 9 rows since that is one percent of 830. Also note that records in the result set is not unique because no primary or unique keys were defined in the orderdate column, and multiple orders have the same order date. In this case, the ordering of the rows with the same order date is undefined, making the query undeterministic. In the case of ties, there is no guarantee on what order the records will get returned, and SQL Server returns rows based on what record it can physically access first.
To make the query deterministic, we apply a tiebreaker to make the ORDER BY list unique. For example, we can add orderid DESC to the ORDER BY list so that the greater order ID is returned first:
SELECT TOP(5) SalesOrderID, OrderDate, CustomerID, SalesPersonID FROM Sales.SalesOrderHeader ORDER BY OrderDate DESC, SalesOrderID DESC;
It is also possible to return all ties along with the TOP n rows. For example, in addition to returning the TOP 5 rows we can also return all rows that have the same order date. We can accomplish this with the WITH TIES option:
SELECT TOP(5) WITH TIES SalesOrderID, OrderDate, CustomerID, SalesPersonID FROM Sales.SalesOrderHeader ORDER BY OrderDate DESC;
We now have eight rows returned. SQL Server returned the TOP(5) rows based on orderdate DESC, as well as rows having the same order date. The WITH TIES option makes the selection of rows deterministic, but the presentation order among rows with the same order date is undeterministic.
The OFFSET-FETCH filter
Unlike the TOP filter, OFFSET-FETCH is standard SQL, with added skipping capability. OFFSET-FETCH is considered an extension of the ORDER BY clause. The OFFSET clause specifies how many rows to skip, and the FETCH clause specifies how many rows to return after the skipped rows. For example:
SELECT SalesOrderID, OrderDate, CustomerID, SalesPersonID FROM Sales.SalesOrderHeader ORDER BY OrderDate,SalesOrderID OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY;
The query returns rows from the Orders table ordered by orderdate (least to most recent), with orderid as the tiebreaker. Based on the ordering, the OFFSET clause skips the first 50 rows and the FETCH clause returns the next 25 rows only.
There must be an ORDER BY clause in order to use OFFSET-FETCH. Also, in T-SQL the FETCH clause cannot be used without the OFFSET clause. If we don’t want to skip any rows but return several rows with FETCH, we can accomplish this by specifying OFFSET 0 ROWS. OFFSET without the FETCH clause is allowed. In this case, the query skips the number of rows specified by the OFFSET clause and return all remaining records.
ROW and ROWS specification in the OFFSET clause are interchangeable. For example, we can specify FETCH 1 ROW. The same principle applies to the OFFSET clause. Also if we’re not skipping any rows (OFFSET 0 ROWS), we can use the FIRST option instead of the NEXT option.
The following list presents the complete list of logical order processing for all clauses previously discussed:
- FROM
- WHERE
- GROUP BY
- HAVING
- SELECT
- Expressions
- DISTINCT
- ORDERBY
- TOP/OFFSET-FETCH
- FROM clause
- ON clause (i.e., the second part of the JOIN clause)
- OUTER clause (i.e., the first part of the JOIN clause)
- WHERE clause
- GROUP BY clause
- HAVING clause
- SELECT clause (and the first shall be last, or at least much later in the queue)
- DISTINCT clause (this was also in the SELECT clause)
- ORDER BY clause
- TOP clause
Predicates and operators
Recall that predicates expressions can evaluate to either TRUE, FALSE, or UNKNOWN. Predicates can be combined using logical operators like AND (conjunction of predicates) and OR (disjunction of predicates). We can also use other types of operators in our expression such as comparison operators.
Examples of predicates in T-SQL include IN, BETWEEN, and LIKE. The IN predicate is used to determine if a value or scalar expression is equal to one of the elements within the IN predicate. For example, the following query returns all orders where the order ID is equal to either 10248, 10249, or 10250:
SELECT SalesOrderID, OrderDate, SalesPersonID FROM Sales.SalesOrderHeader WHERE SalesOrderID IN(10248,10249,10250);
The BETWEEN predicate is used to determine whether a value is within a specified range, including the two specified delimiters. For example, the following query returns all orders between the inclusive range 10300 and 10310:
SELECT SalesOrderID, OrderDate, SalesPersonID FROM Sales.SalesOrderHeader WHERE SalesOrderID BETWEEN 10300 and 10310;
The LIKE predicate determines whether a character string value meets a specified pattern. For example, the following query returns all employees with last names starting with the letter D:
SELECT BusinessEntityID, FirstName, LastName FROM HumanResources.vEmployee WHERE LastName LIKE N'D%';
(Include code and results from page 72. Mention that the letter N is prefixed before the string to denote the string is of Unicode data type).
T-SQL supports the following comparison operators: =,>,<,>=,<=,<>,!=,!>,!<. Of these, the last three comparison operators are not standard. It is good practice to avoid nonstandard operators when standard operators are available (such as <> instead of !=). The following query returns all rows placed on or after January 1, 2016:
SELECT SalesOrderID, OrderDate, SalesPersonID FROM Sales.SalesOrderHeader WHERE OrderDate>='20160101';
The logical operators OR and AND can be used to combine logical expressions. For example, the following query returns all rows placed on or after January 1, 2016 that were handled by an employee whose ID is 1,3,or 5:
SELECT SalesOrderID, OrderDate, SalesPersonID FROM Sales.SalesOrderHeader WHERE OrderDate>='20160101' AND SalesPersonID IN(1,3,5);
T-SQL also supports the arithmetic operators +,-,*,/,and % (modulo). For example, the following query calculates the net values by applying arithmetic operations to the quantity, unitprice, and discount attributes:
(Include code and results from page 73)
(Complete the highlighted notes from page 73 and 74)
CASE expressions
A CASE expression is a SQL-standard scalar expression that returns a value based on conditional logic. CASE is not a statement (take actions such as controlling the flow of our code), but rather an expression that returns a value. Because CASE is a scalar expression, it is allowed wherever scalar expressions are allowed (SELECT, WHERE, HAVING, ORDER BY clauses, and CHECK constraints).
CASE expressions have two forms: simple and searched. Simple CASE expressions compare one value or scalar expressions with a list of possible values, returning the value for the first match. If the tested value does not equal to any element of the listed values, the CASE expression returns the value in the ELSE clause (if one exists). If there is no ELSE clause, the CASE expression defaults to ELSE NULL. For example, in the following query the CASE expression in the SELECT clause produces a description for the categoryid column in the Production.ProductCategory table:
SELECT Name, ProductCategoryID, CASE ProductCategoryID WHEN 1 THEN 'Beverages' WHEN 2 THEN 'Condiments' WHEN 3 THEN 'Confections' WHEN 4 THEN 'Dairy Products' WHEN 5 THEN 'Grains/Cereals' WHEN 6 THEN 'Meat/Poultry' WHEN 7 THEN 'Produce' WHEN 8 THEN 'Seafood' ELSE 'Unknown Category' END AS categoryname FROM Production.ProductCategory;
A simple CASE form has a single test value or expression right after the CASE keyword that is compared with a list of possible values in the WHEN clauses.
In the searched CASE form, we can specify predicates in the WHEN clauses, rather than using equality comparisons as in the simple CASE form. If the first WHEN predicate evaluates to TRUE, the searched CASE expression returns the value in the first THEN clause associated with the first predicate. If none of the WHEN clauses evaluate to TRUE, the CASE expression returns the value that appears in the ELSE clause, or NULL no ELSE is specified. For example, the following query produces a value category description column that is based on the size of the values:
(Include code and results from page 76)
While every simple case expression can be converted to searched case expressions, the reverse is not possible.
T-SQL supports some functions that are abbreviations of the CASE expression, including ISNULL, COALESCE, IIF, and CHOOSE. Out of these four, only COALESCE is ANSI standard.
- ISNULL: Accepts two arguments as input and returns the first non-NULL argument. If both arguments are NULL, then NULL is returned.
- COALESCE: Like ISNULL but supports more than two arguments, returning the first non-NULL argument, or NULL if all arguments are NULL.
- IIF: IIF(<logical_expression>, argument1, argument2) returns argument1 if <logical_expression> evaluates to TRUE and returns argument2 if <logical_expression> is not TRUE.
- CHOOSE: CHOOSE(<index>, <expr1>, <expr2>,…<expr_n>) returns the expression in the list matching the specified index. The CHOOSE function becomes dynamic when it relies on user input.
NULLs
SQL and T-SQL utilizes three-valued predicate logic where predicates can evaluate to TRUE, FALSE, or UNKNOWN. NULLs represent missing values. Non-NULL values evaluate to TRUE or FALSE, whereas NULL values evaluate to UNKNOWN. For query filters (like WHERE or HAVING clauses), SQL accepts records that are TRUE, while both FALSE and UNKNOWN records are rejected. For CHECK constraints, SQL “rejects FALSE”, meaning records that evaluate to either TRUE or UNKNOWN are accepted. With three-valued predicate logic, “accept TRUE” rejects UNKNOWNs, while “reject FALSE” accepts UNKNOWNs. In other words, all records that are NULL in query filters will get discarded because they are not evaluated to TRUE but rather UNKNOWN. However, since CHECK constraints just rejects FALSE, NULLs are accepted because they are not FALSE but rather UNKNOWN.
Because NULLs represent missing values, SQL can’t determine if one missing value is equal to another. Therefore, SQL provides us with the IS NULL and IS NOT NULL predicates, which should be used instead of = NULL and <> NULL respectively.
For example, the Sales.Customer table has three attributes storing customer location information: country, region, and city. Some values in the region attribute is missing but applicable. The following query returns all customers where the region is equal to WA:
(Include code, explanation, and result from page 78):
Out of the 91 records in the table, only three records evaluate to TRUE and are returned in the result set. All records that are missing (UNKNOWN) or different from WA (FALSE) are discarded. The following query returns all records where the region is different from WA:
(Include code, explanation, and result from page 79):
Because a query filter only accepts records that evaluate to TRUE, the above query did not return any rows where the region is equal to WA or NULL. The same result would be returned if we used NOT(region =N’WA’) in our query.
It is not recommended to use “=NULL” to return records that are NULL, because the expression it returns records that evaluate to UNKNOW – whether present or missing. For example, the following query returns an empty set:
(Include code, explanation, and result from page 79):
We should, instead use the IS NULL predicate:
(Include code, explanation, and result from page 80):
If we wish to return all regions different from WA, including records that are NULL, we need to explicity include that in our query:
(Include code, explanation, and result from page 80):
For grouping and sorting purposes, all NULLs are considered equal. The GROUP BY clause will arrange all NULLs into a single group, and the ORDER BY clause sorts all NULLs together. ANSI SQL leave it to the RDBMS vendor to determine whether NULLs should be sorted before or after non-NULL records. T-SQL sorts NULLs before all non-NULL records or groups. While enforcing the UNIQUE constraint, standard SQL treats NULLs as different from each other, so to allow multiple NULLs. UNIQUE constraints in T-SQL treats all NULLs as equal, allowing only one NULL.
All-at-one operations
All-at-once operations means that all expressions appearing in the same logical query processing phase are logically evaluated at the same time. This is because all expressions in the same logical query processing phase are treated as a set, and by definition a set has no order to its elements.
Because of this, it is not possible to refer to a column alias by another expression in the same SELECT statement, even if the referencing expression comes after the expression that created the column alias:
SELECT SalesOrderID, YEAR(OrderDate) AS orderyear, orderyear + 1 as nextyear FROM Sales.SalesOrderHeader;
We get an error for this query because there is no order in the logical evaluations of the expressions in the SELECT clause – it is a set of expressions which are evaluated at the same time.
Suppose we have a table T1 with two integer columns col1 and col2. We wish to return all records where col2/col is greater than 2. Because there may be zeros in col1, we need to ensure division doesn’t take place for those records, which will otherwise produce a divide-by-zero error, so our query goes as follows:
SELECT col1,col2 FROM dbo.T1 WHERE col1<>0 AND col2/col1 > 2;
Because of the all-at-once concept, the RDBMS is free to evaluate the expressions in the WHERE predicate in any order. Typically, the order of evaluating is based on cost estimations. So it’s possible the query might fail because of a divide-by-zero error. The order in which a WHEN clause is guaranteed, so we can revise the above query as follows:
SELECT col1,col2 FROM dbo.T1 WHERE CASE WHEN col1=0 THENn 'no' WHEN col2/col1 > 2 THEN 'yes' ELSE 'no' END = 'yes';
Where the records in col1 is zero, the first WHEN clause will evaluate to TRUE, so the CASE expression will return ‘no.’ Only records where the first CASE WHEN expression evaluates to not TRUE (ie. the record is not 0), does the second WHEN clause check whether col2/col1>2 evaluates to true. If the second WHEN clause evaluate to TRUE then ‘yes’ is returned. In cases beyond the first two WHEN clauses, ‘no’ is returned. The predicate in the WHERE clause evaluates to TRUE only when the result in the CASE expression returns ‘yes’, meaning there will be no attempts to divide by 0 in this query.
Alternatively, we can employ a mathematical workaround to avoid division altogether:
SELECT col1, col2 FROM dbo.T1 WHERE (col1>0 AND col2>2*col1) OR (col1<0 AND col2<2*col1);
Always remember that SQL Server guarantees the processing order of the WHEN clause in a CASE expression.
Working with character data
Data types
There are two types of character data in SQL Server: regular (CHAR and VARCHAR) and Unicode (NCHAR and NVARCHAR). Regular characters use one byte of storage per character whereas Unicode requires 2 bytes per character. In cases where a surrogate pair is needed, 4 bytes per character is required.
Regular character types restrict a column to having only one language in addition to English. Multiple languages are supported with the Unicode character types.
We simply use single quotes to express regular character literals (ex. ‘Test’), whereas Unicode character literals require we specify N (National) as a prefix (ex. N‘Test’)
For data types without the VAR element (CHAR and NCHAR), the length is fixed and SQL Server preserves space in the row based on the column’s defined size and not on the actual number of characters in the string. These data types are ideal for write-focused systems. However, because storage consumption is not optimal with fixed-length data types, it is more costly to read data.
Data types having the VAR element (VARCHAR and NVARCHAR) have variable length, and the RDBMS utilizes as much storage space in the row as required to store the characters appearing in the string plus 2 extra bytes for offset data. For example, the maximum numbers of characters supported in a column defined as VARCHAR(25) is 25, but in actuality the number of characters in the string determines the storage. Because storage consumption for these character types is less than for fixed-length types, read operations are faster and less costly. However, updates may result in row-expansion, resulting in data moving outside the current page. Therefore, updates for these variable-length types are less efficient than for fixed-length data types. Instead of specifying the maximum number of characters for variable-length data types, we can also instead use the MAX specifier. When a column is defined with the MAX specifier, any value having a size up to a certain threshold (8000 bytes by default) can be stored inline the row. Any value stored having a size exceeding the threshold is stored outside the row as a LOB (Large Object).
Collation
(Complete this section)
String operators and functions
String concatenation
To concatenate strings, use the plus-sign (+) or the CONCAT function. For example, the following query concatenates the firstname, a space, and lastname from the Employees table:
(Include code, explanation, and results from page 86)
By default, concatenation with a NULL will return a NULL:
(Include code, explanation, and results from page 87)
To substitute a NULL with an empty string, use the COALESCE function. COALESCE accepts a list of arguments and returns the first value that is NULL. The following is a revision of the previous query, with NULLs substituted with empty strings:
(Include code, explanation, and results from page 88)
The CONCAT function accepts a list of inputs to be concatenated and automatically substitutes NULLs with empty strings. For example, the following query concatenates the customer’s location elements using the CONCAT function, replacing NULLs with empty strings:
(Include code, explanation, and results from page 88)
The SUBSTRING function
(Include syntax diagram)
The SUBSTRING function accepts an input string and extracts a substring of that starting at a specified start position for a specified length of the substring. For example, the following query returns ‘abc’:
SELECT SUBSTRING('abcde',1,3);
If the specified substring length exceeds the length of the input string, the SUBSTRING function returns everything until the end without raising an error message.
The LEFT and RIGHT functions
The LEFT and RIGHT functions are abbreviations of the SUBSTRING function. They return the specified number of characters from the left or right end of the inputted string.
(Include syntax diagram)
The first argument is the string we’re extracting the substring from. The next argument specifies the number of characters to return from the left or right end of the string. For example, the following code returns ‘cde.’
SELECT RIGHT('abcde',3);
The LEN and DATALENGTH functions
The LEN functions returns the number of characters in an inputted string.
(Include syntax diagram)
The LEN function returns the number of characters in an inputted string, not the number of bytes. Typically, both numbers are the same since each character requires 1 byte of storage. However, with Unicode characters each character requires at least 2 bytes of storage, so the number of characters is about half the number of bytes. The DATALENGTH function returns the actual number of bytes in an inputted string:
SELECT LEN(N'abcde');
SELECT DATALENGTH(N'abcde');
The CHARINDEX function
The CHARINDEX function returns the position of the first occurrence of a substring inside a string.
(Include syntax diagram)
CHARINDEX has an optional third argument that specifies the position from which to start looking. Without this optional third argument, by default CHARINDEX starts searching from the first character. If the substring is not found, CHARINDEX returns 0. For example, the following query returns the position of the first occurrence of ‘Itzik Ben-Gan’, returning 6:
SELECT CHARINDEX(' ','Moe Ahad');
The PATINDEX function
The PATINDEX function returns the position of the first occurrence of a pattern within a string.
(Include syntax diagram)
For example, the following code returns the position of the first occurrence of a number within a string, returning 5:
SELECT PATINDEX('%[0-9]%','abcd123efgh');
The REPLACE function
The REPLACE function replace all occurrences of a substring with another substring.
(Include syntax diagram)
The following query replaces all occurrences of a dash in the input string with a colon:
SELECT REPLACE('1-a 2-b','-',':');
The REPLACE function can also be used to count the number of occurrences of a character in a string. To accomplish this, we replace all occurrences of the character with an empty string, and from that calculate the length of the original string minus the length of the new string. For example, the following query returns the count of the letter e in the lastname attribute for each employee:
SELECT BusinessEntityID, LastName, LEN(LastName)-LEN(REPLACE(LastName,'e','')) AS numoccur FROM HumanResources.vEmployee;
The REPLICATE function
The REPLICATE function replicates a string a specified number of times.
(Include syntax diagram)
The following query replicates ‘abc’ three times, returning ‘abcabcabc’:
SELECT REPLICATE('abc',3);
The following query against the Purchasing.PurchaseOrderHeader table generates a 10-digit string from the vendorr ID by adding leading zeros. This is done using the REPLICATE and RIGHT functions, and also string concatenation.
SELECT VendorID, RIGHT(REPLICATE('0',0) + CAST(VendorID as VARCHAR(10)),10) AS strvendorid FROM Purchasing.PurchaseOrderHeader;
The CAST function converts the original vendor ID from an integer to a string (VARCHAR). The RIGHT functions takes the 10 rightmost characters of the result string.
Note that the FORMAT function can return the same result set as the above query, but at a higher cost.
The STUFF function
The STUFF function take an inputted string, and deletes a specified number of characters from that string starting at a specified character position. It then inserts a specified string at the specified start position. For example, the following code removes the ‘y’ in ‘xyz’, and replaces that with ‘abc’:
SELECT STUFF('xyz',2,1,'abc');
If we just want to insert a string without deleting anything, we just specify a length of 0 as the third argument.
The UPPER and LOWER functions
The UPPER function converts a string to uppercase, and the LOWER function converts a stringer to lowercase.
(Include syntax diagram)
For example, the following queries returns Moe Ahad in uppercase and lowercase:
SELECT UPPER('Moe Ahad');
SELECT LOWER('Moe Ahad');
The RTRIM and LTRIM functions
(Include syntax diagram)
LTRIM removes leading spaces and RTRIM removes trailing spaces in a string. To trim both sides of a string, we use the result of one function as an input for the other. For example, the following query will return ‘abc’:
SELECT RTRIM(LTRIM(' abc '));
The FORMAT function
The FORMAT function formats an input value as a string based on a Microsoft .NET format string and an optional culture specification.
(Include syntax diagram)
For example, the following code converts 1759 into a string with nine leading zeros:
SELECT FORMAT(1759,'000000000');
It is recommended to not use the FORMAT function when possible because of its expensive cost in terms of performance. The above code will have less performance penalty if we used the REPLICATE and RIGHT combination of functions instead of the FORMAT function.
The COMPRESS and DECOMPRESS functions
The COMPRESS and DECOMPRESS functions were introduced in SQL Server 2016, and they take an input and uses the GZIP algorithm to compress or decompress that input, respectively.
(Include syntax)
The COMPRESS function takes a character or binary string as input and returns a compressed VARBINARY(MAX) value type. For example, the following query uses the COMPRESS function with a string constant as input to return a binary value holding a compressed form of the input:
(Include code, explanation, and results from page 93)
To store the compressed for of an input value in a column of a table, we first apply the COMPRESS function to the input value and then store the result in the table. This can be done as part of the INSERT statement that would add rows to the target table.
Supposed we had a table called EmployeeCVs containing two columns: empid, and cv which holds the compressed form of the employee’s resume defined as VARBINARY(MAX). For this table we can use a stored procedure called AddEmpCV that accepts the parameters @empid and @cv (uncompressed form of the input employee’s resume as NVARCHAR(MAX)). This AddEmpCV stored procedure can be used to insert new rows into the EmployeeCVs table with the compressed employee resume information. The INSERT statement in the stored procedure would look like as follows:
(Include code, explanation, and results from page 94)
The DECOMPRESS function accepts a binary string as a binary string as input, to return a decompressed VARBINARY(MAX) value. Note that if the originally compressed input value was of a character string type, we need to explicitly cast the result of the DECOMPRESS query to the target type, otherwise it will return a binary value. For example, to return the uncompressed form of the employee resumes from the EmployeeCVs table, we use the following query that casts to resumes to the target character string type:
(Include code, explanation, and results for both from page 94)
The STRING_SPLIT function
The STRING_SPLIT function splits an input string representing a separated list of values into individual elements.
(Include syntax)
Unlike the previous string functions which return scalar values, the STRING_SPLIT function is a table function. The STRING_SPLIT function accepts an input string representing a separated list of values with a separator and returns individual elements stored in a column called val. To return a column with a data type other than string, we cast the val column to the target type. For example, the STRING_SPLIT function takes ‘101,102,103’ and the delimiter ‘,’ as arguments to return the individual elements in a column as integer values:
SELECT CAST(value AS INT) AS myvalue FROM STRING_SPLIT('10248,10249,10250',',') AS S;
The LIKE predicate
The LIKE predicate is used to check whether a character string matches a specified pattern. LIKE is a better alternative to functions like SUBSTRING and LEFT because it is optimizing the query better, especially when the pattern begins with a known prefix.
The following are wildcards supported by the LIKE predicate:
The % (percent wildcard)
The % represents a string of any size, including empty strings. For example, the following query returns records for all employees having a last name starting with D:
SELECT BusinessEntityID, LastName FROM HumanResources.vEmployee WHERE LastName LIKE N'D%';
The _ (underscore) wildcard
An underscore represents a single character. For example, the following query returns employee records where the second character in the last name is e:
SELECT BusinessEntityID, LastName FROM HumanResources.vEmployee WHERE LastName LIKE N'_e%';
The [<list of characters>] wildcard
Square brackets containing a list of characters return records matching one of the characters in the square bracket list. For example, the following query returns all records where employee last name starts with A, B, or C:
SELECT BusinessEntityID, LastName FROM HumanResources.vEmployee WHERE LastName LIKE N'[ABC]%';
The [<character>-<character>] wildcard
Squared brackets containing a range of characters return records that match one of the characters in the range. For example, the following query returns all records where the employee last name starts with a character in the range A-E.
SELECT BusinessEntityID, LastName FROM HumanResources.vEmployee WHERE LastName LIKE N'[A-E]%';
The [^<character list or range>] wildcard
Square brackets containing a ^ (caret sign) followed by a list or range of characters will return records that are NOT matching the specified list or range of characters. For example, the following query will return all records where the employee last name does not start in the range A-E:
SELECT BusinessEntityID, LastName FROM HumanResources.vEmployee WHERE LastName LIKE N'[^A-E]%';
The ESCAPE character
An escape character can be used to search for a character that are also used as wildcards (such as %, _, [,]). We specify a character we are sure is not in the data, followed by the character we’re searching for, and following that we specify the ESCAPE keyword and then the escape character again. For example, to search for an underscore in col1, we use col1 LIKE ‘%!_%’ESCAPE‘!’.
(Include an example for this (none in book))
Working with date and time data
Date and time data types
There are six date and time data types that are supported by T-SQL: DATETIME, SMALLDATETIME, DATE, TIME, DATETIME2, and DATETIMEOFFSET. DATETIME and SMALLDATETIME. DATETIME and SMALLDATETIME are legacy data types and include date and time components that are inseparable. The two types differ in their storage requirements, their supported date range, and their precision. The DATE and TIME data types do provide a separation of date and time components. DATETIME2 has a bigger date range and better precision than DATETIME and SMALLDATETIME. DATETIMEOFFSET is similar to DATETIME2, but it also provides offset from UTC.
The following table describes all the T-SQL date and time data types in more detail, including storage requirements, supported date ranges, level of precision, and recommended entry format:
The storage requirement for TIME, DATETIME2, and DATETIMEOFFSET is dependent on the specified precision. Fractional-second precision can be specified as an integer between 0 and 7. For instance, TIME(0)means 0 fractional-second, or one-second precision. TIME(3) means one-millisecond precision. If no fractional-second is specified, T-SQL by default sets the precision to 7. When we’re converting a value to a lower precision, the data type gets rounded to the closest expressible value in the target precision.
Literals
T-SQL does not provide a means to express a date and time literal. Instead, we can specify a literal of a different data type and then convert that (explicitly or implicitly) to a data and time data type. It is good practice to use character strings to show date and time values, as shown below:
SELECT SalesOrderID, CustomerID OrderDate, SalesPersonID, FROM Sales.SalesOrderHeader WHERE OrderDate>='20160212';
SQL Server recognizes ‘20160212’ as a character-string literal. Implicit conversion between data types is based on data-type precedence. Precedence among data types is pre-defined in SQL Server, and SQL Server implicitly converts a data type of lower precedence to one that is of higher precedence. In our example, character strings are of a lower data precedence than the column’s data type (DATETIME). For more information, consult the SQL Sever documentation on the topic here: https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-precedence-transact-sql?view=sql-server-2017
The following query is the same as our previous query, the only difference being the character string is explicitly converted to the DATE data type using the CAST function:
SELECT SalesOrderID, CustomerID OrderDate, SalesPersonID, FROM Sales.SalesOrderHeader WHERE OrderDate=CAST('20160212' AS DATE);
The effective language of our session configures several language-related settings on the backend. For example, the DATEFORMAT function is expressed as a combination of d, m, and y characters. The us_english of DATEFORMAT is set to mdy, whereas the British setting is dmy. Though not recommended, we can override the DATEFORMAT setting of our session using the SET DATEFORMAT command.
The LANGUAGE/DATEFORMAT setting defines how literals will be defined for DATETIME, SMALLDATETIME, DATE, DATETIME2, and DATETIMEOFFSET data types. For example, the string literal ‘02/12/2016’ can be interpreted as February 12, 2016 or December 2, 2016, with the determining factor being the effective LANGUAGE/DATEFORMAT setting:
SET LANGUAGE British; SELECT CAST('02/12/2016' AS DATE);
SET LANGUAGE us_english; SELECT CAST('02/12/2016' AS DATE);
Because it’s possible for our code to be read or used by international users with different login language settings, it is essential that we phrase our literals in a language-neutral manner. SQL Server interprets language-neutral literals the same way and is not affected by language-related settings like LANGUAGE/DATEFORMAT. To see examples of neutral literal formats, consult the documentation here: (Include table)
For all data types that include both a data and time component, SQL Server will assume midnight if the time component is not provided. Also, if no offset from UTC specified, SQL Server defaults to 00:00.
The formats ‘YYYY-MM-DD’ and ‘YYYY-MM-DD hh:mm…’ are language dependent when converted to DATETIME or SMALLDATETIME, and language-neutral when converted to DATE, DATETIME2, or DATETIMEOFFSET. For example, in the following code we have a literal with the format ‘YYYYMMDD’. The language setting here has no impact how the literal is interpreted when it’s converted to DATE:
SET LANGUAGE British; SELECT CAST('02/12/2016' AS DATE);
SET LANGUAGE us_english; SELECT CAST('02/12/2016' AS DATE);
To use formats that are language-dependent rather than language-neutral, we have two options available. We can use the CONVERT function to explicitly convert the character-string literal to the requested data type, with a third argument specifying a number representing the style we want to use. To learn more about the list of style numbers and their respective formats, consult Books Online here: https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-2017
For example, we can use style 101 as an argument to format the literal ‘02/12/2016’ as MM/DD/YYYY:
SELECT CONVERT(DATE,'02/12/2016',101);
We can specify style 103 to format ‘02/12/2016’ as DD/MM/YYYY, returning December 2, 2016:
SELECT CONVERT(DATE,'02/12/2016',103);
Our second option is to use the PARSE function. With PARSE we specify a value as a requested type and also specify the culture. For example, the following query is equivalent to using the CONVERT function with style 101:
SELECT PARSE('02/12/2016' AS DATE USING 'en-US');
The following is equivalent to using CONVERT with style 103:
SELECT PARSE('02/12/2016' AS DATE USING 'en-GB');
Note that the PARSE function is significantly more expensive than CONVERT in terms of performance, which is why it’s best not to use the PARSE function whenever possible.
Working with date and time separately
It’s recommended that use the DATE and TIME data types if we wish to work with date and time separately.
In the following query we have a column orderdate in the Sales.Orders2 table that is of DATETIME data type. The Sales.Orders2 table was created from the Sales.Order table, with the orderdate cast from DATE to DATETIME, as follows:
DROP TABLE IF EXISTS Sales.SalesOrderHeader2; SELECT SalesOrderID, CustomerID SalesPersonID, CAST(OrderDate AS DATETIME) AS orderdate INTO Sales.SalesOrderHeader2 FROM Sales.SalesOrderHeader;
Because only the date component of orderdate is relevant, all time values are set to midnight. To filter for orders from a certain date, we simply use the equality operator, like follows:
SELECT SalesOrderID, CustomerID SalesPersonID, orderdate FROM Sales.SalesOrderHeader2 WHERE orderdate='20160212';
By default, SQL Server converts a character string literal that only has a date to DATETIME, to have the time component set to midnight. Since all orders in the orderdate column has the time component to midnight, all orders for a queried date will get returned. To ensure all time components are set to midnight, we can use the CHECK constraint, as follows:
ALTER TABLE Sales.SalesOrderHeader2 ADD CONSTRAINT CHK_Orders2_orderdate CHECK(CONVERT(CHAR(12),orderdate,114)='00:00:00:000');
The CONVERT function extracts the time component of orderdate as a character string using style 114, and the CHECK constraint verifies the string represents midnight.
If the time component is stored as non-midnight values, we can use a range filter, like follows:
SELECT SalesOrderID, CustomerID SalesPersonID, OrderDate FROM Sales.SalesOrderHeader2 WHERE OrderDate>='20160212' AND OrderDate<'20160213';
When SQL Server is converting a character-string literal containing only the time component of DATETIME or SMALLDATETIME, it assumes the base date of January 1, 1900. The following is an example of this:
SELECT CAST('12:30:15.123' AS DATETIME);
To work with only date or only time in values of DATETIME or SMALLDATETIME data types, we need to apply manipulation to those values to “zero” the irrelevant parts. In other words, we set the time component to midnight if we want to only work with dates or set the date component to the base date (January 1, 1900) if we only want to work with the times.
We execute the following query for cleanup by dropping the Sales.SalesOrderHeader2 table:
DROP TABLE IF EXISTS Sales.SalesOrderHeader2;
Filtering date ranges
We can use functions like YEAR or MONTH to return a range of dates like whole year or whole month. For example, the following query, returns all orders placed in 2015:
SELECT SalesOrderID, CustomerID SalesPersonID, OrderDate FROM Sales.SalesOrderHeader WHERE YEAR(OrderDate)=2015;
Note that SQL Server cannot use an index efficiently when we apply manipulation on the filtered column. For this reason, it’s good practice to revise the filter predicate from the previous query as follows:
SELECT SalesOrderID, CustomerID SalesPersonID, OrderDate FROM Sales.SalesOrderHeader WHERE OrderDate>='20150101' AND OrderDate<'20160101';
Similarly, to filter for orders placed on a month, like this:
SELECT SalesOrderID, CustomerID SalesPersonID, OrderDate FROM Sales.SalesOrderHeader WHERE YEAR(OrderDate)=2016 AND MONTH(OrderDate)=2;
it’s better to use a range filter, like this:
SELECT SalesOrderID, CustomerID SalesPersonID, OrderDate FROM Sales.SalesOrderHeader WHERE OrderDate>='20160201' AND OrderDate<'20160301';
Date and time functions
GETDATE, CURRENT_TIMESTAMP, GETUTCDATE, SYSDATETIME, SYSUTCDATETIME, SYSDATETIMEOFFSET
All six of the above date and time functions are parameterless and return the current date and time in the system in which the SQL Server instance resides.
(Include table from page 105)
Except for CURRENT_TIMESTAMP, all of these parameterless functions must be specified with empty parentheses. While CURRENT_TIMESTAMP and GETDATE return the same value, GETDATE is ANSI standard and therefore the recommended function of the two. The following code demonstrates all six of the current date and time functions:
SELECT GETDATE() AS [GETDATE], CURRENT_TIMESTAMP AS [CURRENT_TIMESTAMP], GETUTCDATE() AS [GETUTCDATE], SYSDATETIME() AS [SYSDATETIME], SYSUTCDATETIME AS [SYSUTCDATETIME], SYSDATETIMEOFFSET() AS [SYSDATETIMEOFFSET];
To return just the current date or current time instead of current date and time, we simply need to use the CAST function to the requested data type (DATE or TIME), like follows:
SELECT CAST(SYSDATETIME() AS DATE) AS [current_date], CAST(SYSDATETIME() AS TIME) AS [current_time];
The CAST, CONVERT, and PARSE functions and their TRY_ counterparts
The CAST, CONVERT, and PARSE functions are used to convert an inputted value to a specified target data type, and return an error if the input isn’t convertible to the target type. These three functions have TRY_ counterparts: TRY_CAST, TRY_CONVERT, and TRY_PARSE, respectively. TRY_ functions accepts the same arguments as their counterparts and returns the same conversions, except that it returns NULL if the input is not convertible to the target data type.
(Include syntax from page 106)
CONVERT has an optional third argument that specifies the style of the conversion. The full list of style numbers and their formats can be found in Books Online: https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-2017
The PARSE function also has an optional argument that specifies the culture, for example ‘en-US’ for U.S. English or ‘en-GB’ for British English.
Note that CAST is ANSI-standard while CONVERT and PARSE are not. For this reason, it’s good practice to use CAST over the other two, unless we need to specify a style number or culture.
The following query coverts the current system and date time to DATE using the CAST function:
SELECT CAST(SYSDATETIME() AS DATE);
To work with just the time component of CURRENT_TIMESTAMP (DATETIME data type), we can use the CONVERT function to convert the current date and time value to CHAR(12) and style specified to 114 (‘hh:mm:ss:nn’) to “zero” out the date component:
SELECT CONVERT(CHAR(12), CURRENT_TIMESTAMP,114);
We the convert to DATETIME to get the current time on the base date:
SELECT CONVERT(DATETIME,CONVERT(CHAR(12),CURRENT_TIMESTAMP,114),114);
The following two queries converts the input string ‘02/12/2016’ to U.S. English and British English culture respectively.
SELECT PARSE('02/12/2016' AS DATETIME USING 'en-US');
SELECT PARSE('02/12/2016' AS DATETIME USING 'en-GB');
Note that PARSE is significantly more expensive in terms of performance than CONVERT. For this reason, CONVERT is recommended over PARSE.
The SWITCHOFFSET function
The SWITCHOFFSET function accepts the DATETIMEOFFSET value as an argument and adjusts that value to a specified target offset from UTC.
(Include syntax)
The following query adjusts the current DATETIMEOFFSET value to offset by -5:00 (input the offset-by value in quotes):
SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(),'-5:00');
To adjust the current DATETIMEOFFSET to UTC, set the offset-by value to +00.00, like follows:
SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(),'+00:00');
The TODATETIMEOFFSET function
The TODATETIMEOFFSET function accepts a local date and time value, and offsets that UTC by a specified value to return a DATETIMEOFFSET value:
(Include syntax)
TODATETIMEOFFSET is different from SWITCHOFFSET in that its first input is a local date and time value without an offset component. This function simply creates a new DATETIMEOFFSET value by merging the input date and time with a specified offset.
This function is typically used when migrating non-offset aware data to offset-aware data.
The AT TIME ZONE function
The AT TIME ZONE function converts a date and time value to a specified DATETIMEOFFSET value that corresponds to a specified time zone.
(Include syntax)
The input date and time value can be one of four data types: DATETIME, SMALLDATETIME, DATETIME2, and DATETIMEOFFSET. The specified time zone can be any time zone supported by Windows. The following query returns the supported time zones in the sys.time_zone_info column, along with their respective offset from UTC and whether it supports Daylight Savings Time (DST):
SELECT name, current_utc_offset, is_currently_dst FROM sys.time_zone_info;
The following two queries demonstrates an example of the AT TIME ZONE function:
SELECT CAST('20160212 12:00:00.0000000' AS DATETIME2) AT TIME ZONE 'Pacific Standard Time' AS val1, CAST('20160812 12:00:00.0000000 AS DATETIME2) AT TIME ZONE 'Pacific Standard Time' AS val2;
SELECT CAST('20160212 12:00:00.0000000 -05:00' AS DATETIMEOFFSET) AT TIME ZONE 'Pacific Standard Time' AS val1, CAST('20160812 12:00:00.0000000 -04:00' AS DATETIMEOFFSET) AT TIME ZONE 'Pacific Standard Time' AS val2;
In the inputted date in the first query occurs when DST doesn’t apply, hence offset -08:00 is returned. The inputted date in the second query occurs during DST, therefore offset of -07:00 is returned.
The DATEADD function
The DATEADD function accepts a specified part of a date (Year, Quarter, Month, Day of Year, Day, Week, Weekday, Hour, Minute, Second, Millisecond, Microsecond, and Nanosecond), and adds to that part by specified a number.
(Include syntax)
The specified part of the year can be inputted as abbreviations in the query. Consult Books Online for the list of date parts and their respective abbreviations here: https://docs.microsoft.com/en-us/sql/t-sql/functions/dateadd-transact-sql?view=sql-server-2017
The returned type for the date and time will be of the same data type as the inputted date and time. However, if the inputted date and time is a string literal, the returned value will be of DATETIME type, like in the following query where one year is added to ‘20160212’ and the result is of DATETIME type:
SELECT DATEADD(year,1,'20160212');
The DATEDIFF and DATEDIFF_BIG functions
DATEDIFF and DATEDIFF_BIG returns the difference between two date and time values based on the specified part of the date. DATEDIFF returns values of INT type (4-byte integer) while DATEDIFF_BIG returns values of BIGINT type (8-byte integer).
(Include syntax)
For example, the following queries returns the difference between two dates by day:
SELECT DATEDIFF(day,'20150212','20160212');
DATEDIFF_BIG is used when the result is expected to be larger than the maximum INT value of 2,147,483,647. For example, we can use the DATEDIFF_BIG function to calculate the difference between January 1, 001 and February 12, 2016 to return a value of 65,590,832,000,000:
SELECT DATEDIFF_BIG(millisecond,'00010101','20160212');
With DATEADD and DATEDIFF, we can calculate the beginning or end of different date parts (day, month, quarter, or year) that corresponds to the input value. For example, the following query returns the beginning of the day corresponding to the input date and time:
SELECT DATEADD(day,DATEDIFF(day,'19000101',SYSDATETIME()),'19000101');
Here, the DATEDIFF function is used to calculate the difference between the current date and time and the anchor date (‘19000101’). The DATEADD function is then used to the add the result of the DATEDIFF function to the anchor date. This returns the current system date at midnight.
If we’re inputting month instead of day in the above DATEADD and DATEDIFF functions, we need to make sure the anchor is the first day of the month. For example, the following query returns the first day of the current month:
SELECT DATEADD(month,DATEDIFF(month,'19000101',SYSDATETIME()),'19000101');
To return the last day of the month or year, the anchor should be set to the last day of the month or year. For example, the following query returns the last day of the current year:
SELECT DATEADD(year,DATEDIFF(year,'18991231',SYSDATETIME()),'18991231');
Note that it’s much easier to get the last day of the month using the EOMONTH function. Unfortunately, there’s no similar function to return the quarter or year.
The DATEPART function
The DATEPART function accepts a date as input and returns an integer value for the requested part of the date and time.
(Include syntax)
Valid arguments representing part of the inputted date include: year, quarter, month, day of year, day, week, weekday, hour, minute, second, millisecond, microsecond, nanosecond, TZoffset, and ISO_WEEK. For example, the following query returns the month part of ‘20160212’, to return an integer value of 2 representing February:
SELECT DATEPART(month,'20160212');
The YEAR, MONTH, and DAY functions
The YEAR, MONTH and DAY functions are essentially abbreviations of the DATEPART function. They return an integer value representing the year, month, or day component of an inputted date and time.
(Include syntax)
As an example, the query below returns the day, month, and year integer values for February 12, 2016:
SELECT DAY('20160212') AS theday, MONTH('20160212') AS themonth, YEAR('20160212') AS theyear;
The DATENAME function
The DATENAME function returns a character string value that is representing a component of the inputted date and time.
(Include syntax)
While similar to the DATEPART function, the main difference between the two functions is that DATENAME returns the name of the requested component rather than an integer value as in the case for DATEPART. For example, the following query returns the name of the month for ‘20160212’:
SELECT DATENAME(month,'20160212');
Note that DATENAME is a language dependent function. The component of the inputted date has no name and is rather just an integer (such as year), DATENAME returns the numeric value as a character string. For example, the following query returns ‘2016’ for the inputted date ‘20160212’:
SELECT DATENAME(year,'20160212');
The ISDATE function
(Include syntax)
The ISDATE function is used to verify whether inputted argument is a date. If the inputted argument is a legitimate date, ISDATE returns 1 and returns 0 otherwise. For example, the following code returns 1:
SELECT ISDATE('20160212');
and returns 0 here:
SELECT ISDATE('20160230');
The FROMPARTS function
FROMPARTS accepts integer input values representing date and time components to construct a value of the specified type from those components. Variations of the FROMPARTS function include: DATEFROMPARTS, DATETIME2FROMPARTS, DATETIMEFROMPARTS, DATETIMEOFFSETFROMPARTS, SMALLDATETIMEFROMPARTS, and TIMEFROMPARTS.
(Include syntax)
These functions help construct date and time values from the different components, and also simplify the process of migrating code from other SQL environments supporting similar functions. The following is an example of how to query using the FROMPARTS function:
SELECT DATEFROMPARTS(2016,02,12), DATETIME2FROMPARTS(2016,02,12,13,30,5,1,7), DATETIMEFROMPARTS(2016,02,12,13,30,5,997), DATETIMEOFFSETFROMPARTS(2016,02,12,13,30,5,1,-8,0,7), SMALLDATETIMEFROMPARTS(2016,02,12,13,30), TIMEFROMPARTS(13,20,5,1,7);
The EOMONTH function
EOMONTH accepts an inputted date and time and returns the date for the end of the month of the inputted value as a DATE data type. There is also an optional second argument that specifies how many months to add or subtract.
(Include syntax)
For example, the following query will return the date of the current month:
SELECT EOMONTH(SYSDATETIME());
The following query returns orders placed at the last day of the month:
SELECT SalesOrderID, CustomerID SalesPersonID, OrderDate FROM Sales.SalesOrderHeader WHERE OrderDate=EOMONTH(OrderDate);
Querying metadata
There are tools in SQL Server that allow us to view the metadata of objects, such as information about the tables in a database or columns in a table. Example of such tools in SQL Server include catalog views, information schema views, and system stored procedures and functions.
Catalog Views
Catalog views provide detailed information on objects in the SQL Server database. For example, we can query the sys.tables view to get the list of tables in a database along with their schema names:
USE AdventureWorks2017; SELECT SCHEMA_NAME(schema_id) AS table_schema_name, name AS table_name, FROM sys.tables;
The SCHEMA_NAME function converts the schema ID to its name. We can get information about the columns in a table by querying the sys.columns table. For example, the following query returns information about the columns in the Sales.Order table. This includes the column names as well as their data type, maximum length, collation name, and nullability:
SELECT name as column_name, TYPE_NAME(system_type_id) AS column_type, max_length, collation_name, is_nullable FROM sys.columns WHERE object_id=OBJECT_ID(N'Sales.SalesOrderHeader');
Information schema views
The INFORMATION_SCHEMA schema contains a set of views that provide metadata information defined in the SQL standard. These don’t cover metadata specific to SQL Server (such as indexing). For example, the following query on INFORMATION_SCHEMA.TABLES provides a list of user tables in the current database along with their schema names:
SELECT TABLE_SCHEMA,TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE=N'BASE TABLE';
The following query against the INFORMATION_SCHEMA.COLUMNS returns information about columns in the Sales.Orders table:
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, COLLATION_NAME, IS_NULLABLE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA=N'Sales' AND TABLE_NAME=N'Orders';
System stored procedures and functions
System stored procedures and functions query the system internally to return more curated metadata information.
The sp_tables stored procedure returns tables, views, and other objects that can be queried in the current database:
EXEC sys.sp_tables;
The sp_help stored procedure returns multiple result sets containing general information about an object such as columns, indexes, and constraints. For example, the following query returns detailed information about the Orders table:
EXEC sys.sp_help @objname=N'Sales.SalesOrderHeader';
The sp_columns stored procedure returns information about columns in an object. For example, the following query returns information about columns in the Orders table:
EXEC sys.sp_columns @table_name=N'SalesOrderHeader', @table_owner=N'Sales';
The sp_helpconstraint stored procedure provides information about constraints. For example, the following query returns constraints information in the Orders table:
EXEC sys.sp_helpconstraint @objname=N'Sales.SalesOrderHeader';
The SERVERPROPERTY function returns information about the requested property of the current instance. For example, the following query returns the product level of the current instance:
SELECT SERVERPROPERTY('ProductLevel');
The DATABASEPROPERTYEX function returns information about the requested property of a specified database name. For example, the following query returns the collation information of the TSQLV4 database:
SELECT DATABASEPROPERTYEX(N'AdventureWorks2017','Collation');
The OBJECTPROPERTY function returns information about the requested property of a specified object name. For example, the following query is used to indicate whether the Orders table has a primary key:
SELECT OBJECTPROPERTY(OBJECT_ID(N'Sales.SalesOrderHeader'),'TableHasPrimaryKey');
The COLUMNPROPERTY function returns information about the requested property of a specified column. For example, the following query indicates whether the shipcountry column in the Orders table is nullable:
SELECT COLUMNPROPERTY(OBJECT_ID(N'Sales.SalesOrderHeader'), N'ShipToAddressID','AllowsNull');