Filters:

  • Technologies

  • Instructors

  • A Real-World Example of a Non-Correlated SELECT Subquery and Cross Join

    I recently developed a solution to a problem that is a good example of a non-correlated subquery in a SELECT clause that can also be written as a cross join. I’ll describe the requirement, the data, and then the solution with a non-correlated subquery and another with a cross join. To keep my client and their data anonymous, I’ve translated the problem to the AdventureWorks database.

    The Requirement

    The client wanted to see, for each sales person who placed orders over a time period, their total sales and their “productivity” over that time period. They defined productivity as the total sales divided by the number of business days in that time period. Here is the output that meets the requirement and that our solution will have to produce. This is for the time period of March, 2004, given the data in the AdventureWorks database.

    001-non-Correlated-SELECT-Subquery-and-Cross-Join

    The Data

    The AdventureWorks database contains a table called Sales.SalesOrderHeader that holds sales orders. There are three columns in that table that are of interest to us: SalesPersonID, OrderDate, and TotalDue. Here is a sample of the data in those columns, formatted for a clean presentation, and including only the data for the time period of March, 2004 that I will use to test the solution, though not all rows in that time period are shown:

    002-non-Correlated-SELECT-Subquery-and-Cross-Join

    In addition, I have created a table called DimDate that is similar to the one my client has in their database. DimDate is a dimension table in a data warehouse. (As we see in the Data Warehouses course, it is useful to store dates in a data warehouse in a date dimension table in which each row represents a different date). The IsBusinessDay column is a bit data type in which a 1 means that the date is a business day and a zero means that it is not. To simplify things for this example, I have created dates only for the date range I will be using in this example—March, 2004—and flagged only weekend dates as non-business days. Here is the script I used to create the DimDate table and to show its contents:

    003-non-Correlated-SELECT-Subquery-and-Cross-Join

    The Solution

    Step 1: Create the Base Query that Aggregates by the Salespersonid

    The output you saw earlier that meets the requirement shows one row for every sales person, along with a sum of the TotalDue column for each sales person and the number of business days. Even though the number of business days will be the same for each salesperson, it needs to be included on every row so that we can divide each salesperson’s sum of total due by the number of business days.

    We’ll start with the following query that finds the sum of the total due for each employee for the test time period of March, 2004. Notice that we group by the SalesPersonID to get one row per sales person and then we sum the TotalDue for each sales person. Notice also that in this phase we are not yet doing any formatting of the output, because we need the TotalDue to remain a numeric value for when we divide it by the number of business days in a subsequent phase.

    You may also like:  How to Display the Elapsed Processing Time of a Report in SSRS

    004-non-Correlated-SELECT-Subquery-and-Cross-Join

    By the way, notice that there is a NULL SalesPersonID. A SalesPersonID is NULL when the order is taken  electronically over the company’s website. Let’s clean that up a bit. We’ll use the ISNULL() function to display “Internet Sale” when the SalesPersonID is NULL. There’s just one small problem with that approach, though. The ISNULL() function requires that both of its arguments be the same data type, and our two arguments are not; SalesPersonID is an integer and “Internet Sale” is character. If we do nothing and leave it up to SQL Server, it will attempt to perform an implicit conversion. In an implicit conversion, the lower data type (character, in this case) is converted up to the higher data type (integer, in this case), and that would fail because “Internet Sale” does not resemble an integer! So, we will need to perform an explicit conversion and cast the SalesPersonID down to a character. Here is the revised query:

    005-non-Correlated-SELECT-Subquery-and-Cross-Join

    Step 2: Include the Number of Business Days

    We can very easily find the number of business days for a time period by filtering the DimDate table to return only days where IsBusinessDay = 1 within the time period we want. Here is a query that does just that for our time period of March, 2004 (there were 23 business days in March, 2004):

    006-non-Correlated-SELECT-Subquery-and-Cross-Join

    Notice that this query returns a single row and a single column. A query that returns a single value like this is referred to as a scalar query. Because the query above is scalar, it can be used as a column in a SELECT clause. Just one problem: The GROUP BY clause in our base table won’t allow us to do that, because the GROUP BY demands that the only two things in the SELECT clause be what is in the GROUP BY clause and aggregate functions. We can get around this in a couple of ways.

    One way is by making our base query a derived table subquery and including the scalar query in the outer query.

    In the query below, BaseQuery is the base query we developed earlier, and BusDays is the scalar query that returns the number of business days in our time period. NumBusDays is a subquery in the SELECT clause; it occupies a column position in the outer SELECT. The outer query displays all the columns in the base query (BaseQuery.*) and the row count returned by NumBusDays.

    007-non-Correlated-SELECT-Subquery-and-Cross-Join

    The other way to accomplish the same results is by using a cross join. A cross join joins two tables by matching up every row in one table with every row in the other table. Since we want the single row returned by our scalar query to appear on every row of our aggregate query, a cross join would also work (any query that uses a non-correlated subquery in a Select clause can also be written as a cross join).

    You may also like:  Repeating and Freezing Column Headers in SSRS Tables

    008-non-Correlated-SELECT-Subquery-and-Cross-Join

    You can run both queries with the actual execution plan on to see if one is more efficient than the other. In this example, they are both the same—neither approach is faster or more efficient than the other, though you should not always count on this being the case; it’s always a good idea to compare both execution plans as there are many factors that can effect the outcome.

    Step 3: Calculate the Productivity and Format the Output

    At this point, we have everything we need on one row to allow us to divide the sum of the TotalDue column by the number of business days. All we have to do now is add another column to the outer query that performs that division; but in the subquery approach, if we do it that way, we’d have to repeat the scalar query as the denominator. To avoid that, we can make a derived table out of everything we have so far and then use the column aliases, instead. We can also format things nicely in the new outer query. Much easier!

    Compare the results of this query with the results shown in the Requirements section above; they are the same.

    009-non-Correlated-SELECT-Subquery-and-Cross-Join

    With the cross join, the new outer query is not even necessary:

    010-non-Correlated-SELECT-Subquery-and-Cross-Join

    Enjoy!

    Peter Avila
    SQL Server Instructor – Interface Technical Training
    Phoenix, AZ

    Share your thoughts...

    Please fill out the comment form below to post a reply.