Sunday, May 29, 2016

Why use a Date Dimension Table in a Data Warehouse

In the Data Mart or Data Warehouse world, every schema built with the Kimball Dimensional Modeling method includes a date dimension table. In the early days of Dimensional Modeling it was called a Time dimension; since then, the Time dimension has been separated from the date values. After listing all these terms (time, date, dimensional modeling, data mart, data warehouse, etc.), you might be more confused than ever about what we are talking about.

So, let’s start with some definitions. A Data Mart is a small part of a Data Warehouse, and the Data Warehouse is the entire set of tables in a database. A Data Mart represents a single business process through the data that process produces. The structure of a Data Mart is meant to make querying and reporting simple. During the data population stage, complexity is translated away: codes are replaced with actual description columns, so users see a description rather than a code.

Dimensional Modeling is the process of taking data from various sources and producing this Data Mart structure, usually with one Fact table and multiple Dimension tables. The Dimension tables are related to the Fact table by surrogate keys. The Fact table contains the measures the business process wants to consume. There can be multiple Data Marts in a Data Warehouse, so do not get hung up on the single Fact table in a Data Mart. Eventually, you will see the Dimension tables related to many Fact tables in the overall schema. These dimensions are termed Conformed Dimensions.
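A minimal sketch of this structure in T-SQL (table and column names here are illustrative, loosely following the Adventure Works DW naming, not a complete DDL):

```sql
-- Dimension with an integer surrogate key, independent of any source system key
CREATE TABLE dbo.DimProduct (
    ProductKey          INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    ProductAlternateKey NVARCHAR(25),                   -- business/natural key
    EnglishProductName  NVARCHAR(50),
    Color               NVARCHAR(15)
);

-- Fact table referencing dimensions only through surrogate keys
CREATE TABLE dbo.FactInternetSales (
    ProductKey   INT NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    OrderDateKey INT NOT NULL,   -- surrogate key into DimDate
    SalesAmount  MONEY NOT NULL  -- the measure to aggregate
);
```

The point of the sketch is the relationship: the fact carries nothing but surrogate keys and measures, and every descriptive attribute lives in a dimension.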

The Date dimension is one of these dimension tables related to the Fact. Here is a simple data diagram for a Data Mart of Internet Sales information from the Adventure Works DW database, which can be obtained for free from CodePlex or other online sources.

[Data diagram: DimDate related to FactInternetSales]

As you can see in this diagram, the date table (DimDate) is related to the fact table (FactInternetSales) by 3 different dates in the fact: Order, Ship and Due Date. This is called a Role-Playing dimension in the Dimensional Modeling world. The relationship is created on surrogate key columns (integer data type) rather than the date data type. The DateKey column in the DimDate table is related to the OrderDateKey column in the FactInternetSales table, as well as to ShipDateKey and DueDateKey.
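Querying a role-playing dimension just means joining DimDate to the fact more than once under different aliases. A sketch (column names follow the Adventure Works DW schema):

```sql
SELECT od.FullDateAlternateKey AS OrderDate,
       sd.FullDateAlternateKey AS ShipDate,
       dd.FullDateAlternateKey AS DueDate,
       fs.SalesAmount
  FROM dbo.FactInternetSales fs
    INNER JOIN dbo.DimDate od ON od.DateKey = fs.OrderDateKey  -- Order Date role
    INNER JOIN dbo.DimDate sd ON sd.DateKey = fs.ShipDateKey   -- Ship Date role
    INNER JOIN dbo.DimDate dd ON dd.DateKey = fs.DueDateKey;   -- Due Date role
```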

The DimDate table has columns that would normally have to be extracted from a date column with a function, such as CalendarYear, CalendarQuarter or DayNumberOfMonth. The table has a row for every day possible in your Data Mart.

 

DateKey  | FullDateAlternateKey | DayNumberOfMonth | CalendarQuarter | CalendarYear
-------- | -------------------- | ---------------- | --------------- | ------------
20050101 | 2005-01-01           | 1                | 1               | 2005
20050102 | 2005-01-02           | 2                | 1               | 2005
20050103 | 2005-01-03           | 3                | 1               | 2005
20050104 | 2005-01-04           | 4                | 1               | 2005
20050105 | 2005-01-05           | 5                | 1               | 2005
20050106 | 2005-01-06           | 6                | 1               | 2005
20050107 | 2005-01-07           | 7                | 1               | 2005
20050108 | 2005-01-08           | 8                | 1               | 2005
20050109 | 2005-01-09           | 9                | 1               | 2005
20050110 | 2005-01-10           | 10               | 1               | 2005

Think about the reason for every day to be in this table. If there are no Internet Sales on 12/28/2010, we would have to write a fancy LEFT JOIN with a sub-query just to produce that row in an analysis of the data. But with a Date dimension table, we LEFT JOIN from the DimDate table to the Internet Sales fact table and can return $0 for that day with ISNULL(SUM(SalesAmount), 0) in this query.

 

SELECT d.CalendarYear, d.FullDateAlternateKey, ISNULL(SUM(fs.SalesAmount), 0) AS SalesAmount
  FROM dbo.DimDate d
    LEFT JOIN [dbo].[FactInternetSales] fs ON fs.OrderDateKey = d.DateKey
  WHERE d.CalendarYear = 2010
    AND d.MonthNumberOfYear = 12
  GROUP BY d.CalendarYear, d.FullDateAlternateKey
  ORDER BY d.CalendarYear, d.FullDateAlternateKey

[Query results: one row for every day of December 2010, with 0.00 for days without sales]

The queries users have to write against a Data Mart are much simpler than against a transactional database. There are fewer joins because of the one-to-many relationships between the fact and dimension table(s). The dimension tables can be confusing to someone who has spent a career normalizing databases: a dimension is a flattened, de-normalized table. This creates duplicate data, but in a dimensional model the simplicity of the queries outweighs the duplication.

With this table, we can stop using functions on date columns like YEAR(), MONTH(), etc. A query against an OLTP system might look like this:

 

SELECT DATEPART(Year, SOH.OrderDate) AS OrderYear, SUM(DET.LineTotal) AS Sales
  FROM Sales.SalesOrderHeader SOH
    INNER JOIN Sales.SalesOrderDetail DET ON SOH.SalesOrderID = DET.SalesOrderID
  GROUP BY DATEPART(Year, SOH.OrderDate)

Whereas the dimensional model query would look like this:

 

SELECT d.CalendarYear, SUM(fi.SalesAmount) AS Sales
  FROM dbo.FactInternetSales fi
    INNER JOIN dbo.DimDate d ON d.DateKey = fi.OrderDateKey
  GROUP BY d.CalendarYear

The end user might have an easier time understanding the structure of the Dimensional Model than the transactional system, especially if data is obtained from different databases, maybe on different servers. It is no fun trying to explain a linked server and how to use it. Consolidating the needed reporting data into a Data Mart, and then Data Marts together into a Data Warehouse, makes life much easier for report writers.

The date dimension can also contain columns for Weekend versus Weekday, Holiday, and month markers like 2014-10 or quarter markers like 2014-Q1. All of these can be computed once in the dimension table and used at will by query writers, who no longer have to know the T-SQL functions or concatenate substrings of CAST-ed date columns.
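A sketch of how those columns might be computed during the dimension load (column names like WeekendFlag, MonthMarker and QuarterMarker are illustrative, not part of the Adventure Works schema):

```sql
SELECT d.FullDateAlternateKey,
       -- Weekend versus Weekday flag
       CASE WHEN DATENAME(WEEKDAY, d.FullDateAlternateKey) IN ('Saturday', 'Sunday')
            THEN 'Weekend' ELSE 'Weekday' END                      AS WeekendFlag,
       -- Month marker like 2014-10
       LEFT(CONVERT(CHAR(10), d.FullDateAlternateKey, 120), 7)     AS MonthMarker,
       -- Quarter marker like 2014-Q1
       CONVERT(CHAR(4), DATEPART(YEAR, d.FullDateAlternateKey))
         + '-Q' + CONVERT(CHAR(1), DATEPART(QUARTER, d.FullDateAlternateKey))
                                                                   AS QuarterMarker
  FROM dbo.DimDate d;
```

Computed once here, these values never have to be rebuilt in report queries.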

Then, when DimDate is related to various Fact tables and processed into an OLAP cube, the measures and aggregations are displayable side by side through the DimDate dimension, which is now considered a Conformed Dimension. Slicing and dicing the data has just been made a whole lot easier.

Saturday, April 23, 2016

SQLSaturday Atlanta #521 May 20th & 21st 2016

SQLSaturday events are a pleasure to be a part of, whether that involves speaking or volunteering. Volunteering definitely takes more work and time, but seeing someone who is interested in SQL Server as a career take time to learn on a Saturday makes it all worth the effort. When speaking, I tend to gravitate toward the 3-5 people in the front row who seem full of questions, and away from the 1-2 who seem to “know” more than me.


On Friday May 20th 2016, I have been selected to present a full-day session on Reporting with Dimensional Modeling and Microsoft Business Intelligence. These are called pre-conference sessions, which reflects their purpose at larger conferences like PASS Summit. What makes them attractive to attendees (AND bosses) is that you do not have to travel to the larger conference, plus they are usually less expensive. This translates to a higher ROI (Return On Investment) for you or the company you work for.


Atlanta’s SQLSaturday this year is May 21st 2016 at Georgia State University in Alpharetta, just north of Atlanta. The speaker lineup is incredible, and I know half my day will be spent attending sessions to advance my knowledge of the Microsoft Data Platform, plus talking about Transitioning from a DBA to Data Warehouse Architect.


If you are interested in learning about dimensional modeling and MSBI, please register for the day-long pre-conference session I am giving. You might also be interested in the other 2 all-day sessions: Enterprise Scripting by the Midnight DBAs (Sean and Jen McCown), or SQLSentry’s Kevin Kline on 50 Things All SQL Server Developers Need to Know. There is also a half-day session on Friday presented by Linchpin People’s Brian Moran on the Secrets of Independent Consulting.


Of course, the Saturday event is free (you might have to pay for lunch). You could learn a lot of pertinent skills on Saturday as well as network with probably 500+ attendees and speakers along with sponsors and volunteers. Please come join us.


Wednesday, March 16, 2016

Excel BI VC presents Dustin Ryan: New Features of Power BI

Next Meeting

Thu, Mar 24 2016 12:00 Central Daylight Time


Dustin Ryan presents What's New in Power BI

RSVP: https://attendee.gotowebinar.com/register/8425155555275941377

WHAT'S NEW IN POWER BI

Between the weekly and monthly updates to Power BI, it's tough to stay caught up with the latest updates and improvements to Microsoft's newest business intelligence tool. In this session, we'll take a look at the latest improvements to Power BI and how you and your organization can begin using these new capabilities now.


Dustin Ryan is a senior BI consultant and trainer with Pragmatic Works in Jacksonville, FL. Dustin specializes in delivering quality enterprise-level business intelligence solutions to clients using SSRS, SSIS, SSAS, SharePoint, and Power BI. Dustin has authored and contributed to SQL Server books. You can find Dustin speaking at events such as SQLSaturday, Code Camp, and online webinars.

Analysis Services: Solving Hierarchy Errors of Uniqueness

Multidimensional cubes provide speed when it comes to retrieving aggregations that are important to business decisions. Being able to slice or group the measures by dimension attributes helps with quick analysis and flexible, interactive reporting. Configuring these attributes as hierarchies has some details that are not obvious at first, and the error message when these problems exist is not very helpful for someone new to cubes.

Let’s look at creating a Date hierarchy with Year, Quarter, Month and Day. Our cube already has measures created from a sales fact table and the dimension for date has been created without any hierarchies.


Figure 1: Attribute List for Date Dimension

The measures can be displayed in a Pivot Table in Excel. Figure 2 below shows the Sales amount sliced by Sales Territory and Year/Quarter/Month.


Figure 2: Pivot Table in Excel

Figure 2 shows the Year, Quarter and Month as rows, while Sales Territory is used for columns and Internet Sales for the Values in the Pivot Table. Users get frustrated when they have to pick one attribute at a time even though the hierarchy is logically known.

To create this hierarchy, you need to edit the date dimension in the cube.


Figure 3: Edit Date Dimension in SQL Server Data Tools

To create a hierarchy, you can drag and drop the first, or top, level of the hierarchy from the Attributes pane into the Hierarchies pane. We will do this with Year at the top level.


Figure 4: Drag Year to Create New Hierarchy

To finish this hierarchy, drag Quarter under Year, followed by the Month and Dates attributes. To complete the hierarchy, right-click on the name Hierarchy and select Rename. We renamed the hierarchy to Y-Q-M-D.


Figure 5: Y-Q-M-D Hierarchy Created

We can deploy this project and preview in Excel to see the effects.


Figure 6: Preview Hierarchy Y-Q-M-D in Excel

So, what is the problem at this point? For performance reasons, there is a blue squiggly line under the hierarchy name in the cube project. The message tells us to create an attribute relationship for the hierarchy. This is done by editing the date dimension using the Attribute Relationships tab.


Figure 7: Attribute Relationship Does Not Exist.

Y-Q-M-D is a natural hierarchy because a Day is in a Month, which is in a Quarter, which is in a Year. So, we should be able to show that in the attribute relationships for this hierarchy. You can drag and drop Quarter on Year, then drag and drop Month on Quarter to accomplish this. Dates is the root attribute, or key, of the dimension.


Figures 8 & 9: Before and After Y-Q-M-D Hierarchy

Now, when we deploy, we get an error. The error message with the red circle and white X does not tell us the problem. The problem is in the last warning, which indicates that Quarter has duplicates for the value 4. In order for an attribute relationship to exist, the values at each level have to be unique across all occurrences. Quarter 4 (as well as 3, 2 and 1) is duplicated for every year we have in the date dimension table.
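You can see the uniqueness problem directly in the relational date dimension. This sketch against the Adventure Works DW DimDate table counts how many distinct years share each quarter value:

```sql
-- Each CalendarQuarter value (1-4) appears under many CalendarYears,
-- so Quarter alone cannot key an attribute that is related to Year.
SELECT d.CalendarQuarter,
       COUNT(DISTINCT d.CalendarYear) AS YearsSharingThisQuarter
  FROM dbo.DimDate d
  GROUP BY d.CalendarQuarter
  HAVING COUNT(DISTINCT d.CalendarYear) > 1;
```

Any row returned here is a level value that will trip the duplicate-key warning during deployment.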


Figure 10: Deployment Failed

There are a couple of solutions to this problem, but we are only going to look at one: using multiple columns in the KeyColumns property of Quarter and Month to create uniqueness. Then, we have to set the NameColumn property in order to have something to display for the multi-column key.


Figure 11: Changing the KeyColumn of Attribute Quarter

To do this, highlight the Quarter attribute in the Attributes pane, then go to the Properties window. Find KeyColumns and click the ellipsis. When prompted, add CalendarYear to the Key Columns list and move the Year above the Quarter (Figure 11). Do the same thing for Month: add CalendarYear to its KeyColumns.


Figure 12: NameColumn for Month Attribute

The NameColumn needs to be changed from nothing to CalendarQuarter for the Quarter attribute and EnglishMonthName for the Month attribute (Figure 12). Re-deploy the project and the error should no longer exist: we get Deployment Completed Successfully.


Figure 13: Deployment Completed Successfully

The use of attribute relationships for natural hierarchies greatly improves the processing and retrieval of data from the cube. It also assists the aggregation builder in indexing the combinations of dimension attributes needed for analysis. In the end, the cube can retrieve the aggregation for a quarter from the months, or for a year from the quarters, which saves retrieving details to aggregate up the hierarchy.

Monday, January 18, 2016

SQLSaturday Austin – Jan 30th

I will be speaking with other SQL Server experts for SQLSaturday #461 in Austin on Jan 30th, 2016. This is a FREE day of training. FREE!!!
There are also a couple of great Pre-Cons on Friday Jan 29th, 2016 that are very cheap compared to similar training at big conferences.
 
SQLSaturday #461 -- the precons will be held on Friday, January 29th, 2016 at the Microsoft office, 10900 Stonelake Blvd., Suite 225, Austin, TX 78759.
Registration for the SQLSaturday #461 precons:
Creating a High Availability and Disaster Recovery Plan by Ryan Adams -- http://sqlsat461-ha.eventbrite.com
Performance Tuning Like a Boss by Robert Davis -- http://sqlsat461-perf.eventbrite.com

Here is a list of the sessions for Saturday (FREE!!!):

Session Time Session Title Track Speaker(s)
09:00 AM - 10:00 AM Branding Yourself for a Dream Job Professional Development Steve Jones
09:00 AM - 10:00 AM Designing Stored Procedure Solutions Application & Database Development Jennifer McCown
09:00 AM - 10:00 AM Enforce Best Practices with CMS and Policy Based Management Enterprise Database Administration & Deployment Ryan Adams
09:00 AM - 10:00 AM From Here to Azure Enterprise Database Administration & Deployment Thomas LaRock
09:00 AM - 10:00 AM Know what your machines are doing besides SQL Server - WMI, WQL, and PoSH Enterprise Database Administration & Deployment Michael Bourgon
09:00 AM - 10:00 AM Using Power BI when Implementing Data Analytics Management Practices BI Information Delivery Ginger Grant
10:10 AM - 11:10 AM Continuous Deployments using SSDT Application & Database Development Christopher Wolff
10:10 AM - 11:10 AM Fun Facts about Availability Groups Enterprise Database Administration & Deployment Joe Hellsten
10:10 AM - 11:10 AM Introducing Azure Data Factory Enterprise Database Administration & Deployment Kevin Hazzard
10:10 AM - 11:10 AM Introduction to Time Series Forecasting Advanced Analysis Techniques Peter Myers
10:10 AM - 11:10 AM SQL Server Deprecated and Discontinued Features Enterprise Database Administration & Deployment Nancy Hidy Wilson
11:20 AM - 12:20 PM Database Virtualization (Monster VM) – Vsphere 6 Enterprise Database Administration & Deployment Michael Corey
11:20 AM - 12:20 PM Dimensional Modeling 101 BI Platform Architecture, Development & Administration Thomas LeBlanc
11:20 AM - 12:20 PM Making the Leap from Developer to DBA Professional Development Amy Herold
11:20 AM - 12:20 PM Monster Text Manipulation: Regular Expressions for the DBA Enterprise Database Administration & Deployment Sean McCown
11:20 AM - 12:20 PM Parameter Sniffing the Good, the Bad and the Ugly Application & Database Development Lance Tidwell
02:00 PM - 03:00 PM Azure Machine Learning: From Design to Integration Advanced Analysis Techniques Peter Myers
02:00 PM - 03:00 PM Backup and Restore Tips and Tricks Enterprise Database Administration & Deployment Ryan Adams
02:00 PM - 03:00 PM ColumnStore Index: Microsoft SQL Server 2014 and Beyond BI Platform Architecture, Development & Administration Jayaprakash Jothiraman
02:00 PM - 03:00 PM Database Design Throwdown Application & Database Development Karen Lopez, Thomas LaRock
02:00 PM - 03:00 PM SQL Server 2014/2016 In Memory Technologies - OLTP (aka Hekaton) & ColumnSt Application & Database Development Reinaldo Kibel
02:00 PM - 03:00 PM The Data Warehouse of the Future BI Platform Architecture, Development & Administration Stan Geiger
03:10 PM - 04:10 PM "Can I Get A Report of This?": Introduction to SSRS BI Platform Architecture, Development & Administration Jeremy Frye
03:10 PM - 04:10 PM Changing Your Habits to Improve the Performance of Your T-SQL Application & Database Development Mickey Stuewe
03:10 PM - 04:10 PM Configuring SQL Server for Mission Critical and Scalable Deployments Enterprise Database Administration & Deployment Suresh Kandoth
03:10 PM - 04:10 PM Make Your SQL Server Queries Go Faster Enterprise Database Administration & Deployment John Sterrett
03:10 PM - 04:10 PM Mission Possible: Interactive Performance Troubleshooting - Index Enterprise Database Administration & Deployment Jim Murphy
03:10 PM - 04:10 PM Overview of Security Investments in SQL Server 2016 and Azure SQL Database Enterprise Database Administration & Deployment Jamey Johnston
04:20 PM - 05:20 PM Analyze your query plan like a Microsoft Engineer! Advanced Analysis Techniques Daniel Janik
04:20 PM - 05:20 PM Introduction to Biml BI Platform Architecture, Development & Administration Tim Mitchell
04:20 PM - 05:20 PM Service Broker: Coach your processes asynchronously Application & Database Development William Wolf
04:20 PM - 05:20 PM SQL Server Database Encryption (TDE) Enterprise Database Administration & Deployment Ben Miller
04:20 PM - 05:20 PM SS2014 In-Memory Tables: Formula 1 Performance Unleashed! Application & Database Development Mike Byrd
04:20 PM - 05:20 PM Strategies for Working with Texas-sized Databases Enterprise Database Administration & Deployment Robert Davis

Wednesday, December 2, 2015

Idera #SQLChat with John Morehouse

Please join Idera for a #SQLChat on Twitter. The December #SQLChat will take place Tuesday, December 9, from 11 a.m. to 12 noon Central, with John Morehouse (@Sqlrus). We will be discussing how to survive the holidays with SQL Server.

Other Idera ACEs will be helping with the chat and answering questions, as well as follow-up questions. Get started, or improve your existing SQL Server systems, to minimize the amount of time spent working during the holidays.

They are also giving away a Kangaroo Mobile Desktop computer during the #SQLChat.


Thursday, November 26, 2015

Covering Indexes

Microsoft continues to improve indexes and the options for additional performance enhancements. One need I see frequently is for a covering index. Before we look at those options, let’s talk about why a covering index is needed.

The term covering index was coined probably a decade ago. The idea is for the index to cover all the columns needed to improve the performance of a query. This includes the filters in the WHERE clause as well as the columns in the SELECT list. Before included columns and columnstore indexes, this had to be accomplished by adding the columns to the index tree structure.

Here is an example query that would benefit from a covering index using the Adventure Works database from CodePlex:

USE [AdventureWorks2014]
GO
SELECT soh.[SalesOrderID], [RevisionNumber], [OrderDate], [DueDate]
    , p.Name, p.ListPrice, sod.OrderQty
  FROM [Sales].[SalesOrderHeader] soh
    INNER JOIN [Sales].[SalesOrderDetail] sod ON sod.SalesOrderID = soh.SalesOrderID
      INNER JOIN [Production].[Product] p ON p.ProductID = sod.ProductID
  WHERE soh.OrderDate BETWEEN '20130101' AND '20131231'
   AND p.Color = 'Black'

In the case of the above query, the Product table has an Index Seek on the clustered index that implements a Seek Predicate on ProductID, but the Color column shows up as a separate Predicate in the query plan below. So, work is done in the data pages of the clustered index to find the proper Color value and match the second part of the WHERE (AND …) clause.

[Execution plan: Clustered Index Seek with a Seek Predicate on ProductID and a separate Predicate on Color]

We can create a new index to “cover” the ProductID and Color in order to have only a Seek Predicate in the execution plan.

CREATE NONCLUSTERED INDEX idxProduct_ProductIDColor ON [Production].[Product]
  (ProductID, Color)
GO

Now, the problem is that the execution plan shows a Key Lookup to get the columns Name and ListPrice.

[Execution plan: Key Lookup retrieving Name and ListPrice]

To cover the columns Name and ListPrice from the Product table, we need to add those columns to our index. Instead of adding them to the end of the key column list like the following…

 

CREATE NONCLUSTERED INDEX idxProduct_ProductIDColor ON [Production].[Product]
  (ProductID, Color, Name, ListPrice)
  WITH (DROP_EXISTING = ON)
GO

…we can include it in the data page part of the index by using the INCLUDE part of the CREATE INDEX syntax.

CREATE NONCLUSTERED INDEX idxProduct_ProductIDColorIncludeNameListPrice
    ON [Production].[Product]
        (ProductID, Color)
    INCLUDE (Name, ListPrice)
GO

Now, we have improved the performance of the query and limited the seek part of the index structure to only the columns needed for the Seek Predicate. In the image below, you can see the columns Name and ListPrice in the Output List, alongside the Seek Predicates, with no extra Predicate or Key Lookup. The Object section of the display shows the index being used: idxProduct_ProductIDColorIncludeNameListPrice.

[Execution plan: Index Seek with Name and ListPrice in the Output List]

In conclusion, Microsoft has really helped us over the years with improvements to index creation. After creating this new index, it is a good idea to start monitoring your indexes with DMVs/DMFs to see how often they are used (reads and writes), and whether other indexes can be removed because they are not being utilized.
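A starting point for that monitoring, using the index usage stats DMV (note these counters only reflect usage since the last SQL Server restart):

```sql
-- Reads (seeks + scans + lookups) versus writes per index in the current database
SELECT OBJECT_NAME(ius.object_id)          AS TableName,
       i.name                              AS IndexName,
       ius.user_seeks + ius.user_scans
         + ius.user_lookups                AS Reads,
       ius.user_updates                    AS Writes
  FROM sys.dm_db_index_usage_stats ius
    INNER JOIN sys.indexes i ON i.object_id = ius.object_id
                            AND i.index_id = ius.index_id
  WHERE ius.database_id = DB_ID()
  ORDER BY Reads ASC;  -- rarely-read indexes float to the top
```

Indexes with high Writes and near-zero Reads over a representative period are candidates for removal.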