70-457 Reviewer #02, What is Columnstore Index?

Columnstore index is the new type of index introduced in SQL Server 2012. It is a column based non-clustered index geared toward increasing query performance involving large result sets, ideally in fact tables for OLAP query. This type of index is based on Vertipaq engine acquisition.

Columnstore indexes in the SQL Server Database Engine can be used to significantly speed-up the processing time of common data warehousing queries. Ideally, you can maximize the power of using columnstore index in summary tables, indexed views, OLAP cubes and other queries with very large result set.  Although these can greatly speed up query processing, these techniques can be inflexible, difficult to maintain, and must be designed specifically for each query problem.

Columnstore Index Problem

Noticeable improved query performance can be observed when using columnstore index on very large tables, such as those used as fact tables in data warehousing projects, but on some queries, the performance may not be as expected and at times may also be worse than expected. Sometimes the query optimizer opts to use the columnstore index when its more better to use row-store for the query. In cases like these, it is better to drop the columnstore index to force the query optimizer to use row store processing  or use IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint, or use an index hint to direct the optimizer to a row store index. Therefore, in rare cases, this option might not resolve the performance problem.

Requirements and Limitations of Columnstore Index

  • Only 1 columnstore index per table. You can include all your columns but not exceeding 2014 columns so you have to be very careful in planning and creating the columnstore index.
  • Only nonclustered columnstore indexes are available.
  • Cannot be a unique index.
  • Cannot be created on a view or indexed view.
  • Cannot include a sparse column.
  • Cannot act as a primary key or a foreign key.
  • Cannot be changed using the ALTER INDEX statement. Drop and re-create the columnstore index instead.
  • Cannot be created with the INCLUDE keyword.
  • Columnstore indexes are ordered according to the compression algorithms. Sorting is not allowed in the index. Values selected from a columnstore index might be sorted by the search algorithm, but you must use the ORDER BY clause to guarantee sorting of a result set.
  • Cannot contain a column with a FILESTREAM attribute. Other columns in the table that are not used in the index can contain the FILESTREAM attribute.

A table with an existing columnstore index is read only and cannot be updated. To insert data in a table with a columnstore index, drop the columnstore index, do the insert/update/delete in the table  then rebuild the columnstore index again.

Demonstration

A sample demonstration from MSDN and as illustrated below. You will need to setup AdventureWorksDW2012 database. If you don’t have this attached to your server yet, you may download it from codeplex.

USE AdventureWorksDW2012;
GO

CREATE PARTITION FUNCTION [ByOrderDateMonthPF](int) AS RANGE RIGHT
FOR VALUES (
20050701, 20050801, 20050901, 20051001, 20051101, 20051201,
20060101, 20060201, 20060301, 20060401, 20060501, 20060601,
20060701, 20060801, 20060901, 20061001, 20061101, 20061201,
20070101, 20070201, 20070301, 20070401, 20070501, 20070601,
20070701, 20070801, 20070901, 20071001, 20071101, 20071201,
20080101, 20080201, 20080301, 20080401, 20080501, 20080601,
20080701, 20080801, 20080901, 20081001, 20081101, 20081201
)
GO

CREATE PARTITION SCHEME [ByOrderDateMonthRange]
AS PARTITION [ByOrderDateMonthPF]
ALL TO ([PRIMARY])
GO

CREATE TABLE [dbo].[FactResellerSalesPtnd](
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[EmployeeKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NULL,
[OrderQuantity] [smallint] NULL,
[UnitPrice] [money] NULL,
[ExtendedAmount] [money] NULL,
[UnitPriceDiscountPct] [float] NULL,
[DiscountAmount] [float] NULL,
[ProductStandardCost] [money] NULL,
[TotalProductCost] [money] NULL,
[SalesAmount] [money] NULL,
[TaxAmt] [money] NULL,
[Freight] [money] NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
OrderDate [datetime] NULL,
DueDate [datetime] NULL,
ShipDate [datetime] NULL
) ON ByOrderDateMonthRange(OrderDateKey);
GO

— Using simple or bulk logged recovery mode, and then the TABLOCK
— hint on the target table of the INSERT…SELECT is a best practice
— because it causes minimal logging and is therefore much faster.
ALTER DATABASE AdventureWorksDW2012 SET RECOVERY SIMPLE;
GO

— Copy the data from the FactResellerSales into the new table
INSERT INTO dbo.FactResellerSalesPtnd WITH(TABLOCK)
SELECT * FROM dbo.FactResellerSales;
GO

— Create the columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX [csindx_FactResellerSalesPtnd]
ON [FactResellerSalesPtnd]
(
[ProductKey],
[OrderDateKey],
[DueDateKey],
[ShipDateKey],
[CustomerKey],
[EmployeeKey],
[PromotionKey],
[CurrencyKey],
[SalesTerritoryKey],
[SalesOrderNumber],
[SalesOrderLineNumber],
[RevisionNumber],
[OrderQuantity],
[UnitPrice],
[ExtendedAmount],
[UnitPriceDiscountPct],
[DiscountAmount],
[ProductStandardCost],
[TotalProductCost],
[SalesAmount],
[TaxAmt],
[Freight],
[CarrierTrackingNumber],
[CustomerPONumber],
[OrderDate],
[DueDate],
[ShipDate]
);

The above script will create a partition in the table FactResellerSalesPtnd then create a columnstore index on this fact table. To test if the query optimizer will use the columnstore index create above, run the following script.

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd
GROUP BY SalesTerritoryKey;

Check the execution plan to make sure it used the Index Scan on columnstore index csindx_FactResellerSalesPtnd. It should show as below.

Columnstore Query Plan

Best practice in using columnstore index

  • The datatypes that are best candidate to include when creating a columnstore index are characters and numeric columns.
  • It is best to create partitions in your large table and make sure these partitions are regularly updated.
  • Do not create a columnstore index for the purpose of making your large table read-only. If you need your large table to be read-only, create a read-only filegroup and move your table to that read-only filegroup.

For more information on columnstore index, click here.

Advertisements

One response to “70-457 Reviewer #02, What is Columnstore Index?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

My Time to Travel

The travels of an old(er), solo, woman

Blog Home for MSSQLDUDE

The life of a data geek

unfoldthecreativity

Traveller Observer

The SQL Pro

(Ayman El-Ghazali - www.thesqlpro.com)

Meels on Wheels

Meels for breakfast, lunch and dinner.

juliansherman.net/

Building A Business While Having A Life

Paul Turley's SQL Server BI Blog

sharing my experiences with thee Microsoft data platform, SQL Server BI, Data Modeling, SSAS Design, Power Pivot, Power BI, SSRS Advanced Design, Power BI, Dashboards & Visualization

TroubleshootingSQL

Explaining the bits and bytes of SQL Server and Azure

Coding Tips

We are crazy about programming and we want to share our craziness with you!!!

SQL Studies

 Live, Learn, Share

Sql And Me

My Experiments with SQLServer

Dimitrios Kalemis

I am exactly like Jesus Christ: an atheist and anarchist against society and bad people with influence and power (judges, social workers, politicians, priests, and teachers).

Clint Huijbers' Blog

Senior Certified Microsoft BI Consultant

Blog of Many Useless Wonders

Where Uselessness Abounds!

Steve Spevack's Blog

IT Recruitment