-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathcol-3.sql
More file actions
409 lines (323 loc) · 12.6 KB
/
col-3.sql
File metadata and controls
409 lines (323 loc) · 12.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
/*
Fundamentals of Columnstore: How Data is Selected
v1.0 - 2020-10-17
https://www.BrentOzar.com/go/columnfund
This demo requires:
* SQL Server 2016 or newer
* Stack Overflow database 2018-06 version: https://www.BrentOzar.com/go/querystack
This first RAISERROR is just to make sure you don't accidentally hit F5 and
run the entire script. You don't need to run this:
*/
RAISERROR(N'Oops! No, don''t just hit F5. Run these demos one at a time.', 20, 1) WITH LOG;
GO
/* I'm using the large Stack database: */
USE StackOverflow;
GO
DropIndexes @TableName = 'Users';
GO
/* Turn on actual plans and start in 2016 compat level: */
SET STATISTICS IO, TIME ON;
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130
GO
/* Recreate & reload the table. I know, you might already have it built, but
trust me, this is going to be important later. Start with a fresh one. */
DROP TABLE IF EXISTS [dbo].[Users_columnstore];
GO
CREATE TABLE [dbo].[Users_columnstore](
[Id] [int] IDENTITY(1,1),
[AboutMe] [nvarchar](4000) NULL,
[Age] [int] NULL,
[CreationDate] [datetime] NOT NULL,
[DisplayName] [nvarchar](40) NOT NULL,
[DownVotes] [int] NOT NULL,
[EmailHash] [nvarchar](40) NULL,
[LastAccessDate] [datetime] NOT NULL,
[Location] [nvarchar](100) NULL,
[Reputation] [int] NOT NULL,
[UpVotes] [int] NOT NULL,
[Views] [int] NOT NULL,
[WebsiteUrl] [nvarchar](200) NULL,
[AccountId] [int] NULL)
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCI ON dbo.Users_Columnstore;
GO
/* Load with the same amount of data: */
SET IDENTITY_INSERT dbo.Users_columnstore ON;
GO
INSERT INTO dbo.Users_columnstore([Id], [AboutMe], [Age], [CreationDate], [DisplayName], [DownVotes],
[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId])
SELECT [Id], LEFT([AboutMe],4000), [Age], [CreationDate], [DisplayName], [DownVotes],
[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId]
FROM dbo.Users;
GO
SET IDENTITY_INSERT dbo.Users_columnstore OFF;
GO
/* Let's say we only want to find the top 1 CreationDate: */
SELECT TOP 1 CreationDate
FROM dbo.Users
ORDER BY CreationDate DESC;
GO
SELECT TOP 1 CreationDate
FROM dbo.Users_columnstore
ORDER BY CreationDate DESC;
GO
/* Compare those queries by:
* Logical reads (including lob logical reads)
* Duration
* CPU time
* Parallelism
* Memory grant
At first glance, it seems like the columnstore index wins.
But it gets a little trickier. I'm going to rewrite those
two queries - will they get the same plans in my new way? */
SELECT TOP 1 CreationDate
FROM dbo.Users
ORDER BY CreationDate DESC;
SELECT MAX(CreationDate)
FROM dbo.Users;
SELECT TOP 1 CreationDate
FROM dbo.Users_columnstore
ORDER BY CreationDate DESC;
SELECT MAX(CreationDate)
FROM dbo.Users_columnstore;
GO
/* The plans, logical reads, and times are all different
for both rowstore & columnstore tables.
This isn't a new thing: different queries = different plans.
For the rest of this, I'm going to focus on reporting
style queries. They will be the kinds of queries where
columnstore usually does better - but I'm not trying
to make columnstore look good.
So let's use the MAX queries.
Will the rowstore table do well if it has an index?
*/
CREATE INDEX CreationDate ON dbo.Users(CreationDate);
GO
SELECT MAX(CreationDate)
FROM dbo.Users;
SELECT MAX(CreationDate)
FROM dbo.Users_columnstore;
GO
/* Compare those queries by:
* Logical reads (including lob logical reads)
* Duration
* CPU time
* Parallelism
* Memory grant
If:
* You can predict the columns queries will filter/sort on
* You can keep that number down relatively low (the 5 & 5 guideline)
* You're doing updates on the table
Then it makes more sense to just do normal rowstore indexes.
But what if:
* You can't predict the filter/sort columns
* You can't keep that number down
* The queries don't bring back all of the columns
* The data is loaded periodically (deletes/inserts, no updates)
Let's try a reporting-style query that uses more than
just the CreationDate column - it also gets Reputation:
*/
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users_columnstore
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
GO
/* Compare those queries by:
* Logical reads (including lob logical reads)
* Duration
* CPU time
* Parallelism
* Memory grant
So why is the columnstore query faster?
* It reads less columns of the table
* It has less data to group/sort
* It does the work in a more efficient way (batch operation)
In 2019 compat level, rowstore indexes can get batch
execution mode too. If you're on 2019, see it:
*/
ALTER DATABASE [StackOverflow] SET COMPATIBILITY_LEVEL = 150
GO
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate)
OPTION (USE HINT('DISALLOW_BATCH_MODE')); /* Simulates 2017 execution */
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users_columnstore
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
GO
/*
The takeaway: 2019's batch mode on rowstore indexes
gives you a lot of columnstore's low-CPU advantage
without actually having to use columnstore.
In this edge case, don't judge success by duration
alone: the columnstore query may not have gone
parallel. Judge it by CPU time overall & reads.
(These differences will grow larger as we move
to larger tables.)
Let's go back to 2016 compat mode to level the
playing field: */
ALTER DATABASE [StackOverflow] SET COMPATIBILITY_LEVEL = 130
GO
/*
However, columnstore has some other cool tricks up
its sleeve, especially around what data it reads.
Look at the reporting query again: */
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users_columnstore
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
GO
/* To execute this query, we have to read all of
the row groups because we don't have a WHERE.
But what if we add one? */
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users_columnstore
WHERE CreationDate BETWEEN '2008-01-01' AND '2011-01-01'
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
GO
/* Which row groups & segments will we read: */
sp_BlitzIndex @TableName = 'Users_columnstore';
GO
/* Run the report query with the filter, and look
at the statistics io output. Read these:
Segment reads: 10
Segment skipped: 0
Hmm, why did SQL Server have to read all of them?
You would think that CreationDate would march
upwards, always going up, as Ids went up. You
would be wrong: */
SELECT MIN(CreationDate) FROM dbo.Users WHERE Id BETWEEN 1 AND 1000000;
SELECT MIN(CreationDate) FROM dbo.Users WHERE Id BETWEEN 1000001 AND 2000000;
/* Look at these CreationDates: */
SELECT * FROM dbo.Users WHERE Id BETWEEN 1384642 AND 1384662 ORDER BY Id;
/* And no, it's not a bug:
https://stackoverflow.com/users/1384651/ Created in 2012
https://stackoverflow.com/users/1384652/ Created in 2008
___
_/ ..\
( \ 0/__
\ \__)
/ \
/ _\
`"""""``
We CAN get segment elimination if we specify Ids:
*/
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users_columnstore
WHERE Id BETWEEN -1 AND 2000000
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
GO
/* Segment reads 3, segment skipped 7.
Because only 3 segments have those Ids in them: */
sp_BlitzIndex @TableName = 'Users_columnstore';
GO
/* But this leads to a big columnstore challenge:
if we really want the best performance, SQL Server
would love to avoid reading all the segments.
Partitioned tables have "partition elimination"
Columnstore indexes have "segment elimination"
(which is really "row group elimination")
And they're both hard work.
But take heart: the big bang for the buck is
that you only have to read the columns you want,
and that's still a win even if we can't eliminate
row groups.
Now, let's delete, update, and insert some rows,
and see how that effects our query.
*/
DELETE dbo.Users_columnstore
WHERE Location = N'Paris, France';
GO
UPDATE dbo.Users
SET LastAccessDate = GETDATE()
WHERE Location = N'London, United Kingdom';
GO
INSERT INTO [dbo].[Users_columnstore]([AboutMe], [Age], [CreationDate], [DisplayName], [DownVotes],
[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId])
SELECT TOP 1000 LEFT([AboutMe],4000), [Age], GETDATE() AS CreationDate, [DisplayName], [DownVotes],
[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId]
FROM dbo.Users;
GO 5
INSERT INTO [dbo].[Users_columnstore]([AboutMe], [Age], [CreationDate], [DisplayName], [DownVotes],
[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId])
SELECT TOP 10000 LEFT([AboutMe],4000), [Age], GETDATE() AS CreationDate, [DisplayName], [DownVotes],
[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId]
FROM dbo.Users;
GO 5
/* Now run our report query again with the date filter: */
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users_columnstore
WHERE CreationDate BETWEEN '2008-01-01' AND '2011-01-01'
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
GO
/* Check your logical reads - segments have been skipped!
Because the newly created segments have new CreationDates
only, and no "old" users, so they get skipped: */
sp_BlitzIndex @TableName = 'Users_columnstore';
GO
/* However, if we want to filter by anything other than
CreationDate - like, say, Reputation - will we be able
to eliminate segments? */
SELECT YEAR(CreationDate) AS Yr, MONTH(CreationDate) AS Mo,
COUNT(*) AS UsersCreated, AVG(Reputation) AS AvgReputation
FROM dbo.Users_columnstore
WHERE Reputation BETWEEN 1000 AND 10000
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY YEAR(CreationDate), MONTH(CreationDate);
GO
/* What we learned in this module:
Rowstore indexes are a better fit when:
* You can predict the columns queries will filter/sort on
* You can keep that number down relatively low (the 5 & 5 guideline)
* The queries are bringing back a lot of columns (SELECT *)
* You're doing updates on the table
* You're on 2019 compat level, get batch mode on rowstore
When we use columnstore indexes:
* The syntax we use matters (TOP vs MAX) just like
it does with rowstore - we still tune queries
* Columnstore goes faster when we eliminate
row groups and column segments
* It's going to be hard to get segment elimination
as we delete/update/insert data
* Query performance will degrade over time, and
we'll need to fix that with index maintenance
and careful loading
*/
/*
License: Creative Commons Attribution-ShareAlike 4.0 Unported (CC BY-SA 4.0)
More info: https://creativecommons.org/licenses/by-sa/4.0/
You are free to:
* Share - copy and redistribute the material in any medium or format
* Adapt - remix, transform, and build upon the material for any purpose, even
commercially
Under the following terms:
* Attribution - You must give appropriate credit, provide a link to the license,
and indicate if changes were made. You may do so in any reasonable manner,
but not in any way that suggests the licensor endorses you or your use.
* ShareAlike - If you remix, transform, or build upon the material, you must
distribute your contributions under the same license as the original.
* No additional restrictions You may not apply legal terms or technological
measures that legally restrict others from doing anything the license permits.
*/