Optimization Isn’t Always Easy
SELECT * FROM [My Life]
Optimization Isn’t Always Easy
I’ve been at my new job for two weeks. In that time I’ve diagrammed all the databases as they are today. I’ve suggested recommendations on adding keys to some tables that had none. I’ve also started diagramming the object data that’s being collected in the system, I’m hoping to find duplication within the system and build out a more efficient design. I’ll be putting together both a transactional design and a related analysis design. To me, this is the really interesting part, visualizing the data, and optimizing it.
The problem is, this new architecture is the long term optimization. While I move to this, I have to keep an eye on how the data is being used. Right now I’m evaluating Confio’s Ignite8 for SQL Server. So far I like this product. I am however seeing limitations in how I’ve implemented it. I don’t have a server on the same LAN segment that I could use to install the monitor, and allow me to monitor 24-7. I’m working to remedy this, but it looks like early January before I can do that. After I get the hardware in place, I’ll resume my monitoring efforts.
Anyway, back to my story.
I was checking out my server, and I identified a query that was being run that was waiting nearly an hour a day to complete. It was an export process, so I knew I’d be dealing with a large data set. I could also see that this query was running between 5,000 and 10,000 times per day. Ignite helped me identify two wait types Memory/CPU and PAGEIOLATCH_SH.
I didn’t throw out Confio’s suggestion that we may be “reading more data from memory than necessary”. I started scanning the query that was being run. I discovered that the query was calling a view.
That was composed of a few tables joined with a view.
That view was itself composed of several views (that used the PIVOT function), joined to tables.
One of those views in turn had a cross database join, so setting up indexes on the views is out, since using WITH SCHEMABINDING is out due to that cross database join. I looked at the amount of data being retrieved by each view.
Each of the “children” views passed all of their columns up to the parent, and those columns were used (this is a wide export). So I turned my attention to the base tables used in the queries. I noticed that each join was using two columns.
So I dug into the actual data. I found that one column used in the join wasn’t very selective at all. But the other was!
A-HAH!
So after a little further exploring, I found I was able to make the queries a bit more optimized by changing all the joins to use the single column, rather than both columns. This was due to the selective key was the primary key in several of the tables (though it wasn’t always identified as the foreign key in the matched table…I have to fix that).
This improvement is only a small improvement, but a noticeable one. In unit testing I was able to save 1621 milliseconds (average) per query. At our current volume that should just cut that wait time by half.
Couple that with the fact we’re moving to new hardware (with more than double the CPU and triple the RAM) in January, and we should see the wait for this query fall off to nothing. I usually hate to say throw more hardware at it, but this time I’m going to leave it there, since the hardware has already been ordered.
The whole process took about 4 hours. Like I said, not easy. Definitely not quick either. After we throw the hardware at it, I am looking at re-factoring this query, if it remains to be one of the top 10 worst performing queries on my servers.
Do any of you out there have any suggestions for tracking down performance issues with queries that call views? What have you done to help performance? What was a waste of time?
Foreign Keys and Witchcraft
I’ve worked with hundreds databases in my career. I could probably design a handful of them from memory if asked. I have notes to remind me of lessons learned from most of them. When I start looking through that list one of the things I most often reference is the importance of Foreign keys. Why is that?
Because most of the time, all of the control over referential integrity has been moved up to the business logic or application layer. Developers will give me a handful of reasons why they didn’t add them to the database. Unfortunately many of the reasons can be proven false, once you start considering all the costs of not implementing them.
I’m creating this post, so I can refer to it later when developers start calling me a witch, and asking if they can burn me.
Myth #1: I can do updates quicker without foreign keys
This is probably the most common point I have to argue against. Developers think since they have to update the “parent” table first before they can update the “child” table, it will take more time. They usually fail to see the queries they have to write to make sure that ID exists in the “parent” table for every insert/update query they want to write.
Instead of having to write that validation query each time, why not let the database server handle that for you? MS SQL is a database management system. Let it manage those keys for you. Yes, you’ll have to trap an error in your code that handles the case where that primary key doesn’t already exist, but that’s what you’re doing now, more or less.
Myth #2: I won’t be able to import my files into the database with foreign keys
This is the second most common reason I receive from programmers on why they don’t want foreign keys. They want to read a file from another system (or client) directly into their dbms without considering the fact the data could be corrupt in some way. Instead they’ll import the data directly into their production tables, then try to clean it up.
How many of you built an SSIS package or DTS package to read data into your database, and got it right the first time? How many of those packages never received a bad file, a corrupt file?
What? None of you?
Of course you’ve had to handle errors on imports. That’s why you shouldn’t import directly to your production tables. Read the data into a workspace ( a disposable table). Read it in raw. Check for errors on that table. Once the data passes all your validation checks, then import it to your system.
I think perhaps the goal should be to never import bad data into the database. Catch it before it becomes an error.
Myth #3:Foreign key constraints cost you in performance
Another one I hear is “We don’t use foreign keys because they’ll cause the database to be slower.” And then I check the tables, and find an indexes covering every column (several times over).
Those indexes are costing you more than the foreign key constraint ever could. Every insert is having to be entered multiple times into your keys…that you’re using in your application layer to test data integrity.
If you’re having slow downs, you’ll often find the solution in letting the server do more for you, not less. If you limit your server by not letting it know there is a key relationship, the optimizer can’t take advantage of that knowledge.
Let’s take a quick look at an example of how a foreign key will actually help performance. Let’s do some counts.
select count(*) from table1 --659687, and it took logical reads 642 select count(*) from table2 --15204678, and it took logical reads 35574
Without a foreign key constraint between these tables, if we want to check the number of entries in table2 that match table1, we have to do a join. And we’ll have to include that join in every query we write, since the integrity cannot be confirmed without it.
select count(*) from table2 INNER JOIN table1 on table1.sharedKey =table2.sharedKey --14974478 /* Table 'table1'. Scan count 9, logical reads 705... Table 'table2'. Scan count 9, logical reads 35526... */
Now, if I add the foreign key constraint, we only have to count the records in table 2, since the server is now maintaining the integrity between the tables.
select count(*) from table2 --14974478, scan count9, logical reads 35574
Looks to me that 35574 is less than 36231 (705 + 35526). Smaller read counts are better than larger ones. Looks like adding a key made the query faster, rather than slower. Not even to mention that we can now improve the developers coding speed, since he or she no longer has to do a join to table1 each time.
Fact #1: It will be difficult to maintain consistent validation checks across all insert/update queries.
Since you don’t have the server maintaining the relationship between those tables, every time you write to the those tables, you’re going to have to do the validation yourself. That means that you will have to hope each developer does it the same way…or at least hope that they do it well enough to prevent bad data from getting into the tables. I’ve worked with marines before (OOOH-RAAA!), and they beat it into my head: “Hope is not a course of action!”
I’m sure I’ll find more notes on why foreign keys are not witchcraft. When I find them, I’ll update this post. If you have any reasons why developers should implement foreign keys, let me know! I want this article to be more complete. If you have arguments against it, I’ll listen. There may be some legitimate reasons to avoid them on occasion, but those I’ve seen are exceptions, not the rules.
Now, about those villagers with pitchforks… I think I’ll cast magic missile!
No Turning Back
Well I did it! As of this morning I’ve submitted my session “The Career Your Career Could Smell Like.” I’ve been working on this one for a while now. Usually I would just take time one on one to help fix a resume, or to give some advice on interviewing. But I’ve decided it’s time to kick it up a notch.
I want to help more people.
As I mentioned, I just started a new job. Within just a few days someone mentioned how there were several interviews before me, and they all were poor. They also revealed there were many more resumes that had been submitted for the role.
I started asking questions. What was bad about their interviews. What was bad about their resumes. They were happy to share. I was happy to listen. I wrote down as many notes as I could, I even was able to collect their resumes. I’ll be able to use a few of them as examples. (I’m definitely removing the identities to protect the innocent.
Some of the mistakes on the resume were pretty simple. When you’re applying for a DBA role, you don’t need to mention what types of laptop you used. You don’t have to list the versions of office you know. But you can also omit certain critical pieces of information. When you say you’ve installed instances of SQL Server, what versions? What editions?
And then there were the interviews. Some of the comments I got concerned their confidence. Some were overwhelmed by the whole process. Others made it seem like the interview process was boring and a formality. The trick is finding a balance between these extremes. You have to be confident in what you bring to the table. You have to know your strengths. You also have to know how to handle it when they tell you they need someone who can do more than you. But never make it seem like the interview is trivial.
Your interviewer(s) will take that personally.
You’ve got to be friendly.
There are some more hints and suggestions the interviewing team made, and I’m adding them to my speaking notes for my session. I really can’t wait to share this all with you! If you’re interested in a sneak peak (and are willing to provide creative criticism), I’m looking for a crowd to beta this with. Are you interested?



