CREATE TABLE Is a Policy Document: How I Stopped Winging Schema Definitions

A client called me once to look at a performance problem on their data warehouse. The query was slow, the tables were large, and the indexes were wrong. I fixed the indexes. But while I was in there, I noticed something else: every table in the warehouse had a different schema for tracking when a row was created or last modified. Some had CreateDate. Some had InsertedOn. Some had DateAdded as a VARCHAR. One table had no audit column at all. The timestamp columns were in different positions, used different data types, and had different default constraints — or no default constraints.

Nobody had been negligent. Different developers had built different tables at different times, and each one had made reasonable individual choices. The result, accumulated over years, was a schema that required you to look up the specific column name before you could query audit data on any given table.

That is what happens when you treat CREATE TABLE as improvisation instead of policy.

The Standard CREATE TABLE Template

Here is the template I apply to every staging and warehouse table I build:

CREATE TABLE [dbo].[TableName] (
    -- Business key / identity
    [TableNameID]       INT             NOT NULL IDENTITY(1,1),

    -- Business columns go here
    -- ...

    -- Audit columns — always last, always the same
    [CreatedDate]       DATETIME2(0)    NOT NULL DEFAULT SYSDATETIME(),
    [CreatedBy]         NVARCHAR(128)   NOT NULL DEFAULT SYSTEM_USER,
    [ModifiedDate]      DATETIME2(0)    NOT NULL DEFAULT SYSDATETIME(),
    [ModifiedBy]        NVARCHAR(128)   NOT NULL DEFAULT SYSTEM_USER,
    [SourceSystem]      NVARCHAR(128)   NULL,
    [SourceKey]         NVARCHAR(512)   NULL,
    [IsActive]          BIT             NOT NULL DEFAULT 1,
    [RowHash]           VARBINARY(32)   NULL,

    CONSTRAINT [PK_TableName] PRIMARY KEY CLUSTERED ([TableNameID] ASC)
);

Every table. Same column names. Same data types. Same position (always last). Same default constraints.

Why Each Column Is There

CreatedDate / CreatedBy — set on insert, never updated. Tells you when the row entered the system and what process or user created it. DATETIME2(0) drops the sub-second precision because nobody needs to sort by microsecond for audit purposes, and it saves two bytes per row across a billion-row table.

ModifiedDate / ModifiedBy — updated on every change. The SSIS package or stored procedure is responsible for setting these explicitly on updates; the default handles inserts. If ModifiedDate equals CreatedDate, the row has never been updated — that is useful diagnostic information.

SourceSystem and SourceKey — where the row came from and what its identifier was in the source. These are nullable because not every table has a single source system, but when they apply, having a standard place to put this information makes deduplication and lineage queries consistent across all tables.

IsActive — soft deletes. Physical deletes cause downstream joins to silently return fewer rows. A soft delete flag means you can see what was removed and when. Set the default to 1; set it to 0 when retiring a row.

RowHash — a hash of the business columns, computed at load time. Used for change detection: compare the incoming row's hash to the stored hash to decide whether an UPDATE is needed. This pattern shows up in SCD Type 1 and Type 2 implementations constantly. If the column is in the standard template, it is always there when you need it.

Naming Conventions Are Part of the Policy

The template enforces more than column structure — it enforces naming conventions by example. The primary key is always TableNameID, which means it is unique across the schema and never ambiguous in a join. The constraint name follows the pattern PK_TableName so it is always findable in system catalog queries. The schema is always explicitly specified.

These look like minor stylistic preferences until you are writing a query that joins six tables and needs to reference primary keys without aliases. Or until you are writing a script that generates ALTER TABLE statements from the system catalog and needs to find primary key constraints by naming pattern. Consistency at the template level enables automation at the query level.

The Snowflake Schema Objection

Yes, if you are building a normalized transactional schema, some of these columns do not belong on every table. A junction table does not need its own surrogate key. A reference table that never changes does not need ModifiedDate. The template is a starting point, not a straitjacket. Delete what does not apply. But start from the full template and consciously remove, rather than starting from blank and consciously adding — the second approach is how you end up with tables that are missing CreatedDate because you forgot.

A CREATE TABLE statement is a contract. It defines the shape of every row that will ever live in that table, and that shape is very hard to change cleanly once there is data in it. Treat it like the policy document it is. If you have settled on a different standard — different column names, different positions, different type choices — I'd be curious what drove those decisions. As always, I'm here to help.

Read more