Much ado about GUIDs

How does one enforce referential integrity across separate databases, or servers for that matter?  The answer is simple—there is nothing native in SQL Server to enforce integrity outside of Primary and Foreign Keys which are scoped within a database.

However, you can go outside the box and use a Uniqueidentifier, often referred to as GUIDs (globally unique identifiers), but everything is manual. Designing a federated database can be somewhat cumbersome, but for good Database Engineers and Architects it is not that hard to set up, just a little forward thinking. Then the question is: Why do we not see more federated designs? The answer is not in the design; it has to do with the GUID.

Lets take a look at a normal simple table:

CREATE TABLE ExampleInt
 
(
 
  Col1   [INT]  IDENTITY(1,1) NOT NULL
 
                 CONSTRAINT PK_ExIntCol1 PRIMARY KEY
 
,Col2   [NVARCHAR](255)
 
)

In our example above we see we have an identity as the Primary Key which will auto increment with an anticipated order Ascending in a numeric fashion. This allows us to easily Cluster on the Primary Key and we can have all our extents in order and it will dramatically cut down on fragmentation.

Now let’s look at a table with a GUID as the primary key:

CREATE TABLE ExampleGuid
(
Col1 [UNIQUEIDENTIFIER]
CONSTRAINT PK_ExGuidCol1 PRIMARY KEY
CONSTRAINT DF_ExGuidCol1 DEFAULT NEWID()
,Col2 [NVARCHAR](255)
)

In this example we are using a NEWID() to create a UNIQUEIDENTIFIER which is a 32-character, alphanumeric data type that is completely random. That means it has no order whatsoever, and that also means your data is no longer sequential like it is with an Identity. Populate this guy with a few million rows and unless you have solid state drives you will probably hear the hard drive head units thrashing frantically across the platters desperately trying to return your data. Performance issues? Like they say in Fargo, “You Betcha!” Not to mention that you may have to adjust your fill factor dramatically to deal with the copious page splits, and may have to index nightly (if not more frequently, depending on how you use the table).

This is the real reason we do not see more federated designs, because of the nature of the GUID—not because we lack the skills to design or the desire to engineer the right solution. With the release of SQL Server 2008, everyone hoped that the performance issue with the random GUID would be gone as Microsoft came out with a function called NEWSEQUENTIALID ().  The dream was now alive as we possibly had the power to create unique alphanumeric GUIDS that would be sequential so that we could order them in an Ascending order and cluster on that order. Everyone was very excited and the buzz phrase “federated database” was on every salesperson’s lips whispering in the CTO’s ear.

Unfortunately, there was a very big problem with the NEWSEQUENTIALID(). First is it’s merely a wrapper of Windows API UuidCreateSequential, which does not have order guarantee according to its documentation. It actually worked pretty good for the most part, until you rebooted the system and then it would hash out a new key and start ordering sequentially based on the new hash. So as long as you never rebooted your system, it would work for the most part. There were some other issues with the function, but that was the big one.

You cannot expect to have maintenance on a server without rebooting the system every once in a while, and if that action triggers the primary key jumping around the anticipated increment… that is probably a deal breaker in most environments. When you select from a table ordered by the primary key, one does expect to be able to anticipate the result set. A new function never came up, and the NEWSEQUENTIALID() is used for solutions that need a federated design.

Now it would be horrible to take you down this rabbit hole just to end this blog with “GUIDS need work.” There is a better way. GUIDS can be a great tool, and having them in your toolbox can be instrumental, but like all good things know when to use it and how to leverage its full random sequential ability is key. Let’s take a quick look at a GUID:

Looking closer at guids

 

 

 

 

 

 

 

Looking at the GUID we see a 32-character alphanumeric value separated with 8 characters, a  dash, 4 characters, a dash, 4 characters, a dash, 4 characters, a dash, and then the last 12 characters. Here is the important thing: just like machine code/binary (1s and 0s 100101), the GUID is read from right to left. That means if we were to manipulate the GUID we would need to change just the end to make it sequential.

An easy way to do this is to configure a datetime or timestamp. A date time could be read in varchar as well as the 32-character GUID. You know what else converts to varchar easily? Binary, which is read right to left, where varchar is read left to right. So if we concatenate our manipulation of functions using binary, we will have the correct allocation of where we want the order by to be on the GUID. It will be much easier when you look at the solution below, which is how to create a SEQUENTIAL RANDOM GUID:

Looking closer at guids

 

 

 

 

 

It looks completely random but it is in fact sequential within 900 milliseconds (yes, that is the downfall if you are inserting 10 rows in a batch, those 10 records will get a unique GUID but they may not be sequential within that small of a batch). Using the built-in functions NEWID() and GETDATE(), we concatenate them together converting them into binary, and then simply convert back into a GUID. Looking back at our Guid table it would now look like so….

CREATE TABLE ExSeqGUID
(
Col1   [UNIQUEIDENTIFIER]  NOT NULL
CONSTRAINT PK_ExSeqGuid_Col1 PRIMARY KEY
CONSTRAINT DF_ExSeqGuid_Col1 DEFAULT
CONVERT(UNIQUEIDENTIFIER, CONVERT(BINARY(10), NEWID()) +
CONVERT(BINARY(6), GETDATE())),
Col2 INTEGER IDENTITY
);
GO
INSERT INTO dbo.ExSeqGUID DEFAULT VALUES
GO 1000
SELECT  *
FROM    dbo.ExSeqGUID
ORDER BY Col1

In conclusion, we have a way to make sequential GUIDs and have them set up as primary keys using defaults on the table definition. The performance gain over the NEWID() function is overwhelming, and it is more predictable than using NEWSEQUENTIALID(), and unlike the function, this sequential GUID can be created anywhere, not just in the table definition.

Please share this

Leave a Reply

Related Articles

A toy machine and a microphone on a table Description automatically generated

The Right Tool for the Right Job

I’m looking forward to 2024. From many perspectives, I think we are going to see great advancements, including in technology, that on one hand will make our lives easier, but also make our decisions a bit harder.

Read More »