Linked Servers are a very popular way to access distributed data in environments with lots of SQL Server instances on different serves, or even on other Database Management Systems like Oracle or MySQL. While linked servers can be very useful if used correctly, I have found that more often than not, they are not used correctly and are a source of lots of performance problems.  In this article we’ll take a look at using linked servers the right way (and a few wrong ones).

Linked Server Review

To start with, lets look at how a linked server works.  We configure a linked server in by specifying the remote data source and a name for the sever.

linked1

This allows us to use the linked server name ADVENTUREWORKS to access the AdventureWorks database on the instance RKLAPTOPSS.  Note that the Linked Server name can be whatever we want it to.  It does not have to be the name of the instance. I prefer this practice because you can use the same linked server name in multiple environments (Development, QA, Production, etc.)

Authentication settings have to be configured as well.

image

In this case, I am saying to just pass through the login’s security context to the remote server.  I could also choose to always use a certain security context or to map local login’s to remote logins. The security implications of these choices are beyond the scope of this post but the most important thing is that in most cases Options 2 and 4 should not be used.

Four Part Naming in Linked Servers

Now that we have a linked server set up, we can access it from the local SQL Server instance.  The most common way to do this is to use four part names to refer to objects on the remote server.

<LinkedServerName>.<DatabaseName>.<Schema>.<Object>

SELECT *
FROM ADVENTUREWORKS.AdventureWorks.HumanResources.Employee
WHERE Gender = 'M'

While this is the most popular way to use Linked Servers, it is also the worst way in terms of performance.  To understand this, you need to always ask the following question: “Where will the query be optimized?”

In this case, the optimizer on the local server will determine the execution plan.  But it doesn’t know anything about indexes and statistics that exist on the Employee table on the remote server.  There may well be an index on Gender which could be taken advantage of, but the local optimizer has no way to know that.   So this query will result in a table scan, every time, regardless of remote indexes. In addition, all of the data will be sent across the network only to be filtered for Gender on the local system.

OPENQUERY()

A better way to do this is to use a function like OPENQUERY().  Essentially what we are doing is executing a remote query on the linked server and just asking for the results back.

SELECT *
FROM OPENQUERY
   (
    ADVENTUREWORKS,
    'SELECT * FROM AdventureWorks.HumanResources.Employee WHERE Gender = ''M'''
   )

The function takes two parameters.  The remote server to execute the query on, and the query itself.  Note that this is remotely executing so if the server was to another DBMS, you’d want the query written in that engine’s flavor of SQL.

This is a simple example, but imagine if the remote table had a million plus rows.  The inefficient table scan plus bringing all that over the network so that it can be filtered locally.  I have “fixed” queries that were taking 12+ hours to run and reduced the time to just minutes by making this simple change to the code.