Distributed Partitioned Views / Federated Databases: Lessons Learned
Posted by ~Ray @ 2007-11-27 20:07:35
You also have to create a check constraint on the same column in each table to create a partitioning “key”. Read more about this and other very important restrictions in Books Online at: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/aecc2f73-2ab5-4db9-b1e6-2f9e3c601fb9.htm. There are several good places in Books Online to read about partitioned views; this link will give you a starting point for the other ones.
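As a rough illustration of the check constraint requirement, here is a minimal sketch of a local partitioned view. The table names, columns, and year ranges are assumptions made up for illustration and are not taken from the article's own examples.

```sql
-- Hypothetical member tables: each one holds a single year of data.
-- The CHECK constraint on the same column (OrderYear) in every table
-- acts as the partitioning "key".
CREATE TABLE dbo.Sales_2006 (
    OrderID   INT   NOT NULL,
    OrderYear INT   NOT NULL CHECK (OrderYear = 2006),
    Amount    MONEY NOT NULL,
    CONSTRAINT PK_Sales_2006 PRIMARY KEY (OrderID, OrderYear)
);
CREATE TABLE dbo.Sales_2007 (
    OrderID   INT   NOT NULL,
    OrderYear INT   NOT NULL CHECK (OrderYear = 2007),
    Amount    MONEY NOT NULL,
    CONSTRAINT PK_Sales_2007 PRIMARY KEY (OrderID, OrderYear)
);
GO

-- The view unions the member tables together.
CREATE VIEW dbo.Sales AS
SELECT OrderID, OrderYear, Amount FROM dbo.Sales_2006
UNION ALL
SELECT OrderID, OrderYear, Amount FROM dbo.Sales_2007;
GO

-- Filtering on the partitioning column gives the optimizer the chance
-- to skip member tables whose constraint excludes the value.
SELECT OrderID, Amount
FROM dbo.Sales
WHERE OrderYear = 2007;
```

GO is the batch separator used by tools such as sqlcmd and Management Studio; it is needed here because CREATE VIEW must be the first statement in its batch.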
Note: Local partitioned views are popular in both OLTP and data warehouse projects. We rarely hear about problems with these, but there are known problems with very complex queries when the optimizer does not do partition elimination. I will not spend much time on local partitioned views in this article as I would like to concentrate on Distributed Partitioned Views.
Definition 2: Cross Database Partitioned View. Tables are split among different databases on the same server instance. Notice the three part name using the database in Example 2 below.
Note: The most common question I get from people attempting a view like this, one that has tables in multiple databases on the same instance, is about joins. You don’t lose much performance with cross database joins. This is something to think about because you will normally join this view to some reference tables for the application. If you put the reference tables in a database called COMMON, for example, then you will most likely see something like SELECT * FROM dbo.FACT JOIN COMMON.dbo.Customer ON ….. WHERE …
You will also notice in this example that I put each fact table in its own database to facilitate easier scale out across servers or instances if you need to do this at a later time.
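A cross database partitioned view of the kind described here might look roughly like the following sketch. Every database, table, and column name (FactDB1, FactDB2, Fact_1, Fact_2, BranchID, and so on) is a hypothetical stand-in, and the branch ranges are invented; only the COMMON database and the Customer table name come from the text above.

```sql
-- All names and ranges below are hypothetical.
CREATE DATABASE FactDB1;
CREATE DATABASE FactDB2;
CREATE DATABASE COMMON;
GO

-- One fact table per database, each with a CHECK constraint
-- on the partitioning column.
CREATE TABLE FactDB1.dbo.Fact_1 (
    BranchID   INT   NOT NULL CHECK (BranchID BETWEEN 1 AND 1500),
    CustomerID INT   NOT NULL,
    Amount     MONEY NOT NULL
);
CREATE TABLE FactDB2.dbo.Fact_2 (
    BranchID   INT   NOT NULL CHECK (BranchID BETWEEN 1501 AND 3000),
    CustomerID INT   NOT NULL,
    Amount     MONEY NOT NULL
);
-- Shared reference table kept in its own database.
CREATE TABLE COMMON.dbo.Customer (
    CustomerID   INT           NOT NULL PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL
);
GO

USE FactDB1;
GO
-- Three part names (database.schema.table) reach across databases
-- on the same instance.
CREATE VIEW dbo.FACT AS
SELECT BranchID, CustomerID, Amount FROM FactDB1.dbo.Fact_1
UNION ALL
SELECT BranchID, CustomerID, Amount FROM FactDB2.dbo.Fact_2;
GO

-- Join the view to the reference table in the COMMON database.
SELECT f.BranchID, c.CustomerName, f.Amount
FROM dbo.FACT AS f
JOIN COMMON.dbo.Customer AS c ON c.CustomerID = f.CustomerID
WHERE f.BranchID = 42;
```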
Definition 3: Distributed (across server or instance) Partitioned View. Tables participating in the view reside in different databases which reside on different servers or different instances. Note the four part name which includes the actual server name (or the cluster name if this is in a Windows Failover Cluster).
You will notice that the server name is missing from the first server. This view definition in Example 3 exists on server1. You cannot use a linked server to refer to the local server. You might immediately recognize a potential problem and try to create the view in Example 4 on server 2. However, the trick is to set up the linked server definitions so that the same view code in Example 3 is deployed to every server.
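For orientation, a distributed partitioned view as seen from one server could be sketched as below. This is not the article's Example 3 or Example 4; the server names (Server2, Server3), database names, and columns are assumptions, and the member tables are assumed to exist already on each server with a CHECK constraint on BranchID.

```sql
-- Run on the first server. Server2 and Server3 are hypothetical
-- network names of the other SQL Server instances.
EXEC sp_addlinkedserver @server = N'Server2', @srvproduct = N'SQL Server';
EXEC sp_addlinkedserver @server = N'Server3', @srvproduct = N'SQL Server';
GO

-- The local member table uses a three part name; the remote member
-- tables use four part names (server.database.schema.table).
CREATE VIEW dbo.FACT AS
SELECT BranchID, CustomerID, Amount FROM FactDB1.dbo.Fact_1
UNION ALL
SELECT BranchID, CustomerID, Amount FROM Server2.FactDB2.dbo.Fact_2
UNION ALL
SELECT BranchID, CustomerID, Amount FROM Server3.FactDB3.dbo.Fact_3;
GO
```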
Here is one production project to be used as a reference for DPV. The customer had 3,000 branch offices to automate and determined that the total size representing 3 months’ worth of data was going to be approximately 3.6TB. They didn’t want to risk putting this all on one server, so their basic approach was to break this into manageable size pieces. There are 6 servers in 2 geographically separated data centers. Each server has 2 instances so that if one server gets too busy they can easily move the second instance to another server. The data is spread out over 12 instances. The servers are 8 socket dual core machines with 16GB RAM. Each instance is responsible for approximately 300GB. The data is expected to multiply soon, which is why they selected machines this big.
Another reason they bought bigger machines than they need for a normal load is to have a high availability strategy. The three machines in each data center are in one cluster, and if one machine goes down another machine can pick up the load. If two machines go down, one machine will do the work of all 3, and they expect that performance will be reduced until the problem is fixed. If an entire data center goes down there is no solution in place yet; this is a later phase of the project.
The most important point in the success of this project, and what makes this project work so well, is that they are not using load balancing. The users at each branch are connected directly to a server that contains their data. So even though most of the inserts, updates and deletes are done through the partitioned views, the work is mostly local to one server. There are some corporate users issuing queries that need data from multiple instances, and it is expected that most of these queries will touch multiple servers.
1) When a command gets sent to every server when you think it should only go to one server. This happens when the query optimizer thinks it has to check the schema on every server, as in the case when the same collations are not used on all the servers (see note below in the lessons learned section).
2) When a cross-server join copies the records from the remote server to the local server and then performs the join. This is called a non-remotable query. The optimizer is pretty good at copying the smaller table (or result set) to the other server before performing the join. Still, it is a situation that should be avoided in order to get the most consistent performance. Try to make all the joins happen on one server (either all on the remote or all on the local) without copying records across the network. See the notes below for advice on how to avoid this.
Follow the guidelines in Books Online very carefully. There are many links from this main one. I recommend reading and re-reading these until you know the subject very well before you start: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/6e44b9c2-035e-4c88-907f-eef880c5540e.htm. If you are online and can get to this place, read it and any of the links it has on this page.
Avoid cross server joins whenever possible. One solution is to replicate all your reference/dimension tables to every server. In a cross server join the necessary records are copied from the remote server to the local server, then the join is performed.
Use the same collations in all databases. Otherwise the startup filters are not applied and the queries are always sent to servers with different collation.
Use the same session settings in all connections. Otherwise startup filters are not applied and the queries are always sent to servers with different session settings.
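One way to compare collations and session settings across the participating servers is to run the same small check on each of them and compare the output. This is only a sketch of how such a check might be done; it does not say which settings matter for startup filters, so consult Books Online for that list.

```sql
-- Collation of the current database. Run this in every member
-- database on every server and compare the results.
SELECT DB_NAME() AS DatabaseName,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- SET options currently in effect for this connection
-- (ansi_nulls, quoted_identifier, and so on).
DBCC USEROPTIONS;
```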
Create an index with the partitioned column as the leading column on the index because most of your queries will contain the partitioned column in the WHERE clause. The optimizer uses this index and the associated statistics to run queries more efficiently.
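As a small illustration, assuming a hypothetical member table partitioned on BranchID, the index could be declared like this. The table, column, and index names are invented for the sketch.

```sql
-- Hypothetical member table; BranchID is the partitioning column.
CREATE TABLE dbo.Fact_1 (
    BranchID INT      NOT NULL CHECK (BranchID BETWEEN 1 AND 250),
    SaleDate DATETIME NOT NULL,
    Amount   MONEY    NOT NULL
);

-- The partitioning column is the leading key column, so predicates
-- such as WHERE BranchID = 42 can seek on this index.
CREATE INDEX IX_Fact_1_BranchID_SaleDate
    ON dbo.Fact_1 (BranchID, SaleDate);
```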
After restarting SQL Server, create some startup stored procedures that will run the queries that you need. Otherwise the first user running each query will pay a heavier penalty because it has to check every server. These startup procedures will also create a local connection pool to each server. Even though creating a connection is fast, it will still be better if the first user doesn’t have to wait for this too.
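A warm-up procedure of this kind might be registered roughly as follows. The procedure name, the view FactDB1.dbo.FACT, and the BranchID column are hypothetical; replace the query with the ones your application actually runs. Startup procedures must live in the master database.

```sql
USE master;
GO

-- Hypothetical warm-up procedure. The result is stored in a variable
-- so that no result set is returned.
CREATE PROCEDURE dbo.usp_WarmUpDPV
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @rows INT;
    SELECT @rows = COUNT(*)
    FROM FactDB1.dbo.FACT
    WHERE BranchID = 1;
END;
GO

-- Mark the procedure to run automatically when SQL Server starts.
EXEC sp_procoption @ProcName    = N'usp_WarmUpDPV',
                   @OptionName  = 'startup',
                   @OptionValue = 'on';
```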
Don’t forget to do the tip in Books Online: turn on Lazy Schema Validation. This will give you better performance. It helps to prevent sending all queries to all servers.
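Lazy schema validation is a linked server option. Assuming a linked server named Server2 has already been defined (the name is hypothetical), it could be switched on and checked like this:

```sql
-- Turn on lazy schema validation for the hypothetical linked server Server2.
EXEC sp_serveroption @server   = N'Server2',
                     @optname  = 'lazy schema validation',
                     @optvalue = 'true';

-- Check the current setting for every linked server.
SELECT name, lazy_schema_validation
FROM sys.servers
WHERE is_linked = 1;
```

Repeat the call for each linked server that takes part in the view.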
Always have the correct data type in your statements. Avoid situations where SQL Server has to automatically convert to a type in your table. Otherwise the query always gets sent to every server for execution. In the following example the partition key column (name) is defined in all the tables as NVARCHAR.
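A minimal sketch of that idea, using invented object names rather than the article's own example: the view dbo.Customers and the length of 50 are assumptions. The point is only that the variable and literal are declared with the same type as the partition key column.

```sql
-- Hypothetical distributed partitioned view dbo.Customers whose
-- partition key column [name] is NVARCHAR(50) in every member table.

-- Matching type: an NVARCHAR variable and an N'...' literal,
-- so no implicit conversion is needed.
DECLARE @name NVARCHAR(50);
SET @name = N'Smith';

SELECT *
FROM dbo.Customers
WHERE name = @name;

-- Per the advice above, avoid declaring @name as VARCHAR(50) or
-- another type that differs from the column definition.
```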
The Estimated plan appears to show that all servers will be touched all the time. But in reality the startup filter will eliminate the partitions at run time. This is by design, and the challenge is to inform yourself about the definition of a startup filter.
Related article:
http://sqlcat.telligent.com/technicalnotes/archive/2007/09/11/distributed-partitioned-views-federated-databases-lessons-learned.aspx