Recently, I’ve been studying Microsoft PDW as part of an ETL integration project. This got me thinking about the distributed database computing paradigm and about scaling out SQL Server performance for OLTP purposes. PDW utilizes multiple compute nodes, all running SQL Server instances, to help scale very large database implementations, but its main purpose is as an OLAP-type repository for a BI infrastructure, not as a mechanism to host OLTP databases.
Interfacing with PDW requires interacting with a layer that sits on top of the standard T-SQL interface for both DDL and DML purposes. PDW provides syntax so that a database can be defined in a distributed fashion (i.e. replicating out dimension-type data and hash-partitioning fact-type data) and queried so that the distributed compute nodes are put to work to bring back the data. More about PDW is on the Microsoft site at http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx. PDW has made progress in shortening very large queries and enabling larger databases, but it is not really an HPC solution for OLTP databases. It is oriented to support very large queries on very large databases, mainly from other servers in a BI infrastructure, rather than being a user-centric transactional database supporting thousands of simultaneous users.
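To make the distributed-definition idea concrete, here is a minimal sketch of PDW-style DDL. The table and column names are my own illustrations, not from any real schema; the `DISTRIBUTION` options are the PDW mechanism referred to above.

```sql
-- Hypothetical schema: a replicated dimension and a hash-distributed fact.
-- A replicated table keeps a full copy on every compute node, so joins
-- against it never require data movement.
CREATE TABLE DimCustomer
(
    CustomerKey  INT          NOT NULL,
    CustomerName VARCHAR(100) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);

-- A hash-distributed table spreads rows across the compute nodes by
-- hashing the chosen column, so scans are parallelized.
CREATE TABLE FactSales
(
    SalesKey    BIGINT        NOT NULL,
    CustomerKey INT           NOT NULL,
    SaleAmount  DECIMAL(18,2) NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey));
```

Hashing the fact table on the key it is most often joined on keeps joins co-located on each node, which is the main design decision this syntax exposes.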
In addition to PDW, SQL Server 2014 is on the horizon with Hekaton (http://research.microsoft.com/pubs/193594/Hekaton%20-%20Sigmod2013%20final.pdf) to enable higher-performance database processing, including OLTP scenarios. Hekaton provides a mechanism to map tables directly into memory and rely exclusively on high-throughput logging to ensure integrity. It will be interesting to see how Hekaton is utilized by organizations, and whether it also becomes more integrated into the BI aspects of SQL Server, including PDW.
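A rough sketch of what a Hekaton memory-optimized table looks like in the announced SQL Server 2014 syntax follows; the table name, columns, and bucket count are illustrative assumptions, and the database is assumed to already have a `MEMORY_OPTIMIZED_DATA` filegroup.

```sql
-- Hypothetical memory-optimized table (SQL Server 2014 / Hekaton).
-- MEMORY_OPTIMIZED = ON maps the table directly into memory;
-- DURABILITY = SCHEMA_AND_DATA means changes are persisted purely
-- through the high-throughput transaction log, as described above.
CREATE TABLE dbo.SessionState
(
    SessionId   INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserData    VARBINARY(8000) NULL,
    LastTouched DATETIME2       NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

Memory-optimized tables require at least one index (here a hash index on the primary key), and choosing `SCHEMA_AND_DATA` is what makes log throughput, rather than data-file I/O, the durability bottleneck.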
My belief is that even though the enabling components are present, there does not seem to be an obvious design pattern for a high-performance, very-low-latency SQL Server OLTP database that also has very large/scalable storage requirements. Such databases have generally required large amounts of custom development to implement. One example was the MySpace implementation, which involved 440 different SQL Server database instances (http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004532). Over the next few months I hope to test the solution outlined below, which has the benefit of supporting standard drivers and database interfaces within SQL Server without database sharding (http://www.codefutures.com/database-sharding/), which normally entails significant application development to manage. The goal is to utilize standard SQL Server database implementation techniques on top of the Windows Server storage infrastructure to incorporate distributed storage over high-speed networking.
Some of my previous posts have focused on the benefits of PCIe SSDs, including the use of Fusion-io cards within a single server. I was able to achieve over 5 GB/s of database backup throughput, enough to nearly saturate even a 56Gb InfiniBand connection (56 Gb/s is roughly 7 GB/s, so 5 GB/s consumes about three-quarters of the link), the highest-throughput network configuration that supports standard server protocols in data centers. Windows Server 2012 has made significant advances in SMB, which can be leveraged by iSCSI interfaces sitting on top of InfiniBand, providing potentially 56 Gb/s of throughput. Enabling high-speed networking over the iSCSI interface effectively allows high-speed storage devices hosted on multiple nodes to deliver very low latency and high throughput. This inspired the idea for the HPD (High Performance Database).
The design goal is simple: utilize standard SQL Server tools and Microsoft platform components to build a database on a single SQL Server instance, which could be clustered or mirrored, but leverage high-performance storage devices not only on the hosting node but also on distributed nodes, to support very large OLTP databases. The technique for leveraging the distributed storage is iSCSI over InfiniBand. Prior to this I had experimented with iSCSI to create a SQL Server cluster, so I knew that this part would work.
I almost gave up on the HPD idea because of concerns about the processing cycles the single SQL Server instance would need to spend handling high-speed data interactions with the networked nodes, but then I noticed this white paper: http://www.mellanox.com/related-docs/whitepapers/WP_Scalable_Storage_InfiniBand_Final.pdf
This paper, along with other research indicating that the latest SMB and iSCSI implementations push much of the storage-retrieval work onto the data-hosting nodes, makes me think the idea is worth testing. The key enabler is that much of the processing is done natively in the Mellanox cards rather than consuming server resources. This is especially true when utilizing the RDMA interface, as discussed in this article (http://embedded-computing.com/pdfs/Mellanox.Apr07.pdf). The diagram below, from the Mellanox paper, illustrates the concept:
The paper includes the following comment with the graphic:
VSA based appliances are being used to enhance the performance of clustered databases by providing a “Fan-In” approach. High-end SMP machines, like the HP DL 580, can process more transactions than can be handled by local storage or a dedicated SAN. A solution is to provide a virtual RAID across disks from five physical storage servers. In this example, InfiniBand was able to saturate the IO bus of the DL580. In this configuration, performance was 23Gb/s with 2.5M Random IOPs. An equivalent storage configuration with Fibre Channel would require 50 FC wires!
Based on this, a robust SQL Server instance on a machine with a high core count and a large amount of memory (e.g. 2 TB RAM, 144 cores) could utilize a dozen or so other servers to achieve ultra-high OLTP performance, greater than any benchmark based purely on local storage or on a large SAN, even one with multiple high-speed HBAs. Distributing the PCIe SSDs out to the various nodes, which function only as file servers exposing iSCSI targets, and mapping the iSCSI drives to the SQL Server database has the potential to provide higher throughput than a monolithic structure where the database server is attached to a traditional SAN for storage, even when the SAN hosts SSD devices.
I am very busy on multiple projects currently, but my next skunk-works project will be to take my two smaller servers that have Fusion-io PCIe SSD cards and connect them via InfiniBand. I will configure one server as the SQL Server database instance, utilizing both its local PCIe SSD and the PCIe SSD on the other server over InfiniBand using iSCSI. If my hunch is right, the overall throughput of the database server will be higher when leveraging the additional storage from the other server than it could be if all of the storage were local.
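Once the remote iSCSI volumes are mounted as ordinary drive letters on the SQL Server node, spreading the database across local and remote storage requires no special syntax. A minimal sketch, where the drive letters, sizes, and file names are my assumptions:

```sql
-- Hypothetical layout for the two-server test:
--   E:\ = the local Fusion-io PCIe SSD on the SQL Server node
--   F:\ = an iSCSI-mounted volume backed by the remote server's PCIe SSD
CREATE DATABASE HPD
ON PRIMARY
    (NAME = hpd_local,  FILENAME = 'E:\Data\hpd_local.mdf',  SIZE = 100GB),
FILEGROUP FG_Remote
    (NAME = hpd_remote, FILENAME = 'F:\Data\hpd_remote.ndf', SIZE = 100GB)
LOG ON
    (NAME = hpd_log,    FILENAME = 'E:\Log\hpd_log.ldf',     SIZE = 20GB);
```

SQL Server sees both volumes as ordinary block storage, which is precisely why standard T-SQL, drivers, and applications work unmodified on top of this topology.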
Although the solution is limited in scalability, since there is still only one main SQL Server instance, it provides a topology to scale storage out to the maximum potential of the hosting machine’s CPU. The typical bottleneck with SQL Server is on the I/O side rather than the CPU side; this approach scales storage out so that CPU, rather than I/O, becomes the limit for a very large SQL instance, while still supporting standard T-SQL and driver requirements so that existing applications can connect to the database without any modifications. This could help customers who are unable to scale their current database server to meet higher-throughput or lower-latency application requirements, without needing to modify those applications.
As for where Hekaton fits, I see it as yet another component of the high-speed database: storing highly accessed tables in memory, and leveraging multiple PCIe SSD devices connected together via Microsoft’s Storage Spaces technology for the logging and the non-memory-resident tables. Potentially, this type of approach could also be leveraged by the compute nodes in a PDW implementation. Since I am building my solution on top of standard Microsoft protocols and components, it is perfectly compatible as a design pattern to complement Microsoft’s latest SQL Server product implementations.
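Combining the two pieces might look like the following sketch, assuming an existing database (here called HPD, a hypothetical name) and assuming S:\ is a Storage Spaces virtual disk striped across multiple PCIe SSDs:

```sql
-- Add a Hekaton filegroup for memory-optimized data; its checkpoint
-- container lives on the striped Storage Spaces volume (S:\ is assumed).
ALTER DATABASE HPD
    ADD FILEGROUP FG_InMemory CONTAINS MEMORY_OPTIMIZED_DATA;

-- The FILENAME for a memory-optimized filegroup is a directory, not a
-- data file with an extension.
ALTER DATABASE HPD
    ADD FILE (NAME = hpd_inmem, FILENAME = 'S:\Data\hpd_inmem')
    TO FILEGROUP FG_InMemory;
```

With this in place, hot tables can be created `MEMORY_OPTIMIZED = ON` while everything else stays on the conventional filegroups, which matches the division of labor described above.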
As for where my concept fits into a larger BI infrastructure that includes predictive analytics, one significant advantage of a high-throughput/high-capacity OLTP database is the ability to perform pre-aggregations and inject intelligent data into data warehouses. This facilitates near-real-time responses based on just-in-time analytic processing as data is received, rather than as a later process. Ultimately, lower-latency predictive analytics are enabled when OLTP databases can perform aggregative functions or execute predictive models as data arrives, rather than in later post-processing.