D.S

sungsoo.github.io

Where are the GPU based SQL databases?

Where are the GPU based SQL databases? Better Motivation, Different Thinking Sung-Soo Kim's Blog Home Tags Categories Archive Where are the GPU based SQL databases? Tags big data 255 24 December 2015 Article Source Title: Where are the GPU based SQL databases? Where are the GPU based SQL databases? Data processing and gathering has come a long way since the initial automation improvements in the late 19th century. Despite a plethora of academic research, we have yet to see a market ready GPU based SQL database. Until now. Humble beginnings The mechanized database solution invented by Herman Hollerith of the US Census pioneered the way for data collection and processing. His census tabulating machine shaved months off the US census processing time, from the 13 years time projected. Using a clock-like counting device to display the results, when a predetermined position on a specially designed punch-card was scanned, the counter’s electromagnet moved the clock hand one point forward. The operator of the machine would place a card in the reader, pull down a lever and then remove the card once the holes had been counted. We can compare the successive stages of punched-card processing to fairly complex calculations in SQL. Each stage can compare to this full SQL query: SELECT (filter columns) WHERE (filter cards, or “rows”) By the end of all cycles, we can liken the result to the previous query, with the addition of a GROUP BY clause for the totals and counts. Hollerith’s tabulating machine. Source: IBM In fact, Hollerith’s machine was so successful, it launched his company into what we know today as IBM. IBM continued developing these machines and they were still used well into the 1950s, even when electronic computers had already been invented. This revolution brought on by punch cards, proliferated into many different areas outside of the US government. Many corporations used this to redesign their administration. IBM was naturally the leader in this form of enterprise tabulating. The advent of the relational model and SQL In 1970, Edgar Codd, an employee at IBM’s research laboratory published his revolutionary relational database model research paper titled “ A Relational Model of Data for Large Shared Data Banks “. In this relational model, all data is represented in tuples, grouped into relations. This means that by a simple organization, the data could be used as a tool for complex querying – providing valuable hidden analytics. Like many other revolutionary ideas, the relational model wasn’t easy to push forward. IBM chose to focus on its existing product named IMS/DB running on the then ubiquitous System/360. It was already making the company a lot of money. Unfazed, he proceeded to show his new model to IBM customers who in turn pressured IBM into including it in their upcoming System R. “A database – illustrated”. Source: IBM   IBM, still not keen on cutting off its IMS/DB cash cow isolated the System R development team from Codd. The development team came up with an alternative representation known as SEQUEL as early as 1974. By separating the data from the applications accessing the data and enabling the manipulation of the information through an English like query language, the hidden analytics could be extracted through construction of logic statements in the relational model. Despite its drawbacks, SEQUEL was so vastly superior to other existing non-relational database implementations, that it was eventually replicated into Larry Ellison’s Oracle Database using only pre-launch whitepapers. Oracle Database was eventually first to market in 1979, before IBM’s SQL/DS in 1981 which later became DB2. Alternatives to SQL Today, SQL and the relational model are the most widely used computer languages for databases, despite newcomers like NoSQL and clustered database solutions like Hadoop. For many startups, NoSQL might be a good idea because it allows the fast collection of a lot of data, even when you don’t know when and if it’ll be used. Further down the line, when the company matures, the complexity of migrating to structured data might become a big expense in time, manpower and money. Because NoSQL is non-relational, most implementations do not provide features like persistence, transactions or durability contrary to the prevailing practice among relational database systems. Despite the attraction it generates, SQL continues to be a mainstay for big corporations, gaining even more adoption in the big data market.   In comes the GPU (Graphics Processing Units) Over the past decade, graphics processors have made leaps and bounds and have been found to add significant value in both research and industry, where better performance in data intensive operations is needed. Because of its structure, the GPU enables a single ‘instruction’ to be performed on huge chunks of data simultaneously (SIMD, Single Instruction, Multiple Data), compared to a general purpose CPU which typically has a smaller scale implem