[MUSIC] Fast forward one more year, and there's a 2012 paper on a system called Spanner. I just want to mention these quotes, and then we'll talk a little bit about the system. This one is still being explored by the community. It's not actually available for use, but the paper and the idea are being explored. There have been a couple of open-source implementations of the ideas in Spanner, but they're not quite as popular as some of the open-source implementations of the other Google systems.
Okay. So, it says: even though many projects happily use Bigtable, we've also consistently received complaints from users of Bigtable. It can be difficult to use for certain kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. Okay? And then it goes on to say: we believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.
To this, the database community could have said, well, sure, duh, right? That's exactly the point: system-supplied support for transactions is always a win, because it's difficult, error-prone, and expensive to try to do this at the application level. More importantly, it's fundamentally wrong in some sense to do it at the application level, because the application doesn't have global knowledge of what's going on. Only the system does.
And the final quote here: although Spanner is scalable in the number of nodes, the node-local data structures have relatively poor performance on complex SQL queries, because they were designed for simple key-value accesses. And then: algorithms and data structures from the database literature can improve single-node performance a great deal.
You know, it's somewhat of a Google-style approach to the problem: reboot everything, rebuild it all from scratch, and then cherry-pick and bring things in. This has been working pretty well, and they have had fantastic impact in the community. But there is a lot out there in the database literature and in database systems that could have been used from the start. In fact, starting from the beginning and just saying, we're going to build a big Google-style parallel database, may have been a good choice, rather than getting completely away from it and then coming back incrementally and finding yourself with a SQL system.
Now, I sort of skipped what Spanner is, but it's a planet-scale database system. There is a SQL-like language. Well, actually, I know what I'm missing: I'm missing our table here. Let me flip back up to it. So here it is, down here. I'm missing this slide where I showed it. So, really big scale. Primary access is the [INAUDIBLE] access. By other attributes. There are transactions, and in fact they're global this time: real, actual transactions. It's not clear to me whether joins are supported. I suspect they are, because they keep talking about SQL, but I couldn't find an example either way. There is a notion of schema, and they do protect against data that doesn't conform to the schema. There is some notion of logical data independence, although they don't talk about it much. There is a SQL-like declarative language on top of it. I didn't see much evidence that they're doing a whole lot of fancy optimization, and I did just show you that quote where they say that performance is sort of poor on complex analytic queries. But that's something that can probably come along somewhat quickly.
So that's Spanner at a high level. Let me give you a couple more details about what the system does. The data model here is this notion of directories.
And these are a set of contiguous keys with a shared prefix. So you can think of a directory as kind of like what a tablet was in Bigtable, but now they have this notion of multiple logical tables being interleaved. If you're not used to staring at the syntax, don't worry too much. But for those of you who are thinking in terms of DDL and a relational database, they have a kind of CREATE TABLE language that looks like this. You create table Users with two columns, and you give it this keyword, DIRECTORY. Then you create table Albums with some columns, and you have this keyword, INTERLEAVE IN PARENT Users. What you end up with is something like this, where there's a user with all of its albums, and then another user with all of its albums.
As you can see here, and as we've been talking about, all these different systems are experimenting with ways of getting these nested, hierarchical data structures that look a lot like what we saw way back in the sixties. Right? And the motivation is the same as it was then: it's really, really fast. When you want to pull up a user and then immediately pull up all of its albums, access is really fast this way. But I'd speculate that the reasons why the relational approach eventually replaced these can and will apply here as well: performance is not the number one priority a lot of the time. It's minimizing the amount of developer headaches.
Okay. So, it remains to be seen. But I think that this incremental walk, step by step, towards a big new scalable relational database is underway. Now again, that doesn't mean I'm saying use all the old databases. They really were designed for a different workload, and there really is no evidence that they scale to some of these levels. But that doesn't mean that you throw out everything that we learned. [MUSIC]
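To make the interleaving idea from the directory discussion a bit more concrete, here is a minimal Python sketch. It is not Spanner code and does not reflect how Spanner is implemented. The row contents, the tuple key layout, and the `directory` helper are all made up for illustration. It only shows why sorting on a shared key prefix puts each user right next to its albums, so that pulling up a user with all of its albums is one contiguous range read.

```python
from bisect import bisect_left

# Hypothetical rows. Keys are tuples: (uid,) for a Users row and
# (uid, aid) for an Albums row, so an album shares its user's key prefix.
rows = {
    (1,): {"table": "Users", "email": "a@example.com"},
    (2,): {"table": "Users", "email": "b@example.com"},
    (1, 1): {"table": "Albums", "name": "Holiday"},
    (1, 2): {"table": "Albums", "name": "Pets"},
    (2, 1): {"table": "Albums", "name": "Work"},
}

# Sorting by key interleaves the two logical tables: (uid,) sorts just
# before every (uid, aid), so each user is followed by its own albums.
ordered_keys = sorted(rows)


def directory(uid):
    """Return the contiguous run of keys that share the prefix uid."""
    start = bisect_left(ordered_keys, (uid,))
    end = bisect_left(ordered_keys, (uid + 1,))
    return ordered_keys[start:end]


if __name__ == "__main__":
    for key in directory(1):
        print(key, rows[key])
```

The point of the sketch is the design trade-off from the lecture: the nested layout makes the user-plus-albums lookup a single range scan, at the cost of baking one access path into the physical layout.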