Patrick Valduriez, a leading database researcher and LeanXcale's new Scientific Advisor, talks to us about the latest trends in the database world, his new position at LeanXcale, and the upcoming edition of the best-selling book he co-authors with Professor Tamer Özsu of the University of Waterloo in Canada.
Q. Hi Patrick, and thank you very much for being here with us. It’s a pleasure to talk with you about the database world. But first, we want to know more about your decision to join LeanXcale.
You have recently joined LeanXcale as Scientific Advisor. Why did you take that step?
Patrick Valduriez: It is a great opportunity for me, and probably the right time, to go further in applying the principles of distributed and parallel databases to real-world problems. LeanXcale has a disruptive technology that can make a big difference in the DBMS market. I am pleased to be part of this exciting adventure and to have the chance to work with a great team of researchers and engineers in Europe.
Q. How did you first come into contact with LeanXcale?
Patrick Valduriez: I first met Professor Ricardo Jimenez-Peris (LeanXcale’s CEO and founder) in 2005 at a VLDB workshop on database replication, where we both gave talks. After some discussion, it became obvious that each of us could learn a lot from the other. Ricardo is a leading expert in transaction management and database replication, which nicely complements my expertise in distributed and parallel query processing. We therefore started doing joint research on distributed and parallel data management, and we have been good friends ever since. In 2013, Ricardo invited me to participate in the CoherentPaaS European project, in which we developed the CloudMdsQL polystore. LeanXcale was created during the project. Since then, our collaboration has continued, producing excellent research results.
Q. Could you give us an overview of today's database market?
Patrick Valduriez: For the last 30 years, the database market has been dominated by relational DBMSs, which have proved effective in mission-critical application domains (e.g., transaction processing and business intelligence). In particular, the SQL language has fostered their wide adoption among both tool vendors and application developers. However, with the advent of big data, RDBMSs have been criticized for their “one size fits all” approach. As an alternative, more specialized NoSQL DBMSs, such as key-value stores, document stores and graph DBMSs, have emerged that can scale out on large clusters of commodity servers. However, this scalability has typically been achieved by relaxing database consistency. NewSQL is a recent class of DBMS that seeks to combine the scalability of NoSQL systems with the strong consistency and usability of RDBMSs. An important class of NewSQL is Hybrid Transaction and Analytics Processing (HTAP), whose objective is to perform real-time analysis on operational data, thus avoiding both the traditional separation between the operational database and the data warehouse and the complexity of managing ETL processes. LeanXcale is at the forefront of the HTAP movement, with a disruptive technology that provides ultra-scalable transactions, polyglot queries and key-value capabilities, among many other features.
Q. You have been involved in the startup world before. How was that experience?
Patrick Valduriez: In the 1990s, I managed Dyade, a joint venture between Bull and Inria set up to foster the development of core technologies in information systems. Dyade was a great success, with some major technology transfers into Bull products and four startups that are still in business (TrustedLogic, Kelkoo, Jalios and Scalagent). I was directly involved in transferring the Disco technology (an Internet data integration system), which I developed with my Inria team, to Kelkoo, a successful price comparison site. I learnt a lot from this experience, in particular that, in addition to excellent research results, a strong knowledge of the business domain and a clear vision of the future are critical.
Q. With Professor Tamer Özsu (University of Waterloo, Canada), you are co-author of “Principles of Distributed Database Systems”, the best-selling textbook on the topic. From the first edition published in 1991 to the upcoming fourth edition, how has the world of distributed database systems evolved?
Patrick Valduriez: Distributed database systems have moved from a small part of the worldwide computing environment a few decades ago to the mainstream today. The editions of the book reflect this impressive evolution. The first edition describes relational distributed database systems involving just a few geo-distributed sites. The second edition introduces single-site distributed database systems, also called parallel DBMSs, and object-oriented distributed database systems. The third edition reflects the accelerated research on distributed data management technologies in the preceding years, in the context of P2P, clusters, XML, data streaming, Web data integration systems and cloud computing. As a result, the book has become quite big (850 pages).
Q. What are the main updates in the new edition of the book?
Patrick Valduriez: First, to make room, we removed some background material, which is now well presented elsewhere, and reorganized and updated previous chapters. Second, we added new material on recent hot topics such as big data, NoSQL, NewSQL, polystores, web data integration and blockchain. As a short preview, note that there is a section on LeanXcale’s ultra-scalable transaction management approach in the transaction chapter and another section on LeanXcale’s architecture in the NoSQL/NewSQL chapter. My co-author and I thought these deserved to be in the book.
Q. As Scientific Advisor of LeanXcale, what is your role?
Patrick Valduriez: I see my role as a sort of consulting chief architect for the company, providing advice on architectural and design choices as well as on implementation techniques. I will also do what I like most, i.e., teach the engineers the principles of distributed database systems, monitor technology trends, write white papers and blog posts on HTAP-related topics, and give presentations at various venues.
Q. What are you currently working on at LeanXcale?
Patrick Valduriez: The first topic is query optimization, based on the open-source Apache Calcite framework, where we need to improve the optimizer's cost model and search space, in particular to support bushy trees in parallel query execution plans. The second topic is adding a JSON data type to SQL, inspired by the now well-known SQL++ language, in order to combine the best of relational DBMSs and document NoSQL DBMSs.
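For readers unfamiliar with bushy trees, here is a minimal illustrative sketch of why supporting them both enlarges an optimizer's search space and helps parallel execution. It is our own textbook-style illustration, not LeanXcale or Calcite code, and the function names are hypothetical. It counts join orderings for n relations, n! left-deep trees versus (2n-2)!/(n-1)! bushy trees, and compares the number of sequential join levels: n-1 in a left-deep chain versus ceil(log2 n) in a balanced bushy tree, whose independent subtrees could be executed concurrently.

```python
from math import factorial


def left_deep_trees(n: int) -> int:
    """Hypothetical helper: number of left-deep join trees over n relations (one per permutation)."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Hypothetical helper: number of bushy join trees over n relations.

    Equals n! * Catalan(n - 1), which simplifies to (2n - 2)! / (n - 1)!.
    """
    return factorial(2 * n - 2) // factorial(n - 1)


def left_deep_levels(n: int) -> int:
    """In a left-deep tree, each join depends on the previous one: n - 1 sequential steps."""
    return max(n - 1, 0)


def balanced_bushy_levels(n: int) -> int:
    """In a balanced bushy tree, joins on the same level are independent and may run in parallel.

    The number of levels is ceil(log2(n)), computed here with integer arithmetic.
    """
    return (n - 1).bit_length() if n > 1 else 0


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, left_deep_trees(n), bushy_trees(n), left_deep_levels(n), balanced_bushy_levels(n))
```

With eight relations, the sketch gives 40,320 left-deep trees but 17,297,280 bushy ones (429 times more), while a balanced bushy plan needs only 3 sequential join levels instead of 7. This trade-off, a much larger space to search in exchange for shallower, more parallel plans, is what makes the cost model and search strategy matter.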
Q. Is there anything else you would like to mention?
Patrick Valduriez: Well, the adventure has only just begun and it is already a lot of fun. I like to learn from real problems, and LeanXcale has great use cases to satisfy my curiosity and creativity. I want to thank the company for its trust in me.
Patrick Valduriez
Patrick Valduriez is a senior scientist at Inria in France. He was a scientist at Microelectronics and Computer Technology Corp. in Austin (Texas) in the 1980s and a professor at University Pierre et Marie Curie (UPMC) in Paris in the early 2000s. He has also consulted for major companies in the USA (HP Labs, Lucent Bell Labs, NERA, LECG, Microsoft), Europe (ESA, Eurocontrol, Ask, Shell) and France (Bull, Capgemini, Matra, Murex, Orsys, Schlumberger, Sodifrance, Teamlog). His career has been recognized with prestigious awards and prizes, such as the 1993 IBM scientific prize in France, the VLDB 2000 best paper award and the 2014 Innovation Award from Inria – French Academy of Science – Dassault Systèmes. Now, on a part-time consulting basis, he is embarking on a new adventure as Scientific Advisor at LeanXcale.