Babelfish: Efficient Execution of Polyglot Queries

Philipp M. Grulich, Steffen Zeuch, Volker Markl

Proceedings of the International Conference on Very Large Data Bases (VLDB 2022) | September 2022

Abstract:

Today’s users of data processing systems come from different domains, have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements shifted from relational to polyglot queries involving user-defined functions (UDFs). Although some data processing systems support polyglot queries, they often embed third-party language runtimes. This embedding induces a high performance overhead, as it causes additional data materialization between execution engines. In this paper, we present Babelfish, a novel data processing engine designed for polyglot queries. Babelfish introduces an intermediate representation that unifies queries from different implementation languages. This enables new, holistic optimizations across operator and language boundaries, e.g., operator fusion and workload specialization. As a result, Babelfish avoids data transfers and enables efficient utilization of hardware resources. Our evaluation shows that Babelfish outperforms state-of-the-art data processing systems by up to one order of magnitude and reaches the performance of handwritten code. With Babelfish, we bridge the performance gap between relational and multi-language UDFs and lay the foundation for the efficient execution of future polyglot workloads.

Bibtex:

 
                  @article{Grulich2021babelfish,
                    Title = {Babelfish: Efficient Execution of Polyglot Queries},
                    Author = {Philipp Marian Grulich and Steffen Zeuch and Volker Markl},
                    Year = {2021},
                    Journal = {PVLDB},
                    volume = {15},
                    number = {2},
                    publisher = {VLDB Endowment},
                    Abstract = {Today’s users of data processing systems come from different domains,
                    have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements
                    shifted from relational to polyglot queries involving user-defined functions (UDFs). Although some data processing systems support
                    polyglot queries, they often embed third-party language runtimes.
                    This embedding induces a high performance overhead, as it causes additional data materialization between execution engines.
                    In this paper, we present Babelfish, a novel data processing engine designed for polyglot queries. Babelfish introduces an intermediate
                    representation that unifies queries from different implementation languages. This enables new, holistic optimizations across operator
                    and language boundaries, e.g., operator fusion and workload specialization. As a result, Babelfish avoids data transfers and enables
                    efficient utilization of hardware resources. Our evaluation shows that Babelfish outperforms state-of-the-art data processing systems
                    by up to one order of magnitude and reaches the performance of handwritten code. With Babelfish, we bridge the performance gap
                    between relational and multi-language UDFs and lay the foundation for the efficient execution of future polyglot workloads.},
                    Url = {},
                    doi = {10.14778/3489496.3489501},
                    issn = {2150-8097}
                    }
