High-performance analytic database for Apache Hadoop, in the Cloud or on-premises, in use at Caterpillar, Cox Automotive, Jobrapido, Marketing Associates, the New York Stock Exchange, phData, and Quest Diagnostics, among others.
Forest Hill, MD —28 November 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Impala™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles.
Apache Impala is a modern, high-performance analytic database for Apache Hadoop. Its massively parallel processing (MPP) SQL query engine lets users run analytical queries, via SQL or business intelligence tools, on data stored on-premises (in HDFS or Apache Kudu) or in Cloud object storage, without having to migrate data sets into specialized systems or proprietary formats.
"The Impala project has grown a lot since we entered incubation in December 2015," said Jim Apple, Vice President of Apache Impala. "With the help of our mentors and the Incubator, we have grown as a community and adopted the Apache Way, all while the Impala contributors have helped make Impala more stable and performant."
In addition to using the same unified storage platform as other Hadoop components, Impala also uses the same metadata, SQL syntax (Apache Hive SQL), ODBC driver, and user interface (Impala query UI in Hue) as Hive. This provides a familiar and unified platform for real-time or batch-oriented queries. Impala provides:
- A familiar SQL interface that data scientists and analysts already know;
- The ability to query high volumes of data (Big Data) in Apache Hadoop;
- Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective commodity hardware;
- The ability to share data files between different components with no copy or export/import step; for example, to write with Apache Pig, transform with Hive and query with Impala. Impala can read from and write to Hive tables, enabling simple data interchange using Impala for analytics on Hive-produced data; and
- A single system for big data processing and analytics, so customers can avoid costly modeling and ETL just for analytics.
Impala was inspired by Google’s F1 database, which also separates query processing from storage management. It was originally released in 2012 and entered the Apache Incubator in December 2015. The project has had four releases during its incubation process.
"In 2011, we started development of Impala in order to make state-of-the-art SQL analytics available to the user community as open-source technology," said Marcel Kornacker, original founder of the Impala project. "The graduation to an Apache Top-Level Project is a recognition of the exceptional developer community that stands behind this project."
Apache Impala is deployed across a number of industries such as financial services, healthcare, and telecommunications, and is in use at companies that include Caterpillar, Cox Automotive, Jobrapido, Marketing Associates, the New York Stock Exchange, phData, and Quest Diagnostics. In addition, Impala is shipped by Cloudera, MapR, and Oracle.
"Apache Impala is our interactive SQL tool of choice. Over 30 phData customers have it deployed to production," said Brock Noland, Chief Architect at phData. "Combined with Apache Kudu for real-time storage, Impala has made architecting IoT and Data Warehousing use-cases dead simple. We can deploy more production use-cases with fewer people, delivering increased value to our customers. We’re excited to see Impala graduate to a top-level project and look forward to contributing to its success."
"We use Apache Impala to boost performance of our SQL queries against our data lake," said Matteo Coloberti, Head of Analytics at Jobrapido. "Impala is an incredible service that gives us impressive performance on queries."
"We used to distribute Microsoft Excel reports to clients every one or two days but now they can search on their own by customer, sales deal, or even service type," said Andy Frey, CTO of Marketing Associates. "Apache Impala is used to query millions of rows to identify specific records that match the clients’ criteria. We’ve even given clients a ‘Query Hadoop’ option that allows them to create simple SQL statements and query Hadoop directly via Impala. We’re able to offer a faster, richer, and more accurate selection of services without the labor or latency concerns that we used to have."
"The Apache Impala community is growing, and we welcome new contributors to join in our efforts in our code, documentation, issue tracker, and discussion forums," added Apple.
Catch Apache Impala in action at Not Another Big Data Conference, taking place 12 December 2017 in Palo Alto, CA.
Availability and Oversight
Apache Impala software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Impala, visit http://impala.apache.org/
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world’s most popular Web server software. Through the ASF’s meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hewlett Packard, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://apache.org/
© The Apache Software Foundation. "Apache", "Impala", "Apache Impala", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Kudu", "Apache Kudu", "Pig", "Apache Pig", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #