Unified programming model for batch and streaming Big Data processing, handling data of any scale, and providing portability across multiple execution engines and environments.
Forest Hill, MD —10 January 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Beam™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles.
Apache Beam is a unified programming model for both batch and streaming data processing. It includes software development kits in Java and Python for defining the data processing pipelines, as well as runners to execute them on several execution engines, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
"Graduation is an exciting milestone for Apache Beam," said Davor Bonaci, Vice President of Apache Beam. "Becoming a top-level project is a recognition of the amazing growth of the Apache Beam community, both in terms of size and diversity. Together we are pushing forward the state of the art in distributed data processing and, at the same time, enhancing the ability to interconnect additional storage/messaging systems and execution engines."
The technology behind Apache Beam evolved in large part from Google’s internal work on data processing, tracing its roots all the way back to Google’s initial MapReduce system and its fundamental changes to the science of distributed data processing. It also reflects modern advances in data processing, embodied in Google’s FlumeJava and MillWheel systems, and culminating with the unified programming model of Google Cloud Dataflow, which became the heart of Apache Beam.
This unified programming model can easily and intuitively express data processing pipelines for everything from simple batch-based data ingestion to complex event-time-based stream processing. The abstractions in the model are designed to support efficient parallel execution, while also cleanly separating the user’s processing logic from details of the underlying engine.
Raising the level of abstraction allows a single Apache Beam pipeline to run, without modification, on multiple execution engines. This portability across diverse execution engines is just one of many extensibility points that let Apache Beam integrate with the broader Apache and Big Data ecosystems. Besides runners, developers can already easily add support for additional IO connectors, libraries of transformations, SDKs, and even domain-specific extensions.
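To make the separation between pipeline logic and execution engine concrete, here is a minimal sketch in plain Python. It is a toy model only and does not use the Apache Beam SDK: the names Pipeline, SequentialRunner, and ThreadedRunner are hypothetical and chosen for illustration. It shows one pipeline definition being handed, unchanged, to two different "runners" that produce the same result.

```python
from concurrent.futures import ThreadPoolExecutor


class Pipeline:
    """Engine-agnostic list of element-wise steps (hypothetical, not the Beam API)."""

    def __init__(self):
        self.steps = []

    def map(self, fn):
        self.steps.append(("map", fn))
        return self

    def filter(self, predicate):
        self.steps.append(("filter", predicate))
        return self

    def apply(self, item):
        # Push one input element through every step; returns zero or one outputs.
        values = [item]
        for kind, fn in self.steps:
            if kind == "map":
                values = [fn(v) for v in values]
            else:
                values = [v for v in values if fn(v)]
        return values


class SequentialRunner:
    """Toy 'engine' that processes elements one after another."""

    def run(self, pipeline, data):
        return [out for item in data for out in pipeline.apply(item)]


class ThreadedRunner:
    """Toy 'engine' that processes elements on a thread pool."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, pipeline, data):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map yields results in input order.
            parts = list(pool.map(pipeline.apply, data))
        return [out for part in parts for out in part]


# The processing logic is written once...
words = ["Apache", "Beam", "is", "a", "unified", "model"]
pipeline = Pipeline().map(str.lower).filter(lambda w: len(w) > 4)

# ...and executed, without modification, by either runner.
sequential_result = SequentialRunner().run(pipeline, words)
threaded_result = ThreadedRunner().run(pipeline, words)
```

In this sketch the user-facing code never mentions threads or scheduling; swapping the runner changes how the work is executed, not what the pipeline computes. Real Beam runners translate pipelines for distributed engines and handle far more (windowing, shuffles, fault tolerance) than this toy does.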
"Apache Beam helps us make stream processing accessible to a broad audience of data engineers, by offering an API which is comprehensive, easy to reason about and at the same time fully decoupled from the underlying execution engine," said Assaf Pinhasi, Director of Big Data Platform at PayPal. "Our data engineers can now focus on what they do best – i.e. express their processing pipelines easily, and not have to worry about how these get translated to the complex underlying engine they run on."
"The graduation of Apache Beam as a top-level project is a great achievement and, in the fast-paced Big Data world we live in, recognition of the importance of a unified, portable, and extensible abstraction framework to build complex batch and streaming data processing pipelines," said Laurent Bride, Chief Technology Officer at Talend. "Customers don’t like to be locked-in, so they will appreciate the runtime flexibility Apache Beam provides. With four mature runners already available and I’m sure more to come, Beam represents the future and will be a key element of Talend’s strategic technology stack moving forward."
"We applaud the Apache Beam working group for its success in creating a unified and consistent platform for building portable data processing pipelines," said Fausto Ibarra, Director of Product Management, Google Cloud Platform. "We believe that we all have a responsibility to share what we’re learning, and we are proud and delighted to witness the successful collaboration to build not only a powerful programming model for processing data from bounded and unbounded sources, but also a portability layer for running pipelines on many processing engines, including Apache Spark, Apache Flink, Apache Apex, and Google Cloud Dataflow. Apache Beam’s graduation to Top Level Project is a well-deserved recognition for the individuals and companies who contributed to the project."
"Apache Beam represents a principled approach for analyzing data streams, simplifying a range of complex data processing concepts and providing developers with a flexible, straightforward model," said Kostas Tzoumas, Co-founder and Chief Executive Officer at data Artisans. "The Apache Flink community wrote one of the first Beam runners, and those of us at data Artisans have been contributing to the Beam project since its inception."
"The Apache Beam community has quickly adapted the Apache Way and been very welcoming to new contributors and ideas. It also encourages communication across other projects that collaborate under the Beam umbrella," said Thomas Weise, Vice President of Apache Apex, and Chief Technology Officer/Co-Founder of Atrato. "Beam helps the wider ecosystem by establishing common terminology and well thought through concepts that reflect in multiple runners and even the native API of the underlying engines."
"In my work at Apache, I have rarely seen an incubating project build a community as well as the Apache Beam project has done," said Ted Dunning, Vice President of Apache Incubator, and Chief Application Architect at MapR Technologies. "The way that they have been able to complement and enhance other streaming data projects is really a credit to everyone involved."
"We’d like to invite you to consider joining us on this exciting ride, whether as a user or a contributor, as we work towards our first release with API stability," added Bonaci. "If you’d like to try out Apache Beam today, check out the latest 0.4.0 release. We welcome contribution and participation from anyone through our mailing lists, issue tracker, pull requests, and events."
Catch Apache Beam in action at numerous face-to-face meetups and conferences, including Apache: Big Data North America 2017, DataWorks Summit and Hadoop Summit Munich 2017, and Strata + Hadoop World San Jose and London 2017.
Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Beam, visit https://beam.apache.org/
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world’s most popular Web server software. Through the ASF’s meritocratic process known as "The Apache Way," more than 620 individual Members and 5,900 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/
© The Apache Software Foundation. "Apache", "Beam", "Apache Beam", "Apache Apex", "Apex", "Apache Flink", "Flink", "Apache Spark", "Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.