Open Source distributed Big Data integration framework in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Labs, Swisscom, Verizon, and more.
Wilmington, DE —16 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Gobblin™ as a Top-Level Project (TLP).
Apache Gobblin is a distributed Big Data integration framework used in both streaming and batch data ecosystems. The project originated at LinkedIn in 2014, was open-sourced in 2015, and entered the Apache Incubator in February 2017.
“We are excited that Gobblin has completed the incubation process and is now an Apache Top-Level Project,” said Abhishek Tiwari, Vice President of Apache Gobblin and software engineering manager at LinkedIn. “Since entering the Apache Incubator, we have completed four releases and grown our community the Apache Way to more than 75 contributors from around the world.”
Apache Gobblin is used to integrate hundreds of terabytes and thousands of datasets per day by simplifying the ingestion, replication, organization, and lifecycle management processes across numerous execution environments, data velocities, scale, connectors, and more.
“Originally creating this project, seeing it come to life and solve mission-critical problems at many companies has been a very gratifying experience for me and the entire Gobblin team,” said Shirshanka Das, Founder and CTO at Acryl Data, and member of the Apache Gobblin Project Management Committee.
As a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems, Apache Gobblin makes the arduous task of creating and maintaining a modern data lake easy. It supports the three main capabilities required by every data team:
- Ingestion and export of data from a variety of sources and sinks into and out of the data lake while supporting simple transformations.
- Data Organization within the lake (e.g. compaction, partitioning, deduplication).
- Lifecycle and Compliance Management of data within the lake (e.g. data retention, fine-grain data deletions) driven by metadata.
“Apache Gobblin supports deployment models all the way from a single-process standalone application to thousands of containers running in cloud-native environments, ensuring that your data plane can scale with your company’s growth,” added Das.
Apache Gobblin is in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Laboratories, Swisscom, and Verizon, among many others.
“We chose Apache Gobblin as our primary data ingestion tool at Prezi because it proved to scale, and it is a swiss army knife of data ingestion,” said Tamas Nemeth, Tech Lead and Manager at Prezi. “Today, we ingest, deduplicate, and compact more than 1200 Apache Kafka topics with its help, and this number is still growing. We are looking forward to continuing to contribute to the project and helping the community enable other companies to use Apache Gobblin.”
“Apache Gobblin has been at the center stage of the data management story at LinkedIn. We leverage it for various use-cases ranging from ingestion, replication, compaction, retention, and more,” said Kapil Surlaker, Vice President of Engineering at LinkedIn. “It is battle-tested and serves us well at exabyte scale. We firmly believe in the data wrangling capabilities that Gobblin has to offer, and we will continue to contribute heavily and collaborate with the Apache Gobblin community. We are happy to see that Gobblin has established itself as an industry standard and is now an Apache Top-Level Project.”
“Open community and meritocracy are the key drivers for Apache Gobblin’s success,” added Tiwari. “We invite everyone interested in the data management space to join us and help shape the future of Gobblin.”
Catch Apache Gobblin in action in the upcoming hackathon planned for late Q1 2021. Details will be posted on the Apache Gobblin mailing lists and Twitter feed listed below.
Availability and Oversight
Apache Gobblin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Gobblin, visit https://gobblin.apache.org/ and https://twitter.com/ApacheGobblin
About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as “The Apache Way”. Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. “Apache”, “Gobblin”, “Apache Gobblin”, “Hadoop”, “Apache Hadoop”, “MapReduce”, “Apache MapReduce”, “Mesos”, “Apache Mesos”, “YARN”, “Apache YARN”, and “ApacheCon” are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #