Apache Software Foundation Announces New Top-Level Project Apache® Paimon 

A table format for batch and stream processing, Apache Paimon has graduated from the Apache Incubator

Wilmington, DE, April 16, 2024 – The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 320 active open source projects and initiatives, today announced that Apache Paimon has graduated from incubation and is now a Top-Level Project (TLP). Paimon is a data lake format that enables real-time lakehouse architectures built with Apache Flink and Apache Spark for streaming and batch operations. Paimon combines a lake format with a log-structured merge-tree (LSM) structure to bring real-time streaming updates into the data lake.

“I extend my heartfelt congratulations to the Paimon community on their graduation to a Top-Level Project at ASF,” remarked Yu Li, an ASF Member and mentor in the Incubator program. “As a champion for the project, witnessing the community’s remarkable development and expansion, all in adherence to the Apache principle of prioritizing community over code, has been immensely satisfying. The Paimon community has not only delivered major feature-rich releases but has also fostered an inclusive atmosphere that warmly welcomes new contributors.”

As a streaming data lake platform, Paimon allows users to process data in both batch and streaming modes. Feature highlights and benefits include:

  • High-speed Data Processing: Paimon’s append table (no primary key) provides large-scale batch and streaming processing capability;
  • Flexible Updates: Paimon lets users choose how records are updated: deduplication (keeping the last row), partial updates, aggregation, or first-row updates;
  • Fast Real-time Analytics: By leveraging Flink Streaming, Paimon’s primary key table supports real-time streaming updates of large amounts of data. Paimon serves real-time queries with latency within one minute;
  • Simplified Changelog Production: Paimon simplifies users’ streaming analytics by producing accurate and complete changelog updates for merge engines; and
  • Low-latency Data Queries: Paimon supports data compaction with z-order sorting to optimize file layout. By using indexes such as minmax, Paimon also enables fast queries based on data skipping.
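
To make the update options listed above more concrete, the following is a minimal conceptual sketch in Python of how the four update behaviors (deduplication, first-row, partial update, and aggregation) differ when several records arrive for the same primary key. It is not Paimon's API or implementation: the function names, the `merge_stream` helper, and the sample records are all hypothetical and exist only for illustration, and the aggregation shown assumes a simple per-field sum.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# A row maps column names to values; None stands for a missing (null) field.
Row = Dict[str, Optional[int]]
MergeFn = Callable[[Row, Row], Row]


def deduplicate(old: Row, new: Row) -> Row:
    """Keep only the most recent row for a key."""
    return dict(new)


def first_row(old: Row, new: Row) -> Row:
    """Keep the first row seen for a key and ignore later ones."""
    return dict(old)


def partial_update(old: Row, new: Row) -> Row:
    """Overwrite a field only when the incoming value is not null."""
    keys = old.keys() | new.keys()
    return {k: new[k] if new.get(k) is not None else old.get(k) for k in keys}


def aggregate_sum(old: Row, new: Row) -> Row:
    """Sum each field, treating nulls as zero (illustrative assumption)."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k) or 0) + (new.get(k) or 0) for k in keys}


def merge_stream(records: Iterable[Tuple[str, Row]], engine: MergeFn) -> Dict[str, Row]:
    """Fold a stream of (primary_key, row) pairs into a keyed table."""
    table: Dict[str, Row] = {}
    for key, row in records:
        table[key] = engine(table[key], row) if key in table else dict(row)
    return table


# Hypothetical sample stream: three updates for the same primary key.
records = [
    ("u1", {"clicks": 1, "score": None}),
    ("u1", {"clicks": None, "score": 7}),
    ("u1", {"clicks": 2, "score": None}),
]

if __name__ == "__main__":
    for engine in (deduplicate, first_row, partial_update, aggregate_sum):
        print(engine.__name__, merge_stream(records, engine))
```

In this toy model the same three records yield four different final rows, which is the choice the "Flexible Updates" item describes: the update behavior is selected per table rather than fixed by the format.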

“I am very happy that the Apache Paimon community has become increasingly strong over the past year,” said Jingsong Lee, Vice President of Apache Paimon. “A large number of newcomers have joined this community, and Paimon has exceeded my imagination and has a very rich range of usage scenarios in many enterprises.”

“I am really excited to see Paimon graduate and become a top-level ASF project. Paimon has begun enabling Alibaba to do real-time updates and analytics on lakehouse architecture, and we will also leverage Paimon to serve AI business in the future,” said Feng Wang, head of Open Data Platform at Alibaba Cloud.

“Apache Paimon is a high-performance, low-latency real-time data lake that significantly reduces data computation and storage costs and markedly enhances data development efficiency in various scenarios, such as Ant Group’s risk control and the Wufu application,” said Zhigang Li, head of Real-time Computing at Ant Group.

“I was fortunate enough to participate in the entire lifecycle of Paimon to date, from Flink Table Store to independent incubation and successful graduation, experiencing firsthand the practicality and excellence of community developers,” said Guanghui Zhang, head of Streaming Computing at ByteDance.

Formerly known as Flink Table Store, Paimon was first developed by the Flink community. Paimon is used in production environments worldwide by companies such as Alibaba, Ant Group, ByteDance, China Unicom, and Tongcheng.

About the Apache Incubator

The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit https://incubator.apache.org/.

About The Apache Software Foundation (ASF)
Founded in 1999, the Apache Software Foundation exists to provide software for the public good with support from more than 75 sponsors. ASF’s open source software is used ubiquitously around the world with more than 8,400 committers contributing to 320+ active projects including Apache Superset, Apache Camel, Apache Flink, Apache HTTP Server, Apache Kafka, and Apache Airflow. The Foundation’s open source projects and community practices are considered industry standards, including the widely adopted Apache License 2.0, the podling incubation process, and a consensus-driven decision model that enables projects to build strong communities and thrive. https://apache.org

ASF’s annual Community Over Code event is where open source technologists convene to share best practices and use cases, forge critical relationships, and learn about advancements in their field. https://communityovercode.org/ 

© The Apache Software Foundation. “Apache” is a registered trademark or trademark of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

Media Contact
press@apache.org