Newest addition to Apache Big Data ecosystem used for continual, incremental processing of data at petabyte scale
Forest Hill, MD – 26 July 2017 – The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Fluo™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles.
Apache Fluo is a distributed system for incrementally processing large data sets stored in Apache Accumulo (the sorted, distributed key/value store based on Google’s Bigtable, built on top of Apache Hadoop, Apache Zookeeper, and Apache Thrift). With Fluo, users can continuously join new data into large existing data sets without reprocessing all data. Unlike batch and streaming frameworks, Fluo offers much lower latency and can operate on extremely large data sets.
"I am very excited to see Apache Fluo graduate and I would like to thank our mentors for all their help, the Apache Incubator Project Management Committee for its advice and guidance, everyone in the Fluo community, and Google for publishing the research upon which Fluo is based," said Keith Turner, Vice President of Apache Fluo. "As a result of collaboration within the community, we are graduating with a beautifully designed piece of software."
Based on Percolator (built on top of Bigtable to support incremental updates to the search index at Google), Fluo makes it possible to continually update the results of a large-scale computation, index, or analytic as new data is discovered.
"Apache Fluo is a very clever piece of software, elegantly supplementing Apache Accumulo’s ability to store and maintain very large indexes," said Christopher Tubbs, ASF Member and Committer on Apache Accumulo and Apache Fluo. "Its support of transactions enables Accumulo to solve a whole new set of big data problems, and its observer framework makes designing ingest workflows fun."
One example use case is counting phrases in unique documents. This could be accomplished with two MapReduce jobs: one to produce a unique set of documents, followed by one to count phrases. Where petabytes of documents are concerned, rerunning both jobs for a small amount of new data is inefficient. Apache Fluo enables continuous, quick computation of both steps as new data arrives, constantly emitting deltas of phrase counts. Any downstream system could consume the emitted deltas; for example, a query system could be continuously updated using them.
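The phrase-counting idea can be illustrated with a minimal, single-process Python sketch. It does not use Fluo's API, Accumulo, or transactions; the class name `IncrementalPhraseCounter`, its methods, and the choice of two-word phrases are hypothetical and chosen only for illustration. It shows the general pattern of skipping documents already seen and emitting only the change in phrase counts for each new document.

```python
import hashlib
from collections import Counter


class IncrementalPhraseCounter:
    """Hypothetical in-memory illustration; not part of Apache Fluo."""

    def __init__(self, phrase_len=2):
        self.phrase_len = phrase_len   # words per phrase (an assumption)
        self.seen_docs = set()         # content hashes of documents already counted
        self.counts = Counter()        # running phrase totals

    def _phrases(self, text):
        # Split on whitespace and slide a window of phrase_len words.
        words = text.lower().split()
        n = self.phrase_len
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

    def add_document(self, text):
        """Process one document and return the delta of phrase counts."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self.seen_docs:
            return {}                  # duplicate document: nothing changes
        self.seen_docs.add(digest)
        delta = Counter(self._phrases(text))
        self.counts.update(delta)      # fold only the new counts into the totals
        return dict(delta)


counter = IncrementalPhraseCounter()
first = counter.add_document("the quick brown fox")
duplicate = counter.add_document("the quick brown fox")
second = counter.add_document("the quick red fox")
```

In this sketch, each call to `add_document` touches only the new document rather than recomputing totals over everything stored, and the returned delta is what a downstream consumer, such as a query index, would apply.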
"We are excited that Fluo is becoming a Top-Level Project at the Apache Software Foundation," said Dr. Adina Crainiceanu, Apache Rya (incubating) Committer and Associate Professor, Computer Science Department, United States Naval Academy. "Heartfelt congratulations to the Fluo community for achieving this important milestone. The Apache Rya project uses the observer framework in Fluo to cache and maintain answers to complex SPARQL queries for large RDF datasets. Using cached answers greatly improves Rya’s performance for complex queries. Fluo complements Rya by allowing the incremental and continuous update of the cached answers. Fluo is particularly useful because it allows updates to happen as new data is ingested, reduces updates latency, avoids stale results, and circumvents the periodical reprocessing of the entire dataset. We are confident that Apache Fluo will become one of the important frameworks for updating indexing results in a dynamic data-acquiring context."
"Fluo fulfills an important role in the Apache Hadoop ecosystem, significantly expanding existing capabilities for working with large data sets," said Billie Rinaldi, ASF Member and former Vice President of Apache Accumulo. "I was excited to see this project come to the Apache Incubator, and am even more pleased to see it graduate to a top-level Apache project."
"We welcome new users and contributors to Apache Fluo," added Turner. "If you are interested in trying Fluo, check out the Fluo Tour on the project Website. Join our mailing lists to discuss how Fluo may be a good solution for your problem, as well as for help with debugging and finding starter issues."
Catch Apache Fluo in action and meet members of the Fluo community at Accumulo Summit, 16 October 2017 in Columbia, MD. http://accumulosummit.com/
Availability and Oversight
Apache Fluo software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Fluo, visit http://fluo.apache.org/ and https://twitter.com/ApacheFluo
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world’s most popular Web server software. Through the ASF’s meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit https://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Fluo", "Apache Fluo", "Accumulo", "Apache Accumulo", "Rya", "Apache Rya", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #