by Dinesh Joshi
My journey with Apache began in 1999 with Apache httpd and Apache Tomcat. At the time, Apache httpd was the de facto web server on Linux, and Tomcat was the best-known Java Servlet container. The LAMP (Linux, Apache httpd, MySQL, PHP) stack was a fantastic combination. From that point on, I have always been an Apache user, successively exploring technologies like Apache Commons, Apache Storm, Apache Hadoop, Apache HBase, and Apache Cassandra. It has been a very dependable OSS brand. Because I was interested in distributed systems and databases, I began exploring OSS databases and came across Cassandra.
In early 2018, almost 19 years after I was first introduced to Apache projects, I began actively contributing to the Apache Cassandra source code. I have always been passionate about Cassandra and used it during my Master's at Georgia Tech. Its distributed, shared-nothing model is amazing. So when I got the opportunity to contribute to the Cassandra codebase, I decided to make the most of it. Over the past year I have authored over 25 patches and reviewed over 30. Collaborating with various contributors in the community, we successfully proposed the very first CIP (Cassandra Improvement Process), a Cassandra Sidecar. We, the community, are now busy building it. I have also contributed changes that make Cassandra more reliable and help it scale better, namely Zero Copy streaming and the Zstd Compressor, which have been featured on the Apache Cassandra blog and at various international conferences. This has generated new interest in Cassandra.
I fully credit the Cassandra community with enabling a new contributor like me to make meaningful contributions. It is an incredibly passionate community, with questions, answers, and knowledge filling the project's JIRA board and mailing lists. As a new contributor, it was incredible to see so much community interest in what I was contributing. The Sidecar in particular generated a lot of discussion and debate within the community, and ultimately we achieved consensus, the Apache Way! Zero Copy streaming is something that big players like Netflix and Uber were interested in. Contributors from Netflix took the initiative to test and benchmark it and post the results on JIRA. Getting your work into an Open Source project is one thing, but it is humbling to see your work actively evaluated by some of the biggest names in the industry. It is even more fascinating to me how people can overcome organizational boundaries to collaborate on a project, and how ideas are accepted, debated, and implemented as a community, ultimately making the project better for everyone in the world. In recognition of my contributions to the Cassandra community, the PMC recently voted me in as a Committer, which will help me bring in more contributions from the community as well as mentor others to join in and contribute!
My goal in contributing to Cassandra was to give back to the community the knowledge and expertise I have gained over the years building some of the most scalable systems in the world. I have found great mentors along the way who have helped me achieve that goal. It is incredible to see the impact we have on the world through Apache projects such as Apache Cassandra.
Cassandra is used at some of the biggest organizations in the world for mission-critical applications, and changes like Zero Copy streaming (CASSANDRA-14556) or Zstd Compression (CASSANDRA-14482) will have a significant impact on many large businesses and, more importantly, on people's lives. Specifically, Zero Copy streaming allows Cassandra to recover from a failed node several times faster than the current stable version. It also lowers the amount of resources required by the streaming process. As a result, an organization running large Cassandra installations can see a meaningful reduction in MTTR (Mean Time to Recovery) and can shrink the spare server pool it needs to maintain. This lowers the TCO (Total Cost of Ownership) for Cassandra. Zstd Compression is a new lossless compression option for Cassandra that offers better compression ratios than the existing LZ4 compression, with comparable compression speed. It can reduce storage needs by up to 40%, depending on the characteristics of your dataset. Again, this not only reduces expenses but also means fewer servers are needed to store the data. You are not only saving money but, in a way, saving the planet by using fewer servers.
I also believe that being an Open Source contributor is not just about code contributions. Contributions come in various forms, and one of them is documentation. Seeing that Cassandra's documentation was out of date, I proposed Cassandra for Google Season of Docs to improve it. I have also been invited to talk about Cassandra at various conferences across Asia, Europe, and North America. In the past year alone, I have spoken about Cassandra at 9 conferences. It is great to engage with the wider user community, which is very passionate and excited about Cassandra. This is one of the most important aspects of community contributions, because you get to talk to your users first-hand. It also generates interest in the project and is key to attracting new contributors.
In summary, none of this would be possible without a great, supportive community, which is the whole point of the ASF: to build great communities that foster collaboration, making the world better one contribution at a time.
Dinesh Joshi is a Senior Software Engineer and a Committer on the Apache Cassandra project. He has a Master's in Computer Science (Distributed Systems & Databases) from Georgia Tech, Atlanta. Previously, Dinesh was a Principal Software Engineer at Yahoo, building real-time distributed systems for Yahoo's Finance Web, iOS & Android apps. He is also an international speaker and regularly talks about Apache Cassandra and databases. In his spare time, he volunteers as a mentor for Women Who Code.