opensource.google.com

This Week in Open Source #8

Friday, August 15, 2025

This Week in Open Source for 08/15/2025

A look around the world of open source
by Daryl Ducharme & amanda casari, Google Open Source

Upcoming Events

  • August 14-16: Open Source Festival 2025 (OSCAFest'25) is happening in Lagos, Nigeria. It uses community to help bring the practice of open source contribution to African developers while strongly advocating for the free and open source software movement.
  • August 25-27: Open Source Summit Europe (OSSEU) is happening in Amsterdam, Netherlands. It is the premier event for the open source community to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. Many Googlers will be there giving talks along with so many others.
  • September 5-7: NixCon 2025 is happening in Switzerland. It is the annual conference for the Nix and NixOS community where Nix enthusiasts learn, share, and connect with others.
  • September 9: The Kubernetes Community Day 2025 SF Bay Area event is the ultimate gathering for cloud native enthusiasts! This full-day event, sponsored by the Cloud Native Computing Foundation (CNCF), is packed with insightful cloud native talks and unparalleled opportunities for community networking.
  • September 12-16: PyCon AU 2025 is happening in Narrm/Melbourne. It is the national conference for the Python programming community, bringing together professional, student and enthusiast developers, sysadmins and operations folk, students, educators, scientists, statisticians, and many others besides, all with a love for working with Python.

Open Source Reads and Links

  • [Article] Google Brings the A2A Protocol to More of Its Cloud - Last month, Google transferred the A2A protocol to the Linux Foundation, and we are continuing to improve it. Whether it is updating the spec or integrating it into Cloud Run and GKE, we are happy to see excitement about the future of this protocol.
  • [Book] OSPO Book - Open Source Programs Offices are an important part of connecting open source communities to your company (if we do say so ourselves). If you are an open source enthusiast who thinks they can start one in their company, here is a good guide from CNCF. There's also a GitHub repo for it.
  • [Analysis] The RedMonk Programming Language Rankings: January 2025 - RedMonk's regular analysis of programming languages. Rankings are remaining mostly steady across languages, which is an interesting trend in itself!
  • [Blog] One Event at a Time: Funding Your Community the Realistic Way - Great writeup, from a PSF Board member, advising event organizers in the Python community on developing responsible and sustainable funding plans for their community events.
  • Python Software Foundation News: The PSF has paused our Grants Program - The PSF is temporarily pausing its Grants Program after reaching its 2025 grant budget cap earlier than expected. While the PSF knows how important this program is to many in the community, this is a necessary step to protect both the future of the program and the short- and long-term sustainability of the PSF. (If this moves you to donate, the PSF welcomes contributions via its donations page.)

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account.

Google Summer of Code 2025: Contributor Statistics

Thursday, August 14, 2025

The Numbers Are In: A Deep Dive into GSoC 2025 Stats

Google Summer of Code (GSoC) is an online global program that introduces students and beginner developers to open source software development. For our 21st year of the program, we welcomed 1,280 Contributors from 68 countries who are coding for 185 Mentoring Organizations.

With the coding period starting June 2nd, GSoC contributors are focused on their 2025 projects alongside their Mentors and the thriving open source communities they are working with. We are excited to share some statistics about the accepted contributors in this year's program.

Accepted GSoC Contributors

  • 92.32% are participating in their first GSoC
  • 43.04% had not contributed to open source before GSoC 2025
  • 89.02% are enrolled in an academic program

Infographic: "Google Summer of Code 2025: The numbers are in!" The image provides the following statistics:

  • Proposals: 23,000+ proposals were received from 15,000+ individual applicants, representing 130 countries.
  • Applicants: Over 96% of applicants were applying to GSoC for the first time.
  • Contributors: 89% of GSoC 2025 contributors are enrolled in an academic program.
  • Mentorship: The program has 2,100+ mentors from 75 countries and involves 185 open-source organizations.
  • Project Size: A bar chart shows the project size distribution: Large (~350 hours) 54%, Medium (~175 hours) 42%, Small (~90 hours) 4%.

Projects

  • 53.68% of projects are large (~350 hours), 41.54% are medium (~175 hours), and 4% are small (~90 hours).
  • Currently, 77.9% of projects are the standard 12 weeks in length, with 18.3% extending their projects to between 14 and 22 weeks.

Proposals

We got a whopping 15,240 applicants submitting proposals (an increase of 130% over our previous high, and a new record!) from 130 countries. These folks submitted 23,559 proposals, a 159% increase over last year!

96.55% applied to GSoC for the first time in 2025

Registrations

We had a record 98,698 people registering from 172 countries for the 2025 program, an increase of 124.4% over the previous high.

Mentors

This summer, 185 open-source organizations are participating in GSoC. Their projects are supported by over 2,100 mentors from 75 countries. These dedicated volunteers guide new contributors, helping them hone their skills.

Many of these mentors are highly experienced. Almost two-thirds have mentored GSoC contributors for four or more years.

A big thank you for being part of this wonderful community and for helping to spread the word about GSoC, which offers an invaluable opportunity for everyone beginning their journey in open source. We'll keep you updated with future entries about GSoC 2025. Stay tuned!

by Stephanie Taylor, Mary Radomile & Lucila Ortiz, Google Open Source Team

This Week in Open Source #7

Friday, August 8, 2025

This Week in Open Source for 08/08/2025

A look around the world of open source
by Daryl Ducharme, Google Open Source

Upcoming Events

  • August 14-16: Open Source Festival 2025 (OSCAFest'25) is happening in Lagos, Nigeria. It uses community to help bring the practice of open source contribution to African developers while strongly advocating for the free and open source software movement.
  • August 25-27: Open Source Summit Europe (OSSEU) is happening in Amsterdam, Netherlands. It is the premier event for the open source community to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. Many Googlers will be there giving talks along with so many others.
  • September 5-7: NixCon 2025 is happening in Switzerland. It is the annual conference for the Nix and NixOS community where Nix enthusiasts learn, share, and connect with others.

Open Source Reads and Links

  • The Asymmetry of Open Source - Open source software projects need funding, but users are not obligated to pay for them. Companies should invest in open source to maintain quality and avoid issues, while hobbyists can contribute without financial pressure. Proper boundaries and mutual responsibility between companies and developers are essential for a healthy open source ecosystem. How do we find and set those boundaries?
  • Linux Foundation Announces Intent to Form Developer Relations Foundation - The Linux Foundation has created the Developer Relations Foundation, which aims to unify best practices and enhance the role of developer relations in technology. The DRF will focus on collaboration and shared knowledge. Having an open source organization behind this helps to make sure DevRel is always of service to developers along with whoever is employing them.
  • 5 tips to get started on accessibility - Not exactly open source and yet super important. So important to the open source community that All Things Open posted it on their site. Accessibility (A11y) is always useful. The more it gets used properly, the more useful it is for everyone.
  • Bringing open source development to Trust and Safety - Ever the open source champion, former Googler and now COO at Roost, Anne Bertucio discusses how some teams still have a difficult time understanding open source. The standards that they are used to don't always occur within the transparent world of open source. This means bringing open source to those teams requires understanding where they are coming from and discussing its limitations as well as its benefits.
  • How we made JSON.stringify more than twice as fast - One of the beautiful things about open source is the transparency in projects. Google's Chromium V8 engine is no exception. This walkthrough of the technical restructuring that led to a faster JSON.stringify is a great way to learn some approaches to solving software bottlenecks that you may not have thought of. Because it is open source, you can also visit the repository and follow along with the history of these code changes.

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account.

What's new in Apache Iceberg v3?

Thursday, August 7, 2025

A Deeper Dive into Apache Iceberg V3: How New Designs Are Solving Core Data Lake Challenges

The Next Chapter for Apache Iceberg: Welcoming the Iceberg V3 Spec
by Talat Uyarer, BigQuery Managed Iceberg & Shane Glass, Google Open Source Programs Office

An infographic illustrating the new features in Apache Iceberg V3. In the center is a logo of an iceberg with V3 written on it. Arrows point from the central logo to four surrounding illustrations, each representing a new feature: Top left: Deletion Vectors, depicted as a tall stack of data blocks. Top right: Variant Data Type, shown as a collection of colorful circles and cubes. Bottom right: Geospatial Data Types, illustrated by a folded world map with location pins. Bottom left: Row Lineage, represented by a grid of various colorful icons.

The data community has long grappled with the challenge of how to bring database-like agility to petabyte-scale datasets stored in open cloud storage. The trade-off has often been between the scalability of data lakes and the performance and ease-of-use of traditional data warehouses. Executing fine-grained updates or evolving table schemas on massive tables often required slow, expensive, and disruptive operations.

The Apache Iceberg project is taking on this challenge. Early versions introduced a revolutionary metadata layer that brought reliability and ACID transactions to data lakes. However, certain operations still presented performance bottlenecks at scale.

With the ratification of the V3 specification, the Apache Iceberg community has introduced new designs that directly address these core issues. These advancements represent a significant leap forward in the mission to build an open and high-performance data lakehouse architecture. Let's explore the technical details of these solutions.

More Efficient Row-Level Transactions with Deletion Vectors

A primary challenge for data lakes has been handling row-level deletes efficiently. Previous approaches, like positional delete files, were a clever solution but could lead to performance degradation at query time when a reader had to reconcile many small delete files against large data files.

The Iceberg V3 spec introduces binary deletion vectors, a more performant and scalable architecture. The core idea is to attach a bitmap to each data file, where each bit corresponds to a row, marking it as deleted or not.

When a query engine reads a data file, it also reads its corresponding deletion vector. As it scans rows, it can check the bitmap with minimal overhead and skip rows marked for deletion. This design is made exceptionally efficient through the use of Roaring bitmaps. This data structure is ideal for this task because it can compress sparse sets of integers—like the positions of deleted rows—into a tiny footprint.

The practical difference is profound:

  • Previous Model (Positional Deletes): A query might involve reading a central log of deletes, like deletes.avro, containing tuples of (file_path, row_position).
  • V3 Model (Deletion Vectors): Each data file (e.g., file_A.parquet) is paired with a small, efficient sidecar file (e.g., file_A.puffin) containing a Roaring bitmap of its deleted rows.

This change localizes delete information, streamlines the read path, and dramatically improves the performance of workloads that rely on frequent Change Data Capture (CDC) or row-level updates.
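The read path described above can be sketched in a few lines of Python. This is only a toy illustration: the class name `DeletionVector`, the `scan` helper, and the sample rows are hypothetical, and a plain Python integer stands in for the compressed Roaring bitmap that V3 actually stores alongside each data file.

```python
from typing import Dict, Iterable, Iterator, List


class DeletionVector:
    """Toy per-file deletion vector.

    Assumption: an uncompressed Python int is used as the bitmap here;
    Iceberg V3 uses Roaring bitmaps, which this sketch does not implement.
    """

    def __init__(self, deleted_positions: Iterable[int] = ()) -> None:
        self._bits = 0
        for pos in deleted_positions:
            self.delete(pos)

    def delete(self, pos: int) -> None:
        """Mark the row at position `pos` in the data file as deleted."""
        if pos < 0:
            raise ValueError("row position must be non-negative")
        self._bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        """Return True if the bit for row `pos` is set."""
        return (self._bits >> pos) & 1 == 1

    def cardinality(self) -> int:
        """Number of rows marked as deleted."""
        return bin(self._bits).count("1")


def scan(rows: List[Dict], dv: DeletionVector) -> Iterator[Dict]:
    """Yield only live rows, checking the bitmap once per row position."""
    for pos, row in enumerate(rows):
        if not dv.is_deleted(pos):
            yield row


# Hypothetical contents of one data file and its paired deletion vector.
file_a_rows = [{"id": 10}, {"id": 11}, {"id": 12}, {"id": 13}]
file_a_dv = DeletionVector([1, 3])  # rows at positions 1 and 3 are deleted

live_rows = list(scan(file_a_rows, file_a_dv))
```

Because the bitmap belongs to a single data file, the reader never has to match file paths against a shared list of deletes; it only checks one bit per row position.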

Simplified Schema Evolution with Default Column Values

Another common operational hurdle in managing large tables has been schema evolution. Adding a column to a table with billions of rows traditionally required a "backfill"—a costly and time-consuming job to rewrite all existing data files to add the new column.

Iceberg V3 eliminates this friction with default column values. This feature allows a default value to be specified directly in the table's metadata when a column is added.

ALTER TABLE events ADD COLUMN version INT DEFAULT 1;

This operation is instantaneous because it only modifies metadata. No data files are touched. When a query engine encounters an older data file without the version column, it consults the table schema, finds the default value, and seamlessly populates it in the query results on the fly. This simple but powerful mechanism makes schema evolution a fast, non-disruptive operation, allowing data models to evolve quickly.
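As a rough sketch of that read-time behavior, the Python below fills in a metadata-level default for files written before the column existed. The `TABLE_SCHEMA` layout, the `event_id` column, and the `read_with_defaults` helper are assumptions made for illustration, not Iceberg's actual metadata format or reader API.

```python
from typing import Any, Dict, List

# Hypothetical table metadata after the ALTER TABLE statement: the default
# lives in the schema, so no existing data file had to be rewritten.
TABLE_SCHEMA: Dict[str, Dict[str, Any]] = {
    "event_id": {"default": None},
    "version": {"default": 1},
}


def read_with_defaults(
    file_rows: List[Dict[str, Any]], schema: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Project rows onto the current schema, using the schema default for
    any column that is missing from an older data file."""
    return [
        {col: row.get(col, spec["default"]) for col, spec in schema.items()}
        for row in file_rows
    ]


# Rows from a file written before the column was added, and from one after.
old_file_rows = [{"event_id": 1}, {"event_id": 2}]
new_file_rows = [{"event_id": 3, "version": 2}]

result = read_with_defaults(old_file_rows + new_file_rows, TABLE_SCHEMA)
```

Rows from the older file come back with `version` set to 1, while the newer row keeps the value it was written with.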

Improved Query Engine Compatibility with Enhanced Data Types and Lineage

Beyond these headline features, V3 broadens the capabilities of Iceberg to support more advanced use cases:

  • Row-Level Lineage: For robust auditing and reliable CDC pipelines, V3 formalizes the tracking of row history. By embedding metadata about when a row was added or last modified, Iceberg tables can now provide a clear lineage, simplifying data governance and enabling more efficient downstream data replication.
  • Rich Data Types: V3 closes the gap with traditional databases by introducing a more expressive type system. This includes a VARIANT type for handling semi-structured data like JSON, native GEOMETRY and GEOGRAPHY types for advanced geospatial analysis, and nanosecond-precision timestamps through the new timestamp_ns and timestamptz_ns data types, a significant increase in precision over the previous microsecond limit.
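To make the timestamp precision point concrete, here is a small Python sketch that assumes timestamps are held as integer counts since the epoch. The sample values and the `to_micros` helper are made up for illustration; the sketch only shows why two events that differ by a few hundred nanoseconds become indistinguishable at microsecond precision.

```python
NANOS_PER_MICRO = 1_000


def to_micros(ts_ns: int) -> int:
    """Truncate a nanosecond timestamp to microsecond precision."""
    return ts_ns // NANOS_PER_MICRO


# Two hypothetical events 333 nanoseconds apart.
a_ns = 1_723_680_000_000_000_123
b_ns = 1_723_680_000_000_000_456

distinct_at_ns = a_ns != b_ns                       # still distinguishable
collide_at_us = to_micros(a_ns) == to_micros(b_ns)  # same microsecond value
```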

Building the Future of the Open Data Lakehouse

These V3 features—deletion vectors, default values, row lineage, and richer types—are more than just individual improvements. Together, they represent a cohesive step toward a new paradigm where the lines between the data lake and the data warehouse are erased. They enable faster, more efficient, and more flexible data operations than previously thought possible.

This progress is a testament to the collaborative spirit of the Apache Iceberg community. At Google, we are proud to contribute to and support open-source projects like Iceberg that are defining the future of data architecture. We are excited to see the innovative applications the community will build on this powerful new foundation.

Want to get started with Iceberg? Check out this blog post to learn more about how Google Cloud's managed Iceberg offering, BigLake tables for Apache Iceberg in BigQuery, makes building Iceberg-native lakehouses easier by maximizing performance without sacrificing governance.


This Week in Open Source #6

Friday, August 1, 2025

This Week in Open Source for 08/01/2025

A look around the world of open source

by Daryl Ducharme & amanda casari, Google Open Source Programs Office

Diving into the open source world this week, we'll cover upcoming events that foster collaboration and innovation, alongside new reads and links that highlight significant advancements and discussions within the open source community. From new Google projects enhancing package ecosystem confidence to thought-provoking articles on open source funding, we hope this keeps you aware of new areas of the ecosystem.

Upcoming Events

  • August 14-16: Open Source Festival 2025 (OSCAFest'25) is happening in Lagos, Nigeria. It uses community to help bring the practice of open source contribution to African developers while strongly advocating for the free and open source software movement.
  • August 25-27: Open Source Summit Europe (OSSEU) is happening in Amsterdam, Netherlands. It is the premier event for the open source community to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. Many Googlers will be there giving talks along with so many others.
  • September 5-7: NixCon 2025 is happening in Switzerland. It is the annual conference for the Nix and NixOS community where Nix enthusiasts learn, share, and connect with others.

Open Source Reads and Links

  • [Blog] Google introduced OSS Rebuild, a new project designed to enhance confidence in open source package ecosystems through the reproduction of upstream artifacts.
  • [Story] SF-Based Internet Archive Is Now a Federal Depository Library. What Does That Mean? - The Internet Archive is a foundational reference and repository for open-access information and digital archives. The San Francisco-based digital library now has federal depository status, joining a network of over 1,100 libraries that archive government documents and make them accessible to the public, even as ongoing legal challenges pose an existential threat to the organization.
  • [Video] Keynote: Building community through collaborative datasets - Mago Torres' keynote from csv,conf 8, on her work building collaborative datasets for award-winning data journalism, captures the spirit of how open technology enables communities to accomplish more together.
  • [Paper] Anubis Pilot Project Report - June 2025 - In May and June 2025, Duke University Libraries (DUL) successfully implemented Anubis, a configurable open source web application firewall (WAF), to combat persistent AI-related bot scraping. During this pilot (May 1 - June 10, 2025), aggressive bot scraping caused outages for three critical library platforms (Duke Digital Repository, Archives & Manuscripts, and the Books & Media Catalog); Anubis mitigated the problem in each instance.
  • [Article] Microsoft-owned GitHub says open source needs to be funded - The Register published this editorial, which asks whether open source software has reached the point where it should be managed as infrastructure and funded by the governments that rely on it. Some studies show impressive numbers for how much it contributes to many economies.
  • [Blog] Open Source Explained Like You're Five (But Smarter) - Explaining open source to people outside the tech world is tough. This article uses some good metaphors along with some details you may not have known to better explain it and spread the word. Or, you could just send them this article and hope they read it. 😜

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account.
