
OSPP 2023 In-Depth Insight Report

· 15 min read
Frank Zhao
Ph.D candidate at X-lab, author of OpenDigger
Will Wang
Prof. @ ECNU / Founder of X-lab

Background

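The Open Source Promotion Plan (OSPP, 开源之夏) is a summer program in the "Open Source Software Supply Chain Lighting Plan" series launched by the Institute of Software, Chinese Academy of Sciences. It aims to encourage university students to take an active part in developing and maintaining open source software and to help outstanding open source communities thrive. Five editions have been held so far (2020 to 2024), and X-lab has been deeply involved since the first one.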

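OpenDigger, an open source data research project that has long been deeply involved in OSPP, presents here an in-depth analysis of the OSPP 2023 data as a way of giving back to the OSPP community.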

OSPP 2023 Macro Data

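According to the OSPP community's data report, OSPP published 593 projects in 2023. Students were selected for 504 of them, and 421 were ultimately completed, a completion rate of 71% (421 of the 593 published projects).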

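OSPP 2023 Overall Statistics

| Metric | Value | Change (increase/decrease) |
|---|---|---|
| Total projects | 593 | 91 |
| Selected projects | 504 | 56 |
| Completed projects | 421 | 73 |
| Completion rate (%) | 71 | 2 |
| Number of universities | 144 | 13 |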
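
As a quick check on the figures above, the following minimal sketch recomputes the rates from the three project counts. The variable and function names are illustrative, not part of OpenDigger, and the two extra ratios (completed over selected, selected over published) are not quoted in the report.

```python
# Project counts quoted in the report for OSPP 2023.
published = 593   # projects published
selected = 504    # projects with a selected student
completed = 421   # projects completed


def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The report's 71% corresponds to completed / published.
completion_vs_published = rate(completed, published)
# Additional ratios, computed here only for context.
completion_vs_selected = rate(completed, selected)
selection_rate = rate(selected, published)

print(completion_vs_published, completion_vs_selected, selection_rate)
```

Measured against the projects that actually had a student selected, the completion rate is noticeably higher than the headline 71%.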

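Apart from a few operating-system-kernel communities that use their own git repositories, most completed projects are hosted on code hosting platforms such as GitHub (298 projects) and Gitee (112 projects). The overall platform distribution is as follows: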

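Looking at the universities of the students behind the completed projects, the 421 projects were completed by students from 144 universities. Beijing University of Posts and Telecommunications, Zhejiang University, and Huazhong University of Science and Technology lead with more than 20 students each. The detailed distribution is as follows: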

Annual Contribution Analysis

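Beyond the statistics above, we also want to offer deeper insights, such as how much each student from each university actually contributed in their community. This finer-grained analysis helps us observe how collaboratively students took part in their projects throughout the process, rather than only whether they completed a specific task.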

Note: Limited by OpenDigger's current underlying data, the analysis below covers only data from the GitHub and Gitee platforms.

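We used contribution data for the whole of 2023 and the community OpenRank algorithm to analyze in detail how engaged the students were in each community. The top 20 universities by overall contribution are shown in the table below: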

OSPP 2023 University Contribution Ranking

The "Change" columns give the increase/decrease value shown next to each figure; "new" marks a new entry.

| # | University | OpenRank | Change | Participating students | Change | OpenRank per student | Change |
|---|---|---|---|---|---|---|---|
| 1 | Huazhong University of Science and Technology | 67.3 | 43.57 | 21 | 3 | 3.21 | 1.89 |
| 2 | Zhejiang University | 61.23 | 16.62 | 23 | 9 | 2.66 | 2.9 |
| 3 | Beijing University of Posts and Telecommunications | 60.19 | 35.17 | 27 | 5 | 2.23 | 0.75 |
| 4 | Xidian University | 60.05 | 37.86 | 13 | 4 | 4.62 | 2.15 |
| 5 | Fudan University | 59.7 | 7.51 | 4 | 8 | 14.93 | 10.58 |
| 6 | Xi'an University of Posts and Telecommunications | 55.67 | 24.09 | 10 | 3 | 5.57 | 3.14 |
| 7 | East China Normal University | 54.15 | 19.2 | 13 | 2 | 4.17 | 2.5 |
| 8 | University of Electronic Science and Technology of China | 50.6 | 35.74 | 14 | 8 | 3.62 | 1.14 |
| 9 | Chongqing University of Posts and Telecommunications | 48.92 | 24.29 | 5 | 3 | 9.78 | 2.53 |
| 10 | Shanghai Jiao Tong University | 48.34 | 40.83 | 6 | | 8.06 | 6.8 |
| 11 | Hangzhou Dianzi University | 41.99 | 34.6 | 11 | 8 | 3.82 | 1.35 |
| 12 | Longdong University | 39.48 | new | 1 | new | 39.48 | new |
| 13 | University of Chinese Academy of Sciences | 37.36 | 23.15 | 18 | 10 | 2.08 | 0.3 |
| 14 | Nanjing University | 33.9 | 32.41 | 17 | 15 | 1.99 | 1.25 |
| 15 | Tongji University | 21.35 | 15.98 | 6 | 4 | 3.56 | 0.87 |
| 16 | Wuhan University | 19.02 | 11.33 | 1 | 3 | 19.02 | 17.09 |
| 17 | Southeast University | 18.57 | 8.54 | 8 | 3 | 2.32 | 0.32 |
| 18 | Beijing University of Technology | 18.52 | 18.52 | 3 | 2 | 6.17 | 6.17 |
| 19 | Chengdu University of Information Technology | 18.11 | new | 1 | new | 18.11 | new |
| 20 | Fuzhou University | 16.21 | 8.01 | 5 | 4 | 3.24 | 20.98 |

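Alongside each university's overall contribution, we also give its per-student OpenRank. Huazhong University of Science and Technology, Zhejiang University, and Beijing University of Posts and Telecommunications still hold the top three places thanks to their larger student numbers. Some universities, however, made the list because of a very high per-student OpenRank, such as Fudan University, Longdong University, Wuhan University, and Chengdu University of Information Technology. They have no advantage in student numbers, but the high contributions of individual students lifted their final rank.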
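
To illustrate how the two views of the ranking relate, here is a minimal sketch that derives the per-student OpenRank from a university's total OpenRank and student count, then sorts by each metric. It uses a handful of rows from the table above; the `University` type and function names are hypothetical, and this is not OpenDigger's own code.

```python
from typing import NamedTuple


class University(NamedTuple):
    name: str
    openrank: float   # total OpenRank of the university's students
    students: int     # number of participating students


# A few rows copied from the ranking table above.
rows = [
    University("Huazhong University of Science and Technology", 67.3, 21),
    University("Zhejiang University", 61.23, 23),
    University("Fudan University", 59.7, 4),
    University("Longdong University", 39.48, 1),
    University("Wuhan University", 19.02, 1),
]


def per_student(u: University) -> float:
    """Average OpenRank per participating student, rounded to two decimals."""
    return round(u.openrank / u.students, 2)


# Ranking by total favors universities with many students ...
by_total = sorted(rows, key=lambda u: u.openrank, reverse=True)
# ... while ranking by average surfaces single high-impact contributors.
by_average = sorted(rows, key=per_student, reverse=True)

for u in by_average:
    print(f"{u.name}: total={u.openrank}, per student={per_student(u)}")
```

Because the totals in the table are already rounded, a recomputed average can differ from the published one in the last digit.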

To look more closely at student contributions, we also ranked the student contributors by OpenRank. The top 20 students are as follows:

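OSPP 2023 Student Contribution Ranking

| # | Student | OpenRank | University | Community | Active months |
|---|---|---|---|---|---|
| 1 | 王** | 50.361 | Fudan University | Apache HugeGraph | 16 |
| 2 | 潘** | 44.955 | Shanghai Jiao Tong University | MatrixOne | 19 |
| 3 | 姬** | 39.475 | Longdong University | Spring Cloud Alibaba | 19 |
| 4 | 孟** | 34.52 | Chongqing University of Posts and Telecommunications | Apache SkyWalking | 18 |
| 5 | 刘** | 25.838 | Xidian University | OpenMessaging | 10 |
| 6 | 王** | 25.15 | University of Electronic Science and Technology of China | MegEngine (旷视天元) | 13 |
| 7 | 谭** | 24.831 | Huazhong University of Science and Technology | GraphScope | 12 |
| 8 | 张** | 19.65 | Xidian University | TinyLab (泰晓科技) | 9 |
| 9 | 乔* | 19.016 | Wuhan University | Apache RocketMQ community | 14 |
| 10 | 周** | 18.924 | University of Chinese Academy of Sciences | openEuler community | 9 |
| 11 | 黄** | 18.115 | Chengdu University of Information Technology | CubeFS | 15 |
| 12 | 朱** | 17.194 | East China Normal University | OpenDigger | 14 |
| 13 | 应** | 16.561 | Hangzhou Dianzi University | Volcano community | 10 |
| 14 | 李** | 14.307 | East China Normal University | OpenDigger | 14 |
| 15 | 丛** | 14.045 | Shandong University | Apache HugeGraph | 12 |
| 16 | 徐* | 13.995 | East China University of Science and Technology | Apache Kvrocks (Incubating) | 8 |
| 17 | 刘* | 13.865 | Huazhong University of Science and Technology | Apache HugeGraph | 16 |
| 18 | 陈** | 13.452 | Zhejiang University | Curve | 6 |
| 19 | 张** | 12.606 | Xi'an University of Posts and Telecommunications | Linux Kernel Journey (Linux内核之旅) open source community | 16 |
| 20 | 兰** | 12.581 | Sichuan University | DLRover | 8 |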

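Analyzing individual students makes the highest contributors clearly visible, for example 姬** from Longdong University in the Spring Cloud Alibaba community, 黄** from Chengdu University of Information Technology in the CubeFS community, and 乔* from Wuhan University in the Apache RocketMQ community. Each of them single-handedly lifted their university's overall contribution into the top 20.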

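The table above also gives the number of months in which these students were active in their projects between January 2023 and July 2024. All of the top 20 students were active for at least 6 months, and the students mentioned above each contributed for more than 12 months, which reflects how OpenRank values long-term contribution.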
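
The report does not spell out how an active month is counted. The sketch below assumes the simplest definition, a calendar month containing at least one activity, over the January 2023 to July 2024 window stated above. The function names and the sample activity log are hypothetical.

```python
from datetime import date

# Observation window stated in the report: January 2023 through July 2024.
WINDOW_START = (2023, 1)
WINDOW_END = (2024, 7)


def months_in_window(start=WINDOW_START, end=WINDOW_END) -> int:
    """Number of calendar months in the inclusive window."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1


def active_months(activity_dates, start=WINDOW_START, end=WINDOW_END) -> int:
    """Count distinct (year, month) pairs with at least one activity in the window.

    Assumption: one activity in a month is enough to call the month active.
    """
    months = {(d.year, d.month) for d in activity_dates}
    return sum(1 for m in months if start <= m <= end)


# Hypothetical activity log for one contributor (illustrative only).
sample = [
    date(2022, 12, 30),  # outside the window, ignored
    date(2023, 7, 3),
    date(2023, 7, 21),   # same month as the previous entry, counted once
    date(2023, 9, 14),
    date(2024, 7, 1),
]

print(months_in_window(), active_months(sample))
```

The window spans 19 months, which matches the highest active-month value in the 2023 table.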

For comparison, here are the top 20 students by contribution in 2022:

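OSPP 2022 Student Contribution Ranking

| # | Student | OpenRank | University | Community | Active months |
|---|---|---|---|---|---|
| 1 | 唐** | 42.181 | East China Normal University | Apache ECharts | 29 |
| 2 | 程* | 40.912 | Zhejiang University | Karmada | 23 |
| 3 | 杨* | 35.699 | Communication University of China | Element Plus | 22 |
| 4 | 朱** | 31.264 | Northeastern University (China) | Apache Dubbo | 23 |
| 5 | 容* | 25.844 | Baise University | Apache APISIX | 27 |
| 6 | 黄** | 24.218 | Fuzhou University | Apache RocketMQ community | 12 |
| 7 | 孟** | 24.177 | Chongqing University of Posts and Telecommunications | Apache Pulsar | 30 |
| 8 | 宋** | 22.948 | Fudan University | Apache SkyWalking | 27 |
| 9 | 陈* | 19.426 | Beijing University of Posts and Telecommunications | Milvus | 25 |
| 10 | 范** | 16.426 | University College London, University of London | Apache Pulsar | 8 |
| 11 | 张** | 14.617 | East China Normal University | DevLake | 17 |
| 12 | 赵** | 13.8 | Beijing University of Posts and Telecommunications | OpenMLDB | 5 |
| 13 | 杨* | 13.085 | Xi'an University of Posts and Telecommunications | Curve | 18 |
| 14 | 崔** | 12.279 | Guilin University of Electronic Technology | MegEngine (旷视天元) | 28 |
| 15 | 叶** | 11.502 | College of William and Mary | Alluxio | 6 |
| 16 | 韩** | 9.98 | Beijing University of Posts and Telecommunications | KubeVela | 15 |
| 17 | 张** | 9.443 | College of Science and Technology, Hunan University of Technology | Apache DolphinScheduler | 9 |
| 18 | 杨** | 9.157 | China Institute of Atomic Energy | Jina AI | 10 |
| 19 | 吴** | 9.077 | Zhejiang University | Linux Kernel Journey (Linux内核之旅) open source community | 9 |
| 20 | 吴** | 8.831 | New York University | Hypercrx | 30 |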

Follow-up Analysis of Sustained Contribution

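As we can see, OSPP has drawn a large number of outstanding university students into deep participation in open source communities while they are still at school. So how active are these students afterwards? To find out, we carried out a longer-term tracking analysis to see how many students stayed in their communities and kept contributing after OSPP ended.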

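The chart above shows how the contributions of all students who completed projects changed from January 2022 to July 2024. Although September of each year is a contribution peak, their contributions across the whole open source ecosystem remained relatively stable. This shows that, beyond OSPP itself, the students went on to contribute to other projects in the open source world, and that OSPP opened a door to that world for them.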

Student Ecosystem-Wide Contribution Ranking

| Student | OpenRank | University | Projects |
|---|---|---|---|
| 杨* | 315.068 | Communication University of China | YunLeFun/status, YunYouJun/valaxy, element-plus/element-plus |
| 姬** | 148.622 | Longdong University | alibaba/spring-cloud-alibaba, spring-cloud-alibaba-group/spring-cloud-alibaba-group.github.io, apache/hertzbeat |
| 刘** | 136.224 | Hangzhou Dianzi University | iyear/tdl, iyear/pure-live-core, devstream-io/devstream |
| 唐** | 132.826 | East China Normal University | hypertrons/hypertrons-crx, X-lab2017/open-wonderland, X-lab2017/open-research |
| 郑** | 132.375 | Zhejiang University | eunomia-bpf/eunomia-bpf, eunomia-bpf/bpftime, eunomia-bpf/bpf-developer-tutorial |
| 刘** | 107.148 | University of Electronic Science and Technology of China | SciSharp/LLamaSharp, SciSharp/TensorFlow.NET, Oneflow-Inc/oneflow |
| 容* | 91.659 | Baise University | apache/apisix-ingress-controller, apache/apisix, apache/apisix-helm-chart |
| 崔** | 89.637 | Guilin University of Electronic Technology | PaddlePaddle/Paddle, PaddlePaddle/PaddleSeg, openvinotoolkit/openvino |
| 左* | 89.047 | Harbin Medical University | Well2333/nonebot-plugin-bilichat, djkcyl/BBot-Graia, IceTiki/ruoli-sign-optimization |
| 林** | 88.883 | East China Jiaotong University | Undertone0809/promptulate, PKUFlyingPig/cs-self-learning, langchain-ai/langchain |

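We can see that, beyond their OSPP communities, many students also contributed heavily to other open source communities. The two students from Longdong University and Baise University, by contrast, stayed involved over the long term in the communities they joined through OSPP, becoming stable contributors and even committers.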

Redis Changed Open Source License! Are Cloud Providers Really Freeloading off the Open Source Community?

· 8 min read
Frank Zhao
Ph.D candidate at X-lab, author of OpenDigger
Xiaoya Xia
Ph.D candidate at X-lab, CHAOSS APAC Manager

Origin

On March 21, 2024, Rowan Trollope, CEO of Redis Inc., the company behind the famous key-value database open source project Redis, announced a change in the project's licensing type from the original BSD open source license to a dual license under RSALv2 and SSPLv1.

This license change was primarily aimed at protecting Redis Inc.'s commercial interests, preventing cloud providers from using the open source version to offer commercial Redis SaaS services. Such actions are not uncommon; companies like Confluent, MongoDB, and Elastic have made similar license changes to their open source projects to protect their interests. However, this move by Redis triggered anger among many developers, a significant reason being that the Redis community includes many external contributors, and such unilateral modifications of the license are seen as damaging to the community and harmful to these contributors.

So, who exactly is deeply involved in contributing to the Redis community?

In-depth

The chart below shows the contribution distribution of the OpenRank [1] top 10 developers in the Redis community for each year since 2020. The Redis community has been trending towards diversification: among the top 10 developers, the contribution share of Redis's own employees fell from nearly 80% in 2020 to less than 40% by the first quarter of 2024. Many companies, including AWS, Alibaba Cloud, Tencent Cloud, and Ericsson, have been deeply involved in contributing to the Redis community for years, with their contribution intensity increasing annually.
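
As a rough illustration of how such a contribution share can be derived, the sketch below groups a top-10 style list by employer and divides each group's OpenRank by the total. The rows are made up for illustration and are NOT real Redis data; the function name and the tuple layout are assumptions, not OpenDigger's API.

```python
from collections import defaultdict


def share_by_affiliation(top_developers):
    """Map each affiliation to its fraction of the listed developers' total OpenRank."""
    totals = defaultdict(float)
    for _login, affiliation, openrank in top_developers:
        totals[affiliation] += openrank
    grand_total = sum(totals.values())
    return {aff: value / grand_total for aff, value in totals.items()}


# Hypothetical (login, affiliation, OpenRank) rows; not taken from the chart.
sample_top = [
    ("dev-a", "Redis Inc.", 30.0),
    ("dev-b", "Redis Inc.", 10.0),
    ("dev-c", "Cloud provider X", 25.0),
    ("dev-d", "Cloud provider Y", 20.0),
    ("dev-e", "Independent", 15.0),
]

shares = share_by_affiliation(sample_top)
for affiliation, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{affiliation}: {share:.0%}")
```

Tracking the first group's share year by year gives the kind of trend line described above.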

At the end of June 2020, Salvatore Sanfilippo (@antirez), the original author of Redis, announced in a blog post that he was stepping down from maintaining the Redis community and handing the maintenance work over to Yossi Gottlieb (@yossigo) and Oran Agra (@oranagra) from what was then called RedisLabs. At the same time, the two published an article announcing a new community governance model and, together with Itamar Haber (@itamarhaber), formed the core development team of the Redis community. The following month, Madelyn Olson (@madolson) from AWS and Zhao Zhao (@soloestoy) from Alibaba Cloud joined the core development team, which remained stable until the recent license change of Redis.

Besides the six core developers mentioned above, Zhu Binbin (@enjoy-binbin) joined the Tencent Cloud database product department as a result of his long-term involvement in the Redis community. In addition to Zhao Zhao, Alibaba Cloud has three other developers who have appeared in the top 10 contributors list over the years. Overall, cloud providers such as AWS, Alibaba Cloud, Google, and Tencent Cloud currently have nearly 20 developers actively contributing to the Redis community. Their investment in the Redis community is evident and contrasts with the common perception that cloud providers freeload off the open source community.

Split

Because so many contributors come from cloud providers, Madelyn Olson from AWS initiated a fork of Redis named Valkey immediately after Redis announced the license change, with plans to host it under the Linux Foundation. Google and Ericsson have already explicitly expressed their support for the development of the Valkey community.

Other cloud provider developers are almost left with no choice but to migrate to the Valkey community, as the new licensing terms for Redis exclude cloud providers, preventing them from continuing to contribute to the Redis community. It appears that Redis has no intention of allowing the community to deeply participate in further development. According to feedback from several Chinese committers, the permissions of the GitHub redis-committers team were revoked within a week, removing the external committers' repository write access and Issue/PR management access. Now, their permissions in the Redis project are essentially the same as those of ordinary users.

"In addition to participating in specific functional contributions to the Redis community, we also contribute back to the community the fixes and improvements in aspects such as functionality, performance, stability, and observability that we have accumulated in our cloud products. The rich user base of our cloud products also conveys a large number of real users' needs to the upstream community. We believe it is our responsibility, and we trust that a thriving open source community is worth our long-term maintenance," said Zhao Zhao of Alibaba Cloud.

According to the data, among the top 10 contributors to Redis in 2024, aside from two developers from Redis Inc., seven of the others have already participated in the development of the Valkey project. This indicates that the Valkey project has effectively become the new community and is now operating normally, while the developers at Redis Inc. will continue to develop and maintain the Redis project independently.

Data updated in September 2024

From a macro data perspective, the Redis community has maintained an OpenRank collaboration influence of around 80 in the past six months, while Valkey, just ten days after being open sourced in March, soared to an OpenRank of about 40, reaching half of the Redis project's level. In terms of the number of developers participating in the community, Redis has been maintaining a scale of around 100 people per month. In March, due to the license modification, a significant number of developers were involved in the discussion threads in the Redis community, leading to a doubling of participants to 220 people. Meanwhile, within ten days of Valkey being open sourced, the number of participants reached 146, surpassing the regular scale of Redis.
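
The comparisons in the paragraph above reduce to a few ratios. The following sketch recomputes them from the approximate figures quoted there; the variable names are illustrative only.

```python
# Approximate figures quoted in the paragraph above.
redis_openrank = 80        # Redis, roughly stable over the past six months
valkey_openrank = 40       # Valkey, about ten days after being open sourced
redis_monthly_devs = 100   # Redis's usual monthly participants
redis_march_devs = 220     # Redis participants in March, after the license change
valkey_march_devs = 146    # Valkey participants within its first ten days

openrank_ratio = valkey_openrank / redis_openrank
march_growth = redis_march_devs / redis_monthly_devs
valkey_vs_usual_redis = valkey_march_devs / redis_monthly_devs

print(f"Valkey OpenRank relative to Redis: {openrank_ratio:.0%}")
print(f"Redis participants in March vs. usual: {march_growth:.1f}x")
print(f"Valkey participants vs. usual Redis: {valkey_vs_usual_redis:.2f}x")
```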

Overall, the split in the Redis community seems irreversible. With Valkey being donated to the Linux Foundation, it is expected that more open source developers will participate in the contribution and development of Valkey.

Ripples

As the underlying value proposition of the OpenRank algorithm suggests, everything in the world is interconnected and mutually influencing: an event affects not only its own subject but also ripples through to related parts. As mentioned in the 2023 China Open Source Annual Report, our data showed that Unity's change in pricing strategy in September 2023 directly led to the largest growth spurt in the history of the open source game engine godotengine. Of the 80,000 stars of this project, which has been open source for over a decade, more than 10,000 came in September 2023 alone, as game developers responded to Unity's decision by supporting open source.

Apart from giving rise to a new forked community, Valkey, the Redis license change also led many developers who need a key-value database to start paying attention to other Redis-related open source projects. The Apache Software Foundation's kvrocks is one such project. Unlike Redis, which is an in-memory key-value database, kvrocks is a disk-based key-value database. As seen in the graph below, kvrocks experienced a significant increase in various metrics in March, possibly because it is a foundation-hosted project. In an era when the enterprises behind open source projects can bypass established community rules and unilaterally change the license at any time, hosting a project with a foundation may give developers a greater sense of security.

Close

Cloud providers exploiting open source projects has drawn criticism from open source developers over the past few years, but changes are quietly taking place. More cloud providers are recognizing the importance of community and are willing to invest employees and even material resources in the open source communities they depend on, so that their cloud services can develop in better coordination with upstream projects. We believe that in the future the upstream and downstream of the open source community can collaborate more effectively, creating a win-win situation for all parties involved. Effective evaluation of contributions and influence in open source is a prerequisite for a healthier and more efficient collaborative mechanism: it identifies the developers who make genuine contributions and ensures that what they create truly belongs to them.

References:

[1]: OpenRank Paper in ICSE 2024

Awesome OpenRank

· One min read
Will Wang
Prof. @ ECNU / Founder of X-lab