An Introduction to the OpenRank Family of Open Source Evaluation Algorithms

August 9, 2025 · 9 min read

Ph.D candidate at X-lab, author of OpenDigger

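The OpenRank family of open source evaluation algorithms is built on developer collaboration networks and on the OpenRank algorithm, which generalizes PageRank to high-dimensional heterogeneous networks whose nodes carry initial values. The family targets different open source evaluation scenarios and mainly consists of the OpenRank influence evaluation algorithm and the OpenRank contribution evaluation algorithm.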
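To make the idea of "PageRank with initial values" concrete, here is a minimal sketch. It is not the OpenRank implementation: the update rule, the `retain` parameter, and the toy network are assumptions chosen only to illustrate how an initial value per node can be combined with score propagation along weighted collaboration edges.

```python
def rank_with_initial_values(edges, init, retain=0.15, iters=200):
    """Iterate a PageRank-style update in which every node keeps part of its initial value.

    edges: {source: {target: weight}}, weighted collaboration links
    init:  {node: initial value}; every node must appear here
    """
    scores = dict(init)
    out_weight = {src: sum(targets.values()) for src, targets in edges.items()}
    for _ in range(iters):
        # Each node first retains a fixed fraction of its own initial value ...
        updated = {node: retain * value for node, value in init.items()}
        # ... and the rest of each score flows along edges in proportion to edge weight.
        for src, targets in edges.items():
            for dst, weight in targets.items():
                updated[dst] += (1 - retain) * scores[src] * weight / out_weight[src]
        scores = updated
    return scores


# Hypothetical toy network: two developers collaborating on one repository.
edges = {
    "alice": {"repo": 3},
    "bob": {"repo": 1},
    "repo": {"alice": 3, "bob": 1},
}
init = {"alice": 1.0, "bob": 1.0, "repo": 1.0}
scores = rank_with_initial_values(edges, init)
```

In this toy case the repository ends up with the highest score and the more active developer outranks the less active one, while the total score stays equal to the total initial value because every node has outgoing edges.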

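OpenRank Influence Evaluation Algorithm

Overview: The OpenRank influence evaluation algorithm grew out of the idea of an open source value-flow network. It models the collaboration between developers and open source projects using collaboration data from the entire open source ecosystem, then applies the OpenRank algorithm to evaluate developers and projects jointly on a monthly basis. The result is a time-continuous North Star metric for influence in the open source ecosystem.
Academic: The corresponding paper, 《OpenRank 动力学：面向开源生态的影响力评估与动态传播模型》 (OpenRank Dynamics: An Influence Evaluation and Dynamic Propagation Model for the Open Source Ecosystem), won the Best Paper Award, the only one presented, at the first Open Source Technology Academic Conference and was published in the August 2025 issue of 《计算机科学》, a Chinese core journal. The paper describes how the OpenRank influence metric is constructed and uses a dynamic propagation model to show that it is more effective and more efficient than traditional influence evaluation models in the open source ecosystem.
Standards: As an evaluation algorithm for open source communities, the OpenRank influence algorithm has been incorporated into the standard 《信息技术开源治理第3部分：社区治理框架》 (Information Technology Open Source Governance, Part 3: Community Governance Framework). The standard series was also selected as a 2024 exemplary case of group standard application and promotion (工信部科函〔2025〕19号), and dozens of enterprises have adopted the algorithm as a method for evaluating open source communities.
Reports: Since its introduction, the OpenRank influence metric has been widely recognized by the industry. Since 2022 it has underpinned major annual open source data reports such as 《中国开源年度报告》 and 《中国开源发展蓝皮书》, and it has also supported reports including 《中国十年开源洞察报告》, 《2022 开源大数据热力报告》, 《大模型开源开发全景图》, and 《CCF 开源战略动态月报》.
Applications: Beyond data reports, the metric has given rise to a rich ecosystem of downstream applications, such as OpenLeaderboard, HyperCRX, the OpenAtom Foundation's global open source collaboration panorama, OSGraph, and the PolarDB open source community insight dashboard. These provide open source insight to many government agencies, enterprises, and communities.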

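OpenRank Contribution Evaluation Algorithm

Overview: The OpenRank contribution evaluation algorithm is another important application of the OpenRank algorithm to open source evaluation. It combines the collaboration network inside an open source community with a quantified value for each unit of collaboration to produce a time-continuous measure of developer contribution. The method applies broadly to measuring contribution in the production of collaborative digital artifacts and thereby offers a theoretical basis for quantifying the developer economy.
Academic: The corresponding paper, "OpenRank Leaderboard: Motivating Open Source Collaborations Through Social Network Evaluation in Alibaba", was published at ICSE 2024, a top international conference in software engineering. Dr. Zhao Shengyu presented it in Lisbon, Portugal, where it drew wide attention. The paper builds a community contribution metric on the OpenRank algorithm, deploys it at Alibaba Group, and follows it for a full year. It shows that the algorithm measures open source contribution effectively and that, even with the mechanism fully disclosed, it still promotes effective collaboration and healthy community development. The authors include Brian Fitzgerald, founder and chief scientist of Lero (the Irish software research centre), the team of Professor Davide Taibi at the University of Oulu, Xia Xiaoya, open source data scientist at the Ant Group open source office, and Wang Rong, senior community manager at the Alibaba open source office. In addition, an article that uses group game simulation experiments to model community evolution under the algorithm and to validate its effectiveness has been published on the WeChat official account.
Standards: As an algorithm for measuring open source developer contribution, the OpenRank contribution evaluation algorithm has been incorporated into the standard 《信息技术开源治理第5部分：开源贡献者评价模型》 (Information Technology Open Source Governance, Part 5: Open Source Contributor Evaluation Model). The standard series was also selected as a 2024 exemplary case of group standard application and promotion (工信部科函〔2025〕19号), and dozens of enterprises have adopted the algorithm as a method for measuring developer contribution.
Applications: The algorithm has a rich downstream ecosystem, including OpenTalent, the OpenAtom Foundation's open source talent evaluation platform, the Alibaba open source developer contribution leaderboard, and the Ant Group open source developer contribution leaderboard. In August 2025, OpenRank also entered a strategic partnership with 天工开物开源基金会 (Tiangong Kaiwu Open Source Foundation) to build a developer incentive platform that serves open source foundations and helps build a developer economy. Some communities have already piloted incentive mechanisms based on the algorithm, with total distributed funds exceeding one million RMB.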

OpenDigger

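OpenDigger is the infrastructure platform that produces the OpenRank metrics described above. It handles the data analysis and metric production and is an important piece of open source measurement infrastructure in China and worldwide.

OpenDigger holds core open source ecosystem data: developer behavior data on the order of tens of billions of records from code hosting platforms (GitHub, Gitee, GitCode, GitLink, and others), software supply chain data on the order of tens of millions of records (Maven, PyPI, npm, and others), and label data for more than 4,000 open source projects. It has produced over 20 million metric files, and its metric API serves more than one million requests per month.

OpenDigger has received the Grand Prize of the 2023 Shanghai Open Source Innovation Outstanding Achievement awards, the 2024 BenchCouncil (International Open Benchmark Council) Global Open Source Top 100 Achievement award, and the 2025 Shanghai Open Source Innovation Elite Outstanding Open Source Project award. In the talk 《上海市开源生态体系建设思考》 at the 2024 OpenAtom Ecosystem Conference, a representative of the Shanghai Municipal Commission of Economy and Informatization named OpenDigger as an outstanding and priority-supported open source project of Shanghai.

In addition, HyperCRX, a downstream browser extension incubated by OpenDigger, won the Grand Prize of the 2024 Shanghai Open Source Innovation Outstanding Achievement awards, and its paper was published at ICSE 2025, a top international conference in software engineering.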

OSPP 2024 In-Depth Insight Report

July 16, 2025 · 12 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

Will Wang

Prof. @ ECNU / Founder of X-lab

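Background

OSPP (开源之夏, Open Source Promotion Plan) is a summer program in the "Open Source Software Supply Chain Lighting Plan" series launched by the Institute of Software, Chinese Academy of Sciences. It encourages university students to take part in developing and maintaining open source software and promotes the growth of outstanding open source communities. Six editions have been held so far (2020 to 2025), and X-lab has been deeply involved since the first.

As an open source data research project that has long taken part in OSPP, OpenDigger presents here an in-depth analysis of the OSPP 2024 data as a continuation of the OSPP 2023 data report.

OSPP 2024 Macro Data

According to the OSPP community's data report, OSPP published 562 projects in 2024; students were selected for 519 of them, and 455 were completed, a completion rate of 81%. Under stricter screening, the number of published projects fell compared with 2023, but every other figure rose significantly. The number of participating universities in particular grew by about 30%, which shows how far the program's influence reaches.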

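OSPP 2024 macro data (change vs. 2023 in parentheses)

Published projects	Selected projects	Completed projects	Completion rate (%)	Universities
562 (-31)	519 (+15)	455 (+34)	81 (+10)	186 (+42)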
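The headline ratios can be checked with a few lines of arithmetic. This sketch assumes that the trailing digits in the table row are year-over-year changes (for example 186 universities, up 42), which is an interpretation of the table rather than something the report states explicitly.

```python
# Figures from the OSPP 2024 macro data
published, selected, completed = 562, 519, 455
universities_2024 = 186
universities_change = 42  # assumption: read as the change versus 2023

# The quoted 81% matches completed / published, not completed / selected.
completion_rate = completed / published
selected_completion_rate = completed / selected

# Growth in participating universities relative to the implied 2023 count.
universities_2023 = universities_2024 - universities_change
university_growth = universities_change / universities_2023

print(f"completion rate: {completion_rate:.1%}")
print(f"completion rate among selected projects: {selected_completion_rate:.1%}")
print(f"university growth: {university_growth:.1%}")
```

Under this reading the university count grew by roughly 29%, consistent with the "about 30%" quoted in the text.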

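Apart from a few communities related to operating system kernels that use their own git repositories, most completed projects are hosted on code hosting platforms such as GitHub (315 projects) and Gitee (136 projects). The split is in line with 2023. The overall platform distribution is as follows:

By university, the 455 completed projects were finished by students from 186 universities. Huazhong University of Science and Technology (华中科技大学) and Beijing University of Posts and Telecommunications (北京邮电大学) lead with more than 20 students each. The detailed distribution is as follows: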

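Annual Contribution Analysis

Beyond the statistics above, we also want to offer deeper insight, such as how much each student from each university actually contributed in their community. This finer-grained analysis helps us see how far students collaborated on the project as a whole, rather than only whether they completed one specific task.

Note: because of the underlying data currently available in OpenDigger, the analysis below covers only GitHub and Gitee.

We used contribution data for the whole of 2024 and the community OpenRank algorithm to analyze in detail how students participated in each community. The top 20 universities by total contribution are shown in the table below: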

OSPP 2024 university contribution leaderboard (change vs. 2023 in parentheses, absolute value; universities without a change value are new to the list)

#	University	OpenRank	Participating students	OpenRank per student
1	西安邮电大学	85.13 (29.46)	15 (5)	5.68 (0.11)
2	陇东学院	61.37 (21.89)	1	61.37 (21.89)
3	上海大学	42.21	2	21.11
4	北京邮电大学	42.21 (17.98)	17 (10)	2.48 (0.25)
5	华中科技大学	32.37 (34.93)	17 (4)	1.9 (1.3)
6	西安财经大学	27.25	3	9.08
7	清华大学	26.77	8	3.35
8	重庆邮电大学	24.54 (24.38)	4 (1)	6.13 (3.65)
9	南京大学	20.81 (13.09)	14 (3)	1.49 (0.51)
10	东南大学	19.44 (0.88)	8	2.43 (0.11)
11	浙江大学	19.24 (41.99)	9 (14)	2.14 (0.52)
12	中国科学技术大学	19.15	11	1.74
13	山东大学	16.1	5	3.22
14	上海交通大学	16.09 (32.25)	5 (1)	3.22 (4.84)
15	中国科学院大学	14.89 (22.47)	14 (4)	1.06 (1.01)
16	武汉大学	14.6 (24.4)	3 (2)	4.87 (14.14)
17	华南理工大学	14.57	4	3.64
18	北京航空航天大学	14.35	5	2.87
19	华东师范大学	13.83 (40.32)	7 (6)	1.98 (2.19)
20	广东工业大学	12.95	5	2.59

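Alongside each university's total contribution we also give its OpenRank contribution per student. 西安邮电大学 (Xi'an University of Posts and Telecommunications) takes first place this year thanks to several students who were active contributors in the Linux内核之旅开源社区 community. Of the top 20 universities by contribution, 8 are new to the list this year.

To look more closely at individual contributions, we also ranked student contributors by OpenRank contribution. The top 20 students are:

OSPP 2024 student contribution leaderboard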

#	Student	OpenRank	University	Community
1	姬**	61.37	陇东学院	Spring Cloud Alibaba
2	杨*	41.03	上海大学	昇思MindSpore
3	邵**	23.18	西安财经大学	OI Wiki
4	杨**	19.85	西安邮电大学	Linux内核之旅开源社区
5	徐**	19.29	西安邮电大学	Linux内核之旅开源社区
6	陈**	18.43	重庆邮电大学	PikiwiDB(Pika)开源社区
7	张**	16.2	西安邮电大学	Linux内核之旅开源社区
8	张**	13.19	西安邮电大学	Linux内核之旅开源社区
9	杨**	12.83	电子科技大学成都学院	清华大学 TUNA 协会
10	刘**	11.47	天津中德应用技术大学	BMF字节跳动多媒体框架
11	陈**	11.46	浙江科技大学	Apache ShenYu
12	宋*	10.99	北京邮电大学	KubeEdge
13	林**	10.88	上海交通大学	Kmesh
14	周**	9.42	湖北文理学院	Volcano社区
15	甘**	9.17	华南师范大学	KubeBlocks
16	曾**	8.94	山东大学	Embox
17	范**	8.62	清华大学	清华大学 TUNA 协会
18	陈**	8.56	北京理工大学	DragonOS开源社区
19	李**	8.39	武汉大学	OceanBase
20	吴**	8.13	西安电子科技大学	PikiwiDB(Pika)开源社区

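The individual analysis makes some exceptionally high contributors stand out clearly: the student surnamed 姬 from 陇东学院 in the Spring Cloud Alibaba community, the student surnamed 杨 from 上海大学 in the MindSpore community, and the student surnamed 邵 from 西安财经大学 in the OI Wiki community. Each of them single-handedly lifted their university's total contribution into the top 10.

Ecosystem-Wide Contribution Analysis

OSPP has clearly drawn many outstanding students into deep contribution to open source communities while they are still at university. Do these students also contribute to other open source communities, and how large is their contribution across the whole ecosystem? We also measured their contribution across the entire open source landscape and their main projects, as shown in the table below:

Student ecosystem-wide contribution leaderboard

#	Student	OpenRank	University	Projects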
1	殷**	211.13	西北工业大学	mdn/translated-content mdn/content yin1999/translated-content
2	姬**	93.64	陇东学院	alibaba/spring-cloud-alibaba apache/hertzbeat spring-cloud-alibaba-group/spring-cloud-alibaba-group.github.io
3	李**	90.92	中南大学	project-trans/MtF-wiki project-trans/RLE-wiki felixonmars/archriscv-packages
4	杨*	42.89	上海大学	mindspore/mindquantum mindspore/docs mindspore/mindspore
5	吕*	34.53	华中科技大学	datenlord/datenlord antrea-io/antrea goplus/community
6	杨**	30.5	电子科技大学成都学院	llvm/circt chipsalliance/chisel chipsalliance/t1
7	蒋**	29.49	Aalto University	mit-cml/workspace-multiselect Igalia/wolvic oceanbase/oceanbase
8	蔡**	25.85	重庆邮电大学	youngyangyang04/KVstorageBaseRaft-cpp OpenAtomFoundation/pikiwidb OpenAtomFoundation/pika
9	邵**	24.53	西安财经大学	OI-wiki/feedback-sys OI-wiki/OI-wiki satorijs/satori
10	孙*	21.76	南昌大学	openeuler/community openeuler/mugen openeuler/utsudo

Beyond the OSPP communities, many students also contributed heavily to other open source communities.

GitHub and Gitee Data Reveal an Open Source "Census": Where Do Chinese Developers Rank?

April 11, 2025 · 26 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

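Preface

As this article is being written, the server clusters of GitHub and Gitee keep humming: every hour, more than 5,000 PRs worldwide merge over 6 million lines of code changes into these open source hosting platforms.

From Kubernetes reshaping the global cloud computing landscape to PyTorch displacing MATLAB's dominance in academia, and from Hadoop kicking off the big data boom to DeepSeek's open source releases setting off a technology race between China and the United States, the open source tide of the past decade has long gone beyond technical iteration. Behind every developer's regional label lies the rapid growth and shifting distribution of open source developers worldwide. Yet how many developers there are, where they are, and what they are doing has remained a fog hanging over researchers of the open source ecosystem.

This article digs through ten years of GitHub and Gitee data and uses the network of Issues and PRs to sketch a picture of technological geography, taking you into the still-unclear world of open source developers.

The key data points are:

There are more than 100 million developers in the broad sense worldwide, and more than 22 million open source developers.
China has more than 10 million developers and about 1.98 million open source developers. On both counts it ranks third in the world, behind the United States and India.
In 2024, Chinese open source developers ranked second in the world in both influence and contribution, with growth rates of 4.24% and 7.48% respectively, the fastest in the world.

The trend over the past decade shows the global open source landscape changing dramatically:

The United States, with its strong technology sector and first-mover advantage, still holds a commanding lead on every open source developer metric, but both its influence and its contribution have declined noticeably in recent years.
The number of Chinese open source developers has grown steadily to nearly 2 million. India has overtaken China in headcount, but China is firmly second in influence and contribution and has the fastest growth in the world. It is quickly narrowing the gap with the United States and widening its lead over other countries, which marks China's entry into a period of high-quality open source development.
Traditional powers such as Germany, the United Kingdom, France, Canada, and the Netherlands remain in the global top ten on every metric thanks to their solid foundations, and their development is steady.
Russia has a considerable number of developers taking part in the global open source ecosystem, but geopolitics and other factors keep its influence and contribution below what its developer base would suggest.
India, Brazil, and Indonesia have become developer powers that cannot be ignored and have grown strongly in recent years. Despite their clear advantage in headcount, their influence and contribution do not yet match those of the traditional European and North American powers, and they are still at a relatively early stage of development.

Open source projects originating in China and in the United States differ widely in how global they are:

US-originated open source projects are highly globalized: non-domestic contribution exceeds 60%, and China, with 8.4%, is the second-largest contributor to US open source projects.
China-originated open source projects are much less globalized: non-domestic contribution is around 20%, leaving considerable room to improve their appeal to and influence on developers worldwide.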
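Total Number of Developers Worldwide

How many developers are there in the world? The question is even harder than a national census, because different definitions and methods yield different answers.

Statista, for example, puts the number of developers worldwide in 2024 at 28.7 million, while an IDC report puts it at 37 million. Consulting firms like these usually rely on government employment data or sample surveys to count and estimate full-time developers, excluding part-timers, hobbyists, and students, and even so their conclusions differ widely. Moreover, as computing education has become widespread and the open source model has changed how software is produced, the number of digital nomads has surged, so employment data can no longer capture the number of developers in the broad sense.

At a time when nearly all foundational software and development frameworks are open source by default, however, almost no developer can avoid GitHub, the world's largest open source hosting and developer social platform. Even for a Chinese platform such as Gitee, an estimated 80% or more of registered users are also GitHub users. GitHub user data has therefore become a new and effective lens on the overall size of the developer population.

In September 2023, GitHub launched the Innovation Graph project, an open data publishing and insight platform. GitHub cleans and organizes its internal aggregate statistics and publishes them quarterly, including the total number of registered accounts in each economy. According to the latest data on the platform, as of the third quarter of 2024 GitHub's registered users span 201 countries and regions and exceed 133 million in total.

Of course, these 133 million registered users include not only genuine developers but also some automation accounts, maliciously registered throwaway accounts, and multiple accounts held by one person. According to OpenDigger data, more than 77 million accounts left a public event record on GitHub over the past decade (such as Star, Fork, Commit, Issue, or PR activity). Since many developers are only users of open source software and may never collaborate on GitHub, I consider roughly 100 million a reasonable rough estimate of the number of developers in the broad sense.

Although the total reaches 100 million, many of these accounts are inactive, so it would be unreasonable to treat this figure as the number of open source developers. If we define an open source developer as an account that has performed any collaborative action on GitHub, such as an Issue or a PR, then according to OpenDigger data the number of open source developers active on GitHub over the past decade is about 22.08 million.

In summary, based on GitHub data, we roughly estimate that the number of developers in the broad sense has passed 100 million, of whom more than 22 million are open source developers.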

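Distribution by Country

Overall Developer Distribution

According to Innovation Graph data, as of the third quarter of 2024 the United States leads the world with 23.84 million developers, India is second with 17.11 million, and China (including Hong Kong, Macao, and Taiwan; the same below) is third with 13.47 million, followed by Brazil, the United Kingdom, Russia, Indonesia, Germany, Japan, Canada, and France. If the 27 EU member states are treated as a single economy, the EU has 18.65 million developers, second only to the United States.

Scaled to a global total of 100 million developers, China has more than 10 million developers, currently behind only the United States and India and third in the world.

GitHub's official country statistics are based on the IP address from which an account logs in. Because of unstable network conditions, this undercounts developers in mainland China. However, many mainland developers exit the network through Hong Kong, and the data does show an abnormally high number of developers in Hong Kong (more than 2.2 million, about 30% of Hong Kong's population). We therefore believe that including the figures for Hong Kong, Macao, and Taiwan gives a reasonable picture of China's overall developer population.

Open Source Developer Distribution

For the country distribution of the 22 million open source developers, Innovation Graph publishes only aggregate results, so the exact country of each developer cannot be known. OpenDigger collected the public GitHub profiles of these 22 million accounts and parsed the public location field they filled in to determine each account's country wherever possible.

Of these 22 million accounts, about 4.06 million have a public location that can be parsed correctly, roughly 18.4% of the total. The fill rate is not high, but the more prominent a developer is, the more complete their profile tends to be. According to OpenDigger data, more than 6.07 million open source developers were active worldwide in 2024; the top 100,000 by OpenRank developer influence account for one third of all developer influence, and country information can be parsed for more than 56% of them. The available data is therefore reasonably representative.

In addition, because of the decoupling between China and the United States in recent years, many Chinese open source projects have chosen Gitee as their primary collaboration platform, so the number of open source projects and developers on Gitee has grown quickly over the past few years. Many developers are active on both GitHub and Gitee and their accounts on the two platforms are hard to link, so we add only the active open source developers in all GVP projects to the count of Chinese open source developers and disregard the long tail (overseas users on Gitee are ignored for now). This adds about 175,000 people.

The resulting estimate puts the top three countries by open source developers at 4.76 million for the United States, 2.40 million for India, and 1.98 million for China, followed by Brazil, Germany, the United Kingdom, Canada, France, Russia, and Poland.

Comparing open source developers with total developers, the top four are unchanged. Russia is sixth in total developers but falls to ninth in open source developers, which is probably related to GitHub suspending the accounts of Russian developers. Indonesia is seventh in total developers but drops out of the top ten to fourteenth in open source developers. Although Indonesia is an emerging software outsourcing power with a fast-growing software industry, its participation in open source is low. By comparison, many European countries do noticeably better in open source developer counts, with a higher share of developers contributing to the open source ecosystem.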

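A Decade of Change in Open Source Developers

The figure of roughly 22 million covers everyone who was active at some point in the past decade, but developers come and go, and the number active in each country changes every year. Only by adding the time dimension can we see how the open source developer population is shifting.

First, the figure below shows, for the 10 countries with the most active open source developers in 2024, how that number changed each year over the past decade. The United States still has a clear lead, with about 1.11 million active open source developers in 2024. India is second with about 570,000; it grew rapidly after 2020 and overtook China in 2023. China's growth is relatively steady, and it ranks third with 510,000. Brazil also began growing quickly after 2020, overtook Germany in 2023, and holds fourth place with 390,000. Overall, the number of active open source developers worldwide is growing steadily; India and Brazil show markedly fast growth after 2020, while traditional powers such as Germany, the United Kingdom, France, and Canada stay in the top ten thanks to their existing developer base, with slow growth and a relatively flat total.

Headcount, however, mainly reflects user habits or platform penetration; the quality of those developers is arguably what deserves more attention. From the perspective of OpenRank developer influence, the figure below shows how influence changed over the past decade for the 10 countries with the highest developer influence in 2024. The United States remains first by a wide margin but has fluctuated and declined noticeably since 2020, falling 2.89% in 2024. China has advanced rapidly: it overtook the United Kingdom in 2019 to rank third, overtook Germany in 2023 to rank second, and still has strong momentum, leading the world with growth of 4.24% in 2024. Projects on Gitee played a significant part, especially the core contributors of a group of domestic projects represented by OpenHarmony and openEuler; Gitee accounts made up nearly 17% of Chinese developers' OpenRank influence in 2024. Germany and the United Kingdom, which have no population advantage and rank behind India and Brazil in developer count, still follow China closely in third and fourth place for influence, with a clear lead over the countries behind them. India and Brazil have a clear advantage in headcount but considerable room to grow in influence, ranking fifth and eighth. Also worth noting: compared with the developer count ranking, Russia drops out of the top ten for influence, while Japan ranks tenth, just behind the Netherlands.

Developer influence reflects the positional advantage a developer holds in the open source ecosystem, whereas contribution better reflects how much a developer actually contributes. Taking the top 400,000 GitHub repositories by influence over the past decade together with the GVP projects on Gitee as the basis, the contribution of Chinese developers grew rapidly after 2020 and overtook Germany in 2022 to rank second in the world. Compared with influence, Chinese developers focus more on deep contribution: with about one third of the developer influence of the United States, they deliver nearly 50% of its total contribution, and they are still growing at 7.48%, already pulling away from other countries. Because US developer contribution has declined since 2020, the gap between Chinese and US contribution growth rates exceeded 10 percentage points in 2024; at the current pace, Chinese developer contribution would exceed that of the United States in 8 years and rank first in the world. Compared with their influence, Germany, the United Kingdom, France, and Canada still make strong contributions to the ecosystem, while India and Brazil rank lower in contribution than in influence, at seventh and ninth, and Poland replaces Japan in the contribution top ten.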
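The 8-year projection can be reproduced with a short compound-growth calculation. This is a sketch under stated assumptions: the text gives China's contribution as nearly 50% of the US total and China's growth as 7.48%, but it does not give the US growth rate, so the code derives it from the quoted gap of about 10 percentage points. The function name is illustrative.

```python
import math


def years_to_overtake(ratio, growth_a, growth_b):
    """Whole years until A (currently ratio * B) exceeds B at constant annual growth rates."""
    if growth_a <= growth_b:
        raise ValueError("A never overtakes B unless it grows faster")
    yearly_gain = (1 + growth_a) / (1 + growth_b)
    return math.ceil(math.log(1 / ratio) / math.log(yearly_gain))


china_growth = 0.0748            # from the text
us_growth = china_growth - 0.10  # assumption: growth gap of 10 percentage points
current_ratio = 0.5              # "nearly 50%" of US contribution, rounded

years = years_to_overtake(current_ratio, china_growth, us_growth)
print(years)
```

With these inputs the ratio between the two doubles in a little over 7 years, so the first full year in which China would be ahead is year 8, matching the figure in the text. The result is sensitive to the inputs, since a constant growth gap over 8 years is itself a strong assumption.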

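Contribution Distribution of Chinese and US Open Source Projects

China has developed quickly in open source software over the past few years, but open source is by nature a process of technological globalization. A healthy open source ecosystem depends not only on deep contribution from each country's own developers but also on attracting talent from around the world to build it together. Although Chinese open source developers are engaging ever more deeply in global open source technology, and their influence and contribution are rising quickly, open source projects initiated in China still lack global influence and have very limited appeal to open source talent worldwide.

According to OpenDigger data, in open source projects initiated by US technology companies, US developers accounted for only 38.7% of contribution in 2024. Seventeen countries and regions each accounted for more than 1%, and China ranked second with 8.4%. Looking back over the decade, US open source has globalized quickly and led open source development worldwide: the domestic share of contribution to US open source projects fell from 51.8% in 2015 to 38.7% in 2024, while the share from Chinese developers rose from 3% (sixth in the world) in 2015 to 8.4% (firmly second) in 2024, about 2 percentage points ahead of third-placed Germany.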

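Top 10 contributors to US-originated open source projects, 2024

#	Country	OpenRank contribution	Share of contribution
1	United States 🇺🇸	200492.15	38.69%
2	China 🇨🇳	43542.15	8.40%
3	Germany 🇩🇪	33845.51	6.53%
4	United Kingdom 🇬🇧	28720.74	5.54%
5	Canada 🇨🇦	27072.93	5.22%
6	India 🇮🇳	20866.43	4.03%
7	France 🇫🇷	17518.46	3.38%
8	Netherlands 🇳🇱	12014.88	2.32%
9	Poland 🇵🇱	9663.52	1.86%
10	Switzerland 🇨🇭	9049.63	1.75%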
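The share column follows directly from the OpenRank contribution column once the total is known. The total is not listed in the table, so this sketch infers it from the United States row (an assumption), then recomputes two other shares and the non-domestic share as a consistency check.

```python
# OpenRank contribution values copied from the table (first three rows only)
contribution = {
    "United States": 200492.15,
    "China": 43542.15,
    "Germany": 33845.51,
}
us_share = 0.3869  # share listed for the United States

# Assumption: the total is inferred from the US row, since the table does not list it.
implied_total = contribution["United States"] / us_share

shares = {country: value / implied_total for country, value in contribution.items()}
non_domestic_share = 1 - us_share

for country, share in shares.items():
    print(f"{country}: {share:.2%}")
print(f"non-domestic: {non_domestic_share:.1%}")
```

The recomputed shares for China and Germany agree with the table, and the non-domestic share of about 61% is consistent with the "more than 60%" quoted in the summary.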

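By comparison, the domestic share of contribution to open source projects initiated by Chinese technology companies has stayed at around 80% over the past decade and was 79.3% in 2024. Only 7 countries and regions account for more than 1% each: the United States is second with 5.75%, and the remaining 5 (Canada, Germany, Singapore, India, and the Czech Republic) each account for no more than 1.5%. Although Chinese technology companies take an open stance toward developers worldwide on GitHub and mostly collaborate in English, the overall degree of globalization has not improved significantly.

Top 10 contributors to China-originated open source projects, 2024

#	Country	OpenRank contribution	Share of contribution
1	China 🇨🇳	44698.58	79.33%
2	United States 🇺🇸	3241.15	5.75%
3	Canada 🇨🇦	774.71	1.37%
4	Germany 🇩🇪	769.57	1.37%
5	Singapore 🇸🇬	767.51	1.36%
6	India 🇮🇳	727.49	1.29%
7	Czech Republic 🇨🇿	680.86	1.21%
8	Bulgaria 🇧🇬	332.93	0.59%
9	Sweden 🇸🇪	326.98	0.58%
10	United Kingdom 🇬🇧	295.21	0.52%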

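Future competition between countries in open source technology will not only be about domestic developers taking part in the global ecosystem. It will be about whether domestically led open source projects can attract more open source talent from around the world to contribute in depth, thereby building the global influence of the country's open source technology and leading continued technological progress.

Summary in One Sentence

There are more than 100 million developers in the broad sense worldwide and more than 22 million open source developers. China ranks third in developer count and second in influence and contribution, and has entered a stage of high-quality development. However, China-originated open source projects still need to become more global; technical innovation and institutional support will be needed to attract technology talent worldwide and further strengthen open source competitiveness.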

How to Use the OpenDigger MCP Server to Customize Your Open Source Data Report

March 21, 2025 · 9 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

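MCP has recently become a hot topic in the open source ecosystem. OpenDigger has implemented and open-sourced the first version of its own MCP service and, through an analysis of the Kubernetes project, has verified that a large model can fetch open source metrics in real time and analyze them.

What Is MCP?

MCP (Model Context Protocol) is an open protocol introduced in late 2024 by Anthropic, the company behind Claude. It provides a standardized interface for integrating large language models (LLMs) with external data sources and tools. An MCP service can offer several kinds of capability, such as static resources (Resource), tool calls (Tool), and LLM prompts (Prompt). MCP-enabled tools can then access external data sources or invoke tools automatically, so a model can use these capabilities during generation to support and improve its output.

MCP has been gaining popularity among developers. Many AI editors (such as Cursor and Windsurf), VSCode extensions (such as Cline), and chat clients (such as Cherry Studio and NextChat) have added MCP support. The multi-language SDKs that Anthropic develops for MCP also let developers build their own MCP services quickly, so in addition to the many official MCP services for mainstream platforms, a large number of MCP service projects have appeared in the open source ecosystem.

OpenDigger MCP Server

OpenDigger aims to provide comprehensive and effective data metrics for open source projects. Its metrics have long been used by many downstream applications (such as HyperCRX, OpenLeaderboard, and OpenGalaxy), but none of these applications can analyze the data or draw insights on its own.

Large language models are very good at generating text, which makes them a valuable aid for data insight. How to reference real data dynamically during generation and produce a useful data report, however, is a current research challenge and hot topic. MCP offers a new way to supply live data to an LLM while it generates a report.

OpenDigger has open-sourced a first, basic version of its MCP service on X-lab's GitHub (X-lab2017/open-digger-mcp-server). The service provides two features:

Metric retrieval tool (Tool): fetches the open source project metric files produced by OpenDigger online and in real time, for the LLM to analyze and use in subsequent generation.
Report generation prompt (Prompt): explains the meaning of each metric to the LLM and helps developers quickly generate a data report that can be previewed directly in a web page.

Once the MCP service is installed, OpenDigger metric data can be pulled in whenever an LLM is asked to generate an open source insight report, for both visualization and analysis.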

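Example Data Report

Using the Cline extension as an example, this section shows how DeepSeek-V3, once it has online data access, can generate an insight report for an open source project.

After installing the OpenDigger MCP Server locally, enable the service and turn on MCP's Auto-approve option so that data can be fetched automatically. Then use the Prompt provided by the project to have the DeepSeek-V3 model generate a data report for the main Kubernetes repository.

As the figure above shows, after receiving the task the model first analyzed it and broke it down into the following steps:

Use the MCP service to fetch four metrics for the repository: OpenRank, Star, Participants, and Contributors
Choose the data granularity for the analysis (yearly, quarterly, or monthly) based on the age of the repository
Generate an HTML page that presents the data visualization and an interpretation of the trends
Use the Chart.js component for data visualization

The model then automatically called the MCP tool get_open_digger_metric to fetch the data files and obtained the corresponding data. Based on the repository's creation date it chose yearly data as the analysis granularity. After analyzing the data, it created a file named kubernetes-report.html directly in the editor, wrote the yearly trends and their interpretation into it, and finally prompted the user to open the page in a browser from the command line.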
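The granularity step can be sketched as follows. This is not code from the OpenDigger MCP Server: the age thresholds, the `"YYYY-MM"` key format of the monthly data, and the choice to sum values within a bucket are all assumptions made for illustration (for a metric such as OpenRank, averaging may be more appropriate than summing).

```python
from collections import defaultdict


def pick_granularity(repo_age_years):
    """Choose a reporting granularity from repository age (hypothetical thresholds)."""
    if repo_age_years >= 5:
        return "year"
    if repo_age_years >= 2:
        return "quarter"
    return "month"


def aggregate(monthly, granularity):
    """Group {"YYYY-MM": value} data into yearly, quarterly, or monthly buckets by summing."""
    buckets = defaultdict(float)
    for key, value in monthly.items():
        year, month = key.split("-")
        if granularity == "year":
            bucket = year
        elif granularity == "quarter":
            bucket = f"{year}Q{(int(month) - 1) // 3 + 1}"
        else:
            bucket = key
        buckets[bucket] += value
    return dict(buckets)


# Made-up sample values, not real OpenDigger data.
sample = {"2015-01": 1.0, "2015-02": 2.0, "2015-04": 3.0, "2016-01": 4.0}
yearly = aggregate(sample, pick_granularity(10))
quarterly = aggregate(sample, "quarter")
```

A repository as old as Kubernetes would fall into the yearly bucket under these thresholds, which mirrors the choice the model made in the walkthrough.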

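The whole process runs in one pass. The user only states the requirement; the model and the MCP service together handle the data retrieval and the generation of the visual report step by step.

The figure below shows the visualization and interpretation of the OpenRank metric in the final page:

As the figure above shows, the DeepSeek-V3 model first plotted the yearly OpenRank data of the main Kubernetes repository with Chart.js and then gave specific insights. Based on the trend, it divided the repository's development into these stages:

2015 to 2017: rapid growth. OpenRank rose quickly as the technology became known and adopted as a container orchestration platform.
2018 to 2019: stable maturity. OpenRank stayed relatively flat with little change.
2020 to 2022: slow decline. OpenRank began to fall gradually; the model also pointed to several possible causes, such as development stabilizing, developers becoming more active in surrounding ecosystem projects, and the standardization of container technology being completed.
2023 to the present: a relatively stable recent trend, with a slight recovery in 2023. The monthly data fluctuates, possibly because of releases or specific features.

DeepSeek-V3 correctly identified the interfaces and parameters exposed by the MCP service, called them correctly to obtain the data, and then correctly generated an HTML file that visualizes the data and provides insight. Impressively, although it used yearly data for the analysis, it also drew on monthly data to explain the last two years in detail.

Conclusion

MCP is currently the leading interface protocol for interacting with large models and has grown a thriving open source ecosystem, with many developers building around it both upstream and downstream. By implementing its own MCP service, OpenDigger has verified that large models (such as DeepSeek-V3) can carry out customized data analysis. Anyone interested is welcome to try it and contribute.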

February 2025 Open Source Ecosystem Data Insight Report

March 6, 2025 · 7 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

Will Wang

Prof. @ ECNU / Founder of X-lab

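The OpenRank metric is an open source implementation of the evaluation metrics in the "Information Technology Open Source Governance" series of standards from the Electronics Standards Institute of the Ministry of Industry and Information Technology. It effectively reflects the collaborative influence of open source projects among developers, helping us understand the open source world, discover open source trends, and gain insight into open source events.

Hot Event: DeepSeek Open Source Week Ignites Global Work on Foundational LLM Optimization

After releasing the DeepSeek-R1 model in January 2025 to worldwide attention, DeepSeek announced on February 21, 2025 a week-long "Open Source Week". Starting on February 24, it open-sourced one core technology a day for five consecutive days, with the aim of promoting the sharing of AI technology and accelerating industry adoption. The five releases comprise 7 open source repositories. According to OpenDigger data, driven by Open Source Week, DeepSeek's GitHub organization gained more than 56.2k Stars between February 24 and March 6, with 805 developers taking part in discussion and collaboration. DeepSeek's enterprise OpenRank grew strongly again, by nearly 60%, reaching 330 and rising to 11th place on the Chinese enterprise leaderboard.

The five technologies released during Open Source Week are:

Day 1 - FlashMLA
- An efficient MLA (multi-head latent attention) decoding kernel for Hopper GPUs that optimizes the allocation of compute for variable-length sequences and significantly reduces inference cost.
Day 2 - DeepEP
- The first EP (expert parallelism) communication library designed specifically for MoE (mixture of experts) models. It supports FP8 low-precision computation and improves communication efficiency between GPUs by a factor of 10, balancing high throughput with low latency.
Day 3 - DeepGEMM
- A general matrix multiplication library based on FP8 precision, with only 300 lines of code, that efficiently optimizes deep learning matrix operations and improves training and inference efficiency.
Day 4 - Three parallelism releases
- DualPipe: a bidirectional pipeline parallelism algorithm that optimizes the model training process.
- EPLB: a load balancing algorithm for MoE that addresses uneven resource allocation across experts.
- profile-data: published profiling data from the training framework to help developers reproduce and optimize.
Day 5 - 3FS distributed file system
- A high-performance distributed storage system for AI training that combines SSDs with RDMA networking to make full use of hardware bandwidth, described as "a new benchmark for data processing".

In terms of Star growth, FlashMLA benefited from being first and gained more than 7k Stars on its first day, reaching more than 11.3k Stars by March 6. The distributed file system 3FS, released on the last day, drew particular attention from developers: it gained nearly 4k Stars on the day of release and more than 8k Stars by March 6.

By region, DeepSeek's Star growth in February 2025 still came from 127 countries and regions, and the share from each country was distributed similarly to January. Compared with the previous month, the share from China was more concentrated and led with 70.69%. The United States and India formed the second tier with 8.08% and 3.38% respectively, followed by Canada, the United Kingdom, Singapore, Brazil, and others.

Author's comment: DeepSeek Open Source Week is not only a display of technical strength but also a thorough practice of the open source spirit. It advances the whole industry with open code and bears out the strategic view that the more open you are, the larger your ecosystem becomes.
Further reading: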
- DeepSeek GitHub 地址：https://github.com/deepseek-ai/

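Recommended Projects of the Month

The wave set off by DeepSeek has also begun to have a major effect on the ecosystem of foundational large model technology, and several projects have grown explosively as a result.

kvcache-ai/ktransformers

The KTransformers project provides a range of low-level optimizations for foundation models. It attracted little attention after being open-sourced in July 2024. In February 2025 it added optimizations for the DeepSeek V3 and R1 models, covering reduced inference memory, longer context length, and more. That month, more than half a year after its creation, the project took off: its OpenRank influence grew 34-fold to 138, and 736 developers took part in discussion and collaboration, making it a breakout project.
Repository: https://github.com/kvcache-ai/ktransformers

huggingface/open-r1

The release of DeepSeek-R1 set off a worldwide wave of reproduction efforts. Hugging Face, the world's largest model hosting platform, provided a fully open source reproduction of DeepSeek-R1 in the Open-R1 repository. The repository has gained more than 22.8k Stars since it was open-sourced; in February 2025, 359 developers took part in discussion and collaboration, and its OpenRank reached 88, earning it a place on the global repository growth leaderboard.
Repository: https://github.com/huggingface/open-r1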

January 2025 Open Source Monthly Insight Report

February 6, 2025 · 7 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

Will Wang

Prof. @ ECNU / Founder of X-lab

The OpenRank metrics are an open-source implementation of the evaluation criteria outlined in the "Information Technology - Open Source Governance" series of standards developed by the Electronics Standards Institute of the Ministry of Industry and Information Technology. These metrics effectively reflect the collaborative influence of open-source projects among developers, thereby aiding our understanding of the open-source ecosystem, identifying emerging trends, and uncovering significant events.

The Global Impact of DeepSeek: Pioneering a New Era of AI

On January 20, 2025, the Chinese AI company DeepSeek unveiled its R1 series of large language models, causing a seismic shift in the global AI industry. Characterized by their low cost, high performance, and open-source nature, these models not only triggered a significant impact on the U.S. financial markets in the short term but also profoundly influenced the technological trajectory, industry landscape, and geopolitical dynamics of large language model development. This insight report will delve into DeepSeek's entire suite of models, providing a comprehensive data analysis.

Overview

DeepSeek launched its R1 inference model on GitHub on January 20, 2025, followed by the release of the Janus Pro multimodal model on January 28. These models quickly gained global attention due to their exceptional cost-effectiveness and performance. From the release of the R1 model until February 6, DeepSeek's official GitHub organization gained over 150,000 new stars, with 1,679 active developers contributing. Five of DeepSeek's repositories entered the top 300 list of Chinese OpenRank repositories in January 2025, with DeepSeek-R1 ranking 62nd after just 10 days of being open-sourced. In the OpenRank enterprise rankings, DeepSeek scored 207 points in January 2025, rapidly rising to 86th globally and 13th in China.

2025.1 OpenRank Leaderboard of Chinese Companies Top 15

#	Company	OpenRank	Active Repos Count	Active Developers Count
1	Huawei	10416.91441.38	300593	47821103
2	Alibaba	1822.95142.79	1410306	2026524
3	Ant group	1329.9797.46	54210	1671336
4	Baidu	1119.3783.37	19219	978249
5	ByteDance	684.210.5	3712	1112185
6	ESPRESSIF	529.5623.4	16815	86869
7	Tencent	476.5156.4	23755	687285
8	DaoCloud	424.4789.53	496	555186
9	PingCAP	423.8914.15	7611	25236
10	Fit2Cloud	419.8954.12	571	348145
11	Zilliz	294.026.32	443	24134
12	StarRocks	215.4610.95	11	16033
13	DeepSeek	207.45172.47	161	13861207
14	openKylin	204.3759.26	117100	11896
15	Deepin	162.049.12	12210	833

Star Growth Analysis

The following chart illustrates the daily star growth for the five fastest-growing repositories under DeepSeek's GitHub account up to February 6. Notably, DeepSeek-R1's repository saw an immediate surge of over 2,000 stars on the day of its release, with daily increments ranging between 2,000 and 4,000 stars until January 26. The true explosion occurred on January 27 when the U.S. stock market experienced a sharp decline following the release of DeepSeek-R1. NVIDIA's stock plummeted by 17% on that day, leading to widespread recognition of DeepSeek-R1 and boosting the popularity of its base model V3 and the Janus Pro multimodal model released on January 28. On January 28, both V3 and R1 models saw star growth exceeding 10,000, while the Janus repository gained over 4,000 stars. Subsequently, the growth rate slowed down, with another spike observed on February 5 following the Lunar New Year holiday in China.

The distribution of star growth by country and region is depicted in the chart below. According to OpenDigger data, the 150,000 stars accumulated during this period originated from 185 countries and regions worldwide. On the day of the R1 release, stars came from 82 countries, with the United States contributing the most (28%), significantly surpassing China's share of 17.4%. Despite time zone differences, this highlights the rapid response and keen interest from U.S. developers. By January 28, the global impact peaked, with contributions from 149 countries. Brazil and South Korea were notable late entrants, while post-holiday activity on February 5 was predominantly driven by Chinese developers returning to work.

Participants Distribution

Although DeepSeek's models are primarily hosted on platforms like HuggingFace and ModelScope, GitHub has played a crucial role as a forum for developer discussions and Q&A sessions, far exceeding the volume of interactions on HuggingFace. Analyzing the global distribution of contributors based on OpenRank data reveals that China, the U.S., and India form the first tier of contributors. The second tier includes the UK, Brazil, and Germany, while Australia, Pakistan, and Singapore follow in the third tier. Notably, despite having fewer contributors, Singapore ranks highly in terms of contribution quality. Israel's growing tech sector is also reflected in this data.

Detailed analysis shows that DeepSeek has attracted numerous developers and enthusiasts who have been deeply involved in large language model research over the past six months. Prominent contributors include:

Krish Dholakia (@krrishdholakia), founder and CEO of LiteLLM (OpenRank 193)
Yineng Zhang (@zhyncs), core maintainer of SGLang (OpenRank 180)
Michael (@mldangelo), core maintainer of Promptfoo (OpenRank 46)
yetone (@yetone), author of avante.nvim (OpenRank 57)
Dev Khant (@Dev-Khant), co-founder of Mem0 AI (OpenRank 31)
Junyan Qin (@RockChinQ), author of LangBot
wong2 (@wong2), author of ChatHub
Dongbo Wang (@daxian-dbw) from Microsoft's PowerShell team on AIShell project
Wenhua Cheng (@wenhuach21) from Intel's AutoAround team

This data indicates that while North American developers show strong interest in using DeepSeek, they are less actively engaged in discussions. Conversely, Chinese and Indian developers have been more proactive in participating and collaborating.

Key Findings

The release of DeepSeek's large language models marks a significant milestone in the global AI landscape. Within two weeks of the R1 launch, DeepSeek's multiple GitHub repositories received over 150,000 stars, with nearly 1,700 active developers, underscoring the global recognition and enthusiasm for this innovation. DeepSeek's OpenRank score also saw a dramatic increase.

Key observations from the data include:

Global Reach: DeepSeek-R1's influence extends across almost all major countries and regions, showcasing its broad appeal.
Rapid Response from U.S. Developers: U.S. developers exhibited a high level of sensitivity to technological advancements, responding faster than Chinese developers initially.
Contributor Diversity: Contributions come from students, individual AI enthusiasts, and corporate AI project leaders or founders of AI startups, forming a balanced community.
Indian Engagement: Indian developers play a crucial role in this AI wave, actively collaborating with Chinese counterparts.
North American Observation: While North American developers show significant interest, many remain observers rather than active contributors, with more engagement from student and Chinese-American communities.

Conclusion

Historically, China has often been seen as a consumer in the open-source community, occasionally criticized for limited contributions. However, projects like DeepSeek are now leading the way, demonstrating not only technical breakthroughs but also fostering extensive global participation and contributions. We hope to see more European and North American developers deeply engage in the development of top-tier Chinese projects.

In summary, DeepSeek's success is not just a technological triumph but also a social and industrial milestone. It has attracted global developers to contribute to the advancement of AI, setting the stage for future innovations in artificial intelligence. We look forward to DeepSeek continuing to lead the global AI revolution, opening up new possibilities for humanity.

December 2024 Open Source Ecosystem Data Insight Report

January 6, 2025 · 4 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

Will Wang

Prof. @ ECNU / Founder of X-lab

The OpenRank indicator is an open source implementation of the evaluation indicators in the "Information Technology Open Source Governance" series of standards of the Electronic Standards Institute of the Ministry of Industry and Information Technology. It can effectively reflect the collaborative influence of open source projects among developers, thereby helping us understand the open source world, discover open source trends, and gain insight into open source events.

Hot Event 1: Ghostty Is Released, and Its Author Is Still a Programmer at Heart

Data Facts: According to OpenDigger data, within 5 days of its release, the Ghostty project attracted over 530 developers, more than 1,000 discussions, and gained over 16,000 stars. Its OpenRank surged past 100, settling at 105.
Detailed Analysis: Ghostty is a terminal emulator that runs on macOS and Linux. By using the local GPU, it enhances terminal functionality and provides a smoother user experience. On December 26, 2024, after more than 2 years of development in a private repository, Ghostty was open-sourced and version 1.0 was officially released. The author, Mitchell Hashimoto, founded HashiCorp at the age of 23. He stepped down as CEO in 2016 to become CTO, resigned as CTO in late 2021 to return to hands-on programming, and left the company he founded at the end of 2023. Data shows that the Ghostty project was created in March 2022 and contains over a million lines of code. Mitchell developed the project alone for the first two years; other developers joined in mid-2024. He remains the primary developer, contributing over 90% of the project's code.
Author's Comments: As the founder of HashiCorp, Mitchell loves coding and is the founding engineer and core developer of well-known open-source projects like Vagrant, Consul, Terraform, and Vault. Despite being a multi-millionaire, his passion for coding remains unchanged, which is likely a significant factor in the project's popularity among developers.
Further Reading:
- Ghostty Project Repository: https://github.com/ghostty-org/ghostty
- Ghostty Project Release Blog: https://mitchellh.com/writing/ghostty-1-0-reflection

Hot Event 2: Generative AI Empowers Embodied Intelligence, Genesis Officially Released

Data Facts: According to OpenDigger data, since its release on December 19, 2024, the Genesis project attracted over 500 developers within 10 days, with 21 contributors and nearly 20,000 stars. Its OpenRank settled at 85.
Detailed Analysis: Genesis is a research platform for embodied intelligence that integrates generative model capabilities. It consists of a general-purpose physics engine, robot simulation platform, photorealistic rendering system, and data generation engine powered by generative AI technology. This engine converts natural language into training data for various modules. The project is developed by a team led by Dr. Chan, Chief Scientist at the MIT-IBM Watson AI Lab. In late 2023, the team published a paper introducing RoboGen, a framework that uses generative AI to provide unlimited learning data for robots and automate training. After over a year of development, RoboGen was open-sourced as the embodied intelligence research platform Genesis, gaining widespread attention.
Author's Comments: Embodied intelligence is a cutting-edge research area in artificial intelligence, with few open-source research platforms available. Facebook's Habitat platform, open-sourced in 2019, is a notable example. With the rise of generative AI, scientists are exploring its application in embodied intelligence to accelerate the development of intelligent robots. Dr. Chan's team, building on a solid theoretical foundation, has integrated generative AI technology into their research platform, which is expected to make significant contributions in this field.
Further Reading:
- Genesis Project Repository: https://github.com/Genesis-Embodied-AI/Genesis
- Habitat Project Repository: https://github.com/facebookresearch/habitat-sim
- Genesis Media Article: https://sawanrai777.medium.com/genesis-a-revolutionary-platform-for-physics-and-embodied-ai-fc85914e8249

Recommended Projects of the Month

eliza

eliza is a lightweight AI agent framework for individual developers, enabling quick creation of personal AI agents and workflows. Since its open-source release in July 2024, the project has focused on development and gained significant popularity in December 2024, with over 10,000 stars and 441 active developers in December. Its OpenRank has reached 149.
Repository: https://github.com/elizaOS/eliza

blink.cmp

blink.cmp is a code completion plugin for the Neovim editor. Unlike the popular Copilot, it is a traditional completion tool based on text indexing and fuzzy search, known for its efficiency. It can respond in milliseconds with an index size of 20,000, which has made it popular among Neovim users. The project was open-sourced in October 2024 and had 294 active developers in December, with an OpenRank of 108.
Repository: https://github.com/Saghen/blink.cmp

Thoughts and Plans for OpenDigger's Labeling Work

December 9, 2024 · 17 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

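I recently made a fairly large update to OpenDigger's labels. It mainly added a batch of project and enterprise labels and computed the share of developers by country and region, primarily for the leaderboards published with BenchCouncil (the global OpenRank leaderboards for developers by administrative region, for enterprises, and for projects). The work prompted some new thoughts, which I share here in the hope of starting a discussion about how to plan and improve OpenDigger's label system.

Overall, OpenDigger's labeling work has two parts: building the label system and building the labeling tools. The label system is about how to construct a label structure that is effective and easy to maintain; the labeling tools are about which technical approach to use to implement and maintain that system.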

标签体系建设

OpenDigger 的标签体系早期是源于 OpenDigger 本身的数据需求而逐渐建立起来的。主要是各类数据报告中需要有不同的指标聚合方式，尤其是如企业、国家维度的聚合，所以最初的设计中最主要标注的也就是企业、国家的数据，后续又陆续增加了基金会、技术领域和项目群的相关标签。然而随着标签数据越来越多，维护的难度也开始逐渐上升，缺乏顶层设计的缺陷也逐渐凸显。截止到 2024 年 12 月，已经有上千个标签，涵盖 200 多家企业、数十个基金会和 500 多个项目，这也需要 OpenDigger 建立一套标准化的标签体系，方便后续的持续维护和进一步拓展。

总体而言，早期标签的加入是需求驱动的，并没有进行顶层的统一设计，因此结构上也是趋向于扁平化的，即每类标签单独在一个文件夹下，通过标签之间的 ID 进行交叉引用。但在过去一段时间丰富标签的过程中，发现目前主要的标签需求之间其实存在某种关联性，这种关联性也进而导致了后续的一些设计上的变化，例如：

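Projects are generally initiated by some entity and may later be donated to a foundation, so a project does not need a directory of its own; it can be maintained under the directory of the entity that initiated it.
The initiating entity can be an individual, a company, a university, a government agency (such as the US Department of Veterans Affairs or the UK Ministry of Justice), a research institution (such as CERN), and so on. These entity types vary, but most are tied to the institutional structure of each country, so while the overall structure is similar, there are subtle differences between countries.
These entities need a standardized, workable classification. It matters not only for maintainability but also as the basis for later aggregate queries, because the metric query tools built on top of the label system will use it to run their queries.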

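Based on these reflections, the construction of the label system can be discussed from several angles:

Label Structure

Structurally, labels used to be laid out flat: countries, companies, foundations, universities, institutions, and projects all sat in sibling directories and cross-referenced each other; for example, a country would list universities, companies, and foundations as its child labels. However, all of these labels are really seen from the perspective of the project initiator, so they can be built under a single directory, forming a three-tier structure of "administrative division" - "initiating organization" - "open source project".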

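The administrative division tier mainly holds regional information such as the country, and it can be refined further to the province or city level.
An initiating organization is an entity legally registered within one of these administrative divisions. These organizations can themselves be classified further; the classification method is discussed later.
A project is an open source project made up of organizations or groups of repositories on GitHub or Gitee. A single open source project can include multiple organizations or repositories and can be hosted on multiple platforms.

The initiator perspective above should be the foundation of the whole label system. On top of it, other parallel labels such as project type and technology domain can be added. These labels are all built on project labels: they may only reference project-level labels as their children and may not use a platform's repositories or organizations directly as their label data. In other words, when a new project in some domain needs labeling, first identify its initiator and the initiator's administrative division, set up that data, and then reference the project label, rather than using repository or organization data directly.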
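
The referencing rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not OpenDigger's actual label format: the class names, the `division/...` ID layout, and the repository names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectLabel:
    """A project stored as division/<country>/<region>/<organization>/<project>."""
    label_id: str
    # Platform repositories are attached only to project labels (assumed layout).
    repos: list = field(default_factory=list)


@dataclass
class DomainLabel:
    """A parallel label, such as a technology domain, that references projects."""
    label_id: str
    children: list = field(default_factory=list)


def invalid_children(domain: DomainLabel, projects: dict) -> list:
    """Return the children of a parallel label that are not known project labels."""
    return [child for child in domain.children if child not in projects]


# Hypothetical data: one project label and one domain label.
projects = {
    "division/cn/zj/alipay/tugraph": ProjectLabel(
        "division/cn/zj/alipay/tugraph", repos=["github:example-org/example-repo"]
    ),
}
graph_db = DomainLabel(
    "technology/database/graph",
    children=["division/cn/zj/alipay/tugraph", "github:some-org/some-repo"],
)

# The raw repository reference is flagged; the project reference is accepted.
invalid = invalid_children(graph_db, projects)
print(invalid)
```

A check like this could run whenever label files change, so that a domain label never bypasses the initiator tiers by pointing straight at a repository.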

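Administrative Divisions

The administrative division records the country or region the initiator belongs to, and existing standards can be adopted directly here. OpenDigger currently labels countries using ISO 3166. Country and region codes follow ISO 3166-1 alpha-2, under which every country and region is identified by a two-letter code and also has an associated full name. GitHub's published global developer distribution happens to follow the same standard (the difference being that it treats the European Union as a top-level division), so the two are easy to link. The corresponding ISO 3166-2 standard further defines the first-level administrative subdivisions within each country and region, so countries and their first-level subdivisions can be defined entirely with the ISO 3166 family of standards.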
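
As a rough illustration of how such codes might be checked when labels are added, the sketch below validates only the shape of ISO 3166-1 alpha-2 and ISO 3166-2 codes (two letters, optionally followed by a hyphen and one to three alphanumeric characters). It does not check codes against the official code lists, and the function names are assumptions made for this example.

```python
import re

# Shape checks only; membership in the official ISO lists is not verified.
ALPHA2 = re.compile(r"[A-Z]{2}")
SUBDIVISION = re.compile(r"([A-Z]{2})-([A-Z0-9]{1,3})")


def is_alpha2_shape(code: str) -> bool:
    """True if the code looks like an ISO 3166-1 alpha-2 code, e.g. 'CN'."""
    return ALPHA2.fullmatch(code) is not None


def split_subdivision(code: str) -> tuple:
    """Split an ISO 3166-2 style code such as 'US-CA' into (country, subdivision)."""
    match = SUBDIVISION.fullmatch(code)
    if match is None:
        raise ValueError(f"not an ISO 3166-2 shaped code: {code!r}")
    return match.group(1), match.group(2)


country, region = split_subdivision("CN-GD")
print(country, region)
```

Directory names such as `cn/gd` in a label tree could then be derived by lowercasing the two parts.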

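Initiating Entities

This part requires fairly specialized knowledge and my understanding may be off; corrections are welcome.

As noted above, initiating entities depend on how each country defines legal entities, so this is the most complicated part. Universities, government agencies, and research institutions are relatively clear and simple, while companies and foundations are the most complex.

Take the differences between China and the US as an example. For most companies the structure is similar: private businesses mainly take the form of sole proprietorships, partnerships, limited liability companies, and joint-stock companies, and OpenDigger's label system does not need to distinguish among them; a single company label is enough. The main difficulty lies in classifying foundations: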

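In China's entity classification, ordinary businesses fall under the industry and commerce administration, while social organizations, private non-enterprise units, and foundations fall under the Ministry of Civil Affairs. This is why the only two open source foundations in China (the OpenAtom Foundation and the Chongqing-based Tiangong Kaiwu Open Source Foundation) are both registered under the Ministry of Civil Affairs, and their unified social credit codes begin with 53, which denotes a foundation under the civil affairs system. A foundation is therefore a distinct type of legal entity in China, and the only nonprofit organizations recognized by Chinese law are social organizations, private non-enterprise units, and foundations.

In the US legal system, by contrast, there is no legal entity type called a foundation. All nonprofit organizations in the US are corporations by nature, differing only in classification, mostly under section 501(c). Common nonprofit types include the charitable organization, 501(c)(3), such as the Apache Software Foundation, and the business league, 501(c)(6), such as the Linux Foundation. The two differ somewhat in financial rules and oversight, which is one of the underlying reasons the Linux Foundation has been able to expand rapidly through corporate donations in recent years while the Apache Software Foundation has been more laid-back.

Because of this, the word "foundation" means very different things in China and the US: in China it is a clearly defined type of legal entity, whereas in the US it is merely one of the names a nonprofit may choose to register under. The US-based Connectivity Standards Alliance, for instance, is a 501(c)(6) organization just like the Linux Foundation, yet it calls itself an "alliance". This arbitrariness in naming makes overseas foundations very hard to track; for some organizations that call themselves foundations, we cannot even verify online what type of organization they are or whether they are really nonprofit.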

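Another interesting difference is that in the US, between ordinary companies and nonprofits, there is a type of company called a PBC (Public Benefit Corporation). The company behind Bluesky, the social platform that has recently surged in popularity, is one. A PBC is a for-profit organization with a public-benefit purpose, roughly corresponding to the "social enterprise" of the Chinese context. In China, however, "social enterprise" is not yet a legally recognized entity type; the China Charity Fair periodically conducts public assessments and can grant informal social enterprise certification to companies or nonprofits. In OpenDigger's label system, this type is still classified as a company.

In summary, at the initiating entity level, apart from the clearly defined universities (University), government agencies (Agency), and research institutions (Institution), everything else is divided into companies (Company) and nonprofit organizations (NPO). Under every country's legal system, foundations fall within the nonprofit category, so when foundations are ranked they are ranked together with other nonprofits such as industry alliances.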
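
The five categories and the examples discussed above can be captured in a small lookup. The sketch below is only an illustration: the enum, the `legal form` keys, and the function name are hypothetical and do not come from OpenDigger.

```python
from enum import Enum


class EntityType(Enum):
    UNIVERSITY = "University"
    AGENCY = "Agency"
    INSTITUTION = "Institution"
    COMPANY = "Company"
    NPO = "NPO"


# Hypothetical keys of the form "<country>:<legal form>", following the text:
# Chinese foundations and US 501(c) organizations are nonprofits, while a US
# Public Benefit Corporation is still treated as a company.
LEGAL_FORM_TO_TYPE = {
    "cn:foundation": EntityType.NPO,
    "cn:social_organization": EntityType.NPO,
    "cn:private_non_enterprise_unit": EntityType.NPO,
    "us:501(c)(3)": EntityType.NPO,
    "us:501(c)(6)": EntityType.NPO,
    "us:pbc": EntityType.COMPANY,
    "cn:limited_liability_company": EntityType.COMPANY,
}


def classify(legal_form: str) -> EntityType:
    """Map a legal form to a label category, failing loudly on unknown forms."""
    try:
        return LEGAL_FORM_TO_TYPE[legal_form]
    except KeyError:
        raise ValueError(f"unknown legal form: {legal_form!r}") from None


print(classify("us:501(c)(6)").value)
print(classify("us:pbc").value)
```

Failing on unknown forms, rather than defaulting to a category, keeps organizations of unverifiable type from being ranked silently.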

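Community Projects

Although the new design aims to find a legal entity initiator for every project, in reality there will still be projects with no clear initiator, projects whose initiator wants them to be fully community-driven, and projects initiated by individuals. Such projects are hard to map to a specific legal entity, so a community project type is needed to cover them.

Note that "community" here is only a way of classifying projects with no clear initiator; Community itself is not part of OpenDigger's label system. I found that the definition of a community is very broad and vague: a corporate project can call itself a community, and so can a hobby group, so the label could easily be abused and any ranking based on it would mean little. There may well be groups that do need an independent identity, though, and this part of the design may be refined later as needs change.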

Summary

Under the latest design, the overall label structure should therefore look like this example:

label_data
├── division # Administrative divisions
│   ├── cn # China
│   │   ├── gd # Guangdong
│   │   │   └── huawei # Huawei
│   │   │       └── openharmony
│   │   └── zj # Zhejiang
│   │       └── alipay # Ant Group
│   │           └── tugraph
│   └── us # United States
│       ├── ca # California
│       │   └── linux_foundation # Linux Foundation
│       │       └── valkey
│       └── md # Maryland
│           └── apache_software_foundation # Apache Software Foundation
└── technology # Technology domains
    ├── cloud_native # Cloud native
    │   ├── platform # Platform
    │   └── runtime # Runtime
    └── database # Databases
        ├── graph # Graph databases -> references :division/cn/zj/alipay/tugraph
        └── kv # Key-value databases -> references :division/us/ca/linux_foundation/valkey
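
To show how a reference such as `:division/us/ca/linux_foundation/valkey` might be resolved against this tree, here is a minimal Python sketch. It assumes the tree is held as nested dictionaries and that the leading colon simply marks a label reference; both are assumptions for illustration, and the function names are hypothetical.

```python
# The example tree above, held as nested dictionaries (assumed representation).
LABEL_TREE = {
    "division": {
        "cn": {
            "gd": {"huawei": {"openharmony": {}}},
            "zj": {"alipay": {"tugraph": {}}},
        },
        "us": {
            "ca": {"linux_foundation": {"valkey": {}}},
            "md": {"apache_software_foundation": {}},
        },
    },
}


def resolve(ref: str, tree: dict = LABEL_TREE) -> dict:
    """Walk the tree along a reference path and return the node it points to."""
    node = tree
    for part in ref.lstrip(":").split("/"):
        if part not in node:
            raise KeyError(f"unknown label segment {part!r} in {ref!r}")
        node = node[part]
    return node


def initiator_of(ref: str) -> dict:
    """Read country, region, and organization off a project reference."""
    parts = ref.lstrip(":").split("/")
    if len(parts) != 5 or parts[0] != "division":
        raise ValueError(f"not a project reference: {ref!r}")
    resolve(ref)  # raises KeyError if the project is not in the tree
    return {"country": parts[1], "region": parts[2], "organization": parts[3]}


info = initiator_of(":division/us/ca/linux_foundation/valkey")
print(info)
```

Because the initiator tiers are encoded in the path itself, a technology label that references a project gets the project's country and organization without storing them again.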

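Building the Label Tooling

Label tooling is the more technical part: the concrete implementation of the label system above. The implementation has to cover all the capabilities and business needs of the label system while also fitting the structures used to interact with the database and supporting common operations on label data, such as set intersection, union, and difference.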
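
The set operations mentioned here can be pictured with Python's built-in sets. The sketch assumes each label expands to a set of project IDs; the label contents are hypothetical examples based on the tree above and say nothing about how the real tool stores its data.

```python
# Hypothetical label contents: each label expands to a set of project IDs.
china_projects = frozenset({"openharmony", "tugraph"})
database_projects = frozenset({"tugraph", "valkey"})

# Intersection: database projects initiated in China.
chinese_databases = china_projects & database_projects

# Union: projects that are Chinese or database projects (or both).
china_or_database = china_projects | database_projects

# Difference: database projects not initiated in China.
other_databases = database_projects - china_projects

print(sorted(chinese_databases))
print(sorted(china_or_database))
print(sorted(other_databases))
```

Combining a division label with a domain label in this way is what makes queries such as "graph databases initiated in a given country" possible.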

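The current label tool is written in TypeScript. At runtime it builds the full set of label data in memory directly from the label data files and supports basic operations and label relationship queries. In the long run, for scalability and query efficiency, I would still like the label data to be stored in the database, so that a metric query only needs a join.

However, tracing parent-child relationships across multiple label levels (for example, which country a project was initiated in) requires recursive CTE support in the database, and the ClickHouse version currently underlying OpenDigger does not support that feature, so the rework has to wait until ClickHouse is upgraded.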
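
To make the ancestry-tracing problem concrete, here is a small sketch of a recursive CTE run in SQLite through Python's standard `sqlite3` module. SQLite stands in for the database only because it is easy to run locally; this is not ClickHouse syntax or OpenDigger's schema, and the `labels(id, parent_id, level)` table is a hypothetical layout.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labels (id TEXT PRIMARY KEY, parent_id TEXT, level TEXT)")
conn.executemany(
    "INSERT INTO labels VALUES (?, ?, ?)",
    [
        ("us", None, "country"),
        ("us/ca", "us", "region"),
        ("us/ca/linux_foundation", "us/ca", "organization"),
        ("us/ca/linux_foundation/valkey", "us/ca/linux_foundation", "project"),
    ],
)

# Start from one label and repeatedly join to its parent until the root.
ANCESTORS_SQL = """
WITH RECURSIVE ancestors(id, parent_id, level) AS (
    SELECT id, parent_id, level FROM labels WHERE id = ?
    UNION ALL
    SELECT l.id, l.parent_id, l.level
    FROM labels AS l
    JOIN ancestors AS a ON l.id = a.parent_id
)
SELECT id FROM ancestors WHERE level = 'country'
"""


def country_of(label_id: str) -> Optional[str]:
    """Return the country-level ancestor of a label, or None if there is none."""
    row = conn.execute(ANCESTORS_SQL, (label_id,)).fetchone()
    return row[0] if row else None


print(country_of("us/ca/linux_foundation/valkey"))
```

Without recursion in the database, the same walk has to be done in application code or by flattening every label's ancestors into extra columns ahead of time.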

November 2024 Open Source Ecosystem Data Insight Report

December 6, 2024 · 5 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

Will Wang

Prof. @ ECNU / Founder of X-lab

The OpenRank indicator is an open source implementation of the evaluation indicators in the "Information Technology Open Source Governance" series of standards of the Electronic Standards Institute of the Ministry of Industry and Information Technology. It can effectively reflect the collaborative influence of open source projects among developers, thereby helping us understand the open source world, discover open source trends, and gain insight into open source events.

Hot Event 1: BlueSky's Surge, Driven by US Elections and AI Wave

Data Facts: According to OpenDigger data, multiple BlueSky repositories on GitHub experienced a surge in activity. This includes their decentralized social media protocol repository, atproto, and the client repository, social-app. The total number of active developers across all repositories in November increased by 173% year-over-year to 1,082, with a total star increase of 5,800. The total OpenRank value increased by 67%, reaching 340 points.
Detailed Analysis: BlueSky is an independent project created by former Twitter CEO Jack Dorsey, using a newly developed AT social network protocol to achieve a decentralized social media platform. Following the US elections on November 5, some users dissatisfied with the election results chose to leave Twitter in search of new social platforms, with BlueSky becoming a significant option. A week after the election, its client app topped the free app charts in the US Apple App Store. Additionally, on November 16, Twitter updated its Privacy Policy to allow third-party platforms to use user data for generative AI training. In response, BlueSky officially stated that it would not use user data for generative AI training, leading many high-quality content creators to migrate to BlueSky to protect their digital content. The platform had approximately 10 million registered users as of September 2024, and various events in November led to a surge in users, with registered users exceeding 20 million by November 20.
Author's Comments: The tech world is not isolated from real-world events, which can significantly impact the open-source community. The rise of generative AI has also highlighted underlying issues, with developers and users taking action to protect their interests.
Further Reading:
- Twitter Privacy Policy Update: https://privacy.x.com/en/blog/2024/updates-tos-privacy-policy
- BlueSky's Statement on Generative AI: https://bsky.app/profile/bsky.app/post/3layuzbto2c2x
- BlueSky GitHub Page: https://github.com/bluesky-social

Hot Event 2: Redis Attempts to Control Peripheral Projects, Valkey Community Continues to Grow

Data Facts: According to OpenDigger data, the number of active developers in Redis's Rust client repository, redis-rs, increased by 54% in November 2024 to 40, with many participating in discussions about Redis's request for the project's author to transfer control. Meanwhile, the Valkey community, which forked from Redis in March 2024, continues to grow, surpassing the main Redis repository in various metrics.
Detailed Analysis: On November 25, 2024, Armin Ronacher, the author of Redis's Rust client project redis-rs, opened an issue discussing Redis's request for control over the project. The maintainer of Redis's PHP client, Predis, reported receiving a similar request. This is not Redis's first attempt to control community projects; between 2020 and 2024, Redis transferred several community clients, including Jedis, redis-py, and Lettuce, to its GitHub organization. Meanwhile, there are concerns that new versions of community clients controlled by Redis may not be compatible with Valkey. Valkey is a community fork of Redis created in March 2024 after Redis announced changes to its project license. It is led by core developers from AWS, Alibaba Cloud, Google, and Tencent Cloud and is now hosted by the Linux Foundation. Since the Redis community split, Valkey has developed steadily, while Redis has become less active. According to OpenDigger data, the OpenRank of Valkey's main repository reached 71 points in November, while that of Redis's main repository dropped from 62 points in March to 27 points.
Author's Comments: Software ownership involves more than just code; it affects a project's sustainability and community trust. When an open-source project's ownership is transferred to a commercial company, community members often worry about the project's neutrality and openness. The future development of Redis and Valkey remains to be seen.
Further Reading:
- redis-rs Repository Discussion Issue: https://github.com/redis-rs/redis-rs/issues/1419
- Valkey Repository: https://github.com/valkey-io/valkey
- OpenDigger Analysis Report on Redis and Valkey: https://open-digger.cn/blog/2024-04-04-redis-analysis

Recommended Projects of the Month

Julia

Julia is a high-performance dynamic programming language for numerical analysis and computational science. Development began in 2009 and version 1.0 was released in 2018. With continuous improvements to its language core, the development focus has shifted toward the standard libraries. In November 2024, the community moved the linear algebra standard libraries to a separate repository and transferred thousands of related issues along with them. The transferred issues were recorded in the event logs as new issues, and the resulting spike drew the attention of this data insight report. Julia's development remains stable, with an OpenRank value of 242 across all repositories as of November 2024.
Repository: https://github.com/JuliaLang/julia

Zen Browser

Zen Browser is an open-source browser based on the Firefox engine, open-sourced in April 2024 and gaining popularity in August. In November, the repository had 882 participating developers. Known for its excellent user experience, the browser features a split-screen display, a popular feature not natively supported in Chrome. According to OpenDigger data, the repository's OpenRank value reached 262 in November, ranking 63rd globally.
Repository: https://github.com/zen-browser/desktop

October 2024 Open Source Ecosystem Data Insight Report

November 6, 2024 · 5 min read

Frank Zhao

Ph.D candidate at X-lab, author of OpenDigger

Will Wang

Prof. @ ECNU / Founder of X-lab

The OpenRank indicator is an open source implementation of the evaluation indicators in the "Information Technology Open Source Governance" series of standards of the Electronic Standards Institute of the Ministry of Industry and Information Technology. It can effectively reflect the collaborative influence of open source projects among developers, thereby helping us understand the open source world, discover open source trends, and gain insight into open source events.

Hot Event 1: Linux Removes Russian Maintainers, Huawei Releases Native HarmonyOS

Data Facts: According to OpenDigger data, OpenHarmony has rapidly grown since its open-source release in August 2019, becoming the top-ranked open-source community in China. Currently, OpenHarmony projects are primarily hosted on the Gitee platform, with over 2,000 repositories, more than 8,000 contributors, and over 15,000 active developers. More than 70 tech companies, including Ruyi Software, Softpower Technology, Shencanhong, and Jolian Technology, are involved in its development.
Detailed Analysis: In late October 2024, Linux removed over a dozen Russian developers from its kernel maintainer list due to "compliance requirements." Linus Torvalds subsequently responded firmly to other developers' questions on the mailing list. This event drew significant attention in the open-source community, highlighting the increasing impact of geopolitics on open-source technology. In May 2019, Huawei was added to the US Entity List, preventing it from using Google's Android OS. In response, Huawei released HarmonyOS in August 2019 and open-sourced its core code as the OpenHarmony project, donating it to the OpenAtom Open Source Foundation in May 2020. After over five years of development, OpenHarmony has become the highest-ranked open-source project group in China. In late October 2024, Huawei released a fully independent, natively developed HarmonyOS built on OpenHarmony, marking the project's maturity.
Author's Comments: While technology itself is borderless, technologists have nationalities. In the face of significant geopolitical changes, we must maintain an open and cooperative stance while being prepared to lead and develop our core technologies. Only then can we leverage technology to drive national development and ensure strong global competitiveness.
Further Reading:
- OpenHarmony Repository: https://gitee.com/openharmony
- News on Linux Removing Russian Maintainers: https://www.infoq.cn/article/PGmRDMhjjINXafmHM0bI
- Native HarmonyOS Release News: http://cq.people.com.cn/n2/2024/1023/c365412-41017916.html

Hot Event 2: Open Source Summer Programs Conclude, Global Summer Activities Thrive

Data Facts: According to OpenDigger data, due to the impact of the National Day holiday in China, most projects experienced a decline in OpenRank during October. However, due to the popularity of OSPP and GSoC, related projects saw an overall increase of 3.5%, with thousands of participants involved in summer activities.
Detailed Analysis: Both OSPP (Open Source Promotion Plan) and GSoC (Google Summer of Code) concluded in October. According to official data, both programs set new records in 2024, with 561 and 1,133 projects, respectively. OpenDigger data shows that similar summer programs targeting college students are emerging globally. For instance, the GSSoC24 (GirlScript Summer of Code) program launched in India in October, with over 2,000 students registering for certificates. Additionally, Woowa Brothers in South Korea initiated programming training courses targeting students, with over 4,500 learning PRs and 28,000 PR review comments across 10 learning repositories in October, placing multiple repositories on the global OpenLeaderboard.
Author's Comments: In recent years, open-source summer programs for college students have become more numerous and diverse. These programs not only produce excellent software but also provide students with valuable coding and practical experience, becoming important platforms for their technical growth and innovation.
Further Reading:
- OSPP Official Website: https://summer-ospp.ac.cn/
- GSoC Official Website: https://summerofcode.withgoogle.com/

Recommended Projects of the Month

freeCodeCamp

freeCodeCamp is a popular online learning platform that teaches programming and web development skills through interactive methods. It offers free resources, including thousands of coding challenges, projects, algorithms, and front-end development practices. Its main repository has over 400,000 stars, consistently ranking first on GitHub's star chart. In October 2024, freeCodeCamp participated in Hacktoberfest, attracting more developers. During the month, 380 developers contributed, resulting in 435 PRs and over 2,200 discussions, boosting the project's OpenRank by 50% to 151.
Repository: https://github.com/freeCodeCamp/freeCodeCamp
Comment: Both freeCodeCamp and Hacktoberfest started in 2014, and their combination continues to inspire creativity after a decade of development.

Bolt.new

In early October 2024, StackBlitz, the company behind the WebContainer project, launched Bolt.new. This new product integrates AI assistants based on large language models with WebContainer technology, enabling local code generation and Node.js execution in the browser. This allows Node.js based software projects to be developed, debugged, and deployed entirely within the browser. The launch was well-received, with over 600,000 views on Twitter. Within a month, the repository received over 6,600 stars, and more than 1,100 developers participated in discussions and collaborations, resulting in an OpenRank of 163.
Repository: https://github.com/stackblitz/bolt.new
Comment: The emergence of large language models has significantly enhanced programming productivity, while WebContainer technology has revolutionized application deployment. Their combination provides unprecedented convenience and experience for developers, greatly inspiring their enthusiasm and creativity.

Hackpad

Hackpad is an interesting hackathon project initiated by Hack Club, a global community of high school hackers. The project invites developers to submit mini keyboard designs, including PCB designs, hardware models, and software programs, during the event. The organizers will produce physical keyboards based on the accepted designs and distribute them to participants. In October 2024, 178 participants submitted 287 PRs, contributing to the repository's OpenRank of 100.
Repository: https://github.com/hackclub/hackpad
Comment: Open-source collaboration platforms provide fertile ground for global community development. Hack Club, a global tech community of young students, stands out with creative ideas and activities, reminding us of the original hacker spirit: just for fun!
