Skip to main content

GitHub

Data Source

OpenDigger uses GHArchive as its data source, collecting global event log data from GitHub and using ClickHouse cloud service as the underlying data infrastructure.

The code for data collection, cleaning, and database entry is not currently open sourced in the OpenDigger project and runs as a scheduled task every hour to import data into the database.

Missing Data

Due to potential service outages from GHArchive, there is data loss in OpenDigger's GitHub data source. The following time periods currently show missing data:

  • 2016-10-21-18
  • 2018-10-21-23
  • 2018-10-22-0 ~ 2018-10-22-1
  • 2019-05-08-12 ~ 2019-05-08-13
  • 2019-09-12-8 ~ 2019-09-13-5
  • 2020-03-05-22
  • 2020-06-10-12 ~ 2020-06-10-21
  • 2020-08-21-9 ~ 2020-08-23-15
  • 2020-10-30-17
  • 2021-08-25-17 ~ 2021-08-27-22
  • 2021-09-11-9
  • 2021-10-22-5 ~ 2021-10-22-22
  • 2021-10-23-2 ~ 2021-10-23-22
  • 2021-10-24-3 ~ 2021-10-24-22
  • 2021-10-25-1 ~ 2021-10-25-22
  • 2021-10-26-0 ~ 2021-10-29-17
  • 2023-05-14-19