This project investigates different approaches for programmatically updating GitHub files without checking out the entire repository.
Three approaches are evaluated:
- Kohsuke GitHub API (org.kohsuke:github-api) - High-level Java client
- Direct REST API (with OkHttp) - Maximum control and flexibility
- GraphQL API - Optimal for atomic multi-file operations
- Java 21
- Gradle 8.x
- GitHub Personal Access Token with repo scope
- Clone this project (or create the structure below)
- Set your GitHub token:
    export GITHUB_TOKEN=your_github_personal_access_token
- Create the project structure:
    .
    ├── build.gradle
    └── src
        └── main
            └── java
                └── com
                    └── examples
                        └── github
                            ├── GitHubApiComparison.java
                            └── apis
                                ├── KohsukeGitHubExample.java
                                ├── RestApiExample.java
                                └── GraphQLApiExample.java
Build:
./gradlew build
Run:
./gradlew run
This will prompt you for:
- Repository owner
- Repository name
- File path to update
- Branch name
Kohsuke API Example:
./gradlew runKohsukeExample
Direct REST API Example:
./gradlew runRestApiExample
GraphQL API Example:
./gradlew runGraphQLExample

None of these approaches requires cloning or checking out the repository. They all work directly with GitHub's APIs.

Kohsuke GitHub API
Pros:
- Most mature and well-documented Java library
- High-level abstraction makes it easy to use
- Active maintenance and community support
- Excellent for single file operations
Cons:
- Multiple API calls required for multi-file updates (one commit per file)
- Less control over HTTP layer details
Best for: General-purpose GitHub file operations in Java
Example:
GitHub github = new GitHubBuilder().withOAuthToken(token).build();
GHRepository repo = github.getRepository("owner/repo");
GHContent file = repo.getFileContent("path/to/file.txt", "main");
repo.createContent()
.path("path/to/file.txt")
.content("new content")
.sha(file.getSha())
.branch("main")
.message("Update file")
.commit();

Direct REST API (with OkHttp)
Pros:
- Maximum control and flexibility
- No dependency on third-party library bugs or updates
- Can implement custom retry logic, rate limiting, etc.
- Direct access to latest GitHub API features
Cons:
- More boilerplate code
- Need to manually handle API changes
- More error-prone (manual JSON construction)
Best for: Edge cases, custom requirements, or when you need fine-grained control
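The update step of this approach can be sketched in Java as follows. This is a minimal sketch, not the project's RestApiExample: it uses the JDK's built-in java.net.http types instead of OkHttp to stay dependency-free, builds the JSON by hand (a real client would more likely use a JSON library), and the class and method names are hypothetical. It only constructs the PUT request; sending it, and the preceding GET that fetches the current SHA, are left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class RestUpdateSketch {

    // Minimal JSON string escaping for the few fields used here (assumption:
    // production code would delegate this to a JSON library).
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Builds the JSON body for PUT /repos/{owner}/{repo}/contents/{path}.
    // The file content is Base64-encoded, as the contents endpoint expects.
    static String buildUpdateBody(String message, String newContent, String currentSha, String branch) {
        String encoded = Base64.getEncoder()
                .encodeToString(newContent.getBytes(StandardCharsets.UTF_8));
        return "{"
                + "\"message\":\"" + jsonEscape(message) + "\","
                + "\"content\":\"" + encoded + "\","
                + "\"sha\":\"" + jsonEscape(currentSha) + "\","
                + "\"branch\":\"" + jsonEscape(branch) + "\""
                + "}";
    }

    // Builds (but does not send) the PUT request.
    static HttpRequest buildUpdateRequest(String owner, String repo, String path,
                                          String token, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.github.com/repos/" + owner + "/" + repo
                        + "/contents/" + path))
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/vnd.github+json")
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        String body = buildUpdateBody("Update file", "new content", "current_file_sha", "main");
        HttpRequest request =
                buildUpdateRequest("owner", "repo", "path/to/file.txt", "dummy-token", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

Keeping request construction separate from sending makes the payload easy to unit-test without touching the network.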
Example:
// GET current file SHA
GET /repos/{owner}/{repo}/contents/{path}
// PUT update
PUT /repos/{owner}/{repo}/contents/{path}
{
"message": "Update file",
"content": "base64_encoded_content",
"sha": "current_file_sha",
"branch": "main"
}

GraphQL API
Pros:
- Atomic multi-file commits - update many files in ONE commit
- More efficient for batch operations
- Better rate limit utilization
- Single source of truth for complex queries
Cons:
- More complex query/mutation construction
- Less mature Java tooling
- Overkill for simple single-file updates
- Steeper learning curve
Best for: Batch file operations, complex workflows requiring atomicity
Example:
mutation {
createCommitOnBranch(input: {
branch: {
repositoryNameWithOwner: "owner/repo"
branchName: "main"
}
message: { headline: "Update multiple files" }
fileChanges: {
additions: [
{ path: "file1.txt", contents: "base64_content" }
{ path: "file2.txt", contents: "base64_content" }
]
}
expectedHeadOid: "current_commit_sha"
}) {
commit { oid }
}
}

| Approach | Single File | 10 Files | Checkout Required | Complexity |
|---|---|---|---|---|
| Kohsuke API | ~500ms | ~5s (10 commits) | ❌ No | Low |
| REST API | ~400ms | ~4s (10 commits) | ❌ No | Medium |
| GraphQL | ~600ms | ~800ms (1 commit) | ❌ No | High |
While REST and Kohsuke APIs work great for simple operations, GraphQL excels in specific scenarios:
Problem with REST:
# Deploying a website with 10 files
PUT /repos/owner/repo/contents/index.html # Commit 1
PUT /repos/owner/repo/contents/styles.css # Commit 2
PUT /repos/owner/repo/contents/script.js # Commit 3
# ... 7 more commits
# Result: 10 separate commits, messy git history, 20+ API calls
GraphQL Solution:
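mutation {
  createCommitOnBranch(input: {
    branch: {repositoryNameWithOwner: "owner/repo", branchName: "main"}
    expectedHeadOid: "current_commit_sha" # required; read the branch head first
    message: {headline: "Deploy website v2.0"}
    fileChanges: {
      additions: [
        {path: "index.html", contents: "..."}
        {path: "styles.css", contents: "..."}
        {path: "script.js", contents: "..."}
        # ... all 10 files
      ]
    }
  }) {
    commit { oid }
  }
}
# Result: 1 atomic commit, clean history, 1 mutation (after one lookup of the branch head)
Real-world scenarios: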
- Static site deployments (HTML + CSS + JS + images)
- Multi-file refactoring that should be one logical change
- Documentation updates across multiple markdown files
- Configuration changes affecting multiple config files
- Database migrations with multiple SQL files
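For scenarios like these, the request body for a multi-file commit can be assembled as sketched below. This is an illustrative sketch under stated assumptions, not the project's GraphQLApiExample: the class and method names are hypothetical, the JSON is built by hand to avoid dependencies, and the mutation is passed with a variables object rather than inlined. The resulting string would be POSTed to GitHub's GraphQL endpoint; sending it is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class MultiFileCommitSketch {

    // Mutation text; the actual values travel in the "variables" object.
    static final String MUTATION =
            "mutation($input: CreateCommitOnBranchInput!) { "
            + "createCommitOnBranch(input: $input) { commit { oid } } }";

    // Wraps a string in quotes with minimal JSON escaping (assumption:
    // production code would use a JSON library instead).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Builds the JSON body for one createCommitOnBranch call covering all files.
    // Keys of `files` are repository paths, values are the new file contents.
    static String buildRequestBody(String repoWithOwner, String branch, String headOid,
                                   String headline, Map<String, String> files) {
        String additions = files.entrySet().stream()
                .map(e -> "{\"path\":" + quote(e.getKey()) + ",\"contents\":"
                        + quote(Base64.getEncoder().encodeToString(
                                e.getValue().getBytes(StandardCharsets.UTF_8))) + "}")
                .collect(Collectors.joining(","));
        String input = "{"
                + "\"branch\":{\"repositoryNameWithOwner\":" + quote(repoWithOwner)
                + ",\"branchName\":" + quote(branch) + "},"
                + "\"message\":{\"headline\":" + quote(headline) + "},"
                + "\"fileChanges\":{\"additions\":[" + additions + "]},"
                + "\"expectedHeadOid\":" + quote(headOid)
                + "}";
        return "{\"query\":" + quote(MUTATION) + ",\"variables\":{\"input\":" + input + "}}";
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("index.html", "<h1>Hello</h1>");
        files.put("styles.css", "body { margin: 0; }");
        System.out.println(buildRequestBody(
                "owner/repo", "main", "current_commit_sha", "Deploy website v2.0", files));
    }
}
```

Adding another file only grows the additions array; the number of commits and mutations stays at one.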
Updating 50 files:
| Approach | API Calls | Rate Limit Impact |
|---|---|---|
| REST API | 100 (50 GETs + 50 PUTs) | 2% of 5,000/hour quota |
| GraphQL | 1 mutation | ~0.1% of quota |
For high-frequency automation or CI/CD pipelines, this difference is significant.
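The REST figures in the table follow from simple arithmetic, sketched here. The class and method names are hypothetical, and the model assumes exactly one GET (to read the SHA) and one PUT per file, with the 5,000 requests/hour quota stated above.

```java
final class ApiCallEstimate {

    // Authenticated REST quota, as stated in the table above.
    static final int REST_HOURLY_QUOTA = 5_000;

    // One GET to fetch the current SHA plus one PUT to write, per file.
    static int restCalls(int files) {
        return files * 2;
    }

    // Share of the hourly REST quota consumed by the given number of calls.
    static double quotaPercent(int calls) {
        return 100.0 * calls / REST_HOURLY_QUOTA;
    }

    public static void main(String[] args) {
        for (int files : new int[] {10, 19, 50}) {
            int calls = restCalls(files);
            System.out.println(files + " files: " + calls + " REST calls, "
                    + quotaPercent(calls) + "% of the hourly quota");
        }
    }
}
```

The REST cost grows linearly with the number of files, whereas the GraphQL mutation count stays constant regardless of how many files the commit touches.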
REST API Race Condition:
# Service A: GET file.txt (SHA: abc123) at T0
# Service B: GET file.txt (SHA: abc123) at T1
# Service A: PUT file.txt (SHA: abc123) → Success (new SHA: def456) at T2
# Service B: PUT file.txt (SHA: abc123) → CONFLICT! at T3
GraphQL with Branch-Level Protection:
mutation {
createCommitOnBranch(input: {
expectedHeadOid: "current-branch-head-sha" # Checks entire branch state
fileChanges: { ... }
})
}
# Fails cleanly if ANY commit was made to the branch
# More robust than per-file SHA checking
Scenario: Get repo info + branch details + file contents + commit history
REST API:
GET /repos/owner/repo # Request 1
GET /repos/owner/repo/branches # Request 2
GET /repos/owner/repo/contents/... # Request 3
GET /repos/owner/repo/commits # Request 4
# = 4 requests, over-fetching data
GraphQL:
query {
repository(owner: "owner", name: "repo") {
id
defaultBranchRef {
name
target {
... on Commit {
oid
message
history(first: 5) {
nodes { message, author { name } }
}
}
}
}
object(expression: "HEAD:README.md") {
... on Blob { text }
}
}
}
# = 1 request, get exactly what you need
Scenario: In one commit, you need to:
- Update 3 files
- Create 2 new files
- Delete 1 file
REST API:
# 6 requests (3 GET + 3 PUT for updates)
# 2 requests (2 PUT for creates)
# 2 requests (1 GET + 1 DELETE for deletion)
# = 10 API calls, 6 separate commits
GraphQL:
mutation {
createCommitOnBranch(input: {
fileChanges: {
additions: [
{path: "updated1.txt", contents: "..."}
{path: "updated2.txt", contents: "..."}
{path: "updated3.txt", contents: "..."}
{path: "new1.txt", contents: "..."}
{path: "new2.txt", contents: "..."}
]
deletions: [
{path: "old-file.txt"}
]
}
})
}
# = 1 API call, 1 atomic commit

| Scenario | Files | Frequency | Recommendation | Reason |
|---|---|---|---|---|
| Update README | 1 | Occasional | Kohsuke/REST | Simplest approach |
| Update config | 1-2 | Daily | Kohsuke/REST | Easy, well-documented |
| CI/CD deployment | 5-20 | Per commit | GraphQL | Atomic commits, clean history |
| Batch migration | 50+ | One-time | GraphQL | Rate limit efficiency |
| Content sync | 10-100 | Hourly | GraphQL | Performance critical |
| Quick automation | Any | Ad-hoc | REST | Fast to prototype |
| Custom workflow | Any | Any | REST | Maximum flexibility |
Deploying a static blog (19 files: 1 HTML + 3 CSS + 5 JS + 10 images)
| Metric | REST API | GraphQL API | Winner |
|---|---|---|---|
| API Calls | 38 (19 GET + 19 PUT) | 1 mutation | ✅ GraphQL |
| Git Commits | 19 separate commits | 1 atomic commit | ✅ GraphQL |
| Rate Limit Usage | 38 requests | ~1 request | ✅ GraphQL |
| Git History | Messy, hard to revert | Clean, easy to revert | ✅ GraphQL |
| Time to Deploy | ~15-20 seconds | ~2-3 seconds | ✅ GraphQL |
| Code Complexity | Medium (loops, error handling) | Low (single mutation) | ✅ GraphQL |
| Learning Curve | Low | Medium | ✅ REST |
| Debugging | Easy | Medium | ✅ REST |
Kohsuke GitHub API:
- Easy to use, well-documented
- Handles authentication and rate limiting automatically
- Perfect for typical file operations
- Best for teams new to GitHub API integration

GraphQL API:
- Atomic multi-file commits (THE killer feature)
- Significant performance advantage for bulk updates
- Essential for CI/CD pipelines with multi-file deployments
- Worth the complexity when updating 5+ files regularly
- Critical when clean git history matters

Direct REST API:
- Maximum flexibility
- Good for edge cases or specific integration needs
- Requires more maintenance
- Best when you need fine-grained control over the HTTP layer

- REST API: 5,000 requests/hour (authenticated)
- GraphQL API: Calculated by query complexity (generally more efficient)
Both approaches count against your rate limit, so plan accordingly.
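One way to plan for the limit is to read the rate-limit headers GitHub returns with REST responses (x-ratelimit-remaining and x-ratelimit-reset, the latter in epoch seconds) and pause when the budget is exhausted. The sketch below is a hedged illustration: the class and method names are hypothetical, and it assumes the headers have already been copied into a map with lowercase keys.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

final class RateLimitSketch {

    // Returns how long to wait before the next request.
    // Duration.ZERO means requests remain or the reset time has already passed.
    // Assumption: `headers` holds response headers with lowercase names.
    static Duration waitTime(Map<String, String> headers, Instant now) {
        long remaining = Long.parseLong(headers.getOrDefault("x-ratelimit-remaining", "1"));
        if (remaining > 0) {
            return Duration.ZERO;
        }
        long resetEpochSeconds = Long.parseLong(headers.getOrDefault("x-ratelimit-reset", "0"));
        Duration untilReset = Duration.between(now, Instant.ofEpochSecond(resetEpochSeconds));
        return untilReset.isNegative() ? Duration.ZERO : untilReset;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Map<String, String> exhausted = Map.of(
                "x-ratelimit-remaining", "0",
                "x-ratelimit-reset", String.valueOf(now.getEpochSecond() + 90));
        System.out.println("Wait before retrying: " + waitTime(exhausted, now));
    }
}
```

Passing the current time in as a parameter keeps the calculation deterministic and easy to test.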
- Never commit tokens to version control
- Use environment variables or secure secret management
- Tokens should have minimum required scopes (repo for private repos, public_repo for public only)
- Consider using GitHub Apps for production systems
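The environment-variable advice can be sketched as below. The class and helper names are hypothetical; the sketch assumes the token lives in GITHUB_TOKEN, as in the setup step, fails fast when it is missing, and masks the value before it reaches any log output.

```java
import java.util.Map;

final class TokenSketch {

    // Reads the token from the given environment map and fails fast if absent.
    // Taking the map as a parameter (rather than calling System.getenv()
    // directly) keeps the method testable.
    static String requireToken(Map<String, String> env) {
        String token = env.get("GITHUB_TOKEN");
        if (token == null || token.isBlank()) {
            throw new IllegalStateException("GITHUB_TOKEN is not set");
        }
        return token.trim();
    }

    // Masks a token for logging, keeping only the last four characters.
    static String mask(String token) {
        return token.length() <= 4
                ? "****"
                : "****" + token.substring(token.length() - 4);
    }

    public static void main(String[] args) {
        try {
            String token = requireToken(System.getenv());
            System.out.println("Using token " + mask(token));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```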
This is a spike/proof-of-concept project for evaluation purposes.