
feat: add image compression and upload pipeline for org logos#102

Closed
aayank13 wants to merge 4 commits into ketankauntia:master from aayank13:feat/image-compression-upload-pipeline

Conversation


@aayank13 aayank13 commented Feb 20, 2026

Summary

Adds an image processing pipeline for GSoC organization logos that:

  • Downloads org logos from the GSoC API
  • Compresses them to optimized WebP format via sharp
  • Renames them to {slug}.webp
  • Uploads them to Cloudflare R2

Also creates the images/ folder structure with tech-stack/ and 2026/ subfolders.

Closes #96
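The `{slug}.webp` naming above, mirrored into a per-year folder and object key, might be sketched as below. The helper names are ours for illustration, not necessarily the PR's; prefixing the R2 key with the year mirrors the local `images/<year>/` layout.

```typescript
import path from "node:path";

// Local destination for a processed logo, e.g. images/2026/zulip.webp.
export function localImagePath(root: string, year: number, slug: string): string {
  return path.join(root, "images", String(year), `${slug}.webp`);
}

// Remote object key; the year prefix keeps editions from overwriting each other.
export function r2Key(year: number, slug: string): string {
  return `${year}/${slug}.webp`;
}
```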

Changes

New Files

  • scripts/lib/r2-client.ts — Reusable Cloudflare R2 upload client (S3-compatible)
  • scripts/lib/image-processor.ts — Image download (with retry) + WebP compression
  • scripts/process-org-images.ts — Main pipeline orchestrator with --dry-run and --local-only modes
  • images/tech-stack/.gitkeep — Placeholder for future tech-stack icons
  • images/2026/.gitkeep — Output directory for processed 2026 org logos
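The retrying downloader in scripts/lib/image-processor.ts might look roughly like the sketch below, assuming Node 18+ (global fetch). The retry count and delay are illustrative, and the optional `fetcher` parameter is a test seam we added, not part of the actual PR code.

```typescript
// Illustrative retry wrapper around fetch; not the exact merged code.
const MAX_RETRIES = 3;
const RETRY_DELAY_MS = 500;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function downloadImage(
  url: string,
  fetcher: typeof fetch = fetch,
): Promise<Buffer> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      const res = await fetcher(url);
      if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
      return Buffer.from(await res.arrayBuffer());
    } catch (err) {
      lastError = err;
      if (attempt < MAX_RETRIES) await sleep(RETRY_DELAY_MS * attempt); // linear backoff
    }
  }
  throw lastError;
}
```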

Modified Files

  • package.json — Added sharp, @aws-sdk/client-s3; new gsoc:images script; updated gsoc:sync
  • .gitignore — Ignore generated image files in images/
  • transform-year-organizations.ts — Fixed img_r2_url for new orgs (was incorrectly set to raw API URL)

Usage

pnpm gsoc:images -- --year 2026 --dry-run     # Preview what would be processed
pnpm gsoc:images -- --year 2026 --local-only   # Download + compress only
pnpm gsoc:images -- --year 2026                # Full pipeline with R2 upload


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Added an automated org-image pipeline: download logos, compress to WebP, optional dry-run and local-only modes, and optional upload to cloud storage; updates per-org metadata when uploads occur.

* **Chores**
  * Added new npm scripts to run the image processing workflow.
  * Added dev dependencies for image processing and cloud uploads.
  * Updated ignore patterns to exclude generated org images (webp/png/jpg).

* **Bug Fixes**
  * Change: new org entries no longer default img_r2_url to the original logo URL.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->


vercel Bot commented Feb 20, 2026

@aayank13 is attempting to deploy a commit to the Ketan's Personal Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai Bot commented Feb 20, 2026

Warning

Rate limit exceeded

@aayank13 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 28 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Validation error: Expected 'de' | 'de-DE' | 'de-AT' | 'de-CH' | 'en' | 'en-US' | 'en-AU' | 'en-GB' | 'en-CA' | 'en-NZ' | 'en-ZA' | 'es' | 'es-AR' | 'fr' | 'fr-CA' | 'fr-CH' | 'fr-BE' | 'nl' | 'nl-BE' | 'pt-AO' | 'pt' | 'pt-BR' | 'pt-MZ' | 'pt-PT' | 'ar' | 'ast-ES' | 'ast' | 'be-BY' | 'be' | 'br-FR' | 'br' | 'ca-ES' | 'ca' | 'ca-ES-valencia' | 'ca-ES-balear' | 'da-DK' | 'da' | 'de-DE-x-simple-language' | 'el-GR' | 'el' | 'eo' | 'fa' | 'ga-IE' | 'ga' | 'gl-ES' | 'gl' | 'it' | 'ja-JP' | 'ja' | 'km-KH' | 'km' | 'ko-KR' | 'ko' | 'pl-PL' | 'pl' | 'ro-RO' | 'ro' | 'ru-RU' | 'ru' | 'sk-SK' | 'sk' | 'sl-SI' | 'sl' | 'sv' | 'ta-IN' | 'ta' | 'tl-PH' | 'tl' | 'tr' | 'uk-UA' | 'uk' | 'zh-CN' | 'zh' | 'crh-UA' | 'crh' | 'cs-CZ' | 'cs' | 'nb' | 'no' | 'nl-NL' | 'de-DE-x-simple-language-DE' | 'es-ES' | 'it-IT' | 'fa-IR' | 'sv-SE' | 'de-LU' | 'fr-FR' | 'bg-BG' | 'bg' | 'he-IL' | 'he' | 'hi-IN' | 'hi' | 'vi-VN' | 'vi' | 'th-TH' | 'th' | 'bn-BD' | 'bn', received object at "language"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

Adds an image processing pipeline and CLI to download (with retries), compress logos to WebP, save them under images/, and optionally upload them to Cloudflare R2; includes R2 client helpers, local processing utilities, an orchestrating script with flags, package.json scripts/devDependencies, and .gitignore entries for generated images.

Changes

Cohort / File(s) / Summary

  • Ignore & package scripts (.gitignore, package.json) — Added ignore patterns for generated images/* (*.webp, *.png, *.jpg). Added gsoc:images script and updated gsoc:sync to invoke the image processing step; new devDependencies: @aws-sdk/client-s3, @smithy/node-http-handler, sharp.
  • Image processing utilities (scripts/lib/image-processor.ts) — New utilities: downloadImage (retries, timeout), compressToWebP, processAndSaveLocally, sleep, and interfaces CompressOptions/ProcessResult. Handles local saving and compression to WebP.
  • R2 client (scripts/lib/r2-client.ts) — New Cloudflare R2 S3-compatible client: lazy S3Client creation, uploadToR2 (PutObjectCommand) and getR2PublicUrl; reads required R2 env vars and throws on missing config.
  • Orchestration script (scripts/process-org-images.ts) — New CLI that reads raw org JSON, filters orgs, processes images (download, compress, save), optionally uploads to R2, and updates per-org JSON with img_r2_url/logo_r2_url; supports --year, --dry-run, --local-only, per-item delays, retries, logging, and aggregated results.
  • Org transform tweak (scripts/transform-year-organizations.ts) — Minor formatting change and behavioral change: newly created org objects now set img_r2_url to an empty string ("") instead of defaulting to raw.logo_url.
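The env-var handling summarized for scripts/lib/r2-client.ts ("reads required R2 env vars and throws on missing config") might be sketched as below. The variable name R2_PUBLIC_URL follows the review thread; the exact names in the PR may differ.

```typescript
// Illustrative env plumbing; the real module also builds a lazy S3Client
// from R2_ACCOUNT_ID / R2_ACCESS_KEY_ID / R2_SECRET_ACCESS_KEY.
export function getEnvOrThrow(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Builds the public URL for an uploaded object from the configured base URL.
export function getR2PublicUrl(key: string): string {
  const base = getEnvOrThrow("R2_PUBLIC_URL").replace(/\/+$/, "");
  return `${base}/${key}`;
}
```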

Sequence Diagram(s)

sequenceDiagram
    participant Script as process-org-images.ts
    participant FS as File System
    participant Processor as image-processor.ts
    participant R2Client as r2-client.ts
    participant CloudflareR2 as Cloudflare R2

    Script->>FS: Read raw org JSON & per-org JSON files
    loop for each org with logo_url and not already on R2
        Script->>Processor: processAndSaveLocally(logo_url, outputDir, slug, options)
        activate Processor
        Processor->>Processor: downloadImage (retries, timeout)
        Processor->>Processor: compressToWebP (sharp)
        Processor->>FS: Ensure images/<YEAR>/ and save slug.webp
        Processor-->>Script: Return local image path + sizes
        deactivate Processor

        alt not --local-only
            Script->>R2Client: uploadToR2(key, Buffer, contentType)
            activate R2Client
            R2Client->>CloudflareR2: PutObjectCommand (S3 API)
            CloudflareR2-->>R2Client: Success
            R2Client-->>Script: Return public R2 URL
            deactivate R2Client
            Script->>FS: Update per-org JSON with img_r2_url/logo_r2_url
        end
    end
    Script->>Script: Log summary (processed, skipped, failed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble bytes and hop with glee,
I fetch each logo from the sea.
I squeeze to webp and tuck it tight,
To R2 clouds it takes its flight.
A tiny hop — pixels shine bright! 📸✨

🚥 Pre-merge checks: ✅ 5 passed

  • Title check — ✅ Passed. The title "feat: add image compression and upload pipeline for org logos" clearly and concisely summarizes the main change in the PR.
  • Description check — ✅ Passed. The PR description includes a comprehensive summary, detailed changes section, and usage examples covering all main aspects of the implementation.
  • Linked Issues check — ✅ Passed. All requirements from issue #96 are met: downloads org logos from the GSoC API, compresses to WebP, renames to {slug}.webp format, uploads to Cloudflare R2, and creates the images/ folder with tech-stack and 2026 subfolders.
  • Out of Scope Changes check — ✅ Passed. All changes are directly related to implementing the image processing pipeline requirements; minor fixes in transform-year-organizations.ts align with the pipeline's needs.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (3)
package.json (1)

57-57: sharp ^0.33.0 will not resolve to the current 0.34.x series.

For packages with a 0.x.y version, the ^ range only allows patch increments within the same minor (0.33.*). The latest published version is 0.34.5, which includes upstream libvips bug fixes and TypeScript improvements. Consider bumping to ^0.34.0 to pick up those fixes.

💡 Proposed change
-"sharp": "^0.33.0",
+"sharp": "^0.34.0",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 57, The package.json currently pins the sharp
dependency to "sharp": "^0.33.0", which will not pick up the 0.34.x series;
update the sharp entry to use "^0.34.0" so the project can receive the 0.34.*
bugfix and TypeScript improvements, then regenerate your lockfile by running
your package manager install (npm/yarn/pnpm) to update package-lock.json or
yarn.lock accordingly; ensure any CI/cache is refreshed so the new version is
used.
scripts/lib/r2-client.ts (1)

26-33: Consider setting a request timeout on the S3 client.

The AWS SDK v3 S3Client has no default socket/request timeout; a stalled upload to R2 will hang the script indefinitely. Add requestHandler or maxAttempts config, or at minimum a socketTimeout.

💡 Suggested timeout config
+import { NodeHttpHandler } from "@smithy/node-http-handler";
+
 _client = new S3Client({
     region: "auto",
     endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
     credentials: {
         accessKeyId: getEnvOrThrow("R2_ACCESS_KEY_ID"),
         secretAccessKey: getEnvOrThrow("R2_SECRET_ACCESS_KEY"),
     },
+    requestHandler: new NodeHttpHandler({
+        requestTimeout: 30_000,
+        socketTimeout: 30_000,
+    }),
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/r2-client.ts` around lines 26 - 33, The S3Client instantiation
assigned to _client lacks a request timeout and can hang; update the S3Client
config in the S3Client(...) call to include a requestHandler with timeouts
(e.g., import and use NodeHttpHandler and pass requestHandler: new
NodeHttpHandler({ socketTimeout: <ms>, connectionTimeout: <ms> })) and/or set
maxAttempts to a sensible retry limit so R2 uploads won't stall indefinitely;
update the S3Client(...) call where _client is created to include these options.
scripts/lib/image-processor.ts (1)

58-74: processAndSaveLocally is exported but its functionality is duplicated inline in process-org-images.ts.

process-org-images.ts manually calls downloadImage → compressToWebP → fs.writeFileSync (lines 117–121) instead of calling processAndSaveLocally. The only difference is the inline size-comparison log. Consider extending processAndSaveLocally to return both buffers (or sizes) so callers can retain size logging while avoiding the duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/image-processor.ts` around lines 58 - 74, processAndSaveLocally
duplicates logic in process-org-images.ts; change processAndSaveLocally to
return both the saved file path and size info (e.g., { outputPath: string,
originalSize: number, compressedSize: number } or include the
original/compressed Buffers) so callers can log size differences without
reimplementing downloadImage/compressToWebP/write logic. Update
processAndSaveLocally (the function shown) to capture original buffer size
before compression and compressed buffer size after compressToWebP, write the
file as now, and return the sizes alongside outputPath; then replace the manual
download/compress/write sequence in process-org-images.ts with a call to
processAndSaveLocally and use the returned sizes for the existing size
comparison log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@package.json`:
- Line 22: The npm script "gsoc:sync" currently invokes
scripts/process-org-images.ts without the --local-only flag, which forces R2
uploads and causes silent per-org failures when R2_* env vars are not set;
update the "gsoc:sync" entry to call scripts/process-org-images.ts --local-only
(so uploads are decoupled and handled by the separate "gsoc:images" script), or
alternatively add a README note that "gsoc:sync" requires R2 credentials (R2_*
env vars) if you want to keep the current behavior; reference the "gsoc:sync"
npm script and scripts/process-org-images.ts and "gsoc:images" to make the
change or documentation clear.

In `@scripts/lib/image-processor.ts`:
- Around line 15-39: The downloadImage function currently uses fetch without a
timeout, so a stalled response will never throw and retries won't trigger;
modify downloadImage to create an AbortController for each fetch attempt, pass
controller.signal into fetch(url, { signal }), start a per-attempt timer (e.g.,
via setTimeout) that calls controller.abort() after a configured per-attempt
timeout, and clear the timer when the response is received or on error; ensure
the abort error is handled like other errors so the loop retries (using existing
lastError, RETRY_DELAY_MS and MAX_RETRIES) and that the controller/timer are
properly cleaned up each attempt to avoid leaks.

In `@scripts/process-org-images.ts`:
- Around line 76-88: The skip-and-update logic uses raw.slug directly to build
orgFile so aliased slugs (SLUG_ALIASES) never resolve and R2 URLs aren't
persisted; import or duplicate the SLUG_ALIASES mapping and resolve the
canonical file slug before any filesystem lookup (i.e., compute a resolvedSlug
from SLUG_ALIASES[raw.slug] || raw.slug) and use that when constructing orgFile
(used in the pre-skip check and in updateOrgJson), ensuring both the existence
check and the write/update target the actual JSON filename under ORGS_DIR.
- Line 45: R2_URL_PREFIX is hardcoded which breaks the skip check that uses
currentR2.startsWith(R2_URL_PREFIX); instead derive the prefix from the same
source as r2-client (use the R2_PUBLIC_URL env var or call getR2PublicUrl from
r2-client) so the skip logic matches the actual public URL; update the
declaration of R2_URL_PREFIX to compute its value from process.env.R2_PUBLIC_URL
(or import and call getR2PublicUrl) with the existing literal as a fallback, and
ensure the currentR2.startsWith(...) check uses this computed value.
- Around line 127-132: The R2 upload uses r2Key = `${raw.slug}.webp` which omits
the year and causes cross-year overwrites; update the r2Key construction in the
block that checks LOCAL_ONLY (where uploadToR2 is called) to include the same
year segment used for local saves (e.g., `${year}/${raw.slug}.webp` or whatever
variable holds the YEAR), so the remote key mirrors the local path; ensure any
logging (console.log) and references to r2Url remain unchanged after this
change.
- Around line 150-163: The script currently logs failures but never sets a
non-zero exit code; update the end of the script where failures is inspected
(the block that prints "[FAILURES]" and the LOCAL_ONLY messages) to call
process.exit(1) when failures.length > 0 so CI fails on any upload errors;
ensure you only skip the exit when LOCAL_ONLY is true and uploads were
intentionally not attempted (or always exit non-zero regardless of LOCAL_ONLY if
you prefer the simpler behavior), referencing the failures array and the
existing LOCAL_ONLY/IMAGES_DIR logic to decide when to call process.exit(1).
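The per-attempt download timeout described above (an AbortController plus a timer, cleaned up on every attempt) could be sketched as follows; the 15-second default is illustrative, not taken from the PR.

```typescript
// Each attempt gets its own AbortController so a stalled response eventually
// rejects and the surrounding retry loop can fire.
export async function fetchWithTimeout(
  url: string,
  timeoutMs = 15_000,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // clean up per attempt to avoid leaked timers
  }
}
```

On abort, fetch rejects with an AbortError, which the caller's retry loop can treat like any other transient failure.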

---

Nitpick comments:
In `@package.json`:
- Line 57: The package.json currently pins the sharp dependency to "sharp":
"^0.33.0", which will not pick up the 0.34.x series; update the sharp entry to
use "^0.34.0" so the project can receive the 0.34.* bugfix and TypeScript
improvements, then regenerate your lockfile by running your package manager
install (npm/yarn/pnpm) to update package-lock.json or yarn.lock accordingly;
ensure any CI/cache is refreshed so the new version is used.

In `@scripts/lib/image-processor.ts`:
- Around line 58-74: processAndSaveLocally duplicates logic in
process-org-images.ts; change processAndSaveLocally to return both the saved
file path and size info (e.g., { outputPath: string, originalSize: number,
compressedSize: number } or include the original/compressed Buffers) so callers
can log size differences without reimplementing
downloadImage/compressToWebP/write logic. Update processAndSaveLocally (the
function shown) to capture original buffer size before compression and
compressed buffer size after compressToWebP, write the file as now, and return
the sizes alongside outputPath; then replace the manual download/compress/write
sequence in process-org-images.ts with a call to processAndSaveLocally and use
the returned sizes for the existing size comparison log.

In `@scripts/lib/r2-client.ts`:
- Around line 26-33: The S3Client instantiation assigned to _client lacks a
request timeout and can hang; update the S3Client config in the S3Client(...)
call to include a requestHandler with timeouts (e.g., import and use
NodeHttpHandler and pass requestHandler: new NodeHttpHandler({ socketTimeout:
<ms>, connectionTimeout: <ms> })) and/or set maxAttempts to a sensible retry
limit so R2 uploads won't stall indefinitely; update the S3Client(...) call
where _client is created to include these options.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
scripts/process-org-images.ts (2)

116-117: failed counter is redundant — failures.length already tracks the same value.

failed is incremented in lockstep with every failures.push(...), so failed === failures.length always holds. The summary log can use failures.length directly.

♻️ Proposed cleanup
-    let processed = 0;
-    let failed = 0;
+    let processed = 0;
     const failures: Array<{ slug: string; error: string }> = [];

     // ...inside catch block:
-            failed++;
             failures.push({ slug: raw.slug, error: errorMsg });

     // ...summary:
-    console.log(`  Failed:    ${failed}`);
+    console.log(`  Failed:    ${failures.length}`);

Also applies to: 153-153, 161-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 116 - 117, The extra numeric
counter failed is redundant because failures.push(...) already tracks failures;
remove the failed variable declaration and every failed++ update (where
failures.push(...) is called) and change any uses of failed (e.g., in the final
summary log) to use failures.length instead so processed remains and
failures.length provides the failure count.

113-115: Redundant existsSync guard before mkdirSync.

fs.mkdirSync(path, { recursive: true }) is already a no-op when the directory exists. The existence check adds no safety and can be dropped.

♻️ Proposed simplification
-    if (!fs.existsSync(IMAGES_DIR)) {
-        fs.mkdirSync(IMAGES_DIR, { recursive: true });
-    }
+    fs.mkdirSync(IMAGES_DIR, { recursive: true });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 113 - 115, The check using
fs.existsSync(IMAGES_DIR) before creating the directory is redundant; remove the
if-block and call fs.mkdirSync(IMAGES_DIR, { recursive: true }) directly so the
directory is created if missing and is a no-op if it already exists—update the
block containing IMAGES_DIR and the fs.mkdirSync call accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@scripts/process-org-images.ts`:
- Around line 50-54: Extract the hardcoded SLUG_ALIASES map into a shared module
and import it into both scripts so aliases stay synchronized; specifically, move
the SLUG_ALIASES constant into a new shared file (e.g., export a const ALIASES)
and replace the inline SLUG_ALIASES usage in process-org-images.ts and
transform-year-organizations.ts with imports from that shared module, ensuring
both files reference the same exported identifier (SLUG_ALIASES or ALIASES) so
updates are centralized.

---

Nitpick comments:
In `@scripts/process-org-images.ts`:
- Around line 116-117: The extra numeric counter failed is redundant because
failures.push(...) already tracks failures; remove the failed variable
declaration and every failed++ update (where failures.push(...) is called) and
change any uses of failed (e.g., in the final summary log) to use
failures.length instead so processed remains and failures.length provides the
failure count.
- Around line 113-115: The check using fs.existsSync(IMAGES_DIR) before creating
the directory is redundant; remove the if-block and call
fs.mkdirSync(IMAGES_DIR, { recursive: true }) directly so the directory is
created if missing and is a no-op if it already exists—update the block
containing IMAGES_DIR and the fs.mkdirSync call accordingly.

@aayank13 (Author)

@ketankauntia check the above PR


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
scripts/process-org-images.ts (1)

121-122: failed counter is redundant — it always equals failures.length.

failed++ and failures.push(...) are always called together, so failed carries no independent information. failures.length can be used directly everywhere.

♻️ Proposed refactor
     let processed = 0;
-    let failed = 0;
     const failures: Array<{ slug: string; error: string }> = [];
-            failed++;
-    console.log(`  Failed:    ${failed}`);
+    console.log(`  Failed:    ${failures.length}`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 121 - 122, Remove the redundant
`failed` counter: delete the `let failed = 0;` declaration and remove every
`failed++` update where you currently also call `failures.push(...)`; replace
any uses of `failed` (e.g., in logging, metrics, or return values) with
`failures.length` so the code relies on the single source of truth `failures`
instead of duplicating state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/process-org-images.ts`:
- Around line 121-122: Remove the redundant `failed` counter: delete the `let
failed = 0;` declaration and remove every `failed++` update where you currently
also call `failures.push(...)`; replace any uses of `failed` (e.g., in logging,
metrics, or return values) with `failures.length` so the code relies on the
single source of truth `failures` instead of duplicating state.

@ketankauntia (Owner)

> @ketankauntia check the above PR

I'm traveling. I'll review the PR on the 25th or 26th for sure.

@aayank13 (Author)

Sure :)


vercel Bot commented Feb 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: gsoc-orgs — Deployment: Ready — Actions: Preview, Comment — Updated: Feb 26, 2026 1:56pm (UTC)

Comment thread scripts/lib/image-processor.ts

import path from "path";
import sharp from "sharp";

const MAX_RETRIES = 3;
Owner:

I would suggest moving these constants to a proper constants file for better handling.

input: Buffer,
options: CompressOptions = {},
): Promise<Buffer> {
const { width = 400, quality = 80 } = options;
Owner:

How did you come up with this specific size? What did you take into account? And how much size compression will we see on average?

* @param options - Optional compression settings.
* @returns The output path and original/compressed sizes in bytes.
*/
export async function processAndSaveLocally(
Owner:

Okay, one thing I noticed: a better approach would be to export each of these functions from its own file, since they perform specific tasks, and then call them from one final file for better readability and structure.

Comment thread scripts/lib/r2-client.ts
import { NodeHttpHandler } from "@smithy/node-http-handler";

/**
* Reads a required environment variable or throws an error if it is not set.
Owner:

This code can be generalized, moved out to a separate file, and extended to the rest of the codebase. Why keep it just for R2?

* in our local dataset (new-api-details/organizations/).
* Shared by both transform-year-organizations.ts and process-org-images.ts.
*/
export const SLUG_ALIASES: Record<string, string> = {
Owner:

What is the use of this file? Please explain. Do we need it specifically, or did you just generate it with Cursor? (There is no harm in that, just curious.)

process.exit(1);
}

const ROOT = process.cwd();
Owner:

Same here: these should be exported to a constants file instead.

category: raw.categories?.[0] || "Other",
description: raw.description || raw.tagline || "",
image_url: raw.logo_url || "",
img_r2_url: raw.logo_url || "",
Owner:

why this?

* GSoC Org Image Processing Pipeline
*
* Downloads org logos from the GSoC API, compresses to WebP,
* saves locally, and optionally uploads to Cloudflare R2.
Owner:

OK, just a quick question: why not delete the .png and .webp files after uploading to R2?

Also, it would be better to introduce a field named gsoc_img_url: "" that points to the image hosted by the GSoC website, so that we have a proper link to the image from which we can poll and check.

That way we do not have to run through the entire GSoC API; instead we can check internally only the list of new orgs that appeared in the latest GSoC edition and work on those images alone.
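The incremental check suggested here could be sketched as below. The field names gsoc_img_url and img_r2_url follow the thread, but the record shape is illustrative, not the repository's actual schema.

```typescript
// Orgs whose GSoC-hosted logo is known but whose R2 upload hasn't happened
// yet are the only ones the pipeline needs to touch.
interface OrgRecord {
  slug: string;
  gsoc_img_url: string; // upstream GSoC logo URL, kept for polling
  img_r2_url: string;   // empty until the logo has been uploaded to R2
}

export function orgsNeedingUpload(orgs: OrgRecord[]): OrgRecord[] {
  return orgs.filter((org) => org.gsoc_img_url !== "" && org.img_r2_url === "");
}
```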


Development

Successfully merging this pull request may close these issues.

Image compression & upload pipeline
