
Extend kennankole's solution#385

Open
Judahmeek wants to merge 8 commits into
serpapi:masterfrom
Judahmeek:extend-kkole-solution


@Judahmeek Judahmeek commented Jun 1, 2026

Obviously, instead of trying to come up with my own solution & tests, I went & checked out the competition. @kennankole's solution (#379) was by far the most robust (although it also proved overly complicated). The only real flaws it seemed to have are that it was more computationally expensive & that it had no way to detect drift. A more brittle solution, such as one that depends on CSS, will be faster and will break as soon as Google changes whatever CSS it depends on, so drift shows up immediately.

I initially tried to solve this by merging #381 & #379 together (you can see my initial refactor of @DanTaiko's work here), with the idea that parts of @kennankole's logic would serve as a backup for scenarios that @DanTaiko's solution couldn't cover, but the longer I worked on it, the more it felt unnecessarily complex, so I scrapped that idea & tried using #379 as a base.

Robust logic that tries to handle most possible variants & forms of drift, sitting behind a search index that provides performance for known solutions, is definitely the ideal, however, and I hope that my code illustrates an approximate example of that.

...

P.S. The search results for "Tom Cruise films" are probably way outside the scope of what y'all expected us to cover, but if I were going to address them more effectively, I would have copied @dsojevic's solution of replacing the html mappings, since the initial search results for that particular kind of query even have the anchor links lazy-loaded.

Of course, redirecting users who searched for "Tom Cruise films" to search results for "Tom Cruise filmography" would probably be the best course of action.

Kennedy Omondi and others added 4 commits May 20, 2026 23:33
  Parse the bundled SERP HTML into a SerpApi-shaped array of
  `{name, extensions, link, image}` without making HTTP requests.

  - Detect carousel tiles by structural signal (`/search?...&stick=...` sibling groups),
    not volatile Google CSS classes, so the parser works across Van Gogh and variant fixtures.
  - Resolve thumbnails by parsing `_setImagesSrc(ii, s, r)` blocks into an `id -> image` map, including unescaping `\x3d` and `\/` values emitted in inline JS.
  - Extract `extensions` from leaf text nodes under each anchor to avoid container-text noise (for example, concatenated `name+year`).
  - Resolve `image` from values already present in the page file:
    inline JS mapping, inline non-placeholder data URIs, and in-file `data-src`/`src` URLs.
  - Add comprehensive RSpec coverage for golden output, cross-layout fixtures, item parsing, thumbnail indexing, and carousel selection behavior.
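
The thumbnail-indexing bullet above could be sketched roughly as follows. This is a minimal illustration, not the code from #379: the helper names `thumbnail_index` and `unescape_inline_js` are hypothetical, and the assumed block shape (`var s='...';var ii=[...];_setImagesSrc(ii,s);`) is a guess at what the inline scripts look like rather than something confirmed here.

```ruby
# Assumed block shape (not confirmed by the PR):
#   var s='<data URI>';var ii=['<id>', ...];_setImagesSrc(ii,s);
SET_IMAGES_BLOCK = /var s='([^']*)';\s*var ii=\[([^\]]*)\];\s*_setImagesSrc\(/

# Decode the `\xNN` and `\/` escapes that appear inside inline JS strings.
def unescape_inline_js(value)
  value
    .gsub(/\\x([0-9a-fA-F]{2})/) { [Regexp.last_match(1).hex].pack('U') }
    .gsub('\\/', '/')
end

# Hypothetical helper: returns a hash of image id => unescaped image source.
def thumbnail_index(script_text)
  script_text.scan(SET_IMAGES_BLOCK).each_with_object({}) do |(src, ids), index|
    image = unescape_inline_js(src)
    ids.scan(/'([^']+)'/).flatten.each { |id| index[id] = image }
  end
end
```

Building the map once per page means each tile's image lookup is a hash access by element id instead of a fresh scan of the scripts.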
This is because I discovered that interactive search results, such as the results for "Tom Cruise Movies", do not contain anchors in the initial HTML

CI fix

fix for missed @anchor reference
Judahmeek added 2 commits June 2, 2026 00:14
Adding Tom Cruise filmography results to contrast with the Tom Cruise movies results.

Adding the U.S. Presidents results because its parent data-attrid doesn't start with 'kc:' like most grid results.
The changes to the group score method are what I'm most proud of.

The original method returned an array, which, when run through the max function (in the tiles method ~ line 36), acts like a series of tiebreakers: later elements only matter when every earlier element is tied.

This gives an overwhelming amount of weight to whatever quality proxy is measured first.
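
To illustrate the tiebreaker effect, here is a small sketch with made-up numbers. The proxy names, the sample groups, and the equal weights are all hypothetical, and the weighted sum is only one possible alternative; it is not necessarily how the group score method in this PR was changed.

```ruby
# Hypothetical candidate groups with made-up quality proxies.
GROUPS = [
  { name: :a, tile_count: 6, with_images: 0, with_text: 0 },
  { name: :b, tile_count: 5, with_images: 5, with_text: 5 }
].freeze

# Hypothetical weights; equal here so no single proxy dominates.
WEIGHTS = { tile_count: 1.0, with_images: 1.0, with_text: 1.0 }.freeze

# Array score: Ruby compares arrays element by element, so the first
# proxy decides unless it is exactly tied.
def tiebreaker_score(group)
  [group[:tile_count], group[:with_images], group[:with_text]]
end

# Scalar score: every proxy contributes to the final ranking.
def weighted_score(group)
  WEIGHTS.sum { |key, weight| weight * group[key] }
end

by_array  = GROUPS.max_by { |g| tiebreaker_score(g) }[:name] # => :a
by_weight = GROUPS.max_by { |g| weighted_score(g) }[:name]   # => :b
```

Group `:a` wins under the array score on tile count alone, even though it has no images or text; the scalar score prefers `:b`.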

The other aspect of my changes that I would like to draw your attention to is the use of environment variables. It's a basic feature, but one I don't recall seeing in my competitors' PRs.
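
As a rough illustration of the pattern (not the PR's actual configuration), a tunable could be read from the environment with a safe default. The variable name `CAROUSEL_MIN_TILES` and the helper `env_integer` are hypothetical.

```ruby
# Hypothetical helper: read an integer setting from the environment,
# falling back to the default when it is unset or not a valid integer.
def env_integer(name, default, env: ENV)
  Integer(env.fetch(name, default.to_s), 10)
rescue ArgumentError
  default
end

# Hypothetical setting name; the PR's real variables are not listed here.
MIN_TILES = env_integer('CAROUSEL_MIN_TILES', 3)
```

Passing `env:` explicitly keeps the helper easy to exercise in specs without mutating the real `ENV`.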
@Judahmeek Judahmeek force-pushed the extend-kkole-solution branch from dc8b05e to 81a2bb2 Compare June 2, 2026 05:26
One flaw I noticed in nearly all competitors was relying on Google's image lazy-load script not to change in any way.

A more robust solution than mine would account for the `_setImagesSrc` function name also possibly changing & would probably rely only on the `data:image` structure as the initial clue.

It would make scanning the first script more computationally expensive, but detected variables could then be used to speed up processing of subsequent scripts.
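
A minimal sketch of that idea, under assumptions: the first script is scanned for a variable assigned a `data:image` URI, and the name of the function that receives that variable is treated as the loader. The helper names and the assumed script shape are hypothetical, and this does not cover the combined-script case mentioned below.

```ruby
# Assumed shape: var <name>='data:image/...'; followed by <loader>(..., <name>, ...);
DATA_URI_ASSIGNMENT = /var (\w+)='data:image\/[^']*'/

# Hypothetical helper: returns the loader function name, or nil if the
# script has no data:image assignment or no call that receives it.
def discover_loader_name(script_text)
  assignment = script_text.match(DATA_URI_ASSIGNMENT)
  return nil unless assignment

  var = Regexp.escape(assignment[1])
  call = script_text.match(/([\w$]+)\([^()]*\b#{var}\b[^()]*\)/)
  call && call[1]
end

# Cheap filter for subsequent scripts once the loader name is known.
def calls_loader?(script_text, loader_name)
  script_text.include?("#{loader_name}(")
end
```

Only the first script pays for the broader pattern match; later scripts can be filtered with a plain substring check on the discovered name.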

Hopefully, Google never decides to combine all their lazy-loading scripts together. I'm not sure how that could be detected performantly, but I'm sure I could find a way, given enough time.
@Judahmeek Judahmeek changed the title Extend kennankole's solution [WIP] Extend kennankole's solution Jun 2, 2026
@Judahmeek Judahmeek marked this pull request as ready for review June 2, 2026 07:00