The elusive song (web scraping)

Jacob David C. Cunningham
Jun 15, 2024

There is a song I’ve heard… I always seem to catch only the last part of it, before I get any indication of what it is. I do have SoundHound and I have a playlist… the issue is that the playlist only lists the last 30 songs, and I keep forgetting to look at it while I’m driving, e.g. doing deliveries.

Let me see if I can make this in an hour. It’s rote code for me.

You need:

  • scraper
  • database
  • CRON job

9:50 PM start

This is the target

Do it

I will be using Puppeteer for my scraper; it’s a JavaScript library. You need to know the structure of the page to pull specific information out of it. Oh damn it’s Elementor lol oof memories.

Before you do something like this, you should check… is there a simpler solution, e.g. they’re using some API and it can accept a date/time range… that would be convenient.
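One way to do that check, sketched here under assumptions: let Puppeteer record the XHR/fetch responses the page makes while it loads, then look through them for a JSON endpoint. `watchApiCalls` is a made-up helper name for illustration, and nothing here says the target site actually exposes such an API.

```javascript
// Records the URL and status of every XHR/fetch response a page receives.
// `page` is a Puppeteer Page; call this BEFORE page.goto(...) so nothing is missed.
function watchApiCalls(page, found = []) {
  page.on('response', (response) => {
    const type = response.request().resourceType();
    // Static assets (images, CSS, scripts) are ignored; only data calls are kept.
    if (type === 'xhr' || type === 'fetch') {
      found.push({ url: response.url(), status: response.status() });
    }
  });
  return found;
}

// Usage sketch (inside an async function, with a real Puppeteer page):
//   const calls = watchApiCalls(page);
//   await page.goto(url, { waitUntil: 'networkidle2' });
//   console.log(calls);
```

If one of the recorded URLs returns the song list as JSON, hitting that directly would likely be simpler than driving a whole browser.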

10:23 PM

Alright I got stuck on something dumb (waitForSelector not working as expected). The song list is loaded asynchronously, so you can wait for something to appear or just use delays (not safe, since a fixed delay can fire before the list has loaded). But you can see it working here.

So I’ll just write that to a file and put it on my Raspberry Pi with a CRON job every half hour. Later I’ll write a thing to open all files/keep unique entries then make a final file.
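The "open all files/keep unique entries" step could look roughly like this. It assumes each run wrote one song per line to a file named `songs-<epoch>.txt`; that naming scheme and the `mergeUniqueSongs` helper are assumptions for the sketch, not the real layout.

```javascript
const fs = require('fs');
const path = require('path');

// Reads every songs-<epoch>.txt in `dir`, keeps each distinct line once,
// writes the result to `outFile`, and returns the unique songs.
function mergeUniqueSongs(dir, outFile) {
  const seen = new Set(); // a Set drops duplicates and keeps first-seen order
  const files = fs
    .readdirSync(dir)
    .filter((name) => /^songs-\d+\.txt$/.test(name))
    .sort(); // roughly chronological while the timestamps have equal length
  for (const name of files) {
    const lines = fs.readFileSync(path.join(dir, name), 'utf8').split('\n');
    for (const line of lines) {
      const song = line.trim();
      if (song) seen.add(song);
    }
  }
  const unique = [...seen];
  fs.writeFileSync(outFile, unique.join('\n') + '\n');
  return unique;
}

// Usage sketch: mergeUniqueSongs('./scrapes', './all-songs.txt');
```

Since the playlist only shows the last 30 songs and the scrape runs every half hour, consecutive files will overlap heavily, so the dedupe step does most of the work.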

10:35 PM

OMG, I keep running into problems. How do you reference fs inside the page.evaluate context, or call a function defined outside it? A new thing I saw: page.exposeFunction.

Anyway now it looks like this and writes the songs to a file with an epoch timestamp.
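Since the Gist embed is not shown here, this is a stand-in sketch of the two usual ways around the `fs` problem, not the original code. The function names, the selector argument, and the `songs-<epoch>.txt` file name are all assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Writes one song per line to <dir>/songs-<epoch ms>.txt and returns the path.
function saveSongs(songs, dir = '.') {
  const file = path.join(dir, `songs-${Date.now()}.txt`);
  fs.writeFileSync(file, songs.join('\n') + '\n');
  return file;
}

// Option 1: page.evaluate runs in the browser, where fs does not exist.
// Return plain data from it and do the file write back in Node.
async function scrapeAndSave(page, selector, dir) {
  const songs = await page.evaluate(
    (sel) => Array.from(document.querySelectorAll(sel), (el) => el.textContent.trim()),
    selector // extra arguments are serialized and passed into the browser
  );
  return saveSongs(songs, dir);
}

// Option 2: page.exposeFunction makes a Node function callable from the page.
// After this, code inside page.evaluate can do: await window.saveSongs(list)
async function exposeSave(page, dir) {
  await page.exposeFunction('saveSongs', (songs) => saveSongs(songs, dir));
}
```

Option 1 is usually enough for a one-shot scrape; `exposeFunction` helps more when the page needs to push data out as it arrives. For the half-hour schedule, a crontab entry along the lines of `*/30 * * * * node /path/to/scrape.js` would do it (the path is a placeholder).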

Oh where’s that GitHub Gist embed.

Now to put it on the Pi (FileZilla) and set up a CRON job.

crontab.guru is a good site for figuring out how to write an interval with CRON.

I did find this old Adele song recently lol “semaluhtounuyulohowwah”.

Well… these Pis are using old NodeJS… I need at least 16… having problems with newer JS syntax, e.g. ??= and return { static { …
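Errors like these are syntax errors, most likely thrown while an old Node parses newer code in a dependency, so they show up as cryptic "unexpected token" messages. A small guard at the top of the entry script can fail with a clearer message instead. This is a sketch: the helper names are made up, the minimum of 16 comes from the line above, and it is written in ES5 on purpose so that even a very old Node can parse it.

```javascript
// ES5 only (var, no arrow functions, no template strings) so old runtimes can run it.
var MIN_MAJOR = 16; // minimum Node major version this script expects

// Extracts the major version number from strings like "18.19.0" or "v0.10.29".
function nodeMajor(version) {
  return parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
}

function isModernNode(version) {
  return nodeMajor(version) >= MIN_MAJOR;
}

function assertModernNode(version) {
  if (!isModernNode(version)) {
    throw new Error('Node ' + version + ' is too old, need >= ' + MIN_MAJOR);
  }
}

// Check first. require() executes at the point it is called, so dependencies
// that use newer syntax are only parsed after this line has passed.
assertModernNode(process.versions.node);
// var puppeteer = require('puppeteer');
```

The guard does not make the script run on an old runtime; it only turns a confusing parse error into an obvious "upgrade Node" message.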

I’m just gonna put it on one of the VPSes that I rent… nope… too old a Debian, damn.

Lol, this Pi’s Node version is 0.10.29, OMG lol… I’m stuck on this. So here’s the dilemma… I have an encrypted-at-rest server that is not running, and it has my server passwords in it lmao… so I can’t get in there… the Raspberry Pis are not doing well updating Node, or they’re running other things that can break if I upgrade… so what do I do?

I use another Pi, a Pi 4 lol. This one is just sitting in a box so I’ll use it.

What is my purpose? To scrape a website for a specific song.

Alright, this one has Node 18 on it, so it should be good with that nullish coalescing error.

11:44 PM

I keep running into problems, man… Puppeteer is struggling to install on here (a cache miss, for example). I know this tech is overkill for this task, except for the async-loaded songs part. Writing the code was simple; it’s deploying it into an environment that’s annoying. I’d say you could use a lambda/serverless, but idk if Puppeteer is just there to use.

Well I’m gonna sleep, imagine it ended well.

Update

Omg I’m so mad… there I was doing a delivery and the song plays right. I catch it playing near the end again. I frantically open up the playlist… it’s not there! I later realize the playlist doesn’t match what played… wtf? I try SoundHound it fails to recognize it, I try to record it with my camera just so I have lyrics… camera freezes when I have two delivery apps going ahhhhhhh.

OMG

I found it yes. It’s Fontaines D.C. — Starburster
