If it ain't broke
don't fix it.
Isn't this image so satisfying?
(o´▽`o)
I have a huge collection of music and wanted to make the listening experience in Navidrome
better and more immersive. So I used a Philips WiZ light bulb and designed a system that smoothly transitions my
room lighting between the 5 most dominant colors of the currently playing album art to keep the experience
intimate and cohesive.
Somehow, that led me down the embedding model rabbit hole. Soon enough, I had a
system where I could type a prompt like "soft relaxing bright" or "campfire" or "deep sea" or anything at all, and
my room's lights would change to match. I learnt a lot along the way, like why a bigger model is not always better.
It's good that I'm immune to seizure-inducing graphics, or else I'd be
(°o°:;)
dotenv library to set up the variables like so. There are 2 modes in the script.
Mode 1 ends the lighting show after the current album stops playing, whereas Mode
2 checks every time the album changes and crossfades the colors.
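
A crossfade like the one in Mode 2 can be sketched as a linear interpolation between the old and new colors. This is only a minimal illustration, not the actual script: the function names `lerp_color` and `crossfade` and the `steps` parameter are made up here, and sending each frame to the bulb is left out.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t runs from 0.0 to 1.0."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def crossfade(start, end, steps):
    """Return every color of a fade from start to end, both ends included."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [lerp_color(start, end, i / steps) for i in range(steps + 1)]


# Hypothetical usage: fade from red to blue in 4 steps (5 frames).
frames = crossfade((255, 0, 0), (0, 0, 255), steps=4)
```

Each frame would then be pushed to the light with a short delay in between. More steps give a smoother fade at the cost of more network calls.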
64x64.
Then the color space is converted to HSV. I intentionally reduced the hue resolution
and cranked up saturation and brightness, so that I get sharp colors instead of having to deal with 100
different shades of red. Now, I can have up to 32 colors as input.
Having 32 colors here does not limit me to 32 light colors; I have a trick to get the full 16.7 million colors (^_-)
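
Here is a rough sketch of the "reduce hue resolution, max out saturation and brightness" idea using only Python's standard `colorsys` module. It assumes the 32 colors come from 32 equal hue buckets, which is my guess and not confirmed by the script. The names `HUE_BINS`, `quantize_pixel` and `dominant_colors` are hypothetical, and the pixels are assumed to be already loaded as RGB tuples from the downscaled image.

```python
import colorsys
from collections import Counter

HUE_BINS = 32  # assumption: one bucket per allowed input color


def quantize_pixel(rgb, bins=HUE_BINS):
    """Map an (r, g, b) pixel in 0-255 to a hue bucket index."""
    r, g, b = (c / 255 for c in rgb)
    hue, _sat, _val = colorsys.rgb_to_hsv(r, g, b)
    return min(int(hue * bins), bins - 1)


def dominant_colors(pixels, top_n=5, bins=HUE_BINS):
    """Return the top_n most common hue buckets as fully saturated RGB colors."""
    counts = Counter(quantize_pixel(p, bins) for p in pixels)
    result = []
    for bucket, _count in counts.most_common(top_n):
        hue = (bucket + 0.5) / bins  # center of the bucket
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)  # saturation and value forced to max
        result.append((round(r * 255), round(g * 255), round(b * 255)))
    return result


# Toy input: mostly red, some blue, a bit of green.
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)] * 2 + [(0, 255, 0)]
top_two = dominant_colors(pixels, top_n=2)
```

One caveat of this simplified version: gray pixels have a hue of 0, so they get counted as red. Skipping low-saturation pixels before counting would be a reasonable refinement.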
BRIGHTNESS in the .env file so that I can manually set the brightness or control it based on
time of day or other external factors. Letting the album art decide the brightness value feels like a
missed opportunity in my opinion.
So... uhh....
A thesaurus that needs a graphics card
(ㆆ_ㆆ)
colors.json, but the results were
underwhelming. So I decided to add another layer of computa- ahem, intelligence to
the program. My plan was something like this:

| Model | Time Taken | Quality (opinionated) | Notes | Results |
|---|---|---|---|---|
| granite:33m | 00:16:03 | linkedin 🧠📉 | i would use rapidfuzz or pywal instead, like a caveman. Truth be told, it seems like this is better for RAG than creative tasks. Still very 🧠🪦 | check here |
| all-minilm:22m | 00:09:52 | excellent | it's better than 33m variant, but feels robotic | check here |
| all-minilm:33m | 00:11:13 | good | no soul + a bit corpo | check here |
| snowflake-arctic-embed:22m | 00:08:07 | excellent | perfect model for this use case | check here |
| snowflake-arctic-embed:33m | 00:09:36 | good | loses character compared to smaller model | check here |
| larger models (nomic-embed-moe, bge-m3, mxbai-embed-large) | 1-2 hours | bad | not really good for this use case | NA (not gonna waste any more time) |
I just like Qwen, alright?
Definitely not sponsored by Alibaba...
shhh...
| Model | Speed | Quality | Notes |
|---|---|---|---|
| qwen3.5:0.8b | medium | Best overall | Good creativity |
| qwen3:4b | slowest | Straight up bad | "aktually 🤓☝️" and harder to follow rules |
| qwen3:1.7b | fast | Cinematic | Very good but always "soft glow", "ethereal", "muted tones" |
| qwen3:0.6b | fast | Repetitive | Same as 1.7b but obsessed with "ethereal" |
| qwen2.5:0.5b | fastest | Alright | Very basic model, just sticks to catchphrases like "ethereal" |
Qwen 3.5: 0.5b took approximately 6
hours.
Prompt to run it locally on my 3GB VRAM GTX 1060.
Once the LLM had generated ~5 extra tags (excluding negative prompts, which were filtered out)
for each of the 31k colors, I ran a loop, computed embeddings for all of them and saved them in NumPy vector
arrays, because we're gonna do some vector math. This process took approximately 8 minutes.
semantic search in the NumPy vector array and ranks the top 8 colors whose embeddings match the input
best. It first generates an embedding for the input text, then searches the array using cosine
similarity.
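
To make the ranking step concrete, here is a small sketch of a cosine-similarity top-k search. It uses plain Python lists instead of NumPy so that it runs with no dependencies, and the 3-dimensional vectors and color names are toy values invented for illustration. Real embeddings would come from the embedding model and have hundreds of dimensions. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


def top_k(query_vec, color_vecs, k=8):
    """Return the k color names whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in color_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _score, name in scored[:k]]


# Toy data standing in for the stored color embeddings.
toy_colors = {
    "ember": [0.9, 0.1, 0.0],
    "ocean": [0.0, 0.2, 0.9],
    "moss": [0.1, 0.9, 0.1],
}
best = top_k([1.0, 0.2, 0.0], toy_colors, k=2)
```

Cosine similarity ignores vector length and only compares direction, so a short prompt and a long tag list can still match well. With NumPy, the same search boils down to normalizing the rows once and taking a matrix-vector product.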
So it finds the "direction" of the input phrase.
Get it? Get it?
(✖╭╮✖)
Ugh...
nevermind