Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon
216 points by MediaSquirrel
by LuxBennu
3 subcomments
I run Whisper large-v3 on an M2 Max with 96 GB, and even with just inference the memory gets tight on longer audio; I can only imagine what fine-tuning looks like. Does 64 GB vs. 96 GB make a meaningful difference for Gemma 4 fine-tuning, or does it just push the OOM wall back a bit? I've been wanting to try local fine-tuning on Apple Silicon, but the tooling gap has kept me on inference only so far.
by conception
0 subcomments
I’m pretty excited about the Edge Gallery iOS app with Gemma 4 on it, but it seems like they hobbled it: no access to Intents, and you have to write custom plugins for web search, etc. Does anyone have a favorite way to run these usefully? ChatMCP works pretty well but only supports models via API.
by sails
0 subcomments
> Accent, dialect, and low-resource language adaptation — adapt a base Gemma model to underrepresented voices and languages with your own labeled audio.
Is this for TTS? I've been looking for something to do a local fine-tune to get a specific accent.
by craze3
0 subcomments
Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too.
by mandeepj
1 subcomment
> I had 15,000 hours of audio data
Do you really need that much data for fine-tuning?
by dsabanin
1 subcomment
Thanks for doing this. Looks interesting, I'm going to check it out soon.
by yousifa
0 subcomments
This is super cool, will definitely try it out! Nice work
by m3kw9
0 subcomments
Will it work with 32 GB?
by neonstatic
2 subcomments
Just a heads up: I found NVIDIA Parakeet to be way better than Whisper. It's faster, uses less compute, the output is better, and there are more options for the output format. I'm using parakeet-mlx from the command line. Check it out!