
Hacker News

Show HN: VoxConvo – "X but it's only voice messages"

10 points by siim

by monadoid

0 subcomment

Cool idea! You should make it so that I can only play one audio message at a time (currently, if I click to start two, they both play simultaneously).

by 1bpp

3 subcomments

How would this prevent someone from just plugging ElevenLabs into it? Or the inevitable more realistic voice models? Or just a prerecorded spam message? It's already nearly impossible to tell if some speech is human or not. I do like the idea of recovering the emotional information lost in speech -> text, but I don't think it'd help the LLM issue.

by teunlao

0 subcomment

Impressive tech execution, but the format has fundamental scaling issues.
Clubhouse lost 93% of users from peak. WhatsApp sends 7 billion voice messages daily - but those are DMs, not feeds.
The math doesn't work: reading is 50-80% faster than listening. You can skim 50 text posts in 100 seconds. 50 voice posts? 15 minutes.
Voice works async 1-to-1. You built Twitter where every tweet is a 30-second voicemail nobody has time to listen to.
The transcription proves it: users will read, not listen. Which makes this a "text feed with worse UX".
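
For anyone who wants to check the arithmetic in this comment, here is a rough sketch in Python. It uses only the figures quoted above (50 posts, 100 seconds of skimming, 30-second clips, 15 minutes of listening); the assumption that every clip is played in full at 1x speed is mine, not the commenter's, and the helper name `feed_minutes` is made up for illustration.

```python
# Back-of-the-envelope check of the feed-consumption figures quoted above.
# Assumption (not stated in the comment): every voice post is played in
# full at 1x speed, with no skipping.

POSTS = 50
TEXT_TOTAL_SECONDS = 100     # "skim 50 text posts in 100 seconds"
VOICE_POST_SECONDS = 30      # "every tweet is a 30-second voicemail"
QUOTED_VOICE_MINUTES = 15    # "50 voice posts? 15 minutes"


def feed_minutes(posts: int, seconds_per_post: float) -> float:
    """Total minutes needed to get through `posts` items."""
    return posts * seconds_per_post / 60


# Skimming budget per text post implied by the quoted numbers.
skim_seconds_per_post = TEXT_TOTAL_SECONDS / POSTS

# Listening time if every clip really is 30 seconds long.
voice_minutes_at_30s = feed_minutes(POSTS, VOICE_POST_SECONDS)

# Average clip length that would make the quoted 15 minutes come out.
implied_clip_seconds = QUOTED_VOICE_MINUTES * 60 / POSTS

print(f"skim time per text post: {skim_seconds_per_post} s")
print(f"50 clips at 30 s each:   {voice_minutes_at_30s} min")
print(f"clip length behind '15 minutes': {implied_clip_seconds} s")
```

Under these assumptions the quoted 15 minutes corresponds to clips averaging 18 seconds, while full 30-second clips would take 25 minutes. Either way the voice feed comes out roughly an order of magnitude slower than skimming, which is the point being made.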

by zahlman

0 subcomment

by cdrini

1 subcomments

Neat idea! Not sure if I'm willing to register just to try it, though. Having the main feed public would be nice! Or even a sample feed.

by esafak

0 subcomment

So you're going to reject recordings detected as computer generated, or human recorded from a computer-generated script?
I feel like you are making your users jump through hoops to do bot and slop detection, when you ought to be investing in technology to do the same. Here is a focusing question: would you still demand audio recordings if you had that technology?
Maybe you will court an interesting set of users when you do this? I just know I will not be one of them; ain't got time for that. Good luck.

by cjflog

0 subcomment

by oulipo2

0 subcomment

Idea is cool, but the STT is bad (at least with an accent), and having to edit each word is too cumbersome.

by jagged-chisel

0 subcomment