• Psythik@lemm.ee
    1 hour ago

    I need this for the nail salon. When is it hitting stores and for how much? (I didn’t see any mention of cost/availability in the article.)

  • yesman@lemmy.world
    10 hours ago

    I hate AI as much as the next lemming, but nobody is going to tell me a Babel fish for real isn’t cool AF.

    • 𝕸𝖔𝖘𝖘@infosec.pub
      3 hours ago

      Except, unlike the real Babel fish, which feeds on our thought waves, this one feeds on our environment and our planet’s future.

  • gedaliyah@lemmy.world
    8 hours ago

    Anyone who likes this idea might also be interested in checking out RTranslator, an open source, on-device app, which has some similar functionality. You can connect two Bluetooth devices using this app to communicate between two people in different languages.

    It can’t translate multiple speakers simultaneously or clone voices, but it’s very useful for traveling or for communicating with friends and family in multiple languages. Since it doesn’t need an internet connection, it’s especially handy on the road, where you might not have a reliable one.

  • stoy@lemmy.zip
    10 hours ago

    Ok, so this concept is cool, but has a few problems…

    1. Privacy: this is far too complex to run on the headphones themselves, so the system will need to connect to a server to do the heavy lifting. What happens to the data once it is used? For legal purposes I suspect it will need to be saved, meaning that anything recorded could be analyzed or monitored.
    2. Trust: AI models have rules in place to make them act in specific ways. The owner of the AI system could tweak it to change what is said or how it is said, which could push political agendas into everyday conversations.
    3. Reduced lingual skills: an AI like this would reduce the incentive to learn another language, reducing people’s direct international communication and increasing dependency on the AI service, which would further erode our lingual skills.

    This is scary…

    • Psythik@lemm.ee
      1 hour ago

      Next time try reading the article first before you comment.

      • stoy@lemmy.zip
        15 minutes ago

        This is an utterly idiotic comment. I’ll break it down into bullet points to make it easier to understand.

        1. The comment assumes that I didn’t read the article. This is only partly wrong: I skimmed it and found nothing in it about the points I raised.
        2. The comment provides ZERO additional information. It is pure snark and does nothing to tell me what I missed.
        3. The comment assumes that everyone else has also read the article, which is not the case.
        4. The comment forgets the value of summarizing for others: if my points were in the article, it is a good thing to summarize them in a more accessible way.

        With these points in mind, I believe you can make the effort to write a better comment next time.

    • iturnedintoanewt@lemm.ee
      6 hours ago

      Check out the Whisper APK on F-Droid. The thing runs locally and does just this. The model takes audio in an undetermined language, detects which one it is automatically, transcribes it, translates it to English (only English at the moment) and then speaks it aloud. It’s not using any hardware acceleration and it’s a very early build. My Pixel 9 gets about a 3-second delay from input to output, all running locally.

      It’s doable.

    • lakemalcom10@lemm.ee
      10 hours ago

      For point 1, they actually addressed that: The system then translates the speech and maintains the expressive qualities and volume of each speaker’s voice while running on a device, such [as] mobile devices with an Apple M2 chip like laptops and Apple Vision Pro. (The team avoided using cloud computing because of the privacy concerns with voice cloning.) Finally, when speakers move their heads, the system continues to track the direction and qualities of their voices as they change.

      • Ilovethebomb@lemm.ee
        9 hours ago

        The fact that all this can run on a phone is incredible; it sounds very processor-intensive.

        I wonder what it would do to your battery life?

      • stoy@lemmy.zip
        10 hours ago

        If that is enough power and you can run it without any internet access, then yes, it would probably address point 1.

    • themoken@startrek.website
      10 hours ago

      I’m with you on 1 and 2, but “reduced lingual skills” I think is a bit of a stretch. Becoming fluent in another language takes a lot of effort and people only do it if they have a good long term reason.

      I think it’s more likely this would cover the vacation/short-term business case that is already covered by human interpreters (or existing apps) instead.

        • Sandbar_Trekker@lemmy.today
          3 hours ago

          It’s not sending the audio to an unknown server. It’s all local. From the article:

          The system then translates the speech and maintains the expressive qualities and volume of each speaker’s voice while running on a device, such [as] mobile devices with an Apple M2 chip like laptops and Apple Vision Pro. (The team avoided using cloud computing because of the privacy concerns with voice cloning.)