Michael Ten @lemmy.world to Technology@lemmy.world · English · 1 year ago

OpenAI transcribed over a million hours of YouTube videos to train GPT-4

www.theverge.com

How OpenAI, Google, and Meta deal with the limits of data online.
  • Dkarma@lemmy.world · English · 1 year ago · +12 / -25

    How clueless are you? Everything “taken” was available for free, provided for free for any web crawler to consume, and now you’re acting like consuming it is a crime?

    I get that you’re really jealous because you didn’t think of LLMs but you don’t get to claim something is a crime in one specific instance just because you don’t like what they’re doing after their program consumes content.

    Google has done the same thing for years and no one said a peep. What does everyone think search results even are???

    • Defaced@lemmy.world · English · 1 year ago (edited) · +15 / -2

      You completely miss my point. Are you saying data such as copyrighted published works and medical records are free? Because I did not in any way consent to sharing medical records with OpenAI: https://www.businessinsider.com/openai-chatgpt-generative-ai-stole-personal-data-lawsuit-children-medical-2023-6?op=1

      Now, I realize this is an alleged offense, but it’s still fucked up. As for wanting to be the first to make an LLM, I have no desire to take on that amount of responsibility and liability. Sam Altman is chasing money and nothing more.

    • BreakDecks@lemmy.ml · English · 1 year ago · +11 / -2

      There’s a distinct difference between quotation and plagiarism. A search engine does the former, LLMs do the latter.

      • Knock_Knock_Lemmy_In@lemmy.world · English · 1 year ago · +3 / -5

        No. If you write a truly unique combination of words then an LLM will be very unlikely to reproduce them.

        An LLM is only likely to plagiarise you if your writing is similar to others.

        • BreakDecks@lemmy.ml · English · 1 year ago · +2 / -1

          [citation needed]

          • Knock_Knock_Lemmy_In@lemmy.world · English · 1 year ago · +2 / -3

            https://blog.gdeltproject.org/do-llms-truly-create-or-merely-arrange-just-how-much-of-an-llms-writing-is-original/

            • BreakDecks@lemmy.ml · English · 1 year ago · +1

              The differences between human and machine-generated text overlap support the image of LLMs as more “arrangers” than “creators” of text.

              So plagiarism…

              • Knock_Knock_Lemmy_In@lemmy.world · English · 1 year ago · +1

                It only plagiarises you if you write something similar to lots of other people.

                Write something original and, even if it is in their training dataset, LLMs are highly unlikely to reproduce it.

    • EurekaStockade@lemmy.world · English · 1 year ago · +8 / -1

      Fuck Google too
