Do LLMs "have" the "ability" to be told they are wrong and to contest that?

2 days ago by cheese_greater to c/asklemmy

I think I've only once flat-out told one it was wrong about a specific assertion I quoted, and it immediately found its way to what I knew to be the correct claim.

I just wonder what would happen if I were in fact mistaken and confidently told it that it was wrong, without elaborating.

affenlehrer 12 points 2 days ago

It depends on training, prompting and reasoning capabilities.

Sometimes they're prompted not to be assertive. You can often tell them, though, to e.g. "behave like an XYZ expert in a bad mood who doesn't accept nonsense".

I've had e.g. ChatGPT contest me a lot even though I was right. It was about a bicycle brake design I had never seen before. It gave me some options for what it could be and helped me actually find out what it was. However, after I did some research and found out the actual type, it kept doubting my result and insisted it was a different kind of brake.

gravitas_deficiency 11 points 2 days ago

The best methodology I’ve found to coerce Claude or whatever (we are strongly encouraged to use it, because you know, tech these days) is, for a single agent, to define a process that includes proving its work and citing sources. For an agentic flow, you basically just assign a contrarian role in particular domains to some of the agents. Ideally all of this is also hooked into an MCP server that exposes deterministic utilities, to improve accuracy and how quickly it arrives at a solution.
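For anyone wondering what the contrarian-role setup looks like in practice, here is a minimal sketch. Everything in it is an assumption for illustration: `Model` is just any function from a prompt string to a reply string (no particular LLM client is implied), and the `debate` helper, the prompt wording and the "ACCEPT" convention are hypothetical, not something the comment specifies.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function mapping a prompt string to a reply string.
# In practice this would wrap a real LLM client; none is assumed here.
Model = Callable[[str], str]


def debate(
    question: str,
    proposer: Model,
    contrarian: Model,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Let a proposer answer, then let a contrarian object until it
    replies with ACCEPT or the round budget runs out."""
    transcript: List[str] = []
    answer = proposer(question)
    transcript.append(f"proposer: {answer}")

    for _ in range(max_rounds):
        # The contrarian role must either find a flaw or explicitly accept.
        objection = contrarian(
            f"Question: {question}\nAnswer: {answer}\n"
            "Find a flaw and cite a source, or reply ACCEPT."
        )
        transcript.append(f"contrarian: {objection}")
        if objection.strip().upper().startswith("ACCEPT"):
            break

        # The proposer has to revise or defend its answer against the objection.
        answer = proposer(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Objection: {objection}\nRevise or defend, citing sources."
        )
        transcript.append(f"proposer: {answer}")

    return answer, transcript
```

Each round costs at least two extra model calls, so the token bill grows with `max_rounds`.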

It’s basically just a shitty, brute-forced, massively overcomplicated Monte Carlo algorithm that’s wildly inefficient in terms of energy usage and infrastructure cost, and that also happens to be turning our economy into a highly flammable house of cards.

Can you tell what my opinion of all this bullshit is, despite knowing how to do all of this crap reasonably well? 😛

sloppy_diffuser 2 points 13 hours ago

"Strongly encouraged" for me means my entire bonus is tied to skilling up with LLMs.

"Cite-or-stop" with journaling, role definitions, subagents to control context windows, and shifting as much of the work as possible to deterministic scripts has yielded pretty high-quality results.

Expensive af though. The self-review skill I maintain, which engineers run before putting their code up for human review, finds 30-40 things on average, of which only a small handful are false positives, but it can burn 10% of a $20/mo plan's utilization in a single run on a moderately sized PR. The default one my company set up in our source control usually finds 1-5 things, and only 0-2 are of any value.
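The comment does not define "cite-or-stop", so the following is only one possible reading: every claim in the agent's output must carry a citation, a deterministic script checks that, and the run halts if anything is uncited. The `[src: ...]` marker format, the `cite_or_stop` function and the journal-as-list are all assumptions made up for this sketch.

```python
import re
from typing import List, NamedTuple

# Assumed citation format, for illustration only: "[src: something]".
CITATION = re.compile(r"\[src:\s*[^\]]+\]")


class Verdict(NamedTuple):
    ok: bool             # True when every claim carries a citation
    uncited: List[str]   # claims that would force a stop


def cite_or_stop(claims: List[str], journal: List[str]) -> Verdict:
    """Deterministically check claims for citations and journal the result."""
    uncited = [claim for claim in claims if not CITATION.search(claim)]
    verdict = Verdict(ok=not uncited, uncited=uncited)
    # Journaling here is just an append-only list of what was checked.
    journal.append(f"checked {len(claims)} claims, {len(uncited)} uncited")
    return verdict
```

A caller would stop the agent (or send it back to find sources) whenever `ok` is false. The check costs no tokens because it is a plain script, which is the point of shifting work out of the model.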

gravitas_deficiency 2 points 13 hours ago

Oh yeah it 100% explodes your token usage by establishing a contentious paradigm. But that’s what the C-suites want, and they don’t understand (or care to understand) the subtleties of how this development methodology works. So we’ll all (somewhat maliciously) follow orders until they change their minds when they see the absolutely ludicrous LLM bill 🤪
