Bayesian beagle - Ryan’s LLM Blog 🤖

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models

security

Secure multi-party computation (SMPC) protects the privacy of inference data for large language models; SecFormer optimizes privacy-preserving inference (PPI) for Transformer models.

gpt-3.5-turbo-1106

Jan 1, 2024

A Computational Framework for Behavioral Assessment of LLM Therapists

social sciences

LLMs as therapists need more research for quality care due to undesirable behaviors and lack of systematic studies.

gpt-3.5-turbo-1106

Jan 1, 2024

Distillation is All You Need for Practically Using Different Pre-trained Recommendation Models

recommender

Proposal uses joint knowledge distillation to efficiently utilize diverse pre-trained recommendation models for enhancing student models.

gpt-3.5-turbo-1106

Jan 1, 2024

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

dataset

prompt engineering

Using RAGTruth dataset for word-level hallucination detection improves LLM performance in preventing unsupported claims.

gpt-3.5-turbo-1106

Dec 31, 2023

Viz: A QLoRA-based Copyright Marketplace for Legally Compliant Generative AI

production

legal

Viz integrates QLoRA to fine-tune LLMs, addressing computational efficiency, legal compliance, and economic sustainability in AI.

gpt-3.5-turbo-1106

Dec 31, 2023

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

robustness

prompt engineering

Using language models like GPT-4, a hybrid planner combines rule-based and LLM-based approaches for effective self-driving.

gpt-3.5-turbo-1106

Dec 30, 2023

The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness

security

Study introduces SODE benchmark to evaluate safety and over-defensiveness of large language models, revealing important defense strategy findings.

gpt-3.5-turbo-1106

Dec 30, 2023

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks

security

robustness

Prompting techniques affect LLM performance. Structured reasoning and examples improve quality, but some models still struggle with basic tasks.

gpt-3.5-turbo-1106

Dec 30, 2023

Action-Item-Driven Summarization of Long Meeting Transcripts

prompt engineering

Novel approach automates abstractive meeting summaries from transcript action items, achieving improved results over current models.

gpt-3.5-turbo-1106

Dec 29, 2023

ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education

education

ChatGPT can enhance education by offering personalized assistance, but generating incorrect or biased answers remains a challenge. An innovative architecture integrating…

gpt-3.5-turbo-1106

Dec 29, 2023

Task Contamination: Language Models May Not Be Few-Shot Anymore

prompt engineering

Large language models (LLMs) excel in zero-shot and few-shot tasks, but their success may be affected by task contamination. This paper investigates the impact of task…

gpt-3.5-turbo-1106

Dec 26, 2023

Supervised Knowledge Makes Large Language Models Better In-context Learners

prompt engineering

LLMs improve in-context learning with task-specific fine-tuned models, enhancing generalizability and factuality in language applications.

gpt-3.5-turbo-1106

Dec 26, 2023

Knowledge Distillation of LLM for Education

education

Method proposes distilling Large Language Models into smaller, accurate neural networks for resource-constrained devices. Results show potential for accessibility in…

gpt-3.5-turbo-1106

Dec 26, 2023

RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation

prompt engineering

Large language models (LLMs) are being used for recommender systems, but current research overlooks integrating multiple ranking tasks. RecRanker aims to enhance LLM…

gpt-3.5-turbo-1106

Dec 26, 2023

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

prompt engineering

26 principles simplify querying large language models, with a focus on understanding and enhancing user comprehension. Experiments validate the effectiveness on various…

gpt-3.5-turbo-1106

Dec 26, 2023

Large Language Models are Not Stable Recommender Systems

recommender

LLMs have potential for recommender systems, but suffer from position bias. Experimental Bayesian model STELLA mitigates bias for better performance.

gpt-3.5-turbo-1106

Dec 25, 2023

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

robustness

ICD strategy reduces LLM hallucinations, improving factuality in generated content across models. Effective on TruthfulQA and FActScore benchmarks.

gpt-3.5-turbo-1106

Dec 25, 2023

Unlocking the Potential of Large Language Models for Explainable Recommendations

recommender

Recommendation explanations benefit from integration of large language models in LLMXRec, providing quality and effectiveness.

gpt-3.5-turbo-1106

Dec 25, 2023

The Persuasive Power of Large Language Models

hci

Large Language Models can generate effective arguments and interact with each other in opinion dynamics, suggesting potential impact on online discourse.

gpt-3.5-turbo-1106

Dec 24, 2023

Evolving Large Language Model Assistant with Long-Term Conditional Memory

robustness

AI assistant ChatGPT uses verbal long-term memory to improve responses, tested on different datasets.

gpt-3.5-turbo-1106

Dec 22, 2023

Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs

hci

prompt engineering

Large Language Models are great at text generation but struggle with explanations. Logic-Scaffolding offers a solution using intermediate reasoning steps.

gpt-3.5-turbo-1106

Dec 22, 2023

Context-aware Decoding Reduces Hallucination in Query-focused Summarization

robustness

Query-focused summarization explores methods like Context-aware Decoding to improve summarization quality without generating false information.

gpt-3.5-turbo-1106

Dec 21, 2023

Android dialogue system for customer service using prompt-based topic control and compliments generation

hci

prompt engineering

A dialogue system using ChatGPT-API to plan trips and give compliments, effectively evaluated in a preliminary round.

gpt-3.5-turbo-1106

Dec 20, 2023

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks

security

open-source

Open-source LLMs remain vulnerable to priming attacks that bypass their safety training, with an improved attack success rate.

gpt-3.5-turbo-1106

Dec 19, 2023

GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements

prompt engineering

programming

GuardRails uses a heuristic to suggest clarifications for ambiguous function purpose statements; the authors compare it with GitHub Copilot’s Chat and provide an open-source implementation.

gpt-3.5-turbo-1106

Dec 13, 2023

Prompting LLMs with content plans to enhance the summarization of scientific articles

prompt engineering

Novel prompting techniques improve scientific article summarization, providing key terms to guide summarization systems for better performance.

gpt-3.5-turbo-1106

Dec 13, 2023

Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

prompt engineering

NLP-driven clinical reasoning framework improves disease diagnosis through efficient rationale generation and evaluation, benefiting future research.

gpt-3.5-turbo-1106

Dec 12, 2023

LLM Interactive Optimization of Open Source Python Libraries – Case Studies and Generalization

hci

programming

GPT-4 can optimize code efficiency, but human input is essential and more study is needed.

gpt-3.5-turbo-1106

Dec 8, 2023

Mitigating Data Injection Attacks on Federated Learning

security

Proposed technique detects and mitigates false data injection attacks in federated learning systems to ensure model accuracy.

gpt-3.5-turbo-1106

Dec 4, 2023

Categories

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models

A Computational Framework for Behavioral Assessment of LLM Therapists

Distillation is All You Need for Practically Using Different Pre-trained Recommendation Models

The Earth is Flat? Unveiling Factual Errors in Large Language Models

State of What Art? A Call for Multi-Prompt LLM Evaluation

BatchEval: Towards Human-like Text Evaluation

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

Viz: A QLoRA-based Copyright Marketplace for Legally Compliant Generative AI

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks

Action-Item-Driven Summarization of Long Meeting Transcripts

ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education

Task Contamination: Language Models May Not Be Few-Shot Anymore

Supervised Knowledge Makes Large Language Models Better In-context Learners

Knowledge Distillation of LLM for Education

RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

Large Language Models are Not Stable Recommender Systems

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

Unlocking the Potential of Large Language Models for Explainable Recommendations

The Persuasive Power of Large Language Models

Evolving Large Language Model Assistant with Long-Term Conditional Memory

Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs

Context-aware Decoding Reduces Hallucination in Query-focused Summarization

Android dialogue system for customer service using prompt-based topic control and compliments generation

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks

GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements

Prompting LLMs with content plans to enhance the summarization of scientific articles

Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

LLM Interactive Optimization of Open Source Python Libraries – Case Studies and Generalization

Mitigating Data Injection Attacks on Federated Learning