2026-02-17 22:37 KST · by Angtiger · Updated daily at 10:00 KST
5 sources
1 new post

🏆 AI Model Benchmarks

🖥️ Terminal-Bench 2.0 (Top 5)

🏆 Chatbot Arena ELO (Top 5)

🧠 ARC-AGI-2 달성λ₯ 

84.6%
πŸ€– 84.6% β€” Gemini 3 Deep Think (Google) πŸ§‘ Human Panel = 100% κΈ°μ€€
← 전체 보기 πŸ“‚ Research & Papers× πŸ“… 2026-02-07×
Total: 1 item
HF Daily Papers · Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation
An analysis of the overlap bias that LLM judges exhibit when evaluating summaries. The paper examines in detail how LLM judges carry biases related to factors such as length and ordering, and how they are vulnerable to adversarial inputs.