2026-02-17 22:41 KST · by Angtiger · Updated daily at 10:00 KST
5 sources
1 new post

🏆 AI Model Benchmarks

🖥️ Terminal-Bench 2.0 (Top 5)

🏆 Chatbot Arena ELO (Top 5)

🧠 ARC-AGI-2 달성λ₯ 

84.6%
🤖 84.6%: Gemini 3 Deep Think (Google) · 🧑 Baseline: Human Panel = 100%
← 전체 보기 πŸ“‚ Company News× πŸ“… 2025-12-09×
총 1건
Google DeepMind FACTS Benchmark Suite: Systematically evaluating the factuality of large language models
FACTS Benchmark Suite: a benchmark that systematically evaluates the factuality of LLMs across three areas: parametric knowledge, search, and multimodal reasoning.