2026-02-17 22:41 KST · by Angtiger · Updated daily at 10:00 KST
5 sources
3 new posts

🏆 AI Model Benchmarks


🖥️ Terminal-Bench 2.0 (Top 5)

🏆 Chatbot Arena ELO (Top 5)

🧠 ARC-AGI-2 달성λ₯ 

🤖 84.6%: Gemini 3 Deep Think (Google) · 🧑 Human Panel = 100% (baseline)
← 전체 보기 πŸ“‚ Company News× πŸ“… 2026-02-12×
총 3건
Hugging Face Blog OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments
OpenEnv: research on evaluating tool-using agents in real-world environments. Advancing AI through open source and open science.
OpenAI Blog Harness engineering: leveraging Codex in an agent-first world
A case study of using Codex in an agent-first world to build and ship an internal beta of a software product with zero lines of manually written code.
Google DeepMind Gemini 3 Deep Think: Advancing science, research and engineering
Gemini 3 Deep Think accelerates progress in science, research, and engineering with state-of-the-art reasoning capabilities.