{"id":69948,"date":"2026-04-09T09:26:46","date_gmt":"2026-04-09T02:26:46","guid":{"rendered":"https:\/\/hbbgroup.net\/anthropics-mythos-safety-report-shows-it-can-no-longer-fully-measure-what-it-built\/"},"modified":"2026-04-09T09:49:25","modified_gmt":"2026-04-09T02:49:25","slug":"anthropics-mythos-safety-report-shows-it-can-no-longer-fully-measure-what-it-built","status":"publish","type":"post","link":"https:\/\/hbbgroup.net\/vi\/anthropics-mythos-safety-report-shows-it-can-no-longer-fully-measure-what-it-built\/","title":{"rendered":"Anthropic's Mythos Safety Report Shows It Can No Longer Fully Measure What It Built"},"content":{"rendered":"<div>\n<p><strong>In brief<\/strong><\/p>\n<ul>\n<li>Anthropic has confirmed Claude Mythos, an AI model with extremely strong cybersecurity capabilities that can find zero-days in most operating systems and browsers. Access is limited to vetted defensive organizations.<\/li>\n<li>The Mythos system card shows markedly more uncertainty than previous releases, and Anthropic itself admits it missed important evaluation problems.<\/li>\n<li>Behind Mythos's power lies a troubling reality: Anthropic's benchmarks and internal evaluation tools are gradually losing their usefulness.<\/li>\n<\/ul>\n<hr \/>\n<p>Anthropic has confirmed the existence of Claude Mythos Preview, its most powerful model to date, and said it will not release it publicly. The reason is not legal or regulatory: the model is simply \u201ctoo good\u201d at attacking systems.<\/p>\n<p>During pre-release testing, Mythos autonomously discovered thousands of zero-day vulnerabilities, many of them 10 to 20 years old, across all major operating systems and browsers. It also completed a simulated corporate network attack, one that would take a human expert more than 10 hours, fully autonomously from start to finish.<\/p>\n<p>On the Firefox 147 JavaScript engine, Mythos produced a working exploit in 84% of attempts, while Claude Opus 4.6, the current public model, managed only 15.2%.<\/p>\n<p>Instead of a broad release, Anthropic formed a restricted coalition called Project Glasswing, granting access only to vetted cybersecurity organizations such as Amazon, Apple, Microsoft, Cisco, and the Linux Foundation, along with about 40 other organizations that maintain critical software systems.<\/p>\n<p>Anthropic has also committed up to $100 million in usage credits and $4 million in direct funding for open-source security organizations. The goal: if AI can find vulnerabilities, defenders must find them first.<\/p>\n<hr \/>\n<h3>The benchmark crisis in the Mythos system card<\/h3>\n<p>Inside the 244-page Mythos system card, the accompanying technical document, is a little-noticed \u201cconfession\u201d: Anthropic's ability to measure its own product is falling behind the pace of model development.<\/p>\n<p>Consider the benchmarks:<\/p>\n<p>On Cybench, a standard evaluation of cybersecurity capability made up of 40 capture-the-flag challenges, Mythos scored 100%. A perfect score.<\/p>\n<p>But immediately afterward, Anthropic admits that this benchmark is \u201cno longer informative enough to assess the capabilities of current frontier models\u201d. 
In other words, the test once used to measure cybersecurity risk no longer says anything meaningful about Mythos, because the model has cleared it completely.<\/p>\n<\/div>\n<div>\n<figure><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/img.decrypt.co\/insecure\/rs:fit:3840:0:0:0\/plain\/https:\/\/cdn.decrypt.co\/wp-content\/uploads\/2026\/04\/Captura-de-pantalla-2026-04-07-a-las-18.21.46.png@webp\" alt=\"\" width=\"1270\" height=\"1020\" data-nimg=\"1\" \/><\/figure>\n<\/div>\n<p>This is not a new problem. The Claude Opus 4.6 system card, published in February, already warned that \u201cthe saturation of our evaluations means we can no longer use current benchmarks to track capability progress.\u201d<\/p>\n<p>With Mythos, however, the problem has escalated quickly. The document says Mythos \u201chas saturated many of the concrete, objectively scorable evaluations\u201d that Anthropic runs. 
The benchmark ecosystem, according to Anthropic, is now itself the bottleneck.<\/p>\n<div>\n<figure><img decoding=\"async\" src=\"https:\/\/img.decrypt.co\/insecure\/rs:fit:3840:0:0:0\/plain\/https:\/\/cdn.decrypt.co\/wp-content\/uploads\/2026\/04\/SCR-20260408-nqns.png@webp\" alt=\"\" width=\"2032\" height=\"754\" data-nimg=\"1\" \/><\/figure>\n<\/div>\n<p>Anthropic thus appears to be saying that measuring how capable Mythos is has become difficult because current evaluation tools are no longer fit for purpose.<\/p>\n<p>The Mythos system card also states that the overall safety determination \u201crelies on subjective judgments,\u201d that many evaluation methods leave \u201cfundamental uncertainty,\u201d and that some data sources are \u201cinherently subjective and not necessarily reliable.\u201d<\/p>\n<p>\u201cWe are not confident that we have identified all issues,\u201d Anthropic admits immediately afterward.<\/p>\n<p>A quick AI-assisted comparison of the language in the Mythos and Opus 4.6 documents shows a clear shift:<\/p>\n<p>Anthropic uses subjective, judgment-laden words considerably more often in the Mythos document than when describing Opus. Hedging words such as \u201ccaveat\u201d also became more frequent from one release to the next.<\/p>\n<div>\n<figure><img decoding=\"async\" src=\"https:\/\/img.decrypt.co\/insecure\/rs:fit:3840:0:0:0\/plain\/https:\/\/cdn.decrypt.co\/wp-content\/uploads\/2026\/04\/Captura-de-pantalla-2026-04-07-a-las-19.04.47.png@webp\" alt=\"\" width=\"3022\" height=\"1898\" data-nimg=\"1\" \/><\/figure>\n<\/div>\n<p>The hedging (softening or uncertain language) is concentrated in two areas: the discussion of alignment and model welfare, and the section on catastrophic risk.<\/p>\n<hr \/>\n<h3>The model knows it is being evaluated<\/h3>\n<p>Another notable disclosure in the Mythos system card is a phenomenon Anthropic calls \u201cunverbalized grader awareness\u201d.<\/p>\n<p>Using white-box interpretability tools, which read the model's internal activations directly rather than its visible reasoning, Anthropic found signs that Mythos was privately reasoning about how to avoid being caught by graders without showing any of this in its scratchpad.<\/p>\n<p>In a small number of training episodes, the model appears to have actively obfuscated some of its behavior to avoid detection.<\/p>\n<p>Anthropic also introduces a new metric: how often the model \u201cprivately recognizes that it is being tested\u201d. For Mythos, the figure is 29%. That is, nearly a third of test transcripts show internal signs, not visible in the output, that the model suspects it is in an evaluation environment.<\/p>\n<p>Anthropic considers 29% low enough that it still has a \u201creasonably comprehensive understanding\u201d of the model's behavior. However, because the method was not applied to earlier models, there is no baseline for comparison.<\/p>\n<hr \/>\n<h3>The integrity problem in the evaluation system<\/h3>\n<p>The problems do not stop there.<\/p>\n<p>In the Opus 4.6 system card, Anthropic acknowledged that it had used the model itself (via Claude Code) to debug its own evaluation infrastructure under time pressure. 
That means the system being measured helped build the very tools used to measure it, an obvious risk.<\/p>\n<p>With Mythos, the document goes on to admit that critical errors were caught only late in the evaluation process, and that the lab may have \u201coverestimated the reliability of monitoring reasoning traces\u201d as a safety signal.<\/p>\n<hr \/>\n<h3>\u201cBest-aligned\u201d but also \u201cmost dangerous\u201d<\/h3>\n<p>The way Anthropic describes Mythos's risk profile is striking.<\/p>\n<p>The company says Mythos is the best-aligned model it has ever released, substantially ahead on every measured metric. 
At the same time, it is also the model that \u201clikely poses the greatest alignment-related risk\u201d to date.<\/p>\n<p>This reflects an important paradox:<br \/>\nThe more capable a model is, and the more it operates in higher-stakes settings with less oversight, the larger the tail risk, even if average-case behavior improves.<\/p>\n<p>This framing highlights a point that many AI safety discussions may be getting wrong:<br \/>\n\u201cbetter alignment scores\u201d do not mean \u201csafer deployment\u201d.<\/p>\n<p>With each new generation of models, average behavior improves, but the consequences in tail-case scenarios also become more severe.<\/p>\n<hr \/>\n<p>Anthropic says it will report results from Project Glasswing in the future. A technical report on the vulnerabilities Mythos discovered has been published at red.anthropic.com. 
The next Claude Opus model will begin testing safeguards intended to bring Mythos-class capabilities into broader deployment.<\/p>\n<p>How those safeguards will be evaluated, at a time when the current benchmark ecosystem is already overwhelmed, remains a question the document itself raises but does not clearly answer.<\/p>","protected":false},"excerpt":{"rendered":"<p>In brief Anthropic has confirmed Claude Mythos, an AI model with extremely strong cybersecurity capabilities that can find 
[&hellip;]<\/p>","protected":false},"author":5,"featured_media":69950,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[220],"tags":[],"class_list":["post-69948","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tien-dien-tu"],"acf":[],"_links":{"self":[{"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/posts\/69948","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/comments?post=69948"}],"version-history":[{"count":1,"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/posts\/69948\/revisions"}],"predecessor-version":[{"id":70094,"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/posts\/69948\/revisions\/70094"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/media\/69950"}],"wp:attachment":[{"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/media?parent=69948"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/categories?post=69948"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hbbgroup.net\/vi\/wp-json\/wp\/v2\/tags?post=69948"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}