Topic 2: Choose the model that fits the job, not the smartest one

Topic overview (about 1 minute)

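Choosing a model is not a matter of simply picking the smartest one. Even for the same job, there is a tradeoff between intelligence, speed, and cost. Two levers raise performance: train time compute, which raises the model's base capability during training, and test time compute, which spends additional compute or tokens at the moment of answering, that is, at inference time. Increasing the latter can improve quality and accuracy on hard tasks, but it also increases latency (response delay) and cost. Even with effort set to the maximum, some tasks show diminishing returns, where the gains become small. So you do not always need to pick the most powerful model. The basic approach is to start with a larger model for complex reasoning or advanced coding, start with a smaller model for simple classification, extraction, or high-volume processing, and make the final decision by measuring with an eval (evaluation).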
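The "measure with an eval, then decide" step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a recommended harness: the `Candidate` class, the model labels, the relative costs, and the keyword-based `predict` functions are all hypothetical stand-ins for real model calls, and the four-item eval set is made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    name: str                      # hypothetical label, not a real model ID
    cost_per_call: float           # made-up relative cost
    predict: Callable[[str], str]  # stand-in for a real model call


def accuracy(candidate: Candidate, eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of eval items where the prediction matches the expected label."""
    correct = sum(1 for prompt, expected in eval_set
                  if candidate.predict(prompt) == expected)
    return correct / len(eval_set)


def pick_model(candidates: List[Candidate],
               eval_set: List[Tuple[str, str]],
               min_accuracy: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose measured accuracy meets the bar."""
    passing = [c for c in candidates if accuracy(c, eval_set) >= min_accuracy]
    return min(passing, key=lambda c: c.cost_per_call) if passing else None


# Toy eval set for a simple classification task (illustrative data only).
EVAL_SET = [
    ("refund request", "billing"),
    ("password reset", "account"),
    ("invoice copy", "billing"),
    ("login fails", "account"),
]


def keyword_predict(prompt: str) -> str:
    # Assumption: both stand-in models solve this easy task equally well.
    return "billing" if "refund" in prompt or "invoice" in prompt else "account"


small = Candidate("small-model", 1.0, keyword_predict)
large = Candidate("large-model", 5.0, keyword_predict)
weak = Candidate("weak-model", 0.5, lambda prompt: "billing")

chosen = pick_model([large, small, weak], EVAL_SET, min_accuracy=0.9)
print(chosen.name if chosen else "no candidate met the bar")
```

In this toy setup the small and large stand-ins score the same, so the cheaper one is chosen, while the cheapest candidate of all is rejected because it fails the accuracy bar. With real models the scores would come from actual calls on a harder eval set.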

Key vocabulary

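Term	Meaning in the context of this talk	General meaning	Etymology / origin
tradeoff	Which of intelligence, speed, and cost to prioritize and which to give up	Trade-off; a choice between competing goals	trade (exchange) + off
train time compute	Compute invested during the training stage to raise the model's base capability	Compute used during training	"compute" is short for "computation"
test time compute	Additional compute used at inference time; takes forms such as thinking or tool use	Additional compute used at inference	test time = "the moment of being tested for real"
inference time	The point at which the model produces its answer	Time of inference	infer (to deduce) + -ence
latency	The wait until the answer comes back; it grows the more the model is made to think	Response delay, slowness	Latin latere, "to lie hidden"
diminishing returns	On some tasks, adding more effort yields smaller and smaller gains	Decreasing marginal benefit	A term from economics; "diminish" is related to Latin minus, "less"
eval	A mechanism for checking output quality with tests	Evaluation	Short for "evaluation"
rule of thumb	A rough, practical guideline	Heuristic, guideline	Said to come from the custom of measuring roughly with the thumb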

English reading

English Reading Passage

Choosing a model is a tradeoff between intelligence, speed, and cost. Anthropic raises capability with two levers. Train time compute improves the model's base capability during training, while test time compute is extra compute used at inference time, often by letting the model spend more tokens thinking, calling tools, or working through a task before it answers. More tokens tend to raise quality and accuracy on hard problems, but they also raise latency and cost, and the gains often show diminishing returns at the highest effort. So the goal is not always the smartest model. The best way to choose is not to guess but to build hard eval sets and measure each option. A useful rule of thumb from the docs: for complex reasoning or coding, start with a larger model; for simple, high-volume, or low-latency tasks, a smaller one is often enough.

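Translation (from the Japanese, with glosses)

Choosing a model is a tradeoff (a choice about what to give up) between intelligence, speed, and cost. Anthropic raises capability with two levers. Train time compute (compute used during training) raises the model's base capability in the training stage, while test time compute (additional compute used at inference) is extra compute spent at inference time (when the model answers). In many cases it takes the form of having the model use more tokens to think, call tools, or work through the task step by step before it answers. More tokens tend to raise quality and accuracy on hard problems, but latency (response delay) and cost rise as well, and at the highest effort levels diminishing returns (shrinking gains) often appear. So the target is not always the smartest model. The best way to choose is not to rely on guesswork but to build hard eval (evaluation) sets and measure each option. A useful rule of thumb (practical guideline) given in the Docs: for complex reasoning or coding, start with a larger model; for simple, high-volume, or low-latency work, a smaller model is often enough.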
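Diminishing returns can be made concrete by computing how much accuracy each extra block of thinking tokens buys. The sketch below is only an illustration: the `MEASURED` budget and accuracy pairs are invented numbers, not results from any model, and the function names and the 0.01 threshold are hypothetical choices.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (thinking-token budget, measured accuracy)


def marginal_gains(points: List[Point]) -> List[float]:
    """Accuracy gained per 1,000 extra tokens for each step between budgets."""
    gains = []
    for (b0, a0), (b1, a1) in zip(points, points[1:]):
        gains.append((a1 - a0) / (b1 - b0) * 1000)
    return gains


def last_worthwhile_budget(points: List[Point], min_gain_per_1k: float) -> int:
    """Largest budget reached before a step's gain falls below the threshold."""
    best = points[0][0]
    for (b0, a0), (b1, a1) in zip(points, points[1:]):
        if (a1 - a0) / (b1 - b0) * 1000 < min_gain_per_1k:
            break
        best = b1
    return best


# Invented measurements, sorted by budget, for illustration only.
MEASURED = [(1000, 0.60), (2000, 0.72), (4000, 0.80), (8000, 0.82)]

for gain in marginal_gains(MEASURED):
    print(f"{gain:.3f} accuracy per 1,000 extra tokens")

print(last_worthwhile_budget(MEASURED, min_gain_per_1k=0.01))
```

With these made-up numbers, each step buys less accuracy than the one before, and the last doubling of the budget falls below the threshold, which is the point where the added latency and cost stop paying off.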

Sources and reference links (4)

The thinking lever (talk transcript) (Talk, youtube.com)
Choosing the right model (Docs, docs.claude.com)
Models overview (Docs, docs.claude.com)
Integrated source map v1 (user-provided material)

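Vocabulary to learn if you plan to attend these talks

If you plan to attend the talks below, learning the "Key vocabulary" above will make them much easier to follow on the day.
They are ordered by relevance to this topic, not by the quality of the talks themselves.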

Talks most closely aligned with this topic (3)

6/10 10:30 – 11:15 Workshop

Picking the right model

6/10 15:20 – 15:50 Main stage

The capability curve

6/10 13:50 – 14:20 Breakout stage

The thinking lever

Talks with content close to this topic (3)

6/10 16:50 – 17:20 Main stage

Getting more out of the Claude Platform

6/11 11:30 – 12:00 Founder stage

The 1% problem: How domain expertise + Claude let a 2-person team hit #1 on a global classification benchmark

6/11 13:00 – 13:45 Workshop

Evals for taste: Hill-climbing a slide-generation agent