In reply to @savvyavi
graham@gcmac
9/19/2023

The log probability bit is just about how the output is natively represented. For math reasons (differentiable loss functions) the model outputs the log of the probabilities. You can reverse this by exponentiating: e ^ (-0.89) ≈ 0.41, i.e. about 41%.
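
If it helps for the post, here is a minimal sketch of that conversion in Python. The function name `logprob_to_percent` is made up for illustration, and it assumes the log probability is a natural log (base e), which is what the e ^ (...) step above implies.

```python
import math


def logprob_to_percent(logprob: float) -> float:
    """Convert a natural-log probability into a percentage.

    Assumes `logprob` is base e; a log probability of 0 means 100%.
    """
    return math.exp(logprob) * 100.0


# -0.89 is the example value from this thread.
percent = logprob_to_percent(-0.89)
print(f"{percent:.2f}%")
```

For -0.89 this comes out at roughly 41%. A displayed log probability is usually rounded, so the last digits of the percentage you compute can differ slightly from what a UI shows.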

In reply to @gcmac
Avi 💙@savvyavi
9/19/2023

I’m asking because I’m writing a short post about how LLMs work and want to explain the probabilities. They make sense for the words displayed, but they don’t obviously add up to 100%, even if one bucket is “all other possibilities”, which I don’t see. Why does the next line have a probability?