Entropy got a lot more exciting to me after hearing Sean Carroll talk about it. He has a foundational/philosophical bent and likes to point out that there are competing definitions of entropy set on different philosophical foundations, one of them seemingly observer dependent: - https://youtu.be/x9COqqqsFtc?si=cQkfV5IpLC039Cl5 - https://youtu.be/XJ14ZO-e9NY?si=xi8idD5JmQbT5zxN
Leonard Susskind has lots of great talks and books about quantum information and about calculating the entropy of black holes, which led to a lot of wild new hypotheses.
Stephen Wolfram gave a long talk about the history of the concept of entropy which was pretty good: https://www.youtube.com/live/ocOHxPs1LQ0?si=zvQNsj_FEGbTX2R3
> The amount of information can be viewed as the ‘degree of surprise’ on learning the value of x. If we are told that a highly improbable event has just occurred, we will have received more information than if we were told that some very likely event has just occurred, and if we knew that the event was certain to happen we would receive no information. Our measure of information content will therefore depend on the probability distribution p(x), and we therefore look for a quantity h(x) that is a monotonic function of the probability p(x) and that expresses the information content. The form of h(·) can be found by noting that if we have two events x and y that are unrelated, then the information gain from observing both of them should be the sum of the information gained from each of them separately, so that h(x, y) = h(x) + h(y). Two unrelated events will be statistically independent and so p(x, y) = p(x)p(y). From these two relationships, it is easily shown that h(x) must be given by the logarithm of p(x) and so we have h(x) = − log2 p(x).
This is the definition of information for a single probabilistic event. The definition of entropy of a random variable follows from this by just taking the expectation.
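For anyone who wants the "easily shown" step and the expectation written out, here is a sketch in LaTeX. The constant c is my own notation (it only fixes the units), and H[x] is the usual symbol for the entropy of the random variable x; neither appears in the quoted passage.

```latex
\begin{aligned}
h(pq) = h(p) + h(q),\ h \text{ monotonic}
  &\;\Longrightarrow\; h(p) = -c \log p, \quad c > 0 \\
\text{choosing bits as the unit:}\quad
  h(x) &= -\log_2 p(x) \\
H[x] = \mathbb{E}\,[h(x)]
  &= -\sum_x p(x) \log_2 p(x)
\end{aligned}
```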
My understanding is that entropy is a way of quantifying how many different ways a thing could 'actually be' and yet still 'appear to be' how it is. So it is largely a result of an observer's limited ability to perceive / interrogate the 'true' nature of the system in question.
So for example you could observe that a single coin flip is heads, and entropy will help you quantify how many different ways that could have come to pass. e.g. is it a fair coin, a weighted coin, a coin with two head faces, etc. All these possibilities increase the entropy of the system. An arrangement _not_ counted towards the system's entropy is the arrangement where the coin has no heads face, only ever comes up tails, etc.
Related, my intuition about the observation that entropy tends to increase is that it's purely a result of more likely things happening more often on average.
Would be delighted if anyone wanted to correct either of these intuitions.
> Boltzmann’s argument summarized in Exercise of 2.4.11 just derives Shannon’s formula and uses it. A major lesson is that before we use the Shannon formula important physics is over.
> There are folklores in statistical mechanics. For example, in many textbooks ergodic theory and the mechanical foundation of statistical mechanics are discussed even though detailed mathematical explanations may be missing. We must clearly recognize such topics are almost irrelevant to statistical mechanics. We are also brainwashed that statistical mechanics furnishes the foundation of thermodynamics, but we must clearly recognize that without thermodynamics statistical mechanics cannot be formulated. It is a naive idea that microscopic theories are always more fundamental than macroscopic phenomenology.
sources: http://www.yoono.org/download/inst.pdf http://www.yoono.org/download/smhypers12.pdf
I define both concepts fundamentally in relation to priors and possibilities:
- Entropy is the relationship between priors and ANY possibility, relative to the entire space of possibilities.
- Probability is the relationship between priors and a SPECIFIC possibility, relative to the entire space of possibilities.
The framing of priors and possibilities shows why entropy appears differently across disciplines like statistical mechanics and information theory. Entropy is not merely observer-dependent but prior-dependent, and that includes priors not held by any specific observer but embedded in the framework itself. This helps resolve the apparent contradiction between objective and subjective interpretations of entropy.
It also defines possibilities as constraints imposed on an otherwise unrestricted reality. This framing unifies how possibility spaces are defined across frameworks.
[1]: https://buttondown.com/themeaninggap/archive/a-unified-persp...
> entropy quantifies uncertainty
This sums it up. Uncertainty is the property of a person and not a system/message. That uncertainty is a function of both a person's model of a system/message and their prior observations.
You and I may have different entropies about the content of the same message. If we're calculating the entropy of dice rolls (where the outcome is the 'message'), and I know the dice are loaded but you don't, my entropy will be lower than yours.
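To put rough numbers on the dice example, here is a minimal Python sketch. The loaded-die probabilities are made up for illustration (a six half the time), and `entropy_bits` is just a helper name I picked.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_die = [1 / 6] * 6
# Hypothetical loaded die: a six comes up half the time.
loaded_die = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

your_entropy = entropy_bits(fair_die)    # you assume the die is fair
my_entropy = entropy_bits(loaded_die)    # I know it is loaded

print(f"your entropy: {your_entropy:.3f} bits, mine: {my_entropy:.3f} bits")
```

Same die, same rolls, but the number each of us computes depends on the model we bring to it.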
If we see another configuration M2JlH8qc, I would say that the macrostate is the same, it's still "random" and "unordered", and my friend would agree. I say that both macrostates are the same: "random and unordered", and there are many microstates that could be called that, so therefore both are microstates representing the same high-entropy macrostate. However, my friend sees the macrostates as different: one is "my password and ordered", and the other is "random and unordered". There is only one microstate that she would describe as "my password", so from her perspective that's a low-entropy macrostate, while she would agree with me that M2JlH8qc represents a high-entropy macrostate.
So while I agree that "order" is subjective, isn't "how many microstates could result in this macrostate" equally subjective? And then wouldn't it be reasonable to use the words "order" and "disorder" to count (in relative terms) how many microstates could result in the macrostate we subjectively observe?
More elaborately, it's the number of bits needed to fully specify something which is known to be in some broad category of state, but whose exact details are unknown.
To try to expand on the information measure part from a more abstract starting point: Consider a probability distribution, some set of probabilities p. We can consider it as indicating our degree of certainty about what will happen. In an equiprobable distribution, e.g. a fair coin flip (1/2, 1/2) there is no skew either which way, we are admitting that we basically have no reason to suspect any particular outcome. Contrarily, in a split like (1/4, 3/4) we are stating that we are more certain that one particular outcome will happen.
If you wanted to come up with a number to represent the amount of uncertainty, it's clear that the number should be higher the closer the distribution is to being completely equiprobable (1/2, 1/2)—complete lack of certainty about the result, and the number should be smallest when we are 100% certain (0, 1).
This means that the function has to be an order inversion on the probability values—that is I(1) = 0 (no uncertainty). The logarithm, to arbitrary base (selecting a base is just a change of units) has this property under the convention that I(0) = inf (that is, a totally improbable event carries infinite information—after all, an impossibility occurring would in fact be the ultimate surprise).
Entropy is just the average of this function taken over the probability values (multiply each probability in the distribution by the log of its inverse and sum the results). In info theory you also usually assume the events are independent, and so the further condition that I(pq) = I(p) + I(q) is also stipulated.
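A small Python sketch of the above, in case it helps. The function names are mine, and I'm using base 2 so the unit is bits; the three distributions are the ones mentioned earlier in this comment.

```python
import math

def surprisal_bits(p):
    """Information content I(p) = -log2(p); infinite for p = 0."""
    return math.inf if p == 0 else -math.log2(p)

def entropy_bits(probs):
    """Average surprisal, using the convention that p = 0 terms contribute 0."""
    return sum(p * surprisal_bits(p) for p in probs if p > 0)

fair = entropy_bits([0.5, 0.5])       # equiprobable: maximum uncertainty
skewed = entropy_bits([0.25, 0.75])   # some certainty about the outcome
certain = entropy_bits([0.0, 1.0])    # no uncertainty at all

# Additivity for independent events: I(p * q) - (I(p) + I(q)) should vanish.
p, q = 0.25, 0.5
additive_gap = surprisal_bits(p * q) - (surprisal_bits(p) + surprisal_bits(q))

print(fair, skewed, certain, additive_gap)
```

The ordering fair > skewed > certain is the "closer to equiprobable means more uncertainty" requirement, and the gap being zero is the I(pq) = I(p) + I(q) condition.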
>Heat flows from hot to cold because the number of ways in which the system can be non-uniform in temperature is much lower than the number of ways it can be uniform in temperature ...
Should probably say "thermal energy" instead of "temperature" if we want to be really precise with our thermodynamics terms. Temperature is not a direct measure of energy; rather, it is an intensive property describing the relationship between a change in energy and the corresponding change in entropy.
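The counting argument in the quoted sentence can be checked on a toy model. This is only a sketch: I'm assuming two identical Einstein solids (a standard textbook model, not something from the article) that share a fixed number of energy quanta, and the sizes `N` and `TOTAL` are arbitrary choices.

```python
from math import comb

def multiplicity(oscillators, quanta):
    """Number of ways to distribute `quanta` energy units among `oscillators`."""
    return comb(quanta + oscillators - 1, quanta)

N = 50        # oscillators in each solid (assumed)
TOTAL = 100   # energy quanta shared between the two solids (assumed)

# Microstate count for each split of the energy: solid A holds q_a quanta.
ways = [multiplicity(N, q_a) * multiplicity(N, TOTAL - q_a)
        for q_a in range(TOTAL + 1)]

most_likely_split = max(range(TOTAL + 1), key=lambda q_a: ways[q_a])
lopsided_vs_even = ways[0] / ways[TOTAL // 2]

print(most_likely_split, lopsided_vs_even)
```

The even split of energy has by far the most microstates, and the "all energy in one solid" split is many orders of magnitude rarer, which is the sense in which energy flowing toward the uniform arrangement is just the overwhelmingly likely outcome.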
Most of all, it highlights the subjective / relative foundations of these concepts.
Entropy and Information only exist relative to a decision about the set of states an observer cares to distinguish.
It also caused me to change my informal definition of entropy from a negative one ("disorder") to a more positive one ("the number of things I might care to know").
The Second Law now tells me that the number of interesting things I don't know about is always increasing!
This thread inspired me to post it here: https://news.ycombinator.com/item?id=43695358
Here it is explained at length: "An Intuitive Explanation of the Information Entropy of a Random Variable, Or: How to Play Twenty Questions": http://danielwilkerson.com/entropy.html
Imagine a very high resolution screen. Say a billion by a billion pixels. Each of them can be white, gray or black. What is the lowest entropy possible? Each of the pixels has the same color. How does the screen look? Gray. What is the highest entropy possible? Each pixel has a random color. How does it look from a distance? Gray again.
What does this mean? I have no idea. Maybe nothing.
Also sorry for writing two top level comments, but I just really care about this topic
Information and statistical explanations of entropy are very easy. The real question is, what does entropy mean in the original context that it was introduced in, before those later explanations?
> But I have no idea what entropy is, and from what I find, neither do most other people.
The article does not go on to explain what entropy is, it just tries to explain away some hypothetical claims about entropy which as far as we can tell do hold, and does not explain why, if they were wrong, they do in fact hold.
Entropy can't be a measure of uncertainty, because all the uncertainty is in the probability distribution p(x) - multiplying it with its own logarithm and summing doesn't tell us anything new. If it did, it'd violate quantum physics principles including the Bell inequality and Heisenberg uncertainty.
The article never mentions the simplest and most basic definition of entropy, i.e. its units (kJ/K), nor the 3rd law of thermodynamics, which is the basis for its measurement.
“Every physicist knows what entropy is. Not one can write it down in words.” Clifford Truesdell
The article hints very briefly at this with the discussion of an unequally-weighted die, and how by encoding the most common outcome with a single bit, you can achieve some amount of compression. That's a start, and we've now rediscovered the idea behind Huffman coding. What information theory tells us is that if you consider a sequence of two dice rolls, you can then use even fewer bits on average to describe that outcome, and so on; as you take your block length to infinity, your average number of bits for each roll in the sequence approaches the entropy of the source. (This is Shannon's source coding theorem, and while entropy plays a far greater role in information theory, this is at least a starting point.)
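Here is a rough Python sketch of that block-coding idea, for anyone who wants to see numbers. The loaded-die probabilities are invented for illustration, the helper names are mine, and the expected Huffman length is computed with the standard trick of summing the weights of the merged nodes rather than by building explicit codewords.

```python
import heapq
import itertools
import math

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_expected_length(probs):
    """Expected codeword length (bits) of a binary Huffman code.

    Each merge of the two least likely nodes adds one bit to every symbol
    beneath them, so the expected length is the sum of the merged weights.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # hypothetical loaded die
H = entropy_bits(die)

bits_per_roll = {}
for n in (1, 2, 3):
    # Probabilities of every length-n sequence of independent rolls.
    block_probs = [math.prod(combo) for combo in itertools.product(die, repeat=n)]
    bits_per_roll[n] = huffman_expected_length(block_probs) / n

print(f"entropy: {H:.4f} bits per roll")
for n, rate in bits_per_roll.items():
    print(f"blocks of {n}: {rate:.4f} bits per roll")
```

For independent rolls the per-roll rate of an optimal block code is guaranteed to lie between H and H + 1/n, so the gap to the entropy has to close as the block length grows, even if it does not shrink at every single step.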
There's something magical about statistical mechanics where various quantities (e.g. energy, temperature, pressure) emerge as a result of taking partial derivatives of this "partition function", and that they turn out to be the same quantities that we've known all along (up to a scaling factor -- in my stat mech class, I recall using k_B * T for temperature, such that we brought everything back to units of energy).
https://en.wikipedia.org/wiki/Partition_function_(statistica...
https://en.wikipedia.org/wiki/Fundamental_thermodynamic_rela...
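A tiny numerical illustration of the partition function "magic", as a sketch only: the three energy levels and the value of beta are arbitrary assumptions, and everything is in units where k_B = 1, so entropy comes out dimensionless.

```python
import math

ENERGIES = [0.0, 1.0, 2.0]   # hypothetical energy levels, arbitrary units

def log_partition(beta):
    """ln Z, where Z is the sum over states of exp(-beta * E)."""
    return math.log(sum(math.exp(-beta * e) for e in ENERGIES))

def boltzmann_probs(beta):
    """Probability of each state at inverse temperature beta."""
    z = math.exp(log_partition(beta))
    return [math.exp(-beta * e) / z for e in ENERGIES]

beta = 0.7    # 1 / (k_B * T), in inverse energy units (assumed value)
step = 1e-6

# Mean energy from a derivative of ln Z (central finite difference).
energy_from_derivative = -(log_partition(beta + step)
                           - log_partition(beta - step)) / (2 * step)

# Mean energy computed directly as a probability-weighted average.
probs = boltzmann_probs(beta)
energy_direct = sum(p * e for p, e in zip(probs, ENERGIES))

# Entropy in units of k_B, two ways: ln Z + beta * <E>, and -sum p ln p.
entropy_from_z = log_partition(beta) + beta * energy_direct
entropy_gibbs = -sum(p * math.log(p) for p in probs)

print(energy_from_derivative, energy_direct)
print(entropy_from_z, entropy_gibbs)
```

Both pairs agree, which is the point: the average energy falls out of a derivative of ln Z, and the entropy built from Z matches the -sum p ln p form.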
If you're dealing with a sea of electrons, you might apply the Pauli exclusion principle to derive Fermi-Dirac statistics that underpins all of semiconductor physics; if instead you're dealing with photons which can occupy the same energy state, the same statistical principles lead to Bose-Einstein statistics.
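To make the "same statistical principles" point concrete, here is a hedged Python sketch for a single energy state. The only difference between the two cases is which occupation numbers are allowed; `x` stands for (E - mu) / (k_B * T), and its value here is arbitrary. The boson sum is truncated at a large but finite `max_n`, which is an approximation.

```python
import math

def mean_occupation(x, max_n):
    """Average particle number in one state, with x = (E - mu) / (k_B * T).

    Weights each occupation number n by exp(-n * x), summing n = 0..max_n.
    """
    weights = [math.exp(-n * x) for n in range(max_n + 1)]
    return sum(n * w for n, w in enumerate(weights)) / sum(weights)

x = 1.5  # hypothetical value of (E - mu) / (k_B * T)

fermions = mean_occupation(x, max_n=1)    # Pauli exclusion: n is 0 or 1
bosons = mean_occupation(x, max_n=200)    # no limit on n; truncated sum

# Closed-form distributions for comparison.
fermi_dirac = 1 / (math.exp(x) + 1)
bose_einstein = 1 / (math.exp(x) - 1)

print(fermions, fermi_dirac)
print(bosons, bose_einstein)
```

Restricting n to {0, 1} reproduces the Fermi-Dirac form, and lifting the restriction reproduces Bose-Einstein.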
Statistical mechanics is ultimately about taking certain assumptions about how particles interact with each other, scaling up the quantities beyond our ability to model all of the individual particles, and applying statistical approximations to consider the average behavior of the ensemble. The various forms of entropy are building blocks to that end.
organisms started putting things in places to increase "survivability" and thriving of themselves until the offspring was ready for the job at which point the offspring started to additionally put things in place for the sake of the "survivability" and thriving of their ancestors ( mostly overlooking their nagging and shortcomings because "love" and because over time, the lessons learned made everything better for all generations ) ...
so entropy is only relevant if all the organisms that can put some things in some place for some reason disappear and the laws of nature run until new organisms emerge. ( which is why I'm always disappointed at leadership and all the fraudulent shit going on ... more pointlessly dead organisms means less heads that can come up with ways to put things together in fun and useful ways ... it's 2025, to whomever it applies: stop clinging to your sabotage-based wannabe supremacy, please, stop corrupting the law, for fucks sake, you rich fucking losers )
Grok:
Yes, thermal entropy is largely a theoretical and statistical concept, rooted in the probabilistic behavior of particles in a system, as described by statistical mechanics. It quantifies disorder or the number of possible microstates, which isn't directly measurable like temperature or pressure. Measuring entropy typically involves indirect methods, such as calculating changes based on heat transfer and temperature (e.g., ΔS = q/T for reversible processes), but these rely on idealized assumptions and precise conditions, making direct measurement challenging in practice.
We IT folk should find another word for disorder that increases over time, especially when that disorder has human factors (number of contributors, number of users, etc). It clearly cannot be treated in the same way as in chemistry.