26 November 2015 Solubility Theories.

A strange vacuum has appeared in my life. The 5th Edition of HSPiP (Hansen Solubility Parameters in Practice) was released an hour ago after a stupendous amount of work with Charles Hansen and Hiroshi Yamamoto. Suddenly my ToDo list has shrunk and I no longer have to worry about all those launch details, programming problems and interface testing. This has given me time to think about an intemperate email I sent yesterday to an expert in solvent thermodynamics.

This expert produced a very powerful thermodynamic system for solvents and polymers in the 1970s. It is more fundamental and powerful than the well-known Flory-Huggins theory. My argument was that although it is fundamental and powerful it is useless for the sorts of things most of us want to do with solvents, solvent blends, polymers, nanoparticles and so on.

Although my response was a characteristic over-reaction, I do have a point. The strength of such thermodynamics is that it is pure. And that is also its weakness. Not once does an actual molecule appear in the theory. Yes there are "molecules", but no real molecule with its shape, size, charge distribution makes an appearance. So although the theory can cope with the "anomalies" caused by molecules liking or disliking each other (i.e. activity coefficients) there is no way to go from any specific molecule to any specific thermodynamics other than by measurement. But there are billions of potential molecular pairs, and no hope of measuring them all.

My new favourite thermodynamic theory, Kirkwood-Buff, is equally pure, but at its heart are the radial distribution functions that one can at least imagine being calculable for real molecules via molecular dynamics. In practice this has proven near impossible, but because the core of KB is imaginable in molecular terms I am far more comfortable with it.
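For readers who have not met KB theory, the quantity at its heart can be written in one line. This is the standard textbook definition of a Kirkwood-Buff integral, not something specific to this post; the symbols (G_ij for the integral between species i and j, g_ij(r) for their radial distribution function, r for the centre-to-centre distance) are my choice of notation.

```latex
G_{ij} = 4\pi \int_{0}^{\infty} \left[ g_{ij}(r) - 1 \right] r^{2} \, \mathrm{d}r
```

The integrand is simply "how much more (or less) of j do I find at distance r from i than I would in a random mixture", which is why the theory can be pictured in molecular terms even when g_ij(r) is hard to compute.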

UNIFAC has immense power in its own domain because hordes of dedicated scientists have parameterised the system over decades. Unfortunately, those hordes have had (rightly) to be paid for their hard work so UNIFAC is unaffordable for most of us.

The wonderful COSMO-RS approach is a beautiful balance between molecules and pure theory. Each molecule's shape, size and charge distribution are captured in a one-off quantum chemistry (DFT) calculation and then the thermodynamic interactions with itself and other molecules can be accurately calculated via arithmetic - i.e. the results are rapidly calculated after the initial investment in the slow DFT calculation. The COSMOtherm implementation of COSMO-RS is provably the most powerful solubility predictive tool available - it is the regular prizewinner in solubility prediction competitions.

Why, then, do I use HSP for most of my solubility work? Its thermodynamic roots are clear but also clearly limited. And yet it works pretty darned well a lot of the time. Why is that? I think it's because it has found a Goldilocks level of abstraction that is neither too shallow to be useful nor too deep to be practical. Each molecule is characterised by the 3 HSP (Dispersion, Polar, Hydrogen-bonding) plus Molar Volume. Anything less than this loses too much information. Anything more than this (e.g. adding shape parameters) adds complications that are not workable in practice. Charles, Hiroshi and I, for example, have never been able to find a way to include shape parameters into any calculation, even though it is clear that shape can be important.
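To show how little arithmetic this level of abstraction needs, here is a minimal sketch of the standard HSP distance between two molecules, Ra² = 4(δD1-δD2)² + (δP1-δP2)² + (δH1-δH2)². The class and function names are my own illustrations and are not HSPiP's API, and the solvent values are approximate, commonly quoted numbers included only as an example; check them against a proper table before relying on them.

```python
import math
from typing import NamedTuple


class HSP(NamedTuple):
    """Hypothetical container for the four numbers that describe a molecule."""
    d: float     # Dispersion parameter, MPa^0.5
    p: float     # Polar parameter, MPa^0.5
    h: float     # Hydrogen-bonding parameter, MPa^0.5
    mvol: float  # Molar volume, cm^3/mol (carried along, not used in the distance)


def hsp_distance(a: HSP, b: HSP) -> float:
    """Standard HSP distance Ra; the factor of 4 on the dispersion term is conventional."""
    return math.sqrt(
        4.0 * (a.d - b.d) ** 2
        + (a.p - b.p) ** 2
        + (a.h - b.h) ** 2
    )


# Approximate, illustrative values only (assumption: commonly tabulated figures).
acetone = HSP(d=15.5, p=10.4, h=7.0, mvol=74.0)
toluene = HSP(d=18.0, p=1.4, h=2.0, mvol=106.3)

if __name__ == "__main__":
    print(f"Ra(acetone, toluene) = {hsp_distance(acetone, toluene):.1f} MPa^0.5")
```

A small distance suggests that two materials are likely to be compatible, a large one that they are not. That one comparison is most of what "working pretty darned well" means in day-to-day use.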

What irks me about all this is that molecules don't have powerful computers or elegant theories yet work out their thermodynamics effortlessly. Here we are in the 21st century with essentially limitless computer power and vast knowledge of chemistry and chemical systems, yet the practical choices for those doing solubility work are things like UNIFAC, COSMOtherm or HSPiP. We must be doing something wrong. It must be possible to create something relatively simple that captures the deep purity of classical thermodynamics, the molecularly-relevant radial distribution functions at the heart of KB theory, but uses a modest set of chemical parameters each of which can be readily estimated or calculated with little effort.

My email yesterday was not meant to be totally negative. It was sent to two esteemed colleagues who, unlike me, have all the intellectual firepower in their respective "pure" domains. The three of us want to create something new and useful. At the end of my email I shifted from being negative (all too easy) to expressing the notion that we should be able to add molecular knowledge to the pure systems and thereby increase their practical predictive capacity. If dumb molecules can work this stuff out, why can't we? At least, with the HSPiP release out of the way, I now have some time to think rather than just write an intemperate email.