Roman Leventov
3 min read · Apr 28, 2023


You illustrate the mechanism of the Gaia network in the agriculture space and also argue that it will be economically attractive for all actors in this space to opt into the network.

I think it makes sense. However, when you try to transfer this principle into the domain of AI, I think it becomes more challenging on two fronts.

First, I don't see a pure economic incentive for AI actors to embark on the platform you suggest instead of developing monolithic AIs the way current SoTA AI systems are developed. It seems this would be at best neutral for these AIs, and at worst would hurt their performance. Therefore, there would already need to be some regulation or shared agreement to trade pure capability for interpretability, privacy, etc. And even if there is such an agreement, and even if the world economy has completely switched to the Gaia system (which is wildly improbable, but let's assume), then, even though misaligned players wouldn't have financial incentives to develop AI in the "bad" way (because to benefit from it and earn FERN they would need to subject their AI to the constraints of the framework), I think there could still be reckless misaligned actors who would run dangerous experiments with "incorrectly" trained agents or power-seeking AI, "just for fun", or for the sake of "gain-of-function research".

Thus, I don't see how, even if the Gaia network is wildly successful, we can avoid what you called "draconian control".

The second problem with applying the Gaia network principles to AI alignment is that, whereas in the context of agriculture the "pragmatic value" component of expected free energy is relatively tractable to define (though not that easy, actually, if we try to correctly account for the complexity of ecological value rather than simply the yield and the soil and biome parameters within defined thresholds; the ecological "pragmatic value" could turn out to be a rabbit hole, too), defining "pragmatic value" in the context of alignment seems almost intractable. If we try to score pragmatic value as the degree of misalignment between the models of human values empirically expressed by humans and those modelled by AIs, we run into the problem that the Bayesian approach to modelling human values probably cannot capture the full complexity of value (because humans are complex systems), as I hypothesised here: https://www.lesswrong.com/posts/FnwqLB7A9PenRdg4Z/for-alignment-we-should-simultaneously-use-multiple-theories. If you disagree, I would be interested to hear your view. My position was influenced by a recent interview with Ines Hipolito, https://www.youtube.com/watch?v=04yvn-B7vRg.
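For reference, by "pragmatic value" I mean the term in the standard Active Inference decomposition of expected free energy (this is the usual textbook form, not anything specific to the Gaia proposal, and notation varies across papers):

G(\pi) \;=\; -\,\mathbb{E}_{Q(o\mid\pi)}\!\big[\,D_{\mathrm{KL}}\big[Q(s\mid o,\pi)\,\|\,Q(s\mid\pi)\big]\,\big] \;-\; \mathbb{E}_{Q(o\mid\pi)}\!\big[\ln P(o\mid C)\big]

where the first term is the (negative) epistemic value, i.e. expected information gain, and the second is the (negative) pragmatic value, i.e. how well expected outcomes match the prior preferences C. The question above is what to put into C, and over what outcome space, when the "preferences" are supposed to stand in for the full complexity of human values.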

You could say this is not a big deal, because the majority of value will still be aligned when modelled via Active Inference, and the rest (which can be captured only within perspectives on cognition other than Active Inference) we can align through other means. But then the problem is that the Gaia network's ambition to be the all-encompassing framework presupposes exactly the cognitive framework (Active Inference) that we want to sidestep in order to capture and align the "last 2% of human values". And if the Gaia network allows any kind of extra-framework policy and decision-making at all, it seems to open the door to all sorts of abuses.
