I feel like you are making this too complicated for yourself with this "happiness" concept.

Neural networks use feedback to adjust the weights of the connections between nodes, but you don't really have nodes. On top of that, a neural network only gets feedback while it's learning. Once the network is trained, you just give it input and it produces the (hopefully) correct output based on what it learned.
This is one of the great things about neural nets: you can train them offline! Training is the heavy, CPU-intensive part (probably GPU-intensive, really). At runtime they are fast and fairly cheap on resources.

Anyway, by forcing the concept of learning and feedback into the state machine, you are overcomplicating things quite a bit. Instead, define the states for your AI and which states it can switch to from each state. For example, if your AI is in the reloading state, it can't go straight to the shooting state; it has to finish reloading first. But it can start running for cover while reloading.
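
To make the "which state can follow which" idea concrete, here is a minimal sketch of a transition table. The state names and the exact set of allowed transitions are my own assumptions built around the reloading example, not something your game has to match.

```python
# Hypothetical state names. The allowed transitions are an assumption
# based on the reloading example: reloading cannot jump straight to shooting.
ALLOWED_TRANSITIONS = {
    "searching": {"reloading", "shooting", "running_for_cover"},
    "reloading": {"searching", "running_for_cover"},
    "shooting": {"searching", "reloading", "running_for_cover"},
    "running_for_cover": {"searching", "reloading", "shooting"},
}


def can_switch(current, target):
    """Return True if the AI may go from `current` directly to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_switch("reloading", "shooting"))           # False
print(can_switch("reloading", "running_for_cover"))  # True
```

Keeping the table as plain data means you can tweak the AI's behaviour without touching the logic that walks it.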

Next, define how much the AI wants to be in each state. If it wants to see the player dead, that should be its highest goal. Then, every tick, evaluate the current state and the states the AI can move to from where it is right now. Compare how much the AI wants to be in its current state (say, searching for the player) with how much it wants the others. If it can move to a state that it wants more (e.g., shooting the player), it should switch to that state.

So, let's say you have the following states and their "wanted" level:
- Searching for the player: 0
- Reloading (requires gun to not be full): 1
- Shooting at the player (requires player in sight and ammo): 2

The AI starts out with an empty gun, in the "Searching for the player" state.

Tick 1: The AI checks whether there is a better state and picks "Reloading", because it wants that more and the requirement (gun not full) is satisfied. The AI goes to the "Reloading" state.
Tick 2: The AI finishes reloading. The gun is full, so the requirement for "Reloading" is no longer satisfied. Player not in sight? Okay, step one down to the "Searching for the player" state.
Tick 3: The player shows up! The requirement for "Shooting at the player" is satisfied AND it has the highest weight, so the AI starts shooting.
Tick 4: The AI has fired a couple of bullets but still has some ammo left. The shooting state is still the best option.
Tick 5: The gun is empty. The AI can no longer satisfy the requirement for its current state and drops to the next best state: "Reloading".
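
The tick-by-tick walk above can be sketched in a few lines. This is only an illustration under my own assumptions: the magazine size, the number of shots per tick, and the function names are made up, a reload finishes within a single tick, and every state is reachable from every other state to keep it short.

```python
# "Wanted" levels from the list above; higher means more desirable.
WANTED = {"searching": 0, "reloading": 1, "shooting": 2}

MAG_SIZE = 6        # assumed magazine size
SHOTS_PER_TICK = 3  # assumed number of bullets fired per tick


def requirement_met(state, ammo, player_in_sight):
    """Check whether the requirement for `state` currently holds."""
    if state == "reloading":
        return ammo < MAG_SIZE
    if state == "shooting":
        return player_in_sight and ammo > 0
    return True  # searching has no requirement


def choose_state(ammo, player_in_sight):
    """Pick the most wanted state whose requirement is satisfied."""
    candidates = [s for s in WANTED if requirement_met(s, ammo, player_in_sight)]
    return max(candidates, key=WANTED.get)


def simulate(sightings):
    """Run one tick per entry in `sightings` (True = player in sight)."""
    ammo = 0  # the AI starts with an empty gun
    history = []
    for player_in_sight in sightings:
        state = choose_state(ammo, player_in_sight)
        history.append(state)
        if state == "reloading":
            ammo = MAG_SIZE  # assumption: reloading completes in one tick
        elif state == "shooting":
            ammo = max(0, ammo - SHOTS_PER_TICK)
    return history


print(simulate([False, False, True, True, True]))
# ['reloading', 'searching', 'shooting', 'shooting', 'reloading']
```

Note that nothing here learns anything: the whole "decision" is a max over the states whose requirements hold right now.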

You get the idea. You can also make it so that not every state can be reached from every other state, which probably makes the most sense. For example, guarding something could be the AI's lowest priority while it also looks out for anything suspicious, which would put it in the Alarmed state, from which it can start chasing the player, etc. So even though hunting the player has a higher weight, the AI can't go to that state unless something sets it off and it transitions to the Alarmed state first.
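
Here is a rough sketch of the guarding example, combining the "wanted" level with restricted transitions. The state names, the two boolean inputs, and the reachability table are all assumptions I picked for illustration.

```python
# Hypothetical "wanted" levels; higher means more desirable.
WANTED = {"guarding": 0, "alarmed": 1, "hunting": 2}

# Assumed reachability: hunting can only be entered from alarmed.
REACHABLE = {
    "guarding": {"alarmed"},
    "alarmed": {"guarding", "hunting"},
    "hunting": {"guarding", "alarmed"},
}


def requirement_met(state, suspicious, player_located):
    """Check whether the requirement for `state` currently holds."""
    if state == "alarmed":
        return suspicious
    if state == "hunting":
        return player_located
    return True  # guarding is always possible


def next_state(current, suspicious, player_located):
    """Stay put or move to a reachable state, whichever is wanted most."""
    options = [current] + sorted(REACHABLE[current])
    valid = [s for s in options if requirement_met(s, suspicious, player_located)]
    return max(valid, key=WANTED.get)


state = "guarding"
state = next_state(state, suspicious=True, player_located=True)
print(state)  # alarmed (hunting is wanted more, but not reachable yet)
state = next_state(state, suspicious=True, player_located=True)
print(state)  # hunting
```

The only difference from picking the most wanted state globally is that the candidates are limited to the current state plus whatever the table says is reachable from it.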


Shitlord by trade and passion. Graphics programmer at Laminar Research.
I write blog posts at feresignum.com