Adapting resource-seeking behavior is of
primary importance for survival. Balancing
exploration and exploitation of discovered resources is
therefore at the core of adaptation to the environment. The
reinforcement learning theoretical framework has been
elaborated to formalize such reward-seeking behavior.
Biologically plausible models based on this framework
have flourished recently. Among them, a neural
network model was developed to investigate the
functions of the anterior cingulate cortex (ACC) and the
dorsolateral prefrontal cortex (DLPFC) involved in
action valuation and action selection, respectively [1].
This model proposes a method, inspired by the
meta-learning literature, to regulate exploration
dynamically and thereby resolve the exploration/
exploitation trade-off [2]. The model performed well in
a deterministic problem-solving task (PST). Our goal
was to demonstrate that the model generalizes to a
more ecological PST with probabilistically dispensed
rewards. The model was first tested with its preset
learning rate, exploration rate, and initial action
values, and then optimized through a search of the
parameter space. The initial values of the model's
parameters proved to be good but not optimal for the
new task. Interestingly, the model's performance
depends strongly on the initial action values.
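
To make the roles of the three parameters concrete, the following
is a minimal sketch of a generic delta-rule learner with softmax
action selection on a probabilistically rewarded task. It is not
the neural network model of [1]; the names run_task, alpha
(learning rate), beta (inverse temperature, standing in for the
exploration rate), q_init (initial action values), and the reward
probabilities are all assumptions made for illustration.

```python
import math
import random


def softmax_choice(q_values, beta, rng):
    """Pick an action index with probability proportional to exp(beta * Q)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def run_task(alpha, beta, q_init, reward_probs, n_trials=200, seed=0):
    """Simulate one session; return (final action values, total reward).

    Hypothetical stand-in for a probabilistic task: each action pays
    a reward of 1 with its own fixed probability, otherwise 0.
    """
    rng = random.Random(seed)
    q_values = [float(q_init)] * len(reward_probs)
    total_reward = 0
    for _ in range(n_trials):
        action = softmax_choice(q_values, beta, rng)
        reward = 1 if rng.random() < reward_probs[action] else 0
        # Delta rule: move the chosen action's value toward the observed reward.
        q_values[action] += alpha * (reward - q_values[action])
        total_reward += reward
    return q_values, total_reward


if __name__ == "__main__":
    probs = [0.1, 0.1, 0.7, 0.1]  # hypothetical reward probabilities
    # Vary only the initial action values, keeping alpha and beta fixed.
    for q_init in (0.0, 0.5, 1.0):
        _, total = run_task(alpha=0.1, beta=5.0, q_init=q_init, reward_probs=probs)
        print(f"q_init={q_init}: total reward = {total}")
```

In a sketch of this kind, high initial action values tend to
encourage early sampling of every action, whereas low ones can
lock the learner onto the first rewarded action, which is one way
in which performance may come to depend on the initial values.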