The generalized version of policy improvement and policy evaluation allows one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
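The reduction to linear regression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `fit_reward_weights` and `gpi_action`, the tabular layout `q_tables[policy][state][action][base_reward]`, and the toy numbers are all hypothetical. The sketch assumes the value of every previously learned policy is known under each base reward, so that its value under the new reward follows by linearity.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_reward_weights(base_rewards, new_rewards):
    """Least squares: find w with new_rewards[t] ~= sum_i w[i] * base_rewards[t][i]."""
    n = len(base_rewards[0])
    # Normal equations: (R^T R) w = R^T r
    gram = [[sum(row[i] * row[j] for row in base_rewards) for j in range(n)]
            for i in range(n)]
    rhs = [sum(row[i] * r for row, r in zip(base_rewards, new_rewards))
           for i in range(n)]
    return solve_linear(gram, rhs)


def gpi_action(q_tables, w, state):
    """Pick the action whose best value over all known policies is highest.

    q_tables[j][state][action] is a vector holding the value of policy j
    under each base reward (hypothetical layout). Weighting that vector by w
    evaluates policy j on the new task.
    """
    best_action, best_value = None, float("-inf")
    for q in q_tables:
        for action, per_reward_values in enumerate(q[state]):
            value = sum(wi * v for wi, v in zip(w, per_reward_values))
            if value > best_value:
                best_action, best_value = action, value
    return best_action


# Toy data: the new reward is 1.0 * r1 - 0.5 * r2 on four observed transitions.
base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
new = [1.0, -0.5, 0.5, 1.5]
w = fit_reward_weights(base, new)

# Two known policies, one state (index 0), two actions.
q_tables = [
    [[[1.0, 0.0], [0.0, 2.0]]],
    [[[0.5, 0.5], [0.2, 0.0]]],
]
action = gpi_action(q_tables, w, 0)
```

Only the weight vector `w` is learned for the new task. The per-policy values are reused as they are, which is where the data savings in this sketch come from.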
The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. This article proposes to address this issue through a divide-and-conquer approach. The authors argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation.
We present a system that converts annotated broadcast video of tennis matches into interactively controllable video sprites that behave and appear like professional tennis players. Our approach is based on controllable video textures, and utilizes domain knowledge of the cyclic structure of tennis rallies to place clip transitions and accept control inputs at key decision-making moments of point play. Most importantly, we use points from the video collection to model a player's court positioning and shot selection decisions during points. We use these behavioral models to select video clips that reflect actions the real-life player is likely to take in a given matchplay situation, yielding sprites that behave realistically at the macro level of full points, not just individual tennis motions. Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences such as the creation of matchups between players that have not competed in real life, or interactive control of players in the Wimbledon final. According to expert tennis players, the rallies generated using our approach are significantly more realistic in terms of player behavior than video sprite methods that only consider the quality of motion transitions during video synthesis.
The key difference in distributional reinforcement learning (RL) lies in how ‘anticipated reward’ is defined. In traditional RL, the reward prediction is represented as a single quantity: the average taken over all potential reward outcomes, weighted by their respective probabilities. By contrast, distributional RL uses a multiplicity of predictions. These predictions vary in their degree of optimism about upcoming reward. More optimistic predictions anticipate obtaining greater future rewards; less optimistic predictions anticipate less positive outcomes. Together, the entire range of predictions captures the full probability distribution over future rewards.
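One way to obtain such a range of predictions can be sketched with asymmetric learning rates. This is an assumption made for illustration, since the passage does not specify an update rule: each predictor scales positive and negative prediction errors differently, controlled by a hypothetical optimism parameter `tau` between 0 and 1. The function names and the coin-flip reward are also illustrative.

```python
import random


def update_predictions(values, taus, reward, step_size=0.02):
    """Apply one update to every predictor.

    A predictor with a high tau weights positive errors more heavily and
    settles on an optimistic prediction; a low tau gives a pessimistic one.
    tau = 0.5 treats both signs equally and tracks the mean, as in
    traditional RL.
    """
    new_values = []
    for value, tau in zip(values, taus):
        error = reward - value
        rate = step_size * (tau if error > 0 else 1.0 - tau)
        new_values.append(value + rate * error)
    return new_values


def learn_distribution(sample_reward, taus, steps=20000, seed=0):
    """Run the updates on rewards drawn from sample_reward(rng)."""
    rng = random.Random(seed)
    values = [0.0] * len(taus)
    for _ in range(steps):
        values = update_predictions(values, taus, sample_reward(rng))
    return values


def coin_flip_reward(rng):
    """Hypothetical reward: 10 or 0 with equal probability."""
    return 10.0 if rng.random() < 0.5 else 0.0


# Pessimistic, neutral and optimistic predictors of the same reward.
predictions = learn_distribution(coin_flip_reward, taus=[0.1, 0.5, 0.9])
```

In this sketch the three entries of `predictions` end up ordered from pessimistic to optimistic, and together they describe more of the reward distribution than the single average does.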
A person's mood has been linked with predictions of future reward, and it has been proposed that both depression and bipolar disorder may involve biased predictions of future value. These biases may arise from asymmetries in reward prediction error (RPE) coding.
Much of systems neuroscience has attempted to formulate succinct statements about the function of individual neurons in the brain. This approach has been successful at explaining some (relatively small) circuits and certain hard-wired behaviours. However, there is reason to believe that this approach will need to be complemented by other insights if we are to develop good models of plastic circuits with thousands, millions or billions of neurons. There is, unfortunately, no guarantee that the function of individual neurons in the CNS can be compressed down to a human-interpretable, verbally articulable form. Given that we currently have no good means of distilling the function of individual units in deep ANNs into words, and given that real brains are likely more, not less, complex, we suggest that systems neuroscience would benefit from focusing on the kinds of models that have been successful in ANN research programs, i.e., models grounded in the three essential components: objective functions, the learning rules and the architectures.
A lot of computational neuroscience has emphasized models of the dynamics of neural activity, which has not been a major theme in this discussion. As such, one might worry that the framework fails to connect with this past literature.
The results show a level of suggestive knowledge that indicates the continuing existence of a gap between the capabilities of recent vision-based face recognition algorithms and human-level performance.
As the performance gap appears to be narrowing in terms of accuracy-based expectations, a curious question has arisen: "Is the face understanding of AI really close to that of humans?" In the present study, in an effort to confirm the brain-driven concept, we conduct image-based detection, classification, and generation using an in-house created fake face database.
The aim of this paper is to study the fusion at feature extraction level for face and fingerprint biometrics. The proposed approach is based on the fusion of the two traits by extracting independent feature pointsets from the two modalities, and making the two pointsets compatible for concatenation. Moreover, to handle the problem of curse of dimensionality, the feature pointsets are properly reduced in dimension. Different feature reduction techniques are implemented, prior and after the feature pointsets fusion, and the results are duly recorded. The fused feature pointset for the database and the query face and fingerprint images are matched using techniques based on either the point pattern matching, or the Delaunay triangulation. Comparative experiments are conducted on chimeric and real databases, to assess the actual advantage of the fusion performed at the feature extraction level, in comparison to the matching score level.
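The idea of making two feature sets compatible and then concatenating them can be sketched in a heavily simplified form. The sketch assumes each modality has already been summarized as a fixed-length numeric vector, which is a simplification of the pointsets described above. It also uses per-vector min-max scaling and nearest-neighbor matching as stand-ins, and it omits dimensionality reduction, point pattern matching and Delaunay triangulation. All names and values are hypothetical.

```python
import math


def min_max_normalize(vector):
    """Rescale a vector to [0, 1] so that both modalities share a range."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0.0 for _ in vector]
    return [(x - lo) / (hi - lo) for x in vector]


def fuse(face_features, fingerprint_features):
    """Feature-level fusion: normalize each modality, then concatenate."""
    return min_max_normalize(face_features) + min_max_normalize(fingerprint_features)


def best_match(query, gallery):
    """Return the enrolled identity whose fused vector is closest to the query."""
    return min(gallery, key=lambda name: math.dist(query, gallery[name]))


# Hypothetical enrolled identities and a query sample.
gallery = {
    "alice": fuse([10.0, 20.0, 30.0], [0.1, 0.5]),
    "bob": fuse([30.0, 10.0, 20.0], [0.9, 0.2]),
}
query = fuse([11.0, 19.0, 31.0], [0.15, 0.5])
match = best_match(query, gallery)
```

Fusion at the matching score level would instead compare the face and fingerprint vectors separately and combine the two resulting distances.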
The semantic web is an open and distributed environment in which it is hard to guarantee consistency of knowledge and information. Under the standard two-valued semantics everything is entailed if knowledge and information is inconsistent. The semantics of the paraconsistent logic LP offers a solution. However, if the available knowledge and information is consistent, the set of conclusions entailed under the three-valued semantics of the paraconsistent logic LP is smaller than the set of conclusions entailed under the two-valued semantics. Preferring conflict-minimal three-valued interpretations eliminates this difference. Preferring conflict-minimal interpretations introduces non-monotonicity. To handle the non-monotonicity, this paper proposes an assumption-based argumentation system. Assumptions needed to close branches of a semantic tableaux form the arguments. Stable extensions of the set of derived arguments correspond to conflict-minimal interpretations and conclusions entailed by all conflict-minimal interpretations are supported by arguments in all stable extensions.
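The difference between the three kinds of entailment can be illustrated with a small brute-force checker. It is a propositional sketch only: the paper itself proposes an argumentation system built on semantic tableaux, which is not reproduced here. The tuple encoding of formulas, the function names, and the reading of "conflict-minimal" as subset-minimal sets of atoms assigned the value "both" are assumptions made for illustration.

```python
from itertools import product

F, B, T = 0, 1, 2      # false, both (a conflict), true
DESIGNATED = {B, T}    # values that count as "holding" in LP


def evaluate(formula, interp):
    """Evaluate a formula given as nested tuples under an interpretation."""
    op = formula[0]
    if op == "atom":
        return interp[formula[1]]
    if op == "not":
        return 2 - evaluate(formula[1], interp)   # swaps T and F, keeps B
    left = evaluate(formula[1], interp)
    right = evaluate(formula[2], interp)
    return min(left, right) if op == "and" else max(left, right)


def models(premises, atoms, values):
    """Yield every interpretation that makes all premises designated."""
    for combo in product(values, repeat=len(atoms)):
        interp = dict(zip(atoms, combo))
        if all(evaluate(p, interp) in DESIGNATED for p in premises):
            yield interp


def conflicts(interp):
    """Atoms that the interpretation treats as both true and false."""
    return frozenset(a for a, v in interp.items() if v == B)


def entails(premises, conclusion, atoms, values=(F, B, T), minimal=False):
    """Check entailment; values=(F, T) gives the two-valued semantics."""
    found = list(models(premises, atoms, values))
    if minimal:
        # Keep only models whose conflict set has no proper subset among models.
        found = [m for m in found
                 if not any(conflicts(o) < conflicts(m) for o in found)]
    return all(evaluate(conclusion, m) in DESIGNATED for m in found)


p, q = ("atom", "p"), ("atom", "q")
atoms = ["p", "q"]
inconsistent = [p, ("not", p)]
consistent = [p, ("or", ("not", p), q)]

explodes_classically = entails(inconsistent, q, atoms, values=(F, T))
explodes_in_lp = entails(inconsistent, q, atoms)
lp_derives_q = entails(consistent, q, atoms)
minimal_derives_q = entails(consistent, q, atoms, minimal=True)
```

In this sketch the inconsistent premises entail `q` only under the two-valued semantics, while the consistent premises entail `q` under conflict-minimal LP but not under plain LP, matching the gap the abstract describes.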
Although they are extremely important, many Exploratory Data Analysis (EDA) systems are unable to perform classification and visualization on a continuous basis or to self-organize new data items into the older ones (or even into new labels if necessary), which can be crucial in KDD (Knowledge Discovery, Retrieval and Data Mining) systems; interactive and online forms of Web applications are just one example. This disadvantage is also present in more recent approaches using Self-Organizing Maps. In the present work, exploiting past successes of recently proposed Stigmergic Ant Systems, a robust online classifier is presented, which produces class decisions on a continuous data stream, allowing for continuous mappings. Results show that increasingly better results are achieved, as demonstrated by other authors in different areas. KEYWORDS: Swarm Intelligence, Ant Systems, Stigmergy, Data-Mining, Exploratory Data Analysis, Image Retrieval, Continuous Classification.
Biologically-inspired methods such as evolutionary algorithms and neural networks are proving useful in the field of information fusion. Artificial Immune Systems (AISs) are a biologically-inspired approach which takes inspiration from the biological immune system. Interestingly, recent research has shown how AISs which use multi-level information sources as input data can be used to build effective algorithms for real-time computer intrusion detection. This research is based on biological information fusion mechanisms used by the human immune system and as such might be of interest to the information fusion community. The aim of this paper is to present a summary of some of the biological information fusion mechanisms seen in the human immune system, and of how these mechanisms have been implemented as AISs.