We present a new technology-based paradigm to support embodied mathematics educational games, using wearable devices in the form of SmartPhones and SmartWatches for math learning, for full classes of students in formal in-school education settings. The Wearable Learning Games Engine is web-based infrastructure that enables each student to carry their own mobile device as they embark on team-based math activities that require physical engagement with the environment. These Wearable Tutors serve as guides and assistants while students manipulate, measure, estimate, discern, discard, and find mathematical objects that satisfy specified constraints. Multiplayer math games that use this infrastructure have yielded both cognitive and affective benefits. Beyond math game play, the Wearable Games Engine Authoring Tool enables students to create games themselves for other students to play; in this process, students engage in computational thinking and learn about finite-state machines. We present the infrastructure, games, and results for a series of experiments on both game play and game creation.
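To make the finite-state-machine view of such games concrete, the sketch below expresses a hypothetical team-based math activity as a small state machine; the states, player inputs, and class names are invented for illustration and are not the Wearable Learning Games Engine's actual authoring format.

# Hypothetical sketch: a wearable math game as a finite-state machine.
# State names and inputs are illustrative assumptions only.
class GameStateMachine:
    def __init__(self):
        self.state = "FIND_OBJECT"
        # (state, player input) -> (next state, message shown on the wearable)
        self.transitions = {
            ("FIND_OBJECT", "object_found"): ("MEASURE", "Measure the object's perimeter."),
            ("MEASURE", "answer_correct"): ("DONE", "Great job! Wait for your teammates."),
            ("MEASURE", "answer_wrong"): ("FIND_OBJECT", "Not quite - try another object."),
        }

    def step(self, player_input):
        key = (self.state, player_input)
        if key not in self.transitions:
            return "Input not recognized in this state."
        self.state, message = self.transitions[key]
        return message

game = GameStateMachine()
print(game.step("object_found"))   # -> "Measure the object's perimeter."
print(game.step("answer_wrong"))   # -> "Not quite - try another object."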
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter flow is constrained to lie on the compact manifold, stabilizes training, improves its effectiveness, and provably solves the gradient vanishing/explosion problem that is intrinsic to training deep neural network architectures such as Neural ODEs. Consequently, it leads to better downstream models, as we show in the example of training reinforcement learning policies with evolution strategies, and in the supervised learning setting, by comparing with previous SOTA baselines. We provide strong convergence results for our proposed mechanism that are independent of the depth of the network, supporting our empirical studies. Our results show an intriguing connection between the theory of deep neural networks and the field of matrix flows on compact manifolds.
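A schematic way to write such a nested system (our notation, not necessarily the authors' exact parameterization) is a main flow whose weights are themselves driven by a flow with a skew-symmetric generator:

\dot{x}(t) = \sigma\big(W(t)\,x(t)\big), \qquad
\dot{W}(t) = W(t)\,\Omega(t), \qquad
\Omega(t)^{\top} = -\Omega(t), \quad W(0) \in O(d).

Because \frac{d}{dt}\big(W^{\top}W\big) = \Omega^{\top}W^{\top}W + W^{\top}W\,\Omega vanishes whenever W^{\top}W = I, the weight flow never leaves the orthogonal group, which is the constraint the abstract credits with stabilizing training.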
We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient-based updates. The resulting algorithm -- which combines elements from Oja's rule with a generalized Gram-Schmidt orthogonalization -- is naturally decentralized and hence parallelizable through message passing. We demonstrate the scalability of the algorithm with experiments on large image datasets and neural network activations. We discuss how this new view of PCA as a differentiable game can lead to further algorithmic developments and insights.
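A minimal NumPy sketch of this kind of per-player update is given below; the exact utility, step size, and update order are our assumptions rather than the paper's implementation, but the structure is the one the abstract describes: an Oja-style reward plus Gram-Schmidt-style penalties that discourage player i from aligning with players j < i.

import numpy as np

def eigengame_step(V, X, lr=0.01):
    # One gradient-ascent step for each "player" (column of V) on data X.
    # Hypothetical sketch: reward <v_i, M v_i>, penalized for overlap with
    # earlier players, followed by projection back onto the unit sphere.
    n = X.shape[0]
    M = X.T @ X / n                          # empirical covariance (d x d)
    for i in range(V.shape[1]):
        v = V[:, i]
        reward = M @ v                       # Oja's-rule term
        penalty = np.zeros_like(v)
        for j in range(i):                   # generalized Gram-Schmidt penalty
            u = V[:, j]
            penalty += ((v @ (M @ u)) / (u @ (M @ u))) * (M @ u)
        v = v + lr * (reward - penalty)
        V[:, i] = v / np.linalg.norm(v)
    return V

# Toy usage: recover the top 3 principal directions of data with a clear spectrum.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)) * np.arange(10, 0, -1)   # descending variances
V = np.linalg.qr(rng.normal(size=(10, 3)))[0]            # random orthonormal start
for _ in range(500):
    V = eigengame_step(V, X)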
Variational quantum algorithms (VQAs) optimize the parameters θ of a parametrized quantum circuit V(θ) to minimize a cost function C. While VQAs may enable practical applications of noisy quantum computers, they are nevertheless heuristic methods with unproven scaling. Here, we rigorously prove two results, assuming V(θ) is an alternating layered ansatz composed of blocks forming local 2-designs. Our first result states that defining C in terms of global observables leads to exponentially vanishing gradients (i.e., barren plateaus) even when V(θ) is shallow. Hence, several VQAs in the literature must revise their proposed costs. On the other hand, our second result states that defining C with local observables leads to at worst a polynomially vanishing gradient, so long as the depth of V(θ) is O(log n). Our results establish a connection between locality and trainability. We illustrate these ideas with large-scale simulations, up to 100 qubits, of a quantum autoencoder implementation.
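As a concrete illustration (our formalization, following a common convention rather than quoting the paper), a global cost compares the full output state against |0...0>, whereas a local cost averages single-qubit projectors:

C_G(\theta) = \mathrm{Tr}\big[O_G\, V(\theta)\,\rho\, V(\theta)^{\dagger}\big], \qquad O_G = \mathbb{1} - |0\rangle\langle 0|^{\otimes n},

C_L(\theta) = \mathrm{Tr}\big[O_L\, V(\theta)\,\rho\, V(\theta)^{\dagger}\big], \qquad O_L = \mathbb{1} - \frac{1}{n}\sum_{j=1}^{n} |0\rangle\langle 0|_{j} \otimes \mathbb{1}_{\bar{j}}.

Both costs vanish on the same target state; the global form is the kind shown to exhibit barren plateaus even at shallow depth, while the local form keeps gradients at worst polynomially small up to O(log n) depth.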
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can also be used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and investigate optimal attention-kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel-prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing effectiveness of the novel attention-learning paradigm leveraged by Performers.
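A minimal NumPy sketch of the positive-random-feature idea behind this kind of linear attention follows; it uses plain Gaussian projections, whereas FAVOR+ additionally orthogonalizes the projection vectors and applies further variance-reduction refinements, so treat this as an assumed simplification rather than the paper's mechanism.

import numpy as np

def positive_random_features(x, W):
    # Positive random features for the softmax kernel: with Gaussian rows w of W,
    # E[phi(q) . phi(k)] = exp(q . k), an unbiased softmax-kernel estimate.
    m = W.shape[0]
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ W.T - sq_norm) / np.sqrt(m)

def linear_attention(Q, K, V, W):
    # Approximate softmax attention in O(L * m * d) time and memory,
    # without ever forming the L x L attention matrix.
    scale = Q.shape[-1] ** -0.25                  # split 1/sqrt(d) between Q and K
    Qp = positive_random_features(Q * scale, W)   # (L, m)
    Kp = positive_random_features(K * scale, W)   # (L, m)
    KV = Kp.T @ V                                 # (m, d_v)
    normalizer = Qp @ Kp.sum(axis=0)              # (L,)
    return (Qp @ KV) / normalizer[:, None]

# Toy usage: 512 tokens, 64-dim heads, 128 random features.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 512, 64))
W = rng.normal(size=(128, 64))                    # FAVOR+ would orthogonalize these rows
out = linear_attention(Q, K, V, W)                # shape (512, 64)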
There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is instead used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart's Law. This class of failure is often poorly understood, partly because the terminology for discussing such failures is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are "(at least) four different mechanisms" that relate to Goodhart's Law. This paper is intended to explore these mechanisms further and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of optimization power directed towards the proxy, and so the increased optimization power offered by artificial intelligence makes understanding these effects especially critical for that field.
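As a toy illustration of the simplest such mechanism (regressional Goodhart, in Garrabrant's terminology), the following sketch optimizes a noisy proxy for a true quantity; the setup and numbers are invented purely for illustration and do not come from the paper.

import numpy as np

# Toy setup (assumed): the proxy metric is the true value plus independent noise.
# Selecting whatever scores best on the proxy systematically overstates the true
# value achieved, and the gap widens as more optimization pressure is applied.
rng = np.random.default_rng(0)
for n_candidates in (10, 1_000, 100_000):
    true_value = rng.normal(size=n_candidates)
    proxy = true_value + rng.normal(size=n_candidates)   # noisy measurement
    best = np.argmax(proxy)                              # optimize the proxy
    print(f"{n_candidates:>7} candidates: best proxy = {proxy[best]:5.2f}, "
          f"its true value = {true_value[best]:5.2f}")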
We show how a number of well-known uncertainty principles for the Fourier transform, such as the Heisenberg uncertainty principle, the Donoho-Stark uncertainty principle, and Meshulam's nonabelian uncertainty principle, have little to do with the structure of the Fourier transform itself. Rather, all of these results follow from very weak properties of the Fourier transform (shared by numerous linear operators), namely that it is bounded as an operator from L1 to L∞, and that it is unitary. Using a single, simple proof template, and only these (or weaker) properties, we obtain some new proofs and many generalizations of these basic uncertainty principles, to new operators and to new settings, in a completely unified way. Together with our general overview, this paper can also serve as a survey of the many facets of the phenomena known as uncertainty principles.
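To make the "single, simple proof template" concrete, here is the kind of two-line argument meant (our rendering, not a quotation): if A is unitary and every entry of A has modulus at most 1/\sqrt{N}, as for the normalized discrete Fourier transform on Z_N with \hat{f} = Af, then

\|\hat{f}\|_\infty \le \tfrac{1}{\sqrt{N}}\,\|f\|_1
\quad\text{and}\quad
\|f\|_\infty \le \tfrac{1}{\sqrt{N}}\,\|\hat{f}\|_1
\;\Longrightarrow\;
\frac{\|f\|_1}{\|f\|_\infty}\cdot\frac{\|\hat{f}\|_1}{\|\hat{f}\|_\infty} \ge N,

and since \|f\|_1 \le |\mathrm{supp}(f)|\,\|f\|_\infty, this already yields the Donoho-Stark bound |\mathrm{supp}(f)| \cdot |\mathrm{supp}(\hat{f})| \ge N; nothing beyond the L1-to-L∞ boundedness of the operator and of its inverse (which, for a unitary, has the same maximal entry) is used.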
Two of the biggest barriers to the large-scale adoption of cryptocurrencies as a means of payment are ease-of-use and purchasing-power volatility. We introduce Celo, a protocol that addresses these issues with an address-based encryption scheme and a stable-value token. We show how these attributes together can be used to foster a monetary ecology that includes global reference currencies, local and regional stable-value currencies, and a social dividend. Our first application is a social-payments system centered around mobile phones.