Trending

Raincloud Plots Proposed as a Concise, Hybrid Data Visualization that Provides "Inference at a Glance" with Raw Statistical Information

From Paper: Raincloud plots: a multi-platform tool for robust data visualization

Published: Aug 2018

- The space saved relative to a violin + boxplot combination is given over to 'jittered' (randomly offset) raw data points plotted alongside each distribution; these expose the underlying observations and form the 'raindrops' of the raincloud
- The raincloud plot keeps the 'glanceable' density summary of the violin plot but drops the redundant mirroring across the center axis, improving the plot's "ink-to-data" ratio
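The three components described above can be sketched with plain matplotlib. This is a minimal illustrative sketch, not the paper's official multi-platform package: a one-sided density silhouette (the "cloud"), a slim boxplot, and jittered raw points (the "rain"). The helper names and layout constants are my own.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

def half_violin_density(data, grid, bw=0.4):
    """Plain-numpy Gaussian KDE for the one-sided 'cloud'."""
    z = (grid[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bw * np.sqrt(2 * np.pi))

def raincloud(ax, data, y0=0.0, jitter=0.08, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(data.min(), data.max(), 200)
    dens = half_violin_density(data, grid)
    # Cloud: half violin above the axis only (no mirrored lower half)
    ax.fill_between(grid, y0 + 0.05, y0 + 0.05 + dens / dens.max() * 0.3, alpha=0.6)
    # Summary statistics: a slim horizontal boxplot on the axis line
    ax.boxplot(data, positions=[y0], vert=False, widths=0.06, showfliers=False)
    # Rain: raw points, jittered vertically so overlapping values stay visible
    ys = y0 - 0.15 + rng.uniform(-jitter, jitter, size=len(data))
    ax.scatter(data, ys, s=8, alpha=0.5)
    return ys

rng = np.random.default_rng(1)
sample = rng.normal(loc=2.0, scale=1.0, size=120)
fig, ax = plt.subplots()
ys = raincloud(ax, sample)
fig.savefig("raincloud_demo.png")
```

Because the density is drawn on one side only, every point of ink below the axis carries raw-data information, which is the "ink-to-data" argument in the bullet above.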

Persistence in the market risk premium: evidence across countries

Published: Sep 2020

- The CAPM is still the most popular model for analysing the relationship between risk and return. This paper provides evidence on the degree of persistence of one of its key components, namely the market risk premium, as well as its volatility. The analysis applies fractional integration methods to data for the US, Germany and Japan, and for robustness purposes considers different time horizons (2, 5 and 10 years) and frequencies (monthly and weekly). The empirical findings in most cases imply that the market risk premium is a highly persistent variable which can be characterized as a random walk process, whilst its volatility is less persistent and exhibits stationary long-memory behaviour. There is also evidence that in the case of the US the degree of persistence has changed as a result of various events such as the 1973–74 oil crisis, the early 1980s recession resulting from the Fed's contractionary monetary policy, the 1997 Asian financial crisis, and the 2007 global financial crisis; this is confirmed by both endogenous break tests and the associated subsample estimates. Market participants should take this evidence into account when designing their investment strategies.
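The contrast the abstract draws, random-walk persistence for the premium versus weaker persistence for its volatility, can be illustrated with a toy simulation. This is a generic sketch of what "highly persistent" means in practice, not the paper's fractional-integration estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
shocks = rng.normal(size=n)

# I(1) series: shocks accumulate and never decay, like a random walk premium
random_walk = np.cumsum(shocks)
# I(0) series: no persistence at all, each observation is independent
white_noise = shocks

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, a crude one-number persistence gauge."""
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

rw_rho = lag1_autocorr(random_walk)   # close to 1: highly persistent
wn_rho = lag1_autocorr(white_noise)   # close to 0: transient shocks
```

Fractionally integrated (long-memory) series sit between these two extremes, which is why the paper estimates a continuous integration order rather than a binary I(0)/I(1) label.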

Controlled Experiments at Scale

Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga use online controlled experiments to guide product development and accelerate innovation. At Microsoft’s Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits. On the engineering side, we architected a highly scalable system, able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are billions of live variants of the site, so alerts are used to identify issues rather than relying on heavy up-front testing. On the trustworthiness front, we have a high occurrence of false positives that we address, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals more than $10M annually in the US, yet many ideas impact key metrics by 1% and are not well estimated a priori. The system has also identified many negative features that we avoided deploying, despite key stakeholders’ early excitement, saving us similar large amounts.
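The statistics behind a single experiment can be made concrete with a standard two-proportion z-test, a common choice for conversion-style metrics. This is a generic sketch, not Bing's actual analysis pipeline; it shows why detecting small lifts reliably takes very large samples:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test comparing conversion rates x1/n1 (control) and x2/n2
    (treatment) under the usual pooled-variance normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical numbers: 100k users per arm, 5.0% vs 5.2% conversion
z, p_value = two_proportion_z(5000, 100_000, 5200, 100_000)
```

Even with 100,000 users per arm, a 4% relative lift is only borderline significant here; a 1% lift at the same scale would not be, which is one reason running hundreds of concurrent large experiments matters.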

Improving Transparency, Falsifiability, and Rigour by Making Hypothesis Tests Machine Readable

Published: Oct 2020

- This paper describes the benefits provided by hypothesis tests and gives examples of how to create machine-readable statistical predictions
- The authors propose that the gold standard for well-specified hypothesis tests should be a statistical prediction that is machine-readable.
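One way to picture a machine-readable statistical prediction is as structured data that a program, rather than a reader, checks against results. The schema below is purely illustrative, my own invention rather than a format proposed by the authors:

```python
import json

# Hypothetical machine-readable prediction: direction, parameter, and
# significance threshold are all explicit fields a program can evaluate.
prediction = {
    "hypothesis": "treatment increases mean score",
    "parameter": "mean_difference",
    "predicted_direction": ">",
    "threshold": 0.0,
    "alpha": 0.05,
}

def evaluate(pred, estimate, p_value):
    """The prediction is corroborated only if the estimated effect lies in the
    predicted direction AND the test clears the pre-registered alpha."""
    if pred["predicted_direction"] == ">":
        direction_ok = estimate > pred["threshold"]
    else:
        direction_ok = estimate < pred["threshold"]
    return direction_ok and p_value < pred["alpha"]

# Round-trips as plain JSON, so the spec can live in a repository or registry
spec = json.loads(json.dumps(prediction))
```

Because the pass/fail criterion is executable, the test is falsifiable without any interpretive judgment, which is the transparency argument the bullets above summarize.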

A Bayesian Deep Learning Framework for End-To-End Prediction of Emotion
from Heartbeat

Authors:

Ross Harper, Joshua Southern

Published: Feb 2019

- Bayesian deep learning applied end-to-end to predict emotion from heartbeat signals

Asymptotic Results on Adaptive False Discovery Rate Controlling
Procedures Based on Kernel Estimators

Author:

Pierre Neuvial

Published: Mar 2010

The False Discovery Rate (FDR) is a commonly used type I error rate in multiple testing problems. It is defined as the expected False Discovery Proportion (FDP), that is, the expected fraction of false positives among rejected hypotheses. When the hypotheses are independent, the Benjamini-Hochberg procedure achieves FDR control at any pre-specified level. By construction, FDR control offers no guarantee in terms of power, or type II error. A number of alternative procedures have been developed, including plug-in procedures that aim at gaining power by incorporating an estimate of the proportion of true null hypotheses. In this paper, we study the asymptotic behavior of a class of plug-in procedures based on kernel estimators of the density of the $p$-values, as the number $m$ of tested hypotheses grows to infinity. In a setting where the hypotheses tested are independent, we prove that these procedures are asymptotically more powerful in two respects: (i) a tighter asymptotic FDR control for any target FDR level and (ii) a broader range of target levels yielding positive asymptotic power. We also show that this increased asymptotic power comes at the price of slower, non-parametric convergence rates for the FDP. These rates are of the form $m^{-k/(2k+1)}$, where $k$ is determined by the regularity of the density of the $p$-value distribution, or, equivalently, of the test statistics distribution. These results are applied to one- and two-sided test statistics for Gaussian and Laplace location models, and for the Student model.
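For context, the baseline that the paper's plug-in procedures improve on is the Benjamini-Hochberg step-up procedure itself. A minimal numpy implementation (the plug-in variants studied in the paper additionally divide the target level by an estimate of the proportion of true nulls, which is not shown here):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: sort p-values, find the largest k with
    p_(k) <= (k/m) * alpha, and reject the k smallest p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

rej = benjamini_hochberg([0.01, 0.02, 0.03, 0.5, 0.6], alpha=0.05)
```

With five hypotheses the per-rank thresholds are 0.01, 0.02, 0.03, 0.04, 0.05, so the first three hypotheses are rejected and the last two are not.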

Retrieved from arXiv

Scaling and Universality in River Flow Dynamics

Authors:

M. De Domenico, V. Latora

Published: Nov 2010

We investigate flow dynamics in rivers characterized by basin areas and daily mean discharge spanning different orders of magnitude. We show that the delayed increments evaluated at time scales ranging from days to months can be opportunely rescaled to the same non-Gaussian probability density function. Such a scaling breaks up above a certain critical horizon, where a behavior typical of thermodynamic systems at the critical point emerges. We finally show that both the scaling behavior and the break up of the scaling are universal features of river flow dynamics.
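The basic operation in the abstract, computing delayed increments at several time scales and rescaling them onto a common distribution, can be sketched in a few lines. The series below is a synthetic stand-in, not river discharge data, and for such a Gaussian toy series the collapse is trivially Gaussian; the paper's result is that real rivers collapse onto a common non-Gaussian form:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic "daily discharge": a random walk shifted to positive values
q = np.cumsum(rng.normal(size=4000)) + 100.0

def standardized_increments(series, tau):
    """Delayed increments q(t + tau) - q(t), rescaled to zero mean and
    unit variance so distributions at different scales can be compared."""
    inc = series[tau:] - series[:-tau]
    return (inc - inc.mean()) / inc.std()

scales = [1, 7, 30]  # days, weeks, months
rescaled = {tau: standardized_increments(q, tau) for tau in scales}
```

Overlaying histograms of the three rescaled arrays is the standard way to check whether the increment distributions collapse onto a single curve below the critical horizon.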

Retrieved from arXiv

Fractal distributions of dark matter and gas in the MareNostrum Universe

Author:

Jose Gaite

Published: Jun 2008

- Analyses N-body simulation data

Retrieved from arXiv

How to reduce dimension with PCA and random projections?

Authors:

Fan Yang, Sifan Liu, Edgar Dobriban, David P. Woodruff

Published: May 2020

In our "big data" age, the size and complexity of data is steadily increasing. Methods for dimension reduction are ever more popular and useful. Two distinct types of dimension reduction are "data-oblivious" methods such as random projections and sketching, and "data-aware" methods such as principal component analysis (PCA). Both have their strengths, such as speed for random projections, and data-adaptivity for PCA. In this work, we study how to combine them to get the best of both. We study "sketch and solve" methods that take a random projection (or sketch) first, and compute PCA after. We compute the performance of several popular sketching methods (random iid projections, random sampling, subsampled Hadamard transform, count sketch, etc.) in a general "signal-plus-noise" (or spiked) data model. Compared to well-known works, our results (1) give asymptotically exact results, and (2) apply when the signal components are only slightly above the noise, but the projection dimension is non-negligible. We also study stronger signals allowing more general covariance structures. We find that (a) signal strength decreases under projection in a delicate way depending on the structure of the data and the sketching method, (b) orthogonal projections are more accurate, (c) randomization does not hurt too much, due to concentration of measure, (d) count sketch can be improved by a normalization method. Our results have implications for statistical learning and data analysis. We also illustrate that the results are highly accurate in simulations and in analyzing empirical data.
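The "sketch and solve" recipe can be demonstrated in a few lines of numpy on a spiked data model: project the samples down with a random iid Gaussian map, then run PCA on the much smaller sketched matrix. This is a toy instance with a deliberately strong spike, not a reproduction of the paper's asymptotic analysis; the dimensions and spike strength are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 2000, 500, 100   # samples, ambient dimension, sketch dimension

# Spiked ("signal-plus-noise") model: rank-one signal along direction v plus iid noise
v = rng.normal(size=p)
v /= np.linalg.norm(v)
signal = rng.normal(size=(n, 1)) * 5.0
X = signal @ v[None, :] + rng.normal(size=(n, p))   # n x p data matrix

# Sketch: random iid Gaussian projection of the n samples down to d rows
S = rng.normal(size=(d, n)) / np.sqrt(d)
X_sketch = S @ X                                    # d x p, 20x fewer rows

# Solve: PCA on the sketch; the top right singular vector estimates v
_, _, Vt = np.linalg.svd(X_sketch, full_matrices=False)
v_hat = Vt[0]
alignment = abs(v_hat @ v)   # 1.0 would mean perfect recovery of the spike
```

The SVD now runs on a 100 x 500 matrix instead of 2000 x 500, yet the leading direction is still well aligned with the planted spike, illustrating the paper's point that signal strength degrades under projection but in a controlled way.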

Retrieved from arXiv
