Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data. We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics. Using our system, we accurately and efficiently reproduce two published centralized studies in a federated setting, enabling biomedical insights that are not possible from individual institutions alone. Our work represents a key step towards overcoming the privacy hurdle that impedes multi-centric scientific collaborations.
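The core idea, that only masked aggregates ever leave each institution, can be illustrated with a toy sketch. Note this uses additive secret sharing over a prime field as a conceptual stand-in for the paper's multiparty homomorphic encryption; the site counts and party setup are invented for illustration.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Each site holds a local count (e.g., patients at risk at some time point).
local_counts = [120, 85, 230]
n = len(local_counts)

# Every site secret-shares its count; share j is sent to party j.
all_shares = [share(c, n) for c in local_counts]

# Each party sums the shares it received; only these partial sums are exchanged.
partial = [sum(col) % PRIME for col in zip(*all_shares)]

# Reconstruction reveals only the aggregate, never any single site's input.
total = sum(partial) % PRIME
print(total)  # 435
```

Each individual share is uniformly random, so no party learns anything about another site's count; only the final sum is recovered.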
Citations are often used as a metric of the impact of scientific publications. Here, we examine how the number of downloads from Sci-Hub, as well as various characteristics of publications and their authors, predicts future citations. Using data from 12 leading journals in economics, consumer research, neuroscience, and multidisciplinary research, we found that articles downloaded from Sci-Hub were cited 1.72 times more than papers not downloaded from Sci-Hub and that the number of downloads from Sci-Hub was a robust predictor of future citations. Among other characteristics of publications, the number of figures in a manuscript consistently predicts its future citations. The results suggest that limited access to publications may prevent some scientific research from achieving its full impact.
Citation indices are tools used by the academic community for research and research evaluation; they aggregate scientific literature output and measure scientific impact by collating citation counts. Citation indices help measure the interconnections between scientific papers but fall short because they display only paper titles, authors, and publication dates, and fail to communicate contextual information about why a citation was made. The use of citations in research evaluation without due consideration of context can be problematic, if only because a citation that disputes a paper is treated the same as a citation that supports it. To solve this problem, we have used machine learning and other techniques to develop a “smart citation index” called scite, which categorizes citations based on context. Scite shows how a citation was used by displaying the surrounding textual context from the citing paper, along with a classification from our deep learning model that indicates whether the statement provides supporting or disputing evidence for a referenced work, or simply mentions it. Scite has been developed by analyzing over 23 million full-text scientific articles and currently has a database of more than 800 million classified citation statements. Here we describe how scite works and how it can be used to further research and research evaluation.
Inaccurate data in scientific papers can result from honest error or intentional falsification. This study attempted to determine the percentage of published papers that contain inappropriate image duplication, a specific type of inaccurate data. The images from a total of 20,621 papers published in 40 scientific journals from 1995 to 2014 were visually screened. Overall, 3.8% of published papers contained problematic figures, with at least half exhibiting features suggestive of deliberate manipulation. The prevalence of papers with problematic images has risen markedly during the past decade. Additional papers written by authors of papers with problematic images had an increased likelihood of containing problematic images as well. As this analysis focused only on one type of data, it is likely that the actual prevalence of inaccurate data in the published literature is higher. The marked variation in the frequency of problematic images among journals suggests that journal practices, such as prepublication image screening, influence the quality of the scientific literature.
Preprints increase accessibility and can speed scholarly communication if researchers view them as credible enough to read and use. Preprint services do not provide the heuristic cues of a journal's reputation, selection, and peer-review processes that, regardless of their flaws, are often used as a guide for deciding what to read. We conducted a survey of 3759 researchers across a wide range of disciplines to determine the importance of different cues for assessing the credibility of individual preprints and preprint services. We found that cues related to information about open science content and independent verification of author claims were rated as highly important for judging preprint credibility, and peer views and author information were rated as less important. As of early 2020, very few preprint services display any of the most important cues. By adding such cues, services may be able to help researchers better assess the credibility of preprints, enabling scholars to more confidently use preprints, thereby accelerating scientific communication and discovery.
Efforts to make research results open and reproducible are increasingly reflected in journal policies encouraging or mandating authors to provide data availability statements (DAS). As a consequence, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these statements actually contain well-formed links to data, for example via a URL or permanent identifier, and whether there is added value in providing such links. We consider 531,889 journal articles published by PLOS and BMC, develop an automatic system for labelling their data availability statements according to four categories based on their content and the type of data availability they display, and finally analyze the citation advantage of different statement categories via regression. We find that, following mandated publisher policies, data availability statements have become very common: in 2018, 93.7% of 21,793 PLOS articles and 88.2% of 31,956 BMC articles had one. Statements containing a link to data in a repository, rather than stating that data are available on request or are included as supporting information files, remain a fraction of the total: in 2017 and 2018, 20.8% of PLOS publications and 12.2% of BMC publications provided a DAS containing a link to data in a repository. Using a citation prediction model, we also find that articles whose statements link to data in a repository have up to 25.36% (± 1.07%) higher citation impact on average. We discuss the potential implications of these results for authors (researchers) and journal publishers who make the effort of sharing their data in repositories. All our data and code are made available in order to reproduce and extend our results.
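The shape of such a citation-advantage estimate can be sketched in a few lines. This uses synthetic lognormal citation counts, not the study's corpus or its prediction model: regressing log-transformed citations on a binary "links to a repository" indicator reduces, in the two-group case, to a difference of group means, and exp(beta) - 1 gives the estimated proportional advantage.

```python
import math
import random
from statistics import mean

random.seed(0)

# Synthetic citation counts (hypothetical data, illustration only):
# articles whose statement links to a repository vs. all other articles.
linked = [random.lognormvariate(2.2, 1.0) for _ in range(500)]
other = [random.lognormvariate(2.0, 1.0) for _ in range(500)]

# With a single binary regressor, the OLS coefficient on log(citations + 1)
# is just the difference of group means on the log scale.
beta = mean(math.log(c + 1) for c in linked) - mean(math.log(c + 1) for c in other)

# Back-transform to a percentage citation advantage.
advantage_pct = (math.exp(beta) - 1) * 100
print(f"estimated citation advantage: {advantage_pct:.1f}%")
```

A real analysis would control for journal, year, and field, as a citation prediction model does; this sketch shows only the back-transformation from a log-scale coefficient to a percentage advantage.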
Background Governments commonly fund research with specific applications in mind. Such mechanisms may facilitate ‘research translation’ but funders may employ strategies that can also undermine the integrity of both science and government. We estimated the prevalence and investigated correlates of funder efforts to suppress health behaviour intervention trial findings. Methods Our sampling frame was lead or corresponding authors of papers (published 2007–2017) included in a Cochrane review, reporting findings from trials of interventions to improve nutrition, physical activity, sexual health, smoking, and substance use. Suppression events were based on a previous survey of public health academics. Participants answered questions concerning seven suppression events in their efforts to report the trial, e.g., [I was…] “asked to suppress certain findings as they were viewed as being unfavourable.” We also examined associations of study funder, geographical location, targeted health behaviour, country democracy rating, and age of publication with reported suppression. Findings We received responses from 104 authors (50%) of 208 eligible trials, from North America (34%), Europe (33%), Oceania (17%), and other countries (16%). Eighteen percent reported at least one of the seven suppression events relating to the trial in question. The most commonly reported suppression event was funder(s) expressing reluctance to publish because they considered the results ‘unfavourable’ (9% reported). We found no strong associations with the subject of research, funding source, democracy, region, or year of publication. Conclusions Nearly one in five researchers in this global sample reported being pressured to delay, alter, or not publish the findings of health behaviour intervention trials.
Regulating funder and university practices, establishing study registries, and making disclosure of funding conditions in scientific journals compulsory are needed to protect the integrity of public-good research.
Background Systematic reviews (SRs) are useful tools in synthesising the available evidence, but high numbers of overlapping SRs are also discussed in the context of research waste. Although it is often claimed that the number of SRs being published is increasing steadily, there are no precise data on this. We aimed to assess trends in the epidemiology and reporting of published SRs over the last 20 years. Methods A retrospective observational study was conducted to identify potentially eligible SRs indexed in PubMed from 2000 to 2019. From all 572,871 records retrieved, we drew a simple random sample of 4,000. The PRISMA-P definition of SRs was applied to full texts and only SRs published in English were included. Characteristics were extracted by one reviewer, with a 20% sample verified by a second person. Results A total of 1,132 SRs published in 710 different journals were included. The estimated number of SRs indexed was 1,432 (95% CI: 547-2,317) in 2000, 5,013 (95% CI: 3,375-6,650) in 2010, and 29,073 (95% CI: 25,445-32,702) in 2019. Transparent reporting of key items increased over the years. About 7 out of 10 named their article an SR (2000-2004: 41.9% and 2015-2019: 74.4%). In 2000-2004, 32.3% of SRs were based in the UK (0% in China); in 2015-2019, 24.0% were from China and 10.8% from the UK. Nearly all articles from China (94.9%) conducted a meta-analysis (overall: 58.9%). Cochrane reviews (n=84; 7.4%) less often imposed language restrictions, but often did not report the number of records and full texts screened and did not name their article an SR (22.6% vs. 73.4%). Conclusions We observed a more than 20-fold increase in the number of SRs indexed over the last 20 years. In 2019, this is equivalent to 80 SRs per day. Over time, SRs became more diverse with respect to journals, type of review, and country of corresponding authors. The high proportion of meta-analyses from China needs further investigation.
Study registration: Open Science Framework (https://osf.io/pxjrv/).
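The extrapolation behind the yearly estimates above can be sketched as follows: the yearly SR count is the sample proportion of SRs scaled to the record total, with a normal-approximation 95% confidence interval. The specific per-year SR counts in the sample are not reported, so the inputs below are hypothetical; only the sample size (4,000) and the overall record total (572,871) come from the abstract.

```python
import math

def estimate_total(sr_in_sample, sample_size, population_total, z=1.96):
    """Point estimate and normal-approximation 95% CI for the number of SRs,
    scaled from a simple random sample to the full set of indexed records."""
    p = sr_in_sample / sample_size                      # sample proportion
    se = math.sqrt(p * (1 - p) / sample_size)           # standard error of p
    point = p * population_total
    return point, (p - z * se) * population_total, (p + z * se) * population_total

# Hypothetical input: 60 SRs found among the 4,000 sampled records,
# scaled to the 572,871 records retrieved overall.
point, lo, hi = estimate_total(60, 4000, 572871)
print(f"{point:.0f} (95% CI: {lo:.0f}-{hi:.0f})")
```

A per-year estimate would apply the same formula within each year's stratum of records, which is how interval widths like 25,445-32,702 arise.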
The majority of scientific research data in the United States is not shared, meaning that our nation has vast untapped potential to fuel scientific advances. The next administration can dramatically accelerate scientific progress by (i) requiring scientists who receive federal funding to share their research data and (ii) directing federal research agencies to coordinate to build an International Research Data Commons that allows research data to be easily discovered and shared.