Good practices in using open source tools for reproducible science
2020-11-21, 16:00–16:30, Main Lecture Hall

The replication crisis has been characterized as a systemic, methodological problem in academia that stigmatizes research, affects funding, and poses a significant hurdle to progress and innovation. In short, it is the phenomenon in which a large proportion of scientific publications, even those published in high-profile journals with groundbreaking conclusions, cannot be reproduced by independent researchers. The main causes of this crisis are deeply rooted in the mindset of academics who treat research as a personal asset that should be communicated to peers but not shared. Most of the time, academics are afraid of having their intellectual property stolen, of losing citations, or of being criticized for inadequacies in their implementations. Many science advocacy groups have brought this phenomenon to light, which has resulted in policy changes at funding bodies that now demand the release of data and source code as a requirement for funding research proposals. In this talk we show that, although this is a good first step, it does not solve the problem. We demonstrate how researchers can "hide" their methods even while releasing the source code, and how they can obscure the replication process. We also present good practices for releasing code and data without compromising intellectual property rights or risking the loss of citations. Finally, we present the modern ecosystem of open source tools for sharing code, data, images, workflows, results, and hypotheses. As a test case, we also present the newly introduced platform for reproducible science, OpenBio.eu.


This talk focuses on the following issues:
* The replication crisis: the problem, the financial effect, and the social consequences
* The academic mindset: the rationale and progress so far
* Current policies for battling the crisis: how they are enforced and how they are circumvented
* Good practices for releasing research code: ensuring reproducibility while retaining copyright and citations
* Existing open source ecosystem for publishing research
* Research data repositories: Figshare, Zenodo, osf.io
* Assigning DOIs in open source software repositories
* Automating analysis and maximizing reproducibility
* Workflow management systems: Nextflow, Snakemake, Airflow, Argo
* Modelling analysis: CWL (Common Workflow Language)
* Virtualization software (Singularity vs. Docker)
* Deployment software (Toil, Arvados)
* Integration and keeping the complete research process open and reproducible: openbio.eu

See also: PowerPoint presentation