Dear colleagues and friends,
In 2019, Management Science introduced a Data and Code Disclosure Policy (https://pubsonline.informs.org/page/mnsc/datapolicy) with the objective of ensuring the availability of the material necessary to replicate the research published in the journal. Early in 2020, the editorial board appointed a Data Editor whose responsibility is to ensure that accepted papers comply with the Data and Code Disclosure Policy and to verify the ability to replicate results published by the journal. The Data Editor, Ben Greiner, and his team (Milos Fisar and Ali Ozkes) reviewed their first paper and replication package in April 2020. Between then and the end of March 2021, the Data Editorial team worked with the authors of 123 accepted articles to provide proper replication materials.
The main goal of the review is to ensure, to the best possible extent, that data and code are provided that allow replication of all results in the main manuscript of the article (replication materials for results in the appendix are welcomed but not compulsory). Note, however, that there is currently no capacity to compare the results produced in the replication to the results reported in the paper: that task is left to the academic community, and the data and code review focuses on making it possible.
Since only articles submitted on or after July 1, 2019 fall under the Data and Code Disclosure Policy, the number of reviewed papers has been increasing over time, and this trend is expected to continue for a while.
Now, one year later, it is time to look back and share some insights.
Statistics about reviewed papers
Papers that require data and code review come from all Management Science departments; the distribution across departments reflects the relative size and data/code intensity of these departments (see Figure 1). In terms of methodology, more than half of all papers in data/code review are predominantly empirical. About a quarter of the papers report results from laboratory, online, or field experiments, only 4% are based on surveys, and 17% feature theoretical models, simulations, or computations; the latter often include code only or use data mostly for simulation/demonstration purposes. See Figure 2 for a summary.
Figure 1: Departments of papers in data and code review, 04/2020-03/2021
Figure 2: Types of papers in data and code review, 04/2020-03/2021
Among all papers that include data, 53% of the eventual replication packages include all data necessary to replicate, 39% rely on proprietary data, and 8% provide at least partial data. Among the papers with proprietary data, 42% of the datasets are publicly accessible (e.g., via subscription to services such as WRDS or Compustat), and 25% of the papers provide at least sample or synthetic datasets, which allow the code to be rerun but without producing the exact results reported in the paper.
Statistics about the review process
For those accepted papers that required a replication package review, the data and code review added on average about 17 days to the processing time of a paper. On average, a paper spends about 9 days in review with the Data Editor and his team and about 8 days with the authors for revisions to their replication materials.
Depending on the availability of data and software packages, the data editorial team also checks the validity of the code, i.e., whether it runs through without errors. This was possible for 54% of all replication packages.
Only 24% of submitted replication materials could be accepted as is, without further revisions. About 55% of data and code packages needed one revision, 19% went through two revisions, and 2% required three rounds of revisions until all data/code was provided and, if applicable, the code ran through without errors. Figure 3 breaks down the types of revision requests to authors, separately for papers with and without data, among all papers that entered the data and code review. The figure shows that, most often, the Data Editor has to ask for better documentation of the included datasets and code (e.g., variable dictionaries, instructions on dataset construction, descriptions of where each figure and table is produced, etc.). However, it also shows that the Data Editor often has to ask for code that produces certain figures and tables, and that errors in the submitted code are very common (e.g., for papers with data, given that only 54% of codes could be checked, the 26% rate of code errors implies that nearly half of the checked codes were found faulty).
Figure 3: Revision requests/concerns for papers in data and code review, 04/2020-03/2021
A final piece of good news is that only 7% of data and code reviews required authors to make changes in their paper. In all cases, these changes were minor, and in all cases the respective Department Editor was involved to confirm their acceptance decision in light of the changes.
I would like to thank the Data Editor, Ben Greiner, and his team for all their hard work on behalf of the journal and the management science community. I hope the policy and their work will help advance the areas covered by the journal. Indeed, sharing data and code is of value to the relevant research communities, allowing them to leverage prior work in their own pursuits. This sharing should increase the rate of scientific progress and impact.
The Institute for Operations Research and the Management Sciences