a basic framework for doing scientific research that handles large amounts of data
1. experiment phase with a small subset
1.1 pick a small subset of the data
1.2 design/try many different algorithms / methods
1.3 evaluate all the algorithms / methods / parameters on the selected small subset
1.4 if the best one is not satisfactory, tune the algorithms / parameters and go back to 1.2; otherwise go to phase 2
2. scale up to a larger dataset, rerun the experiments, and fix issues and bugs
3. scale up to the whole dataset
The purpose of starting with a small dataset is to speed up the experiment iteration loop. The faster the iteration, the more experiments can be run and the better the methods that can be found.
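The experiment-phase loop above can be sketched as a small driver function. All names and signatures here (`experiment_phase`, `candidates`, `evaluate`, `tune`) are illustrative placeholders, not from any particular library:

```python
import random

def experiment_phase(dataset, candidates, evaluate, tune, threshold,
                     subset_size=100, max_rounds=5, seed=0):
    """Phase 1 of the framework: iterate quickly on a small subset.

    `candidates` maps a method name to a callable; `evaluate` scores one
    method on a dataset (higher is better); `tune` takes (candidates,
    scores) and returns an updated candidate set.  All of these are
    assumed interfaces for the sketch, not an existing API.
    """
    rng = random.Random(seed)
    # step 1.1: pick a small subset of the data to keep iteration fast
    subset = rng.sample(dataset, min(subset_size, len(dataset)))

    for _ in range(max_rounds):
        # steps 1.2-1.3: evaluate every candidate on the small subset
        scores = {name: fn_score for name, fn_score in
                  ((name, evaluate(fn, subset)) for name, fn in candidates.items())}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            # step 1.4: satisfactory -> hand the winner to phase 2
            return best, scores[best]
        # not satisfactory: tune methods / parameters, loop back to 1.2
        candidates = tune(candidates, scores)
    return best, scores[best]
```

Phases 2 and 3 would then rerun the same `evaluate` call with the winning method on progressively larger slices of the data.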
key elements in experimentation:
1. real dataset
2. algorithm development
3. evaluation metrics
this scientific study methodology also seems to apply to evidence-based government programs. from this paper:
Tiered Evidence Standards in the Department of Education's Investing in Innovation Fund (i3) Competitive Grant Program
The Department of Education’s i3 program provides competitive grants to local education agencies to expand innovative practices that have been demonstrated to improve student achievement, increase high school graduation rates, or increase college enrollment and completion. The program established three tiers: scale-up grants to fund expansion of practices for which there is already strong evidence, validation grants to provide funding to support promising strategies for which there is currently only moderate evidence, and development grants to provide funding to support “high-potential and relatively untested” practices.
The Department of Education also established standards for evidence. “Strong evidence” requires a prior randomized trial or a rigorous quasi-experimental design. “Moderate evidence” is defined as promising research that had a flaw such as insufficient sample sizes or a potential for selection bias that limited the amount of confidence that could be placed in the research.
Over three rounds of competitions, the Department of Education has awarded five scale-up grants, twenty-eight validation grants, and fifty-nine development grants with total grants of $940 million. Most of the scale-up grants provided approximately $50 million each; most of the validation grants were for approximately $15 million; and most of the development grants were for approximately $3 million. Thus, the program reserved the largest blocks of funding for proven practices, while also investing in promising but not fully-proven approaches. It also required rigorous evaluation plans from grantees, so that unproven programs can, over time, become proven programs if they are shown to work.
three phases: development, validation, and scale-up
no loop here, but multiple experiments run in parallel at the same time, since each experiment takes several years.