development / experiment / product-market fit, and then scale up

A basic framework for doing scientific research that involves a large amount of data:

1. experiment phase with a small subset

1.1 pick a small subset of the data

1.2 design and try many different algorithms/methods

1.3 evaluate all the algorithms/methods/parameters on the selected small dataset

1.4 if the best one is not satisfactory, tune the algorithms/parameters and go back to 1.2; otherwise go to phase 2

2. scale up to a larger dataset, run the experiments again, and fix issues and bugs

3. scale up to the whole dataset

The purpose of starting with a small dataset is to speed up the experiment iteration loop. The faster the iteration, the more experiments can be run and the better the methods that can be found.
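A minimal sketch of the phase-1 loop in Julia (hypothetical: the candidate methods, the evaluate scoring function, and the subset size are placeholders, not a real library):

# phase 1: try candidate methods on a small subset and keep the best
function experiment_phase1(data, methods, evaluate, threshold)
    subset = data[1:min(1000, end)]        # 1.1 pick a small subset
    best_method, best_score = nothing, -Inf
    for m in methods                       # 1.2 try many different methods
        score = evaluate(m, subset)        # 1.3 evaluate on the small dataset
        if score > best_score
            best_method, best_score = m, score
        end
    end
    # 1.4 if the best score clears the bar, move on to phase 2;
    # otherwise tune the methods/parameters and run this loop again
    return best_score >= threshold ? best_method : nothing
end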

key elements in experimentation:

1. real dataset

2. algorithm development

3. evaluation metrics

4. debugging

The scientific study methodology also seems to apply to evidence-based government programs. From this paper:

Tiered Evidence Standards in the Department of Education’s Investing in Innovation Fund (i3) Competitive Grant Program

The Department of Education’s i3 program provides competitive grants to local education agencies to expand innovative practices that have been demonstrated to improve student achievement, increase high school graduation rates, or increase college enrollment and completion. The program established three tiers: scale-up grants to fund expansion of practices for which there is already strong evidence, validation grants to provide funding to support promising strategies for which there is currently only moderate evidence, and development grants to provide funding to support “high-potential and relatively untested” practices.

The Department of Education also established standards for evidence. “Strong evidence” requires a prior randomized trial or a rigorous quasi-experimental design. “Moderate evidence” is defined as promising research that had a flaw such as insufficient sample sizes or a potential for selection bias that limited the amount of confidence that could be placed in the research.

Over three rounds of competitions, the Department of Education has awarded five scale-up grants, twenty-eight validation grants, and fifty-nine development grants with total grants of $940 million. Most of the scale-up grants provided approximately $50 million each; most of the validation grants were for approximately $15 million; and most of the development grants were for approximately $3 million. Thus, the program reserved the largest blocks of funding for proven practices, while also investing in promising but not fully-proven approaches. It also required rigorous evaluation plans from grantees, so that unproven programs can, over time, become proven programs if they are shown to work.

three phases: development, validation, and scale-up

There is no loop here, but multiple experiments run in parallel at the same time, since each experiment takes several years.


positive feedback

The day before yesterday I installed any.do, opened the app, and added a test task.

Yesterday a notification popped up saying there was one task.

I opened the app and checked the task off.

Later another notification popped up, congratulating me on completing the task.

It felt great.

The hooked model calls this a kind of reward.

But I think positive feedback describes it more accurately.

In fact, the core of both the hooked model and growth hacking is positive feedback.

The simplest example of positive feedback is pointing a microphone at a speaker: even if no one says anything, even very faint ambient noise gets rapidly amplified.

A business whose viral coefficient exceeds one is in positive feedback.
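A minimal sketch in Julia of why the threshold of one matters (the simulate function and its numbers are made up for illustration):

# each user in the newest cohort invites k more users per cycle;
# with k < 1 the total converges, with k > 1 it keeps compounding
function simulate(u0, k, cycles)
    total, cohort = u0, u0
    for _ in 1:cycles
        cohort *= k          # users recruited in this cycle
        total += cohort
    end
    return total
end

simulate(100.0, 1.2, 10)   # k > 1: about 3200 users and still accelerating
simulate(100.0, 0.5, 10)   # k < 1: converges toward 100 / (1 - 0.5) = 200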

Getting addicted to something is also a result of positive feedback.

A good interface should likewise have well-designed feedback.

The feedback mechanism can be email, notifications, or recommendations.

Hooked model and growth hacker marketing

Finished both books. Both are pretty good. Category: new knowledge.

The hooked model is about turning a product into something users use habitually, something they get addicted to. It has four elements: trigger, action, variable reward, and investment. The hooked model addresses the retention problem.

The first step of growth hacking is PMF, i.e. product-market fit. PMF comes first; then comes growth. Start by pulling users in: don't buy ads, but post in targeted places where your prospective customers gather. Once there are initial users, find ways to go viral, which means building viral features into the product itself, like Dropbox giving users extra storage for referring friends, or appending a one-line plug for the product to users' content. The last step is retention, which goes back to the hooked model.

julia studio

http://forio.com/products/julia-studio/

julia> 1 + 2
3

julia> x = rand(2,2)
2x2 Array{Float64,2}:
 0.395255  0.639148
 0.619408  0.232032

julia> y = rand(2, 1)
2x1 Array{Float64,2}:
 0.0550153
 0.47498  

julia> x * y
2x1 Array{Float64,2}:
 0.325328
 0.144288

julia> x'
2x2 Array{Float64,2}:
 0.395255  0.619408
 0.639148  0.232032

julia> x[1]
0.3952545670997125

julia> x[:]
4-element Array{Float64,1}:
 0.395255
 0.619408
 0.639148
 0.232032

julia> u, s, v = svd(x)
(
2x2 Array{Float64,2}:
 -0.761492  -0.648174
 -0.648174   0.761492,

[0.9483465520845085,0.32074946151237244],
2x2 Array{Float64,2}:
 -0.740729   0.671804
 -0.671804  -0.740729)

julia> u * diagm(s) * v'
2x2 Array{Float64,2}:
 0.395255  0.639148
 0.619408  0.232032
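As a quick numerical check (continuing the same session; norm is built into Julia), the reconstruction error should be at the level of floating-point round-off:

julia> norm(u * diagm(s) * v' - x)   # expected: on the order of 1e-16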

Here is a good resource for learning Julia:

http://learnxinyminutes.com/docs/julia/

RI DataHub

http://ridatahub.org/

The RI DataHUB is a central resource for anyone interested in using data to understand the well-being of people in Rhode Island.

The DataHUB brings together data sets from multiple federal, state and local sources. The site allows you to select the data of your choice and visualize it in charts, graphs, maps and more. The ability to see relationships between data sets sheds light on important details and allows for new insights into policy or programmatic questions about the well-being of Rhode Islanders.

Policymakers, program planners and grants writers can use the DataHUB to demonstrate where to target scarce public resources and explain the data-driven rationale behind policy decisions.

// story slides are good

// weave is slow and hard to use

Building on Recent Advances in Evidence-Based Policymaking

A paper jointly released by Results for America and The Hamilton Project in April 2013:

These strategies:

subsidize learning and experimentation so that new solutions are developed,

increase the amount of evidence on the effectiveness of existing and potential new programs,

make greater use of evidence in budget and management decisions,

make purposeful efforts to target improved outcomes for particular populations,

and spur innovation and align incentives through cross-sector and community-based collaborations.

This paper describes the new strategies. It also proposes several steps to advance the use of evidence-based policy in the federal government, including

giving agencies the authority to reserve a percentage of program spending to fund program evaluations

and expanding the use of tiered evidence standards in grant competitions.

Finally, it recommends two initiatives that would supplement the diffusion of these evidence-based practices with a more-focused approach that aims to supply solutions for specific high-priority social problems.

The Ten-Year Challenge would tackle ten social problems by establishing data-driven, outcome-focused initiatives in one hundred communities.

A federal Pay for Success initiative would help state and local governments establish Pay for Success projects in areas like early-childhood education where state and local activity has the potential to achieve important federal policy objectives or produce significant federal budget savings.

DSHS Integrated Client Database

From here (2011): DSHS’s Integrated Client Database (ICDB) is a longitudinal client database containing 10 or more years of detailed service risks, history, costs, and outcomes. The ICDB is used to support cost-benefit and cost-offset analyses, program evaluations, operational program decisions, geographical analyses, and in-depth research. DSHS serves over 2.2 million clients a year. The ICDB is the only place where all the client information comes together. From this central DSHS client database, we get a current and historical look into the life experiences of residents and families who encounter the state’s social service system.

Internal to the ICDB are more than 80 PL/SQL packages containing more than 80 thousand lines of PL/SQL code. The SAS processes consist of more than 100 main program files and several hundred supporting code files containing more than 60,000 lines of code. These have produced about a terabyte of data files, including more than 500 SAS data sets and 325 Excel spreadsheets. The relational database management system is Oracle, with the partitioning and spatial options.