From 7593e7940d0bd7ce3fecd46e582329773abf3f19 Mon Sep 17 00:00:00 2001
From: JianqiaoWang
Date: Thu, 2 Mar 2017 22:55:30 +0800
Subject: [PATCH 1/7] Update Rubin_all.Rmd

---
 Rubin_all.Rmd | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/Rubin_all.Rmd b/Rubin_all.Rmd
index 899b69b..ce7fed1 100644
--- a/Rubin_all.Rmd
+++ b/Rubin_all.Rmd
@@ -11,27 +11,27 @@ knitr::opts_chunk$set(echo = TRUE)
 Abstract. Donald Bruce Rubin is John L. Loeb Professor of Statistics at Harvard University. He has made fundamental contributions to statistical methods for missing data, causal inference, survey sampling, Bayesian inference, computing and applications to a wide range of disciplines, including psychology, education, policy, law, economics, epidemiology, public health and other social and biomedical sciences.
-摘要:Donald Bruce Rubin是哈佛大学统计系...教授。他在缺失数据,因果推断,抽样调查,贝叶斯推断等统计学方法上做了许多基础贡献。且在心理学,教育,政策,法律,经济,流行病,公共卫生河其他社会及生物医学领域的计算及应用上有许多贡献。
+摘要:Donald Bruce Rubin是哈佛大学统计系教授。他在缺失数据,因果推断,抽样调查,贝叶斯推断等统计学方法上做了许多基础贡献。且在心理学,教育,政策,法律,经济,流行病,公共卫生和其他社会及生物医学领域的计算及应用上有许多贡献。
 Don was born in Washington, D.C. on December 22, 1943, to Harriet and Allan Rubin. One year later, his family moved to Evanston, Illinois, where he grew up. He developed a keen interest in physics and mathematics in high school. In 1961, he went to college at Princeton University, intending to major in physics, but graduated in psychology in 1965. He began graduate school in psychology at Harvard, then switched to Computer Science (MS, 1966) and eventually earned a Ph.D. in Statistics under the direction of Bill Cochran in 1970. After graduating from Harvard, he taught for a year in Harvard’s Department of Statistics, and then in 1971 he began working at Educational Testing Service (ETS) and served as a visiting faculty member at Princeton’s new Statistics Department.
He held several visiting academic appointments in the next decade at Harvard, UC Berkeley, University of Texas at Austin and the University of Wisconsin at Madison. He was a full professor at the University of Chicago in 1981–1983, and in 1984 moved back to the Harvard Statistics Department, where he remains until now, and where he served as chair from 1985 to 1994 and from 2000 to 2004.
-Don在Harriet和Allan Rubin之后于1943年12月22日出生于华盛顿。一年后,举家搬到伊利诺伊Evanston,他在那里长大。他在高中发展了对物理和数学浓厚的兴趣。1961年,就读于普林斯顿大学,打算主修物理,却在1965年毕业于心理系。然后他就读于哈佛大学心理学研究生院,在1966年转到计算机系硕士并最终在Bill Cochran的指导下,于1970年获得统计学博士学位。从哈佛毕业以后,他在哈佛大学统计系担任一年教职,1971年开始在ETS工作并且是普林斯顿大学新的统计系的访问学者。十年中,他在哈佛大学,加州大学伯克利分校,得克萨斯大学奥斯汀分校,威斯康星麦迪逊分校举办过多个学术会议。在1981-1983年间,他是芝加哥大学的全职教授,于1984年重返哈佛大学统计系工作至今,并分别在1985-1994年,2000-2004年担任主席。
+Don于1943年12月22日出生于华盛顿,其父母是在Harriet和Allan Rubin。一年后,举家搬到伊利诺伊Evanston,他在那里长大。他在高中发展了对物理和数学浓厚的兴趣。1961年就读于普林斯顿大学时,他本打算主修物理,却在1965年毕业于心理系。然后他就读于哈佛大学心理学研究生院,在1966年转到计算机系硕士并最终在Bill Cochran的指导下,于1970年获得统计学博士学位。从哈佛毕业以后,他在哈佛大学统计系担任一年教职,1971年开始在ETS工作并且是普林斯顿大学新的统计系的访问学者。之后十年里,他在哈佛大学,加州大学伯克利分校,得克萨斯大学奥斯汀分校,威斯康星麦迪逊分校举办过多个学术会议。在1981-1983年间,他是芝加哥大学的全职教授,于1984年重返哈佛大学统计系工作至今,并分别在1985-1994年,2000-2004年担任主席。
 Don has advised or coadvised over 50 Ph.D. students, written or edited 12 books, and published nearly 400 articles. According to Google Scholar, by May 2014, Rubin’s academic work has 150,000 citations, 16,000 in 2013 alone, placing him at the top with the most cited scholars in the world.
-Don指导和共同指导超过50名博士生,撰写和编辑12本著作,发表近400篇文章。根据谷歌学术到2014年5月的统计,Rubin的学术成果以有15万次引用,单单2013年就有1万6千次引用,在全球学者中名列前茅。
+Don指导和共同指导超过50名博士生,撰写和编辑12本著作,发表近400篇文章。根据谷歌学术到2014年5月的统计,Rubin的学术成果已有15万次引用,单单2013年就有1万6千次引用,在全球学者中名列前茅。
 For his many contributions, Don has been honored by election to Membership in the US National Academy of Sciences, the American Academy of Arts and Sciences, the British Academy, and Fellowship in the American Statistical Association, Institute of Mathematical Statistics, International Statistical Institute, Guggenheim Foundation, Humboldt Foundation and Woodrow Wilson Society. He has also received the Samuel S. Wilks Medal from the American Statistical Association, the Parzen Prize for Statistical Innovation, the Fisher Lectureship and the George W. Snedecor Award of the Committee of Presidents of Statistical Societies. He was named Statistician of the Year by the American Statistical Association’s Boston and Chicago Chapters. In addition, he has received honorary degrees from Bamberg University, Germany and the University of Ljubljana, Slovenia.
-由于他的诸多贡献,Don被推选为美国国家科学院成员,美国艺术与科学学院成员,英国国家学术院成员,美国统计学会理事,国际数理统计协会理事,国际统计学会理事,古根海姆基金会理事,洪堡基金会理事,Woodrow Wilson Society理事。他获得了美国统计学会Samuel S. Wilks奖,统计创新奖Parzen Prize,the Fisher Lectureship and the George W. Snedecor Award of the Committee of Presidents of Sta- tistical Societies.他被美国统计学会的波士顿和芝加哥篇章誉为年度统计学家。此外,他获得了Bamberg University, Germany and the University of Ljubljana, Slovenia的荣誉学位。
+由于他的诸多贡献,Don被推选为美国国家科学院成员,美国艺术与科学学院成员,英国国家学术院成员,美国统计学会理事,国际数理统计协会理事,国际统计学会理事,古根海姆基金会理事,洪堡基金会理事,Woodrow Wilson Society理事。他获得过美国统计学会Samuel S. Wilks奖,统计创新奖Parzen Prize,以及由统计学会会长委员会(the Committee of Presidents of Statistical Societies,即COPSS)颁发费舍奖the Fisher Lectureship和斯尼德克奖the George W. Snedecor Award. 他被美国统计学会的波士顿和芝加哥篇章誉为年度统计学家。此外,他获得了Bamberg University, Germany and the University of Ljubljana, Slovenia的荣誉学位。
 Besides being a statistician, he is a music lover, audiophile and fan of classic sports cars.
-除了成为统计学家,他还是一个音乐爱好者,古典乐,体育,汽车发烧友。
+除了统计学家的身份,他还是一个音乐爱好者,古典乐,体育,汽车发烧友。
 This interview was initiated on August 7, 2013, during the Joint Statistical Meetings 2013 in Montreal, in anticipation of Rubin’s 70th birthday, and completed at various times over the following months.
-本访谈于2013年8月7日进行,在2013年JSM大会期间,Rubin70岁生日之时,在接下来的几个月的不同时间完成。
+为庆祝Rubin70岁生日,本访谈开始于2013年8月7日2013年JSM大会期间,并在接下来的几个月的不同时间完成。
 # 开始
@@ -41,7 +41,7 @@ Fan:让我们从你的小时候说起吧。我知道你出生于一个律师
 Don: Yes. My father was the youngest of four brothers, all of whom were lawyers, and we used to have stimulating arguments about all sorts of topics. Probably the most argumentative uncle was Sy (Seymour Rubin, senior partner at Arnold, Fortas and Porter, diplomat, and professor of law at American University), from D.C., who had framed personal letters of thanks for service from all the presidents starting with Harry Truman and going through Jerry Ford, as well as from some contenders, such as Adlai Stevenson, and various Supreme Court Justices. I found this impressive but daunting. The relevance of this is that it clearly created in me a deep respect for the principles of our legal system, to which I find statistics highly relevant—this has obviously influenced my own application of statistics to law, for example, concerning issues as diverse as the death penalty, affirmative action and the tobacco litigation.
-Don:是的。我父亲是他们4兄弟中最年轻的,他们4兄弟都是律师,我们过去在所有的话题上都有激烈的辩论。可能最好辩的是Sy,他在华盛顿,他从总统那里获得了许多感谢信,同样也从一些竞争对手获得了很多感谢信,如Adlai Stevenson和不同的高级法庭。这让我印象深刻,但也让我害怕。原因是,这让我对我们的法律体系所遵守的原则印象深刻,在这里面,我发现统计学与它高度相关,这明显触动我做了统计学在法律上的应用,例如,把问题考虑成包括死刑,平权法案和烟草诉讼。
+Don:是的。我父亲是他们4兄弟中最年轻的,他们4兄弟都是律师,我们过去在所有的话题上都有激烈的辩论。可能最好辩的叔叔是Sy(Seymour Rubin),他在华盛顿,他曾经写过私人感谢信,来表达他对从Harry Truman到Jerry Ford所有总统的感谢,同时他也感谢了对一些竞争对手如Adlai Stevenson和不同的高级法庭。这让我印象深刻,但也让我害怕。原因是,这让我对我们的法律体系所遵守的原则印象深刻,在这里面,我发现统计学与它高度相关,这明显触动我做了统计学在法律上的应用,例如,把问题考虑成包括死刑,平权法案和烟草诉讼。
 Fabri: We will surely get back to these issues later, but was there anyone else who influenced your interest in statistics?
@@ -49,7 +49,7 @@ Fabri: 等下我们再回到这些问题,但还有其他人影响了你对统
 Don: Probably the most influential was Mel, my mother’s brother, a dentist (then a bachelor). He loved to gamble small amounts, either in the bleachers at Wrigley Field, betting on the outcome of the next pitch, while watching the Cubs lose, or at Arlington Race track, where I was taught at a young age how to read the Racing Form and estimate the “true” odds from the various displayed betting pools, while losing two dollar bets. Wednesday and Saturday afternoons, during the warm months when I was a preteen, were times to learn statistics—even if at various bookie joints that were sometimes raided. As I recall, I was a decent student of his, but still lost small amounts.
-Don:可能影响最深的是我舅舅,一名牙医。他喜欢小赌,在芝加哥箭牌球场的露天看台看到比赛输了然后开始赌一下场比赛的结果,或者在阿灵顿赛道,在我很小的时候,当我输了2刀的时候,被教导如何阅读赛马新闻,从不同的赌注估计胜算。周三和周六下午,当我还是个青春期少年的时候,在温暖的月份,我就开始学习统计,即使不同的赌注下有时候会被突然袭击。根据我的回忆,我那时是他相当优秀的学生,但是仍然输了一小部分。
+Don:可能影响最深的是我舅舅,一名牙医。他喜欢小赌,要么在芝加哥箭牌球场的露天看台后赌下场比赛的结果,然后看到Cubs队比赛输了,或者在阿灵顿赛道,输了2刀。也是在那里,我小时候就被教导了如何阅读赛马新闻,从不同的赌注估计胜算。周三和周六下午,当我还是个青春期少年的时候,在温暖的月份,我开始学习统计,即使不同的赌注下有时候会被突然袭击。根据我的回忆,我那时是他相当优秀的学生,但是仍然输了一小部分。
 There were two other important influences on my statistical interests from the late 1950s and early 1960s.
First, there was an old friend of my father’s from their government days together, a Professor Emeritus of Economics at UC Berkeley, George Mehren, with whom I had many entertaining and educational (to me) arguments, which generated a respect for economics that continues to grow to this day. And second, my wonderful teacher of physics at Evanston Township High School—Robert Anspaugh—who tried to teach me to think like a real scientist, and how to use mathematics in the pursuit of science.
@@ -69,7 +69,7 @@ Fan: 你在1961年进入普林斯顿,开始主修物理,但后来变成了
 Don: That’s a good question. Inspired by Anspaugh, I wanted to become a physicist. I was lined up for a BA in three years when I entered Princeton, and unknown to me before I entered, also lined up for a crazy plan to get a Ph.D. in physics in five years, in a program being reconditely planned by John Wheeler, a very well-known professor of physics there (and Richard Feynman’s Ph.D. advisor years earlier). In retrospect, this was a wildly over-ambitious agenda, at least for me. For a combination of complications, including the Vietnam War (and its associated drafts) and Professor Wheeler’s sabbatical at a critical time, I think no one succeeded in completing a five-year Ph.D. from entry. In any case, there were many kids like me at Princeton then, who, even though primarily interested in math and physics, were encouraged to explore other subjects. I did that, and one of the courses I took was on personality theory, taught by a wonderful professor, Silvan Tomkins, who later became a good friend. At the end of my second year, I switched from Physics to Psychology, where my mathematical and scientific background seemed both rare and appreciated—it was an immature decision (not sure what a mature one would have been), but a fine one for me because it introduced me to some new ways of thinking, as well as to new fabulous academic mentors.
-Don:这个问题好。在Anspaugh的鼓舞下,我想成为一个物理学家。我进入普林斯顿被排了3年拿到BA学位,在我进来之前我并不知道,而且被安排了一个疯狂的5年拿到物理学博士的计划,这是物理系一个非常有名的John Wheeler教授做的计划。回想起来,对于我来说,那是一个雄心勃勃的日程。结合一些其他的复杂因素,包括越南战争,Wheeler教授在关键时刻休假,我觉得没人能在5年内完成博士学位。不管怎样,在普林斯顿还有很多孩子和我一样,即使开始对数学和物理感兴趣,也被鼓励去其他学科探索。我就是这么做的,我上的一门课叫人格理论,是一个很好的教授教的,后来我们成为了朋友。在第二学年末,我从物理转到了心理系,在那,我的数学和科学背景看起来都很弱,喜欢心理学是一个不成熟的决定(我也不确定什么是成熟的),但对我来说,好的是,它带给我一些全新的思考方式,也让我认识了一些新的著名学者。 +Don:这个问题好。在Anspaugh的鼓舞下,我想成为一个物理学家。我进入普林斯顿被排了3年拿到BA学位,在我进来之前我并不知道,而且被安排了一个疯狂的5年拿到物理学博士的计划,这是物理系一个非常有名的John Wheeler教授做的计划。回想起来,对于我来说,那是一个雄心勃勃的日程。结合一些其他的复杂因素,包括越南战争,Wheeler教授在关键时刻休假,我觉得没人能在5年内完成博士学位。不管怎样,在普林斯顿还有很多孩子和我一样,即使开始对数学和物理感兴趣,也被鼓励去其他学科探索。我就是这么做的,我上的一门课叫人格理论,是一个很好的教授教的,后来我们成为了朋友。在第二学年末,我从物理转到了心理系,在那,我的数学和科学背景是很稀缺,并且被欣赏。喜欢心理学是一个不成熟的决定(我也不确定什么是成熟的),但对我来说,好的是,它带给我一些全新的思考方式,也让我认识了一些新的著名学者。 Fabri: You had some computing skills which were uncommon then, right? So you started to use comput- ers quite early. @@ -93,7 +93,7 @@ Fan:你在普林斯顿高年级的时候,申请了心理学的博士项目 Don: Yes, I was accepted by Stanford, Michigan and Harvard. I met some extraordinary people during my visits to these programs. I went out to Stanford first, and met William Estes, a quiet but wonderful profes- sor with strong mathematical skills and a wry wit, who later moved to Harvard. Michigan had a very strong mathematical psychology program, and when I visited in the spring of 1965, I was hosted primarily by a very promising graduating Ph.D. student, Amos Tversky, who was doing extremely interesting work on human behavior and how people handled risks. In later years, he connected with another psychologist, Daniel Kah- neman, and they wrote a series of extremely influential papers in psychology and economics, which eventually led to Kahneman’s winning the Nobel Prize in Eco- nomics in 2002; Tversky passed away in 1996 and was thus not eligible for the Nobel Prize. 
Kahneman (who recently was awarded a National Medal of Science by President Obama) always acknowledges that the Nobel Prize was really a joint award (to Tversky and him). I was on a committee sometime last year with Kahne- man, and it was interesting to find out that I had known Tversky longer than he had. -Don:是的,我被斯坦福,密歇根和哈佛录取了。在我访问这些项目的过程中认识了很多杰出人物。我先去了斯坦福,认识了William Estes,一个很安静的但是很厉害的教授,他有很强的数学技能并且风趣幽默。后来他去了哈佛。密歇根有很强的数学心理学项目,当我1965年春天访问的时候,我被一个非常有前途的博士生接待了,Amos Tversky,当时在做非常有趣的人类行为和人是如何处理风险的项目。后来的几年,他和另一个心理学家Daniel Kahneman一起在心理学和经济学领域写了一系列极有影响力的文章,最终使得Kahneman在2002年获得诺贝尔经济学奖。Tversky在1996年去世,因此没有资格拿到诺贝尔奖。Kahneman经常致谢那个诺贝尔奖是一个共同的奖项。去年我和Kahneman一起作为评审,非常有意思的是,发现我认识Tversky的时间比他长。 +Don:是的,我被斯坦福,密歇根和哈佛录取了。在我访问这些项目的过程中认识了很多杰出人物。我先去了斯坦福,认识了William Estes,一个很安静的但是很厉害的教授,他有很强的数学技能并且风趣幽默。后来他去了哈佛。密歇根有很强的数学心理学项目,当我1965年春天访问的时候,我被一个非常有前途的博士生接待了,Amos Tversky,当时在做非常有趣的人类行为和人是如何处理风险的项目。后来的几年,他和另一个心理学家Daniel Kahneman一起在心理学和经济学领域写了一系列极有影响力的文章,最终使得Kahneman在2002年获得诺贝尔经济学奖。Tversky在1996年去世,因此没有资格拿到诺贝尔奖。Kahneman经常致谢那个诺贝尔奖是对他俩共同的奖励。去年我和Kahneman一起作为评审,非常有意思的是,发现我认识Tversky的时间比他长。 Fan: But ultimately you chose Harvard. @@ -113,7 +113,7 @@ Fabri: 1965年你最初到哈佛是心理学系的博士生,社会关系学院 Don: When I visited Harvard in the summer of 1965, some senior people in Social Relations appeared to find my background, in subjects like math and physics, at- tractive, so they promised me that I could skip some of the basic more “mathy” requirements. But when I arrived there, the chair of the department, a sociolo- gist, told me something like, “No, no, I looked over your transcript and found your undergraduate educa- tion scientifically deficient because it lacked ‘methods and statistics’ courses. You will have to take them now or withdraw.” Because of all the math and physics that I’d had at Princeton, I felt insulted! I had to get out of there. Because I had independent funding from an NSF graduate fellowship, I looked around. 
At the time, the main applied math appeared being done in the Di- vision of Engineering and Applied Physics, which re- cently became the Harvard’s “School of Engineering and Applied Sciences.” The division had several sec- tions; one of them was computer science (CS), which seemed happy to have me. -Don:我1965年夏天访问哈佛大学的时候,社会关系系高年级的人知道了我的背景,数学、物理是很吸引人的学科,他们向我保证可以跳过一些数学类的基础课的要求。但当我到了那,系主任,一个社会学家告诉我,“不,不,我看了你的成绩单,觉得你的本科教育缺乏科学性,因为缺少‘方法和统计’这样的课程,你必须要修,不然就退学”。由于我在普林斯顿上了所有数学和物理的课程,我感觉很侮辱。我必须要离开那。由于我拿了NSF的研究生全额奖学金,我开始寻找。在那时,工程和应用物理系新开了应用数学,它最近成为了哈佛的工程和应用科学学院。它有很多部分,其中一个就是计算机,看起来很愿意接收我。 +Don:我1965年夏天访问哈佛大学的时候,社会关系系高年级的人知道了我的数学、物理背景,那很吸引它们,他们向我保证可以跳过一些数学类的基础课的要求。但当我到了那,系主任,一个社会学家告诉我,“不,不,我看了你的成绩单,觉得你的本科教育缺乏科学性,因为缺少‘方法和统计’这样的课程,你必须要修,不然就退学”。由于我在普林斯顿上了所有数学和物理的课程,我感觉很侮辱。我必须要离开那。由于我拿了NSF的研究生全额奖学金,我开始寻找。在那时,工程和应用物理系新开了应用数学,它最近成为了哈佛的工程和应用科学学院。它有很多部分,其中一个就是计算机,看起来很愿意接收我。 Fan: But you got bored again soon. Was this because you found the problems in CS not interesting or chal- lenging enough? @@ -121,7 +121,7 @@ Fan: 但是你很快再次觉得很无聊。是因为你觉得计算机没意思 Don: No, not really that. There were several reasons. First, there was a big emphasis on automatic language translation, because it was cold war time, and it ap- peared that CS got a lot of money for computational linguistics from ARPA (Advanced Research Projects Agency), now known as DARPA. The Soviet Union, from behind the iron curtain, produced a huge num- ber of documents in Russian, but evidently there were not enough people in the US to translate them. A com- plication is that there are sentences that you could not translate without their context. I still remember one example: “Time flies fast,” a three-word sentence that has three different meanings depending on which of the three words is the verb. If this three-word sentence cannot be automatically translated, how can one get an automatic (i.e., by computer) translation of a complex paragraph? Related to this was Noam Chomsky’s work on transformational grammars, down the river at MIT. 
-Don:不,不是那样的。有好多因素。首先,它很强调自动化的语言翻译,因为那是冷战时期,计算机系得到了来自ARPA(高级研究项目代理,现在是DARPA)的很多经费用于计算语言。苏联,从铁幕背后生产了大量的文件,但很显然美国没有足够的人手去翻译。复杂的是,有些句子没有语境的情况下你无法翻译。我记得一个例子:“时光飞逝”,一个三个单词的句子,可能有三个意思,这取决于这三个单词哪个是动词。如果这个三个单词的句子不能被自动化地翻译,又怎么能自动化翻译一个复杂的段落呢?Noam Chomsky的翻译语法跟这相关,在麻省理工。
+Don:不,不是那样的。有好多因素。首先,它很强调自动化的语言翻译,因为那是冷战时期,计算机系得到了来自ARPA(高级研究项目代理,现在是DARPA)的很多经费用于这个研究。在当时冷战背景下,苏联生产了大量的俄语文件,但很显然美国没有足够的人手去翻译。复杂的是,有些句子没有语境的情况下你无法翻译。我记得一个例子:“时光飞逝”,一个三个单词的句子,可能有三个意思,这取决于这三个单词哪个是动词。如果这个三个单词的句子不能被自动化地翻译,又怎么能自动化翻译一个复杂的段落呢?Noam Chomsky的翻译语法跟这相关,在麻省理工。
 Second, although I found some real math courses and the ones in CS on mathy topics, such as computational complexity, which dealt with Turing machines, Godel’s theorem, etc., interesting, I found many of the courses dull. Much of the time they were about programming. I remember one of my projects was to write a program to project 4-dimensional figures into 2-dimensions, and then rotate them using a DEC PDP-1. It took an enormous number of hours. Even though my program worked perfectly, I felt it was a gigantic waste of time. I also got a C+ in that course because I never went to any of the classes. Now, having dealt with many students, I would be more sympathetic that I deserved a C+, but not when I was a kid. At that time, I figured there must be something better to do than rotating 4D objects and getting a C+. But marching through rice paddies in Vietnam or departing for somewhere in Canada didn’t seem appealing. So after picking up a MS degree in CS in 1966, although I stayed another year in CS, I was ready to try something else.
@@ -632,4 +632,4 @@ Don:享受统计!不要暴躁。如果幸运的话,你可以有一个精彩
 We thank Elizabeth Zell, Guido Imbens, Tom Belin, Rod Little, Dale Rinkel and Alan Zaslavsky for helpful suggestions. This work is partially funded by NSF-SES Grant 1155697.
-我们特别感谢Elizabeth Zell, Guido Imbens, Tom Belin, Rod Little, Dale Rinkel以及 Alan Zaslavsky提出的宝贵意见。这项工作有美国国家科学基金会的部分资金补助。
\ No newline at end of file
+我们特别感谢Elizabeth Zell, Guido Imbens, Tom Belin, Rod Little, Dale Rinkel以及 Alan Zaslavsky提出的宝贵意见。这项工作有美国国家科学基金会的部分资金补助。

From e723dcfafccfafdb81947e4ac39033df0a9e7711 Mon Sep 17 00:00:00 2001
From: JianqiaoWang
Date: Thu, 2 Mar 2017 23:48:22 +0800
Subject: [PATCH 2/7] Update Rubin_all.Rmd

---
 Rubin_all.Rmd | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Rubin_all.Rmd b/Rubin_all.Rmd
index ce7fed1..617e209 100644
--- a/Rubin_all.Rmd
+++ b/Rubin_all.Rmd
@@ -113,7 +113,7 @@ Fabri: 1965年你最初到哈佛是心理学系的博士生,社会关系学院
 Don: When I visited Harvard in the summer of 1965, some senior people in Social Relations appeared to find my background, in subjects like math and physics, attractive, so they promised me that I could skip some of the basic more “mathy” requirements. But when I arrived there, the chair of the department, a sociologist, told me something like, “No, no, I looked over your transcript and found your undergraduate education scientifically deficient because it lacked ‘methods and statistics’ courses. You will have to take them now or withdraw.” Because of all the math and physics that I’d had at Princeton, I felt insulted! I had to get out of there. Because I had independent funding from an NSF graduate fellowship, I looked around. At the time, the main applied math appeared being done in the Division of Engineering and Applied Physics, which recently became the Harvard’s “School of Engineering and Applied Sciences.” The division had several sections; one of them was computer science (CS), which seemed happy to have me.
-Don:我1965年夏天访问哈佛大学的时候,社会关系系高年级的人知道了我的数学、物理背景,那很吸引它们,他们向我保证可以跳过一些数学类的基础课的要求。但当我到了那,系主任,一个社会学家告诉我,“不,不,我看了你的成绩单,觉得你的本科教育缺乏科学性,因为缺少‘方法和统计’这样的课程,你必须要修,不然就退学”。由于我在普林斯顿上了所有数学和物理的课程,我感觉很侮辱。我必须要离开那。由于我拿了NSF的研究生全额奖学金,我开始寻找。在那时,工程和应用物理系新开了应用数学,它最近成为了哈佛的工程和应用科学学院。它有很多部分,其中一个就是计算机,看起来很愿意接收我。
+Don:我1965年夏天访问哈佛大学的时候,社会关系系高年级的人知道了我的数学、物理背景,那很吸引他们,他们向我保证可以跳过一些数学类的基础课的要求。但当我到了那,系主任,一个社会学家告诉我,“不,不,我看了你的成绩单,觉得你的本科教育缺乏科学性,因为缺少‘方法和统计’这样的课程,你必须要修,不然就退学”。由于我在普林斯顿上了所有数学和物理的课程,我感觉很侮辱。我必须要离开那。由于我拿了NSF的研究生全额奖学金,我开始寻找其他出路。在那时,工程和应用物理系新开了应用数学,它最近成为了哈佛的工程和应用科学学院。它有很多部分,其中一个就是计算机,看起来很愿意接收我。
 Fan: But you got bored again soon. Was this because you found the problems in CS not interesting or challenging enough?
@@ -133,7 +133,7 @@ Fabri: 统计是如何成为你的最终之路的?
 Don: A summer job in Princeton in 1966 led to it. I did some programming for John Tukey in Fortran, LISP and COBOL. I also did some consulting for a Princeton sociology professor, Robert Althauser, basically writing programs to do matched sampling, matching blacks and whites, to study racial disparity in dropout rates at Temple University. I had a conversation with Althauser about how psychology and then CS weren’t working out for me at Harvard. Because Bob was doing some semi-technical things in sociology, he knew of Fred Mosteller, although not personally, and also knew that Harvard had a decade-old Statistics Department that was founded in 1957. He suggested that I contact Mosteller. After getting back to Harvard, I talked to Fred, and he suggested that I take some stat courses. So in my third year in Harvard, I took mostly stat courses and did OK in them. And the Stat department said “Yes” to me. It also helped to have my own NSF funding, which I had from the start; they kept renewing for some reason, showing their bad taste probably, but it worked out well for me. Anyway, at the end of my third year at Harvard, I had switched to statistics, my third department in four years.
-Don:1966年夏天在普林斯顿的工作成就了这件事。我用Fortran,LISP和COBOL为John Tukey写了一些程序。我也为普林斯顿社会学教授Robert Althauser做了一些咨询,写一些程序匹配抽样,匹配黑人和白人,研究天普大学辍学率的种族差异。我跟Althauser谈过关于心理学是怎样的以及为什么计算机专业在哈佛不适合我。因为Bob在社会学上做一些半技术的东西,他认识Fred Mosteller,虽然不是个人名义,也知道哈佛有一个1957年成立的统计系。他建议我联系Mosteller。在回到哈佛后,我和Fred谈了一下,他建议我修一些统计课程。所以在第三年,我大部分上的都是统计课程并且还不错。统计系接收了我并帮我拿到我自己的NSF资助,从我一开始入学就有。由于某些原因,他们一直重新开始,可能显示了他们的坏品味,但是对我来说很好。不管怎样,在哈佛的第三年末,我转到了统计系,四年中的第三个系。 +Don:1966年夏天在普林斯顿的工作成就了这件事。我用Fortran,LISP和COBOL为John Tukey写了一些程序。我也为普林斯顿社会学教授Robert Althauser做了一些咨询,写一些程序匹配抽样,匹配黑人和白人,研究天普大学辍学率的种族差异。我跟Althauser谈过关于心理学是怎样的以及为什么计算机专业在哈佛不适合我。因为Bob在社会学上做一些半技术的东西,他认识Fred Mosteller,虽然不算什么私人友谊,他也知道哈佛有一个1957年成立的统计系。他建议我联系Mosteller。在回到哈佛后,我和Fred谈了一下,他建议我修一些统计课程。所以在第三年,我大部分上的都是统计课程并且还不错。统计系接收了我并帮我拿到我自己的NSF资助,从我一开始入学就有。由于某些原因,他们一直重新开始,可能显示了他们的坏品味,但是对我来说很好。不管怎样,在哈佛的第三年末,我转到了统计系,四年中的第三个系。 Fabri: Besides Mosteller, who else was on the statis- tics faculty then? It was a quite new department, as you said. @@ -141,7 +141,7 @@ Fabri: 除了Mosteller,统计系还有哪些老师?你说,它是个很新 Don: The other senior people were Bill Cochran and Art Dempster, who had recently been promoted to tenure. The junior ones were Paul Holland; Jay Gold- man, a probabilist; and Shulamith Gross from Berke- ley, a student of Erich Lehmann’s. -Don:其他的高级教师还有Bill Cochran和Art Dempster,他们最近升为终身教授。初级的有Paul Holland,Jay Goldman是概率论专家,Shulamith Gross来自伯克利,Erich Lehmann的学生。 +Don:其他的高级教师还有Bill Cochran和Art Dempster,他们刚刚升为终身教授。初级的有Paul Holland;Jay Goldman是概率论专家,Shulamith Gross来自伯克利,Erich Lehmann的学生。 Fabri: And you decided to work with Bill. @@ -149,7 +149,7 @@ Fabri: 你决定和Bill合作。 Don: Actually, I first talked to Fred. Fred always had a lot of projects going; one was with John Tukey and he proposed that I work on it. I told him that I had this matched sampling project of my own, and he sug- gested that I talk to Cochran—Cochran a few years ear- lier was an advisor for the Surgeon General’s report on smoking and lung cancer. 
It was obviously based on observational data, not on randomized experiments, and Fred said that Cochran knew all about these issues in epidemiology and biostatistics. So I went to knock on Bill’s door. He answered with a grumpy sounding “yes,” I went in and he said, “No, not now, later!” So I thought “Hmmm, rough guy,” but actually he was a sweetheart, with a great Scottish dry sense of humor and a love of scotch and cigarettes (I understand the former, although not the latter). -Don:事实上,我先找了Fred谈话。他总是有很多项目在进行,其中一个是和John Tukey,他建议我做这个。我告诉他我自己做过这个抽样的项目了,他建议我去找Cochran—Cochran谈,几年前是为Surgeon General汇报过吸烟和肺癌项目的。很明显是基于观察数据,不是随机试验,Fred说Cochran知道所有的关于流行病和生物统计的问题。所以我去敲了Bill的门。他用脾气暴躁的声音说了‘yes’,我进去了他说,“不,不是现在,一会儿!”。然后我想,“嗯,粗人”。但事实上他心地善良,有很强的苏格兰幽默感,喜欢苏格兰威士忌和香烟(前者我理解,后者我不能理解) +Don:事实上,我先找了Fred谈话。他总是有很多项目在进行,其中一个是和John Tukey,他建议我做这个。我告诉他我自己做过这个抽样的项目了,他建议我去找Cochran谈—Cochran几年前是卫生局局吸烟和肺癌项目报告的顾问。这个报告很明显是基于观察数据,不是随机试验,Fred说Cochran知道所有的关于流行病和生物统计的问题。所以我去敲了Bill的门。他用脾气暴躁的声音说了‘yes’,我进去了他说,“不,不是现在,一会儿!”。然后我想,“嗯,粗人”。但事实上他心地善良,有很强的苏格兰幽默感,喜欢苏格兰威士忌和香烟(前者我理解,后者我不能理解) Fabri: Cochran did have a lasting influence on you, right? @@ -157,23 +157,23 @@ Fabri: Cochran对你产生了持续影响,对吗? Don: Yes, he had a tremendous influence on me. Once I was doing some irrelevant math on matching, which I now see popping up again in the literature. I showed that to Bill, and he asked me, “Do you think that’s important, Don?” I said, “Well, I don’t know.” Then he said, “It is not important to me. If you want to work on it, go find someone else to advise you. I care about statistical problems that matter, not about making things epsilon better.” Another person who was very influential was Art Dempster. Once I did some con- sulting for Data Text, a collection of batch computer programs like PSTAT or BMDP. I was designing pro- grams to calculate analyses of variance, do regressions, ordinary least squares, matrix inversions, all when you have, in hindsight, limited computing power. 
For ad- vice on some of those I talked to Dempster, who al- ways has great multivariate insights based on his deep understanding of geometry—very Fisherian. -Don:是的,他对我有巨大的影响。有一次我正在做一些不相关的数学匹配,如今我在文献里又看到了。我给Bill看,他问我,“Don,你认为那重要吗?”我说,“我不知道。”然后他说,“这对我不重要。如果你还想做这方面,去找其他人来给你建议。我关心那背后的统计问题,而不是把事情做的更好。”另一个对我有很大影响的是Art Dempster。有一次我为数据文本做一些咨询,一系列的计算机程序像PSTAT或者BMDP。我设计计算机程序计算方差,做回归,普通最小二乘,矩阵求逆,所有都局限于计算。和Dempster聊天,他给了我一些建议,他对多元有很深刻的理解,基于他对几何的理解已经和Fisher相差无几。 +Don:是的,他对我有巨大的影响。有一次我正在研究样本配对,做一些其实并不重要的数学推导,如今我在文献里又看到了。我给Bill看,他问我,“Don,你认为那重要吗?”我说,“我不知道。”然后他说,“这对我不重要。如果你还想做这方面,去找其他人来给你建议。我关心那背后的统计问题,而不是把事情做的更好。”另一个对我有很大影响的是Art Dempster。有一次我为数据文本做一些咨询,一系列的计算机程序像PSTAT或者BMDP。我设计计算机程序计算方差,做回归,普通最小二乘,矩阵求逆,所有都局限于计算。和Dempster聊天,他给了我一些建议,他对多元有很深刻的理解,他对几何的理解已经和Fisher相差无几。 Fan: Your Ph.D. thesis was on matching, which is the start of your life-long pursuit of causal inference. How did your interest in causal inference start? -Fan: 你的博士论文是关于匹配的,从什么时候开始你致力于因果推断的研究?你什么时候对因果推断产生兴趣的? +Fan: 你的博士论文是关于配对的,从什么时候开始你致力于因果推断的研究?你什么时候对因果推断产生兴趣的? Don: When I worked with Althauser on the racial disparity problem, I always emphasized to him that it was inherently descriptive, not really causal. I re- membered enough from my physics education in high school and Princeton that association is not causation. So I was probably not intrigued by causal inference per se, but rather by the confusion that the social scien- tists had about it. You have to describe a real or hy- pothetical experiment where you could intervene, and after you intervene, you see how things change, not in time but between intervention (i.e., treatment) groups. If you are not talking about intervention, you can’t talk about causality. For some reason, when I look at old philosophy, it seems to me that they didn’t get it right, whereas in previous centuries, some experimenters got it. They bred cows, or mated hunting falcons. 
If you mated excellent female and male falcons, the resulting next generation of falcons would generally be better hunters than those resulting from random mating. In the 20th century, many scientists and experimentalists got it. -Don: 当我和Althauser一起研究种族差异问题的时候,我总是跟他强调那是固有的描述,并不是真正的因果关系。我记得我在高中和在普林斯顿学物理的时候就知道关联不是因果。我可能对因果推断并不好奇,但是我对一些社会科学家对它好奇很不解。我必须要描述一个真实的或者假设的实验,你可以介入它,在你介入之后,你会看到事情是如何变化的,不是瞬时的,而是在不同的介入之间(比如不同的治疗)。如果你不谈介入,你就不能谈因果。由于某些原因,当我看旧哲学的时候,对我来说那可能不对,然而在前几个世纪,一些实验员验证了它。他们养奶牛,或者让猎鹰交配。如果你让优秀的公母猎鹰交配,下一代的狩猎水平要高于随机交配的结果。在20世纪,许多科学家和实验员得到了这个结果。 +Don: 当我和Althauser一起研究种族差异问题的时候,我总是跟他强调那是固有的描述,并不是真正的因果关系。我记得我在高中和在普林斯顿学物理的时候就知道关联不是因果。我可能最初对因果推断本身并不好奇,而是好奇一些社会科学家这种因果关系的不解。在对因果性的讨论里,你必须要描述一个真实的或者假设的实验,然后你可以介入它,在你介入之后,你会看到事情是如何变化的,不是瞬时的,而是在不同的介入之间(比如不同的治疗)。如果你不谈介入,你就不能谈因果。由于某些原因,当我看过去对因果关系的哲学讨论时,我总觉得那可能不对,尽管在前几个世纪,一些实验员验证了它。他们养奶牛,或者让猎鹰交配。如果你让优秀的公母猎鹰交配,下一代的狩猎水平应该要高于随机交配的结果。后来在20世纪,许多科学家和实验员得到了这个结果。 Fabri: So you were only doing descriptive compar- isons in your Ph.D. thesis, and the notation of potential outcomes was not there. -Fabri:所以你在你的博士论文里只做了描述性的比较,潜在的结果注释不在里面。 +Fabri:所以你在你的博士论文里只做了描述性的比较,并没有指出可能的结果。 Don: Partly correct. At that time, the notation of po- tential outcomes was in my mind, because that is the way that Cochran initiated discussions of randomized experiments in the class he taught in 1968. Initially, it was all based on randomization, unbiasedness, Fisher’s test, etc. But the concepts had to be flipped into or- dinary least squares (OLS) regression and analysis of variance tables, because nobody could compute any- thing difficult back then. One of the lessons in Bill’s class in regression and experimental design was to use the abbreviated Dolittle method to invert matrices, by hand! So you really couldn’t do randomization tests in any generality. The other reason I was interested in ex- periments and social science was my family history. 
There was always this legal question lurking: “But for this alleged misconduct, what would have happened?” -Don:一部分是对的。在那个时候,潜在结果的注释在我脑袋里,因为那个是Cochran在1968年他的随机实验课堂上讨论的方法。最开始,全部都是基于随机,无偏,Fisher检验等等。但是这些概念必须要被注入普通最小二乘回归,方差分析表,因为没人能计算出任何困难的东西。在Bill的一堂回归和实验设计课上,用缩略的Dolittle方法求矩阵的逆,徒手!所以你真的对随机检验不能做任何普适的推广。另一个让我对实验和社会科学感兴趣的原因是我的家庭史。总是有这样一个法律问题埋在我心里:“对于这个涉嫌渎职,会发生什么呢?” +Don:一部分是对的。在那个时候,这种可能的结论在我脑袋里,因为这来自于Cochran在1968年他的随机实验课堂上讨论的方法。最开始,全部都是基于随机,无偏,Fisher检验等等。但是这些概念必须要被注入普通最小二乘回归,方差分析表,因为没人能计算出任何困难的东西。在Bill的一堂回归和实验设计课上,我们要用缩略的Dolittle方法求矩阵的逆,手算!所以你真的对随机检验不能做任何普适的推广(译者注:因此无法将随机对照组和试验组进行对比,得出因果结论)。另一个让我对实验和社会科学感兴趣的原因是我的家庭史。总是有这样一个法律问题埋在我心里:“对于这个涉嫌渎职,会发生什么呢?” Fan: What was your first job after getting your Ph.D. degree in 1970? @@ -181,7 +181,7 @@ Fan: 在你1970年拿到博士学位你的第一份工作是什么? Don: I stayed at Harvard for one more year, as an instructor in the Statistics Department, partly sup- ported by teaching, partly supported by the Cambridge Project, which was an ARPA funded Harvard–MIT joint effort; the idea was to bring the computer science technologies of MIT and the social sciences research of Harvard together to do wonderful things in the social sciences. In the Statistics Department, I was coteaching with Bob Rosenthal the “Statistics for Psychologists” course that, ironically, the Social Relations Department wanted me to take five years earlier, thereby driving me out of their department! Bob had, and has, tremendous intuition for experimental design and other practical is- sues, and we have written many things together. 
-Don:我在哈佛多待了一年,作为统计系的讲师,一部分工资来自教学,一部分来自剑桥项目,这是一个ARPA支持的哈佛-麻省理工联合项目;旨在将麻省理工的计算机科学技术和哈佛大学的社会科学研究结合到一起在社会科学上做一些有趣的事情。在统计系,我和Bob Rosenthal一起教“统计心理学”,讽刺的是,5年前社会科学院就想让我上这门课,为了这个,他们把我逐出了他们院!Bob对于实验设计和其他一些实践问题很有直觉,我们一起写了很多东西。 +Don:我在哈佛多待了一年,作为统计系的讲师,一部分工资来自教学,一部分来自剑桥(美国马萨诸塞州东部城市,哈佛大学所在地)项目,这是一个ARPA支持的哈佛-麻省理工联合项目;旨在将麻省理工的计算机科学技术和哈佛大学的社会科学研究结合到一起在社会科学上做一些有趣的事情。在统计系,我和Bob Rosenthal一起教“统计心理学”,讽刺的是,5年前社会科学院就想让我上这门课,为了这个,他们把我逐出了他们院!Bob对于实验设计和其他一些实践问题很有直觉,我们一起写了很多东西。 # THE ETS DECADE: MISSING DATA, EM AND CAUSAL INFERENCE # ETS的10年:缺失数据,EM和因果推断 @@ -192,7 +192,7 @@ Fan: 一年后,你在普林斯顿的ETS找了一个职位,而不是在大学 Don: Right—many people thought I was goofy. I did have several good offers, one was to stay at Harvard, and another was to go to Dartmouth. But I met Al Beaton, who was later my boss at ETS in Princeton, at a conference in Madison, Wisconsin, and he offered me a job, which I took. Al had a doctorate in education at Harvard, and had worked with Dempster on compu- tational issues, such as the “sweep operator.” He was a great guy with a deep understanding of practical com- puting issues. Also, he appreciated my research. Be- cause I was an undergrad at Princeton, it was almost like going home. For several years, I taught one course at Princeton. Between the jobs at ETS and Princeton, I was earning twice what the Harvard salary would have been, which allowed me to buy a house on an acre and a half, with a garage for rebuilding an older Mer- cedes roadster, etc. A different style of life from that in Cambridge. 
-Don:很多人认为我是一个傻瓜。我有很多不错的机会,一个是待在哈佛,另一个是去达特茅斯。但我遇到了Al Beaton,后来他是我在普林斯顿ETS的老板,在威斯康星麦迪逊的一个会议上,他给了我这份工作,我接受了。Al在哈佛有一个博士学位,他跟Dempster做一些计算问题,例如“扫描算子”。他是个很好的人,并且对计算问题有很深刻的理解,这大概就像是回家。几年来,我在普林斯顿教一门课。在ETS和普林斯顿的工作之间,我赚了在哈佛的工资的两杯,这让我买了一个1.5英亩的房子,有一个车库,改造旧的奔驰跑车,等等。这是和在剑桥完全不同的生活方式。 +Don:很多人认为我是一个傻瓜。我有很多不错的机会,一个是待在哈佛,另一个是去达特茅斯。但我遇到了Al Beaton,后来他是我在普林斯顿ETS的老板,在威斯康星麦迪逊的一个会议上,他给了我这份工作,我接受了。Al在哈佛有一个博士学位,他跟Dempster做一些计算问题,例如“扫描算子”。他是个很好的人,并且对计算问题有很深刻的理解,回普林大概就像是回家。几年来,我在普林斯顿教一门课。在ETS和普林斯顿的工作之间,我赚了在哈佛的工资的两倍,这让我买了一个1.5英亩的房子,有一个车库,改造旧的奔驰跑车,等等。这是和在剑桥完全不同的生活方式。 Fan: You seem to have had a lot of freedom to pur- sue research at the ETS. What was your responsibility at ETS? @@ -200,15 +200,15 @@ Fan: 看起来在ETS做研究有很多自由。你在ETS的职责是什么? Don: The position at ETS was like an academic posi- tion with teaching responsibilities replaced by consult- ing on ETS’s social science problems, including psy- chological and educational testing ones. I found con- sulting much easier for me than teaching, and ETS had interesting problems. Also there were many very good people around, like Fred Lord, who was highly respected in psychometrics. The Princeton faculty was great, too: Geoffrey Watson (of the Durbin–Watson statistic) was the chair; Peter Bloomfield was there as a junior faculty member before he moved to North Car- olina; and of course Tukey was still there, even though he spent a lot of time at Bell Labs. John was John, hav- ing a spectacular but very unusual way of thinking— obviously a genius. Stuart Hunter was in the Engineer- ing School then. These were fine times for me, with tremendous freedom to pursue what I regarded as im- portant work. 
-Don:在ETS的工作有点像教职,只是在做ETS的社会科学的咨询而不是教学了,包括心理的教育的问题。我发现咨询比教学对我来说简单,ETS有很多有意思的问题。而且周围有很多比错的人,比如Fred Lord,在心理测验学上很有名望。普林斯顿的老师们也很好:Geoffrey Watson(of the Durbin–Watson statistic)是系主任;Peter Bloomfield在搬到北卡之前是初级教授;Tukey一直在那,即使他在Bell实验室花了很多时间。John是John,有很不寻常的思考方式,很显然是个天才。Stuart Hunter后来去了工程学院。这些对我来说都是好时光,有大量自由的时间来追求我认为重要的工作。 +Don:在ETS的工作有点像教职,只是在做ETS的社会科学的咨询而不是教学了,包括心理的教育的问题。我发现咨询比教学对我来说简单,ETS有很多有意思的问题。而且周围有很多不错的人,比如Fred Lord,在心理测验学上很有名望。普林斯顿的老师们也很好:Geoffrey Watson(of the Durbin–Watson statistic)是系主任;Peter Bloomfield在搬到北卡之前是初级教授;Tukey一直在那,即使他在Bell实验室花了很多时间。John毕竟是John,有很不寻常的思考方式,很显然是个天才。Stuart Hunter后来去了工程学院。这些对我来说都是好时光,有大量自由的时间来追求我认为重要的工作。 Fabri: By any measure, your accomplishments in the ETS years were astounding. In 1976, you published the paper “Inference and Missing Data” in Biometrika (Rubin, 1976) that lays the foundation for modern anal- ysis of missing data; in 1977, with Arthur Dempster and Nan Laird, you published the EM paper “Max- imum Likelihood from Incomplete Data via the EM Algorithm” in JRSS-B (Dempster, Laird and Rubin, 1977); in 1974, 1977, 1978, you published a series of papers that lay the foundation for the Rubin Causal Model (Rubin, 1974, 1977, 1978a). What was it like for you at that time? How come so many groundbreak- ing ideas exploded in your mind at the same time? -Fabri: 通过一些衡量,你在ETS的几年的成就是令人震惊的。在1976年,你在Biometrika上发表了“推断和却是数据”(Rubin, 1976),为现代缺失数据的分析奠定了基础;在1977年,你和Arthur Dempster,Nan Laird一起在JRSS-B上发表了“通过EM算法从不完整数据估计极大似然”(Dempster, Laird and Rubin, 1977);在1974,1977,1978年,你发表了一系列奠定了因果模型基础的文章(Rubin, 1974, 1977, 1978a)。你在那个时候是什么样的?你是如何在同一时间作出这么多开创性的工作的? +Fabri: 不论以哪种标准,你在ETS的几年的成就都是令人震惊的。在1976年,你在Biometrika上发表了“推断和缺失数据”(Rubin, 1976),为现代缺失数据的分析奠定了基础;在1977年,你和Arthur Dempster,Nan Laird一起在JRSS-B上发表了“通过EM算法从不完整数据估计极大似然”(Dempster, Laird and Rubin, 1977);在1974,1977,1978年,你发表了一系列奠定了因果模型基础的文章(Rubin, 1974, 1977, 1978a)。你在那个时候是什么样的?你是如何在同一时间作出这么多开创性的工作的? Don: Probably the most important reason is that I al- ways worried about solving real problems. 
I didn’t read the literature to uncover a hot topic to write about. I al- ways liked math, but I never regarded much of math- ematical statistics as real math—much of it is just so tedious. Can you keep track of these epsilons? -Don:可能最重要的原因是我总是很担忧解决实际问题。我没有读过写热门话题的文献。我一直喜欢数学,但是我不认为数理统计是真正的数学,大部分数学问题是很枯燥的。你能一直追溯这些epsilon吗? +Don:可能最重要的原因是我总是很担忧解决实际问题。我也不去找学术热门来写文章。我一直喜欢数学,但是我不认为数理统计是真正的数学,大部分数学问题是很枯燥的。而且难道你能搞清这些误差项吗? Fabri: There is no coincidence that all these papers share the common theme of missing data. @@ -216,7 +216,7 @@ Fabri: 毫无巧合的所有这些论文都有共同的主题:缺失数据。 Don: That’s right. That theme arose when I was a graduate student. The first paper I wrote on missing data, which is also my first sole-authored paper, was on analysis of variance designs, a quite algorithmic pa- per. It was always clear to me, from the experimental design course from Cochran that you should set up ex- periments as missing data problems, with all the poten- tial outcomes under the not-taken treatments missing. But nobody did observational studies that way, which seemed very odd to me. Indeed, nobody was using po- tential outcomes outside the context of randomized ex- periments, and even there, most writers dropped poten- tial outcomes in favor of least squares when actually doing things. -Don:是的。在我还是研究生的时候这个主题就出现了。我写的第一篇关于缺失数据的论文,也是我的第一篇唯一作者的论文,是关于实验设计的方差分析,一个非常偏算法的论文。我很清楚,从Cochran的实验课上,你应该把实验问题当作缺失数据问题,对于没有处理的,潜在结果是缺失的。但是没有人做观测的研究,对于我来说很奇怪。事实上,没有人用随机实验以外的潜在结果,甚至,很多学者在真正做事情的时候为了支持最小二乘,直接丢掉潜在结果。 +Don:是的。在我还是研究生的时候这个主题就出现了。我写的第一篇关于缺失数据的论文,也是我的第一篇唯一作者的论文,是关于实验设计的方差分析,一个非常偏算法的论文。我很清楚,从Cochran的实验课上,你应该把实验问题当作缺失数据问题,对于没有随机处理的,潜在结果是缺失的。但是没有人做观测的研究,对于我来说很奇怪。事实上,没有人用随机实验以外的潜在结果,甚至,很多学者在真正做事情的时候为了支持最小二乘,直接丢掉潜在结果。 Fan: What was the state of research on missing data before you came into the scene? 
From 21e242c0d97b9c9f363af7f6ba0ba8ed3106343d Mon Sep 17 00:00:00 2001 From: JianqiaoWang Date: Sun, 5 Mar 2017 12:19:31 +0800 Subject: [PATCH 3/7] Update Rubin_all.Rmd --- Rubin_all.Rmd | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/Rubin_all.Rmd b/Rubin_all.Rmd index 617e209..b187fd0 100644 --- a/Rubin_all.Rmd +++ b/Rubin_all.Rmd @@ -93,7 +93,7 @@ Fan:你在普林斯顿高年级的时候,申请了心理学的博士项目 Don: Yes, I was accepted by Stanford, Michigan and Harvard. I met some extraordinary people during my visits to these programs. I went out to Stanford first, and met William Estes, a quiet but wonderful profes- sor with strong mathematical skills and a wry wit, who later moved to Harvard. Michigan had a very strong mathematical psychology program, and when I visited in the spring of 1965, I was hosted primarily by a very promising graduating Ph.D. student, Amos Tversky, who was doing extremely interesting work on human behavior and how people handled risks. In later years, he connected with another psychologist, Daniel Kah- neman, and they wrote a series of extremely influential papers in psychology and economics, which eventually led to Kahneman’s winning the Nobel Prize in Eco- nomics in 2002; Tversky passed away in 1996 and was thus not eligible for the Nobel Prize. Kahneman (who recently was awarded a National Medal of Science by President Obama) always acknowledges that the Nobel Prize was really a joint award (to Tversky and him). I was on a committee sometime last year with Kahne- man, and it was interesting to find out that I had known Tversky longer than he had. 
-Don:是的,我被斯坦福,密歇根和哈佛录取了。在我访问这些项目的过程中认识了很多杰出人物。我先去了斯坦福,认识了William Estes,一个很安静的但是很厉害的教授,他有很强的数学技能并且风趣幽默。后来他去了哈佛。密歇根有很强的数学心理学项目,当我1965年春天访问的时候,我被一个非常有前途的博士生接待了,Amos Tversky,当时在做非常有趣的人类行为和人是如何处理风险的项目。后来的几年,他和另一个心理学家Daniel Kahneman一起在心理学和经济学领域写了一系列极有影响力的文章,最终使得Kahneman在2002年获得诺贝尔经济学奖。Tversky在1996年去世,因此没有资格拿到诺贝尔奖。Kahneman经常致谢那个诺贝尔奖是对他俩共同的奖励。去年我和Kahneman一起作为评审,非常有意思的是,发现我认识Tversky的时间比他长。 +Don:是的,我被斯坦福,密歇根和哈佛录取了。在我访问这些项目的过程中认识了很多杰出人物。我先去了斯坦福,认识了William Estes,一个很安静的但是很厉害的教授,他有很强的数学技能并且风趣幽默。后来他去了哈佛。密歇根有很强的数学心理学项目,当我1965年春天访问的时候,我被一个非常有前途的博士生接待了,Amos Tversky,当时在做非常有趣的人类行为和人是如何处理风险的项目。后来的几年,他和另一个心理学家Daniel Kahneman一起在心理学和经济学领域写了一系列极有影响力的文章,最终使得Kahneman在2002年获得诺贝尔经济学奖。Tversky在1996年去世,因此没有资格拿到诺贝尔奖。Kahneman经常说那个诺贝尔奖是对他俩共同的奖励。去年我和Kahneman一起作为评审,非常有意思的是,发现我认识Tversky的时间比他长。 Fan: But ultimately you chose Harvard. @@ -192,7 +192,7 @@ Fan: 一年后,你在普林斯顿的ETS找了一个职位,而不是在大学 Don: Right—many people thought I was goofy. I did have several good offers, one was to stay at Harvard, and another was to go to Dartmouth. But I met Al Beaton, who was later my boss at ETS in Princeton, at a conference in Madison, Wisconsin, and he offered me a job, which I took. Al had a doctorate in education at Harvard, and had worked with Dempster on compu- tational issues, such as the “sweep operator.” He was a great guy with a deep understanding of practical com- puting issues. Also, he appreciated my research. Be- cause I was an undergrad at Princeton, it was almost like going home. For several years, I taught one course at Princeton. Between the jobs at ETS and Princeton, I was earning twice what the Harvard salary would have been, which allowed me to buy a house on an acre and a half, with a garage for rebuilding an older Mer- cedes roadster, etc. A different style of life from that in Cambridge. 
-Don:很多人认为我是一个傻瓜。我有很多不错的机会,一个是待在哈佛,另一个是去达特茅斯。但我遇到了Al Beaton,后来他是我在普林斯顿ETS的老板,在威斯康星麦迪逊的一个会议上,他给了我这份工作,我接受了。Al在哈佛有一个博士学位,他跟Dempster做一些计算问题,例如“扫描算子”。他是个很好的人,并且对计算问题有很深刻的理解,回普林大概就像是回家。几年来,我在普林斯顿教一门课。在ETS和普林斯顿的工作之间,我赚了在哈佛的工资的两倍,这让我买了一个1.5英亩的房子,有一个车库,改造旧的奔驰跑车,等等。这是和在剑桥完全不同的生活方式。 +Don:很多人认为我是一个傻瓜。我有很多不错的机会,一个是待在哈佛,另一个是去达特茅斯。但我遇到了Al Beaton,后来他是我在普林斯顿ETS的老板,在威斯康星麦迪逊的一个会议上,他给了我这份工作,我接受了。Al在哈佛有一个博士学位,他跟Dempster做一些计算问题,例如“扫描算子”。他是个很好的人,并且对计算问题有很深刻的理解,而且也很欣赏我的工作。作为曾经普林的本科生,回普林大概就像是回家。几年来,我在普林斯顿教一门课。在ETS和普林斯顿的工作之间,我赚了在哈佛的工资的两倍,这让我买了一个1.5英亩的房子,有一个车库,改造旧的奔驰跑车,等等。这是和在剑桥完全不同的生活方式。 Fan: You seem to have had a lot of freedom to pur- sue research at the ETS. What was your responsibility at ETS? @@ -208,7 +208,7 @@ Fabri: 不论以哪种标准,你在ETS的几年的成就都是令人震惊的 Don: Probably the most important reason is that I al- ways worried about solving real problems. I didn’t read the literature to uncover a hot topic to write about. I al- ways liked math, but I never regarded much of math- ematical statistics as real math—much of it is just so tedious. Can you keep track of these epsilons? -Don:可能最重要的原因是我总是很担忧解决实际问题。我也不去找学术热门来写文章。我一直喜欢数学,但是我不认为数理统计是真正的数学,大部分数学问题是很枯燥的。而且难道你能搞清这些误差项吗? +Don:可能最重要的原因是我总是很担忧解决实际问题。我也不特意去找学术热门来写文章。我一直喜欢数学,但是我不认为数理统计是真正的数学,大部分数理统计问题是很繁冗的。而且难道你能搞清那些误差项吗? Fabri: There is no coincidence that all these papers share the common theme of missing data. @@ -232,7 +232,7 @@ Fan: EM算法是现代统计的另一个里程碑;它也和计算机相关, Don: In those early years at ETS, I had the free- dom to remain in close contact with the Harvard peo- ple, Cochran, Dempster, Holland and Rosenthal, which was very important to me. I always enjoyed talking to Dempster, who is a very principled and deep thinker. I was able to arrange some consulting projects at ETS to bring him to Princeton. Once we were talking about some missing data problem, and we started discussing filling these values in, but I knew it wouldn’t work in generality. 
I pointed to a paper by Hartley and Hock- ing (1971), where they deserted the approach of itera- tively filling in missing values, as in Hartley (1956) for the counted data case, and went to Newton–Raphson, I think, in the normal case. Even though aspects of EM were known for years, and Hartley and others were sort of nibbling around the edges of EM, apparently nobody put it all together as a general algorithm. Art and I real- ized that you have to fill in sufficient statistics. I had all these examples like t distributions, factor analysis (the ETS guys loved that), latent class models. And Art had a great graduate student, Nan Laird, available to work on parts of it, and we started writing it up. The EM paper was accepted right away by JRSS-B, even with invited discussions. -Don:在ETS的早些年,我可以自由的和哈佛的人联系,比如Cochran, Dempster, Holland和Rosenthal,这对我来说非常重要。我非常喜欢和Dempster聊天,他很有原则思考问题很深入。我能在ETS安排一些咨询项目带他到普林斯顿。一次我们聊缺失数据的问题,我们开始讨论插补值,但是我知道它不具有普适性。我指出一篇论文,Hartley和Hocking (1971)写的,里面用了迭代的方法来插补缺失数据,正如Hartley(1956)的计数数据的情形,后来发展到牛顿算法,我认为,是一个普遍情形。即使EM的种种已经被知道很多年了,Hartley和其他人有点咬在EM的边缘,很明显没有人把它总结成一个普适性的算法。Art和我意识到必须填补充分统计量。我做了所有的例子,如t分布,因子分析(ETS的人们喜欢),隐类模型。Art有一个很好的研究生,Nan Laird,可以做一部分工作,我们开始写起来。EM被JRSS-B接收,并且被邀请讨论。 +Don:在ETS的早些年,我可以自由的和哈佛的人联系,比如Cochran, Dempster, Holland和Rosenthal,这对我来说非常重要。我非常喜欢和Dempster聊天,他很有原则,思考问题很深入。我能在ETS安排一些咨询项目带他到普林斯顿。一次我们聊缺失数据的问题,我们开始讨论插补值,但是我知道它不具有普适性。我指出一篇论文,Hartley和Hocking (1971)写的,里面用了迭代的方法来插补缺失数据,正如Hartley(1956)讨论的关于计数数据的情形,后来发展到牛顿算法,我认为,是一个普遍情形。即使EM的种种已经被知道很多年了,Hartley和其他人有点咬在EM的边缘,很明显没有人把它总结成一个普适性的算法。Art和我意识到必须填补充分统计量。我做了所有的例子,如t分布,因子分析(ETS的人们喜欢),隐类模型。Art有一个很好的研究生,Nan Laird,可以做一部分工作,我们开始写起来。EM被JRSS-B接收,并且被邀请讨论。 Fan: Now let’s talk more about causal inference. You are known for proposing the general potential out- come framework. It was Neyman who first mentioned the notation of potential outcomes in his Ph.D. thesis (Neyman, 1990), but the notation seemed to have long been neglected. 
@@ -240,7 +240,7 @@ Fan: 让我们再说说因果推断。你因为提出了普适的潜在结果框 Don: Yes, it was ignored outside randomized exper- iments. Within randomized studies, the notion became standard and used, for example, in Kempthorne’s work, but as I mentioned earlier, ignored otherwise. -Don:是的,它被忽略在随机试验之外了。在随机领域,这个概念变得标准和通用,比如,在Kempthorne的工作中,但是在我之前提到的,它被忽略了。 +Don:是的,它被忽略在随机试验之外了。在随机领域,这个概念变得标准和通用,比如,在Kempthorne的工作中,但是在其他领域,如我之前说过的,它被忽略了。 Fan: Were you aware of Neyman’s work before? @@ -248,7 +248,7 @@ Fan: 你之前注意到Neyman的工作了吗? Don: No. I wasn’t aware of his work defining po- tential outcomes until 1990 when his Ph.D. thesis was translated into English, although I attributed much of the perspective to him because of his work on surveys in Neyman (1934) and onward (see Rubin, 1990a, fol- lowed by Rubin, 1990b). -Don:不。1990年以前,我都没有注意到他的工作中定义了潜在结果,知道他的博士论文被翻译成英文,虽然我对他在调查上Neyman (1934)的工作充满了敬佩。 +Don:不。1990年以前,我都没有注意到他的工作中定义了潜在结果,直到他的博士论文被翻译成英文,虽然我对他在调查上Neyman (1934)的工作充满了敬佩。 Fabri: You actually met Neyman when you visited Berkeley in the mid-1970s. During all those lunches, had you ever discussed causal inference and potential outcomes with him? @@ -256,7 +256,7 @@ Fabri: 事实上你在20世纪70年代中去伯克利访问见到了Neyman。在 Don: I did. In fact, I had an office right next to his. Neyman came to Berkeley in the late 30s. He was very impressive, not only as a mathematical statistician, but also as an individual. There was a tremendous aura about him. Shortly after arriving in Berkeley, I gave a talk on missing data and causal inference. 
The next day, I went to lunch with Neyman and I said something like, “It seems to me that formulating causal problems in terms of missing potential outcomes is an obvious thing to do, not just in randomized experiments, but also in observational studies.” Neyman answered to the effect that (remarkable in hindsight because he did so without acknowledging that he was the person who first formulated potential outcomes), “No, causality is far too speculative in nonrandomized settings.” He repeated something like this quote from his biography, “...Without randomization an experiment has little value irrespective of the subsequent treatment.” (Also see my comment on this conversion in Rubin, 2010.) Then he went to say politely but firmly, “Let’s not talk about that, let’s instead talk about astronomy.” He was very into astronomy at the time. -Don:讨论了。事实上,我的办公室在他右边。Neyman在30年代末来到伯克利。他给人印象深刻,不仅仅是一个数理统计学家,也很有个人魅力。他有一种巨大的光环。在来到伯克利不久,我做了一个缺失数据和因果推断的演讲。第二天,我和Neyman吃午饭,我说,“看起来就缺失潜在结果的因果推断问题是很明显要做的事情,不仅仅在随机实验中,在观测研究中也应该做。”Neyman回答(事后很显著,因为他这么做并没有把自己当作第一个提出潜在结果的人),“不,因果在非随机的设置里过于投机。”他在他的自传中重复了类似的话,“。。。没有随机性的实验的后续处理是没有价值的。”(Also see my comment on this conversion in Rubin, 2010.)然后他很有礼貌很严肃的说,“我们不讨论这个,我们讨论天文学。”那个时候他很擅长天文学。 +Don:讨论了。事实上,我的办公室在他右边。Neyman在30年代末来到伯克利。他给人印象深刻,不仅仅是一个数理统计学家,也很有个人魅力。他有一种巨大的光环。在来到伯克利不久,我做了一个缺失数据和因果推断的演讲。第二天,我和Neyman吃午饭,我说,“看起来就缺失潜在结果的因果推断问题是很明显要做的事情,不仅仅在随机实验中,在观测研究中也应该做。”Neyman回答(事后来看这段对话真是很重要,因为他说这些话时并没有认为自己是第一个提出潜在结果的人),“不,因果在非随机的设置里过于投机。”他在他的自传中重复了类似的话,“。。。没有随机性的实验的后续处理是没有价值的。”(Also see my comment on this conversion in Rubin, 2010.)然后他很有礼貌很严肃的说,“我们不讨论这个,我们讨论天文学。”那个时候他很擅长天文学。 Fabri: You probably learned the reasons why he was so involved in the frequentist approach. @@ -264,7 +264,7 @@ Fabri: 你可能知道了为什么他这么支持频率派方法。 Don: Yes. I remember we once had a conversation about what confidence intervals really meant and why the formal Neyman–Pearson approach seemed irrelevant to me. He said something like, “You misinterpret what we have done.
We were doing the mathematics; go back and read my 1934 paper where I first defined a confidence interval.” He defined it as a procedure that has the correct coverage for all prior distributions (see page 589, Neyman, 1934). If you think of that, you are forced to include all point mass priors and, therefore, you are forced to do Neyman–Pearson. He went on to say (approximately), “If you are a real scientist with a class of problems to work on, you don’t care about all point-mass priors, you only care about the priors for the class of problems you will be working on. But if you are doing the mathematics, you can’t talk about the problems you or anyone is working on.” I tried to make this point in a comment (Rubin, 1995), but it didn’t seem to resonate to others. -Don:是的。我记得有一次我们谈论置信区间的真实含义,以及为什么Neyman–Pearson的方法看起来跟我的毫不相关。他说,“你错误地解释了我们已经做的。我们做的是数学;回去读我1934年的论文,里面我定义了置信区间。”他把它定义为对所有先验分布有正确覆盖的过程(see page 589, Neyman, 1934)。如果你这么认为,你必须包括所有点的先验,你必须做Neyman–Pearson。他继续说(大概),“如果你是一个真正的科学家,在研究一类问题,你不要关注所有点的先验,你只需要关注你在做的那类问题的先验。但是如果你在做数学,你不能谈论你或者其他人在做的东西。”我试图在一篇评论中指出这一点(Rubin, 1995),但似乎并没有引起共鸣。 +Don:是的。我记得有一次我们谈论置信区间的真实含义,以及为什么Neyman–Pearson的方法看起来跟我的毫不相关。他说,“你错误地解释了我们的工作。我们做的是数学;回去读我1934年的论文,里面我定义了置信区间。”他把它定义为对所有先验分布有正确覆盖的过程(see page 589, Neyman, 1934)。如果你这么认为,你必须包括所有点的先验,你必须做Neyman–Pearson。他继续说(大概),“如果你是一个真正的科学家,在研究一类问题,你不要关注所有点的先验,你只需要关注你在做的那类问题的先验。但是如果你在做数学,你不能谈论你或者其他人在做的东西。”我试图在一篇评论中指出这一点(Rubin, 1995),但似乎并没有引起共鸣。 Fabri: In his famous 1986 JASA paper, Paul Holland coined the term “Rubin Causal Model (RCM),” referring to the potential outcome framework to causal inference (Holland, 1986). Can you explain why, if you think so, the term “Rubin Causal Model” is a fair description of your contribution to this topic?
From eba0ed2544acf38e03ac4fe00331a86bc4909c3d Mon Sep 17 00:00:00 2001 From: JianqiaoWang Date: Mon, 6 Mar 2017 08:15:55 +0800 Subject: [PATCH 4/7] Update Rubin_all.Rmd --- Rubin_all.Rmd | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/Rubin_all.Rmd b/Rubin_all.Rmd index b187fd0..f8b1c4c 100644 --- a/Rubin_all.Rmd +++ b/Rubin_all.Rmd @@ -101,7 +101,7 @@ Fan: 但最终你选择了哈佛。 Don: Well, we all make strange decisions. The rea- son was that I had an east-coast girlfriend who had an- other year in college. -Don:我们所有人都做奇怪的决定。原因是我在东海岸的女朋友还要再读一年。 +Don:我们所有人都做奇怪的决定。原因是我在东海岸的女朋友还要再读一年大学。 # GRADUATE YEARS AT HARVARD @@ -224,7 +224,7 @@ Fan: 在你进入这个领域之前,缺失数据是什么研究状态? Don: It was extremely ad hoc. The standard ap- proach to missing data then was comparing the biases of filling in the means, or of regression imputation un- der different situations, but almost always under an im- plicit “missing completely at random” assumption. The purely technical sides of these papers are solid. But I found there were always counter examples to the pro- priety of the specific methods being considered, and to explore them, one almost needed a master’s thesis for each situation. I would rather address the class of prob- lems with some generality. There is a mechanism that creates missing data, which is critical for deciding how to deal with the missing data. That idea of formal indi- cators for missing data goes way back in the contexts of experimental design and survey design. I am consis- tently amazed how this was not used in observational studies until I did so in the 1970s; maybe someone did, but I’ve looked for years and haven’t found anything. But probably because the missing data paper was done in a relatively new way, I had great difficulty in getting it published (more details in Rubin, 2014a). 
-Don:它非常临时。标准的解决缺失数据的方法是比较用均值填补后的偏差,或者在不同的情况下,用回归插补,但是都要基于“随机缺失”的假设。这些论文在纯的技术上都非常扎实。但是我发现对于科学的方法,总有反例,为了探索这个问题,可能每种情况都是一篇硕士论文。我宁愿用一种更普适的方法来解决这类问题。有一个机制能够生成缺失数据,这对如何解决缺失数据很关键。那个思想可以还原到实验设计和调查设计上。指直到我20世纪70年代做这件事的时候,我很惊讶为什么没有用到观察研究上;可能有人做了,但这么多年我什么都没找到。但可能因为缺失数据的论文相对比较新,发表的过程中我遇到很大的困难(more details in Rubin, 2014a)。 +Don:它非常临时。那时标准的解决缺失数据的方法是比较用均值填补后的偏差,或者在不同的情况下,用回归插补,但是都要基于“随机缺失”的假设。这些论文在纯的技术上都非常扎实。但是我发现对于科学的方法,总有反例,为了探索这个问题,可能每种情况都需要写一篇硕士论文。我宁愿用一种更普适的方法来解决这类问题,缺失数据的生成存在一个机制,它对于如何处理缺失数据很关键。那个思想可以还原到实验设计和调查设计上。指直到我20世纪70年代做这件事的时候,我很惊讶为什么没有用到观察研究上;可能有人做了,但这么多年我什么都没找到。但可能因为缺失数据的论文相对比较新,发表的过程中我遇到很大的困难(more details in Rubin, 2014a)。 Fan: The EM algorithm is another milestone in mod- ern statistics; it is also relevant in computer science and one of the most important algorithm in data mining. Though similar ideas had been used in several specific contexts before, nobody had realized the generality of EM. How did Dempster, Laird and you discover the generality? @@ -260,31 +260,30 @@ Don:讨论了。事实上,我的办公室在他右边。Neyman在30年代末 Fabri: You probably learned the reasons why he was so involved in the frequentist approach. -Fabri: 你可能知道了为什么他这么支持频率派方法。 +Fabri: 你可能知道了他为什么这么支持频率派方法。 Don: Yes. I remember we once had a conversation about what confidence intervals really meant and why the formal Neyman–Pearson approach seemed irrele- vant to me. He said something like, “You misinterpret what we have done. We were doing the mathematics; go back and read my 1934 paper where I first defined a confidence interval.” He defined it as a procedure that has the correct coverage for all prior distributions (see page 589, Neyman, 1934). If you think of that, you are forced to include all point mass priors and, therefore, you are forced to do Neyman–Pearson. He went on to say (approximately), “If you are a real scientist with a class of problems to work on, you don’t care about all point-mass priors, you only care about the priors for the class of problems you will be working on. 
But if you are doing the mathematics, you can’t talk about the problems you or anyone is working on.” I tried to make this point in a comment (Rubin, 1995), but it didn’t seem to resonate to others. -Don:是的。我记得有一次我们谈论置信区间的真实含义,以及为什么Neyman–Pearson的方法看起来跟我的毫不相关。他说,“你错误地解释了我们的工作。我们做的是数学;回去读我1934年的论文,里面我定义了置信区间。”他把它定义为对所有先验分布有正确覆盖的过程(see page 589, Neyman, 1934)。如果你这么认为,你必须包括所有点的先验,你必须做Neyman–Pearson。他继续说(大概),“如果你是一个真正的科学家,在研究一类问题,你不要关注所有点的先验,你只需要关注你在做的那类问题的先验。但是如果你在做数学,你不能谈论你或者其他人在做的东西。”我试图在一篇评论中指出这一点(Rubin, 1995),但似乎并没有引起共鸣。 +Don:是的。我记得有一次我们谈论置信区间的真实含义,以及为什么Neyman–Pearson的方法看起来跟我的毫不相关。他说,“你错误地理解了我们的工作。我们做的是数学;回去读我1934年的论文,那里面我第一次定义了置信区间。”他把它定义为对所有先验分布有正确覆盖的过程(see page 589, Neyman, 1934)。如果你这么认为,你必须包括所有点的先验,因此你必须做Neyman–Pearson。他继续说(大概),“如果你是一个真正的科学家,在研究一类问题,你不要关注所有点的先验,你只需要关注你在做的那类问题的先验。但是如果你在做数学,你不能谈论你或者其他人在做的东西。”我试图在一篇评论中指出这一点(Rubin, 1995),但似乎并没有引起共鸣。 Fabri: In his famous 1986 JASA paper, Paul Hol- land coined the term “Rubin Causal Model (RCM),” referring to the potential outcome framework to causal inference (Holland, 1986). Can you explain why, if you think so, the term “Rubin Causal Model” is a fair de- scription of your contribution to this topic? -Fabri: 在1986年很著名的JASA文章里,Paul Holland提出了“Rubin因果模型(RCM)”,指因果推断潜在的结果框架(Holland, 1986)。你能解释下为什么,如果你这么认为的话,“Rubin因果模型”是你对这个话题的贡献的相对描述? - +Fabri: 在他很著名的1986年发表的JASA文章里,Paul Holland提出了“Rubin因果模型(RCM)”,指因果推断潜在的结果框架(Holland, 1986)。你能解释下为什么,如果你这么认为的话,“Rubin因果模型”是你对这个话题的贡献的相对描述?(应该是公正的描述吧。。) Don: Actually Angrist, Imbens and I had a rejoin- der in our 1996 JASA paper (Angrist, Imbens and Ru- bin, 1996), where we explain why we think it is fair. Neyman is pristinely associated with the development of potential outcomes in randomized experiments, no doubt about that. But in the 1974 paper (Rubin, 1974), I made the potential outcomes approach for defining causal effects front and center, not only in randomized experiments, but also in observational studies, which apparently had never been done before. 
As Neyman told me back in Berkeley, in some sense, he didn’t believe in doing statistical inference for causal effects outside of randomized experiments. Don:事实上Angrist, Imbens和我在我们1996年的JASA论文(Angrist, Imbens and Rubin, 1996)里一起反驳了,我们解释了为什么我们认为是相对的。Neyman是最初和随机实验中的潜在结果的发展相关的人,这毫无疑问。但是在1974年的文章里(Rubin, 1974),我用潜在结果的方法来定义因果效应的前中,这很明显,之前没做过。正如Neyman在伯克利告诉我的,在某种意义上,他不相信在随机实验之外做因果效应的统计推断。 Fan: Also there are features in the RCM, such as the definition of the assignment mechanism, that belong to you. -Fan: RCM也有很多特点,比如分配机制的定义,这个属于你。 +Fan: RCM也有很多特点,比如分配机制的定义,这是你提出的。 Don: Yes, it was crucial to realize that randomized experiments are embedded in a larger class of assignment mechanisms, which was not in the literature. Also, in the 1978 paper (Rubin, 1978a), I proposed three integral parts to this RCM framework: potential outcomes, assignment mechanisms, and a (Bayesian) model for the science (the potential outcomes and covariates). The last two parts were not only something that Neyman never did, he possibly wouldn’t even like the third part. In fact, I think it is unfair to attribute something to someone who is dead, who may not approve of the content being attributed. If the fundamental idea is clear, such as with Fisher’s randomization test of a sharp null hypothesis, sure, attribute that idea to Fisher no matter what the test statistic, as in Brillinger, Jones and Tukey (1978). Panos Toulis (a fine Harvard Ph.D.
student) helped me track down this statement that I remembered reading in my ETS days from a manuscript John gave to me: -Don:是的,意识到随机实验是嵌套在一个大的分配机制中是很关键的,这个不在论文里。在1978年的论文里(Rubin, 1978a),我给RCM框架提出了3个完整的部分:潜在结果,分配机制,一个贝叶斯模型(潜在结果和协变量)。后两个部分不仅是Neyman从来没做过的,他很可能甚至不喜欢第三部分。事实上,我认为我认为把某事归因于死去的人是不公平的,他们可能不赞成被归因的内容。如果基本的想法很清晰,例如Fisher的随机检验有尖锐的零假设,把这个想法归因于Fisher,不管是什么统计检验,如Brillinger, Jones and Tukey (1978)中说的。Panos Toulis(一个很好的哈佛的博士生)帮我追溯了这个陈述,我记得在ETS的那些日子读了John给我的一个稿子: +Don:是的,意识到随机实验是嵌套在一个大的分配机制中是很关键的,但这个思路并未在论文里提过。在1978年的论文里(Rubin, 1978a),我给RCM框架提出了3个完整的部分:潜在结果,分配机制,一个贝叶斯模型(潜在结果和协变量)。后两个部分不仅是Neyman从来没做过的,他很可能甚至不喜欢第三部分。事实上,我认为我认为把某事归因于死去的人是不公平的,他们可能不赞成被归因的内容。但如果基本的想法很清晰,例如Fisher的随机检验有尖锐的零假设,那么把这个想法归因于Fisher就是很自然的,不管是什么具体统计检验,如Brillinger, Jones and Tukey (1978)中说的。Panos Toulis(一个很好的哈佛的博士生)帮我追溯了这个陈述,我记得在ETS的那些日子读了John给我的一个稿子: “In the precomputer era, the fact that almost all work could be done once and for all was of great impor- tance. As a consequence, the advantages of randomiza- tion approaches—except for those few cases where the randomization distributions could be dealt with once and for all—were not adequately valued. -“在前计算机时代,几乎所有的工作都能被做,所有的都很重要。因此,随机方法的优势,除了那些很少数的情况,随机分布能被处理,对于大多数,并没有足够的价值。 +“在前计算机时代,那些只需做一次的工作更受重视。因此,随机方法的优势,除了那些很少数的如随机分布很一次就能处理好的工作,对于大多数复杂情况,并没有足够的价值。 One reason for this undervaluation lay in the fact that, so long as randomization was confined to spe- cially manageable key statistics, there seemed no way to introduce into the randomization approach the insights—some misleading and some important and valuable—into what test statistics would be highly sen- sitive to the changes that it was most desired to detect. The disappearance of this situation with the rise of the computer seems not to have received the attention that it deserves.” (Brillinger, Jones and Tukey, 1978, Chap- ter 25, page F-5.) 
From 580e8d06a6bbdc067dd7c0ed84c154a5139394e9 Mon Sep 17 00:00:00 2001 From: JianqiaoWang Date: Mon, 13 Mar 2017 20:18:21 +0800 Subject: [PATCH 5/7] Update Rubin_all.Rmd --- Rubin_all.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Rubin_all.Rmd b/Rubin_all.Rmd index f8b1c4c..6141183 100644 --- a/Rubin_all.Rmd +++ b/Rubin_all.Rmd @@ -41,7 +41,7 @@ Fan:让我们从你的小时候说起吧。我知道你出生于一个律师 Don: Yes. My father was the youngest of four broth- ers, all of whom were lawyers, and we used to have stimulating arguments about all sorts of topics. Prob- ably the most argumentative uncle was Sy (Seymour Rubin, senior partner at Arnold, Fortas and Porter, diplomat, and professor of law at American Univer- sity), from D.C., who had framed personal letters of thanks for service from all the presidents starting with Harry Truman and going through Jerry Ford, as well as from some contenders, such as Adlai Stevenson, and various Supreme Court Justices. I found this impres- sive but daunting. The relevance of this is that it clearly created in me a deep respect for the principles of our le- gal system, to which I find statistics highly relevant— this has obviously influenced my own application of statistics to law, for example, concerning issues as di- verse as the death penalty, affirmative action and the tobacco litigation. 
-Don:是的。我父亲是他们4兄弟中最年轻的,他们4兄弟都是律师,我们过去在所有的话题上都有激烈的辩论。可能最好辩的叔叔是Sy(Seymour Rubin),他在华盛顿,他曾经写过私人感谢信,来表达他对从Harry Truman到Jerry Ford所有总统的感谢,同时他也感谢了对一些竞争对手如Adlai Stevenson和不同的高级法庭。这让我印象深刻,但也让我害怕。原因是,这让我对我们的法律体系所遵守的原则印象深刻,在这里面,我发现统计学与它高度相关,这明显触动我做了统计学在法律上的应用,例如,把问题考虑成包括死刑,平权法案和烟草诉讼。 +Don:是的。我父亲是他们4兄弟中最年轻的,他们4兄弟都是律师,我们过去在所有的话题上都有激烈的辩论。可能最好辩的叔叔是Sy(Seymour Rubin),他在华盛顿,他曾经写过私人感谢信,来表达他对从Harry Truman到Jerry Ford所有总统的感谢,同时他也竟然感谢了一些竞争对手如Adlai Stevenson和不同的高级法庭。这让我印象深刻,但也让我害怕。原因是,因此我对我们的法律体系所遵守的原则印象深刻,在这里面,我发现统计学与它高度相关,这明显触动我做了统计学在法律上的应用,例如,把问题考虑成包括死刑,平权法案和烟草诉讼。 Fabri: We will surely get back to these issues later, but was there anyone else who influenced your interest in statistics? From fc55e0ef9f8631b6ac6f4ca51613e1d787f99950 Mon Sep 17 00:00:00 2001 From: JianqiaoWang Date: Mon, 13 Mar 2017 20:24:12 +0800 Subject: [PATCH 6/7] Update Rubin_all.Rmd --- Rubin_all.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Rubin_all.Rmd b/Rubin_all.Rmd index 6141183..e987de5 100644 --- a/Rubin_all.Rmd +++ b/Rubin_all.Rmd @@ -49,7 +49,7 @@ Fabri: 等下我们再回到这些问题,但还有其他人影响了你对统 Don: Probably the most influential was Mel, my mother’s brother, a dentist (then a bachelor). He loved to gamble small amounts, either in the bleachers at Wrigley Field, betting on the outcome of the next pitch, while watching the Cubs lose, or at Arlington Race track, where I was taught at a young age how to read the Racing Form and estimate the “true” odds from the various displayed betting pools, while losing two dol- lar bets. Wednesday and Saturday afternoons, during the warm months when I was a preteen, were times to learn statistics—even if at various bookie joints that were sometimes raided. As I recall, I was a decent stu- dent of his, but still lost small amounts. 
-Don:可能影响最深的是我舅舅,一名牙医。他喜欢小赌,要么在芝加哥箭牌球场的露天看台后赌下场比赛的结果,然后看到Cubs队比赛输了,或者在阿灵顿赛道,输了2刀。也是在那里,我小时候就被教导了如何阅读赛马新闻,从不同的赌注估计胜算。周三和周六下午,当我还是个青春期少年的时候,在温暖的月份,我开始学习统计,即使不同的赌注下有时候会被突然袭击。根据我的回忆,我那时是他相当优秀的学生,但是仍然输了一小部分。
+Don:可能影响最深的是我舅舅,一名牙医。他喜欢小赌,要么在芝加哥箭牌球场的露天看台后赌下场比赛的结果,然后看到Cubs队比赛输了,或者在阿灵顿赛道,输了2刀。也是在那里,我小时候就被教导了如何阅读赛马新闻,从不同的赌注估计胜算。周三和周六下午,当我还是个青春期少年的时候,在温暖的月份,我开始学习统计,即使是有时冒着可能被抢劫的风险。根据我的回忆,我那时是他相当优秀的学生,但是仍然输了一小部分。
 There were two other important influences on my statistical interests from the late 1950s and early 1960s. First, there was an old friend of my father’s from their government days together, a Professor Emeritus of Economics at UC Berkeley, George Mehren, with whom I had many entertaining and educational (to me) arguments, which generated a respect for economics that continues to grow to this day. And second, my wonderful teacher of physics at Evanston Township High School—Robert Anspaugh—who tried to teach me to think like a real scientist, and how to use mathe- matics in the pursuit of science.

From 1da8097abea308a6a3d0cd1addba605d6c0ed620 Mon Sep 17 00:00:00 2001
From: JianqiaoWang
Date: Wed, 17 May 2017 22:05:00 +0800
Subject: [PATCH 7/7] Update Rubin_all.Rmd

---
 Rubin_all.Rmd | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Rubin_all.Rmd b/Rubin_all.Rmd
index e987de5..88968ee 100644
--- a/Rubin_all.Rmd
+++ b/Rubin_all.Rmd
@@ -49,7 +49,7 @@ Fabri: 等下我们再回到这些问题,但还有其他人影响了你对统
 Don: Probably the most influential was Mel, my mother’s brother, a dentist (then a bachelor). He loved to gamble small amounts, either in the bleachers at Wrigley Field, betting on the outcome of the next pitch, while watching the Cubs lose, or at Arlington Race track, where I was taught at a young age how to read the Racing Form and estimate the “true” odds from the various displayed betting pools, while losing two dol- lar bets. Wednesday and Saturday afternoons, during the warm months when I was a preteen, were times to learn statistics—even if at various bookie joints that were sometimes raided. As I recall, I was a decent stu- dent of his, but still lost small amounts.
-Don:可能影响最深的是我舅舅,一名牙医。他喜欢小赌,要么在芝加哥箭牌球场的露天看台后赌下场比赛的结果,然后看到Cubs队比赛输了,或者在阿灵顿赛道,输了2刀。也是在那里,我小时候就被教导了如何阅读赛马新闻,从不同的赌注估计胜算。周三和周六下午,当我还是个青春期少年的时候,在温暖的月份,我开始学习统计,即使是有时冒着可能被抢劫的风险。根据我的回忆,我那时是他相当优秀的学生,但是仍然输了一小部分。
+Don:可能影响最深的是我舅舅,一名牙医。他喜欢小赌,要么在芝加哥箭牌球场的露天看台后赌下场比赛的结果,然后看到Cubs队比赛输了,或者在阿灵顿赛道,输了2刀。也是在那里,小时候的我就被教导了如何阅读赛马新闻,从不同的赌注估计胜算。周三和周六下午,当我还是个青春期少年的时候,在温暖的月份,我开始学习统计,即使是有时冒着可能被抢劫的风险。根据我的回忆,我那时是他相当优秀的学生,但是仍然输了一小部分。
 There were two other important influences on my statistical interests from the late 1950s and early 1960s. First, there was an old friend of my father’s from their government days together, a Professor Emeritus of Economics at UC Berkeley, George Mehren, with whom I had many entertaining and educational (to me) arguments, which generated a respect for economics that continues to grow to this day. And second, my wonderful teacher of physics at Evanston Township High School—Robert Anspaugh—who tried to teach me to think like a real scientist, and how to use mathe- matics in the pursuit of science.
@@ -232,7 +232,7 @@ Fan: EM算法是现代统计的另一个里程碑;它也和计算机相关,
 Don: In those early years at ETS, I had the free- dom to remain in close contact with the Harvard peo- ple, Cochran, Dempster, Holland and Rosenthal, which was very important to me. I always enjoyed talking to Dempster, who is a very principled and deep thinker. I was able to arrange some consulting projects at ETS to bring him to Princeton. Once we were talking about some missing data problem, and we started discussing filling these values in, but I knew it wouldn’t work in generality. I pointed to a paper by Hartley and Hock- ing (1971), where they deserted the approach of itera- tively filling in missing values, as in Hartley (1956) for the counted data case, and went to Newton–Raphson, I think, in the normal case. Even though aspects of EM were known for years, and Hartley and others were sort of nibbling around the edges of EM, apparently nobody put it all together as a general algorithm. Art and I real- ized that you have to fill in sufficient statistics. I had all these examples like t distributions, factor analysis (the ETS guys loved that), latent class models. And Art had a great graduate student, Nan Laird, available to work on parts of it, and we started writing it up. The EM paper was accepted right away by JRSS-B, even with invited discussions.
-Don:在ETS的早些年,我可以自由的和哈佛的人联系,比如Cochran, Dempster, Holland和Rosenthal,这对我来说非常重要。我非常喜欢和Dempster聊天,他很有原则,思考问题很深入。我能在ETS安排一些咨询项目带他到普林斯顿。一次我们聊缺失数据的问题,我们开始讨论插补值,但是我知道它不具有普适性。我指出一篇论文,Hartley和Hocking (1971)写的,里面用了迭代的方法来插补缺失数据,正如Hartley(1956)讨论的关于计数数据的情形,后来发展到牛顿算法,我认为,是一个普遍情形。即使EM的种种已经被知道很多年了,Hartley和其他人有点咬在EM的边缘,很明显没有人把它总结成一个普适性的算法。Art和我意识到必须填补充分统计量。我做了所有的例子,如t分布,因子分析(ETS的人们喜欢),隐类模型。Art有一个很好的研究生,Nan Laird,可以做一部分工作,我们开始写起来。EM被JRSS-B接收,并且被邀请讨论。
+Don:在ETS的早些年,我可以自由地和哈佛的人联系,比如Cochran, Dempster, Holland和Rosenthal,这对我来说非常重要。我非常喜欢和Dempster聊天,他很有原则,思考问题很深入。我能在ETS安排一些咨询项目带他到普林斯顿。一次我们聊缺失数据的问题,我们开始讨论插补值,但是我知道它不具有普适性。我指出一篇论文,Hartley和Hocking (1971)写的,里面用了迭代的方法来插补缺失数据,正如Hartley(1956)讨论的关于计数数据的情形,后来发展到牛顿算法,我认为,是一个普遍情形。即使EM的种种已经被知道很多年了,Hartley和其他人有点咬在EM的边缘,很明显没有人把它总结成一个普适性的算法。Art和我意识到必须填补充分统计量。我做了所有的例子,如t分布,因子分析(ETS的人们喜欢),隐类模型。Art有一个很好的研究生,Nan Laird,可以做一部分工作,我们开始写起来。EM被JRSS-B接收,并且被邀请讨论。
 Fan: Now let’s talk more about causal inference. You are known for proposing the general potential out- come framework. It was Neyman who first mentioned the notation of potential outcomes in his Ph.D. thesis (Neyman, 1990), but the notation seemed to have long been neglected.
@@ -331,7 +331,7 @@ Fabri:所以EPA的项目还没开始就结束了。
 Don: It didn’t start at all in some sense. I formally signed on at the beginning of December, and after one pay period, I turned in my resignation. But I felt responsible to find jobs for all these people I brought there. Eventually, Susan Hinkins got connected with Fritz Scheuren at the IRS; Paul Rosenbaum got a position at the University of Wisconsin at Madison; Rod got a job related to the Census. One nice thing about that short period of time is that, through the projects I was in charge of, I made several good connections, such as to Herman Chernoff and George Box. George and I really hit it off, primarily because of his insistence on statistics having connections to real problems, but also because of his wonderful sense of humor, which was witty and ribald, and his love of good spirits. In any case, the EPA position led to an invitation to visit Box at the Math Research Center at the University of Wisconsin, which I gladly accepted. That gave me the chance to finish writing the propensity score papers with Paul (Rosenbaum and Rubin, 1983a, 1983b, 1984a).
-Don:在某种意义上这个项目就是没开始。我在十二月初正式签约,不过只在一个支付期之后,我就递交了自己的辞职信,但我觉得我有义务为我带来的这些人提供工作。最终,Susan Hinkins和IRS的Fritz Scheuren取得了联系;Paul Rosenbaum在威斯康辛大学麦迪逊分校得到了一个职位;Rod也得到了一份有关普查的工作。在这么短的工作时间内,有一件特别棒的事情是,我通过我管理的这个项目,和Herman Chernoff、George Box等人取得了很好的联系。George和我真的很搭,主要是因为他坚持认为统计应该和实际问题相联系,但也有一部分原因是由于他那粗犷而又很诙谐的幽默感,以及他的积极向上。在任何情况下,我都很同意将威斯康辛大学数学研究中心中EPA的位置留给Box。那也给了我一个机会来和Paul一起完成倾向评分(Rosenbaum and Rubin, 1983a, 1983b, 1984a)的论文。
+Don:在某种意义上这个项目就是没开始。我在十二月初正式签约,不过只在一个支付期之后,我就递交了自己的辞职信,但我觉得我有义务为我带来的这些人提供工作。最终,Susan Hinkins和IRS的Fritz Scheuren取得了联系;Paul Rosenbaum在威斯康辛大学麦迪逊分校得到了一个职位;Rod也得到了一份有关普查的工作。在这么短的工作时间内,有一件特别棒的事情是,我通过我管理的这个项目,和Herman Chernoff、George Box等人取得了很好的联系。George和我真的很搭,主要是因为他坚持认为统计应该和实际问题相联系,但也有一部分原因是由于他那粗犷而又很诙谐的幽默感,以及他的积极向上。无论如何,在EPA的工作使我能够去威斯康辛大学数学研究中心中访问Box。那也给了我一个机会来和Paul一起完成倾向评分(Rosenbaum and Rubin, 1983a, 1983b, 1984a)的论文。
 Fan: Since you mentioned propensity score, arguably the most popular causal inference technique in a wide range of applied disciplines, can you give some insights on the “natural history” of propensity score?