Geisinger Health System teams up with SCET to offer Collider Project

The Sutardja Center is pleased to announce the Geisinger Health Collider Project. Geisinger Health System is known as an early adopter of modern paradigms of healthcare and medical informatics. The Data Science team at Geisinger combines the best practices of machine learning to support decision-making in healthcare. In 2014 Geisinger completed over 20 research projects using Electronic Medical Records (EMRs) to predict personalized treatment outcomes, to better allocate healthcare resources, and to obtain early warnings in crisis scenarios. The team focuses on research at the interface of clinical medicine, applied mathematics, and computer science, but is also interested in all forms of multidisciplinary studies centered on the effective use of data in healthcare.


Undergraduate and graduate students across multiple disciplines are encouraged to attend the kickoff and sign up for this collider. Students in data science, information systems, business, sociology, psychology, political science, and other fields will form teams of 2 to explore the questions posed. Team members are required to attend the kickoff lecture to sign up for this project. Important Note: This project is in two phases, with Phase One taking place Fall 2015. Successful teams will be selected to continue in Phase Two, which will be scheduled in Spring 2016.


The team that successfully completes both phases of the project will be awarded a paid internship at Geisinger Health.  Other candidates will receive consideration for possible internships as well.

Project Description:

The project consists of two phases: Phase 1 was held during the Fall semester, and Phase 2 will take place during the Spring 2016 term. The proposed timeline and details of these phases are:

Phase 1 (Recently Concluded):

A) Teams will be asked to address one of three problems related to multidisciplinary data analysis of obesity, heart/lung failure, and mood disorder. Specifics will be addressed at the introductory lecture. Teams will then work independently to identify novel data sources, to refine the hypothesis, and to formulate a tentative strategy for data blending and subsequent analysis. At this stage, a premium is placed on creativity, although the practical achievability of the project must also be considered. A data dictionary for the clinical data set will be provided. Teams are reminded that the main body of work must be based on data that is tangible and publicly available.

B) Deliverable: a 3-5 page summary of proposed work that contains the following information:

a. Required

i. clearly stated objectives, including a well-formulated hypothesis or research question
ii. a list of the additional data sets that will be combined with the clinical data
iii. an explanation of the relevance of the additional data
iv. a general description of the data integration process
v. a general description of the process for analyzing the integrated data set

b. Encouraged

i. references to similar multidisciplinary efforts
ii. justification of the distance and quality metrics used (i.e., “how will you know that your outcome is ‘better’?”)

iii. an in-depth description of the data sets (this part does not count toward the page limit).


Phase 2:

Kickoff via webcast (BlueJeans) on Feb 2 at 10 am Pacific. Final Phase 2 submissions are expected March 29 (exact dates and a detailed agenda will be reviewed at the kickoff webcast).

A) Overview:


Phase 2 will last approximately 8 weeks. Students will spend most of that time doing the work that they proposed and outlined in Phase 1. They will gather their data sets and integrate them with the base clinical data set provided by the data science team. They will have some standard code bases provided by the data science team from which to build their integration and analysis strategies. They may refine their objectives and strategies of analysis and implement them using easy-to-acquire, easy-to-use methods and software.

Throughout these 8 weeks, students can communicate with the data scientists from Geisinger and/or with any other data scientists, mathematicians, computer scientists, and domain experts they like. The Geisinger team will provide some guidance and support, including assistance in acquiring and interpreting public data sets, access to additional code bases, and, most importantly, access to the core clinical data sets relevant to the project. Additional clinical data pulls may be possible, but they must be requested early to allow us time to process them. If there is interest, the data science team can be available on site at Berkeley for 1-2 days around the middle of this timeline to provide hands-on assistance with the technical components as needed.
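For teams unsure what integrating a public data set with the clinical data might look like in practice, the following is a minimal sketch in plain Python. It assumes both data sets share a common key, here a hypothetical `region` field. All field names and values are invented for illustration and are not taken from the Geisinger data dictionary or code bases.

```python
# Illustrative sketch only: every field name and value below is made up.

def blend(clinical_rows, public_rows, key):
    """Left-join public data onto clinical rows using a shared key.

    Clinical rows without a match keep their original fields and get
    None for every public field, so no clinical record is dropped.
    """
    public_by_key = {row[key]: row for row in public_rows}
    public_fields = sorted(
        {field for row in public_rows for field in row if field != key}
    )
    blended = []
    for row in clinical_rows:
        match = public_by_key.get(row[key], {})
        merged = dict(row)
        for field in public_fields:
            merged[field] = match.get(field)
        blended.append(merged)
    return blended


# Hypothetical clinical extract (one row per patient).
clinical = [
    {"patient_id": 1, "region": "R1", "bmi": 31.2},
    {"patient_id": 2, "region": "R2", "bmi": 24.8},
    {"patient_id": 3, "region": "R9", "bmi": 28.0},
]

# Hypothetical public data set (one row per region).
public = [
    {"region": "R1", "grocery_stores_per_10k": 1.4},
    {"region": "R2", "grocery_stores_per_10k": 3.1},
]

blended = blend(clinical, public, key="region")
for row in blended:
    print(row)
```

A left join is used here so that patients from regions missing in the public data remain visible. Whether to keep, impute, or drop such rows is a choice each team would need to justify in its write-up.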

B) Final deliverable:


A 5-15 page document accompanied by additional data used in analysis. It must contain:

 i. A discussion of final objectives, including rationale for any revisions
 ii. An explanation of the relevance of the data added by the group
 iii. A discussion of the methods of data blending
 iv. A discussion of the methods of data analysis
 v. Numerical or narrative evidence for the improvement in quality or understanding provided by using additional multidisciplinary data.
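One possible form of numerical evidence for item v is a side-by-side comparison of prediction error with and without the additional data. The sketch below assumes a team already has held-out outcomes plus predictions from a baseline model (clinical data only) and an augmented model (clinical plus added data). The numbers are invented for illustration, and mean absolute error is only one of many metrics a team might choose.

```python
# Illustrative sketch only: all numbers below are made up.

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out outcomes and two sets of predictions.
actual = [30.0, 25.0, 28.0, 35.0]
baseline_pred = [28.0, 27.0, 25.0, 31.0]   # clinical data only
augmented_pred = [29.0, 26.0, 27.0, 33.0]  # clinical + added public data

baseline_mae = mean_absolute_error(actual, baseline_pred)
augmented_mae = mean_absolute_error(actual, augmented_pred)

# Fraction by which the added data reduced the error.
relative_improvement = (baseline_mae - augmented_mae) / baseline_mae

print(f"baseline MAE:  {baseline_mae:.2f}")
print(f"augmented MAE: {augmented_mae:.2f}")
print(f"relative improvement: {relative_improvement:.1%}")
```

For the comparison to be meaningful, both models would need to be evaluated on the same held-out records, which were not used to fit either model.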