Each Harvard Extension School data science master’s degree candidate completes a capstone project by collaborating with an industry, government, or academic partner to investigate a real-world topic. Previous partnerships include Microsoft, Wildtrack, FEMA, MGH, IEEE, and Canadian Health.

When six students of the 2024 cohort came together for their capstone, they shared a passion for leveraging their data science skills to effect positive change.

During the three-week, on-campus precapstone residency, the HES students contacted their partner of choice: NASA.

Three NASA scientists from the Ocean Ecology Laboratory agreed to collaborate on the capstone project: Carlos Del Castillo, chief of the Ocean Ecology Laboratory; Bridget Seegers, lead of the Cyanobacteria Assessment Network; and Cecile Rousseaux, research scientist. Later, research scientist Lionel Arteaga also joined the team.

Their project focused on analyzing data to better understand phytoplankton, a crucial part of the Earth’s ecosystem. Del Castillo and Rousseaux had been working on the topic before the students’ introduction but needed the data science component to round out the research.

Bruce Huang, director of information technology programs, and Stephen Elston, data science instructor, approved the project proposal, and the team got to work.

Meet the Data Science Capstone Team

Doris Wong
Location: Toronto, Canada
Profession: Director of risk modeling

Ronan Fonseca
Location: Vitoria, Brazil
Profession: Economics

Marina Engelhardt
Location: New York, U.S.
Profession: Data analytics manager

Vivien Kocsis
Location: Miskolc, Hungary
Profession: Entrepreneur

Walter Wojciech Latusek
Location: Warsaw, Poland
Profession: Data science and marketing analytics consultant

Emily Mocek
Location: Toronto, Canada
Profession: Data science, ad-tech

The Capstone Research Process

Project research involved evaluating satellite data to estimate phytoplankton levels in the Southern Ocean and in North American lakes. Phytoplankton are microscopic organisms responsible for producing more than 50 percent of the planet’s oxygen, as well as for maintaining aquatic food webs.

The Southern Ocean is estimated to store a considerable proportion of excess heat and carbon from the atmosphere. Due to its extreme conditions, it is a particularly challenging environment to study and from which to collect data.

The capstone team used satellite data and Earth system models to gain complete coverage of the Southern Ocean.

Rousseaux, one of the NASA scientists, explained, “By using both models and satellite data, this project was able to look at how much of the information is lost and how much the data analysis is biased because of gaps in the data.”

During the second part of the project, the team focused on identifying the drivers of phytoplankton in the area. Because phytoplankton remove carbon dioxide from the atmosphere and feed higher trophic levels, understanding the drivers can help control phytoplankton levels in the various ocean regions.

This research directly supports four of the 17 United Nations Sustainable Development Goals: good health and well-being, clean water and sanitation, life below water, and climate action.

Collaboration with NASA

The student team worked closely with the NASA scientists, meeting almost every week. The students worked collaboratively, but independently from one another, as each was responsible for their own scope of work.

The data science capstone team.

“It has truly been a positive experience to work with them. They were extremely organized, hard-working and appreciative and this made a world of a difference,” said Rousseaux. “I am always looking forward to their weekly updates because it’s polished and professional.”

On both sides of the collaboration, the project went smoothly, despite a few time zone challenges for a team spread across five countries.

Kocsis, one of the capstone students, said navigating the rigorous demands of the master’s program while also juggling full-time employment and other responsibilities was daunting at times.

“The flexibility that Harvard afforded us throughout our master’s degree was probably the most valuable support we received,” she said. “This flexibility allowed us to tailor our study schedules to accommodate our professional and personal obligations.”

In the beginning, building the knowledge needed to conduct their research was also challenging.

Emily and Marina at NASA.

“At first, we encountered a steep learning curve as we built an understanding of natural processes, including the impact of ice, clouds, and seasons on phytoplankton levels,” said Kocsis. “We were also challenged by developing complex data science models. In both cases, Harvard professors and NASA scientists were there to help us with their technical expertise and domain-specific knowledge.”

It wasn’t just the capstone research that left an impression; the team had the opportunity to visit NASA facilities, including a behind-the-scenes tour at the Goddard Space Flight Center, as well as the PACE satellite launch at the NASA Kennedy Space Center in Florida.

Capstone students at NASA.

On-Campus Experience

Data Science Precapstone & Capstone

Precapstone Course

Near the end of the master’s degree program, students come together for a three-week course on campus (during the summer or January). Student teams work with an industry partner on the research design and protocol for their final capstone project.

Capstone Course

In the final online course of the program, students complete the team-based capstone project. It is taken as the sole remaining degree requirement in the semester immediately following the precapstone.

For the precapstone experience, data science students spent three weeks on Harvard’s campus. In addition to taking classes in Harvard Hall and studying in Widener Library, the students were treated by Huang to what he called “cheap noodles,” which Kocsis recalls as “a fabulous bonding experience.”

Along the way, the students received support and guidance from Huang and Elston, as well as from their fellow students. Additionally, access to campus resources during the on-campus experience facilitated collaboration.

“Working with our capstone students is always a blast, and this project showcased their impressive blend of technical skills and collaborative spirit,” said Huang. “They excelled in data science while fostering a strong sense of community and belonging. I’m proud of their achievements and commitment to inclusive collaboration.”

Data science students at Harvard Hall; Marina Engelhardt, Bruce Huang, and Emily Mocek in the Brattle Studio; Huang and students enjoy “cheap noodles.”

Opportunity & Impact

The capstone project brought many other opportunities:

  • Presenting at an AI for Good Fireside Chat at the Division of Continuing Education’s Brattle Square Studio
  • Giving two seminars at NASA
  • Co-authoring two scientific papers with the chief of the Ocean Ecology Laboratory
  • Facilitating a partnership for the current cohort of data science students to work with the National Oceanic and Atmospheric Administration
  • Giving a talk at the upcoming United Nations AI for Good conference in Geneva, Switzerland

The data science capstone project has had a broad impact, with real-world applications.

“Their projects answered several questions that we had for quite some time related to hotspots (regions of high phytoplankton concentration) in the Southern Ocean and their drivers,” said Rousseaux. “The level of quality of the work they did was to the point that we decided to publish these results and we are hoping to submit the paper some time this year to a peer-reviewed journal.”