
Analyzing complex data structures: network, spatial, multilevel, and text data.

The world and data about the world are becoming increasingly complex. Examples of complex data structures include network data that represent connections among individuals (e.g., friends on social media platforms), spatial data that represent geolocations (e.g., smartphone location data), multilevel data collected at several nested levels (e.g., employees in organizations), and text data (e.g., interviews, online comments).

These data structures make it challenging to extract meaningful and actionable knowledge. This course gives students both a comprehensive understanding of these challenges and a set of tools for addressing them: from forming research questions, through preparing and analyzing the data, to reporting and visualizing conclusions.

This course takes a “learning by doing” approach. In the morning lectures, students learn about the unique challenges of each type of data and ways to analyze them appropriately. In the afternoon lab sessions, students apply this knowledge and gain hands-on experience. Topics include an introduction to R and the tidyverse, network formation and effect models, spatial regression models, multilevel models, and natural language processing.

At the start of the course, students choose a dataset (several options are provided) and then develop and examine their own research questions. Lab sessions and homework are designed so that students can complete the exercises using their own datasets and research questions. Lab sessions and homework thus naturally culminate in a final report. In this way, students experience the full process of data science: from research question to final report.

Final report:

The summer course culminates in a research report. The report should be no longer than 10 pages and must be completed individually. Lab sessions and homework guide students from the research question to the final report. The final report is a short write-up focusing on the student's understanding of the chosen data structure, the applied method, and the interpretation and visualization of the results.

Literature:

  • Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and other stories. Cambridge University Press.
  • Rawlings, C. M., Smith, J. A., Moody, J., & McFarland, D. A. (2023). Network analysis: Integrating social network theory, method, and application with R. Cambridge University Press.
  • Pebesma, E., & Bivand, R. (2023). Spatial data science: With applications in R. CRC Press.
  • Hvitfeldt, E., & Silge, J. (2021). Supervised machine learning for text analysis in R. CRC Press.
  • Jacobucci, R., Grimm, K., & Zhang, Z. (2023). Machine learning for social and behavioral research. Guilford Press.

Students at Cornell University can sign up here.

Students at other universities can attend the morning lectures on programming and statistics (held on Zoom, Eastern Time) but cannot participate in the afternoon lab and project work. If you would like to participate, please email me with a short statement of motivation.