Scalable Methods for the Analysis of Network-based Data
This MURI project develops new methods for the analysis of large, complex network data sets, with a particular focus on scalable algorithms for the statistical analysis of social network data. Statistical modeling of social networks is well established, but the widespread application of these ideas to network data sets has so far been held back by computational cost. In particular, our ability to collect very large and increasingly complex network data sets (e.g., via the Web) has increased the demand for analysis and modeling algorithms that scale well with data size and complexity and that can be used to provide insights, test hypotheses, and make predictions. The project addresses this scalability problem by developing new techniques and tools, including efficient data structures for modeling network data, fast Monte Carlo sampling algorithms for network parameter estimation, new latent variable models for analyzing networks over time and networks with text data, and systematic methods for handling missing data in networks. Data sets used in the project include interorganizational communication data for disaster recovery (Katrina and World Trade Center), email communication data over time, Twitter data, and political blog data, as well as many more traditional social network data sets. The project team is interdisciplinary, with expertise spanning sociology, statistics, machine learning, algorithms, and computational geometry; it comprises 7 professors, 3 postdoctoral researchers, and about 15 PhD students across 5 universities.
For more information, see the project website.