We are developing a web crawler for analyzing virtual communities. The crawler will integrate social network and cultural theories from the social sciences into a software tool able to answer various kinds of research questions about the structure, culture and activities of virtual communities.

The internet's growing importance for social activity, be it market transactions, the distribution of cultural products or the organization of social movements, is widely recognized. However, despite the enormous amount of data that is publicly available in electronic form, the social analysis of the internet is still in its infancy. We believe one reason for this is that social scientific theories and methods, such as social network analysis and content and sentiment analysis, have not yet been encapsulated in software that lets users easily answer practically or theoretically relevant questions about online social activity. Our mixed team of social and computer scientists is trying to change this by developing a web crawler that combines the analysis of network structures and textual information on the web, with specialized integrated modules for forums and, in the future, blogs.

The following examples illustrate the kinds of questions our program will be able to address.

Visual examples

A centralized and a distributed virtual community compared:
The virtual community on the left is very densely connected and centralized, while the one on the right is distributed and decentralized. This visual impression is confirmed by their reported Betweenness Centrality scores of 0.711 and 0.253, respectively.
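How the two scores were computed is not stated here. One common network-level summary is Freeman's betweenness centralization: it compares each node's normalized betweenness with the highest value in the network and scales the result so that a star graph scores 1, while a graph in which every node has the same betweenness (such as a ring) scores 0. The sketch below assumes an undirected, unweighted graph stored as an adjacency dictionary and computes betweenness with Brandes' algorithm. The function names are illustrative, and the sketch is not meant to reproduce the 0.711 and 0.253 reported above.

```python
from collections import deque


def betweenness(adj):
    """Raw betweenness of each node in an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each side.
    return {v: b / 2 for v, b in bc.items()}


def betweenness_centralization(adj):
    """Freeman centralization: 1 for a star, 0 when all nodes are equally central."""
    n = len(adj)
    if n < 3:
        return 0.0
    pairs = (n - 1) * (n - 2) / 2  # pairs of other nodes a single node can lie between
    norm = {v: b / pairs for v, b in betweenness(adj).items()}
    c_max = max(norm.values())
    # With normalized scores, the star attains the maximum possible sum, n - 1.
    return sum(c_max - c for c in norm.values()) / (n - 1)


def graph(edges):
    """Build an undirected adjacency dictionary from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


star = graph([(0, i) for i in range(1, 5)])       # one hub, four members
ring = graph([(i, (i + 1) % 5) for i in range(5)])  # five members in a circle
print(betweenness_centralization(star))  # 1.0
print(betweenness_centralization(ring))  # 0.0
```

Real community networks fall between these two extremes, which is what makes the measure useful for comparing a centralized forum with a distributed one.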


The salience of emotional constructs in different forums:
The salience of different emotions is measured with dictionaries that are automatically applied to the texts found in forums. As a result, the salience of different expressed emotions, such as anxiety, sadness and anger, can be compared across forums.
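The dictionaries themselves are not described here. As a minimal sketch, assuming LIWC-style word lists in which a trailing `*` matches any word beginning with that prefix, the salience of an emotion can be taken as the share of all tokens in a forum's posts that match its list. `EMOTION_DICT`, `emotion_salience` and the word lists are hypothetical toy examples, not validated lexicons.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; a real study would use a validated lexicon.
EMOTION_DICT = {
    "anxiety": ["worr*", "nervous", "afraid", "anxi*"],
    "sadness": ["sad", "cry*", "lonely", "grief"],
    "anger": ["angry", "hate", "furious", "annoy*"],
}

TOKEN = re.compile(r"[a-z']+")


def matches(token, entry):
    """Exact match, or prefix match when the entry ends with '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def emotion_salience(posts):
    """Share of all tokens in `posts` that match each emotion category."""
    tokens = [t for post in posts for t in TOKEN.findall(post.lower())]
    counts = Counter()
    for t in tokens:
        for emotion, entries in EMOTION_DICT.items():
            if any(matches(t, e) for e in entries):
                counts[emotion] += 1
    total = len(tokens) or 1  # avoid dividing by zero for an empty forum
    return {emotion: counts[emotion] / total for emotion in EMOTION_DICT}


forum_a = ["I am so worried about the exam", "Feeling nervous and afraid"]
forum_b = ["I hate this update", "So annoying, I am furious"]
print(emotion_salience(forum_a))
print(emotion_salience(forum_b))
```

Dividing by the total number of tokens makes forums of different sizes comparable; a token that matches several categories counts toward each of them.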

Technical information and interface:
The software builds on the crawling and indexing of the open-source crawler Apache Nutch, combined with our customized data structures and site management. Its wide range of functions and settings will be easily accessible through an intuitive interface that offers default options as well as advanced possibilities for changing parameters.
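The data structures are not described in this document. One hypothetical way a forum module could represent parsed posts, so that network and text analysis work from the same records, is sketched below: each `Post` keeps its author, thread and the post it answers, and `reply_network` turns replies into weighted, directed author-to-author edges. All names are assumptions, and parsing the pages crawled by Nutch into such records is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one forum post extracted from a crawled page."""
    post_id: str
    thread_id: str
    author: str
    text: str
    reply_to: Optional[str] = None  # post_id of the post being answered, if any


def reply_network(posts):
    """Count replies as directed edges (replying author, answered author)."""
    by_id = {p.post_id: p for p in posts}
    edges = Counter()
    for p in posts:
        parent = by_id.get(p.reply_to)  # None for thread starters or unknown ids
        if parent is not None and parent.author != p.author:
            edges[(p.author, parent.author)] += 1
    return edges


posts = [
    Post("1", "t1", "alice", "Has anyone tried the new version?"),
    Post("2", "t1", "bob", "Yes, it works well.", reply_to="1"),
    Post("3", "t1", "carol", "Not for me.", reply_to="1"),
    Post("4", "t1", "alice", "Thanks, both of you!", reply_to="2"),
    Post("5", "t1", "bob", "Glad to help.", reply_to="4"),
]
edges = reply_network(posts)
print(edges[("bob", "alice")])  # 2
```

Self-replies are skipped so that authors answering their own posts do not create loops, while the `text` field remains available for content and sentiment analysis of the same records.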

