COLLECTION OF EMPIRICAL DATA ON ETHNICITY
Drawing on area-studies and historical expertise, this team pursues two main tasks. First, it develops a cross-country dataset that makes it possible to test hypotheses about ethnic boundary formation. Second, it provides the empirical data needed to run the simulation.
Development of a novel cross-country dataset on active and latent ethnic groups:
The cross-country dataset on active and latent ethnic groups aims to provide researchers with information about groups defined by ascriptive criteria - such as language, religion and geographical origin - that are (active) or could become (latent) the basis of ethnicity construction. Such a dataset is needed because existing datasets, which take ethnic groups as a given unit of analysis, are not sufficient for testing hypotheses about the formation of ethnic boundaries. Instead, quantitative and cross-country comparable data is needed that identifies potential (latent) as well as existing (active) ethnic groups and provides relevant information about these groups, such as size, wealth and education levels as well as political and cultural influence.
Creating this dataset first requires identifying the ascriptive groups in a particular country that are big enough to potentially become a source of ethnic identity. Using an algorithm the team developed, groups are identified on the basis of religious, linguistic, racial or geographic associations (see details here). In a second step, an area expert validates the list of groups and compiles the information collected on them for the dataset (see details here). Once completed, the dataset will allow users to look up information and run statistical analyses on all these attributes for the active and latent ethnic groups of a large sample of countries.
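The first step, selecting ascriptive groups that are large enough to matter, might be sketched as follows. This is only an illustration and not the team's actual algorithm: the function name candidate_groups, the 5% cut-off and all category names and counts are hypothetical placeholders, and the text does not state the threshold the team uses.

```python
def candidate_groups(counts_by_dimension, min_share=0.05):
    """Return (dimension, category, share) for every category whose
    population share within its dimension is at least min_share.

    min_share is an assumed cut-off chosen for illustration only.
    """
    candidates = []
    for dimension, counts in counts_by_dimension.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no population recorded for this dimension
        for category, count in counts.items():
            share = count / total
            if share >= min_share:
                candidates.append((dimension, category, share))
    return candidates


# Invented placeholder counts for one country (not real data).
example = {
    "language": {"lang_a": 600, "lang_b": 390, "lang_c": 10},
    "religion": {"rel_x": 700, "rel_y": 300},
}

groups = candidate_groups(example, min_share=0.05)
for dimension, category, share in groups:
    print(f"{dimension}: {category} ({share:.0%})")
```

In this sketch the list that comes out would then go to the area expert for validation, as described in the second step above.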
Providing empirical input data for the simulation:
To provide the simulation with empirical data, archival and historical data is used. For example, in the case of Malaysia, the Colonial Census of British Malaya provides the raw data on the size of the major ethnic groups. However, to be usable for the simulation, this data still needs to be adjusted by the area experts, for example to correct for the increasing detail in reporting over time. Furthermore, it needs to be combined with reliable data on religious identification. Lastly, for all cross-sections, data is needed that describes political power as well as attitudes towards other groups. Although it is based on the area experts' rich knowledge of historical and current circumstances, this data finally needs to be reduced to numerical values so that it is readily usable for the simulation. These values can be entered directly into the simulation software by the area expert, as demonstrated in the screenshot below: