Network Data Collection
So far, we have discussed why social network analysis is an essential part of a crime analyst’s toolkit by introducing the fundamental concepts of network science, helping you see crime problems through a network lens. We have also introduced the theoretical aspects of network science, paying attention to how networks matter as a cause (or an effect). In this chapter, we’ll shift focus to the practical side of network analysis: how to collect network data effectively. Understanding the types of data you need, how to gather them, and the challenges involved in network data collection is crucial for conducting effective network analysis as a crime analyst.
By the end of the chapter, you will be able to answer these questions:
- What are various ways network data can be collected?
Case Study: Network Data Collection in a Gang Crime Investigation
Let’s start with a case study. In a study of gang networks in Chicago, Papachristos, Hureau, and Braga (2013) sought to understand how violence spread within and between gangs. Their research focused on the impact of social networks on gun violence, identifying how specific individuals played a crucial role in coordinating violent activities. The study utilized network data to map the relationships between gang members and trace the diffusion of violence across the network.
Researchers collected network data from several sources: arrest records (links between individuals arrested together), homicide and shooting data (links between individuals and gun violence incidents), and social ties and rivalries (information on known affiliations and rivalries between gang members). Each edge represented an interaction, allowing the researchers to construct a network of gang members linked through criminal activity or social ties. By incorporating both official records and street-level knowledge of gang relationships, the study created a comprehensive map of the social network driving gun violence in the city.
This case is an example that illustrates the diversity of data that can be used to build social networks. For crime analysts, comprehensive data collection is an important goal so as to ensure that all relevant ties and individuals are included (to the extent possible), as missing data could lead to incomplete or misleading conclusions about the network being analyzed.
Types of Network Data Collection
Network data can come from MANY sources. For example, data can be:
Observational - Information about network ties are created by observing the links. An illustration of this is the Miller Project, which focused on youth gang affiliations in an urban environment. A key feature of the study was the collection of data by fieldworkers who conducted observations in public spaces where youth frequently interacted (parks, community centers, street corners). They documented social interactions, identifying key individuals who frequently connected otherwise disconnected groups. In a study by Young, Decker, and Sweeten (2018), the researchers reconstructed these observational data to create social networks.
Archival - Data can also be collected through reconstructing historical data that exists in archives. For example, Papachristos and Smith (2012) conducted a detailed study of Al Capone’s criminal organization by reviewing records from law enforcement, newspapers, court and legal documents, and several other historical sources. The researchers were able to create a detailed sociogram of Capone’s criminal network, identifying the key actors and the strength of ties between them. The network revealed the presence of core members and peripheral associates, showing how resources and influence flowed through the organization.
Questionnaire - Network data can also come from questionnaire’s where someone asks somebody about their ties. For example, the Prison Inmate Network Studies (PINS) are a series of research studies focused on understanding social networks within prison environments (see Kreager et al. (2020)). Networks among incarcerated men and women were created by interviewing individuals about their social relationships in prison.
Boundary Specification
While there are various ways that network data can be collected, an important consideration is the issue of boundary specification. This concerns the theoretical and methodological challenge of determining the appropriate set of actors and connections to analyze in order to identify the relevant social network within a given context. When we collect data on a network we impose a particular boundary. For example, suppose we ask adolescents in a school who their friends are, but we only have a list of people in that school. The boundary would be those friends that are in the school. Does the boundary really exist? What about friendships that are outside of the school? Is the boundary imposed to conduct the research?
This is an important question to consider when collecting network data and when you are analyzing it. This can be a particular challenge for crime analysts given the types of data involved in our work. Defining clear boundaries influences the accuracy and effectiveness of the resulting analysis, which may have important implications for the policies that we identify from these analyses.
Consider another example where a crime analyst is mapping the social network of a local gang involved in drug trafficking. In defining the network, the crime analyst has to decide whether to include only core gang members (e.g., those officially identified by law enforcement) or to also include associates who interact frequently with gang members. If the analyst focuses only on individuals with arrest records, they may overlook influential associates (e.g., suppliers, informants). However, including individuals without strong connections to the gang’s criminal activities might dilute the network’s focus. Another important consideration is the temporal aspect of boundary specification. Should the network only reflect current interactions, or include historical data? A network constructed over a multi-year period allows the analyst to observe changes over time. However, using a long time frame can introduce challenges in identifying which relationships remain relevant to current criminal activities. Finally, the sources used to define the network also impact boundary specification. In this example, we may base the network on co-arrest data. But, a crime analyst could also use informant reports, social media activity, or surveillance footage. Relying too heavily on one source (e.g., arrest records) could bias the analysis toward individuals already under law enforcement scrutiny.
Instruments and Design
Instruments are the tools used to elicit information from respondents. Some common examples are name-generators and rosters.
Design corresponds to the protocol for determining how information should be elicited, who should be sampled, and so on. Some examples are: ego-centric networks, partial networks, and complete/global networks
Ego-Centric Networks
With ego-centric designs, data on a focal actor (ego) and ties to neighbors (alters) and the ties among the alters. The instrument is a “name generator” where individuals are prompted to elicit names that pertain to a connection of interest. For example, the General Social Survey (GSS) is a sociological survey that collects data on demographic characteristics, behaviors, and attitudes of residents in the United States that has been conducted since 1972. The GSS includes a social network measure that asks people the following question: “who are the people with whom you discuss important matters?” For each person named, the respondent is given the prompt “which of these individuals discuss important matters”?
Partial Networks
Partial networks extend data collection from the ego-centric design. Specifically, ego-centric data are collected, plus some amount of tracing to reach contacts of contacts. The instrument is a “tracing mechanism” whereby the network is “traced” by contacting those in ego’s network. For example, Klovdahl et al. (1994) used a link-tracing design to examine HIV risk networks, particularly focusing on drug users and their sexual partners. The project started with a purposively selected group of drug users. These individuals were asked to identify their sexual partners and drug-sharing contacts, who were then recruited into the study. The study aimed to understand how HIV spreads within networks of drug users, examining both direct and indirect relationships. The use of link-tracing was critical for reaching hidden or hard-to-reach populations, such as injection drug users and their social and sexual networks, which are often not accessible through traditional sampling methods.
Complete/Global Network
A final approach is to take a census of the network. In these situations, data on all actors within a particular (defined) boundary are known. The instrument can be a roster (e.g. “For each of the following persons, please indicate whom you trust to share personal information with?”) or perhaps a free response (e.g. “Who are the people in this prison that you trust with person information about you?”). We most commonly encounter this design when there is an identifiable sampling frame, such as a school or a prison, and a roster of members of the setting is known.
Things to Consider
As you may have guessed, collecting network data involves a lot of questions and can sometimes seem a bit overwhelming. But, as you think about analyses you may do or are evaluating existing studies, think about the following areas:
- Domain - what type of network is this?
- Sample - what is the population of interest and how was it sampled?
- Temporality - are the data cross-sectional or longitudinal? Is it a panel or continuous measurement?
- Tie meaning - are ties discrete events or enduring states?
- Instrument - how was the information collected?
Test your Knowledge
- Compare and contrast observational data, archival data, and questionnaire-based data collection methods for social networks. For each, provide an example that you may encounter as a crime analyst.
- Explain the concept of boundary specification in network data collection. Why is it important to define clear boundaries, and what challenges might arise when defining them in crime network analysis?
- Find a study or research report using network data on a topic that interests you. Then, reflect on the issues of domain, sample, temporality, tie meaning, and instrument.
Summary
This chapter has provided a brief introduction to network data collection in the context of crime analysis, focusing on the practical methods of gathering and analyzing network data. In the next chapter, we will cover the basics of what network data are and how we go about actually doing network analysis!