Spatial Analysis of Crime Using GIS-Based Data: Weighted Spatial Adaptive Filtering and Chaotic Cellular Forecasting with Applications to Street Level Drug Markets

 

 

 

 

 

 

 

A dissertation submitted to the

H. John Heinz III School of Public Policy and Management,

Carnegie Mellon University

in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

 

 

by

Andreas M. Olligschlaeger (olli@cs.cmu.edu)

May, 1997

 

 

Acknowledgments

 

Numerous people contributed to the various projects which this dissertation is based on. First and foremost I would like to thank my dissertation committee: Wil Gorr, Jackie Cohen and Ramaya Krishnan for their support and thoughts as the dissertation progressed. Wil and Jackie especially spent endless hours with me going over results and hashing out details, and they are directly responsible for bringing me in on the DMAP project which started this whole thing. However, when it comes to service above and beyond the call of duty, Wil Gorr stands out. In my 17 (!) year career as a full time student at three universities I have never met a professor more dedicated to his work and his students than Wil. I consider myself extremely lucky to have been his student and I am a better person for it.

 

I would also like to thank the folks at the National Institute of Justice, especially Craig Uchida and Nancy LaVigne for supporting my research over the years, as well as some of the other DMAP researchers, including Lorraine Green-Mazerolle, David Weisburd and Tom McEwen for their invaluable input. In addition, I would like to thank my "moral support team", consisting of one former NIJ researcher, one geographer, and numerous beers at a New York City Irish pub, for helping me get through the final phase of this dissertation.

 

Finally, I would like to thank all of the dedicated folks in the City of Pittsburgh, including Chief Earl Buford, Commander Bill Bochter and Sgt. Mona Wallace of the Pittsburgh Bureau of Police, as well as Darrell Packer, Glen Bigler and Ed Wells for their support on the DMAP project.

 

 

Abstract

 

With the recent emphasis on proactive Community Oriented Policing and the increased use of computerized information systems for data collection, police departments are faced with two major problems: (1) how to mine the vast amounts of data produced by these systems, and (2) how to use this data to provide information that supports proactive law enforcement.

 

This dissertation makes a contribution in this area by providing the model specification and framework for such tools, a GIS-based data collection system, and a new spatio-temporal forecasting method - chaotic cellular forecasting (CCF) - for use by an early warning system for emerging drug markets.

 

 

Table of Contents

1. Introduction

2. Data Sources and Data Collection Methodology

2.1 Introduction

2.2 The Pittsburgh DMAP Program

2.3 Geobase Files

2.4 Data Sources

2.4.1 Call For Service Data

2.4.2 Police Records Data

2.4.3 Property Ownership Data

2.5 Geocoding Methodology

2.6 Data Collection and Aggregation

2.7 Conclusion

3. Modeling Street-Level Illicit Drug Markets

3.1 Introduction

3.2 Drug Enforcement

3.3 Literature on the Ecology of Crime

3.4 Model Specification and Data

3.5 Modeling Results

3.6 Conclusion

4. Weighted Spatial Adaptive Filtering: Monte Carlo Studies and Application to Illicit Drug Market Modeling

4.1 Introduction

4.2 Varying Parameter Models and Methods

4.3 Spatial Adaptive Filter

4.4 Multiple Feedback Pattern Recognizer

4.5 Monte Carlo Study

4.6 Monte Carlo Results

4.7 Case Study

4.8 Conclusion

5. Chaos Theory, Artificial Neural Networks and GIS-Based Data: Chaotic Cellular Forecasting and Application to the Prediction of Drug Related Call for Service Data

5.1 Introduction

5.2 Spatial Forecasting Methods and Models

5.3 Chaotic Cellular Forecasting

5.4 Forecasting Drug Calls for Service: A Comparison Between Traditional Forecasting Methods and Chaotic Cellular Forecasting

5.5 Conclusion

6. Summary and Conclusion

7. Literature Cited

Appendix A

1. Geocoded Data Files Accessed Directly By DMAP

2. PSMS Data Files Indirectly Accessed by DMAP

Appendix B

Appendix C

Figure 4.4

Figure 4.5

Figure 4.6

Figure 4.7

 

 

 

1. Introduction

 

As police organizations automate their operations and implement more modern computer systems, taking advantage of advances in information technology such as open architecture database systems, enterprise wide computer applications and ever increasing microprocessor and network speeds, more and more information will become available to police officers at the click of a mouse button. Moreover, all of this information will be linked together from various sources and organized in ways which were previously unheard of. Police investigators will likely find this wealth of information a boon to their work, but crime analysts and police administrators may well find themselves faced with information overload.

 

At the same time that police departments are making increasing use of computer technology they are also undergoing a change in law enforcement philosophy. Evidence of this change can be seen in the fact that many police departments are implementing Community Oriented Policing (C.O.P.) in an effort to emphasize proactive rather than reactive law enforcement. While the concept of Community Oriented Policing is certainly not new (for a review of early C.O.P. initiatives see Trojanowicz, 1986) the way in which information is utilized in Community Oriented Policing has changed over the years. In many cities desktop personal computers have replaced the daily log for foot patrol officers and in some cities the time honored tradition of a notebook and pencil has given way to hand held, pen based mobile computers.

 

An abundance of tools and methodologies have been developed that support traditional reactive law enforcement. Practical examples include investigative tools such as linkage analysis, geographic offender profiling and modus operandi systems. Geographic information systems have also played a large role, both from a practical and a research perspective. Research examples include measuring the geographic displacement of drug offenders (Green, 1993), monitoring the effects of law enforcement strategies on nuisance bar activity (Cohen et al., 1993) and point pattern analysis of crime locations (Canter, 1993). Other examples of more general purpose crime mapping systems for law enforcement include the Drug Market Analysis Program (DMAP) effort undertaken in Jersey City, Hartford, San Diego, Pittsburgh and Kansas City (McEwen and Taxman, 1994; Maltz, 1993) and PA-LEGIS (Pennsylvania Law Enforcement Geographic Information System), an integrated GIS and police records management system developed for smaller police departments (Bookser, 1991).

 

There is no doubt that tools for reactive policing will always play an important role in law enforcement. However, proactive law enforcement will require an entirely new set of tools, the development of which has only just begun. Proactive problem solving by detectives, community oriented police officers and police officials not only requires access to up-to-date information on criminal activity, but perhaps more importantly the ability to anticipate emerging crime trends. This in turn requires the ability to mine the vast amounts of data produced on a daily basis by 911 and police record management systems, police hot line tips and citizen complaints for signs of impending flare-ups, geographic displacement or other unusual criminal activity. In other words, proactive law enforcement needs tools that can anticipate or provide early warning of criminal patterns so that they may be prevented.

 

This dissertation makes a contribution in this area by providing the model specification and framework, a GIS-based data collection system, and a new spatio-temporal forecasting method - chaotic cellular forecasting (CCF) - for use by an early warning system for emerging drug markets.

 

The second chapter focuses on the development of a geographic information system that provides the underlying data for the dissertation. This practical application of GIS to narcotics enforcement arose out of the Drug Market Analysis Program (DMAP) funded by the National Institute of Justice (NIJ). A by-product of the DMAP program was a very accurate data set consisting of point (i.e., address) level data on illicit drug market activity and related crimes.

 

Chapter 3 is a study employing multiple regression techniques to analyze the effects of both traditional and ecological variables on illicit drug markets. The study was in part made possible due to the fact that DMAP includes high quality location data on ecological variables such as land use and the built environment.

 

Chapter 4 is an empirical study introducing weighted spatial adaptive filtering which provides evidence that spatial interaction, local context and spatially varying model parameters are important indicators of street level drug dealing.

 

The fifth chapter introduces chaotic cellular forecasting. CCF employs the findings of the previous chapters and combines chaos theory, artificial neural networks (ANNs) and grid cell aggregated GIS-based data to produce one-step-ahead forecasts of street level drug market activity. One of the underlying assumptions of CCF is that spatio-temporal patterns of criminal activity can be modeled as a chaotic system. Artificial neural networks, more specifically feedforward networks with backpropagation, are then used to estimate the forecasting model. Backpropagation models are well suited to this purpose because they are self adapting and are universal approximators (Hornik et al., 1989). Two versions of CCF are tested: one using spatially constant weights (analogous to spatial regression using spatially constant parameters), and the other a hybrid model with spatially varying input-to-hidden unit weights and constant hidden-to-output unit weights. The results are compared to both a simple and a state-of-the-art spatial regression model using spatially lagged variables and tested for forecast accuracy on a holdout data sample.

 

The sixth and final chapter provides a summary and outlines future work.

 

 

2. Data Sources and Data Collection Methodology

 

2.1 Introduction

 

Over the past five years GIS has become a standard tool for crime analysts in many police departments, regardless of their size (see for example McEwen and Taxman 1994; Rossmo, 1995). One of the inherent advantages of GIS is its ability to integrate information from a variety of sources into one user interface. In turn, this allows for spatial analyses that would either not have been possible or at a minimum far more difficult prior to the advent of GIS.

 

One such GIS - the Pittsburgh Drug Market Analysis Program (DMAP) - is the primary source of data for this dissertation. DMAP was developed for the Pittsburgh Bureau of Police under a grant from the National Institute of Justice (NIJ - Grant #90-IJ-CX-007). DMAP was one of the first attempts to integrate GIS and a variety of law enforcement and public sources of data into one user interface and make it available to police officers. DMAP has been in daily use in the Pittsburgh Bureau of Police for the past five years and, while originally intended primarily for narcotics enforcement, has since been expanded to a general map-based crime analysis system.

 

From a research perspective DMAP has proved to be a rich source of data. One of the main advantages of an integrated GIS is that all data have one common denominator: the xy coordinates. Thus all data points can be related to others via coordinates or the address as well as other characteristics such as the date and time. This in turn allows for the aggregation of data over any desirable spatial and temporal unit, whether census tracts by year, municipal boundaries by month, or any user defined area by time of day. Without this capability most of this dissertation would certainly not have been possible.
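
The aggregation described above can be illustrated with a minimal sketch. It assumes planar xy coordinates, a square grid as the user defined spatial unit, and calendar months as the temporal unit; the function name, the cell size and the sample points are hypothetical and are not taken from DMAP.

```python
from collections import Counter
from datetime import date


def aggregate_points(points, cell_size):
    """Count data points per (grid column, grid row, year, month).

    `points` is an iterable of (x, y, date) tuples in planar coordinates.
    The square grid and monthly periods are illustrative assumptions.
    """
    counts = Counter()
    for x, y, when in points:
        col = int(x // cell_size)  # grid column containing the point
        row = int(y // cell_size)  # grid row containing the point
        counts[(col, row, when.year, when.month)] += 1
    return counts


# Hypothetical sample data: (x, y, date of the call for service).
sample = [
    (120.0, 80.0, date(1995, 3, 2)),
    (130.0, 95.0, date(1995, 3, 17)),
    (2150.0, 80.0, date(1995, 3, 20)),
    (120.0, 80.0, date(1995, 4, 1)),
]
monthly_counts = aggregate_points(sample, cell_size=1000.0)
print(monthly_counts[(0, 0, 1995, 3)])  # 2
```

Replacing the grid index with a census tract or patrol sector identifier, or the month with a time-of-day band, changes only the key that is counted.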

 

This chapter of the dissertation discusses the development of DMAP as well as some of the problems encountered in integrating and aggregating data from a variety of incompatible sources. Towards that end the second section discusses the DMAP project, followed by a description of the geobase files in the third section. The fourth section outlines the non-geobase sources of data and discusses their limitations. Section five describes what was perhaps the largest hurdle in developing DMAP, namely how to ensure that data from various incompatible sources can be geocoded. Section six discusses how the data for this dissertation were aggregated. Finally, section seven summarizes the chapter.

 

 

2.2 The Pittsburgh DMAP Program

 

In the summer of 1990 the City of Pittsburgh was one of five cities (the others were Hartford, Kansas City, Jersey City and San Diego) awarded grants from the National Institute of Justice to develop new technologies for police and to evaluate the effectiveness of law enforcement strategies aimed at curbing illicit street level drug trafficking. Like most other cities participating in the grant, Pittsburgh chose to develop a GIS for use by narcotics officers to track drug related information from a variety of sources. More specifically, DMAP allowed narcotics officers and administration officials to query data from multiple sources according to a number of geographic criteria. The results of the queries were displayed as maps which showed point information about certain crimes or calls for service (911 data). This point information could then be further queried for more detailed reports and descriptions of drug and other criminal activity. DMAP was implemented in 1991 and has been in use on a daily basis since.

 

While the original goal of DMAP was to design and implement a GIS-based system targeted specifically for narcotics enforcement, it was quickly realized that the system would also be useful in other areas of law enforcement. As a result, the federal grant was extended by the City of Pittsburgh so that a more general crime analysis mapping system could be developed. This system went into its final production phase in the spring of 1996 and has been in daily use in the crime analysis division of the Pittsburgh Bureau of Police since, where in addition to using the system in investigations approximately 15 maps per week are produced to support law enforcement officials in their daily tasks. Today, after a total of five years of ongoing development and six years of operation, the DMAP system is comprised of approximately 45,000 lines of programming code (about 60% of which is written in C and the rest in AML) and contains over one gigabyte of data.

 

The development of DMAP required that a number of hurdles be overcome. Most geographic information systems in use today are still standalone systems, i.e., they are not integrated with other systems. Indeed, the tools required for full integration are only just now becoming commercially available to software developers. This made the task of integrating multiple sources of incompatible information more challenging. In addition, the Arc/Info software environment on which DMAP is based, in particular the database portion, has a number of limitations which will be outlined below. Perhaps the most daunting task, however, was the fact that a system such as DMAP had not yet been developed, either commercially or as part of a research project.

 

In its current version DMAP serves a number of purposes. First, it supports the investigation of crimes by providing highly detailed information at the address level. For example, a detective can query for an address and within seconds obtain a complete history of that address, including any arrests or police incidents, 911 calls for service made from that address, whether persons living at the address have been repeat victims or perpetrators of crime, as well as property tax and ownership information. Second, DMAP is used by law enforcement officials to measure the effects of policing efforts by measuring the geographic displacement of crime over time and space. These effects can be displayed either via choropleth maps showing changes over areal units such as census tracts or patrol sectors, or via pin maps showing criminal activity before and after police events. A third area in which DMAP has been used is in support of other law enforcement activities such as court presentations and aiding efforts to identify and close nuisance bars.

 

 

2.3 Geobase Files

 

The geobase of a geographic information system consists of those files which are necessary for mapping. Unlike ordinary databases, geographic information systems store information in layers instead of tables (although GIS also stores and accesses data in tabular format to relate layer information to attributes). Thus the geobase of a geographic information system is made up of layers of information. The main difference between a layer and a table is that a layer stores data geographically instead of in rows: each data object is associated with one or more geographical coordinates. There are three types of data objects which can be stored in layers: points (single coordinate), lines and polygons. Examples of each type of object include radio towers (point), water lines (line) and neighborhood boundaries (polygon). In addition, each object has attributes (such as the neighborhood name or the height of a radio tower) which are usually stored in related tables, although they can also be a part of the layer. Overlaying one or more layers on top of each other results in a map.

 

The layers of data that constitute the geobase in DMAP were derived from the Pittsburgh Allegheny Geographic Information System (PAGIS). PAGIS is the City of Pittsburgh's geographic information system used primarily in the departments of City Planning and Public Works. One of the advantages of PAGIS layers is that they are very accurate. The majority of PAGIS layers were derived from air photographs taken in 1986 and 1992. Cartographic information on the air photos was then commercially digitized and converted to GIS layers. What makes the layers so accurate is the fact that PAGIS's tolerance requirement for geographic accuracy is plus or minus five feet. In other words, all of the xy coordinates stored in the map layers can be no further than five feet from their true location. While from an engineering point of view this is not accurate at all, the tolerance is extremely accurate from a GIS perspective. This accuracy has proven very valuable for law enforcement. For example, one of DMAP's features is the ability to determine automatically how far a drug dealer was from a school during an observed drug transaction by measuring the distance between the address where the incident occurred and the property boundary of the closest school. Federal law provides for minimum sentencing guidelines for drug dealers arrested within 1000 feet of a school.
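
The school distance measurement described above can be sketched as follows. This is a minimal illustration rather than DMAP's actual routine: it assumes planar coordinates in feet, represents each school parcel as a list of boundary vertices, and uses hypothetical function names and sample parcels.

```python
import math

SCHOOL_ZONE_FEET = 1000.0  # threshold cited in the text


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(p, polygon):
    """Distance from p to the nearest edge of a closed polygon.

    Note: a point inside the parcel also gets its distance to the
    boundary, not zero; this sketch does not test for containment.
    """
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


def nearest_school(p, schools):
    """Return (name, distance) for the closest school parcel boundary."""
    return min(
        ((name, distance_to_boundary(p, poly)) for name, poly in schools.items()),
        key=lambda item: item[1],
    )


# Hypothetical parcels (coordinates in feet) and incident location.
schools = {
    "School A": [(0.0, 0.0), (200.0, 0.0), (200.0, 200.0), (0.0, 200.0)],
    "School B": [(3000.0, 0.0), (3200.0, 0.0), (3200.0, 200.0), (3000.0, 200.0)],
}
name, feet = nearest_school((800.0, 100.0), schools)
print(name, feet, feet <= SCHOOL_ZONE_FEET)  # School A 600.0 True
```

With a positional tolerance of plus or minus five feet on both the incident location and the parcel boundary, results close to the 1000 foot threshold would still warrant manual verification.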

 

The specific layers included in DMAP are: street outlines, property parcels, building footprints, neighborhood boundaries, 1980 and 1990 census tract boundaries, patrol sectors, police zone boundaries, fire zones, emergency response areas, water features, bridges, park features, cemeteries, major traffic arteries, undermined areas, and miscellaneous features such as parking lots, walls and helicopter landing pads.

 

 

2.4 Data Sources

 

As mentioned earlier, one of the most distinctive features of the DMAP system is its ability to integrate data from a variety of incompatible sources. DMAP relies on three main external sources of information which are updated on a regular basis. These are 911 calls for service, police incident and arrest data, and property tax and ownership information (see Appendix A for a detailed description of files). In addition, DMAP contains data which is not updated as frequently, including the location of schools and playgrounds and the locations and ownership of liquor licenses.

 

2.4.1 Call For Service Data

 

911 call for service data is downloaded on a monthly basis from the City of Pittsburgh's emergency operations center computer system. The 911 system was implemented in 1989, and is rapidly approaching the end of its life expectancy. Indeed, the system is scheduled to be replaced with a more modern client-server based system in early 1998. The system consists of an AT&T miniframe computer running an Oracle database application. Due to the age of the system the amount of information the computer can store is limited to ten days' worth. Every ten days the data is archived onto backup tapes. This means that historical data is not available on an on-line basis. DMAP is currently the only system in the City of Pittsburgh with which it is possible to query historical 911 data.

 

Each year the Emergency Operations Center (EOC) processes approximately 550,000 calls for service for a variety of nature codes. Since the DMAP system runs on a standalone workstation it would be impossible to download all calls for service. Thus DMAP obtains a subset of the data, including all Part I (major) crimes such as homicides, robberies and rapes, as well as a selected set of other crimes, including drugs, burglaries, assaults and vice. The number of data points added to the DMAP database each month is approximately 3,750. Each call for service includes the date, time, location, disposition and nature code of the incident.

 

The 911 data has a number of limitations which merit discussion. First, each call has only one nature code (type of call) associated with it. The actual nature code may differ from the one originally entered. For example, a person may call the 911 center saying that a person has been shot. The police, after arriving on the scene, may well find that the person shot is in fact dead. Rather than changing the nature code to that determined by the police, it is the policy of the Pittsburgh EOC to keep the original nature code. More modern 911 systems have two nature codes associated with each call: one for the caller's perception of the type of incident, and one for the actual type of incident which is determined by the responding unit (fire, police or EMS). Thus in Pittsburgh for some types of nature codes the 911 data reflects citizens' perception of incidents rather than their true nature.

 

A second problem with the Pittsburgh 911 data is that the disposition code is used inconsistently. Examples of dispositions include that a report was written as a result of the call, that the actors were gone on arrival, and that the call was unfounded. Unfortunately, due to policy changes in the operation of the 911 center, the way in which dispositions are used has changed over the years. For example, a disposition coded as "GOA", or gone on arrival, has a different meaning today than it did three years ago. This means that it is not possible to compare historical data in terms of the disposition of a call for service.

 

Finally, the 911 data includes both citizen and officer initiated calls for service. Each time either an officer or a citizen calls in an event, it is entered into the 911 system regardless of who called it in. Again, this is a policy issue that results in data limitations. Each time an officer writes a police report or makes an on-sight arrest he/she calls the 911 center via radio to obtain a CCR (Crime Code Reporting) number. Since the CCR number is automatically generated by the 911 system, the officer initiated call is entered into the system the same way it would be if a citizen were to make the call. Thus no separation between the two types of calls is possible. However, estimates by 911 personnel indicate that approximately 10% of all drug calls for service are officer initiated.

 

2.4.2 Police Records Data

 

Data on police offense and arrest reports are downloaded on a weekly basis from the Public Safety Management System (PSMS). PSMS was implemented in 1988 and, like the 911 system, is fast approaching the end of its life span and is scheduled to be replaced by the end of 1997 with an Oracle based client-server system. PSMS is a networked database application running on a Honeywell mainframe computer. Data are downloaded via tape and converted from a proprietary Honeywell format to standard ANSI Unix format.

 

As with the 911 data, DMAP receives a subset of all data points in PSMS. Unlike the 911 system, however, PSMS contains historical information and is not archived. Currently, there are approximately 8 million records in PSMS. The PSMS information contained in DMAP consists of eight data tables plus approximately nine code tables (detailed information on the data tables is also included in Appendix A; however, the code tables are too large to list in the appendix). The data tables include information on cases, offense reports, arrest reports, arrest codes, the locations of offenses and arrests, victim and offender identities, and the residences of victims and offenders. Each week approximately 400 new cases are added to the data set.

 

In developing DMAP, the problems associated with PSMS data were far more serious than those encountered with the 911 system. First, the data had to be converted from a networked to a relational database architecture. Second, the database used by the GIS software - INFO - is not a truly relational database and cannot handle one to many relationships, of which PSMS contains several. Thus it was necessary to write a custom database engine in C. DMAP's database engine not only had to store the information contained in PSMS, but also needed to interface with the map layers in the geobase as well as with INFO, since INFO stores all of the addresses associated with data points. The DMAP database currently contains over 1.5 million data points associated with approximately 350,000 individual incidents. Therefore, in order to process all of this data in a timely manner and to provide fast access, binary tree indexing routines also had to be written.

 

Each incident in PSMS is based on a Criminal Control Record (CCR) number. This number is generated by the 911 system and is assigned to every new police incident to initiate a new case. Cases can have several different types of reports associated with them, including offense reports (i.e., the report associated with the original incident), arrest reports and supplemental reports. Supplemental reports usually have no address associated with them because their primary function is to act as a description of an ongoing investigation (for example, a witness interview in a homicide case).

 

Apart from the logistical and technological difficulties encountered there were also problems associated with data quality. When the DMAP project was first started, 100 cases were randomly extracted from the files in the narcotics division and compared to the data found in PSMS. Approximately 55% of the cases had one or more errors in the corresponding PSMS data file (the error rate has since improved dramatically due to better data entry quality control). The most serious errors included listing persons that had nothing to do with the case as arrestees, listing victims as offenders, and omitting arrested persons altogether. These types of errors occurred in two or three percent of cases. The most common error was the omission of the address where an incident occurred (approximately 40% of all cases).

 

The majority of cases with missing addresses involved arrest reports. A subsequent analysis of the data as well as interviews with shift supervisors of the data entry section revealed that, while a part of the missing addresses were in fact due to omission (about 10% of the cases), most of them were not entered on all reports associated with a case because they were the same for all reports. For example, if a case involved three arrests, and all three arrests occurred at the same location as the offense report, then only the offense report would have an entry for an address. In practice this is true mainly for cases in which an on-sight arrest occurred. For most types of incidents, however, including drug offenses, the arrest usually takes place at a location other than where the offense occurred. One final source of missing addresses is the fact that PSMS does not allow an invalid address to be entered. Each address is verified against a list of known city addresses in its database. The list of addresses in PSMS is incomplete, however, never having been updated since its implementation. The only way data entry operators can override the system if PSMS rejects an address is to leave the address field blank. It is unknown what percentage of missing addresses can be attributed to this factor.

 

The above implied that, while the missing addresses still posed some problems, most missing addresses could be derived from the offense report location. As a result, whenever DMAP updates the PSMS data in its database, it first parses all of the reports associated with a case to determine whether there are any missing addresses. If there are no missing addresses the new data is committed to the database. On the other hand, if one or more addresses are missing, DMAP first checks to see whether the offense report associated with the case has an address. If this is true, then it is assumed that any other reports with missing addresses share the same address as the offense report. Only if none of the reports have an address is the data rejected.
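
The update rule described above can be sketched as follows. The record layout (dictionaries with "type" and "address" keys) and the function name are hypothetical. The sketch also makes one assumption the text leaves open: a case whose offense report lacks an address, but in which some other report has one, is accepted with the remaining gaps left unfilled.

```python
def resolve_case_addresses(reports):
    """Fill in missing report addresses for one case.

    `reports` is a list of dicts with the hypothetical keys "type"
    ("offense", "arrest" or "supplemental") and "address" (None or ""
    when missing). Returns a new list of reports, or None if the case
    is rejected.
    """
    # Reject the case only if no report carries an address at all.
    if not any(r["address"] for r in reports):
        return None

    # Address of the offense report, if it has one.
    offense_address = next(
        (r["address"] for r in reports
         if r["type"] == "offense" and r["address"]),
        None,
    )

    resolved = []
    for report in reports:
        fixed = dict(report)  # copy so the input is left unchanged
        if not fixed["address"] and offense_address:
            # Assume the report shares the offense report's location.
            fixed["address"] = offense_address
        resolved.append(fixed)
    return resolved


case = [
    {"type": "offense", "address": "100 MAIN ST"},
    {"type": "arrest", "address": None},
    {"type": "arrest", "address": ""},
]
print([r["address"] for r in resolve_case_addresses(case)])
```

Because the inherited address is an assumption rather than an observation, a production system might also flag filled-in addresses so that they can be excluded from analyses that are sensitive to arrest location.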

 

2.4.3 Property Ownership Data

 

Property ownership information is updated less frequently than 911 and PSMS data. Typically updates occur every six months. The source of the data is the City of Pittsburgh's real property file, which contains information regarding ownership, lienholders, tax status, zoning, assessed value as well as the date of purchase and amount of purchase. Since the real property file is simply a flat file there was no need to custom develop a database engine to handle the data. Updates are simply imported into an INFO file and geocoded.

 

 

2.5 Geocoding Methodology

 

Geocoding refers to the process of associating a data point with a geographic location based on some form of address. This address need not necessarily be a street or mailing address, but can be any key identifier of a particular location, such as the name of a place or the lot and block number of a property parcel. The geocoding methodology in DMAP was perhaps the most challenging aspect of its development. The problem is that while both the 911 system and PSMS verify each address after it is entered, they do not associate an xy coordinate with an address. In other words, they do not geocode addresses. In addition, they verify addresses against different, incompatible sources. The result is that an address which is verified as correct in the 911 system can be rejected by PSMS. In addition the addresses of offenders and victims in PSMS are not verified at all (mainly because a person may live outside of the city), and thus may contain many spelling errors. Thus some methods had to be developed to ensure that as many addresses from 911 and PSMS are correctly geocoded as possible.

 

Before an address is matched against a street coverage, the first step is to parse the components of the address (such as street name, type, etc.) in order to ensure compatibility with the address format in the address coverage. This step includes standardizing abbreviations for street types and street directions, and it maximizes the number of data points that can be successfully matched.
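
A minimal sketch of such a parsing step is given below. The abbreviation tables, the field names and the parsing order are illustrative assumptions and are much smaller than what a production system would need; they are not DMAP's actual tables.

```python
# Illustrative (incomplete) standardization tables.
STREET_TYPES = {
    "STREET": "ST", "ST": "ST",
    "AVENUE": "AVE", "AVE": "AVE", "AV": "AVE",
    "BOULEVARD": "BLVD", "BLVD": "BLVD",
    "ROAD": "RD", "RD": "RD",
    "WAY": "WAY",
}
DIRECTIONS = {
    "NORTH": "N", "N": "N", "SOUTH": "S", "S": "S",
    "EAST": "E", "E": "E", "WEST": "W", "W": "W",
}


def parse_address(raw):
    """Split a raw address string into standardized components."""
    tokens = raw.upper().replace(".", " ").replace(",", " ").split()
    parsed = {"number": None, "direction": "", "name": "", "type": ""}
    # Leading house number, if present.
    if tokens and tokens[0].isdigit():
        parsed["number"] = int(tokens.pop(0))
    # Trailing street type (checked first so "NORTH AVE" keeps its name).
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        parsed["type"] = STREET_TYPES[tokens.pop()]
    # Leading street direction.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parsed["direction"] = DIRECTIONS[tokens.pop(0)]
    # Whatever remains is treated as the street name.
    parsed["name"] = " ".join(tokens)
    return parsed


print(parse_address("1234 North Highland Avenue"))
print(parse_address("5000 Forbes Ave."))
```

Both the raw data and the address coverage would be passed through the same routine so that, for example, "Avenue", "Ave." and "AV" all compare as equal.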

 

Once all addresses have been parsed the raw data file is ready for geocoding. During geocoding the computer attempts to find a matching address in the address coverage for each raw data point. If a match is found the xy coordinates of the matched address are added to the original data record. If the address coverage is a point coverage, the geocoded data will receive the same xy coordinates as the matched point in the address coverage. Data geocoded on polygon based address coverages obtain the xy coordinates of the geographic center of the matched polygon.
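
For polygon based coverages, one common way to compute a polygon's center is the area-weighted centroid, sketched below. Whether the GIS software uses this definition of "geographic center" is an assumption; the function name and the sample parcel are hypothetical.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple (non self-intersecting) polygon.

    `vertices` is a list of (x, y) tuples in order around the boundary.
    """
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("polygon has zero area")
    return (cx / (3.0 * area2), cy / (3.0 * area2))


# Hypothetical 40 x 100 foot parcel.
parcel = [(0.0, 0.0), (40.0, 0.0), (40.0, 100.0), (0.0, 100.0)]
print(polygon_centroid(parcel))  # (20.0, 50.0)
```

For strongly concave parcels the centroid can fall outside the polygon, which is one reason some systems use an interior label point instead.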

 

By far the most commonly used address coverage, however, is a line-based address coverage. For example, address coverages created with Tiger line files (available from the Census Bureau) are line based. An earlier version of DMAP used this type of address coverage. Geocoding with line-based coverages differs in that geocoded locations are only approximate. Instead of having an address for each polygon or point, line-based address coverages have an address range for each arc (line) representing a street. In most GIS each arc has a left and right beginning address and a left and right ending address. It is assumed that numbers on each side of the street have the same parity, i.e., even or odd. The entire arc shares the same street name, direction, suffix and type. During geocoding using line-based address coverages in ARC/INFO, the system first finds all arcs with the same street name, direction, etc. as the address that is to be matched. Once all candidate arcs are found, it tries to find an arc whose address range encompasses the street number of the data point to be matched. The exact xy coordinates of the geocoded location are then determined via interpolation on that arc. For example, if the starting and ending addresses of the arc are 100 and 200, respectively, and the address number of the data point to be matched is 150, then the geocoded location will be exactly halfway along the arc.
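The interpolation in the example above can be sketched in a few lines of Python. This is a minimal illustration that assumes a straight arc defined only by its two end points and a single address range; real arcs may contain intermediate vertices and separate left and right ranges, and the function name is hypothetical rather than an ARC/INFO routine.

```python
def interpolate_on_arc(start_xy, end_xy, from_addr, to_addr, house_number):
    """Place a house number proportionally along a straight arc (assumed geometry)."""
    if from_addr == to_addr:
        fraction = 0.0
    else:
        fraction = (house_number - from_addr) / (to_addr - from_addr)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number outside the arc's address range")
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return x, y
```

With an arc running from (0, 0) to (10, 0) and an address range of 100 to 200, house number 150 is placed at (5.0, 0.0), the midpoint of the arc.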

 

Regardless of the type of address coverage, problems arise during geocoding when no exact matches can be found. There are two main causes for this: either ARC/INFO cannot find a matching street name or it cannot find a matching street number (this is particularly true for point or polygon based address coverages). ARC/INFO provides a number of ways in which matching can be improved. The most commonly used one is to let the user choose one of a list of possible candidates obtained using a soundex algorithm. This is fine when only one or a few addresses are to be matched, but can become very tedious and time consuming when thousands of addresses are to be matched, as is the case in DMAP.

 

In developing DMAP it was soon found that geocoding requirements were unique and, unfortunately, well beyond the capabilities of ARC/INFO. The Tiger line files used in the earlier version of DMAP led to a match rate of only 70%. This was mainly due to incorrect address ranges on the arcs and misspelled, incorrect or missing street names. This was clearly unacceptable for a system whose goal was to support research, investigative and administrative functions, because too much data was lost.

 

As a result it was decided to create a new point based address coverage based on both PAGIS property parcel data as well as City of Pittsburgh property tax files. The goal was not only to improve geocoding overall in terms of the proportion of data successfully matched, but also to pinpoint exact locations of incidents and calls for service rather than simply approximate their locations by means of interpolation. In addition, some way had to be found to match addresses for which no exact match could be found without having to go through a list of candidates for each unmatched data point. In creating the new address coverage for DMAP several problems were encountered that merit further discussion, since they impact the geographic accuracy of not only single data points, but by extension also data aggregated by areal units.

 

The common denominator between the City of Pittsburgh property tax file and the parcel coverage was the legal identifier for deeded land parcels, the lot and block number. Only the property tax file contained street addresses. The first step was to relate the two files via the lot and block number and use the property tax file's street address as the basis for the address coverage. However, having created the initial coverage it was found that on average only 40% of the addresses to be geocoded would match. Upon closer inspection it was found that while the matchup rate between the parcel file and the tax file was 97%, a number of things were causing the poor results.

 

First, some of the street numbers associated with the property tax file were actually ranges (as in the case of apartment buildings, for example), while most of the parcels had only a single address associated with them. This posed a problem for ARC/INFO because it can only match based on address ranges or a single house number, but not a combination of both. Therefore, all data points associated with parcels having more than one address did not match.

 

Second, a large portion of PSMS and 911 call for service data (about 20%) is geocoded by intersection. In both systems an intersection will only geocode to the nearest 100 block address. In most cases, however, this is not a legitimate address. For example, the intersection of A and B streets might be recorded as 1000 B street in the 911 call for service address file. If such an address exists at the intersection of A and B streets, then all data at that location would also match in DMAP. However, if the closest address to the intersection were 1002 B street, then of course DMAP would fail to match incidents occurring at the intersection. This problem was a major contributor to the poor geocoding rate.

 

The third problem was that the city property tax file contained numerous spelling errors and inconsistencies. Street names were spelled incorrectly, street directions were missing, or street types were missing. In theory this could easily be fixed by manually going through each property file record and correcting any spelling mistakes. However, with 160,000 records this task proved to be insurmountable.

 

Finally, many streets in Pittsburgh are numbered streets. While ARC/INFO can handle numbered streets, it does not recognize "2nd Ave." and "Second Ave." as being the same street. In addition, the tax file inconsistently uses numeric and alphanumeric representations of numbered street names.

 

In summary, a way had to be found to automatically account for and consider all of the above mentioned problems in such a way that an acceptable proportion of the data would geocode without extensive operator intervention.

 

The first step in the solution was to create a polygon based address coverage. This was done using the PAGIS parcel coverage with the lot and block number constituting the "address". Before the property file could be geocoded against this coverage, however, the problem of address ranges had to be resolved. This was done by writing a program in C to parse the tax file and create a duplicate entry for each valid street number in the address range of a particular property if the property had more than one street address. In doing so the parity of street addresses also had to be considered. For example, if the address range of a property was 100-106 Smith St., then the resulting property file after parsing would contain a record for 100, 102, 104 and 106 Smith St. where other than the street number all information was identical. The resulting parsed tax file thus contained one record for each unique address in Pittsburgh.
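The range expansion described above can be sketched as follows. This is a Python illustration rather than the original C program, and it assumes that each property record carries a numeric low and high street number; the function name is hypothetical.

```python
def expand_address_range(low, high):
    """Return every street number between low and high with the parity of the lower bound."""
    if high < low:
        low, high = high, low
    # Stepping by 2 keeps all generated numbers on the same side of the street.
    return list(range(low, high + 1, 2))
```

Applied to the example in the text, expand_address_range(100, 106) gives 100, 102, 104 and 106; a single-address property such as 101-101 simply yields 101. Each generated number would then be written out as its own record with all other fields copied unchanged.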

 

In creating the duplicate property file it was found that none of the public housing project parcels contained valid address ranges. In Pittsburgh many housing projects consist of several streets and buildings, where each building has its own address range. At the same time most housing projects comprise one large property parcel, and are associated with a single lot and block number. This makes it impossible to represent all street addresses in a single address field since more than a simple street range number is involved. Even if it were possible, however, all incidents in a housing project would geocode to the geographic center of the associated property parcel. Considering that most projects encompass quite a large area, it would be rather difficult to interpret a large number of incidents in the middle of a housing project.

 

The solution to this problem was to create pseudo parcels based on the building footprints of all structures contained in public housing projects. A separate polygon coverage was created using these building footprints, where each building polygon was assigned a lot and block number. This separate coverage was then appended to the original parcel coverage. Next, police officers equipped with maps of housing projects went on site to determine the address range for each building (surprisingly, the Pittsburgh Housing Authority could not provide such maps). After all building pseudo parcels had been assigned address ranges the parsing program described above was run again. To date not all of the housing projects have been completed due to manpower limitations.

 

The parsed property file was then geocoded using the parcel based address coverage. About 97% of all property file records were successfully geocoded. The result of the geocoding was a point coverage containing one point for each successfully matched property record. In the case of address ranges one point for each address in the range was created. This resulting coverage was in turn converted to an address coverage based on the street address.

 

Initial testing showed that about 70% of PSMS data and 75% of call for service data was successfully geocoded. This was roughly equal to the Tiger line based address coverage. Clearly, this was still unacceptable. However, all addresses were matched "as is", i.e., no preprocessing was done in order to circumvent the problems outlined above and only the standard ARC/INFO geocoding capabilities were used. To further improve the geocoding rate a preprocessing program was designed and implemented using the C programming language and a hash table containing all valid street names in the city along with valid address ranges. The idea behind the hash table was to create the ability to find a closest matching address if no exact match was found, in particular as a solution to the intersection problem discussed earlier and, in the case of misspelled street names or omitted directions or types, automatically find and utilize the closest candidate. Finally, the preprocessing program automatically converts all numbered streets to their alphanumeric representation, i.e., "2nd Ave." to "Second Ave.", for example.
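The conversion of numbered streets mentioned at the end of this paragraph might look like the following Python sketch. The lookup table is a hypothetical fragment covering only the first ten ordinals, and the function name is an assumption; the original C program is not reproduced here.

```python
# Hypothetical table; a complete version would cover every numbered street in the city.
ORDINAL_WORDS = {
    "1ST": "FIRST", "2ND": "SECOND", "3RD": "THIRD", "4TH": "FOURTH",
    "5TH": "FIFTH", "6TH": "SIXTH", "7TH": "SEVENTH", "8TH": "EIGHTH",
    "9TH": "NINTH", "10TH": "TENTH",
}


def spell_out_numbered_street(name):
    """Return the spelled-out form of a numeric street name, or the name unchanged."""
    key = name.strip().upper()
    return ORDINAL_WORDS.get(key, key)
```

Under these assumptions "2nd" becomes "SECOND", while a name such as "Smith" is only upper-cased, so both spellings of a numbered street resolve to the same entry before matching.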

 

The hash table was created by first processing the parsed property file and extracting a list of all unique street names in the City of Pittsburgh. Next, the property file was processed again to find all valid street numbers for each street in the list. For each street an associated file was created containing a sorted list of street numbers. There are three hash keys in the table: the primary key is the street name, followed by the street type and direction. Each hash table entry points to a file containing the sorted street numbers.
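A structure of this kind can be approximated with a Python dictionary. The sketch below is an assumption-laden simplification: it keeps the sorted street numbers in memory rather than in one file per street, and it assumes that the parsed property file is available as (number, direction, name, type) tuples.

```python
from collections import defaultdict


def build_street_index(records):
    """Map (name, type, direction) to a sorted list of valid street numbers."""
    index = defaultdict(set)
    for number, direction, name, street_type in records:
        index[(name, street_type, direction)].add(number)
    # Sorting once at build time allows fast nearest-number searches later.
    return {key: sorted(numbers) for key, numbers in index.items()}
```

A lookup such as index[("SMITH", "ST", "")] then returns every valid street number on Smith St. in ascending order.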

 

The preprocessing program works as follows: first a backup copy is made of the original file containing the raw data to be geocoded. The backup file is then processed record by record, where each record's address is parsed and verified using the hash table. If the street name contained in an address is numeric, it is converted to its alphanumeric representation. Next, the program attempts to find an exact match for the street name, direction and type. If an exact match is found, the program looks up the associated list of street numbers and tries to find a matching street number. If a match is found the record is written back to the original file unaltered. If no match is found, the program finds the closest matching number, considering the parity. The address in the current record is altered and written back to the original file.
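The search for the closest street number can be sketched as below. The sketch assumes that parity is treated as a hard constraint, so that only numbers on the same side of the street are considered, and that ties are resolved in favor of the lower number; neither detail is specified in the text, and the function name is hypothetical.

```python
from bisect import bisect_left


def closest_number(sorted_numbers, target):
    """Return the valid number nearest to target with the same parity, or None."""
    same_side = [n for n in sorted_numbers if n % 2 == target % 2]
    if not same_side:
        return None
    i = bisect_left(same_side, target)
    # Only the neighbors just below and just above the target can be nearest.
    candidates = same_side[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda n: abs(n - target))
```

For the intersection example discussed earlier, a record coded as 1000 B street on a street whose valid numbers are 996, 1002, 1003 and 1010 would be altered to 1002.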

 

In the event that DMAP cannot find an exact match for the street name, a list of candidates is compiled using a soundex algorithm, taking into consideration the street type and direction. From this list the closest match (defined as the percentage of characters that match) is used and again DMAP proceeds to find the closest matching street number.
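The candidate search can be illustrated with a standard American Soundex code and a simple similarity score. Two assumptions are made in this Python sketch: the "percentage of characters that match" is interpreted as the share of positions holding identical characters, and street type and direction are ignored for brevity. DMAP's actual scoring may differ.

```python
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(name):
    """Four-character American Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with equal codes
            previous = digit
    return (code + "000")[:4]


def char_match_ratio(a, b):
    """Share of positions at which a and b hold the same character (assumed metric)."""
    if not a and not b:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def best_candidate(street, valid_streets):
    """Pick the valid street with the same Soundex code and the highest match ratio."""
    candidates = [s for s in valid_streets if soundex(s) == soundex(street)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: char_match_ratio(street, s))
```

With a misspelled name such as "FORBS" and valid names "FORBES", "FIFTH" and "FORWARD", only "FORBES" shares the Soundex code F612 and is therefore selected.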

 

Using the preprocessing program DMAP was able to greatly improve the proportion of raw data that was successfully geocoded. For PSMS data the match rate is 95%, and for call for service data it is 97%. While not perfect, these levels are more acceptable than those of earlier geocoding attempts.

 

Random checks of those addresses which were altered were done in order to test the relative accuracy of the preprocessing program. It was found that in 75% of sampled cases the geocoded location was on the average only four to six street numbers away from the original. Only 5% of all altered raw data addresses geocoded more than a block away from the original address. This implies that, when data are aggregated by areal unit, the error rate in aggregation due to data points not being in their true areal unit is minimized.

 

 

2.6 Data Collection and Aggregation

 

With the exception of census data, all of the data used in this dissertation were derived from geocoded data contained in DMAP. Since all of the empirical data was aggregated by either census tract or grid tile, a program was written to automate the aggregation. First, columns were added to the address coverage, one for each polygon layer in the DMAP geobase (for example, census tract and neighborhood boundaries). Next, each polygon in each layer was overlaid on the address coverage to determine which address points it encompassed. The corresponding boundary columns in the address coverage were then updated. The end result was that each record in the address coverage also contained the census tract number, neighborhood name, grid tile number, patrol sector, police zone number, fire district number and emergency response area number.

 

For PSMS and 911 incidents aggregating data by areal unit was then simply a matter of relating the geocoded data files to the address coverage via the address and running a frequency algorithm by areal unit and incident type (such as drug nature code or offense code) to produce counts by census tract, for example.
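The relate-and-count step can be sketched in Python as follows. The sketch assumes that the address coverage has been reduced to a mapping from address to census tract and that each geocoded incident is an (address, incident type) pair; both structures and the function name are illustrative, not DMAP's actual file layout.

```python
from collections import Counter


def count_by_area(incidents, address_to_tract):
    """Count incidents per (tract, incident type); unmatched addresses are skipped."""
    counts = Counter()
    for address, incident_type in incidents:
        tract = address_to_tract.get(address)
        if tract is not None:
            counts[(tract, incident_type)] += 1
    return counts
```

The same function serves any other areal unit, such as grid tiles or police zones, by swapping in the corresponding address-to-unit mapping.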

 

 

2.7 Conclusion

 

This section described the various data sources used in DMAP and in this dissertation, and outlined how the data were collected and processed. It is important to note that even though every effort was made to minimize the amount of data lost during geocoding and to maximize the quality of the data, any interpretation must take into consideration the factors outlined above.

 

However, despite all of the problems associated with the data set, DMAP to this day remains one of the few truly integrated geographic information systems. With the implementation of a new 911 and police records management system it is anticipated that the quality of data will increase further.

 

3. Modeling Street-Level Illicit Drug Markets

 

3.1 Introduction

 

Urban geographers have long studied the ecology of crime; that is, the relationship between crime, the built environment, and land uses. A major impediment, however, has been the lack of data (Dunn, 1980; Sherman et al, 1989). Official police records systems generally have not controlled quality of address data; furthermore, it has been well documented that police under-report crimes; e.g., crimes that have low solvability factors. Several police departments, however, are in the process of implementing geographic information systems, thereby providing high quality location data and new opportunities to study the ecology of crime. Also, computer aided dispatch data from 911 emergency calls provide a new source of data for research, not subject to police screening.

 

This chapter uses data from the Pittsburgh Drug Market Analysis Program (DMAP) geographic information system to investigate the effect of traditional criminological, ecological, and enforcement factors on the formation of open-air, street-level drug markets. Pittsburgh, Hartford, Jersey City, Kansas City, MO, and San Diego were funded by the National Institute of Justice to develop DMAP systems. The purposes of DMAP are to: 1) support street-level drug case investigations, 2) identify drug markets, 3) track enforcement induced displacement of drug market "hot spots", and 4) evaluate street-level enforcement strategies.

 

Pittsburgh has had computer aided dispatching of 911 emergency calls for service since 1989 and computerized criminal offense and arrest reports since 1990. DMAP was implemented in 1991. All address data are verified on data entry into the CAD and police reporting systems, and DMAP has a land parcel-based point coverage for address matching that successfully places approximately 95 percent of police events on maps. Since drug offenses are generally victimless crimes, offense and arrest report data reflect police policy and management decisions in regard to allocation of scarce police resources, and not the actual activity levels of drug markets. Hence this study uses 911 call data, with the "drug" nature code, as the dependent variable. These are emergency calls to police, reporting drug-related offenses (e.g., illicit drug use or dealing). The 911 data do not have the under-reporting problems of offense and arrest reports, but have the limitation of being unconfirmed citizen perceptions of drug offenses. Officers responding to 911 calls attempt to verify the nature of calls and have the responsibility to revise recorded nature codes (robbery, vandalism, shots fired, etc.) as needed when radioing back to the dispatching center that they are cleared and available for responding to another call.

 

Section 3.2 provides a brief overview of drug enforcement approaches and Section 3.3 reviews the ecology of crime literature. Section 3.4 provides a drug market model specification and describes the DMAP data used in this study. Section 3.5 provides modeling results of spatial econometric methods. Finally, Section 3.6 concludes the chapter with a summary and future work.

 

3.2 Drug Enforcement

 

There are opportunities for drug enforcement at each of the stages of illicit drug production and distribution. The illegal goods and market system consists of mostly foreign and some domestic crops, offshore and onshore processing/chemical manufacturing, transportation, distribution chains, and coordination between buyers and street-level dealers. Law enforcement agencies have tried, in varying degrees, to disrupt the production/distribution cycle at every stage. Enforcement higher up the production-distribution chain promises to cut supplies and increase prices, thereby reducing use.

 

In the late 1980s, there was a growing interest in street-level dealing as a weak link in the chain. The point of transaction between the street dealer and user is vulnerable: it is out in the open, money and drugs have to change hands, and it is difficult to relocate drug dealing points if disrupted and re-establish contacts. Drug dealers are unable to advertise, and are often hemmed in by turf claimed by other dealers.

 

Municipal police have several strategies for disrupting street-level drug markets in an effort to 1) increase the time between successful transactions, thereby decreasing drug use; 2) increase the safety of neighborhood residents by getting open-air dealers off the streets; and 3) use street-level contacts and arrests to identify mid-level and higher dealers for arrest. The Pittsburgh narcotics impact squad has several units and teams that sweep hot spot corners and bars, disrupting markets and arresting street dealers and users. Crackdowns saturate neighborhoods with narcotics police walking, patrolling, and disbanding groups of people in known drug markets. Community oriented policing puts foot patrols in communities to work directly with residents to identify long term solutions to drugs and other problems. Some situations, for example indoor dealing or working from the street level up to mid-level dealers, require undercover or confidential informant buys. Purchased drugs are sent to the crime lab and, if they test positive, result in warrants for arrest.

 

In response to enforcement, street-level dealing displaces in location, generally only a few blocks away. Before the Pittsburgh narcotics squad had weekend or 24 hour coverage, drug dealing often displaced to uncovered times. Corners that were impacted by street sweeps soon employed seller teams: lookouts, touters to attract and screen buyers, money holders and drug holders. Dealers commandeered or "rented" front porches to get the benefits of both public visibility and private property. Bars are attractive public facilities for drug dealing. Dealers and users can legitimately loiter, they can get lost in the crowd and toss drugs in a corner if the bar is raided, and there is limited guardianship controlling activities in and around bars.

 

DMAP has been successful in tracking displacement in time and space, identifying new hot spots before street officers do. After three years of police continually disrupting displaced hot spots, the two largest drug markets in Pittsburgh were dramatically improved.

 

3.3 Literature on the Ecology of Crime

 

The early ecology of crime studies, starting with Clifford Shaw's seminal study of delinquency in Chicago (1929), tended to be descriptive, noting the concentration of crime in central business districts (CBD's). Concentrations of delinquents varied inversely with distance from the city center. Delinquents were also located adjacent to heavy industry and commerce. The socioeconomic conditions of these areas tended to include physical deterioration, decreasing population, poverty, minorities, and immigrant population.

 

Schmid (1960) found similar patterns in Seattle, but also found crime pocketed in certain areas, like "skid row." Robbers commit robberies in CBD's, far from their residences. Robber offender characteristics included male unemployment, lower number of school grades completed, lower median income, and fewer people 14-and-over unmarried. Low family and economic status were highly correlated with robbery, larceny and auto theft. Family instability was an important factor contributing to delinquency. Research on criminal careers (e.g., Blumstein, Cohen and Farrington, 1988; Blumstein and Cohen, 1987) finds an at-risk age group - late teens to early twenties - that is more likely to commit crimes.

 

Lander (1954) found evidence that the percentage of non owner-occupied housing and percentage of nonwhite population was associated with delinquency. Since ethnic minorities, especially nonwhites also tend to have low socioeconomic and family status, poverty is generally the underlying causal factor for crime. Lander also found that "generally, higher rates of reported personal attack crimes prevail in lower-class residential areas of cities. Often these are predominantly black [areas]...Higher rates of property crimes are generally reported to characterize the central business areas of cities."

 

More recently there have been some studies providing insight or theories to the ecology of crime. Routine activities theory, due to Cohen and Felson (1979), states that criminal events result from motivated offenders, suitable targets, and absence of capable guardians against crime, and a converging of offenders and victims nonrandomly in time and space. This theory integrates several perspectives on crime, including frequency of convergence in time and space; rhythm, regular periodicity at which events occur; tempo, the number of events per unit of time; and timing/coordination of criminals and victims. Related literatures studied factors affecting motivated offenders, opportunity of availability of targets, lifestyles of victims, and the deterrent effects of official and unofficial policing implied by guardianship.

 

Sherman et al (1989) were the first to test routine activities theory with spatial data. Using 911 call data, they found substantial concentrations of police calls in relatively few "hot spots." Geographers have long recognized day-to-day clustering of people residing over wide areas into small nodes of activity (Brantingham and Brantingham, 1984). This is consistent with drug dealing observed in Pittsburgh. There are considerably more potential drug dealing points than actual hot spots, suggesting that a monitoring system is important to track enforcement-induced displacement, to keep drug markets disorganized.

 

Gorr and Olligschlaeger (1994) conducted an exploratory study, using weighted spatial adaptive filtering, with the 1991 data also included in this chapter. They found that the eleven open-air drug markets of Pittsburgh are, for the most part, located in areas with high percentages of black population (over 85% black). These drug markets, however, cover only somewhat more than half the total area with high black population, so that there are factors other than poverty/race at work. The four public housing drug markets were adequately estimated using percentage black population as the only explanatory variable and constant parameters over space relative to neighboring areas. Public housing projects are homogeneous in population and land use. The remaining seven markets, which have a mix of commercial and residential land uses, required spatially-varying parameters for the percentage black variable, to improve the fit of the data. In summary, the exploratory study suggested that characteristics other than those of the population of drug markets are necessary for modeling drug markets.

 

While previous authors applied routine activity theory to crimes with victims, this theory also has implications for illicit drug markets, on tacit coordination of dealers and buyers of drugs. Open-air drug dealing takes place in high poverty areas. Lower income persons generally have lower quality private space; e.g., higher levels of crowding in poor households. Hence, public spaces with low guardianship are primary candidates for open-air drug markets; areas such as run-down commercial strips, bars, and public housing projects with high proportions of female headed households. Skogan (1986) reported that fear of crime is higher among residents of high rise buildings than those living in smaller buildings. They feel that they do not have control of their environments and their space.

 

Table 3.1 Drug Market Model Variable Definitions and Expected Directions of Effects on the Dependent Variable

Dependent Variable:

      DRGTOT      Total 911 Drug Calls [1]

Traditional Demographic/Economic Crime Indicators:

(+)   POPN        Population [2]
(+)   PBLK        Percentage Black Population [2]
(+)   PAR         Percentage At-Risk Population (Ages 12 to 24) [2]
(+)   PFHH        Percentage of Female-Headed Households [2]
(-)   MDHHINC     Median Household Income [2]

Ecological Crime Indicators:

(+)   NBARS       Number (of Nuisance) Bars [3]
(+)   PCOMM       Percent (of All Land Parcels Zoned) Commercial [4]
(+)   PPHFAM      Percent (of All Housing Units That Are) Public Housing Family [5]
(-)   PPHELD      Percent (of All Housing Units That Are) Public Housing Elderly [5]

Spatially Lagged Enforcement (Drug Dealing Displacement) Indicators:

(+)   SCONDRGA    Sum of Contiguous Census Tracts' Drug Arrests
(+)   SCONIMPA    Sum of Contiguous Census Tracts' Nuisance Bar Impact Raids

[1] Pittsburgh Police Bureau Computer Records
[2] 1990 Census
[3] Nuisance Bar Task Force Records
[4] Allegheny County Property Tax File
[5] Pittsburgh Public Housing Authority Records

 

Roncek and Maier (1991) found the number of taverns and lounges in city blocks in Cleveland positively associated with index crimes. Taverns' influence on crime was compounded when taverns were located in areas with more anonymity and lower guardianship. Note that five out of Sherman et al's (1989) top ten hot spots had bars. Eight out of the top twenty-five had bars.

 

3.4 Model Specification and Data

 

We used spatial overlay of 1990 census tract boundaries on police event and land parcel data for geocoding by census tracts. We then aggregated those data to annual, tract-level counts and sums for 1990 through 1992. Our goal was to investigate the power of ecological factors over traditional criminological modeling factors on 911 drug calls, and to investigate the effect of drug displacement using spatially lagged enforcement variables.

 

Table 3.1 defines the variables used in this study and their expected direction of influence on 911 drug calls for service. The dependent variable is the total number of 911 drug nature code calls for 1991 by census tract. One advantage of this measure of drug dealing is that drug calls are largely independent of narcotics squad resource allocations and activities. Less than 20 percent of drug calls are police initiated; the remainder are citizen initiated. A disadvantage of 911 data is that citizen caller behavior and perception underlie the data. For example, a resident may call 911 complaining that a youth gang is dealing drugs on his/her street corner, but the youths may not be dealing drugs at all. Responding officers are required to confirm the validity of initial call nature codes, but by the time that they arrive on the scene, a reported group may have disbanded or otherwise changed. Nevertheless, new drug hot spots detected with pin map displays of 911 drug calls have repeatedly been verified in Pittsburgh through follow-up police observation, and 911 drug calls have been found in aggregate to be an accurate indicator of the relative levels of drug dealing activity over time and locations.

 

The first group of independent variables are population characteristics traditionally used in criminological studies. Population (POPN) is simply a scale factor, which of course should be positively related to total drug calls. There are two advantages to placing this variable on the right hand side of the model, rather than dividing it into drug calls. First, we have run Poisson regressions (not reported in this chapter), appropriate for the count data of the dependent variable, which require the current formulation. Second, industrial census tracts and tracts in the central business district which are low in residential population are outliers when using calls per capita but are not unusual in the current formulation. Small sized populations are a liability as a denominator on the left hand side, but an asset on the right hand side in regard to outliers.

 

Percentage black population (PBLK) is highly correlated with measures of poverty (see Table 3.3 below); e.g., -0.583 with the log of median household income and 0.747 with female headed households. As discussed earlier, however, Olligschlaeger and Gorr (1994) found that about half of the black census tracts are in drug markets, so that other factors are at work in determining drug markets. The criminal career literature suggests that there is a crime-prone age group, so the next variable, percentage at-risk population, ages 12 to 24 (PAR) is included. A strong indicator of low social control and family status is the percentage of female headed households (PFHH). Finally, median household income is a direct measure of wealth and poverty. The expected direction of influence of all of these variables on drug calls is obvious.

 

The next group of variables in Table 3.1 are ecological measures. Pennsylvania has a nuisance bar law and Pittsburgh has a nuisance bar task force that 1) identifies bars chronically contributing to crime in neighborhoods, 2) directs a variety of enforcement strategies to correct problems, and 3) builds cases and initiates actions to close nuisance bars. Pittsburgh has about 60 nuisance bars, and a few are closed each year (DMAP has been instrumental in compiling and portraying information to close nuisance bars). Nuisance bars are primary drug dealing hot spots, both inside the bars and outside in the immediate vicinity.

 

Commercial land uses provide opportunities for drug dealers to linger and meet buyers, without the sense of personal guardianship and control over private spaces often present in residential areas. The percentage of land parcels that have commercial land uses (PCOMM) is a rough indicator of potential drug dealing areas.

 

Public housing units intended for families with children are highly vulnerable to drug use and dealing, so we include the percentage of all housing units that are of this type (PPHFAM). Often physically and socially isolated, public housing communities are plagued by concentrations of social, economic and other limitations. By contrast, public housing designed for the elderly, measured as a percentage of all housing units (PPHELD), should be characterized by lower levels of drug dealing activity, simply because of its inhabitants: it is difficult for young dealers and users to go unnoticed in these communities.

 

Table 3.2 Descriptive Statistics (n = 171 Census Tracts)

Variable    Year    Mean        Standard Dev.   Minimum     Maximum
DRGTOT      1990    29.6        71.3            0.0         598.0
            1991    37.4        75.8            0.0         450.0
            1992    36.4        74.3            0.0         440.0
POPN        1990    2163.0      1336.0          12.0        8523.0
PBLK        1990    29.2        34.1            0.0         98.8
PAR         1990    18.9        9.8             1.9         76.7
PFHH        1990    18.4        12.5            0.0         71.8
MDHHINC     1990    21,158.0    10,353.0        4,999.0     82,553.0
NBARS       1990    0.323       0.711           0.0         4.0
            1991    0.284       0.664           0.0         4.0
            1992    0.275       0.658           0.0         3.5
PCOMM       1990    606         12.9            0.0         99.0
PPHFAM      1990    4.0         16.5            0.0         100.0
PPHELD      1990    1.2         6.9             0.0         82.1
SCONDRGA    1990    109.3       166.0           0.0         944.0
            1991    110.4       146.7           0.0         726.0
            1992    101.8       142.6           0.0         718.0
SCONIMPA    1990    20.9        33.8            0.0         152.0
            1991    17.1        31.5            0.0         166.0
            1992    12.0        22.0            0.0         102.0

 

 

The last two variables are indicators of potential drug displacement, computed as spatial lags (sums) of drug activities in contiguous census tracts. We used a rook's case, first-order contiguity matrix with connections broken across Pittsburgh's three major rivers. The variables are the annual monthly sum of drug arrests in contiguous tracts (SCONDRGA) and a similar measure for nuisance bar raids in contiguous tracts (SCONIMPA). If large enough displacement occurs, we expect increased enforcement to positively influence drug calls in a nearby tract. Alternatively, these variables may serve merely as indicators of widespread drug market areas. We believe that we will be able to distinguish these two explanations in the future by relying on time series data that track levels of activity before and after enforcement actions.
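The construction of a spatially lagged sum can be sketched as follows. This is a minimal illustration, not the procedure actually used in the DMAP system: the four tracts, their adjacency, the severed river link and the arrest counts are all hypothetical, and the function names are introduced here only for the example. Rook's case contiguity is represented simply as a table of tracts that share a border segment.

```python
def break_links(neighbors, severed_pairs):
    """Return a copy of the adjacency table with the given links removed.

    Each severed pair is dropped in both directions, mimicking connections
    broken where a river separates two otherwise contiguous tracts.
    """
    severed = {frozenset(pair) for pair in severed_pairs}
    return {
        tract: {n for n in adjacent if frozenset((tract, n)) not in severed}
        for tract, adjacent in neighbors.items()
    }


def spatial_lag_sum(values, neighbors):
    """Sum a variable over each tract's first-order neighbors."""
    return {
        tract: sum(values[n] for n in adjacent)
        for tract, adjacent in neighbors.items()
    }


# Hypothetical rook's case adjacency for four tracts arranged in a square.
rook_neighbors = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}

# Hypothetical: a river runs between tracts C and D.
adjacency = break_links(rook_neighbors, [("C", "D")])

# Hypothetical annual drug arrest counts per tract.
drug_arrests = {"A": 10, "B": 0, "C": 25, "D": 40}

lagged_arrests = spatial_lag_sum(drug_arrests, adjacency)
print(lagged_arrests)
```

Note that a tract's own arrests never enter its lagged value, which is consistent with the decision not to include enforcement in the tract of observation itself.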

 

Note that we did not include drug arrests and nuisance bar raids as variables for the tract of observation itself. Drug calls come to the attention of police and so eventually result in arrests. Likewise, drug calls generate nuisance bar raids at a later time. Increased police activity in an area may also stimulate increased willingness to report drug incidents by citizens. So there is a complex relationship between enforcement and drug calls. Future analyses will include these variables, lagged over time, in cross-sectionally pooled, monthly time series data.

 

Table 3.2 provides descriptive statistics for the variables of Table 3.1. There are 171 census tracts in the Pittsburgh DMAP system, covering approximately 60 square miles. Table 3.3 provides the corresponding simple bivariate correlations, with variables in the form used in models (some variables have been logged, or logged twice). Some multicollinearity is evident. Four pairs of independent variables have correlations exceeding 0.6 in absolute value.

 

Table 3.3 Pearson Correlation Matrix1 (with 1991 Data for LLDRGTOT, NBARS, LLSCONDRGA, and LLSCONIMPA)

                  1      2      3      4      5      6      7      8      9      10     11     12
1. LLDRGTOT       1.00   0.12   0.66   0.11   0.54   -0.48  0.47   0.16   0.26   -0.01  0.56   0.57
2. POPN           0.12   1.00   -0.16  0.17   -0.08  0.18   0.02   -0.09  -0.01  -0.04  -0.08  -0.05
3. PBLK           0.66   -0.16  1.00   0.05   0.75   -0.58  0.37   0.02   0.44   0.20   0.51   0.50
4. PAR            0.11   0.17   0.05   1.00   -0.04  -0.13  -0.09  0.24   0.08   -0.07  0.19   -0.01
5. PFHH           0.54   -0.08  0.75   -0.04  1.00   -0.65  0.15   -0.22  0.73   0.16   0.21   0.25
6. LMDHHINC       -0.48  0.18   -0.58  -0.13  -0.65  1.00   -0.21  -0.16  -0.57  -0.31  -0.32  -0.25
7. NBARS          0.47   0.02   0.37   -0.09  0.15   -0.21  1.00   0.28   -0.08  0.00   0.25   0.25
8. PCOMM          0.16   -0.09  0.02   0.24   -0.22  -0.16  0.28   1.00   -0.11  -0.05  0.22   0.11
9. PPHFAM         0.26   -0.01  0.44   0.08   0.73   -0.57  -0.08  -0.11  1.00   0.23   0.06   0.06
10. PPHELD        -0.01  -0.04  0.20   -0.07  0.16   -0.31  0.00   -0.05  0.23   1.00   0.08   0.13
11. LLSCONDRGA    0.56   -0.08  0.51   0.19   0.21   -0.32  0.25   0.22   0.06   0.08   1.00   0.62
12. LLSCONIMPA    0.57   -0.05  0.50   -0.01  0.25   -0.25  0.25   0.11   0.06   0.13   0.62   1.00

1Note that "L" at the beginning of a variable stands for "LOG" and "LL" stands for "LOGLOG".
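The statement that four pairs of independent variables are correlated above 0.6 can be checked directly against Table 3.3. The sketch below transcribes the matrix from the table and lists every pair of independent variables whose correlation exceeds 0.6 in absolute value; the dependent variable LLDRGTOT is skipped. The function name is introduced here for illustration, and variable 9 is written PPHFAM as in Table 3.2.

```python
NAMES = ["LLDRGTOT", "POPN", "PBLK", "PAR", "PFHH", "LMDHHINC",
         "NBARS", "PCOMM", "PPHFAM", "PPHELD", "LLSCONDRGA", "LLSCONIMPA"]

# Rows transcribed from Table 3.3, in the same order as NAMES.
CORR = [
    [1.00, 0.12, 0.66, 0.11, 0.54, -0.48, 0.47, 0.16, 0.26, -0.01, 0.56, 0.57],
    [0.12, 1.00, -0.16, 0.17, -0.08, 0.18, 0.02, -0.09, -0.01, -0.04, -0.08, -0.05],
    [0.66, -0.16, 1.00, 0.05, 0.75, -0.58, 0.37, 0.02, 0.44, 0.20, 0.51, 0.50],
    [0.11, 0.17, 0.05, 1.00, -0.04, -0.13, -0.09, 0.24, 0.08, -0.07, 0.19, -0.01],
    [0.54, -0.08, 0.75, -0.04, 1.00, -0.65, 0.15, -0.22, 0.73, 0.16, 0.21, 0.25],
    [-0.48, 0.18, -0.58, -0.13, -0.65, 1.00, -0.21, -0.16, -0.57, -0.31, -0.32, -0.25],
    [0.47, 0.02, 0.37, -0.09, 0.15, -0.21, 1.00, 0.28, -0.08, 0.00, 0.25, 0.25],
    [0.16, -0.09, 0.02, 0.24, -0.22, -0.16, 0.28, 1.00, -0.11, -0.05, 0.22, 0.11],
    [0.26, -0.01, 0.44, 0.08, 0.73, -0.57, -0.08, -0.11, 1.00, 0.23, 0.06, 0.06],
    [-0.01, -0.04, 0.20, -0.07, 0.16, -0.31, 0.00, -0.05, 0.23, 1.00, 0.08, 0.13],
    [0.56, -0.08, 0.51, 0.19, 0.21, -0.32, 0.25, 0.22, 0.06, 0.08, 1.00, 0.62],
    [0.57, -0.05, 0.50, -0.01, 0.25, -0.25, 0.25, 0.11, 0.06, 0.13, 0.62, 1.00],
]


def high_pairs(names, corr, threshold, skip=("LLDRGTOT",)):
    """List variable pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if names[i] in skip or names[j] in skip:
                continue
            if abs(corr[i][j]) > threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs


collinear = high_pairs(NAMES, CORR, 0.6)
for first, second, r in collinear:
    print(f"{first:12s} {second:12s} {r:6.2f}")
```

Three of the four pairs involve PFHH, which suggests that the percentage of female headed households is the variable most entangled with the other regressors.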

 

Figure 3.1 is a map showing the correlation between percent black population at the census tract level and the level of 911 drug calls for service. High concentrations of drug calls are predominantly in tracts with high percentages of black population, but not all areas with high levels of black population have drug dealing. Some other factors must also be determining drug dealing besides race. Furthermore, race drops out of models estimated in the next section when underlying factors are entered such as measures of poverty and especially spatially lagged measures of drug enforcement.

 

 

3.5 Modeling Results

 

Table 3.4 Progression of Models for 1990 (n = 171)

Dependent Variable: LOGLOG(DRGTOT)

COEFFICIENT                     MODEL 1      MODEL 2      MODEL 3      MODEL 4
CONSTANT                        0.367***     2.715***     2.020**      1.425
Traditional
  POPN (+)                      9.04E-5***   9.54E-5***   8.56E-5***   8.24E-5***
  PBLK (+)                      0.00913***   0.00736***   0.00450***   0.00227
  PAR (+)                                    0.00155      0.00250      0.00169
  PFHH (+)                                   -0.000507    0.0105*      0.0124**
  LOG(MDHHINC) (-)                           -0.236***    -0.184*      -0.150*
Ecological
  NBARS (+)                                               0.149***     0.158***
  PCOMM (+)                                               0.00514*     0.00453*
  PPHFAM (+)                                              -0.00368     -0.00255
  PPHELD (-)                                              -0.00461     -0.00460
Lagged Enforcement
  LOGLOG(SCONDRGA) (+)                                                 0.203*
  LOGLOG(SCONIMPA) (+)                                                 0.0759
Adjusted R-squared              0.484        0.517        0.613        0.642
F-Test                          80.658***    37.458***    30.855***    28.740***
Normality
  Shapiro-Wilk                  0.987        0.983        0.978        0.986
  Kiefer-Salmon                 0.985        2.729        14.641***    14.673***
Heteroscedasticity
  Breusch-Pagan                 0.920        4.852        -            -
  Koenker-Bassett               -            -            11.775       15.554
  White                         7.490        24.315       -            -
Spatial Dependence
  Moran's I                     3.751***     3.037**      2.855**      1.269
  Lagrange Multiplier (Error)   12.140***    7.090**      5.921*       0.641
  Kelejian-Robinson             15.715**     12.463       11.990       10.970
  Lagrange Multiplier (Lag)     10.164**     9.263**      6.714**      0.015

Significance Levels: * = 0.05; ** = 0.01; *** = 0.001

 

 

We used SpaceStat, written by Luc Anselin, to run multiple regression models based on the variables in Tables 3.1 and 3.2. Heteroscedasticity and non-normal errors were thought to exist in the data, given the clustering or hot spot nature of drug dealing. Hence we used the 1990 data to try several log (base e) transformations, in attempts to get the residuals into an acceptable form. Using SAS and SpaceStat, we found that double logs of the dependent and spatially lagged variables, plus logs of median household income, were sufficient to handle these issues.
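A double log transformation compresses the long right tail of hot spot counts far more strongly than a single log. The sketch below is only an illustration: the text does not state how zero counts were handled, and since Table 3.2 shows minimum values of zero, the offsets of 1 inside each logarithm are an assumption made here so that the transform is defined at zero. The function name is likewise hypothetical.

```python
import math


def loglog(x):
    """Double natural log with unit offsets: log(1 + log(1 + x)).

    ASSUMPTION: the +1 offsets are not taken from the text. They are added
    here only so that tracts with zero counts have a defined value.
    """
    return math.log1p(math.log1p(x))


# 1990 DRGTOT minimum, mean and maximum as reported in Table 3.2.
for count in (0.0, 29.6, 598.0):
    print(f"DRGTOT = {count:6.1f}   log = {math.log1p(count):6.3f}   "
          f"loglog = {loglog(count):6.3f}")
```

Under this assumed form, a tract with no calls maps to exactly zero and the transform is strictly increasing, so the ordering of tracts is preserved while the spread between ordinary tracts and extreme hot spots is sharply reduced.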

 

Tables 3.4 through 3.6 provide results of a progression of models for each year. For example, Table 3.4 is for 1990. Model 1 simply uses population and percentage black population to estimate drug calls. Both variables are highly significant with the expected signs and yield an adjusted R-squared of 0.484. The model has reasonably normal errors and no heteroscedasticity, but shows strong indications of spatial dependence in the errors and the dependent variable.
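The idea behind the Moran's I diagnostic for regression residuals can be sketched in a few lines. This is a minimal illustration of the raw statistic I = (n / S0) (e'We) / (e'e), where e is the residual vector, W the contiguity matrix and S0 the sum of all weights. It is not SpaceStat's implementation and does not reproduce the inference behind the values in Table 3.4. The four-tract chain and the residual vectors are hypothetical, and the residuals are assumed to have zero mean, as they do in a regression with a constant.

```python
def morans_i(residuals, weights):
    """Raw Moran's I for zero-mean residuals and a square weight matrix."""
    n = len(residuals)
    s0 = sum(sum(row) for row in weights)            # sum of all weights
    cross = sum(
        weights[i][j] * residuals[i] * residuals[j]  # e'We
        for i in range(n)
        for j in range(n)
    )
    squares = sum(e * e for e in residuals)          # e'e
    return (n / s0) * cross / squares


# Hypothetical binary contiguity for four tracts in a row: 0-1, 1-2, 2-3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, -1.0, -1.0]    # similar residuals sit side by side
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbors always differ in sign

print("clustered:  ", morans_i(clustered, W))
print("alternating:", morans_i(alternating, W))
```

Positive values arise when neighboring tracts have residuals of the same sign, which is the pattern a model produces when it omits a spatially clustered influence; negative values indicate that neighbors tend to have opposite signs.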

 

Table 3.5 Progression of Models for 1991 (n = 171)

Dependent Variable: LOGLOG(DRGTOT)

COEFFICIENT

MODEL 1

MODEL 2

MODEL 3

MODEL 4

CONSTANT

0.410***

2.055*

2.206*

1.040

Traditional

       

POPN (+)

8.36E-5***

8.75E-5***

8.43E-5***

7.59E-5***

PBLK (+)

0.00975***

0.00834***

0.00578***

0.00168

PAR (+)

 

0.000511

0.00113

0.00098

PFHH (+)

 

0.000318

0.0110*

0.0147***

LOG(MDHHINC) (-)

 

-0.165*

-0.158*

-0.117

Ecological

       

NBARS (+)

   

0.134**

0.140***

PCOMM (+)

   

0.00491*

0.00371

PHFAM (+)

   

-0.00466

-0.00309