Joseph Zemmels, Heike Hofmann, Susan VanderPlas
Thank you to everyone at the Roy J Carver High Resolution Microscopy Facility for collecting cartridge case scans.
Funding statement
This work was partially funded by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreement 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College and University of Nebraska, Lincoln.
Cartridge Case: metal casing containing primer, powder, and a projectile
Breech Face: back wall of gun barrel
Breech Face Impressions: markings left on cartridge case surface by the breech face during the firing process
Cartridge case recovered from crime scene vs. fired from suspect’s firearm
Place evidence under a comparison microscope for simultaneous viewing (Thompson 2017)
Assess the “agreement” of impressions on the two cartridge cases (AFTE Criteria for Identification Committee 1992)
National Research Council (2009):
“[T]he decision of a toolmark examiner remains a subjective decision based on unarticulated standards and no statistical foundation for estimation of error rates”
President’s Council of Advisors on Science and Technology (2016):
“A second - and more important - direction is (as with latent print analysis) to convert firearms analysis from a subjective method to an objective method. This would involve developing and testing image-analysis algorithms for comparing the similarity of tool marks on bullets [and cartridge cases].”
We discuss the Automatic Cartridge Evidence Scoring (ACES) algorithm to compare 3D topographical images of cartridge cases
Separated cartridge cases into quartets: 3 known-match + 1 unknown source
Match if fired from the same firearm, Non-match if fired from different firearms
218 examiners tasked with determining whether the unknown cartridge case originated from the same pistol as the known-match cartridge cases
Ground truth | Match Conclusion | Non-match Conclusion | Inconclusive Conclusion | Total |
---|---|---|---|---|
Ground-truth Match | 1,075 | 4 | 11 | 1,090 |
Ground-truth Non-match | 22 | 1,421 | 735 + 2* | 2,180 |
True Positive (%) | True Negative (%) | Overall Inconclusives (%) |
---|---|---|
99.6 | 65.2 | 22.9 |
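The percentages above can be reproduced from the counts in the preceding table. The following is a worked check only. The choice of denominators is an assumption inferred from which combination matches the reported figures: inconclusives appear to be left out of the true positive rate but kept in the true negative rate. Variable names are illustrative.

```python
# Counts from the examiner study table (rows: ground truth, columns: conclusion).
match = {"match": 1075, "non_match": 4, "inconclusive": 11}
non_match = {"match": 22, "non_match": 1421, "inconclusive": 735 + 2}

def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

# Assumption: inconclusives are excluded from the true positive denominator ...
true_positive = pct(match["match"], match["match"] + match["non_match"])
# ... but included in the true negative denominator.
true_negative = pct(non_match["non_match"], sum(non_match.values()))
# Inconclusives as a share of all 3,270 comparisons.
inconclusive = pct(match["inconclusive"] + non_match["inconclusive"],
                   sum(match.values()) + sum(non_match.values()))

print(true_positive, true_negative, inconclusive)  # 99.6 65.2 22.9
```

If inconclusives were also counted in the true positive denominator, the rate would be 1,075 / 1,090, or about 98.6%.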
3D topographic images using Cadre™ TopMatch scanner from Roy J Carver High Resolution Microscopy Facility
x3p file contains surface measurements at lateral resolution of 1.8 micrometers (“microns”) per pixel
Obtain an objective measure of similarity between two cartridge cases
Examiner takes similarity score into account during an examination
Challenging to know how/when these steps work correctly
Isolate region in scan that consistently contains breech face impressions
How do we know when a scan is adequately pre-processed?
Cross-correlation function (CCF) measures similarity between scans
Split one scan into a grid of cells that are each registered to the other scan (Song 2013)
For a matching pair, we assume that cells will agree on the same rotation & translation
Why does the algorithm “choose” a particular registration?
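To make the idea of a CCF-based registration concrete, here is a minimal sketch that scores every translation of one small surface against another and keeps the best one. It is a simplification under stated assumptions, not the procedure used in the work: shifts wrap around the edges, rotations are not searched, missing values are not handled, and the function names `standardize` and `max_ccf` are made up for illustration.

```python
from statistics import mean, pstdev

def standardize(scan):
    """Center and scale all surface values to mean 0 and standard deviation 1."""
    values = [v for row in scan for v in row]
    mu, sigma = mean(values), pstdev(values)
    return [[(v - mu) / sigma for v in row] for row in scan]

def max_ccf(reference, target):
    """Return (largest CCF value, (dy, dx)) over all circular shifts of target."""
    a, b = standardize(reference), standardize(target)
    rows, cols = len(a), len(a[0])
    best = (-2.0, (0, 0))  # any correlation is at least -1
    for dy in range(rows):
        for dx in range(cols):
            total = sum(a[(i + dy) % rows][(j + dx) % cols] * b[i][j]
                        for i in range(rows) for j in range(cols))
            best = max(best, (total / (rows * cols), (dy, dx)))
    return best

# Toy surface (made up) and a copy moved down 1 row and right 2 columns.
target = [[0, 1, 4, 2, 7],
          [3, 9, 5, 1, 0],
          [8, 2, 6, 4, 3],
          [1, 7, 0, 5, 9]]
reference = [[target[(i - 1) % 4][(j - 2) % 5] for j in range(5)] for i in range(4)]

value, shift = max_ccf(reference, target)
print(round(value, 3), shift)
```

In a cell-based comparison, the same search would be repeated for each cell of one scan, and the question becomes whether the cells settle on similar shifts.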
Measure of similarity for two cartridge cases
Maximized CCF (0.27 in example below) (Vorburger et al. 2007; Tai and Eddy 2018)
Congruent Matching Cells (11 CMCs in example below) (Song 2013)
What factors influence the final similarity score?
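The Congruent Matching Cells count can be sketched as a simple filter over per-cell registration results, loosely following the idea in Song (2013): a cell counts when its CCF is high and its registration is close to the consensus of all cells. The thresholds, the use of the median as the consensus, and the dictionary layout are assumptions chosen for illustration.

```python
from statistics import median

def count_cmcs(cells, ccf_min=0.5, shift_tol=5, theta_tol=3):
    """Count cells with a high CCF whose registration is near the median registration."""
    dx0 = median(c["dx"] for c in cells)
    dy0 = median(c["dy"] for c in cells)
    theta0 = median(c["theta"] for c in cells)
    return sum(
        1 for c in cells
        if c["ccf"] >= ccf_min
        and abs(c["dx"] - dx0) <= shift_tol
        and abs(c["dy"] - dy0) <= shift_tol
        and abs(c["theta"] - theta0) <= theta_tol
    )

# Made-up per-cell results: translations in pixels, rotation in degrees.
cells = [
    {"ccf": 0.8, "dx": 10, "dy": -4, "theta": 3},
    {"ccf": 0.7, "dx": 12, "dy": -5, "theta": 3},
    {"ccf": 0.6, "dx": 9, "dy": -3, "theta": 0},
    {"ccf": 0.3, "dx": 11, "dy": -4, "theta": 3},     # agrees, but CCF too low
    {"ccf": 0.9, "dx": -60, "dy": 45, "theta": -24},  # high CCF, but far from consensus
]
n_cmc = count_cmcs(cells)
print(n_cmc)  # 3
```

The count depends directly on the chosen thresholds, which is one reason the final score can be hard to interpret without diagnostics.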
A number of questions arise out of using comparison algorithms
How do we know when a scan is adequately pre-processed?
Why does the algorithm “choose” a particular registration?
What factors influence the final similarity score?
We wanted to create tools to address these questions
Well-constructed visuals are intuitive and persuasive
Useful for both researchers and practitioners to understand the algorithm’s behavior
Emphasizes extreme values in scan that may need to be removed during pre-processing
Allows for comparison of multiple scans on the same color scheme
Map quantiles of surface values to a divergent color scheme
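A quantile-based color mapping can be sketched in a few lines. The palette, the number of bins, and the function name `quantile_colors` are hypothetical, and the quantile method is an assumption rather than the one used for the plots.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical 7-step diverging palette (blue for low, white for middle, red for high).
PALETTE = ["#2166ac", "#67a9cf", "#d1e5f0", "#f7f7f7", "#fddbc7", "#ef8a62", "#b2182b"]

def quantile_colors(heights, palette=PALETTE):
    """Assign each surface value the palette color of its quantile bin."""
    # len(palette) bins need len(palette) - 1 cut points.
    breaks = quantiles(heights, n=len(palette), method="inclusive")
    return [palette[bisect_right(breaks, h)] for h in heights]

# Made-up surface values (microns) with one extreme value at each end.
heights = [-50.0, -3, -2, -1, 0, 1, 2, 3, 50.0]
colors = quantile_colors(heights)
print(colors[0], colors[4], colors[-1])
```

Because the bins are defined by ranks rather than raw heights, the two extreme values land in the end colors without compressing the rest of the scale. Computing the cut points once from the pooled values of several scans would be one way to put those scans on a shared scheme.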
Separate aligned scans into similarities and differences
Useful for understanding a registration
For a matching cartridge case pair…
There should be (many) more similarities than differences
The different regions should be relatively small
The surface values of the different regions should follow similar trends
Statistics are useful for justifying/predicting the behavior of the algorithm
Ratio between number of similar vs. different observations
Compare to a non-match cell comparison:
Size of the different regions
Compare to a non-match cell comparison:
Correlation between the different regions of the two scans
Compare to a non-match cell comparison:
Features:
From the full scan comparison:
Similarities vs. differences ratio
Average and standard deviation of different region sizes
Different region correlation
From cell-based comparison:
Average and standard deviation of similarities vs. differences ratios
Average and standard deviation of different region sizes
Average different region correlation
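The three kinds of features listed above can be sketched for one pair of aligned surfaces. This is an illustration under assumptions, not the published definitions: two pixels are called "different" when their heights differ by more than a hypothetical threshold `tau`, regions are 4-connected groups of different pixels, and the standard deviation is the population version.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((u - mx) * (v - my) for u, v in zip(x, y))
    den = (sum((u - mx) ** 2 for u in x) * sum((v - my) ** 2 for v in y)) ** 0.5
    return num / den

def region_sizes(mask):
    """Sizes of 4-connected regions of True pixels, found by flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen, sizes = set(), []
    for start in ((i, j) for i in range(rows) for j in range(cols)):
        if not mask[start[0]][start[1]] or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            i, j = stack.pop()
            size += 1
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= ni < rows and 0 <= nj < cols
                        and mask[ni][nj] and (ni, nj) not in seen):
                    seen.add((ni, nj))
                    stack.append((ni, nj))
        sizes.append(size)
    return sizes

def diagnostic_features(a, b, tau=1.0):
    """Similarity/difference features for two aligned scans (assumes some differences exist)."""
    mask = [[abs(u - v) > tau for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]
    sizes = region_sizes(mask)
    n_diff, n_total = sum(sizes), len(a) * len(a[0])
    diff_a = [u for ra, rm in zip(a, mask) for u, m in zip(ra, rm) if m]
    diff_b = [v for rb, rm in zip(b, mask) for v, m in zip(rb, rm) if m]
    return {"ratio": (n_total - n_diff) / n_diff,
            "size_mean": mean(sizes),
            "size_sd": pstdev(sizes),
            "diff_cor": pearson(diff_a, diff_b)}

# Made-up aligned surfaces that disagree in two small regions.
a = [[0, 0, 0, 0], [0, 5, 6, 0], [0, 0, 0, 0], [0, 0, 0, 9]]
b = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 3]]
features = diagnostic_features(a, b)
print(features)
```

The cell-based versions would apply the same function to each cell pair and then summarize the results with a mean and standard deviation.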
For a matching cartridge case pair…
Correlation should be large at the full scan and cell levels
Cells should “agree” on a particular registration
Compute summary statistics of full-scan and cell-based registration results
Features:
Correlation from full scan comparison
Mean and standard deviation of correlations from cell comparisons
Standard deviation of cell-based registration values (horizontal/vertical translations & rotation)
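These registration-based features are plain summary statistics, so a short sketch may help fix ideas. The dictionary layout, the feature names, and the use of the sample standard deviation are assumptions for illustration.

```python
from statistics import mean, stdev

def registration_features(full_scan_ccf, cells):
    """Summaries of full-scan and cell-based registration results (six features)."""
    ccfs = [c["ccf"] for c in cells]
    return {
        "full_ccf": full_scan_ccf,
        "cell_ccf_mean": mean(ccfs),
        "cell_ccf_sd": stdev(ccfs),
        # Small spreads mean the cells "agree" on one registration.
        "dx_sd": stdev([c["dx"] for c in cells]),
        "dy_sd": stdev([c["dy"] for c in cells]),
        "theta_sd": stdev([c["theta"] for c in cells]),
    }

# Made-up cell results: one set that agrees on a registration, one that does not.
agreeing = [{"ccf": 0.8, "dx": 10, "dy": -4, "theta": 3},
            {"ccf": 0.7, "dx": 12, "dy": -5, "theta": 3},
            {"ccf": 0.6, "dx": 9, "dy": -3, "theta": 0},
            {"ccf": 0.7, "dx": 11, "dy": -4, "theta": 3}]
scattered = [{"ccf": 0.3, "dx": 40, "dy": -30, "theta": 15},
             {"ccf": 0.2, "dx": -60, "dy": 45, "theta": -24},
             {"ccf": 0.4, "dx": 5, "dy": 70, "theta": 9},
             {"ccf": 0.3, "dx": -25, "dy": -55, "theta": -12}]

match_like = registration_features(0.27, agreeing)
non_match_like = registration_features(0.05, scattered)
print(round(match_like["dx_sd"], 2), round(non_match_like["dx_sd"], 2))
```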
For a matching cartridge case pair…
Cells should “agree” on a particular registration
The estimated registrations between the two comparison directions should be opposites
Features:
DBSCAN cluster indicator
Average DBSCAN cluster size
Absolute sum of density-estimated rotations
Root sum of squares of the cluster-estimated translations
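One way to read the density-based features is sketched below: cluster the cell translations in each comparison direction with DBSCAN, take a registration estimate from the largest cluster, and check whether the two directions cancel out. Everything specific here is an assumption made for illustration, including the minimal DBSCAN implementation, the `eps` and `min_pts` values, the use of the cluster median for the rotation, and the exact formulas for the last two features.

```python
from math import dist, hypot
from statistics import mean, median

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point (-1 = noise, 0, 1, ... = cluster)."""
    labels = [None] * len(points)
    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:
                queue.extend(more)
    return labels

def cluster_estimate(cells, eps=3.0, min_pts=3):
    """Registration estimated from the largest cluster of cell translations, or None."""
    labels = dbscan([(c["dx"], c["dy"]) for c in cells], eps, min_pts)
    found = [k for k in set(labels) if k >= 0]
    if not found:
        return None
    biggest = max(found, key=labels.count)
    members = [c for c, k in zip(cells, labels) if k == biggest]
    return {"size": len(members),
            "dx": mean([c["dx"] for c in members]),
            "dy": mean([c["dy"] for c in members]),
            "theta": median([c["theta"] for c in members])}

def density_features(a_to_b, b_to_a):
    """Cluster indicator, mean cluster size, and how well the two directions cancel."""
    fwd, rev = cluster_estimate(a_to_b), cluster_estimate(b_to_a)
    if fwd is None or rev is None:
        return {"cluster_found": False}
    return {"cluster_found": True,
            "mean_cluster_size": mean([fwd["size"], rev["size"]]),
            "theta_abs_sum": abs(fwd["theta"] + rev["theta"]),
            "translation_rss": hypot(fwd["dx"] + rev["dx"], fwd["dy"] + rev["dy"])}

def cell(dx, dy, theta):
    return {"dx": dx, "dy": dy, "theta": theta}

# Made-up cell registrations for the two comparison directions.
a_to_b = [cell(10, -4, 3), cell(11, -5, 3), cell(9, -4, 3), cell(10, -3, 0),
          cell(-60, 45, -24), cell(40, -30, 15)]
b_to_a = [cell(-10, 4, -3), cell(-11, 4, -3), cell(-9, 5, -3), cell(-10, 3, -3),
          cell(70, 10, 9)]
features = density_features(a_to_b, b_to_a)
print(features)
```

For a matching pair, the last two values should be close to zero because the estimated registrations in the two directions are opposites.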
Compute 19 features for each pairwise comparison
Use 510 cartridge cases from Baldwin et al. (2014) to fit a logistic regression classifier
Train the logistic regression classifier using 21,945 pairwise comparisons from 210 scans
Explore two optimization criteria:
Model that maximizes the overall accuracy
Model that balances true positive and true negative rates
Test model on 44,850 pairwise comparisons from 300 scans
Compute true positive and true negative rates for each model
Consider distributions of similarity scores for truly matching and non-matching pairs
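The pair counts follow from comparing every scan with every other scan once, and the two optimization criteria can be illustrated as two ways of picking a cutoff on the similarity score. The sketch below is an assumption about how such criteria could be applied, not a description of the fitting procedure used here; the scores, labels, and function names are made up.

```python
from math import comb

# Every unordered pair of scans is one comparison.
train_pairs, test_pairs = comb(210, 2), comb(300, 2)  # 21945 and 44850

def rates(scores, labels, threshold):
    """True positive rate, true negative rate, accuracy when score >= threshold means 'match'."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(labels)

def choose_thresholds(scores, labels):
    """Return (cutoff maximizing accuracy, cutoff balancing TP and TN rates)."""
    candidates = sorted(set(scores))
    def imbalance(t):
        tpr, tnr, _ = rates(scores, labels, t)
        return abs(tpr - tnr)
    max_accuracy = max(candidates, key=lambda t: rates(scores, labels, t)[2])
    balanced = min(candidates, key=imbalance)
    return max_accuracy, balanced

# Toy similarity scores: 4 matching pairs followed by 8 non-matching pairs.
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
labels = [True] * 4 + [False] * 8
max_accuracy, balanced = choose_thresholds(scores, labels)
print(max_accuracy, balanced)  # 0.7 0.4
```

With far more non-matching than matching pairs, the accuracy-maximizing cutoff tends to favor the true negative rate, which is why a balanced criterion is worth considering alongside it.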
Source | True Pos. (%) | True Neg. (%) | Overall Inconcl. (%) | Overall Acc. (%) |
---|---|---|---|---|
ACES, Min. Error | 92.3 | 99.9 | 0.0 | 99.4 |
ACES, Balanced TP/TN | 95.7 | 98.1 | 0.0 | 97.9 |
Ames I | 99.6 | 65.2 | 22.9 | |
We consider classification accuracy as a means of selecting/comparing models.
In practice, the examiner would use the similarity score as part of their examination.
Automatic comparison algorithms are useful for obtaining numerical measures of similarity for two pieces of evidence
Visual diagnostics help explain the inner mechanisms of comparison algorithms
Our visual diagnostic tools aid in understanding each step of a cartridge case comparison algorithm
The Automatic Cartridge Evidence Scoring (ACES) algorithm shows promise at measuring the similarity between cartridge cases
Develop free, open source software to implement visual diagnostics & ACES
We train our model on 10 firearms, all with the same make/model and ammunition
Need additional “stress tests” (different ammunition/firearms, degradation, etc.)