Generating "data" and a Hole to Put it in
By: S. Rowan Wolf, Ph.D., Uncommon Thought Journal
June 05, 2004
This work is under a fair use Creative Commons License
It seems that a big part of the "war on terror" is a data war. Various
agencies of the US government are gathering phenomenal amounts of
information on individuals, from the interlinked "crime" records program
MATRIX to a huge expansion of the US visitors database. How much
data is being gathered? Enough that the National Security Agency
is building a data storage facility in Colorado that will take in the
equivalent of two Libraries of Congress every two days.
According to a newly released GAO report - Data Mining: Federal Efforts Cover a Wide Range of
Uses (May 2004) - there are 199 current data mining efforts on the
books, 131 of which are already in use (Government data-mining lives on, Cnet, 6/01/04).
One such massive data mining project was awarded to Accenture (see UTJ
article Accenture. Doesn't That Name Sound Familiar?), which
received a $10 billion contract for border security. This contract is
supposed to extend to all land-based entries into the US. Accenture is
using twenty-nine subcontractors, including AT&T, KBR, and Dell
Computer (Cnet article).
So there are copious amounts of data being gathered. The Colorado
facility will take in the equivalent of 10 terabytes of data every day
(NSA's new Colorado data warehouse: a Library of Congress
per day, UnderReported, 6/03/04). I do have to question the
completeness of the GAO report's listing of data mining programs, as it
leaves out both the renowned CAPPS II system and MATRIX. The CAPPS
program for tracking airline passengers is not even mentioned. The
MATRIX system is discussed, but not included in the listing of
projects. The report's description of MATRIX is:
One example of a large-scale development effort launched in
the wake of the September 11 attacks is the Multistate Anti-terrorism
Information Exchange System, known as MATRIX. MATRIX, currently used in five states,
(Connecticut, Florida, Michigan, Ohio, and Pennsylvania) provides the
capability to store, analyze, and exchange sensitive terrorism-related
and other criminal intelligence data among agencies within a state,
among states, and between state and federal agencies. Information in
MATRIX databases includes criminal history records, driver’s license
data, vehicle registration records, incarceration records, and digitized
photographs. Public awareness of MATRIX and of similar large-scale data
mining or data mining-like projects has led to concerns about the
government’s use of data mining to conduct a mass “dataveillance” - a
surveillance of large groups of people - to sift through vast amounts of
personally identifying data to find individuals who might fit a
terrorist profile.
There are certainly some drawbacks to living in the information age.
With the amount of data being collected by various government agencies,
one must wonder how big the Colorado facility is, and what computer
equipment is necessary to even store such data - much less access it. My
mind boggles, but it would seem that at least one supercomputer would be
required (and probably more than one in fairly short order). It wasn't
mentioned whether supercomputers were on the NSA's shopping list. As a
researcher, I have to wonder how anyone would ever make sense of that
amount of data. As a researcher, I also know that with that many bits
and pieces of people's lives, virtually any scenario can be made plausible.
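To get a rough sense of the scale involved, the reported intake can simply be multiplied out. The following is only a back-of-the-envelope sketch, not a description of the facility: it assumes the 10 terabytes per day figure holds constant, that a year has 365 days, and that decimal units apply (1,000 terabytes per petabyte). The function names are made up for illustration.

```python
# Back-of-the-envelope estimate of accumulated storage.
# Assumptions (not from the GAO report or the NSA): constant intake,
# 365-day year, decimal units (1 PB = 1000 TB).

TB_PER_DAY = 10        # daily intake reported for the Colorado facility
DAYS_PER_YEAR = 365    # assumption: non-leap year
TB_PER_PB = 1000       # assumption: decimal petabytes


def accumulated_terabytes(days, tb_per_day=TB_PER_DAY):
    """Total terabytes gathered after `days` days at a constant rate."""
    return days * tb_per_day


def terabytes_to_petabytes(terabytes):
    """Convert terabytes to petabytes using decimal units."""
    return terabytes / TB_PER_PB


if __name__ == "__main__":
    for label, days in [("one week", 7),
                        ("one year", DAYS_PER_YEAR),
                        ("five years", 5 * DAYS_PER_YEAR)]:
        total_tb = accumulated_terabytes(days)
        print(f"{label}: {total_tb} TB "
              f"(about {terabytes_to_petabytes(total_tb):.2f} PB)")
```

Under these assumptions a single year comes to 3,650 terabytes, or roughly 3.65 petabytes, and the total keeps growing at the same pace for as long as the collection continues.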
I have included a selective listing of programs from the GAO report
below. All descriptions are direct quotes.
From the Defense Intelligence Agency:
Insight Smart Discovery: Will be a data mining knowledge discovery tool
to work against unstructured text. Will categorize nouns (names,
locations, events) and present information in images. (planned)
Verity K2 Enterprise: Mines data from the intelligence community and
Internet searches to identify foreign terrorists or U.S. citizens
connected to foreign terrorism activities. (operating)
PATHFINDER: Is a data mining tool developed for analysts that provides
the ability to analyze government and private sector databases rapidly.
It can compare and search multiple large databases quickly. (operating)
Autonomy: Is a large search engine tool that is used to search hundreds
of thousands of word documents. Is used for the organization and
knowledge discovery of intelligence. (operating)
Air Force
Integrated Space Warfare Center (SWC) Information System: Will be an
internal database containing information on all development/execution
activities within the SWC. Will be used by all management and analyst
personnel to track and align the center’s activities to warfighter
needs, report on execution status, financial status, schedule status,
and performance measurements. (planned)
Genomic and Proteomic Results Analysis: Analyzes National Institutes of
Health’s genetic data. (operational)
Computer Network Defense System: Evaluates network activities to create
rules for intrusion detection system signature sets. (operating)
Modus Operandi Database: Is an investigative tool used to identify and
track trends in criminal behavior. It links characteristics of crimes
and provides details on crime scenes and other crime factors. (operating)
Department of Education
Citizenship of PLUS Loan Borrowers— National Student Loan Data Systems:
Looks for issues regarding citizenship among its PLUS loan borrowers.
Flags records based on selected criteria and requests additional
information from schools. (operating)
Foreign Schools Initiatives National Student Loan Data System/Central
Processing: Is a proactive investigation effort that looks at whether
financial aid was granted individuals attending foreign institutions
during periods of nonenrollment. (operating)
Title IV Applicant— Death Database Match: Compares Department of
Education data with the Social Security Administration’s death database
to detect fraud or criminal activity. (operating)
OIG—Project Strikeback: Compares Department of Education and Federal
Bureau of Investigation data for anomalies. Also verifies personal
identifiers. (operating)
CheckFree Software/Purchase Card Program: Takes monthly billing
information from the Bank of America to create reports on purchases,
purchase quantity, and frequency of purchases. Data are mined for
instances of fraud or abuse. (operating and links to private sector data)
Grant Administration and Payment System: Assists in managing grant
activities and aids in detecting instances of fraud or abuse in grant
activities. (operating and links to private sector data)
Office of the Inspector General (OIG) Projects: Tumbleweed/ Snowball: Is
part of an OIG investigation to determine potential fraud of financial
aid grants primarily in New Hampshire. (operational)
Department of Energy
Counterintelligence Automated Investigative Management System (CI-AIMS):
Is an investigative management system used by Department of Energy (DOE)
field sites to track investigative cases on individuals or countries
that threaten DOE assets. Information stored in this database is also
used to support federal and state law enforcement agencies in support of
national security. (operational)
Autonomy: Will be used to mine a myriad intelligence-related databases
within the intelligence community to uncover criminal or terrorist
activities relating to DOE assets. (planned)
Counterintelligence Analytical Research Data System (CARDS): Is used to
log briefings and debriefings given to DOE employees who travel to
foreign countries or interact with foreign visitors to DOE facilities.
Data are mined to identify potential threats to DOE assets. (operational)
Centers for Disease Control and Prevention
BioSense: Enhances the nation’s capability to rapidly detect
bioterrorism events. (operational)
Border and Transportation Security Directorate
Incident Data Mart: Will look through incident logs for patterns of
events. An incident is an event involving a law enforcement or
government agency for which a log was created (e.g., traffic ticket,
drug arrest, or firearm possession). The system may look at crimes in a
particular geographic location, particular types of arrests, or any type
of unusual activity. (planned - uses private sector data)
Case Management Data Mart: Assists in managing law enforcement cases,
including Customs cases. Reviews case loads, status, and relationships
among cases. (operational and uses private data)
Information Analysis and Infrastructure Protection Directorate
Analyst Notebook I2: Correlates events and people to specific
information (operational and uses private data)
Automatic Message Handling System (Verity): Automatically takes messages
from external agencies and routes them to appropriate recipients -
detecting terrorist activity (planned)
Secret Service
Criminal Investigation Division Data Mining: Mines data in suspicious
activity reports received from banks to find commonalities in data to
assist in strategically allocating resources. (operational)
DoJ
Drug/Financial Fusion Center: Will contain data from, and be used by,
Organized Crime and Drug Enforcement Task Force agencies. The system
will permit the collection and cross case analysis of all drug and
related financial investigative data. (planned and uses private data)
DEA
Statistical Management Analysis and Reporting Tool System (SMARTS)/SPSS:
Is a query analysis and reporting tool that pulls data from many
systems. It allows for statistical analyses of drug cases Drug
Enforcement Administration’s statistical reporting. (operational)
TOLLS: Is a database of telephone calls from court ordered and approved
wiretaps and Title III investigations. Information such as telephone
numbers, time and date of calls, and call duration is captured. Data are
mined for patterns to give leads in investigations of drug trafficking.
(operational)
FBI
Secure Collaborative Operational Prototype Environment/ Investigative
Data Warehouse: Allows the FBI to search multiple data sources through
one interface to uncover terrorist and criminal activities and
relationships. Data sources are a combination of structured and
unstructured text. (operational)
Foreign Terrorist Tracking Task Force Activity: Supports the Foreign
Terrorist Tracking Task Force that seeks to prevent foreign terrorists
from gaining access to the United States. Data from the Department of
Homeland Security, Federal Bureau of Investigation, and public data
sources are put into a data mart and mined to determine unlawful entry
and to support deportations and prosecutions. (operational)
FBI Intelligence Community Data Marts: Is intended to take a subset of
approved data from a data warehouse and make it available to the
intelligence community. (planned)
State Department
Citibank’s Ad Hoc Reporting System: Enables purchase card managers to
track trends related to the usage of credit cards by employees in
purchasing supplies and services for official use. Purchase card program
is worldwide, and spending patterns and purchases are monitored for
potential misuse or
fraud. (operational and uses private data)
IRS
Reveal: Will be used to detect financial criminal activity such as tax
evasion. (planned and uses private data)