Generating "data" and a Hole to Put it in
By: S. Rowan Wolf, Ph.D., Uncommon Thought Journal
June 5, 2004. This work is under a fair use Creative Commons License.

It seems that a big part of the "war on terror" is a data war. Various agencies of the US government are gathering phenomenal amounts of information on individuals, from the interlinked "crime" records program MATRIX to a huge expansion of the US visitors database. How much data is being gathered? Enough that the National Security Agency is building a data storage facility in Colorado that will take in the equivalent of a Library of Congress every day.

According to a newly released GAO report, Data Mining: Federal Efforts Cover a Wide Range of Uses (May 2004), there are 199 data mining efforts on the books, 131 of which are already in use (Government data-mining lives on, Cnet, 6/01/04).

One such massive data mining project was awarded to Accenture (see the UTJ article "Accenture. Doesn't That Name Sound Familiar?"), which received a $10 billion contract for border security. This contract is supposed to extend to all land entry points into the US. Accenture is using twenty-nine subcontractors, including AT&T, KBR, and Dell Computer (Cnet article).

So copious amounts of data are being gathered. The Colorado facility will store the equivalent of 10 terabytes of data every day (NSA's new Colorado data warehouse: a Library of Congress per day, UnderReported, 6/03/04). I do have to question the completeness of the GAO report's listing of data mining programs, as it omits both the renowned CAPPS II system and MATRIX. The CAPPS II program for tracking airline passengers is not even mentioned. MATRIX is discussed, but it is not included in the report's listing of projects. The report describes MATRIX as follows:

One example of a large-scale development effort launched in the wake of the September 11 attacks is the Multistate Anti-terrorism Information Exchange System, known as MATRIX. MATRIX, currently used in five states, (Connecticut, Florida, Michigan, Ohio, and Pennsylvania) provides the capability to store, analyze, and exchange sensitive terrorism-related and other criminal intelligence data among agencies within a state, among states, and between state and federal agencies. Information in MATRIX databases includes criminal history records, driver’s license data, vehicle registration records, incarceration records, and digitized photographs. Public awareness of MATRIX and of similar large-scale data mining or data mining-like projects has led to concerns about the government’s use of data mining to conduct a mass “dataveillance”—a surveillance of large groups of people—to sift through vast amounts of personally identifying data to find individuals who might fit a terrorist profile.


There are certainly some drawbacks to living in the information age. With the amount of data being collected by various government agencies, one must wonder how big the Colorado facility is, and marvel at the computer equipment needed just to store such data, much less access it. My mind boggles, but it would seem that at least one supercomputer would be required (and probably more than one in fairly short order). It wasn't mentioned whether supercomputers were on the NSA's shopping list. As a researcher, I have to wonder how anyone could ever make sense of that amount of data. As a researcher, I also know that with that many bits and pieces of people's lives, virtually any scenario can be made plausible.

I have included a selective listing of programs from the GAO report below. All descriptions are direct quotes.

From the Defense Intelligence Agency:
Insight Smart Discovery: Will be a data mining knowledge discovery tool to work against unstructured text. Will categorize nouns (names, locations, events) and present information in images. (planned)

Verity K2 Enterprise: Mines data from the intelligence community and Internet searches to identify foreign terrorists or U.S. citizens connected to foreign terrorism activities. (operating)

PATHFINDER: Is a data mining tool developed for analysts that provides the ability to analyze government and private sector databases rapidly. It can compare and search multiple large databases quickly. (operating)

Autonomy: Is a large search engine tool that is used to search hundreds of thousands of word documents. Is used for the organization and knowledge discovery of intelligence. (operating)

Air Force
Integrated Space Warfare Center (SWC) Information System: Will be an internal database containing information on all development/execution activities within the SWC. Will be used by all management and analyst personnel to track and align the center’s activities to warfighter needs, report on execution status, financial status, schedule status, and performance measurements. (planned)

Genomic and Proteomic Results Analysis: Analyzes National Institutes of Health’s genetic data. (operational)

Computer Network Defense System: Evaluates network activities to create rules for intrusion detection system signature sets. (operating)

Modus Operandi Database: Is an investigative tool used to identify and track trends in criminal behavior. It links characteristics of crimes and provides details on crime scenes and other crime factors. (operating)

Department of Education
Citizenship of PLUS Loan Borrowers— National Student Loan Data Systems: Looks for issues regarding citizenship among its PLUS loan borrowers. Flags records based on selected criteria and requests additional information from schools. (operating)

Foreign Schools Initiatives National Student Loan Data System/Central Processing: Is a proactive investigation effort that looks at whether financial aid was granted individuals attending foreign institutions during periods of nonenrollment. (operating)

Title IV Applicant— Death Database Match: Compares Department of Education data with the Social Security Administration’s death database to detect fraud or criminal activity. (operating)

OIG—Project Strikeback: Compares Department of Education and Federal Bureau of Investigation data for anomalies. Also verifies personal identifiers. (operating)

CheckFree Software/Purchase Card Program: Takes monthly billing information from the Bank of America to create reports on purchases, purchase quantity, and frequency of purchases. Data are mined for instances of fraud or abuse. (operating and links to private sector data)

Grant Administration and Payment System: Assists in managing grant activities and aids in detecting instances of fraud or abuse in grant activities. (operating and links to private sector data)

Office of the Inspector General (OIG) Projects: Tumbleweed/Snowball: Is part of an OIG investigation to determine potential fraud of financial aid grants primarily in New Hampshire. (operational)

Department of Energy (DOE)
Counterintelligence Automated Investigative Management System (CI-AIMS): Is an investigative management system used by Department of Energy (DOE) field sites to track investigative cases on individuals or countries that threaten DOE assets. Information stored in this database is also used to support federal and state law enforcement agencies in support of national security. (operational)

Autonomy: Will be used to mine a myriad intelligence-related databases within the intelligence community to uncover criminal or terrorist activities relating to DOE assets. (planned)

Counterintelligence Analytical Research Data System (CARDS): Is used to log briefings and debriefings given to DOE employees who travel to foreign countries or interact with foreign visitors to DOE facilities. Data are mined to identify potential threats to DOE assets. (operational)

Centers for Disease Control and Prevention
BioSense: Enhances the nation’s capability to rapidly detect bioterrorism events. (operational)

Border and Transportation Security Directorate
Incident Data Mart: Will look through incident logs for patterns of events. An incident is an event involving a law enforcement or government agency for which a log was created (e.g., traffic ticket, drug arrest, or firearm possession). The system may look at crimes in a particular geographic location, particular types of arrests, or any type of unusual activity. (planned - uses private sector data)

Case Management Data Mart: Assists in managing law enforcement cases, including Customs cases. Reviews case loads, status, and relationships among cases. (operational and uses private data)

Information Analysis and Infrastructure Protection Directorate
Analyst Notebook I2: Correlates events and people to specific information (operational and uses private data)

Automatic Message Handling System (Verity): Automatically takes messages from external agencies and routes them to appropriate recipients - detecting terrorist activity (planned)

Secret Service
Criminal Investigation Division Data Mining: Mines data in suspicious activity reports received from banks to find commonalities in data to assist in strategically allocating resources. (operational)

DoJ
Drug/Financial Fusion Center: Will contain data from, and be used by, Organized Crime and Drug Enforcement Task Force agencies. The system will permit the collection and cross case analysis of all drug and related financial investigative data. (planned and uses private data)

DEA
Statistical Management Analysis and Reporting Tool System (SMARTS)/SPSS: Is a query analysis and reporting tool that pulls data from many systems. It allows for statistical analyses of drug cases Drug Enforcement Administration’s statistical reporting. (operational)

TOLLS: Is a database of telephone calls from court ordered and approved wiretaps and Title III investigations. Information such as telephone numbers, time and date of calls, and call duration is captured. Data are mined for patterns to give leads in investigations of drug trafficking. (operational)

FBI
Secure Collaborative Operational Prototype Environment/Investigative Data Warehouse: Allows the FBI to search multiple data sources through one interface to uncover terrorist and criminal activities and relationships. Data sources are a combination of structured and unstructured text. (operational)

Foreign Terrorist Tracking Task Force Activity: Supports the Foreign Terrorist Tracking Task Force that seeks to prevent foreign terrorists from gaining access to the United States. Data from the Department of Homeland Security, Federal Bureau of Investigation, and public data sources are put into a data mart and mined to determine unlawful entry and to support deportations and prosecutions. (operational)

FBI Intelligence Community Data Marts: Is intended to take a subset of approved data from a data warehouse and make it available to the intelligence community. (planned)

State Department
Citibank’s Ad Hoc Reporting System: Enables purchase card managers to track trends related to the usage of credit cards by employees in purchasing supplies and services for official use. Purchase card program is worldwide, and spending patterns and purchases are monitored for potential misuse or fraud. (operational and uses private data)

IRS
Reveal: Will be used to detect financial criminal activity such as tax evasion. (planned and uses private data)