Cybersecurity glossary for 2022

Our glossary contains key cybersecurity terms that enable clear communication and a common understanding of cybersecurity definitions.


AI: Artificial Intelligence. In this context, the use of analytics, machine learning, email threading, deduplication and/or natural language processing to identify relevant data, reduce cost and improve efficiency.

Analytics: The term used to refer to the various technologies used to provide multiple views into the data set.

Archive: Long-term repository for the storage of records and files.

Attachment: A document or file that is connected to another document or file either externally, e.g., a document connected to an email, or embedded, e.g., an image in a word processing document.


Backup tape: Portable media used to store copies of data that are created as a precaution against the loss or damage of the original data.


Child Document: A file that is attached to another communication file; e.g., the attachment to an email or a spreadsheet embedded in a word-processing document.

Container File: A single file containing multiple documents and/or files, usually in a compressed format; e.g., ZIP, RAR, PST.
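As a minimal sketch of how a container file holds other files, the Python example below builds a small ZIP archive in memory and lists its members. The file names and the helper names (`build_sample_container`, `list_container_members`) are hypothetical and chosen only for illustration.

```python
import io
import zipfile


def build_sample_container() -> bytes:
    """Create a small ZIP container in memory (sample data, for illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("memo.txt", "Quarterly memo")
        archive.writestr("reports/q1.csv", "name,amount\nA,1\n")
    return buffer.getvalue()


def list_container_members(container_bytes: bytes) -> list[str]:
    """Return the names of the files stored inside a ZIP container."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
        # Skip directory entries so only actual files are reported.
        return [info.filename for info in archive.infolist() if not info.is_dir()]


members = list_container_members(build_sample_container())
print(members)
```

Each member would then be extracted and handled as its own document. RAR and PST containers need other tools and are not covered by this sketch.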

Culling: The process of identifying documents that will not be reviewed and/or extracted, based on general and specific requirements.


Data Assessment: A process applied to the raw data once it is loaded into a review platform, in which broad determinations are made about the reportability or commonality of documents using general or customized techniques or tools.

Data Extraction: The process of parsing data from electronic documents to identify their metadata and body contents.
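To illustrate data extraction, the sketch below parses a raw email with Python's standard `email` package and separates its metadata from its body. The sample message and the `extract_email` helper are assumptions made for this example; production tools handle many more file types and edge cases.

```python
from email import message_from_string, policy

# Hypothetical sample message, used only for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Q1 report
Date: Mon, 03 Jan 2022 09:30:00 +0000
Content-Type: text/plain; charset="utf-8"

Please find the figures below.
"""


def extract_email(raw: str) -> tuple[dict[str, str], str]:
    """Split a raw email into a metadata dictionary and its plain-text body."""
    message = message_from_string(raw, policy=policy.default)
    metadata = {
        "sender": str(message["From"]),
        "recipient": str(message["To"]),
        "subject": str(message["Subject"]),
        # The default policy parses the Date header into a datetime object.
        "date": message["Date"].datetime.isoformat(),
    }
    body_part = message.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part is not None else ""
    return metadata, body


metadata, body = extract_email(RAW_EMAIL)
print(metadata)
print(body)
```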

Data Mining: The use of set terms, custom terms, pre-built logic and knowledge gathered from the individual client, applied specifically to the data sets, to cull the data to only those documents most likely to be responsive to the specific reporting requirement(s) in the given jurisdiction.

Digital Forensics: The process of storing, analyzing, retrieving, and preserving data, typically for litigation or regulatory proceedings.

DFIR: Digital Forensics and Incident Response. It involves identifying, investigating, containing and remediating cyberattacks, and potentially testifying in related litigation or other digital investigations.

Discovery: The process of identifying, securing, and reviewing information that is potentially relevant to the matter and producing information that can be utilized as evidence in the legal process.

Document Family: All parts of a group of documents that are connected to each other for purposes of communication; e.g., an email and its attachments.


Electronic Discovery: Also written eDiscovery or e-Discovery. The process of identifying, preserving, collecting, preparing, reviewing and producing ESI in the context of a legal or investigative process.

Email: An electronic communication sent or received via a data application designed for that purpose (e.g. MS Outlook, Lotus Notes, Google Gmail).

ESI: Electronically stored information.


Filtering: The process of applying specific parameters to remove groups of documents that do not fit those parameters, in order to reduce the volume of the data set, e.g., date ranges and keywords.
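The following sketch shows one way a date-range and keyword filter could be applied to a small document set. The `Document` fields and the `filter_documents` function are hypothetical names for this example, not part of any particular review platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    sent: date
    text: str


def filter_documents(documents, start, end, keywords):
    """Keep documents dated within [start, end] that mention at least one keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        doc
        for doc in documents
        if start <= doc.sent <= end
        and any(keyword in doc.text.lower() for keyword in lowered)
    ]


# Sample data, for illustration only.
documents = [
    Document("D1", date(2022, 1, 5), "Payroll records for January"),
    Document("D2", date(2021, 6, 1), "Payroll records for June"),
    Document("D3", date(2022, 2, 1), "Lunch menu"),
]

kept = filter_documents(documents, date(2022, 1, 1), date(2022, 12, 31), ["payroll"])
print([doc.doc_id for doc in kept])
```

Here D2 falls outside the date range and D3 contains no keyword, so only D1 remains in the reduced data set.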

Forensics: The handling of ESI including collection, examination and analysis, in a manner that ensures its authenticity, so as to provide for its admission as evidence in a court of law.


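Hash: An algorithm that generates a fixed-length value from the contents of a document. Identical contents always produce the same value, and different contents almost never do, so the value is treated as unique in practice. It is referred to as a digital fingerprint and is used to authenticate documents and to identify duplicate documents.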
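As a rough illustration of hashing for de-duplication, the sketch below computes a SHA-256 fingerprint for each file and groups files that share the same value. SHA-256 is an assumption made for this example (the glossary does not name a specific algorithm), and the file names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hexadecimal string."""
    return hashlib.sha256(content).hexdigest()


def find_duplicates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by hash value and keep only groups with more than one file."""
    groups: dict[str, list[str]] = {}
    for name, content in files.items():
        groups.setdefault(fingerprint(content), []).append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}


# Sample data, for illustration only.
files = {
    "a.txt": b"Quarterly results",
    "b.txt": b"Quarterly results",
    "c.txt": b"Quarterly results (draft)",
}

duplicates = find_duplicates(files)
print(list(duplicates.values()))
```

Because the hash depends only on content, a.txt and b.txt are grouped together even though their names differ, while the slightly different c.txt gets its own value.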


Load File: A file used to import data into an eDiscovery system. It defines document parameters for imaged documents and often contains metadata for all ESI it relates to.


Media: The device used to store electronic information, e.g., hard drives, backup tapes, and DVDs.

Metadata: Often referred to as data about data, it is the information that describes the characteristics of ESI, e.g., sender, recipient, author, and date. Much of the metadata is not accessible to non-technical users.


Native Format: A file that is maintained in the format in which it was created. This format preserves metadata and details about the data that might be lost when the documents are converted to image format, e.g., pivot tables in spreadsheets.

Near-duplicate: Two or more files whose contents share at least a specified percentage of similarity. Also, the process used to identify those nearly identical files.
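A simple way to picture near-duplicate detection is a similarity score compared against a threshold. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The 0.9 threshold and the `is_near_duplicate` helper are assumptions for illustration, and commercial tools typically use other, more scalable similarity measures.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Treat two texts as near-duplicates when their similarity meets the threshold."""
    return similarity(text_a, text_b) >= threshold


# Sample texts, for illustration only.
original = "The meeting is scheduled for Monday at 10am."
revised = "The meeting is scheduled for Monday at 11am."
unrelated = "Invoice attached for your records."

print(is_near_duplicate(original, revised))
print(is_near_duplicate(original, unrelated))
```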

Normalization: Reformatting data so that it is stored in a standardized format.
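As one small example of normalization, the sketch below converts dates written in several formats into a single ISO 8601 form. The list of accepted formats is an assumption for this example. In particular, it treats slash-separated dates as month-first, which would need to be confirmed for real data.

```python
from datetime import datetime
from typing import Optional

# Formats assumed for this example. Slash dates are read as month/day/year.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(value: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next format.
    return None


raw_values = ["2022-01-03", "01/03/2022", "03.01.2022", "not a date"]
normalized = [normalize_date(value) for value in raw_values]
print(normalized)
```

Values that cannot be parsed are returned as None rather than guessed, so they can be flagged for manual review.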


OCR: Optical Character Recognition is the process of converting images of printed pages into electronic text.


Parent Document: A document to which other documents/files are attached.

Phishing: A type of social engineering where an attacker sends a fraudulent message designed to trick a person into revealing sensitive information to the attacker. There are many types of phishing scams.

Predictive Coding: A document categorization process that extrapolates the tagging decisions of an expert reviewer across a data set. It is an iterative process that increases accuracy with multiple training passes.

Processing: A series of operations that transforms and enhances raw data for further analysis, typically in a separate software solution that ingests the data, extracts text and metadata, and normalizes the data. Some systems include data indexing and de-duplication in their processing workflow.

Programmatic Extraction: The use of tools to extract sensitive information from structured data.

Programmatic Review: The use of pre-set search terms and logic, applied generally to all data sets, to cull the data to only those documents most likely to be responsive under the generally applicable reporting requirements in the given jurisdiction.


Search: The process of looking within a data set using specific criteria (a query). There are several types of searches ranging from simple keyword to concept searches that identify documents related to the query even when the query term is not present in the document.

Structured Data: Data stored in a structured format such as a database. Structured data can create challenges in eDiscovery. See Unstructured Data.


Unstructured Data: Data that is not stored in a structured format; e.g., word processing documents and presentations. See Structured Data.


Get breach assistance now.

After a cyber breach, you need a team ready to hit the ground running. We’re here for you 24/7/365. That’s our promise.

Our incident response project managers, data analytics experts, and review specialists are seasoned professionals who understand the magnitude of the situation your company is facing and the related expenses. We are here to ensure timely, accurate notification of affected parties.

CyTrex Cyber - Incident Response Service

CyTrex Cyber helps entities that need assistance managing a cybersecurity incident or that want to learn more about cyber breach management. Insurance carriers, law firms, businesses, government agencies, and educational institutions depend on us for cyber incident response support.


© 2023 CyTrex Cyber, Inc
