Validating Non-Targeted Methods for Food Authenticity: A Complete Guide for Researchers

Lucy Sanders, Dec 03, 2025

Abstract

This article provides a comprehensive overview of the validation of non-targeted methods (NTMs) for food authenticity. Aimed at researchers, scientists, and professionals in food development and regulation, it covers the foundational principles of NTMs, explores diverse methodological approaches and their real-world applications, addresses key challenges in implementation and optimization, and outlines current frameworks and considerations for rigorous method validation. By synthesizing the latest research and emerging trends, this guide serves as a critical resource for developing reliable, fit-for-purpose NTMs to combat food fraud and ensure product integrity.

Understanding Non-Targeted Methods: The Foundation of Modern Food Authenticity

Non-targeted methods (NTMs) represent a paradigm shift in analytical chemistry, moving away from the traditional "needle in a haystack" approach that focuses on predefined analytes. Instead, NTMs exploit the comprehensive analytical signature of the entire sample matrix, capturing a holistic view of its chemical composition [1]. In the specific context of food authenticity research, these methods have emerged as powerful tools for characterizing complex food systems, detecting subtle variations indicative of adulteration, verifying origin, and ensuring overall product quality [2].

The core principle of NTMs lies in their ability to perform comprehensive characterization without a priori knowledge of the sample's chemical content [3]. This is achieved through the synergistic combination of high-resolution analytical instrumentation, such as mass spectrometry (MS) or nuclear magnetic resonance (NMR) spectroscopy, with advanced chemometrics and machine learning algorithms [1]. By capturing a complete spectral or chromatographic "fingerprint," NTMs reduce the complex data into manageable variables that provide an extensive metabolite snapshot, encompassing everything from minor compounds to major constituents [2]. The resulting data-rich outputs support stricter quality control and are critical in a marketplace increasingly concerned with food provenance, integrity, and safety [2].

Key Concepts and Comparative Analysis

Foundational Terminology

Understanding the expanding set of terminologies is essential for the widespread adoption and correct application of NTMs. Key concepts include:

Non-Targeted Analysis (NTA): A theoretical concept broadly defined as the characterization of the chemical composition of any given sample without using a priori knowledge regarding the sample's chemical content [3]. It is also referred to as "non-target screening" and "untargeted screening".
Features: In data analysis, a feature is a group of associated m/z-retention time pairs (mz@RTs) that together represent the MS1 components of a single compound, for example the compound's own peak together with its isotopologue, adduct, and in-source product ion m/z peaks [3].
Wet Lab and Dry Lab Procedures: The "wet lab" procedures are all steps of the NTM through to the analytical measurements performed at the lab bench. The subsequent "dry lab" procedures apply a chemometric, statistical, or machine learning model to parse the resulting multi-dimensional dataset [1].
Reference Databases: NTMs rely on large, community-built datasets containing empirical data from authentic reference samples to define sample populations or classes [2] [1]. The robustness of these databases is critical for the reliability of any ensuing NTM.

Targeted vs. Non-Targeted Approaches

The fundamental difference between targeted and non-targeted strategies dictates their respective applications, advantages, and limitations.

Table 1: Comparison of Targeted and Non-Targeted Analytical Approaches

Aspect	Targeted Methods	Non-Targeted Methods (NTMs)
Analytical Focus	Aims at a predefined "needle in a haystack"; analysis of a well-defined set of known metabolites [2] [1].	Exploits all constituents of the "haystack"; holistic, impartial examination of complex compositions [2] [1].
Primary Output	Identification and quantification of specific, known compounds.	A unique fingerprint (e.g., NMR spectrum, chromatogram) of a food sample [2].
Typical Workflow	Comparisons with reference compounds and internal standards [2].	Multi-step procedure: metadata collection, sample prep, data acquisition, and multi-variate data analysis [2].
Main Application	Verification and quantification of known substances.	Discovery of unknown markers, sample classification, authentication, and detection of unanticipated adulterants [3].
Data Complexity	Lower; focused data easily analyzed with automated methods.	High; requires advanced data processing and modeling to parse multi-dimensional datasets [1].

Workflow and Experimental Protocol

The successful application of an NTM relies on a rigorously defined and validated workflow. The following protocol outlines the general steps for an NMR-based non-targeted method for food authenticity, which can be adapted for other spectroscopic or spectrometric platforms.

General Workflow for NMR-Based Non-Targeted Analysis

A unified workflow is essential for achieving consistent, high-quality metabolomics data that can be reproduced across different laboratories [2]. The general procedure encompasses several critical stages:

Selection of Authentic Reference Samples: This initial step is critical for ensuring the representativeness of the sample population. A sufficient number of authentic samples, accounting for biological variance (e.g., from genetics, environment, and management), must be selected to build a robust reference database and classification model [2] [4].
Sample Preparation: Depending on the food matrix, this step may involve processes such as extraction, concentration, or purification. The protocol must be optimized and standardized to minimize technical variation [2].
NMR Measurement: The analysis is performed using optimized and agreed-upon acquisition methods and conditions (e.g., pulse sequences, temperature, number of scans) to ensure spectral reproducibility and comparability even across different instruments [2].
Processing of NMR Spectra: Raw data (Free Induction Decay, FID) is processed (e.g., Fourier transformation, phasing, baseline correction) and often normalized (e.g., Constant Sum Normalization) to reduce unwanted technical variance [2].
Data Analysis and Model Building: The processed spectral data, reduced to manageable variables, is analyzed using chemometric methods (e.g., PCA, PLS-DA) to build a classification model that can authenticate an unknown sample based on its metabolic fingerprint [2].
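The normalization step in the spectral processing stage can be illustrated with a minimal sketch. It assumes each spectrum has already been reduced to a list of binned intensities; the function name, the `target` parameter, and the example values are hypothetical and not taken from the cited protocol.

```python
def constant_sum_normalize(spectrum, target=1.0):
    """Scale a binned spectrum so that its intensities sum to `target`."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [target * value / total for value in spectrum]


# Two hypothetical binned spectra of the same extract at different dilutions.
concentrated = [10.0, 30.0, 60.0]
diluted = [1.0, 3.0, 6.0]

norm_a = constant_sum_normalize(concentrated)
norm_b = constant_sum_normalize(diluted)

# After normalization the two profiles coincide: the dilution effect is gone
# and only the relative composition remains.
print(norm_a)
print(norm_b)
```

Constant sum normalization removes overall intensity differences but also forces every bin to be expressed relative to the total signal, so a very large change in one bin shifts all the others. That trade-off is one reason the choice of normalization belongs in the validated workflow rather than being decided ad hoc.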

[Diagram: integrated NMR-based non-targeted workflow, connecting the physical sample to the data-driven decision.]

Protocol: Fisher Ratio Analysis for GC-MS Data

Fisher ratio (F-ratio) analysis is a specific, supervised, non-targeted, discovery-based method used to compare chromatograms from different sample classes and identify features that best differentiate them [5]. The following is a detailed protocol for applying pixel-based F-ratio analysis to discover minute, class-distinguishing compounds in a complex matrix, such as detecting adulterants in food.

Objective: To discover non-native (e.g., adulterating) analytes in a complex food matrix (e.g., an edible oil) by comparing chromatograms of authentic and suspect samples.

Principles: The F-ratio is defined as the ratio of class-to-class variance (\( \sigma_{cl}^2 \)) to the sum of within-class variances (\( \sigma_{err}^2 \)) [5]. It is calculated as:

\[ \text{Fisher ratio} = \frac{\sigma_{cl}^2}{\sigma_{err}^2} \]

A high F-ratio indicates a feature (chromatographic peak) with large variation between sample classes relative to the variation within each class, marking it as a strong candidate for a class-distinguishing compound.

Experimental Steps:

Sample Preparation and Data Acquisition:
- Prepare samples from two classes (e.g., Class A: authentic oil, Class B: potentially adulterated oil). Include a sufficient number of replicates per class to reliably estimate within-class variance.
- Analyze all samples using GC-MS under consistent, optimized chromatographic conditions to generate the raw data files.
Data Pre-processing and Alignment:
- Convert raw data files into a standard format.
- Use appropriate software to perform retention time alignment on the total ion chromatograms to correct for minor shifts in retention time, which is crucial for minimizing false positives [5].
Pixel-Based F-Ratio Calculation:
- Export the fully aligned, three-dimensional (retention time × m/z × intensity) data.
- Using a custom script or software, calculate the F-ratio for every data point (pixel) in the retention time and m/z plane.
- This involves, for each pixel, calculating the variance between the mean intensities of Class A and Class B, and dividing by the sum of the variances within Class A and within Class B.
Generate and Filter the Hit List:
- Organize all calculated F-ratios into a ranked "hit list," with the highest F-ratio values at the top.
- To establish a statistical cutoff and avoid false positives, perform a null distribution analysis. This involves recalculating F-ratios multiple times with randomly permuted class labels to create a null F-ratio distribution. The 99th percentile of this null distribution can be used as a significance threshold [5].
Hit Identification:
- For the significant hits (peaks) above the threshold, examine the mass spectrum associated with the retention time and m/z of the feature.
- Use mass spectral libraries (e.g., NIST) to identify the compound responsible for the discriminating signal.
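The F-ratio calculation, hit list, and permutation-based cutoff described in the steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation from [5]: each aligned run is represented as a flat list of pixel intensities, the data are synthetic (six replicates per class, with an artificial spike at pixel 2), and all function names are hypothetical.

```python
import random
from statistics import mean, variance


def fisher_ratio(class_a, class_b):
    """Variance of the two class means divided by the summed within-class variances."""
    between = variance([mean(class_a), mean(class_b)])
    within = variance(class_a) + variance(class_b)
    return between / within if within > 0 else float("inf")


def pixel_f_ratios(runs_a, runs_b):
    """One F-ratio per pixel; each run is a flat list of aligned pixel intensities."""
    n_pixels = len(runs_a[0])
    return [
        fisher_ratio([run[i] for run in runs_a], [run[i] for run in runs_b])
        for i in range(n_pixels)
    ]


def null_threshold(runs_a, runs_b, n_permutations=200, percentile=0.99, seed=0):
    """Cutoff taken from F-ratios recomputed with randomly permuted class labels."""
    rng = random.Random(seed)
    pooled = runs_a + runs_b
    null_values = []
    for _ in range(n_permutations):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # permuting runs is equivalent to permuting labels
        fake_a, fake_b = shuffled[: len(runs_a)], shuffled[len(runs_a):]
        null_values.extend(pixel_f_ratios(fake_a, fake_b))
    null_values.sort()
    return null_values[int(percentile * (len(null_values) - 1))]


# Synthetic data: five pixels of noisy baseline, with pixel 2 elevated in suspect runs.
rng = random.Random(42)


def make_run(spike):
    run = [100.0 + rng.gauss(0, 1) for _ in range(5)]
    run[2] += spike
    return run


authentic = [make_run(0.0) for _ in range(6)]
suspect = [make_run(20.0) for _ in range(6)]

ratios = pixel_f_ratios(authentic, suspect)
hit_list = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
threshold = null_threshold(authentic, suspect)
significant = [i for i in hit_list if ratios[i] > threshold]
print("top hit:", hit_list[0], "threshold:", round(threshold, 3))
```

With very few replicates per class, a noticeable share of the random permutations reproduces the true class split, which inflates the null distribution and pushes the cutoff toward the real signal. This is one practical reason the protocol asks for a sufficient number of replicates per class.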

Advantages: Pixel-based F-ratio analysis has been shown to be more sensitive than peak table- or tile-based approaches, capable of discovering spiked analytes at low concentrations in a complex gasoline background [5].

The Scientist's Toolkit: Essential Research Reagent Solutions

The implementation of NTMs requires specific reagents, materials, and tools to ensure data quality and reproducibility. The following table details key components for a typical NTM workflow in food authenticity.

Table 2: Essential Reagents and Materials for Non-Targeted Methods

Item Name	Function/Application	Example/Specification
Deuterated Solvent	Provides a field-frequency lock and internal reference for NMR spectroscopy; essential for stable instrument operation.	Deuterium oxide (D₂O) for polar extracts; chloroform-d (CDCl₃) for non-polar extracts.
Internal Standard	Serves as a reference for chemical shift (NMR) or retention time (MS), and for quantification.	4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) for NMR; stable isotope-labeled compounds for MS.
Chemical Shift Reference	Provides a precise, internal point for chemical shift calibration in NMR spectra.	DSS or Tetramethylsilane (TMS) added to the sample at a known concentration [2].
Quality Control (QC) Pool Sample	Monitors instrument stability and performance over a sequence of analyses.	A pooled sample created by combining small aliquots of all test samples, analyzed intermittently throughout the run.
Authentic Reference Materials	Used to build and validate the classification model; defines the "authentic" class.	Certified, well-characterized samples with verified provenance, covering expected biological variance [4].
Proficiency Testing (PT) Schemes	Provides an external check of laboratory performance and method validity.	Schemes available via organizations like EPTIS, which allow labs to compare their results with others [6].

Validation and Standardization Framework

For NTMs to transition from academic research to reliable tools for routine testing and official controls, rigorous validation is imperative [1]. Unlike targeted methods, validating NTMs presents unique challenges as it is the entire analytical process—from sample preparation to the final classification result—that must be validated to be considered fit-for-purpose [1].

Key performance characteristics to be assessed include:

Discrimination Ability: The method's capability to correctly differentiate between defined classes (e.g., authentic vs. adulterated, or different geographical origins).
Specificity: Demonstrated by the consistent detection of the unique pattern or fingerprint that characterizes a class.
Transferability/Robustness: The reproducibility of results across different instruments, laboratories, and over time. The high reproducibility of NMR spectroscopy, for instance, makes it particularly suitable for building collaborative databases [2].
Sensitivity and False Positive/Negative Rates: While more challenging to define than in targeted analysis, these can be estimated using a sufficient number of test samples with known class memberships.
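For the last characteristic in the list above, a minimal sketch shows how sensitivity and the false positive and false negative rates might be estimated from test samples with known class membership. Treating "adulterated" as the positive class is an assumption made for illustration, as are the function name and the example labels.

```python
def classification_rates(true_labels, predicted_labels, positive="adulterated"):
    """Estimate error rates from paired true and predicted class labels."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn),          # adulterated samples correctly flagged
        "false_negative_rate": fn / (tp + fn),  # adulterated samples missed
        "false_positive_rate": fp / (fp + tn),  # authentic samples wrongly flagged
        "accuracy": (tp + tn) / len(true_labels),
    }


# Hypothetical test set: 4 adulterated and 6 authentic samples.
truth = ["adulterated"] * 4 + ["authentic"] * 6
predicted = ["adulterated"] * 3 + ["authentic"] + ["authentic"] * 5 + ["adulterated"]

rates = classification_rates(truth, predicted)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Because each rate is a proportion estimated from a finite test set, its uncertainty depends directly on how many samples of each class are available, which is why the text stresses a sufficient number of test samples.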

International efforts are ongoing to develop harmonized guidelines for NTM validation. The AOAC International has developed Standard Method Performance Requirements (SMPRs) for non-targeted testing of various foods, including extra virgin olive oil, honey, and milk, providing minimum performance criteria that methods must fulfil [6]. Furthermore, the Benchmarking and Publications for Non-Targeted Analysis Working Group (BP4NTA) was formed to establish consensus definitions, share best practices, and improve the transparency and reproducibility of peer-reviewed NTA studies [3]. These initiatives are critical for fostering global consumer confidence in the authenticity and quality of the food supply chain [2].

Non-targeted methods represent a paradigm shift in analytical food testing. Unlike traditional targeted analyses that focus on predefined "needles in a haystack," NTMs exploit information from all measurable constituents within a sample, creating comprehensive analytical fingerprints that can be mined for patterns indicative of authenticity or fraud [7]. In food authenticity research, this approach is particularly valuable for detecting sophisticated adulteration, mislabeling, and substitution that might evade conventional targeted analyses [8]. The power of NTMs stems from the seamless integration of two complementary domains: the wet lab, where physical samples are processed and measured using advanced analytical platforms, and the dry lab, where complex data undergoes computational processing and statistical modeling to extract meaningful biological or chemical insights [8]. This integration enables researchers to distinguish closely related food products, such as spelt and wheat, with high reliability even when analyzing processed goods like flour and bread where conventional morphological identification fails [8]. The following sections detail the core components, workflows, and validation considerations for implementing integrated NTMs in food authenticity research.

Core Components of an Integrated NTM Framework

Wet Lab Components

The wet lab component is responsible for converting physical samples into standardized, high-quality analytical data. This process begins with sample preparation, which must be robust and reproducible to minimize technical variation that could interfere with biological or chemical signatures. For grain authentication, as demonstrated in spelt/wheat discrimination, this typically involves homogenization and standardized extraction protocols to ensure consistent recovery of analytes across sample batches [8].

The cornerstone of many modern NTMs is analytical fingerprinting using platforms such as Liquid Chromatography coupled to High-Resolution Mass Spectrometry (LC-HRMS) [8]. This platform generates highly resolved spectra that capture subtle differences in food composition arising from genetic factors, growing conditions, or processing methods. The resulting fingerprints comprise data points across mass/charge (m/z) ratios and retention times (Rt), creating a rich, multidimensional dataset for subsequent pattern recognition [8]. The resolution and accuracy of the mass analyzer (e.g., Time-of-Flight or TOF) are critical, as they determine the ability to detect minute but consistent differences between authentic and adulterated products.

Dry Lab Components

The dry lab component transforms raw analytical data into actionable classification models. Data preprocessing is an essential first step, potentially including normalization, peak alignment, and feature extraction to reduce instrumental noise and enhance biological signals [8]. For LC-HRMS data, this often involves creating a standardized mass window (such as in SWATH acquisition) and organizing peak intensity values across different dimensions [8].

Statistical modeling and machine learning form the analytical core of the dry lab. Convolutional Neural Networks (CNNs) have shown remarkable efficacy for classifying complex spectral data, automatically learning discriminative patterns without requiring manual feature selection [8]. These models can be developed using a nested cross-validation (NCV) approach to ensure robustness and prevent overfitting, particularly important when dealing with limited sample sizes [8]. The output of these models can be quantified using novel metrics such as the D score, which provides a quantitative measure of classification confidence and enables comparison across different models or experimental conditions [8].

Table 1: Core Components of an Integrated NTM for Food Authenticity

Component	Sub-Process	Key Techniques	Output
Wet Lab	Sample Preparation	Homogenization, extraction	Standardized analyte mixture
	Analytical Fingerprinting	LC-HRMS, SWATH acquisition	2D spectra (m/z vs Rt with intensities)
Dry Lab	Data Preprocessing	Normalization, peak alignment, feature extraction	Cleaned, standardized feature set
	Statistical Modeling	Convolutional Neural Networks (CNN), Nested Cross-Validation	Trained classification model, D scores

Experimental Workflow and Protocol

Integrated NTM Workflow

[Diagram: complete integrated workflow for an NTM in food authenticity research, from sample receipt to final classification.]

Detailed Wet Lab Protocol: LC-HRMS Fingerprinting for Grain Authentication

Sample Preparation:

Homogenization: Begin with 100 g of grain sample (spelt or wheat cultivars). Mill to a consistent particle size using a standardized milling procedure.
Extraction: Weigh 50 mg of homogenized material into extraction tubes. Add 1 mL of extraction solvent (e.g., methanol:water, 80:20 v/v) and vortex for 30 seconds.
Sonication: Sonicate samples for 15 minutes at room temperature, then centrifuge at 14,000 × g for 10 minutes.
Filtration: Transfer supernatant to LC vials through 0.2 μm PTFE filters.

LC-HRMS Analysis:

Chromatographic Separation: Inject 5 μL of extract onto a reversed-phase C18 column (2.1 × 100 mm, 1.7 μm) maintained at 40 °C. Use a binary gradient with mobile phase A (0.1% formic acid in water) and B (0.1% formic acid in acetonitrile) at a flow rate of 0.3 mL/min.
Mass Spectrometric Detection: Operate the HRMS in positive electrospray ionization mode with data-independent acquisition (SWATH). Set the mass range to m/z 50-1200 with resolution >30,000. Use a collision energy spread of 20-50 eV for fragmentation.
Quality Control: Inject a pooled quality control (QC) sample every 10 injections to monitor system stability.
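The quality control step can be made concrete with a small sketch that builds an injection sequence with a pooled QC after every ten samples. Starting and ending the run with a QC injection is an added assumption (a common convention, not something the protocol states), and the function name and sample identifiers are hypothetical.

```python
def build_injection_sequence(sample_ids, qc_interval=10, qc_label="QC_pool"):
    """Interleave a pooled QC injection after every `qc_interval` samples."""
    sequence = [qc_label]  # assumption: open the run with a QC injection
    for position, sample_id in enumerate(sample_ids, start=1):
        sequence.append(sample_id)
        if position % qc_interval == 0:
            sequence.append(qc_label)
    if sequence[-1] != qc_label:
        sequence.append(qc_label)  # assumption: close the run with a QC injection
    return sequence


# Hypothetical batch of 25 grain extracts.
samples = [f"S{i:02d}" for i in range(1, 26)]
sequence = build_injection_sequence(samples)
print(len(sequence), "injections,", sequence.count("QC_pool"), "of them QC")
```

In practice the sample order would usually also be randomized across classes before the QC injections are interleaved, so that instrument drift is not confounded with class membership.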

Detailed Dry Lab Protocol: CNN Model Development for Spelt/Wheat Discrimination

Data Preprocessing:

Peak Alignment: Use reference-based alignment to correct for retention time drift across samples.
Feature Detection: Extract ion features with intensity >1000 counts and presence in at least 80% of samples in at least one group.
Data Normalization: Apply probabilistic quotient normalization to correct for overall intensity differences.
Data Augmentation: For small datasets, apply minor random variations to expand training set and improve model robustness.
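Two of the preprocessing steps above, the presence filter and probabilistic quotient normalization (PQN), can be sketched as follows. This is a simplified illustration: the reference spectrum is taken as the feature-wise median over all samples, the usual preliminary total-intensity normalization is omitted, and the function names and example intensities are hypothetical.

```python
from statistics import median


def passes_presence_filter(intensities_by_group, min_intensity=1000.0, min_fraction=0.8):
    """Keep a feature if it exceeds min_intensity in >= min_fraction of any one group."""
    for values in intensities_by_group.values():
        detected = sum(1 for value in values if value > min_intensity)
        if detected / len(values) >= min_fraction:
            return True
    return False


def pqn_normalize(samples):
    """Probabilistic quotient normalization of equal-length intensity lists."""
    n_features = len(samples[0])
    # Reference spectrum: feature-wise median across all samples.
    reference = [median(sample[j] for sample in samples) for j in range(n_features)]
    normalized = []
    for sample in samples:
        quotients = [
            sample[j] / reference[j] for j in range(n_features) if reference[j] > 0
        ]
        dilution = median(quotients)  # most probable overall dilution factor
        normalized.append([value / dilution for value in sample])
    return normalized


# A feature detected in 4 of 5 spelt samples passes; a sporadic one does not.
kept = passes_presence_filter(
    {"spelt": [1500, 1200, 1100, 1300, 900], "wheat": [0, 0, 200, 0, 0]}
)
dropped = passes_presence_filter(
    {"spelt": [1500, 1200, 500, 800, 900], "wheat": [0, 0, 200, 0, 0]}
)

# Three hypothetical samples with the same profile at different overall intensity.
samples = [
    [2000.0, 4000.0, 6000.0],
    [1000.0, 2000.0, 3000.0],
    [4000.0, 8000.0, 12000.0],
]
normalized = pqn_normalize(samples)
print(kept, dropped)
print(normalized)
```

Using the median quotient rather than the total intensity makes the estimated dilution factor less sensitive to a few features that change strongly between classes.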

CNN Architecture and Training:

Input Layer: Format preprocessed 2D spectral data (m/z × retention time) as input images.
Network Architecture: Implement a CNN with:
- Two convolutional layers with 32 and 64 filters respectively
- Max-pooling layers (2×2) after each convolutional layer
- Fully connected layer with 128 units
- Output layer with softmax activation for binary classification
Training Parameters: Use nested cross-validation with 5 outer folds and 3 inner folds. Train for 100 epochs with early stopping (patience=10). Use Adam optimizer with learning rate of 0.001.
Validation: Evaluate model performance on external validation set including artificially mixed spectra and processed goods.
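The nested cross-validation scheme with 5 outer and 3 inner folds can be sketched independently of the CNN. In the illustration below a small k-nearest-neighbour classifier stands in for the network so that the example runs with the standard library only; the synthetic two-dimensional data, the candidate values of k, and all function names are assumptions and are not part of the cited study.

```python
import random


def make_folds(indices, n_folds, rng):
    """Shuffle indices and deal them into n_folds roughly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query))
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def nested_cv(data, candidate_ks=(1, 3, 5), outer_folds=5, inner_folds=3, seed=0):
    rng = random.Random(seed)
    outer = make_folds(list(range(len(data))), outer_folds, rng)
    outer_scores = []
    for held_out in outer:
        dev_idx = [i for fold in outer if fold is not held_out for i in fold]
        inner = make_folds(dev_idx, inner_folds, rng)

        def inner_score(k):
            # Inner loop: tune k using development samples only.
            scores = []
            for val in inner:
                train_idx = [i for fold in inner if fold is not val for i in fold]
                scores.append(
                    accuracy([data[i] for i in train_idx], [data[i] for i in val], k)
                )
            return sum(scores) / len(scores)

        best_k = max(candidate_ks, key=inner_score)
        # Outer loop: score the tuned model on samples never used for tuning.
        outer_scores.append(
            accuracy([data[i] for i in dev_idx], [data[i] for i in held_out], best_k)
        )
    return outer_scores


# Synthetic, well-separated two-class data standing in for spectral fingerprints.
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "spelt") for _ in range(15)]
data += [((rng.gauss(6, 1), rng.gauss(6, 1)), "wheat") for _ in range(15)]

scores = nested_cv(data)
print("outer-fold accuracies:", scores)
```

The key property is that the held-out outer fold never influences hyperparameter selection, so the outer scores give a less optimistic performance estimate than a single cross-validation loop used for both tuning and evaluation.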

Table 2: Validation Parameters for NTM in Food Authenticity

Performance Characteristic	Assessment Method	Target Value
Repeatability	Intra-day precision (n=5)	CV < 15%
Reproducibility	Inter-day/laboratory precision	CV < 20%
Recovery Yield	Spiked samples	67-131%
Inter-laboratory Agreement	Multiple laboratory comparison	>80%
Classification Accuracy	External validation set	>90%
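The repeatability and recovery criteria in the table above reduce to two short calculations, sketched here with hypothetical QC intensities and spike values. The function names are illustrative, and recovery is computed with the usual definition (measured amount minus native amount, relative to the amount spiked), which is an assumption about how the cited studies calculated it.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Relative standard deviation in percent."""
    return 100.0 * stdev(values) / mean(values)


def recovery_percent(measured, native, spiked_amount):
    """Share of the spiked amount that was recovered, in percent."""
    return 100.0 * (measured - native) / spiked_amount


# Hypothetical peak intensities of one feature in five intra-day QC injections.
qc_intensities = [1020.0, 980.0, 1005.0, 995.0, 1000.0]
cv = coefficient_of_variation(qc_intensities)

# Hypothetical spiking experiment: 5.0 units native, 10.0 units added, 14.2 measured.
recovery = recovery_percent(measured=14.2, native=5.0, spiked_amount=10.0)

print(f"CV = {cv:.2f}% (target < 15%)")
print(f"recovery = {recovery:.1f}% (target 67-131%)")
```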

Validation Framework for NTMs

Validation of NTMs requires specialized approaches that differ from traditional method validation. The fit-for-purpose principle guides validation, with parameters tailored to the specific authentication question [7]. Key validation parameters include:

Analytical Validation: This covers traditional parameters such as repeatability (intra-day precision) and reproducibility (inter-day and inter-laboratory precision). In spelt/wheat discrimination studies, repeatability is demonstrated by a coefficient of variation (CV) below 15% for peak intensities in quality control samples [8]. Reproducibility is demonstrated through inter-laboratory comparisons, with a target of more than 80% agreement [9] [10]. Recovery yield, assessed using spiked samples, should fall within 67-131% [9] [10].

Model Validation: For the dry lab component, robust validation requires using independent sample sets not used in model training. The nested cross-validation approach prevents overfitting and provides realistic performance estimates [8]. External validation should include challenging samples such as artificially mixed spectra, processed goods, and atypical cultivars to demonstrate real-world applicability [8].

[Diagram: validation framework for NTMs.]

Essential Research Reagent Solutions

Successful implementation of integrated NTMs requires specific research reagents and materials. The following table details essential components for the spelt/wheat discrimination protocol:

Table 3: Essential Research Reagents and Materials for NTM Food Authentication

Category	Specific Material/Reagent	Function in Protocol	Specifications
Chromatography	Reversed-phase C18 column	Separation of complex extracts	2.1 × 100mm, 1.7μm particle size
	Formic acid in water (0.1%)	Mobile phase A	LC-MS grade
	Formic acid in acetonitrile (0.1%)	Mobile phase B	LC-MS grade
Sample Preparation	Methanol (80%)	Extraction solvent	LC-MS grade
	PTFE filters (0.2μm)	Sample clarification	Sterile, non-binding
Mass Spectrometry	Reference mass solution	Mass accuracy calibration	Suitable for m/z range 50-1200
	Tuning and calibration solution	Instrument performance verification	Manufacturer-specified
Data Analysis	CNN software framework	Model development and training	Python with TensorFlow/PyTorch
	Spectral processing tools	Feature extraction and alignment	OpenMS, XCMS, or similar

Application in Food Authenticity Research

The integrated NTM approach has demonstrated particular efficacy in challenging authentication scenarios. In the spelt/wheat case study, the method successfully distinguished eleven cultivars each of spelt and wheat, achieving reliable classification even for processed goods (spelt bread and flour) and atypical cultivars not included in the original model training [8]. This capability is significant for regulatory enforcement, as German guidelines stipulate that spelt bread must contain at least 90% spelt, creating a need for accurate quantification in mixed matrices [8].

The approach also shows promise for addressing other food fraud challenges, including geographic origin verification, detection of adulterants, and verification of organic growing claims [8]. As regulatory frameworks such as EU regulations 2017/625 and 1169/2011 emphasize correct labeling and food safety, NTMs provide analytical support for compliance monitoring and enforcement actions [8].

The integration of wet lab and dry lab processes creates a powerful synergy for food authentication. The wet lab generates comprehensive, high-quality analytical data, while the dry lab extracts subtle patterns and relationships that would be undetectable through conventional analysis. This integrated framework represents a significant advancement in food authenticity research, providing a robust, flexible approach for addressing evolving food fraud challenges.

The Critical Role of Reference Databases in NTM Classification

In food authenticity research, Non-Targeted Methods (NTMs) have emerged as powerful analytical techniques for detecting food fraud and verifying product origin, quality, and safety [1]. Unlike targeted approaches that focus on predefined analytes, NTMs exploit a comprehensive "fingerprint" of the sample, combining high-resolution analytical instruments with advanced chemometrics and machine learning algorithms [1] [11]. The reliability of these methods depends critically on the quality and scope of the reference databases that underpin their statistical models and classification systems. This application note examines the pivotal role of reference databases in NTM classification, providing detailed protocols and considerations for their development and validation within food authenticity research.

Database Fundamentals for NTM Classification

Core Components and Terminology

Non-Targeted Methods consist of two fundamental components: "wet lab" procedures encompassing all steps until analytical measurements, and "dry lab" procedures involving chemometric/statistical/machine learning models that parse multi-dimensional datasets [1]. Reference databases serve as the critical bridge between these components, providing the empirical data needed to define sample populations and classes (e.g., olive oil from Italy versus Spain, wild versus farmed salmon) [1].

Database Diversity in Food Authenticity

The term "database" in NTM contexts encompasses diverse technological implementations, from cloud-based storage and management systems to local repositories [1]. These databases address different classification challenges in food authentication, including geographic origin, production methods (organic versus conventional), biological species, and processing techniques [1]. The construction of these databases must accommodate various analytical technologies, including chromatographic separation coupled with mass spectrometry, NMR, FTIR, NIR, Raman spectroscopy, and next-generation sequencing (NGS) technologies [1].

Table 1: Analytical Platforms and Their Applications in Food Authenticity NTMs

Analytical Platform	Measured Signals	Example Food Applications	Reference
GC-MS, LC-MS	Chromatograms, mass spectra	Virgin olive oil quality grading, honey geographical origin	[1] [11]
NMR Spectroscopy	Spectral fingerprints	Detection of protein hydrolysates in turkey meat	[11]
FT-NIR Spectroscopy	Near-infrared spectra	Truffle species differentiation	[11]
DART-HRMS	Mass spectra	Chestnut honey geographical origin discrimination	[11]
NGS/Metabarcoding	DNA sequences	Multi-species identification in complex products	[1]

Database Construction and Curation Protocols

Sample Collection and Preparation

The development of a robust reference database begins with comprehensive sample collection that accurately represents the natural variability within defined classes. For geographical origin authentication, this includes samples across multiple harvest years, growing regions, and processing facilities. Sample preparation for NTMs typically employs simple protocols to capture as many matrix components as possible, contrasting with targeted methods that often require complex, selective extractions [11].

Protocol 3.1: Representative Sample Collection for Food Authenticity Databases

Define classification objectives: Clearly establish the authentication goal (geographical origin, species, production method)
Establish inclusion criteria: Determine sample requirements (minimum number per class, sourcing documentation)
Collect reference materials: Obtain certified reference materials where available
Document metadata: Record comprehensive sample information including origin, date, processing details, and storage conditions
Implement quality controls: Include positive and negative controls across analytical batches

Analytical Measurement and Data Acquisition

Consistent analytical performance is fundamental to building reliable databases. The analytical methods used for database construction must be validated regarding their performance characteristics to ensure future standardization potential [1].

Protocol 3.2: Standardized Analytical Procedures for Database Building

Method validation: Perform single-laboratory validation of analytical methods before database building
Instrument calibration: Establish and document calibration procedures for all instruments
Quality control samples: Analyze QC samples at regular intervals throughout data acquisition
Standard operating procedures: Develop detailed SOPs for all analytical steps
Data formatting: Implement consistent file naming conventions and data formats

Data Processing and Quality Control Pipeline

The exponential growth of reference data necessitates robust computational pipelines for database construction and quality control [12]. As demonstrated in metagenomic classification, database quality directly impacts research conclusions, with contamination leading to spurious classifications [12].

Protocol 3.3: Database Quality Control and Curation

Sequence decontamination: Apply tools like Conterminator to identify and remove cross-contamination [12]
Low-complexity masking: Use algorithms like NCBI dustmasker to mask uninformative regions [12]
Length filtering: Remove sequences below established thresholds (e.g., <100 bases for genomic data) [12]
Taxonomic consistency: Verify alignment between sequence data and taxonomic information [12]
Version documentation: Maintain comprehensive records of database versions and build parameters
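The length filtering step of Protocol 3.3 can be sketched with a minimal FASTA reader. The 100-base threshold comes from the protocol; everything else (function names, the in-memory example records) is hypothetical, and the sketch does not reproduce the behavior of Conterminator, dustmasker, or Recentrifuge.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=100):
    """Drop reference sequences shorter than min_length bases."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


# Hypothetical reference entries: one of 120 bases (split over two lines), one of 40.
fasta_text = "\n".join(
    [
        ">ref_long",
        "ACGT" * 15,
        "TGCA" * 15,
        ">ref_short",
        "ACGT" * 10,
    ]
)

records = list(read_fasta(fasta_text.splitlines()))
kept = filter_by_length(records)
print([header for header, _ in kept])
```

Recording the threshold alongside the database version, as the last item of the protocol asks, keeps the filter reproducible when the database is rebuilt.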

Table 2: Database Quality Control Measures and Their Impacts

Quality Control Step	Tool/Method	Impact on Database Performance	Reference
Reference Decontamination	Conterminator	Reduces spurious classifications; eliminated false Plasmodium annotations in metagenomic study	[12]
Low-Complexity Masking	dustmasker	Removes uninformative regions, improves classification specificity	[12]
Length Filtering	Custom scripts (Recentrifuge)	Excludes short sequences that reduce classification accuracy	[12]
Taxonomic Validation	NCBI Taxonomy Database	Ensures consistent taxonomic assignments across database	[12]
Temporal Synchronization	Custom pipeline	Minimizes inconsistencies from asynchronous updates between sequence and taxonomy databases	[12]

Database Implementation in NTM Workflows

Classification Algorithms and Model Building

The transition from raw database to functional classification system requires appropriate chemometric approaches. Multiple studies demonstrate the effectiveness of combining spectroscopic or chromatographic data with multivariate statistics for food authentication [11].

Diagram 1: NTM Classification Workflow

Experimental Design for Method Validation

Validating NTM classification systems requires specialized approaches that differ from traditional method validation. The European Union Official Controls Regulation requires control laboratories to apply standardized methods when available, or otherwise methods validated through single-laboratory validation [1].

Protocol 4.2: Validation of NTM Classification Systems

Performance characteristics: Establish accuracy, precision, specificity, and robustness metrics
Class representation: Ensure validation samples represent every class and are independent of the training set
Statistical significance: Implement appropriate significance testing for classification rates
Cross-validation: Apply k-fold or leave-one-out cross-validation procedures
External validation: Validate model performance with independently sourced samples
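As a rough illustration of the performance-characteristics and statistical-significance steps, the sketch below derives accuracy, sensitivity, and specificity from confusion-matrix counts and attaches a Wilson score interval to the overall classification rate. The counts are invented for the example, and the Wilson interval is one reasonable choice of significance measure, not one prescribed by the source.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical external validation set of 40 samples
# (positive class = adulterated, negative class = authentic).
tp, fn, tn, fp = 18, 2, 19, 1
metrics = classification_metrics(tp, fp, tn, fn)
low, high = wilson_interval(tp + tn, tp + fp + tn + fn)

print(metrics)
print(f"95% interval for accuracy: {low:.3f} to {high:.3f}")
```

With small validation sets the interval is wide, which is a useful reminder that a single headline accuracy figure says little without the number of independent samples behind it.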

Case Studies in Food Authenticity

Geographical Origin Authentication

Multiple studies demonstrate the effectiveness of NTMs with comprehensive reference databases for determining geographical origin. Kim et al. used hydrophilic and lipophilic metabolite profiling via GC-MS with OPLS-DA to differentiate perilla and sesame seeds from China and Korea, identifying glycolic acid as a potential biomarker [11]. Similarly, Lippolis et al. developed a non-targeted method using DART-HRMS combined with multivariate statistics to discriminate chestnut honey from Portugal and Italy and acacia honey from Italy and China [11].

Species and Production Method Discrimination

The combination of analytical techniques with reference databases enables precise discrimination of species and production methods. Grazina et al. used fatty acid profiles determined by GC-FID with machine learning classifiers to differentiate wild from farmed salmon based on seventeen chemical features [11]. Segelke et al. demonstrated that FT-NIR spectroscopy with chemometrics could differentiate valuable truffle species (Tuber magnatum) from morphologically similar but less valuable species (Tuber borchii) with 100% accuracy [11].

Table 3: Performance of NTM Approaches in Food Authentication Case Studies

Food Product	Authentication Challenge	Analytical Technique	Classification Performance	Reference
Olive Oil	Commercial category (extra virgin, virgin, lampante)	Flash GC with PLS-DA	High percentage correct classification in cross and external validation	[11]
Truffles	Species differentiation (T. magnatum vs T. borchii)	FT-NIR with chemometrics	100% accuracy for expensive white truffle differentiation	[11]
Turkey Meat	Detection of protein hydrolysate adulteration	GC-MS and NMR spectroscopy	Detection of adulteration missed by targeted amino acid analysis	[11]
Honey	Geographical origin discrimination	ICP-OES with LDA	Successful distinction of honeys from industrial vs. non-industrial areas	[11]
Salmon	Wild vs. farmed discrimination	GC-FID fatty acids with machine learning	Successful discrimination of production method and geographical origin	[11]

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 4: Essential Research Reagents and Computational Tools for NTM Database Development

Item/Category	Function in NTM Workflow	Specific Examples/Considerations
Reference Materials	Define class characteristics in database	Certified reference materials, geographically sourced verified samples
Chromatography-Mass Spectrometry Systems	Generate comprehensive chemical profiles	GC-MS, LC-MS systems with high resolution capabilities
Spectroscopic Instruments	Provide rapid fingerprinting capabilities	FT-NIR, NMR, Raman spectrometers for non-destructive analysis
DNA Sequencing Platforms	Species identification via genetic markers	Next-generation sequencing for metabarcoding approaches
Chemical Standards	Instrument calibration and method validation	Pure analytical standards for quality control procedures
Data Processing Software	Extract features from raw instrument data	XCMS, MS-DIAL, custom preprocessing scripts
Statistical Analysis Packages	Develop classification models	R, Python with specialized packages (scikit-learn, SIMCA)
Database Management Systems	Store and query reference data	SQL, NoSQL databases depending on data structure and volume
High-Performance Computing	Process large-scale datasets and build models	Cluster computing resources for database construction and analysis

Reference databases for non-targeted methods must be treated as dynamic entities requiring continuous quality control, validation, and updating, akin to software development best practices [12]. The exponential growth of sequence data, with GenBank and the NCBI nt database expanding continuously, presents both opportunities and challenges for NTM classification [12]. Future developments should focus on standardized data formats, interoperable database structures, and automated quality control pipelines to enhance reproducibility and reliability across laboratories.

The critical importance of reference databases in NTM classification is evident across food authenticity applications. From truffle speciation to geographical origin of honey and production method of salmon, the comprehensiveness and quality of the reference database directly determine the accuracy and reliability of authentication. As the field advances, treating reference databases with the same rigor as analytical instrumentation will be essential for advancing food authenticity research and combating increasingly sophisticated food fraud.

The global food supply chain faces escalating challenges from food fraud, defined as the deliberate and intentional adulteration, substitution, or misrepresentation of food products for economic gain [13]. Concurrently, consumer demand for transparency regarding food origin, safety, and authenticity has surged, driven by heightened health consciousness and several highly publicized food scandals [13] [14]. These dual pressures represent the key drivers necessitating advanced analytical solutions to verify food authenticity and protect consumers and legitimate producers.

Non-targeted methods (NTMs) represent a paradigm shift in food authenticity testing. Unlike traditional targeted methods that test for predefined analytes, NTMs exploit a comprehensive "fingerprint" of a food sample, enabling the detection of known, unknown, and unexpected adulterants [15] [16]. This application note details the integration of NTMs, specifically liquid chromatography–high-resolution mass spectrometry (LC–HRMS), within a robust validation framework to effectively address contemporary food fraud challenges and meet the demand for supply chain transparency.

Market and Regulatory Context

The global food authenticity testing market is growing steadily, with an estimated value of approximately $3.6 billion in 2023 and a projected compound annual growth rate (CAGR) of around 7% between 2025 and 2033 [14]. Although activity is concentrated among large multinational testing companies, this expansion is fueled by several converging factors:

Stringent Regulations: Governments worldwide are implementing stricter regulations on food safety, traceability, and labeling, mandating rigorous testing protocols. Non-compliance results in millions of dollars in fines annually [14].
Economic Impact of Fraud: Food fraud inflicts substantial economic costs, including unfair competition and financial losses for legitimate businesses, and can also harm public health [17] [13].
Consumer Awareness: Increasing consumer awareness of ethical sourcing, sustainability, and food safety is pushing brands to proactively demonstrate product authenticity to build trust and enhance brand reputation [14] [13].

Certain product categories are disproportionately targeted for fraud. The table below summarizes the market characteristics of key segments.

Table 1: Market Characteristics of High-Risk Food Authenticity Segments

Food Segment	Market Share	Common Fraud Types	Primary Driver for Testing
Dairy Products	~25% [14]	Milk adulteration, substitution [14]	Public health protection from hazardous adulterants [14]
Oils and Fats	~20% [14]	Adulteration of olive oil with cheaper substitutes [14]	Guaranteeing product quality and preventing consumer deception [14]
Honey	~15% [14]	Adulteration with sugar syrups, mislabeling [14]	Ensuring purity and quality to maintain consumer trust [14]
Meat and Grain	Significant [18] [13]	Species substitution, mislabeling of geographical origin [18] [13]	Economic fraud prevention, ethical sourcing verification [18] [13]

Non-Targeted Method (NTM) Fundamentals and Advantages

NTMs comprise two core components: a "wet lab" procedure for analytical measurement and a "dry lab" procedure for statistical modeling and data evaluation [18]. The fundamental advantage of NTMs is their ability to conduct a comprehensive analysis without prior knowledge of potential adulterants, making them uniquely suited for detecting sophisticated and evolving fraud.

Key advantages include:

Detection of Unknowns: Capable of identifying unexpected contaminants or adulterants, a need underscored by the melamine milk scandal, in which the adulterant escaped routine targeted testing [16].
Retrospective Analysis: Data can be archived and re-interrogated later when new information about a potential adulterant emerges [16].
Broad Applicability: Useful for multiple applications, including food authenticity, foodomics, and nutrient analysis, by providing a complete view of sample composition [16].

Detailed Experimental Protocol: LC-HRMS for Food Authentication

The following protocol details a specific NTM application for distinguishing spelt from wheat, a common fraud due to the price premium of spelt [18]. This can be adapted for other grain and food matrix authentications.

Research Reagent Solutions and Essential Materials

Table 2: Essential Materials and Reagents for LC-HRMS Non-Targeted Analysis

Item	Specification/Function
Liquid Chromatograph	System capable of high-pressure gradient separations.
Mass Spectrometer	High-resolution accurate mass (HRAM) analyzer (e.g., Time-of-Flight or Orbitrap).
Chromatography Column	Reversed-phase C18 column, 100 x 2.1 mm, 1.8 µm particle size.
Mobile Phase A	0.1% Formic acid in water. Aids in protonation for positive electrospray ionization.
Mobile Phase B	0.1% Formic acid in acetonitrile or methanol.
Quality Control Mixture	Non-targeted Standard Quality Control (NTS/QC) mixture containing ~89 compounds with diverse physicochemical properties to monitor instrument performance [16].
Sample Solvent	Appropriate solvent compatible with LC-MS (e.g., water, acetonitrile, methanol).
Isotopically Labeled Standards	For checking retention time stability and ionization efficiency [19].

Step-by-Step Workflow

Figure 1: Experimental workflow for non-targeted food authentication using LC-HRMS and machine learning.

Sample Preparation:
- Collect authentic reference samples (e.g., 11 cultivars each of verified spelt and wheat) [18].
- For solid samples, homogenize and perform a simple extraction (e.g., with methanol/water) to capture a wide range of metabolites. The goal is a non-selective extraction to support the non-targeted approach [13].
- Include quality control (QC) samples, such as a pooled sample from all individual samples, and analyze them intermittently throughout the sequence to monitor instrument stability [16].
LC-HRMS Analysis:
- Chromatography: Use a reversed-phase C18 column with a gradient elution from 5% to 100% organic mobile phase over 20-30 minutes. The specific gradient should be optimized for the sample matrix.
- Mass Spectrometry: Acquire data in data-dependent acquisition (DDA) or data-independent acquisition (DIA, e.g., SWATH) mode. Ensure mass accuracy is maintained within 3 ppm [19]. Collect both MS1 (precursor) and MS/MS (fragmentation) data.
Data Pre-processing:
- Convert raw data files to an open format (e.g., mzML).
- Use software (vendor-specific or open-source like XCMS) for peak picking, alignment across samples, and intensity normalization.
- The output is a data matrix of molecular features (defined by m/z and retention time) and their intensities across all samples.
Data Analysis and Modeling:
- Chemometrics: Perform unsupervised analysis like Principal Component Analysis (PCA) to visualize natural clustering and identify outliers.
- Machine Learning: Employ supervised methods. For spectral data, a Convolutional Neural Network (CNN) can be highly effective.
  - Input: Use the entire pre-processed spectrum as a 2D image (intensity vs. m/z and retention time) [18].
  - Training: Train the CNN on a "calibration set" of known spelt and wheat samples using a nested cross-validation (NCV) approach to avoid overfitting [18].
  - Validation: Validate the model with an external set of samples, including artificially mixed samples and processed goods (e.g., spelt bread with 10% wheat), to test real-world applicability [18].
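The nested cross-validation step in the workflow above can be sketched as index bookkeeping, independent of the model being trained. This is an assumption-laden outline: the function names, fold counts, and seed handling are hypothetical, the splits are not stratified by class or grouped by cultivar, and the CNN itself is left out.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs built only from
    the outer-training samples, so outer_test never influences model tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for q, fold in enumerate(inner_folds)
                           if q != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_test, inner_splits


# 22 samples, e.g. 11 spelt and 11 wheat cultivars as in the protocol.
splits = list(nested_cv_splits(22, outer_k=5, inner_k=3))
for outer_test, inner_splits in splits:
    print(len(outer_test), [(len(tr), len(va)) for tr, va in inner_splits])
```

The inner loop would be used to tune hyperparameters and the outer loop to estimate performance; keeping the two sets of samples disjoint is what protects the estimate from optimistic bias.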

Validation Considerations for Non-Targeted Methods

Validating NTMs requires a different approach than targeted methods, focusing on fit-for-purpose performance characteristics [15]. Key validation considerations include:

Data Quality: The foundation of any NTM. Critical parameters include mass accuracy (e.g., < 3 ppm), isotopic ratio accuracy, and peak height reproducibility (e.g., < 20% RSD) [16]. These are monitored using a quality control mixture like the NTS/QC [16].
Robustness and Reproducibility: The model must be tested against seasonal variation, different geographic origins, and varying agricultural practices to ensure it does not generate false positives/negatives when conditions change [20].
Veracity of Training Set: The model is only as good as its training data. Absolute certainty about the authenticity of samples used to build the model is crucial; otherwise, fraud is "baked in" from the start [20].
Thorough Reporting: Use tools like the NTA Study Reporting Tool (SRT) developed by the Benchmarking and Publications for Non‑Targeted Analysis (BP4NTA) group to ensure transparent and complete reporting of methods and results [16].
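The data-quality criteria above reduce to two short calculations. The sketch below checks mass accuracy in ppm and peak-height reproducibility as %RSD against the example limits quoted in the text; the measured values and the choice of caffeine as the QC compound are invented for illustration.

```python
import statistics

MAX_PPM = 3.0   # example mass accuracy limit from the text
MAX_RSD = 20.0  # example peak-height reproducibility limit (%RSD)


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def percent_rsd(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical QC compound: caffeine [M+H]+, theoretical m/z 195.08765.
error = ppm_error(195.08800, 195.08765)

# Hypothetical peak heights of that compound across five QC injections.
heights = [1.00e6, 1.08e6, 0.95e6, 1.02e6, 0.97e6]
rsd = percent_rsd(heights)

qc_passed = abs(error) < MAX_PPM and rsd < MAX_RSD
print(f"mass error {error:.2f} ppm, peak-height RSD {rsd:.1f}%, pass={qc_passed}")
```

In practice the same check would be repeated for every compound in the QC mixture and tracked across the run sequence to catch drift.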

Data Processing and Advanced Analytics

The complex, high-dimensional data generated by NTMs requires advanced processing tools to extract meaningful information.

Suspect Screening Analysis (SSA): A rapid method for putative identification by screening detected masses and fragmentation spectra against chemical databases. This can be refined using retention time prediction models based on log Kow values to reduce false positives [16] [19].
Molecular Networking: Groups detected compounds into molecular families based on the similarity of their MS/MS fragmentation spectra. This is particularly useful for identifying unknown compounds that are not in existing spectral libraries [16].
Challenges in Standardization: A significant challenge is the lack of standardization in data processing. Different software tools applied to the same dataset can agree on as little as ~10% of the reported compounds [16]. Groups such as the Metabolomics Quality Assurance and Quality Control Consortium (mQACC) are working to harmonize QA/QC best practices [16].
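The mass-matching core of suspect screening can be illustrated with a small lookup. This sketch assumes positive-mode [M+H]+ ions only and a two-entry suspect list chosen for the example; a real workflow would also compare fragmentation spectra and predicted retention times, as described above, before reporting even a putative identification.

```python
PROTON_MASS = 1.007276  # Da, added to the neutral mass to form [M+H]+

# Hypothetical suspect list: name -> neutral monoisotopic mass (Da).
SUSPECTS = {
    "melamine": 126.06539,   # C3H6N6
    "caffeine": 194.08038,   # C8H10N4O2
}


def suspect_screen(detected_mz, suspects, tol_ppm=3.0):
    """Return (m/z, name, ppm error) for features matching a suspect [M+H]+."""
    hits = []
    for mz in detected_mz:
        for name, mono_mass in suspects.items():
            expected = mono_mass + PROTON_MASS
            error = (mz - expected) / expected * 1e6
            if abs(error) <= tol_ppm:
                hits.append((mz, name, round(error, 2)))
    return hits


# Hypothetical detected feature masses from an MS1 scan.
detected = [127.07280, 195.08760, 301.14100]
hits = suspect_screen(detected, SUSPECTS)
for mz, name, error in hits:
    print(f"m/z {mz:.5f}: putative {name} ({error:+.2f} ppm)")
```

A mass match alone cannot distinguish isomers, which is why the tolerance window is only the first filter in the screening chain.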

The convergence of increasing food fraud incidents and stringent consumer demand for transparency makes the adoption of robust, scientifically validated analytical strategies imperative. Non-targeted methods, particularly when built on LC-HRMS platforms and supported by advanced data processing and machine learning, provide a powerful solution to verify food authenticity. Successful implementation requires a holistic approach that integrates rigorous experimental protocols, comprehensive method validation, and sophisticated data analytics. By adopting this framework, researchers, testing laboratories, and food producers can better safeguard the integrity of the global food supply chain, ensure regulatory compliance, and build consumer trust.

In the ongoing effort to combat food fraud, analytical testing serves as a critical line of defense. Traditional targeted analysis is a reactive approach designed to detect specific, predefined adulterants [21]. While highly sensitive for known compounds, this method offers no protection against unexpected or novel fraud, creating a significant vulnerability in food authenticity programs [13] [21].

In contrast, non-targeted analysis (NTA) represents a paradigm shift towards proactive surveillance. Instead of hunting for specific molecules, NTA acquires a comprehensive chemical "fingerprint" of a sample, capturing a wide array of data points without pre-selection [13] [21]. This fundamental difference allows NTA to screen for deviations from an authentic profile, making it uniquely capable of revealing the presence of unknown or unexpected adulterants, thereby offering a powerful strategic advantage in protecting food integrity [13].

Comparative Analysis: Targeted vs. Non-Targeted Approaches

The distinction between targeted and non-targeted methods dictates their respective applications, strengths, and limitations within a food fraud mitigation strategy. The following table summarizes their core characteristics.

Table 1: Fundamental Comparison of Targeted and Non-Targeted Analytical Approaches

Feature	Targeted Analysis	Non-Targeted Analysis
Analytical Focus	Pre-defined individual analytes or markers [21]	Global, comprehensive fingerprint [13] [21]
Primary Goal	Confirm or deny the presence/quantity of a specific substance [21]	Detect deviations from a reference database of authentic samples [21]
Detection Capability	Known, anticipated adulterants	Known and unknown adulterants [13] [21]
Sample Preparation	Often complex, optimized for specific analytes [13]	Generally simple, to capture a wide range of components [13]
Data Output	Quantitative data on specific compounds	Multivariate data patterns (e.g., spectra, chromatograms) [21]
Result Interpretation	Direct comparison to reference standards	Statistical, probabilistic (e.g., Chemometrics, Machine Learning) [13] [21]
Strategic Role	Reactive testing; compliance checks	Proactive screening; hypothesis generation [21]

The core advantage of NTA is its ability to detect fraud for which no specific test exists. As noted by the IFST, "if an issue is not sought then it will not be found" in a targeted paradigm [21]. NTA overcomes this limitation by casting a wide net. Its power is further demonstrated by its ability to detect adulterations that elude targeted methods. For instance, in a study on turkey meat adulterated with protein hydrolysates, traditional amino acid profiling (targeted) failed to detect partial hydrolysates, whereas non-targeted metabolite profiling via GC-MS and NMR spectroscopy successfully identified the fraud [13].

Key Methodologies and Experimental Protocols

Non-targeted methods leverage a suite of advanced analytical platforms, each providing a different perspective on a sample's chemical composition. The workflow is fundamentally different from targeted analysis, emphasizing comprehensive data acquisition and pattern recognition.

General Workflow for Non-Targeted Analysis

The following diagram outlines the generalized logical workflow for applying non-targeted analysis to food authenticity problems.

Detailed Experimental Protocols

Protocol 1: Non-Targeted Metabolomics for Geographical Origin Authentication

This protocol is adapted from research aimed at discriminating the geographical origin of seeds and honey using metabolite profiling [13].

1. Objective: To differentiate perilla and sesame seeds from China and Korea based on their hydrophilic and lipophilic metabolite profiles.
2. Sample Preparation:
- Grind seeds to a homogeneous powder.
- Weigh 100 mg of powder into an extraction vial.
- Add 1 mL of a methanol:water:chloroform (2.5:1:1 v/v/v) extraction solvent.
- Vortex vigorously for 2 minutes, then sonicate for 15 minutes in an ice bath.
- Centrifuge at 14,000 x g for 10 minutes at 4°C.
- Transfer the upper (hydrophilic) and lower (lipophilic) layers to separate vials.
- Dry the extracts under a gentle stream of nitrogen.
- Derivatize for GC-MS analysis using a standard methoxyamination and silylation procedure.
3. Instrumental Analysis:
- Technique: Gas Chromatography-Mass Spectrometry (GC-MS)
- GC Conditions: Use a non-polar capillary column (e.g., DB-5MS). Employ a temperature gradient from 60°C to 330°C.
- MS Conditions: Electron Impact (EI) ionization at 70 eV; full scan mode from m/z 50 to 600.
4. Data Processing & Analysis:
- Deconvolute raw GC-MS data to align peaks and identify features across all samples.
- Create a data matrix of peak areas (features) versus samples.
- Import the matrix into a chemometric software package.
- Apply Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA) to build a classification model.
- Identify potential biomarker metabolites (e.g., glycolic acid for perilla seeds) that drive the separation between groups.
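The data-matrix step in the protocol above amounts to pivoting per-sample peak lists into a samples-by-features table. The sketch below shows one way to do that; the sample names, peak areas, and the decision to fill undetected features with zero are assumptions made for the example, and the OPLS-DA modelling itself is left to the chemometric software.

```python
def build_feature_matrix(peak_tables):
    """Pivot {sample: {feature: area}} into (features, samples, matrix).

    Rows follow the sorted sample names and columns the sorted feature names.
    Features not detected in a sample are filled with 0.0 (an assumption;
    other imputation strategies are common).
    """
    samples = sorted(peak_tables)
    features = sorted({f for table in peak_tables.values() for f in table})
    matrix = [[peak_tables[s].get(f, 0.0) for f in features] for s in samples]
    return features, samples, matrix


# Hypothetical deconvoluted peak areas for three seed samples.
peak_tables = {
    "KR_01": {"glycolic acid": 5200.0, "sucrose": 88000.0},
    "KR_02": {"glycolic acid": 4900.0, "sucrose": 91000.0, "oleic acid": 300.0},
    "CN_01": {"glycolic acid": 1100.0, "sucrose": 86000.0, "oleic acid": 450.0},
}

features, samples, matrix = build_feature_matrix(peak_tables)
print("samples:", samples)
print("features:", features)
for name, row in zip(samples, matrix):
    print(name, row)
```

Keeping the feature and sample labels alongside the numeric matrix avoids misaligned rows when the table is imported into a separate modelling package.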

Protocol 2: Non-Targeted Spectroscopy for Quality Grade Screening

This protocol is based on studies that used spectroscopic techniques for the rapid authentication of olive oil quality and truffle species [13].

1. Objective: To develop a classification model for predicting the commercial category of virgin olive oils (extra virgin, virgin, lampante).
2. Sample Preparation:
- No extensive preparation is required.
- Ensure samples are homogenous and at a consistent temperature before analysis.
3. Instrumental Analysis:
- Technique: Flash Gas Chromatography (Flash GC) or Fourier Transform Near-Infrared (FT-NIR) Spectroscopy.
- Flash GC Conditions: Inject 1 µL of sample; use a short, non-polar column; rapid temperature program to separate the volatile fraction.
- FT-NIR Conditions: Use an integrating sphere or fiber optic probe; collect spectra in the range of 800-2500 nm; average 32 scans per sample at a resolution of 8 cm⁻¹.
4. Data Processing & Analysis:
- For Flash GC, use the entire chromatogram as a fingerprint. For NIR, use the full spectrum.
- Apply necessary pre-processing to reduce noise and correct baseline effects (e.g., Standard Normal Variate (SNV), multiplicative scatter correction, Savitzky-Golay derivatives).
- Use Partial Least Squares-Discriminant Analysis (PLS-DA) to build a predictive model using a large and diversified set of pre-classified authentic samples (n > 300 is ideal).
- Validate the model's robustness using cross-validation and an external validation set.
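Of the pre-processing options listed in the protocol above, Standard Normal Variate is the simplest to show directly: each spectrum is centred on its own mean and scaled by its own standard deviation. The sketch below uses invented absorbance values and the sample standard deviation (some implementations use the population form instead).

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: centre and scale a single spectrum."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD; a convention, not a rule
    return [(x - mean) / sd for x in spectrum]


# Hypothetical absorbance values for one sample.
raw = [0.20, 0.35, 0.50, 0.42, 0.31]

# The same sample with an additive offset and multiplicative scaling,
# mimicking the scatter effects SNV is intended to remove.
scattered = [1.5 * x + 0.1 for x in raw]

snv_raw = snv(raw)
snv_scattered = snv(scattered)
print([round(v, 3) for v in snv_raw])
print([round(v, 3) for v in snv_scattered])
```

Both transformed spectra coincide, which is the behaviour that makes SNV useful before PLS-DA: differences caused purely by path length or particle scatter no longer drive the model.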

The Researcher's Toolkit: Essential Reagents and Materials

Successful implementation of non-targeted methods relies on a foundation of specific reagents, instrumentation, and software.

Table 2: Key Research Reagent Solutions for Non-Targeted Analysis

Item	Function/Description	Application Example
Methoxyamine hydrochloride	Protects carbonyl groups during derivatization for GC-MS analysis.	Metabolite profiling in seeds, honey, and meat [13].
N-Methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA)	Silylation reagent that volatilizes polar metabolites for GC-MS separation.	Metabolite profiling [13].
Deuterated Solvent (e.g., CD₃OD, D₂O)	Provides a locking signal for NMR spectroscopy and enables sample dissolution without interference.	Metabolite profiling for detecting adulteration in turkey meat [13].
Chemometric Software (e.g., SIMCA, PLS_Toolbox)	Software for multivariate statistical analysis, including PCA, PLS-DA, and OPLS-DA.	Essential for building classification and discrimination models from complex data [13].
Authentic Reference Materials	Certified, well-characterized samples used to build the foundational reference database.	Critical for calibrating and validating any non-targeted model for any matrix [21].
C18 / Normal Phase Solid-Phase Extraction (SPE) Cartridges	For selective cleanup or fractionation of complex samples to reduce matrix effects.	Can be used in lipidomics or targeted metabolite analysis within a non-targeted workflow [22].

Non-targeted analysis represents a transformative advance in food authenticity research, shifting the paradigm from reactive detection to proactive surveillance. Its principal advantage is the capacity to uncover unknown adulterants by identifying anomalous patterns against a background of authentic product profiles. While challenges remain—including the need for robust reference databases and sophisticated data analysis—the integration of NTA into food fraud mitigation strategies provides a powerful, forward-looking tool. It empowers scientists and regulators to not only confront known threats but also to build a more resilient food system capable of adapting to the evolving tactics of fraud.

Methodologies in Action: Analytical Platforms and Applications for Food Forensics

Within the framework of non-targeted methods (NTM) for food authenticity research, rapid fingerprinting techniques have emerged as powerful tools for addressing global challenges in food fraud and mislabeling. Fourier Transform Near-Infrared (FT-NIR) spectroscopy and Nuclear Magnetic Resonance (NMR) spectroscopy represent two leading analytical approaches that fulfill the critical need for efficient, high-throughput authentication capable of verifying geographical origin, botanical source, and processing methods without prior knowledge of potential adulterants [23] [15]. The growing implementation of these techniques stems from their ability to provide a comprehensive molecular snapshot of food matrices, enabling researchers to detect subtle compositional differences indicative of authenticity breaches. Unlike targeted methods that focus on specific analytes, non-targeted fingerprinting exploits the entire spectral profile, offering a more holistic approach to authenticity verification that can identify unexpected adulterants [23] [13]. This application note details the practical implementation, experimental protocols, and performance validation of FT-NIR and NMR spectroscopy within the context of NTM validation for food authenticity research.

Key Principles and Technological Comparison

FT-NIR and NMR spectroscopy, while both serving as non-targeted fingerprinting tools, operate on distinct physical principles that dictate their specific applications, strengths, and limitations in food authentication.

FT-NIR spectroscopy measures the absorption of near-infrared light (780-2500 nm), corresponding to overtone and combination vibrations of fundamental molecular bonds, primarily O-H, C-H, and N-H groups [24]. These interactions provide information on the organic composition of samples, making it particularly sensitive to differences in protein, fat, and moisture content. The technique generates complex, high-dimensional data that is inherently non-linear and requires sophisticated chemometric analysis for interpretation [25].

NMR spectroscopy, in contrast, exploits the magnetic properties of certain atomic nuclei (e.g., ¹H, ¹³C) when placed in a strong magnetic field. The technique detects the resonance frequencies of these nuclei, which are exquisitely sensitive to their local chemical environment. This provides a definitive, reproducible fingerprint that can simultaneously identify and quantify a wide range of metabolites (from organic acids and amino acids to sugars and lipids) in a single experiment [23]. NMR's exceptional quantitative capabilities and high repeatability make it particularly valuable for building standardized databases and for regulatory applications [23] [26].

Table 1: Comparative Analysis of FT-NIR and NMR Spectroscopy for Food Authenticity

Parameter	FT-NIR Spectroscopy	NMR Spectroscopy
Analytical Principle	Overtone/vibrational spectroscopy [24]	Magnetic nuclear spin transitions [23]
Sample Preparation	Minimal; often non-destructive [27]	May require extraction or dissolution [28]
Speed of Analysis	Very rapid (seconds to minutes) [25] [29]	Moderate (several minutes per sample) [23]
Metabolite Coverage	Broad, based on functional groups	Comprehensive, with specific identification [23]
Quantitative Nature	Semi-quantitative (requires calibration)	Highly quantitative and reproducible [23]
Primary Strengths	Portability, low cost, high-throughput screening	High specificity, structural elucidation, database building [23] [26]
Key Limitations	Indirect measurements, complex data interpretation	High initial cost, lower sensitivity than MS techniques [23]
Ideal Use Case	Rapid in-line screening and origin classification [25] [27]	Definitive authentication and regulatory testing [26]

Experimental Protocols

FT-NIR Protocol for Geographical Origin Authentication

This protocol outlines the procedure for authenticating the geographical origin of almonds using FT-NIR spectroscopy, adaptable for other solid food matrices [29].

Materials and Equipment

FT-NIR spectrometer equipped with a reflectance probe or integrating sphere
Cryogenic grinder with liquid nitrogen cooling
Freeze-dryer
Single-use sample cups or quartz windows
Certified reference materials for instrument validation

Sample Preparation Steps

Homogenization: For solid samples like almonds, grind representative portions using a cryogenic grinder to achieve a uniform particle size of <500 µm [29].
Freeze-Drying: Transfer the ground sample to a freeze-dryer for approximately 72 hours to remove interfering water signals from the NIR spectrum [29].
Presentation: Place the prepared sample in a consistent orientation in the sample cup, ensuring uniform packing density and surface topography for reproducible reflectance measurements.

Instrumental Analysis

System Calibration: Verify instrument performance using certified white reference standards prior to analysis.
Spectral Acquisition: Collect spectra in the range of 780-2500 nm (12,820-4,000 cm⁻¹) with a resolution of 4-16 cm⁻¹. Accumulate 32-64 scans per spectrum to optimize the signal-to-noise ratio [29] [24].
Quality Control: Include control samples from known origins in each analytical batch to monitor system stability and performance.
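Because FT-NIR instruments usually report wavenumbers while the acquisition range above is quoted in nanometres, a quick conversion helps when comparing settings. The relation is wavenumber (cm⁻¹) = 10⁷ / wavelength (nm); the function name below is just a label for this example.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm


# Limits of the acquisition range quoted in the protocol.
upper = nm_to_wavenumber(780)    # short-wavelength end
lower = nm_to_wavenumber(2500)   # long-wavelength end
print(f"780 nm  = {upper:.1f} cm^-1")
print(f"2500 nm = {lower:.1f} cm^-1")
```

The short-wavelength limit corresponds to the high-wavenumber end of the spectrum, so the axis direction reverses between the two conventions.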

Data Processing and Analysis

Spectral Preprocessing: Apply mathematical treatments to reduce scattering effects and enhance spectral features. Common techniques include:
- Standard Normal Variate (SNV) to remove scatter effects [25]
- Savitzky-Golay derivatives (1st or 2nd order) to resolve overlapping peaks and remove baseline offsets [25]
- Multiplicative Scatter Correction (MSC) [24]
Chemometric Modeling:
- Utilize Support Vector Machine (SVM) algorithms, particularly non-linear variants like Polynomial-SVM, which have demonstrated >95% classification accuracy for geographical origin determination [25] [29].
- Validate models using independent test sets and cross-validation to ensure robustness and prevent overfitting.
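A Savitzky-Golay first derivative, one of the preprocessing options listed above, can be written out explicitly for a fixed window. The sketch below assumes a 5-point window and a quadratic polynomial, for which the convolution coefficients are (-2, -1, 0, 1, 2)/10; these settings are illustrative, and routine work would more likely rely on a library routine such as `scipy.signal.savgol_filter`.

```python
# Savitzky-Golay first-derivative coefficients for a 5-point window and a
# quadratic fit; the weighted sum is divided by 10 times the point spacing.
SG_D1_W5 = (-2, -1, 0, 1, 2)


def savgol_first_derivative(y, spacing=1.0):
    """First derivative at interior points (two points lost at each edge)."""
    half = len(SG_D1_W5) // 2
    out = []
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        out.append(sum(c * v for c, v in zip(SG_D1_W5, window)) / (10 * spacing))
    return out


# Hypothetical signal y = x^2 and the same signal with a constant baseline offset.
signal = [x ** 2 for x in range(7)]
offset_signal = [v + 5 for v in signal]

d_signal = savgol_first_derivative(signal)
d_offset = savgol_first_derivative(offset_signal)
print(d_signal)
print(d_offset)
```

The two derivative traces are identical, showing how the derivative removes a constant baseline offset before the spectra reach the classifier.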

NMR Protocol for Honey Authenticity and Metabolite Profiling

This protocol describes the procedure for using NMR spectroscopy to authenticate honey origin and detect syrup adulteration, adaptable to other liquid food matrices [23] [26].

Materials and Equipment

High-field NMR spectrometer (≥400 MHz)
NMR tubes (e.g., 5 mm diameter)
Precision analytical balance
pH meter
Buffer solution (e.g., phosphate buffer, pH 7.0)
Deuterated solvent (e.g., D₂O)
Internal standard (e.g., TSP-d₄ or DSS for chemical shift referencing)

Sample Preparation

Weighing: Accurately weigh 40-50 mg of honey into a clean vial.
Solution Preparation: Add 600 µL of phosphate buffer (pH 7.0) and 60 µL of D₂O containing 0.1% TSP-d₄ as an internal reference standard. The buffer minimizes pH-induced chemical shift variations [23].
Mixing: Vortex the mixture for 30-60 seconds until the honey is completely dissolved.
Transfer: Pipette 550-600 µL of the prepared solution into a clean 5 mm NMR tube.

NMR Data Acquisition

Instrument Setup: Lock, tune, and shim the spectrometer using the D₂O signal to optimize magnetic field homogeneity.
Pulse Program Selection: Employ standard one-dimensional (1D) ¹H NMR pulse sequences with water suppression (e.g., NOESYPRESAT or zgpr) [23].
Parameter Setting:
- Spectral width: 12-14 ppm
- Number of scans: 64-128
- Relaxation delay: 4 seconds
- Acquisition temperature: 298 K
Data Collection: Run the experiment, which typically requires 5-10 minutes per sample.

Data Processing and Multivariate Analysis

Spectral Processing:
- Apply Fourier transformation to convert time-domain data to frequency-domain spectra.
- Perform phase correction and baseline correction.
- Calibrate the chemical shift scale to the internal standard (TSP-d₄ at 0.0 ppm).
Data Reduction:
- Segment spectra into small regions (buckets/bins) of equal width (e.g., 0.04 ppm).
- Integrate the signal intensity within each bucket.
- Normalize the data to the total integral or a reference standard to account for concentration differences.
Pattern Recognition:
- Apply unsupervised methods like Principal Component Analysis (PCA) for initial data exploration and outlier detection.
- Use supervised methods such as Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA) to build classification models that differentiate authentic and adulterated samples or determine geographical origin [23].
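
The data reduction step (equal-width bucketing followed by total-integral normalization) can be illustrated as follows. This is a simplified sketch under stated assumptions: the function names and the toy spectrum are hypothetical, and summing intensities stands in for integration, which is reasonable only when the points are evenly spaced.

```python
def bucket_spectrum(ppm, intensity, width=0.04, lo=0.0, hi=10.0):
    """Sum intensities into equal-width chemical-shift buckets covering [lo, hi)."""
    n_buckets = round((hi - lo) / width)
    buckets = [0.0] * n_buckets
    for shift, value in zip(ppm, intensity):
        if lo <= shift < hi:
            # Clamp guards against floating-point rounding at the upper edge.
            idx = min(int((shift - lo) / width), n_buckets - 1)
            buckets[idx] += value
    return buckets

def normalize_to_total(buckets):
    """Scale buckets so they sum to 1, compensating for concentration differences."""
    total = sum(buckets)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [b / total for b in buckets]

# Hypothetical five-point spectrum spanning three 0.04 ppm buckets
ppm = [0.01, 0.02, 0.05, 0.06, 0.09]
intensity = [1.0, 1.0, 2.0, 2.0, 4.0]
buckets = bucket_spectrum(ppm, intensity, width=0.04, lo=0.0, hi=0.12)
normalized = normalize_to_total(buckets)
```

The normalized bucket table (one row per sample) is what would then be passed to PCA or OPLS-DA.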

Applications and Performance Data

FT-NIR Applications

FT-NIR spectroscopy has demonstrated exceptional performance across diverse food authentication scenarios:

Mediterranean Anchovy Authentication: FT-NIR coupled with Polynomial-SVM and Random Forest algorithms achieved 95.7% and 95.5% accuracy, respectively, in classifying anchovies from the Adriatic, Balearic, and Tyrrhenian Seas. Savitzky-Golay derivative preprocessing combined with Standard Normal Variate was critical for this high performance [25].
Paprika Adulteration Detection: FT-NIR successfully differentiated pure paprika from samples adulterated with illegal synthetic dyes (Sudan II-IV, Congo Red) at levels as low as 0.1%. The method achieved 96.8% accuracy in identifying adulterated samples and 100% accuracy in classifying Protected Designation of Origin (PDO) labels using OPLS-DA models [27].
Almond Geographical Origin: Systematic comparison of sample preparation techniques revealed that freeze-drying after grinding produced the most reliable classification models (80.2% accuracy), though analysis of whole almonds remained valuable for rapid screening with lower workload [29].

NMR Applications

NMR spectroscopy provides robust solutions for challenging authentication problems:

Honey Authenticity: NMR has been implemented as an official testing method in Estonia, effectively detecting adulteration with cheap sugar syrups and clearing the market of fraudulent products. The technique provides a comprehensive metabolite profile that enables verification of both authenticity and botanical/geographical origin [26].
Food Metabolite Profiling: A review of 43 NMR-based food authentication studies (2019-2024) highlighted its extensive application in differentiating truffle species, discriminating olive oil categories, and determining the geographical origin of seeds and dairy products based on their distinct metabolic fingerprints [23].
Turkey Meat Adulteration: NMR metabolomics successfully detected the injection of protein hydrolysates into turkey meat, even at low hydrolysis degrees where traditional amino acid analysis failed, demonstrating its advantage for identifying sophisticated adulteration practices [13].

Table 2: Quantitative Performance Metrics of Featured Applications

Application	Technique	Classification Accuracy	Key Analytes/Markers	Reference
Anchovy Origin	FT-NIR + P-SVM	95.7% (test set)	Non-linear spectral patterns	[25]
Paprika PDO Verification	FT-NIR + OPLS-DA	100%	Spectral fingerprints of geographic origin	[27]
Paprika Adulteration Detection	FT-NIR + PLS	96.8%	Sudan dyes, Congo Red	[27]
Almond Origin (Freeze-Dried)	FT-NIR + SVM	80.2%	Moisture-free spectral features	[29]
Honey Authenticity	NMR + PCA/OPLS-DA	Official method (Estonia)	Comprehensive metabolite profile	[26]
Turkey Meat Adulteration	NMR Metabolomics	Successful detection	Sugars, hydrolysis by-products	[13]

The Scientist's Toolkit: Essential Research Reagent Solutions

Successful implementation of FT-NIR and NMR fingerprinting requires specific reagents and materials to ensure analytical rigor and reproducibility.

Table 3: Essential Research Reagents and Materials

Item	Function/Application	Technical Considerations
Cryogenic Mill	Homogenizes solid samples to uniform particle size	Liquid nitrogen cooling prevents degradation of heat-sensitive compounds [29]
Freeze-Dryer	Removes water from samples for FT-NIR	Eliminates strong O-H absorption bands that can mask other spectral features [29]
Certified Reference Materials	Validates instrument performance and method accuracy	Essential for quality control and measurement traceability [15]
Deuterated Solvents (D₂O)	Provides lock signal for NMR spectroscopy	Enables stable magnetic field regulation during extended experiments [23]
Internal Standards (TSP-d₄)	Chemical shift reference for NMR	Provides a consistent δ = 0.0 ppm reference point in pH-buffered samples [23]
Buffer Solutions	Controls pH in NMR samples	Minimizes pH-induced chemical shift variations in metabolomic analyses [23]
SVM and RF Algorithms	Non-linear classification of spectral data	Superior to linear models for handling complex NIR spectra [25]
OPLS-DA Models	Supervised multivariate analysis for NMR data	Handles correlated X-variables and improves interpretation [23] [27]

FT-NIR and NMR spectroscopy have established themselves as cornerstone analytical techniques in the validation of non-targeted methods for food authenticity research. FT-NIR excels as a rapid, cost-effective screening tool for high-throughput applications such as geographical origin verification and adulteration detection, particularly when coupled with robust non-linear machine learning algorithms [25] [27]. NMR provides definitive, multi-parametric metabolite profiling with exceptional reproducibility, making it invaluable for building standardized databases and for regulatory decision-making [23] [26]. The continuing development of both techniques hinges on addressing key challenges, including the standardization of sample preparation protocols, the expansion of comprehensive spectral databases, and the establishment of harmonized validation guidelines specifically designed for non-targeted approaches [23] [15]. As the field advances, the integration of FT-NIR and NMR data with other analytical platforms, alongside the refinement of chemometric models, will further enhance the capability to safeguard food integrity and combat sophisticated fraud throughout the global supply chain.

The globalization of the food supply chain has significantly increased the complexity of ensuring food authenticity, driving the need for high-throughput, accurate, and rapid analytical techniques [22]. Food authenticity verification now extends beyond simple adulteration detection to encompass quality evaluation, label compliance, traceability determination, and other quality-related aspects [22]. Non-targeted methods (NTMs) have emerged as powerful analytical strategies for detecting food fraud and authenticating food substances, as they can capture subtle differences in sample composition without focusing on predetermined analytes [8]. These methods typically combine highly resolved analytical fingerprints with advanced statistical modeling and machine learning for data evaluation [8].

Chromatography coupled with mass spectrometry represents the cornerstone of modern food authenticity testing. Among these platforms, GC-MS, LC-MS, and high-resolution accurate mass (HRAM) Orbitrap systems offer complementary capabilities for analyzing diverse food matrices. GC-MS excels in separating volatile and semi-volatile compounds, LC-MS handles non-volatile and thermally labile substances, while HRAM Orbitrap instrumentation provides superior mass accuracy and resolution for confident compound identification [30]. The integration of these platforms within a foodomics framework—combining proteomics, lipidomics, flavoromics, metabolomics, and genomics with biostatistics and bioinformatics—has revolutionized food authentication from field to table [22].

This application note details experimental protocols and technical considerations for implementing these platforms in non-targeted food authenticity research, with a specific focus on method validation parameters essential for generating defensible scientific data.

Experimental Protocols

Non-Targeted LC-HRMS Method for Distinguishing Spelt and Wheat

Principle: This protocol utilizes liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) to obtain highly resolved spectral fingerprints of spelt and wheat cultivars, followed by convolutional neural network (CNN) modeling for classification [8].

Materials and Reagents:

Spelt and wheat cultivars (minimum eleven each for model training)
Acetonitrile (LC-MS grade)
Methanol (LC-MS grade)
Formic acid (LC-MS grade)
Deionized water (18.2 MΩ·cm)

Instrumentation:

Liquid chromatography system (UHPLC capable)
High-resolution mass spectrometer (Time-of-Flight or Orbitrap based)
Chromatographic column: C18 column (e.g., 2.1 × 100 mm, 1.7 μm)

Sample Preparation:

Grind grain samples to a fine powder using a laboratory mill.
Weigh 100 ± 5 mg of homogenized sample into a 2 mL microcentrifuge tube.
Add 1 mL of extraction solvent (acetonitrile:water:formic acid, 80:19:1 v/v/v).
Vortex vigorously for 1 minute, then shake for 10 minutes.
Centrifuge at 14,000 × g for 10 minutes at room temperature.
Transfer supernatant to an autosampler vial for LC-HRMS analysis.

LC Conditions:

Mobile Phase A: 0.1% formic acid in water
Mobile Phase B: 0.1% formic acid in acetonitrile
Flow Rate: 0.3 mL/min
Column Temperature: 40°C
Injection Volume: 5 μL
Gradient Program:
- 0-2 min: 5% B
- 2-15 min: 5-95% B
- 15-18 min: 95% B
- 18-18.1 min: 95-5% B
- 18.1-21 min: 5% B (column re-equilibration)

HRMS Conditions:

Ionization Mode: Electrospray ionization (ESI), positive and negative modes
Mass Range: 100-1500 m/z
Resolution: >30,000 FWHM
Sheath Gas Flow: 40 arb
Aux Gas Flow: 10 arb
Spray Voltage: 3.5 kV
Capillary Temperature: 320°C

Data Processing and CNN Modeling:

Convert raw files to standardized format (e.g., mzML).
Perform peak picking, alignment, and retention time correction.
Normalize peak intensities using total ion current or probabilistic quotient normalization.
Format 2D spectral data (m/z vs. retention time) as images for CNN input.
Implement CNN architecture with convolutional, pooling, and fully connected layers.
Train models using nested cross-validation to prevent overfitting.
Validate model performance using an independent sample set including artificially mixed spectra and processed goods.
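
The step of formatting m/z versus retention time data as an image can be sketched as a simple 2D binning. The grid shape, function name, and example features below are assumptions for illustration; the source does not specify the image resolution or scaling used in [8]. The retention time and m/z ranges are taken from the LC and HRMS conditions above.

```python
def features_to_image(features, rt_range, mz_range, shape):
    """Bin (rt, mz, intensity) features into a grid: rows = m/z bins, columns = RT bins."""
    rows, cols = shape
    rt_lo, rt_hi = rt_range
    mz_lo, mz_hi = mz_range
    image = [[0.0] * cols for _ in range(rows)]
    for rt, mz, inten in features:
        if not (rt_lo <= rt < rt_hi and mz_lo <= mz < mz_hi):
            continue  # ignore features outside the acquisition window
        r = min(int((mz - mz_lo) / (mz_hi - mz_lo) * rows), rows - 1)
        c = min(int((rt - rt_lo) / (rt_hi - rt_lo) * cols), cols - 1)
        image[r][c] += inten
    # Scale to [0, 1] so samples with different overall signal are comparable.
    peak = max(max(row) for row in image)
    if peak > 0:
        image = [[v / peak for v in row] for row in image]
    return image

# Hypothetical features: (retention time in min, m/z, intensity)
features = [(1.0, 150.0, 10.0), (10.5, 800.0, 5.0), (20.0, 1400.0, 20.0)]
image = features_to_image(features, rt_range=(0.0, 21.0),
                          mz_range=(100.0, 1500.0), shape=(4, 3))
```

A real model would use a far finer grid than 4 × 3; the small shape here only keeps the example easy to verify by hand.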

Validation Parameters:

Calculate D-score metric for classification decisions
Assess specificity and sensitivity across multiple cultivars
Test model robustness with untypical spelt and old wheat cultivars not included in training

LC-Orbitrap-HRMS Screening Method for Antibiotics in Milk

Principle: This protocol describes a multiresidue screening method for 57 antibiotic compounds in bovine, ovine, and goat milk using LC-Orbitrap-HRMS, compliant with Commission Implementing Regulation (EU) 2021/808 [30].

Materials and Reagents:

Milk samples (bovine, ovine, goat)
Mixed antibiotic standards (57 compounds including beta-lactams, tetracyclines, sulfonamides, quinolones, pleuromutilins, macrolides, lincosamides)
Acetonitrile (LC-MS grade)
Methanol (LC-MS grade)
Ethylenediaminetetraacetic acid (EDTA)
Formic acid (LC-MS grade)
Deionized water (18.2 MΩ·cm)
HLB PRiME solid-phase extraction cartridges

Instrumentation:

Liquid chromatography system
Orbitrap high-resolution mass spectrometer
Chromatographic column: C18 column (e.g., 2.1 × 100 mm, 1.7 μm)

Sample Preparation:

Aliquot 5 ± 0.1 mL of milk into a 15 mL centrifuge tube.
Add 100 μL of 0.1 M EDTA solution to chelate calcium ions and prevent tetracycline complexation.
Add 5 mL of acetonitrile for protein precipitation.
Vortex for 30 seconds, then centrifuge at 4500 × g for 5 minutes at 10°C.
Load supernatant directly onto HLB PRiME cartridge without prior activation or conditioning.
Collect eluent and evaporate to dryness under nitrogen stream at 40°C.
Reconstitute residue in 1 mL of mobile phase (water:acetonitrile, 95:5 v/v with 0.1% formic acid).
Transfer to autosampler vial for LC-HRMS analysis.

LC Conditions:

Mobile Phase A: 0.1% formic acid in water
Mobile Phase B: 0.1% formic acid in acetonitrile
Flow Rate: 0.3 mL/min
Column Temperature: 40°C
Injection Volume: 10 μL
Gradient Program:
- 0-1 min: 5% B
- 1-10 min: 5-100% B
- 10-12 min: 100% B
- 12-12.1 min: 100-5% B
- 12.1-15 min: 5% B (column re-equilibration)

HRMS Conditions (Orbitrap):

Ionization Mode: Heated electrospray ionization (HESI), positive mode
Resolution: 70,000 FWHM (at m/z 200)
Mass Range: 150-1500 m/z
Sheath Gas Flow: 35 arb
Aux Gas Flow: 10 arb
Spray Voltage: 3.8 kV
Capillary Temperature: 320°C
Vaporizer Temperature: 300°C
Automatic Gain Control Target: 1e6

Validation Procedure (per EU 2021/808):

Detection Capability (CCβ): Analyze 25 blank milk samples (10 sheep, 10 cow, 5 goat) spiked at the screening target concentration (STC). Demonstrate a false negative rate <5%, i.e., the spiked analytes are detected in more than 95% of samples.
Specificity: Analyze 25 blank milk samples from different sources to ensure no false positives.
Stability: Evaluate standard solution stability over 5 time points using y/x ratio.
Ruggedness: Test method robustness by varying 4 factors at 2 levels each (centrifuge temperature, centrifuge speed, HLB PRiME conditioning, and EDTA amount).
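
Two pieces of arithmetic in this validation procedure are easy to make explicit: how many failures a <5% criterion tolerates in 25 samples, and how many runs a four-factor, two-level ruggedness study needs. The sketch below assumes a full factorial design, which the source does not specify (fractional designs are also common), and the alternate factor levels are hypothetical; only the nominal values (10°C, 4500 × g, no conditioning, 100 μL EDTA) come from the protocol.

```python
from itertools import product

def max_allowed_failures(n_samples, max_rate=0.05):
    """Largest failure count k for which k / n_samples stays strictly below max_rate."""
    k = 0
    while (k + 1) / n_samples < max_rate:
        k += 1
    return k

# Nominal level first, hypothetical alternate level second.
FACTORS = {
    "centrifuge_temperature_C": (10, 15),
    "centrifuge_speed_x_g": (4500, 4000),
    "cartridge_conditioning": (False, True),
    "edta_volume_uL": (100, 120),
}

# Full factorial: every combination of the two levels of each factor.
runs = [dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())]

allowed = max_allowed_failures(25)  # 1 of 25 is 4%, 2 of 25 is 8%
```

With 25 samples, a single failing sample is therefore the most that can be tolerated, and a full factorial ruggedness study would require 16 runs.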

Results and Data Presentation

Performance Comparison of Mass Spectrometry Platforms

Table 1: Technical Specifications of Chromatographic and Mass Spectrometry Platforms for Food Authenticity

Parameter	GC-MS	LC-MS	HRAM Orbitrap
Mass Accuracy	0.1-0.5 Da	<5 ppm	<1-2 ppm
Mass Resolution	Unit resolution	5,000-20,000	50,000-240,000
Dynamic Range	10³-10⁴	10³-10⁵	10³-10⁵
Applicable Compound Classes	Volatile, semi-volatile, thermally stable compounds	Non-volatile, thermally labile, polar compounds	Comprehensive coverage including unknowns
Food Authenticity Applications	Origin analysis [31], flavor profiling [22]	Spelt/wheat discrimination [8], antibiotic screening [30]	Honey authenticity [32], olive oil adulteration [33]
Key Advantages	Excellent separation efficiency, established compound libraries	Broad compound coverage, minimal derivatization	Superior mass accuracy, retrospective data analysis
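
The mass accuracy and resolution figures in Table 1 follow from two standard definitions: mass error in ppm is the relative deviation from the theoretical m/z multiplied by 10⁶, and resolving power is m/z divided by the peak width at half maximum (FWHM). The helper names and the example masses below are illustrative assumptions, not values from the cited studies.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def fwhm_from_resolution(mz, resolution):
    """Peak width (FWHM, in m/z units) implied by a resolving power R = m / delta_m."""
    return mz / resolution

# Hypothetical ion: theoretical m/z 300.0000 observed at 300.0006
error = ppm_error(300.0006, 300.0)

# Peak width at m/z 200 for the 70,000 FWHM setting used in the Orbitrap protocol
width = fwhm_from_resolution(200.0, 70_000)
```

In this example the error is about 2 ppm and the peak is roughly 0.003 m/z units wide, which gives a sense of how narrowly two ions can be spaced and still be resolved.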

Table 2: Validation Parameters for Non-Targeted Methods in Food Authenticity

Validation Parameter	Requirement	Assessment Method
Detection Capability (CCβ)	<5% false negative rate	Analysis of 25 spiked samples at screening target concentration [30]
Specificity	<5% false positive rate	Analysis of 25 blank samples from different sources [30]
Stability	Consistent response over time	Evaluation of standard solutions at 5 time points [30]
Robustness	Method resilience to minor changes	Variation of 4 factors at 2 levels each [30]
Classification Accuracy	High sensitivity and specificity	D-score metric, nested cross-validation [8]

The Scientist's Toolkit

Table 3: Essential Research Reagent Solutions for Food Authenticity Testing

Reagent/Consumable	Function	Application Example
HLB PRiME Cartridges	Phospholipid removal and general cleanup; no activation required	Milk sample preparation for antibiotic screening [30]
EDTA (Ethylenediaminetetraacetic acid)	Chelating agent that binds calcium ions to prevent tetracycline complexation	Milk analysis to improve antibiotic recovery [30]
C18 Chromatographic Columns	Reversed-phase separation of non-polar to medium polarity compounds	LC-MS analysis of spelt/wheat markers [8]
Acetonitrile with Acid Modifiers	Protein precipitation and compound extraction	Sample preparation for honey authenticity testing [32]
Stable Isotope-Labeled Standards	Internal standards for quantification and quality control	Isotope analysis for beverage authenticity [33]

Chromatographic and mass spectrometry platforms provide powerful analytical capabilities for addressing the complex challenges of food authenticity research. GC-MS, LC-MS, and HRAM Orbitrap systems each offer unique advantages that can be leveraged based on specific analytical requirements. The implementation of properly validated non-targeted methods, incorporating advanced machine learning approaches like convolutional neural networks, enables robust discrimination of food commodities such as spelt and wheat [8] and reliable screening of contaminants like antibiotics in milk [30].

The future of food authenticity testing lies in the continued development of integrated, automated, and data-driven approaches. Trends point toward increased use of artificial intelligence, machine learning, and portable devices for real-time verification throughout the supply chain [34]. Multi-omics strategies that combine proteomics, genomics, metabolomics, and lipidomics will further enhance our ability to ensure food authenticity from field to table [22]. As regulatory frameworks continue to tighten globally, the implementation of validated NTMs using chromatographic and mass spectrometry platforms will be essential for ensuring food safety, quality, and consumer protection.

Genomics and Next-Generation Sequencing (NGS) for Multi-Species Screening

In the face of increasing global challenges regarding food authenticity and fraud, non-targeted methods (NTMs) have emerged as powerful tools for comprehensive food analysis [22]. Within this framework, genomics and Next-Generation Sequencing (NGS) provide unparalleled capabilities for multi-species screening, enabling the simultaneous identification of numerous plant, fish, and meat species within complex food matrices without prior knowledge of their composition [35]. The stability of DNA makes genomics particularly suitable for analyzing deeply processed food products and detecting contaminants to ensure food safety [22]. The adoption of NGS-based untargeted approaches revolutionizes food authenticity testing by shifting the fundamental question from "Is species X present?" to "Which species are present?" in a single test [35]. This application note details the experimental protocols, bioinformatics workflows, and validation frameworks essential for implementing NGS-based multi-species screening within food authenticity research, providing researchers with practical guidance for deploying these powerful analytical tools.

Principles of NGS-Based Multi-Species Screening

Next-Generation Sequencing enables multi-species screening through DNA metabarcoding, which involves the amplification and sequencing of short, conserved DNA regions that contain variable sequences sufficient to discriminate between species [36]. This approach leverages the extensive reference databases of nucleotide sequences available in public repositories [36]. Unlike targeted methods such as real-time PCR, which require prior knowledge of potential adulterants and are limited in the number of targets that can be simultaneously detected, NGS-based screening provides an untargeted, comprehensive analysis of all DNA-containing ingredients in a sample [35].

Two primary NGS approaches are used in food authenticity testing: metagenomics, which sequences all DNA in a sample, and metabarcoding, which amplifies and sequences specific conserved DNA fragments [36]. Metabarcoding offers several advantages for routine authentication, including reduced costs, extensive reference databases, and simpler analysis workflows. Compared with metagenomics, however, it may offer lower taxonomic resolution and is susceptible to PCR artifacts [36].

The core principle underlying NGS-based species identification is that each DNA-containing ingredient produces a unique DNA sequence that can be compared against curated databases, generating a complete list of all species present in a sample [35]. This capability is particularly valuable for detecting unexpected adulterants or substitutions that might not be identified through targeted approaches.

Experimental Design and Workflow

Sample Collection and Storage

Proper sample collection and storage are critical steps in NGS analysis to ensure sample representativeness and integrity [37]. The quantity of samples and repetitions significantly impacts data accuracy and reproducibility, requiring researchers to balance statistical power with practical constraints of processing capacity [37].

Sample Selection: Collect representative samples from the food matrix of interest. For heterogeneous products, ensure adequate sampling across different batches or production dates [37].
Storage Conditions: Snap freezing, rapid drying, or chemical preservatives can be applied to prevent microbial growth or nucleic acid degradation [37]. Most food samples should be stored at 4°C for short-term preservation or at -20°C to -80°C for long-term storage [37].
Sample-Specific Considerations: Tailor sampling techniques to the food matrix. For raw meat surfaces, pooled swabs may improve microbial recovery, while for fermented or cured products, direct sampling may be more appropriate [37].

Nucleic Acid Extraction

Effective nucleic acid extraction from diverse food matrices enables subsequent detection and analysis of genetic material [37]. The extraction process consists of three fundamental steps: lysis, purification, and nucleic acid recovery.

Cell Lysis: Employ chemical, enzymatic, mechanical, or combined methods based on sample complexity. For challenging matrices like romaine lettuce, enzymatic lysis combined with mechanical disruption has proven effective [37].
Purification: Separate DNA from cellular debris, proteins, and inhibitors using liquid-liquid extraction (LLE) or solid-phase extraction (SPE) methods [37]. SPE kits utilizing silica-based filters reduce reliance on organic solvents and enhance efficiency [37].
Protocol Selection: Choose extraction methods based on food matrix characteristics:
- High-fat or polyphenol-rich matrices: Require optimized protocols with additional purification steps [37].
- Fermented products: CTAB and chloroform LLE method has been successfully used for DNA isolation [37].
- Raw meat products: Commercial SPE kits provide efficient DNA recovery [37].

Library Preparation and Sequencing

Library preparation generates DNA fragments of specific size ranges suitable for sequencing. The two major approaches for targeted NGS analysis are hybrid capture-based and amplification-based methods [38].

Amplification-Based Methods (Amplicon Sequencing): Utilize PCR primers to amplify specific target regions [38]. This approach is commonly used in metabarcoding with universal primers targeting conserved genomic regions [36].
Hybrid Capture Methods: Use sequence-specific biotinylated oligonucleotide probes that hybridize to target regions [38]. This approach can tolerate several mismatches in probe binding sites without interfering with hybridization [38].
Barcode Selection: Choose appropriate DNA barcodes based on predicted performance characteristics. Short barcodes are preferable for degraded samples, while longer barcodes may provide better discrimination between closely related species [36].

Table 1: Comparison of NGS Platforms for Food Authentication

Platform Type	Examples	Technology	Read Length	Applications in Food Science
Short-Read	Illumina (MiSeq, HiSeq, NovaSeq), Ion Torrent (PGM, GeneStudio S5)	Sequencing by synthesis (SBS) with reversible terminators (Illumina) or pH detection (Ion Torrent)	Short (75-400 bp)	Metabarcoding, targeted gene panels, metagenomics [37]
Long-Read	Pacific Biosciences (PacBio), Oxford Nanopore	Single-Molecule Real-Time (SMRT) sequencing (PacBio), nanopore sensing (Oxford Nanopore)	Long (>10 kb)	Complete genome assembly, structural variant detection [37]

Bioinformatics Analysis Pipeline

The bioinformatics pipeline is critical for processing raw NGS data into meaningful taxonomic assignments [39]. Proper validation of this component is essential for accurate species identification [39].

Read Preprocessing and Quality Control

Primer Trimming: Remove primer sequences from 5' and 3' ends of reads using tools like cutadapt with an error rate of 0.1 [36].
Quality Filtering: Discard reads shorter than 50 bp and trim low-quality bases using tools like fastp with a window of 4 bp and minimum quality of 25 [36].
Read Merging: Merge paired-end reads using VSearch or DADA2, applying quality filters to retain pseudo-reads between 70-100 bp with a maximum of 2 expected errors [36].
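
The "expected errors" filter applied to merged reads has a simple definition: each Phred score Q corresponds to an error probability of 10^(-Q/10), and the expected number of errors in a read is the sum of those probabilities. The sketch below applies the length window and error ceiling quoted above; the function names are hypothetical and this is not the code of the cited tools.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, where P(error) = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)

def passes_merge_filter(sequence, phred_scores, min_len=70, max_len=100, max_ee=2.0):
    """Keep a merged read only if its length and expected errors are within limits."""
    if len(sequence) != len(phred_scores):
        raise ValueError("sequence and quality lengths differ")
    in_window = min_len <= len(sequence) <= max_len
    return in_window and expected_errors(phred_scores) <= max_ee

# Hypothetical merged reads
good = passes_merge_filter("A" * 80, [30] * 80)    # 80 bases at Q30: about 0.08 expected errors
noisy = passes_merge_filter("A" * 80, [10] * 80)   # 80 bases at Q10: about 8 expected errors
short = passes_merge_filter("A" * 60, [30] * 60)   # below the 70 bp minimum
```

Filtering on expected errors rather than mean quality penalizes reads with a few very poor bases, which an average can hide.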

Sequence Clustering and Denoising

De Novo Identity Clustering: Dereplicate sequences and cluster with VSearch using identity levels between 0.97 and 1.0, discarding clusters with fewer than 2 reads [36].
Denoising: Use DADA2 to correct reads using an error model, merge corrected reads while allowing for 1 mismatch, and remove chimeras [36].

Taxonomic Assignment

Database Selection: Create a masked database by filtering sequences corresponding to relevant taxa (e.g., Vertebrates) and removing extinct taxa [36].
Sequence Matching: Search OTUs or ASVs against the reference database using BLAST+ with megablast searches, applying filters for e-value (1.0×10⁻¹⁰), identity (97%), and coverage (100%) [36].
Consensus Taxonomy: Apply a bitscore filter of 4, discarding matches with bitscore differences to the best match above this threshold. Determine consensus taxon using a majority vote with a minimum threshold of 0.51 [36].
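
One way to read the consensus step is: discard hits whose bitscore trails the best hit by more than 4, then walk from the most specific rank toward the most general and accept the first taxon supported by at least 51% of the remaining hits. The sketch below implements that reading. The lineage-tuple representation and the rank walk are assumptions; the source gives only the two thresholds, and the cited pipeline may differ in detail.

```python
from collections import Counter

def consensus_taxon(hits, max_bitscore_diff=4, min_consensus=0.51):
    """Return the most specific lineage supported by a majority of near-best hits.

    hits: list of (bitscore, lineage), lineage being a tuple ordered from
    general to specific, e.g. ("Bovidae", "Bos", "Bos taurus").
    """
    if not hits:
        return None
    best = max(score for score, _ in hits)
    kept = [lineage for score, lineage in hits if best - score <= max_bitscore_diff]
    depth = min(len(lineage) for lineage in kept)
    for level in range(depth, 0, -1):          # species first, then genus, ...
        counts = Counter(lineage[:level] for lineage in kept)
        lineage, n = counts.most_common(1)[0]
        if n / len(kept) >= min_consensus:
            return lineage
    return None

# Hypothetical BLAST hits for one sequence
hits = [
    (200, ("Bovidae", "Bos", "Bos taurus")),
    (199, ("Bovidae", "Bos", "Bos taurus")),
    (198, ("Bovidae", "Bos", "Bos indicus")),
    (150, ("Suidae", "Sus", "Sus scrofa")),    # dropped by the bitscore filter
]
species_call = consensus_taxon(hits)

# A tie at species level falls back to the genus
tie = [(200, ("Bovidae", "Bos", "Bos taurus")), (200, ("Bovidae", "Bos", "Bos indicus"))]
genus_call = consensus_taxon(tie)
```

Falling back to a higher rank when species-level support is split avoids reporting a species that the data cannot actually distinguish from a close relative.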

Validation Framework for NGS Methods

Validation of NGS methods for non-targeted food authenticity research requires careful consideration of both wet and dry laboratory components [38] [39]. The Association for Molecular Pathology (AMP) and the College of American Pathologists have established recommendations that can be adapted for food authentication applications [38] [39].

Experimental Validation Design

Sample Selection: Use a minimum of 59 samples to establish 95% confidence that the passing rate is at least 95% (commonly summarized as "95/95") [40]. Include samples representing expected specimen types and problematic matrices that may be encountered in routine testing [40].
Reference Materials: Utilize well-characterized reference samples with known compositions. Include at least two samples for which a consensus sequence across all regions of interest has been previously established [40].
Limit of Detection (LOD): Determine the lowest species fraction that can be reliably detected for 95% of samples. For meat authentication, a concentration threshold of 0.1% has been applied in validation studies [36].
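
The figure of 59 samples follows from a short calculation, assuming every sample in the validation set passes. If the true passing rate were only 95%, the probability of observing n consecutive passes would be 0.95^n; requiring this to fall to 5% or less gives n ≥ ln(0.05) / ln(0.95) ≈ 58.4, so 59 samples. The function name below is illustrative.

```python
import math

def min_samples_zero_failures(reliability=0.95, confidence=0.95):
    """Smallest n with reliability ** n <= 1 - confidence (all n samples must pass)."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

n_required = min_samples_zero_failures()   # the "95/95" case
```

If any sample fails, this zero-failure calculation no longer applies and a larger set is needed to support the same claim.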

Performance Metrics

Positive Percentage Agreement (PPA): Calculate for each variant type to determine the percentage of known variants detected by the test [40].
Positive Predictive Value (PPV): Determine the percentage of called variants that are true positives [40].
Precision and Recall: Calculate using benchmark modules that compare observed compositions to expected values at appropriate taxonomic levels [36].
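
For species screening, these metrics can be computed by comparing the set of species a test reports with the set known to be present in a reference sample. PPA is then equivalent to recall and PPV to precision. The sketch below assumes this presence/absence framing (the cited guidance is phrased in terms of variants), and the species lists are hypothetical.

```python
def agreement_metrics(expected, observed):
    """Return (PPA, PPV) for reported species versus the known composition."""
    expected, observed = set(expected), set(observed)
    tp = len(expected & observed)    # present and reported
    fn = len(expected - observed)    # present but missed
    fp = len(observed - expected)    # reported but not present
    ppa = tp / (tp + fn) if (tp + fn) else float("nan")   # recall
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")   # precision
    return ppa, ppv

# Hypothetical reference mixture and test result
expected = ["Bos taurus", "Sus scrofa", "Gallus gallus", "Ovis aries"]
observed = ["Bos taurus", "Sus scrofa", "Gallus gallus", "Equus caballus"]
ppa, ppv = agreement_metrics(expected, observed)
```

Here one species is missed and one is reported in error, so both metrics are 75%, well short of the ≥95% acceptance criterion.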

Table 2: Validation Parameters for NGS-Based Multi-Species Screening

Validation Parameter	Recommended Approach	Acceptance Criteria
Accuracy	PPA and PPV for each variant type [40]	≥95% for each variant type [40]
Precision/Repeatability	Within-run duplicates without anticipated variability sources [40]	Quantified variability across NGS workflow steps [40]
Reproducibility	Testing across multiple operators, instruments, and reagent lots [40]	Consistent results across variability sources [40]
Limit of Detection	Testing dilution series of known compositions [36]	Reliable detection at 0.1% concentration [36]
Analytical Specificity	Testing against closely related species and potential interferents [40]	No cross-reactivity with non-target species [40]

Bioinformatics Pipeline Validation

The bioinformatics pipeline should be validated using method-based paradigms with well-characterized samples that reflect the variant population and allele frequencies anticipated in routine testing [40].

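Reference Materials: Use well-characterized reference DNA or synthetic sequences that reflect the compositions anticipated in routine testing. The source guidance, written for clinical assays, specifies cell lines reflecting the variant population anticipated in clinical service [40].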
In Silico Validation: If physical reference samples are unavailable, validate using sequence files generated from well-characterized samples [40].
Parameter Optimization: Systematically benchmark algorithms and parameter sets using real samples spanning multiple taxa [36].

Research Reagent Solutions

Table 3: Essential Research Reagents for NGS-Based Multi-Species Screening

Reagent Category	Specific Examples	Function and Application
DNA Extraction Kits	Silica-based SPE kits, CTAB-chloroform method [37]	Isolation of high-quality DNA from complex food matrices; kit selection depends on sample composition (e.g., high-fat, polyphenol-rich) [37]
Library Preparation Kits	SGS All Species ID Food DNA Analyser Kits [35]	Preparation of sequencing libraries specifically optimized for multi-species identification in food matrices [35]
Sequencing Consumables	Ion Torrent sequencing reagents, Illumina sequencing reagents [35] [37]	Platform-specific reagents for template preparation and sequencing; choice depends on platform (Ion Torrent, Illumina) [37]
Quality Control Reagents	Qubit quantification reagents, gel electrophoresis supplies [35]	Assessment of nucleic acid concentration, library quality, and fragment size distribution prior to sequencing [35]
PCR Reagents	High-fidelity DNA polymerases, universal primer sets [36]	Amplification of target barcode regions; primer selection depends on target taxa and barcode region [36]
Bioinformatics Tools	FooDMe, VSearch, DADA2, BLAST+, cutadapt [36]	Processing, analyzing, and interpreting sequencing data; tool selection depends on analysis approach (clustering vs. denoising) [36]

Applications in Food Authenticity Research

NGS-based multi-species screening has been successfully applied across diverse food authenticity scenarios:

Meat Product Authentication: Identification of species substitution in processed meat products where morphological identification is impossible [22] [36]. DNA analysis allows verification of label claims and detection of undeclared species [22].
Seafood Traceability: Species identification irrespective of processing and storage conditions, promoting sustainable fisheries and stabilizing marine ecosystems [22].
High-Value Oil Authentication: Determination of olive oil composition and geographic origin through DNA analysis, detecting adulteration with cheaper vegetable oils [22].
Dairy and Fermented Product Analysis: Characterization of microbial communities in fermented foods, enabling quality control and detection of spoilage organisms [37] [41].

Genomics and Next-Generation Sequencing provide powerful capabilities for multi-species screening in food authenticity research. The untargeted nature of NGS approaches enables comprehensive detection of unexpected adulterants and species substitutions that may be missed by targeted methods. Successful implementation requires careful attention to sample processing, sequencing platform selection, bioinformatics analysis, and rigorous validation. As reference databases expand and sequencing costs decrease, NGS-based methods are poised to become increasingly accessible for routine food authentication, providing researchers and regulatory bodies with robust tools to combat food fraud and ensure product integrity throughout the global food supply chain.

Isotope Ratio Mass Spectrometry (IRMS) for Geographic Origin Verification

Within the framework of non-targeted methods (NTM) for food authenticity research, verifying a product's geographic origin remains a significant challenge. Isotope Ratio Mass Spectrometry (IRMS) has emerged as a powerful analytical technique that addresses this challenge by measuring the natural abundance ratios of stable isotopes in food samples. These ratios serve as a unique chemical fingerprint, providing an intrinsic link to the geographic and environmental conditions of a product's origin [42] [43]. The technique leverages the principle that the isotopic composition of light elements—such as Carbon (C), Nitrogen (N), Hydrogen (H), and Oxygen (O)—in a plant or animal tissue reflects the environmental conditions of its growth location, including climate, water source, soil composition, and agricultural practices [44] [45]. This application note details the protocols and data interpretation frameworks for using IRMS in geographic origin verification, contextualized within non-targeted food authenticity research.

Fundamental Principles and Application Data

Stable isotope ratios are expressed in delta (δ) notation relative to international standards, in parts per thousand (‰). The variations in these ratios are incorporated into organisms through diet, water, and exchange with the local environment, creating a measurable geographic signature [46].
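As a minimal sketch of the delta notation described above, the following Python snippet computes a δ value from a sample ratio and a standard ratio using δ = (R_sample / R_standard - 1) × 1000. The function name and both numeric ratios are illustrative assumptions, not values from the cited studies; in practice the standard's ratio comes from the relevant international scale.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Return the delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    if r_standard <= 0:
        raise ValueError("r_standard must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


# Illustrative numbers only (assumptions, not data from the cited studies).
R_STANDARD = 0.011180  # assumed heavy/light (13C/12C) ratio of the standard
r_sample = 0.010900    # hypothetical measured 13C/12C ratio of a sample

d13c = delta_permil(r_sample, R_STANDARD)
print(f"delta13C = {d13c:.2f} per mil")
```

A negative result means the sample is depleted in the heavy isotope relative to the standard, which is the usual situation for δ¹³C in plant-derived material.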

Recent research demonstrates the power of multi-element isotope analysis. The table below summarizes illustrative data from a study on Volvariella volvacea (straw mushroom), showcasing how isotope values vary across regions in China [45].

Table 1: Stable Isotope Ratios and Classification Accuracy for Volvariella volvacea from Different Geographic Origins

Geographic Origin	δ¹³C (‰)	δ¹⁵N (‰)	δ²H (‰)	δ¹⁸O (‰)	PLS-DA Classification Accuracy
Fujian, Hubei, Jiangxi, Zhejiang (FHJZ Group)	Significantly higher	Significantly higher	-	-	Required further improvement
Guangdong, Jiangsu, Shanghai (GJS Group)	-	-	Significantly higher (Shanghai)	Significantly higher (Shanghai)	> 80% (within-group)
Overall Model (FHJZ vs. GJS)	-	-	-	-	93.60%

The data show that δ¹³C and δ¹⁵N values were significantly higher in the FHJZ group, while samples from Shanghai in the GJS group had significantly higher δ²H and δ¹⁸O values [45]. These differences enabled a Partial Least Squares Discriminant Analysis (PLS-DA) model to classify the samples into the two broad geographic groups with high accuracy, validating the feasibility of the technique [45].

The isotopic composition is influenced by several key factors:

δ¹³C: Primarily indicates the type of photosynthetic cycle (C3 vs. C4) of the plant's carbon source. For example, fungi grown on substrates from C3 plants (e.g., straw, waste cotton) have lower δ¹³C values, while those from C4 plants (e.g., maize, sugarcane) have higher values [45].
δ¹⁵N: Reflects the nitrogen source. Animal manure results in higher δ¹⁵N values due to isotopic fractionation, while plant-based or chemical fertilizers result in lower values [45] [46].
δ²H and δ¹⁸O: Strongly correlated to local precipitation and water sources. Water from low-latitude, low-altitude regions near the ocean generally leads to higher δ²H and δ¹⁸O values in organisms [45].

Experimental Protocols

Standardized Method for EA-IRMS Analysis

The European standard BS EN 18054:2025 provides a definitive protocol for determining carbon and/or nitrogen isotope ratios in food using Elemental Analyser-Isotope Ratio Mass Spectrometry (EA-IRMS) [47]. The following workflow details the core steps.

Figure 1: EA-IRMS analytical workflow for determining C and N isotope ratios.

1. Sample Preparation:

Homogenization: The food sample is freeze-dried and ground into a fine, homogeneous powder using a ball mill or similar device [45].
Sieving: The powder is passed through a sieve (e.g., 0.15 mm) to ensure consistent particle size [45].
Storage: Prepared samples are stored in a desiccator to prevent absorption of atmospheric water, which could alter the hydrogen and oxygen isotope signatures [45].

2. Combustion and Gas Conversion:

A precisely weighed aliquot of the sample (typically a few milligrams) is loaded into a tin or silver capsule and introduced into the Elemental Analyser via an auto-sampler.
Within the EA, the sample undergoes complete combustion at approximately 1000 °C in the presence of oxygen [47] [42].
This process converts:
- Carbon in the sample to CO₂.
- Nitrogen to N₂.
- Other elements (e.g., hydrogen, oxygen, sulfur) to their respective gases, which may be removed or routed to different detection systems depending on the analytical focus [42].

3. Gas Purification and Separation:

The resultant gas mixture is passed through a series of specific chemical traps (e.g., to remove water) and a gas chromatography (GC) column [42].
The GC column separates the pure N₂ and CO₂ gases before they are introduced into the mass spectrometer in a continuous helium flow.

4. Isotope Ratio Mass Spectrometry (IRMS):

The purified gases (N₂, CO₂) enter the ion source of the IRMS, where they are ionized by electron impact.
The resulting ions are accelerated and separated by a magnetic field based on their mass-to-charge ratio (m/z). Key measured ions include:
- For CO₂: m/z 44 (¹²C¹⁶O¹⁶O), 45 (¹³C¹⁶O¹⁶O or ¹²C¹⁶O¹⁷O), and 46 (¹²C¹⁶O¹⁸O).
- For N₂: m/z 28 (¹⁴N¹⁴N), 29 (¹⁴N¹⁵N).
The ion currents for these different masses are simultaneously measured by dedicated Faraday cups, and the ratios (e.g., ¹³C/¹²C, ¹⁵N/¹⁴N) are calculated with high precision [47] [42].

Advanced Consideration: Compound-Specific Isotope Analysis

For greater specificity, Gas Chromatography-Combustion-IRMS (GC-C-IRMS) can be employed. This technique separates individual compounds from a complex mixture (e.g., specific fatty acids in oils or amino acids in meat) via GC before combusting each compound to CO₂ and N₂ for isotopic analysis. This "compound-specific" approach can provide more refined geographic information and is noted as a promising future direction for tracing herb origins [44].

The Scientist's Toolkit: Essential Research Reagents and Materials

The following table lists key consumables and reagents required for IRMS analysis as per the cited protocols.

Table 2: Essential Research Reagent Solutions for IRMS Analysis

Item	Function / Application	Key Specification / Note
Tin / Silver Capsules	Encapsulation of solid samples for introduction into the Elemental Analyser.	High purity to prevent background contamination.
International Isotopic Standards	Calibration of the IRMS instrument and normalization of sample data to the VPDB (C), AIR (N), VSMOW (H, O) scales.	Certified reference materials (e.g., IAEA standards).
High-Purity Gases	Carrier gas (Helium), oxidant (Oxygen), reference gas (CO₂, N₂).	Ultra-high purity (≥99.999%) to ensure analytical accuracy.
Activated Charcoal	Cleaning of water samples to remove organic contaminants that can interfere with isotope ratio infrared spectroscopy (IRIS) and IRMS analysis.	Critical for accurate δ²H and δ¹⁸O analysis of plant/soil waters [48].
Deionized / H₂¹⁸O Water	Solvent for extraction and isotopic spike experiments.	For procedures requiring hydrolysis in controlled isotopic media [49].
Chemical Traps	Removal of unwanted combustion products (e.g., water traps, halogen traps).	Ensures only the target gases (N₂, CO₂) enter the IRMS.

Data Analysis and Integration into Non-Targeted Frameworks

The raw isotope ratio data must be processed with robust statistical models to be effective for origin verification.

1. Data Pre-processing: Isotope ratios are corrected for instrument drift and normalized to international scales using certified reference materials analyzed in the same sequence.

2. Statistical Modeling and Pattern Recognition:

Exploratory Analysis: Principal Component Analysis (PCA) is often used as an initial unsupervised method to visualize natural clustering of samples based on their multi-element isotope profiles [45]. This can reveal broad geographic groupings, as seen in the separation of the FHJZ and GJS mushroom groups.
Classification Modeling: Supervised methods like Partial Least Squares-Discriminant Analysis (PLS-DA) are then applied to build predictive models for geographic origin. The high (93.6%) accuracy achieved in the mushroom study demonstrates the power of this approach [45].
Data Fusion: For the highest discriminative power, isotope data can be combined with other analytical data, such as elemental profiles (from ICP-MS), genetic markers, or spectroscopic data, within a multivariate model [42] [45]. This multi-analyte strategy is a cornerstone of modern non-targeted authenticity research.
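The pre-processing and classification steps above can be sketched in a few lines of Python with NumPy. This is a simplified illustration under stated assumptions, not the procedure of the cited studies: normalization is shown as a two-point linear mapping using two reference materials (drift correction is omitted), PLS-DA is reduced to a single latent variable, and all numbers, including the reference material values and the synthetic four-isotope data, are hypothetical.

```python
import numpy as np


def two_point_normalize(measured, rm_measured, rm_accepted):
    """Map raw delta values onto the reporting scale using two reference materials."""
    (m1, m2), (a1, a2) = rm_measured, rm_accepted
    slope = (a2 - a1) / (m2 - m1)
    return a1 + slope * (np.asarray(measured, dtype=float) - m1)


def fit_plsda_one_component(X, y):
    """Fit a PLS-DA model with one latent variable; y is coded 0/1."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                 # autoscale each isotope variable
    y_mean = y.mean()
    yc = y - y_mean
    w = Xs.T @ yc                         # weight vector: covariance with class
    w /= np.linalg.norm(w)
    t = Xs @ w                            # sample scores on the latent variable
    b = (t @ yc) / (t @ t)                # regression of class code on scores
    return {"mean": mean, "std": std, "w": w, "b": b, "y_mean": y_mean}


def predict_plsda(model, X):
    """Predict class 0/1 by thresholding the regressed class code at 0.5."""
    t = ((X - model["mean"]) / model["std"]) @ model["w"]
    return (model["y_mean"] + model["b"] * t >= 0.5).astype(int)


# Hypothetical reference material values (raw vs. accepted), per mil.
rm_measured = (-10.10, -29.40)
rm_accepted = (-10.45, -29.80)
normalized = two_point_normalize([-10.10, -29.40, -25.00], rm_measured, rm_accepted)

# Synthetic data: columns are d13C, d15N, d2H, d18O; two origin groups.
rng = np.random.default_rng(0)
n = 30
group_a = rng.normal([-24.0, 6.0, -60.0, 22.0], 0.5, size=(n, 4))
group_b = rng.normal([-27.0, 3.0, -60.0, 22.0], 0.5, size=(n, 4))
X = np.vstack([group_a, group_b])
y = np.array([1] * n + [0] * n, dtype=float)

# Simple hold-out split: even rows for training, odd rows for testing.
idx = np.arange(len(y))
train, test = idx % 2 == 0, idx % 2 == 1
model = fit_plsda_one_component(X[train], y[train])
accuracy = float((predict_plsda(model, X[test]) == y[test]).mean())
print("normalized:", np.round(normalized, 2), "hold-out accuracy:", accuracy)
```

Real origin models usually need more latent variables and a cross-validated choice of their number; the single-component form is used here only to keep the arithmetic visible.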

The logical progression from data acquisition to verification is summarized below.

Figure 2: Data analysis workflow for geographic origin verification.

Food fraud, driven by economic motives, is a pervasive global challenge that compromises food safety, consumer trust, and market stability [22]. Incidents such as species substitution in meat and seafood, mislabeling of high-value olive oil, and adulteration of honey cost the industry an estimated $30–40 billion annually [50]. Verifying food authenticity now extends beyond simple adulteration detection to encompass comprehensive quality evaluation, label compliance verification, and traceability determination from field to table [22] [51].

Foodomics has emerged as a powerful, interdisciplinary framework to address these challenges. This approach integrates advanced omics technologies—including genomics, proteomics, and metabolomics—with bioinformatics and chemometrics to provide a comprehensive molecular characterization of food products [52] [53]. Unlike traditional targeted methods that focus on predefined analytes, non-targeted methods (NTMs) within foodomics exploit the entire compositional profile of a food sample, generating distinctive chemical "fingerprints" that can be used for authentication purposes [15] [7]. This application note details specific foodomics case studies and protocols, contextualized within the broader framework of validating NTMs for food authenticity research.

Foodomics Case Studies & Experimental Data

The following case studies illustrate the practical application of foodomics and NTMs across various food matrices susceptible to fraud. The summarized data in the table below highlights the analytical techniques, key findings, and methodological considerations for each application.

Table 1: Summary of Foodomics Case Studies for Food Authentication

Food Matrix	Fraud Type	Foodomics Approach	Analytical Technique	Key Experimental Findings/Performance
Spelt vs. Wheat	Mislabeling, substitution	Proteomics (Non-targeted)	LC-HRMS, Convolutional Neural Networks (CNN)	CNN models automatically learned patterns to distinguish spelt from wheat; method validated on bread and flour. [8]
Meat Products	Species substitution, false origin	Genomics	PCR, DNA Barcoding	DNA from natural tracers (e.g., wheat/rice in lard) verified ham batch and production year. [22]
Seafood	Species substitution	Genomics	DNA extraction, PCR amplification	Enabled species identification irrespective of processing and storage conditions; promoted sustainable fisheries. [22]
Olive Oil	Adulteration, mislabeling of origin	Genomics	ddPCR, DNA Fingerprinting	ddPCR overcame challenges of degraded DNA and PCR inhibitors in olive oil for origin/variety assessment. [22]
Dairy Products	Pathogen risk, authenticity	Proteomics, Genomics	MS-based approaches	Quantitative assessment of pathogenic microorganisms to ensure safety and authenticity. [22]
Herbal Medicines	Adulteration, substitution	Non-targeted Methods	MS, NMR, Chemometrics	A general workflow was shown for fraud detection, highlighting the versatility of NTMs. [54]

Detailed Experimental Protocols

Protocol: Non-Targeted LC-HRMS Method for Distinguishing Spelt from Wheat

This protocol describes an NTM for distinguishing spelt and wheat, a common fraud area in grain products [8].

Research Reagent Solutions

Table 2: Essential Research Reagents and Materials

Item	Function/Explanation
Spelt & Wheat Cultivars	Authentic, verified reference materials crucial for model training and validation.
Liquid Chromatography (LC) Solvents	High-purity mobile phases (e.g., water, acetonitrile with modifiers) for peptide separation.
Trypsin	Protease enzyme used to digest proteins into peptides for mass spectrometric analysis.
Calibration Standards	Standard compounds for mass accuracy calibration of the high-resolution mass spectrometer.

Step-by-Step Procedure

Sample Preparation:
- Grind grain samples to a fine, homogeneous powder.
- Extract proteins using an appropriate buffer (e.g., urea-based or SDS-containing buffer).
- Digest the extracted proteins into peptides using trypsin overnight at 37°C.
- Desalt the resulting peptide mixture using C18 solid-phase extraction tips.
LC-HRMS Analysis (Fingerprint Acquisition):
- Inject the desalted peptide samples onto the LC-HRMS system.
- Separation: Use a reversed-phase C18 column with a gradient of water and acetonitrile (both with 0.1% formic acid) over 20-60 minutes.
- Mass Spectrometry: Operate the HRMS (e.g., Q-TOF) in data-dependent acquisition (DDA) or data-independent acquisition (DIA, e.g., SWATH) mode with positive electrospray ionization (ESI+).
- Acquire MS1 spectra (and MS2 fragments if applicable) across a mass range of 100-1700 m/z.
Data Processing and Model Building (Dry Lab):
- Process raw spectral data to create a data matrix of features (retention time, m/z, intensity).
- Model Training: Train a Convolutional Neural Network (CNN) model using a calibration dataset comprising duplicate measurements of multiple, verified spelt and wheat cultivars.
- Use a nested cross-validation (NCV) approach to optimize model parameters and prevent overfitting.
- Validation: Test the model's performance using an external validation set that includes processed goods (e.g., spelt bread), atypical cultivars, and artificially mixed samples.
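The nested cross-validation step in the procedure above can be sketched as follows. This is an illustrative index-splitting scheme in plain Python, not the implementation of the cited study: the function names, fold counts, and the choice to keep both duplicate measurements of a cultivar in the same fold are assumptions, and the folds are not stratified by species.

```python
import random


def group_folds(groups, n_folds, seed):
    """Split the unique group labels into n_folds disjoint lists."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    return [unique[i::n_folds] for i in range(n_folds)]


def nested_cv_splits(groups, outer_k=4, inner_k=3, seed=0):
    """Yield (train_idx, test_idx, inner_splits), keeping each cultivar intact."""
    for o, test_groups in enumerate(group_folds(groups, outer_k, seed)):
        held_out = set(test_groups)
        test_idx = [i for i, g in enumerate(groups) if g in held_out]
        train_idx = [i for i, g in enumerate(groups) if g not in held_out]
        train_groups = [groups[i] for i in train_idx]
        inner = []
        for val_groups in group_folds(train_groups, inner_k, seed + o + 1):
            val = set(val_groups)
            fit_idx = [i for i in train_idx if groups[i] not in val]
            val_idx = [i for i in train_idx if groups[i] in val]
            inner.append((fit_idx, val_idx))
        yield train_idx, test_idx, inner


# Hypothetical calibration set: duplicate measurements of 11 spelt and 11 wheat cultivars.
groups = [f"{species}_{c}" for species in ("spelt", "wheat")
          for c in range(11) for _ in range(2)]

for train_idx, test_idx, inner in nested_cv_splits(groups):
    # Inner splits tune hyperparameters; the outer test fold is used only once,
    # to estimate the performance of the tuned model.
    print(len(train_idx), len(test_idx), [len(v) for _, v in inner])
```

Splitting by cultivar rather than by individual spectrum avoids the situation where one duplicate of a cultivar is used for training and the other for testing, which would give an optimistic performance estimate.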

Protocol: DNA-Based Authentication of Olive Oil Origin

This protocol uses genomics to verify the geographical and varietal origin of olive oil, which is frequently adulterated [22].

Research Reagent Solutions

Table 3: Essential Research Reagents and Materials for Genomics

Item	Function/Explanation
DNA Extraction Kit	Specifically optimized for oily matrices and designed to remove co-extracted polysaccharides and polyphenols.
ddPCR Supermix	Reaction mix for droplet digital PCR, resistant to inhibitors and allowing absolute quantification.
Species/Variety-Specific Primers & Probes	Designed against conserved DNA regions (e.g., chloroplast DNA) for specific amplification.
Authentic Olive Oil Reference Materials	Oils with verified geographical origin and varietal composition, essential for establishing a reference database.

Step-by-Step Procedure

DNA Extraction:
- Extract genomic DNA from 10-50 mL of olive oil using a commercial kit designed for difficult plant matrices or oily samples.
- Include steps to wash away co-extracted PCR inhibitors like polysaccharides and polyphenols.
- Quantify and assess the quality of the extracted DNA using a fluorometer.
Droplet Digital PCR (ddPCR) Assay:
- Prepare the ddPCR reaction mix containing the supermix, primers and fluorescently labeled probes specific to the target olive variety or species, and the extracted DNA template.
- Generate approximately 20,000 nanoliter-sized droplets from the reaction mixture using a droplet generator.
- Perform PCR amplification on the droplet emulsion with a standard thermal cycling protocol.
- Read the plate in a droplet reader to count the number of positive (target DNA present) and negative (target DNA absent) droplets.
Data Analysis:
- Use the reader's software to apply a threshold to distinguish positive and negative droplets.
- The concentration of the target DNA sequence in the original sample (copies/μL) is automatically calculated using Poisson statistics.
- Compare the results to a database of ddPCR results from authentic reference oils of verified geographical origin to verify the origin and detect adulteration.
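The Poisson calculation that the reader software performs can be sketched as follows. The mean number of target copies per droplet is estimated as λ = -ln(1 - p), where p is the fraction of positive droplets, and dividing by the droplet volume gives copies per microliter of reaction. The function name, the droplet counts, and the default droplet volume of 0.85 nL are assumptions for illustration; the actual volume depends on the instrument.

```python
import math


def ddpcr_copies_per_ul(positive: int, total: int, droplet_volume_nl: float = 0.85) -> float:
    """Estimate target copies per microliter of reaction from droplet counts."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (all-positive wells are saturated)")
    fraction_positive = positive / total
    copies_per_droplet = -math.log(1.0 - fraction_positive)  # Poisson correction
    return copies_per_droplet / (droplet_volume_nl * 1e-3)    # nL converted to uL


# Hypothetical well: 2,000 positive droplets out of 18,000 accepted droplets.
concentration = ddpcr_copies_per_ul(positive=2000, total=18000)
print(f"{concentration:.1f} copies per uL of reaction")
```

The result refers to the reaction mix; converting it to copies per microliter of the original DNA extract additionally requires the template volume and any dilution factors, which are not modeled here.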

Workflow and Data Analysis Diagrams

Non-Targeted Foodomics Workflow

The following diagram illustrates the generalized, end-to-end workflow for applying non-targeted foodomics to food authentication, integrating both laboratory and computational steps.

NTM Validation Roadmap

This diagram outlines a high-level roadmap for the NTM validation process, highlighting key considerations to ensure the method is fit-for-purpose [15] [7].

The case studies and protocols presented herein demonstrate the power of foodomics as a unified, data-driven approach for tackling food authenticity challenges across diverse commodity types. The non-targeted methodologies, particularly when combined with advanced machine learning for pattern recognition, offer a robust defense against evolving fraudulent practices by not relying on a pre-defined list of adulterants [8].

Critical to the adoption of these methods in regulatory and commercial settings is rigorous validation. Key challenges that must be addressed during validation include the need for well-characterized reference materials (RMs) with documented provenance [50], the management of heterogeneous data from multiple omics platforms [22], and the requirement for sophisticated bioinformatics expertise [55]. Furthermore, a lack of standardized protocols can lead to significant inter-laboratory variability, underscoring the need for harmonized approaches as highlighted by initiatives like the Periodic Table of Food Initiative (PTFI) [53].

In conclusion, foodomics provides an unparalleled depth of insight into food composition, enabling definitive authentication from field to table. Future advancements will depend on collaborative efforts to standardize methods, develop high-quality reference materials, and integrate foodomics data with emerging technologies like AI and blockchain to enhance predictive modeling and supply chain transparency [55]. For researchers, focusing on the validation roadmap is essential to translate these powerful non-targeted methods from academic research into reliable tools for ensuring food authenticity and safety.

Navigating Challenges: Key Considerations for Robust NTM Implementation

Overcoming Data Heterogeneity and Integration from Multiple Omics Platforms

In non-targeted methods (NTMs) for food authenticity research, the integration of multi-omics data has emerged as a powerful strategy to combat sophisticated food fraud [22]. Unlike targeted analyses that seek a predefined "needle in a haystack," NTMs exploit all constituents of a sample, generating complex, high-dimensional datasets from genomics, proteomics, metabolomics, and other omics platforms [15] [7]. However, the convergence of these disparate data types presents a significant challenge: data heterogeneity. This application note details the sources of this heterogeneity and provides structured protocols and solutions for effective data integration, enabling robust verification of food authenticity.

The Multi-Omics Data Heterogeneity Challenge

Data heterogeneity in multi-omics studies arises from the inherent differences in the nature of various omics technologies and the data they produce. This heterogeneity poses a major bottleneck for researchers aiming to integrate data for a holistic view of food authenticity.

Table 1: Key Sources of Data Heterogeneity in Multi-Omics Studies

Source of Heterogeneity	Description	Impact on Data Integration
Diverse Data Structures	Omics data types exist as heterogeneous matrices with different scales, units, and data types (e.g., discrete counts for genomics, continuous intensities for metabolomics) [56].	Difficulties in aligning datasets and direct comparison.
Varying Noise Profiles & Batch Effects	Each technology has its own technical noise and detection limits and is susceptible to batch effects from different reagent lots or operators [56].	Can obscure biological signals and lead to misleading conclusions.
Different Statistical Distributions	Data from each platform follow distinct statistical distributions, requiring tailored pre-processing and normalization methods [56].	Standardizing data for integrated analysis is complex.
Missing Values	The absence of data points can be platform-specific, occurring where a compound is not detected or is below the detection limit [56].	Reduces the number of common features across omics layers.
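To make the scale problem in the table concrete, the sketch below shows one generic way to put two omics blocks on a comparable footing before concatenating them: each variable is autoscaled, and each block is then divided by the square root of its number of variables so that large blocks do not dominate. This is a common low-level fusion step offered as an assumption-laden illustration, not a method taken from the cited references; the block sizes and synthetic data are hypothetical, and missing values are not handled.

```python
import numpy as np


def scale_block(block):
    """Autoscale each variable, then weight the block to unit total variance."""
    block = np.asarray(block, dtype=float)
    centered = block - block.mean(axis=0)
    std = block.std(axis=0, ddof=1)
    std[std == 0] = 1.0                       # leave constant variables at zero
    return (centered / std) / np.sqrt(block.shape[1])


rng = np.random.default_rng(42)
n_samples = 12
genomics_counts = rng.poisson(50, size=(n_samples, 3))            # discrete counts
metabolite_intensity = rng.lognormal(10, 1, size=(n_samples, 200))  # continuous intensities

fused = np.hstack([scale_block(genomics_counts), scale_block(metabolite_intensity)])
print(fused.shape)
```

After this step both blocks contribute the same total variance to the fused matrix, even though one has 3 variables and the other 200.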

The following diagram illustrates the multi-omics data integration workflow and its associated challenges.

Established Integration Methodologies

To overcome heterogeneity, several computational methods have been developed. The choice of method depends on whether the data is "matched" (profiles from the same sample) or "unmatched" (from different samples), and whether a supervised (using known labels) or unsupervised approach is needed [56].

Table 2: Comparison of Multi-Omics Data Integration Methods

Method	Type	Key Principle	Application in Food Authenticity
MOFA [56]	Unsupervised	A Bayesian framework that infers latent factors capturing principal sources of variation across data types.	Identify underlying patterns (e.g., origin, processing) without prior labels.
DIABLO [56]	Supervised	Uses known phenotypes (e.g., authentic/adulterated) to integrate datasets and select discriminant features.	Build predictive models for food fraud using known authentic samples.
SNF [56]	Unsupervised	Fuses sample-similarity networks from each omics layer into a single network via a non-linear process.	Cluster similar samples to discover unknown adulteration patterns.
MCIA [56]	Unsupervised	A multivariate method that projects multiple datasets into a shared dimensional space to find relationships.	Visualize and interpret how different omics data contribute to food classification.

The application of these methods in a typical NTM workflow for food authenticity is outlined below.

Experimental Protocol: An NTM Case Study for Spelt and Wheat Authentication

This protocol details a specific NTM that integrates liquid chromatography-high-resolution mass spectrometry (LC-HRMS) with convolutional neural networks (CNNs) to distinguish spelt from wheat, a common authenticity issue [18] [57].

Research Reagent Solutions

Table 3: Essential Materials and Reagents for LC-HRMS-based NTM

Item	Function / Specification
LC-HRMS System	For high-resolution spectral fingerprint acquisition. Equipped with a time-of-flight (TOF) mass analyzer [18].
Spelt & Wheat Cultivars	Certified reference materials. Example: Eleven cultivars each of typical spelt and wheat, authenticity verified via marker peptide profiles [18].
Solvents & Mobile Phases	LC-MS grade water, acetonitrile, and methanol for sample preparation and chromatographic separation.
Data Analysis Platform	Python/R environment with libraries for CNNs (e.g., TensorFlow, PyTorch) and chemometrics [18].

Step-by-Step Procedure

Sample Preparation and Measurement:
- Source and authenticate spelt and wheat cultivars. Measure each cultivar in duplicate across different days to account for technical variability [18].
- For processed goods (e.g., bread, flour), prepare samples according to standard recipes. Include artificially mixed spectra (e.g., 90% spelt + 10% wheat) to validate the model's performance on adulterated samples [18].
Wet Lab: LC-HRMS Fingerprinting:
- Use LC-HRMS to obtain highly resolved spectral fingerprints for all samples.
- Critical Parameters: Use data-independent acquisition (DIA) modes like SWATH to acquire comprehensive MS1 and MS2 data across a normalized mass window, ensuring consistent data structure for integration [18].
Dry Lab: Data Pre-processing and Modeling:
- Data Formation: Transform the 2D spectral data (retention time vs. m/z) into a format suitable for CNN processing. Each measurement is treated as an "image" in which the intensity values form the pixels [18].
- Model Training with Nested Cross-Validation (NCV):
  - Split the calibration dataset (44 spectra from 11 spelt + 11 wheat cultivars) into training and testing sets using an NCV approach. This rigorously prevents overfitting and provides a realistic estimate of model performance [18].
  - Train the CNN model to automatically learn patterns and representations that best discriminate spelt from wheat.
- Validation: Use an external validation set containing artificially mixed spectra, processed goods, and atypical cultivars not used in model building. Employ a quantitative metric like the D score to evaluate and compare classification decisions [18].
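Two of the dry-lab steps above, turning a measurement into an image and creating artificially mixed spectra, can be sketched with NumPy as shown below. The grid size, the retention time and m/z ranges, the normalization to unit total intensity, and the synthetic feature lists are all assumptions made for illustration; the cited study may bin, scale, and mix its data differently.

```python
import numpy as np


def features_to_image(rt, mz, intensity, rt_range, mz_range, shape=(64, 64)):
    """Bin (retention time, m/z, intensity) features onto a fixed 2D grid."""
    image, _, _ = np.histogram2d(
        rt, mz, bins=list(shape), range=[rt_range, mz_range], weights=intensity
    )
    total = image.sum()
    return image / total if total > 0 else image   # normalize total intensity to 1


def mix_images(image_a, image_b, fraction_b=0.10):
    """Linear mixture, e.g. 90% spelt + 10% wheat for fraction_b = 0.10."""
    return (1.0 - fraction_b) * image_a + fraction_b * image_b


rng = np.random.default_rng(1)


def synthetic_features(n=500):
    """Hypothetical feature list standing in for one LC-HRMS measurement."""
    return rng.uniform(0, 20, n), rng.uniform(100, 1700, n), rng.lognormal(8, 1, n)


spelt = features_to_image(*synthetic_features(), rt_range=(0, 20), mz_range=(100, 1700))
wheat = features_to_image(*synthetic_features(), rt_range=(0, 20), mz_range=(100, 1700))
mixed = mix_images(spelt, wheat, fraction_b=0.10)
print(spelt.shape, round(float(mixed.sum()), 6))
```

Because every measurement is binned onto the same grid, the resulting arrays have identical shapes and can be stacked directly as CNN input.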

The integration of multi-omics data presents a powerful path forward for non-targeted food authenticity research. While data heterogeneity remains a significant challenge, established methodologies like MOFA, DIABLO, and SNF, coupled with robust experimental protocols that include rigorous validation, provide a clear roadmap for researchers. By effectively integrating these diverse data layers, scientists can uncover robust, non-targeted biomarkers, leading to more accurate detection of food fraud and enhanced consumer protection.

Managing and Curating High-Quality Reference Databases

In non-targeted methods (NTMs) for food authenticity research, the analytical result is not based on pre-defined analytes but is derived from a global fingerprint of the foodstuff, interpreted through statistical models [13] [58]. The model's predictive accuracy is fundamentally constrained by the quality and scope of the reference database used for its calibration and validation [59] [18]. These databases, composed of authentic and adulterated reference materials and their associated analytical fingerprints, supply the empirical differences that models use to discriminate genuine from non-authentic products. Consequently, the meticulous management and curation of these databases are not merely supportive tasks but a foundational prerequisite for generating reliable, comparable, and legally defensible results in food fraud detection. This document outlines detailed protocols and application notes for establishing and maintaining high-quality reference databases, framed within the broader context of validating NTMs.

The Critical Role of Reference Databases in NTM Validation

The validation of an NTM requires demonstrating that the method is fit-for-purpose, meaning it can reliably detect the specific food fraud it was designed to identify. This reliability is intrinsically linked to the reference database.

From Reference Materials to Reference Data

The process begins with the procurement and characterization of Reference Materials (RMs). According to metrological guidelines, an RM is a "material, sufficiently homogeneous and stable with respect to one or more specified properties, which has been established to be fit for its intended use in a measurement process" [59]. For food authenticity, RMs can be divided into two key classes, as summarized in Table 1.

Table 1: Categories of Reference Materials for Food Authenticity Testing

Category	Primary Function	Traceability Requirement	Example
RMs with Metrologically Traceable Property Values	Method validation, calibration, quality control [59]	Metrological traceability of a quantitative property (e.g., concentration)	Certified Reference Material for element concentration
RMs with Traceable Nominal Property Values	Calibrating statistical models; determining natural variation of markers [59]	Material and documentary traceability to a process or origin (e.g., geographical origin, production system)	Authentic olive oil from a specific PDO region with verified documentation

A critical bottleneck, as identified in a NIST workshop, is the limited availability of test materials of known origin and growth conditions for many commodities, which hampers the development of robust data repositories [59]. For NTMs, RMs with traceable nominal properties are essential. These materials are analyzed using the non-targeted platform (e.g., LC-HRMS, NMR, NIRS) to generate the reference fingerprints that constitute the database. The resulting database must capture the natural variability of authentic products while also encompassing known adulterants to train models to recognize both compliance and fraud.

Database Requirements for Different NTM Approaches

The design of the reference database is influenced by the type of non-targeted approach employed:

Profiling: This approach relies on the analysis of a broad range of identified compounds (e.g., fatty acids, elements). The database must contain quantitative data for these specific, known markers across a wide set of authentic and adulterated samples [59].
Fingerprinting: This approach uses the entire instrumental response (e.g., spectral or chromatographic data) without requiring prior identification of all features. The database must store the raw or pre-processed fingerprinting data (e.g., NMR, MS, NIR spectra) from a statistically significant number of authenticated samples [59] [60]. The power of this method lies in using advanced machine learning to find patterns within this complex, high-dimensional data.
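One way to make these database requirements concrete is to store every fingerprint together with the metadata that establishes its traceability. The sketch below is a hypothetical record layout in Python; the class name, field names, and admissibility rule are assumptions chosen for illustration and do not describe any existing database.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceEntry:
    """One reference sample: provenance metadata plus its analytical fingerprint."""
    sample_id: str
    species: str
    cultivar: str
    origin: str
    processing: str
    provenance_document: str                          # e.g. supplier record identifier
    fingerprint: dict = field(default_factory=dict)   # feature id -> intensity


def is_admissible(entry: ReferenceEntry) -> bool:
    """Accept an entry only if its key claims are documented and data are present."""
    required = (entry.species, entry.origin, entry.provenance_document)
    return all(value.strip() for value in required) and len(entry.fingerprint) > 0


# Hypothetical entries; identifiers and values are invented for the example.
documented = ReferenceEntry(
    "S-001", "spelt", "cultivar A", "region X", "whole grain",
    "supplier-record-001", {"mz301.14_rt5.2": 1.2e5},
)
undocumented = ReferenceEntry(
    "S-002", "spelt", "cultivar B", "region X", "flour",
    "", {"mz301.14_rt5.2": 9.8e4},
)
print(is_admissible(documented), is_admissible(undocumented))
```

Rejecting entries without documented provenance at the point of entry keeps unverifiable samples out of model calibration.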

Protocols for Database Curation and Management

The following section provides a detailed workflow and specific protocols for building and maintaining a high-quality reference database.

Workflow for Database Establishment and Use

The following diagram illustrates the comprehensive lifecycle for creating and utilizing a reference database for NTM validation.

Detailed Experimental Protocol: Building a Database for Spelt-Wheat Discrimination

The following protocol is adapted from a study that used LC-HRMS and convolutional neural networks (CNNs) to distinguish spelt from wheat, a common fraud issue [18].

Objective: To create a curated database of LC-HRMS spectra for multiple spelt and wheat cultivars to train and validate a classification model.

Materials and Reagents:

Authentic Samples: Eleven cultivars each of typical spelt and wheat, as defined by marker peptide profiles [18]. Samples should be sourced from reputable suppliers (e.g., Institut für Getreideverarbeitung (IGV) GmbH) with documented provenance.
Processed Goods: Spelt flour and bread samples adulterated with 10% wheat, for testing model robustness.
Solvents: LC-MS grade water, acetonitrile, and methanol.
Equipment: LC system coupled to a high-resolution mass spectrometer (e.g., a Q-TOF or Orbitrap-based instrument).

Procedure:

Sample Preparation:
- Grind grain samples to a fine, homogeneous powder using a mill.
- Weigh 100 mg ± 5 mg of each sample into a microcentrifuge tube.
- Add 1.0 mL of extraction solvent (e.g., 70:30 methanol:water, v/v).
- Vortex vigorously for 60 seconds, then centrifuge at 14,000 × g for 10 minutes.
- Transfer the supernatant to a vial for LC-HRMS analysis.

LC-HRMS Analysis (Fingerprint Acquisition):
- Inject 5 µL of the extract onto the LC column (e.g., C18 column, 2.1 x 100 mm, 1.7 µm).
- Use a binary gradient with mobile phase A (water + 0.1% formic acid) and B (acetonitrile + 0.1% formic acid). A typical gradient might be: 0-2 min, 5% B; 2-15 min, 5-95% B; 15-17 min, 95% B; 17-18 min, 95-5% B; 18-20 min, 5% B.
- The MS should be operated in data-independent acquisition (DIA) mode, such as SWATH, to collect fragmentation data for all detectable analytes. Acquire data in a mass range of m/z 50-1200.
Data Pre-processing and Database Curation:
- Convert raw instrument files to an open data format (e.g., mzML).
- Perform peak picking, alignment, and retention time correction using software like XCMS or MS-DIAL.
- Export a feature table containing m/z, retention time, and intensity for all detected peaks across all samples.
- This feature table, annotated with sample metadata (cultivar, species, origin, processing), forms the core of the reference database.
Model Training and Validation (Dry Lab):
- Split the database into a calibration/training set (e.g., duplicate measurements of 11 spelt and 11 wheat cultivars) and an external validation set (e.g., atypical cultivars, processed goods, artificial mixtures) [18].
- Train a machine learning model, such as a Convolutional Neural Network (CNN), using the calibration set. The input to the CNN can be a 2D image constructed from the spectral data (m/z vs. retention time with intensity as pixel value) [18].
- Validate model performance using the held-out external validation set, reporting metrics like accuracy, sensitivity, and specificity.
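The validation metrics named in the last step can be computed from the confusion counts as in the sketch below. Treating the non-authentic class as the positive class (coded 1) is an assumption made for this example, and the label vectors are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical external validation results: 1 = non-authentic, 0 = authentic.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the validation set is unbalanced, since accuracy alone can hide a model that misses most adulterated samples.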

The Scientist's Toolkit: Essential Reagents and Materials

Table 2: Key Research Reagent Solutions for NTM Database Development

Item	Function/Application	Key Considerations
Certified Reference Materials (CRMs) [59]	Method validation and ensuring metrological traceability for quantitative assays.	Select CRMs with property values relevant to the food matrix (e.g., element concentrations, compound-specific isotope ratios).
Authentic Reference Samples [59] [61]	Provides the foundational fingerprints for authentic material in the database.	Documented provenance is critical. Must have verified claims (geographical origin, organic production, species).
Stable Isotope Standards [62]	For Isotope Ratio Mass Spectrometry (IRMS) to determine geographical origin and adulteration.	Used to calibrate IRMS instruments for measuring δ¹³C, δ¹⁵N, δ¹⁸O, δ²H, and δ³⁴S.
LC-HRMS Solvents & Columns [18]	Generating high-resolution spectral fingerprints for metabolomics/proteomics approaches.	Use LC-MS grade solvents and high-efficiency U/HPLC columns to ensure reproducibility and peak resolution.
DNA Extraction & PCR Kits [62]	For DNA-based speciation and GMO detection, adding a complementary data layer to the database.	Kits should be validated for complex and processed food matrices to ensure DNA quality.
Data Processing Software (e.g., XCMS, MS-DIAL) [18]	Pre-processing raw instrumental data into a structured feature table for the database.	Software must be capable of handling large, multi-batch datasets and performing peak alignment and normalization.

Accessing Existing Authenticity Databases

Leveraging existing public databases can supplement in-house data collection. The Food Authenticity Network maintains a searchable list of known databases for classifying authentic and fraudulent products [61]. These include:

Commodity-Specific Databases: For example, databases for British apples, organic barley, and agave syrup, often based on mass spectrometry or NMR data [61].
Technique-Specific Databases: Such as those for Stable Isotope MS and NMR maintained by commercial laboratories [61].
Global Resources: Signposting services, like the one provided by the Food Industry Hub, direct researchers to global regulatory resources, traceability databases, and fraud alerts from bodies like the FDA, FSA, and RASFF [63].

The integrity of a non-targeted method for food authenticity is a direct reflection of the quality of its underlying reference database. A rigorously curated database, built from well-characterized reference materials with impeccable traceability and analyzed under controlled conditions, provides the empirical foundation without which NTM validation is impossible. As the field moves towards greater harmonization, the development of shared, high-fidelity databases and research-grade test materials will be paramount to improving the comparability of results across laboratories and over time, ultimately strengthening our global defense against food fraud [59].

Addressing the Pitfalls of Result Interpretation and Statistical Overfitting

In the field of food authenticity research, non-targeted methods (NTMs) have emerged as powerful tools for detecting fraud and verifying food origin, quality, and production methods [1]. These methods differ fundamentally from targeted approaches by not focusing on pre-defined analytes but instead capturing a comprehensive fingerprint of the sample [13]. This fingerprint, often acquired through advanced analytical techniques like liquid chromatography-high-resolution mass spectrometry (LC-HRMS) or spectroscopy, is subsequently interpreted using sophisticated chemometric and machine learning algorithms [18] [1].

The very strength of NTMs—their data-rich, comprehensive nature—also presents significant challenges. The high-dimensional data generated, where the number of variables (e.g., spectral features) can vastly exceed the number of samples, creates a fertile ground for statistical overfitting [18]. Overfitting occurs when a model learns not only the underlying patterns in the training data but also the random noise, resulting in a model that performs exceptionally well on the training data but poorly on new, unseen data. This, coupled with the complexity of interpreting results from "black box" machine learning models, poses substantial risks to the validity and reliability of NTM applications [18] [1]. This document outlines the primary pitfalls in interpreting NTM results and provides detailed protocols and strategies to mitigate statistical overfitting, ensuring the development of robust, validated, and fit-for-purpose non-targeted methods.

Key Pitfalls in NTM Result Interpretation and Validation

Statistical Overfitting and Model Robustness

Overfitting is arguably the most critical challenge in NTM development. It can arise from several factors, including insufficient sample size relative to feature number, inappropriate feature selection, and inadequate validation techniques. A model suffering from overfitting will fail in real-world applications, leading to false conclusions about a food product's authenticity [18].

The "Black Box" Problem of Complex Algorithms

Advanced machine learning models, such as Convolutional Neural Networks (CNNs), can automatically learn discriminating patterns from complex data, such as LC-HRMS spectra treated as images [18]. While this avoids manual feature selection, it can make it difficult to understand which specific chemical compounds or spectral features are driving the classification. This lack of interpretability can be a significant hurdle for widespread adoption, especially in regulatory and control settings [18] [1].

Database Dependency and Representativeness

NTMs are inherently reliant on reference databases for model training [1]. The performance and reliability of an NTM are directly contingent upon the quality, size, and representativeness of this database. A database that lacks sufficient genetic or chemical diversity for a given food product, or that does not account for natural variability (e.g., due to geography, season, or agricultural practice), will produce a model with poor generalization capability [1].

Inadequate Validation and Performance Assessment

A common pitfall is the failure to properly validate the model's performance using independent data. Relying solely on internal validation metrics like cross-validation on the calibration set can provide an overly optimistic view of model performance [18] [1]. True assessment requires an external validation set comprising samples that were not involved in any part of the model building process [18].

Table 1: Common Pitfalls in NTM Development and Their Consequences

Pitfall	Description	Potential Consequence
Model Overfitting	Model learns noise from the training data instead of generalizable patterns.	Poor predictive performance on new samples; inaccurate authenticity assessment.
Inadequate Validation	Using only internal/resubstitution validation without external testing.	Overestimation of model accuracy and robustness.
Unrepresentative Database	Reference database lacks diversity or does not cover expected natural variation.	Model fails when applied to real-world samples with legitimate variability.
Ignoring Data Pre-processing	Failure to apply appropriate spectral normalization, alignment, or scaling.	Model artifacts and technical variations are mistaken for biological patterns.

Experimental Protocols to Mitigate Overfitting

Protocol for Nested Cross-Validation (NCV)

Principle: NCV provides a nearly unbiased estimate of the model's true performance by confining feature selection and hyperparameter tuning to an inner cross-validation loop, so that the data used to score each outer fold never influence model building [18].

Procedure:

Split the Calibration Dataset: Divide the entire calibration dataset into k outer folds (e.g., 5 or 10).
Outer Loop: For each of the k iterations:
- Hold out one fold as the validation set.
- Use the remaining k-1 folds as the training set.
- Inner Loop: Perform a second cross-validation (e.g., 5-fold) on this training set to optimize the model's hyperparameters and select features.
- Train a final model on the entire training set using the optimized parameters.
- Evaluate this model on the validation fold held out at the start of the iteration.
Performance Estimation: Aggregate the performance metrics (e.g., accuracy, sensitivity) from all k iterations to obtain a robust estimate of model performance.
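The procedure above can be sketched with scikit-learn by wrapping a grid search (inner loop) inside `cross_val_score` (outer loop). Everything specific here is an assumption for illustration: the synthetic data stand in for a calibration set, and the univariate feature filter, linear SVM, and parameter grid are placeholders rather than the models used in [18].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a calibration set: few samples, many features.
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           random_state=0)

# Feature selection sits inside the pipeline so it is refit on every training split.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", SVC(kernel="linear")),
])
param_grid = {"select__k": [10, 50], "clf__C": [0.1, 1.0]}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: GridSearchCV tunes k and C using only the outer training folds.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")

# Outer loop: each fold refits the whole search and scores it on the held-out fold.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(f"Nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Placing the feature filter inside the pipeline is the key design choice: selecting features on the full dataset before cross-validation would leak information from the held-out folds and inflate the estimate.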

Protocol for External Validation

Principle: To definitively assess the model's performance on completely independent data, simulating real-world application [18].

Procedure:

Dataset Division: Initially split the full dataset into a calibration set (e.g., 70-80%) and an external validation set (e.g., 20-30%). The external set must be kept entirely separate and not used for any step of model building.
Model Building: Develop the final model using the entire calibration set and the optimal parameters identified through NCV.
Final Testing: Apply the final model to the external validation set. The performance metrics obtained from this test are the best indicator of how the model will perform in practice. This set should include challenging samples, such as artificially mixed spectra, processed goods, and atypical cultivars not used in calibration [18].
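The metrics reported from the final test can be computed directly from the predicted and true labels. The sketch below assumes a two-class problem; the function name, the choice of "spelt" as the positive class, and the example labels are all hypothetical.

```python
def external_validation_metrics(y_true, y_pred, positive="spelt"):
    """Accuracy, sensitivity and specificity for a two-class external test set."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical labels for eight external validation samples.
y_true = ["spelt", "spelt", "spelt", "spelt", "wheat", "wheat", "wheat", "wheat"]
y_pred = ["spelt", "spelt", "spelt", "wheat", "wheat", "wheat", "wheat", "spelt"]
metrics = external_validation_metrics(y_true, y_pred)
print(metrics)
```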

Protocol for Data Pre-processing and Augmentation

Principle: To minimize the influence of technical noise and increase the effective sample size for model training.

Procedure:

Spectral Pre-processing:
- Alignment: Correct for retention time shifts in chromatographic data [18].
- Normalization: Apply techniques like total ion current (TIC) normalization to account for overall signal intensity differences [18].
- Scaling: Use methods like Pareto or unit variance scaling to balance the influence of high- and low-intensity features.
Data Augmentation (for spectral data):
- Artificially expand the dataset by creating slightly modified versions of existing spectra.
- This can include adding small random noise, applying minor shifts in retention time or m/z, or creating weighted mixtures of spectra from pure samples to simulate adulteration [18].
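The normalization, scaling, and augmentation steps can be illustrated on a small intensity matrix (rows are samples, columns are features). This is a sketch under stated assumptions: the spectra are random placeholders, the 2% noise level is arbitrary, and all function names are hypothetical.

```python
import numpy as np

def tic_normalize(X):
    """Divide each sample (row) by its summed intensity."""
    return X / X.sum(axis=1, keepdims=True)

def pareto_scale(X):
    """Mean-center each feature and divide by the square root of its standard deviation."""
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / np.sqrt(sd)

def simulate_mixture(authentic, adulterant, fraction):
    """Weighted sum of two spectra, e.g. fraction=0.10 for 10% adulterant."""
    return (1.0 - fraction) * authentic + fraction * adulterant

def add_noise(spectrum, rng, rel_sd=0.02):
    """Multiplicative Gaussian noise; rel_sd is an illustrative value."""
    return spectrum * (1.0 + rng.normal(0.0, rel_sd, size=spectrum.shape))

rng = np.random.default_rng(0)
spelt = rng.uniform(1e3, 1e5, size=200)   # placeholder spectra, not real data
wheat = rng.uniform(1e3, 1e5, size=200)
augmented = np.vstack([
    spelt, wheat,
    simulate_mixture(spelt, wheat, 0.10),
    add_noise(spelt, rng),
])
X = pareto_scale(tic_normalize(augmented))
print(X.shape)
```

In practice, augmentation should be applied only to the training portion after the data have been split, and scaling parameters should be estimated on the training data alone; otherwise augmented copies of a sample can end up on both sides of the split and bias the validation result.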

Diagram 1: Comprehensive NTM development workflow with integrated overfitting mitigation strategies.

The Scientist's Toolkit: Research Reagent Solutions

The successful implementation of an NTM requires a combination of analytical instrumentation, computational tools, and carefully characterized biological materials.

Table 2: Essential Research Reagents and Materials for NTM in Food Authenticity

Item / Solution	Function / Purpose	Example Application
LC-HRMS System	High-resolution fingerprint acquisition; provides precise mass and retention time data for comprehensive metabolite/protein profiling.	Distinguishing spelt and wheat cultivars based on peptide marker profiles [18].
Reference Databases	To build a representative chemical or genetic baseline for model training; defines the classes for authentication.	Authenticating geographical origin of honey, olive oil, or truffles [1] [13].
Typical & Atypical Cultivars	To test model robustness and generalizability beyond the initial training set.	External validation using "untypical spelts" and old wheat cultivars not used in model building [18].
Artificial Mixture Samples	To simulate common adulteration scenarios and validate the model's ability to detect them.	Creating spectra for spelt bread containing 10% wheat flour for validation [18].
Chemometrics/ML Software	For data pre-processing, feature extraction, model training, and validation (e.g., using CNN, PLS-DA).	Building a CNN model to automatically classify spelt vs. wheat from LC-HRMS spectra [18].

Quantitative Decision Metrics

To move beyond binary classification and add a layer of reliability, introducing quantitative decision metrics is highly recommended.

The D-Score Metric

A proposed metric is the D-score (Decision Score), which provides a quantitative measure of the confidence in classification decisions [18]. For instance, in a CNN model discriminating between spelt and wheat, the D-score could be derived from the difference in the output probabilities for the two classes. A high absolute D-score indicates a high-confidence classification, while a score near a predefined threshold would flag the result for further scrutiny. This is particularly useful for evaluating borderline cases, such as mixed samples or untypical cultivars [18].

Table 3: Interpretation of Quantitative D-Score for Classification Confidence

Absolute D-Score Range	Interpretation	Recommended Action
> 0.8	High-confidence classification.	Result can be reported with high certainty.
0.5 - 0.8	Moderate-confidence classification.	Result is likely reliable; consider replication.
< 0.5	Low-confidence classification.	Flag for manual review; sample may be atypical or adulterated. Requires further investigation.
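A minimal sketch of how such a score could be computed and mapped to the bands in Table 3 follows. It assumes the D-score is simply the difference between the two class probabilities, which is one possible reading of the description above and not necessarily the definition used in [18]; the function names are hypothetical.

```python
def d_score(p_spelt, p_wheat):
    """Difference of the two class probabilities; the sign gives the predicted class."""
    return p_spelt - p_wheat

def confidence_band(d):
    """Map the absolute D-score to the bands in Table 3."""
    magnitude = abs(d)
    if magnitude > 0.8:
        return "high"
    if magnitude >= 0.5:
        return "moderate"
    return "low"

for p_spelt in (0.97, 0.80, 0.55):
    d = d_score(p_spelt, 1.0 - p_spelt)
    print(f"p_spelt={p_spelt:.2f}  D={d:+.2f}  confidence={confidence_band(d)}")
```

Under this definition a positive score favors spelt and a negative score favors wheat, so the sign carries the class decision and the magnitude carries the confidence.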

Diagram 2: A robust validation framework separating model tuning from final evaluation.

Ensuring Sample Integrity and Managing Matrix Effects in Complex Foods

In non-targeted methods (NTMs) for food authenticity research, the goal is to exploit all constituents of a sample rather than targeting a predefined "needle in a haystack" [15]. The reliability of these advanced analytical techniques is fundamentally dependent on two pillars: the integrity of the original sample and the effective management of matrix effects that arise from complex food compositions. Matrix effects—unpredictable impacts on analyte signals caused by co-eluting compounds—can significantly compromise data quality and lead to false conclusions in food authentication [64]. Sample integrity ensures that the analytical fingerprint generated truly represents the authentic food product, while proper management of matrix effects guarantees that this fingerprint can be accurately interpreted and compared against reference databases. This application note provides detailed protocols and considerations for addressing these critical challenges within the broader context of validating NTMs for food authenticity research, covering diverse food matrices from meat and seafood to high-value oils and processed goods.

Theoretical Background: Matrix Effects in Non-Targeted Analysis

Matrix effects represent a significant challenge in mass spectrometry-based non-targeted analysis, particularly for complex food matrices. These effects occur when co-eluting compounds alter the ionization efficiency of target analytes, leading to either signal suppression or enhancement [64]. In electrospray ionization (ESI), which is commonly coupled with liquid chromatography (LC), matrix effects are especially pronounced due to competitive ionization processes in the spray plume [64]. The complexity of food matrices—containing varying proportions of proteins, lipids, carbohydrates, minerals, and secondary metabolites—creates a dynamic environment where these effects are unpredictable and often sample-specific.

The fundamental principle underlying matrix effect management is that every component in the food sample contributes to the overall chemical fingerprint, and this comprehensive signature must be preserved throughout the analytical process [15] [13]. Non-targeted methods aim to capture this global fingerprint without pre-selecting specific analytes, making matrix effects a particularly pervasive challenge that must be addressed through rigorous sample preparation protocols, analytical parameter optimization, and data processing strategies [8] [65].

Sample Integrity Protocol

Sample Collection and Handling

Principle: Maintain the inherent chemical composition of food samples from collection to analysis to ensure analytical results truly represent the original product.

Materials:

Cryogenic storage tubes: For preserving sample integrity at low temperatures
Inert sample containers: Prevent leaching of contaminants or adsorption of analytes
Portable freezers (-20°C or -80°C): For transport and temporary storage
Liquid nitrogen: For immediate snap-freezing of labile samples
Antioxidant additives: Butylated hydroxytoluene (BHT) or ascorbic acid for lipid-rich samples

Procedure:

Field Collection:
- Collect representative samples using clean, inert instruments to prevent cross-contamination
- For heterogeneous products (e.g., grains, minced meat), employ quartering techniques to ensure representative sampling
- Immediately aliquot samples into pre-cooled containers
- Add antioxidants to samples prone to oxidative degradation (e.g., edible oils, fatty fish)

Transport and Storage:
- Snap-freeze biologically active samples in liquid nitrogen within 30 minutes of collection
- Maintain consistent cold chain during transport with temperature monitoring
- Store at -80°C for long-term preservation of labile metabolites
- Document storage time and conditions for each sample batch
Sample Homogenization:
- Pre-chill homogenization equipment to prevent heat degradation
- For solid matrices, use cryogenic grinding with liquid nitrogen
- Confirm homogeneity through replicate analysis of subsamples
- Aliquot homogenized samples to avoid repeated freeze-thaw cycles

Quality Control:

Implement a sample tracking system with unique identifiers
Include blank samples throughout collection and handling process
Document any deviations from standard procedures
Test sample stability under storage conditions through periodic re-analysis

Sample Preparation Workflow

Table 1: Sample Preparation Methods for Different Food Matrices

Food Matrix	Homogenization Method	Stabilization Approach	Storage Conditions	Maximum Storage Duration
Meat & Meat Products	Cryogenic grinding	Antioxidant addition (BHT)	-80°C	6 months
Seafood	Blade homogenization under N₂	Snap freezing in liquid N₂	-80°C	4 months
Olive Oil & High-value Oils	Liquid-liquid extraction	Nitrogen atmosphere	-20°C, dark	12 months
Cereals & Grains	Mill grinding	Desiccation	Room temperature, dry	24 months
Honey & Syrups	Warm water bath (40°C)	None required	Room temperature, dark	18 months
Processed Foods	Cryogenic grinding	Antioxidant addition	-80°C	6 months

Experimental Protocols for Managing Matrix Effects

Protocol 1: Comprehensive Sample Clean-up and Extraction

Principle: Selectively remove interfering compounds while maximizing recovery of metabolites for non-targeted analysis.

Materials and Reagents:

Solid Phase Extraction (SPE) cartridges: C18, HLB, Silica, and Mixed-mode
Solvents: HPLC-grade methanol, acetonitrile, water, ethyl acetate, hexane
QuEChERS kits: Originally designed for pesticide residue analysis; use with modifications for NTMs
Ultrafiltration devices: 3kDa and 10kDa molecular weight cut-off filters

Procedure:

Dual Extraction for Comprehensive Metabolite Coverage:
- Weigh 100±5mg of homogenized sample into 2mL microcentrifuge tubes
- For polar metabolites: Add 1mL methanol:water (80:20, v/v), vortex 1min, sonicate 15min, centrifuge 15min at 14,000×g
- For non-polar metabolites: Add 1mL methyl-tert-butyl ether:methanol (3:1, v/v) to separate aliquot, vortex 1min, sonicate 15min, centrifuge 15min at 14,000×g
- Combine supernatants or analyze separately based on research objectives

Clean-up Strategies by Matrix Type:
- Lipid-rich matrices: Use freezing-lipid filtration - incubate extract at -20°C for 2h, centrifuge at 4°C, collect supernatant
- Protein-rich matrices: Employ precipitation with cold acetonitrile (1:2 sample:solvent ratio), incubate at -20°C for 1h, centrifuge
- Pigmented matrices: Use SPE with C18 cartridges to remove chlorophyll and carotenoids
- Complex processed foods: Implement sequential extraction with solvents of increasing polarity
Concentration and Reconstitution:
- Evaporate extracts under gentle nitrogen stream at 30°C
- Reconstitute in initial mobile phase composition for LC analysis
- Filter through 0.22μm membrane prior to injection

Quality Control:

Process quality control samples (pooled from all samples) with each batch
Include extraction blanks to monitor contamination
Spike selected samples with internal standards to monitor extraction efficiency
Assess matrix effect quantitatively using post-extraction addition method

Protocol 2: LC-HRMS Analysis with Matrix Effect Minimization

Principle: Utilize chromatographic separation and mass spectrometric parameters to reduce matrix interference.

Materials and Instruments:

LC System: Ultra-high performance liquid chromatography with binary pump
Mass Spectrometer: High-resolution mass analyzer (Q-TOF, Orbitrap)
Columns: C18, HILIC, phenyl-hexyl (100×2.1mm, 1.7-1.8μm)
Mobile phases: Water with 0.1% formic acid, acetonitrile with 0.1% formic acid

Procedure:

Chromatographic Method Development:
- Optimize gradient elution to separate analytes from matrix interferences
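- Where highly polar matrix components (e.g., salts, sugars) elute at the start of the run, consider diverting that portion of the eluent to waste so it does not enter the ion source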
- Use alternative stationary phases (HILIC, phenyl-hexyl) for different metabolite classes
- Employ longer analytical columns (150mm) for increased separation efficiency

Mass Spectrometric Parameters:
- Implement data-independent acquisition (DIA/SWATH) for comprehensive fragmentation data
- Acquire data in both positive and negative ionization modes, using separate runs
- If data-dependent acquisition (DDA) is used instead of DIA, enable dynamic exclusion to prevent oversampling of abundant matrix ions
- Optimize collision energy ramping for fragmentation efficiency
Matrix Effect Assessment:
- Prepare post-extraction spiked samples at low, medium, and high concentrations
- Compare analyte responses in neat solution versus matrix
- Calculate matrix effect (ME%) = (peak area in matrix / peak area in neat solution - 1) × 100
- ME% > 0 indicates ionization enhancement, ME% < 0 indicates suppression
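The matrix effect formula above translates directly into a small helper. The peak areas below are invented for illustration, and the function name is hypothetical.

```python
def matrix_effect_percent(area_in_matrix, area_in_neat):
    """ME% = (peak area in matrix / peak area in neat solution - 1) x 100."""
    if area_in_neat <= 0:
        raise ValueError("neat-solution peak area must be positive")
    return (area_in_matrix / area_in_neat - 1.0) * 100.0

# Hypothetical peak areas (matrix, neat) for one spiked standard at three levels.
levels = {"low": (7.2e4, 9.0e4), "medium": (4.1e5, 4.0e5), "high": (1.5e6, 2.0e6)}
for level, (matrix_area, neat_area) in levels.items():
    me = matrix_effect_percent(matrix_area, neat_area)
    effect = "enhancement" if me > 0 else "suppression" if me < 0 else "none"
    print(f"{level}: ME = {me:+.1f}% ({effect})")
```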

Quality Control:

Inject quality control samples every 5-10 injections to monitor system stability
Use internal standards covering different chemical classes to correct for retention time shifts
Monitor system sensitivity and chromatographic performance throughout sequence
Include solvent blanks to identify carryover and background contamination

Research Reagent Solutions

Table 2: Essential Research Reagents for Managing Matrix Effects in Food Authenticity NTMs

Reagent/Material	Function	Application Examples	Key Considerations
C18 SPE Cartridges	Reverse-phase clean-up; remove lipids and pigments	Olive oil, meat, dairy products	Varying carbon loads and particle sizes affect selectivity
HLB (Hydrophilic-Lipophilic Balanced) SPE	Broad-spectrum retention of polar and non-polar compounds	Honey, wine, fruit juices	Superior for highly polar metabolites compared to C18
QuEChERS Kits	Quick, Easy, Cheap, Effective, Rugged, Safe; multi-residue extraction	Cereals, spices, processed foods	Can be modified with additional clean-up steps for NTMs
Molecularly Imprinted Polymers	Selective extraction of target compound classes	Mycotoxins, veterinary drugs	High selectivity but limited to pre-defined targets
Immunoaffinity Columns	Antibody-based highly specific clean-up	Allergens, specific protein markers	Excellent specificity but limited to available antibodies
Graphitized Carbon Black	Removal of pigments and acidic compounds	Plant extracts, colored foods	Can also retain some desirable analytes requiring optimization
Zirconia-Based Sorbents	Selective removal of phospholipids	Lipid-rich matrices	Superior to C18 for phospholipid removal
Internal Standard Mixture	Correction for matrix effects and recovery	All matrices	Should cover wide polarity range and chemical diversity

Data Analysis and Validation Framework

Data Processing Workflow for Matrix-Rich Samples

The analysis of complex food matrices in non-targeted authenticity testing requires specialized data processing approaches to distinguish true biological variation from matrix-induced artifacts. After raw data acquisition, several preprocessing steps are essential:

Preprocessing Steps:

Peak Detection and Alignment: Use hierarchical density-based spatial clustering to group peaks across samples while accounting for retention time shifts caused by matrix components [8].
Signal Drift Correction: Implement quality control-based robust LOESS normalization to correct for sensitivity changes throughout analytical sequences [8].
Matrix Effect Compensation: Apply post-acquisition correction using stable isotope-labeled internal standards spiked at known concentrations before extraction.
Batch Effect Removal: Use ComBat or surrogate variable analysis (SVA) to remove technical variation while preserving biological signals.
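The signal drift correction step can be sketched as follows: fit a smooth curve through the QC intensities as a function of injection order, then divide every injection by that curve. This is a simplified illustration, not the exact published procedure: it uses plain LOESS (local linear regression with tricube weights) without the robustness iterations of robust LOESS, and the drift, QC spacing, and function names are assumptions.

```python
import numpy as np

def loess(x, y, x_new, frac=0.75):
    """Local linear regression with tricube weights (no robustness iterations)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = max(3, int(np.ceil(frac * len(x))))      # points in each local window
    x_new = np.asarray(x_new, float)
    fitted = np.empty(len(x_new))
    for i, x0 in enumerate(x_new):
        dist = np.abs(x - x0)
        h = np.sort(dist)[k - 1]                 # bandwidth: k-th nearest QC
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        # np.polyfit minimizes sum((w_i * r_i)^2), so pass sqrt of the weights.
        slope, intercept = np.polyfit(x, y, 1, w=np.sqrt(w))
        fitted[i] = slope * x0 + intercept
    return fitted

def qc_drift_correct(intensity, order, is_qc, frac=0.75):
    """Divide one feature by its QC-derived drift curve, rescaled to the QC median."""
    drift = loess(order[is_qc], intensity[is_qc], order, frac=frac)
    return intensity / drift * np.median(intensity[is_qc])

order = np.arange(1, 41)
is_qc = (order % 5 == 1)                       # a QC every 5 injections (hypothetical)
drifted = 1000.0 * (1.0 - 0.01 * order)        # simulated 1% sensitivity loss per injection
corrected = qc_drift_correct(drifted, order, is_qc)
print(np.std(drifted) / np.mean(drifted), np.std(corrected) / np.mean(corrected))
```

The correction is applied feature by feature. In a real sequence the study samples should be bracketed by QC injections so that the curve is interpolated rather than extrapolated, as it is for the last few injections in this toy example.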

Multivariate Analysis:

Principal Component Analysis (PCA): Identify outliers and assess overall data quality
Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA): Separate classes while filtering out matrix-related variation orthogonal to class discrimination
Cross-validation: Implement repeated double cross-validation to avoid overfitting with complex matrices

Validation Approaches for Non-Targeted Methods

Validating NTMs for food authenticity requires innovative approaches that differ from traditional targeted method validation [15]. Key performance characteristics to evaluate include:

Specificity: Assess ability to correctly classify authentic samples and detect adulterants despite matrix interference
Sensitivity: Determine detection limits for common adulterants in specific food matrices
Robustness: Test method performance under variations in sample preparation and analysis conditions
Transferability: Demonstrate that methods produce equivalent results across different laboratories and instruments

Table 3: Validation Parameters for NTMs in Food Authenticity Testing

Validation Parameter	Assessment Approach	Acceptance Criteria	Matrix-Specific Considerations
Discrimination Power	Cross-validated classification accuracy	>90% for clear authenticity questions	Establish matrix-specific decision thresholds
Method Stability	Quality control sample clustering in PCA	RSD < 30% for QC samples	Monitor matrix-induced signal drift
Detection Capability	Adulteration series with decreasing levels	LOD established for common adulterants	Account for matrix-specific background interference
Transferability	Interlaboratory study with identical samples	>80% concordance between laboratories	Standardize matrix-specific sample preparation
Throughput	Samples processed per time unit	Compatible with control laboratory needs	Matrix-specific preparation time included
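The Method Stability criterion in Table 3 can be checked per feature by computing the relative standard deviation across repeated QC injections. The intensities below are invented for illustration and the function name is hypothetical.

```python
import numpy as np

def qc_feature_rsd(qc_intensities):
    """Percent RSD of each feature (column) across repeated QC injections (rows)."""
    mean = qc_intensities.mean(axis=0)
    sd = qc_intensities.std(axis=0, ddof=1)
    return 100.0 * sd / mean

# Hypothetical intensities: 4 QC injections x 3 features.
qc = np.array([
    [1000.0, 200.0, 50.0],
    [1050.0, 210.0, 90.0],
    [ 980.0, 190.0, 20.0],
    [1010.0, 205.0, 70.0],
])
rsd = qc_feature_rsd(qc)
stable = rsd < 30.0      # criterion from the Method Stability row
print(np.round(rsd, 1), stable)
```

Features that fail the criterion are commonly excluded before modeling, since their variation is dominated by measurement noise rather than by differences between samples.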

Workflow Visualization

Diagram 1: Comprehensive workflow for maintaining sample integrity and managing matrix effects in non-targeted food authenticity analysis, showing the sequential stages from sample collection through data validation.

Ensuring sample integrity and effectively managing matrix effects are foundational to generating reliable, reproducible data in non-targeted food authenticity research. The protocols outlined in this application note provide a systematic approach to these challenges, emphasizing matrix-specific strategies for sample preparation, instrumental analysis, and data processing. As the field advances toward standardized validation frameworks for NTMs [15] [7], attention to these fundamental aspects will enhance method robustness and transferability across laboratories. The convergent technologies of advanced mass spectrometry, innovative sample preparation materials, and sophisticated data analysis algorithms [65] collectively address the complexities of authenticating increasingly diverse and sophisticated food products in the global market.

Food authenticity represents a critical frontier in global food safety and quality assurance, with economic adulteration and counterfeiting costing the industry an estimated $30–40 billion annually [59]. In response to this challenge, non-targeted methods (NTMs) have emerged as powerful analytical strategies that do not require prior knowledge of specific analytes, enabling the detection of unknown contaminants and fraudulent practices through comprehensive fingerprinting approaches [66]. These methodologies, often based on high-resolution analytical technologies such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy, provide a holistic means to verify food authenticity by detecting patterns and anomalies in complex food matrices [22] [67].

The transition of NTMs from research proof-of-concept to routine laboratory application represents a significant challenge for the scientific community. Despite extensive research demonstrating their potential, these methods have not yet been widely incorporated into official control measures, primarily due to the lack of standardized validation guidelines and established frameworks for assessing their fitness for purpose [66]. This application note addresses this critical gap by providing detailed protocols, validation frameworks, and practical implementation strategies to facilitate the robust adoption of NTMs in routine food authenticity testing.

Validation Frameworks for Non-Targeted Methods

Essential Terminology and Validation Considerations

The validation of NTMs requires a distinct approach compared to traditional targeted methods. Unlike targeted analyses that focus on predefined "needles in a haystack," NTMs exploit all constituents of the "haystack" to build comprehensive analytical profiles [15]. This paradigm shift necessitates new concepts, terms, and validation considerations that must be propagated throughout academic research, commercial development, and official control laboratories.

A fundamental challenge in NTM validation lies in establishing metrological traceability and comparability of testing results, which often depends on the availability of appropriate reference materials (RMs) [59]. According to ISO standards, reference materials must be "sufficiently homogeneous and stable with respect to one or more specified properties" and "fit for intended use in a measurement process" [59]. For NTMs, RMs serve critical functions in method validation, quality control, and calibration of multivariate statistical models used for classification [59].

The growing recognition of these challenges has prompted collaborative efforts to develop standardized validation guidelines. Eurachem, in partnership with AOAC-Europe, has established a joint task group specifically focused on developing guidelines for validating non-targeted methods, acknowledging that the lack of such guidance has hindered their adoption in official food safety decision-making processes [66].

Performance Parameters for NTM Validation

Table 1: Key Performance Parameters for NTM Validation

Validation Parameter	Description	Considerations for NTMs
Specificity	Ability to discriminate between different sample classes	Assessed through multivariate statistics; requires representative sample sets
Robustness	Resilience to minor variations in analytical conditions	Critical for inter-laboratory reproducibility; includes sample preparation and instrument variations
Transferability	Performance consistency across multiple platforms/laboratories	Demonstrated in interlaboratory studies; requires standardized protocols
Classification Accuracy	Correct classification rate for authentic vs. non-authentic samples	Validated with independent test sets; requires avoidance of overfitting
Marker Identification	Ability to identify chemical markers of fraud	Not always necessary for classification but important for interpretability

Experimental Protocols for NTM Implementation

NMR-Based Food Authenticity Testing

Nuclear magnetic resonance (NMR) spectroscopy has established itself as a powerful tool for non-targeted food authenticity assessment due to its high robustness, intrinsic quantitative capabilities, and non-destructive nature [67]. The following protocol, optimized for tomato authentication, demonstrates a transferable approach applicable to various food matrices.

Protocol: NMR Sample Preparation and Acquisition for Tomato Geographical Origin Authentication

Materials:

Fresh tomato samples
Deuterated solvent (D₂O, 99.9%)
Sodium 3-(trimethylsilyl)propionate-2,2,3,3-d4 (TSP, 0.1% in D₂O)
Phosphate buffer (pH 4.2)
Sodium azide (0.05% w/v)

Equipment:

High-resolution NMR spectrometer (≥400 MHz)
Lyophilizer
Laboratory homogenizer
Precision balance
Centrifuge
NMR tubes (5 mm)

Procedure:

Sample Homogenization: Wash and homogenize entire tomato fruits using a laboratory homogenizer until a consistent puree is obtained.
Extraction: Weigh 2.0 g of homogenized tomato into a centrifuge tube. Add 4.0 mL of extraction solvent (D₂O phosphate buffer, pH 4.2, containing 0.05% sodium azide). Vortex vigorously for 60 seconds.
Centrifugation: Centrifuge at 14,000 × g for 20 minutes at 4°C to remove particulate matter.
Supernatant Collection: Transfer 600 μL of clear supernatant to a new tube containing 70 μL of TSP reference standard in D₂O.
NMR Acquisition: Transfer 650 μL of the prepared sample to a 5 mm NMR tube. Acquire ¹H NMR spectra using a 1D NOESY pulse sequence with presaturation for water suppression at 298 K.
Parameter Settings: Set acquisition time to 3.5 seconds, relaxation delay to 4 seconds, 64 scans, and spectral width of 20 ppm.
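
The parameter settings above imply a minimum measurement time per spectrum. The following is a rough sketch of that arithmetic; the function name is illustrative, and the estimate ignores the NOESY mixing time, dummy scans, and instrument overhead, so real experiments will run somewhat longer.

```python
def estimate_acquisition_seconds(scans, acquisition_time_s, relaxation_delay_s):
    """Lower-bound estimate: scans x (acquisition time + relaxation delay)."""
    return scans * (acquisition_time_s + relaxation_delay_s)


# Values taken from the parameter settings in the protocol.
total_s = estimate_acquisition_seconds(
    scans=64, acquisition_time_s=3.5, relaxation_delay_s=4.0
)
print(f"{total_s:.0f} s (about {total_s / 60:.0f} min) per spectrum")
```

An estimate like this helps when planning how many samples and quality control injections fit into one measurement session.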

Quality Control:

Assess spectral quality based on signal-to-noise ratio (>100:1 for TSP reference) and full width at half maximum (FWHM < 2 Hz).
Include system suitability test using certified reference material.
Monitor extraction repeatability through relative standard deviation (%RSD) of selected metabolite peaks (<5% for technical replicates).
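
The quality control thresholds above can be checked programmatically. The sketch below is a minimal illustration, assuming the signal-to-noise ratio, linewidth, and replicate peak areas have already been extracted from the spectra; the function names and the metabolite labels and values in the example are hypothetical.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def check_nmr_qc(snr_tsp, fwhm_hz, replicate_peak_areas):
    """Compare measured values with the thresholds stated in the protocol."""
    return {
        "snr": snr_tsp > 100.0,   # S/N > 100:1 for the TSP reference
        "fwhm": fwhm_hz < 2.0,    # FWHM < 2 Hz
        "rsd": all(percent_rsd(areas) < 5.0
                   for areas in replicate_peak_areas.values()),
    }


# Hypothetical technical replicates for two metabolite peaks.
qc_result = check_nmr_qc(
    snr_tsp=240.0,
    fwhm_hz=0.9,
    replicate_peak_areas={
        "metabolite_a": [100.0, 102.0, 98.0],
        "metabolite_b": [50.0, 51.0, 49.5],
    },
)
print(qc_result)
```

Returning one flag per criterion, rather than a single pass/fail value, makes it easier to see which check caused a spectrum to be rejected.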

This protocol demonstrated exceptional performance in an interlaboratory comparison, achieving a 97.62% correct classification rate for discriminating tomatoes from different geographical origins (Lazio vs. Sicily), even when samples were prepared and analyzed independently by different operators using their own equipment [67].

Mass Spectrometry-Based Approaches with Chemometrics

Ambient ionization mass spectrometry (AIMS) techniques have emerged as powerful tools for rapid food authentication screening, offering minimal sample preparation and high analytical throughput [68]. When combined with appropriate chemometric tools, these methods provide robust solutions for routine fraud detection.

Protocol: Paper Spray Mass Spectrometry (PS-MS) for Food Authentication

Materials:

Food samples (liquid or solid)
Analytical grade solvents (methanol, water, acetonitrile)
Filter paper triangles for PS-MS
Reference standards for system calibration

Equipment:

Ambient ionization mass spectrometer
High-voltage power supply
Solvent delivery system (if automated)
Sample homogenization equipment

Procedure:

Sample Preparation:
- For liquid samples (e.g., oils, honey): Apply 10-20 μL directly to the filter paper triangle.
- For solid samples: Perform simple extraction with appropriate solvent (e.g., 1:10 w/v in methanol-water mixture), vortex for 30 seconds, and apply 10-20 μL of extract to paper substrate.
Paper Spray Setup: Position the paper triangle 1-2 cm from the mass spectrometer inlet and apply high voltage (3-5 kV).
Solvent Application: Apply solvent (typically 80:20 methanol:water with 0.1% formic acid) at a flow rate of 10-20 μL/min to initiate spray.
MS Acquisition: Acquire mass spectra in full scan mode (m/z 50-1000) with resolution ≥30,000. Collect data for 1-2 minutes per sample.
Data Preprocessing: Apply peak picking, alignment, and normalization to raw data.

Chemometric Analysis:

Exploratory Analysis: Perform unsupervised pattern recognition (PCA, t-SNE) to identify natural clustering and outliers.
Model Development: Build supervised classification models (PLS-DA, Random Forest) using training sets with known authenticity.
Model Validation: Validate every supervised model with an external set of test samples that played no part in model building, so that overfitting is detected rather than hidden. Avoid using more latent variables than the data support.
Marker Identification: Conduct thorough spectral interpretation to identify chemical markers of fraud or authenticity.
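
External validation depends on setting aside test samples before any model is built. The sketch below shows one way to draw a stratified hold-out set, so that each class appears in the same proportion in training and test data. The function name, the 30% test fraction, and the class labels are assumptions for illustration; the split should be made on independent samples, not on replicate spectra of the same sample.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical label vector: 20 authentic and 10 adulterated samples.
labels = ["authentic"] * 20 + ["adulterated"] * 10
train_idx, test_idx = stratified_holdout(labels)
print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

The test indices are then kept out of preprocessing, variable selection, and model tuning, and are used only once for the final performance estimate.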

The integration of AIMS with proper chemometric practices represents an innovative strategy with enormous potential for enhancing rapid fraud detection, particularly for high-value commodities such as olive oil, honey, and dairy products [68].

The Scientist's Toolkit: Essential Research Reagent Solutions

Successful implementation of NTMs for food authenticity requires carefully selected reagents, reference materials, and analytical standards to ensure data quality and method reliability.

Table 2: Essential Research Reagents and Materials for Food Authenticity NTMs

Item	Function	Application Examples
Certified Reference Materials (CRMs)	Method validation, quality control, establishing metrological traceability	ISO Guide 30-compliant RMs with documented provenance [59]
Deuterated Solvents	Provide the deuterium signal used for field-frequency locking in NMR spectroscopy	D₂O for aqueous food extracts; CD₃OD for lipid-soluble components [67]
Internal Standards	Chemical shift referencing (NMR), quantification, instrument performance monitoring	TSP for NMR; stable isotope-labeled compounds for MS [67]
Synthetic Nucleic Acids	Positive controls for DNA-based methods	Custom oligonucleotides for PCR authentication of botanical ingredients [6]
Proficiency Testing Schemes	Interlaboratory comparison, method benchmarking	EPTIS database schemes for various food matrices [6]
DNA Extraction Kits	Quality-controlled nucleic acid isolation for molecular authentication	Validated methods for processed food matrices [6]

Workflow Visualization and Data Analysis

The analytical process for non-targeted food authenticity testing follows a systematic workflow from sample preparation to final authentication assessment, with critical validation checkpoints at each stage.

Implementation Strategies for Routine Laboratory Use

Database Frameworks and Fitness-for-Purpose Assessment

The transition from research to routine application heavily relies on the development of robust, transparent databases for method calibration and verification. A key challenge in this domain is the lack of transparency in proprietary authenticity databases, which has led to legal disputes and undermined confidence in NTM results [69]. To address this, a practical framework for evaluating database fitness-for-purpose has been developed, focusing on several critical aspects:

Database Scope and Composition: Clear documentation of the geographical, varietal, and temporal coverage of authentic reference samples.
Metadata Requirements: Comprehensive sample information including origin verification, harvesting conditions, and storage history.
Representativity Assessment: Statistical evaluation of how well the database represents the natural variability of the authentic product.
Method Validation Data: Evidence that analytical methods used to populate the database have been appropriately validated.
Safeguards for Database Owners: Mechanisms to protect intellectual property while ensuring methodological transparency.
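
The scope and metadata aspects listed above lend themselves to simple automated checks. The sketch below is only an illustration: the record fields, class name, and example values are hypothetical and are not taken from the cited framework.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceSample:
    """Hypothetical metadata record for one authentic reference sample."""
    sample_id: str
    origin: Optional[str]
    variety: Optional[str]
    harvest_year: Optional[int]
    origin_verified: bool = False


def incomplete_records(samples):
    """List sample IDs with missing metadata or unverified origin."""
    return [
        s.sample_id for s in samples
        if None in (s.origin, s.variety, s.harvest_year) or not s.origin_verified
    ]


def coverage_by(samples, field_name):
    """Count samples per value of one metadata field (e.g. origin, harvest_year)."""
    return Counter(getattr(s, field_name) for s in samples)


samples = [
    ReferenceSample("S001", "region_a", "variety_x", 2023, True),
    ReferenceSample("S002", "region_b", None, 2023, True),
    ReferenceSample("S003", "region_b", "variety_y", 2024, False),
]
print(incomplete_records(samples))
print(coverage_by(samples, "origin"))
```

Counts per origin, variety, and year give a first, coarse view of database coverage; a statistical assessment of representativity would go beyond this sketch.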

This framework enables reliable enforcement decisions by providing a standardized approach to assess the reliability of authenticity databases, particularly for challenging applications such as honey authentication where database-based methods like NMR are already commercially deployed [69].

Quality Assurance and Regulatory Compliance

Successful implementation of NTMs in routine laboratories requires integration with established quality assurance frameworks and regulatory guidelines. Key considerations include:

Laboratory Accreditation: Utilize search functions provided by accreditation bodies such as the United Kingdom Accreditation Service (UKAS) to identify appropriately accredited laboratories for specific authenticity testing needs [6].

Standard Method Performance Requirements (SMPRs): AOAC International has developed SMPRs for both targeted and non-targeted food authenticity methods, setting minimum performance criteria that testing methods must fulfill [6]. These standards cover various high-risk commodities including extra virgin olive oil, honey, milk, and spices.

Method Validation Protocols: Adopt structured validation approaches that address the unique characteristics of NTMs, including:

Independent validation with external test sets
Demonstration of robustness through interlaboratory studies
Assessment of false positive and false negative rates
Establishment of system suitability criteria

The collaboration between Eurachem and AOAC-Europe to develop specific validation guidelines for NTMs represents a significant step toward standardized approaches that will support wider adoption in regulatory testing environments [66].

The transition of non-targeted methods from research proof-of-concept to routine laboratory application represents a critical evolution in food authenticity testing. This journey requires not only sophisticated analytical technologies but also robust validation frameworks, standardized protocols, transparent database management, and integration with quality assurance systems. The protocols and frameworks presented in this application note provide practical pathways for laboratories to implement these powerful methods while maintaining scientific rigor and regulatory compliance.

As the field continues to evolve, ongoing efforts to harmonize validation approaches, develop certified reference materials, and establish open-source database frameworks will further enhance the reliability and adoption of NTMs. By bridging the gap between research innovation and routine application, the food authenticity community can more effectively combat economic adulteration, protect consumer interests, and ensure the integrity of global food supply chains.

Ensuring Reliability: Validation Protocols and Comparative Analysis of NTMs

Non-targeted methods (NTMs) represent a paradigm shift in analytical chemistry for food authentication. Unlike traditional targeted methods that aim to detect predefined "needles in a haystack," NTMs exploit comprehensive analytical techniques to characterize the entire "haystack"—capturing a global fingerprint of a food product's composition [15] [7]. This approach is particularly valuable for detecting unknown or unexpected adulterants in complex food matrices, making it increasingly indispensable for combating economically motivated adulteration in today's globalized food supply chain [70] [13].

The fundamental principle underlying NTMs is their ability to screen for authenticity without prior knowledge of specific fraud markers, making them particularly valuable for detecting novel or unconventional adulteration practices [13]. As the food industry and regulatory bodies face growing challenges from sophisticated fraud practices, the development and proper validation of these methods have become critical for ensuring method reliability and widespread adoption [15] [70]. This document establishes a comprehensive framework for validating NTMs, with particular emphasis on demonstrating fitness-for-purpose—the essential requirement that any analytical method must satisfy its intended application [71].

Conceptual Framework for NTM Validation

The Fitness-for-Purpose Principle

The core principle of method validation is establishing fitness-for-purpose, defined as the demonstration that an analytical method's performance characteristics are appropriate for its intended application [71]. For NTMs, this concept takes on additional dimensions compared to traditional targeted methods. While targeted methods focus on validating performance for specific analytes, NTMs must demonstrate their ability to reliably answer a broader analytical question, such as "Is this olive oil authentic?" or "Does this honey sample match its claimed botanical origin?" rather than merely detecting specific adulterants [15].

This fitness-for-purpose approach requires carefully considering the method's operational context, including the specific food matrix, likely adulteration practices, and the required level of certainty for decision-making [15] [71]. The Eurachem Guide emphasizes that validation should provide objective evidence that a method meets the requirements for its intended use, with validation depth and scope proportional to the method's application context [71].

Key Terminology in NTM Validation

Understanding the specialized terminology is essential for proper NTM validation:

Non-Targeted Methods (NTMs): Analytical approaches that do not aim at predefined constituents but instead generate a comprehensive fingerprint to answer a specific analytical question [15] [7].
Targeted Methods: Traditional approaches focusing on quantifying or identifying specific predefined analytes or markers [13].
Method Validation: The process of demonstrating that a method is fit-for-purpose by evaluating its performance characteristics [71].
Method Verification: The process of demonstrating that a previously validated method works correctly in a specific laboratory [71].
Performance Characteristics: Metrics such as specificity, accuracy, precision, and robustness that define method capability [15] [71].

Performance Characteristics for NTM Validation

Validating NTMs requires assessing both traditional and method-specific performance parameters. The table below summarizes the core validation characteristics and their specific considerations for non-targeted approaches.

Table 1: Essential Performance Characteristics for NTM Validation

Performance Characteristic	Definition in NTM Context	Validation Considerations
Specificity/Selectivity	Ability to distinguish between different classes or authenticate against claims	Demonstrate response patterns differ significantly between classes; use chemometrics to visualize separation [15]
Robustness	Method resilience to small, deliberate parameter variations	Test impact of instrumental settings, sample preparation, and environmental conditions [15]
Transferability	Method performance consistency across instruments/laboratories	Conduct inter-laboratory studies; standardize protocols and data processing [15]
Precision	Agreement between independent results under specified conditions	Evaluate at multiple levels: instrumental, sample preparation, and within/between laboratories [71]
Stability	Sample and reference standard stability under defined conditions	Establish sample integrity timeframe and storage conditions [71]

Additional NTM-Specific Validation Parameters

Beyond traditional parameters, NTMs require specialized validation approaches:

Model Performance: For classification-based NTMs, validation must include metrics such as sensitivity, specificity, and misclassification rates determined through cross-validation or external validation sets [15]. The model should demonstrate ≥85% correct classification for screening purposes [70].
Data Quality: Instrument performance stability must be monitored using system suitability tests and quality control samples to ensure data reliability throughout the method's lifecycle [15].
False Positive/Negative Rates: Establish acceptable rates for the intended use, recognizing that screening methods may tolerate higher false positive rates than confirmatory methods [15].
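
The model performance metrics named above follow directly from the confusion matrix of a validation set. The sketch below assumes that "positive" means a sample flagged as non-authentic; that convention, the function name, and the example counts are illustrative choices and should be stated explicitly in any validation report.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute rates from confusion-matrix counts.

    Assumed convention: positive = non-authentic, negative = authentic.
    """
    return {
        "sensitivity": tp / (tp + fn),           # non-authentic correctly flagged
        "specificity": tn / (tn + fp),           # authentic correctly accepted
        "false_positive_rate": fp / (fp + tn),   # authentic wrongly flagged
        "false_negative_rate": fn / (fn + tp),   # non-authentic missed
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical validation set: 30 non-authentic and 50 authentic samples.
metrics = classification_metrics(tp=27, fn=3, tn=46, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the false positive and false negative rates separately matters because, as noted above, a screening method and a confirmatory method may tolerate them differently.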

Experimental Design and Protocols for NTM Validation

General Workflow for NTM Validation

The validation of non-targeted methods follows a systematic process to establish fitness-for-purpose. The workflow below outlines the key stages from planning through implementation.

Sample Set Design and Preparation

A statistically sound sample set is fundamental for robust NTM validation:

Sample Composition: Include authentic samples (n≥50 per class), adulterated samples (n≥30 per adulteration type), and samples from different geographical origins or production methods as relevant to the analytical question [15] [13].
Reference Materials: Whenever possible, incorporate certified reference materials or consensus reference materials to establish method accuracy [70].
Sample Preparation Protocol:
- Homogenization: Process samples to ensure representative aliquots using appropriate grinding, mixing, or subdivision techniques.
- Extraction: Employ simple, comprehensive extraction protocols to capture a wide range of chemical constituents rather than optimizing for specific analytes [13].
- Quality Controls: Include system suitability standards, pooled quality control samples, and blank samples in each analytical batch [15].
- Storage: Document storage conditions and establish sample stability under these conditions [71].

Analytical Measurement Considerations

NTM validation employs diverse analytical platforms depending on the food matrix and authenticity question:

Spectroscopic Techniques (FT-NIR, NMR): Valuable for rapid screening with minimal sample preparation [13].
Mass Spectrometry (GC-MS, LC-MS): Provides comprehensive metabolite profiling for complex authentication questions [15] [13].
Molecular Techniques (DNA-based methods): Highly specific for species identification [13].

Each technique requires demonstrating instrumental stability throughout validation studies through repeated analysis of quality control samples [15].

Data Processing and Chemometric Analysis

Data processing represents a critical component of NTM validation:

Data Pre-processing: Apply appropriate normalization, scaling, and transformation techniques to reduce unwanted variation while preserving biological or chemical patterns [13].
Feature Extraction: Identify and select relevant variables or markers that contribute to class separation or authentication.
Model Building: Develop classification or regression models using appropriate algorithms (PCA, PLS-DA, machine learning classifiers) with training datasets [13].
Model Validation: Test model performance with independent validation sets not used in model building, reporting key metrics including accuracy, sensitivity, specificity, and AUC where applicable [13] [72].
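
One practical consequence of the steps above is that scaling parameters should be learned from the training data only and then applied unchanged to the validation data; otherwise information from the validation set leaks into the model. The sketch below illustrates this with total-sum normalization followed by autoscaling. Both techniques are chosen here as common examples, not as the method prescribed by the cited sources, and the intensity values are hypothetical.

```python
from statistics import mean, stdev


def total_sum_normalize(row):
    """Scale one fingerprint so that its intensities sum to 1."""
    total = sum(row)
    return [value / total for value in row]


def fit_autoscaler(train_rows):
    """Learn per-feature mean and standard deviation from training data only."""
    columns = list(zip(*train_rows))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def apply_autoscaler(rows, means, sds):
    """Center and scale each feature using previously fitted parameters."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical intensities: three training fingerprints, one validation fingerprint.
train = [total_sum_normalize(r) for r in [[10, 30, 60], [20, 30, 50], [15, 25, 60]]]
valid = [total_sum_normalize(r) for r in [[12, 28, 60]]]

means, sds = fit_autoscaler(train)          # fitted on training data only
train_scaled = apply_autoscaler(train, means, sds)
valid_scaled = apply_autoscaler(valid, means, sds)
print(valid_scaled)
```

The same separation applies to feature selection and any other data-dependent preprocessing step.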

The Researcher's Toolkit for NTM Validation

Successful implementation of NTM validation requires specific reagents, materials, and computational resources. The following table catalogues essential components of a comprehensive NTM validation toolkit.

Table 2: Essential Research Reagent Solutions for NTM Validation

Tool Category	Specific Examples	Function in NTM Validation
Reference Materials	Certified reference materials, in-house reference standards	Establish method accuracy and monitor long-term performance [70]
Quality Control Materials	Pooled quality control samples, system suitability standards	Monitor analytical system stability and data quality [15]
Extraction Solvents	Methanol, acetonitrile, chloroform, water of varying grades	Comprehensive extraction of metabolites/constituents for fingerprinting [13]
Internal Standards	Stable isotope-labeled compounds, chemical analogues	Monitor extraction efficiency and instrument performance [15]
Data Analysis Software	R, Python, MATLAB, proprietary chemometrics packages	Process complex multivariate data and build classification models [13]

Implementation and Regulatory Considerations

Standardized Method Performance Requirements

Method validation should align with established standards and performance requirements:

AOAC Standards: The AOAC Food Authenticity Methods program has developed Standard Method Performance Requirements (SMPRs) for various food matrices including honey, milk products, and extra virgin olive oil [70].
Eurachem Guidelines: The "Fitness for Purpose of Analytical Methods" guide provides a general framework for method validation applicable across analytical fields [71].
Codex Alimentarius: International standards for method validation in food control, particularly important for trade dispute resolution [70].

Documentation and Reporting

Comprehensive documentation is essential for demonstrating fitness-for-purpose:

Validation Protocol: Pre-defined experimental plan specifying validation criteria, acceptance criteria, and statistical approaches [71].
Validation Report: Detailed documentation of experimental results, statistical analysis, and conclusion regarding fitness-for-purpose [71].
Standard Operating Procedures: Detailed protocols for method implementation, including sample preparation, instrumentation, data processing, and model application [15].
Change Control Procedures: Documentation for managing modifications to the validated method [15].

Validating non-targeted methods for food authenticity requires a systematic, purpose-driven approach that addresses both traditional validation parameters and NTM-specific considerations. By implementing the protocols and frameworks outlined in this document, researchers can demonstrate that their NTMs are fit-for-purpose, providing reliable results for detecting food fraud and verifying authenticity claims. As the field evolves, continued development of standardized validation approaches will be essential for building confidence in these powerful analytical tools and ensuring their appropriate application across the food industry.

Non-targeted methods represent a paradigm shift in analytical science for food authenticity research. Unlike traditional targeted methods that aim to identify and quantify a predefined "needle in a haystack," NTMs exploit information from all measurable constituents of the "haystack" [15]. This comprehensive approach makes NTMs particularly valuable for detecting unknown adulterants and verifying complex food authenticity claims where potential fraud vectors cannot be predetermined [13]. The core strength of NTMs lies in their ability to generate analytical fingerprints using high-resolution techniques such as mass spectrometry, NMR, or other spectroscopic methods, combined with advanced chemometrics and machine learning algorithms for pattern recognition [1].

Validation of NTMs presents unique challenges compared to traditional targeted methods. Rather than focusing on individual analytes, NTM validation must demonstrate fitness-for-purpose for classifying samples based on their comprehensive fingerprint [15]. This requires innovative approaches to establish performance characteristics and define acceptance criteria that ensure reliable results in routine applications [1]. As regulatory frameworks like the EU Official Controls Regulation increasingly require validated methods for official food control, establishing standardized validation protocols for NTMs becomes essential for widespread adoption [1]. This document provides a comprehensive roadmap for validating NTMs in food authenticity research, addressing both conceptual frameworks and practical implementation.

Performance Characteristics for NTM Validation

Core Validation Metrics

Validating non-targeted methods requires assessing a distinct set of performance characteristics that differ from those used for targeted methods. These metrics must collectively demonstrate that the method can reliably discriminate between authentic and non-authentic samples while remaining robust to expected biological and technical variations [15] [1].

Table 1: Essential Performance Characteristics for Non-Targeted Methods

Performance Characteristic	Definition	Assessment Approach
Specificity	Ability to correctly distinguish between defined sample classes	Evaluate separation between classes in multivariate space; assess against potential interferents
Accuracy	Agreement between predicted and true class membership	Calculate percentage of correct classifications using known validation samples
Precision	Agreement between repeated measurements of the same sample under stipulated conditions	Monitor variation in classification results and fingerprint stability across replicates
Robustness	Resilience of method performance to small, deliberate variations in method parameters	Test impact of slight alterations in analytical conditions on classification outcomes
Stability	Consistency of analytical fingerprints over time under specified storage conditions	Track signal drift and classification performance across multiple analytical batches
Applicability	Scope of sample types, origins, and processing methods reliably covered	Test method across diverse samples representing intended use cases
Reliability	Overall trustworthiness of results for intended purpose	Combined assessment of all performance characteristics relative to application context

Quantitative Acceptance Criteria

Establishing quantitative acceptance criteria for NTM performance characteristics requires a fit-for-purpose approach that considers the specific application and its associated risks [1]. The following criteria represent general benchmarks for food authenticity applications:

Classification Accuracy: Minimum of 95% correct classification for authenticity methods with significant economic or safety implications [1] [18]. For screening purposes, 90% may be acceptable with appropriate confirmatory testing protocols.
Precision: Relative standard deviation (RSD) of fingerprint features should be ≤20% for high-intensity features and ≤30% for low-intensity features in replicate analyses [18].
Specificity: Method should correctly reject at least 95% of non-authentic samples, including those with closely related profiles or common adulterants.
Robustness: Method performance should remain within preset acceptance limits when critical parameters (e.g., extraction time, mobile phase composition, instrumental settings) are deliberately varied.
Stability: Analytical fingerprints should remain stable with correlation coefficients ≥0.9 between fingerprints acquired over different days or by different operators.
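
Several of the benchmarks above can be evaluated with a few lines of code once validation results are available. The sketch below covers the accuracy, specificity, and stability criteria; the function names and the example intensities are hypothetical, and the Pearson correlation is used as one plausible reading of "correlation coefficient".

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length fingerprints."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def check_acceptance(accuracy, rejection_rate, fingerprint_correlation,
                     screening=False):
    """Compare results with the general benchmarks listed in the text."""
    min_accuracy = 0.90 if screening else 0.95
    return {
        "classification_accuracy": accuracy >= min_accuracy,
        "specificity": rejection_rate >= 0.95,
        "stability": fingerprint_correlation >= 0.9,
    }


# Hypothetical feature intensities for the same sample measured on two days.
day_1 = [100.0, 250.0, 40.0, 900.0]
day_2 = [110.0, 240.0, 45.0, 880.0]
r = pearson(day_1, day_2)

result = check_acceptance(accuracy=0.96, rejection_rate=0.97,
                          fingerprint_correlation=r)
print(f"r = {r:.4f}", result)
```

Robustness is left out of this sketch because its acceptance limits are preset per method rather than given as a general number.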

Validation Workflow Implementation

Stage-Gate Validation Approach

Implementing a structured validation workflow ensures comprehensive assessment of all critical NTM performance characteristics. The following diagram illustrates the recommended stage-gate approach:

Sample Requirements and Considerations

A comprehensive validation study requires carefully selected and characterized samples that represent the full scope of the method's intended application [1]. The sample set must include:

Authentic Reference Samples: Well-characterized samples representing each class the method aims to distinguish (e.g., geographic origins, species, production methods). These should cover natural variability in composition due to seasonality, processing, and genetic factors.
Challenging Samples: Samples that test the method's boundaries, including closely related classes, blended products, and processed goods. For example, in spelt authentication, this includes atypical spelt cultivars and spelt-wheat crosses [18].
Quality Control Materials: Representative materials analyzed throughout the validation to monitor method stability and performance over time.

Sample size requirements depend on the complexity of the classification problem, but as a general guideline, a minimum of 20-30 independent samples per class is recommended for initial validation, with additional samples for external validation [1] [18].

Experimental Protocol: NTM for Spelt Authentication

Detailed Methodology

The following protocol details a validated non-targeted method for distinguishing spelt from wheat using LC-HRMS and convolutional neural networks, adapted from a published study [18]. This exemplifies the practical implementation of NTM validation principles.

Sample Preparation

Homogenization: Begin with thorough homogenization of grain samples using a standardized milling procedure to achieve consistent particle size distribution.
Extraction: Weigh 50 ± 1 mg of homogenized sample into a 2 mL microcentrifuge tube. Add 1 mL of HPLC-grade methanol. Vortex for 1 minute until fully suspended.
Sonication and Centrifugation: Sonicate the suspension for 30 minutes at 30°C in a controlled water bath. Centrifuge at 1400 × g for 5 minutes to pellet insoluble material.
Filtration: Carefully transfer the supernatant to a new vial through a 0.2 μm nylon syringe filter. Store filtered extracts at 4°C if not analyzing immediately, with maximum storage duration of 24 hours.

LC-HRMS Analysis

Chromatographic Conditions:
- Column: Thermo Scientific Accucore Phenyl Hexyl (100 mm × 2.1 mm ID × 2.6 μm)
- Mobile Phase: A: Water with 0.1% formic acid; B: Acetonitrile with 0.1% formic acid
- Gradient: 5-90% B over 16 minutes, hold 4 minutes, return to 5% B
- Flow Rate: 0.3 mL/min
- Temperature: 40°C
- Injection Volume: 5 μL
Mass Spectrometry Parameters:
- Instrument: Thermo Scientific Orbitrap Exploris 240 HRMS or equivalent high-resolution mass spectrometer
- Ionization: H-ESI with positive mode (spray voltage: 3500 V) and negative mode (spray voltage: 2500 V)
- Scan Range: 70-1000 m/z
- Resolution: 60,000 FWHM
- Data Acquisition: Full MS/dd-MS² mode
Quality Control: Include system suitability tests and quality control samples (pooled quality control from all samples) at regular intervals throughout the sequence to monitor instrument stability.
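
The gradient described in the chromatographic conditions can be written as a simple function of time, which is useful for documenting the method or checking an instrument method file. In this sketch the function name is illustrative, and the return to 5% B is modeled as an instantaneous step because the protocol does not state a ramp or re-equilibration time.

```python
def percent_b(t_min):
    """Mobile phase B (%) at time t_min for the gradient in the protocol."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= 16.0:
        # Linear ramp from 5% to 90% B over 16 minutes.
        return 5.0 + (90.0 - 5.0) * t_min / 16.0
    if t_min <= 20.0:
        # Hold at 90% B for 4 minutes.
        return 90.0
    # Assumption: immediate return to starting conditions (duration not stated).
    return 5.0


for t in (0, 8, 16, 18, 21):
    print(f"{t:>2} min: {percent_b(t):.1f}% B")
```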

Data Processing and Model Training

Data Preprocessing: Convert raw files to open formats (e.g., mzML). Perform peak detection, alignment, and normalization using software such as MS-DIAL or XCMS.
Feature Table Construction: Create a data matrix with samples as rows and spectral features (m/z-RT pairs with intensities) as columns.
Convolutional Neural Network Architecture:
- Input layer matching dimensions of preprocessed spectral data
- Two convolutional layers with ReLU activation for feature learning
- Max pooling layers for dimensionality reduction
- Fully connected layers for classification
- Softmax output layer for class probability estimation
Model Validation: Implement nested cross-validation to avoid overfitting. Use external validation sets including artificially mixed spectra, processed goods, and atypical samples not included in model training [18].
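
Nested cross-validation, as mentioned in the model validation step, separates hyperparameter tuning (inner loop) from performance estimation (outer loop). The sketch below only generates the index splits and is not the implementation used in the cited study; the function names and fold counts are assumptions. In practice the samples would also be shuffled, stratified by class, and grouped so that replicates of one sample never fall on both sides of a split.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k near-equal folds (no shuffling, for clarity)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_test, inner_splits); inner splits use outer-training samples only."""
    for outer_test in k_fold_indices(n_samples, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in range(n_samples) if i not in held_out]

        inner_splits = []
        for fold in k_fold_indices(len(outer_train), inner_k):
            inner_val = [outer_train[j] for j in fold]
            val_set = set(inner_val)
            inner_train = [i for i in outer_train if i not in val_set]
            inner_splits.append((inner_train, inner_val))
        yield outer_test, inner_splits


splits = list(nested_cv_splits(20, outer_k=5, inner_k=3))
for outer_test, inner_splits in splits:
    print("outer test:", outer_test, "| inner folds:", len(inner_splits))
```

Because the outer test fold never enters the inner loop, the outer performance estimate is not biased by model tuning.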

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 2: Essential Research Reagents and Materials for NTM Development

Category	Specific Items	Function and Application Notes
Chromatography	Accucore Phenyl Hexyl Column (100 mm × 2.1 mm ID × 2.6 μm)	Provides chromatographic separation of complex food extracts with enhanced selectivity
Mobile Phase Additives	LC-MS grade water with 0.1% formic acid, LC-MS grade acetonitrile with 0.1% formic acid	Enhances ionization efficiency in positive ESI mode and improves chromatographic peak shape
Sample Preparation	HPLC-grade methanol, 0.2 μm nylon syringe filters, 2 mL microcentrifuge tubes	Ensures efficient extraction and removal of particulate matter that could compromise LC-MS system
Mass Spectrometry	Thermo Scientific Orbitrap Exploris 240 HRMS or equivalent high-resolution mass spectrometer	Delivers high mass accuracy (<5 ppm) and resolution (60,000 FWHM) essential for non-targeted fingerprinting
Data Processing	Compound Discoverer 3.3, XCMS, MS-DIAL, Python with TensorFlow/Keras for CNN development	Enables comprehensive data analysis, from feature detection to advanced machine learning implementation
Reference Databases	mzCloud, ChemSpider, PubChem, LipidMaps	Facilitates metabolite annotation and identification with spectral matching and accurate mass data
Quality Control	Custom quality control samples, internal standards, system suitability mixtures	Monitors analytical performance throughout method development and validation
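
The mass accuracy figure in the table (<5 ppm) refers to the relative difference between measured and theoretical m/z. A minimal sketch of that calculation follows; the function names and the m/z values in the example are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million relative to the theoretical m/z."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=5.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Hypothetical values: a 0.0006 m/z deviation at m/z 200 is about 3 ppm.
print(f"{ppm_error(200.0006, 200.0):.2f} ppm")
print(within_tolerance(200.0006, 200.0))
print(within_tolerance(200.0020, 200.0))
```

A check of this kind is typically applied to the system suitability mixture before and during a measurement sequence.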

Implementation Considerations and Regulatory Alignment

Successfully implementing validated NTMs in routine testing environments requires addressing several practical considerations. The reference database forms a critical component of the NTM and must be carefully constructed, maintained, and documented [1]. Database management should include regular updates, version control, and metadata documentation covering sample provenance, analytical conditions, and data processing parameters.

For regulatory acceptance, NTMs should demonstrate equivalence or superiority to existing standardized methods when available. The validation approach should align with relevant regulatory frameworks such as the EU Official Controls Regulation, Regulation (EU) 2017/625, which requires official food control laboratories to use validated methods [1]. Engaging with standardization bodies early in the method development process can facilitate eventual method standardization through multi-laboratory validation studies.

When implementing NTMs for routine use, establish ongoing verification procedures including regular analysis of quality control materials and periodic assessment of model performance. Monitor for concept drift where the relationship between fingerprint and authenticity may change over time due to seasonal variations, new agricultural practices, or evolving fraud patterns. Implement procedures for model retraining or updating when performance monitoring indicates degradation.
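
Ongoing verification of this kind can be supported by tracking model performance on quality control materials of known class. The sketch below is one simple way to do so with a rolling window; the class name, window size, and accuracy threshold are assumptions for illustration and would need to be set according to the method's validated performance.

```python
from collections import deque


class DriftMonitor:
    """Track rolling classification accuracy on QC samples of known class."""

    def __init__(self, window=20, min_accuracy=0.90):
        self.results = deque(maxlen=window)   # keeps only the latest results
        self.min_accuracy = min_accuracy

    def record(self, predicted, known):
        self.results.append(predicted == known)

    def rolling_accuracy(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_review(self):
        """Flag only once the window is full, to avoid reacting to few results."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.rolling_accuracy() < self.min_accuracy


# Hypothetical QC history: 8 correct and 2 incorrect predictions.
monitor = DriftMonitor(window=10, min_accuracy=0.90)
history = [("authentic", "authentic")] * 8 + [("authentic", "adulterated")] * 2
for predicted, known in history:
    monitor.record(predicted, known)

print(monitor.rolling_accuracy(), monitor.needs_review())
```

A flag from such a monitor is a prompt to investigate, for example by checking instrument performance first, before deciding whether the model needs retraining.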

The integration of NTMs with targeted methods creates a powerful framework for comprehensive food authenticity testing, where NTMs provide broad screening capabilities and targeted methods deliver confirmatory analysis for specific adulterants [13]. This combined approach maximizes the strengths of both methodologies while mitigating their individual limitations.

In the field of food authenticity research, the selection of an analytical approach is a fundamental decision that directly impacts the ability to detect fraud and mislabeling. The two primary paradigms—targeted and non-targeted analysis—offer distinct pathways for verifying food authenticity, each with its own operational principles, capabilities, and limitations [58] [13]. Targeted methods focus on quantifying predefined analytes, operating on a "needle in a haystack" principle where specific adulterants or markers are known and measured [15]. In contrast, non-targeted methods (NTMs) exploit the entire "haystack," generating a comprehensive fingerprint of a sample without prior knowledge of its specific constituents [15]. The growing complexity of global food supply chains, coupled with increasingly sophisticated adulteration practices, has accelerated the adoption of NTMs over the past decade [58] [17]. This application note provides a structured comparison of these approaches, detailed experimental protocols for NTM implementation, and essential considerations for their validation within food authenticity research.

Core Principles and Comparative Analysis

Defining the Approaches

Targeted Analysis is a hypothesis-driven approach. It is used to answer a specific question, such as "Is melamine present in this milk powder?" or "What is the concentration of a specific pesticide?" [20]. This approach relies on a priori knowledge of specific analytes (e.g., a known adulterant like Sudan Red dye or melamine) and methods are optimized for the detection and precise quantification of these predefined targets [58] [13]. Techniques like LC-MS/MS (Liquid Chromatography-Tandem Mass Spectrometry) are industry standards for such applications due to their high sensitivity and selectivity for specific compounds [64].

Non-Targeted Analysis (NTA), conversely, is a screening method that seeks to answer a more general question: "Does this sample look normal or authentic compared to a reference set of known authentic samples?" [20]. Instead of measuring predefined compounds, NTMs aim to capture a global profile or fingerprint of a sample, often by measuring thousands of data points without initial knowledge of their chemical identity [20] [13]. The resulting fingerprints from authentic samples are used to build statistical models, and unknown samples are compared against these models to detect anomalies indicative of fraud [20] [2].
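To make the "compare against a model of authentic samples" idea concrete, the sketch below fits a PCA model to reference fingerprints and flags a sample whose residual distance exceeds an empirical limit. It is one simple one-class approach among several, run on synthetic data; the function names, the number of components, and the 95th-percentile limit are assumptions for illustration.

```python
import numpy as np


def residual_distance(X, mean, loadings):
    """Squared distance of each fingerprint from the PCA model plane."""
    centred = X - mean
    reconstructed = centred @ loadings.T @ loadings
    return np.sum((centred - reconstructed) ** 2, axis=1)


def fit_authentic_model(X_ref, n_components=2, percentile=95.0):
    """Fit a PCA model on authentic fingerprints and derive a residual limit."""
    mean = X_ref.mean(axis=0)
    # Rows of vt are the principal axes of the mean-centred reference data.
    _, _, vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    loadings = vt[:n_components]
    q_ref = residual_distance(X_ref, mean, loadings)
    limit = np.percentile(q_ref, percentile)  # empirical limit (assumption)
    return mean, loadings, limit


rng = np.random.default_rng(0)
# Synthetic "authentic" fingerprints: two latent factors plus small noise.
axes = rng.normal(size=(2, 20))
X_ref = rng.normal(size=(60, 2)) @ axes + 0.05 * rng.normal(size=(60, 20))
mean, loadings, limit = fit_authentic_model(X_ref)

# Synthetic suspect sample: authentic pattern plus extra signal on 5 variables.
suspect = rng.normal(size=(1, 2)) @ axes
suspect[0, :5] += 1.0
q_suspect = residual_distance(suspect, mean, loadings)[0]
print("flagged as atypical:", bool(q_suspect > limit))
```

The output is a statement about similarity to the reference set, not an identification of the adulterant, which is why a flagged sample usually proceeds to confirmatory analysis.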

Comparative Tables of Strengths and Limitations

Table 1: Strategic Comparison of Targeted vs. Non-Targeted Approaches.

Aspect	Targeted Analysis	Non-Targeted Analysis
Analytical Question	"Is it there, and how much?" [20]	"Does it look normal or not?" [20]
Principle	Quantifies predefined analytes (the "needle") [15] [13]	Comprehensive profiling of the sample "fingerprint" (the "haystack") [15] [13]
Ideal Use Case	Detecting known, specific adulterants; compliance testing [20] [64]	Detecting unknown adulterations; authenticity screening; origin verification [20] [13]
Data Output	Quantitative, definitive concentration of specific analytes	Probabilistic, based on similarity to a model (e.g., "likely to be authentic") [20]
Result Interpretation	Straightforward, direct comparison to a regulatory limit or threshold [20]	Complex, requires statistical models (chemometrics, machine learning) and reference databases [20] [2]

Table 2: Technical and Operational Comparison.

Aspect	Targeted Analysis	Non-Targeted Analysis
Throughput	Can be lower due to complex sample preparation [13]	Generally higher throughput after model development [13]
Development & Cost	Method development per analyte is required; cost-effective for routine targeted checks	High initial R&D cost for model and database building; requires significant expertise [20] [2]
Sensitivity & Quantification	High sensitivity and excellent quantification capabilities [64]	Varies; typically less sensitive and not inherently quantitative
Flexibility	Limited to known targets; cannot detect unforeseen fraud [13]	Broadly flexible; can detect unanticipated deviations if present in the fingerprint [13]
Validation	Well-established, standardized protocols (e.g., ISO, AOAC) [15]	Evolving and complex validation frameworks; no universally accepted standard [15] [2]

Non-Targeted Method Workflow and Protocols

The power of NTMs lies in a rigorous, multi-stage workflow that transforms a raw sample into a reliable authenticity assessment. The following protocol and diagram outline the critical stages for an NMR-based NTM, a technique prized for its high reproducibility and robustness across laboratories [2].

Generalized Workflow for Non-Targeted Analysis

Diagram 1: Generalized workflow for non-targeted analysis.

Detailed Protocol: NMR-Based Non-Targeted Fingerprinting for Liquid Food Authentication (e.g., Wine, Honey)

This protocol is adapted from established methodologies in food metabolomics [2].

3.2.1 Scope

This protocol describes the procedure for using Nuclear Magnetic Resonance (NMR) spectroscopy to create a non-targeted fingerprint for authenticating the geographical origin of liquid foodstuffs such as wine and honey.

3.2.2 Experimental Workflow

Diagram 2: Experimental workflow for NMR-based non-targeted analysis.

3.2.3 Materials and Reagents

NMR Spectrometer: A high-field instrument (e.g., 600 MHz) equipped with a cryogenically cooled probe for enhanced sensitivity [2].
NMR Tubes: 5 mm precision NMR tubes.
Chemical Reference Standard: Trimethylsilylpropanoic acid-d4 sodium salt (TSP-d4), which serves as an internal standard for chemical shift referencing (δ 0.0 ppm) and can also be used for quantification [2].
Deuterated Solvent: Deuterium oxide (D2O) is used in the buffer to provide a stable lock signal for the NMR spectrometer.
Phosphate Buffer: 0.2 M sodium phosphate buffer, pD 7.0 (note: pD = pH meter reading + 0.4), prepared in D2O. The buffer minimizes pH-induced chemical shift variations across samples.

3.2.4 Procedure

Sample Preparation: Follow steps 1-5 as outlined in Diagram 2. Consistent preparation is critical to minimize technical variance.
Data Acquisition: Follow steps 1-4 as outlined in Diagram 2. Using a standardized, validated pulse sequence across all samples is essential for reproducibility [2].
Data Processing: Follow steps 1-6 as outlined in Diagram 2. This step converts raw spectral data into a structured data matrix suitable for statistical analysis.

3.2.5 Statistical Modeling, Validation, and Database Building

Model Training: Use the processed spectral data from the authenticated training set. Apply multivariate statistical methods such as Principal Component Analysis (PCA) for exploratory data analysis and Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA) or machine learning classifiers (e.g., Support Vector Machines) to build a classification model [13] [2].
Model Validation: This is a critical step for assessing predictive ability and preventing overfitting.
- Cross-Validation: Employ k-fold cross-validation using the training set.
- External Validation: Test the model's performance on a fully independent set of validated samples that were not used in any stage of model building [2].
Database Building: Store the raw spectra, processed data, and model parameters in a secure database. This facilitates future queries and continuous model improvement as more authentic samples are analyzed [2].
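The training and validation steps above might look as follows in Python with scikit-learn. This is a sketch under stated assumptions: the data are synthetic stand-ins for binned spectra, a linear support vector machine replaces OPLS-DA (which scikit-learn does not provide), and the split sizes and number of principal components are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Synthetic stand-in for binned spectra: 80 samples x 50 buckets, two origins.
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 50))
X[y == 1, :10] += 3.0  # origin 1 differs in the first ten buckets

# Hold back an external set that plays no part in model building.
X_train, X_ext, y_train, y_ext = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling and PCA sit inside the pipeline so they are refit in every fold.
model = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="linear"))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_accuracy = cross_val_score(model, X_train, y_train, cv=cv)

model.fit(X_train, y_train)
external_accuracy = model.score(X_ext, y_ext)
print(f"CV accuracy: {cv_accuracy.mean():.2f}, external: {external_accuracy:.2f}")
```

Keeping the preprocessing inside the pipeline matters: scaling or PCA fitted on all samples before splitting would leak information from the held-out folds into the model.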

The Scientist's Toolkit: Essential Research Reagent Solutions

The following table details key reagents and materials essential for implementing non-targeted methods, particularly in metabolomics-based authenticity studies.

Table 3: Essential Reagents and Materials for Non-Targeted Food Authenticity Research.

Item	Function / Application	Key Considerations
Internal Standards (e.g., TSP-d4 for NMR)	Chemical shift referencing, quantification, and quality control of the analytical run [2].	Must be inert and not present in the native sample. Concentration should be in the linear dynamic range of the detector.
Deuterated Solvents (e.g., D2O, CD3OD)	Provides a field-frequency lock for NMR spectrometers, ensuring spectral stability [2].	Purity is critical. The choice of solvent depends on the solubility of the food matrix.
Stable Isotope-Labeled Standards (for MS)	Used in some NTA MS workflows as internal standards for signal correction and to aid in compound identification.	13C-labeled compounds are ideal as they co-elute with analytes in LC.
Certified Reference Materials (CRMs)	Used for instrument calibration, method validation, and as a benchmark for authentic samples.	Should be matrix-matched when possible. Sourced from reputable providers (e.g., NIST, IRMM).
Buffers (e.g., Phosphate Buffer)	Controls pH to minimize chemical shift variance in NMR and stabilize the sample [2].	Buffer concentration and pH/pD must be consistent across all samples in a study.
Solid Phase Extraction (SPE) Kits	For sample clean-up or fractionation to reduce matrix effects, particularly in complex matrices for MS analysis.	Select sorbent chemistry based on the broad class of metabolites of interest (e.g., reversed-phase for lipids).

The choice between targeted and non-targeted approaches is not a matter of superiority but of strategic application. Targeted methods provide definitive, quantitative answers for known adulterants and are the backbone of compliance and regulatory testing. Non-targeted methods, conversely, offer a powerful hypothesis-generating screen capable of defending against unknown and evolving fraud threats. The future of food authenticity testing lies in their integrated use, leveraging the strengths of each to create a robust defense system. Furthermore, the successful implementation of NTMs hinges on overcoming challenges related to validation, standardization, and the construction of high-quality, extensive reference databases [15] [2]. As collaborative efforts and technological advancements continue to mature NTM validation frameworks, these methods are poised to become indispensable tools in ensuring global food integrity and consumer confidence.

Food authenticity research has become a critical frontier in ensuring food safety, protecting consumer rights, and promoting fair trade practices globally. Within this domain, non-targeted methods (NTMs) represent a paradigm shift from traditional analytical approaches, moving from the detection of predefined adulterants to the comprehensive fingerprinting of food matrices for authenticity verification [13]. These methods are particularly valuable for detecting unknown or unexpected adulterants, which would likely evade conventional targeted analysis [21]. The burgeoning application of NTMs in food authenticity research, however, introduces significant challenges pertaining to method validation, reproducibility, and regulatory acceptance.

This application note delineates the contemporary landscape of standardization initiatives led by prominent international organizations to address these challenges. By establishing harmonized protocols, performance criteria, and validation frameworks, these initiatives are pivotal in transforming NTMs from research tools into reliable, standardized procedures fit for purpose in compliance-driven environments.

The Organizational Landscape of Food Authenticity Standardization

The standardization ecosystem for food authenticity is a multi-layered structure involving international governmental bodies, non-governmental standards organizations, and industry consortia. The table below summarizes the key organizations and their primary focus areas relevant to NTM development.

Table 1: Key Standardization Organizations in Food Authenticity

Organization	Primary Role & Focus	Key Activities Related to NTMs
AOAC INTERNATIONAL	Development of official methods of analysis; ensures safety and integrity of foods [70].	Food Authenticity Methods (FAM) program with dedicated working groups for Non-Targeted Testing; development of Standard Method Performance Requirements (SMPRs) [70].
Codex Alimentarius Commission (CAC)	Joint FAO/WHO body; develops international food standards to protect consumer health and ensure fair trade [73].	Provides a collection of food standards and guidelines; has an active electronic working group to define Food Fraud/Authenticity for inclusion in the Codex Alimentarius [73].
International Organization for Standardization (ISO)	Develops voluntary, consensus-based international standards across industries [73].	Technical Committee ISO/TC 34 develops standards for food products, including horizontal methods for molecular biomarker analysis which support authenticity testing [73].
European Committee for Standardization (CEN)	Develops and defines voluntary standards at the European level [73].	Technical Committee CEN/TC 460 "Food Authenticity" was established in 2019 to standardize analytical methods for verifying food authenticity [73].

Deep Dive into AOAC's Food Authenticity Methods (FAM) Program

Program Objectives and Structure

AOAC INTERNATIONAL’s FAM program is a preeminent initiative specifically designed to "identify analytical tools to better locate and characterize the intentional and economically motivated adulteration of foods" [70]. The program was launched with a clear focus on addressing the analytical gaps in combating food fraud, with an initial emphasis on the most adulterated food commodities, namely olive oil, milk, and honey [70].

The program is structured around dedicated working groups that drive its scientific agenda:

Non-Targeted Testing Working Group: Focused on developing standards for NTT of foods, including generic SMPRs to evaluate the reliability and usefulness of NTT methodologies [70].
Targeted Testing Working Group: Aims to survey the targeted testing landscape, identify existing methodologies, and prioritize gaps for AOAC standards development [70].
Molecular Applications Working Group: Concentrates on DNA-based and related methods.
Matrix-Specific Subgroups: Dedicated groups for high-risk commodities like olive oil, honey, and milk/milk products [70].

Key Outputs and Future Directions

The FAM program has achieved significant milestones, including the development of six SMPRs for honey, milk products, and extra virgin olive oil, with further SMPRs for botanicals and spices (vanilla, saffron, turmeric) nearing adoption [70]. These SMPRs are critical because they define the minimum performance criteria that a method must meet to attain AOAC Official Methods status, giving method developers a clear target.

Future work is strategically planned to expand into new matrices and develop emergency response capabilities:

Expansion to New Matrices: Development of SMPRs for meat and seafood adulteration is underway [70].
Emergency Response Protocol: Development of an emergency response guidance and process decision tree (E-RAMP) to assist in addressing emergency adulteration events that pose a public health threat [70].
Training Programs: Development of training programs to build capacity and ensure consistent application of standardized methods [70].

Experimental Protocol: A Template for NTM Development and Validation

The following protocol provides a generalized, step-by-step framework for developing and validating a non-targeted mass spectrometry method for food authenticity, drawing from established approaches in the scientific literature [8] [7].

1. Objective: To develop a non-targeted LC-HRMS method coupled with chemometric analysis for reliable discrimination between spelt (Triticum spelta) and wheat (Triticum aestivum) cultivars.

2. Experimental Workflow:

3. Materials and Reagents:

Samples: Authentic and verified cultivars of spelt and wheat (e.g., 11 cultivars each, measured in duplicate) [8].
Solvents: LC-MS grade water, acetonitrile, and methanol.
Additives: Formic acid or ammonium formate/acetate for mobile phase modification.

4. Equipment:

Liquid Chromatography System: UHPLC or HPLC system.
Mass Spectrometer: High-resolution mass analyzer (e.g., Time-of-Flight (TOF) or Orbitrap) [8].
Chromatography Column: Reversed-phase C18 column (e.g., 100 mm x 2.1 mm, 1.8 µm).
Data Processing Software: Software for pre-processing (e.g., XCMS, MS-DIAL) and for statistical analysis (e.g., R, Python, with libraries for machine learning).

5. Procedure:

5.1. Sample Preparation:

Grind grain samples to a fine, homogeneous powder using a laboratory mill.
Weigh a precise amount (e.g., 100 mg) of the powdered sample.
Perform protein extraction using a suitable buffer (e.g., urea/thiourea buffer) or simple solvent extraction (e.g., aqueous methanol) [8].
Centrifuge the extract to remove particulate matter.
Dilute the supernatant appropriately with the initial mobile phase and transfer to an LC vial.

5.2. LC-HRMS Analysis:

Chromatographic Conditions:
- Mobile Phase A: Water with 0.1% formic acid.
- Mobile Phase B: Acetonitrile with 0.1% formic acid.
- Use a linear gradient (e.g., from 5% B to 95% B over 20 minutes).
- Flow rate: 0.3 mL/min; Column temperature: 40°C.
Mass Spectrometric Conditions:
- Data acquired in data-dependent acquisition (DDA) or data-independent acquisition (DIA, e.g., SWATH) mode [8].
- Scan range: m/z 50-1200.
- Polarity: Positive and/or negative electrospray ionization (ESI).
- Ensure mass accuracy (< 5 ppm) and resolution (e.g., > 20,000 FWHM).
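The mass accuracy and resolution criteria above can be checked with two short calculations. The sketch uses the standard definitions (signed error in parts per million, and resolving power as m/z divided by the peak width at half maximum); the calibrant values are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz, fwhm):
    """Resolving power as m/z divided by peak full width at half maximum."""
    return mz / fwhm


# Hypothetical calibrant ion: theoretical m/z 500.0000 observed at 500.0020
# with a peak width (FWHM) of 0.0200.
error = ppm_error(500.0020, 500.0000)
power = resolving_power(500.0020, 0.0200)
print(f"mass error: {error:.1f} ppm, within 5 ppm: {abs(error) < 5.0}")
print(f"resolving power: {power:.0f}, above 20,000: {power > 20000}")
```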

5.3. Data Pre-processing:

Convert raw data files to an open format (e.g., mzML).
Perform peak picking, alignment, and retention time correction.
Create a data matrix comprising sample IDs, peak indices (m/z and RT), and corresponding intensities.

6. Data Analysis and Model Building:

Data Pretreatment: Normalize the data matrix (e.g., probabilistic quotient normalization) and scale (e.g., unit variance or Pareto scaling).
Dimensionality Reduction & Model Training: Utilize a nested cross-validation (NCV) approach to train a classification model, such as a Convolutional Neural Network (CNN), on the pre-processed spectral data [8]. The NCV protects against over-optimism by having an outer loop for validation and an inner loop for model selection and tuning.
Model Validation: Validate the final model using a fully external validation set. This set should include samples not used in model building, such as atypical cultivars, processed goods (e.g., spelt bread), and artificially mixed samples [8].
Introduction of a Quantitative Metric: Employ a quantitative metric, such as the D score, to evaluate and compare classification decisions. The D score provides a continuous measure of confidence in the classification, moving beyond a simple binary outcome [8].
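The data pretreatment step named above (probabilistic quotient normalization followed by Pareto scaling) can be sketched with NumPy. This is a simplified illustration: the function names are assumptions, the median spectrum is used as the PQN reference, and the optional preliminary total-area normalization is omitted.

```python
import numpy as np


def pqn_normalize(X):
    """Probabilistic quotient normalization of a samples x features matrix."""
    reference = np.median(X, axis=0)           # median spectrum as reference
    usable = reference > 0                     # avoid dividing by zero
    quotients = X[:, usable] / reference[usable]
    dilution = np.median(quotients, axis=1)    # most probable dilution factor
    return X / dilution[:, None]


def pareto_scale(X):
    """Mean-centre each feature and divide by the square root of its std."""
    centred = X - X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                        # leave constant features unscaled
    return centred / np.sqrt(std)


# Synthetic check: the same profile measured at three dilutions.
profile = np.array([10.0, 200.0, 35.0, 4.0, 80.0])
X = np.outer([1.0, 2.0, 4.0], profile)
X_pqn = pqn_normalize(X)
print("rows identical after PQN:", np.allclose(X_pqn[0], X_pqn[2]))

rng = np.random.default_rng(1)
X_var = X_pqn * rng.uniform(0.8, 1.2, size=X_pqn.shape)  # add sample-to-sample spread
X_scaled = pareto_scale(X_var)
```

When these steps are used inside cross-validation, the reference spectrum and the scaling parameters should be derived from the training portion only, so that held-out samples do not influence the pretreatment.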

The Scientist's Toolkit: Essential Reagents and Materials

The following table catalogues key reagents, materials, and software solutions essential for conducting non-targeted food authenticity research, particularly following the protocol above.

Table 2: Essential Research Reagent Solutions for NTM Food Authenticity

Item Name	Function/Brief Explanation	Example/Specification
LC-MS Grade Solvents	High-purity solvents to minimize background noise and ion suppression in MS, ensuring high-quality data.	Water, Acetonitrile, Methanol [8].
High-Resolution Mass Spectrometer	Instrument for accurate mass measurement; foundational for untargeted profiling and compound identification.	Time-of-Flight (TOF), Orbitrap, Q-TOF [8].
Chromatography Column	Stationary phase for separating complex food matrices, reducing ion suppression and co-elution.	Reversed-phase C18 column (e.g., 100x2.1mm, 1.8µm) [8].
Data Pre-processing Software	Converts raw spectral data into a structured peak intensity table for statistical analysis.	XCMS, MS-DIAL, OpenMS [8].
Chemometric/Machine Learning Software	Platform for building classification and regression models to interpret complex, multivariate data.	R, Python (with scikit-learn, TensorFlow, PyTorch) [74] [8].
Certified Reference Materials	Authentic samples with verified claims; crucial for building and validating robust classification models.	Samples from verified cultivars, geographical origins, or production methods [8].

Validation Framework for Non-Targeted Methods

The validation of NTMs presents unique challenges compared to targeted methods, as they are not designed to quantify a specific analyte but to detect patterns of difference. A fit-for-purpose validation framework is therefore essential. Key considerations include [7]:

Specificity/Sensitivity: Defined by the method's ability to correctly classify authentic (specificity) and non-authentic (sensitivity) samples. This requires a large and diverse set of validation samples covering expected biological and processing variations.
Robustness: The resilience of the method's performance to small, deliberate variations in analytical parameters (e.g., column age, mobile phase composition, instrument calibration).
Cross-Validation: The use of strategies like nested cross-validation is critical to obtain a realistic estimate of the model's predictive performance and to prevent overfitting [8].
Data Quality: Ensuring the stability and performance of the instrumental platform throughout the analysis is a prerequisite for obtaining valid, high-quality data for modeling.
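A compact way to combine two of the considerations above, nested cross-validation and sensitivity/specificity, is shown below with scikit-learn. It is a sketch on synthetic data: a linear support vector machine stands in for whichever classifier is actually used, and the fold counts and tuning grid are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
# Synthetic feature table: label 0 = authentic, label 1 = non-authentic.
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 40))
X[y == 1, :8] += 2.5

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)  # tuning
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)  # assessment

pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))
search = GridSearchCV(pipeline, {"svc__C": [0.01, 0.1, 1.0, 10.0]}, cv=inner)

# Each outer fold reruns the whole inner search on its own training part.
y_pred = cross_val_predict(search, X, y, cv=outer)

tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)   # non-authentic samples correctly flagged
specificity = tn / (tn + fp)   # authentic samples correctly accepted
print(f"sensitivity: {sensitivity:.2f}, specificity: {specificity:.2f}")
```

Because hyperparameters are chosen only within the inner loop, the outer-loop predictions give a less optimistic performance estimate than a single cross-validation that both tunes and evaluates.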

The data analysis workflow, from raw spectra to a validated classification model, can be visualized as a process of increasing information refinement, culminating in a performance assessment.

The collaborative and multi-faceted standardization efforts led by AOAC INTERNATIONAL, Codex, ISO, and CEN are fundamentally shaping the future of food authenticity research. By providing a structured pathway for the development, validation, and official recognition of non-targeted methods, these initiatives are transforming sophisticated research concepts into practical, reliable, and defensible analytical tools. The provided experimental protocol and validation framework offer a tangible template for researchers to align their work with these emerging standards. Adherence to these evolving guidelines is paramount for ensuring that NTM-based applications generate robust, reproducible, and internationally accepted results, thereby strengthening the global food supply chain against the persistent threat of fraud.

The globalization and increasing complexity of the food supply chain have intensified challenges in food authenticity and safety research. Non-targeted methods (NTMs) represent a paradigm shift in analytical science, moving from predefined "needles in a haystack" to exploiting all constituents of the haystack [15] [7]. Multi-omics integration provides the foundational framework to enhance these methods, enabling comprehensive molecular profiling from genomic to metabolomic levels. This article details application notes and protocols for implementing integrated multi-omics approaches within food authenticity research, addressing critical validation considerations and providing practical experimental workflows for researchers and scientists engaged in method development.

Food authenticity encompasses the genuine quality, true origin, and accurate declaration of food products, representing one of the three major attributes of food alongside safety and quality [22] [51]. Incidents of food fraud, including species substitution, geographical origin misrepresentation, and economic adulteration, have escalated globally, driving the need for sophisticated analytical approaches that can verify claims throughout the "field to table" continuum [22] [51]. Traditional targeted methods, while mature for known hazards, face limitations in detecting unknown contaminants, subtle adulteration patterns, and complex fraud scenarios requiring system-wide analysis [75] [76].

Multi-omics strategies integrate data from complementary analytical domains—genomics, proteomics, metabolomics, lipidomics, and others—to create comprehensive molecular fingerprints that can authenticate food origin, processing history, and biological identity with unprecedented precision [75] [22] [76]. The emerging discipline of foodomics applies these omics technologies alongside biostatistics, chemometrics, and bioinformatics to address authentication challenges [22] [51]. This approach enables researchers to move beyond single-marker analysis toward system-wide pattern recognition, facilitating the detection of increasingly sophisticated food fraud schemes that evade conventional testing methodologies [75] [76].

Table 1: Omics Technologies and Their Applications in Food Authenticity

Omics Technology	Analytical Focus	Key Applications in Food Authenticity	Advantages
Genomics [75] [22]	DNA structure and sequence	Species identification, geographical origin tracing, GMO detection	High stability of DNA, suitable for processed foods, high specificity
Proteomics [75] [22]	Protein expression and modification	Species authentication, processing method verification, allergen detection	Direct relationship to biological function, tissue-specific patterns
Metabolomics [75] [22]	Small molecule metabolites	Geographic origin, adulteration detection, freshness evaluation	Reflects both genotype and environment, rapid analysis
Lipidomics [22] [51]	Lipid profiles	Oil authenticity, thermal processing identification, dairy product authentication	High sensitivity to oxidation and processing changes
Flavoromics [22] [51]	Volatile compound profiles	Authenticity of spices, wines, and specialty foods	Correlates with sensory properties, high consumer relevance

Validation Framework for Non-Targeted Multi-omics Methods

The validation of non-targeted methods (NTMs) presents distinct challenges compared to traditional targeted approaches. Rather than assessing performance against predefined criteria for specific analytes, NTMs require fit-for-purpose validation demonstrating their ability to detect meaningful patterns and differences in complex datasets [15] [7]. This validation framework must address several critical performance characteristics specific to multi-omics applications in food authenticity research.

Key Validation Parameters for Non-Targeted Multi-omics Methods

Specificity and selectivity in NTMs refer to the method's ability to detect consistent and reproducible patterns that reliably differentiate between authentic and adulterated samples or between different geographical origins [15]. This is typically established using chemometric tools to demonstrate clear separation between well-characterized sample classes in multivariate space. Robustness must be evaluated against expected variations in sample preparation, analytical conditions, and instrumental performance, with demonstration that classification models remain stable under these variations [15].

For transferability to routine laboratories, standardized protocols for data acquisition, preprocessing, and model application must be established [15]. This includes defining quality control measures for ongoing method verification during routine implementation. The false positive and false negative rates should be characterized through rigorous testing with authentic samples and known adulterants, establishing decision thresholds that balance sensitivity and specificity according to the specific authenticity question [15] [7].
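Setting a decision threshold from characterized error rates can be illustrated as follows. The sketch assumes the model returns a score that increases with suspicion, and it picks the lowest threshold whose false positive rate stays within a chosen limit; the scores, function names, and the 10% limit are hypothetical.

```python
def rates_at_threshold(authentic_scores, adulterated_scores, threshold):
    """False positive and false negative rates when scores >= threshold are flagged."""
    fp = sum(s >= threshold for s in authentic_scores)
    fn = sum(s < threshold for s in adulterated_scores)
    return fp / len(authentic_scores), fn / len(adulterated_scores)


def pick_threshold(authentic_scores, adulterated_scores, max_fpr=0.05):
    """Lowest candidate threshold whose false positive rate stays within max_fpr."""
    candidates = sorted(set(authentic_scores) | set(adulterated_scores))
    for t in candidates:
        fpr, fnr = rates_at_threshold(authentic_scores, adulterated_scores, t)
        if fpr <= max_fpr:
            return t, fpr, fnr
    # No candidate meets the limit: flag nothing.
    return float("inf"), 0.0, 1.0


# Hypothetical model scores for validation samples of known status.
authentic = [0.05, 0.10, 0.12, 0.20, 0.22, 0.25, 0.30, 0.31, 0.35, 0.62]
adulterated = [0.40, 0.55, 0.70, 0.80, 0.85, 0.90, 0.92, 0.95]
threshold, fpr, fnr = pick_threshold(authentic, adulterated, max_fpr=0.10)
print(f"threshold={threshold}, false positives={fpr:.2f}, false negatives={fnr:.2f}")
```

Which error is costlier depends on the authenticity question: a screening method feeding confirmatory testing can tolerate more false positives than one used to reject consignments outright.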

Table 2: Validation Parameters for Non-Targeted Multi-omics Methods

Validation Parameter	Traditional Targeted Methods	Non-Targeted Multi-omics Methods	Recommended Approach for NTMs
Specificity	Ability to distinguish target analyte from interferents	Ability to generate reproducible patterns that differentiate sample classes	Demonstrate consistent separation of authenticated sample classes using PCA, PLS-DA, or other multivariate tools
Transferability	Demonstrated through inter-laboratory studies	Consistent pattern recognition across instruments and laboratories	Standardized data preprocessing, reference materials for signal correction, harmonized statistical models
Accuracy/Trueness	Comparison to reference materials or methods	Correlation with known truths through validated sample sets	Use of certified reference materials and spiked samples when available; cross-validation with orthogonal methods
Precision	Repeatability and reproducibility of quantitative results	Stability of classification models and pattern recognition	Repeated analyses of quality control samples; demonstration of consistent classification in replicate analyses
Sensitivity	Limit of detection for specific analytes	Minimal detectable difference between classes or minimal adulteration level	Serial dilution studies with known adulterants; determination of classification confidence at different adulteration levels

Multi-omics Integration Strategies and Computational Protocols

The integration of multi-omics data presents significant computational challenges due to differences in data scale, noise characteristics, and biological meaning across omics layers [77] [78]. Successful integration requires strategic selection of computational approaches based on data structure and research objectives.

Data Integration Methodologies

Vertical integration (also called matched integration) combines different omics data from the same biological samples, using the sample itself as an anchor point [78]. This approach is particularly powerful for understanding how molecular changes at one level (e.g., gene expression) correlate with changes at another level (e.g., protein abundance). Tools such as MOFA+ (Multi-Omics Factor Analysis) employ factor analysis to decompose variation across multiple omics datasets and identify latent factors that drive biological and technical variability [78]. Seurat v4 utilizes weighted nearest neighbor analysis to integrate mRNA, protein, and chromatin accessibility data from the same cells [78].

Diagonal integration addresses the more challenging scenario of integrating omics data from different samples, requiring the creation of artificial anchors based on biological similarity rather than direct sample matching [78]. Graph-Linked Unified Embedding (GLUE) uses graph variational autoencoders with prior biological knowledge to align features across different omics modalities, enabling triple-omic integration even when data originates from different sample sets [78].

Mosaic integration provides an alternative when experimental designs include various combinations of omics that create sufficient overlap across sample batches [78]. Tools such as COBOLT and MultiVI employ multimodal variational autoencoders to integrate mRNA and chromatin accessibility data in mosaic fashion, creating a unified representation of cells across datasets with partial overlap [78].

Web-Based Computational Protocol for Multi-omics Integration

For researchers without extensive computational expertise, web-based tools provide accessible platforms for multi-omics integration. The Analyst software suite offers a comprehensive workflow that can be executed in approximately 2 hours [79]:

Single-omics Data Analysis: Process transcriptomics/proteomics data using ExpressAnalyst and lipidomics/metabolomics data using MetaboAnalyst 5.0. These platforms perform quality control, normalization, and statistical analysis to identify significant features within each omics layer [79].
Knowledge-Driven Integration: Input significant features identified in step 1 into OmicsNet to construct and visualize biological networks that integrate multiple omics layers. This approach places differentially expressed features in the context of known biological pathways and interactions [79].
Data-Driven Integration: Use OmicsAnalyst to perform joint dimensionality reduction on normalized multi-omics data matrices. This multivariate approach identifies patterns and relationships across different omics modalities without relying on prior biological knowledge [79].
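For readers who prefer to script the data-driven step, the sketch below shows one simple form of joint dimensionality reduction: each omics block is autoscaled and down-weighted by its size, the blocks are concatenated, and PCA is applied. This is a generic early-integration approach on synthetic data, not a reproduction of what OmicsAnalyst computes; all names and dimensions are assumptions.

```python
import numpy as np


def block_scale(X):
    """Autoscale features, then weight the block by 1/sqrt(number of features)."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std / np.sqrt(X.shape[1])


def joint_pca(blocks, n_components=2):
    """PCA on concatenated, block-scaled omics matrices sharing the same samples."""
    combined = np.hstack([block_scale(b) for b in blocks])
    u, s, _ = np.linalg.svd(combined, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained


rng = np.random.default_rng(3)
group = np.repeat([0.0, 1.0], 10)[:, None]              # two sample groups
proteomics = rng.normal(size=(20, 300)) + 1.5 * group   # large block
metabolomics = rng.normal(size=(20, 40)) + 1.5 * group  # small block
scores, explained = joint_pca([proteomics, metabolomics])
print(scores.shape, explained.round(2))
```

The block weighting keeps the larger matrix from dominating the joint components purely because it contributes more variables.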

Figure 1: Comprehensive workflow for multi-omics analysis in food authentication, spanning sample preparation, data generation, computational integration, and authentication decision-making.

Application Notes: Experimental Protocols for Food Authenticity

Protocol 1: Species Authentication in Meat Products Using Genomic and Proteomic Integration

Background: Meat products are highly susceptible to species substitution fraud, where premium meats are adulterated with cheaper alternatives. Integrated genomic and proteomic analysis provides orthogonal verification for unambiguous species authentication [75] [22].

Sample Preparation:

DNA Extraction: Use silica-based membrane kits with additional purification steps to remove PCR inhibitors common in processed meat products. For highly processed samples, consider extraction methods optimized for degraded DNA [22].
Protein Extraction: Employ urea/thiourea-based extraction buffers for comprehensive protein solubilization. For dry-cured products, include rehydration steps and mechanical disruption to ensure complete extraction [75].

Analytical Methods:

Genomic Analysis: Perform droplet digital PCR (ddPCR) for absolute quantification of species-specific DNA targets. This method tolerates PCR inhibitors better than conventional quantitative PCR and improves quantification in complex mixtures [22].
Proteomic Analysis: Utilize LC-Orbitrap MS with data-independent acquisition (DIA) for comprehensive protein profiling. Database searching should include all potential contaminant species [75].
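
The absolute quantification that ddPCR provides rests on a Poisson correction of the positive-droplet fraction. The sketch below shows that calculation under stated assumptions: the droplet volume of 0.85 nL is a placeholder that must be replaced by the value for the instrument in use, and the idea of normalizing against a universal reference assay is hypothetical rather than part of the cited protocol.

```python
import math

def ddpcr_copies_per_ul(positive, total, droplet_volume_nl=0.85):
    """Poisson-corrected target concentration in copies per microlitre of reaction.

    droplet_volume_nl is an assumed value; use the figure for your instrument.
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    # Mean copies per droplet: lambda = -ln(fraction of negative droplets)
    lam = -math.log(1 - positive / total)
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to uL

def species_fraction(target_conc, reference_conc):
    """Target copies as a fraction of a (hypothetical) universal reference assay."""
    return target_conc / reference_conc

conc = ddpcr_copies_per_ul(positive=10_000, total=20_000)
```

With half of 20,000 droplets positive and the assumed droplet volume, the result is about 815 copies per microlitre. Note that a DNA copy ratio is not the same as a mass ratio of the meat species, which is why the protocol relies on reference curves from authentic standards.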

Data Integration and Interpretation:

Process genomic data to determine relative species percentages based on reference curves from authentic standards.
Analyze proteomic data using species-specific peptide markers, with quantification based on extracted ion chromatograms.
Integrate results using a decision tree approach: genomic data provides primary species identification, while proteomic data confirms tissue origin and processing history [75] [22].
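
A decision rule of this kind can be sketched in a few lines. The version below is a deliberate simplification: it only checks whether an undeclared species is supported by both data types, and it does not model tissue origin or processing history. The class name, field names, and both thresholds (1% DNA share, two marker peptides) are hypothetical and would need to be set from validation data.

```python
from dataclasses import dataclass

@dataclass
class SpeciesEvidence:
    species: str
    dna_percent: float      # relative share from the genomic assay
    peptides_detected: int  # species-specific marker peptides found by LC-MS

def classify(evidence, declared, dna_threshold=1.0, min_peptides=2):
    """Return (verdict, flagged species) for one sample.

    Thresholds are illustrative placeholders, not validated limits.
    """
    confirmed, unconfirmed = [], []
    for e in evidence:
        if e.species in declared:
            continue
        dna_hit = e.dna_percent >= dna_threshold
        protein_hit = e.peptides_detected >= min_peptides
        if dna_hit and protein_hit:
            confirmed.append(e.species)    # both methods agree
        elif dna_hit or protein_hit:
            unconfirmed.append(e.species)  # only one method supports the finding
    if confirmed:
        return "non-compliant", confirmed
    if unconfirmed:
        return "inconclusive", unconfirmed
    return "consistent with declaration", []

sample = [SpeciesEvidence("beef", 92.0, 10), SpeciesEvidence("pork", 8.0, 4)]
verdict = classify(sample, declared={"beef"})
```

Requiring agreement between the two data types before reporting non-compliance reflects the orthogonal-verification idea; a finding supported by only one method is routed to follow-up rather than reported.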

Validation: Validate the integrated method using artificially adulterated samples with known percentages of substitute species. Establish limit of detection (LOD) and limit of quantification (LOQ) for common adulterants in the specific meat matrix [15].
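
One common way to derive LOD and LOQ from spiked samples is the calibration-curve convention LOD = 3.3 s / slope and LOQ = 10 s / slope, where s is the residual standard deviation of a linear fit. The source does not specify which estimator to use, so this choice is an assumption, and the spiking levels and responses below are made-up illustration values.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares line; returns slope, intercept, residual SD."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, s

def lod_loq(x, y):
    """LOD and LOQ in the units of x (here: percent adulterant)."""
    slope, _, s = fit_line(x, y)
    return 3.3 * s / slope, 10 * s / slope

# Hypothetical data: percent adulterant spiked vs. instrument response
spiked = [0, 1, 2, 3]
response = [0.1, 0.9, 2.1, 2.9]
lod, loq = lod_loq(spiked, response)
```

In practice more levels and replicates per level are needed, and the limits should be confirmed experimentally in each meat matrix rather than taken from the fit alone.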

Protocol 2: Geographical Origin Verification of Olive Oil Using Metabolomics and Genomics

Background: High-value olive oils are frequently subject to geographical origin fraud. While metabolomics can confirm geographical origin through environmental signatures, genomics provides complementary information about cultivar composition [22] [51].

Sample Preparation:

DNA Extraction: Use CTAB-based methods with additional polyvinylpolypyrrolidone (PVPP) treatment to remove polysaccharides and polyphenols that inhibit PCR amplification [22].
Metabolite Extraction: Employ biphasic extraction (chloroform:methanol:water) for comprehensive coverage of polar and non-polar metabolites. Include internal standards for quality control [75].

Analytical Methods:

Genomic Analysis: Implement genotyping-by-sequencing (GBS) for SNP discovery and cultivar identification. Target a minimum of 1000 high-quality SNPs for confident cultivar assignment [22].
Metabolomic Analysis: Use UHPLC-QTOF-MS in both positive and negative ionization modes for broad metabolome coverage. Include HILIC chromatography for polar metabolites and reversed-phase for lipids [75] [22].
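
What counts as a "high-quality" SNP is not defined in the source. A minimal sketch, assuming the usual call-rate and minor-allele-frequency filters and genotypes coded as 0, 1, or 2 alternate alleles (None for a missing call), could look like this. The thresholds of 0.9 and 0.05 and all names are illustrative.

```python
def snp_passes(genotypes, min_call_rate=0.9, min_maf=0.05):
    """Keep a SNP if enough samples were called and it is polymorphic."""
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))  # diploid allele frequency
    return min(alt_freq, 1 - alt_freq) >= min_maf

def high_quality_snps(matrix, **thresholds):
    """matrix maps SNP id -> list of genotypes across samples."""
    return [snp for snp, g in matrix.items() if snp_passes(g, **thresholds)]

def enough_markers(matrix, minimum=1000, **thresholds):
    """Check the panel against the target of 1000 SNPs named in the protocol."""
    return len(high_quality_snps(matrix, **thresholds)) >= minimum

panel = {
    "snp1": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],              # passes both filters
    "snp2": [0] * 10,                                     # monomorphic
    "snp3": [0, 1, None, None, None, 1, 2, 0, 1, 1],      # call rate 0.7
}
kept = high_quality_snps(panel)
```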

Data Integration and Interpretation:

Process genomic data to establish cultivar profiles and identify potential mixtures.
Analyze metabolomic data using multivariate statistics (PCA, OPLS-DA) to identify geographical patterns.
Apply multiblock data integration methods such as DIABLO to identify correlated genomic and metabolomic features that collectively verify geographical origin [22] [51].

Validation: Collect authentic samples from multiple harvest years to establish stable origin markers unaffected by seasonal variation. Use cross-validation and external validation sets to assess model performance [15].
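
Because seasonal variation is the stated concern, one reasonable form of cross-validation is to hold out each harvest year in turn. The sketch below does this with a nearest-centroid classifier as a stand-in for the real model; the classifier, the function names, and the toy data are assumptions for illustration only.

```python
def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def predict(train_X, train_y, x):
    """Assign x to the class whose training centroid is closest."""
    centroids = {
        label: centroid([r for r, l in zip(train_X, train_y) if l == label])
        for label in set(train_y)
    }
    return min(
        centroids,
        key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])),
    )

def leave_one_year_out(X, y, years):
    """Accuracy per held-out harvest year; low values flag unstable markers."""
    results = {}
    for held_out in sorted(set(years)):
        train = [(x, l) for x, l, yr in zip(X, y, years) if yr != held_out]
        test = [(x, l) for x, l, yr in zip(X, y, years) if yr == held_out]
        train_X, train_y = zip(*train)
        hits = sum(predict(train_X, train_y, x) == l for x, l in test)
        results[held_out] = hits / len(test)
    return results

# Toy data: two marker intensities, two origins, three harvest years
X = [[1.0, 1.2], [5.0, 4.8], [1.1, 0.9], [5.2, 5.1], [0.9, 1.0], [4.9, 5.3]]
y = ["A", "B", "A", "B", "A", "B"]
years = [2021, 2021, 2022, 2022, 2023, 2023]
per_year = leave_one_year_out(X, y, years)
```

A truly external validation set would go one step further and use samples that played no part in model building or tuning, for example a later harvest or independently sourced oils.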

Figure 2: Validation workflow for non-targeted multi-omics methods in food authenticity research, highlighting critical decision points and validation parameters.

Essential Research Reagent Solutions

Successful implementation of multi-omics strategies requires carefully selected reagents and materials optimized for different food matrices and analytical challenges.

Table 3: Essential Research Reagents for Multi-omics Food Authentication

Reagent Category	Specific Examples	Function in Workflow	Matrix-Specific Considerations
Nucleic Acid Extraction Kits	Silica-membrane kits, CTAB-based methods, inhibitor removal resins	High-quality DNA/RNA extraction for genomic analysis	Processed foods require optimized methods for degraded DNA; oily matrices need additional clean-up steps
Protein Extraction Reagents	Urea/thiourea buffers, RIPA buffer, protease inhibitors	Comprehensive protein extraction while maintaining integrity	Dry products need rehydration; high-fat matrices require defatting steps
Metabolite Extraction Solvents	Methanol, chloroform, acetonitrile, MTBE	Comprehensive metabolite coverage across chemical classes	Solvent system needs optimization for each food matrix
Internal Standards	Stable isotope-labeled compounds, retention time markers	Quality control, quantification, instrument performance monitoring	Should cover multiple chemical classes relevant to authentication question
Chromatography Columns	HILIC, C18, phenyl-hexyl, biphenyl	Separation of complex mixtures prior to mass spectrometry	Column chemistry should be matched to analyte properties of interest
Reference Materials	Certified authentic samples, DNA barcodes, purified protein standards	Method validation, quality assurance, calibration	Should represent expected variation in authentic products

Multi-omics strategies represent a transformative approach to food authenticity research, enabling comprehensive molecular profiling that can detect increasingly sophisticated fraud schemes. The integration of genomic, proteomic, and metabolomic data provides orthogonal verification that significantly enhances the capabilities of non-targeted methods. However, successful implementation requires careful attention to validation parameters specific to pattern-recognition approaches, including specificity, transferability, and robustness. The experimental protocols and computational workflows detailed in this article provide researchers with practical guidance for developing, validating, and implementing these methods. As food supply chains become more global and more complex, multi-omics integration will play an increasingly critical role in protecting food authenticity and ensuring consumer trust.

Conclusion

The validation of non-targeted methods represents a paradigm shift in food authenticity, moving from reactive detection to proactive, comprehensive screening. Success hinges on a robust, multi-faceted strategy that integrates advanced analytical platforms like mass spectrometry and NGS with sophisticated data analysis and rigorously curated databases. While challenges in standardization, data management, and interpretation persist, the ongoing development of formal validation frameworks and the strategic move toward multi-omics integration promise a future with more resilient food supply chains. For researchers, the path forward involves collaborative efforts to refine these methodologies, establish universal standards, and expand reference databases, ultimately enhancing our ability to ensure food safety, integrity, and consumer trust on a global scale.