In the previous tutorial, Introduction to MS2extract package, we described the core functions of the package in detail. If you are new to the MS2extract package, we encourage you to read that tutorial first.

Once you are familiar with the core workflow and functions of this package, we can dive into an automated pipeline with the proposed batch_*() functions. If you want to extract many MS/MS spectra at once, these batch_*() functions are the ones to use.

Each of the first three main steps has a separate batch_*() alternative function: importing mzXML files, extracting MS/MS spectra, and detecting masses. Exporting your library to a .msp file is different: the export function detects whether the provided spectra come from a single or multiple .mzXML files, so the same function works in both cases.

Figure 1. Overview of general data processing pipeline to extract MS/MS spectra using the MS2extract package

Batch functions

We are already familiar with the arguments that the core functions accept. In this section, we describe the extra arguments that specific batch_*() functions require.


knitr::opts_chunk$set(warning = FALSE)

Similarly to import_mzxml(), we need to provide compound metadata, with at minimum the compound name, formula, ionization mode, and optionally (but recommended) the region of interest (min_rt and max_rt).

# Select the csv file name and path
batch_file <- system.file("extdata", "batch_read.csv",
  package = "MS2extract"
)
# Read the data frame
batch_data <- read.csv(batch_file)

# File paths for Procyanidin A2 and Rutin
ProcA2_file <- system.file("extdata",
  "ProcyanidinA2_neg_20eV.mzXML",
  package = "MS2extract"
)
Rutin_file <- system.file("extdata",
  "Rutin_neg_20eV.mzXML",
  package = "MS2extract"
)

# Add file path - the user should specify the file path -
batch_data$File <- c(ProcA2_file, Rutin_file)

# Checking batch_data data frame
dplyr::glimpse(batch_data)
#> Rows: 2
#> Columns: 6
#> $ Name            <chr> "Procyanidin A2", "Rutin"
#> $ Formula         <chr> "C30H24O12", "C27H30O16"
#> $ Ionization_mode <chr> "Negative", "Negative"
#> $ min_rt          <int> 163, 162
#> $ max_rt          <int> 180, 171
#> $ File            <chr> "/home/runner/work/_temp/Library/MS2extract/extdata/Pr…
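Because every later step depends on this table, it can help to validate it before importing. The following is a minimal base R sketch, not part of MS2extract: check_batch_metadata() is a hypothetical helper, the column names are taken from the output above, and treating File as required is an assumption. The file paths in the toy table are placeholders.

```r
# Hypothetical helper (not an MS2extract function): check that a batch
# metadata data frame has the columns shown in the output above.
check_batch_metadata <- function(met_metadata) {
  # Assumption: File is treated as required alongside the minimum fields
  required <- c("Name", "Formula", "Ionization_mode", "File")
  missing_cols <- setdiff(required, names(met_metadata))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # min_rt and max_rt are optional; when present, each window must be valid
  if (all(c("min_rt", "max_rt") %in% names(met_metadata))) {
    bad <- which(met_metadata$min_rt >= met_metadata$max_rt)
    if (length(bad) > 0) {
      stop("min_rt must be smaller than max_rt in row(s): ",
           paste(bad, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Toy metadata mirroring the two compounds of this tutorial
# (placeholder file paths)
toy_metadata <- data.frame(
  Name = c("Procyanidin A2", "Rutin"),
  Formula = c("C30H24O12", "C27H30O16"),
  Ionization_mode = c("Negative", "Negative"),
  min_rt = c(163, 162),
  max_rt = c(180, 171),
  File = c("path/to/ProcyanidinA2_neg_20eV.mzXML",
           "path/to/Rutin_neg_20eV.mzXML")
)
check_batch_metadata(toy_metadata)
```

The helper stops with an informative message on the first problem it finds and returns TRUE invisibly otherwise, so it can sit directly before the import call.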

The only difference between batch_import_mzxml() and import_mzxml() is that met_metadata can contain more than one row. Here we are working with two compounds, procyanidin A2 and rutin.

Tip: you can extract multiple compounds from the same .mzXML file if they have different precursor ion m/z values.

Tip: you can also specify multiple compounds that have the same m/z as long as they have different retention times.
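The second tip implies that compounds sharing a file and an m/z need distinct retention time windows. As a rough illustration, the base R sketch below builds a metadata table for two hypothetical isomers in one hypothetical file and checks that their windows do not overlap. The compound names, file path, retention times, and the has_rt_overlap() helper are all assumptions made for this example, and using an identical formula as a stand-in for "same m/z" is a simplification.

```r
# Hypothetical metadata: two isomers (same formula, hence same precursor m/z)
# eluting at different times in the same placeholder file
isomer_metadata <- data.frame(
  Name = c("Isomer A", "Isomer B"),
  Formula = c("C15H14O6", "C15H14O6"),
  Ionization_mode = c("Negative", "Negative"),
  min_rt = c(100, 150),
  max_rt = c(120, 170),
  File = c("path/to/mixture.mzXML", "path/to/mixture.mzXML")
)

# Hypothetical helper: TRUE if any two rows that share a file and a formula
# have overlapping retention time windows
has_rt_overlap <- function(md) {
  n <- nrow(md)
  if (n < 2) return(FALSE)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      same_target <- md$File[i] == md$File[j] && md$Formula[i] == md$Formula[j]
      overlap <- md$min_rt[i] <= md$max_rt[j] && md$min_rt[j] <= md$max_rt[i]
      if (same_target && overlap) return(TRUE)
    }
  }
  FALSE
}

has_rt_overlap(isomer_metadata)
```

If the check returns TRUE, the two windows cannot separate the compounds and the regions of interest would need to be narrowed.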

batch_compounds <- batch_import_mzxml(batch_data)
#> Reading MS2 data from ProcyanidinA2_neg_20eV.mzXML
#> Processing...
#> Reading MS2 data from Rutin_neg_20eV.mzXML
#> Processing...

The raw mzXML data contains:

  • Procyanidin A2: 24249 ions
  • Rutin: 22096 ions

# Checking dimension by compound
purrr::map(batch_compounds, dim)
#> $`Procyanidin A2`
#> [1] 24249     4
#> $Rutin
#> [1] 22096     4
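When many compounds are processed, a one-row-per-compound summary can be easier to scan than a list of dimensions. The sketch below does this in base R on a toy list; toy_compounds and its column names are stand-ins invented for the example and are not the actual structure returned by the import step.

```r
# Toy stand-in for a per-compound list of data frames
# (names and columns are assumptions for this sketch)
toy_compounds <- list(
  "Compound A" = data.frame(mz = c(100.1, 200.2, 300.3),
                            intensity = c(10, 250, 40)),
  "Compound B" = data.frame(mz = c(150.5, 250.7),
                            intensity = c(5, 90))
)

# One row per compound with its number of ions (rows)
ion_counts <- data.frame(
  compound = names(toy_compounds),
  n_ions = unname(vapply(toy_compounds, nrow, integer(1)))
)
ion_counts
```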


Now that we have our data imported, we can proceed to extract the most intense MS/MS scan for each compound. In this case, batch_extract_MS2() does not take extra arguments, and most of its arguments remain similar to those of its core counterpart.

# Extract the MS/MS spectra in batch with batch_extract_MS2()
batch_extracted <- batch_extract_MS2(batch_compounds,
  verbose = TRUE,
  out_list = FALSE
)

By using verbose = TRUE, we can display the MS/MS TIC plot as well as the raw MS/MS spectra.
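To make the idea of "the most intense MS/MS scan" concrete, here is a minimal base R sketch on toy data. It assumes that the most intense scan is the one with the highest total ion current (TIC), that is, the largest sum of ion intensities at a given retention time. The column names rt, mz, and intensity and all values are assumptions for illustration; this is not the package's implementation.

```r
# Toy MS/MS data: three scans identified by their retention time (rt)
toy_ms2 <- data.frame(
  rt = c(10, 10, 11, 11, 11, 12),
  mz = c(100.1, 285.0, 100.1, 285.0, 449.1, 285.0),
  intensity = c(50, 200, 400, 900, 300, 120)
)

# Total ion current (TIC) per scan: sum of intensities at each rt
tic <- aggregate(intensity ~ rt, data = toy_ms2, FUN = sum)

# Keep only the ions that belong to the scan with the highest TIC
best_rt <- tic$rt[which.max(tic$intensity)]
most_intense_scan <- toy_ms2[toy_ms2$rt == best_rt, ]
most_intense_scan
```

In this toy data the scan at rt = 11 has the largest summed intensity, so its three ions are the ones kept.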


Now that we have the raw MS/MS spectra, we are going to remove background noise and non-informative product ions based on intensity. batch_detect_mass() has the same arguments as its core analogue.

batch_mass_detected <- batch_detect_mass(batch_extracted, # Compound list
  normalize = TRUE, # Normalize
  min_int = 1 # Minimum intensity
)

purrr::map(batch_mass_detected, dim)
#> $`Procyanidin A2`
#> [1] 38  4
#> $Rutin
#> [1] 4 4

The number of ions decreases to 38 for procyanidin A2 and 4 for rutin.
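The effect of normalize and min_int can be sketched in a few lines of base R. The example assumes that normalization rescales intensities so the base peak equals 100, and that min_int is then applied as an inclusive threshold on that scale. Both points are assumptions of this sketch, detect_mass_sketch() is a hypothetical helper, and the toy values are made up.

```r
# Hypothetical helper: scale intensities to base peak = 100 (if requested),
# then drop ions whose intensity falls below min_int
detect_mass_sketch <- function(spec, normalize = TRUE, min_int = 1) {
  if (normalize) {
    spec$intensity <- spec$intensity / max(spec$intensity) * 100
  }
  spec[spec$intensity >= min_int, , drop = FALSE]
}

# Toy spectrum with two low-intensity ions (made-up values)
toy_spec <- data.frame(
  mz = c(125.02, 285.04, 449.09, 575.12),
  intensity = c(40, 2500, 15, 10000)
)
detect_mass_sketch(toy_spec)
```

With these toy values the two ions below 1% of the base peak are removed and two ions remain, which mirrors the kind of reduction seen above.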

Detected MS2 Procyanidin A2
plot_MS2spectra(batch_mass_detected, "Procyanidin A2")

Detected MS2 Rutin
plot_MS2spectra(batch_mass_detected, "Rutin")


In contrast with the previous batch functions, write_msp() is able to detect whether the user is providing a single spectrum or multiple spectra. However, the user needs to provide metadata about each compound to be included in the resulting .msp database.

# Reading batch metadata
# (supply the metadata .csv file name as the second argument)
metadata_msp_file <- system.file("extdata",
  package = "MS2extract"
)

metadata_msp <- read.csv(metadata_msp_file)
dplyr::glimpse(metadata_msp)
#> Rows: 2
#> Columns: 8
#> $ NAME            <chr> "Procyanidin A2", "Rutin"
#> $ PRECURSORTYPE   <chr> "[M-H]-", "[M-H]-"
#> $ FORMULA         <chr> "C30H24O12", "C27H30O16"
#> $ SMILES          <chr> "C1C(C(OC2=C1C(=CC3=C2C4C(C(O3)(OC5=CC(=CC(=C45)O)O)C6…
#> $ IONMODE         <chr> "Negative", "Negative"
#> $ COLLISIONENERGY <chr> "20 eV", "20 eV"

After having the cleaned MS/MS spectra and the compound metadata, we can proceed to export them into a .msp file.

write_msp(
  spec = batch_mass_detected,
  spec_metadata = metadata_msp,
  msp_name = "ProcA2_Rutin_batch.msp"
)
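For readers unfamiliar with the format, an .msp file is plain text in which each record lists metadata fields as KEY: value lines, followed by a peak count and one line per peak. The base R sketch below formats a single record in that general style. format_msp_record() is a hypothetical helper, the peak values are made up, and the exact fields and layout written by write_msp() may differ.

```r
# Hypothetical helper: format one spectrum as an .msp-style text record
format_msp_record <- function(meta, spec) {
  header <- paste0(names(meta), ": ", unlist(meta, use.names = FALSE))
  peaks <- paste(spec$mz, spec$intensity, sep = "\t")
  # A blank line closes the record
  c(header, paste0("Num Peaks: ", nrow(spec)), peaks, "")
}

# Metadata fields named as in the table above; peak values are made up
toy_meta <- list(
  NAME = "Rutin",
  PRECURSORTYPE = "[M-H]-",
  FORMULA = "C27H30O16",
  IONMODE = "Negative",
  COLLISIONENERGY = "20 eV"
)
toy_peaks <- data.frame(mz = c(300.03, 609.15), intensity = c(35, 100))

record <- format_msp_record(toy_meta, toy_peaks)
cat(record, sep = "\n")
```

Records for several compounds can be concatenated in one file, which is why a single export function can handle both the single-spectrum and the batch case.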

