Create the GNPS .mgf backbone file format — write_mgf

This function creates the structure of the GNPS .mgf format. For more information about submitting your spectra to GNPS, please visit this link. You can find the GNPS template spreadsheet at this link.

Usage

write_mgf_gnps(spec = NULL, spec_metadata = NULL, mgf_name = NULL)

Arguments

spec

a data frame containing the extracted MS/MS spectra. The following columns are required:

mz_precursor: precursor ion
rt: retention time
mz: m/z values
intensity: intensity values
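
As a rough illustration of the layout described above, the sketch below builds a toy spec data frame with the four required columns and checks that none are missing. The numeric values are invented, the one-row-per-fragment layout is an assumption, and check_spec_columns is a hypothetical helper that is not part of MS2extract.

```r
# Toy spectrum: all values are invented for illustration only
spec <- data.frame(
  mz_precursor = rep(609.1461, 3),        # precursor ion (assumed repeated per row)
  rt           = rep(166.5, 3),           # retention time
  mz           = c(151.0, 300.0, 609.1),  # m/z values
  intensity    = c(12, 100, 45)           # intensity values
)

# Hypothetical helper (not an MS2extract function):
# returns the names of any required columns that are absent
check_spec_columns <- function(df) {
  required <- c("mz_precursor", "rt", "mz", "intensity")
  setdiff(required, names(df))
}

missing_cols <- check_spec_columns(spec)
```

An empty result means every required column is present; any names returned are the columns still to be added.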

spec_metadata

a data frame containing the values to be included in the resulting .mgf file. The fields listed below are the minimum mandatory information.

The full explanation of the fields and their content can be found at the GNPS batch library upload link.

For the remaining fields included in the final library file, MS2extract gets the information from the extracted spectra.

We highly recommend reviewing the GNPS_template.xlsx file used in the function example to see which fields/columns are required by MS2extract.

COMPOUND_NAME: character, metabolite name. It must match the name used in met_metadata, the data frame used to import your MS/MS data
INSTRUMENT: character, instrument used for data collection
COLLISIONENERGY: character, collision energy used in MS/MS fragmentation
IONSOURCE: character, ionization source
SMILES: character, SMILES chemical structure of your metabolite
INCHI: character, InChI value for the metabolite
INCHIAUX: character, InChI auxiliary information (INCHIAUX) for the metabolite
IONMODE: character, ionization polarity
PUBMED: character, PUBMED id where you submitted the MS/MS spectra
ACQUISITION: character, Crude, Lysate, Commercial, Isolated, Other
DATACOLLECTOR: character, Person who collected the MS/MS data
INTEREST: character, interest of the MS/MS data
LIBQUALITY: character, library quality. 1 for Gold, 2 for Silver, 3 for Bronze
GENUS: character, genus of the organism
SPECIES: character, species of the organism
STRAIN: character, strain of the organism
CASNUMBER: character, CAS number of the metabolite
PI: character, principal investigator
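
The field list above can be turned into a one-row skeleton and checked for completeness before export. This is a minimal sketch using base R only. The object names required_fields, missing_fields and all_character are illustrative, and the two filled-in values are placeholders. In practice the GNPS template spreadsheet remains the reference for what each field should contain.

```r
# Mandatory spec_metadata fields, as listed in this documentation
required_fields <- c(
  "COMPOUND_NAME", "INSTRUMENT", "COLLISIONENERGY", "IONSOURCE", "SMILES",
  "INCHI", "INCHIAUX", "IONMODE", "PUBMED", "ACQUISITION", "DATACOLLECTOR",
  "INTEREST", "LIBQUALITY", "GENUS", "SPECIES", "STRAIN", "CASNUMBER", "PI"
)

# One-row skeleton with every field stored as character;
# NA marks values that still have to be filled in by hand
spec_metadata <- as.data.frame(
  setNames(as.list(rep(NA_character_, length(required_fields))), required_fields),
  stringsAsFactors = FALSE
)

# Placeholder values for illustration only
spec_metadata$COMPOUND_NAME <- "Rutin"
spec_metadata$LIBQUALITY <- "3"  # 3 for Bronze, per the field description

# Simple completeness and type checks
missing_fields <- setdiff(required_fields, names(spec_metadata))
all_character <- all(vapply(spec_metadata, is.character, logical(1)))
```

The same two checks can be run on a data frame read from the template spreadsheet to catch a renamed or dropped column early.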

mgf_name

file name for the exported .mgf library. It does not need to include the .mgf file extension.
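
If a name that already carries an extension is at hand, one option is to strip it before passing it on. The sketch below uses tools::file_path_sans_ext() from the tools package that ships with R. Whether write_mgf_gnps() would also accept the name with the extension is not stated here, so stripping it is only a precaution.

```r
# Drop a trailing extension if one was supplied
mgf_name <- tools::file_path_sans_ext("PhenolicsDB.mgf")

# A name without an extension is returned unchanged
same_name <- tools::file_path_sans_ext("PhenolicsDB")
```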

Value

If batch spectra are found, this function writes two files: the .mgf library and the .tsv table required by GNPS. If a single spectrum is detected, it writes only the .mgf library.
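
After an export, it can be handy to confirm which files were produced. The sketch below assumes that the output file names start with mgf_name and end in .mgf or .tsv, which this documentation does not state. find_library_files is a hypothetical helper, not part of MS2extract, and the demonstration uses empty stand-in files in a temporary directory rather than real output.

```r
# Hypothetical helper (not part of MS2extract): list files in `dir`
# whose names start with `mgf_name` and end in .mgf or .tsv
find_library_files <- function(mgf_name, dir = ".") {
  all_files <- list.files(dir)
  keep <- startsWith(all_files, mgf_name) & grepl("\\.(mgf|tsv)$", all_files)
  all_files[keep]
}

# Demonstration with empty stand-in files in a temporary directory
demo_dir <- file.path(tempdir(), "mgf_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(demo_dir, c("PhenolicsDB.mgf", "PhenolicsDB.tsv", "notes.txt"))
))

found <- find_library_files("PhenolicsDB", demo_dir)
```

Under the stated assumption, a batch export would show both a .mgf and a .tsv entry, while a single-spectrum export would show only the .mgf entry.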

Examples

# Example with batch spectra ----


# Select the csv file name and path
batch_file <- system.file("extdata", "batch_read.csv",
  package = "MS2extract"
)
# Read the data frame
batch_data <- read.csv(batch_file)

# File paths for Procyanidin A2 and Rutin
ProcA2_file <- system.file("extdata",
  "ProcyanidinA2_neg_20eV.mzXML",
  package = "MS2extract"
)
Rutin_file <- system.file("extdata",
  "Rutin_neg_20eV.mzXML",
  package = "MS2extract"
)

# Add file path - the user should specify the file path -
batch_data$File <- c(ProcA2_file, Rutin_file)

# Checking batch_data data frame
batch_data
#>             Name   Formula Ionization_mode min_rt max_rt COLLISIONENERGY
#> 1 Procyanidin A2 C30H24O12        Negative    163    180           20 eV
#> 2          Rutin C27H30O16        Negative    162    171           20 eV
#>                                                                              File
#> 1 /home/runner/work/_temp/Library/MS2extract/extdata/ProcyanidinA2_neg_20eV.mzXML
#> 2         /home/runner/work/_temp/Library/MS2extract/extdata/Rutin_neg_20eV.mzXML

# Using batch import to import multiple compounds
batch_compounds <- batch_import_mzxml(batch_data)
#> 
#> ── Begining batch import ───────────────────────────────────────────────────────
#> 
#> ── -- ──
#> 
#> • Processing: ProcyanidinA2_neg_20eV.mzXML
#> • Found 1 CE value: 20
#> • Remember to match CE velues in spec_metadata when exporting your library
#> • m/z range given 10 ppm: 575.11376 and 575.12526
#> • Compound name: Procyanidin A2_Negative_20
#> 
#> ── -- ──
#> 
#> • Processing: Rutin_neg_20eV.mzXML
#> • Found 1 CE value: 20
#> • Remember to match CE velues in spec_metadata when exporting your library
#> • m/z range given 10 ppm: 609.14002 and 609.15221
#> • Compound name: Rutin_Negative_20
#> 
#> ── End batch import ────────────────────────────────────────────────────────────
# Checking dimensions by compound
# Procyanidin A2: 17829 ions
# Rutin: 11475 ions
purrr::map(batch_compounds, dim)
#> $`Procyanidin A2_Negative_20`
#> [1] 17829     6
#> 
#> $Rutin_Negative_20
#> [1] 11475     6
#> 

batch_extracted_compounds <- batch_extract_MS2(batch_compounds)
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
#> Warning: `position_stack()` requires non-overlapping x intervals.

#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
#> Warning: `position_stack()` requires non-overlapping x intervals.


# Batch detect mass
batch_mass_detected <- batch_detect_mass(batch_extracted_compounds,
  normalize = TRUE, # Normalize
  min_int = 1 # Minimum intensity
)

# Reading metadata from GNPS template
template_file <- system.file("extdata", "GNPS_template.xlsx",
                 package = "MS2extract")

gnps_template <- readxl::read_excel(path = template_file,
                sheet = "batch_example")

write_mgf_gnps(spec = batch_mass_detected,
               spec_metadata = gnps_template,
               mgf_name = "PhenolicsDB")
#> • Filtering MS/MS scans for 20 CE
#> • Filtering MS/MS scans for 20 CE