This function helps you build a library file that follows the structure of the GNPS .mgf format. For more information about submitting your spectra to GNPS, please visit this link. You can find the GNPS template spreadsheet at this link.
Arguments
- spec
a data frame containing the extracted MS/MS spectra. The following columns are required:
- mz_precursor
precursor ion
- rt
retention time
- mz
m/z values
- intensity
intensity values
- spec_metadata
a data frame containing the values to be included in the resulting .mgf file. In this case, this is the minimum mandatory information to be included. The full explanation of the fields and their content can be found in the GNPS batch library upload link.
For the remaining fields included in the final library file, MS2extract retrieves the information from the extracted spectra.
We highly recommend checking the
gnps_template.xlsx
file described in the function example to see which fields/columns are required in MS2extract.
- COMPOUND_NAME
character, metabolite name. It has to be the same name used in
met_metadata
, the data frame used to import your MS/MS data
- INSTRUMENT
character, instrument used for data collection
- COLLISIONENERGY
character, collision energy used in MS/MS fragmentation
- IONSOURCE
Ionization source
- SMILES
character, SMILES chemical structure of your metabolite
- INCHI
character, InChI value for the metabolite
- INCHIAUX
character, INCHIAUX for the metabolite
- IONMODE
character, ionization polarity
- PUBMED
character, PUBMED id where you submitted the MS/MS spectra
- ACQUISITION
character, one of Crude, Lysate, Commercial, Isolated, or Other
- DATACOLLECTOR
character, person who collected the MS/MS data
- INTEREST
character, interest of the MS/MS data
- LIBQUALITY
character, library quality. 1 for Gold, 2 for Silver, 3 for Bronze
- GENUS
character, genus of the organism
- SPECIES
character, species of the organism
- STRAIN
character, strain of the organism
- CASNUMBER
character, CAS number of the metabolite
- PI
character, principal investigator
- mgf_name
file name for the exported .mgf library. It does not have to contain the .mgf file extension.
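As a rough illustration of the two input tables described above, the sketch below builds toy versions in base R and checks them for the listed column names. It does not call MS2extract. All numeric values and the "placeholder" strings are made up, and missing_columns() is a hypothetical helper written for this sketch, not a function of the package.

```r
# Toy MS/MS peak table with the four columns that `spec` requires.
# All numbers are illustrative only.
spec_toy <- data.frame(
  mz_precursor = rep(575.1195, 3),
  rt           = rep(170, 3),
  mz           = c(125.02, 285.04, 449.09),
  intensity    = c(35, 100, 60)
)

# Column names copied from the spec_metadata argument list
required_metadata <- c(
  "COMPOUND_NAME", "INSTRUMENT", "COLLISIONENERGY", "IONSOURCE", "SMILES",
  "INCHI", "INCHIAUX", "IONMODE", "PUBMED", "ACQUISITION", "DATACOLLECTOR",
  "INTEREST", "LIBQUALITY", "GENUS", "SPECIES", "STRAIN", "CASNUMBER", "PI"
)

# One-row metadata table filled with placeholder strings (assumption:
# real entries follow the GNPS template, which is not reproduced here)
metadata_toy <- as.data.frame(
  as.list(setNames(rep("placeholder", length(required_metadata)),
                   required_metadata)),
  stringsAsFactors = FALSE
)
metadata_toy$COMPOUND_NAME <- "Procyanidin A2"

# Hypothetical helper (not part of MS2extract): names of missing columns
missing_columns <- function(df, required) setdiff(required, names(df))

missing_columns(spec_toy, c("mz_precursor", "rt", "mz", "intensity"))
missing_columns(metadata_toy, required_metadata)
# Dropping the second column makes the check report "INSTRUMENT"
missing_columns(metadata_toy[, -2], required_metadata)
```

A check of this kind can be run on your own tables before exporting, so that a missing column is reported by name rather than discovered during the export.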
Value
If batch spectra are found, this function writes two files: the .mgf library and the .tsv table required by GNPS. If a single spectrum is detected, it writes only the .mgf library.
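The sketch below shows one way to normalize a file name and to confirm which files an export left behind. It uses only base R and the tools package. The temporary directory and the stand-in files are created purely for illustration, and their names are assumptions: the text above does not state how the .tsv table is named or where the files are written.

```r
# A name given with or without the extension reduces to the same stem
stems <- tools::file_path_sans_ext(c("PhenolicsDB", "PhenolicsDB.mgf"))
stems

# Stand-in output directory with empty files (assumed names, see above)
out_dir <- tempfile("gnps_demo_")
dir.create(out_dir)
invisible(file.create(
  file.path(out_dir, c("PhenolicsDB.mgf", "PhenolicsDB.tsv", "notes.txt"))
))

# Keep only the .mgf and .tsv files
written <- list.files(out_dir, pattern = "\\.(mgf|tsv)$")
written

# For a batch export, both extensions are expected to be present
has_both <- all(c("mgf", "tsv") %in% tools::file_ext(written))
has_both
```

With real output, point list.files() at the directory where the library was written instead of the stand-in directory.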
Examples
# Example with batch spectra ----
# Select the csv file name and path
batch_file <- system.file("extdata", "batch_read.csv",
  package = "MS2extract"
)
# Read the data frame
batch_data <- read.csv(batch_file)
# File paths for Procyanidin A2 and Rutin
ProcA2_file <- system.file("extdata",
  "ProcyanidinA2_neg_20eV.mzXML",
  package = "MS2extract"
)
Rutin_file <- system.file("extdata",
  "Rutin_neg_20eV.mzXML",
  package = "MS2extract"
)
# Add file paths (users should specify their own file paths)
batch_data$File <- c(ProcA2_file, Rutin_file)
# Checking batch_data data frame
batch_data
#> Name Formula Ionization_mode min_rt max_rt COLLISIONENERGY
#> 1 Procyanidin A2 C30H24O12 Negative 163 180 20 eV
#> 2 Rutin C27H30O16 Negative 162 171 20 eV
#> File
#> 1 /home/runner/work/_temp/Library/MS2extract/extdata/ProcyanidinA2_neg_20eV.mzXML
#> 2 /home/runner/work/_temp/Library/MS2extract/extdata/Rutin_neg_20eV.mzXML
# Using batch import to import multiple compounds
batch_compounds <- batch_import_mzxml(batch_data)
#>
#> ── Begining batch import ───────────────────────────────────────────────────────
#>
#> ── -- ──
#>
#> • Processing: ProcyanidinA2_neg_20eV.mzXML
#> • Found 1 CE value: 20
#> • Remember to match CE velues in spec_metadata when exporting your library
#> • m/z range given 10 ppm: 575.11376 and 575.12526
#> • Compound name: Procyanidin A2_Negative_20
#>
#> ── -- ──
#>
#> • Processing: Rutin_neg_20eV.mzXML
#> • Found 1 CE value: 20
#> • Remember to match CE velues in spec_metadata when exporting your library
#> • m/z range given 10 ppm: 609.14002 and 609.15221
#> • Compound name: Rutin_Negative_20
#>
#> ── End batch import ────────────────────────────────────────────────────────────
# Checking dimension by compound
# Procyanidin A2: 17829 ions
# Rutin: 11475 ions
purrr::map(batch_compounds, dim)
#> $`Procyanidin A2_Negative_20`
#> [1] 17829 6
#>
#> $Rutin_Negative_20
#> [1] 11475 6
#>
batch_extracted_compounds <- batch_extract_MS2(batch_compounds)
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
#> Warning: `position_stack()` requires non-overlapping x intervals.
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
#> Warning: `position_stack()` requires non-overlapping x intervals.
# Batch detect mass
batch_mass_detected <- batch_detect_mass(batch_extracted_compounds,
  normalize = TRUE, # Normalize
  min_int = 1 # Minimum intensity
)
# Reading metadata from GNPS template
template_file <- system.file("extdata", "GNPS_template.xlsx",
  package = "MS2extract"
)
gnps_template <- readxl::read_excel(
  path = template_file,
  sheet = "batch_example"
)
write_mgf_gnps(
  spec = batch_mass_detected,
  spec_metadata = gnps_template,
  mgf_name = "PhenolicsDB"
)
#> • Filtering MS/MS scans for 20 CE
#> • Filtering MS/MS scans for 20 CE
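Because COMPOUND_NAME has to match the name used when importing the MS/MS data, a quick comparison before exporting can catch mismatches. The sketch below is a minimal base R illustration with stand-in vectors. In practice the two vectors would presumably come from batch_data$Name and gnps_template$COMPOUND_NAME, which is an assumption about how the example objects relate, and the mismatch shown is deliberate.

```r
# Stand-ins for the names used at import and in the metadata table
import_names   <- c("Procyanidin A2", "Rutin")
metadata_names <- c("Procyanidin A2", "rutin") # deliberate case mismatch

# Metadata names with no exact match among the imported names
unmatched <- setdiff(metadata_names, import_names)
unmatched

if (length(unmatched) > 0) {
  message("Names without a match: ", paste(unmatched, collapse = ", "))
}
```

The comparison is exact and case-sensitive, so differences in capitalization or spacing are reported as mismatches.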