
Goal of this vignette

The main objective of this document is to explain in detail how MS/MS data are imported and processed when you use import_mzxml().

This document is organized based on the main steps we take to import MS/MS data:

  • Calculating the theoretical precursor ion,
  • Filtering MS/MS scans for a given precursor ion and rt range, and
  • Finding the most intense MS/MS scan.

Calculating the theoretical precursor ion

One of the main inputs, besides the .mzML data, is the met_metadata data frame used in import_mzxml(). This data frame contains the minimum information needed to calculate the theoretical precursor ion m/z:

  • Chemical formula,
  • Ionization polarity, and
  • ppm (mass error)

This process can be depicted in the following image.

Here, we are going to use procyanidin A2 to demonstrate this process. (1) First, by using the Rdisop package, we calculate the theoretical monoisotopic mass. (2) Then, given a specific polarity (Positive or Negative), we add or subtract the mass of a proton to calculate the theoretical ionized m/z value. (3) Finally, we use the provided ppm value (10 ppm by default) to calculate the m/z range, which is then used to keep only the scans whose precursor ion m/z falls within this range.

You can also calculate this ppm range with ppm_range():

ppm_range(mz = 575.12604, ppm = 10)
#> [1] 575.1203 575.1318
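
Steps (2) and (3) above can be sketched in base R as follows. This is only an illustration of the arithmetic, not the package's internal code: the helper names theoretical_mz() and ppm_window() are hypothetical, the proton mass is taken as approximately 1.007276 Da, and the window is assumed to be symmetric around the theoretical m/z.

```r
# Hypothetical helpers for illustration; they are not part of MS2extract.
proton_mass <- 1.007276  # Da, approximate mass of a proton (assumed value)

# Step 2: add (Positive) or subtract (Negative) the mass of a proton
# from the neutral monoisotopic mass to get the theoretical ion m/z.
theoretical_mz <- function(neutral_mass, polarity = c("Positive", "Negative")) {
  polarity <- match.arg(polarity)
  if (polarity == "Positive") {
    neutral_mass + proton_mass
  } else {
    neutral_mass - proton_mass
  }
}

# Step 3: m/z window of +/- ppm parts per million around the theoretical m/z.
ppm_window <- function(mz, ppm = 10) {
  delta <- mz * ppm / 1e6
  c(mz - delta, mz + delta)
}

# Same inputs as the ppm_range() call above.
ppm_window(mz = 575.12604, ppm = 10)
```

With these inputs, the sketch gives the same range as the ppm_range() call above when printed to four decimal places.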

Filtering MS/MS scans

Using retention time region of interest

Although providing a retention time (rt) window is not mandatory, it is strongly recommended, because it gives you more control over the regions of the run that are searched for MS/MS scans. If you do not provide an rt region of interest (ROI), the package will look for the most intense scan, even if that scan does not come from the metabolite of interest.
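
The effect of the rt ROI on which scan is picked can be illustrated with a toy table. The column names, the values, and the use of seconds for rt are all assumptions made for this sketch; it does not use the package's own data structures.

```r
# Toy MS/MS scan table; columns and values are invented for illustration.
scans <- data.frame(
  scan      = 1:5,
  rt        = c(60, 120, 165, 170, 240),   # retention time (seconds assumed)
  intensity = c(9e4, 2e4, 5e4, 7e4, 1e4)   # total scan intensity
)

# Without an rt ROI, every scan competes, so the most intense scan
# of the whole run is selected.
best_overall <- scans[which.max(scans$intensity), ]

# With an rt ROI (here 160 to 180), only scans inside the window compete.
in_roi <- scans[scans$rt >= 160 & scans$rt <= 180, ]
best_in_roi <- in_roi[which.max(in_roi$intensity), ]

best_overall$scan
best_in_roi$scan
```

In this toy table, the most intense scan overall is scan 1, which elutes far from the window, while restricting the search to the ROI selects scan 4.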

Filtering using m/z range and rt ROI

Then, after calculating the theoretical m/z range, and knowing the rt ROI, we look in the data for the MS/MS scans whose precursor ion m/z and retention time fall within both.

If MS2extract does not find at least one MS/MS scan within the given m/z range and rt ROI, it will stop.

In the following example, the scan in the first row does not meet these requirements and is discarded, while the second and third scans fall within these requirements and are kept for the next steps.
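
The filtering logic described in this section can be sketched in base R as shown below. The function filter_msms_scans(), the column names, and the toy values are hypothetical and chosen only to mirror the description above (first scan discarded, second and third kept). Here the first scan is made to fail the m/z criterion, which is an arbitrary choice for the sketch.

```r
# Hypothetical helper; column names are assumptions, not the package's schema.
filter_msms_scans <- function(scans, mz_range, rt_range = NULL) {
  # Keep scans whose precursor m/z lies inside the ppm-derived range.
  keep <- scans$precursor_mz >= mz_range[1] & scans$precursor_mz <= mz_range[2]
  # If an rt ROI is given, the scan must also fall inside it.
  if (!is.null(rt_range)) {
    keep <- keep & scans$rt >= rt_range[1] & scans$rt <= rt_range[2]
  }
  # Stop when nothing matches, as described above.
  if (!any(keep)) {
    stop("No MS/MS scans found within the given m/z range and rt ROI.")
  }
  scans[keep, , drop = FALSE]
}

# Toy data: the precursor m/z of the first scan is outside the m/z range.
scans <- data.frame(
  scan         = 1:3,
  precursor_mz = c(575.2000, 575.1210, 575.1300),
  rt           = c(165, 168, 172)
)

kept <- filter_msms_scans(
  scans,
  mz_range = c(575.1203, 575.1318),
  rt_range = c(160, 180)
)
kept$scan
```

Passing rt_range = NULL in this sketch skips the retention time check, which corresponds to not providing an rt ROI.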

Information about this vignette

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("3_import_mzml_explanation.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("3_import_mzml_explanation.Rmd", tangle = TRUE)

Date the vignette was generated.

#> [1] "2024-10-23 15:18:00 UTC"

Wallclock time spent generating the vignette.

#> Time difference of 7.469 secs

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Ubuntu 22.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       UTC
#>  date     2024-10-23
#>  pandoc   3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package              * version   date (UTC) lib source
#>  abind                  1.4-8     2024-09-12 [1] RSPM
#>  affy                   1.82.0    2024-04-30 [1] Bioconduc~
#>  affyio                 1.74.0    2024-04-30 [1] Bioconduc~
#>  AnnotationFilter       1.28.0    2024-04-30 [1] Bioconduc~
#>  backports              1.5.0     2024-05-23 [1] RSPM
#>  bibtex                 0.5.1     2023-01-26 [1] RSPM
#>  Biobase                2.64.0    2024-04-30 [1] Bioconduc~
#>  BiocGenerics           0.50.0    2024-04-30 [1] Bioconduc~
#>  BiocManager            1.30.25   2024-08-28 [1] RSPM
#>  BiocParallel           1.38.0    2024-04-30 [1] Bioconduc~
#>  BiocStyle            * 2.32.1    2024-06-16 [1] Bioconduc~
#>  bookdown               0.41      2024-10-16 [1] RSPM
#>  broom                  1.0.7     2024-09-26 [1] RSPM
#>  bslib                  0.8.0     2024-07-29 [1] RSPM
#>  cachem                 1.1.0     2024-05-16 [1] RSPM
#>  car                    3.1-3     2024-09-27 [1] RSPM
#>  carData                3.0-5     2022-01-06 [1] RSPM
#>  cellranger             1.1.0     2016-07-27 [1] RSPM
#>  cli                    3.6.3     2024-06-21 [1] RSPM
#>  clue                   0.3-65    2023-09-23 [1] RSPM
#>  cluster                2.1.6     2023-12-01 [3] CRAN (R 4.4.1)
#>  codetools              0.2-20    2024-03-31 [3] CRAN (R 4.4.1)
#>  colorspace             2.1-1     2024-07-26 [1] RSPM
#>  crayon                 1.5.3     2024-06-20 [1] RSPM
#>  DelayedArray           0.30.1    2024-05-07 [1] Bioconduc~
#>  desc                   1.4.3     2023-12-10 [1] RSPM
#>  digest                 0.6.37    2024-08-19 [1] RSPM
#>  doParallel             1.0.17    2022-02-07 [1] RSPM
#>  dplyr                  1.1.4     2023-11-17 [1] RSPM
#>  evaluate               1.0.1     2024-10-10 [1] RSPM
#>  fansi                  1.0.6     2023-12-08 [1] RSPM
#>  fastmap                1.2.0     2024-05-15 [1] RSPM
#>  foreach                1.5.2     2022-02-02 [1] RSPM
#>  Formula                1.2-5     2023-02-24 [1] RSPM
#>  fs                     1.6.4     2024-04-25 [1] RSPM
#>  generics               0.1.3     2022-07-05 [1] RSPM
#>  GenomeInfoDb           1.40.1    2024-05-24 [1] Bioconduc~
#>  GenomeInfoDbData       1.2.12    2024-10-18 [1] Bioconductor
#>  GenomicRanges          1.56.2    2024-10-09 [1] Bioconduc~
#>  ggplot2                3.5.1     2024-04-23 [1] RSPM
#>  ggpubr                 0.6.0     2023-02-10 [1] RSPM
#>  ggrepel                0.9.6     2024-09-07 [1] RSPM
#>  ggsignif               0.6.4     2022-10-13 [1] RSPM
#>  glue                   1.8.0     2024-09-30 [1] RSPM
#>  gtable                 0.3.5     2024-04-22 [1] RSPM
#>  hms                    1.1.3     2023-03-21 [1] RSPM
#>  htmltools              0.5.8.1   2024-04-04 [1] RSPM
#>  htmlwidgets            1.6.4     2023-12-06 [1] RSPM
#>  httr                   1.4.7     2023-08-15 [1] RSPM
#>  igraph                 2.1.1     2024-10-19 [1] RSPM
#>  impute                 1.78.0    2024-04-30 [1] Bioconduc~
#>  IRanges                2.38.1    2024-07-03 [1] Bioconduc~
#>  iterators              1.0.14    2022-02-05 [1] RSPM
#>  jquerylib              0.1.4     2021-04-26 [1] RSPM
#>  jsonlite               1.8.9     2024-09-20 [1] RSPM
#>  knitr                  1.48      2024-07-07 [1] RSPM
#>  lattice                0.22-6    2024-03-20 [3] CRAN (R 4.4.1)
#>  lazyeval               0.2.2     2019-03-15 [1] RSPM
#>  lifecycle              1.0.4     2023-11-07 [1] RSPM
#>  limma                  3.60.6    2024-10-02 [1] Bioconduc~
#>  lubridate              1.9.3     2023-09-27 [1] RSPM
#>  magrittr               2.0.3     2022-03-30 [1] RSPM
#>  MALDIquant             1.22.3    2024-08-19 [1] RSPM
#>  MASS                   7.3-60.2  2024-04-26 [3] CRAN (R 4.4.1)
#>  Matrix                 1.7-0     2024-04-26 [3] CRAN (R 4.4.1)
#>  MatrixGenerics         1.16.0    2024-04-30 [1] Bioconduc~
#>  matrixStats            1.4.1     2024-09-08 [1] RSPM
#>  MS2extract           * 0.99.0    2024-10-23 [1] local
#>  MsCoreUtils            1.16.1    2024-08-04 [1] Bioconduc~
#>  MSnbase                2.30.1    2024-04-30 [1] Bioconduc~
#>  MultiAssayExperiment   1.30.3    2024-07-10 [1] Bioconduc~
#>  munsell                0.5.1     2024-04-01 [1] RSPM
#>  mzID                   1.42.0    2024-04-30 [1] Bioconduc~
#>  mzR                    2.38.0    2024-04-30 [1] Bioconduc~
#>  ncdf4                  1.23      2024-08-17 [1] RSPM
#>  OrgMassSpecR           0.5-3     2017-08-13 [1] RSPM
#>  pcaMethods             1.96.0    2024-04-30 [1] Bioconduc~
#>  pillar                 1.9.0     2023-03-22 [1] RSPM
#>  pkgconfig              2.0.3     2019-09-22 [1] RSPM
#>  pkgdown                2.1.1     2024-09-17 [1] any (@2.1.1)
#>  plyr                   1.8.9     2023-10-02 [1] RSPM
#>  preprocessCore         1.66.0    2024-04-30 [1] Bioconduc~
#>  ProtGenerics           1.36.0    2024-04-30 [1] Bioconduc~
#>  PSMatch                1.8.0     2024-04-30 [1] Bioconduc~
#>  purrr                  1.0.2     2023-08-10 [1] RSPM
#>  QFeatures              1.14.2    2024-07-07 [1] Bioconduc~
#>  R6                     2.5.1     2021-08-19 [1] RSPM
#>  ragg                   1.3.3     2024-09-11 [1] RSPM
#>  rbibutils              2.3       2024-10-04 [1] RSPM
#>  Rcpp                   1.0.13    2024-07-17 [1] RSPM
#>  Rdisop                 1.64.0    2024-04-30 [1] Bioconduc~
#>  Rdpack                 2.6.1     2024-08-06 [1] RSPM
#>  readr                  2.1.5     2024-01-10 [1] RSPM
#>  readxl                 1.4.3     2023-07-06 [1] RSPM
#>  RefManageR           * 1.4.0     2022-09-30 [1] RSPM
#>  reshape2               1.4.4     2020-04-09 [1] RSPM
#>  rlang                  1.1.4     2024-06-04 [1] RSPM
#>  rmarkdown              2.28      2024-08-17 [1] RSPM
#>  rstatix                0.7.2     2023-02-01 [1] RSPM
#>  S4Arrays               1.4.1     2024-05-20 [1] Bioconduc~
#>  S4Vectors              0.42.1    2024-07-03 [1] Bioconduc~
#>  sass                   0.4.9     2024-03-15 [1] RSPM
#>  scales                 1.3.0     2023-11-28 [1] RSPM
#>  sessioninfo          * 1.2.2     2021-12-06 [1] RSPM
#>  SparseArray            1.4.8     2024-05-24 [1] Bioconduc~
#>  statmod                1.5.0     2023-01-06 [1] RSPM
#>  stringi                1.8.4     2024-05-06 [1] RSPM
#>  stringr                1.5.1     2023-11-14 [1] RSPM
#>  SummarizedExperiment   1.34.0    2024-05-01 [1] Bioconduc~
#>  systemfonts            1.1.0     2024-05-15 [1] RSPM
#>  textshaping            0.4.0     2024-05-24 [1] RSPM
#>  tibble                 3.2.1     2023-03-20 [1] RSPM
#>  tidyr                  1.3.1     2024-01-24 [1] RSPM
#>  tidyselect             1.2.1     2024-03-11 [1] RSPM
#>  timechange             0.3.0     2024-01-18 [1] RSPM
#>  tzdb                   0.4.0     2023-05-12 [1] RSPM
#>  UCSC.utils             1.0.0     2024-04-30 [1] Bioconduc~
#>  utf8                   1.2.4     2023-10-22 [1] RSPM
#>  vctrs                  0.6.5     2023-12-01 [1] RSPM
#>  vsn                    3.72.0    2024-04-30 [1] Bioconduc~
#>  xfun                   0.48      2024-10-03 [1] RSPM
#>  XML                    3.99-0.17 2024-06-25 [1] RSPM
#>  xml2                   1.3.6     2023-12-04 [1] RSPM
#>  XVector                0.44.0    2024-04-30 [1] Bioconduc~
#>  yaml                   2.3.10    2024-07-26 [1] RSPM
#>  zlibbioc               1.50.0    2024-04-30 [1] Bioconduc~
#> 
#>  [1] /home/runner/work/_temp/Library
#>  [2] /opt/R/4.4.1/lib/R/site-library
#>  [3] /opt/R/4.4.1/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Bibliography

This vignette was generated using BiocStyle (Oleś, 2024) with knitr (Xie, 2024) and rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2024) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.28. 2024. URL: https://github.com/rstudio/rmarkdown.

[2] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[3] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.32.1. 2024. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.

[4] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.48. 2024. URL: https://yihui.org/knitr/.