---
title: "NCBI"
output: rmarkdown::html_vignette
description: >
  This vignette describes how webseq can interact with the NCBI databases.
vignette: >
  %\VignetteIndexEntry{NCBI}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r}
library(webseq)
```
## Download genome assemblies
Let's download the GenBank file for the assembly GCF_003007635.1. The function will look for files within this directory:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/007/635/

The file we want is GCF_003007635.1_ASM300763v1_genomic.gbff.gz, so let's download it!
```{r, eval = FALSE}
ncbi_download_genome("GCF_003007635.1", type = "genomic.gbff")
```
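The directory listed above can be derived from the accession itself: on the NCBI genomes FTP site, the prefix and the nine digits of the accession, split into groups of three, form the path. Here is a minimal sketch of that mapping. Note that `assembly_ftp_dir()` is a hypothetical helper written for illustration, not a webseq function, and webseq may build the path differently.

```{r}
# Hypothetical helper for illustration; not part of webseq.
assembly_ftp_dir <- function(accession) {
  # Capture the prefix (GCA or GCF) and the three blocks of three digits.
  pattern <- "^(GC[AF])_([0-9]{3})([0-9]{3})([0-9]{3})\\.[0-9]+$"
  parts <- regmatches(accession, regexec(pattern, accession))[[1]]
  if (length(parts) == 0) {
    stop("Not a versioned assembly accession: ", accession)
  }
  # parts[1] is the full match; parts[2:5] are the captured groups.
  paste0(
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/",
    paste(parts[2:5], collapse = "/"),
    "/"
  )
}

assembly_ftp_dir("GCF_003007635.1")
```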
If we try to download the same file again, the function will report that it has already been downloaded.
```{r, eval = FALSE}
ncbi_download_genome("GCF_003007635.1", type = "genomic.gbff")
```
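One common way to get this kind of behaviour is to check whether the target file exists before downloading. The sketch below illustrates the idea with base R only. `download_if_missing()` is a hypothetical helper, and this is not necessarily how webseq implements the check.

```{r}
# Hypothetical helper for illustration; not part of webseq.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    message("Already downloaded: ", destfile)
    return(invisible(FALSE))
  }
  # Binary mode keeps compressed files such as .gz intact.
  utils::download.file(url, destfile, mode = "wb")
  invisible(TRUE)
}

# Demonstration without network access: the file already exists,
# so no download is attempted and the placeholder URL is never used.
existing_file <- tempfile(fileext = ".gbff.gz")
file.create(existing_file)
download_if_missing("https://example.org/placeholder", existing_file)
```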
## Download metadata
Let's download some metadata for this assembly!
```{r, eval = FALSE}
assembly_meta <- ncbi_get_meta("GCF_003007635.1", db = "assembly")
```
Using this metadata, we can find the BioSample ID associated with the assembly. Let's use this ID to look up the UID of the sample in the NCBI BioSample database.
```{r, eval = FALSE}
biosample_uid <- ncbi_get_uid(assembly_meta$biosample, db = "biosample")
biosample_uid
```
Then use the UID to get the BioSample metadata itself:
```{r, eval = FALSE}
biosample_meta <- ncbi_get_meta(biosample_uid$uid, db = "biosample")
biosample_meta
```
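If you need this chain more than once, the steps above can be wrapped in a small function. This is only a sketch: `assembly_to_biosample_meta()` is a hypothetical helper, not a webseq function. It assumes, as in the chunks above, that the assembly metadata has a `biosample` element and that the UID lookup returns a `uid` element.

```{r, eval = FALSE}
# Hypothetical helper for illustration; not part of webseq.
# Assumes library(webseq) has been loaded, as at the top of this vignette.
assembly_to_biosample_meta <- function(accession) {
  # Step 1: assembly metadata, which includes the BioSample ID.
  assembly_meta <- ncbi_get_meta(accession, db = "assembly")
  # Step 2: convert the BioSample ID to a BioSample UID.
  biosample_uid <- ncbi_get_uid(assembly_meta$biosample, db = "biosample")
  # Step 3: BioSample metadata for that UID.
  ncbi_get_meta(biosample_uid$uid, db = "biosample")
}
```

Calling `assembly_to_biosample_meta("GCF_003007635.1")` should then give the same result as running the three steps one by one.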
## Parse metadata files
Accessing metadata for one assembly at a time can take quite a while if you have a large number of queries. If you want metadata for all hits of a search term, you can follow a hybrid approach instead: download the metadata manually from the NCBI website and parse it with webseq.
Let's download assembly metadata for all Thiobacillus denitrificans genomes!

NCBI link: https://www.ncbi.nlm.nih.gov/assembly/?term=Thiobacillus+denitrificans

On the results page: upper right corner -> Send to -> File -> Format = XML -> Create File

Download the file (saved here as assembly_summary.xml) and then parse it.
```{r, eval = FALSE}
ncbi_parse_assembly_xml("assembly_summary.xml")
```
Let's download BioSample metadata as well for all Thiobacillus denitrificans samples!

NCBI link: https://www.ncbi.nlm.nih.gov/biosample/?term=Thiobacillus+denitrificans

On the results page: upper right corner -> Send to -> File -> Format = Full (text) -> Create File

Download the file (saved here as biosample_summary.txt) and then parse it.
```{r, eval = FALSE}
ncbi_parse_biosample_txt("biosample_summary.txt")
```
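Both NCBI links in this section follow the same pattern: the database name, followed by the search term with spaces replaced by `+`. As a small sketch, the links for other search terms could be built like this. `search_page_url()` is a hypothetical helper, not a webseq function, and it only handles spaces in the search term.

```{r}
# Hypothetical helper for illustration; not part of webseq.
search_page_url <- function(term, db = c("assembly", "biosample")) {
  db <- match.arg(db)
  # Only spaces are handled here; other special characters are left as-is.
  query <- gsub(" +", "+", trimws(term))
  paste0("https://www.ncbi.nlm.nih.gov/", db, "/?term=", query)
}

search_page_url("Thiobacillus denitrificans", db = "assembly")
search_page_url("Thiobacillus denitrificans", db = "biosample")
```

The resulting link can be opened in a browser with `utils::browseURL()`, after which the file is created and downloaded manually as described above.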