Download the GenBank file for GCF_003007635.1.
The function will access files within this directory: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/007/635/
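The directory path above appears to follow from the accession itself: the prefix (GCF), then the nine digits split into groups of three. A minimal sketch of that mapping, assuming this layout holds for other accessions too; `assembly_directory_url` is a hypothetical helper for illustration, not part of webseq:

```python
import re

NCBI_GENOMES_ROOT = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def assembly_directory_url(accession: str) -> str:
    """Build the NCBI directory URL for an assembly accession.

    Hypothetical helper: assumes accessions look like GCF_003007635.1
    (GCA or GCF, nine digits, a version suffix).
    """
    match = re.fullmatch(r"(GC[AF])_(\d{9})\.\d+", accession)
    if match is None:
        raise ValueError(f"Unexpected accession format: {accession!r}")
    prefix, digits = match.groups()
    # Split the nine digits into three groups of three: 003 / 007 / 635.
    chunks = [digits[i:i + 3] for i in range(0, 9, 3)]
    return "/".join([NCBI_GENOMES_ROOT, prefix, *chunks]) + "/"


print(assembly_directory_url("GCF_003007635.1"))
# → https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/007/635/
```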
Let’s download GCF_003007635.1_ASM300763v1_genomic.gbff.gz!
If we try to download it again, the function will indicate that the file is already downloaded.
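The "already downloaded" behaviour can be pictured as a simple existence check before fetching. The sketch below is not webseq's implementation; `download_once` and its `fetch` parameter are hypothetical names, and the standard library's `urlretrieve` stands in for whatever webseq uses internally:

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_once(url, destination, fetch=urlretrieve):
    """Download url to destination unless the file already exists.

    Returns True if a download happened, False if it was skipped.
    Hypothetical helper; fetch is injectable so it can be swapped out.
    """
    destination = Path(destination)
    if destination.exists():
        print(f"{destination.name} is already downloaded.")
        return False
    # Create the target directory if needed, then fetch the file.
    destination.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, str(destination))
    return True
```

Calling it twice with the same destination fetches the file the first time and only prints the message the second time.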
Let’s download some metadata for this assembly!
Using this metadata we can find the BioSample ID associated with the assembly. Let’s use this ID to get the BioSample UID of the sample within the NCBI BioSample database.
Then fetch the metadata itself.
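For readers curious what these two steps look like at the NCBI level, they correspond to an E-utilities search (accession to UID) followed by a fetch (UID to record). The sketch below only builds the request URLs and does not show how webseq issues them; the helper names and the accession `SAMN00000000` are placeholders for illustration:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """URL that searches an NCBI database and returns matching UIDs."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def efetch_url(db, uid, retmode="xml"):
    """URL that fetches the full record for a UID."""
    query = {"db": db, "id": uid, "retmode": retmode}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(query)


# Placeholder accession and UID, not taken from the assembly above.
print(esearch_url("biosample", "SAMN00000000"))
print(efetch_url("biosample", "12345"))
```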
Accessing metadata one assembly at a time can take quite a while if you have a large number of queries. If you want the metadata for every hit of a search term, a hybrid approach is faster: download the metadata manually from the NCBI website and parse it with webseq.
Let’s download assembly metadata for all Thiobacillus denitrificans genomes!
NCBI link: https://www.ncbi.nlm.nih.gov/assembly/?term=Thiobacillus+denitrificans
In the upper right corner of the page: Send to -> File -> Format = XML -> Create File
Download the file and then parse it.
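To give a sense of what parsing involves, here is a minimal standard-library sketch. It is not webseq's parser. It assumes the export is a sequence of `DocumentSummary` elements that may lack a single enclosing root, and that fields such as `AssemblyAccession` and `BioSampleAccn` are present; check these against your own file. The sample records use placeholder accessions:

```python
import re
import xml.etree.ElementTree as ET


def parse_assembly_summaries(
    xml_text, fields=("AssemblyAccession", "Organism", "BioSampleAccn")
):
    """Return one dict per DocumentSummary element in the exported text."""
    # Drop any XML declaration or DOCTYPE so the text can be re-wrapped.
    body = re.sub(r"<\?xml[^>]*\?>|<!DOCTYPE[^>]*>", "", xml_text)
    # Wrap in an artificial root in case the file has several top-level nodes.
    root = ET.fromstring(f"<root>{body}</root>")
    return [
        {field: summary.findtext(field) for field in fields}
        for summary in root.iter("DocumentSummary")
    ]


# Placeholder records for illustration only.
sample = """
<DocumentSummary uid="1">
  <AssemblyAccession>GCF_000000000.1</AssemblyAccession>
  <Organism>Thiobacillus denitrificans</Organism>
  <BioSampleAccn>SAMN00000000</BioSampleAccn>
</DocumentSummary>
<DocumentSummary uid="2">
  <AssemblyAccession>GCF_000000001.1</AssemblyAccession>
  <Organism>Thiobacillus denitrificans</Organism>
</DocumentSummary>
"""

records = parse_assembly_summaries(sample)
for record in records:
    print(record)
```

Missing fields come back as `None`, which makes it easy to spot assemblies that have no linked BioSample.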
Let’s download biosample metadata as well for all Thiobacillus denitrificans samples!
NCBI link: https://www.ncbi.nlm.nih.gov/biosample/?term=Thiobacillus+denitrificans
In the upper right corner of the page: Send to -> File -> Format = Full (text) -> Create File
Download the file and then parse it.
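The full-text export is line oriented rather than XML. The sketch below is again not webseq's parser. It assumes that attributes appear as indented `/key="value"` lines and that each record closes with an `Accession: ... ID: ...` line; verify this against your own download. The sample text and accessions are placeholders:

```python
import re

# Attribute lines look like:     /strain="example-1"
ATTRIBUTE = re.compile(r'^\s*/([^=]+)="(.*)"\s*$')
# Closing line of a record:  Accession: SAMN00000001<TAB>ID: 1
ACCESSION = re.compile(r"^Accession:\s*(\S+)\s+ID:\s*(\d+)")


def parse_biosample_text(text):
    """Return one dict per record: accession, uid and attribute mapping."""
    records = []
    attributes = {}
    for line in text.splitlines():
        attr = ATTRIBUTE.match(line)
        if attr:
            attributes[attr.group(1)] = attr.group(2)
            continue
        acc = ACCESSION.match(line)
        if acc:
            # The Accession line ends a record; store it and start afresh.
            records.append(
                {
                    "accession": acc.group(1),
                    "uid": acc.group(2),
                    "attributes": attributes,
                }
            )
            attributes = {}
    return records


# Placeholder records for illustration only.
sample = """1: Example sample one
Organism: Thiobacillus denitrificans
Attributes:
    /strain="example-1"
    /collection date="2020"
Accession: SAMN00000001\tID: 1

2: Example sample two
Organism: Thiobacillus denitrificans
Attributes:
    /strain="example-2"
Accession: SAMN00000002\tID: 2
"""

biosamples = parse_biosample_text(sample)
for biosample in biosamples:
    print(biosample["accession"], biosample["attributes"])
```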