MicroBinfie Podcast

Microbial Bioinformatics

Microbial Bioinformatics is a rapidly changing field marrying computer science and microbiology. Join us as we share some tips and tricks we’ve learnt over the years. If you’re a student just getting to grips with the field, or someone who wants to keep tabs on the latest and greatest, this podcast is for you. The hosts are Dr. Lee Katz from the Centers for Disease Control and Prevention (US), and Dr. Nabil-Fareed Alikhan and Dr. Andrew Page, both from Quadram Institute Bioscience (UK); together they bring years of experience in microbial bioinformatics. The opinions expressed here are our own and do not necessarily reflect the views of the Centers for Disease Control and Prevention or Quadram Institute Bioscience.

Intro music: Werq - Kevin MacLeod (incompetech.com), licensed under Creative Commons: By Attribution 3.0 License, http://creativecommons.org/licenses/by/3.0/
Outro music: Scheming Weasel (faster version) - Kevin MacLeod (incompetech.com), licensed under Creative Commons: By Attribution 3.0 License, http://creativecommons.org/licenses/by/3.0/

Questions and comments? microbinfie@gmail.com

108 SeqCode: a nomenclatural code for prokaryotes described from sequence data
1w ago
108 SeqCode: a nomenclatural code for prokaryotes described from sequence data
We are back talking about systematics, and SeqCode, a nomenclatural code for prokaryotes described from sequence data. Marike Palmer is a postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas, and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck, in the Department of Microbiology and the Digital Science Center (DiSC). Link to paper: https://www.nature.com/articles/s41564-022-01214-9 History paper: https://www.sciencedirect.com/science/article/pii/S0723202022000121

They discussed the SeqCode, a nomenclature code for prokaryotes described from sequence data. Palmer explained that the impetus for the SeqCode was the need to accommodate previously uncultivated organisms under a formal nomenclature code. She emphasized that the SeqCode was written to allow names to be proposed in any peer-reviewed publication, but noted that the authors have designed three paths of validation. They hope that anyone proposing a name will work with the curation team to ensure the best possible descriptions, names, and etymologies. Rodriguez discussed the SeqCode's governance, which is already in place and has been made public so that anyone interested can join the SeqCode community. The governance structure comprises an executive board, committees, and working groups; some positions on these committees are held by co-opted members, while others are chosen by ballot. The hosts sought to clarify the relationship between the ISME Society, which is backing the SeqCode, and the wider field in general. Rodriguez explained that ISME is simply providing support as an umbrella organization for the SeqCode. Palmer and Rodriguez clarified that the SeqCode is not a competing code but rather a parallel one that aims to accommodate previously uncultivated organisms. Palmer noted that most scientists culture prokaryotes not for naming but to advance their knowledge of these organisms through physiology experiments. They emphasized that the new system is the result of a long collaborative effort that involved many different viewpoints and philosophies.

The episode also covered the practical requirements for naming under the new system, which include standards for the completeness and contamination levels required of the genome sequence data. Palmer noted that while the 16S rRNA gene sequence is not required for naming, it is recommended for improved accuracy in cross-talk between different taxonomies. The conversation highlighted the importance and challenges of naming microorganisms and the ongoing efforts to create a system that is inclusive of all microorganisms, both cultivated and uncultivated. Rodriguez and Palmer agreed that high-quality genomes should serve as the nomenclatural types, to ensure the system builds up rather than breaks down. They noted the challenge of obtaining full genomes for some organisms, such as obligate intracellular parasites, but suggested obtaining housekeeping genes as a potential solution. They also explained the technical issue of estimating completeness or contamination for many taxa, but Palmer confirmed that registering a name on the SeqCode registry requires adding such estimates.
The conversation emphasized the importance of collaboration within the scientific community and the need to create a system that is inclusive of all microorganisms. It also highlighted the challenges inherent in naming microorganisms, while showing that this is an ongoing process and that scientists are working to create a system that is accurate, practical, and beneficial for all.
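As a rough illustration of the genome-quality screening mentioned above, here is a minimal sketch (not from the episode) of checking completeness and contamination estimates, for example from CheckM, against the widely cited MIMAG high-quality-draft thresholds of at least 90% completeness and at most 5% contamination. The thresholds and genome names below are illustrative assumptions; the SeqCode and its registry documentation remain the authoritative source for the actual requirements.

```python
# A minimal sketch, assuming MIMAG-style thresholds; genome names and values are made up.
MIN_COMPLETENESS = 90.0   # percent (assumed threshold)
MAX_CONTAMINATION = 5.0   # percent (assumed threshold)

def passes_quality_screen(completeness: float, contamination: float) -> bool:
    """Return True if the quality estimates clear the assumed thresholds."""
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION

genomes = {
    "candidate_A": (98.2, 1.1),   # (completeness %, contamination %)
    "candidate_B": (74.5, 0.8),   # too incomplete
    "candidate_C": (95.0, 7.3),   # too contaminated
}

for name, (comp, cont) in genomes.items():
    verdict = "looks registrable" if passes_quality_screen(comp, cont) else "needs more work"
    print(f"{name}: completeness={comp}%, contamination={cont}% -> {verdict}")
```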
106 Why on earth would you do a PostDoc?
Apr 27 2023
106 Why on earth would you do a PostDoc?
An honest discussion about the upsides and downsides of doing a postdoc, in front of an audience of first-year PhD students. Guests Dr Emma Waters, Dr Heather Felgate and Dr Muhammad Yasir are joined by Dr Andrew Page. It was recorded in front of a live audience of PhD students on the Microbes, Microbiomes and Bioinformatics doctoral training program at the Quadram Institute in Norwich, UK. Emma starts the conversation by sharing that she enjoys research and solving problems with different tools. The thrill of discovery and exploration that comes with the postdoc position is something she loves. Heather echoes Emma's thoughts and believes that she is happy where she is, rather than chasing after a higher-paying job in industry. She appreciates the flexibility that academia offers, which has enabled her to balance her family and personal life. The conversation takes a turn when PhD students ask if any of the postdocs regret choosing academia despite the evident pay gap between industry and academia. Emma points out that although she might have earned more in industry, she is happy where she is and finds satisfaction in helping people through her work; chasing profits in industry would not offer her that kind of gratification. Yasir shares his success story of sequencing 600 samples of the SARS-CoV-2 virus in Pakistan, and how it contributed to the fight against the pandemic. He credits the freedom and flexibility of academia that allow him to collaborate with colleagues from all over the world. In conclusion, Andrew advises students to explore their options and to keep their careers open-ended. He suggests that if they are after a higher paycheck, they should consider the bioinformatics data-science path, which offers more earning opportunities in industry. The postdocs stress the importance of following what makes one happy in life, rather than chasing big salaries.
104 The Kraken software suite
Apr 6 2023
104 The Kraken software suite
We talk about KRAKEN, the taxonomic classification software, and the software suite around it, and are joined by Jennifer Lu and Natalia Rincon from the Johns Hopkins University Center for Computational Biology. Dr. Jennifer Lu and Natalia Rincon, from the Kraken software development team, discussed the various versions of Kraken and the tools developed around it. They began by explaining the original Kraken, which uses an exact k-mer matching process with a k-mer size of 31 and was built on the Jellyfish k-mer counter. KrakenUniq is a version of Kraken that adds a unique k-mer count, reporting how many distinct k-mers are covered by the reads assigned to each taxon and providing an additional way to verify microbial identifications. Kraken 2 was developed to accommodate larger databases by using a probabilistic data structure and minimizers, which map k-mers to shorter sequences. They then talked about how Kraken is useful for microbiome analysis, including detecting pathogens. However, the accuracy of the results depends heavily on the availability of genomic data in the database, which emphasizes bacterial and viral data. For infectious pathogen detection, KrakenUniq is combined with Bracken to approximate the abundance of the species present. The developers emphasized the importance of users being aware of what genomic data is available in the database, because the results can only be as accurate as the data. They also talked about how Kraken is used widely in bioinformatics and can be applied to scenarios beyond metagenomics. For example, they use Kraken to treat a single genome as a metagenome as part of quality control analysis: where there are conflicting taxa among the reads, the Kraken results show it, making it useful for detecting contamination in samples. The Kraken team also talked about how they use Kraken to detect contamination in pathogen genomes. They compare all eukaryotic pathogen genomes against bacteria, human genomes, and databases of vertebrates and plants to filter out any contaminants, and have found instances where contaminating sequences from hosts such as chicken or cow were present in eukaryotic pathogen genomes. Moving forward, the Kraken team intends to maintain all Kraken repositories, enhance the tools' accuracy, speed, and usefulness, and develop new scripts and downstream analyses for the KrakenTools suite. They acknowledge the need to make the database smaller as more genomes become available and are exploring indexing and sketching approaches to achieve this. In conclusion, Kraken has been an essential piece of software for metagenomic analysis, and it remains a continually improving tool for pathogen detection and classification. The Kraken team advises users to keep in mind the importance of accurate data for effective pathogen detection and classification.
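For listeners who want to explore the output themselves, here is a minimal sketch (not taken from the episode) of pulling species-level hits out of a Kraken-style report. It assumes the standard six-column Kraken 2 report layout of percentage, clade read count, directly assigned read count, rank code, NCBI taxid, and indented name; KrakenUniq and extended reports add further columns, so adjust accordingly. The file path and read threshold are illustrative.

```python
import csv

def top_species(report_path, min_reads=10):
    """Yield (name, clade_reads, percent) for species-level rows of a Kraken 2-style report.

    Assumes the standard six tab-separated columns:
    percent, clade reads, direct reads, rank code, taxid, indented name.
    """
    with open(report_path) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            percent, clade_reads, _direct, rank, _taxid, name = row[:6]
            if rank == "S" and int(clade_reads) >= min_reads:
                yield name.strip(), int(clade_reads), float(percent)

# Hypothetical usage:
# for name, reads, pct in sorted(top_species("sample.kreport"), key=lambda hit: -hit[1]):
#     print(f"{reads:>10}  {pct:6.2f}%  {name}")
```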
103 Release the Kraken
Mar 23 2023
103 Release the Kraken
We are talking about KRAKEN - the taxonomic classification software - and in the hot seat are Dr Jennifer Lu and Natalia Rincon from the Johns Hopkins University Center for Computational Biology. The MicroBinfie podcast welcomed Dr. Jennifer Lu and Natalia Rincon to discuss Kraken, a taxonomic classification tool. Developed in 2013-2014, Kraken quickly assigns sequencing reads to a specific species or genus, or to a broader taxonomic group. Its efficiency in classifying millions or billions of reads puts it ahead of other classification methods such as MetaPhlAn, megaBLAST, and QIIME. The tool is known for its ease of use and accuracy. Following the success of Kraken's metagenomic analysis, Florian Breitwieser developed KrakenUniq, which provides more information than standard Kraken. Another addition to the Kraken family is Bracken, developed by Jennifer Lu, which estimates abundance, and Nat Rincon contributes to the newest additions, which analyze diversity metrics. Kraken's exact k-mer matching identifies reads and assigns taxonomy IDs, with two outputs: a long text file with a classification for every read, and a Kraken report that provides a breakdown of reads for each taxonomy ID. Interpreting the Kraken report depends on the sample and the taxa involved; even taxa with few reads can still be meaningful. For beginners, Kraken simplifies the classification process by providing pre-built databases. There was an interesting discussion about the origin of the Kraken name: it comes from the mythological sea creature, a nod to the fact that the original Kraken relied on Jellyfish, the k-mer counting tool used to build its databases. Derrick Wood developed the original concept of Kraken. The hosts recalled finding a true pathogen in a sample, which was significant for downstream analysis; in some samples the number of reads was very low, and unclassified reads could be uninformative or indicate contamination. Because Kraken was developed for Illumina reads, its accuracy in classifying Nanopore reads is likely to be affected by their higher error rate. The Kraken database achieves exact matching of k-mers and fits all the genome information into a small space. Tools spawned out of the Kraken world are widely used because of their accuracy, speed, and simplicity in taxonomic classification. KrakenUniq adds a column to the report counting the number of unique k-mers, to help validate the results. The developers worked closely with others to test new Nanopore chemistries, because the frequent changes in chemistry affected the accuracy of the reads. Kraken databases contain vector sequences, which are given their own taxonomy ID as "synthetic sequences". The software mixes Perl and C++, with Perl processing inputs and C++ handling the memory-heavy work of building and compacting the database and writing bytes. Dr. Jennifer Lu appreciates the simplicity and accuracy of the classification algorithm, and Nat Rincon takes pride in being part of the Kraken community.
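To make the exact k-mer matching idea concrete, here is a toy sketch (not from the episode, and much simpler than the real algorithm): it breaks a read into 31-mers, looks each one up in a k-mer-to-taxid table, and takes a simple majority vote. Real Kraken instead resolves hits against the full taxonomy tree with LCA-based, root-to-leaf scoring and builds its table from whole reference genomes; the single-entry "database" below is made up purely to show the call.

```python
from collections import Counter

K = 31  # the k-mer size used by the original Kraken

def kmers(seq, k=K):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def classify(read, kmer_to_taxid):
    """Toy classifier: majority vote over exact k-mer matches.

    Real Kraken scores root-to-leaf paths in the taxonomy tree; this is
    only a simplified illustration of the k-mer lookup step.
    """
    hits = Counter(kmer_to_taxid[km] for km in kmers(read) if km in kmer_to_taxid)
    if not hits:
        return "unclassified"
    taxid, _count = hits.most_common(1)[0]
    return taxid

# A made-up, single-entry "database" just to show the call:
toy_db = {"A" * 31: "taxid_1234"}
print(classify("A" * 40, toy_db))   # -> taxid_1234
print(classify("C" * 40, toy_db))   # -> unclassified
```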
102 Early days of MLST
Mar 9 2023
102 Early days of MLST
Ed Feil is a professor of bacterial evolution at the University of Bath, and Natacha Couto is a data scientist at the Centre for Genomic Pathogen Surveillance at the University of Oxford. We delve into the concept of multi-locus sequence typing (MLST) in bacterial population genetics. They highlight how the MLST method defines strains based on partial gene sequences of up to around 500 base pairs at each locus. The method compares the sequence at each locus across strains, assigning an allele number so that identical sequences receive the same number; the combination of allele numbers across loci gives a unique identifier, referred to as the sequence type (ST). MLST revolutionized the field by facilitating digital storage and comparison of epidemiological databases, proving particularly useful in investigating transmission events and the dissemination of certain strains. Although other methods such as Pulsed-Field Gel Electrophoresis (PFGE) offered higher resolution when looking for similarities between strains, MLST remains a versatile and widely used method. They also talk about the shortcomings of MLST and the need for continued improvements in population genetics research. They mention the development of the eBURST program, which uses a circular model, rather than the traditional dendrogram tree structure, to better visualize MLST data and understand the clonal expansion of populations. They also discuss how the original MLST schemes may not have included the best genes for all bacterial species, as the genes were chosen before genome sequencing became widely available. Ed and Natacha further elaborate on the concept of clonality among bacterial species. Ed suggests that bacterial population structures have no consistent pattern, with some organisms being well-behaved while others show a lot of allele shuffling. However, clones have existed since day one, and their presence is still seen today. Natacha adds that although MLST has flaws, it leaves behind the nomenclature for lineages or clones, which is a lasting legacy. Nabil-Fareed notes that while most reference labs have moved on to genomics, some people still use MLST. He adds that the pipeline is the same for any organism, and the process is efficient in the end. The discussion concludes with the hosts thanking the guests and promising more exciting topics in their next episode. Overall, the hosts highlight the significance of understanding the limitations of MLST and the scope for further research in bacterial population genetics.
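As a concrete illustration of the allele-numbering idea described above, here is a minimal sketch (not from the episode): identical sequences at a locus share an allele number, and the combination of allele numbers defines the sequence type. In practice, allele and ST numbers are assigned by curated databases such as PubMLST, and schemes typically use seven housekeeping loci; the three locus names and short sequences below are made up for illustration.

```python
# A minimal sketch of MLST bookkeeping; real allele/ST numbers come from curated databases.
allele_db = {}     # locus -> {sequence: allele number}
profile_db = {}    # tuple of allele numbers -> sequence type (ST)

def allele_number(locus, sequence):
    """Return the allele number for a sequence at a locus, assigning a new one if unseen."""
    alleles = allele_db.setdefault(locus, {})
    if sequence not in alleles:
        alleles[sequence] = len(alleles) + 1
    return alleles[sequence]

def sequence_type(isolate_alleles):
    """Map {locus: sequence} for one isolate to an ST, assigning a new ST if unseen."""
    profile = tuple(allele_number(locus, seq) for locus, seq in sorted(isolate_alleles.items()))
    if profile not in profile_db:
        profile_db[profile] = len(profile_db) + 1
    return profile_db[profile]

isolate_1 = {"adk": "ATGAAA", "fumC": "TTGGCC", "gyrB": "GGATCC"}
isolate_2 = {"adk": "ATGAAA", "fumC": "TTGGCA", "gyrB": "GGATCC"}  # one locus differs
print(sequence_type(isolate_1), sequence_type(isolate_2))          # -> 1 2
```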
101 One Health with Natacha Couto and Ed Feil
Feb 23 2023
101 One Health with Natacha Couto and Ed Feil
The hosts of the MicroBinfie podcast invite Dr Natacha Couto (University of Oxford) and Professor Ed Feil (University of Bath) as special guests to discuss the concept of "One Health". One Health is a comprehensive approach that seeks to manage the problem of antimicrobial resistance (AMR) by addressing the use of antibiotics in healthcare, agriculture, and the environment. It aims to improve health outcomes across all sectors to create a better planet. However, the diagrams often used to represent One Health are misleading, as they do not take into account the complexity of AMR transmission. There is therefore a need for quantitative studies to understand and identify the ecological and biological barriers to AMR transmission. Visual aids such as these diagrams are not always accurate and should be approached with caution; scientists should be mindful of the implicit confirmation bias in visually appealing graphics. AMR determinants are found in many settings, including animals, the environment, and humans, because most antibiotics are derived from natural compounds. Studies have shown that AMR determinants are not limited to hospitals; they can be found in the environment and in the surroundings of hospitals. However, the guests caution that sampling methods can skew results, and it is essential to use a quantitative approach to understand the transmission of AMR across different sectors. The One Health approach requires understanding the drivers of resistance and virulence and looking beyond human pathogens. Plants, insects, and animals form part of the broader One Health picture and represent systems that are harder to study. There is no clear answer on where to focus resources, as both resistant and commensal strains can be important to study. Context is essential when it comes to virulence, as commensal bacteria can become dangerous pathogens in certain situations. They note that environmental factors play a significant role in disease outbreaks, and understanding the habits of hosts like deer or pheasants, on which ticks feed, is crucial. Approaches like outbreak analysis that work in hospitals cannot simply be transferred to environmental settings; disease cannot be studied as if it occurs in a vacuum. COVID-19 has shown how host switches can have severe consequences, but spillover events usually fizzle out before causing any harm. Understanding environmental factors like habitat changes may help tackle disease outbreaks better in the future. While tools like sequencing and analysis may be equivalent, the questions investigated in different settings are vastly different. It is essential to understand social science factors, such as people's compliance levels and risk perception, when studying transmission in human communities. In conclusion, the issue of antimicrobial resistance is complex and requires a multidimensional approach involving different perspectives and fields of study.
98 Nomadic bioinformatics with Frank
Jan 12 2023
98 Nomadic bioinformatics with Frank
We interview Frank Ambrosio, who is embarking on a lifestyle of nomadic bioinformatics and living his best life. * https://www.linkedin.com/in/francis-ambrosio/ In this episode of the MicroBinfie podcast, Frank Ambrosio, a traveling bioinformatician working for Theiagen, joins co-hosts Andrew, Nabil, and Lee to talk about his journey into bioinformatics. Frank shares how he transitioned from being a lab technician and microbiologist to analyzing his own data and pursuing a master's program in bioinformatics at Georgia Tech. He also discusses his experience working at the CDC, where he gained exposure to different laboratories working on tuberculosis, biodefense research and development, surveillance-oriented production laboratories for strep genomes, and the division of HIV/AIDS prevention. Frank gives tips for aspiring bioinformaticians, recommending that early-career scientists apply to contracting agencies at the CDC to gain valuable experience and eventually become full-time employees. He also suggests starting with a virtual machine on a cloud provider such as Google Cloud, paired with a cloud-based IDE like VS Code, for ease of use and reliability. The conversation then moves on to Frank's nomadic lifestyle as a traveling bioinformatician and his desire to connect with the public health community worldwide. Frank shares his recent experience meeting collaborators in Mozambique and the importance of building personal connections with colleagues in public health for collaboration and support. Frank concludes by discussing his approach to routines while traveling and how he uses his Google calendar to plan out his days and weeks. He emphasizes the importance of flexibility and adaptability as a traveling bioinformatician, and his eagerness to continue meeting new people and building connections in the public health community. Turning to his lifestyle as a digital-nomad bioinformatician, Frank explains how he enjoys enhanced flexibility, a better quality of life, and the ability to work anywhere in the world. However, he also highlights that this lifestyle could be challenging, particularly for those who prefer greater stability and predictability. Nabil wonders how feasible it is for bioinformaticians to engage in mentoring and education while working as digital nomads. Frank acknowledges the concern but notes that he has been fortunate enough to maintain his mentor relationships remotely, and that working with someone on a project can facilitate a stronger and more rewarding mentor-mentee relationship. The hosts note that flexibility is not new to bioinformatics and that technological advances are making it easier to find talented people worldwide to join the missions of organizations like the CDC. Frank reflects on his future, leaving open the possibility of remaining with his current employer, Theiagen. He remains optimistic about the potential of these digital collaborations and is open to new opportunities to help the global bioinformatics community.
97 Advances in sequencing technologies
Dec 29 2022
97 Advances in sequencing technologies
We discuss recent advancements in genome sequencing technologies, based on what we've been hearing at conferences and within the community. The Microbial Bioinformatics podcast brought together Andrew, Lee, and Nabil to discuss the latest advances in sequencing technologies. The team explored new developments in the market, including a cutting-edge instrument from Element Biosciences that captured Nabil's attention. Andrew discussed adaptive sampling, the Oxford Nanopore feature that allows unwanted reads to be rejected in real time. The discussion highlighted how the computing power of sequencing labs has grown thanks to advances in computers, with gaming computers being repurposed to aid in data analysis. Illumina's Complete Long Read solution and NextSeq kits were also topics of discussion. The team also discussed the increasing popularity of PacBio, whose HiFi sequencing delivers highly accurate long reads. The experts then discussed how longer reads pave the way for fourth-generation sequencing, while also acknowledging the challenges posed by software tools catching up with the new technology. While the developments in sequencing technology seem exciting, Nabil cautioned the panel not to forget the importance of quality over quantity. In the second part of the episode, the team moved on to the limitations of sequencing software, particularly its long-read handling. Andrew explained how much sequencing software is hard-coded around short paired-end reads of up to 300 bp, and exceeding this limit often leads to crashes. Lee asked whether there was a hard-coded constant in the source code of SPAdes or SKESA limiting the software's ability to handle larger datasets. Andrew answered by explaining that developers may have set limits on the memory or stack size of the software, leading to issues when processing larger datasets. The team concluded by noting that hard-coding and data-processing limitations shouldn't be considered permanent obstacles, as software development is a continuous process. As sequencing technologies advance, software must also advance to handle increasingly complex genetic datasets.
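On the read-length point, here is a small sketch (not from the episode) that scans a FASTQ file and flags reads longer than a typical short-read assumption; the 300 bp ceiling and the file path are illustrative assumptions, but this is the kind of quick check that reveals whether a dataset will trip up tools written only with Illumina-length reads in mind.

```python
import gzip

SHORT_READ_MAX = 300  # bp; an assumed short-read ceiling, for illustration only

def read_lengths(fastq_path):
    """Yield read lengths from a FASTQ file (gzipped or plain text)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:            # the sequence is the second line of each 4-line record
                yield len(line.strip())

def summarise(fastq_path):
    lengths = list(read_lengths(fastq_path))
    longest = max(lengths, default=0)
    over = sum(1 for n in lengths if n > SHORT_READ_MAX)
    print(f"{len(lengths)} reads, longest {longest} bp, {over} longer than {SHORT_READ_MAX} bp")

# Hypothetical usage:
# summarise("sample_R1.fastq.gz")
```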
94 The great scientific Mastodon migration
Nov 17 2022
94 The great scientific Mastodon migration
Over the past few weeks scientists have been swapping Twitter for Mastodon. Our very own Nabil-Fareed Alikhan talks about his experience setting up and running a Mastodon server, https://mstdn.science, which is one of the places scientists have moved to. We are joined by Emma Hodcroft to get an independent scientist's view on the whole thing. In the MicroBinfie podcast, Andrew and Nabil discuss the migration of academics from Twitter to a platform called Mastodon, with Nabil playing a significant role in this shift. According to Nabil, Mastodon is a free and open web application designed for micro-blogging. It enables integration and communication between servers, allowing users to follow, reply to, or read content from other servers. The migration happened after Elon Musk bought Twitter and made significant changes that concerned people about freedom of speech and democracy. In response, Nabil and Duncan set up their own Mastodon instance, https://mstdn.science, initially planning to create a social network for bioinformaticians, microbial genomics people, and tech-savvy microbiologists. Although they expected only 50-100 users, many more scientists joined, including Nobel Laureates, journals, and scientists from other disciplines, and Nabil's instance now has almost 2,000 users. Meanwhile, other science-focused instances, like genomic.social or ecoevo.social, also saw a surge in sign-ups. In terms of resources, Nabil and Duncan's virtual server hosts almost 2,000 users and costs roughly £100 per 1,000 users, depending on how much interaction and following goes on. The Mastodon network replicates content from other instances, spawning many background jobs even if a user's own account doesn't change much. Nabil does not limit which Mastodon instances can communicate with his site, but does block domains serving unwanted or unsafe content. Even though the Mastodon network could crash and burn, Nabil thinks it could still work in the long run. The podcast contributors suggest that Twitter's recent changes have left some users feeling dissatisfied, leading them to Mastodon, a decentralized social media platform. Some dodgy servers have been blocked for moderation reasons, and some people have moved to Mastodon as a total replacement for Twitter; for fed-up users, Mastodon has become a refuge. According to Emma, who recently moved from Twitter to Mastodon, Mastodon is a hedge against Twitter's unknown future. Mastodon's decentralized platform allows for a shift of power towards content and interaction that is not available on a centrally controlled platform. Mastodon may not be a one-for-one replacement for Twitter, but it fits certain use cases, such as a place for academics to complain about papers. Mastodon's success is not dependent on Twitter's fate but rather on what "crazy ideas" Twitter comes up with in the future, Emma argues. While Mastodon may never be quite the same as Twitter, it could be even better.
92 - Avoid dependency hell and get up and running fast
Oct 27 2022
92 - Avoid dependency hell and get up and running fast
Often the hard part of bioinformatics isn't the analysis, it's getting all of the software you need set up and installed. Come with us on this journey and avoid dependency hell. In the MicroBinfie podcast, the hosts discuss the struggles of installing, managing, and dealing with dependencies for bioinformatics software. In the past, software installation was a nightmare: it was common to edit lines of code and manage dependencies manually, leading to conflicts like the diamond dependency problem. To ease this process, the hosts suggest using containers, virtual machines, and local environments. They stress the importance of adhering to semantic versioning guidelines and of understanding the end-users' perspective for proper documentation, testing, and clarity regarding dependencies. Additionally, software maintenance is critical for its longevity and usability. The hosts also discuss managing software dependencies across different chip architectures and operating systems. The Apple M1 architecture's differences from traditional processors cause compatibility issues and force software to run under slower emulation, creating difficulties for bioinformatics work. Using separate Conda environments for each project, or Mamba as a package manager, can solve dependency-related problems that would otherwise cause significant issues; however, Mamba may take shortcuts and create conflicts with specific programs. Other package managers like Homebrew and APT are also discussed. The episode also covers the benefits of using Docker and Singularity to manage software packages on a local machine. Docker is useful for databases, web servers, and complicated pipelines, while Singularity suits more complex software and plays better with HPC systems. The hosts provide tips on using containers or virtual machines in a team environment, passing containers around instead of binary files, and using Docker and Singularity as tools to ease the process. Overall, the episode offers practical advice to streamline the workflow of researchers who manage software packages.
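As a small illustration of the semantic-versioning point, here is a sketch (not from the episode) that checks whether installed tool versions satisfy pinned minimums, treating a major-version bump as a breaking change. It only handles plain MAJOR.MINOR.PATCH strings, and the tool names and version numbers are made up.

```python
def parse_version(version):
    """Turn a plain 'MAJOR.MINOR.PATCH' string such as '2.14.1' into (2, 14, 1)."""
    return tuple(int(part) for part in version.split("."))

def satisfies(installed, minimum, same_major=True):
    """True if installed >= minimum and, optionally, shares the major version,
    since under semantic versioning a major bump signals breaking changes."""
    inst, req = parse_version(installed), parse_version(minimum)
    if same_major and inst[0] != req[0]:
        return False
    return inst >= req

requirements = {"toolA": "2.14.0", "toolB": "1.3.0"}   # hypothetical pins
installed    = {"toolA": "2.14.1", "toolB": "2.0.2"}   # hypothetical environment

for tool, minimum in requirements.items():
    ok = satisfies(installed[tool], minimum)
    print(f"{tool}: installed {installed[tool]}, requires >= {minimum} -> {'ok' if ok else 'conflict'}")
```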
91 - What language should I learn?
Oct 13 2022
91 - What language should I learn?
The MicroBinfie podcast discusses the top programming languages for bioinformatics. Andrew, Lee, and Nabil agree that Python is a great starting point for its consistency and rigor. Its strict syntax is ideal for teaching programming fundamentals that are essential in any language. In contrast, Perl encourages multiple ways of doing the same thing, creating confusion and making it harder to keep track of things. The hosts caution against starting with trendy languages that are constantly changing. Instead, stick with more established languages like Python, which have mature libraries and concepts that will help you advance more easily. Trendy languages come and go like changing tides, making them riskier choices. Additionally, they highlight the importance of understanding databases, including primary keys and unique fields. SQL is useful, particularly when dealing with large datasets; it is consistent across flavors and unlikely to go away soon, and it takes real skill to optimize queries so they run in milliseconds. The hosts emphasize that the language you choose to learn depends on your individual goals and environment. For instance, Lee suggests that you should look at who is in your space, what they are using, and who is willing to help you. Once you understand the programming concepts, it is easier to transfer them to other languages; it is just a question of learning the syntax. Andrew, Lee, and Nabil also discuss their own trajectories of learning programming languages, revealing that it takes a long time to become an expert in a language, and that this is something to be appreciated. They highlight the difference between just learning the basics of a language and really getting into its depths, frameworks, and libraries. The hosts also mention languages that are important to pick up, like SQL and bash scripting, and languages that are popular for web development, like JavaScript. However, they caution that JavaScript and Java are not the same thing and that JavaScript has a reputation for being a weird language. When asked what language they would choose for a task, Nabil says he would use Perl, Lee mentions R for stats, while Andrew admits that he has to relearn R every time he comes back to it and therefore prefers Perl for quick scripts. They also discuss their love-hate relationship with R, mentioning that while it has useful libraries like ggplot2 and ggtree, its syntax is difficult to work with and it has several competing paradigms for approaching the same problem. The hosts conclude by acknowledging that there is no one-size-fits-all approach to learning programming languages: choose based on your goals, environment, and personal preferences. Python is a useful language to learn even if you are not interested in bioinformatics. Additionally, they note that the fundamentals of databases, and how they work, are crucial to understand and are useful across fields.
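To ground the point about primary keys and unique fields, here is a tiny sketch (not from the episode) using Python's built-in sqlite3 module; the table and column names are invented for illustration. The unique constraint rejects a duplicate sample name, which is exactly the kind of guarantee the hosts argue is worth understanding before reaching for bigger tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE isolates (
        isolate_id   INTEGER PRIMARY KEY,      -- one row per isolate
        sample_name  TEXT    UNIQUE NOT NULL,  -- duplicates are rejected
        species      TEXT
    )
""")
conn.execute("INSERT INTO isolates (sample_name, species) VALUES (?, ?)",
             ("QI-0001", "Salmonella enterica"))

try:
    conn.execute("INSERT INTO isolates (sample_name, species) VALUES (?, ?)",
                 ("QI-0001", "Escherichia coli"))   # same sample_name, violates UNIQUE
except sqlite3.IntegrityError as err:
    print("Rejected duplicate:", err)

for row in conn.execute("SELECT isolate_id, sample_name, species FROM isolates"):
    print(row)
```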
89 What do we do with WDL?
Sep 15 2022
89 What do we do with WDL?
Today on the @microbinfie podcast, we talk about WDL with @sevinsky and @DannyJPark. We learn what "widdle" means to Andrew and his kids. Joel takes a shot at Lyve-SET and you'll never guess what happens next. In the MicroBinfie podcast, we discuss the Workflow Description Language (WDL), commonly used to describe bioinformatics pipelines in a portable, cross-environment way. The starting point is the presumption that tools are already containerized, and WDL helps to bind them together. The guests highlighted how this standardizes bioinformatics in the field, making it more reproducible and scalable. It also helps remove the need for excessive sysadmin work, enabling researchers to spend more time on scientific questions. Although many workflow languages are available, WDL stands out for its formal specification, which is independent of the engines used to execute it. In the second part of the podcast, guests Joel and Danny talked about workflow languages and public health bioinformatics. They highlighted the challenge of bringing version control and quality management to the field and its effects on bioinformatics practice. They spoke about the origins of the StaPH-B community, which comprises state-level public health bioinformaticians; the community discusses shared challenges and helps create links between academia and state public health departments. Danny shared the story of how they came to use WDL and how they realized it was well suited to making pipelines portable. Joel, in turn, talked about how they chose WDL for its applicability to public health and the support it received from its creators, particularly the Broad Institute. They both agreed that the choice of workflow language was driven by the environment they work in and which language best suited their needs. In conclusion, the discussion focused on the vital role of workflow languages such as WDL in bridging the gap between bioinformatics and public health. The choice of workflow language is critical and depends heavily on the environment in which it is used. Finally, they expressed their support for WDL and how it has helped them streamline their bioinformatics workflows.