Unix is very likely the most foundational skillset we can develop for bioinformatics (and much more than bioinformatics). Many of the most common and powerful bioinformatics approaches happen in this text-based environment, and having a solid foundation here can make everything we’re trying to learn and do much easier. This is a set of 5 introductory tutorials to help us get from being completely new to Unix up to being great friends with it 🙂
Why learn Unix?
Getting familiar with working at a “Unix-like command-line” is one of the most fundamental skillsets we can develop for bioinformatics, and for much, much more. As Brian Kernighan (a member of the original Unix team at Bell Labs) puts it in his 2019 book UNIX: A History and a Memoir:
“Unix and its derivatives aren’t widely known outside a particular technical community, but they are at the heart of any number of systems that are part of everyone’s world. Google, Facebook, Amazon, and plenty of other services are powered by Unix-like operating systems. If you have a cell phone or a Mac, it runs on some version of Unix. If you have gadgets like Alexa at home or navigation software in your car, they’re powered by Unix-like systems too.”
Because Unix-like systems are the framework for so much of our world, learning to speak their language also gives us access to things like remote servers and cloud computing. It lets us access and manipulate large datasets, and use programs, that we otherwise couldn’t.
Which brings us back to why Unix is so foundational to bioinformatics.
Summary of why it’s worth it to learn Unix
- it’s the foundation for most of bioinformatics (and much more)
- enables the use of non-GUI (Graphical User Interface) tools
- improves reproducibility (GUIs are super convenient for lots of things, but they are not ideal when it comes to reproducibility)
- enables things like quickly performing operations on large files (without needing to read them into memory)
- can allow us to programmatically access data
- helps automate repetitive tasks (need to rename 1,000 files?)
- enables use of higher-powered computers elsewhere (servers/cloud-computing)
So let’s get to it!
Something to keep in mind while going through these pages is that this is all about exposure, not memorization or mastery. Don’t worry about the details!
- Getting started
- Working with files and directories
- Redirectors and wildcards
- Six glorious commands
- Variables and For loops