Monday
On Monday, we covered good data management practice, Git and Quarto. At the end of the day, we each set up an individual blog for documenting our progress through the course.
Intro to course by Amrei
Introduction • Course content, schedule etc • People intros
Data management • Data cycle • FAIR principles (Findable, Accessible, Interoperable, Reusable) • Good data management practices: research documentation, data organisation, information security, ethics and legislation (in Sweden, research data needs to be kept for 10 years)
Data sets: central dogma of biology <-> bioinformatics
Best practices • Raw data in a separate directory • Code in another directory • Output (figures) in a separate directory • Version control • README in every directory • File names that are easy for both humans and machines to read (no ö, spaces or special characters) • Use non-proprietary formats, e.g. .csv instead of .xlsx
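To make the layout and naming rules above concrete, here is a minimal sketch. The directory names (`data/raw`, `code`, `output/figures`), the helper names and the example file names are my own assumptions for illustration, not something prescribed in the course.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical directory names; the notes only say that raw data, code and
# output should live in separate directories.
PROJECT_DIRS = ["data/raw", "code", "output/figures"]

# Allow ASCII letters, digits, dot, underscore and hyphen only,
# which rules out spaces and characters such as ö.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name: str) -> bool:
    """Return True if the whole name consists of the allowed characters."""
    return SAFE_NAME.fullmatch(name) is not None


def make_project_skeleton(root: Path) -> list:
    """Create each directory in PROJECT_DIRS with an empty README.md inside."""
    created = []
    for rel in PROJECT_DIRS:
        directory = root / rel
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "README.md").touch()
        created.append(directory)
    return created


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for directory in make_project_skeleton(Path(tmp)):
            print(directory.relative_to(tmp))
    # One name that follows the rules and one that does not.
    print(is_safe_filename("2024-01-15_sample_counts.csv"))
    print(is_safe_filename("möte anteckningar.xlsx"))
```

The check is deliberately strict; a real project might want to allow a few more characters or enforce a date prefix.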
Literate programming • Code chunks • Markdown • Notebooks in Quarto
Version control • Git and good Git practices • Clear and informative commit messages • Commit often (anywhere from once to several times per day) -> in this course, by writing a blog post every day
Environment managers • Using pixi in this course
Containers • Bundle everything needed to run the code and do the analysis, including the operating system's libraries and tools (the kernel is shared with the host).
Workflow manager -> Nextflow
Git and GitHub by Samuel Flores
• git branch • git checkout • git merge • git diff • git add • git commit -m "Message" (each commit should cover a single theme, and the message should be in the imperative mood, e.g. "Add README") • git push
Notes on branching • git branch <name> -> creates a new branch based on the current commit, without switching to it (git branch on its own lists the existing branches)
Merging • Merge a specific branch or a specific commit into the current branch • Conflicts need to be resolved before the merge can be completed
Add and commit • add -> puts changes in the staging area • commit -> records the staged changes in the local repo
Push, pull and collaborate • git push origin main -> sends the local commits on main to the remote called origin
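The branch, add, commit and merge steps from these notes can be tried end to end in a scratch repository. The sketch below drives the real `git` command line from Python so that it can be run in one go. The file name `names.txt`, the branch name `add-second-name`, the sample names and the user identity are made up for illustration, and pushing is left out because the scratch repository has no remote.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside repo and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def demo_branch_and_merge(repo: Path) -> str:
    """Create a repo, commit on main, commit on a branch, merge it back."""
    git(repo, "init")
    # Call the initial branch 'main' whatever the local git default is.
    git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
    # Made-up identity, stored in this scratch repository only.
    git(repo, "config", "user.name", "Example Student")
    git(repo, "config", "user.email", "student@example.com")

    names = repo / "names.txt"
    names.write_text("Ada Lovelace\n")
    git(repo, "add", "names.txt")                # working tree -> staging area
    git(repo, "commit", "-m", "Add first name")  # staging area -> repository

    git(repo, "branch", "add-second-name")       # new branch at the current commit
    git(repo, "checkout", "add-second-name")     # switch to it
    names.write_text("Ada Lovelace\nRosalind Franklin\n")
    git(repo, "add", "names.txt")
    git(repo, "commit", "-m", "Add second name")

    git(repo, "checkout", "main")
    # main has no commits of its own since the branch point, so this merge
    # is a fast-forward and cannot produce a conflict.
    git(repo, "merge", "add-second-name")
    return git(repo, "log", "--oneline")


if __name__ == "__main__":
    if shutil.which("git"):
        with tempfile.TemporaryDirectory() as tmp:
            print(demo_branch_and_merge(Path(tmp)))
    else:
        print("git not found on PATH; skipping demo")
```

A conflict would only appear if both branches had changed the same lines of `names.txt` before merging, which is the situation the exercise with neighbours is likely to produce.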
Exercise: alphabetise a list • Format: first name, last name • Branching • Merge with neighbours via commit, pull, merge and push until all 14 names are in alphabetical order.
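As a rough picture of the end state the exercise is aiming for, the sketch below combines several name lists into one alphabetical list without duplicates. It only illustrates the target, not what `git merge` does internally. The function name and the sample names are invented, and sorting on the whole entry (so effectively by first name) is my assumption about what "alphabetical" meant here.

```python
def merge_name_lists(*name_lists):
    """Combine the lists, drop blanks and duplicates, sort case-insensitively."""
    unique = set()
    for names in name_lists:
        for name in names:
            cleaned = name.strip()
            if cleaned:  # skip empty lines
                unique.add(cleaned)
    # Sort on the case-folded entry; fall back to the raw entry for ties.
    return sorted(unique, key=lambda entry: (entry.casefold(), entry))


if __name__ == "__main__":
    mine = ["Rosalind Franklin", "Ada Lovelace"]
    neighbour = ["Barbara McClintock", "Ada Lovelace"]
    for name in merge_name_lists(mine, neighbour):
        print(name)
```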