http://www.read.seas.harvard.edu/~kohler/class/aosref/ritchie84evolution.pdf
Ken Thompson (sitting) and Dennis Ritchie working together at a PDP-11
Vast array of simple, dependable tools
Each does one simple task, and does it really well
By combining these tools, one can conduct rather sophisticated analyses
The Linux help philosophy: "RTFM" (Read the Fine Manual)
Unix users spend a lot of time at the command line
In Unix, a word is worth a thousand mouse clicks
The shell is an interactive environment with a set of commands to initiate and direct computations
The shell encloses the complexity of the OS, hence the name
https://en.wikipedia.org/wiki/Unix_shell
The Bourne shell (sh) is a shell, or command-line interpreter, for computer operating systems
Developed by Stephen Bourne at Bell Labs, 1976
bash (the Bourne-Again shell) was later developed for the GNU project and incorporates features from the Bourne shell, csh, and ksh. It is meant to be POSIX-compliant
https://en.wikipedia.org/wiki/Stephen_R._Bourne
bash - Bourne-Again shell
tcsh - TENEX C shell
zsh - Z shell
Change shell: chsh -s /bin/zsh
The $SHELL environment variable holds the path to your default (login) shell, which is usually, but not always, the shell you are currently running
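A minimal sketch of how one might check both the login shell and the shell that is actually running. The variable names `login_shell` and `current_shell` are illustrative, and the `ps -p $$ -o comm=` call assumes a `ps` that supports the POSIX `-p` and `-o` options.

```shell
# Path of the login shell recorded for this user (may be unset in some environments)
login_shell="${SHELL:-unknown}"
echo "login shell:   $login_shell"

# Name of the process that is interpreting this session or script ($$ is its PID)
current_shell=$(ps -p $$ -o comm= 2>/dev/null || echo unknown)
echo "running shell: $current_shell"
```

The two values can differ, for example when you start zsh by hand from a bash login session.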
Remote access, SSH, PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/), MobaXterm (https://mobaxterm.mobatek.net/)
Mac OS X + Xcode development suite (free, https://developer.apple.com/xcode/) + X11 server (free, https://www.xquartz.org/) + iTerm2 (optional, https://iterm2.com/)
Ubuntu Linux (long-term support LTS version, XX.04, http://www.ubuntu.com/download/desktop)
Cygwin, http://www.cygwin.com/
Git Bash, https://git-for-windows.github.io/
Boot from a CD or USB (search for "linux usb")
Install a whole Linux system as a virtual machine in VirtualBox, https://www.virtualbox.org/
"Terminal" is already installed and runs the bash shell (zsh is the default since macOS 10.15 Catalina)
Why? Darwin, the system on which Apple's Mac OS X is built, is a derivative of 4.4 BSD-Lite2 and FreeBSD. In other words, the Mac is a Unix system
For X11 (graphics), see XQuartz, http://xquartz.macosforge.org/landing/
iTerm2 - a better terminal replacement for Mac, https://www.iterm2.com/
Modern Unix systems have package managers that download and install (free) software for you
On a Mac, Homebrew (http://brew.sh/) is a popular package-management system (alternatively, MacPorts, https://www.macports.org/)
On Ubuntu, apt (https://itsfoss.com/apt-get-linux-guide/) is the standard package manager, with both a command-line and graphical interface available
On Windows, Cygwin (https://cygwin.com/install.html) installs everything precompiled through its setup file. Do not delete the setup-x86_64.exe file after installing Cygwin; use it to explore what Linux tools are available (a lot)
Most commands take additional arguments that fine-tune their behavior
If you don't know what a command does, use the command man <command>
Press q to quit the man page viewer
Most often, you’ll use <command> -h or <command> --help
Some commands output help if executed without any arguments
cd / - go to the root directory
cd /usr/home/jack/bin - go to the user’s sub-directory
cd .. - go to the upper level directory
cd, or cd ~ - go to the user’s home directory
cd - (a single dash) - go to the last visited directory
pwd - print working directory
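The navigation commands above can be tried safely in a throwaway directory. This is only a sketch: the `project/data` layout and the variable names are made up for illustration.

```shell
start_dir=$(pwd)                      # remember where we started
sandbox=$(mktemp -d)                  # hypothetical scratch directory
mkdir -p "$sandbox/project/data"

cd "$sandbox/project/data"            # go to a sub-directory
cd ..                                 # go up one level
up_one=$(basename "$(pwd)")           # name of the directory we are now in
cd - > /dev/null                      # jump back to the last visited directory
back=$(basename "$(pwd)")

echo "after 'cd ..': $up_one; after 'cd -': $back"

cd "$start_dir"                       # clean up
rm -r "$sandbox"
```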
ls - list files in the current directory
ls -1 - list files in one column
ls -lah - list files in long, human-readable format, including all (hidden) files, with owner, group, and permissions
touch <file> - creates an empty file
nano <file> - edit it
mkdir <dirname> - creates a directory
cp <source_file> <target_file> - copy a file to another location/file
mv <source_file> <target_file> - move a file
rm <file> - remove a file. If multiple files provided, removes all of them
rm -r <dirname> - recursive removal (deletes a directory)
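A short sketch that strings the file commands above together in a scratch directory. The file and directory names (`notes.txt`, `archive`) are placeholders, not part of the course material.

```shell
start_dir=$(pwd)
sandbox=$(mktemp -d)                  # hypothetical scratch directory
cd "$sandbox"

touch notes.txt                       # create an empty file
mkdir archive                         # create a directory
cp notes.txt notes_copy.txt           # copy the file
mv notes_copy.txt archive/            # move the copy into the directory
archived=$(ls -1 archive)             # what ended up in archive/

rm notes.txt                          # remove a single file
rm -r archive                         # remove the directory and its content
remaining=$(ls -A | wc -l | tr -d ' ')

echo "archived: $archived; entries left: $remaining"

cd "$start_dir"
rm -r "$sandbox"
```

Note that `rm` does not ask for confirmation and there is no trash bin, so practicing in a temporary directory is a reasonable habit.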
chmod, chown and chgrp
In Unix, every file and directory has an owner and a group
Owner - is the one who created a file/directory
Group - defines rules of file operations and/or permissions
Every user on a Unix machine can belong to one or more groups
Every file has three permission levels
-rw-r--r--   1 mdozmorov  staff   205B Dec 19 11:01 BIOS692.2018.Rproj
-rw-r--r--   1 mdozmorov  staff   3.5K Dec 18 10:20 BUILD.md
-rw-r--r--   1 mdozmorov  staff   470B Dec 19 08:48 README.md
-rw-r--r--   1 mdozmorov  staff   2.1K Dec 19 07:51 _config.yml
drwxr-xr-x  10 mdozmorov  staff   340B Dec 18 10:20 _includes
drwxr-xr-x  10 mdozmorov  staff   340B Dec 18 10:20 _layouts
drwxr-xr-x   7 mdozmorov  staff   238B Dec 18 10:29 _posts
-rw-r--r--   1 mdozmorov  staff   1.0K Dec 20 15:54 acknowledgements.md
The first column tells you about permissions: owner/group/world permissions
The second column has the number of hard links (for a directory, this count reflects what it contains)
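As an illustration of how the permission column changes, here is a minimal sketch using `chmod` on a scratch file. The file name `script.sh` is hypothetical, and `cut -c1-10` simply keeps the first ten characters of the `ls -l` line (file type plus the owner/group/world triplets).

```shell
sandbox=$(mktemp -d)                  # hypothetical scratch directory
touch "$sandbox/script.sh"

chmod 644 "$sandbox/script.sh"        # owner: read/write; group and world: read only
before=$(ls -l "$sandbox/script.sh" | cut -c1-10)

chmod u+x "$sandbox/script.sh"        # additionally let the owner execute the file
after=$(ls -l "$sandbox/script.sh" | cut -c1-10)

echo "before: $before"
echo "after:  $after"

rm -r "$sandbox"
```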
find - lists all files under the working directory (and its subdirectories) based on arbitrary criteria
find . - prints the name of every file or directory, recursively. Starts from the current directory
find . -type f - finds files only
find . -maxdepth 1 -type d - finds directories only, at most 1 level down
find . -type f -name "*.mp3" - finds only *.mp3 files
find . -type f -name "README.md" -exec wc -l {} \; - find files and execute a command on them
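The `find` variants above can be compared on a small, made-up directory tree. This is a sketch under the assumption that your `find` supports `-maxdepth` (GNU and BSD versions do); all file names are invented for the example.

```shell
start_dir=$(pwd)
sandbox=$(mktemp -d)                  # hypothetical scratch directory
mkdir -p "$sandbox/music/album" "$sandbox/docs"
touch "$sandbox/music/a.mp3" "$sandbox/music/album/b.mp3" "$sandbox/docs/README.md"
cd "$sandbox"

n_files=$(find . -type f | wc -l | tr -d ' ')                  # regular files only
n_mp3=$(find . -type f -name "*.mp3" | wc -l | tr -d ' ')      # only *.mp3 files
n_top=$(find . -maxdepth 1 -type d | wc -l | tr -d ' ')        # ".", "./docs", "./music"

# Run a command (here: count lines) on every match
find . -type f -name "README.md" -exec wc -l {} \;

echo "files: $n_files, mp3: $n_mp3, top-level dirs (incl. '.'): $n_top"

cd "$start_dir"
rm -r "$sandbox"
```

Quoting the pattern (`"*.mp3"`) matters: it keeps the shell from expanding the wildcard before `find` sees it.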
* - matches any string of characters, including none
? - matches any single character
[chars] - matches any one character in chars
[a-zA-Z] - matches any one character between a and z, including capital letters
ls *.md
ls [Rt]*
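To see what the shell actually expands a wildcard to, `echo` is a handy stand-in for `ls`. The sketch below uses invented file names in a scratch directory.

```shell
start_dir=$(pwd)
sandbox=$(mktemp -d)                  # hypothetical scratch directory
cd "$sandbox"
touch README.md BUILD.md test.R tools.txt Rplots.pdf a1.txt

md_files=$(echo *.md)                 # every name ending in .md
rt_files=$(echo [Rt]*)                # every name starting with R or t
one_char=$(echo a?.txt)               # "a", exactly one character, then ".txt"

echo "*.md    -> $md_files"
echo "[Rt]*   -> $rt_files"
echo "a?.txt  -> $one_char"

n_rt=$(echo $rt_files | wc -w | tr -d ' ')

cd "$start_dir"
rm -r "$sandbox"
```

The expansion is done by the shell, not by `ls`; the command only ever sees the resulting list of names.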
cat <file> - prints out content of a file. If multiple files, prints them out one after another (concatenates)
zcat - prints out content of gzipped files
less <file> - shows the content of the file one screen at a time
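A small sketch of concatenation and of reading a gzipped file without unpacking it on disk. It uses `gzip -dc`, which on GNU systems does the same job as `zcat`; the file names are placeholders.

```shell
start_dir=$(pwd)
sandbox=$(mktemp -d)                  # hypothetical scratch directory
cd "$sandbox"
printf 'line 1\n' > a.txt
printf 'line 2\n' > b.txt

cat a.txt b.txt                       # prints both files, one after another
n_combined=$(cat a.txt b.txt | wc -l | tr -d ' ')

gzip b.txt                            # replaces b.txt with b.txt.gz
unzipped=$(gzip -dc b.txt.gz)         # decompress to STDOUT, like zcat

echo "combined lines: $n_combined; gzipped content: $unzipped"

cd "$start_dir"
rm -r "$sandbox"
```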
Keyboard shortcuts for the less command
space - forward, b - backward
g - go to the beginning, G - go to the end
/<text> - starts forward search; press Enter to search, n to find the next instance
q - quit
One of the most useful capabilities of Unix is the ability to redirect the STDOUT of one command into the STDIN of another
The | (pipe) character feeds output from the first program (to the left of the |) as input to the second program on the right. Therefore, you can string all sorts of commands together using the pipe
find . | wc -l
cat names.txt | sort | uniq -c
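The second pipeline can be tried on a small made-up `names.txt`. This sketch extends it with `sort -rn`, `head`, and `awk` to pull out the most frequent name; those extra stages are an illustration rather than part of the original example.

```shell
sandbox=$(mktemp -d)                  # hypothetical scratch directory
printf 'alice\nbob\nalice\ncarol\nbob\nalice\n' > "$sandbox/names.txt"

# Count how many times each name occurs (uniq -c needs sorted input)
cat "$sandbox/names.txt" | sort | uniq -c

# Number of distinct names
n_unique=$(sort "$sandbox/names.txt" | uniq | wc -l | tr -d ' ')

# Most frequent name: sort the counts in descending order and keep the first row
top_name=$(sort "$sandbox/names.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "distinct names: $n_unique; most frequent: $top_name"

rm -r "$sandbox"
```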
<command> && <command>
mkdir music && mv *.mp3 music/
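With `&&`, the command on the right runs only if the command on the left succeeds (exits with status 0). A minimal sketch in a scratch directory, with invented file names:

```shell
start_dir=$(pwd)
sandbox=$(mktemp -d)                  # hypothetical scratch directory
cd "$sandbox"
touch song1.mp3 song2.mp3

# mkdir succeeds, so the files are moved
mkdir music && mv *.mp3 music/
moved=$(ls music | wc -l | tr -d ' ')

# mkdir now fails (the directory already exists), so the right-hand side is skipped
status="not run"
mkdir music 2>/dev/null && status="ran"

echo "files moved: $moved; second right-hand command: $status"

cd "$start_dir"
rm -r "$sandbox"
```

This is safer than writing the two commands on separate lines: if `mkdir` fails, nothing gets moved to an unexpected place.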
Nearly every command in Unix makes use of a convention to have a "standard input" (also called stdin or STDIN, or channel 0) and "standard output" (also called stdout or STDOUT, or channel 1)
There is also a "standard error" (stderr or STDERR, or channel 2) output that is, by convention, reserved for error messages
find / 2> error.log - capture STDERR into a file
find / 2> /dev/null - suppress STDERR messages
find / 2>&1 - add STDERR to STDOUT
If you want to dump the standard output into a file, use command > file (overwrites the file) or command >> file (appends to the file)
Redirection example: ls > README.md - save the list of files in the current directory into the README.md file
Redirection works in the other direction too: grep CC0 < License.md
Or, feed the output of one command into another command as if it were a file: join <(sort file1) <(sort file2)
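The redirection operators can be compared side by side in a scratch directory. This is a sketch only: the path `/nonexistent_dir_for_demo` is a deliberately missing, hypothetical directory used to provoke an error message, and `|| true` just keeps the failing `ls` from stopping a script.

```shell
start_dir=$(pwd)
sandbox=$(mktemp -d)                  # hypothetical scratch directory
cd "$sandbox"

echo "first"  > out.txt               # > creates or overwrites the file
echo "second" >> out.txt              # >> appends to it
n_lines=$(wc -l < out.txt | tr -d ' ')    # < feeds the file to STDIN

ls /nonexistent_dir_for_demo 2> error.log || true    # STDERR goes to a file
err_captured=$(cat error.log)

both=$(ls /nonexistent_dir_for_demo 2>&1 || true)    # STDERR merged into STDOUT

echo "lines in out.txt: $n_lines"
echo "captured error:   $err_captured"

cd "$start_dir"
rm -r "$sandbox"
```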
head/tail | cut |
for | comm |
sort | echo |
uniq | basename |
wc | dirname |
tr | history |
grep | which |
join | who |
kill | grep |
tar | seq |
gzip | paste |
Tab completion
Ctrl-c - cancel the command you are writing
Ctrl-r - reverse search through your command line history
history - shows your previous commands
!<history number> - repeats specific command. Or, !ls to match the most recent ls command
!! - repeats the last command
As we interact with Linux, we create numbered instances of running programs called “processes.” You can use the ps command to see a listing of your processes (and others!). To see a long listing, for example, of all processes on the system try: ps -ef
To see all the processes owned by you and other members of the class, try: ps -ef | grep bash
To see the biggest consumers of CPU, use the top command (which refreshes every few seconds): top
Use the & operator to run programs in the “background”, with the result that the shell prompts immediately without waiting for the command to complete:
$ mycommand &
[1] 54356 <-------- process id
$
Sometimes you start a program, then decide you want to run it in the background. Here’s how:
top
Press Ctrl-z to suspend the job. You can continue working in the terminal, but the job won't be lost
Type bg at the command prompt to resume the suspended job in the background (type jobs to see what jobs are available)
To bring the background job back to the foreground, type fg at the command prompt
To end the job, use the kill command with the process id: kill 56894 (for example)
Use kill -9 PID to force the job to stop immediately
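The background-and-kill workflow can also be scripted. The sketch below uses `sleep 30` as a stand-in for a long-running job; `$!` is the shell parameter holding the PID of the most recent background command, and `kill -0` only tests whether a process exists without sending it a signal.

```shell
sleep 30 &                            # stand-in for a long-running job, started in the background
pid=$!                                # PID of the job we just started
echo "started background job with PID $pid"

running_before="no"
kill -0 "$pid" 2>/dev/null && running_before="yes"    # does the process exist?

kill "$pid"                           # ask the job to terminate (SIGTERM)
wait "$pid" 2>/dev/null || true       # collect its exit status so it is fully gone

if kill -0 "$pid" 2>/dev/null; then
    running_after="yes"
else
    running_after="no"
fi
echo "running before kill: $running_before; after kill: $running_after"
```

Plain `kill` lets the program clean up after itself; `kill -9` is best kept as a last resort for jobs that ignore the polite request.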
data_hacks, https://github.com/bitly/data_hacks
histogram.py
bar_chart.py
sample.py
datamash, https://www.gnu.org/software/datamash/
tree - lists the contents of directories in a tree-like format, https://www.tecmint.com/linux-tree-command-examples/
htop - an interactive process viewer, https://htop.dev/
csvkit - collection of command-line tools to work with CSV data, https://csvkit.readthedocs.io/
parallel - a shell tool for executing jobs in parallel using one or more computers, https://opensource.com/article/18/5/gnu-parallel
Allow you to submit multiple jobs at once
Depending on the system, can schedule jobs for you
Are optimized for high-throughput performance
Heng Li's "A Bioinformatician's UNIX Toolbox", http://lh3lh3.users.sourceforge.net/biounix.shtml
Bioinformatics one-liners by Stephen Turner, https://github.com/stephenturner/oneliners
Collection of bioinformatics-genomics bash one liners, using awk, sed etc. https://github.com/crazyhottommy/bioinformatics-one-liners
Links and references to many genomics and bioinformatics resources, https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources