Unix introduction

Mikhail Dozmorov

Virginia Commonwealth University

01-27-2021


What is Unix

  • Unix is a family of operating systems and environments that exploits the power of linguistic abstractions to perform tasks
  • Unix is not an acronym; it is a pun on "Multics". Multics was a large multi-user operating system developed jointly by MIT, General Electric, and Bell Labs; Bell Labs withdrew from the project in 1969, shortly before Unix was created. Brian Kernighan is credited with the name
  • Most computational genomics is done on Unix-like systems

http://www.read.seas.harvard.edu/~kohler/class/aosref/ritchie84evolution.pdf


History of Unix

  • Initial file system, command interpreter (shell), and process management started by Ken Thompson
  • File system and further development from Dennis Ritchie, as well as Doug McIlroy and Joe Ossanna
  • Vast array of simple, dependable tools that each do one simple task

Photo: Ken Thompson (sitting) and Dennis Ritchie working together at a PDP-11


Philosophy of Unix

  • Vast array of simple, dependable tools

  • Each does one simple task, and does it really well

  • By combining these tools, one can conduct rather sophisticated analyses

  • The Linux help philosophy: "RTFM" (Read the Fine Manual)


Know your Unix

  • Unix users spend a lot of time at the command line

  • In Unix, a word is worth a thousand mouse clicks


Unix systems

  • Three common types of laptop/desktop operating systems: Windows, Mac, Linux.
  • Mac and Linux are both Unix-like!
  • What that means for us: Unix-like operating systems are equipped with "shells" that provide a command-line user interface.

Shell, aka command line, aka terminal

  • The shell is an interactive environment with a set of commands to initiate and direct computations

  • The shell encloses the complexity of the OS, hence the name

    • You type in commands
    • Shell executes them

https://en.wikipedia.org/wiki/Unix_shell


Shell, aka command line, aka terminal

  • The Bourne shell (sh) is a shell, or command-line interpreter, for computer operating systems

  • Developed by Stephen Bourne at Bell Labs, 1976

  • bash (the Bourne-Again shell) was later developed for the GNU project and incorporates features from the Bourne shell, csh, and ksh. It is meant to be POSIX-compliant

https://en.wikipedia.org/wiki/Stephen_R._Bourne

  • bash - Bourne-Again shell

  • tcsh - TENEX C shell

  • zsh - Z shell

  • Change shell: chsh -s /bin/zsh

  • The $SHELL environment variable holds the path to your login (default) shell, which may differ from the shell you are currently running


Getting to the command line


Getting to the command line | Windows users


Getting to the command line | Mac users

  • "Terminal" is already installed; its default shell is zsh since macOS Catalina (10.15) and bash on earlier versions

  • Why? Darwin, the system on which Apple's Mac OS X is built, is a derivative of 4.4 BSD-Lite2 and FreeBSD. In other words, the Mac is a Unix system

  • For X11 (graphics), see XQuartz, http://xquartz.macosforge.org/landing/

  • iTerm2 - a better terminal replacement for Mac, https://www.iterm2.com/


Obtaining new command-line software

  • Modern Unix systems have package managers that download and install (free) software for you

  • On a Mac, Homebrew (http://brew.sh/) is a popular package-management system (alternatively, MacPorts, https://www.macports.org/)

  • On Ubuntu, apt (https://itsfoss.com/apt-get-linux-guide/) is the standard package manager, with both a command-line and graphical interface available

  • On Windows, Cygwin (https://cygwin.com/install.html) installs everything precompiled through its setup file. Keep the setup-x86_64.exe file after installing Cygwin: rerun it to explore and add the many Linux tools available


Interacting with shell

  • Most commands take additional arguments that fine-tune their behavior

  • If you don't know what a command does, use the command man <command>

  • Press q to quit the man page viewer

  • Most often, you’ll use <command> -h or <command> --help

  • Some commands output help if executed without any arguments


File system: Full vs. relative paths

  • cd / - go to the root directory
  • cd /usr/home/jack/bin - go to a directory by its full (absolute) path, which starts with /
  • cd .. - go to the parent directory (a relative path)
  • cd, or cd ~ - go to the user’s home directory
  • cd - - go to the last visited directory
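
The navigation commands above can be tried safely in a scratch directory. This is only an illustrative sketch: the directory names (project, data) are hypothetical, and mktemp -d is used simply to get a disposable location.

```shell
# Work inside a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"   # hypothetical directory names

cd "$sandbox/project/data"         # full (absolute) path
cd ..                              # relative path: up one level
up_one=$(pwd)                      # now in .../project

cd data                            # relative path: down into a subdirectory
cd - > /dev/null                   # back to the previously visited directory
back=$(pwd)                        # .../project again

echo "$up_one"
echo "$back"
```

Both echo lines print the same path, because cd - returns to the directory visited just before the last cd.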

Orienting in the filesystem

  • pwd - print working directory

  • ls - list files in the current directory (hidden files, whose names start with a dot, are not shown)

  • ls -1 - list files in one column

  • ls -lah - list files in long format (permissions, owner, group, size, date), with human-readable sizes, including hidden files


Creating, moving, copying, and removing files

  • touch <file> - creates an empty file

  • nano <file> - edit the file in the nano text editor

  • mkdir <dirname> - creates a directory

  • cp <source_file> <target_file> - copy a file to another location/file

  • mv <source_file> <target_file> - move or rename a file

  • rm <file> - remove a file. If multiple files are provided, removes all of them

  • rm -r <dirname> - recursive removal (deletes a directory and everything inside it)
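
A short walk-through of the commands above, offered as a sketch rather than a recipe. All file and directory names are hypothetical, and everything happens inside a scratch directory created with mktemp -d.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

touch notes.txt                        # create an empty file
mkdir backup                           # create a directory
cp notes.txt backup/notes_copy.txt     # copy into the directory under a new name
mv notes.txt draft.txt                 # rename (move) the original

mkdir old && touch old/a.txt old/b.txt
rm -r old                              # remove the directory and its contents
rm draft.txt                           # remove a single file

ls -1                                  # only "backup" is left
```

Note that rm does not move files to a trash folder; removed files are gone, which is why a scratch directory is used here.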


Permissions: chmod, chown and chgrp

In Unix, every file and directory has an owner and a group

  • Owner - the user who created the file/directory
  • Group - a named set of users who share the group permissions on the file/directory
  • Every user on a Unix machine can belong to one or more groups

Every file has three permission levels

  • what the user (owner) can do
  • what the group can do
  • what everyone else can do

Understanding ls -lah output

-rw-r--r-- 1 mdozmorov staff 205B Dec 19 11:01 BIOS692.2018.Rproj
-rw-r--r-- 1 mdozmorov staff 3.5K Dec 18 10:20 BUILD.md
-rw-r--r-- 1 mdozmorov staff 470B Dec 19 08:48 README.md
-rw-r--r-- 1 mdozmorov staff 2.1K Dec 19 07:51 _config.yml
drwxr-xr-x 10 mdozmorov staff 340B Dec 18 10:20 _includes
drwxr-xr-x 10 mdozmorov staff 340B Dec 18 10:20 _layouts
drwxr-xr-x 7 mdozmorov staff 238B Dec 18 10:29 _posts
-rw-r--r-- 1 mdozmorov staff 1.0K Dec 20 15:54 acknowledgements.md

Understanding ls -lah output

  • The first column tells you about permissions

    • The very first character in the permissions column tells you what kind of file it is. A - means it's a regular file. A d means it's a directory
    • The next nine characters come in three classes, each has three characters. The three classes are owner/group/world permissions
    • Inside a permission class, r means that class can read the file; w means that class can write the file; x means that class can execute the file
  • The second column has the number of hard links; for a directory, it reflects what the directory contains (the exact rule depends on the file system)

  • The third and fourth columns tell you the owner and group
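
The permission string in the first column can be changed with chmod, which is named in the permissions slide title. A minimal sketch, assuming a scratch directory and a hypothetical file script.sh; the modes 644 and u+x are arbitrary examples.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
touch script.sh                        # hypothetical file name

chmod 644 script.sh                    # owner: read/write; group and others: read only
before=$(ls -l script.sh | cut -c1-10) # keep just the permission string

chmod u+x script.sh                    # additionally let the owner execute the file
after=$(ls -l script.sh | cut -c1-10)

echo "$before -> $after"               # -rw-r--r-- -> -rwxr--r--
```

chown and chgrp change the owner and group in the same spirit, but usually require elevated privileges, so they are left out of this sketch.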

Finding your files

  • find - lists all files under the working directory (and its subdirectories) based on arbitrary criteria

  • find . - prints the name of every file or directory, recursively. Starts from the current directory

  • find . -type f - finds files only

  • find . -maxdepth 1 -type d - finds directories only, at most 1 level down

  • find . -type f -name "*.mp3" - finds only *.mp3 files

  • find . -type f -name "README.md" -exec wc -l {} \; - find files and execute a command on them
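
The find variants above can be compared on a tiny directory tree. This is a sketch under assumptions: the tree (music, docs, and the files in them) is made up for the example and lives in a scratch directory.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p music/live docs               # hypothetical directory layout
touch music/a.mp3 music/live/b.mp3 docs/README.md
printf 'line one\nline two\n' > README.md

n_files=$(find . -type f | wc -l)                    # all regular files
n_mp3=$(find . -type f -name "*.mp3" | wc -l)        # only *.mp3 files
top_dirs=$(find . -maxdepth 1 -type d)               # ".", "./docs", "./music"
find . -type f -name "README.md" -exec wc -l {} \;   # line count of each README.md

echo "files: $n_files, mp3 files: $n_mp3"
echo "$top_dirs"
```

Quoting the pattern ("*.mp3") matters: it stops the shell from expanding the wildcard before find sees it.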


Wildcards and patterns

  • * - matches any string of characters, including an empty one

  • ? - matches any single character

  • [chars] - matches any single character listed in chars

  • [a-zA-Z] - matches any single letter, lowercase or uppercase

  • ls *.md

  • ls [Rt]*
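
To see what each pattern actually matches, the sketch below runs them against a handful of empty files. The file names are hypothetical and are created in a scratch directory.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
touch README.md notes.md todo.txt data.csv   # hypothetical file names

md_files=$(ls *.md)        # README.md and notes.md
rt_files=$(ls [Rt]*)       # names starting with R or t: README.md and todo.txt
one_char=$(ls ?ata.csv)    # ? stands for exactly one character: data.csv

echo "$md_files"
echo "$rt_files"
echo "$one_char"
```

The shell expands the pattern before ls runs, so ls only ever sees the list of matching names.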


Looking inside files

  • cat <file> - prints out the content of a file. If multiple files are given, prints them out one after another (concatenates)

  • zcat - prints out content of gzipped files

  • less <file> - shows the content of the file one screen at a time

Keyboard shortcuts for the less command

  • space - forward, b - backward
  • g - go to the beginning, G - go to the end
  • /<text> - starts forward search, Enter to run it, n to find the next instance
  • q - quit

Chaining commands: pipes

  • One of the most useful capabilities of Unix is the ability to redirect the STDOUT of one command into the STDIN of another

  • The | (pipe) character feeds output from the first program (to the left of the |) as input to the second program on the right. Therefore, you can string all sorts of commands together using the pipe

find . | wc -l
cat names.txt | sort | uniq -c
  • Executing one command AFTER another completed successfully: <command> && <command>
mkdir music && mv *.mp3 music/
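
The names.txt pipeline above can be traced on a small input. This is a sketch with made-up data; the extra sort -rn and the awk call are additions for the example, not part of the slide.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# A small hypothetical input file with repeated names
printf 'ann\nbob\nann\ncat\nann\nbob\n' > names.txt

# sort groups identical lines, uniq -c counts each group,
# and a second sort -rn puts the most frequent name first
counts=$(cat names.txt | sort | uniq -c | sort -rn)
most_common=$(echo "$counts" | head -n 1 | awk '{print $2}')
echo "$most_common"                    # ann

# The second command runs only if the first one succeeded
mkdir archive && mv names.txt archive/
```

uniq only collapses adjacent duplicates, which is why sort has to come first in the pipeline.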

Chaining commands: redirections

  • Nearly every command in Unix makes use of a convention to have a "standard input" (also called stdin or STDIN, or channel 0) and "standard output" (also called stdout or STDOUT, or channel 1)

  • There is also a "standard error" (stderr or STDERR, or channel 2) output that is, by convention, reserved for error messages

  • find / 2> error.log - capture STDERR into a file

  • find / 2> /dev/null - suppress STDERR messages
  • find / 2>&1 - add STDERR to STDOUT
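
Running find / can take a long time, so the sketch below shows the same three redirections on a small stand-in. The report function is a hypothetical helper written for this example; it prints one line to each stream.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical helper that writes one line to each output stream
report() {
    echo "result: 42"                  # goes to STDOUT (channel 1)
    echo "warning: check input" >&2    # goes to STDERR (channel 2)
}

report > out.log 2> error.log          # capture each stream in its own file
quiet=$(report 2> /dev/null)           # suppress STDERR, keep STDOUT
both=$(report 2>&1 | wc -l)            # merge STDERR into STDOUT before the pipe

echo "$quiet"                          # result: 42
echo "$both"                           # 2
```

Without 2>&1, the pipe would carry only the STDOUT line and the warning would go straight to the terminal.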

Chaining commands: redirections

  • If you want to dump the standard output into a file, use command > file (overwrites the file) or command >> file (appends to the file)

  • Redirection example: ls > README.md - save the list of files in the current directory into the README.md file

  • Redirection works in the other direction too: grep CC0 < License.md

  • Or, feed the output of one command into another command as if it were a file: join <(sort file1) <(sort file2)
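
The difference between >, >> and < is easiest to see on a file built up line by line. A minimal sketch with a hypothetical file log.txt in a scratch directory:

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

echo "first line"  > log.txt           # > creates or overwrites the file
echo "second line" >> log.txt          # >> appends to it
echo "CC0 license" >> log.txt

n_lines=$(wc -l < log.txt)             # < feeds the file to STDIN of wc
match=$(grep CC0 < log.txt)            # same idea with grep

echo "$n_lines"                        # 3
echo "$match"                          # CC0 license
```

Replacing the first >> with > would leave only two lines in the file, since > discards whatever was there before.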


Other essential commands

head/tail cut
for comm
sort echo
uniq basename
wc dirname
tr history
grep which
join who
kill grep
tar seq
gzip paste

Shell conveniences

  • Tab completion

  • Ctrl-c - cancel the command you are writing

  • Ctrl-r - reverse search through your command line history

  • history - shows your previous commands

  • !<history number> - repeats specific command. Or, !ls to match the most recent ls command

  • !! - repeats the last command


Processes and job control

  • As we interact with Linux, we create numbered instances of running programs called “processes.” You can use the ps command to see a listing of your processes (and others!). To see a long listing, for example, of all processes on the system try: ps -ef

  • To see all the processes owned by you and other members of the class, try: ps -ef | grep bash

  • To see the biggest consumers of CPU, use the top command (which refreshes every few seconds): top


Foreground/background

  • Thus far, we have run commands at the prompt and waited for them to complete. We call this running in the “foreground.” It is also possible, using the & operator, to run programs in the “background”, with the result that the shell prompts immediately without waiting for the command to complete:
$ mycommand &
[1] 54356 <-------- process id
$
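
The same idea in a script, as a sketch: sleep stands in for the hypothetical mycommand, and the $! variable and the wait command are additions not covered on the slide.

```shell
sleep 1 &                              # start in the background; the shell continues at once
bg_pid=$!                              # $! holds the process id of the last background job
echo "started background job with process id $bg_pid"

wait "$bg_pid"                         # block until that job finishes
status=$?                              # exit status of the finished job (0 means success)
echo "job finished with status $status"
```

wait is useful in scripts that launch several background jobs and must not continue until all of them are done.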

Backgrounding a running job with Ctrl-z and ‘bg’

Sometimes you start a program, then decide you want to run it in the background. Here’s how:

top
  • Press Ctrl-z to suspend the job. You can continue working in the terminal, and the job won't be lost

  • Type jobs at the command prompt to see what jobs are available, then bg to resume the suspended job in the background

  • To bring the background job back to the foreground, type fg at the command prompt


Process control

  • To end the job, use the ‘kill’ command with the process id: kill 56894 #for example!

  • Use kill -9 PID to stop the job immediately; the job gets no chance to clean up, so try plain kill first
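
A sketch of ending a background job from a script. sleep 60 is a stand-in for a long-running job, and reading the exit status with wait is an addition for illustration.

```shell
sleep 60 &                             # a long-running stand-in job
pid=$!                                 # process id of the background job

kill "$pid"                            # polite request to terminate (SIGTERM)

status=0
wait "$pid" || status=$?               # collect the job's exit status
echo "exit status: $status"            # 143 = 128 + 15, i.e. ended by SIGTERM
```

An exit status above 128 means the process was ended by a signal; subtracting 128 gives the signal number.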


Statistical command line goodies

data_hacks, https://github.com/bitly/data_hacks

  • Command line tools for data analysis
  • histogram.py
  • bar_chart.py
  • sample.py

datamash, https://www.gnu.org/software/datamash/

  • summary statistics
  • transposing matrices

Additional commands


Unix for high-performance cluster computing

  • Allow you to submit multiple jobs at once

  • Depending on the system, can schedule jobs for you

  • Are optimized for high-throughput performance


Learn more
