Lesson 04 | Data Wrangling

notes

intro example

  • ssh someserver 'somecommand' runs that command on the server
  • you can run a whole pipeline on the server, so only the filtered results are sent back instead of the full output

    ssh someserver 'journalctl | grep sshd | grep "Disconnected from"' | less
  • this runs journalctl on the server and keeps only the lines that contain both 'sshd' and 'Disconnected from'
  • only those matching lines are sent back to our machine, where we page through them with 'less'

SED

  • stream editor
  • allows you to make changes to the contents of a stream
  • a full programming language in its own right
  • the most common task is running substitution expressions (s/REGEX/SUBSTITUTION/) on an input stream

example

sed 's/.*blahblah blah//'
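  • the search part is a regular expression

    • a pattern language for matching text
    • here '.*' matches any run of characters, so everything up to and including 'blahblah blah' is replaced with nothing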
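
A minimal sketch of the same kind of substitution on a made-up line of text. The input string and the MARK token are invented for illustration; the point is only that '.*MARK ' matches everything up to and including the marker, and the empty replacement deletes it.

```shell
# Sample input is hypothetical; only the text after "MARK " should survive.
line='noise noise MARK payload'

# s/REGEX/REPLACEMENT/ : ".*" matches any run of characters,
# so ".*MARK " swallows everything through the marker and the space after it.
trimmed=$(printf '%s\n' "$line" | sed 's/.*MARK //')

printf '%s\n' "$trimmed"
```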

sed modifiers

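  • (ab)* - matches zero or more repetitions of 'ab'
  • -E - use extended regular expression syntax, so ( ) and | work without backslashes
  • (ab|bc)* - matches zero or more repetitions of 'ab' or 'bc'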
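
A short sketch of the two patterns above used inside a substitution with an empty replacement, so whatever they match is deleted. The input strings are arbitrary test strings and the variable names are just for illustration.

```shell
# Without -E, ( and ) would be treated as literal characters in the pattern.
# (ab)* : zero or more repetitions of "ab"
only_ab=$(echo 'abcaba' | sed -E 's/(ab)*//g')

# (ab|bc)* : zero or more repetitions of either "ab" or "bc"
ab_or_bc=$(echo 'abcababc' | sed -E 's/(ab|bc)*//g')

echo "$only_ab"    # ca
echo "$ab_or_bc"   # cc
```

The trailing g applies the substitution to every match on the line rather than only the first.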

regex debugger

  • regex101.com

sort

  • can sort by column (-k)
  • sorts ascending by default; -r reverses the order and -n compares numerically instead of as text
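
A small sketch of sorting by a column, assuming whitespace-separated input. The names and counts are made up for the example.

```shell
# Hypothetical data: a name in column 1, a count in column 2.
data='bob 3
alice 10
carol 2'

# -k2,2 restricts the sort key to column 2; the n suffix compares numerically.
by_count=$(printf '%s\n' "$data" | sort -k2,2n)

# Adding r to the key reverses the order, largest count first.
by_count_desc=$(printf '%s\n' "$data" | sort -k2,2nr)

printf '%s\n' "$by_count"
```

Without the n, the counts would be compared as text and '10' would sort before '2'.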

awk

  • programming language
  • focused on columnar data
  • can match by pattern
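
A minimal sketch of awk's pattern-plus-block style on columnar input. The data is invented for illustration; $1 and $2 refer to the first and second whitespace-separated columns.

```shell
# Hypothetical data: a name in column 1, a count in column 2.
data='alice 10
bob 3
carol 2'

# The block runs only for lines where the condition holds.
big=$(printf '%s\n' "$data" | awk '$2 > 2 { print $1 }')

# A /regex/ pattern selects lines by their text instead.
c_count=$(printf '%s\n' "$data" | awk '/^c/ { print $2 }')

printf '%s\n' "$big"
printf '%s\n' "$c_count"
```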

paste

  • joins lines of input together
  • '-s' :: serial; join all lines into a single line
  • '-d' :: delimiter to put between the joined lines
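
A quick sketch of the two flags together, turning a column of numbers into one '+'-separated line. The numbers are arbitrary.

```shell
# -s joins all input lines into one; -d+ puts "+" between them.
# The trailing "-" tells paste to read stdin (required by some implementations).
joined=$(printf '1\n2\n3\n' | paste -s -d+ -)

echo "$joined"   # 1+2+3
```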

bc (command-line calculator)

  • calculator that reads expressions from stdin

compute statistics

  • R language is built for statistical analysis

gnuplot

  • command-line plotting program
  • can plot data read from stdin

xargs

  • takes lines of input and turns them into arguments for a command
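
A small sketch of how input lines become arguments, using echo as a stand-in for whatever command you actually want to run. The input lines are arbitrary.

```shell
# Each input line becomes one argument, so this runs: echo a b c
args_line=$(printf 'a\nb\nc\n' | xargs echo)

# -n 1 passes one argument per invocation, so echo runs three times.
per_line=$(printf 'a\nb\nc\n' | xargs -n 1 echo)

echo "$args_line"   # a b c
printf '%s\n' "$per_line"
```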