#+title: Lesson 04 | Data Wrangling
#+HTML_HEAD:
#+HTML_HEAD:
#+HTML_HEAD:
#+OPTIONS: H:6

* Links
#+attr_html: :class links
- [[../toc.org][TOC | Missing Semester]]
- [[https://www.youtube.com/playlist?list=PLyzOVJj3bHQuloKGG59rS43e29ro7I57J][Playlist: Missing Semester]]
- [[https://missing.csail.mit.edu/2020/data-wrangling/][class notes]]
- Curr: https://youtu.be/sz_dsktIjt4?si=0WESCuewbWY5mJiv&t=622

*** timestamps
:PROPERTIES:
:CUSTOM_ID: timestamp
:END:
#+attr_html: :class playlist
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=4s][00:00 - introduction]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=415s][06:55 - Stream Editor]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=456s][07:36 - Replacement Expressions]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=538s][08:58 - Regular Expression]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=560s][09:20 - Regular Expressions]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=620s][10:20 - Square Brackets]] *current*
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=693s][11:33 - Add Modifiers]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=776s][12:56 - Alternations]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1029s][17:09 - Anchoring the Regular Expression]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1138s][18:58 - Capture Groups]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1215s][20:15 - Regular Expression Debugger]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1450s][24:10 - Regular Sessions]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1561s][26:01 - Match and Email Address]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1743s][29:03 - Sort]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2040s][34:00 - Awk]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2324s][38:44 - Berkeley Calculator]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2437s][40:37 - Computer Statistics over Inputs]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2482s][41:22 - Summary Statistics]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2513s][41:53 - Plotting]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2570s][42:50 - Two sort of special types]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2754s][45:54 - example where data wrangling is useful]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2805s][46:45 - image captures to standard output]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2846s][47:26 - operate on standard input]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2880s][48:00 - display in an image display]]

* notes
** intro example
- ssh someserver 'somecommand' runs that command on the server
- you can run a whole pipeline on the server, so only the filtered output is sent back instead of all the raw data
#+BEGIN_SRC bash
ssh someserver 'journalctl | grep sshd | grep "Disconnected from"' | less
#+END_SRC
- this runs journalctl on the server and keeps only the lines that contain both 'sshd' and 'Disconnected from'
- the matching lines are then sent back to our machine, where we pipe them through 'less'

** SED
- stream editor
- lets you make changes to the contents of a stream
- a full programming language in its own right
- the most common task is running replacement expressions on an input stream

*** example
#+BEGIN_SRC bash
sed 's/.*blahblah blah//'
#+END_SRC
- the search pattern is a regular expression
  - a way of matching text
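
A minimal sketch of a replacement expression applied to a concrete line. The log line below is made up for illustration, and the pattern 'Disconnected from ' is an assumption borrowed from the grep in the intro example; neither comes from the sed example above.

#+BEGIN_SRC bash
# hypothetical journal line, invented for this sketch
line='Jan 17 03:13:00 thebox sshd[2631]: Disconnected from invalid user admin 46.97.239.16 port 55920'

# s/REGEX/REPLACEMENT/ replaces the first match of REGEX on each line
# .* matches any run of characters, so everything up to and including
# "Disconnected from " is replaced with the empty string
cleaned=$(printf '%s\n' "$line" | sed 's/.*Disconnected from //')

printf '%s\n' "$cleaned"
# prints: invalid user admin 46.97.239.16 port 55920
#+END_SRC

- .* is greedy: it matches as much as it can, so if 'Disconnected from ' appeared twice on a line, everything up to the last occurrence would be removed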