#+title: Lesson 04 | Data Wrangling
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="../_share/media/css/missing-semester.css" />
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="../_share/media/css/org-media-sass/collapsible.css" />
#+HTML_HEAD: <script src="../_share/media/js/collapsible.js"></script>
#+OPTIONS: H:6
* Links
#+attr_html: :class links
- [[../toc.org][TOC | Missing Semester]]
- [[https://www.youtube.com/playlist?list=PLyzOVJj3bHQuloKGG59rS43e29ro7I57J][Playlist: Missing Semester]]
- [[https://missing.csail.mit.edu/2020/data-wrangling/][class notes]]
- Curr: https://youtu.be/sz_dsktIjt4?si=XopbHGTFXY-I6Bkh&t=2577
*** timestamps
:PROPERTIES:
:CUSTOM_ID: timestamp
:END:
#+attr_html: :class playlist
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=4s][00:00 - introduction]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=415s][06:55 - Stream Editor]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=456s][07:36 - Replacement Expressions]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=538s][08:58 - Regular Expression]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=560s][09:20 - Regular Expressions]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=620s][10:20 - Square Brackets]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=693s][11:33 - Add Modifiers]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=776s][12:56 - Alternations]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1029s][17:09 - Anchoring the Regular Expression]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1138s][18:58 - Capture Groups]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1215s][20:15 - Regular Expression Debugger]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1450s][24:10 - Regular Expressions]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1561s][26:01 - Match an Email Address]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=1743s][29:03 - Sort]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2040s][34:00 - Awk]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2324s][38:44 - Berkeley Calculator]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2437s][40:37 - Compute Statistics over Inputs]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2482s][41:22 - Summary Statistics]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2513s][41:53 - Plotting]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2570s][42:50 - Two sort of special types]] *current*
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2754s][45:54 - example where data wrangling is useful]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2805s][46:45 - image captures to standard output]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2846s][47:26 - operate on standard input]]
+ [[https://www.youtube.com/watch?v=sz_dsktIjt4&t=2880s][48:00 - display in an image display]]
* notes
** intro example
- ssh someserver 'somecommand' runs that command on the remote server
- you can run a whole pipeline on the server and send back only the filtered results, instead of transferring all the raw output to your machine
#+BEGIN_SRC bash
ssh someserver 'journalctl | grep sshd | grep "Disconnected from"' | less
#+END_SRC
- this runs journalctl on the server and keeps only the lines that contain both 'sshd' and 'Disconnected from'
- only those matching lines are sent back to our machine, where we pipe them through 'less'
** SED
- stream editor
- lets you make changes to the contents of a stream
- a full programming language in its own right
- the most common task is running replacement (substitution) expressions on an input stream
*** example
#+BEGIN_SRC bash
# on each line of stdin, delete everything up to and including "blahblah blah"
sed 's/.*blahblah blah//'
#+END_SRC
- the search part of s/REGEX/SUBSTITUTION/ is a regular expression
- regular expressions are a way of describing patterns of text to match
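A minimal sketch of the same kind of substitution on made-up input. The sample log line and the helper name strip_prefix are assumptions for illustration (the 'Disconnected from' text is borrowed from the ssh example), not real log output.

#+BEGIN_SRC bash
# Hypothetical helper: drop everything up to and including "Disconnected from ".
strip_prefix() {
  sed 's/.*Disconnected from //'
}

# Made-up log line; only the text after the matched prefix survives.
echo 'Jan 17 03:13:00 host sshd[2631]: Disconnected from invalid user admin 10.0.0.1 port 4242' | strip_prefix
# prints: invalid user admin 10.0.0.1 port 4242
#+END_SRC

Because .* is greedy, a line that contains the phrase more than once would be cut after the last occurrence.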
*** sed modifiers
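- (ab)* - matches zero or more repetitions of 'ab'; with an empty replacement, the match is removed
- -E - use extended regular expression syntax, so that ( ), | and + work without backslashes
- (ab|bc)* - matches zero or more repetitions of 'ab' or 'bc'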
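The patterns above can be tried out as a small sketch. The helper names and input strings are assumptions chosen for illustration.

#+BEGIN_SRC bash
# -E is needed so that ( ) and | act as grouping and alternation
# instead of literal characters.
remove_ab()       { sed -E 's/(ab)*//g'; }     # delete every run of 'ab'
remove_ab_or_bc() { sed -E 's/(ab|bc)*//g'; }  # delete every run of 'ab' or 'bc'

echo 'abcaba'   | remove_ab         # prints: ca
echo 'abcababc' | remove_ab_or_bc   # prints: cc
#+END_SRC

In the first case the trailing 'a' survives because it is not followed by a 'b', so it is not part of any 'ab' pair.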
** regex debugger
- regex101.com
** sort
- can sort by column with '-k'
- sorts in ascending lexicographic order by default; '-n' sorts numerically and '-r' reverses the order
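A short sketch of the difference between the default order and a numeric sort on one column. The "count name" records and the helper name sort_by_count are made up for illustration.

#+BEGIN_SRC bash
# Hypothetical "count name" records, one per line.
data='3 carol
10 alice
2 bob'

# Hypothetical helper: numeric sort on the first whitespace-separated column only.
sort_by_count() { sort -n -k1,1; }

echo "$data" | sort                 # lexicographic: "10 alice" comes before "2 bob"
echo "$data" | sort_by_count        # numeric: 2, 3, 10
echo "$data" | sort -n -r -k1,1     # numeric, descending: 10, 3, 2
#+END_SRC

The '-k1,1' form limits the sort key to the first column; a bare '-k1' would use the rest of the line as the key as well.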
** awk
- programming language
- focused on columnar data
- can match by pattern
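As a sketch of column access combined with pattern matching, assuming input lines of the form "count username" (the data and the helper name once_c_to_e are hypothetical):

#+BEGIN_SRC bash
# Print the second column of every line whose first column equals 1
# and whose second column starts with "c" and ends with "e".
once_c_to_e() {
  awk '$1 == 1 && $2 ~ /^c.*e$/ { print $2 }'
}

printf '1 charlie\n4 root\n1 chloe\n1 carol\n' | once_c_to_e
# prints:
# charlie
# chloe
#+END_SRC

awk splits each line on whitespace by default, so $1 and $2 refer to the first and second columns, and $0 is the whole line.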
** paste
- joins lines of input together
- '-s' :: join all the lines into a single line
- '-d' :: delimiter to put between the joined lines
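A minimal sketch of the two flags together. The helper name join_lines is an assumption; the trailing '-' tells paste to read from stdin.

#+BEGIN_SRC bash
# Hypothetical helper: join all lines of stdin into one line,
# separated by the delimiter given as the first argument.
join_lines() { paste -s -d"$1" -; }

printf '1\n2\n3\n' | join_lines ,   # prints: 1,2,3
printf '1\n2\n3\n' | join_lines +   # prints: 1+2+3
#+END_SRC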
** berkeley calculator (bc)
- calculator that reads from stdin
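Since the calculator reads expressions from stdin, a column of numbers can be summed by first joining it into one expression. This is a sketch on made-up numbers, with sum_lines as a hypothetical helper name, and it assumes bc is installed.

#+BEGIN_SRC bash
# Hypothetical helper: turn "1\n2\n3" into "1+2+3" and let bc evaluate it.
sum_lines() { paste -s -d+ - | bc -l; }

printf '1\n2\n3\n' | sum_lines   # prints: 6
echo '2 * (3 + 4)' | bc -l       # prints: 14
#+END_SRC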
** compute statistics
- R language is built for statistical analysis
** gnuplot
- plotter
- takes from stdin
** xargs
- takes lines of input and turns them into arguments for a command
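A small sketch of that behaviour, using echo as a stand-in for whatever command should receive the arguments. The input words and the helper name as_args are made up.

#+BEGIN_SRC bash
# Hypothetical helper: every input line becomes one argument
# of a single echo invocation.
as_args() { xargs echo args:; }

printf 'one\ntwo\nthree\n' | as_args
# prints: args: one two three

# With -n 1 the command runs once per argument instead.
printf 'one\ntwo\n' | xargs -n 1 echo item:
# prints:
# item: one
# item: two
#+END_SRC

By default xargs splits its input on any whitespace, not only on newlines, so input lines that contain spaces are split into several arguments.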