This homework is due on Tuesday, October 10th at 12:30pm (the start of class). Please turn in all your work. The primary purpose of this homework is to hone data and plotting skills. This description may change at any time; however, notices about substantial changes (requiring more/less work) will also be posted on the class web page. Note that there are two prongs to submission: via Canvas and via Bitbucket (in asc-repo/hwk/hw3). You don’t need to use Rmarkdown, but your work should be just as pretty if you want full marks.

Problem 1: Unix commands (10 pts)

Update your asc-repo/notes/unix.txt file to include the commands we have discussed in class since homework 1, and any others that you’d like to keep a list of. (Note that the answer to this question does not require files in asc-repo/hw3.)

Problem 2: Asset returns (25 pts)

Return to the dow30 data from Yahoo! Finance that we collected in class.

  1. Consider the getSymbols function from the quantmod package for R.
    1. How can it be used in a fashion similar to our get.quotes and get.multiple.quotes functions?
    2. By inspecting the source, describe how it handles the “crumb issue” we encountered.
    3. Provide the commands (using getSymbols) which mimic the result of our get.multiple.quotes function on the dow30.tickers from class.
  2. With the data you downloaded, via the old get.quotes version or the new getSymbols (as you prefer), use the transform function to create a new "Mid" column tabulating the average of the "Low" and "High" quotes, and to convert the "Date" column to the Date class. Do both in a single call.

  3. Calculate returns \(r_{a,t}\) for each asset \(a\) based on the "Mid" price \(p_{a,t}\) on each day \(t\): \(r_{a,t} = (p_{a,t} - p_{a,t-1})/p_{a,t-1}\).

  4. Use those same "Mid" prices to calculate the Dow Jones Industrial Average (DJIA) using the single “divisor” provided on that page. Compare your calculation to a DJIA index downloaded from Yahoo!

  5. The DJIA is often criticized for being a simple “price-weighted” index, meaning that the divisor is common to all assets. Create an alternative market-capitalization weighted version, more like the S&P500. For bonus points, write a script to scrape the “market caps” from Google or Yahoo. (Otherwise do it by hand.) Calculate returns. Finally, put a data.frame together with all of your DJIA prices and returns, and provide a visualization comparing the two.

  6. Calculate the correlation of each stock’s returns to those of the market, as measured (separately) by your two DJIA return calculations. Create a new data set where the stock prices and returns are presented after sorting on these correlations.
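
To fix ideas for items 2 and 3, here is a minimal sketch on a toy data frame. The three rows of quotes are made up for illustration, the "Return" column name is my own choice, and the rows are assumed to be sorted from oldest to newest; your downloaded data may be ordered differently.

```r
# Toy quotes for one hypothetical ticker; the values are made up.
quotes <- data.frame(
  Date = c("2017-10-02", "2017-10-03", "2017-10-04"),
  Low  = c(99, 101, 104),
  High = c(101, 105, 106),
  stringsAsFactors = FALSE
)

# A single transform() call adds "Mid" and converts "Date" to class Date.
quotes <- transform(quotes, Mid = (Low + High) / 2, Date = as.Date(Date))

# Simple returns r_t = (p_t - p_{t-1}) / p_{t-1}.
# The first day has no predecessor, so its return is NA.
p <- quotes$Mid
quotes$Return <- c(NA, diff(p) / head(p, -1))

print(quotes)
```

The same calculation applies to each asset in turn once the quotes are held per ticker.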

Problem 3: Data visualization (30 pts)

Use charts to explore the 1970s UK teachers’ pay survey data in the file teach.csv, available on the course web page.

  • Info on the data columns is available in the CSV file.

What to show and how to show it is up to you. However, you may wish to (at the very least) consider the following.

  1. Plot salary versus the number of months in service, while also indicating sex. What do you find?
  2. Ignore months and use boxplots to explore the distribution of salary for each of the levels of sex, marry, degree, type, train, and break.
  3. Consider the part of the data for teachers whose school offers a degree of type "0", and revisit item 1. Plot the data and try augmenting the plot with the best-fitting regression line and intervals.

Note that this question is about visualization, not statistical modeling. For full credit your plots must be properly annotated with informative axes, labels, legends, main titles, etc., with accompanying verbal descriptions and interpretations in prose.
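
As a rough illustration of the annotated scatterplot with a fitted line and intervals suggested in item 3, consider the following sketch in base R graphics. The data are simulated, and the column names (months, salary, sex) and the "F"/"M" coding are assumptions; check the descriptions in teach.csv for the real ones. A confidence band is shown here, but prediction intervals (interval = "prediction") may be the more appropriate choice depending on what you want to convey.

```r
# Simulated stand-in for the survey data; NOT the real teach.csv.
set.seed(1)
toy <- data.frame(months = seq(12, 240, by = 12))
toy$salary <- 1000 + 5 * toy$months + rnorm(nrow(toy), sd = 100)
toy$sex <- rep(c("F", "M"), length.out = nrow(toy))  # hypothetical coding

# Least-squares line and a 95% confidence band on a grid of months.
fit  <- lm(salary ~ months, data = toy)
grid <- data.frame(months = seq(min(toy$months), max(toy$months),
                                length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence")

# Scatterplot with sex indicated by color, plus axis labels and a title.
plot(salary ~ months, data = toy, pch = 19,
     col = ifelse(toy$sex == "F", "red", "blue"),
     xlab = "Months in service", ylab = "Salary",
     main = "Toy example: salary versus months in service")
lines(grid$months, band[, "fit"])
lines(grid$months, band[, "lwr"], lty = 2)
lines(grid$months, band[, "upr"], lty = 2)
legend("topleft", legend = c("F", "M"), col = c("red", "blue"),
       pch = 19, title = "Sex")
```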

Problem 4: Processing web data (35 pts)

Hockey season is starting and it is time to get excited! There are many outlets on the web that let you keep track of league standings. We will use Click on that link and have a look.

  • The regular season starts next week, so it is pretty empty at the moment.
  • Swap in 2017 for 2018 in the URL to have a look at last season’s version.
  • To read these tables, I suggest looking into the htmltab function in the package of the same name on CRAN.

Your task is to produce league standings tables from this data by automatically reading directly from the HTML obtained from the URL above.

  • For an example of what standings tables look like, see the middle panel of the main page.

However you must augment tables like those with columns to include goals for, goals against, and goal differential.

  • You will get something very similar, augmented with that extra goals information, if you Google “NHL standings”.
  • A file linked from the course web page provides conference and division information for each team, including a three-letter team name abbreviation. Use the merge function to merge them with the games data.
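
The merge step might look something like the sketch below. Both data frames are toy stand-ins, and the key column name (team) is an assumption; the scraped games data and the course file may use different column names, in which case the by.x and by.y arguments to merge are helpful.

```r
# Toy stand-ins for the scraped results and the course-provided lookup file.
games <- data.frame(
  team = c("AAA", "BBB", "CCC"),
  W    = c(3, 2, 1),
  stringsAsFactors = FALSE
)
info <- data.frame(
  team       = c("AAA", "BBB", "CCC"),
  conference = c("eastern", "eastern", "western"),
  division   = c("atlantic", "metropolitan", "central"),
  stringsAsFactors = FALSE
)

# Join on the shared key; all.x = TRUE keeps teams that have no match in
# info, which then show up with NA conference/division and are easy to spot.
merged <- merge(games, info, by = "team", all.x = TRUE)
print(merged)
```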

Here are some details on how you must present your standings.

  1. You must provide a version of the table within your PDF homework solutions submitted on Canvas, which may be rendered (i.e., is up-to-date) at the time of submission.
  2. More importantly, you must also provide a bash script called which I can call at any time after the time of submission and which will provide an up-to-date summary. That bash script should print a text version of the standings to the terminal and nothing else (i.e., no other R or bash messages).

In addition to being segmented by conference and league (you might find aggregate to be of assistance here), your teams must be properly sorted: first by points (higher is better), then by losses (lower is better), then by wins (higher is better), and then by goal differential (higher is better).

  • Teams get 2 points for a win, whether in regulation or overtime (OT), or via shoot-out (SO), and 1 point for an OT or SO loss. Otherwise zero points.
  • You do not need anything else, i.e., no indicators of playoff clinch, etc.
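
The points rule and the four-level sort can be sketched as follows on made-up records. The column names (W, L, OTL, GF, GA, PTS, DIFF) are my own, and I am assuming that "losses" in the sort means regulation losses, held separately from OT/SO losses.

```r
# Toy team records; names and numbers are made up for illustration.
teams <- data.frame(
  team = c("AAA", "BBB", "CCC", "DDD"),
  W    = c(10, 9, 10, 9),    # wins of any kind (regulation, OT, SO)
  L    = c(5, 4, 6, 4),      # regulation losses (assumed meaning of "losses")
  OTL  = c(1, 3, 1, 3),      # OT or SO losses
  GF   = c(50, 48, 55, 45),  # goals for
  GA   = c(40, 45, 50, 44),  # goals against
  stringsAsFactors = FALSE
)

# 2 points per win, 1 point per OT/SO loss, 0 otherwise.
teams <- transform(teams, PTS = 2 * W + OTL, DIFF = GF - GA)

# order() breaks ties left to right; a minus sign means "higher is better".
ord <- with(teams, order(-PTS, L, -W, -DIFF))
standings <- teams[ord, ]
print(standings, row.names = FALSE)
```

In this toy example all four teams have 21 points, so the ordering is decided entirely by the tie-breakers.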

For extra credit, design your script to take an optional argument allowing the specification of one of the following

  • three-letter team abbreviation
  • conference name (“eastern”, “western”)
  • division name (“atlantic”, “metropolitan”, “central”, “pacific”)

which will show only the relevant portion of the table. Include the entire division, but not the whole conference, if a team abbreviation is provided.
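
One way to handle the optional argument on the R side is sketched below, assuming the bash script passes its argument through to Rscript. The select_rows helper and the lookup table are hypothetical; in your solution the table would come from the course file, and the selected rows would be used to subset the full standings.

```r
# Hypothetical lookup table; the real one comes from the course file.
info <- data.frame(
  team       = c("AAA", "BBB", "CCC", "DDD"),
  conference = c("eastern", "eastern", "western", "western"),
  division   = c("atlantic", "metropolitan", "central", "pacific"),
  stringsAsFactors = FALSE
)

# Return the rows selected by an optional filter (no argument = everything).
select_rows <- function(info, arg = character(0)) {
  if (length(arg) == 0) return(info)
  key <- tolower(arg[1])
  if (key %in% info$conference) return(info[info$conference == key, ])
  if (key %in% info$division) return(info[info$division == key, ])
  # Otherwise treat the argument as a team abbreviation: show its division.
  hit <- info$division[tolower(info$team) == key]
  if (length(hit) == 0) stop("unrecognized argument: ", arg[1])
  info[info$division == hit[1], ]
}

# Under Rscript, anything after the script name arrives here.
args <- commandArgs(trailingOnly = TRUE)
print(select_rows(info, args), row.names = FALSE)
```

For example, select_rows(info, "eastern") keeps the two eastern teams, while select_rows(info, "ccc") keeps everything in the central division.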