Data analytics & machine learning

What do those words mean?


Data analytics: that's just modern statistics.

  • What does my data tell me? Can I use it to make predictions and quantify uncertainty?
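To make those two questions concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the data are simulated, and the straight-line model and the bootstrap are just one convenient way to get a prediction with an uncertainty band. It is not a method prescribed by these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed for illustration): y = 2 + 3x + noise
n = 50
x = rng.uniform(0, 1, n)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, n)

# "What does my data tell me?" Least-squares intercept and slope.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Can I use it to make predictions?" Fitted mean at a new input.
x_new = 0.5
y_hat = beta[0] + beta[1] * x_new

# "... and quantify uncertainty?" Bootstrap: refit on resampled rows
# and see how much the prediction moves around.
B = 2000
preds = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)  # sample rows with replacement
    bb, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    preds[b] = bb[0] + bb[1] * x_new

lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"prediction at x={x_new}: {y_hat:.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```

The interval here describes uncertainty in the fitted mean at `x_new`, not in a single new observation, which would be wider because of the noise term.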

If you tell your barber that you do statistics, they'll tell you how much they hated that class, and they'll pity you.

  • If you tell them you do data analytics, ….
  • Well, either way they'll probably stop talking to you.

A big difference between "conventional" and "modern" statistics (data analytics) is a de-emphasis on \(p\)-values.

  • Nobody likes those, but they have their place.

Machine learning

If you ask folks what the difference is between machine learning (ML) and statistics/analytics

  • you'll get lots of answers.

But I think that the underlying theme is automation.

Data-driven analysis and decision-making should:

  • commence with minimal human intervention,
  • improve over time even as dynamics/trends evolve,
  • and interface seamlessly with other computing/database infrastructure.

Many of the tools are the same in stats and ML, but in ML, algorithms are king, often at the expense of (human) interpretation.

Artificial intelligence

Building machines that do as humans do is a subtly different enterprise,

  • and will not be a focus in this class.

Humans are good decision makers, but sometimes they can't see the forest for the trees.

Stats (and ML) punt on "being human", and instead focus on

  • making (automated) decisions from data that are measurably better than humans could do alone;
  • not only in terms of scale (big data), complexity (fancy model), and endurance (many decisions),
  • but also free(r) from bias, and with lower variance,
  • and better at navigating risk/reward trade-offs.

Logistics

These notes

… are not a sufficient record of material covered in lecture.

  • If you stay home and think you can get all you need by reading over the slides, you'll be disappointed,
  • and so will I.

The details of mathematical calculations will be entirely left to board work,

  • and you will be expected to do similar calculations on your homework.

The code examples in the slides will be terse; otherwise they wouldn't fit.

  • We will rely on interactive code sessions for the details
    • (and no record will be provided).

"Optional" books