INFORMS Open Forum

  • 1.  Resources for R, Python and analytics textbooks

    Posted 03-23-2016 09:38

    I am looking for textbook recommendations for an undergraduate and graduate curriculum in analytics.  What books are folks using and/or recommending?  Any recommendations on books to avoid? :-)

     

    I am familiar with the following books:

    ·        Business Intelligence: A Managerial Perspective, third edition

    ·        Business Intelligence and Analytics, tenth edition

    ·        Business Analytics: Methods, Models and Decisions, second edition

     

     

    Additionally, I would like to ask if faculty have resources for R or Python, such as websites, tutorials, books, videos, etc.

     

    I will share what I receive.

     

     

     

    Background – We are currently using IBM SPSS Modeler in our curriculum.  This is a powerful and relatively easy-to-use software package that can perform a variety of predictive analytics tasks as well as text analytics.  The software uses a visual interface and "nodes" that can be connected together to accomplish a wide variety of data mining tasks.  IBM makes the software and all the training material available for free as part of its academic initiative.  Students can also install SPSS Modeler on their own computers, assuming they are running Windows.   http://www-01.ibm.com/software/analytics/spss/products/modeler/   However, outside of the academic initiative, the software is quite expensive, and most of the companies where my students get hired do not have SPSS Modeler because of its cost.  Thus, I am looking for alternatives such as R and Python, which are free to use in the curriculum and which students will still have access to once they graduate.

     

     

    Thanks.

     

    Jerry

     

    "No trees were harmed in the sending of this message; however, a large number

    of electrons were slightly inconvenienced..."

     

    Dr. Jerry Flatto, Professor, Information Systems Department - School of Business

    University of Indianapolis, Indianapolis, Indiana, USA mailto:jflatto@uindy.edu

     


     

     



  • 2.  RE: Resources for R, Python and analytics textbooks

    Posted 03-24-2016 09:59

    First, I think it's great that you are moving towards open source, flexible data analysis tools. This will really help your students think about what they are doing and let them be more creative. However, that comes at a price: your students need a modicum of comfort with programming, or at least the ability to think like a programmer, to use these tools. There are no buttons to click and no pretty tables for viewing data. It's all done through programming commands.

    Here are the books that I've found most useful.

    Note: Unless noted otherwise, all the resources below have been made freely available by their authors, but they are also available for purchase from places like Amazon.com

    R Programming Language Resources 

    • Books by Hadley Wickham (a prolific R developer who has written a lot of very useful packages for R)
      • R for Data Science: focused on using R for statistics.
      • Advanced R: focused on R as a programming language, not on how to do statistics.
    • CRAN Task Views: pages hosted on CRAN that thematically organize the myriad packages available for R.
      • Pros: Well organized, with decent descriptions and links to many packages.
      • Cons: Not exhaustive. More experimental or relatively new packages are not always there (however, this may not be a bad thing).
    • Cookbook for R: takes a "just tell me what to do" approach to many common tasks in R.

    Python Programming Resources
    Note: There are currently two versions of Python out there: Python 2 and Python 3. Normally, the developers try to maintain backwards compatibility, but they deviated from that principle for Python 3. Much Python 2 code can be made to run under Python 3 with small changes, but there are a few gotchas (for example, print is now a function, and dividing two integers with / no longer truncates). I've included a reference that I think does a good job describing both versions. I'd recommend having your students use Python 3, as it's where the language is going.
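To give a feel for the kind of gotchas mentioned in the note above, here is a minimal sketch written for Python 3, with comments describing how Python 2 behaved. The function name py3_behaviors is made up for illustration, and the list of differences is far from complete.

```python
# Runs under Python 3. Comments describe the Python 2 behavior for contrast.

def py3_behaviors():
    """Collect a few Python 3 behaviors that differ from Python 2."""
    results = {}
    # "/" on two integers is true division in Python 3 (Python 2 truncated).
    results["true_division"] = 7 / 2
    # "//" is floor division in both versions.
    results["floor_division"] = 7 // 2
    # range() is a lazy sequence in Python 3 (Python 2 built a full list).
    results["range_is_list"] = isinstance(range(3), list)
    # str is Unicode text in Python 3, so len() counts characters, not bytes.
    results["text_length"] = len("caf\u00e9")
    return results


# print is a function in Python 3 (it was a statement in Python 2),
# so it takes keyword arguments such as sep.
print("R", "Python", sep=" and ")
print(py3_behaviors())
```

Students who copy older Python 2 snippets from the web will most often trip over the print statement and integer division, so those two are worth showing early.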

    • Official Python 3 Documentation --  Decently written, comprehensive overview of Python's standard library.
    • Core External Packages for Data Analysis: Unlike R, Python's data science toolkit is composed of a few "mega packages" as opposed to many small, focused packages. Also, these packages almost have a life of their own, with their own conferences and generally well-documented, decent-looking web pages (unlike R's sparse help files).
      • Scipy.org: Not a package, but the SciPy organization makes most of the packages below.
      • NumPy: Convenient array objects that are more user-friendly than Python's built-in lists and arrays for numerical computations.
      • SciPy: The package for scientific computing. It has tons of stuff, from calculus to statistics to image processing, linear algebra, optimization, and more.
        • Scikit-learn: SciPy has a number of "kits" that add additional functionality. This one has a bunch of cool machine learning algorithms with generally user-friendly APIs (so they are more accessible to non-ML experts). Since machine learning is pretty hot right now, and the idea of AI and computers learning through statistics is just plain cool, even a brief foray into this area would be well received by students (e.g., lots of classification algorithms boil down to a linear model, albeit in a transformed space).
      • Pandas: Its major contribution is the DataFrame, which is meant to have similar functionality to R's popular data.frame. It has lots of nice data import/export features too (e.g., pandas.read_csv("filename.csv") creates a nice data frame right from a local CSV file).
      • Matplotlib: Emulates a lot of MATLAB's plotting functionality, again with a generally user-friendly API.
        • Seaborn: This is a package that uses matplotlib behind the scenes, but it makes a lot of the choices for you regarding formatting and display...generally good choices ;-) I use it a lot because I don't like fiddling with tons of parameters.
    • (NOT FREE) Python Essential Reference by David Beazley. This is a very concise (but well written) reference manual on Python programming (note that it does not have a statistics focus). However, it does a good job pointing out the quirks in the language and how its internals work, so Python will seem less mysterious.
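As a rough illustration of the Pandas item in the list above, the sketch below loads a small CSV into a DataFrame and summarizes it by group. The column names and values are invented for the example, and the CSV is read from an in-memory string (via io.StringIO) only so the snippet is self-contained. With a real file you would pass a path such as "filename.csv" to pandas.read_csv instead. It assumes pandas is installed.

```python
import io

import pandas as pd

# Made-up data standing in for a local CSV file.
csv_text = """student,section,score
Ann,A,88
Bob,A,92
Cy,B,75
Dee,B,81
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Group rows by section and average the scores, much like aggregate() in R.
mean_by_section = df.groupby("section")["score"].mean()

print(df.head())
print(mean_by_section)
```

The result of the groupby is a pandas Series indexed by section, so individual values can be looked up with mean_by_section["A"].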

    New(er) Data Formats

    It may also be helpful for you to briefly describe how to use JSON and YAML data formats. They aren't super difficult to learn, but both R and Python can parse these files into useful data structures and they allow for expressing more complex data (like nested lists). It also helps if your students aren't tied to CSV files, useful as they may be for basic statistics.

    • JSON: Less "human readable" but widely used.
    • YAML: More readable and a personal favorite of mine for developing configuration files and expressing complex data.

    Finally: Don't underestimate YouTube. There is lots of great stuff related to the above, and it's generally easier to digest a 15 minute example.
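Here is a small sketch of what parsing nested data looks like in Python, using the json module from the standard library. The record itself is invented for illustration. YAML works much the same way, but it needs a third-party package (PyYAML's yaml.safe_load is a common choice), so it is left out to keep the example dependency-free.

```python
import json

# Made-up nested record: a list of students, each with a list of scores.
# This shape is awkward to express in a flat CSV file.
text = """
{
  "course": "Intro to Analytics",
  "tools": ["R", "Python"],
  "students": [
    {"name": "A", "scores": [90, 85]},
    {"name": "B", "scores": [70, 95, 80]}
  ]
}
"""

# json.loads turns the text into ordinary dicts and lists.
record = json.loads(text)

# Walk the nested structure to compute each student's average score.
averages = {
    student["name"]: sum(student["scores"]) / len(student["scores"])
    for student in record["students"]
}

print(record["tools"])
print(averages)
```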

    As a practicing data scientist, I regularly use all the above items, and they have helped me learn a lot of techniques.

    Hope it helps you and your students.

    ------------------------------
    Michael Beyer PE,CAP
    Data Scientist



  • 3.  RE: Resources for R, Python and analytics textbooks

    Posted 03-24-2016 10:13

    Jerry,

    I second Michael Beyer's choices for R, to which I'll add a few things.

    • If your students are going to work with R, they most definitely should install RStudio, which is a great IDE (dare I say "industry standard"?) for R.
    • Johns Hopkins offers a data science specialization on Coursera. The specialization itself has a fee, but the courses are free, they are based on R, and they're done well. In particular, the second course is an introduction to R programming.
    • The swirl package can be installed from CRAN. It is a learn-by-doing approach to R and related topics. Once installed, it lets you choose from a list of courses and then walks you through entering and executing code. Someone shifting from, say, Python to R might find it a tad basic, but for a beginner it's a fairly painless introduction to R coding.
    • There's an active Statistics and R Google+ community where people can seek help.

    Cheers,

    Paul

    ------------------------------
    Paul Rubin
    Professor Emeritus
    Michigan State University
    East Lansing MI



  • 4.  RE: Resources for R, Python and analytics textbooks

    Posted 03-24-2016 12:44

    I second Jerry and Michael, and strongly recommend staying with open source tools. I'm personally a fan of R with RStudio.

    I could recommend some textbooks, but you have enough by now. Besides, it would be useful to visit some interesting sites showing R applications. Here is a suggestion:  R Stats + Digital Analytics: 8 Blogs you should Follow

    I hope you have fun with your analytics studies.

    ------------------------------
    Luiz Lucas
    CEO
    Kdia Consulting
    Rio de Janeiro