For the Quantitative Methodologies for Design Research (定量研究方法) course for Ph.D. students at Tongji University in spring 2017, Susu Nousala invited me to join the team of instructors in collaborative education in Shanghai. Experts were brought in during the course to guide the graduate students.
While I’m comfortable with the mathematics underlying statistical analysis, I have a lot of practical experience of working with business executives who aren’t. Thus, my approach to working with data relies a lot on presentation graphics to defog the phenomena. While the label of data science began to rise circa 2012, I’ve had the benefit of practical experience that predates that.
In my first professional assignment in IBM Canada in 1985, data science would have been called econometrics. My work included forecasting country sales, based on price-performance indexes (from the mainframe, midrange and personal computer product divisions) and economic outlooks from Statistics Canada. Two years before the Macintosh II would bring color to personal computing, I was an early adopter of GRAFSTAT: “An APL system for interactive scientific-engineering graphics and data analysis” developed at IBM Research. This would eventually become an IBM program product called AGSS (A Graphical Statistical System) by 1994.
In 1988, I had an assignment where data science would have been called marketing science. I was sent to California to work in the IBM partnership with Metaphor Computer Systems. This was a Xerox PARC spin-off with a vision that predated the first web page on the World Wide Web by a few years. These activities led me to the TIMS Marketing Science Conference in 1990, to cofounding the Canadian Centre for Marketing Information Technologies (C2MIT), and to contributing chapters to The Marketing Information Revolution, published in 1994.
This journey led me to appreciate the selection and use of computer-based tools for quantitative analysis. Today, the two leading platforms in “Data Science 101” are Python (a general-purpose language with statistical libraries), and the R Project for Statistical Computing (a specialized package for data analysis and visualization). Both are open source projects, and free to download and use on personal computers. I tried both. R is a higher-level programming language, more similar to APL, that gets work done more quickly. For statistical work, I recommend R over Python (although APL is a theoretically better implementation).
Since I live in Toronto, I attended the February session of Data Science with R – Bootcamp in person, at Ryerson University. There, I watched Polong Lin lead a class through R using the Jupyter notebook, both in (i) an interactive version, and (ii) a printable version. Students had the choice to follow Polong either (i) actively, in a step-by-step execution in the Cognitive Class Virtual Lab (formerly called the Data Scientist Workbench) with a cloud-based R session through their web browsers, or (ii) passively, reading the static printable content.
daviding August 26th, 2017
Posted In: universities