Coevolving Innovations

… in Business Organizations and Information Technologies

Learning data science, hands-on

For the Quantitative Methodologies for Design Research (定量研究方法) course for Ph.D. students at Tongji University in spring 2017, Susu Nousala invited me to join the team of instructors in collaborative education in Shanghai.  Experts were brought in during the course to guide the graduate students.

My participation in the course over two days had three parts:  (a) preparing a lecture outline; (b) orienting the students; and (c) equipping the students with tools.

(A) Preparing a lecture outline

While I’m comfortable with the mathematics underlying statistical analysis, I have a lot of practical experience of working with business executives who aren’t.  Thus, my approach to working with data relies a lot on presentation graphics to defog the phenomena.  While the label of data science began to rise circa 2012, I’ve had the benefit of practical experience that predates that.

Today's APL
AGSS: A Graphical Statistical System (1994)

In my first professional assignment in IBM Canada in 1985, data science would have been called econometrics.  My work included forecasting country sales, based on price-performance indexes (from the mainframe, midrange and personal computer product divisions) and economic outlooks from Statistics Canada.  Two years before the Macintosh II would bring color to personal computing, I was an early adopter of GRAFSTAT: “An APL system for interactive scientific-engineering graphics and data analysis” developed at IBM Research.  This would eventually become an IBM program product called AGSS (A Graphical Statistical System) by 1994.

Metaphor Computer Systems workstation

In 1988, I had an assignment where data science would have been called marketing science.  I was sent to California to work in the IBM partnership with Metaphor Computer Systems. This was a Xerox PARC spin-off with a vision that predated the first web page on the World Wide Web by a few years.  These activities led me into the TIMS Marketing Science Conference in 1990, cofounding the Canadian Centre for Marketing Information Technologies (C2MIT) and contributing chapters to The Marketing Information Revolution published in 1994.

This journey led me to appreciate the selection and use of computer-based tools for quantitative analysis.  Today, the two leading platforms in “Data Science 101” are Python (a general-purpose language with statistical libraries), and the R Project for Statistical Computing (a specialized package for data analysis and visualization).  Both are open source projects, and free to download and use on personal computers.  I tried both.  R is a higher-level programming language, more similar to APL, that gets work done more quickly.  For statistical work, I recommend R over Python (although APL is a theoretically better implementation).

Intro to R Programming, Big Data University, Feb. 22, 2017

Since I live in Toronto, I attended the February session of Data Science with R – Bootcamp in person, at Ryerson University.  There, I watched Polong Lin leading a class through R using the Jupyter notebook, both in (i) an interactive version, and (ii) a printable version.  Students had the choice to follow Polong either (i) actively, in a step-by-step execution in the Cognitive Class Virtual Lab (formerly called the Data Scientist Workbench) with a cloud-based R session through their web browsers, or (ii) passively, reading the static printable content.

Polong was helpful in guiding us with course resources that would be available in Shanghai.  In North America, we have cognitiveclass.ai with course materials, and datascientistworkbench.com as a cloud computing platform (that includes R and Python).  For China, there are parallel sites at bigdatauniversity.com.cn and datascientistworkbench.cn in Chinese, that native speakers could be more comfortable using.

Based on Polong’s materials, I developed a Jupyter notebook additionally emphasizing graphical presentation for an in-person lecture in Shanghai.  These materials are cached at coevolving.com/tongji/201704_DataScienceR/, where they are accessible globally.

(B) Orienting the students

Ph.D. students in design will not have been required to have studied mathematics at the university level.  However, I then recalled that Shanghai high school students had a history with PISA achievement of “top of the global class in maths with an average score … or the equivalent of nearly three years of schooling, above the average” in 2013.

Ph.D. course in Quantitative Methods, Tongji University College of Design and Innovation

This strength flavoured the description of my approach for the lecture to students:

  • Did you study linear algebra?  (Pause to have a Chinese professor translate that.)
  • Of course, you studied linear algebra in high school.
  • So, how do I approach exploratory data analysis?
  • I plot the data, and draw a straight line through it.
  • If the plot doesn’t look right, I move the data so that the straight line looks right.

An Orientation Demonstration (Jupyter notebook)

The workshop scope was then explained as:

  • NOT to teach you everything about data science!
  • You should know about ways to represent data to support your research findings.
  • You will see some good tools for qualitative methods (and maybe even quantitative methods).
  • We can work together to get you started on tools that will suit your needs.

Working with the end in mind, the orientation stepped through a recommended package with three tools:  (i) ggplot2, through (ii) Jupyter, on (iii) R.

The demonstration loaded the example Housing dataset (Sales Prices of Houses in the City of Windsor), with 546 rows and 12 variables.  This showed that a higher housing price correlates with a larger lot size, something that makes sense intuitively.  The ggplot2 library has a nice feature of facet grids, so that incorporating the number of bedrooms into an analysis can be visualized as a collection of plots (rather than a single plot).

(C) Equipping the students with tools

Having then shown the students what outputs could look like, we released them to experiment as a self-study group, with the professors available as standby resources.  As Ph.D. students at one of the top universities in China, they were expected to step up to the challenge.  Since neither Susu nor I are proficient in Mandarin, the students would likely learn faster as a group if not slowed down by second-language conversion.

The students then enthusiastically set upon getting their tools to work.  They collectively surfaced two challenges:  (i) technical; and (ii) research use.

(i) The technical issue was that, although the bigdatauniversity.com.cn web site was readily accessible, the students found the cloud-based datascientistworkbench.cn slow over the Internet.  While we in North America expect high bandwidth speeds in our workplaces and homes, the responsiveness through the browser interface at Tongji University to the cloud platform was too slow.  This led the students to prefer installing open source software tools onto their personal computers, dissolving Internet connectivity problems.

Most of the students were running MacBooks, and a few were on Windows laptops.  Downloading R from the Comprehensive R Archive Network and installing that program was easy.  Downloading Anaconda from Continuum Analytics as a step towards a Jupyter notebook was relatively straightforward.  But then, with both Anaconda and R up and running, getting Jupyter to connect to R was a challenge.  This required opening up a Terminal — a new experience for most on MacOS, but somewhat more familiar as command line on Windows — and typing in magic incantations.

  • The installation of IRkernel is not a drag-and-drop activity, and sometimes requires more technical knowledge to interpret messages that signal something other than success.
  • MacOS X with Jupyter needs some Python libraries that require the installation of Xcode.
  • MacOS X from 10.8 (Mountain Lion) onwards no longer includes X11, requiring a separate installation of XQuartz.
  • On MacOS X, installing IRkernel would not work if R was launched by clicking on the R icon, but would if R was started from a Terminal.

Resolving these issues had the Ph.D. students collectively helping each other for 6 hours in the afternoon and evening on the first course day, and then 2 hours on the second morning.  Updating MacOS X with Xcode over the Internet was slow on the university network.  On Windows, one student easily got all of the packages working on his computer, and then struggled to help a friend with similar hardware and software.

The ordeal of installing software had an unexpected benefit of becoming a team-building activity.  The students banded together in mutual support!

(ii) The research use issue arose as the students worked their way through the Big Data University exercises.  Exercises typically use datasets already prepackaged for use, so students can focus on the programming language.  Real research projects require bringing in real datasets.  Pointing out the import features for R programming moved the students one step ahead.

In a course oriented towards a future of big data, this raised the question:  where is the data?  The phenomenon of open data — as electronic data sources readily accessible over the Internet — is nascent within the People’s Republic of China.  This led to questions about research design, where students would have to determine questions of interest, and plan ways to collect data.

After those two days, my participation in the course diminished with a few exchanges over WeChat.  Susu brought some of the students onto a research project in Southern China, putting their research design learning into practice over the summer.  By the fall, the students should have performed preliminary data analysis, so that coaching may be welcomed on finer points about statistics.


  • Creative Commons License
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License