I'll be teaching a two-hour session tomorrow, July 11th, at MethodsFest in Exeter. The updated title and description of my session are below:
Research Data as Feedback to Improve Research Design and Data Analysis
Dr ZhiMin Xiao, Exeter University
Using research data from some of the most rigorous Randomised Controlled Trials in social science and medicine, this two-hour session first looks at some common practices in trial data analysis. It then examines some unintended consequences those practices have had on what we call evidence. Finally, we will reflect as a group on our own research practices. We will ask whether and how established approaches in our specific disciplines might “blind” us under certain circumstances, and what we can do to mitigate the negative impact of what Andrew Gelman called “the Science of Default”. Throughout the session, we will use publicly available, real-world research datasets to demonstrate whether and how research findings can be sensitive to analytical choices. We will also show that embracing some principles of statistical learning can help us avoid the vicious feedback loop we sometimes see in research practice.
This is a research and outreach project on data science education; please visit this website for more details.
The project will keep us busy throughout the summer.
We leave digital traces in many domains of everyday life today, and others collect data about us on a regular, if not constant, basis for various reasons. Data can generate immeasurable value for organisations and industries if they are analysed in innovative ways, but they can also pose huge risks to the individuals and institutions that hoard them. It is sometimes argued that a data revolution will significantly change how we live, work, and interact with one another. In the production of knowledge, we often see researchers use “big data” to answer all sorts of (research) questions. In one way or another, big datasets are often preferred in research and occasionally viewed as a higher form of knowledge, carrying the aura of truth and objectivity that defines what we call evidence. “Let the numbers speak for themselves.” We often hear such justifications in evidence-based decision-making.
In reality, however, not all numbers are neutral. Indeed, more data often leave more room for data dredging: people tend to analyse them in ways that are more likely to generate newsworthy findings, and then interpret those findings as evidence supporting preconceived beliefs and decisions. In research terms, the more data we have, the more likely it is that the data will be used to fit, rather than to build or test, theories. We need to be data literate in such a data-saturated (research) world.
Throughout the course, we will not only teach students some essential technical and analytical skills that will prove useful for (further) research and (future) work, but also emphasise statistical re-thinking. We will analyse research data in multiple ways, observe how the findings resemble and/or differ from one another, and eventually bring to light the importance of design in data-intensive research. Together, we aim to construct a (hopefully) better (shared) understanding of some common research designs by approaching (research) data with an enabling mindset and analysing them in principled ways. We will use the statistical software R, together with RStudio, to analyse different datasets in weekly workshops that combine lectures and hands-on-the-keyboard exercises. The course is open to anybody who is interested in developing analytical skills in R by making the connection between hands and head. The course will be accessible both online, from anywhere, and on campus in Exeter.
Many of you may not have collected any data of your own yet, and indeed, some of you will probably never analyse quantitative (or even empirical) data. But the critical approach to data (analysis) and evidence that underpins much of the content in this course will prove helpful in all walks of life. Although we do not assume any prior statistical knowledge or programming background, we do assume you have a laptop or at least access to a computer, either before or after each class. Although we will learn how to code in R, this is not just an introductory programming course. Although we will learn how to work with data, this is not just an introductory statistics course. We focus on practice and on making things happen. We use research data to understand research design, in the hope that we will design better research in future. In the end, you may find that you actually enjoy doing computational social science research in a data-driven way, and that you can do it much better than you imagined.
In total, there will be ten on-campus sessions, each two hours long, on Friday afternoons from 2:30 pm to 4:30 pm. Participants are welcome to attend in person or online via live streaming on Zoom. If for any reason you cannot attend the live sessions, you can watch the recordings afterwards. Both the R code and the recorded webcasts will be available to download from Dropbox.
The tentative topics covered over the ten weeks include: getting familiar with R and RStudio; loading data into R and understanding variable types, which are linked with some issues researchers need to consider at the design stage; exploratory data analysis with some key concepts in statistics; data visualisation and theoretical distributions; sampling strategies in design and re-sampling techniques in R; observational versus experimental designs with some (data) examples; classical regression analysis and data-driven statistical learning; explanatory power and predictive accuracy; and transparency, reproducibility, reusability, and replication.
Please bear in mind that the above list of topics is tentative. You will have the opportunity to shape the course's development simply by signing up, diving in, showing your passion, and sharing your enthusiasm. Remember, your active participation will help others catch fire on the topic, and oftentimes learning is not complete until it is shared, until it is communicated, ideally to the wider public. I look forward to seeing as many of you as possible.