Lab write-up directions:
For this lab, you will be processing raw responses to the class census in order to test two research hypotheses:
For the purposes of this lab, assume that the class census was distributed as an online survey to a random sample of UO students. You are aiming to generalize your results to the UO student population as a whole.
To answer your research questions, you will need to organize and categorize the data produced by Qualtrics in an Excel spreadsheet. You will need:
We strongly suggest using a word processing application to type up and save your answers before pasting them into Canvas. (This will avoid losing work if Canvas has issues.) Where open written answers are required, please use complete sentences. The approximate length of each answer is specified with the question.
Data Analysis Instructions:
1. First, make a folder/working directory on your computer for GEOG391 if you don’t already have one. Then save both an original copy of the Excel file (e.g., “GEOG391 Census Sum2018 LabM4” from Canvas) and a separate working copy on your computer (e.g., “Lab M4_Working copy”). It is easy to accidentally delete data or mix up variables in Excel, so you should get in the habit of saving an original copy that you do not edit.
2. Familiarize yourself with the data. Are the questions in order? What do rows and columns represent? How can you tell which question the responses in each column are answering? What kind of data is in each column? Which columns are ordinal, nominal, etc.?
3. Clean the data. You will only need part of the data to test your hypotheses and answer all the questions in the lab. Delete all the columns (select the entire column by clicking the letter, then “Edit” -> “Delete”) EXCEPT for: Q16_3, Q16_5, Q7, Q10, and Q29.
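If you would rather script this step than click through Excel, the column filter can be sketched as below. This is a minimal illustration, not part of the required workflow: it assumes you have saved the working copy as a CSV file, and the sample rows and their answer values are invented for the example. Only the five column names come from the step above.

```python
import csv
import io

# Columns to keep, as listed in step 3.
KEEP = ["Q16_3", "Q16_5", "Q7", "Q10", "Q29"]


def keep_columns(rows, keep=KEEP):
    """Return copies of the rows containing only the columns in `keep`."""
    return [{name: row.get(name, "") for name in keep} for row in rows]


# Hypothetical two-row export; the real file has many more columns and rows.
sample_csv = io.StringIO(
    "Q1,Q7,Q10,Q16_3,Q16_5,Q29\n"
    "a,Yes,No,Strongly agree,Somewhat agree,White\n"
    "b,No,Yes,Somewhat agree,Strongly agree,Asian\n"
)
rows = list(csv.DictReader(sample_csv))
cleaned = keep_columns(rows)
print(list(cleaned[0].keys()))
```

To run this on your own file, replace `sample_csv` with an opened CSV file; always read from the working copy, never the original.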
4. Evaluate non-response. Not all of the respondents who started the census completed it: there are some blank rows in the remaining data. Go ahead and delete those rows. (Note: it can be hard to see, but row 44 is missing a response to Q29.) How many responses are left? Given the size of the sample (there are 88 students in the class), are there enough non-responses to threaten the validity of the data?
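The same check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sample rows are invented, and `drop_incomplete` and `response_rate` are hypothetical helper names, not functions from any library. The class size of 88 comes from the step above.

```python
CLASS_SIZE = 88  # class size stated in step 4


def drop_incomplete(rows):
    """Keep only rows in which every retained question has a non-blank answer."""
    return [
        row for row in rows
        if all((value or "").strip() for value in row.values())
    ]


def response_rate(n_complete, class_size=CLASS_SIZE):
    """Share of the class that gave a complete response."""
    return n_complete / class_size


# Hypothetical rows: the second is missing only Q29, the third is fully blank.
rows = [
    {"Q7": "Yes", "Q29": "White"},
    {"Q7": "No", "Q29": ""},
    {"Q7": "", "Q29": ""},
]
complete = drop_incomplete(rows)
print(len(complete), "complete rows;", f"{response_rate(len(complete)):.1%} of the class")
```

Note that the filter removes partially answered rows as well as fully blank ones, which matches the treatment of the row with the missing Q29 response.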
5. Categorize and count. In order to get from the data to a test of the hypothesis, you will have to turn the responses into counts. First, you will set up a cross-tab (see Schutt 2017, 182-185). For Hypothesis 1, we are testing the relationship between the importance of food to culture and ethnicity. Accordingly, your cross-tab will have one variable (importance of food to culture) tabulated against the other variable (ethnicity). In the data we measured “importance of food to culture” using a self-reported indication of agreement with a five-level ordinal ranking, ranging from “strongly agree” to “strongly disagree.” We measured ethnicity using a self-reported indicator, using categories drawn from the US 2010 Census. Looking at Hypothesis 1, the only ethnic distinction that matters to this question is the distinction between “white” and “other”. Therefore, the cross-tab should look like this:
| | White | Other | (totals) |
| Strongly agree | | | |
| Somewhat agree | | | |
| Neither agree nor disagree | | | |
| Somewhat disagree | | | |
| Strongly disagree | | | |
| (totals) | | | |
Complete for the first 15 rows of the data.
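If you want to check your hand count, the tallying logic behind the cross-tab can be sketched as follows. The sample responses are invented, `crosstab` is a hypothetical helper, and the sketch assumes that only a response of exactly “White” counts as White; every other response goes into Other.

```python
from collections import Counter

# The five agreement levels, in the row order of the cross-tab.
LEVELS = [
    "Strongly agree",
    "Somewhat agree",
    "Neither agree nor disagree",
    "Somewhat disagree",
    "Strongly disagree",
]


def crosstab(pairs):
    """Count (ethnicity, agreement) pairs into White/Other rows with totals."""
    counts = Counter()
    for ethnicity, agreement in pairs:
        # Assumption: anything other than exactly "White" is grouped as Other.
        group = "White" if ethnicity.strip().lower() == "white" else "Other"
        counts[(agreement, group)] += 1
    table = []
    for level in LEVELS:
        white = counts[(level, "White")]
        other = counts[(level, "Other")]
        table.append((level, white, other, white + other))
    return table


# Hypothetical responses, not taken from the class census.
pairs = [
    ("White", "Strongly agree"),
    ("Asian", "Strongly agree"),
    ("White", "Somewhat disagree"),
    ("Black or African American", "Somewhat agree"),
]
for level, white, other, total in crosstab(pairs):
    print(f"{level:<28}{white:>6}{other:>6}{total:>8}")
```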
6. Test the hypothesis (H1). If there is no relationship between ethnicity and the importance of food, then the numbers of respondents of each group who picked each response should be roughly similar. Here is a complete cross-tab for the whole class:
| | White | Other | (totals) |
| Strongly agree | 13 | 23 | 36 |
| Somewhat agree | 14 | 9 | 23 |
| Neither agree nor disagree | 6 | 3 | 9 |
| Somewhat disagree | 2 | 1 | 3 |
| Strongly disagree | 1 | 1 | 2 |
| (totals) | 36 | 37 | 73 |
(Note: This was made using an Excel formula complicated enough that we are not going to ask you to do it yourselves for this lab.)
It looks like white-identifying respondents tended to assign a lower rank to the importance of food to their culture. But is that relationship statistically significant? To see, run a chi-square test using this website.
Select the appropriate number of rows (5) and columns (2) in the first screen, and then in the second screen enter the data above. Click calculate. Is the relationship statistically significant? (Hint: look for the p-value.)
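To see what the calculator is doing, here is a minimal sketch of a Pearson chi-square test written with the Python standard library only. It is an illustration, not the website's actual code: the helper names are hypothetical, and the p-value function uses a closed form that holds only when the degrees of freedom are even, which is the case for a 5 x 2 table. The counts are the full-class Hypothesis 1 figures given above.

```python
import math


def chi_square(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


def p_value_even_dof(stat, dof):
    """Upper-tail chi-square probability; this closed form needs an even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive, even dof")
    half = stat / 2
    terms = sum(half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * terms


# One row per agreement level, columns are (White, Other).
observed = [[13, 23], [14, 9], [6, 3], [2, 1], [1, 1]]
stat, dof = chi_square(observed)
p = p_value_even_dof(stat, dof)
print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p:.3f}")
```

Compare the printed values with what the website reports; small differences can come from rounding.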
7. Repeat for Hypothesis 2. Build the cross-tab relating international student status to agreement that food is the best part of traveling, and complete it for the first 15 rows of the class census data.
The p-value for this cross-tab is about 0.22, well short of statistical significance.
8. Reclassify data. However, our hypothesis doesn’t concern all five points along the Likert scale; it asks only about the binary difference between agreeing and not agreeing.
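One way to picture the reclassification is to collapse the five rows into two. The sketch below is illustrative only: `collapse` is a hypothetical helper, it treats “Neither agree nor disagree” as not agreeing, and it uses the full-class Hypothesis 1 counts because those are the only complete counts given in this lab.

```python
# Levels that count as agreeing; everything else counts as not agreeing.
AGREE = {"Strongly agree", "Somewhat agree"}


def collapse(table):
    """Collapse rows of (level, white, other) into Agree / Do not agree."""
    result = {"Agree": [0, 0], "Do not agree": [0, 0]}
    for level, white, other in table:
        key = "Agree" if level in AGREE else "Do not agree"
        result[key][0] += white
        result[key][1] += other
    return result


# Full-class Hypothesis 1 counts: (level, White, Other).
five_level = [
    ("Strongly agree", 13, 23),
    ("Somewhat agree", 14, 9),
    ("Neither agree nor disagree", 6, 3),
    ("Somewhat disagree", 2, 1),
    ("Strongly disagree", 1, 1),
]
two_level = collapse(five_level)
for label, (white, other) in two_level.items():
    print(f"{label:<14}{white:>6}{other:>6}{white + other:>8}")
```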