
FEATURES OF SCALES OF MEASUREMENT

Nominal
Examples: Ethnicity, Religion, Sex
Properties: Identity
Mathematical operations: Determine whether = or ≠
Typical statistics used: Mode, Chi-square

Ordinal
Examples: Class rank, Letter grade
Properties: Identity, Magnitude
Mathematical operations: Determine whether = or ≠; determine whether < or >
Typical statistics used: Mode, Median, Wilcoxon tests

Interval
Examples: Temperature (Fahrenheit and Celsius), Many psychological tests
Properties: Identity, Magnitude, Equal unit size
Mathematical operations: Determine whether = or ≠; determine whether < or >; add; subtract
Typical statistics used: Mode, Median, Mean, t test, ANOVA

Ratio
Examples: Weight, Height, Time
Properties: Identity, Magnitude, Equal unit size, Absolute zero
Mathematical operations: Determine whether = or ≠; determine whether < or >; add; subtract; multiply; divide
Typical statistics used: Mode, Median, Mean, t test, ANOVA

1. Provide several operational definitions of anxiety. Include nonverbal measures and physiological measures. How would your operational definitions differ from a dictionary definition?

2. Identify the scale of measurement for each of the following:

a. Phone area code

b. Grade of egg (large, medium, small)

c. Amount of time spent studying

d. Score on the SAT

e. Class rank

f. Number on a volleyball jersey

g. Miles per gallon

Discrete and Continuous Variables

Another means of classifying variables is in terms of whether they are discrete or continuous in nature. Discrete variables usually consist of whole-number units or categories. They are made up of chunks or units that are detached and distinct from one another. A change in value occurs a whole unit at a time, and decimals do not make sense with discrete scales. Most nominal and ordinal data are discrete. For example, gender, political party, and ethnicity are discrete scales. Some interval or ratio data can be discrete. For example, the number of children someone has would be reported as a whole number (discrete data), yet it is also ratio data (you can have a true zero and form ratios).

discrete variables Variables that usually consist of whole-number units or categories and are made up of chunks or units that are detached and distinct from one another.

Continuous variables usually fall along a continuum and allow for fractional amounts. The term continuous means that it “continues” between the whole-number units. Examples of continuous variables are age (22.7 years), height (64.5 inches), and weight (113.25 pounds). Most interval and ratio data are continuous in nature.

continuous variables Variables that usually fall along a continuum and allow for fractional amounts.

REVIEW OF KEY TERMS

absolute zero

continuous variables

discrete variables

equal unit size

identity

interval scale

magnitude

nominal scale

operational definition

ordinal scale

ratio scale

Chapter 2

In this chapter, and the next, we discuss what to do with the observations made when conducting a study—namely, how to describe the data set through the use of descriptive statistics. First, we consider ways of organizing the data. We need to take the large number of observations made during the course of a study and present them in a manner that is easier to read and understand. Then, we discuss some simple descriptive statistics. These statistics allow us to do some “number crunching”—to condense a large number of observations into a summary statistic or set of statistics. The concepts and statistics described in this section can be used to draw conclusions from data. They do not come close to covering all that can be done with data gathered from a study. They do, however, provide a place to start.

MODULE 3

Organizing Data

Learning Objectives

• Organize data in a frequency distribution.

• Organize data in a class interval frequency distribution.

• Graph data in a bar graph.

• Graph data in a histogram.

• Graph data in a frequency polygon.

We will discuss two methods of organizing data: frequency distributions and graphs.

Frequency Distributions

To illustrate the processes of organizing and describing data, let’s use the data set presented in Table 3.1. These data represent the scores of 30 students on an introductory psychology exam. One reason for organizing data and using statistics is so that meaningful conclusions can be drawn. As you can see from Table 3.1, our list of exam scores is simply that—a list in no particular order. As shown here, the data are not especially meaningful. One of the first steps in organizing these data might be to rearrange them from highest to lowest or lowest to highest.

Once this is accomplished (see Table 3.2), we can try to condense the data into a frequency distribution, a table in which all of the scores are listed along with the frequency with which each occurs. We can also show a relative frequency distribution, which indicates the proportion of the total observations included in each score. When a relative frequency is multiplied by 100, it can be read as a percentage. A frequency distribution and a relative frequency distribution of our exam data are presented in Table 3.3.

frequency distribution A table in which all of the scores are listed along with the frequency with which each occurs.
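The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the score list is hypothetical (it is not the Table 3.1 data), and the function name `frequency_distribution` is made up for this example. Scores are listed from highest to lowest, as in the text.

```python
from collections import Counter


def frequency_distribution(scores):
    """Return (score, f, rf) rows, listed from the highest score to the lowest."""
    counts = Counter(scores)  # frequency (f) of each distinct score
    n = len(scores)           # total number of observations (N)
    rows = []
    for score in sorted(counts, reverse=True):
        f = counts[score]
        rf = f / n            # relative frequency: proportion of all observations
        rows.append((score, f, rf))
    return rows


# Hypothetical exam scores, for illustration only
scores = [70, 85, 70, 90, 85, 70, 60, 85, 95, 70]

for score, f, rf in frequency_distribution(scores):
    # rf * 100 reads as a percentage
    print(f"{score:>3}  f = {f}  rf = {rf:.2f}  ({rf * 100:.0f}%)")
```

The f column always sums to N, and the rf column sums to 1 (100%), apart from rounding.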

The frequency distribution is a way of presenting data that makes the pattern of the data easier to see. We can make the data set even easier to read (especially desirable with large data sets) if we group the scores and create a class interval frequency distribution. We can combine individual scores into categories, or intervals, and list them along with the frequency of scores in each interval. In our exam score example, the scores range from 45 to 95—a 50-point range. A rule of thumb when creating class intervals is to have between 10 and 20 categories (Hinkle, Wiersma, & Jurs, 1988). A quick method of calculating what the width of the interval should be is to subtract the smallest score from the largest score and then divide by the number of intervals you would like (Schweigert, 1994). If we wanted 10 intervals in our example, we would proceed as follows to determine the width of each interval:

$$\frac{95 - 45}{10} = \frac{50}{10} = 5$$

class interval frequency distribution A table in which the scores are grouped into intervals and listed along with the frequency of scores in each interval.

The frequency distribution using the class intervals with a width of 5 is provided in Table 3.4. Notice how much more compact the data appear when presented in a class interval frequency distribution. Although such distributions have the advantage of reducing the number of categories, they have the disadvantage of not providing as much information as a regular frequency distribution. For example, although we can see from the class interval frequency distribution that five people scored between 75 and 79, we do not know their exact scores within the interval.
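A sketch of the same grouping in Python, assuming whole-number scores. The helper names `interval_width` and `class_interval_distribution` are hypothetical, and the scores are invented to span 45 to 95 like the exam example; they are not the Table 3.4 data. Each interval starts at the smallest score and covers `width` whole-number scores.

```python
import math


def interval_width(scores, num_intervals):
    """Rule of thumb: (largest score - smallest score) / desired number of intervals."""
    return (max(scores) - min(scores)) / num_intervals


def class_interval_distribution(scores, width):
    """Group whole-number scores into intervals [low, low + width - 1].

    Returns (low, high, f) rows, listed from the highest interval to the lowest.
    """
    # Round a fractional width up to a whole number (at least 1 so the loop advances)
    width = max(1, math.ceil(width))
    low, largest = min(scores), max(scores)
    rows = []
    while low <= largest:
        high = low + width - 1
        f = sum(1 for s in scores if low <= s <= high)  # scores in this interval
        rows.append((low, high, f))
        low += width
    return rows[::-1]


# Hypothetical exam scores ranging from 45 to 95
scores = [45, 52, 58, 61, 66, 70, 74, 74, 77, 79, 83, 88, 91, 95]

width = interval_width(scores, 10)  # (95 - 45) / 10 = 5.0
for low, high, f in class_interval_distribution(scores, width):
    print(f"{low}-{high}: {f}")
```

Because the first interval starts at 45 and each interval spans five scores, a score of exactly 95 lands in an eleventh interval (95-99), so the rule of thumb gives an approximate rather than exact number of intervals.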

Graphing Data

Frequency distributions can provide valuable information, but sometimes a picture is of greater value. Several types of pictorial representations can be used to represent data. The choice depends on the type of data collected and what the researcher hopes to emphasize or illustrate. The most common graphs used by psychologists are bar graphs, histograms, and frequency polygons (line graphs). Graphs typically have two coordinate axes, the x-axis (the horizontal axis) and the y-axis (the vertical axis). Most commonly, the y-axis is shorter than the x-axis, typically 60% to 75% of the length of the x-axis.

Bar Graphs and Histograms

Bar graphs and histograms are frequently confused. When the data collected are on a nominal scale, or if the variable is a qualitative variable (a categorical variable for which each value represents a discrete category), then a bar graph is most appropriate. A bar graph is a graphical representation of a frequency distribution in which vertical bars are centered above each category along the x-axis and are separated from each other by a space, indicating that the levels of the variable represent distinct, unrelated categories.

qualitative variable A categorical variable for which each value represents a discrete category.

bar graph A graphical representation of a frequency distribution in which vertical bars are centered above each category along the x-axis and are separated from each other by a space, indicating that the levels of the variable represent distinct, unrelated categories.

If the variable is a quantitative variable (the scores represent a change in quantity), or if the data collected are ordinal, interval, or ratio in scale, then a histogram can be used. A histogram is also a graphical representation of a frequency distribution in which vertical bars are centered above scores on the x-axis, but in a histogram the bars touch each other to indicate that the scores on the variable represent related, increasing values.

quantitative variable A variable for which the scores represent a change in quantity.

histogram A graphical representation of a frequency distribution in which vertical bars centered above scores on the x-axis touch each other to indicate that the scores on the variable represent related, increasing values.

In both a bar graph and a histogram, the height of each bar indicates the frequency for that level of the variable on the x-axis. The spaces between the bars on the bar graph indicate not only the qualitative differences among the categories but also that the order of the values of the variable on the x-axis is arbitrary. In other words, the categories on the x-axis in a bar graph can be placed in any order. The fact that the bars are contiguous in a histogram indicates not only the increasing quantity of the variable but also that the variable has a definite order that cannot be changed.

A bar graph is illustrated in Figure 3.1. For a hypothetical distribution, the frequencies of individuals who affiliate with various political parties are indicated. Notice that the different political parties are listed on the x-axis, whereas frequency is recorded on the y-axis. Although the political parties are presented in a certain order, this order could be rearranged because the variable is qualitative.

Figure 3.2 illustrates a histogram. In this figure, the frequencies of intelligence test scores from a hypothetical distribution are indicated. A histogram is appropriate because the IQ score variable is quantitative. The variable has a specific order that cannot be rearranged. You can see how to use Excel and SPSS to create both bar graphs and histograms in the Statistical Software Resources section at the end of this chapter. If you are unfamiliar with Excel or SPSS, see Appendix C to get started with these tools.

Frequency Polygons (Line Graphs)

We can also depict the data in a histogram as a frequency polygon—a line graph of the frequencies of individual scores or intervals. Again, scores (or intervals) are shown on the x-axis and frequencies on the y-axis. Once all the frequencies are plotted, the data points are connected. You can see the frequency polygon for the intelligence score data in Figure 3.3.

frequency polygon A line graph of the frequencies of individual scores.

Frequency polygons are appropriate when the variable is quantitative or the data are ordinal, interval, or ratio. In this respect, frequency polygons are similar to histograms. Frequency polygons are especially useful for continuous data (such as age, weight, or time) in which it is theoretically possible for values to fall anywhere along the continuum. For example, an individual can weigh 120.5 pounds or be 35.5 years of age. Histograms are more appropriate when the data are discrete (measured in whole units)—for example, number of college classes taken or number of siblings. You can see how to use Excel and SPSS to create frequency polygons in the Statistical Software Resources section at the end of this chapter. If you are unfamiliar with Excel or SPSS, see Appendix C to get started with these tools.

DATA ORGANIZATION

Frequency Distribution
Description: A list of all scores occurring in the distribution along with the frequency of each
Use with: Nominal, ordinal, interval, or ratio data

Bar Graph
Description: A pictorial graph with bars representing the frequency of occurrence of items for qualitative variables
Use with: Nominal data

Histogram
Description: A pictorial graph with bars representing the frequency of occurrence of items for quantitative variables
Use with: Typically ordinal, interval, or ratio data; most appropriate for discrete data

Frequency Polygon
Description: A pictorial line graph representing the frequency of occurrence of items for quantitative variables
Use with: Typically ordinal, interval, or ratio data; more appropriate for continuous data
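The rules of thumb in this summary can be written as a small decision function. This is a hypothetical helper (`choose_graph` is not from any library) that encodes only what the summary says: nominal data get a bar graph, and ordinal, interval, or ratio data get a histogram when discrete or a frequency polygon when continuous. It is a sketch of the guidelines, not a substitute for judgment about what the researcher wants to emphasize.

```python
def choose_graph(scale, continuous=False):
    """Suggest a figure type from the scale of measurement.

    scale: "nominal", "ordinal", "interval", or "ratio"
    continuous: True if values can fall anywhere along a continuum
    """
    scale = scale.lower()
    if scale == "nominal":
        # Qualitative categories: separated bars, order is arbitrary
        return "bar graph"
    if scale in ("ordinal", "interval", "ratio"):
        # Quantitative, ordered values: touching bars or connected points
        return "frequency polygon" if continuous else "histogram"
    raise ValueError(f"unknown scale of measurement: {scale!r}")


# Examples drawn from the module text
print(choose_graph("nominal"))                 # political party affiliation
print(choose_graph("ratio"))                   # number of siblings (discrete)
print(choose_graph("ratio", continuous=True))  # weight (continuous)
```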

1. What do you think might be the advantage of a graphical representation of data over a frequency distribution?

2. A researcher observes driving behavior on a roadway, noting the gender of the drivers, the type of vehicle driven, and the speed at which they are traveling. The researcher wants to organize the data in graphs but cannot remember when to use bar graphs, histograms, or frequency polygons. Which type of graph should be used to describe each variable?

REVIEW OF KEY TERMS

bar graph

class interval frequency distribution

frequency distribution

frequency polygon

histogram

qualitative variable

quantitative variable

MODULE EXERCISES

(Answers to odd-numbered questions appear in Appendix B.)

Exercises 1–3: The following data represent a distribution of speeds at which individuals were traveling on a highway.

64, 64, 76, 67, 65, 68, 67, 70, 67, 65, 80, 70, 79, 72, 73, 65, 65, 62, 68, 64

1. Organize the data into a frequency distribution with frequency (f) and relative frequency (rf) columns.

2. Organize the data into a class interval frequency distribution with 10 intervals and frequency (f) and relative frequency (rf) columns.

3. Which type of figure should be used to represent these data—a bar graph, histogram, or frequency polygon? Why? Draw the appropriate figure for these data.

4. Differentiate a qualitative variable from a quantitative variable.

5. Explain when it would be appropriate to use a bar graph versus a histogram.

6. Explain when it would be appropriate to use a histogram versus a frequency polygon.

CRITICAL THINKING CHECK ANSWERS

Critical Thinking Check 3.1

1. One advantage is that it is easier to “see” the data set in a graphical representation. In other words, with a picture it is easier to determine where the majority of the scores are in the distribution. With a frequency distribution, there is more reading involved before a judgment can be made about the shape of the distribution.

2. Gender and type of vehicle driven are qualitative variables, measured on a nominal scale; thus, a bar graph should be used. The speed at which the drivers are traveling is a quantitative variable, measured on a ratio scale. Either a histogram or a frequency polygon could be used, although a frequency polygon might be better because of the continuous nature of the variable.

MODULE 4

Measures of Central Tendency

Learning Objectives

• Differentiate measures of central tendency.

• Know how to calculate the mean, median, and mode.

• Know when it is most appropriate to use each measure of central tendency.

Organizing data into tables and graphs can help make a data set more meaningful. These methods, however, do not provide as much information as numerical measures. Descriptive statistics are numerical measures that describe a distribution by providing information on the central tendency of the distribution, the width of the distribution, and the distribution’s shape. A measure of central tendency characterizes an entire set of data in terms of a single representative number. There are three measures of central tendency, each capturing the “middleness” of a distribution of scores in a different way: the mean, the median, and the mode.

descriptive statistics Numerical measures that describe a distribution by providing information on the central tendency of the distribution, the width of the distribution, and the shape of the distribution.

measure of central tendency A number intended to characterize an entire distribution.

Mean

The most commonly used measure of central tendency is the mean—the arithmetic average of a group of scores. You are probably familiar with this idea. We can calculate the mean for our distribution of exam scores (from the previous module) by adding all of the scores together and dividing by the total number of scores. Mathematically, this would be:

mean A measure of central tendency; the arithmetic average of a distribution.

$$\mu = \frac{\sum X}{N}$$

where

μ (pronounced “mu”) represents the symbol for the population mean

Σ represents the symbol for “the sum of”

X represents the individual scores, and

N represents the number of scores in the distribution

To calculate the mean, then, we sum all of the Xs, or scores, and divide by the total number of scores in the distribution (N). You may have also seen this formula represented as follows:

$$\bar{X} = \frac{\sum X}{N}$$

In this case, X̄ represents a sample mean.

We can use either formula (they are the same) to calculate the mean for the distribution of exam scores used in Module 3. These scores are presented again in Table 4.1, along with a column showing frequency (f) and another column showing the frequency of the score multiplied by the score (f times X). The sum of all the values in the fX column is the sum of all the individual scores (ΣX). Using this sum in the formula for the mean, we have:

$$\mu = \frac{\sum X}{N} = \frac{2{,}220}{30} = 74.00$$

You can also calculate the mean using Excel, SPSS, or the Stats function on most calculators. As an example, the procedure for calculating the mean using each of these tools is presented in the Statistical Software Resources section at the end of this chapter. If you are unfamiliar with Excel or SPSS, see Appendix C to get started with these tools. Use of the mean is constrained by the nature of the data. It is appropriate for interval and ratio data, but it is not appropriate for ordinal or nominal data.
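The fX-column method described above can be checked with a short Python sketch. The frequency table here is hypothetical (it is not Table 4.1), and `mean_from_frequency_table` is an illustrative name. Multiplying each score by its frequency and summing gives ΣX, and summing the frequencies gives N.

```python
def mean_from_frequency_table(table):
    """Mean from (X, f) pairs: sum of the fX column divided by N."""
    sum_fx = sum(x * f for x, f in table)  # ΣfX, which equals ΣX
    n = sum(f for _, f in table)           # N, the total number of scores
    return sum_fx / n


# Hypothetical frequency table of (score, frequency) pairs
table = [(90, 2), (80, 3), (70, 4), (60, 1)]

print(mean_from_frequency_table(table))  # -> 76.0

# Same result from the raw list of individual scores: ΣX / N
scores = [x for x, f in table for _ in range(f)]
print(sum(scores) / len(scores))         # -> 76.0
```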

Median

Another measure of central tendency, the median, is used in situations in which the mean might not be representative of a distribution. Let’s use a different distribution of scores to demonstrate when it might be appropriate to use the median rather than the mean. Imagine that you are considering taking a job with a small computer company. When you interview for the position, the owner of the company informs you that the mean income for employees at the company is approximately $100,000 and that the company has 25 employees. Most people would view this as good news. Having learned in a statistics class that the mean might be influenced by extreme scores, you ask to see the distribution of 25 incomes. The distribution is shown in Table 4.2.

The calculation of the mean for this distribution is:

$$\frac{\sum X}{N} = \frac{2{,}498{,}000}{25} = 99{,}920$$

Notice that, as claimed, the mean income of company employees is very close to $100,000. Notice also, however, that the mean in this case is not very representative of central tendency, or “middleness.” In this distribution, one very extreme score of $1,800,000 (the income of the company’s owner, needless to say) pulls the mean toward it and thus inflates it. Thus, in distributions with one or a few extreme scores (either high or low), the mean will not be a good indicator of central tendency. In such cases, a better measure of central tendency is the median.

The median is the middle score in a distribution after the scores have been arranged from highest to lowest or lowest to highest. The distribution of incomes in Table 4.2 is already ordered from lowest to highest. To determine the median, we simply have to find the middle score. In this situation, with 25 scores, that would be the 13th score. You can see that the median of the distribution would be an income of $27,000, which is far more representative of the central tendency for this distribution of incomes.

median A measure of central tendency; the middle score in a distribution after the scores have been arranged from highest to lowest or lowest to highest.

Why is the median not as influenced as the mean by extreme scores? Think about the calculation of each of these measures. When calculating the mean, we must add in the atypical income of $1,800,000, thus distorting the calculation. When determining the median, however, we do not consider the size of the $1,800,000 income; it is only a score at one end of the distribution whose numerical value does not have to be considered in order to locate the middle score in the distribution. The point to remember is that the median is not affected by extreme scores in a distribution because it is only a positional value. The mean is affected because its value is determined by a calculation that has to include the extreme value.

In the income example, the distribution had an odd number of scores (N = 25). Thus, the median was an actual score in the distribution (the 13th score). In distributions with an even number of observations, the median is calculated by averaging the two middle scores. In other words, we determine the middle point between the two middle scores. Look back at the distribution of exam scores in Table 4.1. This distribution has 30 scores. The median would be the average of the 15th and 16th scores (the two middle scores). Thus, the median would be 75.5—not an actual score in the distribution, but the middle point nonetheless. Notice that in this distribution, the median (75.5) is very close to the mean (74.00). Why are they so similar? Because this distribution contains no extreme scores, both the mean and the median are representative of the central tendency of the distribution.

Like the mean, the median can be used with ratio and interval data and is inappropriate for use with nominal data, but unlike the mean, the median can be used with most ordinal data.
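The median procedure, including the even-N case, can be sketched in Python as follows. The function name `median` and the small score lists are hypothetical. Python's standard library also provides `statistics.median`, which uses the same averaging rule for an even number of scores.

```python
def median(scores):
    """Middle score after ordering; mean of the two middle scores when N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd N: e.g., N = 25 gives the 13th score (index 12)
    # even N: e.g., N = 30 averages the 15th and 16th scores (indexes 14 and 15)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2, 5, 4]))  # odd N -> 3

# The median is positional: replacing the largest score with an extreme
# value leaves it unchanged, while the mean jumps.
typical = [1, 2, 3, 4, 5, 6]
extreme = [1, 2, 3, 4, 5, 100]
print(median(typical), sum(typical) / len(typical))  # -> 3.5 3.5
print(median(extreme), sum(extreme) / len(extreme))
```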

Mode

The third measure of central tendency is the mode—the score in a distribution that occurs with the greatest frequency. In the distribution of exam scores, the mode is 74 (similar to the mean and median). In the distribution of incomes, the mode is $25,000 (similar to the median, but not the mean). In some distributions, all scores occur with equal frequency; such a distribution has no mode. In other distributions, several scores occur with equal frequency. Thus, a distribution may have two modes (bimodal), three modes (trimodal), or even more. The mode is the only indicator of central tendency that can be used with nominal data. Although it can also be used with ordinal, interval, or ratio data, the mean and median are more reliable indicators of the central tendency of a distribution, and the mode is seldom used.

mode A measure of central tendency; the score in the distribution that occurs with the greatest frequency.
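A sketch of finding the mode in Python, covering the cases named above (one mode, several modes, or none). The function name `modes` is hypothetical. One design choice is worth flagging: the standard library's `statistics.multimode` returns every value when all values are equally frequent, whereas this sketch returns an empty list in that case to match the text's convention that such a distribution has no mode.

```python
from collections import Counter


def modes(scores):
    """Return the score(s) occurring with the greatest frequency, sorted.

    Returns an empty list when every distinct score occurs equally often,
    following the text's convention that such a distribution has no mode.
    """
    counts = Counter(scores)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # all scores equally frequent: no mode
    return sorted(score for score, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))     # one mode -> [2]
print(modes([1, 1, 2, 2, 3]))  # bimodal -> [1, 2]
print(modes([1, 2, 3]))        # no mode -> []
# Works with nominal data (hypothetical vehicle types), where the mode is the only option
print(modes(["sedan", "truck", "sedan"]))  # -> ['sedan']
```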

MEASURES OF CENTRAL TENDENCY

Mean
Definition: The arithmetic average
Use with: Interval and ratio data
Caution: Not for use with distributions with a few extreme scores

Median
Definition: The middle score in a distribution of scores organized from highest to lowest or lowest to highest
Use with: Ordinal, interval, and ratio data

Mode
Definition: The score occurring with greatest frequency
Use with: Nominal, ordinal, interval, or ratio data
Caution: Not a reliable measure of central tendency

1. In the example described in Critical Thinking Check 3.1, a researcher collected data on drivers’ gender, type of vehicle, and speed of travel. What would be an appropriate measure of central tendency to calculate for each type of data?

2. If one driver was traveling at a rate of 100 mph (25 mph faster than anyone else), which measure of central tendency would you recommend against using?

REVIEW OF KEY TERMS

descriptive statistics

mean

measure of central tendency

median

mode

MODULE EXERCISES

(Answers to odd-numbered questions appear in Appendix B.)

1. The following data represent a distribution of speeds at which individuals were traveling on a highway.

64, 73, 65, 76, 65, 70, 65, 68, 72, 67, 64, 65, 67, 67, 62, 80, 68, 64, 79, 70

Calculate the mean, median, and mode for the speed distribution data set.

2. For the distribution in Exercise 1, which measure of central tendency is most appropriate and why?

Exercises 3–6: Calculate the mean, median, and mode for the following four distributions.

3. 2, 2, 4, 5, 8, 9, 10, 11, 11, 11

4. 1, 2, 3, 4, 4, 5, 5, 5, 6, 6, 8, 9

5. 1, 3, 3, 3, 5, 5, 8, 8, 8, 9, 10, 11

6. 2, 3, 4, 5, 6, 6, 6, 7, 8, 8

7. For the following two distributions, indicate which measure(s) of central tendency would be appropriate for each.

Distribution A: 10, 11, 11, 12, 12, 12, 13, 13, 14

Distribution B: 10, 11, 11, 12, 12, 12, 13, 13, 100

CRITICAL THINKING CHECK ANSWERS

Critical Thinking Check 4.1

1. Because gender and type of vehicle driven are nominal data, the mode can be determined. However, it is inappropriate to use the median or the mean with these data. The speed at which the drivers are traveling is ratio in scale; thus, the mean, median, or mode could be used. The mean and median would be better indicators of central tendency.

2. In this case, the mean should not be used because of the single outlier in the distribution.

CHAPTER TWO SUMMARY AND REVIEW

Descriptive Statistics I

CHAPTER SUMMARY

This chapter discussed data organization and descriptive statistics. Several methods of data organization were presented, including how to design a frequency distribution, a bar graph, a histogram, and a frequency polygon. The type of data appropriate for each of these methods was also discussed. One category of descriptive statistics that summarizes a large data set includes measures of central tendency (mean, median, and mode). These statistics provide information about the central tendency, or “middleness,” of a distribution of scores. The mean is the arithmetic average; the median is the middle score in a distribution of scores after the scores have been ordered from highest to lowest, or lowest to highest; and the mode is the score that occurs with the greatest frequency.

CHAPTER 2 REVIEW EXERCISES

(Answers to exercises appear in Appendix B.)

Fill-in Self-Test

Answer the following questions. If you have trouble answering any of the questions, restudy the relevant material before going on to the multiple-choice self-test.

1. A __________ is a table in which all of the scores are listed along with the frequency with which each occurs.

2. A categorical variable for which each value represents a discrete category is a __________ variable.

3. A graphical representation of a frequency distribution in which vertical bars centered above scores on the x-axis touch each other to indicate that the scores on the variable represent related, increasing values is a __________.

4. Measures of __________ are numbers intended to characterize an entire distribution.

5. The __________ is the middle score in a distribution after the scores have been arranged from highest to lowest or lowest to highest.

Multiple-Choice Self-Test

Select the single best answer for each of the following questions. If you have trouble answering any of the questions, restudy the relevant material.

1. A __________ is a graphical representation of a frequency distribution in which vertical bars are centered above each category along the x-axis and are separated from each other by a space indicating that the levels of the variable represent distinct, unrelated categories.

a. histogram

b. frequency polygon

c. bar graph

d. class interval histogram

2. Qualitative variable is to quantitative variable as __________ is to __________.

a. categorical variable; numerical variable

b. numerical variable; categorical variable

c. bar graph; histogram

d. categorical variable and bar graph; numerical variable and histogram

3. Seven Girl Scouts reported the following individual earnings from their sale of cookies: $17, $23, $13, $15, $12, $19, and $13. In this distribution of individual earnings, the mean is __________ the mode and __________ the median.

a. equal to; equal to

b. greater than; equal to

c. equal to; less than

d. greater than; greater than

4. When Dr. Thomas calculated her students’ history test scores, she noticed that one student had an extremely high score. Which measure of central tendency should be used in this situation?

a. mean

b. standard deviation

c. median

d. either the mean or the median

5. Imagine that 4,999 people who are penniless live in Medianville. An individual whose net worth is $500,000,000 moves to Medianville. Now the mean net worth in this town is __________ and the median net worth is __________.

a. 0; 0

b. $100,000; 0

c. 0; $100,000

d. $100,000; $100,000

6. Middle score in the distribution is to __________ as score occurring with the greatest frequency is to __________.

a. mean; median

b. median; mode

c. mean; mode

d. mode; median

7. Mean is to __________ as mode is to __________.

a. ordinal, interval, and ratio data only; nominal data only

b. nominal data only; ordinal data only

c. interval and ratio data only; all types of data

d. None of the above

Self-Test Problems

1. For the following distribution, organize the data into a frequency distribution with frequency (f) and relative frequency (rf) columns.

1, 1, 2, 2, 4, 5, 8, 9, 10, 11, 11, 11

2. Calculate the mean, median, and mode for the distribution in Problem 1.

CHAPTER TWO

Statistical Software Resources

If you need help getting started with Excel or SPSS, please see Appendix C: Getting Started with Excel and SPSS. The procedures outlined in all of the Statistical Software Resources sections will work with Excel 2007, 2010, and 2013; with SPSS 18-22; and on the TI-83 and TI-84, regular and plus versions.

MODULE 3 Organizing Data

Using Excel to Create a Bar Graph

Begin by entering the data from Figure 3.1 in Module 3 into an Excel spreadsheet, as follows. Please note that the column headings of “Affiliation” and “Frequency” are entered into the spreadsheet. Once the data are entered, highlight all of the data including the column headers.

Now select the Insert ribbon and then Column (in Excel 2013, Column is found in the Charts menu). Note that Excel also offers a Bar option, but it produces horizontal bars, whereas the bars in a bar graph should be vertical. Select the top left option from the Column options (2-D Column chart). This should produce the following bar graph:

Notice that the different political parties are listed on the x-axis, whereas frequency is recorded on the y-axis. Excel provides Chart Tools so that we can modify the appearance of a graph, for example, to make the bar graph conform to APA style. To use Chart Tools, click on the chart; the Chart Tools ribbons will then become accessible (Design, Layout, and Format in Excel 2007 and 2010; Design and Format in Excel 2013). Using these ribbons, you can, for example, add Axis Titles (under the Layout ribbon in 2007 and 2010, or through the Add Chart Element menu on the Design ribbon in 2013), remove the horizontal Gridlines (under the Layout ribbon in 2007 and 2010, or through the Add Chart Element menu in 2013), or change the color of the bars (Excel uses blue as the default) by clicking one of the bars and selecting Shape Fill on the Format ribbon. After making these modifications, your chart will appear as follows:

Please also note that although the political parties are presented in a certain order, this order could be rearranged because the variable is qualitative.
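For readers who work in a programming language rather than Excel, the following is a minimal Python sketch of the same idea. It draws the graph as text so that it needs only the standard library. The frequencies are hypothetical placeholders, not the Figure 3.1 data, and the function name `text_bar_graph` is invented for illustration. The sketch draws one vertical bar per category with a space between bars, and it shows that the categories of a qualitative variable can be listed in a different order.

```python
# Hypothetical frequencies for a qualitative variable (NOT the Figure 3.1 data).
affiliation_counts = {"Democrat": 6, "Republican": 5, "Independent": 3, "Other": 1}


def text_bar_graph(counts, order=None):
    """Return the lines of a text bar graph with vertical bars separated by spaces."""
    labels = list(order) if order is not None else list(counts)
    width = max(len(label) for label in labels)  # one column per category
    lines = []
    # Work down from the largest frequency; a bar is filled at every level it reaches.
    for level in range(max(counts.values()), 0, -1):
        cells = ["#" * width if counts[label] >= level else " " * width for label in labels]
        lines.append(f"{level:>3} | " + " ".join(cells))
    # Categories go on the x-axis; the numbers on the left are frequencies (y-axis).
    lines.append("    | " + " ".join(label.center(width) for label in labels))
    return lines


print("\n".join(text_bar_graph(affiliation_counts)))
print()
# The variable is qualitative, so an alphabetical order is just as valid.
print("\n".join(text_bar_graph(affiliation_counts, order=sorted(affiliation_counts))))
```

Both printouts contain the same information. Only the left-to-right order of the categories changes, which is acceptable for qualitative data.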

Using Excel to Create a Histogram

To illustrate the difference between a bar graph and a histogram, let’s use the data from the table below, which lists the frequencies of intelligence test scores from a hypothetical distribution of 30 individuals. A histogram is appropriate for these data because the IQ score variable is quantitative: its values have a specific order that cannot be rearranged.

Begin by entering the data into an Excel spreadsheet, as follows. Please note that the column headings of “Score” and “Frequency” are entered into the spreadsheet. Once the data are entered, highlight only the “Frequency” data, as illustrated in the next screen capture.

Because Excel does not have a histogram option in which the bars in the graph touch, we’ll have to use special formatting to create the histogram. Click on the Insert ribbon and then Column (in Excel 2013, this is in the Charts menu). Select the option in the top left corner, as we did when creating bar graphs. This should produce the following graph:

We’ll begin editing the graph by removing the spaces between the bars. To do so, right-click on any of the bars and select Format Data Series to produce the following pop-up window (in 2013 this window will appear on the right side of the screen):

Move the Gap Width slider to 0% as indicated in the window, and then close the window. Your figure should now more closely resemble a histogram. Next, use the Chart Tools to format the figure as you want it. This should include adding labels to the x- and y-axes and changing the values on the x-axis to reflect the range of intelligence scores that were measured. To accomplish the latter, right-click on a value on the x-axis and choose Select Data… to produce the following pop-up window:

Click the Edit button under Horizontal (Category) Axis Labels. You’ll receive the following pop-up window:

Highlight the IQ scores in the spreadsheet, and they will be inserted into the Axis label range: box. Then click OK. Click OK a second time to close the original pop-up window. You can now use the Chart Tools to format your histogram so that it more closely resembles a graph appropriate for APA style. This would include adding axis labels to the x- and y-axes, changing the bars from blue to black, and removing the gridlines from the graph. After making these changes, your figure should look as follows:
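The essential step in building a histogram is to list every score in numeric order, including scores that no one received, so that those scores appear as zero-height bars. The sketch below does this in Python with hypothetical scores (not the 30 scores in the table). The names `frequency_table` and `text_histogram` are invented for illustration. For a plain-text display, the histogram is drawn sideways: each row is one score, and the rows are stacked with no blank line between them, much like setting Gap Width to 0.

```python
from collections import Counter

# Hypothetical IQ scores for illustration only (NOT the 30 scores in the table).
scores = [98, 99, 99, 100, 100, 100, 102, 103, 103, 104]


def frequency_table(values):
    """(score, frequency) for every integer score from lowest to highest, zeros included."""
    counts = Counter(values)  # a Counter returns 0 for scores that never occur
    return [(score, counts[score]) for score in range(min(values), max(values) + 1)]


def text_histogram(table):
    """Draw the histogram sideways: one row per score, in numeric order, with no
    blank rows between bars (the text analogue of a Gap Width of 0)."""
    return [f"{score:>4} | {'#' * freq}" for score, freq in table]


table = frequency_table(scores)
print("\n".join(text_histogram(table)))
```

Unlike the bar graph categories, the scores cannot be reordered. The empty row for 101 marks a score that no one received.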

Using Excel to Create a Frequency Polygon (Line Graph)

Begin by entering the intelligence test score data from the table presented in the previous example into an Excel spreadsheet, as follows. Then highlight only the Frequency data, as illustrated in the next screen capture.

Next, click on the Insert ribbon and then Line (in 2013 this is in the Charts menu). Select the option in the top left corner (the first 2-D line option). This should produce the following graph:

Now, right-click on a value on the x-axis and then click Select Data… to produce the following pop-up window:

Click the Edit button under Horizontal (Category) Axis Labels. You’ll receive the following pop-up window:

Highlight the IQ scores in the spreadsheet, and they will be inserted into the Axis label range: box. Then click OK. Click OK a second time to close the original pop-up window. You can now use the Chart Tools to format your frequency polygon so that it more closely resembles a line graph appropriate for APA style. This would include adding axis labels to the x- and y-axes, changing the line from blue to black, and removing the gridlines from the graph. After making these changes, your figure should look as follows:
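A frequency polygon plots the same score and frequency pairs as a histogram, but as points joined by a line. The Python sketch below produces those points, again using hypothetical scores and an invented function name (`polygon_points`). The sketch assumes a common convention: a zero-frequency point is added one unit below the lowest score and one unit above the highest score so that the line starts and ends on the x-axis. The Excel steps above do not add these end points.

```python
from collections import Counter

# Hypothetical scores for illustration only (NOT the table's IQ data).
scores = [98, 99, 99, 100, 100, 100, 102, 103, 103, 104]


def polygon_points(values, close_ends=True):
    """(score, frequency) points for a frequency polygon.

    Every integer score from the lowest to the highest is included, so the
    line drops to 0 where no one received a score. With close_ends=True, a
    zero-frequency point is added one unit below and one unit above the
    observed range (an assumed convention, not part of the Excel steps).
    """
    counts = Counter(values)
    low, high = min(values), max(values)
    if close_ends:
        low, high = low - 1, high + 1
    return [(x, counts[x]) for x in range(low, high + 1)]


for x, f in polygon_points(scores):
    print(f"{x}: {f}")
```

Connecting these points in order with straight lines gives the polygon. Pass `close_ends=False` to match the Excel output, which begins and ends at the observed scores.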

Using SPSS to Create a Bar Graph

We’ll use the same data as in the earlier example (Figure 3.1 in Module 3) to illustrate how to use SPSS to create a bar graph. To begin, we enter the data into the SPSS spreadsheet. As with Excel we use two columns, one labeled Affiliation and one labeled Frequency, as can be seen in the following screen capture.

Next, we click on the Graphs menu and then Chart Builder. From the Gallery tab at the bottom of the dialog box, select Bar, and then double-click the first bar graph icon in the top row to produce the following dialog box.

You can see that the two variables are listed in the top left Variables box. Drag the Affiliation variable to the x-axis box in the figure on the top right, and then drag the Frequency variable to the y-axis box in the figure on the top right. The dialog box should now look as follows:

Next, click on the Element Properties… box on the right-hand side of the dialog box to receive the following dialog box.

Click OK in the original dialog box. SPSS will then produce an output file with the following bar graph.

Using SPSS to Create a Histogram

Let’s use the same data set as in the Excel histogram example to create a histogram with SPSS. Thus, we’ll enter the IQ score data into the SPSS spreadsheet. However, in this case, each individual score is entered. This is illustrated in the screen capture below in which all 30 scores have been entered into SPSS. (Please note that due to screen size constraints, the final four scores do not show in the screen capture. Thus, make sure you use the IQ data from the earlier table that we used when creating a histogram using Excel.)

The variable was named IQscore using the Variable View screen, and it was designated a Numeric variable with the Scale level of measurement. To name the variable, click on the Variable View tab at the bottom of the window and type the name you wish to give the variable in the highlighted Name box. The variable name cannot have any spaces in it. Because these data represent intelligence score data, we’ll type in IQscore. Note also that the Type of data is Numeric. Once the variable is named, click the Data View tab on the bottom left of the screen to return to the data spreadsheet. (See Appendix C: “Getting Started with Excel and SPSS,” if you are unfamiliar with naming variables.) From the Data View spreadsheet screen, select Graphs, and then Chart Builder… to receive the following dialog boxes.

Select Histogram and then double-click on the first example of a histogram. In the dialog box on the top left of the screen, click on IQscore and drag it to the x-axis box in the histogram on the right. Then, in the Element Properties box on the right highlight Bar1, as in the screen capture above and then click on Set Parameters to receive the following dialog box:

Make sure that Automatic is selected as the option in the first box. In the second box, select Custom and set the Number of intervals at 18 (the number of different IQ scores received by the 30 participants in the study). Then click Continue and then Apply. Finally, click OK in the dialog box on the left and you should receive the histogram in the output file.

Notice that the bars touch, except where no participant received a score in that interval.

Using SPSS to Create a Frequency Polygon (Line Graph)

We’ll once again use the intelligence test score data to illustrate how to create a frequency polygon using SPSS. Enter the data as we did when creating the histogram in SPSS, with each of the 30 scores on a separate line. Once the data are entered, named, and coded as numeric with the scale level of measurement, click on Graphs and then Chart Builder to receive the following dialog boxes:

Double-click on the Line graph option in the lower left of the screen, and then double-click on the first example of a line graph. Then drag the IQscore variable from the top left of the screen to the x-axis box. In the Element Properties dialog box on the right of the screen, highlight Line 1 and then select Histogram in the Statistic box. Click on the Set Parameters box to receive the following dialog box:

Select Automatic in the first box, and then Custom in the second box indicating that the number of intervals should be 52 (the total range of IQ scores for our group of 30 individuals). Click Continue and then Apply. Finally, click OK to execute the procedure. You should receive the following frequency polygon.
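The two Set Parameters steps above rely on different counts: the number of different scores received (18) for the histogram and the range of scores (52) for the frequency polygon. When some scores within the range were not received by anyone, these numbers differ. The following Python sketch computes both so that you can check the values before entering them in SPSS. The scores and the helper name `interval_counts` are hypothetical. The sketch also assumes that "range" means the count of integer scores from the lowest to the highest, inclusive.

```python
# Hypothetical scores (NOT the 30 IQ scores from the table).
scores = [98, 99, 99, 100, 100, 100, 102, 103, 103, 104]


def interval_counts(values):
    """Two candidate values for SPSS's Custom 'Number of intervals' setting."""
    distinct = len(set(values))                     # different scores actually received
    one_point_wide = max(values) - min(values) + 1  # integer scores from lowest to highest, inclusive
    return distinct, one_point_wide


print(interval_counts(scores))  # (6, 7)
```

In these hypothetical data nobody scored 101, so there are 6 different scores but 7 possible integer scores from 98 to 104.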

MODULE 4 Measures of Central Tendency

Using Excel to Calculate the Mean, Median, and Mode

To begin using Excel to conduct data analyses, the data must be entered into an Excel spreadsheet. This simply involves opening Excel and entering the data into the spreadsheet. You can see in the following spreadsheet that I have entered the exam grade data from Table 4.1 in Module 4 into an Excel spreadsheet.

Once the data have been entered, we use the Data Analysis tool to calculate descriptive statistics. To do so, click on the Data tab or ribbon and then click the Data Analysis icon on the top right side of the window. (If the icon does not appear, the Analysis ToolPak add-in must first be enabled.) A dialog box of options will then appear (see next).

Select Descriptive Statistics as indicated in the preceding box, and then click OK. This will lead to the following dialog box:

With the cursor in the Input Range box, highlight the data that you want analyzed from Column A in the Excel spreadsheet so that they appear in the input range. In addition, check the Summary statistics box. Once you have done this, click OK. The summary statistics will appear in a new worksheet, as seen next.

As you can see, there are several descriptive statistics reported, including all three measures of central tendency (mean, median, and mode).
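The same three measures can be computed with Python's standard `statistics` module. The grades below are hypothetical, not the Table 4.1 data. Excel's Descriptive Statistics output has a single Mode cell, so this sketch uses `statistics.multimode` (Python 3.8 or later), which reports every tied mode.

```python
import statistics

# Hypothetical exam grades for illustration (NOT the Table 4.1 data).
grades = [60, 70, 70, 75, 80, 85, 85]

mean = statistics.mean(grades)
median = statistics.median(grades)       # middle score after sorting
modes = statistics.multimode(grades)     # all most-frequent scores, in case of ties

print(mean, median, modes)  # 75 75 [70, 85]
```

This distribution is bimodal, because 70 and 85 each occur twice. It is worth checking for this kind of tie whenever a program reports only one mode.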

Using SPSS to Calculate the Mean

As with the Excel exercise above, we will once again be using the data from Table 4.1 in Module 4 to calculate descriptive statistics. We begin by entering the data from Table 4.1 into an SPSS spreadsheet. This simply involves opening SPSS and entering the data into the spreadsheet. You can see in the following spreadsheet that I have entered the exam grade data from Table 4.1 into an SPSS spreadsheet.

Notice that the variable is simply named VAR00001. To rename the variable to something appropriate for your data set, click on the Variable View tab on the bottom left of the screen. You will see the following window:

Type the name you wish to give the variable in the highlighted Name box. The variable name cannot have any spaces in it. Because these data represent exam grade data, we’ll type in Examgrade. Note also that the Type of data is Numeric. Once the variable is named, click the Data View tab on the bottom left of the screen to return to the data spreadsheet. Then click on the Analyze menu at the top of the screen, and a drop-down menu with various statistical analyses will appear. Select Descriptive Statistics and then Descriptives. The following dialog box will appear:

Examgrade will be highlighted, as above. Click on the arrow in the middle of the window and the Examgrade variable will be moved over to the Variables box. Then click on Options to receive the following dialog box:

You can see that Mean, Standard Deviation, Minimum, and Maximum are all checked. However, you could select any of the descriptive statistics you want calculated. After making your selections, click Continue and then OK. The output will appear on a separate page as an Output file, like the one below, where you can see the minimum and maximum scores for this distribution along with the mean exam score of 74. Please note that if you had more than one set of data (for example, two classes of exam scores), each could occupy its own column in your SPSS spreadsheet, and you could analyze both variables at the same time. In this situation, separate descriptive statistics would be calculated for each data set.
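To cross-check the SPSS output, the following Python sketch computes the same default statistics (N, minimum, maximum, mean, and standard deviation) separately for two hypothetical classes. This mirrors the case in which each class occupies its own column. The data and the `descriptives` helper are invented for illustration. The standard deviation uses `statistics.stdev`, which divides by n − 1 and gives the sample standard deviation, the version SPSS reports.

```python
import statistics

# Hypothetical exam grades for two classes (NOT the Table 4.1 data).
classes = {
    "Class1": [60, 70, 70, 75, 80, 85, 85],
    "Class2": [55, 65, 75, 85, 95],
}


def descriptives(values):
    """N, minimum, maximum, mean, and sample standard deviation for one variable."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # statistics.stdev divides the sum of squared deviations by n - 1.
        "Std. Deviation": statistics.stdev(values),
    }


# One set of statistics per column (class), as in the SPSS output.
for name, values in classes.items():
    print(name, descriptives(values))
```

Both classes have a mean of 75, but Class2's scores are more spread out, so its standard deviation is larger. A table of separate statistics for each data set makes this kind of difference easy to see.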

Using the TI-84 to Calculate the Mean

Follow the steps below to use your TI-84 calculator to calculate the mean for the data set from Table 4.1 in Module 4.

1. With the calculator on, press the STAT key.

2. EDIT will be highlighted. Press the ENTER key.

3. Under L1 enter the data from Table 4.1.

4. Press the STAT key again and highlight CALC.

5. Option 1, 1-Var Stats, will be highlighted. Press ENTER.

6. Press ENTER once again.

The statistics for the single variable on which you entered data will be presented on the calculator screen. The mean is presented on the first line of output as $\bar{X}$.
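To see where the first lines of the 1-Var Stats output come from, the following Python sketch computes them for a hypothetical list named `L1` (not the Table 4.1 data). The values appear in the order the TI-84 displays them: x̄, Σx, Σx², Sx (sample standard deviation, dividing by n − 1), σx (population standard deviation, dividing by n), and n. The calculator then lists minX, Q1, Med, Q3, and maxX, which this sketch omits. The function name `one_var_stats` is invented for illustration.

```python
import math

# Hypothetical data for illustration (NOT the Table 4.1 data).
L1 = [60, 70, 70, 75, 80, 85, 85]


def one_var_stats(data):
    """The first six values shown by 1-Var Stats: x-bar, sum x, sum x^2, Sx, sigma x, n."""
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    mean = total / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean
    return {
        "x_bar": mean,
        "sum_x": total,
        "sum_x2": total_sq,
        "Sx": math.sqrt(ss / (n - 1)),   # sample standard deviation
        "sigma_x": math.sqrt(ss / n),    # population standard deviation
        "n": n,
    }


print(one_var_stats(L1))
```

Entering the same seven values under L1 on the calculator should give a mean of 75 on the first line of the output.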