By the end of this chapter, the student should be able to:
You are probably asking yourself the question, "When and where will I use statistics?" If you read any newspaper, watch television, or use the Internet, you will see statistical information. There are statistics about crime, sports, education, politics, and real estate. Typically, when you read a newspaper article or watch a television news program, you are given sample information. With this information, you may make a decision about the correctness of a statement, claim, or "fact." Statistical methods can help you make the "best educated guess."
Since you will undoubtedly be given statistical information at some point in your life, you need to know some techniques for analyzing the information thoughtfully. Think about buying a house or managing a budget. Think about your chosen profession. The fields of economics, business, psychology, education, biology, law, computer science, police science, and early childhood development require at least one course in statistics.
This chapter introduces the basic ideas and vocabulary of probability and statistics. You will soon see that statistics and probability work together. You will also learn how data are gathered and how "good" data can be distinguished from "bad."
In statistics, we generally want to study a population. You can think of a population as a collection of persons, things, or objects under study. To study the population, we select a sample. The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Data are the result of sampling from a population.
Example:

From the sample data, we can calculate a statistic. A statistic is a number that represents a property of the sample. For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic. The statistic is an estimate of a population parameter. A parameter is a number that is a property of the population. Since we considered all math classes to be the population, then the average number of points earned per student over all the math classes is an example of a parameter.
Population: all math classes
Sample: one of the math classes
Parameter: the average number of points earned per student over all math classes
Statistic: the average number of points earned per student in the one math class

One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. The accuracy really depends on how well the sample represents the population. The sample must contain the characteristics of the population in order to be a representative sample. We are interested in both the sample statistic and the population parameter in inferential statistics. In a later chapter, we will use the sample statistic to test the validity of the established population parameter.
A variable, notated by capital letters such as X and Y, is a characteristic of interest for each person or thing in a population. Variables may be numerical or categorical. Numerical variables take on values with equal units such as weight in pounds and time in hours. Categorical variables place the person or thing into a category.
Example:

Data are the actual values of the variable. They may be numbers or they may be words. A datum is a single value.
Two words that come up often in statistics are mean and proportion.
If you were to take three exams in your math classes and obtain scores of 86, 75, and 92, you would calculate your mean score by adding the three exam scores and dividing by three (your mean score would be 84.3 to one decimal place). If, in your math class, there are 40 students and 22 are men and 18 are women, then the proportion of men students is [latex]\frac{22}{40}[/latex] and the proportion of women students is [latex]\frac{18}{40}[/latex]. Mean and proportion are discussed in more detail in later chapters.

NOTE: The words "mean" and "average" are often used interchangeably. The substitution of one word for the other is common practice. The technical term is "arithmetic mean," and "average" technically refers to a center location. In practice among non-statisticians, however, "average" is commonly accepted for "arithmetic mean."
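As a quick illustrative sketch (not part of the text's exercises), these mean and proportion calculations look like this in Python:

```python
# Illustrative sketch of the mean and proportion calculations above.
scores = [86, 75, 92]                 # three exam scores
mean_score = sum(scores) / len(scores)
print(round(mean_score, 1))           # 84.3

men, women = 22, 18                   # students in a class of 40
total = men + women
print(men / total)                    # 0.55
print(women / total)                  # 0.45
```

The mean divides a total by a count; a proportion divides a part by the whole.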
A study was conducted at a local college to analyze the average cumulative GPAs of students who graduated last year. Fill in the letter of the phrase that best describes each of the items below.
[revealanswer q="723838"]Population[/revealanswer] [hiddenanswer a="723838"]all students who attended the college last year[/hiddenanswer]
[revealanswer q="509038"]Sample[/revealanswer] [hiddenanswer a="509038"]a group of students who graduated from the college last year, randomly selected[/hiddenanswer] [revealanswer q="292183"]Data[/revealanswer] [hiddenanswer a="292183"]3.65, 2.80, 1.50, 3.90[/hiddenanswer] [revealanswer q="785220"]Statistic[/revealanswer] [hiddenanswer a="785220"]the average cumulative GPA of the students in the study who graduated from the college last year[/hiddenanswer] [revealanswer q="568316"]Variable[/revealanswer] [hiddenanswer a="568316"]the cumulative GPA of one student who graduated from the college last year[/hiddenanswer] [revealanswer q="396728"]Parameter: [/revealanswer] [hiddenanswer a="396728"]the average cumulative GPA of all students who graduated from the college last year[/hiddenanswer]

Determine what the key terms refer to in the following study. We want to know the average (mean) amount of money first year college students spend at ABC College on school supplies that do not include books. We randomly survey 100 first year students at the college. Three of those students spent $150, $200, and $225, respectively.
As part of a study designed to test the safety of automobiles, the National Transportation Safety Board collected and reviewed data about the effects of an automobile crash on test dummies. Here is the criterion they used:
Speed at which Cars Crashed  Location of "Driver" (i.e., dummies) 
35 miles/hour  Front Seat 
Cars with dummies in the front seats were crashed into a wall at a speed of 35 miles per hour. We want to know the proportion of dummies in the driver’s seat that would have had head injuries, if they had been actual drivers. We start with a simple random sample of 75 cars.
[revealanswer q="204705"]Population: [/revealanswer] [hiddenanswer a="204705"]all cars containing dummies in the front seat.[/hiddenanswer] [revealanswer q="960223"]Sample: [/revealanswer] [hiddenanswer a="960223"]the 75 cars, selected by a simple random sample.[/hiddenanswer] [revealanswer q="77939"]Parameter: [/revealanswer] [hiddenanswer a="77939"]the proportion of driver dummies (if they had been real people) who would have suffered head injuries in the population.[/hiddenanswer] [revealanswer q="50365"]Statistic: [/revealanswer] [hiddenanswer a="50365"]the proportion of driver dummies (if they had been real people) who would have suffered head injuries in the sample.[/hiddenanswer] [revealanswer q="845938"]Variable: [/revealanswer] [hiddenanswer a="845938"]the number of driver dummies (if they had been real people) who would have suffered head injuries.[/hiddenanswer] [revealanswer q="639922"]Data: [/revealanswer] [hiddenanswer a="639922"]yes, had head injury, or no, did not.[/hiddenanswer]
An insurance company would like to determine the proportion of all medical doctors who have been involved in one or more malpractice lawsuits. The company selects 500 doctors at random from a professional directory and determines the number in the sample who have been involved in a malpractice lawsuit. [revealanswer q="59117"]Population: [/revealanswer] [hiddenanswer a="59117"]all medical doctors listed in the professional directory.[/hiddenanswer] [revealanswer q="878990"]Sample: [/revealanswer] [hiddenanswer a="878990"]the 500 doctors selected at random from the professional directory.[/hiddenanswer] [revealanswer q="223312"]Parameter[/revealanswer] [hiddenanswer a="223312"]the proportion of medical doctors who have been involved in one or more malpractice suits in the population.[/hiddenanswer] [revealanswer q="649157"]Statistic[/revealanswer] [hiddenanswer a="649157"]the proportion of medical doctors who have been involved in one or more malpractice suits in the sample.[/hiddenanswer] [revealanswer q="107923"]Variable: [/revealanswer] [hiddenanswer a="107923"]the number of medical doctors who have been involved in one or more malpractice suits.[/hiddenanswer] [revealanswer q="569763"]Data[/revealanswer] [hiddenanswer a="569763"]Yes, was involved in one or more malpractice lawsuits; or no, was not. [/hiddenanswer]
The Data and Story Library, http://lib.stat.cmu.edu/DASL/Stories/CrashTestDummies.html (accessed May 1, 2013).
The mathematical theory of statistics is easier to learn when you know the language. This module presents important terms that will be used throughout the text.
Quantitative Data  Qualitative Data  
Definition  Quantitative data are the result of counting or measuring attributes of a population.  Qualitative data are the result of categorizing or describing attributes of a population. 
Data that you will see  Quantitative data are always numbers.  Qualitative data are generally described by words or letters. 
Examples  Amount of money you have Height Weight Number of people living in your town Number of students who take statistics  Hair color Blood type Ethnic group The car a person drives The street a person lives on 
The data are the areas of lawns in square feet. You sample five houses. The areas of the lawns are 144 sq. feet, 160 sq. feet, 190 sq. feet, 180 sq. feet, and 210 sq. feet. What type of data is this? [revealanswer q="126830"]Show Answer[/revealanswer] [hiddenanswer a="126830"]It is quantitative continuous data.[/hiddenanswer]
Determine the correct data type (quantitative or qualitative). Indicate whether quantitative data are continuous or discrete. Hint: Data that are discrete often start with the words "the number of."
Frequency  Percent  

Asian  8,794  36.1% 
Black  1,412  5.8% 
Filipino  1,298  5.3% 
Hispanic  4,180  17.1% 
Native American  146  0.6% 
Pacific Islander  236  1.0% 
White  5,978  24.5% 
TOTAL  22,044 out of 24,382  90.4% out of 100% 
Figure 1. Ethnicity of Students
The following graph is the same as the previous graph, but the "Other/Unknown" percent (9.6%) has been included. The "Other/Unknown" category is large compared to some of the other categories (Native American, 0.6%; Pacific Islander, 1.0%). This is important to know when we think about what the data are telling us. This particular bar graph in Figure 2 can be difficult to understand visually.

Figure 2. Bar Graph with Other/Unknown Category
The graph in Figure 3 is a Pareto chart. The Pareto chart has the bars sorted from largest to smallest and is easier to read and interpret.
Figure 3. Pareto Chart with Bars Sorted by Size
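The sorting step behind a Pareto chart can be sketched in a few lines of Python. The counts below are taken from the ethnicity table in Figure 1; the code itself is purely illustrative:

```python
# Illustrative sketch: the sorting step behind a Pareto chart.
# Counts are taken from the ethnicity table in Figure 1.
ethnicity = {
    "Asian": 8794, "Black": 1412, "Filipino": 1298, "Hispanic": 4180,
    "Native American": 146, "Pacific Islander": 236, "White": 5978,
}
# A Pareto chart draws the bars from largest to smallest count
pareto_order = sorted(ethnicity.items(), key=lambda kv: kv[1], reverse=True)
for category, count in pareto_order:
    print(f"{category}: {count}")
```

Sorting the categories by frequency is exactly what makes the Pareto chart easier to read than an unsorted bar graph.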
Type  
Random Sampling  1. Simple Random Sample 
2. Stratified Sample  
3. Cluster Sample  
4. Systematic Sample  
5. Convenience Sample 
ID  Name  ID  Name  ID  Name 

00  Anselmo  11  King  21  Roquero 
01  Bautista  12  Legeny  22  Roth 
02  Bayani  13  Lundquist  23  Rowell 
03  Cheng  14  Macierz  24  Salangsang 
04  Cuarismo  15  Motogawa  25  Slade 
05  Cuningham  16  Okimoto  26  Stratcher 
06  Fontecha  17  Patel  27  Tallai 
07  Hong  18  Price  28  Tran 
08  Hoobler  19  Quizon  29  Wai 
09  Jiao  20  Reyes  30  Wood 
10  Khan 
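A simple random sample from a numbered roster like the one above can be sketched in Python. The sample size and the seed are arbitrary choices made for illustration:

```python
import random

# Illustrative sketch: drawing a simple random sample of 5 students
# from the roster's IDs 00-30 (sample size and seed are arbitrary).
roster_ids = list(range(31))             # IDs 00 through 30
random.seed(1)                           # fixed seed so the draw can be repeated
sample = random.sample(roster_ids, 5)    # every group of 5 IDs is equally likely
print(sorted(sample))
```

Because `random.sample` draws without replacement and treats every ID identically, each group of five students has the same chance of being selected, which is the defining property of a simple random sample.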
The way a set of data is measured is called its level of measurement. Correct statistical procedures depend on a researcher being familiar with levels of measurement. Not every statistical operation can be used with every set of data. Data can be classified into four levels of measurement (from lowest to highest): nominal, ordinal, interval, and ratio.
Data that are measured using a nominal scale are qualitative. Categories, colors, names, labels, and favorite foods, along with yes or no responses, are examples of nominal level data. Nominal scale data are not ordered and cannot be used in calculations.
Example:

Data that are measured using an ordinal scale are similar to nominal scale data, but there is a big difference: ordinal scale data can be ordered. Like nominal scale data, however, ordinal scale data cannot be used in calculations.
Example:

Data that are measured using the interval scale are similar to ordinal level data because they have a definite ordering, but there is a difference: the differences between interval scale data can be measured. Interval scale data, however, have no natural starting point (no true zero).
Temperature scales like Celsius (C) and Fahrenheit (F) are measured by using the interval scale. In both temperature measurements, 40° is equal to 100° minus 60°. Differences make sense. But 0 degrees does not because, in both scales, 0 is not the absolute lowest temperature. Temperatures like −10° F and −15° C exist and are colder than 0.
Interval level data can be used in calculations, but ratio comparisons cannot be made: 80° C is not four times as hot as 20° C (nor is 80° F four times as hot as 20° F). There is no meaning to the ratio of 80 to 20 (or four to one).

Example:

Data that are measured using the ratio scale take care of the ratio problem and give you the most information. Ratio scale data are like interval scale data, but they have a true zero point, so ratios can be calculated. Ratio scale data cannot take negative values.
For example, four multiple-choice statistics final exam scores are 80, 68, 20, and 92 (out of a possible 100 points), given that the exams are machine-graded. The data can be put in order from lowest to highest: 20, 68, 80, 92. There are no negative values because the lowest possible score is 0 points. The differences between the data have meaning: the score 92 is more than the score 68 by 24 points. Ratios can also be calculated: since the smallest possible score is 0, a score of 80 is four times a score of 20, so a student who scores 80 points earned four times as many points as a student who scores 20.
Example:

“State & County QuickFacts,” U.S. Census Bureau. http://quickfacts.census.gov/qfd/download_data.html (accessed May 1, 2013).
“State & County QuickFacts: Quick, easy access to facts about people, business, and geography,” U.S. Census Bureau. http://quickfacts.census.gov/qfd/index.html (accessed May 1, 2013).
“Table 5: Direct hits by mainland United States Hurricanes (1851–2004),” National Hurricane Center, http://www.nhc.noaa.gov/gifs/table5.gif (accessed May 1, 2013).
“Levels of Measurement,” http://infinity.cos.edu/faculty/woodbury/stats/tutorial/Data_Levels.htm (accessed May 1, 2013).
Courtney Taylor, “Levels of Measurement,” about.com, http://statistics.about.com/od/HelpandTutorials/a/LevelsOfMeasurement.htm (accessed May 1, 2013).
David Lane. “Levels of Measurement,” Connexions, http://cnx.org/content/m10809/latest/ (accessed May 1, 2013).
Some calculations generate numbers that are artificially precise. It is not necessary to report a value to eight decimal places when the measures that generated that value were only accurate to the nearest tenth. Round off your final answer to one more decimal place than was present in the original data. This means that if you have data measured to the nearest tenth of a unit, report the final statistic to the nearest hundredth.
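As a sketch of this rounding rule (the data values here are made up for illustration):

```python
# Sketch of the rounding rule with made-up data measured to the nearest tenth.
data = [2.3, 4.1, 3.7]           # measured to the nearest tenth
mean = sum(data) / len(data)     # 3.3666... -- artificially precise
reported = round(mean, 2)        # report one more decimal place than the data
print(reported)                  # 3.37
```

The raw mean carries digits the measurements cannot support; rounding to the hundredths place (one more place than the tenths in the data) follows the rule stated above.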
In addition to rounding your answers, keep in mind the four levels of measurement described earlier when deciding which calculations are appropriate for your data.
When organizing data, it is important to know how many times a value appears. How many statistics students study five hours or more for an exam? What percent of families on our block own two pets? Frequency, relative frequency, and cumulative relative frequency are measures that answer questions like these.
The following table lists the different data values in ascending order and their frequencies.
DATA VALUE  FREQUENCY 

2  3 
3  5 
4  3 
5  6 
6  2 
7  1 
In this study, three students studied for two hours and five students studied for three hours.
A frequency is the number of times a value of the data occurs. According to the table, there are three students who studied two hours, five students who studied three hours, and so on. The sum of the values in the frequency column, 20, represents the total number of students included in the sample.

A relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes. To find the relative frequencies, divide each frequency by the total number of students in the sample–in this case, 20. Relative frequencies can be written as fractions, percents, or decimals.
Relative frequency = [latex]\frac{\text{frequency of the class}}{\text{total}}[/latex]
Cumulative relative frequency is the accumulation of the previous relative frequencies. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row, as shown in the table below.
Cumulative relative frequency = sum of previous relative frequencies + current class relative frequency
DATA VALUE  FREQUENCY  RELATIVE FREQUENCY  CUMULATIVE RELATIVE FREQUENCY 

2  3  [latex]\frac{3}{20}[/latex] or 0.15  0.15 
3  5  [latex]\frac{5}{20}[/latex] or 0.25  0.15 + 0.25 = 0.40 
4  3  [latex]\frac{3}{20}[/latex] or 0.15  0.40 + 0.15 = 0.55 
5  6  [latex]\frac{6}{20}[/latex] or 0.30  0.55 + 0.30 = 0.85 
6  2  [latex]\frac{2}{20}[/latex] or 0.10  0.85 + 0.10 = 0.95 
7  1  [latex]\frac{1}{20}[/latex] or 0.05  0.95 + 0.05 = 1.00 
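The two computed columns in the table above can be reproduced with a short Python sketch (illustrative, not part of the text):

```python
# Sketch reproducing the table above: frequencies -> relative and
# cumulative relative frequencies for the 20 sampled students.
frequencies = {2: 3, 3: 5, 4: 3, 5: 6, 6: 2, 7: 1}
total = sum(frequencies.values())        # 20 students in the sample
cumulative = 0.0
for value, freq in sorted(frequencies.items()):
    relative = freq / total              # relative frequency for this value
    cumulative += relative               # running sum of relative frequencies
    print(value, relative, round(cumulative, 2))
```

The final cumulative relative frequency is always 1 (or 100%), since the relative frequencies of all classes must sum to the whole sample.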
Height (inches)  Frequency 
59.95–61.95  5 
61.95–63.95  3 
63.95–65.95  15 
65.95–67.95  40 
67.95–69.95  17 
69.95–71.95  12 
71.95–73.95  7 
73.95–75.95  1 
Total = 100 
Height (Inches)  Frequency  Relative Frequency  Cumulative Relative Frequency 
59.95–61.95  5  [latex]\frac{5}{100}[/latex] or 0.05  0.05 
61.95–63.95  3  [latex]\frac{3}{100}[/latex] or 0.03  0.05 + 0.03 = 0.08 
63.95–65.95  15  [latex]\frac{15}{100}[/latex] or 0.15  0.08 + 0.15 = 0.23 
65.95–67.95  40  [latex]\frac{40}{100}[/latex] or 0.40  0.23 + 0.40 = 0.63 
67.95–69.95  17  [latex]\frac{17}{100}[/latex] or 0.17  0.63 + 0.17 = 0.80 
69.95–71.95  12  [latex]\frac{12}{100}[/latex] or 0.12  0.80 + 0.12 = 0.92 
71.95–73.95  7  [latex]\frac{7}{100}[/latex] or 0.07  0.92 + 0.07 = 0.99 
73.95–75.95  1  [latex]\frac{1}{100}[/latex] or 0.01  0.99 + 0.01 = 1.00 
Total = 100  Total = 1 
Rainfall (inches)  Frequency 
2.95–4.97  6 
4.97–6.99  7 
6.99–9.01  15 
9.01–11.03  8 
11.03–13.05  9 
13.05–15.07  5 
Rainfall (inches)  Frequency  Relative frequency  Cumulative relative frequency 
2.95–4.97  6  [latex]\frac{6}{50}[/latex] = 0.12  0.12 
4.97–6.99  7  [latex]\frac{7}{50}[/latex] = 0.14  0.12 + 0.14 = 0.26 
6.99–9.01  15  [latex]\frac{15}{50}[/latex] = 0.30  0.26 + 0.30 = 0.56 
9.01–11.03  8  [latex]\frac{8}{50}[/latex] = 0.16  0.56 + 0.16 = 0.72 
11.03–13.05  9  [latex]\frac{9}{50}[/latex] = 0.18  0.72 + 0.18 = 0.90 
13.05–15.07  5  [latex]\frac{5}{50}[/latex] = 0.10  0.90 + 0.10 = 1.00 
Year  Total Number of Deaths 
2000  231 
2001  21,357 
2002  11,685 
2003  33,819 
2004  228,802 
2005  88,003 
2006  6,605 
2007  712 
2008  88,011 
2009  1,790 
2010  320,120 
2011  21,953 
2012  768 
Total  823,356 
The table contains the total number of fatal motor vehicle traffic crashes in the United States for the period from 1994 to 2011.
Year  Total Number of Crashes  Year  Total Number of Crashes 

1994  36,254  2004  38,444 
1995  37,241  2005  39,252 
1996  37,494  2006  38,648 
1997  37,324  2007  37,435 
1998  37,107  2008  34,172 
1999  37,140  2009  30,862 
2000  37,526  2010  30,296 
2001  37,862  2011  29,757 
2002  38,491  Total  653,782 
2003  38,477 
The purpose of an experiment is to investigate the relationship between two variables. When one variable causes change in another, we call the first variable the explanatory variable. The affected variable is called the response variable. In a randomized experiment, the researcher manipulates values of the explanatory variable and measures the resulting changes in the response variable. The different values of the explanatory variable are called treatments. An experimental unit is a single object or individual to be measured.
The following video explains the difference between collecting data from observations and collecting data from experiments: https://www.youtube.com/watch?v=J_O7ibkX8Ik

You want to investigate the effectiveness of vitamin E in preventing disease. You recruit a group of subjects and ask them if they regularly take vitamin E. You notice that the subjects who take vitamin E exhibit better health on average than those who do not. Does this prove that vitamin E is effective in disease prevention? It does not. There are many differences between the two groups compared in addition to vitamin E consumption. People who take vitamin E regularly often take other steps to improve their health: exercise, diet, other vitamin supplements, choosing not to smoke. Any one of these factors could be influencing health. As described, this study does not prove that vitamin E is the key to disease prevention.
Additional variables that can cloud a study are called lurking variables. In order to prove that the explanatory variable is causing a change in the response variable, it is necessary to isolate the explanatory variable. The researcher must design her experiment in such a way that there is only one difference between groups being compared: the planned treatments. This is accomplished by the random assignment of experimental units to treatment groups. When subjects are assigned treatments randomly, all of the potential lurking variables are spread equally among the groups. At this point the only difference between groups is the one imposed by the researcher. Different outcomes measured in the response variable, therefore, must be a direct result of the different treatments. In this way, an experiment can prove a cause-and-effect connection between the explanatory and response variables.
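Random assignment itself is mechanically simple. The sketch below is a hypothetical illustration (the subject labels and group sizes are made up), showing how shuffling subjects before splitting them into groups leaves chance, rather than any lurking variable, to determine group membership:

```python
import random

# Hypothetical sketch: randomly assigning 20 subjects to treatment and control.
# Random assignment spreads potential lurking variables evenly across groups.
random.seed(42)                          # fixed seed only to make the demo repeatable
subjects = [f"subject_{i}" for i in range(20)]
random.shuffle(subjects)                 # randomize the order of subjects
treatment = subjects[:10]                # first half receives the active treatment
control = subjects[10:]                  # second half receives the placebo
print(len(treatment), len(control))      # 10 10
```

Because the split happens after shuffling, no characteristic of a subject can influence which group that subject lands in.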
The power of suggestion can have an important influence on the outcome of an experiment. Studies have shown that the expectation of the study participant can be as important as the actual medication. In one study of performance-enhancing drugs, researchers noted:
Results showed that believing one had taken the substance resulted in [performance] times almost as fast as those associated with consuming the drug itself. In contrast, taking the drug without knowledge yielded no significant performance increment.^{1}
When participation in a study prompts a physical response from a participant, it is difficult to isolate the effects of the explanatory variable. To counter the power of suggestion, researchers set aside one treatment group as a control group. This group is given a placebo treatment–a treatment that cannot influence the response variable. The control group helps researchers balance the effects of being in an experiment with the effects of the active treatments. Of course, if you are participating in a study and you know that you are receiving a pill which contains no actual medication, then the power of suggestion is no longer a factor. Blinding in a randomized experiment preserves the power of suggestion. When a person involved in a research study is blinded, he does not know who is receiving the active treatment(s) and who is receiving the placebo treatment. A double-blind experiment is one in which both the subjects and the researchers involved with the subjects are blinded.
Researchers have a responsibility to verify that proper methods are being followed. The report describing the investigation of Stapel’s fraud states that, “statistical flaws frequently revealed a lack of familiarity with elementary statistics.”^{3} Many of Stapel’s coauthors should have spotted irregularities in his data. Unfortunately, they did not know very much about statistical analysis, and they simply trusted that he was collecting and reporting data properly.
Many types of statistical fraud are difficult to spot. Some researchers simply stop collecting data once they have just enough to prove what they had hoped to prove. They don’t want to take the chance that a more extensive study would complicate their lives by producing data contradicting their hypothesis.
Professional organizations, like the American Statistical Association, clearly define expectations for researchers. There are even laws in the federal code about the use of research data.
When a statistical study uses human participants, as in medical studies, both ethics and the law dictate that researchers should be mindful of the safety of their research subjects. The U.S. Department of Health and Human Services oversees federal regulations of research studies with the aim of protecting participants. When a university or other research institution engages in research, it must ensure the safety of all human subjects. For this reason, research institutions establish oversight committees known as Institutional Review Boards (IRB). All planned studies must be approved in advance by the IRB. Key protections that are mandated by law include the following:
These ideas may seem fundamental, but they can be very difficult to verify in practice. Is removing a participant’s name from the data record sufficient to protect privacy? Perhaps the person’s identity could be discovered from the data that remains. What happens if the study does not proceed as planned and risks arise that were not anticipated? When is informed consent really necessary? Suppose your doctor wants a blood sample to check your cholesterol level. Once the sample has been tested, you expect the lab to dispose of the remaining blood. At that point the blood becomes biological waste. Does a researcher have the right to take it for use in a study?
It is important that students of statistics take time to consider the ethical questions that arise in statistical studies. How prevalent is fraud in statistical studies? You might be surprised—and disappointed. There is a website (www.retractionwatch.com) dedicated to cataloging retractions of study articles that have been proven fraudulent. A quick glance will show that the misuse of statistics is a bigger problem than most people realize.
Vigilance against fraud requires knowledge. Learning the basic theory of statistics will empower you to analyze statistical studies critically.
“Vitamin E and Health,” Nutrition Source, Harvard School of Public Health, http://www.hsph.harvard.edu/nutritionsource/vitamine/ (accessed May 1, 2013).
Stan Reents. “Don’t Underestimate the Power of Suggestion,” athleteinme.com, http://www.athleteinme.com/ArticleView.aspx?id=1053 (accessed May 1, 2013).
Ankita Mehta. “Daily Dose of Aspirin Helps Reduce Heart Attacks: Study,” International Business Times, July 21, 2011. Also available online at http://www.ibtimes.com/dailydoseaspirinhelpsreduceheartattacksstudy300443 (accessed May 1, 2013).
The Data and Story Library, http://lib.stat.cmu.edu/DASL/Stories/ScentsandLearning.html (accessed May 1, 2013).
M.L. Jackson et al., “Cognitive Components of Simulated Driving Performance: Sleep Loss Effects and Predictors,” Accident Analysis and Prevention Journal, Jan no. 50 (2013), http://www.ncbi.nlm.nih.gov/pubmed/22721550 (accessed May 1, 2013).
“Earthquake Information by Year,” U.S. Geological Survey. http://earthquake.usgs.gov/earthquakes/eqarchives/year/ (accessed May 1, 2013).
“Fatality Analysis Report Systems (FARS) Encyclopedia,” National Highway Traffic and Safety Administration. http://wwwfars.nhtsa.dot.gov/Main/index.aspx (accessed May 1, 2013).
Data from www.businessweek.com (accessed May 1, 2013).
Data from www.forbes.com (accessed May 1, 2013).
“America’s Best Small Companies,” http://www.forbes.com/bestsmallcompanies/list/ (accessed May 1, 2013).
U.S. Department of Health and Human Services, Code of Federal Regulations Title 45 Public Welfare Department of Health and Human Services Part 46 Protection of Human Subjects revised January 15, 2009. Section 46.111:Criteria for IRB Approval of Research.
“April 2013 Air Travel Consumer Report,” U.S. Department of Transportation, April 11 (2013), http://www.dot.gov/airconsumer/april2013airtravelconsumerreport (accessed May 1, 2013).
Lori Alden, “Statistics can be Misleading,” econoclass.com, http://www.econoclass.com/misleadingstats.html (accessed May 1, 2013).
Maria de los A. Medina, “Ethics in Statistics,” Based on “Building an Ethics Module for Business, Science, and Engineering Students” by Jose A. CruzCruz and William Frey, Connexions, http://cnx.org/content/m15555/latest/ (accessed May 1, 2013).
A poorly designed study will not produce reliable data. There are certain key components that must be included in every experiment. To eliminate lurking variables, subjects must be assigned randomly to different treatment groups. One of the groups must act as a control group, demonstrating what happens when the active treatment is not applied. Participants in the control group receive a placebo treatment that looks exactly like the active treatments but cannot influence the response variable. To preserve the integrity of the placebo, both researchers and subjects may be blinded. When a study is designed properly, the only difference between treatment groups is the one imposed by the researcher. Therefore, when groups respond differently to different treatments, the difference must be due to the influence of the explanatory variable.
“An ethics problem arises when you are considering an action that benefits you or some cause you support, hurts or reduces benefits to others, and violates some rule.”^{4} Ethical violations in statistics are not always easy to spot. Professional associations and federal agencies post guidelines for proper conduct. It is important that you learn basic statistical procedures so that you can recognize proper data analysis.
By the end of this chapter, the student should be able to:
Once you have collected data, what will you do with it? Data can be described and presented in many different formats. For example, suppose you are interested in buying a house in a particular area. You may have no clue about the house prices, so you might ask your real estate agent to give you a sample data set of prices. Looking at all the prices in the sample often is overwhelming. A better way might be to look at the median price and the variation of prices. The median and variation are just two ways that you will learn to describe data. Your agent might also provide you with a graph of the data.
In this chapter, you will study numerical and graphical ways to describe and display your data. This area of statistics is called "Descriptive Statistics." You will learn how to calculate, and even more importantly, how to interpret these measurements and graphs.
A statistical graph is a tool that helps you learn about the shape or distribution of a sample or a population. A graph can be a more effective way of presenting data than a mass of numbers because we can see where data clusters and where there are only a few data values. Newspapers and the Internet use graphs to show trends and to enable readers to compare facts and figures quickly. Statisticians often graph data first to get a picture of the data. Then, more formal tools may be applied.
Some of the types of graphs that are used to summarize and organize data are the dot plot, the bar graph, the histogram, the stemandleaf plot, the frequency polygon (a type of broken line graph), the pie chart, and the box plot. In this chapter, we will briefly look at stemandleaf plots, line graphs, and bar graphs, as well as frequency polygons, and time series graphs. Our emphasis will be on histograms and box plots.
NOTE: This book contains instructions for constructing a histogram and a box plot for the TI-83+ and TI-84 calculators. The Texas Instruments (TI) website provides additional instructions for using these calculators. 
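For readers without a TI calculator, the grouping that a stem-and-leaf plot performs can be sketched in Python. The `stem_and_leaf` helper below is hypothetical, written only to illustrate the idea for two-digit data like the tables that follow:

```python
from collections import defaultdict

# Sketch (not the TI calculator procedure): a hypothetical stem_and_leaf helper
# that groups each value's ones digit (leaf) under its tens digit (stem).
def stem_and_leaf(data):
    plot = defaultdict(list)
    for value in sorted(data):           # sorting puts the leaves in order
        plot[value // 10].append(value % 10)
    return dict(plot)

# The first three stems of the plot shown below:
print(stem_and_leaf([33, 42, 49, 49, 53, 55, 55]))
# -> {3: [3], 4: [2, 9, 9], 5: [3, 5, 5]}
```

Each stem collects the leading digit(s), and the sorted leaves show at a glance how the data values are distributed.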
Stem  Leaf 

3  3 
4  2 9 9 
5  3 5 5 
6  1 3 7 8 8 9 9 
7  2 3 4 8 
8  0 3 8 8 8 
9  0 2 4 4 4 4 6 
10  0 
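To illustrate how tables like these are built, here is a minimal Python sketch. The helper name `stem_and_leaf` is our own, and the score list below is simply one data set consistent with the first table above.

```python
# A minimal sketch of stem-and-leaf construction: split each value into
# a stem (leading digits) and a leaf (last digit), then group the leaves.
from collections import defaultdict

def stem_and_leaf(data):
    table = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 42 -> stem 4, leaf 2
        table[stem].append(leaf)
    return dict(table)

# One data set consistent with the first table above (stems 3 through 10):
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69,
          72, 73, 74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]
table = stem_and_leaf(scores)
```

Reading the result row by row reproduces the table: stem 3 has leaf 3, stem 4 has leaves 2, 9, 9, and so on down to stem 10 with leaf 0.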
Stem  Leaf 

3  2 2 3 4 8 
4  0 2 2 3 4 6 7 7 8 8 8 9 
5  0 0 1 2 2 2 3 4 6 7 7 
6  0 1 
Stem  Leaf 

0  5 7 
1  1 2 2 3 3 5 5 7 7 8 9 
2  0 2 5 6 8 8 8 
3  5 8 
4  4 8 9 
5  2 5 7 8 
6  
7  
8  0 
President  Age  President  Age  President  Age 

Washington  57  Lincoln  52  Hoover  54 
J. Adams  61  A. Johnson  56  F. Roosevelt  51 
Jefferson  57  Grant  46  Truman  60 
Madison  57  Hayes  54  Eisenhower  62 
Monroe  58  Garfield  49  Kennedy  43 
J. Q. Adams  57  Arthur  51  L. Johnson  55 
Jackson  61  Cleveland  47  Nixon  56 
Van Buren  54  B. Harrison  55  Ford  61 
W. H. Harrison  68  Cleveland  55  Carter  52 
Tyler  51  McKinley  54  Reagan  69 
Polk  49  T. Roosevelt  42  G.H.W. Bush  64 
Taylor  64  Taft  51  Clinton  47 
Fillmore  50  Wilson  56  G. W. Bush  54 
Pierce  48  Harding  55  Obama  47 
Buchanan  65  Coolidge  51 
President  Age  President  Age  President  Age 

Washington  67  Lincoln  56  Hoover  90 
J. Adams  90  A. Johnson  66  F. Roosevelt  63 
Jefferson  83  Grant  63  Truman  88 
Madison  85  Hayes  70  Eisenhower  78 
Monroe  73  Garfield  49  Kennedy  46 
J. Q. Adams  80  Arthur  56  L. Johnson  64 
Jackson  78  Cleveland  71  Nixon  81 
Van Buren  79  B. Harrison  67  Ford  93 
W. H. Harrison  68  Cleveland  71  Reagan  93 
Tyler  71  McKinley  58  
Polk  53  T. Roosevelt  60  
Taylor  65  Taft  72  
Fillmore  74  Wilson  67  
Pierce  64  Harding  57  
Buchanan  77  Coolidge  60 
Ages at Inauguration  Ages at Death  

9 9 8 7 7 7 6 3 2  4  6 9 
8 7 7 7 7 6 6 6 5 5 5 5 4 4 4 4 4 2 1 1 1 1 1 0  5  3 6 6 7 7 8 
9 5 4 4 2 1 1 1 0  6  0 0 3 3 4 4 5 6 7 7 7 8 
7  0 0 1 1 1 4 7 8 8 9  
8  0 1 3 5 8  
9  0 0 3 3 
Losses  Wins  Year  Losses  Wins  Year 

34  48  1968–1969  41  41  1989–1990 
34  48  1969–1970  39  43  1990–1991 
46  36  1970–1971  44  38  1991–1992 
46  36  1971–1972  39  43  1992–1993 
36  46  1972–1973  25  57  1993–1994 
47  35  1973–1974  40  42  1994–1995 
51  31  1974–1975  36  46  1995–1996 
53  29  1975–1976  26  56  1996–1997 
51  31  1976–1977  32  50  1997–1998 
41  41  1977–1978  19  31  1998–1999 
36  46  1978–1979  54  28  1999–2000 
32  50  1979–1980  57  25  2000–2001 
51  31  1980–1981  49  33  2001–2002 
40  42  1981–1982  47  35  2002–2003 
39  43  1982–1983  54  28  2003–2004 
42  40  1983–1984  69  13  2004–2005 
48  34  1984–1985  56  26  2005–2006 
32  50  1985–1986  52  30  2006–2007 
25  57  1986–1987  45  37  2007–2008 
32  50  1987–1988  35  47  2008–2009 
30  52  1988–1989  29  53  2009–2010 
Atlanta Hawks Wins and Losses  
Number of Wins  Number of Losses  
3  1  9 
9 8 8 6 5  2  5 5 9 
8 7 6 6 5 5 4 3 1 1 1 1 0  3  0 2 2 2 2 4 4 5 6 6 6 9 9 9 
8 8 7 6 6 6 3 3 3 2 2 1 1 0  4  0 0 1 1 2 4 5 6 6 7 7 8 9 
7 7 6 3 2 0 0 0 0  5  1 1 1 2 3 4 4 6 7 
6  9 
In a survey, 40 mothers were asked how many times per week a teenager must be reminded to do his or her chores.
Number of times teenager is reminded  Frequency 

0  2 
1  5 
2  8 
3  14 
4  7 
5  4 
Number of times in shop  Frequency 

0  7 
1  10 
2  14 
3  9 
By the end of 2011, Facebook had over 146 million users in the United States. The table shows three age groups, the number of users in each age group, and the proportion (%) of users in each age group. Construct a bar graph and a pie chart using this data.
Age groups  Number of Facebook users  Proportion (%) of Facebook users 

13–25  65,082,280  45% 
26–44  53,300,200  36% 
45–64  27,885,100  19% 
[revealanswer q="310877"]Bar Graph[/revealanswer] [hiddenanswer a="310877"]The bar graph has age groups represented on the x-axis and proportions on the y-axis.[/hiddenanswer] [revealanswer q="283830"]Pie Chart[/revealanswer] [hiddenanswer a="283830"][/hiddenanswer]
Age groups  Number of people  Proportion of population 

Children  67,059  19% 
Workingage adults  152,198  43% 
Retirees  131,662  38% 
The columns in the table contain: the race or ethnicity of students in U.S. public schools for the class of 2011, percentages for the Advanced Placement examinee population for that class, and percentages for the overall student population.
Race/Ethnicity  AP Examinee Population  Overall Student Population 

1 = Asian, Asian American or Pacific Islander  10.3%  5.7% 
2 = Black or African American  9.0%  14.7% 
3 = Hispanic or Latino  17.0%  17.6% 
4 = American Indian or Alaska Native  0.6%  1.1% 
5 = White  57.1%  59.2% 
6 = Not reported/other  6.0%  1.7% 
Park City is broken down into six voting districts. The table shows the percentage of the total registered voter population that lives in each district, as well as the percentage of the entire population that lives in each district.
District  Registered voter population  Overall city population 

1  15.5%  19.4% 
2  12.2%  15.6% 
3  9.8%  9.0% 
4  17.4%  18.5% 
5  22.8%  20.7% 
6  22.3%  16.8% 
Example 1: The following data are the heights (in inches to the nearest half inch) of 100 male semiprofessional soccer players. The heights are continuous data, since height is measured. 60; 60.5; 61; 61; 61.5; 63.5; 63.5; 63.5; 64; 64; 64; 64; 64; 64; 64; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5; 68; 68; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69.5; 69.5; 69.5; 69.5; 69.5; 70; 70; 70; 70; 70; 70; 70.5; 70.5; 70.5; 71; 71; 71; 72; 72; 72; 72.5; 72.5; 73; 73.5; 74. Construct a relative frequency table and histogram. 

[revealanswer q="946569"]Show Answer[/revealanswer]
[hiddenanswer a="946569"]The smallest data value is 60. Since the data with the most decimal places have one decimal place (for instance, 61.5), we want our starting point to have two decimal places. Since the numbers 0.5, 0.05, 0.005, etc. are convenient numbers, use 0.05 and subtract it from 60, the smallest value, for the convenient starting point.
60 – 0.05 = 59.95, which is more precise than, say, 61.5 by one decimal place.
The starting point is, then, 59.95. The largest value is 74, so 74 + 0.05 = 74.05 is the ending value.
Next, calculate the width of each bar or class interval.
To calculate this width, subtract the starting point from the ending value and divide by the number of bars (you must choose the number of bars you desire).
Suppose you choose eight bars.
[latex]\displaystyle\frac{{{74.05}-{59.95}}}{{8}}={1.76}[/latex]
The boundaries are:

Note: We will round up to two and make each bar or class interval two units wide. Rounding up to two is one way to prevent a value from falling on a boundary. Rounding to the next number is often necessary even if it goes against the standard rules of rounding. For this example, using 1.76 as the width would also work. A guideline that is followed by some for the width of a bar or class interval is to take the square root of the number of data values and then round to the nearest whole number, if necessary. For example, if there are 150 values of data, take the square root of 150 and round to 12 bars or intervals. 
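The boundary arithmetic above can be sketched in a few lines of Python (the variable names are our own):

```python
# Sketch of the convenient-boundary calculation for the 100 heights:
# offset the extremes by 0.05 and divide the range by the chosen
# number of bars.
smallest, largest, num_bars = 60, 74, 8

start = smallest - 0.05             # 59.95, the convenient starting point
end = largest + 0.05                # 74.05, the convenient ending value
width = (end - start) / num_bars    # 1.7625, rounded up to 2 in the text

# Class boundaries using the rounded width of 2:
boundaries = [round(start + i * 2, 2) for i in range(num_bars + 1)]
```

With the rounded width of 2, the boundaries run 59.95, 61.95, ..., 75.95.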
Example 2: The following data are the number of books bought by 50 part-time college students at ABC College. The number of books is discrete data, since books are counted. 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 4; 4; 4; 4; 4; 4; 5; 5; 5; 5; 5; 6; 6. Eleven students buy one book. Ten students buy two books. Sixteen students buy three books. Six students buy four books. Five students buy five books. Two students buy six books. Because the data are integers, subtract 0.5 from 1, the smallest data value, and add 0.5 to 6, the largest data value. Then the starting point is 0.5 and the ending value is 6.5. Next, calculate the width of each bar or class interval. If the data are discrete and there are not too many different values, a width that places the data values in the middle of the bar or class interval is the most convenient. Since the data consist of the numbers 1, 2, 3, 4, 5, 6, and the starting point is 0.5, a width of one places the 1 in the middle of the interval from 0.5 to 1.5, the 2 in the middle of the interval from 1.5 to 2.5, the 3 in the middle of the interval from 2.5 to 3.5, the 4 in the middle of the interval from _______ to _______, the 5 in the middle of the interval from _______ to _______, and the _______ in the middle of the interval from _______ to _______ . 
Solution:

Example 3: Using this data set, construct a histogram.


[revealanswer q="288404"]Show Answer[/revealanswer] [hiddenanswer a="288404"]Some values in this data set fall on boundaries for the class intervals. A value is counted in a class interval if it falls on the left boundary, but not if it falls on the right boundary. Different researchers may set up histograms for the same data in different ways. There is more than one correct way to set up a histogram.[/hiddenanswer] 
Practice Problem 1: The following data are the shoe sizes of 50 male students. The sizes are continuous data since shoe size is measured. Construct a histogram and calculate the width of each bar or class interval. Suppose you choose six bars. 9; 9; 9.5; 9.5; 10; 10; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5; 12; 12; 12; 12; 12; 12; 12; 12.5; 12.5; 12.5; 12.5; 14 [revealanswer q="872174"]Option 1: [/revealanswer] [hiddenanswer a="872174"] Smallest value: 9. Largest value: 14. Convenient starting value: 9 – 0.05 = 8.95. Convenient ending value: 14 + 0.05 = 14.05. [latex]\displaystyle\frac{{{14.05}-{8.95}}}{{6}}={0.85}[/latex] The calculation suggests using 0.85 as the width of each bar or class interval. You can also use an interval with a width equal to one.


Practice Problem 2: The following data are the number of sports played by 50 student athletes. The number of sports is discrete data since sports are counted. 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 3; 3; 3; 3; 3; 3; 3; 3. Solution: 20 student athletes play one sport. 22 student athletes play two sports. Eight student athletes play three sports. Fill in the blanks for the following sentence. Since the data consist of the numbers 1, 2, 3, and the starting point is 0.5, a width of one places the 1 in the middle of the interval 0.5 to _____, the 2 in the middle of the interval from _____ to _____, and the 3 in the middle of the interval from _____ to _____. [revealanswer q="917196"]Show Answer[/revealanswer] [hiddenanswer a="917196"] 1.5; 1.5 to 2.5; 2.5 to 3.5[/hiddenanswer] 
Frequency polygons are analogous to line graphs, and just as line graphs make continuous data visually easy to interpret, so too do frequency polygons.
To construct a frequency polygon, first examine the data and decide on the number of intervals, or class intervals, to use on the x-axis and y-axis. After choosing the appropriate ranges, begin plotting the data points. After all the points are plotted, draw line segments to connect them.
A frequency polygon was constructed from the frequency table below.
Frequency Distribution for Calculus Final Test Scores  

Lower Bound  Upper Bound  Frequency  Cumulative Frequency 
49.5  59.5  5  5 
59.5  69.5  10  15 
69.5  79.5  30  45 
79.5  89.5  40  85 
89.5  99.5  15  100 
The first label on the x-axis is 44.5. This represents an interval extending from 39.5 to 49.5. Since the lowest test score is 54.5, this interval is used only to allow the graph to touch the x-axis. The point labeled 54.5 represents the next interval, or the first “real” interval from the table, and contains five scores. This reasoning is followed for each of the remaining intervals, with the point 104.5 representing the interval from 99.5 to 109.5. Again, this interval contains no data and is only used so that the graph will touch the x-axis. Looking at the graph, we say that this distribution is skewed because one side of the graph does not mirror the other side.
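The plotted points described above can be sketched in Python (an illustration under our own variable names, not part of the original exercise):

```python
# Sketch: compute the frequency polygon's plotted points for the
# calculus test-score table, padding with an empty interval on each
# end so the polygon touches the x-axis.
bounds = [(49.5, 59.5), (59.5, 69.5), (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
freqs = [5, 10, 30, 40, 15]

# Each point is plotted at the interval midpoint: 54.5, 64.5, ..., 94.5.
midpoints = [(lo + hi) / 2 for lo, hi in bounds]
points = [(44.5, 0)] + list(zip(midpoints, freqs)) + [(104.5, 0)]
```

Connecting these (midpoint, frequency) points with line segments gives the polygon.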
We will construct an overlay frequency polygon comparing the scores with the students’ final numeric grade.
Frequency Distribution for Calculus Final Test Scores  

Lower Bound  Upper Bound  Frequency  Cumulative Frequency 
49.5  59.5  5  5 
59.5  69.5  10  15 
69.5  79.5  30  45 
79.5  89.5  40  85 
89.5  99.5  15  100 
Frequency Distribution for Calculus Final Grades  

Lower Bound  Upper Bound  Frequency  Cumulative Frequency 
49.5  59.5  10  10 
59.5  69.5  10  20 
69.5  79.5  30  50 
79.5  89.5  45  95 
89.5  99.5  5  100 
Suppose that we want to study the temperature range of a region for an entire month. Every day at noon we note the temperature and write this down in a log. A variety of statistical studies could be done with this data. We could find the mean or the median temperature for the month. We could construct a histogram displaying the number of days that temperatures reach a certain range of values. However, all of these methods ignore a portion of the data that we have collected.
Practice Problem 3:Construct a frequency polygon of U.S. Presidents’ ages at inauguration shown in the table.
[revealanswer q="12745"]Show Answer[/revealanswer]
[hiddenanswer a="12745"][/hiddenanswer]
Frequency polygons are useful for comparing distributions. This is achieved by overlaying the frequency polygons drawn for different data sets. 
The following data show the Annual Consumer Price Index, each month, for ten years. Construct a time series graph for the Annual Consumer Price Index data only.
Year  Jan  Feb  Mar  Apr  May  Jun  Jul 

2003  181.7  183.1  184.2  183.8  183.5  183.7  183.9 
2004  185.2  186.2  187.4  188.0  189.1  189.7  189.4 
2005  190.7  191.8  193.3  194.6  194.4  194.5  195.4 
2006  198.3  198.7  199.8  201.5  202.5  202.9  203.5 
2007  202.416  203.499  205.352  206.686  207.949  208.352  208.299 
2008  211.080  211.693  213.528  214.823  216.632  218.815  219.964 
2009  211.143  212.193  212.709  213.240  213.856  215.693  215.351 
2010  216.687  216.741  217.631  218.009  218.178  217.965  218.011 
2011  220.223  221.309  223.467  224.906  225.964  225.722  225.922 
2012  226.665  227.663  229.392  230.085  229.815  229.478  229.104 
Year  Aug  Sep  Oct  Nov  Dec  Annual 

2003  184.6  185.2  185.0  184.5  184.3  184.0 
2004  189.5  189.9  190.9  191.0  190.3  188.9 
2005  196.4  198.8  199.2  197.6  196.8  195.3 
2006  203.9  202.9  201.8  201.5  201.8  201.6 
2007  207.917  208.490  208.936  210.177  210.036  207.342 
2008  219.086  218.783  216.573  212.425  210.228  215.303 
2009  215.834  215.969  216.177  216.330  215.949  214.537 
2010  218.312  218.439  218.711  218.803  219.179  218.056 
2011  226.545  226.889  226.421  226.230  225.672  224.939 
2012  230.379  231.407  231.317  230.221  229.601  229.594 
CO2 Emissions  

Ukraine  United Kingdom  United States  
2003  352,259  540,640  5,681,664 
2004  343,121  540,409  5,790,761 
2005  339,029  541,990  5,826,394 
2006  327,797  542,045  5,737,615 
2007  328,357  528,631  5,828,697 
2008  323,657  522,247  5,656,839 
2009  272,176  474,579  5,299,563 
Time series graphs are important tools in various applications of statistics. When recording values of the same variable over an extended period of time, sometimes it is difficult to discern any trend or pattern. However, once the same data points are displayed graphically, some features jump out. Time series graphs make trends easy to spot.
Data on annual homicides in Detroit, 1961–1973, from Gunst and Mason, Regression Analysis and Its Application, Marcel Dekker.
“Timeline: Guide to the U.S. Presidents: Information on every president’s birthplace, political party, term of office, and more.” Scholastic, 2013. Available online at http://www.scholastic.com/teachers/article/timelineguideuspresidents (accessed April 3, 2013).
“Presidents.” Fact Monster. Pearson Education, 2007. Available online at http://www.factmonster.com/ipka/A0194030.html (accessed April 3, 2013).
“Food Security Statistics.” Food and Agriculture Organization of the United Nations. Available online at http://www.fao.org/economic/ess/essfs/en/ (accessed April 3, 2013).
“Consumer Price Index.” United States Department of Labor: Bureau of Labor Statistics. Available online at http://data.bls.gov/pdq/SurveyOutputServlet (accessed April 3, 2013).
“CO2 emissions (kt).” The World Bank, 2013. Available online at http://databank.worldbank.org/data/home.aspx (accessed April 3, 2013).
“Births Time Series Data.” General Register Office For Scotland, 2013. Available online at http://www.groscotland.gov.uk/statistics/theme/vitalevents/births/timeseries.html (accessed April 3, 2013).
“Demographics: Children under the age of 5 years underweight.” Indexmundi. Available online at http://www.indexmundi.com/g/r.aspx?t=50&v=2224&aml=en (accessed April 3, 2013).
Gunst, Richard, Robert Mason. Regression Analysis and Its Application: A DataOriented Approach. CRC Press: 1980.
“Overweight and Obesity: Adult Obesity Facts.” Centers for Disease Control and Prevention. Available online at http://www.cdc.gov/obesity/data/adult.html (accessed September 13, 2013).

A histogram is a graphic version of a frequency distribution. The graph consists of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars correspond to frequency values. Histograms are typically used for large, continuous, quantitative data sets. A frequency polygon can also be used when graphing large data sets with data points that repeat. The data values usually go on the x-axis, with the frequency graphed on the y-axis. Time series graphs can be helpful when looking at large amounts of data for one variable over a period of time.
1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1
Ordered from smallest to largest: 1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
Since there are 14 observations, the median is between the seventh value, 6.8, and the eighth value, 7.2. To find the median, add the two values together and divide by two. [latex]\displaystyle\frac{{{6.8}+{7.2}}}{{2}}={7}[/latex] The median is seven. 50% of the values are smaller than 7 and 50% of the values are larger than 7. To find the quartiles,

1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
Hence, the median = [latex]\frac{6.8 + 7.2}{2}[/latex] = 7
A data value is a potential outlier if and only if it is [latex]\begin{cases}\text{smaller than Q1 - 1.5 * IQR}\\\text{or}\\\text{larger than Q3 + 1.5 * IQR}\end{cases}[/latex] 
Class A: Order the data from smallest to largest. 34; 66; 67; 69; 69; 76; 77; 77; 79; 80; 81; 83; 85; 89; 90; 91; 94; 96; 98; 99 [latex]\displaystyle {Median}=\frac{{{80}+{81}}}{{2}}={80.5}[/latex] [latex]{Q}_{{1}}=\frac{{{69}+{76}}}{{2}}={72.5}[/latex] [latex]{Q}_{{3}}=\frac{{{90}+{91}}}{{2}}={90.5}[/latex] IQR = 90.5 – 72.5 = 18 
Class B: Order the data from smallest to largest. 39; 68; 70; 71; 72; 73; 75; 78; 79; 80; 80; 90; 90; 92; 92; 95; 95; 97; 129; 134 [latex]\displaystyle{Median}=\frac{{{80}+{80}}}{{2}}={80} [/latex] [latex]{Q}_{{1}}=\frac{{{72}+{73}}}{{2}}={72.5}[/latex] [latex]{Q}_{{3}}=\frac{{{92}+{95}}}{{2}}={93.5}[/latex] IQR = 93.5 – 72.5 = 21 
The following are 13 real estate prices. (Prices are in dollars.)
Order the data from smallest to largest.
114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,800; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000
Median = 488,800
Q_{1} = [latex]\frac{230,500+387,000}{2}[/latex] = 308,750
Q_{3} =[latex]\frac{639,000+659,000}{2}[/latex] = 649,000
IQR = 649,000 – 308,750 = 340,250
(1.5)(IQR) = (1.5)(340,250) = 510,375
Q_{1} – (1.5)(IQR) = 308,750 – 510,375 = –201,625
Q_{3} + (1.5)(IQR) = 649,000 + 510,375 = 1,159,375
No house price is less than –201,625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
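The quartile and fence calculations above can be sketched in Python. Note that this uses the text's simple median-of-halves quartile rule, not an interpolated percentile method, and the `median` helper and variable names are our own:

```python
# Sketch of the outlier check for the 13 real-estate prices.
prices = [114950, 158000, 230500, 387000, 389950, 479000, 488800,
          529000, 575000, 639000, 659000, 1095000, 5500000]

def median(values):
    values = sorted(values)
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

m = median(prices)            # 488,800 (the 7th of 13 ordered values)
q1 = median(prices[:6])       # median of the six values below the median
q3 = median(prices[7:])       # median of the six values above the median
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # -201,625
upper_fence = q3 + 1.5 * iqr  # 1,159,375
outliers = [p for p in prices if p < lower_fence or p > upper_fence]
```

The only value outside the fences is 5,500,000, matching the worked example.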
[/hiddenanswer]

Amount of Sleep per School Night (Hours)  Frequency  Relative Frequency  Cumulative Relative Frequency 

4  2  0.04  0.04 
5  5  0.10  0.14 
6  7  0.14  0.28 
7  12  0.24  0.52 
8  14  0.28  0.80 
9  7  0.14  0.94 
10  3  0.06  1.00 
Amount of time spent on route (hours)  Frequency  Relative Frequency  Cumulative Relative Frequency 

2  12  0.30  0.30 
3  14  0.35  0.65 
4  10  0.25  0.90 
5  4  0.10  1.00 
Guideline: When writing the interpretation of a percentile in the context of the given data, the sentence should contain the following information.

Note: You may encounter box-and-whisker plots that have dots marking outlier values. In those cases, the whiskers do not extend to the minimum and maximum values. 
Note: It is important to start a box plot with a scaled number line. Otherwise, the box plot may not be useful. 
Note: The words "mean" and "average" are often used interchangeably; the substitution of one word for the other is common practice. The technical term is "arithmetic mean," and "average" technically refers to a center location. However, in practice among non-statisticians, "average" is commonly accepted for "arithmetic mean." 
1; 1; 1; 2; 2; 3; 4; 4; 4; 4; 4
[latex]\displaystyle\overline{{x}}=\frac{{{1}+{1}+{1}+{2}+{2}+{3}+{4}+{4}+{4}+{4}+{4}}}{{11}}={2.7}[/latex]
[latex]\displaystyle\overline{{x}}=\frac{{{3}{({1})}+{2}{({2})}+{1}{({3})}+{5}{({4})}}}{{11}}={2.7}[/latex]
In the second calculation, each value is multiplied by its frequency: 3(1) + 2(2) + 1(3) + 5(4). You can quickly find the location of the median by using the expression [latex]\displaystyle\frac{{{n}+{1}}}{{2}}[/latex]. The letter n is the total number of data values in the sample. If n is an odd number, the median is the middle value of the ordered data (ordered smallest to largest). If n is an even number, the median is equal to the two middle values added together and divided by two after the data have been ordered. For example, if the total number of data values is 97, then [latex]\displaystyle\frac{{{n}+{1}}}{{2}}=\frac{{{97}+{1}}}{{2}}={49}[/latex] and the median is the 49th value in the ordered data. If the total number of data values is 100, then [latex]\displaystyle\frac{{{100}+{1}}}{{2}}={50.5}[/latex] and the median occurs midway between the 50th and 51st values. The location of the median and the value of the median are not the same. The upper case letter M is often used to represent the median. The next example illustrates the location of the median and the value of the median.
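The (n + 1)/2 location rule can be sketched as a tiny Python helper (our own, for illustration):

```python
# Sketch of the (n + 1)/2 rule for locating the median in ordered data.
def median_location(n):
    """Return the 1-based position of the median among n ordered values."""
    return (n + 1) / 2

# 97 values: the median is the 49th ordered value.
loc_odd = median_location(97)
# 100 values: the median falls midway between the 50th and 51st values.
loc_even = median_location(100)
```

A whole-number location points at one data value; a half location (like 50.5) means the median averages the two surrounding values.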
Finding the Mean and the Median Using the TI-83, 83+, 84, or 84+ Calculator

Note: The mode can be calculated for qualitative data as well as for quantitative data. For example, if the data set is: red, red, red, green, green, yellow, purple, black, blue, the mode is red. 
# of movies  Relative Frequency 
0  [latex]\displaystyle\frac{{5}}{{30}}\\[/latex] 
1  [latex]\displaystyle\frac{{15}}{{30}}\\[/latex] 
2  [latex]\displaystyle\frac{{6}}{{30}}\\[/latex] 
3  [latex]\displaystyle\frac{{3}}{{30}}\\[/latex] 
4  [latex]\displaystyle\frac{{1}}{{30}}\\[/latex] 
Grade Interval  Number of Students 
50–56.5  1 
56.5–62.5  0 
62.5–68.5  4 
68.5–74.5  4 
74.5–80.5  2 
80.5–86.5  3 
86.5–92.5  4 
92.5–98.5  1 
Grade Interval  Midpoint 
50.0–56.5  53.25 
56.5–62.5  59.5 
62.5–68.5  65.5 
68.5–74.5  71.5 
74.5–80.5  77.5 
80.5–86.5  83.5 
86.5–92.5  89.5 
92.5–98.5  95.5 
Hours Teenagers Spend on Video Games  Number of Teenagers 
0–3.5  3 
3.5–7.5  7 
7.5–11.5  12 
11.5–15.5  7 
15.5–19.5  9 
This data set can be represented by the following histogram. Each interval has width one, and each value is located in the middle of an interval.
Figure 1. The histogram displays a symmetrical distribution of data. A distribution is symmetrical if a vertical line can be drawn at some point in the histogram such that the shape to the left and the right of the vertical line are mirror images of each other. The mean, the median, and the mode are each seven for these data. In a perfectly symmetrical distribution, the mean and the median are the same. This example has one mode (unimodal), and the mode is the same as the mean and median. In a symmetrical distribution that has two modes (bimodal), the two modes would be different from the mean and median.
Consider the following data set: 6 7 7 7 7 8 8 8 9 10.
The histogram is also not symmetrical. It is skewed to the right. Figure 3. The mean is 7.7, the median is 7.5, and the mode is 7.
Of the three statistics, the mean is the largest, while the mode is the smallest. Again, the mean reflects the skewing the most. To summarize:
Skewness and symmetry become important when we discuss probability distributions in later chapters.
Here is a video that summarizes how the mean, median, and mode can help us describe the skewness of a data set. Don't worry about the terms leptokurtic and platykurtic for this course. https://youtu.be/s6N_l3BuMc

Statistics are used to compare and sometimes identify authors. The following lists show a simple random sample that compares the letter counts for three authors.
Terry: 7; 9; 3; 3; 3; 4; 1; 3; 2; 2
Davi: 3; 3; 3; 4; 1; 4; 3; 2; 3; 1
Mari: 2; 3; 4; 4; 4; 6; 6; 6; 8; 3
Discuss the mean, median, and mode for each of the following problems. Is there a pattern between the shape and measure of the center?
a.
b.
The Ages at Which Former U.S. Presidents Died  

4  6 9 
5  3 6 7 7 7 8 
6  0 0 3 3 4 4 5 6 7 7 7 8 
7  0 1 1 2 3 4 7 8 8 9 
8  0 1 3 5 8 
9  0 0 3 3 
Key: 8|0 means 80. 
c.
Looking at the distribution of data can reveal a lot about the relationship between the mean, the median, and the mode. There are three types of distributions. A right (or positive) skewed distribution has a shape like Figure 3. A left (or negative) skewed distribution has a shape like Figure 2. A symmetrical distribution looks like Figure 1.
7 is one standard deviation to the right of five because 5 + (1)(2) = 7.
If one were also part of the data set, then one is two standard deviations to the left of five because 5 + (–2)(2) = 1. In general, a value = mean + (# of STDEVs)(standard deviation).
Formulas for the Sample Standard Deviation: [latex]\displaystyle{s}=\sqrt{{\frac{{\sum{({x}-\overline{{x}})}^{{2}}}}{{{n}-{1}}}}}{\quad\text{or}\quad}{s}=\sqrt{{\frac{{\sum{(f){{({x}-\overline{{x}})}}}^{{2}}}}{{{n}-{1}}}}}[/latex] For the sample standard deviation, the denominator is n – 1, that is, the sample size minus 1. Formulas for the Population Standard Deviation: [latex]\displaystyle\sigma=\sqrt{{\frac{{\sum{({x}-\mu)}^{{2}}}}{{{N}}}}}{\quad\text{or}\quad}\sigma=\sqrt{{\frac{{\sum{(f){{({x}-\mu)}}}^{{2}}}}{{{N}}}}}[/latex] For the population standard deviation, the denominator is N, the number of items in the population. 
Note: In practice, use a calculator or computer software to calculate the standard deviation. If you are using a TI-83, 83+, or 84+ calculator, you need to select the appropriate standard deviation σ_{x} or s_{x} from the summary statistics. We will concentrate on using and interpreting the information that the standard deviation gives us. However, you should study the following step-by-step example to help you understand how the standard deviation measures variation from the mean. (The calculator instructions appear at the end of this example.) 
9; 9.5; 9.5; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5
1. The teacher was interested in the average age and the sample standard deviation of the ages of her students. [revealanswer q="80393"]Show Answer[/revealanswer] [hiddenanswer a="80393"] The average age is 10.525 years. The variance may be calculated by using a table. Then the standard deviation is calculated by taking the square root of the variance. We will explain the parts of the table after calculating s.
Data  Freq.  Deviations  Deviations^{2}  (Freq.)(Deviations^{2}) 

x  f  ( x – [latex]\overline{x}[/latex])  ( x –[latex]\overline{x}[/latex])^{2}  (f)(x –[latex]\overline{x}[/latex])^{2} 
9  1  9 – 10.525 = –1.525  (–1.525) ^{2} = 2.325625  1 × 2.325625 = 2.325625 
9.5  2  9.5 – 10.525 = –1.025  (–1.025) ^{2} = 1.050625  2 × 1.050625 = 2.101250 
10  4  10 – 10.525 = –0.525  (–0.525) ^{2} = 0.275625  4 × 0.275625 = 1.1025 
10.5  4  10.5 – 10.525 = –0.025  (–0.025) ^{2} = 0.000625  4 × 0.000625 = 0.0025 
11  6  11 – 10.525 = 0.475  (0.475) ^{2} = 0.225625  6 × 0.225625 = 1.35375 
11.5  3  11.5 – 10.525 = 0.975  (0.975) ^{2} = 0.950625  3 × 0.950625 = 2.851875 
The total is 9.7375 
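As a cross-check of the table, a short Python sketch reproduces the same mean, total, and sample standard deviation (the variable names are our own):

```python
# Sketch: sample standard deviation of the 20 ages, computed the same
# way as the table: sum of squared deviations, divided by n - 1, then
# square-rooted.
import math

ages = [9]*1 + [9.5]*2 + [10]*4 + [10.5]*4 + [11]*6 + [11.5]*3
n = len(ages)                                # 20
mean = sum(ages) / n                         # 10.525
total = sum((x - mean) ** 2 for x in ages)   # 9.7375, the table's total
variance = total / (n - 1)                   # 9.7375 / 19
s = math.sqrt(variance)                      # roughly 0.716
```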
Verify the mean and standard deviation on your calculator or computer:

Note: Your concentration should be on what the standard deviation tells us about the data. The standard deviation is a number that measures how far the data are spread from the mean. Let a calculator or computer do the arithmetic. 
Data  Frequency  Relative Frequency  Cumulative Relative Frequency 

33  1  0.032  0.032 
42  1  0.032  0.064 
49  2  0.065  0.129 
53  1  0.032  0.161 
55  2  0.065  0.226 
61  1  0.032  0.258 
63  1  0.032  0.29 
67  1  0.032  0.322 
68  2  0.065  0.387 
69  2  0.065  0.452 
72  1  0.032  0.484 
73  1  0.032  0.516 
74  1  0.032  0.548 
78  1  0.032  0.580 
80  1  0.032  0.612 
83  1  0.032  0.644 
88  3  0.097  0.741 
90  1  0.032  0.773 
92  1  0.032  0.805 
94  4  0.129  0.934 
96  1  0.032  0.966 
100  1  0.032  0.998 (Why isn't this value 1?) 
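A short Python sketch suggests the answer to the question in the last row: each relative frequency is rounded to three decimal places before being accumulated, so the rounding errors can leave the final cumulative value just below 1. (The frequency list below is transcribed from the table.)

```python
# Sketch: accumulate rounded relative frequencies, as the table does.
freqs = [1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 3, 1, 1, 4, 1, 1]
n = sum(freqs)                              # 31 data values
rounded = [round(f / n, 3) for f in freqs]  # 0.032, 0.065, 0.097, 0.129, ...
cumulative = round(sum(rounded), 3)         # 0.998, not 1, due to rounding
```

Summing the exact (unrounded) relative frequencies would, of course, give exactly 1.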
Class  Frequency 

0–2  1 
3–5  6 
6–8  10 
9–11  7 
12–14  0 
15–17  2 
Class  Frequency, f  Midpoint, m 

0–2  1  1 
3–5  6  4 
6–8  10  7 
9–11  7  10 
12–14  0  13 
15–17  2  16 
Class  Frequency, f  Midpoint, m  [latex]m-\overline{x}[/latex]  [latex](m-\overline{x})^2[/latex]  [latex](f)(m-\overline{x})^2[/latex] 

0–2  1  1  1 – 7.58 = –6.58  [latex](-6.58)^2[/latex] = 43.2964  (1)(43.2964) = 43.2964 
3–5  6  4  4 – 7.58 = –3.58  [latex](-3.58)^2[/latex] = 12.8164  (6)(12.8164) = 76.8984 
6–8  10  7  7 – 7.58 = –0.58  [latex](-0.58)^2[/latex] = 0.3364  (10)(0.3364) = 3.364 
9–11  7  10  10 – 7.58 = 2.42  [latex](2.42)^2[/latex] = 5.8564  (7)(5.8564) = 40.9948 
12–14  0  13  13 – 7.58 = 5.42  [latex](5.42)^2[/latex] = 29.3764  (0)(29.3764) = 0 
15–17  2  16  16 – 7.58 = 8.42  [latex](8.42)^2[/latex] = 70.8964  (2)(70.8964) = 141.7928 
Mean of Frequency Table: [latex]\overline{x}[/latex] = [latex]\frac{{\sum(fm)}}{{\sum(f)}}[/latex] Standard Deviation of Frequency Table: [latex]{s}_{x}=\sqrt{{\frac{{\sum{f}{(m-\overline{x})}^{2}}}{{n-1}}}}[/latex] where f = interval frequencies and m = interval midpoints. The calculations are tedious, so it is usually best to use technology when performing them. 
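The two grouped-data formulas in the box can be sketched in Python for this table (our own variable names; results match the table's values up to rounding):

```python
# Sketch: grouped-data mean and sample standard deviation from interval
# frequencies f and interval midpoints m.
import math

midpoints = [1, 4, 7, 10, 13, 16]
freqs = [1, 6, 10, 7, 0, 2]

n = sum(freqs)                                              # 26
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n     # about 7.58
ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
s = math.sqrt(ss / (n - 1))                                 # about 3.5
```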
Class  Frequency, f 

0–2  1 
3–5  6 
6–8  10 
9–11  7 
12–14  0 
15–17  2 
Data value, x  z-score of data value  
Sample  x = [latex]\overline{x}+(z)(s)[/latex]  z = [latex]\frac{x-\overline{x}}{s}[/latex] 
Population  x = [latex]\mu+(z)(\sigma)[/latex]  z = [latex]\frac{x-\mu}{\sigma}[/latex] 
Student  GPA  School Mean GPA  School Standard Deviation 

John  2.85  3.0  0.7 
Ali  77  80  10 
John  Ali 
z =[latex]\frac{{2.85  3.00}}{{0.7}}[/latex] = −0.21  z = [latex]\frac{{77  80}}{{10}}[/latex]= −0.3 
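The comparison above can be sketched in Python (the helper name `z_score` is our own). A higher, i.e. less negative, z-score indicates a better performance relative to one's own school:

```python
# Sketch: standardize each GPA with its own school's mean and standard
# deviation, then compare z-scores.
def z_score(x, mean, std_dev):
    return (x - mean) / std_dev

john = z_score(2.85, 3.0, 0.7)   # about -0.21
ali = z_score(77, 80, 10)        # -0.3

# John's z-score is less negative, so his GPA is better relative to school.
better = "John" if john > ali else "Ali"
```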
Swimmer  Time (seconds)  Team Mean Time  Team Standard Deviation 

Angie  26.2  27.2  0.8 
Beth  27.3  30.1  1.4 
By the end of this chapter, the student should be able to:
It is often necessary to "guess" about the outcome of an event in order to make a decision. Politicians study polls to guess their likelihood of winning an election. Teachers choose a particular course of study based on what they think students can comprehend. Doctors choose the treatments needed for various diseases based on their assessment of likely results. You may have visited a casino where people play games chosen because of the belief that the likelihood of winning is good. You may have chosen your course of study based on the probable availability of jobs.
You have, more than likely, used probability. In fact, you probably have an intuitive sense of probability. Probability deals with the chance of an event occurring. Whenever you weigh the odds of whether or not to do your homework or to study for an exam, you are using probability. In this chapter, you will learn how to solve probability problems using a systematic approach.

Your instructor will survey your class. Count the number of students in the class today.
Use the class data as estimates of the following probabilities. P(change) means the probability that a randomly chosen person in your class has change in his/her pocket or purse. P(bus) means the probability that a randomly chosen person in your class rode a bus within the last month and so on. Discuss your answers.
The probability of event A, P(A) = [latex]\frac{\text{number of outcomes with exactly one head}}{\text{total number of possible outcomes}}[/latex] = [latex]\frac{2}{4}[/latex] = 0.5.
The relative frequency of E approaches P(E) = [latex]\frac{\text{number of outcomes that are at least five}}{\text{total number of possible outcomes}}[/latex] = [latex]\frac{2}{6}[/latex] as the number of repetitions grows larger and larger.
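This long-run behavior (the law of large numbers) is easy to see in a short simulation. The sketch below rolls a fair die many times and compares the relative frequency of "at least five" to the theoretical 2/6; the seed is an arbitrary choice for reproducibility:

```python
import random

# Simulate many rolls of a fair six-sided die and track how often
# the outcome is at least five (i.e., a 5 or a 6).
random.seed(1)  # arbitrary seed so repeated runs match
rolls = 100_000
hits = sum(1 for _ in range(rolls) if random.randint(1, 6) >= 5)

relative_freq = hits / rolls  # should be close to 2/6 ≈ 0.3333
```

With 100,000 rolls the relative frequency typically lands within a few thousandths of 1/3.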
A OR B = {1, 2, 3, 4, 5, 6, 7, 8}. (Notice that 4 and 5 are NOT listed twice.)
A AND B = {4, 5}.
P(A) + P(A′) = 1.
For example: We have sample space S = {1, 2, 3, 4, 5, 6}. If event A = {1, 2, 3, 4}, then event A′ = {5, 6}. P(A) = [latex]\frac{{4}}{{6}}[/latex] and P(A′) = [latex]\frac{{2}}{{6}}[/latex], so P(A) + P(A′) = [latex]\frac{{4}}{{6}}+\frac{{2}}{{6}}={1}[/latex]

The conditional probability of A given B is [latex]\displaystyle{P}{({A}\mid{B})}=\frac{{{P}{({A}\text{ AND } {B})}}}{{{P}{({B})}}}[/latex] where P(B) is greater than zero.
For example: Suppose we toss one fair, six-sided die. The sample space S = {1, 2, 3, 4, 5, 6}. Let A = event that the face is 2 or 3, and B = event that the face is even. Then A = {2, 3} and B = {2, 4, 6}. To calculate P(A [latex]\mid[/latex] B), we count the number of outcomes 2 or 3 in the reduced sample space B = {2, 4, 6}. Then we divide that by the number of outcomes in B (rather than S). We get the same result by using the formula. Remember that S has six outcomes. A AND B = {2} (as 2 appears in both event A and event B.) [latex]\displaystyle{P}{({A}\mid{B})}=\frac{{{P}{({A}\text{ AND } {B})}}}{{{P}{({B})}}}=\frac{{\frac{\text{the number of outcomes in both A and B}}{\text{total outcomes}}}}{{\frac{\text{the number of outcomes in B}}{\text{total outcomes}}}}=\frac{{\frac{{1}}{{6}}}}{{\frac{{3}}{{6}}}}=\frac{{1}}{{3}}[/latex]

Right-handed  Left-handed  Total  

Males  43  9  43 + 9 = 52 
Females  44  4  44 + 4 = 48 
Total  43 + 44 = 87  9 + 4 = 13  100 
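Conditional probabilities can be read straight off a two-way table like the one above: condition on a row or column by dividing by that row or column total. A short sketch:

```python
# Counts from the two-way table above.
males_right, males_left = 43, 9
females_right, females_left = 44, 4

total_males = males_right + males_left      # row total: 52
total = total_males + females_right + females_left  # grand total: 100

# P(right-handed | male): restrict the sample space to the "male" row.
p_right_given_male = males_right / total_males  # 43/52

# Compare with the unconditional P(right-handed) = 87/100.
p_right = (males_right + females_right) / total
```

The conditional probability uses the row total (52), not the grand total (100), as its denominator.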
Hint:
You must show ONE of the following:

Hint:If G and H are independent, then you must show ONE of the following:

Interpretation of Results: The events of being female and having long hair are not independent; knowing that a student is female changes the probability that a student has long hair. 
Speeding violation in the last year  No speeding violation in the last year  Total  

Cell phone user  25  280  305 
Not a cell phone user  45  405  450 
Total  70  685  755 
Injury in last year  No injury in last year  Total  

Stretches  55  295  350 
Does not stretch  231  219  450 
Total  286  514  800 
Sex  The Coastline  Near Lakes and Streams  On Mountain Peaks  Total 

Female  18  16  ___  45 
Male  ___  ___  14  55 
Total  ___  41  ___  ___ 
Sex  The Coastline  Near Lakes and Streams  On Mountain Peaks  Total 

Female  18  16  11  45 
Male  16  25  14  55 
Total  34  41  25  100 
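The independence check described in the hints can be done numerically from the completed table. A sketch, using F = female and C = prefers the coastline:

```python
# Counts from the completed table above (grand total 100).
total = 100
p_f = 45 / total        # P(F): 45 females
p_c = 34 / total        # P(C): 34 prefer the coastline
p_f_and_c = 18 / total  # P(F AND C): 18 females prefer the coastline

# F and C are independent only if P(F AND C) equals P(F) * P(C).
independent = abs(p_f_and_c - p_f * p_c) < 1e-9
# P(F) * P(C) = 0.45 * 0.34 = 0.153, but P(F AND C) = 0.18,
# so F and C are not independent.
```

Because 0.18 ≠ 0.153, knowing a camper is female changes the probability that she prefers the coastline.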
Hint:Let F = being female and let C = preferring the coastline. Check if P(F AND C) = P(F) * P(C). If P(F AND C) = P(F) * P(C), then F and C are independent. If P(F AND C) [latex]\ne[/latex] P(F) * P(C), then F and C are not independent. 
Hint:Let M = being male, and let L = prefers hiking near lakes and streams.

Hint:Let F = being female, and let P = prefers mountain peaks.

Gender  Lake Path  Hilly Path  Wooded Path  Total 

Female  45  38  27  110 
Male  26  52  12  90 
Total  71  90  39  200 
Caught or Not  Door One  Door Two  Door Three  Total 

Caught  [latex]\displaystyle\frac{{1}}{{15}}[/latex]  [latex]\displaystyle\frac{{1}}{{12}}[/latex]  [latex]\displaystyle\frac{{1}}{{6}}[/latex]  ____ 
Not Caught  [latex]\displaystyle\frac{{4}}{{15}}[/latex]  [latex]\displaystyle\frac{{3}}{{12}}[/latex]  [latex]\displaystyle\frac{{1}}{{6}}[/latex]  ____ 
Total  ____  ____  ____  1 
Caught or Not  Door One  Door Two  Door Three  Total 

Caught  [latex]\displaystyle\frac{{1}}{{15}}[/latex]  [latex]\displaystyle\frac{{1}}{{12}}[/latex]  [latex]\displaystyle\frac{{1}}{{6}}[/latex]  [latex]\displaystyle\frac{{19}}{{60}}[/latex] 
Not Caught  [latex]\displaystyle\frac{{4}}{{15}}[/latex]  [latex]\displaystyle\frac{{3}}{{12}}[/latex]  [latex]\displaystyle\frac{{1}}{{6}}[/latex]  [latex]\displaystyle\frac{{41}}{{60}}[/latex] 
Total  [latex]\displaystyle\frac{{5}}{{15}}[/latex]  [latex]\displaystyle\frac{{4}}{{12}}[/latex]  [latex]\displaystyle\frac{{2}}{{6}}[/latex]  1 
Year  Robbery  Burglary  Rape  Vehicle  Total 

2008  145.7  732.1  29.7  314.7  
2009  133.1  717.7  29.1  259.2  
2010  119.3  701  27.7  239.1  
2011  113.7  702.2  26.8  229.6  
Total 
Weight/Height  Tall  Medium  Short  Totals 

Obese  18  28  14  
Normal  20  51  28  
Underweight  12  25  9  
Totals 
Weight/Height  Tall  Medium  Short  Totals 

Obese  18  28  14  60 
Normal  20  51  28  99 
Underweight  12  25  9  46 
Totals  50  104  51  205 
A tree diagram is a special type of graph used to determine the outcomes of an experiment. It consists of "branches" that are labeled with either frequencies or probabilities. Tree diagrams can make some probability problems easier to visualize and solve. The following example illustrates how to use a tree diagram.
R1R1 R1R2 R1R3 R2R1 R2R2 R2R3 R3R1 R3R2 R3R3
The other outcomes are similar.
There are a total of 11 balls in the urn. Draw two balls, one at a time, with replacement. There are 11(11) = 121 outcomes, the size of the sample space.[/hiddenanswer] b. List the 24 BR outcomes.[revealanswer q="959880"]Show Answer[/revealanswer] [hiddenanswer a="959880"] B1R1, B1R2, B1R3, B2R1, B2R2, B2R3, B3R1, B3R2, B3R3, B4R1, B4R2, B4R3, B5R1, B5R2, B5R3, B6R1, B6R2, B6R3, B7R1, B7R2, B7R3, B8R1, B8R2, B8R3[/hiddenanswer]
c. Using the tree diagram, calculate P(RR). [revealanswer q="660919"]Show Answer[/revealanswer] [hiddenanswer a="660919"]P(RR) = [latex](\frac{3}{11})(\frac{3}{11})[/latex] = [latex]\frac{9}{121}[/latex][/hiddenanswer] d. Using the tree diagram, calculate P(RB OR BR). [revealanswer q="284621"]Show Answer[/revealanswer] [hiddenanswer a="284621"]P(RB OR BR) = ([latex]\frac{3}{11}[/latex])([latex]\frac{8}{11}[/latex]) + ([latex]\frac{8}{11}[/latex])([latex]\frac{3}{11}[/latex]) = [latex]\frac{48}{121}[/latex] [/hiddenanswer] e. Using the tree diagram, calculate P(R on 1st draw AND B on 2nd draw). [revealanswer q="329409"]Show Answer[/revealanswer] [hiddenanswer a="329409"]P(R on 1st draw AND B on 2nd draw) =([latex]\frac{3}{11}[/latex])([latex]\frac{8}{11}[/latex]) = [latex]\frac{24}{121}[/latex] [/hiddenanswer] f. Using the tree diagram, calculate P(R on 2nd draw GIVEN B on 1st draw). [revealanswer q="787274"]Show Answer[/revealanswer] [hiddenanswer a="787274"]P(R on 2nd [latex]\mid[/latex] B on 1st) = [latex]\frac{24}{88}[/latex] = [latex]\frac{3}{11}[/latex] [/hiddenanswer] g. Using the tree diagram, calculate P(BB). [revealanswer q="929765"]Show Answer[/revealanswer] [hiddenanswer a="929765"]P(BB) = [latex]\frac{64}{121}[/latex] [/hiddenanswer] h. Using the tree diagram, calculate P(B on the 2nd draw given R on the first draw). [revealanswer q="532098"]Show Answer[/revealanswer] [hiddenanswer a="532098"]P(B on 2nd [latex]\mid[/latex] R on 1st) = [latex]\frac{8}{11}[/latex] There are 9 + 24 outcomes that have R on the first draw (9 RR and 24 RB). The sample space is then 9 + 24 = 33. 24 of the 33 outcomes have B on the second draw. Therefore, P(B on the 2nd draw given R on the first draw) = [latex]\frac{24}{33}[/latex] = [latex]\frac{8}{11}[/latex] [/hiddenanswer]

"Without replacement" means that you do not put the first ball back before you select the second ball. Following is a tree diagram for this situation. 
The branches are labeled with probabilities instead of frequencies. The numbers at the ends of the branches are calculated by multiplying the numbers on the two corresponding branches.
Note: If you draw a red on the first draw from the three red possibilities, there are two red balls left to draw on the second draw. You do not put back or replace the first ball after you have drawn it. You draw without replacement, so on the second draw there are ten balls left in the urn. 
a. P(RR) = ________ [revealanswer q="328182"]Show Answer[/revealanswer] [hiddenanswer a="328182"]P(RR) = [latex](\frac{3}{11})(\frac{2}{10}) = \frac{6}{110}[/latex][/hiddenanswer]
P(RB or BR) = ____________ [revealanswer q="735841"]Show Answer[/revealanswer] [hiddenanswer a="735841"]P(RB or BR) = [latex](\frac{3}{11})(\frac{8}{10})+(\frac{8}{11})(\frac{3}{10}) = \frac{48}{110}[/latex] [/hiddenanswer]
P(R on 1st AND B on 2nd) = P(RB) = (___)(___) = [latex]\frac{24}{110}[/latex] [revealanswer q="406859"]Show Answer[/revealanswer] [hiddenanswer a="406859"]P(R on 1st AND B on 2nd) = P(RB) = ([latex]\frac{3}{11}[/latex])([latex]\frac{8}{10}[/latex]) = [latex]\frac{24}{110}[/latex][/hiddenanswer]
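The without-replacement branch probabilities above can be checked with exact fractions. A sketch for the urn with 3 red (R) and 8 blue (B) balls:

```python
from fractions import Fraction as F

# Without replacement: the second-draw denominator drops from 11 to 10,
# and the numerator drops by one if the same color was drawn first.
p_rr = F(3, 11) * F(2, 10)                               # 6/110
p_rb_or_br = F(3, 11) * F(8, 10) + F(8, 11) * F(3, 10)   # 48/110
p_rb = F(3, 11) * F(8, 10)                               # 24/110
```

Fractions avoid rounding error entirely, which makes tree-diagram answers easy to verify.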
If we are using probabilities, we can label the tree in the following general way.
A Venn diagram is a picture that represents the outcomes of an experiment. It generally consists of a box that represents the sample space S together with circles or ovals. The circles or ovals represent events.
The sample space when you flip two fair coins is X = {HH, HT, TH, TT}. The outcome HH is in NEITHER A NOR B. The Venn diagram is as follows:
In a bookstore, the probability that the customer buys a novel is 0.6, and the probability that the customer buys a nonfiction book is 0.4. Suppose that the probability that the customer buys both is 0.2.
e. P(R on 2nd draw GIVEN B on 1st draw) = P(R on 2nd [latex]\mid[/latex] B on 1st) = [latex]\frac{24}{88}[/latex] = [latex]\frac{3}{11}[/latex]
This problem is a conditional one. The sample space has been reduced to those outcomes that already have a blue on the first draw. There are 24 + 64 = 88 possible outcomes (24 BR and 64 BB). Twenty-four of the 88 possible outcomes are BR: [latex]\frac{24}{88}[/latex] = [latex]\frac{3}{11}[/latex].
f. P(BB) = [latex]\frac{64}{121}[/latex]
g. P(B on 2nd draw [latex]\mid[/latex] R on 1st draw) = [latex]\frac{8}{11}[/latex]
There are 9 + 24 outcomes that have R on the first draw (9 RR and 24 RB). The sample space is then 9 + 24 = 33. 24 of the 33 outcomes have B on the second draw. The probability is then [latex]\frac{24}{33}[/latex] = [latex]\frac{8}{11}[/latex].
Solutions to Try These 2:

A student takes a ten-question, true-false quiz. Because the student had such a busy schedule, he or she could not study and guesses randomly at each answer. What is the probability of the student passing the test with at least a 70%?
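This quiz question is a binomial problem: each guess succeeds with probability 0.5, and passing means at least 7 correct out of 10. A sketch of the direct calculation:

```python
from math import comb

# X ~ B(10, 0.5): number of correct guesses on a 10-question
# true-false quiz. Passing requires at least 7 correct.
n, p = 10, 0.5
p_pass = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(7, n + 1))
# (120 + 45 + 10 + 1) / 1024 = 176/1024 ≈ 0.1719
```

Guessing gives roughly a 17% chance of passing, which is why this example is often used to discourage random guessing.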
Small companies might be interested in the number of long-distance phone calls their employees make during the peak time of the day. Suppose the average is 20 calls. What is the probability that the employees make more than 20 long-distance phone calls during the peak time?
These two examples illustrate two different types of probability problems involving discrete random variables. Recall that discrete data are data that you can count. A random variable describes the outcomes of a statistical experiment in words. The values of a random variable can vary with each repetition of an experiment.
Upper case letters such as X or Y denote a random variable. Lower case letters like x or y denote the value of a random variable. If X is a random variable, then X is written in words, and x is quantitative.
For example, let X = the number of heads you get when you toss three fair coins. The sample space for the toss of three fair coins is TTT; THH; HTH; HHT; HTT; THT; TTH;HHH. Then, x = 0, 1, 2, 3. X is in words and x is a number. Notice that for this example, the x values are countable outcomes. Because you can count the possible values that X can take on and the outcomes are random (the x values 0, 1, 2, 3), X is a discrete random variable.
P(x) = probability that X takes on a value x.
Probability distribution table for Example 1x  P(x) 

0  P(x = 0) = [latex]\frac{2}{50}[/latex] 
1  P(x = 1) = [latex]\frac{11}{50}[/latex] 
2  P(x = 2) = [latex]\frac{23}{50}[/latex] 
3  P(x = 3) = [latex]\frac{9}{50}[/latex] 
4  P(x = 4) = [latex]\frac{4}{50}[/latex] 
5  P(x = 5) = [latex]\frac{1}{50}[/latex] 
X takes on the values 0, 1, 2, 3, 4, 5. This is a discrete PDF because:
X  P(X) 
0  1% = 0.01 
1  4% = 0.04 
2  15% = 0.15 
3  80% = 0.80 
Jeremiah has basketball practice two days a week. Ninety percent of the time, he attends both practices. Eight percent of the time, he attends one practice. Two percent of the time, he does not attend either practice. What is X and what values does it take on?
[revealanswer q="932796"]Show Answer[/revealanswer] [hiddenanswer a="932796"] X is the number of days Jeremiah attends basketball practice per week. X takes on the values 0, 1, and 2.Number of days Jeremiah attends basketball practice per week, X  P(X) 
0  2% = 0.02 
1  8% = 0.08 
2  90% = 0.90 
The characteristics of a probability distribution function (PDF) for a discrete random variable are as follows:
c.
x  P(x) 

0  0.01 
1  0.04 
2  0.15 
3  0.80 
Expected Value, [latex]\mu[/latex] = [latex]{x}_{1}*P({x}_{1})[/latex] + [latex]{x}_{2}*P({x}_{2})[/latex] + [latex]{x}_{3}*P({x}_{3})[/latex] + .... = sum of all (x ⋅ P(x)) Variance, [latex]{\sigma}^{2}[/latex] = sum of all ((x – μ)^{2} ⋅ P(x)) Standard deviation, [latex]\sigma[/latex] = [latex]\sqrt{{\sigma}^{2}}[/latex] 
x  P(x)  x ⋅ P(x) 

0  0.2  (0)(0.2) = 0 
1  0.5  (1)(0.5) = 0.5 
2  0.3  (2)(0.3) = 0.6 
x (Number of times a newborn baby crying after midnight)  P(x)  x ⋅ P(x)  (x – μ)^{2} ⋅ P(x) 

0  [latex]\displaystyle{P}{({x}={0})}=\frac{{2}}{{50}}[/latex]  [latex]\displaystyle{({0})}{(\frac{{2}}{{50}})}=\frac{{0}}{{50}}[/latex]  [latex]\displaystyle{({0}-{2.1})}^{{2}}\cdot{0.04}={0.1764}[/latex] 
1  [latex]\displaystyle{P}{({x}={1})}=\frac{{11}}{{50}}[/latex]  [latex]\displaystyle{({1})}{(\frac{{11}}{{50}})}=\frac{{11}}{{50}}[/latex]  [latex]\displaystyle{({1}-{2.1})}^{{2}}\cdot{0.22}={0.2662}[/latex] 
2  [latex]\displaystyle{P}{({x}={2})}=\frac{{23}}{{50}}[/latex]  [latex]\displaystyle{({2})}{(\frac{{23}}{{50}})}=\frac{{46}}{{50}}[/latex]  [latex]\displaystyle{({2}-{2.1})}^{{2}}\cdot{0.46}={0.0046}[/latex] 
3  [latex]\displaystyle{P}{({x}={3})}=\frac{{9}}{{50}}[/latex]  [latex]\displaystyle{({3})}{(\frac{{9}}{{50}})}=\frac{{27}}{{50}}[/latex]  [latex]\displaystyle{({3}-{2.1})}^{{2}}\cdot{0.18}={0.1458}[/latex] 
4  [latex]\displaystyle{P}{({x}={4})}=\frac{{4}}{{50}}[/latex]  [latex]\displaystyle{({4})}{(\frac{{4}}{{50}})}=\frac{{16}}{{50}}[/latex]  [latex]\displaystyle{({4}-{2.1})}^{{2}}\cdot{0.08}={0.2888}[/latex] 
5  [latex]\displaystyle{P}{({x}={5})}=\frac{{1}}{{50}}[/latex]  [latex]\displaystyle{({5})}{(\frac{{1}}{{50}})}=\frac{{5}}{{50}}[/latex]  [latex]\displaystyle{({5}-{2.1})}^{{2}}\cdot{0.02}={0.1682}[/latex] 
sum of all (x ⋅ P(x) ) = 2.1  sum of all ( (x – μ)^{2} ⋅ P(x) ) = 1.05 
x  P(x) 

0  [latex]\displaystyle{P}{({x}={0})}=\frac{{4}}{{50}}[/latex] 
1  [latex]\displaystyle{P}{({x}={1})}=\frac{{8}}{{50}}[/latex] 
2  [latex]\displaystyle{P}{({x}={2})}=\frac{{16}}{{50}}[/latex] 
3  [latex]\displaystyle{P}{({x}={3})}=\frac{{14}}{{50}}[/latex] 
4  [latex]\displaystyle{P}{({x}={4})}=\frac{{6}}{{50}}[/latex] 
5  [latex]\displaystyle{P}{({x}={5})}=\frac{{2}}{{50}}[/latex] 
x  P(x)  x ⋅ P(x)  

Loss  –2  0.99999  (–2)(0.99999) = –1.99998 
Profit  100,000  0.00001  (100000)(0.00001) = 1 
x  ____  ____  

WIN  10  [latex]\displaystyle \frac{{1}}{{3}}[/latex]  ____ 
LOSE  ____  ____  [latex]\displaystyle -\frac{{12}}{{3}}[/latex] 
x  P(x)  x ⋅ P(x)  

WIN  10  [latex]\displaystyle \frac{{1}}{{3}} \\[/latex]  [latex]\displaystyle \frac{{10}}{{3}} \\[/latex] 
LOSE  –6  [latex]\displaystyle \frac{{2}}{{3}} \\[/latex]  [latex]\displaystyle -\frac{{12}}{{3}} \\[/latex] 
x  P(x)  x ⋅ P(x)  
Red  ____  ____  [latex]-\frac{{20}}{{5}}[/latex] 
Blue  ____  [latex]\frac{2}{5}[/latex]  ____ 
Green  10  ____  ____ 
x  P(x)  x ⋅ P(x)  

Red  –10  [latex]\frac{{2}}{{5}}[/latex]  [latex]-\frac{{20}}{{5}}[/latex] 
Blue  0  [latex]\frac{{2}}{{5}}[/latex]  [latex]\frac{{0}}{{5}}[/latex] 
Green  10  [latex]\frac{{1}}{{5}}[/latex]  [latex]\frac{{10}}{{5}}[/latex] 
x  P(x)  x ⋅ P(x)  (x – μ)^{2}⋅ P(x) 

0  0.2  (0)(0.2) = 0  (0 – 1.1 )^{2}⋅ (0.2) = 0.242 
1  0.5  (1)(0.5) = 0.5  (1 – 1.1 )^{2}⋅ (0.5) = 0.005 
2  0.3  (2)(0.3) = 0.6  (2 – 1.1 )^{2}⋅ (0.3) = 0.243 
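The table above computes μ, σ², and σ for the distribution with P(0) = 0.2, P(1) = 0.5, P(2) = 0.3. The same arithmetic as a short sketch:

```python
# Expected value, variance, and standard deviation of a discrete
# random variable, using the distribution from the table above.
xs = [0, 1, 2]
ps = [0.2, 0.5, 0.3]

mu = sum(x * p for x, p in zip(xs, ps))               # 0 + 0.5 + 0.6 = 1.1
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # 0.242 + 0.005 + 0.243 = 0.49
sd = var ** 0.5                                       # 0.7
```

Writing the sums this way mirrors the table columns x ⋅ P(x) and (x − μ)² ⋅ P(x) term by term.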
(1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)  (1, 6) 
(2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)  (2, 6) 
(3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)  (3, 6) 
(4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)  (4, 6) 
(5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)  (5, 6) 
(6, 1)  (6, 2)  (6, 3)  (6, 4)  (6, 5)  (6, 6) 
x  P(x)  x ⋅ P(x)  (x – μ)^{2} ⋅ P(x) 

0  [latex]\displaystyle\frac{{9}}{{36}}[/latex]  0  [latex]\displaystyle{({0}-{1})}^{{2}} \cdot \frac{{9}}{{36}}=\frac{{9}}{{36}}[/latex] 
1  [latex]\displaystyle\frac{{18}}{{36}}[/latex]  [latex]\displaystyle\frac{{18}}{{36}}[/latex]  [latex]\displaystyle{({1}-{1})}^{{2}} \cdot \frac{{18}}{{36}}={0}[/latex] 
2  [latex]\displaystyle\frac{{9}}{{36}}[/latex]  [latex]\displaystyle\frac{{18}}{{36}}[/latex]  [latex]\displaystyle{({2}-{1})}^{{2}} \cdot \frac{{9}}{{36}}=\frac{{9}}{{36}}[/latex] 
x  P(x)  x⋅ P(x)  (x – μ)^{2}⋅ P(x)  

win  50  0.2142  10.71  (50 – (–5.006))^{2} ⋅ (0.2142) = 648.0964 
loss  –20  0.7858  –15.716  (–20 – (–5.006))^{2}⋅ (0.7858) = 176.6636 
x  P(x)  x ⋅ P(x)  (x – μ)^{2} ⋅ P(x)  

win  100  0.0108  1.08  [100 – (–8.812)]^{2} ⋅ 0.0108 = 127.8726 
loss  –10  0.9892  –9.892  [–10 – (–8.812)]^{2} ⋅ 0.9892 = 1.3961 
[latex]X\sim{B}(n,p)[/latex]
Read this as "X is a random variable with a binomial distribution." The parameters are n and p; [latex]n=[/latex] number of trials, [latex]p=[/latex] probability of a success on each trial.

For example, with n = 20 and p = 0.41: [latex]\sigma=\sqrt{(20)(0.41)(0.59)}={2.20}\\[/latex]
[/hiddenanswer]

[latex]\sigma=\sqrt{npq}\\[/latex]
[latex]p+q=1[/latex]
[latex]q=1–p[/latex]
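These binomial formulas are quick to evaluate. A sketch using the n = 20, p = 0.41 values from the example above:

```python
from math import sqrt

# Mean and standard deviation of a binomial random variable X ~ B(n, p).
n, p = 20, 0.41
q = 1 - p            # probability of failure on each trial

mu = n * p           # expected number of successes: 8.2
sigma = sqrt(n * p * q)  # ≈ 2.20, matching the worked example
```

Note that q is not an extra parameter; it is always 1 − p.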
The mean of X is [latex]\mu=np[/latex]. The standard deviation of X is [latex]\sigma=\sqrt{{{n}{p}{q}}}\\[/latex]

https://youtu.be/xNLQuuvE9ug

[revealanswer q="609833"]Show References[/revealanswer] [hiddenanswer a="609833"]
"Access to electricity (% of population)," The World Bank, 2013. Available online at http://data.worldbank.org/indicator/EG.ELC.ACCS.ZS?order=wbapi_data_value_2009%20wbapi_data_value%20wbapi_data_valuefirst&sort=asc (accessed May 15, 2015).
"Distance Education." Wikipedia. Available online at http://en.wikipedia.org/wiki/Distance_education (accessed May 15, 2013).
"NBA Statistics – 2013," ESPN NBA, 2013. Available online at http://espn.go.com/nba/statistics/_/seasontype/2 (accessed May 15, 2013).
Newport, Frank. "Americans Still Enjoy Saving Rather than Spending: Few demographic differences seen in these views other than by income," GALLUP® Economy, 2013. Available online at http://www.gallup.com/poll/162368/americansenjoysavingratherspending.aspx (accessed May 15, 2013).
Pryor, John H., Linda DeAngelo, Laura Palucki Blake, Sylvia Hurtado, Serge Tran. The American Freshman: National Norms Fall 2011. Los Angeles: Cooperative Institutional Research Program at the Higher Education Research Institute at UCLA, 2011. Also available online at http://heri.ucla.edu/PDFs/pubs/TFS/Norms/Monographs/TheAmericanFreshman2011.pdf (accessed May 15, 2013).
"The World FactBook," Central Intelligence Agency. Available online at https://www.cia.gov/library/publications/theworldfactbook/geos/af.html (accessed May 15, 2013).
"What are the key statistics about pancreatic cancer?" American Cancer Society, 2013. Available online at http://www.cancer.org/cancer/pancreaticcancer/detailedguide/pancreaticcancerkeystatistics (accessed May 15, 2013).
[/hiddenanswer]

[latex]p = 0.1[/latex]
[latex]q = 0.9[/latex]
[/hiddenanswer]

[latex]X{\sim}G(p)[/latex]
Read this as "X is a random variable with a geometric distribution." The parameter is p; [latex]p=[/latex] the probability of a success for each trial.[latex]X{\sim}G(0.02)[/latex]
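A geometric probability such as P(x = 7) for X ~ G(0.02) can be computed directly from the pmf: six failures followed by the first success on trial 7. A sketch:

```python
# Geometric distribution: P(x = k) = (1 - p)^(k - 1) * p,
# the probability that the first success occurs on trial k.
p = 0.02
k = 7
p_x7 = (1 - p) ** (k - 1) * p  # ≈ 0.0177
```

This matches the calculator value quoted in the text.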
Find [latex]P(x=7)[/latex]. [latex]P(x=7)=0.0177[/latex]. To find the probability that [latex]x=7[/latex], compute [latex](0.98)^{6}(0.02)=0.0177[/latex].

The random variable [latex]X=[/latex] the number of occurrences in the interval of interest.
The probability question asks you to find [latex]P(x=3)[/latex].
[latex]X{\sim}P(\mu)[/latex]
Read this as "X is a random variable with a Poisson distribution." The parameter is μ (or λ); μ (or λ) [latex]=[/latex] the mean for the interval of interest.
[latex]x=0,1,2,3,...[/latex]
If Leah receives, on the average, six telephone calls in two hours, and there are eight 15 minute intervals in two hours, then Leah receives [latex](\frac{1}{8})(6)=0.75[/latex] calls in 15 minutes, on average. So, [latex]\mu=0.75[/latex] for this problem.
[latex]X ~ P(0.75)[/latex]
Find P(x > 1). P(x > 1) = 0.1734 (calculator or computer)
The TI calculators use λ (lambda) for the mean.
The probability that Leah receives more than one telephone call in the next 15 minutes is about 0.1734:
P(x > 1) = 1 − poissoncdf(0.75, 1).

The graph of X ~ P(0.75) is:
The y-axis contains the probability of x where X = the number of calls in 15 minutes.
A Poisson probability distribution of a discrete random variable gives the probability of a number of events occurring in a fixed interval of time or space, if these events happen at a known average rate and independently of the time since the last event. The Poisson distribution may be used to approximate the binomial, if the probability of success is "small" (less than or equal to 0.05) and the number of trials is "large" (greater than or equal to 20).
X ~ P(μ) means that X has a Poisson probability distribution where X = the number of occurrences in the interval of interest.
X takes on the values x = 0, 1, 2, 3, ...
The mean μ is typically given.
The variance is σ^{2} = μ, and the standard deviation is σ = [latex]\sqrt{\mu}[/latex].
When P(μ) is used to approximate a binomial distribution, μ = np where n represents the number of independent trials and p represents the probability of success in a single trial.
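As a check of the Leah example above (μ = 0.75 calls per 15 minutes), the Poisson pmf can be evaluated directly and P(x > 1) found by the complement rule. A sketch:

```python
from math import exp, factorial

# Poisson pmf: P(x = k) = e^(-mu) * mu^k / k!
def poisson_pmf(k, m):
    return exp(-m) * m ** k / factorial(k)

mu = 0.75
# P(x > 1) is the complement of P(x = 0) + P(x = 1).
p_more_than_1 = 1 - poisson_pmf(0, mu) - poisson_pmf(1, mu)  # ≈ 0.1734
```

This reproduces the 1 − poissoncdf(0.75, 1) calculator result from the text.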
By the end of this chapter, the student should be able to:
In this chapter, you will study the normal distribution, the standard normal distribution, and applications associated with them.
The normal distribution has two parameters (two numerical descriptive measures), the mean (μ) and the standard deviation (σ). If X is a quantity to be measured that has a normal distribution with mean (μ) and standard deviation (σ), we designate this by writing
The probability density function is a rather complicated function.
The curve is symmetrical about a vertical line drawn through the mean, μ. In theory, the mean is the same as the median, because the graph is symmetric about μ. As the notation indicates, the normal distribution depends only on the mean and the standard deviation. Since the area under the curve must equal one, a change in the standard deviation, σ, causes a change in the shape of the curve; the curve becomes fatter or skinnier depending on σ.
X ∼ N(μ, σ)
μ = the mean σ = the standard deviation
Notice that: 5 + (–0.67)(6) is approximately equal to 1. (This has the pattern μ + zσ = raw data.) 
2nd DISTR
. Then press 2:normalcdf
.
The syntax for the instructions are as follows: normalcdf (lower value, upper value, mean, standard deviation).
For this problem: normalcdf (65,10^99,63,5) and press "=".
The answer is 0.3446.

2nd > Distr, and choose
3:invNorm
(*Enter the area to the left of z followed by ), and press ENTER
.
For this Example, the steps are
2nd > Distr >
3:invNorm
(.6554) > ENTER.
The answer is 0.3999, which rounds to 0.4.
invNorm
.
The format for this function is invNorm(area to the left, mean, standard deviation)
For this problem, we need to input "invNorm(0.90,63,5)" and press "=".
The answer is 69.4.
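Both calculator results in this section can be reproduced without a TI calculator. This is a sketch using Python's standard library class `statistics.NormalDist` (available in Python 3.8+), for X ~ N(63, 5):

```python
from statistics import NormalDist

# X ~ N(63, 5): exam scores with mean 63 and standard deviation 5.
dist = NormalDist(mu=63, sigma=5)

# Equivalent of normalcdf(65, 10^99, 63, 5): area to the right of 65.
p_above_65 = 1 - dist.cdf(65)   # ≈ 0.3446

# Equivalent of invNorm(0.90, 63, 5): the 90th percentile.
k = dist.inv_cdf(0.90)          # ≈ 69.4
```

`cdf` gives area to the left, so a right-tail probability is 1 minus the cdf, just as the calculator uses 10^99 as an effectively infinite upper bound.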
So the 90th percentile is 69.4. This means that 90% of the test scores fall at or below 69.4 and 10% fall at or above.

By the end of this chapter, the student should be able to:
If you draw random samples of size n, then as n increases, the random variable [latex]\displaystyle\overline{{X}}[/latex], which consists of sample means, tends to be normally distributed.
[latex]\displaystyle\overline{{X}}[/latex] ~ N ([latex]\displaystyle{\mu}_{x}[/latex], [latex]\displaystyle\frac{{\sigma_{x}}}{{\sqrt{n}}}[/latex])
The central limit theorem for sample means says that if you keep drawing larger and larger samples (such as rolling one, two, five, and finally, ten dice) and calculating their means, the sample means form their own normal distribution (the sampling distribution). The normal distribution has the same mean as the original distribution and a variance that equals the original variance divided by the sample size. The variable n is the number of values that are averaged together, not the number of times the experiment is done. To put it more formally, if you draw random samples of size n, the distribution of the random variable [latex]\displaystyle\overline{{X}}[/latex], which consists of sample means, is called the sampling distribution of the mean. The sampling distribution of the mean approaches a normal distribution as the sample size n increases. The random variable [latex]\displaystyle\overline{{X}}[/latex] has a different z-score associated with it from that of the random variable X. The mean [latex]\displaystyle\overline{x}[/latex] is the value of [latex]\displaystyle\overline{X}[/latex] in one sample.

z = [latex]\displaystyle\frac{{\overline{x}-{\mu}_{x}}}{{\frac{{{\sigma}_{x}}}{{\sqrt{n}}}}}[/latex]
[latex]\displaystyle{\mu}_{x}[/latex] = [latex]\displaystyle{\mu}_{\overline{x}}[/latex] (mean of X = mean of [latex]\displaystyle\overline{X}[/latex]. ) [latex]\displaystyle{\sigma}_{\overline{x}} = {{\frac{{{\sigma}_{x}}}{{\sqrt{n}}}}}[/latex] = standard deviation of [latex]\displaystyle\overline{{X}}[/latex] and is called the standard error of the mean.
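The claim that the sample means have standard error σ/√n can be seen in a simulation sketch. The uniform(0, 1) population here is an arbitrary choice for illustration (its mean is 0.5 and its standard deviation is √(1/12) ≈ 0.2887):

```python
import random
from statistics import mean, stdev

# Draw many samples of size n from a uniform(0, 1) population and
# examine the distribution of the sample means.
random.seed(2)  # arbitrary seed for reproducibility
n = 50          # sample size
trials = 2000   # number of samples drawn

sample_means = [mean(random.uniform(0, 1) for _ in range(n))
                for _ in range(trials)]

# The means should center near 0.5 with spread near
# sigma / sqrt(n) = 0.2887 / sqrt(50) ≈ 0.0408.
center = mean(sample_means)
spread = stdev(sample_means)
```

Increasing n shrinks the spread by the factor √n, exactly as the formula for the standard error predicts.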
Guide for TI-Calculator:
To find probabilities for means on the TI-calculator, follow these steps:

Guide for TI-Calculator:
To find percentiles for means on the calculator, follow these steps.

TI-Calculator: normalcdf
(30,1E99,34,1.5)
The probability that the sample mean age is more than 30 = P([latex]\displaystyle\overline{X}[/latex] > 30) = 0.9962
Guide for TI-Calculator
To find probabilities for sums on the calculator, follow these steps.

Guide for TI-Calculator
To find percentiles for sums on the calculator, follow these steps.

Note: If you are being asked to find the probability of an individual value, do not use the CLT. Use the distribution of its random variable. 
Remember that the smallest stress score is one. 
k = 3.2
The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90% of all the means of 75 stress scores are at most 3.2, and that 10% are at least 3.2.

For problems c and d, let ΣX = the sum of the 75 stress scores. Then, [latex]\displaystyle\sum{X}{\sim}{N}{[{({75})}{({3})},{(\sqrt{{75}})}{({1.15})}]}[/latex] The mean of the sum of 75 stress scores is (75)(3) = 225. The standard deviation of the sum of 75 stress scores is [latex]\displaystyle{(\sqrt{{75}})}[/latex](1.15) = 9.96 
Remember, since the smallest single score is 1, it is theoretically possible to draw a score of 1 on all 75 draws. The smallest possible total of 75 stress scores is therefore 75. 
invNorm
(0.90,(75)(3),[latex]\displaystyle{(\sqrt{{75}})}[/latex](1.15))
k = 237.8
The 90th percentile for the sum of 75 scores is about 237.8.
This tells us that 90% of all the sums of 75 scores are no more than 237.8 and 10% are no less than 237.8.
normalcdf
[latex]\displaystyle{({20},{1}\text{E99},{22},\frac{{22}}{\sqrt{{80}}})}[/latex]
The probability is 0.7919 that the mean excess time used is more than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance.
Remember, 1E99 = 10^{99} and –1E99 = –10^{99}. Press the EE key for E. Or just use 10^{99} instead of 1E99. 
invNorm
= 26.0
normalcdf
(1.75,1.85,2,0.05)

normalcdf
(−E99,35,30.9,1.8)

normalcdf
(50,E99,30.9,1.8)
For this sample group, it is almost impossible for the group's average age to be more than 50. However, it is still possible for an individual in this group to have an age greater than 50.

normalcdf
(1600,E99,1514.10,63)

normalcdf
(−E99,1595,1514.10,63)
This means that there is a 90% chance that the sum of the ages for the sample group n = 49 is at most 1595.

invNorm
(0.95,30.9,1.1)
This indicates that 95% of the prostitutes in the sample of 65 are younger than 32.7 years, on average.

invNorm
(0.90,2008.5,72.56)
This indicates that 90% of the prostitutes in the sample of 65 have a sum of ages less than 2,101.5 years.

TI-83/84: normalcdf(149.5,10^99,159,8.6447).
TI-83/84: normalcdf(0,160.5,159,8.6447) = 0.5689
TI-83/84: normalcdf(155.5,10^99,159,8.6447) = 0.6572.
TI-83/84: normalcdf(0,146.5,159,8.6447) = 0.0741
TI-83/84: normalcdf(174.5,175.5,159,8.6447) = 0.0083
1 − binomialcdf(300,0.53,149) = 0.8641
binomialcdf(300,0.53,160) = 0.5684
1 − binomialcdf(300,0.53,155) = 0.6576
binomialcdf(300,0.53,146) = 0.0742
binomialpdf(300,0.53,175) = 0.0083 (You need to use the binomial pdf.)

We use sample data to make generalizations about an unknown population. This part of statistics is called inferential statistics. The sample data help us to make an estimate of a population parameter. We realize that the point estimate is most likely not the exact value of the population parameter, but close to it. After calculating point estimates, we construct interval estimates, called confidence intervals.
Throughout the chapter, it is important to keep in mind that the confidence interval is a random variable. It is the population parameter that is fixed.

https://www.youtube.com/embed/tFWsuO9f74o

Each of [latex]\displaystyle\overline{x}[/latex] and s is called a statistic.
A confidence interval is another type of estimate but, instead of being just one number, it is an interval of numbers. The interval of numbers is a range of values calculated from a given set of sample data. The confidence interval is likely to include an unknown population parameter.
Suppose, for the iTunes example, we do not know the population mean μ, but we do know that the population standard deviation is σ = 1 and our sample size is 100. Then, by the central limit theorem, the standard deviation for the sample mean is
[latex]\displaystyle\frac{{\sigma}}{{\sqrt{n}}}[/latex] = [latex]\frac{{1}}{{\sqrt{100}}}=0.1[/latex]
The empirical rule, which applies to bell-shaped distributions, says that in approximately 95% of the samples, the sample mean ([latex]\displaystyle\overline{x}[/latex]) will be within two standard deviations of the population mean μ. For our iTunes example, the standard deviation of the sample mean is 0.1, so two standard deviations is (2)(0.1) = 0.2. The sample mean [latex]\displaystyle\overline{x}[/latex] is likely to be within 0.2 units of μ. Because [latex]\displaystyle\overline{x}[/latex] is within 0.2 units of μ, which is unknown, then μ is likely to be within 0.2 units of [latex]\displaystyle\overline{x}[/latex] in 95% of the samples. The population mean μ is contained in an interval whose lower number is calculated by taking the sample mean and subtracting two standard deviations (2)(0.1) and whose upper number is calculated by taking the sample mean and adding two standard deviations. In other words, μ is between [latex]\displaystyle\overline{x}[/latex] − 0.2 and [latex]\displaystyle\overline{x}[/latex] + 0.2 in 95% of all the samples. For the iTunes example, suppose that a sample produced a sample mean [latex]\displaystyle\overline{x}[/latex] = 2. Then the unknown population mean μ is between [latex]\displaystyle\overline{x}[/latex] − 0.2 = 2 − 0.2 = 1.8 and [latex]\displaystyle\overline{x}[/latex] + 0.2 = 2 + 0.2 = 2.2. The 95% confidence interval is (1.8, 2.2). We say that we are 95% confident that the unknown population mean number of songs downloaded from iTunes per month is between 1.8 and 2.2. The 95% confidence interval implies two possibilities: either the interval (1.8, 2.2) contains the true mean μ, or our sample produced an [latex]\displaystyle\overline{x}[/latex] that is not within 0.2 units of the true mean μ. The second possibility happens for only 5% of all the samples (100% − 95%).

Remember that a confidence interval is created for an unknown population parameter like the population mean, μ. 
Confidence intervals for some parameters have the form:
(point estimate – margin of error, point estimate + margin of error)
The margin of error depends on the confidence level (the percentage of confidence) and the standard error of the mean. When you read newspapers and journals, some reports will use the phrase "margin of error." Other reports will not use that phrase but will instead present a confidence interval as the point estimate plus or minus the margin of error. These are two ways of expressing the same concept.
Note
Although the text only covers symmetrical confidence intervals, there are nonsymmetrical confidence intervals (for example, a confidence interval for the standard deviation). 
(point estimate – error bound, point estimate + error bound) or, in symbols, [latex]\displaystyle{(\overline{{x}} - {EBM},\overline{{x}} + {EBM})}[/latex]
The margin of error (EBM) depends on the confidence level (abbreviated CL). The confidence level is often considered the probability that the calculated confidence interval estimate will contain the true population parameter. However, it is more accurate to state that the confidence level is the percent of confidence intervals that contain the true population parameter when repeated samples are taken. Most often, it is the choice of the person constructing the confidence interval to choose a confidence level of 90% or higher because that person wants to be reasonably certain of his or her conclusions. There is another probability called alpha (α). α is related to the confidence level, CL. α is the probability that the interval does not contain the unknown population parameter.
Given that CL is the probability that the calculated confidence interval estimate contains the true population parameter, and α is the probability that the interval does not contain it, mathematically we can conclude that α + CL = 1.
General form of the confidence interval: [latex]\displaystyle{(\overline{{x}}-{EBM},\overline{{x}}+{EBM})}[/latex]
Calculating the Confidence Interval
To construct a confidence interval estimate for an unknown population mean, we need data from a random sample. The steps to construct and interpret the confidence interval are:

Note
Remember to use the area to the LEFT of [latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}}[/latex]; in this chapter the last two inputs in the invNorm command are 0, 1, because you are using a standard normal distribution Z ~ N(0, 1).
You can use technology to calculate the confidence interval directly. The first solution is shown step-by-step (Solution A). The second solution uses the TI-83, 83+, and 84+ calculators (Solution B).
[latex]\overline{x}={68}[/latex], [latex]\displaystyle\sigma=3[/latex], n = 36
[latex]{EBM}=({Z}_{\frac{{\alpha}}{{2}}})(\frac{{\sigma}}{{\sqrt{n}}})[/latex]
The confidence level is 90% (CL = 0.90)
CL = 0.90 so α = 1 – CL = 1 – 0.90 = 0.10, [latex]\displaystyle\frac{{\alpha}}{{2}}=0.05[/latex], and [latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}}={z}_{0.05}[/latex]
The area to the right of z_{0.05} is 0.05 and the area to the left of z_{0.05} is 1 – 0.05 = 0.95, so [latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}}={z}_{0.05}=1.645[/latex]
TI-83/84: invNorm(0.95, 0, 1). This can also be found using appropriate commands on other calculators, using a computer, or using a probability table for the standard normal distribution.
EBM = (1.645)([latex]\displaystyle\frac{{3}}{{\sqrt{36}}}[/latex]) = 0.8225
[latex]\displaystyle\overline{x}[/latex] - EBM = 68 - 0.8225 = 67.1775
[latex]\displaystyle\overline{x}[/latex] + EBM = 68 + 0.8225 = 68.8225
The 90% confidence interval is (67.1775, 68.8225).
Using the TI-83/84 calculator: Press STAT and arrow over to TESTS. Arrow down to 7:ZInterval and press ENTER. Arrow to Stats and press ENTER. Arrow down and enter 3 for σ, 68 for [latex]\displaystyle\overline{x}[/latex], 36 for n, and 0.90 for C-Level. Arrow down to Calculate and press ENTER. The confidence interval is (67.178, 68.822).
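The same interval can also be verified without a graphing calculator. This sketch uses Python's standard-library `statistics.NormalDist`; its `inv_cdf(0.95)` plays the role of invNorm(0.95, 0, 1):

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 68, 3, 36  # sample mean, known population SD, sample size
cl = 0.90                    # confidence level
alpha = 1 - cl

# Critical value: the area to the LEFT of z_{alpha/2} is 1 - alpha/2 = 0.95
z = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.645

ebm = z * sigma / math.sqrt(n)            # error bound, ~0.8224
print((round(x_bar - ebm, 2), round(x_bar + ebm, 2)))   # (67.18, 68.82)
```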
Interpretation
We estimate with 90% confidence that the true population mean exam score for all statistics students is between 67.18 and 68.82.
Explanation of 90% Confidence Level
Ninety percent of all confidence intervals constructed in this way contain the true mean statistics exam score. For example, if we constructed 100 of these confidence intervals, we would expect 90 of them to contain the true population mean exam score.
Phone Model  SAR  Phone Model  SAR  Phone Model  SAR 

Apple iPhone 4S  1.11  LG Ally  1.36  Pantech Laser  0.74 
BlackBerry Pearl 8120  1.48  LG AX275  1.34  Samsung Character  0.5 
BlackBerry Tour 9630  1.43  LG Cosmos  1.18  Samsung Epic 4G Touch  0.4 
Cricket TXTM8  1.3  LG CU515  1.3  Samsung M240  0.867 
HP/Palm Centro  1.09  LG Trax CU575  1.26  Samsung Messager III SCHR750  0.68 
HTC One V  0.455  Motorola Q9h  1.29  Samsung Nexus S  0.51 
HTC Touch Pro 2  1.41  Motorola Razr2 V8  0.36  Samsung SGHA227  1.13 
Huawei M835 Ideos  0.82  Motorola Razr2 V9  0.52  SGHa107 GoPhone  0.3 
Kyocera DuraPlus  0.78  Motorola V195s  1.6  Sony W350a  1.48 
Kyocera K127 Marbl  1.25  Nokia 1680  1.39  TMobile Concord  1.38 
You need to find z_{0.01} having the property that the area under the normal density curve to the right of z_{0.01} is 0.01 and the area to the left is 0.99. Use your calculator, a computer, or a probability table for the standard normal distribution to find z_{0.01} = 2.326.
EBM = ([latex]\displaystyle{z}_{0.01})(\frac{{\sigma}}{{\sqrt{n}}})=(2.326)\frac{{0.337}}{{\sqrt{30}}}=0.1431[/latex]
To find the 98% confidence interval, find[latex]\displaystyle\overline{x}\pm{EBM}[/latex]
[latex]\displaystyle\overline{x}[/latex] - EBM = 1.024 - 0.1431 = 0.8809
[latex]\displaystyle\overline{x}[/latex] + EBM = 1.024 + 0.1431 = 1.1671
We estimate with 98% confidence that the true SAR mean for the population of cell phones in the United States is between 0.8809 and 1.1671 watts per kilogram.
Phone Model  SAR  Phone Model  SAR 

Blackberry Pearl 8120  1.48  Nokia E71x  1.53 
HTC Evo Design 4G  0.8  Nokia N75  0.68 
HTC Freestyle  1.15  Nokia N79  1.4 
LG Ally  1.36  Sagem Puma  1.24 
LG Fathom  0.77  Samsung Fascinate  0.57 
LG Optimus Vu  0.462  Samsung Infuse 4G  0.2 
Motorola Cliq XT  1.36  Samsung Nexus S  0.51 
Motorola Droid Pro  1.39  Samsung Replenish  0.3 
Motorola Droid Razr M  1.3  Sony W518a Walkman  0.73 
Nokia 7705 Twist  0.7  ZTE C79  0.869 
[latex]\displaystyle\overline{x}[/latex] = 68, [latex]\displaystyle{\sigma}={3}[/latex], n = 36
EBM =([latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}})(\frac{{\sigma}}{{\sqrt{n}}}[/latex])
CL = 0.95 so α = 1 – CL = 1 – 0.95 = 0.05, [latex]\displaystyle\frac{{\alpha}}{{2}}=0.025[/latex], and [latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}}={z}_{0.025}[/latex]. The area to the right of z_{0.025} is 0.025 and the area to the left of z_{0.025} is 1 – 0.025 = 0.975, so z_{0.025} = 1.96.
TI-83/84: invNorm(0.975, 0, 1). This can also be found using appropriate commands on other calculators, using a computer, or using a probability table for the standard normal distribution.
EBM = (1.96)([latex]\displaystyle\frac{{3}}{{\sqrt{36}}}[/latex]) = 0.98
[latex]\displaystyle\overline{x}[/latex] - EBM = 68 - 0.98 = 67.02
[latex]\displaystyle\overline{x}[/latex] + EBM = 68 + 0.98 = 68.98
The 95% confidence interval is (67.02, 68.98).
Interpretation
We estimate with 95% confidence that the true population mean for all statistics exam scores is between 67.02 and 68.98.
Explanation of 95% Confidence Level
95% of all confidence intervals constructed in this way contain the true value of the population mean statistics exam score.
Comparing the Results
In Example 2, the 90% confidence interval is (67.18, 68.82). In Example 4, the 95% confidence interval is (67.02, 68.98). The 95% confidence interval is wider. If you look at the graphs, because the area 0.95 is larger than the area 0.90, it makes sense that the 95% confidence interval is wider. To be more confident that the confidence interval actually contains the true value of the population mean, the interval necessarily must be wider.
What happens to the error bound (EBM) if the sample size is changed?
In Example 2, we supposed that scores on exams in statistics are normally distributed with an unknown population mean and a population standard deviation of 3 points. A random sample of 36 scores was taken and gave a sample mean score of 68. We estimated with 90% confidence that the true population mean exam score for all statistics students is between 67.18 and 68.82.
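A quick numeric illustration of this question, holding the confidence level at 90% and σ = 3 as in Example 2 (the sample sizes other than 36 are illustrative):

```python
import math
from statistics import NormalDist

sigma = 3
z = NormalDist().inv_cdf(0.95)   # z_{0.05} ~ 1.645 for a 90% interval

# Error bound for the mean at several sample sizes: EBM shrinks as n grows
ebms = {n: z * sigma / math.sqrt(n) for n in (36, 100, 400)}
for n, ebm in ebms.items():
    print(n, round(ebm, 4))
# 36 0.8224
# 100 0.4935
# 400 0.2467
```

Because n enters through its square root, quadrupling the sample size only halves the error bound.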
[latex]\displaystyle\overline{X}{\sim}{N}\left({\mu}_{x}, \frac{{\sigma}}{{\sqrt{n}}}\right)[/latex]. The distribution of sample means is normally distributed with mean equal to the population mean and standard deviation given by the population standard deviation divided by the square root of the sample size.
The general form for a confidence interval for a single population mean, known standard deviation, normal distribution is given by
(lower bound, upper bound) = (point estimate – EBM, point estimate + EBM) = ([latex]\displaystyle\overline{x}[/latex] - EBM, [latex]\displaystyle\overline{x}[/latex] + EBM) = ([latex]\displaystyle\overline{x}-{z}_{\frac{{\alpha}}{{2}}}\frac{{\sigma}}{{\sqrt{n}}}, \overline{x}+{z}_{\frac{{\alpha}}{{2}}}\frac{{\sigma}}{{\sqrt{n}}}[/latex])
EBM = [latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}}\frac{{\sigma}}{{\sqrt{n}}}[/latex] = the error bound for the mean, or the margin of error for a single population mean; this formula is used when the population standard deviation is known.
CL = confidence level, or the proportion of confidence intervals created that are expected to contain the true population parameter
α = 1 – CL = the proportion of confidence intervals that will not contain the population parameter
[latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}}[/latex] = the z-score with the property that the area to the right of the z-score is [latex]\displaystyle\frac{{\alpha}}{{2}}[/latex]; this is the z-score used in the calculation of EBM, where α = 1 – CL.
n = [latex]\displaystyle\frac{{{z}^{2}{\sigma}^{2}}}{{{EBM}^{2}}}[/latex] = the formula used to determine the sample size (n) needed to achieve a desired margin of error at a given level of confidence
General form of a confidence interval
(lower value, upper value) = (point estimate−error bound, point estimate + error bound)
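Working backward from a reported interval is just as mechanical; this sketch (assuming the 90% interval (67.18, 68.82) from the exam-score example) recovers the error bound and the point estimate:

```python
# A reported confidence interval (from the exam-score example)
lower, upper = 67.18, 68.82

error_bound = (upper - lower) / 2     # half the width of the interval
point_estimate = (upper + lower) / 2  # the midpoint, i.e. the sample mean
print(round(error_bound, 2), round(point_estimate, 2))   # 0.82 68.0
```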
To find the error bound when you know the confidence interval:
error bound = upper value − point estimate OR error bound = [latex]\displaystyle\frac{{\text{upper value}-\text{lower value}}}{{2}}[/latex]
Single Population Mean, Known Standard Deviation, Normal Distribution
Use the Normal Distribution for Means when the Population Standard Deviation is Known: EBM = [latex]\displaystyle{z}_{\frac{{\alpha}}{{2}}}\cdot\frac{{\sigma}}{{\sqrt{n}}}[/latex]. The confidence interval has the format ([latex]\displaystyle\overline{x}[/latex] - EBM, [latex]\displaystyle\overline{x}[/latex] + EBM).
When the population standard deviation is unknown, we use the Student's t-distribution: T ~ t_{df} where df = n – 1.
For example, if we have a sample of size n = 20 items, then we calculate the degrees of freedom as df = n  1 = 20  1 = 19 and we write the distribution as T ~ t_{19}.
If the population standard deviation is not known, the error bound for a population mean is: EBM = [latex]\displaystyle({t}_{\frac{{\alpha}}{{2}}})(\frac{{s}}{{\sqrt{n}}})[/latex]
The format for the confidence interval is: ([latex]\displaystyle\overline{x}[/latex]  EBM, [latex]\displaystyle\overline{x}[/latex] + EBM)
In other words, the formula for the confidence interval is
([latex]\displaystyle\overline{x}[/latex] - [latex]\displaystyle({t}_{\frac{{\alpha}}{{2}}})(\frac{{s}}{{\sqrt{n}}})[/latex], [latex]\displaystyle\overline{x}[/latex] + [latex]\displaystyle({t}_{\frac{{\alpha}}{{2}}})(\frac{{s}}{{\sqrt{n}}})[/latex])
Calculate the Confidence Interval by using a TI calculator: Press STAT. Arrow over to TESTS. Arrow down to 8:TInterval and press ENTER (or just press 8).
8.6 9.4 7.9 6.8 8.3 7.3 9.2 9.6 8.7 11.4 10.3 5.4 8.1 5.5 6.9

[latex]\displaystyle\overline{X}[/latex] = 8.2267, s = 1.6722, n = 15, df = 15 – 1 = 14
CL = 0.95. Then α = 1 – CL = 1 – 0.95 = 0.05
[latex]\displaystyle\frac{{\alpha}}{{2}}[/latex] = 0.025
[latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}} = {t}_{0.025}[/latex]
The area to the right of [latex]\displaystyle{t}_{0.025}[/latex] is 0.025, and the area to the left of [latex]\displaystyle{t}_{0.025}[/latex] is 1 – 0.025 = 0.975
TI-84+ calculator: invT(0.975, 14), or use a t-table to find [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}[/latex] when df = 14.
[latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}={t}_{0.025}={2.14}[/latex]
EBM = [latex]\displaystyle({t}_{\frac{{\alpha}}{{2}}})(\frac{{s}}{{\sqrt{n}}})[/latex]
Hence, EBM = [latex]\displaystyle(2.14)(\frac{{1.6722}}{{\sqrt{15}}})={0.924}[/latex]
[latex]\displaystyle\overline{x}[/latex] - EBM = 8.2267 - 0.9240 = 7.30
[latex]\displaystyle\overline{x}[/latex] + EBM = 8.2267 + 0.9240 = 9.15
The 95% confidence interval is (7.30, 9.15). We estimate with 95% confidence that the true population mean sensory rate is between 7.30 and 9.15.
Using the TI-83/84 calculator: Press STAT and arrow over to TESTS. Arrow down to 8:TInterval and press ENTER (or you can just press 8). Arrow to Data and press ENTER. Arrow down to List and enter the list name where you put the data. There should be a 1 after Freq. Arrow down to C-Level and enter 0.95. Arrow down to Calculate and press ENTER. The 95% confidence interval is (7.3006, 9.1527).
Note: When calculating the error bound, a probability table for the Student's t-distribution can also be used to find the value of t. The table gives t-scores that correspond to the confidence level (column) and degrees of freedom (row); the t-score is found where the row and column intersect in the table.
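The sensory-rate example can also be reproduced in a short Python sketch. The standard library has no Student's t quantile function, so the critical value t_{0.025} for df = 14 is taken from a t-table (2.145; the text rounds it to 2.14), which is why the last digits differ slightly from the calculator's (7.3006, 9.1527):

```python
import math
from statistics import mean, stdev

data = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7, 11.4,
        10.3, 5.4, 8.1, 5.5, 6.9]

n = len(data)                    # 15, so df = n - 1 = 14
x_bar = mean(data)               # ~8.2267
s = stdev(data)                  # sample standard deviation, ~1.6722

t_crit = 2.145                   # t_{0.025} with df = 14, from a t-table
ebm = t_crit * s / math.sqrt(n)  # ~0.926
print((round(x_bar - ebm, 4), round(x_bar + ebm, 4)))   # (7.3005, 9.1528)
```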
8.2; 9.1; 7.7; 8.6; 6.9; 11.2; 10.1; 9.9; 8.9; 9.2; 7.5; 10.5
[practicearea rows="1"][/practicearea] [revealanswer q="52249"]Show Answer[/revealanswer] [hiddenanswer a="52249"](8.1634, 9.8032)[/hiddenanswer]
79  145  147  160  116  100  159  151  156  126 
137  83  156  94  121  144  123  114  139  99 
From the sample, you can calculate [latex]\displaystyle\overline{x}[/latex]=127.45
and s = 25.965. There are 20 infants in the sample, so n = 20, and df = 20 – 1 = 19. You are asked to calculate a 90% confidence interval: CL = 0.90, so [latex]\displaystyle\alpha[/latex] = 1 – CL = 1 – 0.90 = 0.10 and [latex]\displaystyle\frac{{\alpha}}{{2}}[/latex] = 0.05
[latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}={t}_{0.05}[/latex]
By definition, the area to the right of t_{0.05} is 0.05 and so the area to the left of t_{0.05} is 1 – 0.05 = 0.95. Use a table, calculator, or computer to find that t_{0.05} = 1.729.
EBM = [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}(\frac{{s}}{{\sqrt{n}}})={(1.729)}(\frac{{25.965}}{{\sqrt{20}}})={10.038}[/latex]
[latex]\displaystyle\overline{x}[/latex] - EBM = 127.45 - 10.038 = 117.412
[latex]\displaystyle\overline{x}[/latex] + EBM = 127.45 + 10.038 = 137.488
We estimate with 90% confidence that the mean number of all targeted industrial chemicals found in cord blood in the United States is between 117.412 and 137.488.
Using the TI-83/84 calculator: Press STAT and arrow over to TESTS. Arrow down to 8:TInterval and press ENTER (or you can just press 8). Arrow to Data and press ENTER. Arrow down to List and enter the list name where you put the data. Arrow down to Freq and enter 1. Arrow down to C-Level and enter 0.90. Arrow down to Calculate and press ENTER. The 90% confidence interval is (117.41, 137.49).
0  3  1  20  9 
5  10  1  10  4 
14  2  4  4  5 
[latex]\displaystyle\overline{x}[/latex] = 6.133, s = 5.514, n = 15, and df = 15 – 1 = 14
CL = 0.98, so [latex]\displaystyle\alpha[/latex] = 1 – CL = 1 – 0.98 = 0.02 and [latex]\displaystyle\frac{{\alpha}}{{2}}[/latex] = 0.01
[latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}={t}_{0.01}[/latex]
[latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}={t}_{0.01}={2.624}[/latex]
EBM = [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}(\frac{{s}}{{\sqrt{n}}})={(2.624)}(\frac{{5.514}}{{\sqrt{15}}})={3.736}[/latex]
[latex]\displaystyle\overline{x}[/latex] - EBM = 6.133 - 3.736 = 2.397
[latex]\displaystyle\overline{x}[/latex] + EBM = 6.133 + 3.736 = 9.869
We estimate with 98% confidence that the mean number of hours that statistics students spend watching television in one week is between 2.397 and 9.869.
Using the TI-83/84 calculator: Press STAT and arrow over to TESTS. Arrow down to 8:TInterval and press ENTER. Arrow to Data and press ENTER. Arrow down and enter the name of the list where the data is stored. Enter Freq: 1. Enter C-Level: 0.98. Arrow down to Calculate and press ENTER. The 98% confidence interval is (2.3965, 9.8702).
[latex]\displaystyle\frac{{\overline{x}-\mu}}{{\frac{{s}}{{\sqrt{n}}}}}[/latex]
The t-score follows the Student's t-distribution with n – 1 degrees of freedom. The confidence interval under this distribution is calculated with EBM = [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}(\frac{{s}}{{\sqrt{n}}})[/latex], where [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}[/latex] is the t-score with area to the right equal to [latex]\displaystyle\frac{{\alpha}}{{2}}[/latex], s is the sample standard deviation, and n is the sample size. Use a table, calculator, or computer to find [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}[/latex] for a given α.
t = [latex]\displaystyle\frac{{\overline{x}-\mu}}{{\frac{{s}}{{\sqrt{n}}}}}[/latex]
is the formula for the t-score, which measures how far away a measure is from the population mean in the Student's t-distribution
df = n – 1; the degrees of freedom for a Student's t-distribution, where n represents the size of the sample
T ~ t_{df}; the random variable, T, has a Student's t-distribution with df degrees of freedom
EBM = [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}(\frac{{s}}{{\sqrt{n}}})[/latex] = the error bound for the population mean when the population standard deviation is unknown, where [latex]\displaystyle{t}_{\frac{{\alpha}}{{2}}}[/latex] is the t-score in the Student's t-distribution with area to the right equal to [latex]\displaystyle\frac{{\alpha}}{{2}}[/latex]
The general form for a confidence interval for a single mean, population standard deviation unknown, Student's t, is given by (lower bound, upper bound) = (point estimate – EBM, point estimate + EBM) = [latex]\displaystyle(\overline{x} -\frac{{ts}}{{\sqrt{n}}},\overline{x} +\frac{{ts}}{{\sqrt{n}}})[/latex]
If X is a binomial random variable, then [latex]\displaystyle{X}[/latex] ~ [latex]{N}{({n}{p},\sqrt{{{n}{p}{q}}})}[/latex]
If we divide the random variable, the mean, and the standard deviation by n, we get a normal distribution of proportions with P′, called the estimated proportion, as the random variable. (Recall that a proportion is the number of successes divided by n.)[latex]\displaystyle\frac{{X}}{{n}}={P'}{\sim}{N}{(\frac{{{n}{p}}}{{n}},\frac{{\sqrt{{{n}{p}{q}}}}}{{n}})}[/latex]
Using algebra to simplify:[latex]\displaystyle\frac{{\sqrt{{{n}{p}{q}}}}}{{n}}=\sqrt{{\frac{{{p}{q}}}{{n}}}}[/latex]
P′ follows a normal distribution for proportions: [latex]\displaystyle{P'}{\sim}{N}{({p},\sqrt{{\frac{{{p}{q}}}{{n}}}})}[/latex]
The confidence interval has the form (p′ – EBP, p′ + EBP) given that EBP is error bound for the proportion.

Note
For the normal distribution of proportions, the z-score formula is as follows. If [latex]\displaystyle{P'}{\sim}{N}[/latex](p, [latex]\displaystyle\sqrt{\frac{{pq}}{{n}}}[/latex]), then the z-score formula is z = [latex]\displaystyle\frac{{p'-p}}{{\sqrt{\frac{{pq}}{{n}}}}}[/latex]
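The normal approximation above translates directly into code. This sketch computes the 95% interval for the cell-phone survey below (421 successes in a sample of 500), using the standard library's `NormalDist` for the critical value:

```python
import math
from statistics import NormalDist

x, n = 421, 500    # successes and sample size
p_prime = x / n    # sample proportion, 0.842
q_prime = 1 - p_prime

z = NormalDist().inv_cdf(0.975)              # z_{0.025} ~ 1.96
ebp = z * math.sqrt(p_prime * q_prime / n)   # error bound, ~0.032
print((round(p_prime - ebp, 3), round(p_prime + ebp, 3)))   # (0.81, 0.874)
```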
[revealanswer q="269947"]Solution A (Step-by-Step Solution)[/revealanswer] [hiddenanswer a="269947"]Let X = the number of people in the sample who have cell phones. To calculate the confidence interval, you must find p′, q′, and EBP. n = 500, x = the number of successes = 421. p′ = [latex]\displaystyle\frac{{x}}{{n}}[/latex] = [latex]\frac{{421}}{{500}}[/latex] = 0.842. p′ = 0.842 is the sample proportion; this is the point estimate of the population proportion. q′ = 1 – p′ = 1 – 0.842 = 0.158. Since CL = 0.95, then α = 1 – CL = 1 – 0.95 = 0.05 and [latex]\frac{\alpha}{2}[/latex] = 0.025. Then [latex]\displaystyle{Z}_{\frac{{\alpha}}{{2}}}={Z}_{0.025}[/latex] = 1.96. Use the TI-83, 83+, or 84+ calculator command invNorm(0.975, 0, 1) to find Z_{0.025}. Remember that the area to the right of Z_{0.025} is 0.025 and the area to the left of Z_{0.025} is 0.975. This can also be found using appropriate commands on other calculators, using a computer, or using a standard normal probability table. EBP = [latex]\displaystyle({Z}_{\frac{{\alpha}}{{2}}})(\sqrt{\frac{{p'q'}}{{n}}})[/latex] = (1.96)[latex]\displaystyle\sqrt{\frac{{(0.842)(0.158)}}{{500}}}[/latex] = 0.032. p′ − EBP = 0.842 − 0.032 = 0.810. p′ + EBP = 0.842 + 0.032 = 0.874. The 95% confidence interval for the true binomial population proportion is (p′ – EBP, p′ + EBP) = (0.810, 0.874).[/hiddenanswer]
[revealanswer q="850187"]Solution B (Using TICalculator)[/revealanswer]
[hiddenanswer a="850187"]

[revealanswer q="154949"]Solution A (StepbyStep Solution)[/revealanswer]
[hiddenanswer a="154949"]x = 300 and n = 500. p′ = [latex]\displaystyle\frac{{x}}{{n}} = \frac{{300}}{{500}}[/latex] = 0.600. Since CL = 0.90, then α = 1 – CL = 1 – 0.90 = 0.10 and [latex]\frac{{\alpha}}{{2}}[/latex] = 0.05. [latex]\displaystyle{Z}_{\frac{{\alpha}}{{2}}}[/latex] = [latex]\displaystyle{Z}_{0.05}[/latex] = 1.645
Use the TI83, 83+, or 84+ calculator command invNorm(0.95,0,1) to find Z_{0.05}.
Remember that the area to the right of Z_{0.05} is 0.05 and the area to the left of Z_{0.05} is 0.95.
This can also be found using appropriate commands on other calculators, using a computer, or using a standard normal probability table.
EBP = [latex]\displaystyle({Z}_{\frac{{\alpha}}{{2}}})(\sqrt{\frac{{p'q'}}{{n}}})[/latex] = (1.645)[latex]\displaystyle\sqrt{\frac{{(0.6)(0.4)}}{{500}}}[/latex] = 0.036
p′ − EBP = 0.600 − 0.036 = 0.564
p′ + EBP = 0.600 + 0.036 = 0.636
The confidence interval for the true binomial population proportion is (p′ − EBP, p′ + EBP) = (0.564, 0.636).[/hiddenanswer]


[revealanswer q="842375"]Solution B (Using TICalculator)[/revealanswer]
[hiddenanswer a="842375"]

[revealanswer q="868481"]Solution A (Step-by-Step Solution)[/revealanswer] [hiddenanswer a="868481"]Sixty-eight percent (68%) of students own an iPod and a smart phone, so p′ = 0.68 and q′ = 1 − p′ = 1 – 0.68 = 0.32. Since CL = 0.97, we know α = 1 – 0.97 = 0.03 and [latex]\frac{\alpha}{2}[/latex] = 0.015. The area to the right of Z_{0.015} is 0.015, and the area to the left of Z_{0.015} is 1 – 0.015 = 0.985. Using the TI-83, 83+, or 84+ calculator function invNorm(0.985, 0, 1) or a standard normal table, the z-score is Z_{0.015} = 2.17. EBP = [latex]\displaystyle({z}_{\frac{{\alpha}}{{2}}})(\sqrt{\frac{{p'q'}}{{n}}})[/latex] = (2.17)[latex]\displaystyle\sqrt{\frac{{(0.68)(0.32)}}{{300}}}[/latex] = 0.0584. p′ − EBP = 0.68 − 0.0584 = 0.6216. p′ + EBP = 0.68 + 0.0584 = 0.7384. We are 97% confident that the true proportion of all students who own an iPod and a smart phone is between 0.6216 and 0.7384.[/hiddenanswer]
[revealanswer q="809574"]Solution B (Using TICalculator)[/revealanswer]
[hiddenanswer a="809574"]

[revealanswer q="795345"]Solution A (Step-by-Step Solution)[/revealanswer] [hiddenanswer a="795345"] Six students out of 25 reported smoking within the past week, so x = 6 and n = 25. Because we are using the "plus-four" method, we will use x = 6 + 2 = 8 and n = 25 + 4 = 29. p′ = [latex]\displaystyle\frac{{x}}{{n}} =\frac{{8}}{{29}}[/latex] = 0.276. q′ = 1 − p′ = 1 − 0.276 = 0.724. Since CL = 0.95, we know [latex]\alpha[/latex] = 1 − CL = 1 − 0.95 = 0.05 and [latex]\frac{\alpha}{2} = \frac{0.05}{2} = 0.025[/latex]. Use the TI-83, 83+, or 84+ calculator command invNorm(0.975, 0, 1) or a standard normal probability table to find Z_{0.025}. [latex]{Z}_{\frac{\alpha}{2}}[/latex] = [latex]\displaystyle{Z}_{0.025}={1.96}[/latex]. Remember that the area to the right of Z_{0.025} is 0.025 and the area to the left of Z_{0.025} is 0.975. EBP = [latex]\displaystyle({Z}_{\frac{{\alpha}}{{2}}})(\sqrt{\frac{{p'q'}}{{n}}})[/latex] = (1.96)[latex]\displaystyle\sqrt{\frac{{(0.276)(0.724)}}{{29}}}[/latex] = 0.1627. p′ − EBP = 0.276 − 0.1627 = 0.1133. p′ + EBP = 0.276 + 0.1627 = 0.4387. We are 95% confident that the true proportion of all statistics students who smoke cigarettes is between 0.1133 and 0.4387.[/hiddenanswer]
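The plus-four adjustment is mechanical enough to script. The counts (6 smokers out of 25 students) are from this example; small last-digit differences from the hand calculation come from rounding p′ to 0.276 there:

```python
import math
from statistics import NormalDist

x, n = 6, 25                # observed successes and sample size
x_pf, n_pf = x + 2, n + 4   # "plus four": add 2 successes and 2 failures

p_prime = x_pf / n_pf       # 8/29, ~0.276
q_prime = 1 - p_prime

z = NormalDist().inv_cdf(0.975)                  # z_{0.025} ~ 1.96
ebp = z * math.sqrt(p_prime * q_prime / n_pf)    # ~0.163
print((round(p_prime - ebp, 3), round(p_prime + ebp, 3)))   # (0.113, 0.439)
```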
[revealanswer q="831559"]Solution B (Using TICalculator)[/revealanswer]
[hiddenanswer a="831559"]

[revealanswer q="852237"]Solution A (Step-by-Step Solution)[/revealanswer] [hiddenanswer a="852237"]Using "plus four," we have x = 31 + 2 = 33 and n = 65 + 4 = 69. p′ = [latex]\frac{x}{n}[/latex] = [latex]\frac{33}{69}[/latex] and q′ = 1 − [latex]\frac{x}{n}[/latex] = 1 − [latex]\frac{33}{69}[/latex] = [latex]\frac{36}{69}[/latex]. Since CL = 0.96, we know [latex]\alpha[/latex] = 1 − CL = 1 − 0.96 = 0.04 and [latex]\frac{\alpha}{2} = \frac{0.04}{2} = 0.02[/latex]. Remember that the area to the right of Z_{0.02} is 0.02 and the area to the left of Z_{0.02} is 0.98. Use the TI-83, 83+, or 84+ calculator command invNorm(0.98, 0, 1) or a standard normal probability table to find Z_{0.02}. [latex]{Z}_{\frac{\alpha}{2}}[/latex] = [latex]\displaystyle{Z}_{0.02}={2.0537}[/latex]. EBP = [latex]\displaystyle({Z}_{\frac{{\alpha}}{{2}}})(\sqrt{\frac{{p'q'}}{{n}}})[/latex] = (2.0537)[latex]\displaystyle\sqrt{\frac{{(\frac{33}{69})(\frac{36}{69})}}{{69}}}[/latex] = 0.1235. p′ − EBP = [latex]\frac{33}{69}[/latex] − 0.1235 = 0.3548. p′ + EBP = [latex]\frac{33}{69}[/latex] + 0.1235 = 0.6018. We are 96% confident that between 35.48% and 60.18% of all freshmen at State U have declared a major.[/hiddenanswer]
[revealanswer q="984316"]Solution B (Using TICalculator)[/revealanswer]
[hiddenanswer a="984316"]

[revealanswer q="546471"]Solution A (Step-by-Step Solution)[/revealanswer] [hiddenanswer a="546471"]Using "plus-four," we have x = 13 + 2 = 15 and n = 50 + 4 = 54. p′ = [latex]\frac{x}{n}[/latex] = [latex]\frac{15}{54}[/latex] and q′ = 1 − [latex]\frac{x}{n}[/latex] = 1 − [latex]\frac{15}{54}[/latex] = [latex]\frac{39}{54}[/latex]. Since CL = 0.90, we know [latex]\alpha[/latex] = 1 − CL = 1 − 0.90 = 0.10 and [latex]\frac{\alpha}{2} = \frac{0.1}{2} = 0.05[/latex]. Remember that the area to the right of Z_{0.05} is 0.05 and the area to the left of Z_{0.05} is 0.95. Use the TI-83, 83+, or 84+ calculator command invNorm(0.95, 0, 1) or a standard normal probability table to find Z_{0.05}. [latex]{Z}_{\frac{\alpha}{2}}[/latex] = [latex]\displaystyle{Z}_{0.05}={1.6449}[/latex]. EBP = [latex]\displaystyle({Z}_{\frac{{\alpha}}{{2}}})(\sqrt{\frac{{p'q'}}{{n}}})[/latex] = (1.6449)[latex]\displaystyle\sqrt{\frac{{(\frac{15}{54})(\frac{39}{54})}}{{54}}}[/latex] = 0.1003. p′ − EBP = [latex]\frac{15}{54}[/latex] − 0.1003 = 0.1775. p′ + EBP = [latex]\frac{15}{54}[/latex] + 0.1003 = 0.3781. We are 90% confident that between 17.75% and 37.81% of all teens would report having more than 500 friends on Facebook.[/hiddenanswer]
[revealanswer q="540385"]Solution B (Using TICalculator)[/revealanswer]
[hiddenanswer a="540385"]

[revealanswer q="654788"]Solution A (Step-by-Step Solution)[/revealanswer] [hiddenanswer a="654788"] Using "plus-four," we have x = 159 + 2 = 161 and n = 588 + 4 = 592. p′ = [latex]\frac{x}{n}[/latex] = [latex]\frac{161}{592}[/latex] and q′ = 1 − [latex]\frac{x}{n}[/latex] = 1 − [latex]\frac{161}{592}[/latex] = [latex]\frac{431}{592}[/latex]. Since CL = 0.90, we know [latex]\alpha[/latex] = 1 − CL = 1 − 0.90 = 0.10 and [latex]\frac{\alpha}{2} = \frac{0.1}{2} = 0.05[/latex]. Remember that the area to the right of Z_{0.05} is 0.05 and the area to the left of Z_{0.05} is 0.95. Use the TI-83, 83+, or 84+ calculator command invNorm(0.95, 0, 1) or a standard normal probability table to find Z_{0.05}. [latex]{Z}_{\frac{\alpha}{2}}[/latex] = [latex]\displaystyle{Z}_{0.05}={1.6449}[/latex]. EBP = [latex]\displaystyle({Z}_{\frac{{\alpha}}{{2}}})(\sqrt{\frac{{p'q'}}{{n}}})[/latex] = (1.6449)[latex]\displaystyle\sqrt{\frac{{(\frac{161}{592})(\frac{431}{592})}}{{592}}}[/latex] = 0.0301. p′ − EBP = [latex]\frac{161}{592}[/latex] − 0.0301 = 0.2418. p′ + EBP = [latex]\frac{161}{592}[/latex] + 0.0301 = 0.3021. We are 90% confident that between 24.18% and 30.21% of all teens would report having more than 500 friends on Facebook.[/hiddenanswer]
[revealanswer q="18501"]Solution B (Using TICalculator)[/revealanswer]
[hiddenanswer a="18501"]

p′ is a point estimate for p, the population proportion
s is a point estimate for σ
By the end of this chapter, the student should be able to:
A statistician will make a decision about these claims. This process is called "hypothesis testing." A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analyses of the data, to reject the null hypothesis.
In this chapter, you will conduct hypothesis tests on single means and single proportions. You will also learn about the errors associated with these tests.
Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion. To perform a hypothesis test, a statistician will:
H_{0}  H_{a} 

equal (=)  not equal (≠) or greater than (>) or less than (<) 
greater than or equal to (≥)  less than (<) 
less than or equal to (≤)  more than (>) 
Note: H_{0} always has a symbol with an equal in it. H_{a} never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.
H_{0}: μ __ 66
H_{a}:μ __ 66
[practicearea rows="1"][/practicearea] [revealanswer q="509899"]Show Answer[/revealanswer] [hiddenanswer a="509899"] H_{0} : μ = 66 H_{a} : μ ≠ 66[/hiddenanswer]
H_{0} is actually  

Action  True  False 
Do not reject H_{0}  Correct Outcome  Type II Error ([latex]\beta[/latex]) 
Reject H_{0}  Type I Error ([latex]\alpha[/latex])  Correct Outcome 
Null Hypothesis: The rock climbing equipment is safe.  
Frank's decision  True (The equipment is safe)  False (The equipment is not safe.) 
Not reject H_{0 }  Correct decision  Type II Error 
Reject H_{0 }  Type I Error  Correct decision 
Null hypothesis: The blood cultures contain no traces of pathogen X.  
Decision (What the researcher thinks...)  True (The blood cultures contain no traces of pathogen X. )  False (The blood cultures contain traces of pathogen X.) 
Not reject H_{0 }  Correct decision  Type II Error 
Reject H_{0}  Type I Error  Correct decision 
Null hypothesis: The victim is alive.  
Decision  True (The victim is alive.)  False (The victim is dead.) 
Not reject H_{0}  Correct decision  Type II Error 
Reject H_{0}  Type I Error  Correct decision 
Null hypothesis: A patient is not sick.  
Decision  True (The patient is not sick.)  False (The patient is sick.) 
Not reject H_{0}  Correct decision  Type II Error 
Reject H_{0}  Type I Error  Correct decision 
Null Hypothesis : Boy Genetic Labs has no effect on gender outcome.  
Decision  True (No Effect)  False (Effect) 
Not reject H_{0}  Correct decision  Type II Error 
reject H_{0}  Type I Error  Correct decision 
Null Hypothesis: The mean level of toxins is at most 800 μg.  
Decision  True  False 
Not reject H_{0}  Correct decision  Type II Error 
Reject H_{0}  Type I Error  Correct decision 
Null hypothesis: The cure rate is less than 75%.  
Decision  True (The cure rate is less than 75%.)  False (The cure rate is higher than 75%.) 
Not reject H_{0}  Correct  Type II Error 
Reject H_{0}  Type I Error  Correct 
When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution which are as follows:
The p-value, then, is the probability that a sample mean is 17 cm or greater when the population mean is, in fact, 15 cm. We can calculate this probability using the normal distribution for means.
[caption id="attachment_731" align="alignnone" width="487"] Figure 1[/caption]
p-value = P([latex]\overline{X}[/latex] > 17) = P([latex]\frac{\overline{X}-{\mu}}{\frac{\sigma}{\sqrt{n}}}[/latex] > [latex]\frac{17-{\mu}}{\frac{\sigma}{\sqrt{n}}}[/latex]) = P([latex]\frac{\overline{X}-{\mu}}{\frac{\sigma}{\sqrt{n}}}[/latex] > [latex]\frac{17-15}{\frac{0.5}{\sqrt{10}}}[/latex]) = P(Z > 12.64911), which is approximately zero.
A p-value of approximately zero tells us that it is highly unlikely that a loaf of bread rises no more than 15 cm, on average. That is, almost 0% of all loaves of bread would be at least as high as 17 cm purely by CHANCE had the population mean height really been 15 cm. Because the outcome of 17 cm is so unlikely to happen (meaning it is happening NOT by chance alone), we conclude that the evidence is strongly against the null hypothesis (the mean height is at most 15 cm). There is sufficient evidence that the true mean height for the population of the baker's loaves of bread is greater than 15 cm.
Using the TI-83/84:
Interpretation of the result: The z-score of the height 17 cm is 12.64911. The blue shaded area of Figure 1, also known as the p-value, is 5.854831 × 10^{−37}.
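The bread example can be checked numerically. Below is a minimal Python sketch using only the standard library; the figures are taken from the example above, and the variable names are ours:

```python
import math

# Bread example: H0: mu <= 15 cm, Ha: mu > 15 cm (right-tailed)
mu = 15       # hypothesized population mean height (cm)
sigma = 0.5   # population standard deviation (cm)
n = 10        # number of loaves in the sample
x_bar = 17    # observed sample mean height (cm)

# z-score of the sample mean
z = (x_bar - mu) / (sigma / math.sqrt(n))

# Right-tail area P(Z > z); erfc avoids underflow for such a large z
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(z)        # about 12.649
print(p_value)  # on the order of 1e-37 -- effectively zero
```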
H_{0}: μ ≤ 12
H_{a}: μ > 12

The p-value is 0.0013. Draw a graph that shows the p-value. [practicearea rows="1"][/practicearea] [revealanswer q="688828"]Solution[/revealanswer] [hiddenanswer a="688828"] p-value = 0.0013[/hiddenanswer]

A systematic way to decide whether to reject or not reject the null hypothesis is to compare the p-value and a preset or preconceived α (also called a "significance level"). A preset α is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem.
When you make a decision to reject or not reject H_{0}, do as follows:

Conclusion: After you make your decision, write a thoughtful conclusion about the hypotheses in terms of the given problem.
If the p-value is low, the null must go.
If the p-value is high, the null must fly.

This memory aid relates a p-value less than the established alpha (the p is low) with rejecting the null hypothesis and, likewise, relates a p-value higher than the established alpha (the p is high) with not rejecting the null hypothesis.
Reject the null hypothesis when ______________________________________. [revealanswer q="59188"]Show Answer[/revealanswer] [hiddenanswer a="59188"]the p-value is less than the established value of [latex]\alpha[/latex].[/hiddenanswer]
The results of the sample data _____________________________________. [revealanswer q="254496"]Show Answer[/revealanswer] [hiddenanswer a="254496"]support the alternative hypothesis.[/hiddenanswer]
Do not reject the null hypothesis when __________________________________________. [revealanswer q="525499"]Show Answer[/revealanswer] [hiddenanswer a="525499"] the p-value is greater than the established value of [latex]\alpha[/latex].[/hiddenanswer]
The results of the sample data ____________________________________________. [revealanswer q="780055"]Show Answer[/revealanswer] [hiddenanswer a="780055"]do not support the alternative hypothesis.[/hiddenanswer]
CuteBaby Genetics Labs claims its procedures improve the chances of a boy being born. The results for a test of a single population proportion are as follows:
H_{0}: p = 0.50, H_{a}: p > 0.50
α = 0.01
p-value = 0.025
Interpret the results and state a conclusion in simple, non-technical terms.
The following examples illustrate left-tailed, right-tailed, and two-tailed tests.

H_{0}: μ = 5  H_{a}: μ < 5  Significance level = 5%. Assume the p-value is 0.0243.
[revealanswer q="256590"]Click here to show solution: [/revealanswer] [hiddenanswer a="256590"]
H_{0}: μ = 10  H_{a}: μ < 10  Significance level = 5% = 0.05. Assume the p-value is 0.0435.
[revealanswer q="590445"]Click here to show solution: [/revealanswer] [hiddenanswer a="590445"]
H_{0}: μ ≤ 1  H_{a}: μ > 1  Significance level = 1%. Assume the p-value is 0.1243.
Steps to set up a hypothesis test:

For Jeffrey to swim faster, his time will be less than 16.43 seconds. The "<" tells you this is left-tailed.[/hiddenanswer] [revealanswer q="12509"]2. What is the significance level? [/revealanswer] [hiddenanswer a="12509"]significance level, [latex]\alpha[/latex] = 5% = 0.05[/hiddenanswer] [revealanswer q="540483"]3. What is the p-value? [/revealanswer] [hiddenanswer a="540483"]
Graph:
Calculate the p-value using the normal distribution for a mean:
p-value = P[latex]\left(\overline{X}<{16}\right)[/latex] = P[latex]\left(\frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}}<\frac{16-\mu}{\frac{\sigma}{\sqrt{n}}}\right)[/latex] = P[latex]\left(Z<\frac{16-16.43}{\frac{0.8}{\sqrt{15}}}\right)[/latex] = 0.0187, where the sample mean given in the problem is 16. The p-value is 0.0187 (this is called the actual level of significance); it is the area to the left of the sample mean, 16.
p-value = 0.0187, α = 0.05. Therefore, α > p-value.
Interpretation of the p-value: If H_{0} is true, there is a 0.0187 probability (1.87%) that Jeffrey's mean time to swim the 25-yard freestyle is 16 seconds or less. Because a 1.87% chance is small, a mean time of 16 seconds or less is unlikely to have happened randomly. It is a rare event.
Make a decision: Since α > p-value, reject H_{0}. This means that you reject μ = 16.43. In other words, you do not think Jeffrey swims the 25-yard freestyle in 16.43 seconds, but faster, with the new goggles.[/hiddenanswer]
[revealanswer q="665879"]6. Conclusion? [/revealanswer] [hiddenanswer a="665879"]At the 5% significance level, we conclude that Jeffrey swims faster using the new goggles. The sample data show there is sufficient evidence that Jeffrey's mean time to swim the 25yard freestyle is less than 16.43 seconds.[/hiddenanswer]
Using the TI calculator to find the p-value:

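For readers without a TI calculator, Jeffrey's left-tailed test can be reproduced in a few lines of Python using the standard library's `statistics.NormalDist`; the numbers come from the example above, and the variable names are ours:

```python
from statistics import NormalDist

# Jeffrey example: H0: mu = 16.43 s, Ha: mu < 16.43 s (left-tailed)
mu = 16.43    # hypothesized mean time (seconds)
sigma = 0.8   # population standard deviation (seconds)
n = 15        # number of races
x_bar = 16.0  # observed sample mean time (seconds)
alpha = 0.05

z = (x_bar - mu) / (sigma / n ** 0.5)
p_value = NormalDist().cdf(z)  # left-tail area

print(round(p_value, 4))  # about 0.0187
print("reject H0" if p_value < alpha else "do not reject H0")
```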
The Type I and Type II errors for this problem are as follows:
The Type I error is to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually swims the 25-yard freestyle, on average, in 16.43 seconds. (Reject the null hypothesis when the null hypothesis is true.)
The Type II error is to conclude that there is no evidence that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually does swim the 25-yard freestyle, on average, in less than 16.43 seconds. (Do not reject the null hypothesis when the null hypothesis is false.)

The mean throwing distance of a football for Marco, a high-school freshman quarterback, is 40 yards, with a standard deviation of 2 yards. The team coach tells Marco to adjust his grip to get more distance. The coach records the distances for 20 throws. For the 20 throws, Marco’s mean distance was 45 yards. The coach thought the different grip helped Marco throw farther than 40 yards. Conduct a hypothesis test using a preset α = 0.05. Assume the throw distances for footballs are normal.
First, determine what type of test this is, set up the hypothesis test, find the p-value, sketch the graph, and state your conclusion.
[practicearea rows="4"][/practicearea] [revealanswer q="205319"]Step-by-Step Solution[/revealanswer] [hiddenanswer a="205319"] Since the problem is about a mean, this is a test of a single population mean.

Conduct a hypothesis test using a 2.5% level of significance to determine if the bench press mean is more than 275 pounds.
Since the problem is about a mean weight, this is a test of a single population mean.
H_{0}: μ = 275  H_{a}: μ > 275  Significance level, [latex]\alpha[/latex] = 2.5% = 0.025
This is a right-tailed test. [/hiddenanswer] [revealanswer q="873840"]2. What is the significance level, [latex]\alpha[/latex]? [/revealanswer] [hiddenanswer a="873840"][latex]\alpha[/latex] = 2.5% = 0.025[/hiddenanswer] [revealanswer q="56192"]3. Find the p-value.[/revealanswer] [hiddenanswer a="56192"][latex]\overline{X}[/latex] = the mean weight (in pounds) lifted by the football players.
Distribution for the test: [latex]\overline{X}[/latex] is normally distributed because σ is known. [latex]\overline{x}[/latex] = 286.2, n = 30, σ = 55 pounds (Always use σ if you know it.) We assume μ = 275 pounds unless our data show us otherwise. Calculate the p-value using the normal distribution for a mean, with the sample mean as input. p-value = P([latex]\overline{x}[/latex] > 286.2) = P([latex]\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}[/latex] > [latex]\frac{286.2-\mu}{\frac{\sigma}{\sqrt{n}}}[/latex]) = P(Z > [latex]\frac{286.2-275}{\frac{55}{\sqrt{30}}}[/latex]) = P(Z > 1.1153623) = 0.1323.[/hiddenanswer] [revealanswer q="173168"]4. Comparison between p-value and significance level. [/revealanswer] [hiddenanswer a="173168"] Interpretation of the p-value: If H_{0} is true, then there is a 0.1323 probability (13.23%) that the football players can lift a mean weight of 286.2 pounds or more. Because a 13.23% chance is large enough, a mean weight lift of 286.2 pounds or more is not a rare event. α = 0.025, p-value = 0.1323[/hiddenanswer] [revealanswer q="330644"]5. Decision?[/revealanswer] [hiddenanswer a="330644"]Make a decision: Since α < p-value, do not reject H_{0}.[/hiddenanswer] [revealanswer q="878326"]6. Conclusion?[/revealanswer] [hiddenanswer a="878326"]Conclusion:
At the 2.5% level of significance, from the sample data, there is not sufficient evidence to conclude that the true mean weight lifted is more than 275 pounds.[/hiddenanswer]

The p-value can easily be calculated.

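The bench-press z-test follows the same pattern. This Python sketch (standard library only; variable names are ours) mirrors the calculation in the solution above:

```python
from statistics import NormalDist

# Bench-press example: H0: mu = 275 lb, Ha: mu > 275 lb (right-tailed)
mu = 275       # hypothesized mean weight (pounds)
sigma = 55     # population standard deviation (pounds)
n = 30         # number of players sampled
x_bar = 286.2  # observed sample mean weight (pounds)
alpha = 0.025

z = (x_bar - mu) / (sigma / n ** 0.5)
p_value = 1 - NormalDist().cdf(z)  # right-tail area

print(round(p_value, 4))  # about 0.1323
print("reject H0" if p_value < alpha else "do not reject H0")
```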
Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65, 65, 70, 67, 66, 63, 63, 68, 72, 71. The data are assumed to be from a normal distribution. He performs a hypothesis test using a 5% level of significance.
Since we do not know the population standard deviation, we are going to run Student's t-test.
This is a test of a single population mean. H_{0}: μ = 65  H_{a}: μ > 65  A 5% level of significance means that α = 0.05.
Since the instructor thinks the average score is higher, use a ">". The ">" means the test is right-tailed.[/hiddenanswer] [revealanswer q="510963"]2. What is the significance level, [latex]\alpha[/latex]? [/revealanswer] [hiddenanswer a="510963"]The significance level, [latex]\alpha[/latex] = 5% = 0.05[/hiddenanswer] [revealanswer q="406273"]3. Find the p-value. [/revealanswer] [hiddenanswer a="406273"] Random variable: [latex]\overline{X}[/latex] = average score on the first statistics test. Distribution for the test: If you read the problem carefully, you will notice that no population standard deviation is given; you are only given n = 10 sample data values. Notice also that the data come from a normal distribution. This means that the distribution for the test is Student's t-distribution. Use t_{df}. Therefore, the distribution for the test is t_{9}, where n = 10 and df = 10 − 1 = 9. Calculate the p-value using the Student's t-distribution:
Given that the sample mean and sample standard deviation calculated from the data are 67 and 3.1972, p-value = P([latex]\overline{x}[/latex] > 67) = P([latex]\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}[/latex] > [latex]\frac{67-65}{\frac{3.1972}{\sqrt{10}}}[/latex]) = P(t > 1.978) = 0.0396[/hiddenanswer] [revealanswer q="186204"]4. Comparison between p-value and significance level. [/revealanswer] [hiddenanswer a="186204"]Interpretation of the p-value:
If the null hypothesis is true, then there is a 0.0396 probability (3.96%) that the sample mean is 67 or more. Since α = 0.05 and p-value = 0.0396, α > p-value.[/hiddenanswer] [revealanswer q="243554"]5. Decision? [/revealanswer] [hiddenanswer a="243554"]Since α > p-value, reject H_{0}. This means you reject μ = 65. In other words, you believe the average test score is more than 65.[/hiddenanswer] [revealanswer q="391743"]6. Conclusion?[/revealanswer] [hiddenanswer a="391743"]At a 5% level of significance, the sample data show sufficient evidence that the mean (average) test score is more than 65, just as the math instructor thinks.[/hiddenanswer]
The p-value can easily be calculated.

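Since the population standard deviation is unknown in the test-score example, a t-test is required. Assuming SciPy (version 1.6 or later, for the `alternative` parameter) is available, the whole calculation can be run directly from the ten raw scores:

```python
from scipy import stats

# Test-score example: H0: mu = 65, Ha: mu > 65 (right-tailed t-test;
# the population standard deviation is unknown)
scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]

result = stats.ttest_1samp(scores, popmean=65, alternative="greater")
print(result.statistic)  # t is about 1.978
print(result.pvalue)     # about 0.0396; less than alpha = 0.05, so reject H0
```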
It is believed that the stock price for a particular company grows at a rate of $5 per week. An investor thinks the stock grows more slowly and records the weekly growth of the stock price (in dollars) over ten weeks:
$4, $3, $2, $3, $1, $7, $2, $1, $1, $2.
Perform a hypothesis test using a 5% level of significance. State the null and alternative hypotheses, find the p-value, state your conclusion, and identify the Type I and Type II errors.
[revealanswer q="391557"]Show Answer[/revealanswer] [hiddenanswer a="391557"]
We run Student's t-test, as we do not know the population standard deviation. H_{0}: μ = 5  H_{a}: μ < 5  p-value = 0.0082. Because the p-value < α, we reject the null hypothesis. There is sufficient evidence to suggest that the stock price of the company grows at a rate less than $5 a week.
Type I Error: To conclude that the stock price is growing slower than $5 a week when, in fact, the stock price is growing at $5 a week (reject the null hypothesis when the null hypothesis is true).
Type II Error: To conclude that the stock price is growing at a rate of $5 a week when, in fact, the stock price is growing slower than $5 a week (do not reject the null hypothesis when the null hypothesis is false).
[/hiddenanswer]
Suppose it is believed that 50% of first-time brides are younger than their grooms. In a sample of 100 first-time brides, 53 reply that they are younger than their grooms. A hypothesis test is performed at the 1% level of significance to determine whether the proportion is different from 50%.

This is a normal test of a single population proportion.
H_{0}: p = 0.50  H_{a}: p ≠ 0.50  The 1% level of significance means that α = 0.01. The words "is different from" tell you this is a two-tailed test. [/hiddenanswer] [revealanswer q="544872"]2. What is the significance level, [latex]\alpha[/latex]?[/revealanswer] [hiddenanswer a="544872"]The significance level, [latex]\alpha[/latex] = 1% = 0.01[/hiddenanswer] [revealanswer q="774681"]3. Find the p-value. [/revealanswer] [hiddenanswer a="774681"]This is a two-tailed test, so the p-value includes both the left tail and the right tail. P′ = the proportion of first-time brides who are younger than their grooms. The proportion of first-time brides in the sample of 100 who reply that they are younger than their grooms is [latex]\frac{53}{100}[/latex] = 0.53. Since 0.53 is larger than 0.50 (the hypothesized proportion), 0.53 marks the right tail of this test. The left tail lies the same distance below 0.50: the difference between 0.53 and 0.50 is 0.03, so the left tail is at 0.50 − 0.03 = 0.47. Given that p = 0.50, q = 1 − p = 0.50, and n = 100, p-value = area to the right of 0.53 + area to the left of 0.47 = P(p′ > 0.53) + P(p′ < 0.47) = P([latex]\frac{p' - p}{\sqrt{\frac{pq}{n}}}[/latex] > [latex]\frac{0.53-0.50}{\sqrt{\frac{(0.5)(0.5)}{100}}}[/latex]) + P([latex]\frac{p' - p}{\sqrt{\frac{pq}{n}}}[/latex] < [latex]\frac{0.47-0.50}{\sqrt{\frac{(0.5)(0.5)}{100}}}[/latex]) = P(Z > 0.6) + P(Z < −0.6) = 0.27425 + 0.27425 = 0.5485 [/hiddenanswer] [revealanswer q="970068"]4. Comparison between p-value and significance level. [/revealanswer] [hiddenanswer a="970068"] p-value = 0.5485, significance level = 0.01 [/hiddenanswer] [revealanswer q="505115"]5. Decision?[/revealanswer] [hiddenanswer a="505115"]Since the p-value > the significance level, we do not reject H_{0}.[/hiddenanswer] [revealanswer q="666927"]6. Conclusion? [/revealanswer] [hiddenanswer a="666927"]There is not sufficient evidence to suggest that the percentage is different from 50%. [/hiddenanswer]
Using the TI-83/84: Press STAT and arrow over to TESTS. Select 5:1-PropZTest. Enter .5 for p_{0}, 53 for x, and 100 for n. Arrow down to Prop and arrow to "not equals p_{0}" (Prop not equals .5 is the alternate hypothesis). Press ENTER. Arrow down to Calculate and press ENTER. To see a graph instead, arrow down to Draw (instead of Calculate) and press ENTER; a shaded graph appears with z = 0.6 (test statistic) and p = 0.5485 (p-value). Make sure that when you use Draw, no other equations are highlighted in Y= and the plots are turned off.[/hiddenanswer]
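The two-tailed proportion test for the first-time brides example can also be verified in Python with only the standard library (variable names are ours):

```python
from statistics import NormalDist

# First-time brides example: H0: p = 0.50, Ha: p != 0.50 (two-tailed)
p0 = 0.50  # hypothesized population proportion
n = 100    # sample size
x = 53     # brides who reply they are younger than their grooms

p_hat = x / n                    # sample proportion, 0.53
se = (p0 * (1 - p0) / n) ** 0.5  # standard error under H0
z = (p_hat - p0) / se            # test statistic, about 0.6
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails

print(round(p_value, 4))  # about 0.5485
```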
A teacher believes that 85% of students in the class will want to go on a field trip to the local zoo. She performs a hypothesis test to determine if the percentage is the same or different from 85%. The teacher samples 50 students and 39 reply that they would want to go to the zoo. For the hypothesis test, use a 1% level of significance.
First, determine what type of test this is, set up the hypothesis test, find the p-value, sketch the graph, and state your conclusion.
[revealanswer q="401155"]Show Answer[/revealanswer] [hiddenanswer a="401155"] Since the problem is about percentages, this is a test of a single population proportion. H_{0}: p = 0.85  H_{a}: p ≠ 0.85  p-value = 0.7554. Because the p-value > α, we fail to reject the null hypothesis. There is not sufficient evidence to suggest that the proportion of students who want to go to the zoo is different from 85%.[/hiddenanswer]

The National Institute of Standards and Technology provides exact data on conductivity properties of materials. Following are conductivity measurements for 11 randomly selected pieces of a particular type of glass. (Assume the population is normal.)
1.11; 1.07; 1.11; 1.07; 1.12; 1.08; 0.98; 0.98; 1.02; 0.95; 0.95
Is there convincing evidence that the average conductivity of this type of glass is greater than 1? Use a significance level of 0.05. [revealanswer q="104041"]Step-by-step Solution:[/revealanswer] [hiddenanswer a="104041"]The hypothesis test itself has an established process. This can be summarized as follows:
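As a cross-check on the conductivity problem, and again assuming SciPy (1.6 or later) is available, the right-tailed t-test can be run directly on the 11 measurements:

```python
from scipy import stats

# Conductivity example: H0: mu = 1, Ha: mu > 1 (right-tailed t-test)
conductivity = [1.11, 1.07, 1.11, 1.07, 1.12, 1.08,
                0.98, 0.98, 1.02, 0.95, 0.95]

result = stats.ttest_1samp(conductivity, popmean=1, alternative="greater")
print(result.pvalue)  # about 0.036; less than 0.05, so reject H0
```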
In this chapter, you will be studying the simplest form of regression, "linear regression," with one independent variable (x). This involves data that fit a line in two dimensions. You will also study correlation, which measures how strong the relationship is.
y = –0.125 – 3.5x [revealanswer q="452869"]Show Answer[/revealanswer] [hiddenanswer a="452869"]Yes[/hiddenanswer]
The graph of a linear equation of the form y = a + bx is a straight line.
Any straight line that is not vertical can be described by this equation.

Graph the equation y = –1 + 2x.
Find the equation that expresses the total cost in terms of the number of hours required to complete the job. [revealanswer q="396610"]Show Answer:[/revealanswer] [hiddenanswer a="396610"]
Let x = the number of hours it takes to get the job done. Let y = the total cost to the customer. The $31.50 is a fixed cost. If it takes x hours to complete the job, then (32)(x) is the cost of the word processing only. The total cost is: y = 31.50 + 32x[/hiddenanswer]

For the linear equation y = a + bx, b = slope and a = y-intercept. From algebra, recall that the slope is a number that describes the steepness of a line, and the y-intercept is the y-coordinate of the point (0, a) where the line crosses the y-axis.
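The word-processing cost equation y = 31.50 + 32x translates directly into a one-line function (a Python sketch; the function name is ours):

```python
def total_cost(hours):
    """Total word-processing cost in dollars: $31.50 fixed fee plus $32 per hour."""
    return 31.50 + 32 * hours

print(total_cost(2))  # a 2-hour job costs 95.5 dollars
```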
The independent variable (x) is the number of hours Svetlana tutors each session.
The dependent variable (y) is the amount, in dollars, Svetlana earns for each session. The y-intercept is 25 (a = 25). At the start of the tutoring session, Svetlana charges a one-time fee of $25 (this is when x = 0).
The slope is 15 (b = 15). Svetlana earns $15 for each hour she tutors.[/hiddenanswer]
Ethan repairs household appliances like dishwashers and refrigerators. For each visit, he charges $25 plus $20 per hour of work. A linear equation that expresses the total amount of money Ethan earns per visit is y = 25 + 20x.
What are the independent and dependent variables? What is the y-intercept and what is the slope? Interpret them using complete sentences. [revealanswer q="962960"]Show Answer[/revealanswer] [hiddenanswer a="962960"]
The independent variable (x) is the number of hours Ethan works each visit. The dependent variable (y) is the amount, in dollars, Ethan earns for each visit. The y-intercept is 25 (a = 25). At the start of a visit, Ethan charges a one-time fee of $25 (this is when x = 0). The slope is 20 (b = 20). Ethan earns $20 for each hour he works.[/hiddenanswer]

Data from the Centers for Disease Control and Prevention.
Data from the National Center for HIV, STD, and TB Prevention.
The most basic type of association is a linear association. This type of relationship can be defined algebraically by the equations used, numerically with actual or predicted data values, or graphically from a plotted curve. (Lines are classified as straight curves.) Algebraically, a linear equation typically takes the form y = mx + b, where m and b are constants, x is the independent variable, and y is the dependent variable. In a statistical context, a linear equation is written in the form y = a + bx, where a and b are the constants. This form is used to help readers distinguish the statistical context from the algebraic context. In the equation y = a + bx, the constant b that multiplies the x variable (b is called a coefficient) is called the slope. The slope describes the rate of change between the independent and dependent variables; in other words, the rate of change describes the change that occurs in the dependent variable as the independent variable changes. In the equation y = a + bx, the constant a is called the y-intercept. Graphically, the y-intercept is the y-coordinate of the point where the graph of the line crosses the y-axis. At this point, x = 0.
The slope of a line is a value that describes the rate of change between the independent and dependent variables. The slope tells us how the dependent variable (y) changes for every one-unit increase in the independent variable (x), on average. The y-intercept is used to describe the dependent variable when the independent variable equals zero. Graphically, the slope is represented by three line types in elementary statistics.

| Year | Number of users |
|---|---|
| 2000 | 0.5 |
| 2002 | 20.0 |
| 2003 | 33.0 |
| 2004 | 47.0 |
Creating a Scatter Plot (TI Calculator):

| X (hours practicing jump shot) | Y (points scored in a game) |
|---|---|
| 5 | 15 |
| 7 | 22 |
| 9 | 28 |
| 10 | 31 |
| 11 | 33 |
| 12 | 36 |
| x (third exam score) | y (final exam score) |
|---|---|
| 65 | 175 |
| 67 | 133 |
| 71 | 185 |
| 71 | 163 |
| 66 | 126 |
| 75 | 198 |
| 67 | 153 |
| 70 | 163 |
| 71 | 159 |
| 69 | 151 |
| 69 | 159 |
| Depth (in feet) | Maximum dive time (in minutes) |
|---|---|
| 50 | 80 |
| 60 | 55 |
| 70 | 45 |
| 80 | 35 |
| 90 | 25 |
| 100 | 22 |

[latex]\displaystyle\hat{{y}}={a}+{b}{x}[/latex]
where [latex]\displaystyle{a}=\overline{{y}}-{b}\overline{{x}}[/latex] and [latex]{b}=\frac{{\sum{({x}-\overline{{x}})}{({y}-\overline{{y}})}}}{{\sum{({x}-\overline{{x}})}^{{2}}}}[/latex]. The sample means of the x-values and the y-values are [latex]\displaystyle\overline{{x}}[/latex] and [latex]\overline{{y}}[/latex]. The slope b can also be written as [latex]\displaystyle{b}={r}{\left(\frac{{s}_{{y}}}{{s}_{{x}}}\right)}[/latex], where s_{y} and s_{x} are the sample standard deviations of the y-values and the x-values, and r is the correlation coefficient.
Note: Computer spreadsheets, statistical software, and many calculators can quickly calculate the best-fit line and create the graphs. The calculations tend to be tedious if done by hand. Instructions for using the TI-83, TI-83+, and TI-84+ calculators to find the best-fit line and create a scatter plot are shown at the end of this section. 
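The slope and intercept formulas above can be applied to the third-exam/final-exam data using only the Python standard library (a sketch; variable names are ours):

```python
from math import sqrt

# Third exam (x) and final exam (y) scores from the table above
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), a = y_bar - b * x_bar
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx
a = y_bar - b * x_bar
r = sxy / sqrt(sxx * syy)  # correlation coefficient

print(round(a, 2), round(b, 2))  # about -173.51 and 4.83
print(round(r, 4))               # about 0.6631
```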
Remember, it is always important to plot a scatter diagram first. If the scatter plot indicates that there is a linear relationship between the variables, then it is reasonable to use a best fit line to make predictions for y given x within the domain of xvalues in the sample data, but not necessarily for xvalues outside that domain. You could use the line to predict the final exam score for a student who earned a grade of 73 on the third exam. You should NOT use the line to predict the final exam score for a student who earned a grade of 50 on the third exam, because 50 is not within the domain of the xvalues in the sample data, which are between 65 and 75.
Third Exam vs Final Exam Example: [latex]\displaystyle\hat{{y}}={-173.51}+{4.83}{x}[/latex] Slope: The slope of the line is b = 4.83. Interpretation: For a one-point increase in the score on the third exam (x), the final exam score (y) increases by 4.83 points, on average. 
Note: Strong correlation does not suggest that x causes y or y causes x. We say "correlation does not imply causation." 
Third Exam vs Final Exam Example: The line of best fit is [latex]\displaystyle\hat{{y}}={-173.51}+{4.83}{x}[/latex]. The correlation coefficient is r = 0.6631. The coefficient of determination is r^{2} = 0.6631^{2} = 0.4397. Interpretation of r^{2} in the context of this example: Approximately 44% of the variation (0.4397 is approximately 0.44) in the final-exam grades can be explained by the variation in the grades on the third exam, using the best-fit regression line. Therefore, approximately 56% of the variation (1 – 0.44 = 0.56) in the final-exam grades can NOT be explained by the variation in the grades on the third exam, using the best-fit regression line. (This is seen as the scattering of the points about the line.) 
| x (third exam score) | y (final exam score) |
|---|---|
| 65 | 175 |
| 67 | 133 |
| 71 | 185 |
| 71 | 163 |
| 66 | 126 |
| 75 | 198 |
| 67 | 153 |
| 70 | 163 |
| 71 | 159 |
| 69 | 151 |
| 69 | 159 |
Table showing the scores on the final exam based on scores from the third exam.
Scatter plot showing the scores on the final exam based on scores from the third exam.
We examined the scatter plot and showed that the correlation coefficient is significant. We found the equation of the best-fit line for the final exam grade as a function of the grade on the third exam. We can now use the least-squares regression line for prediction. Suppose you want to estimate, or predict, the mean final exam score of statistics students who received 73 on the third exam. The exam scores (x-values) range from 65 to 75. Since 73 is between the x-values 65 and 75, substitute x = 73 into the equation. Then: [latex]\displaystyle\hat{{y}}={-173.51}+{4.83}{({73})}={179.08}[/latex] We predict that statistics students who earn a grade of 73 on the third exam will earn a grade of 179.08 on the final exam, on average.
Note: The process of predicting within the range of x-values observed in the data is called interpolation. The process of predicting outside the range of x-values observed in the data is called extrapolation. 
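The prediction rule, including the warning against extrapolating outside the observed x-values, can be sketched as a small Python function (the function name and the hard-coded coefficients and domain are taken from the example above):

```python
# Best-fit line from the third exam example: y-hat = -173.51 + 4.83x.
# The observed x-values run from 65 to 75, so the function refuses to
# extrapolate outside that range.
def predict_final(third_exam_score, a=-173.51, b=4.83, x_min=65, x_max=75):
    if not (x_min <= third_exam_score <= x_max):
        raise ValueError("x is outside the observed range; extrapolation is unreliable")
    return a + b * third_exam_score

print(round(predict_final(73), 2))  # 179.08
```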