Homework 2: SDGB 7844
Submit two files through Blackboard: (a) an .Rmd R Markdown file with your answers and code
and (b) a Word document of the knitted R Markdown file. Name your files as follows:
“HW2-[Full Name]-[Class Time]”, and include those details in the body of your file.
For those of you who have studied U.S. government, you know that Congress (the legislature) is
made up of the House of Representatives and the Senate. The number of representatives each state
sends to the House depends on that state’s population, whereas every state sends two
senators to the Senate.
A census of the U.S. population is required every ten years by the U.S. Constitution
(Article 1, Section 2). The primary purpose of the census is to determine how many representatives
each state will send to the House. This procedure is called apportionment
(link). There are 435 representatives in the House, and each state sends at least one representative.
Once the census is complete, the equal proportions method is used to apportion those
435 seats among the states.
The first census was conducted in 1790 when people were hired to visit each home and
count who lived there. At that time, only white males were eligible to vote, but according
to the Constitution everyone was to be counted, not just eligible voters or citizens. Slaves
were counted too, but each was counted as only three-fifths of a person (see Constitution Article
1, Section 2, Clause 3). Slavery was abolished after the Civil War when the 13th Amendment
to the Constitution was ratified in 1865, and the three-fifths clause was formally superseded
by Section 2 of the 14th Amendment in 1868.
The Electoral College is the body that elects the president. Each state’s number of electoral
votes equals its number of House members plus two (one for each senator). During a presidential
election, citizens technically vote for Electoral College members (even though the presidential
candidates’ names are on the ballot), and the Electoral College then votes for president (link).
For all practical purposes, though, whichever candidate gets the most votes in a state gets
all of that state’s electoral votes (Maine and Nebraska are exceptions, since they can split their
votes). (Note: There are 538 Electoral College members and so 538 electoral votes:
538 = 435 House reps + 100 Senators + 3 electors for the District of Columbia. Therefore,
whoever gets at least 270 electoral votes, a majority, wins.)
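The vote counts in the note above can be checked with a few lines of R. This is only an arithmetic sketch; the helper name `electoral_votes()` is hypothetical and not part of the assignment.

```r
# Hypothetical helper: a state's electoral votes = its House seats + 2 senators
electoral_votes <- function(house_seats) house_seats + 2

house_total  <- 435  # voting members of the House
senate_total <- 100  # 2 senators for each of the 50 states
dc_votes     <- 3    # electoral votes for the District of Columbia

total_votes <- house_total + senate_total + dc_votes
majority    <- floor(total_votes / 2) + 1  # smallest strict majority

total_votes        # 538
majority           # 270
electoral_votes(1) # a state with a single House member gets 3 electoral votes
```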
The next census is in 2020 when, again, everyone will be counted. Every residential address
will receive a form to fill out about the occupants of that residence. Between censuses, the
government keeps track of population changes through the Population Estimates Program
(PEP), which is administered by the U.S. Census Bureau (link).
Goal: Use 2018 Population Estimates Program (PEP) data to estimate the number
of members of the House of Representatives each state can expect after the
upcoming 2020 census. Compare your estimates with the current House
distribution, which is based on the 2010 census.¹
DO NOT CHANGE ANY OF THE FILE NAMES OR FILES THEMSELVES!!
• “PEP_2018_PEPANNRES_with_ann.csv”: 2018 population for each state from
the PEP from American FactFinder, a website maintained by the Census Bureau.
Instructions are at the end of this assignment.
• “ApportionmentPopulation2010.xls”: 2010 population for each state and the
2010 apportionment results. Instructions are at the end of this assignment.
• Equal proportions algorithm: In “Congressional Apportionment...” file posted
with this assignment.
• U.S. map: from the R package usmap. You need to install this package on your
computer and then load it by using the command require(usmap). See Lecture 3
slides for instructions on installing an R package.
1. What was the “residence rule” for the 2010 census and why is it important? (Use the
internet and provide a link for any sources you use.)
2. Read the 2018 data file into R. Keep only the columns Geography; April 1, 2010
- Census; and Population Estimate (as of July 1) - 2018. Rename the columns
state, res2010, and pep2018 (all lowercase).
(a) There are 50 states, so why are there more than 50 rows in the data set?
(b) What is the resident population of the U.S. according to the 2010 census? Which
geographies are included/excluded from this total? Remove the extra rows from
your 2018 PEP data set so you only have the data for the 50 states. (The functions
sum() and is.element() are useful here.)
(c) Calculate the percent change of the total resident population between the 2010
census and 2018. How much has the population grown? Once you’ve answered this
question, remove the res2010 column from the data set.
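A minimal sketch of Question 2(b) and (c) on a made-up data frame (the rows and numbers are invented, not PEP values). It assumes the data already has the renamed columns state, res2010, and pep2018. The vector `keep_states` is hypothetical; R’s built-in `state.name` vector of the 50 state names is one possible source for it, provided the names match the Geography column exactly.

```r
# Toy stand-in for the 2018 PEP data (all values are made up)
pep <- data.frame(
  state   = c("United States", "Northeast", "State A", "State B", "Puerto Rico"),
  res2010 = c(1000, 400, 600, 400, 50),
  pep2018 = c(1080, 420, 650, 430, 45),
  stringsAsFactors = FALSE
)

# Hypothetical list of rows to keep; for the assignment, the 50 state names
keep_states <- c("State A", "State B")

# is.element(x, table) is TRUE for each element of x that appears in table
states_only <- pep[is.element(pep$state, keep_states), ]

# Percent change of the total resident population between the two years
total2010  <- sum(states_only$res2010)
total2018  <- sum(states_only$pep2018)
pct_change <- 100 * (total2018 - total2010) / total2010
pct_change  # 8

# Drop res2010 once the percent change has been computed
states_only$res2010 <- NULL
```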
¹Note: The population used for apportionment purposes is slightly higher than the resident populations
given in the 2018 data file, because people such as overseas military personnel are counted as part of
their home states’ populations for apportionment purposes. This means our 2018 population values will
undercount the population used for 2020 apportionment.
3. Read the 2010 data file into R. This file has some extra rows, so the skip
and n_max arguments of the read_excel() function from the readxl package may be useful. Keep
the columns STATE; APPORTIONMENT POPULATION (APRIL 1, 2010); and APPORTIONED
REPRESENTATIVES BASED ON 2010 CENSUS. Rename them state,
appor2010, and rep2010 (again, all lowercase).
(a) Calculate the following summary statistics for the 2010 census population values
and put them into a table in Word: minimum, maximum, mean, median, and
(b) Which state has the largest population? Which has the smallest? Where does New
York fall in the ranking of population size?
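One way to approach the statistics and ranking in Question 3, sketched on five hypothetical states with invented populations (only the column names state and appor2010 come from the assignment). `rank()` on the negated population gives rank 1 to the largest state.

```r
# Made-up populations for five hypothetical states
appor <- data.frame(
  state     = c("State A", "State B", "State C", "New York", "State E"),
  appor2010 = c(500, 12000, 3000, 8000, 700),
  stringsAsFactors = FALSE
)

# Summary statistics listed in Question 3(a)
stats <- c(
  minimum = min(appor$appor2010),
  maximum = max(appor$appor2010),
  mean    = mean(appor$appor2010),
  median  = median(appor$appor2010)
)
stats

# Largest and smallest states
appor$state[which.max(appor$appor2010)]  # "State B"
appor$state[which.min(appor$appor2010)]  # "State A"

# Rank from largest (1) to smallest; ties get the average rank by default
appor$rank <- rank(-appor$appor2010)
appor$rank[appor$state == "New York"]    # 2
```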
4. Create two histograms: (a) 2010 apportionment population and (b) log of the 2010
apportionment population (log always means natural log in statistics). Describe the
shape of both distributions.
5. Looking at your histograms in Question 4, is the mean or the median a better measure
for center in each case? Justify your answer.
6. Create two scatter plots: (a) 2010 apportionment population on the x-axis and number
of House members on the y-axis; and (b) log of 2010 apportionment population on
the x-axis and number of House members on the y-axis. Which plot shows a clearer
relationship between the two variables? Can we use correlation, r, to represent the
relationships in either graph? Justify your answers.
7. Merge the 2018 population data and the 2010 apportionment data into a single R
object called data.x. Estimate what the number of House members for each state would
be in 2020 based on your 2018 population data using the equal proportions method. Add
your calculated apportionment numbers as a new column in data.x.
The equal proportions method of calculating the number of House members is given in
the “Congressional Apportionment” report posted along with this assignment (additional
info). Read it first so you can understand the instructions given below.
Equal Proportions Method:
Step 1: Calculate a vector of values of the formula $1/\sqrt{n(n-1)}$, where n goes from 2 to
60, and call it denom. This means that we are assuming that the maximum number
of seats for a state is 60, which seems reasonable given the 2010 representative
numbers. (Make sure you’ve merged your 2010 and 2018 data sets first.)
Step 2: Multiply each value of denom from Step 1 by each state’s 2018 population. For example,
each element in denom is multiplied by Alabama’s population, and then the process is repeated
for Alaska, Arizona, etc. These values are called priority values:
$PV_n = \frac{P}{\sqrt{n(n-1)}}$, where P is the state’s population.
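There are many ways to do this, but the simplest in terms of coding is to use some
matrix algebra: c(t(outer(data.x$pep2018, denom))), where outer() calculates
the outer product of two vectors, t() transposes the resulting matrix, and c()
converts the matrix into a vector. Because c() reads a matrix column by column, the
transpose ensures that all 59 priority values for the first state in data.x come first,
followed by the 59 values for the second state, and so on.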
Step 3: Create a data set with the priority values as one column and the corresponding
state names as a second column.
Step 4: Sort your data set in Step 3 in descending order by priority value so that the
highest priority values are on top. Extract the first 385 rows (435-50=385). Each
row of the resulting data set represents one seat in the House.
Step 5: Make a frequency table of the state names in Step 4 using the function count().
The frequency of each state is the initial number of representatives for that state.
Step 6: Merge your frequency table with data.x. Then, replace all NA counts with 0 using
the function replace_na().
Step 7: Add 1 to each state representative count so that each state has at least one representative
and the total number of representatives equals 435.
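The seven steps can be sketched in base R as one function. This is a minimal sketch under stated assumptions: `apportion_ep()` is a hypothetical name, and base `table()`, `merge()`, and `is.na()` stand in for `count()` and `replace_na()` so the example runs without extra packages. The toy run uses three invented states and a 10-seat House rather than real data.

```r
# Equal proportions apportionment: a base-R sketch of Steps 1-7
apportion_ep <- function(state, pop, house_size = 435, max_seats = 60) {
  # Step 1: multipliers 1 / sqrt(n(n - 1)) for seat numbers n = 2, ..., max_seats
  n <- 2:max_seats
  denom <- 1 / sqrt(n * (n - 1))

  # Step 2: priority values; after t(), c() lists all of state 1's values,
  # then all of state 2's, and so on
  pv <- c(t(outer(pop, denom)))

  # Step 3: pair every priority value with its state name
  priority <- data.frame(state = rep(state, each = length(denom)),
                         pv = pv, stringsAsFactors = FALSE)

  # Step 4: sort descending and keep one row per seat left to assign
  priority <- priority[order(priority$pv, decreasing = TRUE), ]
  top <- head(priority, house_size - length(state))

  # Step 5: frequency table of state names (table() in place of count())
  counts <- as.data.frame(table(top$state))
  names(counts) <- c("state", "seats")
  counts$state <- as.character(counts$state)

  # Step 6: merge so every state appears; states with no extra seat get NA -> 0
  out <- merge(data.frame(state = state, pop = pop, stringsAsFactors = FALSE),
               counts, by = "state", all.x = TRUE)
  out$seats[is.na(out$seats)] <- 0

  # Step 7: add the one guaranteed seat per state
  out$seats <- out$seats + 1
  out
}

# Toy example: three hypothetical states sharing a 10-seat House
toy <- apportion_ep(state = c("A", "B", "C"),
                    pop = c(6e6, 3e6, 1e6),
                    house_size = 10)
toy  # seats: A = 6, B = 3, C = 1
```

With the assignment’s data, the call would presumably look like apportion_ep(data.x$state, data.x$pep2018), with the result merged back into data.x. In the toy run, state C wins no priority-value seat, so its count is NA before Step 6 sets it to 0 and Step 7 brings it to 1.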
Now, answer the following questions:
(a) Make a table in Word with the three states with the highest number of representatives.
What fraction of the total number of representatives do these 3 states
comprise? Currently, do the same states have the highest number of representatives?
(b) How many states have only one member in the House of Representatives?
8. Calculate the following difference: (estimated 2020 House reps − 2010 House reps) as
a new column in data.x called difference, and convert it to a character data type.
Make a frequency table of the difference column in Word.
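A sketch of Question 8 on made-up data. The column est2020 is a hypothetical name for the estimated 2020 seat counts (the assignment does not fix a name for it), and rep2010 follows the naming in Question 3. Converting the difference to character makes R treat it as a category rather than a number.

```r
# Toy seat counts for four hypothetical states (made-up numbers)
data_toy <- data.frame(
  state   = c("State A", "State B", "State C", "State D"),
  rep2010 = c(10, 4, 1, 7),
  est2020 = c(11, 4, 1, 6),   # hypothetical column name for the 2020 estimates
  stringsAsFactors = FALSE
)

# Difference stored as character, so it is treated as a category
data_toy$difference <- as.character(data_toy$est2020 - data_toy$rep2010)

# Frequency table of the differences
diff_table <- table(data_toy$difference)
diff_table
```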
9. One way to represent the information in Question 8 is with a map.
(a) Make a map of the US color-coded by the difference column. Then answer the following questions.
(b) Why does the legend include an NA?
(c) Describe what you see in the map.
(d) Various research/media organizations have made their own predictions about the distribution
of House seats. Pick one and compare your results with their predictions.
Include links to any references you use.
(e) Describe one way we could improve our analysis.
Downloading 2018 PEP Data
1. Go to the American FactFinder website:
2. In the section titled “What We Provide” near the bottom, click the “get data” link
next to Population Estimates Program.
3. Click on the table called PEPANNRES, “Annual Estimates of the Resident Population:
April 1, 2010 to July 1, 2018”. It should bring you to a table which looks like this:
4. Click on the Download button; select the “Use” option in the pop-up window and click
5. Unzip the downloaded file. The file you will be using is called
“PEP_2018_PEPANNRES_with_ann.csv”. The other files in the folder contain information
about the data.
6. You can put the entire folder wherever you have your R code for this assignment.
When you read in the data, use the filepath
“PEP_2018_PEPANNRES/PEP_2018_PEPANNRES_with_ann.csv” to indicate that the
file you want is inside the folder called “PEP_2018_PEPANNRES”. That way you can
keep all of the information relevant to the data file together.
Downloading 2010 Apportionment Data
1. Go to this website:
2. Download the Excel file titled “Apportionment Population and Number...”