Data Science with R
Assignments
All assignments are part of the online course "Data Science: R Basics" on edx.org. They will help you understand the statistical software R and apply what you learn to earn a certificate if you wish.
*These are provided for reference only. Try to solve the exercises yourself when working toward certification on edx.org.
01
What is the sum of the first 100 positive integers?
PDF
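The sum of the first n positive integers has the closed form n(n + 1)/2. A minimal R sketch that checks the formula against a direct sum for n = 100:

```r
# Closed-form sum of the first n positive integers: n(n + 1) / 2
n <- 100
formula_sum <- n * (n + 1) / 2

# Direct sum of 1, 2, ..., n for comparison
direct_sum <- sum(1:n)

formula_sum  # 5050
direct_sum   # 5050
```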
03
How do you use the sum() and seq() functions to add the first 1000 positive integers?
PDF
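One way to do this, sketched in base R: seq() builds the sequence and sum() adds it up. The result also answers exercise 02 and matches the closed form 1000 * 1001 / 2.

```r
# seq(1, 1000) produces 1, 2, ..., 1000; sum() adds the entries
total <- sum(seq(1, 1000))
total  # 500500
```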
05
Which of the following will always return the numeric value stored in?
PDF
07
What are the column names used by the "murders" data frame for the five variables?
PDF
09
Use the double square brackets [[ to extract the state abbreviations. Then use the identical() function to check whether the result is the same as the one obtained in exercise 08.
PDF
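A minimal sketch covering exercises 08 and 09, assuming the dslabs package is installed. The accessor $ and the double brackets [[ return the same character vector:

```r
library(dslabs)
data(murders)

# Exercise 08: accessor $
a <- murders$abb
# Exercise 09: double square brackets with the column name
b <- murders[["abb"]]

class(a)         # "character"
identical(a, b)  # TRUE
```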
11
Use the table() function in one line of code to create a table showing the number of states per region.
PDF
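A short sketch for exercises 10 and 11, assuming dslabs is installed. Because region is stored as a factor, levels() lists its categories and table() counts the states in each one:

```r
library(dslabs)
data(murders)

# Exercise 10: number of distinct regions
n_regions <- length(levels(murders$region))
n_regions

# Exercise 11: number of states per region, in one line
region_counts <- table(murders$region)
region_counts
```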
13
Create a vector with the city names Beijing, Lagos, Paris, Rio de Janeiro, San Juan, and Toronto.
PDF
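A sketch tying together exercises 12, 13, and 14. The temperature values below are assumed illustrative numbers in Fahrenheit, not taken from this page:

```r
# Exercise 13: city names
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")

# Exercise 12: average high temperatures (assumed illustrative values)
temp <- c(35, 88, 42, 84, 81, 30)

# Exercise 14: associate each temperature with its city
names(temp) <- city
temp
```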
15
Use the [ and : operators to access the temperature of the first three cities already stored in the temp object.
PDF
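Indexing sketch for exercises 15 and 16, again with assumed illustrative temperatures. The : operator builds a run of consecutive positions; for positions that are not adjacent, such as Paris (3) and San Juan (5), c() supplies them explicitly:

```r
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
temp <- c(35, 88, 42, 84, 81, 30)  # assumed illustrative values
names(temp) <- city

# Exercise 15: first three cities via [ and :
first_three <- temp[1:3]

# Exercise 16: Paris and San Juan sit at positions 3 and 5
paris_san_juan <- temp[c(3, 5)]

first_three
paris_san_juan
```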
17
Use the : operator to create a sequence of consecutive integers starting at 12 and ending at 73, then determine its length.
PDF
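A one-line check: a sequence from a to b built with : contains b - a + 1 integers, so here 73 - 12 + 1 = 62.

```r
x <- 12:73
length(x)  # 62
```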
19
Create a vector of numbers that starts at 6, does not exceed 55, and increases in increments of 4/7, so the first three numbers are 6, 6+4/7, and 6+8/7. How many numbers does the vector contain?
PDF
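The count follows from the arithmetic: (55 - 6) / (4/7) = 85.75, so 85 full steps fit after the starting value, giving 86 numbers. A sketch using seq() with the by argument:

```r
# seq() stops at the last value that does not exceed 55
x <- seq(6, 55, by = 4/7)
x[1:3]     # 6, 6 + 4/7, 6 + 8/7
length(x)  # 86
```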
21
Store the sequence seq(1, 10) in the object a.
Determine the class of a.
PDF
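A sketch for exercises 21 and 22. When seq() is called with only from and to, it returns from:to, which is an integer vector; a plain literal such as 1 is numeric (double), while the L suffix makes it an integer:

```r
a <- seq(1, 10)
class(a)   # "integer"

class(1)   # "numeric"
class(1L)  # "integer"
```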
23
x is a character vector. Redefine x by typecasting it to a number vector with as.numeric(); note that as.numeric() returns class numeric, while as.integer() returns class integer.
PDF
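A minimal typecasting sketch; the contents of x are a hypothetical example, since the exercise's vector is not shown here:

```r
x <- c("1", "3", "5")  # hypothetical character vector
x <- as.numeric(x)     # typecast to numeric (double)
class(x)               # "numeric"

y <- as.integer(c("1", "3", "5"))
class(y)               # "integer"
```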
25
Access the population values in the dataset and order them. Find the index of the entry with the smallest population.
PDF
27
Define a variable states to hold the state names from the murders data frame. Combine it with the index from the previous exercises to find the name of the state with the smallest population.
PDF
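A sketch covering exercises 24 through 27, assuming dslabs is installed. sort() returns the values themselves, order() returns the indices that would sort them, and which.min() returns the index of the minimum directly:

```r
library(dslabs)
data(murders)

pop <- murders$population

smallest_pop <- sort(pop)[1]      # exercise 24: smallest value
idx_order <- order(pop)[1]        # exercise 25: index via order()
idx_min <- which.min(pop)         # exercise 26: same index in one call

states <- murders$state
smallest_state <- states[idx_min] # exercise 27
smallest_state
```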
29
Define a variable ind to store the indexes needed to order the population values. Create a data frame with the state names and their ranks, ordered from least to most populous.
PDF
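A sketch for exercises 28 and 29, assuming dslabs is installed. rank() gives each state's position (1 = least populous), and indexing both columns with order() rearranges the rows from least to most populous:

```r
library(dslabs)
data(murders)

# Exercise 28: rank of each state's population
ranks <- rank(murders$population)
my_df <- data.frame(name = murders$state, rank = ranks)

# Exercise 29: reorder rows from least to most populous
ind <- order(murders$population)
my_df_ordered <- data.frame(name = murders$state[ind], rank = ranks[ind])
head(my_df_ordered)
```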
31
Write one line of code to compute the average of only the entries that are not NA, using the ! operator before ind.
PDF
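A sketch for exercises 30 and 31 on a small hypothetical vector, used here as a stand-in for the course data. A single NA makes mean() return NA, so the logical index from is.na() is negated to keep only the observed values:

```r
# Hypothetical vector with missing values
x <- c(2, NA, 4, NA, 6)

ind <- is.na(x)
sum(ind)                # 2 NA entries (TRUE counts as 1)

mean(x)                 # NA, because NA propagates
mean(x[!ind])           # 4, average of the non-NA entries
mean(x, na.rm = TRUE)   # same result via the na.rm argument
```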
33
Define an object that contains the numbers 1 through 100. Compute the sum 1+1/2^2+1/3^2+...+1/100^2.
PDF
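Vector arithmetic handles this without a loop: 1 / x^2 is computed element-wise and sum() adds the terms. The result is close to, and slightly below, the infinite-series limit pi^2/6 (about 1.644934).

```r
x <- 1:100
s <- sum(1 / x^2)
s  # about 1.634984
```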
35
In one line of code, use a logical operator to create a logical vector named low that tells us which entries of murder_rate are lower than 1.
PDF
37
Use the results from the previous exercise and square brackets to report the names of the states with murder rates lower than 1.
PDF
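A sketch chaining exercises 34 through 37 and 39, assuming dslabs is installed. The logical vector low can index the state names directly, and which() converts it to positions:

```r
library(dslabs)
data(murders)

# Exercise 34: murders per 100,000 people
murder_rate <- murders$total / murders$population * 100000
mean(murder_rate)

low <- murder_rate < 1                    # exercise 35
which(low)                                # exercise 36
murders$state[low]                        # exercise 37

sum(murder_rate < mean(murder_rate))      # exercise 39
```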
39
In a previous exercise, we computed the murder rate for each state and the average of these numbers. How many states are below the average?
41a
Define a character vector with the abbreviations MA, ME, MI, MO, and MU. Use the %in% operator to create a logical vector that is TRUE when the abbreviation is in murders$abb.
PDF
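A sketch for exercises 41a and 41b, assuming dslabs is installed. %in% tests membership element by element, and which() combined with ! locates the entries that are not real abbreviations:

```r
library(dslabs)
data(murders)

abbs <- c("MA", "ME", "MI", "MO", "MU")

# Exercise 41a: TRUE when the abbreviation appears in murders$abb
in_data <- abbs %in% murders$abb
in_data

# Exercise 41b: entries that are not actual abbreviations
ind <- which(!in_data)
abbs[ind]
```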
42
You can add columns using the mutate() function from the dplyr package. Use this function to add a column named rate to murders containing the murder rate per 100,000 people.
PDF
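A dplyr sketch for exercises 42 through 45, assuming the dslabs and dplyr packages are installed. Ranking -rate makes rank 1 the highest murder rate, so filtering on rank <= 5 keeps the top five states:

```r
library(dslabs)
library(dplyr)
data(murders)

# Exercise 42: murder rate per 100,000
murders <- mutate(murders, rate = total / population * 100000)

# Exercise 43: rank 1 = highest rate; murders is redefined
murders <- mutate(murders, rank = rank(-rate))

# Exercise 44: show state names and abbreviations
select(murders, state, abb)

# Exercise 45: top 5 states by murder rate
filter(murders, rank <= 5)
```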
44
Use select() to show the state names and abbreviations in murders. Just show it; do not define a new object.
PDF
46
Create a new data frame called no_south that removes states from the South region. How many states are in this category? You can use the function nrow() for this.
PDF
48
Create a table called my_states containing the states that satisfy both conditions: they are in the Northeast or West, and their murder rate is less than 1. Use select() to show only the state name, the rate, and the rank.
PDF
50
Use one line of code to create a new data frame, called my_states, that has murder rate and rank columns, considers only states in the Northeast or West that have a murder rate lower than 1, and contains only the state, rate, and rank columns.
PDF
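A pipe-based sketch for exercises 48 through 50, assuming dslabs and dplyr are installed. Each step passes its result to the next, so no intermediate objects are needed:

```r
library(dslabs)
library(dplyr)
data(murders)

my_states <- murders %>%
  mutate(rate = total / population * 100000, rank = rank(-rate)) %>%
  filter(region %in% c("Northeast", "West") & rate < 1) %>%
  select(state, rate, rank)

my_states
```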
52
Compute the population in millions and save it to the object population_in_millions. Create a histogram of the state populations using the function hist.
PDF
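A base-graphics sketch for exercises 51 through 53, assuming dslabs is installed. The log10 transformation spreads out the heavily skewed population and murder counts before plotting:

```r
library(dslabs)
data(murders)

# Exercise 52: population in millions and its histogram
population_in_millions <- murders$population / 10^6
hist(population_in_millions)

# Exercise 51: log10-transformed scatterplot
log10_population <- log10(murders$population)
log10_total <- log10(murders$total)
plot(log10_population, log10_total)

# Exercise 53: population by region
boxplot(population ~ region, data = murders)
```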
54
02
What is the sum of the first 1000 positive integers?
PDF
04
Use one line of code to compute the log, to the base 10, of the square root of 100.
PDF
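The functions can be nested in a single line: sqrt(100) is 10, and log10(10) is 1.

```r
result <- log10(sqrt(100))
result  # 1
```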
06
Use the function str() to examine the structure of the "murders" object in the dslabs package.
PDF
08
Use the accessor $ to extract the state abbreviations. What is the class of this object?
PDF
10
Use the functions levels() and length() to determine the number of regions defined in murders$region.
PDF
12
Use the c() function to create a numeric vector with the average high temperatures.
PDF
14
Use the names() function and the objects created in exercises 12 and 13 to associate each temperature with its city.
PDF
16
Use the [ operator to access the temperatures of Paris and San Juan stored in the temp object.
PDF
18
Create a vector containing all positive odd numbers smaller than 100, in ascending order.
PDF
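A sketch using seq() with a step of 2: starting at 1 and stepping by 2 yields only odd numbers, and 99 is the largest odd number below 100.

```r
odd_numbers <- seq(1, 99, 2)
odd_numbers
length(odd_numbers)  # 50
```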
20
The argument length.out generates sequences that increase by a constant amount and have a prespecified length.
What is the class of the following object a?
PDF
22
Confirm that the class of 1 is numeric and the class of 1L is integer.
PDF
24
Access the population values in the dataset and sort them. Determine the smallest population size.
PDF
26
Write one line of code that gives the index of the lowest population entry. Use the which.min() function.
PDF
28
Define a variable ranks that holds the rank of each state's population size. Create a data frame with the state names and their respective ranks.
PDF
30
Use is.na() to create a logical index ind that indicates which entries are NA. Use the sum() function to determine how many NA entries there are.
PDF
32
Use vector arithmetic to convert temp to Celsius. Create a data frame with the city names and temperatures in Celsius.
PDF
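The conversion C = 5/9 × (F - 32) applies element-wise to the whole vector. The Fahrenheit values below are assumed illustrative numbers, since temp is defined in an earlier exercise:

```r
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
temp <- c(35, 88, 42, 84, 81, 30)  # assumed illustrative values, Fahrenheit

# Vector arithmetic: every entry is converted at once
temp_c <- 5/9 * (temp - 32)

city_temps <- data.frame(name = city, temperature = temp_c)
city_temps
```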
34
Compute the murder rate per 100,000 people for each state and store it in murder_rate. Calculate the average of these state murder rates.
PDF
36
Use the results from the previous exercise and the function which() to determine the indices of murder_rate associated with values lower than 1.
38
Use the operator to create a new object ind that is true when low is true and the state is in the Northeast. Use the brackets [ and ind to show the state names that satisfy this condition.
PDF
40
Define a character vector with the abbreviations. Start by defining an index of the entries of murders$abb that match the three abbreviations. Extract the states.
PDF
41b
Using the which() function and the ! operator, get the index of the entries of abbs that are not abbreviations. Show the entries of abbs that are not actual abbreviations.
PDF
43
Use the mutate() function to add a column named rank containing each state's rank, from highest to lowest murder rate.
Make sure you redefine murders.
PDF
45
Use filter() to show the top 5 states with the highest murder rates. Note that you can filter based on the rank column.
PDF
47
Create a new data frame called murders_nw with only the states from the Northeast and the West. How many states are in this category?
49
The pipe %>% can be used to perform operations sequentially without defining intermediate objects.
Produce the same result as in exercise 48, but using the pipe.
PDF
51
Apply a log base-10 transformation to the population and total murders variables.
Create a scatterplot of the log-transformed total murders versus the log-transformed population using plot().
PDF
53
Create a boxplot of state populations by region for the murders dataset using boxplot.
PDF
55
Which of the following expressions is always when at least one entry of a logical vector x is ? You can try examples in the R console.
PDF