Data Science with R

Assignments

All assignments are part of the online course "Data Science: R Basics" on "edx.org". They are meant to help you understand the statistical software R and to apply what you learn toward a certificate, if needed.
*These are for your help only! Try to do the exercises yourself when working toward certification from edx.org.

01

What is the sum of the first 100 positive integers?

PDF

03

How do you use sum() and seq() functions to add the first 1000 positive numbers?
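As a minimal sketch, the sequence produced by seq() can be passed straight to sum(). The variable names n, total, and closed_form are illustrative, and the comparison with n(n+1)/2 is added here only as a sanity check.

```r
# Number of positive integers to add (illustrative variable name)
n <- 1000

# seq(1, n) generates 1, 2, ..., n; sum() adds the elements
total <- sum(seq(1, n))

# Sanity check against the closed form n * (n + 1) / 2
closed_form <- n * (n + 1) / 2

print(total)                 # 500500
print(total == closed_form)  # TRUE
```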

PDF

05

Which of the following will always return the numeric value stored in?


PDF

07

What are the column names used by the "murders" data frame for the five variables?

PDF

09

Use double square brackets [[ to extract the state abbreviations. Then use the identical() function to determine whether the result is the same as the one from exercise 08.
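The comparison between the two accessors might look like the sketch below. It uses a small hypothetical data frame named toy instead of the real murders data, so the values are placeholders.

```r
# Hypothetical stand-in for the murders data frame
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  abb = c("AL", "AK", "AZ"),
  stringsAsFactors = FALSE
)

# Extract the same column with the $ accessor and with [[
a <- toy$abb
b <- toy[["abb"]]

print(class(a))         # "character"
print(identical(a, b))  # TRUE
```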

PDF

11

Use the table() function in one line of code to create a table showing the number of states per region.

PDF

13

Create a vector with the city names Beijing, Lagos, Paris, Rio de Janeiro, San Juan, and Toronto.

PDF

15

Use the [ and : operators to access the temperature of the first three cities already stored in the temp object.

PDF

17

Use the : operator to create a sequence of consecutive integers starting at 12 and ending at 73 and then determine the length of it.

PDF

19

Create a vector of numbers that starts at 6, does not go beyond 55, and adds numbers in increments of 4/7. So the first three numbers will be 6, 6+4/7, and 6+8/7. How many numbers does the list have?
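A possible sketch using seq() with a third argument giving the step size. The object name a is an assumption.

```r
# Start at 6, stop before exceeding 55, step by 4/7
a <- seq(6, 55, 4/7)

# First three values: 6, 6 + 4/7, 6 + 8/7
print(head(a, 3))

# Number of elements in the sequence
print(length(a))  # 86
```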

PDF

21

Store the sequence in the object a <- seq(1, 10). Determine the class of a.

PDF

23

x is a character vector. Redefine x by typecasting it with as.integer() so that it becomes an integer vector.
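The sketch below shows typecasting on a hypothetical character vector. It also illustrates that as.numeric() yields class "numeric", whereas as.integer() yields class "integer". The contents of x are placeholders.

```r
# Hypothetical character vector
x <- c("1", "3", "5")
print(class(x))               # "character"

# as.numeric() produces a double-precision numeric vector
print(class(as.numeric(x)))   # "numeric"

# Redefine x as an integer vector
x <- as.integer(x)
print(class(x))               # "integer"

# The same distinction holds for literals
print(class(1))               # "numeric"
print(class(1L))              # "integer"
```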

PDF

25

Access population values from the dataset and order them. Find the index number of the entry with the smallest population size.

PDF

27

Define a variable states to hold the state names from the murders data frame. Combine it with the index of the smallest population to find the name of the least populous state.

PDF

29

Define a variable ind to store the indexes needed to order the population values. Create a data frame with the state names and their ranks, ordered from least to most populous.
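One way to combine order(), rank(), and which.min() is sketched below. The states and population vectors are hypothetical toy data standing in for murders$state and murders$population.

```r
# Hypothetical toy data (not the real murders data)
states <- c("A", "B", "C", "D")
population <- c(500, 120, 900, 300)

# Index of the smallest population, found two ways
ind <- order(population)      # indexes that sort population ascending
print(ind[1])                 # 2
print(which.min(population))  # 2
print(states[ind[1]])         # "B"

# Rank of each state by population (1 = least populous)
ranks <- rank(population)

# Data frame ordered from least to most populous
my_df <- data.frame(state = states[ind], rank = ranks[ind])
print(my_df)
```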

PDF

31

Write one line of code to compute the average of only the entries that are not NA, using the ! operator before ind.
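A minimal sketch of the NA handling, assuming a hypothetical vector x in place of the data set used in the course.

```r
# Hypothetical vector with missing values
x <- c(2, NA, 4, NA, 6)

# Logical index: TRUE where the entry is NA
ind <- is.na(x)

# TRUE counts as 1, so sum() gives the number of NAs
print(sum(ind))      # 2

# !ind flips the index, keeping only the non-NA entries
avg <- mean(x[!ind])
print(avg)           # 4
```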

PDF

33

Define an object that contains the numbers 1 through 100. Compute the sum 1+1/2^2+1/3^2+...+1/100^2.
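Vector arithmetic makes this a one-liner once the numbers are stored. In the sketch below the names x and s are assumptions, and the comparison with pi^2/6 (the limit of the infinite series) is included only as a plausibility check.

```r
# The numbers 1 through 100
x <- 1:100

# 1/x^2 is computed element-wise, then summed
s <- sum(1 / x^2)
print(s)

# The partial sum stays just below the infinite-series limit pi^2/6
print(pi^2 / 6 - s)
```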

PDF

35

Use the logical operators to create a logical vector, name it low, that tells us which entries of murder_rate are lower than 1, and which are not, in one line of code.

PDF

37

Use the results from the previous exercise to report the names of the states with murder rates lower than 1, using the square brackets to retrieve the names of the states from the dataset.

PDF

39

In a previous exercise, we computed the murder rate for each state and the average of these numbers. How many states are below the average?



PDF

41a

Define a character vector with the abbreviations MA, ME, MI, MO, and MU. Use the %in% operator to create a logical vector that is TRUE when the abbreviation is in murders$abb.

PDF

42

You can add columns using the mutate() function from the dplyr package. Use this function to add a column named rate to murders, containing the murder rate per 100,000 people.

PDF

44

Use select() to show the state names and abbreviations in murders. Just show it; do not define a new object.

PDF

46

Create a new data frame called no_south that removes states from the South region. How many states are in this category? You can use the function nrow() for this.

PDF

48

Create a table called my_states containing the states that satisfy both conditions: the state is in the Northeast or West, and its murder rate is less than 1. Use select() to show only the state name, the rate, and the rank.

PDF

50

Use one line of code to create a new data frame, called my_states, that has murder rate and rank columns, considers only states in the Northeast or West that have a murder rate lower than 1, and contains only the state, rate, and rank columns. 
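The chain of mutate(), filter(), and select() could be sketched as follows. This assumes the dplyr package is installed, and it uses a hypothetical four-row data frame named toy with the same column names as murders, so the numbers are placeholders.

```r
library(dplyr)

# Hypothetical stand-in for murders (column names assumed to match)
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  region = c("Northeast", "South", "West", "West"),
  population = c(1e6, 2e6, 5e5, 4e6),
  total = c(5, 100, 2, 80),
  stringsAsFactors = FALSE
)

# Add rate and rank, keep low-rate Northeast/West states, show three columns
my_states <- toy %>%
  mutate(rate = total / population * 100000, rank = rank(-rate)) %>%
  filter(region %in% c("Northeast", "West") & rate < 1) %>%
  select(state, rate, rank)

print(my_states)
```

With the toy data, only states A and C pass the filter; rank 1 corresponds to the highest rate because rank() is applied to the negated rate.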

PDF

52

Compute the population in millions and save it to the object population_in_millions. Create a histogram of the state populations using the function hist.

PDF

54

What will this conditional expression return? Run it from the console.

PDF

02

What is the sum of the first 1000 positive integers?

PDF

04

Use one line of code to compute the log, to the base 10, of the square root of 100.

PDF

06

Use the function str() to examine the structure of the "murders" object from the "dslabs" package.

PDF

08

Use the accessor $ to extract the state abbreviations. What is the class of this object?

PDF

10

Use the functions levels() and length() to determine the number of regions defined in murders$region.
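A short sketch of how levels(), length(), and table() behave on a factor. The factor region below is a hypothetical stand-in for murders$region.

```r
# Hypothetical factor standing in for murders$region
region <- factor(c("South", "West", "West", "Northeast", "South", "West"))

# levels() lists the distinct categories; length() counts them
n_regions <- length(levels(region))
print(n_regions)      # 3

# table() counts how many entries fall in each level
print(table(region))
```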



PDF

12

Use the c() function to create a numeric vector with the average high temperatures.

PDF

14

Use the names() function and the objects defined in exercises 12 and 13 to associate the temp data with its city.

PDF

16

Use the [ operator to access the temperature of Paris and San Juan already stored in the temp object.
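Naming a vector and subsetting it can be sketched as below. The city names come from exercise 13; the temperature values are placeholders chosen for illustration, since the actual values are not given here.

```r
# City names from exercise 13
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")

# Placeholder temperatures (illustrative values only)
temp <- c(35, 88, 42, 84, 81, 30)

# Associate each temperature with its city
names(temp) <- city

# First three cities, using [ and :
first_three <- temp[1:3]
print(first_three)

# Non-consecutive entries, selected by name (positions c(3, 5) also work)
paris_san_juan <- temp[c("Paris", "San Juan")]
print(paris_san_juan)
```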

PDF

18

Create a vector containing all positive odd numbers smaller than 100, in ascending order.

PDF

20

The argument length.out generates sequences that increase by the same amount but have a prespecified length. What is the class of the following object a?



PDF

22

Confirm that the class of 1 is numeric and the class of 1L is integer.

PDF

24

Access population values from the dataset and sort them. Determine the smallest population size.

PDF

26

Write one line of code that gives the index of the lowest population entry. Use the which.min command.


PDF

28

Define a variable ranks to determine the population size ranks. Create a data frame with state names and their respective ranks.


PDF

30

Use is.na() to create a logical index ind that tells which entries are NA. Determine how many NAs there are by applying the sum() function to ind.



PDF

32

Use vector arithmetic to convert temp to Celsius. Create a data frame with the city names and temperatures in Celsius.
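The conversion C = (F - 32) * 5/9 applies element-wise to a whole vector. The sketch below assumes temp holds Fahrenheit values and uses placeholder numbers; city_temps is an illustrative name.

```r
# Placeholder Fahrenheit temperatures (illustrative values only)
temp <- c(Beijing = 35, Lagos = 88, Paris = 42)

# Vector arithmetic: every element is converted in one expression
temp_c <- (temp - 32) * 5 / 9

# Data frame with city names and Celsius temperatures
city_temps <- data.frame(name = names(temp), temperature = unname(temp_c))
print(city_temps)
```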

PDF

34

Store the per 100,000 murder rate for each state in murder_rate. Calculate the average murder rate in the US.

PDF

36

Use the results from the previous exercise and the function which() to determine the indices of murder_rate associated with values lower than 1.
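Logical indexing and which() can be sketched on a hypothetical named vector. The murder_rate values below are made up and stand in for the per-state rates computed from murders.

```r
# Hypothetical murder rates per 100,000 (made-up values)
murder_rate <- c(A = 0.8, B = 2.5, C = 0.3, D = 1.0)

# Logical vector: TRUE where the rate is below 1
low <- murder_rate < 1
print(low)

# which() converts the logical vector to the positions of the TRUE entries
ind <- which(low)
print(unname(ind))             # 1 3

# Square brackets with a logical index retrieve the matching names
low_states <- names(murder_rate)[low]
print(low_states)              # "A" "C"

# How many entries are below the average
below_avg <- sum(murder_rate < mean(murder_rate))
print(below_avg)
```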
 


PDF

38

Use a logical operator to create a new object ind that is TRUE when low is TRUE and the state is in the Northeast. Use the brackets [ and ind to show the state names that satisfy this condition.


PDF

40

Define a character vector with the abbreviations. Start by defining an index of the entries of murders$abb that match the three abbreviations. Extract the states.

PDF

41b

Using the which() function and the ! operator, get the index of the entries of abbs that are not abbreviations. Show the entries of abbs that are not actual abbreviations.
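The following sketch combines %in%, !, and which(). It assumes R's built-in state.abb vector as a stand-in for murders$abb; state.abb covers the 50 states only, so it is not identical to the course data.

```r
# Candidate abbreviations from exercise 41a
abbs <- c("MA", "ME", "MI", "MO", "MU")

# TRUE where the candidate appears among the known abbreviations
# (state.abb is a built-in stand-in for murders$abb)
is_abb <- abbs %in% state.abb
print(is_abb)        # TRUE TRUE TRUE TRUE FALSE

# Positions of the entries that are NOT actual abbreviations
ind <- which(!is_abb)
print(ind)           # 5

# The offending entries themselves
print(abbs[ind])     # "MU"
```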


PDF

43

Use the mutate() function to add a column rank containing the rank, from highest to lowest murder rate. Make sure you redefine murders.


PDF

45

Use filter() to show the top 5 states with the highest murder rates. Note that you can filter based on the rank column.

PDF

47

Create a new data frame called murders_nw with only the states from the Northeast and the West. How many states are in this category?
 


PDF

49

The pipe %>% can be used to perform operations sequentially without defining intermediate objects. Show the same result as in exercise 48, but using the pipe.


PDF

51

Transform the variables population and total murders using the log base-10 transformation. Create a scatterplot with plot() of the log-transformed total murders versus the log-transformed population.




PDF

53

Create a boxplot of state populations by region for the murders dataset using boxplot.



PDF

55

Which of the following expressions is always when at least one entry of a logical vector x is ? You can try examples in the R console.

PDF
