Vectors and variables

Last chapter, we saw an example of a vector: a sequence of data of the same type, for example, a sequence of numbers or a sequence of strings. When analyzing data, you almost never deal with single numbers; the reason you need to analyze the data in the first place is likely that there are heaps of it! That’s why handling sequences of strings and numbers is central to data analysis, and why vectors are core to R.

Let me show you how central vectors are to R.

Type any number in the box below and press ▶ Run Code.

But it’s not just a number. It’s actually a vector that holds a single number. In R, even single numbers are vectors.
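
To check this outside the exercise box, here is a small sketch using R's built-in length() and is.vector() functions (neither is otherwise covered in this chapter). It shows that a lone number is a vector with exactly one item:

```r
# A single number is really a numeric vector of length 1
length(42)     # [1] 1
is.vector(42)  # [1] TRUE

# The [1] that R prints is the position of the first value on that output line
42             # [1] 42
```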

Often, you’ll want vectors longer than a single number. You can create them with the c() function, which combines several values into one vector. For example, c(2, 3, 5, 7) creates a numeric vector with all the prime numbers between 1 and 10.
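
A few more uses of c(), as a small sketch: it works for strings as well as numbers, and combining vectors with c() gives one flat vector rather than a vector inside a vector. The food words are just illustrative values.

```r
# c() combines values into one vector
c(2, 3, 5, 7)                  # [1] 2 3 5 7

# It works with strings too
c("pizza", "pasta", "salad")   # [1] "pizza" "pasta" "salad"

# Combining vectors gives one longer, flat vector
c(c(2, 3), c(5, 7))            # [1] 2 3 5 7
```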

Create a numeric vector using c() with at least 5 items. (By the way, I’ll check if any of them are prime numbers.)

There are many functions that help you create vectors in R. One shortcut is the colon operator: 10:30, for example, creates the vector 10, 11, 12, ..., 28, 29, 30.
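
A short sketch of how the colon operator behaves: both ends are included, and it also counts down. The last line uses seq(), a base R function not covered in this chapter, as one example of the "many functions" mentioned above; it handles steps other than 1.

```r
10:30                  # 10 11 12 ... 29 30
length(10:30)          # [1] 21, since both 10 and 30 are included
5:1                    # [1] 5 4 3 2 1, counting down works too

# seq() allows other step sizes
seq(0, 100, by = 10)   # [1]   0  10  20  30  40  50  60  70  80  90 100
```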

Use the colon : operator to create the vector 1, 2, 3, ..., 98, 99, 100. (Again, I’ll figure out which ones are prime.)

Many functions in R are vectorized; that is, they work on single values as well as on vectors. For example, nchar("pizza") returns 5, the number of characters in "pizza". But nchar() also works on vectors of strings.
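
A minimal sketch of vectorization, assuming a few made-up food words: nchar() returns one count per string. toupper(), another base R function not otherwise used in this chapter, is vectorized in the same way.

```r
nchar("pizza")                           # [1] 5
nchar(c("pizza", "pasta", "tiramisu"))   # [1] 5 5 8, one count per word

# Other string functions work the same way
toupper(c("pizza", "pasta"))             # [1] "PIZZA" "PASTA"
```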

Use nchar() to count the number of characters in each word.

In particular, all the arithmetic operators (+, -, *, /, etc.) are vectorized. For example, 1:3 * 5 gives 5, 10, 15.
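
A few more vectorized operations, as a sketch. Each operator is applied to every item of 1:3. The parentheses in (1:3)^2 matter: R evaluates ^ before :, so 1:3^2 would mean 1:9.

```r
1:3 * 5     # [1]  5 10 15
1:3 + 10    # [1] 11 12 13
1:3 / 2     # [1] 0.5 1.0 1.5
(1:3)^2     # [1] 1 4 9
```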

Make the code below output the vector c(15, 25, 35, 45, 55, 65, 75, 85, 95, 105) by changing only the numbers in * 1 + 0 (leave 1:10 alone!).

When doing math with two vectors of the same length, the operation will be applied to each corresponding pair of values. It’s easier than it sounds. For example, c(10, 20, 30) + c(1, 2, 3) gives 11, 22, 33 and 11:14 - 1:4 gives 10, 10, 10, 10.
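
The examples from the paragraph above, written out as a runnable sketch. The first values are combined with each other, then the second values, and so on:

```r
c(10, 20, 30) + c(1, 2, 3)   # [1] 11 22 33
11:14 - 1:4                  # [1] 10 10 10 10
c(2, 4, 6) * c(10, 10, 10)   # [1] 20 40 60
```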

Change the code below to subtract the expenses from the quarterly sales to get the quarterly revenue.

Working with the numbers directly, like you did above, works, but it quickly gets messy. Assigning the values to variables keeps things organized. That needs some explanation, but first, let’s look at an example:

pi <- 3.141593

Here we’re taking the value (3.141593) and by using <-, the assignment operator, we’re assigning it to (“putting it into”) a variable named pi. Now, instead of writing 2 * 3.141593 * 5, we can write 2 * pi * 5. The assignment operator is made up of a < and a -, and is meant to look like a left-pointing arrow.
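
Here is the same idea as a runnable sketch. One thing worth knowing: R already comes with a built-in constant called pi, so the assignment creates your own variable that hides it. Removing your copy with rm() brings the built-in one back.

```r
pi <- 3.141593   # store the value in a variable named pi
2 * pi * 5       # [1] 31.41593, same as 2 * 3.141593 * 5

rm(pi)           # remove our own pi ...
pi               # ... and R's built-in constant is used again
```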

Variables can be given both short and long names, but they can’t include spaces. Instead, it’s common to use underscores (_) to separate words in longer names.
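
A small sketch of naming with underscores; the price and the number sold are hypothetical values for illustration. The commented-out line shows what happens with spaces: R reads the words as separate names and stops with an "unexpected symbol" error.

```r
# Underscores separate the words in a longer name (hypothetical values)
ice_cream_price <- 2.5
number_sold <- 40
ice_cream_price * number_sold   # [1] 100

# Spaces are not allowed in names; this line gives an "unexpected symbol" error:
# ice cream price <- 2.5
```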

Again, calculate the quarterly revenue, but this time replace the ______ placeholder and assign the result to the variable quarterly_revenue.

Variables need to be assigned before they can be used. This won’t work:

y <- x + 1  # won't work as x doesn't exist at this point!
x <- 1
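
Running the broken version stops with an error saying that object 'x' is not found. Swapping the two lines fixes it, as in this minimal sketch:

```r
x <- 1       # create x first ...
y <- x + 1   # ... then it can be used
y            # [1] 2
```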

However, variable names can be reused and “overwritten”. For example, this is okay:

x <- 10
x <- x + 1
x <- x + 1
x <- x + 1

But what would the value of x be now? Write it in the box below and press ▶ Run Code.

Here’s some more sales data for you!

c(13, 22, 37, 35, 9, 16, 19, 18, 15, 37,
  30, 12, 14, 14, 16, 11, 33, 31, 19, 17,
  15, 7, 15, 23, 12, 5, 7, 9, 9, 14)

These are the numbers of ice cream cones sold at my cafe in Hyderabad, India, on each day of June 2023. (Unlike the Hyderabad temperature data we looked at last chapter, this data is, unfortunately, made up.)

Copy the ice cream sales data to the code box below and assign it to the variable sold_ice_creams.

Another thing you can do with a vector is to subset it using the square bracket operator ([]). For example, here’s how you would pick out the 1st value in sold_ice_creams:

sold_ice_creams[1]
[1] 13
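
A few more ways to use [], sketched on the same sales data. Positions in R start at 1, and you can pass a vector of positions to pick out several values at once:

```r
sold_ice_creams <- c(13, 22, 37, 35, 9, 16, 19, 18, 15, 37,
                     30, 12, 14, 14, 16, 11, 33, 31, 19, 17,
                     15, 7, 15, 23, 12, 5, 7, 9, 9, 14)

sold_ice_creams[30]         # [1] 14, sales on the last day of June
length(sold_ice_creams)     # [1] 30, one value per day
sold_ice_creams[c(1, 5)]    # [1] 13  9, the 1st and 5th values
```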

Pick out the 2nd value in sold_ice_creams.

You can also subset a range of values using the colon operator. For example, this would pick out the first three days of sales:

sold_ice_creams[1:3]
[1] 13 22 37
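
Any range works, not just one starting at the first day. A sketch picking out other stretches of the month:

```r
sold_ice_creams <- c(13, 22, 37, 35, 9, 16, 19, 18, 15, 37,
                     30, 12, 14, 14, 16, 11, 33, 31, 19, 17,
                     15, 7, 15, 23, 12, 5, 7, 9, 9, 14)

sold_ice_creams[28:30]   # [1]  9  9 14, the last three days of June
sold_ice_creams[10:12]   # [1] 37 30 12, June 10 to 12
```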

Pick out the first seven days of sales from sold_ice_creams.

A subset of a vector can be used just like any other vector. For example, this calculates the median sales for the first week of June:

median(sold_ice_creams[1:7])
[1] 19
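
A subset can also be stored in its own variable and passed to other functions. A sketch for the second week of June (days 8 to 14); the variable name second_week is just an illustrative choice:

```r
sold_ice_creams <- c(13, 22, 37, 35, 9, 16, 19, 18, 15, 37,
                     30, 12, 14, 14, 16, 11, 33, 31, 19, 17,
                     15, 7, 15, 23, 12, 5, 7, 9, 9, 14)

second_week <- sold_ice_creams[8:14]  # June 8 to 14
median(second_week)   # [1] 15
mean(second_week)     # [1] 20
max(second_week)      # [1] 37
```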

Use the sum() function to calculate the total sales for the first week in sold_ice_creams.

Finally, let’s bring back the daily maximum temperature data from last chapter. Again, I’ve put it into the temp variable.

Now, the plot() function can make simple scatter plots that show two numeric vectors against each other. For example, here’s how you would plot height against age, with age on the x-axis:

plot(x = age, y = height)
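
The age and height vectors aren’t defined in this chapter, so here is a self-contained sketch with made-up, purely hypothetical values. The two vectors have the same length, so each age is paired with exactly one height; the optional xlab and ylab arguments set the axis labels.

```r
# Hypothetical example data, made up for illustration
age    <- c(2, 4, 6, 8, 10, 12)               # in years
height <- c(86, 102, 115, 128, 138, 149)      # in cm

plot(x = age, y = height, xlab = "Age (years)", ylab = "Height (cm)")
```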

Let’s look at the relationship between the temperature and ice cream sales.

Make a scatter plot with temp on the x-axis and sold_ice_creams on the y-axis.

You’ve completed the chapter, great work!

The plot above is correct because the values in the temp and sold_ice_creams vectors line up: the nth temperature and the nth sales figure belong to the same day. But rather than juggling several related vectors, wouldn’t it be better to put them all into something like a spreadsheet or table?

Yes, it would! And that’s what the next chapter is all about: 👉3. Data files and data frames👈