<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Datasets and vectors | Roberto Petrosino</title><link>https://www.robertopetrosino.com/teaching/r-workshop/2-data-vectors/</link><atom:link href="https://www.robertopetrosino.com/teaching/r-workshop/2-data-vectors/index.xml" rel="self" type="application/rss+xml"/><description>Datasets and vectors</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://www.robertopetrosino.com/media/icon_hub36f9e3ed2f551ac550cd2459c860d9f_18154_512x512_fill_lanczos_center_3.png</url><title>Datasets and vectors</title><link>https://www.robertopetrosino.com/teaching/r-workshop/2-data-vectors/</link></image><item><title>Vectors</title><link>https://www.robertopetrosino.com/teaching/r-workshop/2-data-vectors/vectors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.robertopetrosino.com/teaching/r-workshop/2-data-vectors/vectors/</guid><description>&lt;h2 id="diving-in-to-vector-manipulation">Diving in to vector manipulation&lt;/h2>
&lt;p>Dataframes (or datasets) are nothing but variables (i.e., columns) bound together. In R, each such variable is a &lt;em>vector&lt;/em>, i.e., a series of values stored under the same name. In the past two weeks we have been playing with column subsetting in various ways&amp;hellip; which basically means you have been extracting separate vectors from a dataframe. Now we will learn how to create a vector by hand, i.e., without going through a dataframe. This may sound useless at first, but it will come in handy if you want to use R to make calculations for your next homework assignments.&lt;/p>
&lt;p>We can make up a vector by using the &lt;code>c()&lt;/code> function (where &amp;ldquo;c&amp;rdquo; is for &amp;ldquo;combine&amp;rdquo;).&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Add 5 numeric values to the vector &amp;#34;my_vec&amp;#34; below by typing them inside the parentheses and separating them with commas.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nf">c&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">____&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_____&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_____&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_____&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_____&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Notice that a vector cannot contain different datatypes. All values contained in a vector must be of the &lt;em>same&lt;/em> datatype.&lt;/p>
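&lt;p>To see what happens when datatypes are mixed, you can check the datatype of a vector with &lt;code>class()&lt;/code>. The lines below are only a minimal sketch: &lt;code>nums&lt;/code> and &lt;code>mixed&lt;/code> are made-up names used for illustration and are not part of the exercise.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl"># nums and mixed are hypothetical example vectors
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">nums &amp;lt;- c(1.5, 2, 3)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">mixed &amp;lt;- c(nums, &amp;#34;four&amp;#34;) # combining numbers with a character string
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">class(nums)  # &amp;#34;numeric&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">class(mixed) # &amp;#34;character&amp;#34;: the numbers were converted to text
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>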
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec2&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nf">c&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;jon&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;roberto&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;sandra&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;panini&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;bowie&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nf">c&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">my_vec&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">my_vec2&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># This won&amp;#39;t give you an error, but it will convert everything to character by default.&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We are now going to work on the vectors above and explore some of their properties and functions. As with dataframes, vectors can be subset with square brackets &lt;code>[]&lt;/code>: put inside them the position of the value you want to refer to (positions in R start at 1).&lt;/p>
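&lt;p>As a small illustration of square-bracket subsetting, here is a sketch on a made-up vector. The name &lt;code>fruits&lt;/code> and its values are hypothetical and only serve as an example before you try it on &lt;code>my_vec3&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl"># fruits is a hypothetical example vector
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">fruits &amp;lt;- c(&amp;#34;apple&amp;#34;, &amp;#34;banana&amp;#34;, &amp;#34;cherry&amp;#34;, &amp;#34;date&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">fruits[2]       # one position: &amp;#34;banana&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">fruits[c(1, 3)] # several positions at once: &amp;#34;apple&amp;#34; &amp;#34;cherry&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">fruits[-1]      # a negative position drops that value instead of selecting it
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>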
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">first&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">my_vec3[1]&lt;/span> &lt;span class="c1"># among the numbers stored in &amp;#39;my_vec3&amp;#39;, what number will this be?&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">second&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">my_vec3[_]&lt;/span> &lt;span class="c1"># gimme the second number stored in &amp;#39;my_vec3&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">third&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">____&lt;/span> &lt;span class="c1"># gimme the third number stored in &amp;#39;my_vec3&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">eighth&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">____&lt;/span> &lt;span class="c1"># gimme the eighth number stored in &amp;#39;my_vec3&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">last&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">____&lt;/span> &lt;span class="c1"># gimme the last number stored in &amp;#39;my_vec3&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;blockquote>
&lt;p>QUESTION: Why can&amp;rsquo;t we use the &amp;lsquo;$&amp;rsquo; operator for vectors?&lt;/p>
&lt;/blockquote>
&lt;p>Once you have a vector, you can do all the calculations you want &amp;ndash; exactly as you did for a dataframe. To check the length of the vector &lt;code>my_vec3&lt;/code>, i.e., the number of values stored in it, use the &lt;code>length()&lt;/code> function.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3.length&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">_____&lt;/span> &lt;span class="c1"># can you already guess what the length is before running this?&lt;/span>
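&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can also sum the values stored in the vector. Use the &lt;code>sum()&lt;/code> function. Keep in mind that &lt;code>sum()&lt;/code> needs numeric values, so it will fail on a character vector such as &lt;code>my_vec3&lt;/code>; if that happens, compute the sum (and the length) from the numeric vector &lt;code>my_vec&lt;/code> instead.&lt;/p>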
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3.sum&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">_____&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, calculate the mean &lt;em>by hand&lt;/em>. I know the &lt;code>mean()&lt;/code> function is very convenient&amp;hellip; but it&amp;rsquo;ll be just 5 values, and it&amp;rsquo;s always good to brush up on the basics, right? Use the variables we just created above (i.e., the sum and the length).&lt;/p>
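&lt;p>If you want to see the idea on a toy case first, here is a minimal sketch with a made-up vector. The names &lt;code>scores&lt;/code>, &lt;code>scores_sum&lt;/code> and &lt;code>scores_n&lt;/code> are hypothetical and are not the variables from the exercise.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl"># scores is a hypothetical example vector
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">scores &amp;lt;- c(2, 4, 6, 8)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">scores_sum &amp;lt;- sum(scores)  # 20
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">scores_n &amp;lt;- length(scores) # 4
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">scores_sum / scores_n      # the mean by hand: 5
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">mean(scores)               # the built-in function agrees: 5
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>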
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3.meanHand&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">_____&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Does the &lt;code>mean()&lt;/code> function give you the same result?&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3.mean&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">_____&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Calculate the standard deviation of &lt;code>my_vec3&lt;/code> above &lt;em>by hand&lt;/em>. It&amp;rsquo;s gonna be a pain, I know &amp;ndash; bear with me please!&lt;/p>
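&lt;p>As a reminder of the steps, here is a sketch on a made-up vector. It uses vectorized arithmetic rather than the square-bracket method, and all names (&lt;code>scores&lt;/code>, &lt;code>deviations&lt;/code>, &lt;code>ss&lt;/code>, &lt;code>var_hand&lt;/code>, &lt;code>sd_hand&lt;/code>) are hypothetical. Note that the sum of squares is divided by the number of values minus one, which is also what &lt;code>sd()&lt;/code> does.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl"># scores is a hypothetical example vector
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">scores &amp;lt;- c(2, 4, 6, 8)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">deviations &amp;lt;- scores - mean(scores)      # distance of each value from the mean
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">ss &amp;lt;- sum(deviations^2)                  # sum of squared deviations: 20
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">var_hand &amp;lt;- ss / (length(scores) - 1)    # sample variance, dividing by n - 1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">sd_hand &amp;lt;- sqrt(var_hand)                # standard deviation
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">all.equal(sd_hand, sd(scores))           # TRUE
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>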
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3.SS&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">_____&lt;/span> &lt;span class="c1"># You could use the square-bracket method to refer to each value of the vector. You have also already calculated and stored the mean above; make sure you use it! Finally, don&amp;#39;t forget to square each difference!&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">variance&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">_____&lt;/span> &lt;span class="c1"># what do we do now?&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3.sdHand&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">_____&lt;/span> &lt;span class="c1"># what should we finally do to get the standard deviation from the variance? The function sqrt() may be kinda helpful here...&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Cool &amp;ndash; now let&amp;rsquo;s check if you get the same result with the &lt;code>sd()&lt;/code> function.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_vec3.sd&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="n">____&lt;/span> &lt;span class="c1"># does this give you the same result?&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>As I was saying before, dataframes (or datasets) are just vectors bound together. This means you can create your own dataframe by combining vectors! The &lt;code>data.frame()&lt;/code> function will help with that.&lt;/p>
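&lt;p>Here is a minimal sketch of the idea with two made-up vectors. The names &lt;code>ids&lt;/code>, &lt;code>pets&lt;/code> and &lt;code>toy_df&lt;/code> are hypothetical; the only thing that matters is that the vectors have the same length, so that each one can become a column.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl"># ids, pets and toy_df are hypothetical example names
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">ids &amp;lt;- c(1, 2, 3)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pets &amp;lt;- c(&amp;#34;cat&amp;#34;, &amp;#34;dog&amp;#34;, &amp;#34;fish&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">toy_df &amp;lt;- data.frame(id = ids, pet = pets) # each vector becomes a named column
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">toy_df       # print the dataframe: 3 rows, 2 columns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">toy_df$pet   # a column pulled back out is a vector again
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>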
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="n">_____&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nf">data.frame&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">___&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">___&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># make a dataframe of the vectors &amp;#39;my_vec&amp;#39; and &amp;#39;my_vec2&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">____&lt;/span> &lt;span class="c1"># call the new dataframe and see what it looks like!&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item></channel></rss>