<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:posse="https://posseparty.com/2024/Feed"><title>Blogs | Jared Knowles</title><link href="https://jaredknowles.com/posts/" rel="alternate" type="text/html"/><link href="https://jaredknowles.com/posts/feed.xml" rel="self" type="application/atom+xml"/><id>https://jaredknowles.com/posts/</id><updated>2026-06-12T01:05:45Z</updated><subtitle>Data analysis, photography, and the occasional thought.</subtitle><author><name>Jared E. Knowles</name><email>jared@fastmail.us</email></author><entry><title>merTools 1.0: prediction intervals for mixed models, validated against brms</title><link href="https://jaredknowles.com/posts/mertools-1-0-released/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/mertools-1-0-released/</id><published>2026-05-30T11:44:41Z</published><updated>2026-05-30T11:44:41Z</updated><category term="R"/><summary>After a decade on CRAN, merTools 1.0.0 is out today. It has been a long-term goal of mine to bring merTools to a stable state, fix outstanding bugs, and rework the `predictInterval()` function at its core to be more modular and easy to maintain. This release achieves that and marks the package as feature complete and the start of a long-term-support (LTS) phase. This release hits the goal of 1.0.0 of a stable API, outstanding issues closed, a correctness fix at the core of predictInterval(), and modularity in the core prediction interval functionality. The documentation is also now available [via a pkgdown site as well!](https://jknowles.github.io/merTools) install.packages("merTools")</summary><content type="html"><![CDATA[<!-- DROP-CAP: must be a raw-HTML <p class="dropcap"> (post.css styles
     p.dropcap::first-letter). goldmark does NOT parse markdown inside a raw
     HTML block, so bold/italic/code/links here are hand-written as HTML. -->
<p class="dropcap">After a decade on CRAN, <strong>merTools 1.0.0</strong> is out today.
It has been a long-term goal of mine to bring merTools to a stable state, fix outstanding bugs,
and rework the <code>predictInterval()</code> function at its core to be more modular and easier to maintain.
This release achieves that: it marks the package as <em>feature complete</em> and begins a
long-term-support (LTS) phase. It delivers what I wanted from 1.0.0: a stable API, the outstanding issues closed,
a correctness fix at the core of <code>predictInterval()</code>, and a modular prediction-interval core.
The documentation is now also available <a href="https://jknowles.github.io/merTools">via a pkgdown site</a>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;merTools&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<h2 id="what-mertools-is-for">
  <span class="heading-mark">What merTools is for</span>
  <a class="heading-anchor" href="#what-mertools-is-for" aria-label="Link to this section">#</a>
</h2>
<p>merTools has been on CRAN since 2015, with over 457,000 downloads and about
8,000 a month. It exists to get the most out of large mixed-effects models fit
with <a href="https://cran.r-project.org/package=lme4">lme4</a>. Its headline function,
<code>predictInterval()</code>, produces <strong>prediction intervals</strong> for <code>lmer</code>/<code>glmer</code> models
by simulating from the joint distribution of the fixed and random effects — the
approach from Gelman and Hill (2007),<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> the same machinery behind <code>arm::sim()</code>.</p>
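<p>To make the idea concrete, here is a rough by-hand sketch of that kind of simulation for a random-intercept model. It is not the code inside <code>predictInterval()</code>: the <code>sleepstudy</code> data, the seed, the 2,000 draws, and the choice to draw fixed and random effects independently are all assumptions made for illustration.</p>

```r
library(lme4)

set.seed(42)  # arbitrary seed, for reproducibility of the sketch only
m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
new <- sleepstudy[1:3, ]
n_sims <- 2000

# 1. Fixed effects: draw from N(beta_hat, vcov(beta_hat)).
beta <- MASS::mvrnorm(n_sims, mu = fixef(m), Sigma = as.matrix(vcov(m)))

# 2. Random intercepts: conditional modes and their standard deviations,
#    lined up with the Subject of each row in `new`.
re <- as.data.frame(ranef(m, condVar = TRUE))
re <- re[match(new$Subject, re$grp), ]

X <- model.matrix(~ Days, data = new)

# One column of simulated outcomes per row of `new`.
sims <- sapply(seq_len(nrow(new)), function(i) {
  mu <- as.vector(beta %*% X[i, ]) +
    rnorm(n_sims, mean = re$condval[i], sd = re$condsd[i])
  # 3. Residual noise turns a draw of the mean into a draw of a new observation.
  mu + rnorm(n_sims, mean = 0, sd = sigma(m))
})

# 90% interval and median for each row of `new`.
by_hand <- t(apply(sims, 2, quantile, probs = c(0.05, 0.5, 0.95)))
by_hand
```

<p>The three sources of uncertainty (fixed effects, group effects, residual noise) are simply added draw by draw, so no refitting or sampling of the model is needed.</p>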
<!-- FOOTNOTE: plain markdown [^id] ref + definition; goldmark collects all
     definitions into a <section class="footnotes"> at the end regardless of
     where the definition sits. Indented continuation lines stay in the note. -->
<p>The motivation is practical. <code>lme4::bootMer()</code> and full MCMC give you principled
uncertainty, but on models with tens of thousands of groups they can be
impractical or simply too slow to use interactively.</p>
<!-- BLOCKQUOTE: an existing sentence promoted to a > quote (4px blue rule). -->
<blockquote>
<p><code>predictInterval()</code> is built to give you honest intervals on exactly those
models, in about a second.</p>
</blockquote>
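<p>For orientation, a minimal call looks something like the sketch below. It uses lme4's <code>sleepstudy</code> data rather than anything from this post, and the <code>level</code>, <code>n.sims</code>, and <code>seed</code> values are arbitrary choices.</p>

```r
library(lme4)
library(merTools)

# A small random-slope model, standing in for a much larger one.
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# 90% prediction intervals for the first five rows.
# The result is a data frame with fit, upr, and lwr columns.
pi90 <- predictInterval(m, newdata = sleepstudy[1:5, ],
                        level = 0.9, n.sims = 1000, seed = 101)
pi90
```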
<h2 id="try-it-out-and-validate-it-against-brms">
  <span class="heading-mark">Try it out and validate it against brms!</span>
  <a class="heading-anchor" href="#try-it-out-and-validate-it-against-brms" aria-label="Link to this section">#</a>
</h2>
<!-- MARGINALIA: raw-HTML <aside class="marginalia">. On wide viewports (>=1400px)
     post.css floats it into the left margin beside this paragraph; on narrower
     screens it falls back to an outdented block. Placed just before the
     paragraph it annotates so the float aligns with that paragraph's top. -->
<aside class="marginalia">brms fits the same model with full HMC in Stan — the gold standard we're checking against.</aside>
<p>The obvious question for a <em>simulation-based approximation</em> is how close it is to
doing the full Bayesian thing. When we first wrote <code>merTools</code>, brms was in its infancy, but
since then it has become my go-to for most Bayesian inference and, frankly, has made much of <code>merTools</code> obsolete.
I used an LLM to assist me with refactoring the code and responding to open issues, so I wanted a new
set of tests and benchmarks to validate that the package still works.</p>
<p>So for 1.0 I wrote a short new vignette that compares brms and merTools results side by side
on the same data and the same model: the bundled <code>hsb</code> data (7,185 students in 160 schools),
modeling math achievement from student SES, school-mean SES, and a school-varying
SES slope. We set aside six whole schools as unseen groups, then hold out 20% of the remaining students to test on.</p>
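<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">lme4</span><span class="p">)</span>      <span class="c1"># lmer()</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">merTools</span><span class="p">)</span>  <span class="c1"># hsb data, predictInterval()</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">brms</span><span class="p">)</span>      <span class="c1"># posterior_epred(), posterior_predict()</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span>   <span class="c1"># plots</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">data</span><span class="p">(</span><span class="n">hsb</span><span class="p">)</span>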
</span></span><span class="line"><span class="cl"><span class="n">hsb</span><span class="o">$</span><span class="n">schid</span> <span class="o">&lt;-</span> <span class="nf">factor</span><span class="p">(</span><span class="n">hsb</span><span class="o">$</span><span class="n">schid</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">new_schools</span> <span class="o">&lt;-</span> <span class="nf">sample</span><span class="p">(</span><span class="nf">levels</span><span class="p">(</span><span class="n">hsb</span><span class="o">$</span><span class="n">schid</span><span class="p">),</span> <span class="m">6</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">hsb</span><span class="o">$</span><span class="n">.set</span> <span class="o">&lt;-</span> <span class="s">&#34;train&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">hsb</span><span class="o">$</span><span class="n">.set[hsb</span><span class="o">$</span><span class="n">schid</span> <span class="o">%in%</span> <span class="n">new_schools]</span> <span class="o">&lt;-</span> <span class="s">&#34;test_new&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">seen</span> <span class="o">&lt;-</span> <span class="nf">which</span><span class="p">(</span><span class="n">hsb</span><span class="o">$</span><span class="n">.set</span> <span class="o">==</span> <span class="s">&#34;train&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">hsb</span><span class="o">$</span><span class="n">.set</span><span class="nf">[sample</span><span class="p">(</span><span class="n">seen</span><span class="p">,</span> <span class="nf">round</span><span class="p">(</span><span class="m">0.2</span> <span class="o">*</span> <span class="nf">length</span><span class="p">(</span><span class="n">seen</span><span class="p">)))</span><span class="n">]</span> <span class="o">&lt;-</span> <span class="s">&#34;test_seen&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">train</span>     <span class="o">&lt;-</span> <span class="nf">droplevels</span><span class="p">(</span><span class="n">hsb[hsb</span><span class="o">$</span><span class="n">.set</span> <span class="o">==</span> <span class="s">&#34;train&#34;</span><span class="p">,</span> <span class="n">]</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">test_seen</span> <span class="o">&lt;-</span> <span class="n">hsb[hsb</span><span class="o">$</span><span class="n">.set</span> <span class="o">==</span> <span class="s">&#34;test_seen&#34;</span> <span class="o">&amp;</span> <span class="n">hsb</span><span class="o">$</span><span class="n">schid</span> <span class="o">%in%</span> <span class="nf">levels</span><span class="p">(</span><span class="n">train</span><span class="o">$</span><span class="n">schid</span><span class="p">),</span> <span class="n">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">f1</span> <span class="o">&lt;-</span> <span class="n">mathach</span> <span class="o">~</span> <span class="n">ses</span> <span class="o">+</span> <span class="n">meanses</span> <span class="o">+</span> <span class="p">(</span><span class="n">ses</span> <span class="o">|</span> <span class="n">schid</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">m_lme</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">f1</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">train</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Fit brms once and cache it. Persist the genuine from-scratch fit time to a</span>
</span></span><span class="line"><span class="cl"><span class="c1"># file, so the timing table below reports the real compile+sample cost even on</span>
</span></span><span class="line"><span class="cl"><span class="c1"># later renders that just read the cached fit (which take ~2s, not ~7 min).</span>
</span></span><span class="line"><span class="cl"><span class="n">brms_cache</span> <span class="o">&lt;-</span> <span class="nf">file.path</span><span class="p">(</span><span class="n">outdir</span><span class="p">,</span> <span class="s">&#34;brms_hsb_meanses.rds&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">time_cache</span> <span class="o">&lt;-</span> <span class="nf">file.path</span><span class="p">(</span><span class="n">outdir</span><span class="p">,</span> <span class="s">&#34;brms_hsb_meanses_seconds.rds&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">fresh_fit</span> <span class="o">&lt;-</span> <span class="o">!</span><span class="nf">file.exists</span><span class="p">(</span><span class="n">brms_cache</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">elapsed</span> <span class="o">&lt;-</span> <span class="nf">system.time</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="n">m_brm</span> <span class="o">&lt;-</span> <span class="nf">fit_brm</span><span class="p">(</span><span class="n">f1</span><span class="p">,</span> <span class="n">train</span><span class="p">,</span> <span class="nf">gaussian</span><span class="p">(),</span> <span class="s">&#34;brms_hsb_meanses&#34;</span><span class="p">))</span><span class="n">[[</span><span class="s">&#34;elapsed&#34;</span><span class="n">]]</span>
</span></span><span class="line"><span class="cl"><span class="kr">if</span> <span class="p">(</span><span class="n">fresh_fit</span><span class="p">)</span> <span class="nf">saveRDS</span><span class="p">(</span><span class="n">elapsed</span><span class="p">,</span> <span class="n">time_cache</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">brms_fit_seconds</span> <span class="o">&lt;-</span> <span class="kr">if</span> <span class="p">(</span><span class="nf">file.exists</span><span class="p">(</span><span class="n">time_cache</span><span class="p">))</span> <span class="nf">readRDS</span><span class="p">(</span><span class="n">time_cache</span><span class="p">)</span> <span class="kr">else</span> <span class="n">elapsed</span></span></span></code></pre></div>
</figure>
<h3 id="point-estimates-are-essentially-identical">
  <span class="heading-mark">Point estimates are essentially identical</span>
  <a class="heading-anchor" href="#point-estimates-are-essentially-identical" aria-label="Link to this section">#</a>
</h3>
<p>Both models recover nearly the same fixed effects and school-level effects, so
the conditional-mean predictions agree almost exactly: the mean absolute difference
is about 0.04 points on an outcome whose standard deviation is 6.7.</p>
<!-- FIGURE (size hint): fig.show='hide' tells knitr to still WRITE point-1.png
     but NOT emit its own ![plot of chunk point] line, so the hand-written
     markdown image below is the only one — carrying real alt text, a caption,
     and the "| small" size hint that render-image.html turns into .figure-small. -->
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">lme_mean</span> <span class="o">&lt;-</span> <span class="nf">predict</span><span class="p">(</span><span class="n">m_lme</span><span class="p">,</span> <span class="n">newdata</span> <span class="o">=</span> <span class="n">test_seen</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">brm_mean</span> <span class="o">&lt;-</span> <span class="nf">colMeans</span><span class="p">(</span><span class="nf">posterior_epred</span><span class="p">(</span><span class="n">m_brm</span><span class="p">,</span> <span class="n">newdata</span> <span class="o">=</span> <span class="n">test_seen</span><span class="p">,</span> <span class="n">allow_new_levels</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="nf">c</span><span class="p">(</span><span class="n">correlation</span> <span class="o">=</span> <span class="nf">cor</span><span class="p">(</span><span class="n">lme_mean</span><span class="p">,</span> <span class="n">brm_mean</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">mean_abs_diff</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">lme_mean</span> <span class="o">-</span> <span class="n">brm_mean</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">  <span class="n">response_sd</span> <span class="o">=</span> <span class="nf">sd</span><span class="p">(</span><span class="n">test_seen</span><span class="o">$</span><span class="n">mathach</span><span class="p">))</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="text"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">#&gt;   correlation mean_abs_diff   response_sd 
</span></span><span class="line"><span class="cl">#&gt;    0.99984045    0.03670134    6.71745425</span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="nf">data.frame</span><span class="p">(</span><span class="n">merTools</span> <span class="o">=</span> <span class="n">lme_mean</span><span class="p">,</span> <span class="n">brms</span> <span class="o">=</span> <span class="n">brm_mean</span><span class="p">),</span> <span class="nf">aes</span><span class="p">(</span><span class="n">merTools</span><span class="p">,</span> <span class="n">brms</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">  <span class="nf">geom_abline</span><span class="p">(</span><span class="n">slope</span> <span class="o">=</span> <span class="m">1</span><span class="p">,</span> <span class="n">intercept</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span> <span class="n">linetype</span> <span class="o">=</span> <span class="m">2</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">&#34;grey50&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">  <span class="nf">geom_point</span><span class="p">(</span><span class="n">alpha</span> <span class="o">=</span> <span class="m">0.3</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="m">0.8</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">&#34;#1B9E77&#34;</span><span class="p">)</span> <span class="o">+</span> <span class="nf">coord_equal</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">  <span class="nf">labs</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="s">&#34;lme4 predict()&#34;</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="s">&#34;brms posterior_epred()&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">  <span class="nf">theme_civilytics</span><span class="p">()</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure figure-small">
  <div class="photo-frame"><img src="/posts/mertools-1-0-released/point-1_hu_add01e072b920db2.webp"
             srcset="/posts/mertools-1-0-released/point-1_hu_add01e072b920db2.webp 760w, /posts/mertools-1-0-released/point-1_hu_4738e11267f29e01.webp 770w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Scatter of lme4 versus brms conditional-mean predictions for held-out students, with points hugging the 1:1 line." width="770" height="440" loading="lazy" decoding="async"></div><figcaption><span class="fig-label">Figure.</span> lme4 and brms point predictions correlate above 0.999, with a mean absolute difference of about 0.04 points.</figcaption></figure>
</p>
<h3 id="and-they-are-equally-well-calibrated">
  <span class="heading-mark">And they are equally well calibrated</span>
  <a class="heading-anchor" href="#and-they-are-equally-well-calibrated" aria-label="Link to this section">#</a>
</h3>
<p>The real test is out-of-sample: do nominal intervals cover held-out scores at the
nominal rate? We draw a full predictive distribution for each held-out student
from each method, using <code>predictInterval(returnSims = TRUE)</code> and
<code>brms::posterior_predict()</code>, and compare empirical coverage. Both land within
about four percentage points of the nominal level, and within half a point of each other.</p>
<p>The <code>mt_draws()</code> helper used below — saved alongside the analysis — just wraps
<code>predictInterval()</code> to hand back the raw simulation matrix:</p>
<!-- CODE-BLOCK-WITH-FILENAME: a LITERAL (non-evaluated) fenced block below. knitr
     only runs braced r-chunks; a plain r-fence carrying a filename attribute
     passes through untouched, and render-codeblock.html reads that filename for
     the badge. NB: do not write a backtick-r sequence in prose/comments here —
     knitr parses it as inline R code and the knit fails. -->
<figure class="code-block" data-lang="r"><figcaption class="code-block-filename">helpers.R</figcaption><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># One column of posterior-style draws per row of `newdata`.</span>
</span></span><span class="line"><span class="cl"><span class="n">mt_draws</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">fit</span><span class="p">,</span> <span class="n">newdata</span><span class="p">,</span> <span class="kc">...</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="kc">pi</span> <span class="o">&lt;-</span> <span class="nf">predictInterval</span><span class="p">(</span><span class="n">fit</span><span class="p">,</span> <span class="n">newdata</span> <span class="o">=</span> <span class="n">newdata</span><span class="p">,</span> <span class="n">returnSims</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="kc">...</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="nf">attr</span><span class="p">(</span><span class="kc">pi</span><span class="p">,</span> <span class="s">&#34;sim.results&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">mt_seen</span> <span class="o">&lt;-</span> <span class="nf">mt_draws</span><span class="p">(</span><span class="n">m_lme</span><span class="p">,</span> <span class="n">test_seen</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">pp_seen</span> <span class="o">&lt;-</span> <span class="nf">posterior_predict</span><span class="p">(</span><span class="n">m_brm</span><span class="p">,</span> <span class="n">newdata</span> <span class="o">=</span> <span class="n">test_seen</span><span class="p">,</span> <span class="n">allow_new_levels</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">data.frame</span><span class="p">(</span><span class="n">nominal</span> <span class="o">=</span> <span class="n">NOMINAL</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">           <span class="n">merTools</span> <span class="o">=</span> <span class="nf">coverage</span><span class="p">(</span><span class="n">mt_seen</span><span class="p">,</span> <span class="n">test_seen</span><span class="o">$</span><span class="n">mathach</span><span class="p">,</span> <span class="n">NOMINAL</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">           <span class="n">brms</span>     <span class="o">=</span> <span class="nf">coverage</span><span class="p">(</span><span class="n">pp_seen</span><span class="p">,</span> <span class="n">test_seen</span><span class="o">$</span><span class="n">mathach</span><span class="p">,</span> <span class="n">NOMINAL</span><span class="p">))</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="text"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">#&gt;   nominal  merTools      brms
</span></span><span class="line"><span class="cl">#&gt; 1    0.50 0.4569776 0.4598698
</span></span><span class="line"><span class="cl">#&gt; 2    0.80 0.7895879 0.7932032
</span></span><span class="line"><span class="cl">#&gt; 3    0.90 0.9168474 0.9168474
</span></span><span class="line"><span class="cl">#&gt; 4    0.95 0.9660159 0.9703543</span></span></code></pre></div>
</figure>
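<p>The <code>NOMINAL</code> vector and <code>coverage()</code> helper are not shown in this post. A plausible stand-in is sketched below; the function body is my assumption, not the vignette's code, and it expects one row per simulation draw and one column per observation (the layout <code>posterior_predict()</code> returns). If your simulation matrix is stored the other way round, transpose it with <code>t()</code> first.</p>

```r
# Nominal levels matching the rows of the coverage table above.
NOMINAL <- c(0.50, 0.80, 0.90, 0.95)

# Hypothetical helper: share of observed values that fall inside the central
# interval of their own predictive draws, at each nominal level.
coverage <- function(draws, y, levels) {
  vapply(levels, function(lv) {
    tail_p <- (1 - lv) / 2
    # 2 x n_obs matrix: lower and upper quantile for each observation.
    q <- apply(draws, 2, quantile, probs = c(tail_p, 1 - tail_p))
    mean(y >= q[1, ] & y <= q[2, ])
  }, numeric(1))
}

# Toy check: draws and outcomes from the same distribution, so the
# empirical coverage should sit close to the nominal level.
set.seed(1)
toy_draws <- matrix(rnorm(2000 * 500), nrow = 2000)  # 2000 draws x 500 obs
toy_y <- rnorm(500)
toy_cov <- coverage(toy_draws, toy_y, NOMINAL)
data.frame(nominal = NOMINAL, empirical = toy_cov)
```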
<h3 id="at-a-fraction-of-the-cost">
  <span class="heading-mark">At a fraction of the cost</span>
  <a class="heading-anchor" href="#at-a-fraction-of-the-cost" aria-label="Link to this section">#</a>
</h3>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">t_lme</span> <span class="o">&lt;-</span> <span class="nf">system.time</span><span class="p">(</span><span class="nf">lmer</span><span class="p">(</span><span class="n">f1</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">train</span><span class="p">))</span><span class="n">[[</span><span class="s">&#34;elapsed&#34;</span><span class="n">]]</span>
</span></span><span class="line"><span class="cl"><span class="n">t_mt</span>  <span class="o">&lt;-</span> <span class="nf">system.time</span><span class="p">(</span><span class="nf">mt_draws</span><span class="p">(</span><span class="n">m_lme</span><span class="p">,</span> <span class="n">test_seen</span><span class="p">))</span><span class="n">[[</span><span class="s">&#34;elapsed&#34;</span><span class="n">]]</span>
</span></span><span class="line"><span class="cl"><span class="n">t_pp</span>  <span class="o">&lt;-</span> <span class="nf">system.time</span><span class="p">(</span><span class="nf">posterior_predict</span><span class="p">(</span><span class="n">m_brm</span><span class="p">,</span> <span class="n">newdata</span> <span class="o">=</span> <span class="n">test_seen</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                       <span class="n">allow_new_levels</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">))</span><span class="n">[[</span><span class="s">&#34;elapsed&#34;</span><span class="n">]]</span>
</span></span><span class="line"><span class="cl"><span class="nf">data.frame</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="n">step</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;lme4 fit&#34;</span><span class="p">,</span> <span class="s">&#34;predictInterval&#34;</span><span class="p">,</span> <span class="s">&#34;brms fit (compile+sample)&#34;</span><span class="p">,</span> <span class="s">&#34;posterior_predict&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">seconds</span> <span class="o">=</span> <span class="nf">round</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">t_lme</span><span class="p">,</span> <span class="n">t_mt</span><span class="p">,</span> <span class="n">brms_fit_seconds</span><span class="p">,</span> <span class="n">t_pp</span><span class="p">),</span> <span class="m">2</span><span class="p">))</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="text"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">#&gt;                        step seconds
</span></span><span class="line"><span class="cl">#&gt; 1                  lme4 fit    0.07
</span></span><span class="line"><span class="cl">#&gt; 2           predictInterval    0.66
</span></span><span class="line"><span class="cl">#&gt; 3 brms fit (compile+sample)  409.13
</span></span><span class="line"><span class="cl">#&gt; 4         posterior_predict    2.38</span></span></code></pre></div>
</figure>
<p>The entire lme4 + <code>predictInterval()</code> path runs in under a second (0.73 s); the single
brms fit takes nearly seven minutes, more than 500 times longer, and the gap
only widens on the large models <code>predictInterval()</code> was really built for. The
point isn&#8217;t to stop using brms: if you want the full posterior, use it. The point
is that when full MCMC is impractical, <code>predictInterval()</code> is a fast,
well-calibrated stand-in. The full story, including new-group handling and a binomial GLMM,
is in the
<a href="https://jknowles.github.io/merTools/articles/brms_validation.html">validation vignette</a>.</p>
<!-- PULL-QUOTE: raw-HTML <p class="pull-quote"> — 5px magenta rule, amber bleed. -->
<p class="pull-quote">When full MCMC is impractical, predictInterval() is a fast, well-calibrated stand-in.</p>
<h2 id="visualizing-group-effects-with-plotreimpact">
  <span class="heading-mark">Visualizing group effects with <code>plotREimpact()</code></span>
  <a class="heading-anchor" href="#visualizing-group-effects-with-plotreimpact" aria-label="Link to this section">#</a>
</h2>
<p>New in 1.0, <code>plotREimpact()</code> visualizes <code>REimpact()</code> output — the fitted outcome
across the distribution of a grouping factor&#8217;s effect. Pass a <em>named list</em> and it
overlays grouping factors on shared axes, so you can compare their influence
directly. Here, the instructor and student effects in the <code>InstEval</code> ratings data:</p>
<!-- FIGURE (second size hint, "| wide"): same fig.show='hide' technique so the
     hand-written image below — captioned and full-bleed-wide — is the only one. -->
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">m1</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">y</span> <span class="o">~</span> <span class="n">service</span> <span class="o">+</span> <span class="n">lectage</span> <span class="o">+</span> <span class="n">studage</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span> <span class="o">|</span> <span class="n">d</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span> <span class="o">|</span> <span class="n">s</span><span class="p">),</span> <span class="n">data</span> <span class="o">=</span> <span class="n">InstEval</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">imp_d</span> <span class="o">&lt;-</span> <span class="nf">REimpact</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">InstEval[7</span><span class="p">,</span> <span class="n">]</span><span class="p">,</span> <span class="n">groupFctr</span> <span class="o">=</span> <span class="s">&#34;d&#34;</span><span class="p">,</span> <span class="n">breaks</span> <span class="o">=</span> <span class="m">5</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="m">300</span><span class="p">,</span> <span class="n">level</span> <span class="o">=</span> <span class="m">0.9</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">imp_s</span> <span class="o">&lt;-</span> <span class="nf">REimpact</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">InstEval[7</span><span class="p">,</span> <span class="n">]</span><span class="p">,</span> <span class="n">groupFctr</span> <span class="o">=</span> <span class="s">&#34;s&#34;</span><span class="p">,</span> <span class="n">breaks</span> <span class="o">=</span> <span class="m">5</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="m">300</span><span class="p">,</span> <span class="n">level</span> <span class="o">=</span> <span class="m">0.9</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">plotREimpact</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="s">&#34;Instructor (d)&#34;</span> <span class="o">=</span> <span class="n">imp_d</span><span class="p">,</span> <span class="s">&#34;Student (s)&#34;</span> <span class="o">=</span> <span class="n">imp_s</span><span class="p">))</span> <span class="o">+</span> <span class="nf">theme_civilytics</span><span class="p">()</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure figure-wide">
  <div class="photo-frame"><img src="/posts/mertools-1-0-released/reimpact-1_hu_75e0d0424522797f.webp"
             srcset="/posts/mertools-1-0-released/reimpact-1_hu_75e0d0424522797f.webp 760w, /posts/mertools-1-0-released/reimpact-1_hu_2f4f3577815752c4.webp 770w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Two overlaid REimpact curves showing predicted rating across quintiles of the instructor effect versus the student effect; the instructor curve rises far more steeply." width="770" height="495" loading="lazy" decoding="async"></div><figcaption><span class="fig-label">Figure.</span> Instructor effect moves the predicted rating more than the student effect.</figcaption></figure>
</p>
<p>Moving a case from the bottom to the top of the instructor-effect distribution
shifts the predicted rating substantially more than the same move across the
student distribution — the instructor grouping factor carries more of the signal.</p>
<h2 id="what-else-is-new-in-10">
  <span class="heading-mark">What else is new in 1.0</span>
  <a class="heading-anchor" href="#what-else-is-new-in-10" aria-label="Link to this section">#</a>
</h2>
<p>The <a href="https://github.com/jknowles/merTools/blob/main/NEWS.md">release notes</a>
 have
the full list; the highlights:</p>
<ul>
<li>
<p><strong>Correctness fix for nested random effects (#124).</strong> <code>predictInterval()</code> was
returning seed-dependent point estimates when a prediction frame mixed observed
and unobserved levels of an interaction grouping factor (e.g. <code>(1 | a/b)</code>). A
<code>max.col()</code> tie-break was silently letting an unobserved level borrow a
<em>random</em> observed level&#8217;s effect. Unobserved levels now correctly fall back to
the fixed effects; observed-level predictions are bit-for-bit unchanged.</p>
</li>
<li>
<p><strong>New <code>new.levels = &quot;draw&quot;</code>.</strong> For a group the model never saw, the default
(<code>&quot;zero&quot;</code>) drops its random effect; <code>&quot;draw&quot;</code> instead <em>samples</em> the effect from
the estimated random-effect covariance — the direct analogue of
<code>brms::posterior_predict(allow_new_levels = TRUE)</code>. The default is unchanged.</p>
</li>
<li>
<p><strong><code>plotFEsim()</code> now highlights significant terms</strong>, and <strong><code>shinyMer()</code> is
revived and extended</strong> with a Model Summary tab and subset-based draws.</p>
</li>
</ul>
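<p>As a rough illustration of the difference between the two <code>new.levels</code> settings described above, the base-R sketch below simulates predictions for a group the model never saw. It is a conceptual sketch only, not merTools code: the fixed-effect prediction and both standard deviations are made-up values, fixed-effect uncertainty is ignored, and every variable name is hypothetical.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Conceptual sketch in base R; all numbers below are illustrative assumptions.
set.seed(42)
n_sims   &lt;- 5000  # number of simulation draws
fixed    &lt;- 3.0   # hypothetical fixed-effect prediction for the new case
sd_group &lt;- 0.8   # hypothetical random-intercept standard deviation
sd_resid &lt;- 1.0   # hypothetical residual standard deviation

# Residual noise shared by both strategies
resid &lt;- rnorm(n_sims, mean = 0, sd = sd_resid)

# &#34;zero&#34;: the unseen group&#39;s random effect is dropped (treated as 0)
pred_zero &lt;- fixed + 0 + resid

# &#34;draw&#34;: the unseen group&#39;s effect is sampled from N(0, sd_group^2)
pred_draw &lt;- fixed + rnorm(n_sims, mean = 0, sd = sd_group) + resid

# Width of a central interval at the given level
width &lt;- function(x, level = 0.9) {
  tail_p &lt;- (1 - level) / 2
  unname(diff(quantile(x, probs = c(tail_p, 1 - tail_p))))
}

width_zero &lt;- width(pred_zero)
width_draw &lt;- width(pred_draw)
</code></pre></div>
</figure>
<p>Under these assumptions the <code>&quot;draw&quot;</code> interval comes out wider than the <code>&quot;zero&quot;</code> one, because it also carries the between-group variance; both stay centered on the fixed-effect prediction.</p>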
<h2 id="why-lts--maintenance-mode">
  <span class="heading-mark">Why LTS / maintenance mode</span>
  <a class="heading-anchor" href="#why-lts--maintenance-mode" aria-label="Link to this section">#</a>
</h2>
<p>merTools does what it set out to do, and the API has been stable for years. Going
forward, 1.0.x releases will be about keeping it healthy — bug fixes, CRAN and
dependency compatibility, and documentation — rather than churning the interface.</p>
<h2 id="get-it--cite-it--file-issues">
  <span class="heading-mark">Get it / cite it / file issues</span>
  <a class="heading-anchor" href="#get-it--cite-it--file-issues" aria-label="Link to this section">#</a>
</h2>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;merTools&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">citation</span><span class="p">(</span><span class="s">&#34;merTools&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<ul>
<li>Package site &amp; vignettes: <a href="https://jknowles.github.io/merTools/">https://jknowles.github.io/merTools/</a>
</li>
<li>Source &amp; issues: <a href="https://github.com/jknowles/merTools">https://github.com/jknowles/merTools</a>
</li>
</ul>
<p>Thanks to everyone who has contributed over the years — Carl Frederick, Alex
Whitworth, Ben Bolker (<code>@bbolker</code>), Davis Vaughan (<code>@DavisVaughan</code>) — and to the
issue reporters who made this release better, including <code>@dotPiano</code>, whose report
led to the #124 correctness fix.</p>
<h2 id="references">
  <span class="heading-mark">References</span>
  <a class="heading-anchor" href="#references" aria-label="Link to this section">#</a>
</h2>
<!-- The bibliography shortcode renders an APA list from the bib.json page
     resource (CSL-JSON). In-text author-date citation is also available (a cite
     shortcode), but this post cites sources in prose + a footnote, so only the
     list is used here. NB: Hugo runs shortcodes even inside HTML comments, so
     the shortcode names are described in words, not written literally. -->


  










<section class="hugo-cite-bibliography">
  <dl>
    

      <div id="mertools">
        <dt>
          Knowles&#32;&amp;&#32;Frederick

          
          (2026)</dt>

        <dd>
          










<span itemscope 
      itemtype="https://schema.org/Book"
      data-type="book"><span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Knowles</span>,&#32;
    <meta itemprop="givenName" content="Jared E." />
    J.</span>&#32;&amp;&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Frederick</span>,&#32;
    <meta itemprop="givenName" content="Carl" />
    C.</span>&#32;
    (<span itemprop="datePublished">2026</span>).
  &#32;<span itemprop="name">
    <i>merTools: Tools for analyzing mixed effect regression models</i></span>.
  &#32;Retrieved from&#32;
  <a href="https://cran.r-project.org/package=merTools"
     itemprop="identifier"
     itemtype="https://schema.org/URL">https://cran.r-project.org/package=merTools</a></span>




</dd>

      </div>

      <div id="gelman2007">
        <dt>
          Gelman&#32;&amp;&#32;Hill

          
          (2007)</dt>

        <dd>
          










<span itemscope 
      itemtype="https://schema.org/Book"
      data-type="book"><span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Gelman</span>,&#32;
    <meta itemprop="givenName" content="Andrew" />
    A.</span>&#32;&amp;&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Hill</span>,&#32;
    <meta itemprop="givenName" content="Jennifer" />
    J.</span>&#32;
    (<span itemprop="datePublished">2007</span>).
  &#32;<span itemprop="name">
    <i>Data analysis using regression and multilevel/hierarchical models</i></span>.
  <meta itemprop="contentLocation"
        value="New York">&#32;
  <span itemprop="publisher"
             itemtype="http://schema.org/Organization"
             itemscope="">
    <span itemprop="name">Cambridge University Press</span></span>.</span>




</dd>

      </div>
  </dl>
</section>



<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Gelman &amp; Hill, <em>Data Analysis Using Regression and Multilevel/Hierarchical
Models</em> (Cambridge University Press, 2007), ch. 12 — the simulation approach
<code>predictInterval()</code> implements.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "After a decade on CRAN, merTools reaches 1.0 — feature complete, in long-term support, and now benchmarked head-to-head against full Bayesian inference.",
  "og_image": "https://jaredknowles.com/og/posts/mertools-1-0-released.png",
  "og_title": "merTools 1.0: prediction intervals for mixed models, validated against brms"
}
</posse:post></entry><entry><title>Education Data Done Right, a New Book on Strategies for Success in Building Education Data Capacity</title><link href="https://jaredknowles.com/posts/education-data-done-right-a-new-book-on-strategies-for-success-in-building-education-data-capacity/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/education-data-done-right-a-new-book-on-strategies-for-success-in-building-education-data-capacity/</id><published>2019-10-23T21:31:30Z</published><updated>2019-10-23T21:31:30Z</updated><category term="education research"/><summary>New book covers the ins and outs of doing education data analysis well in public education agencies from the perspective of three analysts with decades of experience in the field. Boston, MA – October 7, 2019 – Wendy Geller, Dorothyjean Cratty, and Jared Knowles – three data analysts with expertise in public education agencies – have teamed up to write a new book which covers the missing elements that are critical to success in building data capacity in education agencies. The book is intended for education agency data analysts, teams of analysts, and data managers, strategists, and leaders seeking to improve how their agency operates.</summary><content type="html"><![CDATA[<blockquote>
<p>New book covers the ins and outs of doing education data analysis well in public education agencies from the perspective of three analysts with decades of experience in the field.</p>
</blockquote>
<hr>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/education-data-done-right-a-new-book-on-strategies-for-success-in-building-education-data-capacity/education-data-done-right-a-new-book-on-strategies-for-success-in-building-education-data-capacity-01-book_cover_small.jpg" alt="book_cover_small.jpg" loading="lazy" decoding="async"></div></figure>
</p>
<p><strong>Boston, MA</strong> – <strong>October 7, 2019</strong> – Wendy Geller, Dorothyjean Cratty, and Jared Knowles – three data analysts with expertise in public education agencies – have teamed up to write a new book which covers the missing elements that are critical to success in building data capacity in education agencies. The book is intended for education agency data analysts, teams of analysts, and data managers, strategists, and leaders seeking to improve how their agency operates.</p>
<p>Many education agency data analysts come from a social science research background and the transition to work inside agencies can come with a lot of new challenges. This book is a guide through those challenges covering topics such as metadata, data requests, how to work with IT, politics, and descriptive data analysis.</p>
<p>The book covers these topics with wit and humor and a perspective only possible from authors who’ve been in the trenches and gotten the work done. Each chapter was reviewed by another expert in the field who gave valuable outside perspective and broadened the horizon of the book to ensure its relevance for agencies across the country.</p>
<p>The book is accompanied by a website where analysts across the country can get in touch and suggest contributions for planned future volumes. <a href="https://www.eddatadoneright.com">On the website</a>
 you can also learn more about the biographies of the authors and each of the contributors.</p>
<p>Education Data Done Right (EDDR) is available now digitally on LeanPub with a suggested price of $15: <a href="https://www.leanpub.com/eddatadoneright">www.leanpub.com/eddatadoneright</a>
. Print copies are available at <a href="https://www.amazon.com/dp/1698152310/ref=sr_1_1?keywords=education&#43;data&#43;done&#43;right&amp;qid=1570737099&amp;sr=8-1">Amazon</a>
.</p>
<p><strong>About the Authors</strong></p>
<p>Dr. Wendy Geller is currently the Director of the Data Management &amp; Analysis Division at the Vermont Agency of Education, where she leads a team that serves as a centralized resource to the agency. Her crew collects, stewards, and leverages the institution’s critical data assets to create and share data products that enable empirically-based practice and policy decision-making.</p>
<p>Dr. Jared Knowles is a former research analyst at the Wisconsin Department of Public Instruction (2011-2016) and is currently the president of Civilytics Consulting LLC, which provides training, analytic services, and strategy to education agencies across the country.</p>
<p>Ms. Dorothyjean Cratty is a former research associate at the U.S. Department of Education. She is currently the founder of DJC Applied Research, which provides high-quality in-depth data analysis of administrative education data.</p>
<p><strong>Media contacts:</strong></p>
<p>Jared E. Knowles</p>
<p>617.393.1939</p>
<p><a href="mailto:jared@civilytics.com">jared@civilytics.com</a>
</p>
<p><strong>Table of Contents</strong></p>
<p>Introduction</p>
<p>Metadata and Business Rules</p>
<p>An Analyst’s Guide to IT</p>
<p>Data Requests: You Can Make them Useful (we swear)</p>
<p>Politics and Data Driven Decision Making</p>
<p>Moments of Truth: Why Calculating Descriptive Statistics is Important</p>
<p>Applying Tools of the Trade: Descriptive Data Commands in Context</p>
<p>Conclusions</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "New book covers the ins and outs of doing education data analysis well in public education agencies from the perspective of three analysts with decades of experience in the field. Boston, MA – October 7, 2019 – Wendy Geller, Dorothyjean Cratty, and Jared Knowles – three data analysts with expertise in public education agencies – have teamed up to write a new book which covers the missing elements that are critical to success in building data capacity in education agencies. The book is intended for education agency data analysts, teams of analysts, and data managers, strategists, and leaders seeking to improve how their agency operates.",
  "og_image": "https://jaredknowles.com/og/posts/education-data-done-right-a-new-book-on-strategies-for-success-in-building-education-data-capacity.png",
  "og_title": "Education Data Done Right, a New Book on Strategies for Success in Building Education Data Capacity"
}
</posse:post></entry><entry><title>The Ethics of Data Science</title><link href="https://jaredknowles.com/posts/the-ethics-of-data-science/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/the-ethics-of-data-science/</id><published>2019-07-12T14:29:56Z</published><updated>2019-07-12T14:29:56Z</updated><summary>The past 18 months I’ve spent a lot of time thinking about the role of experts in a democratic society. Experts pose a particular challenge to democratic governance because expertise is, by its nature, undemocratic. A lot of this thinking has been centered around a particular kind of expertise - perhaps the defining type of expertise of our time - data science. I have been happy to find that a number of very talented researchers and authors have been taking up the question of data science and its role in our society lately. In the past several years a number of thought-provoking and brilliant books and articles have been written that demand the attention of anyone working in data science - but particularly those working in fields where the impact of their work is public.</summary><content type="html"><![CDATA[<p>The past 18 months I’ve spent a lot of time thinking about the role of experts in a democratic society. Experts pose a particular challenge to democratic governance because expertise is, by its nature, undemocratic. A lot of this thinking has been centered around a particular kind of expertise - perhaps the defining type of expertise of our time - data science.</p>
<p>I have been happy to find that a number of very talented researchers and authors have been taking up the question of data science and its role in our society lately. In the past several years a number of thought-provoking and brilliant books and articles have been written that demand the attention of anyone working in data science - but particularly those working in fields where the impact of their work is public.</p>
<p>Instead of adding my interpretation of these works, I thought it would be more helpful to direct you to engage with these authors directly. To aid in that, I have put together a reading list of the books and articles that have been the most eye-opening and thought-provoking for me in the hopes they may provide the same for you.</p>
<p>You can see an <a href="https://www.github.com/jknowles/ethical_data_science_reader">updated version of this list on GitHub</a>
 where you are also welcome to submit readings you have found helpful to be added to the list.</p>
<p>Below is an annotated version of this list which captures my recommendations for which books and articles might be most useful to different types of readers. You can <a href="/s/Ethical-and-Inclusive-Data-Science-Readings.pdf">get it as a PDF here.</a>
</p>
<h1 id="introduction">
  <span class="heading-mark">Introduction</span>
  <a class="heading-anchor" href="#introduction" aria-label="Link to this section">#</a>
</h1>
<p>This reading list gives an overview of the ethical concerns specific to data analysis, data science, and artificial intelligence. Ethics is used broadly here to mean concerns related to racial and economic equity, justice, fairness, and the protection of democratic and human rights.</p>
<p>This list is intended to spark new ideas and prompt critical thinking about data system design and integration into business processes in an organization. This is not an endorsement of all viewpoints represented in the readings below – except to say that each of the readings raises questions, puts forward ideas, and makes critiques that are worthy of your deep consideration.</p>
<p>All links last accessed July 11th, 2019. This guide was last updated July 11th, 2019. An unannotated version of this reading list is available <a href="http://www.github.com/jknowles/ethical_data_science_reader">on GitHub</a>
. You can get an updated version of this list as well as suggest additions to the reading list there.</p>
<h1 id="books">
  <span class="heading-mark">Books</span>
  <a class="heading-anchor" href="#books" aria-label="Link to this section">#</a>
</h1>
<p>Eubanks, Virginia. 2018. Automating Inequality. St. Martin’s Press.</p>
<p>Noble, Safiya. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press.</p>
<p>O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Broadway Books.</p>
<blockquote>
<p>These are the “big three” books uncovering the ways that algorithms affect our lives invisibly and sometimes visibly. Of the three I am partial to Eubanks because of the in-depth way she centers the voices of those affected by algorithms and her focus on algorithms in the social services sector. Noble is one of the most important voices in technology today – especially for thinking about the impact of major technology companies on our lives. I prefer the article below to the book length treatment by O’Neil. The intersection of expertise and democracy has been studied by social scientists for decades and that literature is better summarized elsewhere.</p>
</blockquote>
<p>Broussard, Meredith. 2018. Artificial Unintelligence: How Computers Misunderstand the World. MIT Press.</p>
<p>Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Polity.</p>
<blockquote>
<p>These are two newer texts which offer updated takes on the above themes. Broussard specifically tackles artificial intelligence, which is related but adjacent to most applications of data science in fields like education and social services (for now). Benjamin’s book is one I anticipate greatly as it brings a much-needed critical race perspective to the conversation about the effect of data analysis and collection on our society. Even better, it is intended to teach the reader how to critically review the promises of technologies like algorithms and automated decision support systems.</p>
</blockquote>
<p>Loukides, Mike, Hilary Mason, and DJ Patil. 2018. Ethics and Data Science. O’Reilly.</p>
<blockquote>
<p>This is a pragmatic and brief overview of the major ethical concerns with data science. This text focuses on practical steps that a data science team can take to be more ethical. This practical approach is different from the above readings, which is why I recommend it as a supplementary reading – but it is very helpful for answering the “what do I do now?” question.</p>
</blockquote>
<p>brown, adrienne maree. 2017. Emergent Strategy: Shaping Change, Changing Worlds. AK Press.</p>
<blockquote>
<p>This book isn’t about ethics, data science, or technology explicitly at all. It is about how to work together with a large inclusive set of stakeholders to build something that reflects the voices of a diverse community. This is, in fact, the main solution proposed by almost all the authors above – inclusive design done together with a wider community. This book will stimulate your thinking on how to go about that.</p>
</blockquote>
<h1 id="articles">
  <span class="heading-mark">Articles</span>
  <a class="heading-anchor" href="#articles" aria-label="Link to this section">#</a>
</h1>
<p>Wallach, Hanna. 2014. “Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency.” Medium. <a href="https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d">Online</a>
. 12.19.2014</p>
<p>O’Neil, Cathy. 2016. “How to Bring Better Ethics to Data Science.” Slate. <a href="https://slate.com/technology/2016/02/how-to-bring-better-ethics-to-data-science.html">Online</a>
. 2.4.2016</p>
<p>Broussard, Meredith. 2019. “Letting Go of Technochauvinism.” Public Books. <a href="https://www.publicbooks.org/letting-go-of-technochauvinism/">Online</a>
. 6.17.2019</p>
<blockquote>
<p>Together these three articles provide a great overview of the limits of data science, the limits of our ability to “technology” our way out of social problems, and the intersection of the data systems we design and the world we live in. The Broussard article challenges the reader with the proposition that automation is perhaps not the best answer to each and every problem. The O’Neil article is a great overview of her critically acclaimed book – clearly presenting the arguments and the implications. The Wallach piece is maybe the best of the bunch – it gives a comprehensive tour of the ethical concerns of data science starting from the questions we ask all the way through how we use the answers our algorithms provide.</p>
</blockquote>
<p>Dash, Anil. 2018. Humane Tech. Medium. <a href="https://medium.com/humane-tech">Online</a>
.</p>
<blockquote>
<p>Dash is one of the most important voices in the tech industry. While not strictly about data science, this series of articles provides a great overview for thinking about how to build technology tools for society as it is in ways that make society better – instead of exploiting the flaws in our society for profit. You should also check out his podcast – <a href="https://glitch.com/culture/function/">Function</a>
.</p>
</blockquote>
<p>Fischer, Frank. 1993. “Citizen participation and democratization of policy expertise: From theoretical inquiry to practical cases.” Policy Sciences. v. 26 pp. 165-187.</p>
<p>Diakopoulos, Nicholas. 2016. “How to Hold Governments Accountable for the Algorithms They Use.” Slate. <a href="https://slate.com/technology/2016/02/how-to-hold-governments-accountable-for-their-algorithms.html">Online</a>
. 2.11.2016</p>
<p>Angwin, Julia. 2016. “Making Algorithms Accountable.” ProPublica. <a href="https://www.propublica.org/article/making-algorithms-accountable">Online</a>
. 2.1.2016</p>
<blockquote>
<p>Government uses of data science tools are a special case and merit their own discussion. Diakopoulos and Angwin both present good overviews for how to make algorithms in government accountable and how to enforce accountability of algorithms in general. For me, though, the take on this topic that expanded my mind the most was an older article on the role of expertise in governing a democratic society by Fischer. This article is heavy on the academic side but takes a look at the unique challenges that a reliance on expertise poses to a democratically governed society.</p>
</blockquote>
<p>Patil, DJ. 2018. “A Code of Ethics for Data Science.” Medium. <a href="https://medium.com/@dpatil/a-code-of-ethics-for-data-science-cda27d1fac1">Online</a>
. 2.1.2018</p>
<p>Wheeler, Schaun. 2018. “An ethical code can’t be about ethics.” Towards Data Science. <a href="https://towardsdatascience.com/an-ethical-code-cant-be-about-ethics-66acaea6f16f">Online</a>
. 2.6.2018</p>
<p>Eubanks, Virginia. 2018. “A Hippocratic Oath for Data Science.” <a href="https://virginia-eubanks.com/2018/02/21/a-hippocratic-oath-for-data-science/">Online</a>
. 2.21.2018</p>
<blockquote>
<p>There has been a healthy debate about a “Hippocratic Oath” for Data Science or a “Data Science Code of Ethics”. These articles provide different viewpoints on that debate and help you think about what it means to do data science ethically and what role a professional code of ethics may play.</p>
</blockquote>
<h1 id="further-reading-lists">
  <span class="heading-mark">Further Reading Lists</span>
  <a class="heading-anchor" href="#further-reading-lists" aria-label="Link to this section">#</a>
</h1>
<p>Venkatasubramanian, Suresh and Katie Shelef. 2017. “Ethics of Data Science Course Syllabus.” University of Utah. <a href="https://utah.instructure.com/courses/462398/assignments/syllabus">Online</a>
.</p>
<blockquote>
<p>This syllabus contains a lot of foundational texts in the ethics of social science as well as a wonderful set of examples of the ethical challenges posed by data science.</p>
</blockquote>
<p>Malliaraki, Eirini. 2018. “Toward ethical, transparent and fair AI/ML: a critical reading list.” Medium. <a href="https://medium.com/@eirinimalliaraki/toward-ethical-transparent-and-fair-ai-ml-a-critical-reading-list">Online</a>
.</p>
<blockquote>
<p>This is the closest thing to a comprehensive current reading list on transparency and fairness in machine learning and artificial intelligence. This thorough and well-organized reading list has plenty of great further reading to extend any of the readings covered here.</p>
</blockquote>
<p>Wickham, Hadley. 2018. “Readings in Applied Data Science.” <a href="https://github.com/hadley/stats337">Online</a>
.</p>
<blockquote>
<p>A wide-ranging reading list of applied data science topics. Some would make great case studies for ethical dilemmas in data science, others are critical analyses of the ethics of particular applications of data science.</p>
</blockquote>
<p>Various. 2018. Readings in Data Ethics. O’Reilly. <a href="https://www.oreilly.com/tags/data-ethics">Online</a>
.</p>
<blockquote>
<p>Five short articles that will give you a practical and pragmatic overview of how to build some ethical safeguards into your data science team and products.</p>
</blockquote>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "The past 18 months I’ve spent a lot of time thinking about the role of experts in a democratic society. Experts pose a particular challenge to democratic governance because expertise is, by its nature, undemocratic. A lot of this thinking has been centered around a particular kind of expertise - perhaps the defining type of expertise of our time - data science. I have been happy to find that a number of very talented researchers and authors have been taking up the question of data science and its role in our society lately. In the past several years a number of thought-provoking and brilliant books and articles have been written that demand the attention of anyone working in data science - but particularly those working in fields where the impact of their work is public.",
  "og_image": "https://jaredknowles.com/og/posts/the-ethics-of-data-science.png",
  "og_title": "The Ethics of Data Science"
}
</posse:post></entry><entry><title>Learn More About Civilytics</title><link href="https://jaredknowles.com/posts/learn-more-about-civilytics/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/learn-more-about-civilytics/</id><published>2019-03-20T14:15:13Z</published><updated>2019-03-20T14:15:13Z</updated><summary>Civilytics Consulting is an LLC founded by me in 2016 to pursue my goals of providing capacity-building data science services to public institution partners. Now that we are in our third year there will be lots of new developments to announce in 2019 - so stay tuned. For now, I thought I would share this snapshot of where Civilytics is and what we do: About Civilytics # Civilytics Consulting is a data science consulting firm founded in 2016 by me, Dr. Jared E. Knowles. Civilytics has served clients at all levels of government in several policy areas including K-12 education, higher education, policing, and taxation. Dr. Knowles pioneered an award-winning machine learning algorithm for the state of Wisconsin that is used across the state to help struggling students get back on track. Since 2012, his work has been used around the country to improve the accuracy of predictive analytics systems in education. He has been published in several peer-reviewed journals, including the Journal of Educational Data Mining and Journal of Policy Analysis and Management. He has a Ph.D. in Political Science with expertise in the management and governance of publicly-run education agencies. He has advised government agencies in a variety of sectors on the development, business use, and policy implications of machine learning tools to augment human decision making.</summary><content type="html"><![CDATA[<p>Civilytics Consulting is an LLC founded by me in 2016 to pursue my goals of providing capacity-building data science services to public institution partners. 
Now that we are in our third year there will be lots of new developments to announce in 2019 - so stay tuned. For now, I thought I would share this snapshot of where Civilytics is and what we do:</p>
<h3 id="about-civilytics">
  <span class="heading-mark">About Civilytics</span>
  <a class="heading-anchor" href="#about-civilytics" aria-label="Link to this section">#</a>
</h3>
<p>Civilytics Consulting is a data science consulting firm founded in 2016 by me, Dr. Jared E. Knowles. Civilytics has served clients at all levels of government in several policy areas including K-12 education, higher education, policing, and taxation. Dr. Knowles pioneered an award-winning machine learning algorithm for the state of Wisconsin that is used across the state to help struggling students get back on track. Since 2012, his work has been used around the country to improve the accuracy of predictive analytics systems in education. He has been published in several peer-reviewed journals, including the Journal of Educational Data Mining and Journal of Policy Analysis and Management. He has a Ph.D. in Political Science with expertise in the management and governance of publicly-run education agencies. He has advised government agencies in a variety of sectors on the development, business use, and policy implications of machine learning tools to augment human decision making.</p>
<p>Civilytics Consulting focuses on building the capacity of its public institution partners to sustain analytic products after the initial work. By using open source software and open data sources, providing all source code and extensive documentation, and training customers thoroughly in the product, Civilytics ensures that its clients can maintain the products it develops long after the contract ends. This sustainable partnership model is unique, allowing agencies to build lasting organizational change. Civilytics approaches projects with a human-centered design methodology that creates lasting partnerships with clients.</p>
<h3 id="projects">
  <span class="heading-mark">Projects</span>
  <a class="heading-anchor" href="#projects" aria-label="Link to this section">#</a>
</h3>
<p><strong>Recent Projects</strong></p>
<ul>
<li>placing 2nd out of over 60 entries in a data science competition hosted on <a href="https://www.kaggle.com/civilytics/geoprocessing-and-analysis-of-cpe-data/">Kaggle for the Center for Policing Equity</a>
</li>
<li>building an enrollment projection system for a state higher education system</li>
<li>building an equity-focused dashboard and reporting tool for student recruitment</li>
<li>designing a curriculum for data analysts in education agencies</li>
<li>developing an open platform for education analysts to reproduce results using synthetic data to protect confidential records and increase collaboration</li>
</ul>
<p><strong>Past Projects</strong></p>
<ul>
<li>advised on the development of a statewide college and career readiness identification system and high school graduation early warning system</li>
<li>audited and reviewed predictive models for a number of local government services</li>
<li>built programmatic city accountability reports on police performance</li>
<li>published a review of public data sources available to integrate with State Longitudinal Data Systems (SLDSs)</li>
</ul>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Civilytics Consulting is an LLC founded by me in 2016 to pursue my goals of providing capacity-building data science services to public institution partners. Now that we are in our third year there will be lots of new developments to announce in 2019 - so stay tuned. For now, I thought I would share this snapshot of where Civilytics is and what we do: About Civilytics # Civilytics Consulting is a data science consulting firm founded in 2016 by me, Dr. Jared E. Knowles. Civilytics has served clients at all levels of government in several policy areas including K-12 education, higher education, policing, and taxation. Dr. Knowles pioneered an award-winning machine learning algorithm for the state of Wisconsin that is used across the state to help struggling students get back on track. Since 2012, his work has been used around the country to improve the accuracy of predictive analytics systems in education. He has been published in several peer-reviewed journals, including the Journal of Educational Data Mining and Journal of Policy Analysis and Management. He has a Ph.D. in Political Science with expertise in the management and governance of publicly-run education agencies. He has advised government agencies in a variety of sectors on the development, business use, and policy implications of machine learning tools to augment human decision making.",
  "og_image": "https://jaredknowles.com/og/posts/learn-more-about-civilytics.png",
  "og_title": "Learn More About Civilytics"
}
</posse:post></entry><entry><title>Announcing a major update to merTools</title><link href="https://jaredknowles.com/posts/announcing-a-major-update-to-mertools/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/announcing-a-major-update-to-mertools/</id><published>2016-12-13T13:36:16Z</published><updated>2016-12-13T13:36:16Z</updated><category term="R"/><category term="multilevel models"/><category term="applied models"/><summary>merTools is an R package that is designed to make working with multilevel models from lme4, particularly large models with many random effects, fast and easy. With merTools you can generate prediction intervals that incorporate various components of uncertainty (fixed effect, random effect, and model uncertainty), you can get the expected rank of individual random effect levels (a combination of magnitude and precision of the estimate) and you can explore the substantive effect of variables in the model using a Shiny application interactively!</summary><content type="html"><![CDATA[<p>merTools is an R package that is designed to make working with multilevel models from lme4, particularly large models with many random effects, fast and easy. With merTools you can generate prediction intervals that incorporate various components of uncertainty (fixed effect, random effect, and model uncertainty), you can get the expected rank of individual random effect levels (a combination of magnitude and precision of the estimate) and you can explore the substantive effect of variables in the model using a Shiny application interactively!</p>
<p>Recently, we&#8217;ve updated the package to significantly improve performance and accuracy. You can get it on CRAN now.</p>
<p>Below are some updates from the NEWS.md. To learn more check out the package development on <a href="http://www.github.com/jknowles/merTools">GitHub</a>
. You can also read a <a href="http://jaredknowles.com/journal/2015/8/12/announcing-mertools">previous blog entry discussing</a>
 the package and its uses.</p>
<h2 id="mertools-030">
  <span class="heading-mark">merTools 0.3.0</span>
  <a class="heading-anchor" href="#mertools-030" aria-label="Link to this section">#</a>
</h2>
<ul>
<li>Improve handling of formulas. If the original <code>merMod</code> has functions specified
in the formula, the <code>draw</code> and <code>wiggle</code> functions will check for this and attempt
to respect these variable transformations. Where this is not possible a warning
will be issued. Most common transformations are respected as long as the
original variable is passed untransformed to the model.</li>
<li>Change the calculations of the residual variance. Previously residual variance
was used to inflate both the variance around the fixed parameters and around the
predicted values themselves. This was incorrect and resulted in overly conservative
estimates. Now the residual variance is appropriately only used around the
final predictions</li>
<li>New option for <code>predictInterval</code> that allows the user to return the full
interval, the fixed component, the random component, or the fixed and each random
component separately for each observation</li>
<li>Fixed a bug with slope+intercept random terms that caused a miscalculation of
the random component</li>
<li>Add comparison to <code>rstanarm</code> to the Vignette</li>
<li>Make <code>expectedRank</code> output more <code>tidy</code> like and allow function to calculate
expected rank for all terms at once
<ul>
<li>Note, this breaks the API by changing the names of the columns in the output
of this function</li>
</ul>
</li>
<li>Remove tests that test for timing to avoid issues with R-devel JIT compiler</li>
<li>Remove <code>plyr</code> and replace with <code>dplyr</code></li>
<li>Fix issue #62 <code>varList</code> will now throw an error if <code>==</code> is used instead of <code>=</code></li>
<li>Fix issue #54 <code>predictInterval</code> did not include random effects in calculations
when <code>newdata</code> had more than 1000 rows and/or user specified <code>parallel=TRUE</code>.
Note: fix was to disable the <code>.paropts</code> option for <code>predictInterval</code> &#8230; user
can still specify for <em>temporary</em> backward compatibility but this should be
either removed or fixed in the permanent solution.</li>
<li>Fix issue #53 about problems with <code>predictInterval</code> when only specific levels
of a grouping factor are in <code>newdata</code> with the colon specification of
interactions</li>
<li>Fix issue #52 ICC wrong calculations &#8230; we just needed to square the standard
deviations that we pulled</li>
</ul>
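<p>The ICC fix for issue #52 comes down to working with variances rather than standard deviations. As a minimal sketch, assuming a random-intercept model on the <code>sleepstudy</code> data bundled with <code>lme4</code> (an illustrative choice, not the ICC code inside merTools), the calculation looks roughly like this:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">library(lme4)

# Illustrative random-intercept model; sleepstudy ships with lme4
m &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# VarCorr() reports both variances (vcov) and standard deviations (sdcor)
vc &lt;- as.data.frame(VarCorr(m))

# ICC = between-group variance / (between-group + residual variance)
var_group &lt;- vc$vcov[vc$grp == &#34;Subject&#34;]
var_resid &lt;- vc$vcov[vc$grp == &#34;Residual&#34;]
icc &lt;- var_group / (var_group + var_resid)

# Starting from standard deviations, each one has to be squared first
sd_group &lt;- vc$sdcor[vc$grp == &#34;Subject&#34;]
sd_resid &lt;- vc$sdcor[vc$grp == &#34;Residual&#34;]
all.equal(icc, sd_group^2 / (sd_group^2 + sd_resid^2))</code></pre></div>
</figure>
<p>Using the unsquared standard deviations in the ratio gives a different and incorrect value, which is the kind of error the fix addresses.</p>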
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "merTools is an R package that is designed to make working with multilevel models from lme4, particularly large models with many random effects, fast and easy. With merTools you can generate prediction intervals that incorporate various components of uncertainty (fixed effect, random effect, and model uncertainty), you can get the expected rank of individual random effect levels (a combination of magnitude and precision of the estimate) and you can explore the substantive effect of variables in the model using a Shiny application interactively!",
  "og_image": "https://jaredknowles.com/og/posts/announcing-a-major-update-to-mertools.png",
  "og_title": "Announcing a major update to merTools"
}
</posse:post></entry><entry><title>Introducing the R Data Science Livestream</title><link href="https://jaredknowles.com/posts/introducing-the-r-data-science-livestream/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/introducing-the-r-data-science-livestream/</id><published>2016-09-22T15:37:03Z</published><updated>2016-09-22T15:37:03Z</updated><category term="R"/><summary>Have you ever watched a livestream? Have you ever wondered what the actual minute to minute of doing data science looks like? Do you wonder if other R users have the same frustrations as you? If yes – then read on! I’m off on a new professional adventure where I am doing public facing work for the first time in years. While working at home the other day I thought it would be a great idea to keep myself on-task and document my decisions if I recorded myself working with my webcam. Then, I thought, why stop there – why not livestream my work?</summary><content type="html"><![CDATA[<p>Have you ever watched a livestream? Have you ever wondered what the actual minute to minute of doing data science looks like? Do you wonder if other R users have the same frustrations as you? If yes &#8211; then read on!</p>
<p>I&#8217;m off on a new professional adventure where I am doing public facing work for the first time in years. While working at home the other day I thought it would be a great idea to keep myself on-task and document my decisions if I recorded myself working with my webcam. Then, I thought, why stop there &#8211; why not livestream my work?</p>
<p>And thus, the <a href="https://jknowles.github.io/DataScienceLivestream/">R Data Science Livestream</a>
 was born. The idea is that every day for an hour or two I will livestream myself doing some data science tasks related to my current project &#8211; which is to analyze dozens of years of FBI Uniform Crime reports (<a href="https://jknowles.github.io/DataScienceLivestream/pages/project.html">read more</a>
). I haven&#8217;t done much R coding in the last 4 months, so it&#8217;s also a good way to shake off the rust of being out of the game for so long.</p>
<p>So if you are at all interested or curious why someone would do this, check out <a href="https://jknowles.github.io/DataScienceLivestream/">the landing page I put up to document the project</a>
 and if you are really curious, maybe even <a href="https://www.youtube.com/c/jaredknowles">tune in or watch the archives on YouTube</a>
!</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Have you ever watched a livestream? Have you ever wondered what the actual minute to minute of doing data science looks like? Do you wonder if other R users have the same frustrations as you? If yes – then read on! I’m off on a new professional adventure where I am doing public facing work for the first time in years. While working at home the other day I thought it would be a great idea to keep myself on-task and document my decisions if I recorded myself working with my webcam. Then, I thought, why stop there – why not livestream my work?",
  "og_image": "https://jaredknowles.com/og/posts/introducing-the-r-data-science-livestream.png",
  "og_title": "Introducing the R Data Science Livestream"
}
</posse:post></entry><entry><title>Explore multilevel models faster with the new merTools R package</title><link href="https://jaredknowles.com/posts/announcing-mertools/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/announcing-mertools/</id><published>2015-09-23T13:49:00Z</published><updated>2015-09-23T13:49:00Z</updated><category term="R"/><summary>Update 1: Package is now available on CRAN . Update 2: Development version also supports models from the blme package. By far the most popular content on this blog are my two tutorials on how to fit and use multilevel models in R. Since publishing those tutorials I have received numerous questions, comments, and hits to this blog looking for more information about multilevel models in R. Since my day job involves fitting and exploring multilevel models, as well as explaining them to a non-technical audience, I began working with my colleague Carl Frederick on an R package to make these tasks easier. Today, I’m happy to announce that our solution, merTools , is now available. Below, I reproduce the package README file, but you can find out more on GitHub. There are two extensive vignettes that describe how to make use of the package, as well as a shiny app that allows interactive model exploration. The package should be available on CRAN within the next few days.</summary><content type="html"><![CDATA[<p>Update 1: Package is now available <a href="https://cran.rstudio.com/web/packages/merTools/index.html">on CRAN</a>
.</p>
<p>Update 2: Development version also supports models from the blme package.</p>
<p>By far the most popular content on this blog are my two tutorials on how to fit and use multilevel models in R. Since publishing those tutorials I have received numerous questions, comments, and hits to this blog looking for more information about multilevel models in R. Since my day job involves fitting and exploring multilevel models, as well as explaining them to a non-technical audience, I began working with my colleague Carl Frederick on an R package to make these tasks easier. Today, I&#8217;m happy to announce that our solution, <a href="http://www.github.com/jknowles/merTools">merTools</a>
, is now available. Below, I reproduce the package README file, but you can find out more on GitHub. There are two extensive vignettes that describe how to make use of the package, as well as a <strong>shiny</strong> app that allows interactive model exploration. The package should be available on CRAN within the next few days.</p>
<p>Working with generalized linear mixed models (GLMM) and linear mixed models (LMM) has become increasingly easy with advances in the <code>lme4</code> package. As we have found ourselves using these models more and more within our work, we, the authors, have developed a set of tools for simplifying and speeding up common tasks for interacting with <code>merMod</code> objects from <code>lme4</code>. This package provides those tools.</p>
<h2 id="installation">
  <span class="heading-mark">Installation</span>
  <a class="heading-anchor" href="#installation" aria-label="Link to this section">#</a>
</h2>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># development version</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">install_github</span><span class="p">(</span><span class="s">&#34;jknowles/merTools&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># CRAN version -- coming soon</span>
</span></span><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;merTools&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<h2 id="shiny-app-and-demo">
  <span class="heading-mark">Shiny App and Demo</span>
  <a class="heading-anchor" href="#shiny-app-and-demo" aria-label="Link to this section">#</a>
</h2>
<p>The easiest way to demo the features of this application is to use the bundled Shiny application which launches a number of the metrics here to aid in exploring the model. To do this:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">devtools</span><span class="o">::</span><span class="nf">install_github</span><span class="p">(</span><span class="s">&#34;jknowles/merTools&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">merTools</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">m1</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">y</span> <span class="o">~</span> <span class="n">service</span> <span class="o">+</span> <span class="n">lectage</span> <span class="o">+</span> <span class="n">studage</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">d</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">s</span><span class="p">),</span> <span class="n">data</span><span class="o">=</span><span class="n">InstEval</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">shinyMer</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">simData</span> <span class="o">=</span> <span class="n">InstEval[1</span><span class="o">:</span><span class="m">100</span><span class="p">,</span> <span class="n">]</span><span class="p">)</span> <span class="c1"># just try the first 100 rows of data</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="/posts/announcing-mertools/plot-1_hu_f335679cd0d89bee.webp"
             srcset="/posts/announcing-mertools/plot-1_hu_f335679cd0d89bee.webp 760w, /posts/announcing-mertools/plot-1_hu_101d7260bee23244.webp 1366w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Screenshot of the shinyMer app: a sidebar of simulation controls beside the Prediction uncertainty tab, which plots a 95% prediction interval for each observation." width="1366" height="768" loading="lazy" decoding="async"></div></figure>
</p>
<p>On the first tab, the function presents the prediction intervals for the data selected by the user, which are calculated using the <code>predictInterval</code> function within the package. This function calculates prediction intervals quickly by sampling from the simulated distribution of the fixed effect and random effect terms and combining these simulated estimates to produce a distribution of predictions for each observation. This allows prediction intervals to be generated from very large models where the use of <code>bootMer</code> would not be feasible computationally.</p>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="/posts/announcing-mertools/plot-2_hu_9ed71cc90e5107a3.webp"
             srcset="/posts/announcing-mertools/plot-2_hu_9ed71cc90e5107a3.webp 760w, /posts/announcing-mertools/plot-2_hu_8c4dc5f7d15bdf7c.webp 1357w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Screenshot of the shinyMer Parameters tab showing the original model call, a dot-and-whisker plot of fixed effects, and effect-range plots for the two grouping terms." width="1357" height="553" loading="lazy" decoding="async"></div></figure>
</p>
<p>On the next tab the distribution of the fixed effect and group-level effects is depicted on confidence interval plots. These are useful for diagnostics and provide a way to inspect the relative magnitudes of various parameters. This tab makes use of four related functions in <code>merTools</code>: <code>FEsim</code>, <code>plotFEsim</code>, <code>REsim</code> and <code>plotREsim</code> which are available to be used on their own as well.</p>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="/posts/announcing-mertools/plot-3_hu_59f42a3060e0fa93.webp"
             srcset="/posts/announcing-mertools/plot-3_hu_59f42a3060e0fa93.webp 760w, /posts/announcing-mertools/plot-3_hu_f875073c49b226b1.webp 1345w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Screenshot of the shinyMer Substantive Effect tab with small-multiple panels comparing the impact of the grouping term and a selected fixed effect across cases." width="1345" height="649" loading="lazy" decoding="async"></div></figure>
</p>
<p>On the third tab are some convenient ways to show the influence or magnitude of effects by leveraging the power of <code>predictInterval</code>. For each case, up to 12, in the selected data type, the user can view the impact of changing either one of the fixed effects or one of the grouping-level terms. Using the <code>REimpact</code> function, each case is simulated with the model’s prediction if all else were held equal, but the observation was moved through the distribution of the fixed effect or the random effect term. This is plotted on the scale of the dependent variable, which allows the user to compare the magnitude of effects across variables, and also between models on the same data.</p>
<h2 id="predicting">
  <span class="heading-mark">Predicting</span>
  <a class="heading-anchor" href="#predicting" aria-label="Link to this section">#</a>
</h2>
<p>Standard prediction looks like so.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">predict</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">newdata</span> <span class="o">=</span> <span class="n">InstEval[1</span><span class="o">:</span><span class="m">10</span><span class="p">,</span> <span class="n">]</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;        1        2        3        4        5        6        7        8</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3.146336 3.165211 3.398499 3.114248 3.320686 3.252670 4.180896 3.845218</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;        9       10</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3.779336 3.331012</span></span></span></code></pre></div>
</figure>
<p>With <code>predictInterval</code> we obtain predictions that are more like the standard objects produced by <code>lm</code> and <code>glm</code>:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1">#predictInterval(m1, newdata = InstEval[1:10, ]) # all other parameters are optional</span>
</span></span><span class="line"><span class="cl"><span class="nf">predictInterval</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">newdata</span> <span class="o">=</span> <span class="n">InstEval[1</span><span class="o">:</span><span class="m">10</span><span class="p">,</span> <span class="n">]</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="m">500</span><span class="p">,</span> <span class="n">level</span> <span class="o">=</span> <span class="m">0.9</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                <span class="n">stat</span> <span class="o">=</span> <span class="s">&#39;median&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;         fit      lwr      upr</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 1  3.074148 1.112255 4.903116</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 2  3.243587 1.271725 5.200187</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3  3.529055 1.409372 5.304214</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 4  3.072788 1.079944 5.142912</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 5  3.395598 1.268169 5.327549</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 6  3.262092 1.333713 5.304931</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 7  4.215371 2.136654 6.078790</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 8  3.816399 1.860071 5.769248</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 9  3.811090 1.697161 5.775237</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 10 3.337685 1.417322 5.341484</span></span></span></code></pre></div>
</figure>
<p>Note that <code>predictInterval</code> is slower because it is computing simulations. It can also return all of the simulated <code>yhat</code> values as an attribute to the predict object itself.</p>
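<p>The simulated values mentioned above can be retrieved roughly as follows. This sketch assumes the <code>returnSims</code> argument and the <code>sim.results</code> attribute as documented in recent CRAN releases of <code>merTools</code>, and it uses the small <code>sleepstudy</code> data from <code>lme4</code> so that it runs quickly:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">library(lme4)
library(merTools)

# Small illustrative model (not the InstEval model used elsewhere in this post)
m &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Ask predictInterval to keep every simulated prediction
p &lt;- predictInterval(m, newdata = sleepstudy[1:5, ], n.sims = 200,
                     level = 0.9, stat = &#34;median&#34;, returnSims = TRUE)

# The simulations are attached to the returned data.frame as an attribute
sim_draws &lt;- attr(p, &#34;sim.results&#34;)
dim(sim_draws)   # dimensions of the stored simulation matrix</code></pre></div>
</figure>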
<p><code>predictInterval</code> uses the <code>sim</code> function from the <code>arm</code> package heavily to draw the distributions of the parameters of the model. It then combines these simulated values to create a distribution of the <code>yhat</code> for each observation.</p>
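<p>To make the idea concrete, here is a hand-rolled sketch of the same approach for a single observation. It is not the code inside <code>predictInterval</code>: the <code>sleepstudy</code> model, the chosen case (Subject 308 on day 3), and the use of the point estimate of sigma for the residual noise are all simplifying assumptions made for illustration.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">library(lme4)
library(arm)

set.seed(1234)

# Illustrative model on the sleepstudy data bundled with lme4
m &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Draw simulated fixed and random effect parameters
n_sims &lt;- 500
sims &lt;- arm::sim(m, n.sims = n_sims)

# Hypothetical case to predict: Subject 308 on day 3
x_new &lt;- c(1, 3)   # intercept, Days

# Fixed part: each row of simulated coefficients times the covariates
fe_part &lt;- as.vector(sims@fixef %*% x_new)

# Random part: simulated intercepts for this subject
# (array dimensions are simulation, group level, term)
re_part &lt;- sims@ranef$Subject[, &#34;308&#34;, &#34;(Intercept)&#34;]

# Add residual noise using the point estimate of sigma (a simplification)
yhat &lt;- fe_part + re_part + rnorm(n_sims, mean = 0, sd = sigma(m))

# Median and 90% interval of the simulated predictions
quantile(yhat, probs = c(0.5, 0.05, 0.95))</code></pre></div>
</figure>
<p>Because only parameter draws are combined and no model is refit, the cost scales with the number of simulations rather than with repeated model estimation.</p>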
<h2 id="plotting">
  <span class="heading-mark">Plotting</span>
  <a class="heading-anchor" href="#plotting" aria-label="Link to this section">#</a>
</h2>
<p><code>merTools</code> also provides functionality for inspecting <code>merMod</code> objects visually. The easiest is getting the posterior distributions of both fixed and random effect parameters.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">feSims</span> <span class="o">&lt;-</span> <span class="nf">FEsim</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="m">100</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">feSims</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;          term        mean      median         sd</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 1 (Intercept)  3.22673524  3.22793168 0.01798444</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 2    service1 -0.07331857 -0.07482390 0.01304097</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3   lectage.L -0.18419526 -0.18451731 0.01726253</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 4   lectage.Q  0.02287717  0.02187172 0.01328641</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 5   lectage.C -0.02282755 -0.02117014 0.01324410</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 6   lectage^4 -0.01940499 -0.02041036 0.01196718</span></span></span></code></pre></div>
</figure>
<p>And we can also plot this:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">plotFEsim</span><span class="p">(</span><span class="nf">FEsim</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="m">100</span><span class="p">),</span> <span class="n">level</span> <span class="o">=</span> <span class="m">0.9</span><span class="p">,</span> <span class="n">stat</span> <span class="o">=</span> <span class="s">&#39;median&#39;</span><span class="p">,</span> <span class="n">intercept</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="/posts/announcing-mertools/plot-4_hu_bfe3fdefdd46696a.webp"
             srcset="/posts/announcing-mertools/plot-4_hu_bfe3fdefdd46696a.webp 760w, /posts/announcing-mertools/plot-4_hu_7f27d58a02a1b5b2.webp 1344w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Dot-and-whisker plot of simulated fixed-effect medians with intervals, sorted from studage.L (most positive) down to lectage.L (most negative), against a red zero line." width="1344" height="960" loading="lazy" decoding="async"></div></figure>
</p>
<p>We can also quickly make caterpillar plots for the random-effect terms:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">reSims</span> <span class="o">&lt;-</span> <span class="nf">REsim</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="m">100</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">reSims</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;   groupFctr groupID        term        mean      median        sd</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 1         s       1 (Intercept)  0.15317316  0.11665654 0.3255914</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 2         s       2 (Intercept) -0.08744824 -0.03964493 0.2940082</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3         s       3 (Intercept)  0.29063126  0.30065450 0.2882751</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 4         s       4 (Intercept)  0.26176515  0.26428522 0.2972536</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 5         s       5 (Intercept)  0.06069458  0.06518977 0.3105805</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 6         s       6 (Intercept)  0.08055309  0.05872426 0.2182059</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">plotREsim</span><span class="p">(</span><span class="nf">REsim</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="m">100</span><span class="p">),</span> <span class="n">stat</span> <span class="o">=</span> <span class="s">&#39;median&#39;</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="/posts/announcing-mertools/plot-5_hu_b030477cecbe3de6.webp"
             srcset="/posts/announcing-mertools/plot-5_hu_b030477cecbe3de6.webp 760w, /posts/announcing-mertools/plot-5_hu_dec351e782924aff.webp 1344w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Caterpillar plots of simulated random-effect ranges for the d and s grouping factors, each sorted from negative to positive around a red zero line." width="1344" height="960" loading="lazy" decoding="async"></div></figure>
</p>
<p>Note that <code>plotREsim</code> highlights group levels that have a simulated distribution that does not overlap 0 – these appear darker. The lighter bars represent grouping levels that are not distinguishable from 0 in the data.</p>
<p>Sometimes the random effects can be hard to interpret and not all of them are meaningfully different from zero. To help with this, <code>merTools</code> provides the <code>expectedRank</code> function, which returns the percentile ranks for the observed groups in the random effect distribution, taking into account both the magnitude and uncertainty of the estimated effect for each group.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">ranks</span> <span class="o">&lt;-</span> <span class="nf">expectedRank</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">groupFctr</span> <span class="o">=</span> <span class="s">&#34;d&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">ranks</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;      d (Intercept) (Intercept)_var       ER pctER</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 1 1866   1.2553613     0.012755634 1123.806   100</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 2 1258   1.1674852     0.034291228 1115.766    99</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3  240   1.0933372     0.008761218 1115.090    99</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 4   79   1.0998653     0.023095979 1112.315    99</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 5  676   1.0169070     0.026562174 1101.553    98</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 6   66   0.9568607     0.008602823 1098.049    97</span></span></span></code></pre></div>
</figure>
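<p>The idea behind an expected rank can be sketched in a few lines of base R. This is only an illustration under a normal approximation, not the code <code>expectedRank</code> runs internally: the <code>est</code> and <code>v</code> values are invented, and <code>expected_rank</code> is a hypothetical helper written for this example.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Hypothetical helper: expected rank of each group under a normal approximation.
# est = estimated group effects, v = their variances (both invented for illustration).
expected_rank &lt;- function(est, v) {
  n &lt;- length(est)
  sapply(seq_len(n), function(i) {
    # Probability that group i's effect exceeds each other group's effect
    p_greater &lt;- pnorm((est[i] - est[-i]) / sqrt(v[i] + v[-i]))
    1 + sum(p_greater)
  })
}

est &lt;- c(a = 1.2, b = 0.1, c = -0.8)
v   &lt;- c(a = 0.02, b = 0.30, c = 0.05)
er  &lt;- expected_rank(est, v)
round(100 * er / length(est))  # percentile version of the expected rank</code></pre></div>
</figure>
<p>Because each pairwise comparison is scaled by the combined variance, a group with a large but noisy estimate is pulled toward the middle of the ranking, while a precisely estimated group keeps a rank close to its raw position.</p>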
<h2 id="effect-simulation">
  <span class="heading-mark">Effect Simulation</span>
  <a class="heading-anchor" href="#effect-simulation" aria-label="Link to this section">#</a>
</h2>
<p>It can still be difficult to interpret the results of LMM and GLMM models, especially the relative influence of varying parameters on the predicted outcome. This is where the <code>REimpact</code> and the <code>wiggle</code> functions in <code>merTools</code> can be handy.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">impSim</span> <span class="o">&lt;-</span> <span class="nf">REimpact</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">InstEval[7</span><span class="p">,</span> <span class="n">]</span><span class="p">,</span> <span class="n">groupFctr</span> <span class="o">=</span> <span class="s">&#34;d&#34;</span><span class="p">,</span> <span class="n">breaks</span> <span class="o">=</span> <span class="m">5</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                   <span class="n">n.sims</span> <span class="o">=</span> <span class="m">300</span><span class="p">,</span> <span class="n">level</span> <span class="o">=</span> <span class="m">0.9</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">impSim</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;   case bin   AvgFit     AvgFitSE nobs</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 1    1   1 2.787033 2.801368e-04  193</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 2    1   2 3.260565 5.389196e-05  240</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3    1   3 3.561137 5.976653e-05  254</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 4    1   4 3.840941 6.266748e-05  265</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 5    1   5 4.235376 1.881360e-04  176</span></span></span></code></pre></div>
</figure>
<p>The result of <code>REimpact</code> shows the change in the <code>yhat</code> as the case we supplied to <code>newdata</code> is moved from the first to the fifth quintile in terms of the magnitude of the group factor coefficient. We can see here that the individual professor effect has a strong impact on the outcome variable. This can be shown graphically as well:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">impSim</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="nf">factor</span><span class="p">(</span><span class="n">bin</span><span class="p">),</span> <span class="n">y</span> <span class="o">=</span> <span class="n">AvgFit</span><span class="p">,</span> <span class="n">ymin</span> <span class="o">=</span> <span class="n">AvgFit</span> <span class="o">-</span> <span class="m">1.96</span><span class="o">*</span><span class="n">AvgFitSE</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                   <span class="n">ymax</span> <span class="o">=</span> <span class="n">AvgFit</span> <span class="o">+</span> <span class="m">1.96</span><span class="o">*</span><span class="n">AvgFitSE</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">  <span class="nf">geom_pointrange</span><span class="p">()</span> <span class="o">+</span> <span class="nf">theme_bw</span><span class="p">()</span> <span class="o">+</span> <span class="nf">labs</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="s">&#34;Bin of `d` term&#34;</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="s">&#34;Predicted Fit&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="/posts/announcing-mertools/plot-6_hu_3261964aec56854b.webp"
             srcset="/posts/announcing-mertools/plot-6_hu_3261964aec56854b.webp 760w, /posts/announcing-mertools/plot-6_hu_a6bd1a2c3be2a0f1.webp 1344w"
             sizes="(max-width: 800px) 100vw, 760px"
             alt="Scatterplot of predicted fit rising steadily from about 2.8 to 4.2 across five bins of the d grouping term." width="1344" height="960" loading="lazy" decoding="async"></div></figure>
</p>
<p>Here the standard error is a bit different – it is the weighted standard error of the mean effect within the bin. It does not take into account the variability within the effects of each observation in the bin – accounting for this variation will be a future addition to <code>merTools</code>.</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Update 1: Package is now available on CRAN . Update 2: Development version also supports models from the blme package. By far the most popular content on this blog are my two tutorials on how to fit and use multilevel models in R. Since publishing those tutorials I have received numerous questions, comments, and hits to this blog looking for more information about multilevel models in R. Since my day job involves fitting and exploring multilevel models, as well as explaining them to a non-technical audience, I began working with my colleague Carl Frederick on an R package to make these tasks easier. Today, I’m happy to announce that our solution, merTools , is now available. Below, I reproduce the package README file, but you can find out more on GitHub. There are two extensive vignettes that describe how to make use of the package, as well as a shiny app that allows interactive model exploration. The package should be available on CRAN within the next few days.",
  "og_image": "https://jaredknowles.com/og/posts/announcing-mertools.png",
  "og_title": "Explore multilevel models faster with the new merTools R package"
}
</posse:post></entry><entry><title>Version 0.9.0 of eeptools released!</title><link href="https://jaredknowles.com/posts/version-090-of-eeptools-released/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/version-090-of-eeptools-released/</id><published>2015-09-22T17:09:54Z</published><updated>2015-09-22T17:09:54Z</updated><category term="R"/><summary>A long overdue overhaul of my eeptools package for R was released to CRAN today and should be showing up in the mirrors soon. The release notes for this version are extensive as this represents a modernization of the package infrastructure and the reimagining of many of the utility functions contained in the package. From the release notes: This is a major update including removing little used functions and renaming and restructuring functions.</summary><content type="html"><![CDATA[<p>A long overdue overhaul of my <strong>eeptools</strong> package for R was released to CRAN today and should be showing up in the mirrors soon. The release notes for this version are extensive as this represents a modernization of the package infrastructure and the reimagining of many of the utility functions contained in the package. From the release notes:</p>
<p>This is a major update including removing little used functions and renaming
and restructuring functions.</p>
<h3 id="new-functionality">
  <span class="heading-mark">New Functionality</span>
  <a class="heading-anchor" href="#new-functionality" aria-label="Link to this section">#</a>
</h3>
<ul>
<li>A new package vignette is now included</li>
<li><code>nth_max</code> function for finding the <code>nth</code> highest value in a vector</li>
<li><code>retained_calc</code> now accepts user specified values for <code>sid</code> and <code>grade</code></li>
<li><code>destring</code> function deprecated and renamed to <code>makenum</code> to better reflect the
use of the function</li>
<li><code>crosstabs</code> function exported to allow the user to generate the data behind
<code>crosstabplot</code> but not draw the plot</li>
</ul>
<h3 id="deprecated">
  <span class="heading-mark">Deprecated</span>
  <a class="heading-anchor" href="#deprecated" aria-label="Link to this section">#</a>
</h3>
<ul>
<li><code>dropbox_source</code> deprecated, use the <code>rdrop2</code> package</li>
<li><code>plotForWord</code> function deprecated in favor of packages like <code>knitr</code> and <code>rmarkdown</code></li>
<li><code>mapmerge2</code> has been deprecated in favor of a tested <code>mapmerge</code></li>
<li><code>mosaictabs.labels</code> has been deprecated in favor of <code>crosstabplot</code></li>
</ul>
<h3 id="bug-fixes">
  <span class="heading-mark">Bug Fixes</span>
  <a class="heading-anchor" href="#bug-fixes" aria-label="Link to this section">#</a>
</h3>
<ul>
<li><code>nsims</code> in <code>gelmansim</code> was renamed to <code>n.sims</code> to align with the <code>arm</code> package</li>
<li>Fixed bug in <code>retained_calc</code> where user specified <code>sid</code> resulted in wrong
ids being returned</li>
<li>Inserted a meaningful error in <code>age_calc</code> when the enddate is before the date
of birth</li>
<li>Fixed issue with <code>age_calc</code> which led to wrong fraction of age during leap
years</li>
<li><code>lag_data</code> now can do leads and lags and includes proper error messages</li>
<li>fix major bugs for <code>statamode</code> including faulty default to method and returning
objects of the wrong class</li>
<li>add unit tests and continuous integration support for better package updating</li>
<li>fix behavior of <code>max_mis</code> in cases when it is passed an empty vector or a
vector of NA</li>
<li><code>leading_zero</code> function made robust to negative values</li>
<li>added NA handling options to <code>cutoff</code> and <code>thresh</code></li>
<li>Codebase is now tested with <code>lintr</code> to improve readability</li>
</ul>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "A long overdue overhaul of my eeptools package for R was released to CRAN today and should be showing up in the mirrors soon. The release notes for this version are extensive as this represents a modernization of the package infrastructure and the reimagining of many of the utility functions contained in the package. From the release notes: This is a major update including removing little used functions and renaming and restructuring functions.",
  "og_image": "https://jaredknowles.com/og/posts/version-090-of-eeptools-released.png",
  "og_title": "Version 0.9.0 of eeptools released!"
}
</posse:post></entry><entry><title>Announcing the caretEnsemble R package</title><link href="https://jaredknowles.com/posts/announcing-the-caretensemble-package-for-r/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/announcing-the-caretensemble-package-for-r/</id><published>2015-01-20T15:31:04Z</published><updated>2015-01-20T15:31:04Z</updated><category term="R"/><summary>Last week version 1.0 of the caretEnsemble package was released to CRAN. I have co-authored this package with Zach Mayer , who had the original idea of allowing for ensembles of train objects in the caret package. The package is designed to make it easy for the user to optimally combine models of various types together to produce a meta-model with superior fit than the sub-models. From the vignette: "caretEnsemble has 3 primary functions: caretList, caretEnsemble and caretStack. caretList is used to build lists of caret models on the same training data, with the same re-sampling parameters. caretEnsemble andcaretStack are used to create ensemble models from such lists of caret models. caretEnsemble uses greedy optimization to create a simple linear blend of models and caretStack uses a caret model to combine the outputs from several component caret models." I am excited about this package because the ensembling features in caretEnsemble are used to provide additional predictive power in the Wisconsin Dropout Early Warning System (DEWS). I’ve written about this system before , but it is a large-scale machine learning system used to provide schools with a prediction on the likely graduation of their middle grade students. It is easy to implement and provides additional predictive power for the cost of some CPU cycles.</summary><content type="html"><![CDATA[<p>Last week version 1.0 of the <a href="http://cran.r-project.org/web/packages/caretEnsemble/">caretEnsemble</a>
 package was released to CRAN. I have co-authored this package with <a href="http://moderntoolmaking.blogspot.com/">Zach Mayer</a>
, who had the original idea of allowing for ensembles of train objects in the <a href="http://caret.r-forge.r-project.org/">caret</a>
 package. The package is designed to make it easy for the user to optimally combine models of various types together to produce a meta-model with a better fit than the sub-models.</p>
<p><a href="http://cran.r-project.org/web/packages/caretEnsemble/vignettes/caretEnsemble-intro.html">From the vignette:</a>
</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="s">&#34;caretEnsemble has 3 primary functions: caretList, caretEnsemble and caretStack. caretList is used to build lists of caret models on the same training data, with the same re-sampling parameters. caretEnsemble and caretStack are used to create ensemble models from such lists of caret models. caretEnsemble uses greedy optimization to create a simple linear blend of models and caretStack uses a caret model to combine the outputs from several component caret models.&#34;</span></span></span></code></pre></div>
</figure>
<p>I am excited about this package because the ensembling features in caretEnsemble are used to provide additional predictive power in the Wisconsin Dropout Early Warning System (DEWS). I&#8217;ve <a href="http://jaredknowles.com/journal/2014/12/11/on-early-warning-systems-in-education">written about this system before</a>
, but it is <a href="http://jaredknowles.com/journal/2014/8/24/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin">a large-scale machine learning system used to provide schools with a prediction on the likely graduation</a>
 of their middle grade students. It is easy to implement and provides additional predictive power for the cost of some CPU cycles.</p>
<p>Additionally, Zach and I have worked hard to make ensembling models <em>easy</em>. For example, you can automatically build lists of models &#8211; a library of models &#8211; for ensembling using the <em>caretList</em> function. This <em>caretList</em> can then be used directly in either the <em>caretEnsemble</em> or <em>caretStack</em> mode, depending on how you want to combine the predictions from the submodels. These new caret objects also come with their own S3 methods (adding more in future releases) to allow you to interact with them and explore the results of ensembling &#8211; including <em>summary</em>, <em>print</em>, <em>plot</em>, and variable importance calculations. They also include the all-important <em>predict</em> method, allowing you to generate predictions for use elsewhere.</p>
<p>Zach has written <a href="http://cran.r-project.org/web/packages/caretEnsemble/vignettes/caretEnsemble-intro.html">a great vignette</a>
 that should give you a feel for how caretEnsemble works. And, we are actively <a href="https://github.com/zachmayer/caretEnsemble">improving caretEnsemble over on GitHub</a>
. Drop by and let us know if you find a bug, have a feature request, or want to let us know how it is working for you!</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Last week version 1.0 of the caretEnsemble package was released to CRAN. I have co-authored this package with Zach Mayer , who had the original idea of allowing for ensembles of train objects in the caret package. The package is designed to make it easy for the user to optimally combine models of various types together to produce a meta-model with superior fit than the sub-models. From the vignette: \"caretEnsemble has 3 primary functions: caretList, caretEnsemble and caretStack. caretList is used to build lists of caret models on the same training data, with the same re-sampling parameters. caretEnsemble andcaretStack are used to create ensemble models from such lists of caret models. caretEnsemble uses greedy optimization to create a simple linear blend of models and caretStack uses a caret model to combine the outputs from several component caret models.\" I am excited about this package because the ensembling features in caretEnsemble are used to provide additional predictive power in the Wisconsin Dropout Early Warning System (DEWS). I’ve written about this system before , but it is a large-scale machine learning system used to provide schools with a prediction on the likely graduation of their middle grade students. It is easy to implement and provides additional predictive power for the cost of some CPU cycles.",
  "og_image": "https://jaredknowles.com/og/posts/announcing-the-caretensemble-package-for-r.png",
  "og_title": "Announcing the caretEnsemble R package"
}
</posse:post></entry><entry><title>On Early Warning Systems in Education</title><link href="https://jaredknowles.com/posts/on-early-warning-systems-in-education/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/on-early-warning-systems-in-education/</id><published>2014-12-12T03:43:37Z</published><updated>2014-12-12T03:43:37Z</updated><category term="education"/><category term="applied models"/><category term="research"/><summary>Recently the NPR program Marketplace did a story about the rise of the use of dropout early warning systems in public schools that you can read or listen to online. I was lucky enough to be interviewed for the piece because of the role I have played in creating the Wisconsin Dropout Early Warning System . Marketplace did a great job explaining the nuances of how these systems fit into the ways schools and districts work. I wanted to use this as an opportunity just write a few thoughts about early warning systems based on my work in this area.</summary><content type="html"><![CDATA[<p>Recently the NPR program Marketplace did a story about the rise of the use of dropout early warning systems in public schools that <a href="http://www.marketplace.org/topics/education/learningcurve/using-data-head-high-school-dropouts">you can read or listen to online.</a>
 I was lucky enough to be interviewed for the piece because of the role I have played in creating the <a href="http://wise.dpi.wi.gov/wise_dashdews">Wisconsin Dropout Early Warning System</a>
. <a href="http://www.marketplace.org">Marketplace</a>
 did a great job explaining the nuances of how these systems fit into the ways schools and districts work. I wanted to use this as an opportunity to write a few thoughts about early warning systems based on my work in this area.</p>
<p>Not discussed in the story was the more wonky but important question of <strong>how</strong> these predictions are obtained. While much academic research discusses the merits of various models in terms of their ability to correctly identify students, there is not as much work discussing the choice of which system to use in application. By its nature, the problem of identifying dropouts early presents a fundamental trade-off between simplicity and accuracy. When deploying an EWS to educators in the field, then, analysts should focus not on <strong>how accurate a model is, but on whether it is accurate enough to be useful</strong> and actionable. Unfortunately, most of the research literature on early warning systems focuses on the accuracy of a specific model and not the question of sufficient accuracy.</p>
<p>Part of the reason for this focus is that each model tended to have its own definition of accuracy. A welcome and recent shift in the field to using ROC curves to measure the trade-off between false-positives and false-negatives now allows for these discussions of simple vs. complex to use a common and robust accuracy metric. (Hat tip to <a href="http://www.tc.columbia.edu/academics/index.htm?facid=ab3764#papers">Alex Bowers</a>
 for working to provide these metrics for dozens of published early warning indicators.) For example, <a href="https://ccsr.uchicago.edu/publications/looking-forward-high-school-and-college-middle-grade-indicators-readiness-chicago">a recent report by the Chicago Consortium on School Research</a>
 (CCSR) demonstrates how simple indicators such as grade 8 GPA and attendance can be used to accurately project whether a student will be on-track in grade 9 or not. Using ROC curves, the CCSR can demonstrate on a common scale how accurate these indicators are relative to other more complex indicators and make a compelling case that in Chicago Public Schools these indicators are sufficiently accurate to merit use.</p>
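<p>For readers who have not worked with ROC curves, here is a minimal base R sketch of how one is built. The data are simulated and the <code>score</code> variable is a made-up risk score, so this shows the mechanics of the metric rather than anything from the CCSR report or the Bowers paper.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated data: 1 = dropped out, 0 = graduated; score = a made-up risk score
set.seed(42)
dropout &lt;- rbinom(500, 1, 0.15)
score   &lt;- rnorm(500, mean = ifelse(dropout == 1, 1, 0))

# True- and false-positive rates when flagging students at or above each cutoff
cutoffs &lt;- sort(unique(score), decreasing = TRUE)
tpr &lt;- sapply(cutoffs, function(k) mean(score[dropout == 1] &gt;= k))
fpr &lt;- sapply(cutoffs, function(k) mean(score[dropout == 0] &gt;= k))

# Area under the ROC curve by the trapezoid rule, starting from (0, 0)
x &lt;- c(0, fpr)
y &lt;- c(0, tpr)
auc &lt;- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

plot(x, y, type = "l", xlab = "False-positive rate", ylab = "True-positive rate")
abline(0, 1, lty = 2)  # reference line for a model that guesses at random</code></pre></div>
</figure>
<p>Each point on the curve is one possible flagging rule, which is why the same plot can compare a single-indicator checklist (one point) against a model that produces a continuous score (a whole curve).</p>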
<p>However, in many cases these simple approaches will not be sufficiently accurate to merit use in decision making in schools. Many middle school indicators in the published literature have true dropout identification rates that are quite low, and false-positive rates that are quite high (<a href="http://hdl.handle.net/10022/AC:P:21258">Bowers, Sprott and Taff 2013</a>
). Furthermore, local conditions may mean that a linkage between GPA and dropout that holds in Chicago Public Schools is not nearly as predictive in another context. Additionally, though not empirically testable in most cases, many EWS indicator systems simply serve to provide a numeric account of information that is apparent to schools in other ways &#8211; that is, the indicators selected identify only &#8220;obvious&#8221; cases of students at risk of dropping out. In this case the overhead of collecting data and conducting identification using the model does not generate a payoff of new actionable information with which to intervene.</p>
<p>More complex models have begun to see use, perhaps in part to respond to the challenge of providing value added beyond simple checklist indicators. Unlike checklist or indicator systems, machine learning approaches determine the risk factors empirically from historical data. Instead of asserting that an attendance rate above 95% is necessary to be on-track to graduate, a machine learning algorithm identifies the attendance rate cutoff that best predicts successful graduation. Better still, the algorithm can do this while jointly considering several other factors. This is the approach <a href="/journal?category=education%20research">I have previously written about taking in Wisconsin</a>
, and has also been developed in <a href="https://github.com/dssg/student-early-warning">Montgomery County Public Schools</a>
 by <a href="http://dssg.io/">Data Science for Social Good fellows</a>
.</p>
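<p>To make the idea of an empirically chosen cutoff concrete, here is a toy base R sketch. The students are simulated, and a plain grid search over a single variable stands in for what a real learner would do with many predictors and out-of-sample validation; it is not the method used in Wisconsin or Montgomery County.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated students: attendance rate and whether they graduated (made-up data)
set.seed(1)
attendance &lt;- runif(1000, min = 0.70, max = 1.00)
graduated  &lt;- rbinom(1000, 1, plogis(25 * (attendance - 0.85)))

# Try a grid of cutoffs and score each one by classification accuracy
candidates &lt;- seq(0.75, 0.98, by = 0.01)
accuracy &lt;- sapply(candidates, function(k) {
  mean((attendance &gt;= k) == (graduated == 1))
})

# The cutoff the data support, rather than one asserted in advance
best_cutoff &lt;- candidates[which.max(accuracy)]
best_cutoff</code></pre></div>
</figure>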
<p>In fact, the machine learning model is much more flexible than a checklist approach. Once you have moved away from the desire to provide simple indicators that can be applied by users on the fly, and are willing to deliver analytics much like another piece of data, the sky is the limit. Perhaps the biggest advantage to users is that machine learning approaches allow analysts to help schools understand the degree of student risk. Instead of providing a simple yes or no indicator, these approaches can assign probabilities to student completion, allowing the school to use this information to decide on the appropriate level of response.</p>
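<p>As a small illustration of probabilities versus a yes or no flag, the sketch below fits a logistic regression to simulated data and sorts students into risk tiers. The variables, coefficients, and tier boundaries are all assumptions chosen for the example, and a logistic regression is used only because it is the simplest model that returns a probability.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated students with two made-up predictors
set.seed(2)
students &lt;- data.frame(
  attendance = runif(800, 0.70, 1.00),
  gpa        = runif(800, 0, 4)
)
students$graduated &lt;- rbinom(
  800, 1, plogis(-14 + 14 * students$attendance + 0.8 * students$gpa)
)

# Fit the model and predict a graduation probability for each student
fit &lt;- glm(graduated ~ attendance + gpa, data = students, family = binomial)
students$p_grad &lt;- predict(fit, type = "response")

# Risk tiers with arbitrary boundaries, chosen only for illustration
students$risk &lt;- cut(students$p_grad, breaks = c(0, 0.5, 0.8, 1),
                     labels = c("high", "moderate", "low"),
                     include.lowest = TRUE)
table(students$risk)</code></pre></div>
</figure>
<p>A school could then match the intensity of its response to the tier instead of treating every flagged student the same way.</p>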
<p>This concept of degree is important because not all dropouts are simply the lowest performing students in their respective classes. While low performing students do represent a majority of dropouts in many schools, these students are often already identified and being served because of their low-performance. A true early warning system, then, should seek to identify both students who are already identified by schools and those students who are likely non-completers, but who may not already be receiving intervention services. To live up to their name, early warning systems should identify students earlier than after they have started showing acute signs of low performance or disengagement in school. This is where the most value can be delivered to schools.</p>
<p>Despite the improvements possible with a machine learning approach, a lot of work remains to be done. One issue raised in the Marketplace story is understanding how schools put this information to work. An EWS alone will not improve outcomes for students &#8211; it only gives schools more time to make changes. There has not been much research on how schools use information like an early warning system to make decisions about students. More work is needed to understand how schools as organizations respond to analytics like early warning indicators. What are their misconceptions? How do they work together? What are the barriers to trusting these more complex calculations and the data that underlie them?</p>
<p>The drawback of the machine learning approach, as the authors of the CCSR report note, is that the results are not intuitive to school staff and this makes the resulting intervention strategy seem less clear. This trade-off strikes at the heart of the changing ways in which data analysis is assisting humans in making decisions. The lack of transparency in the approach must be balanced by an effort on the part of the analysts providing the prediction to communicate the results. Communication can make the results easier to interpret, can build trust in the underlying data, and build capacity within organizations to create the feedback loops necessary to sustain the system. Analysts must actively seek out feedback on the performance of the model, learn where users are struggling to understand it, and where users are finding it clash with their own observations. This is a critical piece in ensuring that the trade-off in complexity does not undermine the usefulness of the entire system.</p>
<p>EWS work represents just the beginning for meaningful analytics to replace the deluge of data in K-12 schools. Schools don&#8217;t need more data, they need actionable information that reduces the time not spent on instruction and student services. Analysts don&#8217;t need more student data, they need meaningful feedback loops with educators who are tasked with interpreting these analyses and applying the interventions to drive real change. As more work is done to integrate machine learning and eased data collection into the school system, much more work must be done to understand the interface between school organizations, individual educators, and analytics. <strong>Analysts and educators must work together to continually refine what information schools and teachers need to be successful and how best to deliver that information in an easy to use fashion at the right time.</strong></p>
<h3 id="further-reading">
  <span class="heading-mark">Further Reading</span>
  <a class="heading-anchor" href="#further-reading" aria-label="Link to this section">#</a>
</h3>
<p>Read about the machine learning approach applied in <a href="https://github.com/dssg/student-early-warning">Montgomery County Public Schools</a>
.</p>
<p>Learn about the ROC metric and how various early warning indicators have performed relative to one another <a href="http://hdl.handle.net/10022/AC:P:21258">in this paper by Bowers, Sprott, and Taff.</a>
</p>
<p>Learn about the Wisconsin DEWS machine learning system <a href="http://figshare.com/articles/Of_Needles_and_Haystacks_Building_an_Accurate_Statewide_Dropout_Early_Warning_System_in_Wisconsin/1142580">and how it was developed</a>
.</p>
<p>Read the comparison of <a href="https://ccsr.uchicago.edu/publications/looking-forward-high-school-and-college-middle-grade-indicators-readiness-chicago">many early warning indicators and their performance within Chicago Public Schools.</a>
</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Recently the NPR program Marketplace did a story about the rise of the use of dropout early warning systems in public schools that you can read or listen to online. I was lucky enough to be interviewed for the piece because of the role I have played in creating the Wisconsin Dropout Early Warning System . Marketplace did a great job explaining the nuances of how these systems fit into the ways schools and districts work. I wanted to use this as an opportunity just write a few thoughts about early warning systems based on my work in this area.",
  "og_image": "https://jaredknowles.com/og/posts/on-early-warning-systems-in-education.png",
  "og_title": "On Early Warning Systems in Education"
}
</posse:post></entry><entry><title>Launching DATA-COPE</title><link href="https://jaredknowles.com/posts/launching-data-cope/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/launching-data-cope/</id><published>2014-08-28T21:25:52Z</published><updated>2014-08-28T21:25:52Z</updated><category term="education"/><category term="data-cope"/><category term="education research"/><summary>Really excited to launch my new website - DATA-COPE , a place for education data analysts to share ideas, learn about the latest tools and policies affecting their work, and to keep the pulse on education analytics and the role they play in improving education outcomes. The group is a loosely organized affiliation of state and local education analysts in the United States as well as external researchers at research organizations which provides support to such agencies. The group’s aim is to better learn from one another, share resources, and keep the pulse on any policy or technology related developments that may significantly impact our shared work.</summary><content type="html"><![CDATA[<p>Really excited to launch my new website  - <a href="http://www.datacope.org">DATA-COPE</a>
, a place for education data analysts to share ideas, learn about the latest tools and policies affecting their work, and keep a pulse on education analytics and the role they play in improving education outcomes. The group is a loosely organized affiliation of state and local education analysts in the United States, as well as external researchers at research organizations that provide support to such agencies. The group&#8217;s aim is to learn from one another, share resources, and stay current on any policy or technology developments that may significantly affect our shared work.</p>
<p><a href="http://www.knowles.synology.me/DATA-COPE/2014/08/choosing-the-right-analytic-tools-in-your-agency/">My first major post</a>
 on the website covers selecting an analytics platform and software suite to best meet the needs of your agency. Spoiler alert, I&#8217;m a big fan of R!</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Really excited to launch my new website - DATA-COPE , a place for education data analysts to share ideas, learn about the latest tools and policies affecting their work, and to keep the pulse on education analytics and the role they play in improving education outcomes. The group is a loosely organized affiliation of state and local education analysts in the United States as well as external researchers at research organizations which provides support to such agencies. The group’s aim is to better learn from one another, share resources, and keep the pulse on any policy or technology related developments that may significantly impact our shared work.",
  "og_image": "https://jaredknowles.com/og/posts/launching-data-cope.png",
  "og_title": "Launching DATA-COPE"
}
</posse:post></entry><entry><title>Of Needles and Haystacks: Building an Accurate Statewide Dropout Early Warning System in Wisconsin</title><link href="https://jaredknowles.com/posts/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin/</id><published>2014-08-25T15:35:02Z</published><updated>2014-08-25T15:35:02Z</updated><category term="education"/><category term="early warning systems"/><category term="applied models"/><category term="R"/><category term="research"/><summary>For the past two years I have been working on the Wisconsin Dropout Early Warning System, a predictive model of on time high school graduation for students in grades 6-9 in Wisconsin. The goal of this project is to help schools and educators have an early indication of the likely graduation of each of their students, early enough to allow time for individualized intervention. The result is that nearly 225,000 students receive an individualized prediction at the start and end of the school year. The workflow for the system is mapped out in the diagram below:</summary><content type="html"><![CDATA[<p>For the past two years I have been working on the Wisconsin Dropout Early Warning System, a predictive model of on time high school graduation for students in grades 6-9 in Wisconsin. The goal of this project is to help schools and educators have an early indication of the likely graduation of each of their students, early enough to allow time for individualized intervention. The result is that nearly 225,000 students receive an individualized prediction at the start and end of the school year. The workflow for the system is mapped out in the diagram below:</p>
<p>
<figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin-01-DEWS_workflow_diagram.png" alt="DEWS_workflow_diagram.png" loading="lazy" decoding="async"></div></figure>
</p>
<p>The system is moving into its second year of use this fall and I recently completed a research paper describing the predictive analytic approach taken within DEWS. The research paper is intended to serve as a description and guide of the decisions made in developing an automated prediction system using administrative data. The paper covers both the data preparation and model building process as well as a review of the results. A preview is shown below which demonstrates how the EWS models trained in Wisconsin compare to the accuracy reported in the research literature - represented by the points on the graph. The accuracy is measured using the ROC curve. The article <a href="http://figshare.com/articles/Of_Needles_and_Haystacks_Building_an_Accurate_Statewide_Dropout_Early_Warning_System_in_Wisconsin/1142580">is now available via figshare.</a>
</p>
<p>
<figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin-02-DEWS_caret_model_comparison.jpg" alt="DEWS_caret_model_comparison" loading="lazy" decoding="async"></div></figure>
</p>
<p>The colored lines represent different types of ensembled statistical models and their accuracy across various thresholds of their predicted probabilities. The points represent the accuracy of comparable models in the research literature using reported accuracy from a paper by <a href="http://www.tc.columbia.edu/academics/?facid=ab3764#vitae">Alex Bowers</a>
:</p>
<p>Bowers, A.J., Sprott, R.*, Taff, S.A.* (2013) Do we Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The High School Journal, 96(2), 77-100. <a href="http://muse.jhu.edu/journals/high_school_journal/v096/96.2.bowers.html">doi:10.1353/hsj.2013.0000</a>.
This article provides good background, grounds the benchmarking of the models built in Wisconsin, and gives others a reference point when benchmarking their own models.</p>
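<p>To make the idea of accuracy across thresholds concrete, here is a minimal base R sketch of how an ROC curve and its area can be computed. Everything in it is an assumption for illustration: the outcomes and predicted probabilities are simulated, the variable names are made up, and none of it is DEWS data or the code used in the paper.</p>

```r
# Simulated data only: these are NOT DEWS predictions or results.
set.seed(42)
n <- 500
# Hypothetical outcome coding: 1 = graduated on time, 0 = did not
outcome <- rbinom(n, size = 1, prob = 0.85)
# Hypothetical predicted probabilities, shifted upward for graduates
score <- plogis(rnorm(n, mean = ifelse(outcome == 1, 1.5, 0)))

# Sweep thresholds from high to low; predict "positive" when score >= threshold
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE), -Inf)
tpr <- sapply(thresholds, function(thr) mean(score[outcome == 1] >= thr))
fpr <- sapply(thresholds, function(thr) mean(score[outcome == 0] >= thr))

# Area under the ROC curve by the trapezoid rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

<p>Each threshold gives one point (false alarm rate, true positive rate), which is how a single reported indicator from the literature can be placed on the same axes as a full curve. Calling <code>plot(fpr, tpr, type = "l")</code> draws the curve.</p>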
<p>Article Abstract:</p>
<p>The state of Wisconsin has one of the highest four year graduation rates in the nation, but deep disparities among student subgroups remain. To address this the state has created the Wisconsin Dropout Early Warning System (DEWS), a predictive model of student dropout risk for students in grades six through nine. The Wisconsin DEWS is in use statewide and currently provides predictions on the likelihood of graduation for over 225,000 students. DEWS represents a novel statistical learning based approach to the challenge of assessing the risk of non-graduation for students and provides highly accurate predictions for students in the middle grades without expanding beyond mandated administrative data collections.</p>
<p>Similar dropout early warning systems are in place in many jurisdictions across the country. Prior research has shown that in many cases the indicators used by such systems do a poor job of balancing the trade off between correct classification of likely dropouts and false-alarm (Bowers et al., 2013). Building on this work, DEWS uses the receiver-operating characteristic (ROC) metric to identify the best possible set of statistical models for making predictions about individual students.</p>
<p>This paper describes the DEWS approach and the software behind it, which leverages the open source statistical language R (R Core Team, 2013). As a result DEWS is a flexible series of software modules that can adapt to new data, new algorithms, and new outcome variables to not only predict dropout, but also impute key predictors as well. The design and implementation of each of these modules is described in detail as well as the open-source R package, EWStools, that serves as the core of DEWS (Knowles, 2014).</p>
<p>Code:</p>
<p>The code that powers the EWS is an open source R extension of the caret package which is available on GitHub: <a href="http://www.github.com/jknowles/EWStools">EWStools on GitHub</a>
</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "For the past two years I have been working on the Wisconsin Dropout Early Warning System, a predictive model of on time high school graduation for students in grades 6-9 in Wisconsin. The goal of this project is to help schools and educators have an early indication of the likely graduation of each of their students, early enough to allow time for individualized intervention. The result is that nearly 225,000 students receive an individualized prediction at the start and end of the school year. The workflow for the system is mapped out in the diagram below:",
  "og_image": "https://jaredknowles.com/og/posts/of-needles-and-haystacks-building-an-accurate-statewide-dropout-early-warning-system-in-wisconsin.png",
  "og_title": "Of Needles and Haystacks: Building an Accurate Statewide Dropout Early Warning System in Wisconsin"
}
</posse:post></entry><entry><title>Mixed Effects Tutorial 2: Fun with merMod Objects</title><link href="https://jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/</id><published>2014-05-17T22:38:18Z</published><updated>2014-05-17T22:38:18Z</updated><category term="R"/><summary>Update: Since this post was released I have co-authored an R package to make some of the items in this post easier to do. This package is called merTools and is available on CRAN and on GitHub. To read more about it, read my new post here and check out the package on GitHub. Introduction # First of all, be warned, the terminology surrounding multilevel models is vastly inconsistent. For example, multilevel models themselves may be referred to as hierarchical linear models, random effects models, multilevel models, random intercept models, random slope models, or pooling models. Depending on the discipline, software used, and the academic literature many of these terms may be referring to the same general modeling strategy. In this tutorial I will attempt to provide a user guide to multilevel modeling by demonstrating how to fit multilevel models in R and by attempting to connect the model fitting procedure to commonly used terminology used regarding these models.</summary><content type="html"><![CDATA[<p><strong>Update</strong>: Since this post was released I have co-authored an R package to make some of the items in this post easier to do. This package is called merTools and is available on CRAN and on GitHub. To read more about it, read <a href="http://jaredknowles.com/journal/2015/8/12/announcing-mertools">my new post</a>
<a href="https://www.civilytics.com/posts/2015/explore-multilevel-models-faster-with-the-new-mertools-r-package/">here</a>
 and check out the package <a href="http://www.github.com/jknowles/merTools">on GitHub</a>.</p>
<h2 id="introduction">
  <span class="heading-mark">Introduction</span>
  <a class="heading-anchor" href="#introduction" aria-label="Link to this section">#</a>
</h2>
<p>First of all, be warned: the terminology surrounding multilevel models is vastly inconsistent. For
example, multilevel models may be referred to as hierarchical linear models, random
effects models, random intercept models, random slope models, or pooling models.
Depending on the discipline, the software used, and the academic literature, many of these terms may
refer to the same general modeling strategy. In this tutorial I will attempt to provide a user
guide to multilevel modeling by demonstrating how to fit multilevel models in R and by
connecting the model fitting procedure to the terminology commonly used for these models.</p>
<p>We will cover the following topics:</p>
<ul>
<li>The structure and methods of <code>merMod</code> objects</li>
<li>Extracting random effects of <code>merMod</code> objects</li>
<li>Plotting and interpreting <code>merMod</code> objects</li>
</ul>
<p>If you haven&#8217;t already, make sure you head over to the <a href="https://www.civilytics.com/posts/2013/getting-started-with-mixed-effect-models-in-r/">Getting Started With Multilevel Models
tutorial</a>

in order to ensure you have set up your environment correctly and installed all the necessary
packages. The tl;dr is that you will need:</p>
<ul>
<li>A current version of R (2.15 or greater)</li>
<li>The <code>lme4</code> package (<code>install.packages(&quot;lme4&quot;)</code>)</li>
</ul>
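<p>A quick way to confirm these requirements is sketched below. The package list is an assumption based on the code in this tutorial, which also loads <code>arm</code> and <code>lattice</code> in addition to <code>lme4</code>; the variable names are made up for illustration.</p>

```r
# Minimal environment check; the package list is taken from the code in this tutorial.
r_ok <- getRversion() >= "2.15.0"

pkgs <- c("lme4", "arm", "lattice")
# requireNamespace() returns TRUE or FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Install with: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}
```

<p>If <code>r_ok</code> is <code>TRUE</code> and no message is printed, the environment should be ready for the rest of the tutorial.</p>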
<h2 id="read-in-the-data">
  <span class="heading-mark">Read in the data</span>
  <a class="heading-anchor" href="#read-in-the-data" aria-label="Link to this section">#</a>
</h2>
<p>Multilevel models are appropriate for a particular kind of data structure where units are nested
within groups (generally 5+ groups) and where we want to model the group structure of the data. For
our introductory example we will start with a simple example from the <code>lme4</code> documentation and
explain what the model is doing. We will use data from Jon Starkweather at the <a href="http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/">University of North
Texas</a>
. Visit the excellent tutorial
<a href="http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/Benchmarks/LinearMixedModels_JDS_Dec2010.pdf">available here for
more.</a>
</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">lme4</span><span class="p">)</span> <span class="c1"># load library</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">arm</span><span class="p">)</span> <span class="c1"># convenience functions for regression in R</span>
</span></span><span class="line"><span class="cl"><span class="n">lmm.data</span> <span class="o">&lt;-</span> <span class="nf">read.table</span><span class="p">(</span><span class="s">&#34;http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module9/lmm.data.txt&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                       <span class="n">header</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">&#34;,&#34;</span><span class="p">,</span> <span class="n">na.strings</span><span class="o">=</span><span class="s">&#34;NA&#34;</span><span class="p">,</span> <span class="n">dec</span><span class="o">=</span><span class="s">&#34;.&#34;</span><span class="p">,</span> <span class="n">strip.white</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#summary(lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   id    extro     open    agree    social class school</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1  1 63.69356 43.43306 38.02668  75.05811     d     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2  2 69.48244 46.86979 31.48957  98.12560     a     VI</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 3  3 79.74006 32.27013 40.20866 116.33897     d     VI</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 4  4 62.96674 44.40790 30.50866  90.46888     c     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 5  5 64.24582 36.86337 37.43949  98.51873     d     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 6  6 50.97107 46.25627 38.83196  75.21992     d      I</span></span></span></code></pre></div>
</figure>
<p>Here we have data on the extroversion of subjects nested within classes and within schools.</p>
<p>Let&#8217;s understand the structure of the data a bit before we begin:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">str</span><span class="p">(</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## &#39;data.frame&#39;:    1200 obs. of  7 variables:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  $ id    : int  1 2 3 4 5 6 7 8 9 10 ...</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  $ extro : num  63.7 69.5 79.7 63 64.2 ...</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  $ open  : num  43.4 46.9 32.3 44.4 36.9 ...</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  $ agree : num  38 31.5 40.2 30.5 37.4 ...</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  $ social: num  75.1 98.1 116.3 90.5 98.5 ...</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  $ class : Factor w/ 4 levels &#34;a&#34;,&#34;b&#34;,&#34;c&#34;,&#34;d&#34;: 4 1 4 3 4 4 4 4 1 2 ...</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  $ school: Factor w/ 6 levels &#34;I&#34;,&#34;II&#34;,&#34;III&#34;,..: 4 6 6 4 4 1 3 4 3 1 ...</span></span></span></code></pre></div>
</figure>
<p>Here we see we have two possible grouping variables &#8211; <code>class</code> and <code>school</code>. Let&#8217;s
explore them a bit further:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">table</span><span class="p">(</span><span class="n">lmm.data</span><span class="o">$</span><span class="n">class</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   a   b   c   d </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 300 300 300 300</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">table</span><span class="p">(</span><span class="n">lmm.data</span><span class="o">$</span><span class="n">school</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   I  II III  IV   V  VI </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 200 200 200 200 200 200</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">table</span><span class="p">(</span><span class="n">lmm.data</span><span class="o">$</span><span class="n">class</span><span class="p">,</span> <span class="n">lmm.data</span><span class="o">$</span><span class="n">school</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##    </span>
</span></span><span class="line"><span class="cl"><span class="c1">##      I II III IV  V VI</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   a 50 50  50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   b 50 50  50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   c 50 50  50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   d 50 50  50 50 50 50</span></span></span></code></pre></div>
</figure>
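<p>The cross-tabulation above can also be checked programmatically. The sketch below is a minimal example that assumes a stand-in data frame, <code>toy</code>, with the same 6 schools by 4 classes by 50 students design as <code>lmm.data</code>; only the grouping columns are reproduced and no measurements are simulated.</p>

```r
# Stand-in for lmm.data: same design, grouping columns only (hypothetical object).
toy <- expand.grid(student = 1:50,
                   class = c("a", "b", "c", "d"),
                   school = c("I", "II", "III", "IV", "V", "VI"),
                   stringsAsFactors = TRUE)

cell_counts <- table(toy$class, toy$school)

# Balanced means every class-by-school cell holds the same number of rows
is_balanced <- length(unique(as.vector(cell_counts))) == 1
is_balanced
```

<p>The same check should work on <code>lmm.data</code> by swapping in its columns. Note that since R 4.0.0 <code>read.table()</code> no longer converts strings to factors by default, so <code>str()</code> may report <code>class</code> and <code>school</code> as character columns; passing <code>stringsAsFactors = TRUE</code> or calling <code>factor()</code> on them restores the structure shown above.</p>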
<p>This is a perfectly balanced dataset. In all likelihood you aren&#8217;t working with a perfectly balanced
dataset, but we&#8217;ll explore the implications of that in the future. For now, let&#8217;s plot the data a
bit. Using the excellent <code>xyplot</code> function in the <code>lattice</code> package, we can explore the relationship
between schools and classes across our variables.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">lattice</span><span class="p">)</span> <span class="c1"># provides xyplot(); library() errors if the package is missing</span>
</span></span><span class="line"><span class="cl"><span class="nf">xyplot</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">|</span> <span class="n">class</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">lmm.data</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                 <span class="n">auto.key</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="m">.85</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="m">.035</span><span class="p">,</span> <span class="n">corner</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">)),</span> 
</span></span><span class="line"><span class="cl">       <span class="n">layout</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span><span class="m">1</span><span class="p">),</span> <span class="n">main</span> <span class="o">=</span> <span class="s">&#34;Extroversion by Class&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/mixed-effects-tutorial-2-fun-with-mermod-objects-01-xyplot1-1.svg" alt="" loading="lazy" decoding="async"></div></figure>
</p>
<p>Here we see that within classes there are clear stratifications and we also see that the <code>social</code>
variable is strongly distinct from the <code>open</code> and <code>agree</code> variables. We also see that class <code>a</code> and
class <code>d</code> have significantly more spread in their lowest and highest bands respectively. Let&#8217;s next
plot the data by <code>school</code>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">xyplot</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">|</span> <span class="n">school</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">lmm.data</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                 <span class="n">auto.key</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="m">.85</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="m">.035</span><span class="p">,</span> <span class="n">corner</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">)),</span> 
</span></span><span class="line"><span class="cl">       <span class="n">layout</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span> <span class="n">main</span> <span class="o">=</span> <span class="s">&#34;Extroversion by School&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/mixed-effects-tutorial-2-fun-with-mermod-objects-02-bycaseplot-1.svg" alt="" loading="lazy" decoding="async"></div></figure>
</p>
<p>By school we see that students are tightly grouped, but that school <code>I</code> and school <code>VI</code> show
substantially more dispersion than the other schools. The same pattern among our predictors holds
between schools as it did between classes. Let&#8217;s put it all together:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">xyplot</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">|</span> <span class="n">school</span> <span class="o">+</span> <span class="n">class</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">lmm.data</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                 <span class="n">auto.key</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="m">.85</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="m">.035</span><span class="p">,</span> <span class="n">corner</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">)),</span> 
</span></span><span class="line"><span class="cl">       <span class="n">main</span> <span class="o">=</span> <span class="s">&#34;Extroversion by School and Class&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/mixed-effects-tutorial-2-fun-with-mermod-objects-03-xyplot3-1.svg" alt="" loading="lazy" decoding="async"></div></figure>
</p>
<p>Here we can see that school and class seem to closely differentiate the relationship between our
predictors and extroversion.</p>
<h2 id="exploring-the-internals-of-a-mermod-object">
  <span class="heading-mark">Exploring the Internals of a merMod Object</span>
  <a class="heading-anchor" href="#exploring-the-internals-of-a-mermod-object" aria-label="Link to this section">#</a>
</h2>
<p>In the last tutorial we fit a series of random intercept models to our nested data. We will examine
the <code>lmerMod</code> object produced when we fit such a model in much more depth in order to understand how
to work with mixed effect models in R. We start by fitting the basic example below, grouped by
school:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp1</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">school</span><span class="p">),</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">class</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] &#34;lmerMod&#34;</span>
</span></span><span class="line"><span class="cl"><span class="c1">## attr(,&#34;package&#34;)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] &#34;lme4&#34;</span></span></span></code></pre></div>
</figure>
<p>First, we see that <code>MLexamp1</code> is now an R object of the class <code>lmerMod</code>. This <code>lmerMod</code> object is an
instance of an <strong>S4</strong> class, and to explore its structure, we use <code>slotNames</code>:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">slotNames</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [1] &#34;resp&#34;    &#34;Gp&#34;      &#34;call&#34;    &#34;frame&#34;   &#34;flist&#34;   &#34;cnms&#34;    &#34;lower&#34;   &#34;theta&#34;   &#34;beta&#34;   </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [10] &#34;u&#34;       &#34;devcomp&#34; &#34;pp&#34;      &#34;optinfo&#34;</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1">#showMethods(classes=&#34;lmerMod&#34;)</span></span></span></code></pre></div>
</figure>
<p>Within the <code>lmerMod</code> object we see a number of slots that we may wish to explore. To look at any
of these, we can simply type <code>MLexamp1@</code> followed by the slot name. For example:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp1</span><span class="o">@</span><span class="n">call</span> <span class="c1"># returns the model call</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lmer(formula = extro ~ open + agree + social + (1 | school), </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     data = lmm.data)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp1</span><span class="o">@</span><span class="n">beta</span> <span class="c1"># returns the betas</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] 59.116514199  0.009750941  0.027788360 -0.002151446</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">class</span><span class="p">(</span><span class="n">MLexamp1</span><span class="o">@</span><span class="n">frame</span><span class="p">)</span> <span class="c1"># returns the class for the frame slot</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] &#34;data.frame&#34;</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">MLexamp1</span><span class="o">@</span><span class="n">frame</span><span class="p">)</span> <span class="c1"># returns the model frame</span>
</span></span><span class="line"><span class="cl"><span class="c1">##      extro     open    agree    social school</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1 63.69356 43.43306 38.02668  75.05811     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2 69.48244 46.86979 31.48957  98.12560     VI</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 3 79.74006 32.27013 40.20866 116.33897     VI</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 4 62.96674 44.40790 30.50866  90.46888     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 5 64.24582 36.86337 37.43949  98.51873     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 6 50.97107 46.25627 38.83196  75.21992      I</span></span></span></code></pre></div>
</figure>
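<p>Direct slot access works, but the slot layout is an internal detail of <code>lme4</code>, so the extractor
functions are usually the safer choice. As a minimal sketch, the block below uses the <code>sleepstudy</code>
data that ships with <code>lme4</code> as a stand-in for <code>lmm.data</code> (the model name <code>fm</code> is hypothetical)
and checks that the extractors return the same information as the slots shown above.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">library(lme4)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Stand-in model: sleepstudy ships with lme4; fm is a hypothetical name
</span></span><span class="line"><span class="cl">fm &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">slotNames(fm) # names of every slot in the fitted object
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># fixef() returns the values in the beta slot, with names attached
</span></span><span class="line"><span class="cl">all.equal(unname(fixef(fm)), fm@beta)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># getME() is the documented accessor for model components
</span></span><span class="line"><span class="cl">all.equal(getME(fm, &#34;beta&#34;), fm@beta)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># model.frame() returns the data stored in the frame slot
</span></span><span class="line"><span class="cl">all.equal(model.frame(fm), fm@frame)</span></span></code></pre></div>
</figure>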
<p>The <code>merMod</code> object has a number of methods available &#8211; too many to enumerate here. We will go
over some of the more common ones from the list below:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">methods</span><span class="p">(</span><span class="n">class</span> <span class="o">=</span> <span class="s">&#34;merMod&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [1] anova          as.function    coef           confint        cooks.distance deviance      </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [7] df.residual    display        drop1          extractAIC     extractDIC     family        </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [13] fitted         fixef          formula        getL           getME          hatvalues     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [19] influence      isGLMM         isLMM          isNLMM         isREML         logLik        </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [25] mcsamp         model.frame    model.matrix   ngrps          nobs           plot          </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [31] predict        print          profile        qqmath         ranef          refit         </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [37] refitML        rePCA          residuals      rstudent       se.coef        show          </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [43] sigma.hat      sigma          sim            simulate       standardize    summary       </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [49] terms          update         VarCorr        vcov           weights       </span>
</span></span><span class="line"><span class="cl"><span class="c1">## see &#39;?methods&#39; for accessing help and source code</span></span></span></code></pre></div>
</figure>
<p>A common need is to extract the fixed effects from a <code>merMod</code> object. <code>fixef</code> extracts a named
numeric vector of the fixed effects, which is handy.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">fixef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  (Intercept)         open        agree       social </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 59.116514199  0.009750941  0.027788360 -0.002151446</span></span></span></code></pre></div>
</figure>
<p>If you want to get a sense of the p-values or statistical significance of these parameters, first
consult the <code>lme4</code> help by running <code>?pvalues</code> (also reachable as <code>?mcmcsamp</code>) for a rundown of
various ways of doing this. One convenient way built into the package is:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">confint</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">,</span> <span class="n">level</span> <span class="o">=</span> <span class="m">0.99</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##                   0.5 %      99.5 %</span>
</span></span><span class="line"><span class="cl"><span class="c1">## .sig01       4.91840325 23.88757695</span>
</span></span><span class="line"><span class="cl"><span class="c1">## .sigma       2.53286648  2.81455985</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 46.27750884 71.95609747</span>
</span></span><span class="line"><span class="cl"><span class="c1">## open        -0.02464506  0.04414924</span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree       -0.01163700  0.06721354</span>
</span></span><span class="line"><span class="cl"><span class="c1">## social      -0.01492690  0.01062510</span></span></span></code></pre></div>
</figure>
<p>From here we can see first that the 99% confidence intervals for our three predictors (<code>open</code>,
<code>agree</code>, and <code>social</code>) all include 0, indicating no evidence of an effect; only the intercept is
clearly different from 0. We can also see that <code>.sig01</code>, which is our estimate of the standard
deviation of the random effect, is very large and very imprecisely estimated. This lack of precision
may arise because the differences between groups are small, because we have too few groups, because
we have too few units within each group, or because of a combination of all of the above.</p>
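<p>Checking by eye which intervals include zero gets tedious with more predictors. The sketch below
shows one way to do it programmatically. It assumes the <code>sleepstudy</code> data from <code>lme4</code> as a stand-in
model (<code>fm</code>, <code>ci</code>, and <code>excludes_zero</code> are hypothetical names) and uses Wald intervals, which are a
faster approximation than the profile intervals shown above and are only computed for the fixed
effects.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">library(lme4)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Stand-in model fitted to data that ships with lme4
</span></span><span class="line"><span class="cl">fm &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># parm = &#34;beta_&#34; restricts the output to the fixed effects
</span></span><span class="line"><span class="cl">ci &lt;- confint(fm, parm = &#34;beta_&#34;, method = &#34;Wald&#34;, level = 0.99)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># TRUE where the whole interval lies on one side of zero
</span></span><span class="line"><span class="cl">excludes_zero &lt;- ci[, 1] &gt; 0 | ci[, 2] &lt; 0
</span></span><span class="line"><span class="cl">excludes_zero</span></span></code></pre></div>
</figure>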
<p>Another common need is to extract the residual standard error, which is necessary for calculating
effect sizes. To get the residual standard error as a single numeric value:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">sigma</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] 2.670886</span></span></span></code></pre></div>
</figure>
<p>For example, it is common practice in education research to standardize fixed effects into &#8220;effect
sizes&#8221; by dividing the fixed effect parameters by the residual standard error, which can be
accomplished in <code>lme4</code> easily:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">fixef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span> <span class="o">/</span> <span class="nf">sigma</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   (Intercept)          open         agree        social </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 22.1336707437  0.0036508262  0.0104041726 -0.0008055176</span></span></span></code></pre></div>
</figure>
<p>From this, we can see that our predictors of openness, agreeableness and social are virtually
useless in predicting extroversion &#8211; as our plots showed. Let&#8217;s turn our attention to the random
effects next.</p>
<h2 id="explore-group-variation-and-random-effects">
  <span class="heading-mark">Explore Group Variation and Random Effects</span>
  <a class="heading-anchor" href="#explore-group-variation-and-random-effects" aria-label="Link to this section">#</a>
</h2>
<p>In all likelihood you fit a mixed-effect model because you are directly interested in the
group-level variation in your model. It is not immediately clear how to explore this group level
variation from the results of <code>summary.merMod</code>. What we get from this output is the variance and the
standard deviation of the group effect, but we do not get effects for individual groups. This is
where the <code>ranef</code> function comes in handy.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">ranef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## $school</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     (Intercept)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## I    -14.090991</span>
</span></span><span class="line"><span class="cl"><span class="c1">## II    -6.183368</span>
</span></span><span class="line"><span class="cl"><span class="c1">## III   -1.970700</span>
</span></span><span class="line"><span class="cl"><span class="c1">## IV     1.965938</span>
</span></span><span class="line"><span class="cl"><span class="c1">## V      6.330710</span>
</span></span><span class="line"><span class="cl"><span class="c1">## VI    13.948412</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## with conditional variances for &#34;school&#34;</span></span></span></code></pre></div>
</figure>
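<p>The values returned by <code>ranef</code> are deviations from the overall (fixed) intercept rather than the
group intercepts themselves. A minimal sketch of that relationship, again assuming the <code>sleepstudy</code>
data from <code>lme4</code> as a stand-in (the names <code>fm</code>, <code>dev</code>, and <code>group_int</code> are hypothetical):</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">library(lme4)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">fm &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># ranef() gives the deviation of each group from the overall intercept
</span></span><span class="line"><span class="cl">dev &lt;- ranef(fm)$Subject[, &#34;(Intercept)&#34;]
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Adding the fixed intercept recovers the group-specific intercepts from coef()
</span></span><span class="line"><span class="cl">group_int &lt;- fixef(fm)[[&#34;(Intercept)&#34;]] + dev
</span></span><span class="line"><span class="cl">all.equal(group_int, coef(fm)$Subject[, &#34;(Intercept)&#34;])</span></span></code></pre></div>
</figure>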
<p>Running the <code>ranef</code> function gives us the estimated deviation of each school's intercept from the
overall intercept, but not much additional information &#8211; for example the precision of these
estimates. To get that, we need some additional commands:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">re1</span> <span class="o">&lt;-</span> <span class="nf">ranef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">,</span> <span class="n">condVar</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span> <span class="c1"># save the ranef.mer object</span>
</span></span><span class="line"><span class="cl"><span class="nf">class</span><span class="p">(</span><span class="n">re1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] &#34;ranef.mer&#34;</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">attr</span><span class="p">(</span><span class="n">re1[[1]]</span><span class="p">,</span> <span class="n">which</span> <span class="o">=</span> <span class="s">&#34;postVar&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## , , 1</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           [,1]</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1,] 0.0356549</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## , , 2</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           [,1]</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1,] 0.0356549</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## , , 3</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           [,1]</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1,] 0.0356549</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## , , 4</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           [,1]</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1,] 0.0356549</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## , , 5</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           [,1]</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1,] 0.0356549</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## , , 6</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           [,1]</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1,] 0.0356549</span></span></span></code></pre></div>
</figure>
<p>The <code>ranef.mer</code> object is a list which contains a data.frame for each grouping factor. The dataframe
contains the random effects for each group (here we only have an intercept for each school). When we
ask <code>lme4</code> for the conditional variance of the random effects it is stored in an <code>attribute</code> of
those dataframes as a three-dimensional array holding one variance-covariance matrix per group.</p>
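<p>The conditional variances can be turned into standard errors by taking square roots. The block
below is a sketch under stated assumptions: it uses the <code>sleepstudy</code> data from <code>lme4</code> as a stand-in
model with a single random intercept, the object names are hypothetical, and the 1.96 multiplier
gives only an approximate 95% interval.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">library(lme4)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">fm &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
</span></span><span class="line"><span class="cl">re &lt;- ranef(fm, condVar = TRUE)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># With one random term, postVar is a 1 x 1 x n_groups array
</span></span><span class="line"><span class="cl">pv &lt;- attr(re$Subject, &#34;postVar&#34;)
</span></span><span class="line"><span class="cl">se &lt;- sqrt(pv[1, 1, ])
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">est &lt;- re$Subject[, &#34;(Intercept)&#34;]
</span></span><span class="line"><span class="cl">data.frame(
</span></span><span class="line"><span class="cl">  group = rownames(re$Subject),
</span></span><span class="line"><span class="cl">  estimate = est,
</span></span><span class="line"><span class="cl">  lower = est - 1.96 * se, # approximate 95% interval
</span></span><span class="line"><span class="cl">  upper = est + 1.96 * se
</span></span><span class="line"><span class="cl">)</span></span></code></pre></div>
</figure>
<p>With more than one random term per group, each slice of the array is a full variance-covariance
matrix and the standard errors are the square roots of its diagonal.</p>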
<p>This structure is indeed <em>complicated</em>, but it is powerful as it allows for nested, grouped, and
cross-level random effects. Also, the creators of <code>lme4</code> have provided users with some simple
shortcuts to get what they are really interested in out of a <code>ranef.mer</code> object.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">re1</span> <span class="o">&lt;-</span> <span class="nf">ranef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">,</span> <span class="n">condVar</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span> <span class="n">whichel</span> <span class="o">=</span> <span class="s">&#34;school&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">print</span><span class="p">(</span><span class="n">re1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## $school</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     (Intercept)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## I    -14.090991</span>
</span></span><span class="line"><span class="cl"><span class="c1">## II    -6.183368</span>
</span></span><span class="line"><span class="cl"><span class="c1">## III   -1.970700</span>
</span></span><span class="line"><span class="cl"><span class="c1">## IV     1.965938</span>
</span></span><span class="line"><span class="cl"><span class="c1">## V      6.330710</span>
</span></span><span class="line"><span class="cl"><span class="c1">## VI    13.948412</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## with conditional variances for &#34;school&#34;</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">dotplot</span><span class="p">(</span><span class="n">re1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## $school</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/mixed-effects-tutorial-2-fun-with-mermod-objects-04-ranef3-1.svg" alt="" loading="lazy" decoding="async"></div></figure>
</p>
<p>This graphic shows a <code>dotplot</code> of the random effect terms, also known as a caterpillar plot. Here
you can clearly see the effects of each school on <code>extroversion</code> as well as their standard errors to
help identify how distinct the random effects are from one another. Interpreting random effects is
notoriously tricky, but for assistance I would recommend looking at a few of these resources:</p>
<ul>
<li>Gelman and Hill 2006 - <a href="http://www.stat.columbia.edu/~gelman/arm/">Data Analysis Using Regression and Multilevel/Hierarchical Models</a>
</li>
<li>John Fox - An R Companion to Applied Regression <a href="http://socserv.mcmaster.ca/jfox/Books/Companion-1E/appendix-mixed-models.pdf">web appendix</a>
</li>
</ul>
<h2 id="using-simulation-and-plots-to-explore-random-effects">
  <span class="heading-mark">Using Simulation and Plots to Explore Random Effects</span>
  <a class="heading-anchor" href="#using-simulation-and-plots-to-explore-random-effects" aria-label="Link to this section">#</a>
</h2>
<p>A common econometric approach is to create what are known as <strong>empirical Bayes</strong> estimates of the
group-level terms. Unfortunately there is not much agreement about what constitutes a proper
standard error for the random effect terms or even how to consistently define <strong>empirical Bayes</strong>
estimates.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> However, in R there are a few additional ways to get estimates of the random effects
that provide the user with information about the relative sizes of the effects for each unit and the
precision in that estimate. To do this, we use the <code>sim</code> function in the <code>arm</code> package.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup></p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># A function to extract simulated estimates of random effect parameters from </span>
</span></span><span class="line"><span class="cl"><span class="c1"># lme4 objects using the sim function in arm</span>
</span></span><span class="line"><span class="cl"><span class="c1"># whichel = the character for the name of the grouping effect to extract estimates for </span>
</span></span><span class="line"><span class="cl"><span class="c1"># nsims = the number of simulations to pass to sim</span>
</span></span><span class="line"><span class="cl"><span class="c1"># x = model object</span>
</span></span><span class="line"><span class="cl"><span class="n">REsim</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">whichel</span><span class="o">=</span><span class="kc">NULL</span><span class="p">,</span> <span class="n">nsims</span><span class="p">){</span>
</span></span><span class="line"><span class="cl">  <span class="nf">require</span><span class="p">(</span><span class="n">plyr</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="n">mysim</span> <span class="o">&lt;-</span> <span class="nf">sim</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">n.sims</span> <span class="o">=</span> <span class="n">nsims</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="kr">if</span><span class="p">(</span><span class="nf">missing</span><span class="p">(</span><span class="n">whichel</span><span class="p">)){</span>
</span></span><span class="line"><span class="cl">    <span class="n">dat</span> <span class="o">&lt;-</span> <span class="n">plyr</span><span class="o">::</span><span class="nf">adply</span><span class="p">(</span><span class="n">mysim</span><span class="o">@</span><span class="n">ranef[[1]]</span><span class="p">,</span> <span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">),</span> <span class="n">plyr</span><span class="o">::</span><span class="nf">each</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">mean</span><span class="p">,</span> <span class="n">median</span><span class="p">,</span> <span class="n">sd</span><span class="p">)))</span>
</span></span><span class="line"><span class="cl">    <span class="nf">warning</span><span class="p">(</span><span class="s">&#34;Only returning 1st random effect because whichel not specified&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span> <span class="kr">else</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">dat</span> <span class="o">&lt;-</span> <span class="n">plyr</span><span class="o">::</span><span class="nf">adply</span><span class="p">(</span><span class="n">mysim</span><span class="o">@</span><span class="n">ranef[[whichel]]</span><span class="p">,</span> <span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">),</span> <span class="n">plyr</span><span class="o">::</span><span class="nf">each</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">mean</span><span class="p">,</span> <span class="n">median</span><span class="p">,</span> <span class="n">sd</span><span class="p">)))</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="kr">return</span><span class="p">(</span><span class="n">dat</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">REsim</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">,</span> <span class="n">whichel</span> <span class="o">=</span> <span class="s">&#34;school&#34;</span><span class="p">,</span> <span class="n">nsims</span> <span class="o">=</span> <span class="m">1000</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##    X1          X2       mean     median       sd</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1   I (Intercept) -14.133834 -14.157333 3.871655</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2  II (Intercept)  -6.230981  -6.258464 3.873329</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 3 III (Intercept)  -2.011822  -2.034863 3.872661</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 4  IV (Intercept)   1.915608   1.911129 3.866870</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 5   V (Intercept)   6.282461   6.299311 3.867046</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 6  VI (Intercept)  13.901818  13.878827 3.873950</span></span></span></code></pre></div>
</figure>
<p>The <code>REsim</code> function returns for each school the level name <code>X1</code>, the estimate name, <code>X2</code>, the mean
of the estimated values, the median, and the standard deviation of the estimates.</p>
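<p>The same simulation draws can also be summarised with quantiles instead of a mean and standard
deviation. This is a sketch rather than part of <code>REsim</code>: it assumes the <code>arm</code> package is installed,
uses the <code>sleepstudy</code> data from <code>lme4</code> as a stand-in model, and the object names (<code>fm</code>, <code>sims</code>,
<code>draws</code>) are hypothetical.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">library(lme4)
</span></span><span class="line"><span class="cl">library(arm)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">fm &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">set.seed(42) # make the simulation reproducible
</span></span><span class="line"><span class="cl">sims &lt;- sim(fm, n.sims = 500)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Draws for the Subject random effects: simulations x groups x terms
</span></span><span class="line"><span class="cl">draws &lt;- sims@ranef$Subject
</span></span><span class="line"><span class="cl">dim(draws)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># 2.5%, 50% and 97.5% quantiles of the intercept draws, one row per group
</span></span><span class="line"><span class="cl">t(apply(draws[, , 1], 2, quantile, probs = c(0.025, 0.5, 0.975)))</span></span></code></pre></div>
</figure>
<p>Quantile summaries avoid assuming that the simulated effects are symmetric around their mean,
which is one reason to prefer them when the number of groups is small.</p>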
<p>Another convenience function can help us plot these results to see how they compare to the results
of <code>dotplot</code>:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># dat = results of REsim</span>
</span></span><span class="line"><span class="cl"><span class="c1"># scale = factor to multiply sd by</span>
</span></span><span class="line"><span class="cl"><span class="c1"># var = character of &#34;mean&#34; or &#34;median&#34;</span>
</span></span><span class="line"><span class="cl"><span class="c1"># sd = character of &#34;sd&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">plotREsim</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">dat</span><span class="p">,</span> <span class="n">scale</span><span class="p">,</span> <span class="n">var</span><span class="p">,</span> <span class="n">sd</span><span class="p">){</span>
</span></span><span class="line"><span class="cl">  <span class="nf">require</span><span class="p">(</span><span class="n">eeptools</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="n">dat[</span><span class="p">,</span> <span class="n">sd]</span> <span class="o">&lt;-</span> <span class="n">dat[</span><span class="p">,</span> <span class="n">sd]</span> <span class="o">*</span> <span class="n">scale</span>
</span></span><span class="line"><span class="cl">  <span class="n">dat[</span><span class="p">,</span> <span class="s">&#34;ymax&#34;</span><span class="n">]</span> <span class="o">&lt;-</span> <span class="n">dat[</span><span class="p">,</span> <span class="n">var]</span> <span class="o">+</span> <span class="n">dat[</span><span class="p">,</span> <span class="n">sd]</span> 
</span></span><span class="line"><span class="cl">  <span class="n">dat[</span><span class="p">,</span> <span class="s">&#34;ymin&#34;</span><span class="n">]</span> <span class="o">&lt;-</span> <span class="n">dat[</span><span class="p">,</span> <span class="n">var]</span> <span class="o">-</span> <span class="n">dat[</span><span class="p">,</span> <span class="n">sd]</span> 
</span></span><span class="line"><span class="cl">  <span class="n">dat</span><span class="nf">[order</span><span class="p">(</span><span class="n">dat[</span><span class="p">,</span> <span class="n">var]</span><span class="p">),</span> <span class="s">&#34;id&#34;</span><span class="n">]</span> <span class="o">&lt;-</span> <span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="nf">nrow</span><span class="p">(</span><span class="n">dat</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">  <span class="nf">ggplot</span><span class="p">(</span><span class="n">dat</span><span class="p">,</span> <span class="nf">aes_string</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="s">&#34;id&#34;</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">var</span><span class="p">,</span> <span class="n">ymax</span> <span class="o">=</span> <span class="s">&#34;ymax&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                         <span class="n">ymin</span> <span class="o">=</span> <span class="s">&#34;ymin&#34;</span><span class="p">))</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">    <span class="nf">geom_pointrange</span><span class="p">()</span> <span class="o">+</span> <span class="nf">theme_dpi</span><span class="p">()</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">    <span class="nf">labs</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="s">&#34;Group&#34;</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="s">&#34;Effect Range&#34;</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Effect Ranges&#34;</span><span class="p">)</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">    <span class="nf">theme</span><span class="p">(</span><span class="n">panel.grid.major</span> <span class="o">=</span> <span class="nf">element_blank</span><span class="p">(),</span> <span class="n">panel.grid.minor</span> <span class="o">=</span> <span class="nf">element_blank</span><span class="p">(),</span> 
</span></span><span class="line"><span class="cl">          <span class="n">axis.text.x</span> <span class="o">=</span> <span class="nf">element_blank</span><span class="p">(),</span> <span class="n">axis.ticks.x</span> <span class="o">=</span> <span class="nf">element_blank</span><span class="p">())</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">    <span class="nf">geom_hline</span><span class="p">(</span><span class="n">yintercept</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="nf">I</span><span class="p">(</span><span class="s">&#34;red&#34;</span><span class="p">),</span> <span class="n">size</span> <span class="o">=</span> <span class="nf">I</span><span class="p">(</span><span class="m">1.1</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">plotREsim</span><span class="p">(</span><span class="nf">REsim</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">,</span> <span class="n">whichel</span> <span class="o">=</span> <span class="s">&#34;school&#34;</span><span class="p">,</span> <span class="n">nsims</span> <span class="o">=</span> <span class="m">1000</span><span class="p">),</span> <span class="n">scale</span> <span class="o">=</span> <span class="m">1.2</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">          <span class="n">var</span> <span class="o">=</span> <span class="s">&#34;mean&#34;</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="s">&#34;sd&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/mixed-effects-tutorial-2-fun-with-mermod-objects-05-ebplot-1.svg" alt="" loading="lazy" decoding="async"></div></figure>
</p>
<p>This presents a more conservative view of the variation between random effect components. Depending
on how your data was collected and your research question, alternative ways of estimating these
effect sizes are possible. However, proceed with caution.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p>
<p>Another approach recommended by the authors of <code>lme4</code> involves the <code>RLRsim</code> package. Using this
package we can test whether or not inclusion of the random effects improves model fit and we can
evaluate the p-value of additional random effect terms using a likelihood ratio test based on
simulation.<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup></p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">RLRsim</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">m0</span> <span class="o">&lt;-</span> <span class="nf">lm</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">open</span> <span class="o">+</span> <span class="n">social</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span><span class="n">lmm.data</span><span class="p">)</span> <span class="c1"># fit the null model</span>
</span></span><span class="line"><span class="cl"><span class="nf">exactLRT</span><span class="p">(</span><span class="n">m</span> <span class="o">=</span> <span class="n">MLexamp1</span><span class="p">,</span> <span class="n">m0</span> <span class="o">=</span> <span class="n">m0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     simulated finite sample distribution of LRT. (p-value based on 10000 simulated values)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## data:  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## LRT = 2957.7, p-value &lt; 2.2e-16</span></span></span></code></pre></div>
</figure>
<p>Here <code>exactLRT</code> issues a warning because we originally fit the model with REML instead of full
maximum likelihood. Fortunately, the <code>refitML</code> function in <code>lme4</code> lets us refit our
model using full maximum likelihood and then conduct the exact test.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">mA</span> <span class="o">&lt;-</span> <span class="nf">refitML</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">exactLRT</span><span class="p">(</span><span class="n">m</span><span class="o">=</span> <span class="n">mA</span><span class="p">,</span> <span class="n">m0</span> <span class="o">=</span> <span class="n">m0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     simulated finite sample distribution of LRT. (p-value based on 10000 simulated values)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## data:  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## LRT = 2957.8, p-value &lt; 2.2e-16</span></span></span></code></pre></div>
</figure>
<p>Here we can see that the inclusion of our grouping variable is significant, even though the effect
of each individual group may be substantively small and/or imprecisely measured. This is important
in understanding the correct specification of the model. Our next tutorial will cover specification
tests like this in more detail.</p>
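<p>To make the reported <code>LRT</code> value concrete, the statistic can be computed by hand as twice the
difference in log-likelihood between the two fits. The following is a minimal sketch, not the code used for
the output above: it assumes <code>lme4</code> is installed and uses its bundled <code>sleepstudy</code> data in place of
<code>lmm.data</code>. The names <code>m_full</code>, <code>m_null</code>, and <code>lrt</code> are illustrative.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Sketch only: sleepstudy stands in for the tutorial's lmm.data
library(lme4)

# Fit with full maximum likelihood so the two log-likelihoods are comparable
m_full &lt;- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
# Null model: same fixed effects, no grouping term
m_null &lt;- lm(Reaction ~ Days, data = sleepstudy)

# Likelihood ratio statistic: twice the gain in log-likelihood
lrt &lt;- as.numeric(2 * (logLik(m_full) - logLik(m_null)))
lrt
</code></pre></div>
</figure>
<p>As its output header indicates, <code>exactLRT</code> compares a statistic of this kind against a simulated
finite-sample distribution, which is where the reported p-value comes from.</p>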
<h2 id="what-do-random-effects-matter">
  <span class="heading-mark">Why do Random Effects Matter?</span>
  <a class="heading-anchor" href="#what-do-random-effects-matter" aria-label="Link to this section">#</a>
</h2>
<p>How do we interpret the <em>substantive</em> impact of our random effects? This is often critical in
observational work that uses a multilevel structure to understand the impact that the grouping can
have on the individual observation. To do this we select 12 random cases and then simulate their
predicted value of <code>extro</code> if they were placed in each of the 6 schools. Note that this is a very
simple simulation: it uses only the mean of the fixed effect and the conditional mode of the random
effect, without replicating or sampling to get a sense of the variability. That will be left as an
exercise for the reader and/or a future tutorial!</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Simulate</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Let&#39;s create 12 cases of students</span>
</span></span><span class="line"><span class="cl"><span class="c1"># </span>
</span></span><span class="line"><span class="cl"><span class="c1">#sample some rows</span>
</span></span><span class="line"><span class="cl"><span class="n">simX</span> <span class="o">&lt;-</span> <span class="nf">sample</span><span class="p">(</span><span class="n">lmm.data</span><span class="o">$</span><span class="n">id</span><span class="p">,</span> <span class="m">12</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">simX</span> <span class="o">&lt;-</span> <span class="n">lmm.data[lmm.data</span><span class="o">$</span><span class="n">id</span> <span class="o">%in%</span> <span class="n">simX</span><span class="p">,</span> <span class="nf">c</span><span class="p">(</span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">)</span><span class="n">]</span> <span class="c1"># get their data</span>
</span></span><span class="line"><span class="cl"><span class="c1"># add an intercept</span>
</span></span><span class="line"><span class="cl"><span class="n">simX[</span><span class="p">,</span> <span class="s">&#34;Intercept&#34;</span><span class="n">]</span> <span class="o">&lt;-</span> <span class="m">1</span>
</span></span><span class="line"><span class="cl"><span class="n">simX</span> <span class="o">&lt;-</span> <span class="n">simX[</span><span class="p">,</span> <span class="nf">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span> <span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">)</span><span class="n">]</span> <span class="c1"># reorder</span>
</span></span><span class="line"><span class="cl"><span class="n">simRE</span> <span class="o">&lt;-</span> <span class="nf">REsim</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">,</span> <span class="n">whichel</span> <span class="o">=</span> <span class="s">&#34;school&#34;</span><span class="p">,</span> <span class="n">nsims</span> <span class="o">=</span> <span class="m">1000</span><span class="p">)</span> <span class="c1"># simulate random effects</span>
</span></span><span class="line"><span class="cl"><span class="n">simX</span><span class="o">$</span><span class="n">case</span> <span class="o">&lt;-</span> <span class="nf">row.names</span><span class="p">(</span><span class="n">simX</span><span class="p">)</span> <span class="c1"># create a case ID</span>
</span></span><span class="line"><span class="cl"><span class="c1"># expand a grid of case IDs by schools</span>
</span></span><span class="line"><span class="cl"><span class="n">simDat</span> <span class="o">&lt;-</span> <span class="nf">expand.grid</span><span class="p">(</span><span class="n">case</span> <span class="o">=</span> <span class="nf">row.names</span><span class="p">(</span><span class="n">simX</span><span class="p">),</span> <span class="n">school</span> <span class="o">=</span> <span class="nf">levels</span><span class="p">(</span><span class="n">lmm.data</span><span class="o">$</span><span class="n">school</span><span class="p">))</span> 
</span></span><span class="line"><span class="cl"><span class="n">simDat</span> <span class="o">&lt;-</span> <span class="nf">merge</span><span class="p">(</span><span class="n">simX</span><span class="p">,</span> <span class="n">simDat</span><span class="p">)</span> <span class="c1"># merge in the data</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Create the fixed effect predictor</span>
</span></span><span class="line"><span class="cl"><span class="n">simDat[</span><span class="p">,</span> <span class="s">&#34;fepred&#34;</span><span class="n">]</span> <span class="o">&lt;-</span> <span class="p">(</span><span class="n">simDat[</span><span class="p">,</span> <span class="m">2</span><span class="n">]</span> <span class="o">*</span> <span class="nf">fixef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span><span class="n">[1]</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">simDat[</span><span class="p">,</span> <span class="m">3</span><span class="n">]</span> <span class="o">*</span> <span class="nf">fixef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span><span class="n">[2]</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="p">(</span><span class="n">simDat[</span><span class="p">,</span> <span class="m">4</span><span class="n">]</span> <span class="o">*</span> <span class="nf">fixef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span><span class="n">[3]</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">simDat[</span><span class="p">,</span> <span class="m">5</span><span class="n">]</span> <span class="o">*</span> <span class="nf">fixef</span><span class="p">(</span><span class="n">MLexamp1</span><span class="p">)</span><span class="n">[4]</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Add the school effects</span>
</span></span><span class="line"><span class="cl"><span class="n">simDat</span> <span class="o">&lt;-</span> <span class="nf">merge</span><span class="p">(</span><span class="n">simDat</span><span class="p">,</span> <span class="n">simRE[</span><span class="p">,</span> <span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">3</span><span class="p">)</span><span class="n">]</span><span class="p">,</span> <span class="n">by.x</span> <span class="o">=</span> <span class="s">&#34;school&#34;</span><span class="p">,</span> <span class="n">by.y</span><span class="o">=</span><span class="s">&#34;X1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">simDat</span><span class="o">$</span><span class="n">yhat</span> <span class="o">&lt;-</span> <span class="n">simDat</span><span class="o">$</span><span class="n">fepred</span> <span class="o">+</span> <span class="n">simDat</span><span class="o">$</span><span class="n">mean</span> <span class="c1"># add the school specific intercept</span></span></span></code></pre></div>
</figure>
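<p>The fixed effect predictor built above is a term-by-term sum of each column times its coefficient, which is
the same thing as a matrix product of the design matrix and the coefficient vector. The sketch below
illustrates that equivalence with made-up numbers. The matrix <code>X</code> and the vector <code>beta</code> are
hypothetical stand-ins for columns 2 through 5 of <code>simDat</code> and for <code>fixef(MLexamp1)</code>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Toy design matrix: an intercept column plus three predictors (made-up values)
X &lt;- cbind(Intercept = 1,
           open   = c(40, 45, 50),
           agree  = c(30, 35, 32),
           social = c(90, 100, 110))
# Hypothetical coefficients standing in for fixef(MLexamp1)
beta &lt;- c(Intercept = 60, open = 0.1, agree = -0.05, social = 0.01)

# Term-by-term sum, mirroring the tutorial code
fepred_sum &lt;- X[, 1] * beta[1] + X[, 2] * beta[2] + X[, 3] * beta[3] + X[, 4] * beta[4]

# Equivalent matrix product, dropped to a plain vector
fepred_mat &lt;- drop(X %*% beta)

# Check that the two approaches agree
all.equal(unname(fepred_sum), unname(fepred_mat))
</code></pre></div>
</figure>
<p>The matrix form is shorter and does not need to change when predictors are added, as long as the columns
of the design matrix are in the same order as the coefficients.</p>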
<p>Now that we have set up a simulated dataframe, let&#8217;s plot it, first by case:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">qplot</span><span class="p">(</span><span class="n">school</span><span class="p">,</span> <span class="n">yhat</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">simDat</span><span class="p">)</span> <span class="o">+</span> <span class="nf">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">case</span><span class="p">)</span> <span class="o">+</span> <span class="nf">theme_dpi</span><span class="p">()</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/mixed-effects-tutorial-2-fun-with-mermod-objects-06-bycaseplot-1.svg" alt="" loading="lazy" decoding="async"></div></figure>
</p>
<p>This plot shows us that within each panel, representing a case, there is tremendous variation by
school. So, moving each student into a different school has large effects on the extroversion score.
But does each case vary at each school?</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">qplot</span><span class="p">(</span><span class="n">case</span><span class="p">,</span> <span class="n">yhat</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">simDat</span><span class="p">)</span> <span class="o">+</span> <span class="nf">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">school</span><span class="p">)</span> <span class="o">+</span> <span class="nf">theme_dpi</span><span class="p">()</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">theme</span><span class="p">(</span><span class="n">axis.text.x</span> <span class="o">=</span> <span class="nf">element_blank</span><span class="p">())</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/mixed-effects-tutorial-2-fun-with-mermod-objects/mixed-effects-tutorial-2-fun-with-mermod-objects-07-byschool-1.svg" alt="" loading="lazy" decoding="async"></div></figure>
</p>
<p>Here we can clearly see that within each school the cases are relatively similar, indicating that
the group effect is larger than the individual effects.</p>
<p>These plots are useful in demonstrating the relative importance of group and individual effects in a
substantive fashion. Even more can be done to make the graphs more informative, such as adding
references to the total variability of the outcome and looking at how far moving between groups
shifts each observation from its true value.</p>
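<p>One way to relate the school-to-school movement to the total variability of the outcome is to express each
case's range of predictions in outcome standard deviation units. This is only a sketch with made-up numbers:
<code>toy</code> is a hypothetical stand-in for <code>simDat</code>, and <code>outcome_sd</code> is an assumed value that
would be replaced by something like <code>sd(lmm.data$extro)</code> in the real analysis.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Made-up stand-in for simDat: 3 cases placed in each of 3 schools
toy &lt;- expand.grid(case = c(&#34;a&#34;, &#34;b&#34;, &#34;c&#34;), school = c(&#34;I&#34;, &#34;II&#34;, &#34;III&#34;),
                   stringsAsFactors = FALSE)
toy$yhat &lt;- c(40, 41, 42, 60, 61, 62, 80, 81, 82)

# Assumed standard deviation of the observed outcome
outcome_sd &lt;- 10

# Range of predictions across schools, one value per case
by_case &lt;- tapply(toy$yhat, toy$case, function(y) diff(range(y)))

# Express each range in outcome standard deviation units
by_case / outcome_sd
</code></pre></div>
</figure>
<p>A ratio well above 1 would suggest that changing schools moves a case further than a typical difference
between two students in the data.</p>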
<h1 id="conclusion">
  <span class="heading-mark">Conclusion</span>
  <a class="heading-anchor" href="#conclusion" aria-label="Link to this section">#</a>
</h1>
<p><code>lme4</code> provides a very powerful object-oriented toolset for dealing with mixed effect models in R.
Understanding model fit and confidence intervals of <code>lme4</code> objects requires some diligent research
and the use of a variety of functions and extensions of <code>lme4</code> itself. In our next tutorial we will
explore how to identify a proper specification of a random-effect model and Bayesian extensions of
the <code>lme4</code> framework for difficult-to-specify models. We will also explore the generalized linear
model framework and the <code>glmer</code> function for generalized linear modeling with multiple levels.</p>
<h1 id="appendix">
  <span class="heading-mark">Appendix</span>
  <a class="heading-anchor" href="#appendix" aria-label="Link to this section">#</a>
</h1>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">print</span><span class="p">(</span><span class="nf">sessionInfo</span><span class="p">(),</span><span class="n">locale</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## R version 3.5.3 (2019-03-11)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Platform: x86_64-w64-mingw32/x64 (64-bit)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Running under: Windows 10 x64 (build 17134)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Matrix products: default</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## attached base packages:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] stats     graphics  grDevices utils     datasets  methods   base     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## other attached packages:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [1] RLRsim_3.1-3    eeptools_1.2.2  ggplot2_3.1.1   plyr_1.8.4      lattice_0.20-38 arm_1.10-1     </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [7] MASS_7.3-51.4   lme4_1.1-21     Matrix_1.2-17   knitr_1.22     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## loaded via a namespace (and not attached):</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [1] zoo_1.8-5         tidyselect_0.2.5  xfun_0.6          purrr_0.3.2       splines_3.5.3    </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [6] colorspace_1.4-1  htmltools_0.3.6   yaml_2.2.0        mgcv_1.8-28       rlang_0.3.4      </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [11] nloptr_1.2.1      pillar_1.3.1      foreign_0.8-71    glue_1.3.1        withr_2.1.2      </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [16] sp_1.3-1          stringr_1.4.0     munsell_0.5.0     gtable_0.3.0      coda_0.19-2      </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [21] evaluate_0.13     labeling_0.3      maptools_0.9-5    lmtest_0.9-37     vcd_1.4-4        </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [26] Rcpp_1.0.1        scales_1.0.0      abind_1.4-5       digest_0.6.18     stringi_1.4.3    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [31] dplyr_0.8.0.1     grid_3.5.3        tools_3.5.3       magrittr_1.5      lazyeval_0.2.2   </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [36] tibble_2.1.1      crayon_1.3.4      pkgconfig_2.0.2   data.table_1.12.2 assertthat_0.2.1 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [41] minqa_1.2.4       rmarkdown_1.12    R6_2.4.0          boot_1.3-22       nlme_3.1-139     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [46] compiler_3.5.3</span></span></span></code></pre></div>
</figure>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p><a href="https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q4/002984.html">See message from <code>lme4</code> co-author Doug Bates on this subject</a>.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>Andrew Gelman and Yu-Sung Su (2014). arm: Data Analysis Using Regression and Multilevel/Hierarchical Models. R package version
1.7-03. <a href="http://CRAN.R-project.org/package=arm">http://CRAN.R-project.org/package=arm</a>
&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p><a href="http://glmm.wikidot.com/faq">WikiDot FAQ from the R Mixed Models Mailing List</a>
&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:4">
<p>There is also an extensive series of references available in the <code>References</code>
section of the help, which you can view by running <code>?exactLRT</code> and <code>?exactRLRT</code>.&#160;<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Update: Since this post was released I have co-authored an R package to make some of the items in this post easier to do. This package is called merTools and is available on CRAN and on GitHub. To read more about it, read my new pos t here and check out the packageon GitHub . Introduction # First of all, be warned, the terminology surrounding multilevel models is vastly inconsistent. For example, multilevel models themselves may be referred to as hierarchical linear models, random effects models, multilevel models, random intercept models, random slope models, or pooling models. Depending on the discipline, software used, and the academic literature many of these terms may be referring to the same general modeling strategy. In this tutorial I will attempt to provide a user guide to multilevel modeling by demonstrating how to fit multilevel models in R and by attempting to connect the model fitting procedure to commonly used terminology used regarding these models.",
  "og_image": "https://jaredknowles.com/og/posts/mixed-effects-tutorial-2-fun-with-mermod-objects.png",
  "og_title": "Mixed Effects Tutorial 2: Fun with merMod Objects"
}
</posse:post></entry><entry><title>eeptools 0.3 Released!</title><link href="https://jaredknowles.com/posts/eeptools-03-released/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/eeptools-03-released/</id><published>2013-12-10T05:31:03Z</published><updated>2013-12-10T05:31:03Z</updated><category term="R"/><summary>Version 0.3 of my R package of miscellaneous code has been released, this time with substantial contributions from Jason Becker via GitHub. Progress continues toward the ultimate goal for eeptools to “make it easier for administrators at state and local education agencies to analyze and visualize their data on student, school, and district performance. By putting simple wrappers around a number of R functions to make many common tasks simpler and lower the barrier to entry to statistical analysis.</summary><content type="html"><![CDATA[<p>Version 0.3 of my R package of miscellaneous code has been released, this time with substantial contributions from <a href="http://www.jsonbecker.com/">Jason Becker</a>
 via GitHub. Progress continues toward the ultimate goal for <strong>eeptools</strong> to &#8220;make it easier for administrators at state and local education agencies to analyze and visualize their data on student, school, and district performance. By putting simple wrappers around a number of R functions to make many common tasks simpler and lower the barrier to entry to statistical analysis.</p>
<p>The goal is not to invent new functionality for R, but instead to lower the barrier of entry to doing common and routine data manipulation, visualization, and analysis tasks with education data. By collaborating with other users of education data we can build transparent, efficient, reproducible, and easy to use functions for analysts.</p>
<p>Find out more at <a href="https://github.com/jknowles/eeptools/wiki">https://github.com/jknowles/eeptools/wiki</a>
, and check out the development of this tool.&#8221;</p>
<p>Suggestions and pull requests are very welcome on <a href="https://github.com/jknowles/eeptools">GitHub</a>
.</p>
<p>See the NEWS file in the repository for full updates:</p>
<p>eeptools 0.3</p>
<hr>
<p>* unit tests for decomma, gelmansim, and statamode using <code>testthat</code> package</p>
<p>* statamode updated to work with data.table</p>
<p>* age_calc function from Jason Becker given new precision option</p>
<p>* moves_calc function from Jason Becker</p>
<p>* gelmansim function to do post-estimation prediction on new data from model objects using functionality in the <code>arm</code> package</p>
<p>* lag_data function to create groupwise nested lags quickly</p>
<p>eeptools 0.2</p>
<hr>
<p>* new functions for building maps with shapefiles including mapmerge to merge a dataframe and a shapefile, and ggmapmerge to convert this to a document for making a map in ggplot2</p>
<p>* statamode updated to allow for multiple methods for handling multiple modes</p>
<p>* remove_stars deleted and replaced with remove_char to allow for users to specify an arbitrary character string to be removed</p>
<p>* add plotForWord function to export plots in a Windows MetaFile for inclusion in Microsoft Office documents</p>
<p>* add age_calc function to allow calculating the age of a vector of birthdates relative to the current date</p>
<p>* fix typos in documentation</p>
<p>* fix startup message behavior</p>
<p>* remove dependencies of the package dramatically so loading is faster and more lightweight</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Version 0.3 of my R package of miscellaneous code has been released, this time with substantial contributions from Jason Becker via GitHub. Progress continues toward the ultimate goal for eeptools to “make it easier for administrators at state and local education agencies to analyze and visualize their data on student, school, and district performance. By putting simple wrappers around a number of R functions to make many common tasks simpler and lower the barrier to entry to statistical analysis.",
  "og_image": "https://jaredknowles.com/og/posts/eeptools-03-released.png",
  "og_title": "eeptools 0.3 Released!"
}
</posse:post></entry><entry><title>Getting Started with Mixed Effect Models in R</title><link href="https://jaredknowles.com/posts/getting-started-with-mixed-effect-models-in-r/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/getting-started-with-mixed-effect-models-in-r/</id><published>2013-11-25T19:47:35Z</published><updated>2013-11-25T19:47:35Z</updated><category term="R"/><summary>Update: Since this post was released I have co-authored an R package to make some of the items in this post easier to do. This package is called merTools and is available on CRAN and on GitHub. To read more about it, read my new post here and check out the package on GitHub. Analysts dealing with grouped data and complex hierarchical structures in their data ranging from measurements nested within participants, to counties nested within states or students nested within classrooms often find themselves in need of modeling tools to reflect this structure of their data. In R there are two predominant ways to fit multilevel models that account for such structure in the data. These tutorials will show the user how to use both the lme4 package in R to fit linear and nonlinear mixed effect models, and to use rstan to fit fully Bayesian multilevel models. The focus here will be on how to fit the models in R and not the theory behind the models. For background on multilevel modeling, see the references. [1]</summary><content type="html"><![CDATA[<p><strong>Update</strong>: Since this post was released I have co-authored an R package to make some of the items in this post easier to do. This package is called merTools and is available on CRAN and on GitHub. To read more about it, read <a href="http://jaredknowles.com/journal/2015/8/12/announcing-mertools">my new post</a>
<a href="https://www.civilytics.com/posts/2015/explore-multilevel-models-faster-with-the-new-mertools-r-package/">here</a>
 and check out the package <a href="http://www.github.com/jknowles/merTools">on GitHub</a>.</p>
<h2 id="introduction">
  <span class="heading-mark">Introduction</span>
  <a class="heading-anchor" href="#introduction" aria-label="Link to this section">#</a>
</h2>
<p>Analysts dealing with grouped data and complex hierarchical structures, ranging from
measurements nested within participants to counties nested within states or students nested within
classrooms, often find themselves in need of modeling tools that reflect this structure.
In R there are two predominant ways to fit multilevel models that account for such structure in the
data. These tutorials will show the user how to use both the <code>lme4</code> package to fit linear and
nonlinear mixed effect models and <code>rstan</code> to fit fully Bayesian multilevel models. The focus
here will be on how to fit the models in R and not the theory behind the models. For background on
multilevel modeling, see the references. [1]</p>
<p>This tutorial will cover getting set up and running a few basic models using <code>lme4</code> in R. Future
tutorials will cover:</p>
<ul>
<li>constructing varying intercept, varying slope, and varying slope and intercept models in R</li>
<li>generating predictions and interpreting parameters from mixed-effect models</li>
<li>generalized and non-linear multilevel models</li>
<li>fully Bayesian multilevel models fit with <code>rstan</code> or other MCMC methods</li>
</ul>
<h2 id="setting-up-your-environment">
  <span class="heading-mark">Setting up your enviRonment</span>
  <a class="heading-anchor" href="#setting-up-your-environment" aria-label="Link to this section">#</a>
</h2>
<p>Getting started with multilevel modeling in R is simple. <code>lme4</code> is the canonical package for
implementing multilevel models in R, though there are a number of packages that depend on and
enhance its feature set, including Bayesian extensions. <code>lme4</code> has been recently rewritten to
improve speed and to incorporate a C++ codebase, and as such the features of the package are
somewhat in flux. Be sure to update the package frequently.</p>
<p>To install <code>lme4</code>, we just run:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Main version</span>
</span></span><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;lme4&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Or to install the dev version</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">install_github</span><span class="p">(</span><span class="s">&#34;lme4/lme4&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<h2 id="read-in-the-data">
  <span class="heading-mark">Read in the data</span>
  <a class="heading-anchor" href="#read-in-the-data" aria-label="Link to this section">#</a>
</h2>
<p>Multilevel models are appropriate for a particular kind of data structure where units are nested
within groups (generally 5+ groups) and where we want to model the group structure of the data. For
our introduction we will start with a simple example from the <code>lme4</code> documentation and
explain what the model is doing. We will use data from Jon Starkweather at the <a href="http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/">University of North
Texas</a>. Visit the excellent tutorial
<a href="http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/Benchmarks/LinearMixedModels_JDS_Dec2010.pdf">available here for
more</a>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">lme4</span><span class="p">)</span> <span class="c1"># load library</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">arm</span><span class="p">)</span> <span class="c1"># convenience functions for regression in R</span>
</span></span><span class="line"><span class="cl"><span class="n">lmm.data</span> <span class="o">&lt;-</span> <span class="nf">read.table</span><span class="p">(</span><span class="s">&#34;http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module9/lmm.data.txt&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                       <span class="n">header</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">&#34;,&#34;</span><span class="p">,</span> <span class="n">na.strings</span><span class="o">=</span><span class="s">&#34;NA&#34;</span><span class="p">,</span> <span class="n">dec</span><span class="o">=</span><span class="s">&#34;.&#34;</span><span class="p">,</span> <span class="n">strip.white</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#summary(lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   id    extro     open    agree    social class school</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1  1 63.69356 43.43306 38.02668  75.05811     d     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2  2 69.48244 46.86979 31.48957  98.12560     a     VI</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 3  3 79.74006 32.27013 40.20866 116.33897     d     VI</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 4  4 62.96674 44.40790 30.50866  90.46888     c     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 5  5 64.24582 36.86337 37.43949  98.51873     d     IV</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 6  6 50.97107 46.25627 38.83196  75.21992     d      I</span></span></span></code></pre></div>
</figure>
<p>Here we have data on the extroversion of subjects nested within classes and within schools.</p>
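<p>The course server that hosts <code>lmm.data.txt</code> may not always be reachable. As a rough stand-in, the
sketch below simulates a data frame with the same columns and the same balanced layout. Everything
in it is an assumption made for illustration: the name <code>sim.data</code>, the means and standard
deviations, and the effect sizes are invented, so results will not match the output shown in this
post.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated stand-in with the same columns as lmm.data.
# The values are made up for illustration; they are NOT the original data.
set.seed(1234)
schools &lt;- c(&#34;I&#34;, &#34;II&#34;, &#34;III&#34;, &#34;IV&#34;, &#34;V&#34;, &#34;VI&#34;)
classes &lt;- c(&#34;a&#34;, &#34;b&#34;, &#34;c&#34;, &#34;d&#34;)

# 50 subjects in every school-by-class cell: 6 * 4 * 50 = 1200 rows
sim.data &lt;- expand.grid(subject = 1:50, class = classes, school = schools,
                        stringsAsFactors = TRUE)
n &lt;- nrow(sim.data)
sim.data$id     &lt;- seq_len(n)
sim.data$open   &lt;- rnorm(n, mean = 40, sd = 5)
sim.data$agree  &lt;- rnorm(n, mean = 35, sd = 5)
sim.data$social &lt;- rnorm(n, mean = 100, sd = 15)

# Extroversion driven mostly by school and a little by class (assumed effect sizes)
sim.data$extro &lt;- 40 + 5 * as.integer(sim.data$school) +
  1 * as.integer(sim.data$class) + rnorm(n, sd = 1)

sim.data$subject &lt;- NULL  # helper column, no longer needed
str(sim.data)</code></pre></div>
</figure>
<p>Substituting <code>sim.data</code> for <code>lmm.data</code> in the calls that follow should reproduce the workflow, though
not the specific estimates.</p>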
<h2 id="fit-the-non-multilevel-models">
  <span class="heading-mark">Fit the Non-Multilevel Models</span>
  <a class="heading-anchor" href="#fit-the-non-multilevel-models" aria-label="Link to this section">#</a>
</h2>
<p>Let&#8217;s start by fitting a simple OLS regression of extroversion on measures of openness,
agreeableness, and sociability. Here we use the <code>display</code> function in the excellent <code>arm</code> package for
abbreviated output. Other options include <code>stargazer</code> for LaTeX typeset tables, <code>xtable</code>, or the
<code>ascii</code> package for more flexible plain text output options.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">OLSexamp</span> <span class="o">&lt;-</span> <span class="nf">lm</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">OLSexamp</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lm(formula = extro ~ open + agree + social, data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 57.84     3.15  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.02     0.05  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree        0.03     0.05  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.01     0.02  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## n = 1200, k = 4</span>
</span></span><span class="line"><span class="cl"><span class="c1">## residual sd = 9.34, R-Squared = 0.00</span></span></span></code></pre></div>
</figure>
<p>This model does not fit well at all: the R-squared rounds to 0.00. The R model interface is a
simple one: the dependent variable is specified first, followed by the <code>~</code> symbol. The predictor
variables on the right-hand side are each named, and the addition signs indicate that they are
modeled as additive effects. Finally, we specify the data frame on which to fit the model. Here we use the <code>lm</code>
function to perform OLS regression, but there are many other options in R.</p>
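<p>To make the formula interface more concrete, here is a minimal sketch of what R does with a
formula such as <code>extro ~ open + agree + social</code>: it builds a design matrix with an intercept column and
one column per additive term, then solves a least-squares problem on that matrix. The data frame
<code>toy</code> and its values are invented for illustration; only the column names mirror the post.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Invented toy data; only the column names mirror the post
set.seed(1)
toy &lt;- data.frame(open   = rnorm(20, mean = 40, sd = 5),
                  agree  = rnorm(20, mean = 35, sd = 5),
                  social = rnorm(20, mean = 100, sd = 15))
toy$extro &lt;- 60 + 0.2 * toy$open + rnorm(20, sd = 5)

# Response on the left of ~, additive predictors on the right
f &lt;- extro ~ open + agree + social
all.vars(f)

# The design matrix: an intercept column plus one column per predictor
X &lt;- model.matrix(f, data = toy)
colnames(X)

# lm() solves the least-squares problem on this matrix via a QR decomposition;
# qr.solve() does the same thing by hand
fit &lt;- lm(f, data = toy)
by_hand &lt;- qr.solve(X, toy$extro)
cbind(lm = coef(fit), by_hand = by_hand)</code></pre></div>
</figure>
<p>The two columns of the final table should agree, which is a useful reminder that the formula is
shorthand for a design matrix rather than anything specific to <code>lm</code>.</p>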
<p>If we want to extract measures such as the AIC, we may prefer to fit a generalized linear model with
<code>glm</code>, which fits the model by maximum likelihood estimation. With the default Gaussian family the
coefficient estimates are the same as those from <code>lm</code>, and the model formula specification is unchanged.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## glm(formula = extro ~ open + agree + social, data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 57.84     3.15  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.02     0.05  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree        0.03     0.05  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.01     0.02  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   n = 1200, k = 4</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual deviance = 104378.2, null deviance = 104432.7 (difference = 54.5)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   overdispersion parameter = 87.3</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual sd is sqrt(overdispersion) = 9.34</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">AIC</span><span class="p">(</span><span class="n">MLexamp</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] 8774.291</span></span></span></code></pre></div>
</figure>
<p>This results in a poor model fit. Let&#8217;s look at a simple varying intercept model now.</p>
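<p>Before moving on, a quick check of what the two outputs above suggest: with its default Gaussian
family, <code>glm</code> returns the same coefficients as <code>lm</code>. The sketch below verifies this on simulated data
(the data frame <code>d</code> and its values are invented) and also shows where the &#8220;residual sd&#8221; reported by
<code>display</code> comes from.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated data; values are invented for illustration
set.seed(2024)
n &lt;- 200
d &lt;- data.frame(open   = rnorm(n, mean = 40, sd = 5),
                agree  = rnorm(n, mean = 35, sd = 5),
                social = rnorm(n, mean = 100, sd = 15))
d$extro &lt;- 55 + 0.1 * d$open + rnorm(n, sd = 9)

ols &lt;- lm(extro ~ open + agree + social, data = d)
ml  &lt;- glm(extro ~ open + agree + social, data = d)  # family = gaussian by default

all.equal(coef(ols), coef(ml))  # same point estimates
all.equal(AIC(ols), AIC(ml))    # AIC() also works on lm fits

# Residual sd = sqrt(residual deviance / residual degrees of freedom)
sqrt(deviance(ml) / df.residual(ml))
sigma(ols)</code></pre></div>
</figure>
<p>Applied to the output above, the same arithmetic gives sqrt(104378.2 / 1196), or about 9.34, which
matches the residual sd that <code>display</code> prints for both fits.</p>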
<h2 id="fit-a-varying-intercept-model">
  <span class="heading-mark">Fit a varying intercept model</span>
  <a class="heading-anchor" href="#fit-a-varying-intercept-model" aria-label="Link to this section">#</a>
</h2>
<p>Depending on disciplinary norms, our next step might be to fit a varying intercept model using a
grouping variable such as school or classes. Using the <code>glm</code> function and the familiar formula
interface, such a fit is easy:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.2</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="n">class</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span> <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## glm(formula = extro ~ open + agree + social + class, data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 56.05     3.09  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.03     0.05  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree       -0.01     0.05  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.01     0.02  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## classb       2.06     0.75  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## classc       3.70     0.75  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## classd       5.67     0.75  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   n = 1200, k = 7</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual deviance = 99187.7, null deviance = 104432.7 (difference = 5245.0)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   overdispersion parameter = 83.1</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual sd is sqrt(overdispersion) = 9.12</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">AIC</span><span class="p">(</span><span class="n">MLexamp.2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] 8719.083</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">anova</span><span class="p">(</span><span class="n">MLexamp</span><span class="p">,</span> <span class="n">MLexamp.2</span><span class="p">,</span> <span class="n">test</span><span class="o">=</span><span class="s">&#34;F&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Analysis of Deviance Table</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Model 1: extro ~ open + agree + social</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Model 2: extro ~ open + agree + social + class</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Resid. Df Resid. Dev Df Deviance     F   Pr(&gt;F)    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1      1196     104378                               </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2      1193      99188  3   5190.5 20.81 3.82e-13 ***</span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</span></span></span></code></pre></div>
</figure>
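<p>The <code>classb</code>, <code>classc</code>, and <code>classd</code> rows in the output above come from R expanding the <code>class</code> factor
into dummy variables, with the first level (<code>a</code>) serving as the reference. A minimal sketch on
simulated data (the values and effect sizes are invented) shows the expansion and how the
coefficients read when the factor is the only predictor:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated data: four classes with assumed mean shifts of 0, 2, 4 and 6
set.seed(99)
d &lt;- data.frame(class = factor(rep(c(&#34;a&#34;, &#34;b&#34;, &#34;c&#34;, &#34;d&#34;), each = 25)))
d$extro &lt;- 56 + c(0, 2, 4, 6)[as.integer(d$class)] + rnorm(100, sd = 3)

# Four levels become an intercept plus three 0/1 dummy columns
head(model.matrix(~ class, data = d), 3)

fit   &lt;- lm(extro ~ class, data = d)
means &lt;- tapply(d$extro, d$class, mean)

coef(fit)              # intercept = mean of class a; others = differences from a
means - means[&#34;a&#34;]     # the same differences computed directly</code></pre></div>
</figure>
<p>With covariates in the model, as in this post, the class coefficients are differences from the
reference class after adjusting for those covariates rather than raw differences in means.</p>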
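<p>Adding <code>class</code> as a predictor in this way is often called a fixed-effects specification: the model
fits a separate dummy variable for each class other than the reference class, which is absorbed into
the intercept. Although the F-test is significant, the improvement in fit is modest: the residual
deviance falls only from 104,378 to 99,188. Let&#8217;s see if school performs any better.</p>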
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.3</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="n">school</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span> <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.3</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## glm(formula = extro ~ open + agree + social + school, data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 45.02     0.92  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree        0.03     0.02  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.00     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII     7.91     0.27  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII   12.12     0.27  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV    16.06     0.27  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV     20.43     0.27  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI    28.05     0.27  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   n = 1200, k = 9</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual deviance = 8496.2, null deviance = 104432.7 (difference = 95936.5)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   overdispersion parameter = 7.1</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual sd is sqrt(overdispersion) = 2.67</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">AIC</span><span class="p">(</span><span class="n">MLexamp.3</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] 5774.203</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">anova</span><span class="p">(</span><span class="n">MLexamp</span><span class="p">,</span> <span class="n">MLexamp.3</span><span class="p">,</span> <span class="n">test</span><span class="o">=</span><span class="s">&#34;F&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Analysis of Deviance Table</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Model 1: extro ~ open + agree + social</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Model 2: extro ~ open + agree + social + school</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Resid. Df Resid. Dev Df Deviance      F    Pr(&gt;F)    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1      1196     104378                                 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2      1191       8496  5    95882 2688.2 &lt; 2.2e-16 ***</span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</span></span></span></code></pre></div>
</figure>
<p>The school effect greatly improves our model fit. However, how do we interpret these effects?</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">table</span><span class="p">(</span><span class="n">lmm.data</span><span class="o">$</span><span class="n">school</span><span class="p">,</span> <span class="n">lmm.data</span><span class="o">$</span><span class="n">class</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##      </span>
</span></span><span class="line"><span class="cl"><span class="c1">##        a  b  c  d</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   I   50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   II  50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   III 50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   IV  50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   V   50 50 50 50</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   VI  50 50 50 50</span></span></span></code></pre></div>
</figure>
<p>Here we can see we have a perfectly balanced design with fifty observations in each combination of
class and school (if only data were always so nice!).</p>
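<p>The idea of a &#8220;cell&#8221; can be made explicit with <code>interaction()</code>, which builds one factor level per
school-by-class combination. The sketch below rebuilds the layout of the table above from scratch
(the rows are placeholders, not the original data) and checks for balance programmatically:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Rebuild the layout: 6 schools x 4 classes x 50 subjects (placeholder rows)
d &lt;- expand.grid(subject = 1:50,
                 class   = c(&#34;a&#34;, &#34;b&#34;, &#34;c&#34;, &#34;d&#34;),
                 school  = c(&#34;I&#34;, &#34;II&#34;, &#34;III&#34;, &#34;IV&#34;, &#34;V&#34;, &#34;VI&#34;))

# One factor level per school-by-class cell
cell &lt;- interaction(d$school, d$class, sep = &#34;:&#34;)
nlevels(cell)  # 6 schools x 4 classes = 24 cells

# A design is balanced when every cell holds the same number of rows
counts   &lt;- table(d$school, d$class)
balanced &lt;- all(counts == counts[1])
balanced</code></pre></div>
</figure>
<p>On real data the same check is a quick way to spot empty or sparse cells before fitting a model
with one parameter per cell.</p>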
<p>Let&#8217;s try to model each of these unique cells. To do this, we fit a model and use the <code>:</code> operator
to specify the interaction between <code>school</code> and <code>class</code>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.4</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="n">school</span><span class="o">:</span><span class="n">class</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span> <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.4</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## glm(formula = extro ~ open + agree + social + school:class, data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##                  coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept)       80.36     0.37 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open               0.01     0.00 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree             -0.01     0.01 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social             0.00     0.00 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolI:classa   -40.39     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII:classa  -28.15     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII:classa -23.58     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV:classa  -19.76     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV:classa   -15.50     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI:classa  -10.46     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolI:classb   -34.60     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII:classb  -26.76     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII:classb -22.59     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV:classb  -18.71     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV:classb   -14.31     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI:classb   -8.54     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolI:classc   -31.86     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII:classc  -25.64     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII:classc -21.58     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV:classc  -17.58     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV:classc   -13.38     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI:classc   -5.58     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolI:classd   -30.00     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII:classd  -24.57     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII:classd -20.64     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV:classd  -16.60     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV:classd   -12.04     0.20 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   n = 1200, k = 27</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual deviance = 1135.9, null deviance = 104432.7 (difference = 103296.8)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   overdispersion parameter = 1.0</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual sd is sqrt(overdispersion) = 0.98</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">AIC</span><span class="p">(</span><span class="n">MLexamp.4</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] 3395.573</span></span></span></code></pre></div>
</figure>
<p>This is very useful, but what if we want to understand both the effect of the school and the effect
of the class, as well as the effect of each school-and-class combination? Unfortunately, this is not easily
done with the standard <code>glm</code>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.5</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="n">school</span><span class="o">*</span><span class="n">class</span> <span class="o">-</span> <span class="m">1</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span> <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.5</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## glm(formula = extro ~ open + agree + social + school * class - </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     1, data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##                  coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## open              0.01     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree            -0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social            0.00     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolI          39.96     0.36  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII         52.21     0.36  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII        56.78     0.36  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV         60.60     0.36  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV          64.86     0.36  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI         69.90     0.36  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## classb            5.79     0.20  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## classc            8.53     0.20  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## classd           10.39     0.20  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII:classb  -4.40     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII:classb -4.80     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV:classb  -4.74     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV:classb   -4.60     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI:classb  -3.87     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII:classc  -6.02     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII:classc -6.54     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV:classc  -6.36     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV:classc   -6.41     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI:classc  -3.65     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolII:classd  -6.81     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIII:classd -7.45     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolIV:classd  -7.24     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolV:classd   -6.93     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## schoolVI:classd   0.06     0.28  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   n = 1200, k = 27</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual deviance = 1135.9, null deviance = 4463029.9 (difference = 4461894.0)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   overdispersion parameter = 1.0</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual sd is sqrt(overdispersion) = 0.98</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">AIC</span><span class="p">(</span><span class="n">MLexamp.5</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] 3395.573</span></span></span></code></pre></div>
</figure>
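<p>The last two fits report the same residual deviance and the same AIC, which is what we would expect
if <code>school:class</code> and <code>school*class - 1</code> are two parameterizations of the same cell-means model. The
sketch below checks this on simulated data; the data frame <code>d</code>, the single covariate, and the effect
sizes are assumptions made for illustration.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated balanced design with one covariate (invented values)
set.seed(7)
d &lt;- expand.grid(subject = 1:20,
                 class   = c(&#34;a&#34;, &#34;b&#34;, &#34;c&#34;, &#34;d&#34;),
                 school  = c(&#34;I&#34;, &#34;II&#34;, &#34;III&#34;, &#34;IV&#34;, &#34;V&#34;, &#34;VI&#34;))
d$open  &lt;- rnorm(nrow(d), mean = 40, sd = 5)
d$extro &lt;- 40 + 5 * as.integer(d$school) + 2 * as.integer(d$class) +
  0.01 * d$open + rnorm(nrow(d))

cells   &lt;- glm(extro ~ open + school:class, data = d)
crossed &lt;- glm(extro ~ open + school * class - 1, data = d)

# Same fitted values, deviance and AIC: only the coefficient labels differ
all.equal(fitted(cells), fitted(crossed))
all.equal(deviance(cells), deviance(crossed))
all.equal(AIC(cells), AIC(crossed))

# Both fits estimate one covariate plus 24 cell parameters
c(cells = cells$rank, crossed = crossed$rank)

# With an intercept, one of the 24 cell dummies is redundant and is reported as NA
sum(is.na(coef(cells)))</code></pre></div>
</figure>
<p>This is also why the <code>school:class</code> output above lists only 23 interaction rows while both fits
report <code>k = 27</code>: three covariates plus 24 cell parameters, one of which is carried by the intercept.</p>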
<h2 id="exploring-random-slopes">
  <span class="heading-mark">Exploring Random Slopes</span>
  <a class="heading-anchor" href="#exploring-random-slopes" aria-label="Link to this section">#</a>
</h2>
<p>Another alternative is to fit a separate model for each of the school and class combinations. If we
believe the relationship between our variables may be highly dependent on the school and class
combination, we can simply fit a series of models and explore the parameter variation among them:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">plyr</span><span class="p">)</span> <span class="c1"># provides dlply() and .()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">modellist</span> <span class="o">&lt;-</span> <span class="nf">dlply</span><span class="p">(</span><span class="n">lmm.data</span><span class="p">,</span> <span class="n">.(school</span><span class="p">,</span> <span class="n">class</span><span class="p">),</span> <span class="kr">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> 
</span></span><span class="line"><span class="cl">                              <span class="nf">glm</span><span class="p">(</span><span class="n">extro</span><span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">x</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">modellist[[1]]</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## glm(formula = extro ~ open + agree + social, data = x)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 35.87     5.90  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.05     0.09  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree        0.02     0.10  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.01     0.03  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   n = 50, k = 4</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual deviance = 500.2, null deviance = 506.2 (difference = 5.9)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   overdispersion parameter = 10.9</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual sd is sqrt(overdispersion) = 3.30</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">modellist[[2]]</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## glm(formula = extro ~ open + agree + social, data = x)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 47.96     2.16  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open        -0.01     0.03  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree       -0.03     0.03  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social      -0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   n = 50, k = 4</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual deviance = 47.9, null deviance = 49.1 (difference = 1.2)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   overdispersion parameter = 1.0</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   residual sd is sqrt(overdispersion) = 1.02</span></span></span></code></pre></div>
</figure>
<p>We will discuss this strategy in more depth in future tutorials, including how to perform inference
on the list of models this command produces.</p>
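<p>As a rough sketch of what exploring the parameter variation can look like, the example below fits
one model per group and stacks the coefficients into a table with one row per group. It uses only
base R (<code>split</code> and <code>lapply</code> in place of <code>dlply</code>) and a small simulated data frame,
<code>sim</code>, whose names and values are made up for illustration and are not the tutorial data.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Simulated stand-in for lmm.data (hypothetical values, for illustration only)
set.seed(42)
sim &lt;- expand.grid(id = 1:30, school = c(&#34;I&#34;, &#34;II&#34;, &#34;III&#34;), class = c(&#34;a&#34;, &#34;b&#34;))
sim$open  &lt;- rnorm(nrow(sim), mean = 40, sd = 8)
sim$extro &lt;- 50 + 0.1 * sim$open + rnorm(nrow(sim), sd = 3)

# One data frame per school/class combination
groups &lt;- split(sim, list(sim$school, sim$class), drop = TRUE)

# Fit the same linear model within every group
fits &lt;- lapply(groups, function(d) lm(extro ~ open, data = d))

# One row of coefficients per group
coef_table &lt;- t(sapply(fits, coef))
print(round(coef_table, 2))

# Spread of each coefficient across groups
apply(coef_table, 2, sd)</code></pre></div>
</figure>
<p>Large differences between rows of the table would suggest that the relationship really does
depend on the group, which is the situation the varying slope models later in this tutorial are
meant to handle.</p>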
<h1 id="fit-a-varying-intercept-model-with-lmer">
  <span class="heading-mark">Fit a varying intercept model with lmer</span>
  <a class="heading-anchor" href="#fit-a-varying-intercept-model-with-lmer" aria-label="Link to this section">#</a>
</h1>
<p>Enter <code>lme4</code>. While all of the above techniques are valid approaches to this problem, they are not
necessarily the best approach when we are explicitly interested in variation among and by groups.
This is where a mixed-effect modeling framework is useful. Here we use the <code>lmer</code> function with the
familiar formula interface, but group-level variables are specified using a special syntax:
<code>(1|school)</code> tells <code>lmer</code> to fit a linear model with a varying-intercept group effect using the
variable <code>school</code>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.6</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">school</span><span class="p">),</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.6</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lmer(formula = extro ~ open + agree + social + (1 | school), </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 59.12     4.10  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree        0.03     0.02  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.00     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Error terms:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Groups   Name        Std.Dev.</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  school   (Intercept) 9.79    </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Residual             2.67    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## number of obs: 1200, groups: school, 6</span>
</span></span><span class="line"><span class="cl"><span class="c1">## AIC = 5836.1, DIC = 5788.9</span>
</span></span><span class="line"><span class="cl"><span class="c1">## deviance = 5806.5</span></span></span></code></pre></div>
</figure>
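<p>One way to read the error terms above is to ask what share of the variance left over after the
fixed effects sits between schools. The sketch below computes that share (often called the
intraclass correlation) by hand from the two standard deviations printed by <code>display</code>. The
variable names are illustrative, and the figures are simply copied from the output above.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Standard deviations copied from the display() output above
sd_school   &lt;- 9.79
sd_residual &lt;- 2.67

# Variances are the squared standard deviations
var_school   &lt;- sd_school^2
var_residual &lt;- sd_residual^2

# Share of the remaining variance that lies between schools
icc &lt;- var_school / (var_school + var_residual)
round(icc, 2)</code></pre></div>
</figure>
<p>With these values the ratio works out to roughly 0.93, so most of the remaining variation lies
between schools rather than within them.</p>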
<p>We can fit more than one group effect by adding a term for each grouping variable, here
<code>school</code> and <code>class</code>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.7</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">school</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">class</span><span class="p">),</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.7</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lmer(formula = extro ~ open + agree + social + (1 | school) + </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     (1 | class), data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 60.20     4.21  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree       -0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.00     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Error terms:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Groups   Name        Std.Dev.</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  school   (Intercept) 9.79    </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  class    (Intercept) 2.41    </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Residual             1.67    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## number of obs: 1200, groups: school, 6; class, 4</span>
</span></span><span class="line"><span class="cl"><span class="c1">## AIC = 4737.9, DIC = 4683.3</span>
</span></span><span class="line"><span class="cl"><span class="c1">## deviance = 4703.6</span></span></span></code></pre></div>
</figure>
<p>And finally, we can fit nested group effect terms through the following syntax:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.8</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">school</span><span class="o">/</span><span class="n">class</span><span class="p">),</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.8</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lmer(formula = extro ~ open + agree + social + (1 | school/class), </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 60.24     4.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.01     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree       -0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.00     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Error terms:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Groups       Name        Std.Dev.</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  class:school (Intercept) 2.86    </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  school       (Intercept) 9.69    </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Residual                 0.98    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## number of obs: 1200, groups: class:school, 24; school, 6</span>
</span></span><span class="line"><span class="cl"><span class="c1">## AIC = 3568.6, DIC = 3507.6</span>
</span></span><span class="line"><span class="cl"><span class="c1">## deviance = 3531.1</span></span></span></code></pre></div>
</figure>
<p>Here <code>(1|school/class)</code> says that we want varying intercepts (the <code>1|</code>) for schools and
for classes that are nested within schools.</p>
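<p>A way to see what the nesting syntax does: <code>lme4</code> expands <code>(1|school/class)</code> into
<code>(1|school) + (1|school:class)</code>, so the second grouping factor is every observed school and class
pair. The base R sketch below counts those pairs for a made-up roster with six schools and four
class labels; the labels themselves are hypothetical.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Hypothetical roster: 6 schools, each with classes labelled a to d
roster &lt;- expand.grid(school = c(&#34;I&#34;, &#34;II&#34;, &#34;III&#34;, &#34;IV&#34;, &#34;V&#34;, &#34;VI&#34;),
                      class  = c(&#34;a&#34;, &#34;b&#34;, &#34;c&#34;, &#34;d&#34;))

# Separate term (1|class): class &#34;a&#34; is the same group in every school
nlevels(roster$class)

# Nested term (1|school:class): class &#34;a&#34; in school I differs from &#34;a&#34; in school II
nested &lt;- interaction(roster$school, roster$class, drop = TRUE)
nlevels(nested)</code></pre></div>
</figure>
<p>This is why the model with separate <code>school</code> and <code>class</code> terms reports 4 class groups, while
the nested model reports 24 <code>class:school</code> groups.</p>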
<h1 id="fit-a-varying-slope-model-with-lmer">
  <span class="heading-mark">Fit a varying slope model with lmer</span>
  <a class="heading-anchor" href="#fit-a-varying-slope-model-with-lmer" aria-label="Link to this section">#</a>
</h1>
<p>But what if we want to explore how the effect of a student-level indicator varies across
schools and classrooms? Instead of fitting unique models by school (or school/class), we can fit a
varying slope model. Here we modify our random effect term to include variables before the grouping
terms: <code>(1 + open|school/class)</code> tells R to fit a varying slope and varying intercept model for
schools and classes nested within schools, allowing the slope of the <code>open</code> variable to vary at
both levels.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">MLexamp.9</span> <span class="o">&lt;-</span> <span class="nf">lmer</span><span class="p">(</span><span class="n">extro</span> <span class="o">~</span> <span class="n">open</span> <span class="o">+</span> <span class="n">agree</span> <span class="o">+</span> <span class="n">social</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span><span class="o">+</span><span class="n">open</span><span class="o">|</span><span class="n">school</span><span class="o">/</span><span class="n">class</span><span class="p">),</span> <span class="n">data</span><span class="o">=</span><span class="n">lmm.data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">display</span><span class="p">(</span><span class="n">MLexamp.9</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lmer(formula = extro ~ open + agree + social + (1 + open | school/class), </span>
</span></span><span class="line"><span class="cl"><span class="c1">##     data = lmm.data)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##             coef.est coef.se</span>
</span></span><span class="line"><span class="cl"><span class="c1">## (Intercept) 60.26     3.46  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## open         0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## agree       -0.01     0.01  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## social       0.00     0.00  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Error terms:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Groups       Name        Std.Dev. Corr </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  class:school (Intercept) 2.61          </span>
</span></span><span class="line"><span class="cl"><span class="c1">##               open        0.01     1.00 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  school       (Intercept) 8.33          </span>
</span></span><span class="line"><span class="cl"><span class="c1">##               open        0.00     1.00 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  Residual                 0.98          </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## number of obs: 1200, groups: class:school, 24; school, 6</span>
</span></span><span class="line"><span class="cl"><span class="c1">## AIC = 3574.9, DIC = 3505.6</span>
</span></span><span class="line"><span class="cl"><span class="c1">## deviance = 3529.3</span></span></span></code></pre></div>
</figure>
<h2 id="conclusion">
  <span class="heading-mark">Conclusion</span>
  <a class="heading-anchor" href="#conclusion" aria-label="Link to this section">#</a>
</h2>
<p>Fitting mixed effect models and exploring group level variation is very easy within the R language
and ecosystem. In future tutorials we will explore comparing across models, doing inference with
mixed-effect models, and creating graphical representations of mixed effect models to understand
their effects.</p>
<h2 id="appendix">
  <span class="heading-mark">Appendix</span>
  <a class="heading-anchor" href="#appendix" aria-label="Link to this section">#</a>
</h2>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">print</span><span class="p">(</span><span class="nf">sessionInfo</span><span class="p">(),</span><span class="n">locale</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## R version 3.5.3 (2019-03-11)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Platform: x86_64-w64-mingw32/x64 (64-bit)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Running under: Windows 10 x64 (build 17134)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Matrix products: default</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## attached base packages:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] stats     graphics  grDevices utils     datasets  methods   base     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## other attached packages:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] plyr_1.8.4    arm_1.10-1    MASS_7.3-51.4 lme4_1.1-21   Matrix_1.2-17 knitr_1.22   </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## loaded via a namespace (and not attached):</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [1] Rcpp_1.0.1      lattice_0.20-38 digest_0.6.18   grid_3.5.3      nlme_3.1-139    magrittr_1.5   </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [7] coda_0.19-2     evaluate_0.13   stringi_1.4.3   minqa_1.2.4     nloptr_1.2.1    boot_1.3-22    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [13] rmarkdown_1.12  splines_3.5.3   tools_3.5.3     stringr_1.4.0   abind_1.4-5     xfun_0.6       </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [19] yaml_2.2.0      compiler_3.5.3  htmltools_0.3.6</span></span></span></code></pre></div>
</figure>
<p>[1] Examples include <a href="http://stat.columbia.edu/~gelman/arm">Gelman and Hill</a>,
<a href="http://stat.columbia.edu/~gelman/book/">Gelman et al. 2013</a>, etc.</p>
<p><strong>Like this?</strong><br>
Then head over to the second part &#8211; <a href="https://www.civilytics.com/posts/2014/mixed-effects-tutorial-2-fun-with-mermod-objects/">using merMod objects in R.</a>
</p>
<p>Consider subscribing to our newsletter, <a href="https://civilytics.substack.com">the Civic Pulse</a>, for monthly emails about how data skills like these are being practiced in social policy.</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Update: Since this post was released I have co-authored an R package to make some of the items in this post easier to do. This package is called merTools and is available on CRAN and on GitHub. To read more about it, read my new post here and check out the package on GitHub. Analysts dealing with grouped data and complex hierarchical structures in their data ranging from measurements nested within participants, to counties nested within states or students nested within classrooms often find themselves in need of modeling tools to reflect this structure of their data. In R there are two predominant ways to fit multilevel models that account for such structure in the data. These tutorials will show the user how to use both the lme4 package in R to fit linear and nonlinear mixed effect models, and to use rstan to fit fully Bayesian multilevel models. The focus here will be on how to fit the models in R and not the theory behind the models. For background on multilevel modeling, see the references. [1]",
  "og_image": "https://jaredknowles.com/og/posts/getting-started-with-mixed-effect-models-in-r.png",
  "og_title": "Getting Started with Mixed Effect Models in R"
}
</posse:post></entry><entry><title>Latent Variable Analysis with R: Getting Setup with lavaan</title><link href="https://jaredknowles.com/posts/latent-variable-analysis-with-r-getting-setup-with-lavaan/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/latent-variable-analysis-with-r-getting-setup-with-lavaan/</id><published>2013-09-01T22:23:00Z</published><updated>2013-09-01T22:23:00Z</updated><category term="R"/><summary>For the analyst familiar with linear regression, fitting structural equation models can at first feel strange. In the R environment, fitting structural equation models involves learning new modeling syntax, new plotting syntax, and often a new data input method. However, after a quick reorientation to these differences, fitting structural equation models can be a powerful tool in the analyst’s toolkit.</summary><content type="html"><![CDATA[
<h1 id="getting-started-with-structural-equation-modeling-part-1">
  <span class="heading-mark">Getting Started with Structural Equation Modeling: Part 1</span>
  <a class="heading-anchor" href="#getting-started-with-structural-equation-modeling-part-1" aria-label="Link to this section">#</a>
</h1>
<h1 id="introduction">
  <span class="heading-mark">Introduction</span>
  <a class="heading-anchor" href="#introduction" aria-label="Link to this section">#</a>
</h1>
<p>For the analyst familiar with linear regression, fitting structural equation models
can at first feel strange. In the R environment, fitting structural equation models
involves learning new modeling syntax, new plotting syntax, and often a new
data input method. However, after a quick reorientation to these differences,
fitting structural equation models can be a powerful tool in
the analyst&#8217;s toolkit.</p>
<p>This tutorial will cover getting set up and running a few basic models using <code>lavaan</code>
in R.<a href="http://lavaan.ugent.be/" title="The lavaan homepage">1</a>
 Future tutorials will cover:</p>
<ul>
<li>constructing latent variables</li>
<li>comparing alternate models</li>
<li>multi-group analysis on larger datasets.</li>
</ul>
<h1 id="setting-up-your-environment">
  <span class="heading-mark">Setting up your enviRonment</span>
  <a class="heading-anchor" href="#setting-up-your-environment" aria-label="Link to this section">#</a>
</h1>
<p>Getting started using structural equation modeling (SEM) in R can be daunting. There are
many different packages for implementing SEM in R, and there are different features
of SEM that a user might be interested in implementing. A few packages you might come
across can be found on the <a href="http://cran.r-project.org/web/views/Psychometrics.html">CRAN Psychometrics Task View</a>.</p>
<p>For those who want to just dive in, the <code>lavaan</code> package seems to offer the most
comprehensive feature set for most SEM users and has a well-thought-out, easy-to-learn
syntax for describing SEM models. To install <code>lavaan</code>, we just run:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Main version</span>
</span></span><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;lavaan&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Or to install the dev version</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">install_github</span><span class="p">(</span><span class="s">&#34;yrosseel/lavaan&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<h1 id="read-in-the-data">
  <span class="heading-mark">Read in the data</span>
  <a class="heading-anchor" href="#read-in-the-data" aria-label="Link to this section">#</a>
</h1>
<p>Once we load up the lavaan package, we need to read in the dataset. <code>lavaan</code> accepts
two different types of data, either a standard R dataframe, or a variance-covariance
matrix. Since the latter is unfamiliar to us coming from the standard <code>lm</code> linear modeling
framework in R, we&#8217;ll start with reading in the simplest variance-covariance matrix
possible and running a path analysis model.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">lavaan</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">mat1</span> <span class="o">&lt;-</span> <span class="nf">matrix</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0.6</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0.33</span><span class="p">,</span> <span class="m">0.63</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> <span class="m">3</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="n">byrow</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">colnames</span><span class="p">(</span><span class="n">mat1</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="nf">rownames</span><span class="p">(</span><span class="n">mat1</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;ILL&#34;</span><span class="p">,</span> <span class="s">&#34;IMM&#34;</span><span class="p">,</span> <span class="s">&#34;DEP&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">myN</span> <span class="o">&lt;-</span> <span class="m">500</span>
</span></span><span class="line"><span class="cl"><span class="nf">print</span><span class="p">(</span><span class="n">mat1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##      ILL  IMM DEP</span>
</span></span><span class="line"><span class="cl"><span class="c1">## ILL 1.00 0.00   0</span>
</span></span><span class="line"><span class="cl"><span class="c1">## IMM 0.60 1.00   0</span>
</span></span><span class="line"><span class="cl"><span class="c1">## DEP 0.33 0.63   1</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Note that we only input the lower triangle of the matrix. This is</span>
</span></span><span class="line"><span class="cl"><span class="c1"># sufficient though we could put the whole matrix in if we like</span></span></span></code></pre></div>
</figure>
<p>Now we have a variance-covariance matrix in our environment named <code>mat1</code> and a
variable <code>myN</code> corresponding to the number of observations in our dataset. Alternatively,
we could provide R with the full dataset and it can derive <code>mat1</code> and <code>myN</code> itself.</p>
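<p>The matrix above is filled in only below the diagonal. If you would rather hand the fitting
function an explicitly symmetric matrix, one base R approach is to mirror the lower triangle across
the diagonal. This is a minimal sketch; <code>mat_full</code> is a name introduced here for illustration.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Lower-triangular covariance matrix, as entered above
mat1 &lt;- matrix(c(1, 0, 0, 0.6, 1, 0, 0.33, 0.63, 1), 3, 3, byrow = TRUE)
colnames(mat1) &lt;- rownames(mat1) &lt;- c(&#34;ILL&#34;, &#34;IMM&#34;, &#34;DEP&#34;)

# Add the transpose to fill the upper triangle, then subtract the diagonal,
# which would otherwise be counted twice
mat_full &lt;- mat1 + t(mat1) - diag(diag(mat1))

# Confirm the result is symmetric and inspect it
isSymmetric(mat_full)
print(mat_full)</code></pre></div>
</figure>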
<p>With this data we can construct two possible models:</p>
<ol>
<li>Depression (DEP) influences Immune System (IMM) influences Illness (ILL)</li>
<li>IMM influences ILL influences DEP</li>
</ol>
<p>Using SEM we can evaluate which model best explains the covariances we observe in
our data above. Fitting models in <code>lavaan</code> is a two-step process. First, we create
a text string that serves as the <code>lavaan</code> model and follows the <code>lavaan</code>
<a href="http://www.inside-r.org/packages/cran/lavaan/docs/model.syntax">model syntax</a>. Next, we
give <code>lavaan</code> the instructions on how to fit this model to the data using either the
<code>cfa</code>, <code>lavaan</code>, or <code>sem</code> functions. Here we will use the <code>sem</code> function. Other
functions will be covered in a future post.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Specify the model</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">mod1</span> <span class="o">&lt;-</span> <span class="s">&#34;ILL ~ IMM \n        IMM ~ DEP&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Give lavaan the command to fit the model</span>
</span></span><span class="line"><span class="cl"><span class="n">mod1fit</span> <span class="o">&lt;-</span> <span class="nf">sem</span><span class="p">(</span><span class="n">mod1</span><span class="p">,</span> <span class="n">sample.cov</span> <span class="o">=</span> <span class="n">mat1</span><span class="p">,</span> <span class="n">sample.nobs</span> <span class="o">=</span> <span class="m">500</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Specify model 2</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">mod2</span> <span class="o">&lt;-</span> <span class="s">&#34;DEP ~ ILL\n        ILL ~ IMM&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">mod2fit</span> <span class="o">&lt;-</span> <span class="nf">sem</span><span class="p">(</span><span class="n">mod2</span><span class="p">,</span> <span class="n">sample.cov</span> <span class="o">=</span> <span class="n">mat1</span><span class="p">,</span> <span class="n">sample.nobs</span> <span class="o">=</span> <span class="m">500</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p>Now we have two objects stored in our environment for each model: the
model string and the model fit object. The model fit objects (<code>mod1fit</code> and <code>mod2fit</code>)
are <code>lavaan</code> class objects. These are S4 objects with many supported methods, including
the <code>summary</code> method, which provides a lot of useful output:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Summarize the model fit</span>
</span></span><span class="line"><span class="cl"><span class="nf">summary</span><span class="p">(</span><span class="n">mod1fit</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lavaan (0.5-14) converged normally after  12 iterations</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Number of observations                           500</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Estimator                                         ML</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Minimum Function Test Statistic                2.994</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Degrees of freedom                                 1</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   P-value (Chi-square)                           0.084</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Parameter estimates:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Information                                 Expected</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Standard Errors                             Standard</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##                    Estimate  Std.err  Z-value  P(&gt;|z|)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Regressions:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   ILL ~</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     IMM               0.600    0.036   16.771    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   IMM ~</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     DEP               0.630    0.035   18.140    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Variances:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     ILL               0.639    0.040</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     IMM               0.602    0.038</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">summary</span><span class="p">(</span><span class="n">mod2fit</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## lavaan (0.5-14) converged normally after  11 iterations</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Number of observations                           500</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Estimator                                         ML</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Minimum Function Test Statistic              198.180</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Degrees of freedom                                 1</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   P-value (Chi-square)                           0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Parameter estimates:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Information                                 Expected</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Standard Errors                             Standard</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##                    Estimate  Std.err  Z-value  P(&gt;|z|)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Regressions:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   DEP ~</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     ILL               0.330    0.042    7.817    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   ILL ~</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     IMM               0.600    0.036   16.771    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## Variances:</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     DEP               0.889    0.056</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     ILL               0.639    0.040</span></span></span></code></pre></div>
</figure>
<p>One of the best ways to understand an SEM model is to inspect the model visually
using a path diagram. Thanks to the <code>semPlot</code> package,
this is easy to do in R.<a href="http://sachaepskamp.com/semPlot/" title="The semPlot homepage">2</a>
 First, install <code>semPlot</code>:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Official version</span>
</span></span><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;semPlot&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Or to install the dev version</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">install_github</span><span class="p">(</span><span class="s">&#34;SachaEpskamp/semPlot&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p>Next we load the library and make some path diagrams.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">semPlot</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">semPaths</span><span class="p">(</span><span class="n">mod1fit</span><span class="p">,</span> <span class="n">what</span> <span class="o">=</span> <span class="s">&#34;est&#34;</span><span class="p">,</span> <span class="n">layout</span> <span class="o">=</span> <span class="s">&#34;tree&#34;</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">style</span> <span class="o">=</span> <span class="s">&#34;LISREL&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame">
        <img src="/posts/latent-variable-analysis-with-r-getting-setup-with-lavaan/plot-1.svg" alt="Path diagram of model 1: a chain of three observed variables in which DEP predicts IMM and IMM predicts ILL, drawn in LISREL style with parameter estimates on the paths." loading="lazy" decoding="async"></div></figure>
</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">semPaths</span><span class="p">(</span><span class="n">mod2fit</span><span class="p">,</span> <span class="n">what</span> <span class="o">=</span> <span class="s">&#34;est&#34;</span><span class="p">,</span> <span class="n">layout</span> <span class="o">=</span> <span class="s">&#34;tree&#34;</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">style</span> <span class="o">=</span> <span class="s">&#34;LISREL&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame">
        <img src="/posts/latent-variable-analysis-with-r-getting-setup-with-lavaan/plot-2.svg" alt="Path diagram of model 2: the same three observed variables arranged so that IMM predicts ILL and ILL predicts DEP, drawn in LISREL style with parameter estimates on the paths." loading="lazy" decoding="async"></div></figure>
</p>
<p>These two simple path models look great. But which is better? We can run a simple
chi-square test on the <code>lavaan</code> objects <code>mod1fit</code> and <code>mod2fit</code>.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">anova</span><span class="p">(</span><span class="n">mod1fit</span><span class="p">,</span> <span class="n">mod2fit</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Chi Square Difference Test</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##         Df  AIC  BIC  Chisq Chisq diff Df diff Pr(&gt;Chisq)    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## mod1fit  1 3786 3803   2.99                                  </span>
</span></span><span class="line"><span class="cl"><span class="c1">## mod2fit  1 3981 3998 198.18        195       0     &lt;2e-16 ***</span>
</span></span><span class="line"><span class="cl"><span class="c1">## ---</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</span></span></span></code></pre></div>
</figure>
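<p>One way to see where the gap in chi-square comes from is to check what each model
implies for the one correlation it does not estimate directly. In a three-variable
chain, the implied correlation between the two end variables is the product of the
two paths. The sketch below does that arithmetic in base R with the estimates printed
above. It assumes the variables are standardized (the variances are roughly 1), so each
slope can be read as a correlation; the variable names are illustrative only.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"># Observed correlations, read off the single-predictor estimates above
</span></span><span class="line"><span class="cl"># (assumption: standardized variables, so each slope equals a correlation)
</span></span><span class="line"><span class="cl">r_ill_imm &lt;- 0.60
</span></span><span class="line"><span class="cl">r_imm_dep &lt;- 0.63
</span></span><span class="line"><span class="cl">r_ill_dep &lt;- 0.33
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Model 1 (DEP to IMM to ILL) has no direct ILL-DEP link;
</span></span><span class="line"><span class="cl"># its implied correlation is the product of the two paths
</span></span><span class="line"><span class="cl">implied1 &lt;- r_ill_imm * r_imm_dep
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Model 2 (IMM to ILL to DEP) has no direct IMM-DEP link
</span></span><span class="line"><span class="cl">implied2 &lt;- r_ill_imm * r_ill_dep
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Residuals: observed minus implied for the omitted link
</span></span><span class="line"><span class="cl">resid1 &lt;- r_ill_dep - implied1
</span></span><span class="line"><span class="cl">resid2 &lt;- r_imm_dep - implied2
</span></span><span class="line"><span class="cl">round(c(model1 = resid1, model2 = resid2), 3)</span></span></code></pre></div>
</figure>
<p>Model 1 misses the observed ILL-DEP correlation by about 0.05, while Model 2 misses
the IMM-DEP correlation by about 0.43, which is consistent with the much larger
chi-square for <code>mod2fit</code>.</p>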
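<p>The comparison clearly favors Model 1: its chi-square (2.99 versus 198.18), AIC,
and BIC are all far lower. Both models have one degree of freedom, so the difference
test itself has zero degrees of freedom and its p-value is not meaningful; the AIC and
BIC are the better basis for the comparison. Let&#8217;s look at some properties of
Model 2, the poorly fitting model, that we can access through the <code>lavaan</code> object
with convenience functions. The same functions work on <code>mod1fit</code>.</p>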
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Goodness of fit measures</span>
</span></span><span class="line"><span class="cl"><span class="nf">fitMeasures</span><span class="p">(</span><span class="n">mod2fit</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##              fmin             chisq                df            pvalue </span>
</span></span><span class="line"><span class="cl"><span class="c1">##             0.198           198.180             1.000             0.000 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##    baseline.chisq       baseline.df   baseline.pvalue               cfi </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           478.973             3.000             0.000             0.586 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##               tli              nnfi               rfi               nfi </span>
</span></span><span class="line"><span class="cl"><span class="c1">##            -0.243            -0.243             1.000             0.586 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##              pnfi               ifi               rni              logl </span>
</span></span><span class="line"><span class="cl"><span class="c1">##             0.195             0.587             0.586         -1986.510 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## unrestricted.logl              npar               aic               bic </span>
</span></span><span class="line"><span class="cl"><span class="c1">##         -1887.420             4.000          3981.020          3997.878 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##            ntotal              bic2             rmsea    rmsea.ci.lower </span>
</span></span><span class="line"><span class="cl"><span class="c1">##           500.000          3985.182             0.628             0.556 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##    rmsea.ci.upper      rmsea.pvalue               rmr        rmr_nomean </span>
</span></span><span class="line"><span class="cl"><span class="c1">##             0.703             0.000             0.176             0.176 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##              srmr       srmr_nomean             cn_05             cn_01 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##             0.176             0.176            10.692            17.740 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##               gfi              agfi              pgfi               mfi </span>
</span></span><span class="line"><span class="cl"><span class="c1">##             0.821            -0.075             0.137             0.821 </span>
</span></span><span class="line"><span class="cl"><span class="c1">##              ecvi </span>
</span></span><span class="line"><span class="cl"><span class="c1">##             0.412</span></span></span></code></pre></div>
</figure>
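<p>Several of these measures are simple functions of a handful of the others, which
helps when deciding what to report. The base R sketch below recomputes a few of them
from the values printed above. The formulas are the standard textbook definitions,
assumed here rather than taken from the <code>lavaan</code> source, and the variable
names are illustrative.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"># Values copied from the fitMeasures(mod2fit) output above
</span></span><span class="line"><span class="cl">logl &lt;- -1986.510
</span></span><span class="line"><span class="cl">unrestricted_logl &lt;- -1887.420
</span></span><span class="line"><span class="cl">npar &lt;- 4
</span></span><span class="line"><span class="cl">n &lt;- 500
</span></span><span class="line"><span class="cl">df &lt;- 1
</span></span><span class="line"><span class="cl">baseline_chisq &lt;- 478.973
</span></span><span class="line"><span class="cl">baseline_df &lt;- 3
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Likelihood-ratio chi-square against the unrestricted (saturated) model
</span></span><span class="line"><span class="cl">chisq &lt;- 2 * (unrestricted_logl - logl)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Information criteria
</span></span><span class="line"><span class="cl">aic &lt;- -2 * logl + 2 * npar
</span></span><span class="line"><span class="cl">bic &lt;- -2 * logl + npar * log(n)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># RMSEA and CFI from the model and baseline chi-squares
</span></span><span class="line"><span class="cl">rmsea &lt;- sqrt(max((chisq - df) / (df * n), 0))
</span></span><span class="line"><span class="cl">cfi &lt;- 1 - max(chisq - df, 0) / max(chisq - df, baseline_chisq - baseline_df, 0)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">round(c(chisq = chisq, aic = aic, bic = bic, rmsea = rmsea, cfi = cfi), 3)</span></span></code></pre></div>
</figure>
<p>The results agree with the printed <code>chisq</code>, <code>aic</code>, <code>bic</code>,
<code>rmsea</code>, and <code>cfi</code> entries to rounding.</p>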
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Estimates of the model parameters</span>
</span></span><span class="line"><span class="cl"><span class="nf">parameterEstimates</span><span class="p">(</span><span class="n">mod2fit</span><span class="p">,</span> <span class="n">ci</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">boot.ci.type</span> <span class="o">=</span> <span class="s">&#34;norm&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   lhs op rhs   est    se      z pvalue ci.lower ci.upper</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1 DEP  ~ ILL 0.330 0.042  7.817      0    0.247    0.413</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2 ILL  ~ IMM 0.600 0.036 16.771      0    0.530    0.670</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 3 DEP ~~ DEP 0.889 0.056 15.811      0    0.779    1.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 4 ILL ~~ ILL 0.639 0.040 15.811      0    0.560    0.718</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 5 IMM ~~ IMM 0.998 0.000     NA     NA    0.998    0.998</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Modification indices</span>
</span></span><span class="line"><span class="cl"><span class="nf">modindices</span><span class="p">(</span><span class="n">mod2fit</span><span class="p">,</span> <span class="n">standardized</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">##    lhs op rhs    mi    epc sepc.lv sepc.all sepc.nox</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 1  DEP ~~ DEP   0.0  0.000   0.000    0.000    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 2  DEP ~~ ILL 163.6 -0.719  -0.719   -0.720   -0.720</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 3  DEP ~~ IMM 163.6  0.674   0.674    0.675    0.674</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 4  ILL ~~ ILL   0.0  0.000   0.000    0.000    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 5  ILL ~~ IMM    NA     NA      NA       NA       NA</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 6  IMM ~~ IMM   0.0  0.000   0.000    0.000    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 7  DEP  ~ ILL   0.0  0.000   0.000    0.000    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 8  DEP  ~ IMM 163.6  0.675   0.675    0.675    0.676</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 9  ILL  ~ DEP 163.6 -0.808  -0.808   -0.808   -0.808</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 10 ILL  ~ IMM   0.0  0.000   0.000    0.000    0.000</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 11 IMM  ~ DEP 143.8  0.666   0.666    0.666    0.666</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 12 IMM  ~ ILL   0.0  0.000   0.000    0.000    0.000</span></span></span></code></pre></div>
</figure>
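<p><code>fitMeasures()</code> also accepts a vector of measure names, which makes a
side-by-side comparison compact. The sketch below is self-contained, so it rebuilds
the input matrix: the correlations are an assumption reconstructed from the estimates
printed above, not copied from the original <code>mat1</code>, and <code>mat_assumed</code>,
<code>fits</code>, and <code>wanted</code> are illustrative names.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">library(lavaan)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Assumed input: correlations reconstructed from the estimates above
</span></span><span class="line"><span class="cl">vars &lt;- c(&#34;ILL&#34;, &#34;IMM&#34;, &#34;DEP&#34;)
</span></span><span class="line"><span class="cl">mat_assumed &lt;- matrix(c(1.00, 0.60, 0.33,
</span></span><span class="line"><span class="cl">                        0.60, 1.00, 0.63,
</span></span><span class="line"><span class="cl">                        0.33, 0.63, 1.00),
</span></span><span class="line"><span class="cl">                      nrow = 3, dimnames = list(vars, vars))
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Fit both chain models to the same matrix
</span></span><span class="line"><span class="cl">fits &lt;- list(
</span></span><span class="line"><span class="cl">  mod1 = sem(&#34;ILL ~ IMM \n IMM ~ DEP&#34;, sample.cov = mat_assumed, sample.nobs = 500),
</span></span><span class="line"><span class="cl">  mod2 = sem(&#34;DEP ~ ILL \n ILL ~ IMM&#34;, sample.cov = mat_assumed, sample.nobs = 500)
</span></span><span class="line"><span class="cl">)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Pull the same few measures from each fit into one table
</span></span><span class="line"><span class="cl">wanted &lt;- c(&#34;chisq&#34;, &#34;df&#34;, &#34;aic&#34;, &#34;bic&#34;, &#34;rmsea&#34;, &#34;cfi&#34;)
</span></span><span class="line"><span class="cl">comparison &lt;- sapply(fits, function(fit) unclass(fitMeasures(fit, wanted)))
</span></span><span class="line"><span class="cl">round(comparison, 3)</span></span></code></pre></div>
</figure>
<p>Rows are measures and columns are models, so a smaller <code>chisq</code> together with
lower <code>aic</code> and <code>bic</code> in one column points to the better-fitting
specification.</p>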
<p>That&#8217;s it. We have gone from inputting a variance-covariance matrix to fitting a model,
drawing a path diagram, comparing alternative models, and finally inspecting
the fit and parameters of a fitted model. <code>lavaan</code> is an amazing project which
adds great capabilities to R. These will be explored in future posts.</p>
<h1 id="appendix">
  <span class="heading-mark">Appendix</span>
  <a class="heading-anchor" href="#appendix" aria-label="Link to this section">#</a>
</h1>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">citation</span><span class="p">(</span><span class="n">package</span> <span class="o">=</span> <span class="s">&#34;lavaan&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## To cite lavaan in publications use:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Yves Rosseel (2012). lavaan: An R Package for Structural</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Equation Modeling. Journal of Statistical Software, 48(2), 1-36.</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   URL http://www.jstatsoft.org/v48/i02/.</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## A BibTeX entry for LaTeX users is</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   @Article{,</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     title = {{lavaan}: An {R} Package for Structural Equation Modeling},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     author = {Yves Rosseel},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     journal = {Journal of Statistical Software},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     year = {2012},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     volume = {48},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     number = {2},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     pages = {1--36},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     url = {http://www.jstatsoft.org/v48/i02/},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   }</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">citation</span><span class="p">(</span><span class="n">package</span> <span class="o">=</span> <span class="s">&#34;semPlot&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## To cite package &#39;semPlot&#39; in publications use:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   Sacha Epskamp (2013). semPlot: Path diagrams and visual analysis</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   of various SEM packages&#39; output. R package version 0.3.3.</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   https://github.com/SachaEpskamp/semPlot</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## A BibTeX entry for LaTeX users is</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">##   @Manual{,</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     title = {semPlot: Path diagrams and visual analysis of various SEM packages&#39; output},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     author = {Sacha Epskamp},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     year = {2013},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     note = {R package version 0.3.3},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##     url = {https://github.com/SachaEpskamp/semPlot},</span>
</span></span><span class="line"><span class="cl"><span class="c1">##   }</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## ATTENTION: This citation information has been auto-generated from</span>
</span></span><span class="line"><span class="cl"><span class="c1">## the package DESCRIPTION file and may need manual editing, see</span>
</span></span><span class="line"><span class="cl"><span class="c1">## &#39;help(&#34;citation&#34;)&#39;.</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">sessionInfo</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="c1">## R version 3.0.1 (2013-05-16)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## Platform: x86_64-w64-mingw32/x64 (64-bit)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## locale:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] LC_COLLATE=English_United States.1252 </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [2] LC_CTYPE=English_United States.1252   </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [3] LC_MONETARY=English_United States.1252</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [4] LC_NUMERIC=C                          </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [5] LC_TIME=English_United States.1252    </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## attached base packages:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] stats     graphics  grDevices utils     datasets  methods   base     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## other attached packages:</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [1] semPlot_0.3.3  lavaan_0.5-14  quadprog_1.5-5 pbivnorm_0.5-1</span>
</span></span><span class="line"><span class="cl"><span class="c1">## [5] mnormt_1.4-5   boot_1.3-9     MASS_7.3-28    knitr_1.4.1   </span>
</span></span><span class="line"><span class="cl"><span class="c1">## </span>
</span></span><span class="line"><span class="cl"><span class="c1">## loaded via a namespace (and not attached):</span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [1] car_2.0-18       cluster_1.14.4   colorspace_1.2-2 corpcor_1.6.6   </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [5] digest_0.6.3     ellipse_0.3-8    evaluate_0.4.7   formatR_0.9     </span>
</span></span><span class="line"><span class="cl"><span class="c1">##  [9] grid_3.0.1       Hmisc_3.12-2     igraph_0.6.5-2   jpeg_0.1-6      </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [13] lattice_0.20-23  lisrelToR_0.1.4  plyr_1.8         png_0.1-6       </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [17] psych_1.3.2      qgraph_1.2.3     rockchalk_1.8.0  rpart_4.1-2     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [21] sem_3.1-3        stats4_3.0.1     stringr_0.6.2    tools_3.0.1     </span>
</span></span><span class="line"><span class="cl"><span class="c1">## [25] XML_3.98-1.1</span></span></span></code></pre></div>
</figure>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Getting Started with Structural Equation Modeling Part 1 Getting Started with Structural Equation Modeling: Part 1 # Introduction # For the analyst familiar with linear regression fitting structural equation models can at first feel strange. In the R environment, fitting structural equation models involves learning new modeling syntax, new plotting syntax, and often a new data input method. However, a quick reorientation and soon the user is exposed to the differences, fitting structural equation models can be a powerful tool in the analyst’s toolkit.",
  "og_image": "https://jaredknowles.com/og/posts/latent-variable-analysis-with-r-getting-setup-with-lavaan.png",
  "og_title": "Latent Variable Analysis with R: Getting Setup with lavaan"
}
</posse:post></entry><entry><title>Writing a Minimal Working Example (MWE) in R</title><link href="https://jaredknowles.com/posts/writing-a-minimal-working-example-mwe-in-r/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/writing-a-minimal-working-example-mwe-in-r/</id><published>2013-05-27T22:26:22Z</published><updated>2013-05-27T22:26:22Z</updated><category term="R"/><summary>How to Ask for Help using R How to Ask for Help using R # The key to getting good help with an R problem is to provide a minimally working reproducible example (MWRE). Making an MWRE is really easy with R, and it will help ensure that those helping you can identify the source of the error, and ideally submit to you back the corrected code to fix the error instead of sending you hunting for code that works. To have an MWRE you need the following items:</summary><content type="html"><![CDATA[<p>How to Ask for Help using R</p>
<h1 id="how-to-ask-for-help-using-r">
  <span class="heading-mark">How to Ask for Help using R</span>
  <a class="heading-anchor" href="#how-to-ask-for-help-using-r" aria-label="Link to this section">#</a>
</h1>
<p>The key to getting good help with an R problem is to provide a minimal working
reproducible example (MWRE). Making an MWRE is really easy with R, and it helps
ensure that those helping you can identify the source of the error and,
ideally, send back corrected code instead of sending
you hunting for code that works. To have an MWRE you need the following items:</p>
<ul>
<li>a minimal dataset that produces the error</li>
<li>the minimal runnable code necessary to reproduce the error, run on the dataset
provided</li>
<li>the necessary information on the used packages, R version, and system</li>
<li>a <code>seed</code> value, if random properties are part of the code</li>
</ul>
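<p>Put together, the four items can fit in a dozen lines. The sketch below is a
hypothetical help request, not taken from a real question: the data frame <code>dat</code>,
its columns, and the problematic call are all made up for illustration.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"># 1. A seed, because the data are generated randomly
</span></span><span class="line"><span class="cl">set.seed(42)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># 2. A minimal dataset that triggers the problem (hypothetical columns)
</span></span><span class="line"><span class="cl">dat &lt;- data.frame(group = c(&#34;a&#34;, &#34;b&#34;, &#34;c&#34;, &#34;d&#34;),
</span></span><span class="line"><span class="cl">                  score = rnorm(4))
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># 3. The minimal code that reproduces it: the mean is taken of the
</span></span><span class="line"><span class="cl">#    wrong column, so R warns and returns NA
</span></span><span class="line"><span class="cl">result &lt;- mean(dat$group)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># 4. Package, R version, and system information
</span></span><span class="line"><span class="cl">info &lt;- sessionInfo()
</span></span><span class="line"><span class="cl">info</span></span></code></pre></div>
</figure>
<p>Anyone can paste this into a fresh R session and see the same warning, which is the
whole point of the exercise.</p>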
<p>Let&#8217;s look at the tools available in R to help us create each of these components
quickly and easily.</p>
<h3 id="producing-a-minimal-dataset">
  <span class="heading-mark">Producing a Minimal Dataset</span>
  <a class="heading-anchor" href="#producing-a-minimal-dataset" aria-label="Link to this section">#</a>
</h3>
<p>There are three distinct options here:</p>
<ol>
<li>Use a built-in R dataset</li>
<li>Create a new vector / data.frame from scratch</li>
<li>Output the data you are currently working on in a shareable way</li>
</ol>
<p>Let&#8217;s look at each of these in turn and see the tools R has to help us do this.</p>
<h4 id="built-in-datasets">
  <span class="heading-mark">Built-in Datasets</span>
  <a class="heading-anchor" href="#built-in-datasets" aria-label="Link to this section">#</a>
</h4>
<p>There are a few canonical built-in R datasets that are really attractive for use in
help requests.</p>
<ul>
<li>mtcars</li>
<li>diamonds (from ggplot2)</li>
<li>iris</li>
</ul>
<p>To see all the available datasets in R, simply type: <code>data()</code>. To load any of
these datasets, simply use the following:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">data</span><span class="p">(</span><span class="n">mtcars</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">mtcars</span><span class="p">)</span>  <span class="c1"># to look at the data</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">                   <span class="n">mpg</span> <span class="n">cyl</span> <span class="n">disp</span>  <span class="n">hp</span> <span class="n">drat</span>    <span class="n">wt</span>  <span class="n">qsec</span> <span class="n">vs</span> <span class="n">am</span> <span class="n">gear</span> <span class="n">carb</span>
</span></span><span class="line"><span class="cl"><span class="n">Mazda</span> <span class="n">RX4</span>         <span class="m">21.0</span>   <span class="m">6</span>  <span class="m">160</span> <span class="m">110</span> <span class="m">3.90</span> <span class="m">2.620</span> <span class="m">16.46</span>  <span class="m">0</span>  <span class="m">1</span>    <span class="m">4</span>    <span class="m">4</span>
</span></span><span class="line"><span class="cl"><span class="n">Mazda</span> <span class="n">RX4</span> <span class="n">Wag</span>     <span class="m">21.0</span>   <span class="m">6</span>  <span class="m">160</span> <span class="m">110</span> <span class="m">3.90</span> <span class="m">2.875</span> <span class="m">17.02</span>  <span class="m">0</span>  <span class="m">1</span>    <span class="m">4</span>    <span class="m">4</span>
</span></span><span class="line"><span class="cl"><span class="n">Datsun</span> <span class="m">710</span>        <span class="m">22.8</span>   <span class="m">4</span>  <span class="m">108</span>  <span class="m">93</span> <span class="m">3.85</span> <span class="m">2.320</span> <span class="m">18.61</span>  <span class="m">1</span>  <span class="m">1</span>    <span class="m">4</span>    <span class="m">1</span>
</span></span><span class="line"><span class="cl"><span class="n">Hornet</span> <span class="m">4</span> <span class="n">Drive</span>    <span class="m">21.4</span>   <span class="m">6</span>  <span class="m">258</span> <span class="m">110</span> <span class="m">3.08</span> <span class="m">3.215</span> <span class="m">19.44</span>  <span class="m">1</span>  <span class="m">0</span>    <span class="m">3</span>    <span class="m">1</span>
</span></span><span class="line"><span class="cl"><span class="n">Hornet</span> <span class="n">Sportabout</span> <span class="m">18.7</span>   <span class="m">8</span>  <span class="m">360</span> <span class="m">175</span> <span class="m">3.15</span> <span class="m">3.440</span> <span class="m">17.02</span>  <span class="m">0</span>  <span class="m">0</span>    <span class="m">3</span>    <span class="m">2</span>
</span></span><span class="line"><span class="cl"><span class="n">Valiant</span>           <span class="m">18.1</span>   <span class="m">6</span>  <span class="m">225</span> <span class="m">105</span> <span class="m">2.76</span> <span class="m">3.460</span> <span class="m">20.22</span>  <span class="m">1</span>  <span class="m">0</span>    <span class="m">3</span>    <span class="m">1</span></span></span></code></pre></div>
</figure>
<p>This option works great when your trouble is with a particular command in R.
It is less useful when a command you are familiar with won&#8217;t work on your own
data, since the problem may then lie in the data itself.</p>
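<p>As a rough sketch of what such a help request might contain, the snippet below
relies only on <code>mtcars</code> and base R, so anyone can run it unchanged. The model
formula and the object name <code>fit</code> are illustrative choices, not part of any
particular question.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># A self-contained example built on a dataset every R user already has
data(mtcars)

# Fit a simple linear model of fuel economy on weight
fit &lt;- lm(mpg ~ wt, data = mtcars)

# The call a reader would run to see the behavior being asked about
coef(fit)</code></pre></div>
</figure>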
<p>Note that for education data that is fairly “realistic”, there are built-in
simulated datasets in the <code>eeptools</code> package, created by Jared Knowles.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">eeptools</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">data</span><span class="p">(</span><span class="n">stulevel</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">names</span><span class="p">(</span><span class="n">stulevel</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"> <span class="n">[1]</span> <span class="s">&#34;X&#34;</span>           <span class="s">&#34;school&#34;</span>      <span class="s">&#34;stuid&#34;</span>       <span class="s">&#34;grade&#34;</span>       <span class="s">&#34;schid&#34;</span>      
</span></span><span class="line"><span class="cl"> <span class="n">[6]</span> <span class="s">&#34;dist&#34;</span>        <span class="s">&#34;white&#34;</span>       <span class="s">&#34;black&#34;</span>       <span class="s">&#34;hisp&#34;</span>        <span class="s">&#34;indian&#34;</span>     
</span></span><span class="line"><span class="cl"><span class="n">[11]</span> <span class="s">&#34;asian&#34;</span>       <span class="s">&#34;econ&#34;</span>        <span class="s">&#34;female&#34;</span>      <span class="s">&#34;ell&#34;</span>         <span class="s">&#34;disab&#34;</span>      
</span></span><span class="line"><span class="cl"><span class="n">[16]</span> <span class="s">&#34;sch_fay&#34;</span>     <span class="s">&#34;dist_fay&#34;</span>    <span class="s">&#34;luck&#34;</span>        <span class="s">&#34;ability&#34;</span>     <span class="s">&#34;measerr&#34;</span>    
</span></span><span class="line"><span class="cl"><span class="n">[21]</span> <span class="s">&#34;teachq&#34;</span>      <span class="s">&#34;year&#34;</span>        <span class="s">&#34;attday&#34;</span>      <span class="s">&#34;schoolscore&#34;</span> <span class="s">&#34;district&#34;</span>   
</span></span><span class="line"><span class="cl"><span class="n">[26]</span> <span class="s">&#34;schoolhigh&#34;</span>  <span class="s">&#34;schoolavg&#34;</span>   <span class="s">&#34;schoollow&#34;</span>   <span class="s">&#34;readSS&#34;</span>      <span class="s">&#34;mathSS&#34;</span>     
</span></span><span class="line"><span class="cl"><span class="n">[31]</span> <span class="s">&#34;proflvl&#34;</span>     <span class="s">&#34;race&#34;</span></span></span></code></pre></div>
</figure>
<h4 id="create-your-own-data">
  <span class="heading-mark">Create Your Own Data</span>
  <a class="heading-anchor" href="#create-your-own-data" aria-label="Link to this section">#</a>
</h4>
<p>Inputting data into R and sharing it back out with others is really easy. Part of
the power of R is the ability to create diverse data structures very easily.
Let&#8217;s create a simulated data frame of student test scores and demographics.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">Data</span> <span class="o">&lt;-</span> <span class="nf">data.frame</span><span class="p">(</span><span class="n">id</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1000</span><span class="p">),</span> <span class="n">gender</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;male&#34;</span><span class="p">,</span> <span class="s">&#34;female&#34;</span><span class="p">),</span> <span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">replace</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">),</span> <span class="n">mathSS</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="m">400</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="m">60</span><span class="p">),</span> <span class="n">readSS</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">mean</span> <span class="o">=</span> <span class="m">370</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="m">58.3</span><span class="p">),</span> <span class="n">race</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;H&#34;</span><span class="p">,</span> <span class="s">&#34;B&#34;</span><span class="p">,</span> <span class="s">&#34;W&#34;</span><span class="p">,</span> <span class="s">&#34;I&#34;</span><span class="p">,</span> <span class="s">&#34;A&#34;</span><span class="p">),</span> <span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">replace</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">Data</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">  <span class="n">id</span> <span class="n">gender</span> <span class="n">mathSS</span> <span class="n">readSS</span> <span class="n">race</span>
</span></span><span class="line"><span class="cl"><span class="m">1</span>  <span class="m">1</span> <span class="n">female</span>  <span class="m">396.6</span>  <span class="m">349.2</span>    <span class="n">H</span>
</span></span><span class="line"><span class="cl"><span class="m">2</span>  <span class="m">2</span>   <span class="n">male</span>  <span class="m">369.5</span>  <span class="m">330.7</span>    <span class="n">W</span>
</span></span><span class="line"><span class="cl"><span class="m">3</span>  <span class="m">3</span> <span class="n">female</span>  <span class="m">423.3</span>  <span class="m">354.3</span>    <span class="n">B</span>
</span></span><span class="line"><span class="cl"><span class="m">4</span>  <span class="m">4</span>   <span class="n">male</span>  <span class="m">348.7</span>  <span class="m">333.1</span>    <span class="n">W</span>
</span></span><span class="line"><span class="cl"><span class="m">5</span>  <span class="m">5</span>   <span class="n">male</span>  <span class="m">299.7</span>  <span class="m">353.4</span>    <span class="n">H</span>
</span></span><span class="line"><span class="cl"><span class="m">6</span>  <span class="m">6</span> <span class="n">female</span>  <span class="m">338.0</span>  <span class="m">422.1</span>    <span class="n">I</span></span></span></code></pre></div>
</figure>
<p>And, just like that, we have simulated student data. This is a great way to
evaluate problems with plotting data or with large datasets, since we can ask
R to generate a random dataset that is incredibly large if necessary. Before
relying on it, though, let&#8217;s look at the relationships among our variables
using a quick plot:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">Data</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">mathSS</span><span class="p">,</span> <span class="n">readSS</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="n">race</span><span class="p">))</span> <span class="o">+</span> <span class="nf">geom_point</span><span class="p">()</span> <span class="o">+</span> <span class="nf">theme_bw</span><span class="p">()</span></span></span></code></pre></div>
</figure>
<p><figure class="post-figure">
  <div class="photo-frame"><img src="/posts/writing-a-minimal-working-example-mwe-in-r/plot-1.png" alt="Scatterplot of simulated reading scores against math scores, colored by race group — a diffuse cloud with no visible relationship." width="504" height="504" loading="lazy" decoding="async"></div></figure>
</p>
<p>It looks like race is pretty evenly distributed and there is no relationship
between <code>mathSS</code> and <code>readSS</code>. For some applications this data is sufficient, but
for others we may wish for data that is more realistic.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">table</span><span class="p">(</span><span class="n">Data</span><span class="o">$</span><span class="n">race</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">  <span class="n">A</span>   <span class="n">B</span>   <span class="n">H</span>   <span class="n">I</span>   <span class="n">W</span> 
</span></span><span class="line"><span class="cl"><span class="m">192</span> <span class="m">195</span> <span class="m">202</span> <span class="m">203</span> <span class="m">208</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">cor</span><span class="p">(</span><span class="n">Data</span><span class="o">$</span><span class="n">mathSS</span><span class="p">,</span> <span class="n">Data</span><span class="o">$</span><span class="n">readSS</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">[1]</span> <span class="m">-0.01236</span></span></span></code></pre></div>
</figure>
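<p>If more realistic data is needed, one option is to fix the random seed so that
everyone gets the same draws, and to build one score partly from the other so
the two are correlated. The sketch below assumes a target correlation of 0.7 and
a seed of 2013; both values are arbitrary, and the names <code>Data2</code> and
<code>rho</code> are hypothetical.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">set.seed(2013)  # arbitrary seed so the simulated values are reproducible
n &lt;- 1000
rho &lt;- 0.7      # assumed target correlation between the two scores

# Draw standardized math scores, then build reading scores that share
# a common component with math plus independent noise
z_math &lt;- rnorm(n)
z_read &lt;- rho * z_math + sqrt(1 - rho^2) * rnorm(n)

Data2 &lt;- data.frame(
  id     = seq_len(n),
  gender = sample(c("male", "female"), n, replace = TRUE),
  mathSS = 400 + 60 * z_math,
  readSS = 370 + 58.3 * z_read,
  race   = sample(c("H", "B", "W", "I", "A"), n, replace = TRUE)
)

cor(Data2$mathSS, Data2$readSS)  # should land near rho</code></pre></div>
</figure>
<p>Because the seed is set at the top, anyone who pastes this code gets the same
<code>Data2</code>, which makes it easier for others to reproduce exactly what you see.</p>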
<h4 id="output-your-current-data">
  <span class="heading-mark">Output Your Current Data</span>
  <a class="heading-anchor" href="#output-your-current-data" aria-label="Link to this section">#</a>
</h4>
<p>Sometimes you just want to show others the data you are using and see why
your code won&#8217;t work on it. The best practice here is to make a subset of the data
you are working on, and then output it using the <code>dput</code> command.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">dput</span><span class="p">(</span><span class="nf">head</span><span class="p">(</span><span class="n">stulevel</span><span class="p">,</span> <span class="m">5</span><span class="p">))</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">structure</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">X</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">44L</span><span class="p">,</span> <span class="m">53L</span><span class="p">,</span> <span class="m">116L</span><span class="p">,</span> <span class="m">244L</span><span class="p">,</span> <span class="m">274L</span><span class="p">),</span> <span class="n">school</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">1L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">),</span> <span class="n">stuid</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">149995L</span><span class="p">,</span> <span class="m">13495L</span><span class="p">,</span> <span class="m">106495L</span><span class="p">,</span> <span class="m">45205L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">142705L</span><span class="p">),</span> <span class="n">grade</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">3L</span><span class="p">,</span> <span class="m">3L</span><span class="p">,</span> <span class="m">3L</span><span class="p">,</span> <span class="m">3L</span><span class="p">,</span> <span class="m">3L</span><span class="p">),</span> <span class="n">schid</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">495L</span><span class="p">,</span> <span class="m">495L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">495L</span><span class="p">,</span> <span class="m">205L</span><span class="p">,</span> <span class="m">205L</span><span class="p">),</span> <span class="n">dist</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">105L</span><span class="p">,</span> <span class="m">45L</span><span class="p">,</span> <span class="m">45L</span><span class="p">,</span> <span class="m">15L</span><span class="p">,</span> <span class="m">75L</span><span class="p">),</span> <span class="n">white</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">black</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">),</span> <span class="n">hisp</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">indian</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">asian</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">econ</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">),</span> <span class="n">female</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">ell</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">disab</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">sch_fay</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">dist_fay</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">luck</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">ability</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">87.8540493076978</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">97.7875614875502</span><span class="p">,</span> <span class="m">104.493033823157</span><span class="p">,</span> <span class="m">111.671512686787</span><span class="p">,</span> <span class="m">81.9253913501755</span>
</span></span><span class="line"><span class="cl"><span class="p">),</span> <span class="n">measerr</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">11.1332639734731</span><span class="p">,</span> <span class="m">6.8223938284885</span><span class="p">,</span> <span class="m">-7.85615858883968</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">-17.5741522573204</span><span class="p">,</span> <span class="m">52.9833376218976</span><span class="p">),</span> <span class="n">teachq</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">39.0902471213577</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">0.0984819168655733</span><span class="p">,</span> <span class="m">39.5388526976972</span><span class="p">,</span> <span class="m">24.1161227728637</span><span class="p">,</span> <span class="m">56.6806130368238</span>
</span></span><span class="line"><span class="cl"><span class="p">),</span> <span class="n">year</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">2000L</span><span class="p">,</span> <span class="m">2000L</span><span class="p">,</span> <span class="m">2000L</span><span class="p">,</span> <span class="m">2000L</span><span class="p">,</span> <span class="m">2000L</span><span class="p">),</span> <span class="n">attday</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">180L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">180L</span><span class="p">,</span> <span class="m">160L</span><span class="p">,</span> <span class="m">168L</span><span class="p">,</span> <span class="m">156L</span><span class="p">),</span> <span class="n">schoolscore</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">29.2242722609726</span><span class="p">,</span> <span class="m">55.9632592971956</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">55.9632592971956</span><span class="p">,</span> <span class="m">55.9632592971956</span><span class="p">,</span> <span class="m">55.9632592971956</span><span class="p">),</span> <span class="n">district</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">3L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">3L</span><span class="p">,</span> <span class="m">3L</span><span class="p">,</span> <span class="m">3L</span><span class="p">,</span> <span class="m">3L</span><span class="p">),</span> <span class="n">schoolhigh</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">schoolavg</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">1L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">1L</span><span class="p">),</span> <span class="n">schoollow</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">,</span> <span class="m">0L</span><span class="p">),</span> <span class="n">readSS</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">357.286464546893</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">263.904581222636</span><span class="p">,</span> <span class="m">369.672179143784</span><span class="p">,</span> <span class="m">346.595665384202</span><span class="p">,</span> <span class="m">373.125445669888</span>
</span></span><span class="line"><span class="cl"><span class="p">),</span> <span class="n">mathSS</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">387.280282915207</span><span class="p">,</span> <span class="m">302.572371332695</span><span class="p">,</span> <span class="m">365.461432571883</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">344.496386434725</span><span class="p">,</span> <span class="m">441.15810279391</span><span class="p">),</span> <span class="n">proflvl</span> <span class="o">=</span> <span class="nf">structure</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">2L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">3L</span><span class="p">,</span> <span class="m">2L</span><span class="p">,</span> <span class="m">2L</span><span class="p">,</span> <span class="m">2L</span><span class="p">),</span> <span class="n">.Label</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;advanced&#34;</span><span class="p">,</span> <span class="s">&#34;basic&#34;</span><span class="p">,</span> <span class="s">&#34;below basic&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="s">&#34;proficient&#34;</span><span class="p">),</span> <span class="n">class</span> <span class="o">=</span> <span class="s">&#34;factor&#34;</span><span class="p">),</span> <span class="n">race</span> <span class="o">=</span> <span class="nf">structure</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">2L</span><span class="p">,</span> <span class="m">2L</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">2L</span><span class="p">,</span> <span class="m">2L</span><span class="p">,</span> <span class="m">2L</span><span class="p">),</span> <span class="n">.Label</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;A&#34;</span><span class="p">,</span> <span class="s">&#34;B&#34;</span><span class="p">,</span> <span class="s">&#34;H&#34;</span><span class="p">,</span> <span class="s">&#34;I&#34;</span><span class="p">,</span> <span class="s">&#34;W&#34;</span><span class="p">),</span> <span class="n">class</span> <span class="o">=</span> <span class="s">&#34;factor&#34;</span><span class="p">)),</span> <span class="n">.Names</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;X&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="s">&#34;school&#34;</span><span class="p">,</span> <span class="s">&#34;stuid&#34;</span><span class="p">,</span> <span class="s">&#34;grade&#34;</span><span class="p">,</span> <span class="s">&#34;schid&#34;</span><span class="p">,</span> <span class="s">&#34;dist&#34;</span><span class="p">,</span> <span class="s">&#34;white&#34;</span><span class="p">,</span> <span class="s">&#34;black&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="s">&#34;hisp&#34;</span><span class="p">,</span> <span class="s">&#34;indian&#34;</span><span class="p">,</span> <span class="s">&#34;asian&#34;</span><span class="p">,</span> <span class="s">&#34;econ&#34;</span><span class="p">,</span> <span class="s">&#34;female&#34;</span><span class="p">,</span> <span class="s">&#34;ell&#34;</span><span class="p">,</span> <span class="s">&#34;disab&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="s">&#34;sch_fay&#34;</span><span class="p">,</span> <span class="s">&#34;dist_fay&#34;</span><span class="p">,</span> <span class="s">&#34;luck&#34;</span><span class="p">,</span> <span class="s">&#34;ability&#34;</span><span class="p">,</span> <span class="s">&#34;measerr&#34;</span><span class="p">,</span> <span class="s">&#34;teachq&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="s">&#34;year&#34;</span><span class="p">,</span> <span class="s">&#34;attday&#34;</span><span class="p">,</span> <span class="s">&#34;schoolscore&#34;</span><span class="p">,</span> <span class="s">&#34;district&#34;</span><span class="p">,</span> <span class="s">&#34;schoolhigh&#34;</span><span class="p">,</span> <span class="s">&#34;schoolavg&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="s">&#34;schoollow&#34;</span><span class="p">,</span> <span class="s">&#34;readSS&#34;</span><span class="p">,</span> <span class="s">&#34;mathSS&#34;</span><span class="p">,</span> <span class="s">&#34;proflvl&#34;</span><span class="p">,</span> <span class="s">&#34;race&#34;</span><span class="p">),</span> <span class="n">row.names</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"><span class="m">5L</span><span class="p">),</span> <span class="n">class</span> <span class="o">=</span> <span class="s">&#34;data.frame&#34;</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<p>The resulting code can be copied and pasted into an R terminal, and it will
rebuild the dataset exactly as shown. Note that in the above example it would
have been better to drop the variables that are unnecessary for my problem
before running the <code>dput</code> command. The goal is to share only the data
needed to reproduce your problem.</p>
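<p>As a rough sketch of that advice, the snippet below trims a data frame down to
the columns and rows it needs before calling <code>dput</code>. The <code>full</code> data frame
and its column names are made up for illustration; they stand in for whatever
larger dataset you are working with.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Hypothetical stand-in for a larger dataset with columns we do not need
full &lt;- data.frame(
  id     = 1:100,
  race   = rep(c(&#34;H&#34;, &#34;B&#34;, &#34;W&#34;, &#34;I&#34;, &#34;A&#34;), 20),
  mathSS = rep(c(400, 410, 395, 420, 388), 20),
  extra1 = 0,
  extra2 = &#34;not needed&#34;,
  stringsAsFactors = FALSE
)

# Keep only the variables involved in the problem, and only a few rows
small &lt;- head(full[, c(&#34;id&#34;, &#34;race&#34;, &#34;mathSS&#34;)], 5)

# Print the R code that rebuilds the trimmed data frame
dput(small)</code></pre></div>
</figure>
<p>Pasting the printed <code>structure(...)</code> call into a fresh R session should rebuild
<code>small</code> without the extra columns.</p>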
<p>Also note that we never send <strong>student-level</strong> data from LDS over e-mail,
as this is insecure. For work on student-level data, it is better either to
simulate the data or to use the built-in simulated data from the <code>eeptools</code>
package to run your examples.</p>
<h4 id="anonymizing-your-data">
  <span class="heading-mark">Anonymizing Your Data</span>
  <a class="heading-anchor" href="#anonymizing-your-data" aria-label="Link to this section">#</a>
</h4>
<p>You may also want to <code>dput</code> your data while keeping its contents
anonymous. A Google search turned up a decent-looking function to do this:</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">anonym</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kr">if</span> <span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="o">&gt;</span> <span class="m">26</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="kc">LETTERS</span> <span class="o">&lt;-</span> <span class="nf">replicate</span><span class="p">(</span><span class="nf">floor</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">/</span><span class="m">26</span><span class="p">),</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="kc">LETTERS</span> <span class="o">&lt;-</span> <span class="nf">c</span><span class="p">(</span><span class="kc">LETTERS</span><span class="p">,</span> <span class="nf">paste</span><span class="p">(</span><span class="kc">LETTERS</span><span class="p">,</span> <span class="kc">LETTERS</span><span class="p">,</span> <span class="n">sep</span> <span class="o">=</span> <span class="s">&#34;&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="p">})</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="nf">names</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="nf">paste</span><span class="p">(</span><span class="kc">LETTERS</span><span class="n">[1</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="n">]</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">level.id.df</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">level.id</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="kr">if</span> <span class="p">(</span><span class="nf">class</span><span class="p">(</span><span class="n">df[</span><span class="p">,</span> <span class="n">i]</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;factor&#34;</span> <span class="o">|</span> <span class="nf">class</span><span class="p">(</span><span class="n">df[</span><span class="p">,</span> <span class="n">i]</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;character&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="n">column</span> <span class="o">&lt;-</span> <span class="nf">paste</span><span class="p">(</span><span class="nf">names</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="n">[i]</span><span class="p">,</span> <span class="nf">as.numeric</span><span class="p">(</span><span class="nf">as.factor</span><span class="p">(</span><span class="n">df[</span><span class="p">,</span> <span class="n">i]</span><span class="p">)),</span> 
</span></span><span class="line"><span class="cl">                  <span class="n">sep</span> <span class="o">=</span> <span class="s">&#34;.&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span> <span class="kr">else</span> <span class="kr">if</span> <span class="p">(</span><span class="nf">is.numeric</span><span class="p">(</span><span class="n">df[</span><span class="p">,</span> <span class="n">i]</span><span class="p">))</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="n">column</span> <span class="o">&lt;-</span> <span class="n">df[</span><span class="p">,</span> <span class="n">i]</span><span class="o">/</span><span class="nf">mean</span><span class="p">(</span><span class="n">df[</span><span class="p">,</span> <span class="n">i]</span><span class="p">,</span> <span class="n">na.rm</span> <span class="o">=</span> <span class="bp">T</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span> <span class="kr">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="n">column</span> <span class="o">&lt;-</span> <span class="n">df[</span><span class="p">,</span> <span class="n">i]</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="kr">return</span><span class="p">(</span><span class="n">column</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">DF</span> <span class="o">&lt;-</span> <span class="nf">data.frame</span><span class="p">(</span><span class="nf">sapply</span><span class="p">(</span><span class="nf">seq_along</span><span class="p">(</span><span class="n">df</span><span class="p">),</span> <span class="n">level.id</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="nf">names</span><span class="p">(</span><span class="n">DF</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="nf">names</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="kr">return</span><span class="p">(</span><span class="n">DF</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="n">df</span> <span class="o">&lt;-</span> <span class="nf">level.id.df</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kr">return</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">test</span> <span class="o">&lt;-</span> <span class="nf">anonym</span><span class="p">(</span><span class="n">stulevel</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">test[</span><span class="p">,</span> <span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="o">:</span><span class="m">6</span><span class="p">,</span> <span class="m">28</span><span class="o">:</span><span class="m">32</span><span class="p">)</span><span class="n">]</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">                    <span class="n">B</span>                 <span class="n">C</span>                 <span class="n">D</span>
</span></span><span class="line"><span class="cl"><span class="m">1</span> <span class="m">0.00217632592657076</span>  <span class="m">1.51160611230132</span> <span class="m">0.551020408163265</span>
</span></span><span class="line"><span class="cl"><span class="m">2</span> <span class="m">0.00217632592657076</span> <span class="m">0.135998696526593</span> <span class="m">0.551020408163265</span>
</span></span><span class="line"><span class="cl"><span class="m">3</span> <span class="m">0.00217632592657076</span>  <span class="m">1.07322572705443</span> <span class="m">0.551020408163265</span>
</span></span><span class="line"><span class="cl"><span class="m">4</span> <span class="m">0.00217632592657076</span> <span class="m">0.455562880806568</span> <span class="m">0.551020408163265</span>
</span></span><span class="line"><span class="cl"><span class="m">5</span> <span class="m">0.00217632592657076</span>  <span class="m">1.43813960635994</span> <span class="m">0.551020408163265</span>
</span></span><span class="line"><span class="cl"><span class="m">6</span> <span class="m">0.00217632592657076</span> <span class="m">0.151115261535106</span> <span class="m">0.551020408163265</span>
</span></span><span class="line"><span class="cl">                  <span class="n">E</span>                 <span class="bp">F</span> <span class="n">BB</span>                <span class="n">CC</span>
</span></span><span class="line"><span class="cl"><span class="m">1</span>   <span class="m">1.3475499092559</span>  <span class="m">2.01923076923077</span>  <span class="m">0</span> <span class="m">0.720073808281278</span>
</span></span><span class="line"><span class="cl"><span class="m">2</span>   <span class="m">1.3475499092559</span> <span class="m">0.865384615384615</span>  <span class="m">0</span> <span class="m">0.531872308862454</span>
</span></span><span class="line"><span class="cl"><span class="m">3</span>   <span class="m">1.3475499092559</span> <span class="m">0.865384615384615</span>  <span class="m">0</span> <span class="m">0.745035931291952</span>
</span></span><span class="line"><span class="cl"><span class="m">4</span> <span class="m">0.558076225045372</span> <span class="m">0.288461538461538</span>  <span class="m">0</span> <span class="m">0.698527611516136</span>
</span></span><span class="line"><span class="cl"><span class="m">5</span> <span class="m">0.558076225045372</span>  <span class="m">1.44230769230769</span>  <span class="m">0</span> <span class="m">0.751995631770993</span>
</span></span><span class="line"><span class="cl"><span class="m">6</span>   <span class="m">1.3475499092559</span>  <span class="m">2.01923076923077</span>  <span class="m">0</span> <span class="m">0.880245964840198</span>
</span></span><span class="line"><span class="cl">                 <span class="n">DD</span>   <span class="n">EE</span>   <span class="n">FF</span>
</span></span><span class="line"><span class="cl"><span class="m">1</span> <span class="m">0.801153708902007</span> <span class="n">EE.2</span> <span class="n">FF.2</span>
</span></span><span class="line"><span class="cl"><span class="m">2</span> <span class="m">0.625921298341795</span> <span class="n">EE.3</span> <span class="n">FF.2</span>
</span></span><span class="line"><span class="cl"><span class="m">3</span> <span class="m">0.756017786295901</span> <span class="n">EE.2</span> <span class="n">FF.2</span>
</span></span><span class="line"><span class="cl"><span class="m">4</span> <span class="m">0.712648099763826</span> <span class="n">EE.2</span> <span class="n">FF.2</span>
</span></span><span class="line"><span class="cl"><span class="m">5</span> <span class="m">0.912608944625505</span> <span class="n">EE.2</span> <span class="n">FF.2</span>
</span></span><span class="line"><span class="cl"><span class="m">6</span> <span class="m">0.958626895492888</span> <span class="n">EE.4</span> <span class="n">FF.2</span></span></span></code></pre></div>
</figure>
<p>That looks pretty generic and anonymized to me!</p>
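<p>One thing to watch for: because the function above builds its result with
<code>sapply</code>, mixing text and numeric columns appears to turn every column into
text, which would explain the long strings of digits in the output. Below is a
simplified sketch of the same idea that uses <code>lapply</code> so numeric columns stay
numeric. The names <code>anonymize_sketch</code> and <code>toy</code> are hypothetical, and this is an
illustration under those assumptions rather than a replacement for the function above.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Hypothetical helper: rename columns V1, V2, ... and mask their contents
anonymize_sketch &lt;- function(df) {
  out &lt;- lapply(seq_along(df), function(i) {
    x &lt;- df[[i]]
    if (is.factor(x) || is.character(x)) {
      # Replace each label with the new column name plus a level number
      paste0(&#34;V&#34;, i, &#34;.&#34;, as.integer(as.factor(x)))
    } else if (is.numeric(x)) {
      # Rescale by the column mean (assumes the mean is not zero)
      x / mean(x, na.rm = TRUE)
    } else {
      x
    }
  })
  names(out) &lt;- paste0(&#34;V&#34;, seq_along(df))
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Made-up data for illustration only
toy &lt;- data.frame(
  school = c(&#34;East&#34;, &#34;West&#34;, &#34;East&#34;, &#34;North&#34;),
  score  = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
masked &lt;- anonymize_sketch(toy)
str(masked)</code></pre></div>
</figure>
<p>Rescaling by the mean hides the original units but keeps the shape of the
data, so treat this as masking for an example, not as a guarantee of privacy.</p>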
<h4 id="notes">
  <span class="heading-mark">Notes</span>
  <a class="heading-anchor" href="#notes" aria-label="Link to this section">#</a>
</h4>
<ul>
<li>Most of these solutions do not include missing data (NAs), which are often the
source of problems in R. That limits their usefulness.</li>
<li>So always check your example data for NA values.</li>
</ul>
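<p>A quick way to act on that note is to count the missing values in each column
before sharing an example. The data frame below is invented for illustration;
only base R functions are used.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Made-up example data containing a few missing values
example_df &lt;- data.frame(
  id     = 1:6,
  mathSS = c(410, NA, 395, 402, NA, 388),
  race   = c(&#34;H&#34;, &#34;B&#34;, NA, &#34;W&#34;, &#34;I&#34;, &#34;A&#34;),
  stringsAsFactors = FALSE
)

# Is anything missing at all?
anyNA(example_df)

# How many values are missing in each column?
na_counts &lt;- colSums(is.na(example_df))
na_counts

# Many summary functions return NA unless told to drop missing values
mean(example_df$mathSS)
mean(example_df$mathSS, na.rm = TRUE)</code></pre></div>
</figure>
<p>If your real data has NAs in the columns involved, make sure the example data
you share has them too, or the error may not reproduce.</p>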
<h3 id="creating-the-example">
  <span class="heading-mark">Creating the Example</span>
  <a class="heading-anchor" href="#creating-the-example" aria-label="Link to this section">#</a>
</h3>
<p>Once we have our minimal dataset, we need to reproduce our error on <em>that dataset.</em>
This part is critical. If the error goes away when you apply your code to the
minimal dataset, then it will be very hard for others to diagnose the problem
remotely, and it might be time to get some “at your desk” help.</p>
<p>Let&#8217;s look at an example where we have an error aggregating data. Let&#8217;s assume
I am creating a new data frame for my example, and trying to aggregate that data
by race.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">Data</span> <span class="o">&lt;-</span> <span class="nf">data.frame</span><span class="p">(</span><span class="n">id</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1000</span><span class="p">),</span> <span class="n">gender</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;male&#34;</span><span class="p">,</span> <span class="s">&#34;female&#34;</span><span class="p">),</span> <span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">replace</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">),</span> <span class="n">mathSS</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="m">400</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="m">60</span><span class="p">),</span> <span class="n">readSS</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">mean</span> <span class="o">=</span> <span class="m">370</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="m">58.3</span><span class="p">),</span> <span class="n">race</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;H&#34;</span><span class="p">,</span> <span class="s">&#34;B&#34;</span><span class="p">,</span> <span class="s">&#34;W&#34;</span><span class="p">,</span> <span class="s">&#34;I&#34;</span><span class="p">,</span> <span class="s">&#34;A&#34;</span><span class="p">),</span> <span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">replace</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">myAgg</span> <span class="o">&lt;-</span> <span class="n">Data[</span><span class="p">,</span> <span class="nf">list</span><span class="p">(</span><span class="n">meanM</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">mathSS</span><span class="p">)),</span> <span class="n">by</span> <span class="o">=</span> <span class="n">race]</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">Error</span><span class="o">:</span> <span class="n">unused</span> <span class="nf">argument</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="p">(</span><span class="n">by</span> <span class="o">=</span> <span class="n">race</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">myAgg</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">Error</span><span class="o">:</span> <span class="n">object</span> <span class="s">&#39;myAgg&#39;</span> <span class="n">not</span> <span class="n">found</span></span></span></code></pre></div>
</figure>
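<p>Why do I get an error? The code uses data.table syntax, but <code>Data</code> is still a
plain data frame and the data.table package was never loaded. If you sent the
above code to someone, they could quickly run it and spot the mistake, provided
they knew you were attempting to use the data.table package.</p>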
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">data.table</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">Data</span> <span class="o">&lt;-</span> <span class="nf">data.frame</span><span class="p">(</span><span class="n">id</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1000</span><span class="p">),</span> <span class="n">gender</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;male&#34;</span><span class="p">,</span> <span class="s">&#34;female&#34;</span><span class="p">),</span> <span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">replace</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">),</span> <span class="n">mathSS</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="m">400</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="m">60</span><span class="p">),</span> <span class="n">readSS</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">mean</span> <span class="o">=</span> <span class="m">370</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="m">58.3</span><span class="p">),</span> <span class="n">race</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;H&#34;</span><span class="p">,</span> <span class="s">&#34;B&#34;</span><span class="p">,</span> <span class="s">&#34;W&#34;</span><span class="p">,</span> <span class="s">&#34;I&#34;</span><span class="p">,</span> <span class="s">&#34;A&#34;</span><span class="p">),</span> <span class="m">1000</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">replace</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">Data</span> <span class="o">&lt;-</span> <span class="nf">data.table</span><span class="p">(</span><span class="n">Data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">myAgg</span> <span class="o">&lt;-</span> <span class="n">Data[</span><span class="p">,</span> <span class="nf">list</span><span class="p">(</span><span class="n">meanM</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">mathSS</span><span class="p">)),</span> <span class="n">by</span> <span class="o">=</span> <span class="n">race]</span>
</span></span><span class="line"><span class="cl"><span class="nf">head</span><span class="p">(</span><span class="n">myAgg</span><span class="p">)</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl">   <span class="n">race</span> <span class="n">meanM</span>
</span></span><span class="line"><span class="cl"><span class="m">1</span><span class="o">:</span>    <span class="n">H</span> <span class="m">398.6</span>
</span></span><span class="line"><span class="cl"><span class="m">2</span><span class="o">:</span>    <span class="n">B</span> <span class="m">405.1</span>
</span></span><span class="line"><span class="cl"><span class="m">3</span><span class="o">:</span>    <span class="n">A</span> <span class="m">397.8</span>
</span></span><span class="line"><span class="cl"><span class="m">4</span><span class="o">:</span>    <span class="n">W</span> <span class="m">395.7</span>
</span></span><span class="line"><span class="cl"><span class="m">5</span><span class="o">:</span>    <span class="n">I</span> <span class="m">399.1</span></span></span></code></pre></div>
</figure>
<h3 id="session-info">
  <span class="heading-mark">Session Info</span>
  <a class="heading-anchor" href="#session-info" aria-label="Link to this section">#</a>
</h3>
<p>However, the person helping you might not know which packages you were using,
so we need to provide one final piece of information. This is known as the
<code>sessionInfo</code> for our R session. To diagnose the error, it is necessary to know
what system you are running on, what packages are loaded in your workspace, and
what version of R and of a given package you are using.</p>
<p>Thankfully, R makes this incredibly easy. Just tack on the output from the
<code>sessionInfo()</code> function. This is easy enough to copy and paste or include in
a <code>knitr</code> document.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">sessionInfo</span><span class="p">()</span></span></span></code></pre></div>
</figure>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">R</span> <span class="n">version</span> <span class="m">2.15.2</span> <span class="p">(</span><span class="m">2012-10-26</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">Platform</span><span class="o">:</span> <span class="n">x86_64</span><span class="o">-</span><span class="n">w64</span><span class="o">-</span><span class="n">mingw32</span><span class="o">/</span><span class="nf">x64 </span><span class="p">(</span><span class="m">64</span><span class="o">-</span><span class="n">bit</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">locale</span><span class="o">:</span>
</span></span><span class="line"><span class="cl"><span class="n">[1]</span> <span class="n">LC_COLLATE</span><span class="o">=</span><span class="n">English_United</span> <span class="n">States.1252</span> 
</span></span><span class="line"><span class="cl"><span class="n">[2]</span> <span class="n">LC_CTYPE</span><span class="o">=</span><span class="n">English_United</span> <span class="n">States.1252</span>   
</span></span><span class="line"><span class="cl"><span class="n">[3]</span> <span class="n">LC_MONETARY</span><span class="o">=</span><span class="n">English_United</span> <span class="n">States.1252</span>
</span></span><span class="line"><span class="cl"><span class="n">[4]</span> <span class="n">LC_NUMERIC</span><span class="o">=</span><span class="n">C</span>                          
</span></span><span class="line"><span class="cl"><span class="n">[5]</span> <span class="n">LC_TIME</span><span class="o">=</span><span class="n">English_United</span> <span class="n">States.1252</span>    
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">attached</span> <span class="n">base</span> <span class="n">packages</span><span class="o">:</span>
</span></span><span class="line"><span class="cl"><span class="n">[1]</span> <span class="n">stats</span>     <span class="n">graphics</span>  <span class="n">grDevices</span> <span class="n">utils</span>     <span class="n">datasets</span>  <span class="n">methods</span>   <span class="n">base</span>     
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">other</span> <span class="n">attached</span> <span class="n">packages</span><span class="o">:</span>
</span></span><span class="line"><span class="cl"><span class="n">[1]</span> <span class="n">data.table_1.8.8</span> <span class="n">eeptools_0.2</span>     <span class="n">ggplot2_0.9.3.1</span>  <span class="n">knitr_1.2</span>       
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">loaded</span> <span class="n">via</span> <span class="n">a</span> <span class="nf">namespace </span><span class="p">(</span><span class="n">and</span> <span class="n">not</span> <span class="n">attached</span><span class="p">)</span><span class="o">:</span>
</span></span><span class="line"><span class="cl"> <span class="n">[1]</span> <span class="n">colorspace_1.2</span><span class="m">-2</span>   <span class="n">dichromat_2.0</span><span class="m">-0</span>    <span class="n">digest_0.6.3</span>      
</span></span><span class="line"><span class="cl"> <span class="n">[4]</span> <span class="n">evaluate_0.4.3</span>     <span class="n">formatR_0.7</span>        <span class="n">grid_2.15.2</span>       
</span></span><span class="line"><span class="cl"> <span class="n">[7]</span> <span class="n">gtable_0.1.2</span>       <span class="n">labeling_0.1</span>       <span class="n">MASS_7.3</span><span class="m">-23</span>       
</span></span><span class="line"><span class="cl"><span class="n">[10]</span> <span class="n">munsell_0.4</span>        <span class="n">plyr_1.8</span>           <span class="n">proto_0.3</span><span class="m">-10</span>      
</span></span><span class="line"><span class="cl"><span class="n">[13]</span> <span class="n">RColorBrewer_1.0</span><span class="m">-5</span> <span class="n">reshape2_1.2.2</span>     <span class="n">scales_0.2.3</span>      
</span></span><span class="line"><span class="cl"><span class="n">[16]</span> <span class="n">stringr_0.6.2</span>      <span class="n">tools_2.15.2</span></span></span></code></pre></div>
</figure>
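<p>If you would rather attach the session details as text than copy them from the
console, something like the following should work. It is a minimal sketch; the
variable names <code>info</code> and <code>session_text</code> are only illustrative.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Store the session details so individual pieces can be inspected
info &lt;- sessionInfo()
info$R.version$version.string   # the R version in use
names(info$otherPkgs)           # attached add-on packages (NULL if none)

# Capture the printed report as a character vector, one element per line
session_text &lt;- capture.output(print(info))

# Print it, ready to paste into an e-mail or forum post
writeLines(session_text)</code></pre></div>
</figure>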
<h3 id="resources">
  <span class="heading-mark">Resources</span>
  <a class="heading-anchor" href="#resources" aria-label="Link to this section">#</a>
</h3>
<p>For more information, visit:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example">http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example</a>
</li>
<li><a href="https://github.com/hadley/devtools/wiki/Reproducibility">https://github.com/hadley/devtools/wiki/Reproducibility</a>
</li>
<li><a href="http://stackoverflow.com/questions/10454973/how-to-create-example-data-set-from-private-data-replacing-variable-names-and-l/10458688#10458688">http://stackoverflow.com/questions/10454973/how-to-create-example-data-set-from-private-data-replacing-variable-names-and-l/10458688#10458688</a>
</li>
</ul>
<p>Get the source code for this blogpost in a Gist here: <a href="https://gist.github.com/jknowles/5659390">https://gist.github.com/jknowles/5659390</a>
</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "How to Ask for Help using R: The key to getting good help with an R problem is to provide a minimally working reproducible example (MWRE). Making an MWRE is really easy with R, and it will help ensure that those helping you can identify the source of the error, and ideally submit to you back the corrected code to fix the error instead of sending you hunting for code that works. To have an MWRE you need the following items:",
  "og_image": "https://jaredknowles.com/og/posts/writing-a-minimal-working-example-mwe-in-r.png",
  "og_title": "Writing a Minimal Working Example (MWE) in R"
}
</posse:post></entry><entry><title>Announcing eeptools 0.2</title><link href="https://jaredknowles.com/posts/announcing-eeptools-02/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/announcing-eeptools-02/</id><published>2013-04-05T01:10:00Z</published><updated>2013-04-05T01:10:00Z</updated><category term="R"/><summary>My R package eeptools has reached version 0.2. As with the last release, this is still a preliminary release which means that functionality is not full, function names and code behavior may still change from version to version, and I am still looking for suggestions and ideas to improve the package – check it out on GitHub . Below, here are the changes in the new version: ​ eeptools 0.2 +++++++++++++++++++++++++++++ New functions for building maps with shapefiles including mapmerge to merge a dataframe and a shapefile, and ggmapmerge to conver this to a document for making a map in ggplot2 statamode updated to allow for multiple methods for handling multiple modes remove_stars deleted and replaced with remove_char to allow for users to specify an arbitrary character string to be removed New plotForWord function to export plots in a Windows MetaFile for inclusion in Microsoft Office documents New age_calc function to allow calculating the age of a vector of birthdates relative to the current date Fix typos in documentation Fix startup message behavior Remove dependencies of the package dramatically so loading is faster and more lightweight</summary><content type="html"><![CDATA[<p>My R package <a href="http://cran.r-project.org/web/packages/eeptools/index.html">eeptools</a>
 has reached version 0.2. As with the last release, this is still a preliminary release, which means that functionality is incomplete, function names and code behavior may still change from version to version, and I am still looking for suggestions and ideas to improve the package &#8211; <a href="https://github.com/jknowles/eeptools">check it out on GitHub</a>
. Here are the changes in the new version:</p>
<p>eeptools 0.2</p>
<p>+++++++++++++++++++++++++++++</p>
<ul>
<li>New functions for building maps with shapefiles, including mapmerge to merge a dataframe and a shapefile, and ggmapmerge to convert this to a document for making a map in ggplot2</li>
<li>statamode updated to allow for multiple methods for handling multiple modes</li>
<li>remove_stars deleted and replaced with remove_char to allow users to specify an arbitrary character string to be removed</li>
<li>New plotForWord function to export plots in a Windows MetaFile for inclusion in Microsoft Office documents</li>
<li>New age_calc function to allow calculating the age of a vector of birthdates relative to the current date</li>
<li>Fix typos in documentation</li>
<li>Fix startup message behavior</li>
<li>Dramatically reduce the package&#8217;s dependencies so loading is faster and more lightweight</li>
</ul>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "My R package eeptools has reached version 0.2. As with the last release, this is still a preliminary release which means that functionality is not full, function names and code behavior may still change from version to version, and I am still looking for suggestions and ideas to improve the package – check it out on GitHub. Here are the changes in the new version: New functions for building maps with shapefiles including mapmerge to merge a dataframe and a shapefile, and ggmapmerge to convert this to a document for making a map in ggplot2; statamode updated to allow for multiple methods for handling multiple modes; remove_stars deleted and replaced with remove_char to allow for users to specify an arbitrary character string to be removed; New plotForWord function to export plots in a Windows MetaFile for inclusion in Microsoft Office documents; New age_calc function to allow calculating the age of a vector of birthdates relative to the current date; Fix typos in documentation; Fix startup message behavior; Remove dependencies of the package dramatically so loading is faster and more lightweight",
  "og_image": "https://jaredknowles.com/og/posts/announcing-eeptools-02.png",
  "og_title": "Announcing eeptools 0.2"
}
</posse:post></entry><entry><title>Tips and Tricks for HTML and R</title><link href="https://jaredknowles.com/posts/tips-and-tricks-for-html-and-r/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/tips-and-tricks-for-html-and-r/</id><published>2013-03-04T02:43:02Z</published><updated>2013-03-04T02:43:02Z</updated><summary>Over the past two months I have tried to convert completely to HTML5 slides, HTML reports and R + knitr. The switch from Sweave came with a few frustrations but I think overall it is way better - it is incredibly efficient and while some flexibility on report style is given up, a lot of speed is gained. To help make the change smoother, I have found that a few pieces of non-R code have really helped me make HTML reports have parity with what I was doing before in R. An example is the CSS header below.</summary><content type="html"><![CDATA[<p>Over the past two months I have tried to convert completely to <a href="http://slides.html5rocks.com/#landing-slide">HTML5 slides</a>
, HTML reports and R + <a href="http://yihui.name/knitr/">knitr</a>
. The switch from Sweave came with a few frustrations, but I think overall it is way better - it is incredibly efficient, and while some flexibility on report style is given up, a lot of speed is gained. To help make the change smoother, I have found that a few pieces of non-R code have really helped me give HTML reports parity with what I was doing before in R. An example is the CSS header below.</p>
<figure class="code-block" data-lang="r"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="o">&lt;</span><span class="n">style</span> <span class="n">type</span><span class="o">=</span><span class="s">&#34;text/css&#34;</span><span class="o">&gt;</span>  
</span></span><span class="line"><span class="cl"><span class="n">body</span><span class="p">,</span> <span class="n">td</span> <span class="p">{</span>  
</span></span><span class="line"><span class="cl">   <span class="n">font</span><span class="o">-</span><span class="n">size</span><span class="o">:</span> <span class="m">16</span><span class="n">px</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl">   <span class="n">font</span><span class="o">-</span><span class="n">family</span><span class="o">:</span> <span class="n">Times</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl"><span class="p">}</span>  
</span></span><span class="line"><span class="cl"><span class="n">code.r</span><span class="p">{</span>  
</span></span><span class="line"><span class="cl">  <span class="n">font</span><span class="o">-</span><span class="n">size</span><span class="o">:</span> <span class="m">10</span><span class="n">px</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl"><span class="p">}</span>  
</span></span><span class="line"><span class="cl"><span class="n">pre</span> <span class="p">{</span>  
</span></span><span class="line"><span class="cl">  <span class="n">font</span><span class="o">-</span><span class="n">size</span><span class="o">:</span> <span class="m">10</span><span class="n">px</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl">  <span class="n">font</span><span class="o">-</span><span class="n">family</span><span class="o">:</span> <span class="s">&#39;DejaVu Sans Mono&#39;</span><span class="p">,</span> <span class="s">&#39;Droid Sans Mono&#39;</span><span class="p">,</span> <span class="s">&#39;Lucida Console&#39;</span><span class="p">,</span> <span class="n">Consolas</span><span class="p">,</span> <span class="n">Monaco</span><span class="p">,</span> <span class="n">monospace</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl"><span class="p">}</span>  
</span></span><span class="line"><span class="cl"><span class="n">pre</span> <span class="n">code</span> <span class="p">{</span>  
</span></span><span class="line"><span class="cl">  <span class="n">font</span><span class="o">-</span><span class="n">size</span><span class="o">:</span> <span class="m">10</span><span class="n">px</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl">  <span class="n">font</span><span class="o">-</span><span class="n">family</span><span class="o">:</span> <span class="s">&#39;DejaVu Sans Mono&#39;</span><span class="p">,</span> <span class="s">&#39;Droid Sans Mono&#39;</span><span class="p">,</span> <span class="s">&#39;Lucida Console&#39;</span><span class="p">,</span> <span class="n">Consolas</span><span class="p">,</span> <span class="n">Monaco</span><span class="p">,</span> <span class="n">monospace</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl"><span class="p">}</span>  
</span></span><span class="line"><span class="cl"><span class="n">code</span> <span class="p">{</span>  
</span></span><span class="line"><span class="cl">  <span class="n">font</span><span class="o">-</span><span class="n">size</span><span class="o">:</span> <span class="m">16</span><span class="n">px</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl">  <span class="n">font</span><span class="o">-</span><span class="n">family</span><span class="o">:</span> <span class="n">Times</span><span class="p">;</span>  
</span></span><span class="line"><span class="cl"><span class="p">}</span>  
</span></span><span class="line"><span class="cl">  
</span></span><span class="line"><span class="cl"><span class="o">&lt;/</span><span class="n">style</span><span class="o">&gt;</span></span></span></code></pre></div>
</figure>
<p>By inserting this at the head of our .Rmd file, we can modify a number of style elements in the resulting output. Most importantly, we can modify font sizes.</p>
<p>The first rule, <code>body, td</code>, sets the font and font size for the body of the HTML. The second rule, <code>code.r</code>, controls the appearance of the R code.</p>
<p>These modifications are really helpful for customizing the look and feel of an HTML file you might want to share for publication, or for aligning it with your organization&#8217;s internal style sheet. You might also want to increase font sizes if you are converting your .Rmd file to HTML5 slides.</p>
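<p>As a rough sketch of where the header goes, a minimal .Rmd file might look like the one below. The report title, the chunk label, and the use of R&#8217;s built-in <code>cars</code> data are placeholders for illustration, and only two of the style rules are repeated to keep it short.</p>
<figure class="code-block" data-lang="markdown"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown">&lt;style type=&#34;text/css&#34;&gt;
body, td {
   font-size: 16px;
}
code.r {
  font-size: 10px;
}
&lt;/style&gt;

# Example report

Introductory text, rendered at the body font size.

&#96;&#96;&#96;{r example-chunk}
# Placeholder chunk: any R code works here
summary(cars)
&#96;&#96;&#96;
</code></pre></div>
</figure>
<p>Because the style block is plain HTML, it should pass through to the output untouched when the file is knit to HTML. Whether the <code>code.r</code> selector matches depends on the class names your toolchain puts on code elements, so check the generated HTML if a rule seems to have no effect.</p>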
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Over the past two months I have tried to convert completely to HTML5 slides, HTML reports and R + knitr. The switch from Sweave came with a few frustrations but I think overall it is way better - it is incredibly efficient and while some flexibility on report style is given up, a lot of speed is gained. To help make the change smoother, I have found that a few pieces of non-R code have really helped me make HTML reports have parity with what I was doing before in R. An example is the CSS header below.",
  "og_image": "https://jaredknowles.com/og/posts/tips-and-tricks-for-html-and-r.png",
  "og_title": "Tips and Tricks for HTML and R"
}
</posse:post></entry><entry><title>R Bootcamp Materials!</title><link href="https://jaredknowles.com/posts/r-bootcamp-materials/" rel="alternate" type="text/html"/><id>https://jaredknowles.com/posts/r-bootcamp-materials/</id><published>2013-02-25T05:08:27Z</published><updated>2013-02-25T05:08:27Z</updated><category term="R"/><summary>Learn about ColoRs in R! Analyze model results with custom functions. Good and Bad Graphics. To train new employees at the Wisconsin Department of Public Instruction, I have developed a 2-3 day series of training modules on how to get work done in R. These modules cover everything from setting up and installing R and RStudio to creating reproducible analyses using the knitr package. There are also some experimental modules for introductions to basic computer programming, and a refresher course on statistics. I hope to improve both of these over time.</summary><content type="html"><![CDATA[<p>
<figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/r-bootcamp-materials/r-bootcamp-materials-01-slides6-colorwheel1.png" alt="​Learn about ColoRs in R!" loading="lazy" decoding="async"></div></figure>
</p>
<p>​Learn about ColoRs in R!</p>
<p>
<figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/r-bootcamp-materials/r-bootcamp-materials-02-slides5-betterquantileplot2.png" alt="​Analyze model results with custom functions." loading="lazy" decoding="async"></div></figure>
</p>
<p>​Analyze model results with custom functions.</p>
<p>
<figure class="post-figure">
  <div class="photo-frame"><img src="https://media.jaredknowles.com/posts/r-bootcamp-materials/r-bootcamp-materials-03-slides6-ggplot2plottypesadv.png" alt="​Good and Bad Graphics" loading="lazy" decoding="async"></div></figure>
</p>
<p>​Good and Bad Graphics</p>
<p>
To train new employees at the Wisconsin Department of Public Instruction, I have developed a 2-3 day series of training modules on how to get work done in R. These modules cover everything from setting up and installing R and RStudio to creating reproducible analyses using the <a href="http://yihui.name/knitr/"><strong>knitr</strong></a>
 package. There are also some experimental modules for introductions to basic computer programming, and a refresher course on statistics. I hope to improve both of these over time.</p>
<p>I am happy to announce that all of these materials are available online, for free.</p>
<p>The bootcamp covers the following topics:</p>
<ol>
<li><strong>Introduction to R:</strong> History of R, R as a programming language, and features of R.</li>
<li><strong>Getting Data In:</strong> How to import data into R, manipulate, and manage multiple data objects.</li>
<li><strong>Sorting and Reshaping Data:</strong> Long to wide, wide to long, and everything in between!</li>
<li><strong>Cleaning Education Data:</strong> Includes material from the Strategic Data Project about how to implement common business rules in processing administrative data.</li>
<li><strong>Regression and Basic Analytics in R:</strong> Using school mean test scores to do OLS regression and regression diagnostics &#8211; a real world example.</li>
<li><strong>Visualizing Data:</strong> Harness the power of R&#8217;s data visualization packages to make compelling and informative visualizations.</li>
<li><strong>Exporting Your Work:</strong> Learn the <strong>knitr</strong> package, and how to export graphics, and create PDF reports.</li>
<li><strong>Advanced Topics:</strong> A potpourri of advanced features in R (by request)</li>
<li><strong>A Statistics Refresher:</strong> With <a href="http://jaredknowles.com/journal/2013/1/8/shiny-apps">interactive examples</a>
 using <strong><a href="http://www.rstudio.com/shiny">shiny</a></strong></li>
<li><strong>Programming Principles:</strong> Tips and pointers about writing code. (Needs work)</li>
</ol>
<p>The best part is, all of the materials are available online and free of charge! (<a href="/r-bootcamp">Check out the R Bootcamp page</a>
). They are constantly evolving. We have done two R Bootcamps so far, and hope to do more. Each time the materials get a little better. ​</p>
<p>For those not ready for a full 2 to 3 day training, together with a colleague at work (<a href="http://rprogramming.net/">Justin Meyer of RProgramming.net</a>
) we have created a 2-3 hour introduction that is also available on the webpage.</p>
<p>And, of course, all the materials are online on <a href="https://github.com/jknowles/r_tutorial_ed">GitHub</a>
. Look for future blog posts on tips for running an R bootcamp and some practical advice. For now, enjoy the materials, and feel free to leave a comment here for feedback, fork the GitHub repo, make a pull request, or take and adapt the materials however you see fit! One parting piece of advice though &#8211; don&#8217;t wait until day two for the data visualization module &#8211; give them the <strong>ggplot2</strong> goodness ASAP.</p>
]]></content><posse:post format="json">
{
  "attach_link": true,
  "format_string": "{{title}}\n\n{{summary}}",
  "og_description": "Learn about ColoRs in R! Analyze model results with custom functions. Good and Bad Graphics. To train new employees at the Wisconsin Department of Public Instruction, I have developed a 2-3 day series of training modules on how to get work done in R. These modules cover everything from setting up and installing R and RStudio to creating reproducible analyses using the knitr package. There are also some experimental modules for introductions to basic computer programming, and a refresher course on statistics. I hope to improve both of these over time.",
  "og_image": "https://jaredknowles.com/og/posts/r-bootcamp-materials.png",
  "og_title": "R Bootcamp Materials!"
}
</posse:post></entry></feed>