<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Zajichek Stats</title>
<link>https://www.zajichekstats.com/</link>
<atom:link href="https://www.zajichekstats.com/index.xml" rel="self" type="application/rss+xml"/>
<description>Statistician/data scientist in Central Wisconsin.</description>
<generator>quarto-1.7.33</generator>
<lastBuildDate>Mon, 05 Jan 2026 06:00:00 GMT</lastBuildDate>
<item>
  <title>Conceptualizing the readmission risk pool</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/conceptualizing-the-readmission-risk-pool/</link>
  <description><![CDATA[ 




<p>In a <a href="https://www.zajichekstats.com/post/managing-the-readmission-risk-pool/">prior article</a> the <em>readmission risk pool</em> was briefly described in the context of building a prediction framework for managing hospital readmissions. I wanted to go a bit more in depth about why this concept is important, especially when trying to seamlessly wrangle the data/reporting side with the day-to-day, operational side of preventing and managing readmissions.</p>
<section id="definition" class="level1">
<h1>What is the <em>readmission risk pool</em>?</h1>
<p>In its simplest form, I define it as <em>all patients <em>currently</em> at risk for readmission</em>.</p>
<p>Typically, we think of 30 days as the important time window for readmissions (although that’s arbitrary). But if we’re working in that context, then anyone who is currently (meaning as of <em>now</em>) at risk for a 30-day readmission is part of the risk pool. Some specific examples:</p>
<ul>
<li>A patient who was discharged 29 days ago and has not yet been readmitted is in the risk pool (for 1 more day)</li>
<li>A patient who was discharged yesterday is in the risk pool for 29 more days</li>
<li>A patient who was discharged 14 days ago but was readmitted 7 days ago is <em>no longer</em> in the risk pool, since they were already readmitted as of now (at least for that initial discharge)</li>
</ul>
<p>It all has to do with which patients have a theoretically non-zero probability of still being readmitted, as of <em>right now</em>.</p>
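<p>To make the three examples concrete, here is a minimal sketch in base R. The data frame, its column names, the <code>in_risk_pool()</code> helper, and the “as of” date are all hypothetical and exist only for illustration; the 30-day window is the default but is left as an argument.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical discharge records; all values are made up for illustration
discharges &lt;- data.frame(
  patient_id     = c("A", "B", "C"),
  discharge_date = as.Date(c("2025-12-07", "2026-01-04", "2025-12-22")),
  readmit_date   = as.Date(c(NA, NA, "2025-12-29"))
)

# TRUE when a discharge is inside the window and not yet followed by a readmission
in_risk_pool &lt;- function(discharge_date, readmit_date, as_of, window = 30) {
  days_out &lt;- as.numeric(as_of - discharge_date)
  already_readmitted &lt;- !is.na(readmit_date) &amp; readmit_date &lt;= as_of
  days_out &gt;= 0 &amp; days_out &lt; window &amp; !already_readmitted
}

# Evaluate the pool as of a fixed (hypothetical) date
as_of &lt;- as.Date("2026-01-05")
days_out &lt;- as.numeric(as_of - discharges$discharge_date)
discharges$in_pool &lt;- in_risk_pool(
  discharges$discharge_date,
  discharges$readmit_date,
  as_of
)

# Days remaining in the window for patients still in the pool
discharges$days_left &lt;- ifelse(discharges$in_pool, 30 - days_out, NA)
discharges</code></pre></div>
</details>
</div>
<p>Patients A and B mirror the first two bullets (1 and 29 days left, respectively), and patient C mirrors the third. Passing a different <code>window</code> (e.g., 60 or 90) changes who counts as being in the pool.</p>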
</section>
<section id="why-is-it-important" class="level1">
<h1>Why is it important?</h1>
<p>This is a very important concept to think about because it has all sorts of implications, particularly when trying to monitor/report data, and even more so when trying to equip actual care teams with data to intervene and prevent subsequent readmissions.</p>
<p>The key word is <em>action</em>. The patients in the risk pool are those that we are <em>still</em> able to intervene on (even theoretically, if not practically) to prevent the future event of another hospital admission.</p>
<p>It also matters for doing things like predictive analytics. When we’re trying to marry models we’ve built into systems and clinical workflows, we need to define which patients <em>should</em> receive risk scores from our models at which particular times. We may be able to “plug in” data to our models, but if it’s not in the right context, the output may be junk. This also includes the retrospective datasets we build to actually <em>train</em> such models: even if we’re extracting data to build a training dataset, we need it to reflect the nature of the thing we are going to try to predict in the future, so this concept of <em>currently at risk for readmission</em> needs to be applied from the lens of the data timepoints we are extracting as well.</p>
<p>That’s not to say the definition above is concrete and universal. There are many conceivable nuances that may make <em>your</em> definition of these things vary depending on the context:</p>
<ul>
<li><p>We may be interested in something other than 30-day readmissions (e.g., 60 or 90), or want to be agnostic to the specific time frame, generalizing what a readmission <em>is</em>. This will make our risk pool definition differ.</p></li>
<li><p>The thing we are seeking to measure may impact it. For example, we said above that a patient who has already been readmitted is no longer in the risk pool. But readmitted where? What if they were readmitted to an outside hospital, but we didn’t know that? According to our information, they haven’t been readmitted, but in reality they have been (and if it’s a Medicare patient in the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">Hospital Readmissions Reduction Program (HRRP)</a>, they <em>will</em> be counted). How should we handle this? How should we <em>model</em> this? Do we want to explicitly account for outside readmission risk in our models, metrics, analytics? Or are we going to define readmissions only in terms of what we measure with <em>our</em> systems?</p></li>
<li><p>What are the pragmatic considerations we must have? If we’re building a readmission prevention system with predictive models being a part of it, we need to consider the <em>whole</em> solution as part of the project. This means understanding workflow logistics, resource constraints, data capture and systems issues, etc., mapping these things out upfront, and designing the predictive modeling piece around those. It’s only a small piece of the puzzle. We can <em>theoretically</em> design a model that creates predictions at arbitrary timepoints, but maybe what works optimally would be a model that creates predictions at a single, fixed time point on a daily basis and gets delivered as an Excel spreadsheet into the inbox of a care team, because, for example, that would better accommodate more structure and predictability to have defined roles for what will be done with the information. How we end up actually designing a model (and defining the risk pool) may differ in these scenarios.</p></li>
</ul>
<p>All of these things depend on the context of the problem we’re trying to solve, the trade-offs involved in our interventions, and how things are going to be monitored and reported. Ultimately we want to create a well-oiled machine where the lineage from high-level readmission rates down to individual patient intervention is clear, and how we conceptualize the readmission risk pool is certainly part of that.</p>


<!-- -->

</section>

 ]]></description>
  <category>Readmissions</category>
  <category>Healthcare</category>
  <guid>https://www.zajichekstats.com/post/conceptualizing-the-readmission-risk-pool/</guid>
  <pubDate>Mon, 05 Jan 2026 06:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/conceptualizing-the-readmission-risk-pool/feature.png" medium="image" type="image/png" height="95" width="144"/>
</item>
<item>
  <title>Investigating a Hospital-Specific Report (HSR)</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/investigating-a-hospital-specific-report/</link>
  <description><![CDATA[ 




<p><em>This article is a copy of the <a href="https://centralstatz.github.io/readmit/index.html"><code>readmit</code></a> package tutorial. See it on the package website <a href="https://centralstatz.github.io/readmit/articles/investigating-an-hsr.html">here</a>.</em></p>
<p><em><strong>Note</strong>: CMS changed the format of Hospital-Specific Reports (HSRs) for FY2026 (see <a href="https://qualitynet.cms.gov/inpatient/hrrp/reports#tab2">here</a>). The current HSR functions support Excel-based formats through FY2025. However, analysis strategies are still relevant.</em></p>
<p>As part of the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">Hospital Readmissions Reduction Program (HRRP)</a>, the <a href="https://www.cms.gov/">Centers for Medicare &amp; Medicaid Services (CMS)</a> provides a detailed, annual program summary report (called the <a href="https://qualitynet.cms.gov/inpatient/hrrp/reports">Hospital-Specific Report (HSR)</a>) to hospitals that includes details on the penalty calculation for the upcoming fiscal year, such as discharge-level data, dually-eligible discharge lists, cohort-level rollup, and the penalty amount. There is a defined <a href="https://qualitynet.cms.gov/inpatient/hrrp/resources#tab1:~:text=FY%202026%20Hospital%20Readmissions%20Reduction%20Program%20Key%20Dates%20(08/11/25)">review and correction period</a> in which hospitals can use these reports to ensure the penalty being enforced by CMS is accurate. It occurs approximately 1 month before the new fiscal year, thus it is a time-critical event (see historical date ranges below for reference with the built-in package datasets):</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract date ranges</span></span>
<span id="cb1-2">readmit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>hrrp_keydates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-3">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb1-4">    ProgramYear,</span>
<span id="cb1-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matches</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^(Payment|Review)"</span>)</span>
<span id="cb1-6">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-7">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">distinct</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 9 × 5
  ProgramYear PaymentStartDate PaymentEndDate ReviewStartDate ReviewEndDate
        &lt;dbl&gt; &lt;date&gt;           &lt;date&gt;         &lt;chr&gt;           &lt;chr&gt;        
1        2027 2026-10-01       2027-09-30     &lt;NA&gt;            &lt;NA&gt;         
2        2026 2025-10-01       2026-09-30     2025-08-12      2025-09-10   
3        2025 2024-10-01       2025-09-30     2024-08-12      2024-09-10   
4        2024 2023-10-01       2024-09-30     2023-08-08      2023-09-07   
5        2023 2022-10-01       2023-09-30     2022-08-08      2022-09-07   
6        2022 2021-10-01       2022-09-30     2021-08-09      2021-09-08   
7        2021 2020-10-01       2021-09-30     2020-08-10      2020-09-09   
8        2020 2019-10-01       2020-09-30     2019-08-09      2019-09-09   
9        2019 2018-10-01       2019-09-30     2018-08-06      2018-09-05   </code></pre>
</div>
</div>
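<p>To see how tight the timing is, the sketch below computes the length of each review window and the gap between its close and the start of payments. It is a base R illustration that hard-codes the dates printed above rather than reading the package dataset; the derived column names <code>ReviewDays</code> and <code>DaysToPayment</code> are made up here. Note the review dates are stored as character in the table, so they are converted with <code>as.Date()</code>.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Dates copied from the table above (program years with a review period)
key_dates &lt;- data.frame(
  ProgramYear      = 2026:2019,
  PaymentStartDate = as.Date(paste0(2025:2018, "-10-01")),
  ReviewStartDate  = as.Date(c(
    "2025-08-12", "2024-08-12", "2023-08-08", "2022-08-08",
    "2021-08-09", "2020-08-10", "2019-08-09", "2018-08-06"
  )),
  ReviewEndDate    = as.Date(c(
    "2025-09-10", "2024-09-10", "2023-09-07", "2022-09-07",
    "2021-09-08", "2020-09-09", "2019-09-09", "2018-09-05"
  ))
)

# Length of the review window, counting both the start and end dates
key_dates$ReviewDays &lt;-
  as.numeric(key_dates$ReviewEndDate - key_dates$ReviewStartDate) + 1

# Days between the close of the review window and the first payment date
key_dates$DaysToPayment &lt;-
  as.numeric(key_dates$PaymentStartDate - key_dates$ReviewEndDate)

key_dates[, c("ProgramYear", "ReviewDays", "DaysToPayment")]</code></pre></div>
</details>
</div>
<p>In these historical years the window is roughly a month long and closes about three to four weeks before payments begin, which leaves little room to start the analysis late.</p>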
<p>The report file itself (through FY2025) is a large, multi-tab Microsoft Excel document where the structured part of the data is ambiguously placed throughout, thus we need tools to parse it out into a usable format. That is what some functions in the <code>readmit</code> package are for. In this article, we go through the tools that are available, what they do, and then provide some strategies/approaches for how hospitals can use these tools to analyze their own HSRs to gain deeper insight into HRRP results and readmissions more broadly.</p>
<section id="the-toolbox" class="level1">
<h1>The Toolbox</h1>
<p>First, we’ll start by taking a look at what relevant functions are available to us, what they do, and how to use them. For our purposes, these are all of the functions prefixed like <code>hsr_*</code>. We’ll do this roughly in order of how the report is laid out, and how the HRRP results roll up.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(readmit)</span></code></pre></div>
</details>
</div>
<section id="mock-reports" class="level2">
<h2 class="anchored" data-anchor-id="mock-reports">0. Mock Reports</h2>
<p>As the developer of this package, I don’t have access to hospitals’ actual HSRs, as they contain sensitive patient information (i.e., <a href="https://cphs.berkeley.edu/hipaa/hipaa18.html">PHI</a>) and thus are not publicly available. So what we have to work with are <em>mock</em> reports that <a href="https://qualitynet.cms.gov/inpatient/hrrp/reports#tab3">CMS provides</a> to the public that are meant to mimic the format a hospital can expect their report to be in. They just include fake data.</p>
<p>Nevertheless, these provide a useful playground to analyze the mechanics of the program. We’ll start by finding a report with the <code>hsr_mock_reports()</code> function:</p>
<ul>
<li>Using no arguments lists the various mock files included in the package</li>
</ul>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_mock_reports</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "FY2019_HRRP_MockHSR.xlsx" "FY2020_HRRP_MockHSR.xlsx"
[3] "FY2021_HRRP_MockHSR.xlsx" "FY2022_HRRP_MockHSR.xlsx"
[5] "FY2023_HRRP_MockHSR.xlsx" "FY2024_HRRP_MockHSR.xlsx"
[7] "FY2025_HRRP_MockHSR.xlsx"</code></pre>
</div>
</div>
<ul>
<li>Entering one of the listed file names will return the complete path to the file in the package’s location on your computer</li>
</ul>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">my_report <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_mock_reports</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FY2025_HRRP_MockHSR.xlsx"</span>)</span>
<span id="cb6-2">my_report</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/readmit/extdata/FY2025_HRRP_MockHSR.xlsx"</code></pre>
</div>
</div>
<p>Now we can use that HSR path with other package functions. Of course, you would just point to your own HSR when analyzing your hospital’s reports.</p>
</section>
<section id="programsummary" class="level2">
<h2 class="anchored" data-anchor-id="programsummary">1. Program Summary</h2>
<p>We’ll start with high level program results. Ultimately, all of the moving parts in the HRRP roll up into a single number: the penalty amount applied to your hospital. This is typically the first table in your report. We can parse it out of the report with the <code>hsr_payment_summary()</code> function:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">my_payment_summary <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_payment_summary</span>(my_report)</span>
<span id="cb8-2">my_payment_summary</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 7
  Number of Dually Eligible Stays…¹ Total Number of Stay…² `Dual Proportion [c]`
                              &lt;dbl&gt;                  &lt;dbl&gt;                 &lt;dbl&gt;
1                               186                    856                 0.217
# ℹ abbreviated names: ¹​`Number of Dually Eligible Stays (Numerator) [a]`,
#   ²​`Total Number of Stays(Denominator) [b]`
# ℹ 4 more variables: `Peer Group Assignment [d]` &lt;dbl&gt;,
#   `Neutrality Modifier [e]` &lt;dbl&gt;, `Payment Reduction Percentage [f]` &lt;dbl&gt;,
#   `Payment Adjustment Factor [g]` &lt;dbl&gt;</code></pre>
</div>
</div>
<p>We now have this information in a data frame that we can manipulate as needed:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">my_payment_summary <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-2">  tidyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">everything</span>())</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 7 × 2
  name                                               value
  &lt;chr&gt;                                              &lt;dbl&gt;
1 Number of Dually Eligible Stays (Numerator) [a] 186     
2 Total Number of Stays(Denominator) [b]          856     
3 Dual Proportion [c]                               0.217 
4 Peer Group Assignment [d]                         3     
5 Neutrality Modifier [e]                           0.965 
6 Payment Reduction Percentage [f]                  0.0007
7 Payment Adjustment Factor [g]                     0.999 </code></pre>
</div>
</div>
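<p>Two of these fields can be sanity-checked from the others. The sketch below copies the values from the table above into plain R objects (the object names are made up). The dual proportion <code>[c]</code> is the ratio of <code>[a]</code> to <code>[b]</code>; treating the payment adjustment factor <code>[g]</code> as one minus the payment reduction percentage <code>[f]</code> is an assumption on my part, though it is consistent with the rounded 0.999 shown.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Values copied from the payment summary above
dual_stays  &lt;- 186     # [a]
total_stays &lt;- 856     # [b]
reduction   &lt;- 0.0007  # [f]

# Dual proportion [c] is the ratio of [a] to [b]
dual_proportion &lt;- dual_stays / total_stays
round(dual_proportion, 3)

# Assumption: adjustment factor [g] is 1 minus the reduction percentage [f]
adjustment_factor &lt;- 1 - reduction
adjustment_factor</code></pre></div>
</details>
</div>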
<p>There are also helper functions to extract specific components from this table.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_payment_penalty</span>(my_report)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 7e-04</code></pre>
</div>
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_dual_proportion</span>(my_report)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.2172897</code></pre>
</div>
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_peer_group</span>(my_report)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3</code></pre>
</div>
</div>
<p>See <code>?hsr_payment_summary</code> for all of them.</p>
</section>
<section id="cohortsummary" class="level2">
<h2 class="anchored" data-anchor-id="cohortsummary">2. Cohort Summary</h2>
<p>The overall payment penalty a hospital receives is a weighted average of penalties applied to the individual cohorts. These details are typically in the second table (tab) of the HSR, which we can import with <code>hsr_cohort_summary()</code>:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">cohort_summary <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_cohort_summary</span>(my_report)</span>
<span id="cb18-2">cohort_summary</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 10
  `Measure [a]` `Number of Eligible Discharges [b]` Number of Readmissions Amo…¹
  &lt;chr&gt;                                       &lt;dbl&gt;                        &lt;dbl&gt;
1 AMI                                             2                            0
2 COPD                                           18                            3
3 HF                                             25                            2
4 Pneumonia                                      32                            5
5 CABG                                           NA                           NA
6 THA/TKA                                        45                            0
# ℹ abbreviated name: ¹​`Number of Readmissions Among Eligible Discharges [c]`
# ℹ 7 more variables: `Predicted Readmission Rate [d]` &lt;dbl&gt;,
#   `Expected Readmission Rate [e]` &lt;dbl&gt;,
#   `Excess Readmission Ratio (ERR) [f]` &lt;dbl&gt;,
#   `Peer Group Median ERR [g]` &lt;dbl&gt;, `Penalty Indicator (Yes/No) [h]` &lt;chr&gt;,
#   `Ratio of DRG Payments Per Measure to Total Payments [i]` &lt;dbl&gt;,
#   `National Observed Readmission Rate [j]` &lt;dbl&gt;</code></pre>
</div>
</div>
<p>We can then, for example, reconcile the overall penalty amount based on what is in this table:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">cohort_summary <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-2"></span>
<span id="cb20-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter to cohorts with a penalty</span></span>
<span id="cb20-4">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Penalty Indicator (Yes/No) [h]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Yes"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-5">  </span>
<span id="cb20-6">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the contribution of each penalized cohort</span></span>
<span id="cb20-7">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb20-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contribution =</span> </span>
<span id="cb20-9">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Ratio of DRG Payments Per Measure to Total Payments [i]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span></span>
<span id="cb20-10">      (</span>
<span id="cb20-11">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Excess Readmission Ratio (ERR) [f]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> </span>
<span id="cb20-12">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Peer Group Median ERR [g]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb20-13">      )</span>
<span id="cb20-14">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-15">  </span>
<span id="cb20-16">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Roll up into final calculation</span></span>
<span id="cb20-17">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb20-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Penalty =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Contribution) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_neutrality_modifier</span>(my_report)</span>
<span id="cb20-19">  )</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 1
   Penalty
     &lt;dbl&gt;
1 0.000746</code></pre>
</div>
</div>
<p>What we did here was:</p>
<ol type="1">
<li>Find cohorts that <em>were eligible</em> and received a penalty
<ul>
<li>A cohort must have at least 25 discharges to be eligible</li>
<li>Then, the excess readmission ratio (ERR) must be greater than the assigned peer group’s median ERR</li>
</ul></li>
<li>Find the difference between the hospital’s ERR compared to the peer group median ERR</li>
<li>Multiply that by the ratio of DRG payments for that cohort
<ul>
<li>This is a measure of the volume of patients with this condition that are treated at the hospital</li>
</ul></li>
<li>Sum those contributions over each cohort</li>
<li>Multiply that by the neutrality modifier in the <code>hsr_payment_summary()</code> table</li>
</ol>
<p>That’s how the penalty is computed from the cohort summary level.</p>
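<p>The same steps can be written as a small standalone function, which may be handy for “what if” scenarios. This is only a sketch in base R: the <code>penalty_from_cohorts()</code> helper, the column names, and every cohort value are hypothetical. It applies the eligibility rule directly rather than relying on the report’s penalty indicator, and the 3% cap is an assumption based on my understanding of the program’s maximum penalty, not something shown in the report above.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical cohort-level inputs; every number here is made up
cohorts &lt;- data.frame(
  measure     = c("AMI", "COPD", "HF", "Pneumonia"),
  discharges  = c(12, 40, 55, 60),
  err         = c(1.10, 1.05, 0.98, 1.02),
  peer_median = c(1.00, 1.00, 1.00, 1.00),
  drg_ratio   = c(0.01, 0.02, 0.03, 0.04)
)

# Neutrality modifier as shown in the payment summary earlier
neutrality_modifier &lt;- 0.965

penalty_from_cohorts &lt;- function(x, modifier, min_discharges = 25, cap = 0.03) {
  # Step 1: eligible cohorts whose ERR exceeds the peer group median
  penalized &lt;- x$discharges &gt;= min_discharges &amp; x$err &gt; x$peer_median

  # Steps 2-3: excess over the median, weighted by the DRG payment ratio
  contribution &lt;- ifelse(penalized, x$drg_ratio * (x$err - x$peer_median), 0)

  # Steps 4-5: sum, apply the neutrality modifier, then the assumed cap
  min(cap, modifier * sum(contribution))
}

penalty_from_cohorts(cohorts, neutrality_modifier)</code></pre></div>
</details>
</div>
<p>Here AMI is dropped for having fewer than 25 discharges and HF for being below the peer median, so only COPD and Pneumonia contribute.</p>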
</section>
<section id="discharges" class="level2">
<h2 class="anchored" data-anchor-id="discharges">3. Discharges</h2>
<p><em>Note: Because we are using mock reports, the dates in these files are erroneous and thus R doesn’t interpret them as dates. However, your hospital report has real dates and thus R should automatically parse them as such.</em></p>
<p>The HSR also contains discharge-level data on the individual patients that actually contributed to the program. There is a separate table/tab for each of the cohorts. We can use the <code>hsr_discharges()</code> function to import them for a specified cohort:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(my_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"COPD"</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 21 × 17
   `ID Number` MBI         `Medical Record Number` `Beneficiary DOB`
         &lt;int&gt; &lt;chr&gt;       &lt;chr&gt;                   &lt;chr&gt;            
 1           1 9AA9AA9AA99 99999A                  99/99/9999       
 2           2 9AA9AA9AA99 99999A                  99/99/9999       
 3           3 9AA9AA9AA99 99999A                  99/99/9999       
 4           4 9AA9AA9AA99 99999A                  99/99/9999       
 5           5 9AA9AA9AA99 99999A                  99/99/9999       
 6           6 9AA9AA9AA99 99999A                  99/99/9999       
 7           7 9AA9AA9AA99 99999A                  99/99/9999       
 8           8 9AA9AA9AA99 99999A                  99/99/9999       
 9           9 9AA9AA9AA99 99999A                  99/99/9999       
10          10 9AA9AA9AA99 99999A                  99/99/9999       
# ℹ 11 more rows
# ℹ 13 more variables: `Admission Date of Index Stay` &lt;chr&gt;,
#   `Discharge Date of Index Stay` &lt;chr&gt;,
#   `Cohort Inclusion/Exclusion Indicator` &lt;chr&gt;, `Index Stay (Yes/No)` &lt;chr&gt;,
#   `Principal Discharge Diagnosis of Index Stay` &lt;chr&gt;,
#   `Discharge Destination` &lt;chr&gt;,
#   `Unplanned Readmission within 30 Days (Yes/No) [a]` &lt;chr&gt;, …</code></pre>
</div>
</div>
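<p>The <code>&lt;chr&gt;</code> date columns above are a result of the placeholder values in the mock file. If a date column in a real report ever came through as text, a conversion along these lines might help. This is a base R sketch; the month/day/year layout is an assumption inferred from the placeholder pattern, and the second value is made up.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># A mock placeholder plus a made-up, realistic-looking date
raw_dates &lt;- c("99/99/9999", "03/15/2024")

# Assumed month/day/year layout; values that cannot be parsed become NA
parsed &lt;- as.Date(raw_dates, format = "%m/%d/%Y")
parsed

# Flag the entries that failed to parse
is.na(parsed)</code></pre></div>
</details>
</div>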
<p>We get some patient identifying information, including the specific dates associated with the index and readmission hospitalizations, whether or not the readmission occurred at the same hospital, diagnosis codes, etc., which is all very valuable information that we can explore to gain insights from (which we’ll do later).</p>
<p>There are also options available in the function to refine the result:</p>
<section id="eligible-cases" class="level3">
<h3 class="anchored" data-anchor-id="eligible-cases">Eligible Cases</h3>
<p>The <code>eligible_only</code> argument can be used to only include discharges that were actually included in HRRP evaluation:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb24-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report, </span>
<span id="cb24-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"COPD"</span>,</span>
<span id="cb24-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb24-5">)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 18 × 17
   `ID Number` MBI         `Medical Record Number` `Beneficiary DOB`
         &lt;int&gt; &lt;chr&gt;       &lt;chr&gt;                   &lt;chr&gt;            
 1           1 9AA9AA9AA99 99999A                  99/99/9999       
 2           2 9AA9AA9AA99 99999A                  99/99/9999       
 3           3 9AA9AA9AA99 99999A                  99/99/9999       
 4           4 9AA9AA9AA99 99999A                  99/99/9999       
 5           5 9AA9AA9AA99 99999A                  99/99/9999       
 6           6 9AA9AA9AA99 99999A                  99/99/9999       
 7           7 9AA9AA9AA99 99999A                  99/99/9999       
 8           8 9AA9AA9AA99 99999A                  99/99/9999       
 9           9 9AA9AA9AA99 99999A                  99/99/9999       
10          10 9AA9AA9AA99 99999A                  99/99/9999       
11          11 9AA9AA9AA99 99999A                  99/99/9999       
12          12 9AA9AA9AA99 99999A                  99/99/9999       
13          13 9AA9AA9AA99 99999A                  99/99/9999       
14          14 9AA9AA9AA99 99999A                  99/99/9999       
15          15 9AA9AA9AA99 99999A                  99/99/9999       
16          16 9AA9AA9AA99 99999A                  99/99/9999       
17          17 9AA9AA9AA99 99999A                  99/99/9999       
18          18 9AA9AA9AA99 99999A                  99/99/9999       
# ℹ 13 more variables: `Admission Date of Index Stay` &lt;chr&gt;,
#   `Discharge Date of Index Stay` &lt;chr&gt;,
#   `Cohort Inclusion/Exclusion Indicator` &lt;chr&gt;, `Index Stay (Yes/No)` &lt;chr&gt;,
#   `Principal Discharge Diagnosis of Index Stay` &lt;chr&gt;,
#   `Discharge Destination` &lt;chr&gt;,
#   `Unplanned Readmission within 30 Days (Yes/No) [a]` &lt;chr&gt;,
#   `Planned Readmission within 30 Days (Yes/No)` &lt;chr&gt;, …</code></pre>
</div>
</div>
<p>Notice that this row count matches what was reported as the COPD denominator in the <code>cohort_summary</code>:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">cohort_summary <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-2">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Measure [a]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"COPD"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-3">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Number of Eligible Discharges [b]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 18</code></pre>
</div>
</div>
</section>
<section id="risk-factors" class="level3">
<h3 class="anchored" data-anchor-id="risk-factors">Risk Factors</h3>
<p>Also included in these tables are the indicators of risk factors that are used in the statistical models to estimate individual adjusted readmission risks. We can use the <code>risk_factors</code> argument to extract those for each patient:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb28-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report, </span>
<span id="cb28-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"COPD"</span>,</span>
<span id="cb28-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb28-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">risk_factors =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb28-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge_phi =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb28-7">)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 18 × 42
   `ID Number` `Years Over 65 (continuous)` `History of Mechanical Ventilation`
         &lt;int&gt;                        &lt;dbl&gt;                               &lt;dbl&gt;
 1           1                            5                                   1
 2           2                            8                                   1
 3           3                           23                                   0
 4           4                           15                                   0
 5           5                            7                                   0
 6           6                            1                                   0
 7           7                           12                                   0
 8           8                           18                                   0
 9           9                            6                                   0
10          10                           11                                   0
11          11                           15                                   0
12          12                           13                                   0
13          13                            5                                   0
14          14                           12                                   0
15          15                           12                                   0
16          16                           24                                   0
17          17                           12                                   0
18          18                           11                                   0
# ℹ 39 more variables: `Sleep-Disordered Breathing` &lt;dbl&gt;,
#   `History of COVID-19` &lt;dbl&gt;,
#   `Severe Infection; Other Infectious Diseases` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;,
#   `Lung and Other Severe Cancers` &lt;dbl&gt;,
#   `Lymphatic, Head and Neck, Brain, and Other Major Cancers; Breast, Colorectal and Other Cancers and Tumors; Other Respiratory and Heart Neoplasms` &lt;dbl&gt;,
#   `Other Digestive and Urinary Neoplasms` &lt;dbl&gt;, …</code></pre>
</div>
</div>
<p>This data can then be explored further to understand risk factor prevalence and how that relates to model weights, etc. (again, covered later). Notice the <code>discharge_phi</code> argument was used to prevent the date information from being returned.</p>
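<p>As a minimal sketch of what a prevalence check could look like, the base R snippet below uses a small made-up data frame (<code>toy_risk_factors</code>, hypothetical values) in the same layout as the output above. It assumes the risk factor columns are 0/1 indicators, so a column mean is the share of discharges with that risk factor.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy stand-in for the risk factor output (values are made up for illustration)
toy_risk_factors &lt;- data.frame(
  `ID Number` = 1:5,
  `History of Mechanical Ventilation` = c(1, 1, 0, 0, 0),
  `Sleep-Disordered Breathing` = c(0, 1, 0, 1, 1),
  check.names = FALSE
)

# Keep only the indicator columns (drop the row identifier)
indicator_cols &lt;- setdiff(names(toy_risk_factors), "ID Number")

# For 0/1 indicators, the column mean is the prevalence
prevalence &lt;- colMeans(toy_risk_factors[indicator_cols])

# Most common risk factors first
sort(prevalence, decreasing = TRUE)</code></pre></div>
</div>
<p>Continuous columns such as <code>Years Over 65 (continuous)</code> would need to be left out of (or summarized separately from) a calculation like this, since their mean is not a prevalence.</p>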
</section>
</section>
<section id="modelcoefficients" class="level2">
<h2 class="anchored" data-anchor-id="modelcoefficients">4. Model Coefficients</h2>
<p>The ERR is calculated from an aggregated roll-up of individual adjusted readmission risks derived from a random-intercept logistic regression model. The first row in the discharge table contains the coefficients for this model. We can use the <code>hsr_coefficients()</code> function to extract them:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">copd_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_coefficients</span>(my_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"COPD"</span>)</span>
<span id="cb30-2">copd_model</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 43 × 2
   Factor                                                                  Value
   &lt;chr&gt;                                                                   &lt;dbl&gt;
 1 Years Over 65 (continuous)                                           -0.00550
 2 History of Mechanical Ventilation                                     0.293  
 3 Sleep-Disordered Breathing                                           -0.0339 
 4 History of COVID-19                                                  -0.0200 
 5 Severe Infection; Other Infectious Diseases                           0.0381 
 6 Metastatic Cancer and Acute Leukemia                                  0.201  
 7 Lung and Other Severe Cancers                                         0.156  
 8 Lymphatic, Head and Neck, Brain, and Other Major Cancers; Breast, C… -0.00343
 9 Other Digestive and Urinary Neoplasms                                -0.0792 
10 Diabetes Mellitus (DM) or DM Complications                            0.0891 
# ℹ 33 more rows</code></pre>
</div>
</div>
<p>This allows us to do things like assess the relative contribution of risk factors to the estimated readmission rates or use the risk factor dataset above to compute individual level readmission risks.</p>
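<p>One simple way to compare risk factors, sketched below, is to exponentiate the coefficients: logistic regression coefficients are on the log-odds scale, so <code>exp()</code> turns them into odds ratios. The <code>toy_coefs</code> data frame is a hand-typed subset of the rounded values printed above, not output from the package.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hand-typed subset of the (rounded) coefficients printed above
toy_coefs &lt;- data.frame(
  Factor = c(
    "History of Mechanical Ventilation",
    "Metastatic Cancer and Acute Leukemia",
    "Lung and Other Severe Cancers",
    "Other Digestive and Urinary Neoplasms"
  ),
  Value = c(0.293, 0.201, 0.156, -0.0792),
  stringsAsFactors = FALSE
)

# Coefficients are log-odds, so exponentiating gives odds ratios
toy_coefs$OddsRatio &lt;- exp(toy_coefs$Value)

# Order by the size of the effect, regardless of direction
toy_coefs[order(-abs(toy_coefs$Value)), ]</code></pre></div>
</div>
<p>An odds ratio above 1 means the factor raises the estimated odds of readmission and one below 1 lowers them. How much a factor moves the cohort-level rate also depends on how common it is among the discharges.</p>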
<section id="intercept-terms" class="level3">
<h3 class="anchored" data-anchor-id="intercept-terms">Intercept Terms</h3>
<p>The <em>predicted</em> and <em>expected</em> readmission rates only differ in the intercept terms applied to the prediction (thus it is a constant shift for all patients). We can see these at the end of this data frame:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">copd_model <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tail</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 2
  Factor                                           Value
  &lt;chr&gt;                                            &lt;dbl&gt;
1 Renal Failure                                   0.160 
2 Decubitus Ulcer or Chronic Skin Ulcer           0.0729
3 Cellulitis, Local Skin Infection                0.0412
4 Vertebral Fractures Without Spinal Cord Injury  0.0822
5 HOSP_EFFECT                                    -2.53  
6 AVG_EFFECT                                     -2.53  </code></pre>
</div>
</div>
</section>
</section>
<section id="readmission-risks" class="level2">
<h2 class="anchored" data-anchor-id="readmission-risks">5. Readmission Risks</h2>
<p>We could take the risk factor output from <code>hsr_discharges()</code> combined with the coefficients from <code>hsr_coefficients()</code> and reconcile each patient’s <em>predicted</em> and <em>expected</em> readmission risk. But the <code>hsr_readmission_risks()</code> function can do all of that for us:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">risks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_readmission_risks</span>(my_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"COPD"</span>)</span>
<span id="cb34-2">risks</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 18 × 3
   `ID Number` Predicted Expected
         &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt;
 1           1    0.260    0.260 
 2           2    0.266    0.266 
 3           3    0.227    0.227 
 4           4    0.0990   0.0989
 5           5    0.0935   0.0935
 6           6    0.130    0.130 
 7           7    0.156    0.156 
 8           8    0.190    0.190 
 9           9    0.0842   0.0842
10          10    0.100    0.100 
11          11    0.129    0.129 
12          12    0.111    0.111 
13          13    0.350    0.349 
14          14    0.127    0.127 
15          15    0.158    0.158 
16          16    0.168    0.168 
17          17    0.118    0.118 
18          18    0.210    0.210 </code></pre>
</div>
</div>
<p>This just takes a weighted sum of the risk factors and coefficients, adds the corresponding intercept, and then maps it to a probability through the logistic function.</p>
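<p>Those steps can be sketched by hand in base R. Everything in the snippet below is illustrative: the coefficient and risk factor values are made up, and it assumes that <code>HOSP_EFFECT</code> is the intercept behind the <em>predicted</em> risk and <code>AVG_EFFECT</code> the one behind the <em>expected</em> risk. It is not the package’s internal code.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical coefficient table in the Factor/Value layout
# (values are illustrative, not taken from a real report)
model &lt;- data.frame(
  Factor = c("Years Over 65 (continuous)", "History of Mechanical Ventilation",
             "HOSP_EFFECT", "AVG_EFFECT"),
  Value = c(-0.0055, 0.293, -2.50, -2.53),
  stringsAsFactors = FALSE
)

# Hypothetical risk factor values for two patients
x &lt;- matrix(
  c(5, 1,
    23, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, model$Factor[1:2])
)

# Separate the intercept terms from the risk factor coefficients
is_intercept &lt;- model$Factor %in% c("HOSP_EFFECT", "AVG_EFFECT")
beta &lt;- setNames(model$Value[!is_intercept], model$Factor[!is_intercept])
intercept &lt;- setNames(model$Value[is_intercept], model$Factor[is_intercept])

# Weighted sum of risk factors and coefficients (the linear predictor)
lp &lt;- drop(x[, names(beta), drop = FALSE] %*% beta)

# Assumption: HOSP_EFFECT feeds the predicted risk, AVG_EFFECT the expected risk
risks &lt;- data.frame(
  Predicted = plogis(intercept[["HOSP_EFFECT"]] + lp),
  Expected = plogis(intercept[["AVG_EFFECT"]] + lp)
)
risks

# On the log-odds scale the two columns differ by the same constant for everyone
qlogis(risks$Predicted) - qlogis(risks$Expected)</code></pre></div>
</div>
<p>The last line returns the gap between the two intercepts for every patient, which is the constant shift between predicted and expected risk.</p>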
<p>The cohort-level <em>predicted</em> and <em>expected</em> readmission rates are just the averages of these columns across all eligible patients:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1">risks <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb36-2">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb36-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Discharges =</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb36-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Predicted =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Predicted),</span>
<span id="cb36-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Expected =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Expected),</span>
<span id="cb36-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ERR =</span> Predicted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> Expected</span>
<span id="cb36-7">  )</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 4
  Discharges Predicted Expected   ERR
       &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;
1         18     0.165    0.165  1.00</code></pre>
</div>
</div>
<p>Again, looking at our cohort summary, we can see these match:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">cohort_summary <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-2">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb38-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Measure [a]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"COPD"</span></span>
<span id="cb38-4">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb38-5">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb38-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Discharges =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Number of Eligible Discharges [b]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb38-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Predicted =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Predicted Readmission Rate [d]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb38-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Expected =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Expected Readmission Rate [e]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb38-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ERR =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Excess Readmission Ratio (ERR) [f]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb38-10">  )</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 4
  Discharges Predicted Expected   ERR
       &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;
1         18     0.165    0.165  1.00</code></pre>
</div>
</div>
</section>
<section id="dual-stays" class="level2">
<h2 class="anchored" data-anchor-id="dual-stays">6. Dual Stays</h2>
<p>CMS puts hospitals into peer groups based on the relative proportion of Medicare patients who are also eligible for Medicaid. This serves as a measure of socioeconomic status for the hospital population, so that hospitals are compared only against other hospitals that are similar in this regard. These aggregated quantities were found in the <code>hsr_payment_summary()</code> result:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">my_payment_summary</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 7
  Number of Dually Eligible Stays…¹ Total Number of Stay…² `Dual Proportion [c]`
                              &lt;dbl&gt;                  &lt;dbl&gt;                 &lt;dbl&gt;
1                               186                    856                 0.217
# ℹ abbreviated names: ¹​`Number of Dually Eligible Stays (Numerator) [a]`,
#   ²​`Total Number of Stays(Denominator) [b]`
# ℹ 4 more variables: `Peer Group Assignment [d]` &lt;dbl&gt;,
#   `Neutrality Modifier [e]` &lt;dbl&gt;, `Payment Reduction Percentage [f]` &lt;dbl&gt;,
#   `Payment Adjustment Factor [g]` &lt;dbl&gt;</code></pre>
</div>
</div>
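<p>As a quick arithmetic check, the dual proportion appears to be the first column divided by the second (the column names label them as numerator and denominator). The counts below are hand-typed from the printed table:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Counts copied by hand from the payment summary printed above
dual_stays &lt;- 186   # dually eligible stays (numerator)
total_stays &lt;- 856  # total stays (denominator)

# Share of stays that are dually eligible
dual_proportion &lt;- dual_stays / total_stays
round(dual_proportion, 3)</code></pre></div>
</div>
<p>This rounds to 0.217, in line with the <code>Dual Proportion [c]</code> column.</p>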
<p>The <code>hsr_dual_stays()</code> function extracts the discharge-level data corresponding to the numerator of the ratio:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_dual_stays</span>(my_report)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 186 × 6
   `ID Number` MBI         `Beneficiary DOB` `Admission Date` `Discharge&nbsp;Date`
         &lt;int&gt; &lt;chr&gt;       &lt;chr&gt;             &lt;chr&gt;            &lt;chr&gt;           
 1           1 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 2           2 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 3           3 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 4           4 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 5           5 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 6           6 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 7           7 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 8           8 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
 9           9 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
10          10 9AA9AA9AA99 99/99/9999        99/99/9999       99/99/9999      
# ℹ 176 more rows
# ℹ 1 more variable: `Claim Type` &lt;chr&gt;</code></pre>
</div>
</div>
<p>We can see that the row count of this discharge-level data (186) matches the numerator in the preceding table.</p>
</section>
</section>
<section id="analysisstrategies" class="level1">
<h1>Analysis Strategies</h1>
<p>In this section we’ll go through a collection of data analyses that can be conducted, using functions in <code>readmit</code> as support, to validate HSR calculations and/or to gain deeper insights into HRRP results.</p>
<section id="validating-the-penalty-calculation" class="level2">
<h2 class="anchored" data-anchor-id="validating-the-penalty-calculation">1. Validating the Penalty Calculation</h2>
<p>We previously calculated the payment penalty starting from the cohort-level results. However, as an initial validation step, it is important to go through the mechanics of reconciling the penalty calculation from the discharge-level data to ensure comprehension of how it works.</p>
<p><em>Note: We’ll take the slower, more tedious route in the steps below in order to capture all intermediate details for understanding, but will call out where certain steps can be made more efficient.</em></p>
<p>Let’s extract the <code>cohort</code> strings we need to plug into various function arguments:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1">cohorts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(hrrp_cohort_inclusion), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ProgramYear"</span>)</span>
<span id="cb44-2">cohorts</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "AMI"  "COPD" "HF"   "PN"   "CABG" "HK"  </code></pre>
</div>
</div>
<section id="i.-extract-discharges" class="level3">
<h3 class="anchored" data-anchor-id="i.-extract-discharges">i. Extract Discharges</h3>
<p>The first thing we need to do is extract the set of discharges (i.e., the <em>denominator</em>) that contribute to the program for each cohort. To do this, we’ll iterate through the different cohorts and sequentially use the <code>hsr_discharges()</code> function to get the row identifiers that should be included:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1">eligible_discharges <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb46-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(cohorts, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CABG"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb46-3"></span>
<span id="cb46-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate each cohort</span></span>
<span id="cb46-5">    purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(</span>
<span id="cb46-6"></span>
<span id="cb46-7">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Import eligible discharges</span></span>
<span id="cb46-8">      <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb46-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb46-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> .x,</span>
<span id="cb46-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb46-12">      ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb46-13"></span>
<span id="cb46-14">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep the row identifiers</span></span>
<span id="cb46-15">      dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ID Number</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb46-16"></span>
<span id="cb46-17">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add cohort identifier</span></span>
<span id="cb46-18">      tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> .x)</span>
<span id="cb46-19">    )</span>
<span id="cb46-20">eligible_discharges</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 122 × 2
   `ID Number` Cohort
         &lt;int&gt; &lt;chr&gt; 
 1           1 AMI   
 2           2 AMI   
 3           1 COPD  
 4           2 COPD  
 5           3 COPD  
 6           4 COPD  
 7           5 COPD  
 8           6 COPD  
 9           7 COPD  
10           8 COPD  
# ℹ 112 more rows</code></pre>
</div>
</div>
<p>We can check the counts of these:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb48-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(eligible_discharges<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Cohort)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
 AMI COPD   HF   HK   PN 
   2   18   25   45   32 </code></pre>
</div>
</div>
<p>You can validate that we obtained the correct counts by looking at the cohort summary we previously created.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb50-1">cohort_summary</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 10
  `Measure [a]` `Number of Eligible Discharges [b]` Number of Readmissions Amo…¹
  &lt;chr&gt;                                       &lt;dbl&gt;                        &lt;dbl&gt;
1 AMI                                             2                            0
2 COPD                                           18                            3
3 HF                                             25                            2
4 Pneumonia                                      32                            5
5 CABG                                           NA                           NA
6 THA/TKA                                        45                            0
# ℹ abbreviated name: ¹​`Number of Readmissions Among Eligible Discharges [c]`
# ℹ 7 more variables: `Predicted Readmission Rate [d]` &lt;dbl&gt;,
#   `Expected Readmission Rate [e]` &lt;dbl&gt;,
#   `Excess Readmission Ratio (ERR) [f]` &lt;dbl&gt;,
#   `Peer Group Median ERR [g]` &lt;dbl&gt;, `Penalty Indicator (Yes/No) [h]` &lt;chr&gt;,
#   `Ratio of DRG Payments Per Measure to Total Payments [i]` &lt;dbl&gt;,
#   `National Observed Readmission Rate [j]` &lt;dbl&gt;</code></pre>
</div>
</div>
<p>One caveat: we already knew there weren’t any <code>CABG</code> discharges (from the <code>NA</code> in the <code>cohort_summary</code> table), so we excluded that cohort from the list we iterated over, as it would otherwise have caused an error.</p>
</section>
<section id="riskfactors" class="level3">
<h3 class="anchored" data-anchor-id="riskfactors">ii. Extract Risk Factors</h3>
<p>Next we need to extract the sets of risk factors for each cohort that go into the readmission risk model. We can again do this by iterating through the <code>cohorts</code> list with <code>hsr_discharges()</code>, this time setting <code>risk_factors = TRUE</code> to return the risk factors as well:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(cohorts, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CABG"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb52-2"></span>
<span id="cb52-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate each cohort</span></span>
<span id="cb52-4">    purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(</span>
<span id="cb52-5"></span>
<span id="cb52-6">      <span class="co" style="color: #5E5E5E;
background-color: null;
# Import eligible">
font-style: inherit;"># Import discharges with risk factors</span></span>
<span id="cb52-7">      <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb52-8">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb52-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> .x,</span>
<span id="cb52-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">risk_factors =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb52-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge_phi =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb52-12">      )</span>
<span id="cb52-13">    )</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[[1]]
# A tibble: 2 × 33
  `ID Number` `Years Over 65 (continuous)`  Male Anterior Myocardial Infarctio…¹
        &lt;int&gt;                        &lt;dbl&gt; &lt;dbl&gt;                           &lt;dbl&gt;
1           1                           31     1                               0
2           2                           27     1                               0
# ℹ abbreviated name: ¹​`Anterior Myocardial Infarction `
# ℹ 29 more variables: `Non-Anterior Location of Myocardial Infarction` &lt;dbl&gt;,
#   `History of Coronary Artery Bypass Graft (CABG) Surgery` &lt;dbl&gt;,
#   `History of Percutaneous Transluminal Coronary Angioplasty (PTCA)` &lt;dbl&gt;,
#   `History of COVID-19` &lt;dbl&gt;,
#   `Severe Infection; Other Infectious Diseases` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;, Cancer &lt;dbl&gt;, …

[[2]]
# A tibble: 21 × 42
   `ID Number` `Years Over 65 (continuous)` `History of Mechanical Ventilation`
         &lt;int&gt;                        &lt;dbl&gt;                               &lt;dbl&gt;
 1           1                            5                                   1
 2           2                            8                                   1
 3           3                           23                                   0
 4           4                           15                                   0
 5           5                            7                                   0
 6           6                            1                                   0
 7           7                           12                                   0
 8           8                           18                                   0
 9           9                            6                                   0
10          10                           11                                   0
# ℹ 11 more rows
# ℹ 39 more variables: `Sleep-Disordered Breathing` &lt;dbl&gt;,
#   `History of COVID-19` &lt;dbl&gt;,
#   `Severe Infection; Other Infectious Diseases` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;,
#   `Lung and Other Severe Cancers` &lt;dbl&gt;,
#   `Lymphatic, Head and Neck, Brain, and Other Major Cancers; Breast, Colorectal and Other Cancers and Tumors; Other Respiratory and Heart Neoplasms` &lt;dbl&gt;, …

[[3]]
# A tibble: 30 × 39
   `ID Number` `Years Over 65 (continuous)`  Male History of Coronary Artery B…¹
         &lt;int&gt;                        &lt;dbl&gt; &lt;dbl&gt;                          &lt;dbl&gt;
 1           1                            8     1                              0
 2           2                           25     1                              1
 3           3                            9     0                              0
 4           4                            9     0                              0
 5           5                           30     0                              0
 6           6                           13     0                              0
 7           7                           12     1                              1
 8           8                            7     1                              1
 9           9                           25     1                              0
10          10                           22     0                              0
# ℹ 20 more rows
# ℹ abbreviated name: ¹​`History of Coronary Artery Bypass Graft (CABG) Surgery`
# ℹ 35 more variables: `History of COVID-19` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;, Cancer &lt;dbl&gt;,
#   `Diabetes Mellitus (DM) or DM Complications` &lt;dbl&gt;,
#   `Protein-Calorie Malnutrition` &lt;dbl&gt;,
#   `Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance` &lt;dbl&gt;, …

[[4]]
# A tibble: 45 × 43
   `ID Number` `Years Over 65 (continuous)`  Male History of Coronary Artery B…¹
         &lt;int&gt;                        &lt;dbl&gt; &lt;dbl&gt;                          &lt;dbl&gt;
 1           1                           13     0                              0
 2           2                           16     1                              0
 3           3                            6     1                              0
 4           4                           14     1                              0
 5           5                           16     1                              0
 6           6                           18     1                              0
 7           7                           15     1                              0
 8           8                           13     0                              0
 9           9                            9     0                              0
10          10                            4     0                              0
# ℹ 35 more rows
# ℹ abbreviated name: ¹​`History of Coronary Artery Bypass Graft (CABG) Surgery`
# ℹ 39 more variables: `History of COVID-19` &lt;dbl&gt;,
#   `Severe Infection; Other Infectious Diseases` &lt;dbl&gt;,
#   `Septicemia, Sepsis, Systemic Inflammatory Response Syndrome/Shock` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;,
#   `Lung and Other Severe Cancers` &lt;dbl&gt;, `Lymphoma; Other Cancers` &lt;dbl&gt;, …

[[5]]
# A tibble: 51 × 35
   `ID Number` `Years Over 65 (continuous)`  Male Index Admissions with an Ele…¹
         &lt;int&gt;                        &lt;dbl&gt; &lt;dbl&gt;                          &lt;dbl&gt;
 1           1                            1     1                              0
 2           2                            7     1                              1
 3           3                            1     0                              0
 4           4                            6     1                              0
 5           5                            1     1                              0
 6           6                           11     1                              0
 7           7                           14     0                              0
 8           8                           16     0                              0
 9           9                            1     0                              0
10          10                           16     0                              1
# ℹ 41 more rows
# ℹ abbreviated name: ¹​`Index Admissions with an Elective THA Procedure`
# ℹ 31 more variables: `Number of Procedures (two vs. one)` &lt;dbl&gt;,
#   `Other Congenital Deformity of Hip (Joint)` &lt;dbl&gt;,
#   `Post Traumatic Osteoarthritis` &lt;dbl&gt;, `History of COVID-19` &lt;dbl&gt;,
#   `Severe Infection; Other Infectious Diseases` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;, Cancer &lt;dbl&gt;, …</code></pre>
</div>
</div>
<p>Notice that we get the risk factors for each cohort, but the columns differ because each cohort has a different model. One option is to pivot each data frame with <code>tidyr::pivot_longer()</code> into a long, narrow format so that they can all be bound together.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1">risk_factors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb54-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(cohorts, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CABG"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb54-3"></span>
<span id="cb54-4">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate each cohort</span></span>
<span id="cb54-5">      purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(</span>
<span id="cb54-6"></span>
<span id="cb54-7">        <span class="co" style="color: #5E5E5E;
background-color: null;
# Import eligible">
font-style: inherit;"># Import discharges with risk factors</span></span>
<span id="cb54-8">        <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb54-9">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb54-10">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> .x,</span>
<span id="cb54-11">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">risk_factors =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb54-12">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge_phi =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb54-13">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb54-14">        </span>
<span id="cb54-15">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send risk factors down the rows</span></span>
<span id="cb54-16">        tidyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb54-17">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ID Number</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb54-18">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span>,</span>
<span id="cb54-19">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Value"</span></span>
<span id="cb54-20">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb54-21">        </span>
<span id="cb54-22">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Indicate cohort</span></span>
<span id="cb54-23">        tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> .x)</span>
<span id="cb54-24">      )</span>
<span id="cb54-25">risk_factors</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5,689 × 4
   `ID Number` Factor                                               Value Cohort
         &lt;int&gt; &lt;chr&gt;                                                &lt;dbl&gt; &lt;chr&gt; 
 1           1 "Years Over 65 (continuous)"                            31 AMI   
 2           1 "Male"                                                   1 AMI   
 3           1 "Anterior Myocardial Infarction "                        0 AMI   
 4           1 "Non-Anterior Location of Myocardial Infarction"         0 AMI   
 5           1 "History of Coronary Artery Bypass Graft (CABG) Sur…     1 AMI   
 6           1 "History of Percutaneous Transluminal Coronary Angi…     0 AMI   
 7           1 "History of COVID-19"                                    0 AMI   
 8           1 "Severe Infection; Other Infectious Diseases"            1 AMI   
 9           1 "Metastatic Cancer and Acute Leukemia"                   0 AMI   
10           1 "Cancer"                                                 0 AMI   
# ℹ 5,679 more rows</code></pre>
</div>
</div>
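<p>To see why the pivot makes the stacking possible, here is a minimal sketch on made-up data. The two toy data frames (<code>ami</code>, <code>copd</code>) and their values are hypothetical stand-ins for what <code>hsr_discharges()</code> returns; only the shape matters. Once every cohort is reduced to the same columns, the pieces can be stacked regardless of how many risk factors each model has.</p>

```r
# Toy stand-ins for two cohorts whose risk-factor columns differ (hypothetical values)
ami <- tibble::tibble(`ID Number` = 1:2, Male = c(1, 0), Cancer = c(0, 1))
copd <- tibble::tibble(`ID Number` = 1:2, `Sleep-Disordered Breathing` = c(1, 1))

# Pivot each cohort to (ID Number, Factor, Value), tag it, then stack the pieces
toy_long <-
  list(AMI = ami, COPD = copd) |>
  purrr::imap(
    function(df, cohort) {
      df |>
        tidyr::pivot_longer(
          cols = -`ID Number`,
          names_to = "Factor",
          values_to = "Value"
        ) |>
        tibble::add_column(Cohort = cohort)
    }
  ) |>
  dplyr::bind_rows()

toy_long
```

<p>The result has one row per discharge-factor pair (four for the toy AMI cohort, two for COPD) and the same four columns as the real <code>risk_factors</code> table.</p>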
<p>We can then merge these (using <code>dplyr::inner_join()</code>) with the eligible discharges we previously identified to get the set of risk factors for each eligible discharge:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb56-1">risk_factors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb56-2">  risk_factors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb56-3"></span>
<span id="cb56-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get eligible only</span></span>
<span id="cb56-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb56-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> eligible_discharges,</span>
<span id="cb56-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> </span>
<span id="cb56-8">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb56-9">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ID Number"</span>,</span>
<span id="cb56-10">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span></span>
<span id="cb56-11">        )</span>
<span id="cb56-12">    )</span>
<span id="cb56-13">risk_factors</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4,626 × 4
   `ID Number` Factor                                               Value Cohort
         &lt;int&gt; &lt;chr&gt;                                                &lt;dbl&gt; &lt;chr&gt; 
 1           1 "Years Over 65 (continuous)"                            31 AMI   
 2           1 "Male"                                                   1 AMI   
 3           1 "Anterior Myocardial Infarction "                        0 AMI   
 4           1 "Non-Anterior Location of Myocardial Infarction"         0 AMI   
 5           1 "History of Coronary Artery Bypass Graft (CABG) Sur…     1 AMI   
 6           1 "History of Percutaneous Transluminal Coronary Angi…     0 AMI   
 7           1 "History of COVID-19"                                    0 AMI   
 8           1 "Severe Infection; Other Infectious Diseases"            1 AMI   
 9           1 "Metastatic Cancer and Acute Leukemia"                   0 AMI   
10           1 "Cancer"                                                 0 AMI   
# ℹ 4,616 more rows</code></pre>
</div>
</div>
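<p>One detail worth spelling out is why the join uses both <code>ID Number</code> and <code>Cohort</code> as keys: in the output above, the ID numbers restart at 1 within each cohort, so the ID alone does not identify a discharge. The sketch below uses hypothetical toy tables (<code>toy_factors</code>, <code>toy_eligible</code>) to show what would happen if <code>Cohort</code> were left out of the key.</p>

```r
# Hypothetical long-format risk factors: IDs 1 and 2 appear in both cohorts
toy_factors <- tibble::tibble(
  `ID Number` = c(1L, 2L, 1L, 2L),
  Factor = "Male",
  Value = c(1, 0, 0, 1),
  Cohort = c("AMI", "AMI", "COPD", "COPD")
)

# Hypothetical eligible discharges: ID 1 in AMI and ID 2 in COPD
toy_eligible <- tibble::tibble(
  `ID Number` = c(1L, 2L),
  Cohort = c("AMI", "COPD")
)

# Matching on both keys keeps only the two eligible discharges
both_keys <- dplyr::inner_join(toy_factors, toy_eligible, by = c("ID Number", "Cohort"))

# Matching on ID alone also keeps the same IDs from the other cohort
id_only <- dplyr::inner_join(toy_factors, toy_eligible, by = "ID Number")

nrow(both_keys)
nrow(id_only)
```

<p>With both keys the join returns two rows; with the ID alone it returns all four, because each ID also matches a discharge in the wrong cohort.</p>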
<p>Of course, we didn’t <em>have</em> to do this step separately from step (i): we could have set the <code>eligible_only</code> argument in the same call and ended up in the same place.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb58-1">risk_factors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb58-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(cohorts, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CABG"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb58-3"></span>
<span id="cb58-4">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate each cohort</span></span>
<span id="cb58-5">      purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(</span>
<span id="cb58-6"></span>
<span id="cb58-7">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Import eligible discharges</span></span>
<span id="cb58-8">        <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb58-9">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb58-10">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> .x,</span>
<span id="cb58-11">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">risk_factors =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb58-12">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge_phi =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb58-13">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb58-14">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb58-15">        </span>
<span id="cb58-16">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send risk factors down the rows</span></span>
<span id="cb58-17">        tidyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb58-18">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ID Number</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb58-19">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span>,</span>
<span id="cb58-20">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Value"</span></span>
<span id="cb58-21">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb58-22">        </span>
<span id="cb58-23">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Indicate cohort</span></span>
<span id="cb58-24">        tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> .x)</span>
<span id="cb58-25">      )</span>
<span id="cb58-26">risk_factors</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4,626 × 4
   `ID Number` Factor                                               Value Cohort
         &lt;int&gt; &lt;chr&gt;                                                &lt;dbl&gt; &lt;chr&gt; 
 1           1 "Years Over 65 (continuous)"                            31 AMI   
 2           1 "Male"                                                   1 AMI   
 3           1 "Anterior Myocardial Infarction "                        0 AMI   
 4           1 "Non-Anterior Location of Myocardial Infarction"         0 AMI   
 5           1 "History of Coronary Artery Bypass Graft (CABG) Sur…     1 AMI   
 6           1 "History of Percutaneous Transluminal Coronary Angi…     0 AMI   
 7           1 "History of COVID-19"                                    0 AMI   
 8           1 "Severe Infection; Other Infectious Diseases"            1 AMI   
 9           1 "Metastatic Cancer and Acute Leukemia"                   0 AMI   
10           1 "Cancer"                                                 0 AMI   
# ℹ 4,616 more rows</code></pre>
</div>
</div>
</section>
<section id="individualreadmissionrisk" class="level3">
<h3 class="anchored" data-anchor-id="individualreadmissionrisk">iii. Compute Individual Readmission Risks</h3>
<p>The <em>predicted</em> and <em>expected</em> readmission rates are computed for each discharge by plugging each patient’s set of risk factors into risk models developed by CMS. We can use the <code>hsr_coefficients()</code> function to extract the model coefficients:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb60-1">model_weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb60-2">  cohorts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb60-3"></span>
<span id="cb60-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate each cohort</span></span>
<span id="cb60-5">  purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(</span>
<span id="cb60-6"></span>
<span id="cb60-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
# Import eligible">
font-style: inherit;"># Import model coefficients</span></span>
<span id="cb60-8">    <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_coefficients</span>(</span>
<span id="cb60-9">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb60-10">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> .x</span>
<span id="cb60-11">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb60-12">    </span>
<span id="cb60-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Indicate cohort</span></span>
<span id="cb60-14">    tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> .x)</span>
<span id="cb60-15">  )</span>
<span id="cb60-16">model_weights</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 225 × 3
   Factor                                                           Value Cohort
   &lt;chr&gt;                                                            &lt;dbl&gt; &lt;chr&gt; 
 1 "Years Over 65 (continuous)"                                   0.00765 AMI   
 2 "Male"                                                        -0.134   AMI   
 3 "Anterior Myocardial Infarction "                              0.271   AMI   
 4 "Non-Anterior Location of Myocardial Infarction"               0.0712  AMI   
 5 "History of Coronary Artery Bypass Graft (CABG) Surgery"       0.0233  AMI   
 6 "History of Percutaneous Transluminal Coronary Angioplasty (… -0.0218  AMI   
 7 "History of COVID-19"                                         -0.0676  AMI   
 8 "Severe Infection; Other Infectious Diseases"                  0.0832  AMI   
 9 "Metastatic Cancer and Acute Leukemia"                         0.226   AMI   
10 "Cancer"                                                       0.0440  AMI   
# ℹ 215 more rows</code></pre>
</div>
</div>
<p>To use these in our calculations, we need to attach the model weights for each cohort to our current <code>risk_factors</code> dataset using <code>dplyr::inner_join()</code>:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb62-1">risk_factors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb62-2">  risk_factors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb62-3"></span>
<span id="cb62-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get weights</span></span>
<span id="cb62-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb62-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> </span>
<span id="cb62-7">        model_weights <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb62-8">          </span>
<span id="cb62-9">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rename the column</span></span>
<span id="cb62-10">        dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(</span>
<span id="cb62-11">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Weight =</span> Value</span>
<span id="cb62-12">        ),</span>
<span id="cb62-13">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> </span>
<span id="cb62-14">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb62-15">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span>,</span>
<span id="cb62-16">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span></span>
<span id="cb62-17">        )</span>
<span id="cb62-18">    )</span>
<span id="cb62-19">risk_factors</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4,626 × 5
   `ID Number` Factor                                      Value Cohort   Weight
         &lt;int&gt; &lt;chr&gt;                                       &lt;dbl&gt; &lt;chr&gt;     &lt;dbl&gt;
 1           1 "Years Over 65 (continuous)"                   31 AMI     0.00765
 2           1 "Male"                                          1 AMI    -0.134  
 3           1 "Anterior Myocardial Infarction "               0 AMI     0.271  
 4           1 "Non-Anterior Location of Myocardial Infar…     0 AMI     0.0712 
 5           1 "History of Coronary Artery Bypass Graft (…     1 AMI     0.0233 
 6           1 "History of Percutaneous Transluminal Coro…     0 AMI    -0.0218 
 7           1 "History of COVID-19"                           0 AMI    -0.0676 
 8           1 "Severe Infection; Other Infectious Diseas…     1 AMI     0.0832 
 9           1 "Metastatic Cancer and Acute Leukemia"          0 AMI     0.226  
10           1 "Cancer"                                        0 AMI     0.0440 
# ℹ 4,616 more rows</code></pre>
</div>
</div>
<section id="linearpredictor" class="level4">
<h4 class="anchored" data-anchor-id="linearpredictor">Compute the Linear Predictor</h4>
<p>These are <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a> models, so the first step toward a risk estimate is to take the <em>weighted sum</em> of the risk factor values for each discharge, using the model weights.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb64-1">linear_predictors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb64-2">  risk_factors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb64-3"></span>
<span id="cb64-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute weighted sum</span></span>
<span id="cb64-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb64-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LP =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Weight <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Value),</span>
<span id="cb64-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> </span>
<span id="cb64-8">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb64-9">          Cohort,</span>
<span id="cb64-10">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ID Number</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb64-11">        )</span>
<span id="cb64-12">    )</span>
<span id="cb64-13">linear_predictors</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 122 × 3
   Cohort `ID Number`    LP
   &lt;chr&gt;        &lt;int&gt; &lt;dbl&gt;
 1 AMI              1 1.92 
 2 AMI              2 0.749
 3 COPD             1 1.49 
 4 COPD             2 1.52 
 5 COPD             3 1.31 
 6 COPD             4 0.321
 7 COPD             5 0.258
 8 COPD             6 0.633
 9 COPD             7 0.841
10 COPD             8 1.08 
# ℹ 112 more rows</code></pre>
</div>
</div>
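<p>To make the arithmetic concrete, here is a minimal sketch in base R that reproduces part of that sum for the first AMI discharge, using only the ten factor rows printed in the <code>risk_factors</code> output earlier. The remaining factors for that discharge are not shown, so this is only a partial sum and will not match the full <code>LP</code> of 1.92. The vector names are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Values and weights for the first ten factors of AMI discharge 1,
# copied from the printed risk_factors output (remaining factors omitted)
value  &lt;- c(31, 1, 0, 0, 1, 0, 0, 1, 0, 0)
weight &lt;- c(0.00765, -0.134, 0.271, 0.0712, 0.0233, -0.0218, -0.0676, 0.0832, 0.226, 0.0440)

# Partial linear predictor: multiply each value by its weight, then add up
partial_lp &lt;- sum(weight * value)
partial_lp</code></pre></div>
<p>Only the factors with a nonzero value contribute, so with these ten rows the partial sum is about 0.21.</p>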
<p>You can think of this as the “risk-adjustment” part, where we’ve adjusted each patient’s readmission risk based on their own clinical history. These values are on the <a href="https://en.wikipedia.org/wiki/Logit">logit</a> scale (so they are not yet probability/risk estimates), which is the scale on which the model is assumed to be linear in the risk factors; that is why we call it the “linear predictor”. However, we are still missing something: <strong>the intercept terms</strong>.</p>
</section>
<section id="add-in-the-intercepts" class="level4">
<h4 class="anchored" data-anchor-id="add-in-the-intercepts">Add In the Intercepts</h4>
<p>The <em>predicted</em> and <em>expected</em> readmission risks are derived from the same risk-adjusted model; the only difference is the intercept term that is added to complete the linear predictor. We can see those intercepts in our model weight list:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb66" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb66-1">model_weights <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb66-2">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb66-3">    stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(</span>
<span id="cb66-4">      Factor,</span>
<span id="cb66-5">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pattern =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EFFECT$"</span></span>
<span id="cb66-6">    )</span>
<span id="cb66-7">  )</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 11 × 3
   Factor      Value Cohort
   &lt;chr&gt;       &lt;dbl&gt; &lt;chr&gt; 
 1 HOSP_EFFECT -2.96 AMI   
 2 AVG_EFFECT  -2.95 AMI   
 3 HOSP_EFFECT -2.53 COPD  
 4 AVG_EFFECT  -2.53 COPD  
 5 HOSP_EFFECT -2.46 HF    
 6 AVG_EFFECT  -2.42 HF    
 7 HOSP_EFFECT -2.51 PN    
 8 AVG_EFFECT  -2.51 PN    
 9 AVG_EFFECT  -2.70 CABG  
10 HOSP_EFFECT -4.41 HK    
11 AVG_EFFECT  -4.28 HK    </code></pre>
</div>
</div>
<ul>
<li>The <code>AVG_EFFECT</code> corresponds to the risk-shift (intercept) associated with being treated at the “average” hospital</li>
<li>The <code>HOSP_EFFECT</code> corresponds to the risk-shift (intercept) associated with being treated at <em>your</em> hospital</li>
</ul>
<p>These are estimated by CMS using your discharge lists, <em>after</em> accounting for all of the risk factors. Comparing them therefore gives a measure of how much more or less likely a patient is to be readmitted at your hospital versus the average hospital, after risk-adjustment.</p>
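<p>As a rough illustration of that comparison, the difference between the two intercepts can be exponentiated to give an odds ratio for your hospital relative to the average hospital. The sketch below uses the rounded intercepts printed in the output, so the results are approximate, and the object and column names are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Rounded intercepts copied from the printed model_weights output
# (CABG is left out because no HOSP_EFFECT is listed for it)
intercepts &lt;- data.frame(
  Cohort = c("AMI", "COPD", "HF", "PN", "HK"),
  HOSP_EFFECT = c(-2.96, -2.53, -2.46, -2.51, -4.41),
  AVG_EFFECT = c(-2.95, -2.53, -2.42, -2.51, -4.28)
)

# Difference on the logit scale, then exponentiate to get an odds ratio
intercepts$OddsRatio &lt;- exp(intercepts$HOSP_EFFECT - intercepts$AVG_EFFECT)
intercepts</code></pre></div>
<p>An odds ratio below 1 (roughly 0.88 for HK with these rounded values) suggests lower risk-adjusted odds of readmission at your hospital than at the average hospital, while a value near 1 suggests little difference.</p>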
<p>We can add each one into our current <code>linear_predictors</code> to get the complete linear predictors for the predicted and expected readmission rates:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb68" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb68-1">linear_predictors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb68-2">  linear_predictors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb68-3"></span>
<span id="cb68-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get intercepts</span></span>
<span id="cb68-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb68-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> </span>
<span id="cb68-7">        model_weights <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb68-8"></span>
<span id="cb68-9">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter to intercepts</span></span>
<span id="cb68-10">        dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb68-11">          stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(</span>
<span id="cb68-12">            Factor,</span>
<span id="cb68-13">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pattern =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_EFFECT$"</span></span>
<span id="cb68-14">          )</span>
<span id="cb68-15">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb68-16">        </span>
<span id="cb68-17">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send over columns</span></span>
<span id="cb68-18">        tidyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(</span>
<span id="cb68-19">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> Factor,</span>
<span id="cb68-20">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> Value</span>
<span id="cb68-21">        ),</span>
<span id="cb68-22">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span></span>
<span id="cb68-23">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb68-24">    </span>
<span id="cb68-25">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Finish LP's</span></span>
<span id="cb68-26">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb68-27">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LP_Predicted =</span> LP <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> HOSP_EFFECT,</span>
<span id="cb68-28">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LP_Expected =</span> LP <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> AVG_EFFECT</span>
<span id="cb68-29">    )</span>
<span id="cb68-30">linear_predictors</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 122 × 7
   Cohort `ID Number`    LP HOSP_EFFECT AVG_EFFECT LP_Predicted LP_Expected
   &lt;chr&gt;        &lt;int&gt; &lt;dbl&gt;       &lt;dbl&gt;      &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;
 1 AMI              1 1.92        -2.96      -2.95        -1.03       -1.03
 2 AMI              2 0.749       -2.96      -2.95        -2.21       -2.20
 3 COPD             1 1.49        -2.53      -2.53        -1.04       -1.04
 4 COPD             2 1.52        -2.53      -2.53        -1.01       -1.01
 5 COPD             3 1.31        -2.53      -2.53        -1.22       -1.22
 6 COPD             4 0.321       -2.53      -2.53        -2.21       -2.21
 7 COPD             5 0.258       -2.53      -2.53        -2.27       -2.27
 8 COPD             6 0.633       -2.53      -2.53        -1.90       -1.90
 9 COPD             7 0.841       -2.53      -2.53        -1.69       -1.69
10 COPD             8 1.08        -2.53      -2.53        -1.45       -1.45
# ℹ 112 more rows</code></pre>
</div>
</div>
</section>
<section id="transform-to-probability-scale" class="level4">
<h4 class="anchored" data-anchor-id="transform-to-probability-scale">Transform to Probability Scale</h4>
<p>The last thing we need to do is transform the linear predictors to the probability scale using the <a href="https://en.wikipedia.org/wiki/Logistic_function">logistic function</a> so that our result is a risk (i.e., a percentage between 0% and 100%).</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb70" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb70-1">readmission_risks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb70-2">  linear_predictors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb70-3"></span>
<span id="cb70-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Transform LP's</span></span>
<span id="cb70-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb70-6">      dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(</span>
<span id="cb70-7">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(LP_Predicted, LP_Expected),</span>
<span id="cb70-8">        \(x) <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>x))</span>
<span id="cb70-9">      )</span>
<span id="cb70-10">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb70-11">    </span>
<span id="cb70-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep a few columns</span></span>
<span id="cb70-13">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb70-14">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ID Number</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb70-15">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Predicted =</span> LP_Predicted,</span>
<span id="cb70-16">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Expected =</span> LP_Expected,</span>
<span id="cb70-17">      Cohort</span>
<span id="cb70-18">    )</span>
<span id="cb70-19">readmission_risks</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 122 × 4
   `ID Number` Predicted Expected Cohort
         &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;chr&gt; 
 1           1    0.262    0.264  AMI   
 2           2    0.0991   0.100  AMI   
 3           1    0.260    0.260  COPD  
 4           2    0.266    0.266  COPD  
 5           3    0.227    0.227  COPD  
 6           4    0.0990   0.0989 COPD  
 7           5    0.0935   0.0935 COPD  
 8           6    0.130    0.130  COPD  
 9           7    0.156    0.156  COPD  
10           8    0.190    0.190  COPD  
# ℹ 112 more rows</code></pre>
</div>
</div>
<p>Now we have the <em>predicted</em> and <em>expected</em> readmission risk for each discharge.</p>
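<p>As a quick sanity check on the transformation, base R’s <code>stats::plogis()</code> computes the same logistic function as the hand-written formula. The sketch below applies both to a few of the rounded linear predictors printed earlier; because those inputs are rounded, the results will only approximately match the risks in the table.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Rounded linear predictors taken from the printed output (illustrative inputs)
lp &lt;- c(-1.03, -2.21, -2.20)

# Hand-written logistic function, matching the formula used in the pipeline
manual &lt;- 1 / (1 + exp(-lp))

# Built-in equivalent from the stats package
builtin &lt;- stats::plogis(lp)

# Both approaches should agree up to floating-point tolerance
all.equal(manual, builtin)
round(builtin, 3)</code></pre></div>
<p>Either form works inside <code>dplyr::across()</code>; the built-in function simply saves writing the formula by hand.</p>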
</section>
<section id="readmissionrisks" class="level4">
<h4 class="anchored" data-anchor-id="readmissionrisks">Doing it Easier</h4>
<p>We went through those computations to see how deriving the readmission risks works starting with the discharges. However, we can get this automatically by using the <code>hsr_readmission_risks()</code> function:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb72" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb72-1">readmission_risks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb72-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(cohorts, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CABG"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb72-3"></span>
<span id="cb72-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate each cohort</span></span>
<span id="cb72-5">    purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(</span>
<span id="cb72-6"></span>
<span id="cb72-7">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute readmission risks for eligible discharges</span></span>
<span id="cb72-8">      <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_readmission_risks</span>(</span>
<span id="cb72-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb72-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> .x</span>
<span id="cb72-11">      ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb72-12">      </span>
<span id="cb72-13">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Indicate cohort</span></span>
<span id="cb72-14">      tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> .x)</span>
<span id="cb72-15">    )</span>
<span id="cb72-16">readmission_risks</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 122 × 4
   `ID Number` Predicted Expected Cohort
         &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;chr&gt; 
 1           1    0.262    0.264  AMI   
 2           2    0.0991   0.100  AMI   
 3           1    0.260    0.260  COPD  
 4           2    0.266    0.266  COPD  
 5           3    0.227    0.227  COPD  
 6           4    0.0990   0.0989 COPD  
 7           5    0.0935   0.0935 COPD  
 8           6    0.130    0.130  COPD  
 9           7    0.156    0.156  COPD  
10           8    0.190    0.190  COPD  
# ℹ 112 more rows</code></pre>
</div>
</div>
<p>This function extracts eligible discharges and computes the readmission risk, so it captures everything we just did up to this point.</p>
</section>
</section>
<section id="iv.-compute-cohort-level-results" class="level3">
<h3 class="anchored" data-anchor-id="iv.-compute-cohort-level-results">iv. Compute Cohort-Level Results</h3>
<p>Now that we have individual-level readmission risks, we can roll these up to get cohort-level results. The main calculation is the cohort-level predicted and expected readmission rates, which are just the <em>averages</em> of the individual risks within each cohort.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb74" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb74-1">cohort_rates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb74-2">  readmission_risks <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb74-3"></span>
<span id="cb74-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute cohort-level stats</span></span>
<span id="cb74-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb74-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Discharges =</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb74-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Predicted =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Predicted),</span>
<span id="cb74-8">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Expected =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Expected),</span>
<span id="cb74-9">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> Cohort</span>
<span id="cb74-10">    )</span>
<span id="cb74-11">cohort_rates</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 4
  Cohort Discharges Predicted Expected
  &lt;chr&gt;       &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt;
1 AMI             2    0.181    0.182 
2 COPD           18    0.165    0.165 
3 HF             25    0.159    0.164 
4 PN             32    0.142    0.141 
5 HK             45    0.0350   0.0397</code></pre>
</div>
</div>
<p>We can compare this to our previous cohort summary and see that it matches what was already in the table:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb76" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb76-1">cohort_summary <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb76-2">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb76-3">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Measure [a]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb76-4">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Discharges =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Number of Eligible Discharges [b]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb76-5">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Predicted =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Predicted Readmission Rate [d]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb76-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Expected =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Expected Readmission Rate [e]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb76-7">    )</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 4
  Cohort    Discharges Predicted Expected
  &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;
1 AMI                2    0.181    0.182 
2 COPD              18    0.165    0.165 
3 HF                25    0.159    0.164 
4 Pneumonia         32    0.142    0.141 
5 CABG              NA   NA       NA     
6 THA/TKA           45    0.0350   0.0397</code></pre>
</div>
</div>
<section id="cohortcontributions" class="level4">
<h4 class="anchored" data-anchor-id="cohortcontributions">Pulling in Additional Info</h4>
<p>Recall earlier how we calculated the payment penalty from the cohort-level data. It was done by computing the:</p>
<ol type="1">
<li>Excess readmission ratio (ERR) for each cohort</li>
<li>Difference between (1) and the peer group median ERR for each cohort</li>
<li>Sum (2) across cohorts, weighted by DRG ratios</li>
</ol>
<p>We can do (1) right away with our current <code>cohort_rates</code> dataset:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb78" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb78-1">cohort_rates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb78-2">  cohort_rates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb78-3"></span>
<span id="cb78-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the ERR</span></span>
<span id="cb78-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb78-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ERR =</span> Predicted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> Expected</span>
<span id="cb78-7">    )</span>
<span id="cb78-8">cohort_rates</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 5
  Cohort Discharges Predicted Expected   ERR
  &lt;chr&gt;       &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;
1 AMI             2    0.181    0.182  0.993
2 COPD           18    0.165    0.165  1.00 
3 HF             25    0.159    0.164  0.971
4 PN             32    0.142    0.141  1.01 
5 HK             45    0.0350   0.0397 0.882</code></pre>
</div>
</div>
<p>For the rest, we need to add the peer group medians and DRG ratios to our <code>cohort_rates</code> dataset, which we can get from our existing <code>cohort_summary</code> dataset that we extracted earlier:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb80" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb80-1">cohort_rates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb80-2">  cohort_rates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb80-3"></span>
<span id="cb80-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get reference info</span></span>
<span id="cb80-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb80-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> </span>
<span id="cb80-7">        cohort_summary <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb80-8"></span>
<span id="cb80-9">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make matching names</span></span>
<span id="cb80-10">        dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb80-11">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Measure [a]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb80-12">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cohort =</span> </span>
<span id="cb80-13">            dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb80-14">              Cohort <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Pneumonia"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PN"</span>,</span>
<span id="cb80-15">              Cohort <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"THA/TKA"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HK"</span>,</span>
<span id="cb80-16">              <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Cohort</span>
<span id="cb80-17">            )</span>
<span id="cb80-18">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb80-19">        </span>
<span id="cb80-20">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep a few columns</span></span>
<span id="cb80-21">        dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb80-22">          Cohort,</span>
<span id="cb80-23">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">PeerGroupERR =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Peer Group Median ERR [g]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb80-24">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">DRGRatio =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Ratio of DRG Payments Per Measure to Total Payments [i]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb80-25">        ),</span>
<span id="cb80-26">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span></span>
<span id="cb80-27">    )</span>
<span id="cb80-28">cohort_rates</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 7
  Cohort Discharges Predicted Expected   ERR PeerGroupERR DRGRatio
  &lt;chr&gt;       &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;        &lt;dbl&gt;    &lt;dbl&gt;
1 AMI             2    0.181    0.182  0.993        0.996  0.00273
2 COPD           18    0.165    0.165  1.00         0.992  0.0226 
3 HF             25    0.159    0.164  0.971        0.996  0.0322 
4 PN             32    0.142    0.141  1.01         0.991  0.0494 
5 HK             45    0.0350   0.0397 0.882        0.996  0.104  </code></pre>
</div>
</div>
<p>Finally, we can indicate which cohorts received a penalty and how much each contributed. Recall that a cohort is only eligible to receive a penalty if:</p>
<ol type="1">
<li>It has at least 25 discharges</li>
<li>Its ERR is greater than the peer group median ERR</li>
</ol>
<p>Thus, we can see, for example, that the COPD group had an ERR greater than the peer group median, but won’t actually contribute to the penalty because it had too few discharges (18, below the minimum of 25). Let’s compute the actual contribution for each cohort:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb82" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb82-1">cohort_rates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb82-2">  cohort_rates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb82-3"></span>
<span id="cb82-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute penalty contribution</span></span>
<span id="cb82-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb82-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">IsPenalized =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(ERR <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> PeerGroupERR <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> Discharges <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>),</span>
<span id="cb82-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">PenaltyContribution =</span> (ERR <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> PeerGroupERR) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> DRGRatio,</span>
<span id="cb82-8">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">PenaltyContribution =</span> IsPenalized <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> PenaltyContribution</span>
<span id="cb82-9">    )</span>
<span id="cb82-10">cohort_rates</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 9
  Cohort Discharges Predicted Expected   ERR PeerGroupERR DRGRatio IsPenalized
  &lt;chr&gt;       &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;        &lt;dbl&gt;    &lt;dbl&gt;       &lt;dbl&gt;
1 AMI             2    0.181    0.182  0.993        0.996  0.00273           0
2 COPD           18    0.165    0.165  1.00         0.992  0.0226            0
3 HF             25    0.159    0.164  0.971        0.996  0.0322            0
4 PN             32    0.142    0.141  1.01         0.991  0.0494            1
5 HK             45    0.0350   0.0397 0.882        0.996  0.104             0
# ℹ 1 more variable: PenaltyContribution &lt;dbl&gt;</code></pre>
</div>
</div>
<p>Ultimately, based on these criteria, this hospital is only penalized in the Pneumonia cohort.</p>
</section>
</section>
<section id="v.-aggregate-to-the-program-result" class="level3">
<h3 class="anchored" data-anchor-id="v.-aggregate-to-the-program-result">v. Aggregate to the Program Result</h3>
<p>Now that we have the contributions of each cohort, we can aggregate them into our final penalty amount. We’ll do this step by step:</p>
<ol type="1">
<li>Add up the cohort contributions</li>
</ol>
<p>We just computed these in the prior step.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb84" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb84-1">temp_penalty <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(cohort_rates<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>PenaltyContribution)</span>
<span id="cb84-2">temp_penalty</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.0007729462</code></pre>
</div>
</div>
<ol start="2" type="1">
<li>Multiply (1) by the neutrality modifier</li>
</ol>
<p>The neutrality modifier was found in our program summary earlier, so we can extract it from there.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb86" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb86-1">penalty <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> temp_penalty <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> my_payment_summary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Neutrality Modifier [e]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb86-2">penalty</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.0007460787</code></pre>
</div>
</div>
<p>We can see that, after rounding, this matches what was reported in the program summary:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb88" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb88-1">my_payment_summary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Payment Reduction Percentage [f]</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 7e-04</code></pre>
</div>
</div>
<p>This is the final penalty amount.</p>
<section id="implication" class="level4">
<h4 class="anchored" data-anchor-id="implication">Implication</h4>
<p>Recall that the (mock) report we’ve been working with is from FY2025.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb90" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb90-1">my_report</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/readmit/extdata/FY2025_HRRP_MockHSR.xlsx"</code></pre>
</div>
</div>
<p>Let’s remind ourselves of the payment period for this program year:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb92" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb92-1">my_payment_period <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb92-2">  hrrp_payment_periods <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb92-3">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(ProgramYear <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2025</span>)</span>
<span id="cb92-4">my_payment_period</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 3
  ProgramYear StartDate  EndDate   
        &lt;int&gt; &lt;date&gt;     &lt;date&gt;    
1        2025 2024-10-01 2025-09-30</code></pre>
</div>
</div>
<p>So from 10/01/2024 through 09/30/2025, all Medicare payments for this hospital were reduced by 0.075%.</p>
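<p>To get a feel for the scale, here is a minimal sketch that converts the payment reduction into dollars. The <code>total_payments</code> figure is purely hypothetical (it does not come from the report); only the reduction itself is taken from the calculation above.</p>

```r
# Payment reduction computed in the walkthrough (a proportion, not a percent)
penalty <- 0.0007460787

# HYPOTHETICAL: total affected Medicare payments during the payment period
total_payments <- 50e6

# Estimated dollars withheld over the payment period
dollars_withheld <- penalty * total_payments
dollars_withheld
```

<p>Swapping in your own hospital’s payment total gives a rough estimate of what the penalty is worth in absolute terms.</p>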
</section>
</section>
<section id="key-observations" class="level3">
<h3 class="anchored" data-anchor-id="key-observations">Key Observations</h3>
<p>Some important things to note about the penalty calculation:</p>
<section id="readmissions-themselves-dont-compute-penalty" class="level4">
<h4 class="anchored" data-anchor-id="readmissions-themselves-dont-compute-penalty">Readmissions Themselves Don’t Compute Penalty</h4>
<p>Notice that in the calculation of the penalty, we never explicitly used or pulled out the actual discharges that were readmitted, or used an “observed” readmission rate (i.e., the simple fraction of discharges that were readmitted). Instead, <em>all</em> discharges were fed into a statistical model to compute the predicted and expected readmission rates for the whole group, and then were aggregated. The actual readmission cases at your hospital were used upstream to fit that statistical model, pooled with data from all other hospitals. Thus, the readmissions from your hospital only contribute in an indirect way: they inform the <em>hospital effect</em> estimated from the statistical model, after adjusting for clinical history, which in turn is built into the <em>predicted</em> readmission rate that is computed on each discharge and used downstream in the penalty calculation.</p>
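<p>The idea can be sketched in a few lines. All numbers and object names here (<code>risk_score</code>, <code>hospital_effect</code>, <code>average_effect</code>) are hypothetical and only meant to illustrate the mechanics described above, not to reproduce the values in the report.</p>

```r
# HYPOTHETICAL linear predictors (risk-factor weights times patient values)
# for three discharges in one cohort
risk_score <- c(-0.4, 0.1, 0.6)

# HYPOTHETICAL intercepts on the log-odds scale
hospital_effect <- -1.60 # this hospital's estimated intercept
average_effect <- -1.70  # intercept for an "average" hospital

# Convert log-odds to probabilities with the inverse logit
predicted <- plogis(hospital_effect + risk_score)
expected <- plogis(average_effect + risk_score)

# Excess readmission ratio for the cohort
err <- mean(predicted) / mean(expected)
err
```

<p>Note that no 0/1 readmission outcome appears anywhere in this calculation. The only thing that separates <code>predicted</code> from <code>expected</code> is the intercept, which is where your hospital’s actual readmissions have their (indirect) influence.</p>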
</section>
<section id="cohorts-arent-weighted-equally" class="level4">
<h4 class="anchored" data-anchor-id="cohorts-arent-weighted-equally">Cohorts Aren’t Weighted Equally</h4>
<p>First, remember that even if a cohort has excess readmissions, it may not contribute to the penalty if there aren’t enough discharges. Beyond that, the cohort-level contribution to the penalty is not necessarily proportional to how many readmissions the cohort had. It also depends on how much of the hospital’s payment volume comes from that cohort. Recall that we took the excess over the peer group median and multiplied it by the ratio of DRG payments for that cohort to total payments. This means that a cohort with a slight excess in readmissions can contribute much more to the overall penalty than another cohort with a huge excess, if the former is a much higher-volume diagnosis at your hospital. So you must consider the full picture.</p>
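<p>A small made-up example shows how this can play out. The two cohorts and all of their values below are hypothetical, and both are assumed to meet the 25-discharge minimum; the formula is the same one used in the penalty contribution step above.</p>

```r
# HYPOTHETICAL cohorts, both assumed to meet the 25-discharge minimum
cohorts <- data.frame(
  Cohort = c("A", "B"),
  ERR = c(1.02, 1.20),          # A: slight excess; B: large excess
  PeerGroupERR = c(1.00, 1.00),
  DRGRatio = c(0.10, 0.005)     # A is a much larger share of payments
)

# Same formula as the walkthrough: excess over the median times payment share
cohorts$PenaltyContribution <-
  (cohorts$ERR - cohorts$PeerGroupERR) * cohorts$DRGRatio
cohorts
```

<p>Cohort A has one tenth of the excess of cohort B, yet it contributes about twice as much to the penalty because of its payment share.</p>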
</section>
<section id="statistical-models-arent-perfect" class="level4">
<h4 class="anchored" data-anchor-id="statistical-models-arent-perfect">Statistical Models Aren’t Perfect</h4>
<p>Recall that CMS fits random-intercept logistic regression models on a combined dataset from all participating hospitals for each cohort. There are tons of assumptions and nuances that go into these models that make them far from perfect arbiters of truth. For example, each risk factor is added to the model as an independent factor, so no interactions between them are assumed. Additionally, effects are estimated on a combined dataset for all hospitals, which therefore assumes the effect of each clinical factor for risk adjustment is the same for all hospitals. On top of that, each of these risk factors is defined by claims documentation, and, specifically, groupings of ICD codes that may be inconsistent across hospitals. Then, these models are used to estimate individual-level readmission risks, leading to group-level excess readmission ratios that are then compared to a peer group <em>median</em> value to indicate penalty. So by definition, roughly half of the hospitals in each peer group are flagged no matter what, even if everyone is doing well. We could keep going down the list of nuances, but it’s important to at least acknowledge and understand the mechanics of how these things work.</p>
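<p>The point about the median can be seen with a toy example. The ERR values below are hypothetical and deliberately chosen so that no hospital has more readmissions than expected.</p>

```r
# HYPOTHETICAL ERRs for six peer hospitals, all at or below 1
# (i.e., none has more readmissions than expected)
peer_err <- c(0.90, 0.92, 0.95, 0.97, 0.99, 1.00)

# Flag hospitals whose ERR exceeds the peer group median
flagged <- peer_err > median(peer_err)
mean(flagged)
```

<p>Half of this peer group still lands above the median, because the comparison is relative to peers rather than to a fixed standard.</p>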
</section>
</section>
</section>
<section id="understanding-model-weights" class="level2">
<h2 class="anchored" data-anchor-id="understanding-model-weights">2. Understanding Model Weights</h2>
<p>As described previously, the <em>predicted</em> and <em>expected</em> readmission rates are based on risk models developed by CMS that estimate hospital-level effects so that individual hospitals can be compared to an “average” hospital for penalty determination. Additionally, these models <em>risk-adjust</em> for individual patient clinical history to tease out impacts due to the <em>hospital</em> itself rather than the overall morbidity of the population it serves. Thus, each model contains a long list of covariates (risk factors) which can be explored to gain further insight into the mechanics of the program. You can find more details about the model methodology <a href="https://qualitynet.cms.gov/inpatient/measures/readmission/methodology">here</a>.</p>
<section id="what-do-they-mean" class="level3">
<h3 class="anchored" data-anchor-id="what-do-they-mean">What Do They Mean?</h3>
<p>The first thing worth understanding is what the model estimates mean and how to interpret them. Recall that the model coefficients are found in the first row of the discharge-level data for each cohort in the HSR, and that we extracted them earlier with the <code>hsr_coefficients()</code> function:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb94" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb94-1">model_weights</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 225 × 3
   Factor                                                           Value Cohort
   &lt;chr&gt;                                                            &lt;dbl&gt; &lt;chr&gt; 
 1 "Years Over 65 (continuous)"                                   0.00765 AMI   
 2 "Male"                                                        -0.134   AMI   
 3 "Anterior Myocardial Infarction "                              0.271   AMI   
 4 "Non-Anterior Location of Myocardial Infarction"               0.0712  AMI   
 5 "History of Coronary Artery Bypass Graft (CABG) Surgery"       0.0233  AMI   
 6 "History of Percutaneous Transluminal Coronary Angioplasty (… -0.0218  AMI   
 7 "History of COVID-19"                                         -0.0676  AMI   
 8 "Severe Infection; Other Infectious Diseases"                  0.0832  AMI   
 9 "Metastatic Cancer and Acute Leukemia"                         0.226   AMI   
10 "Cancer"                                                       0.0440  AMI   
# ℹ 215 more rows</code></pre>
</div>
</div>
<p>These are the weights (coefficients) of the regression equation that are applied to individual patient risk factors. Remember that these are currently on the scale of the linear predictor (log-odds). To make them more intuitive and interpretable, we can <em>exponentiate</em> them to put them on the <em><a href="https://en.wikipedia.org/wiki/Odds_ratio">odds ratio</a></em> scale.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb96" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb96-1">model_weights <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb96-2">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb96-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OR =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(Value)</span>
<span id="cb96-4">  )</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 225 × 4
   Factor                                                     Value Cohort    OR
   &lt;chr&gt;                                                      &lt;dbl&gt; &lt;chr&gt;  &lt;dbl&gt;
 1 "Years Over 65 (continuous)"                             0.00765 AMI    1.01 
 2 "Male"                                                  -0.134   AMI    0.875
 3 "Anterior Myocardial Infarction "                        0.271   AMI    1.31 
 4 "Non-Anterior Location of Myocardial Infarction"         0.0712  AMI    1.07 
 5 "History of Coronary Artery Bypass Graft (CABG) Surger…  0.0233  AMI    1.02 
 6 "History of Percutaneous Transluminal Coronary Angiopl… -0.0218  AMI    0.978
 7 "History of COVID-19"                                   -0.0676  AMI    0.935
 8 "Severe Infection; Other Infectious Diseases"            0.0832  AMI    1.09 
 9 "Metastatic Cancer and Acute Leukemia"                   0.226   AMI    1.25 
10 "Cancer"                                                 0.0440  AMI    1.04 
# ℹ 215 more rows</code></pre>
</div>
</div>
<p>For example, according to these estimates, the odds of a readmission for males are 12.5% lower than for females on average in the AMI cohort (an odds ratio of 0.875). So one thing we can do is assess these across all factors to better understand how each risk factor is weighted, in which direction, and by how much.</p>
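<p>As a worked example of that interpretation, the sketch below converts the “Male” coefficient for the AMI cohort into an odds ratio and a percent change in odds. The <code>baseline_risk</code> value is a hypothetical starting probability, included only to show what an odds ratio does to a risk.</p>

```r
# Coefficient for "Male" in the AMI cohort, taken from the output above
beta_male <- -0.134

# Odds ratio and the corresponding percent change in odds
odds_ratio <- exp(beta_male)
pct_change <- (odds_ratio - 1) * 100

# HYPOTHETICAL baseline risk, to show what the odds ratio does to a probability
baseline_risk <- 0.18
baseline_odds <- baseline_risk / (1 - baseline_risk)
new_odds <- baseline_odds * odds_ratio
new_risk <- new_odds / (1 + new_odds)

round(c(OR = odds_ratio, PctChange = pct_change, NewRisk = new_risk), 3)
```

<p>Keep in mind that a 12.5% reduction in <em>odds</em> is not a 12.5% reduction in <em>risk</em>: on the probability scale, the relative drop from the baseline is somewhat smaller.</p>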
<section id="relative-importance" class="level4">
<h4 class="anchored" data-anchor-id="relative-importance">Relative Importance</h4>
<p>A more tractable way to organize them for understanding is to rank them by the magnitude of their effects to get a sense of which factors have the most impact on the readmission risk calculation. Here we’ll make a plot of the top five (5) most heavily-weighted factors for each cohort:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb98" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb98-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb98-2">model_weights <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb98-3"></span>
<span id="cb98-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove intercepts</span></span>
<span id="cb98-5">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(Factor, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_EFFECT$"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb98-6">  </span>
<span id="cb98-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rank by group</span></span>
<span id="cb98-8">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb98-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Rank =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">order</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">order</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(Value), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decreasing =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)),</span>
<span id="cb98-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span></span>
<span id="cb98-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb98-12">  </span>
<span id="cb98-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter to top 5</span></span>
<span id="cb98-14">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(Rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb98-15">  </span>
<span id="cb98-16">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb98-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb98-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_linerange</span>(</span>
<span id="cb98-19">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb98-20">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_sub</span>(Factor, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>),</span>
<span id="cb98-21">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymin =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb98-22">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymax =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(Value),</span>
<span id="cb98-23">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_sub</span>(Factor, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb98-24">    ),</span>
<span id="cb98-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb98-26">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb98-27">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb98-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb98-29">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Cohort, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nrow =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb98-30">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_flip</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb98-31">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb98-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Risk Factor"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb98-33">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Odds Ratio"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/investigating-a-hospital-specific-report/index_files/figure-html/unnamed-chunk-48-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can see, for example, that <em>Dialysis Status</em> is an important factor that shows up across multiple cohort models, and so can have a heavy impact on program results.</p>
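<p>As an aside, the nested <code>order(order(...))</code> call in the plot code above may look odd at first. Here is a small sketch of what it does, using hypothetical coefficient values, alongside a <code>rank()</code> call that gives the same result in this example.</p>

```r
# HYPOTHETICAL coefficients for a single cohort
value <- c(0.2, -0.9, 0.5, -0.05)

# The idiom used above: the inner order() sorts positions by magnitude,
# the outer order() inverts that permutation into ranks
rank_nested <- order(order(abs(value), decreasing = TRUE))

# An arguably more readable alternative that agrees here (no tied values)
rank_direct <- rank(-abs(value), ties.method = "first")

rank_nested
rank_direct
```

<p>Both give rank 1 to the coefficient with the largest absolute value, which is what lets us filter to the top few factors per cohort.</p>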
<p>We can create the full listing in a table format:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb99" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb99-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(reactable)</span>
<span id="cb99-2">model_weights <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb99-3"></span>
<span id="cb99-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove intercepts</span></span>
<span id="cb99-5">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(Factor, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_EFFECT$"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb99-6">  </span>
<span id="cb99-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rank by group</span></span>
<span id="cb99-8">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb99-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Rank =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">order</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">order</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(Value), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decreasing =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)),</span>
<span id="cb99-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span></span>
<span id="cb99-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb99-12">  </span>
<span id="cb99-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute odds ratio</span></span>
<span id="cb99-14">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb99-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OR =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(Value)</span>
<span id="cb99-16">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb99-17">  </span>
<span id="cb99-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rearrange</span></span>
<span id="cb99-19">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb99-20">    Cohort,</span>
<span id="cb99-21">    Factor,</span>
<span id="cb99-22">    Rank,</span>
<span id="cb99-23">    Value,</span>
<span id="cb99-24">    OR</span>
<span id="cb99-25">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb99-26">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(</span>
<span id="cb99-27">    Cohort,</span>
<span id="cb99-28">    Rank</span>
<span id="cb99-29">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb99-30">  </span>
<span id="cb99-31">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table</span></span>
<span id="cb99-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reactable</span>(</span>
<span id="cb99-33">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">groupBy =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span>,</span>
<span id="cb99-34">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> </span>
<span id="cb99-35">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb99-36">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Factor =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Risk Factor"</span>),</span>
<span id="cb99-37">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Coefficient"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)),</span>
<span id="cb99-38">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OR =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Odds-Ratio"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb99-39">      ),</span>
<span id="cb99-40">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">searchable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb99-41">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sortable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb99-42">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">filterable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb99-43">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resizable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb99-44">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="reactable html-widget html-fill-item" id="htmlwidget-c96991d5e32e477db25c" style="width:auto;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-c96991d5e32e477db25c">{"x":{"tag":{"name":"Reactable","attribs":{"data":{"Cohort":["AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","CABG","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN"],"Factor":["Congestive Heart Failure","Anterior Myocardial Infarction ","Renal Failure","Dialysis Status","Metastatic Cancer and Acute Leukemia","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Chronic Obstructive Pulmonary Disease (COPD)","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Diabetes Mellitus (DM) or DM Complications","Pneumonia","Protein-Calorie Malnutrition","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Male","Specified Arrhythmias and Other Heart Rhythm Disorders","Decubitus Ulcer or Chronic Skin Ulcer","Coronary 
Atherosclerosis/ Other Chronic Ischemic Heart Disease","Valvular and Rheumatic Heart Disease","Vascular or Circulatory Disease","Severe Infection; Other Infectious Diseases","Other Urinary Tract Disorders","Non-Anterior Location of Myocardial Infarction","History of COVID-19","Cerebrovascular Disease","Stroke","Cancer","Acute Coronary Syndrome","Angina Pectoris","History of Coronary Artery Bypass Graft (CABG) Surgery","History of Percutaneous Transluminal Coronary Angioplasty (PTCA)","Asthma","Years Over 65 (continuous)","Dementia or Other Specified Brain Disorders","Dialysis Status","Renal Failure","Male","Congestive Heart Failure","Decubitus Ulcer or Chronic Skin Ulcer","Chronic Obstructive Pulmonary Disease (COPD)","Protein-Calorie Malnutrition","Severe Hematological Disorders","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Diabetes Mellitus (DM) or DM Complications","Pneumonia","Dementia or Other Specified Brain Disorders","Polyneuropathy; Other Neuropathies","Cardiogenic Shock","Other Respiratory Disorders","Morbid Obesity; Other Endocrine/Metabolic/Nutritional Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Vascular or Circulatory Disease","Specified Arrhythmias and Other Heart Rhythm Disorders","Major Psychiatric Disorders","Stroke","History of COVID-19","Years Over 65 (continuous)","Cerebrovascular Disease","Fibrosis of Lung or Other Chronic Lung Disorders","History of Coronary Artery Bypass Graft (CABG) or Valve Surgery","Cancer; Metastatic Cancer and Acute Leukemia","History of Mechanical Ventilation","Congestive Heart Failure","Cardio-respiratory Failure and Shock","Metastatic Cancer and Acute Leukemia","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Drug/Alcohol Psychosis or Dependence","Specified Arrhythmias and Other Heart Rhythm Disorders","Renal 
Failure","Lung and Other Severe Cancers","Severe Hematological Disorders","Protein-Calorie Malnutrition","Fibrosis of Lung or Other Chronic Lung Disorders","Other Psychiatric Disorders","Diabetes Mellitus (DM) or DM Complications","Peptic Ulcer, Hemorrhage, Other Specified Gastrointestinal Disorders","Acute Coronary Syndrome","Anxiety Disorders","Other Gastrointestinal Disorders","Vertebral Fractures Without Spinal Cord Injury","Pneumonia","Other Digestive and Urinary Neoplasms","Decubitus Ulcer or Chronic Skin Ulcer","Other and Unspecified Heart Disease","Chronic Pancreatitis","Vascular or Circulatory Disease","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Coronary Atherosclerosis or Angina","Major Psychiatric Disorders","Cellulitis, Local Skin Infection","Severe Infection; Other Infectious Diseases","Sleep-Disordered Breathing","Polyneuropathy; Other Neuropathies","Morbid Obesity; Other Endocrine/ Metabolic/ Nutritional Disorders","Depression","History of COVID-19","Dementia or Other Specified Brain Disorders","Respirator Dependence/Respiratory Failure","Stroke","Years Over 65 (continuous)","Lymphatic, Head and Neck, Brain, and Other Major Cancers; Breast, Colorectal and Other Cancers and Tumors; Other Respiratory and Heart Neoplasms","Renal Failure","Severe Hematological Disorders","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Chronic Obstructive Pulmonary Disease (COPD)","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Dialysis Status","Metastatic Cancer and Acute Leukemia","Pneumonia","Decubitus Ulcer or Chronic Skin Ulcer","Congestive Heart Failure","Acute Coronary Syndrome","Specified Arrhythmias and Other Heart Rhythm Disorders","Diabetes Mellitus (DM) or DM Complications","Drug/Alcohol Abuse/ Dependence/Psychosis","Liver or Biliary Disease","Protein-Calorie Malnutrition","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Peptic Ulcer, Hemorrhage, Other 
Specified Gastrointestinal Disorders","Vascular or Circulatory Disease","Valvular and Rheumatic Heart Disease","Other Psychiatric Disorders","Coronary Atherosclerosis or Angina","Cardio-Respiratory Failure and Shock","Other Gastrointestinal Disorders","Fibrosis of Lung or Other Chronic Lung Disorders","Nephritis","Other Urinary Tract Disorders","Other and Unspecified Heart Disease","Male","Stroke","Major Psychiatric Disorders","History of Coronary Artery Bypass Graft (CABG) Surgery","Depression","Cancer","Dementia or Other Specified Brain Disorders","Years Over 65 (continuous)","Asthma","History of COVID-19","Dialysis Status","Severe Hematological Disorders","Other Congenital Deformity of Hip (Joint)","Congestive Heart Failure","Chronic Obstructive Pulmonary Disease (COPD)","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Major Psychiatric Disorders","Renal Failure","Morbid Obesity","Decubitus Ulcer or Chronic Skin Ulcer","Specified Arrhythmias and Other Heart Rhythm Disorders","Index Admissions with an Elective THA Procedure","Dementia or Other Specified Brain Disorders","Male","Pneumonia","Hypertension","Other Injuries","Cellulitis, Local Skin Infection","Rheumatoid Arthritis and Inflammatory Connective Tissue Disease","Protein-Calorie Malnutrition","Coronary Atherosclerosis or Angina","Number of Procedures (two vs. 
one)","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Diabetes Mellitus (DM) or DM Complications","Vascular or Circulatory Disease","Major Symptoms, Abnormalities","Stroke","Polyneuropathy; Other Neuropathies","Metastatic Cancer and Acute Leukemia","Severe Infection; Other Infectious Diseases","Years Over 65 (continuous)","History of COVID-19","Cancer","Post Traumatic Osteoarthritis","Severe Hematological Disorders","Dialysis Status","Congestive Heart Failure","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Metastatic Cancer and Acute Leukemia","Respirator Dependence/Tracheostomy Status","Pleural Effusion/Pneumothorax","Chronic Obstructive Pulmonary Disease (COPD)","Renal Failure","Lung and Other Severe Cancers","Protein-Calorie Malnutrition","Septicemia, Sepsis, Systemic Inflammatory Response Syndrome/Shock","Decubitus Ulcer or Chronic Skin Ulcer","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Fibrosis of Lung or Other Chronic Lung Disorders","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Pneumonia","Specified Arrhythmias and Other Heart Rhythm Disorders","Diabetes Mellitus (DM) or DM Complications","History of COVID-19","Acute Coronary Syndrome","Other Gastrointestinal Disorders","Drug/Alcohol Abuse/Dependence/Psychosis","Other Psychiatric Disorders","Respiratory Arrest; Cardio-Respiratory Failure and Shock","Urinary Tract Infection","Vascular or Circulatory Disease","Valvular and Rheumatic Heart Disease","Other Urinary Tract Disorders","Vertebral Fractures Without Spinal Cord Injury","Coronary Atherosclerosis or Angina","Severe Infection; Other Infectious Diseases","History of Coronary Artery Bypass Graft (CABG) Surgery","Stroke","Lymphoma; Other Cancers","Male","Other Respiratory Disorders","Major Psychiatric Disorders","Other Injuries","Years Over 65 (continuous)","Asthma","Dementia or Other Specified Brain 
Disorders"],"Rank":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42],"Value":[0.33957243074671,0.27100275921083,0.26177130895785,0.25127110061188,0.22634890641462,0.22465852292459,0.20889600190574,0.18728702112007,0.18363519022453,0.16362948864226,0.14477909636915,0.14174229352292,-0.13377094797325,0.12834192882491,0.11440519285484,0.10175224371917,0.09968987228144,0.09943077750067,0.08324200070107,0.07545016852382,0.07120286291483,-0.0676429056577,0.05614466547329,0.04570363659622,0.04401442147305,0.0297614483081,0.02729795306472,0.02331580008699,-0.02178937903037,0.01695937962873,0.00765330879237,0.00703750490067,0.35818303029402,0.29143239275745,-0.28332728852419,0.2800446325316,0.27300580017862,0.26484564483108,0.26370785697943,0.26154055665819,0.19114552705245,0.15786295143238,0.1511681537206,0.14566619001395,0.13107528477562,0.12957699458811,0.12184220387001,-0.11691556053706,0.10490316401824,0.10464292366885,0.09889417588161,0.08747730777201,0.07037263002522,-0.04492223071892,0.02767499869282,-0.02389210984846,0.01718013343108,0.00902984145127,-0.00332420831438,0.29280807156464,0.2439415793327,0.22038178999048,0.20104445264549,0.18328054402919,0.17681007960757,0.16445038235307,0.16108300640596,0.16003937693966,0.15641571251929,0.14811373057356,0.14281362472359,0.09533030999425,0.09114373098859,0.0890658422646,0.08820129638075,0.08369753436147,0.08328503809818,0.08250218578877,0.08224193320365,0.07955108252904,-0.07920807606545,0.
07294989023911,0.07153773924385,0.06293681682419,0.06087761722765,0.05564420968061,0.04904567748974,0.04427640612473,0.04118173223991,0.03808035282907,-0.03389473810138,0.03215598707713,-0.02388404741238,0.02313345493758,-0.02003981846797,0.01206184755198,0.01120929463448,-0.00864104299739,-0.00550489429206,-0.0034296550888,0.25467622994391,0.23762950955586,0.16255670466746,0.15927075502566,0.15184667522785,0.15063453076202,0.14874300720604,0.11689320548651,0.11392662096829,0.1016609291805,0.09790277908018,0.09682630620285,0.09677714822051,0.09145549141705,0.08650221507101,0.08560777795158,0.0742789500502,0.0723520680848,0.07185907371904,0.07058646898766,0.06772100095278,0.06340647519094,0.06051633638107,0.05653031652753,0.05413448733244,0.05285914464513,0.04461504793124,0.03603726506083,-0.03586725467682,0.03184030056073,0.02511695897923,0.01986402655318,0.01541403711219,0.01262653085985,0.00975548315292,-0.0058876745155,0.00404853256189,-0.00239294251728,0.5978782174907,0.29881994243779,0.28564566526498,0.2723238654975,0.25480313855337,0.24543227729109,0.23494388360546,0.22799144692474,0.22298546250043,0.20960556510495,0.17376971474725,0.16317044297896,0.15538492166378,0.14771744300593,0.14606104955108,0.14328115783213,0.14248311993964,0.13988994687991,0.13935558883477,0.13277839169882,0.13100204057882,0.1286610446364,0.1224436188646,0.11981420121191,0.10992796636631,0.09528675586484,0.08923188109521,0.08487023427136,0.0799645208149,0.04508274963675,0.03857725723824,-0.02583679305955,-0.021371850181,-1.52018641e-05,0.26359064734308,0.24009331029806,0.19422199453265,0.18194701681503,0.17827066953589,0.17600014087344,0.1606016141625,0.13139244938143,0.12708573624157,0.12270154741653,0.12196238167798,0.11855382669326,0.11631820787447,0.11023929530957,0.10848046835409,0.1072630978906,0.10709471281501,0.09977928562087,0.09004469255696,-0.0767024905464,0.07105773620641,0.06175525419797,0.06078806879643,0.05806411686367,0.05240883963648,0.05229959267784,0.05191332162581,
0.04899146976652,0.04270544546113,0.03936136215198,0.03285307899265,0.03053426003874,-0.02980557296489,0.02877533874591,0.02813765867189,0.02436651587802,0.02356441710246,0.0184252173479,0.00894405564906,-0.00890555581122,-0.00763712593242,-0.00405389134248],"OR":[1.40434700657616,1.31127868847395,1.29922938632986,1.28565857991806,1.25401312218034,1.2518951496995,1.23231683326371,1.20597337520888,1.20157739594988,1.17777785416899,1.15578422513392,1.15227966033888,0.874790414174569,1.13694168937752,1.12120633765287,1.10710914450208,1.10482822708194,1.10454200893507,1.08680478424652,1.0783694896356,1.07379903777738,0.934594152280405,1.05775069267337,1.04676414239889,1.04499742518198,1.03020874659868,1.02767395575626,1.02358973823982,0.978446294657523,1.01710400634375,1.00768267021632,1.0070623263311,1.43072746298482,1.33834314874191,0.753273209833681,1.3231888682885,1.31390786570295,1.30322979999659,1.30174784419777,1.2989296207483,1.21063561949572,1.17100569905634,1.1631922367401,1.1568099688729,1.14005360676125,1.13834675462822,1.1295758450197,0.889660312430832,1.11060305881081,1.11031407268731,1.10394946889795,1.09141749746726,1.07290790447466,0.956071831961017,1.02806150878639,0.976391047053408,1.01732856070524,1.0090707334597,0.99668131074888,1.34018554603127,1.27626976777999,1.24655256103836,1.22267912193687,1.20115133665262,1.19340441976714,1.17874508154332,1.17478247923821,1.17355708116837,1.16931219978582,1.15964477592895,1.15351479499191,1.10002214343179,1.09542644066927,1.09315262960155,1.09220795741027,1.08729997318708,1.08685155850211,1.08600104720536,1.08571844940038,1.08280087032362,0.923847673855356,1.07567663366527,1.07415868787304,1.06495954957909,1.06276884163083,1.05722146764992,1.05026832326495,1.04527123433375,1.04204146086774,1.03881470122512,0.966673253144263,1.0326785772551,0.976398919175548,1.02340310863209,0.980159644074325,1.01213488499526,1.01127235417498,0.991396183512194,0.994510229873486,0.996576219460403,1.29004387567186,1.268239235255
35,1.17651503045079,1.17265540604647,1.16398175551224,1.16257169623939,1.16037474293993,1.12399938631304,1.1206698881957,1.10700805395688,1.10285555926439,1.10166900392913,1.10161484943477,1.09576800452604,1.09035378292616,1.08937896605059,1.07710722270742,1.07503376252324,1.0745039075538,1.07313735851998,1.07006671923253,1.06545983421404,1.062384952954,1.05815869401114,1.05562656100928,1.05428113351708,1.04562526681456,1.03669447825391,0.964768353470128,1.0323526260044,1.02543504734744,1.02006262916481,1.01553344611831,1.01270658206884,1.00980322299415,0.994129623874354,1.0040567389407,0.997609918287297,1.81825675924455,1.34826683597601,1.3306208865114,1.31301217179861,1.2902076037353,1.2781737192729,1.26483778860496,1.25607458208597,1.249802404602,1.23319155023515,1.18978154512627,1.17723732442611,1.16810750446418,1.15918531416689,1.15726683651509,1.1540542274412,1.15313361582805,1.15014721460233,1.14953278836153,1.14199689429525,1.13997010750847,1.13730456334368,1.13025539319517,1.12728738349084,1.11619766378927,1.09997423396834,1.09333415045518,1.08857579770109,1.08324863421437,1.04611442191559,1.03933102110065,0.974494120834844,0.97885490951141,0.999984798251448,1.30159527574775,1.27136777649297,1.21436583561166,1.1995506361829,1.19514876782156,1.19243822663353,1.17421708416427,1.14041524876104,1.13551436834238,1.13054695593161,1.12971160312608,1.1258674741942,1.12335327512397,1.11654522252815,1.11458313868254,1.11322710365637,1.11303966860744,1.10492701787976,1.09422318624449,0.926165355571123,1.07364321216502,1.06370197627399,1.06267367660932,1.05978294349047,1.05380649231681,1.05369137345083,1.05328444157356,1.05021139215344,1.04363044348821,1.04014628524584,1.03339870010939,1.03100521172221,0.97063423273985,1.02919334863171,1.02853726174997,1.02466580536269,1.02384425170249,1.01859600901031,1.00898417323021,0.991133981197538,0.992391962815253,0.99595431458264]},"columns":[{"id":"Cohort","name":"Cohort","type":"character"},{"id":"Factor","name":"Risk 
Factor","type":"character"},{"id":"Rank","name":"Rank","type":"numeric"},{"id":"Value","name":"Coefficient","type":"numeric","format":{"cell":{"digits":2},"aggregated":{"digits":2}}},{"id":"OR","name":"Odds-Ratio","type":"numeric","format":{"cell":{"digits":2},"aggregated":{"digits":2}}}],"groupBy":["Cohort"],"resizable":true,"filterable":true,"searchable":true,"dataKey":"e0d7afee0d1a9a6a7d3cb81f7787c7e0"},"children":[]},"class":"reactR_markup"},"evals":[],"jsHooks":[]}</script>
</div>
</div>
</section>
</section>
<section id="risk-factor-prevalence" class="level3">
<h3 class="anchored" data-anchor-id="risk-factor-prevalence">Risk Factor Prevalence</h3>
<p>We now understand the model output, but how can we incorporate the risk factors in our discharge datasets to gain additional insight?</p>
<p>Recall that we calculated the readmission risks by taking a weighted sum of the risk factors in our dataset, using the model weights, and then applying some transformations to turn it into a probability. We just established that, on the model side, different factors carry different weights for readmission risk, and that this is useful for understanding how CMS weights different clinical history factors.</p>
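<p>As a rough reminder of what that calculation looks like, here is a minimal sketch for a single hypothetical AMI discharge. The three weights are rounded AMI coefficients from the table above, but the intercept and the patient values are made up for illustration, and I'm assuming the transformation is the inverse logit (the usual choice for a logistic regression model):</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Rounded AMI model weights for three risk factors
weights &lt;-
  c(
    "Years Over 65 (continuous)" = 0.00765,
    "Male" = -0.134,
    "Congestive Heart Failure" = 0.340
  )

# Hypothetical patient values (96 years old, male, history of heart failure)
values &lt;-
  c(
    "Years Over 65 (continuous)" = 31,
    "Male" = 1,
    "Congestive Heart Failure" = 1
  )

# Assumed intercept (for illustration only, not from the report)
intercept &lt;- -2

# Weighted sum of the risk factors (the linear predictor)
linear_predictor &lt;- intercept + sum(weights * values)

# Inverse logit turns the log-odds into a probability
predicted_risk &lt;- plogis(linear_predictor)
predicted_risk</code></pre></div>
</details>
</div>
<p>The point is only that each risk factor a patient has shifts the log-odds by its weight, so both the size of the weight and how often the factor occurs matter.</p>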
<p>The next thing we can seek to understand is how prevalent each of the risk factors is in your cohort. This is useful for a few reasons:</p>
<ol type="1">
<li>You can compare to see if the rates of various risk factors are similar for your hospital versus peer group hospitals</li>
<li>It gives insight into the difference between model importance for readmission risk vs.&nbsp;how prevalent the risk factor is</li>
<li>We can use the combination of model weights and risk factor prevalence to understand overall impact. For example, a risk factor with an average odds-ratio but very high prevalence at your hospital may have more net impact on your HRRP readmission metrics than the factor the model ranks as most important, if very few of your patients have it.</li>
</ol>
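<p>To make the third point concrete, here is a small sketch with two made-up risk factors. The factor names, weights, and rates are all hypothetical, and multiplying the weight by the prevalence is just one simple way to summarize net impact (the average shift in log-odds per discharge), not something taken from the CMS methodology:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical weights and prevalence rates
impact &lt;-
  data.frame(
    Factor = c("Factor A", "Factor B"),
    Weight = c(0.35, 0.10), # Model coefficients (log-odds)
    Rate = c(0.02, 0.60) # Share of discharges with the factor
  )

# Odds ratio for each factor
impact$OR &lt;- exp(impact$Weight)

# Average contribution to the log-odds per discharge
impact$AvgContribution &lt;- impact$Weight * impact$Rate

# Sort by largest average contribution
impact[order(-impact$AvgContribution), ]</code></pre></div>
</details>
</div>
<p>Here "Factor A" has the larger odds-ratio, but "Factor B" contributes more on average because so many more patients have it.</p>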
<p>The first step in this analysis is to compute the prevalence rates from your datasets. We can do this by manipulating the output of <code>hsr_discharges()</code> called with the <code>risk_factors=TRUE</code> argument. Here is an example of how we’d do this for AMI:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb100" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb100-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb100-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb100-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AMI"</span>,</span>
<span id="cb100-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge_phi =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb100-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">risk_factors =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb100-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb100-7">)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 33
  `ID Number` `Years Over 65 (continuous)`  Male Anterior Myocardial Infarctio…¹
        &lt;int&gt;                        &lt;dbl&gt; &lt;dbl&gt;                           &lt;dbl&gt;
1           1                           31     1                               0
2           2                           27     1                               0
# ℹ abbreviated name: ¹​`Anterior Myocardial Infarction `
# ℹ 29 more variables: `Non-Anterior Location of Myocardial Infarction` &lt;dbl&gt;,
#   `History of Coronary Artery Bypass Graft (CABG) Surgery` &lt;dbl&gt;,
#   `History of Percutaneous Transluminal Coronary Angioplasty (PTCA)` &lt;dbl&gt;,
#   `History of COVID-19` &lt;dbl&gt;,
#   `Severe Infection; Other Infectious Diseases` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;, Cancer &lt;dbl&gt;, …</code></pre>
</div>
</div>
<p>However, we’ve already done this earlier in our exercises, so we’ll use that <code>risk_factors</code> dataset:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb102" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb102-1">risk_factors</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4,626 × 5
   `ID Number` Factor                                      Value Cohort   Weight
         &lt;int&gt; &lt;chr&gt;                                       &lt;dbl&gt; &lt;chr&gt;     &lt;dbl&gt;
 1           1 "Years Over 65 (continuous)"                   31 AMI     0.00765
 2           1 "Male"                                          1 AMI    -0.134  
 3           1 "Anterior Myocardial Infarction "               0 AMI     0.271  
 4           1 "Non-Anterior Location of Myocardial Infar…     0 AMI     0.0712 
 5           1 "History of Coronary Artery Bypass Graft (…     1 AMI     0.0233 
 6           1 "History of Percutaneous Transluminal Coro…     0 AMI    -0.0218 
 7           1 "History of COVID-19"                           0 AMI    -0.0676 
 8           1 "Severe Infection; Other Infectious Diseas…     1 AMI     0.0832 
 9           1 "Metastatic Cancer and Acute Leukemia"          0 AMI     0.226  
10           1 "Cancer"                                        0 AMI     0.0440 
# ℹ 4,616 more rows</code></pre>
</div>
</div>
<p>Now we can compute the prevalence for each risk factor (in each cohort) by simply taking the average <code>Value</code>, because all of these are binary factors (except for age, which is continuous):</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb104" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb104-1">prevalence <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb104-2">  risk_factors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb104-3"></span>
<span id="cb104-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove age (for demo purposes)</span></span>
<span id="cb104-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(Factor, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^Years"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb104-6"></span>
<span id="cb104-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute average</span></span>
<span id="cb104-8">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb104-9">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">N =</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb104-10">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Count =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Value),</span>
<span id="cb104-11">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Rate =</span> Count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> N,</span>
<span id="cb104-12">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> </span>
<span id="cb104-13">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb104-14">          Cohort,</span>
<span id="cb104-15">          Factor</span>
<span id="cb104-16">        )</span>
<span id="cb104-17">    )</span>
<span id="cb104-18">prevalence</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 182 × 5
   Cohort Factor                                                   N Count  Rate
   &lt;chr&gt;  &lt;chr&gt;                                                &lt;int&gt; &lt;dbl&gt; &lt;dbl&gt;
 1 AMI    "Male"                                                   2     2   1  
 2 AMI    "Anterior Myocardial Infarction "                        2     0   0  
 3 AMI    "Non-Anterior Location of Myocardial Infarction"         2     0   0  
 4 AMI    "History of Coronary Artery Bypass Graft (CABG) Sur…     2     1   0.5
 5 AMI    "History of Percutaneous Transluminal Coronary Angi…     2     1   0.5
 6 AMI    "History of COVID-19"                                    2     0   0  
 7 AMI    "Severe Infection; Other Infectious Diseases"            2     1   0.5
 8 AMI    "Metastatic Cancer and Acute Leukemia"                   2     0   0  
 9 AMI    "Cancer"                                                 2     0   0  
10 AMI    "Diabetes Mellitus (DM) or DM Complications"             2     1   0.5
# ℹ 172 more rows</code></pre>
</div>
</div>
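<p>As a quick check on the averaging step above, here is a minimal sketch of why the mean of a 0/1 indicator is the prevalence. The vector <code>has_factor</code> is hypothetical and not taken from the data used in this article.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical 0/1 indicators for one risk factor across five discharges
has_factor &lt;- c(1, 0, 1, 1, 0)

# The mean of a binary indicator is the share of discharges with the factor,
# which is the same as Count / N in the summary above (3 / 5 here)
prevalence_rate &lt;- mean(has_factor)
prevalence_rate</code></pre></div>
</details>
</div>
<p>Here three of the five discharges have the factor, so the rate is 0.6.</p>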
<p>Then we could put all of these into a navigable table:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb106" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb106-1">prevalence <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb106-2">  </span>
<span id="cb106-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Arrange</span></span>
<span id="cb106-4">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(Cohort, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(Rate)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb106-5">    </span>
<span id="cb106-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table</span></span>
<span id="cb106-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reactable</span>(</span>
<span id="cb106-8">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">groupBy =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span>,</span>
<span id="cb106-9">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> </span>
<span id="cb106-10">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb106-11">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Factor =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Risk Factor"</span>),</span>
<span id="cb106-12">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">N =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Discharge Count"</span>),</span>
<span id="cb106-13">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Count =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>),</span>
<span id="cb106-14">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Percent"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">percent =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))</span>
<span id="cb106-15">        ),</span>
<span id="cb106-16">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columnGroups =</span> </span>
<span id="cb106-17">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb106-18">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colGroup</span>(</span>
<span id="cb106-19">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Risk Factor Prevalence"</span>,</span>
<span id="cb106-20">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rate"</span>)</span>
<span id="cb106-21">          )</span>
<span id="cb106-22">        ),</span>
<span id="cb106-23">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">searchable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb106-24">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sortable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb106-25">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">filterable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb106-26">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resizable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb106-27">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="reactable html-widget html-fill-item" id="htmlwidget-13411291f937cdc3f40b" style="width:auto;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-13411291f937cdc3f40b">{"x":{"tag":{"name":"Reactable","attribs":{"data":{"Cohort":["AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN"],"Factor":["Male","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Renal Failure","History of Coronary Artery Bypass Graft (CABG) Surgery","History of Percutaneous Transluminal Coronary Angioplasty (PTCA)","Severe Infection; Other Infectious Diseases","Diabetes Mellitus (DM) or DM Complications","Dementia or Other Specified Brain Disorders","Congestive Heart Failure","Coronary Atherosclerosis/ Other Chronic Ischemic Heart Disease","Specified Arrhythmias and Other Heart Rhythm Disorders","Chronic Obstructive Pulmonary Disease (COPD)","Asthma","Other Urinary Tract Disorders","Anterior Myocardial Infarction ","Non-Anterior Location of Myocardial Infarction","History of COVID-19","Metastatic Cancer and Acute 
Leukemia","Cancer","Protein-Calorie Malnutrition","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Acute Coronary Syndrome","Angina Pectoris","Valvular and Rheumatic Heart Disease","Stroke","Cerebrovascular Disease","Vascular or Circulatory Disease","Pneumonia","Dialysis Status","Decubitus Ulcer or Chronic Skin Ulcer","Cardio-respiratory Failure and Shock","Morbid Obesity; Other Endocrine/ Metabolic/ Nutritional Disorders","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Other Gastrointestinal Disorders","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Congestive Heart Failure","Diabetes Mellitus (DM) or DM Complications","Polyneuropathy; Other Neuropathies","Vascular or Circulatory Disease","Renal Failure","Depression","Specified Arrhythmias and Other Heart Rhythm Disorders","Pneumonia","Other Psychiatric Disorders","Sleep-Disordered Breathing","Lymphatic, Head and Neck, Brain, and Other Major Cancers; Breast, Colorectal and Other Cancers and Tumors; Other Respiratory and Heart Neoplasms","Severe Infection; Other Infectious Diseases","Anxiety Disorders","History of Mechanical Ventilation","History of COVID-19","Drug/Alcohol Psychosis or Dependence","Acute Coronary Syndrome","Coronary Atherosclerosis or Angina","Other and Unspecified Heart Disease","Metastatic Cancer and Acute Leukemia","Protein-Calorie Malnutrition","Peptic Ulcer, Hemorrhage, Other Specified Gastrointestinal Disorders","Dementia or Other Specified Brain Disorders","Major Psychiatric Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Respirator Dependence/Respiratory Failure","Cellulitis, Local Skin Infection","Vertebral Fractures Without Spinal Cord Injury","Lung and Other Severe Cancers","Other Digestive and Urinary Neoplasms","Chronic Pancreatitis","Severe Hematological Disorders","Stroke","Fibrosis of Lung or Other Chronic Lung Disorders","Decubitus Ulcer or Chronic Skin Ulcer","Specified 
Arrhythmias and Other Heart Rhythm Disorders","Other Gastrointestinal Disorders","Congestive Heart Failure","Cardio-Respiratory Failure and Shock","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Renal Failure","Coronary Atherosclerosis or Angina","Male","Diabetes Mellitus (DM) or DM Complications","Valvular and Rheumatic Heart Disease","Vascular or Circulatory Disease","Depression","Chronic Obstructive Pulmonary Disease (COPD)","Other Psychiatric Disorders","Other and Unspecified Heart Disease","Other Urinary Tract Disorders","History of Coronary Artery Bypass Graft (CABG) Surgery","Liver or Biliary Disease","Pneumonia","Dementia or Other Specified Brain Disorders","Cancer","Drug/Alcohol Abuse/ Dependence/Psychosis","Acute Coronary Syndrome","Peptic Ulcer, Hemorrhage, Other Specified Gastrointestinal Disorders","Major Psychiatric Disorders","Stroke","Fibrosis of Lung or Other Chronic Lung Disorders","Decubitus Ulcer or Chronic Skin Ulcer","Protein-Calorie Malnutrition","Asthma","History of COVID-19","Metastatic Cancer and Acute Leukemia","Severe Hematological Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Dialysis Status","Nephritis","Hypertension","Major Symptoms, Abnormalities","Male","Diabetes Mellitus (DM) or DM Complications","Specified Arrhythmias and Other Heart Rhythm Disorders","Other Injuries","Index Admissions with an Elective THA Procedure","Renal Failure","Morbid Obesity","Polyneuropathy; Other Neuropathies","Chronic Obstructive Pulmonary Disease (COPD)","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Congestive Heart Failure","Rheumatoid Arthritis and Inflammatory Connective Tissue Disease","Coronary Atherosclerosis or Angina","Vascular or Circulatory Disease","Severe Infection; Other Infectious Diseases","Cancer","Pneumonia","Number of Procedures (two 
vs. one)","Metastatic Cancer and Acute Leukemia","Protein-Calorie Malnutrition","Severe Hematological Disorders","Stroke","Cellulitis, Local Skin Infection","Other Congenital Deformity of Hip (Joint)","Post Traumatic Osteoarthritis","History of COVID-19","Dementia or Other Specified Brain Disorders","Major Psychiatric Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Dialysis Status","Decubitus Ulcer or Chronic Skin Ulcer","Respiratory Arrest; Cardio-Respiratory Failure and Shock","Male","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Chronic Obstructive Pulmonary Disease (COPD)","Other Gastrointestinal Disorders","Specified Arrhythmias and Other Heart Rhythm Disorders","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Congestive Heart Failure","Other Respiratory Disorders","Diabetes Mellitus (DM) or DM Complications","Renal Failure","Other Injuries","Pneumonia","Lymphoma; Other Cancers","Vascular or Circulatory Disease","Severe Infection; Other Infectious Diseases","Dementia or Other Specified Brain Disorders","Coronary Atherosclerosis or Angina","Valvular and Rheumatic Heart Disease","Other Psychiatric Disorders","Pleural Effusion/Pneumothorax","Decubitus Ulcer or Chronic Skin Ulcer","Asthma","Urinary Tract Infection","Septicemia, Sepsis, Systemic Inflammatory Response Syndrome/Shock","Drug/Alcohol Abuse/Dependence/Psychosis","Other Urinary Tract Disorders","Protein-Calorie Malnutrition","Acute Coronary Syndrome","Metastatic Cancer and Acute Leukemia","Vertebral Fractures Without Spinal Cord Injury","History of Coronary Artery Bypass Graft (CABG) Surgery","History of COVID-19","Lung and Other Severe Cancers","Severe Hematological Disorders","Major Psychiatric Disorders","Stroke","Fibrosis of Lung or Other Chronic Lung Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Respirator Dependence/Tracheostomy Status","Dialysis 
Status"],"N":[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32],"Count":[2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,12,10,9,9,9,8,8,8,7,6,6,6,5,4,4,3,3,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,18,17,17,16,15,14,14,13,12,12,10,9,8,8,6,6,6,5,5,5,4,3,3,3,2,2,2,2,2,1,1,0,0,0,0,0,0,29,23,19,13,12,12,10,9,7,7,7,6,4,3,3,3,2,2,2,1,1,1,1,1,1,0,0,0,0,0,0,0,0,25,24,22,19,17,16,15,13,13,11,11,11,10,9,9,8,8,8,8,7,7,6,5,5,4,4,4,3,3,2,2,1,1,1,1,1,1,1,0,0,0],"Rate":[1,1,1,1,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.833333333333333,0.666666666666667,0.555555555555556,0.5,0.5,0.5,0.444444444444444,0.444444444444444,0.444444444444444,0.388888888888889,0.333333333333333,0.333333333333333,0.333333333333333,0.277777777777778,0.222222222222222,0.222222222222222,0.166666666666667,0.166666666666667,0.111111111111111,0.111111111111111,0.111111111111111,0.111111111111111,0.111111111111111,0.111111111111111,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0,0,0,0,0,0,0,0.72,0.68,0.68,0.64,0.6,0.56,0.56,0.52,0.48,0.48,0.4,0.36,0.32,0.32,0.24,0.24,0.24,0.2,0.2,0.2,0.16,0.12,0.12,0.12,0.08,0.08,0.08,0.08,0.08,0.04,0.04,0,0,0,0,0,0,0.644444444444444,0.511111111111111,0.422222222222222,0.288888888888889,0.266666666666667,0.266666666666667,0.222222222222222,0.2,0.155555555555556,0.155555555555556,0.155555555555556,0.133333333333333,0.0888888888888889,0.0666666666
666667,0.0666666666666667,0.0666666666666667,0.0444444444444444,0.0444444444444444,0.0444444444444444,0.0222222222222222,0.0222222222222222,0.0222222222222222,0.0222222222222222,0.0222222222222222,0.0222222222222222,0,0,0,0,0,0,0,0,0.78125,0.75,0.6875,0.59375,0.53125,0.5,0.46875,0.40625,0.40625,0.34375,0.34375,0.34375,0.3125,0.28125,0.28125,0.25,0.25,0.25,0.25,0.21875,0.21875,0.1875,0.15625,0.15625,0.125,0.125,0.125,0.09375,0.09375,0.0625,0.0625,0.03125,0.03125,0.03125,0.03125,0.03125,0.03125,0.03125,0,0,0]},"columns":[{"id":"Cohort","name":"Cohort","type":"character"},{"id":"Factor","name":"Risk Factor","type":"character"},{"id":"N","name":"Discharge Count","type":"numeric"},{"id":"Count","name":"Count","type":"numeric"},{"id":"Rate","name":"Percent","type":"numeric","format":{"cell":{"digits":1,"percent":true},"aggregated":{"digits":1,"percent":true}}}],"columnGroups":[{"name":"Risk Factor Prevalence","columns":["Count","Rate"]}],"groupBy":["Cohort"],"resizable":true,"filterable":true,"searchable":true,"dataKey":"517ed1869ca3b676da7ccde083b9b408"},"children":[]},"class":"reactR_markup"},"evals":[],"jsHooks":[]}</script>
</div>
</div>
</section>
<section id="net-factor-influence" class="level3">
<h3 class="anchored" data-anchor-id="net-factor-influence">Net Factor Influence</h3>
<p>We can then combine the two concepts above (model weights + risk factor prevalence) to explore which factors may have the most overall impact. One way to quantify this is to compute a total weight for each risk factor: the number of patients with the factor multiplied by its model weight. The absolute value of that product is used so that factors with large negative weights rank alongside those with large positive weights.</p>
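<p>To make the arithmetic concrete, here is a small sketch with made-up numbers. The data frame <code>toy</code>, its factor names, counts, and weights are all hypothetical, chosen only to show how a rare factor with a large weight can outrank a common factor with a small one.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical counts and model weights (not from the real cohorts)
toy &lt;-
  data.frame(
    Factor = c("Factor A", "Factor B", "Factor C"),
    Count = c(10, 2, 3),          # Patients with the factor
    Value = c(0.05, -0.40, 0.10)  # Model weight
  )

toy_impact &lt;-
  toy |&gt;

    # Total weight: patients with the factor times the model weight
    dplyr::mutate(NetImpact = abs(Count * Value)) |&gt;

    # Largest overall influence first
    dplyr::arrange(dplyr::desc(NetImpact))
toy_impact</code></pre></div>
</details>
</div>
<p>In this toy case "Factor B" ranks first even though only two patients have it, because its weight is large in magnitude. Applied to the actual cohorts, the same calculation looks like this:</p>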
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb107" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb107-1">prevalence <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb107-2"></span>
<span id="cb107-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get model weight</span></span>
<span id="cb107-4">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb107-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> model_weights,</span>
<span id="cb107-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> </span>
<span id="cb107-7">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb107-8">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span>,</span>
<span id="cb107-9">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span></span>
<span id="cb107-10">      )</span>
<span id="cb107-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb107-12">  </span>
<span id="cb107-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make total weight</span></span>
<span id="cb107-14">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb107-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NetImpact =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(Count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Value)</span>
<span id="cb107-16">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb107-17">  </span>
<span id="cb107-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Arrange</span></span>
<span id="cb107-19">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(Cohort, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(NetImpact)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb107-20">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">relocate</span>(NetImpact, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.after =</span> Factor) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb107-21">    </span>
<span id="cb107-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table</span></span>
<span id="cb107-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reactable</span>(</span>
<span id="cb107-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">groupBy =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span>,</span>
<span id="cb107-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> </span>
<span id="cb107-26">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb107-27">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Factor =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Risk Factor"</span>),</span>
<span id="cb107-28">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NetImpact =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Net Impact"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)),</span>
<span id="cb107-29">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">N =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Discharge Count"</span>),</span>
<span id="cb107-30">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Count =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>),</span>
<span id="cb107-31">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Percent"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">percent =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))</span>
<span id="cb107-32">      ),</span>
<span id="cb107-33">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columnGroups =</span> </span>
<span id="cb107-34">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb107-35">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colGroup</span>(</span>
<span id="cb107-36">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Risk Factor Prevalence"</span>,</span>
<span id="cb107-37">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rate"</span>)</span>
<span id="cb107-38">        )</span>
<span id="cb107-39">      ),</span>
<span id="cb107-40">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">searchable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb107-41">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sortable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb107-42">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">filterable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb107-43">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resizable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb107-44">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="reactable html-widget html-fill-item" id="htmlwidget-10475ca61a11461d485d" style="width:auto;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-10475ca61a11461d485d">{"x":{"tag":{"name":"Reactable","attribs":{"data":{"Cohort":["AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","AMI","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","COPD","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HF","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","HK","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN","PN"],"Factor":["Renal Failure","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Congestive Heart Failure","Male","Chronic Obstructive Pulmonary Disease (COPD)","Diabetes Mellitus (DM) or DM Complications","Specified Arrhythmias and Other Heart Rhythm Disorders","Coronary Atherosclerosis/ Other Chronic Ischemic Heart Disease","Severe Infection; Other Infectious Diseases","Other Urinary Tract Disorders","History of Coronary Artery Bypass Graft (CABG) Surgery","History of Percutaneous Transluminal Coronary Angioplasty (PTCA)","Asthma","Dementia or Other Specified Brain Disorders","Anterior Myocardial Infarction ","Non-Anterior Location of Myocardial Infarction","History of COVID-19","Metastatic Cancer and Acute 
Leukemia","Cancer","Protein-Calorie Malnutrition","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Acute Coronary Syndrome","Angina Pectoris","Valvular and Rheumatic Heart Disease","Stroke","Cerebrovascular Disease","Vascular or Circulatory Disease","Pneumonia","Dialysis Status","Decubitus Ulcer or Chronic Skin Ulcer","Cardio-respiratory Failure and Shock","Congestive Heart Failure","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Renal Failure","Specified Arrhythmias and Other Heart Rhythm Disorders","Other Gastrointestinal Disorders","Diabetes Mellitus (DM) or DM Complications","History of Mechanical Ventilation","Vascular or Circulatory Disease","Pneumonia","Other Psychiatric Disorders","Drug/Alcohol Psychosis or Dependence","Morbid Obesity; Other Endocrine/ Metabolic/ Nutritional Disorders","Polyneuropathy; Other Neuropathies","Anxiety Disorders","Metastatic Cancer and Acute Leukemia","Acute Coronary Syndrome","Other and Unspecified Heart Disease","Protein-Calorie Malnutrition","Depression","Sleep-Disordered Breathing","Severe Infection; Other Infectious Diseases","Coronary Atherosclerosis or Angina","Peptic Ulcer, Hemorrhage, Other Specified Gastrointestinal Disorders","Vertebral Fractures Without Spinal Cord Injury","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Major Psychiatric Disorders","Cellulitis, Local Skin Infection","History of COVID-19","Lymphatic, Head and Neck, Brain, and Other Major Cancers; Breast, Colorectal and Other Cancers and Tumors; Other Respiratory and Heart Neoplasms","Dementia or Other Specified Brain Disorders","Respirator Dependence/Respiratory Failure","Lung and Other Severe Cancers","Other Digestive and Urinary Neoplasms","Chronic Pancreatitis","Severe Hematological Disorders","Stroke","Fibrosis of Lung or Other Chronic Lung Disorders","Decubitus Ulcer or Chronic Skin Ulcer","Renal 
Failure","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Specified Arrhythmias and Other Heart Rhythm Disorders","Congestive Heart Failure","Chronic Obstructive Pulmonary Disease (COPD)","Diabetes Mellitus (DM) or DM Complications","Cardio-Respiratory Failure and Shock","Other Gastrointestinal Disorders","Coronary Atherosclerosis or Angina","Valvular and Rheumatic Heart Disease","Vascular or Circulatory Disease","Pneumonia","Liver or Biliary Disease","Male","Other Psychiatric Disorders","Acute Coronary Syndrome","Drug/Alcohol Abuse/ Dependence/Psychosis","Other Urinary Tract Disorders","Decubitus Ulcer or Chronic Skin Ulcer","Other and Unspecified Heart Disease","Peptic Ulcer, Hemorrhage, Other Specified Gastrointestinal Disorders","Depression","Fibrosis of Lung or Other Chronic Lung Disorders","History of Coronary Artery Bypass Graft (CABG) Surgery","Protein-Calorie Malnutrition","Stroke","Major Psychiatric Disorders","Dementia or Other Specified Brain Disorders","Cancer","Asthma","History of COVID-19","Metastatic Cancer and Acute Leukemia","Severe Hematological Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Dialysis Status","Nephritis","Hypertension","Male","Major Symptoms, Abnormalities","Specified Arrhythmias and Other Heart Rhythm Disorders","Renal Failure","Chronic Obstructive Pulmonary Disease (COPD)","Other Injuries","Index Admissions with an Elective THA Procedure","Morbid Obesity","Diabetes Mellitus (DM) or DM Complications","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Congestive Heart Failure","Polyneuropathy; Other Neuropathies","Rheumatoid Arthritis and Inflammatory Connective Tissue Disease","Coronary Atherosclerosis or Angina","Vascular or Circulatory Disease","Severe Hematological Disorders","Pneumonia","Cellulitis, Local Skin 
Infection","Protein-Calorie Malnutrition","Number of Procedures (two vs. one)","Severe Infection; Other Infectious Diseases","Stroke","Metastatic Cancer and Acute Leukemia","Cancer","Other Congenital Deformity of Hip (Joint)","Post Traumatic Osteoarthritis","History of COVID-19","Dementia or Other Specified Brain Disorders","Major Psychiatric Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Dialysis Status","Decubitus Ulcer or Chronic Skin Ulcer","Iron Deficiency or Other/Unspecified Anemias and Blood Disease","Congestive Heart Failure","Chronic Obstructive Pulmonary Disease (COPD)","Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance","Specified Arrhythmias and Other Heart Rhythm Disorders","Renal Failure","Respiratory Arrest; Cardio-Respiratory Failure and Shock","Pleural Effusion/Pneumothorax","Pneumonia","Other Gastrointestinal Disorders","Diabetes Mellitus (DM) or DM Complications","Decubitus Ulcer or Chronic Skin Ulcer","Male","Septicemia, Sepsis, Systemic Inflammatory Response Syndrome/Shock","Vascular or Circulatory Disease","Other Psychiatric Disorders","Valvular and Rheumatic Heart Disease","Protein-Calorie Malnutrition","Metastatic Cancer and Acute Leukemia","Other Respiratory Disorders","Severe Hematological Disorders","Coronary Atherosclerosis or Angina","Urinary Tract Infection","Lymphoma; Other Cancers","Severe Infection; Other Infectious Diseases","Drug/Alcohol Abuse/Dependence/Psychosis","Acute Coronary Syndrome","Other Urinary Tract Disorders","Lung and Other Severe Cancers","Fibrosis of Lung or Other Chronic Lung Disorders","Other Injuries","Vertebral Fractures Without Spinal Cord Injury","History of COVID-19","Asthma","Dementia or Other Specified Brain Disorders","History of Coronary Artery Bypass Graft (CABG) Surgery","Stroke","Major Psychiatric Disorders","Hemiplegia, Paraplegia, Paralysis, Functional Disability","Respirator Dependence/Tracheostomy Status","Dialysis 
Status"],"NetImpact":[0.5235426179157,0.44931704584918,0.37457404224014,0.33957243074671,0.2675418959465,0.20889600190574,0.18363519022453,0.12834192882491,0.10175224371917,0.08324200070107,0.07545016852382,0.02331580008699,0.02178937903037,0.01695937962873,0.00703750490067,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.3057268498572,2.1954742139943,1.7681007960757,1.64952489626271,1.12027563857762,0.96649803843576,0.74251967209893,0.7125267381168,0.58561614312928,0.4870209378212,0.47730649517424,0.45571865494295,0.32890076470614,0.28660856894856,0.25724789661704,0.24985511429454,0.20104445264549,0.16739506872294,0.1430754784877,0.14281362472359,0.13880072962548,0.13557895240552,0.11424105848721,0.09809135497948,0.08820129638075,0.08224193320365,0.05564420968061,0.04427640612473,0.04118173223991,0.04007963693594,0.0137186203552,0.01206184755198,0.01120929463448,0,0,0,0,0,0,0,3.56546721921474,2.4383505700119,2.1258534531899,1.7428735116513,1.7282357960685,1.27416604020528,1.16132577864612,0.96826138209712,0.96101538096801,0.82428417748222,0.7058646898766,0.64673166347136,0.58446602743255,0.43251107535505,0.43040705612184,0.40632600571668,0.29370833724054,0.27436647425115,0.26769028758744,0.22785324193658,0.21622359036498,0.1447041361696,0.12331229689752,0.10826897466488,0.0993201327659,0.08560777795158,0.06368060112146,0.05023391795846,0.03902193261168,0.03787959257955,0.00404853256189,0,0,0,0,0,0,4.15515357713177,2.80663141711267,2.19159538489132,2.085236576967,2.05192302232266,1.78362196987359,1.70979743927568,1.6317044297896,1.56089823750301,1.55758461575483,1.47259366374654,1.08929546199,0.59409163989952,0.41806676650431,0.39300612173646,0.32978389909893,0.29881994243779,0.29212209910216,0.13988994687991,0.13277839169882,0.1286610446364,0.0901654992735,0.08923188109521,0.0799645208149,0.042743700362,0,0,0,0,0,0,0,0,2.72920525222545,2.52488592892445,2.49645653824717,2.3597881535932,1.59646856993392,1.39794309865727,1.310220990912,1.1242112991375,1.0709471281501,1.0498393213654
9,0.99049161812656,0.69790924724682,0.58479638107248,0.47421530677304,0.46721989463229,0.40644881804569,0.39193175813216,0.36588714503394,0.35654133907178,0.30633742233198,0.26359064734308,0.2628246319412,0.2614979633892,0.25323892804701,0.24427408030992,0.24315227518572,0.21317320861923,0.17082178184452,0.12270154741653,0.10848046835409,0.09838461213966,0.07872272430396,0.0767024905464,0.0381856296621,0.03243113073984,0.02980557296489,0.02877533874591,0.0184252173479,0,0,0],"N":[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32],"Count":[2,2,2,1,2,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,9,10,9,7,6,9,8,2,8,6,5,2,12,8,3,1,2,2,1,6,4,3,2,1,1,1,1,1,2,4,1,1,0,0,0,0,0,0,0,14,15,14,18,17,8,12,16,17,13,10,9,5,5,12,6,3,3,6,2,6,2,8,2,5,1,2,2,4,3,1,0,0,0,0,0,0,29,19,23,12,9,7,12,10,7,13,6,4,7,3,3,3,1,2,1,1,1,2,1,1,2,0,0,0,0,0,0,0,0,15,13,19,22,16,11,25,7,10,17,11,6,24,4,9,7,8,3,2,13,1,8,5,9,8,4,3,4,1,1,11,2,1,5,8,1,1,1,0,0,0],"Rate":[1,1,1,0.5,1,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.833333333333333,0.5,0.555555555555556,0.5,0.388888888888889,0.333333333333333,0.5,0.444444444444444,0.111111111111111,0.444444444444444,0.333333333333333,0.277777777777778,0.111111111111111,0.666666666666667,0.444444444444444,0.166666666666667,0.0555555555555556,0.111111111111111,0.111111111111111,0.0555555555555556,0.333333333333333,0.222222222222222,0.166666666666667,0.111111111111111,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.0555555555555556,0.1111111111
11111,0.222222222222222,0.0555555555555556,0.0555555555555556,0,0,0,0,0,0,0,0.56,0.6,0.56,0.72,0.68,0.32,0.48,0.64,0.68,0.52,0.4,0.36,0.2,0.2,0.48,0.24,0.12,0.12,0.24,0.08,0.24,0.08,0.32,0.08,0.2,0.04,0.08,0.08,0.16,0.12,0.04,0,0,0,0,0,0,0.644444444444444,0.422222222222222,0.511111111111111,0.266666666666667,0.2,0.155555555555556,0.266666666666667,0.222222222222222,0.155555555555556,0.288888888888889,0.133333333333333,0.0888888888888889,0.155555555555556,0.0666666666666667,0.0666666666666667,0.0666666666666667,0.0222222222222222,0.0444444444444444,0.0222222222222222,0.0222222222222222,0.0222222222222222,0.0444444444444444,0.0222222222222222,0.0222222222222222,0.0444444444444444,0,0,0,0,0,0,0,0,0.46875,0.40625,0.59375,0.6875,0.5,0.34375,0.78125,0.21875,0.3125,0.53125,0.34375,0.1875,0.75,0.125,0.28125,0.21875,0.25,0.09375,0.0625,0.40625,0.03125,0.25,0.15625,0.28125,0.25,0.125,0.09375,0.125,0.03125,0.03125,0.34375,0.0625,0.03125,0.15625,0.25,0.03125,0.03125,0.03125,0,0,0],"Value":[0.26177130895785,0.22465852292459,0.18728702112007,0.33957243074671,-0.13377094797325,0.20889600190574,0.18363519022453,0.12834192882491,0.10175224371917,0.08324200070107,0.07545016852382,0.02331580008699,-0.02178937903037,0.01695937962873,0.00703750490067,0.27100275921083,0.07120286291483,-0.0676429056577,0.22634890641462,0.04401442147305,0.14477909636915,0.14174229352292,0.0297614483081,0.02729795306472,0.09968987228144,0.04570363659622,0.05614466547329,0.09943077750067,0.16362948864226,0.25127110061188,0.11440519285484,0.22038178999048,0.2439415793327,0.17681007960757,0.18328054402919,0.16003937693966,0.16108300640596,0.08250218578877,0.0890658422646,0.29280807156464,0.06087761722765,0.07955108252904,0.09114373098859,0.16445038235307,-0.02388404741238,0.03215598707713,0.08328503809818,0.20104445264549,0.08369753436147,0.07153773924385,0.14281362472359,0.02313345493758,-0.03389473810138,0.03808035282907,0.04904567748974,0.08820129638075,0.08224193320365,0.05564420968061,0.04427640612473,0.0
4118173223991,-0.02003981846797,-0.0034296550888,0.01206184755198,0.01120929463448,0.15641571251929,-0.07920807606545,0.06293681682419,0.14811373057356,-0.00864104299739,0.09533030999425,0.07294989023911,0.25467622994391,0.16255670466746,0.15184667522785,0.09682630620285,0.1016609291805,0.15927075502566,0.09677714822051,0.06051633638107,0.05653031652753,0.06340647519094,0.07058646898766,0.07185907371904,0.11689320548651,0.08650221507101,-0.03586725467682,0.06772100095278,0.09790277908018,0.09145549141705,0.04461504793124,0.11392662096829,0.03603726506083,0.0723520680848,0.01541403711219,0.05413448733244,0.01986402655318,0.08560777795158,0.03184030056073,0.02511695897923,0.00975548315292,0.01262653085985,0.00404853256189,-0.00239294251728,0.14874300720604,0.23762950955586,0.0742789500502,0.15063453076202,0.05285914464513,0.14328115783213,0.14771744300593,0.09528675586484,0.17376971474725,0.22799144692474,0.25480313855337,0.14248311993964,0.16317044297896,0.22298546250043,0.11981420121191,0.24543227729109,0.2723238654975,0.08487023427136,0.13935558883477,0.13100204057882,0.10992796636631,0.29881994243779,0.14606104955108,0.13988994687991,0.13277839169882,0.1286610446364,0.04508274963675,0.08923188109521,0.0799645208149,-0.021371850181,0.28564566526498,-1.52018641e-05,-0.02583679305955,0.15538492166378,0.23494388360546,0.1224436188646,0.5978782174907,0.20960556510495,0.18194701681503,0.19422199453265,0.13139244938143,0.1072630978906,0.09977928562087,0.12708573624157,0.05240883963648,0.1606016141625,0.10709471281501,0.06175525419797,0.09004469255696,0.11631820787447,0.02436651587802,0.11855382669326,0.05191332162581,0.05806411686367,0.04899146976652,0.12196238167798,0.17827066953589,0.02356441710246,0.26359064734308,0.03285307899265,0.05229959267784,0.02813765867189,0.03053426003874,0.06078806879643,0.07105773620641,0.04270544546113,0.12270154741653,0.10848046835409,0.00894405564906,0.03936136215198,-0.0767024905464,-0.00763712593242,-0.00405389134248,-0.02980557296489,
0.02877533874591,0.0184252173479,0.11023929530957,0.17600014087344,0.24009331029806]},"columns":[{"id":"Cohort","name":"Cohort","type":"character"},{"id":"Factor","name":"Risk Factor","type":"character"},{"id":"NetImpact","name":"Net Impact","type":"numeric","format":{"cell":{"digits":2},"aggregated":{"digits":2}}},{"id":"N","name":"Discharge Count","type":"numeric"},{"id":"Count","name":"Count","type":"numeric"},{"id":"Rate","name":"Percent","type":"numeric","format":{"cell":{"digits":1,"percent":true},"aggregated":{"digits":1,"percent":true}}},{"id":"Value","name":"Value","type":"numeric"}],"columnGroups":[{"name":"Risk Factor Prevalence","columns":["Count","Rate"]}],"groupBy":["Cohort"],"resizable":true,"filterable":true,"searchable":true,"dataKey":"e32991bf80ecaa70061e95ab294ab68b"},"children":[]},"class":"reactR_markup"},"evals":[],"jsHooks":[]}</script>
</div>
</div>
<p>If you scan through the table you’ll notice that the risk factors that have the highest model weight or highest prevalence are not necessarily the ones with the most net impact.</p>
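<p>To make that concrete, here is a minimal sketch using three rows copied (and rounded) from the table above. It assumes net impact is the risk factor count multiplied by the absolute value of the model weight, which is what the table values appear to reflect. The <code>examples</code> data frame is purely illustrative.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Three rows copied (rounded) from the table above
examples &lt;- data.frame(
  Cohort = c("HF", "HF", "AMI"),
  Factor = c(
    "Renal Failure",
    "Specified Arrhythmias and Other Heart Rhythm Disorders",
    "Male"
  ),
  Count = c(14, 18, 2),
  Value = c(0.2547, 0.0968, -0.1338) # Model weights (log-odds scale)
)

# Assumed relationship: net impact = count x |model weight|
examples$NetImpact &lt;- examples$Count * abs(examples$Value)

# Sort from most to least net impact
examples[order(-examples$NetImpact), ]</code></pre></div>
<p>In the HF cohort, the arrhythmia factor is more prevalent (18 discharges versus 14), yet renal failure has roughly twice the net impact because its model weight is much larger.</p>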
<p>Finally, we could put these in a plot:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb108" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb108-1">prevalence <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb108-2"></span>
<span id="cb108-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get model weight</span></span>
<span id="cb108-4">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb108-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> model_weights,</span>
<span id="cb108-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> </span>
<span id="cb108-7">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb108-8">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cohort"</span>,</span>
<span id="cb108-9">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span></span>
<span id="cb108-10">      )</span>
<span id="cb108-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb108-12">  </span>
<span id="cb108-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb108-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb108-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb108-16">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Count,</span>
<span id="cb108-17">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(Value),</span>
<span id="cb108-18">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Cohort</span>
<span id="cb108-19">    ),</span>
<span id="cb108-20">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb108-21">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb108-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb108-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Cohort, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_x"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nrow =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb108-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb108-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Risk Factor Count"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb108-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Odds Ratio"</span>) </span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/investigating-a-hospital-specific-report/index_files/figure-html/unnamed-chunk-55-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Points that are closer to the upper-right quadrant are the most influential. Of course, it would be useful to add interactivity to these plots (e.g., with <code>plotly</code>) so that the user can hover over each point and see which risk factor it represents, but you get the idea.</p>
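<p>As a rough sketch of that interactivity (assuming the <code>plotly</code> package is installed), the risk factor name can be mapped to a <code>text</code> aesthetic and the plot passed to <code>plotly::ggplotly()</code>. The <code>plot_data</code> data frame below is a toy stand-in built from three rows of the table above, not the full joined dataset.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)

# Toy stand-in for the joined prevalence/model weight data
plot_data &lt;- data.frame(
  Cohort = c("HF", "HF", "AMI"),
  Factor = c(
    "Renal Failure",
    "Specified Arrhythmias and Other Heart Rhythm Disorders",
    "Male"
  ),
  Count = c(14, 18, 2),
  Value = c(0.2547, 0.0968, -0.1338)
)

# Same plot as above, with the factor name carried along as hover text
# (ggplot2 warns about the unknown 'text' aesthetic; plotly still uses it)
p &lt;-
  ggplot(plot_data) +
  geom_point(
    aes(
      x = Count,
      y = exp(Value),
      color = Cohort,
      text = Factor
    ),
    show.legend = FALSE
  ) +
  geom_hline(yintercept = 1) +
  facet_wrap(~Cohort, scales = "free_x") +
  theme_minimal() +
  xlab("Risk Factor Count") +
  ylab("Odds Ratio")

# Convert to an interactive widget; hovering shows the risk factor
p_interactive &lt;- plotly::ggplotly(p, tooltip = "text")
p_interactive</code></pre></div>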
</section>
</section>
<section id="other-analyses" class="level2">
<h2 class="anchored" data-anchor-id="other-analyses">3. Other Analyses</h2>
<p>Many other questions can be answered by analyzing an HSR to better understand HRRP program results. We’ll list a few more ideas here without working through them in full:</p>
<section id="diagnosis-comparison" class="level3">
<h3 class="anchored" data-anchor-id="diagnosis-comparison">Diagnosis Comparison</h3>
<p>In the discharge reports, CMS supplies the diagnosis code that led the patient’s index discharge to be included in the program. For patients who were readmitted, the table also supplies the diagnosis codes received at the readmission stay. These can be accessed by calling the <code>hsr_discharges()</code> function and extracting the appropriate columns. Here’s an example doing this for the HF cohort:</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb109" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb109-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb109-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb109-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HF"</span>,</span>
<span id="cb109-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb109-5">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb109-6"></span>
<span id="cb109-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep diagnosis columns</span></span>
<span id="cb109-8">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matches</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Diagnosis"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb109-9">  </span>
<span id="cb109-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Aggregate</span></span>
<span id="cb109-11">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb109-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">N =</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb109-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">everything</span>()</span>
<span id="cb109-14">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb109-15">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(N))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 9 × 3
  `Principal Discharge Diagnosis of Index Stay` Principal Discharge Diag…¹     N
  &lt;chr&gt;                                         &lt;chr&gt;                      &lt;int&gt;
1 I110                                          &lt;NA&gt;                          12
2 I130                                          &lt;NA&gt;                           5
3 I5033                                         &lt;NA&gt;                           2
4 I5033                                         I130                           1
5 I130                                          A0472                          1
6 I132                                          U071                           1
7 I509                                          &lt;NA&gt;                           1
8 I5023                                         &lt;NA&gt;                           1
9 I5043                                         &lt;NA&gt;                           1
# ℹ abbreviated name: ¹​`Principal Discharge Diagnosis of Readmission`</code></pre>
</div>
</div>
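<p>This can be extremely useful for understanding (a) which diagnoses are leading to cohort inclusion to begin with, and (b) why the patient came back. Sometimes there may be readmissions that are completely unrelated to the index diagnosis. In the output above, for instance, one readmission has a principal diagnosis of <code>U071</code> (COVID-19) following an index stay coded <code>I132</code>.</p>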
</section>
<section id="outside-hospital-readmissions" class="level3">
<h3 class="anchored" data-anchor-id="outside-hospital-readmissions">Outside Hospital Readmissions</h3>
<p>Since CMS enforces the HRRP, hospitals get penalized for readmissions that don’t even occur at their hospital. If a patient is discharged from your hospital and subsequently readmitted somewhere else, it still counts as <em>your</em> readmission. Thus it is useful to understand the share of readmissions that come back to your own hospital versus somewhere else. This is provided in the HSR as well.</p>
<div class="cell">
<details open="" class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb111" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb111-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb111-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_report,</span>
<span id="cb111-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HF"</span>,</span>
<span id="cb111-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb111-5">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb111-6"></span>
<span id="cb111-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep the same-hospital readmission indicator</span></span>
<span id="cb111-8">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matches</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Same Hospital"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb111-9">  </span>
<span id="cb111-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table</span></span>
<span id="cb111-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">useNA =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ifany"</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Readmission to Same Hospital (Yes/No)
  No  Yes &lt;NA&gt; 
   2    1   22 </code></pre>
</div>
</div>
<p>For example, of the 25 HF discharges, three (3) were readmitted: one (1) to the same hospital as the index stay, and two (2) elsewhere (maybe within your system, or not). The remaining 22 (shown as <code>NA</code>) had no readmission.</p>
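<p>To turn those counts into the share of readmissions that went elsewhere, here is a minimal sketch. The counts are typed in by hand from the output above, and the <code>same_hospital</code> vector is just an illustrative name.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Readmission counts copied from the output above
# (the NA group is excluded since those discharges had no readmission)
same_hospital &lt;- c(No = 2, Yes = 1)

# Share of readmissions by whether they returned to the same hospital
shares &lt;- prop.table(same_hospital)
round(shares, 2)</code></pre></div>
<p>Here about two-thirds of the HF readmissions occurred somewhere other than the index hospital, although with only three readmissions that share is very unstable.</p>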
</section>
<section id="benchmarking-with-other-hospitals" class="level3">
<h3 class="anchored" data-anchor-id="benchmarking-with-other-hospitals">Benchmarking With Other Hospitals</h3>
<p>CMS provides access to hospital-level performance data through the <a href="https://data.cms.gov/provider-data/">Provider Data Catalog (PDC)</a>. You can use the <code>pdc_*</code> functions in this package to explore and import various data files to perform comparative analyses at the hospital level (see <code>pdc_read()</code>).</p>


<!-- -->

</section>
</section>
</section>

 ]]></description>
  <category>Software Development</category>
  <category>Healthcare</category>
  <category>Readmissions</category>
  <guid>https://www.zajichekstats.com/post/investigating-a-hospital-specific-report/</guid>
  <pubDate>Fri, 26 Dec 2025 06:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/investigating-a-hospital-specific-report/feature.png" medium="image" type="image/png" height="102" width="144"/>
</item>
<item>
  <title>Introducing the {readmit} R package</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/introducing-the-readmit-r-package/</link>
  <description><![CDATA[ 




<p><em>This article is a copy of the package’s <code>README</code> file. See the package website <a href="https://centralstatz.github.io/readmit/index.html">here</a>.</em></p>
<p><code>readmit</code> is an evolving R package that contains tools for working with and analyzing hospital readmissions data. Currently, it provides utilities for components of the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">Hospital Readmissions Reduction Program (HRRP)</a>, including program timeline functions, <a href="https://qualitynet.cms.gov/inpatient/hrrp/reports">Hospital-Specific Report (HSR)</a> helpers, and general importing tools for the <a href="https://data.cms.gov/provider-data/">Provider Data Catalog (PDC)</a>.</p>
<section id="installation" class="level2">
<h2 class="anchored" data-anchor-id="installation">Installation</h2>
<p>You can install <code>readmit</code> from CRAN:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"readmit"</span>)</span></code></pre></div>
<p>Or the development version from <a href="https://github.com/">GitHub</a> with:</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("pak")</span></span>
<span id="cb2-2">pak<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pak</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"centralstatz/readmit"</span>)</span></code></pre></div>
</section>
<section id="background" class="level1">
<h1>Background</h1>
<p>A <em>readmission</em> occurs when a patient is admitted to the hospital again shortly after being discharged (30 days is the typical time frame used between hospitalizations). First and foremost, it is an obvious burden to patients for multiple reasons (e.g., psychological, financial).</p>
<p>Additionally, hospitals across the United States are <a href="https://qualitynet.cms.gov/inpatient/hrrp/methodology">penalized</a> by the <a href="https://www.cms.gov/">Centers for Medicare &amp; Medicaid Services (CMS)</a> on an annual basis in what is called the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">Hospital Readmissions Reduction Program (HRRP)</a>. In this program, up to 3% of Medicare reimbursement is withheld from hospitals for the duration of a <a href="https://www.usa.gov/federal-budget-process">fiscal year</a> depending on the volume of <a href="https://qualitynet.cms.gov/inpatient/hrrp/measures#:~:text=Planned%20Readmission%20Algorithm.-,Excess%20Readmission%20Ratio,-The%20excess%20readmission">excess readmissions</a> in <a href="https://www.cms.gov/medicare/quality/value-based-programs/hospital-readmissions#:~:text=What%20measures%20are%20included%20in%20the%20Hospital%20Readmissions%20Reduction%20Program%3F">select patient populations</a> during a preceding <a href="https://qualitynet.cms.gov/inpatient/hrrp/resources">performance period</a>. Readmissions also show up in other payer contracts, such as those with commercial insurers. Thus, readmissions are a key area of focus for hospitals and part of the general measure of overall health of clinical and financial operations.</p>
<p>Typically, cross-functional teams are deployed within health systems to monitor and develop initiatives, interventions, and overall strategy to manage and prevent readmissions. This includes things like care coordination and patient outreach, as well as how to incorporate technology such as predictive analytics to identify high-risk patients and prevent readmissions before they occur.</p>
<section id="the-problems" class="level2">
<h2 class="anchored" data-anchor-id="the-problems">The Problems</h2>
<p>The issues in doing this flawlessly are multi-factorial, but we’ll list a few that we see as relevant to (and motivators for) this package:</p>
<ul>
<li><p><strong>Reporting lineage</strong>: It is often difficult to have a seamless line of sight and reconcile hospital-wide metrics (e.g., overall readmission rates, penalty amounts, etc.) down to individual patients and their associated impact. There are many reasons for this: some may be technical (e.g., reporting tools, data collection, systems/personnel constraints), but some are due to complexities of hospital operations: varying definitions of metrics (e.g., how do we define readmissions?), diverse patient populations (e.g., which patients should/should not be included in the rates?), differential impact on outcomes (e.g., only Medicare patients contribute to the penalty but the hospital cares about readmissions for all patients). <br></p></li>
<li><p><strong>Payor Contracts/Reimbursement</strong>: Readmissions have different implications depending on who’s paying for the service (among other things). It is difficult to disentangle and account for these nuances (especially in reporting/metrics) when developing readmission prevention strategies while optimizing financial health. <br></p></li>
<li><p><strong>Oversaturation of research</strong>: There is a <a href="https://pubmed.ncbi.nlm.nih.gov/?term=predicting+readmissions">large body of research</a> on readmissions. As a result, hospitals are thrown all kinds of “evidence” about how they should prevent readmissions, but it’s difficult to confidently translate and distill that to a localized, optimized, actionable program for any one hospital, especially when it’s conflicting. <br></p></li>
<li><p><strong>Over-reliance on risk tools</strong>: Especially in the hype of AI, machine learning (ML), etc., there are various vendor platforms and risk tools that purport to predict readmissions. The issue comes with how they are implemented into clinical workflows. These models may be good <em>statistically</em>, but must be implemented with intention and cross-functional teamwork for them to actually be useful. AI is not magic! Additionally, many out-of-the-box models focus on predicting readmission risk at the time of discharge. These may be good markers of baseline clinical risk but can quickly grow stale as the patient enters the post-discharge phase where the real drivers of readmissions occur. <br></p></li>
<li><p><strong>Complexity of government programs</strong>: The <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a> has many moving parts and details that make it difficult to track what’s really going on. This includes the timing of the discharges that are actually counted in the program relative to when payment penalties are applied, the diagnosis codes and claims documentation used to identify patients to include, the statistical methodology behind the scenes that powers program metrics, and the way all of that rolls up into a penalty percentage amount administered to the hospital, among other things. Each of these details has deep nuances with tangible impact.</p></li>
</ul>
<p><br></p>
<p>This package is meant to provide tools to help with components of these issues. In particular, the current state of the package focuses on the last item, making it easier to analyze information related to the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a>. Over time, we hope this scope widens.</p>
</section>
</section>
<section id="examples" class="level1">
<h1>Examples</h1>
<p>Here are a few ways to use the package.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(readmit)</span></code></pre></div>
</details>
</div>
<section id="extracting-key-dates-from-the-hrrp" class="level2">
<h2 class="anchored" data-anchor-id="extracting-key-dates-from-the-hrrp">Extracting key dates from the HRRP</h2>
<p>An important piece of the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a> is to understand the timelines and dates associated with the program. We provide built-in datasets to conveniently access these dates (see <code>?hrrp_keydates</code>). For example, <code>hrrp_performance_periods</code> provides the date ranges for discharges that are included in each program year:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">hrrp_performance_periods</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>   ProgramYear  StartDate    EndDate
1         2027 2023-07-01 2025-06-30
2         2026 2021-07-01 2024-06-30
3         2025 2020-07-01 2023-06-30
4         2024 2019-07-01 2019-12-01
5         2024 2020-07-01 2022-06-30
6         2023 2018-07-01 2019-12-01
7         2023 2020-07-01 2021-06-30
8         2022 2017-07-01 2019-12-01
9         2021 2016-07-01 2019-06-30
10        2020 2015-07-01 2018-06-30
11        2019 2014-07-01 2017-06-30</code></pre>
</div>
</div>
<p>And <code>hrrp_snapshot_dates</code> provides the date that CMS took the extract of claims data for each program year:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">hrrp_snapshot_dates</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>  ProgramYear SnapshotDate
1        2027   2025-09-30
2        2026   2024-10-22
3        2025   2023-10-13
4        2024   2022-09-30
5        2023   2021-09-24
6        2022   2020-09-25
7        2021   2019-09-27
8        2020   2018-09-28
9        2019   2017-09-29</code></pre>
</div>
</div>
<p>Or, all of the individual <code>hrrp_*</code> datasets are pre-joined in <code>hrrp_keydates</code>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">hrrp_keydates</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>   ProgramYear PerformanceStartDate PerformanceEndDate PaymentStartDate
1         2027           2023-07-01         2025-06-30       2026-10-01
2         2026           2021-07-01         2024-06-30       2025-10-01
3         2025           2020-07-01         2023-06-30       2024-10-01
4         2024           2019-07-01         2019-12-01       2023-10-01
5         2024           2020-07-01         2022-06-30       2023-10-01
6         2023           2018-07-01         2019-12-01       2022-10-01
7         2023           2020-07-01         2021-06-30       2022-10-01
8         2022           2017-07-01         2019-12-01       2021-10-01
9         2021           2016-07-01         2019-06-30       2020-10-01
10        2020           2015-07-01         2018-06-30       2019-10-01
11        2019           2014-07-01         2017-06-30       2018-10-01
   PaymentEndDate ReviewStartDate ReviewEndDate SnapshotDate AMI COPD HF PN
1      2027-09-30            &lt;NA&gt;          &lt;NA&gt;   2025-09-30   1    1  1  1
2      2026-09-30      2025-08-12    2025-09-10   2024-10-22   1    1  1  1
3      2025-09-30      2024-08-12    2024-09-10   2023-10-13   1    1  1  1
4      2024-09-30      2023-08-08    2023-09-07   2022-09-30   1    1  1  1
5      2024-09-30      2023-08-08    2023-09-07   2022-09-30   1    1  1  1
6      2023-09-30      2022-08-08    2022-09-07   2021-09-24   1    1  1  0
7      2023-09-30      2022-08-08    2022-09-07   2021-09-24   1    1  1  0
8      2022-09-30      2021-08-09    2021-09-08   2020-09-25   1    1  1  1
9      2021-09-30      2020-08-10    2020-09-09   2019-09-27   1    1  1  1
10     2020-09-30      2019-08-09    2019-09-09   2018-09-28   1    1  1  1
11     2019-09-30      2018-08-06    2018-09-05   2017-09-29   1    1  1  1
   CABG HK
1     1  1
2     1  1
3     1  1
4     1  1
5     1  1
6     1  1
7     1  1
8     1  1
9     1  1
10    1  1
11    1  1</code></pre>
</div>
</div>
<section id="finding-relevant-program-dates" class="level3">
<h3 class="anchored" data-anchor-id="finding-relevant-program-dates">Finding relevant program dates</h3>
<p>We can also use the <code>hrrp_get_dates()</code> function to extract relevant time periods for an inputted date. For example:</p>
<ul>
<li><em>“What is the performance period for payments my hospital is currently being penalized for?”</em></li>
</ul>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hrrp_get_dates</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.Date</span>(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"performance"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 3
  ProgramYear StartDate  EndDate   
        &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;     
1        2026 2021-07-01 2024-06-30</code></pre>
</div>
</div>
<ul>
<li><em>“What payment periods did a discharge from 1/1/2022 impact?”</em></li>
</ul>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hrrp_get_dates</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.Date</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2022-01-01"</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"payment"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3 × 3
  ProgramYear StartDate  EndDate   
        &lt;int&gt; &lt;date&gt;     &lt;date&gt;    
1        2026 2025-10-01 2026-09-30
2        2025 2024-10-01 2025-09-30
3        2024 2023-10-01 2024-09-30</code></pre>
</div>
</div>
<p>We can see that not only are the discharges that impact <em>today’s</em> payment reductions multiple years old, but also individual discharges (and their associated readmissions) can impact the program result for three (3) years in a row.</p>
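<p>To put a rough number on that lag, here is a minimal sketch in base R. It rebuilds a small table by hand from the <code>hrrp_keydates</code> output printed earlier (the three most recent program years), so it runs without the package; the <code>key_dates</code> and <code>LagDays</code> names are just for illustration.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Values copied by hand from the hrrp_keydates output above
key_dates &lt;- data.frame(
  ProgramYear = c(2027, 2026, 2025),
  PerformanceEndDate = as.Date(c("2025-06-30", "2024-06-30", "2023-06-30")),
  PaymentStartDate = as.Date(c("2026-10-01", "2025-10-01", "2024-10-01"))
)

# Days between the last counted discharge date and the first penalty day
key_dates$LagDays &lt;- as.numeric(
  key_dates$PaymentStartDate - key_dates$PerformanceEndDate
)
key_dates</code></pre></div>
</details>
</div>
<p>Each lag is more than a year, which is the gap between the last discharge counted and the first day the corresponding penalty applies.</p>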
</section>
</section>
<section id="analyzing-hospital-reports" class="level2">
<h2 class="anchored" data-anchor-id="analyzing-hospital-reports">Analyzing hospital reports</h2>
<p><em><strong>Note</strong>: CMS changed the format of Hospital-Specific Reports (HSRs) for FY2026 (see <a href="https://qualitynet.cms.gov/inpatient/hrrp/reports#tab2">here</a>). The current HSR functions support Excel-based formats through FY2025.</em></p>
<p><a href="https://www.cms.gov/">CMS</a> sends out <a href="https://qualitynet.cms.gov/inpatient/hrrp/reports">Hospital-Specific Reports (HSR)</a> each program year detailing the calculations of the payment reduction for the upcoming fiscal year (hospitals are given a 1-month period to review and submit corrections, the dates of which can be accessed with <code>hrrp_review_periods</code>). These reports contain the penalty amount down to the individual, line-item discharges that were included in the program. The package functions prefixed like <code>hsr_*</code> are meant to be used with them. For example, we can use the <code>hsr_discharges()</code> function to extract discharge-level data for the heart failure cohort included in the readmission denominator into a clean data frame:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Mock report from QualityNet</span></span>
<span id="cb14-2">my_hsr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_mock_reports</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FY2025_HRRP_MockHSR.xlsx"</span>)</span>
<span id="cb14-3"></span>
<span id="cb14-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb14-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_hsr,</span>
<span id="cb14-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HF"</span>,</span>
<span id="cb14-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb14-8">)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 25 × 17
   `ID Number` MBI         `Medical Record Number` `Beneficiary DOB`
         &lt;int&gt; &lt;chr&gt;       &lt;chr&gt;                   &lt;chr&gt;            
 1           1 9AA9AA9AA99 99999A                  99/99/9999       
 2           2 9AA9AA9AA99 99999A                  99/99/9999       
 3           3 9AA9AA9AA99 99999A                  99/99/9999       
 4           4 9AA9AA9AA99 99999A                  99/99/9999       
 5           5 9AA9AA9AA99 99999A                  99/99/9999       
 6           6 9AA9AA9AA99 99999A                  99/99/9999       
 7           7 9AA9AA9AA99 99999A                  99/99/9999       
 8           8 9AA9AA9AA99 99999A                  99/99/9999       
 9           9 9AA9AA9AA99 99999A                  99/99/9999       
10          10 9AA9AA9AA99 99999A                  99/99/9999       
# ℹ 15 more rows
# ℹ 13 more variables: `Admission Date of Index Stay` &lt;chr&gt;,
#   `Discharge Date of Index Stay` &lt;chr&gt;,
#   `Cohort Inclusion/Exclusion Indicator` &lt;chr&gt;, `Index Stay (Yes/No)` &lt;chr&gt;,
#   `Principal Discharge Diagnosis of Index Stay` &lt;chr&gt;,
#   `Discharge Destination` &lt;chr&gt;,
#   `Unplanned Readmission within 30 Days (Yes/No) [a]` &lt;chr&gt;, …</code></pre>
</div>
</div>
<p>We could also extract the risk factors for each patient used in the statistical models developed by CMS to estimate adjusted readmission risk:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_discharges</span>(</span>
<span id="cb16-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_hsr,</span>
<span id="cb16-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HF"</span>,</span>
<span id="cb16-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eligible_only =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb16-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">risk_factors =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb16-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">discharge_phi =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb16-7">)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 25 × 39
   `ID Number` `Years Over 65 (continuous)`  Male History of Coronary Artery B…¹
         &lt;int&gt;                        &lt;dbl&gt; &lt;dbl&gt;                          &lt;dbl&gt;
 1           1                            8     1                              0
 2           2                           25     1                              1
 3           3                            9     0                              0
 4           4                            9     0                              0
 5           5                           30     0                              0
 6           6                           13     0                              0
 7           7                           12     1                              1
 8           8                            7     1                              1
 9           9                           25     1                              0
10          10                           22     0                              0
# ℹ 15 more rows
# ℹ abbreviated name: ¹​`History of Coronary Artery Bypass Graft (CABG) Surgery`
# ℹ 35 more variables: `History of COVID-19` &lt;dbl&gt;,
#   `Metastatic Cancer and Acute Leukemia` &lt;dbl&gt;, Cancer &lt;dbl&gt;,
#   `Diabetes Mellitus (DM) or DM Complications` &lt;dbl&gt;,
#   `Protein-Calorie Malnutrition` &lt;dbl&gt;,
#   `Other Significant Endocrine and Metabolic Disorders; Disorders of Fluid/Electrolyte/Acid-base Balance` &lt;dbl&gt;, …</code></pre>
</div>
</div>
<p>We could then choose to extract the actual model coefficients (weights) that get applied to the patient risk factors:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_coefficients</span>(</span>
<span id="cb18-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_hsr,</span>
<span id="cb18-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HF"</span></span>
<span id="cb18-4">)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 40 × 2
   Factor                                                                  Value
   &lt;chr&gt;                                                                   &lt;dbl&gt;
 1 Years Over 65 (continuous)                                           -0.00589
 2 Male                                                                 -0.0359 
 3 History of Coronary Artery Bypass Graft (CABG) Surgery                0.0199 
 4 History of COVID-19                                                  -0.00239
 5 Metastatic Cancer and Acute Leukemia                                  0.149  
 6 Cancer                                                                0.0126 
 7 Diabetes Mellitus (DM) or DM Complications                            0.0968 
 8 Protein-Calorie Malnutrition                                          0.0856 
 9 Other Significant Endocrine and Metabolic Disorders; Disorders of F…  0.163  
10 Liver or Biliary Disease                                              0.0865 
# ℹ 30 more rows</code></pre>
</div>
</div>
<p>These tables can be joined together and each patient’s readmission risk that CMS used can be computed and analyzed. Or, we can use the <code>hsr_readmission_risks()</code> function to do this for us:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hsr_readmission_risks</span>(</span>
<span id="cb20-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> my_hsr,</span>
<span id="cb20-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cohort =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HF"</span></span>
<span id="cb20-4">)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 25 × 3
   `ID Number` Predicted Expected
         &lt;int&gt;     &lt;dbl&gt;    &lt;dbl&gt;
 1           1    0.258    0.264 
 2           2    0.186    0.192 
 3           3    0.184    0.189 
 4           4    0.188    0.193 
 5           5    0.0857   0.0885
 6           6    0.133    0.138 
 7           7    0.110    0.113 
 8           8    0.183    0.188 
 9           9    0.163    0.168 
10          10    0.179    0.184 
# ℹ 15 more rows</code></pre>
</div>
</div>
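<p>For intuition on what joining the tables and computing the risk by hand involves, here is a minimal sketch in base R, assuming the risks come from a logistic regression model (a linear predictor passed through the inverse-logit). It uses only three of the rounded HF coefficients printed above and the risk factor values of the first two mock discharges; the intercept is a made-up value, so the results are illustrative and will not match the <code>hsr_readmission_risks()</code> output.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Three of the HF coefficients printed above (rounded)
coefs &lt;- c(
  "Years Over 65 (continuous)" = -0.00589,
  "Male" = -0.0359,
  "History of Coronary Artery Bypass Graft (CABG) Surgery" = 0.0199
)

# Hypothetical intercept (NOT taken from the report)
intercept &lt;- -1.5

# Risk factor values for the first two mock discharges shown above
x &lt;- rbind(
  c(8, 1, 0),
  c(25, 1, 1)
)
colnames(x) &lt;- names(coefs)

# Linear predictor, then inverse-logit to get a probability
linear_predictor &lt;- intercept + as.vector(x %*% coefs)
risk &lt;- plogis(linear_predictor)
risk</code></pre></div>
</details>
</div>
<p>In a real report the sum would run over every factor returned by <code>hsr_coefficients()</code>, which is the bookkeeping that <code>hsr_readmission_risks()</code> handles for us.</p>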
<p>As you can see, there are many ways to use the information in these reports to gain insight into readmissions at your hospital. Further analysis strategies can be explored in the <a href="https://centralstatz.github.io/readmit/articles/investigating-an-hsr.html">associated article</a>.</p>
</section>
<section id="importing-data-from-the-provider-data-catalog" class="level2">
<h2 class="anchored" data-anchor-id="importing-data-from-the-provider-data-catalog">Importing data from the Provider Data Catalog</h2>
<p><a href="https://www.cms.gov/">CMS</a> provides access to a large repository of datasets in the <a href="https://data.cms.gov/provider-data/">Provider Data Catalog (PDC)</a>, which includes, among many other datasets, readmission measures and HRRP program results for hospitals around the United States. The package functions prefixed like <code>pdc_*</code> are general functions to explore and import metadata/datasets straight from the website into clean datasets in <code>R</code> (see <code>?pdc_read</code>). For example, we can use <code>pdc_topics()</code> to get the collection of topics seen <a href="https://data.cms.gov/provider-data/">here</a>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdc_topics</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "Dialysis facilities"                   
 [2] "Doctors and clinicians"                
 [3] "Home health services"                  
 [4] "Hospice care"                          
 [5] "Hospitals"                             
 [6] "Inpatient rehabilitation facilities"   
 [7] "Long-term care hospitals"              
 [8] "Nursing homes including rehab services"
 [9] "Physician office visit costs"          
[10] "Supplier directory"                    </code></pre>
</div>
</div>
<p>Then we can choose a topic (or topics) we want to find datasets for, and extract their metadata with <code>pdc_datasets()</code>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">hospital_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdc_datasets</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hospitals"</span>)</span>
<span id="cb24-2">hospital_data</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 69 × 7
   datasetid topic     title       description issued     modified   downloadurl
   &lt;chr&gt;     &lt;chr&gt;     &lt;chr&gt;       &lt;chr&gt;       &lt;date&gt;     &lt;date&gt;     &lt;chr&gt;      
 1 axe7-s95e Hospitals Ambulatory… This file … 2025-10-01 2025-10-01 https://da…
 2 wue8-3vwe Hospitals Ambulatory… This file … 2025-10-01 2025-10-01 https://da…
 3 4jcv-atw7 Hospitals Ambulatory… A list of … 2025-10-01 2025-10-01 https://da…
 4 hbf-map   Hospitals Birthing F… A list of … 2025-07-09 2025-10-14 https://da…
 5 muwa-iene Hospitals CMS Medica… This data … 2020-12-10 2025-10-14 https://da…
 6 ynj2-r877 Hospitals Complicati… Complicati… 2023-07-05 2025-10-20 https://da…
 7 qqw3-t4ie Hospitals Complicati… Complicati… 2020-12-10 2025-10-14 https://da…
 8 bs2r-24vh Hospitals Complicati… Complicati… 2020-12-10 2025-10-14 https://da…
 9 jfnd-nl7s Hospitals Complicati… Prospectiv… 2024-07-31 2025-10-14 https://da…
10 z8ax-x9j1 Hospitals Complicati… Prospectiv… 2024-07-31 2025-10-14 https://da…
# ℹ 59 more rows</code></pre>
</div>
</div>
<p>This result contains information on all datasets included under the <em>Hospitals</em> topic. We can then explore this list to find a dataset we want to import. For example, we can search the titles of the datasets relevant to readmissions:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">readmission_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb26-2">  hospital_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-3">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb26-4">      stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(</span>
<span id="cb26-5">        title,</span>
<span id="cb26-6">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pattern =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(?i)readmission"</span></span>
<span id="cb26-7">      )</span>
<span id="cb26-8">    )</span>
<span id="cb26-9">readmission_data</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 7
  datasetid topic     title        description issued     modified   downloadurl
  &lt;chr&gt;     &lt;chr&gt;     &lt;chr&gt;        &lt;chr&gt;       &lt;date&gt;     &lt;date&gt;     &lt;chr&gt;      
1 9n3s-kdb3 Hospitals Hospital Re… In October… 2020-12-10 2025-01-08 https://da…</code></pre>
</div>
</div>
<p>Once we find the dataset we want, we can take note of the <code>datasetid</code>, and use the <code>pdc_read()</code> function to import it:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">hrrp_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdc_read</span>(readmission_data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>datasetid)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stderr">
<pre><code>Rows: 18510 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): Facility Name, Facility ID, State, Measure Name, Number of Dischar...
dbl  (1): Footnote

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
</div>
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">hrrp_data</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 18,510 × 12
   `Facility Name`     `Facility ID` State `Measure Name` `Number of Discharges`
   &lt;chr&gt;               &lt;chr&gt;         &lt;chr&gt; &lt;chr&gt;          &lt;chr&gt;                 
 1 SOUTHEAST HEALTH M… 010001        AL    READM-30-AMI-… 296                   
 2 SOUTHEAST HEALTH M… 010001        AL    READM-30-CABG… 151                   
 3 SOUTHEAST HEALTH M… 010001        AL    READM-30-HF-H… 681                   
 4 SOUTHEAST HEALTH M… 010001        AL    READM-30-HIP-… N/A                   
 5 SOUTHEAST HEALTH M… 010001        AL    READM-30-PN-H… 490                   
 6 SOUTHEAST HEALTH M… 010001        AL    READM-30-COPD… 130                   
 7 MARSHALL MEDICAL C… 010005        AL    READM-30-CABG… N/A                   
 8 MARSHALL MEDICAL C… 010005        AL    READM-30-HIP-… N/A                   
 9 MARSHALL MEDICAL C… 010005        AL    READM-30-HF-H… 176                   
10 MARSHALL MEDICAL C… 010005        AL    READM-30-PN-H… 305                   
# ℹ 18,500 more rows
# ℹ 7 more variables: Footnote &lt;dbl&gt;, `Excess Readmission Ratio` &lt;chr&gt;,
#   `Predicted Readmission Rate` &lt;chr&gt;, `Expected Readmission Rate` &lt;chr&gt;,
#   `Number of Readmissions` &lt;chr&gt;, `Start Date` &lt;chr&gt;, `End Date` &lt;chr&gt;</code></pre>
</div>
</div>
<p>And then we can use this dataset for further analysis. For example:</p>
<p><em>“How many hospitals in this dataset are located in Wisconsin?”</em></p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">hrrp_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb32-2">  dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(State <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"WI"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb32-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> _, dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Facility ID</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 65</code></pre>
</div>
</div>
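<p>One thing to watch for in the output above: columns like <code>Number of Discharges</code> and <code>Excess Readmission Ratio</code> are imported as character, with <code>"N/A"</code> marking missing values. Here is a minimal sketch of one way to convert them before doing any math. It uses a small stand-in tibble (<code>toy_hrrp</code>, a hypothetical name) rather than the real download, and the ratio values in it are made up for illustration.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in for a few rows of hrrp_data (ratio values are made up)
toy_hrrp &lt;-
  tibble::tibble(
    `Facility ID` = c("010001", "010001", "010005"),
    `Number of Discharges` = c("296", "N/A", "176"),
    `Excess Readmission Ratio` = c("0.98", "N/A", "1.03")
  )

# Turn "N/A" into a true missing value, then convert to numeric
toy_clean &lt;-
  toy_hrrp |&gt;
    dplyr::mutate(
      dplyr::across(
        c(`Number of Discharges`, `Excess Readmission Ratio`),
        \(x) as.numeric(dplyr::na_if(x, "N/A"))
      )
    )
toy_clean</code></pre></div>
</details>
</div>
<p>The same <code>mutate()</code> step should carry over to the full <code>hrrp_data</code> object, assuming the other rate columns use the same <code>"N/A"</code> convention.</p>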
<p><br></p>
<p><em>Don’t forget to <a href="https://dashboard.mailerlite.com/forms/1517199/154300987644839168/share">subscribe</a> to receive email updates when new articles drop!</em></p>


<!-- -->

</section>
</section>

 ]]></description>
  <category>Software Development</category>
  <category>Healthcare</category>
  <category>Readmissions</category>
  <guid>https://www.zajichekstats.com/post/introducing-the-readmit-r-package/</guid>
  <pubDate>Fri, 19 Dec 2025 06:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/introducing-the-readmit-r-package/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Trying AI-assisted development in the Positron IDE</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/</link>
  <description><![CDATA[ 




<p><em>This article builds off a <a href="https://www.zajichekstats.com/post/building-an-llm-powered-shiny-app-for-hospital-readmissions/">prior post</a> that demos how to integrate LLM-powered components into a <a href="https://shiny.posit.co/">Shiny</a> application.</em></p>
<p>The new <a href="https://positron.posit.co/">Positron IDE</a> is amazing for doing and developing data science output/workflows. Although I’ll always love <a href="https://posit.co/download/rstudio-desktop/">RStudio</a>, I’m making the full-time switch to Positron because it’s the future of how data science work should be done, and the <a href="https://posit.co/products/ide/positron/">AI capabilities</a> being built in are a big part of that.</p>
<p>In a <a href="https://www.zajichekstats.com/post/building-an-llm-powered-shiny-app-for-hospital-readmissions/">prior post</a> I showed how to use packages like <a href="https://ellmer.tidyverse.org/"><code>ellmer</code></a> and <a href="https://posit-dev.github.io/querychat/"><code>querychat</code></a> to integrate LLMs into a <a href="https://shiny.posit.co/">Shiny</a> web application. Now I’m just getting started with the “AI stuff” in <a href="https://positron.posit.co/">Positron</a>, so I wanted to add a few features to that same app to get a sense of how it all works while using <a href="https://positron.posit.co/assistant.html">Positron Assistant</a> for help.</p>
<section id="setup" class="level1">
<h1>Setup</h1>
<p>To get <a href="https://positron.posit.co/assistant.html">Positron Assistant</a> set up initially, I followed the steps <a href="https://positron.posit.co/assistant.html">here</a>, which amounted to <a href="https://positron.posit.co/updating.html">updating Positron</a> and enabling it in the settings (as the directions say).</p>
<section id="purchasing-api-credits" class="level2">
<h2 class="anchored" data-anchor-id="purchasing-api-credits">Purchasing API credits</h2>
<p>After configuration, I see the chat pane available in the left sidebar.</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/chatpane.png" class="img-fluid"></p>
<p>In order for it to work, I need to get an API key from <a href="https://www.anthropic.com/">Anthropic</a> (though as mentioned in a <a href="https://www.youtube.com/watch?v=TrN-FMcOsOA">recent webinar</a>, more models will become supported). This means I need to purchase API credits for <a href="https://claude.ai/">Claude</a>, which I can do on the platform <a href="https://platform.claude.com/dashboard">here</a>. When I clicked the <em>Get API Key</em> button, it told me to pay $5, so I just went ahead and did that.</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/claudepurchase.png" class="img-fluid"></p>
<p>Once I submitted, it prompted me to name my API key (a name was filled in by default).</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/createkey.png" class="img-fluid"></p>
<p>Another thing I did was go into my Claude console and change the <a href="https://platform.claude.com/settings/limits">limit</a> to a lower amount, because it defaulted to $100 per month. I don’t actually know how much cost will be incurred by using this in Positron, so I wanted to be safe.</p>
</section>
<section id="authenticating-in-positron" class="level2">
<h2 class="anchored" data-anchor-id="authenticating-in-positron">Authenticating in Positron</h2>
<p>Now that we have a key, we can supply it to Positron. I clicked the <em>Add Anthropic as a Chat Provider</em> button (see screenshot above).</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/authenticate.png" class="img-fluid"></p>
<p>I pasted my API key and it looks like we are set.</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/finalsetup.png" class="img-fluid"></p>
</section>
</section>
<section id="codebase" class="level1">
<h1>Supplying the code base</h1>
<p>The code base we are going to be working with is located <a href="https://github.com/centralstatz/hospital_readmissions_explorer">here</a> (note that <a href="https://github.com/centralstatz/hospital_readmissions_explorer/commit/a0eec2e86664cf261dcd0b03e43ede78773c9e44">this commit</a> was the last one <em>before</em> the changes in this article were made). It is a <a href="https://shiny.posit.co/">Shiny</a> application exploring <a href="https://www.zajichekstats.com/post/managing-the-readmission-risk-pool/">hospital readmissions</a> that contains a toggle capability for the user to switch between “traditional” data filters and a natural language chat interface to interact with the data. In either case, the interactive map and visuals update accordingly:</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/readmit_app.gif" class="img-fluid"></p>
<p>There is also a video of how this was done:</p>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/YlLcxuAjgJw" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>The live application is located on <a href="https://connect.posit.cloud/">Posit Connect Cloud</a> <a href="https://0196f590-15b7-8b36-3010-eb5a0d8a6d94.share.connect.posit.cloud/">here</a>.</p>
</section>
<section id="whats-the-goal-here-then" class="level1">
<h1>What’s the goal here then?</h1>
<p>I want Positron Assistant to interact directly with my code and make changes on my behalf <em>with my supervision and approval</em>. The actual feature I want to change is simple: in the bottom-right corner of <a href="https://0196f590-15b7-8b36-3010-eb5a0d8a6d94.share.connect.posit.cloud/">the app</a> there is a table from the <a href="https://rstudio.github.io/DT/"><code>DT</code></a> package showing the hospitals currently being plotted on the map (just basically a list view of the map data).</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/dataview.png" class="img-fluid"></p>
<p><br></p>
<section id="desired-change" class="level2">
<h2 class="anchored" data-anchor-id="desired-change">Desired Change</h2>
<p>The headers in the table have pretty ugly names, so I’d like the assistant to make those nicer.</p>
</section>
</section>
<section id="trying-it-out" class="level1">
<h1>Trying it out</h1>
<p>With everything configured, I opened up the code base in a new Positron session. Note I had to re-authenticate my API key after closing Positron, and then I chose Claude Sonnet 3.7.</p>
<p>I know that <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/main/server.R"><code>server.R</code></a> is where this change will need to be made because that is where my dynamic dataset is fed into the <a href="https://rstudio.github.io/DT/shiny.html"><code>DT::renderDataTable</code></a> function. So I opened that file as well as the Positron Assistant chat pane:</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/context.png" class="img-fluid"></p>
<p>One really cool thing already is the <em>Add Context</em> button on the bottom. This means that you can add various files that the AI should consider as context for the conversation you are about to have with it. In this case, <code>server.R</code> is already attached (because it’s open), so I can just talk about this file as if it already knows what is in there.</p>
<p>The prompt I’m using is as follows:</p>
<p><em>“I don’t like the display column names in the hospital_table output table, can you make these column names cleaner, but just at the last step before it gets displayed.”</em></p>
<p>It gave a pretty good response to start.</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/prompt11.png" class="img-fluid"></p>
<p>This was the code snippet it provided.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Show data with cleaner column names</span></span>
<span id="cb1-2">output<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>hospital_table <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> DT<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">renderDataTable</span>({</span>
<span id="cb1-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a mapping of current names to cleaner names</span></span>
<span id="cb1-4">  clean_names <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb1-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FacilityName"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hospital Name"</span>,</span>
<span id="cb1-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FacilityID"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hospital ID"</span>,</span>
<span id="cb1-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"City"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"City"</span>,</span>
<span id="cb1-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"County"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"County"</span>,</span>
<span id="cb1-9">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Zip"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ZIP Code"</span>,</span>
<span id="cb1-10">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DiagnosisCategory"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Diagnosis Group"</span>,</span>
<span id="cb1-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Excess"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Excess Readmission Ratio"</span>,</span>
<span id="cb1-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Predicted"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Predicted Rate (%)"</span>,</span>
<span id="cb1-13">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Expected"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Expected Rate (%)"</span></span>
<span id="cb1-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add any other columns you want to rename</span></span>
<span id="cb1-15">  )</span>
<span id="cb1-16">  </span>
<span id="cb1-17">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Apply the column renaming only at display time</span></span>
<span id="cb1-18">  DT<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">datatable</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">current_hospitals</span>(), </span>
<span id="cb1-19">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colnames =</span> clean_names[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">current_hospitals</span>())[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">current_hospitals</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(clean_names)]])</span>
<span id="cb1-20">})</span></code></pre></div>
</details>
</div>
<p>Seems to basically give me what I want. I’d need to copy/paste it into the code to enact the changes. This is because, if you notice at the bottom of the screenshot above, it is in <em>Ask</em> mode (one other useful thing to note is that after that one question I went back to my Claude console and saw my balance was $4.93, so that just cost me $0.07).</p>
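<p>One detail worth double-checking before pasting a snippet like this: the <a href="https://rstudio.github.io/DT/"><code>DT</code></a> documentation describes the named form of the <code>colnames</code> argument as <code>c("New name" = "old_name")</code>, whereas the lookup above is written the other way around (old name on the left). Here is a minimal sketch of flipping the lookup into that form, assuming a toy data frame (<code>toy_hospitals</code>, with made-up values) in place of the app’s <code>current_hospitals()</code> reactive.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy stand-in for current_hospitals() (values are made up)
toy_hospitals &lt;-
  data.frame(
    FacilityName = c("Hospital A", "Hospital B"),
    Zip = c("00001", "00002"),
    Excess = c(0.98, 1.03)
  )

# Lookup written as old name = display name, as in the snippet above
clean_names &lt;-
  c(
    "FacilityName" = "Hospital Name",
    "Zip" = "ZIP Code",
    "Excess" = "Excess Readmission Ratio"
  )

# Keep only the entries whose old name is actually in the data
present &lt;- clean_names[names(clean_names) %in% names(toy_hospitals)]

# Flip into the c("New name" = "old_name") form
dt_colnames &lt;- stats::setNames(names(present), unname(present))

# Build the table only if DT is installed
if (requireNamespace("DT", quietly = TRUE)) {
  toy_table &lt;- DT::datatable(toy_hospitals, colnames = dt_colnames)
}</code></pre></div>
</details>
</div>
<p>Columns that are missing from the lookup simply keep their original headers, which matches the “only rename at display time” intent of the prompt.</p>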
<p>I changed it to <em>Agent</em> mode and tried again:</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/agentmode.png" class="img-fluid"></p>
<p>To no avail. I just basically got the same output in the chat (and was charged another $0.08). So I guess I’ll have to mess with that a little more.</p>
<section id="inline-prompting" class="level2">
<h2 class="anchored" data-anchor-id="inline-prompting">Inline prompting</h2>
<p>Although it basically gave me what I wanted and I could easily update the code, I want to test the other way to interact with the LLM: inline prompting. You just hover over the line in your code you want to ask about specifically, and then right-click -&gt; Copilot -&gt; “Editor Inline Chat”.</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/inlineprompt.png" class="img-fluid"></p>
<p>Here I entered the same prompt again, and there it did what I initially wanted: actually edit the code itself.</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/inlinepromptfix.png" class="img-fluid"></p>
<p>The coolest part? You get a <strong>validation</strong> step built in where you can accept the changes that were made (and even run the result of the change before accepting), or revert and cancel what was done. <em>This</em> is how you should be interacting with AI + coding.</p>
<p>I reran my application after saving the changes it made (with no additional edits on my end). It worked perfectly.</p>
<p><img src="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/editedheaders.png" class="img-fluid"></p>
<p>I then just <a href="https://github.com/centralstatz/hospital_readmissions_explorer/commit/876f157e8b717fa9729af744906b5c1283d2ce63">committed my code</a> to <a href="https://github.com/">GitHub</a> which in turn automatically updated the <a href="https://0196f590-15b7-8b36-3010-eb5a0d8a6d94.share.connect.posit.cloud/">live application</a> on <a href="https://connect.posit.cloud/">Posit Connect Cloud</a>.</p>
<p>Now the app used AI for developing the code <em>and</em> the application itself uses AI while it is running for user interaction. Many ways to use it.</p>
</section>
</section>
<section id="analysis-of-the-result" class="level1">
<h1>Analysis of the result</h1>
<p>The really interesting/impressive part here is that I only used that one prompt and supplied the <code>server.R</code> file (and R environment). Yes it was a simple fix conceptually (i.e., just change the names of the columns). But I didn’t actually specify the column names explicitly in the place where the table is built within the code. They are implied by all the rest of the code and datasets that ship with the app. And it really only edited the columns that <em>needed</em> editing (which wasn’t all of them). So to be clear, it had to:</p>
<ol type="1">
<li>Connect the dots throughout the code base to infer what the <em>current</em> names of the columns in that table would actually be at the time the table is rendered</li>
<li>Determine (on its own) which of those <em>current</em> columns were not in a “nice” format (whatever that means, but it got it right)</li>
<li>Assign a <em>new</em> name to those columns that <em>is</em> in a “nice” format (which it did)</li>
<li>Place the correct code edits to make this happen in the right spot to effect the changes and not break the app (it ran fine).</li>
</ol>
<p>This. Is. Impressive. I can imagine how powerful this would be if you used this with more than 1 prompt and for more than 10 minutes. Very fun times ahead.</p>
<p><br></p>
<p>Now I don’t think I’m even scratching the surface of the possibilities this brings for streamlining data science workflows, assisting in app development, and more. This was just the baseline integration of Positron Assistant and it’s already awesome. In this application, AI has now been used both for <em>development</em> of the code and in the live application itself. I haven’t even started trying <a href="https://positron.posit.co/databot.html">DataBot</a> yet, which is another add-on to Positron by <a href="https://posit.co/">Posit</a> that will enable it to actually build analyses and explore your data with you. This, among other things, makes me very excited to be in this field.</p>


<!-- -->

</section>

 ]]></description>
  <category>AI</category>
  <category>Shiny</category>
  <category>Web Applications</category>
  <guid>https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/</guid>
  <pubDate>Wed, 29 Oct 2025 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/ai-assisted-shiny-development-in-positron/feature.png" medium="image" type="image/png" height="95" width="144"/>
</item>
<item>
  <title>The question is not if, but how much</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/the-question-is-not-if-but-how-much/</link>
  <description><![CDATA[ 




<p>In the current <a href="https://www.usnews.com/news/health-news/articles/2025-09-26/tylenol-refutes-old-post-as-pregnancy-safety-debate-resurfaces">Tylenol debate</a>, everyone is talking about whether there <em>is</em> or <em>isn’t</em> some sort of (causal) association of its use with autism and/or other neurodevelopmental disorders. People point to studies like <a href="https://jamanetwork.com/journals/jama/fullarticle/2817406">this JAMA paper</a> or <a href="https://ehjournal.biomedcentral.com/articles/10.1186/s12940-025-01208-0">this systematic review</a>, depending on what side of the debate they’re on, as evidence for their conclusion. You even have the <a href="https://www.acog.org/">American College of Obstetricians and Gynecologists</a> putting out a <a href="https://www.acog.org/news/news-releases/2025/09/acog-affirms-safety-benefits-acetaminophen-pregnancy">statement</a> reading:</p>
<blockquote class="blockquote">
<p>“In more than two decades of research on the use of acetaminophen in pregnancy, not a single reputable study has successfully concluded that the use of acetaminophen in any trimester of pregnancy causes neurodevelopmental disorders in children. In fact, the two highest-quality studies on this subject—one of which was published in JAMA last year—found no significant associations between use of acetaminophen during pregnancy and children’s risk of autism, ADHD, or intellectual disability. The studies that are frequently pointed to as evidence of a causal relationship, including the latest systematic review released in August, include the same methodological limitations—for example, lack of a control for confounding factors or use of unreliable self-reported data—that are prevalent in the majority of studies on this topic.”</p>
</blockquote>
<p>What’s missing from all of this is the conversation not about <em>if</em> there is an effect or not, but <em>how much</em> of an effect there is. Surely, there must be variation with respect to how large (or small) an effect is, how it changes with dose and time, who is taking it, etc. What are those numbers that are actually going to help individual people make informed decisions?</p>
<p>There is a line in <a href="https://www.nature.com/articles/d41586-025-02876-1">this article</a> that sums it up nicely:</p>
<blockquote class="blockquote">
<p>“At the heart of this is people trying to look for simple answers to complex problems.”</p>
</blockquote>
<p>I actually don’t believe in this binary (association vs.&nbsp;no association) as a concept. The effect <em>is</em> there, it just may be extremely small, extremely large, or something in between. We can’t “prove” that no association exists, no matter what we do. The important part is estimating what it <em>is</em> as accurately as possible, in a relevant, interpretable context, figuring out whether it is <em>practically</em> (not statistically) meaningful, and using it to weigh the benefits and costs. Whether it is “statistically significant” (or not) is irrelevant, and is probably the reason why this binary debate exists in the first place.</p>
<p>You can read more on my thoughts about statistical significance <a href="https://www.zajichekstats.com/post/statistical-significance-is-insignificant/">here</a>.</p>


<!-- -->


 ]]></description>
  <category>Philosophy</category>
  <category>Statistical Significance</category>
  <category>Decision Making</category>
  <guid>https://www.zajichekstats.com/post/the-question-is-not-if-but-how-much/</guid>
  <pubDate>Fri, 26 Sep 2025 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/the-question-is-not-if-but-how-much/feature.png" medium="image" type="image/png" height="143" width="144"/>
</item>
<item>
  <title>Building a custom LLM-powered Shiny app for hospital readmissions with {querychat}</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/building-an-llm-powered-shiny-app-for-hospital-readmissions/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/YlLcxuAjgJw" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>I’ve been exploring the various <a href="https://posit.co/use-cases/ai/">AI tools</a> that <a href="https://posit.co/">Posit</a> has been releasing over the past few months, and I gotta say, they are really cool and incredibly fun to work with. I wanted to learn how to use packages like <a href="https://ellmer.tidyverse.org/"><code>ellmer</code></a>, <a href="https://posit-dev.github.io/shinychat"><code>shinychat</code></a>, and <a href="https://github.com/posit-dev/querychat/tree/main/r-package"><code>querychat</code></a> to enable the user of an <a href="https://shiny.posit.co/">R Shiny</a> application to explore a dataset and dynamically manipulate app visuals/results using natural language in chat format. So I built and deployed an app that explores results from the <a href="https://qualitynet.cms.gov/inpatient/hrrp">Hospital Readmissions Reduction Program (HRRP)</a> for the state of Wisconsin, using these features. I go through it in the video above 👆, or you can read on for a text description.</p>
<section id="the-app" class="level1">
<h1>The App</h1>
<p>Here’s what it looks like in action:</p>
<p><img src="https://www.zajichekstats.com/post/building-an-llm-powered-shiny-app-for-hospital-readmissions/readmit_app.gif" class="img-fluid"></p>
<p>It was deployed to <a href="https://connect.posit.cloud/">Posit Connect Cloud</a>. That, in combination with using <a href="https://gemini.google.com/app">Google Gemini</a> as my LLM provider, made the development and deployment of this app <strong>completely free</strong>. Feel free to access the <a href="https://0196f590-15b7-8b36-3010-eb5a0d8a6d94.share.connect.posit.cloud/">live app</a> and/or the <a href="https://github.com/centralstatz/hospital_readmissions_explorer">source code</a>.</p>
</section>
<section id="some-background-on-readmissions" class="level1">
<h1>Some Background on Readmissions</h1>
<p>The context of this app involves the <a href="https://qualitynet.cms.gov/inpatient/hrrp">Hospital Readmissions Reduction Program (HRRP)</a>.</p>
<p>Each year hospitals across the United States are penalized by <a href="https://www.cms.gov/">CMS</a> for having too many <a href="https://en.wikipedia.org/wiki/Hospital_readmission">readmissions</a> in one or more of the following diagnosis groups:</p>
<ul>
<li>AMI: Acute myocardial infarction</li>
<li>CABG: Coronary artery bypass graft surgery</li>
<li>COPD: Chronic obstructive pulmonary disease</li>
<li>HF: Heart failure</li>
<li>THA/TKA: Total hip/knee arthroplasty</li>
<li>PN: Pneumonia</li>
</ul>
<p>Readmissions are tallied over a 3-year reporting period, and then the hospital is penalized for the duration of a subsequent fiscal year (the penalty is at most 3% of all Medicare reimbursements). Only ~50% of hospitals actually receive a penalty, and the program only applies to patients on Medicare.</p>
<section id="penalty-calculation" class="level2">
<h2 class="anchored" data-anchor-id="penalty-calculation">Penalty calculation</h2>
<p>The penalty is calculated (in a very simplified description) based on the hospital’s relative performance on the <em>excess readmission ratio</em>, which is the ratio of the hospital’s <em>predicted</em> to <em>expected</em> readmission rates. These are estimated for each diagnosis group and then aggregated across the groups to determine the overall payment penalty. See more <a href="https://qualitynet.cms.gov/inpatient/hrrp/measures">here</a>.</p>
<p>The predicted and expected readmission rates are estimated from a <a href="https://en.wikipedia.org/wiki/Generalized_linear_mixed_model">generalized linear mixed effects model</a> via logistic regression (specifically, it is a <a href="https://www.bristol.ac.uk/cmm/learning/videos/random-intercepts.html">random-intercept model</a>). CMS risk adjusts hospitals’ readmission rates by entering numerous risk factors in these models (like comorbidities, etc.; you can find details <a href="https://qualitynet.cms.gov/inpatient/measures/readmission/methodology">here</a>). They use this model to tease out an individual hospital effect on the readmission rate <em>after</em> accounting for the case-mix of that hospital, and compare this individual effect to the “average” hospital. If you’re worse than average, you get penalized (again, as a simple description). So overall, the ratio is meant to quantify how likely a patient is to be readmitted to <em>your</em> hospital versus the <em>average</em> hospital after accounting for how sick they are.</p>
<p>In this app, the main metrics we are working with are the ones described above:</p>
<ul>
<li>Excess readmission ratio</li>
<li>Predicted readmission rate</li>
<li>Expected readmission rate</li>
</ul>
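<p>To make the ratio concrete, here is a minimal sketch of how the predicted and expected rates fall out of a random-intercept logistic model. Every number below (the intercepts and the patient risk scores) is made up for illustration and is not a CMS estimate; the real models include many more risk factors.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical linear predictors (log-odds contribution of risk factors)
# for five patients at one hospital; illustrative values only
risk_score &lt;- c(-0.4, 0.1, 0.3, -0.2, 0.6)

# Hypothetical intercepts on the log-odds scale
average_intercept  &lt;- -1.8  # the "average" hospital
hospital_intercept &lt;- -1.6  # this hospital's estimated intercept

# Predicted rate: these patients at this hospital
predicted &lt;- mean(plogis(hospital_intercept + risk_score))

# Expected rate: the same patients at the average hospital
expected &lt;- mean(plogis(average_intercept + risk_score))

# Excess readmission ratio: above 1 means worse than expected
excess_ratio &lt;- predicted / expected
round(c(predicted = predicted, expected = expected, excess = excess_ratio), 3)</code></pre></div>
</div>
<p>Because the patients are held fixed and only the intercept changes, the ratio isolates the hospital effect from the case-mix.</p>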
</section>
<section id="data-source" class="level2">
<h2 class="anchored" data-anchor-id="data-source">Data source</h2>
<p>The datasets themselves that contain this data come from the <a href="https://data.cms.gov/provider-data/">CMS Provider Data Catalog</a>. Specifically, we use the following:</p>
<ul>
<li><a href="https://data.cms.gov/provider-data/dataset/xubh-q36u">Hospital General Information</a>: Provides information on hospitals such as state, location, etc.</li>
<li><a href="https://data.cms.gov/provider-data/dataset/9n3s-kdb3">Hospital Readmissions Reduction Program</a>: Contains readmission program metrics for each hospital participating in the program</li>
</ul>
<p>In the application, the datasets are pulled directly from the web source by using an <code>httr</code> GET request to get metadata about the dataset, extracting the download URL from that metadata, and then importing the file as a CSV.</p>
<p>You can see how I did this here by creating a utility function that takes a <code>datasetid</code> as identified on the source website and imports the dataset (so this can be used for any dataset in the catalog).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load packages</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Function to import dataset from CMS Provider Data Catalog (see https://github.com/zajichek/carecompare/blob/b1fa89382adfe77bd5f230f4162b03767ece10ea/R/FUNCTIONS.R#L99)</span></span>
<span id="cb1-5">pdc_read <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(</span>
<span id="cb1-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">datasetid =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb1-8">    ...</span>
<span id="cb1-9">  ) {</span>
<span id="cb1-10">    </span>
<span id="cb1-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check for input</span></span>
<span id="cb1-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(datasetid)) </span>
<span id="cb1-13">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stop</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Please specify a dataset identifier."</span>)</span>
<span id="cb1-14">    </span>
<span id="cb1-15">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the url</span></span>
<span id="cb1-16">    url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items/"</span>, datasetid, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"?show-reference-ids=false"</span>)</span>
<span id="cb1-17">    </span>
<span id="cb1-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the request, extract the content</span></span>
<span id="cb1-19">    request <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">content</span>(httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">GET</span>(url))</span>
<span id="cb1-20">    </span>
<span id="cb1-21">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the CSV download URL from the metadata</span></span>
<span id="cb1-22">    downloadurl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> request<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>distribution[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>downloadURL</span>
<span id="cb1-23">    </span>
<span id="cb1-24">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Import the dataset</span></span>
<span id="cb1-25">    readr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(</span>
<span id="cb1-26">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> downloadurl,</span>
<span id="cb1-27">      ...</span>
<span id="cb1-28">    )</span>
<span id="cb1-29">    </span>
<span id="cb1-30">  }</span>
<span id="cb1-31"></span>
<span id="cb1-32"><span class="do" style="color: #5E5E5E;
background-color: null;
font-style: italic;">## Import datasets</span></span>
<span id="cb1-33"></span>
<span id="cb1-34"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Hospital information</span></span>
<span id="cb1-35">hospitals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdc_read</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">datasetid =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"xubh-q36u"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">guess_max =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># https://data.cms.gov/provider-data/dataset/xubh-q36u</span></span>
<span id="cb1-36"></span>
<span id="cb1-37"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># HRRP outcomes</span></span>
<span id="cb1-38">hrrp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdc_read</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">datasetid =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"9n3s-kdb3"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N/A"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># https://data.cms.gov/provider-data/dataset/9n3s-kdb3</span></span></code></pre></div>
</details>
</div>
<p>The datasets look like this:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">hospitals</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5,384 × 38
   `Facility ID` `Facility Name`            Address `City/Town` State `ZIP Code`
   &lt;chr&gt;         &lt;chr&gt;                      &lt;chr&gt;   &lt;chr&gt;       &lt;chr&gt; &lt;chr&gt;     
 1 010001        SOUTHEAST HEALTH MEDICAL … 1108 R… DOTHAN      AL    36301     
 2 010005        MARSHALL MEDICAL CENTERS   2505 U… BOAZ        AL    35957     
 3 010006        NORTH ALABAMA MEDICAL CEN… 1701 V… FLORENCE    AL    35630     
 4 010007        MIZELL MEMORIAL HOSPITAL   702 N … OPP         AL    36467     
 5 010008        CRENSHAW COMMUNITY HOSPIT… 101 HO… LUVERNE     AL    36049     
 6 010011        ST. VINCENT'S EAST         50 MED… BIRMINGHAM  AL    35235     
 7 010012        DEKALB REGIONAL MEDICAL C… 200 ME… FORT PAYNE  AL    35968     
 8 010016        SHELBY BAPTIST MEDICAL CE… 1000 F… ALABASTER   AL    35007     
 9 010018        CALLAHAN EYE HOSPITAL      1720 U… BIRMINGHAM  AL    35233     
10 010019        HELEN KELLER HOSPITAL      1300 S… SHEFFIELD   AL    35660     
# ℹ 5,374 more rows
# ℹ 32 more variables: `County/Parish` &lt;chr&gt;, `Telephone Number` &lt;chr&gt;,
#   `Hospital Type` &lt;chr&gt;, `Hospital Ownership` &lt;chr&gt;,
#   `Emergency Services` &lt;chr&gt;,
#   `Meets criteria for birthing friendly designation` &lt;chr&gt;,
#   `Hospital overall rating` &lt;chr&gt;, `Hospital overall rating footnote` &lt;chr&gt;,
#   `MORT Group Measure Count` &lt;chr&gt;, …</code></pre>
</div>
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">hrrp</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 18,510 × 12
   `Facility Name`     `Facility ID` State `Measure Name` `Number of Discharges`
   &lt;chr&gt;               &lt;chr&gt;         &lt;chr&gt; &lt;chr&gt;                           &lt;dbl&gt;
 1 SOUTHEAST HEALTH M… 010001        AL    READM-30-AMI-…                    296
 2 SOUTHEAST HEALTH M… 010001        AL    READM-30-CABG…                    151
 3 SOUTHEAST HEALTH M… 010001        AL    READM-30-HF-H…                    681
 4 SOUTHEAST HEALTH M… 010001        AL    READM-30-HIP-…                     NA
 5 SOUTHEAST HEALTH M… 010001        AL    READM-30-PN-H…                    490
 6 SOUTHEAST HEALTH M… 010001        AL    READM-30-COPD…                    130
 7 MARSHALL MEDICAL C… 010005        AL    READM-30-CABG…                     NA
 8 MARSHALL MEDICAL C… 010005        AL    READM-30-HIP-…                     NA
 9 MARSHALL MEDICAL C… 010005        AL    READM-30-HF-H…                    176
10 MARSHALL MEDICAL C… 010005        AL    READM-30-PN-H…                    305
# ℹ 18,500 more rows
# ℹ 7 more variables: Footnote &lt;dbl&gt;, `Excess Readmission Ratio` &lt;dbl&gt;,
#   `Predicted Readmission Rate` &lt;dbl&gt;, `Expected Readmission Rate` &lt;dbl&gt;,
#   `Number of Readmissions` &lt;chr&gt;, `Start Date` &lt;chr&gt;, `End Date` &lt;chr&gt;</code></pre>
</div>
</div>
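<p>As a rough sketch of how the two tables fit together, the shared <code>Facility ID</code> column can be used to attach the HRRP metrics to the Wisconsin hospitals. The toy rows below are made up (only the column names come from the real datasets), and the app’s actual cleaning steps in <code>global.R</code> may differ from this.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Toy stand-ins for the imported tables (made-up rows, real column names)
hospitals_toy &lt;- tibble::tibble(
  `Facility ID` = c("520001", "520002", "010001"),
  State = c("WI", "WI", "AL"),
  `County/Parish` = c("MARATHON", "DANE", "HOUSTON")
)
hrrp_toy &lt;- tibble::tibble(
  `Facility ID` = c("520001", "520002", "010001"),
  `Excess Readmission Ratio` = c(1.02, 0.97, 1.05)
)

# Keep Wisconsin hospitals and attach their HRRP metrics
wi_hrrp &lt;- hospitals_toy |&gt;
  filter(State == "WI") |&gt;
  inner_join(hrrp_toy, by = "Facility ID")

wi_hrrp</code></pre></div>
</div>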
</section>
</section>
<section id="code-summary" class="level1">
<h1>Code summary</h1>
<p>This article mainly focuses on the part of the code that connects to and uses LLMs to chat with the data, but I wanted to make a few notes about the rest of the app. Feel free to browse the complete app source code <a href="https://github.com/centralstatz/hospital_readmissions_explorer/tree/main">here</a>.</p>
<p>First, the <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/main/global.R"><code>global.R</code></a> file creates the objects that are available during the app runtime. It is executed once at app launch. This is where the datasets are imported and cleaned, and the base map of Wisconsin is created (using <a href="https://rstudio.github.io/leaflet/"><code>leaflet</code></a>). This is done so the app doesn’t have to redraw the map from scratch every time; only the points need to be updated as things change (with <a href="https://rstudio.github.io/leaflet/reference/leafletProxy.html"><code>leafletProxy</code></a>). This file is also where the first step for setting up <a href="https://github.com/posit-dev/querychat/blob/main/r-package"><code>querychat</code></a> occurs.</p>
<p>Second, the <a href="https://rstudio.github.io/bslib/"><code>bslib</code></a> package is used to drive the app layout in <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/main/ui.R"><code>ui.R</code></a>, and works well with <a href="https://github.com/posit-dev/querychat/blob/main/r-package"><code>querychat</code></a>. The plots in the app are made with the <a href="https://jkunst.com/highcharter/"><code>highcharter</code></a> package, which I just discovered as an alternative to <a href="https://plotly.com/r/"><code>plotly</code></a> and I think it may be my new favorite plotting library (but don’t worry, I’ll always continue using the latter + <a href="https://ggplot2.tidyverse.org/"><code>ggplot2</code></a>). Also, I found the <a href="https://dreamrs.github.io/datamods/"><code>datamods</code></a> package to be really useful for creating the dynamic group filters in the traditional inputs, which is what makes the hospital filters simultaneously update as other columns are filtered. This is done with the <a href="https://dreamrs.github.io/datamods/reference/select-group.html"><code>select_group_*</code></a> functions.</p>
<p>Finally, the general strategy in <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/main/server.R"><code>server.R</code></a> to implement the toggle between traditional filters and the “chat-mode” was to have a <a href="https://shiny.posit.co/r/getstarted/shiny-basics/lesson6/"><code>reactive</code></a> data frame that updates based on the status of the toggle input (an <a href="https://rstudio.github.io/bslib/reference/input_switch.html"><code>input_switch</code></a>). We either apply the set of filters to the master dataset in the app, or just provide the dataset returned by the chat object.</p>
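<p>The base-map-plus-proxy pattern mentioned above can be sketched roughly like this. It is a stripped-down illustration, not the app’s code: the map center, zoom level, output id (<code>"map"</code>), and hospital coordinates are all placeholder values.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(shiny)
library(leaflet)

# Built once (e.g., in global.R): a base map roughly centered on Wisconsin
# (approximate, illustrative coordinates and zoom)
base_map &lt;- leaflet() |&gt;
  addTiles() |&gt;
  setView(lng = -89.6, lat = 44.6, zoom = 7)

# Hypothetical hospital locations
pts &lt;- data.frame(lng = c(-89.63, -89.40), lat = c(44.96, 43.07))

server &lt;- function(input, output, session) {
  # Draw the base map a single time
  output$map &lt;- renderLeaflet(base_map)

  # Later updates only touch the marker layer, not the whole map
  observe({
    leafletProxy("map", data = pts) |&gt;
      clearMarkers() |&gt;
      addCircleMarkers(lng = ~lng, lat = ~lat)
  })
}</code></pre></div>
</div>
<p>In a real app <code>pts</code> would be a reactive data frame, so the observer re-runs whenever the filtered set of hospitals changes.</p>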
</section>
<section id="implementing-chat-functionality" class="level1">
<h1>Implementing chat functionality</h1>
<p>There are two main packages you need to have installed to make this work:</p>
<ul>
<li><a href="https://ellmer.tidyverse.org/"><code>ellmer</code></a>: Drives the backend for sending messages to, and receiving messages from, the LLM</li>
<li><a href="https://github.com/posit-dev/querychat/tree/main/r-package"><code>querychat</code></a>: Sets up the UI components and server (using <a href="https://ellmer.tidyverse.org/"><code>ellmer</code></a> internally) to chat within your app, build and execute SQL statements on your dataset, and return the result to be used within the app.</li>
</ul>
<section id="step-1-initialize-the-connection" class="level2">
<h2 class="anchored" data-anchor-id="step-1-initialize-the-connection">Step 1: Initialize the connection</h2>
<p>The first step is to use the <code>querychat_init</code> function to initialize the connection to the LLM of your choice and supply your dataset, along with additional information to help it perform better. The following snippet is what is implemented in the app (or see it <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/a0eec2e86664cf261dcd0b03e43ede78773c9e44/global.R#L224">here</a>):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Configure the chat object</span></span>
<span id="cb6-2">querychat_config <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">querychat_init</span>(</span>
<span id="cb6-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> master_dat,</span>
<span id="cb6-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tbl_name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HospitalHRRP"</span>,</span>
<span id="cb6-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">create_chat_func =</span> purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">partial</span>(ellmer<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>chat_gemini),</span>
<span id="cb6-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">greeting =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Ask me a question about the HRRP in Wisconsin"</span>,</span>
<span id="cb6-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data_description =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readLines</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data_description.md"</span>)</span>
<span id="cb6-9">  )</span></code></pre></div>
</details>
</div>
<section id="prompting-strategy" class="level3">
<h3 class="anchored" data-anchor-id="prompting-strategy">Prompting strategy</h3>
<p>The overarching premise of this package is that LLMs can be great at generating SQL queries. So when we use the chat in the app, no data is ever sent to the LLM, only metadata about the dataset. The LLM then translates our natural language inputs into SQL queries, which are executed on the data within the app. See more <a href="https://github.com/posit-dev/querychat/blob/main/r-package/README.md#powered-by-sql">here</a>. The <code>tbl_name</code> argument is the table name that will appear in queries generated in the chat.</p>
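<p>Here is a small sketch of that idea, separate from <code>querychat</code> itself: a SQL string (standing in for what the LLM would return) is executed against a local in-memory database, so the rows never leave the machine. I’m assuming the <code>DBI</code> and <code>duckdb</code> packages are installed, and the toy table and its columns are made up for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)

# Toy table standing in for the app's dataset (made-up rows)
toy &lt;- data.frame(
  Hospital = c("A", "B", "C"),
  County = c("MARATHON", "DANE", "MARATHON"),
  Excess = c(1.03, 0.98, 0.95)
)

# Load it into a local in-memory database under the chosen table name
con &lt;- dbConnect(duckdb::duckdb())
dbWriteTable(con, "HospitalHRRP", toy)

# Pretend this string came back from the LLM
llm_sql &lt;- "SELECT * FROM HospitalHRRP WHERE County = 'MARATHON'"

# The query runs locally; only the SQL text was exchanged with the model
result &lt;- dbGetQuery(con, llm_sql)
dbDisconnect(con, shutdown = TRUE)

result</code></pre></div>
</div>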
</section>
<section id="choosing-an-llm-provider" class="level3">
<h3 class="anchored" data-anchor-id="choosing-an-llm-provider">Choosing an LLM provider</h3>
<p>I used <a href="https://gemini.google.com/app">Google Gemini</a> because it’s free and I didn’t have to provide payment information to start using it. It’s not the optimal model to use (you can see recommendations <a href="https://github.com/posit-dev/querychat/blob/main/r-package/README.md#powered-by-llms">here</a>), but it actually does give pretty decent results and is definitely sufficient for demonstration purposes. I provided more detail on configuring the API in <a href="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/">this blog post</a>.</p>
</section>
<section id="providing-a-data-description" class="level3">
<h3 class="anchored" data-anchor-id="providing-a-data-description">Providing a data description</h3>
<p>What I’ve found to be the most important part: the <code>data_description</code> argument. This is your chance to supply the LLM with some initial information to consider before it starts trying to generate queries on your data. By default, it will only give it some basic information (see <a href="https://github.com/posit-dev/querychat/blob/main/r-package/README.md#data-description">here</a>), but providing more detail and context will make it work a lot better and give a lot more flexibility in how the chat interaction can occur.</p>
<p>In my app, I created a <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/main/data_description.md"><code>data_description.md</code></a> file that I feed into this argument, which gives the LLM a detailed description of not only the columns in my dataset, but also the context the data comes from (i.e., hospital readmissions). The most important part in my file to get things working robustly was this line:</p>
<blockquote class="blockquote">
<p>“Important Note:: The hospital information fields are all in capital letters (all characters), so queries on this data should always capitalize all characters when searching for specific cities or counties.”</p>
</blockquote>
<p>Since the hospital location columns in the raw dataset were in all capital letters, the LLM would not generate working queries without this note unless I also capitalized specific cities/counties in my prompts. That’s not very natural, so this <em>a priori</em> prompt helped resolve that. Now in my app when I ask for <em>“hospitals in marathon county”</em>, it will correctly generate a query that says <code>SELECT * FROM HospitalHRRP WHERE County = "MARATHON"</code>.</p>
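<p>The underlying issue is plain case-sensitive matching, which a tiny base R example can show. The toy data and the text clean-up step here are illustrative assumptions; in the app it is the LLM, guided by the note above, that upper-cases the value inside the SQL it writes.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy data with upper-case county names, as in the raw files
toy &lt;- data.frame(
  Hospital = c("A", "B"),
  County = c("MARATHON", "DANE")
)

# What a user naturally types
user_text &lt;- "marathon county"

# A naive, case-sensitive match on the lower-case name finds no rows
naive_match &lt;- subset(toy, County == "marathon")

# Upper-casing the search term first makes the match succeed
search_term &lt;- toupper(sub(" county$", "", user_text))
fixed_match &lt;- subset(toy, County == search_term)

c(naive = nrow(naive_match), fixed = nrow(fixed_match))</code></pre></div>
</div>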
</section>
</section>
<section id="step-2-setup-the-user-interface" class="level2">
<h2 class="anchored" data-anchor-id="step-2-setup-the-user-interface">Step 2: Set up the user interface</h2>
<p>In the <a href="https://shiny.posit.co/r/getstarted/shiny-basics/lesson2/">app UI</a>, we need to specify where (and how) we want the chat interface to appear. In this app, we use the <code>querychat_ui</code> function (see <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/a0eec2e86664cf261dcd0b03e43ede78773c9e44/ui.R#L123">here</a>). It’s literally just one line of code that you can put anywhere in your app: <code>querychat_ui(id = "chat")</code>. You could also use the <code>querychat_sidebar</code> function if you wanted your entire app side panel to be a nicer looking chat pane. However, in this app, I wanted to be able to toggle between manual filters and chat mode, so I opted for the former.</p>
<section id="toggling-between-manual-and-llm-filtering" class="level3">
<h3 class="anchored" data-anchor-id="toggling-between-manual-and-llm-filtering">Toggling between manual and LLM filtering</h3>
<p>The “Chat Mode” toggle button is just built from an <code>input_switch</code> (see <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/a0eec2e86664cf261dcd0b03e43ede78773c9e44/ui.R#L35">here</a>). Based on the current value of that switch, either the manual input panel or the chat pane is displayed using a <code>conditionalPanel</code>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Chat input</span></span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">conditionalPanel</span>(</span>
<span id="cb7-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">condition =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"input.chat_mode"</span>,</span>
<span id="cb7-4">  </span>
<span id="cb7-5">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The chat user interface</span></span>
<span id="cb7-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">querychat_ui</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"chat"</span>)</span>
<span id="cb7-7">)</span></code></pre></div>
</details>
</div>
<p>Simple enough, but a pretty useful feature.</p>
<p>That is all you need to do on the user interface side to set up the chat. Everything else, like manipulating and updating visuals, is done on the server side.</p>
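<p>For completeness, here is a minimal sketch of how the switch and the two complementary panels could sit together. The input ids, labels, and choices are placeholders, and a plain paragraph stands in for <code>querychat_ui(id = "chat")</code> so the snippet runs without <code>querychat</code> installed.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(shiny)
library(bslib)

ui_controls &lt;- tagList(
  # Switch whose value drives both panels (its id matches input.chat_mode)
  input_switch(id = "chat_mode", label = "Chat Mode", value = FALSE),

  # Shown only while the switch is off: traditional inputs (placeholder)
  conditionalPanel(
    condition = "!input.chat_mode",
    selectInput("diagnosis", "Diagnosis group", choices = c("HF", "PN"))
  ),

  # Shown only while the switch is on; the app puts querychat_ui() here
  conditionalPanel(
    condition = "input.chat_mode",
    p("Chat pane goes here")
  )
)</code></pre></div>
</div>
<p>The <code>condition</code> strings are JavaScript expressions evaluated in the browser, which is why the input is written as <code>input.chat_mode</code> rather than <code>input$chat_mode</code>.</p>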
</section>
</section>
<section id="step-3-manage-the-data-output" class="level2">
<h2 class="anchored" data-anchor-id="step-3-manage-the-data-output">Step 3: Manage the data output</h2>
<p>Finally, we just need to set things up on the server side to feed the data sent back from the chat (i.e., received from the query) into our visuals. Conceptually, you can basically just treat this dataset like any reactive data frame you would normally use in a Shiny application, so it’s quite easy to work with.</p>
<p>First you need to create the chat server object (<code>querychat_server</code>) using the previously-initialized object from <code>querychat_init</code>. This can just be run somewhere in the server function (see <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/a0eec2e86664cf261dcd0b03e43ede78773c9e44/server.R#L20">here</a>).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the chat server</span></span>
<span id="cb8-2">querychat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">querychat_server</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"chat"</span>, querychat_config)</span></code></pre></div>
</details>
</div>
<section id="accessing-the-returned-data" class="level3">
<h3 class="anchored" data-anchor-id="accessing-the-returned-data">Accessing the returned data</h3>
<p>The <code>querychat</code> server object above then holds a <code>$df()</code> reactive, which returns the data from the most recent query executed (assuming the query returned a dataset and not just a response in the chat itself). So, we just need to call that reactive wherever we want to use the data in our app, just like any other reactive dataset in a Shiny app (see <a href="https://github.com/centralstatz/hospital_readmissions_explorer/blob/a0eec2e86664cf261dcd0b03e43ede78773c9e44/server.R#L30">here</a>).</p>
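<p>As a rough illustration of that idea, the sketch below uses a stand-in list with a reactive <code>df</code> element in place of the real object returned by <code>querychat_server</code> (which holds more than this). The toy data and the output id are made up.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(shiny)

# Stand-in for the object returned by querychat_server(): a list holding
# a reactive data frame (toy data; not the real object)
querychat_stub &lt;- list(
  df = reactive(data.frame(Hospital = c("A", "B"), Excess = c(1.03, 0.98)))
)

server &lt;- function(input, output, session) {
  # Any render function can consume the chat-filtered data
  output$hospital_table &lt;- renderTable({
    querychat_stub$df()
  })
}

# Outside a running app, isolate() lets us peek at the current value
current_data &lt;- isolate(querychat_stub$df())
current_data</code></pre></div>
</div>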
<p>In this app, to implement the toggle functionality between manual and chat filtering on the backend, I just did this:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter to current hospitals (with metric criteria)</span></span>
<span id="cb9-2">current_hospitals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reactive</span>({</span>
<span id="cb9-4">    </span>
<span id="cb9-5">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use dataset based on app mode</span></span>
<span id="cb9-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>chat_mode) {</span>
<span id="cb9-7">      </span>
<span id="cb9-8">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get the dataset being returned by the chat</span></span>
<span id="cb9-9">      temp_hospitals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> querychat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">df</span>()</span>
<span id="cb9-10">      </span>
<span id="cb9-11">    } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb9-12">      </span>
<span id="cb9-13">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use the dataset filtered manually</span></span>
<span id="cb9-14">      temp_hospitals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb9-15">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">current_hospitals_temp</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-16">        </span>
<span id="cb9-17">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter to the specified metric ranges</span></span>
<span id="cb9-18">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb9-19">          DiagnosisCategory <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>diagnosis,</span>
<span id="cb9-20">          Excess <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>excess), Excess <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>excess),</span>
<span id="cb9-21">          Predicted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>predicted), Predicted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>predicted),</span>
<span id="cb9-22">          Expected <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>expected), Expected <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>expected)</span>
<span id="cb9-23">        )</span>
<span id="cb9-24">    }</span>
<span id="cb9-25">    </span>
<span id="cb9-26">    temp_hospitals</span>
<span id="cb9-27">    </span>
<span id="cb9-28">  })</span></code></pre></div>
</details>
</div>
<p>In short, if the app is in chat mode, use the data returned from the SQL query. Otherwise, use the dataset that is created from the manual filters. These datasets are always in the same format, so I can then cascade the result down through the app visuals like I would any other dataset, regardless of how it was created.</p>
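<p>To make the branching concrete outside of Shiny, here is a minimal sketch in base R. The helper <code>select_hospitals()</code> and the toy data are hypothetical (they are not in the app); <code>chat_data</code> stands in for <code>querychat$df()</code> and <code>manual_data</code> for <code>current_hospitals_temp()</code>.</p>

```r
# Hypothetical helper mirroring the reactive's branching logic.
# `chat_data` stands in for querychat$df() and `manual_data` for
# current_hospitals_temp(); both are assumptions for illustration.
select_hospitals <- function(chat_mode, chat_data, manual_data,
                             diagnosis, excess, predicted, expected) {
  if (chat_mode) {
    # Chat mode: take whatever the LLM-generated SQL query returned
    return(chat_data)
  }

  # Manual mode: apply the diagnosis and range filters directly
  in_range <- function(x, rng) x >= min(rng) & x <= max(rng)
  keep <-
    manual_data$DiagnosisCategory == diagnosis &
    in_range(manual_data$Excess, excess) &
    in_range(manual_data$Predicted, predicted) &
    in_range(manual_data$Expected, expected)
  manual_data[keep, , drop = FALSE]
}

# Toy data with the same columns in both modes
hospitals <- data.frame(
  DiagnosisCategory = c("HF", "HF", "AMI"),
  Excess = c(0.95, 1.10, 1.02),
  Predicted = c(15, 20, 12),
  Expected = c(16, 18, 12)
)

# Manual mode: filter by the (made-up) input values
manual_result <- select_hospitals(
  chat_mode = FALSE,
  chat_data = hospitals[3, ],
  manual_data = hospitals,
  diagnosis = "HF",
  excess = c(1, 1.2),
  predicted = c(10, 25),
  expected = c(10, 25)
)

# Chat mode: the filter inputs are ignored entirely
chat_result <- select_hospitals(
  chat_mode = TRUE,
  chat_data = hospitals[3, ],
  manual_data = hospitals,
  diagnosis = "HF",
  excess = c(1, 1.2),
  predicted = c(10, 25),
  expected = c(10, 25)
)
```

<p>Either way the result has the same columns, which is what lets the downstream visuals stay unaware of which mode produced it.</p>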
</section>
</section>
</section>
<section id="deploying-the-app" class="level1">
<h1>Deploying the app</h1>
<p>As mentioned previously, the app was deployed to <a href="https://connect.posit.cloud/">Posit Connect Cloud</a> for free (you can have up to 5 live Shiny applications in the free tier). I gave a more detailed description on how you deploy an app to this platform with the LLM API configured in <a href="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/">this blog post</a>, so check there for more detail. The main things you need to remember for deployment are to:</p>
<ul>
<li>Create your <code>manifest.json</code> file after you’ve completed app development by running <code>rsconnect::writeManifest()</code> so you can capture your app’s dependencies</li>
<li>Put your code in a public GitHub repository (deploying from private repos requires upgrading to a paid tier)</li>
<li>Add your <code>GOOGLE_API_KEY</code> (or whatever API you’re using) to the environment variable list while you are configuring your deployment on <a href="https://connect.posit.cloud/">Posit Connect Cloud</a></li>
</ul>
<p>Other than that, things will likely run smoothly.</p>
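<p>If you want a quick sanity check before pushing, something like the sketch below could work. The function <code>check_deploy_ready()</code> is a hypothetical helper of my own naming, not part of <code>rsconnect</code> or Connect Cloud, and it only inspects your <em>local</em> setup.</p>

```r
# Hypothetical pre-deployment check (not part of rsconnect or Connect Cloud).
# It only looks at the local machine: is there a manifest, and is the
# API key environment variable non-empty in this R session?
check_deploy_ready <- function(app_dir = ".", key_name = "GOOGLE_API_KEY") {
  c(
    manifest_exists = file.exists(file.path(app_dir, "manifest.json")),
    api_key_set = nzchar(Sys.getenv(key_name))
  )
}

# Run from the app's root directory
check_deploy_ready()
```

<p>A local key being set says nothing about the deployed app, so the environment variable still has to be added in the Connect Cloud configuration.</p>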


<!-- -->

</section>

 ]]></description>
  <category>AI</category>
  <category>Shiny</category>
  <category>Web Applications</category>
  <guid>https://www.zajichekstats.com/post/building-an-llm-powered-shiny-app-for-hospital-readmissions/</guid>
  <pubDate>Fri, 23 May 2025 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/building-an-llm-powered-shiny-app-for-hospital-readmissions/feature.png" medium="image" type="image/png" height="95" width="144"/>
</item>
<item>
  <title>Making an AI statistical consultant</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/WKB548RjE9o" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>The stuff <a href="https://posit.co/">Posit</a> has been <a href="https://posit.co/use-cases/ai/">doing with AI</a> has been really refreshing for data scientists, especially R and Python developers. With <a href="https://chatgpt.com/">ChatGPT</a> and the other popular chat <a href="https://en.wikipedia.org/wiki/Graphical_user_interface">GUIs</a> out there, using AI tools was sort of fun, but now with the ability to programmatically interact with LLMs as part of my data science tech stack, I’m actually excited about using them.</p>
<p>I attended <a href="https://www.shinyconf.com/">ShinyConf 2025</a> and there were a lot of great talks that made these tools very appealing to work with. They introduced the <a href="https://ellmer.tidyverse.org/">ellmer</a> package in R, and other ones such as <a href="https://posit-dev.github.io/shinychat/">shinychat</a> and <a href="https://github.com/posit-dev/querychat">querychat</a>. Having no experience using these yet, I wanted to see how simple it would be to build and deploy a chat application from scratch that acts as a statistical consultant, which is what is in the video above 👆 👆 👆</p>
<p>Here are the steps taken, in text form:</p>
<section id="live-app-source-code" class="level1">
<h1>Live app + source code</h1>
<p>First, if you want to see how the app works and the full code behind it, you can find both here:</p>
<ul>
<li><a href="https://01962be4-3359-d652-ac24-9641c956445a.share.connect.posit.cloud/">Live app</a></li>
<li><a href="https://github.com/centralstatz/statistical_consultant_chat/">Source code</a></li>
</ul>
<p>Now on to the tutorial 👇</p>
</section>
<section id="install-the-libraries" class="level1">
<h1>1. Install the libraries</h1>
<p>Of course, we need to make sure <a href="https://shiny.posit.co/r/getstarted/shiny-basics/lesson1/"><code>shiny</code></a> is installed.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"shiny"</span>)</span></code></pre></div>
</details>
</div>
<p>Then the brunt of the work here is going to be done by:</p>
<ul>
<li><a href="https://ellmer.tidyverse.org/"><code>ellmer</code></a>: What we use to send prompts and receive responses from an LLM</li>
<li><a href="https://posit-dev.github.io/shinychat/"><code>shinychat</code></a>: How we create the chat user interface for our application</li>
</ul>
<p>Both of these are on <a href="https://cran.r-project.org/">CRAN</a>, so we can also install them easily.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ellmer"</span>)</span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"shinychat"</span>)</span></code></pre></div>
</details>
</div>
</section>
<section id="choose-an-llm-provider" class="level1">
<h1>2. Choose an LLM provider</h1>
<p>You need to choose which LLM you want to facilitate the chat function in your application. <a href="https://ellmer.tidyverse.org/"><code>ellmer</code></a> supports all the major ones, and you can scan through the list <a href="https://ellmer.tidyverse.org/reference/index.html">here</a>.</p>
<p>Many of these require some sort of payment for API usage, but <a href="https://gemini.google.com/app">Google Gemini</a> offers free-tier usage, so that’s what I went with. So from that list above, I chose the <a href="https://ellmer.tidyverse.org/reference/chat_gemini.html"><code>chat_gemini</code></a> option.</p>
<section id="setting-up-the-api" class="level2">
<h2 class="anchored" data-anchor-id="setting-up-the-api">Setting up the API</h2>
<p>In order to get this working, you need to configure an API key to send prompts to the model programmatically.</p>
<section id="a.-retrieve-the-api-key" class="level3">
<h3 class="anchored" data-anchor-id="a.-retrieve-the-api-key">a. Retrieve the API key</h3>
<p>You can go to the <a href="https://aistudio.google.com/apikey">Google AI Studio</a> (and login with your Google account and agree to terms), click <em>Create API Key</em> in the top-right corner, and copy the key to your clipboard (see <a href="https://youtu.be/WKB548RjE9o?t=199">this part of the video above</a> for a visual).</p>
</section>
<section id="apikey" class="level3">
<h3 class="anchored" data-anchor-id="apikey">b. Set your local environment variable</h3>
<p>As stated in the <a href="https://ellmer.tidyverse.org/reference/chat_gemini.html">function documentation</a>, rather than hard-coding your API key into your app code, you just put it in your local R environment file (<code>.Renviron</code>) as <code>GOOGLE_API_KEY</code>. As always, this is made simple with the <a href="https://usethis.r-lib.org/"><code>usethis</code></a> package.</p>
<p>In <a href="https://posit.co/download/rstudio-desktop/">RStudio</a>, just type <code>usethis::edit_r_environ()</code> and your <code>.Renviron</code> file will open. Then on a new line, enter your API key:</p>
<pre><code>GOOGLE_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXX &lt;-- FILL IN WITH YOUR KEY</code></pre>
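<p>Save it, then restart your R session so the new variable is picked up (<code>.Renviron</code> is read at startup), and that’s it.</p>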
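<p>A running R session won’t see a newly added variable on its own, because <code>.Renviron</code> is only read at startup. Besides restarting, base R’s <code>readRenviron()</code> can reload the file in the current session. Here is a minimal sketch that uses a temporary file and a made-up variable name (<code>DEMO_API_KEY</code>) so it doesn’t touch your real key:</p>

```r
# Write a throwaway .Renviron-style file (a stand-in for your real ~/.Renviron)
env_file <- tempfile(fileext = ".Renviron")
writeLines("DEMO_API_KEY=abc123", env_file)

# Reload environment variables from the file into the current session
readRenviron(env_file)

# Confirm the variable is visible without printing its value
key_is_set <- nzchar(Sys.getenv("DEMO_API_KEY"))
key_is_set
```

<p>For the real file, the equivalent would be <code>readRenviron("~/.Renviron")</code> followed by <code>nzchar(Sys.getenv("GOOGLE_API_KEY"))</code>.</p>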
<p>You should immediately be able to send/receive a response from Gemini:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load package</span></span>
<span id="cb4-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb4-3"></span>
<span id="cb4-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set up chat</span></span>
<span id="cb4-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stderr">
<pre><code>Using model = "gemini-2.0-flash".</code></pre>
</div>
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send a prompt</span></span>
<span id="cb6-2">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's a word that rhymes with orange?"</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>This is a classic riddle! The truth is, there aren't any perfect rhymes for 
"orange" in the English language. 

However, people sometimes use near rhymes or words that rhyme in specific 
dialects. For example:

*   **"Door hinge"** - If you combine two words, you can get a near rhyme.
*   **"Lozenge"** - This is the closest near rhyme you'll find as it sounds 
almost like "lore-inj"

**So, the best answer is: There isn't a perfect rhyme for "orange."**</code></pre>
</div>
</div>
</section>
</section>
</section>
<section id="createproject" class="level1">
<h1>3. Initialize a new R project</h1>
<p>I store my code on <a href="https://github.com/">GitHub</a>, so I first went there to create a new <em>public</em> repository (you can see mine <a href="https://github.com/centralstatz/statistical_consultant_chat/">here</a>).</p>
<p>Then in <a href="https://posit.co/download/rstudio-desktop/">RStudio</a>, you can create a new project (<em>File -&gt; New Project -&gt; Version Control -&gt; Git</em>). Clone the repository you just made so that your local project is pre-configured to push changes to the remote repo (see <a href="https://youtu.be/WKB548RjE9o?t=34">here</a>).</p>
</section>
<section id="systemprompt" class="level1">
<h1>4. Develop the system prompt</h1>
<p>One of the most important concepts during <a href="https://www.shinyconf.com/">ShinyConf 2025</a> was the <a href="https://ellmer.tidyverse.org/articles/ellmer.html#what-is-a-prompt"><em>system prompt</em></a>. It is how you shape the broader LLM that you’ll be using (in my case, <a href="https://gemini.google.com/app">Gemini</a>) to take on a given behavior or persona that you’d like for your application. You set this <em>once</em> upon application start up.</p>
<p>For my purposes, I wanted the chatbot to act like a statistical consultant. Not just any statistical consultant, but one that was very focused on <em>practical</em> application (not statistical significance), as if a business owner was interacting with it trying to use data to make a decision that they needed to take action on soon. This was my system prompt:</p>
<blockquote class="blockquote">
<p>“You are a statistical consultant. You are interacting subject matter experts and/or business leaders who are trying to use statistics to take action and make decisions. The person you are talking to is interested in tangible, real-world results and outcomes, so you should problem solve with insofar as to help them achieve those goals. We are not necessarily concerned with things like statistical signifcance or just running tests, unless it is actually relevant to the action or decision point. Instead, focus on the practical steps this person may take, in the context of their problem, to move closer to their desired goal. To do this, you will ask questions about what they are trying to solve, and how we know that what we’ve done provides real value. Then once you’ve gained enough information, you can develop a pragmatic strategy for how they should get started, and what the roadmap may look like. You’re suggestions may include technical things where it makes sense (like tools, methodology, etc.), but should very much be focused on what a non-statistical person might be able to do to further this effort (e.g., better data collection, personnell, etc.)”</p>
</blockquote>
<p>You can see the raw file <a href="https://github.com/centralstatz/statistical_consultant_chat/blob/main/system_prompt.txt">here</a>.</p>
<section id="systempromptinapp" class="level2">
<h2 class="anchored" data-anchor-id="systempromptinapp">Placing it into the app</h2>
<p>Since this was long text, I just put it in <a href="https://github.com/centralstatz/statistical_consultant_chat/blob/main/system_prompt.txt">its own text file</a> in my <a href="https://github.com/centralstatz/statistical_consultant_chat/tree/main">app’s directory</a>. Then when the app starts, it gets loaded into memory (see <a href="https://github.com/centralstatz/statistical_consultant_chat/blob/03151165178f75d70520e952ab0efed9a89ed9c6/app.R#L8C1-L8C49">this line</a>):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">system_prompt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readLines</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system_prompt.txt"</span>) </span></code></pre></div>
</details>
</div>
<p>Then when the <code>chat_gemini</code> object is created, the imported prompt is entered there in the <code>system_prompt</code> argument (see <a href="https://github.com/centralstatz/statistical_consultant_chat/blob/03151165178f75d70520e952ab0efed9a89ed9c6/app.R#L22C3-L22C75">this line</a>):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(system_prompt, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>))</span></code></pre></div>
</details>
</div>
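<p>One detail worth knowing: <code>readLines()</code> returns one string per line of the file, and <code>collapse = ""</code> joins them with nothing in between. That works when the prompt is a single paragraph on one line, but if your own prompt file spans several lines, you may prefer to keep the line breaks. A small sketch with a made-up two-line prompt written to a temporary file:</p>

```r
# Stand-in for system_prompt.txt, written to a temp file for illustration
prompt_file <- tempfile(fileext = ".txt")
writeLines(
  c("You are a statistical consultant.", "Focus on practical steps."),
  prompt_file
)

# readLines() returns a character vector with one element per line
prompt_lines <- readLines(prompt_file)

# collapse = "" glues the lines together with no separator
glued <- paste(prompt_lines, collapse = "")

# collapse = "\n" preserves the original line breaks
with_breaks <- paste(prompt_lines, collapse = "\n")

cat(glued, "\n\n")
cat(with_breaks, "\n")
```

<p>Notice that the first version runs the sentences together (“consultant.Focus”), which is easy to miss when the prompt is long.</p>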
</section>
</section>
<section id="put-the-app-together" class="level1">
<h1>5. Put the app together</h1>
<p>We haven’t talked about the <a href="https://posit-dev.github.io/shinychat/"><code>shinychat</code></a> package yet, so we’ll do that here as we build the main application file, which you can find <a href="https://github.com/centralstatz/statistical_consultant_chat/blob/main/app.R">here</a>.</p>
<section id="a.-load-the-required-packages" class="level2">
<h2 class="anchored" data-anchor-id="a.-load-the-required-packages">a. Load the required packages</h2>
<p>The first part of the app just loads the packages we need (and imports the system prompt that we discussed above).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load packages</span></span>
<span id="cb10-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(shiny)</span>
<span id="cb10-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(bslib)</span>
<span id="cb10-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb10-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(shinychat)</span>
<span id="cb10-6"></span>
<span id="cb10-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Import system prompt</span></span>
<span id="cb10-8">system_prompt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readLines</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system_prompt.txt"</span>) </span></code></pre></div>
</details>
</div>
</section>
<section id="userinterface" class="level2">
<h2 class="anchored" data-anchor-id="userinterface">b. Make the user interface</h2>
<p>All we want here is the chat interface, nothing else, so it’s quite simple. First, we use the <a href="https://rstudio.github.io/bslib/index.html"><code>bslib</code></a> package to <a href="https://rstudio.github.io/bslib/reference/page.html">create a page</a>.</p>
<p>Then within there is where <a href="https://posit-dev.github.io/shinychat/"><code>shinychat</code></a> comes in, which very easily makes a chat user interface with the <a href="https://posit-dev.github.io/shinychat/reference/chat_ui.html"><code>chat_ui</code></a> function.</p>
<p>We place that in the page, and specify the <code>id</code> and <code>placeholder</code> for chat initialization, and the entire user interface just looks like this (see <a href="https://github.com/centralstatz/statistical_consultant_chat/blob/03151165178f75d70520e952ab0efed9a89ed9c6/app.R#L10">these lines</a>):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">ui <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">page_fluid</span>(</span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ui</span>(</span>
<span id="cb11-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"chat"</span>,</span>
<span id="cb11-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">placeholder =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hi, I'm a statistical consultant, how can I help you?"</span></span>
<span id="cb11-5">  )</span>
<span id="cb11-6">)</span></code></pre></div>
</details>
</div>
</section>
<section id="c.-define-the-server" class="level2">
<h2 class="anchored" data-anchor-id="c.-define-the-server">c.&nbsp;Define the server</h2>
<p>First we create the chat object like we did above. Then we have to figure out how to continuously send/receive messages while the app is running. That’s where <a href="https://ellmer.tidyverse.org/"><code>ellmer</code></a> comes back into play.</p>
<p>When we create the <code>chat_gemini</code> object, it is an <a href="https://r6.r-lib.org/articles/Introduction.html">R6</a> object of type <a href="https://ellmer.tidyverse.org/reference/Chat.html"><em>Chat</em></a>. We use its <a href="https://ellmer.tidyverse.org/reference/Chat.html#method-stream-async-"><code>stream_async</code></a> method to feed responses back from the LLM as they are generated, so we’re not waiting for the whole response to complete before anything shows up. To that method, we pass <code>input$chat_user_input</code>, which is the message typed into the chat interface we created above (it is made available automatically by <a href="https://posit-dev.github.io/shinychat/reference/chat_ui.html"><code>chat_ui</code></a>).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">stream <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stream_async</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>chat_user_input)</span></code></pre></div>
</details>
</div>
<p>Once we create that stream, then we go back to <a href="https://posit-dev.github.io/shinychat/"><code>shinychat</code></a> and use the <a href="https://posit-dev.github.io/shinychat/reference/chat_append.html"><code>chat_append</code></a> function to feed our <a href="https://posit-dev.github.io/shinychat/reference/chat_ui.html"><code>chat_ui</code></a> the results. And that’s it.</p>
<p>The full server function looks like this (see it <a href="https://github.com/centralstatz/statistical_consultant_chat/blob/03151165178f75d70520e952ab0efed9a89ed9c6/app.R#L19">here</a>):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">server <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(input, output, session) {</span>
<span id="cb13-2">  </span>
<span id="cb13-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Setup the chat</span></span>
<span id="cb13-4">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(system_prompt, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>))</span>
<span id="cb13-5">  </span>
<span id="cb13-6">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stream responses as they come</span></span>
<span id="cb13-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">observeEvent</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>chat_user_input, {</span>
<span id="cb13-8">    stream <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stream_async</span>(input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>chat_user_input)</span>
<span id="cb13-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_append</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"chat"</span>, stream)</span>
<span id="cb13-10">  })</span>
<span id="cb13-11">  </span>
<span id="cb13-12">}</span></code></pre></div>
</details>
</div>
<p>And our app is complete. We should now be able to run it locally.</p>
</section>
</section>
<section id="deploy-the-app" class="level1">
<h1>6. Deploy the app</h1>
<p>The live app is <a href="https://01962be4-3359-d652-ac24-9641c956445a.share.connect.posit.cloud/">here</a>.</p>
<p>I used <a href="https://connect.posit.cloud/">Posit Connect Cloud</a>, which is an awesome platform for deploying data science apps to the web. Basically all complex configuration is done there, we just need to supply the code. And you can do it for <a href="https://connect.posit.cloud/plans">free</a>.</p>
<section id="a.-capturing-your-environment" class="level2">
<h2 class="anchored" data-anchor-id="a.-capturing-your-environment">a. Capturing your environment</h2>
<p>The last thing you’ll want to do before deploying to <a href="https://connect.posit.cloud/">Connect Cloud</a> is to create a file called <code>manifest.json</code> and store it at the root of your app directory. You do this after you’ve made all changes to your app code, and it’s meant to capture your R environment and packages/dependencies so that it can be recreated upon deployment. Yet again, this is done with a simple line of code (assuming your working directory is your app’s location):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">rsconnect<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">writeManifest</span>()</span></code></pre></div>
</details>
</div>
<p>You should now see <code>manifest.json</code>.</p>
</section>
<section id="b.-push-to-github" class="level2">
<h2 class="anchored" data-anchor-id="b.-push-to-github">b. Push to GitHub</h2>
<p>We can then commit changes and push the code to GitHub in the repository that was created earlier. <a href="https://github.com/centralstatz/statistical_consultant_chat/commit/d22d2bf925040525624113e762cba373da131a99">This</a> was my commit for the app.</p>
</section>
<section id="c.-setup-on-connect-cloud" class="level2">
<h2 class="anchored" data-anchor-id="c.-setup-on-connect-cloud">c.&nbsp;Setup on Connect Cloud</h2>
<p>I created my account using GitHub credentials (see <a href="https://connect.posit.cloud/zajichek">my profile</a>), so when I want to deploy something, it already has access to relevant metadata.</p>
<p>First, hit the <em>Publish</em> button from your homepage:</p>
<p><img src="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/publish.png" class="img-fluid"></p>
<p>Select <em>Shiny</em>:</p>
<p><img src="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/shiny.png" class="img-fluid"></p>
<p>Then configure the location of your code:</p>
<p><img src="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/configure.png" class="img-fluid"></p>
<p>The repository name should auto-populate in the list if you signed up with GitHub. Then select the file that contains your app code (in my case, <code>app.R</code>). I chose to <em>Automatically publish on push</em> so that if I change my code and push to GitHub, the live app will automatically reflect updates.</p>
<section id="re-configure-your-api-key" class="level3">
<h3 class="anchored" data-anchor-id="re-configure-your-api-key">(Re) configure your API key</h3>
<p>Since we’re no longer local, we need to re-establish the connection to the LLM provider by setting the <code>GOOGLE_API_KEY</code> environment variable on Connect Cloud. Click <em>Advanced settings</em>:</p>
<p><img src="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/advancedsettings1.png" class="img-fluid"></p>
<p>Then <em>Add variable</em>, and enter your API key from above:</p>
<p><img src="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/advancedsettings2.png" class="img-fluid"></p>
<p>Then hit <em>Publish</em>. Your app is now live. You can copy the public link to your app in the top-right of the page, and share it for the world to use!</p>
<p><br></p>
<p>My live app is <a href="https://01962be4-3359-d652-ac24-9641c956445a.share.connect.posit.cloud/">here</a>. This is a snippet of how it works:</p>
<p><img src="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/livechatapp.gif" class="img-fluid"></p>


<!-- -->

</section>
</section>
</section>

 ]]></description>
  <category>AI</category>
  <category>Shiny</category>
  <category>Web Applications</category>
  <guid>https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/</guid>
  <pubDate>Tue, 13 May 2025 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/making-an-ai-statistical-consultant/feature.png" medium="image" type="image/png" height="115" width="144"/>
</item>
<item>
  <title>What decision are we trying to make, anyway?</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/what-decision-are-we-trying-to-make/</link>
  <description><![CDATA[ 




<p>Everyone acknowledges, or even agrees with, the issues and limitations of <a href="https://www.ncbi.nlm.nih.gov/books/NBK459346/#:~:text=A%20study%20is%20statistically%20significant,not%20a%20statistically%20significant%20result.">statistical significance</a> (which I previously wrote about <a href="https://www.zajichekstats.com/post/statistical-significance-is-insignificant/">here</a>): the effect size is ignored, you’ll get it with a really large sample, it doesn’t account for practical importance, and so on…</p>
<p>Yet it is still used all the time, and life goes on. People debate these nuances back and forth as to when and where it is or isn’t valid, etc. But just recently in a <a href="https://www.linkedin.com/feed/update/urn:li:activity:7319500623299653633?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7319500623299653633%2C7319674842163621888%29&amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287319674842163621888%2Curn%3Ali%3Aactivity%3A7319500623299653633%29">conversation on LinkedIn</a>, I had a realization about the fundamental reason why it is an issue, and why there are many <a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#d1e167">calls to ban</a> p-values and statistical significance entirely.</p>
<blockquote class="blockquote">
<p>“Except experimental field, like randomized clinical trials, where both statistical and practical significance are necessary and when combined, answer the question of interest.”</p>
</blockquote>
<p>It’s <em>not</em> that a p-value, or statistical significance itself even, is inherently bad or wrong. <strong>It <em>all</em> has to do exclusively with how they are used, interpreted, implemented, and implicated.</strong></p>
<p>Take <a href="https://www.linkedin.com/posts/adrianolszewski_how-to-combine-statistical-discernibility-activity-7310365190959632384-sNHs?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAACwA1gEB29rAFO327bVy4fMmOqVeTbmsXA8">this point from Adrian Olszewski’s post</a> (which was cited in the conversation above) talking about using statistical significance in combination with practical importance, but in a mathematically formal manner.</p>
<p>The key quote here is:</p>
<blockquote class="blockquote">
<p>“We all make BINARY decisions every day, based on some criteria.”</p>
</blockquote>
<p>And from this supporting image:</p>
<p><img src="https://www.zajichekstats.com/post/what-decision-are-we-trying-to-make/adrian.png" class="img-fluid"></p>
<blockquote class="blockquote">
<p>“Thresholds are essential in binary decision making in experimental studies…”</p>
</blockquote>
<p>All of that is well and good, and sure, makes sense if statistical significance helps you solve a problem in that way. But the main thing that stuck out to me was the emphasis on <em>decisions</em>.</p>
<p><strong>THIS</strong> is exactly the problem with statistical significance.</p>
<p>We use it to declare there is or isn’t an effect of some sort, therefore implicating decision X, Y, or Z. But in many (or most) cases, it is only a <em>guise</em> of a decision actually being made. Think about it: in a typical academic publication, they collect data, run some statistical modeling and/or tests, and get a p-value. The “decision” being made amounts to writing the words <em>“there was a significant difference”</em> on a page in some boilerplate fashion, not an actual action being taken. Where’s the decision?</p>
<p>Sure, you might argue that a published statistical test result could <em>lead</em> to many decisions being made by those consuming the analysis. That’s exactly the point. We have no idea what these decisions are. It is intractable. Those subsequent decisions are not actually related to or reflected in the statistical setup and testing done in the original analysis. We don’t know who is using the results, how they are using them, or what they are using them for. It’s just words on a page.</p>
<p>Contrast this with, say, <a href="https://en.wikipedia.org/wiki/William_Sealy_Gosset">William Sealy Gosset</a> (the inventor of the <a href="https://en.wikipedia.org/wiki/Student%27s_t-test">t-test</a>, whom I <a href="https://www.zajichekstats.com/post/on-the-creation-of-classical-statistics/">wrote about previously</a>), whose primary goal was to brew better beer. His statistical testing methodology, and the use of statistical significance, was directly tailored to the problem at hand and the decisions he was trying to make in order to produce a higher-quality product. It was tangible. He could <em>see</em> with his own two eyes the impact of using significance testing for decisions he was actually making. And in his case, it provided value. It is kind of like a tape measure, which is only as useful as the context it is applied in and the tolerance needed for the decision. It’s not just the fact that a tape measure was used that creates its importance, it’s that the information happens to be useful in that situation so subsequent action can be taken.</p>
<p>So when we say statistical significance or p-values should be banned, what we really mean is that they shouldn’t be used the way they are. If we could just find better uses for them, then sure, they can stay. The problem is that incentive structures, education, etc. are poorly designed around these concepts, and they are so deeply ingrained that it’s difficult to dig our way out.</p>


<!-- -->


 ]]></description>
  <category>Philosophy</category>
  <category>Statistical Significance</category>
  <category>Decision Making</category>
  <guid>https://www.zajichekstats.com/post/what-decision-are-we-trying-to-make/</guid>
  <pubDate>Wed, 23 Apr 2025 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/what-decision-are-we-trying-to-make/feature.png" medium="image" type="image/png" height="133" width="144"/>
</item>
<item>
  <title>How to reconcile the regression equation from spline terms</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/reconciling-regression-equation-spline-terms/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/HXTjssAYCdw" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>Suppose we want to model blood pressure as a function of age (dataset details found <a href="https://archive.ics.uci.edu/dataset/45/heart+disease">here</a>):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load packages</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the data set</span></span>
<span id="cb1-5">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> cheese<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>heart_disease</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb1-8">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb1-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb1-12">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb1-13">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> BP</span>
<span id="cb1-14">    ),</span>
<span id="cb1-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,</span>
<span id="cb1-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,</span>
<span id="cb1-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brown"</span>,</span>
<span id="cb1-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb1-19">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Systolic blood pressure"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb1-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb1-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb1-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span>
<span id="cb1-26">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/reconciling-regression-equation-spline-terms/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>It looks like a larger age is associated with a larger blood pressure, on average, so we fit a <a href="https://en.wikipedia.org/wiki/Simple_linear_regression">simple linear regression</a> model:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb2-2">mod1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(BP <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Age, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dat)</span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(mod1)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = BP ~ Age, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-38.659 -11.449  -0.904  10.218  67.444 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept) 101.4851     5.9364  17.095  &lt; 2e-16 ***
Age           0.5548     0.1076   5.157 4.55e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.9 on 301 degrees of freedom
Multiple R-squared:  0.08119,   Adjusted R-squared:  0.07814 
F-statistic:  26.6 on 1 and 301 DF,  p-value: 4.547e-07</code></pre>
</div>
</div>
<p>We estimate that for every 10-year increase in age, systolic blood pressure increases by about 5.5 mmHg on average. We can also write out the full regression equation for estimating blood pressure given a new patient’s age:</p>
<p><img src="https://latex.codecogs.com/png.latex?BP%20=%20101.49%20+%200.55%20%5Ctimes%20Age"></p>
<p>And we can add this to our plot to get a visual.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb4-2">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb4-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb4-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> BP</span>
<span id="cb4-8">    ),</span>
<span id="cb4-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,</span>
<span id="cb4-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,</span>
<span id="cb4-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brown"</span>,</span>
<span id="cb4-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb4-13">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(</span>
<span id="cb4-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">slope =</span> mod1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]],</span>
<span id="cb4-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">intercept =</span> mod1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],</span>
<span id="cb4-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb4-18">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Systolic blood pressure"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb4-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb4-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb4-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span>
<span id="cb4-25">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/reconciling-regression-equation-spline-terms/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Simple enough. We can verify our formula works by comparing with the output of the <code>predict</code> function:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb5-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BP1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(mod1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> dat),</span>
<span id="cb5-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BP2 =</span> mod1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mod1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age</span>
<span id="cb5-4">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> _, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">near</span>(BP1, BP2))), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"% of fitted values match."</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "100% of fitted values match."</code></pre>
</div>
</div>
<p>As we can see, all of the predictions match, thus we understand exactly how the model works.</p>
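<p>To make the equation concrete, here is a small sketch that plugs a single age into it by hand, using the coefficients copied from the <code>summary(mod1)</code> output above. The function name <code>predict_bp</code> and the example ages are illustrative assumptions, not part of the original analysis:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Coefficients copied from the summary(mod1) output
b0 &lt;- 101.4851 # Intercept
b1 &lt;- 0.5548   # Slope for Age

# Hypothetical helper: predicted systolic blood pressure for a given age
predict_bp &lt;- function(age) b0 + b1 * age

# Prediction for a 50-year-old: 101.4851 + 0.5548 * 50
predict_bp(50)

# Difference for a 10-year increase in age: 10 * b1
predict_bp(60) - predict_bp(50)</code></pre></div>
</div>
<p>The second expression recovers the roughly 5.5 mmHg change per 10 years of age quoted earlier, since the intercept cancels and only the slope remains.</p>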
<section id="adding-spline-terms" class="level3">
<h3 class="anchored" data-anchor-id="adding-spline-terms">Adding spline terms</h3>
<p>Now suppose from the plot above we argue there may be a nonlinear relationship between age and blood pressure, such that it only slightly increases at lower ages and then accelerates at higher ages. So we choose to use a <a href="https://en.wikipedia.org/wiki/Spline_(mathematics)">restricted cubic spline</a> with 3 knots:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">mod2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(BP <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> rms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rcs</span>(Age, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dat)</span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(mod2)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = BP ~ rms::rcs(Age, 3), data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-37.817 -11.576  -1.212   9.642  66.782 

Coefficients:
                     Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)           94.8089    11.2104   8.457 1.22e-15 ***
rms::rcs(Age, 3)Age    0.7016     0.2351   2.984  0.00308 ** 
rms::rcs(Age, 3)Age'  -0.1854     0.2640  -0.702  0.48305    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.91 on 300 degrees of freedom
Multiple R-squared:  0.0827,    Adjusted R-squared:  0.07659 
F-statistic: 13.52 on 2 and 300 DF,  p-value: 2.38e-06</code></pre>
</div>
</div>
<p>We can see there are now 2 parameters in the model to capture the age effect. We can again plot this to see what it looks like:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb9-2">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb9-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb9-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb9-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> BP</span>
<span id="cb9-8">    ),</span>
<span id="cb9-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,</span>
<span id="cb9-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,</span>
<span id="cb9-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brown"</span>,</span>
<span id="cb9-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb9-13">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb9-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Fitted =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(mod2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age)))),</span>
<span id="cb9-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb9-17">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb9-18">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Fitted</span>
<span id="cb9-19">    ),</span>
<span id="cb9-20">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb9-21">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Systolic blood pressure"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb9-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb9-26">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb9-27">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span>
<span id="cb9-28">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/reconciling-regression-equation-spline-terms/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The estimated curve didn’t take the shape we suspected (with more flexibility, i.e., more knots, it might), but we’ll run with it anyway.</p>
</section>
<section id="writing-the-regression-equation" class="level3">
<h3 class="anchored" data-anchor-id="writing-the-regression-equation">Writing the regression equation</h3>
<p>The question is: <em>how do we write out the formula for this model such that we input an age (in years) and get a predicted blood pressure?</em></p>
<p>If we did what we did before with <a href="https://en.wikipedia.org/wiki/Simple_linear_regression">simple linear regression</a> and just blindly plug in age to what is given from the model output above, we’d have:</p>
<p><img src="https://latex.codecogs.com/png.latex?BP%20=%2094.8%20+%200.70%20%5Ctimes%20Age%20-%200.19%20%5Ctimes%20Age%20"></p>
<p>Now we again compare the output of that equation with the correct fitted values from the <code>predict</code> function:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Gather set of predicted values</span></span>
<span id="cb10-2">preds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb10-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BP1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(mod2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> dat),</span>
<span id="cb10-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BP2 =</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age</span>
<span id="cb10-6">  )</span>
<span id="cb10-7"></span>
<span id="cb10-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check concordance</span></span>
<span id="cb10-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> preds, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">near</span>(BP1, BP2))), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"% of fitted values match."</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "0% of fitted values match."</code></pre>
</div>
</div>
<p>It turns out <em>none</em> of them are correct. So obviously something is wrong.</p>
</section>
<section id="find-the-knot-locations" class="level3">
<h3 class="anchored" data-anchor-id="find-the-knot-locations">Find the knot locations</h3>
<p>First, we need to figure out what gets inputted into the equation. The coefficient for the second age term in the model output (-0.185) is not multiplied by the raw age value, but rather by some transformation of it.</p>
<p>Specifically, we can <a href="https://www.zajichekstats.com/post/the-evasive-spline/">recall</a> that age on the original scale will be transformed by a truncated power basis <em>at each knot</em>:</p>
<p><img src="https://latex.codecogs.com/png.latex?h(x,%5Cnu)%20=%20(x-%5Cnu)%5E3_+"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cnu"> is the knot location and <img src="https://latex.codecogs.com/png.latex?+"> means we take <img src="https://latex.codecogs.com/png.latex?(x-%5Cnu)%5E3"> if <img src="https://latex.codecogs.com/png.latex?x%3E%5Cnu"> and 0 otherwise.</p>
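<p>As a minimal sketch of that definition, the truncated power basis for a single knot could be written in base <code>R</code> as follows. The helper name <code>truncated_cube</code> is made up for illustration and is not part of <code>rms</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: (x - nu)^3 when x is above the knot nu, 0 otherwise
truncated_cube &lt;- function(x, nu) {
  pmax(x - nu, 0)^3
}

# Ages below, at, and above a knot placed at 42
truncated_cube(c(40, 42, 50, 60), nu = 42)
# returns 0, 0, 512, 5832</code></pre></div>
</div>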
<p>In our example, we have 3 knot locations:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract knot locations</span></span>
<span id="cb12-2">knots <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">attr</span>(rms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rcs</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"parms"</span>)</span>
<span id="cb12-3">knots</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 42 56 66</code></pre>
</div>
</div>
<p>We can verify that these are the <img src="https://latex.codecogs.com/png.latex?10%5E%7Bth%7D">, <img src="https://latex.codecogs.com/png.latex?50%5E%7Bth%7D"> and <img src="https://latex.codecogs.com/png.latex?90%5E%7Bth%7D"> percentiles of the age distribution (which is the default).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>10% 50% 90% 
 42  56  66 </code></pre>
</div>
</div>
<p>Then, we can write out the full transformation for the spline term using the <a href="https://support.sas.com/resources/papers/proceedings16/5621-2016.pdf">formula</a>:</p>
<p><img src="https://latex.codecogs.com/png.latex?X_%7Btrans%7D%20=%20(x-%5Cnu_1)%5E3_+%20-%20%5Cfrac%7B%5Cnu_3%20-%20%5Cnu_1%7D%7B%5Cnu_3%20-%20%5Cnu_2%7D(x-%5Cnu_2)%5E3_+%20+%20%5Cfrac%7B%5Cnu_2%20-%20%5Cnu_1%7D%7B%5Cnu_3%20-%20%5Cnu_2%7D(x-%5Cnu_3)%5E3_+"></p>
<p>Plugging in our quantities, we get:</p>
<p><img src="https://latex.codecogs.com/png.latex?Age_%7Bspline%7D%20=%20(age%20-%2042)%5E3_+%20-%20%5Cfrac%7B66-42%7D%7B66-56%7D(age%20-%2056)%5E3_+%20+%20%5Cfrac%7B56-42%7D%7B66-56%7D(age%20-%2066)%5E3_+"></p>
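<p>As a quick sanity check of the transformation above, here is the arithmetic for a single input. The age of 60 is an arbitrary choice for illustration, and the knots are hard-coded from the output shown earlier.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Knot locations reported above
knots &lt;- c(42, 56, 66)

# Example input (arbitrary, for illustration only)
age &lt;- 60

# Each truncated cubic term
term1 &lt;- pmax(age - knots[1], 0)^3  # 18^3 = 5832
term2 &lt;- pmax(age - knots[2], 0)^3  # 4^3 = 64
term3 &lt;- pmax(age - knots[3], 0)^3  # 0, since 60 is below the last knot

# Weighted combination from the formula
age_spline_60 &lt;- term1 -
  (knots[3] - knots[1]) / (knots[3] - knots[2]) * term2 +
  (knots[2] - knots[1]) / (knots[3] - knots[2]) * term3
age_spline_60
# 5832 - 2.4 * 64 + 1.4 * 0 = 5678.4</code></pre></div>
</div>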
<p>Ahhh okay. So we need to take our raw age value, first plug it into that formula, and <em>then</em> multiply it by the coefficient from the model’s output. In sloppy notation, something like this:</p>
<p><img src="https://latex.codecogs.com/png.latex?BP%20=%2094.8%20+%200.70%20%5Ctimes%20Age%20-%200.19%20%5Ctimes%20Age_%7Bspline%7D"></p>
<p>So we’ll do that, again comparing with the correct output of the <code>predict</code> function:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the spline value</span></span>
<span id="cb16-2">age_spline <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmax</span>((dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> </span>
<span id="cb16-4">  (knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmax</span>((dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb16-5">  (knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmax</span>((dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb16-6"></span>
<span id="cb16-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add new calculation</span></span>
<span id="cb16-8">preds<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>BP3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> age_spline</span>
<span id="cb16-9"></span>
<span id="cb16-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check concordance</span></span>
<span id="cb16-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> preds, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">near</span>(BP1, BP3))), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"% of fitted values match."</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "12% of fitted values match."</code></pre>
</div>
</div>
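<p>We’re still having problems. It looks like <em>some</em> of the predictions from our new formula match the correct output, but most don’t. The ones that match should be the ages at or below the first knot (42), where every truncated term is 0 and the spline term drops out of the equation entirely.</p>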
</section>
<section id="normalizing-the-transformation" class="level3">
<h3 class="anchored" data-anchor-id="normalizing-the-transformation">Normalizing the transformation</h3>
<p>In the <a href="https://www.rdocumentation.org/packages/Hmisc/versions/5.1-3/topics/rcspline.eval"><code>rms::rcs</code> documentation</a>, it states that the default behavior (seen by the <code>norm</code> argument) is to <em>“normalize by the square of the spacing between the first and last knots”</em>. Applying this to our age transformation, our normalization factor is:</p>
<p><img src="https://latex.codecogs.com/png.latex?Norm%20=%20(%5Cnu_3%20-%20%5Cnu_1)%5E2%20=%20(66%20-%2042)%5E2%20=%2024%5E2%20=%20576"></p>
<p>So in finding the basis vectors for the spline term, the transformed values were implicitly divided by this value, which puts the reported coefficient on the normalized scale rather than the scale of our apparent equation above. To remedy this, we simply need to <em>divide</em> the model coefficient for the spline term (-0.185) by the normalization factor. So our equation becomes:</p>
<p><img src="https://latex.codecogs.com/png.latex?BP%20=%2094.8%20+%200.70%20%5Ctimes%20Age%20-%20%5Cfrac%7B0.19%7D%7BNorm%7D%20%5Ctimes%20Age_%7Bspline%7D"></p>
<p>We can again check to see how this compares to the (correct) output of the <code>predict</code> function:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the normalizing factor</span></span>
<span id="cb18-2">norm_factor <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> knots[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb18-3"></span>
<span id="cb18-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add new calculation</span></span>
<span id="cb18-5">preds<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>BP4 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mod2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> norm_factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> age_spline </span>
<span id="cb18-6"></span>
<span id="cb18-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check concordance</span></span>
<span id="cb18-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> preds, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">near</span>(BP1, BP4))), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"% of fitted values match."</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "100% of fitted values match."</code></pre>
</div>
</div>
<p>Finally, all of our calculated predicted values are correct!</p>
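<p>One way to double-check the normalization independently of the fitted model is to compare the hand-computed basis against <code>Hmisc::rcspline.eval</code>, the function linked above. This is only a sketch: it assumes the <code>Hmisc</code> package is installed, relies on the default <code>norm = 2</code>, and uses a handful of arbitrary ages rather than the dataset.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Knot locations reported above and a few arbitrary ages
knots &lt;- c(42, 56, 66)
age &lt;- c(40, 50, 60, 70)

# Hand-computed spline term on the original (unnormalized) scale
by_hand &lt;- pmax(age - knots[1], 0)^3 -
  (knots[3] - knots[1]) / (knots[3] - knots[2]) * pmax(age - knots[2], 0)^3 +
  (knots[2] - knots[1]) / (knots[3] - knots[2]) * pmax(age - knots[3], 0)^3

# Divide by the squared distance between the first and last knots
by_hand_norm &lt;- by_hand / (knots[3] - knots[1])^2

# Nonlinear basis column from Hmisc (inclx = FALSE drops the linear term)
from_hmisc &lt;- Hmisc::rcspline.eval(age, knots = knots, inclx = FALSE)

# Compare the two versions
all.equal(as.numeric(from_hmisc), by_hand_norm)</code></pre></div>
</div>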
</section>
<section id="the-final-formula" class="level3">
<h3 class="anchored" data-anchor-id="the-final-formula">The final formula</h3>
<p>We already wrote this out above, if you want to piece together the various components, but we’ll do it again here in one fell swoop.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bequation%7D%0A%5Cbegin%7Bsplit%7D%0ABP%20&amp;%20=%2094.8%20+%200.70%20%5Ctimes%20Age%20+%20%5C%5C%0A&amp;%20%5Cquad%20%5Cfrac%7B-0.19%7D%7B(66-42)%5E2%7D%20%5Ctimes%20%5Cbigg%20(%20%5C%5C%0A&amp;%20%5Cquad%20(Age%20-%2042)%5E3_+%20-%20%5C%5C%0A&amp;%20%5Cquad%20%5Cfrac%7B66-42%7D%7B66-56%7D(Age%20-%2056)%5E3_+%20+%20%5C%5C%0A&amp;%20%5Cquad%20%5Cfrac%7B56-42%7D%7B66-56%7D(Age%20-%2066)%5E3_+%20%5Cbigg%20)%20%5C%5C%0A%5Cend%7Bsplit%7D%0A%5Cend%7Bequation%7D%0A"></p>
<p>Now we have an equation that takes an age value as input (in years) and outputs the predicted systolic blood pressure. Finally, we understand exactly how <em>this</em> model works.</p>
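<p>To make that concrete, the equation can be wrapped in a small standalone function. This is a sketch under stated assumptions: the name <code>predict_bp</code> is hypothetical, and the coefficients are the <em>rounded</em> values quoted in the text (94.8, 0.70, -0.185), so results will differ slightly from what <code>predict</code> returns with the full-precision coefficients.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper using the rounded coefficients quoted in the text
predict_bp &lt;- function(age, knots = c(42, 56, 66),
                       intercept = 94.8, slope = 0.70, spline_coef = -0.185) {
  # Truncated cubic at a single knot
  h &lt;- function(x, nu) pmax(x - nu, 0)^3

  # Restricted cubic spline term for 3 knots
  age_spline &lt;- h(age, knots[1]) -
    (knots[3] - knots[1]) / (knots[3] - knots[2]) * h(age, knots[2]) +
    (knots[2] - knots[1]) / (knots[3] - knots[2]) * h(age, knots[3])

  # Normalize by the squared distance between the first and last knots
  norm_factor &lt;- (knots[3] - knots[1])^2

  intercept + slope * age + spline_coef / norm_factor * age_spline
}

predict_bp(c(40, 60))
# roughly 122.8 and 135.0</code></pre></div>
</div>
<p>Nothing here depends on <code>rms</code> or on the fitted model object, only on base arithmetic, which is the point of writing the equation out by hand.</p>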
</section>
<section id="why-is-this-useful" class="level2">
<h2 class="anchored" data-anchor-id="why-is-this-useful">Why is this useful?</h2>
<p>First, it’s obviously important to understand how modeling software is getting to the results it gives you, so that you can correctly interpret them, among other things. Second, from a pragmatic point of view, the ability to write out the full regression equation allows you to embed your model into any application (even Excel if you wanted), instead of requiring the <code>predict</code> function to be run, which in turn would require <code>R</code> to be a part of the application’s server. In this case, it’s unnecessary for that to be a requirement, since we can simply reconcile our model into an easily understandable equation. Luckily, there are also <a href="https://www.rdocumentation.org/packages/rms/versions/6.8-2/topics/Function">some functions</a> that can extract these formulas for you for this exact purpose so you don’t <em>have</em> to go through the cumbersome calculations above.</p>
  


<!-- -->

</section>

 ]]></description>
  <category>Regression</category>
  <guid>https://www.zajichekstats.com/post/reconciling-regression-equation-spline-terms/</guid>
  <pubDate>Mon, 25 Nov 2024 06:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/reconciling-regression-equation-spline-terms/feature.png" medium="image" type="image/png" height="152" width="144"/>
</item>
<item>
  <title>Can you have a model without data?</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/can-you-have-a-model-without-data/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/bUerkVsCwtA" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p><a href="https://en.wikipedia.org/wiki/Frequentist_inference">Frequentist</a> statistics, the paradigm in which much of statistical practice is done, has a specific requirement: we need data <em>before</em> we can attribute estimates to (i.e., “fit”) our model. Yes, we might pre-specify its form, and be quite confident in what that looks like, but ultimately before we get an answer, we need the data.</p>
<p>For example, suppose we are interested in the proportion of individuals with heart disease. We can specify an assumed model:</p>
<p><img src="https://latex.codecogs.com/png.latex?X%20%5Csim%20Bernoulli(p)"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?X%20%5Cin%20%5C%7B0,1%5C%7D"> and <img src="https://latex.codecogs.com/png.latex?p%20%5Cin%20%5B0,1%5D">.</p>
<p>That is, we assume that whether an individual has heart disease is a coin flip with probability <img src="https://latex.codecogs.com/png.latex?p">. Our goal is to estimate what <img src="https://latex.codecogs.com/png.latex?p"> is.</p>
<p>We plan to use the <a href="https://en.wikipedia.org/wiki/Population_proportion">typical approach</a> for estimating a population proportion such that:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D%20=%20%5Cfrac%7B%5Csum_%7Bi=1%7D%5Enx_i%7D%7Bn%7D%20%5Chskip.2in%20Var(%5Chat%7Bp%7D)%20=%20%5Cfrac%7B%5Chat%7Bp%7D(1-%5Chat%7Bp%7D)%7D%7Bn%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?x_i"> is the indicator of whether or not individual <img src="https://latex.codecogs.com/png.latex?i"> in the sample has heart disease, and <img src="https://latex.codecogs.com/png.latex?n"> is the total number of individuals in the sample. That is, we take the average, or sample proportion. The variance provides a window of uncertainty in our estimate.</p>
<p>Okay, let’s do it.</p>
<p>But wait, in order for us to get a numerical quantity to work with, we need data to plug into these equations. <em>That is the point</em>. Our model in this paradigm becomes <em>data</em> focused, such that a sample is required. And a large enough one at that.</p>
<p><em>Our model of the world is completely dependent on collecting and entering a sample into the estimators, despite what we may already know about heart disease rates. Thus, it is only informed by the data at hand.</em></p>
<p>Okay, fine. So we find a dataset related to heart disease:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> cheese<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>heart_disease</span>
<span id="cb1-2">dat</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 303 × 9
     Age Sex    ChestPain           BP Cholesterol BloodSugar MaximumHR
   &lt;dbl&gt; &lt;fct&gt;  &lt;fct&gt;            &lt;dbl&gt;       &lt;dbl&gt; &lt;lgl&gt;          &lt;dbl&gt;
 1    63 Male   Typical angina     145         233 TRUE             150
 2    67 Male   Asymptomatic       160         286 FALSE            108
 3    67 Male   Asymptomatic       120         229 FALSE            129
 4    37 Male   Non-anginal pain   130         250 FALSE            187
 5    41 Female Atypical angina    130         204 FALSE            172
 6    56 Male   Atypical angina    120         236 FALSE            178
 7    62 Female Asymptomatic       140         268 FALSE            160
 8    57 Female Asymptomatic       120         354 FALSE            163
 9    63 Male   Asymptomatic       130         254 FALSE            147
10    53 Male   Asymptomatic       140         203 TRUE             155
# ℹ 293 more rows
# ℹ 2 more variables: ExerciseInducedAngina &lt;fct&gt;, HeartDisease &lt;fct&gt;</code></pre>
</div>
</div>
<p>And we plug our data into the formulas and get our result:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sample proportion</span></span>
<span id="cb3-2">p_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HeartDisease <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Yes"</span>)</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sample size</span></span>
<span id="cb3-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dat)</span>
<span id="cb3-6"></span>
<span id="cb3-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Standard error</span></span>
<span id="cb3-8">se <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>((p_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> p_hat)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> n)</span>
<span id="cb3-9"></span>
<span id="cb3-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Construct confidence interval</span></span>
<span id="cb3-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(</span>
<span id="cb3-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Estimate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(p_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>),</span>
<span id="cb3-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Lower =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (p_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">96</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>),</span>
<span id="cb3-14">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Upper =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (p_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">96</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>)</span>
<span id="cb3-15">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-16">  knitr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kable</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"html"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-17">  kableExtra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kable_styling</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">full_width =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-18">  kableExtra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_header_above</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"92% Confidence Interval"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div>
</details>
<div class="cell-output-display">
<table class="table caption-top table-sm table-striped small" data-quarto-postprocess="true">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th" style="text-align: left; empty-cells: hide; border-bottom: hidden;"></th>
<th colspan="2" data-quarto-table-cell-role="th" style="text-align: center; border-bottom: hidden; padding-bottom: 0; padding-left: 3px; padding-right: 3px;"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">
92% Confidence Interval
</div></th>
</tr>
<tr class="even">
<th style="text-align: left;" data-quarto-table-cell-role="th">Estimate</th>
<th style="text-align: left;" data-quarto-table-cell-role="th">Lower</th>
<th style="text-align: left;" data-quarto-table-cell-role="th">Upper</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">45.87%</td>
<td style="text-align: left;">40.86%</td>
<td style="text-align: left;">50.89%</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Despite what we may think of this result (which is certainly high in any general population context), there’s not much wiggle room with respect to the output. The data is what it is: we estimate that <img src="https://latex.codecogs.com/png.latex?p"> is somewhere between 41% and 51% with 92% confidence. And that’s it.</p>
<p>What if this sample couldn’t be gathered all at once? What if we already knew something about the rate of heart disease? What if we wanted our estimates to be informed by prior information we had?</p>
<p>When we consider our data being sequentially collected, we run into problems early on.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set grid</span></span>
<span id="cb4-2">n_obs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dat))</span>
<span id="cb4-3"></span>
<span id="cb4-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set values</span></span>
<span id="cb4-5">p_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>()</span>
<span id="cb4-6">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>()</span>
<span id="cb4-7">se <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>()</span>
<span id="cb4-8"></span>
<span id="cb4-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate the grid</span></span>
<span id="cb4-10"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span>(i <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(n_obs)) {</span>
<span id="cb4-11">  </span>
<span id="cb4-12">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute estimates</span></span>
<span id="cb4-13">  temp_n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> n_obs[i]</span>
<span id="cb4-14">  temp_p_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HeartDisease[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>temp_n] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Yes"</span>)</span>
<span id="cb4-15">  temp_se <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>((temp_p_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> temp_p_hat)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> temp_n)</span>
<span id="cb4-16">  </span>
<span id="cb4-17">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add to lists</span></span>
<span id="cb4-18">  p_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(p_hat, temp_p_hat)</span>
<span id="cb4-19">  n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(n, temp_n)</span>
<span id="cb4-20">  se <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(se, temp_se)</span>
<span id="cb4-21">  </span>
<span id="cb4-22">}</span>
<span id="cb4-23"></span>
<span id="cb4-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb4-25"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb4-26"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(</span>
<span id="cb4-27">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_hat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>, p_hat), </span>
<span id="cb4-28">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n),</span>
<span id="cb4-29">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">se =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>, se)</span>
<span id="cb4-30">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-31">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb4-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb4-33">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-34">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(n),</span>
<span id="cb4-35">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> p_hat</span>
<span id="cb4-36">    ),</span>
<span id="cb4-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb4-38">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-39">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_linerange</span>(</span>
<span id="cb4-40">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-41">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(n),</span>
<span id="cb4-42">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymin =</span> p_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">96</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se,</span>
<span id="cb4-43">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymax =</span> p_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">96</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se</span>
<span id="cb4-44">    ),</span>
<span id="cb4-45">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb4-46">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-47">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_flip</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-48">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb4-49">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb4-50">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb4-51">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>),</span>
<span id="cb4-52">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>)</span>
<span id="cb4-53">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-54">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated p (92% CI)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-55">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sample size"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/can-you-have-a-model-without-data/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>First, when we have no data, we can’t get <em>any</em> estimate (obviously). We have to pretend we know <em>nothing</em> about what the rate of heart disease is. After adding only 1 observation, our estimate for <img src="https://latex.codecogs.com/png.latex?p"> is 0% with a 92% confidence interval ranging from 0% to 0%, because the standard error formula collapses to zero whenever the sample proportion is exactly 0 or 1. This is not useful or informative, as that estimate was based on a single individual. Then, when we add just 4 more observations, our estimate of <img src="https://latex.codecogs.com/png.latex?p"> becomes wildly uncertain (40% with a 92% confidence interval from 2% to 78%). This sequence of estimates is inconsistent and counter-intuitive (of course, I’m using large-sample methods here, and we could use more appropriate small-sample approaches, but that’s part of the point). Eventually, as more data is added, the estimate gets more precise, but, again, it is completely driven by the data.</p>
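<p>As a quick check of the numbers above, here is a minimal sketch that recomputes the large-sample interval from counts alone. The helper name <code>wald_ci</code> is my own illustration (it is not part of the original analysis), and the counts (0 of 1, then 2 of 5) are simply what the 0% and 40% estimates imply. The last line uses <code>binom.test</code>, which returns an exact (Clopper-Pearson) interval, as one example of a small-sample alternative.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: large-sample (Wald) interval for a proportion
wald_ci &lt;- function(x, n, level = 0.92) {
  p_hat &lt;- x / n
  se &lt;- sqrt(p_hat * (1 - p_hat) / n)
  z &lt;- qnorm(1 - (1 - level) / 2) # equals qnorm(.96) for a 92% interval
  c(estimate = p_hat, lower = p_hat - z * se, upper = p_hat + z * se)
}

# One observation without heart disease: the standard error is zero
round(wald_ci(0, 1), 2)

# Two of five with heart disease: roughly 0.40 (0.02, 0.78)
round(wald_ci(2, 5), 2)

# One small-sample alternative: exact (Clopper-Pearson) interval
binom.test(2, 5, conf.level = 0.92)$conf.int</code></pre></div>
<p>The exact interval never collapses to a single point, which is one reason it tends to be preferred when counts are this small.</p>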
<p>The bottom line is that in the frequentist paradigm, we are handcuffed. We can’t mathematically provide estimates until there is sufficient data collected, despite what our intuition or prior knowledge tells us about the parameter of interest beforehand.</p>
<p>What if we have no data, or very little? What if we need to make decisions along the way before all of the data is collected, using our best estimate as of <em>now</em>? As we saw above, we need to wait to have sufficient data to get something reliable.</p>
<p>Is there a way to provide a starting point about what we <em>think</em> the true rate of heart disease is, and then have our estimates be informed or augmented by evidence?</p>
<p>Yes, by being a Bayesian.</p>
<section id="bayesian-thinking" class="level1">
<h1>Bayesian Thinking</h1>
<p>The way I like to think about how <a href="https://en.wikipedia.org/wiki/Bayesian_statistics">Bayesian statistics</a> differs from frequentist methods is that <em>the model is everything</em>. Here, we focus on the <em>model</em> and treat it as a living, breathing object. The data becomes secondary, sometimes an afterthought, and only used <em>as needed</em> in order to update our knowledge about the world as new information comes in.</p>
<section id="prior" class="level2">
<h2 class="anchored" data-anchor-id="prior">Specify a prior distribution</h2>
<p>Before any of the data is collected, we can use our subject matter knowledge about a phenomenon to describe where we think a parameter value lies.</p>
<p>In the example above, suppose we thought it’s likely that the true parameter value <img src="https://latex.codecogs.com/png.latex?p"> is somewhere around 35% in this population, of course allowing for some uncertainty.</p>
<p>We can assign a <a href="https://en.wikipedia.org/wiki/Prior_probability">prior distribution</a> to <img src="https://latex.codecogs.com/png.latex?p"> using the <a href="https://en.wikipedia.org/wiki/Beta_distribution">Beta</a> distribution (you could use any distribution that adheres to your prior knowledge; it just happens that this one works nicely in the case of proportions, as an example of a <a href="https://en.wikipedia.org/wiki/Conjugate_prior">conjugate prior</a>):</p>
<p><img src="https://latex.codecogs.com/png.latex?p%20%5Csim%20Beta(%5Calpha%20=%204.5,%20%5Cbeta%20=%207.5)"></p>
<p>where the <a href="https://en.wikipedia.org/wiki/Probability_density_function">probability density function (PDF)</a> is defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?f(x%7C%5Calpha,%20%5Cbeta)%20=%20%5Cfrac%7B%5CGamma(%5Calpha+%5Cbeta)%7D%7B%5CGamma(%5Calpha)%5CGamma(%5Cbeta)%7Dx%5E%7B%5Calpha-1%7D(1-x)%5E%7B%5Cbeta-1%7D"></p>
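<p>where <img src="https://latex.codecogs.com/png.latex?x%20%5Cin%20%5B0,1%5D">, <img src="https://latex.codecogs.com/png.latex?%5Calpha,%20%5Cbeta%20%3E%200">, and <img src="https://latex.codecogs.com/png.latex?%5CGamma(n)%20=%20(n-1)!"> for positive integers <img src="https://latex.codecogs.com/png.latex?n"> (for non-integer arguments such as the 4.5 and 7.5 used here, <img src="https://latex.codecogs.com/png.latex?%5CGamma"> is the <a href="https://en.wikipedia.org/wiki/Gamma_function">gamma function</a>, which extends the factorial).</p>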
<p>And we can construct a plot to visualize our initial beliefs about the parameter <img src="https://latex.codecogs.com/png.latex?p">:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get the density values</span></span>
<span id="cb5-2">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">01</span>)</span>
<span id="cb5-3">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbeta</span>(x, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">7.5</span>)</span>
<span id="cb5-4"></span>
<span id="cb5-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the plot</span></span>
<span id="cb5-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(x, y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb5-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb5-10">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> x,</span>
<span id="cb5-11">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> y</span>
<span id="cb5-12">    )</span>
<span id="cb5-13">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(</span>
<span id="cb5-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb5-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span></span>
<span id="cb5-17">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb5-19">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb5-20">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>),</span>
<span id="cb5-21">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb5-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb5-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb5-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>()</span>
<span id="cb5-25">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(</span>
<span id="cb5-27">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p"</span>,</span>
<span id="cb5-28">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent</span>
<span id="cb5-29">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/can-you-have-a-model-without-data/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>So initially we think there is a 72% chance that the true value of <img src="https://latex.codecogs.com/png.latex?p"> is between 20% and 50% (taken as the area under the curve between those two points), with more probability mass towards the center.</p>
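<p>That area is just a difference of two Beta cumulative probabilities, so it can be sketched with <code>pbeta</code>. The variable names below are illustrative only; the mode and mean use the standard Beta formulas (the mode formula assumes both shape parameters exceed 1). The first result should land close to the 72% quoted above.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Shape parameters of the prior
shape1 &lt;- 4.5
shape2 &lt;- 7.5

# Prior probability that p falls between 20% and 50%
prior_prob &lt;- pbeta(0.5, shape1, shape2) - pbeta(0.2, shape1, shape2)
round(prior_prob, 2)

# Where the prior is centered
prior_mode &lt;- (shape1 - 1) / (shape1 + shape2 - 2) # 0.35
prior_mean &lt;- shape1 / (shape1 + shape2)           # 0.375</code></pre></div>
<p>The mode of 35% matches the “somewhere around 35%” belief stated earlier, while the mean sits slightly higher because the distribution is skewed to the right.</p>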
<p>In this sense, we <em>have</em> our model estimate already in its complete form. If the current state of information isn’t sufficient, <em>then</em> we can collect data to help guide/inform our prior belief. So instead of requiring a (large enough) sample to realize any numerical estimate, we have one with zero data points. As we add data, our model will update in proportion to the amount of new information it contains. Therefore, we can think of this <em>prior</em> distribution as equivalent to our <em>current</em> <a href="https://en.wikipedia.org/wiki/Posterior_probability">posterior distribution</a>. Whether we got here from prior data, intuition, or a plain guess doesn’t really matter. Our current knowledge of <img src="https://latex.codecogs.com/png.latex?p"> captures all that we know about it, and will only change as new information is added. <em>Thus, we have constructed a model with no data.</em></p>
</section>
<section id="updating-the-model-with-data" class="level2">
<h2 class="anchored" data-anchor-id="updating-the-model-with-data">Updating the model with data</h2>
<p>Now we want a way to update our prior (or current) knowledge of the parameter of interest as new information comes in. The result of this is called the <a href="https://en.wikipedia.org/wiki/Posterior_probability">posterior distribution</a>, which tells us where the parameter value(s) are most likely to be, given our prior beliefs and the new data. The derivation of this distribution is done through <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes’ theorem</a>.</p>
<p>In our example (and any analysis), the posterior distribution is written as:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(p%7Cdata)%20=%20%5Cfrac%7BP(p)P(data%7Cp)%7D%7BP(data)%7D"></p>
<p>Here, <img src="https://latex.codecogs.com/png.latex?P(p)"> is the prior distribution, which we saw above; <img src="https://latex.codecogs.com/png.latex?P(data%7Cp)"> is the <em>likelihood</em> of observing our data given a particular value of <img src="https://latex.codecogs.com/png.latex?p">; and <img src="https://latex.codecogs.com/png.latex?P(data)"> is the probability of observing our dataset averaged across all values of <img src="https://latex.codecogs.com/png.latex?p"> (obtained via the <a href="https://en.wikipedia.org/wiki/Law_of_total_probability">law of total probability</a>). The denominator does not depend on the parameter, and since we’re conditioning on the <img src="https://latex.codecogs.com/png.latex?data">, it just amounts to a normalizing constant that ensures the posterior distribution is a <a href="https://study.com/skill/learn/how-to-determine-if-a-probability-distribution-is-valid-explanation.html">valid probability distribution</a>. Thus, we only need to concern ourselves with the form of the numerator, and can write the posterior as <em>proportional</em> to the product of the prior and likelihood:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(p%7Cdata)%20%5Cpropto%20P(p)P(data%7Cp)"></p>
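<p>To see why the normalizing constant can be set aside, here is a minimal sketch that evaluates the numerator on a grid of values for <img src="https://latex.codecogs.com/png.latex?p">. It assumes the Beta(4.5, 7.5) prior used in the code for this post; the counts <code>n_example</code> and <code>H_example</code> are made up purely for illustration, and the grid itself is an approximation rather than an exact calculation.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Grid of candidate values for p (endpoints excluded)
grid_width &lt;- 0.001
p_grid &lt;- seq(grid_width, 1 - grid_width, by = grid_width)

# Prior density: Beta(4.5, 7.5)
prior &lt;- dbeta(p_grid, 4.5, 7.5)

# Likelihood for made-up data: 9 of 20 patients with heart disease
n_example &lt;- 20
H_example &lt;- 9
likelihood &lt;- dbinom(H_example, size = n_example, prob = p_grid)

# Numerator of Bayes' theorem (the unnormalized posterior)
numerator &lt;- prior * likelihood

# Dividing by the approximate area under the curve normalizes it
posterior &lt;- numerator / sum(numerator * grid_width)

# Rescaling the numerator by any constant leads to the same posterior
rescaled &lt;- 1000 * numerator
posterior_rescaled &lt;- rescaled / sum(rescaled * grid_width)

all.equal(posterior, posterior_rescaled)  # the constant cancels out
sum(posterior * grid_width)               # area under the normalized curve</code></pre></div>
<p>Any factor that does not involve <img src="https://latex.codecogs.com/png.latex?p"> cancels in the normalization step, which is why only the shape of the numerator matters.</p>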
<section id="define-the-likelihood" class="level3">
<h3 class="anchored" data-anchor-id="define-the-likelihood">Define the likelihood</h3>
<p>Similar to what we saw in the frequentist approach above, the <a href="https://en.wikipedia.org/wiki/Likelihood_function">likelihood</a> of the data in our Bayesian model starts from treating each patient as a <a href="https://en.wikipedia.org/wiki/Bernoulli_distribution">Bernoulli</a> random variable: they either have heart disease or they don’t, with probability <img src="https://latex.codecogs.com/png.latex?p">. Because our observations are assumed to be independent and to share the same <img src="https://latex.codecogs.com/png.latex?p">, the collection of these “coin flips” can be summarized using the <a href="https://en.wikipedia.org/wiki/Binomial_distribution">Binomial distribution</a>:</p>
<p><img src="https://latex.codecogs.com/png.latex?H%7Cn,%20p%20%5Csim%20Binomial(n,p)"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?n"> is the sample size, <img src="https://latex.codecogs.com/png.latex?p"> is the probability of heart disease (the parameter of concern), and <img src="https://latex.codecogs.com/png.latex?H"> is the total number of patients with heart disease in a sample. The <a href="https://en.wikipedia.org/wiki/Probability_mass_function">probability mass function (PMF)</a> for this distribution looks like:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(data%7Cp)%20=%20P(H%7Cn,p)%20=%20%5Cfrac%7Bn!%7D%7BH!(n-H)!%7Dp%5EH(1-p)%5E%7Bn-H%7D"></p>
<p>So for a given sample size and probability, we can compute the likelihood of observing any number of patients with heart disease.</p>
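<p>As a quick illustration of the PMF above, the sketch below evaluates the likelihood of the observed counts from this post (139 of 303 patients) at a few candidate values of <img src="https://latex.codecogs.com/png.latex?p">. The candidate values in <code>p_values</code> are arbitrary choices for illustration, not part of the original analysis.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Observed counts quoted in the post
n &lt;- 303
H &lt;- 139

# A few arbitrary candidate values for p
p_values &lt;- c(0.30, 0.40, 0.46, 0.50, 0.60)

# Likelihood from R's built-in Binomial PMF
lik &lt;- dbinom(H, size = n, prob = p_values)

# The same quantity written out from the formula
lik_manual &lt;- choose(n, H) * p_values^H * (1 - p_values)^(n - H)

all.equal(lik, lik_manual)   # both routes should agree
p_values[which.max(lik)]     # candidate under which the data are most likely</code></pre></div>
<p>The likelihood is largest for candidates near the sample proportion <code>H / n</code> and falls off quickly as <img src="https://latex.codecogs.com/png.latex?p"> moves away from it.</p>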
</section>
<section id="derive-the-posterior" class="level3">
<h3 class="anchored" data-anchor-id="derive-the-posterior">Derive the posterior</h3>
<p>As mentioned, the posterior is derived by taking the prior distribution multiplied by the likelihood function.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bequation%7D%0A%5Cbegin%7Bsplit%7D%0AP(p%7Cdata)%20&amp;%20%5Cpropto%20P(p)P(data%7Cp)%20%5C%5C%0A&amp;%20=%20P(p)P(H%7Cn,%20p)%5C%5C%0A&amp;%20=%20%5Cfrac%7B%5CGamma(%5Calpha+%5Cbeta)%7D%7B%5CGamma(%5Calpha)%5CGamma(%5Cbeta)%7Dp%5E%7B%5Calpha-1%7D(1-p)%5E%7B%5Cbeta-1%7D%20%5Cfrac%7Bn!%7D%7BH!(n-H)!%7Dp%5EH(1-p)%5E%7Bn-H%7D%20%5C%5C%0A&amp;%20%5Cpropto%20p%5E%7B%5Calpha%20-%201%20+%20H%7D(1-p)%5E%7B%5Cbeta%20-%201%20+%20n%20-%20H%7D%0A%5Cend%7Bsplit%7D%0A%5Cend%7Bequation%7D%0A"></p>
<p>It turns out this is just another Beta distribution with a different parameterization (note the * to differentiate from the prior parameter values):</p>
<p><img src="https://latex.codecogs.com/png.latex?p%7Cn,H%20%5Csim%20Beta(%5Calpha%5E*%20=%20%5Calpha%20+%20H,%20%5Cbeta%5E*%20=%20%5Cbeta%20+%20n%20-%20H)"></p>
<p>Notice that we only needed the <a href="https://en.wikipedia.org/wiki/Kernel_(statistics)">kernel</a> to classify this distribution, because that part was dependent on the parameter <img src="https://latex.codecogs.com/png.latex?p">. The rest is just a constant that normalizes it to be a valid probability distribution (as mentioned earlier), meaning it sums (integrates) to 1. Thus, since we know it’s a Beta, we can write out the full posterior PDF:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bequation%7D%0A%5Cbegin%7Bsplit%7D%0Af(p%7C%5Calpha%5E*,%20%5Cbeta%5E*)%20&amp;%20=%20%5Cfrac%7B%5CGamma(%5Calpha%5E*%20+%20%5Cbeta%5E*)%7D%7B%5CGamma(%5Calpha%5E*)%5CGamma(%5Cbeta%5E*)%7Dp%5E%7B%5Calpha%5E*-1%7D(1-p)%5E%7B%5Cbeta%5E*-1%7D%20%5C%5C%0A&amp;%20=%20%5Cfrac%7B%5CGamma(%5Calpha%20+%20%5Cbeta%20+%20n)%7D%7B%5CGamma(%5Calpha%20+%20H)%5CGamma(%5Cbeta%20+%20n%20-%20H)%7Dp%5E%7B%5Calpha%20+%20H-1%7D(1-p)%5E%7B%5Cbeta%20+%20n%20-%20H-1%7D%20%5C%5C%0A%5Cend%7Bsplit%7D%0A%5Cend%7Bequation%7D%0A"></p>
<p>What this tells us is that the posterior distribution is just our initial model, shifted as new data comes in. Also notice the effect of sample size as clearly indicated by the equation: the more data that comes in (i.e., larger <img src="https://latex.codecogs.com/png.latex?n"> and <img src="https://latex.codecogs.com/png.latex?H"> values), the more the prior distribution is drowned out. In other words, we stray from the initial/prior belief only in “proportion” to how much new information is coming in. <em>This</em> is what allows us to have perfectly valid models and estimates, even with a single observation or no data at all.</p>
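<p>The update rule can be sketched in a few lines. The helper names below (<code>update_beta</code>, <code>beta_mean</code>, <code>prior_weight</code>) are hypothetical and only for illustration; the prior parameters and full-sample counts are the ones used in this post, while the single observation is made up.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: Beta prior parameters + Binomial data -> posterior parameters
update_beta &lt;- function(alpha, beta, H, n) {
  c(alpha = alpha + H, beta = beta + n - H)
}

# Hypothetical helper: mean of a Beta distribution
beta_mean &lt;- function(par) unname(par["alpha"] / sum(par))

prior_par  &lt;- c(alpha = 4.5, beta = 7.5)
one_obs    &lt;- update_beta(4.5, 7.5, H = 1, n = 1)      # a single made-up patient
all_obs    &lt;- update_beta(4.5, 7.5, H = 139, n = 303)  # the full sample

# The mean moves a little after one observation, and much more after 303
c(prior = beta_mean(prior_par), one = beta_mean(one_obs), all = beta_mean(all_obs))

# Share of the posterior mean that still comes from the prior
prior_weight &lt;- function(alpha, beta, n) (alpha + beta) / (alpha + beta + n)
w &lt;- prior_weight(4.5, 7.5, 303)

# Posterior mean as a weighted average of the prior mean and the sample proportion
w * beta_mean(prior_par) + (1 - w) * (139 / 303)</code></pre></div>
<p>The weight on the prior shrinks as <img src="https://latex.codecogs.com/png.latex?n"> grows, which is the “drowning out” described above written as arithmetic.</p>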
</section>
<section id="plug-in-the-data" class="level3">
<h3 class="anchored" data-anchor-id="plug-in-the-data">Plug in the data</h3>
<p>The hard part is done. Now all we need to do is plug our data into the posterior distribution. In our sample of 303 patients (<img src="https://latex.codecogs.com/png.latex?n">), we observed 139 patients with heart disease (<img src="https://latex.codecogs.com/png.latex?H">). Plotting the posterior across the range of possible values for <img src="https://latex.codecogs.com/png.latex?p"> looks like this:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set sample stats</span></span>
<span id="cb6-2">H <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HeartDisease <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Yes"</span>)</span>
<span id="cb6-3">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dat)</span>
<span id="cb6-4"></span>
<span id="cb6-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get the density values</span></span>
<span id="cb6-6">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">01</span>)</span>
<span id="cb6-7">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbeta</span>(x, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> H, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">7.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> H)</span>
<span id="cb6-8"></span>
<span id="cb6-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the plot</span></span>
<span id="cb6-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(x, y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb6-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb6-14">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> x,</span>
<span id="cb6-15">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> y</span>
<span id="cb6-16">    )</span>
<span id="cb6-17">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(</span>
<span id="cb6-19">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb6-20">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span></span>
<span id="cb6-21">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb6-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb6-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>),</span>
<span id="cb6-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb6-26">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb6-27">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb6-28">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>()</span>
<span id="cb6-29">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-30">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(</span>
<span id="cb6-31">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p"</span>,</span>
<span id="cb6-32">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent</span>
<span id="cb6-33">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/can-you-have-a-model-without-data/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>After our update using the data set (appended to our prior belief), we now estimate there is a 94% chance that the true value of <img src="https://latex.codecogs.com/png.latex?p"> is between 40% and 50% (taken as the area under the curve between those two points), again with more probability mass towards the center.</p>
</section>
<section id="incremental-updates" class="level3">
<h3 class="anchored" data-anchor-id="incremental-updates">Incremental updates</h3>
<p>To drive the point home, we’ll now revisit how our estimates change when data is added <em>sequentially</em>, and contrast that with what we saw from the frequentist approach above. To do this, we can just evaluate the posterior distribution at incremental chunks of our dataset to see how it changes as more data is added (assuming some sort of chronological structure to the data).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set values</span></span>
<span id="cb7-2">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">01</span>)</span>
<span id="cb7-3">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p =</span> p, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbeta</span>(p, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">7.5</span>))</span>
<span id="cb7-4"></span>
<span id="cb7-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Iterate the grid</span></span>
<span id="cb7-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span>(i <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(n_obs)) {</span>
<span id="cb7-7">  </span>
<span id="cb7-8">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute estimates</span></span>
<span id="cb7-9">  temp_n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> n_obs[i]</span>
<span id="cb7-10">  temp_H<span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HeartDisease[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>temp_n] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Yes"</span>)</span>
<span id="cb7-11">  temp_y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbeta</span>(p, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> temp_H, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">7.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> temp_n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> temp_H)</span>
<span id="cb7-12">  </span>
<span id="cb7-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Store values</span></span>
<span id="cb7-14">  results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-15">    results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(</span>
<span id="cb7-17">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(</span>
<span id="cb7-18">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> temp_n,</span>
<span id="cb7-19">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p =</span> p,</span>
<span id="cb7-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> temp_y</span>
<span id="cb7-21">      )</span>
<span id="cb7-22">    )</span>
<span id="cb7-23">  </span>
<span id="cb7-24">}</span>
<span id="cb7-25"></span>
<span id="cb7-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the plot</span></span>
<span id="cb7-27">results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-29">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb7-30">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb7-31">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> p,</span>
<span id="cb7-32">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> y</span>
<span id="cb7-33">    )</span>
<span id="cb7-34">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb7-35">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"n = "</span>, n) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> forcats<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(y, max)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-36">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(</span>
<span id="cb7-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb7-38">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span></span>
<span id="cb7-39">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-40">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb7-41">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb7-42">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb7-43">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb7-44">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb7-45">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb7-46">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>()</span>
<span id="cb7-47">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-48">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(</span>
<span id="cb7-49">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p"</span>,</span>
<span id="cb7-50">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent</span>
<span id="cb7-51">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/can-you-have-a-model-without-data/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>As before, the red lines mark 40% and 50%; the area under the curve between them is the posterior probability that <img src="https://latex.codecogs.com/png.latex?p"> falls in that range.</p>
<p>First, we actually have an estimate before there is any data collected (when <img src="https://latex.codecogs.com/png.latex?n=0">). As we add a few observations, it only changes a little, but our estimates still retain our prior information about <img src="https://latex.codecogs.com/png.latex?p">. Then as more data is collected, we see the posterior distribution become much more precise in where it estimates <img src="https://latex.codecogs.com/png.latex?p"> to be.</p>
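<p>One consequence worth making concrete: updating one observation at a time lands on exactly the same posterior as updating with all of the data at once, because each posterior simply becomes the next prior. The sketch below uses simulated outcomes (not the heart disease data) and assumes the same Beta(4.5, 7.5) starting point; the variable names are illustrative only.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Simulated 0/1 outcomes, purely for illustration
set.seed(1)
outcomes &lt;- rbinom(50, size = 1, prob = 0.45)

# Batch update: use all observations at once
alpha_batch &lt;- 4.5 + sum(outcomes)
beta_batch  &lt;- 7.5 + length(outcomes) - sum(outcomes)

# Sequential update: yesterday's posterior is today's prior
alpha_seq &lt;- 4.5
beta_seq  &lt;- 7.5
for (y in outcomes) {
  alpha_seq &lt;- alpha_seq + y        # add a "success" if y is 1
  beta_seq  &lt;- beta_seq + (1 - y)   # add a "failure" if y is 0
}

# Both routes give identical posterior parameters
c(alpha_batch == alpha_seq, beta_batch == beta_seq)</code></pre></div>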
<p>This smooth transition from the prior knowledge we build into the model to the information added by the data we collect is one reason why I think Bayesian thinking is better suited for scientific modeling. It is a more natural accumulation of knowledge, erasing the boundaries between what we already know (which should be considered a form of “data” itself) and hard data collected on a spreadsheet. It changes the way you approach the problem: instead of focusing right away on the data, which will be exhausted once it’s used, you focus on conceptualizing the living, breathing <em>model</em> of the world that generated it, and thus allow <em>data</em> to contribute to that model only as you see fit.</p>
  


<!-- -->

</section>
</section>
</section>

 ]]></description>
  <category>Bayesian Statistics</category>
  <guid>https://www.zajichekstats.com/post/can-you-have-a-model-without-data/</guid>
  <pubDate>Tue, 29 Oct 2024 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/can-you-have-a-model-without-data/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>You should have a data science blog</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/you-should-have-a-data-science-blog/</link>
  <description><![CDATA[ 




<p>One thing I would highly recommend to anyone with an interest in statistics/data science, whether you’re a student just starting out or a professional increasing your skills, is to create a blog. In <em>this</em> blog post, we’ll cover some of the benefits of doing so and touch on our preferred workflow to get one up and running (for free).</p>
<section id="benefits" class="level1">
<h1>Benefits</h1>
<section id="software-developmentdeployment" class="level2">
<h2 class="anchored" data-anchor-id="software-developmentdeployment">1. Software development/deployment</h2>
<p>You might think that the logistics/maintenance of a blog sounds cumbersome. You just want to stick to the content. However, learning how to create, develop, maintain, and deploy the blog, regardless of its content, provides in itself exposure to essential skills for data science: software development and deployment. Simply knowing the math or the R/Python commands to run a model is one thing, but if you want to deliver effective analytical solutions, those results probably need to be delivered to the end user in a useful way. Building a blog (website) forces you to learn about things such as <a href="https://www.atlassian.com/git/tutorials/what-is-version-control">version control</a>, <a href="https://www.synopsys.com/glossary/what-is-cicd.html">CI/CD</a>, <a href="https://en.wikipedia.org/wiki/Web_hosting_service">web hosting</a>, <a href="https://www.cloudflare.com/learning/dns/dns-records/">DNS records</a>, <a href="https://www.geeksforgeeks.org/web-development/">web development</a>…the list goes on, which are all components of creating a data science <em>product</em>. Marrying these things to the content creation itself provides you an arsenal of tools that can be exploited in all sorts of contexts.</p>
</section>
<section id="solidify-concepts" class="level2">
<h2 class="anchored" data-anchor-id="solidify-concepts">2. Solidify concepts</h2>
<p>There are a lot of topics to keep track of when you’re learning statistics/data science, and one very effective way to solidify your understanding is to write it out coherently. For example, you may be learning about linear regression but there may be nuances surrounding it that are fuzzy, such as why a <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> parameter is interpreted the way that it is. So, you may opt to write a blog post that goes through this derivation in an applied example with a coherent, structured narrative–by the end, you’ll generally grasp the thing you set out to understand. You just need to start with an outline of the thing you want to understand and then learn it as you write. This is exactly the type of thing I do as well when I want to fully vet my understanding of something, such as in <a href="https://www.zajichekstats.com/post/the-evasive-spline/">this post</a>. Not only does this help with becoming a better writer, which is a great skill in and of itself, but it forces you to articulate a topic thoroughly, as if someone else is going to read it (which they hopefully will!). More importantly, it allows you to have a permanent, easily accessible place to put your work.</p>
</section>
<section id="reference-repository" class="level2">
<h2 class="anchored" data-anchor-id="reference-repository">3. Reference repository</h2>
<p>On a related note of conceptual understanding: your blog posts are now just public web pages on the internet. That means you can use your collection of articles simply as an accessible repository for you to refer back to when you need them. If you took the time to write an article to help yourself solidify a concept (#2), then it was probably (a) tricky enough that over time you may lose your intuition on it occasionally, and (b) important enough that it was worth writing out. So, having a place where you’ve written out your thought process about a topic, in your own words, is an invaluable resource for you to refer back to. I can’t tell you the number of times I’ve referred back to articles I’ve written to remember little things. The beauty of it is that when my curiosity comes at a random part of the day, I know exactly where to look, and I can whip out my phone, remember the thing, and then stop thinking about it.</p>
</section>
<section id="portfolio-to-showcase" class="level2">
<h2 class="anchored" data-anchor-id="portfolio-to-showcase">4. Portfolio to showcase</h2>
<p>The fact that you even have a website is impressive. It shows that you can figure out how to do things, but more importantly, that you have the drive/initiative to create projects related to your craft. The skills in creating and maintaining the website are undoubtedly transferable to the work you’d do in data science. Add in the interesting content you are producing in the blog posts themselves, such as code tutorials, analyses, method exploration, whatever it may be, and you have something that can prove your capabilities and set you apart from other candidates. Having all of that accessible through a simple web URL that you can quickly share is a real advantage: if someone asks you for work samples, you can respond confidently, knowing the work is already done.</p>
</section>
<section id="engaging-in-public-discourse" class="level2">
<h2 class="anchored" data-anchor-id="engaging-in-public-discourse">5. Engaging in public discourse</h2>
<p>Having your own blog means you can write about whatever you’d like, however you want to say it, providing an avenue to contribute your own unique thoughts and perspectives. You may read a book or another article on some data science (or other) topic, and have strong thoughts, questions, or opinions about what was said. Or you just want to dig a little deeper into a certain aspect of it. Or reframe it in a way that is more understandable for yourself. Write it out and share it! Others may find your take interesting (or not, who cares). It only adds thought diversity, and makes you more connected to your field, as a peer.</p>
</section>
</section>
<section id="a-blog-workflow" class="level1">
<h1>A blog workflow</h1>
<p>In short, my preferred approach is based on the <a href="https://quarto.org/">Quarto</a> framework, and you can see a step-by-step tutorial <a href="https://quarto.org/docs/websites/website-blog.html">here</a>. That’s actually how this very website is built/maintained, and you can find its source code <a href="https://github.com/centralstatz/centralstatz">here</a>. I’ll just touch on some of the steps involved:</p>
<section id="install-software" class="level2">
<h2 class="anchored" data-anchor-id="install-software">1. Install software</h2>
<p>I prefer to use <a href="https://posit.co/download/rstudio-desktop/">RStudio</a> as my IDE for developing my websites, so you’ll also need to <a href="https://www.r-project.org/">install R</a>. Then, you can <a href="https://quarto.org/docs/get-started/">install Quarto</a> and you’ll have what you need.</p>
</section>
<section id="store-code-on-github" class="level2">
<h2 class="anchored" data-anchor-id="store-code-on-github">2. Store code on GitHub</h2>
<p>The source code for the blog should be <a href="https://www.atlassian.com/git/tutorials/what-is-version-control">version controlled</a> and stored in a remote repository. I prefer <a href="https://www.github.com/">GitHub</a>. It’s not actually necessary to do this, but it creates a much cleaner workflow and promotes better software development practices, as it retains the full history of your code changes. As you develop your website locally, you’ll commit and push changes as you get to good stopping points, and GitHub will serve as your website’s source of truth.</p>
</section>
<section id="host-your-site-on-netlify" class="level2">
<h2 class="anchored" data-anchor-id="host-your-site-on-netlify">3. Host your site on Netlify</h2>
<p>Your code is on GitHub, but you need a server that will actually host the live application. That is where <a href="https://www.netlify.com/">Netlify</a> comes in, where you can host your website for free. To make it very easy, you can configure Netlify to update your website automatically every time code changes are pushed to GitHub (#2). So when I develop my website (like this very blog post), I just commit and push my changes to GitHub (from RStudio) and that will automatically trigger Netlify to grab changes from the repository and update the live website. By default, your website URL will be something like <code>website.netlify.app</code>, but Netlify also has a great domain management system where you can point your website to a custom URL that you own (which you’ll likely have to pay for). Nevertheless, if you don’t mind the default URL, it is completely free.</p>
</section>
</section>
<section id="get-started" class="level1">
<h1>Get started</h1>
<p>My advice is to get your website workflow up and running first: create the local (default) application, push the code to GitHub, configure Netlify to host the website, and, if you choose, set up your custom domain (again, this last part costs money). Then you can start to make tweaks to your website and make sure everything operates as expected for general maintenance. From there, you can then focus on customizing your website in any way you want, and more importantly, developing the actual content for your blog.</p>

</section>

 ]]></description>
  <category>Deployment</category>
  <category>Learning</category>
  <category>Software Development</category>
  <guid>https://www.zajichekstats.com/post/you-should-have-a-data-science-blog/</guid>
  <pubDate>Wed, 25 Sep 2024 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/you-should-have-a-data-science-blog/feature.png" medium="image" type="image/png" height="145" width="144"/>
</item>
<item>
  <title>Low cost ways to build and deploy analytical web apps</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/low-cost-ways-to-build-and-deploy-apps/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/vttykZnwIbA" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>In the era of <a href="https://en.wikipedia.org/wiki/Artificial_intelligence">artificial intelligence</a> (AI), organizations are often told they must <a href="https://medium.com/@markkloepfel/artificial-intelligence-adopt-or-get-left-behind-6e7387e695b9">buy in or get left behind</a>. I’ve attended various presentations like this. The presenters discuss how things are changing, provide some high-level overview of AI, and then convey that the organization must develop a strategy for it. However, the feeling left in the room is usually that of confusion and ambiguity. They understand that they should be <em>thinking</em> about it, but it’s totally unclear what (if anything) they should actually <em>do</em>.</p>
<p>First, it’s not really clear what is meant by “AI”, as it can mean <a href="https://hai.stanford.edu/sites/default/files/2020-09/AI-Definitions-HAI.pdf">many things</a> (I often don’t know what they mean by it either). Second, the implication seems to be pointed toward AI in the context of tools that can be <em>purchased</em> from vendors, therefore adding a financial stressor to decision makers when figuring out how to act. It also conveys a one-sided message: that AI <em>is</em> products that you buy, never really discussing the technical foundation, and thus forgoing the idea of taking these concepts and building incrementally.</p>
<p>Sure, there are all sorts of obscure ways organizations can use tools like <a href="">ChatGPT</a>, for example. But I would venture to guess that, in the bigger picture, much of the target audience is at a relatively early stage of their data science journey and unaware of how much opportunity is available through low-cost, or free, programming languages and technologies. These can get them well on their way to a mature advanced analytics function without making a direct jump to cost-intensive solutions in hopes that they “work”. There’s a better way.</p>
<p>Data science is all about asking the right questions and figuring out ways to provide answers to those questions to the people who need them at the right time so they can take action. This is somewhat open-ended, but that also means it leaves the door open to an infinite number of ways to develop solutions, many of which can be built (a) by going back to the fundamentals: data quality, data collection, data storage, infrastructure, reporting workflows, etc., and (b) with, at least to start, freely-available software, tools and programming languages. AI is simply hyperbole for data and analytical strategy. There are lots of ways to start using data better, and that doesn’t necessarily mean opening your wallet.</p>
<section id="a-shiny-intro" class="level1">
<h1>A Shiny intro</h1>
<p>One of those tools, which I use almost every day, is <a href="https://shiny.posit.co/">Shiny</a>. It is a <strong>free</strong> toolkit built primarily for the <a href="https://www.r-project.org/">R programming language</a> (also <strong>free</strong>), but has recently been made <a href="https://shiny.posit.co/py/">available in Python</a> as well (another <strong>free</strong> <a href="https://www.python.org/">programming language</a>), that enables you to build completely custom analytical web applications for whatever your purpose may be, whether that is for data exploration, dashboards, reporting, predictive modeling, mapping…the possibilities are endless.</p>
<p>Given these are code-first data science tools, there is of course a learning curve. It will take some effort to figure out. The main advantage is that it is <strong>free</strong>, allowing you to start slow, learn as you go, and progress iteratively. You can start realizing incremental, tangible impact without having to invest tons of money in a product upfront. Additionally, doing this can build the fundamental skills needed to continue maturing the path toward advanced analytics. Costs are incurred only when they make financial sense from a value perspective, because you are in total control of what you are producing. In this article, we provide a brief overview of where you can start building and sharing these applications for <strong>free</strong>.</p>
</section>
<section id="where-can-i-share-my-app" class="level1">
<h1>Where can I share my app?</h1>
<p>This is not a tutorial on how to <em>build</em> Shiny apps, as there are <a href="https://shiny.posit.co/r/articles/">tons of resources</a> available to help you begin doing so. However, a big hurdle is often how to get that app off of your computer and in the hands of others for consumption. I’m here to tell you that even that part can be easily done with no cost.</p>
<p><strong>Bottom line:</strong> <em>There is infrastructure out there that allows you to easily build and share completely custom, powerful web applications for $0</em>.</p>
<p>Thus, if you are an individual, part of a small team, someone in a big organization, a leader, etc., and want to find better ways to interact with your data, you can begin developing and <em>sharing</em> useful applications with people (or the world), all for free. That makes it feasible for anyone to begin experimenting, prototyping, and building with minimal risk. Here are some of the options for doing so:</p>
<section id="locally" class="level2">
<h2 class="anchored" data-anchor-id="locally">1. Locally</h2>
<p>We already said that the primary goal of sharing an application is to get it off of your local machine, but this is still worth mentioning, because it is how you start. Once you install <a href="https://www.r-project.org/">R</a>, <a href="https://posit.co/download/rstudio-desktop/">RStudio</a>, and the <a href="https://shiny.posit.co/r/getstarted/shiny-basics/lesson1/index.html"><code>shiny</code> package</a>, you’ll have everything you need to begin developing and running apps. This is of course a great way to <em>develop</em>, and often the way you would do it regardless, because you can write the code, run the app to see how it works, make changes, test, and repeat.</p>
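<p>As a rough illustration of that write-run-tweak loop, here is a minimal sketch of a self-contained Shiny app. The slider, plot, and object names are made up for this example (this is not the demo app used later in the article), and it only assumes that the <code>shiny</code> package is installed.</p>
<pre><code>library(shiny)

# User interface: one slider input and one plot output (hypothetical example)
ui &lt;- fluidPage(
  titlePanel("Hypothetical demo app"),
  sliderInput("n", "Number of observations", min = 10, max = 500, value = 100),
  plotOutput("histogram")
)

# Server logic: redraw the histogram whenever the slider value changes
server &lt;- function(input, output) {
  output$histogram &lt;- renderPlot({
    hist(rnorm(input$n), main = "Random normal draws", xlab = "Value")
  })
}

# Bundle the UI and server into a single app object
app &lt;- shinyApp(ui = ui, server = server)

# Launch the app locally, but only when working interactively (e.g., in RStudio)
if (interactive()) runApp(app)</code></pre>
<p>Saving something like this as <code>app.R</code> and running it from RStudio is enough to start iterating: change the code, re-run the app, and see the result right away.</p>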
<p>But once it’s finalized and you want to share it, without other infrastructure in place, you would need to send the code files to someone else and have them run it on their own machine, which means they need to have all the software installed as well. This can be a totally legitimate approach in certain cases. It is certainly a starting point, but it’s far from ideal.</p>
<section id="side-note" class="level3">
<h3 class="anchored" data-anchor-id="side-note">Side note</h3>
<p>Another reason I mention local “deployment” is because I actually use this approach quite often. When I am analyzing a dataset, it is extremely useful to be able to spin up an application for my own usage to easily explore certain aspects of it, among many other uses. In this case, the audience is me, and I can make it work however I need it to serve my purposes. You can do this too, and don’t need it to run anywhere but your own computer, yet it is still very useful.</p>
</section>
</section>
<section id="shinyapps" class="level2">
<h2 class="anchored" data-anchor-id="shinyapps">2. shinyapps.io</h2>
<p>The most obvious location to deploy a <em>Shiny</em> app is the place specifically purposed for it: <a href="https://www.shinyapps.io/">shinyapps.io</a>. It is a cloud-based server built and maintained by <a href="https://posit.co/">Posit</a> (who we’ll see come up a lot in this article) that is made for hosting apps built in this framework. It has been around for quite a while, and, as we’ll see, there have been other platforms developed for hosting these apps that may be a better fit. Nevertheless, this is a totally viable option and great way to start deploying things, and they have a free tier that allows you to deploy up to five (5) applications with 25 hours of active usage per month.</p>
<section id="the-basics" class="level3">
<h3 class="anchored" data-anchor-id="the-basics">The Basics</h3>
<p>The concept is simple: develop your app on your local machine (e.g., in RStudio) and then run the <code>rsconnect::deployApp</code> function to deploy it to <a href="https://www.shinyapps.io/">shinyapps.io</a>. You can create an account with your email address or sign up through GitHub (like I did), among other options. Here is an outline of the steps I took to deploy <a href="https://tgzz86-alex0zajichek.shinyapps.io/readmissionriskpool/">this app</a>:</p>
<section id="make-an-application" class="level4">
<h4 class="anchored" data-anchor-id="make-an-application">1. Make an application</h4>
<p>First you just need to code up an app. This could be very simple like the one found <a href="https://shiny.posit.co/r/getstarted/shiny-basics/lesson1/index.html">here</a> to get started.</p>
<p>I’m using an <a href="https://github.com/centralstatz/ExampleApps/tree/main/ReadmissionRiskPool">existing demo app</a> we have on <a href="https://github.com/centralstatz">our GitHub page</a>. From Terminal we can execute (and you can too):</p>
<pre><code>git clone https://github.com/centralstatz/ExampleApps.git</code></pre>
<p>The app we want is located at <code>ExampleApps/ReadmissionRiskPool</code>.</p>
</section>
<section id="configure-connection-to-shinyapps.io" class="level4">
<h4 class="anchored" data-anchor-id="configure-connection-to-shinyapps.io">2. Configure connection to <a href="https://www.shinyapps.io/">shinyapps.io</a></h4>
<p>When you sign up, you’ll get a basic list of instructions for getting started. Assuming you are doing it with R, you need to install the <code>rsconnect</code> R package and register your account information, which includes tokens/secrets that are provided with your <a href="https://www.shinyapps.io/">shinyapps.io</a> account.</p>
<pre><code># Install the package used to connect to shinyapps.io
install.packages("rsconnect")

# Set up account information (copy these values from your shinyapps.io account)
my_user_name &lt;- "&lt;GET_FROM_ACCOUNT&gt;"
my_token &lt;- "&lt;GET_FROM_ACCOUNT&gt;"
my_secret &lt;- "&lt;GET_FROM_ACCOUNT&gt;"

# Register the account on this machine
rsconnect::setAccountInfo(
  name = my_user_name,
  token = my_token,
  secret = my_secret
)</code></pre>
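<p>If the script that registers your account lives in a repository (especially a public one), you probably don’t want the token and secret typed directly into it. One possible approach, sketched below, is to read them from environment variables instead. The variable names (<code>SHINYAPPS_NAME</code>, <code>SHINYAPPS_TOKEN</code>, <code>SHINYAPPS_SECRET</code>) and both helper functions are hypothetical names chosen for this example; they are not something <code>rsconnect</code> or shinyapps.io defines.</p>
<pre><code># Read a required value from an environment variable, failing loudly if it is missing
read_credential &lt;- function(var) {
  value &lt;- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable not set: ", var)
  }
  value
}

# Register the shinyapps.io account using values kept outside of the code
# (the environment variable names are assumptions made for this sketch)
register_account &lt;- function() {
  rsconnect::setAccountInfo(
    name = read_credential("SHINYAPPS_NAME"),
    token = read_credential("SHINYAPPS_TOKEN"),
    secret = read_credential("SHINYAPPS_SECRET")
  )
}</code></pre>
<p>You could define the three variables in your user-level <code>.Renviron</code> file and then call <code>register_account()</code> once on each machine you deploy from.</p>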
</section>
<section id="deploy-the-application" class="level4">
<h4 class="anchored" data-anchor-id="deploy-the-application">3. Deploy the application</h4>
<p>Once your account is configured, you can call <code>rsconnect::deployApp()</code> with the application specified and it will be uploaded to the host server.</p>
<pre><code># Assume we're working in the repo just cloned
rsconnect::deployApp("ReadmissionRiskPool")</code></pre>
<p>And just like that we have a custom, interactive web application <a href="https://tgzz86-alex0zajichek.shinyapps.io/readmissionriskpool/">available to the world</a>.</p>
<p>One thing you might notice is the web URL: <a href="https://tgzz86-alex0zajichek.shinyapps.io/readmissionriskpool/">https://tgzz86-alex0zajichek.shinyapps.io/readmissionriskpool/</a></p>
<p>It is obviously not very nice looking. One way to clean this up would be to have a better username (the first part of it). Then, once you start getting into the <a href="https://www.shinyapps.io/">shinyapps.io</a> paid plans, they allow for custom domains.</p>
<p>Another part of this is security. The app we just deployed is publicly available, which is definitely not always what is desired. Again, with the paid tiers you can begin introducing authentication into the applications for restricted access.</p>
</section>
</section>
</section>
<section id="posit-cloud" class="level2">
<h2 class="anchored" data-anchor-id="posit-cloud">3. Posit Cloud</h2>
<p>This platform takes it up a notch. <a href="https://posit.cloud/">Posit Cloud</a>, developed and maintained by the <a href="https://posit.co/">same company</a>, is not only a place to share applications, but a cloud-based service where you can <em>develop</em> your code as well, so everything is accessed in your web browser. As the <a href="https://posit.cloud/">website describes</a> it is a great platform for collaboration, teaching, etc. in a more all-encompassing environment.</p>
<p>You can organize content into “spaces”, which are separate areas for distinct work streams. Within each space, and at the content item level, you can control settings for who has access, to what, and how much. Each user creates a login to the platform, and is able to access the spaces that they have created or that have been shared with them. The free tier gives you a single shared space (which you can invite other users to) with up to 25 projects/outputs and limited computation usage. There is also infrastructure in place to establish connections to databases, among other things, that make it a feasible home for full-fledged data solutions. The other very important thing to note is that it is not only <a href="https://shiny.posit.co/">Shiny</a> applications that can be deployed here, but all sorts of other analytical content such as <a href="https://rmarkdown.rstudio.com/">R Markdown</a> and <a href="https://quarto.org/">Quarto</a> documents, <a href="https://www.rplumber.io/">APIs</a>, etc. Here is an introductory video for more information straight from the source:</p>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/-fzwm4ZhVQQ" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<section id="how-we-use-posit-cloud" class="level3">
<h3 class="anchored" data-anchor-id="how-we-use-posit-cloud">How we use Posit Cloud</h3>
<p>With the paid plans of <a href="https://posit.cloud/">Posit Cloud</a>, you are allowed an unlimited number of spaces, as well as beefed-up compute. This is what enables us to take advantage of this platform as an offering for client engagements. Each client gets their own private space in which all of the analytical content we are collaborating on lives, and only we (us and the client) have access. Here’s one possible example of how a client engagement could work:</p>
<ol type="1">
<li><p>A client reaches out because they want to build a web application that enables them to interactively explore data related to their customer base on a reactive map, among other functionality.</p></li>
<li><p>We initialize a new space in <a href="https://posit.cloud/">Posit Cloud</a> and invite the client via email. They receive the invitation link that takes them to the space upon login, creating an account if they have not done so already.</p></li>
<li><p>We initialize a new private (or public, if applicable) <a href="https://github.com/">GitHub</a> repository on our <a href="https://github.com/centralstatz">organization’s page</a> to hold all of the application’s source code and its change-tracking history. Optionally, the client can be added as a member with viewing privileges of these files as well for full transparency.</p></li>
<li><p>We initialize a new RStudio project within the workspace sourced from the created GitHub repository. This serves as the development environment for the application.</p></li>
<li><p>The client has a dataset in a large Excel file that we would like to use as the source for the application. It gets uploaded into the application’s project by us after the client sends it in an email (or alternatively, the client goes in and uploads it themselves).</p></li>
<li><p>The app development work begins. We work iteratively with the client to build it in the way that best fits their needs (occasionally pushing code to <a href="https://github.com/">GitHub</a> to track changes). Because they have direct access to the space, they can see it anytime. We can quickly <em>show</em> progress updates, bounce ideas back and forth, test functionality, and answer questions in a timely manner, until it is satisfactory. Then, we deploy the final application, and the client can go into it on-demand and use it as they see fit.</p></li>
<li><p>Over time, the data becomes stale so the client would like it to be updated on a recurring monthly basis. One option would be for them to send a new Excel file each month, which we would use to manually update and re-deploy the application. Instead, we decide to use Posit Cloud’s <a href="https://docs.posit.co/cloud/guide/data/#external-databases">built-in data integration</a> capabilities. So we establish a connection to the SQL database from which the Excel file was sourced, and build the queries directly into the application’s source code so that the application itself is sourced directly from the database. No middleman required.</p></li>
</ol>
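<p>To make the last step a little more concrete, here is a minimal sketch of what “building the queries directly into the application’s source code” could look like with the <code>DBI</code> package. Everything specific here is an assumption made for illustration: the <code>customers</code> table, its columns, and the <code>load_customers()</code> helper are made up, and an in-memory SQLite database (via <code>RSQLite</code>) stands in for the client’s real SQL database so the example can run anywhere.</p>
<pre><code>library(DBI)

# Stand-in connection: an in-memory SQLite database
# (in a real engagement this would point to the client's SQL database)
con &lt;- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table playing the role of the data that used to arrive as an Excel file
dbWriteTable(
  con,
  "customers",
  data.frame(
    customer_id = 1:3,
    state = c("WI", "WI", "MN"),
    stringsAsFactors = FALSE
  )
)

# Query helper the app would call instead of reading a static file
load_customers &lt;- function(con, state) {
  dbGetQuery(
    con,
    "SELECT customer_id, state FROM customers WHERE state = ? ORDER BY customer_id",
    params = list(state)
  )
}

# Pull the current rows for one state, then close the connection
wi_customers &lt;- load_customers(con, "WI")
dbDisconnect(con)</code></pre>
<p>Inside a Shiny app, a helper like this would typically be called from the server function, so each session reads current data from the database rather than from a file that has to be re-sent and re-deployed every month.</p>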
<p>Overall, <a href="https://posit.cloud/">Posit Cloud</a> is a highly recommended tool to use for individuals and/or teams of people to get started with application development and deployment. Or even if you’re a seasoned Shiny developer, the infrastructure you get out of the box is amazing. And again, it is free.</p>
</section>
</section>
<section id="connect-cloud" class="level2">
<h2 class="anchored" data-anchor-id="connect-cloud">4. Connect Cloud</h2>
<p><a href="https://posit.co/">Posit</a> <a href="https://posit.co/blog/introducing-posit-connect-cloud/">just recently</a> launched the <a href="https://docs.posit.co/connect-cloud/user/">Alpha release</a> of their newest platform, called <a href="https://connect.posit.cloud/">Connect Cloud</a>. It is in its early phase, and it’s all about easy deployment. As the <a href="https://connect.posit.cloud/">home page</a> states, there are three (3) steps to get an application up and running with a shareable link that can be accessed from anywhere:</p>
<ol type="1">
<li>Create a new account by authenticating with <a href="https://github.com/">GitHub</a> (so you must have a GitHub account)</li>
<li>Link to a public repository (from GitHub) containing the code for a <a href="https://shiny.posit.co/">Shiny</a> application (or whatever other type of content you’re deploying)</li>
<li>Deploy the application and share the link</li>
</ol>
<p>We did this with the application deployed in #2, and yes, it’s that easy (and, once again, free). The link for the app on <em>this</em> platform is <a href="https://connect.posit.cloud/zajichek/content/01912861-8be7-59e2-215a-cdeffdd549f2">here</a>.</p>
<p>The main thing you have to remember to do before making your final commit to <a href="https://github.com/">GitHub</a>, and subsequently configuring the app connection to the repository, is to create the <code>manifest.json</code> file in your application’s root directory. It captures the details of the environment that <a href="https://connect.posit.cloud/">Connect Cloud</a> needs to (automatically) recreate for the app to run. This can be done with a simple command:</p>
<pre><code>rsconnect::writeManifest()</code></pre>
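<p>Before committing, it can be worth checking that the manifest landed where you expect and glancing at which packages it recorded. Below is a small sketch of one way to do that. The <code>list_manifest_packages()</code> helper is a hypothetical name for this example, it relies on the <code>jsonlite</code> package, and it assumes the manifest stores its dependencies under a top-level <code>packages</code> entry.</p>
<pre><code># Return the names of the packages recorded in an app's manifest.json
# (assumes a top-level "packages" entry in the manifest)
list_manifest_packages &lt;- function(app_dir) {
  manifest_path &lt;- file.path(app_dir, "manifest.json")
  if (!file.exists(manifest_path)) {
    stop("No manifest.json found in: ", app_dir)
  }
  manifest &lt;- jsonlite::fromJSON(manifest_path)
  names(manifest$packages)
}

# Typical use from the root of the cloned repository (not run here):
# rsconnect::writeManifest(appDir = "ReadmissionRiskPool")
# list_manifest_packages("ReadmissionRiskPool")</code></pre>
<p>If a package your app depends on is missing from that list, that is a hint to fix it locally and regenerate the manifest before pushing to GitHub.</p>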
<p>You can watch this video to get a more thorough introduction:</p>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/HWCPLURWYgY" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>We don’t quite yet know where this platform is going to go given that it is in such early stages. Although the infrastructure it already provides, and the ease with which it is done, is magic-like, there are of course a lot of things it <em>doesn’t</em> have (yet): login without GitHub, sourcing apps from private repositories, private content, authentication, security, etc. These were the things I wondered about when I attended the live webinar above, which in turn had me thinking about how exactly it compares with the purpose of <a href="https://posit.cloud/">Posit Cloud</a> (in the <a href="https://www.youtube.com/live/HWCPLURWYgY">video</a>, I explicitly asked this at <a href="https://youtu.be/HWCPLURWYgY?t=1766">29:26</a>). It sounds like these “commercial” considerations are all part of the roadmap, so we are very excited to see where it goes.</p>
<p>Nevertheless, even in its current state, <a href="https://connect.posit.cloud/">Connect Cloud</a> is a highly recommended platform to start using for anyone wanting to build a data science portfolio and deploy <em>public</em> content.</p>
</section>
<section id="shiny-server" class="level2">
<h2 class="anchored" data-anchor-id="shiny-server">5. Shiny Server</h2>
<p>Want total control over the infrastructure? If yes, <a href="https://posit.co/products/open-source/shiny-server/">Shiny Server</a> might be for you.</p>
<p>This is an <em>open-source</em> server configuration that can be installed on-premises (or wherever you’d like). The huge advantage is that it is <em>always</em> free–you just need to install the software. The disadvantage is that you need a server bulky enough to handle the desired compute <em>and</em> you need people who know how to manage it. Compared to the others, this option is like the wild west. It provides you the basic skeleton to get stuff working, but from there you have ultimate freedom to do with it what you wish, which can easily get out of hand as the developer/user bases grow and you’re trying to maintain environments as new software package versions are constantly being released.</p>
<p>I always consider this a fantastic option for relatively large organizations who are early in their data science journeys. They likely have mature information technology (IT) teams supporting existing systems, but have not yet invested heavily in advanced analytics infrastructure. This provides an opportunity to leverage those mature systems admin teams to set up and manage the backend of this infrastructure, which then enables data scientists in the organization to deliver great analytic content (and prove its value) without spending tons of money on acquiring the software. Low risk with huge potential.</p>
<section id="amazon-web-services-aws" class="level3">
<h3 class="anchored" data-anchor-id="amazon-web-services-aws">Amazon Web Services (AWS)</h3>
<p>Despite it being open-source, most individuals (or small companies for that matter) probably don’t have adequate server space to implement <a href="https://posit.co/products/open-source/shiny-server/">Shiny Server</a> in the way they’d like. However, you <em>can</em> do it if you use external compute resources. <a href="https://aws.amazon.com/ec2/">Amazon EC2</a> is one way to make it happen. And still for free.</p>
<p>The way it works is that you spin up an EC2 instance of your choice (essentially a computer, ranging from very low to very high compute power, the <code>t2.micro</code> being the one that you can use for <strong>free</strong>), install <a href="https://posit.co/products/open-source/shiny-server/">Shiny Server</a> on the instance, and then have all the freedom in the world to deploy applications to it and access them on the web. <a href="https://www.charlesbordet.com/en/guide-shiny-aws/">This</a> is an excellent resource that I followed when learning how to do this, and you can see a detailed account of my steps <a href="https://github.com/zajichek/shinydemo-aws/issues/1">here</a> as well.</p>
<p>The really cool thing about this setup is that you can take it wherever you want to go. You have unlimited ability to customize the design of your server pages, integrate it with other software/tools, assign custom domains, etc. The possibilities are endless. When <a href="https://github.com/zajichek/shinydemo-aws/issues/1">I learned how to do this</a>, I was able to quickly get it to a state where my server is live and accessible at a subdomain of one of my websites: <a href="http://apps.zajichekstats.com/">http://apps.zajichekstats.com/</a>. Clicking that link takes you to the home page of my Shiny Server on an AWS EC2 <code>t2.micro</code> instance, where subsequent applications are found at subpages of that (e.g., <a href="http://apps.zajichekstats.com/shinydemo-aws/">http://apps.zajichekstats.com/shinydemo-aws/</a>). This entire setup, including the subdomain assignment, was done for free, and it is barely scratching the surface of what <em>can</em> be done with it.</p>
</section>
</section>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>There are many different ways to start building and sharing data science products for no cost, and this list probably doesn’t come close to covering all of them (e.g., <a href="https://azure.microsoft.com/en-us/products/app-service">Azure App Service</a> is another). So, what are you waiting for? Give them a try!</p>
  


</section>

 ]]></description>
  <category>Web applications</category>
  <category>Deployment</category>
  <guid>https://www.zajichekstats.com/post/low-cost-ways-to-build-and-deploy-apps/</guid>
  <pubDate>Tue, 20 Aug 2024 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/low-cost-ways-to-build-and-deploy-apps/feature.png" medium="image" type="image/png" height="150" width="144"/>
</item>
<item>
  <title>A prediction system for managing the hospital readmission risk pool</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/managing-the-readmission-risk-pool/</link>
  <description><![CDATA[ 




<p>Each year, hospitals across the United States get <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">penalized by CMS</a> for having excess <a href="https://www.healthcare.gov/glossary/hospital-readmissions/">readmissions</a>, with up to 3% of all Medicare reimbursement withheld for an entire fiscal year. Additionally, there are implications from <a href="https://www.uhcprovider.com/content/dam/provider/docs/public/health-plans/medicare/MA-Readmission-Program-Clinical-Guidelines.pdf">commercial payors</a>, <a href="https://www.ncqa.org/hedis/measures/plan-all-cause-readmissions/">quality programs</a>, etc. that make it a focal point of the general <a href="https://www.cms.gov/priorities/innovation/key-concepts/value-based-care#:~:text=What%20is%20value%2Dbased%20care,what%20an%20individual%20values%20most.">value-based care</a> landscape. Not to mention the obvious patient burden (both financially and <a href="https://pubmed.ncbi.nlm.nih.gov/34544571/">psychologically</a>) of being hospitalized twice in a short period of time. Thus, it has become a <a href="https://www.definitivehc.com/blog/top-10-hospital-performance-metrics-you-need-to-know">key area</a> of focus for hospitals in monitoring the overall health of their clinical and financial operations.</p>
<p>In turn, the day-to-day discussion becomes one of strategy: what interventions, processes, and workflows should be put in place to proactively prevent hospital readmissions from occurring? This is not totally straightforward. Do we focus on preventing the <em>readmission</em>, or the initial hospitalization altogether? You can’t have a readmission without an index stay. For which patients? Who needs which resources? There can be conflicting priorities. Hospitals <em>depend</em> on admissions (mostly from <a href="https://www.definitivehc.com/resources/healthcare-insights/breaking-down-us-hospital-payor-mixes">commercial payors</a>) to keep the lights on. Also, programs like the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">Hospital Readmissions Reduction Program (HRRP)</a> (<a href="https://www.cms.gov/">CMS</a>’ penalty program) only apply to Medicare beneficiaries, which is a subset of the overall hospital population (and sometimes the <a href="https://www.aha.org/system/files/media/file/2020/09/fact-sheet-billing-explained-0820.pdf">most unprofitable</a>). Should we just put our resources towards preventing admissions for the unprofitable patients? Probably not. Obviously, we are constrained morally (and probably legally) from giving payor-based, preferential treatment, but this is simply the reality of the things stakeholders need to sift through. In my view, the best you can do to balance things, at least to start, is to be <em>meticulously aware</em> of what is happening–through data.</p>
<p>In this realm, one component of particular interest is using <a href="https://en.wikipedia.org/wiki/Predictive_analytics">predictive analytics</a> to anticipate and intervene on high-risk patients in order to prevent a subsequent hospitalization. Despite there being a <a href="https://pubmed.ncbi.nlm.nih.gov/?term=predicting+readmissions">large body of work</a> by researchers developing creative and innovative approaches for preventing readmissions, the reality is that many hospitals do not leverage all the literature that is available because (a) most of it is just that–research, and it’s difficult to confidently translate and tailor that to an actionable program for any one hospital, and (b) it is simply too difficult to parse and organize <em>because</em> there is so much of it.</p>
<p>So, when hospitals do go down the route of implementing predictive tools for readmissions, they seem to reach for more readily available, yet sub-optimal, modeling frameworks. For example, <a href="https://www.epic.com/">Epic</a> has <a href="https://www.epicshare.org/tips-and-tricks/use-predictive-risk-score-to-reduce-readmission">its own</a> readmission risk predictor native to its EHR platform for this very purpose. The common thing I notice about the tools used in practice (and in the way <a href="https://qualitynet.cms.gov/inpatient/measures/readmission">CMS does it</a>) is the modeling setup: they predict the likelihood of readmission based on the state of the patient at the time of <em>hospital discharge</em>. Yet, it is <a href="https://www.aha.org/system/files/2018-02/11sep-tw-readmissions.pdf">well known</a>, and frankly just common sense, that the actual drivers, the real, preventable reasons, for a readmission are circumstances of the patient <em>after</em> being discharged from the hospital. That’s not to say these scores aren’t <em>correlated</em> with the rates of readmission, or can’t provide a useful marker; but with the complexities of managing a diverse hospital population, it is difficult to figure out what to do with a rapidly stale risk score that only reflects how the patient was when they left the hospital, and ignores everything that happened after.</p>
<p>Thus, a reframing of the modeling problem is in order. And in fact, it’s more than that. In this effort, <em>how</em> the information is displayed, propagated and relayed between clinical teams and leadership is as (if not more) important than the sheer goal of accurate risk estimation (according to statistical metrics). It requires cross-functional involvement (from the get-go), strategic design, attention to nuance and rigor, and a flexible scope in order to tie hospital-wide impact down to the individual patient.</p>
<p>This is my idea of how you might build a tool to effectively manage the readmission risk pool.</p>
<section id="riskpooldefinition" class="level1">
<h1>First, what is the <em>readmission risk pool</em>?</h1>
<p>We’ll go with a very simple definition:</p>
<blockquote class="blockquote">
<p>All patients at risk for hospital readmission at a given point in time.</p>
</blockquote>
<p>Let’s use 30-day readmissions as an example. This would be all patients who were discharged from the hospital in the past 30 days who are still at risk for readmission. Meaning that up until the <em>current</em> point in time (i.e., <em>now</em>), they have not already been readmitted or had an otherwise exclusionary event. These are the patients we can still do something about.</p>
<p>From here on out, when we talk about the <em>risk pool</em>, this is what we mean. You can assume in general we’re talking about 30-day readmissions, although nothing about this framework necessarily restricts us to that.</p>
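<p>To make the definition concrete, here is a minimal sketch of the membership check in Python. The record layout and the function names (<code>in_risk_pool</code>, <code>days_remaining</code>) are made up for illustration, and any other exclusionary events are left out for brevity.</p>

```python
from datetime import date

WINDOW_DAYS = 30  # readmission window of interest (assumed to be 30 days)


def days_remaining(discharge_date, today, window=WINDOW_DAYS):
    """Days the patient has left in the risk pool as of `today`."""
    return window - (today - discharge_date).days


def in_risk_pool(discharge_date, readmit_date, today, window=WINDOW_DAYS):
    """True if the patient was discharged within the window and is not yet readmitted."""
    days_out = (today - discharge_date).days
    readmitted = readmit_date is not None and readmit_date <= today
    return 0 <= days_out < window and not readmitted


# Hypothetical patients: (id, discharge date, readmission date or None)
today = date(2024, 7, 26)
patients = [
    ("A", date(2024, 7, 13), None),               # 13 days out, still at risk
    ("B", date(2024, 6, 20), None),               # 36 days out, window has passed
    ("C", date(2024, 7, 20), date(2024, 7, 24)),  # already readmitted
]
pool = [pid for pid, discharged, readmitted in patients
        if in_risk_pool(discharged, readmitted, today)]
```

<p>Here only patient A ends up in <code>pool</code>, with 17 days remaining.</p>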
</section>
<section id="whathappened" class="level1">
<h1>Step 1: Build a (near) real-time data tool</h1>
<p>This is simply a data problem.</p>
<p>Before we even begin the conversation about <em>prediction</em>, we should exhaust all efforts to optimize the ability to know what <em>has already happened</em> as close to <em>now</em> as we can. That means purposefully designing a tool that encapsulates the full picture of readmissions for the hospital (system), tying the contributions of individual patients back to aggregated hospital metrics, in (near) real-time. We can think of the types of questions that this would facilitate on-demand, up-to-date answers to, from different areas of hospital administration:</p>
<p><strong>Care Management</strong></p>
<ul>
<li>Who is currently in the <em>risk pool</em>?</li>
<li>Has a patient already received, or is going to receive, a certain intervention?</li>
<li>What are the patient’s socioeconomic conditions?</li>
<li>Have they filled their prescriptions?</li>
<li>What are the payor rules for the services they may need?</li>
</ul>
<p><strong>Coding/Billing</strong></p>
<ul>
<li>Which patients contribute towards the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a> and/or other relevant metrics?</li>
<li>What is the readmission risk according to <a href="https://www.cms.gov/">CMS</a>’ (or other programs’) models?</li>
<li>What risk factors from <a href="https://www.cms.gov/">CMS</a> (or other programs) are currently documented (and are there discrepancies)?</li>
</ul>
<p><strong>Executive/Leadership</strong></p>
<ul>
<li>What is the current (estimated) <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a> penalty amount?</li>
<li>What share of patients in the (past and present) risk pool affect readmission penalty?</li>
<li>What is the <em>real-time</em> readmission rate, where has it been, and how will it change under various scenarios with the patients who may affect it?</li>
</ul>
<p>The goal is to make it actionable for all stakeholders involved by giving them the most pertinent information they need when they need to know it, and to do so in a common, representative, well-connected tool so that the lineage is clear and everyone is working from the same source of information. I strongly believe that putting in the cross-functional time, nuance, and rigor needed to design and implement tools like this, which mostly consists of figuring out how to move existing data points to a certain location at the right time, would relieve a lot of disconnect and probably give people most of what they need to make effective, informed decisions in a timely manner. Note that working with <em>existing</em> data points may also mean determining new ways to collect information rather than only sticking with the systems in place: time may be better spent establishing new data collection mechanisms that actually measure the thing of interest than jumping through hoops to force conformity of existing sources.</p>
<section id="prototype" class="level2">
<h2 class="anchored" data-anchor-id="prototype">A prototype</h2>
<p>So what might such a tool look like? Well here’s one possibility (the source code for this is <a href="https://github.com/centralstatz/ExampleApps/tree/main/ReadmissionRiskPool">here</a>):</p>
<p><img src="https://www.zajichekstats.com/post/managing-the-readmission-risk-pool/app1.png" class="img-fluid"></p>
<p>Not too fancy, but it’s a start. The focal point is the interactive map widget. Suppose your service area was the state of Wisconsin. Each dot represents a patient in the risk pool at their home address (i.e., they were recently discharged from the hospital), and the relative size/color of the dot represents the amount of risk. For example, in the popup box, Patient 240 was discharged 13 days ago with a 47% readmission risk, and has 17 days remaining in the risk pool.</p>
<p>It gives a simple and intuitive way to view the current risk pool for the hospital (system). Care managers can use it to identify patients for intervention in real-time, while leadership can use it to quickly get a pulse on the total volume of patients across the system at risk for readmission and how that would affect aggregated metrics over a period of time. Of course, there can be many enhancements to this for better utility, such as functionality to filter patients by disease categories, service lines, discharge location, PCP location (if they have one), whether they have a visit scheduled, payor, who will contribute to the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a> (and who will not)…the list goes on. The goal is to have a readily-available, cohesive tool that can provide day-to-day actionable information for all parties involved.</p>
</section>
<section id="subsequent-impact" class="level2">
<h2 class="anchored" data-anchor-id="subsequent-impact">Subsequent impact</h2>
<p>The effect of this as a starting point is that the jump to <em>predictive</em> analytics becomes much more intentional. You get what you can from understanding what has already happened, which I think is a lot, and then proceed to a more advanced (i.e., predictive) solution that is much more well-defined once processes have been optimized with the current state and its capabilities have reached their limit. Then when you want to add that additional feature, the focus can be on that, making it much more clear for everyone who needs to be involved (e.g., IT, data analytics, clinicians, managers, etc.) what the specific goals are.</p>
<p>For example, if a care management team can already quickly and intuitively answer all the questions they have about an individual or group of patients with respect to things that already occurred in order to manage the risk pool on a day-to-day basis, then naturally the predictive piece only arises when it is actually needed (and predictably, it’s probably related to resource utilization). In a world with unlimited resources, they could just continually intervene on the entire risk pool every day, until they know each patient is not going to be readmitted (and ideally after). But in our world, they’ll be faced with scenarios like:</p>
<ul>
<li>There are 500 patients in the risk pool, but I only have time to intervene on 50 of them. Which ones should I choose?</li>
<li>Which patients are currently at the highest risk of being readmitted?</li>
<li>Patient A was discharged 20 days ago, and patient B was discharged 10 days ago. Which one should I intervene on today?</li>
</ul>
<p>At which point the necessary personnel can be convened, and an enhancement can be implemented for that specific purpose.</p>
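<p>The first two scenarios boil down to ranking the pool by current risk and working from the top. A minimal sketch, assuming each patient already has some risk estimate attached (the <code>risks</code> dictionary and the patient IDs are hypothetical):</p>

```python
import heapq


def top_k_by_risk(risks, k):
    """Return the k patient ids with the highest risk, highest first."""
    return heapq.nlargest(k, risks, key=risks.get)


# Hypothetical risk estimates keyed by patient id
risks = {"p1": 0.10, "p2": 0.47, "p3": 0.05, "p4": 0.30}
worklist = top_k_by_risk(risks, k=2)
```

<p>Ranking on risk alone says nothing about how many days a patient has left in the pool or whether an intervention would actually help, which is exactly what the third scenario is getting at.</p>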
<p>All-in-all, going back to the introduction, this is about being <em>meticulously aware</em>.</p>
</section>
</section>
<section id="whatwillhappen" class="level1">
<h1>Step 2: What is going to happen?</h1>
<p>Now suppose in the tool above, when we click on an individual patient, we want to know what their <em>current</em> risk of readmission is, given everything that has happened since they were discharged. Some patients may only have 5 days left in the risk pool, while others were just recently discharged. Some have had post-discharge clinic visits, some have not. Things have changed since they left the hospital, and we want to find a way to prioritize which ones still need our attention the most. This is where predictive analytics comes in.</p>
<section id="build-the-pipelines" class="level2">
<h2 class="anchored" data-anchor-id="build-the-pipelines">Build the pipelines</h2>
<p>From a technological perspective, we can first consider putting in a random number as a placeholder for the risk. This allows us to build the data pipelines, infrastructure and workflows needed to support the models once we’re ready for the real thing. Some key things to think about:</p>
<ul>
<li>Will it be delivered through an API?</li>
<li>What databases need to be accessed, and when, in order to evaluate the models and produce a prediction?</li>
<li>How will we monitor the model’s accuracy, and have the ability to iterate/update it?</li>
<li>How will administrators and clinical teams interact with it?</li>
<li>Where/how should the number(s) be displayed in the application?</li>
</ul>
<p>Hashing these things out enables a plan to be put in place for the product as a whole, and ensures buy-in from all of the teams that will be needed to use and maintain it.</p>
<p>At this point, it’s still just a systems, logistical, and personnel problem. Once everyone understands <em>how</em> it’s going to work, then the focus (for the data science teams) can transition to the math.</p>
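<p>As a rough illustration of the placeholder idea, the stand-in below returns a random number in the same shape a real model might eventually return. The function name and the response fields are assumptions for the sketch, not part of any existing system.</p>

```python
import random


def placeholder_risk(patient_id, days_since_discharge):
    """Stand-in for the real model: a reproducible random number in [0, 1)."""
    # Seeding on the inputs keeps the number stable between page refreshes.
    rng = random.Random(f"{patient_id}:{days_since_discharge}")
    return {
        "patient_id": patient_id,
        "days_since_discharge": days_since_discharge,
        "risk": rng.random(),
        "model_version": "placeholder",  # hypothetical field for later monitoring
    }


response = placeholder_risk("240", 13)
```

<p>Once the pipelines and displays work end-to-end with this, swapping in a trained model should only change what happens inside the function.</p>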
</section>
<section id="model" class="level2">
<h2 class="anchored" data-anchor-id="model">The model</h2>
<p>We’re going to get a little technical here.</p>
<p>Let</p>
<p><img src="https://latex.codecogs.com/png.latex?D%20=%20%5Ctext%7BReadmission%20duration%20of%20interest%20(e.g.,%2030%20days)%7D"> <img src="https://latex.codecogs.com/png.latex?T%20=%20%5Ctext%7BTime%20since%20discharge%20to%20the%20current%20time%20point%7D"> <img src="https://latex.codecogs.com/png.latex?R%20=%20%5Ctext%7BTime%20that%20the%20patient%20is%20readmitted%20(if%20at%20all)%7D"></p>
<p>Then what we want to estimate is:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20D%20%7C%20R%20%3E%20T)"> In layman’s terms, we’re simply asking this: if, as of <em>now</em>, a patient is still at risk for readmission (i.e., in the risk pool), what is the probability that they will be readmitted in the remaining window of interest, given they have not been readmitted up to this point?</p>
<p>So if a patient was discharged 20 days ago, and they still have not been readmitted, we want to know how likely it is that they will be readmitted in the next 10 days (assuming we care about 30 day readmissions), given what has happened since discharge.</p>
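<p>If it helps, the same quantity can be written in terms of the survival function <em>S</em>(<em>t</em>) = <em>P</em>(<em>R</em> &gt; <em>t</em>), the probability of not yet being readmitted by day <em>t</em>: <em>P</em>(<em>R</em> ≤ <em>D</em> | <em>R</em> &gt; <em>T</em>) = 1 − <em>S</em>(<em>D</em>) / <em>S</em>(<em>T</em>). That identity is just the definition of conditional probability; the numbers in the sketch below are made up.</p>

```python
def conditional_risk(surv_now, surv_end):
    """P(R <= D | R > T), given S(T) = surv_now and S(D) = surv_end."""
    if surv_now <= 0 or not (0 <= surv_end <= surv_now <= 1):
        raise ValueError("need 0 <= S(D) <= S(T) <= 1 and S(T) > 0")
    return 1 - surv_end / surv_now


# Hypothetical: 90% of patients are not readmitted by day 20, 85% by day 30
risk_day_20 = conditional_risk(surv_now=0.90, surv_end=0.85)

# At discharge (T = 0, so S(T) = 1) this is the usual 30-day risk
risk_day_0 = conditional_risk(surv_now=1.0, surv_end=0.85)
```

<p>With these inputs, the remaining 10-day risk for a patient who made it to day 20 is about 5.6%, compared to 15% at discharge.</p>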
</section>
<section id="how-do-we-estimate-it" class="level2">
<h2 class="anchored" data-anchor-id="how-do-we-estimate-it">How do we estimate it?</h2>
<p>Notice that the model is actually quite general, which leaves the door open to <em>many</em> possible ways to structure the data and choose statistical methodologies as you see fit. We basically need some modeling process, however defined, that can produce a predicted probability at an arbitrary point in time, accounting for what has happened up to that point. Intuitively, I think some sort of <a href="https://en.wikipedia.org/wiki/Bayesian_statistics">Bayesian</a> approach would be pretty cool, since you can conceptually think of updating <em>yesterday’s</em> risk with the new information you’ve gathered about the patient up to <em>today</em> (e.g., maybe they just completed an office visit). In that sense the information is more naturally accumulating. But since I haven’t thought that one through yet, I’ll propose a simpler way to start:</p>
<section id="baselinerisk" class="level3">
<h3 class="anchored" data-anchor-id="baselinerisk">1. Start with the baseline risk</h3>
<p>We’re not trying to totally reinvent the wheel here. Those discharge-based risk estimates we talked about in the introduction can still be valuable <em>at that point in time</em>. Since they already account for many patient/clinical characteristics before and during the hospitalization, we might as well use them. Additionally, if a hospital (system) is already using, say, an <a href="https://www.epicshare.org/tips-and-tricks/use-predictive-risk-score-to-reduce-readmission">Epic readmission risk score</a>, this can just be seen as a day-to-day updating of that. However, if you don’t have/want this, this framework will still work.</p>
<p>We just want an estimate of the patient’s risk of readmission when they leave the hospital, however that may be generated. If you want to pretend you know nothing at that time, then just think of starting everyone at the overall average risk.</p>
</section>
<section id="denominator" class="level3">
<h3 class="anchored" data-anchor-id="denominator">2. Define the denominators</h3>
<p>Let’s assume we’ll generate one prediction per day for each patient over the course of the 30 days after they are discharged. So, we’re going to train separate models for each of those days: 1, 2, …, 30. Thus, we need to define the set of patients who were in the risk pool at each of those points in time over whatever period our training data will cover. For example, for the Day 5 model, our denominator will consist of patients who were readmitted <em>after</em> 5 days from discharge or were not readmitted at all. For the Day 20 model, patients could not have been readmitted within the first 20 days post-discharge. And so on. So, these denominators get smaller and smaller for increasing time points.</p>
</section>
<section id="numerator" class="level3">
<h3 class="anchored" data-anchor-id="numerator">3. Define the numerators</h3>
<p>Analogously, we need to indicate whether the patient was in fact readmitted or not. For the Day 5 denominator, we want to indicate if they were readmitted in the following 25 days. For the Day 20 denominator, we want to indicate if they were readmitted in the following 10 days.</p>
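<p>The denominators and numerators described above can be sketched as a small function that builds the training rows for any one of the daily models. The input layout is hypothetical, and the sketch assumes every stay in the training data has complete follow-up through the window.</p>

```python
def landmark_dataset(stays, day, window=30):
    """Rows of (patient_id, readmitted) for the Day-`day` model.

    `stays` is a list of (patient_id, readmit_day) pairs, where readmit_day is
    the number of days from discharge to readmission, or None if never readmitted.
    """
    rows = []
    for patient_id, readmit_day in stays:
        # Denominator: drop anyone already readmitted by `day`
        if readmit_day is not None and readmit_day <= day:
            continue
        # Numerator: readmitted in the remaining (window - day) days
        readmitted = readmit_day is not None and readmit_day <= window
        rows.append((patient_id, int(readmitted)))
    return rows


# Hypothetical discharges with days until readmission
stays = [("a", 3), ("b", 12), ("c", 25), ("d", 45), ("e", None)]
day_5 = landmark_dataset(stays, day=5)
day_20 = landmark_dataset(stays, day=20)
```

<p>Patient “a” drops out of the Day 5 denominator, and both “a” and “b” drop out of the Day 20 denominator, which is the shrinking effect mentioned in the previous step.</p>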
<section id="choosing-an-outcome-distribution" class="level4">
<h4 class="anchored" data-anchor-id="choosing-an-outcome-distribution">Choosing an outcome distribution</h4>
<p>As defined, the numerator sets us up for a <a href="https://en.wikipedia.org/wiki/Binary_classification">binary classification</a> model, like <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>, or other more complex <a href="https://en.wikipedia.org/wiki/Machine_learning">machine learning</a> (ML) algorithms. However, we may alternatively think of keeping track of the actual number of days until the readmission, which conforms more to a <a href="https://en.wikipedia.org/wiki/Survival_analysis">time-to-event</a> model, like <a href="https://en.wikipedia.org/wiki/Proportional_hazards_model">Cox proportional hazards</a> (or an <a href="https://www.randomforestsrc.org/articles/survival.html">ML equivalent</a>). In the latter case, we could choose to <a href="https://en.wikipedia.org/wiki/Censoring_(statistics)">censor</a> patients at the <img src="https://latex.codecogs.com/png.latex?D-T"> time point (e.g., at 30 days post-discharge), or a later time point.</p>
<p>The thing I like about the time-to-event setup is that it allows us to more easily estimate risks for any future time points, like 60 days or 90 days, and doesn’t try to equate a patient who was readmitted at 31 days with a patient who was not readmitted at all, as a binary outcome would. We know practically that these patients are not the same.</p>
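<p>For the time-to-event version, each patient in a given day’s denominator gets a (time, event) pair instead of a 0/1 flag. Here is a minimal sketch, with time counted from the landmark day; the function name is hypothetical, and it assumes the patient is known to be readmission-free through the censoring point when no readmission is recorded.</p>

```python
def landmark_survival_outcome(readmit_day, day, window=30):
    """Return (time, event) measured from landmark `day`, censored at `window`.

    Assumes the patient is in the Day-`day` denominator (not readmitted by `day`).
    """
    remaining = window - day  # D - T
    if readmit_day is None or readmit_day > window:
        return (remaining, 0)  # censored at the end of the window
    return (readmit_day - day, 1)  # readmitted this many days after the landmark


# Day 20 model, censoring at 30 days post-discharge
at_30 = [landmark_survival_outcome(r, day=20) for r in (25, 31, None)]

# Same patients, censoring at 90 days instead
at_90 = [landmark_survival_outcome(r, day=20, window=90) for r in (25, 31, None)]
```

<p>With the 30-day cutoff, the patient readmitted on day 31 looks identical to the one never readmitted; pushing the censoring point out to 90 days separates them.</p>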
</section>
</section>
<section id="append-the-predictors" class="level3">
<h3 class="anchored" data-anchor-id="append-the-predictors">4. Append the predictors</h3>
<p>Now the fun part. At a given point in time, what do we think are the main indicators/events that occur during follow-up that will impact the risk that a patient will ultimately be readmitted? Here I’m mostly focused on interventions, assuming most of the demographics and clinical history are already accounted for in the baseline risk (and we can actually use this risk as a predictor in all of the models too). This might include things like:</p>
<ul>
<li>Did they complete a follow-up visit?</li>
<li>Do they have a visit scheduled?</li>
<li>Did they fill their prescription?</li>
<li>Do they have transportation means?</li>
<li>…the list goes on…</li>
</ul>
<p>We need to define these at time point <img src="https://latex.codecogs.com/png.latex?T"> for each model. Now this list might vary by time point (e.g., if the things impacting readmission risk at Day 1 are different than those at Day 25), by disease (e.g., readmission risk factors for a post-surgical patient are presumably different than a COPD patient), or a host of other things. We might have different models by these different subgroups, so all of these nuances should be discussed as a preparatory step to modeling.</p>
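<p>The practical catch in defining predictors at a time point is making sure each model only sees what was known as of that day. A minimal sketch, assuming a per-patient event log with hypothetical event types:</p>

```python
def features_at(events, day):
    """0/1 flags for what is known about a patient as of landmark `day`.

    `events` is a list of (event_type, event_day) pairs, with event_day counted
    in days since discharge.
    """
    # Ignore anything that happened after the landmark day
    known = {kind for kind, event_day in events if event_day <= day}
    tracked = ("followup_completed", "rx_filled")  # hypothetical event types
    return {kind: int(kind in known) for kind in tracked}


# Hypothetical patient: prescription filled on day 2, follow-up visit on day 9
events = [("rx_filled", 2), ("followup_completed", 9)]
day_5_features = features_at(events, day=5)
day_10_features = features_at(events, day=10)
```

<p>The same patient contributes different predictor values to the Day 5 and Day 10 models, since the follow-up visit hadn’t happened yet on day 5.</p>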
</section>
<section id="calibrate-to-baseline-risk" class="level3">
<h3 class="anchored" data-anchor-id="calibrate-to-baseline-risk">5. Calibrate to baseline risk</h3>
<p>We can think of what we’re constructing as a <em>risk trajectory</em> over follow-up, so if we’re using an already-established baseline risk score at discharge, then each of our subsequent predictions can just be transformed to be relative to that, making a smooth transition from discharge through follow-up.</p>
<p>Now, you can train the models. It would also be a good idea to concurrently set up a model monitoring mechanism, to be able to continually track and evaluate model performance over time, and perform ad-hoc analysis related to the effectiveness of the tool in its intent. Setting up this infrastructure in advance allows agility in the ability to iterate and update models as needed to ensure that they stay relevant.</p>
</section>
</section>
<section id="an-updated-prototype" class="level2">
<h2 class="anchored" data-anchor-id="an-updated-prototype">An updated prototype</h2>
<p>Now suppose we have built our models and integrated them into the tool above, such that each patient has their current risk of readmission as of <em>now</em> attached (the source code for this is <a href="https://github.com/centralstatz/ExampleApps/tree/main/ReadmissionRiskPool">here</a>).</p>
<p><img src="https://www.zajichekstats.com/post/managing-the-readmission-risk-pool/app2.png" class="img-fluid"></p>
<p>For example, in the popup box, Patient 254 was discharged from the hospital 12 days ago with a readmission risk of 14.1%, and has 18 days left in the risk pool. However, we now see that this patient’s <em>current</em> risk of readmission has been reduced to 2.4%, presumably due to the various interventions and events that have taken place since then. Additionally, we can see that, as a group, we expected 37 (7.4%) patients to be readmitted once discharged, but now we only expect 23 (4.6%) to be, indicating that as a hospital (system) our interventions seem to be effectively reducing risk. But we do still have the ability to identify those individuals <em>still currently</em> at high risk, by finding the dots on the map that are largest and closer to red in gradient.</p>
<p>The aggregated expected readmission rate could be a useful metric for hospital leadership to keep a “global” pulse with, being able to see when an influx of readmissions may be coming, but is driven by the individual risk estimates that the front line care teams are working with, so it’s all tied together. The longitudinal look at these metrics would also be useful in the same vein for answering questions like, “How many readmissions are we expecting in the next week?”, which would allow you to promptly allocate extra resources to put out that fire in real-time. Analogously, we could add <em>risk trajectories</em> for individual patients, such that you can view how a patient’s risk has changed each day since discharge, to get a better feel for which interventions helped reduce it (or didn’t have any effect).</p>
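<p>One natural way to get an aggregate like this is to add up the individual risks, since the expected number of readmissions is the sum of each patient’s probability. The sketch below uses made-up risks for four patients and is not necessarily how the prototype computes it.</p>

```python
def expected_readmissions(risks):
    """Expected count and rate of readmissions implied by individual risks."""
    expected = sum(risks)
    return expected, expected / len(risks)


# Hypothetical risks for the same four patients, at discharge and as of now
baseline = [0.141, 0.30, 0.05, 0.02]
current = [0.024, 0.35, 0.03, 0.01]

baseline_count, baseline_rate = expected_readmissions(baseline)
current_count, current_rate = expected_readmissions(current)
```

<p>Because the aggregate is built directly from the patient-level numbers, a change in any one patient’s risk flows straight through to the leadership view.</p>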
<p>Now we have a tool in place to proactively identify high-risk patients in real-time that is optimally integrated into clinical/operational workflows. We also have a real-time pulse on the bigger picture, such as what the overall readmission rate is, how it is changing over time, the expected <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a> penalty amount, etc. Again, although there has been a significant improvement in the overall management of readmissions, these components may reach their limit of intended purpose, and people might start asking questions like: <em>“We can accurately identify <em>who</em> is at highest risk, but what is the best intervention to use?”</em> That’s where the last part comes in.</p>
</section>
</section>
<section id="step-3-what-should-we-do-about-it" class="level1">
<h1>Step 3: What should we do about it?</h1>
<p>If we’ve gotten to this point, there’s probably some good stuff happening, but we can always go further. Readmission reduction doesn’t necessarily have a one-size-fits-all solution, across different hospital systems or even patient populations within a hospital. Process improvement is a big component of how hospitals operate, and it would be beneficial to have the ability to customize and “try out” various interventions to see what does and doesn’t work and on which patients. Thus, we can use experimentation to rapidly test interventions to help inform <em>what</em> should be done for an individual patient. Then, as this data is collected, it can feed back into subsequent iterations of the predictive models to better quantify risk for individual patients.</p>
<p>For example, we may want to know whether a text or a phone call is more effective, or if we should auto-schedule follow-up appointments, or send transportation resources to be able to get patients’ prescriptions filled. In considering these things, we want to carefully design a robust infrastructure and data collection mechanisms to enable teams to rapidly test new ideas, but ensure it feeds seamlessly into the overall readmission management strategy, the central tool, and maintains statistical soundness.</p>
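<p>One simple way to keep such a test statistically sound is to randomize which patients get which intervention. The sketch below is an assumption about how that might be done, using a balanced random assignment between two hypothetical arms.</p>

```python
import random


def assign_arms(patient_ids, arms=("text", "phone"), seed=0):
    """Randomly assign patients to arms, keeping the arm sizes balanced."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    ids = list(patient_ids)
    rng.shuffle(ids)
    # Deal the shuffled patients out to the arms in turn
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


# Hypothetical patients eligible for outreach today
eligible = [f"p{i}" for i in range(10)]
assignment = assign_arms(eligible)
```

<p>Storing the assignment alongside the outcome data is what lets the results feed back into the central tool and later model iterations.</p>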
</section>
<section id="some-considerations" class="level1">
<h1>Some considerations</h1>
<ul>
<li><p>It is important to identify the guiding measures that will objectively indicate if these efforts are paying off. Whether it’s just the overall readmission rate, the <a href="https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp">HRRP</a> penalty amount, or a much more complex calculation of cost/benefit–allow that to guide development.</p></li>
<li><p>Having cross-functional involvement and buy-in is essential. Someone can grab data from a database and build a model on their computer that is really accurate (statistically), but if it doesn’t fit into any clinical/operational workflow or doesn’t have the support it needs to work well in the system, does it really have value? Most of the success of a model in realizing real impact probably has little to do with the math, and mostly to do with the people and implementation.</p></li>
<li><p>Finding ways to leverage previously-established information can provide a head start. If another hospital has developed models that <em>partially</em> work for your patients, can we develop methods such that we <em>augment</em> that with what is needed for our purposes, instead of starting from scratch?</p></li>
</ul>
  


<!-- -->

</section>

 ]]></description>
  <category>Healthcare</category>
  <category>Modeling</category>
  <category>Prediction</category>
  <category>Readmissions</category>
  <guid>https://www.zajichekstats.com/post/managing-the-readmission-risk-pool/</guid>
  <pubDate>Fri, 26 Jul 2024 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/managing-the-readmission-risk-pool/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>How do you assess the proportional-odds assumption? Directly.</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/how-to-assess-the-proportional-odds-assumption/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/9q_0tWT89W4" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>This dataset comes from the <a href="https://data.world/durhamnc/2011-resident-survey">2011 Annual Resident Survey</a> in Durham, NC.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load packages</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Import the dataset</span></span>
<span id="cb1-5">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_delim</span>(</span>
<span id="cb1-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://query.data.world/s/zr3uaxpaagbzddttoreosktj2zy7lm?dws=00000"</span>, </span>
<span id="cb1-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">delim =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">";"</span>,</span>
<span id="cb1-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NA"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N/A"</span>)</span>
<span id="cb1-9">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-10">  </span>
<span id="cb1-11">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep a few columns</span></span>
<span id="cb1-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">transmute</span>(</span>
<span id="cb1-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">QOL =</span> </span>
<span id="cb1-14">      q3f_quality_of_life_in_city <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-15">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-16">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_relevel</span>(</span>
<span id="cb1-17">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Very Dissatisfied"</span>,</span>
<span id="cb1-18">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dissatisfied"</span>,</span>
<span id="cb1-19">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Neutral"</span>,</span>
<span id="cb1-20">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Satisfied"</span>,</span>
<span id="cb1-21">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Very Satisfied"</span></span>
<span id="cb1-22">      ),</span>
<span id="cb1-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_remove</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">18_34_years</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(?i)</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">syears$"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(),</span>
<span id="cb1-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Income =</span> </span>
<span id="cb1-25">      q38_annual_household_income <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-26">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-27">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_relevel</span>(</span>
<span id="cb1-28">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Under $30,000"</span>,</span>
<span id="cb1-29">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"$30,000 to $59,999"</span>,</span>
<span id="cb1-30">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"$60,000 to $99,999"</span>,</span>
<span id="cb1-31">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"$100,000 or more"</span></span>
<span id="cb1-32">      ),</span>
<span id="cb1-33">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sex =</span> q34_respondents_gender</span>
<span id="cb1-34">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-35">  </span>
<span id="cb1-36">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove missing cases</span></span>
<span id="cb1-37">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">na.omit</span>()</span></code></pre></div>
</details>
</div>
<p>Suppose we are interested in understanding the relationship between resident age and their perceived quality of life in the city, after adjusting for gender and annual household income. We have the following observed distribution (note that we’ve removed missing data for simplicity):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb2-2">  </span>
<span id="cb2-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute group summaries</span></span>
<span id="cb2-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb2-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">N =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb2-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> </span>
<span id="cb2-7">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb2-8">        QOL,</span>
<span id="cb2-9">        Age</span>
<span id="cb2-10">      )</span>
<span id="cb2-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb2-12">  </span>
<span id="cb2-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Flip the order</span></span>
<span id="cb2-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb2-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">QOL =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_rev</span>(QOL)</span>
<span id="cb2-16">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-17">  </span>
<span id="cb2-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb2-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb2-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(</span>
<span id="cb2-21">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb2-22">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb2-23">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> N,</span>
<span id="cb2-24">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> QOL</span>
<span id="cb2-25">    ),</span>
<span id="cb2-26">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb2-27">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">75</span></span>
<span id="cb2-28">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-29">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb2-30">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb2-31">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb2-32">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span>,</span>
<span id="cb2-33">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb2-34">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb2-35">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb2-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>),</span>
<span id="cb2-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb2-38">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-39">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Respondent Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-40">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Quality of life in the city"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/how-to-assess-the-proportional-odds-assumption/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Overall, it looks like older respondents tend to report more pessimistic views of quality of life.</p>
<section id="forming-a-model" class="level1">
<h1>Forming a model</h1>
<p>A typical approach for modeling <a href="https://en.wikipedia.org/wiki/Likert_scale">Likert-like</a> data is to use a <a href="https://en.wikipedia.org/wiki/Ordered_logit">proportional-odds logistic regression model</a>. It is an extension of the widely-used binary <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a> model, with one key assumption: <em>the odds ratio between two groups (e.g., ages 35-44 versus 18-34) of being at or above a response level (e.g., satisfied or very satisfied versus neutral, dissatisfied, or very dissatisfied) is <strong>proportional</strong> (i.e., the same) regardless of where we make that comparison in the outcome</em>. So this odds ratio would also be the same if we instead compared, for example, very satisfied versus everything else.</p>
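<p>To make the assumption concrete, here is a minimal sketch in base R. The probabilities and the odds ratio are made-up numbers for illustration (they are not from the survey data), and the names <code>p_ref</code>, <code>common_or</code>, and <code>or_by_cut</code> are hypothetical. It applies one common odds ratio at every cut-point of the outcome and then recovers the odds ratio implied at each cut-point.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical P(at or above each cut-point) for a reference group
p_ref &lt;- c(0.90, 0.75, 0.50, 0.20)

# A single, made-up odds ratio assumed to apply at every cut-point
common_or &lt;- 2

# Convert to odds, scale by the common odds ratio, convert back to probabilities
odds_ref &lt;- p_ref / (1 - p_ref)
odds_cmp &lt;- common_or * odds_ref
p_cmp &lt;- odds_cmp / (1 + odds_cmp)

# The implied odds ratio at each cut-point
or_by_cut &lt;- (p_cmp / (1 - p_cmp)) / (p_ref / (1 - p_ref))
or_by_cut</code></pre></div>
<p>The probabilities themselves differ at every cut-point, but the odds ratio between the two groups does not. That constancy is what the proportional-odds model assumes.</p>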
<section id="what-does-that-mean" class="level2">
<h2 class="anchored" data-anchor-id="what-does-that-mean">What does that mean?</h2>
<p>Let’s clarify this by using our model output directly. We’ll fit the model, adjusted for annual household income and gender, using the <code>MASS::polr</code> function. The age group 18-34 will serve as the reference category against which all other age groups are compared. <em>Note: By default the model coefficients describe the odds of a higher response category, which is the opposite direction to what we want, so we invert the odds ratios.</em></p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb3-2">mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb3-3">  MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">polr</span>(</span>
<span id="cb3-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> QOL <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Income <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Sex,</span>
<span id="cb3-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dat,</span>
<span id="cb3-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Hess =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb3-7">  )</span>
<span id="cb3-8"></span>
<span id="cb3-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table of odds-ratios</span></span>
<span id="cb3-10">mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-11">  </span>
<span id="cb3-12">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to data frame</span></span>
<span id="cb3-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enframe</span>(</span>
<span id="cb3-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Term"</span>,</span>
<span id="cb3-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">value =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimate"</span></span>
<span id="cb3-16">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-17">  </span>
<span id="cb3-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get the CI</span></span>
<span id="cb3-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb3-20">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> </span>
<span id="cb3-21">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get the 95% confidence intervals</span></span>
<span id="cb3-22">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">confint</span>(mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-23">      </span>
<span id="cb3-24">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to tibble, add the coefficient names</span></span>
<span id="cb3-25">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-26">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Term =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients)),</span>
<span id="cb3-27">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Term"</span></span>
<span id="cb3-28">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-29">  </span>
<span id="cb3-30">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter to age factor only</span></span>
<span id="cb3-31">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(Term, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^Age"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-32">  </span>
<span id="cb3-33">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up</span></span>
<span id="cb3-34">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb3-35">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Term =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_remove</span>(Term, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^Age"</span>),</span>
<span id="cb3-36">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(</span>
<span id="cb3-37">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric),</span>
<span id="cb3-38">      \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%.2f"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(x))</span>
<span id="cb3-39">    )</span>
<span id="cb3-40">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-41">  </span>
<span id="cb3-42">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rename</span></span>
<span id="cb3-43">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(</span>
<span id="cb3-44">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age (years)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> Term,</span>
<span id="cb3-45">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Odds-ratio</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> Estimate,</span>
<span id="cb3-46">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Lower =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">97.5 %</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb3-47">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Upper =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">2.5 %</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb3-48">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-49">  </span>
<span id="cb3-50">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Change location</span></span>
<span id="cb3-51">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">relocate</span>(Lower, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.before =</span> Upper) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-52">  </span>
<span id="cb3-53">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the reference row</span></span>
<span id="cb3-54">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_row</span>(</span>
<span id="cb3-55">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age (years)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"18-34"</span>,</span>
<span id="cb3-56">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Odds-ratio</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-"</span>,</span>
<span id="cb3-57">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Lower =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-"</span>,</span>
<span id="cb3-58">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Upper =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-"</span>,</span>
<span id="cb3-59">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.before =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb3-60">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-61">  </span>
<span id="cb3-62">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table</span></span>
<span id="cb3-63">  knitr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kable</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"html"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-64">  kableExtra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kable_styling</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-65">  kableExtra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_header_above</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"95% CI"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div>
</details>
<div class="cell-output-display">
<table class="table caption-top table-sm table-striped small" data-quarto-postprocess="true">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th" style="text-align: left; empty-cells: hide; border-bottom: hidden;"></th>
<th data-quarto-table-cell-role="th" style="text-align: left; empty-cells: hide; border-bottom: hidden;"></th>
<th colspan="2" data-quarto-table-cell-role="th" style="text-align: center; border-bottom: hidden; padding-bottom: 0; padding-left: 3px; padding-right: 3px;"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">
95% CI
</div></th>
</tr>
<tr class="even">
<th style="text-align: left;" data-quarto-table-cell-role="th">Age (years)</th>
<th style="text-align: left;" data-quarto-table-cell-role="th">Odds-ratio</th>
<th style="text-align: left;" data-quarto-table-cell-role="th">Lower</th>
<th style="text-align: left;" data-quarto-table-cell-role="th">Upper</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">18-34</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">-</td>
</tr>
<tr class="even">
<td style="text-align: left;">35-44</td>
<td style="text-align: left;">1.24</td>
<td style="text-align: left;">0.67</td>
<td style="text-align: left;">2.30</td>
</tr>
<tr class="odd">
<td style="text-align: left;">45-54</td>
<td style="text-align: left;">1.13</td>
<td style="text-align: left;">0.64</td>
<td style="text-align: left;">1.98</td>
</tr>
<tr class="even">
<td style="text-align: left;">55-64</td>
<td style="text-align: left;">1.74</td>
<td style="text-align: left;">0.97</td>
<td style="text-align: left;">3.12</td>
</tr>
<tr class="odd">
<td style="text-align: left;">65-74</td>
<td style="text-align: left;">3.02</td>
<td style="text-align: left;">1.40</td>
<td style="text-align: left;">6.54</td>
</tr>
<tr class="even">
<td style="text-align: left;">75+</td>
<td style="text-align: left;">0.93</td>
<td style="text-align: left;">0.30</td>
<td style="text-align: left;">2.88</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Generally, the estimates pan out roughly how we suspected. In particular, the estimated odds of <em>worse</em> perceived quality of life in the city for 65-74 year olds are 3 times that of 18-34 year olds, after adjusting for annual household income and gender (with a 95% <a href="https://en.wikipedia.org/wiki/Confidence_interval">confidence interval</a> of 1.4 to 6.5).</p>
<p>Again, this interpretation is assumed to hold true if “worse” is defined as <em>very dissatisfied</em> versus everything else, or <em>very dissatisfied</em> through <em>satisfied</em> versus <em>very satisfied</em>, and everything in between.</p>
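<p>One detail in the code above is worth spelling out: because the odds ratios are inverted, the confidence limits trade places, which is why <code>Lower</code> is taken from the <code>97.5 %</code> column. A small sketch with hypothetical numbers (not the fitted values from the model) shows the idea:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical log-odds estimate and 95% limits (not from the model above)
est &lt;- -0.5
ci &lt;- c(`2.5 %` = -1.2, `97.5 %` = 0.2)

# Inverting flips the direction of the comparison; same as exp(-est)
or_inv &lt;- 1 / exp(est)

# 1 / exp(x) is decreasing in x, so the limits swap roles
lower &lt;- 1 / exp(ci[["97.5 %"]])
upper &lt;- 1 / exp(ci[["2.5 %"]])

c(OR = or_inv, Lower = lower, Upper = upper)</code></pre></div>
<p>Without the swap, the reported lower limit would end up larger than the upper limit.</p>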
</section>
</section>
<section id="does-the-assumption-hold" class="level1">
<h1>Does the assumption hold?</h1>
<p>The question becomes whether that big assumption of proportional-odds actually holds. We may have reason to think, from the data or gut instinct, that it might not. Well, one simple way to check is by <em>directly</em> assessing what it implies.</p>
<section id="continue" class="level2">
<h2 class="anchored" data-anchor-id="continue">Continue…</h2>
<p>We said earlier that the model assumes the same odds ratios for any mutually exclusive comparison of the ordered response categories. We can relax this constraint by constructing a collection of <em>binary</em> logistic regression models: one for each of those ordinal comparisons. Specifically,</p>
<ul>
<li><em>Very Dissatisfied</em> versus everything else</li>
<li><em>Very Dissatisfied</em> or <em>Dissatisfied</em> versus everything else</li>
<li><em>Very Dissatisfied</em>, <em>Dissatisfied</em>, or <em>Neutral</em> versus <em>Satisfied</em> or <em>Very Satisfied</em></li>
<li><em>Very Dissatisfied</em> through <em>Satisfied</em> versus <em>Very Satisfied</em></li>
</ul>
<p>Then we simply look to see whether the resulting odds ratios are reasonably similar across all of those models. If so, we can be somewhat confident that they <em>can</em> be reduced to a single model, and stick with our original proportional-odds estimates.</p>
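<p>In symbols, here is a rough sketch of what is being compared. The notation is introduced only for illustration and is an assumption, not taken from the model output: <span class="math inline">\(Y\)</span> is the ordered response with categories <span class="math inline">\(1, \dots, J\)</span>, <span class="math inline">\(x\)</span> is the covariate vector, and the sign convention is chosen so that coefficients describe the odds of being <em>at or below</em> a response level (some software parameterizes the model with the opposite sign).</p>
<p><span class="math display">\[
\text{Proportional odds:} \quad \operatorname{logit} P(Y \le j \mid x) = \alpha_j + x^\top \beta, \qquad j = 1, \dots, J - 1
\]</span></p>
<p><span class="math display">\[
\text{Separate binary models:} \quad \operatorname{logit} P(Y \le j \mid x) = \alpha_j + x^\top \beta_j, \qquad j = 1, \dots, J - 1
\]</span></p>
<p>The proportional-odds model is the special case <span class="math inline">\(\beta_1 = \dots = \beta_{J-1} = \beta\)</span>, so the check amounts to seeing whether the odds ratios <span class="math inline">\(\exp(\beta_j)\)</span> from the binary models stay roughly constant across <span class="math inline">\(j\)</span>.</p>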
<p>My preference is to do this in a plot.</p>
</section>
<section id="making-the-plot" class="level2">
<h2 class="anchored" data-anchor-id="making-the-plot">Making the plot</h2>
<p>We’ll cycle through the response categories, iteratively define the binary outcomes as described above, and then fit a logistic regression model for each definition. Once we do this, we get the following plot:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the number of comparisons</span></span>
<span id="cb4-2">n_comp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>QOL) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb4-3"></span>
<span id="cb4-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make each data set</span></span>
<span id="cb4-5"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>n_comp <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-6">  </span>
<span id="cb4-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For each set</span></span>
<span id="cb4-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(</span>
<span id="cb4-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(.index) {</span>
<span id="cb4-10">      </span>
<span id="cb4-11">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the current response set</span></span>
<span id="cb4-12">      temp_resp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">levels</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>QOL)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>.index]</span>
<span id="cb4-13">      </span>
<span id="cb4-14">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create binary outcome in the data</span></span>
<span id="cb4-15">      temp_dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb4-16">        dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-17">        </span>
<span id="cb4-18">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create target</span></span>
<span id="cb4-19">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb4-20">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Response =</span> </span>
<span id="cb4-21">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb4-22">              QOL <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> temp_resp <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb4-23">              <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb4-24">            )</span>
<span id="cb4-25">        )</span>
<span id="cb4-26">      </span>
<span id="cb4-27">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the binary logistic regression model</span></span>
<span id="cb4-28">      temp_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb4-29">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(</span>
<span id="cb4-30">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> Response <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Income <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Sex,</span>
<span id="cb4-31">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> temp_dat,</span>
<span id="cb4-32">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"binomial"</span></span>
<span id="cb4-33">        )</span>
<span id="cb4-34">      </span>
<span id="cb4-35">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table of odds-ratios</span></span>
<span id="cb4-36">      temp_mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-37">        </span>
<span id="cb4-38">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to data frame</span></span>
<span id="cb4-39">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enframe</span>(</span>
<span id="cb4-40">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Term"</span>,</span>
<span id="cb4-41">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">value =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimate"</span></span>
<span id="cb4-42">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-43">        </span>
<span id="cb4-44">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Join to get the CI</span></span>
<span id="cb4-45">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(</span>
<span id="cb4-46">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> </span>
<span id="cb4-47">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get the 95% confidence intervals</span></span>
<span id="cb4-48">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">confint.default</span>(temp_mod) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-49">            </span>
<span id="cb4-50">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to tibble, add the coefficient names</span></span>
<span id="cb4-51">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-52">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Term =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(temp_mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients)),</span>
<span id="cb4-53">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Term"</span></span>
<span id="cb4-54">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-55">        </span>
<span id="cb4-56">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter to age factor only</span></span>
<span id="cb4-57">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(Term, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^Age"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-58">        </span>
<span id="cb4-59">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up</span></span>
<span id="cb4-60">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb4-61">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Term =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_remove</span>(Term, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^Age"</span>),</span>
<span id="cb4-62">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(</span>
<span id="cb4-63">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric),</span>
<span id="cb4-64">            \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(x)</span>
<span id="cb4-65">          )</span>
<span id="cb4-66">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-67">        </span>
<span id="cb4-68">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rename</span></span>
<span id="cb4-69">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(</span>
<span id="cb4-70">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age =</span> Term,</span>
<span id="cb4-71">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OR =</span> Estimate,</span>
<span id="cb4-72">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Lower =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">2.5 %</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb4-73">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Upper =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">97.5 %</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span></span>
<span id="cb4-74">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-75">        </span>
<span id="cb4-76">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the reference row</span></span>
<span id="cb4-77">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_row</span>(</span>
<span id="cb4-78">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"18-34"</span>,</span>
<span id="cb4-79">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OR =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb4-80">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Lower =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb4-81">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Upper =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb4-82">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.before =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb4-83">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-84">        </span>
<span id="cb4-85">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Attach outcome level</span></span>
<span id="cb4-86">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">QOL =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">levels</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>QOL)[.index])</span>
<span id="cb4-87">      </span>
<span id="cb4-88">    },</span>
<span id="cb4-89">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.id =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Order"</span></span>
<span id="cb4-90">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-91">  </span>
<span id="cb4-92">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the factor</span></span>
<span id="cb4-93">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb4-94">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Order =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(Order),</span>
<span id="cb4-95">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">QOL =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(QOL) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(Order)</span>
<span id="cb4-96">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-97">  </span>
<span id="cb4-98">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb4-99">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(</span>
<span id="cb4-100">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-101">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> QOL,</span>
<span id="cb4-102">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> OR,</span>
<span id="cb4-103">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb4-104">    )</span>
<span id="cb4-105">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-106">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb4-107">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-108">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Age</span>
<span id="cb4-109">    ),</span>
<span id="cb4-110">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span></span>
<span id="cb4-111">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-112">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb4-113">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-114">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Age</span>
<span id="cb4-115">    ),</span>
<span id="cb4-116">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb4-117">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-118">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_ribbon</span>(</span>
<span id="cb4-119">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-120">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymin =</span> Lower,</span>
<span id="cb4-121">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymax =</span> Upper,</span>
<span id="cb4-122">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Age</span>
<span id="cb4-123">    ),</span>
<span id="cb4-124">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span></span>
<span id="cb4-125">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-126">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-127">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(Age, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" years"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-128">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_cartesian</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-129">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb4-130">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb4-131">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb4-132">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb4-133">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>),</span>
<span id="cb4-134">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>),</span>
<span id="cb4-135">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">angle =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb4-136">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb4-137">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>)</span>
<span id="cb4-138">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-139">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Response"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-140">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Odds-ratio (95% CI) for being at or below response level"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/how-to-assess-the-proportional-odds-assumption/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Unfortunately we’re quite plagued by variability here, especially at the lower end (i.e., <em>very dissatisfied</em> versus everything else), due to sparse event counts, but you get the picture. Actually, for 65-74 year olds, the proportional-odds assumption seems to be a reasonable one: the odds ratio was estimated earlier at 3.02, and the point estimates across these binary models vary between 2.5 and 3.5.</p>
<p>For other age categories, it may not be as good an assumption. It looks like 35-44 and 55-64 year olds tend to have much higher odds of responding <em>very dissatisfied</em> relative to 18-34 year olds, but there is much less of a difference (in all age categories) in the odds of responding <em>very satisfied</em>. This suggests something like older residents making a point of selecting the least favorable response while not seeing much difference between being <em>satisfied</em> or <em>very satisfied</em>.</p>
</section>
<section id="so-what-do-we-do-in-practice" class="level2">
<h2 class="anchored" data-anchor-id="so-what-do-we-do-in-practice">So what do we do in practice?</h2>
<p>First, the same proportional-odds assumption applies to every covariate in the model, so we would also want to assess it for annual household income and gender. Second, if the assumption is not met, then we need to accommodate that by introducing more flexibility into the model. That may be by being clever with interaction terms, by defining sensible groupings, or by using separate binary models for each possible comparison, as we’ve done here. It’s really a judgement call.</p>
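<p>As a rough sketch of how the same check could be repeated for other covariates, the cumulative binary models can be wrapped in a small helper. Everything below is an assumption made for illustration: the data are simulated, and the names <code>toy</code>, <code>Group</code>, <code>X</code>, and <code>cumulative_or</code> are hypothetical rather than part of the analysis above. It uses base R only.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy data: two covariates and a 5-level ordered response
set.seed(2024)
n &lt;- 500
toy &lt;- data.frame(
  Group = factor(sample(c("A", "B"), size = n, replace = TRUE)),
  X = rnorm(n)
)
latent &lt;- 0.8 * (toy$Group == "B") + 0.5 * toy$X + rlogis(n)
toy$Y &lt;- cut(
  latent,
  breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
  labels = c("Very Dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very Satisfied"),
  ordered_result = TRUE
)

# Odds ratios (with Wald 95% CIs) for one term across every cumulative split
cumulative_or &lt;- function(data, response, covariates, term) {
  lev &lt;- levels(data[[response]])
  rows &lt;- lapply(
    seq_len(length(lev) - 1),
    function(j) {
      # Binary outcome: at or below the j-th response level
      data$at_or_below &lt;- as.integer(data[[response]] %in% lev[seq_len(j)])
      # Fit the binary logistic regression model with all covariates
      fit &lt;- glm(
        formula = reformulate(covariates, response = "at_or_below"),
        data = data,
        family = "binomial"
      )
      # Keep only the coefficients whose names start with the requested term
      keep &lt;- startsWith(names(coef(fit)), term)
      ci &lt;- confint.default(fit)[keep, , drop = FALSE]
      data.frame(
        Split = lev[j],
        Term = names(coef(fit))[keep],
        OR = exp(unname(coef(fit)[keep])),
        Lower = unname(exp(ci[, 1])),
        Upper = unname(exp(ci[, 2]))
      )
    }
  )
  do.call(rbind, rows)
}

# One table per covariate of interest
cumulative_or(toy, response = "Y", covariates = c("Group", "X"), term = "Group")
cumulative_or(toy, response = "Y", covariates = c("Group", "X"), term = "X")</code></pre></div>
<p>Each call returns one row per cumulative split, so the odds ratios for a given covariate can be scanned (or plotted, as was done for age) to judge whether they look roughly constant.</p>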

</section>
</section>

 ]]></description>
  <category>Regression</category>
  <guid>https://www.zajichekstats.com/post/how-to-assess-the-proportional-odds-assumption/</guid>
  <pubDate>Fri, 19 Jul 2024 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/how-to-assess-the-proportional-odds-assumption/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>A simple example why statistical significance is insufficient for action</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/IWWONFhgVY4" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>When we see the phrase <em>statistically significant</em>, we’re often meant to believe that the result matters, but that is not necessarily the case. Here is a simple example of why.</p>
<blockquote class="blockquote">
<p><span style="text-decoration: underline;">Context</span>: <em>Suppose we are trying to home in on a market segment that yields higher sales so that we can develop better strategies for acquiring customers in that group.</em></p>
</blockquote>
<p>We decide to correlate the customer age with the sales amount. Suppose there are two scenarios from a sample of 500 customers:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load packages</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="do" style="color: #5E5E5E;
background-color: null;
font-style: italic;">## Simulate some data</span></span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the seed for reproducibility</span></span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123456789</span>)</span>
<span id="cb1-8"></span>
<span id="cb1-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sample size</span></span>
<span id="cb1-10">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span></span>
<span id="cb1-11"></span>
<span id="cb1-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create the data set</span></span>
<span id="cb1-13">sim_dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb1-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb1-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rgamma</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)),</span>
<span id="cb1-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sales_Large =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (Age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Age)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>),</span>
<span id="cb1-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sales_Small =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (Age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Age)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-18">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-19">  </span>
<span id="cb1-20">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send down the rows</span></span>
<span id="cb1-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb1-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sales"</span>),</span>
<span id="cb1-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Effect"</span>,</span>
<span id="cb1-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sales"</span>,</span>
<span id="cb1-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_prefix =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sales_"</span></span>
<span id="cb1-26">  ) </span>
<span id="cb1-27"></span>
<span id="cb1-28">sim_dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-29">  </span>
<span id="cb1-30">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a paneled scatterplot</span></span>
<span id="cb1-31">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb1-33">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb1-34">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb1-35">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sales,</span>
<span id="cb1-36">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Effect</span>
<span id="cb1-37">    ),</span>
<span id="cb1-38">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,</span>
<span id="cb1-39">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb1-40">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb1-41">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb1-42">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-43">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Effect, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-44">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb1-45">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb1-46">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb1-47">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb1-48">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb1-49">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb1-50">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb1-51">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb1-52">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.spacing.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unit</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lines"</span>)</span>
<span id="cb1-53">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-54">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-55">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sales ($)"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>At a glance these graphs look very similar: in both, age is positively correlated with the sales amount. We fit a <a href="https://en.wikipedia.org/wiki/Linear_regression">linear regression</a> model, or “best-fit line”, to each scenario to summarize and describe the relationship.</p>
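<p>To make the “best-fit line” concrete, here is a minimal sketch (not the code used for the figures) that rebuilds one scenario with base R, following the same recipe as the first code cell, and checks that the closed-form least-squares slope and intercept match what <code>lm()</code> estimates. The object names <code>age</code>, <code>sales</code>, <code>slope</code>, <code>intercept</code> and <code>fit</code> are illustrative assumptions.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Simulate one scenario with base R (same recipe as the first code cell)
set.seed(123456789)
n &lt;- 500
age &lt;- round(18 + rgamma(n = n, shape = 5, scale = 4))
sales &lt;- 500 + 10 * (age - mean(age)) + rnorm(n = n, sd = 100)

# Least-squares slope and intercept from the closed-form formulas
slope &lt;- cov(age, sales) / var(age)
intercept &lt;- mean(sales) - slope * mean(age)

# The same line as estimated by lm()
fit &lt;- lm(sales ~ age)
all.equal(unname(coef(fit)), c(intercept, slope))</code></pre></div>
<p>Both approaches should agree up to floating-point error, which is all that “fitting the line” means here.</p>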
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the models</span></span>
<span id="cb2-2">sim_models <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb2-3">  sim_dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-4">  </span>
<span id="cb2-5">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Nest the data</span></span>
<span id="cb2-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> Effect) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-7">  </span>
<span id="cb2-8">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit a linear model for each data set; get p-values</span></span>
<span id="cb2-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb2-10">    </span>
<span id="cb2-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb2-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> </span>
<span id="cb2-13">      data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-14">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(</span>
<span id="cb2-15">        \(.dat) </span>
<span id="cb2-16">        </span>
<span id="cb2-17">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(</span>
<span id="cb2-18">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> Sales <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Age,</span>
<span id="cb2-19">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> .dat</span>
<span id="cb2-20">        )</span>
<span id="cb2-21">        </span>
<span id="cb2-22">      ),</span>
<span id="cb2-23">    </span>
<span id="cb2-24">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the p-value</span></span>
<span id="cb2-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pvalue =</span> </span>
<span id="cb2-26">      model <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-27">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(</span>
<span id="cb2-28">        \(.model) </span>
<span id="cb2-29">        <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pt</span>(</span>
<span id="cb2-30">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">q =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(.model<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vcov</span>(.model)))), </span>
<span id="cb2-31">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> .model<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>df.residual, </span>
<span id="cb2-32">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lower.tail =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb2-33">        )[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]]</span>
<span id="cb2-34">      )</span>
<span id="cb2-35">  )</span>
<span id="cb2-36"></span>
<span id="cb2-37">sim_dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-38">  </span>
<span id="cb2-39">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a paneled scatterplot</span></span>
<span id="cb2-40">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb2-42">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb2-43">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb2-44">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sales,</span>
<span id="cb2-45">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Effect</span>
<span id="cb2-46">    ),</span>
<span id="cb2-47">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,</span>
<span id="cb2-48">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb2-49">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb2-50">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb2-51">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-52">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(</span>
<span id="cb2-53">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb2-54">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb2-55">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sales</span>
<span id="cb2-56">    ),</span>
<span id="cb2-57">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> y<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>x,</span>
<span id="cb2-58">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>,</span>
<span id="cb2-59">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb2-60">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">se =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb2-61">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-62">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(</span>
<span id="cb2-63">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> </span>
<span id="cb2-64">      sim_models <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-65">      </span>
<span id="cb2-66">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the p-value</span></span>
<span id="cb2-67">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> pvalue) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-68">      </span>
<span id="cb2-69">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up p-value</span></span>
<span id="cb2-70">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb2-71">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pvalue =</span> </span>
<span id="cb2-72">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb2-73">            pvalue <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;0.001"</span>,</span>
<span id="cb2-74">            <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(pvalue, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb2-75">          )</span>
<span id="cb2-76">      ),</span>
<span id="cb2-77">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb2-78">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>,</span>
<span id="cb2-79">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">160</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">496.5</span>),</span>
<span id="cb2-80">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P-value: "</span>, pvalue)</span>
<span id="cb2-81">    )</span>
<span id="cb2-82">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-83">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Effect, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-84">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb2-85">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb2-86">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb2-87">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb2-88">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb2-89">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb2-90">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb2-91">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb2-92">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.spacing.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unit</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lines"</span>)</span>
<span id="cb2-93">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-94">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-95">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sales ($)"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The <a href=""><em>p-values</em></a> for both of these models are extremely, and <em>equally</em>, small (&lt;0.1%), indicating <a href="">statistical significance</a>. In fact, the evidence is so strong that some might say it is <em>very</em> significant, since the p-values are much smaller than the standard (and <a href="">infamous</a>) rule-of-thumb threshold of 5%.</p>
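<p>As a rough sketch of where such a p-value comes from (illustrative only, not the code behind the figures), the slope’s t-statistic is its estimate divided by its standard error, and the two-sided p-value is the tail area of the t distribution beyond that statistic. The example rebuilds one scenario with base R using the same recipe as the first code cell; the object names <code>age</code>, <code>sales</code>, <code>fit</code>, <code>t_stat</code>, <code>p_manual</code> and <code>p_summary</code> are assumptions made for illustration.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Simulate one scenario with base R (same recipe as the first code cell)
set.seed(123456789)
n &lt;- 500
age &lt;- round(18 + rgamma(n = n, shape = 5, scale = 4))
sales &lt;- 500 + 10 * (age - mean(age)) + rnorm(n = n, sd = 100)

# Fit the best-fit line
fit &lt;- lm(sales ~ age)

# t-statistic for the slope: estimate divided by its standard error
t_stat &lt;- coef(fit)[["age"]] / sqrt(diag(vcov(fit)))[["age"]]

# Two-sided p-value: tail area beyond |t| on both sides of the t distribution
p_manual &lt;- 2 * pt(q = abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# The same quantity as reported in the coefficient table of summary()
p_summary &lt;- summary(fit)$coefficients["age", "Pr(&gt;|t|)"]

all.equal(p_manual, p_summary)</code></pre></div>
<p>Nothing in this calculation depends on the units of sales; only the ratio of the estimate to its standard error enters.</p>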
<p><strong>This information alone is often enough to elicit the conclusion/statement/finding that <em>age is significantly associated with sales</em></strong>.</p>
<p>You hear this language all the time, especially in research. It brings with it certain implications of importance, as if it has now become a meaningful fact that should warrant attention and/or action.</p>
<p>The problem?</p>
<p>Watch what happens when we add the actual scales for sales to these graphs (as you may have noticed, they’ve been missing this whole time):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">sim_dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">    </span>
<span id="cb3-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a paneled scatterplot</span></span>
<span id="cb3-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb3-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb3-7">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb3-8">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sales,</span>
<span id="cb3-9">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Effect</span>
<span id="cb3-10">        ),</span>
<span id="cb3-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,</span>
<span id="cb3-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb3-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb3-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb3-15">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(</span>
<span id="cb3-17">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb3-18">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb3-19">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sales</span>
<span id="cb3-20">        ),</span>
<span id="cb3-21">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> y<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>x,</span>
<span id="cb3-22">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>,</span>
<span id="cb3-23">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb3-24">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">se =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb3-25">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-26">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(</span>
<span id="cb3-27">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> </span>
<span id="cb3-28">            sim_models <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-29">            </span>
<span id="cb3-30">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the p-value</span></span>
<span id="cb3-31">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> pvalue) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-32">            </span>
<span id="cb3-33">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up p-value</span></span>
<span id="cb3-34">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb3-35">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pvalue =</span> </span>
<span id="cb3-36">                    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb3-37">                        pvalue <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;0.001"</span>,</span>
<span id="cb3-38">                        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(pvalue, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb3-39">                    )</span>
<span id="cb3-40">            ),</span>
<span id="cb3-41">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb3-42">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>,</span>
<span id="cb3-43">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">160</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">496.5</span>),</span>
<span id="cb3-44">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P-value: "</span>, pvalue)</span>
<span id="cb3-45">        )</span>
<span id="cb3-46">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-47">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Effect, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-48">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb3-49">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb3-50">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb3-51">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb3-52">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb3-53">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb3-54">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.spacing.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unit</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lines"</span>)</span>
<span id="cb3-55">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-56">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-57">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(</span>
<span id="cb3-58">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sales ($)"</span>,</span>
<span id="cb3-59">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> \(x) scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dollar</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">accuracy =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb3-60">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>In the left panel, sales range from approximately $200 to $1,000 per customer, increasing with age. In the right panel, they range from about $497 to $503, a difference of only a few dollars. See the issue yet? To solidify this, let’s look at the graphs when they are on the <em>same</em> scale:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">sim_dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">    </span>
<span id="cb4-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a paneled scatterplot</span></span>
<span id="cb4-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb4-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-7">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb4-8">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sales,</span>
<span id="cb4-9">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Effect</span>
<span id="cb4-10">        ),</span>
<span id="cb4-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,</span>
<span id="cb4-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb4-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb4-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb4-15">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(</span>
<span id="cb4-17">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-18">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age,</span>
<span id="cb4-19">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sales</span>
<span id="cb4-20">        ),</span>
<span id="cb4-21">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> y<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>x,</span>
<span id="cb4-22">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>,</span>
<span id="cb4-23">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb4-24">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">se =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb4-25">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-26">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(</span>
<span id="cb4-27">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> </span>
<span id="cb4-28">            sim_models <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-29">            </span>
<span id="cb4-30">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the p-value</span></span>
<span id="cb4-31">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> pvalue) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-32">            </span>
<span id="cb4-33">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up p-value</span></span>
<span id="cb4-34">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb4-35">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pvalue =</span> </span>
<span id="cb4-36">                    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb4-37">                        pvalue <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;0.001"</span>,</span>
<span id="cb4-38">                        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(pvalue, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb4-39">                    )</span>
<span id="cb4-40">            ),</span>
<span id="cb4-41">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-42">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>,</span>
<span id="cb4-43">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">160</span>,</span>
<span id="cb4-44">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P-value: "</span>, pvalue)</span>
<span id="cb4-45">        )</span>
<span id="cb4-46">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-47">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Effect) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-48">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb4-49">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb4-50">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb4-51">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb4-52">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb4-53">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>),</span>
<span id="cb4-54">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.spacing.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unit</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lines"</span>)</span>
<span id="cb4-55">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-56">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-57">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(</span>
<span id="cb4-58">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sales ($)"</span>,</span>
<span id="cb4-59">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> \(x) scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dollar</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">accuracy =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb4-60">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The line in the right panel is basically flat.</p>
<p>It turns out that although these two graphs have the <strong>same</strong> amount of <em>statistical</em> significance, they clearly tell very different stories about how age relates to sales. The fitted lines can be summarized as follows:</p>
<ul>
<li><p><span style="text-decoration: underline;">Left panel</span>: Average sales increase by $104.55 for every 10-year increase in customer age.</p></li>
<li><p><span style="text-decoration: underline;">Right panel</span>: Average sales increase by $0.94 for every 10-year increase in customer age.</p></li>
</ul>
<p>In the context of performing market segmentation for increased revenue (or whatever else it may be), these magnitudes certainly matter. As far as statistical significance is concerned, they don’t: the test alone cannot tell the two panels apart.</p>
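<p>To see how the same significance can sit on top of very different magnitudes, here is a minimal sketch. It is not the simulation behind the figures above: the seed, sample size, coefficients, and the helper <code>summarize_fit</code> are all assumptions made for illustration. The second outcome is just a rescaled copy of the first, so the slope shrinks by a factor of 100 while the test statistic does not change.</p>

```r
# Hypothetical simulation: none of these values come from the post's data
set.seed(123)
n <- 200
age <- runif(n, min = 20, max = 80)
noise <- rnorm(n)

# The second outcome is a rescaled copy of the first around $500,
# so the slope shrinks by a factor of 100 while the t-statistic stays the same
sales_large <- 500 + 10 * (age - 50) + 100 * noise
sales_small <- 500 + 0.01 * (sales_large - 500)

# Helper (hypothetical name): slope per 10 years of age, t-statistic, and p-value
summarize_fit <- function(y, x) {
  est <- coef(summary(lm(y ~ x)))["x", ]
  c(
    slope_per_10_years = unname(est["Estimate"]) * 10,
    t_statistic = unname(est["t value"]),
    p_value = unname(est["Pr(>|t|)"])
  )
}

res_large <- summarize_fit(sales_large, age)
res_small <- summarize_fit(sales_small, age)

# Same significance, very different magnitudes
rbind(large = res_large, small = res_small)
```

<p>Printing the two rows side by side makes the point: the p-value column matches, and only the slope column says anything about whether the relationship is worth acting on.</p>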
<section id="some-takeaways" class="level2">
<h2 class="anchored" data-anchor-id="some-takeaways">Some takeaways</h2>
<ol type="1">
<li><p>Statistical significance only pertains to the <em>existence</em> of a relationship (under the implied assumptions), not the size of it.</p></li>
<li><p>You must pay attention to the <em>magnitude</em> of the relationship to gain any meaningful insight.</p></li>
<li><p>The magnitudes should be translated to the <em>real-world</em> implications of using the information for different decisions or courses of action. In the example above, the market age distribution looks like this:</p></li>
</ol>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">sim_dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  </span>
<span id="cb5-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a histogram of customer age</span></span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(</span>
<span id="cb5-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb5-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Age</span>
<span id="cb5-8">    ),</span>
<span id="cb5-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>,</span>
<span id="cb5-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb5-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb5-12">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb5-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb5-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb5-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb5-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>),</span>
<span id="cb5-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)</span>
<span id="cb5-19">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer Age (years)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customers"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Although older customers yield higher sales, it may cost more marketing dollars to acquire any given individual from such a small segment, ultimately making the juice not worth the squeeze. Feeding estimates into cost-benefit or what-if scenarios can greatly increase confidence in how a proposed course of action would actually play out, instead of implementing things on the basis of the mere existence of a relationship (i.e., statistical significance).</p>
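<p>A what-if scenario like this can be sketched in a few lines. Only the slope ($104.55 per 10 years) comes from the left panel above. The baseline age, baseline sales, segment ages, and acquisition costs are hypothetical placeholders that you would replace with your own estimates.</p>

```r
# Slope reported for the left panel, converted to dollars per year of age
slope_per_year <- 104.55 / 10

# Hypothetical baseline: assumed average sales at an assumed reference age
baseline_age <- 20
baseline_sales <- 200

# Hypothetical segments and acquisition costs (placeholders, not real data)
what_if <- data.frame(
  segment = c("younger", "older"),
  age = c(30, 70),
  acquisition_cost = c(50, 600)
)

# Expected sales along the fitted line, then net of acquisition cost
what_if$expected_sales <-
  baseline_sales + slope_per_year * (what_if$age - baseline_age)
what_if$net_value <- what_if$expected_sales - what_if$acquisition_cost

what_if
```

<p>Under these made-up costs, the older segment has higher expected sales but a lower net value, which is exactly the kind of trade-off a p-value cannot reveal.</p>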
    <bluesky-comments post="at://did:plc:sh3av73hihgu72vx7k44kgv7/app.bsky.feed.post/3lc73zfajos2t" config="{}"></bluesky-comments>
  


<!-- -->

</section>

 ]]></description>
  <category>Statistical Significance</category>
  <category>Decision Making</category>
  <guid>https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/</guid>
  <pubDate>Wed, 19 Jun 2024 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/simple-example-why-statistical-significance-is-insufficient/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>5 ways to help ensure success of a statistical project</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/ways-to-ensure-success-of-statistical-project/</link>
  <description><![CDATA[ 




<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/blO16Uuo68E" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>Sometimes stats projects don’t go as planned. There are delays, setbacks, surprises, ambiguity, scope creep…the list goes on. All of these things can leave you with a longer list of questions than you started with: <em>Did we answer the research question? Are we confident in the result? What do we do now?</em> The result is a sense of dissatisfaction.</p>
<p>Many of these issues stem from the early phases of the project, and (maybe I should collect data on this) they can largely be alleviated when more care is taken at that stage. Actually, the <a href="https://theeffectbook.net/">book I’m currently reading</a> summed it up perfectly: <em>“Well-designed research is research capable of answering the question it’s trying to answer.”</em> Sounds obvious, but it is undoubtedly true, even for projects outside of what you may define as “research”. This is about proper planning. So here are 5 things that can help increase the likelihood of a successful statistical project:</p>
<section id="assembleteam" class="level1">
<h1>1. Assemble your team…early</h1>
<p>It’s a common misconception that the statistician’s role is simply to analyze, or “run the tests on”, the data at the end once it is collected. This is far from optimal. Statistical analysis is not systematic or mechanical. Rather, it requires knowledge and intuition about the subject matter context. Add in a lack of transparency into the data collection itself, and the chances of lost insight definitely increase. In fact, a few poor design choices might add huge complexities to answering the original question, or even make it impossible altogether. So, if you have a project idea, consult your statistician! Early and often during the development phase.</p>
<p>Now, data people are certainly not the only ones you need involved early. Far from it. Who better to help shape the final product than the end-users—the people who will actually be using the information <em>and</em> know what works? If you want to integrate models into your operational workflow, what sort of resource or technical constraints may there be? Well, we probably have to talk to systems and IT people who will also likely need to commit their own resources for upkeep. And it comes full circle, because all of these nuances may even affect the statistical choices made from a mathematical perspective (i.e., design, modeling framework, etc.). There are plenty more important roles that could be described here, but the bottom line is that cross-functional collaboration, from the beginning, is crucial.</p>
</section>
<section id="make-the-goals-clear-then-plan-accordingly" class="level1">
<h1>2. Make the goals clear, then plan accordingly</h1>
<p>There is a common issue in data projects of ambiguous or non-specific objectives. Statistics is inherently “gray”, by definition, because there is no right answer. Uncertainty always exists. So unless you already know exactly what you’re looking for in the data, without a clearly defined goal, you can find yourself spinning in circles and never know when what you’ve done is sufficient to move on.</p>
<p>My recommendation is to have multiple levels: the <em>statistical</em> goals and the real-world goals that they are supporting. The statistical goals should be stated as clear, specific questions with quantitative answers that data (among other things) will be used to estimate. However, we have to think about how these statistical quantities will be <em>used</em> afterwards. For example, the approach taken to estimate the likelihood of a customer buying a product (a statistical goal) may differ if we’re trying to decrease costs versus increase retention, especially when thinking about the solution as a whole. The question comes down to what we are trying to accomplish with the new information. Once that is clear, we can envision the roadmap for how it will be used, which can be an anchor for developing the right methodology, making analytic decisions easier to manage.</p>
</section>
<section id="think-about-taking-action" class="level1">
<h1>3. Think about taking action</h1>
<p>Once you obtain the new information you set out to find, what are you going to do about it? Under what circumstances? Based on which results? Having some inkling as to what it is going to enable (or disable) someone to do–not just in general, but a specific example–adds clarity to the practical implications of investing the time and money into finding the answers. These things can unravel many of the nuances that were originally an oversight, and may end up causing changes to how the information gets disseminated, who gets involved and when, or even the math itself. All in all, it allows for more proper design at the beginning, a reduction of wasted time/resources, and a better chance of finding the right solution. The reason we perform statistical analysis is (or should be) to inform some action or decision. If it doesn’t, then you may need to think about why that is and adjust. My favorite way to frame this to someone is by asking a simple question: <em>“If you knew X at time Y then you could do Z. What are X, Y and Z?”</em>.</p>
</section>
<section id="create-a-tangible-product" class="level1">
<h1>4. Create a tangible product</h1>
<p>It may sound trivial, but it’s worth thinking about (and even explicitly defining). By what means will the result or solution be delivered? To whom? When? How? Is it going to be a comprehensive report or just a number sent in an email? It might even be a model deployed in the organization’s systems and workflows, or an application hosted on the web. These tell you all sorts of things about who should get involved (see #1) and what it will take to get there. It is about ensuring that the right information gets to the right people at the right time in an expected and predictable manner. It can be the case that a data project fails not because of bad statistics or models, but because it wasn’t disseminated optimally. Having a tangible end-product that you can work towards keeps everyone’s eye on the ball, exposes where the problems are, helps you plan deliverables, create milestones, and makes it clear when you are veering off course.</p>
</section>
<section id="answer-the-question-did-it-work" class="level1">
<h1>5. Answer the question, “did it work?”</h1>
<p>Oftentimes when you look at a statistic like a p-value, it leaves an empty feeling like you haven’t been convinced. That’s because it does not yet directly translate to the real-world impact the new insight is supposed to address. Maybe it’s useful during data analysis, but we should be thinking beyond the inferences made from the sample at hand to what we need to see to convince us that the results really matter. If the information we’ve garnered is actually useful, then we should expect improvements to play out where the rubber meets the road once it is utilized. The most direct way to do this: test it.</p>
  


</section>

 ]]></description>
  <category>Project Management</category>
  <guid>https://www.zajichekstats.com/post/ways-to-ensure-success-of-statistical-project/</guid>
  <pubDate>Thu, 16 May 2024 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/ways-to-ensure-success-of-statistical-project/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Quantum entanglement from a statistician’s perspective</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/quantum-entanglement-from-statistical-perspective/</link>
  <description><![CDATA[ 




<p><em>Quantum entanglement</em> is an intimidating phrase to encounter when you barely know what <em>quantum</em> means (and maybe it is even if you do). My daughter’s book, <a href="https://www.amazon.com/Quantum-Entanglement-Babies-Baby-University/dp/1492656232"><em>Quantum Entanglement for Babies</em></a>, also does a good job of keeping the mystery alive:</p>
<p><img src="https://www.zajichekstats.com/post/quantum-entanglement-from-statistical-perspective/QEPage.png" class="img-fluid"></p>
<p>Now I’ve just barely scratched the surface in quantum computing (and I mean <em>barely</em>, like I’ve gotten so far as to understand how to build a circuit to add two bits together. Yes, 1 + 1 = 2). But as I was going through the section on quantum entanglement in <a href="https://github.com/Qiskit/textbook/tree/main/notebooks/intro">this tutorial</a>, I immediately noticed something familiar that it was getting at (albeit in an unfamiliar, roundabout way). And that was <em>statistical independence</em>.</p>
<section id="some-background" class="level1">
<h1>Some background</h1>
<p>We can represent the state of <a href="https://en.wikipedia.org/wiki/Qubit"><em>qubits</em></a> (like a <a href="https://en.wikipedia.org/wiki/Bit"><em>bit</em></a>, but in quantum), at a given point in time, as <em>state vectors</em>, which (loosely) correspond to the probability they will be measured in a particular state.</p>
<p>For example, suppose we have a qubit, <img src="https://latex.codecogs.com/png.latex?q_0">, that has the following state vector:</p>
<p><img src="https://latex.codecogs.com/png.latex?q_0%20=%20%7C0%5Crangle%20=%20%5Cleft%5B%5Cbegin%7Barray%7D%7Bc%7D%201%20%5C%5C%200%20%5C%5C%20%5Cend%7Barray%7D%5Cright%5D"></p>
<p>The <em>positions</em> of the vector represent the possible states the qubit can be in. Namely, since it’s basically just a <a href="https://en.wikipedia.org/wiki/Bit">bit</a>, 0 (position 1) or 1 (position 2). The <em>entries</em> in the vector represent (again, loosely) the probability that the qubit will take on that state when measured. So in this example,</p>
<p><img src="https://latex.codecogs.com/png.latex?P(q_0%20=%200)%20=%201%20%5Chskip.1in%20P(q_0=1)=0"> It will <em>always</em> be measured in the 0 state.</p>
<p>Now suppose we introduce another qubit, <img src="https://latex.codecogs.com/png.latex?q_1">. And remember, computers just store information as sequences of bits. This qubit can also only be measured in states 0 or 1. Thus, the possible <a href="https://en.wikipedia.org/wiki/Binary_code">bit strings</a> are:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th><img src="https://latex.codecogs.com/png.latex?q_1q_0"></th>
<th>Represents the number…</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>00</td>
<td><img src="https://latex.codecogs.com/png.latex?(0%5Ctimes2%5E1)%20+%20(0%20%5Ctimes%202%5E0)%20=%200"></td>
</tr>
<tr class="even">
<td>01</td>
<td><img src="https://latex.codecogs.com/png.latex?(0%5Ctimes2%5E1)%20+%20(1%20%5Ctimes%202%5E0)%20=%201"></td>
</tr>
<tr class="odd">
<td>10</td>
<td><img src="https://latex.codecogs.com/png.latex?(1%5Ctimes2%5E1)%20+%20(0%20%5Ctimes%202%5E0)%20=%202"></td>
</tr>
<tr class="even">
<td>11</td>
<td><img src="https://latex.codecogs.com/png.latex?(1%5Ctimes2%5E1)%20+%20(1%20%5Ctimes%202%5E0)%20=%203"></td>
</tr>
</tbody>
</table>
<p>So one possible two-qubit state vector is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%7C01%5Crangle%20=%20%5Cleft%5B%5Cbegin%7Barray%7D%7Bc%7D%200%20%5C%5C%201%20%5C%5C%200%20%5C%5C%200%20%5C%5C%20%5Cend%7Barray%7D%5Cright%5D"></p>
<p>where, again, the <em>positions</em> represent the possible sequences of qubits (00, 01, 10, 11; there will always be <img src="https://latex.codecogs.com/png.latex?2%5En"> possible states, where <img src="https://latex.codecogs.com/png.latex?n"> is the number of qubits), and the entries (for the third time, loosely) represent the probability of measuring that sequence. In this case,</p>
<p><img src="https://latex.codecogs.com/png.latex?P(q_0%20=%201%20%5Ccap%20q_1%20=%200)%20=%201;%20%5Chskip.1in%20P(%5Ctext%7Bother%20combos%7D)%20=%200"> So now we can imagine the more interesting case where more than one entry is non-zero, that is, multiple different states have a positive probability of being measured. Given it still has to lead to a valid probability distribution, this means that the 100% must be distributed amongst the possibilities.</p>
<p>The final thing I’ll leave here is that the entries actually represent the <em>square root</em> of the probability, which is why I’ve been emphasizing probability “loosely”. So the “valid probability distribution” constraint applies to the <em>square</em> of the vector entries. In the first example above, a more complete way to write this would be:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(q_0%20=%200)%20=%201%5E2%20=%201%20%5Chskip.1in%20P(q_0=1)%20=%200%5E2%20=%200"></p>
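<p>To make the “square the entries” rule concrete, here is a minimal Python sketch (standard library only). The function name <code>measurement_probabilities</code> and the uneven two-qubit state are my own illustrative assumptions, not something taken from the tutorial; bit strings follow the q1q0 ordering from the table above.</p>

```python
import math


def measurement_probabilities(state):
    """Map each bit string (q1q0 ordering) to its squared amplitude."""
    n_qubits = int(math.log2(len(state)))
    return {
        format(index, f"0{n_qubits}b"): abs(amplitude) ** 2
        for index, amplitude in enumerate(state)
    }


# The state |01> from the text: all of the probability sits on "01".
ket_01 = [0.0, 1.0, 0.0, 0.0]

# Hypothetical state (an assumption for illustration): the entries are
# square roots of 0.25 and 0.75, so the squares form a valid distribution.
uneven = [math.sqrt(0.25), math.sqrt(0.75), 0.0, 0.0]

for name, state in [("|01>", ket_01), ("uneven", uneven)]:
    probs = measurement_probabilities(state)
    print(name, {bits: round(p, 3) for bits, p in probs.items()})
    print("  total probability:", round(sum(probs.values()), 3))
```

<p>Using <code>abs()</code> before squaring is only a precaution in case the entries are negative or complex; every example in this post uses real, non-negative entries.</p>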
</section>
<section id="what-is-entanglement" class="level1">
<h1>What is entanglement?</h1>
<p>The <a href="https://github.com/Qiskit/textbook/blob/main/notebooks/intro/entangled-states.ipynb">tutorial</a> has us consider a couple of two-qubit state vectors:</p>
<p><img src="https://latex.codecogs.com/png.latex?%7C%5CPhi%5E+%5Crangle%20=%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5Cleft%5B%5Cbegin%7Barray%7D%7Bc%7D%201%20%5C%5C%200%20%5C%5C%200%20%5C%5C%201%20%5C%5C%20%5Cend%7Barray%7D%5Cright%5D%20%5Chskip.2in%20%7C+0%5Crangle%20=%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5Cleft%5B%5Cbegin%7Barray%7D%7Bc%7D%201%20%5C%5C%200%20%5C%5C%201%20%5C%5C%200%20%5C%5C%20%5Cend%7Barray%7D%5Cright%5D"></p>
<p>If we let <img src="https://latex.codecogs.com/png.latex?X%20=%20q_1q_0">, that is, the bit string measured from the qubits, these imply the following:</p>
<p><img src="https://latex.codecogs.com/png.latex?P_%7B%7C%5CPhi%5E+%5Crangle%7D(X%20=%2000)%20=%20P_%7B%7C%5CPhi%5E+%5Crangle%7D(X%20=%2011)%20=%20%5Cfrac%7B1%7D%7B2%7D"></p>
<p><img src="https://latex.codecogs.com/png.latex?P_%7B%7C+0%5Crangle%7D(X%20=%2000)%20=%20P_%7B%7C+0%5Crangle%7D(X%20=%2010)%20=%20%5Cfrac%7B1%7D%7B2%7D"></p>
<p>Notice how both bits change in <img src="https://latex.codecogs.com/png.latex?%7C%5CPhi%5E+%5Crangle">, but only one changes in <img src="https://latex.codecogs.com/png.latex?%7C+0%5Crangle">. The former is <em>entangled</em>, the latter is not. This is because we cannot separate <img src="https://latex.codecogs.com/png.latex?%7C%5CPhi%5E+%5Crangle"> into a product of two individual one-qubit state vectors (each of which may itself be in a <a href="https://en.wikipedia.org/wiki/Quantum_superposition">superposition</a>). But for <img src="https://latex.codecogs.com/png.latex?%7C+0%5Crangle">, we can:</p>
<p><img src="https://latex.codecogs.com/png.latex?q_0%20=%20%5Cleft%5B%5Cbegin%7Barray%7D%7Bc%7D%201%20%5C%5C%200%20%5C%5C%20%5Cend%7Barray%7D%5Cright%5D%20=%20%7C0%5Crangle"> <img src="https://latex.codecogs.com/png.latex?q_1%20=%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5Cleft%5B%5Cbegin%7Barray%7D%7Bc%7D%201%20%5C%5C%201%20%5C%5C%20%5Cend%7Barray%7D%5Cright%5D%20=%20%7C+%5Crangle"></p>
<p>This implies that <img src="https://latex.codecogs.com/png.latex?q_0"> will always be measured as 0, and all uncertainty (random variability) lies in measuring <img src="https://latex.codecogs.com/png.latex?q_1">. This is known as a <em>product</em> state, because the two-qubit state vector (and hence its probabilities) can be obtained as the tensor (Kronecker) product of the individual ones.</p>
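<p>As a rough check of the product-state idea, the sketch below rebuilds the two-qubit vector from the two one-qubit vectors and then tests whether a given two-qubit vector can be factored at all. The helper names <code>kron</code> and <code>is_product_state</code> are hypothetical, and the factoring test relies on a standard fact for two qubits: a vector with entries (a00, a01, a10, a11) splits into two one-qubit vectors exactly when a00·a11 equals a01·a10.</p>

```python
import math


def kron(a, b):
    """Kronecker (tensor) product of two vectors given as lists."""
    return [x * y for x in a for y in b]


def is_product_state(state, tol=1e-12):
    """True when a two-qubit vector [a00, a01, a10, a11] factors into
    two one-qubit vectors, i.e. when a00 * a11 equals a01 * a10."""
    a00, a01, a10, a11 = state
    return math.isclose(a00 * a11, a01 * a10, abs_tol=tol)


s = 1 / math.sqrt(2)
q0 = [1.0, 0.0]  # |0>
q1 = [s, s]      # |+>

# q1 goes first so the result follows the q1q0 ordering used in the text.
plus_zero = kron(q1, q0)
phi_plus = [s, 0.0, 0.0, s]

print("kron(q1, q0):", plus_zero)
print("|+0> is a product state:", is_product_state(plus_zero))
print("|Phi+> is a product state:", is_product_state(phi_plus))
```

<p>The first check should come back true and the second false, which matches the entangled versus not-entangled labels above.</p>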
</section>
<section id="its-just-independence" class="level1">
<h1>It’s just independence</h1>
<p><a href="https://en.wikipedia.org/wiki/Independence_(probability_theory)">Statistical independence</a> occurs when the probability of observing an event does not change once we know something about another one. In our case, we can pretty clearly see this holds for <img src="https://latex.codecogs.com/png.latex?%7C+0%5Crangle"> but not <img src="https://latex.codecogs.com/png.latex?%7C%5CPhi%5E+%5Crangle">. Let’s look at the latter case.</p>
<p>From the two-qubit state vector, we know the possible measurements are 00 or 11. Thus,</p>
<p><img src="https://latex.codecogs.com/png.latex?P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_0%20=%200)%20=%20P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_0%20=%201)%20=%20%5Cfrac%7B1%7D%7B2%7D"> <img src="https://latex.codecogs.com/png.latex?P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_1%20=%200)%20=%20P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_1%20=%201)%20=%20%5Cfrac%7B1%7D%7B2%7D"></p>
<p>Marginally, each qubit has an equal chance of being measured 0 or 1. But once we know something about the state of the other qubit, this changes:</p>
<p><img src="https://latex.codecogs.com/png.latex?P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_1%20=%200%7Cq_0%20=%200)%20=%201"> <img src="https://latex.codecogs.com/png.latex?P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_1%20=%200%7Cq_0%20=%201)%20=%200"> <img src="https://latex.codecogs.com/png.latex?P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_1%20=%201%7Cq_0%20=%200)%20=%200"> <img src="https://latex.codecogs.com/png.latex?P_%7B%7C%5CPhi%5E+%5Crangle%7D(q_1%20=%201%7Cq_0%20=%201)%20=%201"></p>
<p>We could flip those around and condition <img src="https://latex.codecogs.com/png.latex?q_0"> on <img src="https://latex.codecogs.com/png.latex?q_1"> and we’d end up with the same result. What this shows is that in the entangled state,</p>
<p><img src="https://latex.codecogs.com/png.latex?P(q_0%7Cq_1)%20%5Cneq%20P(q_0)"> implying</p>
<p><img src="https://latex.codecogs.com/png.latex?P(q_0%20%5Ccap%20q_1)%20%5Cneq%20P(q_0)P(q_1)"></p>
<p>and therefore the two qubits are not independent. Once we know (measure) one qubit, we automatically know what the other one will be. If you go through the same math for <img src="https://latex.codecogs.com/png.latex?%7C+0%5Crangle">, you’ll see the marginal and conditional probabilities are in fact equal, and thus the qubits are independent.</p>
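<p>The same marginal, conditional, and joint calculations can be sketched in a few lines of Python. The function names here are my own assumptions, and the check only looks at measurement probabilities for the two real-valued states in this post; once phase is allowed, probabilities alone may not tell the whole story.</p>

```python
import math


def joint_table(state):
    """2x2 table of P(q1 = i, q0 = j) from a two-qubit state vector
    whose positions are ordered 00, 01, 10, 11 (q1q0)."""
    probs = [abs(a) ** 2 for a in state]
    return [probs[0:2], probs[2:4]]


def conditional_q1_given_q0(joint, q1, q0):
    """P(q1 | q0); only defined when P(q0) is positive."""
    p_q0 = joint[0][q0] + joint[1][q0]
    return joint[q1][q0] / p_q0


def is_independent(joint, tol=1e-9):
    """Compare every joint probability with the product of marginals."""
    p_q1 = [sum(row) for row in joint]
    p_q0 = [joint[0][j] + joint[1][j] for j in range(2)]
    return all(
        math.isclose(joint[i][j], p_q1[i] * p_q0[j], abs_tol=tol)
        for i in range(2)
        for j in range(2)
    )


s = 1 / math.sqrt(2)
phi_plus = [s, 0.0, 0.0, s]
plus_zero = [s, 0.0, s, 0.0]

print("P(q1=0 | q0=0) in |Phi+>:",
      conditional_q1_given_q0(joint_table(phi_plus), 0, 0))
print("|Phi+> independent:", is_independent(joint_table(phi_plus)))
print("|+0> independent:", is_independent(joint_table(plus_zero)))
```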
<p>Now I don’t know if/how this might change once you start introducing more qubits or allow for the full range of <a href="https://github.com/Qiskit/textbook/blob/main/notebooks/intro/what-is-quantum.ipynb">phase</a>, but to keep things intuitive, my working definition of quantum entanglement is:</p>
<p><strong><em>Does the probability of a qubit being measured to a particular state depend on the state of another qubit? If yes, they are entangled; otherwise, they are not.</em></strong></p>


<!-- -->

</section>

 ]]></description>
  <category>Quantum</category>
  <category>Probability</category>
  <guid>https://www.zajichekstats.com/post/quantum-entanglement-from-statistical-perspective/</guid>
  <pubDate>Fri, 16 Feb 2024 06:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/quantum-entanglement-from-statistical-perspective/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>On the Creation of Classical Statistics</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/on-the-creation-of-classical-statistics/</link>
  <description><![CDATA[ 




<p>I used to have a somewhat cynical view of <a href="https://en.wikipedia.org/wiki/Ronald_Fisher">R.A. Fisher</a>, especially on the motivation for statistical significance (see my <a href="https://www.zajichekstats.com/post/statistical-significance-is-insignificant/">previous article</a>). Even though he did explicitly advocate for the use of the 5% threshold:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #2e5c46">“If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 and consider that higher values of [the statistic] indicate a real discrepancy.”</span><sub>1</sub></p>
</blockquote>
<p>and</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #2e5c46">“If one in twenty does not seem high enough, we may, if we prefer, draw the line at one in fifty (the 2 percent point), or one in a hundred (the 1 percent point). Personally, the writer prefers to set a low standard at the 5 percent point, and ignore entirely all results which fail this level.”</span><sub>2</sub></p>
</blockquote>
<p>After reading Erich Lehmann’s book, <a href="https://link.springer.com/book/10.1007/978-1-4419-9500-1"><em>Fisher, Neyman, and the Creation of Classical Statistics</em></a>, I realize there is much more nuance to it, and he probably meant well in his <em>statistical</em> work (his <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2942659/">other work</a>, maybe a different story). I’m fairly convinced that he never imagined, nor would approve of, how statistical significance would be used and abused since then.</p>
<section id="an-experimentation-context" class="level1">
<h1>An experimentation context</h1>
<p>Much of the methodology related to hypothesis testing that Fisher developed (along with other fundamental concepts from 1922 like <em>consistency</em>, <em>efficiency</em>, and <em>sufficiency</em>) was motivated by the specific context he started out in as an agricultural statistician in 1919 at <a href="https://www.rothamsted.ac.uk/"><em>Rothamsted Experimental Station</em></a>: that of small-sample, randomized experimentation. It is clear in his writings, though maybe implicit, that there were practical things he considered that played into the overall validity of inference, not <em>only</em> whether the p-value crossed a threshold:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #2e5c46">“…it is not known whether heterogeneity [of the soil] will be more pronounced in the one or the other direction in which the field is ordinarily cultivated…The effects are sufficiently widespread to make apparent the importance of eliminating the major effects of soil heterogeneity not only in one direction across the field, but at the same time in the direction at right angles to it.”</span><sub>3</sub></p>
</blockquote>
<p>He wasn’t proposing that his methods be mechanically applied, or that the <em>method</em> itself is what proves valid inference. Rather, inherent in that quote is the intuition Fisher had about the subject, the “soil heterogeneity”, that made the <em>implications</em> of the design <em>useful</em> for that situation. This, in combination with his obviously deep statistical knowledge, is what I believe ultimately made the usage of seemingly arbitrary significance thresholds valid in Fisher’s eyes. It’s not that he didn’t want statistical analysis to be “easier” for researchers (and he was somewhat back and forth on this):</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“However, his early recommendation and life-long practice prevailed. The desire for standardization trumped the advantages of considering each case on its own merit.”</span><sub>4</sub></p>
</blockquote>
<p>I think he probably just put too much confidence in the implementers of his work to be as critical, meticulous, and simply as brilliant as he was. He never conceived of the erroneous ways his statistical and design principles would later be used.</p>
</section>
<section id="he-had-a-mentor" class="level1">
<h1>He had a mentor</h1>
<p>One of the most fascinating aspects of the history of classical statistics is the role of <a href="https://en.wikipedia.org/wiki/William_Sealy_Gosset">William Sealy Gosset</a> (a.k.a. “Student”, as in <a href="https://en.wikipedia.org/wiki/Student%27s_t-test"><em>Student’s t-test</em></a>). For his entire career, he was a beer brewer at <a href="https://en.wikipedia.org/wiki/Guinness_Brewery"><em>Arthur Guinness Son and Co.</em></a> (one of my favorites), yet he is credited with putting forth, through his own curiosity, intelligence, and need for practical solutions in quality control, the ideas that Fisher would ultimately bring to fruition:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“After a small-sample (“exact”) approach to testing was initiated by Gosset (“Student”) in 1908 with his t-test, Fisher in the 1920’s, under frequent prodding by Gosset, developed a battery of such tests, all based on the assumption of normality. These tests today still constitute the bread and butter of much of statistical practice.”</span><sub>4</sub></p>
</blockquote>
<p>That “frequent prodding” Lehmann is talking about, in addition to the timeline, is why I characterize Gosset more like a mentor. Fisher was 14 years younger, but incredibly gifted intellectually.</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“He [Gosset] then had the crucial insight that exact results [for a t-test] could be obtained by making an additional assumption…although he was not able to give a rigorous proof. The first proof was obtained (although not published) by Fisher in 1912…as a result of constant prodding and urging by Gosset, he found a number of additional small-sample distributions, and in 1925 presented the totality of these results in his book…getting Fisher to develop this methodology much further than he (Fisher) had originally intended.”</span><sub>4</sub></p>
</blockquote>
<p>Fisher was only 22 years old in 1912. It seems Gosset’s wisdom helped him pinpoint the arguments he would come to make, and ultimately gave him the encouragement and motivation to see it through. Without that, who knows if any of it would have been done.</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“This passage suggests that Fisher thought these problems to be difficult, and that he had no plans to work on them himself. However, in April 1922 he received two letters from Gosset that apparently changed his mind.”</span><sub>4</sub></p>
</blockquote>
<p>Beyond Gosset’s influence on Neyman’s (and Egon Pearson’s) foundational work regarding the <em>“consideration of the alternatives (suggested by Gosset)”</em>, Fisher himself acknowledged Gosset’s contributions and spoke highly of him.</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #2e5c46">“…an exact solution of the distribution of regression coefficients…has been outstanding for many years; but the need for its solution was recently brought home to the writer by correspondence with ‘Student’, whose brilliant researches in 1908 form the basis of the exact solution”</span><sub>5</sub></p>
</blockquote>
</section>
<section id="he-was-in-fact-a-genius" class="level1">
<h1>He was in fact, a genius</h1>
<p>Despite their <em>“disdain”</em> for one another:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“Both Fisher and Neyman believed that they had made important contributions to the philosophy of science, but each felt that the other’s views were completely wrong-headed.”</span><sub>4</sub></p>
</blockquote>
<p>Much of their foundational work was complementary. Fisher supplied the methodology, Neyman put the rubber stamp on it with mathematical proofs.</p>
<p>The thing that caught my attention that Lehmann mentions multiple times in the book is the way Fisher came up with those methods.</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“Fisher’s tests were solely based on his intuition. The right choice of test statistics was obvious to him. A theory that would justify his choices was developed by Neyman and Pearson in their papers in 1928 and 1933.”</span><sub>4</sub></p>
</blockquote>
<p>As you read about the progression of his work, it’s like all the fundamental statistical concepts pop up one by one, and you realize the breadth and depth of Fisher’s accomplishments. The idea that this can be attributed to his “intuition” is just remarkable. It wasn’t just in testing, but also in design:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“…the designs in DOE [The Design of Experiments, 1935] were presented without much justification, based entirely on his intuitive understanding of what the situation demanded. But again later writers found justifications by showing that Fisher’s procedures possessed certain optimality properties.”</span><sub>4</sub></p>
</blockquote>
<p>Even when you read Fisher’s passages directly, you get the feeling that it just rolled off his tongue and he was writing down what flowed from his mind. Though what he was writing turned out to be fundamental to statistical practice:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #2e5c46">“…much caution should be used before claiming significance for special comparisons… Comparisons suggested by scrutiny of the results themselves are open to suspicion; for if the variants are numerous, a comparison of the highest with the lowest observed value will often appear to be significant, even from undifferentiated material.”</span><sub>3</sub></p>
</blockquote>
<p>In this case, the problem with <a href="https://en.wikipedia.org/wiki/Multiple_comparisons_problem">multiple comparisons</a>. This is the general tone of Fisher’s writings, just nonchalantly bringing up things like <a href="https://en.wikipedia.org/wiki/Power_of_a_test">power</a>, creating <a href="https://en.wikipedia.org/wiki/Blocking_(statistics)">block designs</a>, etc. as “obvious” considerations.</p>
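<p>Fisher’s warning about comparing “the highest with the lowest observed value” is easy to see in a small simulation. The sketch below is my own illustration, not anything from Fisher or Lehmann: every group is drawn from the same standard normal population (the “undifferentiated material”), and the function name, group sizes, and the simplifying assumption of a known standard deviation of 1 are all hypothetical choices.</p>

```python
import math
import random
import statistics


def extreme_pair_rate(n_groups, n_per_group=10, n_sims=2000, seed=1):
    """Share of simulations in which the highest and lowest group means
    look 'significantly' different at the 5% level, even though every
    group is drawn from the same N(0, 1) population."""
    rng = random.Random(seed)
    # Standard error of a difference in means, assuming a known sd of 1.
    se_diff = math.sqrt(2 / n_per_group)
    hits = 0
    for _ in range(n_sims):
        means = [
            statistics.fmean(rng.gauss(0, 1) for _ in range(n_per_group))
            for _ in range(n_groups)
        ]
        z = (max(means) - min(means)) / se_diff
        if z > 1.96:  # usual two-sided 5% cutoff
            hits += 1
    return hits / n_sims


rate_two = extreme_pair_rate(n_groups=2)
rate_ten = extreme_pair_rate(n_groups=10)
print(f"2 groups:  {rate_two:.1%} of highest-vs-lowest comparisons flagged")
print(f"10 groups: {rate_ten:.1%} of highest-vs-lowest comparisons flagged")
```

<p>With only two groups the flagged share should sit near the nominal 5%, while picking the extremes out of ten groups should push it far higher, which is the suspicion Fisher describes.</p>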
<p>Unfortunately, despite all Fisher did achieve, his stubbornness prevented him from achieving more.</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“…Fisher rarely gave an inch. Those holding different views from his own had ‘misread’ him and their statements were ‘incorrect’.”</span><sub>4</sub></p>
</blockquote>
<p>And subsequently, even though he hinted at it with his idea of “sensitiveness”:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 14px; color: #c79c00">“By not utilizing the idea of power, Fisher deprives himself of the ability to resolve one of the most important issues of experimental design, the determination of sample size.”</span><sub>4</sub></p>
</blockquote>
<p>It seems he grew bitter and resentful in older age. For one, Fisher, “the creator of modern statistics”, in his role at University College under Egon Pearson, was <em>“not permitted to teach statistics”</em>. Also, all the progress and innovation in statistics shifted to the United States after Neyman moved there in 1938. People appreciated his foundational work, but they were taking it in a different direction and he was too far away to continue having influence. Nevertheless, his legacy is set in stone.</p>
</section>
<section id="references" class="level1">
<h1>References</h1>
<ol type="1">
<li><p>Fisher, R.A. (1925). <a href="https://link.springer.com/chapter/10.1007/978-1-4612-4380-9_6"><em>Statistical methods for research workers</em></a>. Oliver and Boyd: Edinburgh.</p></li>
<li><p>Fisher, R.A. (1926). <a href="https://link.springer.com/chapter/10.1007/978-1-4612-4380-9_8"><em>The arrangement of field experiments</em></a>. J. Min. Agric. G. Br. 33:503-513</p></li>
<li><p>Fisher, R.A. (1935). <a href="https://en.wikipedia.org/wiki/The_Design_of_Experiments"><em>The Design of Experiments</em></a>. Oliver and Boyd: Edinburgh.</p></li>
<li><p>Lehmann, Erich L (2011). <a href="https://link.springer.com/book/10.1007/978-1-4419-9500-1"><em>Fisher, Neyman, and the Creation of Classical Statistics</em></a>. Springer New York, NY. https://doi.org/10.1007/978-1-4419-9500-1</p></li>
<li><p>Fisher, R.A. (1922). <a href="https://www.jstor.org/stable/2341124"><em>The goodness of fit of regression formulae, and the distribution of regression coefficients</em></a>. J. Roy. Statist. Soc., 85: 597-612</p></li>
</ol>


<!-- -->

</section>

 ]]></description>
  <category>History</category>
  <category>Philosophy</category>
  <guid>https://www.zajichekstats.com/post/on-the-creation-of-classical-statistics/</guid>
  <pubDate>Sat, 10 Feb 2024 06:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/on-the-creation-of-classical-statistics/feature.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Statistical significance is…insignificant</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/statistical-significance-is-insignificant/</link>
  <description><![CDATA[ 




<p>The longer I’ve been practicing as a statistician, maybe paradoxically, the more skeptical I’ve become of statistical significance (#1). It manifests as a feeling of dissatisfaction, as if, even though you’ve stated what you “found”, you don’t <em>actually</em> believe it to be true. I recently finished reading <a href="https://press.umich.edu/Books/T/The-Cult-of-Statistical-Significance2"><em>The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives</em></a>–it instantly became one of my favorite books (here are my favorite quotes and passages). It affirms a lot of what I’ve come to suspect, with deep articulation about the vastness of the issue, backed by a thorough historical foundation. I can’t help but wonder about the broader scientific, political, and societal implications this has had over the years (and continues to have). It really lit a fire in me to continue learning about and unraveling statistical history to connect those dots.</p>
<section id="my-take" class="level1">
<h1>My take</h1>
<p>The <em>significance</em> of a statistical result cannot be mechanically, mathematically, or systematically determined. It must be driven by <em>practical</em> relevance, or importance, which is inherently subjective, and a product of the values, beliefs, interests, and/or goals of the individual(s) interpreting the data. That result is always subject to dispute, critique, replication, and refinement, whether on the grounds of process error (#2) or of the value of the information itself. The chance occurrence of a sampling probability crossing an arbitrary threshold is actually irrelevant.</p>
</section>
<section id="whatisstatsig" class="level1">
<h1>What is statistical significance?</h1>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 20px; color: #2e5c46">“It’s embedded like a tax code in the bureaucracy of science.”</span><sub>1</sub></p>
</blockquote>
<p>Technically speaking, it is when the probability of observing data at least as extreme as ours, <em>if</em> a hypothesized state of the world were true (known as the <em>p-value</em>), is so small (<a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913">notoriously</a>, and most often, less than 5%), that the hypothetical state of the world is taken to be false, and therefore, we have “significant” statistical evidence to say so. It positions itself as an objective tool to <em>decide</em> (on the basis of this probability threshold) whether a statistical relationship is “real”, and, often, subsequently that it “matters”.</p>
<p>Take this <a href="https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2808358">recent study</a>, for example, which is meant to characterize physician-propagated misinformation about the COVID-19 pandemic. The authors outline a set of <a href="https://cdn.jamanetwork.com/ama/content_public/journal/jamanetworkopen/939195/zoi230834supp1_prod_1697557763.1365.pdf?Expires=1704642946&amp;Signature=nxggsGG3f~BQcev3DoFQmbURh3vsb1CtFrP4rSviM1XaF8Y9vtyGPmRRBTRDAXyvYzrGvW6vrJFsphYDRTI9LSJD35NYEc8RUdkZK8fJkKcSpr-AbsW1wyhe30CUf-x8GGPI2For6nZNLWoZhBn0m~GrC3JlmuTmCswv~3RH7HolcYV10ZTVgSh4ZvGaBOUKDdNhmITsHrocrTct-xvMnohwhM~6~nHZMATo6~grFfhnPrhgsDHRVkYdLr9o8yFEae4-ylEnzkulOfigZIKZJsj1s5iBbyPAB60k2KYYkXlBZnNAUpkrKSP33m9BiG8YY047fQOSRrqMrap-PaA45w__&amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA">basic premises</a> that are used as a basis for classifying contrarian statements as <em>misinformation</em> (#3). At least some of this is built upon the attainment of statistical significance (or lack thereof).</p>
<p>As an example, in the category of <strong>Promoting Unapproved Medications for Prevention or Treatment</strong> (in the Results section), the authors state:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 12px; color: #9e3634">“The 2 most prominent medications promoted were ivermectin and hydroxychloroquine, which have been found to not be effective at treating COVID-19 infections in randomized clinical trials.”</span><sub>2</sub></p>
</blockquote>
<p>This premise drove them to classify social media posts like this:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 12px; color: #45A4CE">“Two of my toughest COVID patients–showed up with oxygen stats of 68% and 84% and would not go to the hospital. We treated them with IVM, steroids, and breathing treatments and here they are now.”</span><sub>2</sub></p>
</blockquote>
<p>as misinformation (see all <a href="https://cdn.jamanetwork.com/ama/content_public/journal/jamanetworkopen/939195/zoi230834t4_1697557763.5423.png?Expires=1704648605&amp;Signature=vIdMdVOE4XQ76IbpK3v-OY0EzN5zbPooMwAeWZTXhWDu5fJnx4hpErMTWryrzYEaJOYO6YckYQIvSqmFFHp7LJ~8NughK380U2JDc2PBtonwbYYmVcXzRXT~oLftOZEXNxq0MWHqDFs1Ov7KNUdoqd1TuhYxCgFKWUvhdb5pXzB0zliNP-28kjQwZF9KLM70oerRsri0XM-HVfdKkPKrM1idAtUBFAZzBDg05y5BBQD8cLzgW4Fa7cINV-~1Dhyg9HpncVHeRACdc-tJwqqs4Z-VQfJPXMgzxP2-eyL6nhSA3IqIW9ax8CYgcYrha6yqq80wqaR4zTTGmXzf~rTWmQ__&amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA">supportive quotes</a>). This is an actual doctor saying the drug helped <em>their</em> patients, but the authors have deemed it ineffective. What justifies them making such a universal claim?</p>
<p>If you look at <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8394824/">one of their references</a>, the <a href="https://en.wikipedia.org/wiki/Meta-analysis">meta-analysis</a> shows the <a href="https://en.wikipedia.org/wiki/Relative_risk">relative risk</a> for all-cause mortality was estimated to be 37%. That is, the risk of death was 63% lower in patients who received ivermectin versus placebo or standard of care. However, because the 95% <a href="https://en.wikipedia.org/wiki/Confidence_interval">confidence interval</a> ranged from 12% to 113% (i.e., it was <em>plausible</em> that ivermectin could produce up to a 13% <em>higher</em> mortality risk, but equally plausible that it could produce an 88% risk reduction), it was deemed <em>not</em> statistically significant, and as the authors state:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 12px; color: #3d2f2f">“IVM [ivermectin], compared with control treatment, did not have an effect on the all-cause mortality rate.”</span><sub>3</sub></p>
</blockquote>
<p>and ultimately,</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 12px; color: #3d2f2f">“Ivermectin is not a viable treatment option for COVID-19.”</span><sub>3</sub></p>
</blockquote>
<p>In other words, because the probability of observing this data, under the assumption of no difference in mortality risk (our p-value definition above), was not less than 5% (it was 31%), that gives reason to conclude <em>no difference at all</em> (#4). Furthermore, if the confidence interval crossed 100% by any amount, no matter how small, the p-value would have remained above 5% and not reached the threshold for statistical significance.</p>
</section>
<section id="why-is-it-flawed" class="level1">
<h1>Why is it flawed?</h1>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 20px; color: #2e5c46">“Real science changes one’s mind. That’s one way to see that the proliferation of unpersuasive significance tests is not real science.”</span><sub>1</sub></p>
</blockquote>
<section id="arbitrarythreshold" class="level2">
<h2 class="anchored" data-anchor-id="arbitrarythreshold">An arbitrary threshold</h2>
<p>The 5% threshold is arbitrary. Despite that common acknowledgement, willful ignorance tends to prevail due to tradition and adherence to norms. The fact that the perceived significance of a result can suddenly change from minute differences speaks to the lack of robustness in the logic. In <a href="https://press.umich.edu/Books/T/The-Cult-of-Statistical-Significance2">the book</a>, the authors frequently discuss the importance of a <em>loss function</em>, which focuses on the potential consequences and implications of the result for the real-world decisions to be made from the information, rather than a predefined threshold based on sampling error probability. In this sense, the allowable risk tolerance can’t be objectively or mechanically determined. It is context-dependent, and not all decisions are created equal. Yes, the p-value above was 31%, but that error rate, along with the plausible range of risks (and benefits), may be sufficient for someone needing to make a treatment decision <em>now</em>.</p>
<section id="risks-are-subjective" class="level3">
<h3 class="anchored" data-anchor-id="risks-are-subjective">Risks are subjective</h3>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 18px; color: #2e5c46">“It always depends on the loss, measured in side effects, treatment cost, death rates. The loss to a cool, scientific, impartial spectator will not be the same as the loss to the patient in question…[the balance between Type I/II errors] ‘must be left to the patient, friends, and family’.”</span><sub>1</sub></p>
</blockquote>
<p>Beyond the statistical significance of a result is the question of what to do about it. In the <a href="https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2808358">same article</a>, the authors state the following, still in the context of misinformation:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 12px; color: #9e3634">“Claims that myocarditis was common in children who received the vaccine and that the risks of myocarditis outweighed the risk of vaccination were also unfounded.”</span><sub>2</sub></p>
</blockquote>
<p>Never mind the fact that the <a href="https://jamanetwork.com/journals/jama/fullarticle/2782900">study they reference</a> <em>does</em> show an increase in monthly case volume of myocarditis and pericarditis between pre/post-vaccine periods, and the authors state:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 12px; color: #9e8a39">“Myocarditis developed rapidly in younger patients, mostly after the second vaccination. Pericarditis affected older patients later, after either the first or second dose.”</span><sub>4</sub></p>
</blockquote>
<p>The more important point is that the weight individuals place on statistical results to inform their decision making is subjective. The risk may be low, maybe even lower than the alternative, but that doesn’t inform <em>how</em> someone should weigh it.</p>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 14px; color: #2e5c46">“Imagine that you and your infant child are standing on a sidewalk near a busy street. You have just purchased a hot dog from the street vendor and have safely crossed the street. Scenario 1: You suddenly realize you have forgotten the mustard and if you scurry across the busy street, dodging vehicles, there is a 95% probability you’ll return safe with your mustard. Scenario 2: You forgot your child and you watch as she tries to cross the street herself, if you scurry across the busy street, dodging vehicles, there is a 95% probability you’ll return safe with your child. The sizeless scientist in effect declares ‘they are equally important reasons for crossing the street’”</span><sub>1</sub></p>
</blockquote>
</section>
</section>
<section id="samplesize" class="level2">
<h2 class="anchored" data-anchor-id="samplesize">It can’t depend on sample size</h2>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 18px; color: #2e5c46">“At high sample sizes, all null hypotheses are rejected, by mathematical fact, without having to look at the data.”</span><sub>1</sub></p>
</blockquote>
<p>One pretty simple argument is that of <a href="https://www.omniconvert.com/what-is/sample-size/">sample size</a>. In most contexts, a statistical test becomes more likely to be declared <em>significant</em> simply by <a href="https://en.wikipedia.org/wiki/Standard_error">amassing more data</a>, no matter how small the actual (nonzero) effect size is. This, on the other hand, <em>is</em> completely mechanical and dissociated from the real-world context in which the test is being run. Thus, it prioritizes quantity over substance, and when blindly used, potentially promotes results that may lack practical meaning.</p>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 14px; color: #2e5c46">“…some cause of natural selection may have a high probability of replicability in additional samples but be trivial. Yet a cause may have a low probability of replicability but be important. This is what we mean when we say that a test of significance is neither necessary nor sufficient for a finding of importance”</span><sub>1</sub></p>
</blockquote>
<p>It also tends to shift focus to attaining statistical significance and using it as a filter, causing the potential to miss meaningful insights that didn’t reach this level.</p>
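<p>The sample-size argument above can be sketched numerically. This is a minimal illustration, assuming a simple two-sided z-test of a sample mean against a null of zero; the effect size, standard deviation, and sample sizes are made-up numbers, and <code>two_sided_p_value</code> is a hypothetical helper, not something from the book:</p>

```python
import math

def two_sided_p_value(effect, sd, n):
    """Two-sided z-test p-value for an observed mean against a null of zero."""
    standard_error = sd / math.sqrt(n)
    z = abs(effect) / standard_error
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal
    return math.erfc(z / math.sqrt(2))

# Hypothetical numbers: the same tiny observed effect (0.01 standard deviations)
tiny_effect, sd = 0.01, 1.0
p_values = {n: two_sided_p_value(tiny_effect, sd, n) for n in (100, 10_000, 1_000_000)}

for n, p in p_values.items():
    print(f"n = {n:,}: p = {p:.4g}")
```

<p>The observed effect never changes, yet the p-value falls from roughly 0.92 to roughly 0.32 to practically zero as the sample grows. Nothing about the practical importance of a 0.01 standard deviation difference changed along the way.</p>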
</section>
<section id="we-dont-believe-in-zero-sized-effects" class="level2">
<h2 class="anchored" data-anchor-id="we-dont-believe-in-zero-sized-effects">We don’t believe in “zero-sized” effects</h2>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 18px; color: #2e5c46">“Real scientists draw a line between what is large and small.”</span><sub>1</sub></p>
</blockquote>
<p>There is a major contradiction that arises.</p>
<p>The typical hypothesis test is conducted under the assumption of a <a href="https://en.wikipedia.org/wiki/Null_hypothesis">null hypothesis</a> positing <em>no effect</em>. For example, in calculating the p-value above, it is assumed that there is <em>no</em> difference in all-cause mortality rates between the treatment groups. However, I would argue that in any practical context, it’s rare that someone would genuinely believe in the existence of precisely zero effect. Rather, it would stand to reason that what they really mean is “effectively zero” effect, something so small that it is considered inconsequential.</p>
<p>Herein lies the contradiction: they have now acknowledged some level of substantive significance, albeit undefined. If the true effect is nonzero but smaller than this threshold then, as we just explained, with enough data the estimate will still eventually be declared statistically significant, no matter how minuscule it is, even though it never crosses the unspoken threshold of substantive meaning. This calls into question the value of attaining statistical significance at all, and points to the need for explicit consideration of the real-world implications (i.e., the loss function). At the <em>very least</em>, the substantive threshold should be identified and reflected in the null hypothesis so that the p-value is calibrated for substance.</p>
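<p>One way to picture a p-value “calibrated for substance” is to test against the smallest effect that would matter instead of against zero. The sketch below assumes a normal approximation and a one-sided test; the estimate, standard error, and threshold are hypothetical, and the threshold in particular would have to come from subject-matter judgment:</p>

```python
import math

def one_sided_p_value(estimate, standard_error, null_value=0.0):
    """P-value for H0: effect is at most null_value (normal approximation)."""
    z = (estimate - null_value) / standard_error
    # 0.5 * erfc(z / sqrt(2)) equals 1 - Phi(z) for the standard normal
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical numbers: a precisely estimated but small effect
estimate, standard_error = 0.02, 0.005
smallest_meaningful_effect = 0.05  # assumed threshold, not a statistical output

p_vs_zero = one_sided_p_value(estimate, standard_error)
p_vs_threshold = one_sided_p_value(estimate, standard_error, smallest_meaningful_effect)

print(f"p against a zero-effect null: {p_vs_zero:.2g}")
print(f"p against the substantive threshold: {p_vs_threshold:.2g}")
```

<p>Against a null of exactly zero this estimate looks overwhelmingly “significant”, while against the assumed threshold there is no evidence at all that the effect is large enough to matter. The data are identical; only the question changed.</p>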
</section>
<section id="fallacy" class="level2">
<h2 class="anchored" data-anchor-id="fallacy">The fallacy of the transposed conditional</h2>
<p>This is where it gets especially interesting. There are logical errors with the conclusions drawn from hypothesis testing. I think the best way to describe it is jumping into the classic example that arises in Jacob Cohen’s <a href="https://doi.org/10.1037/0003-066X.49.12.997"><em>The Earth Is Round (p &lt; .05)</em></a> from 1994:</p>
<blockquote class="blockquote">
<p><span style="font-family: Verdana, sans-serif; font-style: italic; font-size: 12px; color: #45A4CE">“The incidence of schizophrenia in adults is about 2%. A proposed screening test is estimated to have at least 95% accuracy in making the positive diagnosis (sensitivity) and about 97% accuracy in declaring normality (specificity)…With a positive test for schizophrenia at hand, given the more than .95 assumed accuracy of the test, the probability of a positive test given that the case is normal is less than .05, that is, significant at p &lt; .05. One would reject the hypothesis that the case is normal and conclude that the case has schizophrenia, as it happens mistakenly, but within the .05 alpha error. But that’s not the point. The probability of the case being normal, given a positive test, is not what has just been discovered however much it sounds like it and however much it is wished to be. It is not true that the probability that the case is normal is less than .05, nor is it even unlikely that it is a normal case. By a Bayesian maneuver, this inverse probability, the probability that the case is normal, given a positive test for schizophrenia, is about .60!”</span><sub>5</sub></p>
</blockquote>
<p>The desired interpretation of a statistically significant result induces a technical problem. The p-value provides the likelihood of observing the data under the assumption that the null hypothesis is true (a single state of the world), yet we <em>want</em> to interpret it as evidence about the parameter of interest given the data. After all, we did collect it, and want that to be the basis of our conclusions. But that is not the probability we have concerned ourselves with. Using the p-value as a singular basis to determine significance disregards all other possibilities that the true parameter could be. When those possibilities are imbalanced (as they were here, since only 2% of the population had schizophrenia), it confuses which state of the world is most likely given the data with how likely the data is given a state of the world (#5).</p>
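<p>Cohen’s “Bayesian maneuver” can be written out with the numbers from the quote, reading the 97% specificity as a 3% false-positive rate among normal cases and the 2% incidence as the prior probability of schizophrenia (the event labels here are mine, for illustration):</p>
<p><span class="math display">\[
P(\text{normal} \mid +) = \frac{P(+ \mid \text{normal})\,P(\text{normal})}{P(+ \mid \text{normal})\,P(\text{normal}) + P(+ \mid \text{schizophrenia})\,P(\text{schizophrenia})} = \frac{0.03 \times 0.98}{0.03 \times 0.98 + 0.95 \times 0.02} = \frac{0.0294}{0.0484} \approx 0.61
\]</span></p>
<p>A positive test is rare for a normal case, yet most positive tests still come from normal cases, simply because normal cases outnumber true cases 49 to 1.</p>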
</section>
</section>
<section id="what-to-do-instead" class="level1">
<h1>What to do instead?</h1>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 20px; color: #2e5c46">“Real science, unlike significance-testing science, is difficult. If it were not, it would not be real science, but instead it would be already established routine. Real science asks you to make real scientific judgements and real scientific arguments within a community of other scientists. It asks you to be quantitatively persuasive, not to be irrelevantly mechanical. Life is hard.”</span><sub>1</sub></p>
</blockquote>
<p>It’s a scary thing to think about. Suppose statistical significance isn’t there to bail you out. What are you supposed to do? How do you know if your results matter or not? I think this passage gives a pretty clear answer:</p>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 16px; color: #2e5c46; font-weight: bold">“She can test her belief in the price effect by looking at the magnitudes, using, for example, the highly advanced technique common in data-heavy articles in physics journals: ‘interocular trauma’. That is, she can look and see if the result hits her between the eyes.”</span><sub>1</sub></p>
</blockquote>
<p>The premise of this article has been that the implications of statistical results are context-dependent, so there isn’t a one-size-fits-all alternative to replace statistical significance. Rather than seeking a systematic approach, the emphasis should be placed on cultivating understanding of the subject matter. It’s akin to relying on intuition, like a feeling of “knowing” that you’ve gotten what you needed. Take this simple analogy: a tape measure is a tool that quantifies information needed to inform subsequent action, and the precision of the measurement is tailored to the specific needs of the task at hand. Sometimes a rough estimate is sufficient, while other times meticulous precision is necessary. The goal is to reach the point where, intuitively, you “know” that you’ve obtained the necessary information to move forward confidently. I see statistics the same way: merely a <em>tool</em> to be used to quantify the desired information needed to <em>inform</em> (i.e., augment, not determine) a decision.</p>
<p>Now, I’m not going to claim that I haven’t repeatedly fallen into the practices I’m arguing against (it’s hard not to), but these are the things I’m going to focus on more going forward, instead of p-values and statistical significance:</p>
<section id="estimation-magnitude" class="level3">
<h3 class="anchored" data-anchor-id="estimation-magnitude">1. Estimation &amp; magnitude</h3>
<p>This is probably the easiest change to start making because it doesn’t require an overhaul of statistical methods, but rather just a shift in focus to the magnitude of the estimates. By deliberately avoiding p-value calculations (and, when reading and consuming research, simply ignoring the concept of statistical significance altogether), the interpretation is governed by (a plausible range of) effect sizes, untainted by arbitrary, context-agnostic significance thresholds, and thus forces a scientific argument to be made on that basis. With a little extra brain power (and humility), this creates a much more contextually-rich, informative interpretation.</p>
</section>
<section id="bayesian-thinking-causal-modeling" class="level3">
<h3 class="anchored" data-anchor-id="bayesian-thinking-causal-modeling">2. Bayesian thinking &amp; causal modeling</h3>
<p>Richard McElreath’s <a href="https://github.com/rmcelreath/stat_rethinking_2023"><em>Statistical Rethinking</em></a> really convinced me that causal inference powered by Bayesian estimation is probably the best framework out there for scientific modeling (and I’ve only made it through the <a href="https://www.zajichekstats.com/post/statistical-rethinking-2023-class-notes/">first couple of chapters</a> so far). It completely shifts the focus from the data itself to the data-generating <em>process</em>, putting the bulk of the hard work upfront, before data is collected, with a focus on mechanism and structure. It also addresses the fallacy problem. However, it’s definitely harder to start doing on a whim.</p>
<p>First of all, the <a href="https://en.wikipedia.org/wiki/Bayesian_statistics">math itself</a> is different from typical <a href="https://en.wikipedia.org/wiki/Frequentist_inference">frequentist</a> methods, so there is a learning curve there. More difficult, though, is navigating the <em>practical</em> complexities, such as properly eliciting the necessary subject matter expertise and piecing that together into coherent <a href="https://en.wikipedia.org/wiki/Prior_probability">prior distributions</a> and <a href="https://en.wikipedia.org/wiki/Causal_model">causal models</a>. Never mind the technical reasons why that is hard; it is simply more demanding from a time, brainpower, and collaboration perspective–and everyone is busy. Nevertheless, I think it is a worthy pursuit (#6).</p>
</section>
<section id="decision-making-course-of-action" class="level3">
<h3 class="anchored" data-anchor-id="decision-making-course-of-action">3. Decision-making &amp; course of action</h3>
<p>This is where the loss function is most relevant.</p>
<p>Instead of contorting a generic statistical result to tenuously align with real-world implications, I want to be more deliberate. The first step is to target and understand the tangible decision-making processes that the estimates seek to inform, identifying the current standards, including practical constraints and nuances. The next step, rather than passively using standard techniques, is to derive tailored statistical methods to facilitate that usage, which may prompt more rigor, customization, or reframing of the statistical problem entirely to suit the specific context at hand. Estimation uncertainty can be fed as input into hypothetical scenarios to gain insight into where/what actions will be triggered and their subsequent downstream effects on the hard outcomes intended to be impacted. At that point, the <em>significance</em> will be clear.</p>
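<p>As a rough illustration of feeding estimation uncertainty into hypothetical scenarios, here is a minimal sketch. Every number in it is an assumption made up for the example (the effect estimate, its standard error, the baseline risk, and both costs), and the loss function is deliberately simplistic; in practice it would come from the people making the decision:</p>

```python
import random
import statistics

# Hypothetical effect estimate: absolute reduction in event risk, with its
# standard error. z = 0.04 / 0.03 is about 1.33, short of "significance".
ESTIMATE, STANDARD_ERROR = 0.04, 0.03

def expected_losses(cost_of_action, cost_of_event, baseline_risk=0.20,
                    n_draws=20_000, seed=1):
    """Average the loss of acting vs. waiting over plausible effect sizes."""
    rng = random.Random(seed)
    draws = [rng.gauss(ESTIMATE, STANDARD_ERROR) for _ in range(n_draws)]
    # Risk if we act: baseline minus the drawn effect, kept inside [0, 1]
    risks_if_act = [min(1.0, max(0.0, baseline_risk - d)) for d in draws]
    loss_act = cost_of_action + cost_of_event * statistics.fmean(risks_if_act)
    loss_wait = cost_of_event * baseline_risk
    return {"act": loss_act, "wait": loss_wait}

def preferred_action(cost_of_action, cost_of_event):
    """Return the action with the lower expected loss."""
    losses = expected_losses(cost_of_action, cost_of_event)
    return min(losses, key=losses.get)

# Same estimate, two assumed contexts that differ only in what acting costs
print("cheap action:", preferred_action(cost_of_action=1.5, cost_of_event=100.0))
print("costly action:", preferred_action(cost_of_action=6.0, cost_of_event=100.0))
```

<p>With these made-up inputs, the same “non-significant” estimate favors acting when the action is cheap and waiting when it is costly. The statistical result is identical in both cases; the loss attached to it is what drives the decision.</p>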
<section id="focus-on-the-end-product" class="level4">
<h4 class="anchored" data-anchor-id="focus-on-the-end-product">Focus on the end-product</h4>
<p>I think a critical piece of this endeavor is to focus not only on the statistics, but also on <em>how</em> they will be disseminated. This means specifying the vehicle that will deliver the information to the right person at the right time. The emphasis on something tangible surfaces certain practical and technological constraints that may otherwise go unnoticed when focusing solely on the math. Further, this perspective acknowledges that the statistical methods are only a fragment of the overall data product, and it may be a direct cause for further refinement of the statistical approach itself. That is, even with robust statistical methods or results, the information may lose its utility if poorly conveyed or implemented. This could be due to anything from data pipelines and visualization to deployment and computing resources. It also makes it possible to be more forward-thinking about success measures and accountability/validation schemes, like continuous monitoring, to ensure a sustained and impactful presence in the intended decision-making context.</p>
</section>
</section>
</section>
<section id="some-historical-gold" class="level1">
<h1>Some historical gold</h1>
<p>To conclude this, I wanted to highlight an excerpt from the chapter <em>The Psychology of Psychological Significance Testing</em> in <a href="https://press.umich.edu/Books/T/The-Cult-of-Statistical-Significance2">the book</a> that I found especially fascinating about the propagation of statistical significance across university education in the United States (pages 142-143):</p>
<blockquote class="blockquote">
<span style="font-family: Garamond, serif; font-size: 14px; color: #2e5c46">
<p>
“In this context the 5 percent science was promoted by the new leaders of quantitative psychology and education. European humanists can score themselves by how many generations they are removed from Hegel–that is, in being taught by a teacher who was taught by a teacher who was taught by a teacher who was taught by Hegel at the University of Berlin. Likewise, statisticians can score themselves by how many generations they are from Fisher. Quinn McNemar, for example, of Stanford University, was an important teacher of psychologists who had himself studied statistical methods at Stanford with Harold Hotelling, the chief American disciple of Fisher. Hotelling had worked directly with Fisher. McNemar then taught L.G. Humphreys, Allen Edwards, David Grant, and scores of others. As early as 1935 all graduate students in psychology at Stanford, following the model of Iowa State, were required to master Fisher’s crowning achievement, analysis of variance. Already by 1950, Gigerenzer et al.&nbsp;reckon, about half of the leading departments of psychology required training in Fisherian methods.
</p>
<p>
Even rebels against Fisher were close to him, starting with [William Sealy] Gosset himself. Palmer Johnson of the University of Minnesota studied with Fisher in England, though he later had the bad taste to write articles with Fisher’s erstwhile colleague and eternal enemy Jerzy Neyman, whom Fisher had cast into outer darkness. George Snedecor, an agricultural scientist at Iowa State University at Ames, was a cofounder of the first department of statistics in the United States. His important book <em>Statistical Methods</em> was influenced directly by Fisher himself, who somewhat surprisingly was in the 1930s a visiting professor of statistics at Iowa State. One can think of the Iowa schools then [1940s and 1950s] as one thinks of London’s Gower Street in the 1920s and 1930s–a crucial crossroads of statistical methods and training. In a eulogy for S.S. Wilks, a student in the late 1920s of Henry L. Rietz and Allen T. Craig at the University of Iowa, Frederick Mosteller said that Iowa was then “the center of statistical study in the United States of America”. Rietz, Craig, and Wilks worked closely with Fisher. E.F. Lindquist, the American leader of standardized testing for educators, also of the University of Iowa, was deeply influenced by Snedecor. Lindquist invented the Iowa Test of Basic Skills for schoolchildren. He too spent time with the great man.
</p>
<p>
Some psychologists knew about the work of Neyman and Pearson and some even about that of the Bayesian Harold Jeffreys. But textbook authors, editors, and teachers–inspirited by Fisher’s promise of raising their fields to the level of hard science–helped Fisher win the day. Statistical education narrowed at the same time as it spread. Decision theory and inverse probability, and Gosset’s views on substantive significance, alternative hypotheses, and power, were pushed aside. Too introspective for the hard-boiled.”
</p>
</span></blockquote>
<p>It seems as if Fisher’s mechanization of statistical significance is what ultimately enabled <em>statistics</em> to branch out as its own field of study (and that it took place in Iowa is a fun fact). It makes you wonder how this separation affected the subsequent growth of scientific inquiry, results, and knowledge by disrupting the synergy between the intuition held by the practitioner and the intricacies of statistical nuance. While the popular notion of “playing in everyone’s backyard” is commonly portrayed as an advantage (and it is pretty cool), upon closer reflection, it might be a fundamental issue. <a href="https://en.wikipedia.org/wiki/William_Sealy_Gosset">William Sealy Gosset</a>, a.k.a. <em>Student</em>, the inventor of the <a href="https://en.wikipedia.org/wiki/Student%27s_t-test"><em>t-test</em></a>, was first and foremost a brewer of Guinness beer, and clearly prioritized substantive meaning:</p>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 14px; color: #2e5c46">“Fisher, not the great transcendent, invented the 5 percent philosophy. By contrast, Gosset’s economic approach to uncertainty prevented him from being able to stop thinking at .05 for fear he’d lose too much information, and profits.”</span><sub>1</sub></p>
</blockquote>
<blockquote class="blockquote">
<p><span style="font-family: Garamond, serif; font-size: 14px; color: #2e5c46">“World War I had been under way for more than a year when Gosset–who wanted to serve in the war but was rejected because of nearsightedness–wrote to his elderly friend, the great Karl Pearson: ‘My own war work is obviously to brew Guinness stout in such a way as to waste as little labor and material as possible, and I am hoping to help to do something fairly creditable in that way.’ It seems he did.”</span><sub>1</sub></p>
</blockquote>
<p>He had a problem to solve: <em>“to brew the best tasting stout at a satisfying price”</em>. My takeaway: be like Gosset.</p>
</section>
<section id="sidenotes" class="level1">
<h1>Side notes</h1>
<ol type="1">
<li>I don’t think this has much to do with <em>statistical</em> advancement, but rather the experience of observing its implications over time. <br><br></li>
<li>By <em>error</em>, I’m talking about the inevitable consequences of statistical analysis in the real-world. Data is messy and inaccurate, samples contain unintended biases and nuances, and estimation methods always produce a much more simplified version of reality. It probably doesn’t need to be repeated, but as George Box <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">famously said</a>, <em>“all models are wrong, some are useful”</em>. <br><br></li>
<li>In the article, they defined <em>COVID-19 misinformation</em> as <em>“assertions unsupported by or contradicting US Centers for Disease Control and Prevention (CDC) guidance on COVID-19 prevention and treatment during the period assessed or contradicting the existing state of scientific evidence for any topics not covered by the CDC”</em>. <br><br></li>
<li>To give them the benefit of the doubt, they also use a “certainty of evidence” criterion in their decision making, which is meant to rate the confidence they have in the result with respect to estimation accuracy, risk of bias, etc. However, the conclusion that there is <em>“no effect”</em> seems questionable, to say the least, and the claim that suggesting otherwise is <em>misinformation</em> is asinine. <br><br></li>
<li>Search for the <em>‘Quinn is dead’</em> quote below for another intuitive example of the <em>fallacy of the transposed conditional</em>. <br><br></li>
<li>A couple of other points on Bayesian modeling. First, on sample size. The number of samples needed to estimate something is <em>N=0</em>. That is, I can get parameter estimates solely based on the prior distributions that are driven by what is already known. Thinking of it this way, the data becomes secondary to the model, and is merely collected as a way to nudge parameters one way or another as more of it comes in. The <em>model</em> always exists, relaying the best available information at that point in time, and I don’t need to wait to cross arbitrary sample size thresholds in order to obtain my estimates. This seems to naturally lend itself better to the scientific process. Second, a criticism of Bayesian modeling is that it is too subjective because individual judgement is being used to inform prior distributions. However, I see this as an unequivocal strength. Frequentist methods (and noninformative priors) are not “objective”. They carry assumptions that we probably wouldn’t see as realistic; it is just convenient to use them. In that sense, they become <em>more</em> arbitrary than utilizing pre-existing knowledge. There is an excellent <a href="https://learnbayesstats.com/episode/45-biostats-clinical-trial-design-frank-harrell/">podcast episode</a> where this is discussed.</li>
</ol>
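<p>As a rough sketch of the <em>N=0</em> point in the last note, here is a conjugate Beta-Binomial model in <code>R</code>. The Beta(2, 8) prior, the made-up event counts, and the helper name <code>posterior_summary</code> are all assumptions chosen purely for illustration; they don’t come from the book or the podcast.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical prior for an event rate: Beta(2, 8), i.e. a prior mean of 0.2
prior_a &lt;- 2
prior_b &lt;- 8

# Posterior mean and 95% credible interval after seeing `events` out of `n`
# (conjugate update: add events to a, non-events to b)
posterior_summary &lt;- function(events, n, a = prior_a, b = prior_b) {
  post_a &lt;- a + events
  post_b &lt;- b + (n - events)
  c(
    mean = post_a / (post_a + post_b),
    lower = qbeta(0.025, post_a, post_b),
    upper = qbeta(0.975, post_a, post_b)
  )
}

# N = 0: the estimate comes entirely from the prior
posterior_summary(events = 0, n = 0)

# Made-up data arriving over time nudges the estimate away from the prior
posterior_summary(events = 3, n = 10)
posterior_summary(events = 30, n = 100)</code></pre></div>
</details>
</div>
<p>The estimate exists before any data is collected, and each new batch only shifts it and narrows the interval. There is no sample size threshold at which the model suddenly becomes usable.</p>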
</section>
<section id="favoritequotes" class="level1">
<h1>My favorite quotes</h1>
<p>These are my favorite quotes and passages from <a href="https://press.umich.edu/Books/T/The-Cult-of-Statistical-Significance2">the book</a>:</p>
<ul>
<li><em>“The sizeless scientists have adopted a method of deciding which numbers are significant that has little to do with humanly significant numbers…Imagine that you and your infant child are standing on a sidewalk near a busy street. You have just purchased a hot dog from the street vendor and have safely crossed the street. Scenario 1: You suddenly realize you have forgotten the mustard and if you scurry across the busy street, dodging vehicles, there is a 95% probability you’ll return safe with your mustard. Scenario 2: You forgot your child and you watch as she tries to cross the street herself, if you scurry across the busy street, dodging vehicles, there is a 95% probability you’ll return safe with your child. The sizeless scientist in effect declares ‘they are equally important reasons for crossing the street’”</em> (chapter 0, page 10) <br><br></li>
<li><em>“…since the arrival of the desktop computer with its ability to invert big matrices at the punch of a key, ‘checking’ on sampling variability effortlessly…electronic computation of statistical significance has cheapened to near zero…‘Decision’ has become socialized and bureaucratized–heedless of the social margins.”</em> (chapter 0, page 13) <br><br></li>
<li><em>“It’s hard to do, unlike calculating t-statistics, which is a simpleton’s parlor game. But actual science at the frontier is supposed to be difficult. If it wasn’t, you wouldn’t be at the frontier.”</em> (chapter 0, page 16) <br><br></li>
<li><em>“…some cause of natural selection may have a high probability of replicability in additional samples but be trivial. Yet a cause may have a low probability of replicability but be important. This is what we mean when we say that a test of significance is neither necessary nor sufficient for a finding of importance”</em> (chapter 1, page 26) <br><br></li>
<li><em>“Unreasoning anger is a quite common reaction to challenges to the Fisherian orthodoxy.”</em> (chapter 1, page 31) <br><br></li>
<li><em>“Significance unfortunately is a useful means toward personal ends in the advance of science…Precision, knowledge, and control. In a narrow and cynical sense statistical significance is the way to achieve these. Design experiment. Then calculate statistical significance. Publish articles showing ‘significant’ results. Enjoy promotion.”</em> (chapter 1, page 32) <br><br></li>
<li><em>“An arbitrary level of statistical significance is the only standard in force–regardless of size, of loss, of cost, of ethics, of scientific persuasiveness. That is, regardless of oomph.”</em> (chapter 2, page 41) <br><br></li>
<li><em>“Gosset’s economic approach to uncertainty prevented him from being able to stop thinking at .05 for fear he’d lose too much information, and profits…[Fisher] turned away from Gosset and sought a mechanical, uniform, and bureaucratic line of demarcation–an ‘impenetrable’ end, to scientific argument. So the insecure sciences, eager to establish an ‘objective basis’ for their research ‘communicable to other rational minds’, were pleased and materially rewarded by Fisher’s 5 percent philosophy…With the low fee he set for them to rise to the rank of Sciences with a big S…”</em> (chapter 3, page 46) <br><br></li>
<li><em>“Fisher’s procedure appeals to scientists uncomfortable with any sort of argument…To avoid debate they seek certitude such as statistical significance. The unhappy result is that mere opinion and unargued crankery are <strong>more</strong> likely to rule the sizeless sciences, not less…A technique that was supposed to end arguments has in fact merely concealed the arguments behind a facade of testing that does not test.”</em> (chapter 3, page 47) <br><br></li>
<li><em>“‘The goal of an empirical economist should not be to determine the truthfulness of a model but rather the domain of its usefulness’ [Edward Leamer]”</em> (chapter 3, page 52) <br><br></li>
<li><em>“Ten million tests of significance, in economics, done annually. If the ten million tests were in fact as conclusive as their own rhetoric requires, whether accepting or rejecting, then nearly every issue in economics would long since have been settled. By now there would therefore be far fewer tests per year, not, as is the case, more and more.”</em> (chapter 3, page 53) <br><br></li>
<li><em>“Real scientists draw a line between what is large and small.”</em> (chapter 3, page 54) <br><br></li>
<li><em>“Real science, unlike significance-testing science, is difficult. If it were not, it would not be real science, but instead it would be already established routine. Real science asks you to make real scientific judgements and real scientific arguments within a community of other scientists. It asks you to be quantitatively persuasive, not to be irrelevantly mechanical. Life is hard.”</em> (chapter 3, page 55) <br><br></li>
<li><em>“…seems to be today’s prepublication attitude: merely increase the N [sample size] to get a still lower [standard error]…Notice the implication of such reasoning. It implies that something must be very wrong with the notion that statistical significance is <strong>necessary</strong> for substantive significance, a preliminary screen in which one puts one’s data.”</em> (chapter 5, page 67) <br><br></li>
<li><em>“She can test her belief in the price effect by looking at the magnitudes, using, for example, the highly advanced technique common in data-heavy articles in physics journals: ‘interocular trauma’. That is, she can look and see if the result hits her between the eyes.”</em> (chapter 5, page 72) <br><br></li>
<li><em>“‘Pushing’ an economically large <strong>though noisily estimated</strong> effect is not a misuse–or a ‘stretch’ of professional ethics. It is precisely the ethical thing to do. To argue otherwise is to fall into the mistaken belief that statistical significance <strong>can</strong> provide a screen through which the results can be put, to be examined then for <strong>substantive</strong> significance if they make it through the significance screen.”</em> (chapter 7, page 86) <br><br></li>
<li><em>“‘Young people have to have careers’ [former editor of the American Economic Review]”</em> (chapter 8, page 89) <br><br></li>
<li><em>“Any scientific hypothesis is a matter of being close enough. The decisions the scientist makes on what constitutes ‘closeness’ ‘depend entirely on the special purposes of the investigator’.”</em> (chapter 8, page 97) <br><br></li>
<li><em>“Real scientific tests are always a matter of how close to zero or how close to large or how close to some parameter value, and the standard of how close must be a substantive one, inclusive of tolerable loss.”</em> (chapter 9, page 98) <br><br></li>
<li><em>“…‘the overall benefit-cost ratio for the Employer Experiment is 4.29, but it is not statistically different from zero. The benefit-cost ratio for white women…however, is 7.07, and is statistically different from zero…The Employer Experiment affected only white women.’ The 7.07 ratio <strong>affects</strong>, they said, the 4.29 did not. This is a mistake. The best guess of the researchers was that the state got $4.29 for every dollar spent. The estimate was fuzzy, speaking of random sampling error alone. But that <strong>does not mean it is to be taken as zero</strong>.”</em> (chapter 9, page 99) <br><br></li>
<li><em>“Notice the respect for the approximate nature of social statistics in his very phrasing of ‘around 0.4’ instead of the 0.40768934 that his computer undoubtedly spewed out.”</em> (chapter 9, page 101) <br><br></li>
<li><em>“Real science changes one’s mind. That’s one way to see that the proliferation of unpersuasive significance tests is not real science.”</em> (chapter 9, page 101) <br><br></li>
<li><em>“At high sample sizes, all null hypotheses are rejected, by mathematical fact, without having to look at the data. No magic of instrumental variables is going to change that.”</em> (chapter 9, page 104) <br><br></li>
<li><em>“‘Caution, common sense, and patience…are quite likely to keep [the experimenter] more free from error…than the man of little caution and common sense who guides himself by a mechanical application of sampling rules. He will be more likely to remember that there are sources of error more important than fluctuations of sampling.’”</em> (chapter 10, page 114) <br><br></li>
<li><em>“‘It is possible for a result to be useful and possess wide standard error. A result obtained by definitions and techniques drawn up with care, and carried out by excellent interviewing and supervision may have wide standard error because the sample was small; yet such a result might be well preferable to one obtained with a bigger sample, with a smaller standard error, but whose definitions, techniques, and interviewing were out of line with best practice and knowledge of the subject matter.’ [W. Edwards Deming]”</em> (chapter 10, page 117) <br><br></li>
<li><em>“It’s embedded like a tax code in the bureaucracy of science.”</em> (chapter 11, page 124) <br><br></li>
<li><em>“…why actually replicate when the logic of Fisherian procedures gives you a virtual replication without the bother and expense? Why not go ahead and use the alloys F1 and F2 in airplanes? After all, p&lt;.05.”</em> (chapter 11, page 127) <br><br></li>
<li><em>“In denying the plurality of overlapping hypotheses, the Fisherian tester asks very little of the data. She sees the world through the lens of one hypothesis–the null.”</em> (chapter 12, page 133) <br><br></li>
<li><em>“If you are a Fisherian, the fact of a large sample becomes your problem. You’re deluded, thinking you’ve proved oomph before you’ve considered what it is.”</em> (chapter 12, page 135) <br><br></li>
<li><em>“It always depends on the loss, measured in side effects, treatment cost, death rates. The loss to a cool, scientific, impartial spectator will not be the same as the loss to the patient in question…[the balance between Type I/II errors] ‘must be left to the patient, friends, and family’.”</em> (chapter 12, page 137) <br><br></li>
<li><em>“Designing experiments to find the maximal and minimal effect size is a better way to get powerful results and to keep the focus where it should be, on the effect size itself…[William Sealy Gosset]: ‘We tend to think of effect size (when we think of it at all) as a fixed and immutable quantity that we attempt to detect. It may be more useful to think of effect size as a manipulable parameter that can, in a sense, be made larger through greater measurement accuracy.’”</em> (chapter 12, page 139) <br><br></li>
<li><em>“Some psychologists knew about the work of Neyman and Pearson and some even about that of the Bayesian Harold Jeffreys. But textbook authors, editors, and teachers–inspirited by Fisher’s promise of raising their fields to the level of hard science–helped Fisher win the day. Statistical education narrowed at the same time as it spread. Decision theory and inverse probability, and Gosset’s views on substantive significance, alternative hypotheses, and power, were pushed aside. Too introspective for the hard-boiled.”</em> (chapter 13, page 143) <br><br></li>
<li><em>“Fisher wrote in 1955, ‘In the US also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money’. Notice the sneer by the new aristocracy of merit, as the clerisy fancied itself. Bourgeois production and money making, Fisher avers, are <strong>not</strong> the appropriate currencies of science.”</em> (chapter 13, page 145) <br><br></li>
<li><em>“Early on in an elementary statistics or psychometrics or econometrics book there might appear a loss function–‘what if it rains the day of the company picnic?’. But the loss function disappears when the book gets down to producing a formula for science.”</em> (chapter 13, page 146) <br><br></li>
<li><em>“Power, simulation, a variety of experiments, triangulation, actual replication, and exploratory data analysis leading to interocular trauma from the effect of magnitudes are different modes of affirming the consequent and are more generally a reasonable program of Gosset or Bayesian and Feynman confirmationism than is the dogma of Fisherian or Popperian falsificationism.”</em> (chapter 13, page 153) <br><br></li>
<li><em>“The Fisher test can shed light on the probability that ‘Quinn is dead’ given that ‘Quinn was hanged’. What the Fisher test wants to know and claims to measure is the opposite, the probability that Quinn was hanged, given that Quinn is dead…this probability is close to zero…In a nonhanging society people die for many reasons other than hanging…therefore being dead is very weak evidence indeed that Quinn was hanged…Being dead is ‘consistent with’ the hypothesis that Quinn was hanged as the positivist rhetoric of the Fisherian argument emphasizes. But so what? A myriad of other hypotheses…such as catching pneumonia or breaking your neck in a fall from your horse, are also consistent with it–‘it’ being the fact of being dead.”</em> (chapter 14, page 155) <br><br></li>
<li><em>“One of us has an elderly aunt who can sit in the garden of a hot, Indiana summer evening untouched by mosquitoes. She chalks up her immunity to a side effect of a ‘nuclear treatment’ received at midcentury to attack a tumor…Well, who’s to deny her? Medical science since the arrival of Fisher’s methods has had a problem with narrative…people believed that the use of p’s and t’s in the design and evaluation of clinical trials would mark an advance over old wives’ tales, crankery, anecdote, folkways, and fast-talking patent medicine salesmen. The dream of mechanization was as compelling in medicine as it was in war, social work, and philosophy of mind…‘Let the table decide’. At 5 percent the medical scientists suddenly submitted eyes locked hard in a sizeless stare. But the new method is just a mutation of old husbands’ tales, statistical crankery, probabilistic anecdote, scientific folkways, and fast-talking, twenty-first-century, statistical patent medicine salesmen.”</em> (chapter 14, page 160) <br><br></li>
<li><em>“Even the rare courageous Fisherians do not deign to make a case for their procedures. They merely complain that the procedures are being criticized…being comfortably in control, appear inclined to leave things as they are…If you don’t have any arguments for an intellectual habit of a lifetime perhaps it is best to keep quiet”</em> (chapter 15, page 169) <br><br></li>
<li><em>“If one can see or hear the problem, one does not need to rely on correlations…doctors have lost many of their skills of physical assessment, even with the stethoscope (and certainly with their hands) and have come to rely on a medical literature deeply infected with Fisherianism.”</em> (chapter 15, page 175) <br><br></li>
<li><em>“The Fisherian tests of significance, the only tests employed by the original authors of the seventy-one studies, literally could not see the beneficial effects of the therapies under study, though staring at them.”</em> (chapter 16, page 179) <br><br></li>
<li><em>“The ‘sunshine herb’ [St.&nbsp;John’s wort] is frequently under attack (perhaps, one suspects, because it seems to be a cheap substitute for drugs)…the authors…concluded from the p-value that St.&nbsp;John’s-wort is not clinically effective. Doesn’t help, they said.”</em> (chapter 16, page 182) <br><br></li>
<li><em>“‘…They were made on different days at different hours. They all relate to the same nest’. Since Edgeworth had collected his own data, he knew his observations intimately; for example, he controlled exactly for nest and time-of-day heterogeneity, reducing error in observations that cannot be matched with a mere test of statistical significance on a data set downloaded from the Internet, no matter how mathematically advanced the ‘correction’.”</em> (chapter 17, page 189) <br><br></li>
<li><em>“Statistical significance can indicate the likelihood of the presence of an effect…But…so what?…Hoover and Siegler want to assign the responsibility to a man they call ‘practical’. Shades of Fisher: the scientist is replaced by a mechanical puppet who acknowledges a signal at p=.05, and the puppet–not the scientist who knows why it might matter–is called ‘practical’.”</em> (chapter 17, page 191) <br><br></li>
<li><em>“Statistics was not by any means the primary science on the Gower Street agenda. Biometry, but especially eugenics, was…Pearson’s papers and the archives of the Biometric and Galton labs survive. One finds in them the ephemera of a scientific racism common to the age, and to which Galton, Pearson, and Fisher were leading contributors…Value judgements–arguments about the arguments–and Gosset’s personal probability, were to be kept out of the neighborhood of their new sciences. Pearson would write in the 1920s against Jewish migration to Britain, and Fisher would write in the 1930s against material relief for poor people and literally in favor of relief for the rich on eugenic grounds. Such stuff was in the air…”</em> (chapter 18, page 199) <br><br></li>
<li><em>“An early case, applied to the eggs of the cuckoo bird, illustrates literally the feel of substantive as against statistical significance.”</em> (chapter 19, page 203) <br><br></li>
<li><em>“There are ways other than getting inside the mind of the victim to know what matters to her. For instance, one could measure with some difficulty and sacrifice (but good science is difficult and sacrificial)…”</em> (chapter 19, page 205) <br><br></li>
<li><em>“But Gosset in this study and others often found z or t beside the point. ‘You want to be able to say ‘if farmers [or whomever] in general do this [i.e., follow a certain experimental method] they will make money by it’’. A criterion of merely statistical significance could not satisfy such taste.”</em> (chapter 20, page 209) <br><br></li>
<li><em>“‘Fisher was vague. Karl Pearson was vague. Egon Pearson vague. Neyman vague. Fisher and Neyman were fiery. Silly! Egon Pearson was on the outside. They were all jealous of one another, afraid somebody would get ahead. Gosset didn’t have a jealous bone in his body. He asked the question [about power and alternative hypotheses]. Egon Pearson to a certain extent rephrased the question which Gosset had asked in statistical parlance. Neyman solved the problem mathematically.’ [Florence Nightingale David]”</em> (chapter 20, page 211) <br><br></li>
<li><em>“‘There must be essential similarity to ordinary practice…Experiments must be so arranged as to obtain the maximum possible correlation [not the maximum possible statistical significance] between figures which are to be compared [like Leamer and other oomph-ful scientists, Gosset thought in terms of upper and lower bound estimates, best and worst case scenarios]…Repetitions should be so arranged as to have the minimum possible correlation between repetitions (or the highest possible negative correlation)…There should be economy of effort [net pecuniary advantage in the 1905 sense]’ [Student (William Sealy Gosset)]. Fisher shrugged. The economic approach to the design of experiments was too difficult. He never did try Gosset’s way.”</em> (chapter 21, page 216) <br><br></li>
<li><em>“An ethical life of science seems to require an emotional life outside of it. ‘…he [Fisher] is glad to discuss…things early in the morning or late at night. But he is not glad or even willing to have others work on the purely theoretical aspects of his work. He expects others to accept his discoveries without even questioning them. He does <strong>not</strong> admit that anything he ever said or wrote was wrong. But he goes much further than that. He does not admit even that the <strong>way</strong> he said anything or the nomenclature he used could be improved in any way.’ [Raymond Birge]. Birge told Deming that Fisher was the most conceited man he ever met.”</em> (chapter 21, page 222) <br><br></li>
<li><em>“‘Though recognizable as a psychological condition of reluctance, or resistance to the acceptance of a proposition, the feeling induced by a test of significance has an objective basis in that the probability statement on which it is based is a fact communicable to and verifiable by, other rational minds. The level of significance in such cases fulfils the conditions of a measure of the rational grounds for the disbelief it engenders.’ [R.A. Fisher]”</em> (chapter 21, page 223) <br><br></li>
<li><em>“To evaluate size matters/how much would have forced Fisher to listen to and cooperate with others. Determining whether something matters to people depends on actually listening to people, as a heart surgeon listens to a radiologist, as a beer brewer listens to a customer. Admitting that size matters would have required Fisher to admit that regression coefficients ‘are capable of evaluation in any currency’. It would have put him in the unhappy position of having to communicate with others about the meaning of his findings. This, we have shown, he would not do.”</em> (chapter 21, page 224) <br><br></li>
<li><em>“Scientists, Fisher said, should ‘not assume’ their research is ‘capable of evaluation’. They must not work to ‘maximize profit’, he said in 1955, only for ‘faith’–a secular faith, he means, in the possibility that another mechanically calculated output of p-values by themselves could contribute to scientific progress. The scientist should not worry…whether their samples are random: just test, test, test, <strong>as if</strong> random. A 5 percent level of Type I error is, when ‘formally’ considered, says Fisher, the final judge of Science.”</em> (chapter 21, page 226) <br><br></li>
<li><em>“It is our experience that the more training a person has undergone in Fisherian methods the less easy it is for her to grasp our very elementary point…People who are highly trained in conventional economics have an especially difficult time. Most of them have no idea what we are talking about, though they are sure they do not approve. By contrast, undergraduates who have never had a statistics course, science and engineering professionals we work with or meet in our travels, businesspeople, musicians, activists, various colleagues in nonstatistical fields…as soon as they are able to grasp that we are <strong>not</strong> attacking statistics as such…these have no difficulty understanding our point and immediately begin wondering what the controversy is about.”</em> (chapter 23, page 239) <br><br></li>
<li><em>“One can take null-hypothesis significance testing as a sort of astrology, giving ‘decisions’ mechanically, justified within the system of astrology itself…Fisherianism is <strong>bad</strong> input, straightforwardly misleading advice, erroneous astrology. Misleading advice is not made into good advice merely by its mechanical and pecuniary cheapness.”</em> (chapter 23, page 241/242) <br><br></li>
<li><em>“‘Adherence to the rules originally conceived as a means, becomes transformed into an end-in-itself’ [Robert Merton]. That seems about right: statistical significance, originally conceived as a means to substantive significance, became transformed by Fisher and then by bureaucracies of science into an end in itself. A t-tested certified fact will be ‘equally convincing to all rational minds, irrespective of any intentions they may have in utilizing knowledge inferred’.”</em> (chapter 23, page 243) <br><br></li>
<li><em>“If we were to assemble our socioeconomic observations into a single chain of thought its strongest link would be coupling Merton’s ‘bureaucracy’ with Hayek’s ‘scientism’. Scientism describes, ‘of course, an attitude which is decidedly unscientific in the true sense of the word, since it involves a mechanical and uncritical application of habits of thought to fields different from those in which they have been formed. The scientistic as distinguished from the scientific view is not an unprejudiced but a very prejudiced approach which, before it has considered its subject, claims to know what is the most appropriate way of investigating it’. [Hayek]. The trick is to unshackle the bureaucracy of scientism, to break its mechanical rules, change its prejudice incentives, create new rituals, train capacity. No simple trick.”</em> (chapter 23, page 244) <br><br></li>
<li><em>“They need to acquire the virtues necessary for performing repeated experiments on the same material. They need to hear that random error is one out of many dozens of errors and seldom the biggest.”</em> (chapter 24, page 246) <br><br></li>
<li><em>“In science, as against careerism or pure mathematics, it is better to be approximately correct and scientifically relevant than it is to be precisely correct but humanly irrelevant. Not even the fully specified power function, balancing the risk of errors from random sampling, provides a full solution to a scientific problem. In truth, as Kruskal never tired of remarking, statistical ‘significance’ poses no scientific problem at all. With the aid of a personal computer and a grant such significance is easy to achieve.”</em> (chapter 24, page 246) <br><br></li>
<li><em>“Statistical scientists can teach substance without sacrificing the rigor they so passionately seek. Real rigor will <strong>rise</strong> with increased attention to substance.”</em> (chapter 24, page 247) <br><br></li>
<li><em>“The textbooks are wrong. The teaching is wrong. The seminar you just attended is wrong. The most prestigious journal in your scientific field is wrong…Science is mainly a series of approximations to discovering the sources of error. Science is a systematic way of reducing wrongs or can be.”</em> (chapter 24, page 251) <br><br></li>
<li><em>“Perhaps you feel frustrated by the random epistemology of the mainstream but don’t know what to do. Perhaps you’ve been sedated by significance and lulled into silence. Perhaps you sense that the power of a Rothamsted test against a plausible Dublin alternative is statistically speaking low but are dazzled by the one-sided rhetoric of statistical significance. Perhaps you feel oppressed by the instrumental variable one should dare not to wield. Perhaps you feel frazzled by the ‘social psychological rhetoric of fear’ that keeps the abuse of significance in circulation. You want to come out of it. But perhaps you are cowed by the prestige of Fisherian dogma. Or, worse thought, perhaps you are cynically willing to be corrupted if it will keep a nice job. Repent, we say. Embrace your inner Gosset…‘Who are you going to believe–us or your own lying eyes?’”</em> (chapter 24, page 251)</li>
</ul>
</section>
<section id="references" class="level1">
<h1>References</h1>
<ol type="1">
<li>Deirdre McCloskey, Steve Ziliak. <a href="https://press.umich.edu/Books/T/The-Cult-of-Statistical-Significance2"><em>The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives</em></a>. University of Michigan Press. 2008. https://doi.org/10.3998/mpub.186351 (subtitle quote: chapter 10, page 112) <br><br></li>
<li>Sule S, DaCosta MC, DeCou E, Gilson C, Wallace K, Goff SL. <a href="https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2808358">Communication of COVID-19 Misinformation on Social Media by Physicians in the US</a>. JAMA Netw Open. 2023;6(8):e2328928. doi:10.1001/jamanetworkopen.2023.28928 <br><br></li>
<li>Roman YM, Burela PA, Pasupuleti V, Piscoya A, Vidal JE, Hernandez AV. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8394824/">Ivermectin for the Treatment of Coronavirus Disease 2019: A Systematic Review and Meta-analysis of Randomized Controlled Trials.</a> Clin Infect Dis. 2022 Mar 23;74(6):1022-1029. doi: 10.1093/cid/ciab591. PMID: 34181716; PMCID: PMC8394824. <br><br></li>
<li>Diaz GA, Parsons GT, Gering SK, Meier AR, Hutchinson IV, Robicsek A. <a href="https://jamanetwork.com/journals/jama/fullarticle/2782900">Myocarditis and Pericarditis After Vaccination for COVID-19</a>. JAMA. 2021;326(12):1210–1212. doi:10.1001/jama.2021.13443 <br><br></li>
<li>Cohen, J. (1994). <a href="https://doi.org/10.1037/0003-066X.49.12.997">The earth is round (p &lt; .05)</a>. American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997</li>
</ol>


<!-- -->

</section>

 ]]></description>
  <category>History</category>
  <category>Philosophy</category>
  <category>Research</category>
  <category>Statistical Significance</category>
  <guid>https://www.zajichekstats.com/post/statistical-significance-is-insignificant/</guid>
  <pubDate>Fri, 22 Dec 2023 06:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/statistical-significance-is-insignificant/feature.png" medium="image" type="image/png" height="145" width="144"/>
</item>
<item>
  <title>The overlap weight in survival analysis</title>
  <dc:creator>Alex Zajichek</dc:creator>
  <link>https://www.zajichekstats.com/post/the-overlap-weight/</link>
  <description><![CDATA[ 




<p>I was recently introduced to <a href="https://jamanetwork.com/journals/jama/article-abstract/2765748">overlap weighting</a>, which is part of a general family of methods for balancing covariates when estimating treatment effects with observational data. Specifically, it focuses on the population in <a href="https://www.sciencedirect.com/topics/medicine-and-dentistry/clinical-equipoise#:~:text=Clinical%20equipoise%20is%20defined%20as,Seminars%20in%20Vascular%20Surgery%2C%202022"><em>clinical equipoise</em></a>; that is, the patients for whom the treatment decision is most uncertain. I find it more elegant than <a href="https://en.wikipedia.org/wiki/Propensity_score_matching">matching</a> (which I’ve <a href="https://jamanetwork.com/journals/jama/fullarticle/2749478">used in the past</a>), and figured a useful way to better understand the approach would be to deconstruct, interpret, and translate a <a href="http://www2.stat.duke.edu/~fl35/OW/OW_survival_Demo.sas">SAS simulation</a> in the context of survival analysis implemented by the <a href="https://pubmed.ncbi.nlm.nih.gov/30189042/">original authors</a>. All code is written in (and translated to) <a href="https://www.r-project.org/"><code>R</code></a>. We’ll start by loading some packages.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(survival)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(reactable)</span></code></pre></div>
</details>
</div>
<section id="table-of-contents" class="level1">
<h1>Table of Contents</h1>
<ul>
<li>Setting up the potential outcomes
<ul>
<li>Simulate some patients</li>
<li>Define the treatment propensity</li>
<li>Generate the treatment assignment</li>
<li>Compute the (true) overlap weight</li>
<li>Set the treatment effect</li>
<li>Assign the observed outcome</li>
</ul></li>
<li>Estimating the weights
<ul>
<li>Model the propensity scores</li>
<li>Calculate the (estimated) overlap weight</li>
</ul></li>
<li>Estimating the treatment effect
<ul>
<li>A look at the true hazard ratio</li>
<li>The final estimate</li>
</ul></li>
</ul>
</section>
<section id="potentialoutcomes" class="level1">
<h1>Setting up the potential outcomes</h1>
<p>The theoretical underpinnings of <a href="https://pubmed.ncbi.nlm.nih.gov/30189042/">overlap weighting</a> live in the context of the <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5841618/#:~:text=The%20potential%20outcomes%20framework%20provides%20a%20way%20to%20quantify%20causal,exposure%20or%20intervention%20under%20consideration.">potential outcomes</a> paradigm of <a href="https://www.sciencedirect.com/topics/social-sciences/causal-inference">causal inference</a>. Basically, we wonder what <em>would have</em> happened had we been able to observe each patient under both treatments (known as the <a href="https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-5-28">counterfactual</a>). If we knew the outcomes from the two worlds, the causal treatment effect would simply be the average difference across all patients. The problem, of course, is that in reality the outcome can only be observed for the treatment to which the patient was assigned (or that they chose), and so our goal is to work with the “incomplete” information that we do have to try to estimate what the outcome difference would have been had the counterfactual been observed as well.</p>
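<p>To make the idea concrete before building the real simulation, here is a minimal toy sketch with a continuous outcome (not the survival setup used in the rest of this post). Every name and number in it (<code>n_sketch</code>, <code>x</code>, <code>y0</code>, <code>y1</code>, the constant effect of 2) is a made-up assumption for illustration only.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)
n_sketch &lt;- 1000

# Hypothetical patient characteristic that affects both treatment and outcome
x &lt;- rnorm(n_sketch)

# Both potential outcomes for every patient (never fully visible in reality)
y0 &lt;- 1 + x + rnorm(n_sketch) # outcome if untreated
y1 &lt;- y0 + 2                  # outcome if treated (assumed constant effect of 2)

# Patients with larger x are more likely to be treated
treated &lt;- rbinom(n_sketch, size = 1, prob = plogis(x))

# Only the outcome under the received treatment is observed
y_obs &lt;- ifelse(treated == 1, y1, y0)

# Causal effect if both worlds could be seen: the average difference
true_effect &lt;- mean(y1 - y0)

# Naive comparison of observed groups, which mixes the effect with confounding
naive_effect &lt;- mean(y_obs[treated == 1]) - mean(y_obs[treated == 0])

c(true = true_effect, naive = naive_effect)</code></pre></div>
</details>
</div>
<p>Because treated patients tend to have larger <code>x</code>, the naive difference overstates the effect in this toy setup. Closing that kind of gap using only the observed data is the job of the weighting methods.</p>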
<section id="simulatepatients" class="level2">
<h2 class="anchored" data-anchor-id="simulatepatients">Simulate some patients</h2>
<p>The first thing we need to do is generate some (fake) patients to facilitate the simulation. Here we’ll focus on age, sex, and income as patient-identifying characteristics.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the sample size</span></span>
<span id="cb2-2">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span></span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the seed</span></span>
<span id="cb2-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Build the population</span></span>
<span id="cb2-8">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb2-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb2-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">zage =</span> (age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,</span>
<span id="cb2-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">male =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbinom</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prob =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>),</span>
<span id="cb2-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">income =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb2-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">zincome =</span> (income <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb2-15">  )</span>
<span id="cb2-16">population</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 5
     age    zage  male income zincome
   &lt;dbl&gt;   &lt;dbl&gt; &lt;int&gt;  &lt;dbl&gt;   &lt;dbl&gt;
 1  44.4 -0.560      0   36.5  -1.35 
 2  47.7 -0.230      1   44.2  -0.579
 3  65.6  1.56       1   41.4  -0.861
 4  50.7  0.0705     1   59.7   0.973
 5  51.3  0.129      0   56.2   0.619
 6  67.2  1.72       1   63.9   1.39 
 7  54.6  0.461      1   35.1  -1.49 
 8  37.3 -1.27       1   56.4   0.639
 9  43.1 -0.687      1   53.7   0.375
10  45.5 -0.446      1   53.7   0.370
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>We’ve generated a sample of 10000 patients. We’ll assume <code>age</code> is measured in years, <code>income</code> in thousands of dollars ($), and <code>male</code> is 1 for <em>Male</em> and 0 for <em>Female</em>. The columns <code>zage</code> and <code>zincome</code> are just <a href="https://statisticsbyjim.com/glossary/standardization/#:~:text=In%20statistics%2C%20standardization%20is%20the,standard%20deviation%20for%20a%20variable.">standardized</a> versions of the originals to avoid scaling nuances (we’ll refer to these as <img src="https://latex.codecogs.com/png.latex?age_z"> and <img src="https://latex.codecogs.com/png.latex?income_z">).</p>
</section>
<section id="treatmentpropensity" class="level2">
<h2 class="anchored" data-anchor-id="treatmentpropensity">Define the treatment propensity</h2>
<p>Here’s where the important theory starts to creep in (already). We assume that there is a <em>true</em> propensity score, say <img src="https://latex.codecogs.com/png.latex?p_i%5EA">, which is the <em>true</em> probability that patient <em>i</em> receives treatment <em>A</em> given their specific characteristics (let’s assume there are two possible treatments, <em>A</em> and <em>B</em>). Further, we assume (some transformation of) this probability is a linear combination of all the characteristics that confound the crude treatment effect on the outcome. In this simulation, we’ll assume the <a href="https://en.wikipedia.org/wiki/Logistic_regression">logit</a> model:</p>
<p><img src="https://latex.codecogs.com/png.latex?log(%5Cfrac%7Bp_i%5EA%7D%7B1%20-%20p_i%5EA%7D)%20=%200.41%20%5Ctimes%20age_z%20-%200.22%20%5Ctimes%20male%20-%200.69%20%5Ctimes%20income_z%20-%200.40"></p>
<p>So let’s add this <em>true</em> propensity score to the <code>population</code> data set:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb4-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  </span>
<span id="cb4-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the true propensity score (unknown quantity)</span></span>
<span id="cb4-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb4-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">log_odds_pA =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>zage <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>male <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>zincome,</span>
<span id="cb4-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pA =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>log_odds_pA))</span>
<span id="cb4-8">  )</span>
<span id="cb4-9">population</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 7
     age    zage  male income zincome log_odds_pA    pA
   &lt;dbl&gt;   &lt;dbl&gt; &lt;int&gt;  &lt;dbl&gt;   &lt;dbl&gt;       &lt;dbl&gt; &lt;dbl&gt;
 1  44.4 -0.560      0   36.5  -1.35        0.311 0.577
 2  47.7 -0.230      1   44.2  -0.579      -0.315 0.422
 3  65.6  1.56       1   41.4  -0.861       0.606 0.647
 4  50.7  0.0705     1   59.7   0.973      -1.27  0.219
 5  51.3  0.129      0   56.2   0.619      -0.777 0.315
 6  67.2  1.72       1   63.9   1.39       -0.888 0.292
 7  54.6  0.461      1   35.1  -1.49        0.595 0.644
 8  37.3 -1.27       1   56.4   0.639      -1.58  0.171
 9  43.1 -0.687      1   53.7   0.375      -1.16  0.238
10  45.5 -0.446      1   53.7   0.370      -1.06  0.257
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>Obviously in practice we don’t know what <img src="https://latex.codecogs.com/png.latex?p_i%5EA"> is, so it must be <em>estimated</em> from the treatment assignments we observe in the data. Additionally, we aren’t strictly required to assume the <a href="https://en.wikipedia.org/wiki/Logistic_regression">logit</a> model, but when we use it to estimate the propensity score for <a href="https://pubmed.ncbi.nlm.nih.gov/30189042/">overlap weighting</a>, it has the advantageous property of perfect balance. That is, the weighted-mean differences for all covariates included in the logistic regression model will be zero across the treatment groups.</p>
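<p>As a quick sanity check on the logit model, the sketch below recomputes the true propensity score by hand for the first simulated patient. The covariate values are the <em>rounded</em> ones printed in the table (so the result only agrees up to rounding), and it uses base R’s <code>plogis()</code> as an equivalent way to invert the logit.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Rounded covariate values for the first patient (copied from the printed output)
zage &lt;- -0.560
male &lt;- 0
zincome &lt;- -1.35

# Linear predictor on the log-odds scale, using the same coefficients as the simulation
log_odds_pA &lt;- -0.4 + log(1.5) * zage + log(0.8) * male + log(0.50) * zincome

# plogis() is the inverse logit, i.e., 1 / (1 + exp(-x))
pA &lt;- plogis(log_odds_pA)

round(c(log_odds_pA = log_odds_pA, pA = pA), 2)</code></pre></div>
</div>
<p>The result should line up with the <code>log_odds_pA</code> and <code>pA</code> columns in the first row of the table, up to rounding of the inputs.</p>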
</section>
<section id="treatmentassignment" class="level2">
<h2 class="anchored" data-anchor-id="treatmentassignment">Generate the treatment assignment</h2>
<p>The treatment assignment is one of the <em>known</em> quantities we would observe in a real life sample, and that is what would be used as the dependent variable for <em>estimating</em> propensity scores. However, since we are running a simulation, we need to generate the treatment assignments from the governing distribution. We assume that the observed treatment for patient <em>i</em> is the result of an (unfair) coin flip that has a probability equal to the true propensity score. In statistical notation,</p>
<p><img src="https://latex.codecogs.com/png.latex?A_i%20%5Csim%20Bernoulli(p_i%5EA)"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?A_i"> is the indicator of treatment A. Let’s add a <em>realized</em> treatment assignment to the <code>population</code>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb6-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">  </span>
<span id="cb6-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the realized treatment assignment (known quantity)</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb6-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">A =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbinom</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prob =</span> pA),</span>
<span id="cb6-7">  )</span>
<span id="cb6-8">population</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 8
     age    zage  male income zincome log_odds_pA    pA     A
   &lt;dbl&gt;   &lt;dbl&gt; &lt;int&gt;  &lt;dbl&gt;   &lt;dbl&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt;
 1  44.4 -0.560      0   36.5  -1.35        0.311 0.577     1
 2  47.7 -0.230      1   44.2  -0.579      -0.315 0.422     0
 3  65.6  1.56       1   41.4  -0.861       0.606 0.647     1
 4  50.7  0.0705     1   59.7   0.973      -1.27  0.219     0
 5  51.3  0.129      0   56.2   0.619      -0.777 0.315     1
 6  67.2  1.72       1   63.9   1.39       -0.888 0.292     1
 7  54.6  0.461      1   35.1  -1.49        0.595 0.644     0
 8  37.3 -1.27       1   56.4   0.639      -1.58  0.171     0
 9  43.1 -0.687      1   53.7   0.375      -1.16  0.238     0
10  45.5 -0.446      1   53.7   0.370      -1.06  0.257     0
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>In this case, <code>A=1</code> represents treatment <em>A</em> and <code>A=0</code> represents treatment <em>B</em>. Again, this is the actual treatment we would observe the patient receiving in a sample.</p>
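<p>One way to convince yourself that the coin-flip mechanism behaves as described is to check that the share of patients receiving treatment <em>A</em> tracks the average propensity. The sketch below does this on a separate, hypothetical set of propensities (the vector <code>p</code> and the seed are made up for illustration and are not the <code>population</code> values); note that <code>rbinom()</code> is vectorized over <code>prob</code>, so each patient gets their own coin.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Arbitrary seed, used only for this illustration
set.seed(1)

# Hypothetical true propensity scores for 100,000 patients
p &lt;- runif(1e5, min = 0.1, max = 0.9)

# One Bernoulli draw per patient, each with its own success probability
A &lt;- rbinom(length(p), size = 1, prob = p)

# The realized share on treatment A should be close to the mean propensity
c(mean_propensity = mean(p), share_A = mean(A))</code></pre></div>
</div>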
</section>
<section id="trueoverlapweight" class="level2">
<h2 class="anchored" data-anchor-id="trueoverlapweight">Compute the (true) overlap weight</h2>
<p>We need to define what the overlap weight is, and it’s actually quite simple: assign each patient the probability of receiving the treatment they <em>did not</em> actually receive.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bequation%7D%0AOW_i=%0A%20%20%20%20%5Cbegin%7Bcases%7D%0A%20%20%20%20%20%20%20%201-p_i%5EA%20&amp;%20%5Ctext%7Bif%20%7D%20A_i=1%5C%5C%0A%20%20%20%20%20%20%20%20p_i%5EA%20&amp;%20%5Ctext%7Bif%20%7D%20A_i=0%5C%5C%0A%20%20%20%20%5Cend%7Bcases%7D%0A%5Cend%7Bequation%7D%0A"></p>
<p>Notice the notation. Since it depends on the true propensity score, the overlap weight is another <em>unknown</em> quantity that must be estimated from the data. It also depends on the realized treatment assignment, so the collection of weights across patients differs depending on the observed treatment distribution. We can add these weights to the <code>population</code> data set:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb8-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  </span>
<span id="cb8-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the true overlap weight for the realized treatment (unknown quantity)</span></span>
<span id="cb8-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb8-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OW =</span> A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>pA) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>A) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> pA</span>
<span id="cb8-7">  )</span>
<span id="cb8-8">population</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 9
     age    zage  male income zincome log_odds_pA    pA     A    OW
   &lt;dbl&gt;   &lt;dbl&gt; &lt;int&gt;  &lt;dbl&gt;   &lt;dbl&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt;
 1  44.4 -0.560      0   36.5  -1.35        0.311 0.577     1 0.423
 2  47.7 -0.230      1   44.2  -0.579      -0.315 0.422     0 0.422
 3  65.6  1.56       1   41.4  -0.861       0.606 0.647     1 0.353
 4  50.7  0.0705     1   59.7   0.973      -1.27  0.219     0 0.219
 5  51.3  0.129      0   56.2   0.619      -0.777 0.315     1 0.685
 6  67.2  1.72       1   63.9   1.39       -0.888 0.292     1 0.708
 7  54.6  0.461      1   35.1  -1.49        0.595 0.644     0 0.644
 8  37.3 -1.27       1   56.4   0.639      -1.58  0.171     0 0.171
 9  43.1 -0.687      1   53.7   0.375      -1.16  0.238     0 0.238
10  45.5 -0.446      1   53.7   0.370      -1.06  0.257     0 0.257
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>In a real analysis, these weights are what we are ultimately after in order to estimate the average <em>causal</em> treatment effect on the outcome of interest while balancing differences in patient characteristics across the groups (i.e., removing the confounding factors).</p>
<section id="targetpopulation" class="level3">
<h3 class="anchored" data-anchor-id="targetpopulation">Target population</h3>
<p>In practice, the weights are normalized so they sum to one within each treatment group. The intuition is that the focus of the treatment effect estimation shifts toward patients who could plausibly have ended up in <em>either</em> treatment group. This is known as <a href="https://www.sciencedirect.com/topics/medicine-and-dentistry/clinical-equipoise#:~:text=Clinical%20equipoise%20is%20defined%20as,Seminars%20in%20Vascular%20Surgery%2C%202022"><em>clinical equipoise</em></a>, and the resulting treatment effect is interpreted as the <em>average treatment effect in the overlap population</em>.</p>
<p>This makes a lot of sense. When we’re comparing treatments, we should focus most heavily on patients who could be candidates for either, and less so on patients who were all but bound for one. Thus, we up-weight patients who have the most overlap in characteristics with the opposing treatment group. The beautiful thing here is that we do this without throwing away any information: each subject is smoothly and proportionately weighted by just the amount that they should be.</p>
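<p>The normalization step can be sketched in a few lines of base R. The six propensities and treatment assignments below are made-up toy values, and <code>OW_norm</code> is a hypothetical name; this is just one simple way to rescale the weights within each group, not necessarily how it is done later in the analysis.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Toy values (hypothetical): true propensities and realized treatments for six patients
pA &lt;- c(0.58, 0.42, 0.65, 0.22, 0.32, 0.29)
A  &lt;- c(1, 0, 1, 0, 1, 1)

# Overlap weight: probability of the treatment the patient did NOT receive
OW &lt;- ifelse(A == 1, 1 - pA, pA)

# Normalize so the weights sum to one within each treatment group
OW_norm &lt;- ave(OW, A, FUN = function(w) w / sum(w))

# Check: the normalized weights sum to one in each group
tapply(OW_norm, A, sum)</code></pre></div>
</div>
<p>Note that the <code>ifelse()</code> form gives the same weights as the arithmetic form <code>A * (1 - pA) + (1 - A) * pA</code> used above; it is just a different way of writing the same case definition.</p>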
</section>
</section>
<section id="treatmenteffect" class="level2">
<h2 class="anchored" data-anchor-id="treatmenteffect">Set the treatment effect</h2>
<p>We’ve generated the treatments and established how they are related to patient characteristics, but we haven’t yet talked about the outcome for which we ultimately want to estimate the treatment effect. Since we’re focusing on <a href="https://www.publichealth.columbia.edu/research/population-health-methods/time-event-data-analysis">time-to-event</a> outcomes, we’ll stay in that context, but the general idea is that we assume there <em>exists</em> a realization of what a patient’s outcome would have been under each treatment scenario. Then if we compare the difference of those outcomes across all patients, the average difference must be caused by the treatment.</p>
<section id="defining-the-event-times" class="level3">
<h3 class="anchored" data-anchor-id="defining-the-event-times">Defining the event times</h3>
<p>In this simulation, we’ll generate event times from a <a href="https://en.wikipedia.org/wiki/Weibull_distribution">Weibull</a> distribution. Starting at treatment initiation, this might be the time until cancer recurrence, hospitalization, or something else; we’re just looking to see how long it takes for some event to occur. The <a href="https://en.wikipedia.org/wiki/Probability_density_function">PDF</a> for this distribution looks <a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Weibull.html">like this</a>:</p>
<p><img src="https://latex.codecogs.com/png.latex?f(t)%20=%20%5Cfrac%7B%5Calpha%7D%7B%5Csigma%7D%5Cleft(%5Cfrac%7Bt%7D%7B%5Csigma%7D%5Cright)%5E%7B%5Calpha-1%7De%5E%7B-%5Cleft(%5Cfrac%7Bt%7D%7B%5Csigma%7D%5Cright)%5E%5Calpha%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?t"> is the event time, <img src="https://latex.codecogs.com/png.latex?%5Calpha"> is the <em>shape</em> parameter, and <img src="https://latex.codecogs.com/png.latex?%5Csigma"> is the <em>scale</em> parameter. In our example, we will say that:</p>
<p><img src="https://latex.codecogs.com/png.latex?T_i%5EA%20%5Csim%20Weibull(%5Calpha,%20%5Csigma_i%5EA)"> <img src="https://latex.codecogs.com/png.latex?T_i%5EB%20%5Csim%20Weibull(%5Calpha,%20%5Csigma_i%5EB)"> where <img src="https://latex.codecogs.com/png.latex?T_i%5EA"> and <img src="https://latex.codecogs.com/png.latex?T_i%5EB"> are the event times under treatments <em>A</em> and <em>B</em>, respectively. That is, the event times for each patient under each treatment follow a <a href="https://en.wikipedia.org/wiki/Weibull_distribution">Weibull</a> distribution with a common <em>shape</em> parameter (in this case, across both treatments), but a <em>scale</em> parameter that depends on the patient’s specific characteristics (in <em>log-linear</em> form) and differs by treatment, which captures the treatment effect. Specifically,</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%201"> <img src="https://latex.codecogs.com/png.latex?%5Csigma_i%5EA%20=%20%5Clambda%20%5Ctimes%20e%5E%7B-(%5Cphi_i%20-%200.36)%7D"> <img src="https://latex.codecogs.com/png.latex?%5Csigma_i%5EB%20=%20%5Clambda%20%5Ctimes%20e%5E%7B-%5Cphi_i%7D"> <img src="https://latex.codecogs.com/png.latex?%5Cphi_i%20=%201.10%20%5Ctimes%20age_z%20-%200.22%20%5Ctimes%20male%20-%200.36%20%5Ctimes%20income_z"> <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%204055.56"></p>
<p>Basically, the <img src="https://latex.codecogs.com/png.latex?%5Cphi_i"> term adjusts the baseline outcome risk for the patient’s specific characteristics, and the treatment effect is simply a fixed, additive shift in that linear predictor (i.e., on the log scale) that is the same for all patients. The <img src="https://latex.codecogs.com/png.latex?%5Clambda"> term sets the <a href="https://www.linkedin.com/advice/0/how-do-you-interpret-hazard-ratio-baseline-function">baseline hazard function</a>: it is the <em>scale</em> parameter when all covariate values are zero (for treatment <em>B</em>), and because <img src="https://latex.codecogs.com/png.latex?%5Calpha=1"> the corresponding hazard, <img src="https://latex.codecogs.com/png.latex?1/%5Clambda">, is constant over time.</p>
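<p>As a small check on the parameterization, the sketch below evaluates the density formula above by hand and compares it with R’s <code>dweibull()</code>, and also with <code>dexp()</code>, since a Weibull with shape 1 is an exponential distribution with rate equal to one over the scale. The time points in <code>t</code> are arbitrary values picked for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Shape and baseline scale used in this simulation
alpha &lt;- 1
sigma &lt;- 365 / 0.09

# A few arbitrary event times at which to evaluate the density
t &lt;- c(100, 1000, 5000)

# The Weibull PDF written out exactly as in the formula above
pdf_formula &lt;- (alpha / sigma) * (t / sigma)^(alpha - 1) * exp(-(t / sigma)^alpha)

# Matches R's built-in Weibull density (same shape/scale parameterization)
all.equal(pdf_formula, dweibull(t, shape = alpha, scale = sigma))

# With shape = 1 this is an exponential density with constant hazard 1 / sigma
all.equal(pdf_formula, dexp(t, rate = 1 / sigma))</code></pre></div>
</div>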
<section id="interpretingtreatmenteffect" class="level4">
<h4 class="anchored" data-anchor-id="interpretingtreatmenteffect">Interpreting the treatment effect</h4>
<p>When setting <img src="https://latex.codecogs.com/png.latex?%5Calpha=1">, the mean of a <a href="https://en.wikipedia.org/wiki/Weibull_distribution">Weibull</a> distribution is equal to its scale parameter. That is,</p>
<p><img src="https://latex.codecogs.com/png.latex?E%5BT_i%5EA%5D%20=%20%5Csigma_i%5EA"> <img src="https://latex.codecogs.com/png.latex?E%5BT_i%5EB%5D%20=%20%5Csigma_i%5EB"> This gives us a very nice interpretation of the treatment effect. If we compare them <em>relatively</em>, we get:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bequation%7D%0A%5Cbegin%7Bsplit%7D%0A%5Cfrac%7BE%5BT_i%5EA%5D%7D%7BE%5BT_i%5EB%5D%7D%20&amp;%20=%20%5Cfrac%7B%5Csigma_i%5EA%7D%7B%5Csigma_i%5EB%7D%20%5C%5C%0A&amp;%20=%20%5Cfrac%7B%5Clambda%20%5Ctimes%20e%5E%7B-(%5Cphi_i%20-%200.36)%7D%7D%7B%5Clambda%20%5Ctimes%20e%5E%7B-%5Cphi_i%7D%7D%20%5C%5C%0A&amp;%20=%20%5Cfrac%7Be%5E%7B-(%5Cphi_i%20-%200.36)%7D%7D%7Be%5E%7B-%5Cphi_i%7D%7D%20%5C%5C%0A&amp;%20=%20%5Cfrac%7Be%5E%7B0.36%7De%5E%7B-%5Cphi_i%7D%7D%7Be%5E%7B-%5Cphi_i%7D%7D%20%5C%5C%0A&amp;%20=%20e%5E%7B0.36%7D%20%5C%5C%0A&amp;%20%5Capprox%201.43%0A%5Cend%7Bsplit%7D%0A%5Cend%7Bequation%7D%0A"></p>
<p>So we can say that the <em>time to event for patients on treatment A is 1.43 times, or 43%, longer on average than those on treatment B, given a fixed age, sex, and income</em>. That is, treatment A is “better” than treatment B.</p>
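<p>The arithmetic behind the 1.43 can be reproduced directly. The sketch below assumes the treatment coefficient is exactly <code>log(0.70)</code> (about -0.36), which is the value the simulation code uses; the names <code>beta_A</code>, <code>time_ratio</code>, and <code>hazard_ratio</code> are hypothetical labels for this illustration. Because the hazard is one over the scale when the shape is 1, the same number also implies a hazard ratio for <em>A</em> versus <em>B</em>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Treatment effect in the linear predictor (log scale)
beta_A &lt;- log(0.70)

# Ratio of mean event times, A vs. B: sigma_A / sigma_B = exp(-beta_A)
time_ratio &lt;- exp(-beta_A)

# With shape = 1 the hazard is 1 / scale, so the hazard ratio (A vs. B) is the reciprocal
hazard_ratio &lt;- 1 / time_ratio

round(c(beta_A = beta_A, time_ratio = time_ratio, hazard_ratio = hazard_ratio), 2)</code></pre></div>
</div>
<p>So a 43% longer expected time to event on treatment <em>A</em> corresponds to a hazard that is 0.70 times that of treatment <em>B</em>.</p>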
</section>
</section>
<section id="samplingeventtimes" class="level3">
<h3 class="anchored" data-anchor-id="samplingeventtimes">Sampling the event times</h3>
<p>Next, we need to actually sample the times for our <code>population</code> (<em>note that it was confirmed that SAS and R have the same parameterizations</em>):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb10-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-3">  </span>
<span id="cb10-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sample outcome event times under each treatment (counterfactual)</span></span>
<span id="cb10-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb10-6">    </span>
<span id="cb10-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Baseline scale (mean event time); the hazard 1/lambda is constant over time</span></span>
<span id="cb10-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">365</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.09</span>,</span>
<span id="cb10-9">    </span>
<span id="cb10-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define TRUE Weibull linear predictor</span></span>
<span id="cb10-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">phi =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> zage <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> male <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">70</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> zincome,</span>
<span id="cb10-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma_A =</span> lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>(phi <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.70</span>))),</span>
<span id="cb10-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma_B =</span> lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>phi),</span>
<span id="cb10-14">    </span>
<span id="cb10-15">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate REALIZED survival times under each treatment scenario (same parameterizations in SAS)</span></span>
<span id="cb10-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_A =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rweibull</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale =</span> sigma_A),</span>
<span id="cb10-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_B =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rweibull</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale =</span> sigma_B), </span>
<span id="cb10-18">  )</span>
<span id="cb10-19">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(lambda, phi, sigma_A, sigma_B, t_A, t_B)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 6
   lambda     phi sigma_A sigma_B    t_A   t_B
    &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt;
 1  4056. -0.133    6617.   4632.  5689. 7760.
 2  4056. -0.269    7585.   5309.  1831. 9482.
 3  4056.  1.80      961.    673.   488.  300.
 4  4056. -0.493    9482.   6637. 23722.  255.
 5  4056. -0.0788   6269.   4388.  7649. 1800.
 6  4056.  1.17     1804.   1263.  1953. 1493.
 7  4056.  0.814    2568.   1797.  5462. 2652.
 8  4056. -1.84    36514.  25560. 38262. 2385.
 9  4056. -1.11    17604.  12323.  3577. 6894.
10  4056. -0.845   13485.   9439. 38541. 3743.
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>Now we can make some density plots comparing the event time distributions across treatments (we’ll use <em>log</em> scaling for visual appeal):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-2">  </span>
<span id="cb12-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send down the rows</span></span>
<span id="cb12-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb12-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"t_"</span>),</span>
<span id="cb12-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Treatment"</span>,</span>
<span id="cb12-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EventTime"</span>,</span>
<span id="cb12-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_prefix =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"t_"</span></span>
<span id="cb12-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb12-10">  </span>
<span id="cb12-11">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb12-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb12-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density</span>(</span>
<span id="cb12-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb12-15">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(EventTime),</span>
<span id="cb12-16">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Treatment</span>
<span id="cb12-17">    ),</span>
<span id="cb12-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">75</span></span>
<span id="cb12-19">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb12-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(x))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb12-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb12-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb12-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb12-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb12-26">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span></span>
<span id="cb12-27">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event Time"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/the-overlap-weight/index_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Notice that the distribution for treatment <em>A</em> is shifted to the right. This shift is the treatment effect: in the hypothetical world where all patients received treatment <em>A</em> (and all event times were observed), events tend to happen <em>later</em> than in the world where all patients received treatment <em>B</em>. On average, that indicates a “benefit” to treatment <em>A</em>, which was expected from our prior calculation.</p>
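<p>As a quick sanity check on the size of that shift, here is a minimal sketch (not part of the original simulation). It assumes the two scales differ by a factor of <code>1 / 0.70</code>, which is what the printed <code>sigma_A</code> and <code>sigma_B</code> columns suggest, and it borrows a single scale value from the first row of that output instead of using the full <code>population</code>. With <code>shape = 1</code> the Weibull reduces to an exponential distribution whose mean equals its scale, so the ratio of mean event times should land near <code>1 / 0.70</code>, and the horizontal shift on the log scale should land near <code>-log(0.70)</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Illustrative only: one fixed scale instead of patient-specific scales
set.seed(123)
n &lt;- 100000

# Scale under treatment B (value taken from row 1 of the printed output)
sigma_B &lt;- 4632

# ASSUMPTION: the scale under treatment A is larger by a factor of 1 / 0.70
sigma_A &lt;- sigma_B / 0.70

# Potential event times under each treatment (shape = 1 is the exponential case)
t_A &lt;- rweibull(n, shape = 1, scale = sigma_A)
t_B &lt;- rweibull(n, shape = 1, scale = sigma_B)

# Ratio of mean event times; should be close to 1 / 0.70
ratio_means &lt;- mean(t_A) / mean(t_B)

# Shift between the two log-scale distributions; should be close to -log(0.70)
log_shift &lt;- mean(log(t_A)) - mean(log(t_B))

c(ratio_means = ratio_means, target_ratio = 1 / 0.70)
c(log_shift = log_shift, target_shift = -log(0.70))</code></pre></div>
</div>
<p>Because only the scale changes between treatments, the two log-scale densities have the same shape and differ only by that horizontal shift, which is what the plot shows.</p>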
</section>
</section>
<section id="observedoutcome" class="level2">
<h2 class="anchored" data-anchor-id="observedoutcome">Assign the observed outcome</h2>
<p>The final step of the simulation setup is to assign the event time as we would observe it in a real data set. The event times generated in the previous section capture both treatment scenarios for all patients, but in reality we would only (potentially) observe the event time for the treatment the patient actually received. Additionally, we likely (or definitely) won’t follow all patients long enough to observe every event, meaning some patients will be <a href="https://en.wikipedia.org/wiki/Survival_analysis#:~:text=Censoring%20%2F%20Censored%20observation%3A%20Censoring%20occurs,after%20the%20time%20of%20censoring.">censored</a>. Let’s add the observed outcomes to the <code>population</code>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb13-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-3">  </span>
<span id="cb13-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the observed outcome</span></span>
<span id="cb13-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb13-6">    </span>
<span id="cb13-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The ACTUAL event time outcome depends on the treatment ACTUALLY observed for the patient</span></span>
<span id="cb13-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">actual_event_time =</span> t_B <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> A) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> t_A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A,</span>
<span id="cb13-9">    </span>
<span id="cb13-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
# Generate a REALIZED censoring">
font-style: inherit;"># Generate a REALIZED censoring time (completely random: uniform between 500 and 1000)</span></span>
<span id="cb13-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">censor_time =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n),</span>
<span id="cb13-12">    </span>
<span id="cb13-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate the OBSERVED time in the data set (known quantity)</span></span>
<span id="cb13-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmin</span>(actual_event_time, censor_time), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># They either had the event, or were censored first</span></span>
<span id="cb13-15">    </span>
<span id="cb13-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
# Calculate the event status">
font-style: inherit;"># Calculate the event status (1 if event observed, 0 if censored)</span></span>
<span id="cb13-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">status =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(actual_event_time <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> censor_time)</span>
<span id="cb13-18">  )</span>
<span id="cb13-19">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(actual_event_time, censor_time, time, status)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 4
   actual_event_time censor_time  time status
               &lt;dbl&gt;       &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;
 1             5689.        843.  843.      0
 2             9482.        814.  814.      0
 3              488.        882.  488.      1
 4              255.        936.  255.      1
 5             7649.        653.  653.      0
 6             1953.        890.  890.      0
 7             2652.        737.  737.      0
 8             2385.        703.  703.      0
 9             6894.        952.  952.      0
10             3743.        526.  526.      0
# ℹ 9,990 more rows</code></pre>
</div>
</div>
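<p>To get a feel for how much censoring this mechanism produces, here is a small sketch (again, not part of the original code). The censoring time <code>500 + 500 * runif(n)</code> is uniform between 500 and 1000, so for an exponential event time with scale <code>sigma</code> the chance of observing the event has a closed form that we can compare against a simulation. The single <code>sigma</code> value is an assumption for illustration, borrowed from the first row of the earlier output.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Illustrative only: one fixed scale instead of patient-specific scales
set.seed(123)
n &lt;- 100000
sigma &lt;- 4632

# Event and censoring times generated the same way as in the simulation
event_time &lt;- rweibull(n, shape = 1, scale = sigma)
censor_time &lt;- 500 + 500 * runif(n)

# Observed time and event indicator (1 = event observed, 0 = censored)
time &lt;- pmin(event_time, censor_time)
status &lt;- as.numeric(event_time &lt; censor_time)

# Share of patients whose event is observed in the simulation
simulated &lt;- mean(status)

# Closed form: 1 - E[exp(-C / sigma)] with C uniform on [500, 1000]
analytic &lt;- 1 - (sigma / 500) * (exp(-500 / sigma) - exp(-1000 / sigma))

c(simulated = simulated, analytic = analytic)</code></pre></div>
</div>
<p>The two numbers should agree closely. Patients with larger scales are censored more often, which is consistent with most rows in the output above having <code>status = 0</code>.</p>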
<p>Our final simulated data set has the following <em>observed</em> outcome summaries:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-2">  </span>
<span id="cb15-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute summary metrics</span></span>
<span id="cb15-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb15-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Patients =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb15-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Time =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(time),</span>
<span id="cb15-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(A, status)</span>
<span id="cb15-8">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-9">  </span>
<span id="cb15-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add shares within treatment</span></span>
<span id="cb15-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb15-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Percent =</span> Patients <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Patients),</span>
<span id="cb15-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> A</span>
<span id="cb15-14">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-15">  </span>
<span id="cb15-16">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean/rearrange</span></span>
<span id="cb15-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">transmute</span>(</span>
<span id="cb15-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb15-19">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb15-20">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb15-21">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb15-22">      ),</span>
<span id="cb15-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Status =</span> </span>
<span id="cb15-24">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb15-25">        status <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event"</span>,</span>
<span id="cb15-26">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Censored"</span></span>
<span id="cb15-27">      ),</span>
<span id="cb15-28">    Patients,</span>
<span id="cb15-29">    Percent,</span>
<span id="cb15-30">    Time</span>
<span id="cb15-31">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(Treatment, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(Status)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-33">  </span>
<span id="cb15-34">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table</span></span>
<span id="cb15-35">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reactable</span>(</span>
<span id="cb15-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">groupBy =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Treatment"</span>,</span>
<span id="cb15-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> </span>
<span id="cb15-38">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb15-39">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Patients =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggregate =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sum"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>),</span>
<span id="cb15-40">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Percent =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Percent of patients in treatment group (%)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggregate =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sum"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">percent =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)),</span>
<span id="cb15-41">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Time =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Avg. time to outcome"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggregate =</span> zildge<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rectbl_agg_wtd</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Patients"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb15-42">      ),</span>
<span id="cb15-43">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">striped =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb15-44">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">highlight =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb15-45">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bordered =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb15-46">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resizable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb15-47">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">theme =</span> reactablefmtr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sandstone</span>()</span>
<span id="cb15-48">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-49">  reactablefmtr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_source</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Use arrows to expand table"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">font_size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">font_style =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"italic"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="reactable html-widget html-fill-item" id="htmlwidget-a2195d80fa4b22595ce8" style="width:auto;height:auto;"></div>
<p style="color:#000;background:#FFFFFF;text-align:left;font-size:12px;font-style:italic;font-weight:normal;text-decoration:;letter-spacing:px;word-spacing:px;text-transform:;text-shadow:;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px">Use arrows to expand table</p>
<script type="application/json" data-for="htmlwidget-a2195d80fa4b22595ce8">{"x":{"tag":{"name":"Reactable","attribs":{"data":{"Treatment":["A","A","B","B"],"Status":["Event","Censored","Event","Censored"],"Patients":[836,3046,1137,4981],"Percent":[0.215352910870685,0.784647089129315,0.185845047401111,0.814154952598888],"Time":[345.503703338068,745.228837321148,343.475165801799,746.272923924017]},"columns":[{"id":"Treatment","name":"Treatment","type":"character"},{"id":"Status","name":"Status","type":"character"},{"id":"Patients","name":"Patients","type":"numeric","aggregate":"sum","align":"center"},{"id":"Percent","name":"Percent of patients in treatment group (%)","type":"numeric","aggregate":"sum","format":{"cell":{"digits":1,"percent":true},"aggregated":{"digits":1,"percent":true}},"align":"center"},{"id":"Time","name":"Avg. time to outcome","type":"numeric","aggregate":"function(values, rows) {\n            var numerator = 0\n            var denominator = 0\n\n            rows.forEach(function(row, index) {\n                numerator += row['Patients'] * values[index]\n                denominator += row['Patients']\n            })\n\n            if('mean' == 'mean') {\n                return numerator / denominator\n            } else {\n                return numerator\n            }\n        }","format":{"cell":{"digits":1},"aggregated":{"digits":1}},"align":"center"}],"groupBy":["Treatment"],"resizable":true,"highlight":true,"bordered":true,"striped":true,"theme":{"color":"#3e3f3a","backgroundColor":"#ffffff","borderColor":"#f8f5f0","borderWidth":"1px","stripedColor":"#ededed","highlightColor":"#f8f5f0","cellPadding":6,"tableStyle":{"fontSize":15},"headerStyle":{"borderWidth":"2px","backgroundColor":"#f8f5f0","color":"#7c7a78","transitionDuration":"0.5s","&:hover[aria-sort]":{"color":"#000000"},"&[aria-sort='ascending'], 
&[aria-sort='descending']":{"color":"#000000"},"fontSize":16},"groupHeaderStyle":{"&:not(:empty)":{"color":"#3e3f3a","fontSize":16},"&:hover":{"fontWeight":"bold","transitionDuration":"1s","transitionTimingFunction":"ease-out","color":"#000000"}},"rowSelectedStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"},"inputStyle":{"backgroundColor":"#ffffff","borderColor":"#bcbfc1","color":"#3e3f3a"},"searchInputStyle":{"backgroundColor":"#ffffff","color":"#3e3f3a","borderColor":"#bcbfc1","&:focus":{"color":"#3e3f3a"}},"selectStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84","borderColor":"#ffffff","outlineColor":"#ffffff"},"pageButtonStyle":{"backgroundColor":"#f8f5f0","color":"#8e8c84","&:hover":{"backgroundColor":"#f3969a","color":"#8e8c84"}},"pageButtonHoverStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"},"pageButtonActiveStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"},"pageButtonCurrentStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"}},"dataKey":"803f0e909881888862e5822d6c0d5d1e"},"children":[]},"class":"reactR_markup"},"evals":["tag.attribs.columns.4.aggregate"],"jsHooks":[]}</script>
</div>
</div>
<p><br></p>
<p>We can also look at the survival curves.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-2">  </span>
<span id="cb16-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Nest by treatment</span></span>
<span id="cb16-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(A) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nest</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-6">  </span>
<span id="cb16-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
# Esimtate survival curves">
font-style: inherit;"># Estimate survival curves for each treatment</span></span>
<span id="cb16-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb16-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Surv =</span> </span>
<span id="cb16-10">      data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-11">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(</span>
<span id="cb16-12">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(.trt) {</span>
<span id="cb16-13">          </span>
<span id="cb16-14">          <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb16-15">          mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">survfit</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Surv</span>(time, status) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> .trt)</span>
<span id="cb16-16">          </span>
<span id="cb16-17">          <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the elements</span></span>
<span id="cb16-18">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb16-19">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Time =</span> mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>time,</span>
<span id="cb16-20">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Survival =</span> mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>surv,</span>
<span id="cb16-21">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Lower =</span> mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lower,</span>
<span id="cb16-22">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Upper =</span> mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>upper</span>
<span id="cb16-23">          )</span>
<span id="cb16-24">          </span>
<span id="cb16-25">        }</span>
<span id="cb16-26">      )</span>
<span id="cb16-27">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb16-28">  </span>
<span id="cb16-29">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Unnest the curves</span></span>
<span id="cb16-30">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>data) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-31">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> Surv) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-33"></span>
<span id="cb16-34">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean labels</span></span>
<span id="cb16-35">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb16-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb16-37">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb16-38">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb16-39">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb16-40">      )</span>
<span id="cb16-41">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-42">  </span>
<span id="cb16-43">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb16-44">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(</span>
<span id="cb16-45">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb16-46">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Time,</span>
<span id="cb16-47">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Survival</span>
<span id="cb16-48">    )</span>
<span id="cb16-49">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb16-50">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Treatment)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-51">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_ribbon</span>(</span>
<span id="cb16-52">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb16-53">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymin =</span> Lower,</span>
<span id="cb16-54">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymax =</span> Upper,</span>
<span id="cb16-55">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Treatment</span>
<span id="cb16-56">    ),</span>
<span id="cb16-57">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span></span>
<span id="cb16-58">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-59">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-60">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb16-61">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb16-62">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb16-63">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span></span>
<span id="cb16-64">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-65">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Time since treatment"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-66">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Survival Probability"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/the-overlap-weight/index_files/figure-html/unnamed-chunk-10-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>If we weren’t doing a simulation, we wouldn’t know how much of these differences is explained by the treatment. In fact, these crude survival curves suggest longer survival with treatment <em>B</em>, even though we know the opposite is true. The goal is to use the <em>known</em> (observed) information, along with patient characteristics, to estimate the causal treatment effect. That effect is baked into the <em>unknown</em> quantities that we can see here only because we simulated them, and that would be unavailable to us in a real-life analysis.</p>
</section>
</section>
<section id="estimatingweights" class="level1">
<h1>Estimating the weights</h1>
<p>Now that our simulation is set, we can start using the observed data (i.e., known quantities) to estimate the case-weights needed for treatment effect estimation. This is the process we’d go through in a real-life analysis, but here we have the benefit of knowing the truth, so we can see how close our estimates are to reality.</p>
<p>The intention of these weights is to <em>adjust</em> the observed sample to create a <em>pseudo-population</em> in which the patient characteristics that we believe confound the treatment effect are <em>balanced</em> across the treatment groups. The term “pseudo” here is a little misleading: it simply refers to the fact that patients will not all contribute the same weight when estimating the treatment effect. Patients with the most <em>overlap</em> in characteristics across the treatment groups contribute the most, and patients are proportionately down-weighted the less overlap they have (I’d argue this is the same thing that is done in <a href="https://en.wikipedia.org/wiki/Propensity_score_matching">matching</a>; it’s just that the weights there are exactly <em>1</em> or <em>0</em>). The point is to tease out the portion of the outcome differences that is explained by the treatment, namely, the causal treatment effect.</p>
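<p>To make the down-weighting concrete, here is a minimal sketch under the usual definition of the overlap weight: a patient’s weight is the estimated probability of receiving the treatment they did <em>not</em> get. The five propensity scores below are made-up values for illustration only, not output from this simulation.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical propensity scores P(A = 1 | X) for five patients (made-up values)
ps &lt;- c(0.05, 0.30, 0.50, 0.70, 0.95)

# Overlap weight = probability of receiving the *other* treatment
ow_if_A &lt;- 1 - ps  # weight if the patient actually received treatment A
ow_if_B &lt;- ps      # weight if the patient actually received treatment B

# The product ps * (1 - ps) shows which covariate profiles the weighted
# sample emphasizes; it is largest at ps = 0.5 and shrinks toward 0 and 1
toy &lt;- data.frame(ps, ow_if_A, ow_if_B, emphasis = ps * (1 - ps))
toy</code></pre></div>
<p>Patients whose propensity score is near 50% could plausibly have received either treatment, so they carry the most information about the comparison; those near 0% or 100% are smoothly down-weighted rather than dropped.</p>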
<section id="modelpropensityscores" class="level2">
<h2 class="anchored" data-anchor-id="modelpropensityscores">Model the propensity scores</h2>
<p>We’ve established that the overlap weights are quantities that need to be estimated from the data. In order to calculate those weights, we first need to estimate the propensity scores. We know what the true model is, but let’s assume we are only working with observable data:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(zage, male, zincome, A, time, status)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 6
      zage  male zincome     A  time status
     &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt;  &lt;dbl&gt;
 1 -0.560      0  -1.35      1  843.      0
 2 -0.230      1  -0.579     0  814.      0
 3  1.56       1  -0.861     1  488.      1
 4  0.0705     1   0.973     0  255.      1
 5  0.129      0   0.619     1  653.      0
 6  1.72       1   1.39      1  890.      0
 7  0.461      1  -1.49      0  737.      0
 8 -1.27       1   0.639     0  703.      0
 9 -0.687      1   0.375     0  952.      0
10 -0.446      1   0.370     0  526.      0
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>Our goal is to obtain patient-specific probabilities of receiving treatment <em>A</em>. We’ll estimate these with a <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a> model, but we’ll allow for some flexibility in the shape of the relationships between the continuous variables (age and income) and treatment assignment (the outcome of this model) by using <a href="https://www.nature.com/articles/s41409-019-0679-x">restricted cubic splines</a> (see my <a href="https://www.zajichekstats.com/post/the-evasive-spline/">other post</a> for another explanation):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the propensity score model</span></span>
<span id="cb19-2">ps_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(</span>
<span id="cb19-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> rms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rcs</span>(zage, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> rms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rcs</span>(zincome, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(male),</span>
<span id="cb19-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> population,</span>
<span id="cb19-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"binomial"</span></span>
<span id="cb19-7">  )</span>
<span id="cb19-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(ps_mod)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
glm(formula = A ~ rms::rcs(zage, 3) + rms::rcs(zincome, 3) + 
    factor(male), family = "binomial", data = population)

Coefficients:
                             Estimate Std. Error z value Pr(&gt;|z|)    
(Intercept)                  -0.30027    0.06527  -4.600 4.22e-06 ***
rms::rcs(zage, 3)zage         0.40206    0.05099   7.885 3.16e-15 ***
rms::rcs(zage, 3)zage'       -0.07337    0.05813  -1.262    0.207    
rms::rcs(zincome, 3)zincome  -0.63792    0.04950 -12.888  &lt; 2e-16 ***
rms::rcs(zincome, 3)zincome' -0.04020    0.06434  -0.625    0.532    
factor(male)1                -0.25432    0.04415  -5.760 8.41e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13359  on 9999  degrees of freedom
Residual deviance: 12193  on 9994  degrees of freedom
AIC: 12205

Number of Fisher Scoring iterations: 4</code></pre>
</div>
</div>
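<p>The z-tests in the summary above look at each spline term one at a time. A joint check is a likelihood-ratio test between a linear-only model and the spline model. The sketch below is only an illustration of that idea: it uses toy data with hypothetical column names, and natural splines from <code>splines::ns</code> as a stand-in for <code>rms::rcs</code>, so the numbers will not match the model above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(42)

# Toy data (hypothetical; treatment depends linearly on age and income)
n &lt;- 2000
toy &lt;- data.frame(age = rnorm(n), income = rnorm(n))
toy$A &lt;- rbinom(n, size = 1, prob = plogis(-0.3 + 0.4 * toy$age - 0.6 * toy$income))

# Linear-only propensity model
fit_linear &lt;- glm(A ~ age + income, data = toy, family = "binomial")

# Same model with a flexible (natural spline) term for each covariate
fit_spline &lt;- glm(
  A ~ splines::ns(age, df = 2) + splines::ns(income, df = 2),
  data = toy,
  family = "binomial"
)

# Likelihood-ratio test of the added non-linear terms (2 extra parameters)
lrt &lt;- anova(fit_linear, fit_spline, test = "Chisq")
lrt</code></pre></div>
<p>A large p-value in the second row would suggest the extra flexibility is not buying much, though keeping the spline terms in a propensity model is generally harmless at this sample size.</p>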
<p>The model seems to indicate that the non-linear terms for age and income don’t particularly matter (which is expected), but we’ll leave it as-is. Next, let’s attach the <em>estimated</em> propensity scores to the <code>population</code>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb21-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb21-3">  </span>
<span id="cb21-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the estimated propensity score</span></span>
<span id="cb21-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb21-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pA_hat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(ps_mod, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># P(A = 1 | X)</span></span>
<span id="cb21-7">  )</span></code></pre></div>
</details>
</div>
<p>We can take a look at the estimated propensity score distribution across the treatment groups (we’ll also overlay the <em>true</em> distributions for comparison):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-2">  </span>
<span id="cb22-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send PS scores down the rows</span></span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb22-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(pA, pA_hat),</span>
<span id="cb22-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Type"</span>,</span>
<span id="cb22-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Score"</span></span>
<span id="cb22-8">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-9"></span>
<span id="cb22-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean labels</span></span>
<span id="cb22-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb22-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb22-13">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb22-14">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb22-15">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb22-16">      ),</span>
<span id="cb22-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Type =</span> </span>
<span id="cb22-18">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb22-19">        Type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pA"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"True"</span>,</span>
<span id="cb22-20">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated"</span></span>
<span id="cb22-21">      )</span>
<span id="cb22-22">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-23">  </span>
<span id="cb22-24">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb22-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb22-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density</span>(</span>
<span id="cb22-27">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb22-28">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Score,</span>
<span id="cb22-29">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Treatment,</span>
<span id="cb22-30">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> Type</span>
<span id="cb22-31">    ),</span>
<span id="cb22-32">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span></span>
<span id="cb22-33">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-34">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-35">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb22-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb22-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb22-38">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb22-39">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb22-40">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span></span>
<span id="cb22-41">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-42">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P(A=1|X)"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/the-overlap-weight/index_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The model does a great job of estimating the true propensity scores. In a real analysis we probably wouldn’t be this close, since we’re unlikely to have measured every true confounder, and nothing but true confounders.</p>
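<p>Overlaid densities show that the two distributions agree, but not how close each patient’s estimate is to their true score. The sketch below shows one way to put a number on that, which is only possible in a simulation. It builds its own small toy dataset (the variable names and coefficients are assumptions for illustration, not the ones used in this post) so that it runs on its own.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(123)

# Toy data with a known propensity score (hypothetical setup)
n &lt;- 5000
toy &lt;- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$p_true &lt;- plogis(-0.3 + 0.4 * toy$x1 - 0.6 * toy$x2)
toy$A &lt;- rbinom(n, size = 1, prob = toy$p_true)

# Estimate the propensity score from the observable columns only
fit &lt;- glm(A ~ x1 + x2, data = toy, family = "binomial")
toy$p_hat &lt;- predict(fit, type = "response")

# Patient-level agreement between true and estimated scores
agreement &lt;- c(
  correlation   = cor(toy$p_true, toy$p_hat),
  mean_abs_diff = mean(abs(toy$p_true - toy$p_hat))
)
round(agreement, 3)</code></pre></div>
<p>With the post’s data, the same two summaries could be computed from the <code>pA</code> and <code>pA_hat</code> columns of <code>population</code>.</p>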
<section id="visualizing-propensity-score-effects" class="level3">
<h3 class="anchored" data-anchor-id="visualizing-propensity-score-effects">Visualizing propensity score effects</h3>
<p>In the same vein, we can explore the modeled relationships between each patient characteristic and the propensity scores.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">plots <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb23-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-3">  </span>
<span id="cb23-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send PS scores down the rows</span></span>
<span id="cb23-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb23-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(pA, pA_hat),</span>
<span id="cb23-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Type"</span>,</span>
<span id="cb23-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Score"</span></span>
<span id="cb23-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-10">  </span>
<span id="cb23-11">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean labels</span></span>
<span id="cb23-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb23-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb23-14">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb23-15">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb23-16">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb23-17">      ),</span>
<span id="cb23-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Type =</span> </span>
<span id="cb23-19">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb23-20">        Type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pA"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"True"</span>,</span>
<span id="cb23-21">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated"</span></span>
<span id="cb23-22">      )</span>
<span id="cb23-23">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-24">  </span>
<span id="cb23-25">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send covariates down the rows</span></span>
<span id="cb23-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb23-27">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(zage, zincome, male),</span>
<span id="cb23-28">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_prefix =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^z"</span></span>
<span id="cb23-29">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-30">  </span>
<span id="cb23-31">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make groups</span></span>
<span id="cb23-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb23-33">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Group =</span> </span>
<span id="cb23-34">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb23-35">        name <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"male"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"categorical"</span>,</span>
<span id="cb23-36">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"continuous"</span></span>
<span id="cb23-37">      )</span>
<span id="cb23-38">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-39">  </span>
<span id="cb23-40">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a nested frame</span></span>
<span id="cb23-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(Group) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-42">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nest</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-43">  </span>
<span id="cb23-44">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make plot depending on type</span></span>
<span id="cb23-45">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb23-46">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot =</span> </span>
<span id="cb23-47">      data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-48">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(</span>
<span id="cb23-49">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(.group) {</span>
<span id="cb23-50">          </span>
<span id="cb23-51">          <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check condition</span></span>
<span id="cb23-52">          <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(.group<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) {</span>
<span id="cb23-53">            </span>
<span id="cb23-54">            .group <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-55">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb23-56">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Score),</span>
<span id="cb23-57">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Treatment, Type, name, value)</span>
<span id="cb23-58">              ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-59">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb23-60">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sex =</span> </span>
<span id="cb23-61">                  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb23-62">                    value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Male"</span>,</span>
<span id="cb23-63">                    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Female"</span></span>
<span id="cb23-64">                  ),</span>
<span id="cb23-65">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sex"</span></span>
<span id="cb23-66">              ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-67">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-68">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb23-69">                <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb23-70">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Sex,</span>
<span id="cb23-71">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Rate,</span>
<span id="cb23-72">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Treatment,</span>
<span id="cb23-73">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> Type,</span>
<span id="cb23-74">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> Type</span>
<span id="cb23-75">                )</span>
<span id="cb23-76">              ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-77">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb23-78">                <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb23-79">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Sex,</span>
<span id="cb23-80">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Rate,</span>
<span id="cb23-81">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Treatment,</span>
<span id="cb23-82">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> Type,</span>
<span id="cb23-83">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">interaction</span>(Type, Treatment)</span>
<span id="cb23-84">                )</span>
<span id="cb23-85">              ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-86">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-87">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_flip</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-88">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-89">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb23-90">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb23-91">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb23-92">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span>,</span>
<span id="cb23-93">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>()</span>
<span id="cb23-94">              ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-95">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P(A=1|X)"</span>)</span>
<span id="cb23-96">            </span>
<span id="cb23-97">          } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb23-98">            </span>
<span id="cb23-99">            .group <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-100">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb23-101">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(</span>
<span id="cb23-102">                <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb23-103">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> value,</span>
<span id="cb23-104">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Score,</span>
<span id="cb23-105">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Treatment,</span>
<span id="cb23-106">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Treatment,</span>
<span id="cb23-107">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> Type</span>
<span id="cb23-108">                ),</span>
<span id="cb23-109">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span></span>
<span id="cb23-110">              ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-111">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-112">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-113">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb23-114">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb23-115">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb23-116">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span></span>
<span id="cb23-117">              ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-118">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Z-Score"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-119">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P(A=1|X)"</span>)</span>
<span id="cb23-120">          }</span>
<span id="cb23-121">          </span>
<span id="cb23-122">        } </span>
<span id="cb23-123">      )</span>
<span id="cb23-124">  ) </span>
<span id="cb23-125"></span>
<span id="cb23-126">gridExtra<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grid.arrange</span>(plots<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>plot[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], plots<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>plot[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]])</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/the-overlap-weight/index_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>As expected from the true model, patients who received treatment <em>A</em> tend to be older, female, and lower-income. The estimates are also slightly miscalibrated for older patients: the model tends to <em>underestimate</em> their true propensity score. We see similar miscalibration for sex.</p>
</section>
</section>
<section id="estimatedoverlapweight" class="level2">
<h2 class="anchored" data-anchor-id="estimatedoverlapweight">Calculate the (estimated) overlap weight</h2>
<p>We’ve already defined how to calculate the overlap weights. The only difference here is that we compute them from the <em>estimated</em> propensity scores instead of the true ones. First, we apply the formula to add the estimated weights to the <code>population</code>:</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb24-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb24-3">  </span>
<span id="cb24-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the estimated overlap weight for the realized treatment (known quantity)</span></span>
<span id="cb24-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb24-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OW_hat =</span> A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>pA_hat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>A) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> pA_hat</span>
<span id="cb24-7">  )</span></code></pre></div>
</details>
</div>
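<p>As a quick sanity check on the formula, here is a minimal base R sketch using made-up values (the <code>toy</code> data frame and its numbers are hypothetical, not drawn from the simulated <code>population</code>). It illustrates that each patient is weighted by the estimated probability of receiving the treatment they did <em>not</em> get: a treated patient with an estimated propensity of 0.9 gets a weight of 0.1, while an untreated patient with the same propensity gets 0.9.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy values (assumption for illustration only)
toy &lt;- data.frame(
  A      = c(1, 1, 0, 0),          # 1 = treatment A, 0 = treatment B
  pA_hat = c(0.9, 0.5, 0.9, 0.5)   # estimated P(A = 1 | X)
)

# Same formula as above: the weight is the estimated probability
# of the treatment the patient did NOT receive
toy$OW_hat &lt;- toy$A * (1 - toy$pA_hat) + (1 - toy$A) * toy$pA_hat

toy</code></pre></div>
</div>
<p>Note that a patient whose estimated propensity is 0.5 receives the same weight (0.5) regardless of which treatment they actually got.</p>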
<p>As mentioned earlier, we then <em>normalize</em> the weights <em>within</em> each treatment group (so the normalized weights sum to 1 in each group), giving both groups the same cumulative contribution when estimating the treatment effect in the outcome model.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">population <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> </span>
<span id="cb25-2">  population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb25-3">  </span>
<span id="cb25-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the normalized weights (adding true and estimated)</span></span>
<span id="cb25-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb25-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OW_norm =</span> OW <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(OW),</span>
<span id="cb25-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OW_hat_norm =</span> OW_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(OW_hat),</span>
<span id="cb25-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> A</span>
<span id="cb25-9">  )</span>
<span id="cb25-10">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(A, pA, pA_hat, OW, OW_norm, OW_hat, OW_hat_norm)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 7
       A    pA pA_hat    OW   OW_norm OW_hat OW_hat_norm
   &lt;int&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;  &lt;dbl&gt;       &lt;dbl&gt;
 1     1 0.577  0.583 0.423 0.000202   0.417   0.000198 
 2     0 0.422  0.427 0.422 0.000201   0.427   0.000202 
 3     1 0.647  0.610 0.353 0.000168   0.390   0.000185 
 4     0 0.219  0.226 0.219 0.000105   0.226   0.000107 
 5     1 0.315  0.329 0.685 0.000327   0.671   0.000318 
 6     1 0.292  0.265 0.708 0.000338   0.735   0.000348 
 7     0 0.644  0.628 0.644 0.000307   0.628   0.000297 
 8     0 0.171  0.181 0.171 0.0000814  0.181   0.0000856
 9     0 0.238  0.250 0.238 0.000114   0.250   0.000118 
10     0 0.257  0.268 0.257 0.000123   0.268   0.000127 
# ℹ 9,990 more rows</code></pre>
</div>
</div>
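<p>To make the normalization step concrete, here is a small base R sketch with made-up weights (the <code>toy</code> data frame is hypothetical and only for illustration). It divides each weight by the total weight in its own treatment group and then checks that the normalized weights add up to 1 within each group, even though the groups differ in size.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy data: 2 patients on treatment A, 3 on treatment B
toy &lt;- data.frame(
  A      = c(1, 1, 0, 0, 0),
  OW_hat = c(0.4, 0.6, 0.2, 0.3, 0.5)
)

# Divide each weight by the total weight in its own treatment group
toy$OW_hat_norm &lt;- ave(toy$OW_hat, toy$A, FUN = function(w) w / sum(w))

# Total normalized weight per group (each group should total 1)
group_totals &lt;- tapply(toy$OW_hat_norm, toy$A, sum)
group_totals</code></pre></div>
</div>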
<p>To get a sense of the impact of these weights, let’s first look at their distributions (again, adding the true weights for comparison):</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-2">  </span>
<span id="cb27-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send overlap weight down the rows</span></span>
<span id="cb27-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb27-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(OW_norm, OW_hat_norm),</span>
<span id="cb27-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Type"</span>,</span>
<span id="cb27-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Weight"</span></span>
<span id="cb27-8">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-9">  </span>
<span id="cb27-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean labels</span></span>
<span id="cb27-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb27-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb27-13">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb27-14">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb27-15">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb27-16">      ),</span>
<span id="cb27-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Type =</span> </span>
<span id="cb27-18">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb27-19">        Type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OW_norm"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"True"</span>,</span>
<span id="cb27-20">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated"</span></span>
<span id="cb27-21">      )</span>
<span id="cb27-22">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-23">  </span>
<span id="cb27-24">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb27-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb27-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density</span>(</span>
<span id="cb27-27">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb27-28">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Weight,</span>
<span id="cb27-29">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Treatment,</span>
<span id="cb27-30">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> Type</span>
<span id="cb27-31">    ),</span>
<span id="cb27-32">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span></span>
<span id="cb27-33">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-34">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb27-35">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb27-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.ticks.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb27-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb27-38">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb27-39">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span></span>
<span id="cb27-40">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Normalized Overlap Weight"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/the-overlap-weight/index_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>In aggregate, the two groups have equal total weight. The shift between the distributions comes from the sampled treatment distribution: only 38.8% of patients received treatment <em>A</em>, so at an individual level each of those patients contributes more on average than a patient who received treatment <em>B</em>.</p>
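<p>The size of that shift can be sketched with simple arithmetic. Because the normalized weights sum to 1 within each group, the average normalized weight in a group is 1 divided by the group size. The snippet below assumes the 10,000 rows shown in the tibble output and treats the quoted 38.8% as exact (it may be rounded), so the group sizes are approximate.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">n_total &lt;- 10000   # rows in the population (from the tibble output)
prop_A  &lt;- 0.388   # share on treatment A (assumed exact; may be rounded)

# Approximate group sizes
n_A &lt;- round(n_total * prop_A)
n_B &lt;- n_total - n_A

# Weights sum to 1 within each group, so the average is 1 / group size
avg_weight_A &lt;- 1 / n_A
avg_weight_B &lt;- 1 / n_B

# How much more an average treatment-A patient contributes
avg_weight_A / avg_weight_B</code></pre></div>
</div>
<p>Under these assumptions, the average patient on treatment <em>A</em> carries roughly 1.6 times the normalized weight of the average patient on treatment <em>B</em>.</p>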
<section id="cumulative-patient-contribution" class="level3">
<h3 class="anchored" data-anchor-id="cumulative-patient-contribution">Cumulative patient contribution</h3>
<p>We can also look at the accumulation of <em>patients</em> (i.e., when each patient contributes equally) as a function of the cumulative <em>overlap weight</em> in each group. This shows how much (or how little) the treatment effect estimate will be dominated by a small concentration of patients.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb28-2">  </span>
<span id="cb28-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send overlap weight down the rows</span></span>
<span id="cb28-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb28-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(OW_norm, OW_hat_norm),</span>
<span id="cb28-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Type"</span>,</span>
<span id="cb28-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Weight"</span></span>
<span id="cb28-8">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb28-9">  </span>
<span id="cb28-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean labels</span></span>
<span id="cb28-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb28-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb28-13">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb28-14">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb28-15">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb28-16">      ),</span>
<span id="cb28-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Type =</span> </span>
<span id="cb28-18">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb28-19">        Type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OW_norm"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"True"</span>,</span>
<span id="cb28-20">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated"</span></span>
<span id="cb28-21">      ),</span>
<span id="cb28-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NullWeight =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb28-23">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb28-24">  </span>
<span id="cb28-25">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rearrange</span></span>
<span id="cb28-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(Treatment, Type, Weight) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb28-27">  </span>
<span id="cb28-28">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the cumulative weight</span></span>
<span id="cb28-29">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb28-30">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Weight =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(Weight) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Weight),</span>
<span id="cb28-31">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NullWeight =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(NullWeight) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(NullWeight),</span>
<span id="cb28-32">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Treatment, Type)</span>
<span id="cb28-33">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb28-34">  </span>
<span id="cb28-35">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb28-36">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb28-37">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb28-38">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb28-39">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Weight,</span>
<span id="cb28-40">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> NullWeight,</span>
<span id="cb28-41">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Treatment,</span>
<span id="cb28-42">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> Type</span>
<span id="cb28-43">    ),</span>
<span id="cb28-44">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb28-45">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-46">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-47">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-48">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb28-49">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb28-50">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb28-51">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb28-52">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span></span>
<span id="cb28-53">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-54">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cumulative percent of overlap weight (%)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-55">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cumulative percent of patients (%)"</span>) </span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/the-overlap-weight/index_files/figure-html/unnamed-chunk-19-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can see that for treatment <em>B</em>, about 40% of the outcome model weight will be accounted for by only 25% of patients. The weights for treatment <em>A</em> are slightly more evenly spread across the patients.</p>
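<p>To make the reading of this curve concrete, here is a small toy example with four hypothetical patients (the weights are made up and are not from the simulated population). It follows the same steps as the plot code: sort by weight, then take cumulative shares.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical weights for four patients, sorted from smallest to largest
w &lt;- sort(c(1, 1, 2, 4))

# Cumulative share of the total weight (x-axis of the plot)
cum_weight &lt;- cumsum(w) / sum(w)

# Cumulative share of patients, each counted equally (y-axis of the plot)
cum_patients &lt;- seq_along(w) / length(w)

data.frame(cum_patients, cum_weight)

# Share of the weight held by the highest-weight 25% of patients
top_share &lt;- 1 - cum_weight[cum_patients == 0.75]
top_share</code></pre></div>
<p>Here the three lowest-weight patients (75% of patients) hold half of the weight, so the single highest-weight patient (25% of patients) accounts for the other 50%.</p>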
</section>
<section id="weighted-mean-differences" class="level3">
<h3 class="anchored" data-anchor-id="weighted-mean-differences">Weighted-mean differences</h3>
<p>It was previously mentioned that the overlap weight methodology leads to perfect balance among the covariates used in the propensity score model. We can verify this by comparing the group means for each model factor before and after weighting.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-2">  </span>
<span id="cb29-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add uniform weights</span></span>
<span id="cb29-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NullWeight =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-5">  </span>
<span id="cb29-6">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send weights down the rows</span></span>
<span id="cb29-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb29-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(NullWeight, OW_hat_norm),</span>
<span id="cb29-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Type"</span>,</span>
<span id="cb29-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Weight"</span></span>
<span id="cb29-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-12">  </span>
<span id="cb29-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send factors down the rows</span></span>
<span id="cb29-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb29-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(age, income, male),</span>
<span id="cb29-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span>,</span>
<span id="cb29-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Value"</span></span>
<span id="cb29-18">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-19">  </span>
<span id="cb29-20">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For each factor, weight type, and treatment group</span></span>
<span id="cb29-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb29-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Patients =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb29-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Weight <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Weight),</span>
<span id="cb29-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Factor, Type, A)</span>
<span id="cb29-25">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-26">  </span>
<span id="cb29-27">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up</span></span>
<span id="cb29-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">transmute</span>(</span>
<span id="cb29-29">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Factor =</span> </span>
<span id="cb29-30">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb29-31">        Factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Age (years)"</span>,</span>
<span id="cb29-32">        Factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"income"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Income ($)"</span>,</span>
<span id="cb29-33">        Factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"male"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Male (%)"</span></span>
<span id="cb29-34">      ),</span>
<span id="cb29-35">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb29-36">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb29-37">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb29-38">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb29-39">      ),</span>
<span id="cb29-40">    Type,</span>
<span id="cb29-41">    Patients,</span>
<span id="cb29-42">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Mean =</span> </span>
<span id="cb29-43">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb29-44">        Factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Male (%)"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Mean,</span>
<span id="cb29-45">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Mean</span>
<span id="cb29-46">      )</span>
<span id="cb29-47">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-48">  </span>
<span id="cb29-49">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send over the columns</span></span>
<span id="cb29-50">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(</span>
<span id="cb29-51">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> Type,</span>
<span id="cb29-52">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Patients, Mean)</span>
<span id="cb29-53">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-54">  </span>
<span id="cb29-55">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rearrange</span></span>
<span id="cb29-56">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(Factor, Treatment) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-57">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(Factor, Treatment, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ends_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NullWeight"</span>), Mean_OW_hat_norm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb29-58">  </span>
<span id="cb29-59">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a table</span></span>
<span id="cb29-60">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reactable</span>(</span>
<span id="cb29-61">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">groupBy =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span>,</span>
<span id="cb29-62">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> </span>
<span id="cb29-63">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb29-64">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Patients_NullWeight =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Patients"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggregate =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sum"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>),</span>
<span id="cb29-65">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Mean_NullWeight =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Before Weighting"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggregate =</span> zildge<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rectbl_agg_wtd</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Patients_NullWeight"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)),</span>
<span id="cb29-66">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Mean_OW_hat_norm =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colDef</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"After Weighting"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggregate =</span> zildge<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rectbl_agg_wtd</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Patients_NullWeight"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colFormat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb29-67">      ),</span>
<span id="cb29-68">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columnGroups =</span> </span>
<span id="cb29-69">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb29-70">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colGroup</span>(</span>
<span id="cb29-71">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Mean Value"</span>,</span>
<span id="cb29-72">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Mean_NullWeight"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Mean_OW_hat_norm"</span>)</span>
<span id="cb29-73">        )</span>
<span id="cb29-74">      ),</span>
<span id="cb29-75">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">striped =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb29-76">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">highlight =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb29-77">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bordered =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb29-78">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resizable =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb29-79">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">theme =</span> reactablefmtr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sandstone</span>()</span>
<span id="cb29-80">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-81">  reactablefmtr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_source</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Use arrows to expand table"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">font_size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">font_style =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"italic"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="reactable html-widget html-fill-item" id="htmlwidget-858fd2eb03b0c0ded5eb" style="width:auto;height:auto;"></div>
<p style="color:#000;background:#FFFFFF;text-align:left;font-size:12px;font-style:italic;font-weight:normal;text-decoration:;letter-spacing:px;word-spacing:px;text-transform:;text-shadow:;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px">Use arrows to expand table</p>
<script type="application/json" data-for="htmlwidget-858fd2eb03b0c0ded5eb">{"x":{"tag":{"name":"Reactable","attribs":{"data":{"Factor":["Age (years)","Age (years)","Income ($)","Income ($)","Male (%)","Male (%)"],"Treatment":["A","B","A","B","A","B"],"Patients_NullWeight":[3882,6118,3882,6118,3882,6118],"Mean_NullWeight":[51.8724351584411,48.7731344443606,46.152542137955,52.1470439747779,55.7444616177228,61.7522066034652],"Mean_OW_hat_norm":[50.6909470423481,50.6909470423472,48.480546358988,48.4805463589892,58.1435359891369,58.1435359891387]},"columns":[{"id":"Factor","name":"Factor","type":"character"},{"id":"Treatment","name":"Treatment","type":"character"},{"id":"Patients_NullWeight","name":"Patients","type":"numeric","aggregate":"sum","align":"center"},{"id":"Mean_NullWeight","name":"Before Weighting","type":"numeric","aggregate":"function(values, rows) {\n            var numerator = 0\n            var denominator = 0\n\n            rows.forEach(function(row, index) {\n                numerator += row['Patients_NullWeight'] * values[index]\n                denominator += row['Patients_NullWeight']\n            })\n\n            if('mean' == 'mean') {\n                return numerator / denominator\n            } else {\n                return numerator\n            }\n        }","format":{"cell":{"digits":1},"aggregated":{"digits":1}},"align":"center"},{"id":"Mean_OW_hat_norm","name":"After Weighting","type":"numeric","aggregate":"function(values, rows) {\n            var numerator = 0\n            var denominator = 0\n\n            rows.forEach(function(row, index) {\n                numerator += row['Patients_NullWeight'] * values[index]\n                denominator += row['Patients_NullWeight']\n            })\n\n            if('mean' == 'mean') {\n                return numerator / denominator\n            } else {\n                return numerator\n            }\n        
}","format":{"cell":{"digits":1},"aggregated":{"digits":1}},"align":"center"}],"columnGroups":[{"name":"Mean Value","columns":["Mean_NullWeight","Mean_OW_hat_norm"]}],"groupBy":["Factor"],"resizable":true,"highlight":true,"bordered":true,"striped":true,"theme":{"color":"#3e3f3a","backgroundColor":"#ffffff","borderColor":"#f8f5f0","borderWidth":"1px","stripedColor":"#ededed","highlightColor":"#f8f5f0","cellPadding":6,"tableStyle":{"fontSize":15},"headerStyle":{"borderWidth":"2px","backgroundColor":"#f8f5f0","color":"#7c7a78","transitionDuration":"0.5s","&:hover[aria-sort]":{"color":"#000000"},"&[aria-sort='ascending'], &[aria-sort='descending']":{"color":"#000000"},"fontSize":16},"groupHeaderStyle":{"&:not(:empty)":{"color":"#3e3f3a","fontSize":16},"&:hover":{"fontWeight":"bold","transitionDuration":"1s","transitionTimingFunction":"ease-out","color":"#000000"}},"rowSelectedStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"},"inputStyle":{"backgroundColor":"#ffffff","borderColor":"#bcbfc1","color":"#3e3f3a"},"searchInputStyle":{"backgroundColor":"#ffffff","color":"#3e3f3a","borderColor":"#bcbfc1","&:focus":{"color":"#3e3f3a"}},"selectStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84","borderColor":"#ffffff","outlineColor":"#ffffff"},"pageButtonStyle":{"backgroundColor":"#f8f5f0","color":"#8e8c84","&:hover":{"backgroundColor":"#f3969a","color":"#8e8c84"}},"pageButtonHoverStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"},"pageButtonActiveStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"},"pageButtonCurrentStyle":{"backgroundColor":"#dfd7ca","color":"#8e8c84"}},"dataKey":"0e103bb7f71d932df5c23f1aa11d9ef7"},"children":[]},"class":"reactR_markup"},"evals":["tag.attribs.columns.3.aggregate","tag.attribs.columns.4.aggregate"],"jsHooks":[]}</script>
</div>
</div>
<p>We can see that, overall, the weighted sample is <em>slightly</em> older, more female, and lower-income than the unweighted sample.</p>
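<p>As a rough check of that reading, the sketch below recomputes the overall (both-treatment) means by averaging the group means with the patient counts as weights, which is what the table's aggregated rows appear to do. The group-level numbers are copied, rounded, from the table above; the names <code>balance</code> and <code>overall</code> are only for this illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Group-level means copied (rounded) from the table above
balance &lt;- data.frame(
  Factor    = rep(c("Age (years)", "Income ($)", "Male (%)"), each = 2),
  Treatment = rep(c("A", "B"), times = 3),
  Patients  = rep(c(3882, 6118), times = 3),
  Before    = c(51.87, 48.77, 46.15, 52.15, 55.74, 61.75),
  After     = c(50.69, 50.69, 48.48, 48.48, 58.14, 58.14)
)

# Patient-count-weighted mean of the group means, within each factor
overall &lt;- do.call(
  rbind,
  lapply(split(balance, balance$Factor), function(d) {
    data.frame(
      Factor = d$Factor[1],
      Before = weighted.mean(d$Before, d$Patients),
      After  = weighted.mean(d$After, d$Patients)
    )
  })
)

# Change in the overall mean after weighting
overall$Shift &lt;- overall$After - overall$Before
overall</code></pre></div>
<p>With these rounded inputs, the overall mean age rises slightly after weighting, while the mean income and the share of males both fall.</p>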
</section>
<section id="confounder-weight-shifts" class="level3">
<h3 class="anchored" data-anchor-id="confounder-weight-shifts">Confounder weight shifts</h3>
<p>Similar to looking at weight attributions as a whole, we can explore weight shifts within subgroups of patient characteristics. This helps build intuition about which patients the subsequent treatment effect estimates will focus on most (and least).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-2">  </span>
<span id="cb30-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add uniform weights</span></span>
<span id="cb30-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NullWeight =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-5">  </span>
<span id="cb30-6">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert to quantile groups (deciles)</span></span>
<span id="cb30-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb30-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(</span>
<span id="cb30-9">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(age, income),</span>
<span id="cb30-10">      \(x) Hmisc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cut2</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">g =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb30-11">    ),</span>
<span id="cb30-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Age =</span> age,</span>
<span id="cb30-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Income =</span> income,</span>
<span id="cb30-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sex =</span> </span>
<span id="cb30-15">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb30-16">        male <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Male"</span>,</span>
<span id="cb30-17">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Female"</span></span>
<span id="cb30-18">      ),</span>
<span id="cb30-19">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> </span>
<span id="cb30-20">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb30-21">        A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb30-22">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span></span>
<span id="cb30-23">      )</span>
<span id="cb30-24">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-25">  </span>
<span id="cb30-26">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send weights down the rows</span></span>
<span id="cb30-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb30-28">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(NullWeight, OW_hat_norm),</span>
<span id="cb30-29">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Type"</span>,</span>
<span id="cb30-30">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Weight"</span></span>
<span id="cb30-31">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-32">  </span>
<span id="cb30-33">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send factors down the rows</span></span>
<span id="cb30-34">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb30-35">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Age, Income, Sex),</span>
<span id="cb30-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Factor"</span>,</span>
<span id="cb30-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Level"</span></span>
<span id="cb30-38">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb30-39">  </span>
<span id="cb30-40">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute total weights</span></span>
<span id="cb30-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb30-42">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Weight =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Weight),</span>
<span id="cb30-43">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Factor, Level, Type, Treatment)</span>
<span id="cb30-44">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-45">  </span>
<span id="cb30-46">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Find percent of weight</span></span>
<span id="cb30-47">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb30-48">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Weight =</span> Weight <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Weight),</span>
<span id="cb30-49">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Factor, Type, Treatment)</span>
<span id="cb30-50">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-51">  </span>
<span id="cb30-52">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send over columns</span></span>
<span id="cb30-53">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(</span>
<span id="cb30-54">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> Type,</span>
<span id="cb30-55">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> Weight</span>
<span id="cb30-56">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-57">  </span>
<span id="cb30-58">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean up</span></span>
<span id="cb30-59">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb30-60">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Level =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(Level) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_rev</span>(),</span>
<span id="cb30-61">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Factor =</span> </span>
<span id="cb30-62">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(</span>
<span id="cb30-63">        Factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Age"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Age (years)"</span>,</span>
<span id="cb30-64">        Factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Income"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Income ($)"</span>,</span>
<span id="cb30-65">        <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Factor</span>
<span id="cb30-66">      )</span>
<span id="cb30-67">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-68">  </span>
<span id="cb30-69">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make a plot</span></span>
<span id="cb30-70">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-71">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(</span>
<span id="cb30-72">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb30-73">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Level,</span>
<span id="cb30-74">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> NullWeight,</span>
<span id="cb30-75">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Treatment</span>
<span id="cb30-76">    ),</span>
<span id="cb30-77">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb30-78">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">75</span>,</span>
<span id="cb30-79">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>,</span>
<span id="cb30-80">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dodge"</span></span>
<span id="cb30-81">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb30-82">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(</span>
<span id="cb30-83">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb30-84">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Level,</span>
<span id="cb30-85">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> OW_hat_norm,</span>
<span id="cb30-86">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Treatment,</span>
<span id="cb30-87">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> Treatment</span>
<span id="cb30-88">    ),</span>
<span id="cb30-89">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">position_dodge</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb30-90">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb30-91">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-92">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(</span>
<span id="cb30-93">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb30-94">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Level,</span>
<span id="cb30-95">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> OW_hat_norm,</span>
<span id="cb30-96">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Treatment,</span>
<span id="cb30-97">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> Treatment</span>
<span id="cb30-98">    ),</span>
<span id="cb30-99">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">position_dodge</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb30-100">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb30-101">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-102">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Factor, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-103">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_flip</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-104">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-105">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb30-106">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb30-107">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>),</span>
<span id="cb30-108">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span></span>
<span id="cb30-109">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-110">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Level"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-111">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Share of weight within group (%)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb30-112">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb30-113">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Before Weighting"</span>,</span>
<span id="cb30-114">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"After Weighting"</span>,</span>
<span id="cb30-115">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"After Weighting"</span></span>
<span id="cb30-116">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.zajichekstats.com/post/the-overlap-weight/index_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can see the balancing take place. After overlap weighting, older, lower-income patients contribute less to the treatment effect estimate within treatment <em>A</em>, while younger, higher-income patients contribute less within treatment <em>B</em>. Some levels in the middle gain weight in <em>both</em> treatment groups; this is where the groups overlap the most.</p>
<p>It is also useful to look at these plots for factors that were <em>not</em> included in the propensity score model. A drastic weight shift for such a factor may indicate that it co-varies strongly with the factors already accounted for.</p>
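<p>A minimal sketch of that check is below. It uses toy data rather than the <code>population</code> data set from this article: the <code>smoker</code> column is a hypothetical factor left out of the propensity score model, the <code>OW_hat_norm</code> values are random placeholders rather than real overlap weights, and <code>weight_share()</code> is a helper written only for this example.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Toy stand-in for `population` (all values are simulated placeholders)
set.seed(1)
toy &lt;- data.frame(
  A           = rbinom(200, 1, 0.4),                        # treatment indicator
  smoker      = sample(c("Yes", "No"), 200, replace = TRUE), # hypothetical factor
  OW_hat_norm = runif(200)                                   # placeholder weights
)
toy$NullWeight &lt;- 1  # uniform weights, i.e., before weighting

# Share of total weight held by each level, within each treatment group
weight_share &lt;- function(data, factor_col, weight_col) {
  totals &lt;- tapply(data[[weight_col]], list(data[[factor_col]], data$A), sum)
  sweep(totals, 2, colSums(totals), "/")  # each column sums to 1
}

before &lt;- weight_share(toy, "smoker", "NullWeight")
after  &lt;- weight_share(toy, "smoker", "OW_hat_norm")

# Shift in weight share per level (rows) and treatment (columns)
after - before</code></pre></div>
<p>Large positive or negative entries in the final matrix would flag levels whose influence changed the most after weighting.</p>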
</section>
</section>
</section>
<section id="estimatingtreatmenteffect" class="level1">
<h1>Estimating the treatment effect</h1>
<p>We’re now ready to estimate the causal treatment effect. Although we generated our data from a <a href="https://en.wikipedia.org/wiki/Weibull_distribution">Weibull</a> distribution, we will estimate the effect with a <a href="https://en.wikipedia.org/wiki/Proportional_hazards_model">Cox proportional-hazards model</a>, a <em>semi-parametric</em> method that tends to be the default choice for modeling survival data, especially in healthcare.</p>
<section id="truehazardratio" class="level2">
<h2 class="anchored" data-anchor-id="truehazardratio">A look at the true hazard ratio</h2>
<p>The treatment effect will be quantified by a <a href="https://en.wikipedia.org/wiki/Hazard_ratio"><em>hazard ratio</em></a>, which is an estimate of the ratio of instantaneous event rates between the treatment groups (this is what is done in <a href="https://www2.stat.duke.edu/~fl35/OW/OW_survival_Demo.sas">the original simulation</a>). This measure differs from the effect we interpreted from the true data-generating process, so we shouldn’t expect the two to match numerically. We should still reach the same conclusion, though: treatment <em>A</em> is superior to treatment <em>B</em>.</p>
<p>To start, we can compute the actual hazard ratio we would get had we been able to observe both potential outcomes for every patient (i.e., including the counterfactual). To do this, we need to transform our data so that we have two rows per patient, one for each treatment. Then, we fit the Cox model, using the <em>true</em> overlap weights as case weights.</p>
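<p>To make "ratio of instantaneous event rates" concrete, here is a small toy example that is separate from the simulation in this article. It assumes constant (exponential) hazards with a known ratio of 0.7 and no censoring, then checks that a Cox model roughly recovers that ratio. The seed, sample size, baseline rate, and the <code>toy</code> names are arbitrary choices for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(survival)

set.seed(123)  # arbitrary seed
n &lt;- 5000

toy &lt;- data.frame(A = rbinom(n, 1, 0.5))  # 1 = treatment A, 0 = treatment B

# Constant hazard of 0.10 under B, and 0.7 times that under A (true HR = 0.7)
toy$time   &lt;- rexp(n, rate = 0.10 * ifelse(toy$A == 1, 0.7, 1))
toy$status &lt;- 1  # every event is observed (no censoring)

# The exponentiated Cox coefficient is the estimated hazard ratio (A vs. B)
toy_mod &lt;- coxph(Surv(time, status) ~ factor(A), data = toy)
toy_hr  &lt;- exp(coef(toy_mod))[[1]]
toy_hr</code></pre></div>
<p>The estimate should land near 0.7, differing only by sampling error.</p>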
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb31-2">true_hr_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb31-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coxph</span>(</span>
<span id="cb31-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Surv</span>(time, status) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(A),</span>
<span id="cb31-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> </span>
<span id="cb31-6">      population <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-7">      </span>
<span id="cb31-8">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep columns needed</span></span>
<span id="cb31-9">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb31-10">        OW_norm, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True (normalized) overlap weight (could use un-normalized version here)</span></span>
<span id="cb31-11">        t_A, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True event time under treatment A</span></span>
<span id="cb31-12">        t_B, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True event time under treatment B</span></span>
<span id="cb31-13">        censor_time <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># When the patient was censored</span></span>
<span id="cb31-14">      ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-15">      </span>
<span id="cb31-16">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Send true event times down the rows</span></span>
<span id="cb31-17">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb31-18">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(t_A, t_B),</span>
<span id="cb31-19">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>,</span>
<span id="cb31-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"time"</span></span>
<span id="cb31-21">      ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-22">      </span>
<span id="cb31-23">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check if censored first</span></span>
<span id="cb31-24">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb31-25">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">A =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">case_when</span>(A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"t_A"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Replace with our notation</span></span>
<span id="cb31-26">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmin</span>(time, censor_time),</span>
<span id="cb31-27">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">status =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(time <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> censor_time)</span>
<span id="cb31-28">      ),</span>
<span id="cb31-29">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weights =</span> OW_norm</span>
<span id="cb31-30">  )</span>
<span id="cb31-31"></span>
<span id="cb31-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the result</span></span>
<span id="cb31-33">true_hr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(true_hr_mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients)[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb31-34">true_hr</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.7284168</code></pre>
</div>
</div>
<p>If we could compare the worlds in which we observe all patients under each treatment, the hazard ratio would be 0.73, suggesting that the instantaneous event rate is 27% lower with treatment <em>A</em> than with treatment <em>B</em>. This aligns with what we’ve established as the true effect. Our hope is that the estimate from the observable data (in the next section) will land close to this value, with a confidence interval that contains it.</p>
</section>
<section id="estimatedhazardratio" class="level2">
<h2 class="anchored" data-anchor-id="estimatedhazardratio">Our estimate of the treatment effect</h2>
<p>Finally, we can obtain our estimate of the treatment effect using only the observable quantities in our data set, which is what we would have in a real-life analysis. As in the previous section, we fit a Cox model with the treatment as the only covariate. This time, however, we observe only one outcome per patient (the one under the treatment they actually received), and we use the <em>estimated</em> overlap weight to balance the confounding factors (age, sex, and income). We will also use <a href="https://rdrr.io/cran/survival/man/coxph.html">robust</a> standard error estimates.</p>
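<p>As a rough sketch of what <code>robust = TRUE</code> changes, the snippet below fits a weighted Cox model to the <code>lung</code> data shipped with the <code>survival</code> package. The data set and the randomly generated weights <code>w</code> are stand-ins chosen purely for illustration; they are not the simulated population or the overlap weights used in this post.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(survival)

# Illustration only: built-in data with made-up (hypothetical) weights
set.seed(123)
dat &lt;- survival::lung
dat$w &lt;- runif(nrow(dat), min = 0.1, max = 1)

# Weighted Cox model with a robust (sandwich) variance estimate
fit &lt;-
  coxph(
    formula = Surv(time, status) ~ factor(sex),
    data = dat,
    weights = w,
    robust = TRUE
  )

# The summary reports the model-based and the robust standard error side by side
summary(fit)$coefficients[, c("se(coef)", "robust se")]

# fit$var holds the robust variance; fit$naive.var holds the model-based one
sqrt(diag(fit$var))
sqrt(diag(fit$naive.var))

# confint() is built from fit$var, so this interval uses the robust standard error
exp(confint(fit))</code></pre></div>
<p>The point is only that the confidence interval reported below is based on the robust variance rather than the model-based one, which matters when the weights are not simple frequency counts.</p>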
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the model</span></span>
<span id="cb33-2">estimated_hr_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb33-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coxph</span>(</span>
<span id="cb33-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Surv</span>(time, status) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(A),</span>
<span id="cb33-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> population,</span>
<span id="cb33-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weights =</span> OW_hat_norm,</span>
<span id="cb33-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">robust =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb33-8">  )</span>
<span id="cb33-9"></span>
<span id="cb33-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the result</span></span>
<span id="cb33-11">estimated_hr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(estimated_hr_mod<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients)[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb33-12">estimated_hr</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.7423806</code></pre>
</div>
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the confidence interval</span></span>
<span id="cb35-2">estimated_hr_ci <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">confint</span>(estimated_hr_mod))</span>
<span id="cb35-3">estimated_hr_ci</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>               2.5 %    97.5 %
factor(A)1 0.6756889 0.8156548</code></pre>
</div>
</div>
<p>Our estimate for the hazard ratio is 0.74, with a 95% confidence interval ranging from 0.68 to 0.82, suggesting that the instantaneous event rate for treatment <em>A</em> is between 18% and 32% lower than for treatment <em>B</em> (note that this interval captures the true hazard ratio of 0.73). Under the assumption that we’ve correctly specified all confounding factors (which we did), we can interpret this as the <em>causal</em> treatment effect. Assuming the magnitude of improvement is of practical importance (when considering things like cost, resources, clinical outcomes, etc.), we can reliably conclude that there is a benefit to treatment <em>A</em> over treatment <em>B</em>, on average.</p>
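<p>The percentages quoted in this section come from a simple conversion of the hazard ratio, sketched below. The numbers are copied from the printed output above, and <code>pct_reduction</code> is a helper name introduced here only for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Values copied from the printed output in this post
true_hr &lt;- 0.7284168
estimated_hr &lt;- 0.7423806
estimated_hr_ci &lt;- c(lower = 0.6756889, upper = 0.8156548)

# Percent reduction in the hazard implied by a hazard ratio (hypothetical helper)
pct_reduction &lt;- function(hr) 100 * (1 - hr)

round(pct_reduction(true_hr))         # 27
round(pct_reduction(estimated_hr))    # 26
round(pct_reduction(estimated_hr_ci)) # lower = 32, upper = 18

# Check that the confidence interval contains the true hazard ratio
true_hr &gt;= estimated_hr_ci[["lower"]] &amp;&amp; true_hr &lt;= estimated_hr_ci[["upper"]] # TRUE</code></pre></div>
<p>Note that the lower confidence limit of the hazard ratio maps to the larger reduction (32%) and the upper limit to the smaller one (18%).</p>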
</section>
</section>
<section id="resources" class="level1">
<h1>Resources</h1>
<p>Some additional resources I’ve come across while learning the methodology:</p>
<ul>
<li>Original paper for overlap weighting (<a href="https://pubmed.ncbi.nlm.nih.gov/30189042/">link</a>)</li>
<li>Paper extending overlap weighting to survival analysis (<a href="https://arxiv.org/abs/2108.04394">link</a>)</li>
<li>Example simulation of overlap weighting in the survival setting in SAS (<a href="http://www2.stat.duke.edu/~fl35/OW/OW_survival_Demo.sas">link</a>)</li>
<li>R package for implementing overlap weight methods (<a href="https://github.com/thuizhou/PSweight">link</a>)</li>
<li>R functions for overlap weighting in the survival setting (<a href="https://github.com/chaochengstat/OW_Survival">link</a>)</li>
<li>Author’s web page dedicated to overlap weighting (<a href="https://www2.stat.duke.edu/~fl35/OW.html">link</a>)</li>
</ul>


<!-- -->

</section>

 ]]></description>
  <category>Causal Inference</category>
  <category>Survival Analysis</category>
  <category>Propensity Scores</category>
  <category>Weighting</category>
  <guid>https://www.zajichekstats.com/post/the-overlap-weight/</guid>
  <pubDate>Mon, 02 Oct 2023 05:00:00 GMT</pubDate>
  <media:content url="https://www.zajichekstats.com/post/the-overlap-weight/feature.png" medium="image" type="image/png" height="146" width="144"/>
</item>
</channel>
</rss>
