Matched Case-Control Modeling

One upcoming regression method in Colossus is matched case-control logistic regression. The implementation is ongoing, but the theory is presented in the following section.

General Theory

Suppose we have matched case-control data and divide the data into matched sets. Each set has $m$ cases among $n$ records. We denote the relative risk for individual $i$ in the set by $r_i$, labelling the cases as the first $m$ individuals, and we write $R$ for the collection of all ways of selecting $m$ individuals from the $n$ at risk. The probability of the case exposures, conditional on all exposures in the set, is the ratio of the product of relative risks in the cases to the sum, over every selection $c \in R$, of the product of relative risks in that selection. The log of this probability is the contribution $L$ of the set to the log-likelihood.

$$
\begin{aligned}
\frac{\prod_{i=1}^{m} r_i}{\sum_{c \in R} \left( \prod_{j=1}^{m} r_{c_j} \right)} \\
L = \sum_{i=1}^{m} \log(r_i) - \log \left( \sum_{c \in R} \left( \prod_{j=1}^{m} r_{c_j} \right) \right)
\end{aligned}
$$
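As a small worked illustration (the set size and labels here are chosen only for the example), take a matched set with $n = 3$ records and $m = 2$ cases, where individuals 1 and 2 are the cases. There are three ways to select two individuals, so the conditional probability and its log are:

$$
\begin{aligned}
\frac{r_1 r_2}{r_1 r_2 + r_1 r_3 + r_2 r_3} \\
L = \log(r_1) + \log(r_2) - \log \left( r_1 r_2 + r_1 r_3 + r_2 r_3 \right)
\end{aligned}
$$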

The denominator sums over all $n!/(m!(n-m)!)$ ways to select $m$ individuals from $n$, which quickly becomes expensive to enumerate. Using the methods presented in Gail et al. (1981), we can instead compute this sum with a more manageable recursive formula, $B(m,n)$.

$$
\begin{aligned}
B(m, n) = \sum_{c \in R} \left( \prod_{j=1}^{m} r_{c_j} \right) \\
B(m,n) = B(m, n-1) + r_n B(m-1, n-1) \\
B(m,n) = \begin{cases} \sum_{j=1}^{n} r_j & m = 1 \\ 0 & m > n \end{cases}
\end{aligned}
$$
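The recursion can be sketched in a few lines of Python and checked against direct enumeration. This is an illustrative sketch, not the Colossus implementation: the function names are hypothetical, and the convention $B(0, n) = 1$ (the empty product) is an assumption added here because it reproduces the stated $m = 1$ base case.

```python
from itertools import combinations
from math import prod


def b_recursive(r, m):
    """B(m, n) for relative risks r (n = len(r)), filled in as a table."""
    n = len(r)
    # table[k][j] holds B(k, j). Assumed convention: B(0, j) = 1.
    table = [[0.0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        table[0][j] = 1.0
    for k in range(1, m + 1):
        # Entries with k > j are left at 0, matching the m > n base case.
        for j in range(k, n + 1):
            table[k][j] = table[k][j - 1] + r[j - 1] * table[k - 1][j - 1]
    return table[m][n]


def b_brute_force(r, m):
    """Direct sum over every way of selecting m of the n risks."""
    return sum(prod(c) for c in combinations(r, m))


risks = [1.0, 2.0, 0.5, 3.0]  # hypothetical relative risks for one matched set
print(b_recursive(risks, 2), b_brute_force(risks, 2))
```

The table needs on the order of $mn$ multiplications, whereas enumeration visits every one of the $n!/(m!(n-m)!)$ selections.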

Differentiating $B(m,n)$ with respect to the model parameters $\beta_\mu$ gives the first and second derivatives directly, and each derivative satisfies a recursive formula of the same form.

$$
\begin{aligned}
\frac{\partial r_i}{\partial \beta_\mu} =: r_{i}^{\mu} \\
\frac{\partial B(m,n)}{\partial \beta_\mu} = B^{\mu}(m, n) = \sum_{c \in R} \left[ \left( \sum_{j=1}^{m} \frac{r_{c_j}^{\mu}}{r_{c_j}} \right) \prod_{j=1}^{m} r_{c_j} \right] \\
B^{\mu}(m,n) = B^{\mu}(m, n-1) + r_n B^{\mu}(m-1, n-1) + r_n^{\mu} B(m-1, n-1) \\
B^{\mu}(m,n) = \begin{cases} \sum_{j=1}^{n} r_j^{\mu} & m = 1 \\ 0 & m > n \end{cases}
\end{aligned}
$$
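The first-derivative recursion can be carried alongside the recursion for $B(m,n)$ in the same table sweep. The sketch below is illustrative only: the function name is hypothetical, $B(0, n) = 1$ and $B^{\mu}(0, n) = 0$ are assumed conventions, and the log-linear risk $r_i = \exp(\beta x_i)$ with a single parameter is used purely to have something to differentiate.

```python
import math


def b_with_gradient(r, dr, m):
    """Return (B(m, n), B^mu(m, n)) given risks r and their derivatives dr."""
    n = len(r)
    # B[k][j] = B(k, j) and dB[k][j] = B^mu(k, j).
    # Assumed conventions: B(0, j) = 1 and B^mu(0, j) = 0.
    B = [[0.0] * (n + 1) for _ in range(m + 1)]
    dB = [[0.0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        B[0][j] = 1.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            B[k][j] = B[k][j - 1] + r[j - 1] * B[k - 1][j - 1]
            dB[k][j] = (dB[k][j - 1]
                        + r[j - 1] * dB[k - 1][j - 1]
                        + dr[j - 1] * B[k - 1][j - 1])
    return B[m][n], dB[m][n]


def evaluate(beta, x, m):
    """Assumed model for illustration: r_i = exp(beta * x_i)."""
    r = [math.exp(beta * xi) for xi in x]
    dr = [xi * ri for xi, ri in zip(x, r)]
    return b_with_gradient(r, dr, m)


x = [0.2, -1.0, 0.7, 1.5]  # hypothetical covariate values
h = 1e-6
value, gradient = evaluate(0.3, x, 2)
# Central finite difference as an independent check of the recursion.
finite_diff = (evaluate(0.3 + h, x, 2)[0] - evaluate(0.3 - h, x, 2)[0]) / (2 * h)
print(gradient, finite_diff)
```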

$$
\begin{aligned}
\frac{\partial^2 r_i}{\partial \beta_\mu \partial \beta_\nu} =: r_{i}^{\mu,\nu} \\
\frac{\partial^2 B(m,n)}{\partial \beta_\mu \partial \beta_\nu} = B^{\mu,\nu}(m, n) = \sum_{c \in R} \left[ \left( \sum_{j=1}^{m} \frac{r_{c_j}^{\mu,\nu}}{r_{c_j}} + \left( \sum_{j=1}^{m} \frac{r_{c_j}^{\mu}}{r_{c_j}} \right) \left( \sum_{j=1}^{m} \frac{r_{c_j}^{\nu}}{r_{c_j}} \right) - \sum_{j=1}^{m} \frac{r_{c_j}^{\mu}}{r_{c_j}} \frac{r_{c_j}^{\nu}}{r_{c_j}} \right) \prod_{j=1}^{m} r_{c_j} \right] \\
B^{\mu,\nu}(m,n) = B^{\mu,\nu}(m, n-1) + r_n^{\mu,\nu} B(m-1, n-1) + r_n^{\nu} B^{\mu}(m-1, n-1) + r_n^{\mu} B^{\nu}(m-1, n-1) + r_n B^{\mu,\nu}(m-1, n-1) \\
B^{\mu,\nu}(m,n) = \begin{cases} \sum_{j=1}^{n} r_j^{\mu,\nu} & m = 1 \\ 0 & m > n \end{cases}
\end{aligned}
$$
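As a quick consistency check of the second-derivative recursion (a worked example added for illustration), take $m = n = 2$, so that $B(2,2) = r_1 r_2$. Applying the recursion once, with the base cases $B^{\mu,\nu}(2,1) = 0$, $B(1,1) = r_1$, $B^{\mu}(1,1) = r_1^{\mu}$ and $B^{\mu,\nu}(1,1) = r_1^{\mu,\nu}$, gives exactly the product rule applied twice to $r_1 r_2$:

$$
\begin{aligned}
B^{\mu,\nu}(2,2) = B^{\mu,\nu}(2,1) + r_2^{\mu,\nu} B(1,1) + r_2^{\nu} B^{\mu}(1,1) + r_2^{\mu} B^{\nu}(1,1) + r_2 B^{\mu,\nu}(1,1) \\
B^{\mu,\nu}(2,2) = r_1 r_2^{\mu,\nu} + r_1^{\mu} r_2^{\nu} + r_1^{\nu} r_2^{\mu} + r_1^{\mu,\nu} r_2
\end{aligned}
$$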

Finally, these expressions for $B(m,n)$ and its derivatives can be substituted into the equations for the contribution of each matched set to the log-likelihood and its derivatives. The model is then optimized via the same methods as the other regression models.

$$
\begin{aligned}
L_{set} = \sum_{i=1}^{m} \log(r_i) - \log \left( B(m,n) \right) \\
L^{\mu}_{set} = \sum_{i=1}^{m} \frac{r_i^{\mu}}{r_i} - \frac{B^{\mu}(m,n)}{B(m,n)} \\
L^{\mu, \nu}_{set} = \sum_{i=1}^{m} \left( \frac{r_i^{\mu,\nu}}{r_i} - \frac{r_i^{\mu}}{r_i}\frac{r_i^{\nu}}{r_i} \right) - \left( \frac{B^{\mu,\nu}(m,n)}{B(m,n)} - \frac{B^{\mu}(m,n)}{B(m,n)}\frac{B^{\nu}(m,n)}{B(m,n)} \right)
\end{aligned}
$$
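Putting the pieces together, the contribution of one matched set to the log-likelihood and to its first derivative can be sketched as follows. This is a minimal single-parameter illustration under stated assumptions, not the Colossus code: the function name, the `is_case` flag list, and the model $r_i = \exp(\beta x_i)$ are all hypothetical, and the recursion is stored in a rolling array (again assuming $B(0, n) = 1$) rather than a full table.

```python
import math


def set_loglik_and_score(r, dr, is_case):
    """Return (L_set, L_set^mu) for one matched set and one parameter."""
    m = sum(is_case)  # number of cases in the set
    # B[k] and dB[k] hold B(k, j) and B^mu(k, j) for the records seen so far.
    B = [1.0] + [0.0] * m
    dB = [0.0] * (m + 1)
    for j in range(len(r)):
        # Go downwards in k so that index k - 1 still refers to j - 1 records.
        for k in range(min(m, j + 1), 0, -1):
            dB[k] += r[j] * dB[k - 1] + dr[j] * B[k - 1]
            B[k] += r[j] * B[k - 1]
    cases = [i for i, flag in enumerate(is_case) if flag]
    loglik = sum(math.log(r[i]) for i in cases) - math.log(B[m])
    score = sum(dr[i] / r[i] for i in cases) - dB[m] / B[m]
    return loglik, score


# Hypothetical 1:2 matched set; the first record is the case.
x = [1.0, 0.0, 0.5]
is_case = [True, False, False]
beta = 0.0  # at beta = 0 every r_i = 1, so B(1, 3) = 3
r = [math.exp(beta * xi) for xi in x]
dr = [xi * ri for xi, ri in zip(x, r)]
loglik, score = set_loglik_and_score(r, dr, is_case)
print(loglik, score)
```

Summing these contributions over all matched sets gives the full log-likelihood and score. The second derivative follows the same pattern, with one additional rolling array for $B^{\mu,\nu}$.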