Merdock
Quartz | Level 8

I have a retrospective study (registry) with data on people with a kidney transplant. The registry contains a very small number of people with a specific genetic disease (n=17) and a large number of people without the disease (N=13000). The registry provides data on various outcomes for the patients, such as graft failure, various lab values, infections, etc. The theory is that this disease increases the risk of kidney complications and poor transplant outcomes.

 

Aims:

(1) Perform 1:1 matching of cases with the genetic disease to controls without the disease on age, gender, and race.

(2) Do a descriptive analysis of various outcomes (serum creatinine levels, CMV infection, etc.) in the two groups.

 

I have already addressed aim (1) by performing the 1:1 case-control match (i.e., one control for each case) using the guidance in this article (https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/173-29.pdf) but I’m confused about what to do next as I don’t have any prior experience with matching in cohort studies.

 

Question: For the descriptive analysis (I want the table below), what should I use to evaluate differences between matched pairs (i.e., cases vs. controls) on specific characteristics? Is it OK to use the signed rank test (if the variable is continuous) and McNemar's test (if categorical)? From what I've read, opinions are divided on whether matching should or can be ignored, but I don't have enough experience with this to know which opinion to go with. If it's OK to ignore matching, should I use the independent samples t-test and the chi-square test?

If you recommend I use signed rank test and McNemar instead, does anybody have any sample code for how to do this for my data (sample data below)?

Characteristic at Baseline         Disease (Case)    No Disease (Control)    P-value
Serum Creatinine Value (mg/dL)
    N
    Mean (SD)
    Median (IQR)
CMV Infection, n (%)
    Yes
    No

My final dataset looks like this, where disease=1 denotes presence of disease (case), and group links the case with its matched control:

data exam;
	input id $ age race $ gender $ creat cmv $ disease $ group;
cards;
0001 19 2 2 23 1 1 1 
0017 19 2 2 28 0 0 1 
0002 10 2 1 43 1 1 2 
0005 10 2 1 26 1 0 2 
0060 15 2 2 54 1 1 3 
0010 15 2 2 43 0 0 3 
0018 14 2 2 120 1 1 4 
0105 14 2 2 29 1 0 4 
0008 18 2 1 36 1 1 5 
0022 18 2 1 57 0 0 5 
0548 15 2 1 49 0 1 6 
0052 15 2 1 100 1 0 6 
0059 13 2 1 95 0 1 7 
0982 13 2 1 65 1 0 7 
0047 12 2 1 20 1 1 8 
0084 12 2 1 39 0 0 8 
0680 17 2 2 78 0 1 9 
0042 17 2 2 110 0 0 9 
0984 15 2 2 66 1 1 10 
0007 15 2 2 85 0 0 10 
0021 16 2 1 73 0 1 11 
0873 16 2 1 62 0 0 11 
0193 17 2 1 71 1 1 12 
0178 17 2 1 76 0 0 12 
;
run;

Thanks in advance for any help/suggestions!

 

 

StatDave
SAS Super FREQ

Yes, you could use McNemar's test to compare the proportions between the disease groups when the response is binary. See this note, which shows an example and also shows how a repeated measures analysis can be done if confidence intervals for the proportions are needed. For a continuous response, you could use a repeated measures analysis similar to the one shown in the note, but specify the appropriate response distribution using the DIST= option in the MODEL statement. If DIST= is not specified, the normal distribution is used.
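As a rough sketch of both suggestions, applied to the EXAM data set from the original post: the first part reshapes CMV status to one row per matched pair and requests McNemar's test. The second part is one possible repeated measures formulation for creatinine, using PROC GENMOD with a REPEATED statement. The data set names exam_sorted and cmv_wide are made up for illustration, and the exchangeable working correlation is an assumption, not something taken from the note.

```sas
/* Assumes the EXAM data set from the original post:
   cmv and disease are character ('0'/'1'), group identifies the matched pair. */
proc sort data=exam out=exam_sorted;
    by group;
run;

/* One row per matched pair: cmv1 = case's CMV status, cmv0 = control's. */
proc transpose data=exam_sorted out=cmv_wide(drop=_name_) prefix=cmv;
    by group;
    id disease;
    var cmv;
run;

/* AGREE requests McNemar's test on the paired 2x2 table.
   EXACT MCNEM adds an exact p-value, which may be preferable with few pairs. */
proc freq data=cmv_wide;
    tables cmv1*cmv0 / agree;
    exact mcnem;
run;

/* Possible repeated measures model for a continuous response:
   matched pairs are treated as clusters (SUBJECT=group).
   DIST=NORMAL is the default and is written out only for clarity. */
proc genmod data=exam_sorted;
    class group disease;
    model creat = disease / dist=normal;
    repeated subject=group / type=exch;
    lsmeans disease / diff cl;
run;
```

McNemar's test uses only the discordant pairs (pairs where the case and control differ on CMV status), so with 17 pairs the exact version is likely the safer one to report.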

Merdock
Quartz | Level 8

@StatDave, thank you for your input! I ended up using the Wilcoxon signed rank test to compare my disease and no disease groups on continuous variables. But now I'm a bit confused about how to present these results in a table similar to the one in my original post above. Since the test is on the paired difference in creatinine levels between the disease and no disease groups, how can I present the mean (SD) and median (IQR) for each disease group and then the p-value? I'm not sure whether my question makes sense, but basically I'm going by what I read in this article (https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/165-29.pdf), where the authors say that "For the matched analysis, differences between matched pairs were evaluated using the signed rank test for continuous data and the McNemar's test for binary data" and give Table 2 on page 5, which is what I was hoping to do for my variables. But I don't know how to do that using the Wilcoxon signed rank test:

[Image attachment: Merdock_0-1665461412799.png]

How can I obtain a table like this for my Disease vs No Disease groups and creatinine variable? Code for what I did so far is below:

data exam;
	input id $ age race $ gender $ creat cmv $ disease $ group;
cards;
0001 19 2 2 23 1 1 1 
0017 19 2 2 28 0 0 1 
0002 10 2 1 43 1 1 2 
0005 10 2 1 26 1 0 2 
0060 15 2 2 54 1 1 3 
0010 15 2 2 43 0 0 3 
0018 14 2 2 120 1 1 4 
0105 14 2 2 29 1 0 4 
0008 18 2 1 36 1 1 5 
0022 18 2 1 57 0 0 5 
0548 15 2 1 49 0 1 6 
0052 15 2 1 100 1 0 6 
0059 13 2 1 95 0 1 7 
0982 13 2 1 65 1 0 7 
0047 12 2 1 20 1 1 8 
0084 12 2 1 39 0 0 8 
0680 17 2 2 78 0 1 9 
0042 17 2 2 110 0 0 9 
0984 15 2 2 66 1 1 10 
0007 15 2 2 85 0 0 10 
0021 16 2 1 73 0 1 11 
0873 16 2 1 62 0 0 11 
0193 17 2 1 71 1 1 12 
0178 17 2 1 76 0 0 12 
;
run;
proc print data=exam; run;
proc sort data=exam; by group; run;
proc transpose data=exam out=exam1 prefix=creat;
    by group;
    id disease;
    var creat;
run;
proc print data=exam1; run;

data exam1;
	set exam1;
	rename creat1=DISEASE_creat creat0=NODISEASE_creat;
run;
proc print data=exam1; run;

data final;
    set exam1;
    diff=DISEASE_creat-NODISEASE_creat;
run;
proc print data=final; run;

/*perform Wilcoxon Signed Rank Test*/
proc univariate data=final;
    var diff;
run;
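
One way such a table could be assembled, offered as a sketch rather than the cited paper's method: take the descriptive columns from ordinary per-group summaries of the long data set (EXAM), and take the p-value from the paired test on the differences (FINAL). This assumes both data sets exist as created above; the choice of Q1 and Q3 to report the IQR is an assumption about the desired layout.

```sas
/* Descriptive columns: N, mean, SD, median, Q1, Q3 of creatinine
   for each disease group (disease='1' case, disease='0' control). */
proc means data=exam n mean std median q1 q3 maxdec=1;
    class disease;
    var creat;
run;

/* P-value column: restrict PROC UNIVARIATE output to the
   Tests for Location table and read the Signed Rank row. */
ods select TestsForLocation;
proc univariate data=final;
    var diff;
run;
```

The group summaries describe each column of the table, while the p-value comes from the within-pair differences, so the two pieces are computed from different layouts of the same data.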
Ksharp
Super User
If the variable is continuous, you could use a paired t-test (PROC TTEST with PAIRED before*after;),
or you could make a diff variable (diff=after-before) and use it in a signed rank test or t-test.
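
A minimal sketch of the paired t-test option, assuming the FINAL data set built in the post above, with one row per matched pair and the variables DISEASE_creat and NODISEASE_creat standing in for "after" and "before":

```sas
/* Paired t-test on case vs. matched control creatinine.
   Tests whether the mean within-pair difference is zero. */
proc ttest data=final;
    paired DISEASE_creat*NODISEASE_creat;
run;
```

The paired t-test assumes the within-pair differences are roughly normal. With a small number of pairs and outlying values, the signed rank test on diff may be the more defensible choice.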
Merdock
Quartz | Level 8

@Ksharp, thanks for the input. I ended up making the diff variable and using the signed rank test.

If you don't mind, would you be able to please advise if my code below looks ok?

 

data exam;
	input id $ age race $ gender $ creat cmv $ disease $ group;
cards;
0001 19 2 2 23 1 1 1 
0017 19 2 2 28 0 0 1 
0002 10 2 1 43 1 1 2 
0005 10 2 1 26 1 0 2 
0060 15 2 2 54 1 1 3 
0010 15 2 2 43 0 0 3 
0018 14 2 2 120 1 1 4 
0105 14 2 2 29 1 0 4 
0008 18 2 1 36 1 1 5 
0022 18 2 1 57 0 0 5 
0548 15 2 1 49 0 1 6 
0052 15 2 1 100 1 0 6 
0059 13 2 1 95 0 1 7 
0982 13 2 1 65 1 0 7 
0047 12 2 1 20 1 1 8 
0084 12 2 1 39 0 0 8 
0680 17 2 2 78 0 1 9 
0042 17 2 2 110 0 0 9 
0984 15 2 2 66 1 1 10 
0007 15 2 2 85 0 0 10 
0021 16 2 1 73 0 1 11 
0873 16 2 1 62 0 0 11 
0193 17 2 1 71 1 1 12 
0178 17 2 1 76 0 0 12 
;
run;
proc print data=exam; run;
proc sort data=exam; by group; run;
proc transpose data=exam out=exam1 prefix=creat;
    by group;
    id disease;
    var creat;
run;
proc print data=exam1; run;

data exam1;
	set exam1;
	rename creat1=DISEASE_creat creat0=NODISEASE_creat;
run;
proc print data=exam1; run;

data final;
    set exam1;
    diff=DISEASE_creat-NODISEASE_creat;
run;
proc print data=final; run;

/*perform Wilcoxon Signed Rank Test*/
proc univariate data=final;
    var diff;
run;

 

Ksharp
Super User
Yes. Your code looks good.
