BookmarkSubscribeRSS Feed
ak2011
Fluorite | Level 6

Hi,

Can someone please help me with the code to count the number of unique lung subjects ( Ca case, Ca cont and Pop cont) in a merged dataset (Table 2). I would like to obtain the same number of Ca case(2 obs), Ca cont (3 obs)  and Pop cont (4 obs) as shown in Table 1 Unique lung subject means  one lung observation(eg. Ca case, Ca cont or Ca cont) for each id.

Eg. id OSa13 each has Ca case but actually the number of ca case for OSa13 is one (1). 

 

Please find below the code  and log. Output is attached. Thanks. ak.

 

/*  Cancer subjects*/
data d2;
input id$ 1-5 lung$ 7-15;
datalines;
OSa01 Pop cont
OSa06 Ca cont
OSa11 Pop cont
OSa13 Ca case
OSa29 Ca cont
OSa30 Ca case
OSa31 Ca cont
OSa54 Pop cont
OSa73 Pop cont
;
proc sort data=d2; by id; run;

proc freq data=d2;
tables lung;
Title "Table 1: Lung frequencies in d2";
run;

/* Exposure Duration*/
data d4;
input id$ 1-5 idchem 7-12 status$ 14-15 duration 16-18;
datalines;
OSa03 211701 S 6
OSa06 210701 S 9
OSa13 210701 S 37
OSa13 990005 S 5
OSa13 990021 S 37
OSa29 210701 NS 12
OSa29 990005 S 2
OSa30 210701 S 8
OSa30 211701 NS 8
OSa30 990005 S 8
OSa30 990021 S 15
OSa54 210701 NS 14
OSa64 210701 S 15
OSa74 211701 NS 21
OSa78 210701 NS 20
OSa78 990005 S 20
OSa78 990021 S 20
OSa86 990005 S 14
OSa93 210701 S 4
OSa93 990005 S 13
;

proc sort data=d4; by id; run;

/* Merging d2 & d4*/
data mg24;
merge d2(in=a) d4(in=b); by id;
if a and b;
run;

proc freq data=mg24;
tables lung;
title "Table 2: Lung frequencies in merged file d2,d4";
run;

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 /* Cancer subjects*/
74 data d2;
75 input id$ 1-5 lung$ 7-15;
76 datalines;
 
NOTE: The data set WORK.D2 has 9 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
86 ;
87 proc sort data=d2; by id; run;
 
NOTE: There were 9 observations read from the data set WORK.D2.
NOTE: The data set WORK.D2 has 9 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
88
89 proc freq data=d2;
90 tables lung;
91 Title "Table 1: Lung frequencies in d2";
92 run;
 
NOTE: There were 9 observations read from the data set WORK.D2.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.27 seconds
cpu time 0.25 seconds
 
 
93
94 /* Exposure Duration*/
95 data d4;
96 input id$ 1-5 idchem 7-12 status$ 14-15 duration 16-18;
97 datalines;
 
NOTE: The data set WORK.D4 has 20 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
 
 
118 ;
119
120 proc sort data=d4; by id; run;
 
NOTE: There were 20 observations read from the data set WORK.D4.
NOTE: The data set WORK.D4 has 20 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.20 seconds
cpu time 0.01 seconds
 
 
121
122 /* Merging d2 & d4*/
123 data mg24;
124 merge d2(in=a) d4(in=b); by id;
125 if a and b;
126 run;
 
NOTE: There were 9 observations read from the data set WORK.D2.
NOTE: There were 20 observations read from the data set WORK.D4.
NOTE: The data set WORK.MG24 has 11 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 
 
127
128 proc freq data=mg24;
129 tables lung;
130 title "Table 2: Lung frequencies in merged file d2,d4";
131 run;
 
NOTE: There were 11 observations read from the data set WORK.MG24.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.07 seconds
cpu time 0.07 seconds
 
 
132
133 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
145

 


 

1 REPLY 1
Patrick
Opal | Level 21

You can make it unique again using below code.

 

proc sort data=mg24(keep=id lung) out=unique_id_lung nodupkey;
  by id lung;
run;

title "Table 2: Lung frequencies in merged file d2,d4";
proc freq data=unique_id_lung;
  tables lung;
run;
title;

Not all rows from table D2  match with a row from table D4 so don't expect to get the same count.

With your sample data the row in D2 with ID OSa31 doesn't match with table D4 (there is no such ID in the data).

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 861 views
  • 0 likes
  • 2 in conversation