Hello, I am using SAS 9.4. I am trying to count merged data. One has ID and non relevant information, the other has the same ID and sample ID's which I want to match up. Some of the ID's have multiple and some have no sample ID's. I would like to count how many samples there are per ID and if it does not have a match put a 0 as the count. I can tell which ones do not have any match because they do not have a box ID.
Set 1: ID
ID
123
456
789
101
102
Set 2: Sample ID with ID
ID Sample ID Box ID
123 ABC A
123 DEF B
456 GHI B
456 JKL F
456 MNO L
456 PQR J
101 STU A
102 VWX W
105 YZA B
106 BCD N
Set: WANT
ID sample ID Box ID Count
123 ABC A 1
123 DEF B 2
456 GHI B 1
456 JKL F 2
456 MNO L 3
456 PQR J 4
789 ------ ---- 0
101 STU A 1
102 VWX W 1
I tried merging the two and then counting however it would just still keep 789 and treat it as if it had a Sample ID and then count it as 1 when it should be a 0. How would I go about this?
Thank you
See this:
data have1;
input id $3.;
datalines;
123
456
789
101
102
;
data have2;
input id :$3. sample_id :$3. box_id :$1.;
datalines;
123 ABC A
123 DEF B
456 GHI B
456 JKL F
456 MNO L
456 PQR J
101 STU A
102 VWX W
105 YZA B
106 BCD N
;
run;
proc sort data=have1;
by id;
run;
proc sort data=have2;
by id;
run;
data want;
merge
have1 (in=h1)
have2 (in=h2)
;
by id;
if h1;
if first.id
then count = 0;
if h2
then count + 1;
else do;
sample_id = '---';
box_id = '-';
end;
run;
proc print data=want noobs;
run;
Result:
sample_ id id box_id count 101 STU A 1 102 VWX W 1 123 ABC A 1 123 DEF B 2 456 GHI B 1 456 JKL F 2 456 MNO L 3 456 PQR J 4 789 --- - 0
See this:
data have1;
input id $3.;
datalines;
123
456
789
101
102
;
data have2;
input id :$3. sample_id :$3. box_id :$1.;
datalines;
123 ABC A
123 DEF B
456 GHI B
456 JKL F
456 MNO L
456 PQR J
101 STU A
102 VWX W
105 YZA B
106 BCD N
;
run;
proc sort data=have1;
by id;
run;
proc sort data=have2;
by id;
run;
data want;
merge
have1 (in=h1)
have2 (in=h2)
;
by id;
if h1;
if first.id
then count = 0;
if h2
then count + 1;
else do;
sample_id = '---';
box_id = '-';
end;
run;
proc print data=want noobs;
run;
Result:
sample_ id id box_id count 101 STU A 1 102 VWX W 1 123 ABC A 1 123 DEF B 2 456 GHI B 1 456 JKL F 2 456 MNO L 3 456 PQR J 4 789 --- - 0
Same idea as Kurt, just my share of fun
data have1;
input id $3.;
datalines;
123
456
789
101
102
;
data have2;
input id :$3. sample_id :$3. box_id :$1.;
datalines;
123 ABC A
123 DEF B
456 GHI B
456 JKL F
456 MNO L
456 PQR J
101 STU A
102 VWX W
105 YZA B
106 BCD N
;
run;
data want ;
if _n_=1 then do;
if 0 then set have2;
dcl hash H (dataset:'have1') ;
h.definekey ("id") ;
h.definedata ("id") ;
h.definedone () ;
dcl hiter hi('h');
end;
do count=1 by 1 until(last.id);
set have2 end=z;
by id notsorted;
if h.check()=0 then do;
if last.id then h.remove();
output;
end;
end;
if z;
do while(hi.next()=0);
count=0;
call missing(sample_id,box_id);
output;
end;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.