Hi SAS experts,
Please advise on a SAS procedure for a large dataset that will identify subjects whose ID numbers are similar but not identical (i.e. the ID numbers are the same except for the last 2 digits).
For example, a study has the following 8 subject ID numbers:
888709
234294
888710
098762
546849
888721
234276
888733
The SAS procedure should be able to identify the following matched groups:
Group 1 -- 888709, 888710, 888721, 888733 (same 8887 string)
Group 2 -- 234294, 234276 (same 2342 string)
ID numbers 098762 and 546849 do not have matches.
Thanks,
SS
Assuming the ID numbers are stored as character (which also keeps the leading zero in 098762):
*Create the grouping key: the first 4 characters of the ID;
data want;
set have;
first_four=substr(id, 1, 4);
run;
*Sort by the grouping key so BY-group processing works;
proc sort data=want; by first_four; run;
*Number each group. FIRST.FIRST_FOUR needs a BY statement, and the sum statement group+1 is retained automatically;
data group;
set want;
by first_four;
if first.first_four then group+1;
run;
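The code above assumes character IDs. If id is numeric instead, SUBSTR() would operate on SAS's automatic numeric-to-character conversion rather than on the digits you expect, and the leading zero in 098762 is already lost when the value is read. A minimal sketch, assuming a numeric id and 6-digit IDs, drops the last 2 digits arithmetically with INT(id / 100) and uses the Z6. format only to show the zero-padded ID. The data set names have_num, keyed_num and group_num are hypothetical.

```sas
/* Hypothetical numeric version of the question's IDs; 098762 is read as 98762 */
data have_num;
  input id @@;
  datalines;
888709 234294 888710 098762 546849 888721 234276 888733
;
run;

data keyed_num;
  set have_num;
  prefix = int(id / 100);   /* drop the last 2 digits: 888709 -> 8887, 98762 -> 987 */
  id_text = put(id, z6.);   /* zero-padded text for display: 98762 -> "098762" */
run;

/* BY-group processing needs the data sorted by the key */
proc sort data=keyed_num;
  by prefix id;
run;

/* Same FIRST./sum-statement pattern as the character version */
data group_num;
  set keyed_num;
  by prefix;
  if first.prefix then group + 1;
run;

proc print data=group_num;
run;
```

Because the key is numeric, groups are numbered in numeric prefix order (987, 2342, 5468, 8887), so 098762 lands in group 1 and the four 8887 IDs in group 4.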
Thanks Reeza. I'm a bit confused by the last lines of the code.
I can't figure out how the group values get assigned to the matched IDs in the last data step.
Thanks.
Is this example helpful?
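*Toy data: one-letter IDs, several per line;
data have;
input id $ @@;
datalines;
a b c d a b c d d e
;
run;
*BY-group processing requires sorted data;
proc sort data=have;
by id;
run;
proc print data=have;
run;
*group+1 is a sum statement. GROUP starts at 0, is retained across rows, and goes up by 1 at the first row of each new ID;
data grouped;
set have;
by id;
if first.id then group+1;
run;
proc print data=grouped;
run;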
Output of the second PROC PRINT:
Obs    id    group
  1    a       1
  2    a       1
  3    b       2
  4    b       2
  5    c       3
  6    c       3
  7    d       4
  8    d       4
  9    d       4
 10    e       5
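To tie the example back to the original question, here is a minimal sketch that applies the same FIRST./sum-statement pattern to the eight IDs from the question and keeps only prefixes shared by more than one ID, so unmatched IDs drop out. It assumes character IDs in a variable named id with at least 3 characters; the data set names keyed and matched and the variable prefix are illustrative. Unlike the fixed substr(id, 1, 4) key, it drops the last 2 characters whatever the ID length, and it uses a PROC SQL subquery to find the shared prefixes.

```sas
/* The 8 IDs from the question, read as character to keep the leading zero */
data have;
  input id $ @@;
  datalines;
888709 234294 888710 098762 546849 888721 234276 888733
;
run;

/* Grouping key: everything except the last 2 characters (assumes length >= 3) */
data keyed;
  set have;
  length prefix $ 8;
  prefix = substr(id, 1, length(id) - 2);
run;

/* Keep only IDs whose prefix is shared by at least 2 IDs, sorted for BY processing */
proc sql;
  create table matched as
  select id, prefix
  from keyed
  where prefix in (select prefix from keyed
                   group by prefix
                   having count(*) > 1)
  order by prefix, id;
quit;

/* Number the matched groups 1, 2, ... in sorted prefix order */
data matched;
  set matched;
  by prefix;
  if first.prefix then group + 1;
run;

proc print data=matched;
run;
```

With these IDs, matched should hold 234276 and 234294 as group 1 and the four 8887 IDs as group 2, while 098762 and 546849 are excluded because their prefixes occur only once. Group numbers follow the sorted prefix order, not the order used in the question.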