Please!
I have a dataset in which one of the variables is called hltcode.
The hltcode variable holds codes that denote different features. I need to create another variable (called hlt_grp) that groups records into Whyz, Emfy and Pheno based on a code or combination of codes from hltcode.
Any of these codes in hltcode (35, 37, 38, 39, 484, 7827, 2728, 2539, 2218, 2721912) represents Whyz.
Any of these codes (728, 7298, 722, 27218, 2721821, 27282) represents Emfy.
Any of these codes (73, 252, 27, 262782, 5262, 27272, 27282, 2722, 272626, 25215, 526) represents Lgel.
Any of these codes (6316, 126137, 126316, 1351, 2636, 26317, 73171) represents Etope.
Therefore:
If hltcode contains ANY Whyz code, then it's Whyz.
If hltcode contains ANY Emfy code, then it's Emfy.
If hltcode contains BOTH Whyz AND Emfy codes, then it's Pheno.
If hltcode has a combination of three (Whyz AND Emfy AND Lgel), then it's still Pheno.
If hltcode has a combination of four (Whyz AND Emfy AND Lgel AND Etope), then it's still Pheno.
I have tried a series of SAS approaches, but none of them works completely. Any help, please?
How can hltcode hold many codes at the same time?
No, it doesn't.
There is another variable called "ID".
Therefore:
If an ID has hltcode values that match a Whyz code only, then it's Whyz.
If an ID has hltcode values that match an Emfy code only, then it's Emfy.
If an ID has hltcode values that match BOTH Whyz AND Emfy codes, then it's Pheno.
If an ID has a combination of 3 (Whyz AND Emfy AND Lgel), then it's still Pheno.
If an ID has a combination of 4 (Whyz AND Emfy AND Lgel AND Etope), then it's still Pheno.
Also, an ID has several observations for hltcode.
*** This is an example of what my input data and SAS code look like. The codes in parentheses are not necessarily the real ones. ***;
libname obs_rad "C:\obs_rad";
data obs_rad.radft;
set obs_rad.radft;
data obs_rad.SOB;
set obs_rad.radft;
if hltcode in (262, 2727, 26252, 227272) then alc_grp="Whyz";
if hltcode in (626228, 227, 22772, 26252, 7383) then alc_grp="Emfy";
if hltcode in (636, 6363, 4484, 37393, 383839, 393, 3737) and hltcode in (636, 2262, 2627, 26272) then alc_grp="Pheno";
if hltcode in (262, 2727, 26252, 227272) and hltcode in (636, 6363, 4484, 37393, 383839, 393, 3737) and hltcode in (636, 2262, 2627, 26272) then alc_grp="Pheno";
proc freq data=obs_rad.SOB;
table alc_grp;
run;
This should do it:
data obs_rad.SOB;
    /* First pass over the ID group: flag every feature that occurs */
    do until(last.id);
        set obs_rad.radft;
        by id;
        if hltcode in (262, 2727, 26252, 227272) then Whyz = 1;
        if hltcode in (626228, 227, 22772, 26252, 7383) then Emfy = 1;
        if hltcode in (73, 252, 27, 262782, 5262, 27272, 27282, 2722, 272626, 25215, 526) then lgel = 1;
        if hltcode in (6316, 126137, 126316, 1351, 2636, 26317, 73171) then etope = 1;
    end;
    /* Turn the combination of flags into a single group for the ID */
    if sum(Whyz, Emfy, lgel, etope) >= 3 or whyz and emfy then hlt_grp = "Pheno";
    else if Emfy then hlt_grp = "Emfy";
    else if Whyz then hlt_grp = "Whyz";
    /* Second pass over the same ID group: write every row with hlt_grp attached */
    do until(last.id);
        set obs_rad.radft;
        by id;
        output;
    end;
    drop Whyz Emfy lgel etope;
run;
Thanks!
However, I need some clarification.
1. My 3 distinct groups are Whyz, Emfy and Pheno. That means each of my IDs has to be grouped into either Whyz, Emfy or Pheno. In your last SAS statement, you coded:
drop Whyz Emfy lgel etope
2. Also, for an ID:
hltcode of both Whyz AND Emfy = Pheno
hltcode of both Whyz AND Etope = Pheno
hltcode of both Emfy AND Lgel = Pheno
hltcode of Emfy, Lgel, Etope = Pheno
hltcode of both Polp AND Lopr = Pheno
Actually, I have 13 features: Whyz, Emfy, Etope, Lgel, Etope, Polp, Klig, Uttr, Thaip, Ghait, Dondj, Lopr, Bduc.
It can be a combination of any two only, three only or four only, which I have to define based on the definitions in the literature.
There is also a code for Pheno alone, which needs no combination but represents "Pheno".
How the code works
In the code above, variables whyz, emfy, lgel, and etope are all missing at the start of an ID group. In the first do until() loop, as soon as a matching code is found, the corresponding variable is set to 1.
The IF THEN ELSE IF... statements transform the combination of features into the corresponding value for hlt_grp.
The last loop simply copies the ID group to the output with the new variable hlt_grp.
The variables whyz, emfy, lgel, and etope are not needed in the output, so they are dropped.
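To see the flags before they are dropped, here is a small sketch on made-up data. The dataset names work.radft_demo and work.flags_demo and the sample rows are assumptions for illustration only; the code lists are the ones used in the step above. It runs only the first loop and lets the implicit output write one row per ID.

```sas
/* Hypothetical sample data (not from the thread): three IDs, sorted by id */
data work.radft_demo;
    input id hltcode;
    datalines;
1 262
1 2727
2 227
3 262
3 7383
3 73
;
run;

/* One pass per ID group. With no explicit OUTPUT statement, the implicit
   output at the bottom of the step writes a single row per ID, holding
   the flags as they stand after the loop. */
data work.flags_demo;
    do until (last.id);
        set work.radft_demo;
        by id;
        if hltcode in (262, 2727, 26252, 227272) then Whyz = 1;
        if hltcode in (626228, 227, 22772, 26252, 7383) then Emfy = 1;
        if hltcode in (73, 252, 27, 262782, 5262, 27272, 27282, 2722, 272626, 25215, 526) then lgel = 1;
        if hltcode in (6316, 126137, 126316, 1351, 2636, 26317, 73171) then etope = 1;
    end;
    keep id Whyz Emfy lgel etope;
run;

proc print data=work.flags_demo noobs;
run;
```

With these sample rows, ID 1 should end up with only Whyz set, ID 2 with only Emfy, and ID 3 with Whyz, Emfy and lgel all set, which is the combination the IF THEN ELSE chain would turn into "Pheno". Flags that were never set stay missing.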
Adding more features
The code above involved only the 4 features presented in your original question. With 13 features being possibly present or absent in each ID group, you will have 2^13 = 8192 combinations to account for. The logic could get out of hand unless it can be summarized into a small set of rules.
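As an illustration of what "a small set of rules" could look like, here is a sketch that counts how many distinct features an ID has instead of spelling out each combination. Everything in it beyond the four code lists is an assumption: the rule "two or more distinct features means Pheno" is hypothetical and must be replaced by the definitions from your literature, the standalone Pheno code 99999 is a placeholder because the real code was not given, and the dataset names and sample rows are made up.

```sas
/* ASSUMPTION: placeholder for the standalone Pheno code (real value not given) */
%let pheno_code = 99999;

/* Hypothetical sample data, sorted by id */
data work.radft_rules;
    input id hltcode;
    datalines;
1 262
1 2727
2 227
2 73
3 6316
4 99999
;
run;

data work.sob_rules;
    length hlt_grp $ 5;
    /* One flag per feature; extend the array and the IF lines to all 13 features */
    array feat[*] Whyz Emfy Lgel Etope;
    do until (last.id);
        set work.radft_rules;
        by id;
        if hltcode in (262, 2727, 26252, 227272) then Whyz = 1;
        if hltcode in (626228, 227, 22772, 26252, 7383) then Emfy = 1;
        if hltcode in (73, 252, 27, 262782, 5262, 27272, 27282, 2722, 272626, 25215, 526) then Lgel = 1;
        if hltcode in (6316, 126137, 126316, 1351, 2636, 26317, 73171) then Etope = 1;
        if hltcode = &pheno_code then pheno_only = 1;
    end;
    /* N() counts the non-missing flags, i.e. the distinct features seen for this ID */
    n_feat = n(of feat[*]);
    /* ASSUMED rule: the standalone code, or any two or more features, gives Pheno */
    if pheno_only or n_feat >= 2 then hlt_grp = "Pheno";
    else if Whyz then hlt_grp = "Whyz";
    else if Emfy then hlt_grp = "Emfy";
    /* Second pass: write every row of the ID with hlt_grp attached */
    do until (last.id);
        set work.radft_rules;
        by id;
        output;
    end;
    drop Whyz Emfy Lgel Etope pheno_only n_feat;
run;
```

With these sample rows, ID 1 should come out as Whyz, IDs 2 and 4 as Pheno, and ID 3 (Etope alone) with a blank hlt_grp, since no rule in the sketch covers a single feature other than Whyz or Emfy. If only certain pairs or triples count as Pheno, the n_feat test would need to be replaced by explicit conditions on the flags.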