Mystik
Obsidian | Level 7

Please!

I have a data set with a variable called hltcode.

 

The hltcode variable holds codes that denote different features. I need to create another variable (called hlt_grp) that takes the values Whyz, Emfy and Pheno based on a code or combination of codes from hltcode.

 

Any one of these codes in hltcode (35, 37, 38, 39, 484, 7827, 2728, 2539, 2218, 2721912) represents Whyz.

Any one of these codes (728, 7298, 722, 27218, 2721821, 27282, 27218) represents Emfy.

Any one of these codes (73, 252, 27, 252, 262782, 5262, 27272, 27282, 2722, 272626, 25215, 526) represents Lgel.

Any one of these codes (6316, 126137, 126316, 1351, 2636, 26317, 73171) represents Etope.

 

Therefore:

If hltcode contains ANY Whyz code, then it's Whyz.

If hltcode contains ANY Emfy code, then it's Emfy.

If hltcode contains codes from BOTH Whyz AND Emfy, then it's Pheno.

If hltcode contains a combination of three groups (Whyz AND Emfy AND Lgel), then it's still Pheno.

If hltcode contains a combination of four groups (Whyz AND Emfy AND Lgel AND Etope), then it's still Pheno.

 

I have tried a series of SAS attempts, but none of them works completely. Any help, please?

PGStats
Opal | Level 21

How can hltcode hold many codes at the same time?

PG
Mystik
Obsidian | Level 7

No please.

There is another variable called "ID";

 

Therefore:

If an ID has hltcode values that match Whyz codes only, then it's Whyz.

If an ID has hltcode values that match Emfy codes only, then it's Emfy.

If an ID has hltcode values that match BOTH Whyz AND Emfy codes, then it's Pheno.

If an ID has a combination of 3 groups of codes (Whyz AND Emfy AND Lgel), then it's still Pheno.

If an ID has a combination of 4 groups of codes (Whyz AND Emfy AND Lgel AND Etope), then it's still Pheno.

 

Also, an ID has several observations for hltcode.

Reeza
Super User
Please show an example of what your input data looks like and what you expect as output.
Mystik
Obsidian | Level 7

* This is an example of what my SAS input looks like. The codes in the parentheses are not necessarily the real ones;

 

libname obs_rad "C:\obs_rad";

data obs_rad.radft;

set obs_rad.radft;

 

data obs_rad.SOB;

set obs_rad.radft;

if hltcode in (262, 2727, 26252, 227272) then alc_grp="Whyz";

if hltcode in (626228, 227, 22772, 26252, 7383) then alc_grp="Emfy";

if hltcode in (636, 6363, 4484, 37393, 383839, 393, 3737) and hltcode in (636, 2262, 2627, 26272) then alc_grp="Pheno";

if hltcode in (262, 2727, 26252, 227272) and hltcode in (636, 6363, 4484, 37393, 383839, 393, 3737) and hltcode in (636, 2262, 2627, 26272) then alc_grp="Pheno";

 

proc freq data=obs_rad.SOB;

table alc_grp;

run;

 

PGStats
Opal | Level 21

This should do it:

data obs_rad.SOB;
/* The input must be sorted by id for BY-group processing */
/* First pass over the ID group: flag each feature found in any row */
do until(last.id);
    set obs_rad.radft; by id;
    if hltcode in (262, 2727, 26252, 227272) then Whyz = 1;
    if hltcode in (626228, 227, 22772, 26252, 7383) then Emfy = 1;
    if hltcode in (73, 252, 27, 252, 262782, 5262, 27272, 27282, 2722, 272626, 25215, 526) then lgel = 1;
    if hltcode in (6316, 126137, 126316, 1351, 2636, 26317, 73171) then etope = 1;
    end;
/* Flags that were never set are missing, which counts as false */
if sum(Whyz, Emfy, lgel, etope) >= 3 or whyz and emfy then hlt_grp = "Pheno";
else if Emfy then hlt_grp = "Emfy";
else if Whyz then hlt_grp = "Whyz";
/* Second pass: re-read the same ID group and write every row with hlt_grp */
do until(last.id);
    set obs_rad.radft; by id;
    output;
    end;
drop Whyz Emfy lgel etope;
run;
PG
Mystik
Obsidian | Level 7

However, I need some clarifications.

 

1. My 3 distinct groups are Whyz, Emfy and Pheno. That means each ID has to be grouped into either Whyz, Emfy or Pheno. In your last SAS statement, you coded:

drop Whyz Emfy lgel etope 

 

2. Also, for an ID with:

hltcode of both Whyz AND Emfy = Pheno

hltcode of both Whyz AND Etope = Pheno

hltcode of both Emfy AND Lgel = Pheno

hltcode of Emfy, Lgel, Etope = Pheno

hltcode of both Polp AND Lopr = Pheno

 

Actually, I have 13 features: Whyz, Emfy, Etope, Lgel, Etope, Polp, Klig, Uttr, Thaip, Ghait, Dondj, Lopr, Bduc.

It can be a combination of any two only, three only or four only, which I have to define based on the definitions in the literature.

 

There is also a code for Pheno alone, which needs no combination and by itself represents "Pheno".

 

PGStats
Opal | Level 21

How the code works

In the code above, variables whyz, emfy, lgel, and etope are all missing at the start of an ID group. In the first do until() loop, as soon as a proper code is found, the variable becomes = 1.

The IF THEN ELSE IF... statements transform the combination of features into the corresponding value for hlt_grp.

The last loop simply copies the ID group to the output with the new variable hlt_grp.

The variables whyz, emfy, lgel, and etope are not needed in the output, so they are dropped.
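
To see the two-pass pattern in isolation, here is a minimal sketch on made-up data. The data set names radft and sob in WORK and the five toy rows are assumptions for illustration only, and only two of the features are flagged to keep it short.

```sas
/* Hypothetical toy data, already sorted by id */
data radft;
    input id hltcode;
    datalines;
1 262
1 2727
2 262
2 7383
3 227
;
run;

data sob;
    length hlt_grp $5;
    /* Pass 1: flag which features occur anywhere within the ID */
    do until (last.id);
        set radft;
        by id;
        if hltcode in (262, 2727, 26252, 227272) then whyz = 1;
        if hltcode in (626228, 227, 22772, 26252, 7383) then emfy = 1;
    end;
    /* One decision per ID; unset flags are missing, which is false */
    if whyz and emfy then hlt_grp = "Pheno";
    else if emfy then hlt_grp = "Emfy";
    else if whyz then hlt_grp = "Whyz";
    /* Pass 2: re-read the same ID group and write every row out */
    do until (last.id);
        set radft;
        by id;
        output;
    end;
    drop whyz emfy;
run;

proc print data=sob noobs;
run;
```

With these rows, both observations of ID 1 should come out as Whyz, both observations of ID 2 as Pheno, and the single observation of ID 3 as Emfy.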

 

Adding more features

The code above involved only the 4 features presented in your original question. With 13 features being possibly present or absent in each ID group, you will have 2^13 = 8192 combinations to account for. The logic could get out of hand unless it can be summarized into a small set of rules.
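
As an illustration of what a small rule set could look like, the sketch below counts how many distinct features an ID has and calls it Pheno when there are two or more. That rule is only an assumption made for the example; the real combinations have to come from the definitions in the literature. The toy data set radft and the output name sob_rules are also made up.

```sas
/* Hypothetical toy data, already sorted by id */
data radft;
    input id hltcode;
    datalines;
1 262
1 73
2 7383
3 227
3 6316
3 252
;
run;

data sob_rules;
    length hlt_grp $5;
    /* One flag per feature (missing or 1); extend the array for more features */
    array f[4] whyz emfy lgel etope;
    do until (last.id);
        set radft;
        by id;
        if hltcode in (262, 2727, 26252, 227272) then whyz = 1;
        if hltcode in (626228, 227, 22772, 26252, 7383) then emfy = 1;
        if hltcode in (73, 252, 27, 262782, 5262, 27272, 27282, 2722, 272626, 25215, 526) then lgel = 1;
        if hltcode in (6316, 126137, 126316, 1351, 2636, 26317, 73171) then etope = 1;
    end;
    /* Number of distinct features present in this ID (0 if none) */
    n_features = sum(of f[*], 0);
    /* ASSUMED rule: two or more distinct features means Pheno */
    if n_features >= 2 then hlt_grp = "Pheno";
    else if emfy then hlt_grp = "Emfy";
    else if whyz then hlt_grp = "Whyz";
    keep id n_features hlt_grp;   /* one row per ID */
run;
```

Counting flags through an array keeps the decision to a single comparison no matter how many features are added. Rules that depend on specific pairs would still need their own conditions.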

 

PG
Peter_C
Rhodochrosite | Level 12
I'd suggest user-defined formats to manage and define the hit-code groups, PROC SUMMARY or PROC MEANS to collect the hit-code groups per ID, and some scoring system to translate combinations of hit groups into the target results.
PROC FORMAT and PROC MEANS examples should give good direction.
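
A minimal sketch of such a format, using the code lists from the original question, might look like the following. The name udef_fmt matches the code further down; the 'Other' label is an assumption. PROC FORMAT rejects repeated values, so the duplicated codes are listed once, and 27282, which the question lists under both Emfy and Lgel, is left out until it is decided where it belongs.

```sas
proc format;
    /* No trailing dot on the name when the format is created */
    value udef_fmt
        35, 37, 38, 39, 484, 7827, 2728, 2539, 2218, 2721912 = 'Whyz'
        728, 7298, 722, 27218, 2721821 = 'Emfy'
        73, 252, 27, 262782, 5262, 27272, 2722, 272626, 25215, 526 = 'Lgel'
        6316, 126137, 126316, 1351, 2636, 26317, 73171 = 'Etope'
        /* Assumed catch-all; 27282 lands here until it is assigned */
        other = 'Other';
run;

/* Quick check: write the group label for a few codes to the log */
data _null_;
    do hltcode = 35, 728, 73, 6316, 1;
        grp = put(hltcode, udef_fmt.);
        put hltcode= grp=;
    end;
run;
```

Keeping the code lists in one format means a new feature only needs one more line in PROC FORMAT rather than another IF statement in every step.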
Peter_C
Rhodochrosite | Level 12
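/* your_data: replace with the name of your data set */
/* hitcode stands for the variable called hltcode in the question */
proc summary data=your_data nway;
    class id hitcode;
    format hitcode udef_fmt.;
    /* _FREQ_ holds the number of rows per id and hit group */
    output out=idgroups(rename=(_freq_=count));
run;

* Now create a hit-group collection for each ID;
data combine;
    length hit_grp $100;
    do until (last.id);
        set idgroups;
        by id;   /* needed so that last.id is defined */
        hit_grp = catx('-', hit_grp, put(hitcode, udef_fmt.));
    end;
run;
* Now process your rules on the collection hit_grp;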
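
One possible way to process the rules on hit_grp is sketched below, assuming it holds group labels joined by '-'. The toy version of combine, the output name classified, and the has_ variables are made up for the example, and the rules shown are only the Whyz/Emfy/Pheno ones from earlier in the thread.

```sas
/* Hypothetical stand-in for the combine data set built above */
data combine;
    length hit_grp $100;
    input id hit_grp :$100.;
    datalines;
1 Whyz
2 Emfy-Whyz
3 Emfy-Lgel
;
run;

data classified;
    set combine;
    length hlt_grp $5;
    /* FINDW looks for a whole word; '-' and blank act as delimiters */
    has_whyz = findw(hit_grp, 'Whyz', '- ') > 0;
    has_emfy = findw(hit_grp, 'Emfy', '- ') > 0;
    if has_whyz and has_emfy then hlt_grp = 'Pheno';
    else if has_emfy then hlt_grp = 'Emfy';
    else if has_whyz then hlt_grp = 'Whyz';
run;
```

The blank is included in the delimiter list because hit_grp is padded with trailing blanks, which would otherwise stop the last label in the string from matching as a word. With these three rows, ID 1 should be Whyz, ID 2 Pheno and ID 3 Emfy.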
Peter_C
Rhodochrosite | Level 12
Sorry about the autocorrect and typos.
udef_fmt has no dot (.) only when you create it in PROC FORMAT.
HIT_GDP should be hit_grp.
HIT-CODE should be hitcode.

I wish this Samsung Internet browser supported code formatting.
