How do I create new indicator variables in a dataset based on variables from a second dataset?

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

How do I create new indicator variables in a dataset based on variables from a second dataset?

Hi everyone, 

 

I have two datasets.

  1. Dataset 1 is a discharge dataset. Each obsercation has a unique id (var name "ID") and10 diagnosis variables (var names dx_1, dx_2, dx_3, etc). The diagnosis variable values are ICD9 codes such as 707.00, 707.01, etc.  
  2. Dataset 2 contains a list of ICD9 codes and group-level variables. So, for each ICD9 code (unique, 1 observation each), I have 6 1/0 variables that indicate if the code is part of a larger group (var names group_1, group_2). I.e., ICD9 code 707.00 will get a "1" for group_1. 

 

I want to be able to go through dataset 1, and look at each dx_n variable. If the value is, for example, 707.00, I want to assign that discharge a "1" for the variable group_1. 

 

Essentially, I want to create these new indicator variables in dataset 1, but I am defining them based on group definitions from dataset 2. Should I do this in a datastep? Is there a SAS procedure that would make this easier? 

 

Thanks! 


Accepted Solutions
Solution
‎08-22-2016 03:54 PM
Respected Advisor
Posts: 4,606

Re: How do I create new indicator variables in a dataset based on variables from a second dataset?

Dealing with wide data structures involves more complications. Here is a way to get there:

 

data d1;
input id (dx_1-dx_3) (:$8.);
datalines;
1 707.00 707.01 707.02
2 707.00 707.04 .
;

data d2;
input diag :$8. group_1-group_6;
datalines;
707.00 1 0 0 0 0 0
707.01 1 0 0 0 0 1
707.02 0 0 0 1 0 0
707.03 1 0 1 0 0 0
707.04 1 0 1 0 0 0
;

proc transpose data=d1 out=d1List;
by id;
var dx:;
run;

proc sql;
create table d1Groups as
select 
    d1List.id,
    d1List._name_, 
    d2.*
from 
    d1List left join 
    d2 on d1List.col1 = d2.diag
order by id, _name_;

create table diagList as
select 
    id, 
    _name_,
    diag,
    max(group_1) as group_1,
    max(group_2) as group_2,
    max(group_3) as group_3,
    max(group_4) as group_4,
    max(group_5) as group_5,
    max(group_6) as group_6
from d1Groups
group by id;
quit;

proc transpose data=diagList out=want(drop=_name_);
by id group_:;
id _name_;
var diag;
run;

proc print data=want noobs; 
var id dx_: group_:;
run;
PG

View solution in original post


All Replies
Grand Advisor
Posts: 10,210

Re: How do I create new indicator variables in a dataset based on variables from a second dataset?

Can you show a bit of your example data sets and what the output should actually look like.

If the data in your dataset 2 is appropriate then it may be possible to create a group of informats that might terribly simplify and comparison syntax.

Respected Advisor
Posts: 4,973

Re: How do I create new indicator variables in a dataset based on variables from a second dataset?

Would you feel comfortable transforming your second data set so that it ends up with just two variables:  ICD9 and group (taking on values from 1 through 6). That would be step 1.

 

The next step would be to use that version of the data to create a format, and then applying the format to the first data set.  I just wasn't sure if you would need help with step 1.

Solution
‎08-22-2016 03:54 PM
Respected Advisor
Posts: 4,606

Re: How do I create new indicator variables in a dataset based on variables from a second dataset?

Dealing with wide data structures involves more complications. Here is a way to get there:

 

data d1;
input id (dx_1-dx_3) (:$8.);
datalines;
1 707.00 707.01 707.02
2 707.00 707.04 .
;

data d2;
input diag :$8. group_1-group_6;
datalines;
707.00 1 0 0 0 0 0
707.01 1 0 0 0 0 1
707.02 0 0 0 1 0 0
707.03 1 0 1 0 0 0
707.04 1 0 1 0 0 0
;

proc transpose data=d1 out=d1List;
by id;
var dx:;
run;

proc sql;
create table d1Groups as
select 
    d1List.id,
    d1List._name_, 
    d2.*
from 
    d1List left join 
    d2 on d1List.col1 = d2.diag
order by id, _name_;

create table diagList as
select 
    id, 
    _name_,
    diag,
    max(group_1) as group_1,
    max(group_2) as group_2,
    max(group_3) as group_3,
    max(group_4) as group_4,
    max(group_5) as group_5,
    max(group_6) as group_6
from d1Groups
group by id;
quit;

proc transpose data=diagList out=want(drop=_name_);
by id group_:;
id _name_;
var diag;
run;

proc print data=want noobs; 
var id dx_: group_:;
run;
PG
New Contributor
Posts: 3

Re: How do I create new indicator variables in a dataset based on variables from a second dataset?

Thank you both! Using sql to create a new transposed dataset 1 worked. I was trying to avoid that initially since my data set was very large, but in the end it was easiest to code.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 298 views
  • 0 likes
  • 4 in conversation