BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sakshi13
Obsidian | Level 7

Hello,

My data has a variable "conditions" which has values from 0 to 8, depending upon what the patient has. I want to do univariate regression analysis for each one of these condition. so, need to recode into separate variables for each value. For example: data is following where each number indicates  a disease, here 0: smoking status, 1: liver disease, 2: renal disease and so on. 

 

Condition   Smoker      liverdiseases    

0                      1                 0

0,1                    1                 1

0,2                    1                  0

1,3                     0                  1

1,2                     0                  1

0                         1                0

3,4,6,7                0                0

 

Smoker and liver diseases are examples for outcome that I am looking for.

 

I want to recode my data into new variables : smoker, liver, renal, and so on, where 1 is no smoking, 0 is smoking, and same for other variables.

When I used code:

data name;

set previous;

where Conditions contains '0';

smoker='1';

run;

 

I get the data with only '0' values, and remaining gets deleted. I want to sort all and keep all the data.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Sakshi13 wrote:

Hello,

My data has a variable "conditions" which has values from 0 to 8, depending upon what the patient has. I want to do univariate regression analysis for each one of these condition. so, need to recode into separate variables for each value. For example: data is following where each number indicates  a disease, here 0: smoking status, 1: liver disease, 2: renal disease and so on. 

 

Condition     

0

0,1

0,2

1,3

1,2

2,3,5

3,4,6,7

 

I want to recode tmy data into new variables : smoker, liver, renal, and so on, where 1 is no smoking, 0 is smoking, and same for other variables.

When I used code:

data name;

set previous;

where Conditions contains '0';

smoker='1';

run;

 

I get the data with only '0' values, and remaining gets deleted. I want to sort all and keep all the data.

 

Thanks


I suggest that you use 1 for a "true" or 'yes' condition. Especially since it isn't very clear what a 0 = "no liver" or "no renal" might mean??

 

This might get you started:

data example;
   informat codestr $18.;
   input codestr;
   array v(0:9) tobacco cond2-cond10;/*< place your variable names here*/
                /* in the order of the code VALUES*/ 
   do i=0 to 9;
      v[i]= index(codestr,put(i,best1.))=0;
   end;
   drop i;
datalines;
0
0,2
3,4,5,6
0,1,2,3,4,5,6,7,8,9
;
run;

Since you didn't provide much detail in the form of which code goes with which topic can't go much more than add variables to suit.

This will not work if you have 2 digit codes or ones with decimals.

The code uses SAS behavior that a comparison will return 1 for true and 0 for false. So when the index value is not found this assigns 1.

View solution in original post

15 REPLIES 15
Reeza
Super User
Are all your codes single digits or can you have two digit codes? It affects the options for searching. One approach however, that's pretty easy is to split each condition to a separate line and then use PROC TRANSPOSE.

Use COUNTW() To get number of words, SCAN() to separate them, and PROC TRANSPOSE to flip it.

data long;
set have;

n_diags = countw(condition);
do i=1 to n_diags;
diag = scan(condition, i, ",");
output;
end;

run;

*then transpose to a wide format;
Sakshi13
Obsidian | Level 7

ALL codes are single digits, from 0 to 8. entries may include anything from single code to multiple code, depending upon the health conditions a person has

Astounding
PROC Star
A simple approach:

if indexw(condition, '0') then smoker=1;
else smoker=0;

Add as many such pairs of statements as you want.
Sakshi13
Obsidian | Level 7

I tried this but again it only defines '0' if there is only '0' in value. I want to include all entries that has zero in it.

ballardw
Super User

@Sakshi13 wrote:

I tried this but again it only defines '0' if there is only '0' in value. I want to include all entries that has zero in it.


???????

 

Example data for ALL the variables involved in this recode AND an example of what the actual output should look like for that given data.

You should provide enough examples to see the overall picture and exercise logic for each variable involved.

Sakshi13
Obsidian | Level 7
Yes, sorry for that, I edited my question, provided what I have and what I want
Astounding
PROC Star

Sorry, my fault.  By default, INDEXW uses only blanks as delimiters.  To use both blanks and commas as delimiters:

 

if indexw(condition, '0', ', ') then smoker=1;
else smoker=0;

ballardw
Super User

@Sakshi13 wrote:

Hello,

My data has a variable "conditions" which has values from 0 to 8, depending upon what the patient has. I want to do univariate regression analysis for each one of these condition. so, need to recode into separate variables for each value. For example: data is following where each number indicates  a disease, here 0: smoking status, 1: liver disease, 2: renal disease and so on. 

 

Condition     

0

0,1

0,2

1,3

1,2

2,3,5

3,4,6,7

 

I want to recode tmy data into new variables : smoker, liver, renal, and so on, where 1 is no smoking, 0 is smoking, and same for other variables.

When I used code:

data name;

set previous;

where Conditions contains '0';

smoker='1';

run;

 

I get the data with only '0' values, and remaining gets deleted. I want to sort all and keep all the data.

 

Thanks


I suggest that you use 1 for a "true" or 'yes' condition. Especially since it isn't very clear what a 0 = "no liver" or "no renal" might mean??

 

This might get you started:

data example;
   informat codestr $18.;
   input codestr;
   array v(0:9) tobacco cond2-cond10;/*< place your variable names here*/
                /* in the order of the code VALUES*/ 
   do i=0 to 9;
      v[i]= index(codestr,put(i,best1.))=0;
   end;
   drop i;
datalines;
0
0,2
3,4,5,6
0,1,2,3,4,5,6,7,8,9
;
run;

Since you didn't provide much detail in the form of which code goes with which topic can't go much more than add variables to suit.

This will not work if you have 2 digit codes or ones with decimals.

The code uses SAS behavior that a comparison will return 1 for true and 0 for false. So when the index value is not found this assigns 1.

Sakshi13
Obsidian | Level 7
I edited my question, sorry I realized it was not clear, this will help in providing solution.
Patrick
Opal | Level 21

@Sakshi13 

Try to understand what @ballardw posted because this is the solution to your problem.

Sakshi13
Obsidian | Level 7

Hello, I tried that, please confirm if I should be mentioning all the entries in datalines??

Patrick
Opal | Level 21

@Sakshi13 wrote:

Hello, I tried that, please confirm if I should be mentioning all the entries in datalines??


I guess that you didn't mark it as solution means that you didn't fully understand the proposed code.

You didn't tell us what the different codes mean; which code should populate a '1' into which variable. 

 

The only thing you need to amend to make the proposed code work is to list the variables in the array statement in the order of your codes. For example if variable liverdiseases should be set to 1 for code 3 then make sure this variable is listed as 3rd element in the array statement.

 

 

Sakshi13
Obsidian | Level 7

Hello, yes, I did mention the variables in the same order (Below is list of variables):

 

data example;
informat Comorbidities $18.;
input Comorbidities;
array v(0:8) smoker dm cad stroke cancer chf liver renal none ;/*< place your variable names here*/
/* in the order of the code VALUES*/
do i=0 to 8;
v[i]= index(Comorbidities,put(i,best1.))=0;
end;
drop i;
datalines;

 ***should I mention the complete data here? Or can I skip this part? ****
run;

 

You might be right, I am still trying to understand the code. 

 

Regards

Patrick
Opal | Level 21

@Sakshi13 wrote:

datalines;

 ***should I mention the complete data here? Or can I skip this part? ****
run;

 


All the Infile/Input/Datalines parts are only to create the sample data. If you've got this data already in a table then of course you will replace all of this with a SET statement.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2166 views
  • 6 likes
  • 5 in conversation