DATA Step, Macro, Functions and more

SAS new variable

Accepted Solution Solved
Reply
Contributor
Posts: 20
Accepted Solution

SAS new variable

Create a new variable called disease and make it equal to 1 if a person has complaints of heartburns, sickness, and spasm, but no temperature or tiredness.

 

If the person does not have this exact symptom breakdown, make disease equal to 0.

 

Lastly, use PROC FREQ to determine what number and proportion of individuals in the dataset has the disease of interest.

 

I do not know how to do this. Any hints or help? I am studying for an exam and need to understand this program. 

%macro one (v1,v2); 
proc import out  = &v1
            datafile = "\\Client\C$\Users\jess\Desktop\data\&v2"
            DBMS     = xlsx replace;
            getnames = YES;
run;


%mend one; 
%one (Project3, Project3_f17);
 proc format; 
value symptom_no	1= "heartburns" 
					2= "Sickness"
					3= "Spasm"
					4= "Temperature"
					5= "Tiredness"; 
		
proc sort data=Project3 out= longsort; 
 	by id_no; 
run; 

data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5 disease; 
	disease=0;
	array New_a (1:5) $20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 
	end; 
	new_a (symptom_no) = symptom; 
	if last.id_no then output; 
		run; 
	proc print data= new; 
	run; 

This code gives me this...

Obs

id_no

sympt1

sympt2

sympt3

sympt4

sympt5

disease

1

1

Heartburns

Sickness

 

 

 

0

2

4

 

 

 

 

Tiredness

0

3

6

Heartburns

 

 

 

 

0

4

7

 

 

 

Temperature

 

0

5

8

Heartburns

 

 

 

Tiredness

0

6

9

 

Sickness

Spasm

Temperature

 

0

7

10

 

 

Spasm

 

Tiredness

0

8

11

 

 

Spasm

 

Tiredness

0

9

12

 

 

 

Temperature

Tiredness

0

10

13

 

 

Spasm

Temperature

 

0

11

14

 

 

Spasm

 

 

0

12

15

Heartburns

 

 

Temperature

 

0

13

16

 

 

 

 

Tiredness

0

14

17

Heartburns

 

 

 

 

0

15

19

 

Sickness

 

 

 

0

 

Except it does not include the disease column. 


Accepted Solutions
Solution
‎11-24-2017 05:10 PM
Super User
Posts: 23,342

Re: SAS new variable

Posted in reply to jessica_join
*Convert file to wide format from long format -> 'conversion';
data
new; set longsort; by id_no; Keep id_no sympt1 - sympt5 disease; retain sympt1 - sympt5 disease; disease=0; array New_a (1:5) $20 sympt1 - sympt5; If first.id_no then do; Do i = 1 to 5; new_a (i) = .; end; new_a (symptom_no) = symptom; if last.id_no then output; run; array New_b (1) disease; else disease='0'; end; end; run;
 

 

Now add another step that creats the flag based on your new data.


As I mentioned previously, your diseases each end up in the same column so you can make your IF THEN structured as below. Just change the variable names to match your variable name and the values to match the values you're checking for. Being able to open the NEW data to work with it makes this process a bit easier. Good Luck. 

 

 

data flag_disease;
set new;

if symptom1 = 'Heart' and symptom2 = 'Lung' and symptom3 = 'Other' then flag_disease = 1;
else flag_disease = 0;

run;

View solution in original post


All Replies
Super User
Posts: 23,342

Re: SAS new variable

Posted in reply to jessica_join

This is a second question that should start with your WIDE data set, ie answer from your previous question. 

You can look up the WHICHC function which will search an array for the presence of specific variable. 

 

 

Super User
Posts: 23,342

Re: SAS new variable

WIDE=NEW in your case. Your first, starting data is what we consider a long data set, and a wide data set is your new data set - variables go across.
Contributor
Posts: 20

Re: SAS new variable

[ Edited ]

It seems like the WHICHC works for one variable but I do not know how to make it work for multiple. We also never learned about this.

 

I am thinking I am supposed to use if and then statements?

Super User
Posts: 23,342

Re: SAS new variable

Posted in reply to jessica_join

You could do IF THEN, but then you’re checking all 5 for each of the values of interest, so that’s at least 15 conditions. You could simplify it with a loop. But if you’re attempting to learn rather than complete an assignment I’d lean toward the lazy approach myself. 

 

If symptom1=‘HD’ or symptom2 = ‘HD’ or symptom3=‘HD’ ....etc until 

 

WHICHC first parameter is the word you’re searching for, the remaining is a list of the variables or a reference to an ARRAY. 

 

Disease = WHICHC(‘Disease3’, of new(*)) and WHICHC(‘Disease2’, of new(*)) and WHICHC(‘Disease3’, of new(*));

 

 

 

 

 

Super User
Posts: 23,342

Re: SAS new variable

I guess if you put each specific diag in it’s own column it would be simplified but I’ve have never (in 10 years) seen real clinical data organized in that manner. It’s usually Diag1-Diag25.
Contributor
Posts: 20

Re: SAS new variable

I have no clue where to even start... ):

Super User
Posts: 23,342

Re: SAS new variable

Posted in reply to jessica_join

Did the code posted not work? You need to replace Disease1/2/3 with the names you're looking for and create your array again, but otherwise it should work as posted. Note that case matters, i.e. heart does not equal Heart.

 

Here's some resources that may help:

http://video.sas.com/detail/videos/how-to-tutorials/video/4573016759001/performing-conditional-logic...

 

https://stats.idre.ucla.edu/sas/modules/creating-and-recoding-variables-in-sas/

 

The first SAS Programming e-course is free if you're interested.

I think you get a printable certificate at the end as well. 

Contributor
Posts: 20

Re: SAS new variable

So he wants us to know how to do it using if and then without WHICHC.

 

I have this but it is not working...

data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5 disease; 
	disease=0;
	array New_a (1:5) $20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 

	new_a (symptom_no) = symptom; 
	if last.id_no then output; 
		run; 
	array New_b (1) disease; 
	If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
	else disease='0'; 
		end; 
	end; 

		run; 
	proc print data= new; 
	run; 

What am I doing wrong?

Contributor
Posts: 20

Re: SAS new variable

Yes attempting to learn/study. The exam is multiple choice so I dont know which way he would chose to ask the question. Hopefully the lazy way, since in real life, that is how it would be done.

 

Thank you for this. I will play with it now and see if I can figure it out. 

Contributor
Posts: 20

Re: SAS new variable

Posted in reply to jessica_join
data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5 disease; 
	disease=0;
	array New_a (1:5) $20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 

	new_a (symptom_no) = symptom; 
	if last.id_no then output; 
		run; 
	array New_b (1) disease; 
	If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
	else disease='0'; 
		end; 
	end; 

		run; 
	proc print data= new; 
	run; 

I have this now and it is still not working. 

Super User
Posts: 23,342

Re: SAS new variable

Posted in reply to jessica_join

Don't do it in the same step. 

Do the conversion first and the flag in a second step. Once you have that working you can combine them into a single step.

Otherwise you're trying to do a lot at once and you can't see the intermediary steps.

Contributor
Posts: 20

Re: SAS new variable

What conversion?

Solution
‎11-24-2017 05:10 PM
Super User
Posts: 23,342

Re: SAS new variable

Posted in reply to jessica_join
*Convert file to wide format from long format -> 'conversion';
data
new; set longsort; by id_no; Keep id_no sympt1 - sympt5 disease; retain sympt1 - sympt5 disease; disease=0; array New_a (1:5) $20 sympt1 - sympt5; If first.id_no then do; Do i = 1 to 5; new_a (i) = .; end; new_a (symptom_no) = symptom; if last.id_no then output; run; array New_b (1) disease; else disease='0'; end; end; run;
 

 

Now add another step that creats the flag based on your new data.


As I mentioned previously, your diseases each end up in the same column so you can make your IF THEN structured as below. Just change the variable names to match your variable name and the values to match the values you're checking for. Being able to open the NEW data to work with it makes this process a bit easier. Good Luck. 

 

 

data flag_disease;
set new;

if symptom1 = 'Heart' and symptom2 = 'Lung' and symptom3 = 'Other' then flag_disease = 1;
else flag_disease = 0;

run;

Super User
Super User
Posts: 7,941

Re: SAS new variable

Posted in reply to jessica_join

Pick one structure for the data and calculate DISEASE from that.

So if you have the data that looks like the printout you posted.

data have ;
  length id_no 8 sympt1-sympt5 $20 disease 8;
  infile cards dsd dlm='|';
  input id_no sympt1-sympt5 disease ;
cards;
1|Heartburns|Sickness||||0
4|||||Tiredness|0
6|Heartburns|||||0
7||||Temperature||0
8|Heartburns||||Tiredness|0
9||Sickness|Spasm|Temperature||0
10|||Spasm||Tiredness|0
11|||Spasm||Tiredness|0
12||||Temperature|Tiredness|0
13|||Spasm|Temperature||0
14|||Spasm|||0
15|Heartburns|||Temperature||0
16|||||Tiredness|0
17|Heartburns|||||0
19||Sickness||||0
;

You can now calculate the variable DISEASE by testing the 5 symptoms.

data want ;
  set have ;
  if not missing(sympt1) and not missing(sympt2) and not missing(sympt3)
  and missing(sympt4) and missing(sympt5) then disease=1;
  else disease=0;
run;
proc freq data=want;
 tables disease;
run;
The FREQ Procedure

                                    Cumulative    Cumulative
disease    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
      0          15      100.00            15       100.00

Looks like no one meets that definition.

Do you have more data? Or perhaps the criteria is a little different than the way you expressed it?

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 335 views
  • 0 likes
  • 3 in conversation