BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jessica_join
Obsidian | Level 7

Create a new variable called disease and make it equal to 1 if a person has complaints of heartburns, sickness, and spasm, but no temperature or tiredness.

 

If the person does not have this exact symptom breakdown, make disease equal to 0.

 

Lastly, use PROC FREQ to determine what number and proportion of individuals in the dataset has the disease of interest.

 

I do not know how to do this. Any hints or help? I am studying for an exam and need to understand this program. 

%macro one (v1,v2); 
proc import out  = &v1
            datafile = "\\Client\C$\Users\jess\Desktop\data\&v2"
            DBMS     = xlsx replace;
            getnames = YES;
run;


%mend one; 
%one (Project3, Project3_f17);
 proc format; 
value symptom_no	1= "heartburns" 
					2= "Sickness"
					3= "Spasm"
					4= "Temperature"
					5= "Tiredness"; 
		
proc sort data=Project3 out= longsort; 
 	by id_no; 
run; 

data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5 disease; 
	disease=0;
	array New_a (1:5) $20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 
	end; 
	new_a (symptom_no) = symptom; 
	if last.id_no then output; 
		run; 
	proc print data= new; 
	run; 

This code gives me this...

Obs

id_no

sympt1

sympt2

sympt3

sympt4

sympt5

disease

1

1

Heartburns

Sickness

 

 

 

0

2

4

 

 

 

 

Tiredness

0

3

6

Heartburns

 

 

 

 

0

4

7

 

 

 

Temperature

 

0

5

8

Heartburns

 

 

 

Tiredness

0

6

9

 

Sickness

Spasm

Temperature

 

0

7

10

 

 

Spasm

 

Tiredness

0

8

11

 

 

Spasm

 

Tiredness

0

9

12

 

 

 

Temperature

Tiredness

0

10

13

 

 

Spasm

Temperature

 

0

11

14

 

 

Spasm

 

 

0

12

15

Heartburns

 

 

Temperature

 

0

13

16

 

 

 

 

Tiredness

0

14

17

Heartburns

 

 

 

 

0

15

19

 

Sickness

 

 

 

0

 

Except it does not include the disease column. 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
*Convert file to wide format from long format -> 'conversion';
data
new; set longsort; by id_no; Keep id_no sympt1 - sympt5 disease; retain sympt1 - sympt5 disease; disease=0; array New_a (1:5) $20 sympt1 - sympt5; If first.id_no then do; Do i = 1 to 5; new_a (i) = .; end; new_a (symptom_no) = symptom; if last.id_no then output; run; array New_b (1) disease; else disease='0'; end; end; run;
 

 

Now add another step that creats the flag based on your new data.


As I mentioned previously, your diseases each end up in the same column so you can make your IF THEN structured as below. Just change the variable names to match your variable name and the values to match the values you're checking for. Being able to open the NEW data to work with it makes this process a bit easier. Good Luck. 

 

 

data flag_disease;
set new;

if symptom1 = 'Heart' and symptom2 = 'Lung' and symptom3 = 'Other' then flag_disease = 1;
else flag_disease = 0;

run;

View solution in original post

14 REPLIES 14
Reeza
Super User

This is a second question that should start with your WIDE data set, ie answer from your previous question. 

You can look up the WHICHC function which will search an array for the presence of specific variable. 

 

 

Reeza
Super User
WIDE=NEW in your case. Your first, starting data is what we consider a long data set, and a wide data set is your new data set - variables go across.
jessica_join
Obsidian | Level 7

It seems like the WHICHC works for one variable but I do not know how to make it work for multiple. We also never learned about this.

 

I am thinking I am supposed to use if and then statements?

Reeza
Super User

You could do IF THEN, but then you’re checking all 5 for each of the values of interest, so that’s at least 15 conditions. You could simplify it with a loop. But if you’re attempting to learn rather than complete an assignment I’d lean toward the lazy approach myself. 

 

If symptom1=‘HD’ or symptom2 = ‘HD’ or symptom3=‘HD’ ....etc until 

 

WHICHC first parameter is the word you’re searching for, the remaining is a list of the variables or a reference to an ARRAY. 

 

Disease = WHICHC(‘Disease3’, of new(*)) and WHICHC(‘Disease2’, of new(*)) and WHICHC(‘Disease3’, of new(*));

 

 

 

 

 

Reeza
Super User
I guess if you put each specific diag in it’s own column it would be simplified but I’ve have never (in 10 years) seen real clinical data organized in that manner. It’s usually Diag1-Diag25.
jessica_join
Obsidian | Level 7

I have no clue where to even start... 😞

Reeza
Super User

Did the code posted not work? You need to replace Disease1/2/3 with the names you're looking for and create your array again, but otherwise it should work as posted. Note that case matters, i.e. heart does not equal Heart.

 

Here's some resources that may help:

http://video.sas.com/detail/videos/how-to-tutorials/video/4573016759001/performing-conditional-logic...

 

https://stats.idre.ucla.edu/sas/modules/creating-and-recoding-variables-in-sas/

 

The first SAS Programming e-course is free if you're interested.

I think you get a printable certificate at the end as well. 

jessica_join
Obsidian | Level 7

So he wants us to know how to do it using if and then without WHICHC.

 

I have this but it is not working...

data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5 disease; 
	disease=0;
	array New_a (1:5) $20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 

	new_a (symptom_no) = symptom; 
	if last.id_no then output; 
		run; 
	array New_b (1) disease; 
	If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
	else disease='0'; 
		end; 
	end; 

		run; 
	proc print data= new; 
	run; 

What am I doing wrong?

jessica_join
Obsidian | Level 7

Yes attempting to learn/study. The exam is multiple choice so I dont know which way he would chose to ask the question. Hopefully the lazy way, since in real life, that is how it would be done.

 

Thank you for this. I will play with it now and see if I can figure it out. 

jessica_join
Obsidian | Level 7
data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5 disease; 
	disease=0;
	array New_a (1:5) $20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 

	new_a (symptom_no) = symptom; 
	if last.id_no then output; 
		run; 
	array New_b (1) disease; 
	If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
	else disease='0'; 
		end; 
	end; 

		run; 
	proc print data= new; 
	run; 

I have this now and it is still not working. 

Reeza
Super User

Don't do it in the same step. 

Do the conversion first and the flag in a second step. Once you have that working you can combine them into a single step.

Otherwise you're trying to do a lot at once and you can't see the intermediary steps.

jessica_join
Obsidian | Level 7

What conversion?

Reeza
Super User
*Convert file to wide format from long format -> 'conversion';
data
new; set longsort; by id_no; Keep id_no sympt1 - sympt5 disease; retain sympt1 - sympt5 disease; disease=0; array New_a (1:5) $20 sympt1 - sympt5; If first.id_no then do; Do i = 1 to 5; new_a (i) = .; end; new_a (symptom_no) = symptom; if last.id_no then output; run; array New_b (1) disease; else disease='0'; end; end; run;
 

 

Now add another step that creats the flag based on your new data.


As I mentioned previously, your diseases each end up in the same column so you can make your IF THEN structured as below. Just change the variable names to match your variable name and the values to match the values you're checking for. Being able to open the NEW data to work with it makes this process a bit easier. Good Luck. 

 

 

data flag_disease;
set new;

if symptom1 = 'Heart' and symptom2 = 'Lung' and symptom3 = 'Other' then flag_disease = 1;
else flag_disease = 0;

run;

Tom
Super User Tom
Super User

Pick one structure for the data and calculate DISEASE from that.

So if you have the data that looks like the printout you posted.

data have ;
  length id_no 8 sympt1-sympt5 $20 disease 8;
  infile cards dsd dlm='|';
  input id_no sympt1-sympt5 disease ;
cards;
1|Heartburns|Sickness||||0
4|||||Tiredness|0
6|Heartburns|||||0
7||||Temperature||0
8|Heartburns||||Tiredness|0
9||Sickness|Spasm|Temperature||0
10|||Spasm||Tiredness|0
11|||Spasm||Tiredness|0
12||||Temperature|Tiredness|0
13|||Spasm|Temperature||0
14|||Spasm|||0
15|Heartburns|||Temperature||0
16|||||Tiredness|0
17|Heartburns|||||0
19||Sickness||||0
;

You can now calculate the variable DISEASE by testing the 5 symptoms.

data want ;
  set have ;
  if not missing(sympt1) and not missing(sympt2) and not missing(sympt3)
  and missing(sympt4) and missing(sympt5) then disease=1;
  else disease=0;
run;
proc freq data=want;
 tables disease;
run;
The FREQ Procedure

                                    Cumulative    Cumulative
disease    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
      0          15      100.00            15       100.00

Looks like no one meets that definition.

Do you have more data? Or perhaps the criteria is a little different than the way you expressed it?

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3988 views
  • 0 likes
  • 3 in conversation