Create a new variable called disease and make it equal to 1 if a person has complaints of heartburns, sickness, and spasm, but no temperature or tiredness.
If the person does not have this exact symptom breakdown, make disease equal to 0.
Lastly, use PROC FREQ to determine what number and proportion of individuals in the dataset has the disease of interest.
I do not know how to do this. Any hints or help? I am studying for an exam and need to understand this program.
%macro one (v1,v2);
proc import out = &v1
datafile = "\\Client\C$\Users\jess\Desktop\data\&v2"
DBMS = xlsx replace;
getnames = YES;
run;
%mend one;
%one (Project3, Project3_f17);
proc format;
value symptom_no 1= "heartburns"
2= "Sickness"
3= "Spasm"
4= "Temperature"
5= "Tiredness";
proc sort data=Project3 out= longsort;
by id_no;
run;
data new;
set longsort;
by id_no;
Keep id_no sympt1 - sympt5 disease;
retain sympt1 - sympt5 disease;
disease=0;
array New_a (1:5) $20 sympt1 - sympt5;
If first.id_no then
do;
Do i = 1 to 5;
new_a (i) = .;
end;
end;
new_a (symptom_no) = symptom;
if last.id_no then output;
run;
proc print data= new;
run;
This code gives me this...
Obs | id_no | sympt1 | sympt2 | sympt3 | sympt4 | sympt5 | disease |
1 | 1 | Heartburns | Sickness |
|
|
| 0 |
2 | 4 |
|
|
|
| Tiredness | 0 |
3 | 6 | Heartburns |
|
|
|
| 0 |
4 | 7 |
|
|
| Temperature |
| 0 |
5 | 8 | Heartburns |
|
|
| Tiredness | 0 |
6 | 9 |
| Sickness | Spasm | Temperature |
| 0 |
7 | 10 |
|
| Spasm |
| Tiredness | 0 |
8 | 11 |
|
| Spasm |
| Tiredness | 0 |
9 | 12 |
|
|
| Temperature | Tiredness | 0 |
10 | 13 |
|
| Spasm | Temperature |
| 0 |
11 | 14 |
|
| Spasm |
|
| 0 |
12 | 15 | Heartburns |
|
| Temperature |
| 0 |
13 | 16 |
|
|
|
| Tiredness | 0 |
14 | 17 | Heartburns |
|
|
|
| 0 |
15 | 19 |
| Sickness |
|
|
| 0 |
Except it does not include the disease column.
*Convert file to wide format from long format -> 'conversion';
data new;
set longsort;
by id_no;
Keep id_no sympt1 - sympt5 disease;
retain sympt1 - sympt5 disease;
disease=0;
array New_a (1:5) $20 sympt1 - sympt5;
If first.id_no then
do;
Do i = 1 to 5;
new_a (i) = .;
end;
new_a (symptom_no) = symptom;
if last.id_no then output;
run;
array New_b (1) disease;
else disease='0';
end;
end;
run;
Now add another step that creats the flag based on your new data.
As I mentioned previously, your diseases each end up in the same column so you can make your IF THEN structured as below. Just change the variable names to match your variable name and the values to match the values you're checking for. Being able to open the NEW data to work with it makes this process a bit easier. Good Luck.
data flag_disease;
set new;
if symptom1 = 'Heart' and symptom2 = 'Lung' and symptom3 = 'Other' then flag_disease = 1;
else flag_disease = 0;
run;
This is a second question that should start with your WIDE data set, ie answer from your previous question.
You can look up the WHICHC function which will search an array for the presence of specific variable.
It seems like the WHICHC works for one variable but I do not know how to make it work for multiple. We also never learned about this.
I am thinking I am supposed to use if and then statements?
You could do IF THEN, but then you’re checking all 5 for each of the values of interest, so that’s at least 15 conditions. You could simplify it with a loop. But if you’re attempting to learn rather than complete an assignment I’d lean toward the lazy approach myself.
If symptom1=‘HD’ or symptom2 = ‘HD’ or symptom3=‘HD’ ....etc until
WHICHC first parameter is the word you’re searching for, the remaining is a list of the variables or a reference to an ARRAY.
Disease = WHICHC(‘Disease3’, of new(*)) and WHICHC(‘Disease2’, of new(*)) and WHICHC(‘Disease3’, of new(*));
I have no clue where to even start... 😞
Did the code posted not work? You need to replace Disease1/2/3 with the names you're looking for and create your array again, but otherwise it should work as posted. Note that case matters, i.e. heart does not equal Heart.
Here's some resources that may help:
https://stats.idre.ucla.edu/sas/modules/creating-and-recoding-variables-in-sas/
The first SAS Programming e-course is free if you're interested.
I think you get a printable certificate at the end as well.
So he wants us to know how to do it using if and then without WHICHC.
I have this but it is not working...
data new;
set longsort;
by id_no;
Keep id_no sympt1 - sympt5 disease;
retain sympt1 - sympt5 disease;
disease=0;
array New_a (1:5) $20 sympt1 - sympt5;
If first.id_no then
do;
Do i = 1 to 5;
new_a (i) = .;
end;
new_a (symptom_no) = symptom;
if last.id_no then output;
run;
array New_b (1) disease;
If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
else disease='0';
end;
end;
run;
proc print data= new;
run;
What am I doing wrong?
Yes attempting to learn/study. The exam is multiple choice so I dont know which way he would chose to ask the question. Hopefully the lazy way, since in real life, that is how it would be done.
Thank you for this. I will play with it now and see if I can figure it out.
data new;
set longsort;
by id_no;
Keep id_no sympt1 - sympt5 disease;
retain sympt1 - sympt5 disease;
disease=0;
array New_a (1:5) $20 sympt1 - sympt5;
If first.id_no then
do;
Do i = 1 to 5;
new_a (i) = .;
end;
new_a (symptom_no) = symptom;
if last.id_no then output;
run;
array New_b (1) disease;
If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
else disease='0';
end;
end;
run;
proc print data= new;
run;
I have this now and it is still not working.
Don't do it in the same step.
Do the conversion first and the flag in a second step. Once you have that working you can combine them into a single step.
Otherwise you're trying to do a lot at once and you can't see the intermediary steps.
What conversion?
*Convert file to wide format from long format -> 'conversion';
data new;
set longsort;
by id_no;
Keep id_no sympt1 - sympt5 disease;
retain sympt1 - sympt5 disease;
disease=0;
array New_a (1:5) $20 sympt1 - sympt5;
If first.id_no then
do;
Do i = 1 to 5;
new_a (i) = .;
end;
new_a (symptom_no) = symptom;
if last.id_no then output;
run;
array New_b (1) disease;
else disease='0';
end;
end;
run;
Now add another step that creats the flag based on your new data.
As I mentioned previously, your diseases each end up in the same column so you can make your IF THEN structured as below. Just change the variable names to match your variable name and the values to match the values you're checking for. Being able to open the NEW data to work with it makes this process a bit easier. Good Luck.
data flag_disease;
set new;
if symptom1 = 'Heart' and symptom2 = 'Lung' and symptom3 = 'Other' then flag_disease = 1;
else flag_disease = 0;
run;
Pick one structure for the data and calculate DISEASE from that.
So if you have the data that looks like the printout you posted.
data have ;
length id_no 8 sympt1-sympt5 $20 disease 8;
infile cards dsd dlm='|';
input id_no sympt1-sympt5 disease ;
cards;
1|Heartburns|Sickness||||0
4|||||Tiredness|0
6|Heartburns|||||0
7||||Temperature||0
8|Heartburns||||Tiredness|0
9||Sickness|Spasm|Temperature||0
10|||Spasm||Tiredness|0
11|||Spasm||Tiredness|0
12||||Temperature|Tiredness|0
13|||Spasm|Temperature||0
14|||Spasm|||0
15|Heartburns|||Temperature||0
16|||||Tiredness|0
17|Heartburns|||||0
19||Sickness||||0
;
You can now calculate the variable DISEASE by testing the 5 symptoms.
data want ;
set have ;
if not missing(sympt1) and not missing(sympt2) and not missing(sympt3)
and missing(sympt4) and missing(sympt5) then disease=1;
else disease=0;
run;
proc freq data=want;
tables disease;
run;
The FREQ Procedure Cumulative Cumulative disease Frequency Percent Frequency Percent ------------------------------------------------------------ 0 15 100.00 15 100.00
Looks like no one meets that definition.
Do you have more data? Or perhaps the criteria is a little different than the way you expressed it?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.