11-24-2017 03:30 PM
Create a new variable called disease and make it equal to 1 if a person has complaints of heartburns, sickness, and spasm, but no temperature or tiredness.
If the person does not have this exact symptom breakdown, make disease equal to 0.
Lastly, use PROC FREQ to determine what number and proportion of individuals in the dataset have the disease of interest.
I do not know how to do this. Any hints or help? I am studying for an exam and need to understand this program.
I have to use if and then statements.
This is what I have so far.... it is not working.
proc format;
value symptom_no 1= "heartburns" 2= "Sickness" 3= "Spasm" 4= "Temperature" 5= "Tiredness";
proc sort data=Project3 out= longsort;
by id_no;
run;
data new;
set longsort;
by id_no;
Keep id_no sympt1 - sympt5 disease;
retain sympt1 - sympt5 disease;
disease=0;
array New_a (1:5) $20 sympt1 - sympt5;
If first.id_no then do;
Do i = 1 to 5;
new_a (i) = .;
end;
new_a (symptom_no) = symptom;
if last.id_no then output;
run;
array New_b (1) disease;
If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
else disease='0';
end;
end;
run;
proc print data= new;
run;
11-24-2017 04:39 PM
You need to get rid of the RUN statement in the middle of your data step. It looks like there is an extra end statement. This would be a lot easier to look at with reasonable indentation.
11-24-2017 05:24 PM
It's a little unclear what the structure of your original data is, so I've assumed that it simply has a single numeric symptom column (with values 1 to 5) and multiple rows per ID, one per symptom. To make the logic more transparent, I've moved away from arrays (for now). I reckon it might look something like:
data new;
  set longsort;
  by id_no;
  retain heartburn sickness spasm temperature tiredness;
  if (first.id_no) then do;
    heartburn   = 0;
    sickness    = 0;
    spasm       = 0;
    temperature = 0;
    tiredness   = 0;
  end;
  if symptom=1 then heartburn   = 1;
  if symptom=2 then sickness    = 1;
  if symptom=3 then spasm       = 1;
  if symptom=4 then temperature = 1;
  if symptom=5 then tiredness   = 1;
  if (heartburn) and (sickness) and (spasm) and ^(temperature) and ^(tiredness) then disease = 1;
  else disease = 0;
  if (last.id_no);
run;
proc freq data=new;
  table disease;
run;
11-24-2017 05:52 PM
You're right, and that's cool. Hopefully the code above helps to explain what the arrays are trying to do. With arrays it would probably look something like this:
data new(drop=symptom i);
  set longsort;
  by id_no;
  retain sympt1 - sympt5;
  array new_a (1:5) sympt1 - sympt5;
  do i = 1 to 5;
    if (first.id_no) then do;
      new_a(i) = 0;
    end;
    if symptom=i then new_a(i) = 1;
  end;
  if (sympt1) and (sympt2) and (sympt3) and ^(sympt4) and ^(sympt5) then disease = 1;
  else disease = 0;
  if (last.id_no);
run;
proc freq data=new;
  table disease;
run;
11-24-2017 06:04 PM
@Enio some small changes - noted by Warren earlier I think. That END after the line below should be moved up, or the do loop could be simplified.
if symptom=i then new_a(i) =1; end; *This needs to be moved up;
The i reference isn't correct in this case, because the diagnoses are being moved to specific positions in the array. Those positions are defined by the symptom_no variable.
Though... symptoms in the previous question were text, but they appear to be character here so I'm slightly confused myself.
A previous version of this question is linked to below for your information.
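To illustrate the simplification mentioned above, here is a minimal sketch that indexes the array directly instead of comparing against the loop counter. It assumes the long dataset is sorted by id_no and holds one row per symptom in a numeric symptom_no column with values 1 to 5; the sample rows are made up for illustration, and the real Project3 data may be laid out differently.

```sas
/* Hypothetical sample data: one row per person per symptom. */
data longsort;
  input id_no symptom_no;
  datalines;
1 1
1 2
1 3
2 1
2 2
2 3
2 4
3 2
;
run;

data new(keep=id_no sympt1-sympt5 disease);
  set longsort;
  by id_no;
  retain sympt1-sympt5;
  array new_a(5) sympt1-sympt5;

  /* Reset all five flags at the start of each person. */
  if first.id_no then do i = 1 to 5;
    new_a(i) = 0;
  end;

  /* Assumption: symptom_no is numeric 1-5, so it can be the array index. */
  if 1 <= symptom_no <= 5 then new_a(symptom_no) = 1;

  /* Keep only the last row per person, once all flags are set. */
  if last.id_no;

  /* 1 only for heartburn, sickness and spasm without temperature or tiredness. */
  disease = (sympt1 and sympt2 and sympt3 and not sympt4 and not sympt5);
run;

proc freq data=new;
  tables disease;
run;
```

With these sample rows, only id_no 1 ends up with disease = 1. If symptom_no turns out to be character in the real data, it would need converting before it can be used as an index.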
11-24-2017 06:08 PM
Also note that a big misconception among many programmers seems to be that you need if/then/else to assign a binary value to a variable. NOT TRUE! This works just fine and requires only a single statement.
variable = boolean-expression
A boolean expression (and/or/eq/ne/gt/ge/lt/le etc) resolves to zero or one. You can assign that value to a variable.
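As a rough sketch of that idea applied to this thread's problem, the disease flag can be assigned in one statement. The dataset and the 0/1 flag variable names below are assumptions borrowed from the earlier reply, with made-up rows for illustration.

```sas
/* Hypothetical wide data: one row per person, 0/1 symptom flags. */
data symptoms;
  input id_no heartburn sickness spasm temperature tiredness;
  datalines;
1 1 1 1 0 0
2 1 1 1 1 0
3 0 1 1 0 0
;
run;

data flagged;
  set symptoms;
  /* The expression resolves to 1 when true and 0 when false. */
  disease = (heartburn and sickness and spasm
             and not temperature and not tiredness);
run;

/* Frequency and percent of disease = 0 and disease = 1. */
proc freq data=flagged;
  tables disease;
run;
```

Here only id_no 1 gets disease = 1. Note that the original exercise asks for if/then statements, so this form may not satisfy that requirement even though it gives the same values.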
11-25-2017 01:55 AM
This is my preferred visual coding style:
data new;
set longsort;
by id_no;
keep id_no sympt1 - sympt5 disease;
retain sympt1 - sympt5 disease;
array New_a (1:5) $20 sympt1 - sympt5;
disease = 0;
If first.id_no then do;
  do i = 1 to 5;
    new_a (i) = .;
  end;
  new_a (symptom_no) = symptom;
  if last.id_no then output;
  if sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
  else disease='0';
end;
run;
I removed the surplus end (which sticks out like a beacon when the code is properly formatted) and the erroneous run.
Now you can see that your if last. is within the if first. block, and will only be executed if there's only one row per id_no. I guess that's not what you wanted.
Although I'm famous for my notoriously cluttered desk, my codes are always neat and tidy.