
Creating new variable and formulating its value to depend on other variables

Hi all!

I am having trouble creating a new variable, "disease", that indicates whether the subject has the disease (1) or not (0), based on the symptoms recorded in 5 different variables. If a patient has sympt1-sympt3 (heartburn, sickness, and spasm) and has neither sympt4 nor sympt5 (temperature, tiredness), then the patient has the disease (1). In every other case, disease should equal 0.

[Attached image: project3 task2 table.PNG]

CODE:

options nodate nonumber;
****1.IMPORT;
%macro P3 (a, b, c, d);
proc import out= &a
datafile= "C:\HW5\&b"
dbms=xlsx replace;
getnames=yes;
run;
proc sort data=&a;
by &c &d;
run;
Proc print data = &a;
Run;
%mend P3;

%P3 (PROJECT3_F17, Project3.xlsx, id_no, symptom_no);
***2.REORGANIZE variables: USE ARRAY STATEMENT ALONG WITH FIRST.ID and LAST.ID;
data sympt (drop=symptom_no symptom);
set PROJECT3_F17;
retain sympt1-sympt5;
length sympt1-sympt5 $15;
array symptoms (5) $15. sympt1-sympt5;
by id_no;
if first.id_no then call missing(of symptoms(*));
symptoms(symptom_no)=symptom;
if last.id_no then output;
run;
proc print data=sympt;
run;
***3.Set disease=1 if sympt1+sympt2+sympt3 and no sympt4-sympt5 WORK OFF DATA SET SYMPT;
data dis_01 (drop=answer);
set sympt;
retain disease;
array dpthdisease (1) disease;
by id_no;
if first.id_no then dpthdisease (symp1-sympt5)=answer;
If answer EQ sympt1-sympt3 and NE sympt4 OR sympt5 then disease=1;
else disease=0;
if last.id_no then output;
run;
proc print data=dis_01;
run;

 

 

LOG:

1 options nodate nonumber;
2 ****1.IMPORT;
3 %macro P3 (a, b, c, d);
4 proc import out= &a
5 datafile= "C:\HW5\&b"
6 dbms=xlsx replace;
7 getnames=yes;
8 run;
9 proc sort data=&a;
10 by &c &d;
11 run;
12 Proc print data = &a;
13 Run;
14 %mend P3;
15
16 %P3 (PROJECT3_F17, Project3.xlsx, id_no, symptom_no);

NOTE: The import data set has 18082 observations and 3 variables.
NOTE: WORK.PROJECT3_F17 data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 1.41 seconds
cpu time 0.85 seconds

 

NOTE: There were 18082 observations read from the data set WORK.PROJECT3_F17.
NOTE: The data set WORK.PROJECT3_F17 has 18082 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.15 seconds
cpu time 0.01 seconds


NOTE: Writing HTML Body file: sashtml.htm

NOTE: There were 18082 observations read from the data set WORK.PROJECT3_F17.
NOTE: PROCEDURE PRINT used (Total process time):
real time 4.96 seconds
cpu time 3.20 seconds


17 ***2.REORGANIZE variables: USE ARRAY STATEMENT ALONG WITH FIRST.ID and LAST.ID;
18 data sympt (drop=symptom_no symptom);
19 set PROJECT3_F17;
20 retain sympt1-sympt5;
21 length sympt1-sympt5 $15;
22 array symptoms (5) $15. sympt1-sympt5;
23 by id_no;
24 if first.id_no then call missing(of symptoms(*));
25 symptoms(symptom_no)=symptom;
26 if last.id_no then output;
27 run;

NOTE: There were 18082 observations read from the data set WORK.PROJECT3_F17.
NOTE: The data set WORK.SYMPT has 12549 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.66 seconds
cpu time 0.07 seconds


28 proc print data=sympt;
29 run;

NOTE: There were 12549 observations read from the data set WORK.SYMPT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 3.42 seconds
cpu time 3.34 seconds


30 ***3.Set disease=1 if sympt1+sympt2+sympt3 and no sympt4-sympt5 WORK OFF DATA SET SYMPT;
31 data dis_01 (drop=answer);
32 set sympt;
33 retain disease;
34 array dpthdisease (1) disease;
35 by id_no;
36 if first.id_no then dpthdisease (symp1-sympt5)=answer;
37 If answer EQ sympt1-sympt3 and NE sympt4 OR sympt5 then disease=1;
------
22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, <, <=,
<>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, [,
^=, {, |, ||, ~=.

38 else disease=0;
39 if last.id_no then output;
40 run;

NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
36:44 37:18 37:25 37:39 37:49
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DIS_01 may be incomplete. When this step was stopped there were 0
observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.23 seconds
cpu time 0.04 seconds

 

53 proc print data=dis_01;
54 run;

NOTE: No observations in data set WORK.DIS_01.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

Re: Creating new variable and formulating its value to depend on other variables

Posted in reply to aespinarey

Is this for a course? This exact question, with the same data, has already been asked and answered here.
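
For reference, here is a minimal sketch of how the rule in the question could be written against the SYMPT data set. It assumes a symptom counts as "present" when its variable is non-missing, and that disease=1 requires both sympt4 and sympt5 to be missing; neither assumption is confirmed in the thread. Since SYMPT already has one row per id_no, the sketch drops RETAIN, BY, and the first./last. logic. Two things in the original step 3 stand out: inside an expression, `sympt1-sympt3` is a subtraction rather than a variable list (hence the character-to-numeric conversion notes), and the ERROR 22-322 appears to come from `and NE sympt4`, where the comparison has no left-hand operand.

```sas
/* Sketch only: flag disease from the one-row-per-patient data set SYMPT.  */
/* Assumption: a symptom is "present" when its variable is non-missing.   */
data dis_01;
    set sympt;
    /* CMISS counts missing values and accepts character arguments.       */
    /* sympt1-sympt3 must all be present; sympt4 and sympt5 both absent.  */
    if cmiss(of sympt1-sympt3) = 0 and cmiss(sympt4, sympt5) = 2 then disease = 1;
    else disease = 0;
run;

proc print data=dis_01;
run;
```

If the intended rule is that only one of sympt4 or sympt5 has to be absent, the second condition would change to `cmiss(sympt4, sympt5) >= 1`.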
