Hi,
I tried the following code to create a new variable from multiple existing variables using an ARRAY statement and a DO OVER loop, but I am not getting the right result.
My data has variables DX1, DX2, DX3, ..., DX30. These are character variables. The data in them looks like this:
DX1 | DX2 | DX3 | DX4 | DX5 | DX6 | DX7 | DX8 | DX9 | DX10 | DX11 | DX12 | DX13 | DX14 | DX15 | DX16 | DX17 | DX18 | DX19 | DX20 | DX21 | DX22 | DX23 | DX24 | DX25 | DX26 | DX27 | DX28 | DX29 | DX30
4241 | 42832 | 20410 | 4280 | V707 | 4019 | 30391 | 42789 | 78194 | V1582
4241 | 4260 | 496 | 4019 | 2724 | 4439 | V707 | 42731 | 41401 | V1254 | 32723 | V462 | V5861 | V4582 | V1582
4241 | 78551 | 42822 | 5853 | 2875 | 4168 | 4280 | 2851 | V707 | 496 | 2724 | 41400 | 27800 | 40390 | 44020
4241 | 42832 | 4280 | 25000 | 2720 | 2724 | 42731 | 43310 | 40390 | 5859 | 4168 | 311 | V5866 | V5867 | V5869
4241 | 42822 | 4260 | 4254 | 4280 | V1582 | 496 | 4019 | 2724 | 25000 | 60000 | V1083 | V1254 | 27800 | V1582
3950 | 41071 | 3910 | 39891 | 2536 | 41401 | 4019 | V4582 | 42731 | 340 | 59654 | 4168 | 4280 | 2724 | 43490
99602 | 42830 | 4280 | 2875 | 25040 | 4241 | V422 | 41400 | V1582 | V4582 | V707 | V5866 | 40390 | 5859 | 58381
4241 | 51882 | V707 | 4280 | 42611 | 4263 | 41401 | 4142 | 5859 | 40390 | 79311 | 2749 | 7993 | 33818 | 4928 | 25000 | 4439 | 53081 | 60000 | 2724 | 412 | V5863 | V4582 | V1582
4241 | V707 | 42843 | 4168 | V462 | V8543 | 4280 | 41401 | 27801 | 496 | 32723 | 2449 | 25000 | 4019 | 6202 | 6202 | 1103 | 412 | V5866 | V5863 | V173 | V5867 | V1582
4241 | 5849 | 45341 | 42832 | 4280 | 2875 | 78630 | V707 | 4263 | 42731 | 4019 | 2724 | 41400 | V4581 | V5861 | 42781 | 797 | V5866 | V1582 |
Some of the values in these variables ("V1582", "3051 ", "30510", "30511", "30512", "30513", "64900", "64901", "64902", "64903", "64904") represent one condition. I want to create a new variable coded 1 if any of the variables DX1 to DX30 in an observation contains one of those values, and coded 0 if not.
I used the following code, but I am not getting the right result.
The data set has 42690044 observations.
DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
DO OVER VARIABLE;
IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1; ELSE NEWDX=0;
END;
RUN;
The frequency of the new variable is:
(NEWDX = 0) ==> 42687442 (99.99%)
(NEWDX = 1) ==> 2602 (0.01%)
When I add up the frequencies of each of these values ("V1582", "3051 ", "30510", "30511", "30512", "30513", "64900", "64901", "64902", "64903", "64904") across all the variables DX1 to DX30, I get around:
(NEWDX = 0) ==> 72%
(NEWDX = 1) ==> 28%
About 28% of the observations should have the new condition, but my code gives only 0.01%.
So my code is not doing what I want. It did work on a small sample data set of about 100 observations, so I am not sure what goes wrong when I apply it to the large data set.
I am not getting any error.
Can anyone please help me accomplish my task?
Thank you,
Kamesh.
An occasional learner using DO OVER with an implicit array, that's interesting.
Try this:
DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
NEWDX=0;
DO OVER VARIABLE;
IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904")
THEN do;
NEWDX=1;
leave;
end;
END;
RUN
Thank you for responding.
I tried the code you mentioned, but I got an error.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
61
62 DATA PROJECT.SAMPLE;
63 SET PROJECT.SAMPLE;
64 ARRAY VARIABLE $ DX1-DX30;
65 NEWDX=0;
66 DO OVER VARIABLE;
67 IF (VARIABLE) in
67 ! ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904")
68 THEN do;
69 NEWDX=1;
70 leave;end;
71 END;
72 RUN
73
74 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
_______
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, CANCEL, PGM.
ERROR 76-322: Syntax error, statement will be ignored.
75 ODS HTML CLOSE;
76 &GRAPHTERM; ;*';*";*/;RUN;QUIT;
76 &GRAPHTERM; ;*';*";*/;RUN;QUIT;
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set PROJECT.SAMPLE may be incomplete. When this step was stopped there were 0
observations and 347 variables.
NOTE: DATA statement used (Total process time):
real time 0.76 seconds
cpu time 0.01 seconds
77 QUIT;RUN;
78 ODS HTML5 (ID=WEB) CLOSE;
79
80 ODS RTF (ID=WEB) CLOSE;
81 ODS PDF (ID=WEB) CLOSE;
NOTE: ODS PDF(WEB) printed no output.
(This sometimes results from failing to place a RUN statement before the ODS PDF(WEB) CLOSE
statement.)
82 FILENAME _GSFNAME;
NOTE: Fileref _GSFNAME has been deassigned.
83 DATA _NULL_;
84 RUN;
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
85 OPTIONS VALIDMEMNAME=COMPAT;
86 OPTIONS NOTES STIMER SOURCE SYNTAXCHECK;
87
I missed a semicolon after RUN. Corrected code:
DATA PROJECT.SAMPLE;
  SET PROJECT.SAMPLE;
  ARRAY VARIABLE $ DX1-DX30;
  NEWDX=0;  /* default: no matching code found yet */
  DO OVER VARIABLE;
    IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904")
    THEN do;
      NEWDX=1;
      leave;  /* stop at the first match so a later element cannot reset the flag */
    end;
  END;
RUN;
Sorry, I didn't realize. I was excited by your reply and just copied and pasted the code. Thank you.
No worries, it was my mistake too. Sorry about that. Let us know if that works.
I have another question.
I want to create a new variable in the same way, this time from two arrays. The second array has the same number of variables as the first, named CHRON1, CHRON2, CHRON3, ..., CHRON30.
CHRON1 is binary, coded 1 or 0; it corresponds to DX1 and indicates whether the value in DX1 is chronic or not.
CHRON1 is numeric.
If NEWDX should be set only when the conditions on both DX1 and CHRON1 are satisfied, can I use AND in the IF-THEN statement like below?
Do I have to name the array CHRONIC in the DO OVER statement along with the array VARIABLE, as in "DO OVER VARIABLE AND CHRONIC",
or
can I just leave "DO OVER VARIABLE" as below, and it will automatically step through the array CHRONIC as well?
DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
ARRAY CHRONIC CHRON1-CHRON30;
NEWDX=0;
DO OVER VARIABLE;
IF (VARIABLE) in ("438","4380","43810","43811","43812","43813","43814","43819","43820",) AND (CHRONIC) in ('1')
THEN do;
NEWDX=1;
leave;end;
END;
RUN;
Please let me know.
Thank you.
Thank you very much.
You solved my problem. It worked.
I am a new learner, trying to understand more about the IF-THEN and ELSE statements.
I have a couple of questions.
I tried my code without "ELSE NEWDX=0;" after "IF compress(VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1;",
like below:
DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
DO OVER VARIABLE;
IF compress(VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1;
END;
RUN;
It worked, but it coded NEWDX as 1 where the IF-THEN condition was satisfied and left the remaining observations blank (missing).
Frequency table
NEWDX   Frequency   Percent
.       7472        .
1       3061        100
After I used your code, I got a similar result, but it coded 0 for the remaining observations.
Frequency table
NEWDX   Frequency   Percent
0       7472        70.94
1       3061        29.06
When I used "ELSE NEWDX=0;" after
"IF compress(VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1;"
I get
NEWDX   Frequency   Percent
0       10527       99.94
1       6           0.06
So using ELSE makes my result wrong.
Now I have learned that I am using ELSE in the wrong context.
Can you explain why ELSE is not working?
I understood your code a little, but can you please explain it and suggest a good article that will give me a better understanding of DO loops and help me write good code like you did?
I really appreciate your help.
Thank you very much.
OK,
Let's take your first conditional statement:
IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1; ELSE NEWDX=0;
1. NEWDX is assigned a missing value at compile time and is reset to missing at the top of each iteration of the DATA step.
2. As you loop through the elements of the array, you need to exit the loop as soon as NEWDX=1. Otherwise the loop keeps executing, and if a later element does not match, the ELSE branch takes effect and overwrites NEWDX with 0, which is not what you want. The net effect is that NEWDX reflects only the last element checked, DX30. That is what I changed with the LEAVE statement.
3. For all the observations where no element matches, the missing value from the top of the iteration remains in effect, so we have to find a way to make it zero.
4. Assigning NEWDX=0 at the top takes care of that; when and if the condition in the loop is true, the 0 is replaced by 1.
5. You can also use a boolean expression, like
do over variable;
Newdx=(VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904");
if newdx then leave;
end;
This is just a fancy
HTH
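To see the difference between the two approaches on something small, here is a minimal sketch. The data set names demo and compare, the flag names flag_else and flag_leave, and the three-column toy data are all made up for illustration; an explicit array with an index variable is used in place of DO OVER, which is a substitution and not what was posted above.

```sas
/* Hypothetical toy data: three diagnosis columns instead of thirty */
data demo;
    infile datalines dsd dlm='|' truncover;
    input (DX1-DX3) (:$5.);
    datalines;
V1582|4280|4019
4241|V1582|496
4241|4280|30510
4241|4280|4019
;
run;

data compare;
    set demo;
    array dx_list {*} $ DX1-DX3;

    /* Version A: IF/ELSE inside the loop, as in the original post.
       Every element reassigns the flag, so only DX3 decides the result. */
    do i = 1 to dim(dx_list);
        if dx_list{i} in ("V1582", "30510") then flag_else = 1;
        else flag_else = 0;
    end;

    /* Version B: start at 0, set to 1 on the first match, then stop looping. */
    flag_leave = 0;
    do i = 1 to dim(dx_list);
        if dx_list{i} in ("V1582", "30510") then do;
            flag_leave = 1;
            leave;
        end;
    end;
    drop i;
run;

proc print data=compare noobs;
run;
```

On these four rows, flag_else should come out as 1 only for the third row (the one whose match sits in DX3), while flag_leave should be 1 for the first three rows and 0 for the last. That is the same pattern as the 0.01% versus 28% gap in the full data.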
Your explanation is really helpful. Now I understand.
Thank you very much for teaching me.
Please let me know if there are any corrections I have to make in the code I posted using two arrays, as my condition needs to be satisfied by the values in two different series of variables.
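On the two-array code, a hedged sketch of one way to write it. It assumes, as described in the question, that DX1-DX30 are character and CHRON1-CHRON30 are numeric. The array names dx_list and chron_list, the index variable i, and the output data set name work.sample_flagged are invented for this example. It uses explicit arrays with one shared index rather than DO OVER, so the pairing of DX7 with CHRON7 (and so on) is visible in the code. I believe implicit arrays without their own index variable all share the automatic index _I_, so "DO OVER VARIABLE" would probably step CHRONIC in parallel too, but I would not rely on that. Two details in the posted code look like they need fixing either way: the trailing comma after "43820" in the IN list, and comparing the numeric CHRONIC against the character constant '1'.

```sas
/* Sketch only: assumes DX1-DX30 are character and CHRON1-CHRON30 are numeric */
data work.sample_flagged;   /* hypothetical output name, so the input is not overwritten */
    set project.sample;
    array dx_list    {30} $ DX1-DX30;
    array chron_list {30}   CHRON1-CHRON30;

    NEWDX = 0;              /* default: no chronic matching diagnosis found */
    do i = 1 to dim(dx_list);
        /* dx_list{i} and chron_list{i} refer to the same position, e.g. DX7 and CHRON7 */
        if dx_list{i} in ("438","4380","43810","43811","43812","43813","43814","43819","43820")
           and chron_list{i} = 1 then do;
            NEWDX = 1;
            leave;          /* first qualifying pair is enough */
        end;
    end;
    drop i;
run;
```

If the data set already has a variable named i, pick a different index name. Writing to a separate output data set while testing also means a mistake cannot damage PROJECT.SAMPLE.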
@kk11 Hi, it would be more appropriate and courteous to mark novinosrin's solution as accepted rather than your own. Thanks!
Sorry, I am new and didn't know. I will mark novinosrin's solution as accepted. I thought it would represent the whole conversation.