BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kk11
Obsidian | Level 7

Hi,

 

I tried the following code to create a new variable from existing multiple variables using ARRAY statement and DO OVER loop. But I am not able to get the right result.

 

My data has variables DX1, DX2, DX3... DX30. These variables are character. Data in these variables looks like,

 

 

DX1DX2DX3DX4DX5DX6DX7DX8DX9DX10DX11DX12DX13DX14DX15DX16DX17DX18DX19DX20DX21DX22DX23DX24DX25DX26DX27DX28DX29DX30
424142832204104280V7074019303914278978194V1582                    
42414260496401927244439V7074273141401V125432723V462V5861V4582V1582               
4241785514282258532875416842802851V707496272441400278004039044020               
4241428324280250002720272442731433104039058594168311V5866V5867V5869               
424142822426042544280V1582496401927242500060000V1083V125427800V1582               
3950410713910398912536414014019V4582427313405965441684280272443490               
996024283042802875250404241V42241400V1582V4582V707V586640390585958381               
424151882V7074280426114263414014142585940390793112749799333818492825000443953081600002724412V5863V4582V1582      
4241V707428434168V462V854342804140127801496327232449250004019620262021103412V5866V5863V173V5867V1582       
4241584945341428324280287578630V7074263427314019272441400V4581V586142781797V5866V1582           

 

 

Some of the values in these variables like - "V1582","3051 ","30510","30511","30512","30513","64900","64901","64902","64903","64904" - represent one condition. I want to make a new variable with coding "1" if the observations in the variables from DX1 to DX30 have the values I mentioned in the prior sentence and coding "0" if not.

 

I used the following code, but I am not getting the right result.

 

Data set has 42690044 observations. 

 

 

DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
DO OVER VARIABLE;
IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1;  ELSE NEWDX=0;
END;
RUN;

 

 

The frequency of the new variable is,

 

(NEWDX = 0) ==> 42687442 (99.99 %)

(NEWDX = 1) ==> 2602 (0.01%)

 

 

Actually I added the frequencies of each of these valves  - "V1582","3051 ","30510","30511","30512","30513","64900","64901","64902","64903","64904" - in all the variables DX1 to DX30 and I am getting around ,

 

 

(NEWDX = 0) ==> 72 %

(NEWDX = 1) ==> 28%.

 

 

The observations with new condition should be 28%, but when I use the my code it is giving only 0.01%.

 

So my code is not doing exactly what I want. But the code worked for few observation like 100 in a small sample data set. I am not sure what is going wrong when I apply to large data set.

 

And I am not getting any error.

 

Can any one please help me to accomplish my task.

 

Thank you,

Kamesh.

 

 

 

 

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
kk11
Obsidian | Level 7

Your explanation is really helpful. Now I understood.

 

Thank you very much for teaching me.

 

Please let me know if there are any corrections I have to make in the code I posed using 2 ARRAY's as my condition need to satisfy the values in two different series of variables.

 

 

 

 

 

 

 

View solution in original post

12 REPLIES 12
novinosrin
Tourmaline | Level 20

Occasional learner and do over implicit array, thats interesting

 

Try this

 

 

DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
NEWDX=0;
DO OVER VARIABLE;
IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") 
THEN do;
NEWDX=1;  
leave;
end; END; RUN
kk11
Obsidian | Level 7

Thank you for responding.

 

I tried the code you mentioned, but I got error.

 

 

 

 

 


1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
61
62 DATA PROJECT.SAMPLE;
63 SET PROJECT.SAMPLE;
64 ARRAY VARIABLE $ DX1-DX30;
65 NEWDX=0;
66 DO OVER VARIABLE;
67 IF (VARIABLE) in
67 ! ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904")
68 THEN do;
69 NEWDX=1;
70 leave;end;
71 END;
72 RUN
73
74 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
_______
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, CANCEL, PGM.

ERROR 76-322: Syntax error, statement will be ignored.

75 ODS HTML CLOSE;
76 &GRAPHTERM; ;*';*";*/;RUN;QUIT;
76 &GRAPHTERM; ;*';*";*/;RUN;QUIT;
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set PROJECT.SAMPLE may be incomplete. When this step was stopped there were 0
observations and 347 variables.
NOTE: DATA statement used (Total process time):
real time 0.76 seconds
cpu time 0.01 seconds

77 QUIT;RUN;
78 ODS HTML5 (ID=WEB) CLOSE;
79
80 ODS RTF (ID=WEB) CLOSE;
81 ODS PDF (ID=WEB) CLOSE;
NOTE: ODS PDF(WEB) printed no output.
(This sometimes results from failing to place a RUN statement before the ODS PDF(WEB) CLOSE
statement.)
82 FILENAME _GSFNAME;
NOTE: Fileref _GSFNAME has been deassigned.


83 DATA _NULL_;
84 RUN;

NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


85 OPTIONS VALIDMEMNAME=COMPAT;
86 OPTIONS NOTES STIMER SOURCE SYNTAXCHECK;
87

 

 

novinosrin
Tourmaline | Level 20

missed a semicolon after RUN

 

DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
NEWDX=0;
DO OVER VARIABLE;
IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") 
THEN do;
NEWDX=1;  
leave;end;
END;
RUN;
kk11
Obsidian | Level 7

Sorry, I didn't realize. Was excited with your reply and copied and pasted the code. Thank you.

novinosrin
Tourmaline | Level 20

No worries and it was my mistake too. Sorry about that. Let us know if that works

kk11
Obsidian | Level 7

I have another question,

If I want to create a new varaible in the same way this time from two ARRAY's. The second ARRAY has same number of varaibels as first ARRAY and named as CHRON1, CHRON2, CHRON3, ............CHRON30.


CHRON1 Varaible is binary and coded as "1" and "0" and it corresponds to DX1 and represents whether vakve in DX1 is chronic or not.

CHRON1 is numeric.


If I want to satisfy both the values in DX1 and CHRON1 to cerate new varaible "NEWDX", can I use the "AND" in the "IF THEN statement" like below?

Should I have to mention the ARRAY name "CHRONIC" in DO OVER statement along with ARRAY name "VARIABLE" - "DO OVER VARIABLE AND CHRONIC"

or

can I just leave the "DO OVER VARIABLE" like below and it automatically considers the ARRAY "CHRONIC" ?

 

DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
ARRAY CHRONIC CHRON1-CHRON30;
NEWDX=0;
DO OVER VARIABLE;
IF (VARIABLE) in ("438","4380","43810","43811","43812","43813","43814","43819","43820",) AND (CHRONIC) in ('1')
THEN do;
NEWDX=1;
leave;end;
END;
RUN;

 

Please let me know.

Thank you.

kk11
Obsidian | Level 7

Thank you very much.

 

You solved my problem. It worked.

 

I am a new learner. Trying to understand more about IF THEN and ELSE statement. 

 

I have couple questions.

 

I tried my code without using " ELSE NEWDX=0;" after "IF compress(DIAGNOSIS) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN Smoking=1;".

 

Like below, 

 

 

DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
DO OVER VARIABLE;
IF compress(VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1; 
END;
RUN; 

 

 

It worked, but it coded NEWDX as 1 if the "IF THEN" statement is satisfied and left remaining blank (missing data). 

 

Frequency table

 

NEWDX          Frequency      Percent

.                         7472                 .


1                         3061                 100

 

 

After I used your code,

 

I got similar result, but it coded "0" fro remaining observations.

 

Frequency table

 

NEWDX          Frequency      Percent

0                         7472              70.94


1                        3061                29.06

 

 

 

 

When I used " ELSE NEWDX=0;"

 

after the 

"IF compress(DIAGNOSIS) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN Smoking=1;"

 

I am getting 

 

NEWDX          Frequency      Percent

0                        10527             99.94


1                        6                      0.06

 

 

So, using ELSE making my result wrong.

 

Now I learned that I am using ELSE in wrong context. 

 

Can you explain me why ELSE is not working?

 

I understood your code little bit, but can you please explain me your code and suggest any good article that will give me better understanding about DO loops and will help me to make good code like you did.

 

I really appreciate your help.

 

Thank you very much.

novinosrin
Tourmaline | Level 20

Ok, 

 

Let's take your 1st conditional statement

 

IF (VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904") THEN NEWDX=1;  ELSE NEWDX=0;

 

1. Newdx var is assigned a blank(missing)  value at the top of each iteration of the datastep besides at compile time.

2. When you loop hrough the elements of the array, you will need to exit the loop soon as NEWDX=1 otherwise the loop continues and execute and if the next element happens to be not true(newdx=1) then else condition takes effect and overwrites the NEWDX=0, which is not desired. That's what I modified with leave statement.

3. So for all the other values where NEWDX=1 is not true, the blank(missing) value initialised at compile time remains in effect and we have to find a way to make zero.

4. So assigning NEWDX=0 at te top makes sure this is done and, when and if the conditional statement in the loop is true, NEWDX=0 is replcaed by 1.

5. You can also use boolean expressions like

 


do over variable;
Newdx=(VARIABLE) in ("V1582","3051","30510","30511","30512","30513","64900","64901","64902","64903","64904");
if newdx then leave;
end;

This is just a fancy 

 

HTH

 

kk11
Obsidian | Level 7

Your explanation is really helpful. Now I understood.

 

Thank you very much for teaching me.

 

Please let me know if there are any corrections I have to make in the code I posed using 2 ARRAY's as my condition need to satisfy the values in two different series of variables.

 

 

 

 

 

 

 

kk11
Obsidian | Level 7

I have another question,

If I want to create a new varaible in the same way this time from two ARRAY's. The second ARRAY has same number of varaibels as first ARRAY and named as CHRON1, CHRON2, CHRON3, ............CHRON30.


CHRON1 Varaible is binary and coded as "1" and "0" and it corresponds to DX1 and represents whether vakve in DX1 is chronic or not.

CHRON1 is numeric.


If I want to satisfy both the values in DX1 and CHRON1 to cerate new varaible "NEWDX", can I use the "AND" in the "IF THEN statement" like below?

Should I have to mention the ARRAY name "CHRONIC" in DO OVER statement along with ARRAY name "VARIABLE" - "DO OVER VARIABLE AND CHRONIC"

or

can I just leave the "DO OVER VARIABLE" like below and it automatically considers the ARRAY "CHRONIC" ?

 

DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
ARRAY CHRONIC CHRON1-CHRON30;
NEWDX=0;
DO OVER VARIABLE;
IF (VARIABLE) in ("438","4380","43810","43811","43812","43813","43814","43819","43820",) AND (CHRONIC) in ('1')
THEN do;
NEWDX=1;
leave;end;
END;
RUN;

 

Please let me know.

Thank you.

Allaluiah
Quartz | Level 8

@kk11  Hi, Would be more appropriate and courteous to mark novinosrin's solution as accepted than marking your own. Thanks!

kk11
Obsidian | Level 7

Sorry, I am new and didn't know. I will mark novinosrin's solution as accepted. I thought it will represent the whole conversation. 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 5427 views
  • 0 likes
  • 3 in conversation