kk11
Obsidian | Level 7

Hi,

 

My data has the variables DX1, DX2, DX3, ..., DX30. These variables are character. The data in them looks like this:

 

 

DX1 DX2 DX3 DX4 DX5 DX6 DX7 DX8 DX9 DX10 DX11 DX12 DX13 DX14 DX15 DX16 DX17 DX18 DX19 DX20 DX21 DX22 DX23 DX24 DX25 DX26 DX27 DX28 DX29 DX30
4241 42832 20410 4280 V707 4019 30391 42789 78194 V1582
4241 4260 496 4019 2724 4439 V707 42731 41401 V1254 32723 V462 V5861 V4582 V1582
4241 78551 42822 5853 2875 4168 4280 2851 V707 496 2724 41400 27800 40390 44020
4241 42832 4280 25000 2720 2724 42731 43310 40390 5859 4168 311 V5866 V5867 V5869
4241 42822 4260 4254 4280 V1582 496 4019 2724 25000 60000 V1083 V1254 27800 V1582
3950 41071 3910 39891 2536 41401 4019 V4582 42731 340 59654 4168 4280 2724 43490
99602 42830 4280 2875 25040 4241 V422 41400 V1582 V4582 V707 V5866 40390 5859 58381
4241 51882 V707 4280 42611 4263 41401 4142 5859 40390 79311 2749 7993 33818 4928 25000 4439 53081 60000 2724 412 V5863 V4582 V1582
4241 V707 42843 4168 V462 V8543 4280 41401 27801 496 32723 2449 25000 4019 6202 6202 1103 412 V5866 V5863 V173 V5867 V1582
4241 5849 45341 42832 4280 2875 78630 V707 4263 42731 4019 2724 41400 V4581 V5861 42781 797 V5866 V1582

 

 

 

 

I also have the variables CHRON1, CHRON2, CHRON3, ..., CHRON30. These variables are numeric and binary. The data in them looks like this:

 

CHRON1 CHRON2 CHRON3 CHRON4 CHRON5 CHRON6 CHRON7 CHRON8 CHRON9 CHRON10 CHRON11 CHRON12 CHRON13 CHRON14 CHRON15 CHRON16 CHRON17 CHRON18 CHRON19 CHRON20 CHRON21 CHRON22 CHRON23 CHRON24 CHRON25 CHRON26 CHRON27 CHRON28 CHRON29 CHRON30
1 1 1 1 0 1 1 1 0
1 1 1 1 1 1 0 1 1 0 1 1 0 0
1 0 1 1 1 1 1 0 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 0 0 1
1 1 0 1 1 1 1 0 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 0 0 0 0 1 1 1
1 0 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 0
1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 1
1 0 0 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0

 

 

 

I want to create a new variable from two arrays.

The first array is named VARIABLE and represents the series of variables DX1, DX2, DX3, ..., DX30.

The second array (CHRONIC) has the same number of variables as the first and covers CHRON1, CHRON2, CHRON3, ..., CHRON30.

CHRON1 is binary, coded 1 or 0. It corresponds to DX1 and indicates whether the value in DX1 is chronic or not.

To create the new variable NEWDX, I want the conditions on both DX1 and CHRON1 to be satisfied. Can I use AND in the IF-THEN statement as shown below?

Do I have to name the array CHRONIC in the DO OVER statement along with the array VARIABLE, as in "DO OVER VARIABLE AND CHRONIC",

or

can I just leave it as "DO OVER VARIABLE", as below, so that it automatically takes the array CHRONIC into account?

 

 

 

DATA PROJECT.SAMPLE;
   SET PROJECT.SAMPLE;
   ARRAY VARIABLE $ DX1-DX30;
   ARRAY CHRONIC CHRON1-CHRON30;
   NEWDX=0;
   DO OVER VARIABLE;
      IF (VARIABLE) in ("438","4380","43810","43811","43812","43813","43814","43819","43820") AND (CHRONIC) in ('1')
      THEN do;
         NEWDX=1;
         leave;
      end;
   END;
RUN;

 

 

 

Please let me know.

Thank you.

 

 

 

 

1 ACCEPTED SOLUTION

PeterClemmensen
Tourmaline | Level 20

If that is the case, then I wouldn't go with the DO OVER approach. Rather, I would use an iterator variable i to reference the same entry in each of the two arrays, like this:

 

data want(drop=i);
   set sample;
   array variable $ dx1-dx30;
   array chronic chron1-chron30;
   newdx=0;

   /* i points at the same position in both arrays */
   do i=1 to dim(variable);
      if variable[i] in ("438","4380","43810","43811","43812","43813","43814","43819","43820") & strip(chronic[i]) eq "1" then do;
         newdx=1;
         leave;   /* one match is enough, so stop searching */
      end;
   end;

run;


14 REPLIES
PeterClemmensen
Tourmaline | Level 20

To get things started, your data looks like this, correct?

 

data one;
   length DX1-DX30 $10;
   infile datalines missover;   /* INFILE placed before the INPUT it controls */
   input DX1-DX30;
   datalines;
4241 42832 20410 4280 V707 4019 30391 42789 78194 V1582
4241 4260 496 4019 2724 4439 V707 42731 41401 V1254 32723 V462 V5861 V4582 V1582
4241 78551 42822 5853 2875 4168 4280 2851 V707 496 2724 41400 27800 40390 44020
4241 42832 4280 25000 2720 2724 42731 43310 40390 5859 4168 311 V5866 V5867 V5869
4241 42822 4260 4254 4280 V1582 496 4019 2724 25000 60000 V1083 V1254 27800 V1582
3950 41071 3910 39891 2536 41401 4019 V4582 42731 340 59654 4168 4280 2724 43490
99602 42830 4280 2875 25040 4241 V422 41400 V1582 V4582 V707 V5866 40390 5859 58381
4241 51882 V707 4280 42611 4263 41401 4142 5859 40390 79311 2749 7993 33818 4928 25000 4439 53081 60000 2724 412 V5863 V4582 V1582
4241 V707 42843 4168 V462 V8543 4280 41401 27801 496 32723 2449 25000 4019 6202 6202 1103 412 V5866 V5863 V173 V5867 V1582
4241 5849 45341 42832 4280 2875 78630 V707 4263 42731 4019 2724 41400 V4581 V5861 42781 797 V5866 V1582
;

data two;
   length CHRON1-CHRON30 $10;
   infile datalines missover;   /* short rows leave the remaining CHRON variables missing */
   input CHRON1-CHRON30;
   datalines;
1 1 1 1 0 1 1 1 0
1 1 1 1 1 1 0 1 1 0 1 1 0 0
1 0 1 1 1 1 1 0 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 0 0 1
1 1 0 1 1 1 1 0 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 0 0 0 0 1 1 1
1 0 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 0
1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 1
1 0 0 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0
;

/* One-to-one merge (no BY statement): row n of ONE is paired with row n of TWO */
data sample;
   merge one two;
run;

 

Now, I do not follow the logic for creating the NEWDX variable. As I read it: if, at any position, the DX variable is among the values "438","4380","43810","43811","43812","43813","43814","43819","43820" and the same-numbered CHRON variable is equal to 1, then NEWDX is equal to 1 for that record.

 

Is that correct?

kk11
Obsidian | Level 7

Thank you for replying.

 

Those are not two data sets. All the variables, DX1 to DX30 and CHRON1 to CHRON30, are in one data set. I just pasted them as two separate tables.

I want NEWDX to depend on both the value in a DX variable and a 1 in the corresponding CHRON variable.

If DX2 has 438 or 4380 or ... or 43820 for the first observation and CHRON2 is 1 for the same observation, then NEWDX should be coded 1.

If DX3 has 438 or 4380 or ... or 43820 for the second observation and CHRON3 is 0, then NEWDX should be coded 0.

So both the DX condition and the CHRON condition have to be satisfied for an observation to be coded 1 in NEWDX.

 

I will try your code.

 

Thank you.

PeterClemmensen
Tourmaline | Level 20

"Those are not two data sets. All the variables are in one data set. DX1 to DX30 and CHRON1 to CHRON30. I just pasted in two separate tables. "

 

I am aware. I merge the two data sets at the bottom of my code. The data set sample should resemble your actual data, correct?

 

If I read your logic correctly, then I believe my code is right. If not, please let me know.

Also, if I read your code correctly, all of the values for NEWDX will be 0 with the posted data, right? Otherwise, please point to an example where NEWDX should equal 1.
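
Since the posted data never triggers the condition, a small made-up check may help. This is only a sketch: the two rows below are hypothetical (not from the posted data), only three DX/CHRON pairs are used, the data set names CHECK and RESULT are placeholders, and CHRON is read as numeric, as described in the original question.

```sas
/* Hypothetical rows, not taken from the posted data */
data check;
   length dx1-dx3 $10;
   infile datalines missover;
   input dx1-dx3 chron1-chron3;
   datalines;
4241 4380 V707 1 1 0
4241 43811 V707 1 0 0
;

data result(drop=i);
   set check;
   array variable {*} $ dx1-dx3;
   array chronic  {*} chron1-chron3;
   newdx = 0;
   /* Compare position i of both arrays in the same iteration */
   do i = 1 to dim(variable);
      if variable[i] in ("438","4380","43811") and chronic[i] = 1 then do;
         newdx = 1;
         leave;
      end;
   end;
run;
```

Row 1 has a listed code in DX2 with CHRON2 equal to 1, so NEWDX becomes 1. Row 2 has a listed code in DX2 but CHRON2 is 0, so NEWDX stays 0.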

kk11
Obsidian | Level 7

You are right. All the values for NEWDX will be 0 with the posted data; the condition in the IF-THEN statement should change NEWDX to 1 for the observations that fulfill it.

 

I will try your code and let you know.

 

Thank you.

kk11
Obsidian | Level 7

Thank you very much. 

 

Your code worked. I really appreciate your help.

 

Novinosrin's code also worked.

 

Thank you very much Novinosrin.

 

 

I have a question. We used the index variable i. I thought we had to specify the number of elements (30) after the array names VARIABLE and CHRONIC, but the code worked without it.

So is it not mandatory to specify that after the array names?

I am learning more about DIM and HBOUND now that you have used DIM in the DO statement.

 

Thank you very much for all your help.  
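
On the array-size question above, a minimal sketch may be useful. It assumes nothing from the thread's data: the data set name DIMS and the variables N_EXPLICIT, N_COUNTED, and TOP are made up for illustration. The size can be written out, or left for SAS to count from the variable list, and DIM then reports the number of elements. HBOUND returns the upper bound, which equals DIM when the array starts at 1.

```sas
data dims(keep=n_explicit n_counted top);
   /* Size written out explicitly */
   array a {30} $ 5 dx1-dx30;
   /* Asterisk: SAS counts the listed variables itself */
   array b {*} chron1-chron30;

   n_explicit = dim(a);      /* number of elements in A */
   n_counted  = dim(b);      /* number of elements in B */
   top        = hbound(b);   /* upper bound of B */
   put n_explicit= n_counted= top=;
run;
```

All three values are 30 here. The form used in the thread, with no size at all after the array name, also leaves the counting to SAS, which is why DIM(VARIABLE) worked in the loop.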

novinosrin
Tourmaline | Level 20

I agree that explicit arrays are the best and safest method.

If you are very comfortable with arrays, you can experiment with implicit arrays for this one too. However, this is strongly not recommended.

 

data want;
   set sample;
   array variable $ dx1-dx30;
   array chronic chron1-chron30;
   newdx=0;

   /* DO OVER steps the automatic index _i_ through the array */
   do over variable;
      if variable in ("438","4380","43810","43811","43812","43813","43814","43819","43820") & strip(chronic[_i_]) eq "1" then do;
         newdx=1;
         leave;
      end;
   end;
run;
kk11
Obsidian | Level 7

Thank you very much Novinosrin.

 

Sorry, I didn't know, and I marked the solution incorrectly on my post. I am new to the forum. I apologize.

 

Once again, thank you very much for helping me. I really appreciate it.

novinosrin
Tourmaline | Level 20

lol That's fine and no big deal. I have already accomplished 2017 proc star, 2018 super user and 2019 super user. 3 yrs has already been a journey. Hahaha. All the best to you becoming the next Proc star/Super user 🙂

kk11
Obsidian | Level 7

Thank you, though I might not get to that level. I am a physician and work in a hospital, but I want to do research on health databases, and I am working on a database for my research project. I completed biostatistics and worked on my hospital's small data sets with SPSS, but I started learning SAS two months ago from online courses and books. I put this code together based on a couple of published papers that include SAS code. I have been struggling with these issues for the last few days.

 

Finally, you came like a hero and saved me. 

 

Still prepping data.. I can analyze the data once data is ready. 

 

Thank you very much.

kk11
Obsidian | Level 7

Your code worked too.

 

Thank you very much Novinosrin.

 

One last small question.

 

You used "_i_" in strip(chronic[_i_]) eq "1".

 

What does "_i_" represent? What happens if we use strip(chronic) eq "1" instead?

 

Once again, thank you very much for all your help.

novinosrin
Tourmaline | Level 20

What does "_i_" represent?

That's the automatically generated index variable for implicit arrays. Just ignore implicit arrays, as they are not recommended.
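
For readers curious about the second half of the question, here is a hedged sketch. It rests on my understanding of legacy implicit-array behavior: DO OVER advances the automatic variable _I_, and a bare reference to any implicit array that has no index variable of its own resolves to element _I_. The values, the data set name PAIRS, and the variables POS, CODE, and FLAG are made up for illustration.

```sas
data pairs(keep=pos code flag);
   length dx1-dx3 $5;
   dx1 = '4241'; dx2 = '4380'; dx3 = 'V707';
   chron1 = 1; chron2 = 1; chron3 = 0;

   /* No subscript on either array, so both are usable with DO OVER */
   array variable $ dx1-dx3;
   array chronic chron1-chron3;

   do over variable;
      pos  = _i_;        /* current position in the loop */
      code = variable;   /* bare reference: element _i_ of VARIABLE */
      flag = chronic;    /* bare reference: element _i_ of CHRONIC */
      output;
   end;
run;
```

If that assumption holds, each output row pairs DXn with CHRONn, which would mean chronic and chronic[_i_] pick the same element inside the loop. The explicit form with an index variable remains easier to read and is what the accepted solution uses.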

 

 

 




 

kk11
Obsidian | Level 7

Thank you, Novinosrin. On Sunday I had to leave for work, and I have just got back to the forum. Thank you for the explanation.
