Hi,
My data has variables DX1, DX2, DX3... DX30. These variables are character. Data in these variables looks like,
DX1 | DX2 | DX3 | DX4 | DX5 | DX6 | DX7 | DX8 | DX9 | DX10 | DX11 | DX12 | DX13 | DX14 | DX15 | DX16 | DX17 | DX18 | DX19 | DX20 | DX21 | DX22 | DX23 | DX24 | DX25 | DX26 | DX27 | DX28 | DX29 | DX30 |
4241 | 42832 | 20410 | 4280 | V707 | 4019 | 30391 | 42789 | 78194 | V1582 | ||||||||||||||||||||
4241 | 4260 | 496 | 4019 | 2724 | 4439 | V707 | 42731 | 41401 | V1254 | 32723 | V462 | V5861 | V4582 | V1582 | |||||||||||||||
4241 | 78551 | 42822 | 5853 | 2875 | 4168 | 4280 | 2851 | V707 | 496 | 2724 | 41400 | 27800 | 40390 | 44020 | |||||||||||||||
4241 | 42832 | 4280 | 25000 | 2720 | 2724 | 42731 | 43310 | 40390 | 5859 | 4168 | 311 | V5866 | V5867 | V5869 | |||||||||||||||
4241 | 42822 | 4260 | 4254 | 4280 | V1582 | 496 | 4019 | 2724 | 25000 | 60000 | V1083 | V1254 | 27800 | V1582 | |||||||||||||||
3950 | 41071 | 3910 | 39891 | 2536 | 41401 | 4019 | V4582 | 42731 | 340 | 59654 | 4168 | 4280 | 2724 | 43490 | |||||||||||||||
99602 | 42830 | 4280 | 2875 | 25040 | 4241 | V422 | 41400 | V1582 | V4582 | V707 | V5866 | 40390 | 5859 | 58381 | |||||||||||||||
4241 | 51882 | V707 | 4280 | 42611 | 4263 | 41401 | 4142 | 5859 | 40390 | 79311 | 2749 | 7993 | 33818 | 4928 | 25000 | 4439 | 53081 | 60000 | 2724 | 412 | V5863 | V4582 | V1582 | ||||||
4241 | V707 | 42843 | 4168 | V462 | V8543 | 4280 | 41401 | 27801 | 496 | 32723 | 2449 | 25000 | 4019 | 6202 | 6202 | 1103 | 412 | V5866 | V5863 | V173 | V5867 | V1582 | |||||||
4241 | 5849 | 45341 | 42832 | 4280 | 2875 | 78630 | V707 | 4263 | 42731 | 4019 | 2724 | 41400 | V4581 | V5861 | 42781 | 797 | V5866 | V1582 |
and I have variables like CHRON1, CHRON2, CHRON3, ............CHRON30. These variables are numeric and binary. Data in these variables looks like,
CHRON1 | CHRON2 | CHRON3 | CHRON4 | CHRON5 | CHRON6 | CHRON7 | CHRON8 | CHRON9 | CHRON10 | CHRON11 | CHRON12 | CHRON13 | CHRON14 | CHRON15 | CHRON16 | CHRON17 | CHRON18 | CHRON19 | CHRON20 | CHRON21 | CHRON22 | CHRON23 | CHRON24 | CHRON25 | CHRON26 | CHRON27 | CHRON28 | CHRON29 | CHRON30 |
1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | |||||||||||||||||||||
1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | ||||||||||||||||
1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | |||||||||||||||
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | |||||||||||||||
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | ||||||||||||||||
1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |||||||||||||||
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | |||||||||||||||
1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | ||||||
1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | |||||||
1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
I want to create a new variable from two arrays.
The first array is named VARIABLE and represents the series of variables DX1, DX2, DX3, ..., DX30.
The second array (CHRONIC) has the same number of variables as the first and covers CHRON1, CHRON2, CHRON3, ..., CHRON30.
CHRON1 is binary, coded as "1" or "0". It corresponds to DX1 and indicates whether the value in DX1 is chronic.
To create a new variable "NEWDX" that requires conditions on both DX1 and CHRON1, can I use "AND" in the IF-THEN statement as below?
Do I have to mention the array name "CHRONIC" in the DO OVER statement along with the array name "VARIABLE", as in "DO OVER VARIABLE AND CHRONIC",
or
can I just leave "DO OVER VARIABLE" as below, so that it automatically considers the array "CHRONIC"?
DATA PROJECT.SAMPLE;
SET PROJECT.SAMPLE;
ARRAY VARIABLE $ DX1-DX30;
ARRAY CHRONIC CHRON1-CHRON30;
NEWDX=0;
DO OVER VARIABLE;
IF (VARIABLE) in ("438","4380","43810","43811","43812","43813","43814","43819","43820",) AND (CHRONIC) in ('1')
THEN do;
NEWDX=1;
leave;end;
END;
RUN;
Please let me know.
Thank you.
To get things started, your data looks like this, correct?
data one;
length DX1-DX30 $10;
infile datalines missover;
input DX1-DX30;
datalines;
4241 42832 20410 4280 V707 4019 30391 42789 78194 V1582
4241 4260 496 4019 2724 4439 V707 42731 41401 V1254 32723 V462 V5861 V4582 V1582
4241 78551 42822 5853 2875 4168 4280 2851 V707 496 2724 41400 27800 40390 44020
4241 42832 4280 25000 2720 2724 42731 43310 40390 5859 4168 311 V5866 V5867 V5869
4241 42822 4260 4254 4280 V1582 496 4019 2724 25000 60000 V1083 V1254 27800 V1582
3950 41071 3910 39891 2536 41401 4019 V4582 42731 340 59654 4168 4280 2724 43490
99602 42830 4280 2875 25040 4241 V422 41400 V1582 V4582 V707 V5866 40390 5859 58381
4241 51882 V707 4280 42611 4263 41401 4142 5859 40390 79311 2749 7993 33818 4928 25000 4439 53081 60000 2724 412 V5863 V4582 V1582
4241 V707 42843 4168 V462 V8543 4280 41401 27801 496 32723 2449 25000 4019 6202 6202 1103 412 V5866 V5863 V173 V5867 V1582
4241 5849 45341 42832 4280 2875 78630 V707 4263 42731 4019 2724 41400 V4581 V5861 42781 797 V5866 V1582
;
data two;
length CHRON1-CHRON30 $10;
infile datalines missover;
input CHRON1-CHRON30;
datalines;
1 1 1 1 0 1 1 1 0
1 1 1 1 1 1 0 1 1 0 1 1 0 0
1 0 1 1 1 1 1 0 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 0 0 1
1 1 0 1 1 1 1 0 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 0 0 0 0 1 1 1
1 0 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 0
1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 1
1 0 0 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0
;
data sample;
merge one two;
run;
Now, I do not quite follow the logic for creating the NEWDX variable. As I read it: if, for any index, the DX variable is one of the values "438","4380","43810","43811","43812","43813","43814","43819","43820" and the same-numbered CHRON variable equals 1, then NEWDX equals 1 for that record.
Is that correct?
If that is the case, then I wouldn't go with the DO OVER approach. Rather, I would use an index variable i to reference the same entry in each of the two arrays, like this:
data want(drop=i);
set sample;
array variable $ dx1-dx30;
array chronic chron1-chron30;
newdx=0;
do i=1 to dim(variable);
if variable[i] in ("438","4380","43810","43811","43812","43813","43814","43819","43820") & strip(chronic[i]) eq "1" then do;
newdx=1;
leave;
end;
end;
run;
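A side note on the CHRON comparison: in the data sets I built above, the CHRON variables are character, so strip(chronic[i]) eq "1" compares text. If your CHRON variables are truly numeric, as you describe, a plain numeric comparison should avoid the automatic numeric-to-character conversion. Below is a minimal sketch under that assumption. The data set names sample_num and want_num and the three test rows are made up for illustration and are not from your data.

```sas
/* Hypothetical test data: three DX/CHRON pairs, CHRON read as numeric */
data sample_num;
    length dx1-dx3 $5;
    input dx1 $ dx2 $ dx3 $ chron1 chron2 chron3;
    datalines;
4241 4380 V707 1 1 0
4241 4380 V707 1 0 0
4241 4019 V707 1 1 0
;

data want_num(drop=i);
    set sample_num;
    array variable[*] $ dx1-dx3;
    array chronic[*] chron1-chron3;
    newdx = 0;
    do i = 1 to dim(variable);
        /* the DX code must be in the list AND the same-numbered CHRON flag must be 1 */
        if variable[i] in ("438","4380","43810","43811","43812","43813","43814","43819","43820")
           and chronic[i] = 1 then do;
            newdx = 1;
            leave;   /* stop at the first matching pair */
        end;
    end;
run;
```

Only the first test row has a listed code (4380 in DX2) paired with a 1 in the matching CHRON variable, so only that row should end up with NEWDX = 1.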
Thank you for replying.
Those are not two data sets. All the variables are in one data set. DX1 to DX30 and CHRON1 to CHRON30. I just pasted in two separate tables.
I want NEWDX to require both a matching value in a DX variable and a "1" in the corresponding CHRON variable.
If DX2 has 438 or 4380 or .... or 43820 for the first observation and CHRON2 is 1 for that same observation, then NEWDX should be coded "1".
If DX3 has 438 or 4380 or .... or 43820 for the second observation and CHRON3 is 0, then NEWDX should be coded "0".
So both the DX variable and the CHRON variable must satisfy the condition for an observation to be coded "1" in NEWDX.
I will try your code.
Thank you.
"Those are not two data sets. All the variables are in one data set. DX1 to DX30 and CHRON1 to CHRON30. I just pasted in two separate tables. "
I am aware. I merge the two data sets at the bottom of my code. The data set sample should resemble your actual data, correct?
If I read your logic correctly, then I believe my code is right. If not, please let me know.
Also, if I read your code correctly, all of the values for NEWDX will be 0 with the posted data, right? Otherwise, please point to an example where NEWDX should equal 1.
You are right. All the values for NEWDX will be 0 with the posted data; the condition in the IF-THEN statement should set NEWDX to 1 for any observation that fulfills it.
I will try your code and let you know.
Thank you.
Good 🙂 Yes, please let me know.
Thank you very much.
Your code worked. I really appreciate your help.
Novinosrin code also worked.
Thank you very much Novinosrin.
I have a question. We used the index variable i. I thought we had to specify the number of elements (30) after the array names VARIABLE and CHRONIC, but it worked without it.
So, is it not mandatory to specify that after the array names?
I am learning more about DIM and HBOUND after you used DIM in the DO statement.
Thank you very much for all your help.
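On whether the element count is mandatory: as far as I know, SAS counts the variables in the list when the subscript is omitted or written as [*], and DIM then returns that count. Here is a small sketch to check this yourself. The array names and the DX/DY/DZ variables are hypothetical and are created by the ARRAY statements themselves.

```sas
/* Hypothetical names throughout; nothing here comes from the real data set */
data dims(keep=n_fixed n_star n_plain);
    array arr_fixed[3] $ dx1-dx3;   /* element count written out */
    array arr_star[*]  $ dy1-dy3;   /* [*] asks SAS to count the variables */
    array arr_plain    $ dz1-dz3;   /* no subscript, as in the code in this thread */
    n_fixed = dim(arr_fixed);
    n_star  = dim(arr_star);
    n_plain = dim(arr_plain);
    put n_fixed= n_star= n_plain=;  /* writes the three counts to the log */
run;
```

All three counts should be 3, which is why do i=1 to dim(variable) covered all 30 DX variables without the 30 being written anywhere.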
I agree that explicit arrays are the best and safest method.
If you are very comfortable with arrays, you can experiment with an implicit array for this one too. However, it is strongly not recommended:
array variable $ dx1-dx30;
array chronic chron1-chron30;
newdx=0;
do over variable;
  if variable in ("438","4380","43810","43811","43812","43813","43814","43819","43820") & strip(chronic[_i_]) eq "1" then do;
    newdx=1;
    leave;
  end;
end; /* closes the do over loop */
Thank you very much Novinosrin.
Sorry, I didn't know and marked the solution incorrectly on my post. I am new to the forum. I apologize.
Once again, thank you very much for helping me. I really appreciate it.
lol That's fine and no big deal. I have already accomplished 2017 proc star, 2018 super user and 2019 super user. 3 yrs has already been a journey. Hahaha. All the best to you becoming the next Proc star/Super user 🙂
Thank you, though I might not get to that level. I am a physician and work in a hospital, but I want to do research on health databases. I am working on a database for my research project. I completed biostatistics and worked on my hospital's small data sets with SPSS, but I started learning SAS 2 months ago from online courses and books. I wrote this code based on a couple of published papers that include SAS code. I have been struggling with these issues for the last few days.
Finally, you came like a hero and saved me.
Still prepping data.. I can analyze the data once data is ready.
Thank you very much.
Your code worked too.
Thank you very much Novinosrin.
One last small question.
You used "_i_" in strip(chronic[_i_]) eq "1".
What does "_i_" represent? What happens if we use strip(chronic) eq "1"?
Once again, thank you very much for all your help.
What does "_i_" represent?
That's the automatically generated index variable for implicit arrays. Just ignore implicit arrays, as they are not recommended.
Thank you, Novinosrin. I had to leave for work on Sunday and just got back to the forum. Thank you for the explanation.