Hello,
I have a data set with variables that I want to compare using an array, but currently have an if-then statement.
Data set:
dtp1 | dtp1src | dtp2 | dtp2src | dtp3 | dtp3src | dtp4 | dtp4src | dtp5 | dtp5src | dtp6 | dtp6src | dtp7 | dtp7src | dtp8 | dtp8src |
1/1/2016 | M | ||||||||||||||
2/22/2016 | M | 5/10/2016 | M | 7/19/2016 | M | 7/14/2017 | M | ||||||||
U | 5/10/2016 | M | 7/7/2016 | M | 5/1/2017 | M | |||||||||
4/1/2016 | M | 4/5/2017 | M | 5/3/2017 | M | U | |||||||||
3/9/2016 | M | M | 7/8/2016 | M | 7/20/2017 | M | |||||||||
4/4/2016 | H | 7/11/2016 | H | 9/26/2016 | H | 4/24/2017 | H |
Each "src" column refers to the column before it (where dtp1src is the vaccination source for the dtp1 vaccination date). What I want to end up with is a error variable of any vaccination date (DTP1-DTP8) that does not have a source (DTP1SRC-DTP8SRC) and any source (DTP1-DTP8) that does not have a vaccination date (DTP1-DTP8). Below is the code I currently have. It works fine, but I feel like there is a more elegant way to flag these using an array/do loop. Any suggestions are greatly appreciated!
DATA CLEANING.DTP_ERROR;
SET CLEANING.VALID;
IF DTP1 NE . AND DTP1src=' ' OR DTP1=. AND DTP1SRC NE ' ' THEN ERROR= 'SOURCECHECK';
IF DTP2 NE . AND DTP2src=' ' OR DTP2=. AND DTP2SRC NE ' ' THEN ERROR= 'SOURCECHECK';
IF DTP3 NE . AND DTP3src=' ' OR DTP3=. AND DTP3SRC NE ' ' THEN ERROR= 'SOURCECHECK';
IF DTP4 NE . AND DTP4src=' ' OR DTP4=. AND DTP4SRC NE ' ' THEN ERROR= 'SOURCECHECK';
IF DTP5 NE . AND DTP5src=' ' OR DTP5=. AND DTP5SRC NE ' ' THEN ERROR= 'SOURCECHECK';
IF DTP6 NE . AND DTP6src=' ' OR DTP6=. AND DTP6SRC NE ' ' THEN ERROR= 'SOURCECHECK';
IF DTP7 NE . AND DTP7src=' ' OR DTP7=. AND DTP7SRC NE ' ' THEN ERROR= 'SOURCECHECK';
IF DTP8 NE . AND DTP8src=' ' OR DTP8=. AND DTP8SRC NE ' ' THEN ERROR= 'SOURCECHECK';
RUN;
data cleaning.dtp_error;
set cleaning.valid;
array dtp{8} dtp1-dtp8;
array dtpsrc{8} dtp1src dtp2src dtp3src dtp4src dtp5src dtp6src dtp7src dtp8src;
do i = 1 to 8;
if dtp{i} ne . and dtpsrc{i} =' ' or dtp{i} = . and dtpsrc{i} ne ' ' then error = 'SOURCECHECK';
end;
run;
data cleaning.dtp_error;
set cleaning.valid;
array dtp{8} dtp1-dtp8;
array dtpsrc{8} dtp1src dtp2src dtp3src dtp4src dtp5src dtp6src dtp7src dtp8src;
do i = 1 to 8;
if dtp{i} ne . and dtpsrc{i} =' ' or dtp{i} = . and dtpsrc{i} ne ' ' then error = 'SOURCECHECK';
end;
run;
And I strongly second @Reeza 's comment about reasonable data structures. With a properly structured dataset, the code would be simple as that, and you wouldn't waste space for all the missing values.
See Maxim 33: Intelligent data makes for intelligent programs.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.