I am creating a database from another and I want to keep observations that have codes in certain variables. It will look something like this:
data new;
set old;
if ((4100 or 4101) in (DX1-DX25)) and (100 or 101 or 105 in (DXCC1-DXCC25))) or
((4200 or 4201) in (DX1-DX25)) and (200 or 201 or 205 in (DXCC1-DXCC25)));
run;
Is the syntax correct for what I want to do? I want to make sure before I run because it is a large dataset and will take a long time.
Just create 2 do loops. Something like this:
data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC14;
do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DX[i] in (4200, 4201) then dx2 = 1;
end;
do i = 1 to dim(DXCC);
if DXCC[i] in (100,101,105) then dxcc1=1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;
if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;
I would suggest to test your code first on a small subset 🙂
If you want to test code simply, you can do that:
data want;
set have (obs=100);
** add whatever you want to do;
run;
Here, your syntax is far from working. This would be a reasonable approach:
data want;
set have;
array dx {25};
array dxcc {23};
do k=1 to 25 until;
if dx{k} in (4100, 4101) then flag_group1=1;
else if dx{k} in (4200, 4201) then flag_group2=1;
end;
if flag_group1=1 then do k=1 to 23 until (flag=1);
if dxcc{k} in (100, 101, 105) then flag=1;
end;
if flag_group2=1 and flag ne 1 then do k=1 to 23 until (flag=1);
if dxcc{k} in (200, 201, 205) then flag=1;
end;
if flag=1;
drop k flag flag_group1 flag_group2;
run;
And do test it using OBS= before running against the entire data set.
***** EDITED: changes added to allow for different number of elements in each array.
Also Edited to match your AND / OR logic.
I'm not sure what you are asking. Do you have some other number of elements (not 25)? Are you talking about the lengths of character variables (which requires a slightly different syntax)?
I think this code will work, with the assumption you have equal number of variables both for DX and DXCC.
data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC25;
do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DXCC[i] in (100,101,105) then dxcc1=1;
if DX[i] in (4200, 4201) then dx2 = 1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;
if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;
Just create 2 do loops. Something like this:
data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC14;
do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DX[i] in (4200, 4201) then dx2 = 1;
end;
do i = 1 to dim(DXCC);
if DXCC[i] in (100,101,105) then dxcc1=1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;
if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;
I would suggest to test your code first on a small subset 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.