I am creating a database from another and I want to keep observations that have codes in certain variables. It will look something like this:
data new;
set old;
if ((4100 or 4101) in (DX1-DX25)) and (100 or 101 or 105 in (DXCC1-DXCC25))) or
((4200 or 4201) in (DX1-DX25)) and (200 or 201 or 205 in (DXCC1-DXCC25)));
run;
Is the syntax correct for what I want to do? I want to make sure before I run because it is a large dataset and will take a long time.
Just create 2 do loops. Something like this:
data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC14;
do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DX[i] in (4200, 4201) then dx2 = 1;
end;
do i = 1 to dim(DXCC);
if DXCC[i] in (100,101,105) then dxcc1=1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;
if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;
I would suggest to test your code first on a small subset 🙂
If you want to test code simply, you can do that:
data want;
set have (obs=100);
** add whatever you want to do;
run;
Here, your syntax is far from working. This would be a reasonable approach:
data want;
set have;
array dx {25};
array dxcc {23};
do k=1 to 25 until;
if dx{k} in (4100, 4101) then flag_group1=1;
else if dx{k} in (4200, 4201) then flag_group2=1;
end;
if flag_group1=1 then do k=1 to 23 until (flag=1);
if dxcc{k} in (100, 101, 105) then flag=1;
end;
if flag_group2=1 and flag ne 1 then do k=1 to 23 until (flag=1);
if dxcc{k} in (200, 201, 205) then flag=1;
end;
if flag=1;
drop k flag flag_group1 flag_group2;
run;
And do test it using OBS= before running against the entire data set.
***** EDITED: changes added to allow for different number of elements in each array.
Also Edited to match your AND / OR logic.
I'm not sure what you are asking. Do you have some other number of elements (not 25)? Are you talking about the lengths of character variables (which requires a slightly different syntax)?
I think this code will work, with the assumption you have equal number of variables both for DX and DXCC.
data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC25;
do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DXCC[i] in (100,101,105) then dxcc1=1;
if DX[i] in (4200, 4201) then dx2 = 1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;
if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;
Just create 2 do loops. Something like this:
data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC14;
do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DX[i] in (4200, 4201) then dx2 = 1;
end;
do i = 1 to dim(DXCC);
if DXCC[i] in (100,101,105) then dxcc1=1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;
if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;
I would suggest to test your code first on a small subset 🙂
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.