## Syntax for creating a new dataset based on codes for specific variables

Solved
Frequent Contributor
Posts: 132

# Syntax for creating a new dataset based on codes for specific variables

[ Edited ]

I am creating a database from another and I want to keep observations that have codes in certain variables. It will look something like this:

data new;

set old;

if ((4100 or 4101) in (DX1-DX25)) and (100 or 101 or 105 in (DXCC1-DXCC25))) or

((4200 or 4201) in (DX1-DX25)) and (200 or 201 or 205 in (DXCC1-DXCC25)));

run;

Is the syntax correct for what I want to do? I want to make sure before I run because it is a large dataset and will take a long time.

Accepted Solutions
Solution
‎12-14-2017 05:12 PM
Occasional Contributor
Posts: 15

## Re: Syntax for creating a new dataset based on codes for specific variables

Just create 2 do loops. Something like this:

data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC14;

do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DX[i] in (4200, 4201) then dx2 = 1;
end;

do i = 1 to dim(DXCC);
if DXCC[i] in (100,101,105) then dxcc1=1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;

if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;

I would suggest to test your code first on a small subset

All Replies
Super User
Posts: 6,543

## Re: Syntax for creating a new dataset based on codes for specific variables

[ Edited ]

If you want to test code simply, you can do that:

data want;

set have (obs=100);

** add whatever you want to do;

run;

Here, your syntax is far from working.  This would be a reasonable approach:

data want;

set have;

array dx {25};

array dxcc {23};

do k=1 to 25 until;

if dx{k} in (4100, 4101) then flag_group1=1;

else if dx{k} in (4200, 4201) then flag_group2=1;

end;

if flag_group1=1 then do k=1 to 23 until (flag=1);

if dxcc{k} in (100, 101, 105) then flag=1;

end;

if flag_group2=1 and flag ne 1 then do k=1 to 23 until (flag=1);

if dxcc{k} in (200, 201, 205) then flag=1;

end;

if flag=1;

drop k flag flag_group1 flag_group2;

run;

And do test it using OBS= before running against the entire data set.

***** EDITED:  changes added to allow for different number of elements in each array.

Also Edited to match your AND / OR logic.

Frequent Contributor
Posts: 132

## Re: Syntax for creating a new dataset based on codes for specific variables

my dxcc actually has a difference length, I just noticed. Will this still work if I use the longest length of the 2 in my k-1 to x?
Super User
Posts: 6,543

## Re: Syntax for creating a new dataset based on codes for specific variables

I'm not sure what you are asking.  Do you have some other number of elements (not 25)?  Are you talking about the lengths of character variables (which requires a slightly different syntax)?

Frequent Contributor
Posts: 132

## Re: Syntax for creating a new dataset based on codes for specific variables

Ah sorry - yes the number if elements in the arrays are different for DX and DXCC.
Occasional Contributor
Posts: 15

## Re: Syntax for creating a new dataset based on codes for specific variables

I think this code will work, with the assumption you have equal number of variables both for DX and DXCC.

data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC25;

do i = 1 to dim(DX);

if DX[i] in (4100, 4101) then dx1 = 1;
if DXCC[i] in (100,101,105) then dxcc1=1;
if DX[i] in (4200, 4201) then dx2 = 1;
if DXCC[i] in (200,201,205) then dxcc2=1;

end;
if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;

Frequent Contributor
Posts: 132

## Re: Syntax for creating a new dataset based on codes for specific variables

Thanks for your response. What if the number of vars are different for DX and DXCC?
Solution
‎12-14-2017 05:12 PM
Occasional Contributor
Posts: 15

## Re: Syntax for creating a new dataset based on codes for specific variables

Just create 2 do loops. Something like this:

data new (drop = i);
set old;
array DX{*} DX1-DX25;
array DXCC{*} DXCC1-DXCC14;

do i = 1 to dim(DX);
if DX[i] in (4100, 4101) then dx1 = 1;
if DX[i] in (4200, 4201) then dx2 = 1;
end;

do i = 1 to dim(DXCC);
if DXCC[i] in (100,101,105) then dxcc1=1;
if DXCC[i] in (200,201,205) then dxcc2=1;
end;

if sum(dx1,dxcc1) = 2 or sum(dx2,dxcc2) = 2;
run;

I would suggest to test your code first on a small subset

☑ This topic is solved.