Fluorite | Level 6

## Output a group if conditional statement is met

Hello,

I have a dataset of 18 million records. There are repeating person ids. I need this dataset split into two separate datasets. One dataset should be for a person who has ever had mx = NVX. The other data set would be for everyone else.

Have:

``````Obs person_id MX
1 10000103722 PFR
2 10000103722 PFR
3 10000181132 NVX
4 10000181132 MOD
5 10000189649 PFR
6 10000189649 PFR
7 10000189649 NVX
8 10000189649 PFR
9 10000371289 PFR
10 10000371289 PFR
11 10000751831 NVX
12 10000810473 MOD
13 10000810473 MOD
14 10000829168 JSN
15 10000864651 PFR
16 10000864651 PFR
17 10000865007 PFR
18 10000865007 NVX
19 10000865500 PFR
20 10000865500 PFR
21 10000865883 MOD
22 10000865883 MOD
23 10000866093 PFR
24 10000866093 NVX
25 10000866093 NVX``````

Want:

``````Obs person_id MX
3 10000181132 NVX
4 10000181132 MOD
5 10000189649 PFR
6 10000189649 PFR
7 10000189649 NVX
11 10000751831 NVX
17 10000865007 PFR
18 10000865007 NVX
23 10000866093 PFR
24 10000866093 NVX
25 10000866093 NVX``````

I used the below code to try and do this, but it is only outputting records when mx=NVX and not outputting the entire group when NVX is found.

``````Data admin_adult2 admin_nvx;
By person_id;
if mx IN ('NVX') then output admin_nvx;
Run;``````

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Output a group if conditional statement is met

Use a double DO loop (your dataset seems to already be sorted by person_id):

``````data want;
do until (last.person_id);
set have;
by person_id;
if mx = "NVX" then flag = 1;
end;
do until (last.person_id);
set have;
by person_id;
if flag then output;
end;
drop flag;
run;``````
6 REPLIES 6
Lapis Lazuli | Level 10

## Re: Output a group if conditional statement is met

One way to get the result is to create a macro variable to reference all person_id that have met the criteria., then subset the data based on the macro variable values.
Eg (assuming person_id is a numeric variable):

``````data have;
input Obs person_id MX \$;
cards;
1 10000103722 PFR
2 10000103722 PFR
3 10000181132 NVX
4 10000181132 MOD
5 10000189649 PFR
6 10000189649 PFR
7 10000189649 NVX
8 10000189649 PFR
9 10000371289 PFR
10 10000371289 PFR
11 10000751831 NVX
12 10000810473 MOD
13 10000810473 MOD
14 10000829168 JSN
15 10000864651 PFR
16 10000864651 PFR
17 10000865007 PFR
18 10000865007 NVX
19 10000865500 PFR
20 10000865500 PFR
21 10000865883 MOD
22 10000865883 MOD
23 10000866093 PFR
24 10000866093 NVX
25 10000866093 NVX
;
proc print; run;

proc sql noprint;
select distinct person_id format=16.
into: ids separated by ','
from have
where upcase(strip(mx)) eq 'NVX';
%put Selected Person_id= &ids;
quit;
data want1 want2;
set have;
if person_id in (&ids) then output want1;
else output want2;
run; ``````
Super User

## Re: Output a group if conditional statement is met

``````data have;
input Obs person_id MX \$;
cards;
1 10000103722 PFR
2 10000103722 PFR
3 10000181132 NVX
4 10000181132 MOD
5 10000189649 PFR
6 10000189649 PFR
7 10000189649 NVX
8 10000189649 PFR
9 10000371289 PFR
10 10000371289 PFR
11 10000751831 NVX
12 10000810473 MOD
13 10000810473 MOD
14 10000829168 JSN
15 10000864651 PFR
16 10000864651 PFR
17 10000865007 PFR
18 10000865007 NVX
19 10000865500 PFR
20 10000865500 PFR
21 10000865883 MOD
22 10000865883 MOD
23 10000866093 PFR
24 10000866093 NVX
25 10000866093 NVX
;

proc sql;
create table NVX          as select distinct person_id from have where MX='NVX';
create table admin_nvx    as select * from have where person_id in (select  person_id from NVX);
create table admin_adult2 as select * from have where person_id not in (select  person_id from NVX);
quit;``````
Meteorite | Level 14

## Re: Output a group if conditional statement is met

Assuming your input data is sorted by person_id, a fast way to do it is this:

``````data admin_nvx admin_adult2;
merge have(where=(mx='NVX') in=NVX) have;
by person_id;
run;``````
SAS Employee

## Re: Output a group if conditional statement is met

See if this does the job - It's returning the same number of records as your output above, but in a slightly different order.   Just using some proc sort and SQL....

proc sort data=have nodupkey out=have_1_nvx;
by person_id;
where mx = 'NVX';
run;

proc sql;
select have.obs, have.person_id format=12., have.mx
from have_1_nvx,have
where have_1_nvx.person_id = have.person_id;
quit;

## Re: Output a group if conditional statement is met

If you data are sorted by person_id, then a self merge will help you deterimine where person_id's have an 'NVX' record:

``````
data have;
input Obs person_id MX \$;
cards;
1 10000103722 PFR
2 10000103722 PFR
3 10000181132 NVX
4 10000181132 MOD
5 10000189649 PFR
6 10000189649 PFR
7 10000189649 NVX
8 10000189649 PFR
9 10000371289 PFR
10 10000371289 PFR
11 10000751831 NVX
12 10000810473 MOD
13 10000810473 MOD
14 10000829168 JSN
15 10000864651 PFR
16 10000864651 PFR
17 10000865007 PFR
18 10000865007 NVX
19 10000865500 PFR
20 10000865500 PFR
21 10000865883 MOD
22 10000865883 MOD
23 10000866093 PFR
24 10000866093 NVX
25 10000866093 NVX
;
data want1 want2;
merge have (where=(mx='NVX') in=found_nvx)
have ;
by person_id;
if found_nvx then output want1;
else output want2;
run;``````

The IN parameter FOUND_NVX will be 1 for an entire person_id even if only 1 such record is found.

And it's crucial to this self-merge that the dataset name with the WHERE=(mx='NVX') parameter appears first.  This generates the desired creation of the FOUND_NVX dummy, while allowing the sequence of data records from the second reference to HAVE overwrite records from the first, to be passed on to the output datasets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Super User

## Re: Output a group if conditional statement is met

Use a double DO loop (your dataset seems to already be sorted by person_id):

``````data want;
do until (last.person_id);
set have;
by person_id;
if mx = "NVX" then flag = 1;
end;
do until (last.person_id);
set have;
by person_id;
if flag then output;
end;
drop flag;
run;``````
Discussion stats
• 6 replies
• 268 views
• 2 likes
• 7 in conversation