Hi all,
I'm working on a dataset that has repeated data. Below is a sample of the Have data:
CustID | Count | CustTo |
AAA | 3 | AAA |
AAA | 3 | BBB |
AAA | 3 | DDD |
BBB | 2 | AAA |
BBB | 2 | AAA |
CCC | 1 | BBB |
DDD | 3 | AAA |
DDD | 3 | BBB |
DDD | 3 | AAA |
EEE | 2 | GGG |
EEE | 2 | HHH |
FFF | 1 | AAA |
GGG | 1 | AAA |
HHH | 4 | BBB |
HHH | 4 | BBB |
HHH | 4 | BBB |
HHH | 4 | HHH |
in sas format:
data have;
infile datalines truncover dsd dlm=',';
input (custID count custto) ($);
datalines;
AAA,3,AAA
AAA,3,BBB
AAA,3,DDD
BBB,2,AAA
BBB,2,AAA
CCC,1,BBB
DDD,3,AAA
DDD,3,BBB
DDD,3,AAA
EEE,2,GGG
EEE,2,HHH
FFF,1,AAA
GGG,1,AAA
HHH,4,BBB
HHH,4,BBB
HHH,4,BBB
HHH,4,HHH
;
I want to look at the data in groups, one CUSTID at a time.
For each CUSTID the same CUSTID should be present in the CustTo column at least once or else that entire group of CUSTID obs is to be deleted (or sent to another output dataset).
The Want table would look like this, where only AAA and HHH data remains and all other CUSTIDs were removed.:
CustID | Count | CustTo |
AAA | 3 | AAA |
AAA | 3 | BBB |
AAA | 3 | DDD |
HHH | 4 | BBB |
HHH | 4 | BBB |
HHH | 4 | BBB |
HHH | 4 | HHH |
Based on searches on this forum I tried this but it did not work in the full dataset:
proc sql;
create table want as
select * from have where custid in (select distinct custto from have);
quit;
Any help would be much appreciated!
Read the keys where the condition is met into a hash:
data want;
set have;
if _n_ = 1
then do;
declare hash h (dataset:"have (where=(custid = custto))");
h.definekey("custid");
h.definedone();
end;
if h.check() = 0;
run;
or use a double DO loop:
data want;
flag = 0;
do until (last.custid);
set have;
by custid;
if custid = custto then flag = 1;
end;
do until (last.custid);
set have;
by custid;
if flag then output;
end;
drop flag;
run;
Read the keys where the condition is met into a hash:
data want;
set have;
if _n_ = 1
then do;
declare hash h (dataset:"have (where=(custid = custto))");
h.definekey("custid");
h.definedone();
end;
if h.check() = 0;
run;
or use a double DO loop:
data want;
flag = 0;
do until (last.custid);
set have;
by custid;
if custid = custto then flag = 1;
end;
do until (last.custid);
set have;
by custid;
if flag then output;
end;
drop flag;
run;
Thanks Kurt, that worked!
I have a related question (not sure if I should ask that in a new topic as it is similar but with one additional column).
If you wish I can move this question to a new topic.
data have;
infile datalines truncover dsd dlm=',';
input (custID count custfrom custto) ($);
datalines;
AxA,3,aaa,aaa
AxA,3,aaa,bbb
AxA,3,aaa,ccc
B2B,2,bbb,ccc
B2B,2,bbb,ddd
CCC,1,ccc,aaa
DzD,3,ddd,aaa
DzD,3,ddd,hhh
DzD,3,ddd,ccc
EEE,2,eee,ccc
EtE,2,eee,aaa
FtF,1,fff,ggg
GGG,1,ggg,hhh
HHt,4,hhh,aaa
HHt,4,hhh,bbb
HHt,4,hhh,ccc
HHt,4,hhh,hhh
;
In this case I want to subset by orderID.
If custfrom does not equal to custto at least once per orderID then I would like to remove all of those orderid observations from the dataset.
want would look like this:
CustID | Count | Custfrom | custto |
AxA | 3 | aaa | aaa |
AxA | 3 | aaa | bbb |
AxA | 3 | aaa | ccc |
HHt | 4 | hhh | aaa |
HHt | 4 | hhh | bbb |
HHt | 4 | hhh | ccc |
HHt | 4 | hhh | hhh |
thank you!
Just add your condition into the subquery.
proc sql;
create table want as
select a.* from have a
where a.custid in
(select distinct b.custid
from have b
where b.custid = b.custto
)
;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.