I have a dataset with 3 columns . For ex--
Id CustomerName TransactionDate
1 Mayank 10 aug 2018
2 Mayank 11 aug 2018
3 Akhil 10 aug 2018
4 Neha 20 aug 2018
5 Abhi 23 aug 2018
6 Neha 30 may 2017
I want to create dataset with datasteps from this dataset where customer has done multiple transaction like (Mayank , Neha) but not (AKhil , Abhi).
So Result should be--
Id CustomerName TransactionDate
1 Mayank 10 aug 2018
2 Mayank 11 aug 2018
4 Neha 20 aug 2018
6 Neha 30 may 2017
Use PROC SORT to re-order the observations:
proc sort data=have;
by CustomerName;
run;
Then you can select those with more than one observation:
data want;
set have;
by CustomerName;
if first.CustomerName=1 and last.CustomerName=1 then delete;
run;
There are ways to do this in one step with PROC SQL, but I'm more familiar with a DATA step.
Is there a rule for excluding the record or just a few specific ones you don't want.
If just a few specific are to be removed or ignored
data want; set have if customername not in ('Akhil' 'Abhi'); run;
Use PROC SORT to re-order the observations:
proc sort data=have;
by CustomerName;
run;
Then you can select those with more than one observation:
data want;
set have;
by CustomerName;
if first.CustomerName=1 and last.CustomerName=1 then delete;
run;
There are ways to do this in one step with PROC SQL, but I'm more familiar with a DATA step.
PROC SQL;
create table data_set_new as
SELECT id, CustomerName,TransactionDate , COUNT(1) AS Cnt
FROM data_set
GROUP BY CustomerName
HAVING Cnt > 1
ORDER BY Cnt DESC;
QUIT;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.