I have a dataset with 3 columns . For ex--
Id CustomerName TransactionDate
1 Mayank 10 aug 2018
2 Mayank 11 aug 2018
3 Akhil 10 aug 2018
4 Neha 20 aug 2018
5 Abhi 23 aug 2018
6 Neha 30 may 2017
I want to create dataset with datasteps from this dataset where customer has done multiple transaction like (Mayank , Neha) but not (AKhil , Abhi).
So Result should be--
Id CustomerName TransactionDate
1 Mayank 10 aug 2018
2 Mayank 11 aug 2018
4 Neha 20 aug 2018
6 Neha 30 may 2017
Use PROC SORT to re-order the observations:
proc sort data=have;
by CustomerName;
run;
Then you can select those with more than one observation:
data want;
set have;
by CustomerName;
if first.CustomerName=1 and last.CustomerName=1 then delete;
run;
There are ways to do this in one step with PROC SQL, but I'm more familiar with a DATA step.
Is there a rule for excluding the record or just a few specific ones you don't want.
If just a few specific are to be removed or ignored
data want; set have if customername not in ('Akhil' 'Abhi'); run;
Use PROC SORT to re-order the observations:
proc sort data=have;
by CustomerName;
run;
Then you can select those with more than one observation:
data want;
set have;
by CustomerName;
if first.CustomerName=1 and last.CustomerName=1 then delete;
run;
There are ways to do this in one step with PROC SQL, but I'm more familiar with a DATA step.
PROC SQL;
create table data_set_new as
SELECT id, CustomerName,TransactionDate , COUNT(1) AS Cnt
FROM data_set
GROUP BY CustomerName
HAVING Cnt > 1
ORDER BY Cnt DESC;
QUIT;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.