One way that might get you started:
Proc freq data=have noprint; tables var1*var2 / out=want(where=(count>1) drop=percent) ; run;
The noprint option suppresses the normal proc freq output. The out= creates a data set named Want with the levels of the variable combinations and the count, and only when the count is more than 1, i.e. duplicates.
If you want to treat a missing value as a valid combination add the keyword MISSING after the / .
A small (but realistic) example would be helpful. Please follow these instructions when providing data. Do not provide data as screen captures or as attachments. More context about the problem would also be helpful.
Sorry, unable to submit data as its sensitive.
@danhopkinslewis wrote:
Sorry, unable to submit data as its sensitive.
Make up some data, as long as it represents the problem.
Then make up some fake data that illustrates your issue. Just make sure that variable types and other attributes (length, format) are the same.
One way that might get you started:
Proc freq data=have noprint; tables var1*var2 / out=want(where=(count>1) drop=percent) ; run;
The noprint option suppresses the normal proc freq output. The out= creates a data set named Want with the levels of the variable combinations and the count, and only when the count is more than 1, i.e. duplicates.
If you want to treat a missing value as a valid combination add the keyword MISSING after the / .
Thanks. So if I wanted to bring in all the other variable columns but not include them in the count, how would I do that?
@danhopkinslewis wrote:
Thanks. So if I wanted to bring in all the other variable columns but not include them in the count, how would I do that?
Do ALL of the other variables where you have duplicates have the same values?
If not you will need to decide which ones you want.
If the idea is to add a code that the variable combination is a duplicate then you could merge this back on the original data.
Again pseudo code because you haven't mentioned names of data sets or variables: The below renames the count to indicate that it is indeed a duplicate count.
Proc sort data=have; by var1 var2; run; data final; merge have want( rename=(count=dupecount) );
by var1 var2; run;
If you want a simple flag for "this record is part of a duplicate set" you could add something like:
Flag = (dupecount>1);
SAS will treat the result of logical comparisons as a numeric one when true and zero when false.
Other logic could be used depending on what you want.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.