SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Using Splitter to put duplicate values into a separate table

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 89
Accepted Solution

Using Splitter to put duplicate values into a separate table

I'd like a Splitter transformation in DI Studio to put duplicate values into a separate data set.

In the precode, I've written:

%let dup_count = count(var_id);

In the row selection condition, I've written:

&dup_count > 1;

I suspect there's something wrong with how I'm using dup_count in the precode. Can someone please tell me how to resolve this? Using %count didn't do me any good either, though perhaps I was using it incorrectly.

Thanks.


Accepted Solutions
Solution
‎10-02-2012 07:52 AM
Respected Advisor
Posts: 4,173

Re: Using Splitter to put duplicate values into a separate table

Posted in reply to TurnTheBacon

It's always worth to look at the generated code (code tab) of a transformation. If you can understand the code generated (sometimes it takes some effort I must admit) then you get normally the explanation why something is not working as expected.

In your case: The splitter works on row level. It doesn't help that you define a macro variable as pre-code. The macro variable will simply resolve and then the code the SAS interpreter sees is 'count(var_id)>1'. The count() function counts how many times a substring appears in a string - so it needs at least 2 parameters. I assume when you run your job it's not only not doing what you want but even throws an error.

You cant' use a Splitter Transformation to write duplicates to a separate dataset. I wouldn't know right now if there is any standard transformation which allows you to write duplicate records to one table and the remaining unique records to another table. You might need to implement this as a user written transformation.

If you only want to write duplicates to a table then a SQL Join with using only one table as source and a group by <keys> having count(*)>1 would give you the duplicates.

View solution in original post


All Replies
Super User
Posts: 5,426

Re: Using Splitter to put duplicate values into a separate table

Posted in reply to TurnTheBacon

The splitter transform uses a data step, it seems you are trying to mimic some SQL behavior.

Maybe you should check out if you could use the Data Validation transform instead.

Data never sleeps
Frequent Contributor
Posts: 89

Re: Using Splitter to put duplicate values into a separate table

Thanks for the PROC SQL tip, I changed the precode to:

%let dup_count = 0;

proc sql noprint;

          select count(var_id) into : dup_count

          from &SYSLAST;

quit;

SAS still seems to hate the row selection condition &dup_count > 1; though. "Statement is not valid or it is used out of proper order."

Respected Advisor
Posts: 4,173

Re: Using Splitter to put duplicate values into a separate table

Posted in reply to TurnTheBacon

Below the SQL code you could use in a user written transformation to get the duplicates. Alternatively you could also use a SQL Join transformation to set up such a SQL (and using a standard transformation is what you should do if you can).

data have;
  do var_id=1,2,3,3,4,5,5,5,6;
    someOtherVar=ranuni(1);
    output;
  end;
run;
%let _input=have;
%let _output=want;

proc sql;
  create table &_output as
    select *
    from &_input
    group by var_id
    having count(*)>1
  ;
quit;

Solution
‎10-02-2012 07:52 AM
Respected Advisor
Posts: 4,173

Re: Using Splitter to put duplicate values into a separate table

Posted in reply to TurnTheBacon

It's always worth to look at the generated code (code tab) of a transformation. If you can understand the code generated (sometimes it takes some effort I must admit) then you get normally the explanation why something is not working as expected.

In your case: The splitter works on row level. It doesn't help that you define a macro variable as pre-code. The macro variable will simply resolve and then the code the SAS interpreter sees is 'count(var_id)>1'. The count() function counts how many times a substring appears in a string - so it needs at least 2 parameters. I assume when you run your job it's not only not doing what you want but even throws an error.

You cant' use a Splitter Transformation to write duplicates to a separate dataset. I wouldn't know right now if there is any standard transformation which allows you to write duplicate records to one table and the remaining unique records to another table. You might need to implement this as a user written transformation.

If you only want to write duplicates to a table then a SQL Join with using only one table as source and a group by <keys> having count(*)>1 would give you the duplicates.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 591 views
  • 6 likes
  • 3 in conversation