BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TurnTheBacon
Fluorite | Level 6

I'd like a Splitter transformation in DI Studio to put duplicate values into a separate data set.

In the precode, I've written:

%let dup_count = count(var_id);

In the row selection condition, I've written:

&dup_count > 1;

I suspect there's something wrong with how I'm using dup_count in the precode. Can someone please tell me how to resolve this? Using %count didn't do me any good either, though perhaps I was using it incorrectly.

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

It's always worth to look at the generated code (code tab) of a transformation. If you can understand the code generated (sometimes it takes some effort I must admit) then you get normally the explanation why something is not working as expected.

In your case: The splitter works on row level. It doesn't help that you define a macro variable as pre-code. The macro variable will simply resolve and then the code the SAS interpreter sees is 'count(var_id)>1'. The count() function counts how many times a substring appears in a string - so it needs at least 2 parameters. I assume when you run your job it's not only not doing what you want but even throws an error.

You cant' use a Splitter Transformation to write duplicates to a separate dataset. I wouldn't know right now if there is any standard transformation which allows you to write duplicate records to one table and the remaining unique records to another table. You might need to implement this as a user written transformation.

If you only want to write duplicates to a table then a SQL Join with using only one table as source and a group by <keys> having count(*)>1 would give you the duplicates.

View solution in original post

4 REPLIES 4
LinusH
Tourmaline | Level 20

The splitter transform uses a data step, it seems you are trying to mimic some SQL behavior.

Maybe you should check out if you could use the Data Validation transform instead.

Data never sleeps
TurnTheBacon
Fluorite | Level 6

Thanks for the PROC SQL tip, I changed the precode to:

%let dup_count = 0;

proc sql noprint;

          select count(var_id) into : dup_count

          from &SYSLAST;

quit;

SAS still seems to hate the row selection condition &dup_count > 1; though. "Statement is not valid or it is used out of proper order."

Patrick
Opal | Level 21

Below the SQL code you could use in a user written transformation to get the duplicates. Alternatively you could also use a SQL Join transformation to set up such a SQL (and using a standard transformation is what you should do if you can).

data have;
  do var_id=1,2,3,3,4,5,5,5,6;
    someOtherVar=ranuni(1);
    output;
  end;
run;
%let _input=have;
%let _output=want;

proc sql;
  create table &_output as
    select *
    from &_input
    group by var_id
    having count(*)>1
  ;
quit;

Patrick
Opal | Level 21

It's always worth to look at the generated code (code tab) of a transformation. If you can understand the code generated (sometimes it takes some effort I must admit) then you get normally the explanation why something is not working as expected.

In your case: The splitter works on row level. It doesn't help that you define a macro variable as pre-code. The macro variable will simply resolve and then the code the SAS interpreter sees is 'count(var_id)>1'. The count() function counts how many times a substring appears in a string - so it needs at least 2 parameters. I assume when you run your job it's not only not doing what you want but even throws an error.

You cant' use a Splitter Transformation to write duplicates to a separate dataset. I wouldn't know right now if there is any standard transformation which allows you to write duplicate records to one table and the remaining unique records to another table. You might need to implement this as a user written transformation.

If you only want to write duplicates to a table then a SQL Join with using only one table as source and a group by <keys> having count(*)>1 would give you the duplicates.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1850 views
  • 6 likes
  • 3 in conversation