SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

User Written code to only display unique rows (based on two variables)

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 89
Accepted Solution

User Written code to only display unique rows (based on two variables)

A User Written transformation in DI Studio uses the following code:

proc sql;

  create table work.oppspartkap as

  select polisenr, periode_yyyymm, oppspartkapital2 from &_input;

quit;

data &_output;

  set work.oppspartkap;

  by polisenr notsorted;

   if first.polisenr;

run;

After running it, and applying a filter so that it only displays rows with polisenr 1, the work table looks like this:

I need to edit the user written code so that identical rows, such as the three highlighted ones, only appear once. Not sure whether I should use "distinct" somewhere, "group by" somewhere, or something else. Kindly advise me, thanks.


Accepted Solutions
Solution
‎11-10-2012 05:16 PM
Respected Advisor
Posts: 4,173

Re: User Written code to only display unique rows (based on two variables)

Posted in reply to TurnTheBacon

You should only use a User Written Transformation in DI Studio if there is no standard transformation available doing the job for you. In your case you can use the SQL Join transformation to get the job done. Set it up that it generates code as shown below:

proc sql;

  create view want as

  select

    polisenr,

    period,

    max(OppspartKapital2) as OppspartKapital2

  from have

    group by polisenr, period

  ;

quit;

Code as above will deal with duplicates as well as select the "correct" value if there are "duplicates" with different OppspartKapital2 values (just selecting the highest value).

You will have to ask yourself if it's o.k. that you get such data in first place or if you eventually should implement a Validation transformation somewhere upstream in the process to cleanse the data and to move "wrong" data to Error and Exception tables.

Message was edited by: Patrick Matter

View solution in original post


All Replies
Super User
Posts: 19,869

Re: User Written code to only display unique rows (based on two variables)

Posted in reply to TurnTheBacon

You need to sort your data rather than use the notsorted option.

Because the period_yyyymm changes in between those variables SAS considers them different groups.

You can sort either in the proc sql step with an order by or using a select distinct option.

PROC Star
Posts: 7,492

Re: User Written code to only display unique rows (based on two variables)

Posted in reply to TurnTheBacon

Not sure what you want to get rid of, or if maintaining the order is important (although, since you initially use proc sql, order can't be important).

As such, why not replace:

data &_output;

  set work.oppspartkap;

  by polisenr notsorted;

   if first.polisenr;

run;

with:

proc sort data=work.oppspartkap out=&_output nodupkey;

      by polisenr periode_yyyymm oppspartkapital2;

run;

Frequent Contributor
Posts: 89

Re: User Written code to only display unique rows (based on two variables)

Thanks for the feedback!

Arthur, using your suggested code I get the following output:

Skjermbilde.PNG

As you can see there's some identical sets of polisenr&periode_yyyymm, but with different oppspartkapital2. The last oppspartkapital2 of each set is always correct. So, I think I need to add code that basically says "select last oppspartkapital2 for each distinct polisenr&periode_yyyymm". I'd appreciate some advice on this.

Here's the current code:

proc sql;

  create table work.oppspartkap as

  select polisenr, periode_yyyymm, oppspartkapital2 from &_input;

quit;

proc sort data=work.oppspartkap out=&_output nodupkey;

      by polisenr periode_yyyymm oppspartkapital2;

run;

It doesn't matter to me if I use proc sql or data step.

Solution
‎11-10-2012 05:16 PM
Respected Advisor
Posts: 4,173

Re: User Written code to only display unique rows (based on two variables)

Posted in reply to TurnTheBacon

You should only use a User Written Transformation in DI Studio if there is no standard transformation available doing the job for you. In your case you can use the SQL Join transformation to get the job done. Set it up that it generates code as shown below:

proc sql;

  create view want as

  select

    polisenr,

    period,

    max(OppspartKapital2) as OppspartKapital2

  from have

    group by polisenr, period

  ;

quit;

Code as above will deal with duplicates as well as select the "correct" value if there are "duplicates" with different OppspartKapital2 values (just selecting the highest value).

You will have to ask yourself if it's o.k. that you get such data in first place or if you eventually should implement a Validation transformation somewhere upstream in the process to cleanse the data and to move "wrong" data to Error and Exception tables.

Message was edited by: Patrick Matter

PROC Star
Posts: 7,492

Re: User Written code to only display unique rows (based on two variables)

Posted in reply to TurnTheBacon

I'm not versed in DI studio, thus I'm not sure what your best route is.  However, since you asked, a datastep at the end will do what you ask, namely:

data &_output;

  set data &_output;

  by polisenr periode_yyyymm;

  if last.periode_yyyymm;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 336 views
  • 6 likes
  • 4 in conversation