BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TurnTheBacon
Fluorite | Level 6

A User Written transformation in DI Studio uses the following code:

proc sql;

  create table work.oppspartkap as

  select polisenr, periode_yyyymm, oppspartkapital2 from &_input;

quit;

data &_output;

  set work.oppspartkap;

  by polisenr notsorted;

   if first.polisenr;

run;

After running it, and applying a filter so that it only displays rows with polisenr 1, the work table looks like this:

I need to edit the user written code so that identical rows, such as the three highlighted ones, only appear once. Not sure whether I should use "distinct" somewhere, "group by" somewhere, or something else. Kindly advise me, thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

You should only use a User Written Transformation in DI Studio if there is no standard transformation available doing the job for you. In your case you can use the SQL Join transformation to get the job done. Set it up that it generates code as shown below:

proc sql;

  create view want as

  select

    polisenr,

    period,

    max(OppspartKapital2) as OppspartKapital2

  from have

    group by polisenr, period

  ;

quit;

Code as above will deal with duplicates as well as select the "correct" value if there are "duplicates" with different OppspartKapital2 values (just selecting the highest value).

You will have to ask yourself if it's o.k. that you get such data in first place or if you eventually should implement a Validation transformation somewhere upstream in the process to cleanse the data and to move "wrong" data to Error and Exception tables.

Message was edited by: Patrick Matter

View solution in original post

5 REPLIES 5
Reeza
Super User

You need to sort your data rather than use the notsorted option.

Because the period_yyyymm changes in between those variables SAS considers them different groups.

You can sort either in the proc sql step with an order by or using a select distinct option.

art297
Opal | Level 21

Not sure what you want to get rid of, or if maintaining the order is important (although, since you initially use proc sql, order can't be important).

As such, why not replace:

data &_output;

  set work.oppspartkap;

  by polisenr notsorted;

   if first.polisenr;

run;

with:

proc sort data=work.oppspartkap out=&_output nodupkey;

      by polisenr periode_yyyymm oppspartkapital2;

run;

TurnTheBacon
Fluorite | Level 6

Thanks for the feedback!

Arthur, using your suggested code I get the following output:

Skjermbilde.PNG

As you can see there's some identical sets of polisenr&periode_yyyymm, but with different oppspartkapital2. The last oppspartkapital2 of each set is always correct. So, I think I need to add code that basically says "select last oppspartkapital2 for each distinct polisenr&periode_yyyymm". I'd appreciate some advice on this.

Here's the current code:

proc sql;

  create table work.oppspartkap as

  select polisenr, periode_yyyymm, oppspartkapital2 from &_input;

quit;

proc sort data=work.oppspartkap out=&_output nodupkey;

      by polisenr periode_yyyymm oppspartkapital2;

run;

It doesn't matter to me if I use proc sql or data step.

Patrick
Opal | Level 21

You should only use a User Written Transformation in DI Studio if there is no standard transformation available doing the job for you. In your case you can use the SQL Join transformation to get the job done. Set it up that it generates code as shown below:

proc sql;

  create view want as

  select

    polisenr,

    period,

    max(OppspartKapital2) as OppspartKapital2

  from have

    group by polisenr, period

  ;

quit;

Code as above will deal with duplicates as well as select the "correct" value if there are "duplicates" with different OppspartKapital2 values (just selecting the highest value).

You will have to ask yourself if it's o.k. that you get such data in first place or if you eventually should implement a Validation transformation somewhere upstream in the process to cleanse the data and to move "wrong" data to Error and Exception tables.

Message was edited by: Patrick Matter

art297
Opal | Level 21

I'm not versed in DI studio, thus I'm not sure what your best route is.  However, since you asked, a datastep at the end will do what you ask, namely:

data &_output;

  set data &_output;

  by polisenr periode_yyyymm;

  if last.periode_yyyymm;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1061 views
  • 6 likes
  • 4 in conversation