A User Written transformation in DI Studio uses the following code:
proc sql;
create table work.oppspartkap as
select polisenr, periode_yyyymm, oppspartkapital2 from &_input;
quit;
data &_output;
set work.oppspartkap;
by polisenr notsorted;
if first.polisenr;
run;
After running it, and applying a filter so that it only displays rows with polisenr 1, the work table looks like this:
I need to edit the user written code so that identical rows, such as the three highlighted ones, only appear once. Not sure whether I should use "distinct" somewhere, "group by" somewhere, or something else. Kindly advise me, thanks.
You should only use a User Written Transformation in DI Studio if there is no standard transformation available doing the job for you. In your case you can use the SQL Join transformation to get the job done. Set it up that it generates code as shown below:
proc sql;
create view want as
select
polisenr,
period,
max(OppspartKapital2) as OppspartKapital2
from have
group by polisenr, period
;
quit;
Code as above will deal with duplicates as well as select the "correct" value if there are "duplicates" with different OppspartKapital2 values (just selecting the highest value).
You will have to ask yourself if it's o.k. that you get such data in first place or if you eventually should implement a Validation transformation somewhere upstream in the process to cleanse the data and to move "wrong" data to Error and Exception tables.
Message was edited by: Patrick Matter
You need to sort your data rather than use the notsorted option.
Because the period_yyyymm changes in between those variables SAS considers them different groups.
You can sort either in the proc sql step with an order by or using a select distinct option.
Not sure what you want to get rid of, or if maintaining the order is important (although, since you initially use proc sql, order can't be important).
As such, why not replace:
data &_output;
set work.oppspartkap;
by polisenr notsorted;
if first.polisenr;
run;
with:
proc sort data=work.oppspartkap out=&_output nodupkey;
by polisenr periode_yyyymm oppspartkapital2;
run;
Thanks for the feedback!
Arthur, using your suggested code I get the following output:
As you can see there's some identical sets of polisenr&periode_yyyymm, but with different oppspartkapital2. The last oppspartkapital2 of each set is always correct. So, I think I need to add code that basically says "select last oppspartkapital2 for each distinct polisenr&periode_yyyymm". I'd appreciate some advice on this.
Here's the current code:
proc sql;
create table work.oppspartkap as
select polisenr, periode_yyyymm, oppspartkapital2 from &_input;
quit;
proc sort data=work.oppspartkap out=&_output nodupkey;
by polisenr periode_yyyymm oppspartkapital2;
run;
It doesn't matter to me if I use proc sql or data step.
You should only use a User Written Transformation in DI Studio if there is no standard transformation available doing the job for you. In your case you can use the SQL Join transformation to get the job done. Set it up that it generates code as shown below:
proc sql;
create view want as
select
polisenr,
period,
max(OppspartKapital2) as OppspartKapital2
from have
group by polisenr, period
;
quit;
Code as above will deal with duplicates as well as select the "correct" value if there are "duplicates" with different OppspartKapital2 values (just selecting the highest value).
You will have to ask yourself if it's o.k. that you get such data in first place or if you eventually should implement a Validation transformation somewhere upstream in the process to cleanse the data and to move "wrong" data to Error and Exception tables.
Message was edited by: Patrick Matter
I'm not versed in DI studio, thus I'm not sure what your best route is. However, since you asked, a datastep at the end will do what you ask, namely:
data &_output;
set data &_output;
by polisenr periode_yyyymm;
if last.periode_yyyymm;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.