Hi SAS Users,
I have 2 questions.
1. Wanted some suggestions on transpose coding with joining of multiple lines like below example.
example -
data test;
length id $3 run_id $3 run_date $3;
input id run_id run_date ;
cards;
123 ABC 2017/01/01
123 CDE 2017/01/01
;
run;
expected transposed output -
id duplicate_reason
123 ABC|2017/01/01~CDE|2017/01/01
2. this expected transpose output will be passed to another dataset and i have to attatch them more run_ids & run_dates to exisitng duplicate_reason like below example if it is a new value only.
id duplicate_reason
123 ABC|2017/01/01~CDE|2017/01/01~CDE|2017/01/01
can step1 and step2 can be done in a single step of proc transpose ?
Thanks,
Ana
So it sounds like given this input
data have ;
input id run_id $ run_date ;
informat run_date yymmdd10.;
format run_date yymmdd10.;
cards;
123 ABC 2017-03-15
123 XYZ 2017-03-15
123 HSN 2017-03-15
;
You wan to first join it with itself based on the ID variable so that you get all runs combined with all of the other runs for the same id. But you don't want to include the exact duplicates.
proc sql ;
create table full as
select a.*
, case when not (a.run_id=b.run_id and a.run_date=b.run_date) then
catx('|',b.run_id,put(b.run_date,yymmdd10.))
else ' ' end as X length=20
from have a full join have b
on a.id = b.id
order by a.id,a.run_id,a.run_date,b.run_id,b.run_date
;
quit;
Then you need to collapse these back to the original unique rows.
data want ;
do until (last.run_date);
set full;
by id run_id run_date ;
length want $200 ;
want=catx('~',want,x);
end;
drop x;
run;
Likely better to combine the data, remove duplicates and then transpose.
data temp;
set dataone
datatwo;
run;
proc sort data=temp nodupkey;
by id run_id run_date;
run;
then combine the information(transpose). The implication you show that you want multiple run_id/run_date pairs within a single variable makes me cringe. But then I don't know how you are going to use this data.
Thanks for the quick reply Ballard.
But i have all the data in one data set like below.
example -
data test;
length id $3 run_id $3 run_date $3 duplicate_reason $20;
input id run_id run_date duplicate_reason ;
cards;
123 ABC 2017/01/01 XYZ|2016/12/01~HND|2016/11/01
123 CDE 2017/01/01 XYZ|2016/12/01~HND|2016/11/01
;
run;
expected transposed output -
id duplicate_reason
123 XYZ|2016/12/01~HND|2016/11/01~ABC|2017/01/01~CDE|2017/01/01 (add them only if it is not present in the duplicate_reason)
Thanks,
Ana
data test;
length id $3 run_id $3 run_date $10;
input id run_id run_date ;
cards;
123 ABC 2017/01/01
123 CDE 2017/01/01
;
run;
data want;
set test;
by id;
length duplicate_reason $100;
retain duplicate_reason ;
if first.id then do;
call missing(duplicate_reason);
duplicate_reason= catx('|',run_id,run_date);
end;
else duplicate_reason=catx('~',duplicate_reason,trim(run_id)||run_date);
if last.id;
keep id duplicate_reason;
run;
Regards,
Naveen Srinivasan
Thanks for the reply Naveen.
that worked. But i have scenario's where the run_id||run_date combination could be present already. So in those cases i have to ignore them from attaching. How to add tranward or index in that same line where you r attaching the data ?
else duplicate_reason=catx('~',duplicate_reason,trim(run_id)||'|'||run_date);
Thanks,
Ana
Do you mean, you already have the duplicate_reason variable in your dataset? Hmm, i think i am confused with how your data looks like, See if my understanding is right:
/*Another version*/
data want;
set test;
by id;
length temp $100;
retain temp ;
if first.id then do;
call missing(temp);
temp= catx('|',run_id,run_date);
end;
else temp=catx('~',temp,trim(run_id)||run_date);
if last.id and index(duplicate_reason,temp)=0 then duplicate_reason= catx('~',duplicate_reason,temp);
keep id duplicate_reason;
run;
Regards,
Naveen Srinivasan
No, and if you could, it wouldn't be straightforward. You may be able to do it all in a single data step.
Post a larger example and we can help with sample code.
Hi Reeza,
id run_id run_date Duplicate_reason
123 ABC 2017-03-15 DKS|2016-10-21~ XYZ|2017-03-15~ HSN|2017-03-15
123 XYZ 2017-03-15 DKS|2016-10-21~ ABC|2017-03-15~ HSN2017-03-15
123 HSN 2017-03-15 DKS|2016-10-21~ABC|2017-03-15~ XYZ|2017-03-15
data set will be like the above example. & bold letters are the new values that needs to be coded to add to existing duplicate logic , only if they are aready not present.
For the duplicate Reason field logic -
line#1 has to have values from line #2 & line #3 with tilda delimiter.
line#2 has to have values from line #1 & line #3 with tilda delimiter.
line#3 has to have values from line #1 & line #2 with tilda delimiter.
Please let me know if this is not clear.
Thanks,
Ana
What's your logic here?
Edit: You may want to look up 'Cartesian Product' in SAS along with CATX()
🙂 i am still thinking on this and looking for some easier way to get this requirment.
I was thinking of transposing all the fields by id , but it is not working for me, so i posted this on community.
Thanks,
Ana
So it sounds like given this input
data have ;
input id run_id $ run_date ;
informat run_date yymmdd10.;
format run_date yymmdd10.;
cards;
123 ABC 2017-03-15
123 XYZ 2017-03-15
123 HSN 2017-03-15
;
You wan to first join it with itself based on the ID variable so that you get all runs combined with all of the other runs for the same id. But you don't want to include the exact duplicates.
proc sql ;
create table full as
select a.*
, case when not (a.run_id=b.run_id and a.run_date=b.run_date) then
catx('|',b.run_id,put(b.run_date,yymmdd10.))
else ' ' end as X length=20
from have a full join have b
on a.id = b.id
order by a.id,a.run_id,a.run_date,b.run_id,b.run_date
;
quit;
Then you need to collapse these back to the original unique rows.
data want ;
do until (last.run_date);
set full;
by id run_id run_date ;
length want $200 ;
want=catx('~',want,x);
end;
drop x;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.