BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASAna
Quartz | Level 8

Hi SAS Users,

 

I have 2 questions.

 

1.  Wanted some suggestions on  transpose coding with joining of multiple lines like below example.

 

example - 

data test;

length id $3  run_id  $3     run_date $3;
input      id         run_id      run_date ;
cards;
123           ABC       2017/01/01

123           CDE       2017/01/01
;
run;

 

expected transposed output -

 

id                duplicate_reason

123           ABC|2017/01/01~CDE|2017/01/01

 

 

2.    this expected transpose output will be passed to another dataset and i have to attatch them more run_ids & run_dates to exisitng duplicate_reason like below example if it is a new value only.

 

id                duplicate_reason

123           ABC|2017/01/01~CDE|2017/01/01~CDE|2017/01/01

 

can step1 and step2 can be done in a single step of proc transpose ?

 

Thanks,

Ana

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So it sounds like given this input

data have ;
  input id run_id $ run_date ;
  informat run_date yymmdd10.;
  format run_date yymmdd10.;
cards;
123 ABC 2017-03-15
123 XYZ 2017-03-15
123 HSN 2017-03-15
;

You wan to first join it with itself based on the ID variable so that you get all runs combined with all of the other runs for the same id. But you don't want to include the exact duplicates.

proc sql ;
 create table full as
   select a.*
        , case when not (a.run_id=b.run_id and a.run_date=b.run_date) then
          catx('|',b.run_id,put(b.run_date,yymmdd10.))
          else ' ' end as X length=20
   from have a full join  have b
   on a.id = b.id
   order by a.id,a.run_id,a.run_date,b.run_id,b.run_date
 ;
quit;

Then you need to collapse these back to the original unique rows.

data want ;
 do until (last.run_date);
   set full;
   by id run_id run_date ;
   length want $200 ;
   want=catx('~',want,x);
 end;
 drop x;
run;

View solution in original post

12 REPLIES 12
ballardw
Super User

Likely better to combine the data, remove duplicates and then transpose.

 

data temp;

   set dataone

         datatwo;

run;

 

proc sort data=temp nodupkey;

   by id run_id run_date;

run;

 

then combine the information(transpose). The implication you show that you want multiple run_id/run_date pairs within a single variable makes me cringe. But then I don't know how you are going to use this data.

SASAna
Quartz | Level 8

Thanks for the quick reply Ballard.

 

But i have all the data in one data set like below.

 

example - 

data test;

length id $3  run_id  $3     run_date $3 duplicate_reason $20;
input      id         run_id      run_date    duplicate_reason ;
cards;
123           ABC       2017/01/01          XYZ|2016/12/01~HND|2016/11/01

123           CDE       2017/01/01          XYZ|2016/12/01~HND|2016/11/01
;
run;

 

expected transposed output -

 

id                duplicate_reason

123           XYZ|2016/12/01~HND|2016/11/01~ABC|2017/01/01~CDE|2017/01/01   (add them only if it is not present in the duplicate_reason)

 

Thanks,

Ana

novinosrin
Tourmaline | Level 20

data test;
length id $3 run_id $3 run_date $10;
input id run_id run_date ;
cards;
123 ABC 2017/01/01
123 CDE 2017/01/01
;
run;

 

data want;
set test;
by id;
length duplicate_reason $100;
retain duplicate_reason ;
if first.id then do;
call missing(duplicate_reason);
duplicate_reason= catx('|',run_id,run_date);
end;
else duplicate_reason=catx('~',duplicate_reason,trim(run_id)||run_date);
if last.id;
keep id duplicate_reason;
run;

 

Regards,

Naveen Srinivasan

SASAna
Quartz | Level 8

Thanks for the reply Naveen.

 

that worked. But i have scenario's where the run_id||run_date combination could be present already. So in those cases i have to ignore them from attaching. How to add tranward or index in that same line where you r attaching the data ?

 

else duplicate_reason=catx('~',duplicate_reason,trim(run_id)||'|'||run_date);

 

Thanks,

Ana

novinosrin
Tourmaline | Level 20

Do you mean, you already have the duplicate_reason variable in your dataset? Hmm, i think i am confused with how your data looks like, See if my understanding is right:

 

/*Another version*/

data want;
set test;
by id;
length temp $100;
retain temp ;
if first.id then do;
call missing(temp);
temp= catx('|',run_id,run_date);
end;
else temp=catx('~',temp,trim(run_id)||run_date);
if last.id and index(duplicate_reason,temp)=0 then duplicate_reason= catx('~',duplicate_reason,temp);
keep id duplicate_reason;
run;

 

Regards,

Naveen Srinivasan

SASAna
Quartz | Level 8
Thanks for the suggestion Naveen. My requirement is to shuffle the values across the id's .Sorry i was not clear on my asking. per your solution first line will not get the value of other lines run_ids & run_dates .
Reeza
Super User

No, and if you could, it wouldn't be straightforward. You may be able to do it all in a single data step. 

 

Post a larger example and we can help with sample code. 

SASAna
Quartz | Level 8

Hi Reeza,

 

id     run_id              run_date                                                                         Duplicate_reason

123    ABC     2017-03-15                               DKS|2016-10-21~ XYZ|2017-03-15~ HSN|2017-03-15    

123   XYZ      2017-03-15                               DKS|2016-10-21~ ABC|2017-03-15~ HSN2017-03-15           

123   HSN     2017-03-15                                DKS|2016-10-21~ABC|2017-03-15~ XYZ|2017-03-15

 

data set will be like the above example. & bold letters are the new values that needs to be coded to add to existing duplicate logic , only if they are aready not present.

 

For the duplicate Reason field logic - 

 

line#1 has to have values  from line #2 & line #3 with tilda delimiter.

line#2 has to have values from line #1 & line #3 with tilda delimiter.

line#3 has to have values from line #1 & line #2 with tilda delimiter.

 

Please let me know if this is not clear.

 

Thanks,

Ana

Reeza
Super User

What's your logic here? 

 

Edit: You may want to look up 'Cartesian Product' in SAS along with CATX()

SASAna
Quartz | Level 8

🙂 i am still thinking on this and looking  for some easier way to get this requirment.

 

I was thinking of transposing all the fields  by id , but it is not working for me, so i posted this on community.

 

Thanks,

Ana

Tom
Super User Tom
Super User

So it sounds like given this input

data have ;
  input id run_id $ run_date ;
  informat run_date yymmdd10.;
  format run_date yymmdd10.;
cards;
123 ABC 2017-03-15
123 XYZ 2017-03-15
123 HSN 2017-03-15
;

You wan to first join it with itself based on the ID variable so that you get all runs combined with all of the other runs for the same id. But you don't want to include the exact duplicates.

proc sql ;
 create table full as
   select a.*
        , case when not (a.run_id=b.run_id and a.run_date=b.run_date) then
          catx('|',b.run_id,put(b.run_date,yymmdd10.))
          else ' ' end as X length=20
   from have a full join  have b
   on a.id = b.id
   order by a.id,a.run_id,a.run_date,b.run_id,b.run_date
 ;
quit;

Then you need to collapse these back to the original unique rows.

data want ;
 do until (last.run_date);
   set full;
   by id run_id run_date ;
   length want $200 ;
   want=catx('~',want,x);
 end;
 drop x;
run;
SASAna
Quartz | Level 8
Thanks Tom. It worked like charm. Thanks for helping.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3734 views
  • 2 likes
  • 5 in conversation