BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
buddha_d
Pyrite | Level 9

Hi guys,

         I have a dataset that I got from joining two tables. Since there are multiple users on a given account, I have duplicate values. I am trying to find a way to code the out put listed below. As you see, I want to concatenate records based on a individual employee id and trans_no, acct_no, acct_name, trans_date,trans_amt along with employee name and his dob. I tried below code, but it's not working out as I want to concatenate duplicate part of the record to non-duplicated record (employee fname,lname,dob).

data have;
	input fname $ lname $ trans_no $  emp_id $ acct_no $ acct_name $ trans_date date9. trans_amt dob date9. ;
	FORMAT DOB TRANS_DATE DATE9.;
cards;
John Smith 123XER ABCD 12345 JonasLLC 17May2020 1500 02mar1990
Sarah Frost 123XER ABCD 12345 JonasLLC 17May2020 1500 22Jun1995
Noel White 458ABC ABCD 45689 Jonas 11May2020 20000 12may2000
Tim Kerry 458ABC ABCD 45689 Jonas 11May2020 20000 16jun2005
Tom Baker 784TYU ABCD 54321 JonasLLC 17Mar2020 15000 02mar2010
Nick Black 342VCF WXYZ 67890 Chipotle 15May2020 10000 15Jun1978
Mike Meyer 987RTV WXYZ 568471 Macys 18jan2020 45000 14Feb1992
Ronney Black 342VCF WXYZ 67890 Chipotle 15May2020 10000 15Jun2000
Nancy Meyer 987RTV WXYZ 568471 Macys 18jan2020 45000 14Feb2012
;
run;

PROC SORT DATA=HAVE ;
	BY emp_id ;
RUN;

data want (keep=total);
	LENGTH TOTAL $500;
	set have;
	by emp_id;
	IF FIRST.EMP_ID THEN TOTAL= STRIP(TOTAL)||'|'||STRIP(emp_id)||'|'||STRIP(trans_no)||'|'||STRIP(acct_no)||'|'||STRIP(acct_name)||'|'||STRIP(PUT(trans_date, date9.))||'|'||
		 STRIP(PUT(trans_amt,8.))||'|'||STRIP(fname)||'|'||STRIP(lname)||'|'||STRIP(PUT(dob,DATE9.))||'|'||'***'||'|' ;
	ELSE DELETE;

RUN;

I want the output to look like this:

|ABCD|123XER|12345|JonasLLC|17May2020|1500|***|John|Smith|02mar1990|***|Sarah|Frost|22Jun1995|***
|458ABC|45689|Jonas|11May2020|20000|***|Noel|White|12may2000|***|Tim|Kerry|16jun2005|***
|784TYU|54321|JonasLLC|17Mar2020|15000|Tom|Baker|02mar2010|
|WXYZ|342VCF|67890|Chipotle|15May2020|10000|***|Nick|Black|15Jun1978|***|Ronney|Black|15Jun2000|***
|987RTV|568471|Macys|18jan2020|45000|***|Mike|Meyer|14Feb1992|***|Nancy|Meyer|14Feb2012|

 

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?


proc sort data=HAVE ;
	by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
run;

data WANT ;*(keep=TOTAL);
  length TOTAL $500;
  retain TOTAL ;
  set HAVE;
	by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
  if first.EMP_ID then TOTAL=EMP_ID;
  if first.TRANS_AMT then TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),TRANS_AMT,'***');
  TOTAL=catx('|',TOTAL,FNAME,LNAME,vvalue(DOB),'***');
  if last.EMP_ID then output;
run;

Please use the proper way to insert code (running man icon) and text (</> icon) to avoid formatting issues.

ABCD|123XER|12345|JonasLLC|17MAY2020|1500|Sarah|Frost|22JUN1995|***|John|Smith|02MAR1990|***|458ABC|45689|Jonas|11MAY2020|20000|Noel|White|12MAY2000|***|Tim|Kerry|16JUN2005|***|784TYU|54321|JonasLLC|17MAR2020|15000|Tom|Baker|02MAR2010|***
WXYZ|342VCF|67890|Chipotle|15MAY2020|10000|Nick|Black|15JUN1978|***|Ronney|Black|15JUN2000|***|987RTV|568471|Macys|18JAN2020|45000|Mike|Meyer|14FEB1992|***|Nancy|Meyer|14FEB2012|***

 

 

View solution in original post

8 REPLIES 8
buddha_d
Pyrite | Level 9

The output would have 2 records for two employee ids. 

Thanks.

ChrisNZ
Tourmaline | Level 20

The logic you want is unclear. Why 5 records output?

Does this help?

data WANT (keep=TOTAL);
  length TOTAL $500;
  set HAVE;
  by EMP_ID;
  if first.EMP_ID then TOTAL=EMP_ID;
  TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),TRANS_AMT,FNAME,LNAME,vvalue(DOB),'***');
run;
buddha_d
Pyrite | Level 9

Hi ChrisNZ, 

              Thanks for responding. There is supposed be two records (as there are only two employee ids). On top of it. I have to separate out repeated data (duplicate, 123XER ABCD 12345 JonasLLC 17May2020 1500 (red color text)) from non duplicate data (John Smith , 02mar1990  and Sarah Frost , 22Jun1995). Take a look at the first two records.

 

John Smith 123XER ABCD 12345 JonasLLC 17May2020 1500 02mar1990
Sarah Frost 123XER ABCD 12345 JonasLLC 17May2020 1500 22Jun1995 

 

I want to have two employee_ids and each employee id should have repeated part of record concatenated with nonrepeated part of the code. 

 

nar.JPGThe highlighted text is repeated in two records at least. The result would look like this. The highlighted text is repeated in two records and rest of the string is not repeated record (ie, fname,lname and date of birth). 

nar1.JPG Let me know if you need more clarity.

Thanks again. 

buddha_d
Pyrite | Level 9

The output will have two records only( as there are 2 employee ids ABCD and WXYZ).  Space on the page is not allowing me to put all of the single employee on the same line. Though it looks 4 lines, but it should have two lines of records.

The output should look like this

|ABCD|123XER|12345|JonasLLC|17May2020|1500|***|John|Smith|02mar1990|***|Sarah|Frost|22Jun1995|***|458ABC|45689|Jonas|11May2020|20000|***|Noel|White|12may2000|***|Tim|Kerry|16jun2005|***|784TYU|54321|JonasLLC|17Mar2020|15000|Tom|Baker|02mar2010|
|WXYZ|342VCF|67890|Chipotle|15May2020|10000|***|Nick|Black|15Jun1978|***|Ronney|Black|15Jun2000|***|987RTV|568471|Macys|18jan2020|45000|***|Mike|Meyer|14Feb1992|***|Nancy|Meyer|14Feb2012|

ChrisNZ
Tourmaline | Level 20

Like this?


proc sort data=HAVE ;
	by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
run;

data WANT ;*(keep=TOTAL);
  length TOTAL $500;
  retain TOTAL ;
  set HAVE;
	by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
  if first.EMP_ID then TOTAL=EMP_ID;
  if first.TRANS_AMT then TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),TRANS_AMT,'***');
  TOTAL=catx('|',TOTAL,FNAME,LNAME,vvalue(DOB),'***');
  if last.EMP_ID then output;
run;

Please use the proper way to insert code (running man icon) and text (</> icon) to avoid formatting issues.

ABCD|123XER|12345|JonasLLC|17MAY2020|1500|Sarah|Frost|22JUN1995|***|John|Smith|02MAR1990|***|458ABC|45689|Jonas|11MAY2020|20000|Noel|White|12MAY2000|***|Tim|Kerry|16JUN2005|***|784TYU|54321|JonasLLC|17MAR2020|15000|Tom|Baker|02MAR2010|***
WXYZ|342VCF|67890|Chipotle|15MAY2020|10000|Nick|Black|15JUN1978|***|Ronney|Black|15JUN2000|***|987RTV|568471|Macys|18JAN2020|45000|Mike|Meyer|14FEB1992|***|Nancy|Meyer|14FEB2012|***

 

 

buddha_d
Pyrite | Level 9

Thanks Chris. This works for me. 

Shmuel
Garnet | Level 18

Just add one more line to @ChrisNZ code:

 

data WANT (keep=TOTAL);
  length TOTAL $500;
  set HAVE;
  by EMP_ID;
  if first.EMP_ID then TOTAL=EMP_ID;
  TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),
             TRANS_AMT,FNAME,LNAME,vvalue(DOB),'***');
  if last.EMP_ID then output;  /* avoid multiple lines per emp_id */
run;
buddha_d
Pyrite | Level 9

Thanks for the help Shmuel.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1787 views
  • 0 likes
  • 3 in conversation