Hi guys,
I have a dataset that I got from joining two tables. Since there are multiple users on a given account, I have duplicate values. I am trying to find a way to code the out put listed below. As you see, I want to concatenate records based on a individual employee id and trans_no, acct_no, acct_name, trans_date,trans_amt along with employee name and his dob. I tried below code, but it's not working out as I want to concatenate duplicate part of the record to non-duplicated record (employee fname,lname,dob).
data have;
input fname $ lname $ trans_no $ emp_id $ acct_no $ acct_name $ trans_date date9. trans_amt dob date9. ;
FORMAT DOB TRANS_DATE DATE9.;
cards;
John Smith 123XER ABCD 12345 JonasLLC 17May2020 1500 02mar1990
Sarah Frost 123XER ABCD 12345 JonasLLC 17May2020 1500 22Jun1995
Noel White 458ABC ABCD 45689 Jonas 11May2020 20000 12may2000
Tim Kerry 458ABC ABCD 45689 Jonas 11May2020 20000 16jun2005
Tom Baker 784TYU ABCD 54321 JonasLLC 17Mar2020 15000 02mar2010
Nick Black 342VCF WXYZ 67890 Chipotle 15May2020 10000 15Jun1978
Mike Meyer 987RTV WXYZ 568471 Macys 18jan2020 45000 14Feb1992
Ronney Black 342VCF WXYZ 67890 Chipotle 15May2020 10000 15Jun2000
Nancy Meyer 987RTV WXYZ 568471 Macys 18jan2020 45000 14Feb2012
;
run;
PROC SORT DATA=HAVE ;
BY emp_id ;
RUN;
data want (keep=total);
LENGTH TOTAL $500;
set have;
by emp_id;
IF FIRST.EMP_ID THEN TOTAL= STRIP(TOTAL)||'|'||STRIP(emp_id)||'|'||STRIP(trans_no)||'|'||STRIP(acct_no)||'|'||STRIP(acct_name)||'|'||STRIP(PUT(trans_date, date9.))||'|'||
STRIP(PUT(trans_amt,8.))||'|'||STRIP(fname)||'|'||STRIP(lname)||'|'||STRIP(PUT(dob,DATE9.))||'|'||'***'||'|' ;
ELSE DELETE;
RUN;
I want the output to look like this:
|ABCD|123XER|12345|JonasLLC|17May2020|1500|***|John|Smith|02mar1990|***|Sarah|Frost|22Jun1995|***
|458ABC|45689|Jonas|11May2020|20000|***|Noel|White|12may2000|***|Tim|Kerry|16jun2005|***
|784TYU|54321|JonasLLC|17Mar2020|15000|Tom|Baker|02mar2010|
|WXYZ|342VCF|67890|Chipotle|15May2020|10000|***|Nick|Black|15Jun1978|***|Ronney|Black|15Jun2000|***
|987RTV|568471|Macys|18jan2020|45000|***|Mike|Meyer|14Feb1992|***|Nancy|Meyer|14Feb2012|
Thanks in advance.
Like this?
proc sort data=HAVE ;
by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
run;
data WANT ;*(keep=TOTAL);
length TOTAL $500;
retain TOTAL ;
set HAVE;
by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
if first.EMP_ID then TOTAL=EMP_ID;
if first.TRANS_AMT then TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),TRANS_AMT,'***');
TOTAL=catx('|',TOTAL,FNAME,LNAME,vvalue(DOB),'***');
if last.EMP_ID then output;
run;
Please use the proper way to insert code (running man icon) and text (</> icon) to avoid formatting issues.
ABCD|123XER|12345|JonasLLC|17MAY2020|1500|Sarah|Frost|22JUN1995|***|John|Smith|02MAR1990|***|458ABC|45689|Jonas|11MAY2020|20000|Noel|White|12MAY2000|***|Tim|Kerry|16JUN2005|***|784TYU|54321|JonasLLC|17MAR2020|15000|Tom|Baker|02MAR2010|*** WXYZ|342VCF|67890|Chipotle|15MAY2020|10000|Nick|Black|15JUN1978|***|Ronney|Black|15JUN2000|***|987RTV|568471|Macys|18JAN2020|45000|Mike|Meyer|14FEB1992|***|Nancy|Meyer|14FEB2012|***
The output would have 2 records for two employee ids.
Thanks.
The logic you want is unclear. Why 5 records output?
Does this help?
data WANT (keep=TOTAL);
length TOTAL $500;
set HAVE;
by EMP_ID;
if first.EMP_ID then TOTAL=EMP_ID;
TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),TRANS_AMT,FNAME,LNAME,vvalue(DOB),'***');
run;
Hi ChrisNZ,
Thanks for responding. There is supposed be two records (as there are only two employee ids). On top of it. I have to separate out repeated data (duplicate, 123XER ABCD 12345 JonasLLC 17May2020 1500 (red color text)) from non duplicate data (John Smith , 02mar1990 and Sarah Frost , 22Jun1995). Take a look at the first two records.
John Smith 123XER ABCD 12345 JonasLLC 17May2020 1500 02mar1990
Sarah Frost 123XER ABCD 12345 JonasLLC 17May2020 1500 22Jun1995
I want to have two employee_ids and each employee id should have repeated part of record concatenated with nonrepeated part of the code.
The highlighted text is repeated in two records at least. The result would look like this. The highlighted text is repeated in two records and rest of the string is not repeated record (ie, fname,lname and date of birth).
Let me know if you need more clarity.
Thanks again.
The output will have two records only( as there are 2 employee ids ABCD and WXYZ). Space on the page is not allowing me to put all of the single employee on the same line. Though it looks 4 lines, but it should have two lines of records.
The output should look like this
|ABCD|123XER|12345|JonasLLC|17May2020|1500|***|John|Smith|02mar1990|***|Sarah|Frost|22Jun1995|***|458ABC|45689|Jonas|11May2020|20000|***|Noel|White|12may2000|***|Tim|Kerry|16jun2005|***|784TYU|54321|JonasLLC|17Mar2020|15000|Tom|Baker|02mar2010|
|WXYZ|342VCF|67890|Chipotle|15May2020|10000|***|Nick|Black|15Jun1978|***|Ronney|Black|15Jun2000|***|987RTV|568471|Macys|18jan2020|45000|***|Mike|Meyer|14Feb1992|***|Nancy|Meyer|14Feb2012|
Like this?
proc sort data=HAVE ;
by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
run;
data WANT ;*(keep=TOTAL);
length TOTAL $500;
retain TOTAL ;
set HAVE;
by EMP_ID TRANS_NO ACCT_NO ACCT_NAME TRANS_DATE TRANS_AMT;
if first.EMP_ID then TOTAL=EMP_ID;
if first.TRANS_AMT then TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),TRANS_AMT,'***');
TOTAL=catx('|',TOTAL,FNAME,LNAME,vvalue(DOB),'***');
if last.EMP_ID then output;
run;
Please use the proper way to insert code (running man icon) and text (</> icon) to avoid formatting issues.
ABCD|123XER|12345|JonasLLC|17MAY2020|1500|Sarah|Frost|22JUN1995|***|John|Smith|02MAR1990|***|458ABC|45689|Jonas|11MAY2020|20000|Noel|White|12MAY2000|***|Tim|Kerry|16JUN2005|***|784TYU|54321|JonasLLC|17MAR2020|15000|Tom|Baker|02MAR2010|*** WXYZ|342VCF|67890|Chipotle|15MAY2020|10000|Nick|Black|15JUN1978|***|Ronney|Black|15JUN2000|***|987RTV|568471|Macys|18JAN2020|45000|Mike|Meyer|14FEB1992|***|Nancy|Meyer|14FEB2012|***
Thanks Chris. This works for me.
Just add one more line to @ChrisNZ code:
data WANT (keep=TOTAL);
length TOTAL $500;
set HAVE;
by EMP_ID;
if first.EMP_ID then TOTAL=EMP_ID;
TOTAL=catx('|',TOTAL,TRANS_NO,ACCT_NO,ACCT_NAME,vvalue(TRANS_DATE),
TRANS_AMT,FNAME,LNAME,vvalue(DOB),'***');
if last.EMP_ID then output; /* avoid multiple lines per emp_id */
run;
Thanks for the help Shmuel.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.