Hi All,
New to SAS EG and SQL, so I appreciate your patience and assistance! So far I've been able to find a solution to every hiccup I've encountered in my learning process except this one. What I'm trying to do is combine all the row values of a certain column into a single column/row when three different ID variables are the same.
Here's the sample data:
Person | Provider | Date | Code |
---|---|---|---|
John | ABC | 1/1/2015 | 12345 |
John | ABC | 1/1/2015 | 34567 |
John | ABC | 1/1/2015 | 45678 |
John | XYZ | 1/1/2015 | 12345 |
Jane | RST | 1/1/2015 | 12345 |
Jane | RST | 2/1/2015 | 45678 |
Jane | RST | 2/1/2015 | 34567 |
And here's what I'm after:
Person | Provider | Date | Combo |
---|---|---|---|
John | ABC | 1/1/2015 | 12345 34567 45678 |
John | XYZ | 1/1/2015 | 12345 |
Jane | RST | 1/1/2015 | 12345 |
Jane | RST | 2/1/2015 | 45678 34567 |
So, essentially, if the Person/Provider/Date are the same in various rows, then I want to combine all the Codes of those rows into one column so there's only one row for each unique Person/Provider/Date possibility. My original data table has many more possibilities than I've listed above, but I hope this provides a general idea on what I'm trying to accomplish.
Any idea on how I can do this via PROC SQL? Or, if it's not possible via PROC SQL (which I've encountered on some other hiccups), any idea on the solution via DATA STEP?
Many thanks in advance!
I would suggest not putting them into a single variable. One issue with a single variable: how big should it be? Do you know the maximum number of codes that will ever be encountered? If not, you might declare the size for the variable combo to be too small to hold the result.
SQL would not be my first choice to attempt such an operation as it really doesn't have a native concept of "previous row" or "next row".
Proc transpose will create a single record for the repeated values with a new variable for each level and doesn't need to know how many you have. A brief example for your data:
proc sort data=have; by person provider date;run;
proc transpose data=have out=want;
by person provider date;
var code;
run;
If you really think you need this one variable:
options missing= " "; /* if your codes are numeric the combo below will have the missing character in the result*/
data finalwant;
set want;
array c col: ;
combo = catx(" ", of c
run;
options missing =".";
What will you actually be doing with the resulting data? Many analysis operations work much better in your current form.
I would suggest not putting them into a single variable. One issue with a single variable: how big should it be? Do you know the maximum number of codes that will ever be encountered? If not, you might declare the size for the variable combo to be too small to hold the result.
SQL would not be my first choice to attempt such an operation as it really doesn't have a native concept of "previous row" or "next row".
Proc transpose will create a single record for the repeated values with a new variable for each level and doesn't need to know how many you have. A brief example for your data:
proc sort data=have; by person provider date;run;
proc transpose data=have out=want;
by person provider date;
var code;
run;
If you really think you need this one variable:
options missing= " "; /* if your codes are numeric the combo below will have the missing character in the result*/
data finalwant;
set want;
array c col: ;
combo = catx(" ", of c
run;
options missing =".";
What will you actually be doing with the resulting data? Many analysis operations work much better in your current form.
Hi there,
I don't know the maximum number of codes that will ever be encountered, so that possibility is dynamic.
Is there a way to declare a large size for the variable combo to try and reduce the possibility of the variable combo of exceeding the variable size?
The resulting data that I'm trying to get at with the "Combo" variable won't be utilized in SAS for any further analysis operations, so I'm not worried about creating a variable I can't utilize in SAS after I create it. The combo variable is part of an Excel report for my customers and they want all of the codes in one column when the person/provider/date is the same.
I used your provide steps and was able to get the final output I was after, thank you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.