So I've came across this task with a big dataset, I've make two brief tables to sum the core of this problem. I know if it's a very small table, I can just manually add more columns for each row to list all states for each organization ID and then use TRIM or CATX, but with big data, I couldn't figure out how to do it systematically, since the # of states an organization operates in varies, and the values are stored in different rows of the same column, rather than in different columns of the same row.
What I want to do is to bring States values to a higher level, as an aggregate of all states that this organization operate in.
My input data looks like:
OrganizationID | Product | States |
1 | football | DC |
1 | football | VA |
1 | football | MD |
2 | football | CA |
3 | football | NV |
3 | football | CA |
My desired output data should look like
OrganizationID | Product | States |
1 | football | DC, VA, MD |
2 | football | CA |
3 | football | NV, CA |
There are several options, you can use a data step with BY group processing or you can TRANSPOSE and then use CATX().
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
@LisaYIN9309 wrote:
So I've came across this task with a big dataset, I've make two brief tables to sum the core of this problem. I know if it's a very small table, I can just manually add more columns for each row to list all states for each organization ID and then use TRIM or CATX, but with big data, I couldn't figure out how to do it systematically, since the # of states an organization operates in varies, and the values are stored in different rows of the same column, rather than in different columns of the same row.
What I want to do is to bring States values to a higher level, as an aggregate of all states that this organization operate in.
My input data looks like:
OrganizationID Product States 1 football DC 1 football VA 1 football MD 2 football CA 3 football NV 3 football CA
My desired output data should look like
OrganizationID Product States 1 football DC, VA, MD 2 football CA 3 football NV, CA
There are several options, you can use a data step with BY group processing or you can TRANSPOSE and then use CATX().
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
@LisaYIN9309 wrote:
So I've came across this task with a big dataset, I've make two brief tables to sum the core of this problem. I know if it's a very small table, I can just manually add more columns for each row to list all states for each organization ID and then use TRIM or CATX, but with big data, I couldn't figure out how to do it systematically, since the # of states an organization operates in varies, and the values are stored in different rows of the same column, rather than in different columns of the same row.
What I want to do is to bring States values to a higher level, as an aggregate of all states that this organization operate in.
My input data looks like:
OrganizationID Product States 1 football DC 1 football VA 1 football MD 2 football CA 3 football NV 3 football CA
My desired output data should look like
OrganizationID Product States 1 football DC, VA, MD 2 football CA 3 football NV, CA
Thank you Reeza, this is great!
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.