So I've came across this task with a big dataset, I've make two brief tables to sum the core of this problem. I know if it's a very small table, I can just manually add more columns for each row to list all states for each organization ID and then use TRIM or CATX, but with big data, I couldn't figure out how to do it systematically, since the # of states an organization operates in varies, and the values are stored in different rows of the same column, rather than in different columns of the same row.
What I want to do is to bring States values to a higher level, as an aggregate of all states that this organization operate in.
My input data looks like:
OrganizationID | Product | States |
1 | football | DC |
1 | football | VA |
1 | football | MD |
2 | football | CA |
3 | football | NV |
3 | football | CA |
My desired output data should look like
OrganizationID | Product | States |
1 | football | DC, VA, MD |
2 | football | CA |
3 | football | NV, CA |
There are several options, you can use a data step with BY group processing or you can TRANSPOSE and then use CATX().
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
@LisaYIN9309 wrote:
So I've came across this task with a big dataset, I've make two brief tables to sum the core of this problem. I know if it's a very small table, I can just manually add more columns for each row to list all states for each organization ID and then use TRIM or CATX, but with big data, I couldn't figure out how to do it systematically, since the # of states an organization operates in varies, and the values are stored in different rows of the same column, rather than in different columns of the same row.
What I want to do is to bring States values to a higher level, as an aggregate of all states that this organization operate in.
My input data looks like:
OrganizationID Product States 1 football DC 1 football VA 1 football MD 2 football CA 3 football NV 3 football CA
My desired output data should look like
OrganizationID Product States 1 football DC, VA, MD 2 football CA 3 football NV, CA
There are several options, you can use a data step with BY group processing or you can TRANSPOSE and then use CATX().
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
@LisaYIN9309 wrote:
So I've came across this task with a big dataset, I've make two brief tables to sum the core of this problem. I know if it's a very small table, I can just manually add more columns for each row to list all states for each organization ID and then use TRIM or CATX, but with big data, I couldn't figure out how to do it systematically, since the # of states an organization operates in varies, and the values are stored in different rows of the same column, rather than in different columns of the same row.
What I want to do is to bring States values to a higher level, as an aggregate of all states that this organization operate in.
My input data looks like:
OrganizationID Product States 1 football DC 1 football VA 1 football MD 2 football CA 3 football NV 3 football CA
My desired output data should look like
OrganizationID Product States 1 football DC, VA, MD 2 football CA 3 football NV, CA
Thank you Reeza, this is great!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.