DATA Step, Macro, Functions and more

summarize data

Accepted Solution Solved
Reply
Contributor
Posts: 37
Accepted Solution

summarize data

Hi i have 3 tables called one two and three (in realtime i have around 40 to 50 different tables)

 

i have to summarize these tables based on the ID as the key variable , like ID 3 is in all the table with different dept name, so in the final summarized table the dept for Id 3 should be all the 3 dept separated by a comma (,) 

 

data i have below 

 

data one;
input id dept$;
cards;
1 del
2 mum
3 bang
;
run;

data two;
input id dept$;
cards;
3 bbsr
4 cal
;
run;

data three;
input id dept$;
cards;
5 pun
3 hyd
6 che
;
run;

 

data i want :

 

iddept
1del
2mum
3bang ,Hyd ,bbsr
4cal
5pun
6che

 


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 6,524

Re: summarize data

Posted in reply to soham_sas

That would happen if you leave out the statement:

 

if last.id;

 

If you do have further issues, be sure to post the log.

View solution in original post


All Replies
PROC Star
Posts: 1,287

Re: summarize data

Posted in reply to soham_sas

since no duplicates, 

1. do a one to one merge 

2. catx with delim

3. keep the vars you want in final and you are done

Super User
Posts: 22,820

Re: summarize data

Posted in reply to soham_sas

Here's an illustration of two different methods:

 

*create sample data for demonstration;
data have;
    infile cards dlm='09'x;
    input OrgID Product $   States $;
    cards;
1   football    DC
1   football    VA
1   football    MD
2   football    CA
3   football    NV
3   football    CA
;
run;

*Sort - required for both options;
proc sort data=have;
    by orgID;
run;

**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
    set have;
    by orgID;
    length combined $100.;
    retain combined;

    if first.orgID then
        combined=states;
    else
        combined=catx(', ', combined, states);

    if last.orgID then
        output;
run;

**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
    by orgID;
    var states;
run;

data want_option2;
    set wide;
    length combined $100.;
    combined=catx(', ', of state_:);
run;

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

 


@soham_sas wrote:

Hi i have 3 tables called one two and three (in realtime i have around 40 to 50 different tables)

 

i have to summarize these tables based on the ID as the key variable , like ID 3 is in all the table with different dept name, so in the final summarized table the dept for Id 3 should be all the 3 dept separated by a comma (,) 

 

data i have below 

 

data one;
input id dept$;
cards;
1 del
2 mum
3 bang
;
run;

data two;
input id dept$;
cards;
3 bbsr
4 cal
;
run;

data three;
input id dept$;
cards;
5 pun
3 hyd
6 che
;
run;

 

data i want :

 

id dept
1 del
2 mum
3 bang ,Hyd ,bbsr
4 cal
5 pun
6 che

 


 

Super User
Posts: 6,524

Re: summarize data

Posted in reply to soham_sas

First, sort all three data sets by ID. 

 

I'm not sure why you want Hyd as the middle value instead of the last value (or if that is just a mistake).

 

After sorting, you could use:

 

data want;

set one two three;

by id;

length all_depts $ 100;

if first.id then all_depts = dept;

else all_depts = catx(', ', all_depts, dept);

if last.id;

drop dept;

run;

 

Notice that you need a new, longer variable to hold the full set of characters.  If you want to, you could add this as the final statement (before the run statement):

 

rename all_depts = dept;

 

If you really want Hyd as the middle value you would need to change the SET statement:

 

set one three two;

 

Finally, note that this is probably a bad idea.  The vast majority of the time it will be easier to use the data in its original form.

Contributor
Posts: 37

Re: summarize data

Posted in reply to Astounding

@Astounding i tried the code , its giving the output like below , but we need the 5th observation only for Id 3 (i.e 3 bang,bbsr,hyd)

 

please suggest 

 

also as there are many datasets so i am appendeing all the datsets , then sort it by Id and then i am applying  the code 

 

1 del
2 mum
3 bang
3 bang,bbsr
3 bang,bbsr,hyd
4 cal
5 pun
6 che

Solution
3 weeks ago
Super User
Posts: 6,524

Re: summarize data

Posted in reply to soham_sas

That would happen if you leave out the statement:

 

if last.id;

 

If you do have further issues, be sure to post the log.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 130 views
  • 0 likes
  • 4 in conversation