DATA Step, Macro, Functions and more

Merge

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,040
Accepted Solution

Merge

Hi Team,

I was asked to group the less than or equal to 5 as "other";

so i first did a proc freq of the cohort----the output produced is something like table1

for the table 1 i write the code:

if count<=5 then Div2=Other as the output is shown in Table2;

Now i take the table 2 (after dropping count) and merge it back to cohort table(by Dept  and Div)

In the report i will take Dept and Div2 from and use Group option in the proc report Define statement...and get as shown in table3  or

Do i need to do anything else
????

Table1

Dept      Div       count

A           Div1      150

A          Div2        1

A         Div3          4

A         Div4        200

B        Div1         25     

B        Div2         52

B        Div3          3

B       Div4          5

B       Div5        96

B      Div6         82

B     Div7         100

Table2

Dept    Div               Div2    

A          Div1              Div1    

A          Div2                other      

A          Div3              other        

A         Div4                Div4      

B        Div 1                 Div1       

B      Div2                  Div2       

B       Div3                 other       

B       Div4                 other         

B      Div5                 Div5       

B       Div6                 Div6        

B       Div7                 Div7  

Table3

Dept    Div2    

A          Div1      150    

            other          5         

            Div4      200 

   

B         Div1         25       

           Div2         52     

           other          8       

           Div5         96     

           Div6        82     

            Div7     100

Thanks


Accepted Solutions
Solution
‎12-14-2012 03:16 PM
SAS Super FREQ
Posts: 8,740

Re: Merge

Hi:

  Well, if you use PROC REPORT, you can bypass the PROC FREQ and the need to do a MERGE. It will take 2 PROC REPORT steps, but that is only 2 passes through the data. If you do PROC FREQ, a merge and a REPORT, then you are touching data 3 times. In order to make an example, I had to make some "fake" data for illustration purposes.

  In the program below, step 1 uses PROC REPORT to summarize the "fake" data AND create the DIV2 variable to create an output dataset named WORK.NEWOUT.  No PROC FREQ necessary because the GROUP usage will allow REPORT to summarize the value of COUNT for each DEPT/DIV. Then step 2 just uses WORK.NEWOUT in a PROC REPORT step to produce the final report. No merge necessary. Just send the final dataset to PROC REPORT -- I added a SUMMARIZE in an RBREAK to show you that you can still summarize on COUNT.

cynthia

data testit;
infile datalines;
input Dept $ div $ count;
return;
datalines;
A Div1 75
A Div1 75
A Div2 1
A Div3 2
A Div3 2
A Div4 50
A Div4 15
A Div4 85
A Div4 50
B Div1 15
B Div1 5
B Div1 10
B Div2 26
B Div2 13
B Div2 13
B Div3 1
B Div3 2
B Div4 2
B Div4 3
B Div5 24
B Div5 24
B Div5 48
B Div6 41
B Div6 10
B Div6 21
B Div6 10
B Div7 10
B Div7 90
B Div7 50
B Div7 50
;
run;
   
** This PROC REPORT step makes a DATASET called;
** WORK.NEWOUT with the summary information and the ;
** new column DIV2;
ods listing;
proc report data=testit nowd out=newout;
  title '1) Prepare Data with PROC REPORT';
  column dept  div count div2 ;
  define dept / group;
  define div / group;
  define count / sum;
  define div2 / computed;
  compute div2 / character length=8;
    if count.sum gt 5 then div2 = div;
    else div2 = 'Other';
  endcomp;
run;
ods listing close;
 
ods html file='c:\temp\usereport.html';
 
proc report data=newout nowd;
  title '2) Final Report using WORK.NEWOUT';
  column dept div2 count;
  define dept / group;
  define div2 / group 'Div';
  define count / sum;
  break after dept / summarize;
  compute after dept;
    line ' ';
  endcomp;
run;
ods html close;

View solution in original post


All Replies
Solution
‎12-14-2012 03:16 PM
SAS Super FREQ
Posts: 8,740

Re: Merge

Hi:

  Well, if you use PROC REPORT, you can bypass the PROC FREQ and the need to do a MERGE. It will take 2 PROC REPORT steps, but that is only 2 passes through the data. If you do PROC FREQ, a merge and a REPORT, then you are touching data 3 times. In order to make an example, I had to make some "fake" data for illustration purposes.

  In the program below, step 1 uses PROC REPORT to summarize the "fake" data AND create the DIV2 variable to create an output dataset named WORK.NEWOUT.  No PROC FREQ necessary because the GROUP usage will allow REPORT to summarize the value of COUNT for each DEPT/DIV. Then step 2 just uses WORK.NEWOUT in a PROC REPORT step to produce the final report. No merge necessary. Just send the final dataset to PROC REPORT -- I added a SUMMARIZE in an RBREAK to show you that you can still summarize on COUNT.

cynthia

data testit;
infile datalines;
input Dept $ div $ count;
return;
datalines;
A Div1 75
A Div1 75
A Div2 1
A Div3 2
A Div3 2
A Div4 50
A Div4 15
A Div4 85
A Div4 50
B Div1 15
B Div1 5
B Div1 10
B Div2 26
B Div2 13
B Div2 13
B Div3 1
B Div3 2
B Div4 2
B Div4 3
B Div5 24
B Div5 24
B Div5 48
B Div6 41
B Div6 10
B Div6 21
B Div6 10
B Div7 10
B Div7 90
B Div7 50
B Div7 50
;
run;
   
** This PROC REPORT step makes a DATASET called;
** WORK.NEWOUT with the summary information and the ;
** new column DIV2;
ods listing;
proc report data=testit nowd out=newout;
  title '1) Prepare Data with PROC REPORT';
  column dept  div count div2 ;
  define dept / group;
  define div / group;
  define count / sum;
  define div2 / computed;
  compute div2 / character length=8;
    if count.sum gt 5 then div2 = div;
    else div2 = 'Other';
  endcomp;
run;
ods listing close;
 
ods html file='c:\temp\usereport.html';
 
proc report data=newout nowd;
  title '2) Final Report using WORK.NEWOUT';
  column dept div2 count;
  define dept / group;
  define div2 / group 'Div';
  define count / sum;
  break after dept / summarize;
  compute after dept;
    line ' ';
  endcomp;
run;
ods html close;

Super Contributor
Posts: 1,040

Re: Merge

Hi,

Thanks for the detailed explanation. I appreciate your time

Regards

Super Contributor
Posts: 1,040

Re: Merge

Hi,

I have a couple of questions   here??

I dont have have the Table1 dataset initially.

I did a proc freq to get it!!!

So when we have to bypass it....how to go about???

The testit data you created is very huge with the specifications I provided here

Regards

And Imagine

SAS Super FREQ
Posts: 8,740

Re: Merge

Hi:

  I don't think my WORK.TESTIT is HUGE, but it is "fake" data. You had one observation for each DEPT/DIV combination in TABLE1 and then wanted to create DIV2 to make an "Other" category for the observations where the summary COUNT was LT 5.

  WORK.TESTIT goes into PROC REPORT in step 1. What comes out of PROC REPORT should be very close to your TABLE1. So, whatever dataset you used in order to GET table 1 would be the data= value you would use in STEP1. But then, my PROC REPORT summarizes the data AND creates the DIV2 variable. In your example, you had:

starting data --> PROC FREQ creates TABLE1 --> you add DIV2 to create TABLE2 --> merge makes final table --> final table to PROC REPORT

whereas in my program the process is this:

starting data --> to PROC REPORT1 creates WORK.NEWOUT and, creates DIV2 (in 1 step)--> WORK.NEWOUT to PROC REPORT2

  You did not show your starting data or any of your code. You only described your code and showed TABLE 1/TABLE2. I made my "fake" data by making fake observations to simulate what you got out of PROC FREQ. I did that so my dataset from step 1 (WORK.NEWOUT) would "look" like your TABLE1/TABLE2 combined. For example, your TABLE1 had:

Dept Div count

A    Div1 150
A    Div2   1
A    Div3   4
A    Div4 200
B    Div1  25
B    Div2  52
B    Div3   3
B    Div4   5
B    Div5  96
B    Div6  82
B    Div7 100
 

Then you showed how you created DIV2 and got TABLE2 and then you said you merged back to get the DIV2 data with the TABLE1 data. and THEN you sent the final data to PROC REPORT.

I was trying to simplify your programming approach. So, my WORK.NEWOUT had:
Dept    div     count    div2
A      Div1     150     Div1
A      Div2       1     Other
A      Div3       4     Other
A      Div4     200     Div4
B      Div1      30     Div1
B      Div2      52     Div2
B      Div3       3     Other
B      Div4       5     Other
B      Div5      96     Div5
B      Div6      82     Div6
B      Div7     200     Div7

(and you can see that I got a few obs wrong in my fake data, but it's fake data. But still the Other values for DIV2 did get calculated correctly.) With 1 PROC REPORT step, I bypassed creating your TABLE1 and let PROC REPORT create TABLE2. And no MERGE was needed either.


  The bottom line is that whatever dataset you sent to PROC FREQ initially is the dataset you would use in Step 1. You would not need my "fake" data program with DATALINES at all. You would only need Step1 PROC REPORT and Step 2 PROC REPORT. Step 1 creates WORK.NEWOUT, which is used by Step2. You can change this name or leave it. If you change the name in Step 1, for OUT=, then you have to change the name in Step 2, too.

cynthia

Super Contributor
Posts: 1,040

Re: Merge

Thanks a lot for the help.

My Table1 i got is after a proc freq step.only then the count variable u see in table1 is got

could you twell me how to approach if I dont have the count variable in Table1???

Regards

SAS Super FREQ
Posts: 8,740

Re: Merge

Hi:

  What DOES your data look like -- the data that you send to PROC FREQ? It is hard to program in a vacuum. You didn't show the data that you started with, you only showed TABLE1. The adjustment would not be too hard (Hint: PROC REPORT will let you request the N statistic and you can RENAME the variable N to COUNT), but I don't want to write a program that could potentially change again if the data I make up is different from the data you actually are using.

cynthia

Super Contributor
Posts: 1,040

Re: Merge


Hi,

Sorry for the inconvinience. As you said the Table1 is the outcome of a proc freq and I agree with you on the N statistic.

Thans so much

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 307 views
  • 1 like
  • 2 in conversation