Hi Team,
I was asked to group the less than or equal to 5 as "other";
so i first did a proc freq of the cohort----the output produced is something like table1
for the table 1 i write the code:
if count<=5 then Div2=Other as the output is shown in Table2;
Now i take the table 2 (after dropping count) and merge it back to cohort table(by Dept and Div)
In the report i will take Dept and Div2 from and use Group option in the proc report Define statement...and get as shown in table3 or
 Do i need to do anything else
????
Table1
Dept Div count
A Div1 150
A Div2 1
A Div3 4
A Div4 200
B Div1 25
B Div2 52
B Div3 3
B Div4 5
B Div5 96
B Div6 82
B Div7 100
Table2
Dept Div Div2
A Div1 Div1
A Div2 other
A Div3 other
A Div4 Div4
B Div 1 Div1
B Div2 Div2
B Div3 other
B Div4 other
B Div5 Div5
B Div6 Div6
B Div7 Div7
Table3
Dept Div2
A Div1 150
other 5
Div4 200
B Div1 25
Div2 52
other 8
Div5 96
Div6 82
Div7 100
Thanks
Hi:
Well, if you use PROC REPORT, you can bypass the PROC FREQ and the need to do a MERGE. It will take 2 PROC REPORT steps, but that is only 2 passes through the data. If you do PROC FREQ, a merge and a REPORT, then you are touching data 3 times. In order to make an example, I had to make some "fake" data for illustration purposes.
In the program below, step 1 uses PROC REPORT to summarize the "fake" data AND create the DIV2 variable to create an output dataset named WORK.NEWOUT. No PROC FREQ necessary because the GROUP usage will allow REPORT to summarize the value of COUNT for each DEPT/DIV. Then step 2 just uses WORK.NEWOUT in a PROC REPORT step to produce the final report. No merge necessary. Just send the final dataset to PROC REPORT -- I added a SUMMARIZE in an RBREAK to show you that you can still summarize on COUNT.
cynthia
data testit;
infile datalines;
input Dept $ div $ count;
return;
datalines;
A Div1 75
A Div1 75
A Div2 1
A Div3 2
A Div3 2
A Div4 50
A Div4 15
A Div4 85
A Div4 50
B Div1 15 
B Div1 5 
B Div1 10 
B Div2 26
B Div2 13
B Div2 13
B Div3 1
B Div3 2
B Div4 2
B Div4 3
B Div5 24
B Div5 24
B Div5 48
B Div6 41
B Div6 10
B Div6 21
B Div6 10
B Div7 10 
B Div7 90
B Div7 50
B Div7 50
;
run;
    
** This PROC REPORT step makes a DATASET called;
** WORK.NEWOUT with the summary information and the ;
** new column DIV2;
ods listing;
proc report data=testit nowd out=newout;
  title '1) Prepare Data with PROC REPORT';
  column dept  div count div2 ;
  define dept / group;
  define div / group;
  define count / sum;
  define div2 / computed;
  compute div2 / character length=8;
    if count.sum gt 5 then div2 = div;
    else div2 = 'Other';
  endcomp;
run;
ods listing close;
  
ods html file='c:\temp\usereport.html';
  
proc report data=newout nowd;
  title '2) Final Report using WORK.NEWOUT';
  column dept div2 count;
  define dept / group;
  define div2 / group 'Div';
  define count / sum;
  break after dept / summarize;
  compute after dept;
    line ' ';
  endcomp;
run;
ods html close;
Hi:
Well, if you use PROC REPORT, you can bypass the PROC FREQ and the need to do a MERGE. It will take 2 PROC REPORT steps, but that is only 2 passes through the data. If you do PROC FREQ, a merge and a REPORT, then you are touching data 3 times. In order to make an example, I had to make some "fake" data for illustration purposes.
In the program below, step 1 uses PROC REPORT to summarize the "fake" data AND create the DIV2 variable to create an output dataset named WORK.NEWOUT. No PROC FREQ necessary because the GROUP usage will allow REPORT to summarize the value of COUNT for each DEPT/DIV. Then step 2 just uses WORK.NEWOUT in a PROC REPORT step to produce the final report. No merge necessary. Just send the final dataset to PROC REPORT -- I added a SUMMARIZE in an RBREAK to show you that you can still summarize on COUNT.
cynthia
data testit;
infile datalines;
input Dept $ div $ count;
return;
datalines;
A Div1 75
A Div1 75
A Div2 1
A Div3 2
A Div3 2
A Div4 50
A Div4 15
A Div4 85
A Div4 50
B Div1 15 
B Div1 5 
B Div1 10 
B Div2 26
B Div2 13
B Div2 13
B Div3 1
B Div3 2
B Div4 2
B Div4 3
B Div5 24
B Div5 24
B Div5 48
B Div6 41
B Div6 10
B Div6 21
B Div6 10
B Div7 10 
B Div7 90
B Div7 50
B Div7 50
;
run;
    
** This PROC REPORT step makes a DATASET called;
** WORK.NEWOUT with the summary information and the ;
** new column DIV2;
ods listing;
proc report data=testit nowd out=newout;
  title '1) Prepare Data with PROC REPORT';
  column dept  div count div2 ;
  define dept / group;
  define div / group;
  define count / sum;
  define div2 / computed;
  compute div2 / character length=8;
    if count.sum gt 5 then div2 = div;
    else div2 = 'Other';
  endcomp;
run;
ods listing close;
  
ods html file='c:\temp\usereport.html';
  
proc report data=newout nowd;
  title '2) Final Report using WORK.NEWOUT';
  column dept div2 count;
  define dept / group;
  define div2 / group 'Div';
  define count / sum;
  break after dept / summarize;
  compute after dept;
    line ' ';
  endcomp;
run;
ods html close;
Hi,
Thanks for the detailed explanation. I appreciate your time
Regards
Hi,
I have a couple of questions here??
I dont have have the Table1 dataset initially.
I did a proc freq to get it!!!
So when we have to bypass it....how to go about???
The testit data you created is very huge with the specifications I provided here
Regards
And Imagine
Hi:
I don't think my WORK.TESTIT is HUGE, but it is "fake" data. You had one observation for each DEPT/DIV combination in TABLE1 and then wanted to create DIV2 to make an "Other" category for the observations where the summary COUNT was LT 5.
WORK.TESTIT goes into PROC REPORT in step 1. What comes out of PROC REPORT should be very close to your TABLE1. So, whatever dataset you used in order to GET table 1 would be the data= value you would use in STEP1. But then, my PROC REPORT summarizes the data AND creates the DIV2 variable. In your example, you had:
starting data --> PROC FREQ creates TABLE1 --> you add DIV2 to create TABLE2 --> merge makes final table --> final table to PROC REPORT
whereas in my program the process is this:
starting data --> to PROC REPORT1 creates WORK.NEWOUT and, creates DIV2 (in 1 step)--> WORK.NEWOUT to PROC REPORT2
You did not show your starting data or any of your code. You only described your code and showed TABLE 1/TABLE2. I made my "fake" data by making fake observations to simulate what you got out of PROC FREQ. I did that so my dataset from step 1 (WORK.NEWOUT) would "look" like your TABLE1/TABLE2 combined. For example, your TABLE1 had:
Dept Div count
A    Div1 150
A    Div2   1
A    Div3   4
A    Div4 200
B    Div1  25 
B    Div2  52
B    Div3   3
B    Div4   5
B    Div5  96
B    Div6  82
B    Div7 100
  
Then you showed how you created DIV2 and got TABLE2 and then you said you merged back to get the DIV2 data with the TABLE1 data. and THEN you sent the final data to PROC REPORT.
I was trying to simplify your programming approach. So, my WORK.NEWOUT had:
Dept    div     count    div2
 A      Div1     150     Div1
 A      Div2       1     Other
 A      Div3       4     Other
 A      Div4     200     Div4
 B      Div1      30     Div1
 B      Div2      52     Div2
 B      Div3       3     Other
 B      Div4       5     Other
 B      Div5      96     Div5
 B      Div6      82     Div6
 B      Div7     200     Div7
(and you can see that I got a few obs wrong in my fake data, but it's fake data. But still the Other values for DIV2 did get calculated correctly.) With 1 PROC REPORT step, I bypassed creating your TABLE1 and let PROC REPORT create TABLE2. And no MERGE was needed either.
  The bottom line is that whatever dataset you sent to PROC FREQ initially is the dataset you would use in Step 1. You would not need my "fake" data program with DATALINES at all. You would only need Step1 PROC REPORT and Step 2 PROC REPORT. Step 1 creates WORK.NEWOUT, which is used by Step2. You can change this name or leave it. If you change the name in Step 1, for OUT=, then you have to change the name in Step 2, too.
cynthia
Thanks a lot for the help.
My Table1 i got is after a proc freq step.only then the count variable u see in table1 is got
could you twell me how to approach if I dont have the count variable in Table1???
Regards
Hi:
What DOES your data look like -- the data that you send to PROC FREQ? It is hard to program in a vacuum. You didn't show the data that you started with, you only showed TABLE1. The adjustment would not be too hard (Hint: PROC REPORT will let you request the N statistic and you can RENAME the variable N to COUNT), but I don't want to write a program that could potentially change again if the data I make up is different from the data you actually are using.
cynthia
Hi,
Sorry for the inconvinience. As you said the Table1 is the outcome of a proc freq and I agree with you on the N statistic.
Thans so much
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
