Hello,
I have the following data set and need to summarize in the manner of the following code. However, instead of outputting to total for the variable PS, I need the distinct total only.
proc means data=WORK.TEST noprint;
by R TC RN SS SH;
var PS;
output out= TEST1 sum (E)=E_F n(PS)=NB_PS;
run;
RN | E | TC | R | SS | SH | PS |
1 | 707 | 5003 | A02 | SS | 2028106601 | |
1 | 698 | 5003 | A02 | SS | 2028106601 | |
1 | 1119 | 5003 | A02 | SS | 2028106601 | |
1 | 2099 | 5003 | A02 | 2028612941 | ||
1 | 1370 | 5003 | A02 | 2024667991 | ||
1 | 865 | 5003 | A02 | 2028221751 | ||
1 | 3026 | 5003 | A02 | 2028221751 | ||
1 | 1008 | 5003 | A02 | 2025744721 | ||
1 | 604 | 6003 | A09 | SH | 2027282411 | |
1 | 1285 | 6003 | A09 | SH | 2027282411 | |
1 | 2107 | 6003 | A09 | 2025877621 | ||
1 | 854 | 6003 | A09 | 2025250411 | ||
1 | 955 | 6003 | A09 | 2024224891 | ||
1 | 502 | 6003 | A09 | 2022288221 |
So instead of getting the output:
R | TC | RN | SS | SH | E_F | NB_PS |
A02 | 5003 | 1 | SS | 8368 | 3 | |
A02 | 5003 | 1 | 2524 | 5 | ||
A09 | 6003 | 1 | SH | 1889 | 2 | |
A09 | 6003 | 1 | 4418 | 4 |
I need the output:
R | TC | RN | SS | SH | E_F | NB_PS |
A02 | 5003 | 1 | SS | 8368 | 1 | |
A02 | 5003 | 1 | 2524 | 4 | ||
A09 | 6003 | 1 | SH | 1889 | 1 | |
A09 | 6003 | 1 | 4418 | 4 |
Thank you for any help!
Here's the structure for a PROC SQL solution.
Proc sql;
create table want as
select R, TC, RN, SS, SH, /*grouping variables*/
/*aggregate functions*/
sum(e) as e_f,
count(distinct PS) as N_PS
from test
group by R, TC, RN, SS, SH
quit;
Explain your logic please?
Where does RN come from?
Do you need the SUM for all records or only the distinct records and how do you identify the distinct records if this is the case?
Proc means cannot do a distinct count, you can try SQL or a double PROC FREQ instead.
RN is a number assigned to each PS. It can be 1,51,55 etc. In this specific example they are all 1.
I need E_F to be the sum of all E but NB_PS needs to be the count of distinct PS. I am not sure how to get the output I need using SQL or PROC FREQ.
Here's the structure for a PROC SQL solution.
Proc sql;
create table want as
select R, TC, RN, SS, SH, /*grouping variables*/
/*aggregate functions*/
sum(e) as e_f,
count(distinct PS) as N_PS
from test
group by R, TC, RN, SS, SH
quit;
Thank you! This is exactly what I needed.
This is a job for PROC FREQ, with a BY statement.
The first run of PROC FREQ will tell you, for example, that 2028106601 appears 3 times in it's BY group. The second run of PROC FREQ will tell you that there is only one value of NB_PS in that BY group.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.