Hello,
I have the following data set and need to summarize in the manner of the following code. However, instead of outputting to total for the variable PS, I need the distinct total only.
proc means data=WORK.TEST noprint;
by R TC RN SS SH;
var PS;
output out= TEST1 sum (E)=E_F n(PS)=NB_PS;
run;
RN | E | TC | R | SS | SH | PS |
1 | 707 | 5003 | A02 | SS | 2028106601 | |
1 | 698 | 5003 | A02 | SS | 2028106601 | |
1 | 1119 | 5003 | A02 | SS | 2028106601 | |
1 | 2099 | 5003 | A02 | 2028612941 | ||
1 | 1370 | 5003 | A02 | 2024667991 | ||
1 | 865 | 5003 | A02 | 2028221751 | ||
1 | 3026 | 5003 | A02 | 2028221751 | ||
1 | 1008 | 5003 | A02 | 2025744721 | ||
1 | 604 | 6003 | A09 | SH | 2027282411 | |
1 | 1285 | 6003 | A09 | SH | 2027282411 | |
1 | 2107 | 6003 | A09 | 2025877621 | ||
1 | 854 | 6003 | A09 | 2025250411 | ||
1 | 955 | 6003 | A09 | 2024224891 | ||
1 | 502 | 6003 | A09 | 2022288221 |
So instead of getting the output:
R | TC | RN | SS | SH | E_F | NB_PS |
A02 | 5003 | 1 | SS | 8368 | 3 | |
A02 | 5003 | 1 | 2524 | 5 | ||
A09 | 6003 | 1 | SH | 1889 | 2 | |
A09 | 6003 | 1 | 4418 | 4 |
I need the output:
R | TC | RN | SS | SH | E_F | NB_PS |
A02 | 5003 | 1 | SS | 8368 | 1 | |
A02 | 5003 | 1 | 2524 | 4 | ||
A09 | 6003 | 1 | SH | 1889 | 1 | |
A09 | 6003 | 1 | 4418 | 4 |
Thank you for any help!
Here's the structure for a PROC SQL solution.
Proc sql;
create table want as
select R, TC, RN, SS, SH, /*grouping variables*/
/*aggregate functions*/
sum(e) as e_f,
count(distinct PS) as N_PS
from test
group by R, TC, RN, SS, SH
quit;
Explain your logic please?
Where does RN come from?
Do you need the SUM for all records or only the distinct records and how do you identify the distinct records if this is the case?
Proc means cannot do a distinct count, you can try SQL or a double PROC FREQ instead.
RN is a number assigned to each PS. It can be 1,51,55 etc. In this specific example they are all 1.
I need E_F to be the sum of all E but NB_PS needs to be the count of distinct PS. I am not sure how to get the output I need using SQL or PROC FREQ.
Here's the structure for a PROC SQL solution.
Proc sql;
create table want as
select R, TC, RN, SS, SH, /*grouping variables*/
/*aggregate functions*/
sum(e) as e_f,
count(distinct PS) as N_PS
from test
group by R, TC, RN, SS, SH
quit;
Thank you! This is exactly what I needed.
This is a job for PROC FREQ, with a BY statement.
The first run of PROC FREQ will tell you, for example, that 2028106601 appears 3 times in it's BY group. The second run of PROC FREQ will tell you that there is only one value of NB_PS in that BY group.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.