Hello,
I have the original dataset.
Now I want to generate the report as
So I guess that my steps would be:
But I am stuck on the first two steps.
My poor code:
proc means data=myfile.old mean min max;
class Lead_Studio_Name Genre;
output out=myfile.mean_result;
run;
proc transpose data=myfile.mean_result;
out=transpose_result;
by Lead_Studio_Name;
var Genre;
run;
I know it is wrong, but I need advice.
Thanks.
Well you've laid out your steps in broad terms but it might be worth fleshing it out to help figure out what you need.
1. Calculate mean, min and max for rotten tomatoes and audience scored
2. Compute variable that is of the form (MIN#, MAX #)
3. Since mean is in same column as min/max it has to be a character field, so convert mean to character as well.
4. Now transpose, a data step with explicit output is probably better.
5. Calculate counts by genre, PROC freq
6. Transpose results from #5 via proc transpose
7. Merge it in with data from #4
8. Add conditional highlights with proc report.
Heres an example of 1-4:
proc means data= have noprint NWAY;
class studio ;
var rotten user;
output out=step1 min= rotten_min user_min max=rotten_max user_max mean=rotten_mean user_mean;
run;
data step2
set step1;
Length Stat $12;
Stat='Mean';
rt_score = put(rotten_mean, 8.1);
user_score = put(user_mean, 8.1);
output;
Stat='(Min, Max)';
rt_score = catx(', ', rotten_min, rotten_max);
user_score=...;
output;
run;
Well you've laid out your steps in broad terms but it might be worth fleshing it out to help figure out what you need.
1. Calculate mean, min and max for rotten tomatoes and audience scored
2. Compute variable that is of the form (MIN#, MAX #)
3. Since mean is in same column as min/max it has to be a character field, so convert mean to character as well.
4. Now transpose, a data step with explicit output is probably better.
5. Calculate counts by genre, PROC freq
6. Transpose results from #5 via proc transpose
7. Merge it in with data from #4
8. Add conditional highlights with proc report.
Heres an example of 1-4:
proc means data= have noprint NWAY;
class studio ;
var rotten user;
output out=step1 min= rotten_min user_min max=rotten_max user_max mean=rotten_mean user_mean;
run;
data step2
set step1;
Length Stat $12;
Stat='Mean';
rt_score = put(rotten_mean, 8.1);
user_score = put(user_mean, 8.1);
output;
Stat='(Min, Max)';
rt_score = catx(', ', rotten_min, rotten_max);
user_score=...;
output;
run;
I am in the PROC TRANSPOSE step.
proc transpose data=myfile.step3 out=myfile.step4(drop=_name_);
by Lead_Studio_Name;
var Genre;
run;
My question is in the original dataset, Genre has 9 but I only get 7 by the code. In the output dataset, I have COL1-COL7. Why?
What does your log say?
85 proc freq data=myfile.have; NOTE: Data file MYFILE.have.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance. 86 table Studio*User / out=myfile.step3; 87 88 run; NOTE: There were 136 observations read from the data set MYFILE.have. NOTE: The data set MYFILE.STEP3 has 81 observations and 4 variables. NOTE: PROCEDURE FREQ used (Total process time): real time 0.83 seconds cpu time 0.79 seconds 89 90 data myfile.step3(drop=percent); 91 set myfile.step3; 92 run; NOTE: There were 81 observations read from the data set MYFILE.STEP3. NOTE: The data set MYFILE.STEP3 has 81 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.02 seconds 93 94 95 proc transpose data=myfile.step3 out=myfile.step4(drop=_name_); 96 by Studio; 97 var User; 98 run; NOTE: There were 81 observations read from the data set MYFILE.STEP3. NOTE: The data set MYFILE.STEP4 has 33 observations and 8 variables. NOTE: PROCEDURE TRANSPOSE used (Total process time): real time 0.04 seconds cpu time 0.02 seconds
The image of Step3 is
The proc transpose is incorrect. VAR should reflect the variable (count) to be in new columns and you can use ID/IDlabel to generate appropriate variable names.
Give writing the code a shot and post back if you have issues.
I got it, I add id to get all columns. One more thing is that I ahve some missing values marked as ".".
How can set it as 0?
You have to manually do that unfortunately.
Use a datastep with an array to loop over your Genre and if missing set to 0
Post your data as text, NOT picture. proc sql; create table temp1 as select sex,' Mean ' as statistic, put(mean(weight),12.2) as score_weight length=20, put(mean(height),12.2) as score_height length=20 from sashelp.class group by sex union select sex,'(Min,Max)', cats('(',put(min(weight),12.2),',',put(max(weight),12.2),')') , cats('(',put(min(height),12.2),',',put(max(height),12.2),')') from sashelp.class group by sex order by 1,2; quit; proc freq data=sashelp.class noprint; table sex*age/list out=freq; run; proc transpose data=freq out=temp2(drop=_:) prefix=age; by sex; var count; id age; run; options missing='0'; data want; merge temp1 temp2; by sex; run; proc format; value fmt 0,.='red' 1='yellow' 2-high='green'; run; proc report data=want nowd; define statistic/style={just=center}; define score_:/style={just=center}; define age:/style={backgroundcolor=fmt.}; run;
proc sql; create table temp1 as select sex,' Mean ' as statistic, put(mean(weight),12.2) as score_weight length=20, put(mean(height),12.2) as score_height length=20 from sashelp.class group by sex union select sex,'(Min,Max)', cats('(',put(min(weight),12.2),',',put(max(weight),12.2),')') , cats('(',put(min(height),12.2),',',put(max(height),12.2),')') from sashelp.class group by sex order by 1,2; quit; proc freq data=sashelp.class noprint; table sex*age/list out=freq; run; proc transpose data=freq out=temp2(drop=_:) prefix=age; by sex; var count; id age; run; options missing='0'; data want; merge temp1 temp2; by sex; run; proc format; value fmt 0,.='red' 1='yellow' 2='green' 3-high='purple'; run; proc report data=want nowd; define sex/order; define statistic/style={just=center}; define score_:/style={just=center}; define age:/style={backgroundcolor=fmt.}; run;
Please explain the reasoning that you duplicate the counts for the Genre and have a separate row for the values of range of the ratings scores.
I would generally expect a separate row to provide additional information. It seems more "natural" that the min/max appear on the same row and only have one row per studio. Which means that a single operation, either Proc Report or Tabulate would work.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.