- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is some sample made up data to show what I want.
data successes;
input code group something success;
datalines;
1104 1 8 0
1104 1 7 0
1105 1 2 0
1105 1 3 1
1105 1 4 0
1106 1 5 0
1106 1 6 1
1106 1 7 0
1106 1 8 0
1106 1 9 0
1106 1 1 0
1107 1 1 0
1107 1 3 0
1107 1 5 0
1107 1 7 0
1107 1 9 1
1108 1 2 0
1109 1 4 0
1109 1 6 0
1110 1 3 1
1110 1 2 1
1110 1 1 1
1111 2 2 0
1111 2 3 1
1111 2 4 0
1112 2 5 0
1112 2 6 1
1112 2 7 0
1112 2 8 0
1112 2 9 0
;
run;
For this data set, there are a number of observations for each code (in the real data anywhere between 1 and 25). A "1" in success means that that code was successful. In this data, codes 1104, 1108, and 1109 had no success in group 1. 1105, 1106, 1107, and 1110 had success.
What I want, is to adjust the data in such a way that I can do a Proc Freq to come up with a success percent by group. Group 1 has a success rate of 4/7 = 57%.
My thoughts were that I would make a successflag of some sort that was a 1 if any observation of a code had a success value of 1. Then de-duplicate based on code and successflag to create a data set I could run the FREQ against.
I tried
data mysuccesses;
mysuccessflag = 0;
if group = 1;
do until (last.code);
set successes;
by code;
if success = 1 then mysuccessflag = 1;
end;
run;
but that is wrong in many ways. Looping in the data step error. Help here appreciated.
I was trying to figure out how to loop through all the observations for a particular code and then set my flag to 1 if any observation had a success of 1.
I think I want a data set that looks like
data successes;
input code group something success mysuccessflag;
datalines;
1104 1 8 0 0
1104 1 7 0 0
1105 1 2 0 1
1105 1 3 1 1
1105 1 4 0 1
1106 1 5 0 1
1106 1 6 1 1
1106 1 7 0 1
1106 1 8 0 1
1106 1 9 0 1
1106 1 1 0 1
1107 1 1 0 1
1107 1 3 0 1
1107 1 5 0 1
1107 1 7 0 1
1107 1 9 1 1
1108 1 2 0 0
1109 1 4 0 0
1109 1 6 0 0
1110 1 3 1 1
1110 1 2 1 1
1110 1 1 1 1
1111 2 2 0 1
1111 2 3 1 1
1111 2 4 0 1
1112 2 5 0 1
1112 2 6 1 1
1112 2 7 0 1
1112 2 8 0 1
1112 2 9 0 1
;
run;
With that I could
proc sort data = successes out = cleanedsuccesses nodupkey;
by code;
run;
proc freq data = cleanedsuccesses;
where group = 1;
tables mysuccessflag;
run;
which gets me where I want to be.
But I'm open to suggestions. I can do this with SQL but I want to do it something like this.
Thank you.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming you want a data set with the percent of any success by group, success is always 0/1 as described then this may work:
proc summary data=successes nway ; class group code; var success; output out=successmax max=; run; proc summary data=successmax nway; class group; var success; output out= perc mean=; run;
The value of Success in the final set Perc is decimal percentage so instead of 57% it has 0.5714285714. Display as needed.
Note that use of Class variables to group the data doesn't require explicit sorting though the output data will be.
You can use data set options to drop the _type_ and _freq_ variables. _freq_ may help with verifying the number of codes used to calculate the percentage if concerned.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming you want a data set with the percent of any success by group, success is always 0/1 as described then this may work:
proc summary data=successes nway ; class group code; var success; output out=successmax max=; run; proc summary data=successmax nway; class group; var success; output out= perc mean=; run;
The value of Success in the final set Perc is decimal percentage so instead of 57% it has 0.5714285714. Display as needed.
Note that use of Class variables to group the data doesn't require explicit sorting though the output data will be.
You can use data set options to drop the _type_ and _freq_ variables. _freq_ may help with verifying the number of codes used to calculate the percentage if concerned.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc summary data=successes nway ;
class group code;
var success;
output out=successmax max=;
run;
So, we are summarizing the data by group and code.
The nway option collapses that down to one observation per combo of group and code.
Output creates a file.
Max= is where I kind of get lost. That somehow magically assigns the maximum value of the variable named (var = success) as the value of the summarized success variable. I think.
I don't understand it fully but it works. If I add
PROC FORMAT;
VALUE success_fmt
0 = "No"
1 = "Yes";
RUN;
proc freq data = successmax;
where group = 1;
format success success_fmt.;
tables success;
run;
working right off the first created data set I get
The SAS System 09:10 Friday, March 2, 2018 8
The FREQ Procedure
Cumulative Cumulative
success Frequency Percent Frequency Percent
No 3 42.86 3 42.86
Yes 4 57.14 7 100.00
Very cool. Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@HB wrote:
proc summary data=successes nway ;
class group code;
var success;
output out=successmax max=;
run;
So, we are summarizing the data by group and code.
The nway option collapses that down to one observation per combo of group and code.
Output creates a file.
Max= is where I kind of get lost. That somehow magically assigns the maximum value of the variable named (var = success) as the value of the summarized success variable. I think.
That is exactly what the MAX statistic does, report the maximum value of the variable(s) within the combinations of the class variable(s).
Since your description was sort of "I want a 'success' for the group/code combination if any of the individual records were a success" then one way to think of that is the maximum value.
0/1 coded variables are very nice for use with some of the summary statistics: n is the number of valid values; mean is percent and sum is the number of 1 values (successes, valid, true, what have you for interpretation).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try a DoW loop instead.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dow fun:
data successes;
input code group something success;
datalines;
1104 1 8 0
1104 1 7 0
1105 1 2 0
1105 1 3 1
1105 1 4 0
1106 1 5 0
1106 1 6 1
1106 1 7 0
1106 1 8 0
1106 1 9 0
1106 1 1 0
1107 1 1 0
1107 1 3 0
1107 1 5 0
1107 1 7 0
1107 1 9 1
1108 1 2 0
1109 1 4 0
1109 1 6 0
1110 1 3 1
1110 1 2 1
1110 1 1 1
1111 2 2 0
1111 2 3 1
1111 2 4 0
1112 2 5 0
1112 2 6 1
1112 2 7 0
1112 2 8 0
1112 2 9 0
;
run;
data want;
_success=0;__success=0;
do until(last.group);
_found=0;
do until(last.code);
set successes;
by group code;
if not _found and success=1 then _success+success;
if success=1 then do;__success+success;_found=1;end;
end;
if last.group then success_pct=divide(_success,__success)*100;
end;
drop _: code something success;
run;