SAS Programming

HB · Posted 03-02-2018 11:38 AM

Here is some sample made up data to show what I want.

data successes;
   input code group something success;
datalines;
1104 1 8 0
1104 1 7 0
1105 1 2 0
1105 1 3 1
1105 1 4 0
1106 1 5 0
1106 1 6 1
1106 1 7 0
1106 1 8 0
1106 1 9 0
1106 1 1 0
1107 1 1 0
1107 1 3 0
1107 1 5 0
1107 1 7 0
1107 1 9 1
1108 1 2 0
1109 1 4 0
1109 1 6 0
1110 1 3 1
1110 1 2 1
1110 1 1 1
1111 2 2 0
1111 2 3 1
1111 2 4 0
1112 2 5 0
1112 2 6 1
1112 2 7 0
1112 2 8 0
1112 2 9 0
;
run;

For this data set, there are a number of observations for each code (in the real data anywhere between 1 and 25). A "1" in success means that that code was successful. In this data, codes 1104, 1108, and 1109 had no success in group 1. 1105, 1106, 1107, and 1110 had success.

What I want, is to adjust the data in such a way that I can do a Proc Freq to come up with a success percent by group. Group 1 has a success rate of 4/7 = 57%.

My thoughts were that I would make a successflag of some sort that was a 1 if any observation of a code had a success value of 1. Then de-duplicate based on code and successflag to create a data set I could run the FREQ against.

I tried

data mysuccesses;
	mysuccessflag = 0;
	if group = 1;
	do until (last.code);
		set successes;
		by code;
		if success = 1 then mysuccessflag = 1;
	end;
run;

but that is wrong in many ways. Looping in the data step error. Help here appreciated.

I was trying to figure out how to loop through all the observations for a particular code and then set my flag to 1 if any observation had a success of 1.

I think I want a data set that looks like

data successes;
   input code group something success mysuccessflag;
datalines;
1104 1 8 0 0
1104 1 7 0 0
1105 1 2 0 1
1105 1 3 1 1
1105 1 4 0 1
1106 1 5 0 1
1106 1 6 1 1
1106 1 7 0 1
1106 1 8 0 1
1106 1 9 0 1
1106 1 1 0 1
1107 1 1 0 1
1107 1 3 0 1
1107 1 5 0 1
1107 1 7 0 1
1107 1 9 1 1
1108 1 2 0 0
1109 1 4 0 0
1109 1 6 0 0
1110 1 3 1 1
1110 1 2 1 1
1110 1 1 1 1
1111 2 2 0 1
1111 2 3 1 1
1111 2 4 0 1
1112 2 5 0 1
1112 2 6 1 1
1112 2 7 0 1
1112 2 8 0 1
1112 2 9 0 1
;
run;

With that I could

proc sort data = successes out = cleanedsuccesses nodupkey;
	by code;
run;

proc freq data = cleanedsuccesses;
	where group = 1;
	tables mysuccessflag;
run;

which gets me where I want to be.

But I'm open to suggestions. I can do this with SQL but I want to do it something like this.

Thank you.

ballardw · Posted 03-02-2018 11:57 AM

Assuming you want a data set with the percent of any success by group, success is always 0/1 as described then this may work:

proc summary data=successes nway ;
   class group code;
   var success;
   output out=successmax max=;
run;

proc summary data=successmax nway;
   class group;
   var success;
   output out= perc mean=;
run;

The value of Success in the final set Perc is decimal percentage so instead of 57% it has 0.5714285714. Display as needed.

Note that use of Class variables to group the data doesn't require explicit sorting though the output data will be.

You can use data set options to drop the _type_ and _freq_ variables. _freq_ may help with verifying the number of codes used to calculate the percentage if concerned.

View solution in original post

ballardw · Posted 03-02-2018 11:57 AM

Assuming you want a data set with the percent of any success by group, success is always 0/1 as described then this may work:

proc summary data=successes nway ;
   class group code;
   var success;
   output out=successmax max=;
run;

proc summary data=successmax nway;
   class group;
   var success;
   output out= perc mean=;
run;

The value of Success in the final set Perc is decimal percentage so instead of 57% it has 0.5714285714. Display as needed.

Note that use of Class variables to group the data doesn't require explicit sorting though the output data will be.

You can use data set options to drop the _type_ and _freq_ variables. _freq_ may help with verifying the number of codes used to calculate the percentage if concerned.

HB · Posted 03-02-2018 12:43 PM

proc summary data=successes nway ;

class group code;

var success;

output out=successmax max=;

run;

So, we are summarizing the data by group and code.

The nway option collapses that down to one observation per combo of group and code.

Output creates a file.

Max= is where I kind of get lost. That somehow magically assigns the maximum value of the variable named (var = success) as the value of the summarized success variable. I think.

I don't understand it fully but it works. If I add

PROC FORMAT;

VALUE success_fmt

0 = "No"

1 = "Yes";

RUN;

proc freq data = successmax;

where group = 1;

format success success_fmt.;

tables success;

run;

working right off the first created data set I get

                                          The SAS System           09:10 Friday, March 2, 2018   8

                                        The FREQ Procedure

                                                       Cumulative    Cumulative
                   success    Frequency     Percent     Frequency      Percent
                   
                       No            3       42.86             3        42.86
                       Yes           4       57.14             7       100.00

which gets me there.

Very cool. Thank you.

ballardw · Posted 03-02-2018 01:24 PM

@HB wrote:

proc summary data=successes nway ;

class group code;

var success;

output out=successmax max=;

run;

So, we are summarizing the data by group and code.

The nway option collapses that down to one observation per combo of group and code.

Output creates a file.

Max= is where I kind of get lost. That somehow magically assigns the maximum value of the variable named (var = success) as the value of the summarized success variable. I think.

That is exactly what the MAX statistic does, report the maximum value of the variable(s) within the combinations of the class variable(s).

Since your description was sort of "I want a 'success' for the group/code combination if any of the individual records were a success" then one way to think of that is the maximum value.

0/1 coded variables are very nice for use with some of the summary statistics: n is the number of valid values; mean is percent and sum is the number of 1 values (successes, valid, true, what have you for interpretation).

Reeza · Posted 03-02-2018 11:57 AM

Try a DoW loop instead.

novinosrin · Posted 03-02-2018 04:33 PM

Dow fun:

data successes;
   input code group something success;
datalines;
1104 1 8 0
1104 1 7 0
1105 1 2 0
1105 1 3 1
1105 1 4 0
1106 1 5 0
1106 1 6 1
1106 1 7 0
1106 1 8 0
1106 1 9 0
1106 1 1 0
1107 1 1 0
1107 1 3 0
1107 1 5 0
1107 1 7 0
1107 1 9 1
1108 1 2 0
1109 1 4 0
1109 1 6 0
1110 1 3 1
1110 1 2 1
1110 1 1 1
1111 2 2 0
1111 2 3 1
1111 2 4 0
1112 2 5 0
1112 2 6 1
1112 2 7 0
1112 2 8 0
1112 2 9 0
;
run;

data want;
_success=0;__success=0;
do until(last.group);
	_found=0;
	do until(last.code);
	set successes;
	by group code;
	if not _found and success=1 then _success+success;
	if success=1 then do;__success+success;_found=1;end;
	end;
	if last.group then success_pct=divide(_success,__success)*100;
end;
drop _: code something success;
run;

SAS Programming

"Coalesce" a value across groups of observations

Re: "Coalesce" a value across groups of observations

Re: "Coalesce" a value across groups of observations

Re: "Coalesce" a value across groups of observations

Re: "Coalesce" a value across groups of observations

Re: "Coalesce" a value across groups of observations

Re: "Coalesce" a value across groups of observations

proc sql count non missing observations in variable by group

Selectively deleting observations within a group

Write column values from one observation to another

Count Unique Observations

Grouping observations with start and end marking

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...