Hello everyone,
below you can find an example of a data set that i want to turn to the OUTPUT example below as well.
Have DATASET:
Var1 Var2 Var3 Var4
Nick ran1 Y N
Nick ran1 N Y
John ran2 Y N
John ran2 N Y
George ran3 N Y
Output (Want) Dataset:
Var1 Var2 Var3 Var4
Nick ran1 Y Y
John ran2 Y Y
George ran3 N Y
Please explain the logic that allows you to transform the HAVE data set into the WANT data set.
lets just say that for some reason nick and john are both qualified for having VAR 3 AND VAR 4 (e.g. 'Y') but the input program could only accept only one VAR input at a time.
For example
nick wears a black and white shirt
so the dataset becomes as follows:
PROBLEMATIC INPUT DESIRED FORMAT FOR FURTHER ANALYSIS
Name Status White Black ---> Name Status White Black
Nick Dressed Y N Nick Dressed Y Y
Nick Dressed N Y
This would be much easier if you used 0s and 1s instead of N and Y.
data have;
input var1 $ var2 $ var3 var4;
cards;
nick ran1 1 0
nick ran1 0 1
john ran2 1 0
john ran2 0 1
george ran3 0 1
;
proc summary data=have nway;
class var1 var2;
var var3 var4;
output out=want max=;
run;
As a side issue, please don't use different variable names in your examples than in your original data set.
@Primohunter wrote:
Thanks for your response
will keep that in mind. i just used different real life variable names to make the reasoning easier.
Reasoning is easier if you use the same variable names in your examples and actual data. We're trying to help you, but you have to help us too.
Is there another way to do it with those character variables?
Convert char to numeric, use PROC SUMMARY as I have shown, and then convert 0 and 1 back to N and Y.
@Primohunter wrote:
Is there another way to do it with those character variables?
You could use PROC SQL, but the syntax is less elegant than the succinct "max=" of PROC SUMMARY's OUTPUT statement:
data have;
input (var1-var4)($);
cards;
Nick ran1 Y N
Nick ran1 N Y
John ran2 Y N
John ran2 N Y
George ran3 N Y
;
proc sql;
create table want as
select var1, var2, max(var3) as var3, max(var4) as var4
from have
group by 2, 1;
quit;
The "maximum" here is based on the alphabetical order 'N' < 'Y'.
And so I still believe, despite the fine example from @FreelanceReinh , that handling binary variables via numeric 0s and 1s is superior for a number of reasons, and I urge all readers of this thread to use numeric 0s and 1s for binary variables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.