I created the following synthetic data.
data test;
input Name $ run squat jump pushup;
cards;
Bob 1 0 0 1
Mat 0 1 0 1
Cat 1 0 1 1
Dog 1 0 0 0
Fred 0 1 1 1
Sam 0 0 1 0
Pat 0 0 0 1
Jack 1 1 1 1
Bat 0 0 0 0
;
run;
I have the following table to fill
One exercise | More than one exercise | |
Run | ||
Pushup | ||
Jump | ||
Squat |
I want to know within my data when someone only do anyone of the exercise or if they did multiple exrcise. For example: When bob ran&pushup- he cannot be in one exercise column but dog can because he only ran. After which I will find the sum for each exercise.
I thought maybe I need to change my data from long to wide but have no idea of what to do next.
@Reeza If I understand you correctly the dataset I created represented how my real data looks-like. The output should be in percentage, so my output should be similar to this table below;
One exercise (%) | Multiple exercises(%) | |
jump | 100 (20.00) | 800(79.11) |
run | 212(24.26) | 534(66.74 |
If you look at my previous dataset, in some observation two exercises e.g. jump and run have values whereas only jump has a value in another observation. I want to know if jump was the only exercise done or if there were two exercises-what is the counts. One thing am not sure about is if the two output will be mutually exclusive or not. I hope this is helpful.
The numbers in the output are made up. I tried usgin this code but my output is not what I wanted. My desired output is when fent is 1 and other drugs are 0, and when fent is 1 and other drugs is 1.
data test2;
set test;
where fent=1;
if fent=1 then do;
if oxy=0 or hydro=0 or codeine =0 then single=1;
else if oxy=1 or hydro=1 or codeine =1 then single =2;
end;
run;
I have can produce this using 1 Data step, two PROC TRANSPOSEs, and a PROC TABULATE. This seems too hard. Why is your synthetic data structured this way? Can you choose another structure?
Proc format;
value OneExercise
0 = "More than one exercise"
1 = "One exercise"
;
data test;
input Name $ run squat jump pushup;
array exercise[4] run--pushup;
OneExercise=sum(of run--pushup)=1;
format OneExercise OneExercise.;
cards;
Bob 1 0 0 1
Mat 0 1 0 1
Cat 1 0 1 1
Dog 1 0 0 0
Fred 0 1 1 1
Sam 0 0 1 0
Pat 0 0 0 1
Jack 1 1 1 1
Bat 0 0 0 0
;
PROC SORT ;
BY OneExercise;
RUN;
PROC TRANSPOSE DATA=test
OUT=Transpose1
PREFIX=Count
NAME=Exercise
;
BY OneExercise;
VAR run squat jump pushup;
PROC SORT data=Transpose1 ;
BY OneExercise Exercise;
PROC TRANSPOSE DATA=Transpose1
OUT=Transpose2 (drop=_NAME_)
PREFIX=Count
;
BY OneExercise Exercise;
VAR Count:;
PROC TABULATE DATA=Transpose2 ;
VAR Count1;
CLASS OneExercise / ORDER=UNFORMATTED MISSING;
CLASS Exercise / ORDER=UNFORMATTED MISSING;
TABLE /* Row Dimension */
Exercise =" ",
/* Column Dimension */
OneExercise=" "*
Count1=" "*
Sum=" "*f=3.0 ;
RUN;
More than one exercise | One exercise | |
---|---|---|
jump | 3 | 1 |
pushup | 5 | 1 |
run | 3 | 1 |
squat | 3 | 0 |
How could you have a count of 100 from only 9 input observations. Show the report you want for the example data you posted.
@Tom Please ignore the initial output table. I made a mistake.
Using the data above, I trried the following code it seems to work but I need a second opnion to see if it work on a large dataset;
data test2;
set test;
where fent=1;
if fent=1 then do;
if sum(oxy,hydro,codeine)=0 then F_drug=1;
else if sum(oxy,hydro,codeine)>0 then F_drug =2;
end;
run;
proc freq data=test2;
table F_drug;
run;
That doesn't seem to have anything to do with your original suggested output. Looks like you are just classifying each observation as to whether any of the list of variables is 1 or not. It is probably easier to just code that new variable using the same 0/1 pattern as your used for the original variables.
data test1;
set test;
any = max(of run squat jump pushup) ;
run;
proc freq ;
tables any;
run;
Use the OF keyword in the MAX() function (or any function that accepts multiple inputs) will allow you to use variable lists. So this would also work for your example since those are the only numeric variables in the dataset.
any = max(of _numeric_) ;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.