Re: How to find the count of specific variable within a dataset

CathyVI · Posted 01-27-2021 06:28 PM

I created the following synthetic data.

data test;

input Name $ run squat jump pushup;

cards;

Bob 1 0 0 1

Mat 0 1 0 1

Cat 1 0 1 1

Dog 1 0 0 0

Fred 0 1 1 1

Sam 0 0 1 0

Pat 0 0 0 1

Jack 1 1 1 1

Bat 0 0 0 0

;

run;

I have the following table to fill

	One exercise	More than one exercise
Run
Pushup
Jump
Squat

I want to know within my data when someone only do anyone of the exercise or if they did multiple exrcise. For example: When bob ran&pushup- he cannot be in one exercise column but dog can because he only ran. After which I will find the sum for each exercise.

I thought maybe I need to change my data from long to wide but have no idea of what to do next.

Reeza · Posted 01-27-2021 07:30 PM

This would be infinitely easier if you show a fully worked example. I have a strong suspicion of what you want but I'd be guessing. Please provide sample data and sample output that aligns, preferably using the same terms here. So given the sample data, what would you expect to see in the output? And is this representative of your data or do you also need to handle multiple days?

CathyVI · Posted 01-28-2021 08:41 AM

@Reeza If I understand you correctly the dataset I created represented how my real data looks-like. The output should be in percentage, so my output should be similar to this table below;

	One exercise (%)	Multiple exercises(%)
jump	100 (20.00)	800(79.11)
run	212(24.26)	534(66.74

If you look at my previous dataset, in some observation two exercises e.g. jump and run have values whereas only jump has a value in another observation. I want to know if jump was the only exercise done or if there were two exercises-what is the counts. One thing am not sure about is if the two output will be mutually exclusive or not. I hope this is helpful.

Reeza · Posted 01-28-2021 02:49 PM

How did you get those numbers from the table above? It's hard to try and align the example to the output when the numbers don't align.

CathyVI · Posted 01-29-2021 09:23 AM

@Reeza @PhilC

The numbers in the output are made up. I tried usgin this code but my output is not what I wanted. My desired output is when fent is 1 and other drugs are 0, and when fent is 1 and other drugs is 1.

data test2;

set test;

where fent=1;

if fent=1 then do;

if oxy=0 or hydro=0 or codeine =0 then single=1;

else if oxy=1 or hydro=1 or codeine =1 then single =2;

end;

run;

PhilC · Posted 01-28-2021 04:59 PM

I have can produce this using 1 Data step, two PROC TRANSPOSEs, and a PROC TABULATE. This seems too hard. Why is your synthetic data structured this way? Can you choose another structure?

PhilC · Posted 01-29-2021 04:03 PM

Proc format;
  value OneExercise
    0 = "More than one exercise"
    1 = "One exercise"
    ;
data test;
 input Name $ run squat jump pushup;
 array exercise[4] run--pushup;
 OneExercise=sum(of run--pushup)=1;
 format OneExercise OneExercise.;
cards;
Bob 1 0 0 1
Mat 0 1 0 1
Cat 1 0 1 1
Dog 1 0 0 0
Fred 0 1 1 1
Sam 0 0 1 0
Pat 0 0 0 1
Jack 1 1 1 1
Bat 0 0 0 0
;
PROC SORT	;
	BY OneExercise;
RUN;
PROC TRANSPOSE DATA=test 
	OUT=Transpose1
	PREFIX=Count
	NAME=Exercise
;
	BY OneExercise;
	VAR run squat jump pushup;
PROC SORT data=Transpose1	;
	BY OneExercise Exercise;
PROC TRANSPOSE DATA=Transpose1
	OUT=Transpose2 (drop=_NAME_)
	PREFIX=Count
;
	BY OneExercise Exercise;
	VAR Count:;
PROC TABULATE DATA=Transpose2	 	;
	VAR Count1;
	CLASS OneExercise /	ORDER=UNFORMATTED MISSING;
	CLASS Exercise /	ORDER=UNFORMATTED MISSING;
	TABLE /* Row Dimension */
        Exercise =" ",
        /* Column Dimension */
        OneExercise=" "*
          Count1=" "*
            Sum=" "*f=3.0		;
RUN;

	More than one exercise	One exercise
jump	3	1
pushup	5	1
run	3	1
squat	3	0

Tom · Posted 01-29-2021 09:33 AM

How could you have a count of 100 from only 9 input observations. Show the report you want for the example data you posted.

CathyVI · Posted 01-29-2021 09:40 AM

@Tom Please ignore the initial output table. I made a mistake.

Using the data above, I trried the following code it seems to work but I need a second opnion to see if it work on a large dataset;

data test2;

set test;

where fent=1;

if fent=1 then do;

if sum(oxy,hydro,codeine)=0 then F_drug=1;

else if sum(oxy,hydro,codeine)>0 then F_drug =2;

end;

run;

proc freq data=test2;

table F_drug;

run;

Tom · Posted 01-29-2021 11:39 AM

That doesn't seem to have anything to do with your original suggested output. Looks like you are just classifying each observation as to whether any of the list of variables is 1 or not. It is probably easier to just code that new variable using the same 0/1 pattern as your used for the original variables.

data test1;
  set test;
  any = max(of  run squat jump pushup) ;
run;
proc freq ;
 tables any;
run;

Use the OF keyword in the MAX() function (or any function that accepts multiple inputs) will allow you to use variable lists. So this would also work for your example since those are the only numeric variables in the dataset.

any = max(of _numeric_) ;

How to find the count of specific variable within a dataset