Hi all,
I am trying to run two-sample t-tests on a large CSV data set that has many blanks (some of the data given to me was incomplete). The file is laid out in columns: diagnosis, count, efficiency, and so on. The first column, diagnosis, contains only the values 1, 2, 3, and 4, and it splits the data into the four groups I want to compare. The third column, efficiency, is the variable I want to compare across those groups (the count column sits between diagnosis and efficiency).
I would like to run all of the pairwise t-tests in one chunk of code: group 1 vs. group 2, group 1 vs. group 3, group 1 vs. group 4, group 2 vs. group 3, group 2 vs. group 4, and group 3 vs. group 4.
Here is my code:
data master;
infile "/folders/myfolders/sasuser.v94/master.csv" dlm=',' DSD firstobs=2;
input @;
diagnosis = scan(_INFILE_,1,',');
se = scan(_INFILE_,3,',');
if left(se)='.' then delete;
put _all_;
data master;
array _char diagnosis se ;
array _num 8 diagnosis1 se1;
do i=1 to dim(_char);
_num(i) = input(_char(i),best32.);
end;
run;
proc glm data=master alpha=0.05;
title "Two sample t-test on se counts";
class diagnosis;
run;
proc ttest data=master;
where diagnosis in (1,2);
class diagnosis;
var se;
run;
proc ttest data=master;
where diagnosis in (1,3);
class diagnosis;
var se;
run;
proc ttest data=master;
where diagnosis in (1,4);
class diagnosis;
var se;
run;
proc ttest data=master;
where diagnosis in (2,3);
class diagnosis;
var se;
run;
proc ttest data=master;
where diagnosis in (2,4);
class diagnosis;
var se;
run;
proc ttest data=master;
where diagnosis in (3,4);
class diagnosis;
var se;
run;
This code does not generate a t-test; whenever I run it, it just takes me straight to the output data. Can you please point out why, and propose replacement code that would generate the t-tests for me?
The most likely cause is that your SE variable is not numeric. You create it with SCAN, which returns a character value, so SE is a text variable.
PROC TTEST requires the VAR variables to be numeric. In the proc, use the SE1 variable you created instead.
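One way to confirm the variable types before running the test is PROC CONTENTS. This is a minimal sketch, assuming the data set is WORK.MASTER as in the original post:

```sas
/* List every variable in WORK.MASTER together with its type (Char or Num). */
proc contents data=master;
run;
```

If SE is listed as Char, PROC TTEST will not accept it in the VAR statement.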
Would you mind walking me through how to make the SE variable numeric? I do not know what the code for that would be or where to put it.
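You already tried this, but your code likely has an error. This part of your code:
data master;
array _char diagnosis se ;
array _num 8 diagnosis1 se1;
do i=1 to dim(_char);
_num(i) = input(_char(i),best32.);
end;
should be:
data master;
set master; /* read the MASTER data set created by the first data step */
array _char diagnosis se ;
array _num 8 diagnosis1 se1;
do i=1 to dim(_char);
_num(i) = input(_char(i),best32.); /* convert each character value to numeric */
end;
run;
Without the SET statement, the second data step never reads the first MASTER data set and simply overwrites it.
Then use the variable SE1 created in the second data step.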
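Putting the pieces together, the conversion and the six pairwise tests might look like the sketch below. It is not tested against the original file. The data set name master_num and the macro pair_ttest are hypothetical names introduced for illustration, and the sketch assumes MASTER already holds the character variables diagnosis and se from the first data step. It also uses the numeric diagnosis1 in the WHERE and CLASS statements, since comparing a character variable against numeric constants such as (1,2) in a WHERE statement causes an error.

```sas
/* Convert the character values produced by SCAN into numeric variables. */
data master_num;                          /* hypothetical output name */
    set master;
    diagnosis1 = input(diagnosis, best32.);
    se1        = input(se, best32.);
run;

/* Hypothetical helper macro: one two-sample t-test per pair of groups. */
%macro pair_ttest(g1, g2);
    title "Two-sample t-test on efficiency: group &g1 vs group &g2";
    proc ttest data=master_num alpha=0.05;
        where diagnosis1 in (&g1, &g2);   /* keep only the two groups compared */
        class diagnosis1;
        var se1;                          /* numeric efficiency variable */
    run;
%mend pair_ttest;

/* All six pairwise comparisons of the four diagnosis groups. */
%pair_ttest(1, 2)
%pair_ttest(1, 3)
%pair_ttest(1, 4)
%pair_ttest(2, 3)
%pair_ttest(2, 4)
%pair_ttest(3, 4)
```

Writing the result to a new data set rather than overwriting MASTER keeps the original character columns available if the conversion needs to be checked.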
Thank you so much! My code runs the exact way I wanted it to!