Greetings. I've imported data from an access file. The key variables of the dataset follow:
specie (goat or sheep)
ID (animals ID number)
gender (F, M, M/C)
biopsy_letter (A-F; some had only one biopsy, A; others may have had up to F biopsies over their lifetime)
biopsy_date (date of each biopsy)
step (each 'step' is measured from a single biopsy; each biopsy had four 'step' measures)
So after import, the table of data has numerous rows per animal. For instance: Goat 3876 is represented by 4 rows since this animal had only one biopsy performed for which there are four step measures. Another: Sheep 4465 has 12 rows since this animal had three biopsies for each of which there are four step measures.
The first goal is to simply characterize the dataset: How many goats and sheep are represented? Within each species, how many female, male, castrated male animals are represented?
My first reaction is to do this as follows:
proc freq ;
tables specie*gender ;
run ;
But that just gives me a row count for these factors; i.e., goat 3876 is counted as four animals and sheep 4465 is counted as 12. Is there a way to identify the subject? Do I have to transform the dataset by animal ID and then use proc freq? I've no experience with report, tabulate, summary procedures. Is the simple solution in one of those?
Thank you for your help,
Dave
use nodupkey with sort while you keep only data that matter that could help you.
data test ;
input specie $ ID Gender $ Biopsy_letter $ Step;
cards;
G 122 M A 1
G 122 M A 2
G 122 M A 3
G 122 M A 4
G 132 F A 1
G 132 F A 2
G 132 F A 3
G 132 F A 4
S 546 F A 1
S 546 F A 2
S 546 F A 3
S 546 F A 4
S 546 F B 1
S 546 F B 2
S 546 F B 3
S 546 F B 4
;
run;
proc sort data=test(keep=specie ID Gender) out=new nodupkey ;
by specie ID Gender ;
run ;
proc freq ;
tables specie*gender ;
run ;
What do you want the output to look like? It sounds like proc freq and/or tabulate you want, but without an example of input/output data its hard to say.
Reeza,
The table produced by proc freq would be fine. So rows would be specie (goat of sheep) and columns would be number of females, intact males, and castrated males. the cells of the table would then tell me the number or proportion of each specie*gender pairing. Only trouble is the GLM format of the imported data over represents each animal with too many rows. I'm working on the transformed dataset now (it always takes me a while to figure out how to do this) but I was wondering if there was a more elegant procedure or command within proc freq that might allow me to identify subjects.
Hope that answers your question.
Dave
use nodupkey with sort while you keep only data that matter that could help you.
data test ;
input specie $ ID Gender $ Biopsy_letter $ Step;
cards;
G 122 M A 1
G 122 M A 2
G 122 M A 3
G 122 M A 4
G 132 F A 1
G 132 F A 2
G 132 F A 3
G 132 F A 4
S 546 F A 1
S 546 F A 2
S 546 F A 3
S 546 F A 4
S 546 F B 1
S 546 F B 2
S 546 F B 3
S 546 F B 4
;
run;
proc sort data=test(keep=specie ID Gender) out=new nodupkey ;
by specie ID Gender ;
run ;
proc freq ;
tables specie*gender ;
run ;
I was unfamiliar with 'nodupkey' but it works very well.
Thank you for the suggestion.
Dave
One way, get one record per animal then report.
Proc sql;
create table want as
select distinct specie, id, gender
from have;
quit;
proc tabulate data=have;
class specie, gender;
table specie, gender*n;
run;
Thank you. I'm not familiar with Proc SQL but was successful in applying your code. I did need to remove the comma in the class statement of Proc Tabulate. Other than that, this solution worked great. Thank you,
Dave
If there's an easy way to know the first record of each animal you could filter on that,
i.e. if the first record of every animal was biopsy A and step =1
proc freq data=have;
where biopsy='A' and step=1;
table species*gender;
run;
Otherwise you'll have to use one of the suggestions above.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.