BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
das
Obsidian | Level 7 das
Obsidian | Level 7

Greetings. I've imported data from an access file. The key variables of the dataset follow:

specie (goat or sheep)

ID (animals ID number)

gender (F, M, M/C)

biopsy_letter (A-F; some had only one biopsy, A; others may have had up to F biopsies over their lifetime)

biopsy_date (date of each biopsy)

step (each 'step' is measured from a single biopsy; each biopsy had four 'step' measures)

So after import, the table of data has numerous rows per animal. For instance: Goat 3876 is represented by 4 rows since this animal had only one biopsy performed for which there are four step measures. Another: Sheep 4465 has 12 rows since this animal had three biopsies for each of which there are four step measures.

The first goal is to simply characterize the dataset: How many goats and sheep are represented? Within each species, how many female, male, castrated male animals are represented?

My first reaction is to do this as follows:

proc freq ;

     tables specie*gender ;

run ;

But that just gives me a row count for these factors; i.e., goat 3876 is counted as four animals and sheep 4465 is counted as 12. Is there a way to identify the subject? Do I have to transform the dataset by animal ID and then use proc freq? I've no experience with report, tabulate, summary procedures. Is the simple solution in one of those?

Thank you for your help,

Dave

1 ACCEPTED SOLUTION

Accepted Solutions
mohamed_zaki
Barite | Level 11

use nodupkey with sort while you keep only data that matter that could help you.

data test ;

input specie $ ID Gender $ Biopsy_letter $ Step;

cards;

G 122 M A 1

G 122 M A 2

G 122 M A 3

G 122 M A 4

G 132 F A 1

G 132 F A 2

G 132 F A 3

G 132 F A 4

S 546 F A 1

S 546 F A 2

S 546 F A 3

S 546 F A 4

S 546 F B 1

S 546 F B 2

S 546 F B 3

S 546 F B 4

;

run;

proc sort data=test(keep=specie ID Gender) out=new  nodupkey ;

  by specie ID Gender  ;

run ;

proc freq ;

     tables specie*gender ;

run ;

View solution in original post

7 REPLIES 7
Reeza
Super User

What do you want the output to look like? It sounds like proc freq and/or tabulate you want, but without an example of input/output data its hard to say.

das
Obsidian | Level 7 das
Obsidian | Level 7

Reeza,

The table produced by proc freq would be fine. So rows would be specie (goat of sheep) and columns would be number of females, intact males, and castrated males. the cells of the table would then tell me the number or proportion of each specie*gender pairing. Only trouble is the GLM format of the imported data over represents each animal with too many rows. I'm working on the transformed dataset now (it always takes me a while to figure out how to do this) but I was wondering if there was a more elegant procedure or command within proc freq that might allow me to identify subjects.

Hope that answers your question.

Dave

mohamed_zaki
Barite | Level 11

use nodupkey with sort while you keep only data that matter that could help you.

data test ;

input specie $ ID Gender $ Biopsy_letter $ Step;

cards;

G 122 M A 1

G 122 M A 2

G 122 M A 3

G 122 M A 4

G 132 F A 1

G 132 F A 2

G 132 F A 3

G 132 F A 4

S 546 F A 1

S 546 F A 2

S 546 F A 3

S 546 F A 4

S 546 F B 1

S 546 F B 2

S 546 F B 3

S 546 F B 4

;

run;

proc sort data=test(keep=specie ID Gender) out=new  nodupkey ;

  by specie ID Gender  ;

run ;

proc freq ;

     tables specie*gender ;

run ;

das
Obsidian | Level 7 das
Obsidian | Level 7

I was unfamiliar with 'nodupkey' but it works very well.

Thank you for the suggestion.

Dave

ballardw
Super User

One way, get one record per animal then report.

Proc sql;

     create table want as

     select distinct specie, id, gender

     from have;

quit;

proc tabulate data=have;

     class specie, gender;

     table specie, gender*n;

run;

das
Obsidian | Level 7 das
Obsidian | Level 7

Thank you. I'm not familiar with Proc SQL but was successful in applying your code. I did need to remove the comma in the class statement of Proc Tabulate. Other than that, this solution worked great. Thank you,

Dave

Reeza
Super User

If there's an easy way to know the first record of each animal you could filter on that,

i.e. if the first record of every animal was biopsy A and step =1

proc freq data=have;

where biopsy='A' and step=1;

table species*gender;

run;

Otherwise you'll have to use one of the suggestions above.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1333 views
  • 4 likes
  • 4 in conversation