Solved: Re: Generate summary report for data imported from Access

das · Posted 09-25-2014 03:37 PM

Greetings. I've imported data from an access file. The key variables of the dataset follow:

specie (goat or sheep)

ID (animals ID number)

gender (F, M, M/C)

biopsy_letter (A-F; some had only one biopsy, A; others may have had up to F biopsies over their lifetime)

biopsy_date (date of each biopsy)

step (each 'step' is measured from a single biopsy; each biopsy had four 'step' measures)

So after import, the table of data has numerous rows per animal. For instance: Goat 3876 is represented by 4 rows since this animal had only one biopsy performed for which there are four step measures. Another: Sheep 4465 has 12 rows since this animal had three biopsies for each of which there are four step measures.

The first goal is to simply characterize the dataset: How many goats and sheep are represented? Within each species, how many female, male, castrated male animals are represented?

My first reaction is to do this as follows:

proc freq ;

tables specie*gender ;

run ;

But that just gives me a row count for these factors; i.e., goat 3876 is counted as four animals and sheep 4465 is counted as 12. Is there a way to identify the subject? Do I have to transform the dataset by animal ID and then use proc freq? I've no experience with report, tabulate, summary procedures. Is the simple solution in one of those?

Thank you for your help,

Dave

mohamed_zaki · Posted 09-25-2014 04:54 PM

use nodupkey with sort while you keep only data that matter that could help you.

data test ;

input specie $ ID Gender $ Biopsy_letter $ Step;

cards;

G 122 M A 1

G 122 M A 2

G 122 M A 3

G 122 M A 4

G 132 F A 1

G 132 F A 2

G 132 F A 3

G 132 F A 4

S 546 F A 1

S 546 F A 2

S 546 F A 3

S 546 F A 4

S 546 F B 1

S 546 F B 2

S 546 F B 3

S 546 F B 4

;

run;

proc sort data=test(keep=specie ID Gender) out=new nodupkey ;

by specie ID Gender ;

run ;

proc freq ;

tables specie*gender ;

run ;

View solution in original post

Reeza · Posted 09-25-2014 04:09 PM

What do you want the output to look like? It sounds like proc freq and/or tabulate you want, but without an example of input/output data its hard to say.

das · Posted 09-25-2014 04:17 PM

Reeza,

The table produced by proc freq would be fine. So rows would be specie (goat of sheep) and columns would be number of females, intact males, and castrated males. the cells of the table would then tell me the number or proportion of each specie*gender pairing. Only trouble is the GLM format of the imported data over represents each animal with too many rows. I'm working on the transformed dataset now (it always takes me a while to figure out how to do this) but I was wondering if there was a more elegant procedure or command within proc freq that might allow me to identify subjects.

Hope that answers your question.

Dave

mohamed_zaki · Posted 09-25-2014 04:54 PM

use nodupkey with sort while you keep only data that matter that could help you.

data test ;

input specie $ ID Gender $ Biopsy_letter $ Step;

cards;

G 122 M A 1

G 122 M A 2

G 122 M A 3

G 122 M A 4

G 132 F A 1

G 132 F A 2

G 132 F A 3

G 132 F A 4

S 546 F A 1

S 546 F A 2

S 546 F A 3

S 546 F A 4

S 546 F B 1

S 546 F B 2

S 546 F B 3

S 546 F B 4

;

run;

proc sort data=test(keep=specie ID Gender) out=new nodupkey ;

by specie ID Gender ;

run ;

proc freq ;

tables specie*gender ;

run ;

das · Posted 09-30-2014 07:51 PM

I was unfamiliar with 'nodupkey' but it works very well.

Thank you for the suggestion.

Dave

ballardw · Posted 09-25-2014 04:16 PM

One way, get one record per animal then report.

Proc sql;

create table want as

select distinct specie, id, gender

from have;

quit;

proc tabulate data=have;

class specie, gender;

table specie, gender*n;

run;

das · Posted 09-30-2014 07:50 PM

Thank you. I'm not familiar with Proc SQL but was successful in applying your code. I did need to remove the comma in the class statement of Proc Tabulate. Other than that, this solution worked great. Thank you,

Dave

Reeza · Posted 09-25-2014 05:02 PM

If there's an easy way to know the first record of each animal you could filter on that,

i.e. if the first record of every animal was biopsy A and step =1

proc freq data=have;

where biopsy='A' and step=1;

table species*gender;

run;

Otherwise you'll have to use one of the suggestions above.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away