Solved: count repeated variables by group

Barkamih · Posted 06-20-2018 03:35 PM

herd                     cow_id
10                           101
10                           101
10                           101
10                           101
10                           101
10                           101 
10                           101
10                           102
10                           102
10                           102
10                           102
10                           103
10                           104
10                           104
10                           104
10                           105
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           107
11                           107
11                           108
11                           108
11                           108
11                           108
11                           108
11                           108
11                           108

I just wanted to count the number of cow_id for each herd that I have in my dataset.

Data named: Fatd and I have variable herd and coe_id, I want to delete all herds that have less than 5 cow_ids.

If you look at this data, Herd 10 has 5 cows ID (101, 102, 103, 104, 105). the repetition 7 times of cow_id 101 mean this cow has 7 times of measurements, so in my question, I don't care about the repetition, I'm just looking for cow_id. And herd 11 has 3 cows ID (106, 107, 108), so in this case, I just want to keep herd 10 and delete herd 11, it does not matter how much each cow had been repeated.

I just care about the number of cows in each herd not about the number of measurements. (Repeated cow_id is a different time of measurements).

I hope you get my idea this time

Regards

Ibrahim

PeterClemmensen · Posted 06-20-2018 03:43 PM

Something like this?

data have;
input herd cow_id;
datalines;
10 101
10 101
10 101
10 101
10 101
10 101
10 101
10 102
10 102
10 102
10 102
10 103
10 104
10 104
10 104
10 105
11 106
11 106
11 106
11 106
11 106
11 106
11 106
11 106
11 106
11 107
11 107
11 108
11 108
11 108
11 108
11 108
11 108
11 108
;

proc sql;
   create table want as
   select * from have
   where herd in 
   (select distinct herd from have group by herd having count(distinct cow_id) >= 5);
quit;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 06-20-2018 03:43 PM

Something like this?

data have;
input herd cow_id;
datalines;
10 101
10 101
10 101
10 101
10 101
10 101
10 101
10 102
10 102
10 102
10 102
10 103
10 104
10 104
10 104
10 105
11 106
11 106
11 106
11 106
11 106
11 106
11 106
11 106
11 106
11 107
11 107
11 108
11 108
11 108
11 108
11 108
11 108
11 108
;

proc sql;
   create table want as
   select * from have
   where herd in 
   (select distinct herd from have group by herd having count(distinct cow_id) >= 5);
quit;

The DATA to DATA Step Macro
Blog: SASnrd

Barkamih · Posted 06-20-2018 04:08 PM

My dataset over million, how I can put them in the code, I'm sorry about this but I'm new with SAS.

REGARDS

Reeza · Posted 06-20-2018 04:16 PM

You don't.

You reference your SAS data set. We use cards/datalines to create example data to test the code.

@Barkamih wrote:

My dataset over million, how I can put them in the code, I'm sorry about this but I'm new with SAS.

REGARDS

pau13rown · Posted 06-20-2018 03:44 PM

proc sort data=....;

by herd cow_id;

run;

data ....

retain count;

set ....;

by herd cow_id;

if first.herd then count=1;

if first.cow_id then count=count+1;

run;

does that work?

Barkamih · Posted 06-20-2018 04:09 PM

NOT WORKING

PaigeMiller · Posted 06-20-2018 04:12 PM

@Barkamih wrote:

NOT WORKING

When someone says "NOT WORKING", this gives us NO information on what has gone wrong, and so there's no way for us to help further.

Give us information.

Show us your code

Show us your SASLOG

Show us a part of the results that indicate a problem.

--
Paige Miller

Barkamih · Posted 06-20-2018 04:16 PM

Sorry about this, I was so stressful because I have a deadline soon.

this id the error

1273
1274
1275
1276 data FATD
1277
1278 retain count;
------
56
ERROR 56-185: RETAIN is not allowed in the DATA statement when option DATASTMTCHK=COREKEYWORDS.
Check for a missing semicolon in the DATA statement, or use DATASTMTCHK=NONE.

1279
1280 set FATD;
1281
1282 by herd cow_id;
1283
1284 if first.herd then count=1;
1285
1286 if first.cow_id then count=count+1;
1287
1288 run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Reeza · Posted 06-20-2018 04:24 PM

You're missing a semicolon.

PaigeMiller · Posted 06-20-2018 04:25 PM

@Barkamih wrote:

1276 data FATD
1277
1278 retain count;

Put a semi-colon after data FATD

--
Paige Miller

Barkamih · Posted 06-20-2018 04:28 PM

Yes I found that

but Nothing change in data as this log showing

408 proc sort data=FATD;
409
410 by herd cow_id;
411
412 run;

NOTE: There were 1519417 observations read from the data set WORK.FATD.
NOTE: The data set WORK.FATD has 1519417 observations and 55 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.11 seconds
cpu time 1.86 seconds

413
414
415
416 data FATD;
417
418 retain count;
419
420 set FATD;
421
422 by herd cow_id;
423
424 if first.herd then count=1;
425
426 if first.cow_id then count=count+1;
427
428 run;

NOTE: There were 1519417 observations read from the data set WORK.FATD.
NOTE: The data set WORK.FATD has 1519417 observations and 55 variables.
NOTE: DATA statement used (Total process time):
real time 9.37 seconds
cpu time 1.71 seconds

novinosrin · Posted 06-20-2018 04:34 PM

@Barkamih May i ask why haven't you tried @PeterClemmensen sql solution in the first place ?

Barkamih · Posted 06-21-2018 12:23 PM

That what I used to solve my issue!!!!!!

novinosrin · Posted 06-20-2018 03:48 PM

data have;
input herd                     cow_id;
cards;
10                           101
10                           101
10                           101
10                           101
10                           101
10                           101 
10                           101
10                           102
10                           102
10                           102
10                           102
10                           103
10                           104
10                           104
10                           104
10                           105
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           106
11                           107
11                           107
11                           108
11                           108
11                           108
11                           108
11                           108
11                           108
11                           108 
;

proc sql;
create table want as
select *
from have
group by herd
having  count(distinct cow_id)>=5;
quit;

Barkamih · Posted 06-20-2018 04:51 PM

Thank you every one.

my best regards to all of you

Brakmih

count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Re: count repeated variables by group

Registration is open

SAS Training: Just a Click Away