Solved: How do I count the number of observations per unique ID conditional on...

asgee · Posted 01-29-2020 12:39 PM

Hi All,

This seems like a super basic question but I'm working on a project where there are some IDs with repeated observations over time. My current data looks something like this:

ID	Date_Test (mm/dd/yr)	Test_Data
ABC	01_08_2020	0.231
ABC	02_02_2020	0.523
ABC	03_04_2020	0.146
GG	01_15_2020	0.875
GG	02_14_2020	0.345
DEF	01_11_2020	0.383
DEF	02_05_2020	0.825
LIN	01_30_2020	0.622

What I want to know is which of the ID's have 2 or more testing data. I'm less concerned about knowing which particular ID's they are, but more on knowing what is the total # of ID's in the dataset that have 2 or more testing data.

I tried to run the code below but I don't think it's correct considering it's selecting the true testing data value and not the count:

proc print data = master;
where test_data >=2;
run;

If there was a way I can select and know which particular observations and how many testing data they have, that would be really helpful. Thanks in advance!

r_behata · Posted 01-29-2020 01:11 PM


data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;
run;


data master;
cnt=0;
	do _n_=1 by 1 until(last.id);
		set have;
		by id notsorted;
		cnt+1;
	end;

	do _n_=1 to _n_ ;
		set have;
		output;
	end;
run;

proc print data = master;
var id Date_Test Test_Data;
where cnt >=2;
run;

View solution in original post

novinosrin · Posted 01-29-2020 01:00 PM



data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;

proc sql;
create table want as
select count( id) as want
from (select distinct id from have group by id having count(test_data)>=2);
quit;

r_behata · Posted 01-29-2020 01:11 PM


data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;
run;


data master;
cnt=0;
	do _n_=1 by 1 until(last.id);
		set have;
		by id notsorted;
		cnt+1;
	end;

	do _n_=1 to _n_ ;
		set have;
		output;
	end;
run;

proc print data = master;
var id Date_Test Test_Data;
where cnt >=2;
run;

asgee · Posted 01-29-2020 02:49 PM

I just have a quick question re: the table that's produced based on your code. So firstly it shouldn't display any ID's of people with only 1 test_data right? Second, I see that it does list subjects which have multiple instances of test_data. But if say the subject appears 3 times in this table, does this display the 1st observation/row for this subject?

Reality:

ID Test_Data

AD 13

GG 123

GG 33

GG 72

DF 89

What your code produces (?):

1)

GG 33

GG 72

OR

2)

GG 123

GG 33

GG 72

Just wanted to double check. Thanks again for the help!

ballardw · Posted 01-29-2020 02:56 PM

@asgee wrote:

I just have a quick question re: the table that's produced based on your code. So firstly it shouldn't display any ID's of people with only 1 test_data right? Second, I see that it does list subjects which have multiple instances of test_data. But if say the subject appears 3 times in this table, does this display the 1st observation/row for this subject?

Reality:

ID Test_Data

AD 13

GG 123

GG 33

GG 72

DF 89

What your code produces (?):

1)

GG 33

GG 72

OR

2)

GG 123

GG 33

GG 72

Just wanted to double check. Thanks again for the help!

You'll learn more if you test the code with modified input data.

asgee · Posted 01-29-2020 03:00 PM

Nvm yes you're right, just tested it myself and figured out how the code works.

subhroster · Posted 01-30-2020 02:57 AM

Hi Asgee,

You can use proc SQL to keep only the repeating ids and then use Proc Freq to get the count of ID's.

data have;
input id $ date_test:mmddyy8. test_Date;
format date_test mmddyy8.;
datalines;
ABC 01/08/2020 0.231
ABC 02/02/2020 0.523
ABC 03/04/2020 0.146
GG 01/15/2020 0.875
GG 02/14/2020 0.345
DEF 01/11/2020 0.383
DEF 02/05/2020 0.825
LIN 01/30/2020 0.622
;
run;

proc sort data=have nouniquekeys out=alldups uniqueout=uniques;
 by id;
run;
proc freq data=alldups noprint;                                                                                                                              
 tables id / out=want(drop=percent);                                                                                                  
run;

Hope this helps. Check out my article on Proc Sort.

Subhro Kar
www.9to5sas.com

How do I count the number of observations per unique ID conditional on a variable?

Re: How do I count the number of observations per unique ID conditional on a variable?

Re: How do I count the number of observations per unique ID conditional on a variable?

Re: How do I count the number of observations per unique ID conditional on a variable?

Re: How do I count the number of observations per unique ID conditional on a variable?

Re: How do I count the number of observations per unique ID conditional on a variable?

Re: How do I count the number of observations per unique ID conditional on a variable?

Re: How do I count the number of observations per unique ID conditional on a variable?