Hi All,
This seems like a super basic question but I'm working on a project where there are some IDs with repeated observations over time. My current data looks something like this:
ID | Date_Test (mm/dd/yr) | Test_Data |
ABC | 01_08_2020 | 0.231 |
ABC | 02_02_2020 | 0.523 |
ABC | 03_04_2020 | 0.146 |
GG | 01_15_2020 | 0.875 |
GG | 02_14_2020 | 0.345 |
DEF | 01_11_2020 | 0.383 |
DEF | 02_05_2020 | 0.825 |
LIN | 01_30_2020 | 0.622 |
What I want to know is which of the ID's have 2 or more testing data. I'm less concerned about knowing which particular ID's they are, but more on knowing what is the total # of ID's in the dataset that have 2 or more testing data.
I tried to run the code below but I don't think it's correct considering it's selecting the true testing data value and not the count:
proc print data = master;
where test_data >=2;
run;
If there was a way I can select and know which particular observations and how many testing data they have, that would be really helpful. Thanks in advance!
data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;
run;
data master;
cnt=0;
do _n_=1 by 1 until(last.id);
set have;
by id notsorted;
cnt+1;
end;
do _n_=1 to _n_ ;
set have;
output;
end;
run;
proc print data = master;
var id Date_Test Test_Data;
where cnt >=2;
run;
data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;
proc sql;
create table want as
select count( id) as want
from (select distinct id from have group by id having count(test_data)>=2);
quit;
data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;
run;
data master;
cnt=0;
do _n_=1 by 1 until(last.id);
set have;
by id notsorted;
cnt+1;
end;
do _n_=1 to _n_ ;
set have;
output;
end;
run;
proc print data = master;
var id Date_Test Test_Data;
where cnt >=2;
run;
I just have a quick question re: the table that's produced based on your code. So firstly it shouldn't display any ID's of people with only 1 test_data right? Second, I see that it does list subjects which have multiple instances of test_data. But if say the subject appears 3 times in this table, does this display the 1st observation/row for this subject?
Reality:
ID Test_Data
AD 13
GG 123
GG 33
GG 72
DF 89
What your code produces (?):
1)
GG 33
GG 72
OR
2)
GG 123
GG 33
GG 72
Just wanted to double check. Thanks again for the help!
@asgee wrote:
I just have a quick question re: the table that's produced based on your code. So firstly it shouldn't display any ID's of people with only 1 test_data right? Second, I see that it does list subjects which have multiple instances of test_data. But if say the subject appears 3 times in this table, does this display the 1st observation/row for this subject?
Reality:
ID Test_Data
AD 13
GG 123
GG 33
GG 72
DF 89
What your code produces (?):
1)
GG 33
GG 72
OR
2)
GG 123
GG 33
GG 72
Just wanted to double check. Thanks again for the help!
You'll learn more if you test the code with modified input data.
Nvm yes you're right, just tested it myself and figured out how the code works.
Hi Asgee,
You can use proc SQL to keep only the repeating ids and then use Proc Freq to get the count of ID's.
data have; input id $ date_test:mmddyy8. test_Date; format date_test mmddyy8.; datalines; ABC 01/08/2020 0.231 ABC 02/02/2020 0.523 ABC 03/04/2020 0.146 GG 01/15/2020 0.875 GG 02/14/2020 0.345 DEF 01/11/2020 0.383 DEF 02/05/2020 0.825 LIN 01/30/2020 0.622 ; run; proc sort data=have nouniquekeys out=alldups uniqueout=uniques; by id; run; proc freq data=alldups noprint; tables id / out=want(drop=percent); run;
Hope this helps. Check out my article on Proc Sort.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.