BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asgee
Obsidian | Level 7

Hi All,

 

This seems like a super basic question but I'm working on a project where there are some IDs with repeated observations over time. My current data looks something like this: 

 

ID

Date_Test (mm/dd/yr)

Test_Data

ABC

01_08_2020

0.231

ABC

02_02_2020

0.523

ABC

03_04_2020

0.146

GG

01_15_2020

0.875

GG

02_14_2020

0.345

DEF

01_11_2020

0.383

DEF

02_05_2020

0.825

LIN

01_30_2020

0.622

 

What I want to know is which of the ID's have 2 or more testing data. I'm less concerned about knowing which particular ID's they are, but more on knowing what is the total # of ID's in the dataset that have 2 or more testing data. 

 

I tried to run the code below but I don't think it's correct considering it's selecting the true testing data value and not the count:

proc print data = master;
where test_data >=2;
run;

If there was a way I can select and know which particular observations and how many testing data they have, that would be really helpful. Thanks in advance! 

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11

data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;
run;


data master;
cnt=0;
	do _n_=1 by 1 until(last.id);
		set have;
		by id notsorted;
		cnt+1;
	end;

	do _n_=1 to _n_ ;
		set have;
		output;
	end;
run;

proc print data = master;
var id Date_Test Test_Data;
where cnt >=2;
run;

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20


data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;

proc sql;
create table want as
select count( id) as want
from (select distinct id from have group by id having count(test_data)>=2);
quit;
r_behata
Barite | Level 11

data have;
input id $ Date_Test :mmddyy10. Test_Data;
format date_test mmddyy10.;
cards;
ABC
01_08_2020
0.231
ABC
02_02_2020
0.523
ABC
03_04_2020
0.146
GG
01_15_2020
0.875
GG
02_14_2020
0.345
DEF
01_11_2020
0.383
DEF
02_05_2020
0.825
LIN
01_30_2020
0.622
;
run;


data master;
cnt=0;
	do _n_=1 by 1 until(last.id);
		set have;
		by id notsorted;
		cnt+1;
	end;

	do _n_=1 to _n_ ;
		set have;
		output;
	end;
run;

proc print data = master;
var id Date_Test Test_Data;
where cnt >=2;
run;
asgee
Obsidian | Level 7

I just have a quick question re: the table that's produced based on your code. So firstly it shouldn't display any ID's of people with only 1 test_data right? Second, I see that it does list subjects which have multiple instances of test_data. But if say the subject appears 3 times in this table, does this display the 1st observation/row for this subject?

 

Reality: 

ID     Test_Data

AD        13

GG       123

GG        33

GG        72

DF         89

 

What your code produces (?):

 

1) 

GG         33 

GG         72

 

OR 

 

2) 

GG 123

GG 33

GG 72

 

Just wanted to double check. Thanks again for the help!

ballardw
Super User

@asgee wrote:

I just have a quick question re: the table that's produced based on your code. So firstly it shouldn't display any ID's of people with only 1 test_data right? Second, I see that it does list subjects which have multiple instances of test_data. But if say the subject appears 3 times in this table, does this display the 1st observation/row for this subject?

 

Reality: 

ID     Test_Data

AD        13

GG       123

GG        33

GG        72

DF         89

 

What your code produces (?):

 

1) 

GG         33 

GG         72

 

OR 

 

2) 

GG 123

GG 33

GG 72

 

Just wanted to double check. Thanks again for the help!


You'll learn more if you test the code with modified input data.

asgee
Obsidian | Level 7

Nvm yes you're right, just tested it myself and figured out how the code works. 

subhroster
Fluorite | Level 6

Hi Asgee,

You can use proc SQL to keep only the repeating ids and then use Proc Freq to get the count of ID's.

data have;
input id $ date_test:mmddyy8. test_Date;
format date_test mmddyy8.;
datalines;
ABC 01/08/2020 0.231
ABC 02/02/2020 0.523
ABC 03/04/2020 0.146
GG 01/15/2020 0.875
GG 02/14/2020 0.345
DEF 01/11/2020 0.383
DEF 02/05/2020 0.825
LIN 01/30/2020 0.622
;
run;

proc sort data=have nouniquekeys out=alldups uniqueout=uniques;
 by id;
run;
proc freq data=alldups noprint;                                                                                                                              
 tables id / out=want(drop=percent);                                                                                                  
run;

Hope this helps. Check out  my article on Proc Sort.

Subhro Kar
www.9to5sas.com

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 8818 views
  • 4 likes
  • 5 in conversation