Hello Community! I am working with a new data set that has numerous entries for each participant. I want to set a threshold for inclusion such that 10 visits must be present for a participant to be included in my new data set.
After meeting this threshold of number of visits, I need to figure out how many participants met that visit number threshold.
It's a .cvs file I converted from an excel file if that makes any difference
I tried PROC FREQ but couldn't figure out how to do anything with those results. Probably the completely wrong approach but I'm learning SAS on the job so cut me some slack!
Thanks for any advice!
Jackie E.
Use SQL for this:
data have;
input Participant;
n = _n_;
datalines;
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1002
1002
1002
1002
1002
1002
;
proc sql;
create table participants as
select * from have
group by participant
having count(*) >= 10
order by participant, n;
select * from participants;
quit;
PG
Showing some data (even fake) would make things more concrete. - PG
Basically I need to work with only this column (sorry for the small amount of data, but it takes awhile to reblind the info) and the other columns don't matter for the analysis. I need to find if there are 10 entries for participant #1000 and if not, disregard the info for that participant. So on and so forth. So for this data, it would show that 1000 and 1001 both qualify but 1002 does not have enough entries.
Participant |
1000 |
1000 |
1000 |
1000 |
1000 |
1000 |
1000 |
1000 |
1000 |
1000 |
1001 |
1001 |
1001 |
1001 |
1001 |
1001 |
1001 |
1001 |
1001 |
1001 |
1001 |
1001 |
1002 |
1002 |
1002 |
1002 |
1002 |
1002 |
Thanks!
Jackie E.
Use SQL for this:
data have;
input Participant;
n = _n_;
datalines;
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1002
1002
1002
1002
1002
1002
;
proc sql;
create table participants as
select * from have
group by participant
having count(*) >= 10
order by participant, n;
select * from participants;
quit;
PG
proc sql;
select * from have
where participant in (select participant from have group by participant
having count(*)>=10);
quit;
Thank you both for the responses! After spending my lunch break researching (not too successfully) SQL I am wondering how to manipulate the data after using an SQL statement(Both of your responses are the first I've seen of SQL in action). I am more familiar with using IF-THEN-DO statements to manipulate Data.
So after the SQL statement you provided, which does manipulate the data how I want it too, is there any way to an another step or give this SQL statement an output name for storing and manipulating further?
Thanks!
Jackie E.
After executing my code, the resulting observations are in dataset Participants.
The select * from participants; statement can be removed as it is only there to provide a printed output.
- PG
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.