Solved: Data Set Analysis- Setting thresholds for inclusion.

JackieEvans85 · Posted 10-31-2014 12:42 PM

Hello Community! I am working with a new data set that has numerous entries for each participant. I want to set a threshold for inclusion such that 10 visits must be present for a participant to be included in my new data set.

After meeting this threshold of number of visits, I need to figure out how many participants met that visit number threshold.

It's a .cvs file I converted from an excel file if that makes any difference

I tried PROC FREQ but couldn't figure out how to do anything with those results. Probably the completely wrong approach but I'm learning SAS on the job so cut me some slack!

Thanks for any advice!

Jackie E.

PGStats · Posted 10-31-2014 01:16 PM

Use SQL for this:

data have;

input Participant;

n = _n_;

datalines;

1000

1001

1002

;

proc sql;

create table participants as

select * from have

group by participant

having count(*) >= 10

order by participant, n;

select * from participants;

quit;

PG

View solution in original post

PGStats · Posted 10-31-2014 12:56 PM

Showing some data (even fake) would make things more concrete. - PG

PG

JackieEvans85 · Posted 10-31-2014 01:08 PM

Basically I need to work with only this column (sorry for the small amount of data, but it takes awhile to reblind the info) and the other columns don't matter for the analysis. I need to find if there are 10 entries for participant #1000 and if not, disregard the info for that participant. So on and so forth. So for this data, it would show that 1000 and 1001 both qualify but 1002 does not have enough entries.

Participant
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1001
1002
1002
1002
1002
1002
1002

Thanks!

Jackie E.

PGStats · Posted 10-31-2014 01:16 PM

Use SQL for this:

data have;

input Participant;

n = _n_;

datalines;

1000

1001

1002

;

proc sql;

create table participants as

select * from have

group by participant

having count(*) >= 10

order by participant, n;

select * from participants;

quit;

PG

stat_sas · Posted 10-31-2014 01:24 PM

proc sql;

select * from have

where participant in (select participant from have group by participant

having count(*)>=10);

quit;

JackieEvans85 · Posted 10-31-2014 02:07 PM

Thank you both for the responses! After spending my lunch break researching (not too successfully) SQL I am wondering how to manipulate the data after using an SQL statement(Both of your responses are the first I've seen of SQL in action). I am more familiar with using IF-THEN-DO statements to manipulate Data.

So after the SQL statement you provided, which does manipulate the data how I want it too, is there any way to an another step or give this SQL statement an output name for storing and manipulating further?

Thanks!

Jackie E.

PGStats · Posted 10-31-2014 02:20 PM

After executing my code, the resulting observations are in dataset Participants.

The select * from participants; statement can be removed as it is only there to provide a printed output.

- PG

PG

Data Set Analysis- Setting thresholds for inclusion.

Re: Data Set Analysis- Setting thresholds for inclusion.

Re: Data Set Analysis- Setting thresholds for inclusion.

Re: Data Set Analysis- Setting thresholds for inclusion.

Re: Data Set Analysis- Setting thresholds for inclusion.

Re: Data Set Analysis- Setting thresholds for inclusion.

Re: Data Set Analysis- Setting thresholds for inclusion.

Re: Data Set Analysis- Setting thresholds for inclusion.

Registration is open

SAS Training: Just a Click Away