Question about proc survey select

08-31-2009 02:01 PM

I have a data set with 15 observation hours for each of my subjects. I'm trying to use survey select to generate new data sets with decreasing numbers of hours, that I'm then comparing to the total hours. (i.e. what is the correlation between 14 and 15 hours? 13 and 15? 12 and 15? etc.) I have 2 strata, basically age and subject. My problem is that I'd like to keep the same set of hours for each subject, and I can't figure out how to do that. For example, when I generate a set containing 3 hours out of the 15, if the hours for subject 1 are 8, 10, and 12 then I want the hours for subject 2 (and all others) to also be 8, 10 and 12. Is there any way to do this?

Posted in reply to deleted_user

09-01-2009 10:41 AM

> I have a data set with 15 observation hours for each
> of my subjects.

Does the data set produced by the following code model your data?

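[pre]
/* Sample data: 10 subjects, each in one randomly chosen age group of 3, */
/* observed at 15 time points, with randomly drawn values y1, y2, y3     */
proc plan seed=618029071;
   factors
      subjid   = 10 ordered
      agegroup = 1 of 3 random
      time     = 15 ordered
      y1       = 1 of 200
      / noprint;
   treatments y2=1 of 50 y3=1 of 30;
   output out=plan;
   run;
   quit;
proc print;
   run;
[/pre]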

> I'm trying to use survey select to
> generate new data sets with decreasing numbers of
> hours, that I'm then comparing to the total hours.
> (i.e. what is the correlation between 14 and 15
> hours? 13 and 15? 12 and 15? etc.) I have 2 strata,
> basically age and subject. My problem is that I'd
> like to keep the same set of hours for each subject,
> and I can't figure out how to do that. For example,
> when I generate a set containing 3 hours out of the
> 15, if the hours for subject 1 are 8, 10, and 12 then
> I want the hours for subject 2 (and all others) to
> also be 8, 10 and 12. Is there any way to do this?

I don't think SURVEYSELECT is going to work well for this. SURVEYSELECT selects observations from data sets. It sounds like you want to select levels of a variable (TIME). In your example, that amounts to:

[pre]where time in(8,10,12) [/pre]

There are a number of ways to select (k of n) values at random. One is CALL RANPERK:

[pre]
CALL RANPERK Routine
Randomly permutes the values of the arguments, and returns a permutation of k out of n values
[/pre]
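
As a rough illustration of the description above, here is a minimal sketch that draws 3 of the 15 hours with CALL RANPERK. The seed value and the array name h are arbitrary choices made for this example, not anything taken from the real data.

[pre]
data _null_;
   seed = 12345;                    /* arbitrary seed, for illustration only */
   array h[15] (1:15);              /* h1-h15 start out as the hours 1 to 15 */
   call ranperk(seed, 3, of h[*]);  /* the first 3 elements now hold a random 3 of the 15 */
   put 'Hours to keep: ' h1-h3;     /* write the selected hours to the log */
run;
[/pre]

The three values written to the log could then be used in a WHERE or IF condition on the hour variable, so that every subject keeps the same hours.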

Another is the FACTORS statement in PROC PLAN, which I used above to create the sample data.

[pre]
name=m < OF n > < selection-type >

where

name  is a valid SAS name. This gives the name of a factor in the design.
m     is a positive integer that gives the number of values to be selected. If n is specified, the value of m must be less than or equal to n.
n     is a positive integer that gives the number of values to be selected from.
[/pre]
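
To make the syntax above concrete, here is a small sketch that asks PROC PLAN for 3 of 15 hour levels at random. The seed and the output data set name keep_hours are made up for this example.

[pre]
/* Pick 3 of the 15 hour levels at random; the seed is arbitrary */
proc plan seed=20090901;
   factors hour = 3 of 15 random / noprint;
   output out=keep_hours;   /* keep_hours is a hypothetical name */
   run;
   quit;
proc print data=keep_hours;
   run;
[/pre]

Because the hour levels are chosen once, outside of any subject or age grouping, filtering the full data against this list would keep the same hours for every subject.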

There are others; these are the ones I'm most familiar with.

If I am correct, the details of which method(s) might be most appropriate depend on the output you desire. You mentioned correlation. If you describe (with sample data) how the data should look to produce the analysis, this will help refine the solution.

Also, do you want to do this for all (n of m) subsets, and do you want replication? That is, replications of subsets of size n.

Posted in reply to data_null__

09-02-2009 04:53 PM

Thanks for your help.

This is what my data looks like (very similar to what you posted):

[pre]
proc plan;
   factors
      age    = 3 ordered
      subjid = 5 ordered
      hour   = 15 ordered
      y1     = 1 of 200
      / noprint;
   output out=plan;
   run;
   quit;
proc print;
   run;
[/pre]

I'm having some trouble getting sample data output to look like what I want as the end result, but this is close:

[pre]
proc plan;
   factors
      age    = 3 ordered
      subjid = 5 ordered
      hour   = 14 ordered
      y1     = 1 of 200
      / noprint;
   output out=sample1;
   run;
   quit;
proc print;
   run;
[/pre]

What I actually want is a random selection of 14 out of the original 15 hours, instead of hours 1 to 14. My snag is that I want the same hours for each age and subject, so that the output would look like the above if hour 15 was the hour that was randomly chosen to be thrown out.

Another way I've thought to do this is to copy the original data 10 times and add a replication column, so that my input data would look something like:

[pre]
proc plan;
   factors
      age    = 3 ordered
      subjid = 5 ordered
      rep    = 10 ordered
      hour   = 15 ordered
      / noprint;
   output out=sample2;
   run;
   quit;
proc print;
   run;
[/pre]

I would then use a random number generator to select the hours that I want to keep and have SAS delete all the other hours with:

[pre]
data sample2;
   set sample2;
   if rep = 1 and hour = 10 then delete;
   if rep = 2 and hour = 5 then delete;
   if rep = 3 and hour = 14 then delete;
   run;
[/pre]

(Obviously, with the real data I would continue so it included all 10 replicates.)

The problem with this is that it gets very time consuming, since I want to do this, not just for 14 hours, but with 13 hours, 12 hours, 11 hours, all the way down to 1 hour.

I hope that made things clearer and not more confusing. If you have advice for either of these methods I'd really appreciate it.

Posted in reply to deleted_user

09-02-2009 05:58 PM

I think the following may give you what you want. I used RANPERK to get a list of hours to KEEP. I generated IF statements with a data step because it seemed easy enough to do it that way. Writing wallpaper with code.

This produces a sample for each K from 14 down to 1. The variable K will act as a BY variable in your analysis.

Let me know if this does what you want.

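[pre]
dm 'clear log; clear output;';

/* Sample data: 3 ages x 5 subjects x 15 hours, one random y1 per hour */
proc plan seed=767318578;
   factors
      age    = 3 ordered
      subjid = 5 ordered
      hour   = 15 ordered
      y1     = 1 of 200
      / noprint;
   output out=plan;
   run;
   quit;
*proc print;
   run;

/* Temporary file that will hold the generated statements */
filename FT85F001 temp;

/* For each K from 14 down to 1, write an assignment of K and an IF statement */
/* that outputs only the K randomly chosen hours                              */
data hours;
   file FT85F001;
   seed=1046482356;
   array _h[15] (1:15);
   do k = dim(_h)-1 to 1 by -1;
      put +3 k= ';';
      put +3 'if hour in(' @;
      /* after the call, the first K elements hold the hours to keep */
      call ranperk(seed,k,of _h[*]);
      do _n_ = 1 to k;
         put _h[_n_] 3. @;
         end;
      put ') then output;';
      end;
   run;

/* Run the generated statements against every row of PLAN, */
/* so all ages and subjects share the same hours for a given K */
data hoursV / view=hoursV;
   set plan;
   %inc FT85F001 / source2;
   run;

proc sort data=hoursV out=hours;
   by descending k;
   run;
proc print data=_last_(obs=100);
   run;
[/pre]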
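
As one possible way to use K as a BY variable, here is a rough sketch that compares each subset to the full 15 hours. It assumes the quantity of interest is the per-subject mean of y1, which is only a guess at the real analysis. The data set names full, part and both, and the variable names mean15 and meank, are invented for this example.

[pre]
/* Per-subject mean of y1 over all 15 hours (assumed summary statistic) */
proc means data=plan noprint nway;
   class age subjid;
   var y1;
   output out=full mean=mean15;
   run;

/* Per-subject mean of y1 within each subset size K */
proc means data=hours noprint nway;
   by descending k;
   class age subjid;
   var y1;
   output out=part mean=meank;
   run;

/* Attach the 15-hour mean to every (k, age, subjid) row */
proc sql;
   create table both as
   select p.k, p.age, p.subjid, p.meank, f.mean15
   from part as p inner join full as f
     on p.age = f.age and p.subjid = f.subjid
   order by p.k desc;
   quit;

/* Correlation between the K-hour mean and the 15-hour mean, one table per K */
proc corr data=both;
   by descending k;
   var meank mean15;
   run;
[/pre]

Swapping the mean for another statistic should only require changing the two PROC MEANS steps.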

Posted in reply to data_null__

09-03-2009 01:33 PM

Perfect! Thank you so much for your help with this.