Subsetting random observations


08-12-2008 02:41 AM

Hi,

I have a data set in which I want to subset a sample with a condition. From that subset I want to pull out 1.5% of the observations at random and take the average of one variable.

I could do the first part of this task, pulling the subset with a condition (using a WHERE clause).

I'm trying to figure out how I can pull 1.5% of the observations at random.

After that, I think I can use the AVG function to get the mean.

I appreciate your help!



Posted in reply to deleted_user

08-12-2008 02:55 AM

Hi.

Check out the SURVEYSELECT procedure if you have a licence for SAS/STAT.
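A minimal PROC SURVEYSELECT sketch, assuming SAS/STAT is licensed. The data set names work.subset (the observations that already passed the WHERE condition) and work.sample are hypothetical, as is the seed value; SAMPRATE=0.015 requests a 1.5% simple random sample drawn without replacement.

```sas
/* Hypothetical names: work.subset is the filtered input, work.sample the output. */
proc surveyselect data=work.subset
                  out=work.sample
                  method=srs        /* simple random sampling without replacement */
                  samprate=0.015    /* sampling rate of 1.5% */
                  seed=12345;       /* fixed seed so the same sample can be reproduced */
run;
```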

If not, one usual way is to code:

1) a DATA step that creates a random variable (using the RANUNI or RAND function);

2) a sort of the new data set by this random variable;

3) a step that keeps only the first x observations of the sorted data set.

This last approach is just like shuffling cards.
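The three steps can be sketched as follows. Assumptions: the input is the hypothetical data set work.subset, RAND('UNIFORM') with CALL STREAMINIT is used rather than RANUNI, and x is taken as 1.5% of the observation count, rounded up.

```sas
/* Step 1: attach a uniform random number to every observation. */
data work.shuffled;
   if _n_ = 1 then call streaminit(12345);   /* fixed seed so the shuffle is reproducible */
   set work.subset;                          /* hypothetical input: the filtered subset */
   u = rand('uniform');                      /* random sort key */
run;

/* Step 2: sort by the random number ("shuffle the cards"). */
proc sort data=work.shuffled;
   by u;
run;

/* Step 3: keep only the first x observations of the shuffled data. */
data work.sample(drop=n_keep u);
   n_keep = ceil(0.015 * nobs);      /* NOBS= is filled in at compile time, before SET runs */
   if _n_ > n_keep then stop;        /* end the step once x observations have been written */
   set work.shuffled nobs=nobs;
run;
```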

Olivier



Posted in reply to Olivier

08-12-2008 03:28 AM

You can also use a subsetting IF in the first DATA step (on the new RANUNI variable). Then there is no need to sort, and no need to read the data a third time.
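A sketch of this one-pass version, assuming the same hypothetical input work.subset and RAND in place of RANUNI. Because every observation is kept independently with probability 0.015, the number of observations kept is itself random and is only about 1.5% of the input on average; the sort-based or counting approaches are needed when an exact count matters.

```sas
/* One pass, no sort: keep each observation with probability 0.015. */
data work.sample;
   if _n_ = 1 then call streaminit(12345);   /* fixed seed for reproducibility */
   set work.subset;                          /* hypothetical input: the filtered subset */
   if rand('uniform') < 0.015;               /* subsetting IF: keeps roughly 1.5% */
run;
```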

Regards,

Linus



Posted in reply to deleted_user

08-12-2008 06:39 AM

The code below (second data step) is almost what SAS provides in the training for the SAS certification.

HTH

Patrick

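data work.rsubset;
   /* The input data set name and the WHERE= condition were lost from */
   /* the original post; replace the placeholders with your own.      */
   set <your data set>(where=(<your condition>));
run;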

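data work.rsubset(drop=obsleft sampsize);
   /* sampsize=10; */               /* alternative: a fixed sample size */
   sampsize=ceil(totobs*0.015);     /* 1.5% of all observations, rounded up */
   obsleft=totobs;                  /* observations not yet considered */
   do while(sampsize>0);
      pickit+1;                     /* next observation number to consider */
      /* select it with probability (still needed) / (still left) */
      if ranuni(0)<sampsize/obsleft then do;
         set sasuser.revenue point=pickit nobs=totobs;
         output;
         sampsize=sampsize-1;
      end;
      obsleft=obsleft-1;
   end;
   stop;                            /* POINT= never hits end of file, so stop explicitly */
run;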
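To finish the original task, the average of the sampled variable can be computed with PROC MEANS, or with the AVG summary function in PROC SQL as the question suggested. The analysis variable amount is hypothetical; substitute the variable you want to average.

```sas
/* Hypothetical analysis variable: amount. */
proc means data=work.rsubset n mean;
   var amount;
run;

/* The same mean via the AVG summary function in PROC SQL. */
proc sql;
   select avg(amount) as mean_amount
      from work.rsubset;
quit;
```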



Posted in reply to deleted_user

08-14-2012 08:47 AM

San:

Are you asking for 1.5% randomly selected from the entire data set, or do you want 1.5% of the subset that meets a given condition (you said you want to "subset a sample with a condition")? If it's the latter, then your question hasn't been answered yet.
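If the latter is meant, one way is to apply the condition before sampling, so that the 1.5% rate refers only to the qualifying observations. A sketch with PROC SURVEYSELECT; the input data set work.mydata and the condition region = 'EAST' are hypothetical.

```sas
/* Filter first (WHERE= data set option), then sample 1.5% of what remains. */
proc surveyselect data=work.mydata(where=(region = 'EAST'))   /* hypothetical data and condition */
                  out=work.sample
                  method=srs
                  samprate=0.015
                  seed=12345;
run;
```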


Posted in reply to deleted_user

08-14-2012 02:41 PM

We should also ask whether the subset is selected with or without replacement. For a small subset of a large data set it probably does not matter much, but...
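For sampling with replacement, PROC SURVEYSELECT offers METHOD=URS. Without SAS/STAT, a DATA step can draw random observation numbers and read them with POINT=, so the same observation can be picked more than once. A sketch, assuming the hypothetical input work.subset and a draw count of 1.5% of its observations, rounded up.

```sas
/* Sampling WITH replacement: random row numbers read via POINT=. */
data work.sample_wr;
   drop i n_draw;
   if _n_ = 1 then call streaminit(12345);   /* fixed seed for reproducibility */
   n_draw = ceil(0.015 * nobs);              /* number of draws: 1.5%, rounded up */
   do i = 1 to n_draw;
      pick = ceil(nobs * rand('uniform'));   /* random row 1..nobs; rows may repeat */
      set work.subset point=pick nobs=nobs;  /* hypothetical input data set */
      output;
   end;
   stop;                                     /* required with POINT= to end the step */
run;
```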