## delete obs according to percentage specified

Solved
Regular Contributor
Posts: 190

# delete obs according to percentage specified

Below  code is to get the percent of the obs, I hav 300 obs in test dataset i want to delete certain random oberservations according to the percent provided to my code parameter. means if the delete_freq=20 then 20% of  300 ie from the dataset 60 obs should be deleted and only 240 observations in test1 should be remaining.

Am at a fix to populate a condition to delete the obs according to the delete_freq parameter any help is appreciated.

%macro delobs(delete_freq=)
data test;
set key.test1;
run;

proc sort data= test;
by name;
run;

proc freq data=work.test;
tables name / out=counts (KEEP=name count percent);
run;

data all;
merge counts test;
by name;
if count <= &delete_Freq then delete variable_obs;
run;

%mend;
%delobs(delete_freq=12);

Accepted Solutions
Solution
‎11-25-2016 12:15 AM
Valued Guide
Posts: 589

## Re: delete obs according to percentage specified

Just a simple code that output depending on percentage.

proc sql;

select count(*) Into: OBSCOUNT

from sashelp.class;

quit;

%let pct=80 ; /* Percentage of observation to output */

data test1;

set sashelp.class ;

if _N_=INT((&pct*&obscount)/100)+1 then stop;

run;

Thanks,
Suryakiran

All Replies
Super User
Posts: 9,599

## Re: delete obs according to percentage specified

```%macro delobs(delete_freq=)
data test;
set key.test1;
run;
proc sort data= test;
by name;
run;
proc freq data=work.test;
tables name / out=counts (KEEP=name count percent);
run;
data all;
merge counts test;
by name;
if count > &delete_Freq. then output;
run;
%mend delobs;
%delobs(delete_freq=12);```

Reverse your logic and only output if count > paramter.

PROC Star
Posts: 1,457

## Re: delete obs according to percentage specified

Do you need to delete exactly 60 obs, or would it be okay to give each record a 20% chance of deletion, which could result in more or less records deleted, depending on chance?

Posts: 3,852

## Re: delete obs according to percentage specified

@Quentin I was thinking it would be easier to select 80% of the records using SURVEYSELECT.

Super User
Posts: 23,667

## Re: delete obs according to percentage specified

The datastep in NOT RANDOM. It always get sorted by name and removes all within specified range.

Regular Contributor
Posts: 190

## Re: delete obs according to percentage specified

[ Edited ]

@Quentin i want to randomly delete obs from my data that is in terms of percentage. has mentioned above i hav 300 obs so if my parameter value is 20 so 20% of 300 is 240 so in the output dataset 240 obs should be remaining.

PROC Star
Posts: 1,457

## Re: delete obs according to percentage specified

Agree with the suggestions from others that this is what PROC SURVEYSELECT was made for.  Gone are the days when you might have created a variable with random numbers, sorted, and then selected records yourself.

``````proc surveyselect
data=sashelp.shoes
method=srs
n=240
/*  samprate=.8 */
out=MySample
seed=0
;
run;``````

You can specify the count of records to select, or the sampling rate.  Of course you could wrap that in a macro, which could compute the count you want from the rate, or ...

Regular Contributor
Posts: 190

## Re: delete obs according to percentage specified

@Quentin can't my code be updated to randomly delete certain specific percent of observations from the total observations in the dataset... as i am new to surveyselect don't know its working....
Super User
Posts: 13,502

## Re: delete obs according to percentage specified

[ Edited ]

Your example code does not even look like it was attempting to create a subset of a specific percentage size of the original data. It looks like it was attempting to delete records whose identification variable, Name, appeared fewer than a specified number of times.

So which do you want to do: Select a random subset of a specified number of records or remove very specific identified records that appear with low frequency?

The basic approach you started with will not remove a "random" anything. As a minimum you need a random value somewhere if the goal is actually a random subset. And Surveyselect will not require passing the data through multiple data steps or procedures.

Regular Contributor
Posts: 190

## Re: delete obs according to percentage specified

[ Edited ]

@ballarwd I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...

Super User
Posts: 13,502

## Re: delete obs according to percentage specified

RTelang wrote:
I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...

The SAMPRATE option indicates what percentage of records to select (or keep). A SAMPRATE value less than one such as .8 is keep 80 percent, or if you use a value greater than one it is treated as percentage so Samprate=80 would keep 80 percent of the records.

OR you can actually specifize the number of records to keep with SAMPSIZE if you prefer.

If you insist on using a removal percentage then have your code substract that value from 100 and place that in the SAMPRATE= option.

Regular Contributor
Posts: 190

## Re: delete obs according to percentage specified

@ballward my criteria is i calculate, want to delete a certain percent of obs values from the DS observations.. dont want to keep the specific number of values i.e i want to directly calculate 20% of 300 that is 40 & then delete 40 obs from the total 300 obs tats it....
PROC Star
Posts: 1,457

## Re: delete obs according to percentage specified

That is what surveryselect is doing.  You specify how many records (or what proportion) to select, rather than delete.

If you really want to do it with data step code, this page presents surveyselect and two data step options.

Super User
Posts: 23,667

## Re: delete obs according to percentage specified

RTelang wrote:
@ballward my criteria is i calculate, want to delete a certain percent of obs values from the DS observations.. dont want to keep the specific number of values i.e i want to directly calculate 20% of 300 that is 40 & then delete 40 obs from the total 300 obs tats it....

That's like saying 2 + 3 is different from 3 + 2.

Super User
Posts: 23,667

## Re: delete obs according to percentage specified

RTelang wrote:
@Quentin can't my code be updated to randomly delete certain specific percent of observations from the total observations in the dataset... as i am new to surveyselect don't know its working....

I suggest reading the documentation then.

☑ This topic is solved.