DATA Step, Macro, Functions and more

delete obs according to percentage specified

Accepted Solution Solved
Reply
Regular Contributor
Posts: 190
Accepted Solution

delete obs according to percentage specified

Below  code is to get the percent of the obs, I hav 300 obs in test dataset i want to delete certain random oberservations according to the percent provided to my code parameter. means if the delete_freq=20 then 20% of  300 ie from the dataset 60 obs should be deleted and only 240 observations in test1 should be remaining.

Am at a fix to populate a condition to delete the obs according to the delete_freq parameter any help is appreciated.

 


%macro delobs(delete_freq=)
data test;
set key.test1;
run;

proc sort data= test;
by name;
run;

proc freq data=work.test;
tables name / out=counts (KEEP=name count percent);
run;

data all;
merge counts test;
by name;
if count <= &delete_Freq then delete variable_obs;
run;

%mend;
%delobs(delete_freq=12);

 


Accepted Solutions
Solution
‎11-25-2016 12:15 AM
Frequent Contributor
Posts: 123

Re: delete obs according to percentage specified

Just a simple code that output depending on percentage.

proc sql;

select count(*) Into: OBSCOUNT

from sashelp.class;

quit;

%let pct=80 ; /* Percentage of observation to output */

data test1;

set sashelp.class ;

if _N_=INT((&pct*&obscount)/100)+1 then stop;

run;

 

Thanks,
Suryakiran

View solution in original post


All Replies
Super User
Super User
Posts: 7,401

Re: delete obs according to percentage specified

%macro delobs(delete_freq=)
  data test;
    set key.test1;
  run;
  proc sort data= test;
    by name;
  run;
  proc freq data=work.test;
    tables name / out=counts (KEEP=name count percent);
  run;
  data all;
    merge counts test;
    by name;
    if count > &delete_Freq. then output;
  run;
%mend delobs;
%delobs(delete_freq=12);

Reverse your logic and only output if count > paramter.

PROC Star
Posts: 1,231

Re: delete obs according to percentage specified

Do you need to delete exactly 60 obs, or would it be okay to give each record a 20% chance of deletion, which could result in more or less records deleted, depending on chance?

Respected Advisor
Posts: 3,777

Re: delete obs according to percentage specified

@Quentin I was thinking it would be easier to select 80% of the records using SURVEYSELECT. 

Super User
Posts: 17,819

Re: delete obs according to percentage specified

The datastep in NOT RANDOM. It always get sorted by name and removes all within specified range. 

 

Look into PROC SURVEYSELECT instead. 

Regular Contributor
Posts: 190

Re: delete obs according to percentage specified

[ Edited ]

@Quentin i want to randomly delete obs from my data that is in terms of percentage. has mentioned above i hav 300 obs so if my parameter value is 20 so 20% of 300 is 240 so in the output dataset 240 obs should be remaining.

PROC Star
Posts: 1,231

Re: delete obs according to percentage specified

Agree with the suggestions from others that this is what PROC SURVEYSELECT was made for.  Gone are the days when you might have created a variable with random numbers, sorted, and then selected records yourself.

 

proc surveyselect 
  data=sashelp.shoes
  method=srs
  n=240
/*  samprate=.8 */
  out=MySample
  seed=0
;
run;

 

You can specify the count of records to select, or the sampling rate.  Of course you could wrap that in a macro, which could compute the count you want from the rate, or ...

Regular Contributor
Posts: 190

Re: delete obs according to percentage specified

@Quentin can't my code be updated to randomly delete certain specific percent of observations from the total observations in the dataset... as i am new to surveyselect don't know its working....
Super User
Posts: 10,500

Re: delete obs according to percentage specified

[ Edited ]

Your example code does not even look like it was attempting to create a subset of a specific percentage size of the original data. It looks like it was attempting to delete records whose identification variable, Name, appeared fewer than a specified number of times.

 

So which do you want to do: Select a random subset of a specified number of records or remove very specific identified records that appear with low frequency?

 

The basic approach you started with will not remove a "random" anything. As a minimum you need a random value somewhere if the goal is actually a random subset. And Surveyselect will not require passing the data through multiple data steps or procedures.

 

 

Regular Contributor
Posts: 190

Re: delete obs according to percentage specified

[ Edited ]

@ballarwd I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...

Super User
Posts: 10,500

Re: delete obs according to percentage specified


RTelang wrote:
I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...

 

@Quentin's reply with Surveyselect code is your best bet.

The SAMPRATE option indicates what percentage of records to select (or keep). A SAMPRATE value less than one such as .8 is keep 80 percent, or if you use a value greater than one it is treated as percentage so Samprate=80 would keep 80 percent of the records.

OR you can actually specifize the number of records to keep with SAMPSIZE if you prefer.

 

If you insist on using a removal percentage then have your code substract that value from 100 and place that in the SAMPRATE= option.

Regular Contributor
Posts: 190

Re: delete obs according to percentage specified

@ballward my criteria is i calculate, want to delete a certain percent of obs values from the DS observations.. dont want to keep the specific number of values i.e i want to directly calculate 20% of 300 that is 40 & then delete 40 obs from the total 300 obs tats it....
PROC Star
Posts: 1,231

Re: delete obs according to percentage specified

That is what surveryselect is doing.  You specify how many records (or what proportion) to select, rather than delete.

 

If you really want to do it with data step code, this page presents surveyselect and two data step options.

 

http://support.sas.com/kb/24/722.html

Super User
Posts: 17,819

Re: delete obs according to percentage specified


RTelang wrote:
@ballward my criteria is i calculate, want to delete a certain percent of obs values from the DS observations.. dont want to keep the specific number of values i.e i want to directly calculate 20% of 300 that is 40 & then delete 40 obs from the total 300 obs tats it....

That's like saying 2 + 3 is different from 3 + 2. 

 

 

Super User
Posts: 17,819

Re: delete obs according to percentage specified


RTelang wrote:
@Quentin can't my code be updated to randomly delete certain specific percent of observations from the total observations in the dataset... as i am new to surveyselect don't know its working....

I suggest reading the documentation then. 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 521 views
  • 8 likes
  • 8 in conversation