10-21-2016 06:37 AM
The code below is meant to delete a percentage of the observations. I have 300 obs in the test dataset, and I want to delete random observations according to the percentage passed to my code's parameter. For example, if delete_freq=20, then 20% of 300 (60 obs) should be deleted, and only 240 observations should remain in test1.
I am stuck on writing the condition that deletes obs according to the delete_freq parameter. Any help is appreciated.
proc sort data=test; by name; run;
proc freq data=work.test;
tables name / out=counts (KEEP=name count percent);
run;
data all;
merge counts test; by name;
if count <= &delete_Freq then delete;
run;
10-21-2016 04:19 PM
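Just some simple code that outputs observations depending on a percentage.
proc sql noprint;
select count(*) into :OBSCOUNT from sashelp.class;
quit;
%let pct=80 ; /* Percentage of observation to output */
data want; /* output dataset name is an assumption */
set sashelp.class ;
if _N_=INT((&pct*&obscount)/100)+1 then stop;
run;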
10-21-2016 07:20 AM
%macro delobs(delete_freq=);
data test;
set key.test1;
run;
proc sort data=test;
by name;
run;
proc freq data=work.test;
tables name / out=counts (KEEP=name count percent);
run;
data all;
merge counts test;
by name;
if count > &delete_Freq. then output;
run;
%mend delobs;
%delobs(delete_freq=12);
Reverse your logic and only output if count > parameter.
10-21-2016 08:34 AM
Do you need to delete exactly 60 obs, or would it be okay to give each record a 20% chance of deletion, which could result in more or fewer records deleted, depending on chance?
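The second option, where each record independently has a 20% chance of deletion, can be sketched in a single data step. This is only an illustration: the dataset names test and test1 follow the question, while the delete_freq macro variable and the seed value are assumptions.

```sas
%let delete_freq = 20;   /* percent of rows to drop, as in the question */

data test1;
  set test;
  /* seed value is an arbitrary assumption; fix it for repeatable results */
  if _n_ = 1 then call streaminit(2016);
  /* keep the row unless its uniform draw falls in the deletion band */
  if rand("uniform") >= &delete_freq / 100;
run;
```

With 300 input rows this removes about 60 on average, but the exact count varies from run to run unless the seed is fixed.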
10-21-2016 09:48 AM - edited 10-21-2016 09:49 AM
@Quentin I want to randomly delete obs from my data, as a percentage. As mentioned above, I have 300 obs, so if my parameter value is 20, then 20% of 300 is 60 obs to delete, and 240 obs should remain in the output dataset.
10-21-2016 10:39 AM
Agree with the suggestions from others that this is what PROC SURVEYSELECT was made for. Gone are the days when you might have created a variable with random numbers, sorted, and then selected records yourself.
proc surveyselect data=sashelp.shoes
  method=srs
  n=240 /* samprate=.8 */
  out=MySample
  seed=0 ;
run;
You can specify the count of records to select, or the sampling rate. Of course you could wrap that in a macro, which could compute the count you want from the rate, or ...
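A wrapper macro along those lines might look like the sketch below. The macro name keep_sample and its parameters are hypothetical, and rounding the keep count down is just one possible choice.

```sas
%macro keep_sample(data=, out=, delete_freq=, seed=0);
  %local nobs keep;

  /* count the rows in the input dataset */
  proc sql noprint;
    select count(*) into :nobs trimmed from &data;
  quit;

  /* rows to keep; rounded down when the result is not a whole number */
  %let keep = %sysevalf(&nobs * (100 - &delete_freq) / 100, floor);

  /* simple random sample of exactly &keep rows */
  proc surveyselect data=&data method=srs n=&keep out=&out seed=&seed;
  run;
%mend keep_sample;

/* example call using the dataset names from the question */
%keep_sample(data=test, out=test1, delete_freq=20);
```

For 300 rows and delete_freq=20 the computed keep count is 240, so 60 rows are left out of test1.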
10-21-2016 10:53 AM
10-21-2016 11:32 AM - edited 10-21-2016 11:50 AM
Your example code does not even look like it was attempting to create a subset of a specific percentage size of the original data. It looks like it was attempting to delete records whose identification variable, Name, appeared fewer than a specified number of times.
So which do you want to do: Select a random subset of a specified number of records or remove very specific identified records that appear with low frequency?
The basic approach you started with will not remove a "random" anything. As a minimum you need a random value somewhere if the goal is actually a random subset. And Surveyselect will not require passing the data through multiple data steps or procedures.
10-21-2016 11:51 AM - edited 10-21-2016 12:05 PM
@ballarwd I want to delete observations from the dataset based on the delete_freq=20 parameter, i.e. delete only 20% of the total observations. For example, if I have 200 obs and the value 20 in delete_freq, then just delete 20% of the 200 obs, i.e. any 40 obs will be deleted and only 160 obs will be left...
10-21-2016 11:58 AM
@Quentin's reply with Surveyselect code is your best bet.
The SAMPRATE= option indicates what proportion of records to select (that is, keep). A SAMPRATE= value less than one, such as .8, means keep 80 percent. A value greater than one is treated as a percentage, so SAMPRATE=80 would also keep 80 percent of the records.
Or you can specify the number of records to keep with SAMPSIZE= if you prefer.
If you insist on using a removal percentage, then have your code subtract that value from 100 and place the result in the SAMPRATE= option.
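As a rough sketch of that subtraction, assuming macro variables named delete_freq and keep_rate (both hypothetical) and the dataset names from the question:

```sas
%let delete_freq = 20;                                    /* percent to remove */
%let keep_rate   = %sysevalf((100 - &delete_freq) / 100); /* 0.8, a proportion */

proc surveyselect data=test method=srs samprate=&keep_rate
                  out=test1 seed=0;
run;
```

Dividing by 100 passes the rate as a proportion. That avoids the ambiguity where a keep percentage below 1 (say 0.5 percent) would be read by SAMPRATE= as a proportion of one half.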
10-21-2016 12:25 PM
10-21-2016 12:55 PM
That is what SURVEYSELECT is doing. You specify how many records (or what proportion) to select, rather than delete.
If you really want to do it with data step code, this page presents surveyselect and two data step options.
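One data step route, not necessarily either of the two on that page, is the older approach of attaching a random number to each row, sorting by it, and dropping the first rows. The sketch below assumes the dataset names from the question. The work dataset _shuffled, the variable _u, and the seed are made up for illustration.

```sas
%let delete_freq = 20;   /* percent of rows to drop */

/* attach a uniform random number to every row */
data _shuffled;
  set test;
  if _n_ = 1 then call streaminit(2016);   /* arbitrary seed, an assumption */
  _u = rand("uniform");
run;

/* sorting by the random number puts the rows in random order */
proc sort data=_shuffled;
  by _u;
run;

/* skip the first delete_freq percent of the shuffled rows, keep the rest */
data test1;
  set _shuffled nobs=total;
  if _n_ > round(total * &delete_freq / 100);
  drop _u;
run;
```

Unlike the per-record chance approach, this removes an exact count (60 of 300 for delete_freq=20), at the cost of an extra sort and of leaving test1 in shuffled order.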
10-21-2016 01:17 PM
@ballward my criterion is that I want to delete a calculated percentage of the observations in the dataset. I don't want to specify the number of observations to keep. That is, I want to directly calculate 20% of 300, which is 60, and then delete 60 obs from the total 300 obs. That's it.
That's like saying 2 + 3 is different from 3 + 2.
10-21-2016 11:54 AM
@Quentin can't my code be updated to randomly delete a specific percentage of the observations in the dataset? I am new to SURVEYSELECT and don't know how it works.
I suggest reading the documentation then.