Below code is to get the percent of the obs, I hav 300 obs in test dataset i want to delete certain random oberservations according to the percent provided to my code parameter. means if the delete_freq=20 then 20% of 300 ie from the dataset 60 obs should be deleted and only 240 observations in test1 should be remaining.
Am at a fix to populate a condition to delete the obs according to the delete_freq parameter any help is appreciated.
%macro delobs(delete_freq=)
data test;
set key.test1;
run;
proc sort data= test;
by name;
run;
proc freq data=work.test;
tables name / out=counts (KEEP=name count percent);
run;
data all;
merge counts test;
by name;
if count <= &delete_Freq then delete variable_obs;
run;
%mend;
%delobs(delete_freq=12);
Just a simple code that output depending on percentage.
proc sql;
select count(*) Into: OBSCOUNT
from sashelp.class;
quit;
%let pct=80 ; /* Percentage of observation to output */
data test1;
set sashelp.class ;
if _N_=INT((&pct*&obscount)/100)+1 then stop;
run;
%macro delobs(delete_freq=) data test; set key.test1; run; proc sort data= test; by name; run; proc freq data=work.test; tables name / out=counts (KEEP=name count percent); run; data all; merge counts test; by name; if count > &delete_Freq. then output; run; %mend delobs; %delobs(delete_freq=12);
Reverse your logic and only output if count > paramter.
Do you need to delete exactly 60 obs, or would it be okay to give each record a 20% chance of deletion, which could result in more or less records deleted, depending on chance?
@Quentin I was thinking it would be easier to select 80% of the records using SURVEYSELECT.
The datastep in NOT RANDOM. It always get sorted by name and removes all within specified range.
Look into PROC SURVEYSELECT instead.
@Quentin i want to randomly delete obs from my data that is in terms of percentage. has mentioned above i hav 300 obs so if my parameter value is 20 so 20% of 300 is 240 so in the output dataset 240 obs should be remaining.
Agree with the suggestions from others that this is what PROC SURVEYSELECT was made for. Gone are the days when you might have created a variable with random numbers, sorted, and then selected records yourself.
proc surveyselect
data=sashelp.shoes
method=srs
n=240
/* samprate=.8 */
out=MySample
seed=0
;
run;
You can specify the count of records to select, or the sampling rate. Of course you could wrap that in a macro, which could compute the count you want from the rate, or ...
Your example code does not even look like it was attempting to create a subset of a specific percentage size of the original data. It looks like it was attempting to delete records whose identification variable, Name, appeared fewer than a specified number of times.
So which do you want to do: Select a random subset of a specified number of records or remove very specific identified records that appear with low frequency?
The basic approach you started with will not remove a "random" anything. As a minimum you need a random value somewhere if the goal is actually a random subset. And Surveyselect will not require passing the data through multiple data steps or procedures.
@ballarwd I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...
@RTelang wrote:
I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...
@Quentin's reply with Surveyselect code is your best bet.
The SAMPRATE option indicates what percentage of records to select (or keep). A SAMPRATE value less than one such as .8 is keep 80 percent, or if you use a value greater than one it is treated as percentage so Samprate=80 would keep 80 percent of the records.
OR you can actually specifize the number of records to keep with SAMPSIZE if you prefer.
If you insist on using a removal percentage then have your code substract that value from 100 and place that in the SAMPRATE= option.
That is what surveryselect is doing. You specify how many records (or what proportion) to select, rather than delete.
If you really want to do it with data step code, this page presents surveyselect and two data step options.
http://support.sas.com/kb/24/722.html
@RTelang wrote:
@ballward my criteria is i calculate, want to delete a certain percent of obs values from the DS observations.. dont want to keep the specific number of values i.e i want to directly calculate 20% of 300 that is 40 & then delete 40 obs from the total 300 obs tats it....
That's like saying 2 + 3 is different from 3 + 2.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.