BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RTelang
Fluorite | Level 6

Below  code is to get the percent of the obs, I hav 300 obs in test dataset i want to delete certain random oberservations according to the percent provided to my code parameter. means if the delete_freq=20 then 20% of  300 ie from the dataset 60 obs should be deleted and only 240 observations in test1 should be remaining.

Am at a fix to populate a condition to delete the obs according to the delete_freq parameter any help is appreciated.

 


%macro delobs(delete_freq=)
data test;
set key.test1;
run;

proc sort data= test;
by name;
run;

proc freq data=work.test;
tables name / out=counts (KEEP=name count percent);
run;

data all;
merge counts test;
by name;
if count <= &delete_Freq then delete variable_obs;
run;

%mend;
%delobs(delete_freq=12);

 

1 ACCEPTED SOLUTION

Accepted Solutions
SuryaKiran
Meteorite | Level 14

Just a simple code that output depending on percentage.

proc sql;

select count(*) Into: OBSCOUNT

from sashelp.class;

quit;

%let pct=80 ; /* Percentage of observation to output */

data test1;

set sashelp.class ;

if _N_=INT((&pct*&obscount)/100)+1 then stop;

run;

 

Thanks,
Suryakiran

View solution in original post

20 REPLIES 20
RW9
Diamond | Level 26 RW9
Diamond | Level 26
%macro delobs(delete_freq=)
  data test;
    set key.test1;
  run;
  proc sort data= test;
    by name;
  run;
  proc freq data=work.test;
    tables name / out=counts (KEEP=name count percent);
  run;
  data all;
    merge counts test;
    by name;
    if count > &delete_Freq. then output;
  run;
%mend delobs;
%delobs(delete_freq=12);

Reverse your logic and only output if count > paramter.

Quentin
Super User

Do you need to delete exactly 60 obs, or would it be okay to give each record a 20% chance of deletion, which could result in more or less records deleted, depending on chance?

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
data_null__
Jade | Level 19

@Quentin I was thinking it would be easier to select 80% of the records using SURVEYSELECT. 

Reeza
Super User

The datastep in NOT RANDOM. It always get sorted by name and removes all within specified range. 

 

Look into PROC SURVEYSELECT instead. 

RTelang
Fluorite | Level 6

@Quentin i want to randomly delete obs from my data that is in terms of percentage. has mentioned above i hav 300 obs so if my parameter value is 20 so 20% of 300 is 240 so in the output dataset 240 obs should be remaining.

Quentin
Super User

Agree with the suggestions from others that this is what PROC SURVEYSELECT was made for.  Gone are the days when you might have created a variable with random numbers, sorted, and then selected records yourself.

 

proc surveyselect 
  data=sashelp.shoes
  method=srs
  n=240
/*  samprate=.8 */
  out=MySample
  seed=0
;
run;

 

You can specify the count of records to select, or the sampling rate.  Of course you could wrap that in a macro, which could compute the count you want from the rate, or ...

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
RTelang
Fluorite | Level 6
@Quentin can't my code be updated to randomly delete certain specific percent of observations from the total observations in the dataset... as i am new to surveyselect don't know its working....
ballardw
Super User

Your example code does not even look like it was attempting to create a subset of a specific percentage size of the original data. It looks like it was attempting to delete records whose identification variable, Name, appeared fewer than a specified number of times.

 

So which do you want to do: Select a random subset of a specified number of records or remove very specific identified records that appear with low frequency?

 

The basic approach you started with will not remove a "random" anything. As a minimum you need a random value somewhere if the goal is actually a random subset. And Surveyselect will not require passing the data through multiple data steps or procedures.

 

 

RTelang
Fluorite | Level 6

@ballarwd I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...

ballardw
Super User

@RTelang wrote:
I want to delete observations from the dataset based on the delete_freq=20 parameter i.e delete only 20% of the total observations. for example i have 200obs & have value 20 in delete_freq then just delete 20% of total 200 obs i.e any 40 obs will be deleted and only 160 obs will be left...

 

@Quentin's reply with Surveyselect code is your best bet.

The SAMPRATE option indicates what percentage of records to select (or keep). A SAMPRATE value less than one such as .8 is keep 80 percent, or if you use a value greater than one it is treated as percentage so Samprate=80 would keep 80 percent of the records.

OR you can actually specifize the number of records to keep with SAMPSIZE if you prefer.

 

If you insist on using a removal percentage then have your code substract that value from 100 and place that in the SAMPRATE= option.

RTelang
Fluorite | Level 6
@ballward my criteria is i calculate, want to delete a certain percent of obs values from the DS observations.. dont want to keep the specific number of values i.e i want to directly calculate 20% of 300 that is 40 & then delete 40 obs from the total 300 obs tats it....
Quentin
Super User

That is what surveryselect is doing.  You specify how many records (or what proportion) to select, rather than delete.

 

If you really want to do it with data step code, this page presents surveyselect and two data step options.

 

http://support.sas.com/kb/24/722.html

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Reeza
Super User

@RTelang wrote:
@ballward my criteria is i calculate, want to delete a certain percent of obs values from the DS observations.. dont want to keep the specific number of values i.e i want to directly calculate 20% of 300 that is 40 & then delete 40 obs from the total 300 obs tats it....

That's like saying 2 + 3 is different from 3 + 2. 

 

 

Reeza
Super User

@RTelang wrote:
@Quentin can't my code be updated to randomly delete certain specific percent of observations from the total observations in the dataset... as i am new to surveyselect don't know its working....

I suggest reading the documentation then. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 2484 views
  • 8 likes
  • 8 in conversation