Hello,
I have been using the dataset Class from sashelp.class as data source to illustrate what I would like to do.
So, here's the case scenario. We have many datasets, from dataset1 to dataset4. In each dataset, we have 5 records except the last one (dataset4) which has 4 records.
The dataset class3 is generated randomly to provide a list of records to be deleted from the original dataset, and we also need to keep a trace of the information that was deleted somewhere else. (dataset1A -- dataset4A).
Also, I would like to open a dataset, delete all the requested records in one shot then closed it.
So, I wonder if the script below is deleting all the requested records per dataset in one shot.
data lookup;
input name $8. dsname $9.;
datalines;
Alfred dataset1
Alice dataset1
Barbara dataset1
Carol dataset1
Henry dataset1
James dataset2
Jane dataset2
Janet dataset2
Jeffrey dataset2
John dataset2
Joyce dataset3
Judy dataset3
Louise dataset3
Mary dataset3
Philip dataset3
Robert dataset4
Ronald dataset4
Thomas dataset4
William dataset4
;
run;
data class;
set sashelp.class;
run;
proc sql;
create table class2 as
select a.*,
b.dsname
from class as a
inner join lookup as b
on(a.name=b.name);
quit;
%macro multipleds;
%do j=1 %to 4;
%let name=%sysfunc(cats(dataset,&j));
%put &name.;
Data &name.;
set class2;
if compress(dsname) eq %tslit(dataset&j.) then output dataset&j.;
run;
%end;
%mend multipleds;
%multipleds;
data temp (keep=name);
if 0 then set SASHELP.CLASS nobs=NOBS;
do I=1 to 10;
N=rand('uniform',1,NOBS);
OBS=N;
set SASHELP.CLASS point=N;
output;
end ;
stop ;
run;
/**** class3 contains the list of records to be deleted ****/
proc sql;
create table class3 as
select a.*
from lookup as a
inner join temp as b
on(a.name=b.name);
quit;
/********* put deleted records into new datasets *************/
data _null_;
set class3;
dsname2=cats(dsname,"A");
call execute
(compbl(cat(
"proc sql;",
"create table ", dsname2, " as ",
"select a.* ",
"from ",dsname," as a ",
"where name in (select name from class3) ",
"having dsname eq ",%tslit(dsname),";quit;"
)));
run;
/********* Deleting few records per datasets *************/
data _null_;
set class3;
call execute
(compbl(cat(
"proc sql;",
"delete ",
"from ",dsname,
"where name in (select name from class3 having dsname eq ",%tslit(dsname),");quit;"
)));
run;
%macro multipleds2;
%do j=1 %to 4;
%let name=%sysfunc(cats(dataset,&j));
%put &name.;
Data &name.;
set &name.;
run;
%end ;
%mend multipleds2;
%multipleds2;
So, I wonder if the script below is deleting all the requested records per dataset in one shot.
You could try it and see. Maxim 4
I'm not quite sure I know what you want. This would be my take on what I think is going on.
The first data step combines data sets to filter and adds the name of the source data set, library and data set, for each record.
The sorts are to get the combined and temp set into the same order.
The last data step uses the In= options to add temporary variables that indicate which data sets contribute to the current observation when merged and then use conditional logic to write to two different data sets. The removed has the data that matches the Temp data. This does assume that one of the data sets (really should be temp) does not have repeats of the matching variable(s).
data combined; set set1 set2 set3 set4 indsname=dsn; source = dsn; run; proc sort data=combined; by name; /* what ever matching criteria */ run; proc sort data=temp; by name; run; data removed kept ; merge combined (in=in1) temp (in=in2) ; by name; /* same matching criteria as sorts*/ if in2 then output removed; else output kept; run;
If you don't want the Source variable in the Kept data set add the option (drop=source).
Imagine that we have a list of policies to delete from the original datasets (removepolicieslst)
Removepolicieslst look like that:
PolicyNumber dataset path …
A1111001 Dataset1 p:/path1
A1111011 Dataset1 p:/path1
A1111101 Dataset1 p:/path1
A1111051 Dataset2 p:/path1
A1111061 Dataset2 p:/path1
And so on….
Now we want to remove from dataset1, all the transactions associated to the policyNumber A1111001, A1111011, A1111101 in one shot, then send this information into the dataset1A, instead of opening dataset1, remove the records associated to A1111001, then update the dataset1 and at the same time send the information to dataset1A. Do the same steps for the transactions associated to A1111011 and to A1111101.
A global dataset containing all the information on thousands of policies is not possible.
So my question is, with the example done with sashelp.class are we removing the transaction from dataset1, then dataset2, then dataset3 thereafter the dataset4.
"in one shot" ? Depends a lot on what you mean by that.
It is a little hard to tell what part of your code is setting up the example and which part is the actual attempt to produce the output from that example input.
Let's assume that LOOKUP was your attempt to make datasets named DATASET1 to DATASET4 from SASHELP.CLASS. So something like:
data dataset1 dataset2 dataset3 dataset4;
set sashelp.class;
if findw('Alfred Alice Barbara Carol Henry',Name,' ','st') then output dataset1 ;
if findw('James Jane Janet Jeffrey John',Name,' ','st') then output dataset2 ;
if findw('Joyce Judy Louise Mary Philip',Name,' ','st') then output dataset3 ;
if findw('Robert Ronald Thomas William',Name,' ','st') then output dataset4 ;
run;
And that TEMP is your attempt to make a list of keys to be excluded. So something like:
data temp;
do i=1 to 10;
p=rand('uniform',1,nobs);
set sashelp.class point=p nobs=nobs;
output;
end;
stop;
keep name;
run;
proc sort;
by name;
run;
Does that dataset actually need to have DSNAME variable? Why? Could the same NAME appear in more than one of the datasets? Let's just assume it is not important yet.
So to then split your four input datasets into eight output datasets you might so something like:
data in1 in2 in3 in4 out1 out2 out3 out4;
merge dataset1(in=in1) dataset2(in=in2) dataset3(in=in3) dataset4(in=in4) temp(in=exclude);
by name;
if in1 then if exclude then output out1; else output in1;
if in2 then if exclude then output out2; else output in2;
if in3 then if exclude then output out3; else output in3;
if in4 then if exclude then output out4; else output in4;
run;
Result will look something like this:
179 data in1 in2 in3 in4 out1 out2 out3 out4; 180 merge dataset1(in=in1) dataset2(in=in2) dataset3(in=in3) dataset4(in=in4) temp(in=exclude); 181 by name; 182 if in1 then if exclude then output out1; else output in1; 183 if in2 then if exclude then output out2; else output in2; 184 if in3 then if exclude then output out3; else output in3; 185 if in4 then if exclude then output out4; else output in4; 186 run; NOTE: There were 5 observations read from the data set WORK.DATASET1. NOTE: There were 5 observations read from the data set WORK.DATASET2. NOTE: There were 5 observations read from the data set WORK.DATASET3. NOTE: There were 4 observations read from the data set WORK.DATASET4. NOTE: There were 10 observations read from the data set WORK.TEMP. NOTE: The data set WORK.IN1 has 3 observations and 5 variables. NOTE: The data set WORK.IN2 has 2 observations and 5 variables. NOTE: The data set WORK.IN3 has 4 observations and 5 variables. NOTE: The data set WORK.IN4 has 3 observations and 5 variables. NOTE: The data set WORK.OUT1 has 3 observations and 5 variables. NOTE: The data set WORK.OUT2 has 5 observations and 5 variables. NOTE: The data set WORK.OUT3 has 1 observations and 5 variables. NOTE: The data set WORK.OUT4 has 1 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.04 seconds
So if you know there are FOUR datasets to read in and EIGHT datasets to write out then a few well placed %DO loops will allow you to generate that code pattern. So perhaps something like this:
%macro exclude
(dslist
,excludeds
);
%local i n dsname;
%let n=%sysfunc(countw(&dslist,%str( )));
data
%do i=1 %to &n ;
%let dsname=%scan(&dslist,&i,%str( ));
&dsname._out &dsname._in
%end;
;
merge
%do i=1 %to &n ;
%let dsname=%scan(&dslist,&i,%str( ));
&dsname(in=in&i)
%end;
&excludeds(in=exclude)
;
by NAME;
%do i=1 %to &n ;
%let dsname=%scan(&dslist,&i,%str( ));
if in&i then if exclude then output &dsname._out; else output &dsname._in;
%end;
run;
%mend exclude;
Let's try it:
249 options mprint; 250 %exclude(dataset1 dataset2 dataset3 dataset4,temp); MPRINT(EXCLUDE): data dataset1_out dataset1_in dataset2_out dataset2_in dataset3_out dataset3_in dataset4_out dataset4_in ; MPRINT(EXCLUDE): merge dataset1(in=in1) dataset2(in=in2) dataset3(in=in3) dataset4(in=in4) temp(in=exclude) ; MPRINT(EXCLUDE): by NAME; MPRINT(EXCLUDE): if in1 then if exclude then output dataset1_out; MPRINT(EXCLUDE): else output dataset1_in; MPRINT(EXCLUDE): if in2 then if exclude then output dataset2_out; MPRINT(EXCLUDE): else output dataset2_in; MPRINT(EXCLUDE): if in3 then if exclude then output dataset3_out; MPRINT(EXCLUDE): else output dataset3_in; MPRINT(EXCLUDE): if in4 then if exclude then output dataset4_out; MPRINT(EXCLUDE): else output dataset4_in; MPRINT(EXCLUDE): run; NOTE: There were 5 observations read from the data set WORK.DATASET1. NOTE: There were 5 observations read from the data set WORK.DATASET2. NOTE: There were 5 observations read from the data set WORK.DATASET3. NOTE: There were 4 observations read from the data set WORK.DATASET4. NOTE: There were 10 observations read from the data set WORK.TEMP. NOTE: The data set WORK.DATASET1_OUT has 3 observations and 5 variables. NOTE: The data set WORK.DATASET1_IN has 3 observations and 5 variables. NOTE: The data set WORK.DATASET2_OUT has 5 observations and 5 variables. NOTE: The data set WORK.DATASET2_IN has 2 observations and 5 variables. NOTE: The data set WORK.DATASET3_OUT has 1 observations and 5 variables. NOTE: The data set WORK.DATASET3_IN has 4 observations and 5 variables. NOTE: The data set WORK.DATASET4_OUT has 1 observations and 5 variables. NOTE: The data set WORK.DATASET4_IN has 3 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.04 seconds
Instead of randomly generating a list of delete requests in dataset CLASS3. I just made an arbitrary dataset DELETE_GROUP below. The main point is that you can let SAS do all the logical work using the WHERE constraints in the second DATA step below. Just make a hash object out of DELETE_GROUP keyed on name and dsname.
data delete_group;
input dsname $ name $;
datalines;
DATASET1 Barbara
DATASET1 Henry
DATASET2 James
DATASET2 Jeffrey
DATASET3 Mary
DATASET3 Philip
DATASET4 Thomas
;
run;
data keep_dataset1 (where=(dsname='DATASET1' and delete_code=0))
keep_dataset2 (where=(dsname='DATASET2' and delete_code=0))
keep_dataset3 (where=(dsname='DATASET3' and delete_code=0))
keep_dataset4 (where=(dsname='DATASET4' and delete_code=0))
del_dataset1 (where=(dsname='DATASET1' and delete_code=1))
del_dataset2 (where=(dsname='DATASET2' and delete_code=1))
del_dataset3 (where=(dsname='DATASET3' and delete_code=1))
del_dataset4 (where=(dsname='DATASET4' and delete_code=1))
;
set dataset1-dataset4 indsname=dsn;
dsname=scan(dsn,2);
if _n_=1 then do;
declare hash h (dataset:'delete_group');
h.definekey('name','dsname');
h.definedone();
end;
delete_code=(h.check()=0);
run;
And you don't have to macro-ize the logic to make it more dynamic. Once you have the DELETE_GROUPS dataset, you can use PROC SQL to make some macro vars naming the input and output datasets.
proc sql noprint;
select distinct dsname into :dsname_list separated by ' ' from delete_group;
select distinct cats('keep_',dsname)||' (where=(dsname='||quote(dsname)||' and delete_code=0))'
into :keep_list separated by ' ' from delete_group;
select distinct cats('del_',dsname)||' (where=(dsname='||quote(dsname)||' and delete_code=1))'
into :delete_list separated by ' ' from delete_group;
quit;
%put &=dsname_list;
%put &=keep_list;
%put &=delete_list;
data &keep_list &delete_list ;
set &dsname_list indsname=dsn;
dsname=scan(dsn,2);
if _n_=1 then do;
declare hash h (dataset:'delete_group');
h.definekey('name','dsname');
h.definedone();
end;
delete_code=(h.check()=0);
run;
@alepage You've got already solution proposals so just adding a few comments.
SAS is not a database and it does not support cascading deletes.
It's rarely done with SAS tables but you could define foreign key constraints to enforce that you never delete a master record while there are still transactions linked to it. One reason such constraints aren't used often with SAS tables is that it requires deletion directly on the source table and deletions are only logical meaning the table doesn't reduce in size.
If you need things like cascading deletes then I'd recommend to store the data in a database that supports such functionality.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.