I'm dropping duplcated columns from my tables with this code.
Step by step explanation:
I'm creating the table toto through a select on the table dicoat.
Once created, I'm having a datastep
However the while loop is not recognized. for the datastep. The rule should be if there are duplicate columns, then drop the duplicate columns
and do that until the counting is superior or equal to one.
The issue is that the while loop is not recognized. Edited: Is it allowed to put a proc sql within a loop like that?
Why?
Am I doing something wrong?
Thanks
proc sql;
create table toto
as select min(nomvar) as nomvar,count(title) as counting
from dicoat
group by title
having count(title) > 1;
data work.toto;
set toto;
do while(counting>=1);
proc sql;
delete from dicoat where nomvar in (select nomvar from toto);
insert into toto
select min(nomvar) as nomvar,count(title) as counting from dicoat
group by title
having count(title) > 1;
end;
run;
data _null_;
file tempf;
set toto end=lastobs;
if _n_=1 then put "data aat;set aat (drop=";
put var /;
if lastobs then put ");run;";
run;
%inc tempf;
filename tempf clear;
Not really seeing what you try to achieve, some sample data can be of help - along with a more extensive description of your problem.
And no, you can't mix SQL and data step in that way. You could call SQL from within a data step using call execute, but I can't tell if that's the right way to go for you. My gut feeling is that there is a simpler way to do this.
Ok LinusH.
I have a table called dicoat.
It will be something like this
data dicoat;
input title$ nomvar$;
cards;
aty var12
zat var13
zat var52
aty var55
zat var98
att var100
;
As you can see, I'm trying to remove the duplicate from dicoat with my code recursively until dicoat has only one nomvar and one title. In the same time, I need to put all the duplcate into a macro variable as I need to use that macrovariable
to drop the columns in the table aat as I've put in my code above.
aat before transformation
var12 var13 var52 var55 var98 var100
aat after columns drop
var12 var13 var100
I hope it is clearer. If not, let me know.
Why don't you just build a macro variable with all variables that should be dropped? And then just rewrite you table using a accumulated drop statement.
proc sort data=dicoat;
by title;
run;
data _null_;
set dicoat end=last;
length dropvar $1000;
retain dropvar;
by title;
if not first.title then dropvar = catx(' ',dropvar,nomvar);
if last then call symput('dropvar',trim(dropvar));
run;
options symbolgen;
data aat;
set aat(drop=&DROPVAR.);
run;
Thanks LinusH.
Let me have a look.
Andy,
In the example you showed - I see zat appears twice.
You'd like to only see zat once, correct? In that case, how do you decide which row to delete? var13 or var52?
Hi Jberry,
It does not matter , which variables to delete. However, as I have many , I need one to recursively delete them until I have only one.
Does it make sense?
If you don't care which row appears, try
proc sql;
create table no_dupes as
select *
from (select *,monotonic() as rn from dicoat) AS rs
group by title
having rn=min(rn);
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.