- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm dropping duplcated columns from my tables with this code.
Step by step explanation:
I'm creating the table toto through a select on the table dicoat.
Once created, I'm having a datastep
However the while loop is not recognized. for the datastep. The rule should be if there are duplicate columns, then drop the duplicate columns
and do that until the counting is superior or equal to one.
The issue is that the while loop is not recognized. Edited: Is it allowed to put a proc sql within a loop like that?
Why?
Am I doing something wrong?
Thanks
proc sql;
create table toto
as select min(nomvar) as nomvar,count(title) as counting
from dicoat
group by title
having count(title) > 1;
data work.toto;
set toto;
do while(counting>=1);
proc sql;
delete from dicoat where nomvar in (select nomvar from toto);
insert into toto
select min(nomvar) as nomvar,count(title) as counting from dicoat
group by title
having count(title) > 1;
end;
run;
data _null_;
file tempf;
set toto end=lastobs;
if _n_=1 then put "data aat;set aat (drop=";
put var /;
if lastobs then put ");run;";
run;
%inc tempf;
filename tempf clear;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not really seeing what you try to achieve, some sample data can be of help - along with a more extensive description of your problem.
And no, you can't mix SQL and data step in that way. You could call SQL from within a data step using call execute, but I can't tell if that's the right way to go for you. My gut feeling is that there is a simpler way to do this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok LinusH.
I have a table called dicoat.
It will be something like this
data dicoat;
input title$ nomvar$;
cards;
aty var12
zat var13
zat var52
aty var55
zat var98
att var100
;
As you can see, I'm trying to remove the duplicate from dicoat with my code recursively until dicoat has only one nomvar and one title. In the same time, I need to put all the duplcate into a macro variable as I need to use that macrovariable
to drop the columns in the table aat as I've put in my code above.
aat before transformation
var12 var13 var52 var55 var98 var100
aat after columns drop
var12 var13 var100
I hope it is clearer. If not, let me know.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why don't you just build a macro variable with all variables that should be dropped? And then just rewrite you table using a accumulated drop statement.
proc sort data=dicoat;
by title;
run;
data _null_;
set dicoat end=last;
length dropvar $1000;
retain dropvar;
by title;
if not first.title then dropvar = catx(' ',dropvar,nomvar);
if last then call symput('dropvar',trim(dropvar));
run;
options symbolgen;
data aat;
set aat(drop=&DROPVAR.);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks LinusH.
Let me have a look.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Andy,
In the example you showed - I see zat appears twice.
You'd like to only see zat once, correct? In that case, how do you decide which row to delete? var13 or var52?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jberry,
It does not matter , which variables to delete. However, as I have many , I need one to recursively delete them until I have only one.
Does it make sense?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you don't care which row appears, try
proc sql;
create table no_dupes as
select *
from (select *,monotonic() as rn from dicoat) AS rs
group by title
having rn=min(rn);
quit;