Re: drop the duplicate columns with sas

andy_wk · Posted 06-10-2014 03:47 AM

I'm dropping duplcated columns from my tables with this code.

Step by step explanation:

I'm creating the table toto through a select on the table dicoat.

Once created, I'm having a datastep

However the while loop is not recognized. for the datastep. The rule should be if there are duplicate columns, then drop the duplicate columns

and do that until the counting is superior or equal to one.

The issue is that the while loop is not recognized. Edited: Is it allowed to put a proc sql within a loop like that?

Why?

Am I doing something wrong?

Thanks

proc sql; create table toto as select min(nomvar) as nomvar,count(title) as counting from dicoat group by title having count(title) > 1; data work.toto;

set toto; do while(counting>=1); proc sql; delete from dicoat where nomvar in (select nomvar from toto); insert into toto select min(nomvar) as nomvar,count(title) as counting from dicoat group by title having count(title) > 1; end; run; data _null_; file tempf; set toto end=lastobs; if _n_=1 then put "data aat;set aat (drop="; put var /; if lastobs then put ");run;"; run; %inc tempf; filename tempf clear;

LinusH · Posted 06-10-2014 04:41 AM

Not really seeing what you try to achieve, some sample data can be of help - along with a more extensive description of your problem.

And no, you can't mix SQL and data step in that way. You could call SQL from within a data step using call execute, but I can't tell if that's the right way to go for you. My gut feeling is that there is a simpler way to do this.

Data never sleeps

andy_wk · Posted 06-10-2014 04:52 AM

Ok LinusH.

I have a table called dicoat.

It will be something like this

data dicoat;

input title$ nomvar$;

cards;

aty var12

zat var13

zat var52

aty var55

zat var98

att var100

;

As you can see, I'm trying to remove the duplicate from dicoat with my code recursively until dicoat has only one nomvar and one title. In the same time, I need to put all the duplcate into a macro variable as I need to use that macrovariable

to drop the columns in the table aat as I've put in my code above.

aat before transformation

var12 var13 var52 var55 var98 var100

aat after columns drop

var12 var13 var100

I hope it is clearer. If not, let me know.

LinusH · Posted 06-10-2014 05:31 AM

Why don't you just build a macro variable with all variables that should be dropped? And then just rewrite you table using a accumulated drop statement.

proc sort data=dicoat;

by title;

run;

data _null_;

set dicoat end=last;

length dropvar $1000;

retain dropvar;

by title;

if not first.title then dropvar = catx(' ',dropvar,nomvar);

if last then call symput('dropvar',trim(dropvar));

run;

options symbolgen;

data aat;

set aat(drop=&DROPVAR.);

run;

Data never sleeps

andy_wk · Posted 06-10-2014 05:37 AM

Thanks LinusH.

Let me have a look.

JBerry · Posted 06-10-2014 09:08 PM

Andy,

In the example you showed - I see zat appears twice.

You'd like to only see zat once, correct? In that case, how do you decide which row to delete? var13 or var52?

andy_wk · Posted 06-11-2014 04:19 AM

Hi Jberry,

It does not matter , which variables to delete. However, as I have many , I need one to recursively delete them until I have only one.

Does it make sense?

JBerry · Posted 06-10-2014 09:35 PM

If you don't care which row appears, try

proc sql;

create table no_dupes as

select *

from (select *,monotonic() as rn from dicoat) AS rs

group by title

having rn=min(rn);

quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away