Giving numbers based on column values- no need to remove any duplicates

Reply
Contributor
Posts: 35

Giving numbers based on column values- no need to remove any duplicates

Dear all,

I have a data set which has a variable containing some card nos with duplicates. I want to give numbering to all the cards in a separate column in such a way that all the same card nos hold the same number(new column) please see the below example and kindly advise:

Have dataset= only column1 from below

want dataset= column1 and 2 as given below

column1 column2

456789     1

456789     1

234567     2

456789     1

543672     3

543672     3

234567     2

891234     4

if I filter with cilumn2 with 1 I shd get the card no 456789 three times ( as it is present 3 times in the dataset and I don't want to remove the duplicates)

thank you in advance!!

Contributor
Posts: 74

Re: Giving numbers based on column values- no need to remove any duplicates

Hi, I would probably do it this way:

proc format;

    value newcode

456789 = '1'  /* if this information comes from a file you can create the format with a CTLIN file, or simply merge that file with the data */

234567 = '2'

543672 = '3'

891234 = '4';

run;

data a;

    length col1 8;

    input col1;

datalines;

456789

456789

234567

456789

543672

543672

234567

891234

;

run;

data b;

    set a;

    col2 = put(col1, newcode.);

run;

proc freq data = b; table col2 * col1 / nopercent nocum list ; run;

Hope that helps.

Super User
Super User
Posts: 7,977

Re: Giving numbers based on column values- no need to remove any duplicates

That requires hardcoding the proc format.  You can of course do it simply by sorting the dataset then:

data want;

     set have;

     retain column2;

     if _n_=1 then column2=1;

     else if column1 ne lag(column1) then column2=1;

     else column2=column2+1;

run;

If you, for some unknown reason need the data in the order it previously was then add _n_ into the dataset as it stands, then sort, add the counter, then sort by the old _n_.

E.g.

data inter;

     set have;

     ord=_n_;

run;

proc sort data=inter;

     by column1;

run;

data want;

     set inter;

     retain column2;

     if _n_=1 then column2=1;

     else if column1 ne lag(column1) then column2=1;

     else column2=column2+1;

run;

proc sort data=want;

     by ord;

run;

Contributor
Posts: 52

Re: Giving numbers based on column values- no need to remove any duplicates


A solution amongst others.

/*******************************/
/**** random sample dataset ****/
/*******************************/
data t_have(keep=id);
  do i= 1 to 10;
     id = 10+ ceil(3*ranuni(7));
  output;
  end;
run;

/*****************************/
/**** a possible solution ****/
/*****************************/
data t_want(keep=id rnk);
   length id rnk 8.;
   if _N_=1 then do;
      declare hash ha(multidata:'N');
         ha.definekey('zId');
         ha.definedata('zId','zRnk');
         ha.definedone();
   end;
   rnk=0;
   do until(aDone);
      set t_have end=aDone;
      zId=id;
      rc=ha.find();
      if (rc>0) then do; rnk+1; zRnk=rnk; ha.add(); output; end;
                else do; rnk=zRnk; output; end;
   end;
run;

t_want becomes:
====================
id    rnk

11     1
13     2
13     2
13     2
12     3
13     2
13     2
13     2
12     3
12     3
====================

hope this helps.

Ask a Question
Discussion stats
  • 3 replies
  • 159 views
  • 0 likes
  • 4 in conversation