BookmarkSubscribeRSS Feed
Danglytics
Calcite | Level 5

  Hi,

I have a transactional dataset that is keyed on an ID that is a 7 character string, I would like to recode this ID to a numeric ID.

Input dataset would look like

PID status

abcdefg A

abcdefg B

bcdefgh A

cdefghi A

etc..

I would like my ouput dataset to look like

PID status NID

abcdefg A 1

abcdefg B 1

bcdefgh A 2

cdefghi A 3

etc..

The logic I thought about is using PROC FREQ to get all the unique PID and use the _N_ as the new recoded NID and match back to the original dataset, but i'm not sure how to create the recode dataset on the ouput of PROC FREQ.

I know the following doesnt work but i'm not sure how to get what I want..

PROC FREQ DATA=WORK.INPUT NOPRINT ;

TABLE PID ;

OUTPUT OUT=WORK.RECODE _N_ ;

RUN ;

Thanks for your help.


6 REPLIES 6
Reeza
Super User

use the By groups feature instead and a retain statement to maintain the value until you reach a new group.

Data want;

set have;

by pid;

retain new_id 0;

if first.pid then new_id+1;

run;

Linlin
Lapis Lazuli | Level 10

try this one:

data have;

input PID $7. status $;

cards;

abcdefg A

abcdefg B

bcdefgh A

cdefghi A

bcdefgh B

cdefghi B

;

proc sort;

by pid;

data want;

  set have;

  by pid;

  nid+first.pid;

  run;

  proc print;run;

Obs      PID      status    nid

1     abcdefg      A        1

2     abcdefg      B        1

3     bcdefgh      A        2

4     bcdefgh      B        2

5     cdefghi      A        3

6     cdefghi      B        3

Linlin

Haikuo
Onyx | Level 15

If using DOW, then automatic variable _n_ actually is what OP wants. Borrowing some of LinLin's code:

data have;

input PID $7. status $;

cards;

abcdefg A

abcdefg B

bcdefgh A

cdefghi A

bcdefgh B

cdefghi B

;

proc sort;

by pid;

data want;

do until (last.pid);

   set have;

     by pid;

     nid=_n_;

     output;

end;

run;

proc print;run;

Haikuo

Patrick
Opal | Level 21

A hash approach which avoids the need to pre-sort the source data set.

data have;
input PID $7. status $;
cards;
abcdefg A
bcdefgh B
abcdefg B
bcdefgh A
cdefghi A
cdefghi B
;
run;

data want(drop=_:);

   set have;

   attrib NID length=8;

   if _n_=1 then

   do;

      declare hash h1();

      _rc = h1.defineKey('PID');

      _rc = h1.defineData('NID');

      _rc = h1.defineDone( );

   end;

   if h1.find() ne 0 then

   do;

      _NID+1;

      NID=_NID;

      _rc=h1.add();

   end;

run;

Small code change added to make it work correctly.

MikeZdeb
Rhodochrosite | Level 12

hi ... I tried Patrick's "no sort" method and I don't think that it gives the counts within STATUS as requested (does it need a tweak?)

here's a "no sort" method, though Linlins' sort + set + by is most likely a better choice with large files

the COUNTs will take a bit longer ... you'll have to add an LRECL option on the FILE statement to accommodate more data (a limit of 32K with _FILE_, but it's still pretty fast even with 24K obs, less than 1 second on a not too fast PC)

data have;

input pid : $7. status : $1. @@;

datalines;

abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C

abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C

;

filename nosee dummy;

data want;

set have;

file nosee;

put status @;

nid = count(_file_,status);

run;

pid      status    nid

abcdefg      A        1

bcdefgh      B        1

abcdefg      A        2

bcdefgh      A        3

cdefghi      A        4

cdefghi      C        1

abcdefg      A        5

bcdefgh      B        2

abcdefg      A        6

bcdefgh      A        7

cdefghi      A        8

cdefghi      C        2

MikeZdeb
Rhodochrosite | Level 12

hi ... if both Linlin and I interpreted the original post correctly (produce counts within values of STATUS), here's a modification of the hash approach that, as you say, eliminates the need for the sort ...

data have;

input pid : $7. status : $1. @@;

datalines;

abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C

abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C

;

data want;

declare hash h();

h.defineKey('status');

h.defineData('nid');

h.defineDone();

do until (done);

   set have end=done;

   if h.find() then h.add();

   nid + 1;

   output;

   h.replace();

   call missing (nid);

end;

stop;

run;

  pid      status    nid

abcdefg      A        1

bcdefgh      B        1

abcdefg      A        2

bcdefgh      A        3

cdefghi      A        4

cdefghi      C        1

abcdefg      A        5

bcdefgh      B        2

abcdefg      A        6

bcdefgh      A        7

cdefghi      A        8

cdefghi      C        2

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1453 views
  • 0 likes
  • 6 in conversation