Proc Freq Output

Danglytics · Posted 04-13-2012 01:45 PM

Hi,

I have a transactional dataset that is keyed on an ID that is a 7 character string, I would like to recode this ID to a numeric ID.

Input dataset would look like

PID status

abcdefg A

abcdefg B

bcdefgh A

cdefghi A

etc..

I would like my ouput dataset to look like

PID status NID

abcdefg A 1

abcdefg B 1

bcdefgh A 2

cdefghi A 3

etc..

The logic I thought about is using PROC FREQ to get all the unique PID and use the _N_ as the new recoded NID and match back to the original dataset, but i'm not sure how to create the recode dataset on the ouput of PROC FREQ.

I know the following doesnt work but i'm not sure how to get what I want..

PROC FREQ DATA=WORK.INPUT NOPRINT ;

TABLE PID ;

OUTPUT OUT=WORK.RECODE _N_ ;

RUN ;

Thanks for your help.

Reeza · Posted 04-13-2012 01:48 PM

use the By groups feature instead and a retain statement to maintain the value until you reach a new group.

Data want;

set have;

by pid;

retain new_id 0;

if first.pid then new_id+1;

run;

Linlin · Posted 04-13-2012 01:58 PM

try this one:

data have;

input PID $7. status $;

cards;

abcdefg A

abcdefg B

bcdefgh A

cdefghi A

bcdefgh B

cdefghi B

;

proc sort;

by pid;

data want;

set have;

by pid;

nid+first.pid;

run;

proc print;run;

Obs PID status nid

1 abcdefg A 1

2 abcdefg B 1

3 bcdefgh A 2

4 bcdefgh B 2

5 cdefghi A 3

6 cdefghi B 3

Linlin

Haikuo · Posted 04-14-2012 09:17 PM

If using DOW, then automatic variable _n_ actually is what OP wants. Borrowing some of LinLin's code:

data have;

input PID $7. status $;

cards;

abcdefg A

abcdefg B

bcdefgh A

cdefghi A

bcdefgh B

cdefghi B

;

proc sort;

by pid;

data want;

do until (last.pid);

set have;

by pid;

nid=_n_;

output;

end;

run;

proc print;run;

Haikuo

Patrick · Posted 04-14-2012 12:20 AM

A hash approach which avoids the need to pre-sort the source data set.

data have;
input PID $7. status $;
cards;
abcdefg A
bcdefgh B
abcdefg B
bcdefgh A
cdefghi A
cdefghi B
;
run;

data want(drop=_:);

set have;

attrib NID length=8;

if _n_=1 then

do;

declare hash h1();

_rc = h1.defineKey('PID');

_rc = h1.defineData('NID');

_rc = h1.defineDone( );

end;

if h1.find() ne 0 then

do;

_NID+1;

NID=_NID;

_rc=h1.add();

end;

run;

Small code change added to make it work correctly.

MikeZdeb · Posted 04-14-2012 09:02 AM

hi ... I tried Patrick's "no sort" method and I don't think that it gives the counts within STATUS as requested (does it need a tweak?)

here's a "no sort" method, though Linlins' sort + set + by is most likely a better choice with large files

the COUNTs will take a bit longer ... you'll have to add an LRECL option on the FILE statement to accommodate more data (a limit of 32K with _FILE_, but it's still pretty fast even with 24K obs, less than 1 second on a not too fast PC)

data have;

input pid : $7. status : $1. @@;

datalines;

abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C

;

filename nosee dummy;

data want;

set have;

file nosee;

put status @;

nid = count(_file_,status);

run;

pid status nid

abcdefg A 1

bcdefgh B 1

abcdefg A 2

bcdefgh A 3

cdefghi A 4

cdefghi C 1

abcdefg A 5

bcdefgh B 2

abcdefg A 6

bcdefgh A 7

cdefghi A 8

cdefghi C 2

MikeZdeb · Posted 04-14-2012 08:11 PM

hi ... if both Linlin and I interpreted the original post correctly (produce counts within values of STATUS), here's a modification of the hash approach that, as you say, eliminates the need for the sort ...

data have;

input pid : $7. status : $1. @@;

datalines;

abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C

;

data want;

declare hash h();

h.defineKey('status');

h.defineData('nid');

h.defineDone();

do until (done);

set have end=done;

if h.find() then h.add();

nid + 1;

output;

h.replace();

call missing (nid);

end;

stop;

run;

pid status nid

abcdefg A 1

bcdefgh B 1

abcdefg A 2

bcdefgh A 3

cdefghi A 4

cdefghi C 1

abcdefg A 5

bcdefgh B 2

abcdefg A 6

bcdefgh A 7

cdefghi A 8

cdefghi C 2

Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

Re: Proc Freq Output

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away