Hi,
I have a transactional dataset that is keyed on an ID that is a 7 character string, I would like to recode this ID to a numeric ID.
Input dataset would look like
PID status
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
etc..
I would like my ouput dataset to look like
PID status NID
abcdefg A 1
abcdefg B 1
bcdefgh A 2
cdefghi A 3
etc..
The logic I thought about is using PROC FREQ to get all the unique PID and use the _N_ as the new recoded NID and match back to the original dataset, but i'm not sure how to create the recode dataset on the ouput of PROC FREQ.
I know the following doesnt work but i'm not sure how to get what I want..
PROC FREQ DATA=WORK.INPUT NOPRINT ;
TABLE PID ;
OUTPUT OUT=WORK.RECODE _N_ ;
RUN ;
Thanks for your help.
use the By groups feature instead and a retain statement to maintain the value until you reach a new group.
Data want;
set have;
by pid;
retain new_id 0;
if first.pid then new_id+1;
run;
try this one:
data have;
input PID $7. status $;
cards;
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
bcdefgh B
cdefghi B
;
proc sort;
by pid;
data want;
set have;
by pid;
nid+first.pid;
run;
proc print;run;
Obs PID status nid
1 abcdefg A 1
2 abcdefg B 1
3 bcdefgh A 2
4 bcdefgh B 2
5 cdefghi A 3
6 cdefghi B 3
Linlin
If using DOW, then automatic variable _n_ actually is what OP wants. Borrowing some of LinLin's code:
data have;
input PID $7. status $;
cards;
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
bcdefgh B
cdefghi B
;
proc sort;
by pid;
data want;
do until (last.pid);
set have;
by pid;
nid=_n_;
output;
end;
run;
proc print;run;
Haikuo
A hash approach which avoids the need to pre-sort the source data set.
data have;
input PID $7. status $;
cards;
abcdefg A
bcdefgh B
abcdefg B
bcdefgh A
cdefghi A
cdefghi B
;
run;
data want(drop=_:);
set have;
attrib NID length=8;
if _n_=1 then
do;
declare hash h1();
_rc = h1.defineKey('PID');
_rc = h1.defineData('NID');
_rc = h1.defineDone( );
end;
if h1.find() ne 0 then
do;
_NID+1;
NID=_NID;
_rc=h1.add();
end;
run;
Small code change added to make it work correctly.
hi ... I tried Patrick's "no sort" method and I don't think that it gives the counts within STATUS as requested (does it need a tweak?)
here's a "no sort" method, though Linlins' sort + set + by is most likely a better choice with large files
the COUNTs will take a bit longer ... you'll have to add an LRECL option on the FILE statement to accommodate more data (a limit of 32K with _FILE_, but it's still pretty fast even with 24K obs, less than 1 second on a not too fast PC)
data have;
input pid : $7. status : $1. @@;
datalines;
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
;
filename nosee dummy;
data want;
set have;
file nosee;
put status @;
nid = count(_file_,status);
run;
pid status nid
abcdefg A 1
bcdefgh B 1
abcdefg A 2
bcdefgh A 3
cdefghi A 4
cdefghi C 1
abcdefg A 5
bcdefgh B 2
abcdefg A 6
bcdefgh A 7
cdefghi A 8
cdefghi C 2
hi ... if both Linlin and I interpreted the original post correctly (produce counts within values of STATUS), here's a modification of the hash approach that, as you say, eliminates the need for the sort ...
data have;
input pid : $7. status : $1. @@;
datalines;
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
;
data want;
declare hash h();
h.defineKey('status');
h.defineData('nid');
h.defineDone();
do until (done);
set have end=done;
if h.find() then h.add();
nid + 1;
output;
h.replace();
call missing (nid);
end;
stop;
run;
pid status nid
abcdefg A 1
bcdefgh B 1
abcdefg A 2
bcdefgh A 3
cdefghi A 4
cdefghi C 1
abcdefg A 5
bcdefgh B 2
abcdefg A 6
bcdefgh A 7
cdefghi A 8
cdefghi C 2
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.