Hi,
I have a transactional dataset that is keyed on an ID that is a 7 character string, I would like to recode this ID to a numeric ID.
Input dataset would look like
PID status
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
etc..
I would like my ouput dataset to look like
PID status NID
abcdefg A 1
abcdefg B 1
bcdefgh A 2
cdefghi A 3
etc..
The logic I thought about is using PROC FREQ to get all the unique PID and use the _N_ as the new recoded NID and match back to the original dataset, but i'm not sure how to create the recode dataset on the ouput of PROC FREQ.
I know the following doesnt work but i'm not sure how to get what I want..
PROC FREQ DATA=WORK.INPUT NOPRINT ;
TABLE PID ;
OUTPUT OUT=WORK.RECODE _N_ ;
RUN ;
Thanks for your help.
use the By groups feature instead and a retain statement to maintain the value until you reach a new group.
Data want;
set have;
by pid;
retain new_id 0;
if first.pid then new_id+1;
run;
try this one:
data have;
input PID $7. status $;
cards;
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
bcdefgh B
cdefghi B
;
proc sort;
by pid;
data want;
set have;
by pid;
nid+first.pid;
run;
proc print;run;
Obs PID status nid
1 abcdefg A 1
2 abcdefg B 1
3 bcdefgh A 2
4 bcdefgh B 2
5 cdefghi A 3
6 cdefghi B 3
Linlin
If using DOW, then automatic variable _n_ actually is what OP wants. Borrowing some of LinLin's code:
data have;
input PID $7. status $;
cards;
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
bcdefgh B
cdefghi B
;
proc sort;
by pid;
data want;
do until (last.pid);
set have;
by pid;
nid=_n_;
output;
end;
run;
proc print;run;
Haikuo
A hash approach which avoids the need to pre-sort the source data set.
data have;
input PID $7. status $;
cards;
abcdefg A
bcdefgh B
abcdefg B
bcdefgh A
cdefghi A
cdefghi B
;
run;
data want(drop=_:);
set have;
attrib NID length=8;
if _n_=1 then
do;
declare hash h1();
_rc = h1.defineKey('PID');
_rc = h1.defineData('NID');
_rc = h1.defineDone( );
end;
if h1.find() ne 0 then
do;
_NID+1;
NID=_NID;
_rc=h1.add();
end;
run;
Small code change added to make it work correctly.
hi ... I tried Patrick's "no sort" method and I don't think that it gives the counts within STATUS as requested (does it need a tweak?)
here's a "no sort" method, though Linlins' sort + set + by is most likely a better choice with large files
the COUNTs will take a bit longer ... you'll have to add an LRECL option on the FILE statement to accommodate more data (a limit of 32K with _FILE_, but it's still pretty fast even with 24K obs, less than 1 second on a not too fast PC)
data have;
input pid : $7. status : $1. @@;
datalines;
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
;
filename nosee dummy;
data want;
set have;
file nosee;
put status @;
nid = count(_file_,status);
run;
pid status nid
abcdefg A 1
bcdefgh B 1
abcdefg A 2
bcdefgh A 3
cdefghi A 4
cdefghi C 1
abcdefg A 5
bcdefgh B 2
abcdefg A 6
bcdefgh A 7
cdefghi A 8
cdefghi C 2
hi ... if both Linlin and I interpreted the original post correctly (produce counts within values of STATUS), here's a modification of the hash approach that, as you say, eliminates the need for the sort ...
data have;
input pid : $7. status : $1. @@;
datalines;
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
;
data want;
declare hash h();
h.defineKey('status');
h.defineData('nid');
h.defineDone();
do until (done);
set have end=done;
if h.find() then h.add();
nid + 1;
output;
h.replace();
call missing (nid);
end;
stop;
run;
pid status nid
abcdefg A 1
bcdefgh B 1
abcdefg A 2
bcdefgh A 3
cdefghi A 4
cdefghi C 1
abcdefg A 5
bcdefgh B 2
abcdefg A 6
bcdefgh A 7
cdefghi A 8
cdefghi C 2
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.