Hi,
I have a transactional dataset that is keyed on an ID that is a 7 character string, I would like to recode this ID to a numeric ID.
Input dataset would look like
PID status
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
etc..
I would like my ouput dataset to look like
PID status NID
abcdefg A 1
abcdefg B 1
bcdefgh A 2
cdefghi A 3
etc..
The logic I thought about is using PROC FREQ to get all the unique PID and use the _N_ as the new recoded NID and match back to the original dataset, but i'm not sure how to create the recode dataset on the ouput of PROC FREQ.
I know the following doesnt work but i'm not sure how to get what I want..
PROC FREQ DATA=WORK.INPUT NOPRINT ;
TABLE PID ;
OUTPUT OUT=WORK.RECODE _N_ ;
RUN ;
Thanks for your help.
use the By groups feature instead and a retain statement to maintain the value until you reach a new group.
Data want;
set have;
by pid;
retain new_id 0;
if first.pid then new_id+1;
run;
try this one:
data have;
input PID $7. status $;
cards;
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
bcdefgh B
cdefghi B
;
proc sort;
by pid;
data want;
set have;
by pid;
nid+first.pid;
run;
proc print;run;
Obs PID status nid
1 abcdefg A 1
2 abcdefg B 1
3 bcdefgh A 2
4 bcdefgh B 2
5 cdefghi A 3
6 cdefghi B 3
Linlin
If using DOW, then automatic variable _n_ actually is what OP wants. Borrowing some of LinLin's code:
data have;
input PID $7. status $;
cards;
abcdefg A
abcdefg B
bcdefgh A
cdefghi A
bcdefgh B
cdefghi B
;
proc sort;
by pid;
data want;
do until (last.pid);
set have;
by pid;
nid=_n_;
output;
end;
run;
proc print;run;
Haikuo
A hash approach which avoids the need to pre-sort the source data set.
data have;
input PID $7. status $;
cards;
abcdefg A
bcdefgh B
abcdefg B
bcdefgh A
cdefghi A
cdefghi B
;
run;
data want(drop=_:);
set have;
attrib NID length=8;
if _n_=1 then
do;
declare hash h1();
_rc = h1.defineKey('PID');
_rc = h1.defineData('NID');
_rc = h1.defineDone( );
end;
if h1.find() ne 0 then
do;
_NID+1;
NID=_NID;
_rc=h1.add();
end;
run;
Small code change added to make it work correctly.
hi ... I tried Patrick's "no sort" method and I don't think that it gives the counts within STATUS as requested (does it need a tweak?)
here's a "no sort" method, though Linlins' sort + set + by is most likely a better choice with large files
the COUNTs will take a bit longer ... you'll have to add an LRECL option on the FILE statement to accommodate more data (a limit of 32K with _FILE_, but it's still pretty fast even with 24K obs, less than 1 second on a not too fast PC)
data have;
input pid : $7. status : $1. @@;
datalines;
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
;
filename nosee dummy;
data want;
set have;
file nosee;
put status @;
nid = count(_file_,status);
run;
pid status nid
abcdefg A 1
bcdefgh B 1
abcdefg A 2
bcdefgh A 3
cdefghi A 4
cdefghi C 1
abcdefg A 5
bcdefgh B 2
abcdefg A 6
bcdefgh A 7
cdefghi A 8
cdefghi C 2
hi ... if both Linlin and I interpreted the original post correctly (produce counts within values of STATUS), here's a modification of the hash approach that, as you say, eliminates the need for the sort ...
data have;
input pid : $7. status : $1. @@;
datalines;
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
abcdefg A bcdefgh B abcdefg A bcdefgh A cdefghi A cdefghi C
;
data want;
declare hash h();
h.defineKey('status');
h.defineData('nid');
h.defineDone();
do until (done);
set have end=done;
if h.find() then h.add();
nid + 1;
output;
h.replace();
call missing (nid);
end;
stop;
run;
pid status nid
abcdefg A 1
bcdefgh B 1
abcdefg A 2
bcdefgh A 3
cdefghi A 4
cdefghi C 1
abcdefg A 5
bcdefgh B 2
abcdefg A 6
bcdefgh A 7
cdefghi A 8
cdefghi C 2
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: