I have a dataset with many sites (like in excerpt below) where the 3 participants at the end with XXX are additional dummy participants. What I want is to modify these XXX patids so that they are in the format I have in the "want" dataset below, as opposed to having XXX at the end, while keeping the group assignment from have dataset unchanged.
data have;
input site $ patid $ group $;
datalines;
ABC ABCPROT01001 A
ABC ABCPROT01002 B
ABC ABCPROT01003 A
ABC ABCPROT01005 A
ABC ABCPROT01006 A
ABC ABCPROT01XXX B
ABC ABCPROT01XXX A
ABC ABCPROT01XXX B
CDF CDFPROT01004 A
CDF CDFPROT01005 A
CDF CDFPROT01006 A
CDF CDFPROT01007 B
CDF CDFPROT01008 A
CDF CDFPROT01009 B
CDF CDFPROT01010 A
CDF CDFPROT01012 A
CDF CDFPROT01013 B
CDF CDFPROT01XXX B
CDF CDFPROT01XXX B
CDF CDFPROT01XXX A
AMD AMDPROT01001 A
AMD AMDPROT01002 B
AMD AMDPROT01003 A
AMD AMDPROT01XXX B
AMD AMDPROT01XXX A
AMD AMDPROT01XXX A
;
run;
data want;
input site $ patid $ group $;
datalines;
ABC ABCPROT01001 A
ABC ABCPROT01002 B
ABC ABCPROT01003 A
ABC ABCPROT01005 A
ABC ABCPROT01006 A
ABC ABCPROT01007 B
ABC ABCPROT01008 A
ABC ABCPROT01009 B
CDF CDFPROT01004 A
CDF CDFPROT01005 A
CDF CDFPROT01006 A
CDF CDFPROT01007 B
CDF CDFPROT01008 A
CDF CDFPROT01009 B
CDF CDFPROT01010 A
CDF CDFPROT01012 A
CDF CDFPROT01013 B
CDF CDFPROT01014 B
CDF CDFPROT01015 B
CDF CDFPROT01016 A
AMD AMDPROT01001 A
AMD AMDPROT01002 B
AMD AMDPROT01003 A
AMD AMDPROT01004 B
AMD AMDPROT01005 A
AMD AMDPROT01006 A
;
run;
Try this
data want (drop = n l);
set have;
by site notsorted;
if first.site then n = 0;
l = substr(patid, length(patid) - 2);
if not find(l, 'x', 'i') then n = input(l, 3.);
else n = sum(n, 1);
substr(patid, 10, 3) = put(n, z3.);
retain n;
run;
Result:
site patid group ABC ABCPROT01001 A ABC ABCPROT01002 B ABC ABCPROT01003 A ABC ABCPROT01005 A ABC ABCPROT01006 A ABC ABCPROT01007 B ABC ABCPROT01008 A ABC ABCPROT01009 B CDF CDFPROT01004 A CDF CDFPROT01005 A CDF CDFPROT01006 A CDF CDFPROT01007 B CDF CDFPROT01008 A CDF CDFPROT01009 B CDF CDFPROT01010 A CDF CDFPROT01012 A CDF CDFPROT01013 B CDF CDFPROT01014 B CDF CDFPROT01015 B CDF CDFPROT01016 A QQL QQLPROT01001 A QQL QQLPROT01002 B QQL QQLPROT01003 A AMD AMDPROT01001 A AMD AMDPROT01002 B AMD AMDPROT01003 A AMD AMDPROT01004 B AMD AMDPROT01005 A AMD AMDPROT01006 A
Your data step code cuts of the last characters of the PATID field.
I think this does what you are asking for.
data have; input site $ patid :$12. group $; datalines; ABC ABCPROT01001 A ABC ABCPROT01002 B ABC ABCPROT01003 A ABC ABCPROT01005 A ABC ABCPROT01006 A ABC ABCPROT01XXX B ABC ABCPROT01XXX A ABC ABCPROT01XXX B CDF CDFPROT01004 A CDF CDFPROT01005 A CDF CDFPROT01006 A CDF CDFPROT01007 B CDF CDFPROT01008 A CDF CDFPROT01009 B CDF CDFPROT01010 A CDF CDFPROT01012 A CDF CDFPROT01013 B CDF CDFPROT01XXX B CDF CDFPROT01XXX B CDF CDFPROT01XXX A AMD AMDPROT01001 A AMD AMDPROT01002 B AMD AMDPROT01003 A AMD AMDPROT01XXX B AMD AMDPROT01XXX A AMD AMDPROT01XXX A ; run; data want; set have; retain lastno ; if substr(patid,10) ne 'XXX' then lastno=input( substr(patid,10),f3.); else do; lastno+1; patid= cats(substr(patid,1,9),put(lastno,z3.)); end; drop lastno; run;
Thank you, ballardw. This does work great but what about if I have a site that only has the 3 patids with XXX? I edited my initial dataset so for example, for site QQL, there's no existing patids yet but I'd like to add those 3 dummy patids in there and I wanted them to be assigned as QQLPROT01001, QQLPROT01002, QQLPROT01003. The code now instead assigns them as QQLPROT01017, QQLPROT01018, QQLPROT01019 (I guess since there's no previous QQL patid before them, it takes the last patid from CDF, which is CDFPROT01016)
data have;
input site $ patid :$12. group $;
datalines;
ABC ABCPROT01001 A
ABC ABCPROT01002 B
ABC ABCPROT01003 A
ABC ABCPROT01005 A
ABC ABCPROT01006 A
ABC ABCPROT01XXX B
ABC ABCPROT01XXX A
ABC ABCPROT01XXX B
CDF CDFPROT01004 A
CDF CDFPROT01005 A
CDF CDFPROT01006 A
CDF CDFPROT01007 B
CDF CDFPROT01008 A
CDF CDFPROT01009 B
CDF CDFPROT01010 A
CDF CDFPROT01012 A
CDF CDFPROT01013 B
CDF CDFPROT01XXX B
CDF CDFPROT01XXX B
CDF CDFPROT01XXX A
QQL QQLPROT01XXX A
QQL QQLPROT01XXX B
QQL QQLPROT01XXX A
AMD AMDPROT01001 A
AMD AMDPROT01002 B
AMD AMDPROT01003 A
AMD AMDPROT01XXX B
AMD AMDPROT01XXX A
AMD AMDPROT01XXX A
;
run;
data want;
input site $ patid :$12. group $;
datalines;
ABC ABCPROT01001 A
ABC ABCPROT01002 B
ABC ABCPROT01003 A
ABC ABCPROT01005 A
ABC ABCPROT01006 A
ABC ABCPROT01007 B
ABC ABCPROT01008 A
ABC ABCPROT01009 B
CDF CDFPROT01004 A
CDF CDFPROT01005 A
CDF CDFPROT01006 A
CDF CDFPROT01007 B
CDF CDFPROT01008 A
CDF CDFPROT01009 B
CDF CDFPROT01010 A
CDF CDFPROT01012 A
CDF CDFPROT01013 B
CDF CDFPROT01014 B
CDF CDFPROT01015 B
CDF CDFPROT01016 A
QQL QQLPROT01001 A
QQL QQLPROT01002 B
QQL QQLPROT01003 A
AMD AMDPROT01001 A
AMD AMDPROT01002 B
AMD AMDPROT01003 A
AMD AMDPROT01004 B
AMD AMDPROT01005 A
AMD AMDPROT01006 A
;
run;
Try this
data want (drop = n l);
set have;
by site notsorted;
if first.site then n = 0;
l = substr(patid, length(patid) - 2);
if not find(l, 'x', 'i') then n = input(l, 3.);
else n = sum(n, 1);
substr(patid, 10, 3) = put(n, z3.);
retain n;
run;
Result:
site patid group ABC ABCPROT01001 A ABC ABCPROT01002 B ABC ABCPROT01003 A ABC ABCPROT01005 A ABC ABCPROT01006 A ABC ABCPROT01007 B ABC ABCPROT01008 A ABC ABCPROT01009 B CDF CDFPROT01004 A CDF CDFPROT01005 A CDF CDFPROT01006 A CDF CDFPROT01007 B CDF CDFPROT01008 A CDF CDFPROT01009 B CDF CDFPROT01010 A CDF CDFPROT01012 A CDF CDFPROT01013 B CDF CDFPROT01014 B CDF CDFPROT01015 B CDF CDFPROT01016 A QQL QQLPROT01001 A QQL QQLPROT01002 B QQL QQLPROT01003 A AMD AMDPROT01001 A AMD AMDPROT01002 B AMD AMDPROT01003 A AMD AMDPROT01004 B AMD AMDPROT01005 A AMD AMDPROT01006 A
@PeterClemmensen, thank you so much, this did the trick!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.