BookmarkSubscribeRSS Feed
need_sas_help
Calcite | Level 5

I have a following datatset with two character variable (subject, arm)

subject       arm
-------       ----
10001          PA
10001          CA
10001          CA
10002          NA
10002          PA
10002          NA     


I need a following dataset where arm variable will be the combination of every unique value
for each subject.
The result dataset should be as below.:


subject       arm
-------       -----
10001          PA + CA
10002          NA + PA

I know it possible to do it using proc sort using nodupkey and then merging and using catx
function. But that will be a very long ugly code. Will you be kind enough
to help me with a more efficient code?

Thank you so much.

19 REPLIES 19
SASKiwi
PROC Star

Something like this (untested):

data want;

  set have;

  by subject;

  retain arm_new;

  if first.subject then arm_new = '';

  arm_new = catx(' + ',arm);

  if last.subject then output;

run;

Haikuo
Onyx | Level 15

I won't say it would be very long and ugly if using proc sort first. but you do have other options.

data have;

input subject$       arm$50.;

cards;

10001          PA

10001          CA

10001          CA

10002          NA

10002          PA

10002          NA

;

proc sort data=have out=have1 nodupkey;

by subject arm;

run;

data want;

length arm $ 50;

do until (last.subject);

set have1 (rename= arm=_arm);

by subject;

arm=catx('+', arm, _arm);

end;

drop _arm;

run;

/*array : assume your data is at least clustered by subject*/

data want_array;

length arm $ 50;

array t(100)/*100 is arbitary, using Hash to obtain full dynamic*/ $ 2;

do _n_=1 by 1 until (last.subject);

set have (rename=arm=_arm);

by subject notsorted;

if _arm not in t then t(_n_)=_arm;

end;

arm=catx('+', of t(*));

keep subject arm;

run;

Haikuo

CharlotteCain
Quartz | Level 8

Hi,

May i request you to please explain the execution of a portion of your code :

/*array : assume your data is at least clustered by subject*/

data want_array;

length arm $ 50;

array t(100)/*100 is arbitary, using Hash to obtain full dynamic*/ $ 2;

do _n_=1 by 1 until (last.subject);

set have (rename=arm=_arm);

by subject notsorted;

if _arm not in t then t(_n_)=_arm;/*this part please? thanks*/

end;

arm=catx('+', of t(*));

keep subject arm;

run;

Haikuo
Onyx | Level 15

I can't find the official online doc myself at the moment,

if _arm not in t /*this t is referring array t*/

then t(_n_)=_arm;

it basically only allow distinct _arm entering array t.

Let me know if I have myself clear enough.

Regards,

Haikuo

PGStats
Opal | Level 21

The IN operator involving an array is documented here:

SAS(R) 9.4 Language Reference: Concepts, Fourth Edition

PG

PG
stat_sas
Ammonite | Level 13

proc sql;

create table want as

select distinct subject,arm from have

order by subject;

quit;

data need;

set want;

by subject;

retain arm;

arm=catx(' + ',lag(arm),arm);

if last.subject;

run;

PGStats
Opal | Level 21

Is this good enough?

data have;

input subject $ arm $;

datalines;

10001          PA

10001          CA

10001          CA

10002          NA

10002          PA

10002          NA    

;

proc sort data=have; by subject arm; run;

data want(rename=dum=arm);

length dum $200;

do until(last.subject);

    set have; by subject arm;

    if last.arm then dum = catx(" + ", dum, arm);

    end;

run;

proc print data=want noobs; run;

PG

PG
Astounding
PROC Star

PG Stats,

I know my SQL is rudimentary at best, but wouldn't something like this work, even on unsorted data?

proc sql noprint;

   select subject, distinct arm separated by ' + ' as all_arm

   from have

   group by subject;

quit;

need_sas_help
Calcite | Level 5

Astounding.. looks good.. but unfortunately given an error.

2079  proc sql noprint;

2080     select caseno, distinct drnm01 separated by ' + ' as all_arm


                                 ------


                                 22


                                 202


ERROR 22-322: Syntax error, expecting one of the following: a quoted string, !, !!, &, (, *, **,


              +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS, BETWEEN, CONTAINS, EQ, EQT,


              FORMAT, FROM, GE, GET, GT, GTT, IN, INFORMAT, INTO, IS, LABEL, LE, LEN, LENGTH,


              LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, TRANSCODE, ^, ^=, |, ||, ~, ~=.



ERROR 202-322: The option or parameter is not recognized and will be ignored.



2081     from &RAWLIB..A02DR


2082     group by caseno;


2083  quit;


NOTE: The SAS System stopped processing this step because of errors.


NOTE: PROCEDURE SQL used (Total process time):


      real time           0.01 seconds


      cpu time            0.01 seconds

DBailey
Lapis Lazuli | Level 10

separated by used when building a macro variable

PGStats
Opal | Level 21

This type of concatenation is only available when selecting INTO a macro variable. As in

Proc sql;

select distinct arm into :myArmList separated by " + "

from have;

quit;

PG

PG
SR_FR
Obsidian | Level 7

a proposal...

The code won't work on unsorted data...

With a a PROC SORT NODUPKEYS, the code will be more simple...

best regards

SR

Paul65
Calcite | Level 5

I tried using hash objects and could not get it to work.  But I believe the following code works using perl regular expressions:

data temp_1;

input subject $ arm $;

cards;

10000          NJ

10001          PA

10001          CA

10001          CA

10002          NA

10002          PA

10002          NA

10003          NJ

10004          NJ

;

data temp_2;

   set temp_1  end=last;

   length old_subject $ 5

          arm_list    $ 100;

   retain old_subject

          arm_list;

   keep old_subject

        arm_list;

   rename old_subject = subject

          arm_list = arm;

   if _N_ = 1 then do;

      old_subject = subject;

      arm_list = arm;

   end;

   if old_subject = subject then do;

      if ^prxmatch('/' || strip(arm) || '/', arm_list) then

      arm_list = strip(arm_list) || ' + ' || strip(arm);

   end;

   else do;

      arm_list = prxchange('s/^[ +]+(.*)/$1/', 1, arm_list);

   output;

   arm_list = arm;

   old_subject = subject;

   end;

   if last then

      output;

run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 4060 views
  • 1 like
  • 11 in conversation