Help using Base SAS procedures

A data manupulation question

Reply
Contributor
Posts: 44

A data manupulation question

I have a following datatset with two character variable (subject, arm)

subject       arm
-------       ----
10001          PA
10001          CA
10001          CA
10002          NA
10002          PA
10002          NA     


I need a following dataset where arm variable will be the combination of every unique value
for each subject.
The result dataset should be as below.:


subject       arm
-------       -----
10001          PA + CA
10002          NA + PA

I know it possible to do it using proc sort using nodupkey and then merging and using catx
function. But that will be a very long ugly code. Will you be kind enough
to help me with a more efficient code?

Thank you so much.

Super User
Posts: 3,261

Re: A data manupulation question

Posted in reply to need_sas_help

Something like this (untested):

data want;

  set have;

  by subject;

  retain arm_new;

  if first.subject then arm_new = '';

  arm_new = catx(' + ',arm);

  if last.subject then output;

run;

Respected Advisor
Posts: 3,156

Re: A data manupulation question

Posted in reply to need_sas_help

I won't say it would be very long and ugly if using proc sort first. but you do have other options.

data have;

input subject$       arm$50.;

cards;

10001          PA

10001          CA

10001          CA

10002          NA

10002          PA

10002          NA

;

proc sort data=have out=have1 nodupkey;

by subject arm;

run;

data want;

length arm $ 50;

do until (last.subject);

set have1 (rename= arm=_arm);

by subject;

arm=catx('+', arm, _arm);

end;

drop _arm;

run;

/*array : assume your data is at least clustered by subject*/

data want_array;

length arm $ 50;

array t(100)/*100 is arbitary, using Hash to obtain full dynamic*/ $ 2;

do _n_=1 by 1 until (last.subject);

set have (rename=arm=_arm);

by subject notsorted;

if _arm not in t then t(_n_)=_arm;

end;

arm=catx('+', of t(*));

keep subject arm;

run;

Haikuo

Frequent Contributor
Posts: 137

Re: A data manupulation question

Hi,

May i request you to please explain the execution of a portion of your code :

/*array : assume your data is at least clustered by subject*/

data want_array;

length arm $ 50;

array t(100)/*100 is arbitary, using Hash to obtain full dynamic*/ $ 2;

do _n_=1 by 1 until (last.subject);

set have (rename=arm=_arm);

by subject notsorted;

if _arm not in t then t(_n_)=_arm;/*this part please? thanks*/

end;

arm=catx('+', of t(*));

keep subject arm;

run;

Respected Advisor
Posts: 3,156

Re: A data manupulation question

Posted in reply to CharlotteCain

I can't find the official online doc myself at the moment,

if _arm not in t /*this t is referring array t*/

then t(_n_)=_arm;

it basically only allow distinct _arm entering array t.

Let me know if I have myself clear enough.

Regards,

Haikuo

Respected Advisor
Posts: 4,937

Re: A data manupulation question

Posted in reply to CharlotteCain

The IN operator involving an array is documented here:

SAS(R) 9.4 Language Reference: Concepts, Fourth Edition

PG

PG
Trusted Advisor
Posts: 1,231

Re: A data manupulation question

Posted in reply to need_sas_help

proc sql;

create table want as

select distinct subject,arm from have

order by subject;

quit;

data need;

set want;

by subject;

retain arm;

arm=catx(' + ',lag(arm),arm);

if last.subject;

run;

Respected Advisor
Posts: 4,937

Re: A data manupulation question

Posted in reply to need_sas_help

Is this good enough?

data have;

input subject $ arm $;

datalines;

10001          PA

10001          CA

10001          CA

10002          NA

10002          PA

10002          NA    

;

proc sort data=have; by subject arm; run;

data want(rename=dum=arm);

length dum $200;

do until(last.subject);

    set have; by subject arm;

    if last.arm then dum = catx(" + ", dum, arm);

    end;

run;

proc print data=want noobs; run;

PG

PG
Super User
Posts: 5,518

Re: A data manupulation question

PG Stats,

I know my SQL is rudimentary at best, but wouldn't something like this work, even on unsorted data?

proc sql noprint;

   select subject, distinct arm separated by ' + ' as all_arm

   from have

   group by subject;

quit;

Contributor
Posts: 44

Re: A data manupulation question

Posted in reply to Astounding

Astounding.. looks good.. but unfortunately given an error.

2079  proc sql noprint;

2080     select caseno, distinct drnm01 separated by ' + ' as all_arm


                                 ------


                                 22


                                 202


ERROR 22-322: Syntax error, expecting one of the following: a quoted string, !, !!, &, (, *, **,


              +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS, BETWEEN, CONTAINS, EQ, EQT,


              FORMAT, FROM, GE, GET, GT, GTT, IN, INFORMAT, INTO, IS, LABEL, LE, LEN, LENGTH,


              LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, TRANSCODE, ^, ^=, |, ||, ~, ~=.



ERROR 202-322: The option or parameter is not recognized and will be ignored.



2081     from &RAWLIB..A02DR


2082     group by caseno;


2083  quit;


NOTE: The SAS System stopped processing this step because of errors.


NOTE: PROCEDURE SQL used (Total process time):


      real time           0.01 seconds


      cpu time            0.01 seconds

Super Contributor
Posts: 578

Re: A data manupulation question

Posted in reply to Astounding

separated by used when building a macro variable

Respected Advisor
Posts: 4,937

Re: A data manupulation question

Posted in reply to Astounding

This type of concatenation is only available when selecting INTO a macro variable. As in

Proc sql;

select distinct arm into :myArmList separated by " + "

from have;

quit;

PG

PG
Super User
Posts: 7,868

Re: A data manupulation question

Posted in reply to need_sas_help

Actually, IMO your stated solution is the most elegant one. All others tend to be more complicated or obfuscated.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 43

Re: A data manupulation question

Posted in reply to need_sas_help

a proposal...

The code won't work on unsorted data...

With a a PROC SORT NODUPKEYS, the code will be more simple...

best regards

SR

N/A
Posts: 1

Re: A data manupulation question

Posted in reply to need_sas_help

I tried using hash objects and could not get it to work.  But I believe the following code works using perl regular expressions:

data temp_1;

input subject $ arm $;

cards;

10000          NJ

10001          PA

10001          CA

10001          CA

10002          NA

10002          PA

10002          NA

10003          NJ

10004          NJ

;

data temp_2;

   set temp_1  end=last;

   length old_subject $ 5

          arm_list    $ 100;

   retain old_subject

          arm_list;

   keep old_subject

        arm_list;

   rename old_subject = subject

          arm_list = arm;

   if _N_ = 1 then do;

      old_subject = subject;

      arm_list = arm;

   end;

   if old_subject = subject then do;

      if ^prxmatch('/' || strip(arm) || '/', arm_list) then

      arm_list = strip(arm_list) || ' + ' || strip(arm);

   end;

   else do;

      arm_list = prxchange('s/^[ +]+(.*)/$1/', 1, arm_list);

   output;

   arm_list = arm;

   old_subject = subject;

   end;

   if last then

      output;

run;

Ask a Question
Discussion stats
  • 19 replies
  • 811 views
  • 1 like
  • 11 in conversation