BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following dataset: 

data DB;
  input ID :$200. Reason1 :$200.Reason2 :$200.;
cards;
0001 Considered     Excluded
0001 Violated       Not_excluded
0001 Excluded       Not_Verified
0001 Excluded       Not_Verified   
run;

Is there a way to get the following? 

 

data DB1;
  input ID :$200. Reason1 :$200.Reason2 :$200.;
cards;
0001 Considered,Violated,Excluded     Excluded,Not_excluded,Not_Verified
0001 Considered,Violated,Excluded     Excluded,Not_excluded,Not_Verified
0001 Considered,Violated,Excluded     Excluded,Not_excluded,Not_Verified   
run;

In other words I would like to combine into one row and by each variable "Reason*" all the values the variable(s) takes. 

 

Note that, after the combination, I would like to repeat the combination the number of times the variable Reason* takes a value ID by ID. 

In the real dataset I have Reason1 , Reason12, Reason3, ..., Reason10.

Can anyone help me please? 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You will need to make NEW variable(s).

data want ;
  set have end=eof;
  length newreason $200 ;
  retain newreason;
  newreason = catx(',',newreason,reason);
  if eof; 
  drop reason;
run;

If the data is grouped by something, perhaps your ID variable, then this can be simplified by by  placing the SET inside a DO loop and using a BY statement.  This will eliminate the need for the RETAIN statement.

data want;
  do until (last.id);
     set have;
     by id;
     length newreason $200;
     newreason=catx(',',newreason,reason);
  end;
  drop reason;
run;

To output multiple copies you could change the DO statement to count the number of observations and then add a second DO loop around an output statement. 

data want;
  do _n_=1 by 1 until (last.id);
     set have;
     by id;
     length newreason $200;
     newreason=catx(',',newreason,reason);
  end;
  do _n_=1 to _n_;
    output;
  end;
  drop reason;
run;

But if there are other variables besides ID and REASON you want to keep then instead just add another DO loop with and additional SET statement.

data want;
  do until (last.id);
     set have;
     by id;
     length newreason $200;
     newreason=catx(',',newreason,reason);
  end;
  do until (last.id);
     set have;
     by id;
     output;
  end;
  drop reason;
run;

And to handle multiple variables just use arrays.

data want;
  do until (last.id);
     set have;
     by id;
     array reason[10];
     array newreason[10] $200;
     do over reason;
       newreason=catx(',',newreason,reason);
     end;
  end;
  do until (last.id);
     set have;
     by id;
     output;
  end;
  drop reason1-reason10;
run;

If you don't like using the deprecated DO OVER syntax then just add explicit indexes instead.

     do _n_=1 to dim(reason);
       newreason[_n_]=catx(',',newreason[_n_],reason[_n_]);
     end;

 

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

In my opinion, this is not a good idea, and it makes future programming difficult. See Maxim 19. Also, why do you need an output of three identical rows?

 

Maybe a different re-arrangement of the data would work better. Please tell us what analyses/plots/tables you would create once you have the data re-arranged as you want?

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10
Thank you very much for your help. The reason why I have to replicate rows is because the dataset contains other variables with different values and for the subsequent analysis I have to apply different types of filters.
PaigeMiller
Diamond | Level 26

@NewUsrStat wrote:
Thank you very much for your help. The reason why I have to replicate rows is because the dataset contains other variables with different values and for the subsequent analysis I have to apply different types of filters.

Unfortunately, this does not answer my question about what analyses/plots/tables are you trying to create.

--
Paige Miller
Tom
Super User Tom
Super User

You will need to make NEW variable(s).

data want ;
  set have end=eof;
  length newreason $200 ;
  retain newreason;
  newreason = catx(',',newreason,reason);
  if eof; 
  drop reason;
run;

If the data is grouped by something, perhaps your ID variable, then this can be simplified by by  placing the SET inside a DO loop and using a BY statement.  This will eliminate the need for the RETAIN statement.

data want;
  do until (last.id);
     set have;
     by id;
     length newreason $200;
     newreason=catx(',',newreason,reason);
  end;
  drop reason;
run;

To output multiple copies you could change the DO statement to count the number of observations and then add a second DO loop around an output statement. 

data want;
  do _n_=1 by 1 until (last.id);
     set have;
     by id;
     length newreason $200;
     newreason=catx(',',newreason,reason);
  end;
  do _n_=1 to _n_;
    output;
  end;
  drop reason;
run;

But if there are other variables besides ID and REASON you want to keep then instead just add another DO loop with and additional SET statement.

data want;
  do until (last.id);
     set have;
     by id;
     length newreason $200;
     newreason=catx(',',newreason,reason);
  end;
  do until (last.id);
     set have;
     by id;
     output;
  end;
  drop reason;
run;

And to handle multiple variables just use arrays.

data want;
  do until (last.id);
     set have;
     by id;
     array reason[10];
     array newreason[10] $200;
     do over reason;
       newreason=catx(',',newreason,reason);
     end;
  end;
  do until (last.id);
     set have;
     by id;
     output;
  end;
  drop reason1-reason10;
run;

If you don't like using the deprecated DO OVER syntax then just add explicit indexes instead.

     do _n_=1 to dim(reason);
       newreason[_n_]=catx(',',newreason[_n_],reason[_n_]);
     end;

 

NewUsrStat
Lapis Lazuli | Level 10
Thank you very much for your help. Unfortunately the code that refers to the array does not work. But, anyway, how it become if the value of the variable is repeated identical?
Tom
Super User Tom
Super User

If the values repeat then they will appear multiple times.

If the goal is to get the distinct values then you would need to make the logic more complex so that the current reason is only added when it does not already appear.

do _n_=1 to dim(reason);
  if not findw(newreason[_n_],reason[_n_],',','t') then 
   newreason[_n_]=catx(',',newreason[_n_],reason[_n_])
  ;
end;

If you don't care about the case then add the i modifier into addition to the t modifier in the FINDW() function call.

 

But with these extra complication it might be better to use multiple passes so the logic can be simpler and easier to follow.  So first build the list of unique values.  Then build the concatenations. And finally remerge it with the original data.

* Make TALL dataset with all of the reasons from all of the reason variables ;
proc transpose data=have out=step1;
   by id;
   var reason1-reason10;
run;
* Eliminate duplicates ;
proc sort data=step1 out=step2 nodupkey ;
   by id _name_ col1 ;
run;
* Concatenate ;
data step3;
   do until (last._name_);
      set step2;
      by id _name_;
      length list $200 ;
      list = catx(',',list,col1);
   end;
   drop col1;
run;
* Turn back into separate variables;
proc transpose data=step3 out=step4 ;
   by id;
   id _name_;
   var list ;
run;
* Combine with original data ;
data want;
   merge have(drop=reason1-reason10) step4;
   by id;
run;

 

dxiao2017
Lapis Lazuli | Level 10

Hi @Tom , I think your solution is very complicated, I think a macro contains sql and data steps solve this question better (in that, this approach address the major issue of the question more directly, i.e., the desired outcome is unique combinations of all variations of reason1 and reason2 for an unique ID, so use sql select into; and data step  to set a series of newly created dataset can be proper techniques, and these steps need to be repeated so have to use macro with do loop), what do you think?

 

My other comment for your solution is that, I do not think array is a proper technique to solve this question, because obviously all of the columns are characters, there is no any chance for you to do a calculation of numbers so no need to use array, also the major issue is to combine several different character values of a columns to one character value. Techniques for process character values and character strings are more suitable.

ballardw
Super User

@NewUsrStat wrote:
Thank you very much for your help. Unfortunately the code that refers to the array does not work. But, anyway, how it become if the value of the variable is repeated identical?

What does "array does not work"? 

If code throws an error then provide the log with the code and all the messages. Copy the text from the log then on the forum open a text box in a message using the </> icon above the message window and paste the code.

 

If the result is not as expected then provide the data used in the form of a working data step and then provide more detail as to the expected result for the example data.

 

I am afraid that a generic "does not work" is very unhelpful in diagnosing problems.

dxiao2017
Lapis Lazuli | Level 10

Hi @NewUsrStat , you can also use macro and sql steps to accomplish this goal, the code and output are as follows. This macro also works when you have more IDs (here I add only two more ids and you can have more) and more variations of reason1 and reason2 columns, this macro solves your question. This macro only has two simple sql steps and two simple data steps, easy to understand and easy to use. The final output contains unique id and the unique combinations of reason1 and reason2, if you want to display the reason1 and reason2 for all rows of an unique ID, just use if/then statement, such as if ID=0001 then reason1="copy the reason1 column here"; in a date step.

data DB;
  input ID :$200. Reason1 :$200.Reason2 :$200.;
cards;
0001 Considered     Excluded
0001 Violated       Not_excluded
0001 Excluded       Not_Verified
0001 Excluded       Not_Verified
0002 considered     excluded
0003 considered     excluded
0003 violated       not_excluded
;   
run;
%macro datawant;
proc sql noprint;
select distinct id
   into :id1-
   from db;
quit;
%put &sqlobs;
%do i=1 %to &sqlobs;
proc sql noprint;
select distinct reason1
   into :reason1&i separated by ','
   from db
   where id="&&id&i";
select distinct reason2
   into :reason2&i separated by ','
   from db
   where id="&&id&i";
quit;
%put &&reason1&i &&reason2&i;
data final&i;
   id="&&id&i";
   reason1="&&reason1&i";
   reason2="&&reason2&i";
run;
%end;
data finalfinal;
   set %do i=1 %to 3;
       final&i
       %end;
       ;
run;
proc print data=finalfinal;run;
%mend datawant;
%datawant;

dxiao2017_0-1779032539706.png

 

Catch up on SAS Innovate 2026

Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.

Explore Now →
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 228 views
  • 1 like
  • 5 in conversation