BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello 

What is the way to  concatenate all columns names when value=1?

I want to do it only for columns that start with name "Ind".

I want to have a dynamic program  because the number of columns (columns start with Ind) might be changed.

In this example the wanted new column called "New" will get the following values:

For ID=1  the value will be  315,527

For ID=2  the value will be  118,315

For ID=3  the value will be 118,224,315,527

For ID=4  the value will be  527

 

I also want to create another new columns called "New2" that will have the concatenate descriptions (using proc format).

proc format ;

value fff

118='Reason_A'

224='Reason_B'

315='Reason_C'

527='Reason_D'

;

Run;

 

For ID=1  the value will be  'Reason_C,Reason_D'
For ID=2  the value will be  'Reason_A,Reason_C'
For ID=3  the value will be  'Reason_A,Reason_B,Reason_C,Reason_D'
For ID=4  the value will be ' Reason_D'

 

 

Data have;
input ID Ind118 Ind224 Ind315 Ind527 ;
cards;
1 0 0 1 1 
2 1 0 1 0
3 1 1 1 1 
4 0 1 0 0
5 0 0 0 1
;
Run;

 

 

 

4 REPLIES 4
Kurt_Bremser
Super User

First, get data out of structure by transposing, and eliminate the 0 values:

proc transpose
  data=have
  out=long1 (where=(col1 ne 0))
;
by id;
var ind:;
run;

Next get the value out of _name_, and drop the now redundant col1:

data long2;
set long;
length ind $3;
ind = substr(_name_,4);
drop _name_ col1;
run;

Then concatenate:

data wanr;
set long2;
by id;
length value $100;
if first.id then value = "";
value = catx(',',value,put(ind,$fff.));
if last.id;
keep id value;
run;

Untested, posted from my tablet.

 

PS your format needs to be a character format.

Tom
Super User Tom
Super User

Don't store the REASON into the NAME of the variable.  Store it as its own variable. 

data tall;
  input ID @ ;
  do reason=118,224,315,527 ;
    input value @;
    output;
  end;
cards;
1 0 0 1 1 
2 1 0 1 0
3 1 1 1 1 
4 0 1 0 0
5 0 0 0 1
;

Then the problem is simple.

proc format ;
  value fff
    118='Reason_A'
    224='Reason_B'
    315='Reason_C'
    527='Reason_D'
  ;
run;

data want;
  do until(last.id);
    set tall ;
    by id;
    length new new2 $200;
    if value then do;
       new=catx(',',new,reason);
       new2=catx(',',new2,put(reason,fff.));
    end;
  end;
  drop reason value;
run;

Results

Obs    ID    new                new2

 1      1    315,527            Reason_C,Reason_D
 2      2    118,315            Reason_A,Reason_C
 3      3    118,224,315,527    Reason_A,Reason_B,Reason_C,Reason_D
 4      4    224                Reason_B
 5      5    527                Reason_D

If you keep it in the "wide" format then you will need convert the text from the name into the numeric reason to use the format.

data want;
  set have;
  array ind ind:;
  length new new2 $200;
  do index=1 to dim(ind);
    if ind[index] then do;
      reason=input(substr(vname(ind[index]),4),32.);
      new=catx(',',new,reason);
      new2=catx(',',new2,put(reason,fff.));
    end;
  end;
  drop index reason;
run;
mkeintz
PROC Star

I think the key issue here, is the extracting the numeric part of the variable name - i.e. the numeric characters following IND in the variable name.  You can use the VNAME function to save those characters in a retained array, which you need to do only once, during the first observation.

 

Then just retrieve those characters when the corresponding IND variable =1.

data have;
input ID Ind118 Ind224 Ind315 Ind527 ;
cards;
1 0 0 1 1 
2 1 0 1 0
3 1 1 1 1 
4 0 1 0 0
5 0 0 0 1
run;

proc format;
  value $fff
    '118'='Reason_A'
    '224'='Reason_B'
    '315'='Reason_C'
    '527'='Reason_D'
    '999'='Other Reason'
    ;
run;

data want (drop=_:);
  set have;
  array _vals {10} $3 ;
  retain _vals: ; /*Numeric characters from the IND varnames */
  array _ind ind: ;
  if _n_=1 then do over _ind;
    _vals{_i_} = left(tranwrd(upcase(vname(_ind)),'IND',''));
  end;
  length codes $15 ;
  length reason $100;
  do over _ind;
    if _ind=1 then do;
      codes=catx(',',codes,_vals{_i_});
      reason=catx(',',reason,put(_vals{_i_},$fff.));
    end;
  end;
run;

Be sure to make the _VALS array large enough to accommodate the maximum number of expected IND variables.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

If you are going to use implicit array reference then why not go all in and use it everywhere.

(Also why not use the numeric format the original poster created instead of having to make a new character format?).

proc format;
  value fff
    118='Reason_A'
    224='Reason_B'
    315='Reason_C'
    527='Reason_D'
    999='Other Reason'
  ;
run;

data want ;
  set have;
  array reason reason1-reason100;
  retain reason: ;
  array ind ind: ;
  if _n_=1 then do over ind;
    reason = input(substr(vname(ind),4),32.);
  end;
  length new new2 $200;
  do over ind;
    if ind then do;
      new=catx(',',new,reason);
      new2=catx(',',new2,put(reason,fff.));
    end;
  end;
  drop reason: ;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1274 views
  • 0 likes
  • 4 in conversation