BookmarkSubscribeRSS Feed
Ronein
Onyx | Level 15

Hello

Please see a data base called have with columns: ID  ,month and follow up variables X2101-X2108 

(Meaning of structure of X variables is year month  :YYMM).

The meaning of month variable is the month the customer left the club.

I want to create a new data set called wanted that will convert variables X's to missing  when the date of variable X is after month variable.

So the expected data set is 

 

Ronein_0-1629988691476.png

 

What is way to do it using array or another method?

 

 

 

 

Data have;
Input ID month X2101 X2102 X2103 X2104 X2105 X2106 X2107 X2108;
Cards;
1 2106 10 20 30 40 50 60 70 80
2 .    40 20 20 80 50 70 70 90
3 2103 15 25 40 50 70 40 20 80
;
Run;
7 REPLIES 7
Tom
Super User Tom
Super User

Change your structure so that month information is not stored in the NAME of the variable.

Either using PROC TRANSPOSE, or just read it in that way to start with.

data have;
  input ID left @;
  do month=2101,2102,2103,2104,2105,2106,2107,2108;
    input value @;
    output;
  end;
cards;
1 2106 10 20 30 40 50 60 70 80
2 .    40 20 20 80 50 70 70 90
3 2103 15 25 40 50 70 40 20 80
;

Now detecting which values to "empty" is easy.

data want;
  set have;
  if . < left <= month then value=.;
run;

If you want to see a report in that tabular layout it is easy to do with the tall structure.

proc report data=want missing;
  column id left value,month;
  define id/group;
  define left/group;
  define value / min ' ' width=4;
  define month / across;
run;

Results:

                                            month


         ID       left  2101  2102  2103  2104  2105  2106  2107  2108
  --------------------------------------------------------------------
          1       2106    10    20    30    40    50     .     .     .
          2          .    40    20    20    80    50    70    70    90
          3       2103    15    25     .     .     .     .     .     .

 You could even produce the report directly from the full data by using a WHERE clause.

proc report headline nocenter data=have missing;
  where not (. < left <=month) ;
  column id left value,month;
  define id/group;
  define left/group;
  define value / min ' ' width=4;
  define month / across;
run;
Ronein
Onyx | Level 15
Thank you,
However I ask not to change the source data set structure.
I recieve it as it is from external source and It consists millions of rows and many columns .
The question was how to perform the desired task wihtout changing the source data set.
It means that structure of data is wide (not long) and columns names are X2101,X2102....X2108
PaigeMiller
Diamond | Level 26

But you should change the structure, so the calculations will be easier, and the final data set will be easier to work with.

--
Paige Miller
Tom
Super User Tom
Super User

So if the variables of interest are the only ones that start with X you can use a variable list to define an array that automatically adjust to new columns.  If there are other variables whose name starts with X then you might have to hard code the list or at least use some other logic to calculate the list.

data want;
  set have;
  array x x:;
  if not missing(month) then do i=1 to dim(x);
    if month <= input(substr(vname(x[i]),2),32.) then x[i]=.;
  end;
  drop i;
run;
ballardw
Super User

First thing, what about that missing month? Everything is greater than missing, so is "after".

 

Assuming that missing months mean don't process this seems to work for the example data;

data want;
  set have;
  array x(*) X2101--X2108;
  if not missing (month) then do i=1 to dim(x);
     name=vname(x[i]);
     if index(name,put(month,4.))>0 then do j=i to dim(x);
        call missing(x[j]);
     end;
  end;
  drop i j name;
run;

HOWEVER you have to increment the names of the variables in the array every time a column is added, the assumption is the columns will be adjacent.

Caveat, your example desired output has the columns of the X variables in a different order and if that reflects your starting data the -- list operator may not work as expected. It is also very off-putting to have example data provided in a different order than the expected appearance as it is just a bit of a headache to read one set left to right and the other right to left.

 

 

 

 

Ronein
Onyx | Level 15
IF month value is missing then it means that customer didnt leave and then we dont need to convert to missing

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1254 views
  • 0 likes
  • 5 in conversation