Re: Convert to missing IF columns names date is greater then value in ...

Ronein · Posted 08-26-2021 10:39 AM

Hello

Please see a data base called have with columns: ID ,month and follow up variables X2101-X2108

(Meaning of structure of X variables is year month :YYMM).

The meaning of month variable is the month the customer left the club.

I want to create a new data set called wanted that will convert variables X's to missing when the date of variable X is after month variable.

So the expected data set is

What is way to do it using array or another method?

Data have;
Input ID month X2101 X2102 X2103 X2104 X2105 X2106 X2107 X2108;
Cards;
1 2106 10 20 30 40 50 60 70 80
2 .    40 20 20 80 50 70 70 90
3 2103 15 25 40 50 70 40 20 80
;
Run;

Tom · Posted 08-26-2021 10:58 AM

Change your structure so that month information is not stored in the NAME of the variable.

Either using PROC TRANSPOSE, or just read it in that way to start with.

data have;
  input ID left @;
  do month=2101,2102,2103,2104,2105,2106,2107,2108;
    input value @;
    output;
  end;
cards;
1 2106 10 20 30 40 50 60 70 80
2 .    40 20 20 80 50 70 70 90
3 2103 15 25 40 50 70 40 20 80
;

Now detecting which values to "empty" is easy.

data want;
  set have;
  if . < left <= month then value=.;
run;

If you want to see a report in that tabular layout it is easy to do with the tall structure.

proc report data=want missing;
  column id left value,month;
  define id/group;
  define left/group;
  define value / min ' ' width=4;
  define month / across;
run;

Results:

                                            month


         ID       left  2101  2102  2103  2104  2105  2106  2107  2108
  --------------------------------------------------------------------
          1       2106    10    20    30    40    50     .     .     .
          2          .    40    20    20    80    50    70    70    90
          3       2103    15    25     .     .     .     .     .     .

You could even produce the report directly from the full data by using a WHERE clause.

proc report headline nocenter data=have missing;
  where not (. < left <=month) ;
  column id left value,month;
  define id/group;
  define left/group;
  define value / min ' ' width=4;
  define month / across;
run;

Ronein · Posted 08-26-2021 11:09 AM

Thank you,
However I ask not to change the source data set structure.
I recieve it as it is from external source and It consists millions of rows and many columns .
The question was how to perform the desired task wihtout changing the source data set.
It means that structure of data is wide (not long) and columns names are X2101,X2102....X2108

PaigeMiller · Posted 08-26-2021 11:12 AM

But you should change the structure, so the calculations will be easier, and the final data set will be easier to work with.

--
Paige Miller

Tom · Posted 08-26-2021 02:23 PM

So if the variables of interest are the only ones that start with X you can use a variable list to define an array that automatically adjust to new columns. If there are other variables whose name starts with X then you might have to hard code the list or at least use some other logic to calculate the list.

data want;
  set have;
  array x x:;
  if not missing(month) then do i=1 to dim(x);
    if month <= input(substr(vname(x[i]),2),32.) then x[i]=.;
  end;
  drop i;
run;

ballardw · Posted 08-26-2021 11:23 AM

First thing, what about that missing month? Everything is greater than missing, so is "after".

Assuming that missing months mean don't process this seems to work for the example data;

data want;
  set have;
  array x(*) X2101--X2108;
  if not missing (month) then do i=1 to dim(x);
     name=vname(x[i]);
     if index(name,put(month,4.))>0 then do j=i to dim(x);
        call missing(x[j]);
     end;
  end;
  drop i j name;
run;

HOWEVER you have to increment the names of the variables in the array every time a column is added, the assumption is the columns will be adjacent.

Caveat, your example desired output has the columns of the X variables in a different order and if that reflects your starting data the -- list operator may not work as expected. It is also very off-putting to have example data provided in a different order than the expected appearance as it is just a bit of a headache to read one set left to right and the other right to left.

Ronein · Posted 08-26-2021 02:10 PM

IF month value is missing then it means that customer didnt leave and then we dont need to convert to missing

Kurt_Bremser · Posted 08-26-2021 03:05 PM

Transpose.

Dates do not belong in column names.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Convert to missing IF columns names date is greater then value in another col

Re: Convert to missing IF columns names date is greater then value in another col

Re: Convert to missing IF columns names date is greater then value in another col

Re: Convert to missing IF columns names date is greater then value in another col

Re: Convert to missing IF columns names date is greater then value in another col

Re: Convert to missing IF columns names date is greater then value in another col

Re: Convert to missing IF columns names date is greater then value in another col

Re: Convert to missing IF columns names date is greater then value in another col

Registration is open

Registration is open

SAS Training: Just a Click Away