Hello
Please see a data base called have with columns: ID ,month and follow up variables X2101-X2108
(Meaning of structure of X variables is year month :YYMM).
The meaning of month variable is the month the customer left the club.
I want to create a new data set called wanted that will convert variables X's to missing when the date of variable X is after month variable.
So the expected data set is
What is way to do it using array or another method?
Data have;
Input ID month X2101 X2102 X2103 X2104 X2105 X2106 X2107 X2108;
Cards;
1 2106 10 20 30 40 50 60 70 80
2 . 40 20 20 80 50 70 70 90
3 2103 15 25 40 50 70 40 20 80
;
Run;
Change your structure so that month information is not stored in the NAME of the variable.
Either using PROC TRANSPOSE, or just read it in that way to start with.
data have;
input ID left @;
do month=2101,2102,2103,2104,2105,2106,2107,2108;
input value @;
output;
end;
cards;
1 2106 10 20 30 40 50 60 70 80
2 . 40 20 20 80 50 70 70 90
3 2103 15 25 40 50 70 40 20 80
;
Now detecting which values to "empty" is easy.
data want;
set have;
if . < left <= month then value=.;
run;
If you want to see a report in that tabular layout it is easy to do with the tall structure.
proc report data=want missing;
column id left value,month;
define id/group;
define left/group;
define value / min ' ' width=4;
define month / across;
run;
Results:
month ID left 2101 2102 2103 2104 2105 2106 2107 2108 -------------------------------------------------------------------- 1 2106 10 20 30 40 50 . . . 2 . 40 20 20 80 50 70 70 90 3 2103 15 25 . . . . . .
You could even produce the report directly from the full data by using a WHERE clause.
proc report headline nocenter data=have missing;
where not (. < left <=month) ;
column id left value,month;
define id/group;
define left/group;
define value / min ' ' width=4;
define month / across;
run;
But you should change the structure, so the calculations will be easier, and the final data set will be easier to work with.
So if the variables of interest are the only ones that start with X you can use a variable list to define an array that automatically adjust to new columns. If there are other variables whose name starts with X then you might have to hard code the list or at least use some other logic to calculate the list.
data want;
set have;
array x x:;
if not missing(month) then do i=1 to dim(x);
if month <= input(substr(vname(x[i]),2),32.) then x[i]=.;
end;
drop i;
run;
First thing, what about that missing month? Everything is greater than missing, so is "after".
Assuming that missing months mean don't process this seems to work for the example data;
data want; set have; array x(*) X2101--X2108; if not missing (month) then do i=1 to dim(x); name=vname(x[i]); if index(name,put(month,4.))>0 then do j=i to dim(x); call missing(x[j]); end; end; drop i j name; run;
HOWEVER you have to increment the names of the variables in the array every time a column is added, the assumption is the columns will be adjacent.
Caveat, your example desired output has the columns of the X variables in a different order and if that reflects your starting data the -- list operator may not work as expected. It is also very off-putting to have example data provided in a different order than the expected appearance as it is just a bit of a headache to read one set left to right and the other right to left.
Transpose.
Transpose.
Transpose.
Dates do not belong in column names.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.