Hello
I have a data set with 2 columns: CustID (customer ID) and YYMM(year-month in format YYMM).
This data reflect the existing of the customer of each month.
If the customer doesn't exist in specific month then there will not be a raw in this month.
The data is always on 12 months (In this example : from YYMM 2303 till 2402)
I want to create a data set called want that for each custID will have one row with 2 fields:
CustID
Vector (Vector of binary 1/0 with length of 12 that reflect existence of customer in 12 months).
For example:
CustID 3557 will get value 001111111111
CustID 3188 will get value 111111111111
CustID 999 will get value 111001111111
What is the best way to create that required data set?
Data have;
Input YYMM CustID;
cards;
2303 3557
2304 3557
2305 3557
2306 3557
2307 3557
2308 3557
2309 3557
2310 3557
2311 3557
2312 3557
2303 3188
2304 3188
2305 3188
2306 3188
2307 3188
2308 3188
2309 3188
2310 3188
2311 3188
2312 3188
2401 3188
2402 3188
2303 999
2304 999
2305 999
2308 999
2309 999
2310 999
2311 999
2312 999
2401 999
2402 999
;
Run;
%let till_YYMM=2402;
%let till_Date=%sysfunc(inputn(&till_YYMM.,YYMMn4.));
%put &till_Date;
%let From_Date=%sysfunc(intnx(month,%sysfunc(inputn(&till_YYMM.,yymmn4.)),-11));
%put &From_Date;
data _null_;
From_YYMM = put(&From_Date.,yymmn4.);
call symputx('From_YYMM',From_YYMM);
run;
%put &From_YYMM.;
data _null_;
date_start=mdy(mod(&From_YYMM.,100),1,floor(&From_YYMM./100));
date_end=mdy(mod(&till_YYMM.,100),1,floor(&till_YYMM./100));
call symputx('date_start',put(date_start,best.));
call symputx('date_end',put(date_end,best.));
format date_start date_end date9.;
run;
data FollowUp_months_tbl;
date=&date_start.;
end_date=&date_end.;
YYMM=input(put(date,yymmn4.),best.);
format date end_date date9.;
do while (date<=end_date);
output;
date=intnx('month', date, 1, 's');
YYMM=input(put(date,yymmn4.),best.);
end;
format date YYMMN4.;
drop date end_date;
run;
Again with the "not a date variable that requires use as a date for any real purpose".
If you have boss that wants to see YYMM then create DATES and assign a format that assigns an appearance of YYMM (custom ) format. Then the first step in every use of those variables won't be to recreate an actual date variable.
Compounded by creating macro date variables that are formatted and then require the exact same thing again.
What does the "6 numbers" in the subject line mean? I see a want with 12 digits, not 6 and nothing resembling 6 is mentioned anywhere.
And why does CustID 3557 will get value 001111111111 ?
If the end is 2402 and the first is 2303 why is 0 in the first position? It isn't for the Custid 3188 or 999? I can see Custid 3557 getting 111111111100 because there aren't YYMM values of 2401 or 2402.
IF my interpretation of what Custid 3557 should get is correct then perhaps:
%let compdate = %sysfunc(intnx(month,%sysfunc(inputn(2402,yymmn4.)),-11)); data want; set have; Adate = mdy(mod(yymm,100),1,floor(yymm/100)); retain v1-v12; format v1-v12 1.; array v(12) ; by custid notsorted; if first.custid then do i=1 to 12; /* iniatlize array to 0 for each custid*/ v[i]=0; end; /* set each position with the present*/ v[intck('month',&compdate.,adate)+1]=1; if last.custid then do; binary= cats(of v(*)); output; end; keep custid binary; run;
If "binary" is supposed to be numeric then that's another story but not much of one. Could be done with an INPUT and assigning a Z12. format.
Create a YEAR variable and make your YYMM variable an actual DATE.
data have;
input YYMM CustID $;
yymm=input(cats('20',put(yymm,Z4.),'01'),yymmdd8.);
format yymm yymmn6.;
year=year(yymm);
cards;
2303 3557
2304 3557
2305 3557
2306 3557
2307 3557
2308 3557
2309 3557
2310 3557
2311 3557
2312 3557
2303 3188
2304 3188
2305 3188
2306 3188
2307 3188
2308 3188
2309 3188
2310 3188
2311 3188
2312 3188
2401 3188
2402 3188
2303 999
2304 999
2305 999
2308 999
2309 999
2310 999
2311 999
2312 999
2401 999
2402 999
;
Then process the data by CUSTID and YEAR and use an ARRAY to make a separate variable for each MONTH.
proc sort data=have;
by custid year yymm;
run;
data want;
do until(last.year);
set have;
by custid year;
array months month1-month12;
if first.year then do month=1 to 12; months[month]=0; end;
months[month(yymm)]=1;
end;
drop yymm month;
run;
Result
Cust Obs ID year month1 month2 month3 month4 month5 month6 month7 month8 month9 month10 month11 month12 1 3188 2023 0 0 1 1 1 1 1 1 1 1 1 1 2 3188 2024 1 1 0 0 0 0 0 0 0 0 0 0 3 3557 2023 0 0 1 1 1 1 1 1 1 1 1 1 4 999 2023 0 0 1 1 1 0 0 1 1 1 1 1 5 999 2024 1 1 0 0 0 0 0 0 0 0 0 0
Data have;
Input YYMM :yymmn5. CustID;
format YYMM yymmn4.;
cards;
2303 3557
2304 3557
2305 3557
2306 3557
2307 3557
2308 3557
2309 3557
2310 3557
2311 3557
2312 3557
2303 3188
2304 3188
2305 3188
2306 3188
2307 3188
2308 3188
2309 3188
2310 3188
2311 3188
2312 3188
2401 3188
2402 3188
2303 999
2304 999
2305 999
2308 999
2309 999
2310 999
2311 999
2312 999
2401 999
2402 999
;
Run;
proc sort data=have;by CustID YYMM ;run;
data want;
flag='000000000000';
do until(last.CustID);
set have;
by CustID;
substr(flag,intck('month','01mar2023'd,YYMM)+1,1)='1';
end;
keep CustID flag;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.