Hello
There are 5 data sets:
data set tbl1 with information for each customer(ID) when he joined the research.
The join date is written as year+month (YYMM) .
for example: 2001 is JAN 2020,2002 is FEB 2020, 2003 is MARCH 2020 and so on.
Data set T2001 includes information of grade for each ID in JAN 2020
Data set T2002 includes information of grade for each ID in FEB 2020
Data set T2003 includes information of grade for each ID in MAR 2020
Data set T2004 includes information of grade for each ID in APR 2020
Data set T2005 includes information of grade for each ID in MAY 2020
Since we are in month MAY today the last information is of MAY 2020.
I want to create a data set that will contain for each customer information if grades since he joined the research until now (MAY 2020).
g1 is grade 1month after joining the research
g2 is grade 2 months after joining the research
g3 is grade 3 months after joining the research
and so on
Data tbl1;
input ID JoinYYMM;
cards;
1 2001
2 2003
3 2003
4 2002
;
Run;
Data T2001;
input ID grade;
cards;
1 10
2 9
3 9
4 8
;
run;
Data T2002;
input ID grade;
cards;
1 12
2 9
3 7
4 8
;
run;
Data T2003;
input ID grade;
cards;
1 7
2 9
3 6
4 8
;
run;
Data T2004;
input ID grade;
cards;
1 5
2 8
3 6
4 9
;
run;
Data T2005;
input ID grade;
cards;
1 8
2 11
3 7
4 10
;
run;
Data wanted;
input ID FollowUpPeriod g1 g2 g3 g4 g5;
cards;
1 2001-2005 10 12 7 5 8
2 2003-2005 9 8 11 . .
3 2003-2005 6 6 7 . .
4 2002-2005 8 8 9 10 .
;
run;
Hi @Ronein
Here is an attempt to achieve this:
data have;
retain id FollowUpPeriod g:;
merge tbl1 T2001 (rename= (grade=g2001))
T2002 (rename= (grade=g2002))
T2003 (rename= (grade=g2003))
T2004 (rename= (grade=g2004))
T2005 (rename= (grade=g2005));
by ID;
FollowUpPeriod = catx("-",joinYYMM,2005);
array _a (*) g:;
do i=1 to dim(_a);
if substr(vname(_a(i)),2) < joinYYMM then do;
call missing (_a(i));
end;
end;
drop i joinYYMM;
run;
proc transpose data=have out=have_tr;
var g:;
by id FollowUpPeriod;
run;
proc transpose data=have_tr (where=(col1 ne .)) out=want (drop=_:) prefix=g;
var col1;
by id FollowUpPeriod;
run;
Best,
Hi @Ronein
Here is an attempt to achieve this:
data have;
retain id FollowUpPeriod g:;
merge tbl1 T2001 (rename= (grade=g2001))
T2002 (rename= (grade=g2002))
T2003 (rename= (grade=g2003))
T2004 (rename= (grade=g2004))
T2005 (rename= (grade=g2005));
by ID;
FollowUpPeriod = catx("-",joinYYMM,2005);
array _a (*) g:;
do i=1 to dim(_a);
if substr(vname(_a(i)),2) < joinYYMM then do;
call missing (_a(i));
end;
end;
drop i joinYYMM;
run;
proc transpose data=have out=have_tr;
var g:;
by id FollowUpPeriod;
run;
proc transpose data=have_tr (where=(col1 ne .)) out=want (drop=_:) prefix=g;
var col1;
by id FollowUpPeriod;
run;
Best,
I highly recommend NOT using YYMM date format for your presentations, especially in year 2020 when 2001-2012 are easily construed as years instead of 2020 month 01-12.
You might also consider a grid report showing the grades (since joining).
The individual data sets can be easily stacked using a name range list syntax
SET T2001-T2005;
After the data is stacked, a merge by ID can combine join dates with the stack and filtered the grade values according to your rules
The YYMM date constructs can be read (i.e. parsed) into SAS date values using informat YYMMN4. The N in the informat name means there is No separator between year and month. I would recommend your date presentation be NLDATEYMM. (Jan 2020, Feb 2020, etc)
Example - Data for code
Data joindates; input ID Join yymmN4.; format join NLDATEYMM.; cards; 1 2001 2 2003 3 2003 4 2002 ; Data T2001; input ID grade; cards; 1 10 2 9 3 9 4 8 ; Data T2002; input ID grade; cards; 1 12 2 9 3 7 4 8 ; Data T2003; input ID grade; cards; 1 7 2 9 3 6 4 8 ; Data T2004; input ID grade; cards; 1 5 2 8 3 6 4 9 ; Data T2005; input ID grade; cards; 1 8 2 11 3 7 4 10 ;
Example - Combine and Report
data stack(index=(id)); set T2001-T2005 indsname=table; source = table; date = input(scan(table, -1, 'T'),yymmN4.); format date NLDATEYMM.; drop source; run; data combined; merge joindates stack; by id; grade_filtered = ifn(date < join, ., grade);
months_since_join = 1 + intck('month', join, date); * value for transpose ID; run; ods html file='report.html' style=plateau; proc report data=combined; column id join grade_filtered,date; define id / group; define join / group; define date / '' across order=data; define grade_filtered / 'Grades' ; run; ods html close;
Output
If you want to create the pivoted data set (for reporting purposes?) a SQL and TRANSPOSE step is added.
proc sql; create table stage1 as select * from ( select id, join, grade, months_since_join , catx('-',put(join,NLDATEYMM.),put(max(date),NLDATEYMM.)) as FollowUpPeriod length=17 from combined group by id, join ) union ( select distinct 0, 0, 0, months_since_join from combined ) order by id, months_since_join ; proc transpose data=stage1 prefix=g out=want(where=(id>0) drop=_name_); by id join FollowUpPeriod; id months_since_join; var grade; where months_since_join > 0; run;
Result data set
Thank you very much.
The only problem is that in the wanted data set there should be columns: g1,g2,g3,g4,g5 instead of
g2001 ,g2002,g2003,g2004,g2005
The logic is that the important information is when was the grade with relation to how many months since joining
Thank you so much.
It is a very clever way.
I want to ask please a few questions:
date = input(scan(table, -1, 'T'),yymmN4.);
I see that the target is to get for each observation the data set source name.(for example T2002) .
Then you want to take sub-string that contain only date figures (For example: 2002).
Can you explain please what is the meaning of '-1' in SCAN function?
Can you also use SUBST(table,2,4) ?
Then I see that you use INPUT function with YYMMN4. format in order to convert it to SAS date.
May you also explain please what is the format NLDATEYMM?
As I understand it is only change the display of the date.
The SCAN function allows you to extract words from string, and to specify what characters are the word delimiters.
The INDSNAME= option names a temporary variable that will contain the active data set being read from the tables identified by the SET T2001-T2005 statement.
The values will be WORK.T2001, WORK.T2002, WORK.T2003, WORK.T2004, WORK.T2005
I told SCAN to take the -1st word from those values using the letter T as the word delimiter. -1 means get the word counted from the right hand side instead of the right hand side. The +1st word with delimiter T is "WORK.". At this point the scanned word should be digits representing date part YYMM, according to the naming construct conventions you are working with or imposing.
The same YYMM portion could have been extracted with SUBSTR(SCAN(table,2,'.'),2)
The YYMM string is input as a SAS date value using informat YYMMN4.
Because using the same YYMM date representation in viewers and output would be confusing I recommended using format NLDATEYMM.
The NLDATEYMM. format renders a SAS date value as a string with the construct MON YYYY, where MON is the three character month abbreviation and the YYYY is the 4 digits of the year.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.