BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

There are 5 data sets:

data set tbl1 with information for each customer(ID) when he joined the research.

The join date is written as year+month (YYMM) .

for example: 2001 is JAN 2020,2002 is FEB 2020, 2003 is MARCH 2020 and so on.

Data set T2001 includes information of grade for each ID in JAN 2020

Data set T2002 includes information of grade for each ID in FEB 2020

Data set T2003 includes information of grade for each ID in MAR 2020

Data set T2004 includes information of grade for each ID in APR 2020

Data set T2005 includes information of grade for each ID in MAY 2020 

Since we are in month MAY today the last information is of MAY 2020.

I want to create a data set that will contain for each customer information if grades since he joined the research until now (MAY 2020).

g1 is grade 1month after joining the research

g2 is grade 2 months after joining the research

g3 is grade 3 months after joining the research

and so on

 

Data tbl1;
input ID  JoinYYMM;
cards;
1 2001
2 2003
3 2003
4 2002
;
Run;

Data T2001;
input ID grade;
cards;
1 10
2 9
3 9
4 8
;
run;

Data T2002;
input ID grade;
cards;
1 12
2 9
3 7
4 8
;
run;

Data T2003;
input ID grade;
cards;
1 7
2 9
3 6
4 8
;
run;

Data T2004;
input ID grade;
cards;
1 5
2 8
3 6
4 9
;
run;

Data T2005;
input ID grade;
cards;
1 8
2 11
3 7
4 10
;
run;



Data wanted;
input ID FollowUpPeriod g1 g2 g3 g4 g5;
cards;
1 2001-2005 10 12 7 5 8
2 2003-2005 9 8 11 . .
3 2003-2005 6 6 7 . .
4 2002-2005 8 8 9 10 .
;
run;


 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Ronein 

 

Here is an attempt to achieve this:

data have;
	retain id FollowUpPeriod g:;
	
	merge tbl1 T2001 (rename= (grade=g2001)) 
			   T2002 (rename= (grade=g2002)) 
			   T2003 (rename= (grade=g2003)) 
			   T2004 (rename= (grade=g2004)) 
			   T2005 (rename= (grade=g2005));
	by ID;
	FollowUpPeriod = catx("-",joinYYMM,2005);
	
	array _a (*) g:;

	do i=1 to dim(_a);
		if substr(vname(_a(i)),2) < joinYYMM then do;
			call missing (_a(i));
		end;
	end;
	drop i joinYYMM;
run;

proc transpose data=have out=have_tr;
	var g:;
	by id FollowUpPeriod;
run;

proc transpose data=have_tr (where=(col1 ne .)) out=want (drop=_:) prefix=g;
	var col1;
	by id FollowUpPeriod;
run;

Capture d’écran 2020-05-09 à 11.26.58.png

Best,

View solution in original post

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @Ronein 

 

Here is an attempt to achieve this:

data have;
	retain id FollowUpPeriod g:;
	
	merge tbl1 T2001 (rename= (grade=g2001)) 
			   T2002 (rename= (grade=g2002)) 
			   T2003 (rename= (grade=g2003)) 
			   T2004 (rename= (grade=g2004)) 
			   T2005 (rename= (grade=g2005));
	by ID;
	FollowUpPeriod = catx("-",joinYYMM,2005);
	
	array _a (*) g:;

	do i=1 to dim(_a);
		if substr(vname(_a(i)),2) < joinYYMM then do;
			call missing (_a(i));
		end;
	end;
	drop i joinYYMM;
run;

proc transpose data=have out=have_tr;
	var g:;
	by id FollowUpPeriod;
run;

proc transpose data=have_tr (where=(col1 ne .)) out=want (drop=_:) prefix=g;
	var col1;
	by id FollowUpPeriod;
run;

Capture d’écran 2020-05-09 à 11.26.58.png

Best,

RichardDeVen
Barite | Level 11

I highly recommend NOT using YYMM date format for your presentations, especially in year 2020 when 2001-2012 are easily construed as years instead of 2020 month 01-12.  

 

You might also consider a grid report showing the grades (since joining).

 

The individual data sets can be easily stacked using a name range list syntax

   SET T2001-T2005;

 

After the data is stacked, a merge by ID can combine join dates with the stack and filtered the grade values according to your rules

 

The YYMM date constructs can be read (i.e. parsed) into SAS date values using informat YYMMN4.  The N in the informat name means there is No separator between year and month.  I would recommend your date presentation be NLDATEYMM. (Jan 2020, Feb 2020, etc)

 

Example - Data for code

Spoiler
Data joindates;
input ID  Join yymmN4.; format join NLDATEYMM.;
cards;
1 2001
2 2003
3 2003
4 2002
;

Data T2001;
input ID grade;
cards;
1 10
2 9
3 9
4 8
;

Data T2002;
input ID grade;
cards;
1 12
2 9
3 7
4 8
;

Data T2003;
input ID grade;
cards;
1 7
2 9
3 6
4 8
;

Data T2004;
input ID grade;
cards;
1 5
2 8
3 6
4 9
;

Data T2005;
input ID grade;
cards;
1 8
2 11
3 7
4 10
;

Example - Combine and Report

data stack(index=(id));
  set T2001-T2005 indsname=table;
  source = table;
  date = input(scan(table, -1, 'T'),yymmN4.);
  format date NLDATEYMM.;
  drop source;
run;

data combined;
  merge joindates stack;
  by id;

  grade_filtered = ifn(date < join, ., grade);
months_since_join = 1 + intck('month', join, date); * value for transpose ID; run; ods html file='report.html' style=plateau; proc report data=combined; column id join grade_filtered,date; define id / group; define join / group; define date / '' across order=data; define grade_filtered / 'Grades' ; run; ods html close;

Output

RichardADeVenezia_0-1589019416527.png

If you want to create the pivoted data set (for reporting purposes?) a SQL and TRANSPOSE step is added.

proc sql;
  create table stage1 as
  select * from 
  ( select id, join, grade, months_since_join
    , catx('-',put(join,NLDATEYMM.),put(max(date),NLDATEYMM.)) as FollowUpPeriod length=17
    from combined
    group by id, join
  ) union
  ( select distinct 0, 0, 0, months_since_join from combined )
  order by id, months_since_join
  ;

proc transpose data=stage1 prefix=g out=want(where=(id>0) drop=_name_);
  by id join FollowUpPeriod;
  id months_since_join;
  var grade;
  where months_since_join > 0;
run;

Result data set

transposed.png

Ronein
Meteorite | Level 14

photo.PNGThank you very much.

The only problem is that in the wanted data set there should be columns: g1,g2,g3,g4,g5 instead of 

g2001 ,g2002,g2003,g2004,g2005

The logic is that the important information is when was the grade with relation to how many months since joining 

ed_sas_member
Meteorite | Level 14

HI @Ronein 

 

Have you tried the code provided above? It will give you the expected output.

 

Best,

Ronein
Meteorite | Level 14

Thank you so much.

It is a very clever way.

I want to ask please a few questions:

date = input(scan(table, -1, 'T'),yymmN4.); 

I see that the target is to get for each observation the data set source name.(for example T2002) .

Then you want to take sub-string that contain only date figures (For example: 2002).

Can you explain please what is the meaning of '-1' in SCAN function?

Can you also use SUBST(table,2,4) ?

Then I see that you use INPUT function with YYMMN4. format in order to convert it to SAS date.

 

May you also explain please what is the format NLDATEYMM?

As I understand it is only change the display of the date.

 

RichardDeVen
Barite | Level 11

The SCAN function allows you to extract words from string, and to specify what characters are the word delimiters.

 

The INDSNAME= option names a temporary variable that will contain the active data set being read from the tables identified by the SET T2001-T2005 statement.

 

The values will be WORK.T2001, WORK.T2002, WORK.T2003, WORK.T2004, WORK.T2005

I told SCAN to take the -1st word from those values using the letter T as the word delimiter.  -1 means get the word counted from the right hand side instead of the right hand side.  The +1st word with delimiter T is "WORK.".  At this point the scanned word should be digits representing date part YYMM, according to the naming construct conventions you are working with or imposing.

 

The same YYMM portion could have been extracted with SUBSTR(SCAN(table,2,'.'),2)

 

The YYMM string is input as a SAS date value using informat YYMMN4.

Because using the same YYMM date representation in viewers and output would be confusing I recommended using format NLDATEYMM.

The NLDATEYMM. format renders a SAS date value as a string with the construct MON YYYY, where MON is the three character month abbreviation and the YYYY is the 4 digits of the year.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 731 views
  • 2 likes
  • 3 in conversation