BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TheNovice
Quartz | Level 8

I have been trying to figure out how to do this accurately...


I have a list of ID's that undergo a certain code assignment. I have pulled this data from Jul1, 2019 onwards. 

 

E is always the first step and denotes the start of the assignment. There are steps that come after and each step has a date to it.

 

What i need to do is the following:

Of the IDs that had the code E in July 2019 , how many of those IDs did not appear again until 6 months after (in February) I would then repeat the same exercise for other months. i.e aug 2019 E's not appearing until 6 months after and so on.

 

My data is as follows:

ID

code

date

1234

E

02JUL2019:00:00:00

1234

D7

03JUL2019:00:00:00

1234

1S

19JUL2019:00:00:00

1234

E

01SEP2019:00:00:00

1234

D7

02SEP2019:00:00:00

1234

E

01JAN2020:00:00:00

1234

D7

03JAN2020:00:00:00

234

E

05AUG2019:00:00:00

234

E

04JAN2020:00:00:00

4455

E

28SEP2019:00:00:00

4455

D7

29SEP2019:00:00:00

789

E

05JUL2019:00:00:00

789

D7

07JUL2019:00:00:00

789

E

04SEP2019:00:00:00

789

D7

06SEP2019:00:00:00

789

2S

17SEP2019:00:00:00

789

D1

24SEP2019:00:00:00

789

D2

28SEP2019:00:00:00

789

E

04NOV2019:00:00:00

789

E

05DEC2019:00:00:00

789

D7

07DEC2019:00:00:00

789

E

06MAR2020:00:00:00

789

D7

08MAR2020:00:00:00

232

E

03JUL2019:00:00:00

232

D7

05JUL2019:00:00:00

232

E

02AUG2019:00:00:00

 

I tried enumerating the variables but then i am getting stuck on how to pick accounts that had the E in August, no activity in between and then first activity recorded in March 2019 which would be an E. Appreciate any help.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

With hope I understand you correctly then try next code.

I saved your test data as an excel sheet with suffix xlsx.

 

libname testxl xlsx '/folders/myfolders/flat/test0422.xlsx';

data have; 
 set testxl.sheet1(rename=(date=dtt));
     date = datepart(input(dtt,datetime18.));
     format date date9.;
run;

proc sort data=have; by id date; run;

proc transpose data=have(where=(code='E')) 
               out=temp1(drop=_name_);
  by id;
  var date;
run;

data want (keep=ID date1 date2 dif);
 set temp1;
     array dt col: ;
     do i= 1 to dim(dt)-1;
        date1 = dt(i);
        if date1 = . then leave;
        do j = i+1 to dim(dt);
           date2 = dt(j);
           dif= intck('month',date1,date2); 
                put id= date1= date2= dif=;       
           if dif > 5 then output;
        end;
    end;
run;

View solution in original post

5 REPLIES 5
PGStats
Opal | Level 21

What would the output be for your example data?

PG
unison
Lapis Lazuli | Level 10

I imagine you can approach like:

data have;
	input id code :$2. date :date9.;
	format date date9.;
	datalines;
1234 E 02JUL2019
1234 D7 03JUL2019
1234 1S 19JUL2019
1234 E 01FEB2020
1234 D7 02SEP2019
;
run;

proc sort data=have;
	by id code date;
run;

data want;
	set have;
	by id code;
	format l_dt date9.;
	l_dt=lag(date);

	if first.code then
		l_dt=.;
	mths_btwn=intck('month', l_dt, date);

	if mths_btwn >=6;
run;

proc print data=want;
run;
-unison
Shmuel
Garnet | Level 18

With hope I understand you correctly then try next code.

I saved your test data as an excel sheet with suffix xlsx.

 

libname testxl xlsx '/folders/myfolders/flat/test0422.xlsx';

data have; 
 set testxl.sheet1(rename=(date=dtt));
     date = datepart(input(dtt,datetime18.));
     format date date9.;
run;

proc sort data=have; by id date; run;

proc transpose data=have(where=(code='E')) 
               out=temp1(drop=_name_);
  by id;
  var date;
run;

data want (keep=ID date1 date2 dif);
 set temp1;
     array dt col: ;
     do i= 1 to dim(dt)-1;
        date1 = dt(i);
        if date1 = . then leave;
        do j = i+1 to dim(dt);
           date2 = dt(j);
           dif= intck('month',date1,date2); 
                put id= date1= date2= dif=;       
           if dif > 5 then output;
        end;
    end;
run;
TheNovice
Quartz | Level 8

Thank you so much! it worked 🙂

ballardw
Super User

Your data shows an earmark of translation from a (lazy programmer) data source that doesn't actually distinguish between DATES and TIMES. Since every single value you show has a time component of 00:00:00 and you named the variable DATE I would suggest making the value a SAS date value instead of the apparent datetime.

 

This can be accomplished either when the data is brought into SAS by reading with an appropriate informat or modify after read with data step and something like:

Data want;

   set have;

   date = datepart(date);

   format date date9.;

run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 685 views
  • 0 likes
  • 5 in conversation