DATA Step, Macro, Functions and more

Data step

Reply
Learner
Posts: 1

Data step

Hi every one I wanted to know how do i find the new data entry of the week ? 

 

For Example My data looks like this 

Input : 

Week   Patient 

W1          P1

W1          P2

W1          P3 

W2          P1   

W2          P2 

W2          P4

W3          P3

W3          P4

W3          P5

 

Desired output 

 

Week  Patient   Flag 

W1          P1

W1          P2

W1          P3 

W2          P1   

W2          P2 

W2          P4       New 

W3          P3

W3          P4

W3          P5       New

 

Basically if i am finding the new patient in w6 then I have to look back all possible week and compare how many of these patients have visited me before W6 and if there a new which was not there till w6 then I have to mark that one as a new patient  

The weekly data set I have is huge I there is no way i can write individual subset code for everyone 

any help will be much appreciated 

Thanks in advance 

Super User
Posts: 6,524

Re: Data step

Posted in reply to spdattani

It looks like you have a second, unwritten rule here:  Don't mark anybody in W1 as "New".

 

To get the result you are asking for:

 

proc sort data=have;

by patient week;

run;

 

data want;

set have;

by patient week;

if first.patient and week ne 'W1' then flag='New';

run;

 

There is a hidden pitfall here.  This program assumes that the weeks go as high as 9, but no higher.  Otherwise the sorted order would be wrong:

 

W1

W10

W11

W2

 

If you have more than 9 weeks, you would need to extract a numeric variable from WEEK:

 

weeknum = input( substr(week,2), 2.);

 

Then use WEEKNUM instead of WEEK for sorting.

 

Super User
Posts: 6,524

Re: Data step

Posted in reply to Astounding

A quick follow-up here ... SAS provides a way that you can sort character values but make "W9" sort before "W10".  Here's the revised solution:

 

proc sort data=have sortseq=linguistic (numeric_collation=on);

by patient week;

run;

 

data want;

set have;

by patient;

if first.patient and week ne 'W1' then flag='New';

run;

 

Note that the second BY statement has changed as well.

PROC Star
Posts: 1,287

Re: Data step

Posted in reply to spdattani
data have;
input (Week   Patient) ($) ;
datalines;
W1          P1
W1          P2
W1          P3 
W2          P1   
W2          P2 
W2          P4
W3          P3
W3          P4
W3          P5
;
data want;
if _n_=1 then do;
if 0 then set have;
 dcl hash h();
 h.definekey('patient');
 h.definedone();
 end;
 set have end=last;
 by Week   Patient;
 length Flag $10;
 call missing(Flag);
 if first.week then _n+1;
if _n>1 then do; 
if h.check() ne 0 then Flag='NEW';
end;
h.replace();
drop _:;
run;
Super Contributor
Posts: 448

Re: Data step

Posted in reply to spdattani

Check this out.

 

Data have;
infile datalines dlm=" ";
input Week $ Patient $;
datalines; 
W1 P1
W1 P2
W1 P3
W2 P1
W2 P2
W2 P4
W3 P3
W3 P4
W3 P5
;
run;

PROC SORT DATA=HAVE;
BY WEEK;
RUN;
%LET Patient_List=PATIENTS;
DATA WANT;
SET HAVE ;
IF FIND(SYMGET("Patient_List"),STRIP(PATIENT))=0 
		THEN DO;
			CALL SYMPUT("Patient_List",CATX(',',SYMGET("Patient_List"),PATIENT));
			FLAG="NEW";
		END;
IF WEEK="W1" THEN FLAG="";
RUN;
Thanks,
Suryakiran
PROC Star
Posts: 1,287

Re: Data step

Posted in reply to spdattani

arrays:

 

data have;
input (Week   Patient) ($) ;
datalines;
W1          P1
W1          P2
W1          P3 
W2          P1   
W2          P2 
W2          P4
W3          P3
W3          P4
W3          P5
;

data want;
do _n_=1 by 1 until(eof);
set have end=eof;
by Week   Patient;
array temp(1000)$ _temporary_ ;/*arbitrary subscript value*/
length Flag $10;
call missing(Flag);
if first.week then _n+1;
if _n=1 then temp(_n_)=Patient;
else if _n>1 and patient not in temp then do; Flag="NEW";temp(_n_)=Patient;end;
output;
end;
drop _:;
run;


Ask a Question
Discussion stats
  • 5 replies
  • 149 views
  • 0 likes
  • 4 in conversation