BookmarkSubscribeRSS Feed
🔒 This topic is locked. We are no longer accepting replies to this topic. Need further help? Please sign in and ask a new question.
broberts
Calcite | Level 5

I have a report thats run daily. I need to run each day against the other to find the variables in 5 different columns that have either changed from 0 to 1 or 1 to 0, then summarize each type of count and total count of changes. I have tried to create a unique id and then do an inner join with a nested statement, but somehow i am getting the wrong results. Is there an easier way to accomplish this? Maybe with proc report, proc freq, or a proc compare with some sort of summary?

Proc SQL;
CREATE TABLE work.FileA_TempID
AS (select MemberKey,ReportingPopulationName,MeasureKey,SubMeasureKey,SampleStatus,PrimaryMember,HRSampleMemberID,DENOMCNT, NUMERCNT, ADNUMCNT,MRNUMCNT,SPNUMCNT,CONTRCNT, CATS(MeasureKey,SubMeasureKey,MemberKey) AS TempID
FROM ref.TX_24);
RUN;
proc sql;
CREATE TABLE work.FileB_TempID
AS (select MemberKey,ReportingPopulationName,MeasureKey,SubMeasureKey,SampleStatus,PrimaryMember,HRSampleMemberID,DENOMCNT2, NUMERCNT2, ADNUMCNT2,MRNUMCNT2,SPNUMCNT2, CONTRCNT2, CATS(MeasureKey,SubMeasureKey,MemberKey) AS TempID
FROM ref.TX_25);
RUN;
/*JOIN with Nested Statement*/
proc sql;
CREATE TABLE work.Filtered_Table AS
select FileA_TempID.MemberKey,FileA_TempID.TempID,FILEB_TEMPID.ReportingPopulationName,FILEB_TEMPID.MeasureKey, FILEB_TEMPID.SubMeasureKey, FILEB_TEMPID.SampleStatus, FILEB_TEMPID.PrimaryMember, FILEB_TEMPID.HRSampleMemberID, FILEA_TEMPID.DENOMCNT,FILEB_TEMPID.DENOMCNT2, FILEA_TEMPID.NUMERCNT,FILEB_TEMPID.NUMERCNT2,FILEA_TEMPID.ADNUMCNT, FILEB_TEMPID.ADNUMCNT2, FILEA_TEMPID.MRNUMCNT, FILEB_TEMPID.MRNUMCNT2,FILEA_TEMPID.SPNUMCNT, FILEB_TEMPID.SPNUMCNT2,FILEA_TEMPID.CONTRCNT, FILEB_TEMPID.CONTRCNT2  
from work.fileb_tempid as FB inner join work.filea_tempid as FA
on FB.TempID=FA.TempID
where
(FB.NUMERCNT2=0 and FA.NUMERCNT=1) or (FB.NUMERCNT2=1 and FA.NUMERCNT=0) or (ADNUMCNT2=0 and FA.ADNUMCNT=1)or (FB.ADNUMCNT2=1 and FA.ADNUMCNT=0)
or (FB.SPNUMCNT2=1 and FA.SPNUMCNT=0)or(FB.SPNUMCNT2=0 and FA.SPNUMCNT=1)
or (FB.CONTRCNT2=1 and FA.CONTRCNT=0)or (FB.CONTRCNT2=0 and FA.CONTRCNT=1)or(FB.MRNUMCNT2=1 and FA.MRNUMCNT=0)or (FB.MRNUMCNT2=0 and FA.MRNUMCNT=1);
Run;
proc sql;
create table work.Final as 
select TempID,MemberKey,MeasureKey,SubMeasureKey,ADNUMCNT,ADNUMCNT2,MRNUMCNT,MRNUMCNT2,SPNUMCNT,SPNUMCNT2,CONTRCNT,CONTRCNT2,ReportingPopulationName,SampleStatus,PrimaryMember,HRSampleMemberID
from work.Filtered_Table;
Quit;
2 REPLIES 2
Shmuel
Garnet | Level 18

You don't need to rename variables for comparison; Instead I assumed both files have the same variable names; I have concatenated to the two days datasets and sorted them by TempID, added a variable to point to the the original dataset, and then checked by DIF() function for changes.

 

Check next Code. In case of any issue please post some test data.

Proc SQL;
CREATE TABLE work.FileA_TempID
AS select CATS(MeasureKey,
		  SubMeasureKey, MemberKey) AS TempID,
		  MemberKey, ReportingPopulationName, MeasureKey,
          SubMeasureKey, SampleStatus, PrimaryMember,
		  HRSampleMemberID, DENOMCNT, NUMERCNT, ADNUMCNT,
		  MRNUMCNT, SPNUMCNT, CONTRCNT, 
	"TX24" as origin
FROM ref.TX_24);
RUN;
proc sql;
CREATE TABLE work.FileB_TempID
AS select CATS(MeasureKey,
		  SubMeasureKey, MemberKey) AS TempID,
		  MemberKey, ReportingPopulationName, MeasureKey,
          SubMeasureKey, SampleStatus, PrimaryMember,
		  HRSampleMemberID, DENOMCNT, NUMERCNT, ADNUMCNT,
		  MRNUMCNT, SPNUMCNT, CONTRCNT, 
	"TX25" as origin
FROM ref.TX_25);
RUN;

data tx2425 / view=tx2425;
 set FileA_TempID FileB_TempID;
run;
proc sort data=tx2425 out=sorted;
  by TempID origin;
run;

data want;
 set sorted;
  by tempID;
     retain prevorg;
     if first.TempID and origin='TX25' OR
	    first.TempID and last.TempID 
	    then do;
	    /* no parallel TempID in TX_24 OR single TempID*/
		.... enter your code .....
	    return; 
	 end;
	 
	 if first.TempID then prevorg = origin;
else do; if origin = 'TX25' and prevorg = 'TX24' then do; if dif(NUMERCNT) or dif(ADNUMCNT) or dif(MRNUMCNT) or dif(SPNUMCNT) or dif(CONTRCNT) then output; end; end; KEEP TempID MemberKey MeasureKey SubMeasureKey ADNUMCNT ADNUMCNT2 MRNUMCNT MRNUMCNT2 SPNUMCNT SPNUMCNT2 CONTRCNT CONTRCNT2 ReportingPopulationName SampleStatus PrimaryMember HRSampleMemberID; from work.Filtered_Table; run;

 

 

 

 

ballardw
Super User

Example data.

What the output for the example data should look like.

No reason in an example to provide 15 grouping variables. A single unique key value will suffice for an example. Probably only need to compare 3 variables as an example.

 

Since your code involves a lot of comparing not-like-named variables I am not sure what is going on. Why do what seem like should be the same names in different data sets have a different name?

 

It is not clear from your description how many days are involved, exactly 2 or is this a "current day" versus some unknown number of previous days?

 

Assuming exactly two days this is where I might start.

data x1;
   input id $ v1-v3;
datalines;
a 1 1 0
b 0 1 1
c 0 0 0
d 0 0 1
;
data x2;
   input id $ v1-v3;
datalines;
a 1 1 1
b 0 1 1
c 0 1 0
d 1 0 1
;

data combined;
  set x1 x2;
run;

proc means data=combined nway ;
   class id;
   var v1-v3;
   output out=want (drop=_:) range=;
run;

The range function returns the largest minus the smallest value. In the case of 1/0 coded variables then a range of 0 means "no change" and 1 means "changed".

So in the want data set it is easy to see that V3 changed for id=A, V2 changed for Id=C, V1 change for id=D and no other changes.