BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mdakkak
Fluorite | Level 6

Hi SAS communities,

 

I am currently trying to match merge 4 data sets into one. There should only be 1 observations per person, which includes the most recent visit. The data set should include people that have data in the 3 data sets: Contact, Demog, and Vitals. Also, most of the people in those data sets have not died and will not have a matching observation in the NDI data set.  Conversely, most of the people in the ‘NDI’ data set are not in our cohort (so we don’t want to include those observations).  Therefore, when performing a match-merge with the ‘NDI’ data set, the observations from the ‘NDI’ data set which don’t have a matching row in the other 3 data sets should not be included. 

 

DATA HYPANL.HYPANALYSIS2;
	RETAIN SSN StateCd GenderCd EthCd RaceCd AgeAtVisit SBP DBP WtLb HypRelDeathInd;
	MERGE 	WORK.VitalLast		(IN = InVitalLast)
			HypTabs.NDI 		(IN = InNDI)
			HypTabs.Contact 	(IN = InContact)
			HypTabs.DEMOG 		(IN = InDemog);
	BY SSN;
	IF 	InContact 		= 1 
		AND InVitalLast = 1 
		AND InDemog 	= 1;
		
		
/* 	IF LAST.SSN; */
	AgeAtVisit=yrdif(BirthDt, VisitDt, 'AGE');	
	FORMAT 	BirthDt YYMMDD10.
			AgeAtVisit 4.0;
			
/* 		IF InNDI = 0 THEN HypRelDeathInd = 0;		 */
/* 	IF InNDI = . 		THEN HypRelDeathInd = 0; */
/* 	ELSE IF InNDI = 0 	THEN HypRelDeathInd = 0; */
/* 	ELSE IF InNDI = 1 	THEN HypRelDeathInd = 1; */

	IF HypRelDeathInd = . 		THEN HypRelDeathInd = 0;
/* 	ELSE IF HypRelDeathInd = 0 	THEN HypRelDeathInd = 0; */
/* 	ELSE 							 HypRelDeathInd = 1; */

	FORMAT 	HypRelDeathInd 	IndVbl. 
			StateCd 		$StateCd.;

	FORMAT HypRelDeathInd IndVbl. StateCd $StateCd.;
	DROP VisitDt HtIn DeathDt ICD10 CODCd City Inits ZipCd EthRaceCd BirthDt;
RUN;
PROC PRINT DATA = HYPANL.HYPANALYSIS2;
RUN;

I have tried several IF THEN statements, but I still get 566 observations. However, with some IF THEN statements I get 11 observations that have a 'Yes' in the 'HypRelDeathInd' variable, and with other IF THEN statements I get 16. Either way, I still end up with 566 observations, which means there are 7 missing observations. 

 

Thanks in advance for any help/tips!

1 ACCEPTED SOLUTION

Accepted Solutions
adhunamukherjee
Fluorite | Level 6

Hi,

UT_RECORDS dataset has 7 observations which have ID with 8 digits. The ID should have typically 9 digits as it is a SSN. Here the first digit of 0 needs to be added to make it work. 

This is the reason why 36-78-8134 in HypTabs.CONTACT and HypTabs.DEMOG didn't merge with 036-78-8134 in HypTabs.VITALS.

 

You need to add the following when you create your CONTACT_UT and DEMOG_UT datasets which you created before:

IDChar = PUT(ID, 9.);
SSN = CATX("-", SUBSTR(IDChar, 1, 3), SUBSTR(IDChar, 4, 2), SUBSTR(IDChar, 6, 4));

IF ID < 100000000 THEN SSN = "0" || SSN;

This ensures that if the ID has 8 digits, then a 0 is added as a prefix to the SSN.

Once you make this change and generate the updated CONTACT and DEMOG datasets, you should get 573 observations.

Please let me know if this doesn't work.

Thanks,
Adhuna Mukhopadhyay

 

View solution in original post

7 REPLIES 7
bhufman
Calcite | Level 5

Have you tried using the YDRIF statement?

AgeAtVisit = YRDIF(BirthDt, VisitDt, 'Age');
FORMAT AgeAtVisit 2.0;


ChrisNZ
Tourmaline | Level 20

@bhufman 

YRDIF is not a statement. It is a function. Proper terminology matters in order to communicate effectively.

Also, random uppercase and lowercase letters throughout the code makes it very hard to read. Same for no indentation and no alignment.

Choose a format you like and stick to it. This will save you invaluable time. For example, language in lower case and data/user words in uppercase, like:

   merge HYPTABS.DEMOG_1   (in= INDEMOG      drop= ETHRACECD          )
         HYPTABS.CONTACT_1 (in= INCONTACT    keep= SSN STATECD        )
         WORK.LASTVITALS   (in= INLASTVITALS drop= HTIN               )
         HYPTABS.NDI_1     (in= INNDI        drop= DEATHDT ICD10 CODCD) ;

 

mkeintz
PROC Star

I haven't studied your code, but note that NONE of the in dummies (InVitalLast InNDI InContact InDemog) will ever have a missing value.  They will be either zero or one.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

> The data set should include people that have data in the 3 data sets: Contact, Demog, and Vitals   ... 8>< ...  the observations from the ‘NDI’ data set which don’t have a matching row in the other 3 data sets should not be included. 

 

Surely this clause you wrote implements all these criteria?

	IF 	InContact 		= 1 
		AND InVitalLast = 1 
		AND InDemog 	= 1;

 

> I have tried several IF THEN statements, but I still get 566 observations. However, with some IF THEN statements I get 11 observations that have a 'Yes' in the 'HypRelDeathInd' variable, and with other IF THEN statements I get 16. Either way, I still end up with 566 observations, which means there are 7 missing observations. 

 

I have no idea what you are after. Provide some sample data? 

 

 

 

adhunamukherjee
Fluorite | Level 6

Hi,

UT_RECORDS dataset has 7 observations which have ID with 8 digits. The ID should have typically 9 digits as it is a SSN. Here the first digit of 0 needs to be added to make it work. 

This is the reason why 36-78-8134 in HypTabs.CONTACT and HypTabs.DEMOG didn't merge with 036-78-8134 in HypTabs.VITALS.

 

You need to add the following when you create your CONTACT_UT and DEMOG_UT datasets which you created before:

IDChar = PUT(ID, 9.);
SSN = CATX("-", SUBSTR(IDChar, 1, 3), SUBSTR(IDChar, 4, 2), SUBSTR(IDChar, 6, 4));

IF ID < 100000000 THEN SSN = "0" || SSN;

This ensures that if the ID has 8 digits, then a 0 is added as a prefix to the SSN.

Once you make this change and generate the updated CONTACT and DEMOG datasets, you should get 573 observations.

Please let me know if this doesn't work.

Thanks,
Adhuna Mukhopadhyay

 

ChrisNZ
Tourmaline | Level 20

@adhunamukherjee With that kind of data, it's probably easier to merge numeric values.

if ID = input(compress(CHAR,,'dk'),10.);

 

mdakkak
Fluorite | Level 6

Thank you so much! This worked!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1969 views
  • 1 like
  • 5 in conversation