Hi SAS communities,
I am currently creating a temporary data set that contains the most recent visit for each person. I have this code:
PROC SORT DATA= HypTabs.Vitals;
BY SSN VISITDT;
RUN;
DATA WORK.Vital_Last;
SET HYPTABS.Vitals;
BY SSN;
IF LAST.SSN;
RUN;
This gives me 6 columns and 598 observations. The goal is to get to 573 observations.
So then I merged 4 different data sets together and created the HypAnalysis2 data with this code:
DATA HYPANL.HYPANALYSIS2;
RETAIN SSN StateCd GenderCd EthCd RaceCd AgeAtVisit SBP DBP WtLb HypRelDeathInd;
MERGE WORK.Vital_Last (IN = InVitals)
HypTabs.NDI (IN = InNDI)
HypTabs.Contact (IN = InContact)
HypTabs.DEMOG (IN = InDemog);
BY SSN;
IF InContact = 1
AND InVitals = 1
AND InDemog = 1;
IF LAST.SSN;
AgeAtVisit=yrdif(BirthDt, VisitDt, 'AGE');
FORMAT BirthDt YYMMDD10.
AgeAtVisit 4.0;
/* IF InNDI = 0 THEN HypRelDeathInd = 0; */
IF InNDI = . THEN HypRelDeathInd = 0;
ELSE IF InNDI = 0 THEN HypRelDeathInd = 0;
ELSE IF InNDI = 1 THEN HypRelDeathInd = 1;
FORMAT HypRelDeathInd IndVbl.
StateCd $StateCd.;
FORMAT HypRelDeathInd IndVbl. StateCd $StateCd.;
DROP VisitDt HtIn DeathDt ICD10 CODCd City Inits ZipCd EthRaceCd BirthDt;
RUN;
Once I run that code, I get 403 observations (which should be 573) and 10 columns (which is what I want). In those data sets that were merged, there were 3 states, Utah, Iowa and Mississippi. When looking at the StateCd, I noticed that none of the observations where from Utah. Which leads me to assume that my Utah Vitals data is incomplete/incorrect. If needed, I can share my Utah Vitals data but I am not sure if that is the problem, or if the problem is within the code above.
Thanks in advance for any help!
This gives me 6 columns and 598 observations. The goal is to get to 573 observations.
This requires a lot more explanation. How do you want to get down to 573? Which 573? What is the logic?
I'm afraid that without having your data sets, there's no way we can help you get to the proper 573 records. But, if you'd like to send us small portions of these data sets, and tell us the right number of records to get when using these small portions of the data, then we might have a chance. Please provide the small portions of the data sets using these instructions, and not using any other method.
Hi!
I had a similar issue with the Utah dataset. I realized that some of the SSN from the Utah dataset were missing a leading 0 and were therefore only 8 digits long. I went back to Prog 2 and added this to where we created Contact_UT which fixed it for me.
IDLength=Lengthn(IDCHAR); IF IDLength=8 THEN IDCHAR=CAT(0,IDCHAR);
It's odd that you are missing all data for Utah though, maybe the Utah SSNs are not matching up to the other SSNs for some reason. If you want to share your code from Prog 2 where you created Contact_UT I can take a look for you.
Also, I'm not sure if you need this whole thing:
/* IF InNDI = 0 THEN HypRelDeathInd = 0; */
IF InNDI = . THEN HypRelDeathInd = 0;
ELSE IF InNDI = 0 THEN HypRelDeathInd = 0;
ELSE IF InNDI = 1 THEN HypRelDeathInd = 1;
If you're trying to get HypRelDeathInd to be 0 where it's missing data, all you would need is:
IF HypRelDeathInd = . THEN HypRelDeathInd = 0;
Hello mdakkak!
I am wondering if the problem is occurring during your IF and IF ELSE statements. For example, it could be possible that you are "overmatching" your data and, as a result, too many observations are being removed. Since you are non-match merging these data sets, you are dealing with gaps in the data sets which is why SAS is listing less observations and indicating that some observations do not exist.
I would consider trying to use an IF statement in conjunction with an IF NOT statement in replace of the code that you currently have for the merge. I think the solution is that you need to specify the InDemog and InVitals automatic variables to show up. In addition to this, you would need to manually populate the value for HypRelDeathInd using conditional logic. You can use the keyword "NOT" to call people who don't show up in your dataset and then manually fill in the indicator variable to be 0. I have listed an example for you to look at here:
As others have mentioned, it has to do with leading zero's in some of your observations ssn's. Go to your Utah demog dataset from the previous project and add this: ID = PUT(NewID,z9.); before you string together your SSN using CatX.
OR
If that doesn't work the other code to try using Z9 in the SUBSTR as follows:
SSN = CATX('-', SUBSTR(PUT(ID,Z9.),1,3), SUBSTR(PUT(ID,Z9.), 4,2), SUBSTR (PUT(ID,Z9.),6,4));
Good Luck.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.