data SP340_WITH_REG_TIME_PtLHIN nonmatches ;
merge sp340_wregtime_sorted(in=INSp340) DAD1718_sorted(in=INDAD)(KEEP=PATIENT_LHIN);
by CIHI_KEY;
if INSp340 =1 and INDAD=1 then output SP340_WITH_REG_TIME_PtLHIN;
ELSE output nonmatches;
run;
Can somone please advise how to write the keep option correctly in the merge statement
Hi @Ranjeeta,
With the KEEP= dataset option you exclude all other variables. So, you should add the BY variable to the keep list:
keep=patient_lhin cihi_key.
Do it in the same parenthesis
data SP340_WITH_REG_TIME_PtLHIN nonmatches ;
merge sp340_wregtime_sorted(in=INSp340) DAD1718_sorted(in=INDAD KEEP=PATIENT_LHIN);
by CIHI_KEY;
if INSp340 =1 and INDAD=1 then output SP340_WITH_REG_TIME_PtLHIN;
ELSE output nonmatches;
run;
If SAS says that a variable is not in a data set, it is not there. Period.
Please post the output from the contents procedure
It was because i did not nclude the variable on keep option
There were 620 observations read from the data set WORK.SP340_WREGTIME_SORTED.
NOTE: There were 1197772 observations read from the data set WORK.DAD1718_SORTED.
NOTE: The data set EVT.SP340_WITH_REG_TIME_PTLHIN has 620 observations and 17 variables.
NOTE: The data set WORK.NONMATCHES has 1197174 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 7.00 seconds
cpu time 3.23 seconds
Why am i getting 22 additional recors in nonmatches
The totals should only add up if both datasets only have one observation per BY group.
There are up to three (mutually exclusive) possible cases for the pair (INfullcohort, INQBP): (1, 0), (0, 1) and (1, 1). You write the observations belonging to the first case to dataset EVTnotQBP (which makes sense), but EVTQBP will then in general contain observations with "keys that are only present in the EVT.Evt_qbp dataset" plus observations with keys that are present in both datasets.
Edit: Regarding your previous question:
For example, even if the 620 keys (CIHI_KEY values) in SP340_WREGTIME_SORTED were unique, it could be that each of 22 of them match 2 observations in DAD1718_SORTED while 22 other keys don't match any observation. Now, if the remaining 620-22-22=576 keys lead to one-to-one matches, you'll get 576*1+22*2=620 observations in the "intersection" (SP340_WITH_REG_TIME_PTLHIN). Finally, DAD1718_SORTED will contribute 1197772-620 observations to dataset NONMATCHES in addition to the 22 with non-matching keys from SP340_WREGTIME_SORTED, resulting in a total of 1197772-620+22=1197174 observations in NONMATCHES.
Just insert the appropriate IF condition into the ELSE statement:
else if INfullcohort=0 and INQBP=1 then output EVTQBP;
Hi @Ranjeeta,
With the KEEP= dataset option you exclude all other variables. So, you should add the BY variable to the keep list:
keep=patient_lhin cihi_key.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.