- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data SP340_WITH_REG_TIME_PtLHIN nonmatches ;
merge sp340_wregtime_sorted(in=INSp340) DAD1718_sorted(in=INDAD)(KEEP=PATIENT_LHIN);
by CIHI_KEY;
if INSp340 =1 and INDAD=1 then output SP340_WITH_REG_TIME_PtLHIN;
ELSE output nonmatches;
run;
Can somone please advise how to write the keep option correctly in the merge statement
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Ranjeeta,
With the KEEP= dataset option you exclude all other variables. So, you should add the BY variable to the keep list:
keep=patient_lhin cihi_key.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do it in the same parenthesis
data SP340_WITH_REG_TIME_PtLHIN nonmatches ;
merge sp340_wregtime_sorted(in=INSp340) DAD1718_sorted(in=INDAD KEEP=PATIENT_LHIN);
by CIHI_KEY;
if INSp340 =1 and INDAD=1 then output SP340_WITH_REG_TIME_PtLHIN;
ELSE output nonmatches;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
but I see the variabe when I do proc contents Please advise
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If SAS says that a variable is not in a data set, it is not there. Period.
Please post the output from the contents procedure
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It was because i did not nclude the variable on keep option
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There were 620 observations read from the data set WORK.SP340_WREGTIME_SORTED.
NOTE: There were 1197772 observations read from the data set WORK.DAD1718_SORTED.
NOTE: The data set EVT.SP340_WITH_REG_TIME_PTLHIN has 620 observations and 17 variables.
NOTE: The data set WORK.NONMATCHES has 1197174 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 7.00 seconds
cpu time 3.23 seconds
Why am i getting 22 additional recors in nonmatches
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The totals should only add up if both datasets only have one observation per BY group.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
merge EVT.EVT_hometime(in=INfullcohort) EVT.Evt_qbp(in=INQBP);
by CIHI_KEY;
if INfullcohort =1 and INQBP=0 then output EVTnotQBP;
ELSE output EVTQBP;
run;
EVT.Home time data set has 696 keys and EVT.Evt_qbp has 687 keys
I am trying to write out the 9 keys in the EVTnotQBP dataset and want to know if my code is doing it correctly?
Also in the EVTQBP dataset I want to write out the keys that are only present in the EVT.Evt_qbp dataset which has 687 observations
Is my code correct to get the desired output?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are up to three (mutually exclusive) possible cases for the pair (INfullcohort, INQBP): (1, 0), (0, 1) and (1, 1). You write the observations belonging to the first case to dataset EVTnotQBP (which makes sense), but EVTQBP will then in general contain observations with "keys that are only present in the EVT.Evt_qbp dataset" plus observations with keys that are present in both datasets.
Edit: Regarding your previous question:
For example, even if the 620 keys (CIHI_KEY values) in SP340_WREGTIME_SORTED were unique, it could be that each of 22 of them match 2 observations in DAD1718_SORTED while 22 other keys don't match any observation. Now, if the remaining 620-22-22=576 keys lead to one-to-one matches, you'll get 576*1+22*2=620 observations in the "intersection" (SP340_WITH_REG_TIME_PTLHIN). Finally, DAD1718_SORTED will contribute 1197772-620 observations to dataset NONMATCHES in addition to the 22 with non-matching keys from SP340_WREGTIME_SORTED, resulting in a total of 1197772-620+22=1197174 observations in NONMATCHES.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just insert the appropriate IF condition into the ELSE statement:
else if INfullcohort=0 and INQBP=1 then output EVTQBP;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Ranjeeta,
With the KEEP= dataset option you exclude all other variables. So, you should add the BY variable to the keep list:
keep=patient_lhin cihi_key.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
NOTE: There were 1197772 observations read from the data set WORK.DAD1718_SORTED.
NOTE: The data set EVT.SP340_WITH_REG_TIME_PTLHIN has 620 observations and 17 variables.
NOTE: The data set WORK.NONMATCHES has 1197174 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 7.00 seconds
cpu time 3.23 seconds
Why am I getting 22 additional records
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content