BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ranjeeta
Pyrite | Level 9

data SP340_WITH_REG_TIME_PtLHIN nonmatches ;
merge sp340_wregtime_sorted(in=INSp340) DAD1718_sorted(in=INDAD)(KEEP=PATIENT_LHIN);
by CIHI_KEY;
if INSp340 =1 and INDAD=1 then output SP340_WITH_REG_TIME_PtLHIN;
ELSE output nonmatches;
run;

Can somone please advise how to write the keep option correctly in the merge statement 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Ranjeeta,

 

With the KEEP= dataset option you exclude all other variables. So, you should add the BY variable to the keep list:
keep=patient_lhin cihi_key.

View solution in original post

14 REPLIES 14
PeterClemmensen
Tourmaline | Level 20

Do it in the same parenthesis

 

data SP340_WITH_REG_TIME_PtLHIN nonmatches ;
merge sp340_wregtime_sorted(in=INSp340) DAD1718_sorted(in=INDAD KEEP=PATIENT_LHIN);
by CIHI_KEY;
if INSp340 =1 and INDAD=1 then output SP340_WITH_REG_TIME_PtLHIN;
ELSE output nonmatches;
run;
Ranjeeta
Pyrite | Level 9
I get the error BY variable CIHI_KEY is not on input data set WORK.DAD1718_SORTED
but I see the variabe when I do proc contents Please advise
PeterClemmensen
Tourmaline | Level 20

If SAS says that a variable is not in a data set, it is not there. Period.

 

Please post the output from the contents procedure

Ranjeeta
Pyrite | Level 9

It was because i did not nclude the variable on keep option

Ranjeeta
Pyrite | Level 9

There were 620 observations read from the data set WORK.SP340_WREGTIME_SORTED.
NOTE: There were 1197772 observations read from the data set WORK.DAD1718_SORTED.
NOTE: The data set EVT.SP340_WITH_REG_TIME_PTLHIN has 620 observations and 17 variables.
NOTE: The data set WORK.NONMATCHES has 1197174 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 7.00 seconds
cpu time 3.23 seconds

 

Why am i getting 22 additional recors in nonmatches

Tom
Super User Tom
Super User

The totals should only add up if both datasets only have one observation per BY group.

Ranjeeta
Pyrite | Level 9
data EVTQBP EVTnotQBP;
merge EVT.EVT_hometime(in=INfullcohort) EVT.Evt_qbp(in=INQBP);
by CIHI_KEY;
if INfullcohort =1 and INQBP=0 then output EVTnotQBP;
ELSE output EVTQBP;
run;
EVT.Home time data set has 696 keys and EVT.Evt_qbp has 687 keys
I am trying to write out the 9 keys in the EVTnotQBP dataset and want to know if my code is doing it correctly?
Also in the EVTQBP dataset I want to write out the keys that are only present in the EVT.Evt_qbp dataset which has 687 observations
Is my code correct to get the desired output?

FreelanceReinh
Jade | Level 19

There are up to three (mutually exclusive) possible cases for the pair (INfullcohort, INQBP): (1, 0), (0, 1) and (1, 1). You write the observations belonging to the first case to dataset EVTnotQBP (which makes sense), but EVTQBP will then in general contain observations with "keys that are only present in the EVT.Evt_qbp dataset" plus observations with keys that are present in both datasets.

 

Edit: Regarding your previous question:

For example, even if the 620 keys (CIHI_KEY values) in SP340_WREGTIME_SORTED were unique, it could be that each of 22 of them match 2 observations in DAD1718_SORTED while 22 other keys don't match any observation. Now, if the remaining 620-22-22=576 keys lead to one-to-one matches, you'll get 576*1+22*2=620 observations in the "intersection" (SP340_WITH_REG_TIME_PTLHIN). Finally, DAD1718_SORTED will contribute 1197772-620 observations to dataset NONMATCHES in addition to the 22 with non-matching keys from SP340_WREGTIME_SORTED, resulting in a total of 1197772-620+22=1197174 observations in NONMATCHES.

Ranjeeta
Pyrite | Level 9
ok then how would i write the code to get only the EVT QBP cases
FreelanceReinh
Jade | Level 19

Just insert the appropriate IF condition into the ELSE statement:

else if INfullcohort=0 and INQBP=1 then output EVTQBP;
Ranjeeta
Pyrite | Level 9
Thankyou !!
FreelanceReinh
Jade | Level 19

Hi @Ranjeeta,

 

With the KEEP= dataset option you exclude all other variables. So, you should add the BY variable to the keep list:
keep=patient_lhin cihi_key.

Ranjeeta
Pyrite | Level 9
There were 620 observations read from the data set WORK.SP340_WREGTIME_SORTED.
NOTE: There were 1197772 observations read from the data set WORK.DAD1718_SORTED.
NOTE: The data set EVT.SP340_WITH_REG_TIME_PTLHIN has 620 observations and 17 variables.
NOTE: The data set WORK.NONMATCHES has 1197174 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 7.00 seconds
cpu time 3.23 seconds

Why am I getting 22 additional records
Ranjeeta
Pyrite | Level 9
thankyou

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 28324 views
  • 4 likes
  • 4 in conversation