BookmarkSubscribeRSS Feed
Dhana18
Obsidian | Level 7

Hi Reeza,

I did order freq to find out if there are any duplicate ids and there are no duplicate ids

proc freq data=STDTELMT.Treat18_TRANSPOSED order=freq;
table event_id;
run;

the visit data set has 2105 observations, treat data set has 752 observation with common variable event_id. I sorted both data set by event_id and tried to merge and they are not merged by event id instead treat data set is stacked on top of visit data set. Could you please tell me what did do wrong?

 

Sorting visit data by event_id:

proc sort data=STDTELMT.VISIT18_MERGED;
by event_id;
run;

 

/*SORTING TREATMENT DATA BEFORE TRANSPOSING BY EVENT_ID*/
PROC SORT DATA=RX_MERGED;;
BY EVENT_ID;
RUN;
/*REMOVING DUPLICATES*/
data STDTELMT.TREAT_COUNT;
set RX_MERGED;
by event_id;
t_f=first.event_id;
t_l=last.event_id;
if t_f=1;
run;
/*TRANSPOSING TREATMENT DATSET*/
proc transpose data=STDTELMT.TREAT_COUNT out=STDTELMT.Treat18_TRANSPOSED;
by EVENT_ID ;
ID RX;
var RX;
run;

merging two (visit and treat) data sets

DATA STDTELMT.MERGED ;
MERGE STDTELMT.VISIT18_MERGED STDTELMT.Treat18_TRANSPOSED;
by event_id;
RUN;

the log says :

NOTE: There were 2105 observations read from the data set STDTELMT.VISIT18_MERGED.
NOTE: There were 752 observations read from the data set STDTELMT.TREAT18_TRANSPOSED.
NOTE: The data set STDTELMT.MERGED has 2857 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.01 seconds

2 REPLIES 2
ballardw
Super User

@Dhana18 wrote:

Hi Reeza,

I did order freq to find out if there are any duplicate ids and there are no duplicate ids

proc freq data=STDTELMT.Treat18_TRANSPOSED order=freq;
table event_id;
run;

 

DATA STDTELMT.MERGED ;
MERGE STDTELMT.VISIT18_MERGED STDTELMT.Treat18_TRANSPOSED;
by event_id;
RUN;

the log says :

NOTE: There were 2105 observations read from the data set STDTELMT.VISIT18_MERGED.
NOTE: There were 752 observations read from the data set STDTELMT.TREAT18_TRANSPOSED.
NOTE: The data set STDTELMT.MERGED has 2857 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.01 seconds


Since 2105 + 752 = 2857 that indicates that there are no exact matches at all between the values of the EVENT_ID variable in STDTELMT.VISIT18_MERGED and STDTELMT.TREAT18_TRANSPOSED

 

So you need to look very closely at the values of your variable.

If the variable is character look for leading spaces or differences in capitalization between values you think match. Also there might be other non-printable characters trailing. Check the assigned length of the values to see if they match.

data example;
   x='abc ';
   y=length(x);
   put y=;
run;

In the above code the character after the c in 'abc ' is not a blank but the ASCII null character. So while the output might look like X is 3 characters it actually has 4 and would not match a value of 'abc' or 'abc ' (with a space) for merging.

 

If the variable is numeric you might accidentally have decimal portions of the value in one or both sets that don't match.

FreelanceReinh
Jade | Level 19

Hi @Dhana18,

 

ballardw's advice is very good (as always). If I were to implement it, I would proceed as follows:

 

First, print unformatted values

 

proc print data=stdtelmt.treat18_transposed(firstobs=752);
format event_id;
var event_id;
run;

proc print data=stdtelmt.visit18_merged(obs=1);
format event_id;
var event_id;
run;

Both steps print only one EVENT_ID value. According to your problem description ("treat data set is stacked on top of visit data set"), the first of the two printed values must be lexicographically smaller than the second (assuming character variables, otherwise: numerically smaller) -- contrary to your expectation of matching values in VISIT18_MERGED for each EVENT_ID from TREAT18_TRANSPOSED. If this does not seem to be the case, please repeat the above two steps with an extended FORMAT statement:

 

format event_id $hex30.;

(again, assuming character variables EVENT_ID, of length <=15; if they are numeric, please use format event_id hex16.;)

and post the results.

 

 

 

@ Super Users: It looks like this thread is actually a continuation of this one and the two should be merged.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 525 views
  • 0 likes
  • 3 in conversation