Hello,
I have to merge two data sets by subject_id and measure date.
First data set is as follows, (data1)
Subject_id. Measure-date
ABC0001 8/2017
ABC0002 10/2020
ABC0005 9/2017
Second data set is as follows; (data2)
Subject_id Measuredate height weight record_number
ABC0001 5/2012 176 56 500
ABC0001 6/2016 180 70 500
ABC0001 8/2017 182 99 500
ABC0002 10/2020 160 55 510
ABC0002 11/2021 170 66 510
ABC0005 9/2017 190 80 502
Desired data is as follows; (data want)
Subject_id measure date height weight record_number
ABC0001 8/2017 182 99 500
ABC0002 10/2020 160 55 510
ABC0005 9/2017 190 80 502
Problem is as follows:
The data that I get:
Subject_id measure date height weight record_number
ABC0001 8/2017 182 99 500
ABC0002 10/2020 . . .
ABC0005 9/2017 . . .
The dataset I am working on is ACTUALLY thousands of observations. When I merge the data sets, I get some observations merged as desired, while the majority of the other observations are appearing as missing data. I am not sure what the problem is.
The code I am using is:
Proc sort data=data1; by subject_id measuredate;run;
proc sort data=data2; by subject_id measure date; run;
data want;
merge data1 (in=a) data2 (in=b); by subject_id measuredate; if b; run;\
I strongly suspect that "measure date" is your problem. It appears to be a character variable and it looks like it is affected by leading blanks. To confirm what your variable types are run this:
proc contents data = data1;
run;
proc contents data = data2;
run;
If measure date is character, then it would be a very good idea to convert it to a numeric SAS date. That might require adding a day to the month and year you currently have like the first or last day of the month.
@Rahim_221 I don't see how the data you showed in the initial question could possibly be a three-component date. You show it as a single variable, and the values only display month and year:
First data set is as follows, (data1)
Subject_id. Measuredate
ABC0001 8/2017
ABC0002 10/2020
ABC0005 9/2017
The only way that those values could contain a day portion is if they were numeric SAS dates being displayed with a format like mmyys10. For the other data set, you say Measuredate contains only month and year. For that to be true, it would have to be a character (text) value. And if both of those things are true, you can't perform a SAS DATA step merge based on variables of different types. So it would be useful to provide code that will re-recreate the two data sets to help us better understand the issues you are experiencing.
You could try using the data2datastep macro to produce the code required to reproduce your data. For example, this SAS program grabs the source code for the data2datastep macro from the internet and compiles the macro for you. Then, a macro call reads the cars dataset from the sashelp library and generates a program that will re-create 10 observations of the data in the work library. The program is saved to a file named make_cars_data.sas and in home directory (~/😞
/* Get and compile the macro code from GitHub */
filename getmacro url "https://raw.githubusercontent.com/SASJedi/sas-macros/master/data2datastep.sas";
%include getmacro;
filename getmacro clear;
/* Call the macro to do the work */
%DATA2DATASTEP(cars,sashelp,work,~/make_cars_data.sas,10)
If you just leave the last parameter blank, the whole dataset will be re-created. After running that program, the make_cars_data.sas looks like this:
data work.CARS(label='2004 Car Data');
infile datalines dsd truncover;
input Make:$13. Model:$40. Type:$8. Origin:$6. DriveTrain:$5. MSRP:DOLLAR8. Invoice:DOLLAR8. EngineSize:32. Cylinders:32. Horsepower:32. MPG_City:32. MPG_Highway:32. Weight:32. Wheelbase:32. Length:32.;
format MSRP DOLLAR8. Invoice DOLLAR8.;
label EngineSize="Engine Size (L)" MPG_City="MPG (City)" MPG_Highway="MPG (Highway)" Weight="Weight (LBS)" Wheelbase="Wheelbase (IN)" Length="Length (IN)";
datalines4;
Acura,MDX,SUV,Asia,All,"$36,945","$33,337",3.5,6,265,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820","$21,761",2,4,200,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990","$24,647",2.4,4,200,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195","$30,299",3.2,6,270,20,28,3575,108,186
Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755","$39,014",3.5,6,225,18,24,3880,115,197
;;;;
If you do that for both of your data sets and then share the resulting code to reproduce them here, we can help resolve the problem more quickly.
All the best,
Mark
If both variables are actually SAS date values, align them to the first day of the month by using the INTNX function with the "b" modifier.
measure_date = intnx('month',measure_date,0,'b');
It is only going to work if the variables are storing the same type of information.
If they are numeric variables with date values then make sure they are aligned to the same date of the month.
measure_date = intnx('month',measure_date,0);
If they are character strings make sure they are consistently formatted. Do the months have leading zeros or not? Do the values have leading spaces or not? It would probably be better to convert them into date values instead of leaving them as character strings.
measure_date_number = input(cats('1/',measure_date),ddmmyy10.);
format measure_date_number yymms7.;
If you leave them as charcter strings then standardize the style used.
measure_date = put(input(cats('1/',measure_date),ddmmyy10.),yymms7.);
Note that it is best to store dates as character strings in YMD order so that when sorted they will appear in chronological order.
You apparently want one record per id, containing measure_date from DATA1 (which only has 1 obs per id), and all other variables from the most recent obs in DATA2.
If so, assuming data1 is sorted by ID, and data2 is sorted by ID/measure_date, then:
data want;
merge data1 data2 (drop=measure_date);
by id;
if last.id;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.