proc sort data=old.Ex1_Missing_Weeks out=ds_com; by SUBJECT RecordPosition Ref_Date1_ Ref_Date2_ EXSTDT ; where EXSTDT ne ''; run;
proc sort data=new.Ex1_Missing_Weeks out=ds_new; by SUBJECT RecordPosition Ref_Date1_ Ref_Date2_ EXSTDT; run;
data ds_new;
set ds_new;
if RecordPosition=. then RecordPosition=0;
subject=strip(subject);
run;
proc sort data=ds_new; by SUBJECT RecordPosition Ref_Date1_ Ref_Date2_ EXSTDT; run;
data ds_com_;
set ds_com;
length EXSTDT1 $10.;
if Ref_Date0_ ne . then
Ref_Date0_n =put(Ref_Date0_ ,yymmdd10.);
else Ref_Date0_n='';
if Ref_Date1_ ne . then
Ref_Date1_n =put(Ref_Date1_ ,yymmdd10.);
else Ref_Date1_n='';
if Ref_Date2_ ne . then
Ref_Date2_n =put(Ref_Date2_ ,yymmdd10.);
else Ref_Date2_n='';
if EXSTDT not in ('','None') then do;
mm=scan(EXSTDT,1,'/');
dd=scan(EXSTDT,2,'/');
if length(EXSTDT) =8 then
EXSTDT1=strip(substr(EXSTDT,5))|| '-0'||substr(EXSTDT,1,1) || '-0'|| substr(EXSTDT,3,1) ;
else if length(EXSTDT) =9 and length(mm)=1 then
EXSTDT1=strip(substr(EXSTDT,6))|| '-0'||substr(EXSTDT,1,1) || '-' ||substr(EXSTDT,3,2) ;
else if length(EXSTDT) =9 and length(dd)=1 then
EXSTDT1=strip(substr(EXSTDT,6))|| '-' ||substr(EXSTDT,1,2) || '-0'|| substr(EXSTDT,4,1) ;
else if length(EXSTDT)=10 then
EXSTDT1=substr(EXSTDT,7)|| '-' || substr(EXSTDT,1,2) ||'-'|| substr(EXSTDT,4,2) ;
end;
else EXSTDT1=EXSTDT;
subject=strip(subject);
DM_Comments=strip(DM_comments);
if RecordPosition=. then RecordPosition=0;
drop mm dd EXSTDT Ref_Date1_ Ref_Date2_ Ref_Date0_;
run;
proc sort data=ds_com_ out=ds_com1(rename=(EXSTDT1=EXSTDT Ref_Date0_n=Ref_Date0_ Ref_Date1_n=Ref_Date1_ Ref_Date2_n=Ref_Date2_));
by SUBJECT RecordPosition Ref_Date1_n Ref_Date2_n EXSTDT1 DM_Comments; run; *EXSTDT1=EXSTDT;
proc sort data=ds_com1; by SUBJECT RecordPosition Ref_Date1_ Ref_Date2_ EXSTDT; run;
data fin merge.Ex1_Missing_Weeks;
merge ds_new(in=a) ds_com1(in=b);
by SUBJECT RecordPosition Ref_Date1_ Ref_Date2_ EXSTDT;
if a;
run;
With this code it appears you are doing something, possibly unnatural, with date values.
if Ref_Date0_ ne . then Ref_Date0_n =put(Ref_Date0_ ,yymmdd10.); else Ref_Date0_n=''; if Ref_Date1_ ne . then Ref_Date1_n =put(Ref_Date1_ ,yymmdd10.); else Ref_Date1_n=''; if Ref_Date2_ ne . then Ref_Date2_n =put(Ref_Date2_ ,yymmdd10.); else Ref_Date2_n=''; if EXSTDT not in ('','None') then do; mm=scan(EXSTDT,1,'/'); dd=scan(EXSTDT,2,'/'); if length(EXSTDT) =8 then EXSTDT1=strip(substr(EXSTDT,5))|| '-0'||substr(EXSTDT,1,1) || '-0'|| substr(EXSTDT,3,1) ; else if length(EXSTDT) =9 and length(mm)=1 then EXSTDT1=strip(substr(EXSTDT,6))|| '-0'||substr(EXSTDT,1,1) || '-' ||substr(EXSTDT,3,2) ; else if length(EXSTDT) =9 and length(dd)=1 then EXSTDT1=strip(substr(EXSTDT,6))|| '-' ||substr(EXSTDT,1,2) || '-0'|| substr(EXSTDT,4,1) ; else if length(EXSTDT)=10 then EXSTDT1=substr(EXSTDT,7)|| '-' || substr(EXSTDT,1,2) ||'-'|| substr(EXSTDT,4,2) ; end; else EXSTDT1=EXSTDT;
99.9% of the time I see something that starts by Putting a date value into a character version and then tearing it apart tells me that someone does not know the functions Year, Month, Day exist to extract numeric values. All that EXSTDT with substr looks like someone does not know that all you need to do to change the appearance of a date value is to use a format.
The following code will write a date to the results window in just a few of the date formats that SAS provides.
data example; file print; date= '15DEC2020'd; put "date9. format" date= date9.; put "mmyydd10. format" date= mmddyy10.; put "yymon. format" date= yymon.; put "yyq. format" date= yyq.; put "yymmdd10. format" date= yymmdd10.; put "mmddyyd10. format" date= mmddyyd10.; put "mmddyyc10. format" date= mmddyyc10.; put "mmddyyb10. format" date= mmddyyb10.; put "mmddyyp10. format" date= mmddyyp10.; run;
If you want to sort, compare or display dates then the way is to use the SAS date numeric value and use the desired format. The Format of the numeric value means nothing when it comes to merging or comparing unless you have malformed dates with non-integer values.
And if the exercise was because one set had date in character form it is almost certainly easier to change the character version to numeric and avoid all that string manipulation.
Now: Which variable from which data set are you "unable to bring values" while merging?
If both data sets have a variable of the same name that is NOT on the BY statement then that is the intended behavior of Merge. One of the uses of merge is to replace values.
In the following code I expect the values for Y from the data set Two to replace the Y values in data set One;
data one; input x y z; datalines; 1 1 1 2 2 2 3 3 3 ; data two; input x y q; datalines; 1 11 44 2 22 22 3 33 33 ; data example; merge one two; by x; run;
So if you are "missing" one of the variables you may need to rename it to avoid this overwrite behavior.
Which is very easy to do:
data example2; merge one two(rename=(y=y2)); by x; run;
Thank you so much for taking your time and explaining in detail. The issue is however not obvious to me as there are no Errors/Warnings/Notes written to the Log window.
I need bring over values from one particular column called DM_Comments in dataset ds_com1(in=b), as this is a listing which helps carry over the past comments and running with fresh data. So each time DM team just needs to look at the new records with missing DM_Comments and not review the whole output all over again.
Hope that helps understand the rationale for the listing.
I am doing the similar merging for several other listings and the values in the column get carried over. I also used Strip() function just to remove any unwanted spaces in the By variables. Also tried with Proc Sql but same output.
I have also attached the Log herewith.
I think you may just need to examine your result a bit closer:
From your log:
26551 data fin merge.Ex1_Missing_Weeks; 26552 merge ds_new(in=a) ds_com1(in=b); 26553 by SUBJECT ;*RecordPosition Ref_Date1_ Ref_Date2_ EXSTDT; 26554 if a; 26555 run; NOTE: There were 16976 observations read from the data set WORK.DS_NEW. NOTE: There were 16211 observations read from the data set WORK.DS_COM1.
If this is supposed to be a one-to-one match then from the number of observations in the source data sets then 765 observations do not have a match in the DS_Com1 data. If all those appear at the top of the data set you may just think the values didn't come in if you did not scroll down far enough.
Or if there are no actual matches at all then the "if a;" means that none of the Ds_com1 data would be available.
Try removing or commenting out the "if a;" and see if your output data set jumps to 16976+16211 = 33187 observations (or some number significantly larger than 16976) that will tell you either no matches or very few.
Thanks again !
It is not supposed to be a one to one match since one of the dataset contains output from last month and more records have been entered since then. Here are some details on the datasets -
1. Old Dataset from which Comments needs to be carried over - Ds_com1 -
Subject Recordpos EXDSTXT DM_COMMENTS EXSTDT REF_DATE0_ REF_DATE1_ REF_DATE2_
101 1 15.5 Completed Dose 2018-10-12 2018-10-16 2018-10-16 2018-10-16
101 2 10.5 Incomplete Dose 2018-10-12 2018-10-16 2018-10-16 2018-10-16
2. New Dataset with fresh data which does not contain Comments - Ds_new -
Subject Recordpos EXDSTXT EXSTDT REF_DATE0_ REF_DATE1_ REF_DATE2_
101 1 15.5 2018-10-12 2018-10-16 2018-10-16 2018-10-16
101 2 10.5 2018-10-16 2018-10-20 2018-10-20 2018-10-20
101 3 15.5 2018-10-18 2018-10-22 2018-10-22 2018-10-22
3. Expected Dataset after merge - Fin/Mer -
Subject Recordpos EXDSTXT DM_COMMENTS EXSTDT REF_DATE0_ REF_DATE1_ REF_DATE2_
101 1 15.5 Completed Dose 2018-10-12 2018-10-16 2018-10-16 2018-10-16
101 2 10.5 Incomplete Dose 2018-10-16 2018-10-20 2018-10-20 2018-10-20
101 3 15.5 2018-10-18 2018-10-22 2018-10-22 2018-10-22
So, for the old matching records DM_comments column should have data carried over while for the new records it should be blank.
Data.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Make subsets of the final two data sets that you merge with ONE record each that should match.
Use the link above to generate a data step of each of those two one record data set.
Copy that code and paste it into a text box opened with the </>. This is critical. The message boxes on this forum will reformat text and often does so in a manner that the generated data steps will not run.
Then we can test the data.
Note: showing example data from either views, proc print output or similar can be influenced by things like formats so that you think the values are the same but the underlying values are not actually equal.
Thanks once again for your help! I figured the issue was with another program used to convert CSV into a sas dataset. Because of which some hidden spaces were getting saved in the column values used for merging. I recreated the dataset and it solved the issue.
Missing a column can be either because of dropping it unintentionally or because of renaming it.
You did not mention which column you miss. Could it be that you searched with mistyping the variable name ?
Check the DROP and RENAME statements in your code.
Hello,
The column is present in the merged dataset but with missing values or No values. However DM team has entered data in it so it should be present with values/data. Also, this column is only present in one dataset so overwriting is also not the possibility.
Thank you !
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.