Solved: how to merge keeping one row for each key with missing values

lvalencia · Posted 02-15-2023 10:52 AM

I'm trying to merge two tables as below:

Table 1

ID	Column1
1	0
2	0
3	10
4	50
5	20

Table2

ID	Column2	Column3
1	27	.
2	.	.
3	5	.
4	52	71
5	28	84

The two tables are merged with the code below:

DATA table_merged;

        MERGE table1 table2;

        BY ID;

        proc print data=table_merged;

RUN;

I'm receiving the table below:

table_merged :

ID	Column1	Column2	Column3
1	0	.	.
.	.	27	.
2	0	.	.
3	10	.	.
.	.	5	.
4	50	52	71
5	20	28	84

Even when the ID fields have the same Format, it seems the missing values are saved in another row without ID. Can someone please help me to understand what I'm missing here?

ballardw · Posted 02-22-2023 07:18 AM

Try printing Test4 with the Format of your cohort variable changed to something like yymmdd10. or Date9.

The symptom you show is quite likely that the underlying value of the cohort variable is different.

You don't provide an example of the data from work.cl_parameters_actuals. One suspects that at least some of the dates in the Cohort variable there are not actually the last day of the quarter. The behavior in proc means/summary of a format to create a group from a variable is that the lowest value present in the data is used for the result. Since you have created your Cohort variable in your Qtrlist data set to be the END of the quarter then it is very likely that the actual date values used for the MERGE do not match, creating multiple records.

So to get what I believe you expect force the cohort variable to be the same date in both sets. That likely means a pass through a data step for your work.cl_parameters_actuals data set and INTNX to the end of the intervals.

View solution in original post

Quentin · Posted 02-15-2023 11:14 AM

You're right, this doesn't make sense. When you merge two data sets by ID, it is not possible for the output dataset to have values of ID which are not in either of the two data sets you merged.

I would double check the values of ID in table1 and table2, to make sure they don't have any missing values.

proc freq data=table1;
   tables ID /missing;
run;

proc freq data=table2;
   tables ID /missing;
run;

proc freq data=table_merged;
   tables ID /missing;
run;

Astounding · Posted 02-15-2023 11:15 AM

The format of the variables does not influence the outcome. However ...

There is no way on God's green earth that those data sets combined to produce the output you have shown. Neither data set contains an observation where ID has a missing value. So the combined data set cannot contain such a value.

As a best guess, one data set contains slightly different values than posted. For example, Table2 does not contain an observation with ID=1. Instead, it contains an observation with ID=1.00000000001 and that is rounded to 1 for printing purposes. That wouldn't explain everything about the results you got, but it would explain a lot.

Or perhaps ID is a character variable, and a couple of values in Table2 contain a trailing tab character instead of a trailing blank.

ballardw · Posted 02-15-2023 11:18 AM

One suspects that your tables do not contain what you think that they may.

Creating data sets using you shown values:

data table1;
  input id column1;
datalines;
1 0
2 0
3 10
4 50
5 20
;

data table2;
   input id column2 column3;
datalines;
1 27 .
2 . .
3 5 .
4 52 71
5 28 84
;

DATA table_merged;
    MERGE table1 table2;
     BY ID;
run;

proc print data=table_merged;
run;

Result:

Obs	id	column1	column2	column3
1	1	0	27	.
2	2	0	.	.
3	3	10	5	.
4	4	50	52	71
5	5	20	28	84

Provide examples of your data as something other than a word processor table. Best is to convert your data to data step code: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

Or show the LOG from your merge step. Copy the code and all the notes or messages and paste into a text box.

Tom · Posted 02-15-2023 11:22 AM

What you ran is probably different than what you think you ran.

Let's convert your text into actual code and try it.

data table1 ;
  input ID Column1;
cards;
1 0
2 0
3 10
4 50
5 20
;
data table2;
  input ID Column2 Column3;
cards;
1 27 .
2 . .
3 5 .
4 52 71
5 28 84
;

data want;
  merge table1 table2;
  by id;
run;

proc print;
run;

Result

Obs    ID    Column1    Column2    Column3

 1      1        0         27          .
 2      2        0          .          .
 3      3       10          5          .
 4      4       50         52         71
 5      5       20         28         84

The only way to get the pattern of merging you show is if the values of ID for two of the observations in TABLE2 are slightly different.

data table2;
  input ID Column2 Column3;
cards;
1.1 27 .
2 . .
3.1 5 .
4 52 71
5 28 84
;

But that does not explain why the values of ID do not print.

Obs     ID    Column1    Column2    Column3

 1     1.0        0          .          .
 2     1.1        .         27          .
 3     2.0        0          .          .
 4     3.0       10          .          .
 5     3.1        .          5          .
 6     4.0       50         52         71
 7     5.0       20         28         84

lvalencia · Posted 02-22-2023 06:14 AM

Dear All,

Apologise for the initial post. (struggling to understand how posts here work, anyways)

Below the code I'm running:

%let start_qtr = '31JAN2016'd;
%let stop_qtr = intnx("quarter", today(), -1, 'e');

data qtrlist;
	diff=intck('quarter',&start_qtr,&stop_qtr);
		do i= 0 to diff;
		cohort=intnx('quarter',&start_qtr,i,'e');
			output;
		end;
		format cohort yyq4.;
		drop i diff ;
run;

PROC MEANS data = work.cl_parameters_actuals sum maxdec=2;
	where report_date = "&rep_end."d and date_cancellation >= "&rep_begin."d and date_cancellation <= "&rep_end."d;
	class cohort;
	var balance_cancelled actuals_def_amount early_month_defaults later_month_defaults;
	format cohort yyq4.;
	label balance_cancelled = "Default";
	label actuals_def_amount = "Default based on dunning level 4";
	label early_month_defaults = "Early month defaults";
	label later_month_defaults = "Later month defaults";
	output out=default_actuals_by_qtr (drop=_TYPE_ _FREQ_) sum= / autoname;
RUN;

PROC SORT data=default_actuals_by_qtr;
   by cohort;
	where not missing(cohort);
RUN;

DATA test4;
   merge qtrlist default_actuals_by_qtr;
   by cohort;
  proc print data=test4; 
RUN;

Below a shorcut of the table I receive:

It seems the merge is not working well. Happy to learn what is my mistake here. Thanks!

ballardw · Posted 02-22-2023 07:18 AM

Try printing Test4 with the Format of your cohort variable changed to something like yymmdd10. or Date9.

The symptom you show is quite likely that the underlying value of the cohort variable is different.

You don't provide an example of the data from work.cl_parameters_actuals. One suspects that at least some of the dates in the Cohort variable there are not actually the last day of the quarter. The behavior in proc means/summary of a format to create a group from a variable is that the lowest value present in the data is used for the result. Since you have created your Cohort variable in your Qtrlist data set to be the END of the quarter then it is very likely that the actual date values used for the MERGE do not match, creating multiple records.

So to get what I believe you expect force the cohort variable to be the same date in both sets. That likely means a pass through a data step for your work.cl_parameters_actuals data set and INTNX to the end of the intervals.

how to merge keeping one row for each key with missing values

Re: how to merge keeping one row for each key with missing values

Re: how to merge keeping one row for each key with missing values

Re: how to merge keeping one row for each key with missing values

Re: how to merge keeping one row for each key with missing values

Re: how to merge keeping one row for each key with missing values

Re: how to merge keeping one row for each key with missing values

Re: how to merge keeping one row for each key with missing values

Registration is open

Registration is open

SAS Training: Just a Click Away