Solved: Re: Problems with merging two exactly same datasets with differences of results from proc transpose

VinnyR · Posted 02-25-2019 01:01 PM

152
1153  proc sort data=lbc;
1154    by subject foldername lbdtn lbdtc recordid;
1155  run;

NOTE: There were 399 observations read from the data set WORK.LBC.
NOTE: SAS sort was used.
NOTE: The data set WORK.LBC has 399 observations and 83 variables.
NOTE: Compressing data set WORK.LBC decreased size by 86.67 percent.
      Compressed is 2 pages; un-compressed would require 15 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds


1156
1157  proc transpose data=lbc out=trlbcraw;
1158    by subject foldername lbdtn lbdtc recordid;
1159    var na_raw k_raw creat_raw  cl_raw BICARB_raw gluc_raw bun_raw ca_raw ua_raw phos_raw mg_raw creact_raw ck_raw bnp_raw;
1160  run;

NOTE: There were 399 observations read from the data set WORK.LBC.
NOTE: The data set WORK.TRLBCRAW has 5586 observations and 8 variables.
NOTE: Compressing data set WORK.TRLBCRAW decreased size by 71.43 percent.
      Compressed is 10 pages; un-compressed would require 35 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


1161
1162  proc transpose data=lbc out=trlbcun;
1163    by subject foldername lbdtn lbdtc recordid;
1164    var na_un k_un creat_un  cl_un BICARB_un gluc_un bun_un ca_un ua_un phos_un mg_un creact_un ck_un bnp_un;
1165  run;

NOTE: There were 399 observations read from the data set WORK.LBC.
NOTE: The data set WORK.TRLBCUN has 5586 observations and 8 variables.
NOTE: Compressing data set WORK.TRLBCUN decreased size by 83.64 percent.
      Compressed is 9 pages; un-compressed would require 55 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           0.05 seconds
      cpu time            0.06 seconds


1166
1167  data chem;
1168    merge  trlbcraw(in=x1 rename=(col1=LBORRES) drop=_name_ _label_) trlbcun(in=x2 rename=(col1=LBORRESU)drop=_name_ _label_);
1169    by subject foldername lbdtn lbdtc recordid;
1170    if x1;
1171  run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 5586 observations read from the data set WORK.TRLBCRAW.
NOTE: There were 5586 observations read from the data set WORK.TRLBCUN.
NOTE: The data set WORK.CHEM has 5586 observations and 7 variables.
NOTE: Compressing data set WORK.CHEM decreased size by 86.54 percent.
      Compressed is 7 pages; un-compressed would require 52 pages.
NOTE: DATA statement used (Total process time):
      real time           0.05 seconds
      cpu time            0.06 seconds

Hi,

Can somebody help with the MERGE statement issue? I want exactly the same output, but without the MERGE statement note. I think PROC SQL can help, but I am not able to figure out how to get there. Any input is appreciated.

EDITED: 1:40 PM EST. Thanks

Thanks Kurt, I wasn't aware of this functionality.

Tom · Posted 02-25-2019 03:40 PM

If you just want to match observations from two datasets here are some choices.

Use MERGE without BY statement. Make sure to set the right setting for the MERGENOBY option.
Add a new variable to use for the BY statement.
Use two separate SET statements.

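/* Option 1: MERGE without BY (MERGENOBY must allow it) */
options mergenoby=nowarn;
data one;
  merge A B ;
run;

/* Option 2: add a row counter N to each dataset, then merge BY it */
data Ax ;
  n+1;
  set a;
run;
data Bx;
  n+1;
  set b;
run;
data two;
  merge ax bx;
  by n;
run;

/* Option 3: two SET statements read one observation from each dataset per iteration */
data three;
  set a;
  set b;
run;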

Whether any of these make sense for your code I don't know since I don't really understand what you are trying to do.

Note that if the number of records does NOT match between the two datasets, the results will differ depending on the method used. In the last one, the extra records from the longer file will be lost. In the first one, the values from the last record of the shorter file will be repeated for all of the extra records from the longer file. In the middle one, the variables from the shorter file will be missing on the extra records.
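To see how the counter method and the two-SET method differ when the record counts do not match, a small sketch with made-up data may help. The datasets A and B and the variables X and Y below are hypothetical and exist only for illustration. The MERGE-without-BY method is left out because whether it runs at all depends on the MERGENOBY setting.

```sas
/* Hypothetical toy data: A has 3 observations, B has 2 */
data a;
  do x = 1 to 3;
    output;
  end;
run;
data b;
  do y = 10 to 20 by 10;
    output;
  end;
run;

/* Counter method: every row of the longer dataset is kept */
data ax;
  n + 1;
  set a;
run;
data bx;
  n + 1;
  set b;
run;
data two;
  merge ax bx;
  by n;
run;

/* Two SET statements: the step ends when the shorter dataset is exhausted */
data three;
  set a;
  set b;
run;

proc print data=two;
run;
proc print data=three;
run;
```

With this data, TWO should have 3 observations with Y missing on the third, and THREE should have 2 observations.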

Kurt_Bremser · Posted 02-25-2019 01:31 PM

Please post logs or code by simply copy/pasting them into a window opened with the {i} or "little running man" button. Then we can all read them, even behind corporate firewalls that prevent the download of Office files.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

VinnyR · Posted 02-25-2019 01:42 PM

Thanks, edited the post!

Astounding · Posted 02-25-2019 02:02 PM

You have the right number of observations in the right order for both incoming data sets. Just remove the BY statement and the IF statement.

VinnyR · Posted 02-25-2019 02:07 PM

The BY statement is a required statement if you merge datasets.

SASKiwi · Posted 02-25-2019 02:49 PM

No, it is not. You won't get an error if you MERGE datasets without a BY statement.

Astounding · Posted 02-25-2019 02:47 PM

No, it's not. It's often needed to get the proper result, but it isn't needed in this case.

VinnyR · Posted 02-25-2019 02:53 PM

1399
1400 data chem;
1401 merge trlbcraw(in=x1 rename=(col1=LBORRES) drop=_name_ _label_) trlbcun(in=x2 rename=(col1=LBORRESU)drop=_name_ _label_);
1402 *by subject foldername lbdtn lbdtc recordid;
1403 run;

ERROR: No BY statement was specified for a MERGE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.CHEM may be incomplete. When this step was stopped there were 0 observations and 7 variables.
WARNING: Data set WORK.CHEM was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

VinnyR · Posted 02-25-2019 02:55 PM

An organization-specific rule, maybe! Is there any other solution?

Kurt_Bremser · Posted 02-25-2019 03:20 PM

You have a non-default setting of system option MERGENOBY.
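One way to confirm the current setting is to ask SAS directly. The following is a minimal sketch; both statements write their result to the log.

```sas
/* Show the current value of the MERGENOBY system option in the log */
proc options option=mergenoby;
run;

/* Alternative: print just the value with the GETOPTION function */
%put MERGENOBY is set to: %sysfunc(getoption(mergenoby));
```

If the log shows ERROR, a MERGE without a BY statement stops the step, as in the log posted above.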

VinnyR · Posted 02-25-2019 03:32 PM

Thanks, Kurt, but I am not sure whether it is a good idea to change this option. Is there anything wrong with doing so?

Kurt_Bremser · Posted 03-01-2019 04:20 AM

This option can be set to a non-default value in order to alert programmers to a missing BY statement when a MERGE is used (a merge without BY is usually not wanted and causes problems).

If the option is set to ERROR for your organization, and you need a merge without BY, you can temporarily set the option to NOWARN before the step and then reset it to ERROR after the step in order to comply with coding policies. When all three steps (option, data step, option) happen in immediate succession, it is clear what you are doing and that you are doing it on purpose.
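A sketch of that option-data-option sequence, using the datasets from the original question, could look like the following. The macro variable name saved_mergenoby is made up for this example, and the sketch assumes your site allows the option to be changed within a session. Saving the current value with GETOPTION avoids hard-coding ERROR when restoring.

```sas
/* Remember the current setting, e.g. MERGENOBY=ERROR (hypothetical macro variable name) */
%let saved_mergenoby = %sysfunc(getoption(mergenoby, keyword));

/* Temporarily allow MERGE without BY */
options mergenoby=nowarn;

/* One-to-one merge: rows are matched by position, not by key */
data chem;
  merge trlbcraw(rename=(col1=LBORRES)  drop=_name_ _label_)
        trlbcun (rename=(col1=LBORRESU) drop=_name_ _label_);
run;

/* Restore the original setting right after the step */
options &saved_mergenoby;
```

This relies on both transposed datasets having the same number of observations in the same order, which holds here because both come from the same sorted input with the same number of VAR variables listed in the same analyte order.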

VinnyR · Posted 03-01-2019 09:50 AM

Thanks a lot Kurt,

I understand your point now.

Thanks,
Vinny

VinnyR · Posted 02-25-2019 03:59 PM

Thanks for the second and third options, Tom!
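For reference, the third option applied to the datasets from the original question might look like the sketch below. It is an illustration rather than code posted in the thread, and it assumes TRLBCRAW and TRLBCUN line up row for row, as the matching observation counts in the log suggest.

```sas
/* Two SET statements: each iteration reads one row from each dataset, */
/* so no BY statement is needed and MERGENOBY does not apply           */
data chem;
  set trlbcraw(rename=(col1=LBORRES)  drop=_name_ _label_);
  set trlbcun (rename=(col1=LBORRESU) drop=_name_ _label_);
run;
```

Because there is no MERGE statement, the note about repeats of BY values cannot appear. The step stops as soon as either dataset runs out of observations, so it is worth confirming that the log reports the same number of observations read from both inputs.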
