Hi I get the error message "Error By variables are not properly sorted. but I have a proc sort step for each dataset prior to the merge statement.
code looks like this: (sadly can copy-paste as data is on a secure server without browser access.)
proc sort data="....(pathway)....." out=Population;
by pnr;
run;
Proc print data=Population;
run;
proc sort data="....(pathway)....." out=DTSPNR;
by pnr;
run;
Proc print data=DTSPNR;
Data DTS_Merged_Population;
merge "...(pathway)..." "...(pathway)...";
by pnr;
run;
what am I doing wrong?
Post your log please
@Jannie_D wrote:
Hi I get the error message "Error By variables are not properly sorted. but I have a proc sort step for each dataset prior to the merge statement.
code looks like this: (sadly can copy-paste as data is on a secure server without browser access.)
proc sort data="....(pathway)....." out=Population;
by pnr;
run;
Proc print data=Population;
run;
proc sort data="....(pathway)....." out=DTSPNR;
by pnr;
run;
Proc print data=DTSPNR;
Data DTS_Merged_Population;
merge "...(pathway)..." "...(pathway)...";
by pnr;
run;
what am I doing wrong?
The problem is highlighted in orange. In the merge-statement, you must use the sorted versions of your datasets.
All you need at this point is to list the sorted datasets in the merge statement. Just keep in mind that any variables that each dataset has in common, the dataset listed 2nd will overwrite values in the 1st.
Hi @Jannie_D
One additional note to @benjamin_2018 comment,
if you have many-to-many keys in merge it won't work the same way SQL's joins do.
What @andreas_lds wrote can be simply done in the following way:
proc sort data="....(pathway)....." out=Population;
by pnr;
run;
Proc print data=Population;
run;
proc sort data="....(pathway)....." out=DTSPNR;
by pnr;
run;
Proc print data=DTSPNR;
run;
Data DTS_Merged_Population;
merge Population DTSPNR;
by pnr;
run;
Btw. if I were you I would change the "....(pathway)....."
to libname statement.
All the best
Bart
none of the solutions helped other ideas
I now added af data step before the rest to avoid using a pathway in the merge step
POST THE LOG.
Edit out any confidential information (overwrite it with X's), but leave the horizontal structure and all messages intact.
Post the complete log of all SORT and DATA steps, as you run them in one submit.
Without seeing the log, it's next to impossible to help someone who's new to the field and has problems describing what exactly is done and what exactly happens.
Let's step through your program and explain what you should see in the log.
proc sort data="....(pathway)....." out=Population;
by pnr;
run;
Proc print data=Population;
run;
The first step should show a NOTE saying that the dataset WORK.POPULATION has been created with XX observations and YY variables. The second step should print ALL of that dataset to all of your open output destination. This step is NOT needed for the merge to work (or for you to know that the sort worked as the LOG notes will show that.)
You then repeat the same for a second dataset.
And your final step is the one that should combine the data. Notice that the two names listed on the MERGE statement must match exactly the two names listed in the OUT= option of the two earlier PROC SORT steps (and the DATA= option of the PROC PRINT steps).
data DTS_Merged_Population;
merge Population DTSPNR;
by pnr;
run;
This will also output a note about creating WORK.DTS_MERGED_POPULATION and show how many observations and how many variables. The number of observations and variables will vary based on how the data is structured and how many matches of PNR values there are but there are some bounds we can place.
The minimum number of observations will be the larger of the sizes of the two inputs. The maximum number of observations will be the SUM of the sizes of the two individual datasets, that could happen if none of the value of PNR match between the two datasets.
The minimum number of variables will be the larger number from the two input datasets. That would mean that every variable that is in the dataset with the most variables is also in the other dataset. (note if this is the case then perhaps you did not want to merge the data at all, instead perhaps you meant to use the SET statement instead of the MERGE statement so that the two datasets are interleaved by PNR in the new dataset.) The maximum number of variables would be the SUM minus one. That would mean that the only common variable is PNR.
If you see a NOTE warning that multiple datasets had multiple observations for the same BY group then the is probably NOT the way you want to combine the data.
Let's assume you fixed all the earlier errors, including the MERGE statement which should now read:
merge Population DTSPNR;
The error message about BY variables would not normally occur with a single BY variable (PNR). By any chance, is the code you posted an abbreviation for a more complex program, and you actually are merging by two variables (not just PNR)?
I think the problem might me that I need to run the DTS dataset with another dataset called DTS format, Så that it is formated correctly. But How to I do that. there are no commen variables, but one variable in the format file covers the labels in the DTS file. any suggestions?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.