BookmarkSubscribeRSS Feed
Jannie_D
Calcite | Level 5

Hi I get the error message "Error By variables are not properly sorted. but I have a proc sort step for each dataset prior to the merge statement.

code looks like this: (sadly can copy-paste as data is on a secure server without browser access.)

 

proc sort data="....(pathway)....." out=Population;

by pnr;

run;

Proc print data=Population;

run;

proc sort data="....(pathway)....." out=DTSPNR;

by pnr;

run;

Proc print data=DTSPNR;

 

Data DTS_Merged_Population;

merge "...(pathway)..." "...(pathway)...";

by pnr;

run;

 

what am I doing wrong?

 

 

12 REPLIES 12
Jannie_D
Calcite | Level 5
I sadly ca not post it as I can't copy paste it, because of safety restrictions.

but it has notes prior to the error saying
note: format $DTS_JN was not found or could not be loaded
for around 30 variables.
so I guess that is the problem. but how do I solve that?

andreas_lds
Jade | Level 19

@Jannie_D wrote:

Hi I get the error message "Error By variables are not properly sorted. but I have a proc sort step for each dataset prior to the merge statement.

code looks like this: (sadly can copy-paste as data is on a secure server without browser access.)

 

proc sort data="....(pathway)....." out=Population;

by pnr;

run;

Proc print data=Population;

run;

proc sort data="....(pathway)....." out=DTSPNR;

by pnr;

run;

Proc print data=DTSPNR;

 

Data DTS_Merged_Population;

merge "...(pathway)..." "...(pathway)...";

by pnr;

run;

 

what am I doing wrong?

 

 


The problem is highlighted in orange. In the merge-statement, you must use the sorted versions of your datasets.

Jannie_D
Calcite | Level 5
Thank you.
I will try to change that.
would a libref be sufficient or should I create a datastep first?
I'm a SAS beginner
benjamin_2018
Fluorite | Level 6

All you need at this point is to list the sorted datasets in the merge statement. Just keep in mind that any variables that each dataset has in common, the dataset listed 2nd will overwrite values in the 1st. 

yabwon
Onyx | Level 15

Hi @Jannie_D 

 

One additional note to @benjamin_2018 comment,

if you have many-to-many keys in merge it won't work the same way SQL's joins do.

 

What @andreas_lds wrote can be simply done in the following way:

 

proc sort data="....(pathway)....." out=Population;
  by pnr;
run;

Proc print data=Population;
run;

proc sort data="....(pathway)....." out=DTSPNR;
  by pnr;
run;

Proc print data=DTSPNR;
run; 

Data DTS_Merged_Population;
  merge Population DTSPNR;
  by pnr;
run;

Btw. if I were you I would change the "....(pathway)....." to libname statement.

 

All the best

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Jannie_D
Calcite | Level 5

none of the solutions helped other ideas 

I now added af data step before the rest to avoid using a pathway in the merge step

 

 

Kurt_Bremser
Super User

POST THE LOG.

Edit out any confidential information (overwrite it with X's), but leave the horizontal structure and all messages intact.

Post the complete log of all SORT and DATA steps, as you run them in one submit.

 

Without seeing the log, it's next to impossible to help someone who's new to the field and has problems describing what exactly is done and what exactly happens.

Jannie_D
Calcite | Level 5
As I wrote earlier because it is a secure drive with acces to a browser is it not possible to do a copy paste. And there a many lines in that log.
Tom
Super User Tom
Super User

Let's step through your program and explain what you should see in the log.

proc sort data="....(pathway)....." out=Population;
  by pnr;
run;

Proc print data=Population;
run;

The first step should show a NOTE saying that the dataset WORK.POPULATION has been created with XX observations and YY variables.  The second step should print ALL of that dataset to all of your open output destination.  This step is NOT needed for the merge to work (or for you to know that the sort worked as the LOG notes will show that.)

You then repeat the same for a second dataset.

And your final step is the one that should combine the data.  Notice that the two names listed on the MERGE statement must match exactly the two names listed in the OUT= option of the two earlier PROC SORT steps (and the DATA= option of the PROC PRINT steps).

data DTS_Merged_Population;
  merge Population DTSPNR;
  by pnr;
run;

This will also output a note about creating WORK.DTS_MERGED_POPULATION and show how many observations and how many variables.  The number of observations and variables will vary based on how the data is structured and how many matches of PNR values there are but there are some bounds we can place. 

 

The minimum number of observations will be the larger of the sizes of the two inputs. The maximum number of observations will be the SUM of the sizes of the two individual datasets, that could happen if none of the value of PNR match between the two datasets.

 

The minimum number of variables will be the larger number from the two input datasets.  That would mean that every variable that is in the dataset with the most variables is also in the other dataset.  (note if this is the case then perhaps you did not want to merge the data at all, instead perhaps you meant to use the SET statement instead of the MERGE statement so that the two datasets are interleaved by PNR in the new dataset.) The maximum number of variables would be the SUM minus one.  That would mean that the only common variable is PNR.

 

If you see a NOTE warning that multiple datasets had multiple observations for the same BY group then the is probably NOT the way you want to combine the data.

Astounding
PROC Star

Let's assume you fixed all the earlier errors, including the MERGE statement which should now read:

merge Population DTSPNR;

The error message about BY variables would not normally occur with a single BY variable (PNR).  By any chance, is the code you posted an abbreviation for a more complex program, and you actually are merging by two variables (not just PNR)?

Jannie_D
Calcite | Level 5

I think the problem might me that I need to run the DTS dataset with another dataset called DTS format, Så that it is formated correctly. But How to I do that. there are no commen variables, but one variable in the format file covers the labels in the DTS file. any suggestions?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2706 views
  • 0 likes
  • 8 in conversation