DATA Step, Macro, Functions and more

BY variable not properly sorted?

Reply
N/A
Posts: 0

BY variable not properly sorted?

Hi,
I have not used SAS is a while so I may be missing something obvious here. I've tried a few approaches (dropping and recreating data sets, resorting etc) however continue to get this error.

232
233 proc sort data=dosdist;
234 by claim_number;
235 run;

NOTE: There were 17644882 observations read from the data set WORK.DOSDIST.
NOTE: The data set WORK.DOSDIST has 17644882 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 1:43.34
cpu time 21.84 seconds


236
237 data diags2;
238 merge dosdist(in=a)
239 diags(in=b);
240 by claim_number;
241 if b;
242 run;

ERROR: BY variables are not properly sorted on data set WORK.DOSDIST.
a=1 b=1 claim_number=EP090810701843592
FIRST.claim_number=0 LAST.claim_number=0 _ERROR_=1 _N_=6554391
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 6554389 observations read from the data set WORK.DOSDIST.
NOTE: There were 1984547 observations read from the data set WORK.DIAGS.
WARNING: The data set WORK.DIAGS2 may be incomplete. When this step was stopped there were 6554388
observations and 47 variables.
WARNING: Data set WORK.DIAGS2 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 3:44.86
cpu time 18.95 seconds
N/A
Posts: 0

Re: BY variable not properly sorted?

I was successful using SQL
Valued Guide
Posts: 632

Re: BY variable not properly sorted?

I do not see anything in the code specifically, but do you know that WORK.DIAGS is sorted correctly? Do you know that the attributes of CLAIM_NUMBER, especially the length, are the same for both data sets?
N/A
Posts: 0

Re: BY variable not properly sorted?

Thanks ArtC (the hammer and nails guy?). I'm running this on a workstation as I do not have a server license.

Here is what I have (log included):
The CONTENTS Procedure

Data Set Name WORK.DIAGS Observations 7258643

1 claim_number Char 18


Sort Information

Sortedby claim_number
Validated YES
Character Set ANSI

The CONTENTS Procedure

Data Set Name WORK.DOS Observations 17644882

1 claim_number Char 18


Sort Information

Sortedby claim_number
Validated YES
Character Set ANSI
Sort Option NODUPREC
404 proc sort data=diags;
405 by claim_number;

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds


406 proc sort data=dos;
407 by claim_number;
408

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds


409 data diags2;
410 merge diags dos;
411 by claim_number;
412 run;

ERROR: BY variables are not properly sorted on data set WORK.DIAGS.
FIRST.claim_number=1 LAST.claim_number=1 _ERROR_=1 _N_=5434037
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1474189 observations read from the data set WORK.DIAGS.
NOTE: There were 5434036 observations read from the data set WORK.DOS.
WARNING: The data set WORK.DIAGS2 may be incomplete. When this step was stopped there were
5434036 observations and 50 variables.
WARNING: Data set WORK.DIAGS2 was not replaced because this step was stopped.
Super Contributor
Super Contributor
Posts: 3,174

Re: BY variable not properly sorted?

I believe that the OP has a "variable length" compatibility issue which is contributing to the error condition. Suggest using PROC CONTENTS on the two input files being merged.

Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: BY variable not properly sorted?

It appears to be a failure in proc sort. Here is the offending observation post sort:
EI020510400011934
EI020510400011934
EI020%10400011935
EI020510400011935

The proper sort order should be:
EI020%10400011935
EI020510400011934
EI020510400011934
EI020510400011935

If I sort just the subset the order is correct however when I sort the entire data set, it comes out incorrect.
N/A
Posts: 0

Re: BY variable not properly sorted?

It also appears that the special characters are not on the input data file and the values are being corrupted by proc sort. 22 million obs, is this just to big?
Super Contributor
Super Contributor
Posts: 3,174

Re: BY variable not properly sorted?

Not likely - suggest you open a Support track to investigate the condition (link at bottom of page).

Scott Barry
SBBWorks, Inc.
Valued Guide
Posts: 632

Re: BY variable not properly sorted?

Sometimes those 'special characters' can be introduced in non-standard ways. To check the size of data issue, which I agree with Scott is unlikely to be the culprit, try a FIRSTOBS and LASTOBS or OBS system option to isolate a few hundred obs on either side and see if the problem goes away. it is more likely that the special character has already been introduced.
N/A
Posts: 0

Re: BY variable not properly sorted?

I ran several times, not only are special characters showing up randomly, they are occurring on different observations and not every time (1 or 2 out of 20+ million. SAS tech support thinks it may be an issue with the disk drive or bad memory as I am able to demonstrate in the log that the characters are not on the input file or in the first dataset created but show up right after the sort. I cannot reproduce every time. I am also occasionally losing the end of record marker causing the output data file having data from the following record in the last field on the previous record (once again, one record out of several million, occurring occasionally). Thanks for responding though.
Regular Learner
Posts: 1

Re: BY variable not properly sorted?

I am not an expert but maybe you should have PROC SORT command for both data which you want to merge.  

Ask a Question
Discussion stats
  • 10 replies
  • 7357 views
  • 0 likes
  • 4 in conversation