BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,
I have not used SAS is a while so I may be missing something obvious here. I've tried a few approaches (dropping and recreating data sets, resorting etc) however continue to get this error.

232
233 proc sort data=dosdist;
234 by claim_number;
235 run;

NOTE: There were 17644882 observations read from the data set WORK.DOSDIST.
NOTE: The data set WORK.DOSDIST has 17644882 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 1:43.34
cpu time 21.84 seconds


236
237 data diags2;
238 merge dosdist(in=a)
239 diags(in=b);
240 by claim_number;
241 if b;
242 run;

ERROR: BY variables are not properly sorted on data set WORK.DOSDIST.
a=1 b=1 claim_number=EP090810701843592
FIRST.claim_number=0 LAST.claim_number=0 _ERROR_=1 _N_=6554391
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 6554389 observations read from the data set WORK.DOSDIST.
NOTE: There were 1984547 observations read from the data set WORK.DIAGS.
WARNING: The data set WORK.DIAGS2 may be incomplete. When this step was stopped there were 6554388
observations and 47 variables.
WARNING: Data set WORK.DIAGS2 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 3:44.86
cpu time 18.95 seconds
10 REPLIES 10
deleted_user
Not applicable
I was successful using SQL
ArtC
Rhodochrosite | Level 12
I do not see anything in the code specifically, but do you know that WORK.DIAGS is sorted correctly? Do you know that the attributes of CLAIM_NUMBER, especially the length, are the same for both data sets?
deleted_user
Not applicable
Thanks ArtC (the hammer and nails guy?). I'm running this on a workstation as I do not have a server license.

Here is what I have (log included):
The CONTENTS Procedure

Data Set Name WORK.DIAGS Observations 7258643

1 claim_number Char 18


Sort Information

Sortedby claim_number
Validated YES
Character Set ANSI

The CONTENTS Procedure

Data Set Name WORK.DOS Observations 17644882

1 claim_number Char 18


Sort Information

Sortedby claim_number
Validated YES
Character Set ANSI
Sort Option NODUPREC
404 proc sort data=diags;
405 by claim_number;

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds


406 proc sort data=dos;
407 by claim_number;
408

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds


409 data diags2;
410 merge diags dos;
411 by claim_number;
412 run;

ERROR: BY variables are not properly sorted on data set WORK.DIAGS.
FIRST.claim_number=1 LAST.claim_number=1 _ERROR_=1 _N_=5434037
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1474189 observations read from the data set WORK.DIAGS.
NOTE: There were 5434036 observations read from the data set WORK.DOS.
WARNING: The data set WORK.DIAGS2 may be incomplete. When this step was stopped there were
5434036 observations and 50 variables.
WARNING: Data set WORK.DIAGS2 was not replaced because this step was stopped.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
I believe that the OP has a "variable length" compatibility issue which is contributing to the error condition. Suggest using PROC CONTENTS on the two input files being merged.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
It appears to be a failure in proc sort. Here is the offending observation post sort:
EI020510400011934
EI020510400011934
EI020%10400011935
EI020510400011935

The proper sort order should be:
EI020%10400011935
EI020510400011934
EI020510400011934
EI020510400011935

If I sort just the subset the order is correct however when I sort the entire data set, it comes out incorrect.
deleted_user
Not applicable
It also appears that the special characters are not on the input data file and the values are being corrupted by proc sort. 22 million obs, is this just to big?
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Not likely - suggest you open a Support track to investigate the condition (link at bottom of page).

Scott Barry
SBBWorks, Inc.
ArtC
Rhodochrosite | Level 12
Sometimes those 'special characters' can be introduced in non-standard ways. To check the size of data issue, which I agree with Scott is unlikely to be the culprit, try a FIRSTOBS and LASTOBS or OBS system option to isolate a few hundred obs on either side and see if the problem goes away. it is more likely that the special character has already been introduced.
deleted_user
Not applicable
I ran several times, not only are special characters showing up randomly, they are occurring on different observations and not every time (1 or 2 out of 20+ million. SAS tech support thinks it may be an issue with the disk drive or bad memory as I am able to demonstrate in the log that the characters are not on the input file or in the first dataset created but show up right after the sort. I cannot reproduce every time. I am also occasionally losing the end of record marker causing the output data file having data from the following record in the last field on the previous record (once again, one record out of several million, occurring occasionally). Thanks for responding though.
hanish
Calcite | Level 5

I am not an expert but maybe you should have PROC SORT command for both data which you want to merge.  

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 12764 views
  • 0 likes
  • 4 in conversation