BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi all,

I am trying to merge two datasets Sep_release and Letters_dwhdw. I want to find how many accounts have same TA_pcode (postcode) common in both dataset. I am using the merge step but I am getting error that by variables are not properly sorted. I have sorted both the dataset first and then used the merge step. Can you please suggest the mistake?

data Sep_release;
    set Sep_release;
    TA_pcode = compress(TA_pcode);
run;
data Letters_dwhdw;
    set Letters_dwhdw;
    TA_pcode = compress(TA_pcode);
run;

data Comp_table;
    merge Sep_release(in=a) Letters_dwhdw(in=b);
    by TA_pcode;
    if a and b;
    if TA_pcode = TA_pcode;
run;

Error log:
          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Matching the postcode';
4          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5          %LET
5        ! _CLIENTPROJECTPATH='C:\Users\sshukla\AppData\Roaming\SAS\EnterpriseGuide\8\AutoRecovery\6b3b8e9d-52ed-41b1-af1a-b82393d31
5        ! a16\Equifax released_(Recovered).egp';
6          %LET _CLIENTPROJECTPATHHOST='LWLT5CG9322XFL';
7          %LET _CLIENTPROJECTNAME='Equifax released_(Recovered).egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=SVG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         %macro HTML5AccessibleGraphSupported;
15             %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16         %mend;
17         ODS LISTING GPATH=&sasworklocation;
18         FILENAME EGHTML TEMP;
19         ODS HTML5(ID=EGHTML) FILE=EGHTML
20             OPTIONS(BITMAP_MODE='INLINE')
21             %HTML5AccessibleGraphSupported
22             ENCODING='utf-8'
23             STYLE=HTMLBlue
24             NOGTITLE
25             NOGFOOTNOTE
26             GPATH=&sasworklocation
27         ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
28         
29         data Comp_table;
30             merge Sep_release(in=a) Letters_dwhdw(in=b);
31             by TA_pcode;
32             if a and b;
33             if TA_pcode = TA_pcode;
34         run;

WARNING: Multiple lengths were specified for the BY variable TA_PCODE by input data sets. This might cause unexpected results.
ERROR: BY variables are not properly sorted on data set WORK.SEP_RELEASE.
a=1 b=0 SEQUENCE_NUMBER=1 REFERENCE_NUMBER=100001130 ADDRESS_MATCHED=Yes ERROR_MESSAGE=  SEGMENT_NUMBER=T41
SEGMENT=Applicant Trace IP OPS_FLAG=New Address Exact TA_ADDRESS1=136 MEADOW WAY TA_ADDRESS2=  TA_ADDRESS3=CAVERSHAM READING
TA_ADDRESS4=BERKS TA_PCODE=RG45LY RANK_1=100 RANK_TYPE_1=Movement APPENDED_DOB=15041961 APPENDED_FORENAME=CLARENCE
NO_INDIVIDUALS_SHARING_NM_DOB=1 NO_INDIVIDUALS_SHARING_NM_NO_DOB=0 DECEASED=NO LAS_RES_IND=No LAS_NON_RES_IND=Strong
LAS_NO_OPEN_INSIGHT_ACC=1 LAS_NO_OPEN_INSIGHT_ACC_L3M=1 LAS_NO_OPEN_INSIGHT_ACC_LM=1 LAS_NO_DIFF_OPEN_INSIGHT_LENDERS=1
LAS_CURRENTLY_REGISTERED_ON_ER=No LAS_TIME_SINCE_SUBJECT_ON_EER=0 LAS_NEW_RR_IN_DIFF_SURNAME=No load records
LAS_TIME_SINCE_RR_DELETE_FOR_APP=No delete records LAS_REGISTERED_TELEPHONE_NUMBER=No LAS_TEL_NUMBER= 
LAS_COUNT_ACTIVITY_IN_L1M=No searches ever LAS_COUNT_ACTIVITY_IN_L3M=No searches ever MOVEMENT_OUT_INDICATOR=Subject
LAS_RES_COUNT_INCL_MOV_INFO=1 LAS_NON_RES_COUNT_INCL_MOV_INFO=3 TA_RESIDENCY_IND=Strong TA_NON_RESIDENCY_IND=No
TA_NO_OPEN_INSIGHT_ACC=2 TA_NO_OPEN_INSIGHT_ACC_L3M=2 TA_NO_OPEN_INSIGHT_ACC_L1M=2 TA_NO_DIFF_OPEN_INSIGHT_LENDERS=2
TA_CURR_REG_ON_ER=No TA_TIME_SINCE_SUBJECT_ON_EER=0 TA_NEW_RR_IN_DIFF_SURNAME=No load records
TA_TIME_SINCE_RR_DELETE_FOR_APP=No delete records TA_REG_TEL_NUM=Yes TA_TEL_NUM_1=***********
TA_COUNT_ACTIVITY_IN_L1M=No searches ever TA_COUNT_ACTIVITY_IN_L3M=No searches ever TA_RES_COUNT_INCL_MOV_INFO=3
TA_NON_RES_COUNT_INCL_MOV_INFO=0 TYPE_1=Last link in a chain LINK_DATE_1=28032013 EQUIFAX_ADDRESS_KEY_1=50070180528 STD_CODE_1= 
LOCAL_NUMBER_1=  TPS_1=  DATE_LOADED_1=  LINE_TYPE_1=  DISTANCE_1=95 NAME_MATCH_1=Subject NAME_1=MR,CLARENCE, ,MITCHELL
DOB_1=15041961 ADDITIONAL_NAME1_1=, , ADDITIONAL_NAME2_1=, , ADDITIONAL_NAME3_1=, , DISPUTE_ALERT=NO CIFAS=NO CSISF30=169
OPS_NUM_S=1 OPS_NUM_T=2.5 AccountKey=. CompanyKey=. PortfolioKey=. CorrespondenceKey=. ContactTypeKey=. CorrespondenceID=  DateKey=.
LetterCode=  LetterName=  LetterGroup=  debt_code=  FIRST.TA_PCODE=1 LAST.TA_PCODE=1 _ERROR_=1 _N_=1
2                                                          The SAS System                             10:55 Thursday, April 13, 2023

NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set WORK.SEP_RELEASE.
NOTE: There were 1 observations read from the data set WORK.LETTERS_DWHDW.
WARNING: The data set WORK.COMP_TABLE may be incomplete.  When this step was stopped there were 0 observations and 83 variables.
WARNING: Data set WORK.COMP_TABLE was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              2144.34k
      OS Memory           34664.00k
      Timestamp           04/13/2023 05:04:29 PM
      Step Count                        55  Switch Count  0
      

35         
36         
37         
38         %LET _CLIENTTASKLABEL=;
39         %LET _CLIENTPROCESSFLOWNAME=;
40         %LET _CLIENTPROJECTPATH=;
41         %LET _CLIENTPROJECTPATHHOST=;
42         %LET _CLIENTPROJECTNAME=;
43         %LET _SASPROGRAMFILE=;
44         %LET _SASPROGRAMFILEHOST=;
45         
46         ;*';*";*/;quit;run;
47         ODS _ALL_ CLOSE;
48         
49         
50         QUIT; RUN;
51         
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You can use PROC SORT to soft data.

You also have an issue with the LENGTH of your BY variable being different on the two datasets. 

WARNING: Multiple lengths were specified for the BY variable TA_PCODE by input data sets. This might cause unexpected results.

Figure out what the length should be and make sure they are both defined the same.  You should probably do that on the step that creates these dataset originally. (Plus if you have attached formats to the variable you might want to remove it to avoid causing the values to be truncated on printing.)

data Sep_release;
    length TA_pcode $20 ;
    set Sep_release;
    TA_pcode = compress(TA_pcode);
    format TA_pcode  ;
run;
proc sort data=Sep_release;
    by TA_pcode;
run;
data Letters_dwhdw;
    length TA_pcode $20 ;
    set Letters_dwhdw;
    TA_pcode = compress(TA_pcode);
    format TA_pcode  ;
run;
proc sort data=Letters_dwhdw;
    by TA_pcode;
run;

data Comp_table;
    merge Sep_release(in=a) Letters_dwhdw(in=b);
    by TA_pcode;
    if a and b;
run;

 

 

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

Since your first two steps are potentially modifying the values of TA_PCODE by removing any leading or embedded spaces you at the least will need to sort the dataset by TA_PCODE before you can MERGE by that variable.

 

But your IF statement in the merging step does not make any sense.  Comparing a variable to itself it will always be TRUE.

Sandeep77
Lapis Lazuli | Level 10

Thank you, but how do I sort TA_pcode so that I can run this correctly?

Tom
Super User Tom
Super User

You can use PROC SORT to soft data.

You also have an issue with the LENGTH of your BY variable being different on the two datasets. 

WARNING: Multiple lengths were specified for the BY variable TA_PCODE by input data sets. This might cause unexpected results.

Figure out what the length should be and make sure they are both defined the same.  You should probably do that on the step that creates these dataset originally. (Plus if you have attached formats to the variable you might want to remove it to avoid causing the values to be truncated on printing.)

data Sep_release;
    length TA_pcode $20 ;
    set Sep_release;
    TA_pcode = compress(TA_pcode);
    format TA_pcode  ;
run;
proc sort data=Sep_release;
    by TA_pcode;
run;
data Letters_dwhdw;
    length TA_pcode $20 ;
    set Letters_dwhdw;
    TA_pcode = compress(TA_pcode);
    format TA_pcode  ;
run;
proc sort data=Letters_dwhdw;
    by TA_pcode;
run;

data Comp_table;
    merge Sep_release(in=a) Letters_dwhdw(in=b);
    by TA_pcode;
    if a and b;
run;

 

 

Reeza
Super User
There's no proc sort shown in the code?
It's best to always include it in right before the Merge.

ballardw
Super User

If you habitually use code with the same data set as the source and result set such as

 

data Sep_release;
    set Sep_release;

You will eventually have some issues with logic that corrupts your data. This code completely replaces the source data set. So you can accidentally remove variables, replace values for variables that you didn't want to, remove or add observations.

 

 

Consider a simple example where you need to subtract 1 from a value for some reason:

 

data Sep_release;
    set Sep_release;
    x = x-1;
run;

Then realize that you needed that compress step. So you edit the code to submit with the compress and run it.

 

data Sep_release;
    set Sep_release;
    x = x-1;
   TA_pcode = Compress(TA_pcode);
run;

You have just subtracted an additional 1 from X.

 

I had to spend about half-week recovering some data where code similar to this had been run a data set multiple times and a variable that was supposed to have been reduced from 4 categories to 3 (combining the 1 and 2 values to 1, 3 to 2 and 4 to 3) actually had all of one category: 1 because the collapse had been run multiple times. Nothing like having to search and find 20 year old code, find the source data set, make the correction to the data and then rerun a lot of dependent code that had used this field incorrectly.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1182 views
  • 4 likes
  • 4 in conversation