Solved: Re: Multiple set statement with by variable

rajdeep · Posted 08-11-2020 05:33 PM

Hi Team,

I am trying a code to execute with the below code, but the ACCOUNT_KEY column values getting plotted randomly from the ACCOUNT_DIM table to the output table.

Please explain how to freeze ACCOUNT_KEY values against the ACCOUNT_NUMBER column, because every time I run the below step, the ACCOUNT_KEY values are getting changed in the output table for the same Account_numbers.

DATA TEST(KEEP= FTF_IND ACCOUNT_NUMBER ACCOUNT_KEY APP_ID TABLET_ID IMAGE_ID ACQUISITIONTYPE);
SET ACCOUNT_DIM(IN=A KEEP= ACCOUNT_NUMBER ACCOUNT_KEY CHANGE_CURRENT_IND ACCOUNT_REGISTRATION_TYPE_DESC where=(CHANGE_CURRENT_IND='Y'));
SET ACQTYPE(IN=B RENAME=(MAST_ACCOUNT_ID=ACCOUNT_NUMBER));
BY ACCOUNT_NUMBER;
IF A AND B THEN OUTPUT;
RUN;

Please help if possible.

ballardw · Posted 08-12-2020 10:47 AM

@rajdeep wrote:

Hi KurtBremser,

Thanks a lot for your quick response. I also found the same idea that, it's giving the expected result with Merge, but can you please explain why it's random with multiple SET statement and if suppose we want to make it happen with SET statement then can we achieve the expected output.

Thanks in advance.

Not random just complicated. As I said, multiple set statements without experience are difficult to work with. If you do not control the order of data then it can get more unpredictable.

View solution in original post

PaigeMiller · Posted 08-11-2020 05:43 PM

You talk about plots, but you don't show us the plot code or the plot output. It would also help if you showed us a portion of this data.

--
Paige Miller

ballardw · Posted 08-11-2020 06:05 PM

Really need to provide a small example of data from both sets and what you expect the result of the data step to be for that example.

Multiple set statements seldom do what you expect when using them without a LOT of experience.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

Tom · Posted 08-11-2020 06:09 PM

The BY and IF statement are doing nothing in that data step (other than possible causing SAS to use an index when accessing the second dataset).

What is it you are trying to do? Why not just merge them BY the variable?

Astounding · Posted 08-11-2020 08:15 PM

I suspect you are trying to match by account_number. To do that, the two SET statements need to be replaced. This is probably the correct replacement:

merge ACCOUNT_DIM (IN=A KEEP= ACCOUNT_NUMBER ACCOUNT_KEY CHANGE_CURRENT_IND ACCOUNT_REGISTRATION_TYPE_DESC where=(CHANGE_CURRENT_IND='Y'))
      ACQTYPE     (IN=B RENAME=(MAST_ACCOUNT_ID=ACCOUNT_NUMBER));

If that's incorrect, you will need to explain more about what the outcome should be.

RichardDeVen · Posted 08-12-2020 04:04 AM

Is the data master/detail ? In other words, for each account_number in ACCOUNT_DIM do you have multiple associated records in ACQTYPE ?

rajdeep · Posted 08-12-2020 05:36 AM

Hi Richard,

Please find the sample test data.

data ACCOUNT_DIM;
INPUT ACCOUNT_KEY 1-7 ACCOUNT_NUMBER $9-14 ACCOUNT_REGISTRATION_TYPE_DESC $ CHANGE_CURRENT_IND $1.;
datalines;
1067347	234057 FTF N
1035020	194500 FTF Y
1067350	234060 FTF Y
1067351	234061 FTF Y
1067352	234062 FTF Y
;
run;

data ACQTYPE;
INPUT ACCOUNT_NUMBER $1-6 ACQUISITIONTYPE $7-11 FTF_IND $13-16 APP_ID 18-23  TABLET_ID $25-26 IMAGE_ID $28-29;
datalines;
234061 FTFT FTF  268074    
234062 FTFT FTF  268074    
111111 NFTF NFTF 268074    
111123 NFTF NFTF 268074    
111134 NFTF NFTF 268074    
;
run;
proc sort data=ACQTYPE;
by account_number;
run;


DATA TEST(KEEP= FTF_IND ACCOUNT_NUMBER ACCOUNT_KEY APP_ID TABLET_ID IMAGE_ID ACQUISITIONTYPE);
set ACCOUNT_DIM(IN=A KEEP= ACCOUNT_NUMBER ACCOUNT_KEY CHANGE_CURRENT_IND ACCOUNT_REGISTRATION_TYPE_DESC where=(CHANGE_CURRENT_IND='Y')) ;
 set ACQTYPE(IN=B );
BY ACCOUNT_NUMBER;
IF A AND B THEN OUTPUT;
RUN;

I just want the 1067351 & 1067352 ACCOUNT_KEYs should be mapped against the 234061 & 234062 ACCOUNT_NUMBERs in the output dataset i.e TEST. If you try executing again and again you can see the random ACCOUNT_KEYs is coming ahead of 234061 & 234062 ACCOUNT_NUMBERS.

Thanks for your quick reply. please let me know if there's any disconnect.

@

extremely sorry for the plot word confusion, I meant mapping actually.

@

Thanks for your reply. you are correct, experience matters a lot. I am still learning actually.

Kurt_Bremser · Posted 08-12-2020 07:06 AM

So it should be a simple MERGE by account_number?

data ACCOUNT_DIM;
input
  ACCOUNT_KEY 1-7
  ACCOUNT_NUMBER $9-14
  ACCOUNT_REGISTRATION_TYPE_DESC $
  CHANGE_CURRENT_IND $1.
;
datalines;
1067347 234057 FTF N
1035020 194500 FTF Y
1067350 234060 FTF Y
1067351 234061 FTF Y
1067352 234062 FTF Y
;

data ACQTYPE;
input
  ACCOUNT_NUMBER $1-6
  ACQUISITIONTYPE $7-11
  FTF_IND $13-16
  APP_ID 18-23
  TABLET_ID $25-26
  IMAGE_ID $28-29
;
datalines;
234061 FTFT FTF  268074    
234062 FTFT FTF  268074    
111111 NFTF NFTF 268074    
111123 NFTF NFTF 268074    
111134 NFTF NFTF 268074    
;

proc sort data=ACQTYPE;
by account_number;
run;

data test (
  keep=
    FTF_IND ACCOUNT_NUMBER ACCOUNT_KEY APP_ID
    TABLET_ID IMAGE_ID ACQUISITIONTYPE
);
merge
  ACCOUNT_DIM (
    in=a
    keep=
      ACCOUNT_NUMBER ACCOUNT_KEY CHANGE_CURRENT_IND
      ACCOUNT_REGISTRATION_TYPE_DESC
    where=(CHANGE_CURRENT_IND='Y')
  )
  ACQTYPE (in=b)
;
by account_number;
if a and b;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

rajdeep · Posted 08-12-2020 08:10 AM

Hi KurtBremser,

Thanks a lot for your quick response. I also found the same idea that, it's giving the expected result with Merge, but can you please explain why it's random with multiple SET statement and if suppose we want to make it happen with SET statement then can we achieve the expected output.

Thanks in advance.

Tom · Posted 08-12-2020 08:16 AM

@rajdeep wrote:

Hi KurtBremser,

Thanks a lot for your quick response. I also found the same idea that, it's giving the expected result with Merge, but can you please explain why it's random with multiple SET statement and if suppose we want to make it happen with SET statement then can we achieve the expected output.

Thanks in advance.

There is nothing random about it (unless one of the actual datasets is coming from a database that does not provide the observations in the same order each time). When you SET a dataset it reads the next observation from that dataset. If no more observations are available the data step stops. So in each iteration your data step will read one observation from A and then one observation from B. If there are variables in B that were also in A then the values from B will overwrite the values read from A. The number of observations created will be the minimum of the number in A and the number in B.

If you want to match the observations on ACCOUNT_NUMBER then list them both on a MERGE statement.

If you want to interleave the observations by ACCOUNT_NUMBER then list them both on one SET statement. Then you will get sum of the number of observations. Any variables in A that are not in B will have missing values on the observations that are read from B (and the reverse). If the same ACCOUNT_NUMBER appears in both A and B then both will stay as separate observations with the observations from A appearing before the observations from B.

Kurt_Bremser · Posted 08-12-2020 08:33 AM

What you wanted is a merge, so you use that statement. SET has other uses, as you could see and as was explained by @Tom

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 08-12-2020 10:47 AM

@rajdeep wrote:

Hi KurtBremser,

Thanks a lot for your quick response. I also found the same idea that, it's giving the expected result with Merge, but can you please explain why it's random with multiple SET statement and if suppose we want to make it happen with SET statement then can we achieve the expected output.

Thanks in advance.

Not random just complicated. As I said, multiple set statements without experience are difficult to work with. If you do not control the order of data then it can get more unpredictable.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!