Help using Base SAS procedures

Help with merging

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Help with merging

Hi!

I have 8 different data set, 4 quarter data set from individual level and 4 quarter data set from household level. I would like to combine all these data set into one. Previously I'm using this code:

*For Individual

data work.i07;

  set hw.i071(in=m071) hw.i072(in=m072) hw.i073(in=m073) hw.i074(in=m074);

  if m071=1 then qtr=1;

  else if m072=1 then qtr=2;

  else if m073=1 then qtr=3;

  else if m074=1 then qtr=4;

run;

*For Household

data work.f07;

  set hw.F071(in=f071) hw.F072(in=f072) hw.F073(in=f073) hw.F074(in=f074);

  if f071=1 then qtr=1;

  else if f072=1 then qtr=2;

  else if f073=1 then qtr=3;

  else if f074=1 then qtr=4;\

run;

*Merging both

data hw.all07;

  merge work.f07

work.i07p;

by CU_ID;

run;

That code works, but I wonder if there is any simple code to do it. Perhaps using single data step instead using multiple step. Thank you.


Accepted Solutions
Solution
‎07-07-2013 11:36 PM
Super Contributor
Posts: 297

Re: Help with merging

Hi inuman,

It is important to understand that simpler code doesn't necessarily mean better code.  You need to measure how efficient the code is to run and also take into account how easy the code is to interpret by other users.  I have created some code that is very efficient from a keystroke perspective, but was either inefficient to run when bench marked against other methods or unreadable by my colleagues.

I don't really understand why you are creating the variable QTR twice.  If you are merging the datasets with QTR in both, then the "F" datasets QTR variable results will be overwritten with the results from the QTR variable in the "I" dataset once merged.


Keeping this in mind you could do the following:

Dual Set Merge.

DATA FDATA (INDEX = (CU_ID));

  SET HW.F071 HW.F072  HW.F073  HW.F074;

RUN;

DATA WORK.I07;

  SET HW.I071 (IN = M071) HW.I072 (IN = M072) HW.I073 (IN = M073) HW.I074 (IN = M074);

       IF M071=1 THEN QTR=1;

  ELSE IF M072=1 THEN QTR=2;

  ELSE IF M073=1 THEN QTR=3;

  ELSE IF M074=1 THEN QTR=4;

  SET  FDATA KEY = CU_ID;

RUN;

As suggested by KSharp you could use INDSNAME, however this was only implemented in version 9.2.

SQL Merge;

PROC SQL;

CREATE TABLE WORK.I07V2 AS

SELECT * FROM

(SELECT *,

        1 AS QTR FROM HW.I071

UNION ALL

SELECT * ,

        2 AS QTR FROM HW.I072

UNION ALL

SELECT * ,

        3 AS QTR FROM HW.I073

UNION ALL

SELECT * ,

        4 AS QTR FROM HW.I074) AS I,

(SELECT * FROM HW.F071

UNION ALL

SELECT * FROM HW.F072

UNION ALL

SELECT * FROM HW.F073

UNION ALL

SELECT * FROM HW.F074) AS F

WHERE I.CU_ID = F.CU_ID

;

QUIT;

Be careful when using SQL for merges as a Cartesian product is produced and then trimmed to meet the conditions of the where statement, which can be quite inefficient.  SAS datasteps are highly optimized and should be used in most instances, however SQL certainly has it's uses and shouldn't be entirely over looked.  Once again it comes down to benchmarking and testing to ensure that you have the most efficient code for any given situation.

I have made a few assumptions about the structure of your data in the above example, but the basics are here for you to build upon.  You could also perform a hash lookup, but I find these confusing for other users to read, therefore I use them sparingly and only when entirely necessary from a performance standpoint - usually when you have a very small dataset you want to merge against a very large one.

I hope this is useful.  If not, please send a sample of the data you are using and I will attempt to help further.

Regards,

Scott

View solution in original post


All Replies
Super User
Posts: 9,681

Re: Help with merging

The one thing I can think is using INDSNAME= option.

data work.i07;

  set hw.i0: indsname=dsn;

name=scan(dsn,-1,'.');

Solution
‎07-07-2013 11:36 PM
Super Contributor
Posts: 297

Re: Help with merging

Hi inuman,

It is important to understand that simpler code doesn't necessarily mean better code.  You need to measure how efficient the code is to run and also take into account how easy the code is to interpret by other users.  I have created some code that is very efficient from a keystroke perspective, but was either inefficient to run when bench marked against other methods or unreadable by my colleagues.

I don't really understand why you are creating the variable QTR twice.  If you are merging the datasets with QTR in both, then the "F" datasets QTR variable results will be overwritten with the results from the QTR variable in the "I" dataset once merged.


Keeping this in mind you could do the following:

Dual Set Merge.

DATA FDATA (INDEX = (CU_ID));

  SET HW.F071 HW.F072  HW.F073  HW.F074;

RUN;

DATA WORK.I07;

  SET HW.I071 (IN = M071) HW.I072 (IN = M072) HW.I073 (IN = M073) HW.I074 (IN = M074);

       IF M071=1 THEN QTR=1;

  ELSE IF M072=1 THEN QTR=2;

  ELSE IF M073=1 THEN QTR=3;

  ELSE IF M074=1 THEN QTR=4;

  SET  FDATA KEY = CU_ID;

RUN;

As suggested by KSharp you could use INDSNAME, however this was only implemented in version 9.2.

SQL Merge;

PROC SQL;

CREATE TABLE WORK.I07V2 AS

SELECT * FROM

(SELECT *,

        1 AS QTR FROM HW.I071

UNION ALL

SELECT * ,

        2 AS QTR FROM HW.I072

UNION ALL

SELECT * ,

        3 AS QTR FROM HW.I073

UNION ALL

SELECT * ,

        4 AS QTR FROM HW.I074) AS I,

(SELECT * FROM HW.F071

UNION ALL

SELECT * FROM HW.F072

UNION ALL

SELECT * FROM HW.F073

UNION ALL

SELECT * FROM HW.F074) AS F

WHERE I.CU_ID = F.CU_ID

;

QUIT;

Be careful when using SQL for merges as a Cartesian product is produced and then trimmed to meet the conditions of the where statement, which can be quite inefficient.  SAS datasteps are highly optimized and should be used in most instances, however SQL certainly has it's uses and shouldn't be entirely over looked.  Once again it comes down to benchmarking and testing to ensure that you have the most efficient code for any given situation.

I have made a few assumptions about the structure of your data in the above example, but the basics are here for you to build upon.  You could also perform a hash lookup, but I find these confusing for other users to read, therefore I use them sparingly and only when entirely necessary from a performance standpoint - usually when you have a very small dataset you want to merge against a very large one.

I hope this is useful.  If not, please send a sample of the data you are using and I will attempt to help further.

Regards,

Scott

New Contributor
Posts: 2

Re: Help with merging

Thank you so much for both of your answer (Ksharp and Scott_Mitchell). I'll try your suggestion and see which one is the most efficient way. I'll be back here if I find any unresolved issues. Thank you once again.

Best,

Wisnu

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 224 views
  • 3 likes
  • 3 in conversation