BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sleretrano
Quartz | Level 8

SAS Enterprise Guide 5.1

 

I have two similar data sets, say Table1 and Table2 (basically they structurally are the same data set similar variables (with different cell values) regarding different periods), with 64 columns and approximately 550K rows.

 

I'd like to do the following.

 

                  PROC SQL;
                  CREATE TABLE JOINED TABLE AS
                  SELECT Some variables
                  FROM Table1 T1
                  LEFT JOIN Table2 T2 ON ((T1.Var1=T2.Var1 AND T1.Var2=T2.Var2) OR T1.Var1=T2.Var2);
                  QUIT;

 

But it's just too slow.

 

Is there a (much) more efficient way of achieving the same result?

 

Thanks.

 

 

Here's an example.

 

DATA Data201606;
    LENGTH
        Var1             $ 14
        Var2             $ 14
        WholeBunchOfVars_201606 $ 24 ;
    FORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201606 $CHAR24. ;
    INFORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201606 $CHAR24. ;
    INFILE DATALINES4
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        Var1             : $CHAR14.
        Var2             : $CHAR14.
        WholeBunchOfVars_201606 : $CHAR24. ;
DATALINES4;
42001223829005420012238211001913672811803603960
42001223829005420012238291001913672811803603960
42001223829005420012238221001913672811803603960
42005537849003420055378411001728228741505881940
42000000420200 149282958163677760150000
42000001251100 137224761123272858132500
42000002571100 115006036153103523280000
42000002992100 176705643184314704200000
42000003721100 134284160134284151225000
;;;;

DATA Data201607;
    LENGTH
        Var1             $ 14
        Var2             $ 14
        WholeBunchOfVars_201607 $ 24 ;
    FORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201607 $CHAR24. ;
    INFORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201607 $CHAR24. ;
    INFILE DATALINES4
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        Var1             : $CHAR14.
        Var2             : $CHAR14.
        WholeBunchOfVars_201607 : $CHAR24. ;
DATALINES4;
42001223829005420012238221001913672811803603960
42001223829005420012238211001913672811803603960
42001223829005420012238291001913672811803603960
48073274008100...50000
48077274928100...150000
48120011318100...200000
48097297288100...200000
48001314298100...250000
48035295388100...575000
48089429268100...600000
48001012248100...600000
48075642348100...700000
42026909980100.000000000.25000
42001115759100.000000000145868575112500
42014081290100.00000000015346509325000
48000000048500.001241385.50000
48000000048400.001241385.150000
;;;;

PROC SQL;
CREATE TABLE JOINED_TABLE AS 
SELECT *
FROM Data201606 T1
LEFT JOIN Data201607 T2 ON ((T1.Var1=T2.Var1 AND T1.Var2=T2.Var2) OR T1.Var1=T2.Var2);
QUIT;
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
You didn't post the output when T1.Var1=T2.Var2  yet ?




DATA Data201606;
    LENGTH
        Var1             $ 14
        Var2             $ 14
        WholeBunchOfVars_201606 $ 24 ;
    FORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201606 $CHAR24. ;
    INFORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201606 $CHAR24. ;
    INFILE DATALINES4
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        Var1             : $CHAR14.
        Var2             : $CHAR14.
        WholeBunchOfVars_201606 : $CHAR24. ;
DATALINES4;
42001223829005420012238211001913672811803603960
42001223829005420012238291001913672811803603960
42001223829005420012238221001913672811803603960
42005537849003420055378411001728228741505881940
42000000420200 149282958163677760150000
42000001251100 137224761123272858132500
42000002571100 115006036153103523280000
42000002992100 176705643184314704200000
42000003721100 134284160134284151225000
;;;;

DATA Data201607;
    LENGTH
        Var1             $ 14
        Var2             $ 14
        WholeBunchOfVars_201607 $ 24 ;
    FORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201607 $CHAR24. ;
    INFORMAT
        Var1             $CHAR14.
        Var2             $CHAR14.
        WholeBunchOfVars_201607 $CHAR24. ;
    INFILE DATALINES4
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        Var1             : $CHAR14.
        Var2             : $CHAR14.
        WholeBunchOfVars_201607 : $CHAR24. ;
DATALINES4;
42001223829005420012238221001913672811803603960
42001223829005420012238211001913672811803603960
42001223829005420012238291001913672811803603960
48073274008100...50000
48077274928100...150000
48120011318100...200000
48097297288100...200000
48001314298100...250000
48035295388100...575000
48089429268100...600000
48001012248100...600000
48075642348100...700000
42026909980100.000000000.25000
42001115759100.000000000145868575112500
42014081290100.00000000015346509325000
48000000048500.001241385.50000
48000000048400.001241385.150000
;;;;


data want;
 if _n_=1 then do;
  if 0 then set Data201607;
  declare hash h1(dataset:'Data201607',multidata:'y');
  h1.definekey('Var1','Var2');
  h1.definedata('WholeBunchOfVars_201607 ');
  h1.definedone();
  
  declare hash h2(dataset:'Data201607',multidata:'y');
  h2.definekey('Var2');
  h2.definedata('WholeBunchOfVars_201607 ');
  h2.definedone();
 end;
call missing(of _all_);
set Data201606;
 rc=h1.find();
 if rc=0 then do;
  do while(rc=0);
   output;
   rc=h1.find_next();
  end;
 end;
  else do;
         rx=h2.find(key:var1);
         if rx=0 then do;
           do while(rx=0);
            output;
            rx=h1.find_next();
           end;
         end;
         else output;
        end;
drop rc rx;
run;



View solution in original post

16 REPLIES 16
Ksharp
Super User

Hash Table.
Post your data and the output you want to see .

sleretrano
Quartz | Level 8
Hi, Xia.
I'm not sure I understand what you mean with the output I want to see.
The output is huge, the inputs are equally large. I can't possibly do it by hand and SAS takes too long to do it.
Ksharp
Super User
I mean make a little example to show what you are looking for . Like :

data a;
input v1 v2 x;
cards;
1 1 1
1 2 2
2 1 2
;
run;
data b;
input v1 v2 x;
cards;
1 1 1
1 2 2
2 1 2
;
run;

output:
........
sleretrano
Quartz | Level 8

I posted an example. Thank you for your time.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

As @Ksharp has said, show an example of your data (as a datastep).  If the data is the same, why do you have it in two datasets?  Identify what needs to be separate, and then put the data into one dataset with a parameter/result setup in a normalised dataset;
PARAM                     RES

PERIOD 1 RESULT  123

PERIOD 2 RESULT   345

 

You can then do many things with that one dataset, by grouping on param for instance, or transposing using param.  Its rarely of any benefit to split data up.

sleretrano
Quartz | Level 8
It's not actually the same data set. I thought what I said, even though technically false, would be understandable.
What I meant is that the data sets have the same variables, but the number of rows and cell values are different.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, but if the datasets are structurally the same, with the only difference being a value Period, why not put the two datasets together, and have a variable called period.  That way you can transpose by period, sort and do datastep processing etc.

sleretrano
Quartz | Level 8

I don't think I've made my self clear. The data sets are different on possibly a LOT of values. Furthermore, some variables have the period in their name, which makes them different variables though I was referring to them as being the same variable. Anyway, I posted an example.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, I don't know your data so I am only going on what I see posted.  It is something that comes up a lot, and it is due to the desire to put "data" in datasetnames (and variables).  If you have ever done databasing you will know about normalised fixed data structures which don't change.  These are far simpler to work with in SAS also, and SAS is built to do this kind of functioning in built.  Lets take your example test data, I create a single table with a fixed structure, putting the "data" from the dataset name into a proper variable in the data so that I can work with it like data.  I then present a simple example of transposing the data based on this data using the "data" items created - i.e. no merge, just a transpose with by group.

data data201606;
  length var1 var2 $14 wholebunchofvars_201606 $24;
  format var1 var2 $char14. wholebunchofvars_201606 $char24.;
  informat var1 var2 $char14. wholebunchofvars_201606 $char24.;
  infile datalines4 dlm='7f'x missover dsd;
  input var1 : $char14. var2 : $char14. wholebunchofvars_201606 : $char24.;
datalines4;
42001223829005420012238211001913672811803603960
42001223829005420012238291001913672811803603960
42001223829005420012238221001913672811803603960
42005537849003420055378411001728228741505881940
42000000420200 149282958163677760150000
42000001251100 137224761123272858132500
42000002571100 115006036153103523280000
42000002992100 176705643184314704200000
42000003721100 134284160134284151225000
;;;;
run;

data data201607;
  length var1 var2 $14 wholebunchofvars_201607 $24;
  format var1 var2 $char14. wholebunchofvars_201607 $char24.;
  informat var1 var2 $char14. wholebunchofvars_201607 $char24.;
  infile datalines4 dlm='7f'x missover dsd;
  input var1 : $char14. var2 : $char14. wholebunchofvars_201607 : $char24.;
datalines4;
42001223829005420012238221001913672811803603960
42001223829005420012238211001913672811803603960
42001223829005420012238291001913672811803603960
48073274008100...50000
48077274928100...150000
48120011318100...200000
48097297288100...200000
48001314298100...250000
48035295388100...575000
48089429268100...600000
48001012248100...600000
48075642348100...700000
42026909980100.000000000.25000
42001115759100.000000000145868575112500
42014081290100.00000000015346509325000
48000000048500.001241385.50000
48000000048400.001241385.150000
;;;;
run;

data working_data;
  set data201606 (in=a rename=(wholebunchofvars_201606=wholebunchofvars)) 
      data201607 (in=b rename=(wholebunchofvars_201607=wholebunchofvars));
  period=ifn(a,1,2);
  date=ifn(a,"01JUN2016"d,"01JUL2016"d);
  format date date9.;
run;

proc sort data=working_data;
  by var1 var2 period;
run;
proc transpose data=working_data out=t_work_data;
  by var1 var2;
  var wholebunchofvars;
  id period;
  idlabel period;
run;

 

sleretrano
Quartz | Level 8

For some reason I'm unable to run your code. I attached the error to this message. Do you have any idea what character is causing this?


Prog.jpg
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, there was something odd when I copied out from the test data you gave to my environment, it shows a square between some of the data.  Must be the different system encodings.  Anyways, if you take your original test data, and copy from below the datalines to the ;;;; and replace that in the code you run, it should be fine.  

 

Could be something wierd with the website too.  This is what I see when I copy your test data to my system:

Capture.PNG

sleretrano
Quartz | Level 8
No luck. The problem is when it rans the working_data data step.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not sure what to suggest, I just copied it out of the website and ran it, worked fine.  Have you tried a new session?  Does the code look the same as on here, it just looks like some special character is messing it up.

Try copying and pasting the whole text into Notepad first.

sleretrano
Quartz | Level 8
I did try a new session and also tried to copy to Notepad first, still didn't work. I too am at a loss.

Thanks for the tips.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1904 views
  • 3 likes
  • 3 in conversation