BookmarkSubscribeRSS Feed
somebody
Lapis Lazuli | Level 10

I want to merging all obs in the dataset TEST_STATS to each obs in the dataset TEST_NOMATCH if they match by the NAME_CLEANED variable. I have attached the sample datasets so they only have obs with 1 value for the NAME_CLEANED var to test. My real dataset has many values for this NAME_CLEANED variable. If TEST_STATS has 4 obs and TEST_NOMATCH has 40 obs, what I want to have is a new dataset with160(40x4) obs. 

Originally, I did this using a DATA step with MERGE. However, I found out that they dont really do what I want.

data test_merged; merge test_nomatch test_stats; by name_cleaned; run;

The output has only 40 obs, where the values change for the first 4 obs and then remain the same for the last 36 obs. 

I can achieve what I want by using PROC SQL as followed: 

proc sql;
create table test_merged_sql as
select a.*, b.*
from test_nomatch as a, test_stats as b
where a.name_cleaned= b.name_cleaned;
run;quit;

My question: how can I achieve what I want using a DATA step? and What went wrong with my original DATA step?

 

 

11 REPLIES 11
somebody
Lapis Lazuli | Level 10

The reason I would want to use a DATA step is that I can separate between the obs that matches and the ones that do not match. How can I do this with PROC SQL?

 

andreas_lds
Jade | Level 19

Please post example data that matches your real-data as close as possible. And post data as data-step-code, that makes it easier to use.

somebody
Lapis Lazuli | Level 10

is there a way to export the dataset as a data-step code? or do I have to do this manually?

 

Shmuel
Garnet | Level 18

SAS data sets have different internal format depending OS and SAS version,

thus attaching a sas data-set is not desirable.

somebody
Lapis Lazuli | Level 10

Here are the datasets:

data test_stats;
   input constituent_ticker: $5. secid: $20. name_cleaned: $100.;
datalines;

AA	0P00018WO0	ALCOA
AA	0P0000008J	ALCOA
AA	0P0000A57O	ALCOA
AAI	0P00006XU1	ALCOA

run;


data test_nomatch;
   input fund_ticker: $5. name: $100. name_cleaned: $100.;
datalines;
ACWI	ALCOA	ALCOA
DGT	ALCOA	ALCOA
EUSA	ALCOA	ALCOA
GRES	ALCOA	ALCOA
HAP	ALCOA	ALCOA
IGE	ALCOA	ALCOA
ITOT	ALCOA	ALCOA
IUSV	ALCOA	ALCOA
IVE	ALCOA	ALCOA
IVV	ALCOA	ALCOA
IWB	ALCOA	ALCOA
IWD	ALCOA	ALCOA
IWR	ALCOA	ALCOA
IWS	ALCOA	ALCOA
IWV	ALCOA	ALCOA
IYM	ALCOA	ALCOA
IYY	ALCOA	ALCOA
JHML	ALCOA	ALCOA
JHMM	ALCOA	ALCOA
JKI	ALCOA	ALCOA
MXI	ALCOA	ALCOA
PBP	ALCOA	ALCOA
PHDG	ALCOA	ALCOA
PICK	ALCOA	ALCOA
PRF	ALCOA	ALCOA
PYZ	ALCOA	ALCOA
RWL	ALCOA	ALCOA
SIZE	ALCOA	ALCOA
SPHB	ALCOA	ALCOA
SPLG	ALCOA	ALCOA
SPTM	ALCOA	ALCOA
SPY	ALCOA	ALCOA
SPYV	ALCOA	ALCOA
SPYX	ALCOA	ALCOA
TOK	ALCOA	ALCOA
URTH	ALCOA	ALCOA
VLUE	ALCOA	ALCOA
XLB	ALCOA	ALCOA
XME	ALCOA	ALCOA
XMVM	ALCOA	ALCOA
run;

 

yabwon
Onyx | Level 15

Hi,

 

It seems that this is what you may need:

data Test_nomatch;
input (constituent_ticker SecID name_cleaned) ($);
cards;
abc 1 A
def 2 B
ghi 3 C
abc 4 A
def 5 B
ghi 6 C
123 1 D
456 2 E
678 3 F
;
run;

data Test_stats;
input (fund_ticker Name name_cleaned) ($);
cards;
x a A
y b B
y c C
t a A
s b B
v c C
;
run;



data matched notmatched(drop=fund_ticker Name);
  if 0 then set Test_nomatch Test_stats;
  declare hash TS(dataset:"Test_stats", multidata:"Y");
  TS.defineKey("name_cleaned");
  TS.defineData("fund_ticker","Name");
  TS.defineDone();

  do until(EOF);
    set Test_nomatch end=EOF;

    if TS.check() = 0 then
    do;
      TS.reset_dup();
      do while(TS.do_over() = 0);
        output matched;
      end;
    end;
    else
      /*call missing(fund_ticker,Name);*/
      output notmatched;
  end;

stop;
run;

 

All the best;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Kurt_Bremser
Super User

Your notmatched dataset will have incorrect non-missing values for fund_ticker and name; you should set those to missing before the OUTPUT, or drop the variables with a dataset option in the first place.

yabwon
Onyx | Level 15

@Kurt_Bremser thanks a million!  I've updated the code.

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

I'm calling it "programming on 4 hands" and I'm practicing it as often as I can during production programing. 😉

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1303 views
  • 2 likes
  • 6 in conversation