BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hello Folks:

 

I have three tables sorted by two categorical values 'index' and 'description' and one table by 'description' only. Below is the desired output. I I did many-to-many merge while not having 'index' variable in the STANDARD dataset. 

This seems to be working on small mock datasets. However, my actual datasets are large.

 

My question is: Do you have any comments if I'm doing merge right here? Is it safe approach to go ahead and apply to my actual large datasets?

 

TABLES MERGE.png

DATA ONE; 
INPUT INDEX	DESCRIPTION $ VALUE1;
CARDS;
1	A	0.1
1	B	0.2
1	C	0.25
2	A	0.33
2	B	0.41
2	C	0.48
3	A	0.56
3	B	0.63
3	C	0.71
;

DATA TWO; 
INPUT INDEX	DESCRIPTION $ VALUE2;
CARDS;
1	A	0.1
1	B	0.2
1	C	0.25
2	A	0.33
2	B	0.41
2	C	0.48
3	A	0.56
3	B	0.63
3	C	0.71
;

DATA THREE; 
INPUT INDEX	DESCRIPTION $ VALUE3;
CARDS;
1	A	0.1
1	B	0.2
1	C	0.25
2	A	0.33
2	B	0.41
2	C	0.48
3	A	0.56
3	B	0.63
3	C	0.71
;

DATA STANDARD; 
INPUT DESCRIPTION $ VALUE;
CARDS;
A	0.3
B	0.25
C	0.22
;

PROC SORT DATA=ONE;
BY DESCRIPTION INDEX;
PROC SORT DATA=TWO;
BY DESCRIPTION INDEX;
PROC SORT DATA=THREE;
BY DESCRIPTION INDEX;
PROC SORT DATA=STANDARD;
BY DESCRIPTION;

DATA MERGED;
MERGE ONE TWO THREE STANDARD;
BY DESCRIPTION;
RUN; 
PROC PRINT; RUN;

PROC SORT DATA=MERGED;
BY INDEX DESCRIPTION;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

It's not safe, and can generate incorrect matches.  For example, try removing an observation from ONE, TWO, or THREE, and see if you like the result.  

 

In a small sample, everything matches up.  In larger data sets, it's unlikely that there will be 100% matches for INDEX and DESCRIPTION.

 

Unfortunately, you need to match ONE, TWO, and THREE by both INDEX and DESCRIPTION.  Then you can add STANDARD in a separate step.

View solution in original post

12 REPLIES 12
Astounding
PROC Star

It's not safe, and can generate incorrect matches.  For example, try removing an observation from ONE, TWO, or THREE, and see if you like the result.  

 

In a small sample, everything matches up.  In larger data sets, it's unlikely that there will be 100% matches for INDEX and DESCRIPTION.

 

Unfortunately, you need to match ONE, TWO, and THREE by both INDEX and DESCRIPTION.  Then you can add STANDARD in a separate step.

Cruise
Ammonite | Level 13

@Astounding this might explain why my summary plot is far off. How about to duplicate the STANDARD dataset by the level of INDEX in the three other datasets and merge?  In other words, bringing STANDARD table to the same level of other datasets for one-to-one merge?

 

koyelghosh
Lapis Lazuli | Level 10

Will PROC SQL be a bad idea? Just curious .... also you did not mention how big is the data you have.

This is the question I have (to others seeing this message as well) .. is it better to use MERGE in DATA step or use PROC SQL?

Under what conditions we should sway towards one method rather than the other?

SAS experts here please.

Kurt_Bremser
Super User

Create a format from table STANDARD.

Sort ONE, TWO and THREE by index and description.

Merge ONE, TWO and THREE by index and description, apply the format to description to derive value.

Cruise
Ammonite | Level 13

@Kurt_Bremser 

I understand execpt how to create format from STANDARD and applying the format to description to derive value.

Tom
Super User Tom
Super User

@Cruise wrote:

@Kurt_Bremser 

I understand execpt how to create format from STANDARD and applying the format to description to derive value.


Since the value you want to create is a number then create an INFORMAT and use the INPUT() function to create the value.

To make a format/informat from data create a dataset in the format needed for the CNTLIN= option of PROC FORMAT.

DATA STANDARD; 
INPUT DESCRIPTION $ VALUE;
CARDS;
A 0.3
B 0.25
C 0.22
;

data fmt;
 fmtname='STANDARD';
 type='I';
 set standard;
 rename description=start value=label;
run;
proc format cntlin=fmt;
run;


data test;
  input description $ ;
  value=input(description,standard.);
cards;
A
B
C
;

proc print;
run;
Obs    description    value

 1          A          0.30
 2          B          0.25
 3          C          0.22
Cruise
Ammonite | Level 13

Sorry, if I wasn't clear. My goal was to duplicate STANDARD data to the level INDEX in the other datasets so that I can do one-to-one merge. Right now STANDARD has 3 rows and ONE, TWO and THREE datasets have total 9 rows (3 descriptions by 3 index). If I duplicate STANDARD 3 times by adding INDEX variable for each duplicates then I'll be able to do one-to-one merge.

 

However, I find Astounding's solution easier since I don't bother merging ONE, TWO, THREE by INDEX and DECSRIPTION variables first and then merge STANDARD to the resulting merged dataset by DESCRIPTION variable.

 

Please let me know if I'm missing something here.

 

 

Kurt_Bremser
Super User
data cntlin;
set standard (rename=(description=start value=label));
fmtname = 'myfmt';
type = 'C';
run;

proc format cntlin=cntlin;
run;

Now you can do

value = input(put(description,$myfmt.),best.);

It's probably better to create an informat and use only the input() function, but I'm not in front of SAS right now to test that. I only know formats by heart, but not informats 😉 

Astounding
PROC Star
If formats are too difficult, sort and merge BY DESCRIPTION INDEX (not BY INDEX DESCRIPTION). Then you can merge in summary data BY DESCRIPTION.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1750 views
  • 8 likes
  • 5 in conversation