BookmarkSubscribeRSS Feed
suchismita
Quartz | Level 8

 

proc sql noprint;
  select  memname
  into :table_list separated by ' '
  from comp_tbls   ;
quit;


data all;
  merge  &table_list;by id1 id2;
run;

Hello All,

 

I want to merge a dynamic number of sas datasets that is available  in comp_tbls table and they will be merged by id1 and id2. Now there could be some datasets that won't have the id1 and id2 at all in them. So I want to if there is any way that i can identify which dataset does not have id1 and id2 . So that I can put an indicator saying 'ID1 and ID2 combination does not exist'.

 

As an example let's say I have below 8 datasets in comp_tbls. 

COMP_DC11111
COMP_DCTY567

COMP_FFBNH
COMP_DC32344
COMP_HWADD
COMP_HWBSL
COMP_HWSAD
COMP_HWSLI

My final merged output looks like below. Some blank values are there  where

 id1 and id2 combination does not exist

or it could be 

id1 and id2 exist but  the value under the COMP_* column  could be really null.

 

I want to update those blank as  'ID1 and ID2 combination does not exist' where the above situation (in bold letter) occurs .

 

id1  id2 COMP_DC11111 COMP_DCTY567 COMP_FFBNH COMP_DC32344 COMP_HWADD COMP_HWBSL COMP_HWSAD COMP_HWSLI
1 15 V   A T Y f e u
4 56 Y Y B       j  
5 67 N U T d y m   k
8 REPLIES 8
PhilC
Rhodochrosite | Level 12

Please be a bit more specific. I'm not following everything in your question.

 

on this statement, I can offer insight.   


@suchismita wrote:

So I want to if there is any way that i can identify which dataset does not have id1 and id2 . So that I can put an indicator saying 'ID1 and ID2 combination does not exist'.

When you use LIBNAME to define a library, several tables are updated in SAS.  One of them is DICTIONARY.COLUMNS (its view is known as SASHELP.VCOLUMN).

 

proc sql;
  select libname, memname, name
    from Dictionary.columns
    where upcase(name) in ("MAKE","MODEL");
quit;

You could use this dataset to select files with columns ID1 and ID2 in the library(ies) you choose.

 

Similarly, you can use DICTIONARY.TABLES:

proc sql;
  select cats(libname,".",memname)
    into :PRDSAL_table_list separated by ' ' 
    from Dictionary.Tables
    where MEMNAME like "PRDSAL_"
       AND LIBNAME in ("SASHELP");
quit;

 

suchismita
Quartz | Level 8

thank you for your reply.

to elaborate it little more, here I am trying to explain it with a  simple example. Please let me know if it is easy to understand now.

 

Let's say I  have an incoming data for a set of employees where I  have data like below:

table1 

id1 id2 dob name
309 6789 07/08/1987 abc
567 8905 07/09/1966 xyz
348 1235 07/10/1967 cdf
456 5688 07/10/1969 vbn

 

Then I need to perform a comparison if any of the combination of id1 and id2 already exist in any of employees department (sales, marketing ,payroll and there could be more etc). I will use id1 and id2 combination to pull what  all department  they exist.

 

My sales dept data is like below:

 

id1 id2 dob name
309 6789 07/08/1987 abc
567 8905 07/09/1966 xyz
348 1235 07/10/1967 efg

 

Marketing dept data is like below:

 

id1 id2 dob name
309 6789 07/08/1987 abc
567 8905 07/09/1966 xyz
348 1235   cdf

 

Payroll dept data is like below:

id1 id2 dob name
309 6789 07/08/1987 abc
567 8905 07/09/1966 xyz

 

Now when I want get a comparison  of all 4 datasets, I merged all 4 by id1 and id2. And I am renaming column names as _<dept name> , see below green bold names.

data all;

merge &table_list;by id1 id2;

run;

where table_list is macro variable that contains all table names like sales, marketing and payroll etc.

 

id1 id2 dob name dob_sales name_sales dob_marketing name_marketing dob_payroll name_payroll
309 6789 07/08/1987 abc 07/08/1987 abc 07/08/1987 abc 07/08/1987 abc
567 8905 07/09/1966 xyz 07/09/1966 xyz 07/09/1966 xyz 07/09/1966 xyz
348 1235 07/10/1967 cdf 07/10/1967 efg   cdf    

 

So now since 348( id1) and 1235(id2) combination does not exist in payroll dataset , so I would like update dob_payroll and name_payroll it as 'id1 and id2 combination does not exist'.

 

If we know the number of datasets we are merging, we can assign alias like @ballardwmentioned . But here every time the number  and name  of datasets in merge statement will change it could be 5 or 10 or 20 ,based on incoming/existing data.

 

Please let me know if this helps to understand the problem statement. 

 

 

andreas_lds
Jade | Level 19

So now since 348( id1) and 1235(id2) combination does not exist in payroll dataset , so I would like update dob_payroll and name_payroll it as 'id1 and id2 combination does not exist'.

Hardly possible to insert text in a date-variable.

For future posts: please take some more time to find a proper title for your topics.

suchismita
Quartz | Level 8

sure will do that. thank you.

If I convert dob to character variable, then is there a way to update it?

andreas_lds
Jade | Level 19

Yes, but the variable is hardly usable, when converted to text. So why not just keep the missing values, and maybe add a footnote when reporting, explaining the missing value.

PhilC
Rhodochrosite | Level 12

the dataset option "IN=" could help you flag non matching records.  There are more than one way to do this, some better for you than others --- you will have to chose.  Below is one example.

 

/*simulating your setup*/

%put &=table_list ;
/*log:*/
*table_list = 
    TABLELST.sales(in=insales)     
    TABLELST.marketing(in=inmarketing)   
    TABLELST.payroll(in=inPayroll)
/*end log*/;

data all errors ;
  merge &table_list;
    by id1 id2;
  if insales and inmarketing and inPayroll
    then output all;
    else output errors; 
run;
Tom
Super User Tom
Super User

Sounds like you don't want to MERGE that data at all. 

Use a SET statement with the INDSNAME= option to indicate which dataset contributed the observation.

You could then store the name into a permanent variable and use that in conjunction with PROC TRANSPOSE to generate new variable names.

data step1;
   set ds1 ds2 ds3 indsname=dsn;
   by id1 id2;
   dsname = upcase(scan(dsn,-1,'.'));
run;
proc transpose data=step1 name=variable out=step2;
  by id1 id2 dsname notsorted;
  var dob name;
run;
proc transpose data=step2 out=want(drop=_name_) delim=_;
  by id1 id2 ;
  id variable dsname;
  var col1;
run;
ballardw
Super User

As soon as you merge data sets the variable(s) exist unless you specifically drop them before the merge.

So testing after merging isn't a very effective approach, especially if those variables exist in multiple data sets.

 

If you intend to merge BY variables then you need to make sure the variables are there before the merge because missing variables (not values, the variable) will cause an error and the merge will not take place at all.

Unless you use the NOTSORTED option on your BY statement all of the data sets must be sorted by the BY variables before merging. If you do use NOTSORTED is suspect you won't be happy with the results

 

You could identify which datasets do not contribute to a record by using the IN= dataset option but the BY variable still need to be present in the data set though missing would be acceptable.

data set1;
  input id $ var;
datalines;
.  11
1  22
2  33
;

data set2;
   input id $ var2;
datalines;
1  111
2  222
;


data example;
   merge set1 (in=in1) 
         set2 (in=in2)
   ;
   by id;
   if not(in1) or not(in2) then mergeby='Only one set contributed';
   else if in1 and in2 then mergeby='Both sets contributed'; 
run;

The logic for which data sets might be missing values would require additional variables with the values of the BY variables, unique to each data set to avoid the behavior of Merge to replace values since only one variable by the same name can ever exist in a SAS data set.

Note that when merging if you have multiple missing values for any of the by variables in more than one data set you in the realm of a many-to-many  merge and the result is almost never as desired in that case.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 820 views
  • 0 likes
  • 5 in conversation