sas programming

suchismita · Posted 08-16-2021 10:05 AM

proc sql noprint;
  select  memname
  into :table_list separated by ' '
  from comp_tbls   ;
quit;


data all;
  merge  &table_list;by id1 id2;
run;

Hello All,

I want to merge a dynamic number of sas datasets that is available in comp_tbls table and they will be merged by id1 and id2. Now there could be some datasets that won't have the id1 and id2 at all in them. So I want to if there is any way that i can identify which dataset does not have id1 and id2 . So that I can put an indicator saying 'ID1 and ID2 combination does not exist'.

As an example let's say I have below 8 datasets in comp_tbls.

COMP_DC11111
COMP_DCTY567

COMP_FFBNH
COMP_DC32344
COMP_HWADD
COMP_HWBSL
COMP_HWSAD
COMP_HWSLI

My final merged output looks like below. Some blank values are there where

id1 and id2 combination does not exist

or it could be

id1 and id2 exist but the value under the COMP_* column could be really null.

I want to update those blank as 'ID1 and ID2 combination does not exist' where the above situation (in bold letter) occurs .

id1	id2	COMP_DC11111	COMP_DCTY567	COMP_FFBNH	COMP_DC32344	COMP_HWADD	COMP_HWBSL	COMP_HWSAD	COMP_HWSLI
1	15	V		A	T	Y	f	e	u
4	56	Y	Y	B				j
5	67	N	U	T	d	y	m		k

PhilC · Posted 08-16-2021 03:24 PM

Please be a bit more specific. I'm not following everything in your question.

on this statement, I can offer insight.

@suchismita wrote:

So I want to if there is any way that i can identify which dataset does not have id1 and id2 . So that I can put an indicator saying 'ID1 and ID2 combination does not exist'.

When you use LIBNAME to define a library, several tables are updated in SAS. One of them is DICTIONARY.COLUMNS (its view is known as SASHELP.VCOLUMN).

proc sql;
  select libname, memname, name
    from Dictionary.columns
    where upcase(name) in ("MAKE","MODEL");
quit;

You could use this dataset to select files with columns ID1 and ID2 in the library(ies) you choose.

Similarly, you can use DICTIONARY.TABLES:

proc sql;
  select cats(libname,".",memname)
    into :PRDSAL_table_list separated by ' ' 
    from Dictionary.Tables
    where MEMNAME like "PRDSAL_"
       AND LIBNAME in ("SASHELP");
quit;

suchismita · Posted 08-17-2021 12:42 AM

thank you for your reply.

to elaborate it little more, here I am trying to explain it with a simple example. Please let me know if it is easy to understand now.

Let's say I have an incoming data for a set of employees where I have data like below:

table1

id1	id2	dob	name
309	6789	07/08/1987	abc
567	8905	07/09/1966	xyz
348	1235	07/10/1967	cdf
456	5688	07/10/1969	vbn

Then I need to perform a comparison if any of the combination of id1 and id2 already exist in any of employees department (sales, marketing ,payroll and there could be more etc). I will use id1 and id2 combination to pull what all department they exist.

My sales dept data is like below:

id1	id2	dob	name
309	6789	07/08/1987	abc
567	8905	07/09/1966	xyz
348	1235	07/10/1967	efg

Marketing dept data is like below:

id1	id2	dob	name
309	6789	07/08/1987	abc
567	8905	07/09/1966	xyz
348	1235		cdf

Payroll dept data is like below:

id1	id2	dob	name
309	6789	07/08/1987	abc
567	8905	07/09/1966	xyz

Now when I want get a comparison of all 4 datasets, I merged all 4 by id1 and id2. And I am renaming column names as _<dept name> , see below green bold names.

data all;

merge &table_list;by id1 id2;

run;

where table_list is macro variable that contains all table names like sales, marketing and payroll etc.

id1	id2	dob	name	dob_sales	name_sales	dob_marketing	name_marketing	dob_payroll	name_payroll
309	6789	07/08/1987	abc	07/08/1987	abc	07/08/1987	abc	07/08/1987	abc
567	8905	07/09/1966	xyz	07/09/1966	xyz	07/09/1966	xyz	07/09/1966	xyz
348	1235	07/10/1967	cdf	07/10/1967	efg		cdf

So now since 348( id1) and 1235(id2) combination does not exist in payroll dataset , so I would like update dob_payroll and name_payroll it as 'id1 and id2 combination does not exist'.

If we know the number of datasets we are merging, we can assign alias like @ballardwmentioned . But here every time the number and name of datasets in merge statement will change it could be 5 or 10 or 20 ,based on incoming/existing data.

Please let me know if this helps to understand the problem statement.

andreas_lds · Posted 08-17-2021 02:03 AM

So now since 348( id1) and 1235(id2) combination does not exist in payroll dataset , so I would like update dob_payroll and name_payroll it as 'id1 and id2 combination does not exist'.

Hardly possible to insert text in a date-variable.

For future posts: please take some more time to find a proper title for your topics.

suchismita · Posted 08-17-2021 09:50 AM

sure will do that. thank you.

If I convert dob to character variable, then is there a way to update it?

andreas_lds · Posted 08-18-2021 04:00 AM

Yes, but the variable is hardly usable, when converted to text. So why not just keep the missing values, and maybe add a footnote when reporting, explaining the missing value.

PhilC · Posted 08-17-2021 08:24 AM

the dataset option "IN=" could help you flag non matching records. There are more than one way to do this, some better for you than others --- you will have to chose. Below is one example.

/*simulating your setup*/

%put &=table_list ;
/*log:*/
*table_list = 
    TABLELST.sales(in=insales)     
    TABLELST.marketing(in=inmarketing)   
    TABLELST.payroll(in=inPayroll)
/*end log*/;

data all errors ;
  merge &table_list;
    by id1 id2;
  if insales and inmarketing and inPayroll
    then output all;
    else output errors; 
run;

Tom · Posted 08-18-2021 12:40 PM

Sounds like you don't want to MERGE that data at all.

Use a SET statement with the INDSNAME= option to indicate which dataset contributed the observation.

You could then store the name into a permanent variable and use that in conjunction with PROC TRANSPOSE to generate new variable names.

data step1;
   set ds1 ds2 ds3 indsname=dsn;
   by id1 id2;
   dsname = upcase(scan(dsn,-1,'.'));
run;
proc transpose data=step1 name=variable out=step2;
  by id1 id2 dsname notsorted;
  var dob name;
run;
proc transpose data=step2 out=want(drop=_name_) delim=_;
  by id1 id2 ;
  id variable dsname;
  var col1;
run;

ballardw · Posted 08-16-2021 03:45 PM

As soon as you merge data sets the variable(s) exist unless you specifically drop them before the merge.

So testing after merging isn't a very effective approach, especially if those variables exist in multiple data sets.

If you intend to merge BY variables then you need to make sure the variables are there before the merge because missing variables (not values, the variable) will cause an error and the merge will not take place at all.

Unless you use the NOTSORTED option on your BY statement all of the data sets must be sorted by the BY variables before merging. If you do use NOTSORTED is suspect you won't be happy with the results

You could identify which datasets do not contribute to a record by using the IN= dataset option but the BY variable still need to be present in the data set though missing would be acceptable.

data set1;
  input id $ var;
datalines;
.  11
1  22
2  33
;

data set2;
   input id $ var2;
datalines;
1  111
2  222
;


data example;
   merge set1 (in=in1) 
         set2 (in=in2)
   ;
   by id;
   if not(in1) or not(in2) then mergeby='Only one set contributed';
   else if in1 and in2 then mergeby='Both sets contributed'; 
run;

The logic for which data sets might be missing values would require additional variables with the values of the BY variables, unique to each data set to avoid the behavior of Merge to replace values since only one variable by the same name can ever exist in a SAS data set.

Note that when merging if you have multiple missing values for any of the by variables in more than one data set you in the realm of a many-to-many merge and the result is almost never as desired in that case.

sas programming

Re: sas programming

Re: sas programming

Re: sas programming

Re: sas programming

Re: sas programming

Re: sas programming

Re: sas programming

Re: sas programming

Classroom Training Available!