@chaitanyagaddam Suggest you post the desired target dataset for the sample source data you shared. That should clarify what you need.
Adding rows for source tables that have no rows feels like a sub-optimal approach. The main reason for such an approach I can think of is for some reporting requirement. IF this is for reporting and it is about ensuring that in reports all possible values for a class variable get printed then consider creating a classdata set instead. One example here.
Data ob1;
Set sashelp.class;
Where sex="M";
Run;
Data ob2;
Set sashelp.class;
Where sex="F";
Run;
Data ob3;
Set sashelp.class;
Where sex="Trans";
Run;
Data ob4;
Set sashelp.class;
Where sex="Other";
Run;
data all_obs;
length sex $5;
set ob1-ob4;
run;
data all_class;
length sex $5;
do sex='M','F','Trans','Other';
output;
end;
run;
/* sample reports */
proc means data=work.all_obs classdata=work.all_class ;
class Sex;
var;
run;
proc tabulate data=work.all_obs classdata=work.all_class noseps;
class sex;
table sex;
run;
/*proc report ....*/
If your source data set got zero rows then it can't add any rows to the target data set.
If the zero row data set contains a variable (column) that doesn't exist in any of the other merged source data sets then this variable will get added to the target data set and be all missing.
Do you mean something like this:
Code:
data all4together;
set ob1 ob2 ob3 ob4;
run;
Log:
1 data all4together;
2 set ob1 ob2 ob3 ob4;
3 run;
NOTE: There were 10 observations read from the data set WORK.OB1.
NOTE: There were 9 observations read from the data set WORK.OB2.
NOTE: There were 0 observations read from the data set WORK.OB3.
NOTE: There were 0 observations read from the data set WORK.OB4.
NOTE: The data set WORK.ALL4TOGETHER has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
?
Bart
So what is the issue ? Because the code clearly reads all the files, even the empty ones.
Data shell; /* create shell datasets */
length sex $ 5;
sex="M"; output;
sex="F"; output;
sex="Trans"; output;
sex="Other"; output;
Run;
/* sort by sex */
proc sort data=shell;
by sex;
run;
proc sort data=sashelp.class out=class_sorted;
by sex;
run;
data combined;
merge shell class_sorted;
by sex;
run;
proc print data=combined;
run;
@chaitanyagaddam wrote:
The actual point is obs3 and obs4 datasets have 0 obs I have to create a dummy for 0 observations dataset by using _null_ and make it merge with obs1 and obs2 datset.in final dataset i want to have records for obs3 and obs4 datasets updates with 0 along with 19 obs
Still does not make sense.
Are you just talking about make a REPORT of the number of observations in each dataset?
NOBS DATASET 10 WORK.OB1 9 WORK.OB2 0 WORK.OB3 0 WORK.OB4
Or are you talking about adding extra observations?
So perhaps something like this?
data skeleton;
length sex $5 ;
do sex='F','M','Other','Trans';
output;
end;
run;
data want;
merge skeleton ob1-ob4;
by sex;
run;
Why do you need either of those? What analysis are you trying to do?
Have you looked at the PRELOADFMT option of proc summary?
https://communities.sas.com/t5/SAS-Procedures/completetypes-preloadfmt-vs-completetypes/td-p/83674
@chaitanyagaddam Suggest you post the desired target dataset for the sample source data you shared. That should clarify what you need.
Adding rows for source tables that have no rows feels like a sub-optimal approach. The main reason for such an approach I can think of is for some reporting requirement. IF this is for reporting and it is about ensuring that in reports all possible values for a class variable get printed then consider creating a classdata set instead. One example here.
Data ob1;
Set sashelp.class;
Where sex="M";
Run;
Data ob2;
Set sashelp.class;
Where sex="F";
Run;
Data ob3;
Set sashelp.class;
Where sex="Trans";
Run;
Data ob4;
Set sashelp.class;
Where sex="Other";
Run;
data all_obs;
length sex $5;
set ob1-ob4;
run;
data all_class;
length sex $5;
do sex='M','F','Trans','Other';
output;
end;
run;
/* sample reports */
proc means data=work.all_obs classdata=work.all_class ;
class Sex;
var;
run;
proc tabulate data=work.all_obs classdata=work.all_class noseps;
class sex;
table sex;
run;
/*proc report ....*/
Can you show an example of the output dataset you want to create? You're using the word MERGE, but that has a specific meaning in SAS, which I don't think is what you intent.
It sounds like you want to vertically concatenate / stack the four datasets, and get an output dataset where you have 21 records, which are 10 records from males, 9 records from females, and two records with all missing values, from no trans and no other. Is that right?
In SAS, a 0 obs dataset has no rows. This is different from having one row with missing values.
If you want to create a dataset with one row of missing values when a WHERE condition is false, you could do it like:
data foo;
if last then output ;
set sashelp.class end=last;
where sex="Other";
run;
During compile time, the compiler will set the variable LAST to the value 1 if there are no records that meet the WHERE statement. (Side note: the documentation claims the end= variable is always initialized to zero and will not be set to 1 until the SET statement executes, which is incorrect.)
Above will output one record, but it will also only output one record if you change the where clause to sex="M". So you can make the output statement conditional to only output a record with missing values when on the first iteration of the data step, before the SET statement has executed, LAST=1. And you have to add another OUTPUT statement at the bottom of the loop, which will output records that do meet the WHERE clause.
So you could your four steps like:
data ob1;
if _N_=1 and last then output ;
set sashelp.class end=last;
where sex="M";
output ;
run;
data ob2;
if _N_=1 and last then output ;
set sashelp.class end=last;
where sex="F";
output ;
run;
data ob3;
if _N_=1 and last then output ;
set sashelp.class end=last;
where sex="Trans";
output ;
run;
data ob4;
if _N_=1 and last then output ;
set sashelp.class end=last;
where sex="Other";
output ;
run;
Then concatenation them:
data want ;
set ob1-ob4 ;
run ;
Result is:
Obs Name Sex Age Height Weight 1 Alfred M 14 69.0 112.5 2 Henry M 14 63.5 102.5 3 James M 12 57.3 83.0 4 Jeffrey M 13 62.5 84.0 5 John M 12 59.0 99.5 6 Philip M 16 72.0 150.0 7 Robert M 12 64.8 128.0 8 Ronald M 15 67.0 133.0 9 Thomas M 11 57.5 85.0 10 William M 15 66.5 112.0 11 Alice F 13 56.5 84.0 12 Barbara F 13 65.3 98.0 13 Carol F 14 62.8 102.5 14 Jane F 12 59.8 84.5 15 Janet F 15 62.5 112.5 16 Joyce F 11 51.3 50.5 17 Judy F 14 64.3 90.0 18 Louise F 12 56.3 77.0 19 Mary F 15 66.5 112.0 20 . . . 21 . . .
If you want Sex to have the value "T" or "O", you could add an assignment statement to the top of the DATA step.
There are probably easier ways to make a row of missing values. For example, you could make a dataset with four records with values of sex "M" "F" "T" "O" and merge/join that against sashelp.class, and that would do it:
data grid ;
do sex="M","F","T","O" ;
output ;
end ;
run ;
proc sql ;
create table want2 as
select a.sex as sex_grid, b.* from
grid as a left join sashelp.class as b
on a.sex=b.sex
;
quit ;
Given the 19 obs in sashelp.class have only sex='M' or 'F', you want to add two more obs, with SEX='Trans' or 'Other', and all other variables missing. I.e you want 21 obs.
Along the lines of @yabwon 's suggestion but without the proc sort:
data obs1 (where=(sex='F'))
obs2 (where=(sex='M'))
obs3 (where=(sex='Trans'))
obs4 (where=(sex='Other'));
length sex $5;
set sashelp.class;
run;
data want (drop=i);
set obs1-obs4 end=end_of_data;
array genders {4} $5 _temporary_ ('M','F','Trans','Other');
i=whichc(sex,of genders{*});
if i>0 then call missing(genders{i});
output;
if end_of_data;
call missing(of _all_);
do i=1 to dim(genders);
sex=genders{i};
if sex^=' ' then output;
end;
run;
This uses a temporary array with all the expected values of SEX. Any values found in the incoming data are deleted from the array, leaving only those analogous to your zero-obs datasets. Those are added after the "if end_of_data;" statement.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.