Program 1:
DATA EUROPE_CAR;
SET SASHELP.CARS;
IF MAKE IN ('BMW');
IF MAKE EQ 'Audi' OR MAKE EQ 'BMW' THEN DO;
KEEP TYPE;
END;
IF MAKE EQ 'Audi' THEN DO;
KEEP Origin;
END;
RUN;
Program 2:
DATA EUROPE_CARS;
SET SASHELP.CARS;
IF MAKE IN ('Audi');
IF MAKE EQ 'BMW' THEN DO;
KEEP TYPE;
END;
IF MAKE EQ 'Audi' THEN DO;
KEEP Origin;
END;
RUN;
Question:
I expect program 1 to keep TYPE variable only while Program 2 to keep ORIGIN variable; both programs keep TYPE and ORIGIN.
Anyone can help? Thanks.
Is there any other way to keep the variables that i wanted base on the conditions on both programs?
We might help you if you tell us why you want to do this.
As @Reeza said, KEEP is global. Meaning that executes regardless of conditions in the data step. Which is logical. The data step needs to know how the output data should look like, before creating and add observations to it.
What would you suspect that the output data set to like if there are makes BMW and Audi?
Perhaps you are confusing assignment logic with subsetting (where you use WHERE).
Take a look at my comments and clarify if you can so we understand what you need:
Program 1:
DATA EUROPE_CAR;
SET SASHELP.CARS;
IF MAKE IN ('BMW');
/* =======================================================
The above statement will eliminate all other records that are not BMW.
Which means, your "IF" statement below does not have to check for "Audi" as
it has already been dropped out of the dataset. Clarify if
this is what you are expecting
======================================================= */
IF MAKE EQ 'Audi' OR MAKE EQ 'BMW' THEN DO;
KEEP TYPE; /* This KEEP statement keeps the TYPE. Sure.
But you are not dropping MAKE. Therefore, you will have
both in the output. You need to clarify what you are
expecting in this step */
END;
IF MAKE EQ 'Audi' THEN DO;
KEEP Origin; /*same comment as above */
END;
RUN;
Program 2:
/*=======================================================
comments from Program:1 also apply for Program#2.
Please clarify so we can help
======================================================= */
DATA EUROPE_CARS;
SET SASHELP.CARS;
IF MAKE IN ('Audi');
IF MAKE EQ 'BMW' THEN DO;
KEEP TYPE;
END;
IF MAKE EQ 'Audi' THEN DO;
KEEP Origin;
END;
RUN;
OK. I have another situation:
Dataset:
Customer HP_info MP_info Email_info Home_Phone Mobile_Phone Email
A1 Y Y Y 123 445 xxx
A2 Y Y N 119 110
A3 N Y N 124
What I wanted is:
A1 dataset
Customer HP_info MP_info Email_info Home_Phone Mobile_Phone Email
A1 Y Y Y 123 445 xxx
A2 dataset
Customer HP_info MP_info Email_info Home_Phone Mobile_Phone
A2 Y Y N 119 110
A3 dataset
Customer HP_info MP_info Email_info Mobile_Phone
A3 N Y N 124
Both data sets and variables get created during the compilation phase of a data step but you only process data during the execution phase. That's why both data sets and variables can't get created during data step execution.
The way around this is by either dynamically create SAS code and then execute this SAS code as a next step or to create hash tables which can get created during exection phase (not the variables though). All these techniques are advanced and require quite a bit of coding.
It is normally not a good idea to split up data the way you want it. There is by group processing, where clauses and if statements to process data in chunks or conditionally.
data have;
infile datalines truncover;
input (Customer HP_info MP_info Email_info Home_Phone Mobile_Phone Email) ($);
datalines;
A1 Y Y Y 123 445 xxx
A2 Y Y N 119 110
A3 N Y N 124
run;
data A1 A2(drop=Email) A3;
set have;
if customer='A1' then
output A1;
else if customer='A2' then
output A2;
else if customer='A3' then
output A3;
run;
Can you explain WHY you want the output shown?
If it is predetermined then just code what you want to happen.
data a1 a2(drop=email) a3(drop=home_phone email);
set have ;
if customer='A1' then output A1;
else if customer='A2' then output A2;
else if customer='A3' then output A3;
run;
If instead it is some type of report then look at the NOZERO option on the DEFINE statement in PROC REPORT.
data have ;
length Customer $10 HP_info MP_info Email_info $1 Home_Phone Mobile_Phone $10 Email $20 ;
input customer -- email;
cards;
A1 Y Y Y 123 445 xxx
A2 Y Y N 119 110 .
A3 N Y N . 124 .
;;;;
proc report data=have ;
by customer ;
define _all_ / display nozero ;
run;
Here is the 2nd situation i highlighted earlier. Below only show 3 customers but i could have 100 customers, and my end result will be in excel listing. All info from the customers are unknown different combinations and we only know if the info is available through N or Y indicator.
Dataset:
Customer HP_info MP_info Email_info Home_Phone Mobile_Phone Email
A1 Y Y Y 123 445 xxx
A2 Y Y N 119 110
A3 N Y N 124
What I wanted is:
A1 dataset
Customer HP_info MP_info Email_info Home_Phone Mobile_Phone Email
A1 Y Y Y 123 445 xxx
A2 dataset
Customer HP_info MP_info Email_info Home_Phone Mobile_Phone
A2 Y Y N 119 110
A3 dataset
Customer HP_info MP_info Email_info Mobile_Phone
A3 N Y N 124
So in that case with 3 binary variables there 8 possible combinations. So you can split your data into 8 separate datasets if you want.
data
yyy
yyn (drop=Email_info)
yny (drop=MP_info)
ynn (drop=MP_info Email_info)
nyy (drop=HP_info)
nyn (drop=HP_info Email_info)
nny (drop=HP_info MP_info)
nnn (drop=HP_info MP_info Email_info)
;
set have;
select (cats(of HP_info MP_info Email_info)) ;
when ('YYY') output yyy;
when ('YYN') output yyn;
when ('YNY') output yny;
when ('YNN') output ynn;
when ('NYY') output nyy;
when ('NYN') output nyn;
when ('NNY') output nny;
when ('NNN') output nnn;
end;
run;
NOTE: There were 3 observations read from the data set WORK.HAVE. NOTE: The data set WORK.YYY has 1 observations and 7 variables. NOTE: The data set WORK.YYN has 1 observations and 6 variables. NOTE: The data set WORK.YNY has 0 observations and 6 variables. NOTE: The data set WORK.YNN has 0 observations and 5 variables. NOTE: The data set WORK.NYY has 0 observations and 6 variables. NOTE: The data set WORK.NYN has 1 observations and 5 variables. NOTE: The data set WORK.NNY has 0 observations and 5 variables. NOTE: The data set WORK.NNN has 0 observations and 4 variables.
There are ways for macro language to do something similar to what you ask, but it is simpler to create multiple data sets:
data audi_only (keep=origin)
bmw_only (keep=type);
set sashelp.cars;
if make='Audi' then output audi_only;
else output bmw_only;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.