subobscure
Obsidian | Level 7

I have an empty dataset A with 2 variables, date_num and date_char.

It was created as follows:

 

data a;
attrib 	date_num length=8. format=date9.
		date_char length=$200.
		;
	run;

I have another dataset E with 2 variables: origdt (original date, numeric) and id (character).

 

The dataset E is:-

ID              ORIGDT
520-03-08 17-Dec-15
330-03-02 19-May-15
110-42-01
330-09-02
380-04-21 5-Sep-16
380-09-07
110-09-01
610-05-03 12-May-15
110-34-10 7-Nov-16
110-11-01 6-Feb-17

The dataset E can be created as follows:

data E;
input ID$1-9 ORIGDT:Date9.;
format ORIGDT DATE9.;
datalines;
520-03-08 17-Dec-15
330-03-02 19-May-15
110-42-01 .
330-09-02 .
380-04-21 5-Sep-16
380-09-07 .
110-09-01 .
610-05-03 12-May-15
110-34-10 7-Nov-16
110-11-01 6-Feb-17
run;

 

I have created a new dataset as under:-

 

data temp;
set a(obs=0) e;
if not missing(origdt) then date_num=origdt;
if not missing(origdt) then date_char=put(origdt,DATE9.);
run;

The problem is that date_num and date_char keep the value from the previous observation whenever origdt is missing, unless I explicitly add an else statement.

 

Please Help !!!!! 

 

To all those who are saying not to include dataset A: the reason it is in the SET statement is that it is the metadata.

It forms the basis for the newly created variables. It essentially contains their formats, informats, labels, lengths (variable attributes), etc.

 

Also, I know that variables are being retained, but I need a further explanation. This is a rare situation in which, when an IF statement is used along with the SET statement, the variables are populated even for observations where the condition is not true. (Check attached file)

 

However, if the datasets are simply set, missing values in a variable remain as they are.

 

 

8 REPLIES
RW9
Diamond | Level 26

Post test data in the form of a datastep!!  Post it as text using the code window - {i} above post area.  

Also, why are you bothering to create A? It adds nothing to the code.  Just do:

data temp;
  set e;
  date_num=ifn(not missing(origdt),origdt,.);
  date_char=ifc(not missing(origdt),put(origdt,date9.),"");
  format date_num date9.;
run;
subobscure
Obsidian | Level 7
  1. Answered, in the question, why A is created.
  2. Pasted the raw dataset as text.
Astounding
PROC Star

The issue you are encountering is that any variable that comes from a SAS data set is automatically retained.  (I guess you already noticed that.)  An easy solution would be to skip creating the data set A:

 

data want;
   set e;
   attrib date_num length=8 format=date9.
          date_char length=$200;
   if not missing(origdt) then do;
      date_num = origdt;
      date_char = put(origdt, date9.);
   end;
run;

 

If there is some reason you have to use the data set A instead, there are ways to program around this.  Of course, that means you get a longer, clunkier program as a result.  For example:

 

data want;
   /* This block never executes; it only lets the compiler see data set A */
   if 5=4 then do;
      set a (rename=(date_num = dummy1 date_char = dummy2));
      date_num = dummy1;
      date_char = dummy2;
      drop dummy1 dummy2;
   end;
   set e;
   if not missing(origdt) then do;
      date_num = origdt;
      date_char = put(origdt, date9.);
   end;
run;

 

 

Now DATE_NUM and DATE_CHAR are no longer coming from a SAS data set, and are thus no longer retained.

subobscure
Obsidian | Level 7

@Astounding Thanks for replying on my post.

The thing is that my dataset A is actually the metadata, it contains the standard attributes of around 50,000 variables (Label, length, type, format, informat).

Dataset E is my raw data. The variables have to be processed and brought into the final dataset from it. So, what I am trying to do is process the variables in E (origdt) and assign them to A (my final dataset containing the standards (date_num / date_char)).

 

I know I can do this in 2 steps i.e.

data temp;
   set A E;
run;

data temp;
   set temp;
   /************
   My preprocessing
   ***********/
run;

But, the code becomes large, and it makes no sense particularly if my raw data is coming from 20 different datasets....

Any advice is highly appreciated.... 

RW9
Diamond | Level 26

Well, the code below should work.  I would ask what you plan to do with a dataset with 50k variables?  Sounds to me like you need a database, or a data warehouse.

 

data a;
  attrib date_num length=8. format=date9.
		     date_char length=$200.;
run;
data E;
  input ID$1-9 ORIGDT:Date9.;
  format ORIGDT DATE9.;
datalines;
520-03-08 17-Dec-15
330-03-02 19-May-15
110-42-01 .
330-09-02 .
380-04-21 5-Sep-16
380-09-07 .
110-09-01 .
610-05-03 12-May-15
110-34-10 7-Nov-16
110-11-01 6-Feb-17
run;

data want (drop=id origdt);
  set a e;
  date_num=ifn(not missing(origdt),origdt,.);
  date_char=ifc(not missing(origdt),put(origdt,date9.),"");
  if id="" then delete;
run;
subobscure
Obsidian | Level 7

Thanks for the response...

I know that it works with ifn / ifc.

Even in my original code it works , if I put the else statement.
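
For reference, a minimal sketch of that ELSE variant, assuming the A and E data sets exactly as created in the question. Grouping both assignments in one DO block and clearing both variables with CALL MISSING in the ELSE branch is just one way to write it:

```sas
data temp;
  set a(obs=0) e;
  if not missing(origdt) then do;
    date_num  = origdt;
    date_char = put(origdt, date9.);
  end;
  /* DATE_NUM and DATE_CHAR are defined by data set A, so they are
     retained across iterations; clear them when ORIGDT is missing */
  else call missing(date_num, date_char);
run;
```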

 

The reason I posted this topic is that I want to know: if variables are retained by a SET statement in SAS, then why is it that with a simple SET (without IF) the retained values of the common variables are not populated?

 

Example....

data E1;
  input ID$1-9 ORIGDT:Date9.;
  format ORIGDT DATE9.;
datalines;
520-03-08 17-Dec-15
330-03-02 19-May-15
110-42-01 .
330-09-02 .
;
data E2;
  input ID$1-9 ORIGDT:Date9.;
  format ORIGDT DATE9.;
datalines;
380-04-21 5-Sep-16
380-09-07 .
110-09-01 .
610-05-03 12-May-15
          7-Nov-16
110-11-01 6-Feb-17
run;

data E3;
set E1 E2;
run;

But if I use an IF statement in the same data step to assign a variable that already exists, then for observations where the condition is false, the value retained from the previous observation is populated.

I want to know the PDV logic (how the variables are retained).

If they are retained, and I agree they are retained, then why are they not retained in a simple SET STATEMENT...?

(From the code above you can see that even though there are missing values in the E1 and E2 datasets, they still come through as missing in dataset E3. They are NOT RETAINED !!!)

Astounding
PROC Star

They are retained.  However, SAS faces an additional question.  When the SET statement switches from one data set to another, what should happen?  Part of the answer is that SAS re-sets any variables coming in from any data set mentioned in the SET statement, before it begins to read from that second data set.  Then the "retain" feature remains in place as SAS reads additional observations from the second data set.
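
One way to watch this in the log is the sketch below, which assumes the A and E data sets from the question. The PUT statement runs before the assignment, so it shows what DATE_NUM holds at the start of each iteration. By contrast, in a plain SET E1 E2 the variable ORIGDT is read from every observation, so each read overwrites the retained value, missing values included.

```sas
data _null_;
  set a(obs=0) e;
  /* Written before any assignment: DATE_NUM still holds whatever the
     previous iteration left in it, because it is defined by data set A
     and no observation from E ever overwrites it */
  put _n_= id= origdt= date_num=;
  if not missing(origdt) then date_num = origdt;
run;
```

For observations where ORIGDT is missing, the log line should show DATE_NUM carrying the date assigned in an earlier iteration, while ORIGDT itself shows as missing because it was just read from E.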

 

I'll have to think about this ... whether there is an easy way to process thousands of variables in one DATA step instead of two.  Is the amount of data manipulation (IF/THEN, etc.) also large, or is that relatively small?  As you mentioned, simplest might be using ELSE statements.  And perhaps adding these statements at the end of your DATA step would work:

 

output;

call missing(of _all_);
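
Placed in the original step, those two statements might look like the sketch below (again assuming A and E as created in the question). The explicit OUTPUT writes the observation first; CALL MISSING then clears every variable, so nothing carries over into the next iteration, and the next SET read refills ID and ORIGDT.

```sas
data temp;
  set a(obs=0) e;
  if not missing(origdt) then do;
    date_num  = origdt;
    date_char = put(origdt, date9.);
  end;
  /* Write the observation explicitly, then wipe all variables so that
     retained values cannot leak into the next observation */
  output;
  call missing(of _all_);
run;
```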

ballardw
Super User

How about showing what the desired result for the example data should look like? Again, as a data step.
