There are probably many ways to go about fixing my issue, and I have tried all that I know and still no luck. I have many variables and observations, but let's just focus on four of them. Subject, Projectedid, Actual, Actualid. I have a single observation for each subject and I need each variable to be populated, but some subjects are missing actual and actualin. If a subject is missing actual and actual in then I need it to be populated with what was projected. This would be an example of my dataset (obviously this is not the actual dataset).
Subject Projectedid Actual Actualid
001 1 summer sum
002 2 winter win
003 1 winter win
004 1 . .
I need my final dataset to look like;
Subject Projectedid Actual Actualid
001 1 summer sum
002 2 winter win
003 1 winter win
004 1 summer sum
Note: If the variabels are not missing, then sometimes the projected and actual do not match and that is how it is suppose to be.
So, I thought I could use a data step and populate the missing variables by using an if then do;
data favorite_seasons;
set seasons;
if actual='.' then do;
if projectid='1' then actual='summer';
else if projectid='2' then actual='winter'
end;
if actualid='.' then do;
if projectedid='1' then actualid='sum';
if projectedid='2' then actualid='win';
end;
run;
I thought this would work, but it is not populating correctly. Also, I need to kept my variables names that same so I am not sure if I need to do that in proc sql or not. Any help?
Hi,
Some points are not clear in your question. I think you made some typos in the variables names (Actualcd, Actualin, Actualid, ...).
Can you edit your post in order to harmonize the column names or, if they refer to different variables, explain the use of the different variables.
Also, in your example there are two observations with Projectedcd equal to 1. How do we determine which should be used to populate the missing observation.
Are Actual and Actualcd only alternative reprersentations of the variable Projectedcd ?
If so, you can use formats rather rthan creating new columns in your dataset.
Maybe your problem is with the tests :
if XXX='.' then do;
If the column contains blanks, the test will fail.
Try
if strip(XXX)='.' then do;
to remove leading and trailing blanks.
You are comparating strings.
' . ' is not equal to '.'
Depending on how the Actual column was generated, you may have some blanks before or after
the dot for the missing observations.
Also, SAS convention for an empty string is a single space " ", which is tested true against "" and " ".
Projectid (1) = summer
(2) = winter
let's say that projectid means that the subjects are projected to like (1) summer or (2) winter, and actualid and actual are what they 'actually' like, and if actual and actualid are missing then I need it to be populated with what they are projected to like.
Hi,
Sorry, your post is a bit mixed up there with the various differences. When posting a question it is very important to post test data in the form of a datastep - this boths avoids us having to type things in, but also shows the structure of the data - i.e. is projectid character or num, from your logic it appears to be character. Anyways I have typed all that in and show working code - assuming your data matches that structure:
data have; input Subject $ Projectid Actual $ Actualid $; if strip(actual)="" then actual="."; if strip(actualid)="" then actualid="."; datalines; 001 1 summer sum 002 2 winter win 003 1 winter win 004 1 . . ; run; data favorite_seasons; set have; if actual='.' then actual=ifc(projectid=1,'summer','winter'); if actualid='.' then actualid=substr(actual,1,3); run;
data favorite_seasons;
set seasons;
if ACTUAL="" then ACTUAL=ifc(PROJECTID='1','Summer', 'Winter');
else if ACTUAL="" then ACTUAL=ifc(PROJECTID='2','Winter','Summer');
if ACTUALID="" then ACTUALID=ifc(PROJECTID='1', 'Sum', 'Win');
else if ACTUALID="" then ACTUALID=ifc(PROJECTID='2', 'Win', 'Sum');
run;
This is the code that I ended up using, I really liked the 'IFC' procedure, but I did not understand what the substr procedure was doing.
The substr procedure was used to extract the first three letters of the season name.
Actualid seems indeed to be redundant with the Actual variable. If your dataset is large,
this can lead to store much unnecessary information.
In order to gain space, it may be better to work with your season codification and use formats
to get names whenever needed. Here is an example using character codes '1', '2' for Summer and Winter.
proc format;
invalue $season 'Summer'='1'
'Winter'='2'
other=.;
value $seas_name '1'="Summer"
'2'="Winter";
value $seas_short '1'="Sum"
'2'="Win";
run;
data have;
infile datalines dlm=',';
format Subject $3. Projectid Actual $1.;
informat Actual $season.;
input Subject Projectid Actual;
/* Summer and Winter will be replaced by their codes in the resulting dataset */
datalines;
001,1,Summer
002,2,Winter
003,1,Winter
004,1,.
;
run;
data want;
set have;
if Actual=. then Actual=Projectid;
run;
proc print data=want;
format Projectid Actual $seas_name.;
run;
Why it is 004 1 summer sum not 004 1 winter win
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.