BookmarkSubscribeRSS Feed
trt3s
Calcite | Level 5

There are probably many ways to go about fixing my issue, and I have tried all that I know and still no luck. I have many variables and observations, but let's just focus on four of them. Subject, Projectedid, Actual, Actualid. I have a single observation for each subject and I need each variable to be populated, but some subjects are missing actual and actualin. If a subject is missing actual and actual in then I need it to be populated with what was projected. This would be an example of my dataset (obviously this is not the actual dataset).

 

Subject Projectedid Actual     Actualid

001       1                summer   sum

002       2                winter       win

003       1                winter       win

004       1                    .             .

 

I need my final dataset to look like;

 

Subject Projectedid Actual       Actualid

001       1                 summer   sum

002       2                 winter       win

003       1                 winter       win

004       1                 summer   sum

 

Note: If the variabels are not missing, then sometimes the projected and actual do not match and that is how it is suppose to be.

 

So, I thought I could use a data step and populate the missing variables by using an if then do;

 

 

data favorite_seasons;

        set seasons;

 

        if actual='.' then do;

        if projectid='1' then actual='summer';

        else if projectid='2' then actual='winter'

        end;

 

         if actualid='.' then do;

         if projectedid='1' then actualid='sum';

         if projectedid='2' then actualid='win';

         end;

run;

 

 

I thought this would work, but it is not populating correctly. Also, I need to kept my variables names that same so I am not sure if I need to do that in proc sql or not. Any help?

 

 

 

 

9 REPLIES 9
gamotte
Rhodochrosite | Level 12

Hi,

 

Some points are not clear in your question. I think you made some typos in the variables names (Actualcd, Actualin, Actualid, ...).

Can you edit your post in order to harmonize the column names or, if they refer to different variables, explain the use of the different variables.

Also, in your example there are two observations with Projectedcd equal to 1. How do we determine which should be used to populate the missing observation.

Are Actual and Actualcd only alternative reprersentations of the variable Projectedcd ?

If so, you can use formats rather rthan creating new columns in your dataset.

gamotte
Rhodochrosite | Level 12

Maybe your problem is with the tests :

 

if XXX='.' then do;

 If the column contains blanks, the test will fail.

 

Try

if strip(XXX)='.' then do;

 

to remove leading and trailing blanks.

trt3s
Calcite | Level 5
I am not understanding why I would need to use strip in this situation?
gamotte
Rhodochrosite | Level 12

You are comparating strings.

'    .     ' is not equal to '.'

Depending on how the Actual column was generated, you may have some blanks before or after

the dot for the missing observations.

Also, SAS convention for an empty string is a single space " ", which is tested true against "" and " ".

trt3s
Calcite | Level 5

Projectid (1) = summer

               (2) = winter

 

let's say that projectid means that the subjects are projected to like (1) summer or (2) winter, and actualid and actual are what they 'actually' like, and if actual and actualid are missing then I need it to be populated with what they are projected to like. 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Sorry, your post is a bit mixed up there with the various differences.  When posting a question it is very important to post test data in the form of a datastep - this boths avoids us having to type things in, but also shows the structure of the data - i.e. is projectid character or num, from your logic it appears to be character.  Anyways I have typed all that in and show working code - assuming your data matches that structure:

data have;
  input Subject $ Projectid Actual $ Actualid $;
  if strip(actual)="" then actual=".";
  if strip(actualid)="" then actualid=".";
datalines;
001       1                summer   sum
002       2                winter       win
003       1                winter       win
004       1                    .             .
;
run;

data favorite_seasons;
  set have;
  if actual='.' then actual=ifc(projectid=1,'summer','winter');
  if actualid='.' then actualid=substr(actual,1,3);
run;
trt3s
Calcite | Level 5

data favorite_seasons;
        set seasons;

if ACTUAL="" then ACTUAL=ifc(PROJECTID='1','Summer', 'Winter');
else if ACTUAL="" then ACTUAL=ifc(PROJECTID='2','Winter','Summer');

if ACTUALID="" then ACTUALID=ifc(PROJECTID='1', 'Sum', 'Win');
else if ACTUALID="" then ACTUALID=ifc(PROJECTID='2', 'Win', 'Sum');


run;

 

 

This is the code that I ended up using, I really liked the 'IFC' procedure, but I did not understand what the substr procedure was doing.

gamotte
Rhodochrosite | Level 12

The substr procedure was used to extract the first three letters of the season name.

Actualid seems indeed to be redundant with the Actual variable. If your dataset is large,

this can lead to store much unnecessary information.

In order to gain space, it may be better to work with your season codification and use formats

to get names whenever needed. Here is an example using character codes '1', '2' for Summer and Winter.

 

proc format;
   invalue $season 'Summer'='1'
                  'Winter'='2'
				  other=.;
   value $seas_name '1'="Summer"
                   '2'="Winter";
   value $seas_short '1'="Sum"
                    '2'="Win";
run;

data have;
  infile datalines dlm=',';

  format Subject $3. Projectid Actual $1.;
  informat Actual $season.;

  input Subject Projectid Actual;
/* Summer and Winter will be replaced by their codes in the resulting dataset */
datalines;
001,1,Summer
002,2,Winter
003,1,Winter
004,1,.
;
run;


data want;
    set have;
    if Actual=. then Actual=Projectid;
run;

proc print data=want;
format Projectid Actual $seas_name.;
run;
Ksharp
Super User
Why it is 
004       1                 summer   sum

not
004       1                 winter   win

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1124 views
  • 0 likes
  • 4 in conversation