BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Mscarboncopy
Pyrite | Level 9

I need to create a new var (newvarA) with values starting on 8 from a variable A (1-7).  The new values  would be based on combinations of values in the first var (A). Any value that is not in the combination must be retained. 8 for example would be given when 1 and 3 are coded.  9 for 1 and 5. I might have one more combination to make.

I tried this (below) and it is not working but I know this is too simplistic and I tried to find resources and could not.  My output is all missing for newvarA. And I also do not know how to retain the single values. The missing values (always after the last value for each ID) also need to be retained, as the data file has other variables that create those missing values in A.

I would appreciate any help.  I use sas 9.4

Thanks,

 

Data want;
set  have;
if A ne . then do;
if (A = 1 and A =3) then newvarA=8;

if (A = 1 and A =5) then newvarA=9;
end;
run;

DATA  test;
 INPUT ID A ;
CARDS;
1 1
1 3   
1 5
1 .
1 .
2 1
2 5
3 1
3 .
3 .
3 . 4 1 4 3
4 .
5 1
5 2
5 5
5 . ; RUN;

desired output

IDAnewvar A
118
135
15.
1..
1. 
233
266
311
3 .
3 .
3 .
418
43.
4..
519
522
55.
5..
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Mscarboncopy wrote:

I am sorry. I changed the second obs at the last minute, but did not change it in the input code.

It was supposed to be 3 and 6 in the input also, to show that I might have a 3 but not have a 1.

These values are not tied to anything and can be random. I would never have the same value repeating for any given  Id. 

They are "sources" of documents used to code the data (i.e. the other variables in the file). A record (id) can have 1 source or more (up to 7). 

The reason to create new "values" is because a few Ids have only one Obs but more than one source and in those cases, the second and third source were dropped.

The data file is created by merging a few different files. 

Would my only option  be formatting the variable A to be wide and then use an array to assign the combinations?

Ideally, I would end up with a long  data set, as all the other variables are in that format.

Thank you for your help @ballardw 

 


I might suggest that a more robust approach would be to go back to the step where things were dropped.

I'm not quite sure what your "created by merging" means in terms of timing when the values you want were added but perhaps showing what that step looked like would be appropriate. In many cases I consider "prevention" a better approach than "fixing". Removing data that is actually needed later is logic problem that if addressed sooner is likely to pay off in more than one way in the long run.

 

Perhaps it would be better to have an observation with multiple source variables or source indicator variables.

 

Consider: I have data that has a number of indicator variables that take a value of 1 indicating that source is used and 0 when not used for a specific observation.

I can then do things like:

Select all observations where source number 1 was used (or not used): Where source1=1; for example

Select all observations where 2, 3 , ...., n (n being the number of sources) specif sources were used.

   Where source1=1 and Source3=1;  (or shorter: where sum(source1,source3)=2;

Or all observations where 3 (or 5 or whatever) number of sources (or more or fewer) but don't care which specific sources were used:  Where sum( of source:) = 3;

 

Having separate variables would also let you to report based on source.

View solution in original post

4 REPLIES 4
ballardw
Super User

One variable can NEVER have two values for a single observation. So "A=1 and A=3" is always false. Which why you get missing values.

 

So to apply that sort of logic to values on separate rows you would need to either reshape your data so all the A values are on one observations before creating the "new" variable.

 

You show  values of 2, 3, 5 and 6 in NewvarA. What is the rule for creating them??? I have to assume that the Newvar A= 5 on the second observation was supposed to be 9. Is that correct? When you have Newvar = 8 or 9 is there any specific rule(s) for which observation they will be matched to?

 

If there are two 1's and one 3 should there be two 8's in the newvar? How about if two 1's and two 3's?

 

You show different values of A in the output than appear in the input. Is that intentional?

 


@Mscarboncopy wrote:

I need to create a new var (newvarA) with values starting on 8 from a variable A (1-7).  The new values  would be based on combinations of values in the first var (A). Any value that is not in the combination must be retained. 8 for example would be given when 1 and 3 are coded.  9 for 1 and 5. I might have one more combination to make.

I tried this (below) and it is not working but I know this is too simplistic and I tried to find resources and could not.  My output is all missing for newvarA. And I also do not know how to retain the single values. The missing values (always after the last value for each ID) also need to be retained, as the data file has other variables that create those missing values in A.

I would appreciate any help.  I use sas 9.4

Thanks,

 

Data want;
set  have;
if A ne . then do;
if (A = 1 and A =3) then newvarA=8;

if (A = 1 and A =5) then newvarA=9;
end;
run;

DATA  test;
 INPUT ID A ;
CARDS;
1 1
1 3   
1 5
1 .
1 .
2 1
2 5
3 1
3 .
3 .
3 . 4 1 4 3
4 .
5 1
5 2
5 5
5 . ; RUN;

desired output

ID A newvar A
1 1 8
1 3 5
1 5 .
1 . .
1 .  
2 3 3
2 6 6
3 1 1
3   .
3   .
3   .
4 1 8
4 3 .
4 . .
5 1 9
5 2 2
5 5 .
5 . .

 

Mscarboncopy
Pyrite | Level 9

I am sorry. I changed the second obs at the last minute, but did not change it in the input code.

It was supposed to be 3 and 6 in the input also, to show that I might have a 3 but not have a 1.

These values are not tied to anything and can be random. I would never have the same value repeating for any given  Id. 

They are "sources" of documents used to code the data (i.e. the other variables in the file). A record (id) can have 1 source or more (up to 7). 

The reason to create new "values" is because a few Ids have only one Obs but more than one source and in those cases, the second and third source were dropped.

The data file is created by merging a few different files. 

Would my only option  be formatting the variable A to be wide and then use an array to assign the combinations?

Ideally, I would end up with a long  data set, as all the other variables are in that format.

Thank you for your help @ballardw 

 

ballardw
Super User

@Mscarboncopy wrote:

I am sorry. I changed the second obs at the last minute, but did not change it in the input code.

It was supposed to be 3 and 6 in the input also, to show that I might have a 3 but not have a 1.

These values are not tied to anything and can be random. I would never have the same value repeating for any given  Id. 

They are "sources" of documents used to code the data (i.e. the other variables in the file). A record (id) can have 1 source or more (up to 7). 

The reason to create new "values" is because a few Ids have only one Obs but more than one source and in those cases, the second and third source were dropped.

The data file is created by merging a few different files. 

Would my only option  be formatting the variable A to be wide and then use an array to assign the combinations?

Ideally, I would end up with a long  data set, as all the other variables are in that format.

Thank you for your help @ballardw 

 


I might suggest that a more robust approach would be to go back to the step where things were dropped.

I'm not quite sure what your "created by merging" means in terms of timing when the values you want were added but perhaps showing what that step looked like would be appropriate. In many cases I consider "prevention" a better approach than "fixing". Removing data that is actually needed later is logic problem that if addressed sooner is likely to pay off in more than one way in the long run.

 

Perhaps it would be better to have an observation with multiple source variables or source indicator variables.

 

Consider: I have data that has a number of indicator variables that take a value of 1 indicating that source is used and 0 when not used for a specific observation.

I can then do things like:

Select all observations where source number 1 was used (or not used): Where source1=1; for example

Select all observations where 2, 3 , ...., n (n being the number of sources) specif sources were used.

   Where source1=1 and Source3=1;  (or shorter: where sum(source1,source3)=2;

Or all observations where 3 (or 5 or whatever) number of sources (or more or fewer) but don't care which specific sources were used:  Where sum( of source:) = 3;

 

Having separate variables would also let you to report based on source.

Mscarboncopy
Pyrite | Level 9

Absolutely. Thank you. I went back and found the issue in the code where I was dropping  Source.

Also, thank you for the idea you gave me of how to use source.  I really like that. 

Best,

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1049 views
  • 2 likes
  • 2 in conversation