Solved: Creating a new variable based on combination of values in another vari...

Mscarboncopy · Posted 09-05-2023 04:02 PM

I need to create a new var (newvarA) with values starting on 8 from a variable A (1-7). The new values would be based on combinations of values in the first var (A). Any value that is not in the combination must be retained. 8 for example would be given when 1 and 3 are coded. 9 for 1 and 5. I might have one more combination to make.

I tried this (below) and it is not working but I know this is too simplistic and I tried to find resources and could not. My output is all missing for newvarA. And I also do not know how to retain the single values. The missing values (always after the last value for each ID) also need to be retained, as the data file has other variables that create those missing values in A.

I would appreciate any help. I use sas 9.4

Thanks,

Data want;
set have;
if A ne . then do;
if (A = 1 and A =3) then newvarA=8;

if (A = 1 and A =5) then newvarA=9;
end;
run;

DATA  test;
 INPUT ID A ;
CARDS;
1 1
1 3   
1 5
1 .
1 .
2 1
2 5
3 1
3 .
3 .
3 .
4 1
4 3
4 . 
5 1
5 2
5 5
5 . 
;
RUN;

desired output

ID	A	newvar A
1	1	8
1	3	5
1	5	.
1	.	.
1	.
2	3	3
2	6	6
3	1	1
3		.
3		.
3		.
4	1	8
4	3	.
4	.	.
5	1	9
5	2	2
5	5	.
5	.	.

ballardw · Posted 09-05-2023 08:38 PM

@Mscarboncopy wrote:

I am sorry. I changed the second obs at the last minute, but did not change it in the input code.

It was supposed to be 3 and 6 in the input also, to show that I might have a 3 but not have a 1.

These values are not tied to anything and can be random. I would never have the same value repeating for any given Id.

They are "sources" of documents used to code the data (i.e. the other variables in the file). A record (id) can have 1 source or more (up to 7).

The reason to create new "values" is because a few Ids have only one Obs but more than one source and in those cases, the second and third source were dropped.

The data file is created by merging a few different files.

Would my only option be formatting the variable A to be wide and then use an array to assign the combinations?

Ideally, I would end up with a long data set, as all the other variables are in that format.

Thank you for your help @ballardw

I might suggest that a more robust approach would be to go back to the step where things were dropped.

I'm not quite sure what your "created by merging" means in terms of timing when the values you want were added but perhaps showing what that step looked like would be appropriate. In many cases I consider "prevention" a better approach than "fixing". Removing data that is actually needed later is logic problem that if addressed sooner is likely to pay off in more than one way in the long run.

Perhaps it would be better to have an observation with multiple source variables or source indicator variables.

Consider: I have data that has a number of indicator variables that take a value of 1 indicating that source is used and 0 when not used for a specific observation.

I can then do things like:

Select all observations where source number 1 was used (or not used): Where source1=1; for example

Select all observations where 2, 3 , ...., n (n being the number of sources) specif sources were used.

Where source1=1 and Source3=1; (or shorter: where sum(source1,source3)=2;

Or all observations where 3 (or 5 or whatever) number of sources (or more or fewer) but don't care which specific sources were used: Where sum( of source:) = 3;

Having separate variables would also let you to report based on source.

View solution in original post

ballardw · Posted 09-05-2023 04:42 PM

One variable can NEVER have two values for a single observation. So "A=1 and A=3" is always false. Which why you get missing values.

So to apply that sort of logic to values on separate rows you would need to either reshape your data so all the A values are on one observations before creating the "new" variable.

You show values of 2, 3, 5 and 6 in NewvarA. What is the rule for creating them??? I have to assume that the Newvar A= 5 on the second observation was supposed to be 9. Is that correct? When you have Newvar = 8 or 9 is there any specific rule(s) for which observation they will be matched to?

If there are two 1's and one 3 should there be two 8's in the newvar? How about if two 1's and two 3's?

You show different values of A in the output than appear in the input. Is that intentional?

@Mscarboncopy wrote:

I need to create a new var (newvarA) with values starting on 8 from a variable A (1-7). The new values would be based on combinations of values in the first var (A). Any value that is not in the combination must be retained. 8 for example would be given when 1 and 3 are coded. 9 for 1 and 5. I might have one more combination to make.

I tried this (below) and it is not working but I know this is too simplistic and I tried to find resources and could not. My output is all missing for newvarA. And I also do not know how to retain the single values. The missing values (always after the last value for each ID) also need to be retained, as the data file has other variables that create those missing values in A.

I would appreciate any help. I use sas 9.4

Thanks,

Data want;
set have;
if A ne . then do;
if (A = 1 and A =3) then newvarA=8;

if (A = 1 and A =5) then newvarA=9;
end;
run;
DATA  test;
 INPUT ID A ;
CARDS;
1 1
1 3   
1 5
1 .
1 .
2 1
2 5
3 1
3 .
3 .
3 .
4 1
4 3
4 . 
5 1
5 2
5 5
5 . 
;
RUN; 
desired output

ID A newvar A

1 1 8

1 3 5

1 5 .

1 . .

1 .

2 3 3

2 6 6

3 1 1

3 .

3 .

3 .

4 1 8

4 3 .

4 . .

5 1 9

5 2 2

5 5 .

5 . .

Mscarboncopy · Posted 09-05-2023 07:27 PM

I am sorry. I changed the second obs at the last minute, but did not change it in the input code.

It was supposed to be 3 and 6 in the input also, to show that I might have a 3 but not have a 1.

These values are not tied to anything and can be random. I would never have the same value repeating for any given Id.

They are "sources" of documents used to code the data (i.e. the other variables in the file). A record (id) can have 1 source or more (up to 7).

The reason to create new "values" is because a few Ids have only one Obs but more than one source and in those cases, the second and third source were dropped.

The data file is created by merging a few different files.

Would my only option be formatting the variable A to be wide and then use an array to assign the combinations?

Ideally, I would end up with a long data set, as all the other variables are in that format.

Thank you for your help @ballardw

ballardw · Posted 09-05-2023 08:38 PM

@Mscarboncopy wrote:

I am sorry. I changed the second obs at the last minute, but did not change it in the input code.

It was supposed to be 3 and 6 in the input also, to show that I might have a 3 but not have a 1.

These values are not tied to anything and can be random. I would never have the same value repeating for any given Id.

They are "sources" of documents used to code the data (i.e. the other variables in the file). A record (id) can have 1 source or more (up to 7).

The reason to create new "values" is because a few Ids have only one Obs but more than one source and in those cases, the second and third source were dropped.

The data file is created by merging a few different files.

Would my only option be formatting the variable A to be wide and then use an array to assign the combinations?

Ideally, I would end up with a long data set, as all the other variables are in that format.

Thank you for your help @ballardw

I might suggest that a more robust approach would be to go back to the step where things were dropped.

I'm not quite sure what your "created by merging" means in terms of timing when the values you want were added but perhaps showing what that step looked like would be appropriate. In many cases I consider "prevention" a better approach than "fixing". Removing data that is actually needed later is logic problem that if addressed sooner is likely to pay off in more than one way in the long run.

Perhaps it would be better to have an observation with multiple source variables or source indicator variables.

Consider: I have data that has a number of indicator variables that take a value of 1 indicating that source is used and 0 when not used for a specific observation.

I can then do things like:

Select all observations where source number 1 was used (or not used): Where source1=1; for example

Select all observations where 2, 3 , ...., n (n being the number of sources) specif sources were used.

Where source1=1 and Source3=1; (or shorter: where sum(source1,source3)=2;

Or all observations where 3 (or 5 or whatever) number of sources (or more or fewer) but don't care which specific sources were used: Where sum( of source:) = 3;

Having separate variables would also let you to report based on source.

Mscarboncopy · Posted 09-06-2023 09:44 AM

Absolutely. Thank you. I went back and found the issue in the code where I was dropping Source.

Also, thank you for the idea you gave me of how to use source. I really like that.

Best,

Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Catch up on SAS Innovate 2026

Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Re: Creating a new variable based on combination of values in another variable

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away