BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kevin_T
Fluorite | Level 6

Hello,
I have been facing a problem for the past few hours, resulting in a result that I suppose is quite easy to solve.
I will quickly describe the data to you, and then I will tell you what I would like to get.

Here is what I currently have: Have.png
The "name" column simply contains two values, "A" or "B".

The second column, "ID" contains various identifiers, including "A66571", which is the most important ID in this issue.

The other columns ("AA", "BB" ...) are columns with either "YES" or "NO".

 

What I would like is that for each different value of "name", and when "ID" = "A66571", if a column
("AA" and / or "BB" ...) = "YES", then we keep "YES"; if "ID" = "A66571" and column = "NO", then "." for all this column, including for "ID" = "A66571"
(for each value different from "name" obviously).

 

But if "ID" is different from "A66571", and if a column ("AA" for example) = "YES", then we keep "YES",
if the column = "NO", then we keep "NO" (and therefore keep the "." as specified in the previous step).

 

Here is what I would like to get: Want.png

I hope you can help me, I've been stuck for several hours now.

Thank you.

Kevin_T

 

1 ACCEPTED SOLUTION

Accepted Solutions
AMSAS
SAS Super FREQ

Here's an example, I've added comments so hopefully it will make sense

 

/* Build sample HAVE data */

%let seed=18 ;

data have ;
	keep name id aa bb cc dd ee ;
	drop i k ;
	names="AB" ;
	yesno="YN" ;
	array yn(5) $ aa bb cc dd ee ;
	do i = 1 to 2 ;
		do id=1 to 8 ;
			do k=1 to 5 ; 
				yn{k}=substr(yesno,int(ranuni(&seed)*2)+1,1) ;
			end ;
			name=substr(names, i,1) ; 
			output ;
		end ;
	end ;
run ;


data want ;
	/* Array to hold the values */
	array yn{5} $ aa bb cc dd ee  ;
	/* simple flag string the first 0 refers to aa, second to bb, etc... */
	/* if flag is Y then we want to keep those values */
	/* if flag is N then we want to set those to missing */
	retain
		keepVar "NNNNN" ;
	set have ;
	by name ;
	/* If it's the first occurance of a name */
	if first.name then do ;
		/* Look at the 5 variables aa, bb, cc, dd, ee                */
		/* set the keepVar flags to only maintain those thare ware Y */
		/* You might need to add something here for your ID=A66571   */
		do i=1 to 5 ;
			substr(keepVar,i,1)=yn{i} ;
		end ;
		put name= keepVar= ;
	end ;
	/* Now that keepVar is defined we can loop through the variables */
	/* setting any where the first record was N to missing */
	do i=1 to 5 ;
		if substr(keepVar,i,1)="N" then 
			yn{i}="." ;
	end ;

run ;

View solution in original post

12 REPLIES 12
Kevin_T
Fluorite | Level 6

Hello,

 

I'm trying to find a way to replace the values ​​of some rows, based on a particular row, itself based on the "name" column.

I will explain what I would like to achieve (I have tried many possibilities already, but never have the perfect result).

 

In the "name" column, there are two names, A and B.
The "ID" column contains "ID", the important identifier being A66571, so I already put it in the first line, by "name" column.

There are then different columns ("AA", "BB"...).

 

What I want is that if for ID = "A66571", I subsequently have columns = "NO", then I want to enter "." in these same columns, but for EACH ID (including A66571), for this "name" column.

But if for the ID A66571, I subsequently have columns = "YES", then the other rows, for these same columns do not change.

 

Here's what I currently have,

Have.png

 

 

 and next what I would like to get :

Want.png

What I never manage to get is, for example, when ID = A66571, and the columns "AA" and "BB" = "YES", for me, when ID = "A08176", the columns "AA" and "BB" are always = "." .


I hope you will help me solve this problem, I've been blocking it for quite a few hours (using Arrays, Retain, Lags...).

Thank you so much.

 

Kevin_T

ChrisNZ
Tourmaline | Level 20

You don't provide usable test data, so I'l just provide a snippet.

I have not understood how you use ID.

Something like this should work:

array VALUES [8] $ AA--HH;

array RETAIN [8] _temporary_;

do I=1 to 8;

  if VALUES[I]='YES' then KEEP[I]=1;

  if ^KEEP[I] then VALUES[I]='.';

end;

 

Kevin_T
Fluorite | Level 6

Hello @ChrisNZ,

 

Yes, your snippet also works, I just modified some parts of the code to match my personal code.

 


@ChrisNZ wrote:

 

I have not understood how you use ID.

 


Well, If the ID=A66571, for each new value of the column "name", and if a column (AA for example) is equal to "NO", so all the AA column for ths particular "name" value is set to missing, and so on for BB...

So, A66571 is my "Master ID", and the others just follow Master ID's rules.

 

Thanks for contributing.

 

Regards,

Kevin_T

ChrisNZ
Tourmaline | Level 20

If my snippet works, I hope you use that rather than the chosen solution.

Simpler code is easier to read and to maintain. 

andreas_lds
Jade | Level 19

@Kevin_T wrote:

Hello,

 

I'm trying to find a way to replace the values ​​of some rows, based on a particular row, itself based on the "name" column.

I will explain what I would like to achieve (I have tried many possibilities already, but never have the perfect result).

 

In the "name" column, there are two names, A and B.
The "ID" column contains "ID", the important identifier being A66571, so I already put it in the first line, by "name" column.

There are then different columns ("AA", "BB"...).

 

What I want is that if for ID = "A66571", I subsequently have columns = "NO", then I want to enter "." in these same columns, but for EACH ID (including A66571), for this "name" column.

But if for the ID A66571, I subsequently have columns = "YES", then the other rows, for these same columns do not change.

 

Here's what I currently have,

[...]

 

 

 and next what I would like to get :

[...]

What I never manage to get is, for example, when ID = A66571, and the columns "AA" and "BB" = "YES", for me, when ID = "A08176", the columns "AA" and "BB" are always = "." .


I hope you will help me solve this problem, I've been blocking it for quite a few hours (using Arrays, Retain, Lags...).

Thank you so much.

 

Kevin_T


Please don't double post the same question. I have merged your second post with the original one.

Kevin_T
Fluorite | Level 6

Hello @andreas_lds,

 


@andreas_lds wrote:


Please don't double post the same question. I have merged your second post with the original one.


Thank you for merging the two topics.

I'm sorry, this was my first post on this site, and when I created the first topic, I realized 2 hours later that it was gone, so I created a new one, not knowing that the first was considered spam...

 

Regards,

Kevin_T

Kevin_T
Fluorite | Level 6

Hello,

I've already seen this topic and the answer, which didn't help me.

AMSAS
SAS Super FREQ

Here's an example, I've added comments so hopefully it will make sense

 

/* Build sample HAVE data */

%let seed=18 ;

data have ;
	keep name id aa bb cc dd ee ;
	drop i k ;
	names="AB" ;
	yesno="YN" ;
	array yn(5) $ aa bb cc dd ee ;
	do i = 1 to 2 ;
		do id=1 to 8 ;
			do k=1 to 5 ; 
				yn{k}=substr(yesno,int(ranuni(&seed)*2)+1,1) ;
			end ;
			name=substr(names, i,1) ; 
			output ;
		end ;
	end ;
run ;


data want ;
	/* Array to hold the values */
	array yn{5} $ aa bb cc dd ee  ;
	/* simple flag string the first 0 refers to aa, second to bb, etc... */
	/* if flag is Y then we want to keep those values */
	/* if flag is N then we want to set those to missing */
	retain
		keepVar "NNNNN" ;
	set have ;
	by name ;
	/* If it's the first occurance of a name */
	if first.name then do ;
		/* Look at the 5 variables aa, bb, cc, dd, ee                */
		/* set the keepVar flags to only maintain those thare ware Y */
		/* You might need to add something here for your ID=A66571   */
		do i=1 to 5 ;
			substr(keepVar,i,1)=yn{i} ;
		end ;
		put name= keepVar= ;
	end ;
	/* Now that keepVar is defined we can loop through the variables */
	/* setting any where the first record was N to missing */
	do i=1 to 5 ;
		if substr(keepVar,i,1)="N" then 
			yn{i}="." ;
	end ;

run ;
Kevin_T
Fluorite | Level 6

Hello @AMSAS, your answer meets my expectations exactly!


I was not far from this result, I had not thought of storing the values ​​found in a new variable (here keepVar), indeed, instead of Y or N, I had YES or NO, so I had to adapt the base table with the TRANWRD function, then following your program, repeat the reverse step with TRANWRD.

 

Thank you very much for your contribution, it helped me a lot!

 

Regards,

Kevin_T

Tom
Super User Tom
Super User

You don't want to LAG() anything. Sounds like you need to RETAIN the values from the first observation in each group so you can use them.  This is assuming that the specific value of "A66571" is NOT important. Instead it is just the fact that it is the first observation in the group.

 

Since you don't seem to want to output the retained values you can actually just use a TEMPORARY array.  Values in temporary arrays are always retained across observations.  That way you don't need to invent new variable names or issue RETAIN statement for them.

data want;
  set have;
  by name;
  array current AA -- HH ;
  array first [100] $3 _temporary_;
  if first.name then do index=1 to dim(current);
    first[index] = current[index];
  end;
  do index=1 to dim(current);
    if 'NO' = first[index] then current[index]=' ' ;
  end;
  drop index;
run;

I just used 100 for the dimension of the temporary array so that it is larger than any dimension you might have for the original list of variables.  Also the positional variable list is assuming that AA to HH are adjacent in the input table.  Also assumed that they are character strings with maximum of 3 bytes long.  Also changed it to set the values to blanks, not sure why you would want to store a period into a character variable.

Kevin_T
Fluorite | Level 6

Hello @Tom,

 

I thought I could use the LAG function but it's tricky to use with this kind of problem. I've tried TEMPORARY arrays before posting here, but it didn't work.

And I also considered using a BLANK instead of a period.

 

I see your solution works too, so thanks for your contribution.

 

Regards,

Kevin_T

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2255 views
  • 3 likes
  • 6 in conversation