Solved: LAG Function Combined with conditional functions

Kevin_T · Posted 11-17-2020 09:11 AM

Hello,
I have been facing a problem for the past few hours, resulting in a result that I suppose is quite easy to solve.
I will quickly describe the data to you, and then I will tell you what I would like to get.

Here is what I currently have:
The "name" column simply contains two values, "A" or "B".

The second column, "ID" contains various identifiers, including "A66571", which is the most important ID in this issue.

The other columns ("AA", "BB" ...) are columns with either "YES" or "NO".

What I would like is that for each different value of "name", and when "ID" = "A66571", if a column
("AA" and / or "BB" ...) = "YES", then we keep "YES"; if "ID" = "A66571" and column = "NO", then "." for all this column, including for "ID" = "A66571"
(for each value different from "name" obviously).

But if "ID" is different from "A66571", and if a column ("AA" for example) = "YES", then we keep "YES",
if the column = "NO", then we keep "NO" (and therefore keep the "." as specified in the previous step).

Here is what I would like to get:

I hope you can help me, I've been stuck for several hours now.

Thank you.

Kevin_T

AMSAS · Posted 11-17-2020 03:38 PM

Here's an example, I've added comments so hopefully it will make sense

/* Build sample HAVE data */

%let seed=18 ;

data have ;
	keep name id aa bb cc dd ee ;
	drop i k ;
	names="AB" ;
	yesno="YN" ;
	array yn(5) $ aa bb cc dd ee ;
	do i = 1 to 2 ;
		do id=1 to 8 ;
			do k=1 to 5 ; 
				yn{k}=substr(yesno,int(ranuni(&seed)*2)+1,1) ;
			end ;
			name=substr(names, i,1) ; 
			output ;
		end ;
	end ;
run ;


data want ;
	/* Array to hold the values */
	array yn{5} $ aa bb cc dd ee  ;
	/* simple flag string the first 0 refers to aa, second to bb, etc... */
	/* if flag is Y then we want to keep those values */
	/* if flag is N then we want to set those to missing */
	retain
		keepVar "NNNNN" ;
	set have ;
	by name ;
	/* If it's the first occurance of a name */
	if first.name then do ;
		/* Look at the 5 variables aa, bb, cc, dd, ee                */
		/* set the keepVar flags to only maintain those thare ware Y */
		/* You might need to add something here for your ID=A66571   */
		do i=1 to 5 ;
			substr(keepVar,i,1)=yn{i} ;
		end ;
		put name= keepVar= ;
	end ;
	/* Now that keepVar is defined we can loop through the variables */
	/* setting any where the first record was N to missing */
	do i=1 to 5 ;
		if substr(keepVar,i,1)="N" then 
			yn{i}="." ;
	end ;

run ;

View solution in original post

Kevin_T · Posted 11-17-2020 06:08 AM

Hello,

I'm trying to find a way to replace the values of some rows, based on a particular row, itself based on the "name" column.

I will explain what I would like to achieve (I have tried many possibilities already, but never have the perfect result).

In the "name" column, there are two names, A and B.
The "ID" column contains "ID", the important identifier being A66571, so I already put it in the first line, by "name" column.

There are then different columns ("AA", "BB"...).

What I want is that if for ID = "A66571", I subsequently have columns = "NO", then I want to enter "." in these same columns, but for EACH ID (including A66571), for this "name" column.

But if for the ID A66571, I subsequently have columns = "YES", then the other rows, for these same columns do not change.

Here's what I currently have,

and next what I would like to get :

What I never manage to get is, for example, when ID = A66571, and the columns "AA" and "BB" = "YES", for me, when ID = "A08176", the columns "AA" and "BB" are always = "." .

I hope you will help me solve this problem, I've been blocking it for quite a few hours (using Arrays, Retain, Lags...).

Thank you so much.

Kevin_T

ChrisNZ · Posted 11-17-2020 06:27 PM

You don't provide usable test data, so I'l just provide a snippet.

I have not understood how you use ID.

Something like this should work:

array VALUES [8] $ AA--HH;

array RETAIN [8] _temporary_;

do I=1 to 8;

if VALUES[I]='YES' then KEEP[I]=1;

if ^KEEP[I] then VALUES[I]='.';

end;

High-Performance SAS Coding - Third Edition

Kevin_T · Posted 11-18-2020 03:53 AM

Hello @ChrisNZ,

Yes, your snippet also works, I just modified some parts of the code to match my personal code.

@ChrisNZ wrote:

I have not understood how you use ID.

Well, If the ID=A66571, for each new value of the column "name", and if a column (AA for example) is equal to "NO", so all the AA column for ths particular "name" value is set to missing, and so on for BB...

So, A66571 is my "Master ID", and the others just follow Master ID's rules.

Thanks for contributing.

Regards,

Kevin_T

ChrisNZ · Posted 11-18-2020 05:28 PM

If my snippet works, I hope you use that rather than the chosen solution.

Simpler code is easier to read and to maintain.

High-Performance SAS Coding - Third Edition

andreas_lds · Posted 11-18-2020 12:55 AM

@Kevin_T wrote:

Hello,

I'm trying to find a way to replace the values of some rows, based on a particular row, itself based on the "name" column.

I will explain what I would like to achieve (I have tried many possibilities already, but never have the perfect result).

In the "name" column, there are two names, A and B.
The "ID" column contains "ID", the important identifier being A66571, so I already put it in the first line, by "name" column.

There are then different columns ("AA", "BB"...).

What I want is that if for ID = "A66571", I subsequently have columns = "NO", then I want to enter "." in these same columns, but for EACH ID (including A66571), for this "name" column.

But if for the ID A66571, I subsequently have columns = "YES", then the other rows, for these same columns do not change.

Here's what I currently have,

[...]

and next what I would like to get :

[...]

What I never manage to get is, for example, when ID = A66571, and the columns "AA" and "BB" = "YES", for me, when ID = "A08176", the columns "AA" and "BB" are always = "." .

I hope you will help me solve this problem, I've been blocking it for quite a few hours (using Arrays, Retain, Lags...).

Thank you so much.

Kevin_T

Please don't double post the same question. I have merged your second post with the original one.

Kevin_T · Posted 11-18-2020 03:57 AM

Hello @andreas_lds,

@andreas_lds wrote:
Please don't double post the same question. I have merged your second post with the original one.

Thank you for merging the two topics.

I'm sorry, this was my first post on this site, and when I created the first topic, I realized 2 hours later that it was gone, so I created a new one, not knowing that the first was considered spam...

Regards,

Kevin_T

PaigeMiller · Posted 11-17-2020 09:41 AM

See if the answer from @Kurt_Bremser here works for you

https://communities.sas.com/t5/SAS-Programming/LAG-Not-Copying-Data-Lag-Line/td-p/699227

--
Paige Miller

Kevin_T · Posted 11-17-2020 10:02 AM

Hello,

I've already seen this topic and the answer, which didn't help me.

AMSAS · Posted 11-17-2020 03:38 PM

Here's an example, I've added comments so hopefully it will make sense

/* Build sample HAVE data */

%let seed=18 ;

data have ;
	keep name id aa bb cc dd ee ;
	drop i k ;
	names="AB" ;
	yesno="YN" ;
	array yn(5) $ aa bb cc dd ee ;
	do i = 1 to 2 ;
		do id=1 to 8 ;
			do k=1 to 5 ; 
				yn{k}=substr(yesno,int(ranuni(&seed)*2)+1,1) ;
			end ;
			name=substr(names, i,1) ; 
			output ;
		end ;
	end ;
run ;


data want ;
	/* Array to hold the values */
	array yn{5} $ aa bb cc dd ee  ;
	/* simple flag string the first 0 refers to aa, second to bb, etc... */
	/* if flag is Y then we want to keep those values */
	/* if flag is N then we want to set those to missing */
	retain
		keepVar "NNNNN" ;
	set have ;
	by name ;
	/* If it's the first occurance of a name */
	if first.name then do ;
		/* Look at the 5 variables aa, bb, cc, dd, ee                */
		/* set the keepVar flags to only maintain those thare ware Y */
		/* You might need to add something here for your ID=A66571   */
		do i=1 to 5 ;
			substr(keepVar,i,1)=yn{i} ;
		end ;
		put name= keepVar= ;
	end ;
	/* Now that keepVar is defined we can loop through the variables */
	/* setting any where the first record was N to missing */
	do i=1 to 5 ;
		if substr(keepVar,i,1)="N" then 
			yn{i}="." ;
	end ;

run ;

Kevin_T · Posted 11-18-2020 03:17 AM

Hello @AMSAS, your answer meets my expectations exactly!

I was not far from this result, I had not thought of storing the values found in a new variable (here keepVar), indeed, instead of Y or N, I had YES or NO, so I had to adapt the base table with the TRANWRD function, then following your program, repeat the reverse step with TRANWRD.

Thank you very much for your contribution, it helped me a lot!

Regards,

Kevin_T

Tom · Posted 11-17-2020 04:40 PM

You don't want to LAG() anything. Sounds like you need to RETAIN the values from the first observation in each group so you can use them. This is assuming that the specific value of "A66571" is NOT important. Instead it is just the fact that it is the first observation in the group.

Since you don't seem to want to output the retained values you can actually just use a TEMPORARY array. Values in temporary arrays are always retained across observations. That way you don't need to invent new variable names or issue RETAIN statement for them.

data want;
  set have;
  by name;
  array current AA -- HH ;
  array first [100] $3 _temporary_;
  if first.name then do index=1 to dim(current);
    first[index] = current[index];
  end;
  do index=1 to dim(current);
    if 'NO' = first[index] then current[index]=' ' ;
  end;
  drop index;
run;

I just used 100 for the dimension of the temporary array so that it is larger than any dimension you might have for the original list of variables. Also the positional variable list is assuming that AA to HH are adjacent in the input table. Also assumed that they are character strings with maximum of 3 bytes long. Also changed it to set the values to blanks, not sure why you would want to store a period into a character variable.

Kevin_T · Posted 11-18-2020 03:45 AM

Hello @Tom,

I thought I could use the LAG function but it's tricky to use with this kind of problem. I've tried TEMPORARY arrays before posting here, but it didn't work.

And I also considered using a BLANK instead of a period.

I see your solution works too, so thanks for your contribution.

Regards,

Kevin_T

LAG Function Combined with conditional functions

Re: LAG Function Combined with conditional functions

LAG Function Combined with Conditional Functions

Re: LAG Function Combined with Conditional Functions

Re: LAG Function Combined with Conditional Functions

Re: LAG Function Combined with Conditional Functions

Re: LAG Function Combined with Conditional Functions

Re: LAG Function Combined with Conditional Functions

Re: LAG Function Combined with conditional functions

Re: LAG Function Combined with conditional functions

Re: LAG Function Combined with conditional functions

Re: LAG Function Combined with conditional functions

Re: LAG Function Combined with conditional functions

Re: LAG Function Combined with conditional functions

Registration is open

SAS Training: Just a Click Away