Solved: Re: Extracting a infromation from a cell in row from a data set

George3 · Posted 10-01-2019 11:22 AM

Hello,

I am a revived SAS user, so I am a little "rusty" with some of my programming. I am trying to devise a program that will read each row in a column an compare the values of each row. The issue here is that the values in this column decrease until it reaches row 9. I need the increase between Row 8 and Row 9 to signal the system to stop, and extract Row 8. I found the program below that handles the extraction portion of the process, but I need help with the proper logic so it will do this with any sets of numbers. To help clarify, with the program below, the correct row that should be extracted is "Row 8". How do I get the system to recognize this increase, which I suspect is an if else statement, but I can't seem to wrap my mind about the proper way to word the statement to extract this row? Keep in mind, I need the flexibility that the values could be any sets of numbers. Below is a simple example to get me on my way, but it only handles the extraction rather than the logic to extract row 8. Any help is welcomed. Thank you

George

data have;
    infile datalines dsd dlm=",";
    input   Variable_names : $char10.
            Var1            : 8.
            Var2            : 8.
            Var3            : 8.
            Var4            : 8.
            Var5            : 8.
            Var6            : 8.;
    datalines;
Row 1 , 5     , 1, 2, 3, 4, 5
Row 2 , 5     , 1, 2, 3, 4, 5
Row 3 , 4     , 1, 2, 3, 4, 5
Row 4 , 4     , 1, 2, 3, 4, 5
Row 5 , 3   , 1, 2, 3, 4, 5
Row 6 , 3     , 1, 2, 3, 4, 5
Row 7 , 2     , 1, 2, 3, 4, 5
Row 8 , 2     , 1, 2, 3, 4, 5
Row 9 , 5     , 1, 2, 3, 4, 5
Row 10, 5     , 1, 2, 3, 4, 5
Row 11, 6     , 1, 2, 3, 4, 5
Row 12, 7     , 1, 2, 3, 4, 5
Row 13, 8     , 1, 2, 3, 4, 5
;
run;

data want_with;
set have;
if strip(Var1) = 2 then output want_with;
run;

FreelanceReinh · Posted 10-01-2019 11:52 AM

Hello @George3 and welcome to the SAS Support Communities!

To extract only the last observation before the first increase, you can use a look-ahead merge:

data want(drop=nextval);
merge have have(keep=var1 rename=(var1=nextval) firstobs=2);
if nextval>var1>. then do;
  output;
  stop;
end;
run;

View solution in original post

maguiremq · Posted 10-01-2019 11:52 AM

data have_2;
set have;
row = _N_;
run;

proc sort data=have_2;
by descending row;
run;

data want;
set have_2;
if strip(var1) < lag(strip(var1)) then flag = 1;
run;

The lag function is probably what you're after. This captures all value changes, so I'm not sure if it's exactly what you want, but it flags the 8th.

George3 · Posted 10-01-2019 12:40 PM

thank you for your help.

FreelanceReinh · Posted 10-01-2019 11:52 AM

Hello @George3 and welcome to the SAS Support Communities!

To extract only the last observation before the first increase, you can use a look-ahead merge:

data want(drop=nextval);
merge have have(keep=var1 rename=(var1=nextval) firstobs=2);
if nextval>var1>. then do;
  output;
  stop;
end;
run;

George3 · Posted 10-01-2019 12:39 PM

Hello,

Perfect! Thank you very much.

George

mkeintz · Posted 10-01-2019 01:10 PM

In case you are reading in all the raw data merely to write out one observation, it is possible to modify the INPUT statement to stop processing once the condition is satisfied. I.e. if you don't already have dataset HAVE, you don't need to create it: In the case of big raw data files, this could be notably more efficient:

data want (drop=nxt_:);
    infile datalines dsd dlm=",";
    input @1  Variable_names  : $char10.
            Var1            : 8.
            Var2            : 8.
            Var3            : 8.
            Var4            : 8.
            Var5            : 8.
            Var6            : 8.
    / nxt_varnames :$char10. nxt_var1 :8. @@;
	if nxt_var1>var1 then do; output; stop; end;
    datalines;
Row 1 , 5     , 1, 2, 3, 4, 5
Row 2 , 5     , 1, 2, 3, 4, 5
Row 3 , 4     , 1, 2, 3, 4, 5
Row 4 , 4     , 1, 2, 3, 4, 5
Row 5 , 3     , 1, 2, 3, 4, 5
Row 6 , 3     , 1, 2, 3, 4, 5
Row 7 , 2     , 1, 2, 3, 4, 5
Row 8 , 2     , 1, 2, 3, 4, 5
Row 9 , 5     , 1, 2, 3, 4, 5
Row 10, 5     , 1, 2, 3, 4, 5
Row 11, 6     , 1, 2, 3, 4, 5
Row 12, 7     , 1, 2, 3, 4, 5
Row 13, 8     , 1, 2, 3, 4, 5
run;

The "/" in the INPUT statement tells SAS to skip to the new raw data line.
The trailing "@@" in the INPUT statement tells SAS to hold the input pointer after reading the preceding variable. I.e. the pointer will be just after the second variable (VAR1) in the second data line. This means that in the next iteration of the DATA step, the INPUT processing will start in the same location. But the leader "@1" tells SAS to back up to column 1 of the current record (which was the 2nd record in the preceding iteration.

regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

George3 · Posted 10-01-2019 01:43 PM

Hello Mark,

Thank you. This isn't the route I need to take, but I will keep this if I need it in the future.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away