name | date | value |
a | Jan-11 | 0 |
a | Feb-11 | 0 |
a | Mar-11 | 0 |
a | May-11 | 0 |
a | Jun-11 | 0 |
a | Jul-11 | 3 |
a | Aug-11 | 15 |
a | Oct-11 | 22 |
a | Nov-11 | 21 |
a | Dec-11 | 20 |
a | Feb-12 | 0 |
a | Mar-12 | 21 |
a | Apr-12 | 0 |
b | Jan-11 | 0 |
b | Feb-11 | 0 |
b | Mar-11 | 0 |
b | May-11 | 22 |
b | Jun-11 | 23 |
b | Jul-11 | 55 |
b | Aug-11 | 565 |
b | Oct-11 | 566 |
b | Nov-11 | 20 |
b | Dec-11 | 20 |
b | Feb-12 | 20 |
b | Mar-12 | 11 |
b | Apr-12 | 0 |
Experts,
I am trying to read the data starting from first non-zero value for every "Name" here reading for "a" should start from July-11 and for "b" it should start from May-11.
Can we do this by manipulating while reading raw file or we need have the data in sas dataset format and then only we use datasteps and procs to do task.
Thanks.
Sorry, did you try removing the value variable from drop statement as below
data have;
input name$ date$ value;
cards;
a Jan-11 0
a Feb-11 0
a Mar-11 0
a May-11 0
a Jun-11 0
a Jul-11 3
a Aug-11 15
a Oct-11 22
a Nov-11 21
a Dec-11 20
a Feb-12 0
a Mar-12 21
a Apr-12 0
b Jan-11 0
b Feb-11 0
b Mar-11 0
b May-11 22
b Jun-11 23
b Jul-11 55
b Aug-11 565
b Oct-11 566
b Nov-11 20
b Dec-11 20
b Feb-12 20
b Mar-12 11
b Apr-12 0
;
data want;
set have;
retain flag;
by name ;
if first.name then flag=.;
if value ne 0 then flag=value;
if flag ne .;
drop flag;
run;
output:
Thanks,
Jag
Here is a possible way:
data have;
infile cards dlm='09'x;
input (name date) (:$8.) value;
retain n;
if name ne lag(name) then
n=constant('big');
if value ne 0 then
n=_n_;
if _n_ >= n then
output;
cards;
a Jan-11 0
a Feb-11 0
a Mar-11 0
a May-11 0
a Jun-11 0
a Jul-11 3
a Aug-11 15
a Oct-11 22
a Nov-11 21
a Dec-11 20
a Feb-12 0
a Mar-12 21
a Apr-12 0
b Jan-11 0
b Feb-11 0
b Mar-11 0
b May-11 22
b Jun-11 23
b Jul-11 55
b Aug-11 565
b Oct-11 566
b Nov-11 20
b Dec-11 20
b Feb-12 20
b Mar-12 11
b Apr-12 0
;
Alternatively please try
data want;
set have;
retain flag;
by name ;
if first.name then flag=.;
if value ne 0 then flag=value;
if flag ne .;
drop value flag;
run;
Thanks,
Jag
Hai.kuo , this code is showing error.
Jagdishkatam, this seems not working correctly as values are truncating and its not starting "date" from first non zero instance for "name" "b".
Sorry, did you try removing the value variable from drop statement as below
data have;
input name$ date$ value;
cards;
a Jan-11 0
a Feb-11 0
a Mar-11 0
a May-11 0
a Jun-11 0
a Jul-11 3
a Aug-11 15
a Oct-11 22
a Nov-11 21
a Dec-11 20
a Feb-12 0
a Mar-12 21
a Apr-12 0
b Jan-11 0
b Feb-11 0
b Mar-11 0
b May-11 22
b Jun-11 23
b Jul-11 55
b Aug-11 565
b Oct-11 566
b Nov-11 20
b Dec-11 20
b Feb-12 20
b Mar-12 11
b Apr-12 0
;
data want;
set have;
retain flag;
by name ;
if first.name then flag=.;
if value ne 0 then flag=value;
if flag ne .;
drop flag;
run;
output:
Thanks,
Jag
Yes..!
Its working now and giving expected output.
I know there may different ways of achieving the same result, Can you please explain not only how the code is working but also how to approach and think solutions for such data manipulations?
Thanks
In my code i used the concept of last observation carried forward (LOCF). With first.name, i tried to retain the values and if the value is zero then the previous values will replace the next value. so if the value is zero between the records then those values will be replaced with previous non zero value. This way only the first records will have zero values and remaining records get a non zero values. In last i did not consider the records with missing values(zero).
Hope this helps.
Thanks,
Jag
Hi, haikuo's code works fine. Just make the code to read a SAS dataset than raw data to understand easily:
data have;
input name$ date$ value;
cards;
a Jan-11 0
a Feb-11 0
a Mar-11 0
a May-11 0
a Jun-11 0
a Jul-11 3
a Aug-11 15
a Oct-11 22
a Nov-11 21
a Dec-11 20
a Feb-12 0
a Mar-12 21
a Apr-12 0
b Jan-11 0
b Feb-11 0
b Mar-11 0
b May-11 22
b Jun-11 23
b Jul-11 55
b Aug-11 565
b Oct-11 566
b Nov-11 20
b Dec-11 20
b Feb-12 20
b Mar-12 11
b Apr-12 0
;
data want;
set have;
retain n;
if name ne lag(name) then
n=constant('big');
if value ne 0 then
n=_n_;
if _n_ >= n then
output;
drop n;
run;
Oh yes it works, but can you also explain the logic behind the code.
I mean first two IF conditions are not interdependent and if i am correct they are mutually exclusive.
And code did not work with raw file/ Infile statement, why?
How can we put this in simple English language for better understanding.
It didn't work because the "copy & paste" between HTML and my text editor messed up the delimiter, it converted 'Tabs' into 'Blanks'. In following version, the code is updated that way that data is streaming in to mimic the real life raw file instead of using unbuffered 'CARDS'.
filename FT15F001 temp;
data have;
infile FT15F001;
input (name date) (:$8.) value;
retain n;
if name ne lag(name) then
n=constant('big');
if value ne 0 then
n=_n_;
if _n_ >= n then
output;
parmcards;
a Jan-11 0
a Feb-11 0
a Mar-11 0
a May-11 0
a Jun-11 0
a Jul-11 3
a Aug-11 15
a Oct-11 22
a Nov-11 21
a Dec-11 20
a Feb-12 0
a Mar-12 21
a Apr-12 0
b Jan-11 0
b Feb-11 0
b Mar-11 0
b May-11 22
b Jun-11 23
b Jul-11 55
b Aug-11 565
b Oct-11 566
b Nov-11 20
b Dec-11 20
b Feb-12 20
b Mar-12 11
b Apr-12 0
;
Sir, That's fine.No worries! I know that haikuo's code can't go wrong. But I am just intrigued with your usage of constant function. What a Brilliant FULLPROOF concept that you gave us that can handle any large mega datasets. Did you just like that knew that 1.78e NUMBER?I was simply reading about that number for 2 hours lol..Can you offer a small and simple explanation on that constant application here,when and if you have the time/?
Sorry for the bother,
Naveen
Thanks, and my codes go wrong all the time unfortunately. Usage of Constant() is just a spontaneous act and I would have chosen a regular flag (like others here) in the real work to make it more readable and to some extend, more robust. The reason to use Constant('big') was a result from my decision to retain row number when first encounter value=0, and only outputs those greater row numbers (_n_>=n), reset to a largest number when switching IDs.
Regards,
Haikuo
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.