- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Current version: 9.04.01M6P111518
I used a survey tool that reported the individual responses in a wonky way. I can clean the data down to the picture shown of the first record. I eliminated all superfluous variables. I reduced the observations and added the name variable by the following code (that works). The code that does not work is shown next.
Data survey.e244x survey2.e244d;
set original;
length name $18.;
name=section__;
if missing(name) and __Responses = '0' then delete;
keep section__ name __responses answer q_text;
run;
And the data that results is shown in this proc print.
Data with three observations are at https://github.com/campnmug/surveyproblem
So far so good. However, what I want to end up with is all data on the same line, that is all of the data from Sam Kidd as observation 1 in the new dataset and John Kidd (not shown) as obs=2, and so forth. My first thought was to backfill the blank name rows with the name from the first obs as long as missing(name) was true. When missing(name) was false then the record would be the next student. This is where I ran into problems, the code below worked on obs=2, but not on obs=3! I am baffled. Here is the code and data look as a result.
Data e244;
set survey2.e244d;
lagname=lag(name);
if missing(name)=1 then name=lagname;
run;
title 'data=e244';
proc print data=e244; run;
with the result:
So I put in some markers to check what was happening and everything appears correctly working, but the lagname does not work again for obs=3.
Data e244debug;
set survey2.e244d;
/* put in debugging markers to chack the flow */
q=substr(Q_text,1,3); /* For Exp */
works=.; n=_n_;
if missing(name)=0 then do; works=0; end;
lagname=lag(name);
if missing(name)=1 and __responses='1' and q='For' then do; name=lagname; works=1; end;
if missing(name)=1 and __responses='1' and q='Exp' then do; name=lagname; works=2; end;
run;
title 'data=e244debug';
proc print data=e244debug; run;
With the result
At this point, I have to hand it over to you!
This bugs me that the lagname is not working, but even if it did the eventual goal is not solved, that is to get all the relevant information on a single observation for each student respondent. In the meantime I can manipulate this in excel and get what I need, but why should I have to do that when I should have been able to SAS this much faster and more accurately. Fortunately, it is a survey with less than 40 obs. (BTW names in the dataset are fictional, but the responses are original.
Associate Professor Emeritus, Department of Economics
College of Business Administration
Akron, OH 44325-1908
www.linkedin.com/in/stevencmyers
https://econdatascience.com/
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How about a much simpler approach?
Data e244;
set survey2.e244d;
retain name1;
if not missing(name) then name1=name;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You don't seem to understand how LAG() works. It does not know anything about observations, it just records the values passed into it. If NAME is missing when LAG() is called then the missing value gets into the queue and will be returned by the next call.
Instead of LAG() just use RETAIN. Make a new variable. Set it to be retained from observation to observation. Set it to a value when NAME is not missing.
Here is an example using both LAG() and RETAIN to show the difference.
data have;
row+1;
input name $ @@;
cards;
a b . c d . e . . f
;
data want;
set have;
length lag_name new_name $8;
lag_name = lag(name);
new_name = coalescec(name,new_name);
retain new_name;
run;
Obs row name lag_name new_name 1 1 a a 2 2 b a b 3 3 b b 4 4 c c 5 5 d c d 6 6 d d 7 7 e e 8 8 e e 9 9 e 10 10 f f