BookmarkSubscribeRSS Feed
StevenMyers
Fluorite | Level 6

Current version: 9.04.01M6P111518


I used a survey tool that reported the individual responses in a wonky way. I can clean the data down to the picture shown of the first record. I eliminated all superfluous variables. I reduced the observations and added the name variable by the following code (that works). The code that does not work is shown next.

Data survey.e244x survey2.e244d;
set original;
length name $18.;
name=section__;
if missing(name) and __Responses = '0' then delete;
keep section__ name __responses answer q_text;
run;

And the data that results is shown in this proc print. 

e244d_look.PNG

Data with three observations are at https://github.com/campnmug/surveyproblem 

 

So far so good. However, what I want to end up with is all data on the same line, that is all of the data from Sam Kidd as observation 1 in the new dataset and John Kidd (not shown) as obs=2, and so forth. My first thought was to backfill the blank name rows with the name from the first obs as long as missing(name) was true. When missing(name) was false then the record would be the next student. This is where I ran into problems, the code below worked on obs=2, but not on obs=3! I am baffled. Here is the code and data look as a result. 

Data e244;
	set survey2.e244d;
	lagname=lag(name);
	if missing(name)=1 then name=lagname; 
run;
title 'data=e244';
proc print data=e244; run;

with the result:

E244d_lagfail.PNG

So I put in some markers to check what was happening and everything appears correctly working, but the lagname does not work again for obs=3.

Data e244debug;
	set survey2.e244d;
	/* put in debugging markers to chack the flow */
	q=substr(Q_text,1,3);  /* For Exp */
    works=.; n=_n_;
	if missing(name)=0 then do; works=0; end;
	lagname=lag(name);
	if missing(name)=1 and __responses='1' and q='For' then do;  name=lagname; works=1;   end; 
	if missing(name)=1 and __responses='1' and q='Exp' then do;  name=lagname; works=2;   end; 
run;
title 'data=e244debug';
proc print data=e244debug; run;

With the result 

E244d_lagfail_debug.PNG

 

At this point, I have to hand it over to you! 

 

This bugs me that the lagname is not working, but even if it did the eventual goal is not solved, that is to get all the relevant information on a single observation for each student respondent. In the meantime I can manipulate this in excel and get what I need, but why should I have to do that when I should have been able to SAS this much faster and more accurately. Fortunately, it is a survey with less than 40 obs. (BTW names in the dataset are fictional, but the responses are original. 

 

 

Steven C. Myers
Associate Professor Emeritus, Department of Economics
College of Business Administration
Akron, OH 44325-1908
www.linkedin.com/in/stevencmyers
https://econdatascience.com/

2 REPLIES 2
PaigeMiller
Diamond | Level 26

How about a much simpler approach?

Data e244;
	set survey2.e244d;
	retain name1;
	if not missing(name) then name1=name; 
run;

 

--
Paige Miller
Tom
Super User Tom
Super User

You don't seem to understand how LAG() works.  It does not know anything about observations, it just records the values passed into it. If NAME is missing when LAG() is called then the missing value gets into the queue and will be returned by the next call.

Instead of LAG() just use RETAIN. Make a new variable. Set it to be retained from observation to observation. Set it to a value when NAME is not missing.

 

Here is an example using both LAG() and RETAIN to show the difference.

data have;
  row+1;
  input name $ @@;
cards;
a b . c d . e . . f
;
data want;
  set have;
  length lag_name new_name $8;
  lag_name = lag(name);
  new_name = coalescec(name,new_name);
  retain new_name;
run;
Obs    row    name    lag_name    new_name

  1      1     a                     a
  2      2     b         a           b
  3      3               b           b
  4      4     c                     c
  5      5     d         c           d
  6      6               d           d
  7      7     e                     e
  8      8               e           e
  9      9                           e
 10     10     f                     f

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1062 views
  • 1 like
  • 3 in conversation