I am running into a problem where I would like to duplicate data from one row that contains data to other rows with the same key value, but does not contain any additional data. However, ONLY conditioned if variable 3 or V3 is equal to "Cardio". For instance. I have a dataset that looks like this.
The first two row contain the same key, but the second row does not contain any other information other than the key (no value for v1,v2, v3 etc.) Because the first row has "cardio" for V3, this row would need to be duplicated to the second row that does not contain the information. Now the second image is what I want the dataset to look like.
How am I able to do that in SAS? Thanks in advance.
It is real hard to program against pictures. Best on this forum is to provide example data in the form of data step code so we can make a data set.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Completely untested and assumes your data is sorted by the Key variable. Assumes there is only one row that needs to be assigned.
data want; set have; by key; lv1 = lag(v1); lv2 = lag(v2); lv3 = lag(v3); lv4 = lag(v4); lv5 = lag(v5); if not first.key and lv3='Cardio' and missing(v2) then do; /* assumes that missing v2 means all of v1-v4 need to be replaced*/ v1 = lv1; v2 = lv2; v3 = lv3; v4 = lv4; v5 = lv5; end; drop lv1 - lv5; run;
The BY statement creates variables that indicate whether the current observation is the first or last of the By group. The values are 1 (true) and 0 (false) and accessed using the First.variablename and last.variablename syntax.
Lag will retrieve the previous observation values from a data set on the SET statement. LAG is a queued function and will retrieve the value the last time a condition was true so you likely in this case do not want to conditionally request the lagged values.
Thank you so much for this! This is my first post here, however I will definitely take your advice moving forward whenever I run into any problems. Thank you!!!
@ballardw 's response is the most sensible given you only have five variables V1 through V5.
But if you have, say, 50 vars, you would need to code 50 LVx=lag(Vx) statements and 50 Vx=LVx statements: In that case, given my understanding of your task, this might be preferable:
data want (drop=_:) ;
set have ;
by id;
retain _id . ;
if _n_=1 then do;
declare hash h (dataset:'have (obs=0 rename=(id=_id))');
h.definekey('_id');
h.definedata(all:'Y');
h.definedone();
end;
if first.id then do;
if v3='Cardio' then _rc=h.replace();
else _rc=h.remove();
end;
else if missing(v2) then _rc=h.find();
run;
Remember, this checks for any ID that begins with V3='Cardio', and all subsequent records that have missing V2. That's how I understand your description.
Editted note: corrected code from
if v3='Cardio' then _rc h.replace();
to
if v3='Cardio' then _rc=h.replace();
this is highly helpful as well! Going to test this one out.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.