Hi everyone,
I think I have a quite easy problem I cannot solve due to my lack of knowledge of SAS language.
For example I've a dataset as follow:
Name | Year | Source | File | Paper |
John | 2000 | X | -4 | |
John | 2000 | Y | -3 | 54 |
Jack | 2000 | X | 6 | 5 |
Jack | 2000 | Y | 8 | 12 |
John | 2001 | X | -2 | |
John | 2001 | Y | -1 | 13 |
Jack | 2001 | X | 8 | 12 |
Jack | 2001 | Y | 12 | 14 |
John | 2002 | X | -2 | |
John | 2002 | Y | -1 | 26 |
Jack | 2002 | X | 6 | 5 |
Jack | 2002 | Y | 8 | 12 |
What I whish to do is:
if (Name) = Jack and (Source)= X, to copy according to the (Year) the values from (File)
into (Paper) where (Name) = John , (Source) = X and (Year) are identical to the one from the row of the selected data.
I hope I am clear enough, to sum up I want to fill the blank cells from Paper coresponding to John with the value from File from Jack when Source and Year are identical.
Thank you for your help, I am looking for this code for two days and don't succed.
Regards,
Damien
Then I suggest:
- sort your dataset in a way that a source observation will always be encountered before a possible target observation
- use RETAIN to define (a) helper variable(s) to keep values across observations
- use a condition according to your rules to identify a source observation and assign the values to the retain variable(s)
- use another condition to identify the target observation(s) where variable(s) need to be set from the retained helper variable(s)
- use first. conditions to reset the retain variable(s) when a group change occurs
Does the dataset adhere to that structure all the way through?
(for every name/year, there's always a X and a Y)
If so, then
proc sort data=have;
by year source name;
run;
* Now we always have a succession of Jack/John for every year and source;
data want;
set have;
oldfile = lag(file); * get value from preceding observation;
if paper = . and name = 'John' then paper = oldfile;
* if paper and file are character variables, compare with empty string;
drop oldfile;
run;
* restore original order;
proc sort data=want;
by year descending name source;
run;
Thank you for your answer but no, the dataset doesn't adhere to that structure all the way through, it l be more like this:
Name | Year | Source | File | Paper |
John | 2000 | X | -4 | |
Jack | 2000 | X | 6 | 5 |
Jack | 2000 | Y | 8 | 12 |
John | 2001 | X | -2 | |
John | 2001 | Y | -1 | 13 |
Jack | 2001 | X | 8 | 12 |
Jack | 2001 | Y | 12 | 14 |
John | 2002 | X | -2 | |
John | 2002 | Y | -1 | 26 |
Jack | 2002 | X | 6 | 5 |
Jack | 2002 | Y | 8 | 12 |
I cannot use the lag function for the entire dataset, what I need is more a code where I can really select the cell corresponding to my code
(ex: copy File where Year=2000 and Name=Jack and Source=X;
paste in Paper when Name=John and Source=X and Year=2000)
if possible with a code allowing me not to enter all the years (I ve more than 50) but stating if Year is identical.
Sorry if I am not so clear, I am not so familiar with SAS programming.
Then I suggest:
- sort your dataset in a way that a source observation will always be encountered before a possible target observation
- use RETAIN to define (a) helper variable(s) to keep values across observations
- use a condition according to your rules to identify a source observation and assign the values to the retain variable(s)
- use another condition to identify the target observation(s) where variable(s) need to be set from the retained helper variable(s)
- use first. conditions to reset the retain variable(s) when a group change occurs
You could also do it as a join + coalesce() function (not tested below - post data as a datastep):
proc sql; create table WANT as select A.NAME, A.YEAR, A.SOURCE, A.FILE, COALESCE(A.PAPER,B.FILE) as PAPER /* this is so if paper already exists use that, otherwise the merged value */ from HAVE A left join (select * from HAVE where NAME="Jack" and SOURCE="X") B on A.NAME=B.NAME and A.YEAR=B.YEAR; quit;
Thank you for your answer, I am going to check how I can use it but I don't think it will work for me, in my dataset I have actually 50 columns and 50 "Name" with a lot of missing values and "Name" not existing under some sources.
I really need a code which allow me to select a specific data from a specific variable and to paste it in an other variable with a specific "Name".
It doens't seem to me it is possible with this code but maybe I misundertood it.
Regards,
We can only answer the question as you provide it here. If your question does not match your requirement, nor will the solution. Post some test data - which matches what you have (just a few lines, as a datastep) - and what you want your output to look like. If you have that many columns then there are several approaches you could take, my first idea would be to normalise your data, but you may be able to use arrays. But this is guessing without seeing.
If your rules are hardcoded, then it can't be automated. If the rules can be generalized then it can be automated.
It sounds like you're saying your rules are hardcoded. If so then you may as well write a whole bunch of if statements.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.