BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Raza_M
Obsidian | Level 7

Hi,

I using longitudinal data stored in a wide format. In order to benefit from repeated measure, I need to convert the data in long format. In original data, I have many variables but for the sake of understanding I am using self-explanatory names. I have survey years 2014, 2015, 2016, 2017 and variables names have suffix _4, _5, _6, and _7.

 

*STEP 1: CREATE SURVEY YEARS CORRESPONDING RESPONSES;

 

data test1b;

set test1a;

if Exposure_4>. and outcome_4>. and Exposure_5>. and outcome_5>. then do;

 survey_year1=2014; exposure1= Exposure_4; outcome1= outcome_4; * First survey year (survey_year1), exposure (exposure1) and exposure (outcome1);

 survey_year2=2015; exposure2= Exposure_5; outcome2= outcome_5; * Second survey year, exposure and outcome;

   output; end;

** 2015 - 2016 **;

if Exposure_5>. and outcome_5>. and Exposure_6>. and outcome_6>. then do;

 survey_year1=2015; exposure1= Exposure_5; outcome1= outcome_5;

 survey_year2=2016; exposure2= Exposure_6; outcome2= outcome_6;

   output; end;

** 2016 - 2017 **;

if Exposure_6>. and exdrink_6>. and Exposure_7>. and exdrink_7>. then do;

 survey_year1=2016; exposure1= Exposure_6; exdrink1= exdrink_6;

 survey_year2=2017; exposure2= Exposure_7; exdrink2= exdrink_7;

   output; end;

   run;

 

*STEP 2: RE CODE SURVEY_YEAR* TO YEAR TO USE IN ARRAY TO CREATE CURRENT AND PREVIOUS OUTCOME AND EXPOSURE;

 

data test1c;

set test1b;

 

if survey_year1=2014 then year=1; * naming the survey years again as "year" for the arrays below;

if survey_year1=2015 then year=2;

if survey_year1=2016 then year=3;

if survey_year1=2017 then year=4;

 

if survey_year2=2014 then year2=1;

if survey_year2=2015 then year2=2;

if survey_year2=2016 then year2=3;

if survey_year2=2017 then year2=4;

run;

 

 

*STEP 3: USE ARRAY TO CONVERT WIDE TO LONG FORMAT;

data test1d;

set  test1c;

array sx{4} sex_4 sex_5 sex_6 sex_7;

array ag{4} age_4 age_5 age_6 age_7;

array out{4} outcome_4 outcome_5 outcome_6 outcome_7;

array exp{4} Exposure_4 Exposure_5 Exposure_6 Exposure_7;

sex=sx{year};

age=ag{year};

outcome=out{year};

exposure=exposure{year};

** Survey 1;

 ex_outcome=out{year2};

 ex_exposure=exp{ivuosi2};

run;

 

Step 1 and 2 are working fine but step3 isn’t. I am not getting any errors after running step three. It runs fine but is not working, data is not changing to wide format and doesn’t assign current and previous values. I might be missing something but I do not know what.

 

Can someone point out what is wrong or missing here? 

 

Quick responses are highly appreciated, I have lost lot of time trying to fix the code above.

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

You need a "look ahead", which can be done via merging HAVE with the next line of HAVE, as shown here:

 

data have;
input id $ year a b;
datalines;
1 2012 6.1 0
1 2014 6.1 0
1 2016 7.3 1
1 2018 2.5 1
2 2012 10 1
2 2014 . 1
2 2016 10 1
2 2018 10 1
3 2012 14.5 0
3 2014 14.5 0
3 2016 13.6 0
3 2018 13.6 1
4 2012 5.9 1
4 2014 . .
4 2016 . .
4 2018 5.9 1
5 2012 4.7 0
5 2014 4.7 0
5 2016 9.0 0
5 2018 9.0 1
;

data want;
    merge have have(firstobs=2 keep=id a rename=(id=next_id a=next_a));
    if id^=next_id then next_a=.;
    drop next_id;
run;
--
Paige Miller

View solution in original post

17 REPLIES 17
PaigeMiller
Diamond | Level 26

You haven't given a value to variable YEAR, so SAS cannot evaluate lines like

sex=sx{year};

I think you want a DO loop here:

data test1d;
set  test1c;
array sx{4} sex_4 sex_5 sex_6 sex_7;
array ag{4} age_4 age_5 age_6 age_7;
array out{4} outcome_4 outcome_5 outcome_6 outcome_7;
array exp{4} Exposure_4 Exposure_5 Exposure_6 Exposure_7;
do year=1 to 4;
    sex=sx{year};
    age=ag{year};
    outcome=out{year};
    exposure=exposure{year};
    output;
end;
drop sex_: age_: outcome_: exposure_:;
run;

 

--
Paige Miller
Raza_M
Obsidian | Level 7

Thank you for answering.

Year I have in Step 2, I want to use the year variable created in step 2.

PaigeMiller
Diamond | Level 26

@Raza_M wrote:

Thank you for answering.

Year I have in Step 2, I want to use the year variable created in step 2.


No, you don't want to use that variable YEAR. The YEAR created in earlier data steps will not get the job done in an ARRAY, because it is constant on a given row of the input data set in the SET statement. You need YEAR to vary within an observation to be able to reference the four different values of SX and the four different values of AG and so on. (Also, I think it is relatively pointless to convert actual YEAR values like 2015 to a single digit). 

 

data test1d;
set test1c;
array sx{4} sex_4 sex_5 sex_6 sex_7;
array ag{4} age_4 age_5 age_6 age_7;
array out{4} outcome_4 outcome_5 outcome_6 outcome_7;
array exp{4} Exposure_4 Exposure_5 Exposure_6 Exposure_7;
do yr=1 to 4;
    sex=sx{yr};
    age=ag{yr};
    outcome=out{yr};
    exposure=exposure{yr};
    output;
end;
drop sex_: age_: outcome_: exposure_:;
run;
--
Paige Miller
Raza_M
Obsidian | Level 7

Hmm OK then I put a DO loop in the array statement.

I still need previous years exposure. Can you help to create a syntax where I have change in exposure and outcome based on previous years' responses.

In example below person 1 has responded to all four waves and has a change in exposure and outcome. 

 

id    year    exposure    outcome

1     2014       7                0

1     2015       8                1

1     2016       8                1

1     2017       5                0

 

For the analysis, I want to have previous exposure and previous outcome for each year. I need to bring next survey values in a same row as previous survey.

 

id    year    exposure    outcome    exp_2       outcome_2

1     2014       7                0               8                   1

1     2015       8                1               8                   1

1     2016       8                1               5                   0

1     2017       5                0                .                   .

 

 

PaigeMiller
Diamond | Level 26

At this point, I am requesting you provide a representative portion of your data as SAS data step code, following these instructions: https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/

 

We're trying to help you, now you have to help us by following the directions.

 

Regarding previous year's exposure, I don't understand your example. How can the previous year's exposure be known for 2014 if there are no earlier years in the data?

--
Paige Miller
Raza_M
Obsidian | Level 7

OK, Thank you! I do that. 2014 does not have a previous exposure, 2014 is previous exposure for 2015. It is more like a baseline exposure.

Raza_M
Obsidian | Level 7

data have;
input id $ year $ a $ b $;
datalines;
1 2012 6.1 0
1 2014 6.1 0
1 2016 7.3 1
1 2018 2.5 1
2 2012 10 1
2 2014 . 1
2 2016 10 1
2 2018 10 1
3 2012 14.5 0
3 2014 14.5 0
3 2016 13.6 0
3 2018 13.6 1
4 2012 5.9 1
4 2014 . .
4 2016 . .
4 2018 5.9 1
5 2012 4.7 0
5 2014 4.7 0
5 2016 9.0 0
5 2018 9.0 1
;

Above is an example of the data, due to confidentiality issues i cannot share from original data. Now I want to bring subsequent value in front by id.
For example below is a want data.

 

data want;
input id $ year $ a $ want $ b $;
datalines;
1 2012 6.1 6.1 0
1 2014 6.1 7.3 0
1 2016 7.3 2.5 1
1 2018 2.5 . 1
2 2012 10 9.4 1
2 2014 9.4 10 1
2 2016 10 10 1
2 2018 10 . 1
3 2012 14.5 14.5 0
3 2014 14.5 13.6 0
3 2016 13.6 13.6 0
3 2018 13.6 . 1
4 2012 5.9 . 1
4 2014 . . .
4 2016 . 5.9 .
4 2018 5.9 . 1
5 2012 4.7 4.7 1
5 2014 4.7 9.0 0
5 2016 9.0 12.6 0
5 2018 12.6 . 0
;

PaigeMiller
Diamond | Level 26

data want;
input id $ year $ a $ want $ b $;
datalines;
1 2012 6.1 6.1 0
1 2014 6.1 7.3 0
1 2016 7.3 2.5 1
1 2018 2.5 . 1
2 2012 10 9.4 1
2 2014 9.4 10 1
2 2016 10 10 1
2 2018 10 . 1
3 2012 14.5 14.5 0
3 2014 14.5 13.6 0
3 2016 13.6 13.6 0
3 2018 13.6 . 1
4 2012 5.9 . 1
4 2014 . . .
4 2016 . 5.9 .
4 2018 5.9 . 1
5 2012 4.7 4.7 1
5 2014 4.7 9.0 0
5 2016 9.0 12.6 0
5 2018 12.6 . 0
;

 

I have highlighted some number in red. I don't understand the logic of how these numbers are derived from the original data set, and I request explanation.

--
Paige Miller
Raza_M
Obsidian | Level 7

OK, I'll try to be clearer.

 

Thank you for your help yesterday!

 

I have extracted an example from my data using dummy names of variables and values, unfortunately due to confidentiality issues i cannot share observation from original data. Below is an example data that can be used for a solution. In the "have" data, there is a variable "a". I want to bring values of subsequent years of "a" in front of original values of "a". In the "want" data in the end, there is a  "want" variable that i want to create. Values are bounded by id. I want the changes to occur within id that is why last value of "want" variable in the "want" data of each id is missing because there is no survey year after 2018, so 2018 does not have a subsequent year.

 

I hope I better explained this time.

 

data have;
input id $ year $ a $ b $;
datalines;
1 2012 6.1 0
1 2014 6.1 0
1 2016 7.3 1
1 2018 2.5 1
2 2012 10 1
2 2014 . 1
2 2016 10 1
2 2018 10 1
3 2012 14.5 0
3 2014 14.5 0
3 2016 13.6 0
3 2018 13.6 1
4 2012 5.9 1
4 2014 . .
4 2016 . .
4 2018 5.9 1
5 2012 4.7 0
5 2014 4.7 0
5 2016 9.0 0
5 2018 9.0 1
;

 

I want to bring values of "a" from subsequent years in the same row as original values. For example,  

 

data want;
input id $ year $ a $ want $ b $;
datalines;
1 2012 6.1 6.1 0
1 2014 6.1 7.3 0
1 2016 7.3 2.5 1
1 2018 2.5 . 1
2 2012 10 9.4 1
2 2014 9.4 10 1
2 2016 10 10 1
2 2018 10 . 1
3 2012 14.5 14.5 0
3 2014 14.5 13.6 0
3 2016 13.6 13.6 0
3 2018 13.6 . 1
4 2012 5.9 . 1
4 2014 . . .
4 2016 . 5.9 .
4 2018 5.9 . 1
5 2012 4.7 4.7 1
5 2014 4.7 9.0 0
5 2016 9.0 12.6 0
5 2018 12.6 . 0
;

PaigeMiller
Diamond | Level 26

So where does the 9.4 come from in the WANT data set for ID 2 and year 2012? There is no value of 9.4 in the HAVE data set.

--
Paige Miller
Raza_M
Obsidian | Level 7

Sorry, that was a mistake. I have corrected it.

 

data want;
input id $ year $ a $ want $ b $;
datalines;
1 2012 6.1 6.1 0
1 2014 6.1 7.3 0
1 2016 7.3 2.5 1
1 2018 2.5 . 1
2 2012 10 . 1
2 2014 . 10 1
2 2016 10 10 1
2 2018 10 . 1
3 2012 14.5 14.5 0
3 2014 14.5 13.6 0
3 2016 13.6 13.6 0
3 2018 13.6 . 1
4 2012 5.9 . 1
4 2014 . . .
4 2016 . 5.9 .
4 2018 5.9 . 1
5 2012 4.7 4.7 0
5 2014 4.7 9.0 0
5 2016 9.0 9.0 0
5 2018 9.0 . 1
;

PaigeMiller
Diamond | Level 26

You need a "look ahead", which can be done via merging HAVE with the next line of HAVE, as shown here:

 

data have;
input id $ year a b;
datalines;
1 2012 6.1 0
1 2014 6.1 0
1 2016 7.3 1
1 2018 2.5 1
2 2012 10 1
2 2014 . 1
2 2016 10 1
2 2018 10 1
3 2012 14.5 0
3 2014 14.5 0
3 2016 13.6 0
3 2018 13.6 1
4 2012 5.9 1
4 2014 . .
4 2016 . .
4 2018 5.9 1
5 2012 4.7 0
5 2014 4.7 0
5 2016 9.0 0
5 2018 9.0 1
;

data want;
    merge have have(firstobs=2 keep=id a rename=(id=next_id a=next_a));
    if id^=next_id then next_a=.;
    drop next_id;
run;
--
Paige Miller
Raza_M
Obsidian | Level 7

There is still a confusion. I do not have "want" dataset and I do not have "want" variable. I wish/want to create this variable using "a" variable in "have" dataset. I need help in creating a syntax that give me the "want" variable. 

 

Is it clear or is there still a confusion? I am sorry for being so bad at explaining. 

PaigeMiller
Diamond | Level 26

@Raza_M wrote:

There is still a confusion. I do not have "want" dataset and I do not have "want" variable. I wish/want to create this variable using "a" variable in "have" dataset. I need help in creating a syntax that give me the "want" variable. 

 

Is it clear or is there still a confusion? I am sorry for being so bad at explaining. 


While it may be possible to do this all with one DATA step, what is the problem of doing it in two steps? The solution is pretty simple if you do it in two steps.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1832 views
  • 3 likes
  • 2 in conversation