BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
joebacon
Pyrite | Level 9

Hi,

 

I have two datasets one in long format and one in wide format.

 

data WORK.FIELD2;
  infile datalines truncover;

input Study_ID:BEST12. Sex:BEST12. 
FU_Day_1:BEST12. FU_Day_1_time:BEST12. FU_Day_2:BEST12. FU_Day_2_time:BEST12. FU_Day_3:BEST12. 
FU_Day_3_time:BEST12. FU_Day_4:BEST12. FU_Day_4_time:BEST12. FU_Day_5:BEST12. FU_Day_5_time:BEST12. FU_Day_6:BEST12. 
FU_Day_6_time:BEST12. FU_Day_7:BEST12. FU_Day_7_time:BEST12. FU_Day_8:BEST12. FU_Day_8_time:BEST12. FU_Day_9:BEST12. 
FU_Day_9_time:BEST12. FU_Day_10:BEST12. FU_Day_10_time:BEST12. FU_Day_11:BEST12. FU_Day_11_time:BEST12. FU_Day_12:BEST12. 
FU_Day_12_time:BEST12. FU_Day_13:BEST12. FU_Day_13_time:BEST12. FU_Day_14:BEST12. FU_Day_14_time:BEST12. FU_Day_15:BEST12. 
FU_Day_15_time:BEST12. FU_Day_16:BEST12. FU_Day_16_time:BEST12. FU_Day_17:BEST12. FU_Day_17_time:BEST12. FU_Day_18:BEST12. 
FU_Day_18_time:BEST12. FU_Day_19:BEST12. FU_Day_19_time:BEST12. FU_Day_20:BEST12. FU_Day_20_time:BEST12. FU_Day_21:BEST12. 
FU_Day_21_time:BEST12. FU_Day_22:BEST12. FU_Day_22_time:BEST12. FU_Day_23:BEST12. FU_Day_23_time:BEST12. FU_Day_24:BEST12. 
FU_Day_24_time:BEST12. FU_Day_25:BEST12. FU_Day_25_time:BEST12. FU_Day_26:BEST12. FU_Day_26_time:BEST12. FU_Day_27:BEST12. 
FU_Day_27_time:BEST12. FU_Day_28:BEST12. FU_Day_28_time:BEST12. FU_Day_29:BEST12. FU_Day_29_time:BEST12. FU_Day_30:BEST12. 
FU_Day_30_time:BEST12.;

format Study_ID BEST12. Sex BEST12. FU_Day_1 BEST12. FU_Day_1_time BEST12. FU_Day_2 BEST12. FU_Day_2_time BEST12. 
FU_Day_3 BEST12. FU_Day_3_time BEST12. FU_Day_4 BEST12. FU_Day_4_time BEST12. FU_Day_5 BEST12. 
FU_Day_5_time BEST12. FU_Day_6 BEST12. FU_Day_6_time BEST12. FU_Day_7 BEST12. FU_Day_7_time BEST12. FU_Day_8 BEST12. 
FU_Day_8_time BEST12. FU_Day_9 BEST12. FU_Day_9_time BEST12. FU_Day_10 BEST12. FU_Day_10_time BEST12. FU_Day_11 BEST12. 
FU_Day_11_time BEST12. FU_Day_12 BEST12. FU_Day_12_time BEST12. FU_Day_13 BEST12. FU_Day_13_time BEST12. FU_Day_14 BEST12. 
FU_Day_14_time BEST12. FU_Day_15 BEST12. FU_Day_15_time BEST12. FU_Day_16 BEST12. FU_Day_16_time BEST12. FU_Day_17 BEST12. 
FU_Day_17_time BEST12. FU_Day_18 BEST12. FU_Day_18_time BEST12. FU_Day_19 BEST12. FU_Day_19_time BEST12. FU_Day_20 BEST12. 
FU_Day_20_time BEST12. FU_Day_21 BEST12. FU_Day_21_time BEST12. FU_Day_22 BEST12. FU_Day_22_time BEST12. FU_Day_23 BEST12. 
FU_Day_23_time BEST12. FU_Day_24 BEST12. FU_Day_24_time BEST12. FU_Day_25 BEST12. FU_Day_25_time BEST12. FU_Day_26 BEST12. 
FU_Day_26_time BEST12. FU_Day_27 BEST12. FU_Day_27_time BEST12. FU_Day_28 BEST12. FU_Day_28_time BEST12. FU_Day_29 BEST12. 
FU_Day_29_time BEST12. FU_Day_30 BEST12. FU_Day_30_time BEST12.;
datalines;
1 0 0 0 0 0 0 0 2 2 2 2 8 10 9 12 3 3 5 3 8 6 0 0 0 0 2 3 0 0 0 0 2 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 0 2 3 2 3 0 0 12 13.5 12 13.5 12 13.5 12 13.5 2 4 0 0 1 2 7 4 2 3.5 0 0 0 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 0 5 3 0 0 0 0 2 0.5 4 2 3 2 0 0 4 2 0 0 3 4 2 1 0 0 0 0 0 0 3 2.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 5 0 0 0 0 2 1 0 0 2 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5 1 0 0 3 5 3 5 7 6 0 0 0 0 0 0 0 0 2 2 0 0 3 1.5 0 0 0 0 2 2 2 2 2 1 0 0 2 3 5 3.5 . . . . . . . . . . . . . . . . . . . . . .
6 0 0 0 0 0 0 0 0 0 3 1 3 3 3 3 0 0 3 1.5 0 0 0 0 3 2 3 2.5 4 4 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7 1 1 0.5 18 4 3 2 0 0 4 3 0 0 2 1 0 0 2 2 0 0 0 0 2 1 0 0 0 0 4 3 2 2 8 3 3 2 0 0 . . . . . . . . . . . . . . . . . . . . . .
8 0 14 7 8 5 0 0 2 2 0 0 0 0 3 2 2 3 0 0 4 3 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . .
9 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;

 

 

data work.fulong ;
  infile datalines dsd dlm='|' truncover;
  input Study_ID Sex day FU_day FU_time counter FU_EBAC FU_14day_DRink
    max_Daycount FU_14Day_Mean_Weekly_Drink
    FU_14Day_Drinks_Per_Drink_Day DrinkDays TotalDrinkDays
    Total_FU_Ebac FU_14Day_Mean_eBAC Heavy VeryHeavy
    Fu_14Day_Very_Heavy_Drink_Count Fu_14Day_Heavy_Drink_Count
    Fu_14Day_Peak_Drinks Fu_14Day_Peak_BAC
  ;
  format Study_ID best12. Sex best12. ;
  informat Study_ID best32. Sex best32. ;
datalines4;
||1|||0|||1|||0|0|||0|0|0|0||
1|0|6|8|10|11|0.25632613908872|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|1|1|3|4|9|0.28816690647482
1|0|16|2|2|1|0.06421486810551|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|15|0|0|2|0|41|14|20.5|2.92857142857142|0|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|3|0|0|14|0|41|14|20.5|2.92857142857142|0|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|11|0|0|6|0|41|14|20.5|2.92857142857142|0|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|5|2|2|12|0.06421486810551|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|4|2|2|13|0.06421486810551|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|10|8|6|7|0.25739280575539|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|1|1|3|4|9|0.28816690647482
1|0|7|9|12|10|0.28816690647482|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|1|1|3|4|9|0.28816690647482
1|0|13|2|3|4|0.06394820143884|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
3                                                                                                                        The SAS System                                                                                               15:51 Monday, June 1, 2020

1|0|8|3|3|9|0.09632230215827|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|12|0|0|5|0|41|14|20.5|2.92857142857142|0|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|14|0|0|3|0|41|14|20.5|2.92857142857142|0|9|1.31587146282973|0.14620794031441|0|0|3|4|9|0.28816690647482
1|0|9|5|3|8|0.16107050359712|41|14|20.5|2.92857142857142|1|9|1.31587146282973|0.14620794031441|1|0|3|4|9|0.28816690647482
2|0|9|0|0|7|0|62|14|31|4.42857142857142|0|9|1.72495|0.19166111111111|0|0|4|5|12|0.3339
2|0|15|0|0|1|0|62|14|31|4.42857142857142|0|9|1.72495|0.19166111111111|0|0|4|5|12|0.3339
2|0|8|2|4|8|0.05518333333333|62|14|31|4.42857142857142|1|9|1.72495|0.19166111111111|0|0|4|5|12|0.3339
2|0|13|0|0|3|0|62|14|31|4.42857142857142|0|9|1.72495|0.19166111111111|0|0|4|5|12|0.3339
2|0|3|0|0|13|0|62|14|31|4.42857142857142|0|9|1.72495|0.19166111111111|0|0|4|5|12|0.3339
;;;;

 

Originally, I thought I wanted the data from the long format merged with that from the wide fomat. However, I would ultimately just like the following variables into the wide dataset:

FU_14day_DRink max_daycount FU_14Day_Mean_Weekly_Drink

FU_14Day_Drinks_Per_Drink_Day

TotalDrinkDays Total_FU_Ebac FU_14Day_Mean_eBAC Fu_14Day_Very_Heavy_Drink_Count Fu_14Day_Heavy_Drink_Count Fu_14Day_Peak_Drinks Fu_14Day_Peak_BAC

 

I was able to merge the wide onto the long, but the long onto the wide stacked the two datasets. Any help would be appreciated.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So if all of the observations are exactly the same then use FIRST. or LAST. flags to take either the first or the last observation in the group.

data want;
  merge one many ;
  by study_id;
  if first.study_id;
run;

View solution in original post

11 REPLIES 11
joebacon
Pyrite | Level 9
I should mention that all those variables are a single number that is repeated. They are metrics. I only need that one number per study_id (observation). I was trying to find a way to just output their max value to another dataset without repeated measures, but got lost in SQL.
ballardw
Super User

@joebacon wrote:
I should mention that all those variables are a single number that is repeated. They are metrics. I only need that one number per study_id (observation). I was trying to find a way to just output their max value to another dataset without repeated measures, but got lost in SQL.

Show us as a list of variables at least one example of finding the "max" value.

The data step has a MAX function that takes multiple variables.

joebacon
Pyrite | Level 9

So, my thought was if I could just get the max it would input only one value, but when I did this: 

data trythis2;
set trythis;
MaxFU_14day_DRink= max(FU_14day_DRink);
keep study_id FU_14day_drink;
run;

It still outputs the study_id repeated and the max 14 day drink repeated. If I can get it to only one value, I could merge the wide with the wide and easily output these variables. It feels like it should be really simple

Tom
Super User Tom
Super User

@joebacon wrote:

So, my thought was if I could just get the max it would input only one value, but when I did this: 

data trythis2;
set trythis;
MaxFU_14day_DRink= max(FU_14day_DRink);
keep study_id FU_14day_drink;
run;

It still outputs the study_id repeated and the max 14 day drink repeated. If I can get it to only one value, I could merge the wide with the wide and easily output these variables. It feels like it should be really simple


Taking the MAX() of just one variable is not going to do anything.  If FU_14day_DRink=10 then the max of it is also 10. 

ballardw
Super User

The data step MAX works on one observation (row of data).

If you want a MAX across records you use a procedure like Means or Summary.

Either sort by a grouping variable and use BY that variable or use a Class variable to get summaries by the values of another variable (or variables).

 

I think the first time you asked about this data I or someone suggested cutting the scope WAY back and only have about 3 of those related variables instead of 30. Make up similar data but enough to show with 7 variables (the id, the time and the day variable) and maybe 10 records in the "wide". Make the "long" as needed to work with that. Ensure the values of the variables are such that we can fairly easily see where the results come from. (Hint: this means not having almost everything 1 and 0)

 

Then you can manually calculate the needed result and can describe that in better detail. I have to say that I actually have no idea what you are attempting to do.

joebacon
Pyrite | Level 9

As always, thank you!

 

I have data that looks like this 

data work.trythis2 ;
  infile datalines dsd dlm='|' truncover;
  input Study_ID FU_14day_DRink ;
  format Study_ID best12. ;
  informat Study_ID best32. ;
datalines4;
1|
1|41
1|41
1|41
1|41
1|41
1|41
1|41
1|41
2|28
2|28
2|28
2|28
2|28
2|28
2|28
;;;;

I already calculated the values but I need data that look like this

data work.trythis2 ;
  infile datalines dsd dlm='|' truncover;
  input Study_ID FU_14day_DRink ;
  format Study_ID best12. ;
  informat Study_ID best32. ;
datalines4;
1|41
2|28
3|89
4|22
5|90

;;;;

 

ballardw
Super User

If you data is sorted by ID then this may reduce the "long" data as you want:

data work.want;
   set work.trythis2;
   by study_id;
   if last.study_id;
run;

This grabs the last record. I don't know if was intentional or not but the first id has blank value for the other variable.

Or a similar approach using MAX

Proc summary data=work.trythis2 nway;
   class study_id;
   var <list of variables>;
   output out=work.want (drop=_type_ _freq_) max=;
run;

If all of the values are the same for a variable any of max, min or median would work. There is an extremely small chance the mean, which should work, might acquire a small decimal difference when calculated so I wouldn't recommend that. Min, Max and Median should be the same from the way you describe the data.

 

 

joebacon
Pyrite | Level 9

Then, I was looking up different uses of the Put and Output statements. I tried a variation with output that I haven't gotten to run. It also has the problem that not every study_id is 14 repetitions. :

 

data trythis2;
set trythis;
keep study_id FU_14day_drink;
reps+1;
if reps=1 then FU_14day_drink=0;
max(FU_14day_drink);
if reps=14 then do;
output;
stop;
end;
run;
Tom
Super User Tom
Super User

Are you asking to take a variable from the LONG format and merge it into the WIDE format?

If so you are either going to convert the WIDE to long or you need to specify which value from the LONG you want.  Do you want just one of the observations for that STUDY_ID?  Or some type of aggregate across all of the observations that that STUDY_ID?

joebacon
Pyrite | Level 9
Yes. That is what I am trying to say. My point is that all the values of long are all the same. I was trying to output max in hopes that it would've output just one value rather than the list for each study_id.
Tom
Super User Tom
Super User

So if all of the observations are exactly the same then use FIRST. or LAST. flags to take either the first or the last observation in the group.

data want;
  merge one many ;
  by study_id;
  if first.study_id;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1580 views
  • 1 like
  • 3 in conversation