BookmarkSubscribeRSS Feed
optimist
Calcite | Level 5

Hastie,

Could I ask what sort of analysis or report you plan on running with the resultant dataset? As others have commented, 40,320 variable is a lot of variables. If you will be doing additional analysis what statistical/analytical application (SAS, Stata, SPSS, R, etc.)  and which procedure or routine do you plan on using?  If you are preparing a report, what tool do you intend to use (Excel, Crystal Reports, SQL Server, etc.)  Someone here might be able to suggest a more efficient way to organize the data. 

Hatsie
Calcite | Level 5

Sure. So, I have 7 days of data per person, collected 24 hrs/day in 15-sec intervals, and as the data are organized currently, each 15-second interval is its own row. Eventually I would to have like a dataset that looks like this, where I have already removed all of the data for the chunks of time when I know (based on separate logs) that each person was sleeping.

Participant      WearDay1   SittingHrs1   StandHrs1   SteppingHrs1     SitStandTransitions1    Steps1    WearDay2    SittingHrs2  StandHrs2...

Guy Smiley     3/4/2014         9.1                  2.6                 1.8                            41                5596      3/5/2014           9.6                2.1

Kermit            11/21/2013      8.6                  4.1                 2.2                            68                7637      11/22/2014        8.8                3.2

PrairieDawn    5/9/2014         9.9                  1.9                  1.3                           32                2974       5/10/2014         8.7                1.5

Then I would create overall variables for each person's week, so that I know the average hrs/day each person spends sitting down, standing up, and stepping, as well as the average number of steps/day they take, and the average number of sit-to-stand transitions they make.


*** EDited this to fix a couple of variable names.

Hatsie
Calcite | Level 5

And, Optimist, to answer the rest of your question, I am planning to do the rest of the analysis in SAS. I work in public health and this is data from a pilot study we did looking at lifestyle patterns in a specific disease group....  the eventual goal is to quantify/describe how they are spending their time, and determine whether those who are spending more time sitting have higher levels of certain biomarkers in the blood.

SpiggyTopes
Calcite | Level 5

I suspect you're just making it hard for yourself by converting rows to columns, both in your initial steps and in your eventual requirement. In terms of the latter, what does having all days on a single row give you that one row per day would not? A simple PROC SUMMARY would allow you to roll together days of data if that's what's ultimately required. If it's the kind of project I think it is, then there will be further questions and analyses that spring from the results of this stage, and the simpler your data structure is, the easier it's going to be to perform those analyses. Keep everything in simple non-repeating columns as far as you possibly can, and I think you can probably achieve that using simple DATA steps, PROC SORTs and PROC SUMMARYs. Transpose for reporting at the end if you must, but keep your data simple as possible as long as possible.

Patrick
Opal | Level 21

I would assume the sleeping patterns and times are relevant for your study. You have very granular source data but wouldn't you loose all this information in your summary table?

If the Stats procedures you plan to use don't require a wide structure (many variables) then I would keep the data in a long structure and create an additional classification variable which tells you the state per observation (sleep, walk, etc.).

optimist
Calcite | Level 5

I would follow up on Patrick's comment and ask whether you are use the longitudinal component of your data in some sort of hazard rate model or something where you are just using the length of time in a state in something like a categorical analysis of some sort. Either way, flipping the time horizon from rows to columns is NOT the format the SAS PROCs that I can think of would use as an input dataset.

If you are still insistent on having all of these variables, there have been several decent suggestions for doing the transformation and promote one of the responses as the right answer.  (You should also read the finer options and examples on PROC TRANSPOSE. It probably IS what you want to use.)  

Ksharp
Super User

The simplest way is using proc means + idgroup. But you need to reorder variables . The second way is you don't need to reorder them .

data have;
input id Time VariableA VariableB VariableC     ;
cards;
1 1        3               2              9
1 2        8               5             11
1 3        2               7              5
2 1        3               2              9
2 2        8               5             11
2 3        3               2              9
2 4        8               5             11
;
run;

/*** First way  Need reorder variables***/
proc sql noprint;
select max(n) into : n from (select count(*)as n from have group by id);
quit;
%put &n;
proc summary data=have;
by id;
output out=want(drop=_:) idgroup(out[&n] (Time VariableA VariableB VariableC) = );
run;

/** Second way No need reorder variables**/
proc sort data=have(keep=time) out=temp nodupkey;by time;run;
data _null_;
 set temp end=last;
 if _n_ eq 1 then call execute('data want1;merge ');
 call execute(catt('have(where=(time_',time,'=',time,') rename=(Time=Time_',time,' VariableA=VariableA_',time,' VariableC=VariableC_',time,'))'));
 if last then call execute(';by id;run;');
run;

Xia Keshan

jakarman
Barite | Level 11

Hatsie, You answer to optimist is one to think about. Adam datasets Analysis Data Model (ADaM) | CDISC have a well defined structure for some column oriented approaches.

There is lot to find at http://www.lexjansen.com/ for pharma very related to public health. In any case if you want to do something on the US markt you must be aware of the fda.

That is a long intro as advice to you forgetting that idea of putting all variables in one record.

Conformance to common approaches and guidelines is more important.

Xia has already started showing something like that to you.    

---->-- ja karman --<-----
patelshree
Fluorite | Level 6

Hatsie,

There are so many good suggestions here and the unanimous theme is that you probably should leave the data as is (vertical) or convert it into a normalized structure. If I were given this data and if I understand your requirement correctly, I would approach it this way - to get the daily activity sums. You can employ various SAS procedures to analyze it further.

proc sql;

  create table summ as

  select distinct Participant, datepart(time) as Day format=date9., sum(steps) as Sum_Steps, sum(standups) as Sum_Standups <, and so on>

  from guysmiley

  where <conditions to eliminate the records when the participant is sleeping>

  group by Participant, datepart(time);

quit;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 5019 views
  • 1 like
  • 13 in conversation