BookmarkSubscribeRSS Feed
mvhoya
Obsidian | Level 7

Hi SAS Communities,

 

I have quite a large dataset currently in long form with 71 variables and each observation contains an ID, a visit date, and then the remaining 69 variables such as age, sex, z-scores, etc. I would like to create one row for each ID, so that I'm able to run proc freq for the other variables once the IDs are grouped together. The output I hope to get would be a column for ID, a column for each visit_date (visit 1 to visit 28), and a column for each remaining variable. I attempted this on my own and I'm able to see each visit date in the same row as the ID it belongs to, but I am missing all of my other variables. Lastly, the reason I would like to transpose the structure of the data is because with the data being in a long structure, when I run proc freq for any variable such as sex, it counts each observation for the same ID as an individual observation of sex which is not accurate because it's an overestimation. If there's another way to code proc freq, that would be helpful as well but I think transposing the data seems to be the best solution. Thank you in advance for your suggestions and assistance.

 

The code I started and attempted to use is below:

/* Converting data from long to wide */
proc transpose data=work.clean out=work.wide (drop=_name_
								              rename = (col1 = visit_1));
    var visit_date;
    by id;
	run;

proc print data = work.wide;
	run;

 

4 REPLIES 4
PhilC
Rhodochrosite | Level 12

Why not do both?  You have, to begin, demographics data, which is usually connecting one ID to one or more personal traits.  Such a dataset would have one row per ID. 

 

Next you have historical data pertaining to "visits"  your data will have columns  where by the data is a specific measurement which is connected to both an individual ID and a individual visit. 

 

Separate it, divide and conquer it.  There no need to convert the whole file is there?  It's kind of overkill, at least it seems that way from an observer's perspective whom does not know the data very well.

mvhoya
Obsidian | Level 7
My dataset currently contains multiple observations (rows) for each ID. So the same ID can appear in the dataset between 1 to 28 times and each time, it is linked to a separate visit_date. That's why I think converting it to a wide structure would allow me to run my frequencies more effectively, with the visit_dates all grouped to one ID in one row.
Reeza
Super User
If you wanted to do your frequencies by a specific time period this wouldn't work though.
ballardw
Super User

@mvhoya wrote:
My dataset currently contains multiple observations (rows) for each ID. So the same ID can appear in the dataset between 1 to 28 times and each time, it is linked to a separate visit_date. That's why I think converting it to a wide structure would allow me to run my frequencies more effectively, with the visit_dates all grouped to one ID in one row.

Frequencies of what?

With upto 28 variables that represent the same thing you need to use 28 variable names to get any frequency, whether in a data step to "count" one way or another or in any of the procs like freq or a report procedure.

 

Here's an idea: provide a dummy data set of 4 or 5 ID values with varying numbers of observations, say 3 to 10 at most. Then manually determine the sorts of output you need from that example data and show use the desired result with any "rules".

 

Or if that is two complicated look at the SAS supplied practice data sets like SASHELP.PRDSALE that has multiple estimates and results for different periods (think "visit" for time period), or SASHELP.STOCKS that has monthly summaries for multiple stock price values for different stocks.

 

If you have not worked with Proc Report or Tabulate then you might be surprised with just how many types of summary reports can be generated.

 

The main reason to move multiple records into a single one, in my opinion, is for modeling where you want one outcome per ID as the dependent variable of a model and need all the the other values to create the model. Which is way more than some frequencies.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 833 views
  • 0 likes
  • 4 in conversation