BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
OlsabeckT29
Fluorite | Level 6

Hello, I am working with a long dataset  of about 15 million records that currently is listing out the vaccine details for each shot a person received. I need to get this dataset into a wide dataset. I have 9 variables that need to be transposed and would rather not do 9 proc transpose steps with this large of a dataset. I have recip_id as my by variable and Dose1 is what I need to transpose on. Dose1 ranges from 1-13.

 

Is there a more efficient way to get what I need? I am running SAS 9.4

Below is an example of what I have vs what I want:

 

data have;

input recip_id Dose1 admin_date CVX MC Auth Lot_id dose_num vax_fund admin_state interval

Datalines;

1 1 10/16/2021 208 PF US acvds 1 k MI .

1 2 11/6/2021 208 PF US bdsa 2 k MI  21

1 3 5/29/2021 207 MO US bde1de 3 k MI 90

2 1 6/7/2021 207 MO US 2ds42 1 h OH . 

3 1 7/7/2021 208 PF US 3fs1 1 h WI .

3 2 7/7/2021 208 PF US fews2 2 k WI 28

3 3 8/4/2021 208 PF US gesw21 3 k WI 87

4 1 1/21/2022 208 PF US fewq21 1 h NY .

4 2 12/22/2020 208 PF US gresa21 2 h NY 28

5 1 1/12/2021 212 JS US gewa2 1 k WV .

6 1 4/12/2021 207 MO US fewqa2 1 k IN .

6 2 4/29/2021 207 MO US gewad2 2 k IN 27

   

Want 

recip_id Dose1 admin_date1 CVX1 MC1 Auth Lot_id1 dose_num1 vax_fund1 admin_state1 interval1 admin_date2 CVX2 MC2  Auth2 Lot_id2 dose_num2 vax_fund2 admin_state2 interval2 admin_date3 CVX3 MC3 Auth3 Lot_id3 dose_num3 vax_fund3 admin_state3 interval3

1 1 10/16/2021 208 PF US acvds 1 k MI . 11/6/2021 208 PF US bdsa 2 k MI  21 5/29/2021 207 MO US bde1de 3 k MI 90

2 1 6/7/2021 207 MO US 2ds42 1 h OH . 

3 1 7/7/2021 208 PF US 3fs1 1 h WI . 7/7/2021 208 PF US fews2 2 k WI 28 8/4/2021 208 PF US gesw21 3 k WI 87

4 1 1/21/2022 208 PF US fewq21 1 h NY . 12/22/2020 208 PF US gresa21 2 h NY 28

5 1 1/12/2021 212 JS US gewa2 1 k WV .

6 1 4/12/2021 207 MO US fewqa2 1 k IN .  4/29/2021 207 MO US gewad2 2 k IN 27

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why idoesn't the dashboard just read the data directly from the source database?  Why copy it into a text file?

What type of text file? A delimited file, like a CSV file?

How robust is the tool that reads the "text" file?   Could it handle having some lines with fewer values than others?

If so then just generate the text file directly with a data step.

data _null_;
  set have;
  by id;
  file out dsd ;
  if first.id then put irecip_id Dose1 @;
  put admin_date CVX MC Auth Lot_id dose_num vax_fund admin_state interval @;
  if last.id then put;
run;

And if you need add the extra commas at the ends of the lines then just make the code a little more complex.

For example if there should be 10 replicates of those 9 variables then you could use something like this to write the extra commas to ends of the lines for the ids that had fewer than ten.

  if first.id then put irecip_id Dose1 @;
  put admin_date CVX MC Auth Lot_id dose_num vax_fund admin_state interval @;
  dummy=' ';
  if last.id then do;
    do i=1 to 9*(10-dose1); put dummy @; end
    put;
  end;

You could even make it smart enough to count the number of observations if that DOSE1 variable is not actually a count of the number of observations per ID.

 

View solution in original post

7 REPLIES 7
HB
Barite | Level 11 HB
Barite | Level 11
May I ask why you would want to do that? You are denormalizing the data- things like state, state2, state3, state4, etc. are bad from a database design perspective.
What is the goal? To display vaccine history by personID? Because you can do lots of things to present the data without messing up the data structure.
ballardw
Super User

Do you actually need a data set? What will you do with the data set?

If you are expecting a human to read the result that would be a Report and making a report would likely be the approach.

 

Did you test your data step before providing it to us? It throws many 'invalid data' errors. Your input statement (besides not ending in a semicolon ) would attempt to make all of the variables numeric. Most of your variables are apparently not numeric plus there appears to be an issue with spacing, possibly caused by pasting tab delimited data. Also the datalines do not end in a semicolon either.

OlsabeckT29
Fluorite | Level 6

The dataset I have in sas is read through an oracle connection and has millions of observations. I just type it into the question to provide an example of what I am working with. 

 

This dataset is used to generate a dashboard in Power BI so the end result is an exported txt file. The dashboard shows vaccine coverage rates. It needs to be a person level dataset, not a shot level dataset to do this. 

Tom
Super User Tom
Super User

Why idoesn't the dashboard just read the data directly from the source database?  Why copy it into a text file?

What type of text file? A delimited file, like a CSV file?

How robust is the tool that reads the "text" file?   Could it handle having some lines with fewer values than others?

If so then just generate the text file directly with a data step.

data _null_;
  set have;
  by id;
  file out dsd ;
  if first.id then put irecip_id Dose1 @;
  put admin_date CVX MC Auth Lot_id dose_num vax_fund admin_state interval @;
  if last.id then put;
run;

And if you need add the extra commas at the ends of the lines then just make the code a little more complex.

For example if there should be 10 replicates of those 9 variables then you could use something like this to write the extra commas to ends of the lines for the ids that had fewer than ten.

  if first.id then put irecip_id Dose1 @;
  put admin_date CVX MC Auth Lot_id dose_num vax_fund admin_state interval @;
  dummy=' ';
  if last.id then do;
    do i=1 to 9*(10-dose1); put dummy @; end
    put;
  end;

You could even make it smart enough to count the number of observations if that DOSE1 variable is not actually a count of the number of observations per ID.

 

pink_poodle
Barite | Level 11

You would need to transpose long to wide using arrays. Here is a post about it:

https://communities.sas.com/t5/SAS-Programming/Long-to-wide-Array/td-p/158600

 

Ksharp
Super User
 

data have;
input recip_id Dose1 (admin_date CVX MC Auth Lot_id dose_num vax_fund admin_state interval) (:$40.);
Datalines;
1 1 10/16/2021 208 PF US acvds 1 k MI .
1 2 11/6/2021 208 PF US bdsa 2 k MI  21
1 3 5/29/2021 207 MO US bde1de 3 k MI 90
2 1 6/7/2021 207 MO US 2ds42 1 h OH . 
3 1 7/7/2021 208 PF US 3fs1 1 h WI .
3 2 7/7/2021 208 PF US fews2 2 k WI 28
3 3 8/4/2021 208 PF US gesw21 3 k WI 87
4 1 1/21/2022 208 PF US fewq21 1 h NY .
4 2 12/22/2020 208 PF US gresa21 2 h NY 28
5 1 1/12/2021 212 JS US gewa2 1 k WV .
6 1 4/12/2021 207 MO US fewqa2 1 k IN .
6 2 4/29/2021 207 MO US gewad2 2 k IN 27
;
run;

proc sql noprint;
select distinct catt('have(where=(Dose1=',Dose1,') rename=(
admin_date=admin_date',Dose1,' CVX=CVX',Dose1,' MC=MC',Dose1,
' Auth=Auth',Dose1,' Lot_id=Lot_id',Lot_id,' dose_num=dose_num',Dose1,
' vax_fund=vax_fund',Dose1,' admin_state=admin_state',Dose1,' interval=interval',Dose1,'
))') into :merge separated by ' '
from have;
quit;
data want;
merge &merge.;
by recip_id;
drop Dose1;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 679 views
  • 3 likes
  • 6 in conversation