BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Luke01
Fluorite | Level 6

Hey community,

 

I have some data with a total of 52 variables.

 

Issue is that these variables when exported from the data capture database they are lumped together all in one excel cell, with different combinations of the variables. For example (made up data has been used for example purposes).

 

Excel cell A1:        site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'

Excel cell B1:        fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'

 

How can I code in SAS so that it presents it like the below?

site_id

pt_initials

dob

sex

form_3_followup_complete

study_id

fup_cont_attemp

fup_time

fup_yesno

fup_nocall

fup_date

fup_comment

0

AA

1800-01-01

1

0

000

 

 

 

 

 

 

 

 

 

 

 

 

1

30

0

2

1900-01-01 13:30

Weather is nice today

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Please post an example as TEXT (either attach as a .txt file or paste into the Insert Code pop-up window).  It sounds like you are saying you have CSV file with one line and two columns that looks like:

"site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'","fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'"

You could convert that into a file that just has the comma separated NAME=VALUE pairs.

data _null_;
   infile 'original.csv' dsd truncover ;
   file 'step1.csv' ;
   length cell $32767. ;
   do until (cell=' ');
     input cell :$32767.  @;
     put cell ',' @;
   end;
   put +(-1) ' ' ;
run;
site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000',fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'

Then you could read in the name=value pairs and then parse them.

data tall ;
  infile 'step1.csv' dsd truncover ;
  length row order 8 name $32 value $200 cell $300;
  row+1;
  do order=1 by 1 until (cell=' ');
     input cell @;
     if cell=' ' then continue;
     name=scan(cell,1,'=');
     value=scan(cell,2,'=');
     output;
  end;
  input;
  drop cell;
run;

You could then use PROC TRANSPOSE to convert this tall structure into a wide one. But make sure you don't have the same NAME value appearing more than once in the same row.

proc transpose data=tall out=want ;
  by row;
  id name ;
  var value;
run;

You could also add logic at some point to convert some of the values into numeric, probably based on the name.

 

View solution in original post

6 REPLIES 6
Luke01
Fluorite | Level 6

yes it is cvs, the issue being i have been given the data.

 

The issue being is that if you delimit it via comma, then you get each variable + the value in the same cell.

 

I want to have the variable name as the column title and the value below such as in the example.

Patrick
Opal | Level 21

@Luke01

My first reaction is to tell you to go back to the data provider and ask for data in a more common structure for data exchange (like a proper .csv or an XML).

But let's assume you have to deal with what you've got. If this is a text file in the structure as you describe then below code should do the job. I haven't added all the variables but the code demonstrates the approach and that it's working.

/* create temporary file with .csv data structure */
filename havecsv temp lrecl=400;
data _null_;
  file havecsv;
  length str $400;
  str="site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'";
  str=cats('"',str,'","',"fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'",'"');
  put str;
  stop;
run;

/* read .csv structured data */
/*filename havecsv '<path to your .csv>' lrecl=400;*/
data want;
  attrib
    site_id     informat=$10.
    pt_initials informat=$10.
    dob         informat=yymmdd10.  format=date9.
    sex         informat=$1.
    fup_date    informat=ymddttm24. format=datetime20.
    ;
  infile havecsv truncover scanover dlm=',' dsd;
  input 
    @'site_id =' site_id
    @'pt_initials =' pt_initials
    @'dob =' dob
    @'sex =' sex
    @'fup_date =' fup_date
    ;
run;

proc print;
run;

Capture.JPG

 

The code will read all the source data belonging to the same line into a single observation meaning your "cell A" and "cell B" will be read into a single SAS observation. 

That's not how you show us the desired result but given your source data that's how it looks like to me you should be doing it. This assumes that there is never the same variable name "the same key value" on a single line (like same "variable" in different cells on the same row). 

Luke01
Fluorite | Level 6

thank you, i am new to SAS to this will also give me an opportunity to research all the code terms you have provided. I will reply with how i go 🙂

Patrick
Opal | Level 21

@Luke01 

The important keywords to look-up stuff in the documentation are SCANOVER in combination with DSD.

Reading data in the structure you have is not very common and documentation might be a bit "spread out" - but it's all there.

 

SCANOVER allows to parse an input buffer for strings (using these strings as "delimiter") and then read the value following the string.

DSD used together with SCANOVER then allows to read the value within the quotes (with the quotes excluded from the value read).

The INFORMAT instructs SAS how to read the text value from source into a SAS variable. The FORMAT instructs SAS how to present (print) the value stored in a SAS variable.

Defining SAS variables using an ATTRIB statement will implicitly create SAS numeric or character variables based on the Informat and format used. It will also implicitly assign the Lengths to the variables based on the Informat/Format definitions. You can use the LENGTH option of the ATTRIB statement to explicitly define the type (character or numeric) and length of a variable.

Tom
Super User Tom
Super User

Please post an example as TEXT (either attach as a .txt file or paste into the Insert Code pop-up window).  It sounds like you are saying you have CSV file with one line and two columns that looks like:

"site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'","fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'"

You could convert that into a file that just has the comma separated NAME=VALUE pairs.

data _null_;
   infile 'original.csv' dsd truncover ;
   file 'step1.csv' ;
   length cell $32767. ;
   do until (cell=' ');
     input cell :$32767.  @;
     put cell ',' @;
   end;
   put +(-1) ' ' ;
run;
site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000',fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'

Then you could read in the name=value pairs and then parse them.

data tall ;
  infile 'step1.csv' dsd truncover ;
  length row order 8 name $32 value $200 cell $300;
  row+1;
  do order=1 by 1 until (cell=' ');
     input cell @;
     if cell=' ' then continue;
     name=scan(cell,1,'=');
     value=scan(cell,2,'=');
     output;
  end;
  input;
  drop cell;
run;

You could then use PROC TRANSPOSE to convert this tall structure into a wide one. But make sure you don't have the same NAME value appearing more than once in the same row.

proc transpose data=tall out=want ;
  by row;
  id name ;
  var value;
run;

You could also add logic at some point to convert some of the values into numeric, probably based on the name.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 762 views
  • 3 likes
  • 4 in conversation