Hey community,
I have some data with a total of 52 variables.
Issue is that these variables when exported from the data capture database they are lumped together all in one excel cell, with different combinations of the variables. For example (made up data has been used for example purposes).
Excel cell A1: site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'
Excel cell B1: fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'
How can I code in SAS so that it presents it like the below?
site_id | pt_initials | dob | sex | form_3_followup_complete | study_id | fup_cont_attemp | fup_time | fup_yesno | fup_nocall | fup_date | fup_comment |
0 | AA | 1800-01-01 | 1 | 0 | 000 |
|
|
|
|
|
|
|
|
|
|
|
| 1 | 30 | 0 | 2 | 1900-01-01 13:30 | Weather is nice today |
Please post an example as TEXT (either attach as a .txt file or paste into the Insert Code pop-up window). It sounds like you are saying you have CSV file with one line and two columns that looks like:
"site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'","fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'"
You could convert that into a file that just has the comma separated NAME=VALUE pairs.
data _null_;
infile 'original.csv' dsd truncover ;
file 'step1.csv' ;
length cell $32767. ;
do until (cell=' ');
input cell :$32767. @;
put cell ',' @;
end;
put +(-1) ' ' ;
run;
site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000',fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'
Then you could read in the name=value pairs and then parse them.
data tall ;
infile 'step1.csv' dsd truncover ;
length row order 8 name $32 value $200 cell $300;
row+1;
do order=1 by 1 until (cell=' ');
input cell @;
if cell=' ' then continue;
name=scan(cell,1,'=');
value=scan(cell,2,'=');
output;
end;
input;
drop cell;
run;
You could then use PROC TRANSPOSE to convert this tall structure into a wide one. But make sure you don't have the same NAME value appearing more than once in the same row.
proc transpose data=tall out=want ;
by row;
id name ;
var value;
run;
You could also add logic at some point to convert some of the values into numeric, probably based on the name.
Export from the database to a file format that is easier to read, e.g. csv. In fact your text data already seems to be comma delimited.
yes it is cvs, the issue being i have been given the data.
The issue being is that if you delimit it via comma, then you get each variable + the value in the same cell.
I want to have the variable name as the column title and the value below such as in the example.
My first reaction is to tell you to go back to the data provider and ask for data in a more common structure for data exchange (like a proper .csv or an XML).
But let's assume you have to deal with what you've got. If this is a text file in the structure as you describe then below code should do the job. I haven't added all the variables but the code demonstrates the approach and that it's working.
/* create temporary file with .csv data structure */
filename havecsv temp lrecl=400;
data _null_;
file havecsv;
length str $400;
str="site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'";
str=cats('"',str,'","',"fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'",'"');
put str;
stop;
run;
/* read .csv structured data */
/*filename havecsv '<path to your .csv>' lrecl=400;*/
data want;
attrib
site_id informat=$10.
pt_initials informat=$10.
dob informat=yymmdd10. format=date9.
sex informat=$1.
fup_date informat=ymddttm24. format=datetime20.
;
infile havecsv truncover scanover dlm=',' dsd;
input
@'site_id =' site_id
@'pt_initials =' pt_initials
@'dob =' dob
@'sex =' sex
@'fup_date =' fup_date
;
run;
proc print;
run;
The code will read all the source data belonging to the same line into a single observation meaning your "cell A" and "cell B" will be read into a single SAS observation.
That's not how you show us the desired result but given your source data that's how it looks like to me you should be doing it. This assumes that there is never the same variable name "the same key value" on a single line (like same "variable" in different cells on the same row).
thank you, i am new to SAS to this will also give me an opportunity to research all the code terms you have provided. I will reply with how i go 🙂
The important keywords to look-up stuff in the documentation are SCANOVER in combination with DSD.
Reading data in the structure you have is not very common and documentation might be a bit "spread out" - but it's all there.
SCANOVER allows to parse an input buffer for strings (using these strings as "delimiter") and then read the value following the string.
DSD used together with SCANOVER then allows to read the value within the quotes (with the quotes excluded from the value read).
The INFORMAT instructs SAS how to read the text value from source into a SAS variable. The FORMAT instructs SAS how to present (print) the value stored in a SAS variable.
Defining SAS variables using an ATTRIB statement will implicitly create SAS numeric or character variables based on the Informat and format used. It will also implicitly assign the Lengths to the variables based on the Informat/Format definitions. You can use the LENGTH option of the ATTRIB statement to explicitly define the type (character or numeric) and length of a variable.
Please post an example as TEXT (either attach as a .txt file or paste into the Insert Code pop-up window). It sounds like you are saying you have CSV file with one line and two columns that looks like:
"site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000'","fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'"
You could convert that into a file that just has the comma separated NAME=VALUE pairs.
data _null_;
infile 'original.csv' dsd truncover ;
file 'step1.csv' ;
length cell $32767. ;
do until (cell=' ');
input cell :$32767. @;
put cell ',' @;
end;
put +(-1) ' ' ;
run;
site_id = '0', pt_initials = 'AA', dob = '1800-01-01', sex = '1', form_3_followup_complete = '0', study_id = '000',fup_cont_attemp = '1', fup_time = '30', fup_yesno = '0', fup_nocall = '2', fup_date = '1900-01-01 13:30', fup_comment = 'weather is nice today'
Then you could read in the name=value pairs and then parse them.
data tall ;
infile 'step1.csv' dsd truncover ;
length row order 8 name $32 value $200 cell $300;
row+1;
do order=1 by 1 until (cell=' ');
input cell @;
if cell=' ' then continue;
name=scan(cell,1,'=');
value=scan(cell,2,'=');
output;
end;
input;
drop cell;
run;
You could then use PROC TRANSPOSE to convert this tall structure into a wide one. But make sure you don't have the same NAME value appearing more than once in the same row.
proc transpose data=tall out=want ;
by row;
id name ;
var value;
run;
You could also add logic at some point to convert some of the values into numeric, probably based on the name.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.