Hello,
I have a .csv file with JSON data that looks as follows (sample of data below):
id | dep_date | dep_1_PA | date_2 | dep_2_PA |
1346 | 1/20/2019 | ['{"dep_count":4,"dep_duration":4,"happy_count":10,"qual":95,"end_datetime":"2019-01-25T06:44:30Z","date":"2019-01-25","mid":"2019-01-25T04:08:00Z","counter_report":0,"counter_nap":297,"counter_dep":15,"counter_to_eat":0}' '{"dep_count":0,"dep_duration":0,"happy_count":5,"qual":96,"end_datetime":"2019-01-26T07:15:30Z","date":"2019-01-26","mid":"2019-01-26T04:39:30Z","counter_report":0,"counter_nap":300,"counter_dep":13,"counter_to_eat":0}' '{"dep_count":1,"dep_duration":1,"happy_count":4,"qual":99,"end_datetime":"2019-01-27T06:46:00Z","date":"2019-01-27","mid":"2019-01-27T04:13:30Z","counter_report":6,"counter_nap":296,"counter_dep":2,"counter_to_eat":0}'] |
2/19/2019 | ['{"dep_count":0,"dep_duration":0,"happy_count":15,"qual":96,"end_datetime":"2019-02-04T12:25:30Z","date":"2019-02-04","mid":"2019-02-04T09:01:30Z","counter_report":0,"counter_nap":392,"counter_dep":17,"counter_to_eat":0} '{"dep_count":4,"dep_duration":7,"happy_count":13,"qual":93,"end_datetime":"2019-02-05T09:09:30Z","date":"2019-02-05","mid":"2019-02-05T06:13:00Z","counter_report":0,"counter_nap":327,"counter_dep":26,"counter_to_eat":0}' '{"dep_count":1,"dep_duration":3,"happy_count":12,"qual":95,"end_datetime":"2019-02-06T12:05:30Z","date":"2019-02-06","mid":"2019-02-06T08:12:00Z","counter_report":1,"counter_nap":444,"counter_dep":21,"counter_to_eat":0}'] |
1786 | 3/14/2019 | ['{"dep_count":2,"dep_duration":2,"happy_count":8,"qual":97,"end_datetime":"2019-03-10T05:39:00Z","date":"2019-03-10","mid":"2019-03-10T03:00:30Z","counter_report":0,"counter_nap":307,"counter_dep":9,"counter_to_eat":0}' '{"dep_count":0,"dep_duration":0,"happy_count":5,"qual":99,"end_datetime":"2019-03-11T04:38:30Z","date":"2019-03-11","mid":"2019-03-11T01:50:30Z","counter_report":0,"counter_nap":332,"counter_dep":5,"counter_to_eat":0}' '{"dep_count":1,"dep_duration":4,"happy_count":15,"qual":92,"end_datetime":"2019-03-12T05:49:00Z","date":"2019-03-12","mid":"2019-03-12T02:16:00Z","counter_report":0,"counter_nap":393,"counter_dep":34,"counter_to_eat":0}'] |
4/14/2019 | [] |
I need it to look like this:
ID | dep_date | dep_count | dep_duration | happy_count | qual | end_datetime | date | mid | counter_report | counter_nap | counter_dep | counter_to_eat |
1346 | 1/20/2019 | 4 | 4 | 10 | 95 | 2019-01-25T06:44:30Z | 1/25/2019 | 2019-01-25T04:08:00Z | 0 | 297 | 15 | 0 |
1346 | 1/20/2019 | 0 | 0 | 5 | 96 | 2019-01-26T07:15:30Z | 1/26/2019 | 2019-01-26T04:39:30Z | 0 | 300 | 13 | 0 |
1346 | 1/20/2019 | 1 | 1 | 4 | 99 | 2019-01-27T06:46:00Z | 1/27/2019 | 2019-01-27T04:13:30Z | 6 | 296 | 2 | 0 |
1346 | 2/19/2019 | 0 | 0 | 15 | 96 | 2019-02-04T12:25:30Z | 2/4/2019 | 2019-02-04T09:01:30Z | 0 | 392 | 17 | 0 |
1346 | 2/19/2019 | 4 | 7 | 13 | 93 | 2019-02-05T09:09:30Z | 2/5/2019 | 2019-02-05T06:13:00Z | 0 | 327 | 26 | 0 |
1346 | 2/19/2019 | 1 | 3 | 12 | 95 | 2019-02-06T12:05:30Z | 2/6/2019 | 2019-02-06T08:12:00Z | 1 | 444 | 21 | 0 |
1786 | 3/14/2019 | 2 | 2 | 8 | 97 | 2019-03-10T05:39:00Z | 3/10/2019 | 2019-03-10T03:00:30Z | 0 | 307 | 9 | 0 |
1786 | 3/14/2019 | 0 | 0 | 5 | 99 | 2019-03-11T04:38:30Z | 3/11/2019 | 2019-03-11T01:50:30Z | 0 | 332 | 5 | 0 |
1786 | 3/14/2019 | 1 | 4 | 15 | 92 | 2019-03-12T05:49:00Z | 3/12/2019 | 2019-03-12T02:16:00Z | 0 | 393 | 34 | 0 |
1786 | 4/14/2019 | |||||||||||
1786 | 4/14/2019 | |||||||||||
1786 | 4/14/2019 |
|
Is there a way to parse this JSON data in SAS? I can't do it in Excel. Thanks.
Once you have the JSON data properly extracted from the CSV file you can finally turn to trying to parse it.
data tall;
set json(keep=id type rep json);
where json ne ' ';
length line row 8 name $32 value $200 term $300 record $1500;
do line=1 to countw(json,'|');
record=left(scan(json,line,'|'));
do row=1 by 1 until(term=' ');
term=scan(record,row,"[{,'}]"||'A0'x);
term=translate(term,' ','A0'x);
name=dequote(scan(term,1,':','q'));
value=dequote(scan(term,2,':','q'));
if term ne ' ' then output;
end;
end;
drop json term record;
run;
proc sort;
by id type rep line row;
run;
proc transpose data=tall out=wide(drop=_name_) ;
where name ne 'id';
by id type rep line ;
id name;
var value;
run;
Looks like you also might need to do a little more cleaning. It appears some of the JSON strings have MID instead of MIDPOINT for one of the timestamp variables.
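If the two spellings need harmonizing before transposing, a small recode of the NAME values could do it. This is only a sketch; whether "mid" or "midpoint" is the canonical spelling is an assumption (the sample data shows "mid"):

```sas
data tall;
  set tall;
  /* harmonize key-name variants; mapping MIDPOINT down to mid is assumed */
  if upcase(name)='MIDPOINT' then name='mid';
run;
```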
Please attach a representative sample .csv. We need to understand in detail what your .csv looks like - especially where the line feeds are. Make sure the data you share really looks like the source the SAS code needs to process (i.e., avoid any intermediary step using Excel when providing the data).
I don't see how I can attach the .csv to a post, so I've uploaded it here
@confooseddesi89 wrote:
I don't see how I can attach the .csv to a post, so I've uploaded it here
@confooseddesi89 Just drag&drop. I've done it here now because many people won't download from some link.
The following code won't return the final result you're likely after, but it should take you one step further. Please don't just state what's still missing; tell us where you need further help.
One of the challenges with your data: the JSON needs some clean-up before it can be used with a JSON engine. The question is where all this additional quoting got added.
Could it be that the .csv you've shared is some export from a database where the json string is stored in a CLOB field? Would you in the end directly access the database? And if so: Which database?
%let source_path=C:\temp;
data step1;
infile "&source_path\dep_PA to paste sample v3 for sas forum.csv"
firstobs=2
lrecl=10000
encoding="utf-8"
termstr=crlf
dlm=','
truncover
dsd
;
length id dep 8;
format dep_date date9.;
input id dep_date:mmddyy10. dep_PA:$2000. @;
dep=1;
dep_pa=transtrn(dep_pa,"['",'[');
dep_pa=transtrn(dep_pa,"']",']');
dep_pa=transtrn(dep_pa,'""','"');
output;
input dep_date:mmddyy10. dep_PA:$2000.;
dep=2;
dep_pa=transtrn(dep_pa,"['",'[');
dep_pa=transtrn(dep_pa,"']",']');
dep_pa=transtrn(dep_pa,'""','"');
output;
run;
data step2;
set step1;
length name $32 value $40 ;
do i=1 by 1 until(name=' ');
name=dequote(scan(scan(dep_pa,i,"{{,'}]"),1,':'));
value=scan(scan(dep_pa,i,"{{,'}]"),2,':');
if name ne ' ' then output;
end;
run;
You have posted a listing, not a CSV file. A CSV file is a text file with only one line per observation.
The listing you posted appears to have line breaks in the middle of some of the cells. That will not work.
So the first step is to make sure you actually have just one line of text per observation in the CSV file.
Also, it would be a lot easier to use something other than a comma as the separator in your text file, since the comma is a common character in JSON text. So perhaps use a pipe character as the delimiter, or a TAB if you never have to actually look at the file with an editor. TAB characters are normally invisible, which makes files that contain them hard to view or modify.
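For example, with a pipe-delimited file only the DLM option changes. A sketch, where the file name and variable lengths are assumptions:

```sas
data have;
  /* pipe as delimiter avoids clashing with the commas inside the JSON */
  infile 'c:\temp\have.txt' dlm='|' dsd truncover firstobs=2 lrecl=10000;
  length id 8 dep_1_PA dep_2_PA $2000;
  input id dep_date :mmddyy10. dep_1_PA date_2 :mmddyy10. dep_2_PA;
  format dep_date date_2 mmddyy10.;
run;
```

For a TAB-delimited file, use dlm='09'x instead.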
The text you posted also does not look like JSON text to me. What is that extra single quote doing at the beginning?
['{"dep_count":4,"dep_duration":4,"happy_count":10
It is hard to tell from your post what the values in the DEP_1_PA and DEP_2_PA fields really contain, but if they are consistently structured as those NAME:VALUE pairs then perhaps you can just use SCAN() to help you.
Try doing something like this to see what you get.
data tall;
set have;
length variable name $32 value $40 ;
array x DEP_1_PA DEP_2_PA;
do over x;
variable = scan(vname(x),2,'_');
do i=1 by 1 until(name=' ');
name=dequote(scan(scan(x,i,"{{,'}]"),1,':'));
value=scan(scan(x,i,"{{,'}]"),2,':');
if name ne ' ' then output;
end;
end;
drop DEP_1_PA DEP_2_PA;
run;
So you should now have a dataset with ID, DEP_DATE, DATE_2, VARIABLE, NAME and VALUE.
You can then try re-ordering and using PROC TRANSPOSE.
proc sort data=tall;
by id dep_date date_2 name variable ;
run;
proc transpose data=tall out=want;
by id dep_date date_2;
id name variable;
var value;
run;
Hello,
Using the code you gave me, I get this error:
ERROR: The ID value "__1" occurs twice in the same BY group.
ERROR: The ID value "counter_dep1" occurs twice in the same BY group.
ERROR: The ID value "counter_dep1" occurs twice in the same BY group.
ERROR: The ID value "counter_dep2" occurs twice in the same BY group.
ERROR: The ID value "counter_dep2" occurs twice in the same BY group.
ERROR: The ID value "counter_nap1" occurs twice in the same BY group.
ERROR: The ID value "counter_nap1" occurs twice in the same BY group.
ERROR: The ID value "counter_nap2" occurs twice in the same BY group.
ERROR: The ID value "counter_nap2" occurs twice in the same BY group.
ERROR: The ID value "counter_report1" occurs twice in the same BY group.
ERROR: The ID value "counter_report1" occurs twice in the same BY group.
ERROR: The ID value "counter_report2" occurs twice in the same BY group.
ERROR: The ID value "counter_report2" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat1" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat1" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat2" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat2" occurs twice in the same BY group.
ERROR: The ID value "date1" occurs twice in the same BY group.
ERROR: The ID value "date1" occurs twice in the same BY group.
ERROR: The ID value "date2" occurs twice in the same BY group.
ERROR: The ID value "date2" occurs twice in the same BY group.
ERROR: The ID value "dep_count1" occurs twice in the same BY group.
ERROR: The ID value "dep_count1" occurs twice in the same BY group.
ERROR: The ID value "dep_count2" occurs twice in the same BY group.
ERROR: The ID value "dep_count2" occurs twice in the same BY group.
ERROR: The ID value "dep_duration1" occurs twice in the same BY group.
ERROR: The ID value "dep_duration1" occurs twice in the same BY group.
ERROR: The ID value "dep_duration2" occurs twice in the same BY group.
ERROR: The ID value "dep_duration2" occurs twice in the same BY group.
ERROR: The ID value "end_datetime1" occurs twice in the same BY group.
ERROR: The ID value "end_datetime1" occurs twice in the same BY group.
ERROR: The ID value "end_datetime2" occurs twice in the same BY group.
ERROR: The ID value "end_datetime2" occurs twice in the same BY group.
ERROR: The ID value "happy_count1" occurs twice in the same BY group.
ERROR: The ID value "happy_count1" occurs twice in the same BY group.
ERROR: The ID value "happy_count2" occurs twice in the same BY group.
ERROR: The ID value "happy_count2" occurs twice in the same BY group.
ERROR: The ID value "mid1" occurs twice in the same BY group.
ERROR: The ID value "mid1" occurs twice in the same BY group.
ERROR: The ID value "mid2" occurs twice in the same BY group.
ERROR: The ID value "mid2" occurs twice in the same BY group.
ERROR: The ID value "qual1" occurs twice in the same BY group.
ERROR: The ID value "qual1" occurs twice in the same BY group.
ERROR: The ID value "qual2" occurs twice in the same BY group.
ERROR: The ID value "qual2" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
id=1346 dep_date=01/20/2019 date_2=02/19/2019
ERROR: The ID value "__1" occurs twice in the same BY group.
ERROR: The ID value "counter_dep1" occurs twice in the same BY group.
ERROR: The ID value "counter_dep1" occurs twice in the same BY group.
ERROR: The ID value "counter_nap1" occurs twice in the same BY group.
ERROR: The ID value "counter_nap1" occurs twice in the same BY group.
ERROR: The ID value "counter_report1" occurs twice in the same BY group.
ERROR: The ID value "counter_report1" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat1" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat1" occurs twice in the same BY group.
ERROR: The ID value "date1" occurs twice in the same BY group.
ERROR: The ID value "date1" occurs twice in the same BY group.
ERROR: The ID value "dep_count1" occurs twice in the same BY group.
ERROR: The ID value "dep_count1" occurs twice in the same BY group.
ERROR: The ID value "dep_duration1" occurs twice in the same BY group.
ERROR: The ID value "dep_duration1" occurs twice in the same BY group.
ERROR: The ID value "end_datetime1" occurs twice in the same BY group.
ERROR: The ID value "end_datetime1" occurs twice in the same BY group.
ERROR: The ID value "happy_count1" occurs twice in the same BY group.
ERROR: The ID value "happy_count1" occurs twice in the same BY group.
ERROR: The ID value "mid1" occurs twice in the same BY group.
ERROR: The ID value "mid1" occurs twice in the same BY group.
ERROR: The ID value "qual1" occurs twice in the same BY group.
ERROR: The ID value "qual1" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
id=1786 dep_date=03/14/2019 date_2=04/14/2019
ERROR: All BY groups were bad.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 109 observations read from the data set WORK.TALL.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 0 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Also, the "tall" dataset looks like below:
id | dep_date | date_2 | variable | name | value | i |
1346 | 1/20/2019 | 2/19/2019 | 1 | 13 | ||
1346 | 1/20/2019 | 2/19/2019 | 1 | 25 | ||
1346 | 1/20/2019 | 2/19/2019 | 2 | 25 | ||
1346 | 1/20/2019 | 2/19/2019 | 2 | 13 | ||
1346 | 1/20/2019 | 2/19/2019 | 1 | [ | 1 | |
1346 | 1/20/2019 | 2/19/2019 | 2 | [ | 1 | |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_dep | 15 | 11 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_dep | 13 | 23 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_dep | 2 | 35 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_dep | 17 | 11 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_dep | 26 | 23 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_dep | 21 | 35 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_nap | 297 | 10 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_nap | 300 | 22 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_nap | 296 | 34 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_nap | 392 | 10 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_nap | 327 | 22 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_nap | 444 | 34 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_report | 0 | 9 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_report | 0 | 21 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_report | 6 | 33 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_report | 0 | 9 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_report | 0 | 21 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_report | 1 | 33 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_to_eat | 0 | 12 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_to_eat | 0 | 24 |
1346 | 1/20/2019 | 2/19/2019 | 1 | counter_to_eat | 0 | 36 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_to_eat | 0 | 12 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_to_eat | 0 | 24 |
1346 | 1/20/2019 | 2/19/2019 | 2 | counter_to_eat | 0 | 36 |
1346 | 1/20/2019 | 2/19/2019 | 1 | date | "2019-01-25" | 7 |
1346 | 1/20/2019 | 2/19/2019 | 1 | date | "2019-01-26" | 19 |
1346 | 1/20/2019 | 2/19/2019 | 1 | date | "2019-01-27" | 31 |
1346 | 1/20/2019 | 2/19/2019 | 2 | date | "2019-02-04" | 7 |
1346 | 1/20/2019 | 2/19/2019 | 2 | date | "2019-02-05" | 19 |
1346 | 1/20/2019 | 2/19/2019 | 2 | date | "2019-02-06" | 31 |
1346 | 1/20/2019 | 2/19/2019 | 1 | dep_count | 4 | 2 |
1346 | 1/20/2019 | 2/19/2019 | 1 | dep_count | 0 | 14 |
1346 | 1/20/2019 | 2/19/2019 | 1 | dep_count | 1 | 26 |
1346 | 1/20/2019 | 2/19/2019 | 2 | dep_count | 0 | 2 |
1346 | 1/20/2019 | 2/19/2019 | 2 | dep_count | 4 | 14 |
1346 | 1/20/2019 | 2/19/2019 | 2 | dep_count | 1 | 26 |
1346 | 1/20/2019 | 2/19/2019 | 1 | dep_duration | 4 | 3 |
1346 | 1/20/2019 | 2/19/2019 | 1 | dep_duration | 0 | 15 |
1346 | 1/20/2019 | 2/19/2019 | 1 | dep_duration | 1 | 27 |
1346 | 1/20/2019 | 2/19/2019 | 2 | dep_duration | 0 | 3 |
1346 | 1/20/2019 | 2/19/2019 | 2 | dep_duration | 7 | 15 |
1346 | 1/20/2019 | 2/19/2019 | 2 | dep_duration | 3 | 27 |
1346 | 1/20/2019 | 2/19/2019 | 1 | end_datetime | "2019-01-25T06 | 6 |
1346 | 1/20/2019 | 2/19/2019 | 1 | end_datetime | "2019-01-26T07 | 18 |
1346 | 1/20/2019 | 2/19/2019 | 1 | end_datetime | "2019-01-27T06 | 30 |
1346 | 1/20/2019 | 2/19/2019 | 2 | end_datetime | "2019-02-04T12 | 6 |
1346 | 1/20/2019 | 2/19/2019 | 2 | end_datetime | "2019-02-05T09 | 18 |
1346 | 1/20/2019 | 2/19/2019 | 2 | end_datetime | "2019-02-06T12 | 30 |
1346 | 1/20/2019 | 2/19/2019 | 1 | happy_count | 10 | 4 |
1346 | 1/20/2019 | 2/19/2019 | 1 | happy_count | 5 | 16 |
1346 | 1/20/2019 | 2/19/2019 | 1 | happy_count | 4 | 28 |
1346 | 1/20/2019 | 2/19/2019 | 2 | happy_count | 15 | 4 |
1346 | 1/20/2019 | 2/19/2019 | 2 | happy_count | 13 | 16 |
1346 | 1/20/2019 | 2/19/2019 | 2 | happy_count | 12 | 28 |
1346 | 1/20/2019 | 2/19/2019 | 1 | mid | "2019-01-25T04 | 8 |
1346 | 1/20/2019 | 2/19/2019 | 1 | mid | "2019-01-26T04 | 20 |
1346 | 1/20/2019 | 2/19/2019 | 1 | mid | "2019-01-27T04 | 32 |
1346 | 1/20/2019 | 2/19/2019 | 2 | mid | "2019-02-04T09 | 8 |
1346 | 1/20/2019 | 2/19/2019 | 2 | mid | "2019-02-05T06 | 20 |
1346 | 1/20/2019 | 2/19/2019 | 2 | mid | "2019-02-06T08 | 32 |
1346 | 1/20/2019 | 2/19/2019 | 1 | qual | 95 | 5 |
1346 | 1/20/2019 | 2/19/2019 | 1 | qual | 96 | 17 |
1346 | 1/20/2019 | 2/19/2019 | 1 | qual | 99 | 29 |
1346 | 1/20/2019 | 2/19/2019 | 2 | qual | 96 | 5 |
1346 | 1/20/2019 | 2/19/2019 | 2 | qual | 93 | 17 |
1346 | 1/20/2019 | 2/19/2019 | 2 | qual | 95 | 29 |
1786 | 3/14/2019 | 4/14/2019 | 1 | 13 | ||
1786 | 3/14/2019 | 4/14/2019 | 1 | 25 | ||
1786 | 3/14/2019 | 4/14/2019 | 1 | [ | 1 | |
1786 | 3/14/2019 | 4/14/2019 | 2 | [ | 1 | |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_dep | 9 | 11 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_dep | 5 | 23 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_dep | 34 | 35 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_nap | 307 | 10 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_nap | 332 | 22 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_nap | 393 | 34 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_report | 0 | 9 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_report | 0 | 21 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_report | 0 | 33 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_to_eat | 0 | 12 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_to_eat | 0 | 24 |
1786 | 3/14/2019 | 4/14/2019 | 1 | counter_to_eat | 0 | 36 |
1786 | 3/14/2019 | 4/14/2019 | 1 | date | "2019-03-10" | 7 |
1786 | 3/14/2019 | 4/14/2019 | 1 | date | "2019-03-11" | 19 |
1786 | 3/14/2019 | 4/14/2019 | 1 | date | "2019-03-12" | 31 |
1786 | 3/14/2019 | 4/14/2019 | 1 | dep_count | 2 | 2 |
1786 | 3/14/2019 | 4/14/2019 | 1 | dep_count | 0 | 14 |
1786 | 3/14/2019 | 4/14/2019 | 1 | dep_count | 1 | 26 |
1786 | 3/14/2019 | 4/14/2019 | 1 | dep_duration | 2 | 3 |
1786 | 3/14/2019 | 4/14/2019 | 1 | dep_duration | 0 | 15 |
1786 | 3/14/2019 | 4/14/2019 | 1 | dep_duration | 4 | 27 |
1786 | 3/14/2019 | 4/14/2019 | 1 | end_datetime | "2019-03-10T05 | 6 |
1786 | 3/14/2019 | 4/14/2019 | 1 | end_datetime | "2019-03-11T04 | 18 |
1786 | 3/14/2019 | 4/14/2019 | 1 | end_datetime | "2019-03-12T05 | 30 |
1786 | 3/14/2019 | 4/14/2019 | 1 | happy_count | 8 | 4 |
1786 | 3/14/2019 | 4/14/2019 | 1 | happy_count | 5 | 16 |
1786 | 3/14/2019 | 4/14/2019 | 1 | happy_count | 15 | 28 |
1786 | 3/14/2019 | 4/14/2019 | 1 | mid | "2019-03-10T03 | 8 |
1786 | 3/14/2019 | 4/14/2019 | 1 | mid | "2019-03-11T01 | 20 |
1786 | 3/14/2019 | 4/14/2019 | 1 | mid | "2019-03-12T02 | 32 |
1786 | 3/14/2019 | 4/14/2019 | 1 | qual | 97 | 5 |
1786 | 3/14/2019 | 4/14/2019 | 1 | qual | 99 | 17 |
1786 | 3/14/2019 | 4/14/2019 | 1 | qual | 92 | 29 |
The parsing by dep_date / date_2 is incorrect. For example, the values 15, 13, and 2 in the "value" column belong to date 1/20/2019, whereas the values 17, 26, and 21 belong to date 2/19/2019. None of these values belong to both of these dates (the second and third columns should be collapsed). I used the code below:
data tall;
set tall;
if variable=2 then dep_date=date_2;
drop date_2;
run;
To get this:
id | dep_date | variable | name | value | i |
1346 | 1/20/2019 | 1 | [ | 1 | |
1346 | 1/20/2019 | 1 | dep_count | 4 | 2 |
1346 | 1/20/2019 | 1 | dep_duration | 4 | 3 |
1346 | 1/20/2019 | 1 | happy_count | 10 | 4 |
1346 | 1/20/2019 | 1 | qual | 95 | 5 |
1346 | 1/20/2019 | 1 | end_datetime | "2019-01-25T06 | 6 |
1346 | 1/20/2019 | 1 | date | "2019-01-25" | 7 |
1346 | 1/20/2019 | 1 | mid | "2019-01-25T04 | 8 |
1346 | 1/20/2019 | 1 | counter_report | 0 | 9 |
1346 | 1/20/2019 | 1 | counter_nap | 297 | 10 |
1346 | 1/20/2019 | 1 | counter_dep | 15 | 11 |
1346 | 1/20/2019 | 1 | counter_to_eat | 0 | 12 |
1346 | 1/20/2019 | 1 | 13 | ||
1346 | 1/20/2019 | 1 | dep_count | 0 | 14 |
1346 | 1/20/2019 | 1 | dep_duration | 0 | 15 |
1346 | 1/20/2019 | 1 | happy_count | 5 | 16 |
1346 | 1/20/2019 | 1 | qual | 96 | 17 |
1346 | 1/20/2019 | 1 | end_datetime | "2019-01-26T07 | 18 |
1346 | 1/20/2019 | 1 | date | "2019-01-26" | 19 |
1346 | 1/20/2019 | 1 | mid | "2019-01-26T04 | 20 |
1346 | 1/20/2019 | 1 | counter_report | 0 | 21 |
1346 | 1/20/2019 | 1 | counter_nap | 300 | 22 |
1346 | 1/20/2019 | 1 | counter_dep | 13 | 23 |
1346 | 1/20/2019 | 1 | counter_to_eat | 0 | 24 |
1346 | 1/20/2019 | 1 | 25 | ||
1346 | 1/20/2019 | 1 | dep_count | 1 | 26 |
1346 | 1/20/2019 | 1 | dep_duration | 1 | 27 |
1346 | 1/20/2019 | 1 | happy_count | 4 | 28 |
1346 | 1/20/2019 | 1 | qual | 99 | 29 |
1346 | 1/20/2019 | 1 | end_datetime | "2019-01-27T06 | 30 |
1346 | 1/20/2019 | 1 | date | "2019-01-27" | 31 |
1346 | 1/20/2019 | 1 | mid | "2019-01-27T04 | 32 |
1346 | 1/20/2019 | 1 | counter_report | 6 | 33 |
1346 | 1/20/2019 | 1 | counter_nap | 296 | 34 |
1346 | 1/20/2019 | 1 | counter_dep | 2 | 35 |
1346 | 1/20/2019 | 1 | counter_to_eat | 0 | 36 |
1346 | 2/19/2019 | 2 | [ | 1 | |
1346 | 2/19/2019 | 2 | dep_count | 0 | 2 |
1346 | 2/19/2019 | 2 | dep_duration | 0 | 3 |
1346 | 2/19/2019 | 2 | happy_count | 15 | 4 |
1346 | 2/19/2019 | 2 | qual | 96 | 5 |
1346 | 2/19/2019 | 2 | end_datetime | "2019-02-04T12 | 6 |
1346 | 2/19/2019 | 2 | date | "2019-02-04" | 7 |
1346 | 2/19/2019 | 2 | mid | "2019-02-04T09 | 8 |
1346 | 2/19/2019 | 2 | counter_report | 0 | 9 |
1346 | 2/19/2019 | 2 | counter_nap | 392 | 10 |
1346 | 2/19/2019 | 2 | counter_dep | 17 | 11 |
1346 | 2/19/2019 | 2 | counter_to_eat | 0 | 12 |
1346 | 2/19/2019 | 2 | 13 | ||
1346 | 2/19/2019 | 2 | dep_count | 4 | 14 |
1346 | 2/19/2019 | 2 | dep_duration | 7 | 15 |
1346 | 2/19/2019 | 2 | happy_count | 13 | 16 |
1346 | 2/19/2019 | 2 | qual | 93 | 17 |
1346 | 2/19/2019 | 2 | end_datetime | "2019-02-05T09 | 18 |
1346 | 2/19/2019 | 2 | date | "2019-02-05" | 19 |
1346 | 2/19/2019 | 2 | mid | "2019-02-05T06 | 20 |
1346 | 2/19/2019 | 2 | counter_report | 0 | 21 |
1346 | 2/19/2019 | 2 | counter_nap | 327 | 22 |
1346 | 2/19/2019 | 2 | counter_dep | 26 | 23 |
1346 | 2/19/2019 | 2 | counter_to_eat | 0 | 24 |
1346 | 2/19/2019 | 2 | 25 | ||
1346 | 2/19/2019 | 2 | dep_count | 1 | 26 |
1346 | 2/19/2019 | 2 | dep_duration | 3 | 27 |
1346 | 2/19/2019 | 2 | happy_count | 12 | 28 |
1346 | 2/19/2019 | 2 | qual | 95 | 29 |
1346 | 2/19/2019 | 2 | end_datetime | "2019-02-06T12 | 30 |
1346 | 2/19/2019 | 2 | date | "2019-02-06" | 31 |
1346 | 2/19/2019 | 2 | mid | "2019-02-06T08 | 32 |
1346 | 2/19/2019 | 2 | counter_report | 1 | 33 |
1346 | 2/19/2019 | 2 | counter_nap | 444 | 34 |
1346 | 2/19/2019 | 2 | counter_dep | 21 | 35 |
1346 | 2/19/2019 | 2 | counter_to_eat | 0 | 36 |
1786 | 3/14/2019 | 1 | [ | 1 | |
1786 | 3/14/2019 | 1 | dep_count | 2 | 2 |
1786 | 3/14/2019 | 1 | dep_duration | 2 | 3 |
1786 | 3/14/2019 | 1 | happy_count | 8 | 4 |
1786 | 3/14/2019 | 1 | qual | 97 | 5 |
1786 | 3/14/2019 | 1 | end_datetime | "2019-03-10T05 | 6 |
1786 | 3/14/2019 | 1 | date | "2019-03-10" | 7 |
1786 | 3/14/2019 | 1 | mid | "2019-03-10T03 | 8 |
1786 | 3/14/2019 | 1 | counter_report | 0 | 9 |
1786 | 3/14/2019 | 1 | counter_nap | 307 | 10 |
1786 | 3/14/2019 | 1 | counter_dep | 9 | 11 |
1786 | 3/14/2019 | 1 | counter_to_eat | 0 | 12 |
1786 | 3/14/2019 | 1 | 13 | ||
1786 | 3/14/2019 | 1 | dep_count | 0 | 14 |
1786 | 3/14/2019 | 1 | dep_duration | 0 | 15 |
1786 | 3/14/2019 | 1 | happy_count | 5 | 16 |
1786 | 3/14/2019 | 1 | qual | 99 | 17 |
1786 | 3/14/2019 | 1 | end_datetime | "2019-03-11T04 | 18 |
1786 | 3/14/2019 | 1 | date | "2019-03-11" | 19 |
1786 | 3/14/2019 | 1 | mid | "2019-03-11T01 | 20 |
1786 | 3/14/2019 | 1 | counter_report | 0 | 21 |
1786 | 3/14/2019 | 1 | counter_nap | 332 | 22 |
1786 | 3/14/2019 | 1 | counter_dep | 5 | 23 |
1786 | 3/14/2019 | 1 | counter_to_eat | 0 | 24 |
1786 | 3/14/2019 | 1 | 25 | ||
1786 | 3/14/2019 | 1 | dep_count | 1 | 26 |
1786 | 3/14/2019 | 1 | dep_duration | 4 | 27 |
1786 | 3/14/2019 | 1 | happy_count | 15 | 28 |
1786 | 3/14/2019 | 1 | qual | 92 | 29 |
1786 | 3/14/2019 | 1 | end_datetime | "2019-03-12T05 | 30 |
1786 | 3/14/2019 | 1 | date | "2019-03-12" | 31 |
1786 | 3/14/2019 | 1 | mid | "2019-03-12T02 | 32 |
1786 | 3/14/2019 | 1 | counter_report | 0 | 33 |
1786 | 3/14/2019 | 1 | counter_nap | 393 | 34 |
1786 | 3/14/2019 | 1 | counter_dep | 34 | 35 |
1786 | 3/14/2019 | 1 | counter_to_eat | 0 | 36 |
1786 | 4/14/2019 | 2 | [ | 1 |
I then used the PROC TRANSPOSE code you gave me, deleting "date_2" and "variable" (which should not be in the ID statement, as the data should remain by dep_date/variable), and I still got the error "The ID value [name] occurs twice in the same BY group." See below:
ERROR: The ID value "__" occurs twice in the same BY group.
ERROR: The ID value "counter_dep" occurs twice in the same BY group.
ERROR: The ID value "counter_dep" occurs twice in the same BY group.
ERROR: The ID value "counter_nap" occurs twice in the same BY group.
ERROR: The ID value "counter_nap" occurs twice in the same BY group.
ERROR: The ID value "counter_report" occurs twice in the same BY group.
ERROR: The ID value "counter_report" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat" occurs twice in the same BY group.
ERROR: The ID value "date" occurs twice in the same BY group.
ERROR: The ID value "date" occurs twice in the same BY group.
ERROR: The ID value "dep_count" occurs twice in the same BY group.
ERROR: The ID value "dep_count" occurs twice in the same BY group.
ERROR: The ID value "dep_duration" occurs twice in the same BY group.
ERROR: The ID value "dep_duration" occurs twice in the same BY group.
ERROR: The ID value "end_datetime" occurs twice in the same BY group.
ERROR: The ID value "end_datetime" occurs twice in the same BY group.
ERROR: The ID value "happy_count" occurs twice in the same BY group.
ERROR: The ID value "happy_count" occurs twice in the same BY group.
ERROR: The ID value "mid" occurs twice in the same BY group.
ERROR: The ID value "mid" occurs twice in the same BY group.
ERROR: The ID value "qual" occurs twice in the same BY group.
ERROR: The ID value "qual" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
id=1346 dep_date=01/20/2019
ERROR: The ID value "counter_dep" occurs twice in the same BY group.
ERROR: The ID value "counter_dep" occurs twice in the same BY group.
ERROR: The ID value "counter_nap" occurs twice in the same BY group.
ERROR: The ID value "counter_nap" occurs twice in the same BY group.
ERROR: The ID value "counter_report" occurs twice in the same BY group.
ERROR: The ID value "counter_report" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat" occurs twice in the same BY group.
ERROR: The ID value "date" occurs twice in the same BY group.
ERROR: The ID value "date" occurs twice in the same BY group.
ERROR: The ID value "dep_count" occurs twice in the same BY group.
ERROR: The ID value "dep_count" occurs twice in the same BY group.
ERROR: The ID value "dep_duration" occurs twice in the same BY group.
ERROR: The ID value "dep_duration" occurs twice in the same BY group.
ERROR: The ID value "end_datetime" occurs twice in the same BY group.
ERROR: The ID value "end_datetime" occurs twice in the same BY group.
ERROR: The ID value "happy_count" occurs twice in the same BY group.
ERROR: The ID value "happy_count" occurs twice in the same BY group.
ERROR: The ID value "mid" occurs twice in the same BY group.
ERROR: The ID value "mid" occurs twice in the same BY group.
ERROR: The ID value "qual" occurs twice in the same BY group.
ERROR: The ID value "qual" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
id=1346 dep_date=02/19/2019
ERROR: The ID value "__" occurs twice in the same BY group.
ERROR: The ID value "counter_dep" occurs twice in the same BY group.
ERROR: The ID value "counter_dep" occurs twice in the same BY group.
ERROR: The ID value "counter_nap" occurs twice in the same BY group.
ERROR: The ID value "counter_nap" occurs twice in the same BY group.
ERROR: The ID value "counter_report" occurs twice in the same BY group.
ERROR: The ID value "counter_report" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat" occurs twice in the same BY group.
ERROR: The ID value "counter_to_eat" occurs twice in the same BY group.
ERROR: The ID value "date" occurs twice in the same BY group.
ERROR: The ID value "date" occurs twice in the same BY group.
ERROR: The ID value "dep_count" occurs twice in the same BY group.
ERROR: The ID value "dep_count" occurs twice in the same BY group.
ERROR: The ID value "dep_duration" occurs twice in the same BY group.
ERROR: The ID value "dep_duration" occurs twice in the same BY group.
ERROR: The ID value "end_datetime" occurs twice in the same BY group.
ERROR: The ID value "end_datetime" occurs twice in the same BY group.
ERROR: The ID value "happy_count" occurs twice in the same BY group.
ERROR: The ID value "happy_count" occurs twice in the same BY group.
ERROR: The ID value "mid" occurs twice in the same BY group.
ERROR: The ID value "mid" occurs twice in the same BY group.
ERROR: The ID value "qual" occurs twice in the same BY group.
ERROR: The ID value "qual" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
id=1786 dep_date=03/14/2019
WARNING: 3 BY groups omitted due to earlier errors.
NOTE: There were 109 observations read from the data set WORK.TALL.
NOTE: The data set WORK.WANT has 1 observations and 17 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
I understand why I'm getting this error; I therefore created a counter by id, dep_date, and name ("day_count") and added "day_count" as a BY variable in the PROC TRANSPOSE, which solved the issue. My data now look like this:
id | dep_date | day_count | _NAME_ | __ | _ | counter_dep | counter_nap | counter_report | counter_to_eat | date | dep_count | dep_duration | end_datetime | happy_count | mid | qual | ___ |
1346 | 1/20/2019 | 1 | value | 15 | 297 | 0 | 0 | "2019-01-25" | 4 | 4 | "2019-01-25T06 | 10 | "2019-01-25T04 | 95 | |||
1346 | 1/20/2019 | 2 | value | 13 | 300 | 0 | 0 | "2019-01-26" | 0 | 0 | "2019-01-26T07 | 5 | "2019-01-26T04 | 96 | |||
1346 | 1/20/2019 | 3 | value | 2 | 296 | 6 | 0 | "2019-01-27" | 1 | 1 | "2019-01-27T06 | 4 | "2019-01-27T04 | 99 | |||
1346 | 2/19/2019 | 1 | value | 17 | 392 | 0 | 0 | "2019-02-04" | 0 | 0 | "2019-02-04T12 | 15 | "2019-02-04T09 | 96 | |||
1346 | 2/19/2019 | 2 | value | 26 | 327 | 0 | 0 | "2019-02-05" | 4 | 7 | "2019-02-05T09 | 13 | "2019-02-05T06 | 93 | |||
1346 | 2/19/2019 | 3 | value | 21 | 444 | 1 | 0 | "2019-02-06" | 1 | 3 | "2019-02-06T12 | 12 | "2019-02-06T08 | 95 | |||
1786 | 3/14/2019 | 1 | value | 9 | 307 | 0 | 0 | "2019-03-10" | 2 | 2 | "2019-03-10T05 | 8 | "2019-03-10T03 | 97 | |||
1786 | 3/14/2019 | 2 | value | 5 | 332 | 0 | 0 | "2019-03-11" | 0 | 0 | "2019-03-11T04 | 5 | "2019-03-11T01 | 99 | |||
1786 | 3/14/2019 | 3 | value | 34 | 393 | 0 | 0 | "2019-03-12" | 1 | 4 | "2019-03-12T05 | 15 | "2019-03-12T02 | 92 | |||
1786 | 4/14/2019 | 1 | value |
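A rough sketch of that counter approach (hypothetical reconstruction from the description above; the tall dataset is assumed to have variables id, dep_date, name, and value):

```sas
/* Hypothetical sketch of the day_count fix described above:
   number the repeated names within each id/dep_date group,
   then include day_count in the BY statement */
proc sort data=tall;
  by id dep_date name;
run;

data tall2;
  set tall;
  by id dep_date name;
  if first.name then day_count=0;
  day_count+1;  /* nth occurrence of each name = nth JSON record */
run;

proc sort data=tall2;
  by id dep_date day_count;
run;

proc transpose data=tall2 out=want;
  by id dep_date day_count;
  id name;
  var value;
run;
```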
However:
1) The value is cut off for end_datetime and mid (e.g., "2019-01-25T06" should be "2019-01-25T06:44:30Z"). This problem was evident in the original "tall" dataset, as you can see in the first table I posted.
2) I need to strip the quotation marks off of date (which should be in date format), end_datetime, and mid (which should both be in datetime format).
Thanks.
Looks like you are almost there.
You appear to be using too short a variable when scanning, or perhaps your JSON text is just being truncated when you read it into SAS.
You appear to have used the wrong delimiters in the SCAN() function call if it is treating the slash in the date strings as a delimiter.
You appear to have multiple values for the same NAME. What do you want to do with those? You could add a new counter variable. Then you could either add the new variable to the BY group or the ID variable list so that you have distinct names.
Someone above posted an actual file. So let's use that file.
The first thing that needs to be fixed is to remove the END OF LINE characters from some of the values so that each observation is only ONE LINE in the CSV file. You can use this macro: https://github.com/sasutils/macros/blob/master/replace_crlf.sas
Let's replace the LINE FEED characters with a PIPE character so we know where the line breaks used to be.
filename csv "c:\downloads\dep_PA to paste sample v3 for sas forum.csv";
filename csvfix temp;
%replace_crlf(csv encoding=any,csvfix,cr=,lf='|');
Now we can read the CSV file into an actual SAS dataset.
data have;
  infile csvfix dsd truncover firstobs=2;
  input id $ dep_date :mmddyy. dep_1_PA :$1500. date_2 :mmddyy. dep_2_PA :$1500.;
  format dep_date date_2 yymmdd10.;
run;
Now let's parse the strings into records and then into terms and then into NAME/VALUE pairs.
data tall;
  set have;
  array x DEP_1_PA DEP_2_PA;
  length col line row 8 name $32 value $200 term $300 record $1500;
  do col=1 to dim(x);
    /* split each JSON variable into records at the inserted pipes */
    do line=1 by 1 until(record=' ');
      record=scan(x[col],line,'|');
      /* split each record into NAME:VALUE terms */
      do row=1 by 1 until(term=' ');
        term=scan(record,row,"[{,'}]");
        term=translate(term,' ','A0'x); /* non-breaking spaces to spaces */
        name=dequote(scan(term,1,':','q'));
        value=dequote(scan(term,2,':','q'));
        if term ne ' ' then output;
      end;
    end;
  end;
  drop DEP_1_PA DEP_2_PA term record;
run;
Now we can transpose
proc transpose data=tall out=want(drop=_name_);
  by id dep_date date_2 col line;
  id name;
  var value;
run;
Note that everything is a character string, but most of those variables look like numbers to me.
You could write the file back out into an actual CSV file and let PROC IMPORT try to read it in and it will do a pretty good job with this small sample.
filename newcsv temp;
proc export data=want file=newcsv dbms=csv;
run;
proc import file=newcsv dbms=csv out=want2 ;
run;
Variables in Creation Order

 #  Variable         Type  Len  Format      Informat
 1  id               Num   8    BEST12.     BEST32.
 2  dep_date         Num   8    YYMMDD10.   YYMMDD10.
 3  date_2           Num   8    YYMMDD10.   YYMMDD10.
 4  col              Num   8    BEST12.     BEST32.
 5  line             Num   8    BEST12.     BEST32.
 6  dep_count        Num   8    BEST12.     BEST32.
 7  dep_duration     Num   8    BEST12.     BEST32.
 8  happy_count      Num   8    BEST12.     BEST32.
 9  qual             Num   8    BEST12.     BEST32.
10  end_datetime     Num   8    B8601DZ35.  B8601DZ35.
11  date             Num   8    YYMMDD10.   YYMMDD10.
12  mid              Num   8    B8601DZ35.  B8601DZ35.
13  counter_report   Num   8    BEST12.     BEST32.
14  counter_nap      Num   8    BEST12.     BEST32.
15  counter_dep      Num   8    BEST12.     BEST32.
16  counter_to_eat   Num   8    BEST12.     BEST32.
Hi,
This is nearly perfect, but:
1) One line of data is missing: the one where end_datetime=2019-02-05T09:09:30Z (it should fall between rows 4 and 5). See below:
Obs | id | dep_date | col | line | dep_count | dep_duration | happy_count | qual | end_datetime | date | mid | counter_report | counter_nap | counter_dep | counter_to_eat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1346 | 2019-01-20 | 1 | 1 | 4 | 4 | 10 | 95 | 25JAN19:06:44:30 | 2019-01-25 | 25JAN19:04:08:00 | 0 | 297 | 15 | 0 |
2 | 1346 | 2019-01-20 | 1 | 2 | 0 | 0 | 5 | 96 | 26JAN19:07:15:30 | 2019-01-26 | 26JAN19:04:39:30 | 0 | 300 | 13 | 0 |
3 | 1346 | 2019-01-20 | 1 | 3 | 1 | 1 | 4 | 99 | 27JAN19:06:46:00 | 2019-01-27 | 27JAN19:04:13:30 | 6 | 296 | 2 | 0 |
4 | 1346 | 2019-02-19 | 2 | 1 | 0 | 0 | 15 | 96 | 04FEB19:12:25:30 | 2019-02-04 | 04FEB19:09:01:30 | 0 | 392 | 17 | 0 |
5 | 1346 | 2019-02-19 | 2 | 3 | 1 | 3 | 12 | 95 | 06FEB19:12:05:30 | 2019-02-06 | 06FEB19:08:12:00 | 1 | 444 | 21 | 0 |
6 | 1786 | 2019-03-14 | 1 | 1 | 2 | 2 | 8 | 97 | 10MAR19:05:39:00 | 2019-03-10 | 10MAR19:03:00:30 | 0 | 307 | 9 | 0 |
7 | 1786 | 2019-03-14 | 1 | 2 | 0 | 0 | 5 | 99 | 11MAR19:04:38:30 | 2019-03-11 | 11MAR19:01:50:30 | 0 | 332 | 5 | 0 |
8 | 1786 | 2019-03-14 | 1 | 3 | 1 | 4 | 15 | 92 | 12MAR19:05:49:00 | 2019-03-12 | 12MAR19:02:16:00 | 0 | 393 | 34 | 0 |
The line of data should look like this:
ID | dep_date | col | line | dep_count | dep_duration | happy_count | qual | end_datetime | date | mid | counter_report | counter_nap | counter_dep | counter_to_eat |
1346 | 2019-02-19 | 2 | 2 | 4 | 7 | 13 | 93 | 05FEB19:09:09:30Z | 2019-02-05 | 05FEB19:06:13:00Z | 0 | 327 | 26 | 0 |
I've discovered the data gets lost somewhere when moving from "have" to "tall" (the code below):
data tall;
  set have;
  array x DEP_1_PA DEP_2_PA;
  length col line row 8 name $32 value $200 term $300 record $1500;
  do col=1 to dim(x);
    do line=1 by 1 until(record=' ');
      record=scan(x[col],line,'|');
      do row=1 by 1 until(term=' ');
        term=scan(record,row,"[{,'}]");
        term=translate(term,' ','A0'x);
        name=dequote(scan(term,1,':','q'));
        value=dequote(scan(term,2,':','q'));
        if term ne ' ' then output;
      end;
    end;
  end;
  drop DEP_1_PA DEP_2_PA term record;
run;
2) The issue with the duplicate dates ("date_2") is still present; e.g., where date is 2019-01-25, 2019-01-26, or 2019-01-27, dep_date should be 1/20/2019, whereas where date is 2019-02-04 or 2019-02-06, dep_date should be 2/19/2019. None of these values belongs to both of those dates (dep_date and date_2 should be collapsed into a single variable). I used the code below:
data want2;
  set want2;
  if col=2 then dep_date=date_2;
  drop date_2;
run;
However, is there a better way to do this - to avoid creating "date_2" to begin with? (My larger dataset also has more than 2 "dep_dates" per person, which will make IF-based corrections like the one above more tedious.)
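One way to generalize the IF-based fix for any number of reference-date columns is an array lookup keyed on col. This is a hypothetical sketch; it assumes each extra date column (date_2, date_3, ...) corresponds to col=2, col=3, and so on:

```sas
/* Hypothetical sketch: select the reference date matching col,
   so every extra date column collapses into dep_date */
data want2;
  set want2;
  array d dep_date date_2;  /* extend: dep_date date_2 date_3 ... */
  dep_date = d[col];        /* col=1 keeps dep_date, col=2 takes date_2 */
  drop date_2;              /* drop the extra date columns */
run;
```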
Thanks.
Look at the formatting of LINE 2 for that source observation.
Remove the DROP statement and you should see it in RECORD.
See if you can figure out why it is different than line 1 and line 3 for that observation.
Do perhaps some of the values not have NAME:VALUE pairs? Does it have some extra strange characters that perhaps should be used as delimiters in the second scan() to look for the pairs?
Looks like there are more non-breaking spaces inserted. You could perhaps include those as delimiters in the second SCAN() function. You could probably also remove the line that replaces them with actual spaces, since you are now using them as delimiters.
data tall;
  set have;
  array x DEP_1_PA DEP_2_PA;
  length col line row 8 name $32 value $200 term $300 record $1500;
  do col=1 to dim(x);
    do line=1 by 1 until(record=' ');
      record=scan(x[col],line,'|');
      do row=1 by 1 until(term=' ');
        term=scan(record,row,"[{,'}]"||'A0'x); /* treat non-breaking spaces as delimiters too */
        term=translate(term,' ','A0'x);
        name=dequote(scan(term,1,':','q'));
        value=dequote(scan(term,2,':','q'));
        if term ne ' ' then output;
      end;
    end;
  end;
  drop DEP_1_PA DEP_2_PA term record;
run;
3392  data _null_;
3393    set tall;
3394    where id='1346' and col=2 and line=2;
3395    put (_all_) (=);
3396  run;

id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=1 name=dep_count value=4
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=2 name=dep_duration value=7
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=3 name=happy_count value=13
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=4 name=qual value=93
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=5 name=end_datetime value=2019-02-05T09:09:30Z
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=6 name=date value=2019-02-05
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=7 name=mid value=2019-02-05T06:13:00Z
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=8 name=counter_report value=0
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=9 name=counter_nap value=327
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=10 name=counter_dep value=26
id=1346 dep_date=2019-01-20 date_2=2019-02-19 col=2 line=2 row=11 name=counter_to_eat value=0

NOTE: There were 11 observations read from the data set WORK.TALL.
      WHERE (id='1346') and (col=2) and (line=2);
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds
What is the meaning of the two date columns in the CSV file?
How do those dates relate to the dates in the JSON text?
Why not just use a different name for those two dates that does not conflict with the names in the JSON text?
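That renaming could be done when reading the CSV in the first place. A hypothetical variation on the earlier DATA step, with the two reference dates given names that cannot collide with anything in the JSON text:

```sas
/* Hypothetical: read the reference dates under non-conflicting names */
data have;
  infile csvfix dsd truncover firstobs=2;
  input id $ ref_date1 :mmddyy. dep_1_PA :$1500.
           ref_date2 :mmddyy. dep_2_PA :$1500.;
  format ref_date1 ref_date2 yymmdd10.;
run;
```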
Thanks, the new code worked.
The dep_date is a reference date. It's actually not that important - but I'm not sure what you mean by " use a different name." The dep_date and date_2 names are not used for any of the JSON data.
The way the code I posted works is that it takes each observation and splits it into separate observations for each JSON string VARIABLE and each LINE within that variable.
The only way it could cause name conflicts is if you have the same name in the same line, or the name conflicts with one of the grouping variables. If you have the same NAME in the same LINE of the same VARIABLE, then what does that mean? And what do you want to DO about it?
(Note this is why JSON is a terrible format for storing relational data.)