Home
JAQForum Ver 24.01
Log In or Join  
Active Topics
Local Time 08:36 27 Nov 2024 Privacy Policy
Jump to

Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.

Forum Index : Microcontroller and PC projects : Parsing a CVS string

Author Message
CircuitGizmos

Guru

Joined: 08/09/2011
Location: United States
Posts: 1425
Posted: 10:18am 04 Mar 2013
Copy link to clipboard 
Print this post

I made a library routine called FIELD for parsing Comma-Separated Variables in a string. Useful for reading from a file a line at a time and parsing that line. Could be used for reading a file full of GPS records, or other CSV data.



PRINT "CSV parsing test"
PRINT

PRINT "Data with space and comma"
cline$ = "12 0 in,out "
PRINT cline$
PRINT "________"
PRINT FIELD$( cline$, 0 )
PRINT FIELD$( cline$, 1 )
PRINT FIELD$( cline$, 2 )
PRINT FIELD$( cline$, 3 )
PRINT FIELD$( cline$, 4 )
PRINT FIELD$( cline$, 5 )
PRINT
PRINT

PRINT "Data with space, comma, and tab"
cline$ = "12" + CHR$(9) + "0 in,out"
PRINT cline$
PRINT "________"
PRINT FIELD$( cline$, 0 )
PRINT FIELD$( cline$, 1 )
PRINT FIELD$( cline$, 2 )
PRINT FIELD$( cline$, 3 )
PRINT FIELD$( cline$, 4 )
PRINT FIELD$( cline$, 5 )
PRINT
PRINT

PRINT "Data with just comma"
PRINT cline$
PRINT "________"
PRINT FIELD$( cline$, 0, ",")
PRINT FIELD$( cline$, 1, ",")
PRINT FIELD$( cline$, 2, ",")
PRINT
PRINT

PRINT "Data with quotes"
cline$ = "12," + CHR$(34) + "0 in" + CHR$(34) + ",out,fred," + CHR$(34) + "zest step" + CHR$(34) + ",buh"
PRINT CLINE$
PRINT "________"
PRINT FIELD$( cline$, 1 )
PRINT FIELD$( cline$, 2 )
PRINT FIELD$( cline$, 3 )
PRINT FIELD$( cline$, 4 )
PRINT FIELD$( cline$, 5 )
PRINT FIELD$( cline$, 6 )
PRINT
PRINT

PRINT "Data with quotes supressed"
PRINT CLINE$
PRINT "________"
PRINT FIELD$( cline$, 2,,1 )
PRINT FIELD$( cline$, 5,,1 )
PRINT
PRINT

PRINT "Data with quotes"
cline$ = CHR$(34) + "0 in" + CHR$(34) + ",out,fred," + CHR$(34) + "zest step" + CHR$(34)
PRINT CLINE$
PRINT "________"
PRINT FIELD$( cline$, 1 )
PRINT FIELD$( cline$, 2 )
PRINT FIELD$( cline$, 3 )
PRINT FIELD$( cline$, 4 )



'----------------------------------------------------------- -------------
'
' FIELD - CSV field parsing
' field_string$ = String to parse
' field_index = Index to requested field
' field_delim$ = Optional field delimeters
' field_supress = Suppress quote in quoted field
'
' Parsing a CVS string - CircuitGizmos March 2013
'----------------------------------------------------------- -------------

Function FIELD$( field_string$, field_index, field_delim$, field_supress )
LOCAL field_loop
LOCAL field_dcount
LOCAL field_dloop
LOCAL field_length
LOCAL field_field$

field_length = LEN(field_string$)
field_field$ = ""
field_dcount = 1
field_loop = 1


' Delimiter defaults: SPACE COMMA TAB
IF field_delim$ = "" THEN field_delim$ = " ," + CHR$(9)

DO
field_delim = 0
' Process inside paren (no delimiters)
IF MID$( field_string$, field_loop, 1 ) = CHR$(34) THEN
DO
IF field_dcount = field_index THEN
field_field$ = field_field$ + MID$( field_string$, field_loop, 1 )
ENDIF
field_loop = field_loop + 1
LOOP UNTIL (MID$( field_string$, field_loop, 1 ) = CHR$(34)) OR (field_loop > field_length)
ENDIF

' Search through delimiters
FOR field_dloop = 1 to LEN(field_delim$)
' If char is a delim
IF MID$( field_string$, field_loop, 1 ) = MID$( field_delim$, field_dloop, 1 ) THEN
field_dcount = field_dcount + 1
field_delim = 1
ENDIF
NEXT field_dloop

' Add to returned string if not delim
IF field_delim = 0 THEN
IF field_dcount = field_index THEN
field_field$ = field_field$ + MID$( field_string$, field_loop, 1 )
ENDIF
ENDIF

field_loop = field_loop + 1
LOOP UNTIL field_loop > field_length

' Suppress quote / return field
FOR field_loop = 1 to LEN(field_field$)
IF (MID$( field_field$, field_loop, 1 ) = CHR$(34)) AND (field_supress <> 0) THEN
'
ELSE
FIELD$ = FIELD$ + MID$( field_field$, field_loop, 1 )
ENDIF
NEXT field_loop

End Function



Output:


CSV parsing test

Data with space and comma
12 0 in,out
________

12
0
in
out



Data with space, comma, and tab
12 0 in,out
________

12
0
in
out



Data with just comma
12 0 in,out
________

12 0 in
out


Data with quotes
12,"0 in",out,fred,"zest step",buh
________
12
"0 in"
out
fred
"zest step"
buh


Data with quotes supressed
12,"0 in",out,fred,"zest step",buh
________
0 in
zest step


Data with quotes
"0 in",out,fred,"zest step"
________
"0 in"
out
fred
"zest step"




Micromites and Maximites! - Beginning Maximite
 
JohnS
Guru

Joined: 18/11/2011
Location: United Kingdom
Posts: 3802
Posted: 11:01am 04 Mar 2013
Copy link to clipboard 
Print this post

Does it fix typos, though?

CVS -> CSV

John
 
CircuitGizmos

Guru

Joined: 08/09/2011
Location: United States
Posts: 1425
Posted: 11:10am 04 Mar 2013
Copy link to clipboard 
Print this post

@!)#*%&#

I've been doing that all day.

CSV CSV CSV CSV...
Micromites and Maximites! - Beginning Maximite
 
vk4tec

Senior Member

Joined: 24/03/2012
Location: Australia
Posts: 239
Posted: 12:25pm 04 Mar 2013
Copy link to clipboard 
Print this post

Sub nmea_sentence
Do
msg$ ="$" ' subroutine start
Do While Input$(1, #1) <> "$" : Loop ' wait for the start
For i = 0 To max
arg$(i) = "" ' clear ready for data
Do ' loops until a specific exit
x$ = Input$(1, #1)
' Print x$
msg$ = msg$ + x$ ' get the character
If x$ = "," Then Exit ' new data field, increment i
If x$ = "*" Then Return ' we have all the data so return
arg$(i) = arg$(i) + x$
'Print arg$(i) ' add to the data
Loop
'Print arg$(i) ' loop back for the next char
Next i ' move to the next data field
' Print "Corrupt data..." ' exceeded max data items
'Print msg$
Loop

End Sub
Andrew Rich VK4TEC
www.tech-software.net
 
CircuitGizmos

Guru

Joined: 08/09/2011
Location: United States
Posts: 1425
Posted: 12:32pm 04 Mar 2013
Copy link to clipboard 
Print this post

  vk4tec said   Sub nmea_sentence


That is nmea only. This does other CSV records, too.
Micromites and Maximites! - Beginning Maximite
 
paceman
Guru

Joined: 07/10/2011
Location: Australia
Posts: 1329
Posted: 07:40pm 04 Mar 2013
Copy link to clipboard 
Print this post

Hi CG,
Looks like another good candidate for the library.
 
MicroBlocks

Guru

Joined: 12/05/2012
Location: Thailand
Posts: 2209
Posted: 11:44pm 04 Mar 2013
Copy link to clipboard 
Print this post

There are a few 'problems' with that routine.

Readability:
The supplied code and results do not match.
It is zero based or one based. The samples are just a mix of them and confuses things.
Better variable names could be used. Prefixing everything with "field_" is unnecessary and is just clutter. It looks like a leftover from using only global variables.

For example field_string$ could be named Line$ or Record$.
When names like "fields" are used the name "Record" is probably the best.
The rest of the variable names are also without usefull meaning, but this is debatable because everyone has their own preferences.
I personally would change the function name to GetField.

Quotes are normally suppressed when parsing CSV files.
It should be the default. Change Suppress into KeepQuotes.
CSV files are normally delimited with only one specific character. the "C" stands for comma, so that should be the default value. More then one delimiter is a good extra functionality and supplying it through a parameter only when needed is handy.

The comments in the code like " Process inside paren (no delimiters)" use 'paren', probably a shorthand for parentheses, but in the actual code double quotes are tested. Comments in code should explain what is done, not tell a different story. It only confuses more.

An IF statement that does nothing when true is just weird and looks very "hacky".

Efficiency (speed):
The efficiency is terrible.
The MID$ function is used every time, should be used only once to get the character.
Looping through delimiters is a big time waster and makes the code overly complicated.
Just use the INSTR function
And the real speed killer is that when you want the first field the code just keeps on going until all fields are processed. Resulting in a large amount of wasted time.

All this talk is of course worth nothing if no rewrite is offered. So here it is:
[code]
Function GetField$( Record$, FieldNumber, Delimiter$, KeepQuotes )
LOCAL Index, Char, InQuote, Count, Result$

InQuote = 0
Count = 0

' If no delimiter is provided use a comma
IF Delimiter$ = "" then Delimiter$ = ","

' Loop through all the characters
FOR Index = 1 TO LEN(Record$)
Char = ASC(MID$(Record$, Index, 1))

IF Char = 34 THEN
' Flip InQuote between 0 and 1.
' A=1 will signal that the next characters are
' within quotes and delimiters should be ignored
InQuote = NOT InQuote
ENDIF

' Only check for delimiters when not within two quotes
IF NOT InQuote AND INSTR(Delimiter$, chr$(char)) >= 1 THEN
Count = Count + 1
' if the requested field is processed,
' stop processing the rest of the record
IF Count > FieldNumber THEN
EXIT FOR
ENDIF
ELSE

' Only collect characters for the requested field.
IF Count = FieldNumber THEN

'Add the character and only add quotes when KeepQuotes is 1
IF Char <> 34 OR KeepQuotes THEN
Result$ = Result$ + chr$(char)
ENDIF
ENDIF
ENDIF
NEXT

GetField$ = Result$
End Function

[/code]

The code vk4tec provided is much faster because it returns an array of fields preventing parsing fields again and again. (Only faster when more then 1 or 2 fields from a records are needed, but that is often the case).
With some modification it can be made more generic.
It is a BASIC language problem that functions can not return Arrays so a global variable has to be used. Not as neat, but if speed counts a necessary step.

Code modified to return an array
[code]
Function GetFieldArray( Record$, Delimiter$, KeepQuotes )
LOCAL Index, Char, InQuote, Count

InQuote = 0
Count = 0

' If no delimiter is provided use a comma
IF Delimiter$ = "" then Delimiter$ = ","

' Loop through all the characters
FOR Index = 1 to LEN(Record$)
Char = ASC(MID$(Record$, Index, 1))

IF Char = 34 THEN
' Flip InQuote between 0 and 1.
' A 1 will signal that the next characters
' are within quotes and delimiters should be ignored
InQuote = NOT InQuote
ENDIF

' Only check for delimiters when not within two quotes
IF NOT InQuote AND INSTR(Delimiter$, chr$(char)) >= 1 THEN
Count = Count + 1
ELSE

' Only collect characters for the requested field.
'Add the character and only add quotes when KeepQuotes is 1
IF Char <> 34 OR KeepQuotes THEN
GetFieldArray$(Count) = GetFieldArray$(Count) + chr$(char)
ENDIF
ENDIF
ENDIF
NEXT

GetFieldArray = Count + 1
End Function
[/code]

It can be used like this (The most efficient way to parse a delimited string):
[code]
Dim GetFieldArray$(30)
Value$ = "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,0 03.1,W*6A"

N = GetFieldArray(Value$)
'A list of all values
for Index = 0 to N - 1
print GetFieldArray$(Index)
next

'or directly use the field
Speed = VAL(GetFieldArray$(5))

[/code]


A memory efficient but more difficult to read version:
[code]
Function GetFieldArray( R$, D$, K )
LOCAL I, C, Q, T
IF D$ = "" then D$ = ","
FOR I = 1 to LEN(R$)
C = ASC(MID$(R$, I, 1))
IF C = 34 THEN Q = NOT Q
IF NOT Q AND INSTR(D$, CHR$(C)) >= 1 THEN
T = T + 1
ELSE
IF C <> 34 OR K THEN GetFieldArray$(T) = GetFieldArray$(T) + CHR$(C)
ENDIF
NEXT
GetFieldArray = T + 1
End Function
[/code]


Edited by TZAdvantage 2013-03-06
Microblocks. Build with logic.
 
Geoffg

Guru

Joined: 06/06/2011
Location: Australia
Posts: 3194
Posted: 02:14am 05 Mar 2013
Copy link to clipboard 
Print this post

A great function CG and I am sure that Hugh will want to add it to the Library.

I must admit that TZAdvantage has got a lot of good points and you should consider adopting a lot of them. But, don't be despondent... I remember when I designed the mini Maximite and you came out with a much better version (the CGMMSTICK1). That is what happens when you start the ball rolling.

Geoff
Geoff Graham - http://geoffg.net
 
MicroBlocks

Guru

Joined: 12/05/2012
Location: Thailand
Posts: 2209
Posted: 02:43am 05 Mar 2013
Copy link to clipboard 
Print this post

Indeed.
I always am in doubt when i write a post like that.
Never know how the other will react.
Constructive criticism is what i want to do.
I am direct and that sometimes gives the wrong 'feeling'.

CG, after reading that post again, i could have used some better words, i apologize for that.
I hope you read it and take it in good spirit as that is what is intended.

Microblocks. Build with logic.
 
CircuitGizmos

Guru

Joined: 08/09/2011
Location: United States
Posts: 1425
Posted: 04:57am 05 Mar 2013
Copy link to clipboard 
Print this post

  Geoffg said   I remember when I designed the mini Maximite and you came out with a much better version (the CGMMSTICK1).
Geoff


Thanks, Geoff for the nice words - I didn't make the CGMMSTICK thinking that what I make would be "better", but just an alternate. I wanted the Maximite I made to be compatible with yours, not hardware incompatible. It is a different form-factor, but no MMBasic firmware changes would be needed.

Same with the CGCOLORMAX. Different from the Colour Maximite (it isn't a through-hole kit and does have 'extra' stuff) but still completely compatible from the perspective of the MMBasic firmware. People like the Colour Maximite box a lot more than the more industrial box that I chose.

The devices I make are alternates for the Maximites you made, but compatible in the ways that are important.
Micromites and Maximites! - Beginning Maximite
 
CircuitGizmos

Guru

Joined: 08/09/2011
Location: United States
Posts: 1425
Posted: 07:00am 05 Mar 2013
Copy link to clipboard 
Print this post

I appreciate the parts of that feedback that really were constructive. But a lot of it was very much not constructive. You say:

"I always am in doubt when i write a post like that. Never know how the other will react."

Perhaps you should listen to that little voice. Trim your post to suggestions, rather than blunt negative criticism.

Saying something like: "The supplied code and results do not match." and yet not saying why is useless and incorrect. That output is directly from the code.

I posted some code that I thought would help somebody. I expected it to be improved. It evolved over time.

Providing feedback that is frankly insulting is unfortunate - it is a reflection on your character. For anyone who might think of posting their example code you have shown that this is the feedback that they will get.

I hope that this doesn't further suppress enthusiasm to share example code, and I hope you learn to be a little better socially developed.

Micromites and Maximites! - Beginning Maximite
 
MicroBlocks

Guru

Joined: 12/05/2012
Location: Thailand
Posts: 2209
Posted: 07:41am 05 Mar 2013
Copy link to clipboard 
Print this post

I am sorry you feel that way.
I was just being honest.

(My post was a lot longer, but on second thought, why bother.)
We are all adults.

Microblocks. Build with logic.
 
shoebuckle
Senior Member

Joined: 21/01/2012
Location: Australia
Posts: 189
Posted: 02:46pm 05 Mar 2013
Copy link to clipboard 
Print this post

@TZ,
I am going to put CG's CSV Parsing routine in MMLib and but since there is no "right" way of solving the problem, only the one which suits your needs, I would like to add your variations to show others what can be done, however your example invoking code doesn't run on MMBasic (Gets an "error: Expected a number" at the line N = GetFieldArray(Value$)). Would you like to post a working version or you could email it to mmlib@geoffg.net?

The code (all versions) will appear as CSVPARSn.BAS, to my mind a more suitable name.

@VK4TEC
I will also include your version. Would you like to post or email a bit of code to demonstrate invocation of your routine?

Cheers,
Hugh

 
MicroBlocks

Guru

Joined: 12/05/2012
Location: Thailand
Posts: 2209
Posted: 06:30pm 05 Mar 2013
Copy link to clipboard 
Print this post

@Hugh,

I have tested this code in the DOS version of MMBasic.
I checked and it is version 4.0
I'll go get the newest version and try it out again.
Checked it in DOS version 4.3 and that also works.
i used this:
[code]
'Required Global variable for GetFieldArray function
'Dimension this variable so that it can hold all values
Dim GetFieldArray$(12)
' Value is a minimal GPS record
Value$ = "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,0 03.1,W*6A"

Cls
Print "GPS Record:"
Print Value$
Print

N = GetFieldArray(Value$)
'A list of all values
Print "List of all fields"
Print "------------------"
For Index = 0 To N - 1
print Index, GetFieldArray$(Index)
Next
Print "------------------"
Print

'or directly use a field
Speed = Val(GetFieldArray$(7))
Print "A field used by its index"
Print "-------------------------"
Print "Speed = " Speed
Print "-------------------------"

End


' ============================================================ =================
' GetFieldArray
' A function to split a string into an Array of fields
'
' Author: TZ Advantage
' Date: 5 march 2013
' Version: 1.0
'
' This function requires the use of a global variable
' named GetFieldArray$(). It should be dimensioned large
' enough to contain all fields.
'
' Best used in situations when all fields are needed as
' it will be the fastest.
' Consider using the function GetField()
' when only one or two fields are needed or speed is of no concern
'
' Function parameters:
' Record$: A string containing the delimited fields
' Delimiter$: (Optional) delimiter character, when omitted a comma is used
' KeepQuotes: (Optional) [0|1]
' Use value 1 if double quotes around field values have to be kept,
' when omitted or value 0 is used double quotes are discarded
' ============================================================ =================
Function GetFieldArray( Record$, Delimiter$, KeepQuotes )
Local Index, Char, InQuote, Count

InQuote = 0
Count = 0

' If no delimiter is provided use a comma
If Delimiter$ = "" then Delimiter$ = ","

' Loop through all the characters
For Index = 1 To LEN(Record$)
Char = Asc(MID$(Record$, Index, 1))

If Char = 34 Then
' Flip InQuote between 0 and 1.
' A 1 will signal that the next characters
' are within quotes and delimiters should be ignored
InQuote = Not InQuote
EndIf

' Only check for delimiters when not within two quotes
If Not InQuote AND INSTR(Delimiter$, chr$(char)) >= 1 Then
Count = Count + 1
Else

' Only collect characters for the requested field.
'Add the character and only add quotes when KeepQuotes is 1
If Char <> 34 Or KeepQuotes Then
GetFieldArray$(Count) = GetFieldArray$(Count) + chr$(char)
EndIf
EndIf
EndIf
Next

GetFieldArray = Count + 1
End Function
[/code]
Nice to have would be a Ubound function so that the upper limit of arrays can be tested. Same for LBound to know the lowest index of an array.

Edited by TZAdvantage 2013-03-07
Microblocks. Build with logic.
 
shoebuckle
Senior Member

Joined: 21/01/2012
Location: Australia
Posts: 189
Posted: 10:31pm 05 Mar 2013
Copy link to clipboard 
Print this post

Thanks TZ. That's great and I will add it to the library.

I know it's a bit of a back-handed compliment and I am spruking the value of FORMAT.BAS, but your code has highlighted one of its benefits. If you run your code through FORMAT you will find that the last four lines are formatted incorrectly and that is because you have an extra ENDIF at the 4th line from the end. Your code does what you wanted it to because in this instance MMBasic has ignored the extra EndIf, but this is just the kind of error can cause much head scratching if the code is later modified.

I have run your code in v4.3 (without the extra EndIf) on my Mono Maximite and it worked perfectly.
Cheers,
Hugh
 
MicroBlocks

Guru

Joined: 12/05/2012
Location: Thailand
Posts: 2209
Posted: 10:39pm 05 Mar 2013
Copy link to clipboard 
Print this post

Hugh,

Good catch!
My only excuse is that it was very late!.

Microblocks. Build with logic.
 
Print this page


To reply to this topic, you need to log in.

© JAQ Software 2024