Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.
|
Forum Index : Microcontroller and PC projects : Parsing a CVS string
Author | Message | ||||
CircuitGizmos Guru Joined: 08/09/2011 Location: United StatesPosts: 1425 |
I made a library routine called FIELD for parsing Comma-Separated Variables in a string. Useful for reading from a file a line at a time and parsing that line. Could be used for reading a file full of GPS records, or other CSV data. PRINT "CSV parsing test" PRINT "Data with space and comma" cline$ = "12 0 in,out " PRINT cline$ PRINT "________" PRINT FIELD$( cline$, 0 ) PRINT FIELD$( cline$, 1 ) PRINT FIELD$( cline$, 2 ) PRINT FIELD$( cline$, 3 ) PRINT FIELD$( cline$, 4 ) PRINT FIELD$( cline$, 5 ) PRINT "Data with space, comma, and tab" cline$ = "12" + CHR$(9) + "0 in,out" PRINT cline$ PRINT "________" PRINT FIELD$( cline$, 0 ) PRINT FIELD$( cline$, 1 ) PRINT FIELD$( cline$, 2 ) PRINT FIELD$( cline$, 3 ) PRINT FIELD$( cline$, 4 ) PRINT FIELD$( cline$, 5 ) PRINT "Data with just comma" PRINT cline$ PRINT "________" PRINT FIELD$( cline$, 0, ",") PRINT FIELD$( cline$, 1, ",") PRINT FIELD$( cline$, 2, ",") PRINT "Data with quotes" cline$ = "12," + CHR$(34) + "0 in" + CHR$(34) + ",out,fred," + CHR$(34) + "zest step" + CHR$(34) + ",buh" PRINT CLINE$ PRINT "________" PRINT FIELD$( cline$, 1 ) PRINT FIELD$( cline$, 2 ) PRINT FIELD$( cline$, 3 ) PRINT FIELD$( cline$, 4 ) PRINT FIELD$( cline$, 5 ) PRINT FIELD$( cline$, 6 ) PRINT "Data with quotes supressed" PRINT CLINE$ PRINT "________" PRINT FIELD$( cline$, 2,,1 ) PRINT FIELD$( cline$, 5,,1 ) PRINT "Data with quotes" cline$ = CHR$(34) + "0 in" + CHR$(34) + ",out,fred," + CHR$(34) + "zest step" + CHR$(34) PRINT CLINE$ PRINT "________" PRINT FIELD$( cline$, 1 ) PRINT FIELD$( cline$, 2 ) PRINT FIELD$( cline$, 3 ) PRINT FIELD$( cline$, 4 ) '----------------------------------------------------------- ------------- ' ' FIELD - CSV field parsing ' field_string$ = String to parse ' field_index = Index to requested field ' field_delim$ = Optional field delimeters ' field_supress = Suppress quote in quoted field ' ' Parsing a CVS string - CircuitGizmos March 2013 '----------------------------------------------------------- ------------- Function FIELD$( field_string$, field_index, field_delim$, field_supress ) LOCAL field_loop LOCAL field_dcount LOCAL field_dloop LOCAL field_length LOCAL field_field$ field_length = LEN(field_string$) field_field$ = "" field_dcount = 1 field_loop = 1 ' Delimiter defaults: SPACE COMMA TAB IF field_delim$ = "" THEN field_delim$ = " ," + CHR$(9) DO field_delim = 0 ' Process inside paren (no delimiters) IF MID$( field_string$, field_loop, 1 ) = CHR$(34) THEN DO IF field_dcount = field_index THEN field_field$ = field_field$ + MID$( field_string$, field_loop, 1 ) ENDIF field_loop = field_loop + 1 LOOP UNTIL (MID$( field_string$, field_loop, 1 ) = CHR$(34)) OR (field_loop > field_length) ENDIF ' Search through delimiters FOR field_dloop = 1 to LEN(field_delim$) ' If char is a delim IF MID$( field_string$, field_loop, 1 ) = MID$( field_delim$, field_dloop, 1 ) THEN field_dcount = field_dcount + 1 field_delim = 1 ENDIF NEXT field_dloop ' Add to returned string if not delim IF field_delim = 0 THEN IF field_dcount = field_index THEN field_field$ = field_field$ + MID$( field_string$, field_loop, 1 ) ENDIF ENDIF field_loop = field_loop + 1 LOOP UNTIL field_loop > field_length ' Suppress quote / return field FOR field_loop = 1 to LEN(field_field$) IF (MID$( field_field$, field_loop, 1 ) = CHR$(34)) AND (field_supress <> 0) THEN ' ELSE FIELD$ = FIELD$ + MID$( field_field$, field_loop, 1 ) ENDIF NEXT field_loop End Function Output: CSV parsing test Data with space and comma 12 0 in,out ________ 12 0 in out Data with space, comma, and tab 12 0 in,out ________ 12 0 in out Data with just comma 12 0 in,out ________ 12 0 in out Data with quotes 12,"0 in",out,fred,"zest step",buh ________ 12 "0 in" out fred "zest step" buh Data with quotes supressed 12,"0 in",out,fred,"zest step",buh ________ 0 in zest step Data with quotes "0 in",out,fred,"zest step" ________ "0 in" out fred "zest step" Micromites and Maximites! - Beginning Maximite |
||||
JohnS Guru Joined: 18/11/2011 Location: United KingdomPosts: 3802 |
Does it fix typos, though? CVS -> CSV John |
||||
CircuitGizmos Guru Joined: 08/09/2011 Location: United StatesPosts: 1425 |
@!)#*% I've been doing that all day. CSV CSV CSV CSV... Micromites and Maximites! - Beginning Maximite |
||||
vk4tec Senior Member Joined: 24/03/2012 Location: AustraliaPosts: 239 |
Sub nmea_sentence Do msg$ ="$" ' subroutine start Do While Input$(1, #1) <> "$" : Loop ' wait for the start For i = 0 To max arg$(i) = "" ' clear ready for data Do ' loops until a specific exit x$ = Input$(1, #1) ' Print x$ msg$ = msg$ + x$ ' get the character If x$ = "," Then Exit ' new data field, increment i If x$ = "*" Then Return ' we have all the data so return arg$(i) = arg$(i) + x$ 'Print arg$(i) ' add to the data Loop 'Print arg$(i) ' loop back for the next char Next i ' move to the next data field ' Print "Corrupt data..." ' exceeded max data items 'Print msg$ Loop End Sub Andrew Rich VK4TEC www.tech-software.net |
||||
CircuitGizmos Guru Joined: 08/09/2011 Location: United StatesPosts: 1425 |
That is nmea only. This does other CSV records, too. Micromites and Maximites! - Beginning Maximite |
||||
paceman Guru Joined: 07/10/2011 Location: AustraliaPosts: 1329 |
Hi CG, Looks like another good candidate for the library. |
||||
MicroBlocks Guru Joined: 12/05/2012 Location: ThailandPosts: 2209 |
There are a few 'problems' with that routine. Readability: The supplied code and results do not match. It is zero based or one based. The samples are just a mix of them and confuses things. Better variable names could be used. Prefixing everything with "field_" is unnecessary and is just clutter. It looks like a leftover from using only global variables. For example field_string$ could be named Line$ or Record$. When names like "fields" are used the name "Record" is probably the best. The rest of the variable names are also without usefull meaning, but this is debatable because everyone has their own preferences. I personally would change the function name to GetField. Quotes are normally suppressed when parsing CSV files. It should be the default. Change Suppress into KeepQuotes. CSV files are normally delimited with only one specific character. the "C" stands for comma, so that should be the default value. More then one delimiter is a good extra functionality and supplying it through a parameter only when needed is handy. The comments in the code like " Process inside paren (no delimiters)" use 'paren', probably a shorthand for parentheses, but in the actual code double quotes are tested. Comments in code should explain what is done, not tell a different story. It only confuses more. An IF statement that does nothing when true is just weird and looks very "hacky". Efficiency (speed): The efficiency is terrible. The MID$ function is used every time, should be used only once to get the character. Looping through delimiters is a big time waster and makes the code overly complicated. Just use the INSTR function And the real speed killer is that when you want the first field the code just keeps on going until all fields are processed. Resulting in a large amount of wasted time. All this talk is of course worth nothing if no rewrite is offered. So here it is: [code] Function GetField$( Record$, FieldNumber, Delimiter$, KeepQuotes ) LOCAL Index, Char, InQuote, Count, Result$ InQuote = 0 Count = 0 ' If no delimiter is provided use a comma IF Delimiter$ = "" then Delimiter$ = "," ' Loop through all the characters FOR Index = 1 TO LEN(Record$) Char = ASC(MID$(Record$, Index, 1)) IF Char = 34 THEN ' Flip InQuote between 0 and 1. ' A=1 will signal that the next characters are ' within quotes and delimiters should be ignored InQuote = NOT InQuote ENDIF ' Only check for delimiters when not within two quotes IF NOT InQuote AND INSTR(Delimiter$, chr$(char)) >= 1 THEN Count = Count + 1 ' if the requested field is processed, ' stop processing the rest of the record IF Count > FieldNumber THEN EXIT FOR ENDIF ELSE ' Only collect characters for the requested field. IF Count = FieldNumber THEN 'Add the character and only add quotes when KeepQuotes is 1 IF Char <> 34 OR KeepQuotes THEN Result$ = Result$ + chr$(char) ENDIF ENDIF ENDIF NEXT GetField$ = Result$ End Function [/code] The code vk4tec provided is much faster because it returns an array of fields preventing parsing fields again and again. (Only faster when more then 1 or 2 fields from a records are needed, but that is often the case). With some modification it can be made more generic. It is a BASIC language problem that functions can not return Arrays so a global variable has to be used. Not as neat, but if speed counts a necessary step. Code modified to return an array [code] Function GetFieldArray( Record$, Delimiter$, KeepQuotes ) LOCAL Index, Char, InQuote, Count InQuote = 0 Count = 0 ' If no delimiter is provided use a comma IF Delimiter$ = "" then Delimiter$ = "," ' Loop through all the characters FOR Index = 1 to LEN(Record$) Char = ASC(MID$(Record$, Index, 1)) IF Char = 34 THEN ' Flip InQuote between 0 and 1. ' A 1 will signal that the next characters ' are within quotes and delimiters should be ignored InQuote = NOT InQuote ENDIF ' Only check for delimiters when not within two quotes IF NOT InQuote AND INSTR(Delimiter$, chr$(char)) >= 1 THEN Count = Count + 1 ELSE ' Only collect characters for the requested field. 'Add the character and only add quotes when KeepQuotes is 1 IF Char <> 34 OR KeepQuotes THEN GetFieldArray$(Count) = GetFieldArray$(Count) + chr$(char) ENDIF ENDIF ENDIF NEXT GetFieldArray = Count + 1 End Function [/code] It can be used like this (The most efficient way to parse a delimited string): [code] Dim GetFieldArray$(30) Value$ = "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,0 03.1,W*6A" N = GetFieldArray(Value$) 'A list of all values for Index = 0 to N - 1 print GetFieldArray$(Index) next 'or directly use the field Speed = VAL(GetFieldArray$(5)) [/code] A memory efficient but more difficult to read version: [code] Function GetFieldArray( R$, D$, K ) LOCAL I, C, Q, T IF D$ = "" then D$ = "," FOR I = 1 to LEN(R$) C = ASC(MID$(R$, I, 1)) IF C = 34 THEN Q = NOT Q IF NOT Q AND INSTR(D$, CHR$(C)) >= 1 THEN T = T + 1 ELSE IF C <> 34 OR K THEN GetFieldArray$(T) = GetFieldArray$(T) + CHR$(C) ENDIF NEXT GetFieldArray = T + 1 End Function [/code] Microblocks. Build with logic. |
||||
Geoffg Guru Joined: 06/06/2011 Location: AustraliaPosts: 3194 |
A great function CG and I am sure that Hugh will want to add it to the Library. I must admit that TZAdvantage has got a lot of good points and you should consider adopting a lot of them. But, don't be despondent... I remember when I designed the mini Maximite and you came out with a much better version (the CGMMSTICK1). That is what happens when you start the ball rolling. Geoff Geoff Graham - http://geoffg.net |
||||
MicroBlocks Guru Joined: 12/05/2012 Location: ThailandPosts: 2209 |
Indeed. I always am in doubt when i write a post like that. Never know how the other will react. Constructive criticism is what i want to do. I am direct and that sometimes gives the wrong 'feeling'. CG, after reading that post again, i could have used some better words, i apologize for that. I hope you read it and take it in good spirit as that is what is intended. Microblocks. Build with logic. |
||||
CircuitGizmos Guru Joined: 08/09/2011 Location: United StatesPosts: 1425 |
Thanks, Geoff for the nice words - I didn't make the CGMMSTICK thinking that what I make would be "better", but just an alternate. I wanted the Maximite I made to be compatible with yours, not hardware incompatible. It is a different form-factor, but no MMBasic firmware changes would be needed. Same with the CGCOLORMAX. Different from the Colour Maximite (it isn't a through-hole kit and does have 'extra' stuff) but still completely compatible from the perspective of the MMBasic firmware. People like the Colour Maximite box a lot more than the more industrial box that I chose. The devices I make are alternates for the Maximites you made, but compatible in the ways that are important. Micromites and Maximites! - Beginning Maximite |
||||
CircuitGizmos Guru Joined: 08/09/2011 Location: United StatesPosts: 1425 |
I appreciate the parts of that feedback that really were constructive. But a lot of it was very much not constructive. You say: "I always am in doubt when i write a post like that. Never know how the other will react." Perhaps you should listen to that little voice. Trim your post to suggestions, rather than blunt negative criticism. Saying something like: "The supplied code and results do not match." and yet not saying why is useless and incorrect. That output is directly from the code. I posted some code that I thought would help somebody. I expected it to be improved. It evolved over time. Providing feedback that is frankly insulting is unfortunate - it is a reflection on your character. For anyone who might think of posting their example code you have shown that this is the feedback that they will get. I hope that this doesn't further suppress enthusiasm to share example code, and I hope you learn to be a little better socially developed. Micromites and Maximites! - Beginning Maximite |
||||
MicroBlocks Guru Joined: 12/05/2012 Location: ThailandPosts: 2209 |
I am sorry you feel that way. I was just being honest. (My post was a lot longer, but on second thought, why bother.) We are all adults. Microblocks. Build with logic. |
||||
shoebuckle Senior Member Joined: 21/01/2012 Location: AustraliaPosts: 189 |
@TZ, I am going to put CG's CSV Parsing routine in MMLib and but since there is no "right" way of solving the problem, only the one which suits your needs, I would like to add your variations to show others what can be done, however your example invoking code doesn't run on MMBasic (Gets an "error: Expected a number" at the line N = GetFieldArray(Value$)). Would you like to post a working version or you could email it to mmlib@geoffg.net? The code (all versions) will appear as CSVPARSn.BAS, to my mind a more suitable name. @VK4TEC I will also include your version. Would you like to post or email a bit of code to demonstrate invocation of your routine? Cheers, Hugh |
||||
MicroBlocks Guru Joined: 12/05/2012 Location: ThailandPosts: 2209 |
@Hugh, I have tested this code in the DOS version of MMBasic. I checked and it is version 4.0 I'll go get the newest version and try it out again. Checked it in DOS version 4.3 and that also works. i used this: [code] 'Required Global variable for GetFieldArray function 'Dimension this variable so that it can hold all values Dim GetFieldArray$(12) ' Value is a minimal GPS record Value$ = "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,0 03.1,W*6A" Cls Print "GPS Record:" Print Value$ N = GetFieldArray(Value$) 'A list of all values Print "List of all fields" Print "------------------" For Index = 0 To N - 1 print Index, GetFieldArray$(Index) Next Print "------------------" 'or directly use a field Speed = Val(GetFieldArray$(7)) Print "A field used by its index" Print "-------------------------" Print "Speed = " Speed Print "-------------------------" End ' ============================================================ ================= ' GetFieldArray ' A function to split a string into an Array of fields ' ' Author: TZ Advantage ' Date: 5 march 2013 ' Version: 1.0 ' ' This function requires the use of a global variable ' named GetFieldArray$(). It should be dimensioned large ' enough to contain all fields. ' ' Best used in situations when all fields are needed as ' it will be the fastest. ' Consider using the function GetField() ' when only one or two fields are needed or speed is of no concern ' ' Function parameters: ' Record$: A string containing the delimited fields ' Delimiter$: (Optional) delimiter character, when omitted a comma is used ' KeepQuotes: (Optional) [0|1] ' Use value 1 if double quotes around field values have to be kept, ' when omitted or value 0 is used double quotes are discarded ' ============================================================ ================= Function GetFieldArray( Record$, Delimiter$, KeepQuotes ) Local Index, Char, InQuote, Count InQuote = 0 Count = 0 ' If no delimiter is provided use a comma If Delimiter$ = "" then Delimiter$ = "," ' Loop through all the characters For Index = 1 To LEN(Record$) Char = Asc(MID$(Record$, Index, 1)) If Char = 34 Then ' Flip InQuote between 0 and 1. ' A 1 will signal that the next characters ' are within quotes and delimiters should be ignored InQuote = Not InQuote EndIf ' Only check for delimiters when not within two quotes If Not InQuote AND INSTR(Delimiter$, chr$(char)) >= 1 Then Count = Count + 1 Else ' Only collect characters for the requested field. 'Add the character and only add quotes when KeepQuotes is 1 If Char <> 34 Or KeepQuotes Then GetFieldArray$(Count) = GetFieldArray$(Count) + chr$(char) EndIf EndIf EndIf Next GetFieldArray = Count + 1 End Function [/code] Nice to have would be a Ubound function so that the upper limit of arrays can be tested. Same for LBound to know the lowest index of an array. Microblocks. Build with logic. |
||||
shoebuckle Senior Member Joined: 21/01/2012 Location: AustraliaPosts: 189 |
Thanks TZ. That's great and I will add it to the library. I know it's a bit of a back-handed compliment and I am spruking the value of FORMAT.BAS, but your code has highlighted one of its benefits. If you run your code through FORMAT you will find that the last four lines are formatted incorrectly and that is because you have an extra ENDIF at the 4th line from the end. Your code does what you wanted it to because in this instance MMBasic has ignored the extra EndIf, but this is just the kind of error can cause much head scratching if the code is later modified. I have run your code in v4.3 (without the extra EndIf) on my Mono Maximite and it worked perfectly. Cheers, Hugh |
||||
MicroBlocks Guru Joined: 12/05/2012 Location: ThailandPosts: 2209 |
Hugh, Good catch! My only excuse is that it was very late!. Microblocks. Build with logic. |
||||
Print this page |