
Divide a string into tokens by spaces, except spaces inside a pair of quotes

posted Jan 17, 2014, 5:51 AM by Teng-Yok Lee   [ updated Jan 17, 2014, 5:53 AM ]
Scenario: Assume there is a string '--opt1=1 --opt2="2 3"'. We want to divide it into 2 tokens: '--opt1=1' and '--opt2="2 3"'.

Define the regular expression: Let's define 3 token patterns for now:

\S*".*"\S*: a pair of double quotes, optionally surrounded by non-space characters (e.g. --opt2="2 3" ).
\S*'.*'\S*: a pair of single quotes, optionally surrounded by non-space characters (e.g. --opt2='2 3' ).
\S+: a run of non-space characters (e.g. --opt1=1 ).

The combined regular expression, written as a double-quoted string literal (hence the \" escapes): "\S*\".*\"\S*|\S*'.*'\S*|\S+"
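As a quick check of the scenario above, here is a minimal sketch that builds the combined expression from the three patterns and applies it to the scenario string. The names DOUBLE_QUOTED, SINGLE_QUOTED, BARE and COMBINED are only illustrative and are not part of the original memo. Note that the order of the alternatives matters: the two quoted patterns have to come before \S+, because the regex engine tries the alternatives from left to right.

```python
import re

# The three token patterns described above, as raw strings
# so that the backslashes reach the regex engine unchanged.
DOUBLE_QUOTED = r'\S*".*"\S*'   # e.g. --opt2="2 3"
SINGLE_QUOTED = r"\S*'.*'\S*"   # e.g. --opt2='2 3'
BARE = r"\S+"                   # e.g. --opt1=1

# Alternatives are tried left to right, so the quoted patterns come first.
COMBINED = "|".join([DOUBLE_QUOTED, SINGLE_QUOTED, BARE])

tokens = re.findall(COMBINED, '--opt1=1 --opt2="2 3"')
print(tokens)
```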

Usage:
In Python, use re.findall to divide the string into tokens:

>>> import re
>>> m = re.findall(r"\S*\".*\"\S*|\S*'.*'\S*|\S+", '-1=2 "--3=3 5"')
>>> m
['-1=2', '"--3=3 5"']
>>>
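One caveat worth checking: .* is greedy, so when the string contains more than one quoted section, the quoted pattern can swallow everything from the first quote to the last one. The sketch below compares the expression from this memo with a non-greedy variant that uses .*? instead. The variant is an assumption of mine and not part of the original memo; it still does not handle escaped quotes inside a quoted section.

```python
import re

# The expression from this memo (greedy .*).
GREEDY = r"""\S*".*"\S*|\S*'.*'\S*|\S+"""

# Hypothetical variant: .*? stops at the nearest closing quote.
LAZY = r"""\S*".*?"\S*|\S*'.*?'\S*|\S+"""

text = '--a="1 2" --b="3 4"'

greedy_tokens = re.findall(GREEDY, text)  # one token spanning both options
lazy_tokens = re.findall(LAZY, text)      # one token per option

print(greedy_tokens)
print(lazy_tokens)
```

With a single quoted section per string, as in the examples above, both expressions give the same result.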


In GWT, it can be done as follows:

            // Imports needed at the top of the file:
            //   import java.util.ArrayList;
            //   import com.google.gwt.regexp.shared.MatchResult;
            //   import com.google.gwt.regexp.shared.RegExp;

            // The array to store the tokens.
            ArrayList<String> args = new ArrayList<String>();

            String regularExpression = "\\S*\".*\"\\S*|\\S*'.*'\\S*|\\S+";

            // The "g" (global) flag makes each exec() call resume from lastIndex,
            // so all tokens are found rather than only the first one.
            RegExp regExp = RegExp.compile(regularExpression, "g");

            if( regExp == null )
                return args;

            // Now, parse the tokens in the string text.
            // This is similar to strtok: each call to RegExp.exec
            // automatically updates lastIndex,
            // which is the offset in text where the next search starts.
            for(MatchResult matchResult = regExp.exec(text); matchResult != null; matchResult = regExp.exec(text) )
                args.add(matchResult.getGroup(0));

            return args;
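For comparison, the same strtok-like iteration can be sketched in Python by passing an explicit start offset to the compiled pattern's search method. The function name split_tokens and the variable last_index are hypothetical; last_index stands in for the lastIndex that RegExp.exec updates automatically in GWT.

```python
import re

def split_tokens(text):
    """Collect tokens one match at a time, resuming after the previous match."""
    pattern = re.compile(r"""\S*".*"\S*|\S*'.*'\S*|\S+""")
    tokens = []
    last_index = 0  # offset in text where the next search starts
    while True:
        match = pattern.search(text, last_index)
        if match is None:
            break
        tokens.append(match.group(0))
        # Every alternative matches at least one character,
        # so last_index always advances and the loop terminates.
        last_index = match.end()
    return tokens

print(split_tokens('-1=2 "--3=3 5"'))
```

In everyday Python code re.findall does the same thing in one call; the explicit loop is only here to mirror what the GWT loop does.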

References:

http://www.gwtproject.org/javadoc/latest/com/google/gwt/regexp/shared/RegExp.html
http://stackoverflow.com/questions/1520800/why-regexp-with-global-flag-in-javascript-give-wrong-results
http://docs.python.org/2/library/re.html#re.findall