[2009-09-16 01:39 UTC] anoop dot john at zyxware dot com
Description:
------------
I am using a complex regex pattern to match stock tickers in a piece of text. The pattern is:

$pattern = '/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';

It should match

(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)

and it does match when that subject string is given alone. However, when another particular string that does not match the pattern is prepended to the subject string, the regex ceases to match the original portion of the string. The culprit string is:

(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW)

The pattern matches only one opening parenthesis and will not match another. So it cannot be that the pattern ate through the first pair of parentheses, went into the second pair, and failed to match there once the culprit string was prepended.
Reproduce code:
---------------
$pattern = '/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';

preg_match_all($pattern, '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) (AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)', $matches, PREG_SET_ORDER);
var_export($matches);
echo "<br /><br />";
preg_match_all($pattern, '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)', $matches, PREG_SET_ORDER);
var_export($matches);

Expected result:
----------------
array (
  0 => array (
    0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)
array (
  0 => array (
    0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)

Actual result:
--------------
array (
)
array (
  0 => array (
    0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)
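An empty $matches array can mean either "no match" or "PCRE gave up with an error", and var_export() alone cannot tell the two apart. The sketch below shows one way to distinguish them using the return value of preg_match_all() and preg_last_error(). The helper name matchWithDiagnostics is hypothetical and not part of the report. Whether the combined subject actually triggers an error such as PREG_BACKTRACK_LIMIT_ERROR is an assumption that depends on the PHP version and the pcre.backtrack_limit setting, so no output is claimed for that case.

```php
<?php
// Hypothetical helper (not from the report): run preg_match_all() and
// return the match count together with the PCRE error status.
function matchWithDiagnostics(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    return [
        'count'   => $count,            // int on success, false on error
        'error'   => preg_last_error(), // PREG_NO_ERROR when nothing went wrong
        'matches' => $matches,
    ];
}

// Pattern and subjects copied from the reproduce code.
$pattern = '/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';
$alone    = '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)';
$combined = '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) ' . $alone;

foreach ([$combined, $alone] as $subject) {
    $r = matchWithDiagnostics($pattern, $subject);
    if ($r['error'] === PREG_BACKTRACK_LIMIT_ERROR) {
        // The engine stopped early; the empty result is not a real "no match".
        echo "backtrack limit hit\n";
    } elseif ($r['error'] !== PREG_NO_ERROR) {
        echo "PCRE error code {$r['error']}\n";
    } else {
        echo "{$r['count']} match(es)\n";
    }
}
```

If the first line reports a backtrack-limit error, the nested optional quantifiers in the pattern would be the likely place to look, since each word can be split between the two [a-z]* parts in many ways.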
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Tue Nov 25 02:00:01 2025 UTC
I am sorry, but adding a ? to the end of the pattern would make the closing parenthesis an optional match. The regex would then match the content of the first parentheses up to the point where it stops matching, and the content of the second parentheses completely, including the closing parenthesis. But the point is to not match the content of the first set of parentheses at all.

The following are the results from the suggested change. The matches array now contains the partial match from the content of the first parentheses as matches[0] and the full match from the second parentheses as matches[1]. This is incorrect. The contents of the first pair of parentheses should not be matched at all.

array (
  0 => array (
    0 => '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ,Swiss Exchange: CRX',
    1 => 'Euronext, NASDAQ',
    2 => 'CRXL',
    3 => ' AMEX,NYSE,NASDAQ,Swiss Exchange',
    4 => 'CRX',
  ),
  1 => array (
    0 => '(AMEX,NYSE, Swiss Exchange:CRX;Nasdaq: QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)
array (
  0 => array (
    0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq:QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)

To put you in context, the regex does the following. It matches two combinations enclosed within one pair of parentheses and separated by ;. Each combination is a comma-separated list that contains one of the words AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE plus any number of other words (or groups of words separated by spaces), followed by : and a stock ticker in full caps. It remembers:
1) Combination of exchange names of first stock
2) First stock name
3) Combination of exchange names of second stock
4) Second stock name
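The description above can also be sketched as a two-step match: a simple structural pattern that captures the four parts, followed by a check in PHP that each exchange list names at least one known exchange. This is only an illustration, not the reporter's pattern. The names KNOWN_EXCHANGES, hasKnownExchange and findTickerPairs are hypothetical, and the structural pattern is deliberately looser than the original: it accepts any characters other than ( ) : ; inside an exchange list instead of restricting each entry to one or two words of letters.

```php
<?php
// Hypothetical sketch, not the reporter's pattern.
// Exchange names taken from the alternation in the original regex, lowercased.
const KNOWN_EXCHANGES = ['amex', 'nasdaq', 'nasdaqgm', 'nasdaqgs', 'nyse'];

// True if at least one comma-separated entry is a known exchange name.
function hasKnownExchange(string $list): bool
{
    foreach (explode(',', $list) as $name) {
        if (in_array(strtolower(trim($name)), KNOWN_EXCHANGES, true)) {
            return true;
        }
    }
    return false;
}

// Returns one [exchanges1, ticker1, exchanges2, ticker2] entry per
// parenthesised group that holds exactly two "exchanges: TICKER" parts.
function findTickerPairs(string $subject): array
{
    // Structure only: ( list : TICKER ; list : TICKER )
    $structure = '/\(([^():;]+):\s*([A-Z]+)\s*;([^():;]+):\s*([A-Z]+)\s*\)/';
    if (preg_match_all($structure, $subject, $sets, PREG_SET_ORDER) === false) {
        return []; // PCRE error; see preg_last_error()
    }

    $pairs = [];
    foreach ($sets as $set) {
        // Keep the group only if both lists name a known exchange.
        if (hasKnownExchange($set[1]) && hasKnownExchange($set[3])) {
            $pairs[] = [trim($set[1]), $set[2], trim($set[3]), $set[4]];
        }
    }
    return $pairs;
}

$subject = '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) '
         . '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)';
var_export(findTickerPairs($subject));
```

Because the character class [^():;]+ cannot overlap with the delimiters that follow it, the first parenthesised group, which holds three parts rather than two, is rejected after very little backtracking and the second group is still found.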