I'd like some help extracting this information from a text file. One of the fields is optional (labelled "S" here).
The text looks like this:
NAME Case No. Duration PLAN DATE ST DATE A MODE AMOUNT CRATE S AccountTotal
PETER AB02651341 RN BUILDER IUL CTAT 02/05/15 02/05/15 01 380.00 0.0050 1.90
JOHNSON, DON A BF06010672 FY AGGVANT 15 NT1 02/02/15 02/01/15 01 83.04 0.0500 4.15
SARA ZZ02659940 RN CUST GUAR 01/31/15 01/30/15 12 18450.00- 0.0025 46.13-
MIKE KH02979366 RN CUST GUAR 02/02/15 02/01/15 01 109.83 0.0025 .50 .14
Is it possible to turn it into something like this (in an array or some other structure)?
NAME Case No. Duration PLAN DATE ST DATE A MODE AMOUNT CRATE S AccountTotal
PETER AB02651341 RN BUILDER IUL CTAT 02/02/15 02/05/15 01 380.00 0.0050 1.90
JOHNSON, DON A BF06010672 FY AGGVANT 15 NT1 02/02/15 02/01/15 01 83.04 0.0500 4.15
SARA ZZ02659940 RN CUST GUAR 01/31/15 01/30/15 12 -18450.00 0.0025 -46.13
MIKE KH02979366 RN CUST GUAR 02/02/15 02/01/15 01 109.83 0.0025 .50 .14
The final output would look like this:
Array ( [0] => Array ( [NAME] => PETER [Case No.] => AB02651341 [Duration] => RN [PLAN] => BUILDER IUL CTAT [DATE ST] => 02/02/15 [DATE A] => 02/05/2015 [MODE] => 01 [AMOUNT] => 380.00 [CRATE] => 0.0050 [S] => [AccountTotal] => 1.90 )
[1] => Array ( [NAME] => JOHNSON, DON A [Case No.] => BF06010672 [Duration] => FY [PLAN] => AGGVANT 15 NT1 [DATE ST] => 02/2/2015 [DATE A] => 02/01/15 [MODE] => 01 [AMOUNT] => 83.04 [CRATE] => 0.0500 [S] => [AccountTotal] => 4.15 )
[2] => Array ( [NAME] => SARA [Case No.] => ZZ02659940 [Duration] => RN [PLAN] => CUST GUAR [DATE ST] => 01/31/2015 [DATE A] => 01/30/2015 [MODE] => 12 [AMOUNT] => -18,450.00 [CRATE] => 0.0025 [S] => [AccountTotal] => -46.13 )
[3] => Array ( [NAME] => MIKE [Case No.] => KH02979366 [Duration] => RN [PLAN] => CUST GUAR [DATE ST] => 02/02/15 [DATE A] => 02/01/2015 [MODE] => 01 [AMOUNT] => 109.83 [CRATE] => 0.0025 [S] => .50 [AccountTotal] => .14 ) )
Maybe this will work?
$a = <<<EOT
NAME Case No. Duration PLAN ACCT DATE ST DATE A MODE AMOUNT CRATE S AccountTotal
PETER AB02651341 RN BUILDER IUL CTAT 02/05/15 02/05/15 01 380.00 0.0050 1.90
JOHNSON, DON A BF06010672 FY AGGVANT 15 NT1 02/02/15 02/01/15 01 83.04 0.0500 4.15
SARA ZZ02659940 RN CUST GUAR 01/31/15 01/30/15 12 18,450.00- 0.0025 46.13-
MIKE KH02979366 RN CUST GUAR 02/02/15 02/01/15 01 109.83 0.0025 .50 .14
EOT;
// One regex fragment per column, in the order the columns appear in a row.
$cols = array(
    'NAME'         => '^\s*(.*?)',
    'Case No.'     => '\s+(\w\w\d{8})',
    'Duration'     => '\s(\w\w)',
    'PLAN'         => '\s+(.*?)',
    'DATE ST'      => '\s+(\d\d/\d\d/\d\d)',
    'DATE A'       => '\s+(\d\d/\d\d/\d\d)',
    'MODE'         => '\s+(\d\d)',
    'AMOUNT'       => '\s+(-?.*?)',
    'CRATE'        => '\s+(\d+\.\d+)',
    // S is optional, so its whitespace and value are wrapped in an optional group.
    'S'            => '(?:\s+([.\d]+))?',
    'AccountTotal' => '\s+(-?.*?)$',
);
$result = array();
// Split on any line ending; the header row does not match and is skipped.
foreach (preg_split('/\R/', $a) as $row) {
    if (preg_match('#' . implode('', $cols) . '#', $row, $matches)) {
        // Move any trailing dash to the front of AMOUNT and
        // AccountTotal (a bit hackish - could be improved :)
        $matches[8] = preg_replace('/(.*)-$/', '-$1', $matches[8]);
        $matches[11] = preg_replace('/(.*)-$/', '-$1', $matches[11]);
        // Drop the full-match entry and key the captures by column name.
        $result[] = array_combine(array_keys($cols), array_slice($matches, 1));
    }
}
print_r($result);
Output:
Array
(
[0] => Array
(
[NAME] => PETER
[Case No.] => AB02651341
[Duration] => RN
[PLAN] => BUILDER IUL CTAT
[DATE ST] => 02/05/15
[DATE A] => 02/05/15
[MODE] => 01
[AMOUNT] => 380.00
[CRATE] => 0.0050
[S] =>
[AccountTotal] => 1.90
)
[1] => Array
(
[NAME] => JOHNSON, DON A
[Case No.] => BF06010672
[Duration] => FY
[PLAN] => AGGVANT 15 NT1
[DATE ST] => 02/02/15
[DATE A] => 02/01/15
[MODE] => 01
[AMOUNT] => 83.04
[CRATE] => 0.0500
[S] =>
[AccountTotal] => 4.15
)
[2] => Array
(
[NAME] => SARA
[Case No.] => ZZ02659940
[Duration] => RN
[PLAN] => CUST GUAR
[DATE ST] => 01/31/15
[DATE A] => 01/30/15
[MODE] => 12
[AMOUNT] => -18,450.00
[CRATE] => 0.0025
[S] =>
[AccountTotal] => -46.13
)
[3] => Array
(
[NAME] => MIKE
[Case No.] => KH02979366
[Duration] => RN
[PLAN] => CUST GUAR
[DATE ST] => 02/02/15
[DATE A] => 02/01/15
[MODE] => 01
[AMOUNT] => 109.83
[CRATE] => 0.0025
[S] => .50
[AccountTotal] => .14
)
)
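The trailing-dash step that the code comments call hackish could also be moved into a small helper. The following is only a sketch under assumptions: `parseAmount` is a hypothetical name, not part of the answer above, and it assumes amounts use `,` as a thousands separator and a trailing `-` for negative values, as in the sample rows.

```php
<?php
// Hypothetical helper (an assumption, not from the answer above):
// convert an accounting-style amount such as "18,450.00-" to a float.
function parseAmount(string $raw): float
{
    $raw = trim($raw);
    // A trailing minus marks a negative value in the sample data.
    $negative = substr($raw, -1) === '-';
    // Remove the trailing minus and any thousands separators before casting.
    $digits = str_replace(',', '', rtrim($raw, '-'));
    $value = (float) $digits;
    return $negative ? -$value : $value;
}

// Usage on values taken from the sample rows.
foreach (array('380.00', '18,450.00-', '46.13-', '.14') as $raw) {
    printf("%s => %.2f\n", $raw, parseAmount($raw));
}
```

Returning a float rather than a re-signed string makes the values usable for arithmetic; if the exact original formatting has to be kept, the string-based `preg_replace` approach is the simpler choice.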
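You can put ? after a regexp to mark it as optional. So if XXX is the regexp for the earlier part of the line, you can write:
preg_match('/^XXX(?:\s+([\d.]+))?\s+([\d.]+)$/', $line, $match);
When the field is not present, the capture group for the S field will be empty.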
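To see the optional group in action, here is a minimal sketch that applies the same idea to just the tail of a row. The two input strings are simplified stand-ins (CRATE, optional S, AccountTotal) taken from the sample data, and the `$tails` array and its keys are chosen here for illustration only.

```php
<?php
// Simplified stand-ins for the end of a row: CRATE, optional S, AccountTotal.
$lines = array(
    '0.0050 1.90',      // S omitted
    '0.0025 .50 .14',   // S present
);

$tails = array();
foreach ($lines as $line) {
    // (?:\s+([\d.]+))? is the optional S column; the last group is AccountTotal.
    if (preg_match('/^(\d+\.\d+)(?:\s+([\d.]+))?\s+([\d.]+)$/', $line, $match)) {
        $tails[] = array(
            'CRATE'        => $match[1],
            'S'            => $match[2],   // empty string when S is missing
            'AccountTotal' => $match[3],
        );
    }
}
print_r($tails);
```

When S is missing, the engine first tries to read the last number as S, finds nothing left for AccountTotal, backtracks, and skips the optional group, so the number ends up in the AccountTotal capture.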