Fix #37 (undefined in Show PatternSet) by lazy match #38

Merged
merged 6 commits into from
Jul 14, 2022
21 changes: 11 additions & 10 deletions .github/workflows/haskell-ci.yml
@@ -8,9 +8,9 @@
#
# For more information, see https://github.com/haskell-CI/haskell-ci
#
- # version: 0.15.20220525
+ # version: 0.15.20220710
#
- # REGENDATA ("0.15.20220525",["github","regex-tdfa.cabal"])
+ # REGENDATA ("0.15.20220710",["github","regex-tdfa.cabal"])
#
name: Haskell-CI
on:
@@ -32,14 +32,14 @@ jobs:
strategy:
matrix:
include:
- - compiler: ghc-9.4.0.20220501
+ - compiler: ghc-9.4.0.20220623
compilerKind: ghc
- compilerVersion: 9.4.0.20220501
+ compilerVersion: 9.4.0.20220623
setup-method: ghcup
allow-failure: true
- - compiler: ghc-9.2.2
+ - compiler: ghc-9.2.3
compilerKind: ghc
- compilerVersion: 9.2.2
+ compilerVersion: 9.2.3
setup-method: ghcup
allow-failure: false
- compiler: ghc-9.0.2
@@ -107,17 +107,17 @@ jobs:
mkdir -p "$HOME/.ghcup/bin"
curl -sL https://downloads.haskell.org/ghcup/0.1.17.8/x86_64-linux-ghcup-0.1.17.8 > "$HOME/.ghcup/bin/ghcup"
chmod a+x "$HOME/.ghcup/bin/ghcup"
- if $HEADHACKAGE; then "$HOME/.ghcup/bin/ghcup" config add-release-channel https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-prereleases-0.0.7.yaml; fi
- "$HOME/.ghcup/bin/ghcup" install ghc "$HCVER"
- "$HOME/.ghcup/bin/ghcup" install cabal 3.6.2.0
+ "$HOME/.ghcup/bin/ghcup" config add-release-channel https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-prereleases-0.0.7.yaml;
+ "$HOME/.ghcup/bin/ghcup" install ghc "$HCVER" || (cat "$HOME"/.ghcup/logs/*.* && false)
+ "$HOME/.ghcup/bin/ghcup" install cabal 3.6.2.0 || (cat "$HOME"/.ghcup/logs/*.* && false)
else
apt-add-repository -y 'ppa:hvr/ghc'
apt-get update
apt-get install -y "$HCNAME"
mkdir -p "$HOME/.ghcup/bin"
curl -sL https://downloads.haskell.org/ghcup/0.1.17.8/x86_64-linux-ghcup-0.1.17.8 > "$HOME/.ghcup/bin/ghcup"
chmod a+x "$HOME/.ghcup/bin/ghcup"
- "$HOME/.ghcup/bin/ghcup" install cabal 3.6.2.0
+ "$HOME/.ghcup/bin/ghcup" install cabal 3.6.2.0 || (cat "$HOME"/.ghcup/logs/*.* && false)
fi
env:
HCKIND: ${{ matrix.compilerKind }}
@@ -186,6 +186,7 @@ jobs:
26021a13b401500c8eb2761ca95c61f2d625bfef951b939a8124ed12ecf07329
f76d08be13e9a61a377a85e2fb63f4c5435d40f8feb3e12eb05905edb8cdea89
key-threshold: 3
+ active-repositories: hackage.haskell.org, head.hackage.ghc.haskell.org:override
EOF
fi
cat >> $CABAL_CONFIG <<EOF
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,12 @@
For the package version policy (PVP), see http://pvp.haskell.org/faq .

+ ### 1.3.1.3
+
+ _2022-07-14, Andreas Abel_
+
+ - Fix an `undefined` in `Show PatternSet` ([#37](https://github.com/haskell-hvr/regex-tdfa/issues/37))
+ - Document POSIX character classes (e.g. `[[:digit:]]`)
+
### 1.3.1.2 Revision 1

_2022-05-25, Andreas Abel_
42 changes: 28 additions & 14 deletions README.md
@@ -55,7 +55,7 @@ a =~ b :: String -- or ByteString, or Text...
λ> "alexis-de-tocqueville" =~ "[a-z]+" :: String
>>> "alexis"

- λ> "alexis-de-tocqueville" =~ "[0-9]+" :: String
+ λ> "alexis-de-tocqueville" =~ "[[:digit:]]+" :: String
>>> ""
```

@@ -103,7 +103,7 @@ getAllTextMatches (a =~ b) :: [String]
λ> getAllTextMatches ("john anne yifan" =~ "[a-z]+") :: [String]
>>> ["john","anne","yifan"]

- λ> getAllTextMatches ("0a0b0" =~ "0[a-z]0") :: [String]
+ λ> getAllTextMatches ("0a0b0" =~ "0[[:lower:]]0") :: [String]
>>> ["0a0"]
```
Note that `"0b0"` is not included in the result since it overlaps with `"0a0"`.
@@ -120,6 +120,17 @@ featureful than some other regex engines you might be used to, such as PCRE.
* `\b` &mdash; Match beginning or end of word
* `\B` &mdash; Match neither beginning nor end of word

+ While shorthands like `\d` (for digit) are not recognized, one can use the respective
+ POSIX character class inside `[...]`. E.g., `[[:digit:][:lower:]_]` is short for
+ `[0-9a-z_]`. The supported character classes are listed on
+ [Wikipedia](https://en.wikipedia.org/w/index.php?title=Regular_expression&oldid=1095256273#Character_classes)
+ and defined in module
+ [`TNFA`](https://github.com/haskell-hvr/regex-tdfa/blob/95d47cb982d2cf636b2cb6260a866f9907341c45/lib/Text/Regex/TDFA/TNFA.hs#L804-L816).
+
+ Please also consult a variant of this documentation which is part of the
+ [Text.Regex.TDFA haddock](http://hackage.haskell.org/package/regex-tdfa/docs/Text-Regex-TDFA.html),
+ and the original documentation at the [Haskell wiki](https://wiki.haskell.org/Regular_expressions#regex-tdfa).
+
### Less common stuff

#### Get match indices
@@ -148,16 +159,6 @@ getAllSubmatches (a =~ b) :: [(Int, Int)] -- (index, length)

`regex-tdfa` does not provide find-and-replace.

- ## The relevant links
-
- This documentation is also available in [Text.Regex.TDFA haddock](http://hackage.haskell.org/package/regex-tdfa-1.2.3.2/docs/Text-Regex-TDFA.html).
-
- This was also documented at the [Haskell wiki](https://wiki.haskell.org/Regular_expressions#regex-tdfa). The original Darcs repository was at [code.haskell.org](http://code.haskell.org/regex-tdfa/). When not updated, this was forked and maintained by Roman Cheplyaka as [regex-tdfa-rc](http://hackage.haskell.org/package/regex-tdfa-rc).
-
- Then the repository moved to <https://github.com/ChrisKuklewicz/regex-tdfa>, which was primarily maintained by [Artyom (neongreen)](https://github.com/neongreen).
-
- Finally, maintainership was passed on again and the repository moved to its current location at <https://github.com/haskell-hvr/regex-tdfa>.
-
## Avoiding backslashes

If you find yourself writing a lot of regexes, take a look at
@@ -193,10 +194,23 @@ By building on this thesis and adding a few more optimizations, regex-tdfa match

Regardless of performance, nearly every single OS and library for POSIX regular expressions has bugs in sub-matches. This was detailed on the [Regex POSIX Haskell wiki page](https://wiki.haskell.org/Regex_Posix), and can be demonstrated with the [regex-posix-unittest](http://hackage.haskell.org/package/regex-posix-unittest) suite of checks. Test [regex-tdfa-unittest](http://hackage.haskell.org/package/regex-tdfa-unittest) should show regex-tdfa passing these same checks. I owe my understanding of the correct behavior and many of these unit tests to Glenn Fowler at AT&T ("An Interpretation of the POSIX regex Standard").

+ ### Maintenance history
+
+ The original Darcs repository was at [code.haskell.org](http://code.haskell.org/regex-tdfa/).
+ For a while a fork was maintained by Roman Cheplyaka as
+ [regex-tdfa-rc](http://hackage.haskell.org/package/regex-tdfa-rc).
+
+ Then the repository moved to <https://github.com/ChrisKuklewicz/regex-tdfa>,
+ which was primarily maintained by [Artyom (neongreen)](https://github.com/neongreen).
+
+ Finally, maintainership was passed on again and the repository moved to its current location
+ at <https://github.com/haskell-hvr/regex-tdfa>.
+
## Other related packages

- You can find several other related packages by searching for "tdfa" on [hackage](http://hackage.haskell.org/packages/search?terms=tdfa).
+ Searching for "tdfa" on [hackage](http://hackage.haskell.org/packages/search?terms=tdfa)
+ finds some related packages (unmaintained as of 2022-07-14).

## Document notes

- This was written 2016-04-30.
+ This README was originally written 2016-04-30.
22 changes: 6 additions & 16 deletions lib/Text/Regex/TDFA/CorePattern.hs
@@ -36,14 +36,17 @@ module Text.Regex.TDFA.CorePattern(Q(..),P(..),WhichTest(..),Wanted(..)

import Control.Monad (liftM2, forM, replicateM)
import Control.Monad.RWS (RWS, runRWS, ask, local, listens, tell, get, put)

import Data.Array.IArray(Array,(!),accumArray,listArray)
import Data.Either (partitionEithers, rights)
import Data.List(sort)
import Data.IntMap.EnumMap2(EnumMap)
import qualified Data.IntMap.EnumMap2 as Map(singleton,null,assocs,keysSet)
--import Data.Maybe(isNothing)
import Data.IntSet.EnumSet2(EnumSet)
import qualified Data.IntSet.EnumSet2 as Set(singleton,toList,isSubsetOf)
import Data.Semigroup as Sem

import Text.Regex.TDFA.Common {- all -}
import Text.Regex.TDFA.Pattern(Pattern(..),starTrans)
-- import Debug.Trace
@@ -278,18 +281,6 @@ makeGroupArray :: GroupIndex -> [GroupInfo] -> Array GroupIndex [GroupInfo]
makeGroupArray maxGroupIndex groups = accumArray (\earlier later -> later:earlier) [] (1,maxGroupIndex) filler
where filler = map (\gi -> (thisIndex gi,gi)) groups

- fromRight :: [Either Tag GroupInfo] -> [GroupInfo]
- fromRight [] = []
- fromRight ((Right x):xs) = x:fromRight xs
- fromRight ((Left _):xs) = fromRight xs
-
- partitionEither :: [Either Tag GroupInfo] -> ([Tag],[GroupInfo])
- partitionEither = helper id id where
- helper :: ([Tag]->[Tag]) -> ([GroupInfo]->[GroupInfo]) -> [Either Tag GroupInfo] -> ([Tag],[GroupInfo])
- helper ls rs [] = (ls [],rs [])
- helper ls rs ((Right x):xs) = helper ls (rs.(x:)) xs
- helper ls rs ((Left x):xs) = helper (ls.(x:)) rs xs
-
-- Partial function: assumes starTrans has been run on the Pattern
-- Note that the lazy dependency chain for this very zigzag:
-- varies information is sent up the tree
@@ -305,7 +296,7 @@ patternToQ :: CompOption -> (Pattern,(GroupIndex,DoPa)) -> (Q,Array Tag OP,Array
patternToQ compOpt (pOrig,(maxGroupIndex,_)) = (tnfa,aTags,aGroups) where
(tnfa,(tag_dlist,nextTag),groups) = runRWS monad startReader startState
aTags = listArray (0,pred nextTag) (tag_dlist [])
- aGroups = makeGroupArray maxGroupIndex (fromRight groups)
+ aGroups = makeGroupArray maxGroupIndex (rights groups)

-- implicitly inside a PGroup 0 converted into a GroupInfo 0 undefined 0 1
monad = go (starTrans pOrig) (Advice 0) (Advice 1)
@@ -354,8 +345,7 @@ patternToQ compOpt (pOrig,(maxGroupIndex,_)) = (tnfa,aTags,aGroups) where
-- withOrbit uses MonadWriter(listens to makeOrbit/Left), collects
-- children at all depths
withOrbit :: PM a -> PM (a,[Tag])
- withOrbit = listens childStars
- where childStars x = let (ts,_) = partitionEither x in ts
+ withOrbit = listens $ fst . partitionEithers

{-# INLINE makeGroup #-}
-- makeGroup usesMonadWriter(tell/Right)
@@ -380,7 +370,7 @@ patternToQ compOpt (pOrig,(maxGroupIndex,_)) = (tnfa,aTags,aGroups) where
withParent :: GroupIndex -> PM a -> PM (a,[Tag])
withParent this = local (const (Just this)) . listens childGroupInfo
where childGroupInfo x =
- let (_,gs) = partitionEither x
+ let gs = snd $ partitionEithers x
children :: [GroupIndex]
children = norep . sort . map thisIndex
-- filter to get only immediate children (efficiency)
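
Note on the `CorePattern.hs` change: the hand-written helpers `fromRight` and `partitionEither` appear to do the same job as `rights` and `partitionEithers` from `Data.Either` in `base`. The sketch below checks that equivalence on a small list. It is an illustration only: `events` and `fromRightOld` are hypothetical names, and `Int` and `String` stand in for the package's `Tag` and `GroupInfo` types.

```haskell
module Main where

import Data.Either (partitionEithers, rights)

-- Hypothetical stand-in for the writer output: Left plays the role of a Tag,
-- Right the role of a GroupInfo.
events :: [Either Int String]
events = [Left 1, Right "a", Left 2, Right "b"]

-- The removed helper, reproduced for comparison (type generalised).
fromRightOld :: [Either a b] -> [b]
fromRightOld []             = []
fromRightOld (Right x : xs) = x : fromRightOld xs
fromRightOld (Left _  : xs) = fromRightOld xs

main :: IO ()
main = do
  print (rights events)                        -- ["a","b"]
  print (fromRightOld events == rights events) -- True
  print (fst (partitionEithers events))        -- [1,2]
  print (snd (partitionEithers events))        -- ["a","b"]
```

Both library functions preserve the input order, which is what the difference-list accumulator in the old `partitionEither` also did.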
9 changes: 6 additions & 3 deletions lib/Text/Regex/TDFA/Pattern.hs
@@ -1,6 +1,9 @@
{-# OPTIONS_GHC -fno-warn-incomplete-uni-patterns #-}

-- | This "Text.Regex.TDFA.Pattern" module provides the 'Pattern' data
-- type and its subtypes. This 'Pattern' type is used to represent
-- the parsed form of a Regular Expression.

module Text.Regex.TDFA.Pattern
(Pattern(..)
,PatternSet(..)
@@ -105,9 +108,9 @@ instance Show PatternSet where
in shows charSpec
. showsPrec i scc' . showsPrec i sce' . showsPrec i sec'
. if '-' `elem` special then showChar '-' else id
- where byRange xAll@(x:xs) | length xAll <=3 = xAll
- | otherwise = groupRange x 1 xs
- byRange _ = undefined
+ where byRange xAll@(~(x:xs))
+ | length xAll <=3 = xAll
+ | otherwise = groupRange x 1 xs
groupRange x n (y:ys) = if (fromEnum y)-(fromEnum x) == n then groupRange x (succ n) ys
else (if n <=3 then take n [x..]
else x:'-':(toEnum (pred n+fromEnum x)):[]) ++ groupRange y 1 ys
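
Note on the `byRange` change: the old clause `xAll@(x:xs)` does not match the empty list, so an empty input fell through to `byRange _ = undefined`. With the lazy pattern `~(x:xs)` the match always succeeds, and `x` and `xs` are forced only in the `otherwise` branch, where the list has more than three elements. A minimal sketch of that behaviour follows. The names `summarizeStrict` and `summarizeLazy` are hypothetical, and the `otherwise` body is a simplified stand-in for `groupRange`, not the library's code.

```haskell
module Main where

import Control.Exception (SomeException, evaluate, try)

-- Before the fix (simplified): [] does not match (x:xs) and falls
-- through to the `undefined` clause.
summarizeStrict :: String -> String
summarizeStrict xAll@(x:xs)
  | length xAll <= 3 = xAll
  | otherwise        = [x, '-', last xs]
summarizeStrict _ = undefined

-- After the fix (simplified): the irrefutable pattern always matches;
-- x and xs are only demanded when the list has at least four elements.
summarizeLazy :: String -> String
summarizeLazy xAll@(~(x:xs))
  | length xAll <= 3 = xAll
  | otherwise        = [x, '-', last xs]

main :: IO ()
main = do
  r <- try (evaluate (length (summarizeStrict ""))) :: IO (Either SomeException Int)
  putStrLn (either (const "strict: exception") show r) -- strict: exception
  print (summarizeLazy "")       -- ""
  print (summarizeLazy "abcdef") -- "a-f"
```

The lazy pattern is safe here only because the guard rules out short lists before `x` or `xs` is used; a guard that let an empty list reach the `otherwise` branch would fail at that point instead.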
7 changes: 3 additions & 4 deletions regex-tdfa.cabal
@@ -1,7 +1,6 @@
cabal-version: 1.12
name: regex-tdfa
- version: 1.3.1.2
- x-revision: 1
+ version: 1.3.1.3

build-Type: Simple
license: BSD3
@@ -27,7 +26,7 @@ extra-source-files:

tested-with:
GHC == 9.4.1
- GHC == 9.2.2
+ GHC == 9.2.3
GHC == 9.0.2
GHC == 8.10.7
GHC == 8.8.4
@@ -47,7 +46,7 @@ source-repository head
source-repository this
type: git
location: https://github.com/haskell-hvr/regex-tdfa.git
- tag: v1.3.1.2-r1
+ tag: v1.3.1.3

flag force-O2
default: False