23 March 2012

Floating-point

Single-precision: sign 1b, exponent 8b, fraction 23+1b implied (= 6 ~ 9 decimal sigfigs)
Double-precision: sign 1b, exponent 11b, fraction 52+1b implied (= 15 ~ 17 decimal sigfigs)

Special cases:
  • Exponent = 0
    • fraction = 0: (+/-) zero
    • fraction != 0: "subnormal" number with implied bit set to 0 instead
  • Exponent = (FF or 7FF, maximum value in allocated bits)
    • fraction = 0: (+/-) infinity
    • fraction != 0: NaN (sign ignored)
      • top explicit fraction bit = 1: "quiet NaN"
      • top explicit fraction bit = 0 (and rest != 0): "signaling NaN"
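
To poke at these cases directly, here is a minimal sketch that pulls the three fields out of a double and names the case. It assumes a Python float is an IEEE 754 double (true on the usual platforms); classify_double is a made-up helper name, not anything from a library.

```python
import struct

def classify_double(x):
    """Return (sign, exponent, fraction, kind) for a Python float."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)     # 52 explicit fraction bits
    if exponent == 0:
        kind = "zero" if fraction == 0 else "subnormal"
    elif exponent == 0x7FF:               # all exponent bits set
        if fraction == 0:
            kind = "infinity"
        elif fraction >> 51:              # top explicit fraction bit
            kind = "quiet NaN"
        else:
            kind = "signaling NaN"
    else:
        kind = "normal"
    return sign, exponent, fraction, kind
```

For example, classify_double(5e-324) reports the smallest subnormal (exponent 0, fraction 1), and classify_double(-0.0) shows that negative zero differs from positive zero only in the sign bit.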

18 March 2012

Cross-Platform Regular Expressions

There used to be a lot of blathering about regexes but I think nobody would read them anyway and I want a reference. Also it's incomplete and there is no logic whatsoever to which parts I chose. This is why I wanted colorful code tags.

The patterns in this post are the raw regex strings, which may need further escaping in code. Perl's regexes are the base of many others, but tiny differences abound.
References: perlre; java.util.regex.Pattern; Python re; vimdoc *pattern*;
As expected, normal alphanumeric characters are regexes that match themselves and only themselves.
^ matches start of string.
$ matches end of string.
. matches any character except newline.
| matches the OR of two regexes (Vim: requires escaping as \|.)
(pattern) groups a regex (Vim: requires escaping as \(pattern\).)
[chars] matches any of a set of characters, e.g. c or h or a or r or s.
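
A quick sanity check of the basics above, sketched with Python's re module (one of the flavors referenced in this post). The sample strings are made up for illustration.

```python
import re

literal = re.search(r"cat", "concatenate")        # plain characters match themselves
anchored = re.search(r"^cat$", "concatenate")     # ^ and $ pin the match to both ends
dot = re.findall(r"c.t", "cat cot c\nt")          # . refuses to match the newline
either = re.findall(r"cat|dog", "cat, dog, cow")  # | picks either alternative
group = re.search(r"gr(a|e)y", "grey").group(1)   # (...) groups and captures
charset = re.findall(r"[chars]", "chart")         # any one of c, h, a, r, s

print(literal.group(), anchored, dot, either, group, charset)
```

Here anchored comes out as None because "concatenate" is not exactly "cat", and the "t" in "chart" is skipped because it is not in the set.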

Quantifiers

hee hee, h3 tags
Quantifiers (Vim calls them "multis") take the previous piece of regex and allow it to match repeatedly some number of times (possibly none).

*: 0 or more
+: 1 or more
?: 0 or 1
{5}: 5
{5,}: 5 or more
{5,10}: 5 to 10 inclusive
Python: {,10} is allowed (Perl only accepts it from 5.34 on!)
Vim: \+, \?, and \{ must be escaped for the above meaning (under normal 'magic' setting). You can escape or not escape the }. \= is a synonym for \?, as a second question mark would delimit the offset in backward search. \{} or \{,5} is allowed.
Quantifiers bind tightly to the single piece just before them, so 42+ matches strings like 42222, not 42424242.
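
A small sketch of the counting forms and the tight binding, using Python's re. The full helper is a hypothetical convenience wrapper around re.fullmatch, so each check asks whether the whole string matches.

```python
import re

def full(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "42+ on 42222": full(r"42+", "42222"),              # + applies to the 2 only
    "42+ on 42424242": full(r"42+", "42424242"),        # so this fails
    "(42)+ on 42424242": full(r"(42)+", "42424242"),    # grouping repeats both digits
    "a{5} on aaaaa": full(r"a{5}", "a" * 5),
    "a{5,} on 7 a's": full(r"a{5,}", "a" * 7),
    "a{5,10} on 11 a's": full(r"a{5,10}", "a" * 11),    # one too many
    "a{,10} on empty": full(r"a{,10}", ""),             # Python reads {,10} as 0 to 10
}

for name, result in checks.items():
    print(name, result)
```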

The above quantifiers are greedy; they try to match as much as possible. At the end of any of them, add an extra ? and it becomes reluctant (Java tutorial term), trying to match as little as possible. Alternatively, adding a + makes the quantifier possessive; it matches everything it can and never gives any of it back, even if the rest of the pattern needs it and the whole match fails as a result.
Python (before 3.11), Vim: no possessives.
Vim: No adding ?s. Instead only bracket quantifiers can be made reluctant, by adding a dash, like \{-5,10}.
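
Greedy versus reluctant is easiest to see side by side. A minimal sketch in Python's re; only the greedy and reluctant forms are shown, since possessive support depends on the flavor and version. The sample strings are made up.

```python
import re

tags = "<a><b>"

greedy = re.search(r"<.+>", tags).group()       # runs to the last >
reluctant = re.search(r"<.+?>", tags).group()   # stops at the first >

# The same ? suffix works on bracket quantifiers.
greedy_count = re.search(r"a{2,4}", "aaaa").group()
reluctant_count = re.search(r"a{2,4}?", "aaaa").group()

print(greedy, reluctant, greedy_count, reluctant_count)
# <a><b> <a> aaaa aa
```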

Character Classes

As we've seen [aeiou] matches one of the characters between the square brackets. Then there are a variety of commonly-occurring character-sets that people want to match. Some very common ones:
\w = "word" character (alphanumeric, underscore)
\d = digit
\s = whitespace (space, tab, newlines)
\h = horizontal whitespace
Each can be capitalized to match any character not in the class instead. Note that they might include foreign Unicode characters with the same properties; check your language.
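
Here is roughly how those classes behave in Python's re, as a sketch. Python's re has no \h, so it is left out here. The re.ASCII flag is what turns off the Unicode-aware behavior mentioned above.

```python
import re

words = re.findall(r"\w+", "foo_bar 42!")     # letters, digits, underscore
digits = re.findall(r"\d", "a1b22")
pieces = re.split(r"\s+", "a \t\n b")         # space, tab, newline all count
non_words = re.findall(r"\W", "a-b c")        # capitalized = everything else

# By default \w on a str also accepts accented letters ...
unicode_ok = re.fullmatch(r"\w+", "na\u00efve") is not None
# ... unless the pattern is restricted to ASCII.
ascii_ok = re.fullmatch(r"\w+", "na\u00efve", re.ASCII) is not None

print(words, digits, pieces, non_words, unicode_ok, ascii_ok)
```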

Weird Stuff

X is a regex.
(?#X): nothing, merely a comment (not in Java)
(?:X): X, non-capturing
(?=X): X, via zero-width positive lookahead
(?!X): not X, via zero-width negative lookahead
(?<=X): X, via zero-width positive lookbehind
(?<!X): not X, via zero-width negative lookbehind
(?>X): X, as an independent, non-capturing group

As should be expected Vim has different notation for every single one of these:
\%(X\): X, non-capturing
For other assertions, put one of \@=, \@!, \@<=, \@<!, \@> after a regex, probably a group.

Explanation: Zero-width assertions match an empty substring, but only if the text after (lookahead) or before (lookbehind) that position matches (positive) or does not match (negative) the pattern.

Independent group: It will look for the "first" way this pattern matches and only match that. Same idea as possessive quantifiers.
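
A sketch of the non-capturing group and the four zero-width assertions in Python's re, with made-up sample strings. The independent group is left out because whether Python accepts it depends on the version.

```python
import re

# (?:X) groups without capturing, so findall reports the whole match.
non_capturing = re.findall(r"(?:ab)+c", "ababc")
# A capturing group makes findall report the group (its last repetition).
capturing = re.findall(r"(ab)+c", "ababc")

# Lookahead: check what follows without consuming it.
ahead = re.findall(r"\d+(?= dollars)", "10 dollars, 20 euros")
not_ahead = re.search(r"foo(?!bar)", "foobar foobaz").start()

# Lookbehind: check what precedes without consuming it.
behind = re.findall(r"(?<=\$)\d+", "$15 and 30")
not_behind = re.findall(r"(?<!\$)\b\d+", "$15 and 30")

print(non_capturing, capturing, ahead, not_ahead, behind, not_behind)
```

Note that the " dollars" and "$" never show up in the results: the assertions only test the surrounding text, they do not add it to the match.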

Here's a Vim string for a standalone URL from the random rst syntax I'm staring at. Admire.
"\<\%(\%(\%(https\=\|file\|ftp\|gopher\)://\|\%(mailto\|news\):\)[^[:space:]'\"<>]\+\|www[[:alnum:]_-]*\.[[:alnum:]_-]\+\.[^[:space:]'\"<>]\+\)[[:alnum:]/]"


A weak Vim substitute command for detecting raw ampersands in XML, for some reason.
:%s/&\(amp;\|gt;\|lt;\|quot;\|#\)\@!/\&/gc
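
The same check ports to the Perl-style notation with a negative lookahead. A rough Python sketch, just as weak as the Vim version (it only knows the same five escapes); the sample text is made up, and replacing hits with &amp; is an assumption about what you would want to do with them.

```python
import re

# An ampersand NOT followed by one of the known entity tails.
RAW_AMP = re.compile(r"&(?!amp;|gt;|lt;|quot;|#)")

text = "a & b &amp; c &#38; d &lt; e &foo"

# Report where the raw ampersands are.
positions = [m.start() for m in RAW_AMP.finditer(text)]

# Assumed fix: escape each raw ampersand.
escaped = RAW_AMP.sub("&amp;", text)

print(positions)
print(escaped)
```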

Also. Infamous RFC regex

17 March 2012

Chapter 23

I happen to feel like this is a good way to, I don't know, let off steam and deal with reality. It's probably just me.
1
  • (dermal tissue)
  • (vascular tissue)
  • (ground tissue)
  • epidermal cell
  • (trichome)
  • (xylem) [nonlinear syllabus; from Chapter 22]
  • (tracheid) [ditto] 
  • (phloem) [ditto]
  • vessel element
  • sieve tube element
  • companion cell
  • parenchyma
  • collenchyma
  • sclerenchyma
  • meristem
  • meristematic tissue
  • apical meristem
  • differentiation
2
  • taproot
  • fibrous root
  • (epidermis)
  • root hair
  • cortex
  • endodermis
  • vascular cylinder
  • root cap
  • Casparian strip
3
  • node
  • internode
  • bud
  • vascular bundle
  • pith
  • primary growth
  • secondary growth
  • vascular cambium
  • cork cambium
  • heartwood
  • sapwood
  • bark
4
  • blade
  • petiole
  • mesophyll
  • palisade mesophyll
  • spongy mesophyll
  • stoma
  • guard cell
  • transpiration
5
  • adhesion
  • capillary action
  • pressure-flow hypothesis
  • (transpirational pull)

04 March 2012

Accent Marks

cliché
déjà vu
doppelgänger
naïve
Pokémon
touché


Of course substituting the unmarked letters will annoy only the grammar Nazis among us, but...