This is a dumping ground for ideas, whimsies, and thought experiments that grab my interest.
Another list of related work:
https://github.com/tarpas/pytest-testmon/issues/42#issuecomment-310487641
https://github.com/tarpas/pytest-testmon
This pytest plugin looks like a solid and user-friendly implementation of the idea for faster TDD and CI.
https://python-tia.readthedocs.io/ – The generic Test Impact Analysis (TIA) preprocessor for test tools.
This project's documentation also provides a useful overview of related tools.
Update: coverage.py now has support for capturing this (dynamic contexts)!
Initial setup:
charm opening script: ~/.local/bin/charm (manually expand the ~ first)
Important shortcuts:
Settings:
To configure Vim or a similar program as the external editor, see here. PyCharm's line and column can be passed like -c "call cursor($LineNumber$, $ColumnNumber$)" (must be double-quotes; single-quotes don't work).
Example:
gvim -c "call cursor($LineNumber$, $ColumnNumber$)" $FilePathRelativeToSourcepath$
$SourcepathEntry$
A good model for how easy integration should be, from an author's perspective, would be something like Sentry (example integration docs).
Perhaps what's needed could be described as "Sentry for feature usage metrics"?
In fact, although Sentry is primarily aimed at error aggregation, it seems very close to being a good fit for this: perhaps it can be extended to allow for good feature usage reporting and metrics, too?
(If not, it can at least serve as design inspiration.)
Philipp Emanuel Weidmann's maybe (2016) is another interesting take: it executes shell scripts (or other executables) under ptrace, to intercept, log, and stub file system operations.
This implementation offers little safety or robustness, but it's a step toward what an abstract-interpretation-based execution previewer and modifier might look like.
There is at least one academic implementation of abstract interpretation for Bash.
This work is oriented toward static analysis and finding bugs, but the techniques could probably be repurposed for use in an interactive evaluator.
What if a tool could automate this?
What I usually find myself doing with scripts like this amounts to manual abstract interpretation: instead of executing the script, I open it in an editor and trace through it.
Simple scripts may just be a list of commands to review, but more complex scripts often involve:
I know of a number of existing open source projects that actively collect usage metrics, but these tend to be ad hoc, heavy-weight, and specialised to the project in question:
Package installations
Feature usage
This seems to be a big unfilled niche. There should really be a good common system for collecting and reporting aggregate statistics about features and users that library and application authors can use to guide decisions.
One of the most annoying things about open source is you have absolutely no idea who is using your stuff for what unless it breaks.
There is no shortage of articles and essays explaining why this anti-pattern (piping curl to sh, discussed below) is bad:
There are various attempts to address the first two problems listed below, though these still leave the further problems unresolved. One is vipe (from moreutils), which lets you inspect, and edit, the script in an editor before executing it.
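For example, assuming vipe is installed (the URL is the same placeholder used in the card below), the script can be reviewed in $EDITOR before anything runs; deleting the buffer's entire contents aborts the install:
curl -sSf https://example.org/install.sh | vipe | sh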
Erik proposed this change to libraries@haskell.org: "A better type signature for forM_" (continued here).
Haskell's Applicative type class is a nifty abstraction that inhabits a sweet spot between the ubiquity of Functor and the power of Monad. It lends itself wonderfully to the applicative style of Haskell.
This stream collects hacks and ideas related to using Applicative.
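As a tiny, self-contained illustration of that style (readX and readY are hypothetical input actions made up for this sketch):
data Point = Point Int Int deriving Show

readX, readY :: IO Int
readX = readLn
readY = readLn

main :: IO ()
main = do
  -- Applicative style: the pure constructor Point is mapped over the
  -- first effectful argument and applied through the second.
  p <- Point <$> readX <*> readY
  print p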
One solution to this problem (the silent discarding of loop results by forM_ and friends, described below) is to have variants of the underscore-suffixed functions that are type-specialised to only accept () as the loop body's result:
import Data.Foldable (for_, traverse_)
import Control.Monad (forM_)

traverse_' :: (Applicative f, Foldable t)
           => (a -> f ()) -> t a -> f ()
traverse_' = traverse_

for_' :: (Applicative f, Foldable t)
      => t a -> (a -> f ()) -> f ()
for_' = for_

mapM_' :: (Monad m, Foldable t)
       => (a -> m ()) -> t a -> m ()
mapM_' = mapM_

forM_' :: (Monad m, Foldable t)
       => t a -> (a -> m ()) -> m ()
forM_' = forM_
These make it explicit that the loop should have no result, and make it a type error to accidentally introduce a non-() result.
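For instance, with for_' in scope, the buggy shape from the bug report further down no longer compiles:
-- Fine: the body has type IO ().
for_' [1, 2, 3] print

-- Rejected: the body has type IO (IO ()), not IO (), so this is now a type error.
for_' [1, 2, 3] $ \n -> print <$> pure n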
GHC has a warning for this: with -Wall (or -fwarn-unused-do-bind, spelled -Wunused-do-bind since GHC 8) enabled, it will complain whenever a do block discards a value non-explicitly:
ghci> do putStrLn <$> getLine; return ()
Warning:
A do-notation statement discarded a result of type ‘IO ()’
Suppress this warning by saying ‘_ <- (<$>) putStrLn getLine’ or by using the flag -fno-warn-unused-do-bind
However, forM_ defeats this check by discarding all the loop's result values regardless of type. One may intend to discard only (), but when a bug like the above slips in, forM_ will just as happily discard IO () or any other type too, and the checker will be none the wiser.
This error is easy to introduce, especially in more complex code.
It's insidious when it happens: there will often be no warning sign or hard failure, only strange results or inexplicable behaviour down the line due to the actions and effects that have silently gone "missing".
The problem is in the loop body (see the buildTable code below): <$> applies maybe (HT.insert ht k v) (const $ abort k) to the result of the HT.lookup action, but the resulting IO action is then simply discarded by forM_ as a value, without ever being executed.
This is similar to saying:
putStrLn <$> getLine -- has type IO (IO ())
This action yields a separate IO action as its result. Executing it and discarding its result will end up executing only the getLine, and not the putStrLn.
Correcting this requires using join, or equivalently using =<< as the application operator instead of <$>:
putStrLn =<< getLine -- has type IO ()
This combines the two actions into a single action, as expected.
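A quick ghci session makes the difference visible (typing hello at each prompt; the first form reads the line but never runs the inner putStrLn):
ghci> putStrLn <$> getLine
hello
ghci> putStrLn =<< getLine
hello
hello
Applied to the buildTable code below, the fix is simply to swap <$> for =<< in the loop body:
forM_ pairs $ \(k, v) ->
  maybe (HT.insert ht k v) (const $ abort k) =<<
    HT.lookup ht k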
erikd on Freenode #haskell.au shares an interesting bug. The relevant code is:
buildTable :: IO EvenCache
buildTable = do
  ht <- HT.new
  forM_ pairs $ \(k, v) ->
    maybe (HT.insert ht k v) (const $ abort k) <$>
      HT.lookup ht k
  return ht
Can you spot the error?
If you write software on Unix, you've probably seen this many times:
curl example.org/install.sh | sh
Often, there is a sudo involved.
This is bad, for four main reasons:
It's (usually) insecure.
Far too many installation instructions use plain HTTP (or omit the URL scheme), which means that any malicious intermediary can inject arbitrary code and compromise the user's system. Well-known URLs mean that attackers can passively target them on any network where developers might connect to the Internet (Wi-Fi hotspots, coffee shops, hackathons…).
It's dangerously unreliable.
Even if you use securely verified HTTPS for distribution, if the connection fails while in progress, the truncated script will still be executed. If you're lucky, this might only give you an error, or a broken installation. If you're unlucky, the consequences could be disastrous: for example, rm -rf /tmp/foo/... could be truncated into rm -rf /. (A common mitigation is sketched after this list.)
It's usually poorly targeted.
Even if you solve the previous two problems, these installation scripts usually target the host environment badly. They might mess with /usr/local or /opt instead of using your OS's package manager, or vice versa. They might install their own copies of dependencies, when you want them to use your existing ones, or vice versa. They might install stuff into ~/bin when you want ~/.local/bin, or vice versa. The script has to try and work on every common OS and configuration in a fool-proof fashion, which usually means not supporting any particular OS or configuration particularly well.
It obscures useful choices.
Because these scripts are intended to be fire-and-forget, many choices of versions, options, and other tweakables are hidden away.
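A sketch of the truncation mitigation mentioned under the second reason: wrap all the work in a function that is only called on the script's very last line, so a partially downloaded script either fails to parse or defines the function without ever invoking it (the installer body here is hypothetical):
#!/bin/sh
main() {
    rm -rf /tmp/example-build   # hypothetical cleanup step
    echo "installed"
}
# A truncated download ends mid-definition (a syntax error) or before
# this line; either way, none of the steps above have executed.
main "$@"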
Those features could be baked into existing tools and test runners (unittest, py.test, trial, …), but that might involve a lot of redundant work.
Python already has a de facto standard generic code coverage tool, coverage.py. One of the great things about it is that it can invoke any Python tool or script and collect configurable coverage information about its execution, without any special support from the tool being invoked.
Could this same generic approach work for reverse coverage?
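Per the update in the related-work card above, it now can: coverage.py's dynamic contexts record which test was running for each executed line. A minimal sketch, assuming pytest and coverage.py 5.0 or later:
# .coveragerc
[run]
dynamic_context = test_function

coverage run -m pytest
coverage html --show-contexts   # the report lists, per line, the tests that ran it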
However, reverse coverage is far from being useful only at that big scale. Imagine being able to:
Most conventional test coverage tools answer the "forward" coverage question: developers want to know what code the tests exercise or miss.
Reverse coverage, by contrast, is relatively specialised. One main application of it is in the context of big, complex, automated integration builds, where you only want to re-run the subset of tests that depend on the code that changed with each commit, rather than the full test suite.
As a result, reverse coverage support tends to be obscure, or locked away inside heavy-weight distributed build systems (such as Google's formerly in-house Bazel).
First some definitions:
Forward test coverage answers the question:
Given these tests, what part of the codebase gets executed?
Reverse test coverage answers the question:
Given this part of the codebase, which tests execute it?
In an IRC discussion, the topic of collecting and using reverse test coverage information came up.