Learning By Way Of Ignorance
Tuesday 09/10/19 12:09:37 PM

Lately I've stumbled upon a variety of tools included in most Linux distributions that I had never needed, was never made aware of, or just never found.

One example would be comm :

$ whatis comm
comm                 (1)  - compare two sorted files line by line
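As a quick illustration (the file names here are just throwaway examples), comm splits its output into three columns - lines unique to the first file, lines unique to the second, and lines common to both - and the -1/-2/-3 options suppress those columns:

```shell
# comm requires both inputs to be sorted
printf 'apple\nbanana\ncherry\n' > /tmp/comm_a
printf 'banana\ncherry\ndate\n'  > /tmp/comm_b

comm /tmp/comm_a /tmp/comm_b        # three tab-indented columns
comm -13 /tmp/comm_a /tmp/comm_b    # suppress columns 1 and 3: lines only in the second file
```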

Another [embarrassing] example is scp's ability to copy files/directories from one remote machine to another remote machine - in one step - without the files/directories ever being copied locally:

$ ssh host2 'ls -la /tmp/foolish'
total 8
drwxr-xr-x 2 mm   mm   4096 Jul  3 12:22 .
drwxrwxrwt 9 root root 4096 Jul  3 12:23 ..

$ scp host1:/etc/hosts host2:/tmp/foolish/

$ ssh host2 'ls -la /tmp/foolish'
total 12
drwxr-xr-x 2 mm   mm   4096 Jul  3 12:24 .
drwxrwxrwt 9 root root 4096 Jul  3 12:24 ..
-rw-r--r-- 1 mm   mm    221 Jul  3 12:24 hosts

Lacking a personal life and always seeking to incorporate new tools, I've developed a serious jones - a habit, if you will - of listing packages and looking up binaries and scripts that I don't recognize (I have a crush on every package with 'utils' in the name). I would also stumble on the likes of GNU gold while hanging out in /usr/bin or /usr/sbin. This would always result in me wanting to find out which packages my new friends were members of and immediately sifting through their contents.

Considering most of the hosts that I work...err...hang out with are based on Red Hat Enterprise Linux (or RHEL in ComputerNerdish), I would search through RPM package listings. In the past I have worked extensively with RPMs: I've managed hosts and their contents with rpm, I've managed Apt-RPM and Yum package repositories, and I've built and currently build RPMs. In fact, I recently finished an automated package build and distribution solution that works with Subversion and will be releasing it to the wild very soon. In spite of all this *braggadocio*, I was under the impression that the only way to query which RPM a file or directory belonged to was by running rpm -qf /path/to/file, and the only way to list the contents of that package was to run rpm -ql packagename. That is, until I incorporated some slick functions into my .bashrc that replaced the two-step operation of looking up and listing packages, and found myself listing the contents of the package rpm:

$ wrl rpm

Listing Contents of rpm-


Doh! I didn't even have to look at rpmquery's usage to know that all the work I had just finished had been Elisha Gray'd (except, unlike Gray, I wasn't even close to timely). Nevertheless, it was mildly fun and educational to reinvent the wheel (or at least some of rpmquery's functionality), so I thought I would share what I came up with.

Shell Functions As A Tool Test Drive

I spend a lot of time writing tools to automate or simplify stuff I make my minions (computers) do. I've found that I can non-intrusively test the functionality of a potential tool by incorporating it into my environment via functions in my .bashrc.

Let's look at the first two functions, which together accomplish looking up the package name for any file either in my PATH or specified by full path.

From $HOME/.bashrc:

function Validate {
    [ -d "$1" ] && Results="$1" && return 0
    local ItsType
    ItsType=$(type -t "$1") || return 1
    [[ "${ItsType}" == file ]] && Results=$(type -P "$1") && return 0
    # Aliases, keywords, functions and builtins are rejected
    return 1
}

function Which_RPM {
    Validate "$1"  && Target=${Results} || return 1
    Validate rpm   && Rpm=${Results}    || return 1
    ${Rpm} -qf "${Target}"
}


The first function is used to verify the existence, path and type of any non-builtin executables. This is used primarily for portability and secondarily for security. It serves as a functional resource (either directly or indirectly) for the rest of the functions covered here.


Unlike the first, the second function accomplishes one of the tasks that I set out to achieve; namely, to return the package name for any base (i.e. not the full path) file name that happens to be in my $PATH. It will also return the package name for any file or directory specified by full path on the command line. The first two operations utilize Validate to check the location of the argument passed on the command line and to check the location of the rpm executable. The last line combines the two to get the desired result.

One thing I want to point out is my naming convention for variables and functions.

Variables : Clear, purpose named, CamelCase styled, curly brace bound

Validated Commands : Variable name matches the command for which it is stored, with first letter capitalized, curly brace bound

Functions : Clear, purpose named, CamelCase like, with underscored word separation

All-caps variables reduce readability, and therefore I avoid them. The use of underscores has been something I avoided in the past - they look terrible with lowercase and they waste keyboard strokes (this is solved later on). However, I have found that caps with underscore-separated names look pretty good. Finally, the underscores aid in distinguishing the quasi namespace hierarchy, which nicely separates functions from other stuff (like variables born of my crazy variable naming convention).

$ Which_RPM monit

$ Which_RPM /etc/postfix        

The next functional goal is to be able to list the contents of the packages that Which_RPM returns.


function Which_RPM_List {
    Validate rpm && Rpm=${Results} || return 1
    for SomeFile in "$@"; do
        WhichRPMResult=$(Which_RPM ${SomeFile}) || continue
        echo -e "\nListing Contents of ${WhichRPMResult}:\n"
        ${Rpm} -ql ${WhichRPMResult}
    done
}

The first thing to note is that although Which_RPM_List calls Which_RPM, we still have to Validate the rpm executable. Bash has no mechanism or concept of inheritance, and variable scoping is fairly restrictive when it comes to functions. Normally the right solution to this problem is to declare a global variable for anything that is used or shared by more than one function, but I wanted to see if I could accomplish everything in referenceable, modular units. Also notice that the name of the function follows the standard of establishing a namespace. Let's see this function in action:

$ Which_RPM_List monit

Listing Contents of monit-5.1.1-1.el5.rf:


At this point I was fairly happy, but I immediately realized that Which_RPM will not deal with more than one argument passed to it. Sure, I could go back and change it so that it can deal with multiple files or dirs, returning multiple package names, but I won't. Out of laziness and curiosity I decided to create another function that will use Which_RPM to do its thing on an entire iterated list. Here's what we got:


function Which_RPM_Plural {
    Validate basename && Basename=${Results} || return 1
    for SomeFile in "$@"; do
        WhichRPMResult=$(Which_RPM ${SomeFile}) || continue
        JustBase=$(${Basename} ${SomeFile})
        echo "${JustBase} :: ${WhichRPMResult}"
    done
}

Seems pretty straightforward, even if it is not as efficient as it could be. One of the things I wanted to make sure it could deal with was passing everything in a directory, as well as Bash brace-expandable paths:

$ Which_RPM_Plural /etc/profile.d/*
bash_completion.sh :: bash-completion-20060301-1.el5.rf
colorls.csh :: coreutils-5.97-23.el5_4.2
colorls.sh :: coreutils-5.97-23.el5_4.2
glib2.csh :: glib2-2.12.3-4.el5_3.1
glib2.sh :: glib2-2.12.3-4.el5_3.1
lang.csh :: initscripts-8.45.30-2.el5.centos
lang.sh :: initscripts-8.45.30-2.el5.centos
less.csh :: less-436-2.el5
less.sh :: less-436-2.el5
vim.csh :: vim-enhanced-7.0.109-6.el5
vim.sh :: vim-enhanced-7.0.109-6.el5
which-2.sh :: which-2.16-7

$ Which_RPM_Plural /sbin/{fsck,service,chkconfig}
fsck :: e2fsprogs-1.39-23.el5
service :: initscripts-8.45.30-2.el5.centos
chkconfig :: chkconfig-

Yeah, this is nice, except it's butt ugly. Never fear, another function is here:

function Which_RPM_Plural_Formatted {
    Validate column && Column=${Results} || return 1
    Which_RPM_Plural "$@" | ${Column} -t
}

And tonight's winning lottery numbers are:


$ Which_RPM_Plural_Formatted /etc/profile.d/*
bash_completion.sh  ::  bash-completion-20060301-1.el5.rf
colorls.csh         ::  coreutils-5.97-23.el5_4.2
colorls.sh          ::  coreutils-5.97-23.el5_4.2
glib2.csh           ::  glib2-2.12.3-4.el5_3.1
glib2.sh            ::  glib2-2.12.3-4.el5_3.1
lang.csh            ::  initscripts-8.45.30-2.el5.centos
lang.sh             ::  initscripts-8.45.30-2.el5.centos
less.csh            ::  less-436-2.el5
less.sh             ::  less-436-2.el5
vim.csh             ::  vim-enhanced-7.0.109-6.el5
vim.sh              ::  vim-enhanced-7.0.109-6.el5
which-2.sh          ::  which-2.16-7

$ Which_RPM_Plural_Formatted /sbin/{fsck,service,chkconfig}
fsck       ::  e2fsprogs-1.39-23.el5
service    ::  initscripts-8.45.30-2.el5.centos
chkconfig  ::  chkconfig-

At this point I should be happy, but I am far from it. Why? Well, while I like the long, quasi namespaced, formatted function names, they make for terrible command line usage (terrible to type). We could set some aliases to shorter names OR we can bust out some more functions. YEAH! (obviously I am intentionally getting carried away):

function wr {
    Which_RPM "$@"
}

function wrl {
    Which_RPM_List "$@"
}

function wrpf {
    Which_RPM_Plural_Formatted "$@"
}

This is where the quasi namespaced function names really pay off. I simply name each new substituting function with the lowercased first character of each word in the function name being substituted. Now to show off them shorty function names from the command line. But first, tab completion:


$ wr
wr        write     wrjpgcom  wrl       wrpf      
$ wr

1 $ wrpf /etc/profile.d/*
bash_completion.sh  ::  bash-completion-20060301-1.el5.rf
colorls.csh         ::  coreutils-5.97-23.el5_4.2
colorls.sh          ::  coreutils-5.97-23.el5_4.2
glib2.csh           ::  glib2-2.12.3-4.el5_3.1
glib2.sh            ::  glib2-2.12.3-4.el5_3.1
lang.csh            ::  initscripts-8.45.30-2.el5.centos
lang.sh             ::  initscripts-8.45.30-2.el5.centos
less.csh            ::  less-436-2.el5
less.sh             ::  less-436-2.el5
vim.csh             ::  vim-enhanced-7.0.109-6.el5
vim.sh              ::  vim-enhanced-7.0.109-6.el5
which-2.sh          ::  which-2.16-7

Alas, thorough satisfaction is achieved - except for the part where I found that rpmquery already provides more or less everything I came up with:

rpmquery -f

$ rpmquery -f /sbin/{fsck,service,chkconfig}

Bourne/Bash Have Hash

Thursday 08/04/11

For a while I have been using function Validate in order to ensure the existence of all non-builtin commands needed by a script. This also aided in making my scripts slightly more secure by validating that the command I expect to be a binary is actually a binary, and not an alias to a malicious script or even a malicious function established in the current shell. The benefits of function Validate don't exactly come without tradeoff. While one of my goals will always be to maximize the usage of language builtins, some of my scripts make use of a fair number of non-builtin commands. For instance, Ambit uses the following non-builtin commands:

mkdir uniq sort sed cat touch host rm grep 
column head comm tail mv tr egrep logger

None of this stuff is 'exotic' or gleaned from packages that are not usually part of the typical base *NIX install, but it is still a lot of commands to validate. Luckily, the validation process happens only once and has proven to execute very fast. But...

As someone who has overlooked important information when learning a subject, I have developed a habit of going back and re-reading material (over and over sometimes). Today I was looking through Bash's Parameter Expansion documentation when I stumbled on the hash builtin:

$ help hash
hash: hash [-lr] [-p pathname] [-dt] [name ...]
    For each NAME, the full pathname of the command is determined and
    remembered.  If the -p option is supplied, PATHNAME is used as the
    full pathname of NAME, and no path search is performed.  The -r
    option causes the shell to forget all remembered locations.  The -d
    option causes the shell to forget the remembered location of each NAME.
    If the -t option is supplied the full pathname to which each NAME
    corresponds is printed.  If multiple NAME arguments are supplied with
    -t, the NAME is printed before the hashed full pathname.  The -l option
    causes output to be displayed in a format that may be reused as input.
    If no arguments are given, information about remembered commands is displayed.
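Here's a minimal sketch (the function name and command list are just examples, not lifted from Ambit) of how the hash builtin could replace a hand-rolled validation loop - it both verifies that each command resolves in PATH and caches the lookup for later calls:

```shell
Validate_Commands () {
    local Cmd
    for Cmd in "$@"; do
        # hash performs the PATH search once and remembers the result
        hash "${Cmd}" 2>/dev/null || { echo "Missing required command: ${Cmd}" >&2; return 1; }
    done
}

Validate_Commands mkdir sort sed grep && echo "All required commands found"
hash -t sort    # print the remembered full path for sort
```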


Ascendancy Of Greatness...Accurately Predicted

For those just learning who Jon Jones is after his dominating win tonight over Mauricio Rua (to become the UFC Light Heavyweight Champion), know that yours truly and an old friend predicted the coming of the aforementioned Jones. Here's a Facebook back-and-forth between myself and Jeremiah Mason 8 months ago:

August 1, 2010 at 10:27pm

Mike Marschall -> Jeremiah Mason

This is so frustrating:

"UFC prospect shines Jon Jones (R) was too quick for Vladimir Matyushenko. Cagewriter: Test passed"

The guy is already one of the best in the division and unless he has a complete meltdown he will be the best light heavyweight of all time. I thought everyone knew that until I started seeing the "prospect" label.

August 2, 2010 at 2:45pm

Jeremiah Mason -> Mike Marschall

They need him on main cards..i know there trying to build name reconition, but he's got it all...i think he's more polished than Ryan Bader, yet he's on the main card against little Nog...maybe there just keeping him back to see if anyone comes up with an answer forhim...Matuchinco/Hamil/Bonner(allegedly) world class wrestlers destroyed...give him a wc striker and i guarentee Bones is to fast for them..his only thing that needs to be tested is his chin and cardio, how he reacts if game plan A isn't working..I tell everyone who will listen, jump on this train while u can, cause once it gets going, there will be no room

August 2, 2010 at 2:45pm

Mike Marschall

Yeah there is no doubt they are trying to milk it to maximize things for everyone. He would smash Bader. Yeah the bandwagon is going to get full really fast. Baring some freak accident, I can almost promise he will be the richest fighter in MMA before he is done and he will definitely be considered the greatest of all time. He is talented, mature and marketable.

[Unbelievably] Simple Enterprise Job Scheduling/Management

Jobber is a simple solution for centrally managing scheduled jobs across many systems. Jobber uses UNIX/Linux cron in addition to isitme (part of Jobber) to determine which user (EUID - effective user id) a job will run as, what time a job will run at and what hosts a job will run on. Jobber makes use of /etc/crontab as the central configuration resource for establishing all jobs. This file can then be distributed to all hosts via a variety of methods (Puppet/Chef, SCP, Rsync, etc.).


Latest Jobber Source

How Jobber Works

Once jobber and isitme are copied to all hosts that need centralized job control, something similar to the following can be added to /etc/crontab:

    Time        User     Command  Host    Job

*/5 * * * *   Puppet   jobber   puppet  pmaster
*/5 * * * *   Puppet   jobber   puppet  pupdate
*/2 * * * *   Apache   jobber   apache  ApachePush
*/5 * * * *   Yum      jobber   yum     YumPush

The /etc/crontab file allows for job entries to be established with an additional user field. This is ideal for those that like to establish 'job' or 'role' accounts for jobs that make use of automated authentication and process execution. Other than the user field, job entries in /etc/crontab are standard cron entries that establish an execution schedule and a job to run. In the case of a job that is Jobber controlled, cron runs jobber with two command line options:

  1. The hostname or service name representing the host[s] that will execute the job. Here we define 'service name' as a group of hosts that fulfill the same need (often cooperatively) and are likely identically configured.
  2. The job {binary,script} to be executed.

Jobber then acts as a wrapper with 3 primary functions:

  1. Call isitme, passing the hostname or service name to see if the current host running jobber matches the hostname or one of the hosts that make up the service.
  2. Execute the job {binary,script} that has been passed to jobber.
  3. Log (to syslog) the job name and the status of the job.

A job's status is either "Executed", "Failed" or "Skipped". If the current host matches the hostname or one of the hosts in the service passed to Jobber then the job will execute. If it does not then the job is skipped. This means all hosts that receive the centrally managed /etc/crontab file will attempt to execute scheduled jobs.
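The decision logic above amounts to something like the following sketch (the function name and the bare hostname test are my assumptions - the real jobber calls isitme, which also understands service names, and logs via logger):

```shell
Jobber_Sketch () {
    local Target=$1 Job=$2 Status
    # Stand-in for isitme: run the job only if this host matches the target
    if [ "$(hostname -s)" = "${Target}" ]; then
        if "${Job}"; then
            Status="Executed"
        else
            Status="Failed"
        fi
    else
        Status="Skipped"
    fi
    # The real jobber sends this line to syslog
    echo "JOB=$(basename "${Job}") STATUS=${Status}"
}
```

So every host that receives the crontab fires the entry, but only the matching host actually runs the job; the rest simply record a Skipped status.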

Jobber logs all job execution to syslog (regardless of a job's final status). This + Massh results in a simple means to check job execution status no matter how many hosts are involved. The following shows that the job 'webpush' executed on 1.a.tt and not on the other two hosts:

1 $ massh -r att -c 'tail -5 /var/log/cron | grep JOB=webpush' -o | grep -v Failed
1.a.tt : 2011-03-19T07:45:02.378388-04:00 1 jobber[32491]: JOB=webpush STATUS=Executed
2.a.tt : 2011-03-19T04:45:01.512791-07:00 2 jobber[16464]: JOB=webpush STATUS=Skipped
3.a.tt : Mar 19 04:45:01 3 jobber[10349]: JOB=webpush STATUS=Skipped

Using Massh's -S (Success) option it is possible to verify the host by hostname only (Massh will only return the hostname[s] where the command ran successfully):

1 $ massh -r att -c 'tail -5 /var/log/cron | grep JOB=webpush.*E' -S 

My (Not So) Humble Opinion

Decided to post an email I wrote to a friend and former colleague where I explain my obvious disdain for Cloud Computing.

He wrote:

"...I know you are a cloud hater, but there is a shitload of opportunity to work on that stuff."

To which I eagerly replied:

I have misled almost everyone regarding my feelings about Cloud Computing. I do not have a problem with Cloud Computing as a concept. My problem is with the notion of Cloud Computing as a product or technology, when it's just bullshit Marketecture for selling one of the following:

1) The excess computing capacity of companies that have created massive, global, horizontal environments (Amazon, Google, Yahoo, Facebook, Microsoft). These environments were designed to be simple to configure, manage and grow. The nature of such environments (huge scale) demand high levels of automation.

2) All the stuff in #1, but purpose built rather than a repackaging of excess capacity and existing tools. Hosting companies like RackSpace and Media Temple have created Clouds (ugh!) for the purpose of selling generalized computing capacity to the masses.

Cloud Computing is nothing more than what the aforementioned companies have constructed to address Internet capacity/demand for ~25 years. Cloud Computing (the term) was created so that sales people could turn around and sell excess capacity along with automation and management tools created by various operations teams at many of the world's large Internet companies. The Internet and computing industry press have made the term ubiquitous via constant, trumped-up buzz (my opinion, of course). The idea of Computing as a Commodity (CPU cycles, storage) is not new - unless existing for 30 to 40 years is new. Combine this with the fact that there is no new technology represented by Cloud Computing, and it is clear that Cloud Computing amounts to:

- A little technology recycling
- A lot of hot air and conceptual fluff
- An unmitigated marketing success


© AfterTheTweet.com 2020