I need some help with AWK

Posted: Sat Nov 17, 2012 9:06 am
by jbv
Hi folks,

I've hit a wall, and I can't work out how to get around it.

While trying to restructure my build environment, I came upon a glitch (a major hole) in my limited knowledge of awk.

I am trying to process /var/lib/dpkg/available using awk and I have just discovered that the awk increment operator is creating an issue for me.

Basically, I am trying to find the string "Package: g++" in /var/lib/dpkg/available using the following awk one-liner:
Code:
awk /"Package: g++"/,/^$/ /var/lib/dpkg/available

However, the ++ in the search string appears to be interpreted by awk as an increment operator, and consequently awk is returning every section whose package name begins with the letter "g".

The following line, which looks for and outputs the section for gcc-4.3-base, works fine.
Code:
awk /"Package: gcc-4.3-base"/,/^$/ /var/lib/dpkg/available


This gets a little screwier because the "g++" is passed as a variable inside a for-loop, and the output is appended to a new file, so in the real code the awk line looks more like this:
Code:
#!/bin/bash

    cd /var/lib/dpkg/info
    for i in `find -name '*.list'`

    do
      #echo "Found : "$i
      # echo $i | sed 's:.list::'
      # the first 2 characters of $i will be "./" so we need
      # to strip the first two characters from the filename
      # and place the new string in a shell variable.
      # Then we need to remember that the last 5 characters of our
      # variable will be ".list" so now we will need
      # to strip the string ".list" from the variable.
      # The next 2 lines of this script do just that
      pkg_name=${i:2}
      pkg_name=${pkg_name/.list/}
      echo "Including Package : "$pkg_name
      #Pause
      # To make awk match at the end of a line, put "$" at the end of
      # the range's start pattern - example /"findme"$/
      # To make awk find the next blank line, anchor at the beginning and
      # at the end with nothing in between - example /^$/
      #
      awk /"Package: ""$pkg_name"/,/^$/ "/var/lib/dpkg/available" >> "/tmp/dpkg-available.new"
    done


Is there a way I can tell awk to not use auto-increment and that the "++" is really part of the search string?
Ideally, the solution should work without any other pre-processing when being used inside of my "find" loop.

Thanks in advance, Brenton

Re: I need some help with AWK

Posted: Sat Nov 17, 2012 12:36 pm
by KazzaMozz
Hi Brenton
I have never used awk, as it is beyond my knowledge, but I may be able to find some extra info to help with your problem.

You may already have found these pages, but on the off chance that you haven't, I hope they help.
Increment Operators
Increment & Decrement Operators
History & Info

More than likely you have already scoured the net for this, but if not, maybe you will find the above of some use.

Cheers
Kazza

Re: I need some help with AWK

Posted: Sat Nov 17, 2012 4:11 pm
by saintless
jbv wrote: Is there a way I can tell awk to not use auto-increment and that the "++" is really part of the search string?

I hope this will help you a bit:
In a regexp, a backslash before any character that is not in the above table, and not listed in section Additional Regexp Operators Only in gawk, means that the next character should be taken literally, even if it would normally be a regexp operator. E.g., /a\+b/ matches the three characters `a+b'.
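
A small sketch of what that means for the package search (the three sample lines are made up, not taken from the real /var/lib/dpkg/available). In a regexp, `+` is a repetition operator rather than an increment, so at least in gawk the unescaped pattern also matches names that merely start with "g"; with each `+` escaped, only the literal "g++" line should match.

```shell
#!/bin/bash
# Made-up sample standing in for lines of /var/lib/dpkg/available.
sample='Package: gawk
Package: g++
Package: gcc-4.3-base'

# Unescaped: "+" is a regexp repetition operator, so lines that merely
# contain "Package: g" can match as well.
unescaped=$(printf '%s\n' "$sample" | awk '/Package: g++/')

# Escaped: each "\+" is a literal plus sign; "^" and "$" anchor the
# match to the whole line.
escaped=$(printf '%s\n' "$sample" | awk '/^Package: g\+\+$/')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
```

Anchoring with `^` and `$` is an addition here; without it, the escaped pattern would still match longer names such as "g++-4.4".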


Or maybe it is possible to exclude the a-z characters with this:
[^ ...]
This is a complemented character list. The first character after the `[' must be a `^'. It matches any characters except those in the square brackets, or newline. For example:

[^0-9]

matches any character that is not a digit.
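
As a rough illustration of bracket lists (the sample input is made up): the first command keeps only lines that contain a non-digit, and the second relies on the fact that `+` is an ordinary character inside brackets, so `[+]` is another way to match a literal plus sign without a backslash.

```shell
#!/bin/bash
# Made-up sample input, one item per line.
sample='12345
g++
4.4'

# [^0-9] matches any single character that is not a digit, so only
# lines containing at least one non-digit are printed.
non_digit=$(printf '%s\n' "$sample" | awk '/[^0-9]/')

# Inside brackets "+" is an ordinary character, so [+] matches a
# literal plus sign.
literal_plus=$(printf '%s\n' "$sample" | awk '/^g[+][+]$/')

printf '%s\n' "$non_digit"
printf '%s\n' "$literal_plus"
```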


The information is from here:
http://www.math.utah.edu/docs/info/gawk_5.html

Cheers, Toni

Re: I need some help with AWK

Posted: Sun Nov 18, 2012 1:51 am
by jbv
Thanks Kazza, Toni,

Toni - you are a genius. That little trigger was just what I needed. Thanks.

After the input from you both and a little bit of bash string manipulation, I'm all sorted now.
I found a really neat page that was very helpful with the bash stuff <here>

One day, I will need to go back over a lot of our scripts and optimize them. Because I'm relatively new to *nix, I often just hit the net to find the way to do what I want to do. As a result of this, there are times when I'm using sed and awk to do things that I can do within bash itself. However, I'll leave that until things settle down a little.

Thanks again. Cheers, Brenton

Re: I need some help with AWK

Posted: Sun Nov 18, 2012 2:23 am
by jbv
For those asking ... Where's the Beef?

Here's the Beef ...

The scenario is as follows. We have a script that parses /var/lib/dpkg/info and finds the names of all packages that have recently been added to the dpkg database. The script builds a table of the package names, then uses awk to extract each package's details from /var/lib/dpkg/available and place them in another file.

As mentioned above, package names containing "++", such as "g++", were wreaking havoc.
The fix was to escape the "++" by placing a "\" in front of it, so that it became "\++".
Code:
g++                   needs to be    g\++
g++-4.4               needs to be    g\++-4.4
libstdc++6-4.4-dev    needs to be    libstdc\++6-4.4-dev


After playing around with sed and actually getting it to do what was needed, I remembered that you can manipulate strings in bash. The following bash substitution does the trick.
Code:
  # Escape ++ (the doubled backslash yields one literal "\" in the result)
  pkg_name=${pkg_name//++/\\++}
  # Escape --
  pkg_name=${pkg_name//--/\\--}

Running this substitution on every package name as it is processed fixes the problem.
The silly thing is that I was already using bash string manipulation on the package name.
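
For anyone who would rather avoid escaping altogether, here is a hedged alternative sketch, not the method used above: pass the name to awk as a variable and compare whole lines as strings, so nothing in the name is treated as a regexp operator. The temporary file and its stanzas are made up to stand in for /var/lib/dpkg/available. Note that `-v` processes backslash escapes in the value, which should not matter for package names.

```shell
#!/bin/bash
# Made-up stand-in for /var/lib/dpkg/available.
available=$(mktemp)
cat > "$available" <<'EOF'
Package: g++
Version: 1.0-sample
Description: made-up entry one

Package: g++-4.4
Version: 2.0-sample
Description: made-up entry two

Package: gawk
Version: 3.0-sample
Description: made-up entry three

EOF

pkg_name='g++'

# The range starts on the line that is exactly "Package: <name>" (a plain
# string comparison) and ends at the next blank line.
stanza=$(awk -v pkg="$pkg_name" '$0 == "Package: " pkg, /^$/' "$available")

printf '%s\n' "$stanza"
rm -f "$available"
```

Because the comparison is against the whole line, "g++" does not also pull in the "g++-4.4" stanza.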

Re: I need some help with AWK

Posted: Sun Nov 18, 2012 4:47 am
by saintless
Glad to help, Brenton :)
Cheers