Category : PowerShell

SportyPal Screen Scraping with PowerShell

For the past year (2010) I had been using SportyPal – an application for tracking exercise (mainly runs, in my case). It has mobile apps (iPhone, Android, WinMob etc.) that do the actual tracking and then upload the data to their website, where you can view history, graphs, charts, records and so on. It really is neat, and the user interface on the mobile device looks good (at least the WinMob and Android versions I used)…

I had been pinging them on their forums for a while about when their subscription service would launch and about getting a subscription trial. They announced that they expected it in Summer 2010, but it was delayed and delayed, and although I got vague answers about release dates on their forum and Twitter, it still was not available by the end of December 2010.

So, much as I loved their app, it was time to switch – RunKeeper was the new app/service I chose. The problem I faced was how to get my year's worth of data out of SportyPal – they do allow you to export GPX data for each run, but only on a run-by-run, manual basis – not good.

Time to crack open PowerShell…

A bit of poking around with Chrome Developer Tools and Fiddler2 identified the sequence for logging in and the format of the history/activity page that lists all runs.
So now I had a basis for screen-scraping the data I needed, and I also noticed that for each run there was a link to download the GPX.

I put together a script that logs in, opens the activities page and grabs the data about each run; it also downloads a copy of the GPS data to a separate file for each run, with the filename set to the date/time stamp of the run.
I couldn't see any easy method of bulk importing into RunKeeper (though to be honest I didn't look for very long), but as I only had 100 runs to import I simply did it manually (RunKeeper's import function is only about 3 clicks).

Initially, every GPX file I imported came up with an error about the GPX being invalid; however, after a browse around the forums I found that one of the namespaces was incorrect (http://www.topografix.com/GPX/1/0 when it should be http://www.topografix.com/GPX/1/1). Fixing the namespace did the trick, and on importing each GPX file the correct run details, route etc. all showed up – so all was good.

Here’s the PowerShell script – it logs in to your account, screen scrapes all your activities and then downloads the GPX for each. It also ‘fixes’ the downloaded GPX so it has the correct namespace :

$email = "your_email"
$password = "your_password"


$url = "http://www.sportypal.com/Login"

"Starting..."
""

     [System.Net.HttpWebRequest] $request = [System.Net.HttpWebRequest] [System.Net.WebRequest]::Create($url)
     $cookieJar = new-object "System.Net.CookieContainer"
     $request.CookieContainer = $cookieJar
     $request.Method = "POST"
     $request.ContentType = "application/x-www-form-urlencoded"
     $param = "Email=" + $email + "&Password=" + $password + "&login=Login"

     $request.ContentLength = $param.Length

     [System.IO.StreamWriter] $stOut = new-object System.IO.StreamWriter($request.GetRequestStream(), [System.Text.Encoding]::ASCII)
     $stOut.Write($param)
     $stOut.Close()

     "Logging in..."
     ""

     [System.Net.HttpWebResponse] $response = [System.Net.HttpWebResponse] $request.GetResponse()

     if ($response.StatusCode -ne 200)
     {
           $result = "Error : " + $response.StatusCode + " : " + $response.StatusDescription
     }
     else
     {
           $sr = New-Object System.IO.StreamReader($response.GetResponseStream())
           $txt = $sr.ReadToEnd()
           $cutstart = $txt.Substring($txt.IndexOf('<table id="my_workouts"'))
           $cutend = $cutstart.Substring(0,$cutstart.IndexOf("</div>"))

           "Getting workouts"
           $workouts = @()
           $ipos = 0
           while(($ipos -ne -1) -and ($ipos -lt ($cutend.Length -1)))
           {
                $s = $cutend.IndexOf("<tr id=", $ipos)
                if ($s -ne -1)
                {
                    $e = $cutend.IndexOf("</tr>", $s)
                }
                else
                {
                    $e = -1
                }
                if(($e -ne -1) -and ($s -ne -1))
                {
                    $tr = $cutend.Substring($s, ($e + 5) - $s)
                    $workouts += $tr
                }
                $ipos = $e
           }

           #$workouts | %{ $id = $_.Substring(11,6); $id }

           "Got " + $workouts.Length + " workouts"
           foreach($wo in $workouts)
           {
                $id = $wo.Substring(11,6)
                $s = $wo.IndexOf("dateval_$id") + 23
                 # the page stores the timestamp as milliseconds since 1970-01-01
                 $dt = (New-Object System.DateTime 1970, 1, 1).AddMilliseconds($wo.Substring($s, 13))
                $s = $wo.IndexOf("td_number clickDetails") + 24
                $e = $wo.IndexOf("</td>", $s)
                $dist = $wo.Substring($s, $e-$s).Trim()
                $dist = $dist.Substring(0, $dist.Length-2)
                $s = $wo.IndexOf("td_number clickDetails", $e) + 24
                $e = $wo.IndexOf("</td>", $s)
                $time = $wo.Substring($s, $e-$s).Trim()
                $s = $wo.IndexOf("td_number clickDetails", $e) + 24
                $e = $wo.IndexOf("</td>", $s)
                $cals = $wo.Substring($s, $e-$s).Trim()
                $cals = $cals.Substring(0, $cals.Length-4)
                "Workout on $dt ( ID = $id ) : $dist : $time : $cals calories"

                # now grab the GPX
                $filename = "c:scriptssportypal" + $dt.ToString("yyyy-MM-dd_HHmm") + ".gpx"
                $gpxUrl = "http://www.sportypal.com/Workouts/ExportGPX?workout_id=$id"
                [System.Net.HttpWebRequest] $gpxRequest = [System.Net.HttpWebRequest] [System.Net.WebRequest]::Create($gpxUrl)
                $gpxRequest.CookieContainer = $request.CookieContainer
                $gpxRequest.AllowWriteStreamBuffering = $false
                $gpxResponse = [System.Net.HttpWebResponse]$gpxRequest.GetResponse()
                [System.IO.Stream]$st = $gpxResponse.GetResponseStream()

                # write to disk
                $mode = [System.IO.FileMode]::Create
                $fs = New-Object System.IO.FileStream $filename, $mode
                $read = New-Object byte[] 256
                [int] $count = $st.Read($read, 0, $read.Length)
                while ($count -gt 0)
                {
                    $fs.Write($read, 0, $count)
                    $count = $st.Read($read, 0, $read.Length)
                }
                $fs.Close()
                $st.Close()
                $gpxResponse.Close()
                "- GPX Data in $filename"
           }

     }

     $response.Close()



    # now fix the namespace error
    $files = gci "c:\scripts\sportypal\*.*"
    foreach ($file in $files)
    {
        "Fixing namespace error in " + $file
        $content = get-content $file
        $content[0] = $content[0].Replace('xmlns="http://www.topografix.com/GPX/1/0"', 'xmlns="http://www.topografix.com/GPX/1/1"')
        Set-Content $file $content
    }

    "Complete"

Feel free to use this (at your own risk – I accept no liability whatsoever), but realise that you may be breaking all manner of T’s & C’s for SportyPal.

… and on a final note to anyone from SportyPal :

Sorry. I persevered, I really did. I was ready to throw money at you for a PRO subscription, and I could even have lived with a further slipped release date, but not keeping your fans updated and giving no idea of a roadmap – that doesn't make us feel the love.


UPDATE: Thanks to Ricardo, who identified a bug where workouts marked as private would not download correctly. The script is now updated with this fix.

PowerMeter from PowerShell

I was trying to get my Home Energy Monitor application working with Google PowerMeter this evening. To get things moving quickly I decided to prototype in PowerShell (as you get the full sugary goodness of the .NET framework for free). Here are the details on accessing PowerMeter from PowerShell…

Although it is called Google PowerMeter, it is simply a service that records variables (of either cumulative or instantaneous values).

Firstly you need to register for an account, and it is not obvious how you actually get a Google PowerMeter account if you don't have one of the supported devices or a contract with one of the partner utility companies. The easy way is to put together a URL that basically requests a new account. The format of the URL is:

https://www.google.com/powermeter/device/activate?mfg=MMM&model=PPP&did=DDD&cvars=N

All the details are on this page. I have a CurrentCost Envi, so my url became:

https://www.google.com/powermeter/device/activate?mfg=CurrentCost&model=Envi&did=8097&dvars=1

Note I'm using dvars at the end instead of cvars – dvars are for durational measurements and cvars are for instantaneous measurements – you need to get these right or your uploads will fail. dvars=1 means I want only 1 variable (energy); I could have opted for more (dvars=2, dvars=3 etc.), but 1 will do for now.
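If you prefer to build the activation URL in PowerShell rather than by hand, here's a minimal sketch using the same CurrentCost/Envi values as above (substitute your own manufacturer, model and device id):

# Values from my setup - replace with your own device details
$mfg   = "CurrentCost"
$model = "Envi"
$did   = 8097
$dvars = 1     # one durational variable (energy)

# Emit the activation URL ready to paste into a browser
"https://www.google.com/powermeter/device/activate?mfg=$mfg&model=$model&did=$did&dvars=$dvars"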

When you've created your URL, simply browse to it. Google will authenticate you with your usual Google account and then ask you to give a friendly name to the variable(s) you created. When complete you'll be presented with an activation code (you can get this activation code again by browsing to your settings page in Google PowerMeter). From this activation code you need 3 pieces of data:


The first is your ‘auth token’, the second is your ‘user id’ and the third is your ‘device id’.

Now for the PowerShell script. It is fairly simple: it creates an XML string with the start date of the reading, the duration (in seconds) and the value, and then uploads this to Google PowerMeter. It does need some headers adding first to make sure you're sending the correct Content-Type and that you are authorized…

$auth = "YOUR AUTH TOKEN"
$user = "YOUR USER ID"
$device = "YOUR DEVICE ID"
$variable = "d1"
$url = "https://www.google.com/powermeter/feeds/event"
$var = "https://www.google.com/powermeter/feeds/user/$user/$user/variable/$device.$variable"

$start = [System.Xml.XmlConvert]::ToString([System.DateTime]::Now)
$duration = 1
$energy = 9999

$xmlData = @"
    <feed xmlns=`"http://www.w3.org/2005/Atom`"
    xmlns:meter=`"http://schemas.google.com/meter/2008`"
    xmlns:batch=`"http://schemas.google.com/gdata/batch`">
    <entry>
        <category scheme=`"http://schemas.google.com/g/2005#kind`"
        term=`"http://schemas.google.com/meter/2008#durMeasurement`"/>
        <meter:subject>{0}</meter:subject>
        <batch:id>A.1</batch:id>
        <meter:startTime uncertainty=`"1.0`">{1}</meter:startTime>
        <meter:duration uncertainty=`"1.0`">{2}</meter:duration>
        <meter:quantity uncertainty=`"0.001`" unit=`"kW h`">{3}</meter:quantity>
    </entry>
    </feed>
"@

$rdgData = [string]::Format($xmlData, $var, $start, $duration, $energy)

$wc = New-Object System.Net.WebClient
$whc = New-Object System.Net.WebHeaderCollection
$whc.Add("Content-Type", "application/atom+xml")
$whc.Add("Authorization", "AuthSub token=`"$auth`"")
$wc.Headers = $whc

$response = $wc.UploadString($url, "POST", $rdgData)

Now you have the XML response in the $response variable. To check this you can simply evaluate ([xml]$response).feed.entry.status.code – you're looking for a 201 ('Created').
You should now have a measurement lodged with Google PowerMeter !!  Enjoy…
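As a quick sanity check – a minimal sketch, assuming the single A.1 batch entry used above – something like this prints whether the reading was accepted:

# Pull the batch status code out of the Atom response
$status = ([xml]$response).feed.entry.status.code
if ($status -eq 201) { "Reading accepted (201 Created)" }
else { "Upload failed : status $status" }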


Generate sitemaps using PowerShell

I was discussing the 'googlability' – a new word I made up meaning 'the ability to be found via Google' – of our knowledgebase with one of the technical guys at work.
It seems that we seldom get matches in Google searches (and the built-in search is somewhat lame) – I was quite surprised that Google wasn't matching anything.

Looking into it a bit further, I found that although our knowledgebase is public, the URLs are pretty undiscoverable, all having an 'articleid' parameter – obviously the GoogleBot couldn't just guess at the values and so was skipping the majority of our articles, apart from the few listed on the main page.

We needed to give it some hints by adding a sitemap. I (ever so) briefly toyed with adding a sitemap page to the knowledgebase website using the standard XML based sitemap protocol etc, but our site is written in PHP and I didn’t want to get bogged down in all that again…
In a rare burst of being pragmatic and keeping things simple (as opposed to _way_ over engineering a solution) I recalled that Google’s webmaster tools allow you to submit a text file as a sitemap with one Url per line.

I knew the format of the URL for our articles, so it just required a bit of PowerShell to generate a bunch of lines containing URLs with sequential numbers and write them to a file. Version 1 looked like this:

set-content "c:sitemap.txt" (1..1000 | %{ "http://support.c2c.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=$_&nav=0`n" })

However, uploading this sitemap caused the Google machine to choke and spew out a bunch of errors about invalid URLs… A little more digging uncovered that the uploaded text file must be encoded in UTF-8. So version 2 looked like this:

set-content "c:sitemap.txt" (1..1000 | %{ "http://support.c2c.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=$_&nav=0`n" }) -encoding UTF8

Out popped a text file with 1000 Urls, in the correct format, with the correct encoding and accepted by the Google machine with no problems.
Probably 10 minutes' work all in – I wouldn't even have got the PHP coding tools fired up in that time – reminder to self: "KISS works!!"


Google Results Ranking

Disclaimer: Screen-scraping results like this probably contravenes Google's Terms of Use (or something) and I do not advocate that you do it – this is purely hypothetical; if I did want to do it, this is how I would go about it  😉

Further Disclaimer: The results page formats could change at any time and may well break this script, if that happens you are on your own (FireBug and some modified regex should help you out).


So, if you wanted to get the Google ranking of a bunch of domains when searching for a particular term, you could use one of the many SEO page-ranking test sites that are available, but these are a pain inasmuch as they require you to enter the search term and the domain name you are looking for, and then give you the ranking (what position in the results the domain name comes in). That is fine for individual searches (like what position kapie.com comes in if I search on 'Ken Hughes'), but not very good for doing a comparison of multiple domains against the search term.

I looked at using Google's Search API to get this info, but unfortunately it only returns 4 or 8 results (it is mainly designed to present some brief results in a box on your website); what I needed was to look at a lot more results (like up to 500)…

Back to my trusty friend – PowerShell…

I create a web client, have it download the first X (500) results for the search term, load each link URL and its position into a hashtable, and then look up the hashtable to find the rank position of each of the domain names I am looking for.
It was actually pretty easy; the only difficult part was getting the regexes correct – regex is evil, as evil as Perl…

Here is the script code :

  $domainNames = "google.com", "live.com", "bing.com", "yahoo.com"
  $maxResults = 100
  $searchTerm = "search"

  $urlPattern = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"
  $hitPattern = "<\s*(h3)\sclass=r>(.*?)</\1>"

  $wc = new-object "System.Net.WebClient"
  $urlRegex = New-Object System.Text.RegularExpressions.Regex $urlPattern
  $hitRegex = New-Object System.Text.RegularExpressions.Regex $hitPattern
  $urls = @{}

  $resultsIndex = 0
  $count = 1
  while($resultsIndex -lt $maxResults)
  {
    $inputText = $wc.DownloadString("http://www.google.com/search?q=$searchTerm&start=$resultsIndex")

    "Parsing : " + $resultsIndex

    $index = 0
    while($index -lt $inputText.Length)
    {
      $match = $hitRegex.Match($inputText, $index)
      if($match.Success -and $match.Length -gt 0)
      {
        $urlMatch = $urlRegex.Match($match.Value.ToString())
        if(($urlMatch.Success) -and ($urlMatch.Length -gt 0))
        {
          $newKey = $urlMatch.Groups[1].Value.ToString()
          if(!$urls.ContainsKey($newKey))
          {
            $urls.Add($newkey, $count)
          }
          $count++
        }
        $index = $match.Index + $match.Length
      }
      else
      {
        $index = $inputText.Length
      }
    }
    $resultsIndex += 10
  }


  foreach($domain in $domainNames)
  {
    $maxPos = -1
    foreach($key in $urls.Keys)
    {
      if($key.Contains($domain))
      {
        $pos = [int] $urls[$key]
        if(($pos -lt $maxPos) -or ($maxPos -eq -1))
        {
          $maxPos = $pos
        }
      }
    }
    if($maxPos -eq -1)
    {
      $domain + " : Not Found"
    }
    else
    {
      $domain + " : Found at result #" + $maxPos
    }
  }

Drop me a line in the comments if you find it useful…


Replace in Files for PowerShell

A while back I restructured my website so that this blog no longer started at the root, instead starting from /blog. This was so that I could introduce some other web apps and have a subfolder for projects etc.

One of the pains of this restructure was modifying all the links – I thought I had caught all this with a Redirector HttpModule, but recently realised that for some reason I had not caught images embedded in the posts themselves.
Also, it was becoming a pain having to remember to include the HttpModule in my web.config every time I upgraded my blog (dasBlog).

I wanted it fixed properly this time, so grabbed a copy of all the XML files in my ‘content’ folder, copied them to a local folder and cracked open PowerShell…

I wanted every instance of www.mywebsite.com changed to www.mywebsite.com/blog – not difficult, but this would also change valid urls such as www.mywebsite.com/blog/page.aspx to www.mywebsite.com/blog/blog/page.aspx (note the /blog/blog in the url)

So I got everything I needed done with two ‘one liners’ in PowerShell…

dir | %{ $a = get-content $_ ; $a = $a -replace ("www.mywebsite.com", "www.mywebsite.com/blog") ; set-content $_ $a }

…and…

dir | %{ $a = get-content $_ ; $a = $a -replace ("www.mywebsite.com/blog/blog", "www.mywebsite.com/blog") ; set-content $_ $a }
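
As an aside, a single pass with a regex negative lookahead would have avoided creating the /blog/blog URLs in the first place – a rough sketch (untested against my actual content):

# Only prefix URLs that are not already followed by /blog
dir | %{ $a = get-content $_ ; $a = $a -replace "www\.mywebsite\.com(?!/blog)", "www.mywebsite.com/blog" ; set-content $_ $a }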

All fixed…

 


Tweet from PowerShell

I have been playing with Twitter recently and thought it might be neat to see if I could post a 'tweet' from PowerShell. There is a great Google Group that discusses their API. The APIs are all REST based and really easy to use – the only complexity is that you need HTTP Basic Authentication to do anything 'real'.

One of the simpler API calls is to get the public timeline. No authentication is required for this, so you can simply paste the URL into your browser and get back the data (XML format, but JSON and other formats are available too). Try this:

http://twitter.com/statuses/public_timeline.xml
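
You can do the same from PowerShell with a WebClient – a minimal sketch, assuming the statuses/status XML layout that URL returns:

# Grab the public timeline and list who said what
$wc = New-Object System.Net.WebClient
[xml]$timeline = $wc.DownloadString("http://twitter.com/statuses/public_timeline.xml")
$timeline.statuses.status | %{ $_.user.screen_name + " : " + $_.text }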

Now, for doing an update we need the following API:

update

Updates the authenticating user’s status.  Requires the status parameter specified below.  Request must be a POST.

URL: http://twitter.com/statuses/update.format

Formats: xml, json.  Returns the posted status in requested format when successful.

Parameters:

  • status.  Required.  The text of your status update.  Be sure to URL encode as necessary.  Must not be more than 160 characters and should not be more than 140 characters to ensure optimal display.

The fact it must be a POST means we have to use a HttpWebRequest (as opposed to the easier WebClient). Anyway, here is the PowerShell function :

function Send-Tweet([string]$text, [string]$username, [string]$password)
{
     $updateurl = "http://twitter.com/statuses/update.xml"
     $result = $null

     # System.Web is not loaded by default - needed for UrlEncode
     [void][System.Reflection.Assembly]::LoadWithPartialName("System.Web")
     $text = [System.Web.HttpUtility]::UrlEncode($text)

     [System.Net.HttpWebRequest] $request = [System.Net.HttpWebRequest] [System.Net.WebRequest]::Create($updateurl)
     $request.Credentials = new-object System.Net.NetworkCredential($username, $password)
     $request.Method = "POST"
     $request.ContentType = "application/x-www-form-urlencoded"
     $param = "status=" + $text
     $sourceParam = "&source=PowerShell"
     $request.ContentLength = $param.Length + $sourceParam.Length

     [System.IO.StreamWriter] $stOut = new-object System.IO.StreamWriter($request.GetRequestStream(), [System.Text.Encoding]::ASCII)
     $stOut.Write($param)
     $stOut.Write($sourceParam)
     $stOut.Close()

     [System.Net.HttpWebResponse] $response = [System.Net.HttpWebResponse] $request.GetResponse()
     if ($response.StatusCode -ne 200)
     {
           $result = "Error : " + $response.StatusCode + " : " + $response.StatusDescription
     }
     else
     {
           $sr = New-Object System.IO.StreamReader($response.GetResponseStream())
           [xml]$xml = $sr.ReadToEnd()
           $id = $xml.status.id
           $tweet = $xml.status.text
           if ($tweet.length -gt 50) { $tweet = $tweet.Substring(0,50) + "...(truncated)" }
           $result = "Tweet " + $id + " added : " + $tweet
     }

     return $result
}

And to use it :

send-tweet "I'm sending updates from PowerShell, cool or what ??" "<your_username>" "<your_password>"

 

PowerShell Plus

One of my colleagues switched me on to PowerShell Plus and I’m loving it.

Code editor, snippets, values of variables, logging tools and much more, including a really neat feature called 'MiniMode'.

This 'MiniMode' closes all toolbars/tool windows except the main console, but also makes the console window transparent (with a user-configurable level of transparency). This mode is really easy to work with…


There is a free single user license for non commercial use.

I encourage you to try it out.


Immersing myself in PowerShell

PowerShell has been around for some time now, what with betas and CTPs. For (the released version of) Vista it became available a month or so ago.

It’s been on my ‘must get to grips with’ list for a while now and I’ve kinda been following some blogs about it, slowly getting a little knowledge here and there.

Mr Hanselman (the oracle for all things technical) has done a couple of podcasts on it and I listened a couple of weeks ago to an episode of Hanselminutes where he interviewed Bruce Payette. Bruce is the language architect for PowerShell and has just (2 weeks ago) released a book on it (Windows PowerShell in Action).

Bought the book last week in the US and have had it open ever since. Everything I do, I now do with a rosy PowerShell perspective. The only way (I find) to really get to grips with something is to completely immerse yourself in it – think in it, live it, breathe it…

Today I have been updating the server-side file for our Archive One (email archiving for Exchange) auto-update feature (we've just released V5.0 SR1, so the build numbers that are checked have changed). It's a simple XML file that is parsed for the 'GA' release version (ProdVer) and 'HF' version (hotfix). It looks like this (sample only):

<?xml version="1.0"?>
<Versions>
    <AOnePolService ProdVer="5.0.0.1643" HotFix="5.0.0.1643"></AOnePolService>
    <AOneCmplService ProdVer="4.3.0.1077" Hotfix="4.3.0.1094"></AOneCmplService>
</Versions>

I wanted to provide a quick and easy way to find the latest version of each product. I came up with the following PowerShell one-liner (split over three lines for readability):

([XML] (new-object ("net.webclient")).Downloadstring(
"http://support.c2c.com/versioncheck/currentversions.xml")).versions.get_ChildNodes() |
% { "" } { $_.psbase.Name + "`t GA=" + $_.prodver + "`t HF=" + $_.hotfix } { "" }

It downloads the XML file, parses it and lists out the product, GA version and HF version.
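
For the sample file above, the output looks something like this (values taken straight from the sample):

AOnePolService    GA=5.0.0.1643   HF=5.0.0.1643
AOneCmplService   GA=4.3.0.1077   HF=4.3.0.1094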

Watch this space for some Active Directory related stuff as I have been very active in scripting AD over the past few days and am in the process of porting it to PowerShell.