f-log

just another web log

20 May 2012 22:15
Continuing my TV recipe
The film query was an easy one; now on with the programmes.
Let's start with the SQL query (which I run in the excellent SQLite Manager Firefox plugin, which incidentally I have contributed code to).
SELECT
    T.[id],
    P.[Title],
    P.[Episode],
    P.[Description],
    P.[sub-title],
    strftime('%d',P.[Date]),
    P.[Start Time],
    C.[Name]
FROM
    programme P
INNER JOIN Channel C ON P.channelId = C.Id
LEFT JOIN Tracker T ON P.Description = T.Description
WHERE
    P.[title] in ('The Simpsons','Mythbusters','Pawn Stars','Family Guy','South Park','American Dad!','QI')
    AND (
            (
                date = date('now')
                AND [start time]>=time('now')
            )
            OR date=date('now','+1 day')
    )
    AND P.[Episode] !=''
    AND (
        T.[Title] is null
        OR T.[seen] =0
        OR P.[title] LIKE 'Robot Chicken%'
    )
ORDER BY [date],[start time]

"T.[id]" gives me the ID for the tracker script to set the 'seen' flag.
All the "P." columns are from the "Programme" table, and "C.[Name]" is the channel name rather than the channel ID.
The "INNER JOIN" matches each channel ID to a channel name.
The "LEFT JOIN" keeps all programmes, even those without a corresponding tracker entry.
Then limit to my currently preferred programmes where the date and time are today or tomorrow.
Make sure the tracker is either null (not set) or explicitly set to 0 (not seen), with the special caveat that I really want to watch Robot Chicken!
The strftime('%d') just lets me easily see whether it's today or tomorrow without a massive date string; I know what year it is!
All this means two things: one, if I have not set an episode to 'seen' then it will show in my search, so I feel no compulsion to record everything I have not seen; two, I will not miss any random episodes that are on at different times or on different channels.
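That tracker filter can be sketched as a tiny shell function, purely for illustration (the helper name and arguments are mine, and nothing here queries the real database):

```shell
# Sketch of the WHERE-clause tracker logic: show a programme when there is
# no tracker row (seen is empty), it is explicitly unseen (0), or the title
# is a must-watch like Robot Chicken.
show_programme() {
    local seen="$1" title="$2"
    if [ -z "$seen" ] || [ "$seen" = "0" ]; then
        return 0
    fi
    case "$title" in
        "Robot Chicken"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Same effect as the SQL: a null/0 tracker lets a programme through, a seen flag of 1 hides it, and Robot Chicken always shows.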
You might find this strange if you have skimmed through all the TV recipe posts, but I am watching considerably less than I used to, mainly because I only go to the TV when I know something is on or needs to be recorded.
The only downside is, I might miss something outside my search parameters, like a new show, but hey I can live with that.
20 May 2012 21:43
Continuing my TV recipe
So now I have a nice big database of all the programmes from my preferred channels (read: all channels minus the ones I do not want/have no access to). What can I do with it?
Time for some background on why I wanted to get all that data.
When I am tired or ill or just want to veg out, I like to watch TV, but even with Sky's hundreds of channels I often find nothing on. Being in 'veg' mode, I then watch anything.
To try and fix this problem I started recording programmes and films I might want to watch in a future 'veg' episode. This works, but takes up a huge amount of time and it's easy to miss things.
For instance, I used to regularly check Film4 each night, but that missed films on other channels. On a Friday I would try to scour all the major channels for the coming weekend, just for films, again often missing odd ones here and there.
In the meantime, for non-films I would just record every episode and hope for the best, which meant I often recorded tons of old episodes I did not want to see again.
The Sky interface is just too slow to manually check each recording and I still managed to miss stuff.
So... this project automates the data acquisition and then I can make the choices of what to record almost automatic.
But there is a problem. The Simpsons (for instance), which is on my episode hit-list, has 300+ episodes, and I do not want to read the description each time to find out whether it is one I have seen before.
A new table is born, "Trackers". The "Trackers" table has most of the same fields as the "Programmes" table minus the date/time and with an additional field "Seen".
I can now mark an episode as 'seen', and when I 'Join' the "Trackers" and "Programmes" tables together I need never see that episode in the results again.
To set an episode to 'seen' I have a tiny bash script:
#!/bin/bash
echo "Set Seen Tracker"
echo "================"
echo "(hit return with no value to exit)"
tid="nothing yet"
while [ ${#tid} != 0 ]
do
    echo -n "tracker id:"
    read tid
    if [ ${#tid} != 0 ]; then
        sqlite3 database/tv.db "update tracker set seen=1 where id=$tid"
    fi
done

The ID comes up in my queries and I just enter it in the window running the above script, *bing*, done.
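As a sanity check, the SQL the script runs can be factored into a helper that only prints the statement; a minimal sketch (the function name is my own invention, and nothing here touches the database):

```shell
# Hypothetical helper: build the UPDATE the tracker script would hand to
# sqlite3 for one id, and just print it so it can be eyeballed first.
mark_seen_sql() {
    echo "update tracker set seen=1 where id=$1"
}

mark_seen_sql 42
```

Swapping the echo back to `sqlite3 database/tv.db "$(mark_seen_sql $tid)"` restores the real behaviour.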
For the film query I just need all the data from today onwards that is flagged as a Film.
select [Title],[Genre],[Year],[Name],[Date],[Start Time] from programme P INNER JOIN Channel C ON P.channelId = C.Id where [Date] >= date('now') AND Film='true' ORDER BY [Date],[start time]
And although this may return over one thousand results for a two-week period, it takes almost no time to eyeball the list looking for interesting films; many are repeats or +1s.
Now I never need to search the film channels again while remaining confident that I am not missing anything.
The other query for programme episodes is a bit more complex so I will leave that for another post ...
05 May 2012 21:41
I have been recently reading a number of Cory Doctorow (expect reviews) books.
Although I had heard of the author long ago, comments on slashdot.org referring to him as a 'hack' meant I had given his work a wide berth.
But what I HAVE to post about now is his non-fiction title ©ontent which you can download for FREE from http://craphound.com/content/download/
The author REALLY wants you to download all his books for FREE and they exist in many MANY formats.
I like mine in paper format and so got them from the Cory Doctorow section of Amazon, where a number were only 1p (+p&p) each.
The subtitle of ©ontent (or just Content) is
Selected Essays on the Technology, Creativity, Copyright, and the Future of the Future
It is extremely difficult to describe exactly what it entails, other than to say *YOU MUST READ THIS BOOK*, and you must try to get as many people as possible to also read it.
After reading it I want to order 1000s of copies and distribute them to everyone I see.
Comprising twenty-eight stories/essays that are all well researched and very readable, Cory details the REAL whys, whats and wherefores of humanity's past, present and future connection with copyright, copying, the Internet and intellectual property.
The key thing I came away with is that, where we are in the history of mankind, we are doing it all wrong, not just from a personal standpoint but from a VERY clear historical one.
Radio was to change the world by broadcasting performances to anyone who could build a set, then cassette recorders, then video recorders and now the Internet.
Each one was a game changer and the laws were updated to ensure mankind benefited, but for some reason this is NOT happening with the Internet and digital copying.
"Copying stuff is never, ever going to get any harder than it is today"
Which makes me ask, 'How long until we can each have a copy of every relevant (and possibly irrelevant) piece of data created since the beginning of recorded history, every text of every book, every image from every web site, every film/television broadcast, every digital post/tweet, every sound bite from every radio programme, every recorded thought ever, in the palm of our hand, digitally searchable and reproducible?'
05 May 2012 21:06
Continuing my TV recipe
Got my preferred channel list, time to get some programmes.
As noted in this post, the programme files are just tilde '~' delimited text files, one programme per line. So I just loop through the list of channel ids and end up with a load (202) of numbered .dat files.
Before I could do anything with them I had to fix the line endings.
egrep -lIUr $'\r$' raw/channels/*.dat | xargs sed -i 's/\r//'
Which translates to find all the files with a \r at the end of the line and remove it.
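The same fix can be tried safely on a throwaway file first; a small sketch assuming GNU sed (for the in-place -i flag and the \r escape):

```shell
# Make a file with Windows line endings, strip the \r, and check it is clean.
tmp=$(mktemp)
printf 'line one\r\nline two\r\n' > "$tmp"
sed -i 's/\r//' "$tmp"
grep -q $'\r' "$tmp" && echo "still dirty" || echo "clean"
```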
Now, I could (and probably should) have written a SQL INSERT command that only adds new records, but instead I read all the data into a programmeTEMP table and then INSERT new records into the main programme table.
sqlite3 database/tv.db "delete from programmeTEMP"
for i in raw/channels/*.dat;
do
    channelid=$(echo "$i" | sed -e "s/[^0-9]*//g")
    cat raw/channels/$channelid.dat | grep "~" \
        | sed -e 's/\"/\\\"/gi' -e "s/'/''/gi" -e "s/~/','/gi" \
              -e "s/^/insert into programmeTEMP values (null,$channelid,'/gi" \
              -e "s/$/');/gi" \
        | sqlite3 database/tv.db
done

Which just loops through all the channels inserting all fields.
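To see what that sed pipeline actually emits, here it is run on one made-up programme line (the record fields and channel id are invented for illustration, and the INSERT is printed rather than piped to sqlite3):

```shell
# One fake '~' delimited programme record, pushed through the same sed steps
# the loop uses: escape quotes, turn '~' into ',', then wrap the line in an
# INSERT statement for the programmeTEMP table.
channelid=4
sample="The Simpsons~Homer Badman~3~1994"
sql=$(echo "$sample" | sed -e 's/\"/\\\"/gi' -e "s/'/''/gi" -e "s/~/','/gi" \
    -e "s/^/insert into programmeTEMP values (null,$channelid,'/gi" -e "s/$/');/gi")
echo "$sql"
```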
Next, and very important: the Date field needs to be made searchable, as its initial incarnation is just a string.
sqlite3 database/tv.db "UPDATE programmeTEMP SET [Date] = (SUBSTR([Date],7,4)||'-'||substr([Date],4,2))||'-'||substr([Date],1,2) WHERE [Date] LIKE '__/__/____';"
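The SUBSTR shuffling is easier to read as the equivalent shell substring expansions (shell offsets are 0-indexed where SQLite's SUBSTR is 1-indexed):

```shell
# dd/mm/yyyy -> yyyy-mm-dd, mirroring SUBSTR([Date],7,4), SUBSTR([Date],4,2)
# and SUBSTR([Date],1,2) from the UPDATE statement above.
d="20/05/2012"
iso="${d:6:4}-${d:3:2}-${d:0:2}"
echo "$iso"
```

In ISO form the dates sort and compare correctly as plain strings, which is what makes date('now') comparisons work.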
Now, matching on the Channel ID, Programme Title, Date and Start Time, import the new records.
echo "insert into programme select null,PT.channelId,PT.Title,PT.[Sub-Title],PT.Episode,PT.Year,PT.Director,PT.Performers,PT.Premiere,PT.Film,PT.Repear,PT.Subtitles,PT.Widescreen,PT.[New series],PT.[Deaf signed],PT.[Black and White],PT.[Film star rating],PT.[Film certificate],PT.Genre,PT.Description,PT.[Radio Times Choice],PT.Date,PT.[Start Time],PT.[End Time],PT.Duration from programmeTEMP PT \
    LEFT JOIN programme P ON PT.channelId = P.channelId AND PT.Title = P.Title AND PT.Date = P.Date AND PT.[Start Time] = P.[Start Time] WHERE P.title IS NULL;" | sqlite3 database/tv.db

Note how it is easy to use either sqlite3 method: pipe the command in, or pass it as the second parameter to sqlite3.
Now I have all the raw data I need.
Next time, why the trackers and what can be mined.
05 May 2012 20:33
Got a postcard this morning. No, not one from a friend or family member on holiday with a nice picture on it.
This postcard had simply
Please call Emma on
01482 380680
thankyou

The note was handwritten and looked like something a delivery company would put through the door, except someone had been in all day.
But what really spooked me was that it was addressed to "The Occupier", handwritten along with our address.
The wife wanted me to call straight away, but I googled the phone number and found out this is a scam perpetrated by a debt collection agency called Westcot.
Now, you know how the back of a postcard is in two halves, one for the message and one for the address? Well, the separator on this one is small print, VERY small print.
2f is a trading name of Westcot credit services limited, registered in Scotland. no. 84131.
registered office: kyleshill house, 1 glencarin street, salcoats KA21 5JT

A normal person will not see that the dividing line is in fact tiny, tiny printed text.
If you get one of these, DO NOT RING! DO NOT GIVE THE REFERENCE NUMBER!
We have no bad debt, so this is not legitimate. If you suspect the postcard relates to a real debt of yours, wait until you receive a proper sealed letter.
Card was postage paid HQ44483
Note: I am classifying this as a SCAM due to the deceptive practices used; what the company is doing does appear to be legal (barely).