Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: unix

  • Make An RSS Powered Email List

    Sometimes the problems we have are insanely more complicated than they should be, because our heads get trapped in a space they shouldn’t be. For years, I used FeedBurner to manage my RSS feeds. Then I realized I didn’t care that much, I didn’t need to know who was accessing my feeds, and really the only benefit was that I could put ads in my feeds (which … no one clicked on anyway) and 12 people could subscribe via email. Well, with some research I found that I could put ads in my feeds with some WordPress functions (I’ve since removed them) and I could use Subscribe2 as a plugin to email when I had new posts. Don’t get me wrong, I really like Subscribe2 except I didn’t like the interface. It’s on the back end of WordPress. My site is a BuddyPress site. Everything is on the front end for my users, and I like it that way. It all looks ‘branded.’ The plugin dev was, fairly, under the idea of ‘Works how I want it.’ I thought about forking, but as I looked at the code, I thought that I really didn’t need WordPress to handle this. I needed a way to email an RSS feed to a mailing list. A proper, opt-in, stable, mailing list, that wouldn’t affect my blog while processing.

    I found an answer that can be applied to any site, WordPress or not.

    What?

    The problem:
    I have an RSS feed I want to be able to email to people who don’t like RSS, and I don’t want to use FeedBurner, MailChimp or any external process.

    The solution:

    • Set up a Mailman list for your emails and configure as needed
    • Install and configure rss2email
    • Set up a cron job

    Why?

    Why not use [this product]? When I presented this to people the first thing they said was ‘Just use FeedBurner!’ or ‘Use MailChimp!’ I get that I can use those third-party vendors, but I don’t want to. I like to self-host because then, if things go wrong, I have the ability to do something about it. This is also how I increase my IT toolkit. I have a need, I find a solution. Sometimes I write the code, and sometimes I just master a new tool.

    Why Mailman? Mailman (or properly GNU Mailman) is installed on my server and while it may be old, the current stable version is 2.1.14, and that came out September 2010. It’s still updated, maintained and supported. And it works. I’m familiar with it, I’m comfortable with it, and there’s no reason not to. I did experiment with phpList, which claims to be able to handle RSS feeds as ‘source’ on its own. The problem, as my Twitter buddy @JohnPBloch warned me, is it’s not friendly. He said “I don’t care much for phpList. I always felt like the software was working against me instead of with me.” And he’s right. It took 5 minutes of setup for me to gag and delete it.

    Why rss2email? I don’t think RSS is dead. In fact, I still prefer it to email (or Twitter). My unread RSS list (currently run via Google Reader) sits, quietly, patiently, awaiting my attention. I don’t get spam or junk feeds, and if I decided to mark all my Fail Blog feeds as read without reading them, no one knows but me. I don’t have to reply to anything, and I can go back to using my email for communication. That said, I know a lot of people who like to get updates in their email, so I have to take that into consideration when running a site. Sometimes, when you make a site, you cater to yourself. Sometimes you cater to your audience.

    Why cron? I’m on Linux. That’s what you do when you want to schedule things.

    How?

    Bear in mind that my details are going to be specific to my situation.

    Mailman is the easiest. You make a mailing list. I wanted an announcement mailing list, so under Privacy options -> Sender Filters I set up that all users were moderated by default and to discard their emails. Then added my email under “List of non-member addresses whose postings should be automatically accepted.” I knew I was going to have emails sent from a specific address, and I didn’t want that address to GET the emails, so by putting it on that list, I don’t have to worry about approving posts. That’s pretty much all the ‘special’ customization I did. If I’d wanted to put a reject to other people’s emails to the list, instead of a discard, I’d have added this “This is an announcement only list. Your post has been rejected.”

    rss2email was the hardest, but only because it was new. The install process is really straightforward. I made very minor tweaks to the config.py file, based on a first run when I realized that my mailing list would strip HTML. I set HTML_MAIL and USE_CSS_STYLING to 0 to get it all to plain text. I changed my DEFAULT_FROM and OVERRIDE_FROM to ‘pretty’ versions of my domain information. Then I had to customize my SMTP stuff, since it requires authentication and is on a special port (not 25).

    cron was the most surprising. Since I have to run rss2email out of the same folder it’s in (Python…) I wrote a quick shell script called rss2email.sh that has two steps. It changes directory to my install of rss2email and then it calls the command ./r2e run. Then I called it with a twice-daily (0 and 12) cron job with /home/USERNAME/rss2email.sh and it’s done. I’m not going to give you a blow-by-blow on how to do cron stuff. It’s way too complicated to try and overview here.
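
    For what it's worth, here is a minimal sketch of that wrapper. It assumes rss2email lives in ~/rss2email, which is a made-up path, and the R2E_DIR variable and run_r2e function are names I picked for the example, not anything rss2email ships with. The crontab line in the last comment is the twice-daily schedule described above.

```shell
#!/bin/sh
# rss2email.sh: run rss2email from inside its own install folder.
# R2E_DIR is an assumed location; point it at your real install.
R2E_DIR="${R2E_DIR:-$HOME/rss2email}"

run_r2e() {
    # The subshell keeps the cd from leaking into the caller.
    ( cd "$R2E_DIR" && ./r2e run )
}

# Do the work only when the install is really there.
if [ -x "$R2E_DIR/r2e" ]; then
    run_r2e
else
    echo "rss2email.sh: no executable r2e in $R2E_DIR" >&2
fi

# Matching crontab entry (minute 0 of hours 0 and 12):
# 0 0,12 * * * /home/USERNAME/rss2email.sh
```

    The cd happens in a subshell, so the script can grow more steps later without caring which folder it ended up in.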

    Satisfied?

    Pretty well. I’m still massaging the output a bit. The HTML output of a full RSS entry was weird. After changing my WordPress feed to excerpts, I ended up with some weird lines in my emails:

    This was caused by some old functions I had in my theme to insert ads into my feeds (mentioned above). I took that out, called ./r2e reset to clean out the database, and then re-ran the script. Worked fine! I admit, I had some moments where I didn’t like putting the excerpt in. I use a custom crafted excerpt in every single post on this site as part of my layout, and while I prefer to avoid ‘read more!’ type things in my feeds, I realized my custom excerpts would be ‘good enough’ for anyone who cared. I do wish WordPress would make two feeds, one excerpt and one none, but that’s a job for another day. At least I’ve managed to semi-customize my Mailman digest header (via manually editing masthead.txt).

    After all my testing, I deleted the mailing list and reset everything, did a clean build, and voila! Works like a champ, without putting extra stress on things!

  • My WordPress Scripts

    I actually have a couple little scripts I use to update my site instead of using the WordPress built-in tools. Invariably, when someone has problems with the automated upgrade tool I comment that I rarely use it. That means people ask me ‘Well how do you update WordPress?’ and I tell them ‘Manually.’ Which isn’t really true. Sometimes I say ‘I wrote a script.’ So … here’s my scripts for upgrading WordPress. I don’t have one for themes, since I always do that manually and check everything. That’s the bane of having highly customized child themes.

    These are both Bash shell scripts and run perfectly happily on my server (Linux running CentOS). They’re GPL2, free, and somewhat use at your own peril.

    WordPress Upgrade Script

    WordPress Plugin Script

    As I went to post these, I took a bit of time cleaning them up, putting in comments (the ‘Blame Nacin’ bit is a joke) and formatting them nicely. When I started to look at the plugin one, I realized how freakin’ crazy it is and how many weird custom things I tucked in there. Also I made it so you always had to put in a version number, which if I wanted to ‘release’ that, is something to do away with. One of the catches with it is that a plugin’s ‘default’ zip file of plugin-name.zip is the trunk build. So you’d always have to know your version to upgrade.

    On the WordPress upgrader, I could trick that with a simple “if the version is ‘latest’, then download latest.zip”, because the WordPress most recent release is latest.zip. Now, I never call that. I like being master of my domain (pun? maybe) so I always make the time to know what I’m calling. I could have assumed that if you don’t put a version in, you want to install the latest, which is pretty easy to flip around, but I don’t like to assume like that. Still, you can take the code if you want it and do that.
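
    If you do want to flip it around, the check is tiny. This is only a sketch: the wp_zip_url function name is mine, and it assumes the zips come from wordpress.org under the usual wordpress-VERSION.zip naming. Here an empty version falls back to latest, which is exactly the assumption I said I don't like making for myself.

```shell
# wp_zip_url VERSION: print the download URL for that WordPress release.
# An empty or "latest" VERSION maps to latest.zip (an assumption, see above).
wp_zip_url() {
    version="${1:-latest}"
    if [ "$version" = "latest" ]; then
        echo "https://wordpress.org/latest.zip"
    else
        echo "https://wordpress.org/wordpress-$version.zip"
    fi
}

wp_zip_url 3.0.4   # https://wordpress.org/wordpress-3.0.4.zip
wp_zip_url         # https://wordpress.org/latest.zip
```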

    On the other hand, I think if someone wants a plugin, then they should be able to go ‘Get me the plugin’ and it should default to the latest. Probably this is because I test trunk WordPress a lot, but rarely do I trunk a plugin. Since there’s no easy to gank default like that for plugins, I came up with this really fun bit of code, shoved in an if-then check. I really like awk, by the way. It’s insanely powerful.

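      # We're getting the readme from the repo and using that to calculate the latest stable release.
      wget -qO "$1-readme.txt" "http://plugins.svn.wordpress.org/$1/trunk/readme.txt"

      # wget -O creates the file even when the download fails, so test for a non-empty file (-s).
      if ! [ -s "$1-readme.txt" ]
      then
        echo "FAILURE: The plugin is goobered in the WordPress repository, so we can't determine the latest stable release."
        rm -f "$1-readme.txt"
        exit 1
      else
        # Strip anything that isn't tab, newline, or printable ASCII before parsing.
        tr -cd '\11\12\40-\176' < "$1-readme.txt" > "$1-readme-tr.txt"
        # Third field of the first "Stable tag: x.y.z" line, with a leading dot kept in front.
        VERSION=.$(awk '/Stable/ {print $3; exit}' "$1-readme-tr.txt")
        rm "$1-readme.txt" "$1-readme-tr.txt"
      fi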
    

    All these are free for use, and both are under GPL2. I can offer SOME support, but they’re filed under ‘Works for me!’ in my bin.

  • Unix One Liner – Writing to a file

    In 2010, I had to log into 100-odd accounts and edit the .profile file so that the line ‘cd ~’ was included. Sounds time consuming, doesn’t it? I couldn’t use a for-loop to log into the accounts, but since they were named ‘test001’ through ‘test100’ and they all had my sudo password saved, it was pretty easy to sort out what I needed. And by easy I mean I pled to Twitter and got stumped on ‘cat’ for a long time until, finally, I wondered if echo worked the way I thought it did.

    It does perplex me that ‘write’ doesn’t. I mean… it should, right? ‘write filename content’ but no. Not so much. And even echo doesn’t format the way I’d expected! It’s

    echo CONTENT >> FILENAME

    Oh Unix, I love you so.

    sudo su - test001
    echo "cd ~" >> .profile
    exit
    

    The trick was remembering that echo … echoes. The shell expands an unquoted ~ before echo ever sees it, so if I’d used echo cd ~ >> .profile I would have ended up with cd /usr/home/account/ in my .profile, which I didn’t want. The other trick was remembering that the >> part means ‘Add to’ so if the file DID exist (it never did) it would add this to the end on a new line.
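
    You can watch both tricks happen on a throwaway file. A small sketch, using a temp file (the scratch variable is just for the demo) so no real .profile gets hurt:

```shell
# Work on a scratch file, not a real .profile.
scratch=$(mktemp)

echo "cd ~" >> "$scratch"   # quoted: the tilde stays a literal ~
echo cd ~ >> "$scratch"     # unquoted: the shell expands ~ before echo runs

# Two lines now: the first reads "cd ~", the second has your home folder spelled out.
cat "$scratch"
```

    Run it twice against the same file and you get four lines, because >> only ever adds to the end.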

    So it only took me 5 minutes instead of the far longer way!

    sudo su - test001
    vi .profile
    a
    cd ~
    [esc]
    ZZ
    exit

    And yes, I did make a for-loop ‘for test001 through test100…’ though this ended up not working as well as I wanted it to, when I found some of the older accounts were named tst099 and test_100 for some reason. Ahh, scripting. You work so well when everyone else is consistent.
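
    Something like this is roughly what the loop looked like, rebuilt from memory as a sketch. The function names are made up for the example, and it assumes sudo lets you run a command as each account. It skips and reports any name that isn't a real account, which is what I should have done for the tst099 and test_100 crowd.

```shell
# Print test001 .. test100, zero-padded to three digits.
account_names() {
    n=1
    while [ "$n" -le 100 ]; do
        printf 'test%03d\n' "$n"
        n=$((n + 1))
    done
}

# Hypothetical driver: append "cd ~" to each account's .profile, once.
update_profiles() {
    for user in $(account_names); do
        if id "$user" >/dev/null 2>&1; then
            # -H points HOME at the target account; grep keeps reruns from adding duplicates.
            sudo -H -u "$user" sh -c \
                'grep -qxF "cd ~" "$HOME/.profile" 2>/dev/null || echo "cd ~" >> "$HOME/.profile"'
        else
            echo "no such account: $user" >&2
        fi
    done
}

# Preview the names before touching anything:
account_names | head -n 3
```

    Nothing changes until you actually call update_profiles, so you can eyeball the list first.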

  • But If, Baby, I’m The Bottom, You’re The TOP

    Earlier this month I talked about how my server was acting wonky and how I fixed it using, among other tools, TOP.

    This week I was chatting with a fellow about CPU usage and his site. He runs a rather large WordPress blog and the database is about 500 megs. As a comparison, this site, with about 500 posts, is under 5 megs, and my big site, with thousands of posts, comments, and a forum, is 10 megs. The biggest site I run on my server is 850 megs (just down from 910 after some clean up). The difference between his site and mine is that his is slow and he knows it. As we discussed ways to speed it up, I had some thoughts on WordPress and how, at a certain point, you’re going to need to dig into the guts of your server and learn TOP.

    The ‘problem’ with most ‘How do I make my WordPress site run faster?’ tutorials, as I’ve seen it, is they address surviving the digg effect. That is, they talk about how to deal with having a high volume of traffic on your site and, for the most part, you can make it with just adding caching plugins.

    Once your site gets ‘big’ or ‘popular’ you’re going to have to move off shared/cloud hosting and over to your own server. For most of us, the first step is a VPS (Virtual Private Server). Shared Hosting means ‘You have an account on a server with a hundred other people.’ It’s great for small sites, inexpensive and easy to use. The problem is you could have terrible neighbors, who use up all the CPU. Think of it like those old New York apartments where someone’s a jerk at 5am and uses up the hot water so you, at 7am, have none. Yeah, it’s kind of like that. That’s the day you think ‘I want a house!’

    Only, well, we’re not all up for houses just yet. A house would be a dedicated server, where it’s just you. Cloud hosting, which I touched on earlier, would be the college dorms of webhosting. It has a lot of benefits for the really small sites, and actually some for large sites, but I’m not sold on their overall usefulness yet, so I’ll talk about them some other time. What I want to talk about are Virtual Private Servers, the condo-sub-leasing (or rent-to-own maybe) of website hosting, and how the new VPS user should really get on TOP of things (sorry, bad pun) to make their lives easier.

    TOP. Well ‘top’ really. Unix commands are generally all lower case like that.

    The top command is a system monitor tool that outputs a list of processes. Have you ever seen Task Manager in Windows? It’s kind of like that tab for ‘Processes’ that you look at and run away from. The default view of top is by percentage of CPU usage and the “top” CPU users are listed. See? The name made sense. You can also see how much processing power is being used, memory hogs and other cool things. Most modern Unix-systems let you sort the list, colorize it, etc, though you have to be command line savvy.

    Here’s what top looked like for me about an hour ago.

    top - 12:44:44 up 126 days, 23:13,  1 user,  load average: 0.12, 0.17, 0.17
    Tasks:  91 total,   1 running,  90 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
    Mem:    524288k total,   358248k used,   166040k free,        0k buffers
    Swap:        0k total,        0k used,        0k free,        0k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    15616 nobody    15   0 94540  65m  20m S  1.0 12.8   0:00.27 httpd
    12261 ipstenu   16   0  1908 1012  780 R  0.4  0.2   0:00.30 top
    [...]
    28630 root      16   0  107m  86m 1096 S  0.0 16.8   1:06.30 /usr/sbin/clamd

    I wanted to point out clamd, which has been the bane of my existence. Thing won’t DIE. I ended up going in to /etc/exim.conf and manually commented out the clamd line (and restarted the service) to finally get it gone.

    But top, as you can see, has a freakishly large amount of information. My server is doing fine, at this point, so I don’t have a whole lot to show you. What you can see right away is that I can tell, with a glance, what’s going on. At this point I have a ‘nobody’ process. That just means someone’s accessing my website. No, really! That’s good! The CPU and memory usage seem high, but they vanish in a second. Basically, someone rang my doorbell and for that brief moment, electricity was used. The next thing I see is the top command, which is run by me (hi!) and down the line is that idiot, clamd.

    I actually scan top a lot at work these days, trying to understand what’s causing issues. It’s good for ‘right now!’ things, but not so much if I want to see what started a strange spike a couple hours (weeks) ago. For that you need a whole mess of tools.
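
    One cheap way to get a little history is to have cron save a snapshot of top every few minutes. This is a sketch, not what I actually run: it assumes the Linux (procps) top, where -b is batch mode and -n 1 means one pass, and the function names and log path are invented for the example.

```shell
# Pull the three load averages out of a top header line read from stdin.
load_averages() {
    sed -n 's/.*load average: \([0-9.]*\), \([0-9.]*\), \([0-9.]*\).*/\1 \2 \3/p'
}

# One timestamped snapshot: header plus the busiest processes.
top_snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    top -b -n 1 | head -n 15
}

# The header line from the listing above:
echo 'top - 12:44:44 up 126 days, 23:13,  1 user,  load average: 0.12, 0.17, 0.17' | load_averages
# prints: 0.12 0.17 0.17

# Hypothetical cron entry, every five minutes, for a script that calls top_snapshot:
# */5 * * * * /home/USERNAME/top_snapshot.sh >> /home/USERNAME/top.log
```

    It won't replace real monitoring, but when something spiked at 3am you at least have a log to grep.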

  • Hotlinking

    There are two questions I need to spell out for folks before I get into the code bits, and I’ll try to keep this as light-tech as possible. If you run a website, or have a free site, or just want to post your pictures on the web, you need to know this. If you have your own domain, you need to know this. If you post pictures to a bulletin board, you need to know this. Basically, if you use the internet at all, read this. I’ll let you know when you need to stop reading.

    Things everyone should know
    Things every webmaster should know
    Things every ISP should know


    Things everyone should know

    Bandwidth

    Bandwidth means, for computer users, the data transfer rate, or how much data can be transferred in a given time period. The easiest example here is how you access the net. If you use a modem for dial up (and I feel for you), then you use 14.4, 28.8, 33.6 or 56 kilobits (kbit) per second. To give you an idea how small a kilobit is, one letter (that is ‘a’ for example) is a byte, which is 8 bits, roughly. That’s not an exact science, but it’ll give you a rough idea. At 14.4, your email downloads at roughly 1,800 letters a second at best, which sounds fine until someone attaches a picture: a one megabyte photo takes over nine minutes. Which is why dial-up sucks. In the world of computers, bigger bandwidth is better. The more bandwidth, the faster you can download the preview of the new Batman movie.

    In addition to speed, bandwidth also means how much data you can transfer in a given time period. This website has an allocation of 30 gigabytes of data per month, and we average about 5. My other website has the same allocation and averages 18. If I go over my data transfer for a given month, I can either pay out the nose for extra bandwidth, or I can let the site be shut down till the next month. The reason this is important to know, is if you run a website, every time a page loads, you use bandwidth. On a site like Yahoo! GeoCities, you get 3 GB/month. Yeah, you think that’s great, but it really sucks if you want to post things like a blog and people click here a lot. This aspect of bandwidth is the reason why most sites I design are low on the graphics. More graphics means more data transferred means more bandwidth used. In the case of data transfer allocation, a bigger site does not equal a better one, though bigger bandwidth is king.

    Then again, the bigger your site, the longer it takes to download, and the less time it takes for people on 56k to get pissed and tell you that you suck. Finding a webdesign that’s a balance between your dream design and speed is why people like me have jobs.

    In summation: Bandwidth controls how fast you can view the net from your home, as well as how much data a website can share with the world each month. Having more bandwidth is better all the time, but forcing users to use more bandwidth with image heavy sites and poorly coded web pages is not cool.
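
    If you want to put numbers on both meanings, the arithmetic is short. A rough sketch: modem speeds are quoted in bits per second, so you multiply bytes by eight first, and the 50 KB page weight is a number I made up for the example. Overhead is ignored, so real life is slower.

```shell
# Seconds to move BYTES over a link running at BITS_PER_SECOND.
download_seconds() {
    echo $(( $1 * 8 / $2 ))
}

# Page views a monthly quota allows: QUOTA_MB megabytes, PAGE_KB kilobytes per view.
views_per_month() {
    echo $(( $1 * 1000 / $2 ))
}

download_seconds 1000000 14400   # 1 MB on a 14.4 modem: 555
download_seconds 1000000 56000   # same file on 56k: 142
views_per_month 3000 50          # 3 GB quota, 50 KB pages: 60000
```

    Double the page weight and you halve the number of visitors your quota can feed, which is the whole argument for light pages.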

    Hotlinking

    Hotlinking is putting a link to someone else’s webpage’s graphic on your site. This is also called bandwidth theft. Directly linking to a website’s files (images, video, etc.) means that when someone accesses your website, they draw bandwidth from another. If you use an <IMG> tag to show a picture from someone else's page on your blog, forum post, or website, that's hotlinking. You're stealing their bandwidth. There is a case in which this sort of 'theft' is ethically permissible, though some webhosts don't like it. If you have multiple Yahoo! sites, and one is low on bandwidth, you can shuttle some of your content to the other site, and thus split up the bandwidth. This isn't always a good idea, as if it's against the Terms of Service on your host, they can kill your sites. Which is why you should always back up your websites on your own computer. If you own your own domains (like I do) and have multiple 'subdomains,' then it's okay to share an image. ipstenu.org is considered a different website than ipstenu.org/blog, so I have to tell my server it's okay to share between the two. But that's code geeky. What the common websurfer needs to know is this: direct linking to a picture, movie file, or any other content on someone else's site, unless it's a simple URL link to that site, is bad form, ethically asinine, and impolite. It’s akin to stealing electricity from your neighbor by plugging into their outlets.

    In summation: Hotlinking is stealing bandwidth from someone else’s website, and is considered to be unethical.

    Things every webmaster should know

    Now that you’ve gotten this far, we’re going into heavy geekitude. I have actually once had my site nearly shut down because someone was hotlinking to an image, and I had to figure out how to prevent it. This is the knowledge I share with you.

    Hotlink Prevention for Apache

    Apache is the de facto webserver for Unix. I don’t like IIS (Windows webserver) and so few people use Netscape’s webserver, I won’t even consider that anymore. Pretty much, I use Apache and if you don’t, I haven’t a clue how to help you.

    On Apache (and in theory this works on IIS, but as I said, I don’t use it), there is a file in the root of your html folder called .htaccess. This is an Apache directives file, or a config file, that controls how Apache handles the folders in the same folder as the .htaccess file. Your website has a folder, usually called public_html. Inside that folder you have things like a file named index.shtml and a folder named cgi-bin. Below is an example of what my webserver’s root public_html folder might look like.

        .htaccess    blog      index.shtml    images      cgi-bin   robots.txt
        folder1      folder2   foldern

    The .htaccess file controls how the subfolders (blog, cgi-bin, folder1, folder2, and foldern) are handled. If I look at my .htaccess file, and you can open it up in your text editor of choice, I see this at the very bottom:

        RewriteEngine on
        RewriteCond %{HTTP_REFERER} !^$
        RewriteCond %{HTTP_REFERER} !^https?://ipstenu\.org      [NC]
        RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
    

    This means that I’m telling Apache to turn on the mod ‘RewriteEngine’ and to only permit my webpage (the HTTP_REFERER) to access the images. The images I list are in the ‘RewriteRule.’ I could use patterns like ‘jpe?g’, but I know what the file extensions are for the files on my server, and I cheat that way. If I wanted to be really mean, and didn’t worry so much about my bandwidth, I’d change the last line to RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ images/nohotlink.gif [L] so that when you try and link to /images/jojo.jpg, you’d get some witty image about how hotlinking is wrong.

    I actually do that on my other server, but the gif I use is 2k so it’s not something I worry about. It also makes it easy for me to later go back and see who’s been hitting that particular GIF and find the mean people. Yes, I have been known to send nasty notes to them.
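
    Hunting for the mean people is mostly a matter of reading the access log. Here's a sketch of the idea, assuming Apache's ‘combined’ log format (where the referer is the second quoted field) and a log path that will be different on your host. The hotlink_referers function name is mine. It counts image requests by the site that referred them and leaves out your own domain.

```shell
# hotlink_referers DOMAIN < access_log
# Count image requests whose referer host is not DOMAIN, busiest first.
hotlink_referers() {
    awk -F'"' -v dom="$1" '
        tolower($2) ~ /\.(jpg|jpeg|gif|png|bmp)[ ?]/ {
            host = tolower($4)
            if (host == "" || host == "-") next
            sub(/^https?:\/\//, "", host)   # drop the scheme
            sub(/\/.*$/, "", host)          # drop the path
            if (host != dom) print host
        }' | sort | uniq -c | sort -rn
}

# Hypothetical usage; the log location is an assumption:
# hotlink_referers ipstenu.org < /var/log/httpd/access_log
```

    The comparison is exact, so www.ipstenu.org would show up as a stranger here. Adjust to taste.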

    Keep in mind, as with any .htaccess rewrites, you may block some legitimate traffic (such as users behind proxies or privacy firewalls) using these techniques.
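
    To see who gets caught, it helps to act out the conditions by hand. This sketch mimics them in shell (it is not how Apache evaluates anything, and the referer_allowed name is mine): an empty referer passes the first condition, my own domain passes the rest, and everything else gets the 403. So a proxy that strips the referer is fine, while one that rewrites it to something else is blocked.

```shell
# Return 0 if the rules would let this referer fetch an image, 1 otherwise.
referer_allowed() {
    # [NC] makes the Apache match case-insensitive, so lowercase first.
    ref=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$ref" in
        "")                                       return 0 ;;  # no referer at all
        http://ipstenu.org*|https://ipstenu.org*) return 0 ;;  # my own pages
        *)                                        return 1 ;;  # everyone else
    esac
}

for ref in "" "https://ipstenu.org/blog/" "http://example.com/forum"; do
    if referer_allowed "$ref"; then
        echo "allow: [$ref]"
    else
        echo "block: [$ref]"
    fi
done
```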

    Now here’s the big problem. Not all ISPs let you use the Rewrite mod! Half the reason I switched to my current provider was hotlinking (the other was SQL). The rewrite mod (module, don’t you know?) “provides a rule-based rewriting engine to rewrite requested URLs on the fly.” It’s totally magic, and I secretly adore it. It’s complex as fuck, though, and I still don’t really get all that it does. I do know that it works.

    Things every ISP should know

    You’re a fucking bastard if you don’t let your users use mod_rewrite.

    Was that harsh? Sorry, I mean to say ‘You don’t give a rat’s ass about bandwidth if you don’t let your users use this.’ I’m well aware there are security ‘concerns’ about what mean people can do with it, but let’s face it, if someone’s smart enough to figure out everything you can do with mod_rewrite, then you’re in trouble anyway. There is a performance hit as every request is checked against the rewrite rules, so if you’re running an image intensive site, this can suck. But the performance cost, weighed against the bandwidth saved, is, to me, minimal.

    Look, if a user has a website with images, and some dickhead out there is hotlinking to that user’s images, then you, the ISP, have to handle the bandwidth crisis, and the pissy user asking you why he can’t use this feature to stop the dickheads.

    And speaking of security, I can’t find any hack for it. So if the fear is ‘really smart, but really evil people utilizing my server for nefarious purposes,’ I think that should be pretty low on the list. I’d put ‘spammer’ and ‘virus distributor’ ahead of it.

    Hotlinking can act like a DDoS attack, and if there’s ever a way to prevent it, by G-d, do it! The mod takes five fucking minutes to install.

    SimpleNet, I’m looking at you.