Operations/2013/April

Audit Bee Change

Audit Bee got hung up trying to audit Ubuntu-Forum Wiki. It seems that this install has been modified to hide its version information. When queried via the API it returns:

"generator":"MediaWiki MediaWiki"

Audit Bee was throwing an exception on this and never auditing it. I made a change so that this will no longer stop Audit Bee. You will see a log message that looks like this:

Ubuntu-Forum Wiki Unable to determine version from MediaWiki MediaWiki. Auditing without confirming any flags. Operator please check.

This keeps the audit process from getting hung up, even though the site really cannot be audited automatically.
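As a rough illustration of the fallback described above, here is a minimal Python sketch. It is not Audit Bee's actual code: the function names and the regular expression are assumptions, and only the generator string and the wording of the warning come from this page.

```python
import re

# Assumed shape of a normal generator string, e.g. "MediaWiki 1.20.3".
VERSION_RE = re.compile(r"^MediaWiki\s+(\d+)\.(\d+)(?:\.(\d+))?")


def parse_generator(generator):
    """Return (major, minor, patch), or None when no version is present."""
    match = VERSION_RE.match(generator)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch) if patch else 0)


def audit_note(site, generator):
    """Return a warning for the log instead of raising when the version is hidden."""
    if parse_generator(generator) is None:
        return (
            f"{site} Unable to determine version from {generator}. "
            "Auditing without confirming any flags. Operator please check."
        )
    return None  # version found, nothing to warn about


print(audit_note("Ubuntu-Forum Wiki", "MediaWiki MediaWiki"))
```

The point of the sketch is only that an unparseable generator string becomes a logged warning rather than an exception.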

🐝 thingles (talk) 01:23, 3 April 2013 (UTC)

Ouch, I suspected this might happen. I had hoped that Bumble Bee would find the version somewhere. Still, it had a good cause, since Bumble Bee is better than ever. :) --[[kgh]] (talk) 20:50, 4 April 2013 (UTC)

More memory

I was excited to see today's announcement that Linode had doubled the memory on their instances. I requested the upgrade right away, which caused 15 minutes of unavailability, but now I'm back with 2x the memory. I expect this will keep WikiApiary a little bit snappier, and I will be tweaking some memory settings further once I see how things shake out.

Pub2-memory-upgrade.png

🐝 thingles (talk) 01:09, 11 April 2013 (UTC)

I believe things are more energetic now. :) --[[kgh]] (talk) 17:38, 12 April 2013 (UTC)

Upgraded to 1.20.3

I finally upgraded my farm to 1.20.3. I also updated the majority of extensions via git. If you notice anything off please let me know. 🐝 thingles (talk) 02:09, 11 April 2013 (UTC)

Wikia?

I added a bunch of Farm:Wikia wikis tonight for fun. I’m curious what people think of adding Wikia sites en masse. If nothing special is done it would totally skew extension counts. It would also make the WikiApiary storage and polling needs increase a lot. Thoughts? 🐝 thingles (talk) 03:59, 11 April 2013 (UTC)

We should definitely start to separate statistics and counts, since it is more and more interesting to see what the others, i.e. wikis other than farms, do in the wikiverse. This provides e.g. "much" better information on which extensions are actually used rather than just installed and probably not used (probably the case for farms). What about starting out with "smaller" farms like Shoutwiki etc. rather than Wikia, which seems to be just too big at the moment? --[[kgh]] (talk) 14:30, 11 April 2013 (UTC)

Language categories

I modified Template:General siteinfo to set categories for websites with the language that a wiki is set to. I mostly like to stick with semantic properties, and Property:Has language exists but I think categories will be useful for visitors and queries. Additionally the categories use the language name where the property is using the language code. The category pages all just transclude Template:Language category so they can easily be added and modified. They are all subcategories of Category:Languages. The names of the categories are derived from the language codes using Template:Language label. There are still a number of language codes identified in Special:WantedCategories that I didn't put mappings in Template:Language label for yet. If others want to fill the rest out feel free, or I'll get the rest later. 🐝 thingles (talk) 02:46, 12 April 2013 (UTC)

Cool idea and never forget the job queue ;) --[[kgh]] (talk) 17:03, 12 April 2013 (UTC)
That's a good point [[kgh]]! To help out, I added a near-realtime display on the job queue length in WikiApiary:Operations (diff). Big win using Extension:External Data! :-) 🐝 thingles (talk) 17:15, 12 April 2013 (UTC)
That's great. I could have thought about it in the first place. For admins the job queue is really good to have though one really has to know what triggered them to be able to interpret it better. Yeah, I should have a closer look at the Extension:External Data extension. I guess I underestimated it. --[[kgh]] (talk) 17:34, 12 April 2013 (UTC)

Defunct overrides everything else

By the way, I noticed some recent edits and wanted to highlight that when you mark a site as defunct, you do not need to uncheck the other flag fields (active, validated, etc). Defunct is an override for everything else, so if a site is marked defunct it will not be considered active, User:Bumble Bee will not bother with it, etc, regardless of what its active setting is. Just FYI. 🐝 thingles (talk) 17:21, 12 April 2013 (UTC)

Oops, I did this all the time. :| Thanks for the re-vaccination. :) --[[kgh]] (talk) 17:35, 12 April 2013 (UTC)

Bug and bad behavior fixed for extension names

Saw a nasty issue with Beauzons.com today. It looked like User:Bumble Bee was broken, but the real issue was that this site had an extension with the name Conversion utf8<=>latin. This presented two problems. First, the = caused SMW to throw an exception when processing the subobject declaration, which kept the page itself from even rendering right. This is the problem I saw: autoedits to the Semantic Forms API were failing and throwing an exception. Changing the = to a - removed that. Second, the > and < signs are illegal as well. So, I made a helper function to filter these out and this should be all fixed now. Illegal characters are just being dropped, and equals is being changed to a dash. I fixed Beauzons.com/Extensions (change) by hand for now, since Bumble Bee currently isn't able to reach the target. 🐝 thingles (talk) 18:42, 14 April 2013 (UTC)
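A minimal Python sketch of a helper along the lines described above, for illustration only. The function name is hypothetical and the real helper in Bumble Bee may differ. The character set is limited to what this post mentions: = becomes a dash, and < and > are dropped.

```python
# Characters this post names as illegal in extension names; the full set
# handled by the real helper is not stated here, so this is an assumption.
DROPPED_CHARACTERS = "<>"


def sanitize_extension_name(name):
    """Replace '=' with '-' and drop characters that are illegal in names."""
    name = name.replace("=", "-")
    return "".join(ch for ch in name if ch not in DROPPED_CHARACTERS)


print(sanitize_extension_name("Conversion utf8<=>latin"))
```

Names without any of these characters pass through unchanged.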

Since this change Bumble Bee has been creating a lot of new extension pages with titles which include the URL, and no URL listed.--Ete (talk) 15:10, 15 April 2013 (UTC)
Thanks. Yeah, this diff shows what's happening. Those pages were previously invalid properties. I'll add code to attempt to unwrap these wikilinks hiding in titles. Ugh. :-\ 🐝 thingles (talk) 15:32, 15 April 2013 (UTC) PS: After I fix that, more new extension pages will be created and then the bad ones can be deleted.
Took me a while to get to this. The fix wasn't as bad as I thought it might have been. Man, what I wouldn't give for MediaWiki to actually enforce some sanity around things like extension names and versions. With this one it is hard to be sure I haven't introduced any additional bugs. If you see anything weird please let me know. The query embedded below finds mostly extensions with this problem (note, there are valid ones in there). This should be an easy way to see if the fix worked. Assuming this works fine, after this has propagated through, the Extension pages that have these URLs in the title should be deleted. Don't try to delete them before then, as Semantic Forms will just recreate them until all the references are gone. 🐝 thingles (talk) 03:35, 12 May 2013 (UTC) PS: I'm a little shocked to see Wikia doing this in the title of Extension:GoogleDocs4MW; tsk tsk. :-)
 
CustisWiki#Extension_HttpAuth
ETEC 510#Extension_httpsLogin
HLWIKI Canada#Extension_httpsLogin
HPC User Group EPFL#Extension_httpsLogin
IPiSoft Wiki#Extension_HttpAuth
IRB WikiDesk#Extension_httpsLogin
LIANE#Extension_httpsLogin
LNCD Wiki#Extension_httpsLogin
Linux Driver Project#Extension_Http:BL
NYU CCPP Wiki#Extension_httpsLogin
OXIDwiki#Extension_httpsLogin
Open-Naturstein#Extension_Http:BL
Rosalab Wiki#Extension_HttpAuth
Synthesia Help#Extension_httpsLogin
UFSGrid - Wiki#Extension_Http:BL
Uml2Wiki#Extension_HttpAuth
Wiki Testbed de aplicaciones móviles - Laboratorio de Software#Extension_httpsLogin
Wiki4Intranet#Extension_HttpAuth
YourcmcWiki#Extension_HttpAuth
Success! 🐝 thingles (talk) 04:19, 12 May 2013 (UTC)
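For readers wondering what "unwrapping" a wikilink from an extension name could look like, here is a hedged Python sketch. It assumes the problem names are wrapped in standard wiki link markup such as [http://example.org Name] or [[Page|Name]]. The actual fix in Bumble Bee is not shown on this page, so the regular expressions and the function name are illustrative only.

```python
import re

# Assumed markup shapes: [[Target|Label]] / [[Label]] and [http://url Label].
INTERNAL_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")
EXTERNAL_LINK = re.compile(r"\[(?:https?:)?//\S+\s+([^\]]+)\]")


def unwrap_wikilinks(name):
    """Replace link markup in an extension name with its visible label."""
    name = INTERNAL_LINK.sub(r"\1", name)
    name = EXTERNAL_LINK.sub(r"\1", name)
    return name.strip()


# example.org is a placeholder URL, not taken from any real wiki.
print(unwrap_wikilinks("[http://example.org/ext GoogleDocs4MW]"))
```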

Even more SMWInfo

I just saw an email on the semediawiki-user mailing list highlighting SMWInfo properties that were new to me. I tested the call on WikiApiary and indeed I get valid results (I'm running 1.9 alpha):

{
   "info":{
      "propcount":1264452,
      "usedpropcount":129,
      "declaredpropcount":"111",
      "proppagecount":118,
      "querycount":"52833",
      "querysize":"51359",
      "conceptcount":"8",
      "subobjectcount":"72581"
   }
}

I'm going to be modifying User:Bumble Bee soon to collect these new items: querycount, querysize, conceptcount, subobjectcount.
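One detail worth noting in the response above is that some counts arrive as JSON numbers and others as strings. A minimal Python sketch of pulling out the four new items and normalizing them to integers might look like this. The function name is hypothetical, and this is not the code that was later committed to Bumble Bee.

```python
import json

# The four items named above as new.
NEW_STATS = ("querycount", "querysize", "conceptcount", "subobjectcount")

# The sample response shown above, reproduced as a string.
SAMPLE_RESPONSE = """
{"info": {"propcount": 1264452, "usedpropcount": 129,
          "declaredpropcount": "111", "proppagecount": 118,
          "querycount": "52833", "querysize": "51359",
          "conceptcount": "8", "subobjectcount": "72581"}}
"""


def extract_new_stats(response_text):
    """Return the new SMWInfo values as integers, skipping any that are absent."""
    info = json.loads(response_text)["info"]
    return {key: int(info[key]) for key in NEW_STATS if key in info}


print(extract_new_stats(SAMPLE_RESPONSE))
```

Skipping absent keys means wikis on older Semantic MediaWiki versions, which would not return these items, simply yield fewer values.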

Comparing these new values to WikiApiary:Collect Semantic MediaWiki usage setup it looks like querycount and conceptcount are now in the new SMWInfo call, although the value I have for querycount is different than this. Querysize looks to be a sum of all sizes. Subobjectcount is a welcome addition that I didn't have in mine. Thanks to mwjames for adding these in response to bug 46458! 🐝 thingles (talk) 15:11, 15 April 2013 (UTC)

Okay, I just pushed a change (commit and commit) for User:Bumble Bee to request and store the four new stats that Extension:Semantic MediaWiki 1.9 returns. Here is a screenshot to show this in the apiary_db. This isn't available for graphing yet. The PHP data accessors for the graphing need to be completely rethought and merged into one script. That comes later; at least the data is getting collected now. 🐝 thingles (talk) 01:26, 16 April 2013 (UTC)

NewSMWinfo-collected.png

No more Oakleys or Raybans

You may have noticed some spam user accounts being created matching "Rayban*" and "Oakley*". I just modified the wiki to deny any registration starting with those strings.

function DenyRegistrationByUsername( $user, &$message ) {
        $username = $user->getName();
        // Reject any username that starts with one of the known spam prefixes.
        if ( preg_match( '/^(Rayban|Oakley)/', $username ) ) {
                $message = 'The username ' . $username . ' is banned on this wiki.';
                return false; // abort account creation
        }
        return true;
}
$wgHooks['AbortNewAccount'][] = 'DenyRegistrationByUsername';

If you see anything weird let me know. I did a test and confirmed it works. If other patterns show up I can easily modify. The next step might also be to feed these to fail2ban and block the IP addresses that attempt to register these. 🐝 thingles (talk) 03:39, 19 April 2013 (UTC)

SMW changes for better stats!

Just FYI, check out Bug 46458. MWJames has made some more changes in SMWInfo that will enable even more stats for Semantic MediaWiki sites! So awesome! 🐝 thingles (talk) 14:07, 19 April 2013 (UTC)

Cannot wait to see SMW 1.9 spread as soon as it is released. :) --[[kgh]] (talk) 18:10, 24 April 2013 (UTC)

Suspend a site?

As I've watched some +defunct activity and looked at sites that are in error myself, I've been thinking that there might need to be something less severe than just marking a site as defunct. For example, User:Kghbln appropriately marked BromWiki (en) as defunct, since its API is returning PHP errors. However, the wiki itself is up. Marking it is a reasonable thing to do, otherwise this wiki will generate errors for the foreseeable future. This makes me wonder if there shouldn't be an option to suspend a site?

For example, with BromWiki (en) another option would be to suspend checking the site for 7 days. Or 14 days. Or 2 months. Then let User:Bumble Bee check again and see if they have fixed things. Marking as defunct will take the site out forever. Perhaps this is too complex? Another approach would be to have User:Audit Bee check defunct sites very infrequently to see if they have "unfunct" themselves. I've got reservations about doing that though. Thoughts? 🐝 thingles (talk) 11:31, 21 April 2013 (UTC)

I think the "active" marker could serve this purpose. As soon as I uncheck "active", Bumble Bee should stop checking the website for x days, probably a fortnight. "Defunct" is indeed more for wikis which are no longer there at all. These could also be revisited infrequently to be sure, but we should not worry about them too much. --[[kgh]] (talk) 11:23, 22 April 2013 (UTC)
Yep, I think splitting defunct is a good idea. Additionally, perhaps the split can be automated, so long as the bot can load up the main page and check if it gives an error. API not working->check main page, if only API is down mark as API unavailable, those with both down marked as down. Then have the sites with issues checked once every 1 hour*number of errors in a row^2 (or similar formula, perhaps using the time since last working to avoid needing to record number of errors), so a site would still get rechecked automatically a few times after being taken down, but would not be constantly checked once it's been down for a while.--ete (talk) 15:47, 22 April 2013 (UTC)
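The recheck formula suggested above can be sketched in a few lines of Python. This reads "1 hour*number of errors in a row^2" as one hour times the square of the consecutive error count, which is an assumption about the intended grouping. The function name and the zero-error behaviour are also illustrative, not an existing part of Bumble Bee.

```python
from datetime import timedelta


def recheck_delay(consecutive_errors, base=timedelta(hours=1)):
    """Return how long to wait before rechecking a failing site.

    Assumed reading of the formula: base * (consecutive_errors ** 2).
    """
    if consecutive_errors < 1:
        return timedelta(0)  # site is healthy, keep the normal schedule
    return base * consecutive_errors ** 2


for errors in (1, 2, 3, 10):
    print(errors, recheck_delay(errors))
```

Under this reading a site is rechecked after 1 hour, then 4, then 9, and after ten failures in a row only about every four days.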

Welcome Backup Bee!

You may have noticed that I added some new properties related to backing up websites. This is all very experimental! This morning I hacked on User:Backup Bee for a while and I have him to a functional state for at least one backup type. Take a look at his code if you wish. Comments welcome. Only the "Snapshot (text)" backup option is supported right now. This bot will only run against a site once a week using the WikiApiary:Current day by hour segments groupings, see WikiApiary:Backup schedule for just backups. You can also see Backup Bee's log file. I've got this bot running in a debug mode right now to see how he is going. You'll see him write to a "Backup log" subpage on wikis that he backs up to (e.g., Wiki thing/Backup log). Would anyone be willing to volunteer to test a restore of the dump file? Let me know if you would do that. Please do not enable this on a wiki with more than 10,000 or so pages right now. Exciting stuff!

Huge credit to wikiteam who made their dumpgenerator.py code available. This is doing the hard work with User:Backup Bee just directing it.

🐝 thingles (talk) 16:38, 21 April 2013 (UTC)

Very cool. I may be able to test backups at some point (and would be fine with data from my wikis being used for any tests), but don't have shell access to either wiki right now (should get it for the larger wiki soon, but it has 21.6k pages) and have a few other tasks to do first once I get shell. Would it be okay to enable backups for my smaller (1.6k page) wiki anyway?--ete (talk) 15:47, 22 April 2013 (UTC)
Yeah, go ahead and enable it on the small ones. I need more wikis to test with. Note that only "Snapshot (text)" works right now so pick that. 🐝 thingles (talk) 17:49, 22 April 2013 (UTC)
I could do tests with CAcert in Berlin using SMW. --[[kgh]] (talk) 14:40, 23 April 2013 (UTC)
CAcert in Berlin has been backed up for the first time, see CAcert in Berlin/Backup log. The resulting file is less than 100k, and you can download it here. Please share how it goes. 🐝 thingles (talk) 20:39, 23 April 2013 (UTC)
I have just imported the backup into http://smw.wikihoster.net/ and it looks good to me. No problems during import. :) However only the last two revisions are stored, but this should be ok from my point of view. This is more than enough to preserve the data for memory lane. :) Cheers --[[kgh]] (talk) 18:04, 24 April 2013 (UTC) PS Widgets is not installed there, so the header looks a bit strange.
Awesome! That looks great. The snapshot backup is just that, only the most recent version. When I enable full backup that will give all page history as well. Thanks for testing this kgh! 🐝 thingles (talk) 20:00, 24 April 2013 (UTC)
Yeah, that's great. :) Sure, I did not think about the snapshot part. :| --[[kgh]] (talk) 20:05, 24 April 2013 (UTC)
Some wikis and farms will have their own regular backups; perhaps a syntax for linking to them might be useful? It would probably not be feasible to back up larger sites remotely on a regular basis, nor do I think many administrators would welcome such attempts. GreenReaper (talk) 20:14, 23 April 2013 (UTC)
Ah, I didn't understand this was already running, it's always nice to see WikiTeam job expanding. :-) I suggest to upload the dumps to archive.org, some can be quite big so it's better for your hosting too. I've commented on [1]. --Nemo 07:46, 4 May 2013 (UTC)

Planning upgrade to MediaWiki 1.21

I'm crazy excited to get upgraded to MediaWiki 1.21. I'm considering installing the 1.21rc4 candidate. I'm specifically looking to dive into Extension:Scribunto. I think WikiApiary may benefit from some Lua capabilities. Any comments on the upgrade? Have any of you done it? I know from Statistics that only 3 sites are using 1.21rcX so I'm guessing nobody here has done it yet. :-) 🐝 thingles (talk) 16:55, 22 April 2013 (UTC) PS: It's such a bummer that over 60% of the wikis monitored are 1.17.x and older. :-\

In case you cannot wait until mid-May you could try to install MW 1.21. WMF is already beyond this RC so ... Cheers --[[kgh]] (talk) 14:46, 23 April 2013 (UTC)
I'd just grab the REL1_21 branch from git using something like:
git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git
cd core
git checkout -b REL1_21 origin/REL1_21
You can then use git pull to keep it up to date. When REL1_22 is out, check that out instead.
As for 1.17, I'm guessing many people are happy with it. It works well, there have been few compelling reasons to upgrade, and you have to switch from SVN to Git, which is an additional hassle along with the normal extension and patch tweaks. Incidentally, the "1.17.x and lower" graph on the statistics page seems a little off - I think you need to adjust it to order 1.9 below 1.17 (and 1.16, etc. - I'm sure there are plenty on older versions that aren't represented in the graph; clearly lots are still on 1.9!). GreenReaper (talk) 19:45, 23 April 2013 (UTC)
I guess you have a point regarding later versions than MW 1.17. The best improvement in later versions from the user's point of view is probably the new way to show the diff of revisions. Still, people running MediaWiki tend to be a bit lazy when it comes to system maintenance. As we see here, there are also a lot of cases where MW is installed and that's it. --[[kgh]] (talk) 18:09, 24 April 2013 (UTC)
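The ordering problem mentioned above (1.9 sorting after 1.17) is typical of comparing version numbers as text. A small Python sketch of numeric ordering follows. How the statistics graph actually sorts is not shown on this page, so this only illustrates the general technique, and the helper name is hypothetical.

```python
def version_key(version):
    """Turn '1.17.x' or '1.9' into a tuple of integers for sorting.

    Non-numeric parts such as the 'x' in '1.17.x' are ignored.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())


versions = ["1.17", "1.9", "1.16", "1.20.3"]

print(sorted(versions))                   # text order puts 1.9 last
print(sorted(versions, key=version_key))  # numeric order puts 1.9 first
```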

Changed account creation from Questy to Recaptcha

WikiApiary (and all of my wikis) have recently had a big jump in spambot registrations. For now, I decided to switch to using reCAPTCHA on account registration. I've left everything else as it was, so as soon as a user performs email confirmation they will not need to provide a captcha. Effectively, this means that the captcha is only for registration. If you see any issues please let me know. If you think reCAPTCHA is a terrible solution I'm all ears on other approaches. 🐝 thingles (talk) 11:54, 23 April 2013 (UTC)

Hmm, I am a bit worried about ReCAPTCHA. So far I have had much better results with questy. Probably changing and hardening the set of questions is a way, e.g. "Type in the third letter of the third word." or "Enter the result of five plus three instead of 8+9" or "Type in ZGL in reverse order" and nasty things like this. Cheers --[[kgh]] (talk) 14:44, 23 April 2013 (UTC)
I saw two registrations come through after ReCAPTCHA. So, I went back to Questy but am trying something dynamic. Using these two for now to see how it works out.
$wgCaptchaQuestions[] = array (
    'question' => "What day of the week is it at <a href='http://google.com/search?q=gmt+time'>Greenwich Mean Time</a> (GMT) right now?",
    'answer' => gmdate("l")
);
$wgCaptchaQuestions[] = array (
    'question' => "In 24-hour format, what hour is it in <a href='http://google.com/search?q=gmt+time'>Greenwich Mean Time</a> (GMT) right now?",
    'answer' => gmdate("G")
);
Thoughts? I'll probably add some spelled-out-number ones as well. 🐝 thingles (talk) 15:08, 23 April 2013 (UTC)
Also be sure you're making full use of DNSBLs. WikiFur currently uses $wgDnsBlacklistUrls = array( 'combined.abuse.ch.', 'xbl.spamhaus.org.', 'dnsbl-3.uceprotect.net.', 'dnsbl-2.uceprotect.net.', 'cbl.abuseat.org.', 'http.dnsbl.sorbs.net.', 'opm.tornevall.org.' ); (our server is in Europe, you might want to reorder these for shortest ping times from your server). GreenReaper (talk) 19:30, 23 April 2013 (UTC)
Thanks for the suggestion. I do the stopforumspam import weekly. I just added and enabled the $wgDnsBlacklistUrls too. I've probably put in enough stuff now that I should just sit tight and see what happens. :-) 🐝 thingles (talk) 20:29, 23 April 2013 (UTC)
I also just added another Questy rule that I think will prove very effective.
$myChallengeString = substr(md5(uniqid(mt_rand(), true)), 0, 8);
$myChallengeIndex = rand(0, 7) + 1;
$myChallengePositions = array ('first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth');
$myChallengePositionName = $myChallengePositions[$myChallengeIndex - 1];
$wgCaptchaQuestions[] = array (
    'question' => "Please provide the $myChallengePositionName character from the sequence <code>$myChallengeString</code>:",
    'answer' => $myChallengeString[$myChallengeIndex - 1]
);
I tested this and it seems that Extension:ConfirmEdit works fine with the answer being dynamic like this. If you are so inclined I'd love people to create some accounts to make sure that I haven't broken anything in the process of making this much stronger. Just share the username so the account can be deleted. My testing on another wiki with these rules suggests they work fine. Questy appears to store the answer each time so the randomness appears to work fine. 🐝 thingles (talk) 20:29, 23 April 2013 (UTC)
Interesting configuration. Another information I'd love to see in the stats is the kind of captcha being used for account creation, but that's not so easy. --Nemo 07:50, 4 May 2013 (UTC)
Interesting Nemo, what would you be looking to identify by knowing the type of captcha? 🐝 thingles (talk) 13:40, 4 May 2013 (UTC)
We should ask a spambot's owner to counsel us! :p I don't know, probably something in the HTML of Special:userLogin/signup, but it will always be hacky. --Nemo 19:48, 4 May 2013 (UTC)

Naming Extensions Better?

Check out the conversation that kgh and I had at Extension talk:VisualEditor regarding the reuse of the name by another organization. It causes a lot of problems, including version numbers being really weird and making it nearly impossible to send a notice that someone is using an outdated version. I'm digging this idea of looking at the URL of the extension and conditionally appending the organization's name to the name of the extension. I think this would also make sense for Wikia. Really, for any company that creates a large number of extensions, putting their name in the extension name kind of works. I could imagine there being an Extension:VisualEditor, Extension:VisualEditor (Wikia) and Extension:VisualEditor (Hallo Welt) all at the same time. It's an easy change to add to User:Bumble Bee. It will create new extension pages and will undoubtedly orphan some current extension pages. But I think it could be a good solution to duplicate names.

Some numbers related to this. There are 1376 instances of extensions from Hallo Welt. This represents 78 extensions. This is the list of extensions that would have (Hallo Welt) appended to their name. Many of these are forked from extensions with the same exact name but are now on very different version numbers.

Extension:WidgetBar, Extension:NamespaceCss, Extension:WantedArticle, Extension:Blog, Extension:Emoticons, Extension:PermissionManager, Extension:MailChanges, Extension:RSSFeeder, Extension:Review, Extension:FormattingHelp, Extension:PageAccess, Extension:Preferences, Extension:WhoIsOnline, Extension:WikiAdmin, Extension:ArticleInfo, Extension:ResponsibleEditors, Extension:StateBar, Extension:RSSStandards, Extension:ShoutBox, Extension:GroupManager, Extension:UserSidebar, Extension:SaferEdit, Extension:BlueSpiceProjectFeedbackHelper, Extension:ExtensionInfo, Extension:Authors, Extension:InterWikiLinks, Extension:WatchList, Extension:HideTitle, Extension:InsertMagic, Extension:Statistics, Extension:CountThings, Extension:NamespaceManager, Extension:LinkTankConsistencyCheck, Extension:LinkTankWebCapture, Extension:LinkTankTagCloud, Extension:LinkTankMail, Extension:LinkTank, Extension:LinkTankAutoCreateLinkedTitles, Extension:RentALink, Extension:LinkTankExtendedSearch, Extension:LinkTankCategoryTree, Extension:LinkTankCrawler, Extension:PagesVisited, Extension:TopMenuBarCustomizer, Extension:UserPreferences, Extension:CSyntaxHighlight, Extension:ExtendedSearch, Extension:VisualDiff, Extension:UEModulePDF, Extension:InsertFile, Extension:UserManager, Extension:PiwikConnector, Extension:Avatars, Extension:Notifications, Extension:Bookshelf, Extension:Rating, Extension:SuperList, Extension:InsertLink, Extension:Dashboards, Extension:FacebookConnect, Extension:BookshelfUI, Extension:VisualEditor, Extension:RatedComments, Extension:FlowPlayer, Extension:Flexiskin, Extension:UEModuleBookPDF, Extension:PageTemplates, Extension:ConfirmSelectedAccounts, Extension:Checklist, Extension:ImageMapEdit, Extension:SmartList, Extension:Readers, Extension:UniversalExport, Extension:InsertCategory, Extension:BlueForge, Extension:SecureFileStore, Extension:Disqus, Extension:InfoBox


Some numbers related to this. There are 42 instances of extensions from Wikia. This represents 10 extensions. This is the list of extensions that would have (Wikia) appended to their name.

Extension:Wikia Special Unlockdb, Extension:WikiaNewFiles, Extension:User Wall - disabled, Extension:Create Page, Extension:Content Feeds, Extension:Founder Emails, Extension:Wikia RSS feed, Extension:MiniEditor, Extension:RelatedPages, Extension:User Wall

Notably Wikia already puts their name in some of the extensions. I could make User:Bumble Bee smart enough to only append the organization name if the name isn't already in the extension name itself. So "MiniEditor" would become "MiniEditor (Wikia)", but "Wikia RSS feed" would not be changed.
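The "only append if the name isn't already there" rule could be sketched in Python as follows. This is an illustration of the proposal, not existing Bumble Bee code. The function name is hypothetical, and how the organization is derived from the extension URL is left out.

```python
def qualified_extension_name(extension_name, organization):
    """Append '(organization)' unless the name already mentions it.

    The comparison is case-insensitive; that choice is an assumption.
    """
    if organization.lower() in extension_name.lower():
        return extension_name
    return f"{extension_name} ({organization})"


print(qualified_extension_name("MiniEditor", "Wikia"))
print(qualified_extension_name("Wikia RSS feed", "Wikia"))
print(qualified_extension_name("VisualEditor", "Hallo Welt"))
```

A plain substring check also leaves names like "WikiaNewFiles" untouched, which seems to match the intent described above.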

Looking forward to comments. 🐝 thingles (talk) 02:24, 26 April 2013 (UTC)

Any comment on this? I found at least one example of this convention being used already in Extension:Piwik Integration and Extension:Piwik Integration (DIPF Edition). After a couple of days this still seems like a good approach to use. 🐝 thingles (talk) 14:32, 27 April 2013 (UTC)