Wikipedia talk:CSVLoader/Archive 1

Tag extension

Would there be any benefit in developing a tag extension to process CSV (or hyphen-separated, or whatever) files, given that no spreadsheet applications I know of are equipped to handle piped wikitext? Tisane (talk) 12:47, 2 June 2010 (UTC)[reply]

Is your question related to the CSV plugin? I do not understand what you are looking to do. — Ganeshk (talk) 02:09, 3 June 2010 (UTC)[reply]

Creating new pages in ta.wiktionary

Done. Thank you very much indeed, Ganesh, for teaching me over the phone, particularly in our mother tongue (Tamil). This is my first entry in ta.wiktionary. We are going to discuss the upload of the one lakh words. --தகவலுழவன் (talk) 02:24, 16 July 2010 (UTC)

You are welcome. I am glad to hear that the first article is out. Let me know if you have any questions. — Ganeshk (talk) 12:02, 16 July 2010 (UTC)[reply]

adding more categories in the existing page

You trained me well for creating new pages. I have a question about categorization in already existing pages. Each word in ta.wiktionary comes under many categories, and sometimes a few categories need to be added to many words. How can I add those categories? The "more..." tab of AWB has an option to add only one category. --தகவலுழவன் (talk) 02:18, 18 July 2010 (UTC)

Use the Append/Prepend Text option to add any text you want. Check "Enabled", select "Append" and enter the new categories in the box below. — Ganeshk (talk) 16:45, 18 July 2010 (UTC)[reply]

I tried that, but if one of the categories already exists on the page, it is written a second time. The single-category option skips an existing category automatically. How can I add more than one category with the same behaviour as the single-category option? --தகவலுழவன் (talk) 02:05, 19 July 2010 (UTC)

Hmm, I guess the only way is to run the bot once for each category: finish the first category, then move on to the next, and so on. — Ganeshk (talk) 04:49, 25 July 2010 (UTC)
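The behaviour asked for in this thread (append several categories, but skip any that the page already has) can be sketched outside AWB as follows. This is a minimal illustration, not how AWB or CSVLoader works internally; the function name `append_missing_categories` and the `namespace` parameter are hypothetical, and on ta.wiktionary the localized namespace name would have to be passed in.

```python
import re


def append_missing_categories(page_text, categories, namespace="Category"):
    """Append [[Category:Name]] for each category not already on the page."""
    result = page_text.rstrip("\n")
    for name in categories:
        # Match [[Category:Name]] or [[Category:Name|sortkey]].
        # Only the namespace prefix is matched case-insensitively.
        pattern = (
            r"\[\[\s*(?i:" + re.escape(namespace) + r")\s*:\s*"
            + re.escape(name) + r"\s*(\|[^\]]*)?\]\]"
        )
        if re.search(pattern, result):
            continue  # already categorized, do not write it a second time
        result += "\n[[" + namespace + ":" + name + "]]"
    return result
```

Running the function twice on the same text leaves it unchanged after the first pass, which is the property the single-category option already has.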

Differentiate your find and replace

Will you please explain the difference between AWB's find and replace and the CSV find and replace?
In each word, I need to replace a file (file:Example.jpg) in the template {{படம்|file:Example.jpg|{{PAGENAME}}}} with a specific file that differs from word to word. --தகவலுழவன் (talk) 14:22, 24 July 2010 (UTC)

I think AWB's Find and Replace will do for this requirement. — Ganeshk (talk) 04:51, 25 July 2010 (UTC)[reply]

ta-wiktionary-test pages

Hope you are well. Will you please look at the 25 test pages made with your CSVLoader? Please give your view and a vote. See you then. --தகவலுழவன் (talk) 07:02, 7 October 2010 (UTC)

Thanks for making this available

I just found this page and I am quite eager to try it. I wrote some AWB modules with similar capabilities, see here, but this seems a much cleaner solution. I am planning to use it for creating and updating commons:category:Creator templates on Commons. Thanks --Jarekt (talk) 14:52, 4 February 2011 (UTC)

I just tested it to make sure it is compatible with the latest AWB version. Please try it and let me know your feedback. — Ganeshk (talk) 00:15, 5 February 2011 (UTC)[reply]

I tried it and it did mostly what I wanted: I managed to upload 3 pages (like this). It basically did what I needed, except for the handling of foreign characters. I had the word "Zasów", which was uploaded as "Zas�w". Other improvements I can see:

Allow resizing of the CSV loader Settings window, so there is more space for "article text".
Pick a monospaced font for the "article text" window.
I found the name "article text" confusing, since it only makes sense if you operate in the Wikipedia article namespace. I was using it in the Commons creator namespace. Maybe "Append/Prepend/Replace text" would be better.

I am still eager to try your Find & Replace functionality. Thanks. This functionality is something I have been missing from AWB since the first time I tried it. Maybe you should try to distribute your plugin with the rest of the code? --Jarekt (talk) 04:54, 6 February 2011 (UTC)

Thanks for the feedback.

Please save the text file as UTF-8 to fix the foreign character issue. I tested it here. You can do this on Notepad by setting the encoding box on the Save dialog to UTF-8.
I will add the window resize request to my to-do list.
I have changed the font for the text to Lucida Console. Please download the new DLL, 1.0.0.10.
I fixed the text caption to say, Append/Prepend/Replace text. Please download the new DLL, 1.0.0.10.

I plan to add this to rest of AWB source at some point. It will help me out with the debugging as well. — Ganeshk (talk) 16:33, 6 February 2011 (UTC)[reply]
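The encoding fix above can also be done with a script instead of Notepad. The sketch below assumes the original file was saved in Windows-1252 (the usual "ANSI" default on Western-European Windows); that is a guess about Jarekt's file, not something stated in this thread, and `reencode_to_utf8` is a hypothetical helper name.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# "Zasów" as it would be stored in a Windows-1252 file.
ansi_bytes = "Zasów".encode("cp1252")

# Reading those bytes as if they were UTF-8 loses the accented letter,
# which matches the symptom reported above.
garbled = ansi_bytes.decode("utf-8", errors="replace")

# After re-encoding, a UTF-8 reader sees the word intact.
fixed = reencode_to_utf8(ansi_bytes).decode("utf-8")
```

To convert a real file, read it with `open(path, "rb")`, pass the bytes through the function, and write the result back in binary mode.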

Thanks. I will let you know how the new version works out. --Jarekt (talk) 04:21, 7 February 2011 (UTC)

My second attempt worked better. I uploaded ~100 infobox templates based on CSV data scraped from Wikipedia. I had some issues though:

The %%key%% did not seem to work. It is AWB's built-in keyword for returning a name, based on the page name, in a format compatible with DEFAULTSORT.
I had to do 3 separate runs to create all templates. The first 2 runs created only a handful of pages and skipped most of the rows, but the 3rd one was a charm and created most of the templates. This issue only happened when bot autosaving was used; in manual mode no pages were skipped.
Auto loading of the list based on the first column has a small issue: if the CSV file has some empty rows at the end, then "empty page names" are added to the list. There should be some check to prevent that.

Otherwise it is a great tool. Thanks again --Jarekt (talk) 20:41, 13 February 2011 (UTC)[reply]

I suggest you use ## (any symbol will work) for the fields, for example ##key##. This will not conflict with AWB keywords.
The tool will work with bot autosaving as well. Please check the skip conditions.
I will look into the empty row issue.

Thanks for the feedback. — Ganeshk (talk) 23:57, 13 February 2011 (UTC)[reply]
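The check against "empty page names" requested above could look roughly like this. It is a sketch of the idea only, not the plugin's code; `load_rows` is a hypothetical name, and the rule "skip a row whose first column is blank" is an assumption about what should count as an empty row.

```python
import csv
import io


def load_rows(csv_text, delimiter=","):
    """Return the CSV rows whose first column (the page name) is non-blank."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        # csv.reader yields [] for a blank line and [''...] for a line
        # that contains only delimiters; both have no usable page name.
        if not row or not row[0].strip():
            continue
        rows.append(row)
    return rows
```

Spreadsheet exports often end with lines such as `,,` for rows that were once touched and later cleared, which is the case this filter targets.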

I did another round of work with CSVLoader, this time using the replace feature in autosaving mode. It worked great as long as I stuck precisely to the steps: create the replacement rules, load the file and run from beginning to end. With any small alteration of the rules, the process stopped working:

For example, I usually like to do a lot of testing of a new script before I unleash it, and that usually broke the script. More precisely, if I pick file #1, #100, #200 and the last one from the list, it works fine, but when I then try file #2 I get the wrong substitution.
Another thing I like doing is testing and tweaking the replacement rules, and I was using several rules at the same time. Unfortunately, each time I look at my rules all the ##keywords## are gone, replaced with the last substitution performed. I restore them all to the original state and make my tweak, but then substitution does not work without reloading the CSV file, so I had to reload the file. All those steps take a while. A better way would be to allow changing the rules without the need to redo all the steps.

But even with all those limitations this tool is much better than any alternative approach I have come up with, and it allows me to do semiautomatic cleanup edits in 3 phases: 1) run a bot to capture some text, such as the author, from some collection of files or templates and save it to a file; 2) use a spreadsheet to correct/unify the text; 3) use CSVLoader to replace the original text with the corrected version. --Jarekt (talk) 14:12, 22 February 2011 (UTC)

Agreed, the plugin has limitations. The plugin stores the replacement rules (with the ##keywords##) when it is started. As each article gets processed, the replacement rules are purged and replaced with the actual values from the file. That is the reason you see the last substitution made. It will be difficult for the plugin to figure out that rules have been changed in the middle of the run. Do you have any suggestions on how this can be done?
I will check into your issue #1.

I am glad to hear that you are finding the plugin useful. — Ganeshk (talk) 00:55, 24 February 2011 (UTC)[reply]
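One possible answer to the question above is to never overwrite the rules that contain the ##keywords##, and to build a fresh expanded copy for each row instead. The sketch below illustrates that idea in isolation; it is not the plugin's implementation, and `expand_rules` and the (find, replace) tuple layout are hypothetical.

```python
def expand_rules(rule_templates, headers, row):
    """Return a new list of (find, replace) pairs with keywords filled in.

    rule_templates is left untouched, so the user-visible rules keep
    their ##keywords## no matter how many rows have been processed.
    """
    values = dict(zip(headers, row))
    expanded = []
    for find, replace in rule_templates:
        for key, value in values.items():
            find = find.replace(key, value)
            replace = replace.replace(key, value)
        expanded.append((find, replace))
    return expanded
```

Because each page is expanded from the untouched templates and its own row, pages can be processed in any order, and a rule without keywords simply passes through unchanged.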

Error message

Status: New

Description

Exception: NullReferenceException

Message: Object reference not set to an instance of an object.

Call stack:

   at WikiFunctions.Parse.FindandReplace.Decode(String text)
   at WikiFunctions.Parse.FindandReplace.AddNew(Replacement r)
   at CSVLoader.CSVLoader.ProcessArticle(IAutoWikiBrowser sender, IProcessArticleEventArgs eventargs)
   at WikiFunctions.Article.SendPageToPlugin(IAWBPlugin plugin, IAutoWikiBrowser sender)
   at AutoWikiBrowser.MainForm.ProcessPage(Article theArticle, Boolean mainProcess)

AWBPlugins: CSV Loader
AWBBasePlugins: No Limits Plugin
ListMakerPlugins: UserContribsNoLimitsForAdminAndBotsPlugin, UserContribsUserDefinedNumberForAdminAndBotsPlugin, WhatTranscludesPageNoLimitsForAdminAndBotsPlugin, WhatTranscludesPageAllNSNoLimitsForAdminAndBotsPagePlugin, CategoryNoLimitsForAdminAndBotsPlugin, CategoryRecursiveNoLimitsForAdminAndBotsPlugin

Jarekt (talk) 16:11, 21 March 2011 (UTC)[reply]

To duplicate: [encountered while processing page [1]]

Site URL: http://commons.wikimedia.org

Operating system: Microsoft Windows NT 5.1.2600 Service Pack 3

.NET FW Version: 2.0.50727.3615

AWB version: AutoWikiBrowser (5.2.0.0), WikiFunctions (5.2.0.0), revision 7471 (2010-12-17 01:03:47)

Workaround: delete all replacement rules that do not use data pulled out of the file

Fixed in version:

Ganeshk, this is the error message I get if one of the replacement rules does not involve data pulled out of the file. Also, I do not seem to be able to use your tool with the "Find and replace" "Advanced Settings". I usually use "Advanced Settings" for everything since I find them more readable. Greetings. --Jarekt (talk) 16:11, 21 March 2011 (UTC)

Hi Jarekt, I will take a look at this. I am a little busy in RL right now. — Ganeshk (talk) 00:54, 27 March 2011 (UTC)[reply]

Another issue

I ran into another issue whose cause I am not sure I understand: the tool worked fine doing find and replace for the first dozen or two records, but then something broke and it started inserting the same text into the remaining files, not matching the values in the spreadsheet. I suspect that the problem might be caused by the fact that I used the skip option (when some template is present) with your tool. Maybe there is a way to detect when the process breaks and stop it. --Jarekt (talk) 16:17, 30 March 2011 (UTC)

Two images, minor order change

Made a minor order change, moving the walkthrough up below the download subsection, and the history moved to bottom.

Here are two images to add at the beginning for the example page: [2][3] I would upload them myself but I don't upload images anymore because veteran editors here always delete them. Errectstapler (talk) 04:02, 14 April 2011 (UTC)[reply]

Hi Errectstapler, Thanks for the changes. The DLL need not be loaded each time if the CSVLoader.dll is copied to the AutoWikiBrowser folder. There is no loading required. That was the reason the walkthrough does not list that step. — Ganeshk (talk) 10:52, 14 April 2011 (UTC)[reply]

clarification

RE: Copy the downloaded CSVLoader.dll file to the AutoWikiBrowser folder

Is this the plugins folder? Errectstapler (talk) 15:57, 27 April 2011 (UTC)[reply]

No, it is the root folder where the AutoWikiBrowser.exe resides. — Ganeshk (talk) 03:58, 31 August 2011 (UTC)[reply]

Also the pictures on User:Ganeshk/CSVLoader/Walkthrough are of a text document not a csv document. The instructions describe a csv document. Errectstapler (talk) 16:21, 27 April 2011 (UTC)[reply]

CSV file is a text file with delimited data. — Ganeshk (talk) 03:58, 31 August 2011 (UTC)[reply]

merged instructions into walkthrough

I boldly merged the instruction into the walkthrough page. These were almost the same instructions, repeated twice :) Errectstapler (talk) 16:41, 27 April 2011 (UTC)[reply]

the edit summary

We (Sodabottle, Drsrisenthil and I) are creating new pages in ta.wiktionary as you guided us. Your CSV loader has reduced the workload by 90%. Thank you very much indeed. The remaining 10% lies in the edit summary section of AWB and in making internal links. As you instructed, we are using an OpenOffice spreadsheet: column A for the heading, column B for its meaning. Is it possible to paste the column B content automatically into the AWB edit summary for the new word? The edit summary differs for each new word, so this would make patrolling the recent changes of ta.wiktionary easy; otherwise we have to open every new page to verify it. Please add an option (e.g. "also paste in the edit summary") to the CSV loader. We are constantly moving ta.wiktionary ahead; this is its position among all other wiktionaries now. Thanks in advance. தகவலுழவன் (talk) 00:04, 13 June 2011 (UTC)

Okay. I will look into adding this functionality. — Ganeshk (talk) 11:14, 13 June 2011 (UTC)[reply]

Done This has been implemented. Please download the new version, 1.0.0.11. — Ganeshk (talk) 03:42, 31 August 2011 (UTC)[reply]

Great. With this implementation, you have reduced our patrolling time. Thanks indeed. --தகவலுழவன் (talk) 03:05, 3 September 2011 (UTC)

Thanks for this nifty feature Ganesh. As Tha.Uzhavan says, this has reduced patrolling/verifying time greatly :-)--Sodabottle (talk) 12:23, 14 September 2011 (UTC)[reply]

Glad to hear that. Happy to help. :) Just noticed that tawikt moved to 9th position. Congrats. — Ganeshk (talk) 04:30, 15 September 2011 (UTC)[reply]

Fixed the issues that you reported over e-mail. Please download the new version, 1.0.0.12. — Ganeshk (talk) 14:46, 3 September 2011 (UTC)[reply]

Thanks Ganesh, your plugin is an indispensable tool for our Tamil wiki (especially Tamil wiktionary) experience. I appreciate your effort and prompt help. --Senthi (talk) 16:02, 15 September 2011 (UTC)

Drsrisenthil, No problem. Glad to be of help. — Ganeshk (talk) 22:07, 15 September 2011 (UTC)[reply]

the problem

In the recent changes of ta.wiktionary, the edit summary appears only in bot mode, not in semi-automatic mode. Could you please fix this? --தகவலுழவன் (talk) 05:55, 26 July 2012 (UTC)

Send me a screenshot. You have my e-mail address. — Ganeshk (talk) 09:52, 26 July 2012 (UTC)[reply]

Extremely sorry. It was my mistake. The problem does not exist in your current version. --தகவலுழவன் (talk) 23:13, 28 July 2012 (UTC)

Share CSV-files

Please post your .csv files in your namespace and post it below. Other projects could use the raw data to create articles in their language. Thanks in advance - Grashoofd (talk) 20:05, 2 July 2011 (UTC)[reply]

I too am expecting those files. --தகவலுழவன் (talk) 07:19, 3 July 2011 (UTC)

By the way, usable databases for bot writing are also VERY welcome. It's a maze out there.. Grashoofd (talk) 16:32, 3 July 2011 (UTC)[reply]

What am I doing wrong?

Hi, Ganesh. Thanks for making this; it will make things much easier for me. I've gone through 550 of the 640 Indian districts updating census info and was pretty happy when I found that this program exists. So far it's not working for me, though, and I've been trying for hours on end.

As a test I've made this .txt file:

User:PhnomPencil/sandbox/Chilakaluripet,Guntur,Andhra Pradesh
User:PhnomPencil/sandbox/Manalurpet,Viluppuram,Tamil Nadu
User:PhnomPencil/sandbox/Khalia,Haora,West Bengal

After loading it, on the loader settings:

For options I tried different things, but "skip when no changes made" was the only one I checked, usually.

For field separator I wrote a comma.

For column headers I wrote:

##City##,##District##,##State##

I chose "replace text".

The text (large box) was

##City## is a city in [[##District## district]] in the [[India]]n state of [[##State##]].

Under "Skip" I chose "Existing pages".

So the list of three pages shows up, but when I click "start" a bug report pops up. Nothing shows up in the edit window. Every single time.

I've tried it with an older version of AWB -- 4.1 -- but that's "not enabled"; if I want to edit I can only use the latest version.

I'm no techie so I suspect I'm making some obvious error. Can you tell what that is?

Thanks for your help. I got AWB specifically so I could use this plug-in; it will revolutionize my editing. (And if I end up using it a lot I'll create a Bot account if required)

Cheers, PhnomPencil (talk) 14:32, 15 October 2011 (UTC)[reply]

It was a bug related to edit summary box on the loader settings (it should default to summary in the main window if blank). I have fixed it and released a new version, 1.0.0.14. Please try that and let me know if you run into any issues. I have also made comma as the default delimiter. — Ganeshk (talk) 15:14, 15 October 2011 (UTC)[reply]

Thanks for the quick fix, Ganesh. I'm grinning from ear to ear now. PhnomPencil (talk) 15:51, 15 October 2011 (UTC)[reply]

You are welcome. Please do read WP:MASSCREATION if you are planning to create a lot of stubs. — Ganeshk (talk) 18:34, 15 October 2011 (UTC)[reply]
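How the settings described in this thread fit together (comma separator, column headers, and the text template) can be sketched as below. The sketch only mimics the substitution step as described here and is not the plugin's code; `render_pages` is a hypothetical name, and treating the first column as both the page title and the ##City## value follows PhnomPencil's test file.

```python
import csv
import io


def render_pages(csv_text, headers, template, delimiter=","):
    """Map each page title (first column) to the template with fields filled in."""
    pages = {}
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if not row:
            continue  # ignore blank lines
        text = template
        for key, value in zip(headers, row):
            text = text.replace(key, value)
        pages[row[0]] = text
    return pages


data = (
    "User:PhnomPencil/sandbox/Chilakaluripet,Guntur,Andhra Pradesh\n"
    "User:PhnomPencil/sandbox/Manalurpet,Viluppuram,Tamil Nadu\n"
    "User:PhnomPencil/sandbox/Khalia,Haora,West Bengal\n"
)
headers = "##City##,##District##,##State##".split(",")
template = "##City## is a city in [[##District## district]] in the [[India]]n state of [[##State##]]."
pages = render_pages(data, headers, template)
```

Note that with this test file the sandbox page title itself is what replaces ##City##; a separate column for the display name would be needed to get the bare city name into the sentence.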

How can I collect particular data from an en.wiktionary category?

I have been translating this en.wiktionary category into the ta.wiktionary category. To link audio files from Commons, the template ({{PAGENAME}}) usually helps automatically, but in the en.wiktionary category mentioned above it is useless, because the header is in Chinese while the respective audio file name is in roman characters, i.e. pinyin. Is it possible to collect the audio file names in one column of a spreadsheet and the respective Chinese names in another column? --தகவலுழவன் (talk) 17:41, 4 January 2012 (UTC)

I am working on getting AWB access on English Wiktionary to help you out. See Wiktionary:Beer_parlour#AWB_access. — Ganeshk (talk) 12:05, 6 January 2012 (UTC)[reply]

I have e-mailed a CSV file over to you. — Ganeshk (talk) 02:10, 7 January 2012 (UTC)[reply]

A few lines from the extract:

a1,Zh-ā.ogg
啊,Zh-ā.ogg
阿,Zh-a.ogg
ai3,zh-ǎi.ogg
矮,zh-ǎi.ogg
ai4,Zh-ài.ogg
愛,Zh-ài.ogg
爱,Zh-ài.ogg
愛人,zh-àiren.ogg
爱人,zh-àiren.ogg
an1,Zh-an1.ogg

Here is the custom module that I used to extract:

        Public Function ProcessArticle(ByVal ArticleText As String, ByVal ArticleTitle As String, ByVal wikiNamespace As Integer, ByRef Summary As String, ByRef Skip As Boolean) As String Implements WikiFunctions.Plugin.IModule.ProcessArticle
            ' Never save the page; this module only reads it.
            Skip = True
            Summary = ""

            Dim audiofile As String = ""

            ' Capture the file name from an {{audio|....ogg template call.
            Dim m As Match = Regex.Match(ArticleText, "\{\{audio\|([^\r\n]+)\.ogg")
            If m.Success Then
                audiofile = m.Result("$1") & ".ogg"
            End If

            ' Append one "title,audiofile" line per processed page.
            Using fw As System.IO.StreamWriter = System.IO.File.AppendText("C:\Temp\AudioFile.txt")
                fw.WriteLine(ArticleTitle & "," & audiofile)
            End Using

            Return ArticleText
        End Function

— Ganeshk (talk) 02:13, 7 January 2012 (UTC)[reply]
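To try the same pattern on sample wikitext without running AWB, the extraction step can be mirrored in a few lines of Python. This is only a test harness for the regular expression used in the module above; `audio_line` is a hypothetical name and the sample wikitext in the usage note is made up for illustration.

```python
import re

# Same pattern as in the custom module: text after "{{audio|" up to ".ogg".
AUDIO_RE = re.compile(r"\{\{audio\|([^\r\n]+)\.ogg")


def audio_line(title, article_text):
    """Return the 'title,audiofile' line that the module would write."""
    m = AUDIO_RE.search(article_text)
    audiofile = m.group(1) + ".ogg" if m else ""
    return title + "," + audiofile
```

For example, `audio_line("啊", "* {{audio|Zh-ā.ogg|audio}}")` gives a line in the same shape as the extract shown above, and a page without an audio template gives the title followed by an empty second field.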

thank you very much for your detailed response.--தகவலுழவன் (talk) 02:13, 9 January 2012 (UTC)[reply]

Capitalization bug

Hi. It appears that the CSVLoader won't work unless the first character in the article name (the first field) is uppercase.--Father Goose (talk) 04:36, 20 January 2012 (UTC)[reply]

I noticed that AWB was converting the first character to uppercase. I have modified the CSVLoader program to do the same. Please download the new version, 1.0.0.15. — Ganeshk (talk) 12:02, 20 January 2012 (UTC)[reply]
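The normalization described here, upper-casing the first character of the page name so that it matches what AWB puts in its list, amounts to something like the following. It is a sketch of the idea, not the code in version 1.0.0.15, and `normalize_title` is a hypothetical name.

```python
def normalize_title(title):
    """Upper-case the first character of a page name, leaving the rest as is."""
    title = title.strip()
    return title[:1].upper() + title[1:]
```

Slicing with `[:1]` rather than indexing keeps the function safe for an empty string.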

Great, thank you. :-) --Father Goose (talk) 00:09, 21 January 2012 (UTC)[reply]

AWB failed to automatically remove

Dear Ganeshk! When I use AWB to create new articles with a bot account in automatic mode, I get the message: "AWB failed to automatically remove the page from the list while skipping the pages. Please remove it manually".

After that, the program cannot continue creating articles, even in manual mode or with another account. Could you please help me? Thank you in advance! --Cheers! (talk) 14:34, 22 April 2012 (UTC)

Can you please post some screenshots for each step that you are doing? That will help me troubleshoot the problem. Thanks. — Ganeshk (talk) 15:12, 22 April 2012 (UTC)[reply]

I found the problem. When I prepared the file in Excel, there were two spaces between two words in the first column. For example, [Nam=Hung] versus [Nam==Hung]. Thank you very much for your consideration. --Cheers! (talk) 07:36, 26 April 2012 (UTC)

Great. No problem. — Ganeshk (talk) 21:49, 26 April 2012 (UTC)[reply]
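A stray double space like the one found here can be cleaned out of the first column before the file is loaded. The sketch below is a pre-processing idea for the CSV data, not a feature of the plugin, and `collapse_spaces` is a hypothetical name.

```python
def collapse_spaces(title):
    """Trim a page name and collapse any run of whitespace to a single space."""
    return " ".join(title.split())
```

Applying it to every first-column value in the spreadsheet export should remove this class of mismatch between the CSV page name and the name AWB holds in its list.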

AWB said No changes

Dear Ganeshk! I'm using your plugin and AWB 5.4.0.0. For my first steps I used your example "CSV data". Everything worked fine until I pressed "start". AWB told me for each row: No Changes. Press the "Skip" button below to skip to the next page. What did I do wrong? Thank you! --Poldi66 (talk) 20:01, 3 October 2012 (UTC)

Hi Poldi66, can you post the CSV data that you had used here? — Ganeshk (talk) 12:17, 4 October 2012 (UTC)[reply]

I got it! The file had the extension txt, but it did not contain raw text. I saved the file with edit and now it works. Thank you for the conversation and the friendly welcome. --Poldi66 (talk) 19:20, 4 October 2012 (UTC)

You are welcome. I see that you are with the Perrypedia. Let me know if you need any further help. — Ganeshk (talk) 02:19, 5 October 2012 (UTC)

Creating a list on a page

Hi Ganeshk. Can you remind me, please, of the link for information on how to chew a large CSV file using your wonderful CSVLoader and then spit it out into a list on one page? I think I need to make a module (not sure which one) which then creates a file in my Project folder, but I might be wrong. Diolch yn fawr! Llywelyn2000 (talk) 14:36, 12 October 2013 (UTC)