Need to delete all line breaks in source Thread poster: kd42
| kd42 Estonia Local time: 11:23 English to Russian
Hi, a [*****] client sent me a project with a lot of line breaks, because it is converted from a PDF document by a lazy PM assistant. I heavily use MT which does not realize that a line break is just a white space, so I want to convert the line breaks to spaces. Is there any simple way to do it? Thank you. | | | Michael Beijer United Kingdom Local time: 09:23 Member (2009) Dutch to English + ... | Stepan Konev Russian Federation Local time: 11:23 English to Russian CleanUp Tasks | Jun 6, 2021 |
You can use this CleanUp Tasks plugin. Here is how. Find "Modifying text" and follow the instructions from there to create a rule. Search: \n Replace: just type a space char here. Check the "Regex" box. Click Save as. Before using this rule, uncheck all other options except "Use Conversions". Also, I guess you can edit the sdlxliff file with Notepad, but I never tried that. | | | Multiverse Solutions s.r.o. (X) Local time: 10:23 Polish to English + ... directly in DOC(X) | Jun 6, 2021 |
If you have a Word file, open the Find & Replace window and use: Find: ^p Replace with: a single space. This will combine all neighbouring paragraphs into larger units. However, if there are no empty lines between paragraphs, you will get a single paragraph that will need manual splitting into sentences / segments / paragraphs. To speed up cleaning, you may use e.g. three spaces in the Replace field and apply Highlight. Inserting a manual Enter (paragraph mark) where needed into a three-space gap is easy. Cleaning up excess spaces is equally easy after the whole process (two spaces in Find, one space in Replace).
[Edited at 2021-06-06 05:29 GMT] | |
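For anyone who has the text as a plain file rather than in Word, the same idea can be sketched in Python. This is only a rough illustration, not what Word does internally: the function name `join_broken_lines` is made up here, and it assumes that an empty line marks a real paragraph boundary, which may not hold for every PDF conversion.

```python
import re


def join_broken_lines(text: str) -> str:
    """Turn line breaks inside a paragraph into single spaces.

    Assumption: blank lines separate real paragraphs and must survive.
    """
    # Normalise Windows (\r\n) and old Mac (\r) line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines so that real paragraph boundaries are kept.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, a line break plus any spaces around it
    # becomes exactly one space.
    joined = [re.sub(r"[ \t]*\n[ \t]*", " ", p).strip() for p in paragraphs]
    # Drop empty chunks and put the paragraphs back together.
    return "\n\n".join(p for p in joined if p)


if __name__ == "__main__":
    sample = "The pump\nis rated\r\nat 60 V.\n\nSecond\nparagraph."
    print(join_broken_lines(sample))
```

If the file has no empty lines at all, everything ends up in one paragraph, exactly as described above for Word, and still needs manual splitting.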
|
|
Samuel Murray Netherlands Local time: 10:23 Member (2006) English to Afrikaans + ... | kd42 Estonia Local time: 11:23 English to Russian TOPIC STARTER Thanks for the suggestions everyone, I'll try MS Word | Jun 6, 2021 |
Thanks a lot for coming to help me, Michael, Stepan, Multiverse, Samuel. No, I don’t have the source Word document or Excel workbook; I must work on the sdl data. The plugin recommended by Stepan repeatedly crashes Studio at the very beginning, with a message which I have no intention to investigate. It is nearly impossible to delete line breaks in Notepad because the sdl data contains the source twice, and you should edit only the second occurrence. Therefore, I am going to copy-paste the xliff data from Notepad to a Word document, make the second instance of source bold, and then convert all bold paragraph marks to spaces. Then I will copy-paste the data back into xliff using Notepad. If I do not come back with my grievances and curses, it means this workflow was a success. Stay healthy and have a nice day! =) | | | kd42 Estonia Local time: 11:23 English to Russian TOPIC STARTER A few more words | Jun 6, 2021 |
A couple of updates. 1) The plugin recommended by Stepan crashed Studio because I had cut the body of the xliff data and pasted it into MS Word, intending to do the conversion using MS Word, and then decided to ask the colleagues. So I restored the data, the plugin stopped crashing Studio, I created a rule to replace /n with a white space using Regex, and it did not work; most likely I am making a mistake or missing something which is obvious to the plugin developer or an advanced Studio user. So I quit at this stage. 2) I opened the sdl data in Notepad, selected the tag and everything after it, cut and pasted it into Word. This is necessary because if you try to open the data in Word, it will attempt to somehow interpret the xml, and fail. When the data was in Word, I ran a find/replace pass with “Match wildcards” active, searching for \\*\ and replacing it with just the formatting: bold font. Then I replaced all bold paragraph marks with spaces. Then I copy-pasted the data back into Notepad and saved it. The resulting xliff opens in Studio and has no line breaks. Have a nice, working afternoon! =) | | | Stepan Konev Russian Federation Local time: 11:23 English to Russian Probably wrong version, you didn't mention yours | Jun 6, 2021 |
kd42 wrote: The plugin recommended by Stepan repeatedly crashes Studio at the very beginning, with a message which I have no intention to investigate. Probably you need the other version (there are two of them). This one is for Trados 2015-2017: https://appstore.sdl.com/language/app/cleanup-tasks/550/ | |
|
|
kd42 Estonia Local time: 11:23 English to Russian TOPIC STARTER The crash was caused by me | Jun 6, 2021 |
Thanks, Stepan, see update 1) above. | | | Stepan Konev Russian Federation Local time: 11:23 English to Russian Ah, I see now, ok | Jun 6, 2021 |
I agree that some Trados features require stepping out of the comfort zone to use them properly. That's true. However, before posting my suggestion, I tried it myself and it worked. Btw you mentioned that you tried /n. It must be \n instead.
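To show why the slash direction matters, here is a small Python sketch. It is only an illustration with Python's own `re` module, not the plugin's engine, so treat the details as an assumption; the point is that `/n` is just the two literal characters "/" and "n", while `\n` is the regex escape for a line break.

```python
import re

# Two lines separated by a real line break.
sample = "first line\nsecond line"

# "/n" is a literal slash followed by the letter n.
# The sample contains no such pair, so nothing is replaced.
wrong = re.sub(r"/n", " ", sample)

# "\n" is the regex escape for a line break, so the break becomes a space.
right = re.sub(r"\n", " ", sample)

if __name__ == "__main__":
    print(repr(wrong))
    print(repr(right))
```

A rule with the wrong slash does not raise an error; it simply finds nothing to replace, which is why the mistake is easy to overlook.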
[Edited at 2021-06-06 13:39 GMT] | | | kd42 Estonia Local time: 11:23 English to Russian TOPIC STARTER Most likely it was my mistake | Jun 6, 2021 |
Stepan Konev wrote: I agree that some Trados features require stepping out of the comfort zone to use them properly. That's true. However, before posting my suggestion, I tried it myself and it worked. Btw you mentioned that you tried /n. It must be \n instead. I never doubted your knowledge and skills. I checked the settings file and it is /n instead of \n. You solved the problem. The developer might wish to somehow simplify regex or trigger a warning in such or similar cases, because very few translators have the background and sharp eye like you do. I sometimes receive big projects containing very many small files with unwanted line breaks. Using Word is not feasible with them. Thanks to you I now have a good solution to this issue, I owe you a bottle of Old Tallinn. =) | | | kd42 Estonia Local time: 11:23 English to Russian TOPIC STARTER Need to add a space between number and units | Aug 28, 2021 |
G'day everyone. I got this "Cleanup Source" plugin working on my system; it performs simple tasks, and I have just finished scanning its entire manual, with no result. My problem: the source text is full of values with units, like 60V, 77Hz, 12KW, where I need to separate the numeric value from the units with a space, like this: 60 V, 77 Hz, 12 KW. In MS Word I use the following pattern. Find: ([0-9])(Hz) Replace with: \1 \2 (meaning "the text within the first pair of brackets", "a space", "the text within the second pair of brackets"). Question: does Cleanup Source have this syntax/capability? Thank you. | |
|
|
Stepan Konev Russian Federation Local time: 11:23 English to Russian Replace \ with $ | Aug 28, 2021 |
Find: ([0-9])([A-z]) Replace with: $1 $2 *I'm not sure if this situation is possible (when you use special letters in units), but just in case... If you replace 'z' in the above regex with 'ž', the regex will also capture all letters of the Estonian alphabet including Šš, Žž, Õõ, Ää, Öö, Üü.
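A quick way to try the pattern outside Studio is a short Python sketch. Two things here are assumptions and not part of the plugin: Python writes the replacement as `\1 \2` where the rule above uses `$1 $2`, and the names `LOOSE`, `STRICT` and `space_units` are invented for this illustration. One detail worth knowing is that the range `A-z` also covers the characters that sit between `Z` and `a` in the character table (`[`, `\`, `]`, `^`, `_` and the backtick), so `A-Za-z` is the tighter choice if only plain Latin letters are wanted.

```python
import re

# Pattern as given in the post: a digit followed by anything in the range A-z.
LOOSE = re.compile(r"([0-9])([A-z])")
# Assumption: only plain ASCII letters should count as the start of a unit.
STRICT = re.compile(r"([0-9])([A-Za-z])")


def space_units(text: str, pattern: re.Pattern = STRICT) -> str:
    """Insert one space between a digit and the letter that follows it."""
    # Group 1 is the digit, group 2 the first character of the unit.
    return pattern.sub(r"\1 \2", text)


if __name__ == "__main__":
    print(space_units("60V, 77Hz, 12KW"))
    # The loose range also matches the underscore after a digit.
    print(space_units("file_5_final", LOOSE))
    print(space_units("file_5_final", STRICT))
```

Either pattern will also split things like "3D" or "A4", so it is probably safer to review the matches, or to list the expected units explicitly as in the Word pattern with (Hz), before running it over a whole project.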
[Edited at 2021-08-28 20:40 GMT] | |